IDA series, part 1: the Hex-Rays decompiler

What’s this about?

In what I call the IDA series, I will be explaining a bunch of interesting and useful stuff I’ve discovered about the Interactive Disassembler, IDA Pro.

I’m writing this for two reasons:

Disassembly <-> pseudocode mapping

The first thing to learn about the decompiler is that it takes disassembly and transforms it into pseudocode. This might sound obvious, but a lot of people don’t realize what it implies: whatever you change in the disassembly changes the pseudocode too. For instance, suppose you have a function that compares a variable holding a bunch of flags against a static value, like this:

wrong output

Hex-Rays thinks it’s an offset because the value happens to look like an address, but it’s actually a flag check.

The pseudocode window has no option to fix this, but since we know that the decompiler maps assembly to pseudocode, we can fix it in the disassembly instead. Place the cursor on the offending &loc_401001 and press TAB to jump to the corresponding disassembly. Then, with the cursor on the offending operand, press H to mark it as a plain value rather than an offset. After pressing F5 to decompile again, the code is decompiled as expected:

okay output
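To make the ambiguity concrete, here is a minimal C sketch of the kind of check involved. The value 0x401001 is an assumption taken from the &loc_401001 in the bad output, and FLAG_VALUE and flags_match are hypothetical names; the original function is not reproduced here. The point is that the constant and the address are the same 32-bit immediate in the instruction, so only the operand type set in the disassembly decides how it gets printed.

```c
#include <stdint.h>

/* Hypothetical flag constant. The value is assumed from the
 * &loc_401001 in the bad output; it merely happens to fall inside
 * the address range of the code segment. */
#define FLAG_VALUE 0x401001u

/* Returns 1 when flags is exactly FLAG_VALUE, 0 otherwise.
 * Compiled, this becomes a cmp against the immediate 0x401001,
 * which a disassembler can mistake for an offset. */
int flags_match(uint32_t flags)
{
    return flags == FLAG_VALUE;
}
```

Calling flags_match(0x401001u) returns 1 and any other argument returns 0, regardless of whether a tool chooses to display the immediate as a number or as an address.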

The decompiler does not eat your branches

Some people claim that the decompiler lies: that it eats branches whenever it wants and is therefore not reliable. So instead they spend hours staring at endless listings of unoptimized, repetitive, compiler-bloated assembly. That’s wrong. The decompiler is your friend, and it exposes many ways for you to hint at how you want your pseudocode decompiled.

For example, observe this function:

random function

Now look at the disassembly:

disassembly

As you can see, there’s a cmp and a jz, yet the decompiler only shows the first branch. Not only that; looking at the graph overview we can see that the function has much more to it:

graph overview

So why is it that Hex-Rays doesn’t decompile that part of the code?

When decompiling the function for the first time, we see this warning:

decompiler warning

It is telling us that data references to that segment will be replaced with constant values, i.e. it will optimize the output by assuming that whatever is in .r_only will never change.

Now, back at the disassembly, we see:

.text:00401799                 cmp     ds:dword_40E000, 0
.text:004017A0                 jz      short loc_4017E6

There’s a reference to data there, dword_40E000. If we check where it is placed, we see it’s in .r_only:

.r_only:0040E000 ; Segment type: Pure data
.r_only:0040E000 ; Segment permissions: Read
.r_only:0040E000 _r_only         segment para public 'DATA' use32
.r_only:0040E000                 assume cs:_r_only
.r_only:0040E000                 ;org 40E000h
.r_only:0040E000 dword_40E000    dd 0                    ; DATA XREF: sub_401770+29

The decompiler assumes that the above disassembly is roughly equal to:

if (dword_40E000) {
	// ...
}

Since the value of dword_40E000 is 0 and its segment is recognized as read-only, the decompiler treats the above as dead code, equivalent to:

if (0) {
	// ...
}

In theory it does nothing, so the decompiler optimizes it away. To fix that, we can either mark the specific variable as volatile (the inverse[2] of a constant / const / read-only variable) or mark the entire segment as read-write.

volatile variable

(marking the variable volatile)

segment attributes

(changing segment attributes)
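The same reasoning can be sketched in plain C, since an optimizing C compiler is allowed to treat the two cases in a similar way. This is an analogy, not a description of Hex-Rays internals, and ro_flag, ro_flag_volatile, check_plain and check_volatile are hypothetical names standing in for dword_40E000 and the function above.

```c
/* Stand-in for dword_40E000: a zero value living in read-only data. */
static const int ro_flag = 0;

/* Same value, but volatile: every read has to be performed, so the
 * value cannot be assumed ahead of time. */
static const volatile int ro_flag_volatile = 0;

/* An optimizer may fold this down to "return 0;" because ro_flag is
 * a known constant that can never change. */
int check_plain(void)
{
    if (ro_flag) {
        return 1;
    }
    return 0;
}

/* Here the branch has to stay: reading a volatile object counts as
 * an observable access, so the test cannot be removed. */
int check_volatile(void)
{
    if (ro_flag_volatile) {
        return 1;
    }
    return 0;
}
```

Both functions return 0 at run time. The difference is only in what an optimizer is allowed to assume, which is the same lever we pull when marking the variable volatile or the segment read-write.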

That is not the only optimization, however. Since Hex-Rays is an optimizing decompiler, it will also remove junk code that has no effect on the program:

disasm

optimized pseudocode
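As a rough illustration of what "no effect on the program" means, here is a hypothetical C example (not the function from the screenshots, and all names are made up). The first function carries computations whose result is never used; the second is what remains once they are removed.

```c
#include <stdint.h>

/* Adds two numbers, padded with junk operations whose result is
 * never used. */
uint32_t add_with_junk(uint32_t a, uint32_t b)
{
    uint32_t junk = a;

    junk ^= 0x5A5A5A5Au;  /* dead: junk never reaches the result */
    junk += b;            /* dead */
    junk ^= 0x5A5A5A5Au;  /* dead */
    (void)junk;           /* silence the unused-variable warning */

    return a + b;
}

/* What is left after the dead computations are stripped. */
uint32_t add_clean(uint32_t a, uint32_t b)
{
    return a + b;
}
```

Both functions return the same value for every input, which is why an optimizing decompiler is free to show only the second form.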


That’s it for today. If you want me to talk about a specific topic, let me know in the comments.


If you want to leave a comment, please do it on Reddit or Hacker News.

  1. stolen from here

  2. volatile is not strictly the inverse of const, but in this case, it is