RecStudio Decompiler

 

Home Page

 

The REC decompiler was the first decompiler to approach decompilation from a compiler-independent and target-processor-independent point of view. Its design started 20 years ago, when compilers generated simplistic code that lent itself to deterministic translation back to the original C source. (If you're curious, I started the REC project after decompiling the Unix "rogue" role-playing game from PDP-11 assembly to K&R C and realizing that most code sequences could be translated to C automatically.)

Since those days, compiler technology has evolved tremendously, helped by ever more powerful processors, which in turn allow ever more aggressive optimization techniques. Consequently, REC required a significant redesign in order to keep up with current code generators.

Recognizing the problems that a decompiler needs to solve is a good first step towards trying to address them. In these pages you'll find the rationale guiding the design of a next generation C decompiler.

Various existing decompilers attempt to produce different levels of C source code, with varying degrees of success. The reverse engineering page has a good list with a comparison of the generated code.

The Boomerang open-source decompiler was the best attempt at a theoretically sound decompiler.
Unfortunately, it seems that development stopped after the departure of its two main developers.

Other tools can be used to understand how a binary program works:

  • using a system call tracer (e.g. strace(1) on Linux);
  • using a debugger;
  • using a disassembler.

A debugger can be invaluable for understanding how a program works, provided it has enough information to execute the program at the source level. This is obviously not the case for programs compiled without symbolic information (that is, almost all non-open-source programs), for programs compiled with high levels of optimization, and for programs that run in an environment where no development tools are available (e.g. programs for embedded systems that use processors that are no longer produced).

Assuming that a debugger cannot be used, a disassembler is the next best tool. However, most disassemblers lack the high-level analyses that help the engineer understand the program as a whole. Some advanced disassemblers, such as IDA Pro and Hex-View, do a very good job, but their assembly-centric view of the program limits how far they can go towards global program comprehension.

Here's where a decompiler can help.

Decompiler Goals

Much as when writing a compiler, one often faces conflicting goals when designing a decompiler, and it is not always possible to achieve them all at the same time.

The two main goals for using a decompiler are:

  1. To understand how the program works at a higher level than that allowed by assembly language;
  2. To be able to make changes to the program and recompile it, possibly on a different host environment.

These two goals are incompatible because most of the information in the original source code, which would have allowed both understanding and recompilation, is lost during compilation and cannot be recovered.

RecStudio will try to achieve (1) by presenting a high-level view of the program that is geared towards humans but is not acceptable to a compiler; alternatively, it will achieve (2) by using a much lower-level (but more precise) representation of the program that is suitable for recompilation but is not abstract enough for a human to understand.

This is akin to the use of optimization levels in a compiler: level zero produces inefficient code that can be easily debugged, while higher optimization levels achieve better code size and speed at the expense of the debugger's ability to follow the generated code.
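To make the contrast between the two kinds of output more concrete, here is a minimal sketch in C. It is not actual RecStudio output: the function names, the register-style variable names (r0, r1) and the labels are all hypothetical, chosen only for illustration. The first function shows what a human-oriented view of a small summing routine might look like; the second shows what a lower-level view of the same routine might look like, with one variable per machine register and explicit jumps in place of a structured loop. Unlike the human-oriented view described above, both forms here are kept as valid C so that they can be compared side by side.

```c
#include <stddef.h>

/* Hypothetical human-oriented view: structured loop, descriptive names. */
int sum_readable(const int *values, int count)
{
    int total = 0;
    for (int i = 0; i < count; i++) {
        total += values[i];
    }
    return total;
}

/* Hypothetical low-level view of the same routine: each variable stands
 * for a machine register, and control flow follows the branches of the
 * machine code through labels and gotos. */
int sum_lowlevel(const int *arg0, int arg1)
{
    int r0;   /* accumulator register (assumed name) */
    int r1;   /* loop counter register (assumed name) */

    r0 = 0;
    r1 = 0;
L_test:
    if (r1 >= arg1) goto L_done;
    /* explicit address arithmetic, as an indexed load would perform it */
    r0 = r0 + *(const int *)((const char *)arg0 + (size_t)r1 * sizeof(int));
    r1 = r1 + 1;
    goto L_test;
L_done:
    return r0;
}
```

Both functions return the same value for the same input (for the array {1, 2, 3, 4} with a count of 4, each returns 10), but only the first one tells the reader at a glance that the routine sums an array.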

Because a decompiler has to work on the output of a number of development tools, it must embody the knowledge needed to handle each of those tools (and more).

In particular, knowledge in the following areas is essential in the development of a decompiler:

  • object file formats;
  • assembly-level microprocessor programming;
  • code generation techniques used by compilers;
  • various levels of code optimizations;
  • symbol table design and implementation;
  • high-level programming languages, such as C and C++;
  • run-time environments (system calls, calling conventions, use of shared libraries).

Each of these areas will be described in detail in the following pages.

Next: Object File Formats