REC - Reverse Engineering Compiler User's Manual


Table of Contents

Starting REC
Interactive Mode
Command Files Syntax
Theory of Operation
Output examples
List of Options
    
  Starting REC
REC is invoked with the following command line syntax:
            rec [{+|-}optionname ...] exec_file
To activate an option, precede its name with a + (plus) sign. To disable an option, precede it with a - (minus) sign. To get the list of all the options and their current values, type:
            rec +help

The minimum input to REC is the binary executable file. For example:

            rec file.exe
If file.exe is in one of the recognized formats, it will be read, and a file.rec will be produced using the default options, without further intervention from the user.
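
The command line syntax above can be pictured with a short Python sketch. It is only an illustration, not REC's actual code: the function names split_command_line and default_output_name are hypothetical, and the sketch assumes that the output file name is formed by replacing the extension of the input file name.

```python
import os


def split_command_line(args):
    """Separate {+|-}option arguments from the input file name.

    Illustrative only; REC's real argument handling is not shown in this manual.
    """
    options = {}
    exec_file = None
    for arg in args:
        if len(arg) > 1 and arg[0] in "+-":
            # +name enables the option, -name disables it.
            options[arg[1:]] = (arg[0] == "+")
        else:
            exec_file = arg
    return options, exec_file


def default_output_name(exec_file):
    """file.exe -> file.rec (assumes the extension is simply replaced)."""
    root, _ext = os.path.splitext(exec_file)
    return root + ".rec"
```

With these helpers, split_command_line(["+hexconst", "file.exe"]) yields the option table and the name file.exe, from which default_output_name derives file.rec.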

REC can operate in three modes:

The other options are used to debug the program or to tune its output. Understanding the complete list of options requires some knowledge of the algorithms and phases that REC performs to transform an executable file into a source file. If you don't know the meaning of an option, you can experiment by enabling it and checking whether the output is clearer. Note that some options are only valid if another option is enabled.

The same set of options is available regardless of the host/target combination.

  Interactive Mode 

Interactive mode is used to analyze the program being decompiled. This mode is useful to access the hexadecimal viewer, and to inspect many of the internal lists maintained by REC, such as the strings list, the labels list, etc.

To use REC in interactive mode, the user must invoke it with the following command line:

rec +interactive file.exe

REC will start analyzing file.exe to find which areas contain strings, code and data. It will also build the lists of labels and branches, and then will try to build a list of the procedures contained in the program.

After this phase, the main menu will be presented:

Reverse Engineering Compiler 1.4 (C) Giampiero Caprino (Nov. 15 1998)

r : show regions
d : dump regions
l : show labels
b : show branches
j : show jump tables
s : show strings
y : show symbols
p : show procedures
o : show options
D : hexdump file
Q : quit program

REC's user interface is based on a simple list browser. The user can type the following keys while in the list browser:

Region List

The region list shows how the input file is organized. Structured file formats, like COFF and ELF, have separate areas for code, data and auxiliary information. The region list shows which areas REC will consider for decompilation (marked with the text type), and which areas will be searched for ASCII strings (marked with the data type).
The user can force REC to treat a file region as text or data via the region: command of a command file.

Labels List

The labels list shows all the addresses that are the destination of a branch or call instruction. This list is used when building the procedure list. If REC incorrectly treats a data area as a text area, it can create labels that are not part of any text region. This usually causes an incorrect procedure list. The user can then change the region list until all incorrect labels are eliminated.

Branch List

The branch list shows all the addresses that have a branch, call or return instruction. This list is used when building the procedure list. If REC incorrectly treats a data area as a text area, it can create branches whose destination is not part of any text region. This usually causes an incorrect procedure list. The user can then change the region list until all incorrect branches are eliminated.
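
The consistency check described for the labels and branch lists can be sketched in a few lines of Python. This is a hypothetical helper, not part of REC: it takes a set of label or branch destination addresses and reports those that fall outside every text region. The region boundaries are borrowed from the command file example later in this manual, and treating the end address as exclusive is an assumption of the sketch.

```python
def stray_addresses(addresses, regions):
    """Return the addresses that do not fall inside any text region.

    regions is a list of (start, end, kind) tuples; end is treated as
    exclusive, which is an assumption of this sketch.
    """
    text = [(start, end) for start, end, kind in regions if kind == "text"]
    return [a for a in sorted(addresses)
            if not any(start <= a < end for start, end in text)]


# Region boundaries borrowed from the command file example in this manual.
regions = [
    (0x80100000, 0x801009b4, "data"),
    (0x801009b4, 0x8010c1e8, "text"),
    (0x8010c1e8, 0x80120800, "data"),
]
labels = {0x80107fe0, 0x80108078, 0x80100010}
print([hex(a) for a in stray_addresses(labels, regions)])  # ['0x80100010']
```

An address reported by such a check suggests that the region list needs to be adjusted.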

Jump Table List

The jump table list shows all those areas that may contain a table of addresses inside a text region. These are usually generated when compiling switch() statements. It is important that REC recognizes these tables because the control flow analyzer depends on this data to identify all the instructions of a procedure, and also to avoid treating data bytes as instructions.
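
One way to picture how such tables might be found is to scan a text region for runs of consecutive words that all point back into the same region. The following Python sketch does this under several assumptions that are not taken from REC itself: 32-bit little-endian addresses, 4-byte alignment, and a minimum of three entries per table. The function name find_jump_tables is hypothetical, and REC's real heuristics may differ.

```python
import struct


def find_jump_tables(code, base, min_entries=3):
    """Return (address, entry_count) for each run of in-region pointers.

    code is the content of one text region and base is its load address.
    Assumes 32-bit little-endian, 4-byte aligned entries (not confirmed
    by the manual).
    """
    end = base + len(code)
    tables = []
    run_start = None
    count = 0
    for off in range(0, len(code) - 3, 4):
        (word,) = struct.unpack_from("<I", code, off)
        if base <= word < end:
            if count == 0:
                run_start = off
            count += 1
        else:
            if count >= min_entries:
                tables.append((base + run_start, count))
            count = 0
    # A table may extend to the very end of the region.
    if count >= min_entries:
        tables.append((base + run_start, count))
    return tables
```

A candidate found this way is only a guess: ordinary instructions can also look like addresses, which is why the manual speaks of areas that "may" contain a table.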

Strings List

The strings list shows those portions of the data regions that may contain ASCII strings. These strings will then be used as parameters to functions like printf() and strcpy(), among others.
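
The idea of scanning a data region for ASCII strings can be sketched as follows. This is not REC's implementation: the function name find_ascii_strings and the minimum length of four characters are assumptions, chosen to resemble the behavior of the Unix strings utility.

```python
import re


def find_ascii_strings(data, min_len=4):
    """Return (offset, text) for each run of printable ASCII bytes.

    min_len is an assumed threshold; shorter runs are ignored.
    """
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [(m.start(), m.group().decode("ascii"))
            for m in pattern.finditer(data)]


data = b"\x01\x02Hello, %s\n\x00\x00ab\x00usage: rec\x00"
print(find_ascii_strings(data))  # [(2, 'Hello, %s'), (17, 'usage: rec')]
```

The offsets returned are relative to the start of the scanned bytes; adding the start address of the region would give the address of each string.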

Symbols List

This list shows every symbolic name associated with addresses. These are usually names of procedures (belonging to a text region) or names of global variables (belonging to a data region). The symbol names and addresses are taken from the file's symbol table, if available. The symbol list also shows the list of imported symbols (from a types: or prototype file), and the list of user specified symbols (entered via the symbol: command in a .cmd file).

Procedure List

The procedure list shows all the addresses where REC has identified a user procedure. Some of these addresses may come from the Symbols List, in which case the name of the procedure is also shown. For static functions and for files without a symbol table, the entry point of the procedure is used as its name.

Options List

The option list allows the user to enable or disable each option. Some options are used to produce a better output, some to enable alternative analysis algorithms, and some enable internal debugging features.

Hexdump Viewer

The hexdump viewer shows the content of the input file in hexadecimal, one page at a time. The usual cursor movement characters can be used to navigate through the dump. This mode is very useful to look at areas that REC has not recognized as code or data.
  Theory of Operation
The following block diagram shows REC's interaction with the files it uses/produces:


The minimum input to REC is the binary executable file. For example:

                rec file.exe
If file.exe is in one of the recognized formats, it will be read, and a file.rec will be produced using the default options, without further intervention from the user.

However, since decompilation is a very difficult process, the more additional information the user provides, the better the output.

For example, alternative algorithms can be selected based on the compiler used to build the executable file, or based on readability or output preferences. To change any of the default options, REC reads the file .recrc (rec.cfg on MS-DOS and Windows). Each line in this file contains one option, written as it would be entered on the command line. For example, if you always want REC to start in interactive mode and to print numeric constants in hexadecimal, use the following lines in the .recrc file:

                +interactive
                +hexconst
These options can be overridden by command line options. For example, to run REC in batch mode even though the .recrc has a +interactive option, invoke REC with the following command line:
                rec -interactive file.exe
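
The override order described above (defaults first, then the .recrc file, then the command line) can be sketched in Python. The names apply_options and effective_options are hypothetical, the default values shown are placeholders rather than REC's real defaults, and skipping blank lines in the configuration file is an assumption.

```python
def apply_options(current, tokens):
    """Apply +name / -name tokens in order; later tokens win."""
    result = dict(current)
    for tok in tokens:
        if len(tok) < 2 or tok[0] not in "+-":
            raise ValueError(f"not an option: {tok!r}")
        result[tok[1:]] = (tok[0] == "+")
    return result


def effective_options(defaults, recrc_lines, command_line_options):
    """Defaults, then .recrc lines, then command line options."""
    # Blank lines in the configuration file are skipped (an assumption).
    recrc = [line.strip() for line in recrc_lines if line.strip()]
    options = apply_options(defaults, recrc)
    return apply_options(options, command_line_options)


# Placeholder defaults; REC's real default values are not listed here.
defaults = {"interactive": False, "hexconst": False}
print(effective_options(defaults, ["+interactive", "", "+hexconst"],
                        ["-interactive"]))
# {'interactive': False, 'hexconst': True}
```

The command line wins for interactive, while hexconst keeps the value set in the configuration file.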

A type file tells REC the names and declarations of high-level objects, like structs, unions, arrays and functions. By providing a type file, the user can improve the readability of the generated output, because variables will have symbolic names.

This is particularly useful to specify the name, type and number of function parameters. A number of type files for several Linux and Windows system calls are available from the download page. To use the prototype files, either specify them using the types: command of a .cmd file, or add their pathnames to the proto.lst file and put this file in the directory REC is run from.

   Command Files and handling unrecognized formats
The input file could be in a format not yet recognized by REC. In this case, REC has no knowledge of which areas of the file contain data, which contain code and which contain auxiliary information. This information can be given to REC in an ASCII file, called a command file. A command file can also provide much more information, including predefined types, addresses of functions and configuration options. For example, REC could be invoked with the following command line:
            rec file.cmd
where file.cmd has the following content:
        #!wrec

        option: +hexconst
        types: string.o
        types: stdio.o

        file: file.exe 0x50 0x53

        region: 0x80100000 0x801009b4 0x800 data
        region: 0x801009b4 0x8010c1e8 0x11b4 text
        region: 0x8010c1e8 0x80120800 0xc888 data

        symbol: 0x80107fe0, 0x80108077 T CrearImage()
        symbol: 0x80108078, 0x801080d7 T LoadImage(char *, int, int)
        symbol: 0x801080d8, 0x8010813b T StoreImage()
        symbol: 0x8010813c, 0x801081ff T MoveImage(char *, int, int)

        patterns: libmips.pat
The file starts with a magic ID, #!wrec, which must be on the first line. Each subsequent line contains one command, followed by a colon (:) and some arguments. Comments start with a '#' character; the remainder of the line after the '#' is ignored.
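
The line structure of a command file can be illustrated with a small Python reader. It only follows the rules stated in this manual (magic ID on the first line, command, colon, arguments, '#' comments); the function name parse_command_file is hypothetical, and the real parser in REC may accept or reject inputs differently.

```python
def parse_command_file(text):
    """Return a list of (command, arguments) pairs from a command file."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "#!wrec":
        raise ValueError("command file must start with #!wrec")
    commands = []
    for raw in lines[1:]:
        # Everything after '#' is a comment.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        name, colon, args = line.partition(":")
        if not colon:
            raise ValueError(f"missing ':' in line: {raw!r}")
        commands.append((name.strip(), args.strip()))
    return commands


example = """#!wrec
option: +hexconst      # print constants in hexadecimal
file: file.exe 0x50 0x53
region: 0x80100000 0x801009b4 0x800 data
"""
for command, args in parse_command_file(example):
    print(command, "->", args)
```

The arguments are kept as plain text here; each command would then interpret its own arguments, as described in the following paragraphs.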

Each option: command sets one of REC's options. These options override those provided on the command line.

The types: commands specify one or more ELF files with STAB symbolic information. These files are read to get predefined types and function prototypes. To create a types file, you can simply use Linux's system compiler (or gcc on a Solaris system) with the -g option. For example, to let REC know the types of the functions declared in the string.h header file, you can compile the following C source with the command line "gcc -g -c string.c":

        /* string.c - types defined by string.h */
        #include <stddef.h>   /* size_t */

        /* Empty bodies: only the prototypes matter, the code is ignored. */
        int strcmp(const char *s1, const char *s2) { }
        int strncmp(const char *s1, const char *s2, size_t len) { }
        char *strcpy(char *dst, const char *src) { }
        char *strchr(const char *s, int ch) { }
        ....
REC will add the prototype information to the symbols specified by the symbol: commands or to those found by the patterns: command. The actual code for the compiled functions is ignored, as well as their addresses. Note that the compiler will not generate symbolic information for functions that are not defined in the file, hence the { } at the end of each function.

The file: command specifies the binary file to be loaded. There should be only one file: command. After the file name, the magic argument specifies an optional identifier that must be present at the beginning of the file (magic number).
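
A possible reading of the magic argument is sketched below. The sketch assumes that the values after the file name are individual byte values in hexadecimal that must match the first bytes of the file; this interpretation is an assumption, and the names parse_file_command and has_magic are hypothetical. In the example above, 0x50 and 0x53 are the ASCII codes of the letters P and S.

```python
def parse_file_command(args):
    """Split 'file.exe 0x50 0x53' into a name and the magic bytes.

    Assumes each magic value is one byte written in hexadecimal.
    """
    name, *magic = args.split()
    return name, bytes(int(token, 16) for token in magic)


def has_magic(header, magic):
    """True if the file content starts with the magic bytes.

    An empty magic always matches, since the argument is optional.
    """
    return header.startswith(magic)


name, magic = parse_file_command("file.exe 0x50 0x53")
print(name, magic)  # file.exe b'PS'
```

A file whose first bytes differ from the magic would then be rejected before any region is examined.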

The region: commands specify the layout of the binary file. The arguments are the start and end memory addresses at which the code and data will be loaded, and the file offset where the region starts. Note that no actual loading occurs; the addresses are only used for informational purposes (they must be correct for call statements to be meaningful). The last argument is the region type, which determines the operation performed on the content of the region. Only text regions are considered for decompilation. Data regions are scanned to find ASCII strings and generic pointers.
In the example:

region: 0x80100000     0x801009b4     0x800          data
        start addr     end addr       region offset  type
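
The relationship between memory addresses and file offsets implied by the region: arguments can be sketched as follows. The Region tuple and the function address_to_offset are hypothetical names, and the end address is treated as exclusive, an assumption suggested by the fact that adjacent regions in the example share their boundary address.

```python
from collections import namedtuple

Region = namedtuple("Region", "start end offset kind")

# The three regions of the example command file.
EXAMPLE_REGIONS = [
    Region(0x80100000, 0x801009b4, 0x800, "data"),
    Region(0x801009b4, 0x8010c1e8, 0x11b4, "text"),
    Region(0x8010c1e8, 0x80120800, 0xc888, "data"),
]


def address_to_offset(regions, addr):
    """Map a memory address to (file offset, region type), or None."""
    for region in regions:
        if region.start <= addr < region.end:
            return region.offset + (addr - region.start), region.kind
    return None


offset, kind = address_to_offset(EXAMPLE_REGIONS, 0x80107fe0)
print(hex(offset), kind)  # 0x87e0 text
```

Here the address 0x80107fe0 lies 0x762c bytes into the text region, so it corresponds to file offset 0x11b4 + 0x762c = 0x87e0.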
The symbol: commands specify the starting and ending addresses of functions, along with a symbolic name and possibly a list of parameters for the function. The ending address is optional, and can be computed by REC automatically (see later). The ANSI-C style prototype is also optional, and its use is actually discouraged, as types should be defined in a type file (see the types: command above). It is better to simply specify that the symbol is a function by adding ( ).

The patterns: commands specify one or more files containing a list of hex strings (patterns) and symbolic names. REC will search the executable file for each pattern and, when one is found, assign the symbolic name associated with the pattern to the address where the pattern begins. The following is an example of a pattern file:

            open() size: 16
            A0 00 0A 24 08 00 40 01
            00 00 09 24 00 00 00 00
            ;
            lseek() size: 16
            A0 00 0A 24 08 00 40 01 01 00 09 24 00 00 00 00
            ;
            ...
Each pattern can be up to 256 bytes long. These patterns are sometimes called signatures in the literature. The size: option tells REC how many bytes the function occupies in the binary file. For example, you can specify a 16-byte pattern for a 3000-byte function.
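
The pattern search can be pictured with the following Python sketch, which reads a pattern file in the layout shown above and looks for each pattern in a block of bytes. It is a simplified illustration, not REC's code: the names parse_patterns and find_patterns are hypothetical, every entry is assumed to carry a size: option, and the search reports file offsets rather than memory addresses.

```python
def parse_patterns(text):
    """Return (name, size, pattern_bytes) for each ';'-terminated entry."""
    patterns = []
    for block in text.split(";"):
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        if not lines:
            continue
        header, hex_lines = lines[0], lines[1:]
        # Header layout assumed from the example: "name() size: N".
        name, _, size_text = header.partition("size:")
        data = bytes.fromhex(" ".join(hex_lines))
        patterns.append((name.strip(), int(size_text), data))
    return patterns


def find_patterns(image, patterns):
    """Return sorted (offset, name, size) for every match in image."""
    hits = []
    for name, size, data in patterns:
        pos = image.find(data)
        while pos != -1:
            hits.append((pos, name, size))
            pos = image.find(data, pos + 1)
    return sorted(hits)
```

Each hit gives the place where a known function begins, together with the size to attribute to it; converting the offset to a memory address would use the region information from the command file.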
  Output Examples
When the end of the command file is reached, and/or when REC has finished analyzing the executable file, it will either enter interactive mode, or it will process the entire executable file. Currently there can be two types of output:
  1. If the +disasmonly option was specified, a file with the .dis extension will be produced. In this file, every region with the text attribute will be disassembled, and every region with the data attribute will be hexdumped.
  2. Without any option, a file with the .rec extension will be produced with a C-like representation of each procedure in each text section. The C-like representation is not perfect, and cannot be fed to a compiler to recreate the original binary. Its goal is to provide the user a better understanding of the structure of the program. The following is an example of the C-like output:
    hexdump(char * fname)
    {
    	unsigned char  buff[16];
    	unsigned long  offset;
    	struct _IO_FILE* fp;
    	struct stat st;
    	int cnt;
    
        if(stat(fname, & st) != 0) {
            fp = fopen(fname, "rb");
            if(fp != 0) {
                offset = 0;
    L08048867:
                if(st.st_size > offset) {
                    cnt = fread( & buff, 1, 16, fp);
                    if(cnt != 0) {
                        dumpline( & buff, offset, cnt);
                        offset = offset + cnt;
                        goto L08048867;
                    }
                } else {
                }
                fclose(fp);
                eax = 0;
            } else {
                perror(fname);
                eax = 1;
            }
        } else {
            perror(fname);
            eax = 1;
        }
    }
    
    

Additional output files may be produced if any of the debugging options are enabled. These files contain the intermediate representation of the decompiled file at different stages of the decompilation process.

  Options List
The following is a list of all the options supported by REC. The options are presented in hierarchical order. This means that some options are meaningful only if the parent option has been enabled. I might add more options as I add other features.

TODO List:

Things that I still need to add (I'm working on them in my spare time):

Copyright © 1997 - 2007 Backer Street Software -- All rights reserved.

Last revised on 10 Mar. 1999
