Decompiler Design - Saved Registers


  Beginners
  Introduction
  Forward tools
  Reverse tools
  Simple decompiler
  Problems
 
  Intermediate
  Architecture
  Object formats
  Code vs. data
  Basic blocks
  Control flow graph
  Loops
  Statements
  Data Flow
  Expression propagation
  The stack
  Stack frames
  Frameless Functions
  Saved Registers
  Recovering Types
 
  Advanced
  Advanced topics
  Switch statements
  Code vs. Data 2
  Call Optimizations
 

Prev: Frame-less Functions


Saved Registers

The final part of the stack frame analysis involves the detection of saved registers.

Saved registers are those registers that are mandated by a certain calling convention to be preserved across function calls. This concept is very important for a compiler (and thus for a decompiler) to be able to allocate variables to registers and to avoid having to save to memory those registers whenever there is a call to a function that could have been generated by another compiler.

It is worth noting that different calling conventions define different sets of registers to be preserved across function calls. This is because as long as both sides, the calling side and called function side, agree on which information is to be preserved, then the compiler can generate the most efficient code for that particular call.

In the context of a decompiler, we must detect which registers are preserved by each function, so that we can exactly compute the liveness information for each register.

The basic code to save registers is generated by compilers as part of the function's prologue after having set up the stack frame. Correspondingly, each register is restored in the function's epilogue, before de-allocating the stack frame. For example:

Framed functions              Frame-less functions

   PUSH FP
   FP = SP
   SP -= #local_size            SP -= #local_size
   PUSH R1                      PUSH R1
   PUSH R2                      PUSH R2
   ...                          ...
   POP  R2                      POP R2
   POP  R1                      POP R1
   SP = FP                      SP += #local_size
   RET                          RET

How does saved registers information affect the compression of expressions? Consider the following code sequence:

   R1 = 1
   R2 = 2            R2 = 2
   PUSH R1
   CALL FUNC         FUNC(R1)           FUNC(1)
   ADD SP, #4
   M1400 = R2        M1400 = R2         M1400 = 2?

How can we be sure that the value 2 is still in register R2 after the function call? Only if we know that FUNC() does not use nor change R2 then we can collapse the R2 = 2 and the M1400 = R2 into a single statement. If FUNC() uses R2 as one of its input values, then we cannot remove the R2 = 2 assignment. If FUNC() changes R2, it's possible that R2 is actually an output of the function, and we may actually have to translate the call into M1400 = FUNC(R1).

Collapsing the R2 = 2 and M1400 = R2 can be done using liveness information, but only if the liveness information is accurate for FUNC(). That means that we need to translate FUNC() before we translate the function that calls FUNC(). How do we decide the order to analyze functions? We can use a proven technique, that is, computing the depth-first order (DFO) for each function, and then visiting the functions with the highest DFO id first.

The following algorithm computes the DFO id for each function:

   id = procList.length
   computeDFO(entryFunction)
   qsort(procList, by_DFO_id)
   for each function in procList in reverse order
      computeInputOutputRegisters(function)

computeDFO(function)
{
    int succ;

    function.visited = true
    for each destFunc in function.callsTo
        if destFunc.visited
            continue
        computeDFO(destFunc)
    }
    function.dfoID = id
    --id

After this algorithm, each function will have a DFO id, with leaf functions (functions that don't call any other function) with higher ids, while functions closer to the entry point will have lower ids.

We can then analyze functions with the highest DFO id. Those will initially be leaf functions. Since they won't have any function call, we can precisely compute the set of input registers, output registers and saved registers. As we continue to analyze functions with lower DFO ids, we can be sure that all the functions they call will have been already analyzed, thus we can be sure of which registers are alive at any given point in the program.



Next: Recovering Types © 2009 Giampiero Caprino, Backer Street Software