RecStudio Decompiler Design

In the previous page we noticed that we can convert particular configurations of basic blocks into one the "if-then-(else)" conditional statement. All "if" statements have forward jumps, that is the destination of the jump(s) is some code that has not been executed, yet.

Compilers use conditional statements to control another high-level construct: these are statements that implement loops. In this case, the destination of the jump is code that has already been executed at least once.

We could simply leave the conditional statements as "if-goto" pairs, as in the pictures on the previous page, since that's what the processor really executes, (processors don't know about loops except when using very specific instructions, such as the REP prefix in x86), but that would not make clear what the code does, as the reader would have to reconstruct the loop by checking the destination of the goto. So we'll try to discover the loops from the configuration of the basic blocks.

Every loop can be described with a "do-while" statement.

The "while" loop (pre-tested loop) is a special case of a "do-while" loop, where the bottom condition is always true, and the first statement of the loop is an "if" that jumps out of the loop.

The "for" loop is a special case of the "while" loop.

Real do-while loop

while loop

for loop

Lhead:
   do {
      statements;
Lcontinue:
   } while(cond);
Lbreak:

Lhead:
   do {
Lcontinue:
      if(cond)
         goto Lbreak;
      statements;
   } while(true);
Lbreak:

Lhead:
   var = init_expr;
   if(cond)
      goto Lbreak;
   do {
      statements;
Lcontinue:
      var += incr_expr;
   } while(cond);

To identify loops we need to know where the loop is entered, so that we can place the "do" statement there, and where the loop ends, so that all statements belonging to the loop can be placed inside the "do" statement.

The entry point of a loop is in what is called the "header" block of the loop. All execution paths must go through the header block to enter the loop (i.e. we don't consider loops with multiple entry points, although these are possible).

In order to find loops, we first compute the set of "dominators" for each block in the control flow graph.

We're not going to describe the theoretical foundation of dominators. Plenty of resources are available on the Internet.

Instead, we're showing the practical implementation of the algorithm that finds the set of dominators for each block.

First we use a bit vector where each bit corresponds to a basic block in the control flow graph.
We associate one such bit vector to each basic block.

Then we use bit-wise and and or operations to reduce the number of bits that are set in each vector, until there are no more changes.

Here's the algorithm:

ComputeDominators()
{
   int nBlocks = block_list.size();
   int i = 0;
 
   for each block in block_list {
      block->id = i++;
      block->dominators = new BitVector(nBlocks)
      block->dominators->SetAllBitsTo(1)
   }
 
   block = block_list[0]
   block->dominators->SetAllBitsTo(0)
   block->dominators->SetBit(block->id)
 
   BitVector T(nBlocks)
 
   do {
      changed = false
 
      for each block in block_list {
 
        if (block == entry_block)
            continue
 
        for each pred in block->predecessors {
            T.SetAllBitsTo(0)
            T.Or(block->dominators)
            block->dominators->And(pred->dominators)
            block->dominators->SetBit(block->id)
            if (block->dominators != T)
                changed = true
 
        }
 
      }
 
   } while (changed);
}

Based on the dominators information we can compute the set of basic blocks that belong to each loop using the following two functions:

NaturalLoopForEdge(Block header, Block tail)
{
    Stack workList;
    Loop  loop;
 
    loop = new Loop();
 
    loop->header = header
    loop->blocks += header
 
    if header != tail {
        loop->blocks += tail
        workList += tail
    }
    while workList not empty {
        block = workList.pop
        for each pred in block->predecessors {
            if not loop->find(pred) {
                loop->blocks += pred
                workList += pred
            }
        }
    }
    return loop
}
 
ComputeNaturalLoops()
{
    LoopSet   loopSet;
 
    for each block in block_list {
        if (block == entry_block)
            continue

        for each succ in block->successors {
 
            // Every successor that dominates its predecessor
            // must be the header of a loop.
            // That is, block -> succ is a back edge.
 
            if block->dominators->find(succ)
                loopSet += NaturalLoopForEdge(succ, block);
 
        }
    }
}

The algorithm work like this: first we find the back edges, that is places where execution goes back to a previously visited block. This is done by checking if the successor of a block dominates the block itself. The successor block will be the header of the loop.

Then, since to reach the (tail) block execution must go through its dominator, we can traverse the predecessors of each block starting from the tail, until we reach the header block (which is always in the loop since we put it there at the beginning of NaturalLoopForEdge() ).

Next: Creating Statements

Decompiler Design - Loop Analysis

Loop Analysis