Disciple of Dawn


@online{identifying-loops-whMcPherson2018,
    author={Jack McPherson},
    date={2018-02-02},
    title={Identifying Loops While Reverse Engineering},
    url={https://jmcph4.github.io/2018/02/02/identifying-loops-while-reverse-engineering.html},
    urldate={2020-05-16},
}
[Translate] [Related]

Identifying Loops While Reverse Engineering

by Jack McPherson | 2018-02-02 00:00:00 +1000

Introduction

Often when reverse engineering a binary, it's useful to have some patterns to help identify higher-level code constructs. This post goes through the general structure of various loops in a higher-level language (C) and their corresponding assembly format.

for Loops

for loops are probably the most common loop in use today. As you're probably aware, for loops are normally used to generate a sequence or to iterate over some container. They look like this:

for(i=A;i<B;i+=C)
{
    /* ... */
}

Admittedly, for loops usually just increment their loop counter, but we can be even more general. The corresponding machine code for this code snippet would look something like this:

mov dword [rbp - 4], A
jmp 5
; ...
add dword [rbp - 4], C
cmp dword [rbp - 4], B-1
jbe 1

For the snippet above, I've used line numbers for the jumps on lines 2 and 6, but in practice they would be the corresponding addresses in memory for these instructions.

Let's look at a concrete example. Consider the following C program:

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    for(unsigned int i=0;i<3;i++)
    {
        printf("%d", i);
    }
    
    return EXIT_SUCCESS;
}

I'm using my 64-bit Debian machine, so I've included my uname output for reference. I'm compiling using GCC and using objdump to get the disassembly of our executable.

$ uname --all
Linux devlaptop 4.9.0-4-amd64 #1 SMP Debian 4.9.65-3+deb9u1 (2017-12-23) x86_64 GNU/Linux
$ gcc for.c -Wall -Wextra -Wshadow -pedantic -std=c11 -o for
$ objdump -d -Mintel for

Let's just look at the disassembly of main (I've included my own comments on relevant lines):

00000000000006b0 
: 6b0: 55 push rbp 6b1: 48 89 e5 mov rbp,rsp 6b4: 48 83 ec 10 sub rsp,0x10 6b8: c7 45 fc 00 00 00 00 mov DWORD PTR [rbp-0x4],0x0 ; initialise loop counter to zero 6bf: eb 1a jmp 6db <main+0x2b> ; jump to loop condition check 6c1: 8b 45 fc mov eax,DWORD PTR [rbp-0x4] 6c4: 89 c6 mov esi,eax 6c6: 48 8d 3d a7 00 00 00 lea rdi,[rip+0xa7] # 774 <_IO_stdin_used+0x4> 6cd: b8 00 00 00 00 mov eax,0x0 6d2: e8 89 fe ff ff call 560 6d7: 83 45 fc 01 add DWORD PTR [rbp-0x4],0x1 ; increment loop counter by 1 6db: 83 7d fc 03 cmp DWORD PTR [rbp-0x4],0x2 ; check if loop counter has reached 2 (terminating condition) 6df: 76 e0 jbe 6c1 <main+0x11> ; jump based on the result of this comparison 6e1: b8 00 00 00 00 mov eax,0x0 6e6: c9 leave 6e7: c3 ret 6e8: 0f 1f 84 00 00 00 00 nop DWORD PTR [rax+rax*1+0x0] 6ef: 00

The first three instructions are the function prologue for main. The fourth instruction initialises our loop counter variable (i) to zero, and the fifth instruction performs a jump to our loop condition check (the 12th instruction). The code at 0x6c1-0x0x6d2 simply sets up the parameters to printf and calls it. At 0x6d7 our loop counter is incremented (in our case, by one). The next instruction after that checks whether our loop counter has reached 2 yet and the instruction after that jumps based off of the result of this comparison.

while Loops

What about while loops? The analagous while loop version of our for loop template above looks like this:

i = A;
while(i < B)
{
    /* ... */
    i += C;
}
If we rewrite our toy program (in for.c) to use a while loop instead of a for loop and recompile, we can check the resultant machine code in the same way we did previously.
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    unsigned int i = 0;
    
    while(i < 3)
    {
        printf("%d", i);
    }

    return EXIT_SUCCESS;
}

Our resulting disassembly is as follows:

00000000000006b0 <main>:
 6b0:   55                      push   rbp
 6b1:   48 89 e5                mov    rbp,rsp
 6b4:   48 83 ec 10             sub    rsp,0x10
 6b8:   c7 45 fc 00 00 00 00    mov    DWORD PTR [rbp-0x4],0x0
 6bf:   eb 1a                   jmp    6db <main+0x2b>
 6c1:   8b 45 fc                mov    eax,DWORD PTR [rbp-0x4]
 6c4:   89 c6                   mov    esi,eax
 6c6:   48 8d 3d a7 00 00 00    lea    rdi,[rip+0xa7]        # 774 <_IO_stdin_used+0x4>
 6cd:   b8 00 00 00 00          mov    eax,0x0
 6d2:   e8 89 fe ff ff          call   560 
 6d7:   83 45 fc 01             add    DWORD PTR [rbp-0x4],0x1
 6db:   83 7d fc 03             cmp    DWORD PTR [rbp-0x4],0x3
 6df:   76 e0                   jbe    6c1 <main+0x11>
 6e1:   b8 00 00 00 00          mov    eax,0x0
 6e6:   c9                      leave  
 6e7:   c3                      ret    
 6e8:   0f 1f 84 00 00 00 00    nop    DWORD PTR [rax+rax*1+0x0]
 6ef:   00

Exactly the same! Even diff agrees.

Conclusion

Ultimately, both types of loops essentially boil down to jumping, comparing, and jumping based on the comparison (after all, that's what looping is). While these patterns aren't hard to spot by hand, tools such as radare2 provide helpful ASCII art arrows or other visual aids to show control flow in general, not just for loops. As I myself am still new to reverse engineering, I find it helpful to have such patterns in mind to help provide structure to what can seem like an endless pool of machine code.

If this article was interesting to you and you'd like to apply these techniques to some more complicated binaries, check out this project of mine.