@online{identifying-loops-whMcPherson2018, author={Jack McPherson}, date={2018-02-02}, title={Identifying Loops While Reverse Engineering}, url={https://jmcph4.github.io/2018/02/02/identifying-loops-while-reverse-engineering.html}, urldate={2020-05-16}, }[Translate] [Related]
Identifying Loops While Reverse Engineering
| 2018-02-02 00:00:00 +1000Introduction
Often when reverse engineering a binary, it's useful to have some patterns to help identify higher-level code constructs. This post goes through the general structure of various loops in a higher-level language (C) and their corresponding assembly format.
for
Loops
for
loops are probably the most common loop in use today. As you're probably aware, for
loops are normally used to generate a sequence or to iterate over some container. They look like this:
for(i=A;i<B;i+=C) { /* ... */ }
Admittedly, for
loops usually just increment their loop counter, but we can be even more general. The corresponding machine code for this code snippet would look something like this:
mov dword [rbp - 4], A jmp 5 ; ... add dword [rbp - 4], C cmp dword [rbp - 4], B-1 jbe 1
For the snippet above, I've used line numbers for the jumps on lines 2 and 6, but in practice they would be the corresponding addresses in memory for these instructions.
Let's look at a concrete example. Consider the following C program:
#include <stdio.h> #include <stdlib.h> int main(void) { for(unsigned int i=0;i<3;i++) { printf("%d", i); } return EXIT_SUCCESS; }
I'm using my 64-bit Debian machine, so I've included my uname
output for reference. I'm compiling using GCC and using objdump
to get the disassembly of our executable.
$ uname --all Linux devlaptop 4.9.0-4-amd64 #1 SMP Debian 4.9.65-3+deb9u1 (2017-12-23) x86_64 GNU/Linux $ gcc for.c -Wall -Wextra -Wshadow -pedantic -std=c11 -o for $ objdump -d -Mintel for
Let's just look at the disassembly of main
(I've included my own comments on relevant lines):
00000000000006b0: 6b0: 55 push rbp 6b1: 48 89 e5 mov rbp,rsp 6b4: 48 83 ec 10 sub rsp,0x10 6b8: c7 45 fc 00 00 00 00 mov DWORD PTR [rbp-0x4],0x0 ; initialise loop counter to zero 6bf: eb 1a jmp 6db <main+0x2b> ; jump to loop condition check 6c1: 8b 45 fc mov eax,DWORD PTR [rbp-0x4] 6c4: 89 c6 mov esi,eax 6c6: 48 8d 3d a7 00 00 00 lea rdi,[rip+0xa7] # 774 <_IO_stdin_used+0x4> 6cd: b8 00 00 00 00 mov eax,0x0 6d2: e8 89 fe ff ff call 560 6d7: 83 45 fc 01 add DWORD PTR [rbp-0x4],0x1 ; increment loop counter by 1 6db: 83 7d fc 03 cmp DWORD PTR [rbp-0x4],0x2 ; check if loop counter has reached 2 (terminating condition) 6df: 76 e0 jbe 6c1 <main+0x11> ; jump based on the result of this comparison 6e1: b8 00 00 00 00 mov eax,0x0 6e6: c9 leave 6e7: c3 ret 6e8: 0f 1f 84 00 00 00 00 nop DWORD PTR [rax+rax*1+0x0] 6ef: 00
The first three instructions are the function prologue for main
. The fourth instruction initialises our loop counter variable (i
) to zero, and the fifth instruction performs a jump to our loop condition check (the 12th instruction). The code at 0x6c1-0x0x6d2 simply sets up the parameters to printf
and calls it. At 0x6d7 our loop counter is incremented (in our case, by one). The next instruction after that checks whether our loop counter has reached 2 yet and the instruction after that jumps based off of the result of this comparison.
while
Loops
What about while
loops? The analagous while
loop version of our for
loop template above looks like this:
i = A; while(i < B) { /* ... */ i += C; }If we rewrite our toy program (in
for.c
) to use a while
loop instead of a for
loop and recompile, we can check the resultant machine code in the same way we did previously.
#include <stdio.h> #include <stdlib.h> int main(void) { unsigned int i = 0; while(i < 3) { printf("%d", i); } return EXIT_SUCCESS; }
Our resulting disassembly is as follows:
00000000000006b0 <main>: 6b0: 55 push rbp 6b1: 48 89 e5 mov rbp,rsp 6b4: 48 83 ec 10 sub rsp,0x10 6b8: c7 45 fc 00 00 00 00 mov DWORD PTR [rbp-0x4],0x0 6bf: eb 1a jmp 6db <main+0x2b> 6c1: 8b 45 fc mov eax,DWORD PTR [rbp-0x4] 6c4: 89 c6 mov esi,eax 6c6: 48 8d 3d a7 00 00 00 lea rdi,[rip+0xa7] # 774 <_IO_stdin_used+0x4> 6cd: b8 00 00 00 00 mov eax,0x0 6d2: e8 89 fe ff ff call 5606d7: 83 45 fc 01 add DWORD PTR [rbp-0x4],0x1 6db: 83 7d fc 03 cmp DWORD PTR [rbp-0x4],0x3 6df: 76 e0 jbe 6c1 <main+0x11> 6e1: b8 00 00 00 00 mov eax,0x0 6e6: c9 leave 6e7: c3 ret 6e8: 0f 1f 84 00 00 00 00 nop DWORD PTR [rax+rax*1+0x0] 6ef: 00
Exactly the same! Even diff
agrees.
Conclusion
Ultimately, both types of loops essentially boil down to jumping, comparing, and jumping based on the comparison (after all, that's what looping is). While these patterns aren't hard to spot by hand, tools such as radare2 provide helpful ASCII art arrows or other visual aids to show control flow in general, not just for loops. As I myself am still new to reverse engineering, I find it helpful to have such patterns in mind to help provide structure to what can seem like an endless pool of machine code.
If this article was interesting to you and you'd like to apply these techniques to some more complicated binaries, check out this project of mine.