@online{writing-a-simple-fuzMcPherson2018, author={Jack McPherson}, date={2018-01-19}, title={Writing a Simple Fuzzer in Python}, url={https://jmcph4.github.io/2018/01/19/writing-a-simple-fuzzer-in-python.html}, urldate={2020-05-16}, }
Writing a Simple Fuzzer in Python
2018-01-19 00:00:00 +1000
I have had an interest in fuzzing for quite some time, and decided it was time to start writing some of my own (very basic) fuzzing tools. In this post, we'll step through some of the basic things we might expect from a fuzzer and how we might achieve them using some of the code I have written.
Introduction
Firstly, we need to grab our dependencies. For this tutorial, we'll be using two projects of mine, MPH and Fuzzbang (both are discussed later in the tutorial):
$> git clone https://github.com/jmcph4/mph.git
$> git clone https://github.com/jmcph4/fuzzbang.git
Secondly, create a directory for our project and cd into it:
$> mkdir fuzztut
$> cd fuzztut
Thirdly, copy the actual Python packages into our project directory:
$> cp -r ../mph/mph mph
$> cp -r ../fuzzbang/fuzzbang fuzzbang
Our project directory should now look like this:
$> ls
fuzzbang mph
We're now ready to commence development!
The Target
While both of the main packages this post centres around are designed to be as generic and adaptable as possible, a concrete example is often easier to learn from. So, let's create our fuzzing target, name:
All name does is prompt the user for their name via stdout, accept an arbitrarily long input from the user on stdin, and finally greet the user on stdout with their supplied name. Because that input is stored in a fixed-size buffer, this is a textbook example of a buffer overflow (those familiar with the concept probably already saw this). To compile name, simply run:
$> gcc name.c -Wall -Wextra -Wshadow -pedantic -std=c11 -g3 -o name
Our project now looks like this:
$> ls
fuzzbang mph name.c name
Talking to the Target
Now that we have our target program, we need some way to talk to it from our Python code. Python 3 has the subprocess module for this purpose, but we will use a simple wrapper around it: MPH. MPH is a small Python package that wraps the parts of subprocess relevant to our purposes. It allows a Python programmer to execute a program with various inputs and capture the output (not just stdout and stderr, but also the return code and any signals received). From the example in the MPH README:
from mph import program

prog = program.Program("/path/to/myprog", []) # initialise program
prog.append_string_stdin("Hello, world!") # write to stdin
prog.exec() # run program

# check return value of guest executable
if prog.retval == 0:
    print(prog.stdout)
else:
    print("Inferior returned with return code " + str(prog.retval) + "\n")
In this example, the string "Hello, world!" could be replaced with any arbitrary string (insertion of arbitrary binary data is available via Program.append_stdin).
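To make the idea of "wrapping subprocess" concrete, here is a minimal sketch of the kind of call such a wrapper might make underneath. It is not MPH's actual implementation: the helper name run_with_stdin is hypothetical, and the target is a stand-in Python one-liner rather than a real binary.

```python
import subprocess
import sys


def run_with_stdin(argv, data):
    """Run argv, feed `data` (bytes) to its stdin, and return
    (returncode, stdout, stderr).

    Hypothetical helper for illustration only; not part of MPH.
    """
    completed = subprocess.run(
        argv,
        input=data,              # bytes written to the child's stdin
        stdout=subprocess.PIPE,  # capture stdout
        stderr=subprocess.PIPE,  # capture stderr
    )
    return completed.returncode, completed.stdout, completed.stderr


if __name__ == "__main__":
    # Stand-in target: echoes its stdin back in upper case.
    target = [sys.executable, "-c",
              "import sys; sys.stdout.write(sys.stdin.read().upper())"]
    code, out, err = run_with_stdin(target, b"Hello, world!")
    print(code, out)
```

Everything a fuzzer needs from one run (exit status, stdout, stderr) comes back from a single call, which is why a thin wrapper is enough for this tutorial.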
Let's start writing our fuzzer in Python. Create a file called fuzztut.py and use your preferred editor to open it. We'll start off by writing a function that sends a string to name and then executes it:
import sys
from mph.program import Program
from fuzzbang.alphanumericfuzzer import AlphaNumericFuzzer

PATH_TO_NAME = "" # fill this in yourself

def run(string):
    """
    Sends the provided string to the `name` program and runs it with that
    input. Returns the return value `name` gives us
    """
    prog = Program(PATH_TO_NAME, [])
    prog.append_string_stdin(string)
    prog.exec()
    return prog.retval
Note that you will need to provide an absolute path to name on line 5. For example, my version of line 5 would look like:
PATH_TO_NAME = "/home/jack/dev/fuzztut/name" # fill this in yourself
Our project now looks like this:
$> ls
fuzzbang fuzztut.py mph name name.c
Generating Test Data
At this point, we can send strings to name, run it, and see what the result was. On its own, this is just a convoluted way of running programs; what we need is a way to generate meaningful test data to give to name. For this, we'll use Fuzzbang. Fuzzbang is a Python 3 package providing a framework for producing fuzzing data. Consulting the README:
from fuzzbang.alphanumericfuzzer import AlphaNumericFuzzer

N = 10 # number of test cases

# bounds on length of alphanumeric strings
MIN_LEN = 0
MAX_LEN = 8

f = AlphaNumericFuzzer(MIN_LEN, MAX_LEN) # fuzzer object

# generate test cases
for i in range(N):
    data = f.generate() # generate string
    print("(" + str(len(data)) + ")") # print length of string
    print(data) # print string itself
Let's now write a function that generates an alphanumeric string of a certain maximum length. It should be noted at this point that there is no special reason to use just alphanumeric strings: arbitrary binary data could be used (and could potentially even expose further, more subtle vulnerabilities in programs). We'll use alphanumeric input for simplicity. In fuzztut.py:
def generate_input(n):
    """
    Returns an alphanumeric string with a length no greater than n.
    """
    fuzzer = AlphaNumericFuzzer(0, n)
    return fuzzer.generate()
A few examples of how you might call generate_input:
generate_input(0) # (empty string)
generate_input(1) # q
generate_input(1) # T
generate_input(1) # o
generate_input(8) # rCyJblUl
generate_input(8) # (empty string)
generate_input(8) # M9R
These are some examples taken from actual calls to the function on my local machine. Calls are repeated to demonstrate the pseudorandom nature of the output. Note that some calls to generate_input(8) returned strings fewer than eight characters long.
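For readers curious about what a generator like this might do internally, the following is a rough sketch using only the standard library. It is not Fuzzbang's code: the function name random_alphanumeric and the choice of a uniformly distributed length are assumptions made for illustration.

```python
import random
import string

# Upper-case letters, lower-case letters and digits.
ALPHANUMERIC = string.ascii_letters + string.digits


def random_alphanumeric(min_len, max_len, rng=random):
    """Return an alphanumeric string whose length is drawn uniformly
    from [min_len, max_len].

    Hypothetical stand-in for illustration; not Fuzzbang's implementation.
    """
    length = rng.randint(min_len, max_len)  # inclusive at both ends
    return "".join(rng.choice(ALPHANUMERIC) for _ in range(length))


if __name__ == "__main__":
    for _ in range(3):
        print(repr(random_alphanumeric(0, 8)))
```

Because the length itself is random, a call with a maximum of 8 can return anything from the empty string up to eight characters, which matches the behaviour seen in the sample calls above.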
Implementing the Main Program
We now have a way of interacting with name (run) and a way of generating (pseudo)random inputs for name (generate_input). All we need now is some code to drive the actual fuzzing process by bringing the two together.
if __name__ == "__main__":
    # usage
    if len(sys.argv) != 3:
        print("usage: python3 fuzztut.py num_cases max_length")
        sys.exit(1)

    # command-line arguments
    num_cases = int(sys.argv[1]) # number of test cases to run
    max_length = int(sys.argv[2]) # maximum length of each string

    results = [] # list for storing the result of each test

    # main loop
    for i in range(num_cases):
        input_string = generate_input(max_length) # generate input string
        return_value = run(input_string) # run name with our input

        # save test results to our results list
        test_result = {}
        test_result["num"] = i
        test_result["input"] = input_string
        test_result["output"] = return_value
        results.append(test_result)

    # print summary
    for test in results:
        print("Case #{:d}:".format(test["num"]))
        print(" IN: " + test["input"])
        print(" OUT: {:4d}".format(test["output"]))
        print("\n")
Using our fuzzer is simple:
$> python3 fuzztut.py
usage: python3 fuzztut.py num_cases max_length
For example,
$> python3 fuzztut.py 10 8
Fuzzing the Target
Now that our fuzzer works, we can focus on fuzzing name rather than writing the fuzzer. Recall that name allocates a buffer of fixed size, yet accepts arbitrarily long input. Studying the code for name, we can see that the buffer is 16 characters long. With this in mind, it makes sense that inputs longer than 16 characters are likely to cause issues. Let's try it:
$> python3 fuzztut.py 10 32
Case #0:
 IN: wgZ0S7rF08
 OUT:    0

Case #1:
 IN: y6tLHoJ2u4LRs158aAIlrHsVOHT
 OUT:  -11

Case #2:
 IN: X0Ji7b5Z4TgYLYRpC0RAE740Xk
 OUT:  -11

Case #3:
 IN: 6sOweDnPfmZdIxLiKm
 OUT:    0

Case #4:
 IN: LTU
 OUT:    0

Case #5:
 IN: XlAOQtgptB
 OUT:    0

Case #6:
 IN: rYAi73kaZnwY
 OUT:    0

Case #7:
 IN: B3LOMahprORnA69ROD9yI49OP
 OUT:   -7

Case #8:
 IN: 6Tyrvvn0IK2GeURZoElR
 OUT:    0

Case #9:
 IN: TZjgYFR
 OUT:    0
Notice that name returned -11 on inputs 26-27 characters long, and -7 on an input 25 characters long. Let's see what happens if we execute name ourselves with the same inputs:
$> ./name
What's your name? y6tLHoJ2u4LRs158aAIlrHsVOHT
Hi there, y6tLHoJ2u4LRs158aAIlrHsVOHT!
Segmentation fault
$> ./name
What's your name? B3LOMahprORnA69ROD9yI49OP
Hi there, B3LOMahprORnA69ROD9yI49OP!
Bus error
Our fuzzer has revealed multiple inputs that cause name to reliably crash!
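The negative return codes line up with the crash messages. Python's subprocess module reports a child killed by signal N as return code -N on POSIX systems, and on Linux signal 11 is SIGSEGV and signal 7 is SIGBUS. Assuming MPH passes that value through unchanged (which the output above suggests, but which I have not confirmed from its source), a small helper can turn the numbers into names. The function describe_return_code is a hypothetical addition, not part of MPH or fuzztut.py.

```python
import signal


def describe_return_code(retval):
    """Translate a subprocess-style return code into a readable label.

    Hypothetical helper; assumes negative values follow the subprocess
    convention of -N meaning "terminated by signal N".
    """
    if retval < 0:
        try:
            return "killed by " + signal.Signals(-retval).name
        except ValueError:
            # Signal number not known on this platform.
            return "killed by unknown signal {:d}".format(-retval)
    return "exited with status {:d}".format(retval)


if __name__ == "__main__":
    # Signal numbers are platform-dependent, so the names printed here
    # for -11 and -7 depend on the operating system.
    for code in (0, -11, -7):
        print("{:4d}: {}".format(code, describe_return_code(code)))
```

Printing the signal name next to each case in the summary makes crashing inputs easier to spot than scanning for negative numbers.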
Conclusion
The bugs we found in the previous section warrant further investigation; however, actually fixing the bugs in name is outside the scope of this tutorial. These bugs will be addressed and explained in another post.
In summary, we used two small, very simple Python packages to write a (very simple) fuzzer to help us diagnose issues in a program we wrote. While name is only a toy program (with an intentional vulnerability), most of what we did in this tutorial, including the code we wrote, can easily be generalised to any binary executable. Both MPH and Fuzzbang provide facilities for doing so.
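As a rough illustration of that generality, the sketch below runs the same generate-run-record loop against an arbitrary command line using only the standard library. It does not use MPH or Fuzzbang; the function fuzz, its seed parameter, and the stand-in target (a Python one-liner that fails on inputs longer than 16 characters) are all assumptions made for the example.

```python
import random
import string
import subprocess
import sys


def fuzz(argv, num_cases, max_length, seed=None):
    """Run argv num_cases times with random alphanumeric stdin and
    return a list of (input, returncode) pairs.

    Hypothetical standard-library sketch; not MPH or Fuzzbang code.
    """
    rng = random.Random(seed)  # a fixed seed makes a run repeatable
    alphabet = string.ascii_letters + string.digits
    results = []
    for _ in range(num_cases):
        length = rng.randint(0, max_length)
        data = "".join(rng.choice(alphabet) for _ in range(length))
        completed = subprocess.run(
            argv,
            input=data.encode("ascii"),
            stdout=subprocess.DEVNULL,  # only the return code is kept
            stderr=subprocess.DEVNULL,
        )
        results.append((data, completed.returncode))
    return results


if __name__ == "__main__":
    # Stand-in target: exits with status 1 when given more than 16 characters.
    target = [sys.executable, "-c",
              "import sys; sys.exit(1 if len(sys.stdin.read()) > 16 else 0)"]
    for data, code in fuzz(target, 10, 32, seed=1):
        print("{:4d}  {}".format(code, data))
```

Swapping the target list for the path to a real binary is the only change needed to point the loop at a different program.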
The complete code for fuzztut.py is available as a Gist here.