@online{writing-a-simple-fuzMcPherson2018,
author={Jack McPherson},
date={2018-01-19},
title={Writing a Simple Fuzzer in Python},
url={https://jmcph4.github.io/2018/01/19/writing-a-simple-fuzzer-in-python.html},
urldate={2020-05-16},
}
Writing a Simple Fuzzer in Python
2018-01-19 00:00:00 +1000
I have been interested in fuzzing for quite some time, and decided it was time to start writing some (very basic) fuzzing tools of my own. In this post, we'll step through some of the basic things we might expect from a fuzzer and how we might achieve them using some of the code I have written.
Introduction
Firstly, we need to grab our dependencies. For this tutorial, we'll be using two projects of mine, MPH and Fuzzbang (both are discussed later in the tutorial):
$> git clone https://github.com/jmcph4/mph.git
$> git clone https://github.com/jmcph4/fuzzbang.git
Secondly, create a directory for our project and cd into it:
$> mkdir fuzztut
$> cd fuzztut
Thirdly, copy the actual Python packages into our project directory:
$> cp -r ../mph/mph mph
$> cp -r ../fuzzbang/fuzzbang fuzzbang
Our project directory should now look like this:
$> ls
fuzzbang mph
We're now ready to commence development!
The Target
While both of the main packages this post centres around are designed to be as generic and adaptable as possible, a concrete example is often easier to learn with. So, let's create our fuzzing target, name:
All name does is prompt the user for their name on stdout, accept an arbitrarily long input from the user on stdin, and finally greet the user on stdout with their supplied name. This is a textbook example of a buffer overflow vulnerability (those familiar with the concept have probably already spotted it). To compile name, simply run:
$> gcc name.c -Wall -Wextra -Wshadow -pedantic -std=c11 -g3 -o name
Our project now looks like this:
$> ls
fuzzbang mph name.c name
Talking to the Target
Now that we have our target program, we need some way to talk to it from our Python code. Python 3 has the subprocess module for this purpose, but we will use a simple wrapper around it: MPH. MPH is a small Python package that wraps the parts of subprocess relevant for our purposes. It allows a Python programmer to execute a program with various inputs and capture the output (not just stdout and stderr, but also the return code and any signals received). From the example in the MPH README:
from mph import program
prog = program.Program("/path/to/myprog", []) # initialise program
prog.append_string_stdin("Hello, world!") # write to stdin
prog.exec() # run program
# check return value of guest executable
if prog.retval == 0:
    print(prog.stdout)
else:
    print("Inferior returned with return code " + str(prog.retval) + "\n")
In this example, the string "Hello, world!" could be replaced with any arbitrary string (insertion of arbitrary binary data is available via Program.append_stdin).
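To give a feel for what a wrapper like this does underneath, here is a minimal sketch that uses subprocess directly. It is not MPH's actual implementation; the helper name run_with_stdin is hypothetical, and the Python interpreter stands in for a real target so the example runs anywhere.

```python
import subprocess
import sys


def run_with_stdin(argv, data):
    """
    Runs the program given by argv, writes data (bytes) to its stdin and
    returns a (return code, stdout, stderr) tuple.
    """
    completed = subprocess.run(
        argv,
        input=data,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    return completed.returncode, completed.stdout, completed.stderr


if __name__ == "__main__":
    # stand-in target: a one-line Python program that echoes stdin to stdout
    echo = [sys.executable, "-c",
            "import sys; sys.stdout.write(sys.stdin.read())"]
    retval, out, err = run_with_stdin(echo, b"Hello, world!")
    print(retval, out, err)
```

The three values returned here correspond roughly to what the README example reads back through prog.retval and prog.stdout.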
Let's start writing our fuzzer in Python. Create a file called fuzztut.py and use your preferred editor to open it. We'll start off by writing a function that sends a string to name and then executes it:
import sys
from mph.program import Program
from fuzzbang.alphanumericfuzzer import AlphaNumericFuzzer
PATH_TO_NAME = "" # fill this in yourself
def run(string):
    """
    Sends the provided string to the `name` program and runs it with that
    input. Returns the return value `name` gives us
    """
    prog = Program(PATH_TO_NAME, [])
    prog.append_string_stdin(string)
    prog.exec()
    return prog.retval
Note that you will need to set PATH_TO_NAME to the absolute path of name. For example, mine looks like:
PATH_TO_NAME = "/home/jack/dev/fuzztut/name" # fill this in yourself
Our project now looks like this:
$> ls
fuzzbang fuzztut.py mph name name.c
Generating Test Data
At this point, we can send strings to name, run it, and see what the result was. This alone is just a convoluted way of running programs - what we need is a way to generate meaningful test data to give to name. For this, we'll use Fuzzbang. Fuzzbang is a Python 3 package providing a framework for producing fuzzing data. Consulting the README:
from fuzzbang.alphanumericfuzzer import AlphaNumericFuzzer
N = 10 # number of test cases
# bounds on length of alphanumeric strings
MIN_LEN = 0
MAX_LEN = 8
f = AlphaNumericFuzzer(MIN_LEN, MAX_LEN) # fuzzer object
# generate test cases
for i in range(N):
    data = f.generate() # generate string
    print("(" + str(len(data)) + ")") # print length of string
    print(data) # print string itself
Let's now write a function that generates an alphanumeric string of a certain maximum length. It should be noted at this point that there is no special reason to use just alphanumeric strings - arbitrary binary data could be used (and could potentially even expose further, more subtle vulnerabilities in programs). We'll use alphanumeric input for simplicity. In fuzztut.py:
def generate_input(n):
    """
    Returns an alphanumeric string with a length no greater than n.
    """
    fuzzer = AlphaNumericFuzzer(0, n)
    return fuzzer.generate()
A few examples of how you might call generate_input:
generate_input(0) # (empty string)
generate_input(1) # q
generate_input(1) # T
generate_input(1) # o
generate_input(8) # rCyJblUl
generate_input(8) # (empty string)
generate_input(8) # M9R
These are some examples taken from actual calls to the function on my local machine. Calls are repeated to demonstrate the pseudorandom nature of the output. Note that some calls to generate_input(8) returned strings fewer than eight characters long, since n is only an upper bound on the length.
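For intuition, a generator with this behaviour can be sketched in a few lines of standard-library Python. This is not Fuzzbang's implementation: the function name random_alphanumeric and its rng parameter are hypothetical, and the inclusive upper bound is an assumption based on the eight-character result above.

```python
import random
import string

# the characters we draw from: a-z, A-Z and 0-9
ALPHANUMERIC = string.ascii_letters + string.digits


def random_alphanumeric(min_len, max_len, rng=random):
    """
    Returns a pseudorandom alphanumeric string whose length lies between
    min_len and max_len (both inclusive).
    """
    length = rng.randint(min_len, max_len)  # randint includes both bounds
    return "".join(rng.choice(ALPHANUMERIC) for _ in range(length))


if __name__ == "__main__":
    for _ in range(5):
        print(repr(random_alphanumeric(0, 8)))
```

Passing a seeded random.Random instance as rng makes a run repeatable, which is handy when a crashing input needs to be regenerated later.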
Implementing the Main Program
We now have a way of interacting with name (run) and a way of generating (pseudo)random inputs for name (generate_input). All we need now is some code to drive the actual fuzzing process by bringing the two together.
if __name__ == "__main__":
    # usage
    if len(sys.argv) != 3:
        print("usage: python3 fuzztut.py num_cases max_length")
        sys.exit(1)
    # command-line arguments
    num_cases = int(sys.argv[1]) # number of test cases to run
    max_length = int(sys.argv[2]) # maximum length of each string
    results = [] # list for storing the result of each test
    # main loop
    for i in range(num_cases):
        test_input = generate_input(max_length) # generate input string
        return_value = run(test_input) # run name with our input
        # save test results to our global results list
        test_result = {}
        test_result["num"] = i
        test_result["input"] = test_input
        test_result["output"] = return_value
        results.append(test_result)
    # print summary
    for test in results:
        print("Case #{:d}:".format(test["num"]))
        print(" IN: " + test["input"])
        print(" OUT: {:4d}".format(test["output"]))
        print("\n")
Using our fuzzer is simple:
$> python3 fuzztut.py
usage: python3 fuzztut.py num_cases max_length
For example,
$> python3 fuzztut.py 10 8
Fuzzing the Target
Now that our fuzzer works, we can focus on fuzzing name rather than writing the fuzzer. Recall that name allocates a buffer of fixed size, yet accepts arbitrarily long input. Studying the code for name, it's obvious that the buffer is 16 characters long. With this in mind, it makes sense that inputs longer than 16 characters are likely to cause issues. Let's try it:
$> python3 fuzztut.py 10 32
Case #0:
IN: wgZ0S7rF08
OUT: 0
Case #1:
IN: y6tLHoJ2u4LRs158aAIlrHsVOHT
OUT: -11
Case #2:
IN: X0Ji7b5Z4TgYLYRpC0RAE740Xk
OUT: -11
Case #3:
IN: 6sOweDnPfmZdIxLiKm
OUT: 0
Case #4:
IN: LTU
OUT: 0
Case #5:
IN: XlAOQtgptB
OUT: 0
Case #6:
IN: rYAi73kaZnwY
OUT: 0
Case #7:
IN: B3LOMahprORnA69ROD9yI49OP
OUT: -7
Case #8:
IN: 6Tyrvvn0IK2GeURZoElR
OUT: 0
Case #9:
IN: TZjgYFR
OUT: 0
Notice that name returned -11 on inputs 26-27 characters long, and -7 on an input 25 characters long. Let's see what happens if we execute name ourselves with the same inputs:
$> ./name
What's your name? y6tLHoJ2u4LRs158aAIlrHsVOHT
Hi there, y6tLHoJ2u4LRs158aAIlrHsVOHT!
Segmentation fault
$> ./name
What's your name? B3LOMahprORnA69ROD9yI49OP
Hi there, B3LOMahprORnA69ROD9yI49OP!
Bus error
Our fuzzer has revealed multiple inputs that cause name to reliably crash!
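The negative values line up with these crashes. Python's subprocess module reports a child killed by signal N as return code -N, and since MPH wraps subprocess it appears to pass that convention through. The sketch below turns such codes into signal names and picks the crashing cases out of a results list shaped like the one our main program builds. The helper names describe_return_code and crashing_cases are hypothetical, and signal numbers vary between platforms (7 is SIGBUS on x86 Linux, but not everywhere).

```python
import signal


def describe_return_code(retval):
    """
    Turns a subprocess-style return code into a readable description.
    """
    if retval >= 0:
        return "exited with status {:d}".format(retval)
    try:
        # negative codes mean "killed by signal -retval"
        return "killed by " + signal.Signals(-retval).name
    except ValueError:
        # number does not correspond to a known signal on this platform
        return "killed by signal {:d}".format(-retval)


def crashing_cases(results):
    """
    Returns only the test results whose target was killed by a signal.
    """
    return [test for test in results if test["output"] < 0]


if __name__ == "__main__":
    # a few entries copied from the run above
    sample = [
        {"num": 0, "input": "wgZ0S7rF08", "output": 0},
        {"num": 1, "input": "y6tLHoJ2u4LRs158aAIlrHsVOHT", "output": -11},
        {"num": 7, "input": "B3LOMahprORnA69ROD9yI49OP", "output": -7},
    ]
    for test in crashing_cases(sample):
        print(test["input"], describe_return_code(test["output"]))
```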
Conclusion
The bugs we found in the previous section warrant further investigation; however, actually fixing the bugs in name is outside the scope of this tutorial. These bugs will be addressed and explained in another post.
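As a possible first step in that investigation, one could sweep input lengths in order rather than at random, looking for the shortest input that makes the target misbehave. The sketch below assumes a run-style function like the one we wrote earlier (string in, return code out). The names shortest_failing_length and fake_run are hypothetical, and fake_run is only a stand-in so the example runs without name.

```python
def shortest_failing_length(run, max_length, fill="A"):
    """
    Tries inputs of length 0, 1, ..., max_length (each made of the fill
    character) and returns the first length for which run reports a
    non-zero return code, or None if every run returns zero.
    """
    for length in range(max_length + 1):
        if run(fill * length) != 0:
            return length
    return None


def fake_run(string):
    """
    Stand-in for a real target: pretends to crash with SIGSEGV (-11) on
    any input longer than 24 characters. The threshold is made up.
    """
    return -11 if len(string) > 24 else 0


if __name__ == "__main__":
    print(shortest_failing_length(fake_run, 32))
```

Memory corruption is not always this tidy: a longer input can happen to survive where a shorter one crashed, so the result is a starting point rather than a proof of where the overflow begins.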
In summary, we used two small, very simple Python packages to write a (very simple) fuzzer to help us diagnose issues in a program we wrote. While name is only a toy program (with an intentional vulnerability), most of what we did in this tutorial - including the code we wrote - can easily be generalised to any binary executable. Both MPH and Fuzzbang provide facilities for doing so.
The complete code for fuzztut.py is available as a Gist here.