[BibTeX]

@online{writing-a-simple-fuzMcPherson2018,
    author={Jack McPherson},
    date={2018-01-19},
    title={Writing a Simple Fuzzer in Python},
    url={https://jmcph4.github.io/2018/01/19/writing-a-simple-fuzzer-in-python.html},
    urldate={2020-05-16},
}

Writing a Simple Fuzzer in Python

by Jack McPherson | 2018-01-19 00:00:00 +1000

I have been interested in fuzzing for quite some time, and decided it was time to start writing some of my own (very basic) fuzzing tools. In this post, we'll step through some of the basic things we might expect from a fuzzer and how we might achieve them using code I have written.

Introduction

Firstly, we need to grab our dependencies. For this tutorial, we'll be using two projects of mine, MPH and Fuzzbang (both are discussed later in the tutorial):

$> git clone https://github.com/jmcph4/mph.git
$> git clone https://github.com/jmcph4/fuzzbang.git

Secondly, create a directory for our project and cd into it:

$> mkdir fuzztut
$> cd fuzztut

Thirdly, copy the actual Python packages into our project directory:

$> cp -r ../mph/mph mph
$> cp -r ../fuzzbang/fuzzbang fuzzbang

Our project directory should now look like this:

$> ls
fuzzbang mph

We're now ready to commence development!

The Target

While both of the main packages this post centres around are designed to be as generic and adaptable as possible, a concrete example is often easier to learn with. So, let's create our fuzzing target, name:

All name does is prompt the user for their name on stdout, accept an arbitrarily long input from the user on stdin, and finally greet the user on stdout with their supplied name. This is a textbook example of a buffer overflow (those familiar with the concept have probably already spotted it). To compile name, simply run:

$> gcc name.c -Wall -Wextra -Wshadow -pedantic -std=c11 -g3 -o name

Our project now looks like this:

$> ls
fuzzbang mph name.c name

Talking to the Target

Now that we have our target program, we need some way to talk to it from our Python code. Python 3 provides the subprocess module for this purpose, but we will use a simple wrapper around it: MPH. MPH is a small Python package that wraps the parts of subprocess relevant to our purposes. It allows a Python programmer to execute a program with various inputs and capture the output (not just stdout and stderr, but also the return code and any signals received). From the example in the MPH README:

from mph import program

prog = program.Program("/path/to/myprog", [])   # initialise program
prog.append_string_stdin("Hello, world!")       # write to stdin
prog.exec()                                     # run program

# check return value of guest executable
if prog.retval == 0:
    print(prog.stdout)
else:
    print("Inferior returned with return code " + str(prog.retval) + "\n")

In this example, the string "Hello, world!" could be replaced with any arbitrary string (insertion of arbitrary binary data is available via Program.append_stdin).
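For readers curious about what such a wrapper involves, here is a minimal sketch of the same workflow using only the standard library's subprocess module. It is not MPH's actual implementation: the helper name run_with_stdin is hypothetical, and the target is the Python interpreter itself so that the example runs anywhere.

```python
import subprocess
import sys


def run_with_stdin(argv, data):
    """Run argv, feed data (bytes) to its stdin, and return
    (return code, stdout bytes, stderr bytes).

    Hypothetical helper for illustration only; not part of MPH.
    """
    completed = subprocess.run(
        argv,
        input=data,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    return completed.returncode, completed.stdout, completed.stderr


if __name__ == "__main__":
    # Stand-in target: a one-line Python program that echoes stdin in upper case.
    target = [
        sys.executable,
        "-c",
        "import sys; sys.stdout.write(sys.stdin.read().upper())",
    ]
    retval, out, err = run_with_stdin(target, b"Hello, world!")
    print(retval, out)
```

On POSIX systems, subprocess reports a child that was killed by signal N as a return code of -N, which is why a wrapper of this kind can surface crashes as well as ordinary exit statuses.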

Let's start writing our fuzzer in Python. Create a file called fuzztut.py and use your preferred editor to open it. We'll start off by writing a function that sends a string to name and then executes it:

import sys
from mph.program import Program
from fuzzbang.alphanumericfuzzer import AlphaNumericFuzzer

PATH_TO_NAME = "" # fill this in yourself

def run(string):
    """
    Sends the provided string to the `name` program and runs it with that
    input. Returns the return value `name` gives us
    """
    prog = Program(PATH_TO_NAME, [])
    prog.append_string_stdin(string)
    prog.exec()
    
    return prog.retval

Note that you will need to provide an absolute path to name on line 5 (the PATH_TO_NAME assignment). For example, my version of line 5 would look like:

PATH_TO_NAME = "/home/jack/dev/fuzztut/name" # fill this in yourself

Our project now looks like this:

$> ls
fuzzbang fuzztut.py mph name name.c

Generating Test Data

At this point, we can send strings to name, run it, and see what the result was. This alone is just a convoluted way of running programs - what we need is a way to generate meaningful test data to give to name. For this, we'll use Fuzzbang. Fuzzbang is a Python 3 package providing a framework for producing fuzzing data. Consulting the README:

from fuzzbang.alphanumericfuzzer import AlphaNumericFuzzer

N = 10 # number of test cases

# bounds on length of alphanumeric strings
MIN_LEN = 0
MAX_LEN = 8

f = AlphaNumericFuzzer(MIN_LEN, MAX_LEN) # fuzzer object

# generate test cases
for i in range(N):
    data = f.generate() # generate string
    print("(" + str(len(data)) + ")") # print length of string
    print(data) # print string itself

Let's now write a function that generates an alphanumeric string of a certain maximum length. It should be noted at this point that there is no special reason to use just alphanumeric strings: arbitrary binary data could be used (and could potentially even expose further, more subtle vulnerabilities in programs). We'll use alphanumeric input for simplicity. In fuzztut.py:

def generate_input(n):
    """
    Returns an alphanumeric string with a length no greater than n.
    """
    fuzzer = AlphaNumericFuzzer(0, n)
    
    return fuzzer.generate()

A few examples of how you might call generate_input:

generate_input(0) # (empty string)
generate_input(1) # q
generate_input(1) # T
generate_input(1) # o
generate_input(8) # rCyJblUl
generate_input(8) # (empty string)
generate_input(8) # M9R

These are some examples taken from actual calls to the function on my local machine. Calls are repeated to demonstrate the pseudorandom nature of the output. Note that some calls to generate_input(8) returned strings less than eight characters long.
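To make this behaviour concrete, here is a rough sketch of how a generator with the same observable behaviour could be written with the standard library. It is an assumption about the general approach, not Fuzzbang's code: random_alphanumeric is a hypothetical name, and the sketch assumes the length is drawn uniformly between the two bounds.

```python
import random
import string

# Upper- and lower-case ASCII letters plus the ten digits.
ALPHABET = string.ascii_letters + string.digits


def random_alphanumeric(min_len, max_len, rng=random):
    """Return a random alphanumeric string with min_len <= length <= max_len.

    Hypothetical stand-in for illustration; not Fuzzbang's implementation.
    rng may be the random module or a random.Random instance.
    """
    if min_len < 0 or max_len < min_len:
        raise ValueError("require 0 <= min_len <= max_len")
    length = rng.randint(min_len, max_len)  # both bounds are inclusive
    return "".join(rng.choice(ALPHABET) for _ in range(length))
```

Passing a seeded random.Random instance as rng makes a run reproducible, which is handy when a crashing input needs to be regenerated later.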

Implementing the Main Program

We now have a way of interacting with name (run) and a way of generating (pseudo)random inputs for name (generate_input). All we need now is some code to drive the actual fuzzing process by bringing the two together.

if __name__ == "__main__":
    # usage
    if len(sys.argv) != 3:
        print("usage: python3 fuzztut.py num_cases max_length")
        sys.exit(1)
        
    # command-line arguments    
    num_cases = int(sys.argv[1]) # number of test cases to run
    max_length = int(sys.argv[2]) # maximum length of each string
       
    results = [] # list for storing the result of each test
    
    # main loop
    for i in range(num_cases):
        data = generate_input(max_length) # generate input string
        return_value = run(data) # run name with our input
        
        # save test results to our results list
        test_result = {}
        test_result["num"] = i
        test_result["input"] = data
        test_result["output"] = return_value
        results.append(test_result)

    # print summary
    for test in results:
        print("Case #{:d}:".format(test["num"]))
        print("    IN: " + test["input"])
        print("    OUT: {:4d}".format(test["output"]))
        print("\n")

Using our fuzzer is simple:

$> python3 fuzztut.py
usage: python3 fuzztut.py num_cases max_length

For example,

$> python3 fuzztut.py 10 8

Fuzzing the Target

Now that our fuzzer works, we can focus on fuzzing name rather than writing the fuzzer. Recall that name allocates a buffer of fixed size, yet accepts arbitrarily long input. Studying the code for name, it's obvious that the buffer is 16 characters long. With this in mind, it makes sense that inputs longer than 16 characters are likely to cause issues. Let's try it:

$> python3 fuzztut.py 10 32
Case #0:
    IN: wgZ0S7rF08
    OUT:    0
Case #1:
    IN: y6tLHoJ2u4LRs158aAIlrHsVOHT
    OUT:  -11
Case #2:
    IN: X0Ji7b5Z4TgYLYRpC0RAE740Xk
    OUT:  -11
Case #3:
    IN: 6sOweDnPfmZdIxLiKm
    OUT:    0
Case #4:
    IN: LTU
    OUT:    0
Case #5:
    IN: XlAOQtgptB
    OUT:    0
Case #6:
    IN: rYAi73kaZnwY
    OUT:    0
Case #7:
    IN: B3LOMahprORnA69ROD9yI49OP
    OUT:   -7
Case #8:
    IN: 6Tyrvvn0IK2GeURZoElR
    OUT:    0
Case #9:
    IN: TZjgYFR
    OUT:    0

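Notice that name returned -11 on the inputs of 26 and 27 characters, and -7 on an input of 25 characters. Negative values like these are consistent with how Python's subprocess module (which MPH wraps) reports a child killed by a signal: -11 corresponds to signal 11 (SIGSEGV) and -7 to signal 7 (SIGBUS on Linux). Note also that the 18- and 20-character inputs in cases 3 and 8 returned 0 even though they exceed the buffer, so an overflow does not always produce a crash. Let's see what happens if we execute name ourselves with the same inputs: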

$> ./name
What's your name? y6tLHoJ2u4LRs158aAIlrHsVOHT
Hi there, y6tLHoJ2u4LRs158aAIlrHsVOHT!
Segmentation fault
$> ./name
What's your name? B3LOMahprORnA69ROD9yI49OP
Hi there, B3LOMahprORnA69ROD9yI49OP!
Bus error

Our fuzzer has revealed multiple inputs that cause name to reliably crash!
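The negative return codes in the summary can be turned into something more readable. The sketch below assumes the return code follows the subprocess convention (a value of -N means the process was killed by signal N); describe_return_code is a hypothetical helper, not part of MPH, and signal numbering varies between platforms.

```python
import signal


def describe_return_code(retval):
    """Translate a subprocess-style return code into a short description.

    Hypothetical helper; assumes negative values mean "killed by signal".
    """
    if retval >= 0:
        return "exited with status {:d}".format(retval)
    signum = -retval
    try:
        name = signal.Signals(signum).name  # e.g. SIGSEGV
    except ValueError:
        # The number does not map to a signal known on this platform.
        name = "unknown signal"
    return "killed by signal {:d} ({})".format(signum, name)


if __name__ == "__main__":
    for code in (0, -int(signal.SIGSEGV)):
        print(code, "->", describe_return_code(code))
```

A helper like this could be called on test["output"] when printing the summary, so that crashes stand out from ordinary exits.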

Conclusion

The bugs we found in the previous section warrant further investigation; however, actually fixing the bugs in name is outside the scope of this tutorial. These bugs will be addressed and explained in another post.

In summary, we used two small, very simple Python packages to write a (very simple) fuzzer to help us diagnose issues in a program we wrote. While name is only a toy program (with an intentional vulnerability), most of what we did in this tutorial - including the code we wrote - can be easily abstracted to any general binary executable. Both MPH and Fuzzbang provide facilities for doing so.
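As a sketch of that generalisation, the loop from this tutorial can be written against any command line using only the standard library. This is an illustration under stated assumptions rather than how MPH or Fuzzbang do it: the function name fuzz and its parameters are hypothetical, and the stand-in target is a Python one-liner that exits with status 1 on inputs longer than 16 characters.

```python
import random
import string
import subprocess
import sys


def fuzz(argv, num_cases, max_length, seed=None):
    """Run argv num_cases times with random alphanumeric stdin.

    Returns a list of (input string, return code) pairs, one for every
    run whose return code was nonzero. Hypothetical helper for illustration.
    """
    rng = random.Random(seed)
    alphabet = string.ascii_letters + string.digits
    failures = []
    for _ in range(num_cases):
        length = rng.randint(0, max_length)
        data = "".join(rng.choice(alphabet) for _ in range(length))
        completed = subprocess.run(
            argv,
            input=data.encode("ascii"),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if completed.returncode != 0:
            failures.append((data, completed.returncode))
    return failures


if __name__ == "__main__":
    # Stand-in target: exits with status 1 when stdin exceeds 16 characters.
    target = [
        sys.executable,
        "-c",
        "import sys; sys.exit(1 if len(sys.stdin.read()) > 16 else 0)",
    ]
    for data, code in fuzz(target, num_cases=10, max_length=32, seed=1):
        print(len(data), code, data)
```

Recording only the failing cases keeps the output short on long runs, and the seed parameter lets a run be repeated exactly.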

The complete code for fuzztut.py is available as a Gist here.