You can try it out here:
This article will demonstrate the different versions of the disassembler, discuss the code generator, describe how to extend
mipsdis, provide instructions for getting it all running yourself, and conclude by listing some of the extra documentation available for the project.
The source code is available here:
An outline of the code is available here:
I chose MIPS because it is very simple and regular, and is small enough to fit in your head. I think that writing an x86 disassembler would take about ten times as long, and I get the feeling that ARM is more complex.
I decided to use source code generation for a few reasons.
- Writing a bunch of slightly different dispatch and decoding routines by hand is tedious and error-prone. Manipulating metadata using a simple and commented text format is more pleasant.
- The metadata and code generation will come in handy for some other low-level projects that I am working on.
mipsdis runs in the browser. There is a quick inline version displayed below, and there is an expanded version you can try by clicking the "Live Demo" link.
The data in the input pane consists of the program counter in hex followed by the MIPS opcode. This is a similar format to that used by objdump.
If you click disassemble, the browser will disassemble the program and show the mnemonic and operands in the output pane.
The default program shown is a fragment of a hello world binary. The program uses a loop to load a byte from the C string "hello world" and store it to a memory location that will cause it to be written to the console.
Click here for the expanded version:
The C version of
mipsdis runs from the command-line. Below I show how to disassemble the same MIPS program as in the browser demo.
If you wish to disassemble an existing MIPS binary, you will need to adapt it into the format that
mipsdis understands. For a raw binary---that is, a bare binary file just containing opcodes and no metadata, as you might get from extracting part of an executable or from a memory dump---you can use
mipsdis as a filter.
The pipeline shown above hexdumps the raw binary file
test/hello.bin, extracts the address and opcode column, and then disassembles the results with
mipsdis (using the '-' option to read the input from stdin). Note that the starting address
0x80001000 needs to be specified because the raw binary file does not contain this information. Without it, branch offsets and jump targets would not be decoded correctly.
For convenience, scripts/rawdump.sh can be used to do the same thing.
To disassemble a binary in a specific format, such as ELF, you will need to use the appropriate tool, such as
gdb, to extract the opcodes.
Obtaining and installing a MIPS toolchain is outside the scope of this article, but imgtec does provide one here.
mipsdis function identically to the C version.
The metadata used by
mipsgen was created by a process of reading the architecture manuals and then compiling the information on each mnemonic into a form that makes sense for both humans and machines.
The resultant metadata is contained in these files:
- tables/dispatch_tables.txt is used to drive decoder switch statements.
- tables/opcode_bits.txt specifies the bitwise layout of each opcode.
- tables/asm_format.txt specifies the human readable format of each mnemonic.
- tables/field_format.txt specifies the printf format of fields inside
mipsgen parses this data, and then hands it off to specific code generation routines that generate the appropriate switch statements and formatting code for each language.
This keeps the amount of language-specific code for each disassembler down to around 300 lines. The amount of auto-generated code is around 1300 lines for each version.
I opted to generate fairly simple and verbose code, consisting of large switch statements, as well as explicit decoding functions for each mnemonic. I did it this way to make the generated code easier to read and debug, and to improve the output of code coverage tools. It also made the code easier to write, and kept the code generation consistent across the different languages.
An alternative would be to generate more compact routines that take advantage of language-specific features such as macros and polymorphism. For the purposes of this project, I would rather have those smarts in
mipsgen than in the code it generates.
mipsgen is invoked by the build system of
mipsdis as part of the process of building the different versions of the disassembler.
Adding support for another language such as python to the code generator would be straightforward; you would have to do something like this:
- Adjust the method names accordingly
- Modify the print statements in
py_gen_switchto emit the appropriate python code.
disasm/py/mipsdis.py. This is mostly a matter of tweaking the bitwise field extraction routines to make the semantics correct for python.
Adding new opcodes would be a bit harder, depending on how complex they are. You would have to do something like this:
- Add the correct entry to
- Add the bitwise layout of the opcode to
- Add the format of the command to
- If the new opcode introduces new fields or a new interpretation of those fields, new
getxxxfunctions will have to be added. See
getbroffin disasm/c/mipsdis.c for details.
To get the source and run it:
The configure script will check for these dependencies:
- nodejs (optional)
I have tested on OSX 10.11.6 and Ubuntu 16.04. Other unix variants are likely to work with minimal or no changes.
The expected output was generated by running the test input through
objdump and filtering the output slightly to match the style used by
Apart from this post, there are a few other bits and pieces of documentation that are worth looking at.
There is an outline of the code here that contains a summary of each module. The modules are listed in topological order.
Each file in the project has (or should have) a blurb at the top that contains a verbose description of what it is and why it is present. If necessary, the blurb contains cross references to other related files.
There is some more documentation on the code generation, as well as a static copy of the auto-generated code, inside the
I found the following references very useful:
- MIPS Registers
- MIPS Instruction Formats
- MIPS Instruction Reference (basic subset)
- MIPS32 Architecture for Programmers Vol 2
- MIPS32 4K User's manual
I would like to thank Awal, paul_t and other reviewers for the valuable feedback that they provided.