Last year our team had to analyze V8 bytecode. Back then, there were no tools in place to decompile such code and facilitate convenient navigation over it. We decided to try writing a processor module for the Ghidra framework. Thanks to the features of the language used to describe the output instructions, we obtained not only a readable set of instructions, but also a C-like decompiler. This article is a continuation of the series (1, 2) on our Ghidra plugin.
Several months went by between writing the processor module and this article. In this time, the SLEIGH specification remained unchanged, and the described module works on versions 9.1.2 – 9.2.2, which have been released during the last six months.
On ghidra.re and in the documentation distributed with Ghidra there is a fairly good description of the capabilities of the language. These materials are worth reading before writing your own modules. Preexisting processor modules by the framework’s developers might be excellent examples, especially if you know their architecture.
You can see in the documentation that the processor modules for Ghidra are written in SLEIGH, a language derived from the Specification Language for Encoding and Decoding (SLED), which was developed specifically for Ghidra. It translates machine code into p-code (the intermediate language that Ghidra uses to build decompiled code). As a language for describing processor instructions, it has a lot of limitations, although they can be reduced with the p-code injection mechanism implemented as Java code.
The source code of the new processor module is presented on github. This article will review the key concepts that are used in the development of the processor module using pure SLEIGH, with certain instructions as examples. Working with the constant pool, p-code injections, analyzer, and loader will be or have already been reviewed in other articles. Also you can read more about analyzers and loaders in The Ghidra Book: The Definitive Guide.
Continue reading