Guide to P-code Injection: Changing the intermediate representation of code on the fly in Ghidra

When we were developing the ghidra nodejs module for Ghidra, we realized that it was not always possible to correctly implement V8 (JavaScript engine that is used by Node.js) opcodes in SLEIGH. In such runtime environments as V8 and JVM, a single opcode might perform multiple complicated actions. To resolve this problem in Ghidra, a mechanism was designed for the dynamic injection of p-code constructs, p-code being Ghidra’s intermediate language. Using this mechanism, we were able to transform the decompiler output from this:

to this:

Let’s look at an example with the CallRuntime opcode. It calls one function from the list of the so-called V8 Runtime functions using the kRuntimeId index. This instruction also has a variable number of arguments (range is the number of the initial argument-register; rangedst is the number of arguments). The instruction in SLEIGH, which Ghidra uses to define assembler instructions, looks like this:

This means you have to complete a whole lot of work for what would seem to be a fairly simple operation:

Search for the required function name in the Runtime function array using the kRuntimeId index.
Since arguments are passed through registers, you need to save their previous state.
Pass a variable number of arguments to the function.
Call the function and store the call result in the accumulator.
Restore the previous state of registers.

If you know how to do this in SLEIGH, please let us know. We, however, decided that all this (especially the working with the variable number of register-arguments part) is not that easy (if even possible) to implement in the language for describing processor instructions, and we used the p-code dynamic injection mechanism, which the Ghidra developers implemented precisely for such cases. So, what is this mechanism?

We can create a custom user operation, such as CallRuntimeCallOther, in the assembler instruction description file (SLASPEC). Then, by changing the configuration of your module (more on this below), you can arrange it so that when Ghidra finds this instruction in the code, it will pass the processing of that instruciton back to Java dynamically, executing a callback handler that will dynamically generate p-code for the instruction, taking advantage of Java’s flexibility.

Let’s take a closer look at how this is done.

Creating User-Defined SLEIGH Operations

The CallRuntime opcode is described as follows. Read more about the description of processor instructions in SLEIGH in Natalya Tlyapova’s article.

We create the user-defined operation:

define pcodeop CallRuntimeCallOther;

And describe the instruction itself:

:CallRuntime [kRuntimeId], range^rangedst is op = 0x53; kRuntimeId; range; rangedst {
	CallRuntimeCallOther(2, 0);
}

By doing this, any opcode that starts from byte 0x53 will be decoded as CallRuntime. When we try to decompile it, the CallRuntimeCallOther operation handler will be called with arguments 2 and 0. These arguments describe the instruction type (CallRuntime) and help us write one handler for several similar instructions (such as CallWithSpread and CallUndefinedReceiver).

Necessary Housekeeping

We add a housekeeping p-code injection class: V8_PcodeInjectLibrary. We inherit this class from ghidra.program.model.lang.PcodeInjectLibrary, which implements most of the methods needed for p-code injection.

Let’s start writing the class V8_PcodeInjectLibrary from this template:

package v8_bytecode;

import …

public class V8_PcodeInjectLibrary extends PcodeInjectLibrary {

	public V8_PcodeInjectLibrary(SleighLanguage l) {

	}


}

V8_PcodeInjectLibrary won’t be used by the custom code, rather by the Ghidra engine, so we need to set the value of the pcodeInjectLibraryClass parameter in the PSPEC file so that the Ghidra engine knows which class to use for p-code injection.

<?xml version="1.0" encoding="UTF-8"?>
<processor_spec>
  <programcounter register="pc"/>
  <properties>
  	<property key="pcodeInjectLibraryClass" value="v8_bytecode.V8_PcodeInjectLibrary"/>
  </properties>
</processor_spec>

We will also need to add our CallRuntimeCallOther instruction to the CSPEC file. Ghidra will call V8_PcodeInjectLibrary only for instructions defined this way in the CSPEC file.

	<callotherfixup targetop="CallRuntimeCallOther">
		<pcode dynamic="true">			
			<input name=”outsize"/> 
		</pcode>
	</callotherfixup>

After all of these uncomplicated procedures (which, by the way, were barely described in the documentation at the time our module was being created), we can move on to writing the code.

Let’s create a HashSet, in which we will store the instructions we have implemented. We will also create and initialize a member of our class — the language variable. This code stores the CallRuntimeCallOther operation in a set of supported operations and it performs a number of housekeeping actions (we won’t go into too much detail on them).

public class V8_PcodeInjectLibrary extends PcodeInjectLibrary {
	private Set<String> implementedOps;
	private SleighLanguage language;

	public V8_PcodeInjectLibrary(SleighLanguage l) {
		super(l);
		language = l;
		String translateSpec = language.buildTranslatorTag(language.getAddressFactory(),
				getUniqueBase(), language.getSymbolTable());
		PcodeParser parser = null;
		try {
			parser = new PcodeParser(translateSpec);
		}
		catch (JDOMException e1) {
			e1.printStackTrace();
		}
		implementedOps = new HashSet<>();
		implementedOps.add("CallRuntimeCallOther");
	}
}

Thanks to the changes we have made, Ghidra will call the getPayload method of our V8_PcodeInjectLibrary class every time we try to decompile the CallRuntimeCallOther instruction. Let’s create this method, which, if there is an instruction in the list of implemented operations, will create an instance of the V8_InjectCallVariadic class (we will implement this class a little later) and return it.

@Override
	/**
	* This method is called by DecompileCallback.getPcodeInject.
	*/
	public InjectPayload getPayload(int type, String name, Program program, String context) {
		if (type == InjectPayload.CALLMECHANISM_TYPE) {
			return null;
		}

		if (!implementedOps.contains(name)) {
			return super.getPayload(type, name, program, context);
		}

		V8_InjectPayload payload = null; 
		switch (name) {
		case ("CallRuntimeCallOther"):
			payload = new V8_InjectCallVariadic("", language, 0);
			break;
		default:
			return super.getPayload(type, name, program, context);
		}

		return payload;
	}

P-Code Generation

The dynamic generation of p-code will be implemented in the V8_InjectCallVariadic class. Let’s create it and describe the operation types.

package v8_bytecode;

import …

public class V8_InjectCallVariadic extends V8_InjectPayload {

public V8_InjectCallVariadic(String sourceName, SleighLanguage language, long uniqBase) {
		super(sourceName, language, uniqBase);
	}
// Operation types. In this example, we are looking at RUNTIMETYPE
	int INTRINSICTYPE = 1;
	int RUNTIMETYPE = 2;
	int PROPERTYTYPE = 3;

	@Override
	public PcodeOp[] getPcode(Program program, InjectContext context) {
			}

	@Override
	public String getName() {
		return "InjectCallVariadic";
	}

}

It’s not hard to guess that we need to develop our implementation of the getPcode method. First, we will create a pCode object instance of the V8_PcodeOpEmitter class. This class will help us create p-code instructions (we will learn more about them later).

V8_PcodeOpEmitter pCode = new V8_PcodeOpEmitter(language, context.baseAddr, uniqueBase);

Then, we can get the address of the instruction from the context argument (the context of the code injection), which we’ll find useful later.

Address opAddr = context.baseAddr;

Using this address will help us get the object of the current instruction:

Instruction instruction = program.getListing().getInstructionAt(opAddr);

Using the context argument, we’ll also get argument values that we described earlier in SLEIGH.

Integer funcType = (int) context.inputlist.get(0).getOffset();
Integer receiver = (int) context.inputlist.get(1).getOffset();

Now we implement instruction processing and p-code generation:

// check instruction type
if (funcType != PROPERTYTYPE) {
// we get kRuntimeId — the index of the called function
			Integer index = (int) instruction.getScalar(0).getValue();
// generate p-code to call the cpool instruction using the pCode object of the V8_PcodeOpEmitter class. We will focus on this in more detail below.
			pCode.emitAssignVarnodeFromPcodeOpCall("call_target", 4, "cpool", "0", "0x" + opAddr.toString(), index.toString(), 
					funcType.toString());
		}
...


// get the “register range” argument
Object[] tOpObjects = instruction.getOpObjects(2);
// get caller args count to save only necessary ones
Object[] opObjects;
Register recvOp = null;
if (receiver == 1) {
...
}
else {
opObjects = new Object[tOpObjects.length];
System.arraycopy(tOpObjects, 0, opObjects, 0, tOpObjects.length);
}


// get the number of arguments of the called function
try {
	callerParamsCount = program.getListing().getFunctionContaining(opAddr).getParameterCount();
}
catch(Exception e) {
	callerParamsCount = 0;
}

// store old values of the aN-like registers on the stack. This helps Ghidra to better detect the number of arguments of the called function
Integer callerArgIndex = 0;
for (; callerArgIndex < callerParamsCount; callerArgIndex++) {
	pCode.emitPushCat1Value("a" + callerArgIndex);
}

// store the arguments of the called function in aN-like registers
Integer argIndex = opObjects.length;
for (Object o: opObjects) {
	argIndex--;
	Register currentOp = (Register)o;
	pCode.emitAssignVarnodeFromVarnode("a" + argIndex, currentOp.toString(), 4);
}

// function call
pCode.emitVarnodeCall("call_target", 4);

// restore old register values from the stack
while (callerArgIndex > 0) {
	callerArgIndex--;
	pCode.emitPopCat1Value("a" + callerArgIndex);
}

// return an array of p-code operations
return pCode.getPcodeOps();

Let’s now look at the logic of the V8_PcodeOpEmitter class, which is largely based on a similar module class for JVM. This class generates p-code operations using a number of methods. Let’s take a look at them in the order in which they are addressed in our code.

emitAssignVarnodeFromPcodeOpCall(String varnodeName, int size, String pcodeop, String… args)

To understand how this method works, we’ll first consider the concept of Varnode — a basic element of p-code, which is essentially any variable in p-code. Registers, local variables — they are all Varnode.

Back to the method. This method generates p-code to call the pcodeop function with the args arguments and stores the result of the function in varnodeName. The result is:

varnodeName = pcodeop(args[0], args[1], …);

emitPushCat1Value(String valueName) and emitPopCat1Value (String valueName)

Generates p-code for analogous push and pop assembler operations with Varnode valueName.

emitAssignVarnodeFromVarnode (String varnodeOutName, String varnodeInName, int size)

Generates p-code for a value assignment operationvarnodeOutName = varnodeInName.

emitVarnodeCall (String target, int size)

Generates p-code for the target function call.

Conclusion

Thanks to the p-code injection mechanism, we have managed to significantly improve the output of the Ghidra decompiler. As a result, dynamic generation of p-code is now yet another building block in our considerable toolkit — a module for analyzing Node.js scripts compiled by bytenode. The module source code is available in our repository on github.com. Happy reverse engineering!

Many thanks to my colleagues for their research into the features of Node.js and for module development: Vladimir Kononovich, Natalia Tlyapova, and Sergey Fedonin.

Author