Know the basic tools in Soot

Apr 30, 2020

I finally found some time during the pandemic to be a little productive and write the second post in this series, which introduces Soot a little further. In this post, I’m going to show you some basic API methods and helper classes in Soot; treat it as a short reference list that future posts will build on. In the next post, I’ll show how to use these tools to instrument an Android app to log the executed methods without having access to the source code.

I’m not going to describe the various configuration options for Soot explicitly (at least in this post). Instead, the example code provides a “working” set of options, and I explain an option whenever it’s necessary. However, understanding these options is sometimes crucial for a correct analysis. If you want to know more about configuring Soot, this official doc is a good starting point. Besides that, I wrote comments on the options I used; you can find them in this repository: https://github.com/noidsirius/SootTutorial. For the rest of the post, I suggest you clone the repository and keep an eye on BasicAPI.java (which contains all the code snippets in this post) and Circle.java (the example code under analysis) while reading.

The example class for analysis

SootClass

Classes are the pivotal data structures when analyzing code in Soot. You can access a class through Scene (a singleton class that contains everything related to the code under analysis) by

SootClass circleClass = Scene.v().getSootClass("Circle")

where "Circle" is the name of the class. Note that if the class belongs to a package, the name should contain the package name as well, e.g., com.example.Circle. Each SootClass has a Type whose string representation is the same as the class name; it can be accessed by circleClass.getType().

The queried class does not always exist in Scene. If it doesn’t, getSootClass either returns a phantom class or throws an exception, depending on the Soot options. A phantom object is created artificially by Soot to continue the analysis even if some information is missing. This option is quite useful when you are interested only in the application classes (under analysis) and not in other libraries. If you want to make sure the queried class exists, you can retrieve it by getSootClassUnsafe(className, false) (the result is null if it doesn’t exist) or check whether the class is phantom using circleClass.isPhantom().

For more information, check the method reportSootClassInfo in BasicAPI.java or read SootClass java doc or Class Loading in Soot.

SootField and SootMethod

A class can contain fields and methods. You can retrieve a field of a class by providing its name and type. For example:

SootField radiusField = circleClass.getField("radius", IntType.v())

returns the field radius with type int. Note that primitive types such as int or double each have a singleton in Soot. Methods can be retrieved similarly; however, because of method overloading, it may be a little trickier. The figure below describes the different parts of a method signature.

If the name of a method is unique in its class, then you can retrieve the method using getMethodByName, for example circleClass.getMethodByName("getCircleCount"). Otherwise, as with the overloaded area methods, getMethodByName throws an exception since it cannot tell which method is requested. In that case, you can provide the subsignature and use circleClass.getMethod("int area(boolean)").
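To make the parts of a signature concrete, here is a minimal plain-Java sketch that splits a Soot-style signature string into its pieces. SignatureParts is a hypothetical helper I made up for illustration; it is not part of Soot, and it only handles the simple `<Class: ReturnType name(params)>` shape used in this post.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Soot): splits a Soot-style method
// signature such as "<Circle: int area(boolean)>" into its parts.
final class SignatureParts {
    // Expected shape: <DeclaringClass: ReturnType methodName(paramTypes)>
    private static final Pattern SIGNATURE =
            Pattern.compile("<([^:]+): (\\S+) ([^\\s(]+)\\(([^)]*)\\)>");

    final String declaringClass;
    final String returnType;
    final String name;
    final String parameterTypes;

    private SignatureParts(Matcher m) {
        declaringClass = m.group(1);
        returnType = m.group(2);
        name = m.group(3);
        parameterTypes = m.group(4);
    }

    static SignatureParts parse(String signature) {
        Matcher m = SIGNATURE.matcher(signature);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a method signature: " + signature);
        }
        return new SignatureParts(m);
    }

    // The subsignature drops the declaring class, e.g. "int area(boolean)".
    String subSignature() {
        return returnType + " " + name + "(" + parameterTypes + ")";
    }
}
```

For example, SignatureParts.parse("<Circle: int area(boolean)>").subSignature() gives back the string you would pass to getMethod.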

Modifier

Classes, methods, and fields carry information about their access modes, such as public, private, final, abstract, etc. This information is stored in an integer called the modifier and can be decoded using the class Modifier. For example, to check if a method is static, you can use Modifier.isStatic(method.getModifiers()).

For more information, check the methods reportSootMethodInfo and reportSootFieldInfo in BasicAPI.java or read the SootField, SootMethod, or Modifier Java docs.
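If the idea of packing access modes into one integer is new to you, the sketch below shows it with java.lang.reflect.Modifier from the Java standard library. Note the assumption: this is not Soot's own Modifier class, only a stand-in that follows the same bitmask idea, so you can run it without Soot on the classpath. ModifierDemo is a name I made up for the example.

```java
import java.lang.reflect.Modifier;

// Uses the JDK's java.lang.reflect.Modifier as a stand-in for Soot's
// Modifier class to show how a modifier bitmask is built and decoded.
final class ModifierDemo {
    // Renders a modifier bitmask as the keywords it encodes.
    static String describe(int modifiers) {
        return Modifier.toString(modifiers);
    }

    public static void main(String[] args) {
        // Each access mode is one bit; combine them with bitwise OR.
        int modifiers = Modifier.PUBLIC | Modifier.STATIC | Modifier.FINAL;

        System.out.println(Modifier.isStatic(modifiers));   // true
        System.out.println(Modifier.isAbstract(modifiers)); // false
        System.out.println(describe(modifiers));            // public static final
    }
}
```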

Body

Each method has a body that is represented in one of Soot’s Intermediate Representations (IR). We talked a little about Jimple, Locals, and Stmts in the previous blog post. Now I want to show these constructs, and some new ones, in the Jimple body of the method <Circle: int area(boolean)>:

Jimple representation of a method

Let’s walk through this Jimple a little bit (it’s better to first look at its corresponding Java code). The first line shows the subsignature of the method, and the following lines (3 to 7) define the Locals. Statements (Stmts) are in lines 8 to 24.

Each statement consists of some values (Value). For example, line 12 ($i4 = virtualinvoke r0.<Circle: int area()>();) is an assignment statement that consists of two values: $i4 is a Local value, and the right side of the assignment is an expression (Expr). I suggest you get familiar with the expression hierarchy; it can come in handy. The expression mentioned above is at once an InvokeExpr, an InstanceInvokeExpr, and a VirtualInvokeExpr.
References (Ref) capture the usages of methods or fields in a statement. For example, the right side of line 19 ($i2 = r0.<Circle: int radius>;) is a FieldRef that accesses a SootField (<Circle: int radius>). A FieldRef carries contextual information; for example, r0.<Circle: int radius> shows that the field is accessed on the object held by the local variable r0.
Traps (Trap) hold the information about exception handling (written as try/catch in Java). Line 24 shows a trap in the method. It defines the range covered by the try (from label1 to label2, or lines 10 to 13) and also determines which statement is responsible for handling the exception (label4, or line 15).

Switch

One naive way to determine the kind of a Stmt, Expr, or Ref is to use instanceof and compare it against all the options, e.g., if (stmt instanceof AssignStmt) …. However, Soot provides a more elegant way to do that: switches. A switch is an abstract class that distinguishes the different kinds of a Soot object by providing a dedicated method for each kind. I’ll show an example of a switch in the next section, as part of modifying code.

Modifying code

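You can use Soot as a compiler and alter the body of a method without having access to its source code. For example, the code below changes the target of an IfStmt (the new target of the statement is its next statement). If you want to be sure that the change didn’t make the code incorrect, you can validate the body. body.validate() checks a few sanity conditions on the content of the body, for example that no local is used before a value is assigned to it.

// stmt is a statement of the method; body is the method's Jimple body.
stmt.apply(new AbstractStmtSwitch() {
    // Called only when stmt is an IfStmt; other kinds of statements are ignored.
    @Override
    public void caseIfStmt(IfStmt stmt) {
        // Redirect the branch to the statement that directly follows the if.
        stmt.setTarget(body.getUnits().getSuccOf(stmt));
    }
});
// Throws an exception if the modified body violates a sanity condition.
body.validate();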
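If the apply/caseXxx mechanism looks like magic, here is a toy, Soot-free sketch of the same double-dispatch idea. Everything in it (ToyStmt, ToySwitch, ToyAssign, ToyIf, SwitchDemo) is a hypothetical stand-in I wrote for illustration; Soot's real classes have many more cases.

```java
import java.util.List;

// Toy statement hierarchy; hypothetical stand-ins, not Soot's Stmt classes.
interface ToyStmt {
    void apply(ToySwitch sw);
}

// One case method per statement kind, each a no-op by default,
// so callers override only the cases they care about.
abstract class ToySwitch {
    void caseAssign(ToyAssign stmt) {}
    void caseIf(ToyIf stmt) {}
}

final class ToyAssign implements ToyStmt {
    // Each statement knows its own kind and calls the matching case.
    public void apply(ToySwitch sw) { sw.caseAssign(this); }
}

final class ToyIf implements ToyStmt {
    public void apply(ToySwitch sw) { sw.caseIf(this); }
}

final class SwitchDemo {
    // Counts the if-statements without a single instanceof check.
    static int countIfs(List<ToyStmt> stmts) {
        final int[] count = {0};
        ToySwitch counter = new ToySwitch() {
            @Override
            void caseIf(ToyIf stmt) { count[0]++; }
        };
        for (ToyStmt s : stmts) {
            s.apply(counter);
        }
        return count[0];
    }
}
```

The design choice is the usual visitor trade-off: adding a new operation is just a new switch subclass, while the statement classes stay untouched.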

For more information, you can check the methods main, reportLocalInfo, doesInvokeMethod, modifyBody, and reportFieldRefInfo in BasicAPI.java or read the following docs from the official wiki of Soot: 1, 2, 3, and 4.

Call Graph

Nothing we have discussed so far tells us anything about inter-procedural information. The call graph is an essential construct in static analysis that captures a lot of inter-procedural data. In a call graph, nodes are methods, and a directed edge indicates that the source method may invoke the target method. In Soot, each edge is annotated with the invocation statement. The figure below shows the call graph of the methods in the Circle class.

There are some points regarding this call graph.

The <init> and <clinit> methods are the constructor and the static initializer of the class. As the name suggests, a static initializer initializes the static fields of a class.
The blue node is the constructor of the class Object. Circle’s constructor calls this method because Circle inherits from Object.
Soot artificially adds an edge for each static field access. As a result, the method int area() does not have an edge to void <clinit>, since it does not access any static field. Note that because of these artificial edges, you need to be careful when analyzing the call graph. For example, there is a cycle on the method void <clinit> in this graph, but no method is actually invoked in it.
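As a mental model for the points above, here is a toy call graph in plain Java. It is not Soot's CallGraph class; ToyEdge, ToyCallGraph, and the artificial flag are hypothetical names I chose to illustrate edges annotated with a statement and why artificial static-initializer edges should be filtered out when you only care about real invocations.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model, not Soot's CallGraph: each edge records the source method,
// the statement that triggers the edge, and the target method.
final class ToyEdge {
    final String source;
    final String stmt;
    final String target;
    final boolean artificial; // true for edges that are not real invocations

    ToyEdge(String source, String stmt, String target, boolean artificial) {
        this.source = source;
        this.stmt = stmt;
        this.target = target;
        this.artificial = artificial;
    }
}

final class ToyCallGraph {
    private final List<ToyEdge> edges = new ArrayList<>();

    void addEdge(String source, String stmt, String target, boolean artificial) {
        edges.add(new ToyEdge(source, stmt, target, artificial));
    }

    // All edges whose source is the given method.
    List<ToyEdge> edgesOutOf(String method) {
        List<ToyEdge> result = new ArrayList<>();
        for (ToyEdge e : edges) {
            if (e.source.equals(method)) {
                result.add(e);
            }
        }
        return result;
    }

    // Targets that the method really invokes, skipping artificial edges.
    List<String> explicitCallees(String method) {
        List<String> result = new ArrayList<>();
        for (ToyEdge e : edgesOutOf(method)) {
            if (!e.artificial) {
                result.add(e.target);
            }
        }
        return result;
    }
}
```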

To access the call graph, you can use Scene.v().getCallGraph(), and you can request an iterator over edges by source method, source statement, or target method. For example, callGraph.edgesOutOf(areaMethod) is an iterator over the edges whose source node is areaMethod. Do not forget to enable the cg pack and turn on whole-program mode. So, what is a pack? Let’s take a quick look at the internal constructs of Soot.

A small note on Soot’s internal constructs

According to Soot’s documentation: “Soot’s execution is divided in a set of different packs, and each pack contains different phases.” You can find more details in the link, but in summary, packs help Soot analyze code fast (intra- and inter-procedurally) and enable you to customize Soot to transform, optimize, and annotate code. In the next post, we will use a transformation pack to instrument an Android app.

Conclusion

In this post, I explained some basic API methods that I have found essential in using Soot. Knowing them is not enough for a “real” static analysis, but it helps you leverage Soot properly.

The Soot Tutorial Series