A beginner’s guide to static program analysis using Soot

Navid Salehnamadi
Oct 27, 2019

In this blog post, I will show you an example that uses Soot to gain some insight into a Java program. This post is designed for people who know Java programming and want to do some static analysis in practice, but know nothing about Soot or the theory of static analysis. The repository that contains the example can be found at https://github.com/noidsirius/SootTutorial.

The Soot Tutorial Series

If you already have some knowledge of static program analysis, I suggest you learn Soot from here. Needless to say, I appreciate any feedback, since I would like to keep writing about Soot and static analysis in the future.

Why another tutorial for Soot? (or how I motivated myself to write this post)

A while ago, I wanted to analyze an Android app and instrument it for one of my course projects. I was told Soot is one of the best frameworks for analyzing Java and Android programs, for researchers and practitioners alike. So I decided to learn Soot as fast as I could. There were several resources for learning Soot; in particular, Soot’s wiki page listed some useful and easy tutorials.

According to the authors of A Survivor’s Guide to Java Program Analysis with Soot, “Soot is a large framework which can be quite challenging to navigate without getting quickly lost”, which suggests Soot has a steep learning curve. They were right; I got lost very quickly. Most of Soot’s guides assume the reader is familiar with the theoretical parts of static program analysis, such as lattices or flow functions, which I had no prior knowledge of. Moreover, in my opinion, these guides try to explain everything in Soot, much of which most Soot users will never need.

One way or another, I learned Soot (at least its basic parts), but I realized those tutorials are not suitable for people with no background in static analysis. So I decided to write this blog post to introduce Soot and static analysis using very simple (but working) examples. I relied heavily on the Soot wiki and A Survivor’s Guide to Java Program Analysis with Soot to write this post.

Analyze FizzBuzz Statically

Static program analysis, in its simplest form, is a black box that takes a program (code) as input and outputs some of its properties. For example, let’s say we are interested in finding all the branch statements in a method, and we call this analysis BranchDetectorAnalysis. To illustrate this example, I am going to use a trivial program: FizzBuzz. FizzBuzz prints each number from 1 to 100, but if the number is divisible by 3, 5, or 15, it prints Fizz, Buzz, or FizzBuzz, respectively, instead of the number. Here is a Java class that implements FizzBuzz.

https://github.com/noidsirius/SootTutorial/blob/master/demo/HelloSoot/FizzBuzz.java
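The linked file is not reproduced in this post. For readers following along without the repository, here is a minimal sketch of a FizzBuzz class with the structure described here: printFizzBuzz(int) chooses what to print for a single number using an if / else-if chain, checking 15 first so that multiples of 15 print FizzBuzz rather than Fizz. The fizzBuzz(int) driver and the main method are assumptions, and the repository’s file may differ in layout, so the line numbers in the next paragraph refer to the linked file, not to this sketch.

```java
// Sketch of a FizzBuzz class shaped like the one analyzed in this post.
// fizzBuzz(int) and main are assumed helpers; only printFizzBuzz is analyzed.
class FizzBuzz {

    // Three branch statements (the three conditions); the final else adds no new decision.
    public void printFizzBuzz(int k) {
        if (k % 15 == 0)
            System.out.println("FizzBuzz");
        else if (k % 5 == 0)
            System.out.println("Buzz");
        else if (k % 3 == 0)
            System.out.println("Fizz");
        else
            System.out.println(k);
    }

    // Prints the FizzBuzz sequence for 1..n.
    public void fizzBuzz(int n) {
        for (int i = 1; i <= n; i++)
            printFizzBuzz(i);
    }

    public static void main(String[] args) {
        new FizzBuzz().fizzBuzz(100);
    }
}
```

Compiling this class with javac produces a FizzBuzz.class of the kind that Soot reads from its classpath.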

In this case, the input of BranchDetectorAnalysis is a Java method (printFizzBuzz), and the output is the set of statements that branch the execution of the code (lines 4, 6, and 8). Note that line 10 is not considered a branch statement, since its condition is implicitly determined by line 8.

Alright! Let’s actually do this toy analysis with Soot. To get started, clone the SootTutorial repository onto your machine.

git clone git@github.com:noidsirius/SootTutorial.git

This repository contains the code we will use throughout this post. You can open it with IntelliJ IDEA or just use Gradle in a command-line terminal. Please make sure your Java version is not higher than 8, since the current version of Soot does not yet support JPMS (Java Platform Module System); you can use jEnv to manage different Java versions.

To verify that the project is set up correctly, run the tests:

cd SootTutorial
./gradlew check

If everything goes well, you can run the analysis with:

./gradlew run --args="HelloSoot"

The output should be the signature of printFizzBuzz, the local variables for its argument and this, the body of printFizzBuzz in Jimple, and finally the branch statements in the method. Next, we will review how Soot produces this information and what Jimple is. Note that the main analysis method can be found in dev.navids.soottutorial.HelloSoot.java.

Setup Soot

As I mentioned earlier, Soot is complex software with many configurable settings. So I won’t go through the details of the setup, except for the most important part: setting the Soot classpath. Soot considers all Java classes in this classpath as its input. In this example, the classpath is demo/HelloSoot, which contains FizzBuzz.class. For more information regarding this part, you can check this link out.

Method body retrieval

To run BranchDetectorAnalysis on printFizzBuzz, we have to retrieve its body, but first we need to locate the method. Soot has several data structures to represent the classes, methods, and statements of the input program.

Fundamental classes in Soot

Scene is a singleton class that holds all classes, each represented by a SootClass. Each SootClass may contain several methods (SootMethod), and each method may have a Body object that holds its statements (Units). So, after setting up Soot, we can access these objects through the Soot API. The code snippet below gets the SootClass of FizzBuzz, finds the printFizzBuzz method, and finally retrieves its JimpleBody, which contains the statements of the method.

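import soot.*;                  // Scene, SootClass, SootMethod
import soot.jimple.JimpleBody;
SootClass mainClass = Scene.v().getSootClass("FizzBuzz");    // class loaded during the Soot setup
SootMethod sm = mainClass.getMethodByName("printFizzBuzz");  // throws if several methods share this name
JimpleBody body = (JimpleBody) sm.retrieveActiveBody();      // Jimple is the default body type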

But what is Jimple?

Soot provides several intermediate representations (IRs) of Java programs to make static analysis more convenient. The default IR in Soot is Jimple (Java simple), which sits somewhere between Java source code and Java bytecode: Java source is easy for humans to read, while bytecode is suited to machines. Jimple is a statement-based, typed (every variable has a type), three-address (every statement involves at most three variables) intermediate representation. Running HelloSoot, as shown above, prints the representation of the printFizzBuzz method in Jimple.
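The three-address idea can also be illustrated in plain Java. The sketch below is hand-written, not Soot output: threeAddress splits each condition of the FizzBuzz logic into single-operation steps with explicit temporaries (t1, t2, t3 are made-up names), similar in spirit to the extra local variables Jimple introduces. Both methods return the text instead of printing it so they are easy to compare. Java has no goto, so the control flow stays structured here; only the expressions are flattened.

```java
// Hand-written illustration of the three-address idea (not Soot output).
// Both methods implement the FizzBuzz decision for one number and return the text.
class ThreeAddressDemo {

    // Ordinary Java: each condition nests two operations (% and ==) in one expression.
    static String direct(int k) {
        if (k % 15 == 0) return "FizzBuzz";
        if (k % 5 == 0) return "Buzz";
        if (k % 3 == 0) return "Fizz";
        return Integer.toString(k);
    }

    // Flattened: each assignment performs a single operation on at most two operands,
    // and each branch only tests a temporary against a constant.
    static String threeAddress(int k) {
        int t1 = k % 15;
        if (t1 == 0) return "FizzBuzz";
        int t2 = k % 5;
        if (t2 == 0) return "Buzz";
        int t3 = k % 3;
        if (t3 == 0) return "Fizz";
        return Integer.toString(k);
    }

    public static void main(String[] args) {
        for (int k = 1; k <= 15; k++) {
            System.out.println(k + ": " + threeAddress(k));
        }
    }
}
```

The two methods behave identically; the flattened form simply makes every intermediate value a named variable, which is what lets an analysis refer to each one.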

There is nothing implicit in Jimple. For example, this is represented as r0, a Local object (the data structure Soot uses for variables), and the method’s argument is explicitly defined as i0, whose type is int. Each line represents a Unit (more specifically a Stmt, since the default IR is Jimple). There are 15 different types of Stmts in Jimple, but in BranchDetectorAnalysis we are interested in only one of them: JIfStmt. Here is the code that prints branch statements:

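import soot.Unit;
import soot.jimple.internal.JIfStmt;

// Print every branch (if) statement in the body retrieved above
for (Unit u : body.getUnits()) {
    if (u instanceof JIfStmt)
        System.out.println(u.toString());
}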

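body.getUnits() returns the list (or, more precisely, the Chain) of units in the body of printFizzBuzz. We simply iterate over these units and print those that are instances of JIfStmt, which are lines 4, 9, and 14 of the Jimple output. Checking against the soot.jimple.IfStmt interface, which JIfStmt implements, would work as well.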

Control-Flow Graph

Branch statements control the flow of execution. All possible paths that may be executed in a method are represented by its Control-Flow Graph (CFG). Soot can build the CFG of a method through the UnitGraph class and its subclasses, such as ExceptionalUnitGraph. The image below visualizes the CFG of the printFizzBuzz method. You can draw this image by running

./gradlew run --args="HelloSoot draw"

Control-Flow Graph of printFizzBuzz

Here you can see that there are four possible paths from the start of the method to its end, and the three branch statements are colored in blue. These paths correspond to numbers divisible by 3, by 5, by 15, or by none of them.
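To make the path count concrete, here is a small hand-built model of that CFG in plain Java. It is not Soot’s UnitGraph: the node names are made up, and straight-line statements (such as the ones that define r0 and i0) are left out, so each node stands for a branch, a print, or the return. paths enumerates every entry-to-exit path by depth-first search, and branchNodes reports the nodes with more than one successor.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hand-built model of printFizzBuzz's CFG (made-up node names, not Soot objects).
// Each node maps to its successors, in the spirit of UnitGraph.getSuccsOf.
class CfgPaths {

    static Map<String, List<String>> fizzBuzzCfg() {
        Map<String, List<String>> g = new HashMap<>();
        g.put("if k%15==0", Arrays.asList("print FizzBuzz", "if k%5==0"));
        g.put("if k%5==0", Arrays.asList("print Buzz", "if k%3==0"));
        g.put("if k%3==0", Arrays.asList("print Fizz", "print k"));
        for (String p : Arrays.asList("print FizzBuzz", "print Buzz", "print Fizz", "print k")) {
            g.put(p, Collections.singletonList("return"));
        }
        g.put("return", Collections.<String>emptyList());  // exit node
        return g;
    }

    // Depth-first enumeration of all paths from node to an exit (a node with no successors).
    // Terminates only on acyclic graphs; a loop would make the set of paths unbounded.
    static List<List<String>> paths(Map<String, List<String>> g, String node) {
        List<List<String>> result = new ArrayList<>();
        List<String> succs = g.get(node);
        if (succs.isEmpty()) {
            List<String> end = new ArrayList<>();
            end.add(node);
            result.add(end);
            return result;
        }
        for (String next : succs) {
            for (List<String> tail : paths(g, next)) {
                tail.add(0, node);  // each tail is a fresh list, so prepending is safe
                result.add(tail);
            }
        }
        return result;
    }

    // Branch statements are the nodes with more than one successor.
    static List<String> branchNodes(Map<String, List<String>> g) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : g.entrySet()) {
            if (e.getValue().size() > 1) result.add(e.getKey());
        }
        Collections.sort(result);  // HashMap iteration order is unspecified
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> cfg = fizzBuzzCfg();
        for (List<String> path : paths(cfg, "if k%15==0")) {
            System.out.println(path);
        }
        System.out.println("branches: " + branchNodes(cfg));
    }
}
```

Starting from "if k%15==0", paths yields four paths, one per case above, and branchNodes finds the three if nodes. With Soot, the same questions could be asked of a real graph, for example new ExceptionalUnitGraph(body) together with getHeads(), getTails(), and getSuccsOf(unit); since this enumeration only terminates on acyclic graphs, a method containing a loop would need a different approach.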

Conclusion

In this post, I tried to show you how to use Soot to get some insight into a Java method. My primary goal was to show a working example (and provide its code and environment) that gives a sense of Soot’s basic building blocks without requiring knowledge of its complex configuration. I hope to write another blog post that does a real static analysis with Soot.

Written by Navid Salehnamadi

I’m a Ph.D. student in Software Engineering at UCI. I like to automate things and play music.