A beginner’s guide to static program analysis using Soot

In this blog post, I will show you an example that uses Soot to provide some insights about a Java program. This post is for people who know Java and want to do some static analysis in practice, but who know nothing about Soot or the theory of static analysis. The repository that contains the example can be found at https://github.com/noidsirius/SootTutorial.

The Soot Tutorial Series

If you already have some knowledge of static program analysis, I suggest you learn Soot from here. I would appreciate any feedback, since I would like to keep writing about Soot and static analysis in the future.

Why another tutorial for Soot? (or how I motivated myself to write this post)

According to the authors of A Survivor’s Guide to Java Program Analysis with Soot, “Soot is a large framework which can be quite challenging to navigate without getting quickly lost”, which suggests Soot has a steep learning curve. They were right; I got lost very quickly. Most Soot guides assume the reader is familiar with the theoretical side of static program analysis, such as lattices or flow functions, which I had no prior knowledge of. Moreover, these guides try to explain everything in Soot, including parts that, in my opinion, most Soot users will never need.

One way or another I learned Soot, or at least some basic parts of it, but I realized those tutorials are not suitable for people with no background in static analysis. As a result, I decided to write this blog post to introduce Soot and static analysis using very simple (but working) examples. I relied heavily on the Soot wiki and A Survivor’s Guide to Java Program Analysis with Soot to write this post.

Analyze FizzBuzz Statically

https://github.com/noidsirius/SootTutorial/blob/master/demo/HelloSoot/FizzBuzz.java

The input of BranchDetectorAnalysis is a Java method (printFizzBuzz), and the output is the set of statements that branch the execution of the code (lines 4, 6, and 8 of FizzBuzz.java). Note that line 10 is not considered a branch statement, since its condition is implicitly determined by line 8.
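To make the idea concrete, here is a minimal sketch of a method with the same shape as printFizzBuzz. It is my assumption about the general structure, not a copy of the repository file (follow the link above for the real one). The class name FizzBuzzSketch and the helper classify are hypothetical, and the line numbers quoted above refer to the original file, not to this sketch. The comments mark the three conditions a branch detector would be expected to report.

```java
// Hypothetical stand-in for the tutorial's FizzBuzz class (not the repository file).
class FizzBuzzSketch {

    // Returns the text instead of printing it, so the result is easy to check.
    static String classify(int k) {
        if (k % 15 == 0) {          // branch 1
            return "FizzBuzz";
        } else if (k % 5 == 0) {    // branch 2
            return "Buzz";
        } else if (k % 3 == 0) {    // branch 3
            return "Fizz";
        } else {                    // no new condition here: decided by branch 3
            return Integer.toString(k);
        }
    }

    static void printFizzBuzz(int k) {
        System.out.println(classify(k));
    }

    public static void main(String[] args) {
        for (int i = 1; i <= 15; i++) {
            printFizzBuzz(i);
        }
    }
}
```

The final else does not test anything new, which is why only three statements count as branches.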

Alright! Let’s actually run this toy analysis with Soot. To get started, clone the SootTutorial repository onto your machine.

git clone git@github.com:noidsirius/SootTutorial.git

This repository contains the code that we will use throughout this post. You can open it with IntelliJ IDEA or just use Gradle in a command-line terminal. Please make sure that your Java version is not higher than 8, since the current version of Soot does not yet support JPMS, the Java Platform Module System (you can use jEnv to manage different Java versions).
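If you are not sure which Java version is active, one way to check from Java itself is sketched below. The class JavaVersionCheck and the method majorVersion are hypothetical helpers of my own, not part of the SootTutorial repository. The sketch relies only on the standard java.specification.version system property, which reads "1.8" on Java 8 and a plain number such as "11" on later releases.

```java
// Hypothetical helper, not part of the SootTutorial repository.
class JavaVersionCheck {

    // "1.8" -> 8 (scheme used up to Java 8), "11" -> 11 (scheme used from Java 9 on).
    static int majorVersion(String specVersion) {
        String[] parts = specVersion.split("\\.");
        if (parts[0].equals("1") && parts.length > 1) {
            return Integer.parseInt(parts[1]);
        }
        return Integer.parseInt(parts[0]);
    }

    public static void main(String[] args) {
        String spec = System.getProperty("java.specification.version");
        int major = majorVersion(spec);
        if (major > 8) {
            System.out.println("Java " + major + " detected; this tutorial expects Java 8 or lower.");
        } else {
            System.out.println("Java " + major + " detected; fine for this tutorial.");
        }
    }
}
```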

In order to verify the project is set up correctly, run the tests:

cd SootTutorial
./gradlew check

If everything goes well, you can run the analysis with:

./gradlew run --args="HelloSoot"

The output should be the signature of printFizzBuzz, the variables holding its argument and this reference, the body of printFizzBuzz in Jimple, and finally the branch statements in the method. Now we will review how Soot produces this information and what Jimple is. Please note that the main analysis method can be found in dev.navids.soottutorial.HelloSoot.java.

Setup Soot

Method body retrieval

Fundamental classes in Soot

Scene is a singleton class that keeps all classes, each of which is represented by a SootClass. Each SootClass may contain several methods (SootMethod), and each method may have a Body object that keeps its statements (Units). So, after setting up Soot, we can access these objects via the Soot API. The code snippet below gets the SootClass of FizzBuzz, finds the printFizzBuzz method, and finally retrieves its JimpleBody, which contains the statements of the method.

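// Look up the FizzBuzz class that Soot has loaded into the Scene
SootClass mainClass = Scene.v().getSootClass("FizzBuzz");
// Find the method by name (this works only if the name is not overloaded)
SootMethod sm = mainClass.getMethodByName("printFizzBuzz");
// Retrieve the method's body; the cast works because the default IR is Jimple
JimpleBody body = (JimpleBody) sm.retrieveActiveBody();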

But what is Jimple?

There is nothing implicit in Jimple. For example, this is represented as r0, which is a Local object (the data structure for variables in Soot), and the argument of the function is explicitly defined as i0 with type int. Each line represents a Unit (or Stmt, since the default IR is Jimple). There are 15 different types of Stmt in Jimple, but in BranchDetectorAnalysis we are interested in only one of them: JIfStmt. Here is the code that prints branch statements:

for (Unit u : body.getUnits()) {
    if (u instanceof JIfStmt)
        System.out.println(u.toString());
}

body.getUnits() returns the list (or, more precisely, the Chain) of units in the body of printFizzBuzz. We simply iterate over these units and print those that are instances of JIfStmt, which are lines 4, 9, and 14 of the Jimple body.

Control-Flow Graph

./gradlew run --args="HelloSoot draw"
Control-Flow Graph of printFizzBuzz

Here you can see that there are four possible paths from the start of the method to its end, and that the three branch statements are colored in blue. These paths represent the numbers divisible by 3, by 5, by 15, or by none of them.
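The path count can be checked with a small toy model that does not need Soot at all. The sketch below is my own illustration: the class CfgPathCount, the node labels, and the order in which the conditions are tested are all assumptions, and the graph is written by hand, not produced by Soot. It stores the successors of each node in a map, counts the paths from the first condition to the return with a depth-first search, and counts the nodes with more than one successor.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy control-flow graph. Node labels are hypothetical, not Soot output.
class CfgPathCount {

    // Successor lists for a graph shaped like the printFizzBuzz CFG described above.
    static Map<String, List<String>> buildGraph() {
        Map<String, List<String>> succ = new HashMap<>();
        succ.put("if15", Arrays.asList("printFizzBuzz", "if5"));
        succ.put("if5", Arrays.asList("printBuzz", "if3"));
        succ.put("if3", Arrays.asList("printFizz", "printNumber"));
        for (String print : Arrays.asList("printFizzBuzz", "printBuzz", "printFizz", "printNumber")) {
            succ.put(print, Collections.singletonList("return"));
        }
        succ.put("return", Collections.<String>emptyList());
        return succ;
    }

    // Counts distinct paths from node to exit with a depth-first search.
    // This terminates only because the toy graph has no cycles.
    static int countPaths(Map<String, List<String>> succ, String node, String exit) {
        if (node.equals(exit)) {
            return 1;
        }
        int total = 0;
        for (String next : succ.get(node)) {
            total += countPaths(succ, next, exit);
        }
        return total;
    }

    // A branch node is a node with more than one successor.
    static int countBranches(Map<String, List<String>> succ) {
        int branches = 0;
        for (List<String> next : succ.values()) {
            if (next.size() > 1) {
                branches++;
            }
        }
        return branches;
    }

    public static void main(String[] args) {
        Map<String, List<String>> cfg = buildGraph();
        System.out.println("paths: " + countPaths(cfg, "if15", "return"));
        System.out.println("branches: " + countBranches(cfg));
    }
}
```

Running it prints 4 paths and 3 branches, matching the figure. A method with a loop would need a different approach, since a plain depth-first count never finishes on a cyclic graph.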

Conclusion

I’m a Ph.D. student in Software Engineering at UCI. I like to automate things and play music.
