Hybrid Concolic Execution, Part 1 (Background)

GrammaTech is heavily invested in technologies that improve the security and reliability of software. Our research is advancing the state of the art in various program analysis techniques, so there’s a lot of exciting stuff we do that makes my job awesome.

In my experience, though, software research rarely results in individual earth-shattering algorithmic breakthroughs. Rather, like research in other fields, it’s often slow and iterative—a process of fitting together small innovations until the incremental progress adds up to something useful.

In this series of posts, I’m going to provide a quick look at one of those ideas: a maybe-not-momentous-but-quite-necessary improvement we made to Grace, GrammaTech’s concolic engine.

Fuzzing

The volume of software vulnerabilities that researchers are discovering on a monthly basis should be a good reminder that none of us are testing our software well enough. The tests we do get around to implementing are often designed to detect regressions, not uncover unknown behaviors. Finding those sorts of things isn’t easy or fun (well, sometimes), so many developers have turned to automation.

One common approach for automated testing is fuzzing: subjecting a program to a stream of random data. This can range from using completely random data, to taking valid inputs (e.g., a valid PDF file) and mutating portions of them to generate structured, but still random, inputs based on some form of a specification, like a context-free grammar.

Fuzzing has been used with success by both white and black hat communities, but it’s not always possible to fuzz a program effectively. For one thing, when you’re working with an arbitrary binary, figuring out what inputs it expects can be very challenging. Plus, random data may fail to trigger complex control flow decisions deep within the program’s logic.

Concolic Execution

A more deliberate approach is concolic testing, which combines concrete and symbolic execution of the program and employs an SMT solver to generate program inputs (command line, file, network, etc.).

Consider this example from Wikipedia:

void f(int x, int y)

{

int z = 2*y;

if (x == 100000) {

if (x < z) {

assert(0);

}

}

}

Without additional information, using fuzzing to find values of x and y that satisfy the two ‘if’ conditions in the example above will take an unreasonable amount of time, if it ever even succeeds at all.

Grace and other concolic engines tackle this more methodically. The technique begins with a concrete execution of the program on an arbitrary input, building up a symbolic representation of the path as it runs. The symbolic path representation is essentially a set of logical constraints on the input data. The constraints make sure that any input that satisfies them will make the program follow the observed execution path.

Concolic execution example

On the following runs, we force the program to take new paths by systematically negating the constraints that correspond to branch conditions and feeding the modified constraint sets to the SMT solver. The models for satisfiable constraint sets are used to derive additional program inputs. E.g., if on the first run x ≠ 100000, we ask the solver to find an input that results in a value of x that does equal 100000. The new input is fed into another run where the process is repeated. You can follow the full example on Wikipedia.

On my machine, Grace takes about 2 seconds to find inputs that cover every path of that function, in a small program that gets the values of x and y via the command line. Unfortunately, useful programs aren’t that simple, so those 2 seconds grow exponentially, even for relatively straightforward tools like coreutils (the basic *nix commands you’re used to like ls and cp). There are things we do to cut down this complexity, such as modeling the behavior of commonly-used library functions, but we’re still left with a lot of branches.

And if command-line programs are complicated affairs, what about programs with a GUI? A user can use a mouse to click anywhere on the interface. On Linux, even trivial X programs link in over half a dozen libraries containing untold numbers of branches.

X is a networked windowing system. We can run Grace on X-enabled programs and hope it eventually assembles meaningful commands out of low-level network traffic, but it would likely spend an enormous amount of time exploring uninteresting library code that parses X packets.

We could model GUI actions at a high level, but it’s not obvious how one would go about symbolically modeling things like mouse movements and clicks and relate them to GUI control elements. Our experience from modeling file I/O informs us that it can be difficult and time-consuming to build and debug this sort of thing.

What we came up with was a hybrid approach: Grace would still generate command line, file, and network inputs with the solver, but GUI actions would be produced randomly.

To be continued in Part 2...

Part 2 will explore how Grace handles GUI applications and how effectively it does so. Come back next week for the rest.