Improving Static Analysis Around Binary Libraries



INTRODUCTION:

Many software projects rely on third-party code, system libraries, and binary code reused from other projects. Advanced static analysis tools reason about the program as a whole, so when library content is not available the analysis can produce both false positives and false negatives. The cause is that most static analysis tools have no knowledge of the logic inside the binary libraries. Much more accurate results can be achieved by using binary static code analysis to build a model of the logic in the libraries and then analyzing that model in conjunction with the source. This post discusses extending static analysis beyond the source/binary barrier, explains the benefits of doing so, and introduces a variant of CodeSonar, CodeSonar/Libraries.



Whole Program (Source) Analysis

Advanced static analysis tools differentiate themselves in the way they analyze an entire program or application. Having the full scope of source available means better results: fewer false positives (warnings from the tool that are incorrect) and fewer false negatives (real errors missed by the tool). With this approach, data and control flow can be traced from one compilation unit to another, which enables tainted data analysis, for example. However complete the source is, pieces of the analysis are still missing: the binary code in libraries, operating system calls, and any other binary objects linked into the final executable.
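As a rough illustration of cross-unit tainted data flow, the sketch below traces an externally supplied length from a parsing function to a memcpy() call in a different function. All names here (read_length_field, copy_payload, process_packet) are hypothetical and exist only for illustration; in a real project the two helpers would typically live in separate source files. Removing either bounds check would produce the kind of defect that can only be seen by following the value across both functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Taint source (imagine this in input.c): the length comes from
 * untrusted packet data, so its value is attacker-controlled. */
static size_t read_length_field(const char *header)
{
    return (size_t)strtoul(header, NULL, 10);
}

/* Taint sink (imagine this in copy.c): the tainted length reaches memcpy(). */
static int copy_payload(char *dst, size_t dst_size,
                        const char *src, size_t claimed_len)
{
    assert(dst != NULL && src != NULL);
    if (claimed_len > dst_size)   /* without this check: buffer overflow */
        return -1;
    memcpy(dst, src, claimed_len);
    return 0;
}

/* Ties source and sink together; a whole-program analysis follows the
 * value of n from read_length_field() into copy_payload(). */
int process_packet(const char *header, const char *payload, size_t payload_size)
{
    char local[16];
    size_t n = read_length_field(header);

    if (n > payload_size)         /* without this check: out-of-bounds read */
        return -1;
    return copy_payload(local, sizeof local, payload, n);
}
```

An analysis that looks at copy_payload() alone cannot tell whether claimed_len is trustworthy; only by connecting it to read_length_field() does the tool learn that the value is tainted.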

Missing Components

Third-party code is a fact of life in embedded systems. A recent report from VDC pointed out the growth of third-party source in embedded systems. In fact, open source, commercial off-the-shelf, and other third-party code accounts for as much as 50% of the code in modern projects.


Figure 1: A graph showing the distribution of code origin for different classes of projects. Source “Software Assembly Practices Necessitate More Precautions” – VDC Research, 2016.

Given such high ratios of third-party software in embedded projects, it follows that many of the dependencies required for complete static analysis are missing. Without this detail, tools have to make assumptions about library behavior that might be incorrect or incomplete. Consider the following code example, in which the function handlePacket() is part of a binary third-party library linked to the application:

packetT * myPacket;
myPacket = malloc(sizeof(packetT));
if (handlePacket(myPacket) == -1)
    return -1;
 

Analyzing this from the tool’s perspective raises many questions. Does the library own the lifecycle of the myPacket object, or does the programmer have to free the memory? Are the contents of myPacket guaranteed to be initialized after the library call? Here’s another example using libexpat (a streaming XML parser):

XML_Parse(p, Buff, len, done);

A static analysis tool doesn’t understand the details of this library, so it has to make assumptions: Will the library check for null pointers? Can the library overflow Buff?
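Returning to the handlePacket() example, the sketch below shows why the ownership question matters. It defines two hypothetical versions of the library function, one that borrows the packet and one that takes ownership of it, along with a caller that is correct for each. The packetT layout and both function bodies are assumptions made for illustration; the real library's behavior is exactly what a source-only analysis cannot see.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct { int id; int length; } packetT;  /* hypothetical layout */

/* Hypothetical contract A: the library borrows the packet and
 * initializes it; the caller remains responsible for freeing it. */
int handlePacket_borrows(packetT *p)
{
    if (p == NULL)
        return -1;
    p->id = 1;
    p->length = 0;
    return 0;
}

/* Hypothetical contract B: the library takes ownership and frees it. */
int handlePacket_owns(packetT *p)
{
    if (p == NULL)
        return -1;
    free(p);
    return 0;
}

/* Correct under contract A; the final free() would be a double free
 * if the library actually followed contract B. */
int caller_for_borrowing_library(void)
{
    packetT *myPacket = malloc(sizeof(packetT));
    if (myPacket == NULL)
        return -1;
    if (handlePacket_borrows(myPacket) == -1) {
        free(myPacket);
        return -1;
    }
    assert(myPacket->length == 0);   /* contents were initialized */
    int id = myPacket->id;
    free(myPacket);
    return id;
}

/* Correct under contract B; the missing free() would be a leak
 * if the library actually followed contract A. */
int caller_for_owning_library(void)
{
    packetT *myPacket = malloc(sizeof(packetT));
    if (myPacket == NULL)
        return -1;
    return handlePacket_owns(myPacket);
}
```

Both callers look equally plausible from the source alone. A tool that guesses the wrong contract either reports a leak or double free that does not exist (a false positive) or misses one that does (a false negative).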

Increasing Analysis Depth

Although it’s possible to build support for well-known library functions into static analysis tools, it’s impossible to extend this to the general case. However, increasing the analysis depth into these external libraries improves the tools’ ability to make sound decisions about bugs and security vulnerabilities.
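One way to picture built-in support is as a table of hand-written summaries, one per known function. The sketch below is a simplified illustration, not how CodeSonar or any particular tool represents its models; the lib_summary structure and lookup_model() function are hypothetical. The three entries reflect documented behavior of the standard C functions malloc, strlen, and getenv.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-function summary an analyzer might consult. */
typedef struct {
    const char *name;
    int may_return_null;          /* result must be checked before use */
    int caller_must_free_result;  /* caller owns the returned memory */
    int dereferences_arg0;        /* passing NULL as first argument is an error */
} lib_summary;

static const lib_summary known_models[] = {
    { "malloc", 1, 1, 0 },
    { "strlen", 0, 0, 1 },
    { "getenv", 1, 0, 1 },
};

/* Returns the summary for a known function, or NULL when no
 * hand-written model exists. */
const lib_summary *lookup_model(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < sizeof known_models / sizeof known_models[0]; i++) {
        if (strcmp(known_models[i].name, name) == 0)
            return &known_models[i];
    }
    return NULL;
}
```

A lookup for handlePacket returns NULL, which is the general-case problem: every third-party function without a hand-written entry leaves the tool guessing unless it can derive the summary from the binary itself.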

Introducing CodeSonar/Libraries

GrammaTech CodeSonar/Libraries is unique in its ability to analyze source and binary code at the same time. Building on GrammaTech’s binary code analysis, CodeSonar/Libraries enables hybrid analysis of both source and binary objects in a project. In addition, it can handle any type of binary library, including stripped and optimized binaries and those without debug information. The details gleaned from the binaries are included in the whole-program model, greatly increasing the precision of the results.

The focus of CodeSonar/Libraries is to find and report errors in your source code. It increases accuracy by understanding library logic and deepening the whole-program analysis. It doesn’t report errors that originate in the library itself, on the assumption that the library vendor has tested it properly.

CodeSonar for Binaries goes one step further for people who are concerned about the content of the libraries themselves. It reports problems in the binaries so that you can work with the library vendor to correct them.

Both products are available from GrammaTech today, and a free evaluation version is available.


CONCLUSION:

Advanced static analysis tools can perform high-fidelity analysis on source but are limited when code paths venture into external code for which source isn’t available. CodeSonar/Libraries extends static analysis into external binary objects to create a better program model, which in turn increases the precision of the static analysis results.


Want to learn more? Watch a recent webinar "Extending Static Analysis to Include Third Party Libraries" that introduces CodeSonar/Libraries and discusses the types of problems it can uncover in your projects.