
Adding C++, Python, Java, and C# Bindings for the CodeSonar API (Part 1)

This is the first in a series of posts about adding additional language bindings for the CodeSonar API.

[Read the first part | second part | third part | fourth part | fifth part]


Example #1

Get the procedure containing a program point.

// C
cs_result r;
cs_pdg container;
r = cs_pdg_vertex_pdg( pnt, &container );
if( r != CS_SUCCESS ) abort();

// C++, Java, and C#
procedure container = pnt.get_procedure();

# Python
container = pnt.get_procedure()


Around March of 2013, I started working on an evening project: port the CodeSonar (and CodeSurfer) API to some more popular and higher-level languages. It took a little longer than I had hoped, but as of July 2013, I had created C++, Python, Java, and C# ports of the API, using a tool called SWIG to do much of the work for Python, Java, and C#.

The new APIs appeared in CodeSonar 4, and are currently beta features. They should be substantially easier to use than the C or Scheme APIs, provided that the scarcity of human-authored documentation doesn’t scare people off too much. In any case, feel free to email us questions and comments about these APIs — that’s why they are in beta.

What does the CodeSonar API do?

Using the CodeSonar API, users can author custom software analyses. These analyses might detect violations of simple rules (e.g., never take the address of the variable xyz). More ambitious users might, for instance, attempt to solve the halting problem. Historically, the CodeSonar API has been exposed in C and Scheme, but as of CodeSonar 4, Python, C++, Java, and C# are also supported.

The bulk of the API is concerned with providing an interface to the intermediate representation (IR) of the program: files, procedures, program points, ASTs, and so on.

Custom analyses are especially powerful for enforcing codebase-specific rules. For example, Boston Scientific uses a custom CodeSonar analysis to detect concurrency issues in their medical devices. Another organization has a plugin to detect incorrect exception propagation and logging. The possibilities are wide-ranging.

History

Around 1999, GrammaTech was working on something that would eventually be named CodeSurfer. It was marketed primarily as a program slicing tool and code browser, but also made a good program analysis platform for C/C++ software. It parses code and provides a programmatic interface to CFGs, ASTs, and various other IR elements.

CodeSurfer needed a UI, and GrammaTech wanted to implement it in something higher level than C. Tcl/Tk was popular at the time, but the company consists largely of programming-language snobs who knew better than to use Tcl. GrammaTech selected Scheme Tk (STk), which is exactly what it sounds like. In order to implement the UI, GrammaTech created a Scheme interface for accessing the intermediate representation.

This Scheme interface doubled as the CodeSurfer public API. Scheme is a concise, dynamically typed, functional language. Eventually, though, it became evident that Scheme should not be the sole interface to the IR, for several reasons:

  • STk is orders of magnitude slower than equivalent C, which is painful for program analyses.
  • Scheme is unpopular. Most people are not about to learn a language just to program our API.
  • STk is no longer maintained (except for changes I make to our internal version).
  • The library support is poor compared to more popular languages (or other Scheme implementations, for that matter).
  • Dynamic typing can be unwieldy for large projects with many contributors.

In the mid-2000s, I came to the conclusion that CodeSonar should not be chiefly implemented in Scheme against the Scheme API. So we created a C API, but of course C has some of its own issues. The Scheme API was rewritten as a client of the C API and we fixed a number of latent bugs and inconsistencies along the way. CodeSonar was implemented primarily as a client of CodeSurfer’s C API. The CodeSurfer API is a subset of the CodeSonar API.

So why C and not C++?

C++ still isn’t a reasonable base language for APIs because you can’t count on every compiler using a standard ABI; different compilers, and sometimes different versions of the same compiler, do their own thing. In theory, efforts like the Itanium (IA-64) C++ ABI will help with this someday. Today, we can create portable C++ APIs as veneer layers above C APIs. The other issue with C++ is that it is complicated, and not everyone loves it. I have mixed feelings about it myself.
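To make the veneer idea concrete, here is a minimal sketch of the pattern. It is not the CodeSonar API: every toy_ name, the toy namespace, and the rule that maps points to procedures are made up for illustration. The C functions stand in for the compiled library, and the C++ classes are meant to live entirely in a header, so they are built by the client's own compiler and only the C ABI crosses the library boundary.

```cpp
#include <stdexcept>
#include <string>

// ---- Hypothetical C API (stand-in for a compiled C library) ----
extern "C" {
typedef enum { TOY_SUCCESS = 0, TOY_ERROR_INVALID = 1 } toy_result;
typedef struct toy_point { int id; } toy_point;
typedef struct toy_procedure { int id; } toy_procedure;

// Reports the procedure containing a point through an out-parameter.
// Toy rule (an assumption): points 0..99 belong to procedure 0, and so on.
toy_result toy_point_procedure(toy_point pnt, toy_procedure *out) {
    if (pnt.id < 0 || out == nullptr) return TOY_ERROR_INVALID;
    out->id = pnt.id / 100;
    return TOY_SUCCESS;
}
}

// ---- Hypothetical header-only C++ veneer over the C API ----
namespace toy {

// Carries the C error code so callers can still inspect it.
class error : public std::runtime_error {
public:
    explicit error(toy_result code)
        : std::runtime_error("toy API call failed with code " +
                             std::to_string(static_cast<int>(code))),
          code_(code) {}
    toy_result code() const { return code_; }
private:
    toy_result code_;
};

class procedure {
public:
    explicit procedure(toy_procedure raw) : raw_(raw) {}
    int id() const { return raw_.id; }
private:
    toy_procedure raw_;
};

class point {
public:
    explicit point(toy_point raw) : raw_(raw) {}

    // Turns the "result code plus out-parameter" C idiom into a
    // return value, throwing toy::error on failure.
    procedure get_procedure() const {
        toy_procedure out = {0};
        toy_result r = toy_point_procedure(raw_, &out);
        if (r != TOY_SUCCESS) throw error(r);
        return procedure(out);
    }
private:
    toy_point raw_;
};

}  // namespace toy
```

With a wrapper like this, client code shrinks to something like toy::point p(raw); followed by p.get_procedure(), which is the same shape as the C++ line in Example #1. Throwing on failure is one design choice among several; a veneer could just as well return an optional or expose the result code directly.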

Continue to the next post to read about the challenges faced by clients of the C API.