Adding C++, Python, Java, and C# Bindings for the CodeSonar API (Part 2)
December 5, 2014
This is the second in a series of posts about adding new language bindings for the CodeSonar API.
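Invoking a function that returns a string

In C, getting the pretty-printed text of an AST `node` takes two calls to cs_ast_pretty_print. The first call, with no buffer, reports how many bytes are needed. The second call fills a buffer of that size. Every status code must be checked, and the buffer must be freed afterwards. In the C++, Python, and Java bindings, the same task is a single method call.

C:

```c
char *buf = NULL;
size_t bn = 0;
cs_result r;

/* First call: no buffer, just learn how many bytes are needed. */
r = cs_ast_pretty_print( node, NULL, 0, &bn );
if( r == CS_TRUNCATED ) {
    buf = malloc( bn );
    if( !buf ) abort();
} else if( r != CS_SUCCESS ) {
    abort();
}

/* Second call: fill the buffer. */
r = cs_ast_pretty_print( node, buf, bn, &bn );
if( r != CS_SUCCESS ) abort();
printf( "ast pretty prints as %s\n", buf );
free( buf );
```

C++:

```cpp
std::cout << "ast pretty prints as " << node.pretty_print() << std::endl;
```

Python:

```python
print('ast pretty prints as ' + node.pretty_print())
```

Java:

```java
System.out.println("ast pretty prints as " + node.pretty_print());
```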
Using the C API correctly is hard
C makes it easy to do the wrong thing, and often you can get away with it. When you can't, the cause isn't always obvious: the program may, for instance, crash some time later, far from the actual mistake. This can be especially frightening when programming against an unfamiliar API.
Many of our C APIs resemble getrlimit: they take input parameters, return a status code, and write to an output parameter only on success. If you read the output parameter without checking the status code, you may be reading an uninitialized variable, which is undefined behavior. If you are lucky, the process crashes immediately. cs_ast_get_field, which retrieves a child of an AST, is an example of such a function:
```c
cs_ast_get_field( ...,
                  cs_ast_field *out_field );
```
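To make the hazard concrete, here is a minimal C++ sketch. `toy_get_child` and its status codes are hypothetical stand-ins, not part of the CodeSonar API. Like cs_ast_get_field, `toy_get_child` returns a status code and writes its output parameter only on success. The equally hypothetical wrapper `get_child` checks the status before reading the output, so a failure becomes an exception instead of a read of uninitialized memory.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical status codes for this sketch (not CodeSonar's).
enum toy_result { TOY_SUCCESS, TOY_NOT_FOUND };

// Hypothetical C-style lookup in the getrlimit style: it returns a status
// code and writes *out_child only on success. In this toy tree, node 0 has
// two children (nodes 1 and 2) and every other node is a leaf.
toy_result toy_get_child(int parent, int index, int *out_child) {
    assert(out_child != nullptr);
    if (parent != 0 || index < 0 || index > 1)
        return TOY_NOT_FOUND;  // *out_child is left unwritten
    *out_child = index + 1;
    return TOY_SUCCESS;
}

// C++ wrapper: the status is always checked, and the child is returned by
// value, so callers never touch an output parameter at all.
int get_child(int parent, int index) {
    int child;  // uninitialized, exactly as in typical C usage
    if (toy_get_child(parent, index, &child) != TOY_SUCCESS)
        throw std::out_of_range("node " + std::to_string(parent) +
                                " has no child at index " +
                                std::to_string(index));
    return child;  // read only after a successful call
}
```

With this shape, a caller who forgets to handle failure gets an exception at the point of the mistake rather than garbage, or a crash, somewhere later.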
Some APIs resemble getcwd: they return an array through an output parameter and fail if the caller's buffer is too small. Unlike getcwd, GrammaTech's API functions also always report, through a separate output parameter, how many bytes were needed. Sizes are always given in bytes for consistency. Because the byte is the smallest unit, unit confusion errs on the safe side: a caller who mistakes a byte count for an element count ends up allocating too much space or understating the buffer's capacity, never allocating too little or claiming more capacity than really exists, so the mistake cannot cause a buffer overrun.
```c
...,
    size_t *bytes_needed );
```
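The sketch below shows how a C++ wrapper can hide this size-query protocol. `toy_pretty_print` and `pretty_print` are hypothetical names chosen for illustration. `toy_pretty_print` only imitates the convention described above: it reports the required size in bytes, including the terminating NUL, and returns a "truncated" status when the buffer is too small. The wrapper makes one call to learn the size, allocates exactly that many bytes, and calls again.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical status codes for this sketch (not CodeSonar's).
enum toy_result { TOY_SUCCESS, TOY_TRUNCATED, TOY_ERROR };

// Hypothetical C-style function in the getcwd style, extended as the text
// describes: on success or truncation it always reports, in bytes, how
// much space the NUL-terminated result needs.
toy_result toy_pretty_print(int node, char *buf, std::size_t buf_size,
                            std::size_t *bytes_needed) {
    assert(bytes_needed != nullptr);
    if (node < 0)
        return TOY_ERROR;  // invalid input; nothing is written
    const std::string text = "node#" + std::to_string(node);
    *bytes_needed = text.size() + 1;  // +1 for the terminating NUL
    if (buf == nullptr || buf_size < *bytes_needed)
        return TOY_TRUNCATED;  // buf is left untouched
    std::memcpy(buf, text.c_str(), *bytes_needed);
    return TOY_SUCCESS;
}

// C++ wrapper: ask for the size, allocate exactly that many bytes, then
// call again. std::vector owns the buffer, so it is freed on every path,
// including when an exception is thrown.
std::string pretty_print(int node) {
    std::size_t needed = 0;
    toy_result r = toy_pretty_print(node, nullptr, 0, &needed);
    if (r != TOY_SUCCESS && r != TOY_TRUNCATED)
        throw std::runtime_error("pretty_print: size query failed");
    std::vector<char> buf(needed);
    r = toy_pretty_print(node, buf.data(), buf.size(), &needed);
    if (r != TOY_SUCCESS)
        throw std::runtime_error("pretty_print: copy failed");
    return std::string(buf.data());
}
```

Because the size is reported in bytes and the buffer is a `std::vector<char>`, the allocation and the capacity passed back in are in the same unit by construction.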
The most common misuse scenarios to date are the following:
- Ignoring the status code
- Misunderstanding which parameter is the output parameter
- Misunderstanding the type system
- Needlessly allocating the output parameter on the heap because it is pointer typed
- Other unnecessary indirection and/or complexity
- Leaking memory or objects
- Unit confusion regarding buffer capacity and/or desired buffer capacity
Finding the right function for a task is also hard: there are about 1000 functions, and it isn't always clear where to start looking.
The lack of a standard library comparable to the STL has also made the C API somewhat painful to use. For example, libc has no hash map data structure, so you either download one off the internet or roll your own.
In short, writing correct C code can be laborious. C++ has language features that help with many of these issues, in particular exceptions, which make it hard to ignore an error silently, and objects, whose destructors can release resources automatically. Organizing the big pile of functions into classes with methods also makes the API easier to navigate.
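A minimal sketch of the "objects" half of that argument follows. It uses hypothetical names (`toy_open`, `toy_close`, `toy_handle`, `use_object`) rather than real CodeSonar functions: a C-style open/close pair is wrapped in `std::unique_ptr` with a custom deleter, so the object is released even when an exception cuts a function short. The real bindings may manage object lifetimes differently.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>

// Hypothetical C-style object API: every toy_open must be paired with a
// toy_close, or the object leaks. live_objects counts unreleased objects.
struct toy_object { int id; };
static int live_objects = 0;

toy_object *toy_open(int id) {
    toy_object *obj = new toy_object{id};
    ++live_objects;
    return obj;
}

void toy_close(toy_object *obj) {
    assert(obj != nullptr);
    --live_objects;
    delete obj;
}

// Deleter that lets std::unique_ptr call toy_close automatically.
struct toy_closer {
    void operator()(toy_object *obj) const { toy_close(obj); }
};
using toy_handle = std::unique_ptr<toy_object, toy_closer>;

// The handle's destructor closes the object on every exit path,
// including when an exception propagates out of the function.
int use_object(int id, bool fail) {
    toy_handle h(toy_open(id));
    if (fail)
        throw std::runtime_error("analysis step failed");
    return h->id;
}
```

In C, the early-exit path in `use_object` would need its own explicit `toy_close` call; here the compiler inserts the cleanup, which removes one of the leak scenarios listed above.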
Modernizing Arcane Type Names
The C and Scheme APIs use unfamiliar names for several types. The terms come from older program-slicing literature and confuse new users of what is now a general-purpose program analysis framework. The new API is an opportunity for a clean break from the old names, so I've tried to use the most mainstream name possible for each type. Here are some of the most important ones:
| Old name | New name | Description |
|---|---|---|
| sdg | project | An entire project |
| uid | compunit | Compilation unit (source file) |
| sf | sfile | Source or header file |
| sfid | sfileinst | File instance (include-tree node) |
| abs_loc | symbol | Variable or procedure |
| pdg_vertex | point | Program point (roughly, a statement) |
| ast | ast | Abstract syntax tree |
In the next post, David describes the design and implementation of the C++, Python, Java, and C# APIs.