Adding C++, Python, Java, and C# Bindings for the CodeSonar API (Part 5)
December 18, 2014
This is the fifth in a series of posts about adding additional language bindings for the CodeSonar API.
Historically, we have used doxygen to generate the C API documentation and human beings to write the Scheme documentation. Due to the large number of methods, the Scheme documentation is quite laborious to maintain. The new APIs have similarly large numbers of methods. The CodeSurfer Python API exposes exactly 1234 methods, for example. There are tools we can use to assist in creating documentation for additional languages, but it would be nice to have human-authored language-specific content.
Documentation can be found in the CodeSonar 4.0 manual’s table of contents.
It was easy enough to make doxygen work on the C++ API, too. However, a human being still needs to write all the doxygen comments above the C++ method declarations, especially for methods where the functionality is not obvious from the name and signature.
Javadoc is the obvious choice for generating the Java documentation, and we run it on the SWIG-generated Java classes. The nice thing about using Javadoc is that Java programmers will be immediately familiar with the organization and styling of the documentation.
C#, as far as I can tell, has no analogous tool. The C# API is nearly identical to the Java API anyway, so I foresee sending users to the Java documentation until there is evidence of demand for something more.
The Python community uses a tool named Sphinx to generate the Python manual, among other things. I’ve hand-written some tutorials to give a tour of the Python API, and I must say that Sphinx is another nice tool from pocoo.org.
Sphinx, however, cannot automatically generate Python API reference material from Python code, most likely because Python isn’t statically typed. Luckily, the fuzzer knows all the function signatures (including most possible exceptions), so I have arranged for it to output markup for Sphinx. It gives us something that has been requested but unavailable, even for the C API, until now: an end-to-end code example for every API function. This also provides some mildly entertaining insight into what the fuzzer is doing. Here are a few examples:
# Fuzzer-generated example for the method ast_pattern_compilation_error.get_pattern
<<< except ast_pattern_compilation_error, e:
<<<     v0 = e
Being computer-generated, some of the examples are silly, but they are useful nonetheless. For example, the fuzzer is quite fond of showing that x == x to document the __eq__ method.
# Fuzzer-generated example for the method cfg_edge_set.__eq__
<<< v0 = project.current()
<<< v1 = v0.procedures_vector()
<<< v2 = v1.exit_point()
<<< v3 = v2.cfg_successors()
<<< v3 == v3
Generally, the shortest examples operating on non-empty inputs and producing non-empty outputs are selected for documentation. Without this objective function, the examples sometimes involve hundreds of statements.
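The selection rule described above can be sketched in a few lines of Python. This is only an illustration, not the fuzzer's actual code: the `Example` record, `is_nonempty`, and `select_example` are hypothetical names, and treating `None` and zero-length values as "empty" is an assumption.

```python
from collections import namedtuple

# Hypothetical record of one fuzzer run: the statements it executed, plus the
# input and output values observed for the method being documented.
Example = namedtuple("Example", ["statements", "inputs", "outputs"])


def is_nonempty(value):
    """Assumed notion of 'empty': None or anything with length zero."""
    if value is None:
        return False
    try:
        return len(value) > 0
    except TypeError:
        # Values without a length (ints, opaque handles) count as non-empty.
        return True


def select_example(candidates):
    """Pick the shortest candidate whose inputs and outputs are all non-empty.

    Falls back to the shortest candidate overall when none qualifies, and
    returns None when there are no candidates at all.
    """
    if not candidates:
        return None

    def informative(ex):
        return (all(is_nonempty(v) for v in ex.inputs)
                and all(is_nonempty(v) for v in ex.outputs))

    preferred = [ex for ex in candidates if informative(ex)]
    pool = preferred or candidates
    return min(pool, key=lambda ex: len(ex.statements))
```

Ranking by statement count after filtering means a three-line example on real data beats a one-line example on an empty set, which matches the preference stated above.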
Again, because the Python manual itself is generated by Sphinx, Python users should feel familiar with the look and feel of our Python documentation.
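For readers curious what "outputting markup for Sphinx" could look like, here is a minimal sketch. It is an assumption about the general approach, not the fuzzer's real output format: `render_example_rst` is a hypothetical helper, and it relies only on the standard reStructuredText `rubric` and `code-block` directives.

```python
def render_example_rst(method_name, statements):
    """Render one recorded example as reStructuredText for Sphinx.

    method_name: dotted name of the documented method (a plain string).
    statements:  the example's statements, one string per line of code.
    """
    lines = [
        ".. rubric:: Fuzzer-generated example for the method " + method_name,
        "",
        ".. code-block:: python",
        "",
    ]
    # The body of a directive must be indented relative to the directive.
    lines.extend("   " + statement for statement in statements)
    lines.append("")
    return "\n".join(lines)
```

Calling it with `"cfg_edge_set.__eq__"` and the five statements from the example above would yield a heading followed by a highlighted code block that Sphinx can include in a reference page.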
I instantiated bindings for what seem to be the most popular languages of the age with SWIG support, but I may have skipped your favorite language (I personally would like to see OCaml happen). We ship all the SWIG inputs with CodeSonar, so it is possible, in theory, to build a C++ plugin that lifts the C++ API to another language.
Exactly how tricky this is seems to depend largely on how mature SWIG’s support for that language is: I think some languages lack support for features like exceptions and directors, which would be a show-stopper. I also made a number of changes to GrammaTech’s version of SWIG to deal with our particular needs, and those changes would be needed to process our header files. I am happy to share them on request, and would like to eventually get them to the point where they could be contributed back to SWIG.
Some more adventurous users have already started using the new API bindings. Internally, using the C++ API, Paul Anderson prototyped on the order of 100 MISRA rules not yet implemented by CodeSonar. Stephen Westin is working to finish this work for inclusion in CodeSonar 4.1. Apart from a missing copy constructor, I do not believe any bugs in the API have been located. This suggests the fuzzer was effective in finding problems, and gives us some confidence that the API is ready for action.
While we don’t expect to use anything but the C++ API for production purposes, we frequently get user requests for specific extensions, and Python is great for this. We can write the extension fast, the user doesn’t need to compile or configure anything, and some users can figure out how to make minor adjustments (e.g., internationalization) on their own. For example, we recently delivered the following Python plugin, which implements part of CERT 5.7:
unspec_perms = cs.analysis.create_warningclass('Unspecified file permissions', 'CERT:5.7', 40.0, cs.warningclass_flags.PADDING)
if str(p.callee()) == 'open' and len(p.actuals_in_as_list()) == 2:
    unspec_perms.report(p, 'open() should always be called with 3 parameters to avoid unspecified permissions.')
# This will happen if p isn't a call site
The other half of CERT 5.7 can be implemented trivially by forbidding use of fopen entirely. This is most easily accomplished by using the “BAD_FUNCTION” configuration options, but could also be done in a plugin.
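The open() check in the plugin above can be tried out without CodeSonar. The following is only a sketch under stated assumptions: `FakePoint` and `NotACallSite` are hypothetical stand-ins and are not part of the `cs` module, and the exception the real API raises when a point is not a call site is not shown in this post, so a placeholder is used.

```python
class NotACallSite(Exception):
    """Hypothetical placeholder for the error the real API would raise."""


class FakePoint(object):
    """Hypothetical stand-in for a program point passed to a plugin."""

    def __init__(self, callee=None, actuals=()):
        self._callee = callee
        self._actuals = list(actuals)

    def callee(self):
        # Mimic the behavior hinted at in the plugin: asking a point that
        # is not a call site for its callee fails.
        if self._callee is None:
            raise NotACallSite("not a call site")
        return self._callee

    def actuals_in_as_list(self):
        return self._actuals


MESSAGE = ('open() should always be called with 3 parameters '
           'to avoid unspecified permissions.')


def check_open_call(p, report):
    """Call report(p, MESSAGE) when p is a two-argument call to open()."""
    try:
        if str(p.callee()) == 'open' and len(p.actuals_in_as_list()) == 2:
            report(p, MESSAGE)
    except NotACallSite:
        # This will happen if p isn't a call site
        pass
```

Passing the reporting function in as a parameter keeps the check testable on its own; in a real plugin that role would be played by the warning class's `report` method shown above.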
We still have a lot of documentation to produce, but we hope the API proves useful in its current state. Our visibility into what users do outside GrammaTech is limited, but we have gotten a few questions about the new APIs at firstname.lastname@example.org — do you have a good API use case? Let us know!