Advanced static analysis for reusable libraries and subsystems
November 01, 2014
Advanced static analysis tools are no longer novelties but rather are becoming true strategic elements in the standard professional developer’s...
Advanced static analysis tools are no longer novelties but rather are becoming true strategic elements in the standard professional developer’s toolkit.
Some early static analysis tools operated strictly on a function-by-function or module-by-module basis, with little or no cross-module analysis. Next came "whole program" analysis, where the tools could find possible run-time failures, but only if given all of the source code for the program of interest. Today, we are seeing tools that can be effective on parts of programs, such as a software library or a software subsystem, without requiring artificial "driver" code to activate the code within the library or subsystem.
Static analysis is becoming a standard part of the professional developer's set of tools. The U.S. Department of Defense has recognized the importance of static analysis in helping to identify security vulnerabilities and weaknesses in software-intensive systems. Many non-defense organizations have also begun to incorporate static analysis as part of their daily software hygiene, because of the unique advantages static analysis provides in early identification of problems that are not easily detected using more conventional testing. But now that their benefits have been established, we need to look more closely at the capabilities of the various tools, and see how they can be deployed most effectively in a modern development environment.
Almost all software development organizations promote reuse of libraries or subsystems of code, at least within the organization. Unfortunately, many static analysis tools are not well adapted to analyzing anything less than a full executable program. This is because many static analysis tools operate by first finding all of the callers of a given routine, and then effectively substituting in the set of parameter values passed at any of these calls to determine whether the routine of interest might fail on the given inputs. To perform static analysis on a library, the programmer must therefore first create a wide-ranging set of unit tests and then apply static analysis to those tests. This to some extent reduces the potential advantage of static analysis tools, because a full set of dynamic tests is needed to make effective use of such static analysis tools.
With an approach to the static analysis of software libraries based on inferring as-built pre- and postconditions (often called "contracts") from the code itself, while also accommodating programmer-provided assertions and pre- and post-condition contracts, the analysis tool very effectively analyzes all of the code within a library, and as a side benefit produces human-readable contracts that summarize the requirements and effects of each of the library routines.
Analyzing reusable libraries
A new breed of static analysis tool is emerging that escapes this need to have drivers or harnesses for the code being analyzed (Figure 1). These tools can work bottom up, starting from the leaf routines of the program, or a library, and working toward higher-level routines. Such tools infer a contract (preconditions and postconditions) for the routine from the code itself, determining what range of values or combination of values can be handled by the routine's algorithm, and what sets of values or combinations will cause run-time problems such as indexing outside an array, overflowing a numeric calculation, or dereferencing a potentially null pointer. This bottom-up, contract-based approach allows these advanced static analysis tools to provide valuable insights into pieces of programs, from a single module to libraries and subsystems. The inferred contracts are designed to be human readable, which provides useful as-built documentation to help facilitate manual code reviews, as well as identify mismatches between the original requirements and current reality. Examples of such tools include the CodePeer analyzer from AdaCore, and the CC-Check tool from Microsoft Research.
Inferring preconditions and postconditions
During manual code review, the contracts inferred by an advanced static analysis tool can help identify an immediate problem with the code. Here is an example of a routine where clearly the name of the routine does not match its function, as illustrated by the inferred postcondition that the analysis tool produced (Figure 2).
The inferred postcondition (identified by the --#postcondition comment) shows that the routine returns the number of days since the beginning of the year for the given month, while the name of the routine implies it should be returning the number of days in the month. Clearly either the programmer or the person naming the routine was confused. This is just one example of the benefit of having as-built contracts inferred by the analyzer. Many other cases occur, where either the inferred precondition or inferred postcondition indicates that the algorithm chosen by the programmer is clearly wrong, given the requirements of the routine.
A static analyzer can infer preconditions and postconditions for a routine by using a clever technique that starts with assuming that the inputs to the routine could take on any possible value, and then proceeds by eliminating the input values, or combinations of inputs, that would cause run-time failures during execution. Once the analyzer reaches the end of the routine, the possible values that remain that do not lead to run-time failures represent the only values that may always be safely passed to the routine, and therefore represent the effective preconditions of the routine. Postconditions are determined by taking the set of values of the inputs that satisfy the preconditions, and computing the sets of values they produce for the outputs of the routine.
This technique for inferring contracts works quite well for simple straight-line routines, but does not capture the full story for routines that have code that is executed on some but not all calls. For those, we need to consider conditional preconditions, namely, preconditions that apply only on certain paths through the routine. Here is an example that illustrates the need for conditional preconditions (Figure 3).
Here the analyzer has inferred preconditions to prevent numeric overflow when computing Y +/- 1, but it needed different preconditions for the two paths through the routine. It handles this by emitting conditional preconditions, which are of the form "not
Documenting presumptions about unanalyzed code
Another issue that comes up when analyzing libraries or subsystems is that they often depend on other lower-level libraries or subsystems, and there is a desire to analyze one library relatively independently from the libraries on which it might depend. This introduces a different challenge for the advanced static analyzer, namely dealing with calls from code being analyzed, to code that is not included in the current analysis. When a call occurs on a lower-level routine that is included in the current analysis, then the bottom-up analysis approach supplies the inferred preconditions and postconditions for the called routine, enabling further analysis of the higher-level routine. When the called routine is not in the current analysis, a different approach is appropriate – namely, the static analysis tool can track how the values returned by this unanalyzed routine are used, and indicate what presumptions are being made about this unanalyzed code in the routine being analyzed. For example, if a call is made on a routine that returns a pointer, and the calling routine immediately dereferences the pointer without first checking whether it is null, clearly it is presuming that the pointer returned by this unanalyzed routine is non-null (in the example, the call occurs on line 12, indicated by @12, Figure 4).
Similarly, presumptions might be made by the calling code about the range of values of a numeric returned value, or the initialization state of a more complex returned object. By explicitly documenting all such presumptions of the calling code, the analyzer provides the library implementor more insight into exactly what is being expected of the lower level libraries. These presumptions can then be compared against the actual behavior of the lower level library, to verify that the lower-level library is being used properly.
The importance of static analysis tools
As the use of static analysis tools becomes an integral part of the software development process, the capabilities of those tools can determine the overall value gained. Mature software organizations are always working to create reusable software in the form of libraries or subsystems, as it is well recognized that the key to overall productivity is based on writing less new code for each application. Advanced static analysis tools can help with the challenging task of testing the reusability and robustness of reusable components by directly analyzing libraries or subsystems without requiring the drivers, harnesses, or stubs that might be needed to create a complete executable program. These tools can accomplish this goal by operating in a bottom-up fashion, starting first with the leaf routines of the reusable components, and then inferring human-readable information in the form of preconditions, postconditions, and presumptions along the way. By so doing, they enable the library or subsystem developer to gain precise insight into the as-built behavior of the components, without the expense of developing an expansive dynamic test suite.
 K. Baldwin et al, US DoD Revitalization of System Security Engineering Through Program Protection, http://www.acq.osd.mil/se/docs/IEEE-SSE-Paper-02152012-Bkmarks.pdf
 AdaCore, CodePeer Static Analyzer, http://www.adacore.com/codepeer
 Microsoft Research, Code Contracts, http://research.microsoft.com/en-us/projects/contracts/