AdaCore Technologies for cyber security, part 2: The challenge of secure software
June 19, 2018
This article considers why building and operating secure computer systems appears to be so difficult, as evidenced by the frequency and magnitude of attacks reported in the media.
This is part two in a series of excerpts from the book AdaCore Technologies for Cyber Security, written by Roderick Chapman and Yannick Moy. It considers why building and operating secure computer systems appears to be so difficult, as evidenced by the frequency and magnitude of attacks reported in the media, and closes with some principles that can be applied in practice and that justify AdaCore’s position and technical approach.
Why is it so hard?
Security is a system-level property that's commonly defined as the protection of assets against threats that may compromise confidentiality, integrity, or availability. Thus, security means protection against unauthorized access to, corruption of, and denial of access to the assets. In a cyber system, software plays a key role in whether and how these requirements are met, and it does this in two ways:
- Providing the relevant security functionality (for example, cryptographic functions), and
- For the rest of the software, avoiding vulnerabilities that, if triggered, could violate the confidentiality, integrity, and/or availability requirements.
In short, security entails demonstrating, with a level of confidence commensurate with the value of the assets, that the security functions do what they are supposed to do, and the rest of the system cannot do anything to place the assets at risk.
The development of secure software poses particular challenges that place it beyond a mere “quality control” problem, though. The following sub-sections expand on these challenges to set the scene and justify a rational technical approach.
The malicious environment
AdaCore customers have always been involved with building ultra-reliable safety-critical systems. In this world, the engineers usually have strict control of the software’s operational environment, or the environment may be assumed to be benign or well-behaved in some way.
This is no longer the case. Software is often connected to open networks where benign behavior cannot be assumed. In short, software must be built to withstand and continue to operate correctly in an overtly malicious environment.
Ross Anderson and Roger Needham coined the phrase “programming Satan’s computer” to characterize this challenge. Satan’s computer doesn’t fail randomly: it fails intelligently, in the worst possible way, at the worst possible time, and it can fail in ways that you don’t even know about (yet).
Asymmetry of capability
It gets worse. The bad guys (attackers, malicious actors) are smarter than you, have more money than you, and have more time than you. Furthermore, their capabilities don’t grow in some linear, predictable fashion. A public leak of a government’s arsenal of hacking tools can put nation-state level capability within reach of anyone at all in a matter of days, and these capabilities become commoditized rapidly.
Asymmetry of effort
The software developer is responsible for preventing every possible defect that might lead to any sort of security exploit or failure of the system. An attacker, on the other hand, has to find just one defect to mount a successful attack.
Asymmetry of knowledge
Computer security (and, in particular, its subfield cryptography) is a strange discipline, since the total knowledge of the field is far greater than what is published in the scientific literature. In short, various groups (including attackers and certain notable government agencies) know things that typical developers don’t. Examples include the RSA cryptosystem in the early 1970s, differential power analysis before 1999, and (until very recently) the so-called Spectre and Meltdown vulnerabilities in modern CPUs.
This presents the software developer with a particular challenge: defending software against attacks that the developer doesn’t even know about.
Asymmetry of impact
In software, it is difficult to predict the relationship between a particular defect (or class of defects) and its potential impact on the system’s behavior, the system’s customers/users, or the developing organization’s business. The impact of security issues can range from negligible to those that destroy the reputation and share-price of a company overnight. Given that attacks exist that developers don’t know about, trying to decide whether a particular defect is high impact, and therefore worthy of being fixed, can be almost impossible.
The limits of test
In his 1972 Turing Award lecture, Edsger Dijkstra famously pointed out that “program testing can be a very effective way to show the presence of bugs, but is hopelessly inadequate for showing their absence.”
Given an arbitrarily intelligent and motivated attacker, developers of secure systems must assume that systems will be attacked with input data and in states that have never been tested. For any non-trivial software system, any claim of security solely based on “lots of testing” must be regarded with extreme skepticism. Even a well-organized and independent penetration testing activity offers no guarantee of finding all the bugs.
The limits of talent
Some projects like to claim that their software quality is okay “because we only hire really good people,” or something like that. Such claims do not stand up to the most cursory inspection. Data from the Personal Software Process/Team Software Process (PSP/TSP) group at the US Software Engineering Institute show that even the best-performing programmers inject around 20 defects per 1,000 logical lines of code in their work. Truly critical systems might aim for 0.2 defects per 1,000 lines delivered to the customer: a factor of 100 better, equivalent to those programmers preventing, or finding and fixing, 99 out of every 100 defects.
It should also be obvious that this approach does not scale with team size: projects need technologies and disciplines that allow all developers to produce work of the required quality, instead of relying on a few “hero programmers” to save the day (or the project, or the company).
The “first release” problem
There is lots of advice that tells us to patch our software regularly to make sure we’ve got the latest (and presumably least-buggy) version. This is reasonable advice, but can be something of a challenge to implement with appropriate levels of authentication and confidentiality, particularly for small embedded devices.
For a developer, the ability to patch should not be an excuse to ship defective software to the customer, thinking that it can be patched later if defects are reported. The problem is that some defects are of such dramatic impact, and the damage is done so fast, that there is no time or opportunity for any corrective action. As an example, consider the flight control software for an aerodynamically unstable fighter aircraft. A developer approaching the test pilot before first flight and saying, “Don’t worry about the bugs, we’ll ship you a patch,” would soon be looking for a new career.
While “improve and patch” is a reasonable model for many software vendors, some software just has to be fit-for-purpose at the point of its first release. Additionally, a demonstration of fitness for purpose might need to be submitted to an independent authority or regulator. This problem requires a fundamental shift in engineering mind-set.
Standards and guidance
“The nice thing about standards is that you have so many to choose from.”
– Andrew S. Tanenbaum
The European Cyber Security Organisation has published an “Overview of existing Cybersecurity standards and certification schemes.” The document (which is just a survey) runs to nearly 200 pages, lists some 107 different guidance documents and standards for system development, and goes on to cite nine schemes for certifying professional skills.
Even within a particular application domain or industry sector, there can still be an overwhelming volume of guidance and standards that appear to apply. Standards also vary widely in terms of the vigor with which they are enforced (from effectively mandatory to entirely optional) and their technical demands (from highly specific and onerous to general and lax).
There have also been some higher-level attempts to look at the problem which merit some attention. Examples include the US National Academy of Sciences report “Software for Dependable Systems: Sufficient Evidence?” and the US National Institute for Standards and Technology (NIST) report “Dramatically Reducing Software Vulnerabilities.” The NIST report identifies five approaches that have the potential to make a dramatic impact on software quality, including the use of formal methods.
The market, lemons, and regulators
For many years, there was hope that the market would self-correct and produce a rational approach to the development of secure software. This has not happened. Firstly, the development of secure software has been seen as something of a niche, but development disciplines and approaches have been dominated by the larger market for not-so-secure software where time-to-market and features take precedence over quality. Secondly, for a buyer of software it is almost impossible to tell if some specific product is any more or less secure than any other product, since proprietary software is often cloaked in secrecy (“security by obscurity”) and restricted by prohibitive licensing terms. Economists call this “a market for lemons” after similar observations were made about the market for used cars.
In a significant change of stance, the UK’s National Cyber Security Strategy 2016-2021 (NCSS) explicitly notes:
“But the combination of market forces and government encouragement has not been sufficient in itself to secure our long-term interests in cyberspace at the pace required. Too many networks, including in critical sectors, are still insecure. The market is not valuing, and therefore not managing, cyber risk correctly.”
The NCSS goes on to promise more active intervention in critical areas. It is not yet clear what form these interventions will take, but the challenge is clear: system and software developers must be ready to improve, justify, and defend their practices before their national regulator (or worse, their insurer or a court of law) decides to step in.
A manifesto for secure software
Although this picture may seem bleak, progress is possible.
AdaCore is primarily involved with the design of software development and verification tools, but these are not the sole route to improvement. Great engineering involves an interplay between technologies (tools, languages, etc.), people (their skills, disciplines, and attitudes), and engineering processes.
These three elements impact one another—for example, the introduction of a new static verification tool on a project might mean that later code review processes can be modified, and that engineers change their disciplines and coding style as they learn how to get best results from the tool.
This section lays out some basic principles for high-integrity software engineering that could be applied to any project, regardless of the standards, industry, or regulatory environment that might apply.
Arguably the most important aspect of producing secure and reliable software, and also perhaps the most difficult to achieve, is to specify a complete, consistent, and unambiguous set of requirements that the software must meet, and to do so without overly constraining the solution. The requirements should specify the “what”, but not the “how”, and should be expressed in a way that facilitates verifying whether they are met. Software standards such as DO-178C (airborne software) and CENELEC EN 50128 (railway control and protection systems) recognize the critical role of the requirements specification in the software life cycle. Typically derived from overall system requirements, software requirements are most easily visualized as comprising several tiers:
- High-level requirements that relate to overall functionality, performance, capacity/response time, interface issues, usability, and safety and/or security. These requirements drive the design of the software architecture.
- Low-level requirements that emerge from the software design. These requirements are defined for each component and in particular establish what each code module (subprogram, in Ada parlance) assumes when it is invoked and what it promises to deliver when it returns.
A language such as Ada or SPARK can serve to specify requirements at both levels, in particular through the various forms of contracts (pre- and postconditions, type invariants). The developer can thus directly embed requirements in the source code and verify compliance either statically with SPARK proofs or dynamically with Ada run-time checks. SPARK also has the advantage of an unambiguous notation that can formalize requirements such as the information flow between components, key integrity properties for security and/or safety, and the detailed semantics of what a subprogram computes.
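As an illustration, a low-level requirement can be written directly as a SPARK contract. The following sketch is hypothetical (Bounded_Stacks, Push, and the capacity bound are invented for illustration, not taken from any AdaCore library): the precondition states what a caller must guarantee, and the postcondition states what the subprogram promises.

```ada
--  Hypothetical sketch: a low-level requirement expressed as a SPARK
--  contract.  Pre states the caller's obligation; Post states the
--  subprogram's promise on return.
package Bounded_Stacks with SPARK_Mode is

   Capacity : constant := 100;

   type Stack is private;

   function Size (S : Stack) return Natural;

   procedure Push (S : in out Stack; Item : Integer) with
     Pre  => Size (S) < Capacity,           --  no push on a full stack
     Post => Size (S) = Size (S)'Old + 1;   --  exactly one element added
   --  The contract can be verified statically by SPARK proof, or
   --  checked dynamically by compiling with assertions enabled
   --  (e.g. the GNAT -gnata switch).  Body omitted for brevity.

private

   type Int_Array is array (1 .. Capacity) of Integer;

   type Stack is record
      Data : Int_Array := (others => 0);
      Top  : Natural   := 0;
   end record;

   function Size (S : Stack) return Natural is (S.Top);

end Bounded_Stacks;
```

The same aspect syntax is legal Ada 2012, so the contract serves both as machine-checked documentation and as an executable oracle during testing.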
Security requirements should be formulated at the outset; issues such as the usage environment (standalone with trusted operations personnel versus networked and accessible from unvetted parties) have an obvious effect on the design and the relevant assurance level. The security-related interactions between the system and its environment need to be defined, the threats identified, and countermeasures specified. (An example of how security-related issues are being assessed in the commercial avionics industry may be found in DO-326, Airworthiness Security Process Specification.)
Security-related requirements established at the system level flow down through the software life cycle into high-level requirements (for example the strength of a cryptography function) and ultimately into low-level requirements and then source code. The chosen programming language(s) and tools have a significant effect on the ease or difficulty of demonstrating that the resulting code in fact correctly implements the requirements. As will be explained throughout this booklet, the Ada and SPARK languages together with AdaCore's development and verification tools offer particular advantages.
A system’s architecture entails its high-level design as components with well-defined interfaces and interrelationships. A good architecture provides a solid framework with effective modularization and robustness in the presence of future enhancements and requirements changes. Architecture covers issues such as redundancy, the provision of fail-safe states or modes, mitigation of security concerns by physical means (e.g. hardware design), and the separation of critical from non-critical components. The last of these also allows the most critical software components to be as small as possible, which helps to control cost.
At a more technical level, strong architecture and separation mean that the most appropriate languages, tools, and technologies can be used where they are best suited. For example, in the MULTOS CA system, developed by Praxis in the UK, the security kernel was implemented and verified using the SPARK toolset, while the GUI was implemented in C++, based on a deliberate and strict separation of security concerns.
The fitness-for-purpose of a system should be justified by a logical argument which is supported by evidence from a wide range of sources. The evidence might be based on analysis of design artefacts, observation including all forms of testing, metrics, and both direct and indirect evidence of process compliance.
Given a need for evidence, a development approach should be chosen that generates the evidence as a natural by-product. This can be summarized as “Security should be built in, not bolted on.” This idea can be seen as a generalization of “Test-Driven Development” to cover all the forms of verification activity that are available.
In line with the Agile manifesto, verification tasks should be automated as far as possible, and embodied in a continuous integration pipeline.
Analysis over observation
“Talk is cheap. Show me the code.”
– Linus Torvalds
Verification activities can be categorized as static or dynamic.
Static verification (also known as static analysis) concentrates on analysis of development artefacts (e.g. designs, models, source code) without actually running the system. Static analyses can be performed by humans (for example, personal and peer review) or automated by machines (for example, use of a tool like CodePeer or SPARK Pro).
Dynamic verification constitutes all activities that involve observation of the system in either a simulated or real-world environment, so it covers all forms of testing.
As noted earlier, testing of security-critical systems has severe limitations. In short, the results are only as good as the test data that have been exercised, and those data will always be a tiny fraction of the possible system states and inputs. Further, it must be assumed that attackers will find inputs that the system’s developers haven’t tried.
In theory, a static verification activity should yield results that are true for all possible states and inputs, and therefore offers a qualitatively different level of assurance from testing alone. This tool property is referred to as soundness, meaning that the results of a static verification really are trustworthy. The concept is best illustrated with a simple example. Imagine submitting a program’s source code to a tool, asking the question, “Are there any defects?” (for example, reads of uninitialized variables) and receiving a report in response. If the tool is sound, then these are the only such defects; if the tool is unsound, then the code may have defects that were not reported. Phrased differently, if a sound tool reports that the program has no defects, then this conclusion can be trusted. If an unsound tool reports that the program has no defects, then it is still possible that unreported defects are present.
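To make the uninitialized-variable example concrete, consider the following sketch (Max_Of is an invented example, not library code). A sound flow analysis must report the defective path regardless of what test data happen to be run.

```ada
--  Illustrative sketch: the kind of defect a *sound* flow analysis
--  must always report.
function Max_Of (A, B : Integer) return Integer with SPARK_Mode is
   Result : Integer;      --  deliberately left uninitialized
begin
   if A > B then
      Result := A;
   end if;                --  no assignment on the A <= B path
   return Result;         --  possible read of uninitialized data
   --  A sound tool (for example, SPARK flow analysis) reports this
   --  unconditionally.  An unsound tool, or a test campaign that never
   --  happens to exercise A <= B, can miss it entirely.
end Max_Of;
```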
The sound tool offers a more desirable result than the unsound tool, since it provides a higher degree of confidence and requires less testing and re-work later.
For these reasons, a verification approach that emphasizes the use of sound, automated and static verification is recommended.
But, as always, there is no free lunch. It turns out that soundness in static verification is rather difficult to achieve, and all tools, whether sound or unsound, run the risk of generating false alarms, i.e. reporting a potential defect that in fact is not a problem. An ideal tool that is both sound (reporting all defects) and precise (not reporting any non-defects) is not achievable; in practice, a trade-off must be struck. For example, one might first apply an unsound but precise tool to analyze legacy code and correct the reported defects, and then use a sound tool to identify the remaining ones.
To analyze the meaning of a program successfully (and therefore the presence or absence of important defects), a tool needs to know exactly what the program means. This seems obvious, but it turns out to be rather difficult to achieve with most popular and practical programming languages.
The problem is ambiguity in programming languages—there are features or scenarios where the meaning of a program is said to be undefined or unspecified by the definition of the language. This is easy for a compiler to resolve—it just chooses one of a range of options and carries on—but is something of a disaster for static verification tools. If a verification tool has to guess the meaning of an undefined behavior in a program, then the results can be unsound.
AdaCore’s approach is to achieve soundness in static verification as far as is possible, and with respect to a reasonable and practical set of assumptions.
In the interest of run-time performance, the Ada language allows some behavior to be unspecified or compiler-dependent. For example, the order of evaluation of sub-expressions within an expression is unspecified, which can lead to different results if those sub-expressions have side effects. Similarly, the association of additions in an expression A + B + C is compiler-dependent, which can lead to different results if one of the choices leads to an integer overflow. For each such case, the CodePeer static analyzer makes and documents its choices, keeping them as close as possible to the choices typically made by compilers, and by the GNAT compiler in particular. But it cannot guarantee that these coincide with the choices made by a specific compiler version with a specific host and target configuration, so these choices should be reviewed wherever they could affect the analysis results.
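To make the A + B + C example concrete, consider the following sketch (the procedure name and values are invented for illustration). With these operands, one association overflows while the other does not, so an analyzer that guesses the wrong association reaches the wrong conclusion about the program.

```ada
--  Illustrative sketch of the A + B + C association issue.
--  With the values below:
--    (A + B) + C  overflows at A + B, raising Constraint_Error, while
--    A + (B + C)  computes Integer'Last with no overflow.
--  Since the choice is compiler-dependent, a sound analyzer must
--  document which association it assumes.
procedure Association_Demo is
   A : constant Integer := Integer'Last;
   B : constant Integer := 1;
   C : constant Integer := -1;
   R : Integer;
begin
   R := A + B + C;   --  outcome depends on the association chosen
end Association_Demo;
```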
The SPARK language and toolset set a high-water mark in this regard. Soundness is achieved through a combination of language subsetting, additional language rules, and analyses. SPARK can also be regarded as a fully formal language owing to its unambiguous semantics.