The compounding effect of complexity on SoC design cost and predictability
August 01, 2008
Even with the best tools available, some SoC design trends are dramatically affecting predictability and therefore overall development cost.
The Global Semiconductor Association (GSA, formerly known as the Fabless Semiconductor Association) and other organizations have recently highlighted how costs for designing Systems-on-Chip (SoCs) are rising rapidly. Though newer process technologies are partly to blame, this increase is mostly caused by the difficulties designers encounter when trying to deal with system complexity. David explains the reasons why SoC designs are risky projects to undertake.
Recent data shows that 89 percent of chip designs miss their deadlines by an average of 44 percent. Designers' ability to predict the time and resources a design will require to complete, how a device will perform, and when it will be ready for shipment to customers is suffering. Even with the best tools available, some SoC design trends are dramatically affecting predictability and therefore overall development cost, as illustrated in Figure 1.
The issues that cause this unpredictability crisis fall under four general categories:
- System complexity: Though designers understand individual IP cores, they struggle with system complexity, which comes from the multiplicative effects of IP core interactions and the flow of information associated with their operation.
- Improper level of abstraction: Languages and techniques used to describe functional blocks have no provisions for describing these functions' interactions in a system and how those interactions affect the overall system.
- Advanced power management techniques: Using dynamic voltage and frequency scaling places additional restrictions on traditional design methods.
- Process variability: The effects of process variability are hard to predict as technology drops below 65 nm, particularly with conventional techniques.
Complexity is deceptively insidious, as it appears to be an easy issue to address given the capabilities of advanced process technologies. With the large number of transistors available today, it seems as though anything that can be imagined can be built. Why then is handling system complexity so hard? The reason is that while the number of available transistors goes up roughly linearly (not accounting for utilization), system interactions that introduce uncertainty and therefore a predictability challenge grow by a multiplicative effect. This multiplicative effect is based on several compounding trends in SoC design.
First, the move from single-processor systems to multiple heterogeneous and homogeneous processors is multiplying system interactions, thus increasing complexity. Modern SoCs usually contain more than one or two processors. Aside from programming challenges, coordinating the hardware-driven activities of these multiple processors poses difficulties for designers. Design teams must answer a number of questions, including:
- What are the real demands placed on shared resources?
- What is a reasonable goal for using these shared resources?
- Is it possible to keep processors busy with a set of resource constraints?
- What is the overall impact of too much or too little local cache memory?
- Have all the intended applications been sufficiently modeled to assure that they will operate as intended on the collection of processors and shared resources that they require?
To understand and subsequently predict the impact of changes to shared resources, some of the most advanced SoC design teams have built complete system models using SystemC or a similar language. The investment in building and maintaining this level of sophistication might be beyond the reach of mainstream SoC design teams. Even with this approach, the ability to predict the underlying hardware's actual behavior is restricted by these modeling environments' limited accuracy, yet that prediction is what is required to avoid the expensive iteration cycles that result from late implementation surprises.
Traditional SoCs, whose input and output were driven by human interaction time and low-performance data movement, have changed dramatically. The number of interfaces on an SoC, the type of I/O traffic, and the I/O interactions with other functions in the system have all become more involved.
For example, a device targeted for use in a connected home might have wireless networking, conventional wired networking, data movement to or from external storage, and video and audio output services operating simultaneously. Providing the maximum capability for each possible use (full-rate Gigabit Ethernet at the same time as full-rate access to a disk controller, wireless Ethernet controller, and USB 2.0) is not unreasonable; the issue is how these interfaces will interact with complex processing functions. In this example, combining I/O with processing functions might involve 16 I/O streams operating concurrently with six major processing functions, all sharing common DRAM. With 16 x 6 = 96 possible stream-to-function pairings, this scenario is roughly 100x more complex than any one interaction by itself. Determining if all these interactions can be managed properly leads to uncertainty and thus unpredictability.
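The arithmetic behind the roughly 100x figure can be sketched in a few lines. This is a minimal illustration, assuming every I/O stream can contend with every processing function for the shared DRAM; the function name `count_pairings` is hypothetical and not from the article.

```python
from itertools import product


def count_pairings(num_io_streams: int, num_processing_functions: int) -> int:
    """Count the (I/O stream, processing function) pairs that share DRAM.

    Assumption for illustration: any stream may interact with any function.
    """
    streams = range(num_io_streams)
    functions = range(num_processing_functions)
    # Enumerate every pair explicitly rather than just multiplying,
    # to make the "multiplicative effect" visible.
    return sum(1 for _ in product(streams, functions))


pairings = count_pairings(16, 6)
print(f"{pairings} pairings to analyze")
```

Adding a single processing function to this hypothetical device adds 16 more pairings, which is why the analysis effort grows much faster than the block count.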
Gone are the days when an SoC design team had the luxury of using a single standard interface for all the IP in a design. Virtually all SoC teams face the challenge of solving interoperability problems for legacy IP developed internally, new IP developed internally (generally by a different group or division), and externally sourced IP that might adhere to different interface standards.
For an IP reuse strategy to succeed, automating translations between various standards is critical. Achieving IP reuse (including reuse of test suites, design for manufacturing/design for yield, and reliability work) precludes modifying the IP for each system use. Modifying IP (including interfaces) adds too much risk to the design schedule. The design team must leave most of the IP as it is and instead make adaptations elsewhere in the system, generally in custom bridges and bus adapters. However, doing this by hand makes the process unpredictable and not scalable. Interaction complexity rises quickly once interfaces reach greater than 1:1 correspondence.
A simple system with three types of interfaces (typical for most SoC teams) involves six translations: three for requests and three for responses. Variations in any of the interface characteristics multiply the six translations by the number of different characteristics that must be considered. Artificially constraining these characteristics is unrealistic because doing so causes either extreme overdesign in simpler functions or hampers performance in complex functions. Dealing with this complexity adds to the predictability challenge.
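The translation count can be worked through as follows. This is a sketch under stated assumptions: each pair of interface types needs one request translation and one response translation, and each characteristic variant multiplies the total. The interface names and the `characteristic_variants` parameter are hypothetical.

```python
from itertools import combinations


def count_translations(interface_types, characteristic_variants=1):
    """Estimate the number of translations a bridge layer must handle.

    Assumptions (for illustration only):
    - every pair of distinct interface types needs a request translation
      and a response translation;
    - each variant of an interface characteristic (for example, data
      width) multiplies the total.
    """
    pairs = list(combinations(interface_types, 2))
    directions = 2  # request path and response path
    return len(pairs) * directions * characteristic_variants


# Three hypothetical interface types, as in the text: 3 pairs x 2 = 6.
print(count_translations(["bus_a", "bus_b", "bus_c"]))

# The same three types with four hypothetical characteristic variants.
print(count_translations(["bus_a", "bus_b", "bus_c"], characteristic_variants=4))
```

Under these assumptions, a fourth interface type raises the base count from 6 to 12, before any characteristic variants are considered.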
Traffic and bandwidth
The nature and amount of traffic managed within a complex SoC have also changed considerably. Rate-critical traffic such as video and audio is often intermixed with processor-oriented traffic, which tends to have much tighter latency requirements and comes in bursts. Add to this any real-time requirements, such as what might be imposed with networking traffic (Gigabit Ethernet must be served when it is available to avoid losing data). Serving these different traffic types across various functions interacting with one another is difficult.
This complication is then exacerbated by the amount of data; HD video requires at least 6x the bandwidth of SD video. Analyzing this in the context of the real hardware is very time-consuming, making it impractical to run simulations long enough to determine if a particular set of IP functions and their interactions can properly manage video frames, for example. As a result, SoC design teams focus on a few situations they believe to be the worst case and hope that everything will work. However, hope is not a predictable quantity in SoC design.
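The 6x figure is consistent with a simple pixel-rate comparison. The sketch below assumes 720 x 480 for SD and 1920 x 1080 for HD at the same frame rate and bit depth; these resolutions are common values chosen for illustration and are not stated in the article.

```python
def pixel_rate(width: int, height: int, frames_per_second: int) -> int:
    """Uncompressed pixels per second for a video stream."""
    return width * height * frames_per_second


# Assumed formats (not from the article): 480-line SD and 1080-line HD,
# both at 30 frames per second with identical bits per pixel.
sd_rate = pixel_rate(720, 480, 30)
hd_rate = pixel_rate(1920, 1080, 30)

ratio = hd_rate / sd_rate
print(f"HD moves {ratio:.1f}x the pixels of SD")
```

Because the frame rate and bit depth cancel out, the ratio depends only on the assumed frame sizes; a higher HD frame rate would push it above 6x.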
Virtually all SoCs operate in different modes. These operating modes generally involve a diverse set of critical IP functions interacting in unique ways. The number of different modes multiplies the complexity of analyzing and determining how to properly operate a particular set of interactions.
For example, a set-top box SoC might have a mode in which the media inputs supply two multimedia streams: one going to the display and audio systems for processing and the other being stored on a disk. This mode differs from one in which the hard disk supplies the output functions while one media input is displayed in a picture window within the stored content. Another mode might supply information from the local disk to a networking port while decoding and displaying one or both media inputs. As can be readily seen, the number of modes dramatically expands the set of interoperating functions and data streams. Assuring that all these modes will function properly compounds the task the SoC design team faces.
These system complexity problems build upon one another. Even if there is only a minimal chance that designers will need to repeat a portion of the design, implementation, or verification, that small percentage accumulated across all these issues makes multiple iterations through one or more steps in the design flow an almost certain requirement.
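The way small risks accumulate can be illustrated with a basic probability calculation. This is a sketch only: it assumes each issue independently carries the same chance of forcing a repeated step, which real projects will not satisfy exactly, and the 5 percent figure is a hypothetical value rather than data from the article.

```python
def probability_of_rework(per_issue_probability: float, num_issues: int) -> float:
    """Chance that at least one of num_issues forces a repeated design step.

    Assumes the issues are independent and equally likely (illustrative only).
    """
    chance_all_clean = (1.0 - per_issue_probability) ** num_issues
    return 1.0 - chance_all_clean


# Hypothetical 5 percent chance of rework per issue.
for num_issues in (1, 10, 50, 100):
    chance = probability_of_rework(0.05, num_issues)
    print(f"{num_issues:3d} issues: {chance:.1%} chance of at least one iteration")
```

With these assumed numbers, the chance of at least one iteration exceeds 99 percent by the time 100 such issues are in play, even though each one looks minor on its own.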
Improper level of abstraction
The SoC design community continues to depend on languages and tools built to describe individual functions composed of gates. This is a significant limitation, as those languages (Verilog or C/C++/SystemC) do not contain the syntax, concepts, and constructs necessary to describe IP functions' interactions within a complex SoC.
As noted earlier, systems are a combination of complex functions interacting in various ways depending on the overall functionality being provided. To properly describe and analyze system interactions, designers must have a means of describing the system-level aspects critical to proper operation. This might include bandwidth characteristics in important time windows, traffic concurrency within a particular operating mode, different interactions that make up an operating mode, and the ways in which interactions impinge on one another. Without a method for capturing these aspects as verifiable requirements, there is a considerable probability that the low-level hardware representations, which are interpreted by hand, will not perform as desired. The current situation is somewhat analogous to designing a jet aircraft with the tools used by the Wright brothers: a bit risky.
Advanced power management techniques
Sophisticated power management methods being deployed at 65 nm and below focus on controlling operating frequency while changing the supply voltage for portions of the SoC. This ability has tremendous benefits (a whole section might be shut down completely to consume "zero" power), but its implications for system-level design add another degree of complexity.
Not only does this make clock distribution and management difficult, it also imposes restrictions on how data movement must be controlled depending on what sections of a device might be operating, not operating, or operating at a reduced rate. A portion of the device that can operate at two different rates in addition to being turned off complicates system interaction design and analysis by a factor of three. At minimum, the additional circuits required for subsystem isolation, clock management, power shutoff, and proper subsystem restoration complicate all aspects of the device design. Meanwhile, the supplementary operating points add complexity to the system analysis similar to or in concert with the operating mode complexity.
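The factor of three for a single block (two operating rates plus off) compounds when several blocks are managed independently. The sketch below assumes each power domain changes state independently of the others; the domain names and state counts are hypothetical.

```python
from math import prod


def count_operating_points(states_per_domain: dict) -> int:
    """Number of combined power states across independent domains.

    Assumes every combination of per-domain states is reachable,
    which is a simplification for illustration.
    """
    return prod(states_per_domain.values())


# One domain with two rates plus "off", as in the text: 3 states.
print(count_operating_points({"video": 3}))

# Hypothetical device with three independently managed domains.
domains = {"cpu": 3, "video": 3, "io": 2}
print(count_operating_points(domains))
```

In practice a design team might rule out some combinations, but under this assumption each combination that remains has to be analyzed in every operating mode in which it can occur.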
Process variability
As process technology continues to move below 65 nm, variability in individual device operation introduces uncertainty in circuit functionality and performance on a local basis and across the device. While statistical static timing analysis and other techniques help analyze variability effects, they do not offer solutions that reduce the impact of variability.
To accommodate variability, designers add margin to assure that circuits will still function. This additional margin reduces the performance or functionality gains that might otherwise be realized with a more advanced process. The net result is less assurance that the design team can meet the goals set out at the beginning of the design. Of course, the shortfall is not known until late in the design, potentially causing a costly iteration of some steps in the design flow.
In summation, the complexity introduced by these issues in current SoC designs increases the number of possible interactions and situations to be considered by a few orders of magnitude. That factor, coupled with the lack of a proper method for specifying system-level interactions and analyzing them at the proper level of abstraction, makes designing SoCs a very risky, unpredictable, and therefore costly business.