'C' lands on FPGAs to make embedded multicore computing a reality
April 12, 2016
Through intense innovation and development, the primary face of embedded computing has changed constantly throughout the decades, but it’s only relatively recently that parallel processing by means of multiple processor cores has even been possible. Some have argued that the single-purpose nature of embedded computers rendered multicore processing unattractive, and it’s true that even today there exists a multitude of legacy applications where single-core performance remains king – a result of developers not designing for multicore. Yet given the predominantly asymmetric nature of processing tasks within an embedded system, spreading them across multiple cores permits lower clock rates and consequently an overall reduction in power consumption – a factor more critical in embedded systems than ever before.
Those in the technology world have a natural desire to avoid duplication of effort and reinventing the wheel whenever possible. To this end, even form factors that have existed for decades, such as the ubiquitous PC/104, remain strong, as legacy users always seek the upgrade path that demands the least effort. Changes in the availability of electronic components generally can’t be controlled by the system integrator, but the mechanics of a solution can. The industry’s continuation of one of the original embedded form factors offers innumerable legacy PC/104 projects an easy upgrade path, but there’s a problem.
The peripheral bus of the original PC/104 is ISA. PCI was later drafted in to offer a higher-speed bus, giving rise to PCI/104 as well as PC/104-Plus, which provides both PCI and ISA. Embedded developers still use PCI to integrate (what today we’d think of as) low-bandwidth peripheral functions, but if PC/104 is to survive as a form factor, an even higher-speed peripheral bus needs to become available and, perhaps even more importantly, standardized.
The objective of the EMC² (embedded multicore systems for mixed-criticality applications in dynamic and changeable real-time environments) concept was to kill two birds with one stone. Introducing the high-speed PCI Express (PCIe) expansion bus has permitted integration with the latest bandwidth-hungry peripheral functionality, such as video streams. Retaining the PC/104 form factor yet facilitating multiple cores and processors opens up doors to revitalize creaking legacy applications, or to develop entirely new ones with the familiarity of the form factor. The stack-through PCIe interface also allows multiple processor modules to be stacked, in itself multiplying the available cores and/or processors and, compared to the customary backplane solution, saves cost and space.
One of the biggest challenges is attaining the reliability across multiple independent processing cores needed to achieve safety-critical status. Embedded solutions are relied upon more than ever to undertake such tasks, and the requirements for proving that safety criticality are more stringent than ever before.
Analogous to the benefits of splitting asymmetric processing tasks over multiple cores, the trend to siphon specific processing threads into disparate hardware elements designed to execute them, such as FPGAs, is prevalent today. With the explosion in their popularity, ever-increasing complexity, and an entirely new language to code them with, a skills and resource vacuum has quickly ensued.
In combination with expensive tools, the limited availability and restrictive cost of expert VHDL programmers have held back the universal deployment of FPGAs. Many system developers are still shying away from the technology and undertaking invariably heavy mathematical and algorithm-based processing in a generic processor instead, at the expense of power consumption and heat dissipation requirements, amongst others. The other hurdle is that few vendors offer a commercially viable route to integrate FPGA fabric into an off-the-shelf system. Those choosing PC/104 did so because it provided the optimum platform commercially; designing a PCB to include the FPGA circuitry simply wasn’t viable.
Xilinx seized the opportunity by commercializing an all-programmable system-on-chip (SoC) solution, the Zynq SoC with SDSoC development environment (see Figure 1). The Zynq SoC combines the software programmability of an ARM processor with the hardware programmability of an FPGA. The ARM processor handles the more traditional processing threads, while the FPGA manages the heavily mathematical functions expediently and with minimal power consumption.
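As a rough illustration of the kind of heavily mathematical function that suits FPGA fabric, the following is a minimal sketch of a fixed-point FIR filter in plain C. It is not taken from Xilinx or SDSoC material; the function name, the tap count, and the coefficients are hypothetical choices made for the example, and tool-specific pragmas and build settings are deliberately left out. The point is the coding style: fixed loop bounds, integer arithmetic, and no dynamic memory, which is the general shape of code that C-to-hardware tools tend to handle well.

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_TAPS 4  /* hypothetical filter length, fixed at compile time */

/* Hypothetical coefficients: a 4-tap moving average in Q15 fixed point
   (each tap is 0.25, i.e. 8192 / 32768). */
static const int16_t fir_coeff[NUM_TAPS] = { 8192, 8192, 8192, 8192 };

/* Filters n samples from in[] into out[].
   Fixed loop bounds, integer-only arithmetic, no dynamic memory and no
   recursion: properties that make a function a plausible candidate for
   moving from the processor into programmable logic. */
void fir_filter(const int16_t *in, int16_t *out, size_t n)
{
    int16_t delay[NUM_TAPS] = { 0 };  /* most recent samples, newest first */

    for (size_t i = 0; i < n; i++) {
        /* Shift the delay line and insert the new sample. */
        for (int t = NUM_TAPS - 1; t > 0; t--) {
            delay[t] = delay[t - 1];
        }
        delay[0] = in[i];

        /* Multiply-accumulate across all taps in a 32-bit accumulator. */
        int32_t acc = 0;
        for (int t = 0; t < NUM_TAPS; t++) {
            acc += (int32_t)delay[t] * fir_coeff[t];
        }

        /* Scale back from Q15. For negative sums this relies on the
           compiler using an arithmetic right shift, as most do. */
        out[i] = (int16_t)(acc >> 15);
    }
}
```

On a processor the inner multiply-accumulate loop runs one tap at a time; in programmable logic the same loop could, in principle, be unrolled so that all taps are computed in parallel, which is where the performance and power advantage comes from.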
With few exceptions, an embedded system has requirements for I/O. If these requirements are merely slow-speed, then USB could suffice. However, with today’s applications demanding integration of high-speed cameras or a high-speed ADC/DAC, an FPGA is required. The Zynq offers the ARM processor for USB and the FPGA to satisfy the high-speed need. The SDSoC development environment provides analytics to determine which processing element should execute each function to achieve optimal performance and power consumption.
While the concept was first conceived by Dr. Ian Page of Oxford University more than 20 years ago, the SDSoC development tools allow FPGA fabric programming in more traditional C/C++, allowing the power of FPGAs to be leveraged by a far wider audience than ever before. This quiet revolution effectively allows any developer, down to those even just starting their coding journeys, to leverage the massive potential of FPGAs.
Page, FIEE, CEng of Oxford University, remarks: “The core of my hardware compilation or computing without computers approach was actually developed starting in 1990, although it was only in the mid ’90s that it was given a C-like syntax as Handel-C. Prior to that, Handel was cast in an abstract syntax form. This was very deliberate so that the programmer or meta-programmer could write proofs and transformations of the Handel programs. This allowed things like the automatic generation of heterogeneous parallel implementations, which were provably correct with respect to the original software.
“Although it is not the most parallel example, one particular transformation was the processor introduction transformation. This took in the software program you were interested in, created a processor design specifically to support that program, then compiled the program – partially or completely – onto the processor, and then implemented the whole thing in an FPGA. It was a bit like asking ARM to create an application-specific variant of one of their processors just for you, and then getting it back from them five minutes later.”
Mark Jensen, director of corporate software strategy at Xilinx, asserts: “The SDSoC development environment provides a familiar embedded C/C++ application development experience including an easy-to-use Eclipse IDE and a comprehensive design environment for heterogeneous Zynq All Programmable SoC and MPSoC deployment. Complete with the industry’s first C/C++ full-system optimizing compiler, SDSoC delivers system-level profiling, automated software acceleration in programmable logic, automated system connectivity generation, and libraries to speed programming. It also enables end-user and third-party platform developers to rapidly define, integrate, and verify system-level solutions and enable their end customers with a customized programming environment.”
So how can this progress be translated into a commercially viable embedded solution for these customers? One that offers these advantages, but without becoming so specific that few applications end up falling within its scope? Progress has been steady for years, and integrating FPGA fabric into embedded was a critical step. The answer lies in combining the single-board solution of PC/104 and the performance scalability of the ever-popular system-on-module (SOM) methodology of design that combines the advantages of off-the-shelf SBCs and custom design. Enter the EMC2 range from Sundance Multiprocessor Technology.
The EMC2 range offers a PC/104 format carrier board with integrated PCIe expansion and a SOM interface, providing the flexibility and scalability necessary to encompass the plethora of potential embedded applications it targets. The latest offering, the EMC2-Z7030, features the Xilinx Zynq SoC with an integrated dual-core ARM Cortex-A9 processor combined with Kintex-7 FPGA technology (see Figure 2). This combination offers a performance increase of up to a factor of 100 over a typical FPGA SoC, while power consumption remains the same. This is demonstrated via on-the-fly video processing in this video, achieved by migrating a C program running on an ARM processor to FPGA fabric.
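To give a feel for what a C program suitable for such a migration can look like, here is a minimal sketch of a per-pixel video kernel that converts an RGB frame to grayscale. It is not the code used in the demonstration; the function name, the interleaved RGB888 frame layout, and the integer weights are assumptions chosen for illustration. Each output pixel depends only on its own input pixel, so the loop body is the kind of work that could be streamed through FPGA fabric rather than executed pixel by pixel on the ARM cores.

```c
#include <stddef.h>
#include <stdint.h>

/* Converts an interleaved RGB888 frame to 8-bit grayscale.
   The weights 77/150/29 sum to 256 and are a common integer
   approximation of the 0.299/0.587/0.114 luma weights, so the
   division becomes a simple right shift by 8. */
void rgb_to_gray(const uint8_t *rgb, uint8_t *gray, size_t num_pixels)
{
    for (size_t i = 0; i < num_pixels; i++) {
        uint32_t r = rgb[3 * i + 0];
        uint32_t g = rgb[3 * i + 1];
        uint32_t b = rgb[3 * i + 2];

        /* Weighted sum is at most 256 * 255, so the shifted result
           always fits in 8 bits. */
        gray[i] = (uint8_t)((77u * r + 150u * g + 29u * b) >> 8);
    }
}
```

A caller would pass one frame at a time, for example a 640 x 480 frame as a buffer of 640 * 480 * 3 bytes with num_pixels set to 640 * 480; the frame size here is likewise only an example.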
Flemming Christensen, managing director of Sundance Multiprocessor Technology Ltd., states: “The combination of Sundance’s EMC2 boards and the Xilinx SDSoC development environment is another significant step forward for embedded systems design, enabling systems engineers to take advantage of the Zynq SoC’s combination of a popular ARM-A9 CPU and the flexible and fast I/O associated with FPGA technology.”
The EMC2-Z7030 is highly flexible in that it can operate stand-alone, offering the core processor and FPGA combination, yet can also be used as a peripheral card alongside an x86 CPU module to vastly extend general-purpose processing capability.
Achieving commercially viable platforms around FPGAs invariably involves squeezing into the smallest FPGA fabric one possibly can. With the flexibility of the EMC2-Z7030, designers can develop and optimize their platform on a large and fast Zynq SoC to shorten development time, then scale down for production – or even maintain a scalable product range at the production phase if sub-models of their solution with varying complexity levels are demanded. The concept is far from new, but the newfound affordability drives this technology from niche into mainstream embedded computing. All that’s needed to truly stake the FPGA’s claim over traditional single-core embedded solutions is for those SoC FPGAs to cost $10 rather than $100.
This solution is part of a $100 million industry-wide EU research and development project into “embedded multicore systems for mixed criticality,” namely, EMC².