Heterogeneous LEGaTO Hardware

November 18, 2020



At the time of writing, the fastest supercomputer in the world delivers 442 petaflops. However, the official goal of various high-performance computing (HPC) centers is to achieve more than twice that: 1 exaflop, or 10^18 floating-point operations per second.

If today’s most efficient supercomputer, the NVIDIA DGX SuperPOD, were scaled up to deliver 1 exaflop of performance, it would consume 38 megawatts of power. That is about the average power consumption of six small towns of 10,000 citizens each. Meanwhile, demand for computational power and data centers keeps growing, serving not only big companies and research institutions but also the general public through websites, multimedia streaming, and similar services.
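The figures above can be checked with a few lines of arithmetic. The following is a minimal sketch that uses only the numbers quoted in this article (442 petaflops, 1 exaflop, 38 megawatts, six towns of 10,000 citizens); the constant and function names are illustrative and not part of any LEGaTO tool.

```python
# Back-of-the-envelope check of the figures quoted in the article.
# All names here are illustrative assumptions, not LEGaTO APIs.

EXAFLOP = 1e18               # target: floating-point operations per second
FASTEST_SYSTEM = 442e15      # fastest system at the time of writing (442 petaflops)
POWER_AT_EXASCALE_W = 38e6   # scaled-up DGX SuperPOD: 38 megawatts, in watts
TOWNS = 6
CITIZENS_PER_TOWN = 10_000


def exascale_gap(target=EXAFLOP, current=FASTEST_SYSTEM):
    """How many times faster than the current leader the target is."""
    return target / current


def efficiency_gflops_per_watt(flops=EXAFLOP, watts=POWER_AT_EXASCALE_W):
    """Energy efficiency implied by 1 exaflop at 38 MW, in gigaflops per watt."""
    return flops / watts / 1e9


def watts_per_citizen(watts=POWER_AT_EXASCALE_W,
                      towns=TOWNS, citizens=CITIZENS_PER_TOWN):
    """Average power per citizen implied by the six-towns comparison."""
    return watts / (towns * citizens)


if __name__ == "__main__":
    print(f"Exascale is {exascale_gap():.2f}x the fastest system")
    print(f"Implied efficiency: {efficiency_gflops_per_watt():.1f} GFLOPS/W")
    print(f"Implied average draw: {watts_per_citizen():.0f} W per citizen")
```

Under these inputs, the exaflop target is roughly 2.3 times the performance of the fastest system, the implied efficiency is about 26 gigaflops per watt, and the town comparison corresponds to an average of roughly 630 watts per citizen.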

This example shows that the energy efficiency of data centers must increase drastically, which is the primary ambition of the LEGaTO project. While the world’s big HPC centers target the maximum total performance of their supercomputers, the LEGaTO project aims to solve the challenge of energy-efficient computing.

LEGaTO develops, deploys, and demonstrates a software stack to support energy-efficient heterogeneous computing, using a naturally energy-efficient, task-based programming model coupled with a dataflow runtime while simultaneously ensuring security, resilience, and programmability. Four LEGaTO use cases in the areas of healthcare, machine learning, and IoT for smart homes and cities demonstrated energy-efficiency gains of between 5x and 15x, with one specific part even achieving gains of more than 800x.

The massive increase in energy efficiency is possible thanks to the combination of software that has been optimized with LEGaTO tools and carefully selected heterogeneous hardware that is custom-tailored to the specific use case. LEGaTO uses the modular RECS|Box cluster server (called RECS|Box Arneb and RECS|Box Antares) as a baseline. It supports arbitrary mixtures of high-performance Arm server processors, low-power Arm embedded/mobile SoCs, traditional x86 processors, GPUs, and FPGAs in a heterogeneous, densely integrated environment, and couples them with a powerful communications infrastructure.

This modularity is achieved by plugging different microservers, also known as computer-on-modules (COMs), onto one of two versions of carrier blade:

  • A COM Express carrier blade that supports three COM Express type 6 or type 7 modules, which boot from a local M.2 SSD and can be extended with a PCI Express card or additional 2.5-inch HDDs/SSDs.
  • A low-power carrier blade that supports 16 NVIDIA Jetson TX2 microservers.
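The carrier blade concept can be pictured as a small data model. The sketch below is purely illustrative: the class and function names are hypothetical and do not correspond to any RECS|Box management software. It assumes only what the list above states, namely a blade with three slots for COM Express type 6 or type 7 modules and a blade with 16 slots for NVIDIA Jetson TX2 microservers.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Microserver:
    """A computer-on-module; the form factor decides which blade accepts it."""
    name: str
    form_factor: str


@dataclass
class CarrierBlade:
    """Hypothetical model of a carrier blade with a fixed number of slots."""
    name: str
    slots: int
    accepted: frozenset
    modules: list = field(default_factory=list)

    def free_slots(self):
        return self.slots - len(self.modules)

    def plug(self, module):
        # Reject modules whose form factor the blade does not accept.
        if module.form_factor not in self.accepted:
            raise ValueError(f"{self.name} does not accept {module.form_factor}")
        # Reject modules once every slot is occupied.
        if self.free_slots() == 0:
            raise ValueError(f"{self.name} has no free slot")
        self.modules.append(module)


def com_express_blade():
    """Blade with three slots for COM Express type 6 or type 7 modules."""
    accepted = frozenset({"COM Express type 6", "COM Express type 7"})
    return CarrierBlade("COM Express carrier", 3, accepted)


def jetson_blade():
    """Low-power blade with 16 slots for NVIDIA Jetson TX2 microservers."""
    return CarrierBlade("Jetson carrier", 16, frozenset({"Jetson TX2"}))


if __name__ == "__main__":
    blade = com_express_blade()
    # Module names are made up for illustration.
    blade.plug(Microserver("x86-node", "COM Express type 6"))
    blade.plug(Microserver("fpga-node", "COM Express type 7"))
    print(blade.free_slots(), "slot(s) left on", blade.name)
```

Modeling the form factor as the only compatibility criterion is a simplification; a real chassis would also have to respect power, cooling, and firmware constraints.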

Because the COM Express standard is widely used in industry, numerous third-party microservers, covering a wide range of Intel and AMD CPUs, can equip the Durin or Deneb barebone chassis. More specialized microservers integrating FPGAs from Xilinx and Intel, as well as server-grade ARM64 microservers, have been developed in-house and are available for the RECS|Box systems.

Besides the RECS|Box system, which is focused on cloud applications, LEGaTO partners Bielefeld University and christmann IT developed a new edge server that allows a blend of heterogeneous compute architectures: the christmann t.RECS. For this, they leveraged the PICMG COM-HPC industry standard to allow better integration of microservers into LEGaTO’s edge server. The t.RECS provides three slots for COM-HPC microservers, supporting COM-HPC Client B and C microservers and COM-HPC Server D microservers. Apart from COM-HPC, other COM form factors such as COM Express and NVIDIA Jetson AGX are supported. Each microserver has a local M.2 SSD, multiple 10 GbE connections, and a high-speed, low-latency PCI Express network connection. This PCI Express network is attached to a central PCIe switch that connects PCI Express cards to one or more microservers via I/O virtualization. The microservers can also communicate with each other through this switch in a special host-to-host mode, which scales across multiple t.RECS systems.

(Figure 1 | The LEGaTO project accepts two carrier cards that support COM-based microservers like PICMG’s new COM-HPC standard.)

The biggest advantage of these modular designs is the ability to equip different microservers with heterogeneous architectures. This provides the flexibility to tune a platform to the exact needs of the application, in both edge and cloud environments.

Furthermore, it offers an easy upgrade path to the latest chip technology: users can replace some or all of the microservers while leaving the rest of the system (chassis, SSDs, GPUs, networking, etc.) unchanged. Upgrading a system in this way after three to five years massively reduces total cost of ownership (TCO) compared to traditional servers, which must be replaced entirely.

For more information, visit https://legato-project.eu.

Micha vor dem Berge is Team Manager for Server Development at christmann IT.