True AI Edge Compute Arrives in Highly Integrated Processors, IP

November 27, 2019

The image processing subsystems of almost every high-end smartphone now integrate neural networks. Voice recognition and speech processing experts will argue that what we now call AI has been running at the edge for years.

For the most part, however, these applications have leveraged SoCs and DSPs that were not designed for modern AI workloads. And as AI technology and deployments have progressed, several new engineering challenges have presented themselves:

  • The need for always-on, ultra-low-power systems that can run on battery power for extended periods and offer quick response times for inferencing
  • The requirement for integrated security that protects machine learning graphs from tampering or theft
  • The demand for flexible solutions that can adapt to rapid changes in AI models and algorithms

These trends have raised the stakes for IP and processor vendors looking to serve an embedded AI market that is now projected to be worth $4.6 billion by 2024. These companies are now delivering highly integrated, purpose-built compute solutions to capture a share of this business.

Dialing Down the Power

With AI being deployed in devices as constrained as hearing aids, power consumption has become a first-order consideration for inferencing platforms. Eta Compute has incorporated a patented dynamic voltage frequency scaling (DVFS) technology into its multicore SoCs to serve these use cases.

To conserve power, many conventional processors include a sleep mode that wakes a core when a load is present. However, most of these devices then run the core at its peak rate, which requires more power than many tasks actually need.

With DVFS, Eta Compute devices continuously adjust the supply voltage to match the current workload, dialing it down to the minimum needed to execute tasks within their time constraints. The company’s ECM3531, which is based on an Arm Cortex-M3 and an NXP CoolFlux DSP, is therefore able to deliver 200 kSps at 12-bit resolution while consuming just 1 µW.
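As a minimal illustration (and not Eta Compute’s patented implementation), the C sketch below captures the core idea of a workload-driven DVFS governor: select the lowest voltage/frequency operating point that can still retire the pending work before its deadline, falling back to peak rate only when nothing slower will do. The operating-point table and the set_operating_point() HAL call are hypothetical placeholders.

    #include <stdint.h>

    /* Illustrative workload-driven DVFS governor (hypothetical values). */
    struct op_point { uint32_t freq_khz; uint32_t millivolts; };

    /* Hypothetical operating points, slowest (cheapest) first. */
    static const struct op_point table[] = {
        { 1000, 550 }, { 10000, 700 }, { 48000, 900 }, { 96000, 1100 },
    };
    #define N_POINTS (sizeof table / sizeof table[0])

    extern void set_operating_point(const struct op_point *op); /* HAL stub */

    void dvfs_schedule(uint32_t pending_cycles, uint32_t deadline_us)
    {
        for (unsigned i = 0; i < N_POINTS; i++) {
            /* Cycles this point can retire before the deadline:
             * (freq_khz * 1000 Hz) * (deadline_us / 1e6 s).            */
            uint64_t budget = (uint64_t)table[i].freq_khz * deadline_us / 1000;
            if (budget >= pending_cycles) {   /* slowest point that fits */
                set_operating_point(&table[i]);
                return;
            }
        }
        set_operating_point(&table[N_POINTS - 1]); /* fall back to peak */
    }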

Data Set Lockdown

On-chip training data sets that are referenced during inferencing operations have been found to be exploitable. These data sets represent highly valuable intellectual property for most AI companies and can potentially be stolen. Worse, altering pixels in an image recognition data set can make an inferencing engine misidentify objects or fail to identify them at all.

One highly publicized instance of this occurred when researchers tricked Google AI into believing a rifle was a helicopter. Now imagine an autonomous vehicle’s AI believing a pedestrian was a trash bag. Scarier still, these pixel alterations often can’t be detected by human engineers attempting to debug their software.

IP blocks like Synopsys’ DesignWare EV7x processor include a vision engine, DNN accelerator, and tightly coupled memory to deliver up to 35 TOPS of energy-efficient performance. However, an understated feature of the EV7x processor is an optional AES-XTS encryption engine that helps protect data passing from on-chip memory to the vision engine or DNN accelerator.
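The EV7x performs this encryption in dedicated hardware, but the underlying primitive is standard. As a software sketch of what AES-XTS protection of a weight buffer involves (shown here with OpenSSL’s EVP API, not Synopsys’ implementation), consider:

    #include <openssl/evp.h>
    #include <stdint.h>
    #include <string.h>

    /* Encrypt a buffer of model data with AES-256-XTS. XTS takes a 64-byte
     * key (two 256-bit halves) and a 16-byte "tweak" that conventionally
     * encodes the address of the block being protected. len must cover at
     * least one 16-byte block. Returns bytes written, or -1 on error.    */
    int encrypt_weights(const uint8_t key[64], uint64_t block_addr,
                        const uint8_t *in, uint8_t *out, int len)
    {
        uint8_t tweak[16] = {0};
        memcpy(tweak, &block_addr, sizeof block_addr); /* address as tweak */

        EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();
        if (!ctx)
            return -1;

        int outl = 0, finl = 0;
        int ok = EVP_EncryptInit_ex(ctx, EVP_aes_256_xts(), NULL, key, tweak)
              && EVP_EncryptUpdate(ctx, out, &outl, in, len)
              && EVP_EncryptFinal_ex(ctx, out + outl, &finl);
        EVP_CIPHER_CTX_free(ctx);
        return ok ? outl + finl : -1;
    }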

Flexibility for Future Models

From DNNs to RNNs to LSTMs, dozens of neural network types have emerged in just the last few years. While these represent exciting innovation in AI software and algorithms, they also pose a significant problem for compute devices that are optimized for specific types of workloads.

An ASIC can take anywhere from six months to two years to go from design to tapeout, which can accelerate obsolescence for highly specialized solutions. FPGAs have gained significant traction in the AI engineering space for precisely this reason.

Xilinx devices like the popular Zynq and MPSoC platforms are hardware- and software-reprogrammable. This means that logic blocks can be optimized for today’s leading neural nets, then reconfigured months or years down the line when algorithms have evolved.

Going a step further, a feature called Dynamic Function eXchange (DFX) permits a system to download partial bit files that modify logic blocks on the fly. This can occur while a device is deployed and operational, essentially adding, removing, or changing the functionality of a single Xilinx device in the field.
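On a Zynq or MPSoC target running Linux, one common path for loading such a partial bit file is the kernel’s FPGA Manager framework, which exposes reconfiguration hooks through sysfs. The sketch below assumes the partial bitstream has already been copied to /lib/firmware; the file name is illustrative.

    #include <stdio.h>

    /* Write a value to a sysfs attribute; returns 0 on success. */
    static int write_sysfs(const char *path, const char *val)
    {
        FILE *f = fopen(path, "w");
        if (!f) return -1;
        int ok = fputs(val, f) >= 0;
        return (fclose(f) == 0 && ok) ? 0 : -1;
    }

    /* Reconfigure one region, e.g. load_partial("dnn_region.bit.bin"). */
    int load_partial(const char *bitfile)
    {
        /* Bit 0 of flags selects partial (not full) reconfiguration. */
        if (write_sysfs("/sys/class/fpga_manager/fpga0/flags", "1"))
            return -1;
        /* Writing the firmware name triggers the reconfiguration. */
        return write_sysfs("/sys/class/fpga_manager/fpga0/firmware", bitfile);
    }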

Production-Ready AI at the Edge

The expectations for AI edge computing now resemble what we projected for the IoT only a few years ago: just as we expect trillions of “things” to be connected, we now assume the vast majority of them will be (artificially) intelligent.

While previous-generation solutions laid the foundation, next-generation solutions require a new suite of features to ensure commercial success. Processor and IP vendors are responding by integrating more and more capability into AI edge compute devices.