Hot Chips: Alibaba's Ultra High-Performance Superscalar Processor - XuanTie910

By Abhishek Budholiya

Head of Digital Marketing

Future Market Insights

November 16, 2020

At the Hot Chips 2020 conference, Alibaba announced the Xuantie-910, an ultra-high-performance RISC-V core with an AI acceleration engine, based on RISC-V RV64GCV. It delivers around 40% higher performance than the SiFive U74.

It comes with version 0.7.1 of the RISC-V vector extension and an Sv39 memory management unit, plus 8/16-entry physical memory protection (PMP). Sv39 is a virtual-memory system designed for RV64 systems; it supports a 39-bit virtual address space divided into 4 KiB pages. Some of the key highlights of the XT910 are:

  • Ultra High-Performance Superscalar Processor 
  • RISC-V Compatible plus RISC-V Turbo Technology 
  • Dual issue Out-of-Order Memory Subsystem 
  • AI Vector Acceleration Engine
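
To make the Sv39 layout mentioned above more concrete, here is a minimal Python sketch of how a 39-bit virtual address breaks down with 4 KiB pages: a 12-bit page offset plus three 9-bit virtual page number (VPN) fields, one per page-table level, as defined in the RISC-V privileged specification. The function name `split_sv39` and the returned dictionary keys are illustrative choices, not anything from Alibaba's presentation, and the sketch only looks at the low 39 bits of an address.

```python
PAGE_OFFSET_BITS = 12  # 4 KiB pages -> 2**12 bytes per page
VPN_BITS = 9           # each page-table level indexes 512 entries
LEVELS = 3             # 12 + 3 * 9 = 39 bits in total


def split_sv39(va: int) -> dict:
    """Split a 39-bit virtual address into its VPN fields and page offset.

    Illustrative helper only; the name and return shape are assumptions.
    """
    if not 0 <= va < (1 << 39):
        raise ValueError("expected a 39-bit virtual address")
    offset = va & ((1 << PAGE_OFFSET_BITS) - 1)
    vpn = [
        (va >> (PAGE_OFFSET_BITS + VPN_BITS * level)) & ((1 << VPN_BITS) - 1)
        for level in range(LEVELS)
    ]
    return {"vpn2": vpn[2], "vpn1": vpn[1], "vpn0": vpn[0], "offset": offset}


if __name__ == "__main__":
    # Address 0x1234 sits in page 1 at byte offset 0x234.
    print(split_sv39(0x1234))
```

Each VPN field selects one entry at its level of the three-level page table, and the offset selects a byte within the final 4 KiB page.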

The XT910 uses a homogeneous multi-core architecture with up to 4 cores per cluster. Each core has a 32 KB or 64 KB L1 instruction cache and a 32 KB or 64 KB L1 data cache, and each cluster shares an L2 cache.

The front end of the pipeline consists of 7 stages. The instruction fetch unit can fetch 8 instructions per cycle. The instruction decode unit can decode 3 instructions per cycle, rising to 4 instructions per cycle with the use of physical registers. The out-of-order engine can issue up to 8 instructions per cycle.
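
One way to read these per-stage figures is that sustained throughput cannot exceed the narrowest stage, while the wider fetch and issue stages mainly absorb bursts. The short Python sketch below applies that simplified model to the numbers quoted above. The function `narrowest_stage` is a hypothetical helper, and the model deliberately ignores details such as instructions being split into multiple operations, so treat it as a rough illustration rather than a description of the actual microarchitecture.

```python
def narrowest_stage(stage_widths: dict) -> tuple:
    """Return (stage name, width) for the stage with the smallest per-cycle width.

    Simplified model: sustained instructions per cycle are bounded by this width.
    """
    name = min(stage_widths, key=stage_widths.get)
    return name, stage_widths[name]


if __name__ == "__main__":
    # Per-cycle widths as quoted in the article.
    base = {"fetch": 8, "decode": 3, "issue": 8}
    boosted = {"fetch": 8, "decode": 4, "issue": 8}
    print(narrowest_stage(base))
    print(narrowest_stage(boosted))
```

Under this model, decode is the limiting stage in both configurations, which is why raising it from 3 to 4 instructions per cycle matters more than widening fetch or issue further.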

The back end of the pipeline features out-of-order memory access, dedicated branch processing, and out-of-order vector computing. It has multiple execution units, including 2x single-cycle ALUs, 1x single-cycle branch jump unit, 1x dual-issue load and store unit, 2x scalar floating-point units, and 2x vector execution units.

It supports 16-, 32-, and 64-bit floating-point operations and 8-, 16-, 32-, and 64-bit integer operations. Vector loads and stores access the L1 cache directly. Overall, the vector engine delivers more than 300 GFLOPS of FP16 computing power per CPU cluster.
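
As a rough sanity check on the 300 GFLOPS figure, peak throughput can be estimated as cores x vector units x lanes x operations per lane per cycle x clock frequency. The Python sketch below does that arithmetic. The 4 cores per cluster and 2 vector units per core come from the article; the 128-bit vector unit width, the fused multiply-add counted as 2 operations, and the 2.5 GHz clock are assumptions made purely for illustration and are not stated in the article.

```python
def peak_gflops(cores, vector_units, vector_bits, element_bits,
                flops_per_lane, clock_ghz):
    """Estimate theoretical peak GFLOPS for a cluster of identical cores."""
    lanes = vector_bits // element_bits  # elements processed per unit per cycle
    return cores * vector_units * lanes * flops_per_lane * clock_ghz


if __name__ == "__main__":
    estimate = peak_gflops(
        cores=4,            # from the article: up to 4 cores per cluster
        vector_units=2,     # from the article: 2x vector execution units
        vector_bits=128,    # ASSUMPTION: width of each vector unit
        element_bits=16,    # FP16 elements
        flops_per_lane=2,   # ASSUMPTION: one fused multiply-add = 2 FLOPs
        clock_ghz=2.5,      # ASSUMPTION: illustrative clock frequency
    )
    print(f"{estimate} GFLOPS")
```

With these assumed values the estimate comes to 320 GFLOPS, which is consistent with the "more than 300 GFLOPS" claim; different width or clock assumptions would change the result proportionally.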

Beyond its internal use of the XT910, Alibaba also promotes edge computing applications such as edge servers, industrial control, and ADAS based on the Wujian SoC platform. The first chip based on the XuanTie C906 (RV64GCV) has now appeared on a basic development board that starts at $12.50 and is expected to ship within 1-2 months.

(Source: All the images were taken from the HotChips Conference 2020 presentation slides by Alibaba.) 

About the Author

Abhishek Jadhav is a senior undergraduate student in Electronics and Telecommunication engineering and a technical author. As a RISC-V Ambassador, he represents RISC-V at a global level and is the first Indian and the only student in that role. He leads an Open Hardware Developer Community across India with more than 80 students. Abhishek is enthusiastic about 5G and its implementation in smart cities and AIoT.

Digital marketing and brand development consultant at Future Market Insights (FMI), one of the fastest growing market research companies across the globe. With a blend of analytical and creative approach, I strive to deliver the best online experience for businesses.
