NVIDIA Enables Era of Interactive Conversational AI with New Inference Software

January 02, 2020



NVIDIA TensorRT 7's Compiler Delivers Real-Time Inference for Smarter Human-to-AI Interactions.

NVIDIA, a technology company that designs graphics processing units for the gaming and professional markets and system-on-a-chip units for the mobile computing and automotive markets, has introduced inference software that developers can use to deliver conversational AI applications with lower inference latency and more interactive engagement.

NVIDIA TensorRT 7, according to the company, opens the door to smarter human-to-AI interactions, enabling real-time engagement with applications such as voice agents, chatbots and recommendation engines.

An estimated 3.25 billion digital voice assistants are in use in devices around the world, according to Juniper Research. By 2023, that number is expected to reach 8 billion, more than the world’s total population.

TensorRT 7 features a new deep learning compiler designed to optimize and accelerate the recurrent and transformer-based neural networks needed for AI speech applications. According to the company, this speeds the components of conversational AI by more than 10x compared to when run on CPUs, driving latency below the 300-millisecond threshold considered necessary for real-time interactions.
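To make the latency arithmetic concrete, here is a minimal sketch of how a uniform 10x speedup could bring a conversational pipeline under the 300-millisecond threshold. The per-stage CPU latencies and the stage names are hypothetical values chosen for illustration, not measurements from NVIDIA, and "more than 10x" is treated as exactly 10x.

```python
# Hypothetical per-stage CPU latencies in milliseconds (illustrative only,
# not measurements from NVIDIA or from this article).
CPU_LATENCY_MS = {
    "speech_recognition": 1200.0,
    "language_understanding": 600.0,
    "text_to_speech": 900.0,
}

REAL_TIME_THRESHOLD_MS = 300.0  # threshold cited in the article
ASSUMED_SPEEDUP = 10.0          # assumption: "more than 10x" taken as exactly 10x


def pipeline_latency_ms(stage_latencies, speedup=1.0):
    """Sum stage latencies after dividing each by a uniform speedup factor."""
    return sum(latency / speedup for latency in stage_latencies.values())


def meets_real_time(total_ms, threshold_ms=REAL_TIME_THRESHOLD_MS):
    """Return True when the end-to-end latency is under the threshold."""
    return total_ms < threshold_ms


cpu_total = pipeline_latency_ms(CPU_LATENCY_MS)
accelerated_total = pipeline_latency_ms(CPU_LATENCY_MS, ASSUMED_SPEEDUP)

print(f"CPU total: {cpu_total:.0f} ms, real time: {meets_real_time(cpu_total)}")
print(f"Accelerated total: {accelerated_total:.0f} ms, "
      f"real time: {meets_real_time(accelerated_total)}")
```

With these assumed numbers, the CPU pipeline totals 2,700 ms and the accelerated pipeline 270 ms, so only the latter falls under the threshold. Real speedups would likely differ per stage.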

Some companies are already taking advantage of NVIDIA’s conversational AI acceleration capabilities. Among these is Sogou, which provides search services to WeChat, a frequently used application on mobile phones.

Rising Importance of Recurrent Neural Networks
With TensorRT’s new deep learning compiler, developers can now automatically optimize recurrent neural networks, such as bespoke automatic speech recognition networks and WaveRNN and Tacotron 2 for text-to-speech, to deliver high performance and low latency.

The new compiler also optimizes transformer-based models like BERT for natural language processing.

Accelerating Inference from Edge to Cloud
According to NVIDIA, TensorRT 7 can optimize, validate and deploy a trained neural network for inference in hyperscale data centers and on embedded or automotive GPU platforms.

NVIDIA’s inference platform, which includes TensorRT, as well as several NVIDIA CUDA-X AI libraries and NVIDIA GPUs, delivers low-latency, high-throughput inference for applications beyond conversational AI, including image classification, fraud detection, segmentation, object detection and recommendation engines. Its capabilities are used by some of the world’s leading enterprise and consumer technology companies, including Alibaba, American Express, Baidu, PayPal, Pinterest, Snap, Tencent and Twitter.

TensorRT 7 will be available in the coming days for development and deployment, without charge to members of the NVIDIA Developer program from the TensorRT webpage. The latest versions of plug-ins, parsers and samples are also available as open source from the TensorRT GitHub repository.

For more information, please visit: https://www.nvidia.com/en-us/#source=pr


Tiera Oliver, editorial intern for Embedded Computing Design, is responsible for web content edits as well as newsletter updates. She also assists with news content by writing and editing stories. Before interning for ECD, Tiera graduated from Northern Arizona University, where she received her B.A. in journalism and political science and worked as a news reporter for the university's student-led newspaper, The Lumberjack.
