Life on the Edge: How AI and the NVIDIA Jetson Ecosystem are Transforming Industry, Transportation, and Logistics
October 03, 2019
Artificial intelligence (AI) research and development has been ongoing for decades. But the technology’s application in real-world industry is fledgling, to say the least. OEMs, ODMs, system integrators, and end users hoping to deploy AI and machine learning (ML) capabilities in systems ranging from automated manufacturing inspection to smart transit systems have been mired in technical challenges:
- The inability to compress neural network models into a footprint suitable for deployment on embedded processors at the edge
- The lack of suitable embedded compute platforms with the performance required for tasks like image recognition and object classification, as well as the performance per watt to satisfy resource-constrained edge environments
- A dearth of sufficiently hardened AI edge compute platforms that could be reasonably deployed in harsh edge environments
- Prohibitive costs for solutions that meet one or more of the above requirements
However, the rapid evolution of technology is quickly remedying these setbacks. With the advent of NVIDIA’s deep learning software development stack over the last several years and the subsequent release of Jetson embedded processor modules, many of the aforementioned barriers started to fall.
The following describes how one AI engineering company leveraged the NVIDIA technology ecosystem to rapidly deploy accurate, cost-effective, commercial-ready, and industrial-grade optical character recognition (OCR) and automated license plate recognition (ALPR) solutions on a potentially massive scale.
Why the shipping industry needs AI at the Edge
Millions of shipping containers pass through the world’s seaports every year. Every one of them needs to be tracked, and that tracking actually starts well before containers are sorted and loaded onto shipping vessels. It begins when they first arrive at port.
Any form of tracking begins with a unique identifier, and in the world of shipping containers, these are provided via ISO 6346 unique identification numbers. An ISO 6346 number consists of an owner prefix, an equipment category identifier, a serial number, and a check digit, accompanied by codes indicating the country and size of the container (Figure 1). These markings appear on each side of the container.
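The standard also specifies a check digit computed from the first ten characters of the ID, which an OCR pipeline can use to reject misreads before they ever reach a tracking database. A minimal validation sketch in Python (the letter-value table and doubling weights follow the standard's published algorithm; the sample ID is a commonly cited example, not one of the port's containers):

```python
# ISO 6346 letter values: A=10, then ascending but skipping multiples of 11.
LETTER_VALUES = {c: v for c, v in zip(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ",
    [10, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 23, 24,
     25, 26, 27, 28, 29, 30, 31, 32, 34, 35, 36, 37, 38])}

def iso6346_check_digit(prefix: str) -> int:
    """Compute the check digit for the first 10 characters, e.g. 'CSQU305438'."""
    total = sum((LETTER_VALUES[c] if c.isalpha() else int(c)) * 2**i
                for i, c in enumerate(prefix))
    return (total % 11) % 10  # a remainder of 10 wraps around to 0

def is_valid(container_id: str) -> bool:
    """True if an 11-character ID's final digit matches its computed check digit."""
    cid = container_id.replace(" ", "").upper()
    return (len(cid) == 11 and cid[10].isdigit()
            and iso6346_check_digit(cid[:10]) == int(cid[10]))

is_valid("CSQU3054383")  # valid: computed check digit is 3
is_valid("CSQU3054380")  # invalid: a single misread character fails the check
```

Because any single-character OCR error almost always breaks the checksum, a gate system can re-capture a frame rather than log a bad ID.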
As straightforward as these ID numbers are, the tracking systems built around them remain antiquated. At one European port, for instance, more than 5,000 vehicles pass through checkpoints each day to drop off and pick up containers from one of two shipping terminals (Figure 2). At these checkpoints, container ID numbers are recorded by hand. This process inevitably leads to delays and large traffic jams at the gates to the port.
Unfortunately, this procedure is common at ports worldwide.
Realizing the inefficiency of this process, officials at the port began investigating alternatives. They arrived at an AI-based video analytics solution from SmartCow.
Deep Learning: From R&D to Reality
SmartCow is a multi-national AI engineering firm focused on deep learning-based computer vision technologies. The company develops proprietary inferencing algorithms for OCR and ALPR, which are supported by 50 employees whose sole responsibility is to label large data sets of images that are fed into deep learning models.
As shown in Video 1, SmartCow’s ALPR inferencing algorithms running against a live CCTV video stream from a traffic camera provide almost instantaneous identification and tracking of unique identifiers – in this case, license plates. It doesn’t take much imagination to think of how this technology could be used to quickly identify individual shipping containers as they enter and exit ports, eliminating the need for people to manually record ISO numbers.
But while demonstrating optical character recognition on a CCTV stream is one thing, deploying it into a real-world environment like a seaport is another. In the case of shipping container OCR, multiple AI-enabled cameras have to be installed at the port’s gates to ensure the ISO numbers on each side of a container match. The camera systems also need to be outfitted with control functionality so that gates can be opened or closed automatically without human intervention. And, of course, sufficient lighting is required so that the computer vision systems are guaranteed a clear view of ISO numbers 24 hours a day.
In many cases, such installations cost upwards of $40,000, without even considering the backend data management systems that are required.
From a technical perspective, the most important requirement of a shipping container OCR system is that it processes video streams and runs neural network algorithms locally. By performing character recognition on or near the device, port authorities can eliminate the latency of sending video streams back to the cloud for analysis, minimize the network transmission costs associated with wirelessly streaming video, and reduce any security concerns related to sending information over the network.
The challenge this presents is that massive computational and memory resources are required to run neural networks. This is why they are typically run in a data center, not at the edge. For some perspective, Figure 3 shows a snapshot of the more than 350,000-image dataset leveraged for the ALPR inferencing algorithm referenced in the video above.
To overcome implementation, cost, and computational demands of the port OCR project, SmartCow turned to the NVIDIA Jetson TX2 Module (Figure 4).
Data Center AI in an Embedded Form Factor
The NVIDIA Jetson TX2 is, more or less, a supercomputer in a 50 mm by 87 mm form factor. Based on the NVIDIA Pascal GPU architecture, the Jetson TX2 Tegra processor integrates 256 GPU cores alongside a dual-core 64-bit NVIDIA Denver 2 CPU and quad-core Arm Cortex-A57 MPCore CPU. All of this compute performance is packaged in a power consumption envelope of 7.5 to 15W.
For real-time video streaming applications like the shipping container tracking project, the Tegra SoC provides 4Kp60 H.264/H.265 video encoders and decoders, as well as two image signal processors. These subsystems help minimize latency, allowing the TX2 to process continuous streams of video over a 1.4 gigapixel-per-second MIPI CSI camera interface.
Another important feature of the Jetson TX2 is the 8 GB of 1866 MHz LPDDR4 memory on board. Memory requirements often skyrocket in AI-enabled systems due to the large number of weights and activations that are needed to process a neural network. This demand only increases in GPU-powered systems because SIMD architectures look to parallelize operations across 1024-bit vector paths (Figure 5). As a result, running a model at 32-bit floating-point precision can demand more than 2 GB of memory.
The Jetson TX2 offsets this by delivering 59.7 GB per second of memory bandwidth, which minimizes bottlenecks in systems like the OCR tracking solution that constantly run computer vision algorithms against high-definition video streams.
But despite the hardware capabilities of the Jetson TX2, NVIDIA is not the only vendor of chips for computer vision. What ultimately sets Jetson products apart is a vibrant software ecosystem. For SmartCow, this meant access to TensorRT, a programmable software platform that accelerates inferencing workloads running on NVIDIA GPUs (Figure 6).
TensorRT is a C++ library based on the CUDA parallel programming model that optimizes trained neural networks for deployment in production-ready embedded systems. It achieves this by compressing the neural net into a condensed runtime engine and adjusting the precision of floating-point and integer operations for the target GPU platform. This correlates directly to improved latency, power efficiency, and memory consumption in deep learning-enabled systems, all of which are of course essential in OCR-type applications.
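TensorRT's internals are proprietary, but the precision-adjustment idea can be illustrated generically. The sketch below shows symmetric INT8 quantization in NumPy; it is a conceptual illustration of reduced-precision inference, not the TensorRT API:

```python
import numpy as np

def quantize_int8(weights: np.ndarray) -> tuple[np.ndarray, float]:
    """Map FP32 weights onto the signed 8-bit range [-127, 127]."""
    scale = float(np.abs(weights).max()) / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Reconstruct approximate FP32 values from the INT8 representation."""
    return q.astype(np.float32) * scale

# Toy weight tensor: 4 bytes per value in FP32, 1 byte per value in INT8.
w = np.array([0.02, -0.51, 0.33, 1.27], dtype=np.float32)
q, scale = quantize_int8(w)
error = float(np.abs(dequantize(q, scale) - w).max())
```

The reconstruction error stays below one quantization step (the scale), while memory traffic per weight drops fourfold, which is why this class of optimization improves latency and power draw on bandwidth-limited edge devices.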
TensorRT is part of the NVIDIA JetPack SDK, a software development suite that includes a Ubuntu Linux OS image, Linux kernel, bootloaders, libraries, APIs, an IDE, and other tools (Figure 7). The SDK is continuously updated by NVIDIA to help accelerate the AI development lifecycle.
OCR Tracking for seaport automation
With the NVIDIA technology stack at its disposal, SmartCow designed a high-performance, low-cost OCR system based on the Jetson TX2 that automated shipping container tracking. The company’s Gatekeeper automatic number plate recognition (ANPR) system is a small-form-factor edge computing device capable of simultaneously running up to four recurrent neural networks (RNNs). As a result, real-time OCR inferencing occurs locally on Gatekeeper, independent of other edge servers or cloud platforms (Figure 8).
In addition to OCR, Gatekeeper can be used for applications like security and access control, as well as control functions like driving entry gates using the Denver 2 and A57 CPUs.
A major benefit of Gatekeeper for the port is that it can simultaneously process up to four 1080p or two 3.4 MP video streams at 30 frames per second. With standard Ethernet, USB, and HDMI interfaces, it easily integrates with existing IP camera infrastructure.
Aside from the obvious upgrade over traditional, manual tracking systems, Gatekeeper’s standard interfaces, rugged design, and ability to run neural network algorithms against multiple HD video streams simultaneously also reduce installation time and cost compared with alternative technology-based solutions.
In fact, SmartCow’s Gatekeeper ANPR solution is priced starting at just $2,500.
Edge Intelligence & ALPR
Use cases like OCR for shipping container tracking offer a proving ground for embedded artificial intelligence and computer vision beyond the data center. And as these technologies transform efficiency and cost in industries like transportation and logistics, other potential use cases will continue to emerge. Many of these will need to scale up in quantity, far beyond a few physical access points, while scaling down in performance, price, and size.
Take ALPR in India, for example. The technology is being deployed there to serve a number of purposes, from checking the registration status of vehicles and electronic toll collection to enforcing traffic laws and general security surveillance. The issue on the subcontinent is that each state uses a different style of license plate, and these styles even vary between different classes of vehicles from the same state. As shown in Figure 9, the variations include different sizes, layouts, colors, script types, and placement on a vehicle.
Today, low-cost ALPR solutions are available in the Indian market for $1,500 plus licensing fees. However, these are packaged with a single 2 MP camera running at 15 fps, which limits their ability to capture high-resolution shots of license plates mounted at different locations on vehicles.
But there are also more technical challenges presented by the wide variance in Indian license plates. For one, the sheer number of ways that plates could be represented in an image means that an ALPR system must be trained against a much larger data set compared to a relatively controlled use case like shipping container tracking. And, because a fixed ALPR system will inevitably perform inferencing on vehicles that are parked or idling at traffic signals, the devices will re-register the same license plate numbers over and over. Both of these facts mean that storage and processing requirements increase.
At the same time, of course, the design requirements of a low-cost, low-power, small-form-factor ALPR system still apply. The ability to perform inferencing locally is also a must.
SmartCow once again turned to an NVIDIA solution, this time the Jetson Nano (Figure 10).
Scaling down to scale out
Like the TX2, the Jetson Nano is a compact 70 mm x 45 mm embedded module built around a Tegra processor with data center lineage. The system on chip at the heart of the board pairs a Maxwell architecture GPU with 128 CUDA cores alongside a quad-core Arm Cortex-A57. This equates to 472 GFLOPS of total performance within a power envelope of just 5 to 10 W.
The Nano SoC also integrates 720p, 1080p, and 4K HEVC video encoders and decoders that deliver 250 MP/s and 500 MP/s of performance, respectively. These are, understandably, key components in an ALPR application that has to apply neural networking algorithms to streaming video in real time.
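Those throughput figures can be sanity-checked with back-of-envelope arithmetic. Assuming 1080p streams at 30 fps (the resolution is an assumption; the article states only the camera counts), four streams fit well inside the decode budget, and even eight would:

```python
def mpixels_per_sec(width: int, height: int, fps: int, streams: int) -> float:
    """Aggregate pixel throughput for a set of identical video streams, in MP/s."""
    return width * height * fps * streams / 1e6

NANO_DECODE_BUDGET = 500.0  # MP/s, per the Nano's stated decoder performance

four_streams = mpixels_per_sec(1920, 1080, 30, 4)   # ~248.8 MP/s
eight_streams = mpixels_per_sec(1920, 1080, 30, 8)  # ~497.7 MP/s

# Both configurations stay within the hardware decoder's throughput.
assert four_streams < NANO_DECODE_BUDGET
assert eight_streams < NANO_DECODE_BUDGET
```

This suggests the practical ceiling on camera count is set less by the decoder itself than by the CPU and memory overhead of moving decoded frames around, a point the Sentinel section below touches on.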
In addition, the Nano supports 4 GB of 1600 MHz LPDDR4 memory for the fast, frequent memory accesses required by deep learning applications, as well as 16 GB of additional eMMC storage. Interfaces such as GbE, HDMI 2.0, eDP, and USB 3.0 are brought out through a companion carrier board.
While smaller, cheaper, and lower power than the TX2, the ingredients present on the Jetson Nano provided a solid foundation for SmartCow’s deep learning-based ALPR solution, Sentinel (Figure 11).
Sentinel is a miniature ALPR system that runs four deep learning algorithms based on a 3 million image data set. The result is near-human accuracy in license plate recognition tasks.
Sentinel currently supports four 30 fps RTSP cameras and can integrate with existing solutions. However, SmartCow is implementing direct memory access (DMA) features that bypass the CPU while video streams are being decoded. By freeing up those resources, one Sentinel box will be able to support six to eight cameras simultaneously in the very near future. The devices retail for around $1,000 with no additional licensing fees, making them less than one-eighth the per-camera price of today’s lowest-cost solutions, with the potential to fall below one-sixteenth once eight-camera support is implemented.
To offset the problem of re-registering images of the same license plate, SmartCow developed a feature called similarity search. This registers the first time a license plate is detected and discards all duplicate images until the vehicle leaves the frame, saving valuable memory space.
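SmartCow has not published how similarity search is implemented, but the deduplication behavior described above can be sketched as a registry that keeps only a plate's first detection until the vehicle leaves the frame. The class, threshold, and sample plate below are hypothetical illustrations, not SmartCow's code:

```python
ABSENT_FRAMES_TO_FORGET = 30  # assumption: ~1 s of absence at 30 fps means "left frame"

class PlateDeduplicator:
    """Register a plate on first sighting; discard repeats while it stays in frame."""

    def __init__(self) -> None:
        self.last_seen: dict[str, int] = {}  # plate text -> frame index last detected

    def register(self, plate: str, frame_idx: int) -> bool:
        """Return True only when this sighting should be stored as a new detection."""
        prev = self.last_seen.get(plate)
        is_new = prev is None or frame_idx - prev > ABSENT_FRAMES_TO_FORGET
        self.last_seen[plate] = frame_idx
        return is_new

dedup = PlateDeduplicator()
dedup.register("MH12AB1234", frame_idx=0)    # True: first sighting, keep the image
dedup.register("MH12AB1234", frame_idx=5)    # False: vehicle still in frame, discard
dedup.register("MH12AB1234", frame_idx=500)  # True: vehicle left and returned later
```

A real system would key on visual similarity rather than raw OCR text alone, so that two misreads of the same plate also collapse into one record.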
In addition to enough storage capacity to save more than 2.5 million images directly on the device, Sentinel also supports low-cost 4G connectivity modules that allow metadata to be streamed back to transit authorities.
All of this at only 10W of power.
Sentinel boxes are being deployed on Marina Beach in Chennai, India, where the integrated CPU can act on inferences to open and close gates, garage doors, or cycle traffic signals.
Moving AI into the Real World
Although the Jetson Nano and TX2 provide the core functionality that powers platforms like SmartCow’s Sentinel and Gatekeeper, off-the-shelf NVIDIA developer kits cannot be deployed in production-grade systems. After all, the kits were designed for prototyping, and both of the applications described here required custom designs.
In real-world deployment settings, both Jetson platforms need a carrier board. The carrier board routes input and output signals through standard connectors, and also provides access to additional functionality. For example, the SmartCow deep learning platforms needed GPIO outputs for control tasks like driving gates and triggering relays, which were delivered over a terminal block in both designs.
Especially in the case of the ALPR solution, additional memory was a prerequisite. To support deep learning algorithms trained on a 3-million-image region of interest (ROI) data set and to store more than 2.5 million images, much more was needed than the 4 GB of RAM and 16 GB of eMMC natively available on the Jetson Nano. This meant adding extra non-volatile memory to the design.
And, of course, all of the systems’ hardware components had to be spec’d out for operation in potentially harsh outdoor environments. Once that was complete, both platforms had to be designed into compact, rugged enclosures that provided ample protection against the elements as well as sufficient heat dissipation.
But perhaps the most critical and most often overlooked element of taking these systems to market was the software stack. As part of the JetPack SDK, Jetson development kit modules come pre-loaded with a stock Linux for Tegra operating system (OS) image from NVIDIA. This provides a great starting point for software prototyping, but these OS images are not suitable for use in commercial environments.
At the same time, maintaining software compatibility with JetPack tools and features, which NVIDIA updates constantly, requires a commercial-grade board support package (BSP). This means creating a modified BSP with a lifecycle that aligns with JetPack releases.
These engineering disciplines fall outside the core competency of an AI, deep learning, and video analytics firm like SmartCow. After evaluating several potential design partners, SmartCow selected Connect Tech Inc. (CTI) to help transition its ALPR and OCR technologies into the real world.
From Prototype to Production Grade
CTI has more than 30 years of experience in embedded hardware, software, and systems design, and is presently NVIDIA’s largest embedded ecosystem partner. To date the company has completed more than 100 custom Jetson-based product designs that are currently deployed in the field, spanning the AGX Xavier, TX1, TX2, and Nano product offerings.
The company partnered with SmartCow on the turnkey design of both the Gatekeeper and Sentinel platforms. In the case of Sentinel, CTI designed the custom, Nano-compatible carrier board that integrated the aforementioned GPIO pins for driving relays and other control functions, as well as additional NVMe storage in support of the platform’s significant memory demands (Figure 12). CTI also removed JTAG headers for additional system security.
The final two-board solution was miniaturized into a 103 mm by 72 mm enclosure with more than adequate heat dissipation characteristics (Table 1).
Another substantial feature added to the Sentinel platform by CTI was support for 4G cellular modules over the M.2 interface. This allows customers to simply drop in a 4G dongle to gain access to wireless backhaul networks, allowing metadata to be transmitted back to operations centers and software updates to be deployed over-the-air to the device without the need for any cabling. This is a particularly economical option in India, where 30 GB of LTE service can be had for a modest $3.
Perhaps the greatest benefit of the SmartCow/CTI partnership was the ability to complete a turnkey design quickly. CTI modified the BSP from the stock Linux for Tegra OS to a custom, small-footprint Ubuntu Linux distribution from SmartCow that required much less memory.
The full system, including the hardware design, BSPs, supporting software, and firmware, was actually complete before production-ready Jetson Nano modules were even available. CTI also provided “Flashing-as-a-Service”, helping SmartCow flash Sentinel platforms en masse to further accelerate time to market.
Similar integration work was performed on the SmartCow Gatekeeper platform.
Realize the vision of AI at the edge
Though AI may seem like an abstract concept, the technology is already pervading the world around us. Beyond OCR and ALPR, video analytics and computer vision technologies are being deployed for digital security and surveillance, transportation planning, retail assistance, collision avoidance, and even smartphone features. Slowly but surely, these deep learning-based vision systems will become as common as the cameras they support.
SmartCow is a good indicator of this trend. In 2020, they are expecting to ship between 5,000 and 10,000 systems to partners, re-sellers, and end users.
To accommodate this rapid uptick in demand, AI and video analytics companies will need to partner with embedded hardware, software, and systems integration firms that have the experience and understanding of what is required to take AI to the edge.
Are you ready to realize the vision of deep learning? Contact Connect Tech Inc. at
[email protected] or 1.800.426.8979 to get started.