Smart OCR Solution Using Xilinx Ultrascale+ and Vitis AI
October 05, 2020
Automatic text reading from natural environments, also known as scene text detection/recognition or PhotoOCR, has become an increasingly popular and an important research topic in computer vision.
The text is among the most brilliant and influential creations of humankind. The rich, precise high-level semantics embodied in the text helps understand the world around us and build autonomous-capable solutions that can be deployed in a live environment. Therefore, automatic text reading from natural environments, also known as scene text detection/recognition or PhotoOCR, has become an increasingly popular and an important research topic in computer vision.
As the written form of human languages evolved, we developed thousands of unique font-families. When we add case (capitals/lower case/uni-case/small caps), skew (italic/roman), proportion (horizontal scale), weight, size-specific (display/text), swash, and serifization (serif/sans in super-families), the number grows in millions, and it makes text identification an exciting discipline for Machine Learning.
Xilinx as a choice for OCR solutions
Today, Xilinx powers 7 out of 10 new developments through its wide variety of powerful platforms and leads the FPGA-based system design trends. Softnautics chose Xilinx for implementing this solution because of the integrated Vitis™ AI stack and strong hardware capabilities.
Xilinx Vitis™ is a free and open-source development platform that packages hardware modules as software-callable functions and is compatible with standard development environments, tools, and open-source libraries. It automatically adapts software and algorithms to Xilinx hardware without the need for VHDL or Verilog expertise.
Selecting the right Xilinx Platform
The comprehensive and rich Xilinx toolset and ecosystem make prototyping a very predictable process and expedites the development of the solutions to reduce overall development time by up to 70%.
Xilinx Ultrascale+ platform as it offers the best of application processing and FPGA acceleration capabilities. It also provides impressive high-level synthesis capability resulting in 5x system-level performance per watt compared to earlier variants. It supports Xilinx Vitis AI that offers a wide range of capabilities to build AI inferencing using acceleration libraries.
Xilinx Vitis AI stack and acceleration utilizing the software to create a hybrid application and implemented LSTM functionality for effective sequence prediction by porting/migrating TensorFlow-lite to ARM. It is running on Processing Side (PS) using the N2Cube Software. Image pre- and post-processing was achieved using HLS through Vivado, and Vitis was used for inferencing using CTPN (Connectionist Text Proposal Network). We eventually graduated the solution to real-time scene text detection with video pipeline and improved the model with a robust dataset.
Scene Text Detection
There are many implementations available, and new ones are being researched. Still, a series of grand challenges may still be encountered when detecting and recognizing text in the wild. The difficulties in natural scene mainly stem from three differences when compared to scripts in documents:
- Diversity and Variability are arising from languages, colors, fonts, sizes, orientations, etc.
- Vibrant background on which text is written
- The aspect ratios and layouts of scene text may vary significantly
This type of solution has extensive applicability in various fields requiring real-time text detection on a video stream with higher accuracy and quick recognition. Few of these application areas are:
Parking validation — Cities and towns are using mobile OCR to validate if cars are parked according to city regulations automatically. Parking inspectors can use a mobile device with OCR to scan license plates of vehicles and check with an online database to see if they are permitted to park.
Mobile document scanning — A variety of mobile applications allow users to take a photo of a document and convert it to text. This OCR task is more challenging than traditional document scanners because photos have unpredictable image angles, lighting conditions, and text quality.
Digital asset management - The software helps organize rich media assets such as images, videos, and animations. A key aspect of DAM systems is the search-ability of rich media. By running OCR on uploaded images and video frames, DAM can make rich media searchable and enrich it with meaningful tags.
Softnautics team has been working on Xilinx FPGA based solutions that require design and software framework implementation. Our vast experience with Xilinx and understanding of intricacies ensured we took this solution from conceptualization to proof-of-concept within 4 weeks. Using our end-to-end solution building expertise, you can visualize your ideas with the fastest concept realization service on Xilinx Platforms and achieve greatly reduced time-to-market.
Get in touch with Softnautics to explore and build your next accelerated AI solution. Read our success stories.
About Author: Prasant Agarwal
Prasant is Marketing Director at Softnautics. He has 15+ years of experience in developing cutting-edge multimedia and connectivity products for STMicroelectronics, Samsung, and Solarflare Communications (Now Xilinx) and led corporate rebranding for Persistent Systems. Leveraging his technology domain and experience, he is now focusing on enabling technology buyers to make the right business choices by bridging business challenges and best-fit technology solutions.