Automotive Ethernet for vision-based ADAS: Loss, cost, and latency
January 13, 2017
Automotive Ethernet is slowly but surely making its way into next-generation vehicle designs, but increasingly those designs also include advanced safety systems that require minimal latency. For the camera systems associated with these advanced driver assistance system (ADAS) functions, the image buffering, encoding, and decoding requirements of Ethernet could potentially have negative consequences on these real-time systems, despite the technology’s increased bandwidth.
In this interview with Marco Jacobs of IP vendor videantis GmbH, we discuss the pluses and minuses of Ethernet, and explore video codecs and alternative architectures that could compensate for concerns around using automotive-grade Ethernet in active safety applications.
How much interest do you see Ethernet garnering in the automotive space, particularly as it relates to active safety or advanced driver assistance system (ADAS) functions?
JACOBS: Broadcom started with this automotive Ethernet standard called BroadR-Reach several years ago, which has since been standardized as IEEE 802.3bw 100BASE-T1. BMW was one of the first car companies to pick it up. The automotive industry is a little bit slow, but the use of Ethernet is now rapidly growing. We see several original equipment manufacturers (OEMs) now using Ethernet for different purposes in the car.
Right now you have all of these different point-to-point connections in the car and all different kinds of communications standards, many of them very low bandwidth. So it’s a little bit messy in the car. If I think of my home, everything is connected over Ethernet, Wi-Fi, or cellular communications, but in the car there are a lot of standards that are all point-to-point. Ethernet is now becoming the backbone in the car, so a lot of components will hook up to it for communications, and with the number of cameras in the car increasing, they can just plug into this Ethernet network as well.
What other competing technologies is automotive-grade Ethernet replacing for some of these automotive vision applications?
JACOBS: Right now the other key camera interconnects are LVDS (SerDes) systems, which are widely used. Ethernet comes from the low-cost end of the automotive spectrum, with its initial use being in rearview and surround-view cameras, but it is steadily moving up the chain into more complex automotive camera-based systems.
The throughput with Ethernet is a little bit lower than LVDS for camera systems, but the key advantage is that it’s a packet-switched network. Like when you talk on the phone and packets go through various switches and get routed around the world, such a packet-switching topology has tremendous benefits in a car because more and more electronics are being added to vehicles and all of these things need to talk to each other.
Another thing is that the cabling with Ethernet is lighter, and of course a lighter car is always better for gas mileage.
Of course, there is the drawback of buffering, encoding, and decoding when you use Ethernet. Does that pose any concerns for automotive vision systems associated with safety applications?
JACOBS: Right now automotive Ethernet is primarily 100 Mbps, and uncompressed video requires a much higher bandwidth. So, indeed, you do need to compress, then transmit, and then decompress on the other end of the cable. That does add a little bit of latency. But if you carefully optimize your encoder, your transport, and your decoder, you can keep this latency below a few milliseconds. You do travel some distance for every millisecond (around one foot for every 10 milliseconds when traveling at 70 miles per hour), so latency is never good, but it’s not that bad if you carefully optimize the system.
Another issue with latency is that, for instance, with a surround-view system in your dashboard you can see discrepancies between what is happening outside with your own eyes and what you’re seeing on the display. Latency there is mostly just annoying. It is similar to voice on a phone, where you don’t notice latency until it rises above a couple of hundred milliseconds. So there is a threshold, but if you stay below it your eyes won’t really notice.
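To put rough numbers on the bandwidth and distance figures in this answer, here is a minimal back-of-the-envelope sketch. The camera parameters (1280x800 pixels, 30 frames per second, 12 bits per pixel) are illustrative assumptions, not figures from the interview, and the sketch treats the full 100 Mbps as usable payload, which ignores protocol overhead.

```python
# Back-of-the-envelope figures; camera parameters are illustrative assumptions.

MPH_TO_M_PER_S = 1609.344 / 3600  # 1 mile = 1609.344 m, 1 hour = 3600 s
M_TO_FT = 1 / 0.3048              # 1 ft = 0.3048 m


def raw_video_mbps(width, height, fps, bits_per_pixel):
    """Uncompressed video bit rate in megabits per second."""
    return width * height * fps * bits_per_pixel / 1e6


def required_compression_ratio(raw_mbps, link_mbps):
    """How many times the raw stream must shrink to fit the link."""
    return raw_mbps / link_mbps


def distance_travelled_ft(speed_mph, latency_ms):
    """Distance in feet the vehicle covers during the given latency."""
    return speed_mph * MPH_TO_M_PER_S * (latency_ms / 1000) * M_TO_FT


if __name__ == "__main__":
    # Assumed camera: 1280x800, 30 fps, 12 bits per pixel.
    raw = raw_video_mbps(1280, 800, 30, 12)
    print(f"raw video: {raw:.2f} Mbps")
    print(f"ratio needed for a 100 Mbps link: "
          f"{required_compression_ratio(raw, 100):.2f}x")
    print(f"distance in 10 ms at 70 mph: {distance_travelled_ft(70, 10):.2f} ft")
```

Under these assumptions the raw stream is several times larger than the link can carry, and the distance function reproduces the figure of roughly one foot per 10 milliseconds at 70 miles per hour.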
What type of solution should automotive engineers be looking for then in terms of optimized codecs?
JACOBS: One of our customers always says, “Automotive always has three top priorities. Priority one is cost. Priority two is cost. And priority three is also cost.”
Cars are primarily procured. Auto manufacturers build the sheet metal and the engine, but a lot of the rest of the car is procured. This means that they are very purchasing- and cost-driven organizations. Cost is paramount, so inside the camera especially, you usually don’t have DRAM because that would add a couple of dollars. That means it’s hard to store a full frame of data, so you need a type of encoder that does not store a full frame of data, in other words an intra-frame encoder. That’s one.
Two is that dynamic range is very important. Think of the headlight of another car shining into your camera in the dark, or the end of a tunnel, where it is very bright compared with the rest of the tunnel, which is very dark. Those are huge differences, so dynamic range is very important for automotive vision systems. Automotive codecs are therefore typically 10-bit through 12-bit, which is different from consumer video.
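The two constraints above (no DRAM in the camera, and 10-bit to 12-bit samples) can be illustrated with some quick arithmetic. This is only a sketch under stated assumptions: the 1280x800 resolution is hypothetical, the buffer counts one sample per pixel, the 16-row figure assumes an encoder that works on one row of 16x16 macroblocks at a time, and the dynamic-range formula assumes linear sample coding (about 6.02 dB per bit), which real sensors may not use.

```python
import math


def buffer_bytes(width, rows, bits_per_sample):
    """Bytes needed to hold `rows` image lines at one sample per pixel."""
    return math.ceil(width * rows * bits_per_sample / 8)


def linear_dynamic_range_db(bits):
    """Approximate dynamic range of a linear code with `bits` bits, in dB."""
    return 20 * math.log10(2 ** bits)


if __name__ == "__main__":
    # Assumed sensor: 1280x800 pixels, 12 bits per sample.
    full_frame = buffer_bytes(1280, 800, 12)
    mb_row = buffer_bytes(1280, 16, 12)  # assumed: one 16-line macroblock row
    print(f"full frame:     {full_frame} bytes")
    print(f"16-line buffer: {mb_row} bytes")
    for bits in (8, 10, 12):
        print(f"{bits}-bit linear: {linear_dynamic_range_db(bits):.1f} dB")
```

With these assumptions a full frame needs about 1.5 MB, while a 16-line working buffer needs about 30 KB, which is the kind of gap that decides whether external DRAM is required. Going from 10 to 12 bits adds roughly 12 dB of representable range.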
There are three main codecs. One is JPEG, which doesn’t compress as well as H.264.
Then there’s H.264, which is widely used but has many different variants. The H.264 that’s often in automotive environments is very different from the H.264 that’s used on your Blu-ray. There are different profiles for H.264, so you have to make sure that you carefully match the profile of the camera – the encode side – to the profile that the decoder side can decode. But H.264 is a very good codec that is used quite a bit, and we see most of the automotive OEMs demanding H.264.
Thirdly, there’s already a successor to H.264 called HEVC, or H.265. That codec can compress better, or at the same bit rate would provide higher quality images, but there are still two issues with it. One is that the compute load is quite a bit higher, so it takes a lot more compute resources to compress a stream with that standard, and the second is the royalty situation.
As you know, both of these codecs, H.264 and H.265, have patent licensing associated with them, so royalties are owed to the parties who created them and put their IP into the standard. With HEVC, that situation is not completely cleared up yet, whereas with H.264 it’s pretty clear. I’m not a patent lawyer, but this is definitely limiting the uptake of HEVC. You can see it in your phone, for instance. Most phones these days have an HEVC encoder, but if you take a video, usually H.264 is still being used.
On the codec side, say you’re using a 10-bit codec rather than a 12-bit one, or the JPEG codec for some reason. Is there a possibility that you’re going to encounter a lot of lossy images that can’t be processed by a video or computer vision algorithm with the precision and efficiency required of an active safety application?
JACOBS: This is a big question in the industry, indeed, and there’s not really an easy answer to it.
One way of looking at it is to compare the situation to watching a full HD Blu-ray movie, which has a maximum bit rate of 50 Mbps. That’s a pretty perfect image if you ask me.
Secondly, you do lose a little bit of information, and the computer vision algorithms could be influenced, but this basically boils down to an engineering tradeoff. If you consider the car a cost-constrained system, the question becomes whether you want to spend more money on cabling to get a lossless image, as with LVDS, or spend less money on cabling by using Ethernet, so that you have more to spend on the lens or on a higher-end image sensor and gain quality that way.
A third way of thinking about this is that computer vision algorithms should really be quite robust to a wide variety of input images. For instance, if a lens gets slightly dirty and the image gets blurry, it should still work. If it’s dark and the image sensor starts generating quite a bit of noise, it should still work. The noise that an image sensor generates in low-light situations has much more of an impact than the accuracy that you lose during compression and decompression.
It’s somewhat of an academic exercise to say that you just want a lossless image and therefore will not consider using Ethernet. I’m convinced it’s just a matter of time before people realize that Ethernet is not the bottleneck.
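The Blu-ray comparison earlier in this answer can be expressed as compressed bits per pixel. The sketch below is illustrative only: the 24 frames-per-second movie rate and the 1280x800, 30 frames-per-second camera are assumptions, the camera is assumed to have the whole 100 Mbps link to itself, and an intra-frame-only encoder is less efficient than the inter-frame coding used on Blu-ray, so the two figures are not a like-for-like quality comparison.

```python
def compressed_bits_per_pixel(bitrate_mbps, width, height, fps):
    """Average number of compressed bits available for each pixel."""
    return bitrate_mbps * 1e6 / (width * height * fps)


if __name__ == "__main__":
    # Full HD movie at the 50 Mbps quoted in the interview, assumed 24 fps.
    bluray = compressed_bits_per_pixel(50, 1920, 1080, 24)
    # Hypothetical camera using an entire 100 Mbps link.
    camera = compressed_bits_per_pixel(100, 1280, 800, 30)
    print(f"Blu-ray budget: {bluray:.2f} bits per pixel")
    print(f"camera budget:  {camera:.2f} bits per pixel")
```

Under these assumptions the camera stream has roughly three times the per-pixel bit budget of the movie, which gives some room to absorb the lower efficiency of intra-frame coding.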
What about putting a computer vision processor on the camera or sensor platform itself?
JACOBS: Doing computer vision close to the sensor makes a lot of sense. Indeed, you could do computer vision after you do the Ethernet transportation in a more central location in a vehicle, but you could also do it at the edge, inside the camera module. This has a lot of advantages. One advantage is that you get rid of this lossy compression issue because you would do the computer vision before you do the compression on the raw image. But there’s another advantage that’s more about modularity.
In a high-end car, you can afford to put in a big GPU-based system that supports use cases with one through ten cameras. However, if you have a lower-end car, your base model is probably going to have only a rearview camera. If you want surround view, which is the next level up, it’s going to be too expensive in the entry-level model to put in that high-end GPU-based system just in case people want ten cameras. So it’s much better to have a modular system, where you can plug in a rearview camera that’s already smart and can do crossing traffic alerts or backover protection, and just use a low-end head unit that only needs to decode that stream; you don’t need a head unit with the compute resources to do the entire computer vision task. Same for surround-view systems. Same for mirror replacements. Same for driver monitoring. You can just plug them in.
This modularity has a lot of advantages, and it comes naturally when you do the computing at the edge, because each module you plug in is self-sufficient.