To Cloud or Not to Cloud: The Challenge of Deciding Where to Process IoT Data
May 20, 2019
The majority of IoT nodes capture data. Before decisions can be made using this data, however, it needs to be processed. Today, embedded developers have the option to implement data processing everywhere from within the node itself to all the way out to the cloud.
The cloud is an enticing alternative to edge-based processing. Processing data at the edge increases node cost. And while transferring data up to the cloud requires a communications link that must be maintained as well, the cost economies are attractive. But as with any technology, there are many hidden tradeoffs to consider.
I recently talked with VC Kumar, Marketing Manager for Texas Instruments’ Catalog Processors. Kumar has been with Texas Instruments for 20+ years, primarily focusing on the digital side of processing. According to Kumar, it is essential to know how your data flows before you begin to architect your system and decide where processing should take place. Before you start calculating how much money you are going to save by eliminating processing resources and cost in nodes, you need to fully understand the operating conditions of the system, its limitations, and its risks. Put another way, you need to look at the unintended consequences of your decisions to be able to accurately assess the true cost and tradeoffs of your design choices.
The first consideration for many systems, even before cost and power, is safety. Consider a robot working next to a person. If the robot has a failure and begins to swing its arm wildly, responsiveness must be immediate to prevent injury to the person. Given the latency of cloud processing, real-time safety functions will likely need to be implemented at the node.
Many safety functions, however, do not immediately impact system operation, and so they can be implemented in the cloud. These are typically long-term functions, such as analysis for preventative and predictive maintenance.
One primary advantage of processing in the cloud is that the cloud is an application-rich environment where data can be analyzed across myriad devices in ways it cannot be analyzed at the node. Through deep learning, this data can be used to discover patterns that lead to greater efficiencies. Additionally, there are numerous companies developing cloud-based applications and libraries. This means there are powerful algorithms available to you in the cloud without investment on your part.
The challenge is getting the right data up to the cloud. The primary limitation is bandwidth. For example, applications utilizing cameras, such as robots on a factory floor or other autonomous systems, can generate substantial data. Even if a device provides a small amount of data, thousands of these devices can strain the available bandwidth.
Bandwidth will play an increasingly important role as more IoT-based devices are deployed. Scalability is a critical factor to consider during initial design, given that it might be difficult to change how much data devices send years down the road. Thus, it makes sense to invest time in figuring out how to reduce the bandwidth requirement of each IoT device.
For example, with a camera-based application, optimize the frame rate to capture what is really needed. Next, mask or filter the field of vision to just the field of interest (i.e., remove areas that are not relevant). A minimal amount of processing can substantially reduce the data set to be processed, either conserving processing resources at the node or lowering the bandwidth required to serve data up to the cloud.
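To get a feel for how much these two steps can save, here is a minimal back-of-the-envelope sketch in Python. The resolutions, frame rates, and 8 bits per pixel are illustrative assumptions, not figures from Kumar, and the function names are hypothetical. It compares the raw (uncompressed) bit rate of a full-frame stream with one that has been cropped to a region of interest and slowed down.

```python
def raw_bits_per_second(width, height, fps, bits_per_pixel=8):
    """Raw (uncompressed) bit rate of a video stream."""
    return width * height * fps * bits_per_pixel


def crop_to_region(frame, top, left, height, width):
    """Keep only the region of interest from a frame stored as a list of rows."""
    return [row[left:left + width] for row in frame[top:top + height]]


# Assumed example: a 1280x720 grayscale camera running at 30 frames per second.
full_rate = raw_bits_per_second(1280, 720, 30)

# Assumed example: only a 320x240 region matters, and 5 frames per second is enough.
reduced_rate = raw_bits_per_second(320, 240, 5)

print(f"full stream:    {full_rate / 1e6:.1f} Mbit/s")
print(f"reduced stream: {reduced_rate / 1e6:.1f} Mbit/s")
print(f"reduction:      {full_rate / reduced_rate:.0f}x")
```

Under these assumptions the data set shrinks by a factor of 72 before any compression is applied. The real saving depends entirely on how much of the scene and how many of the frames the application actually needs.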
A node can also reduce the data burden by performing a first level of analysis. Data that tends to change slowly, such as temperature, can be compressed before sending. Alternatively, such data could be compared to a threshold and only sent when it is out of bounds and action needs to be taken. Similarly, data can be correlated; when system temperature is low, vibration data may not matter so you don’t need to send it.
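The threshold and correlation ideas can be sketched in a few lines. The following Python example is only an illustration: the function name, the field names, and every limit in it (0 to 80 degrees for temperature, 40 degrees as the point where vibration starts to matter) are hypothetical values chosen for the example, not recommendations.

```python
def build_report(temperature_c, vibration_g,
                 temp_low=0.0, temp_high=80.0, vibration_temp_floor=40.0):
    """Return only the fields worth sending to the cloud for one sample.

    All limits are assumed example values.
    """
    report = {}

    # Threshold check: send temperature only when it is out of bounds.
    if temperature_c < temp_low or temperature_c > temp_high:
        report["temperature_c"] = temperature_c

    # Correlation check: vibration only matters once the system is warm.
    if temperature_c >= vibration_temp_floor:
        report["vibration_g"] = vibration_g

    return report


print(build_report(25.0, 0.3))  # cool and in bounds: nothing to send
print(build_report(55.0, 0.3))  # warm: vibration is sent
print(build_report(85.0, 1.2))  # too hot: both values are sent
```

An empty report means the node stays silent for that sample, which is where the bandwidth saving comes from. A real node would likely also send a periodic heartbeat so the cloud can tell a quiet node from a dead one.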
Bringing in the cloud introduces new availability issues. For example, a factory might have emergency generators to restore power during an outage, but this won’t bring the internet back up. Thus, when mission-critical processing takes place in the cloud, this creates a point of failure: When – not if – the link fails, operations will come to a halt.
Let’s put this into a monetary perspective. Minimizing cloud costs by utilizing servers offering 99.99 percent availability means you’re looking at 52 minutes of downtime a year, compared to a little over 5 minutes with 99.999 percent availability. When you estimate the cost of these 47 minutes of extra downtime, consider that these interruptions might be short and occur irregularly, causing multiple failures. When some operations are interrupted, such as machining, the cost might be scrapped parts that represent thousands of hours, not just the few seconds lost at the moment.
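The downtime figures above follow directly from the availability percentages. A short Python sketch of the arithmetic, assuming a 365-day year (the function name is just for illustration):

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming a 365-day year


def downtime_minutes_per_year(availability_percent):
    """Expected downtime per year for a given availability percentage."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


four_nines = downtime_minutes_per_year(99.99)
five_nines = downtime_minutes_per_year(99.999)

print(f"99.99%  availability: {four_nines:.1f} minutes of downtime per year")
print(f"99.999% availability: {five_nines:.1f} minutes of downtime per year")
print(f"difference:           {four_nines - five_nines:.1f} minutes")
```

This works out to roughly 52.6 minutes versus 5.3 minutes per year, a difference of about 47 minutes. Note that the calculation says nothing about how that downtime is distributed, which is exactly the point made above: many short outages can cost far more than one long one.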
Mission-critical operations may also include security functions. To prevent unauthorized use of systems, an operator may be required to be authenticated. If the communications link goes down and you haven’t implemented a way for the operator to authenticate manually, the system will effectively be down as well.
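One way to keep operators from being locked out is to give the node a local fallback. The Python sketch below is an assumption about how that might look, not a description of any particular product: the node first tries the cloud and, only if the link is down, checks the operator's PIN against a salted hash it stored at enrollment. The names `LocalAuthCache`, `authenticate`, and `cloud_check` are hypothetical, and a production design would also need lockouts, credential expiry, and secure storage.

```python
import hashlib
import hmac
import os


def hash_pin(pin, salt, iterations=100_000):
    """Derive a salted hash so the PIN itself is never stored on the node."""
    return hashlib.pbkdf2_hmac("sha256", pin.encode("utf-8"), salt, iterations)


class LocalAuthCache:
    """Hypothetical on-node store of operator credentials for offline use."""

    def __init__(self):
        self._records = {}

    def enroll(self, operator_id, pin):
        salt = os.urandom(16)
        self._records[operator_id] = (salt, hash_pin(pin, salt))

    def verify(self, operator_id, pin):
        record = self._records.get(operator_id)
        if record is None:
            return False
        salt, expected = record
        # Constant-time comparison avoids leaking information through timing.
        return hmac.compare_digest(hash_pin(pin, salt), expected)


def authenticate(operator_id, pin, cloud_check, cache):
    """Prefer the cloud; fall back to the local cache if the link is down."""
    try:
        return cloud_check(operator_id, pin)
    except ConnectionError:
        return cache.verify(operator_id, pin)


def unreachable_cloud(operator_id, pin):
    """Stand-in for a cloud call made while the link is down."""
    raise ConnectionError("link down")


cache = LocalAuthCache()
cache.enroll("operator-1", "4711")
print(authenticate("operator-1", "4711", unreachable_cloud, cache))  # True
print(authenticate("operator-1", "0000", unreachable_cloud, cache))  # False
```

The design choice here is that the cloud remains the authority whenever it is reachable, while the local cache exists only to keep the system usable during an outage.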
Note that the communications link does not need to go down to create risk. There are many issues that can arise that impact a system. For example, Denial of Service (DoS) is a method hackers can use to overwhelm a network and shut it down. Similarly, a “bad actor” within the network (i.e., a device that unintentionally floods the channel with data) can consume limited bandwidth as well. One defense against issues like these is to shut exposed systems down so they cannot be taken over. At the same time, however, operations are shut down as well unless you have implemented a manual way to manage and override this response.
Another risk to consider is that data sent over the internet can be captured. A hacker could use this data to map out a location and analyze patterns such as when people are not present. Designing with privacy in mind is one defense worth considering. In this context, it means ensuring that data is sent in a way that does not identify the node generating it. This makes it much more difficult for hackers to make use of any data they capture.
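There are many ways to keep a node from being identified by its traffic. As one possible illustration (an assumption on my part, not a technique Kumar described), a node could label its messages with a keyed pseudonym that changes every reporting period instead of a fixed ID. In the Python sketch below, the function name, the key, the node ID, and the notion of an "epoch" are all hypothetical. Only a party holding the secret key can recompute the pseudonym and link it back to the node.

```python
import hashlib
import hmac


def pseudonym(node_id, secret_key, epoch):
    """Keyed, rotating label to use in place of a fixed node ID.

    `epoch` is an assumed counter (for example, the current day number)
    that makes the label change over time.
    """
    message = f"{node_id}:{epoch}".encode("utf-8")
    return hmac.new(secret_key, message, hashlib.sha256).hexdigest()[:16]


# Placeholder key for the example; a real key would be provisioned securely.
key = b"example-key-shared-with-the-cloud"

monday = pseudonym("door-sensor-17", key, epoch=1)
tuesday = pseudonym("door-sensor-17", key, epoch=2)

print(monday)
print(tuesday)
print(monday != tuesday)  # the label rotates, so captures are harder to correlate
```

This does not replace encrypting the link. It only addresses the narrower problem of a captured message revealing which node sent it.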
The reality is that the best approach is likely a hybrid one that balances node- and cloud-based processing. For example, while deep learning in the cloud can improve preventative maintenance, local processing is required to watch for and prevent the critical failures this deep learning has discovered. For the greatest benefit, embedded software needs to be flexible enough to allow a system to actually change how it operates to extend system operating life. Real-time functions – including safety and security – will continue to need to be node-based, and even cloud-based functions like authentication will need some local presence to enable manual management of systems.
But you can only do this if you have a thorough understanding of the overall system as well as its real-time requirements, its limitations, and the unintended consequences of each of your choices.