Redundancy in the Internet of Things
October 28, 2019
Whether you believe the recent power outages in California are actually for fire safety or are political posturing by PG&E, one thing is clear: the power can be shut down with little to no warning.
Power failures are supposed to be part of a good IT disaster plan. But having a redundant power source doesn’t matter when both the primary and the backup get intentionally shut down. Many businesses can mitigate an extended outage with generators and a good supply of gasoline. Business can continue, mostly as usual.
Unless access to the Internet is essential to operations.
Backup power can get a router up and going again, but it does nothing to support the infrastructure of access points that connect a business to the Internet. In my case, I have to drive 40 miles to reach a location with power – and a working WAN connection.
Planned power outages highlight a major architectural and design issue every IoT designer needs to consider: What happens if there’s power but no Internet?
Smart World, Single Point of Failure
Smart applications like factories, buildings, and cities use the cloud for several reasons, including data storage and data analytics. When access to the cloud is impaired, access to this storage and off-site intelligence is interrupted. The impact on an IoT system depends on its reliance upon these cloud resources.
Consider a smart light bulb controlled by Alexa. Alexa offloads voice processing to the cloud. No Internet, no Alexa. However, because smart bulbs are connected to the local network, a local processor like a cell phone can serve as a redundant UI to smart devices. Thus, a user can still control the lights, just in a different way.
The issues become more complex when you get down on a smart factory floor. If the programs to run an industrial machine are kept in the cloud, the machine will be unable to access its programs and unable to run. During an outage a generator can power a machine, but the machine still can’t be used unless there is a redundant (i.e., local) copy of its programs and a mechanism for getting them to the machine. In this case, Internet access represents a single point of failure.
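To make the idea of a redundant local copy concrete, here is a minimal sketch in Python. It assumes programs can be cached as plain files; the names `load_program`, `fetch_from_cloud`, and `cache_dir` are hypothetical and not taken from any real machine controller. The cloud fetch is passed in as a function so the fallback logic stays independent of whatever service actually holds the programs.

```python
from pathlib import Path
from typing import Callable


def load_program(name: str,
                 fetch_from_cloud: Callable[[str], bytes],
                 cache_dir: Path) -> bytes:
    """Return a machine program, preferring the cloud, else the local copy.

    fetch_from_cloud is a hypothetical callable that raises OSError
    (ConnectionError and TimeoutError are subclasses) when the cloud
    cannot be reached.
    """
    cache_file = cache_dir / name
    try:
        data = fetch_from_cloud(name)
    except OSError:
        # No Internet: fall back to the last copy we saved locally.
        if cache_file.exists():
            return cache_file.read_bytes()
        # No local copy either, so the outage really does stop the machine.
        raise
    # Cloud reachable: refresh the local copy so it is ready for the next outage.
    cache_dir.mkdir(parents=True, exist_ok=True)
    cache_file.write_bytes(data)
    return data
```

The key design choice is that the cache is refreshed on every successful fetch, so the local copy only helps for programs the machine has already run at least once. Programs that must be available during an outage would need to be pre-loaded.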
It’s not that IoT systems need a redundant Internet. Rather, they need an alternative way to do what Internet access enables. Put another way, you need to provide a local method for doing what’s being done in the cloud. Unfortunately, the reason for going to the cloud was so that these resources didn’t have to be local or paid for when they aren’t being used. If local resources are available, then the cloud wasn’t really necessary in the first place.
In truth, most applications don’t need full Internet redundancy. In many cases, the cloud provides additional capabilities like optimizing workloads, analyzing sensor data, or enabling predictive maintenance. These capabilities often increase efficiency and lower operating costs over the long term. If these kinds of automation features are implemented in an optional or conditional manner, the system can be made to operate without them. And given the choice between a factory running less efficiently (i.e., without optimization) and not running at all, running less efficiently wins without question.
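One way to implement a cloud feature in an optional manner is to treat its answer as a suggestion and keep a safe local default. The sketch below is illustrative only: `choose_speed`, `get_cloud_recommendation`, and the 1200 RPM default are made-up names and values, not anything from a real optimizer.

```python
from typing import Callable, Optional

# Hypothetical conservative setting the machine can always run at.
DEFAULT_SPEED_RPM = 1200


def choose_speed(get_cloud_recommendation: Optional[Callable[[], int]]) -> int:
    """Use the cloud's optimized speed when available, else the local default."""
    if get_cloud_recommendation is None:
        # Optimization is not configured at all; run unoptimized.
        return DEFAULT_SPEED_RPM
    try:
        return get_cloud_recommendation()
    except OSError:
        # Cloud unreachable (connection refused, timeout, ...): degrade, don't stop.
        return DEFAULT_SPEED_RPM
```

With this shape, an outage costs efficiency rather than uptime: the machine runs at the default until the optimizer is reachable again.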
Mission-critical capabilities, such as program storage, will need local redundancy (i.e., backup capabilities). However, consider how much redundancy is really needed. For example, not every machine on a factory floor needs its own program server. Because programs are loaded irregularly, a single computer might be able to provide a temporary fix that, while not the most efficient or convenient approach, still keeps the business operational (i.e., reduced capabilities).
The cloud is often used to automate functions, such as watching to see if a system encounters a problem and needs human intervention. If the cloud is not available, a local computer could take over this task. This requires that a non-cloud version of the software is available. Or, someone could just walk the floor looking for flashing red lights so the machines can keep running.
Some cloud functions collect data and analyze it. For example, sensors can track motor performance to identify issues and adjust operational algorithms to improve performance and lower the risk of failure. This data is not run-time essential and lack of access should not shut down the machine. But you have to design it that way.
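Designing it that way mostly means keeping analytics off the machine's critical path. As a rough sketch, assume a control loop that hands samples to a bounded in-memory queue, with some separate uploader draining it when the cloud is reachable. The function name `record_sample` is hypothetical; `queue.Queue` and `put_nowait` are from the Python standard library.

```python
import queue


def record_sample(q: queue.Queue, sample: dict) -> bool:
    """Hand a sensor sample to the analytics queue without ever blocking.

    Returns True if the sample was queued, False if it was dropped
    because the queue is full (for example, the uploader is stalled
    by an outage).
    """
    try:
        q.put_nowait(sample)
        return True
    except queue.Full:
        # Analytics data is not run-time essential: drop it and keep running.
        return False
```

The control loop calls `record_sample` and carries on regardless of the result, so a stalled upload can never hold up the motor.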
Many times the cloud is used to collect data. If the cloud is down, then the system can either buffer data or simply not collect it. Consider what the data is and whether it is so important that a user would rather not run the machine if it can’t be captured. Even if the data pertains to something important like billing, the system can still operate. The business can just estimate usage, rounding down in the customers’ favor. Better to make less per hour than to lose money. A business also has the option to go old school and use paper and pen as a last resort.
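The buffering option is often called store-and-forward. Here is a minimal in-memory sketch, assuming a `send` function (hypothetical) that raises OSError when the cloud can't be reached. The class name `StoreAndForward` and the buffer limit are illustrative. Since a power cut would wipe an in-memory buffer, data that really matters would need to be written to disk instead.

```python
from collections import deque
from typing import Callable, Deque


class StoreAndForward:
    """Buffer records locally and deliver them, in order, when the cloud is up."""

    def __init__(self, send: Callable[[dict], None], max_buffered: int = 10_000):
        self._send = send
        # Bounded buffer: once full, the oldest record is discarded.
        self._pending: Deque[dict] = deque(maxlen=max_buffered)

    def submit(self, record: dict) -> None:
        """Queue a record and opportunistically try to deliver everything."""
        self._pending.append(record)
        self.flush()

    def flush(self) -> int:
        """Send buffered records oldest-first; stop at the first failure.

        Returns the number of records delivered on this call.
        """
        sent = 0
        while self._pending:
            try:
                self._send(self._pending[0])
            except OSError:
                break  # still offline; keep the record for the next attempt
            self._pending.popleft()  # remove only after a successful send
            sent += 1
        return sent

    @property
    def pending(self) -> int:
        return len(self._pending)
```

A record is removed from the buffer only after `send` returns, so an outage in the middle of a flush does not lose it. Calling `flush()` on a timer or when connectivity returns drains whatever built up during the outage.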
Redundancy in the Internet of Things is also a serious consideration for autonomous vehicles that require cloud access to operate. It isn’t Internet access to the vehicle that matters so much but rather all the smart city sensors they depend upon to make safe driving decisions. Will vehicles just stop in the street when all of the smart sensors are shut down due to a planned power outage? Or will people be stuck in their homes, unable to drive?
The world worked before the cloud. And the recent power outages make it abundantly clear that we will be reminded of this on a regular basis. We can’t afford to save so much money building smart factories and buildings and cities if this makes Internet access a single point of failure.
As of this writing, I am looking at my THIRD planned outage from PG&E in as many weeks. Planned meaning they gave me about 36 hours’ notice.