Deconstructing the hype machine: Data analytics key differentiator for IoT

January 28, 2015


When I pulled up a chair next to Gartner’s Paul O’Donovan at the 2015 International CES earlier this year, one of the first things he said to me was, “You know what really angers me, Brandon? When companies add a Wi-Fi chip to a washing machine and claim it’s an Internet of Things device. That’s not the IoT.”

By now I’m sure you’ve heard reports about the IoT hitting the top of the “hype cycle,” as well as projections that IoT services alone will generate in the neighborhood of $260 billion a year by 2020[1, 2]. But as Paul rightly points out, much of the buzz surrounding the IoT hype machine to date has centered on adding connectivity to previously “dumb” devices, rather than on the data processing and analytics that will actually provide intelligence to those devices and make the IoT truly transformative. For instance, while a washing machine you can run remotely is cool, analytics that flag a washer running outside normal operating limits and diagnose a problem with the machine’s motor can save you time, spare you a huge mess, and open up additional service-based revenue streams for manufacturers. Likewise, a smart home that informs you of power consumption is great, but a smart home that tracks usage patterns over time and shifts the run cycles of your appliances to off-peak hours is a game changer that saves you money and presents utilities with the opportunity to tweak service plans.

So why isn’t more attention being paid to data analytics?

Big Data structures and IoT analytics

Outside of the large up-front investment in backend infrastructure required to get a data processing and analytics system off the ground, one of the major problems for Big Data analysis in the IoT is that data from different sensors is often generated in different formats. The reason for this is that developers of the initial data logging infrastructure didn’t put much thought into formatting the logs of data producers (such as sensor devices) because humans were the primary consumers of log data. For us, parsing through different logs and extracting information from them isn’t much of a challenge, so loosely structured log formats sufficed.

But today, humans are not the foremost consumers of log data, not even by a long shot. We now increasingly rely on machines to perform the bulk of the processing and analysis on data generated by other machines, and, unfortunately, machines aren’t as adept as humans at parsing through the diverse, semi-structured data sets associated with the IoT.

Recently, however, there have been efforts in the data science community to fix this data log illiteracy, notably through the open source Fluentd project. Fluentd is data collection software that attempts to reconcile the log formats of data sources with those of the backend systems responsible for processing and analysis. It achieves this through what is called a Unified Logging Layer interface, which restructures data logs from both the source and destination in JavaScript Object Notation (JSON) format. Combined with a set of community-contributed plugins that make Fluentd compatible with numerous data sources and outputs, the Unified Logging Layer provides a mechanism for quickly collecting, filtering, and outputting log data from various inputs into a consistent schema that is suitable for analysis.
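To make the idea of a consistent schema concrete, here is a minimal Python sketch of restructuring two differently formatted sensor logs into the same JSON shape. It is not Fluentd code and does not use Fluentd's plugin API; the two log formats, the device names, and the function names are all invented for illustration.

```python
import json
import re
from datetime import datetime, timezone

# Hypothetical raw log lines from two sensors; both formats are assumptions.
WASHER_LINE = "2015-01-28T10:15:00Z washer-17 motor_rpm=1450 temp_c=61.5"
METER_LINE = "meter-03|1422440100|power_w|812"

KEY_VALUE = re.compile(r"(\w+)=([-\d.]+)")


def parse_washer(line):
    """Parse a space-separated line: ISO timestamp, device id, key=value pairs."""
    timestamp, device, *pairs = line.split()
    readings = {key: float(value) for key, value in KEY_VALUE.findall(" ".join(pairs))}
    return {"time": timestamp, "device": device, "readings": readings}


def parse_meter(line):
    """Parse a pipe-separated line: device id, Unix epoch seconds, metric, value."""
    device, epoch, metric, value = line.split("|")
    moment = datetime.fromtimestamp(int(epoch), tz=timezone.utc)
    return {
        "time": moment.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "device": device,
        "readings": {metric: float(value)},
    }


# One parser per source format; every parser returns the same record shape.
PARSERS = {"washer": parse_washer, "meter": parse_meter}


def normalize(source, line):
    """Return one JSON string in the shared schema for a raw line from `source`."""
    return json.dumps(PARSERS[source](line), sort_keys=True)


if __name__ == "__main__":
    print(normalize("washer", WASHER_LINE))
    print(normalize("meter", METER_LINE))
```

Once every source emits records with the same `time`, `device`, and `readings` keys, a downstream system needs only one reader, which is the property the Unified Logging Layer is aiming for.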

Treasure Data, Inc., out of Mountain View, CA, is one company that has been a major contributor to the work that has taken place within the Fluentd project, and uses a commercial version of Fluentd, the Treasure Agent, as part of its end-to-end data processing technology. Using Treasure Agent, the company is able to capture data logs from a wide range of sources in the IoT, telecommunications, retail, advertising, and gaming sectors before securely storing them in its own backend data store, which runs on public cloud platforms, including AWS in North America and the IDC Frontier public cloud in the APAC region. Treasure Data provides a cloud-based user interface from which clients can view their data in tables and run SQL queries. From there, business intelligence can also be integrated to help automate massive IoT deployments and create new business opportunities, such as with the Pioneer telematics service currently under development (Figure 1). The Treasure Data infrastructure currently processes billions of client logs a day, taking in 400,000 data records per second to scale for IoT deployments.
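As a rough illustration of what running SQL over collected sensor records can look like, the sketch below uses Python's built-in sqlite3 module as a stand-in for a hosted data store. It does not use Treasure Data's service or client libraries; the table layout, the sample rows, and the 1,600 rpm limit are all assumptions made for the example.

```python
import sqlite3

# Hypothetical normalized readings: (device, time, metric, value).
ROWS = [
    ("washer-17", "2015-01-28T10:15:00Z", "motor_rpm", 1450.0),
    ("washer-17", "2015-01-28T10:16:00Z", "motor_rpm", 1950.0),
    ("washer-22", "2015-01-28T10:15:00Z", "motor_rpm", 1400.0),
    ("washer-22", "2015-01-28T10:16:00Z", "motor_rpm", 1420.0),
]


def devices_over_limit(rows, metric, limit):
    """Return (device, peak value) for devices whose peak reading exceeds `limit`."""
    conn = sqlite3.connect(":memory:")  # in-memory database, discarded on close
    try:
        conn.execute(
            "CREATE TABLE readings (device TEXT, time TEXT, metric TEXT, value REAL)"
        )
        conn.executemany("INSERT INTO readings VALUES (?, ?, ?, ?)", rows)
        cursor = conn.execute(
            "SELECT device, MAX(value) FROM readings "
            "WHERE metric = ? GROUP BY device "
            "HAVING MAX(value) > ? ORDER BY device",
            (metric, limit),
        )
        return cursor.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    # Flag washers whose motor speed went past the assumed operating limit.
    print(devices_over_limit(ROWS, "motor_rpm", 1600.0))
```

A query of this kind is where the washing machine example from earlier becomes analytics rather than connectivity: the same stored readings can drive a maintenance alert instead of just a remote start button.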




Figure 1: Pioneer is leveraging Treasure Data’s Big Data platform, which includes Fluentd components, for a new telematics service.


Deconstructing the hype machine

Discussions around IoT connectivity are important, especially as we continue to roll out the necessary infrastructure. That said, there’s a clear difference between a “connected” device and an IoT device, and the differentiator is data analytics. Before the hype machine moves any further, I hope that becomes more central to the conversation.







Brandon Lewis, Technology Editor