Traditional Development Processes Aren't Enough When Developing IoT Devices

By Tyler Hoffman

Co-founder, Software Engineer

Memfault

December 28, 2021


You can’t avoid bugs in connected devices. Good firmware engineers acknowledge that end-users will eventually stumble across a bug no matter how rigorous the testing. When a bug is experienced by a critical mass of users, determining the root cause can be fairly straightforward. When the bug is rare or a one-off and can’t easily be reproduced at an engineer’s desk, finding the source is time-intensive and complex.

When your fleet is 20 devices, developers can simply look through logs and manually search for common issues or quickly build some simple Python scripts that parse through the data to determine the cause. But when fleet size scales to the thousands or millions, a Command-F search and simple scripts can’t capture all bugs — and they certainly won’t surface new issues.
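As an illustration of the kind of simple script that works for a small fleet, here is a minimal sketch that groups error lines by a normalized signature and counts how many distinct devices hit each one. The log format (`<device_id> <LEVEL> <message>`) and the function names are assumptions made for this example, not a format any particular device uses.

```python
import re
from collections import Counter

# Hypothetical log format, assumed for this sketch: "<device_id> <LEVEL> <message>"
LINE_RE = re.compile(r"^(?P<device>\S+)\s+(?P<level>[A-Z]+)\s+(?P<message>.*)$")


def normalize(message):
    """Replace hex addresses and standalone numbers so similar messages group together."""
    return re.sub(r"0x[0-9a-fA-F]+|\b\d+\b", "<n>", message)


def count_error_signatures(lines):
    """Map each normalized ERROR message to the number of distinct devices that logged it."""
    devices_by_signature = {}
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match or match["level"] != "ERROR":
            continue
        signature = normalize(match["message"])
        devices_by_signature.setdefault(signature, set()).add(match["device"])
    return Counter({sig: len(devs) for sig, devs in devices_by_signature.items()})


sample = [
    "dev01 INFO boot complete",
    "dev01 ERROR assert at 0x08001234",
    "dev02 ERROR assert at 0x08004567",
    "dev02 ERROR i2c timeout after 250 ms",
    "dev03 ERROR assert at 0x08001234",
]
for signature, device_count in count_error_signatures(sample).most_common():
    print(f"{device_count} device(s): {signature}")
```

A script like this only finds the patterns someone thought to look for, and it assumes every log fits in memory on one machine, which is exactly where it breaks down as the fleet grows.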

Given the rising consumer appetite for IoT devices and increased connectivity across platforms, the likelihood and frequency of bugs, both common and rare, are only increasing. Just as vendors can no longer release products and expect never to interact with them again, firmware engineers have to adapt to the evolving demand for continuously improving device functionality and to the complexity that comes with it. Having a plan to build, monitor, and update is critical for modern device development.

Anticipate Bugs and Build with Them in Mind

Unfortunately, there is no way to anticipate every end user’s input and operating environment with your device. So instead of following a reactive development approach where problems are first noticed by annoyed users on Reddit on release days, you should adopt a proactive development stance. Proactive development privileges getting products out the door with built-in future fixes and updates as part of regular device management, including:

  • Ensuring that devices which can be reset to absolute factory conditions can either restore an older firmware version or fall back to a minimal firmware path
  • Evaluating device diagnostic trends and shipping updates when fleet-wide trends indicate the right direction, rather than requiring a certain number of days in testing or soak time
  • Following a Day-0 workflow, freezing firmware in a bare minimum state and shipping products with the expectation that you will continuously improve algorithms and update devices after they’ve shipped
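The second point above, shipping on fleet-wide trends rather than a fixed soak time, could be sketched roughly as follows. The metric (crashes per 1,000 device-hours), the minimum data requirement, and the tolerance are all hypothetical choices for illustration; a real team would pick its own signals and thresholds.

```python
def crashes_per_1000_hours(crashes, device_hours):
    """Crash rate normalized by how long devices actually ran."""
    if device_hours <= 0:
        raise ValueError("device_hours must be positive")
    return 1000.0 * crashes / device_hours


def should_expand_rollout(baseline, candidate, min_device_hours=500.0, tolerance=1.10):
    """Decide whether a candidate release looks healthy enough to ship more widely.

    baseline and candidate are (crashes, device_hours) tuples.
    min_device_hours and tolerance are assumed thresholds, not recommendations.
    """
    _, candidate_hours = candidate
    if candidate_hours < min_device_hours:
        return False  # not enough data yet to call it a trend
    baseline_rate = crashes_per_1000_hours(*baseline)
    candidate_rate = crashes_per_1000_hours(*candidate)
    return candidate_rate <= tolerance * baseline_rate


# Example: the current release vs. two candidate builds
current = (20, 10_000)
print(should_expand_rollout(current, (1, 1_000)))
print(should_expand_rollout(current, (5, 1_000)))
```

The decision depends on accumulated device-hours and observed crash rates, not on how many calendar days the build has been in testing.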

Monitor to Mitigate Potential Issues

For too long, device manufacturers have relied on end-users to effectively serve as product testers, waiting for affected customers to report issues to customer service. But the combination of social media channels and high customer expectations renders that approach anachronistic and risky. End-users don’t just want “good enough” products. They expect devices to offer unique, efficient, and convenient functionality; seamless integrations with other devices and applications; security; and regular updates, all with no disruption. And they’re often prepared to let everyone know when those expectations aren’t met.

Device monitoring is central to meeting those demands and ensuring fleetwide device health. Deploy a system to monitor events, the number of times they occur, and the action they’re triggering in devices (crashing, resetting, or hitting an assert). Capture key metrics such as:

  • The battery life drop in response to system changes or events
  • The number of seconds the Bluetooth chip was on
  • The number of Bluetooth disconnects
  • How many ticks or seconds the CPU was active
  • How long the device was connected to Bluetooth over the hour
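To make the metrics above concrete, here is a minimal sketch of an hourly report carrying those values, plus a server-side summary grouped by firmware version. The `Heartbeat` structure, its field names, and the summary fields are assumptions for this example rather than any particular product's schema.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Heartbeat:
    """One hourly report from one device (hypothetical schema)."""
    device_id: str
    firmware: str
    battery_drop_pct: float      # battery percentage consumed during the hour
    bt_on_seconds: int           # seconds the Bluetooth chip was powered
    bt_disconnects: int          # number of Bluetooth disconnects
    cpu_active_seconds: int      # seconds the CPU was active
    bt_connected_seconds: int    # seconds connected over Bluetooth


def summarize_by_firmware(heartbeats):
    """Group reports by firmware version and compute a few fleet-level figures."""
    grouped = {}
    for hb in heartbeats:
        grouped.setdefault(hb.firmware, []).append(hb)
    summary = {}
    for firmware, group in grouped.items():
        summary[firmware] = {
            "reports": len(group),
            "median_battery_drop_pct": median(hb.battery_drop_pct for hb in group),
            "disconnects_per_report": sum(hb.bt_disconnects for hb in group) / len(group),
        }
    return summary


reports = [
    Heartbeat("a", "1.0.0", 2.0, 600, 1, 120, 3000),
    Heartbeat("b", "1.0.0", 3.0, 650, 0, 130, 3200),
    Heartbeat("c", "1.1.0", 6.0, 900, 4, 400, 2500),
    Heartbeat("d", "1.1.0", 8.0, 950, 6, 420, 2400),
]
for firmware, stats in summarize_by_firmware(reports).items():
    print(firmware, stats)
```

Grouping by firmware version is one way to see whether a new release moved a metric in the wrong direction; in this made-up data, 1.1.0 drains noticeably more battery and disconnects more often than 1.0.0.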

With a monitoring system in place, issues can be detected (and repaired) early, often without disrupting the end-user experience at all.

Adopt a Staged Rollout Approach to Mitigate Potential Issues

Staging rollouts (making releases available to all devices incrementally) offers greater control and observability, reducing risk exposure. Staged rollouts allow developers to limit the impact of unanticipated bugs in version updates and patches while simultaneously capturing up-to-date metrics for triage and repair.

At the simplest level, staged rollouts can include a vendor’s employees and their willing connections to serve as an initial test audience. Beyond friends-and-family testing, device manufacturers can select a percentage of end-users whose devices can be monitored to gauge potential issues.
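One common way to pick a stable percentage of devices is to hash each device ID into a bucket from 0 to 99 and release to the buckets below the current percentage. The sketch below assumes that approach; the function names, the internal-device allowlist, and the use of SHA-256 are illustrative choices, not a description of any specific rollout service.

```python
import hashlib


def rollout_bucket(device_id, release):
    """Map a device to a stable bucket in [0, 100) for a given release."""
    digest = hashlib.sha256(f"{release}:{device_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def is_eligible(device_id, release, percent, internal_devices=frozenset()):
    """Internal (friends-and-family) devices always qualify; others by bucket."""
    if device_id in internal_devices:
        return True
    return rollout_bucket(device_id, release) < percent


internal = frozenset({"dev-internal-1", "dev-internal-2"})
fleet = [f"dev-{i}" for i in range(1000)]
for percent in (0, 5, 25, 100):
    eligible = sum(is_eligible(d, "1.1.0", percent, internal) for d in fleet)
    print(f"{percent}% stage: {eligible} of {len(fleet)} devices eligible")
```

Because the bucket depends only on the release and the device ID, a device that qualifies at 5% still qualifies at 25%, so widening a stage never takes an update away from a device that was already offered it.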

Alternatively, developers can implement a less formal version of staged rollouts through device monitoring in timed intervals. When devices call in at pre-determined intervals, developers can push OTA payloads for a specific period of time and monitor the devices that received the update during their check-in.
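The time-window variant can be sketched as a small server-side handler: any device that checks in while the window is open is offered the new release, and the set of devices that received the offer becomes the group to monitor. The class name, method name, and version strings here are hypothetical.

```python
from datetime import datetime, timedelta, timezone


class TimedRollout:
    """Offer a release only to devices that check in during a fixed window."""

    def __init__(self, release, opens_at, duration):
        self.release = release
        self.opens_at = opens_at
        self.closes_at = opens_at + duration
        self.offered_devices = set()  # devices to watch after the window closes

    def handle_check_in(self, device_id, current_version, now):
        """Return the release to install, or None if no update is offered."""
        in_window = self.opens_at <= now < self.closes_at
        if in_window and current_version != self.release:
            self.offered_devices.add(device_id)
            return self.release
        return None


opens = datetime(2021, 12, 28, 12, 0, tzinfo=timezone.utc)
rollout = TimedRollout("1.1.0", opens, timedelta(hours=1))
print(rollout.handle_check_in("dev-a", "1.0.0", opens - timedelta(minutes=5)))
print(rollout.handle_check_in("dev-b", "1.0.0", opens + timedelta(minutes=10)))
print(sorted(rollout.offered_devices))
```

How many devices land in the window depends on the check-in interval and the window length, so the size of the test group is only roughly controlled. That is what makes this approach less formal than a percentage-based stage.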

Device developers drive innovation despite a number of constraints like device power, intermittent connection, and limited resources. By adopting a proactive development approach that anticipates issues, developers can ease the additional complexities of hardware development and lean into dynamic and iterative processes that lead to better, more robust products that improve over time.

Tyler Hoffman is co-founder and software engineer at Memfault, a provider of firmware delivery, monitoring, and diagnostics solutions for embedded device companies. Prior to founding Memfault, Tyler led the firmware developer productivity team at Fitbit, and was an embedded software engineer at Pebble Tech.
