How Continuous Observability Unlocks Remote Debugging Superpowers

By Erik Tamlin

Senior Software Engineer

Percepio

April 29, 2024


Image Credit: Percepio

As an embedded software engineer at Percepio who frequently supports our customers, I've witnessed first-hand the debugging nightmares developers often face when working with RTOS-based firmware. The complexities of large multi-threaded software systems can make debugging an incredibly frustrating and time-consuming experience. Sometimes you don’t even know where to begin or which parts of the code to focus on. You step through thousands of lines of code without finding any explanation. If you are lucky, you finally spot a clue that gets you on the right track, but it might have taken you several days or weeks to get that far.

However, I've also seen how the right tooling and practices around "continuous observability" can completely transform this debugging process. The core principle is straightforward: enable all available error detection, monitor system health metrics, check for anomalies, provide comprehensive diagnostic data, and keep diagnostic logging and collection enabled at all times. It’s your “seatbelt” for software crashes, and it can save you a lot of pain down the road.

But implementing continuous observability effectively is where the nuances come in. You can't just log everything all the time, as that would quickly exhaust the limited memory and bandwidth on embedded devices. The smart approach is to provide snapshots of the system state and the most recent system activity. Application logging and RTOS event tracing can be directed to circular buffers and included in the snapshot. This gives you a "film reel" view of the runtime activity leading up to a failure. Core dumps of just a few hundred bytes can provide the most recent call stack, allowing for deep observability on errors and anomalies without generating excessive amounts of log data and without the need for live debug connections to the actual devices.
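To make the circular-buffer idea concrete, here is a minimal sketch in C. It is not the TraceRecorder or DevAlert API; every name in it (`event_ring_t`, `event_ring_record`, `event_ring_snapshot`, `EVENT_RING_CAPACITY`) is hypothetical and chosen only for illustration. It assumes a single writer and leaves out the critical sections or interrupt masking that real firmware would need around the buffer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical capacity; a real system would size this to its RAM budget. */
#define EVENT_RING_CAPACITY 8u

typedef struct {
    uint32_t timestamp;  /* e.g. an RTOS tick count */
    uint16_t event_id;   /* application-defined event code */
    uint16_t arg;        /* small event-specific payload */
} event_t;

typedef struct {
    event_t  events[EVENT_RING_CAPACITY];
    uint32_t head;   /* index of the next slot to write */
    uint32_t count;  /* number of valid entries, at most the capacity */
} event_ring_t;

void event_ring_init(event_ring_t *ring)
{
    memset(ring, 0, sizeof *ring);
}

/* Record one event, overwriting the oldest entry once the ring is full. */
void event_ring_record(event_ring_t *ring, uint32_t timestamp,
                       uint16_t event_id, uint16_t arg)
{
    event_t *slot = &ring->events[ring->head];
    slot->timestamp = timestamp;
    slot->event_id = event_id;
    slot->arg = arg;

    ring->head = (ring->head + 1u) % EVENT_RING_CAPACITY;
    if (ring->count < EVENT_RING_CAPACITY) {
        ring->count++;
    }
}

/* Copy the most recent events into out, oldest first.
 * Returns the number of events copied (at most out_capacity). */
size_t event_ring_snapshot(const event_ring_t *ring, event_t *out,
                           size_t out_capacity)
{
    size_t n = ring->count < out_capacity ? ring->count : out_capacity;
    /* Step back n slots from the write position to find the start. */
    size_t start = (ring->head + EVENT_RING_CAPACITY - n) % EVENT_RING_CAPACITY;

    for (size_t i = 0; i < n; i++) {
        out[i] = ring->events[(start + i) % EVENT_RING_CAPACITY];
    }
    return n;
}
```

In this sketch, the application calls `event_ring_record` freely during normal operation, and an error handler calls `event_ring_snapshot` to capture the last few events for the diagnostic report. Memory use stays fixed no matter how long the device runs, which is the property that makes always-on logging affordable.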

What makes the continuous observability paradigm so powerful is that it works equally well whether you need to debug elusive bugs during in-house testing or remotely diagnose issues in the field, during field trials or in deployed products worldwide. For in-house testing, all you need is a UART to provide the data. But that same diagnostic data can be relayed over any available connectivity channel when in the field: Wi-Fi, cellular, Bluetooth, you name it.
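One way to keep the diagnostic code independent of the channel is to hide the transport behind a small send callback. The sketch below shows that pattern in C under stated assumptions: the names `diag_transport_t`, `diag_upload`, and `diag_memory_send` are hypothetical and are not part of any Percepio product. The in-memory sink stands in for a real UART or network driver so the example can run on a host.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical transport hook: sends up to len bytes and returns how many
 * were accepted. A return value of 0 means the channel cannot take more. */
typedef size_t (*diag_send_fn)(const uint8_t *data, size_t len, void *ctx);

typedef struct {
    diag_send_fn send;  /* UART, Wi-Fi, cellular, ... supplied by the port */
    void        *ctx;   /* driver-specific state passed back to send */
} diag_transport_t;

/* Push a diagnostic blob through the transport in as many chunks as needed.
 * Returns 0 on success, -1 if the transport stops accepting data. */
int diag_upload(const diag_transport_t *t, const uint8_t *data, size_t len)
{
    size_t sent = 0;
    while (sent < len) {
        size_t n = t->send(data + sent, len - sent, t->ctx);
        if (n == 0) {
            return -1;
        }
        sent += n;
    }
    return 0;
}

/* Example sink that collects bytes in RAM, standing in for a real driver. */
typedef struct {
    uint8_t *buf;
    size_t   capacity;
    size_t   used;
    size_t   max_chunk;  /* simulates a driver with a small FIFO */
} diag_memory_sink_t;

size_t diag_memory_send(const uint8_t *data, size_t len, void *ctx)
{
    diag_memory_sink_t *sink = (diag_memory_sink_t *)ctx;
    size_t room = sink->capacity - sink->used;
    size_t n = len;

    if (n > sink->max_chunk) {
        n = sink->max_chunk;
    }
    if (n > room) {
        n = room;
    }
    memcpy(sink->buf + sink->used, data, n);
    sink->used += n;
    return n;
}
```

With this shape, switching from a lab UART to a field connection means supplying a different `send` function, while the code that builds the snapshot stays unchanged.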

Of course, there are always challenges like data privacy and IP security concerns around uploading potentially sensitive information. At Percepio, we've given these concerns priority when designing our DevAlert solution to provide Observability-as-a-Service while keeping the sensitive data storage on the customer side.

From my experience, implementing these continuous observability practices, backed by the right tools and workflow integrations, can be an absolute game-changer for embedded developers. Instead of shooting in the dark, they can now quickly reproduce and resolve challenging bugs regardless of whether the issue is in the test lab or in a device on the other side of the world.

As embedded systems continue to grow more complex and connected, this type of observability is becoming a critical success factor. I'm excited to continue driving this cutting-edge methodology forward to empower developers building the next generation of connected devices.

Erik Tamlin is a Senior Software Engineer at Percepio AB. He’s responsible for QA, releases, and the design of the Percepio TraceRecorder used by Tracealyzer.
