Difficulties in calculating MTBF and reliability

By OpenSystems Media

April 01, 2014

Mean Time Between Failure (MTBF) is an important metric for determining the lifetime of embedded system components, but it’s difficult to calculate accurately. These difficulties lead to unreliable figures, which in turn have generated a backlash against such calculations.

Reliability is typically defined as the probability that a device will perform as required for a specified period of time. This seems simple, and to a degree it can be, given a large enough sample size and a long enough period of time. The main issue with deriving such figures is that they are required at a product’s release, not at the end of its lifetime when actual reliability can be determined.

Retrospectively calculating the reliability a component or device provided over its lifetime is fairly rudimentary math: total operating time divided by total failures. This is all well and good for near-obsolete products, and potentially useful as evidence of the reliability of a typical product, but new integrators want to know how reliable the current product is, not its previous incarnation.

Increasingly often, beyond a general acceptance of the estimated lifetime of industrial electronics, reliability is specified upfront at the earliest specification stage. Whilst this is more than logical – for instance, one must decide a warranty period based on an estimated lifetime – we now need to quantify that reliability. This is where our old friend, or increasingly, enemy, Mean Time Between Failure (MTBF) comes in. MTBF and “asset life” increasingly go hand in hand, but how accurate are any of these figures and what do they actually tell us?

It’s also worth pointing out MTBF’s less common cousin, Mean Time To Failure (MTTF). MTTF is generally used for non-repairable items, so it is applied more often to individual components than to an assembled product. MTTF is calculated as total operating time divided by the number of units.

Both have gaping holes in their accuracy; the reliability of a given individual unit is a hugely complex calculation. To provide an example of this minefield, a client recently asked whether the bespoke product we manufacture for them is suitable for a 10-year asset life. In asking this, they wanted us to provide evidence of a 10-year MTBF.

Interestingly, what would seem the most logical way to calculate MTBF gave the most bizarre result! Given that the product has been in manufacture and deployed for more than 5 years, unlike a new product, we had the gift of substantial historical data. Unfortunately, that data (approximately 5,000 units, deployed for an average of 3 years, with around 14 failures) gives 5,000 × 3 ÷ 14 ≈ 1,071 years: an MTBF of more than 1,000 years!
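The arithmetic behind that figure can be sketched in a few lines of Python. The field-data numbers are the approximate ones quoted above. The survival calculation is an added illustration that assumes a constant failure rate (an exponential model), which the article does not claim holds for this product; the function names are hypothetical.

```python
import math

HOURS_PER_YEAR = 8760  # 365 days * 24 hours; assumes continuous operation


def mtbf_years(units, avg_years_deployed, failures):
    """Field MTBF: total accumulated operating time divided by failures."""
    if failures <= 0:
        raise ValueError("need at least one failure to compute an MTBF")
    return units * avg_years_deployed / failures


def survival_probability(t_years, mtbf):
    """Chance a unit runs t_years without failing.

    ASSUMPTION: constant failure rate (exponential model), so
    R(t) = exp(-t / MTBF). This ignores early-life and wear-out failures.
    """
    return math.exp(-t_years / mtbf)


# Approximate field data quoted in the article
mtbf = mtbf_years(units=5000, avg_years_deployed=3, failures=14)
ten_year = survival_probability(10, mtbf)

print(f"MTBF: {mtbf:.0f} years ({mtbf * HOURS_PER_YEAR:,.0f} hours)")
print(f"10-year survival under a constant failure rate: {ten_year:.2%}")
```

Read this way, an MTBF of roughly 1,071 years does not predict that any unit will last a millennium. Under the constant-rate assumption it says only that a little over 99% of units would be expected to get through 10 years without a failure, and it says nothing about wear-out beyond the 3 years or so actually observed.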

As much as I’d like to gloat about our bespoke product’s reliability, and although the figures fully support this as a true MTBF, no one could realistically believe that even the materials the product is constructed from will survive this length of time – though that could well be true of the plastic enclosure!

The second, perhaps more realistic, method considers only one component: the weakest link. It’s perfectly logical that the weakest link is, by definition, the component most likely to fail, and thus the most likely to fail first. So should no calculation be performed at all, and the weakest component’s figure simply be passed through to the final product?
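One way to probe that question is to compare the weakest link against a simple series model, in which the product fails as soon as any one component fails. The sketch below is an illustration rather than the method used for the product above: it assumes independent components with constant failure rates, so the rates simply add, and every component name and MTBF value in it is hypothetical.

```python
def series_mtbf(component_mtbfs):
    """MTBF of a system that fails when any single component fails.

    ASSUMPTION: components fail independently at constant rates, so the
    system failure rate is the sum of the component failure rates.
    """
    total_rate = sum(1.0 / m for m in component_mtbfs)
    return 1.0 / total_rate


# Hypothetical component MTBFs in hours (illustrative values only)
components = {
    "power supply": 200_000,
    "fan": 50_000,
    "cpu board": 400_000,
    "flash storage": 300_000,
}

weakest = min(components.values())
system = series_mtbf(components.values())

print(f"weakest link:  {weakest:,} h")
print(f"series system: {system:,.0f} h")
```

Under these assumptions the system figure always comes out lower than the weakest component’s, because every other part contributes some failure rate of its own. Passing the weakest-link figure straight through would therefore be optimistic, though how optimistic depends on how dominant that one component is.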

I liken the way MTBF is presented to how automobile manufacturers declare fuel consumption figures. Real-world MPG almost never matches the extravagant claims of the manufacturer, because the claimed figure was obtained in a far from real-world test with vents sealed, no wind, etc. Likewise, a component manufacturer’s MTBF is unlikely to account for all, or even any, of the external factors that will affect it, be that humidity, temperature, vibration, or shock. What these conditions were during testing is almost never documented, so one MTBF figure is rarely comparable to the next. Unfortunately, this problem carries through to the final product; MTBF simply doesn’t cover the expected usage conditions or what the product lifetime should be.

The calculation of reliability and likelihood of failure has been studied in depth. Observable phenomena such as the “bathtub” effect are well documented but very difficult to capture in a single “hours” figure. Weibull analysis, which determines where a population of product currently lies on the bathtub curve, is well worth researching further, alongside Accelerated Life Testing, which tries to compress an individual unit’s passage of time into a practical test period, though not quite a millennium!
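To make the link between Weibull analysis and the bathtub curve concrete, here is a minimal sketch of the two-parameter Weibull hazard (instantaneous failure rate) function. The shape and scale values are hypothetical, chosen only to show the three regimes; a real analysis would fit them to failure data, which is not attempted here.

```python
def weibull_hazard(t, shape, scale):
    """Instantaneous failure rate h(t) = (k / s) * (t / s) ** (k - 1), t > 0.

    k is the Weibull shape parameter and s the scale parameter.
    """
    return (shape / scale) * (t / scale) ** (shape - 1)


def bathtub_region(shape, tol=0.05):
    """Map a fitted shape parameter to a region of the bathtub curve."""
    if shape < 1 - tol:
        return "early life (failure rate falling)"
    if shape > 1 + tol:
        return "wear-out (failure rate rising)"
    return "useful life (failure rate roughly constant)"


# Hypothetical shape parameters; scale of 10 years is illustrative only
for k in (0.5, 1.0, 3.0):
    early = weibull_hazard(1.0, k, scale=10.0)
    late = weibull_hazard(9.0, k, scale=10.0)
    print(f"shape={k}: h(1y)={early:.4f}, h(9y)={late:.4f} -> {bathtub_region(k)}")
```

Only when the shape parameter is close to 1 is the failure rate constant, and that is the one case in which a single MTBF number describes the population. In the other two regimes the failure rate depends on age, which is why one “hours” figure cannot describe the whole curve.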

An increasingly popular website, www.nomtbf.com, is very much worth a read; it is pushing a backlash against this age-old quantification method. The reality, though, is that there is nothing even close to a “right answer” for truly calculating reliability.

Rory Dear (Technical Contributor)

Embedded Computing Design
