Calculating SSD useable life in embedded medical equipment applications
June 01, 2009
A new methodology can help avert medical equipment failure by accurately predicting an SSD’s lifespan.
SSDs have evolved to become a viable option to replace rotating Hard Disk Drives (HDDs) in many embedded systems, including medical equipment. This is because SSDs eliminate the single largest failure mechanism in most medical systems – the moving parts of HDDs.
Medical devices have long product test and qualification cycles and are subject to rigorous regulatory approval processes. These processes are necessary given that primary hard drive failure is an unfortunate reality in all devices, not just medical devices; it is not “if” but “when” an HDD will fail because it has moving parts that at some point will wear out and stop functioning. When failure occurs, it can be a regulatory nightmare.
The Safe Medical Devices Act of 1990 authorizes the Food and Drug Administration (FDA) to regulate medical devices. Hospitals and health care organizations must report all instances of medical device failure causing serious illness, injury, or death. Such failures can result in costly lawsuits, product recalls, and untold ill will. Even if there is no fatality, the medical device will at the very least have to be requalified through the FDA, which could take years and cost hundreds of thousands of dollars.
Storage solutions must be rugged and able to perform in critical applications without failure. A small footprint is often required, as well as tolerance to high shock and vibration and protection against drive corruption from power disturbances caused by user error or environmental conditions.
In addition to these requirements, medical equipment designers face continued pressure to reduce overall system costs. NAND flash components have advanced to deliver lower cost per bit, but in doing so, have sacrificed reliability and endurance. This has led many OEMs to question how long an SSD will last in their critical medical applications.
To help medical equipment designers address this significant industry concern, the following discussion provides a brief overview of recent changes in NAND flash technology and some of the algorithms SSD vendors use to manage those changes. Building on this background, a new methodology can help designers predict useful life by outlining the parameters that SSD manufacturers control (such as the type of NAND used, write performance, and write amplification) and those that system OEMs can control (usage model, capacity, and write duty cycle).
NAND flash technology changes
NAND flash components, the primary storage media in SSDs, are experiencing technology changes at a rate never before seen in the semiconductor industry. The quest for lower cost per bit and smaller size requirements is driving NAND flash technology to shrink to smaller process geometries and store multiple bits per cell. Even though this results in higher-capacity SSDs in smaller form factors with an ever-decreasing cost per gigabyte, it brings reliability and product longevity challenges for medical equipment OEMs.
The reliability concern with NAND flash-based SSDs centers on the device’s limited number of write/erase cycles, or endurance. OEMs often question whether an SSD will meet their long-term system deployment requirements, especially in 24/7 medical applications with intensive write/erase usage models.
At the raw media level, NAND flash is inherently more reliable than the magnetic disks in HDDs. SSD controllers are now facing the same issues as the HDD controllers that preceded them in determining how to take advantage of the lower cost per bit while maintaining acceptable reliability levels for a specific application. However, SSDs have an advantage in addressing this issue because they are not burdened with what has traditionally been considered the biggest reliability headache for HDDs – the simple mechanics of rotating media.
Storage management algorithms and write amplification
NAND flash must be proactively managed. The SSD controller manages endurance through wear leveling and other storage management algorithms and, depending on the application, optimizes write/erase operations to increase endurance at the system level. In addition, SSD controllers reserve a spare area in the NAND flash array to manage bad blocks and other flash vulnerabilities. This spare area is typically 1 to 2 percent of the array, but it can be as high as 50 percent in applications that require high reliability. This method, called over-provisioning, is usually accomplished by providing additional NAND capacity to address these reliability issues.
The concept of write amplification must be considered to accurately calculate SSD useable life. Write amplification is a measure of the SSD controller’s efficiency. It defines the minimum number of writes the controller makes to the media for every write command from the host system. Write amplification highlights the fundamental mismatch between erase block sizes and page sizes. For example, the minimum write size for an SSD controller may be a 4 KB page size.
Most SSDs must erase before writing, which can require that a whole erase block (256 KB in this example) be erased and written. The resulting write amplification would be 256:4, or 64:1. The worst-case scenario is writing to the same logical block address over and over again, which would result in the 64:1 ratio. The best-case scenario is streaming data in file sizes that are integer multiples of the erase block size. In this case, the write amplification would be 1:1. In practice, the write amplification falls somewhere in between, depending on how the host writes the data, so the usage model alone can change the SSD’s useable life by a factor of up to 64.
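The worst-case ratio described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name and parameters are hypothetical, the 256 KB erase block and 4 KB page are the example values from the text, and real controllers use mapping and caching schemes that this simple ratio does not model. The handling of host writes larger than a page follows the rule the article gives for files larger than the page size.

```python
def worst_case_write_amplification(erase_block_kb, page_kb, host_write_kb):
    """Return the worst-case NAND writes per host write (hypothetical helper).

    Assumes the simple model in the text: in the worst case, every host
    write forces one whole erase block to be erased and rewritten.
    """
    if min(erase_block_kb, page_kb, host_write_kb) <= 0:
        raise ValueError("all sizes must be positive")
    # The controller cannot write less than one page, so a smaller host
    # write is treated as a full page.
    effective_write_kb = max(page_kb, host_write_kb)
    # Writes that fill at least one erase block approach the 1:1 best case.
    return max(1.0, erase_block_kb / effective_write_kb)


print(worst_case_write_amplification(256, 4, 4))    # 64.0 (random 4 KB writes)
print(worst_case_write_amplification(256, 4, 8))    # 32.0 (8 KB files)
print(worst_case_write_amplification(256, 4, 256))  # 1.0  (erase-block-sized streaming)
```

The three calls bracket the range discussed above, from the 64:1 worst case down to the 1:1 best case.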
SSD useable life methodologies
OEMs need to know an SSD’s lifespan in terms of years, months, and days instead of cycles. Classifying endurance using the write/erase cycles per logical block may be a starting point to compare SSD specifications, but it does not answer the real question: How long will an SSD last in an application? It therefore becomes critical to define and measure an application’s usage model to make a real-world determination of an SSD’s lifespan.
Using a worst-case example, the generic methodology shown in Equation 1 is based on a 24/7 usage model with a requirement for one year of data retention. For database or transactional usage models, the lifetime calculation must take into account the write I/O operations per second (IOPS). IOPS can be measured using an industry-standard benchmark such as IOMeter, which allows the user to define usage model parameters such as file size and the percentage of reads and writes. The write IOPS rating is the IOMeter result at the desired file size. Write amplification comes into play here as well: monitoring host writes (the IOPS rating) alone does not yield accurate information, and the write duty cycle must also be considered.
The following definitions describe the terms shown in Equation 1.
· Endurance rating: The block-level endurance traditionally specified as 100K, 10K, or 5K. Use the value 5 for 5K, 10 for 10K, and so on. Many vendors do not give out this information because the NAND is changing so rapidly. Consequently, many users try out different values and adjust capacities accordingly.
· 33.25: Constant derived from endurance rating in thousands of cycles, KB-to-GB, and seconds-to-years unit conversion.
· IOPS rating: Number of write IOPS.
· File size: The file size at which the IOPS rating is measured.
· Write amplification: The number of writes at the NAND level for each host write. This value is related to usage model, but the worst case, for 100 percent random writes as mentioned previously, is a value of 64. This value is based on the ratio of NAND erase block size to page size. If the file size is larger than the page size, then the worst-case write amplification is erase block size divided by file size.
· Duty cycle: The percentage of time the drive spends writing, as opposed to reading or sitting idle.
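The 33.25 constant can be checked with a short Python calculation. The sketch below shows one combination of unit conversions that reproduces it, assuming a 365-day year and binary units (1 GB = 1,024 × 1,024 KB); the variable names are illustrative and these conversion choices are an inference, not something the definitions state.

```python
# Unit conversions assumed to be folded into the constant (an inference).
CYCLES_PER_K = 1_000                   # endurance rating is given in thousands of cycles
KB_PER_GB = 1024 * 1024                # capacity in GB, file size in KB
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # write rate is per second, life is in years

constant = CYCLES_PER_K * KB_PER_GB / SECONDS_PER_YEAR
print(round(constant, 2))  # 33.25
```

Because the constant converts GB to KB, it implies that drive capacity in GB appears alongside the endurance rating in the lifetime calculation.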
To demonstrate this methodology, consider a medical monitoring equipment manufacturer contemplating a 32 GB SSD to replace a rotating disk drive. The drive uses a NAND device rated at 100K endurance with 200 write IOPS for an 8 KB file. The drive does not specify a write amplification factor, so a value of 32 (256 KB block/8 KB file) will be used. The OEM estimates the write duty cycle at 25 percent, which is a very conservative estimate. Filling in these parameters in the previous equation, the SSD lifetime would be calculated as shown in Equation 2:
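Because the equations themselves are not reproduced in this text, the following Python sketch reconstructs the calculation from the term definitions and should be read as an assumption rather than the published formula. It assumes useable life in years equals (endurance rating × capacity in GB × 33.25) divided by (write IOPS × file size in KB × write amplification × duty cycle), with duty cycle expressed as a fraction. The function and parameter names are hypothetical.

```python
def ssd_life_years(endurance_k, capacity_gb, write_iops,
                   file_size_kb, write_amplification, duty_cycle):
    """Estimate useable life in years (assumed form of the lifetime equation).

    endurance_k: block endurance in thousands of cycles (100 for 100K)
    duty_cycle:  fraction of time spent writing (0.25 for 25 percent)
    """
    if not 0 < duty_cycle <= 1:
        raise ValueError("duty_cycle must be in the range (0, 1]")
    # Total data the NAND can absorb, scaled by the 33.25 unit-conversion constant.
    write_budget = endurance_k * capacity_gb * 33.25
    # Effective KB written to the NAND per second of operation.
    nand_write_rate = write_iops * file_size_kb * write_amplification * duty_cycle
    return write_budget / nand_write_rate


# Example from the text: 32 GB drive, 100K endurance, 200 write IOPS at 8 KB,
# write amplification of 32, 25 percent write duty cycle.
life = ssd_life_years(100, 32, 200, 8, 32, 0.25)
print(f"{life:.1f} years")  # 8.3 years
```

Under these assumptions the example drive would last a little over eight years. Changing the endurance rating from 100 to 10 in the same call shows how strongly the estimate depends on the NAND technology.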
LifeEST SSD methodology
As depicted in Figure 1, three parameters govern an SSD’s useable life: technology, capacity, and usage model.
OEMs can use the capacity and usage model to determine useable life based on SSD technology. To that end, Western Digital Technologies proposes a new metric to measure SSD technology. With LifeEST, SSD technology is measured by specifying the number of write years per GB the SSD can achieve, as illustrated in Equation 3 and Figure 2.
The UCC is calculated in Equation 4.
Using the preceding application example, LifeEST is calculated in Equation 5.
SSD useable life is then calculated easily from Equations 6 and 7:
Identifying optimum SSD capacity
Medical equipment designers have traditionally calculated their storage requirements by measuring NAND flash device write/erase cycles at the block level, determining the size of the operating system and program files, and then looking at the amount of data to be collected to determine SSD capacity. This approach worked fine before rapid changes in NAND component technology decreased the number of stated write/erase cycles. It is now difficult to determine the correct capacity without thoroughly understanding the usage model and the effects of write amplification.
Today, medical system OEMs cannot afford costly field failures. It is essential to measure and predict SSD useable life by determining how long a product must be deployed in the field and the application’s usage model. With this information, medical system OEMs can accurately specify the optimum SSD capacity for the required field deployment.
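To illustrate how a required deployment period might translate into a capacity, the Python sketch below inverts the generic lifetime relationship described earlier. It rests on the same assumption about that relationship's form (life in years = endurance rating × capacity × 33.25 ÷ (write IOPS × file size × write amplification × duty cycle)), and the function name and inputs are hypothetical. It is a rough sizing aid, not a substitute for measuring the real usage model.

```python
def min_capacity_gb(target_years, endurance_k, write_iops,
                    file_size_kb, write_amplification, duty_cycle):
    """Smallest capacity (GB) that meets a target life, under the assumed model."""
    if target_years <= 0 or endurance_k <= 0:
        raise ValueError("target_years and endurance_k must be positive")
    # Effective KB written to the NAND per second of operation.
    nand_write_rate = write_iops * file_size_kb * write_amplification * duty_cycle
    # Solve the assumed lifetime relationship for capacity.
    return target_years * nand_write_rate / (endurance_k * 33.25)


# Same workload as the earlier example (200 write IOPS at 8 KB,
# write amplification 32, 25 percent duty cycle).
print(f"{min_capacity_gb(10, 100, 200, 8, 32, 0.25):.1f} GB")  # 38.5 GB
print(f"{min_capacity_gb(5, 10, 200, 8, 32, 0.25):.1f} GB")    # 192.5 GB
```

The second call shows why lower-endurance NAND pushes designers toward larger drives: with a 10K rating, even a five-year deployment needs several times the capacity that a 100K part needs for ten years.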
It is important to note that calculations are at best theoretical even with well-modeled applications. A more accurate methodology that yields real-world results involves using a tool within the application itself to monitor the exact wear of the NAND flash and report that data back to the host system. SSDs that feature drive usage monitoring can ensure the integrity of medical devices and eliminate any concerns regarding failure, injury, or regulatory issues. Western Digital’s patent-pending SiSMART monitoring technology integrated into its SiliconDrive SSDs can help medical device manufacturers achieve real-time SSD usage results to accurately forecast useable life based on their specific applications.
Gary Drossel is director of product planning for the Solid-State Storage Business Unit of Western Digital Technologies, based in Lake Forest, California. He is a storage and embedded systems industry expert with nearly 20 years of experience, having served in management roles such as VP of product planning for SiliconSystems, Inc. and director of solid-state drives and product marketing manager for SimpleTech, Inc. Before that, he held various marketing, sales, and field engineering management positions with Pro-Log and the industrial automation group of Parker Hannifin. Gary has a BS in Electrical and Computer Engineering from the University of Wisconsin.
Western Digital Technologies