Distributed cloud storage infrastructure advances with x86 architecture
March 01, 2014
With the advent of cloud computing, enterprise and SMB storage infrastructure is becoming increasingly decentralized, giving IT teams greater management flexibility while improving I/O performance, data recovery time, and cost per gigabyte. This shift has been driven in part by parallel advances in virtualization technology, which help create elastic pools of networked IT resources that can be managed dynamically as business needs evolve.
This accelerating shift to cloud infrastructure has prompted IT system architects to reassess the conventional hardware that has underpinned storage platforms to date, and that reassessment naturally extends to the embedded components that make up these systems, particularly the processor platforms.
Conventional storage arrays and servers are typically built around centralized controllers or boards that process the aggregate storage I/O on general-purpose x86 CPUs, with each CPU serving many disk drives. Because the processor is the primary bottleneck for data flowing to and from the drives, system vendors design in the highest-performing CPUs available to speed up I/O, adding significant cost and power consumption in the process.
As the hardware infrastructure that powers the cloud becomes increasingly decentralized, however, cloud storage networking needs a new approach: one that deemphasizes consolidated storage models and instead allows storage resources to be deployed incrementally, as building blocks, wherever they are needed in the distributed network. This has opened the door to a cloud storage architecture built on distributed intelligent disk drives (flash and mechanical) rather than expensive centralized storage arrays or servers. When each drive is equipped with its own high-performance, power-efficient x86 multicore processor, the aggregate number of processor cores in the storage network can exceed the number of cores in legacy storage systems, allowing users to process more data at the same rate or faster, at lower cost and with less power consumption.
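The core-count argument above is simple arithmetic. The Python sketch below makes it concrete under hypothetical numbers (two 8-core controller CPUs serving 24 drives, versus 24 drives that each carry a 4-core processor); the function names and figures are illustrative assumptions, not specifications of any product.

```python
# Hypothetical comparison of total processor cores: a centralized array
# versus drives that each carry their own multicore processor.
# All figures below are illustrative assumptions, not product specs.


def centralized_cores(controller_cpus: int, cores_per_cpu: int) -> int:
    """Cores in a conventional array, shared by every drive behind it."""
    return controller_cpus * cores_per_cpu


def distributed_cores(drives: int, cores_per_drive: int) -> int:
    """Cores available when every drive has its own processor."""
    return drives * cores_per_drive


if __name__ == "__main__":
    drives = 24
    legacy = centralized_cores(controller_cpus=2, cores_per_cpu=8)    # 16 cores
    smart = distributed_cores(drives=drives, cores_per_drive=4)       # 96 cores
    print(f"centralized: {legacy} cores shared by {drives} drives "
          f"({legacy / drives:.2f} per drive)")
    print(f"distributed: {smart} cores, {smart // drives} per drive")
```

Because cores grow linearly with drive count in the distributed model, each drive added as a building block brings its own compute with it, whereas the centralized model's cores are fixed and shared.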
Low-power, high-performance parallel processing
This cloud-optimized storage architecture is enabled by Accelerated Processing Unit (APU) and System-on-Chip (SoC) processor architectures, which combine a general-purpose CPU and a discrete-class GPU in a two-die or one-die chipset, respectively. The GPU can then process computational tasks on blocks of data in parallel, enabling enterprise-class storage throughput at every individual disk drive node in the network (Figure 1).
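The block-parallel pattern described above can be sketched in Python. This is a CPU stand-in, assumed for illustration: a thread pool plays the role of the GPU's parallel lanes, a CRC-32 checksum plays the role of the per-block computational task, and the 4 KiB block size and function names are hypothetical. On an APU or SoC, the same per-block kernel would instead be dispatched to the GPU, for example through a GPU compute API such as OpenCL.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 4096  # hypothetical block size in bytes


def split_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Cut a buffer into fixed-size blocks (the last block may be short)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def block_kernel(block: bytes) -> int:
    """Independent per-block task; a CRC-32 checksum stands in here."""
    return zlib.crc32(block)


def checksum_blocks_parallel(data: bytes, workers: int = 4) -> list[int]:
    """Apply the same kernel to every block concurrently.

    pool.map returns results in input order, so result i belongs to block i.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(block_kernel, split_blocks(data)))
```

The key property is that no block's task depends on any other block's result, which is what allows the work to spread across many GPU execution units.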
These highly integrated processor architectures also reduce design complexity and system size compared with similarly performing designs that use separate CPU and GPU chipsets: dense integration on the silicon allows fewer board layers and a smaller board. Ultimately, the APU/SoC approach makes it possible to fit parallel-processing performance inside a standard 3.5-inch drive enclosure that also houses the required embedded storage and networking hardware.
Perhaps most importantly, APU and SoC processors offer Thermal Design Power (TDP) profiles low enough to reduce system-level heat generation considerably and, crucially, to allow the drives to be powered over Ethernet (PoE). PoE is a critical enabler for this network storage architecture, and it provides a significant performance-per-watt advantage over today’s power-hungry storage arrays and servers, which depend on high-wattage power supplies whose efficiency varies with load. Powering each drive with PoE, in contrast, allows the power design to be tuned for maximum efficiency at a single, fixed load condition.
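A designer moving a drive onto PoE first has to check that the drive's total draw fits the power available at the powered device. The sketch below uses the per-port budgets defined by IEEE 802.3af (12.95 W at the device) and 802.3at (25.5 W at the device); the component wattages and the 10% safety margin in the example are hypothetical assumptions, not measurements of any real drive.

```python
# Power available at the powered device (PD) after worst-case cable loss.
PD_BUDGET_W = {
    "802.3af": 12.95,  # PoE: the switch port sources up to 15.4 W
    "802.3at": 25.5,   # PoE+: the switch port sources up to 30 W
}


def total_load_w(components: dict[str, float]) -> float:
    """Sum the per-component power draw, in watts."""
    return sum(components.values())


def fits_poe(components: dict[str, float], standard: str,
             margin: float = 0.10) -> bool:
    """True if the load plus a safety margin fits the PD budget."""
    return total_load_w(components) * (1 + margin) <= PD_BUDGET_W[standard]


# Hypothetical flash drive with an embedded low-power APU/SoC (watts).
example_drive = {"apu_soc": 6.0, "flash": 3.0, "ethernet_phy": 1.5, "other": 1.0}

if __name__ == "__main__":
    load = total_load_w(example_drive)
    for std in PD_BUDGET_W:
        verdict = "fits" if fits_poe(example_drive, std) else "too high"
        print(std, f"{load:.2f} W", verdict)
```

Because the load is fixed and known in advance, the drive's power stage can be designed around that one operating point, which is the efficiency advantage described above.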
Backup and battery power
The use of low-power APU and SoC processors within these intelligent drives yields an additional benefit that extends beyond primary storage into uninterruptible storage and fault-tolerant file systems. Here, too, a hyper-distributed cloud storage architecture has inherent advantages, including the ability to mirror data drive-to-drive across the entire network for redundancy and protection. Most enterprise arrays and servers achieve redundancy with a Redundant Array of Independent Disks (RAID) confined to a single system or cabinet, replicating data only across the drives inside it. This raises the risk that a multi-drive failure, or a failure of the cabinet itself, causes irrecoverable data loss. In a decentralized cloud storage architecture composed of distributed disk drives, by contrast, data can be replicated across nodes throughout the network.
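One way to spread drive-to-drive mirrors so that no single cabinet holds every copy is sketched below. It is an illustrative assumption, not the placement scheme of any particular product: it ranks drive nodes with rendezvous (highest-random-weight) hashing, a standard technique swapped in here for illustration, and then keeps at most one replica per rack. The node and rack names are hypothetical.

```python
import hashlib


def _score(block_id: str, node: str) -> int:
    """Deterministic pseudo-random weight for a (block, node) pair."""
    digest = hashlib.sha256(f"{block_id}:{node}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def place_replicas(block_id: str, nodes: dict[str, str],
                   copies: int = 3) -> list[str]:
    """Pick `copies` drive nodes for a block, at most one per rack.

    `nodes` maps a hypothetical node name to its rack (or cabinet) label.
    Nodes are ranked by rendezvous hashing, then filtered by rack.
    """
    ranked = sorted(nodes, key=lambda n: _score(block_id, n), reverse=True)
    chosen: list[str] = []
    used_racks: set[str] = set()
    for node in ranked:
        if nodes[node] not in used_racks:
            chosen.append(node)
            used_racks.add(nodes[node])
        if len(chosen) == copies:
            break
    if len(chosen) < copies:
        raise ValueError("not enough distinct racks for the requested copies")
    return chosen
```

Because every node can compute the same ranking from the block ID alone, any drive can locate a block's replicas without asking a central controller, and losing one rack still leaves copies elsewhere.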
Another critical consideration for data protection is electrical power backup. Conventional storage arrays and servers rely on large, expensive UPS systems and diesel generators for backup power. In a distributed cloud network, however, power backup can be provided at the individual drive level. Because embedded APU and SoC processors draw so little power, on-drive battery backup using consumer-grade batteries is now practical, reducing datacenter footprint and rack space compared with conventional battery backup methods.
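Whether a consumer-grade battery is enough comes down to simple arithmetic: runtime equals usable energy divided by load. The sketch below assumes a single 2,000 mAh, 3.7 V lithium-ion cell, a 10 W drive load, and 85% conversion efficiency; all three figures are hypothetical placeholders rather than specifications of any real drive.

```python
def cell_energy_wh(capacity_mah: float, nominal_v: float) -> float:
    """Stored energy of a battery cell in watt-hours."""
    return capacity_mah / 1000 * nominal_v


def runtime_minutes(energy_wh: float, load_w: float,
                    efficiency: float = 0.85) -> float:
    """Minutes of backup for a constant load.

    `efficiency` is an assumed factor for conversion losses between the
    battery and the drive electronics.
    """
    if load_w <= 0:
        raise ValueError("load must be positive")
    return energy_wh * efficiency / load_w * 60


if __name__ == "__main__":
    energy = cell_energy_wh(2000, 3.7)       # hypothetical consumer cell
    minutes = runtime_minutes(energy, 10.0)  # hypothetical 10 W drive
    print(f"{energy:.1f} Wh -> about {minutes:.0f} minutes at 10 W")
```

The battery only has to cover the time a drive needs to flush cached data and shut down cleanly, not hours of normal operation, which is why a small per-drive cell can be sufficient.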
Flash storage is a key enabler for this form of battery backup; the capability would be difficult, if not impossible, to achieve with conventional hard disk drives. Flash, on the other hand, has no moving parts, and when paired with advanced caching algorithms that accelerate reads and writes and minimize NAND wear from write cycles, it provides the highest levels of storage performance and reliability. During an extended power outage, these drives can perform a graceful shutdown that minimizes the risk of data loss and/or electrical damage to the drive.
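The caching and graceful-shutdown behavior described above can be illustrated with a toy write-back cache. This is a hypothetical model, not any vendor's firmware: a Python dict stands in for the NAND array, repeated writes to the same logical block address (LBA) are coalesced in RAM so the flash is programmed once per dirty block rather than once per host write, and an assumed `on_power_loss()` hook, run while the drive is on battery, flushes dirty data before marking the shutdown clean.

```python
class WriteBackCache:
    """Toy write-back cache in front of a NAND-like backing store (hypothetical)."""

    def __init__(self, backing: dict[int, bytes]):
        self.backing = backing             # stands in for the flash array
        self.dirty: dict[int, bytes] = {}  # LBA -> newest unflushed data
        self.nand_writes = 0               # program operations sent to flash
        self.clean_shutdown = False

    def write(self, lba: int, data: bytes) -> None:
        # A later write to the same LBA replaces the earlier one in RAM,
        # so the flash never sees the intermediate versions.
        self.dirty[lba] = data
        self.clean_shutdown = False

    def read(self, lba: int) -> bytes:
        return self.dirty.get(lba, self.backing.get(lba, b""))

    def flush(self) -> None:
        for lba, data in sorted(self.dirty.items()):
            self.backing[lba] = data
            self.nand_writes += 1
        self.dirty.clear()

    def on_power_loss(self) -> None:
        """Run on battery power: persist dirty data, then mark the drive clean."""
        self.flush()
        self.clean_shutdown = True
```

A real drive would also have to guarantee that battery runtime exceeds the worst-case flush time; this toy model ignores timing entirely.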
From hardware to software
For this disk drive-based cloud storage model to rival conventional centralized storage in practical functionality, it must provide several essential software capabilities. Virtualization is chief among them: running Virtual Machines (VMs) natively at the disk drive level enables granular resource provisioning for diverse workloads and/or multi-tenant use.
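Granular provisioning at the drive level can be pictured as placing VM requests onto drives that have spare cores and memory. The sketch below uses simple first-fit placement; the `DriveNode` type, the `provision()` function, and the resource figures in the example are hypothetical, and real hypervisor schedulers weigh many more factors.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DriveNode:
    """Hypothetical drive with an embedded processor that can host VMs."""
    name: str
    free_cores: int
    free_mem_gb: float
    vms: list[str] = field(default_factory=list)


def provision(vm: str, cores: int, mem_gb: float,
              nodes: list[DriveNode]) -> Optional[str]:
    """First-fit: place the VM on the first drive with enough free resources.

    Returns the drive name, or None if no single drive can host the VM.
    """
    for node in nodes:
        if node.free_cores >= cores and node.free_mem_gb >= mem_gb:
            node.free_cores -= cores
            node.free_mem_gb -= mem_gb
            node.vms.append(vm)
            return node.name
    return None
```

First-fit is chosen only because it is easy to follow; it shows how small per-drive resource pools can still be carved up among tenants.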
Native support for popular scale-out computing architectures is also critical. Open-source software frameworks such as Apache Hadoop and OpenStack are particularly important for big data and cloud storage applications, and they must run natively on the drive to minimize performance degradation in the compute cluster when nodes are added or replaced.
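One reason it matters that frameworks such as Hadoop run on the drive itself is data locality: a scheduler can send each task to a node that already stores its input block instead of pulling the data across the network. The greedy sketch below illustrates that idea only; it is not Hadoop's actual scheduler, and the block IDs, node names, and slot counts are hypothetical.

```python
def assign_tasks(block_locations: dict[str, list[str]],
                 slots: dict[str, int]) -> dict[str, str]:
    """Greedy data-local task assignment (illustrative only).

    block_locations: block id -> drive nodes holding a replica of it
    slots: drive node -> free task slots
    Each task goes to a replica holder with a free slot when possible;
    otherwise it falls back to any node with a free slot (a remote read).
    """
    free = dict(slots)
    plan: dict[str, str] = {}
    for block, replicas in block_locations.items():
        local = [n for n in replicas if free.get(n, 0) > 0]
        candidates = local or [n for n, s in free.items() if s > 0]
        if not candidates:
            raise RuntimeError("no free task slots")
        node = max(candidates, key=lambda n: free[n])  # least-loaded candidate
        plan[block] = node
        free[node] -= 1
    return plan
```

When each drive both stores blocks and runs tasks, most assignments land on the local path, and a replacement drive simply joins the pool of slots and replica holders.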
Error-Correcting Code (ECC) memory support is another key factor. Traditionally, ECC support has been limited to power-hungry processor platforms, but the rise of ultra-energy-efficient, compute-intensive x86 cloud infrastructure is making ECC an increasingly important requirement in the low-power processor domain. ECC detects and corrects memory bit errors, helping ensure the highest levels of data integrity; this also protects security workloads such as encryption, where a single flipped bit can corrupt the data being protected.
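To illustrate what single-error correction means in practice, the sketch below implements a Hamming(7,4) code, which protects 4 data bits with 3 parity bits and can locate and fix any single flipped bit. It is a deliberately small stand-in: ECC DRAM commonly uses a wider single-error-correcting, double-error-detecting (SECDED) code over 64-bit words with 8 check bits, computed in hardware by the memory controller, and the function names here are illustrative.

```python
def hamming74_encode(data: list[int]) -> list[int]:
    """Encode 4 data bits [d1, d2, d3, d4] into a 7-bit codeword.

    Codeword positions 1..7 hold [p1, p2, d1, p4, d2, d3, d4]; each parity
    bit covers the positions whose 1-based index has that bit set.
    """
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(code: list[int]) -> tuple[list[int], int]:
    """Correct up to one flipped bit; return (data bits, error position).

    An error position of 0 means no error was detected.
    """
    c = list(code)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 | (s2 << 1) | (s4 << 2)  # 1-based position of the bad bit
    if syndrome:
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]], syndrome


if __name__ == "__main__":
    word = hamming74_encode([1, 0, 1, 1])
    word[5] ^= 1  # simulate a single bit flip at position 6
    print(hamming74_decode(word))  # ([1, 0, 1, 1], 6)
```

The syndrome works because each codeword position is covered by exactly the parity checks matching the binary digits of its index, so the pattern of failed checks spells out the location of the flipped bit.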
An underlying x86 processing platform helps ensure tight integration with Internet backbone infrastructure and supports the rich ecosystem of industry-standard software and tools used to build today’s public and private storage clouds.
A new cloud storage model
From expensive, monolithic storage arrays to commodity servers, legacy storage systems can add significant cost, management complexity, and power consumption to cloud storage infrastructure, while introducing a host of backup and recovery challenges.
A highly scalable, disk drive-based (flash and mechanical) cloud storage model built on x86 APUs and SoCs, on the other hand, can deliver major gains in power efficiency and parallel processing performance while enabling enterprise-class storage and management capabilities. The added ability to support battery-powered backup at the individual drive level further strengthens business continuity and data protection.