WD Proposes New Approach to Providing Fault Tolerance in SCM-based Memory
Storage Class Memory (SCM) has been proposed as a cost-effective enabling technology for large-scale in-memory compute problems. However, to enable such systems at massive scale, a mechanism to ensure reliability and consistency of very large pools of distributed memory is needed.
Western Digital Research will help present a solution to this problem at the P4 European Workshop (P4EU) in Cambridge, UK. The company will demonstrate an in-network, high-performance, fault-tolerant solution for distributed in-memory compute at latencies orders of magnitude lower than those of any existing solution.
Computer memory and storage are organized in a hierarchy with tiers distinguished by response time, volatility, and cost. At the top of the hierarchy are SRAM caches and DRAM main memory, which have low latency, unlimited write endurance, and fine granularity of access. They are, however, power-hungry, expensive and volatile, necessitating further tiering to solid-state NAND flash storage (SSD), and finally to spinning disk or tape magnetic storage at the bottom of the hierarchy.
These terminal tiers of non-volatile, durable bit storage have much higher access latency and coarser access granularity than volatile memory and, in the case of NAND flash, finite write endurance that limits the total amount of data that can be written before the device must be replaced.
This traditional organization is being shaken up by the advent of Storage Class Memory (SCM): several emerging memory technologies, such as Phase-Change Memory (PCM), Resistive RAM (ReRAM), and Spin-Transfer Torque Magnetic RAM (STT-MRAM), are non-volatile, byte-addressable, and not much slower than DRAM, yet could cost significantly less because their simpler memory cell architecture allows denser packing. Recent breakthroughs in selector element physics have enabled larger memory cell arrays, which make better use of die area and lead to further cost advantages over DRAM. Consequently, in some scenarios it is feasible to replace several tiers of the traditional memory/storage hierarchy with a single, cost-effective, uniform type of memory that also serves as the terminal tier of non-volatile, durable data storage.
While this sounds appealing from the standpoint of architectural simplicity and elegance, there is a fly in the ointment. All known SCM technologies involve the movement of atoms and therefore have unavoidable wear-out mechanisms, resulting in limited write (and sometimes read) endurance. This places severe practical limits on the scale-out size of storage systems built on these technologies, and even calls into question the practicality of single systems in which DRAM is replaced with cheaper SCM main memory that is, alas, guaranteed to fail after brief use.
What this means in practice is that, to enable significant displacement of DRAM in prevailing system architectures, researchers must translate a variety of techniques traditionally used for slow durable storage (e.g., RAID for disks or SSDs) to work at timescales suitable for main memory. This strategy would satisfy the data replication and consistency requirements that are taken for granted in today's many-tiered architectures.
WD's approach uses replicated SCM instances that are kept consistent by a programmable interconnect running a generalization of the ABD protocol (named after Attiya, Bar-Noy, and Dolev). According to WD, ABD is well suited to the task for several reasons. First, the protocol ensures linearizable read/write access to memory while tolerating failures. Second, ABD is simpler than more general protocols, such as Paxos, allowing for an efficient in-network implementation. Third, ABD requires the switch to keep only soft state during the protocol exchange, reducing reliance on scarce switch resources (e.g., SRAM, TCAM).
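To make the protocol's structure concrete, here is a minimal, host-level sketch of the ABD read/write logic in Python. It illustrates the textbook protocol, not Western Digital's in-switch implementation; the names (Replica, abd_write, abd_read) and the in-process replica model are assumptions made for the example.

```python
# Minimal sketch of the ABD replicated-register protocol (multi-writer variant).
# Hypothetical, in-process model: replicas are plain objects rather than remote
# SCM nodes, and the "network" is a direct method call. A real deployment sends
# these phases as messages and waits only for a majority of replies.

from dataclasses import dataclass

@dataclass
class Replica:
    ts: int = 0          # logical timestamp of the stored value
    wid: int = 0         # id of the writer that produced it (tie-breaker)
    value: object = None

    def read_tag(self):
        return (self.ts, self.wid, self.value)

    def store(self, ts, wid, value):
        # Accept only if the incoming tag is newer (lexicographic order).
        if (ts, wid) > (self.ts, self.wid):
            self.ts, self.wid, self.value = ts, wid, value
        return True  # ack

def majority(replicas):
    return len(replicas) // 2 + 1

def abd_write(replicas, writer_id, value):
    # Phase 1 (query): learn the highest timestamp from a majority.
    tags = [r.read_tag() for r in replicas[:majority(replicas)]]
    max_ts = max(ts for ts, _, _ in tags)
    # Phase 2 (propagate): store the value with a strictly larger timestamp.
    acks = [r.store(max_ts + 1, writer_id, value) for r in replicas]
    assert sum(acks) >= majority(replicas)

def abd_read(replicas):
    # Phase 1 (query): collect tags from a majority, pick the newest value.
    tags = [r.read_tag() for r in replicas[:majority(replicas)]]
    ts, wid, value = max(tags, key=lambda t: (t[0], t[1]))
    # Phase 2 (write-back): propagate that value so later reads cannot observe
    # anything older -- this is what makes reads linearizable.
    for r in replicas:
        r.store(ts, wid, value)
    return value
```

Run against three such replicas, an abd_write followed by an abd_read returns the most recently written value even if a minority of replicas is slow or has failed, which is the fault-tolerance property the in-network version preserves.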
WD's solution implements fault tolerance inside the network switch, using the emerging P4 network programming language, resulting in a performance improvement of many orders of magnitude relative to existing solutions. The open P4 ecosystem gives network owners, operators, and application developers greater control of the data plane within the network, enabling a new class of distributed system solutions at performance levels well beyond what is achievable today.
The live demonstration at P4EU is a P4 program developed by Western Digital and the Università della Svizzera italiana (USI), running on a high-performance Barefoot Tofino processor inside a BF6064X switch from STORDIS. USI and Western Digital use the rich, open P4 programming language to implement a consensus protocol within the Tofino switch chip, allowing multiple copies of remote, non-volatile memory to be retained and the consistency protocol to be managed within the switch itself. The system treats the SCM-based main memory in each server as a distributed storage system and implements data replication across systems, along with a consensus protocol to keep the replicas consistent.
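As a rough illustration of the coordinator role the switch plays, the sketch below (same assumptions and hypothetical names as the earlier example: Coordinator, on_client_write, on_replica_ack) fans a client write out to the replicas and counts acknowledgements until a majority is reached, keeping only soft per-request state that is discarded on completion. In the actual demonstration this logic is expressed as P4 match-action processing on the Tofino data plane, not as host software.

```python
# Hypothetical model of the coordinator role the switch plays: per-request
# soft state only (an ack counter keyed by request id), which can be dropped
# and rebuilt by retrying the request -- no durable state lives in the switch.

class Coordinator:
    def __init__(self, replica_ports, send):
        self.replica_ports = replica_ports   # egress ports of the replica servers
        self.send = send                     # callback: send(port, message)
        self.pending = {}                    # request id -> progress so far

    def on_client_write(self, req_id, client_port, ts, value):
        # Fan the write out to every replica and start counting acks.
        self.pending[req_id] = {"acks": 0, "client": client_port}
        for port in self.replica_ports:
            self.send(port, ("WRITE", req_id, ts, value))

    def on_replica_ack(self, req_id):
        entry = self.pending.get(req_id)
        if entry is None:
            return  # late or duplicate ack after completion; safe to ignore
        entry["acks"] += 1
        if entry["acks"] >= len(self.replica_ports) // 2 + 1:
            # Majority reached: reply to the client and drop the soft state.
            self.send(entry["client"], ("WRITE_OK", req_id))
            del self.pending[req_id]
```

Because the pending entry is deleted as soon as a quorum of replicas replies, and a lost entry can simply be rebuilt by retrying the request, the switch never needs durable protocol state, which is what keeps the demands on scarce switch memory (SRAM, TCAM) small.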
WD says that early implementations using software memory controller emulation already operate at time scales orders of magnitude faster than traditional replicated storage systems and show great promise as scalable main memory.