Microsoft Puts Flash-based Bridge Between RAM and a Hard Drive To Make Faster Systems
Microsoft researchers are trying to make systems faster by
adding a flash-based "bridge" between RAM and a hard drive that helps to overcome the handicaps of each while maximizing flash memory's capabilities and minimizing its weaknesses.
The researchers call this flash bridge "FlashStore." FlashStore operates as a key-value store and uses a "key" to access the "value" associated with each piece of a data record. It supports the operations of read, write, update, and deletion of such data records.
"Flash is great technology, but it requires intelligent software to make the best use of it," says Sengupta, a researcher at Microsoft Research Redmond. "That's where our work comes in."
FlashStore has potentially significant usefulness across a range of computing applications, from server farms to cloud applications. Microsoft said that FlashStore has already shown it can speed up online gaming for Xbox LIVE players, as well as data-intensive server applications.
How it works
In FlashStore, flash "sits" between a hard drive and RAM, acting as a high-speed holding area for frequently used data. FlashStore works by overcoming flash memory's drawbacks. These relate to the way in which flash stores and manages data. In flash memory, data is not as easily overwritten as it is on a hard drive. Flash memory starts in an erased state, then collects data in page-sized units at a time, where a page can vary in size from 512 bytes to eight kilobytes, depending on the device.
But if you need to erase data, you can't do it a page at a time. Typically, an erase block consists of 32 to 64 data pages. Li compares the way flash works to a line of water buckets.
"The write operation is like pouring water into a series of buckets," says Li, a principal researcher with Microsoft Research Redmond's Communication and Collaboration Systems group. "The cheapest flash devices are designed so that if you need to erase a bank - the block - you have to drain the water from all the buckets. You can't drain just one bucket."
To handle data writes and reads, the memory controller in a flash-memory device uses a mapping system called Flash Translation Layer (FTL), which maps the logical page address of the incoming data to the actual physical page location of the data in flash memory. If a page of data is rewritten, it is written to a new location with an updated FTL entry.
During a process known in computing circles as "garbage collection," the flash controller takes valid data from a memory block, as well as new data, and writes both to a new memory block. The freed memory block then can be erased for storing new data?in effect, draining all the buckets in one memory block. But this process isn't efficient, in large part because of the requirement to reclaim space with pages that are rendered invalid by such operations. In time, repeated erase cycles also degrade the life expectancy of each flash memory block.
Flash also writes only to pages?nothing smaller. Data written to a page that is less than the page size, typically 512 bytes to eight kilobytes, make poor use of that page. Any attempt to write a small chunk of data within a page means the controller will have to move the existing data to another page, along with the new data.
Flash works fine on consumer electronic devices such as cameras and digital music players, where each photo or music file is at least megabytes in size, and data are written to flash mostly sequentially.
But flash performance drops drastically when data writes are random, which is the case for many desktop computing and server applications. Even today's high-performance, flash-based disk-replacement devices experience a dramatic drop in access latencies and throughput when performing random write operations.
Then there is the cost aspect of flash memory, which is about 10 times more expensive than hard disk per GB - and about 10 times less expensive than RAM. Flash makes most sense for applications that can identify and exploit the sweet spot between cost and performance.
To make flash work efficiently enough to be more attractive on the price-performance tradeoff for heavy-duty server and cloud computing, FlashStore?s creators had to create a more flash-friendly data structure while avoiding random data writes to flash. They did that by giving FlashStore three important design features.
First, FlashStore introduces flash as a cache in the memory hierarchy between RAM and hard disk so that the flash memory holds a "working set" of data - information a computer is most apt to need. FlashStore tracks data use so that, as data becomes less frequently accessed, it eventually is sent back to the hard drive so that new, fresh data can take its place. That makes better use of flash while still getting much faster retrievals than are possible with a hard drive.
Second, FlashStore is designed to eliminate random writes. It organizes data in a log structure on flash so that new data sent to flash does not lead to random writes and, hence, is not subject to garbage collection by the device. FlashStore aggregates small writes from the application into a write buffer in memory and writes to flash when there is enough data to fill a flash page. By making writes sequential to flash, FlashStore makes much better use of flash-device architecture. If the application needs strict data-durability guarantees, FlashStore can aggregate writes in memory up to a timeout interval provided by the application or when a flash page worth of data is amassed, whichever comes first.
Finally, FlashStore is designed to use RAM efficiently by employing a specialized RAM index to access the data on flash. It uses a hash-table-based index in RAM that is designed to save space along both the vertical and horizontal dimensions?the number of slots and the size of each slot, respectively. To reduce the number of slots, it uses a variant of cuckoo hashing that achieves high hash-table load factors while keeping lookup times fast. To reduce the size of each slot, it stores a compact key signature of about two bytes in each slot instead of the full key, which could be tens of bytes or larger, depending on the application. Each slot also stores a pointer, typically four bytes, that points to the full key-value pair record on flash. During key access, the pointer to flash is followed only if the signature in the hash-table slot matches that of the key being searched. These techniques keep the RAM usage frugal - at about six bytes per key-value pair, independent of the key-value pair size?and key-access times fast, involving at most one flash read per lookup on the average.
Microsoft says that FlashStore's results are impressive. Server systems using FlashStore can show as much as several tens of factors of increased data throughput compared with an ordinary RAM/hard-drive configuration and existing software that is not flash-aware, such as a key-value store like Berkeley DB. When compared with simple hard-disk replacement with flash but no changes in software, as in using a flash-unaware key-value store, FlashStore demonstrated several factors of improvement in performance - underscoring the impact of using flash-aware data structures and algorithms in FlashStore. It could enable the creation of much less expensive and more power-efficient computing designs. In some cases, Sengupta says, system designers buy arrays of hard disks that remain only partially filled with data, because they need additional disk heads for higher input/output (IO) operations per second and not for disk space. FlashStore can be used to absorb the IO intensive operations at the flash layer, while leaving the hard drive to handle operations involving large data size that amortize disk-seek time overheads.
"You could replace 10 to 20 hard drives with one flash drive using FlashStore for such applications," he says. "That gives you a capital-expenditure savings, power savings, and operational-expenditure savings, as well, and you also get much faster throughput: an all-win situation on the three metrics of price, power, and performance."
"Flash is great technology, but it requires intelligent software to make the best use of it," says Sengupta, a researcher at Microsoft Research Redmond. "That's where our work comes in."
FlashStore has potentially significant usefulness across a range of computing applications, from server farms to cloud applications. Microsoft said that FlashStore has already shown it can speed up online gaming for Xbox LIVE players, as well as data-intensive server applications.
How it works
In FlashStore, flash "sits" between a hard drive and RAM, acting as a high-speed holding area for frequently used data. FlashStore works by overcoming flash memory's drawbacks. These relate to the way in which flash stores and manages data. In flash memory, data is not as easily overwritten as it is on a hard drive. Flash memory starts in an erased state, then collects data in page-sized units at a time, where a page can vary in size from 512 bytes to eight kilobytes, depending on the device.
But if you need to erase data, you can't do it a page at a time. Typically, an erase block consists of 32 to 64 data pages. Li compares the way flash works to a line of water buckets.
"The write operation is like pouring water into a series of buckets," says Li, a principal researcher with Microsoft Research Redmond's Communication and Collaboration Systems group. "The cheapest flash devices are designed so that if you need to erase a bank - the block - you have to drain the water from all the buckets. You can't drain just one bucket."
To handle data writes and reads, the memory controller in a flash-memory device uses a mapping system called Flash Translation Layer (FTL), which maps the logical page address of the incoming data to the actual physical page location of the data in flash memory. If a page of data is rewritten, it is written to a new location with an updated FTL entry.
During a process known in computing circles as "garbage collection," the flash controller takes valid data from a memory block, as well as new data, and writes both to a new memory block. The freed memory block then can be erased for storing new data?in effect, draining all the buckets in one memory block. But this process isn't efficient, in large part because of the requirement to reclaim space with pages that are rendered invalid by such operations. In time, repeated erase cycles also degrade the life expectancy of each flash memory block.
Flash also writes only to pages?nothing smaller. Data written to a page that is less than the page size, typically 512 bytes to eight kilobytes, make poor use of that page. Any attempt to write a small chunk of data within a page means the controller will have to move the existing data to another page, along with the new data.
Flash works fine on consumer electronic devices such as cameras and digital music players, where each photo or music file is at least megabytes in size, and data are written to flash mostly sequentially.
But flash performance drops drastically when data writes are random, which is the case for many desktop computing and server applications. Even today's high-performance, flash-based disk-replacement devices experience a dramatic drop in access latencies and throughput when performing random write operations.
Then there is the cost aspect of flash memory, which is about 10 times more expensive than hard disk per GB - and about 10 times less expensive than RAM. Flash makes most sense for applications that can identify and exploit the sweet spot between cost and performance.
To make flash work efficiently enough to be more attractive on the price-performance tradeoff for heavy-duty server and cloud computing, FlashStore?s creators had to create a more flash-friendly data structure while avoiding random data writes to flash. They did that by giving FlashStore three important design features.
First, FlashStore introduces flash as a cache in the memory hierarchy between RAM and hard disk so that the flash memory holds a "working set" of data - information a computer is most apt to need. FlashStore tracks data use so that, as data becomes less frequently accessed, it eventually is sent back to the hard drive so that new, fresh data can take its place. That makes better use of flash while still getting much faster retrievals than are possible with a hard drive.
Second, FlashStore is designed to eliminate random writes. It organizes data in a log structure on flash so that new data sent to flash does not lead to random writes and, hence, is not subject to garbage collection by the device. FlashStore aggregates small writes from the application into a write buffer in memory and writes to flash when there is enough data to fill a flash page. By making writes sequential to flash, FlashStore makes much better use of flash-device architecture. If the application needs strict data-durability guarantees, FlashStore can aggregate writes in memory up to a timeout interval provided by the application or when a flash page worth of data is amassed, whichever comes first.
Finally, FlashStore is designed to use RAM efficiently by employing a specialized RAM index to access the data on flash. It uses a hash-table-based index in RAM that is designed to save space along both the vertical and horizontal dimensions?the number of slots and the size of each slot, respectively. To reduce the number of slots, it uses a variant of cuckoo hashing that achieves high hash-table load factors while keeping lookup times fast. To reduce the size of each slot, it stores a compact key signature of about two bytes in each slot instead of the full key, which could be tens of bytes or larger, depending on the application. Each slot also stores a pointer, typically four bytes, that points to the full key-value pair record on flash. During key access, the pointer to flash is followed only if the signature in the hash-table slot matches that of the key being searched. These techniques keep the RAM usage frugal - at about six bytes per key-value pair, independent of the key-value pair size?and key-access times fast, involving at most one flash read per lookup on the average.
Microsoft says that FlashStore's results are impressive. Server systems using FlashStore can show as much as several tens of factors of increased data throughput compared with an ordinary RAM/hard-drive configuration and existing software that is not flash-aware, such as a key-value store like Berkeley DB. When compared with simple hard-disk replacement with flash but no changes in software, as in using a flash-unaware key-value store, FlashStore demonstrated several factors of improvement in performance - underscoring the impact of using flash-aware data structures and algorithms in FlashStore. It could enable the creation of much less expensive and more power-efficient computing designs. In some cases, Sengupta says, system designers buy arrays of hard disks that remain only partially filled with data, because they need additional disk heads for higher input/output (IO) operations per second and not for disk space. FlashStore can be used to absorb the IO intensive operations at the flash layer, while leaving the hard drive to handle operations involving large data size that amortize disk-seek time overheads.
"You could replace 10 to 20 hard drives with one flash drive using FlashStore for such applications," he says. "That gives you a capital-expenditure savings, power savings, and operational-expenditure savings, as well, and you also get much faster throughput: an all-win situation on the three metrics of price, power, and performance."