Microsoft Flat Datacenter Storage Beats Data-sorting Record
Microsoft researchers have developed a new technique, called Flat Datacenter Storage (FDS), for quickly sorting large amounts of data.
This new approach to managing data over a network has enabled a Microsoft Research team to set a record for the amount of data sorted in one minute.
The team set the record on the MinuteSort benchmark, which measures how much data a system can sort in 60 seconds, with the data starting and ending on disk. Sorting is a basic computing task, and sorting quickly demonstrates a network's ability to move and organize data so it can be analyzed and used.
Microsoft's team, led by Jeremy Elson in the Distributed Systems group at Microsoft Research Redmond, set the new record using FDS. The team's system sorted almost three times as much data (1,401 gigabytes vs. 500 gigabytes) as the previous record holder, a team from Yahoo that set the mark in 2009, while using less than one-fifth of the hardware (1,033 disks across 250 machines vs. 5,624 disks across 1,406 machines).
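The comparison can be checked directly from the reported figures. This short calculation uses only the numbers quoted above; the "data per disk" figure is our own rough efficiency measure, not an official benchmark metric.

```python
# Figures as reported for the two MinuteSort entries.
previous = {"data_gb": 500, "disks": 5624, "machines": 1406}   # Yahoo, 2009
fds = {"data_gb": 1401, "disks": 1033, "machines": 250}          # Microsoft Research FDS

data_ratio = fds["data_gb"] / previous["data_gb"]          # how much more data was sorted
disk_ratio = fds["disks"] / previous["disks"]              # fraction of the disks used
machine_ratio = fds["machines"] / previous["machines"]     # fraction of the machines used

# Data sorted per disk in one minute: a rough per-resource efficiency measure.
fds_per_disk = fds["data_gb"] / fds["disks"]
prev_per_disk = previous["data_gb"] / previous["disks"]

print(f"data: {data_ratio:.2f}x")
print(f"disks: {disk_ratio:.1%} of previous, machines: {machine_ratio:.1%} of previous")
print(f"GB per disk per minute: {fds_per_disk:.2f} vs {prev_per_disk:.3f} "
      f"({fds_per_disk / prev_per_disk:.1f}x)")
```

The ratios work out to roughly 2.8 times the data on about 18 percent of the disks and machines, or about 15 times as much data sorted per disk.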
To put things in perspective, in one minute, the Microsoft Research team sorted the equivalent of two 100-byte data records for every human being on the planet.
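The "two records per person" figure follows from simple arithmetic. The sketch assumes (these are not stated in the article) that a gigabyte is 10^9 bytes and that the world population was roughly 7 billion at the time of the record.

```python
# Assumptions (not from the article): 1 GB = 10**9 bytes, and a world
# population of roughly 7 billion around the time of the record.
BYTES_PER_GB = 10**9
RECORD_BYTES = 100                 # record size stated in the article
WORLD_POPULATION = 7_000_000_000

sorted_bytes = 1401 * BYTES_PER_GB
records = sorted_bytes // RECORD_BYTES          # number of 100-byte records sorted
records_per_person = records / WORLD_POPULATION

print(f"{records:,} records, about {records_per_person:.1f} per person")
```

That is about 14 billion records, or roughly two for each of about 7 billion people.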
The record is significant because it points toward a new method for crunching huge amounts of data using inexpensive servers.
Elson compares FDS to an organizational chart. In a hierarchical company, employees report to a superior, who reports to another superior, and so on. In a "flat" organization, employees basically report to everyone, and vice versa. Likewise, in FDS every computer can exchange data directly with every other computer.
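To make the analogy concrete, here is a hypothetical sketch of flat data placement: a file is split into fixed-size chunks, and a simple formula that any client can compute spreads those chunks across every server. The function name, server names, and cluster size are invented for illustration; this is not FDS's actual placement code.

```python
import hashlib
from collections import Counter

def owner(file_id: str, chunk: int, servers: list[str]) -> str:
    """Hypothetical placement rule: hash the file ID to a starting server,
    then assign consecutive chunks to consecutive servers (wrapping around)."""
    start = int.from_bytes(hashlib.sha256(file_id.encode()).digest()[:8], "big")
    return servers[(start + chunk) % len(servers)]

servers = [f"server{i}" for i in range(250)]   # hypothetical cluster size

# In this sketch, any client computes a chunk's location itself, so a read
# goes straight to the owning server, and one large file touches every server.
load = Counter(owner("input-file", c, servers) for c in range(1000))
print(len(load), "servers hold chunks;", set(load.values()), "chunks each")
```

Because consecutive chunks land on consecutive servers, 1,000 chunks spread evenly as four per server, so reading the file keeps all 250 servers busy at once rather than funneling through a few.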
FDS takes advantage of another technology Microsoft Research helped develop, called full bisection bandwidth networks. If you were to draw an imaginary line through a collection of computers connected by a full bisection bandwidth network, every computer on one side of the line could send data at full speed to every computer on the other side of the line, and vice versa, no matter where the line is drawn.
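The definition can be illustrated with a small, hypothetical model. The sketch treats the network as a graph of links with capacities, computes the maximum flow from one half of the hosts to the other for every even split, and takes the worst case; that worst case is the network's bisection bandwidth. The two-tier rack topology, link speeds, and function names are invented for illustration and say nothing about the network Microsoft actually used.

```python
from collections import deque
from itertools import combinations

INF = float("inf")

def add_link(cap, a, b, bw):
    """Undirected link: capacity bw in each direction."""
    cap.setdefault(a, {})[b] = bw
    cap.setdefault(b, {})[a] = bw

def max_flow(cap, s, t):
    """Edmonds-Karp: repeatedly push flow along shortest augmenting paths."""
    res = {u: dict(vs) for u, vs in cap.items()}
    for u in list(res):
        for v in list(res[u]):
            res.setdefault(v, {}).setdefault(u, 0)   # ensure a reverse residual edge
    flow = 0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:             # breadth-first search for a path
            u = queue.popleft()
            for v, c in res[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(res[u][v] for u, v in path)       # bottleneck capacity on the path
        for u, v in path:
            res[u][v] -= push
            res[v][u] += push
        flow += push

def two_tier_tree(racks, hosts_per_rack, host_bw, uplink_bw):
    """Hypothetical topology: hosts -> rack switch -> one core switch."""
    cap, hosts = {}, []
    for r in range(racks):
        add_link(cap, f"rack{r}", "core", uplink_bw)
        for h in range(hosts_per_rack):
            hosts.append(f"host{r}.{h}")
            add_link(cap, hosts[-1], f"rack{r}", host_bw)
    return cap, hosts

def bisection_bandwidth(cap, hosts):
    """Worst case, over every even split of the hosts, of the max flow
    from one half to the other."""
    worst = INF
    for half in combinations(hosts, len(hosts) // 2):
        g = {u: dict(vs) for u, vs in cap.items()}
        g["src"] = {h: INF for h in half}            # feed every host on side A
        for h in hosts:
            if h not in half:
                g[h]["dst"] = INF                    # drain every host on side B
        worst = min(worst, max_flow(g, "src", "dst"))
    return worst

for uplink in (20, 10):
    cap, hosts = two_tier_tree(racks=4, hosts_per_rack=2, host_bw=10, uplink_bw=uplink)
    needed = len(hosts) // 2 * 10      # every host in one half sending at full rate
    got = bisection_bandwidth(cap, hosts)
    print(f"uplink {uplink}: bisection {got}, full bisection needs {needed}")
```

In this toy example, when each rack's uplink is as fast as its hosts' links combined, every split lets all hosts on one side send at full speed (full bisection bandwidth). Halving the uplinks halves the worst case, which is what an oversubscribed, hierarchical network looks like: the line drawn between racks becomes a bottleneck.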
Using full bisection bandwidth networks, the FDS team built a system that could transfer data at two gigabytes per second on each computer for input, with another two gigabytes per second for output.
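Combining the per-machine rate with the machine count gives a sense of scale. This is an idealized upper bound under loudly stated assumptions: that all 250 machines from the record run at the quoted rate in each direction simultaneously, and that disk, CPU, and protocol overheads are ignored.

```python
# Assumptions: all 250 machines from the record run at the stated
# 2 GB/s in each direction at once; overheads are ignored.
MACHINES = 250
GB_PER_SEC_IN = 2
GB_PER_SEC_OUT = 2
DATA_GB = 1401

aggregate_in = MACHINES * GB_PER_SEC_IN          # total receive capacity
aggregate_out = MACHINES * GB_PER_SEC_OUT        # total send capacity
# Full bisection: for any even split, every machine on one side can send
# at full speed to the other side at the same time.
cross_section = (MACHINES // 2) * GB_PER_SEC_OUT

share_gb = DATA_GB / MACHINES                    # each machine's share of the data
seconds_to_receive = share_gb / GB_PER_SEC_IN    # idealized network time for that share

print(f"aggregate: {aggregate_in} GB/s in, {aggregate_out} GB/s out")
print(f"any half-and-half split: {cross_section} GB/s each way")
print(f"per machine: {share_gb:.1f} GB, about {seconds_to_receive:.1f} s at full rate")
```

Under these assumptions, each machine's roughly 5.6 GB share could cross the network in about three seconds, a small slice of the one-minute budget.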
Given the widespread interest in managing "big data," the Microsoft Research work is likely to find a home in several computing fields. It could be used in the biological sciences, for managing gene-sequencing data or helping to create new classes of drugs, or it might help stitch together aerial photographs to give people better imagery of the planet.
The ability to sort data rapidly will also aid machine learning: the design and development of algorithms that enable computers to make predictions based on data, such as sensor readings or information from databases.