At the 2017 Open Compute Project (OCP) summit today in Santa Clara, California, Facebook is unveiling new types of hardware for use in its data centers, including the Big Basin GPU server and the Bryce Canyon storage platform.
The designs will be available for other companies to use through the Open Compute Project.
Bryce Canyon is Facebook's first major storage chassis designed from the ground up since Open Vault (Knox) was released in 2013. The new platform will be used primarily for high-density storage, including photos and videos, and is designed to improve on Open Vault's efficiency and performance.
The chassis supports 72 HDDs in 4 OU (Open Rack units), an HDD density 20 percent higher than that of Open Vault. Its modular design allows multiple configurations, from JBOD to a powerful storage server. The platform supports more powerful processors and a memory footprint up to 4x larger than its predecessor. Bryce Canyon also improves thermal and power efficiency by using larger, more efficient 92 mm fans that simultaneously cool the front three rows of HDDs and pull air beneath the chassis, providing cool air to the rear three rows of HDDs. It is also compatible with the Open Rack v2 standard.
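The 20 percent density figure can be checked with a few lines of arithmetic. The sketch below is illustrative only: the Bryce Canyon numbers come from the announcement, while the Open Vault figure of 30 HDDs in 2 OU is an assumption not stated in this article, and the helper name drives_per_ou is hypothetical.

```python
from fractions import Fraction


def drives_per_ou(hdd_count: int, rack_units: int) -> Fraction:
    """Return HDD density as drives per Open Rack unit (OU)."""
    return Fraction(hdd_count, rack_units)


# Bryce Canyon figures are from the article: 72 HDDs in 4 OU.
bryce_canyon = drives_per_ou(72, 4)
# Assumption (not stated in the article): Open Vault holds 30 HDDs in 2 OU.
open_vault = drives_per_ou(30, 2)

# Relative density gain of Bryce Canyon over Open Vault, in percent.
density_gain_percent = float((bryce_canyon / open_vault - 1) * 100)

print(f"Bryce Canyon: {bryce_canyon} HDDs per OU")
print(f"Open Vault:   {open_vault} HDDs per OU")
print(f"Density gain: {density_gain_percent:.0f} percent")
```

Under that assumption, 18 drives per OU against 15 drives per OU works out to the 20 percent improvement quoted above.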
By leveraging Mono Lake, the Open Compute single-socket (1S) server card, as the compute module, a Bryce Canyon storage server provides a 4x increase in compute capability over the Honey Badger storage server based on Open Vault. Bryce Canyon also leverages OpenBMC to manage thermals and power, employing the common management framework for most new hardware in Facebook data centers.
Big Basin is the successor to Facebook's Big Sur GPU server, the company's first widely deployed, high-performance compute platform that helps train larger and deeper neural networks.
Big Basin can train models that are 30 percent larger because of the availability of greater arithmetic throughput and a memory size increase from 12 GB to 16 GB. In tests with popular image classification models like ResNet-50, Facebook's engineers were able to reach almost 100 percent improvement in throughput compared with Big Sur.
Big Basin is designed as a JBOG (just a bunch of GPUs) to allow for the complete disaggregation of the CPU compute from the GPUs. It does not have compute and networking built in, so it requires an external server head node, similar to Facebook's Open Vault JBOD and Lightning JBOF. By designing it this way, Facebook can connect its Open Compute servers as a separate building block from the Big Basin unit and scale each block independently as new CPUs and GPUs are released. The server supports eight high-performance GPUs (specifically, eight NVIDIA Tesla P100 GPU accelerators), is compatible with Open Rack v2, and occupies 3 OU of space.
Tioga Pass is the successor to Leopard, which is used for a variety of compute services at Facebook. Tioga Pass has a dual-socket motherboard that keeps Leopard's 6.5" by 20" form factor and supports both single-sided and double-sided designs. The double-sided design, with DIMMs on both sides of the PCB, allows Facebook to maximize the memory configuration. The onboard mSATA connector on Leopard has been replaced with an M.2 slot to support M.2 NVMe SSDs. The chassis is also compatible with Open Rack v2.
Tioga Pass upgrades the PCIe slot from x24 to x32, which allows for two x16 slots, or one x16 slot and two x8 slots, to make the server more flexible as the head node for both the Big Basin JBOG and Lightning JBOF. This doubles the available PCIe bandwidth when accessing either GPUs or flash. The addition of a 100G network interface controller (NIC) also enables higher-bandwidth access to flash storage when used as a head node for Lightning. Tioga Pass is also Facebook's first dual-CPU server to use OpenBMC, which was introduced with the company's Mono Lake server last year.
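The two slot layouts mentioned above can be checked against the lane budget with simple addition. This is a minimal sketch, not a description of how the riser is actually wired; the function name fits_lane_budget and the layout labels are hypothetical.

```python
def fits_lane_budget(slot_widths, total_lanes):
    """Return True if the slots need no more lanes than the connector offers."""
    return sum(slot_widths) <= total_lanes


# The two layouts named in the article, expressed as PCIe lane widths.
layouts = {
    "two x16": [16, 16],
    "one x16 + two x8": [16, 8, 8],
}

for name, widths in layouts.items():
    on_x32 = fits_lane_budget(widths, 32)  # Tioga Pass connector width
    on_x24 = fits_lane_budget(widths, 24)  # previous connector width
    print(f"{name}: fits x32 = {on_x32}, fits x24 = {on_x24}")
```

Both layouts consume exactly 32 lanes, so neither would fit within the earlier x24 connector.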
Yosemite v2 is a refresh of Yosemite, Facebook's first-generation multi-node compute platform that holds four 1S server cards, and provides the flexibility and power efficiency for high-density, scale-out data centers. Although Yosemite v2 uses a new 4 OU vCubby chassis design, it is still compatible with Open Rack v2. Each cubby supports four 1S server cards, or two servers plus two device cards. Each of the four servers can connect to either a 50G or 100G multi-host NIC.
Unlike Yosemite, the new power design supports hot service: servers can continue to operate and don't need to be powered down when the sled is pulled out of the chassis for components to be serviced.
The Yosemite v2 chassis supports both Mono Lake and the next-generation Twin Lakes 1S server. It also supports device cards such as the Glacier Point SSD carrier card and the Crane Flat device carrier card.
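The cubby population rules described above (four 1S server cards, or two servers plus two device cards) can be modeled as a small validity check. The sketch below is purely illustrative: the card kinds "server" and "device" and the function is_valid_cubby are hypothetical names, and slot positions are ignored because the article does not say which slots take device cards.

```python
from collections import Counter

# The two card mixes the article describes for a Yosemite v2 cubby.
VALID_LAYOUTS = [
    Counter(server=4),
    Counter(server=2, device=2),
]


def is_valid_cubby(cards):
    """Check a list of card kinds ("server" or "device") against the known mixes."""
    return Counter(cards) in VALID_LAYOUTS


print(is_valid_cubby(["server"] * 4))
print(is_valid_cubby(["server", "device", "server", "device"]))
print(is_valid_cubby(["server", "device", "device", "device"]))
```

The first two calls describe supported mixes; the third, with one server and three device cards, is not among the configurations listed in the announcement.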