Facebook Introduces Network Switch
On Wednesday, Facebook announced that it had created "Wedge," a new kind of computer networking switch, potentially capable of shifting data rapidly through the largest data centers. The top-of-rack (TOR) network switch is based on a new Linux-based operating system, code-named "FBOSS." These projects break down the hardware and software components of the network stack even further, to provide visibility, automation, and control in the operation of the network.
Facebook's goal with these projects was to make its network look, feel, and operate more like the Open Compute Project (OCP) servers the company has already deployed, both in terms of hardware and software.
The switch is designed to work with other commercial and open-source networking products. It is initially designed to work at speeds of up to 40 gigabits per second, the upper end of what most network switches now carry, but will move to 100 gigabits in the near future. Facebook says it has already seen interest from Microsoft, Goldman Sachs and Bloomberg, among others, in working with the product.
The "Wedge" has the same power and flexibility as a server. Traditional network switches often use fixed hardware configurations and non-standard control interfaces, limiting the capabilities of the device and complicating deployments. Facebook leveraged its existing "Group Hug" architecture for modular microservers, which enables the company to use a wide range of microservers from across the open hardware ecosystem. Facebook started with a microserver that it is using elsewhere in its infrastructure, but the open form factor will allow the company to use a range of processors, including products from Intel, AMD, or ARM.
By using a real server module in the switch, Facebook is able to bring switches into its distributed fleet management systems and provision them with its standard Linux-based operating environment.
Unlike traditional closed-hardware switches, "Wedge" lets anyone modify or replace any of the components in Facebook's design to better meet their needs.
On the software side, "FBOSS" was designed to allow Facebook to leverage the software libraries and systems the company is currently using for managing its servers, including initial turn-up and decommissioning, upgrades and downgrades, and draining and undraining. Facebook also added a Thrift-based abstraction layer on top of the switch ASIC APIs, which will enable its engineers to treat "Wedge" like any other service in Facebook.
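To make the idea of a service-style abstraction over switch ASIC APIs more concrete, here is a minimal Python sketch. It is not the FBOSS interface and does not use Thrift; every name in it (SwitchAsic, FakeAsic, SwitchService, drain, undrain) is hypothetical, and draining is modeled crudely as administratively disabling every port.

```python
from abc import ABC, abstractmethod


class SwitchAsic(ABC):
    """Hypothetical vendor-neutral view of a switch ASIC SDK."""

    @abstractmethod
    def ports(self) -> list[int]:
        """Return the port numbers the ASIC exposes."""

    @abstractmethod
    def set_port_enabled(self, port: int, enabled: bool) -> None:
        """Administratively enable or disable one port."""

    @abstractmethod
    def port_enabled(self, port: int) -> bool:
        """Report whether a port is administratively enabled."""


class FakeAsic(SwitchAsic):
    """In-memory stand-in so the sketch runs without hardware."""

    def __init__(self, num_ports: int) -> None:
        self._enabled = {p: True for p in range(1, num_ports + 1)}

    def ports(self) -> list[int]:
        return sorted(self._enabled)

    def set_port_enabled(self, port: int, enabled: bool) -> None:
        if port not in self._enabled:
            raise ValueError(f"unknown port {port}")
        self._enabled[port] = enabled

    def port_enabled(self, port: int) -> bool:
        return self._enabled[port]


class SwitchService:
    """Service-style wrapper that fleet tooling could call (assumed API)."""

    def __init__(self, asic: SwitchAsic) -> None:
        self._asic = asic
        self._drained = False

    def drain(self) -> None:
        # Assumption: draining simply takes every port out of service.
        for port in self._asic.ports():
            self._asic.set_port_enabled(port, False)
        self._drained = True

    def undrain(self) -> None:
        for port in self._asic.ports():
            self._asic.set_port_enabled(port, True)
        self._drained = False

    def status(self) -> dict:
        up = sum(self._asic.port_enabled(p) for p in self._asic.ports())
        return {"drained": self._drained, "ports_up": up}


if __name__ == "__main__":
    service = SwitchService(FakeAsic(num_ports=4))
    service.drain()
    print(service.status())
    service.undrain()
    print(service.status())
```

The point of the wrapper is that management tooling talks only to SwitchService, so the same drain and undrain calls could apply regardless of which ASIC sits behind the interface.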
The service layer in "FBOSS" allows Facebook to implement a hybrid of distributed and centralized control. This flexibility lets Facebook's engineers optimize where the control logic resides, which in turn will allow them to get higher utilization on the company's links, troubleshoot more easily, recover from failures faster, and respond more quickly to sudden changes in global traffic.
With "FBOSS," Facebook can also leverage existing tools for environmental monitoring that give the company insight into the systems' performance, like cooling fan behavior, internal temperatures, and voltage levels.
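As a rough illustration of the kind of check such monitoring tools perform, the Python sketch below compares sensor readings against allowed ranges. The sensor names, the limits, and the out_of_range function are all assumptions made for illustration; Facebook has not published these details here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limit:
    low: float
    high: float


# Hypothetical limits; real values depend on the hardware.
LIMITS = {
    "fan_rpm": Limit(3000, 12000),
    "temp_c": Limit(5, 75),
    "voltage_v": Limit(11.4, 12.6),
}


def out_of_range(readings: dict[str, float]) -> list[str]:
    """Return one alert string per reading outside its allowed range."""
    alerts = []
    for name, value in sorted(readings.items()):
        limit = LIMITS.get(name)
        if limit is None:
            continue  # no limit defined for this sensor, so skip it
        if not (limit.low <= value <= limit.high):
            alerts.append(f"{name}={value} outside [{limit.low}, {limit.high}]")
    return alerts


if __name__ == "__main__":
    sample = {"fan_rpm": 8000, "temp_c": 82.0, "voltage_v": 12.0}
    for alert in out_of_range(sample):
        print(alert)
```

Because the switch runs a standard Linux environment, a check like this could in principle be the same code already used for servers, which is the reuse the article describes.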
"Wedge" and "FBOSS" are currently being tested in Facebook's network and the company plans to propose the designs for "Wedge" and the central pieces of "FBOSS" as contributions to OCP.