Nvidia Tesla P4, P40 Accelerators Focus On Deep Learning, Palm-sized DRIVE PX 2 Computer To Power Baidu Car

Nvidia on Monday showed off a smaller, more efficient artificial intelligence computer for self-driving cars, along with the new Tesla P4 and P40 processors for servers and supercomputers, respectively.

DRIVE PX 2

Chinese web services company Baidu will deploy Nvidia's new DRIVE PX 2 as the in-vehicle computer for its self-driving system.

Earlier this month, Nvidia and Baidu announced a partnership to develop a full self-driving car architecture from the cloud to the vehicle using both companies' expertise in artificial intelligence (AI).

The new single-processor configuration of the NVIDIA DRIVE PX 2 AI computing platform for AutoCruise functions -- which include highway automated driving and HD mapping -- consumes just 10 watts of power and enables vehicles to use deep neural networks to process data from multiple cameras and sensors.

A car using the small form-factor DRIVE PX 2 for AutoCruise can understand in real time what is happening around it, precisely locate itself on an HD map and plan a safe path forward.

NVIDIA DRIVE PX 2 is powered by the company's newest system-on-a-chip, featuring a GPU based on the NVIDIA Pascal architecture. A single NVIDIA Parker system-on-chip (SoC) configuration can process inputs from multiple cameras, plus lidar, radar and ultrasonic sensors. It supports automotive inputs/outputs, including Ethernet, CAN and FlexRay.

The new single-processor DRIVE PX 2 will be available to Nvidia's production partners in the fourth quarter of 2016.

Tesla P4, P40 Accelerators

Nvidia also announced new processors Monday to try to embed its products in artificial-intelligence systems.

The chipmaker rolled out graphics chips for running software that makes the split-second decisions needed when everything from phones to cars to internet search engines responds to inputs such as speech, images and moving objects.

The company said its new Tesla P4 chip is for servers used in massive data centers. Based on its Pascal design, the P4 is more than three times as efficient at processing images as its predecessor and 40 times more efficient than Intel server chips, according to Nvidia. Another new chip, called the P40, is designed for more-powerful single computers, such as supercomputers.

Nvidia is taking aim at Intel, which last month announced its own AI chips and talked about its ambition to muscle in on this nascent but fast-growing market.

Nvidia has argued that its graphics chips, which perform multiple small manipulations of data simultaneously, are the right answer for AI systems. Intel has said its chips, which have less ability to work in parallel but are more capable in general-purpose computing, offer the right solutions.

The Tesla P4 and P40 are designed for inferencing, which uses trained deep neural networks to recognize speech, images or text in response to queries from users and devices.

The Tesla P4 fits in any server with its small form-factor and low-power design, which starts at 50 watts. According to Nvidia, a single server with a single Tesla P4 replaces 13 CPU-only servers for video inferencing workloads, delivering over 8x savings in total cost of ownership, including server and power costs.

The Tesla P40 delivers maximum throughput for deep learning workloads. With 47 tera-operations per second (TOPS) of inference performance with INT8 instructions, a server with eight Tesla P40 accelerators can replace the performance of more than 140 CPU servers, Nvidia claims. At approximately $5,000 per CPU server, this results in savings of more than $650,000 in server acquisition cost.
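The acquisition-cost claim can be checked with simple arithmetic. The sketch below uses only the figures quoted above; the "implied budget" for the eight-GPU server is an inference from those figures, not a price Nvidia has stated, and the function names are illustrative.

```python
# Figures quoted in the article (Nvidia's claims, all approximate).
CPU_SERVERS_REPLACED = 140   # "more than 140 CPU servers"
COST_PER_CPU_SERVER = 5_000  # "approximately $5,000 per CPU server"
CLAIMED_SAVINGS = 650_000    # "more than $650,000"


def cpu_fleet_cost(servers: int, unit_cost: int) -> int:
    """Acquisition cost of the CPU-only servers being replaced."""
    return servers * unit_cost


def implied_gpu_server_budget(fleet_cost: int, savings: int) -> int:
    """What remains for the GPU server if the claimed savings hold.

    This is an inference from the quoted numbers, not a published price.
    """
    return fleet_cost - savings


fleet = cpu_fleet_cost(CPU_SERVERS_REPLACED, COST_PER_CPU_SERVER)
budget = implied_gpu_server_budget(fleet, CLAIMED_SAVINGS)

print(f"CPU fleet cost: ${fleet:,}")
print(f"Implied budget for the eight-P40 server: ${budget:,}")
```

On these numbers the CPU fleet costs $700,000, so the claimed savings leave roughly $50,000 for the GPU server. Because both the server count and the savings are quoted as "more than", this is only a rough bound.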

Specification	Tesla P4	Tesla P40
Single-Precision TFLOPS	5.5	12
INT8 TOPS* (Tera-Operations Per Second)	22	47
CUDA Cores	2,560	3,840
GPU GDDR5 Memory	8GB	24GB
Memory Bandwidth	192GB/s	346GB/s
Power	50 watts (or higher)	250 watts
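To put the two cards on a common footing, the table's INT8 throughput can be divided by its power figure. This is a back-of-the-envelope sketch using only the numbers listed above; it takes the P4's 50-watt entry at face value even though the table says "or higher", so the P4 result is a best case. The dictionary layout and function name are illustrative.

```python
# Figures copied from the specification table above.
SPECS = {
    "Tesla P4": {"int8_tops": 22, "watts": 50},    # 50 W is the lowest listed power
    "Tesla P40": {"int8_tops": 47, "watts": 250},
}


def tops_per_watt(int8_tops: float, watts: float) -> float:
    """INT8 tera-operations per second delivered per watt."""
    return int8_tops / watts


efficiency = {name: tops_per_watt(**spec) for name, spec in SPECS.items()}

for name, value in efficiency.items():
    print(f"{name}: {value:.3f} INT8 TOPS per watt")
```

By this measure the P4 delivers about 0.44 TOPS per watt and the P40 about 0.19, which fits the article's framing of the P4 as the efficiency part and the P40 as the throughput part.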

Complementing the Tesla P4 and P40 are two software tools to accelerate AI inferencing: NVIDIA TensorRT and the NVIDIA DeepStream SDK.

TensorRT is a library created for optimizing deep learning models for production deployment that delivers instant responsiveness for the most complex networks. It maximizes throughput and efficiency of deep learning applications by taking trained neural nets -- defined with 32-bit or 16-bit operations -- and optimizing them for reduced precision INT8 operations.
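The idea of running a trained network at reduced precision can be illustrated with a generic symmetric INT8 quantization scheme. The sketch below is not TensorRT's API and does not reproduce its calibration procedure; it is a minimal pure-Python example, with hypothetical function names and made-up weights, of mapping floating-point values onto 8-bit integers and back.

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] using one shared scale.

    Symmetric scheme: the largest magnitude in `values` maps to 127.
    """
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the INT8 representation."""
    return [q * scale for q in quantized]


# Made-up example weights, purely for illustration.
weights = [0.82, -1.27, 0.05, 0.0, 0.64]

q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

print("quantized:", q)
print("scale:", scale)
print("largest round-trip error:", max_error)
```

Each value now fits in one byte instead of four, and the round-trip error is bounded by half the scale. Whether that loss is acceptable depends on the network, which is why production tools choose scales more carefully than this sketch does.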

NVIDIA DeepStream SDK taps into the power of a Pascal server to simultaneously decode and analyze up to 93 HD video streams in real time, compared with seven streams with dual CPUs. This addresses one of the grand challenges of AI: understanding video content at scale for applications such as self-driving cars, interactive robots, filtering and ad placement.

The NVIDIA Tesla P4 and P40 are planned to be available in November and October, respectively, in servers offered by ODM, OEM and channel partners.