Breaking News

Razer Unveils Raiju V3 Pro Samsung announces Galaxy XR headset Leica M EV1 – the first M-Camera with an integrated electronic viewfinder Micron Delivers Industry’s Highest Capacity SOCAMM2 for Low-Power DRAM in the AI Data Center KIOXIA launches EXCERIA PLUS G3 and EXCERIA G3 microSD cards for exceptional photography and video performance

logo

  • Share Us
    • Facebook
    • Twitter
  • Home
  • Home
  • News
  • Reviews
  • Essays
  • Forum
  • Legacy
  • About
    • Submit News

    • Contact Us
    • Privacy

    • Promotion
    • Advertise

    • RSS Feed
    • Site Map

Search form

Google Says Its Tensor Processing Unit Outperforms CPUs And GPUs

Google Says Its Tensor Processing Unit Outperforms CPUs And GPUs

Enterprise & IT Apr 5,2017 0

Google claims that a Tensor Processing Unit (TPU), a dedicated hardware for running machine- learning applications like voice recognition, is more powerful than Intel's Haswell CPUs and Nvidia Tesla K80 GPUs.

ATensor processing unit (or TPUs) is a custom application-specific integrated circuits (ASIC) developed specifically for machine learning.

Today, Google released a comparison of the TPU's performance and efficiencies compared with Haswell CPUs and Nvidia Tesla K80 GPUs, in a datacenter environment.

Google's paper evaluates a ​Tensor Pro​cessing Unit (TPU)? deployed in datacenters since 2015 that accelerates the inference phase of neural networks (NN). The heart of the TPU is a 65,536 8-bit MAC matrix multiply unit that offers a peak throughput of 92 TeraOps/second (TOPS) and a large (28 MiB) software-managed on-chip memory.

Google says that the TPU's deterministic execution model is a better match to the 99th-percentile response-time requirement of its NN applications than are the time-varying optimizations of CPUs and GPUs
(caches, out-of-order execution, multithreading, multiprocessing, prefetching, ...) that help average throughput more than guaranteed latency. The lack of such features helps explain why, despite having myriad MACs and a big memory, the TPU is relatively small and low power. Google compares the TPU to a server-class Intel Haswell CPU and an Nvidia K80 GPU, which are contemporaries deployed in the same datacenters.

Google's workload, written in the high-level TensorFlow framework, uses production NN applications (MLPs, CNNs, and LSTMs) that represent 95% of its datacenters' NN inference demand.

Google claims that despite low utilization for some applications, the TPU is on average about 15X -30X faster than its contemporary GPU or CPU, with TOPS/Watt about 30X - 80X higher. Moreover, using the GPU's
GDDR5 memory in the TPU would triple achieved TOPS and raise TOPS/Watt to nearly 70X the GPU and 200X the CPU.

The TPU chip size can fit in a hard drive slot within a data center rack according to Google Distinguished Hardware Engineer Norm Jouppi, who joined Google three years ago and has been one of the chief architects of the MIPS processor- possibly the best processor achitecture ever.

Despite its multitude of matrix multiplication units, the TPU does not have any stored program; it simply executes instructions sent from the host. The DRAM on the TPU is operated as one unit in parallel because of the need to fetch so many weights to feed to the matrix multiplication unit.

More details about the test

In Google's tests, a Haswell Xeon E5-2699 v3 processor with 18 cores running at 2.3 GHz using 64-bit floating point math units was able to handle 1.3 Tera Operations Per Second (TOPS) and delivered 51 GB/sec of memory bandwidth; the Haswell chip consumed 145 watts and its system (which had 256 GB of memory) consumed 455 watts when it was busy.

The TPU, by comparison, used 8-bit integer math and access to 256 GB of host memory plus 32 GB of its own memory was able to deliver 34 GB/sec of memory bandwidth on the card and process 92 TOPS - a factor of 71X more throughput on inferences, and in a 384 watt thermal envelope for the server that hosted the TPU.

Google also examined the throughput in inferences per second that the CPUs, GPUs, and TPUs could handle with different inference batch sizes and with a 7 milliseconds.

With a small batch size of 16 and a response time near 7 milliseconds for the 99th percentile of transactions, the Haswell CPU delivered 5,482 inferences per second (IPS), which was 42 percent of its maximum throughout (13,194 IPS at a batch size of 64) when response times were allowed to go as long as they wanted and expanded to 21.3 millisecond of the 99th percentile of transactions. The TPU, by contrast, could do batch sizes of 200 and still meet the 7 millisecond ceiling and delivered 225,000 IPS running the inference benchmark, which was 80 percent of its peak performance with a batch size of 250 and the 99th percentile response coming in at 10 milliseconds.

Tags: Tensor processing unit
Previous Post
Google Invests in INDIGO Undersea Cable to Link Southeast Asia
Next Post
YouTube TV Goes Live For $35 Per Month

Related Posts

  • ASUS Thinker Edge T Single-board Computer is Equipped With a Google TPU

  • Google Makes Its Scalable Supercomputers for Machine Learning Publically Available

  • New NVIDIA Tesla T4 GPU and New TensorRT Software Enable Intelligent Voice, Video, Image and Recommendation Services

  • Google's Third Generation of Tensor Processing Unit Offers 100 Petaflops in Performance

  • NVIDIA CEO Declares Death of Moore's Law, Outlines Company's GPU-centered Ecosystem

  • Google I/O: Google Digital Assistant Coming to iPhone, Android O, Android Go, New TPU and VR

Latest News

Razer Unveils Raiju V3 Pro
Gaming

Razer Unveils Raiju V3 Pro

Samsung announces Galaxy XR headset
Consumer Electronics

Samsung announces Galaxy XR headset

Leica M EV1 – the first M-Camera with an integrated electronic viewfinder
Cameras

Leica M EV1 – the first M-Camera with an integrated electronic viewfinder

Micron Delivers Industry’s Highest Capacity SOCAMM2 for Low-Power DRAM in the AI Data Center
Enterprise & IT

Micron Delivers Industry’s Highest Capacity SOCAMM2 for Low-Power DRAM in the AI Data Center

KIOXIA launches EXCERIA PLUS G3 and EXCERIA G3 microSD cards for exceptional photography and video performance
Cameras

KIOXIA launches EXCERIA PLUS G3 and EXCERIA G3 microSD cards for exceptional photography and video performance

Popular Reviews

be quiet! Dark Mount Keyboard

be quiet! Dark Mount Keyboard

Terramaster F8-SSD

Terramaster F8-SSD

be quiet! Light Mount Keyboard

be quiet! Light Mount Keyboard

be quiet! Pure Base 501

be quiet! Pure Base 501

Soundpeats Pop Clip

Soundpeats Pop Clip

Akaso 360 Action camera

Akaso 360 Action camera

Dragon Touch Digital Calendar

Dragon Touch Digital Calendar

Noctua NF-A12x25 G2 fans

Noctua NF-A12x25 G2 fans

Main menu

  • Home
  • News
  • Reviews
  • Essays
  • Forum
  • Legacy
  • About
    • Submit News

    • Contact Us
    • Privacy

    • Promotion
    • Advertise

    • RSS Feed
    • Site Map
  • About
  • Privacy
  • Contact Us
  • Promotional Opportunities @ CdrInfo.com
  • Advertise on out site
  • Submit your News to our site
  • RSS Feed