Tensor processing unit

A tensor processing unit (TPU) is an application-specific integrated circuit (ASIC) developed by Google specifically for neural network machine learning.

Overview
The tensor processing unit was announced in May 2016 at Google I/O, when the company said that the TPU had already been used inside their data centers for over a year. The chip has been specifically designed for Google's TensorFlow framework, a symbolic math library which is used for machine learning applications such as neural networks. However, Google still uses CPUs and GPUs for other types of machine learning. Other AI accelerator designs are appearing from other vendors also and are aimed at embedded and robotics markets.

Google's TPUs are proprietary. Some models are commercially available, and on February 12, 2018, The New York Times reported that Google "would allow other companies to buy access to those chips through its cloud-computing service." Google has stated that they were used in the AlphaGo versus Lee Sedol series of man-machine Go games, as well as in the AlphaZero system, which produced chess, shogi and Go playing programs from the game rules alone and went on to beat the leading programs in those games. Google has also used TPUs for Google Street View text processing, and was able to find all the text in the Street View database in less than five days. In Google Photos, an individual TPU can process over 100 million photos a day. It is also used in RankBrain, which Google uses to provide search results.

Compared to a graphics processing unit, it is designed for a high volume of low-precision computation (e.g. as little as 8-bit precision) with more input/output operations per joule, and lacks hardware for rasterisation/texture mapping. The TPU ASICs are mounted in a heatsink assembly, which can fit in a hard drive slot within a data center rack, according to Google Distinguished Hardware Engineer Norman Jouppi.
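The low-precision design choice can be illustrated with a rough sketch (an illustration, not a benchmark; the parameter count is arbitrary): an 8-bit operand occupies a quarter of the storage, and therefore a quarter of the bus traffic, of a 32-bit float, so four times as many operands move per byte of input/output.

```python
# Sketch: memory footprint of one million model parameters stored as
# 32-bit floats versus 8-bit integers. The 4x ratio is why low-precision
# hardware can sustain more operations per unit of I/O and energy.
import struct

params = 1_000_000
bytes_fp32 = params * struct.calcsize('f')  # 4 bytes per float32
bytes_int8 = params * 1                     # 1 byte per int8
print(bytes_fp32 // bytes_int8)             # 4
```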

First generation TPU
The first-generation TPU is an 8-bit matrix multiplication engine, driven with CISC instructions by the host processor across a PCIe 3.0 bus. It is manufactured on a 28 nm process with a die size ≤ 331 mm². The clock speed is 700 MHz and it has a thermal design power of 28–40 W. It has 28 MiB of on-chip memory, and 4 MiB of 32-bit accumulators taking the results of a 256×256 systolic array of 8-bit multipliers. Within the TPU package is 8 GiB of dual-channel 2133 MHz DDR3 SDRAM offering 34 GB/s of bandwidth. Instructions transfer data to or from the host, perform matrix multiplications or convolutions, and apply activation functions.
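The core datapath described above — 8-bit multiplies whose products are summed in wide accumulators — can be sketched in miniature. This is an illustrative model with invented helper names, not Google's implementation, and it ignores the systolic dataflow that makes the real array fast:

```python
# Sketch (not Google's implementation): multiply matrices of 8-bit integers,
# accumulating partial products in wide accumulators, as the 256x256
# multiplier array feeding 32-bit accumulators does at full scale.

def clamp_int8(x):
    """Clamp a value to the signed 8-bit range, as quantized inputs would be."""
    return max(-128, min(127, int(x)))

def matmul_int8(a, b):
    """Multiply two int8 matrices; each output element uses a wide accumulator."""
    n, k, m = len(a), len(b), len(b[0])
    out = [[0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            acc = 0  # 32-bit accumulator in hardware; Python ints never overflow
            for p in range(k):
                acc += clamp_int8(a[i][p]) * clamp_int8(b[p][j])
            out[i][j] = acc
    return out

A = [[1, -2], [3, 4]]
B = [[5, 6], [7, 8]]
print(matmul_int8(A, B))  # [[-9, -10], [43, 50]]
```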

Second generation TPU
The second-generation TPU was announced in May 2017. Google stated the first-generation TPU design was limited by memory bandwidth, and using 16 GB of High Bandwidth Memory in the second-generation design increased bandwidth to 600 GB/s and performance to 45 teraFLOPS. The TPUs are then arranged into four-chip modules with a performance of 180 teraFLOPS. Then 64 of these modules are assembled into 256-chip pods with 11.5 petaFLOPS of performance. Notably, while the first-generation TPUs were limited to integers, the second-generation TPUs can also calculate in floating point. This makes the second-generation TPUs useful for both training and inference of machine learning models. Google has stated these second-generation TPUs will be available on the Google Compute Engine for use in TensorFlow applications.
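The module and pod figures quoted above follow directly from the per-chip number; a back-of-the-envelope check (illustrative variable names):

```python
# Check that the quoted second-generation figures are self-consistent:
# 45 TFLOPS per chip, 4 chips per module, 64 modules (256 chips) per pod.
chip_tflops = 45.0
module_tflops = 4 * chip_tflops          # four-chip module
pod_pflops = 64 * module_tflops / 1000   # 64 modules, in petaFLOPS
print(module_tflops, pod_pflops)         # 180.0 11.52 (quoted as 11.5)
```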

Third generation TPU
The third-generation TPU was announced on May 8, 2018. Google announced that the processors themselves are twice as powerful as the second-generation TPUs, and that they would be deployed in pods with four times as many chips as the preceding generation. This results in an 8-fold increase in performance per pod (with up to 1,024 chips per pod) compared to the second-generation TPU deployment.
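The 8-fold figure is just the product of the two factors Google cited; a trivial check with assumed variable names:

```python
# Per-pod speedup = (per-chip speedup) x (chips-per-pod increase).
perf_factor_per_chip = 2   # third-gen chip vs second-gen chip
chips_factor_per_pod = 4   # four times as many chips per pod
print(perf_factor_per_chip * chips_factor_per_pod)  # 8
```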

Edge TPU
In July 2018, Google announced the Edge TPU. The Edge TPU is Google's purpose-built chip designed to run machine learning (ML) models for edge computing, meaning it is much smaller and consumes far less power compared to the TPUs hosted in Google datacenters (also known as Cloud TPUs). In January 2019, Google made the Edge TPU available to developers with a line of products under the Coral brand.

The product offerings include a single-board computer (SBC), a system on module (SoM), a USB accessory, a mini PCIe card, and an M.2 card. The Coral Dev Board and Coral SoM both run Mendel Linux OS – a derivative of Debian. The USB, PCI-e, and M.2 products function as add-ons to existing computer systems, and support Debian-based Linux systems on x86-64 and ARM64 hosts (including the Raspberry Pi).

The machine learning runtime used to execute models on the Edge TPU is based on TensorFlow Lite. The Edge TPU is only capable of accelerating forward-pass operations, which means it is primarily useful for performing inferences (although it is possible to perform lightweight transfer learning on the Edge TPU). The Edge TPU also only supports 8-bit math, meaning that for a network to be compatible with the Edge TPU, it needs to be trained using TensorFlow's quantization-aware training technique.
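Quantization-aware training exists so that inference can run entirely in 8-bit arithmetic. A minimal sketch of the affine (scale and zero-point) quantization scheme such training targets, with hypothetical helper names rather than the TensorFlow API:

```python
# Sketch of 8-bit affine quantization: a real value x is stored as the
# unsigned 8-bit integer round(x / scale) + zero_point, and recovered as
# (q - zero_point) * scale. Helper names are illustrative only.

def quantize(x, scale, zero_point):
    """Map a real value to the uint8 range [0, 255]."""
    q = round(x / scale) + zero_point
    return max(0, min(255, q))

def dequantize(q, scale, zero_point):
    """Map a uint8 value back to an approximation of the real value."""
    return (q - zero_point) * scale

scale, zero_point = 0.05, 128
q = quantize(1.0, scale, zero_point)   # 148
x = dequantize(q, scale, zero_point)   # ~1.0, exact up to scale/2 error
print(q, x)
```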