Google's AlphaZero Masters Chess Within Hours

Google's AlphaZero, an algorithm running on Google's AI-specific Tensor Processing Units, has managed to learn, master, and then dominate the game of chess in just four hours.

AlphaZero, from Google's AI subsidiary DeepMind, took four hours to teach itself how to play chess and then proceeded to demolish Stockfish, the highest-rated chess engine. After 100 games, AlphaZero had racked up 28 wins and zero losses.

Chess can be incredibly complex, with more than 10^100 possible positions.

Google's algorithm used self-play reinforcement learning. It started at a chess rating of ten and went through 700,000 iterative training steps over four hours before taking on Stockfish. During its training phase, the algorithm had no access to opening books or endgame tables. It simply played a large number of games against itself.

The training session used 5,000 first-generation Google Tensor Processing Units (TPUs) to generate the self-played games and 64 second-generation TPUs to train the neural networks on those games. During the match, AlphaZero ran on a single machine with four TPUs. Stockfish ran on a single machine with 64 threads and a hash size of 1GB.

During the match, both algorithms were given one minute per move. During play, Stockfish searched 70 million positions per second, while AlphaZero searched only 80,000 positions per second.
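To put the search-rate gap in perspective, the figures above can be turned into a ratio and a rough per-move total. This is only a back-of-the-envelope sketch: it assumes both engines used the full minute on every move, which the article does not state, and the variable names are illustrative.

```python
# Search rates quoted in the article (positions per second).
stockfish_pps = 70_000_000
alphazero_pps = 80_000

# Assumption: each engine spends its full one-minute allowance on a move.
SECONDS_PER_MOVE = 60

# How many times more positions Stockfish examined per second.
ratio = stockfish_pps / alphazero_pps

# Rough number of positions examined for a single move.
stockfish_per_move = stockfish_pps * SECONDS_PER_MOVE
alphazero_per_move = alphazero_pps * SECONDS_PER_MOVE

print(f"Stockfish searched {ratio:.0f}x more positions per second")
print(f"Per move: Stockfish ~{stockfish_per_move:,}, AlphaZero ~{alphazero_per_move:,}")
```

Under that assumption, Stockfish looked at roughly 4.2 billion positions per move against about 4.8 million for AlphaZero, yet still lost the match.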

The full result of the 100-game match gave AlphaZero 28 wins, 72 draws, and zero losses. Of those 28 wins, 25 came as white and only three as black.
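The match result can also be expressed as a score and a rough rating gap. The sketch below uses the conventional chess scoring of one point per win and half a point per draw, plus the standard logistic Elo formula. The Elo conversion is an illustration added here, not a figure reported in the article, and the function name is hypothetical.

```python
import math

# Match result reported in the article, from AlphaZero's side.
wins, draws, losses = 28, 72, 0
games = wins + draws + losses

# Conventional chess scoring: 1 point per win, 0.5 per draw.
score = (wins + 0.5 * draws) / games


def implied_elo_gap(expected_score: float) -> float:
    """Invert the logistic Elo expectation E = 1 / (1 + 10 ** (-d / 400)).

    Hypothetical helper for illustration; valid for 0 < expected_score < 1.
    """
    return 400 * math.log10(expected_score / (1 - expected_score))


gap = implied_elo_gap(score)
print(f"AlphaZero score: {score:.0%} over {games} games")
print(f"Implied rating gap: about {gap:.0f} Elo points")
```

A 64% score corresponds to a gap of roughly 100 Elo points under this model, though a single 100-game match gives only a coarse estimate.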

Note that AlphaZero made up its own opening book, meaning Stockfish could not make use of its considerable opening preparation.

Google estimates that each TPU is capable of delivering up to 225,000 predictions per second. A regular old CPU can muster just over 5,000.