Toshiba Memory Develops Faster, Energy-efficient Algorithm and Hardware Architecture for Deep Learning Processing
Toshiba Memory Corp. has developed a high-speed, energy-efficient algorithm and hardware architecture for deep learning processing with less degradation of recognition accuracy.
The new deep learning processor, implemented on an FPGA (Field Programmable Gate Array), achieves four times the energy efficiency of conventional ones, according to TMC. The announcement was made at the IEEE Asian Solid-State Circuits Conference 2018 (A-SSCC 2018) in Taiwan on November 6.
Deep learning calculations generally require large numbers of multiply-accumulate (MAC) operations, which result in long calculation times and high energy consumption. Techniques that reduce the number of bits used to represent parameters (bit precision) have been proposed to cut the total amount of calculation, and one proposed algorithm reduces the bit precision to one or two bits, but those techniques degrade recognition accuracy. Toshiba claims that its newly developed algorithm reduces MAC operations by optimizing the bit precision of MAC operations for individual filters in each layer of a neural network. A single layer of a neural network generally contains many filters, up to several thousand. By using the new algorithm, Toshiba says, MAC operations can be reduced with less degradation of recognition accuracy.
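The idea of giving each filter its own bit precision can be illustrated with a minimal Python sketch. The announcement does not say how Toshiba selects the precision for each filter, so everything here is an assumption made for illustration: the uniform symmetric quantizer, the relative-error criterion, the candidate bit widths, and the hypothetical names `quantize_filter`, `choose_bits` and `assign_precisions`.

```python
def quantize_filter(weights, bits):
    """Quantize one filter's weights to `bits` bits (illustrative scheme only)."""
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0.0:
        return list(weights)
    if bits == 1:
        # 1-bit case: keep only the sign, scaled by the mean magnitude.
        scale = sum(abs(w) for w in weights) / len(weights)
        return [scale if w >= 0 else -scale for w in weights]
    # Symmetric uniform grid with 2^(bits-1) - 1 positive levels.
    levels = 2 ** (bits - 1) - 1
    step = max_abs / levels
    return [round(w / step) * step for w in weights]


def relative_error(original, quantized):
    """Root of squared error over root of squared weight energy."""
    num = sum((o - q) ** 2 for o, q in zip(original, quantized))
    den = sum(o ** 2 for o in original)
    return 0.0 if den == 0.0 else (num / den) ** 0.5


def choose_bits(weights, tolerance, candidates=(1, 2, 4, 8)):
    """Pick the smallest candidate bit width whose error stays within tolerance.

    This selection rule is an assumption, not the published algorithm.
    """
    for bits in candidates:
        if relative_error(weights, quantize_filter(weights, bits)) <= tolerance:
            return bits
    return candidates[-1]


def assign_precisions(layer_filters, tolerance=0.1):
    """Return one bit width per filter in a layer."""
    return [choose_bits(f, tolerance) for f in layer_filters]


if __name__ == "__main__":
    layer = [
        [0.5, -0.5, 0.5, -0.5],   # sign-only filter: 1 bit is enough
        [1.0, 0.1, -0.3, 0.7],    # wider spread of values: needs more bits
    ]
    print(assign_precisions(layer, tolerance=0.1))
```

In this toy layer the first filter is represented exactly with one bit while the second needs a wider word, which is the kind of per-filter variation the article describes; a real system would judge precision by its effect on recognition accuracy rather than by weight error alone.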
Furthermore, Toshiba developed a new hardware architecture, called the bit-parallel method, which is suited to MAC operations with differing bit precision. The method splits operands of each bit precision into individual bits and executes the resulting 1-bit operations on numerous MAC units in parallel. According to Toshiba, this significantly improves the utilization efficiency of the MAC units in the processor compared to conventional MAC architectures, which execute in series.
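The arithmetic behind splitting a multi-bit MAC into 1-bit operations can be sketched in software. The following Python is only a model of the decomposition, not Toshiba's hardware design: it assumes unsigned integer weights, and the function names `bit_planes`, `one_bit_ops` and `mac_from_one_bit_ops` are hypothetical.

```python
def bit_planes(weight, bits):
    """Split an unsigned integer weight into `bits` binary digits, LSB first."""
    if not 0 <= weight < (1 << bits):
        raise ValueError("weight does not fit in the given bit width")
    return [(weight >> b) & 1 for b in range(bits)]


def one_bit_ops(activations, weights, bits):
    """List the independent 1-bit operations that make up one dot product.

    Each entry is (activation, weight_bit, shift). A filter with fewer bits
    simply produces fewer entries, so units are not left idle on padding.
    """
    ops = []
    for a, w in zip(activations, weights):
        for shift, bit in enumerate(bit_planes(w, bits)):
            ops.append((a, bit, shift))
    return ops


def mac_from_one_bit_ops(ops):
    """Accumulate the 1-bit partial products.

    The entries do not depend on each other, so in hardware they could be
    spread across many MAC units; here they are summed sequentially.
    """
    return sum((a if bit else 0) << shift for a, bit, shift in ops)


if __name__ == "__main__":
    activations = [3, 5, 2]
    weights = [2, 1, 3]          # each fits in 2 bits
    ops = one_bit_ops(activations, weights, bits=2)
    reference = sum(a * w for a, w in zip(activations, weights))
    assert mac_from_one_bit_ops(ops) == reference
    print(len(ops), reference)
```

The sketch relies on the identity that a product with an n-bit weight equals the sum of the activation gated by each weight bit and shifted by that bit's position, which is why a 2-bit filter costs half as many 1-bit operations as a 4-bit one in this model.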
Toshiba Memory implemented ResNet50, a deep neural network commonly used as a deep learning benchmark for image recognition, on an FPGA using the variable bit precision and the bit-parallel MAC architecture. For image recognition on the ImageNet dataset (more than 14,000,000 images), both operation time and energy consumption were reduced to 25% of those of conventional methods, in line with the fourfold efficiency gain, and with less degradation of recognition accuracy.
Artificial intelligence (AI) is forecast to be implemented in various devices. The newly developed high-speed, low-energy techniques for deep learning processors are expected to be used in edge devices such as smartphones and head-mounted displays, as well as in datacenters. High-performance processors such as GPUs are important for high-speed AI operation. Memory and storage are also among the most important devices for AI, which inevitably uses big data.