Intel Focuses On Energy Efficiency In Semiconductors At VLSI 2014
At the VLSI Symposia this week in Hawaii, Intel is presenting five papers detailing advancements in circuits and semiconductor technology with the common theme of energy-efficient performance.
Across the spectrum of computing devices, from supercomputers and datacenters to tablets and wearables, there is constant pressure to increase capabilities while consuming less power. To that end, it has become common to use systems-on-chip (SoCs), which save both space and power by integrating a variety of processors (e.g. CPUs, GPUs, radios, and accelerators) onto the same silicon chip.
Although the system may power the whole chip at one voltage, in reality each of these elements will require different amounts of power at different times depending on what they are doing. Hence it is becoming common to add voltage regulators (VRs) across the chip so that each region is powered only with the voltage level it really needs.
At VLSI 2014, an Intel Labs paper entitled "A 500 MHz, 68% efficient, Fully On-Die Digitally Controlled Buck Voltage Regulator on 22nm Tri-Gate CMOS" describes a design that approaches the ideal VR: one that is both fast and fully digital.
This voltage regulator, implemented on a research test chip, is fully digital, switches at 500MHz and has a bandwidth 20x larger than the best reported comparable design. This enables fast and efficient power management across a wide range of voltages and currents. And because the controller is digital it will benefit from Moore's Law, scaling readily to smaller sizes on future process technologies.
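As a rough illustration of what such a digital control loop has to do, an ideal buck converter's average output is simply the input rail scaled by the PWM duty cycle (Vout ≈ D × Vin), and the controller adjusts D each switching cycle until the sensed output matches the requested level. The sketch below models that behavior with a simple integral controller; the rail voltage, target, and gain are illustrative assumptions, not values from Intel's design.

```python
# Minimal sketch (not Intel's design): a digitally controlled buck regulator
# approximates Vout = D * Vin, where D is the PWM duty cycle. A simple
# integral controller nudges D each switching cycle until the sensed output
# matches the target. All constants below are illustrative assumptions.

VIN = 1.8          # input rail (V)
V_TARGET = 0.75    # requested domain voltage (V)
GAIN = 0.05        # integral gain per switching cycle

duty = 0.5                         # initial duty cycle
for _ in range(200):               # 200 switching cycles (e.g. at 500 MHz)
    vout = duty * VIN              # ideal buck: average output tracks duty
    error = V_TARGET - vout
    duty = min(max(duty + GAIN * error, 0.0), 1.0)  # clamp to [0, 1]

print(f"settled duty = {duty:.3f}, Vout = {duty * VIN:.3f} V")
```

In a real regulator the duty cycle drives power switches and an LC filter rather than an ideal multiplication, but the control problem, digitally steering D to hold a per-domain voltage, is the same.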
In addition to power management, security is a consideration that spans the entire spectrum of computing devices. Many of Intel's Core and Xeon processors now incorporate Intel Advanced Encryption Standard New Instructions (AES-NI). AES-NI was designed to implement some of the complex and performance-intensive steps of the AES algorithm using hardware acceleration.
For smaller, battery-constrained devices (including phones, wearables, sensors, etc.), Intel Labs researchers wanted to find ways to accelerate the AES algorithm using as little space and power as possible. That's the subject of the second paper, "340mV-1.1V, 289Gbps/W, 2090-Gate Nano-AES Hardware Accelerator with Area-optimized Encrypt/Decrypt GF(2^4)^2 Polynomials in 22nm Tri-Gate CMOS." Researchers created circuits on a test chip that consume less than 1/10 the energy of previously reported work. The design is also extremely small, using just over 2,000 gates, which helps keep these devices compact.
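For a sense of how AES acceleration surfaces to software, the sketch below encrypts a single 16-byte block with the Python `cryptography` package; on recent Core and Xeon processors the package's OpenSSL backend typically dispatches this work to AES-NI instructions. The library choice, zeroed key, and ECB mode are illustrative assumptions for a one-block demo, not anything from Intel's paper, and ECB should not be used for real data.

```python
# Minimal sketch: AES-128 encryption of one block via the "cryptography"
# package (an assumption; not Intel's nano-AES accelerator). On CPUs with
# AES-NI, the OpenSSL backend usually runs this through the hardware
# instructions; on the tiny devices the paper targets, a dedicated
# low-gate-count block would do the same job instead.
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

key = bytes(16)          # 128-bit key (all zeros, for illustration only)
plaintext = bytes(16)    # AES operates on 16-byte blocks

cipher = Cipher(algorithms.AES(key), modes.ECB())   # ECB for a one-block demo
encryptor = cipher.encryptor()
ciphertext = encryptor.update(plaintext) + encryptor.finalize()
print(ciphertext.hex())
```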
Intel also continues to make advancements in transistors, the fundamental semiconductor building blocks of computing.
In another paper, "Process Technology Scaling in an Increasingly Interconnect Dominated World," Intel describes how the delay and power penalties imposed by the resistance and capacitance of the interconnect system -- the wiring that connects billions of transistors on a chip -- degrade circuit performance ever more severely as dimensions shrink.
Interconnect resistances are increasing faster than the rate at which feature sizes scale, and capacitance improvements are constrained by the required mechanical strength of the assembled stack. Collectively, these cause a bottleneck in both local and global information transfer on a chip.
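A back-of-the-envelope way to see why: a wire's resistance per unit length scales as ρ / (width × thickness), so shrinking both cross-sectional dimensions by roughly 0.7x about doubles resistance per length, while capacitance per length stays roughly constant. The toy calculation below illustrates that first-order effect for a fixed-length wire; the dimensions and capacitance value are illustrative assumptions, and real scaling is worse still because copper resistivity itself rises at small dimensions.

```python
# First-order sketch of interconnect RC scaling (illustrative numbers only).
# R per unit length ~ rho / (width * thickness); C per unit length is roughly
# constant when lateral dimensions shrink together, so for a wire of fixed
# length the RC delay grows roughly as 1 / s**2 as dimensions scale by s.

RHO_CU = 1.7e-8      # bulk copper resistivity (ohm*m); ignores size effects

def rc_delay(length_m, width_m, thickness_m, cap_per_m=2e-10):
    """RC delay (seconds) of a wire, lumped-RC approximation."""
    resistance = RHO_CU * length_m / (width_m * thickness_m)
    capacitance = cap_per_m * length_m
    return resistance * capacitance

base = rc_delay(100e-6, 50e-9, 100e-9)     # a 100 um wire at one node
shrunk = rc_delay(100e-6, 35e-9, 70e-9)    # same length, ~0.7x cross-section
print(f"delay grows by {shrunk / base:.2f}x for the same wire length")
```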
Novel processing techniques and materials are being explored as means to increase the conductive cross-sectional area. Molecular ordering of the interlayer dielectric materials offers an opportunity to deliver lower capacitance while retaining the required mechanical strength. In addition to these improvement paths, novel micro-architectures and applications that are more tolerant of resistive and capacitive scaling constraints must be explored in tandem.
At last year's VLSI conference, Intel presented technical details of its eDRAM technology (eDRAM is DRAM built on a high-performance logic process). This technology enables the CPU and graphics processor to access large amounts of memory with very high bandwidth and low power. The eDRAM is manufactured on an SoC variant of Intel's 22nm logic technology, featuring tri-gate transistors that deliver a combination of performance improvement and power reduction. Intel is using this technology in select Core processors to increase memory bandwidth at low latency and power.
This year, in a paper entitled "2nd Generation Embedded DRAM with 4X Lower Self Refresh Power in 22nm Tri-Gate CMOS Technology," Intel describes a much improved 1Gbit, 2GHz embedded DRAM, developed in 22nm tri-gate CMOS technology, with 4X lower self-refresh power than the prior generation. Intel says retention time has been improved 3X through process and design optimizations. Source-synchronous clocking is integrated in the design to reduce clock power without penalizing bandwidth. Charge pump power is reduced 4X by employing comparator-based regulation. Temperature-controlled refresh delivers minimum refresh power across all temperature conditions.
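Temperature-controlled refresh exploits the fact that DRAM cell retention falls roughly exponentially with temperature; a common rule of thumb is that retention halves for every ~10 °C rise, so a cooler die can refresh less often and spend proportionally less refresh power. The sketch below applies that rule of thumb; the baseline interval, reference temperature, and halving constant are illustrative assumptions rather than figures from Intel's paper.

```python
# Illustrative sketch of temperature-controlled refresh (not Intel's numbers).
# Assume retention time halves roughly every 10 degC of temperature rise, so a
# cooler die can safely stretch the interval between refresh passes and spend
# proportionally less refresh power.

T_REF_C = 85.0            # temperature at which the baseline interval is set
BASE_INTERVAL_MS = 32.0   # assumed safe refresh interval at T_REF_C

def refresh_interval_ms(temp_c, halving_per_c=10.0):
    """Safe refresh interval under the assumed retention/temperature law."""
    return BASE_INTERVAL_MS * 2.0 ** ((T_REF_C - temp_c) / halving_per_c)

for temp in (85, 65, 45):
    interval = refresh_interval_ms(temp)
    relative_power = BASE_INTERVAL_MS / interval   # refresh power ~ 1/interval
    print(f"{temp:3d} C: refresh every {interval:6.1f} ms, "
          f"~{relative_power:.2f}x the 85 C refresh power")
```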
Serializer/Deserializer (SerDes) circuits are critical for getting data on and off logic chips. In the paper "A 2GHz-to-7.5GHz Quadrature Clock Generator Using Digital Delay Locked Loops for Multi-Standard I/Os in 14nm CMOS," Intel demonstrates the first building blocks for high-speed SerDes in Intel's 14nm tri-gate technology. The 2.0-7.5GHz quadrature generator is designed to support next-generation SerDes IPs such as CEI-28G and 100Gb Ethernet. The voltage-regulated digital delay-locked loop (DLL) architecture is designed to leverage the area and power benefits of Intel's 14nm technology. The DLL is at least 4X smaller and 40% more energy efficient than the best previously published DLLs. Three levels of delay tuning, including a 1-bit delta-sigma modulator for fine delay control, provide nearly 4X tuning range along with extremely low jitter and quadrature error, according to the paper. The measured jitter at 7GHz is 176 fs RMS, nearly 2X lower than previously reported quadrature generators.
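On the fine-control step: a first-order 1-bit delta-sigma modulator dithers a selection between two adjacent coarse settings so that its long-run average equals a finer intermediate value, pushing the quantization error to high frequencies where the loop can filter it out. The sketch below is a generic first-order modulator showing that averaging behavior, not the circuit in Intel's paper.

```python
# Generic first-order 1-bit delta-sigma modulator (not Intel's circuit).
# A fractional control word in [0, 1) is converted into a stream of 0/1
# decisions whose running average converges to the input, so two adjacent
# coarse delay taps can realize a much finer effective delay.

def delta_sigma_1bit(fraction, n_cycles):
    """Yield a 0/1 sequence whose mean approaches `fraction`."""
    accumulator = 0.0
    bits = []
    for _ in range(n_cycles):
        accumulator += fraction
        if accumulator >= 1.0:
            accumulator -= 1.0
            bits.append(1)       # select the longer of two adjacent delay taps
        else:
            bits.append(0)       # select the shorter tap
    return bits

stream = delta_sigma_1bit(0.3, 1000)
print(f"average tap selection = {sum(stream) / len(stream):.3f}  (target 0.300)")
```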
Intel also participated in two rump sessions. In one, called "Who gives up on scaling first: device and process technology engineers, circuit designers, or company executives? Which scaling ends first - memory or logic?" Intel Senior Fellow Mark Bohr argued that process engineers, circuit designers and company executives all share the responsibility for continuing scaling. Each has its part to play to ensure that scaling delivers expected improvements in performance, power and cost per transistor. However, Intel's executives said they would tend to pull R&D funding when either process or design engineers can no longer demonstrate a viable scaling path leading to manufacturable and profitable products.
Finally, in a second rump session, entitled "Lessons and Challenges for Future Mixed-Signal, RF, and Memory Circuits," Intel Fellow and Vice President of the Technology and Manufacturing Group, Kevin Zhang, argued that the demand for larger and ever-faster static RAMs will continue to increase, and that innovation in technology and design will drive SRAM scaling well into the future.