Intel, AMD, Sun and IBM Talk About 16- and 48-Core Chips
Intel, Sun, IBM and AMD will unveil details of present and future processor designs at this year's International Solid-State Circuits Conference (ISSCC 2010).
The International Solid-State Circuits Conference is the foremost forum for presentation of advances in solid-state circuits and systems-on-a-chip. The Conference offers an opportunity for engineers working at the cutting edge of IC design and use to maintain technical currency, and to network with leading experts.
The upcoming conference will be held February 7-11, 2010, in San Francisco, CA.
There, Intel will talk about its Westmere 32nm processors. Westmere is a family of next-generation IA processors for the mobile, desktop and server segments, built on a second-generation high-k metal-gate 32nm process. It offers more cores, larger caches and higher frequencies within the same power envelope as the previous generation, along with further improvements in power efficiency, a rich set of new features, and support for low-voltage DDR3.
Intel has already launched the first round of its Westmere products: the Clarkdale CPUs for desktops and the mobile-oriented Arrandale chips. The company will launch its higher-end, six-core Gulftown processor later this year.
However, the big public reveal comes in a Monday session, where the company will talk about "a 48-Core IA-32 Message-Passing Processor with DVFS in 45nm CMOS." The paper will describe a 567mm2 processor in 45nm CMOS that integrates 48 IA-32 cores and 4 DDR3 channels in a 6×4 2D-mesh network. Cores communicate through message passing using 384KB of on-die shared memory. Fine-grain power management uses 8 voltage islands and 28 frequency islands to allow independent DVFS of the cores and the mesh. Depending on the performance level, the processor dissipates between 25W and 125W, according to Intel.
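To make the 6×4 mesh figure concrete, the short sketch below counts mesh nodes and message hop distances. It assumes that the 48 cores are spread evenly over the 24 mesh nodes and that packets use dimension-ordered (X-then-Y) routing, a common choice for such meshes; neither detail is stated in the summary above, so treat the results as illustrative.

```python
from itertools import product

# Mesh dimensions taken from the paper summary: a 6x4 2D mesh.
COLS, ROWS = 6, 4
CORES = 48

# Assumption: cores are divided evenly among the mesh nodes (routers).
nodes = [(x, y) for x in range(COLS) for y in range(ROWS)]
cores_per_node = CORES // len(nodes)

def xy_hops(src, dst):
    """Hop count under X-then-Y routing, i.e. the Manhattan distance."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])

# All ordered pairs of distinct nodes.
pairs = [(a, b) for a, b in product(nodes, repeat=2) if a != b]
max_hops = max(xy_hops(a, b) for a, b in pairs)
avg_hops = sum(xy_hops(a, b) for a, b in pairs) / len(pairs)

print(f"{len(nodes)} nodes, {cores_per_node} cores per node")
print(f"worst case {max_hops} hops, average {avg_hops:.2f} hops")
```

Under these assumptions a message crosses at most 8 links (5 in X plus 3 in Y) and about 3.3 links on average between distinct nodes.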
AMD will describe an upcoming 32nm mobile processor. Judging from the paper, "An x86-64 Core Implemented in 32nm SOI CMOS," this processor could be either Bobcat or the company's first "Fusion" processor. According to AMD, the 32nm implementation of its x86-64 core occupies 9.69mm2, contains more than 35 million transistors (excluding the L2 cache) and operates at frequencies above 3GHz. The core incorporates numerous design and power improvements that enable an operating range of 2.5W to 25W, plus a zero-power gated state, making it well suited to a broad range of mobile and desktop products.
Sun will talk about a 40nm 16-core, 128-thread CMT SPARC SoC processor. The processor (code-named Niagara) supports up to 512 threads in a 4-way glueless system to maximize throughput. A 6MB L2 cache with 461GB/s of bandwidth and 308 pins of SerDes I/O delivering 2.4Tb/s provide the required bandwidth. Six clock domains and four voltage domains, together with power-management and circuit techniques, optimize the trade-offs among performance, power, variability and yield across the 377mm2 die, according to Sun.
IBM will detail the implementation of POWER7, a highly parallel and scalable multi-core server processor and the next generation of IBM's POWER family. The 8-core, 32-thread chip is implemented in 45nm SOI CMOS with 11 metal layers. The 32kB L1 caches have one read port with banked writes for the I-cache and two read ports with banked writes for the D-cache. The on-chip cache hierarchy consists of a fast, private 256kB SRAM L2 and a 32MB shared L3 implemented in embedded DRAM.
IBM will also talk about a 2.3GHz, 16-core, 64-thread processor built in 45nm SOI. According to IBM, this simultaneous multi-threaded processor uses architecture and implementation techniques to achieve high throughput at low power, including static VDD scaling, multi-voltage design, clock gating, multiple-VT devices, dynamic thermal control, eDRAM and low-voltage circuit design. Together these reduce power by more than 50% in the 428mm2 chip. Worst-case power is 65W at 2.0GHz and 0.85V, according to the company.
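The thread counts quoted for the three server chips imply different degrees of multithreading per core. The sketch below simply divides the quoted figures; the chip labels are informal names chosen for this example.

```python
# Core and thread counts as quoted for the three server chips above.
chips = {
    "Sun 16-core SPARC": (16, 128),
    "IBM POWER7": (8, 32),
    "IBM 16-core 45nm SOI": (16, 64),
}

# Hardware threads per core = total hardware threads / core count.
threads_per_core = {name: threads // cores for name, (cores, threads) in chips.items()}

# Sun's 512-thread figure: four sockets joined in a glueless 4-way system.
SOCKETS = 4
sun_system_threads = SOCKETS * chips["Sun 16-core SPARC"][1]

for name, tpc in threads_per_core.items():
    print(f"{name}: {tpc} threads per core")
print(f"4-way Sun system: {sun_system_threads} threads")
```

The Sun design runs 8 threads per core, while both IBM chips run 4, which is consistent with Sun's emphasis on aggregate throughput.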
Next-generation 1Gbit non-volatile memory
The event will also cover the latest developments in phase-change memory (PCM), resistive RAM (ReRAM) and magnetic RAM (MRAM), technologies that are expected to be proposed as future non-volatile memories.
Numonyx will detail a 45nm 1Gb 1.8V single-level cell (SLC) phase-change memory (PCM). The PCM has been designed with an 85ns random-access time and 9MB/s program throughput, and supports read-while-write and overwrite program commands. The company will present sensing techniques that enable wide-temperature-range operation and reject wordline noise. The 37.5mm2 die is built in a double-gate-oxide, triple-Cu-metal process.
Unity Semiconductor will describe an implementation of resistive RAM (ReRAM). The 64Mb NAND-compatible non-volatile memory test chip is based on conductive metal-oxide technology and was developed in a 0.13μm process. The memory cell, which does not require a selection device, occupies 0.17μm2 and is built at the intersection of two metal lines above the CMOS circuitry. The chip uses 4 layers of cross-point arrays. The company will also describe its decoding and sensing techniques.
Toshiba will talk about MRAM. The Japanese company will announce the development of a 64Mb spin-transfer-torque MRAM in 65nm CMOS. The 47mm2 die uses a 0.3584μm2 cell with a perpendicular-TMR device. A clamped-reference scheme gives the reference cell immunity to read disturb, and an adequate-reference scheme suppresses read-margin degradation caused by resistance variation among the reference cells, Toshiba says.
Toshiba will also describe "a scalable shield-bitline-overdrive technique for 1.3V chain FeRAM." Ferroelectric-capacitor overdrive with shield-bitline drive for 1.3V chain FeRAM has been verified on a 0.13μm 576Kb test chip with a 0.719μm2 cell, Toshiba says. The technique applies a 0.24V bias to the ferroelectric capacitors without increasing stress or bitline capacitance. The measured tail-to-tail cell signal improves by 100mV, doubling it in 1.3V array operation.
Toshiba has also developed a technology for low-voltage operation of system LSIs, opening the way to lower power consumption in digital products. It ensures reliable operation of static random access memory (SRAM) at low voltage through an improved circuit design that optimizes voltage control of the bit line and word line. The technology overcomes the high failure rate that has been the main obstacle to practical low-voltage SRAM, reducing the memory cell failure rate by four orders of magnitude at 0.7V. Moreover, the circuit design can be incorporated into the memory compiler, the software that automatically configures SRAM, which shortens design lead times in LSI development.
System LSIs are the core components of digital products, and their operating voltage has a major impact on power consumption. However, voltage scaling of cutting-edge system LSIs has been a major challenge, because embedded SRAM becomes unstable at low voltage. SRAM memory cell transistors are smaller than those in other logic circuits, which makes SRAM operation susceptible to transistor variability at low voltage.
Toshiba claims it has overcome this problem with a new method that combines read-assist and write-assist techniques to ensure reliable low-voltage SRAM operation.
Read-assist and write-assist techniques are recognized as a means of stabilizing SRAM by optimizing bit-line and word-line levels during read and write operations. However, the conventional write-assist technique requires the circuit parameters of the SRAM's negative voltage generator to be adjusted to match the SRAM capacity. This has hurt design efficiency and hindered practical application. Toshiba's solution uses a newly developed negative voltage generator with a bit-line-capacitance replica, which adaptively optimizes the negative voltage level to the SRAM capacity. This eliminates the need to adjust circuit parameters for each SRAM configuration and allows the SRAM design process to be automated.
Toshiba has confirmed the improvement by measurement: a test chip fabricated in 32nm high-k/metal-gate process technology lowered the voltage required for stable operation from the 1V typical of conventional SRAM to 0.7V. Equally significant, the failure rate fell by four orders of magnitude, a 10,000-fold improvement.
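Why a drop from 1V to 0.7V matters can be estimated with the standard first-order CMOS dynamic power model, P = αCV²f. The sketch below assumes activity, capacitance and frequency stay fixed so that only the supply voltage changes; in practice frequency usually drops too and leakage is ignored here, so this is a rough illustration rather than a figure from Toshiba.

```python
# First-order CMOS dynamic power model: P = alpha * C * V**2 * f.
# Assumption: alpha, C and f are unchanged, so only the V**2 term matters.
def dynamic_power_ratio(v_new: float, v_old: float) -> float:
    """Ratio of dynamic power at v_new to dynamic power at v_old."""
    return (v_new / v_old) ** 2

ratio = dynamic_power_ratio(0.7, 1.0)
print(f"dynamic power at 0.7 V is {ratio:.0%} of that at 1.0 V")

# "Four orders of magnitude" lower failure rate means a factor of 10**4.
improvement = 10 ** 4
print(f"failure-rate improvement: {improvement:,}x")
```

Under these assumptions, SRAM switching power at 0.7V is roughly half of what it is at 1V.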
Other proposed approaches to SRAM stability use a memory cell with more than the usual six transistors. Toshiba's approach is more efficient because it requires no additional transistors and so avoids an increase in SRAM cell area.
Cutting-edge system LSIs are designed for a supply voltage of around 1V, but further reductions in product power consumption require much lower voltages. Toshiba plans to continue developing practical low-voltage circuit techniques and advanced CMOS technology, with the goal of high-performance system LSIs that consume less power.
Samsung Electronics will talk about a 1.2V 1Gb mobile SDRAM with 4 channels of 128 DQ pins each. It consumes 330.6mW of read operating power in 4-channel operation while delivering 12.8GB/s of data bandwidth. Samsung has also developed test correlation techniques to verify functions through microbumps, and a block-based dual data-retention scheme reduces self-refresh current.
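The Samsung figures allow a quick back-of-the-envelope check of the per-pin data rate and the read energy per bit. The sketch assumes decimal units (1GB/s = 10⁹ bytes/s) and attributes the full quoted read power to the data transferred, so the energy figure is only a rough chip-level estimate.

```python
CHANNELS = 4
DQ_PER_CHANNEL = 128
BANDWIDTH_BYTES_PER_S = 12.8e9   # 12.8 GB/s, assuming decimal gigabytes
READ_POWER_W = 330.6e-3          # 330.6 mW during 4-channel read operation

total_dq = CHANNELS * DQ_PER_CHANNEL
bits_per_s = BANDWIDTH_BYTES_PER_S * 8

# Per-pin rate: aggregate bit rate spread across every DQ pin.
per_pin_mbps = bits_per_s / total_dq / 1e6

# Rough energy per bit: total read power divided by aggregate bit rate.
energy_pj_per_bit = READ_POWER_W / bits_per_s * 1e12

print(f"{total_dq} DQ pins, {per_pin_mbps:.0f} Mb/s per pin")
print(f"about {energy_pj_per_bit:.2f} pJ per bit read")
```

The result, about 200Mb/s per pin and roughly 3.2pJ per bit, shows the design trade: a very wide, slow interface instead of a few fast pins.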
Micron Technology and Intel will demonstrate a 3.3V 32Gb NAND flash memory with 3 bits per cell. The 34nm device achieves a programming throughput of 6MB/s on blocks configured in 3b/cell mode and can dynamically switch to 2b/cell mode for up to 13MB/s. A new quad-plane architecture and an optimized programming algorithm were adopted to meet the design targets, Intel says.
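Switching a block between 3b/cell and 2b/cell modes trades capacity for write speed. The sketch below uses only the quoted throughput figures; it assumes that a block in 2b/cell mode keeps the same number of cells and therefore stores two-thirds of its 3b/cell data.

```python
# Quoted program throughput for each mode: bits per cell -> MB/s.
MODES = {3: 6.0, 2: 13.0}

def relative_capacity(bits_per_cell: int, native_bits: int = 3) -> float:
    """Fraction of the native 3b/cell capacity a block keeps in a given mode."""
    return bits_per_cell / native_bits

for bits, mb_per_s in MODES.items():
    print(f"{bits}b/cell: {relative_capacity(bits):.0%} capacity, {mb_per_s} MB/s")

# Write-speed gain from dropping to 2b/cell.
speedup = MODES[2] / MODES[3]
print(f"2b/cell mode programs about {speedup:.1f}x faster")
```

Giving up a third of a block's capacity roughly doubles its programming throughput, which suits data that must be written quickly and can be consolidated into 3b/cell blocks later.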
Another interesting session covers Toshiba's 40nm 14-core mobile application processor, which includes a 222mW Full-HD H.264 video decoder and a video/audio multiprocessor. The processor has 25 power domains, and its power switch circuits achieve power-up switching in under 1μs while minimizing rush current. Its power-efficient 512-bit stacked-DRAM interface delivers 10.6GB/s of bandwidth, Toshiba claims.
For more information on the ISSCC event, visit http://www.isscc.org