Tuesday, February 5, 2008

Teraflops Research Chip

Advancing Multi-Core Technology into the Tera-scale Era
The Teraflops Research Chip is the latest development from the Intel® Tera-scale Computing Research Program and Intel's first silicon tera-scale research prototype. It is the first programmable chip to deliver more than one trillion floating point operations per second (1 Teraflops) of performance while dissipating only 62W. This research project focuses on exploring new, energy-efficient designs for future multi-core chips, as well as approaches to interconnect and core-to-core communications. The research chip implements 80 simple cores, each containing two programmable floating point engines, the most ever to be integrated on a single chip. Floating point engines are used for accurate calculations, such as those in graphics and in financial and scientific modeling. In terms of circuit design, they are more complex than integer engines, which just process instructions.

First row left to right: 80 core teraflops research chip magnified, 80 core teraflops research wafer, Packaged Teraflops Research Chip.
Second row left to right: Packaged teraflops research chip on board, Sriram Vangal checking teraflops research chip results, Intel researchers on the Tera-scale Research Project.
Intel's Teraflops Research Chip implements several innovations for multi-core architectures:
  • Rapid design - The tiled-design approach allows designers to use smaller cores that can easily be repeated across the chip. A single-core chip of this size (100 million transistors) would take roughly twice as long and twice as many people to design.
  • Network on a chip - In addition to the compute element, each core contains a 5-port message-passing router. These routers are connected in a 2D mesh network that implements message passing. This mesh interconnect scheme could prove much more scalable than today's multi-core chip interconnects, allowing for better communications between the cores and delivering more processor performance.
  • Fine-grain power management - The individual compute engines and data routers in each core can be activated or put to sleep based on the performance required by the application a person is running. In addition, new circuit techniques give the chip world-class power efficiency—1 teraflops requires only 62W, comparable to desktop processors sold today.
  • Other innovations - These include sleep transistors, mesochronous clocking, and clock gating.
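
The network-on-a-chip item above can be made concrete with a small sketch. It models each router as having four network ports plus a local port to its own core, which is one plausible reading of "5-port". The 8 x 10 grid shape, the port names, and the dimension-ordered (XY) routing rule are illustrative assumptions; the text above states only that there are 80 cores in a 2D mesh.

```python
# Sketch of a 2D mesh of 5-port routers (4 network ports + 1 local core port).
# ASSUMPTIONS: the 8 x 10 grid shape, the port names, and the XY routing rule
# are hypothetical choices for illustration, not details from the source.
ROWS, COLS = 8, 10  # assumed arrangement of the 80 cores


def neighbors(row, col):
    """Return the routers reachable through the four network ports."""
    candidates = {
        "north": (row - 1, col),
        "south": (row + 1, col),
        "west": (row, col - 1),
        "east": (row, col + 1),
    }
    # Routers on the edge of the mesh have fewer connected ports.
    return {port: (r, c) for port, (r, c) in candidates.items()
            if 0 <= r < ROWS and 0 <= c < COLS}


def xy_route(src, dst):
    """List the routers a message visits: first change column, then row."""
    row, col = src
    path = [(row, col)]
    while col != dst[1]:
        col += 1 if dst[1] > col else -1
        path.append((row, col))
    while row != dst[0]:
        row += 1 if dst[0] > row else -1
        path.append((row, col))
    return path


if __name__ == "__main__":
    print(len(neighbors(0, 0)))                # connected network ports at a corner
    print(len(neighbors(4, 5)))                # connected network ports in the interior
    print(len(xy_route((0, 0), (7, 9))) - 1)   # hops from corner to opposite corner
```

Under these assumptions the longest route crosses 16 links, and each router only ever talks to its immediate neighbors, which is the property that lets a tiled design grow without a shared central interconnect.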
Below is a summary of results from the research chip. Note that while performance gains can still be made through frequency scaling, there is a significant cost in terms of energy efficiency. This underscores the motivation to scale by utilizing more and more cores, instead of just increasing the frequency.

Frequency   Voltage   Power   Aggregate Bandwidth   Performance
3.16 GHz    0.95 V    62 W    1.62 Terabits/s       1.01 Teraflops
5.1 GHz     1.2 V     175 W   2.61 Terabits/s       1.63 Teraflops
5.7 GHz     1.35 V    265 W   2.92 Terabits/s       1.81 Teraflops
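
The energy-efficiency cost of frequency scaling can be read directly off the table. The sketch below uses only the table's figures and the 80-core count; the function names are made up for illustration. It computes gigaflops per watt for each operating point, plus the number of floating point operations each core completes per clock cycle as implied by the numbers.

```python
# Rows copied from the results table: (GHz, volts, watts, teraflops).
RESULTS = [
    (3.16, 0.95, 62, 1.01),
    (5.1, 1.2, 175, 1.63),
    (5.7, 1.35, 265, 1.81),
]
CORES = 80  # core count stated in the article


def gflops_per_watt(teraflops, watts):
    """Energy efficiency: billions of floating point operations per second per watt."""
    return teraflops * 1000 / watts


def flops_per_core_cycle(teraflops, ghz, cores=CORES):
    """Floating point operations per core per clock cycle implied by the table."""
    return teraflops * 1e12 / (cores * ghz * 1e9)


if __name__ == "__main__":
    for ghz, volts, watts, tflops in RESULTS:
        print(f"{ghz:.2f} GHz at {volts:.2f} V: "
              f"{gflops_per_watt(tflops, watts):.1f} GFLOPS/W, "
              f"{flops_per_core_cycle(tflops, ghz):.2f} flops per core per cycle")
```

Efficiency falls from roughly 16.3 GFLOPS/W at 3.16 GHz to about 9.3 at 5.1 GHz and 6.8 at 5.7 GHz, while the work done per core per cycle stays near 4. In other words, the extra performance at higher frequency comes entirely from clock rate and is paid for with disproportionately more power.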


ASCI Red was the first computer to benchmark at a teraflops (1996). That system used nearly 10,000 Pentium® Pro processors running at 200MHz and consumed 500kW of power, plus an additional 500kW just to cool the room that housed it. Although not a general purpose computing device, this Teraflops Research Chip delivers 1.0 teraflops of performance and 1.6 terabits per second of aggregate core-to-core communication bandwidth, while dissipating only 62W.
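
As a rough back-of-the-envelope check on that comparison, the sketch below divides the ASCI Red power figures quoted above by the chip's 62W. The constant and function names are hypothetical, and the ratio is only indicative: the two systems are not equivalent, since the research chip is not a general purpose computing device.

```python
# Power figures quoted in the paragraph above, in watts.
ASCI_RED_COMPUTE_W = 500_000
ASCI_RED_COOLING_W = 500_000
RESEARCH_CHIP_W = 62


def power_ratio(include_cooling=True):
    """How many times more power ASCI Red drew at roughly one teraflops."""
    total = ASCI_RED_COMPUTE_W
    if include_cooling:
        total += ASCI_RED_COOLING_W
    return total / RESEARCH_CHIP_W


if __name__ == "__main__":
    print(f"Compute only: about {power_ratio(include_cooling=False):,.0f}x")
    print(f"With cooling: about {power_ratio():,.0f}x")
```

On these figures the difference is about 8,000 times for compute power alone and about 16,000 times once room cooling is included.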

Bringing tera-scale computing to PCs and servers requires a new way of building processors that can be thought of as a network of powerful computers on a chip. This Teraflops Research Chip is one important example of how the Intel® Tera-scale Computing Research Program aims to change the future through constant hardware and software innovation.
