A 2 Teraflops Workstation For $20,000?

By Roland Piquepaille

If William Dally, a professor of computer science at Stanford University, can convince a manufacturer to adopt his idea of a custom stream processor, the Merrimac, we could soon see a 2 petaflops system built for only $20 million. EE Times says that his paper, presented recently at SC2003, stirred a petascale controversy.

[In this paper, William Dally] said today's CPUs are inefficient because they spend too little time performing calculations and too much time waiting for memory. Merrimac attempts to shift microprocessor design by rewriting applications as so-called streams that expose the parallelism of multiple arithmetic units on the Merrimac CPU and provide mechanisms to handle more calculations without going to off-chip memories.

The Merrimac design contains a cluster of sixty-four 64-bit floating-point multiply-add units fed from a hierarchy of registers and supervised by separate on-chip controllers. Dally estimates a 90-nanometer chip measuring 10 x 11 mm could deliver 128 Gflops, yet would cost only $200 to make and would dissipate about 31 watts. A 96-port router chip of a similar size would connect up to 16 Merrimac nodes on a single board or 512 nodes in a cabinet.

The resulting system architecture could deliver a 2-Tflops workstation for $20,000 or a 2-Pflops supercomputer for $20 million, according to Dally's paper.
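The figures quoted above can be cross-checked with a little arithmetic. The sketch below is only a back-of-the-envelope check, not something taken from the paper. It assumes that each multiply-add counts as two floating-point operations (a common convention, but an assumption here), and every function name is made up for illustration.

```python
import math

# Figures quoted in the EE Times excerpt above.
FPUS_PER_CHIP = 64          # multiply-add units per Merrimac chip
GFLOPS_PER_CHIP = 128       # estimated peak per chip
NODES_PER_BOARD = 16
NODES_PER_CABINET = 512
WORKSTATION_PRICE = 20_000          # dollars, for about 2 Tflops
SUPERCOMPUTER_PRICE = 20_000_000    # dollars, for about 2 Pflops

# Assumption (not stated in the article): one multiply-add = 2 flops.
FLOPS_PER_MULTIPLY_ADD = 2


def implied_clock_ghz():
    """Clock rate implied by the per-chip peak, under the assumption above."""
    return GFLOPS_PER_CHIP / (FPUS_PER_CHIP * FLOPS_PER_MULTIPLY_ADD)


def board_tflops():
    """Peak of one fully populated 16-node board, in Tflops."""
    return NODES_PER_BOARD * GFLOPS_PER_CHIP / 1000


def nodes_for_pflops(pflops):
    """Minimum number of nodes needed to reach a given peak in Pflops."""
    return math.ceil(pflops * 1_000_000 / GFLOPS_PER_CHIP)


def cabinets_for(nodes):
    """Number of 512-node cabinets needed to hold the given node count."""
    return math.ceil(nodes / NODES_PER_CABINET)


if __name__ == "__main__":
    nodes = nodes_for_pflops(2)
    print("implied clock (GHz):", implied_clock_ghz())
    print("one board (Tflops):", board_tflops())
    print("nodes for 2 Pflops:", nodes, "in", cabinets_for(nodes), "cabinets")
    print("price per node, workstation ($):", WORKSTATION_PRICE / NODES_PER_BOARD)
    print("price per node, supercomputer ($):", SUPERCOMPUTER_PRICE / nodes)
```

Under these assumptions, a single 16-node board already gives the 2-Tflops workstation, and the quoted system prices work out to a bit over $1,200 per node. That is several times the $200 chip cost, which presumably leaves room for memory, routers and packaging.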

For more information about Merrimac, you can visit the Stanford Streaming Supercomputer Project site. And for a very technical explanation of what this processor could do, you can read the paper written by Dally and his colleagues, "Merrimac: Supercomputing with Streams" (PDF format, 524 KB, 8 pages).

The paper includes several figures, among them this diagram of a synthetic stream application (Credit: Dally and colleagues).

A synthetic stream application

Here is how this stream program could be mapped to the Merrimac processor.

The stream application mapped to the Merrimac processor
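To give a rough feel for why the stream model reduces trips to off-chip memory, here is a toy sketch. It is not Merrimac's actual programming model or toolchain. The kernels, the function names and the "memory words" counter are all hypothetical, chosen only to contrast two ways of running the same chain of kernels.

```python
# Three made-up kernels. Each maps a strip of records to a new strip.
def kernel_scale(strip):
    return [2.0 * x for x in strip]


def kernel_offset(strip):
    return [x + 1.0 for x in strip]


def kernel_square(strip):
    return [x * x for x in strip]


def run_as_streams(data, kernels, strip_size):
    """Process the data strip by strip, chaining all kernels on each strip.

    Only the initial load and the final store are counted as memory
    traffic; intermediate strips are assumed to stay in on-chip storage.
    """
    result, memory_words = [], 0
    for start in range(0, len(data), strip_size):
        strip = data[start:start + strip_size]
        memory_words += len(strip)          # load the strip once
        for kernel in kernels:
            strip = kernel(strip)           # intermediates stay "on chip"
        result.extend(strip)
        memory_words += len(strip)          # store the final result once
    return result, memory_words


def run_kernel_at_a_time(data, kernels):
    """Run each kernel over the whole data set, writing results back
    to memory after every kernel."""
    current, memory_words = list(data), 0
    for kernel in kernels:
        memory_words += len(current)        # load the input of this kernel
        current = kernel(current)
        memory_words += len(current)        # store the output of this kernel
    return current, memory_words


if __name__ == "__main__":
    data = [float(i) for i in range(8)]
    kernels = [kernel_scale, kernel_offset, kernel_square]
    print(run_as_streams(data, kernels, strip_size=4))
    print(run_kernel_at_a_time(data, kernels))
```

Both functions compute the same values, but in this simplified accounting the strip-by-strip version moves each record to and from memory only once, however many kernels are chained. That is the general idea behind keeping intermediate streams on the chip.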

The Stanford team has already written such applications to simulate the processor. But so far, it seems that no manufacturer is interested in building the chip. Too bad.

Sources: Rick Merritt, EE Times, November 24, 2003; and various websites