Woodcrest will outperform all other CPUs on the market
In Linpack
By Nebojsa Novakovic: Friday 19 May 2006, 21:53
A FEW DAYS AGO, a friend told me that the famous Linpack FLOPS benchmark will run far better than expected on Conroe and Woodcrest, thanks to recent compiler and tuning advances. Linpack solves a dense system of linear equations and spends most of its time in matrix multiplication, and despite its near uselessness as a guide to real apps, it is still used to rank the world's TOP500 supercomputers.
Basically, a 3GHz Woodcrest chip gives you 3 billion cycles x 4 FP ops per cycle x 2 cores per second, or a theoretical peak of 24 GFLOPS (the Rpeak figure in Linpack) per socket in 64-bit (double) precision.
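The per-socket peak is just a product of three numbers. Here is a minimal Python sketch of that arithmetic, assuming the article's figure of 4 double-precision FP ops per core per cycle for Woodcrest; the function name `rpeak_gflops` is ours, purely for illustration:

```python
def rpeak_gflops(clock_ghz: float, flops_per_cycle: int, cores: int) -> float:
    """Theoretical peak (Rpeak) of one socket, in GFLOPS.

    Rpeak = clock rate (GHz) x FP ops per core per cycle x cores per socket.
    The flops_per_cycle value is an input assumption, not something measured.
    """
    return clock_ghz * flops_per_cycle * cores


if __name__ == "__main__":
    # Article's Woodcrest figures: 3GHz, 4 FP ops/cycle (assumed), dual-core.
    per_socket = rpeak_gflops(3.0, 4, 2)
    print(f"Woodcrest Rpeak per socket: {per_socket:.1f} GFLOPS")
    print(f"Two sockets: {2 * per_socket:.1f} GFLOPS")
```

The same function covers every chip in this piece; only the three inputs change.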
So, two chips on a typical workstation or server board give you 48 GFLOPS Rpeak, almost as much as a quad-chip Montecito (51.2 GFLOPS) and about 36% more than a dual-chip (four cores total) 2.2GHz POWER5+ (35.2 GFLOPS).
With the rumoured improvements, Woodcrest is hitting 80% efficiency in Linpack, i.e. its measured Rmax rate of execution will be four-fifths of theoretical peak, or over 38 GFLOPS in this case.
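Rmax is simply Rpeak scaled by the Linpack efficiency. A quick sketch with the rumoured 80% figure plugged in; treat that number as an estimate rather than a measured result, and `rmax_gflops` as a hypothetical helper name:

```python
def rmax_gflops(rpeak: float, efficiency: float) -> float:
    """Sustained Linpack rate (Rmax) as a fraction of theoretical peak (Rpeak)."""
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    return rpeak * efficiency


# Two-socket Woodcrest: 2 sockets x 3GHz x 4 FP ops x 2 cores = 48 GFLOPS Rpeak.
woodcrest_2s_rpeak = 2 * 3.0 * 4 * 2
# 0.80 is the rumoured efficiency quoted in the article, not a benchmark run.
woodcrest_2s_rmax = rmax_gflops(woodcrest_2s_rpeak, 0.80)
print(f"Rmax: {woodcrest_2s_rmax:.1f} GFLOPS")  # prints: Rmax: 38.4 GFLOPS
```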
Now, the 2.8GHz dual-core Opteron grade, expected to be the one facing Woodcrest, has some 88% efficiency per socket, but its Rpeak per clock is half. So, it is 2.8 billion x 2 FP ops x 2 cores per second, or 11.2 GFLOPS Rpeak per socket. For a two-socket Opteron, Rpeak would then be 22.4 GFLOPS and Rmax some 19.7 GFLOPS, about half that of a two-socket Woodcrest setup! Wow, AMD should have had K8L now, not in a year's time.
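To check the "half of Woodcrest" claim, here is the two-socket arithmetic for both chips side by side, using the efficiency estimates quoted above (88% for Opteron, 80% for Woodcrest). `two_socket_rmax` is a hypothetical helper, and the efficiencies are assumptions, not benchmark runs:

```python
def two_socket_rmax(clock_ghz: float, flops_per_cycle: int,
                    cores: int, efficiency: float) -> float:
    """Rmax in GFLOPS for a board with two identical sockets."""
    rpeak_per_socket = clock_ghz * flops_per_cycle * cores
    return 2 * rpeak_per_socket * efficiency


# Assumed inputs from the article: clock (GHz), FP ops/cycle, cores, efficiency.
opteron = two_socket_rmax(2.8, 2, 2, 0.88)    # roughly 19.7 GFLOPS
woodcrest = two_socket_rmax(3.0, 4, 2, 0.80)  # roughly 38.4 GFLOPS

print(f"Opteron 2S Rmax:   {opteron:.1f} GFLOPS")
print(f"Woodcrest 2S Rmax: {woodcrest:.1f} GFLOPS")
print(f"Opteron / Woodcrest: {opteron / woodcrest:.2f}")
```

Opteron's higher efficiency does not make up for doing half the FP work per clock, which is why the ratio lands just above one half.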
We all know that Itanic is darn great, at least in Linpack, with over 90% efficiency in some cases, but its clock is only a little over half that of Woodcrest, with the same peak FP rate per cycle anyway. So, for a dual-core 1.6GHz Montecito, we have 1.6 billion x 4 FP ops x 2 cores per second, or 12.8 GFLOPS Rpeak per socket. Times 0.9 for Rmax, that gives 11.5 GFLOPS Rmax per socket, or roughly 23 GFLOPS Rmax per two-socket board.
And the 2.2GHz POWER5+? Let's say 77% efficiency in this case. It is a dual-core chip too, so 2.2 billion x 4 FP ops x 2 cores per second, or 17.6 GFLOPS Rpeak and about 13.6 GFLOPS Rmax per socket, or roughly 27 GFLOPS per two-socket board.
In summary? Woodcrest will, socket for socket, outperform all other CPUs on the market, be they x86, RISC or EPIC, when it comes to the TOP500 battle and its Linpack benchmark.
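Putting the four chips from this piece into one table makes the socket-for-socket ranking explicit. This is a sketch only: the clocks, FP ops per cycle and efficiencies are the estimates used in this article (e.g. 90% for Montecito, 77% for POWER5+), and the `Chip` class is a hypothetical convenience:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chip:
    name: str
    clock_ghz: float
    flops_per_cycle: int  # FP ops per core per cycle (article's figure)
    cores: int
    efficiency: float     # assumed Linpack Rmax / Rpeak, not measured here

    @property
    def rpeak(self) -> float:
        """Theoretical peak per socket, GFLOPS."""
        return self.clock_ghz * self.flops_per_cycle * self.cores

    @property
    def rmax(self) -> float:
        """Estimated sustained Linpack rate per socket, GFLOPS."""
        return self.rpeak * self.efficiency


CHIPS = [
    Chip("Woodcrest 3.0GHz", 3.0, 4, 2, 0.80),
    Chip("Opteron 2.8GHz", 2.8, 2, 2, 0.88),
    Chip("Montecito 1.6GHz", 1.6, 4, 2, 0.90),
    Chip("POWER5+ 2.2GHz", 2.2, 4, 2, 0.77),
]

# Rank by estimated Rmax per socket, best first.
ranking = sorted(CHIPS, key=lambda c: c.rmax, reverse=True)
for chip in ranking:
    print(f"{chip.name:18} Rpeak {chip.rpeak:5.1f}  "
          f"Rmax {chip.rmax:5.1f} GFLOPS per socket")
```

On these assumptions the per-socket order comes out Woodcrest (19.2 GFLOPS Rmax), POWER5+ (about 13.6), Montecito (about 11.5) and Opteron (about 9.9).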
We also all know that real application performance is quite different from the Linpack benchmark code, and that the good ship Itanic, as well as POWER5+, has more registers to play with and a more efficient instruction set architecture than x86-64, so there will be apps where either of these two still outperforms Woodcrest or Opteron. And yes, the new Core 2 family doesn't have a quad-socket MP option yet for larger compute nodes.
But I can state with almost dead certainty that the number of such apps will be greatly diminished, at least in the workstation and small-server-node class of supercomputing apps. You don't really need to re-optimise your apps to enjoy most of the new power.
Importantly, most tender bids for large supercomputer clusters feature the Linpack number and TOP500 position very prominently. It's a question of bragging rights. µ