[Tim Shattuck, 2006] [1]
Performance / Watt: The New Server Focus
Improving Performance / Watt for Modern Processors
Tim Shattuck
April 19, 2006
From the paper by James Laudon, Computer Architecture News, Volume 33, Number 4, September 2005
At Issue:
Power Hungry Servers
Increasing Costs to Power Hardware
Wastes Limited Resources
Three Trends
High ratio of power consumption to performance gains
Hardware costs account for a smaller percentage of Total Cost of Ownership (TCO)
Energy costs are rising
These trends are expected to make power the dominant factor in calculating TCO within five years.
Niagara Optimizations
Simple
Clock gating
Pipelines
More complex
Hardware support for multithreading
Simple Optimizations
Clock gating
Don't power idle parts of the chip
Short to medium-length pipelines
Fewer registers, transistors between stages
Less power wasted on (failed) speculation
Allow for more cores / chip
More Optimizations
Hardware Multithreading
Keep on-chip resources busy
Deals with high cache miss rates
Boosts performance / Watt
Increases throughput of threads
Increases power consumption only slightly
Increases die size by 4-7% per thread
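The way multithreading keeps on-chip resources busy despite cache misses can be illustrated with a simple textbook-style model. This is a sketch, not taken from the paper: it assumes each thread computes for a fixed number of cycles and then stalls for a fixed miss latency, and that switching threads is free. The function name and the cycle counts (20 and 80) are hypothetical.

```python
def core_utilization(threads: int, compute_cycles: int, miss_latency: int) -> float:
    """Fraction of cycles a core does useful work under an idealized model.

    Assumption (not from the source): every thread alternates between
    `compute_cycles` of work and a stall of `miss_latency` cycles, and the
    core can switch to another ready thread at no cost.
    """
    if threads < 1:
        raise ValueError("need at least one thread")
    period = compute_cycles + miss_latency   # one compute-then-miss cycle of a thread
    busy = threads * compute_cycles          # work available during that period
    return min(1.0, busy / period)           # the core cannot be more than 100% busy


# Hypothetical workload: 20 cycles of work between misses that cost 80 cycles.
for n in (1, 2, 4, 8):
    print(f"{n} thread(s): utilization {core_utilization(n, 20, 80):.2f}")
```

Under these assumptions, utilization grows linearly with the thread count until the core saturates, which is the intuition behind trading a small amount of die area per thread for throughput.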
Cores / Die
Fewer complex cores: favor individual thread completion
More simple cores: favor aggregate thread throughput
Simpler cores tend to have better performance / Watt ratios
Sufficient Cache and Memory Bandwidth
Necessary to keep threads busy
Sun's Niagara:
Cores connected to L2 cache by a crossbar switch
Cache bandwidth of 76.8 GB/s
Four memory controllers directly connected to DDR2 SDRAM memory unit (200 MHz)
Raw memory bandwidth of 25.6 GB/s
Controllers can reorder accesses to favor reads over writes.
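To put the bandwidth figures in perspective, the following sketch divides them evenly across controllers, cores, and threads. The even split is an assumption made for illustration only; real traffic is not uniform. The core and thread counts are the ones quoted for Niagara in the case study slides, and the variable names are hypothetical.

```python
# Figures quoted in the slides for Sun's Niagara.
CACHE_BW_GBS = 76.8        # L2 cache bandwidth over the crossbar
MEMORY_BW_GBS = 25.6       # raw memory bandwidth
MEMORY_CONTROLLERS = 4
CORES = 8
THREADS_PER_CORE = 4

# Assumption: bandwidth is shared evenly, which real workloads rarely do.
threads = CORES * THREADS_PER_CORE
memory_bw_per_controller = MEMORY_BW_GBS / MEMORY_CONTROLLERS
memory_bw_per_thread = MEMORY_BW_GBS / threads
cache_bw_per_core = CACHE_BW_GBS / CORES

print(f"Memory bandwidth per controller: {memory_bw_per_controller:.1f} GB/s")
print(f"Memory bandwidth per thread:     {memory_bw_per_thread:.1f} GB/s")
print(f"L2 bandwidth per core:           {cache_bw_per_core:.1f} GB/s")
```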
Testing
SPEC JBB 2000
Java server side business logic
TPC-C, TPC-W
Transactional processing tests
XML Test
Sun's multithreaded processing test.
Result: Scalar processors with moderate pipelines and thread support outperformed superscalar processors.
Case Studies
Sun's Niagara
8 cores, 4 threads each
Scalar cores
Tries to maximize performance / Watt
Intel's Pentium Extreme Edition
2 cores, 2 threads each
Superscalar cores
Tries to maximize performance
Case Studies (II) - Results
Feature             Niagara        Pentium Extreme Edition
Clock Speed         1.2 GHz        3.2 GHz
Pipeline Depth      6 stages       31 stages
Number of Cores     8              2
Number of Threads   32             4
L2 Bandwidth        76.8 GB/s      ~180 GB/s
Memory Bandwidth    25.6 GB/s      6.4 GB/s
Transistor Count    279 Million    230 Million
Power               72 W           130 W
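The figures in the table above can be turned into rough per-Watt ratios. The sketch below is only arithmetic on the quoted numbers: it is not a performance/Watt measurement, because the slides give no benchmark scores. The `Chip` class and its property names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chip:
    """Hypothetical container for the figures quoted in the comparison table."""
    name: str
    threads: int
    memory_bw_gbs: float
    power_w: float

    @property
    def watts_per_thread(self) -> float:
        # Power budget available to each hardware thread.
        return self.power_w / self.threads

    @property
    def memory_bw_per_watt(self) -> float:
        # Memory bandwidth (GB/s) delivered per Watt of chip power.
        return self.memory_bw_gbs / self.power_w


niagara = Chip("Niagara", threads=32, memory_bw_gbs=25.6, power_w=72.0)
pentium_ee = Chip("Pentium Extreme Edition", threads=4, memory_bw_gbs=6.4, power_w=130.0)

for chip in (niagara, pentium_ee):
    print(f"{chip.name}: {chip.watts_per_thread:.2f} W per thread, "
          f"{chip.memory_bw_per_watt:.3f} GB/s per W")
```

On these numbers alone, Niagara spends roughly 2 W per hardware thread against more than 30 W for the Pentium Extreme Edition, which fits the throughput-oriented design goal described in the case study.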
Simple Core Limitations
Lower single thread performance
Amplified by lower instruction level parallelism
Keeping a large number of threads busy may become difficult
Hot locks – threaded applications may not scale very well
Future Directions
Use multithreading to enhance single threaded applications
Run-ahead execution – allows out-of-order execution with only a modest amount of hardware
Software control of power consumption
Dynamic adjustments to voltage and frequency to tune power consumption
Control of non-processing devices' (disk, memory systems) power consumption
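Why voltage and frequency adjustments are an effective power knob can be seen from the standard CMOS approximation that dynamic power scales with capacitance times voltage squared times frequency. The sketch below applies that approximation. It is not from the paper, it ignores leakage power, and the function name and the 0.8 scaling factors are hypothetical.

```python
def relative_dynamic_power(voltage_scale: float, frequency_scale: float) -> float:
    """Dynamic power relative to the baseline, using P ~ C * V^2 * f.

    Assumptions: switched capacitance is unchanged and leakage is ignored,
    so only the dynamic component of power is modeled.
    """
    if voltage_scale <= 0 or frequency_scale <= 0:
        raise ValueError("scale factors must be positive")
    return voltage_scale ** 2 * frequency_scale


# Hypothetical operating point: run at 80% voltage and 80% frequency.
power = relative_dynamic_power(0.8, 0.8)
print(f"Relative dynamic power: {power:.3f}")         # about half the baseline
print(f"Relative performance/Watt: {0.8 / power:.2f}")  # assumes performance tracks frequency
```

Under this model, a 20% reduction in both voltage and frequency roughly halves dynamic power while giving up only about 20% of peak clock rate.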
Conclusion
Invest in a Niagara today!