27
ASIC Clouds: Specializing the Datacenter Ikuo Magaki, Moein Khazraee, Luis Vega Gutierrez, and Michael Bedford Taylor UC San Diego and Toshiba Presented By: Vandit Agarwal

ASIC Clouds: Specializing the Datacenter...Objective In a Nutshell • Two key metrics drive the development: • H/w cost per performance = $ per op/s • Energy per operation = W

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: ASIC Clouds: Specializing the Datacenter...Objective In a Nutshell • Two key metrics drive the development: • H/w cost per performance = $ per op/s • Energy per operation = W

ASIC Clouds: Specializing the Datacenter

Ikuo Magaki, Moein Khazraee, Luis Vega Gutierrez, and Michael Bedford Taylor

UC San Diego and Toshiba

Presented By: Vandit Agarwal

Page 2: ASIC Clouds: Specializing the Datacenter...Objective In a Nutshell • Two key metrics drive the development: • H/w cost per performance = $ per op/s • Energy per operation = W

Motivation• GPU and FPGA based clouds already successful

• Even ASIC Clouds have been successfully used

• Take this idea ahead to form ASIC based clouds for other applications

• Purpose built Datacenter

• Large arrays of ASIC accelerators

• Optimize Total Cost of Ownership (TCO)

• For increasingly common high-volume chronic computations

• Downside:

• High Non Recurring Engineering (NRE)

• Inflexibility

Page 3: ASIC Clouds: Specializing the Datacenter...Objective In a Nutshell • Two key metrics drive the development: • H/w cost per performance = $ per op/s • Energy per operation = W

Introduction• Two visible trends:

• Heavy work done on cloud; interactive moved to client

• Rise of dark silicon - specialization and near threshold computation

• Conjunction of these two designs proved viable

• On a single machine level, ASICs can offer at least an order improvement - explore and propose ASIC cloud

• Identify key issues by studying Bitcoin ASIC Cloud

Page 4: ASIC Clouds: Specializing the Datacenter...Objective In a Nutshell • Two key metrics drive the development: • H/w cost per performance = $ per op/s • Energy per operation = W

Objective In a Nutshell• Two key metrics drive the development:

• H/w cost per performance = $ per op/s

• Energy per operation = W per op/s

• Working with a joint knowledge/control over datacenter and h/w design

• Select single TCO-optimal point amongst many Pareto-optimal points

Page 5: ASIC Clouds: Specializing the Datacenter...Objective In a Nutshell • Two key metrics drive the development: • H/w cost per performance = $ per op/s • Energy per operation = W

• ASIC Design: achieves reduction in silicon area and energy consumption

• ASIC Server: organization of ASIC, heat sinks, selective components, custom voltages

• ASIC Datacenter: optimize rack and datacenter level thermal distribution, costs such as provisioning cost, availability, taxes etc.

**To meet the requirements at datacenter level, modifications trickle down in the hierarchy

Specialization HierarchyOff-PCB Interface On-PCB

Network

On-ASIC Interconnection

Network

Page 6: ASIC Clouds: Specializing the Datacenter...Objective In a Nutshell • Two key metrics drive the development: • H/w cost per performance = $ per op/s • Energy per operation = W

ASIC Cloud Architecture

• Trying to create a generic skeleton for ASIC Cloud

• Heart of ASIC cloud - Replicated Compute Accelerator (RCA) - multiplied recursively

• Customization: eg - if RCA requires DRAM, then ASIC contains shared DRAM controllers connected to ASIC-local DRAMs

Off-PCB Interface On-PCB

Network

On-ASIC Interconnection

Network

Page 7: ASIC Clouds: Specializing the Datacenter...Objective In a Nutshell • Two key metrics drive the development: • H/w cost per performance = $ per op/s • Energy per operation = W

ASIC Server Overview

• Focussed on 1U 19-inch Rackmount servers

• Forced air-cooling system

• Air intake from front, removal from back

• Air at 30oC

Page 8: ASIC Clouds: Specializing the Datacenter...Objective In a Nutshell • Two key metrics drive the development: • H/w cost per performance = $ per op/s • Energy per operation = W

ASIC Server Evaluation Flow• Given an implementation and architecture for target

RCA:

• VLSI tools used to map it to target process

• Analysis tools provide info on:

• Area

• Performance

• Power density

• Tune the following to find lowest TCO:

• No. of RCAs/Chip

• No. of chips/PCB

• Organization of chips on PCB

• Power delivery mechanism

• Cooling mechanism

• Choice of voltage

Page 9: ASIC Clouds: Specializing the Datacenter...Objective In a Nutshell • Two key metrics drive the development: • H/w cost per performance = $ per op/s • Energy per operation = W

Thermally-Aware ASIC Server Design• ASICs and DC/DC convertors - major sources of heat

• Heat Sinks:

• Heat spreader glued to the heat source (die) using Thermal Interface Material (TIM)

• Spreader has fins - air blowed through them

• Increasing spreader size improves cooling

• Increasing the die size improves cooling - overcomes TIM resistance

• Developed a model:

• Input: fan curve, ASIC count/row

• Output: Optimal heat sink parameters

Page 10: ASIC Clouds: Specializing the Datacenter...Objective In a Nutshell • Two key metrics drive the development: • H/w cost per performance = $ per op/s • Energy per operation = W

Arranging ASICs on PCB

Page 11: ASIC Clouds: Specializing the Datacenter...Objective In a Nutshell • Two key metrics drive the development: • H/w cost per performance = $ per op/s • Energy per operation = W

More Chips vs Fewer Chips• How large (in mm2) should each chip

be?

• Determines how many RCAs will be on each chip

• Many small ASICs easier to cool than few large ASICs

• Increasing silicon area -> heat dissipation capacity increases (TIM)

• Large total die area in a row is effective

• Increasing no. of chips increases the packaging cost but not by much

Page 12: ASIC Clouds: Specializing the Datacenter...Objective In a Nutshell • Two key metrics drive the development: • H/w cost per performance = $ per op/s • Energy per operation = W

Power Density and Server Cost

• Given same RCA, increasing Watts, increases performance

• Moving right (high power density), very little total silicon per lane (due to temperature constraints) and must be divided into many smaller chips

• Cooling and packaging cost

• Moving left (low power density), more silicon per lane and fewer chips

• Silicon area cost

Page 13: ASIC Clouds: Specializing the Datacenter...Objective In a Nutshell • Two key metrics drive the development: • H/w cost per performance = $ per op/s • Energy per operation = W

Bitcoin• Semi-anonymously and securely transfer money

• Blockchain - globally replicated public ledger of transactions

• A distributed consensus algorithm called Byzantine Fault Tolerance determines whose transactions are added to the blockchain

• Mining:

• Machines request work from a pool server

• Hash - brute force attempt at partial inversion of cryptographically hard hash function

• Hashrate - rate of hash - typically Giga hashes per second (GH/s)

• On success, other machines verify. Accept and append the block

Page 14: ASIC Clouds: Specializing the Datacenter...Objective In a Nutshell • Two key metrics drive the development: • H/w cost per performance = $ per op/s • Energy per operation = W

What Led to Bitcoin ASIC Cloud?

• People are incentivized to mine:

• More number of machine = more secure system

• Blockchain reward (25 BTC = ~USD 11k in 2016)

• 144 blocks daily x 25 BTC per block = ~USD 1.5M daily

• Rising TCO justifies the increased investment in NRE and other development cost

• Leads to more specialization

Page 15: ASIC Clouds: Specializing the Datacenter...Objective In a Nutshell • Two key metrics drive the development: • H/w cost per performance = $ per op/s • Energy per operation = W

Bitcoin ASIC TrendD

ifficu

lty

Page 16: ASIC Clouds: Specializing the Datacenter...Objective In a Nutshell • Two key metrics drive the development: • H/w cost per performance = $ per op/s • Energy per operation = W

Implementation

• 0.66 mm2 silicon in UMC 28-nm process.

• Power density: 2W/mm2

• Extremely high power density

Page 17: ASIC Clouds: Specializing the Datacenter...Objective In a Nutshell • Two key metrics drive the development: • H/w cost per performance = $ per op/s • Energy per operation = W

Results

• More silicon -> optimal voltages decreases -> server efficiency increases

• Initially, costs reduce (right to left) but then silicon costs start building up

Page 18: ASIC Clouds: Specializing the Datacenter...Objective In a Nutshell • Two key metrics drive the development: • H/w cost per performance = $ per op/s • Energy per operation = W

Voltage Stacking

• DC/DC power is significant

• Chips serially chained so that their supplies sum to 12V

• Lead to significant savings in TCO optimal case

Page 19: ASIC Clouds: Specializing the Datacenter...Objective In a Nutshell • Two key metrics drive the development: • H/w cost per performance = $ per op/s • Energy per operation = W

Litecoin ASIC Cloud

Page 20: ASIC Clouds: Specializing the Datacenter...Objective In a Nutshell • Two key metrics drive the development: • H/w cost per performance = $ per op/s • Energy per operation = W

Video Transcoding ASIC Cloud

**Pareto points are glitchy because of variations in constants and polynomial order for server components as they vary with voltages

Page 21: ASIC Clouds: Specializing the Datacenter...Objective In a Nutshell • Two key metrics drive the development: • H/w cost per performance = $ per op/s • Energy per operation = W

CNN ASIC Cloud

Page 22: ASIC Clouds: Specializing the Datacenter...Objective In a Nutshell • Two key metrics drive the development: • H/w cost per performance = $ per op/s • Energy per operation = W

When is ASIC Cloud Feasible

Page 23: ASIC Clouds: Specializing the Datacenter...Objective In a Nutshell • Two key metrics drive the development: • H/w cost per performance = $ per op/s • Energy per operation = W

Discussion• This is one of the earlier attempts to create a general

framework/skeleton for an ASIC cloud. How feasible do you think this technology is and how widely and how soon can we potentially adopt it for a large variety of applications?

• The authors recommend that open sourcing various tools by the cloud providers and silicon foundries would potentially lead to lower TCO. Is this a good solution? Why or why not?

• What do you think is more optimal? Investing heavily in (high NRE) in more advanced nodes (eg 16nm) or using/modifying older nodes (eg 65nm) in an ASIC?

Page 24: ASIC Clouds: Specializing the Datacenter...Objective In a Nutshell • Two key metrics drive the development: • H/w cost per performance = $ per op/s • Energy per operation = W

Bitcoin ASIC Cloud Design• Repeatedly execute a Bitcoin hash operation

• Input: 512 bit block

• Mutate the block and perform SHA256 on it

• Fed into another round of SHA256

• Leading zero count performed and matched with the target

• 64 rounds in each SHA

Page 25: ASIC Clouds: Specializing the Datacenter...Objective In a Nutshell • Two key metrics drive the development: • H/w cost per performance = $ per op/s • Energy per operation = W
Page 26: ASIC Clouds: Specializing the Datacenter...Objective In a Nutshell • Two key metrics drive the development: • H/w cost per performance = $ per op/s • Energy per operation = W
Page 27: ASIC Clouds: Specializing the Datacenter...Objective In a Nutshell • Two key metrics drive the development: • H/w cost per performance = $ per op/s • Energy per operation = W