Author
marvin-stephens
View
217
Download
2
Tags:
Embed Size (px)
Michihiro Koibuchi*, Hiroki Matsutani**Hideharu Amano**. Timothy Mark Pinkston***
*National Institute of Informatics, Japan/JST, Japan*Keio University, Japan***University of Southern CaliforniaA Lightweight Fault-Tolerant Mechanism for Network-on-Chip
BackgroundImprovement of the die yieldCircuit LevelArchitecture Levele.g. Cell Brd. Eng.Play Station 37SPEHPC-Purpose: 8SPE
Fault tolerance of the communication on multi-core systemsLightweight mechanism
Cell Broadband EngineRing busesPPESPESPESPESPESPESPESPESPECell for PS3 (One SPE is disabled)
OutlineFault patterns on Network-on-Chip (NoC)
Default-backup path mechanism (DBP)maintains the connectivity of all healthy PEs, even if the network includes hard faultsObjectiveProvide a highly reliable network using lightweight hardware
EvaluationEnergyAmount of HardwareThroughput
Network-on-Chip (NoC)(*) Kyoto U/VDEC/ASPLA 90nm CMOS On-chip routerCoreProcessor CoreLargest componentVarious fault-tolerant techniquesResource sparingRedundancyOn-Chip RouterArea is not so large.Infrastructure that affects on-chip communicationDuplication
16-Core Architecture
Failures in CommunicationTransient Error (e.g. bit error) Software layer is responsible, and recoverableLink-to-link, and/or end-to-end [Murali,DToC05]Error detection and/or error correction (e.g. CRC)Permanent Error (e.g. hard error)System avoids using the failed modules
PEPERouterRouterHard error!
Router ArchitectureSpeculative Router [Kim ISCA06]Providing fault-tolerance at input buffer, routing computation, and switch allocation unit. Dependability for misrouted packets [Thottethodi IPDPS03]Channel Reconfiguration[DallyText03, Soteriou ICD04]
Routing PathsResource SparingDynamic ReconfigurationFault-Tolerant RoutingEach Technique is resilient for portion of possible failures. - Using them together enables high reliability! But, how about simplicity?- Hard to recover crossbar failuresExisting NoC Fault-tolerant Techniques
OutlineFault patterns on Network-on-Chip (NoC)
Default-backup path mechanism (DBP)maintains the connectivity of all healthy PEs, even if the network includes hard faultsObjectiveProvide a highly reliable network using lightweight hardware
EvaluationEnergyAmount of HardwareThroughput
MotivationOn-chip routerCoreNoC ComponentRouter, Link Failuredisabling healthy local PEsSegmentation of the networkNI FailureDisabling the healthy local PEDisabledUnlike off-chip systems, a faulty module cannot be removed and replacedDisabled healthy PE16-Core Architecture
Conventional NoC Router2-D mesh5-by-5 Router, channel bit-width (flit size) 64-bit5x5 XBARARBITERFIFOFIFOFIFOFIFOFIFOX+X-Y+Y-COREX+X-Y+Y-COREEach input buffer has two VCs(2x64-bit x 4)Area (after place and route) is 4045 [KGate]; 75% is FIFO[Matsutani.ASP-DAC08]Each module may fail.Duplication of all the input ports is too expensive.
Minimum Requirements for Communication 5x5 XBARARBITERFIFOFIFOFIFOFIFOFIFOX+X-Y+Y-COREX+X-Y+Y-COREIt should send packets to at least one output portIt should receive packets from at least one input portTo communicate a local core with neighboring cores,
Default-backup Path(DBP) Mechanism5x5 XBARARBITERFIFOFIFOFIFOFIFOFIFOX+X-Y+Y-COREA local core can send packets to at least one output portA local core can receive packets from at least one input port
5x5 XBARARBITERFIFOFIFOFIFOFIFOFIFOX+X-Y+Y-COREFailureHeadTail
BodyDefault-backup Path(DBP) MechanismA local core can send packets to at least one output portA local core can receive packets from at least one input port
Cores can communicate with each other, even if router modules fail maintain packet transfers from X- direction, o X+ directionARBITERFIFOFIFOFIFOFIFOFIFOX+X-Y+Y-COREFailureBehavior of the DBP Mechanism (within a Router)
Behavior of the DBP Mechanism bypassing Xbar and NI faults5x5 XBARARBITERFIFOFIFOFIFOFIFOFIFOX+X-Y+Y-COREUsing 3:1 Mux instead of 2:1 mux
Another Issue: Network Connectivity 16-Core ArchitectureOn-chip routerCoreThe DBP mechanism provides reliability not only on intra-router datapath but also on routing pathsRouter, link failureDisabling healthy local PEsSegmentation of the networksmay disable all the PEsDividing into two regions!
DBP Mechanism (inter-router behavior)Router(omit PEs)5x5 XBARARBITERFIFOFIFOFIFOFIFOFIFOX+X-Y+Y-CORESet the DBP ports along a unidirectional embedded ring topologyRouter
Routing Bypasses Faults (e.g., failed crossbar)RouterDefault-backup path is used only at the faulty portThe corresponding network graphA unidirectional channel on a linkLink
DBP Applied to Up*/Down* RoutingSDUp*/Down* routingThe router has only a single output port Existing deadlock-free routing cannot provide the network connectivity, due to the directional routing restrictions
Virtual channel (VC) transitionDBP Routing MechanismTurn Model[Glass,1992]Guaranteeing deadlock-freedom and connectivity by imposing routing restrictionsAllows packet transfer along the DBP ringAllows VC transitions in increasing orderUses existing deadlock-free routing within every virtual-channel network
The Idea is similar to the SAN routing [koibuchi,ICPP03]We propose a new routing strategy for NoCs with directional routing restrictions!
OutlineFault patterns on Network-on-Chip (NoC)
Default-backup path mechanism (DBP)maintains the connectivity of all healthy PEs, even if the network includes hard faultsObjectiveProvide a highly reliable network using lightweight hardware
EvaluationEnergyAmount of HardwareThroughput
Energy: NoC Energy ModelAve. flit energy:Send 1-flit to destinationHow much energy[J] ?
Simulation parameters6/12mm square chip (16/64 cores)90nm CMOS
[Wang, DATE05]
Energy Consumption 16 coresAs the number of faulty links increases, DBP gracefully increases the energy, due to the increased hop counts64 cores
Amount of Hardware Area is increased by at most only 11.1% (the 2-VC case)Total router area of 2-D meshRouter area with various # of ports.The ratio of additional HW is decreased, as # of ports increases.
Performance EvaluationNetwork simulationThroughput and latency16 cores and 64 coresTopology 2-D meshTraffic patternRandom (as a baseline)
Packet size16-flit (1-flit header)Buffer size1-flit per channelSwitchingWormhole switching# of VCs2Min latency3-cycle per router
Throughput and Latency Topology is changed from 2-D mesh (no faults) to ring at /224 faults on 16/64 cores16 cores64 coresThroughput is decreased by the increased path hops.
Extensions of DBP MechanismFaults within the DBP itself and various portsPartially duplicationMultiple embedded DBP rings
Another approachTo improve the latency, a healthy router enables the DBPARBITERFIFOFIFOFIFOFIFOFIFOX+X-Y+Y-CORERouterDatapath via no crossbar
ConclusionsWe proposed a lightweight fault-tolerant mechanism, DBP, for NoCs (architecture level)Resilient for hardware faults of both intra-router modules and routing pathsA new routing strategy was developed The idea is applicable to various NoC architecturesAs well as regular topologiesEvaluationEnergy consumptionalmost constant by up to 40 faults (64 cores)Amount of Hardwareincreasing by at most only 11.1% Throughput performancedecreasing by the increased path hopsThe DBP serves the role of lifeline to increase the lifetime of NoCs
******