  • AltiumLive 2017: PCBs for Computing Density From Big Bang to the Automobile

    Andreas Doering IBM Research – Zurich Laboratory


  • Agenda

    Motivation for Microservers

    The DOME project

    Boards

    Insights

    Outlook

  • * IDC HPC technology excellence award, ISC17


  • DOME PPP: ASTRON, IBM, Dutch government

    Ronald P. Luijten / July 2017

  • SKA (Square Kilometer Array) to measure Big Bang

    Picture source: NZZ, March 2014

    Timeline: 0, 10^-32 s, 10^-6 s, 0.01 s, 3 min, 380'000 years, 13.8 billion years

    Events: Big Bang, Inflation, Protons created, Start of nucleosynthesis through fusion, End of nucleosynthesis, Modern Universe

  • SKA: What is it?

    Top 500: Sum = 123 PFlops. 2 GFlops/watt. 100x Flops of Sum! ~ 7 GWh

    ~3000 dishes, 3 GHz-10 GHz.

    ~0.5M antennae, 0.5 GHz-1.7 GHz.

    ~0.5M antennae, 0.07 GHz-0.45 GHz.

    1. 10^9 samples/second * 0.5M antennae: 0.5 * 10^15 samples/sec.

    2. 3.5 * 10^9 samples/second * 0.5M antennae: 1.7 * 10^15 samples/sec.

    3. 2 * 10^10 samples/second * 3K antennae: 6 * 10^13 samples/sec.

    Sum = 2 * 10^15 samples/second @ 86400 seconds/day:

    170 * 10^18 (Exa) samples/day. Assume 10-12x reduction @antenna:

    14 Exabytes/day (minimum).
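    The data-rate estimate on this slide can be re-run as a short calculation. This is a minimal sketch: the names `ARRAYS`, `aggregate_rate` and `per_day` are made up for illustration, and treating one sample as one byte is an assumption taken from the slide's step from samples/day to Exabytes/day. The unrounded sum comes out near 2.3 * 10^15 samples/second; the slide rounds it to 2 * 10^15 before multiplying by the seconds per day.

```python
# Sample rate per receiver [samples/s] and receiver count, keyed by the
# item number used on the slide. One sample = one byte is an assumption.
ARRAYS = {
    "1": (1e9, 0.5e6),
    "2": (3.5e9, 0.5e6),
    "3": (2e10, 3e3),
}
SECONDS_PER_DAY = 86_400


def aggregate_rate(arrays):
    """Total samples per second over all receiver groups."""
    return sum(rate * count for rate, count in arrays.values())


def per_day(rate_per_second):
    """Convert a per-second rate into a per-day volume."""
    return rate_per_second * SECONDS_PER_DAY


exact_total = aggregate_rate(ARRAYS)   # unrounded sum, roughly 2.3e15
slide_total = 2e15                     # rounded value used on the slide
raw_per_day = per_day(slide_total)     # roughly 170e18 samples/day

for reduction in (10, 12):
    reduced = raw_per_day / reduction
    print(f"{reduction}x reduction: {reduced / 1e18:.1f} Exa-samples/day")
```

    With the slide's rounded total, the 12x end of the reduction range gives the quoted minimum of about 14 Exabytes/day.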


  • © 2016 IBM Corporation

    ~ 10 Pb/s

    86’400 sec/day

    14 ExaByte/day

    ?

    ~ 1 PB/Day.

    330 disks/day

    120’000 disks/yr ?

    Top-500 Supercomputing (11/2013): 0.3 Watt/Gflop/s

    Today's industry focus is 1 Eflop @ 20 MW (2018): 0.02 Watt/Gflop/s

    Most recent data from SKA:

    CSP: max. power 7.5 MW

    SDP: max. power 1 MW

    Latest need for SKA: 4 Exaflop (SKA1-Mid): 1.2 GW … 80 MW

    Too easy (for us)

    Too hard

    Moore's law

    Factor 80-1200

    SDP

    CSP

    multiple breakthroughs needed
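    The power figures on this slide follow from the quoted efficiencies. Below is a minimal sketch of that arithmetic; `power_watts` and the constant names are illustrative, and reading "Factor 80-1200" as the gap to the 1 MW SDP budget is an assumption that happens to reproduce both ends of the range.

```python
GFLOPS_PER_EFLOP = 1e9  # 1 Eflop/s = 1e9 Gflop/s


def power_watts(eflops, watt_per_gflops):
    """Power needed for a given compute rate at a given efficiency."""
    return eflops * GFLOPS_PER_EFLOP * watt_per_gflops


# Efficiency implied by "1 Eflop @ 20 MW", in Watt per Gflop/s.
target_efficiency = 20e6 / GFLOPS_PER_EFLOP

SKA_NEED_EFLOPS = 4      # "4 Exaflop (SKA1-Mid)"
SDP_BUDGET_WATTS = 1e6   # "SDP: max. power 1 MW"

p_2013 = power_watts(SKA_NEED_EFLOPS, 0.3)                   # Top-500, 11/2013
p_target = power_watts(SKA_NEED_EFLOPS, target_efficiency)   # industry target

# Assumed reading of "Factor 80-1200": required power over the SDP budget.
factor_low = p_target / SDP_BUDGET_WATTS
factor_high = p_2013 / SDP_BUDGET_WATTS

print(f"{p_2013 / 1e9:.1f} GW down to {p_target / 1e6:.0f} MW")
print(f"gap to SDP budget: {factor_low:.0f}x to {factor_high:.0f}x")
```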

  • Dome Project:

    System Analysis

    Data & Streaming

    Sustainable (Green) Computing

    Nanophotonics

    Computing Transport Storage

    Algorithms & Machines

    - Nanophotonics

    - Real-Time Communications

    - New Algorithms

    - Microservers

    - Accelerators

    - Access Patterns

    Research Streams…

    …are mapped to research projects:

    …plus an open user platform:

    User platform

    - Student projects

    - Events

    - Research Collaboration

    33M€ 5-year Research Project: 76 IBM PY (32 in NL); 50 ASTRON PY

  • Definitions


    • “Microserver” = The server class of the mobile era

    • “Microserver” = SoC + DRAM + Flash + Power

    • “Microserver” = Backplane + not-enclosed modules

  • Motivation


    • Silicon scaling limits; energy for computation vs. on-chip communication vs. off-chip communication

    • Use of large SMP servers by partitioning, Docker, etc.: cache coherency not fully used

    • Emergence of powerful embedded processor cores, in particular ARM

    • Premise given through Aquasar cooling work

      → enabled DOME funding

  • Table of PCBs

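    A = Altium Designer, C = Cadence

    Module Name | Iterations | Length [mm] | Width [mm] | Thickness [mm] | Layers | Holes | Components | Nets | Backdrilling | Material | Tool
    P5020/P5040 processor | 3 | 139.7 | 55.5 | 1.28 | 10 | 3242 | 1007 | 539 | no | ISOLA-400 | A
    Big Baseboard | 1 | 220 | 160 | 1.28 | 10 | 491 | 175 | 154 | no | ISOLA-400 | A
    Power Converter | 2 | 139 | 56.5 | 1.63 | 8 | 737 | 440 | 231 | no | FR-4 | A
    mSATA on DIMM | 2 | 139.7 | 55.5 | 1.24 | 4 | 341 | 69 | 67 | no | FR-4 | A
    8p1 backplane | 2 | 300 | 200 | 2.7 | 18 | 3582 | 565 | 1326 | no | FR-4+ | A
    Testboard for switch power converter | 1 | 160 | 220 | 1.6 | 8 | 888 | 259 | 134 | no | FR-4 | A
    Switch Mothercard | >1 | 139.7 | 57.8 | 3.6 | 28 | 3311 | 837 | 730 | yes | - | A
    Switch Daughtercard | 1 | 139.7 | 57.8 | 1.8 | 10 | 423 | 213 | 160 | no | FR-4 | A
    Mini baseboard | 2 | 160 | 100 | 1.2 | 6 | 851 | 376 | 241 | no | FR-4 | A
    Bracket for DIMM connector on Minibaseboard | 2 | 154 | 32 | 1.2 | 6 | 116 | 2 | 98 | no | FR-4 | A
    Bracket for SPD08 connector on Minibaseboard | 1 | - | - | - | - | - | - | - | - | - | C
    T4240 processor | 3+1 | 139.7 | 63 | 1.6 | 16 | 1316 | 820 | - | - | Panasonic | C
    mSATA on SPD08 | 3 | 139.7 | 62.5 | 1.6 | 6 | 1014 | 79 | 105 | no | FR-4 | A
    M2 carrier | 2 | 139.7 | 61.6 | 1.6 | 6 | 1091 | 130 | 131 | - | - | A
    Auxiliary power converter | 1 | 61 | 56 | 1.6 | 4 | 478 | 74 | 30 | no | FR-4 | A
    PCIe Extender | - | - | - | - | - | - | - | - | no | FR-4 | A
    LS2088 Processor module | 1? | 139.7 | 62.5 | 1.6 | 14 | 1037 | 714 | - | no | Panasonic R1577/1570 | C
    USB HUB Module | 2 | 139.7 | 61.5 | 1.57 | 8 | 1162 | 557 | 387 | no | FR-4 | A
    BB2 backplane | 2 | 520 | 200 | 3.15 | 22 | 12598 | 1076 | 3820 | 7 Runs | Panasonic Megtron 6 | A
    Interposer card | 1 | 139.7 | 80 | 1.57 | 8 | 897 | 76 | 132 | 4 Runs | FR-4, Panasonic Megtron 6N | A
    FMKU2595 FPGA | 2 | 139.7 | 63 | 1.57 | 14 | 7442 | 881 | 914 | no | Panasonic Megtron 6 | A

    "-" marks a value not given. The T4240 and LS2088 rows give only two of the three Holes/Components/Nets values; they are placed in the first two of those columns, and which columns they actually belong to is not recoverable.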

  • System Overview


    8/32/128 compute nodes

    10G Ethernet Switch

    storage node

    Power converter

    P5020/P5040 2/4 cores PowerPC-64@2.1GHz, 16GByte DDR3, 2xXAUI,4x1GbE, 2xSATAv1

    T4240 24 cores PowerPC-64@1.8GHz, 24GByte DDR3, 4x10GbE, 2x1Gb, 2x SATAv2, PCIe-2.0 x8

    LS2088 8xARMv8@2GHz, 32GByte DDR4, 6x10GbE, PCIe, 2xSATA

    FMKU2595 FPGA 330KLUTs, 4x10GbE, 4xGbE,2xSATA

    8 x mSATA or 2xM2

    8x40Gb Ethernet

  • Backplane connectors

    DIMM socket with removed latches for generation 1

    3M's SPD08 in various lengths for generation 2

    Xtreme Poweredge for power converter (both)

    3 segments of Molex Impact, 210 contacts (70 diff pairs)

  • System today


    Backplane for

    • 32 compute nodes,

    • 8 populated

    • 1 Switch node,

    • 1 Management node

    • 2 Storage nodes

    • Water cooled

  • View from above


    Server nodes

    Power node

    Storage node

    10 GbE Switch

    QSFP cages

    Water In/Out

    Cooling Rails

  • System Q4 2017


    Two backplanes, total 64 compute nodes, e.g.

    1536 cores,

    1536 GB DRAM,

    64 SSDs

  • Gallery of (some) Boards


  • Power Converter


    • Master thesis project:

    • Student did high-level design (e.g. selection of backplane connector), component selection, and schematic entry. Layout was completed by a regular engineer. First version worked.

    • 1 iteration to improve stability, protection

    Challenges: high current on top/bottom and SMD packages, location of connectors, tight IC/L/C converter triangle, and the conflict between high-profile Ls and hot ICs that must be covered by the cool plate

  • 40A per contact finger, allowing different types of C/L

  • Switch Module


    Left: Main Switch PCB 130mm x 55mm

    Right: Switch with mounted daughter card

  • Pin Assignment


    • Pin assignment has to suit both backplane and switch module design

    • Both are challenging (backplane has more space, but many more wires)

    • Reduce crossings on both boards

    • XAUI has low requirements on length balancing

    • 1st iteration:

    • Let the CAD tool choose the pinout on both boards independently

    • Find out the critical spots

    • Use a Python script to build a systematic pinout that circumvents these
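    The script itself is not shown in the slides. The sketch below only illustrates the idea under strong simplifications: connector positions are modelled as a one-dimensional index, `critical` is a hypothetical set of positions to avoid, and all function and net names are made up. Nets are assigned to the usable positions in a fixed order, and a second helper counts how many net pairs would have to cross when a board wants the nets in a given order.

```python
from itertools import combinations


def systematic_pinout(nets, positions, critical):
    """Assign nets, in order, to connector positions, skipping critical spots."""
    usable = [p for p in positions if p not in critical]
    if len(nets) > len(usable):
        raise ValueError("not enough usable connector positions")
    return dict(zip(nets, usable))


def count_crossings(pinout, board_order):
    """Count net pairs whose order on the board is reversed at the connector."""
    crossings = 0
    for first, second in combinations(board_order, 2):
        # 'first' precedes 'second' on the board; if it sits at a higher
        # connector position, the two routes must cross.
        if pinout[first] > pinout[second]:
            crossings += 1
    return crossings


# Hypothetical example: two XAUI ports with four lanes each.
nets = [f"XAUI{port}_L{lane}" for port in range(2) for lane in range(4)]
positions = list(range(12))
critical = {3, 7}  # positions found to be hard to route in the 1st iteration

pinout = systematic_pinout(nets, positions, critical)
print(pinout)
print("crossings, same order on board:", count_crossings(pinout, nets))
print("crossings, reversed order on board:", count_crossings(pinout, nets[::-1]))
```

    A real pinout has to satisfy the backplane and the switch module at the same time, so a practical script would evaluate a crossing count like this against both board orders.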

  • PCB Layer Stack


    6 inner signal layers, impedance controlled, with shielding ground layers in between

    4 high-current power supply lanes

    Total PCB thickness: 3.6 mm

    Length of connector pins: 1.2 mm

    Original assumption that board space across the "through-hole" connector cannot be used was wrong.

    Need backdrilling

    Press-fit connector on this side

    ASIC on this side

  • PCB routing

    This narrow strip (1 cm wide) is one critical part. Routing between connector pins with 1 signal pair

  • FPGA Node


    PCI- and/or Network-Attached

    2 Channels DDR4 (e.g. 16GByte)

    Xilinx® Kintex® UltraScale

    6 x 10 GBE, PCIe3 x8, 2 x SATA3

    Status: In bringup

  • FPGA Node – Layout Concept

    Flyby control signals on 3 layers,

    P2P data signals mainly on 1 layer

    High-speed IO on 2 inner layers

  • Cooling

    Combination of passive cooling on decapped chip, using vapor chambers and hot water

  • Insights


    • Main source of error: transfer from data sheet into tool

    • Second source of error: Harness interface (swapping P/N on diff pairs, clock/data on I2C)

    • Third source of error: voltage levels of pins (e.g. enable of power converter)

    • Why is there no electronic transfer of component data to designers?

    Exception: TI (e.g. https://webench.ti.com/cad/)

    Why is there no standard format? There was an initiative XMLEDA, etc.

    • DRC could do more, if symbols provided the information (e.g. P/N property, clock, etc.)

    • Conversion from one tool to another is a nightmare

    Hired Elgris and still 5 working days turned into 2 months
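    As an illustration of the kind of check a DRC could run if symbols carried such properties, here is a minimal sketch. The `Pin` data model, the role names and the pin names are all hypothetical and do not correspond to any CAD tool's API. It flags harness connections where the role differs (e.g. I2C clock wired to data) or where a differential pair is swapped (P wired to N).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pin:
    name: str
    role: str            # e.g. "diff", "i2c_clock", "i2c_data"
    polarity: str = ""   # "P" or "N" for differential pins, else empty


def check_connections(connections):
    """Return one message per connection whose pin properties disagree."""
    errors = []
    for a, b in connections:
        if a.role != b.role:
            errors.append(f"{a.name} -> {b.name}: role mismatch ({a.role} vs {b.role})")
        elif a.polarity != b.polarity:
            errors.append(f"{a.name} -> {b.name}: polarity swapped ({a.polarity} vs {b.polarity})")
    return errors


# Hypothetical harness between two boards.
connections = [
    (Pin("TX0_P", "diff", "P"), Pin("RX0_P", "diff", "P")),   # fine
    (Pin("TX1_P", "diff", "P"), Pin("RX1_N", "diff", "N")),   # P/N swapped
    (Pin("SCL", "i2c_clock"), Pin("SDA", "i2c_data")),        # clock on data
]

for message in check_connections(connections):
    print(message)
```

    A polarity swap is sometimes intentional when the receiver can invert lanes, so a real rule would need a way to waive individual findings.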

  • Acknowledgements