Sonic Millip3De: Massively Parallel 3D Stacked Accelerator for 3D Ultrasound
Richard Sampson* Ming Yang† Siyuan Wei† Chaitali Chakrabarti† Thomas F. Wenisch*
*University of Michigan †Arizona State University
Portable Medical Imaging Devices
• Medical imaging moving towards portability
  – MEDICS (X-Ray CT) [Dasika ’10]
  – Handheld 2D Ultrasound [Fuller ’09]
• Not just a matter of convenience
  – Improved patient health [Gunnarsson ’00, Weinreb ’08]
  – Access in developing countries
• Why ultrasound?
  – Low transmit power [Nelson ’10]
  – No dangers or side-effects
Handheld 3D Ultrasound
• 3D has numerous benefits over 2D
  – Easier to interpret images
  – Greater volumetric accuracy
• … as well as many challenges
  – 12k transducers, 10M image points
    • 10-20x beyond state of the art
  – High raw data bandwidth (6 Tb/s)
    • Major bottleneck in state of the art
  – Tight handheld power budget (5 W)
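The 6 Tb/s figure can be reproduced, as a rough check, from the System Parameters slide later in the deck (12,288 transducers, 40 MHz sampling, 12-bit samples). This minimal sketch assumes every transducer streams raw samples continuously at the ADC rate, before interpolation; it is a back-of-the-envelope estimate, not the authors' own derivation.

```python
# Rough raw receive bandwidth, assuming all transducers stream 12-bit samples
# continuously at the 40 MHz ADC rate (values from the System Parameters slide).
TOTAL_TRANSDUCERS = 12_288
SAMPLING_HZ = 40e6
BITS_PER_SAMPLE = 12

def raw_bandwidth_bps(transducers, fs_hz, bits):
    """Aggregate raw channel-data rate in bits per second."""
    return transducers * fs_hz * bits

bps = raw_bandwidth_bps(TOTAL_TRANSDUCERS, SAMPLING_HZ, BITS_PER_SAMPLE)
print(f"{bps / 1e12:.2f} Tb/s")  # 5.90 Tb/s
```

The result, about 5.9 Tb/s, agrees with the rounded 6 Tb/s quoted on the slide.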
Why a Custom Accelerator?
• Software algorithms load/store intensive
  – von Neumann designs inefficient
• Large system would require over 700 DSPs
  – General-purpose CPUs even less efficient

Architecture          Energy/Scanline (1 fps)   Single-Core Time/Scanline
Intel Core i7-2670    25.08 J                   4.46 s
ARM Cortex-A8         33.04 J                   132.18 s
TI C6678 DSP          2.84 J                    2.27 s
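The table can be turned into a rough system-level estimate. This sketch assumes a frame of about 10M image points arranged as scanlines of 4,096 focal points (both numbers appear in these slides, but the resulting count of about 2,441 scanlines is derived here, not quoted), perfect parallel scaling, and no memory or I/O overhead. The names `estimate` and `SCANLINES_PER_FRAME` are hypothetical.

```python
# Rough sizing of software platforms at the 1 fps target.
# ASSUMPTION: ~10M image points per frame, 4,096 focal points per scanline.
SCANLINES_PER_FRAME = 10_000_000 / 4_096
FRAME_RATE_FPS = 1.0

# (energy per scanline in J, single-core time per scanline in s), from the table
PLATFORMS = {
    "Intel Core i7-2670": (25.08, 4.46),
    "ARM Cortex-A8": (33.04, 132.18),
    "TI C6678 DSP": (2.84, 2.27),
}

def estimate(energy_j, time_s, scanlines=SCANLINES_PER_FRAME, fps=FRAME_RATE_FPS):
    """Return (average power in W, single cores needed) to sustain `fps`."""
    watts = energy_j * scanlines * fps   # joules spent per second
    cores = time_s * scanlines * fps     # core-seconds of work per second
    return watts, cores

for name, (energy_j, time_s) in PLATFORMS.items():
    watts, cores = estimate(energy_j, time_s)
    print(f"{name:20s} {watts / 1e3:8.1f} kW {cores:10.0f} cores")

# The TMS320C6678 integrates eight C66x cores per chip.
_, dsp_cores = estimate(*PLATFORMS["TI C6678 DSP"])
print(f"C6678 chips needed: {dsp_cores / 8:.0f}")
```

Under these assumptions the C6678 option needs roughly 690 eight-core chips and close to 7 kW, close to the slide's figure of over 700 DSPs and three orders of magnitude above the 5 W handheld budget. The exact count depends on the true scanline count and on how work is partitioned.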
Contributions
• Iterative delay calculation algorithm
  – Reduces storage by over 400x
  – Enables streaming data flow
• Sonic Millip3De design
  – Leverages 3D die stacking technology
  – Transform-select-reduce accelerator framework
• Power and image analysis of Sonic Millip3De
  – Negligible change in image quality
  – Able to meet 5 W power budget by 11 nm node
Outline
• Introduction
• Ultrasound background
• Algorithm design
• System design
  – Sonic Millip3De
  – Select Sub-Unit
• Results and analysis
• Conclusions
Ultrasound: Transmit and Receive
[Figure: transmit transducer, receive transducer, and focal points in the image space, with the receive raw channel data and round-trip delay τ]
Each transducer stores an array of raw receive data
Ultrasound: Image Reconstruction
The image is reconstructed from raw data based on round-trip delay
Ultrasound: Image Reconstruction
Images from each transducer are combined to produce the full frame
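A minimal 2D delay-and-sum sketch of the reconstruction on these two slides: for each focal point, each receive channel contributes the sample at index round(τ·fs), where τ is the transmit-to-point-to-receiver travel time, and the contributions are summed. The geometry (one transmit at the origin, a linear receive array along x) and the synthetic point target are hypothetical; c and fs come from the System Parameters slide. Real systems add apodization, interpolation, and 3D geometry.

```python
import math

SPEED_OF_SOUND = 1540.0   # m/s (tissue), from the parameter table
FS = 160e6                # interpolated sampling frequency (Hz)

def delay_index(src, point, rx, c=SPEED_OF_SOUND, fs=FS):
    """Sample index for the round trip src -> point -> rx (positions in metres)."""
    tau = (math.dist(src, point) + math.dist(point, rx)) / c
    return round(tau * fs)

def beamform_point(point, src, rx_positions, channel_data):
    """Delay-and-sum: add each channel's sample taken at its own delay index."""
    total = 0.0
    for rx, samples in zip(rx_positions, channel_data):
        idx = delay_index(src, point, rx)
        if 0 <= idx < len(samples):      # skip delays outside the recorded window
            total += samples[idx]
    return total

# Hypothetical setup: transmit at the origin, 32 receivers spanning +/-5 mm,
# and one point scatterer 3 cm deep that echoes a unit impulse.
SRC = (0.0, 0.0)
RX = [(-5e-3 + i * (10e-3 / 31), 0.0) for i in range(32)]
SCATTERER = (0.0, 0.03)
DATA = []
for rx in RX:
    trace = [0.0] * 8192
    trace[delay_index(SRC, SCATTERER, rx)] = 1.0
    DATA.append(trace)

print(beamform_point(SCATTERER, SRC, RX, DATA))     # 32.0: all channels align
print(beamform_point((0.0, 0.031), SRC, RX, DATA))  # 0.0: 1 mm deeper, echoes miss
```

The focal point at the scatterer collects one echo per channel, while a point 1 mm deeper reads samples roughly 200 indices away and sums to zero.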
Delay Index Calculation
• Iterate through all image points for each transducer and calculate the delay index
• Often done with lookup tables (LUTs) instead
• 50 GB LUT required for the target 3D system
Challenges of Handheld 3D Ultrasound
• Delay index LUT requires too much storage
  – New iterative algorithm reduces necessary constant storage by 400x
• Peak raw data bandwidth (6 Tb/s) infeasible
  – Sub-aperture multiplexing reduces peak data rate, but requires more transmits
• Handheld power budget very tight (5 W)
  – 3D stacked, highly parallel data streaming design reconstructs images efficiently
Iterative Delay Index Calculation
• Deltas between adjacent focal points on a scanline form a smooth curve
• Fit a piecewise quadratic approximation to the delta function
• Two sections sufficient for negligible error
[Figure: piecewise quadratic fit, Section 1 and Section 2]
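The claim that two quadratic sections suffice can be explored numerically. This sketch assumes a 2D geometry with the transmit at the origin, a hypothetical receiver 5 mm off-axis, and a scanline steered 0.2 rad; it fits quadratics by ordinary least squares and splits the scanline at its midpoint. The slides do not say how the authors choose the section boundary or fit the constants, so both choices are assumptions, and the helper names (`delay_samples`, `solve3`, `quad_fit_sse`) are hypothetical.

```python
import math

SPEED_OF_SOUND = 1540.0   # m/s (tissue)
FS = 160e6                # interpolated sampling frequency (Hz)
DEPTH = 0.06              # image depth (m)
N_POINTS = 4096           # focal points per scanline

def delay_samples(r, theta, rx_x):
    """Round-trip delay, in fractional samples, to the focal point at radius r on a
    scanline steered by theta; transmit at the origin, receiver at (rx_x, 0)."""
    px, pz = r * math.sin(theta), r * math.cos(theta)
    return (r + math.hypot(px - rx_x, pz)) * FS / SPEED_OF_SOUND

def solve3(m, v):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    a = [row[:] + [rhs] for row, rhs in zip(m, v)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, 3):
            f = a[r][col] / a[col][col]
            for k in range(col, 4):
                a[r][k] -= f * a[col][k]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        x[r] = (a[r][3] - sum(a[r][k] * x[k] for k in range(r + 1, 3))) / a[r][r]
    return x

def quad_fit_sse(ys):
    """Least-squares fit y ~ a*t^2 + b*t + c with t rescaled to [0, 1] for
    conditioning; returns the sum of squared residuals (needs len(ys) >= 3)."""
    n = len(ys)
    ts = [i / (n - 1) for i in range(n)]
    s = [sum(t ** p for t in ts) for p in range(5)]                  # sums of t^0..t^4
    rhs = [sum(y * t ** p for t, y in zip(ts, ys)) for p in (2, 1, 0)]
    a, b, c = solve3([[s[4], s[3], s[2]], [s[3], s[2], s[1]], [s[2], s[1], s[0]]], rhs)
    return sum((a * t * t + b * t + c - y) ** 2 for t, y in zip(ts, ys))

dr = DEPTH / N_POINTS
delays = [delay_samples(n * dr, 0.2, 5e-3) for n in range(N_POINTS)]
deltas = [nxt - cur for cur, nxt in zip(delays, delays[1:])]   # step between focal points

mid = len(deltas) // 2                                          # ASSUMPTION: midpoint split
sse_one = quad_fit_sse(deltas)
sse_two = quad_fit_sse(deltas[:mid]) + quad_fit_sse(deltas[mid:])
print(f"one section: SSE {sse_one:.3e}; two sections: SSE {sse_two:.3e}")
```

Because each half could reuse the single-section quadratic, the two-section residual can never be larger; how small it gets for realistic geometries is what the authors' image-quality analysis quantifies.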
Sub-aperture Multiplexing
• Peak raw data bandwidth (6 Tb/s) infeasible
• Solution: sub-aperture multiplexing
  – Transmit multiple times from same location
  – Receive with subset of transducers (sub-aperture)
  – Sum images together
• Prior work: reduce data rate
• Our design: also reduces HW and power requirements
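The bandwidth side of this trade-off is simple arithmetic on the System Parameters slide: each transmit is received by only 1,024 of the 12,288 transducers, so the peak rate drops by 12x while transmits per frame rise to 12 × 16 = 192. The "sum images together" step relies on delay-and-sum being linear: adding per-sub-aperture partial sums equals summing over the full aperture, provided the tissue is effectively static across the transmits (an assumption). `partial_sums` is a hypothetical helper that illustrates only this linearity.

```python
# Per-transmit receive bandwidth with sub-aperture multiplexing
# (parameters from the System Parameters slide).
SUB_APERTURES = 12
TRANSDUCERS_PER_SUBAPERTURE = 1_024
TRANSMIT_SOURCES = 16
SAMPLING_HZ, BITS = 40e6, 12

full_bps = SUB_APERTURES * TRANSDUCERS_PER_SUBAPERTURE * SAMPLING_HZ * BITS
sub_bps = TRANSDUCERS_PER_SUBAPERTURE * SAMPLING_HZ * BITS
transmits_per_frame = SUB_APERTURES * TRANSMIT_SOURCES

print(f"full aperture:    {full_bps / 1e12:.2f} Tb/s")
print(f"one sub-aperture: {sub_bps / 1e9:.1f} Gb/s")
print(f"transmits/frame:  {transmits_per_frame}")

def partial_sums(contributions, groups):
    """Split per-channel contributions into equal sub-apertures and sum each one."""
    size = len(contributions) // groups
    return [sum(contributions[g * size:(g + 1) * size]) for g in range(groups)]

# Hypothetical integer contributions for one focal point: summing the 12
# sub-aperture results equals summing all 12,288 channels at once.
channels = [(i * 37) % 101 for i in range(12_288)]
print(sum(partial_sums(channels, SUB_APERTURES)) == sum(channels))  # True
```

Each transmit now moves about 491.5 Gb/s instead of 5.9 Tb/s, at the cost of 12 times as many transmits per frame.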
System Design
Sonic Millip3De comprises 1,024 parallel pipelines
System Design: Transducers
Interchangeable CMOS transducer layer; can use older process
System Design: ADC/Storage
Separate storage layer to reduce wire lengths
System Design: Transform-Select-Reduce
Accelerator units in fast, low power process
Select Sub-Unit Design
Selects the sample closest to each focal point using our algorithm
Select Sub-Unit Design
All delays for a scanline are estimated using 9 constants
Select Sub-Unit Design
Adders compute the next iteration of the quadratic approximation
$A(n+1)^2 + B(n+1) + C = (An^2 + Bn + C) + 2An + (A + B)$
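The adder-only update follows directly from the identity on this slide. This sketch keeps f(n) and the first difference d(n) = 2An + (A + B) in registers, so each new value costs two additions (2A is a constant, computed once). Python integers stand in for fixed-point values, which keeps the recurrence exact; the constants used are hypothetical, since the slides do not give values or bit widths.

```python
def quadratic_by_addition(a, b, c, steps):
    """Evaluate f(n) = a*n^2 + b*n + c for n = 0..steps-1 with additions only.

    Uses f(n+1) = f(n) + d(n), where d(n) = 2an + (a + b), and d(n+1) = d(n) + 2a.
    """
    f = c              # f(0)
    d = a + b          # d(0) = f(1) - f(0)
    two_a = 2 * a      # constant second difference, computed once
    values = []
    for _ in range(steps):
        values.append(f)
        f += d         # first adder: next function value
        d += two_a     # second adder: next first difference
    return values

A, B, C = 3, -2, 5     # hypothetical integer (fixed-point) constants
print(quadratic_by_addition(A, B, C, 6))  # [5, 6, 13, 26, 45, 70]
```

With floating-point values the same recurrence would slowly accumulate rounding error over thousands of focal points, which is one reason a fixed-point hardware implementation is a natural fit.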
Select Sub-Unit Design
A decrementor selects the sample for the next image focal point
Select Sub-Unit Design
A section decrementor indicates when to switch constants
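The select sub-unit slides can be tied together in a small behavioral model. This sketch is hypothetical, not the authors' RTL: it assumes a fixed-point read position with 8 fraction bits, a per-section constant triple (a, b, c) whose quadratic gives the step to the next focal point (with k restarting at 0 in each section), a section decrementor modeled as a per-section focal-point count, and input samples arriving one per cycle. The slides do not say how the 9 constants are allocated, so this layout is illustrative only.

```python
FRAC_BITS = 8   # hypothetical fixed-point precision (fraction bits)

def select_unit(stream, first_index, sections):
    """Behavioral model of a select sub-unit (hypothetical, not the RTL).

    stream:      iterable of samples, one arriving per cycle, starting at index 0.
    first_index: integer sample index of the first focal point.
    sections:    list of (count, a, b, c); within a section the fixed-point step
                 to the k-th next focal point is a*k^2 + b*k + c (k restarts at 0).
    Yields one selected sample per focal point. Assumes non-negative steps and a
    stream long enough to cover every selected index.
    """
    it = iter(stream)
    current = next(it)                 # sample at index 0
    index = 0                          # index of `current`
    pos = first_index << FRAC_BITS     # fixed-point read position
    for count, a, b, c in sections:    # section decrementor: `count` points per section
        step, d, two_a = c, a + b, 2 * a
        for _ in range(count):
            skip = (pos >> FRAC_BITS) - index   # value loaded into the decrementor
            while skip > 0:                     # consume samples until it reaches zero
                current = next(it)
                index += 1
                skip -= 1
            yield current
            pos += step                         # forward-difference update, adders only
            step += d
            d += two_a

# Hypothetical demo: samples equal their own index, so the output shows which
# stream positions were selected. Section 1 advances a constant 3.0 samples.
demo = list(select_unit(range(10_000), 7, [(4, 0, 0, 3 << FRAC_BITS), (3, 16, 0, 512)]))
print(demo)  # [7, 10, 13, 16, 19, 21, 23]
```

Because the model only ever moves forward through the stream, it mirrors a streaming design in which no raw sample has to be revisited once the decrementor has passed it.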
System Parameters

Parameter                               Value
Sub-apertures                           12
Transmit Sources                        16
Transmits per Frame                     192
Transducers per Sub-aperture            1,024
Total Transducers                       12,288
Storage per Transducer                  4,096 x 12 bits
Focal Points per Scanline               4,096
Image Depth                             6 cm
Image Angular Width                     π/4
Sampling Frequency                      40 MHz
Interpolation Factor                    4x
Interpolated Sampling Frequency (fs)    160 MHz
Speed of Sound (tissue)                 1,540 m/s
Target Frame Rate                       1 fps
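Several rows of the parameter table can be cross-checked against each other. This sketch assumes the 4,096-entry buffer holds raw 40 MHz samples (with the 4x interpolation applied afterwards) and uses only the on-axis round trip to the 6 cm image depth; off-axis receivers and steered scanlines have longer paths, so this is a lower bound on the required window, not a complete check.

```python
SPEED_OF_SOUND = 1540.0   # m/s (tissue)
IMAGE_DEPTH = 0.06        # m
ADC_HZ = 40e6             # raw sampling frequency (Hz)
INTERP_FACTOR = 4
STORAGE_SAMPLES = 4_096   # per transducer; ASSUMPTION: raw 40 MHz samples

round_trip_s = 2 * IMAGE_DEPTH / SPEED_OF_SOUND      # on-axis, deepest focal point
samples_needed = round_trip_s * ADC_HZ
window_s = STORAGE_SAMPLES / ADC_HZ                   # time span the buffer covers
max_on_axis_depth_m = SPEED_OF_SOUND * window_s / 2

print(f"round trip to 6 cm: {round_trip_s * 1e6:.1f} us = {samples_needed:.0f} samples")
print(f"buffer window: {window_s * 1e6:.1f} us = {max_on_axis_depth_m * 100:.2f} cm on axis")
print(f"interpolated rate: {ADC_HZ * INTERP_FACTOR / 1e6:.0f} MHz")
```

On axis, the 6 cm round trip takes about 77.9 µs, or roughly 3,117 raw samples, inside the 102.4 µs window covered by 4,096 samples at 40 MHz (about 7.9 cm of on-axis depth). Likewise, 12 sub-apertures × 16 transmit sources give the 192 transmits per frame, and 12 × 1,024 gives the 12,288 total transducers.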
Image Quality Comparison
[Figure: simulated images, panels "Ideal", "Our Design (12 bit)", and "11 bit"]
Our design has negligible difference from the ideal system

Bits   Ideal   14      13      12      11      10
CNR    2.972   2.942   2.960   2.942   2.536   2.233

Simulations using Field II [Jensen ’92, ’95]
Power Analysis and Scaling
[Figure: power (W) vs. technology node (45, 32, 22, 16, 11 nm), broken down into DRAM, memory interface, network wires, accelerator, SRAM, ADC, and transducers]
Can meet the 5 W budget by the 11 nm node
Conclusions
• 3D die-stacked Sonic Millip3De design is able to meet the 5 W power budget by 11 nm
• Algorithm/HW co-design enables order-of-magnitude gains
  – Power and output quality goals often in conflict
  – Need guidance from domain experts to balance
• Architects have much to offer for application-specific system designs
Questions?
Special thanks to:
Brian Fowlkes, Oliver Kripfgans, Ron Dreslinski