Architectural tricks to maximize memory bandwidth

Deepak Shankar, CEO, Mirabilis Design

Why Focus on Memory Sub-System

• Processors have a huge number of cycles and bandwidth

– How do you take advantage of this?

• Memory access is a major bottleneck

– Especially in high-performance systems like multimedia and networking

• Memory access forms the largest share of power consumption

– Too many ACT (RAS, RP and RCD) will dramatically increase the power
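
To make the point about activations concrete, here is a minimal sketch in Python. It assumes a single bank with an open-page policy and a made-up trace of row numbers; `count_activations` is a hypothetical helper for illustration, not part of any tool named in these slides. It counts how many ACT commands a given access order would trigger.

```python
def count_activations(rows):
    """Count ACT commands for a sequence of row accesses to one bank.

    Assumes an open-page policy: a row stays open until a different
    row is requested, which forces a precharge and a new activate.
    """
    activations = 0
    open_row = None  # no row is open at the start
    for row in rows:
        if row != open_row:
            activations += 1
            open_row = row
    return activations


interleaved = [0, 1, 0, 1, 0, 1]  # alternating rows: every access needs an ACT
grouped = sorted(interleaved)      # same accesses, grouped by row

print(count_activations(interleaved))  # 6
print(count_activations(grouped))      # 2
```

The same six accesses cost six activations when interleaved and two when grouped, which is why access ordering matters for both timing and power.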

Reports

Introduction

• Importance of improving Memory Performance

• Addressing challenges with Architecture Level Memory explorations

• Need for Performance vs. Power trade-off analysis

• Impact of the memory addressing scheme on performance
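
As an illustration of why the addressing scheme affects performance, the sketch below decodes the same addresses under two mappings. The bit widths (8 banks, 1024 columns) and both function names are assumptions chosen for the example, not values taken from a specific DRAM datasheet.

```python
BANK_BITS = 3   # assumed: 8 banks
COL_BITS = 10   # assumed: 1024 columns per row


def map_row_bank_col(addr):
    """Row:Bank:Column mapping: column bits lowest, then bank, then row."""
    col = addr & ((1 << COL_BITS) - 1)
    bank = (addr >> COL_BITS) & ((1 << BANK_BITS) - 1)
    row = addr >> (COL_BITS + BANK_BITS)
    return row, bank, col


def map_row_col_bank(addr):
    """Row:Column:Bank mapping: bank bits lowest, so neighbours spread across banks."""
    bank = addr & ((1 << BANK_BITS) - 1)
    col = (addr >> BANK_BITS) & ((1 << COL_BITS) - 1)
    row = addr >> (BANK_BITS + COL_BITS)
    return row, bank, col


addrs = range(4)
print([map_row_bank_col(a)[1] for a in addrs])  # [0, 0, 0, 0]
print([map_row_col_bank(a)[1] for a in addrs])  # [0, 1, 2, 3]
```

With the first mapping, consecutive addresses stay in one bank and one row (good for row hits); with the second, they interleave across banks (good for bank-level parallelism). Which is better depends on the workload.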

About Mirabilis Design

• Provider of system-level architecture exploration solution for electronics and semiconductors

• Platform to conduct power-performance trade-offs, hardware-software partitioning and topology design

• VisualSim - Modeling and simulation software

• Based in Silicon Valley with experts in system modeling and architectures

• Largest source of system modeling library with embedded timing, functionality and power

Explore/Simulate a Memory System

• Key attributes

– DRAM datasheet

– Memory Controller attributes

– Connected Bus topology

– Workloads including rate, size, command and back pressure
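
A first-order check that ties workload rate and size to the memory channel can be sketched as follows. All class names, fields, and numbers here are hypothetical and chosen only for illustration; a real model would also account for the DRAM timing, controller policy, and bus topology listed above.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    requests_per_sec: float   # assumed request rate
    bytes_per_request: int    # assumed request size


@dataclass
class MemoryChannel:
    bus_width_bits: int
    transfers_per_sec: float

    def peak_bytes_per_sec(self):
        # Peak bandwidth = bytes per transfer * transfers per second
        return self.bus_width_bits / 8 * self.transfers_per_sec


def utilization(workload, channel):
    """Offered load divided by peak bandwidth; values near or above 1.0
    suggest the masters will see back pressure."""
    offered = workload.requests_per_sec * workload.bytes_per_request
    return offered / channel.peak_bytes_per_sec()


channel = MemoryChannel(bus_width_bits=64, transfers_per_sec=1.6e9)
workload = Workload(requests_per_sec=100e6, bytes_per_request=64)
print(utilization(workload, channel))  # 0.5
```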

Statistical Memory Model for Performance Analysis

Challenges in Memory Usage

• Product

– Multimedia, Networking, HPC, Avionics

• Situation

– Using an off-the-shelf Processor, FPGA or SoC

• Challenge

– What will be the performance and power consumption for my use-cases?

• Metrics

– Power per frame or packet

– Latency from sensor input to HDMI output

Opportunities in Memory Usage

• Vary the data sizes

• Memory configuration

• Ordering of tasks in the use-case

• Multiple Masters making asynchronous requests to memory

– Addresses

• Task and data distribution across multi-core

Full System Analysis

Processor Performance

Challenges in Memory System Design

• SoC interface to memory

• AXI bus and NoC topology to minimize the overhead for each Master

• Single vs. dual channels

• Memory controller algorithm

Opportunity and Advantage of Design

• Consolidate read and write

• Split transaction

• Group transaction

• Read re-ordering

• Transaction priority assignment

• Lower clock frequency vs. wider bus
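
Read re-ordering can be sketched as a row-hit-first policy, similar in spirit to first-ready, first-come-first-served scheduling. This is a simplified illustration that assumes a single bank and read-only traffic; `reorder_row_hits_first` is a hypothetical helper, not the algorithm of any particular memory controller.

```python
def reorder_row_hits_first(requests, open_row):
    """Return requests in service order, preferring hits on the open row.

    requests: row numbers in arrival order (single bank assumed).
    At each step the oldest request to the open row is served; if there
    is none, the oldest request overall is served and its row is opened.
    """
    pending = list(requests)
    order = []
    while pending:
        # Index of the oldest row hit, or 0 (the oldest request) if no hit
        idx = next((i for i, r in enumerate(pending) if r == open_row), 0)
        row = pending.pop(idx)
        open_row = row
        order.append(row)
    return order


print(reorder_row_hits_first([3, 7, 3, 7, 3], open_row=None))  # [3, 3, 3, 7, 7]
```

Grouping the hits reduces row switches from five to two in this trace. A real controller would bound how long a request can be bypassed, since an unbounded row-hit-first policy can starve older requests.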

Cycle-accurate Memory Model for Architecture Exploration

Power vs. Timing

About VisualSim

Diagram labels: Architecture Exploration; Performance Analysis; Power Analysis; HW-SW Partitioning; Software; Interfaces; RTOS; Hardware

• Graphical and hierarchical modeling

• Large library of stochastic and cycle-accurate components and IP blocks with embedded timing and power

• Library blocks are used to assemble hardware, software, network, traffic, reports and use-cases

System- vs. Pin-level Modeling

Figure labels: One Router; abstraction levels: System Design, Transaction-level, Cycle-accurate, Signal-level; VisualSim

Schematics and RTL are very slow and too detailed for end-to-end metrics

System- vs. Pin-level Modeling

Similarity

• Hardware attributes - width, clock speed, buffer depths

• Timing

• Algorithms & arbitration

• Data & control flow logic

• Use addresses

Differences

• Data & control combined in transaction, not bits

• No pin definitions

• No signal handshaking

• Skip cycles with no change

• Flexible to make major changes

• 100-1000X Faster
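
The idea of skipping cycles with no change can be sketched with a small event queue. This is a generic discrete-event loop written for illustration, with made-up event names and times; it is not VisualSim's engine or API.

```python
import heapq


def run_event_driven(events):
    """Process (time, name) events in time order, jumping over idle cycles.

    Returns the events in processing order and the number of steps taken.
    """
    queue = list(events)
    heapq.heapify(queue)          # min-heap ordered by event time
    log = []
    steps = 0
    while queue:
        time, name = heapq.heappop(queue)  # jump straight to the next event
        log.append((time, name))
        steps += 1
    return log, steps


events = [(1000, "read_done"), (5, "read_issue"), (400, "refresh")]
log, steps = run_event_driven(events)
print(log)    # [(5, 'read_issue'), (400, 'refresh'), (1000, 'read_done')]
print(steps)  # 3
```

A simulator that evaluates every clock cycle would take 1001 steps to cover cycles 0 through 1000; the event-driven loop takes three, one per event.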

05/03/2023 Mirabilis Design Inc. Confidential Slide18

System model accuracy and simulation is sufficient for the explorations

How can System Level Explorations Help improve Memory Performance

• Evaluate performance and power advantages of different types of memory technologies.

• Early prediction of latency, throughput, power, and energy

• Evaluation of next-generation storage devices for high-bandwidth, low-latency requirements

• Spend more time on analysis and less time on implementation

Modeling Libraries - Semiconductors

SoC

• AMBA (AHB/APB/AXI)

• CoreConnect - PLB & OPB

• NoC, Virtual Channel

• USB

Memory

• SDR, DDR, DDR2, DDR3

• QDR, RDRAM

• LPDDR, LPDDR2, LPDDR3, LPDDR4

• HBM

• Flash

Processors

• ARM

• PowerPC - Freescale and IBM

• Intel and AMD

• TI

• MIPS

• Tensilica

• Renesas SH

Interfaces

• PCI, PCI-X, PCIe

• RapidIO

• NVMe

• Serial Switch

• Crossbar

• Ethernet

• Fibre Channel

Benefits

Features and their benefits:

• Facilitating transition from concept to design

– Creating realistic workload scenarios driving simulations

– Models enable experimentation and enhance innovation

– Simulations facilitate analysis and exchanges between teams

• Increasing productivity

– Rapid exploration and analysis

– Graphics are better suited to handle complexity

– Graphics are 10x more efficient than C/C++ programming

• Optimizing design

– HW footprint, buffers, timings, power

• Facilitating implementation and validation

– Providing executable specifications for implementation

– Reusing test cases for validation

Deepak Shankar, CEO, Mirabilis Design

info@mirabilisdesign.com

www.mirabilisdesign.com/new/

Phone - 408-245-8992
