Every Cycle Counts Gajinder Panesar, CTO, UltraSoC [email protected] Iain Robertson, VP Engineering, UltraSoC [email protected] RISC-V Summit December 2019



Page 1: Every Cycle Counts - RISC-V · Iain Robertson, VP Engineering, UltraSoC Iain.robertson@ultrasoc.com RISC-V Summit December 2019 . 10 December 2019 Gajinder Panesar/Iain Robertson

Every Cycle Counts

Gajinder Panesar, CTO, UltraSoC

[email protected]

Iain Robertson, VP Engineering, UltraSoC

Iain.robertson@ultrasoc.com

RISC-V Summit December 2019


10 December 2019 Gajinder Panesar/Iain Robertson UL-003128-PT 2

• Overview

• Retirement Delta Trace

• Trace Encoder Interface

• Algorithm

• Filtering

• Holistic System

• Summary


Overview

• It is all about the system, not about ISAs or cores.

• Understanding the behaviour of today’s systems is hard, very hard

• Surprisingly, software sometimes does not behave as expected

• Software developers spend 50-75% of development time debugging [1]; this will increase as systems become even more complex

• Real-time performance, safety and security are increasingly important

• In such systems non-intrusive visibility is crucial

• Processor branch trace alone is not enough

• Understanding CPU stalling behaviour is key

[1] The Economic Impacts of Inadequate Infrastructure for Software Testing, http://www.rti.org/pubs/software_testing.pdf, 2002


Retirement Delta Trace

• The stated objective is to determine when each instruction retires and, more importantly, when the CPU stalls and does not retire instructions

• This can be achieved by reporting the number of clock cycles between successive retirements

• The CPU is expected to retire one instruction per cycle most of the time, so it is also advantageous to report the number of back-to-back retirements without a stall

• Based on this, the basic reported unit (or group) should be a pair of numbers: the first is the number of contiguous retirements, the second is the number of stall cycles that follow

• For CPUs that support multiple retirement, the trace must also report how many instructions retire per cycle
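The pair-of-numbers idea can be illustrated with a short sketch. It assumes a single-retirement hart and a per-cycle trace written as `R` (retire) or `S` (stall); the function name and the sample trace are made up for illustration and do not come from the slides.

```python
def retirement_deltas(trace: str):
    """Split a per-cycle trace into (contiguous retirements, following stalls) pairs.

    Returns the completed pairs plus the counts of the pair still being built.
    """
    pairs = []
    retires = stalls = 0
    for cycle in trace.split():
        if cycle == "R":
            if stalls:
                # A retirement after one or more stalls closes the current pair.
                pairs.append((retires, stalls))
                retires = stalls = 0
            retires += 1
        else:
            stalls += 1
    return pairs, (retires, stalls)


# Hypothetical trace: 2 retirements, 3 stalls, 1 retirement, 1 stall, 3 retirements.
pairs, still_open = retirement_deltas("R R S S S R S R R R")
print(pairs, still_open)  # → [(2, 3), (1, 1)] (3, 0)
```

The last pair stays open because the number of stall cycles that follow it is not yet known.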


Retirement Delta Trace

• The key is to represent a sequence of count values efficiently

• Various encoding formats were investigated

• Elias delta [2] is the most efficient overall, but for values under 32 Elias gamma has the same or better efficiency

• The values being coded are cycle counts, so large numbers will be less frequent and coding efficiency for them is less important

• Gamma will therefore result in more efficient coding when it matters

• Amortising short stalls below some threshold will also improve efficiency

[2] https://commons.wikimedia.org/wiki/File:Fibonacci,_Elias_Gamma,_and_Elias_Delta_encoding_schemes.GIF
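The crossover at 32 can be checked directly from the code lengths. This is a minimal sketch using the textbook definitions of Elias gamma and Elias delta; the function names are illustrative and not taken from the slides.

```python
def gamma_bits(n: int) -> int:
    """Length in bits of the Elias gamma code for n >= 1."""
    return 2 * (n.bit_length() - 1) + 1


def delta_bits(n: int) -> int:
    """Length in bits of the Elias delta code for n >= 1.

    Delta writes the bit length of n in gamma, then the bits of n below its
    leading one.
    """
    length = n.bit_length()
    return gamma_bits(length) + (length - 1)


for n in (1, 2, 4, 8, 16, 31, 32, 64, 1000):
    print(f"{n:5d}  gamma={gamma_bits(n):2d}  delta={delta_bits(n):2d}")
```

For every value from 1 to 31 the gamma code is no longer than the delta code; from 32 upwards delta is shorter (10 bits against 11 at n = 32).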


Current Hart to Trace Encoder Interface

Signal                  Function
iretire                 Instruction has retired
itype [ITYPELEN-1:0]    Type of instruction
cause [CAUSELEN-1:0]    Exception cause
tval [XLEN-1:0]         Exception data
priv [PRIVLEN-1:0]      Privilege mode during execution
iaddr [XLEN-1:0]        The address of the instruction

This shows the mandatory interface signals.

Optional information can include the instruction context.

The interface supports side-band signals, for example, to indicate core is in reset, halted, or stalled.

[Diagram blocks: CPU Core(s), Trace Encoder, Debug Infrastructure, Trace Decoder (SW)]


Extra Hart to Trace Encoder Interface Signals

Optional for multi-retirement CPUs for improved efficiency

Signal                    Function
caretire[CARETIRE-1:0]    Number of half code words retired in this block


Trace Encoder Output

• Packets comprise packed groups of count tokens. Five token types are defined:

1. Token A – Elias gamma encoded count of contiguous retire or amortised stall cycles

2. Token B – Elias gamma encoded count of missed retirement opportunities

3. Token C – Elias gamma encoded count of unamortised contiguous stall cycles

4. Token D – Count of retirements in a single cycle (usually binary, though Elias gamma is an option)

5. Token E – Elias gamma encoded count of the number of cycles to the next non-stall cycle

• Tokens are assembled into three group formats:

• {A, C}

• {A, B, C}

• {D, E}

• This gives the most efficient packing, reducing the data routed on-chip and subsequently transported off-chip
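To make the token coding concrete, here is a minimal sketch of textbook Elias gamma encoding and decoding, with a helper that concatenates the tokens of one group. The slides do not specify bit order or packet framing, so the string-of-bits representation and the names `gamma_encode`, `gamma_decode` and `pack_group` are assumptions made for illustration.

```python
def gamma_encode(n: int) -> str:
    """Elias gamma code of n >= 1 as a string of '0'/'1' characters."""
    if n < 1:
        raise ValueError("Elias gamma codes positive integers only")
    binary = bin(n)[2:]
    # Prefix with one zero per bit that follows the leading one.
    return "0" * (len(binary) - 1) + binary


def gamma_decode(bits: str, pos: int = 0):
    """Decode one value starting at bits[pos]; return (value, next position)."""
    zeros = 0
    while bits[pos + zeros] == "0":
        zeros += 1
    end = pos + 2 * zeros + 1
    return int(bits[pos + zeros:end], 2), end


def pack_group(*counts: int) -> str:
    """Concatenate the gamma codes of the tokens in one group."""
    return "".join(gamma_encode(c) for c in counts)


group = pack_group(4, 1)          # an {A, C} group with A=4, C=1
print(group, len(group))          # → 001001 6
a, pos = gamma_decode(group)
c, pos = gamma_decode(group, pos)
print(a, c)                       # → 4 1
```

Because each gamma code is self-delimiting, the groups can be packed back to back without separators.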


Trace Encoder Output – single retirement example

• Basic retirement delta for single retirement harts

• Uses packed groups of {A, C} tokens

R R R R S R R R S R R S S S S R

(R = retire, S = stall)

Group 1: A=4, C=1 (6 bits)

Group 2: A=3, C=1 (4 bits)

Group 3: A=2, C=4 (8 bits)

• Three groups, 18 bits in total (2 bits per instruction)
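The bit counts in this example follow from the Elias gamma code length, 2⌊log2 n⌋ + 1 bits for a value n. A short check, with the three groups copied from the slide and an illustrative helper name:

```python
def gamma_bits(n: int) -> int:
    """Length in bits of the Elias gamma code for n >= 1."""
    return 2 * (n.bit_length() - 1) + 1


groups = [(4, 1), (3, 1), (2, 4)]                 # (A, C) pairs from the example
costs = [gamma_bits(a) + gamma_bits(c) for a, c in groups]
total = sum(costs)
retired = sum(a for a, _ in groups)               # one instruction per retire cycle

print(costs, total, total / retired)              # → [6, 4, 8] 18 2.0
```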


Trace Encoder Output – stall amortisation example

• Amortise single stalls with retirements

• Uses groups of {A, B, C} tokens

R R R R S R R R S R R S S S S R

(R = retire, S = stall)

Group 1: A=12, B=3, C=3 (13 bits)

• One group, 13 bits in total – more efficient (1.4 bits per instruction)

• Precise location of single stalls not known – less accurate
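The slide does not spell out the amortisation rule. One reading that reproduces its numbers is: the first `threshold` cycles of every stall run are folded into A and counted in B, and any remaining stall cycles go into C and close the group. The sketch below implements that assumed rule for a single-retirement hart; the function name and the `threshold` parameter are hypothetical.

```python
def amortised_groups(trace: str, threshold: int = 1):
    """Build {A, B, C} groups, assuming the first `threshold` cycles of each
    stall run are amortised (an inferred rule, not stated in the slides)."""
    groups = []
    a = b = c = 0
    stall_run = 0
    for cycle in trace.split():
        if cycle == "S":
            stall_run += 1
            if stall_run <= threshold:
                a += 1          # amortised stall cycle counts towards A...
                b += 1          # ...and is recorded as a missed retirement
            else:
                c += 1          # unamortised stall cycle
        else:
            if c:
                # A retirement after unamortised stalls closes the group.
                groups.append((a, b, c))
                a = b = c = 0
            stall_run = 0
            a += 1
    return groups


trace = "R R R R S R R R S R R S S S S R"
print(amortised_groups(trace))               # → [(12, 3, 3)]
```

With `threshold=0` nothing is amortised and the same trace splits into three groups with A and C equal to the single retirement example.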


Trace Encoder Output – multi retirement example

• Basic retirement delta for multi retirement harts (4 instructions per cycle in this example)

• Uses packed groups of {A, B, C} tokens

R1 R4 R3 R2 S R4 R4 R4 S R1 R1 S S S S R

(Rn = retire n instructions, S = stall)

Group 1: A=4, B=6, C=1 (11 bits)

Group 2: A=3, B=0, C=1 (5 bits)

Group 3: A=2, B=6, C=4 (13 bits)

• Three groups, 29 bits in total (1.2 bits per instruction)

• With single stalls amortised: A=12, B=24, C=3; 19 bits (0.8 bits per instruction)
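In the multi-retirement case, B counts the retirement slots left unused during the A retire cycles. A minimal sketch of that bookkeeping follows, assuming each cycle is described by the number of instructions it retired (0 for a stall) and that the final unlabelled `R` retires one instruction; both representations and the function name are illustrative. Bit counts are left out because the slides do not say how B=0 is coded.

```python
def multi_retire_groups(retired_per_cycle, width):
    """Build unamortised {A, B, C} groups for a hart retiring up to `width`
    instructions per cycle."""
    groups = []
    a = b = c = 0
    for n in retired_per_cycle:
        if n == 0:
            c += 1                      # stall cycle
        else:
            if c:
                # A retirement after stalls closes the group.
                groups.append((a, b, c))
                a = b = c = 0
            a += 1                      # one more retire cycle
            b += width - n              # unused retirement slots this cycle
    return groups


# R1 R4 R3 R2 S R4 R4 R4 S R1 R1 S S S S R (last R assumed to retire 1)
cycles = [1, 4, 3, 2, 0, 4, 4, 4, 0, 1, 1, 0, 0, 0, 0, 1]
groups = multi_retire_groups(cycles, width=4)
instructions = sum(4 * a - b for a, b, _ in groups)
print(groups, instructions)  # → [(4, 6, 1), (3, 0, 1), (2, 6, 4)] 24
```

The 24 instructions recovered from the groups are the denominator behind the slide's bits-per-instruction figures.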


Trace Encoder Output – intra-cycle example

• Per cycle detail for multi retirement harts (4 instructions per cycle in this example)

• Uses packed groups of {D, E} tokens

R1 R4 R3 R2 S R4 R4 R4 S R1 R1 S S S S R

(Rn = retire n instructions, S = stall)

G1: D=1, E=1 (3 bits)

G2: D=4, E=1 (3 bits)

G3: D=3, E=1 (3 bits)

G4: D=2, E=2 (5 bits)

G5: D=4, E=1 (3 bits)

G6: D=4, E=1 (3 bits)

G7: D=4, E=2 (5 bits)

G8: D=1, E=1 (3 bits)

G9: D=1, E=5 (7 bits)

• Ten groups, 30 bits in total (1.25 bits per instruction)

• Highest precision detail of retirement activity
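The {D, E} pairs can be derived mechanically: D is the number of instructions retired in a cycle and E is the distance in cycles to the next retiring cycle. A minimal sketch, assuming the same per-cycle list representation as an illustration (0 for a stall, final `R` taken as one instruction); the function name is hypothetical.

```python
def de_groups(retired_per_cycle):
    """Return (D, E) pairs: instructions retired in a cycle and the number of
    cycles until the next non-stall cycle."""
    retiring = [i for i, n in enumerate(retired_per_cycle) if n > 0]
    # The last retiring cycle yields no pair here: its E is not yet known.
    return [(retired_per_cycle[i], j - i) for i, j in zip(retiring, retiring[1:])]


# R1 R4 R3 R2 S R4 R4 R4 S R1 R1 S S S S R (last R assumed to retire 1)
cycles = [1, 4, 3, 2, 0, 4, 4, 4, 0, 1, 1, 0, 0, 0, 0, 1]
for d, e in de_groups(cycles):
    print(f"D={d} E={e}")
```

Run on the example trace, this yields the nine listed pairs, from D=1, E=1 through to D=1, E=5.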


Retirement Delta Trace Algorithm

• Shows the {A,C} token case

• Some boundary conditions omitted for clarity

• {A,B,C} and {D,E} cases are similar

[Flowchart. Decisions: Retired?, C_Count > 0, Room in Packet? Actions: A_Count++, C_Count++, Convert counts to Elias, Add group to packet, Send packet, Init new packet, Reset counts to 0]
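One way to read the flowchart for the {A,C} case is sketched below. The ordering of the steps is an interpretation of the node labels, and the class name, the `packet_bits` payload size and the list used to collect sent packets are assumptions made for illustration. As on the slide, boundary conditions are omitted: the trace is assumed to start with a retirement.

```python
def gamma_encode(n: int) -> str:
    """Elias gamma code of n >= 1 as a bit string."""
    binary = bin(n)[2:]
    return "0" * (len(binary) - 1) + binary


class DeltaTraceEncoder:
    """Per-cycle {A, C} encoder following one reading of the flowchart."""

    def __init__(self, packet_bits: int = 32):   # hypothetical payload size
        self.packet_bits = packet_bits
        self.a_count = 0
        self.c_count = 0
        self.packet = ""
        self.sent = []                            # stands in for "Send packet"

    def clock(self, retired: bool) -> None:
        if not retired:
            self.c_count += 1                     # Retired? N -> C_Count++
            return
        if self.c_count > 0:                      # a group is complete
            group = gamma_encode(self.a_count) + gamma_encode(self.c_count)
            if len(self.packet) + len(group) > self.packet_bits:
                self.sent.append(self.packet)     # no room: send, init new packet
                self.packet = ""
            self.packet += group                  # add group to packet
            self.a_count = self.c_count = 0       # reset counts to 0
        self.a_count += 1                         # A_Count++


enc = DeltaTraceEncoder(packet_bits=12)
for cycle in "R R R R S R R R S R R S S S S R".split():
    enc.clock(cycle == "R")
print(enc.sent, enc.packet)   # → ['0010010111'] 01000100
```

With a 12-bit payload the first two groups (6 and 4 bits) share a packet, and the 8-bit third group starts a new one.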


Trace Control

• Controlling when trace is generated is important.

• Helps reduce the volume of trace data

• Filters are required.

• Using filters, the following trace examples are available:

  • Trace in an address range

  • Start trace at one address and end trace at another

  • Trace a particular privilege level

  • Trace interrupt service routines

• Other examples

  • Trace for a fixed period of time

  • Start trace when an external (to the encoder) event is detected

  • Stop trace when an external (to the encoder) event is detected
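A filter of the first and third kind can be pictured as a predicate over the `iaddr` and `priv` interface signals. The sketch below is purely illustrative: the slides do not describe how filters are configured, so the `TraceFilter` class, its fields and the example addresses are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TraceFilter:
    """Hypothetical filter: trace only inside an address range and,
    optionally, only at one privilege level."""
    addr_lo: int = 0
    addr_hi: int = 2**64 - 1
    priv: Optional[int] = None        # None means any privilege level

    def matches(self, iaddr: int, priv: int) -> bool:
        in_range = self.addr_lo <= iaddr <= self.addr_hi
        return in_range and (self.priv is None or priv == self.priv)


# Trace a 4 KiB region, machine mode only (RISC-V encodes M-mode as 3).
flt = TraceFilter(addr_lo=0x8000_0000, addr_hi=0x8000_0FFF, priv=3)
print(flt.matches(0x8000_0010, 3))    # → True
print(flt.matches(0x8000_1000, 3))    # → False
print(flt.matches(0x8000_0010, 0))    # → False
```

Retirement counts would then only be accumulated for cycles where the predicate holds, which is what reduces the trace volume.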



Holistic System


Monitoring, Reliability, and Security for the Whole SoC

[Block diagram. Labels: xtensa; DSP; GPU; DRAM controller; Custom Logic; System Memory; USB 3 controller; ETH controller; Interconnect (AXI, ACE, ACE-lite, OCP, CHI NoC); BusMon; Trace Receiver; PAM; Trace Encoder; Static Instrumentation; DMA; Status Monitor; Bus Sentinel; Lockstep; Message Engine; AXI Comm; JTAG Comm; USB 2 Comm; Aurora Comm; EBC Comm; Buffer; Analysis Subsystem. Captions: Portfolio of Analytic Modules; Family of Communicators; Flexible & Scalable Message Fabric. Legend: System Block; UltraSoC IP; UltraSoC IP With Lock]

4 December 2019 Commercial in Confidence UL-003131-PT


Summary

• Today’s systems are too hard to understand with yesterday’s tools

• The systems must do what they were designed to do, safely and securely

• Understanding system behaviour is key, both in-field and in real time

• One dimension of this is understanding where and how CPUs stall

• An encoder providing this has been presented

• A number of different filtering and triggering schemes are supported

• Coupling this with a holistic non-intrusive monitoring infrastructure provides the means of understanding complete SoC behaviour