Cache Coherence Support for Non-Shared Bus Architecture on Heterogeneous MPSoCs

Taeweon Suh §, Daehyun Kim †, and Hsien-Hsin S. Lee §
§ Georgia Institute of Technology, † Intel Corporation
June 15, 2005



Page 1: Cache Coherence Support for Non-Shared Bus Architecture on Heterogeneous MPSoCs

§ Georgia Institute of Technology, † Intel Corporation

Taeweon Suh §, Daehyun Kim †, and Hsien-Hsin S. Lee §

June 15, 2005

Page 2: MPSoCs

Time-to-market, flexibility, low cost
– Processors share a memory interface to reduce pin count
– However, a shared-bus architecture hinders the versatility provided by each processor
– Alternative: non-shared bus architecture

Real-time property
– Communication between processors

[Figure: heterogeneous MPSoC block diagram: uP, DSP, wireless IP, ADC, and other IP blocks, with a memory controller and SDRAM memory]

Page 3: Introduction

Cache coherence
– Well-known technique for maintaining data consistency in multiprocessor systems

Protocol states: Modified, Exclusive, Owned, Shared, Invalid (MOESI)

[Figure: processors P0 and P1, each with a MOESI data cache (D$), share a memory that holds the value 1234]

Example operation sequence (both cache lines start Invalid):
– P0: read: P0 loads 1234 in E; P1 stays I
– P1: read: the shared signal is asserted; P0 and P1 both hold 1234 in S
– P1: write (abcd): P1 invalidates P0's copy (P0 goes to I) and holds abcd in M
– P0: read: P1 supplies abcd cache-to-cache and goes to O; P0 holds abcd in S; memory still holds 1234
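The example sequence above can be replayed with a small model. The sketch below is a minimal, hypothetical Python model of one cache line shared by MOESI data caches on a snooping bus. The class and method names are illustrative only, a line holds a single value, and a write miss is simplified to an invalidating request that replaces the whole value (real hardware would fetch the line first).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CacheLine:
    state: str = "I"             # one of M, O, E, S, I
    data: Optional[str] = None   # cached value (may be stale once invalidated)


class MoesiBus:
    """Hypothetical model: one cache line shared by several MOESI caches on a snooping bus."""

    def __init__(self, n_caches: int, mem_value: str) -> None:
        self.caches: List[CacheLine] = [CacheLine() for _ in range(n_caches)]
        self.memory = mem_value

    def _others(self, p: int) -> List[CacheLine]:
        # Valid copies held by caches other than p
        return [c for i, c in enumerate(self.caches) if i != p and c.state != "I"]

    def read(self, p: int) -> Optional[str]:
        line = self.caches[p]
        if line.state != "I":
            return line.data     # read hit: no bus transaction
        others = self._others(p)
        owner = next((c for c in others if c.state in ("M", "O")), None)
        if owner is not None:
            owner.state = "O"    # dirty owner supplies the data cache-to-cache; memory stays stale
            line.data = owner.data
        else:
            line.data = self.memory
        for c in others:
            if c.state == "E":
                c.state = "S"    # a clean exclusive copy becomes shared
        # The shared signal is asserted when any other cache holds a copy
        line.state = "S" if others else "E"
        return line.data

    def write(self, p: int, value: str) -> None:
        line = self.caches[p]
        if line.state not in ("E", "M"):
            # I, S, or O: broadcast an invalidation so every other copy becomes I
            for c in self._others(p):
                c.state = "I"
        # E -> M is a silent upgrade; M stays M
        line.state = "M"
        line.data = value


if __name__ == "__main__":
    bus = MoesiBus(2, "1234")
    bus.read(0)             # P0: read
    bus.read(1)             # P1: read
    bus.write(1, "abcd")    # P1: write (abcd)
    bus.read(0)             # P0: read
    print([(c.state, c.data) for c in bus.caches], bus.memory)
    # [('S', 'abcd'), ('O', 'abcd')] 1234
```

The final state matches the slide: P1 keeps ownership in O, so memory is never updated during the sequence.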

Page 4: Previous Work

Integration techniques for shared-bus-based platforms [1][2][3]

[Figure: shared-signal assertion. Proc 0 (MSI) and Proc 1 (MESI) connect to a single bus through Wrapper 0 and Wrapper 1, together with the memory controller; the shared signal is asserted so that the MESI processor never enters the E state, which MSI lacks]

[Figure: read-to-write conversion. Proc 0 (MEI) and Proc 1 (MESI) connect to a single bus through Wrapper 0 and Wrapper 1; a snooped read is converted into a write on the MESI side, so its copy is invalidated instead of entering the S state, which MEI lacks]

[Figure: snoop-hit buffer. Proc 0 (MEI) and Proc 1 (MESI) connect to a single bus through Wrapper 0 and Wrapper 1; on a snoop hit to a modified line, the line is written back into a single-cache-line snoop-hit buffer, which serves the pending read and later writes the line to memory]

[1] Taeweon Suh, Douglas M. Blough, and Hsien-Hsin S. Lee, "Supporting cache coherence in heterogeneous multiprocessor systems," in DATE'04, Feb. 2004.
[2] Taeweon Suh, Hsien-Hsin S. Lee, and Douglas M. Blough, "Integrating cache coherence protocols for heterogeneous multiprocessor systems, Part 1," IEEE Micro, July/August 2004.
[3] Taeweon Suh, Hsien-Hsin S. Lee, and Douglas M. Blough, "Integrating cache coherence protocols for heterogeneous multiprocessor systems, Part 2," IEEE Micro, September/October 2004.
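The wrapper techniques above amount to deciding, per processor, which states must be suppressed so that all caches follow one common protocol. The sketch below is a hypothetical Python model, not code from [1][2][3]. It assumes the integrated protocol keeps only the states that every processor supports, and it covers only missing E and S states; handling a missing O state is not modeled. The function names are illustrative.

```python
from __future__ import annotations

# States supported by each invalidation-based protocol (frozenset("MEI") == {"M", "E", "I"})
PROTOCOL_STATES = {
    "MEI": frozenset("MEI"),
    "MSI": frozenset("MSI"),
    "MESI": frozenset("MESI"),
    "MOESI": frozenset("MOESI"),
}


def integrated_states(*protocols: str) -> frozenset[str]:
    """Assumption: the integrated protocol is the set of states common to all processors."""
    return frozenset.intersection(*(PROTOCOL_STATES[p] for p in protocols))


def wrapper_actions(local: str, integrated: frozenset[str]) -> list[str]:
    """Return what a (hypothetical) wrapper must do for a processor using protocol `local`."""
    own = PROTOCOL_STATES[local]
    actions = []
    if "E" in own and "E" not in integrated:
        # Keep lines out of E, e.g. so that a MESI cache loads read misses in S
        actions.append("assert shared signal on read misses")
    if "S" in own and "S" not in integrated:
        # Keep lines out of S: a snooped read is presented as a write, so the copy is invalidated
        actions.append("convert snooped reads to writes")
    return actions


if __name__ == "__main__":
    for pair in (("MSI", "MESI"), ("MEI", "MESI")):
        common = integrated_states(*pair)
        for proc in pair:
            print(pair, proc, sorted(common), wrapper_actions(proc, common))
```

Under this assumption the MSI + MESI pair needs only shared-signal assertion on the MESI side, and the MEI + MESI pair needs only read-to-write conversion on the MESI side, which matches the processor pairs shown in the first two figures.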

Page 5: Proposal

Cache Coherence-enforced Memory Controller (ccMC) for non-shared-bus-based MPSoCs
– Bypass approach
– Bookkeeping approach

Integration of invalidation-based protocols such as MEI, MSI, MESI, and MOESI

[Figure: MPSoC in which Proc 0 (MESI) and Proc 1 (MEI) sit on separate buses (Bus 0, Bus 1), both connected to the ccMC in front of memory]

Page 6: Bypass Approach

Blindly pass bus transactions to the other bus if they fall in the shared address range
Very inexpensive in terms of silicon area

[Figure: MPSoC with Proc 0 (MESI) and Proc 1 (MEI) on separate buses (Bus 0, Bus 1), connected through the ccMC to memory]

[Figure: ccMC internals for the bypass approach: a comparator checks the bus-request address against Start_addr_reg and Range_reg, a mux selects between the requests from Bus 0 and Bus 1, and a snoop-hit buffer is included]
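The bypass decision described above reduces to a range comparison. The sketch below is a hypothetical Python model of that comparator for exactly two buses; the class name is illustrative, and it assumes Range_reg holds the size of the shared region in bytes (it could equally hold an end address in real hardware).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BypassFilter:
    """Hypothetical model of the ccMC bypass logic for two buses (Bus 0 and Bus 1)."""
    start_addr: int   # models Start_addr_reg
    range_size: int   # models Range_reg; assumption: size of the shared region in bytes

    def in_shared_range(self, addr: int) -> bool:
        # The comparator: start_addr <= addr < start_addr + range_size
        return self.start_addr <= addr < self.start_addr + self.range_size

    def forward_to(self, src_bus: int, addr: int) -> Optional[int]:
        """Return the bus the transaction is replayed on, or None if it stays local."""
        if src_bus not in (0, 1):
            raise ValueError("this sketch models exactly two buses")
        if self.in_shared_range(addr):
            return 1 - src_bus   # blindly pass it to the other processor's bus
        return None              # private data: only the memory access is performed


if __name__ == "__main__":
    ccmc = BypassFilter(start_addr=0x1000_0000, range_size=0x1_0000)
    print(ccmc.forward_to(0, 0x1000_0040))   # 1: shared, replayed on Bus 1
    print(ccmc.forward_to(1, 0x2000_0000))   # None: private
```

Because every in-range transaction is replayed regardless of whether the other cache holds the line, the hardware is little more than a comparator and a mux; the price is snoop traffic on the other bus that may turn out to be unnecessary.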

Page 7: Bookkeeping Approach

Selectively pass bus transactions that fall in the shared address range, based on the recorded state of each processor's copy
Expensive compared to the bypass approach

[Figure: MPSoC with Proc 0 (MESI) and Proc 1 (MEI) on separate buses (Bus 0, Bus 1), connected through the ccMC to memory]

[Figure: ccMC internals for the bookkeeping approach: Start_addr_reg and Range_reg identify addresses inside the shared range; for each such line a table records its states in P0 and P1 (for example I/I, S/I, S/S, M/I); a bus request is issued if the line is in M in the other processor; a snoop-hit buffer is included]
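The selective forwarding above can be sketched as a per-line state table. The Python model below is hypothetical: the class, the `Op` values, and the 32-byte line size are illustrative assumptions. It also assumes both processors support the S state (as on platform 2, MSI + MESI) and that E is prohibited by asserting the shared signal; an MEI processor would drop to I instead of S.

```python
from enum import Enum
from typing import Dict, List, Optional


class Op(Enum):
    READ = "read"     # read miss seen on the requester's bus
    WRITE = "write"   # write miss or invalidate/upgrade seen on the requester's bus


class Bookkeeper:
    """Hypothetical model of the ccMC bookkeeping table for two processors (P0, P1)."""

    def __init__(self, start_addr: int, range_size: int, line_size: int = 32) -> None:
        self.start_addr = start_addr
        self.range_size = range_size
        self.line_size = line_size
        # line address -> [state of the line in P0, state in P1]
        self.table: Dict[int, List[str]] = {}

    def access(self, proc: int, op: Op, addr: int) -> Optional[str]:
        """Record a bus transaction from `proc`; return what is forwarded to the other bus."""
        if not (self.start_addr <= addr < self.start_addr + self.range_size):
            return None                       # outside the shared range: never forwarded
        line = addr - addr % self.line_size
        states = self.table.setdefault(line, ["I", "I"])
        other = 1 - proc
        forwarded = None
        if op is Op.READ:
            if states[other] == "M":
                forwarded = "bus request"     # force the owner to write the line back
                states[other] = "S"
            states[proc] = "S"                # E is prohibited (shared signal asserted)
        else:
            if states[other] != "I":
                forwarded = "invalidate"      # the other copy must be invalidated
                states[other] = "I"
            states[proc] = "M"
        return forwarded


if __name__ == "__main__":
    ccmc = Bookkeeper(start_addr=0x1000, range_size=0x1000)
    print(ccmc.access(1, Op.READ, 0x1040))    # None: P0 holds no copy
    print(ccmc.access(1, Op.WRITE, 0x1040))   # None: P0 still holds no copy
    print(ccmc.access(0, Op.READ, 0x1040))    # bus request: P1 holds the line in M
    print(ccmc.table[0x1040])                 # ['S', 'S']
```

In this three-step run only the last transaction crosses to the other bus, whereas the bypass approach would replay all three; the table storage and lookup are what make the bookkeeping approach more expensive in silicon area.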

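Page 8: Example

Bookkeeping approach

[Figure: MPSoC with Proc 0 (MSI) and Proc 1 (MESI) on separate buses (Bus 0, Bus 1), connected through the ccMC to memory; the ccMC table holds the line's states in P0 and P1, initially I/I; memory holds 1234]

Example operation sequence (ccMC table entries shown as P0/P1):
– P1: read: the ccMC asserts the shared signal, so P1 loads 1234 in S; table I/S
– P1: write (abcd): P1 issues an invalidate on its bus to move from S to M; nothing is forwarded because P0 holds no copy; table I/M
– P0: read: the ccMC issues a bus request (Breq) on P1's bus; P1 writes back abcd (memory changes from 1234 to abcd) and moves to S; P0 loads abcd in S; table S/S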

Page 9: Integration with a Processor without Coherence Support

A processor without coherence support behaves like an MEI processor without snooping, so the integrated protocol is MEI-like
An interrupt is used to inform that processor of possible snoop hits

[Figure: MPSoC with Proc 0 (MESI) and Proc 1 (no hardware coherence support) on separate buses, connected through the ccMC to memory; the ccMC drives an IRQ line to Proc 1]
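The interrupt mechanism above can be sketched as follows. This is a hypothetical Python model, not the authors' design: it assumes the ccMC interrupts the non-coherent processor on every coherent-side access to the shared range (it cannot see that processor's cache, so every such access is a possible snoop hit), that a software interrupt handler writes the named line back if dirty and then invalidates it, and that the coherent request waits for the handler. The class names and the `irq_addr` register are illustrative.

```python
from typing import Dict, Optional, Tuple


class NonCoherentCache:
    """Hypothetical write-back cache with MEI-like states and no snooping hardware."""

    def __init__(self) -> None:
        self.lines: Dict[int, Tuple[str, str]] = {}   # address -> (state, data)

    def write(self, addr: int, data: str) -> None:
        # Write-back cache: the line becomes dirty (M) and memory is not updated
        self.lines[addr] = ("M", data)

    def isr_clean_invalidate(self, addr: int, memory: Dict[int, str]) -> None:
        """Software interrupt handler: write the line back if dirty, then invalidate it."""
        entry = self.lines.pop(addr, None)
        if entry is not None and entry[0] == "M":
            memory[addr] = entry[1]


class InterruptingCcMC:
    """Hypothetical ccMC front end that interrupts the non-coherent processor on possible snoop hits."""

    def __init__(self, start_addr: int, range_size: int,
                 memory: Dict[int, str], peer: NonCoherentCache) -> None:
        self.start_addr = start_addr
        self.range_size = range_size
        self.memory = memory
        self.peer = peer
        self.irq_addr: Optional[int] = None   # hypothetical register read by the ISR
        self.irq_count = 0

    def read_from_coherent_proc(self, addr: int) -> str:
        if self.start_addr <= addr < self.start_addr + self.range_size:
            # Possible snoop hit: the ccMC cannot see the peer's cache, so it always interrupts
            self.irq_addr = addr
            self.irq_count += 1
            # Raising the IRQ is modeled as a synchronous call; the read waits for the ISR
            self.peer.isr_clean_invalidate(self.irq_addr, self.memory)
            self.irq_addr = None
        return self.memory[addr]


if __name__ == "__main__":
    memory = {0x1000: "1234", 0x8000: "beef"}
    peer = NonCoherentCache()
    ccmc = InterruptingCcMC(0x1000, 0x1000, memory, peer)
    peer.write(0x1000, "abcd")                      # dirty only in the non-coherent cache
    print(ccmc.read_from_coherent_proc(0x1000))     # abcd, after one interrupt
    print(ccmc.read_from_coherent_proc(0x8000))     # beef, private range: no interrupt
    print(ccmc.irq_count)                           # 1
```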

Page 10: Simulation Model

Atalanta RTOS [4]
– Home-grown RTOS at Georgia Tech
– Designed for heterogeneous multiprocessor SoCs

Atalanta kernel simulation
– Task insertion/deletion
– Tasks are managed in TCBs (Task Control Blocks)
– TCBs are connected through a doubly-linked list
– Each processor's TCBs are accessible by the other processor
– When a system object such as a semaphore becomes ready, the highest-priority TCB waiting for it is updated

[4] Di-Shi Sun, Douglas M. Blough, and Vincent J. Mooney, "A New Multiprocessor RTOS Kernel for System-on-a-Chip Applications," Technical Report GIT-CC-02-09, CERCS.
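The kernel workload above manipulates TCBs linked in a doubly-linked list that both processors can reach, so each insertion, deletion, or wake-up rewrites shared data, which is presumably what generates the coherence traffic in the simulations. A minimal Python sketch of such a list follows; the field names, the priority convention (a larger number means higher priority), and the `wake_highest` helper are hypothetical and are not taken from Atalanta.

```python
from dataclasses import dataclass, field
from typing import Iterator, Optional


@dataclass(eq=False)   # compare TCBs by identity, not by field values
class TCB:
    task_id: int
    priority: int                        # assumption: a larger number means higher priority
    waiting_on: Optional[str] = None     # name of the system object (e.g. a semaphore) awaited
    prev: Optional["TCB"] = field(default=None, repr=False)
    next: Optional["TCB"] = field(default=None, repr=False)


class TCBList:
    """Doubly-linked list of TCBs (hypothetical layout, not Atalanta's actual structures)."""

    def __init__(self) -> None:
        self.head: Optional[TCB] = None
        self.tail: Optional[TCB] = None

    def insert(self, tcb: TCB) -> None:
        # Task insertion: append at the tail
        tcb.prev, tcb.next = self.tail, None
        if self.tail is None:
            self.head = tcb
        else:
            self.tail.next = tcb
        self.tail = tcb

    def delete(self, tcb: TCB) -> None:
        # Task deletion: unlink by patching the neighbours' pointers
        if tcb.prev is None:
            self.head = tcb.next
        else:
            tcb.prev.next = tcb.next
        if tcb.next is None:
            self.tail = tcb.prev
        else:
            tcb.next.prev = tcb.prev
        tcb.prev = tcb.next = None

    def __iter__(self) -> Iterator[TCB]:
        node = self.head
        while node is not None:
            yield node
            node = node.next

    def wake_highest(self, obj: str) -> Optional[TCB]:
        """When `obj` becomes ready, update the highest-priority TCB waiting for it."""
        waiting = [t for t in self if t.waiting_on == obj]
        if not waiting:
            return None
        best = max(waiting, key=lambda t: t.priority)
        best.waiting_on = None
        return best
```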

Page 11: Simulation Environment

Processors
– Platform 1: PPC755 (MEI) + ARM9 with MESI
– Platform 2: ARM9 with MSI + ARM9 with MESI

Simulators: Seamless CVE + ModelSim

[Figure: simulated system with Proc 0 and Proc 1 on separate buses (Bus 0, Bus 1), connected through the ccMC to memory, together with DMA0, DMA1, 100 Mbps Ethernet, and a 320x240 LCD controller]

Page 12: Simulation Results

Bypass approach: 2 tasks on each processor

[Figure: speedup over the software solution (y-axis, 1.2 to 2.6) versus miss penalty in cycles (x-axis, 10 to 45) for platform 1 (MEI-MESI) and platform 2 (MSI-MESI), each with the bypass approach alone and with a snoop-hit buffer]

Page 13: Simulation Results

Bypass approach: 32 tasks on each processor

[Figure: speedup over the software solution (y-axis, 3 to 7) versus miss penalty in cycles (x-axis, 10 to 45) for platform 1 (MEI-MESI) and platform 2 (MSI-MESI), each with the bypass approach alone and with a snoop-hit buffer]

Page 14: Simulation Results

Bookkeeping approach
– Platform 2, miss penalty of 14 cycles
– Microbenchmark simulation

[Figure: speedup over the bypass approach (y-axis, 0.96 to 1.12) versus bus utilization attempted by the DMAs in percent (x-axis, 0 to 100), with one curve per number of accessed cache lines (1, 2, 4, 8, 16, 32)]

Page 15: Conclusions

Proposed integration techniques for cache coherence on non-shared-bus-based MPSoCs
– Bypass approach, bookkeeping approach

Bypass approach
– Blindly passes shared memory operations
– Very cheap in terms of silicon area

Bookkeeping approach
– Selectively passes shared memory operations
– Expensive compared to the bypass approach

Effective solutions for communication as more and more heterogeneous processors are integrated on a single chip

Page 16: Questions, Comments?

Thanks for your attention!

Page 17: Backup Slides

Page 18: Motivation

Embedded systems increasingly require heterogeneous processors on a chip, according to application needs

Efficient communication is imperative to meet the real-time requirements of embedded applications

A shared-bus architecture using AMBA or CoreConnect compromises the versatility provided by each processor

Pin count restricts the use of a dedicated memory interface for each processor on SoCs
– Commercial MPSoCs such as TI's OMAP and Philips' Nexperia employ a non-shared bus architecture with a shared memory interface (Nexperia still to be checked)

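Page 19: Bookkeeping Approach (cont'd)

Problem with the E state

[Figure: MPSoC with Proc 0 (MSI) and Proc 1 (MESI) on separate buses (Bus 0, Bus 1), connected through the ccMC to memory; the ccMC table holds the line's states in P0 and P1, initially I/I; memory holds 1234]

Example operation sequence:
– P1: read: P1 loads 1234 in E, and the ccMC records E for P1
– P1: write: P1 moves silently from E to M and holds abcd; no bus transaction occurs, so the ccMC still records E
– P0: read: the ccMC sees no modified copy, so P0 reads the stale value 1234 from memory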

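Page 20: Bookkeeping Approach (cont'd)

Solution: prohibit the E state (shared-signal assertion)

[Figure: MPSoC with Proc 0 (MSI) and Proc 1 (MESI) on separate buses (Bus 0, Bus 1), connected through the ccMC to memory; the ccMC table holds the line's states in P0 and P1, initially I/I; memory holds 1234]

Example operation sequence (ccMC table entries shown as P0/P1):
– P1: read: the ccMC asserts the shared signal, so P1 loads 1234 in S; table I/S
– P1: write (abcd): P1 must issue an invalidate on its bus to move from S to M, so the ccMC records it; table I/M
– P0: read: the ccMC issues a bus request (Breq) on P1's bus; P1 writes back abcd (memory changes from 1234 to abcd) and moves to S; P0 loads abcd in S; table S/S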
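The E-state problem and its fix can be reproduced with a tiny model of the three-step sequence above. The Python function below is a hypothetical sketch, not the ccMC logic itself: it tracks a single line, updates the bookkeeping table only when a transaction appears on a bus, and uses the `assert_shared` flag (an illustrative name) to switch shared-signal assertion on or off.

```python
from typing import Dict, Tuple


def run_sequence(assert_shared: bool) -> Tuple[str, Dict[str, str]]:
    """Replay 'P1: read, P1: write (abcd), P0: read' on one line (hypothetical model).

    Returns the value P0 receives and the ccMC bookkeeping table.
    """
    memory = "1234"
    table = {"P0": "I", "P1": "I"}   # updated only by transactions the ccMC sees on a bus

    # P1: read miss. With the shared signal asserted, the MESI cache loads the line in S.
    p1_state = "S" if assert_shared else "E"
    table["P1"] = p1_state

    # P1: write (abcd). E -> M is silent; S -> M needs an invalidate that the ccMC observes.
    upgrade_visible = p1_state == "S"
    p1_data = "abcd"                 # P1's cache now holds abcd in M
    if upgrade_visible:
        table["P1"] = "M"

    # P0: read miss. The ccMC sends a bus request to P1 only if its table says M.
    if table["P1"] == "M":
        memory = p1_data             # P1 writes abcd back and drops to S
        table["P1"] = "S"
    table["P0"] = "S"
    return memory, table


if __name__ == "__main__":
    print(run_sequence(assert_shared=False))  # ('1234', {'P0': 'S', 'P1': 'E'}): stale data
    print(run_sequence(assert_shared=True))   # ('abcd', {'P0': 'S', 'P1': 'S'})
```

Without the shared signal, the silent E-to-M upgrade never reaches the ccMC, so P0 receives stale data; forcing S makes the upgrade visible as an invalidate.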

Page 21: Previous Work (cont'd)

Snoop-hit buffer [2][3]

Region-Based Cache Coherence (RBCC) [2][3]


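[Figure: snoop-hit buffer. Proc 0 (MEI) and Proc 1 (MESI) connect to a single bus through Wrapper 0 and Wrapper 1, together with the memory controller; on a snoop hit to a modified line, the line is written back into a single-cache-line snoop-hit buffer, which serves the pending read and later writes the line to memory]

[Figure: RBCC. Three processors, two with MESI and one with MEI, connect to a single bus through Wrappers 0 to 2, together with an RBCC unit and the memory controller; memory regions are labeled MESI and MEI]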
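The snoop-hit buffer in the first figure can be sketched as a one-entry write-back staging area. The Python model below is hypothetical: the class name is illustrative, and it assumes the buffer is retired to memory either when a new write-back arrives (it holds only one line) or when `drain` is called, for example when memory is idle.

```python
from typing import Dict, Optional, Tuple


class SnoopHitBuffer:
    """Hypothetical model of a single-cache-line snoop-hit buffer in front of memory."""

    def __init__(self, memory: Dict[int, str]) -> None:
        self.memory = memory
        self.entry: Optional[Tuple[int, str]] = None   # (line address, data) or empty

    def capture_writeback(self, addr: int, data: str) -> None:
        # A snoop hit on a modified line: the owner's write-back lands in the buffer
        self.drain()               # only one line fits, so retire any previous entry first
        self.entry = (addr, data)

    def read(self, addr: int) -> Tuple[str, str]:
        # The pending read is served from the buffer if it holds the line, else from memory
        if self.entry is not None and self.entry[0] == addr:
            return self.entry[1], "snoop-hit buffer"
        return self.memory[addr], "memory"

    def drain(self) -> None:
        # Later (assumed: when memory is idle) the buffered line is written to memory
        if self.entry is not None:
            addr, data = self.entry
            self.memory[addr] = data
            self.entry = None


if __name__ == "__main__":
    memory = {0x40: "1234"}
    buf = SnoopHitBuffer(memory)
    buf.capture_writeback(0x40, "abcd")   # P1 held 0x40 in M when P0's read was snooped
    print(buf.read(0x40))                 # ('abcd', 'snoop-hit buffer')
    buf.drain()
    print(memory[0x40])                   # abcd
```

The requester's read no longer waits for the write-back to reach memory, which is the latency the buffer is meant to hide; a single entry keeps the hardware cost to one cache line of storage.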