Tutorial06 Solution


  • 8/12/2019 Tutorial06 Solution


    Technische Universität München

    Chip Multicore Processors
    Tutorial 6

    Institute for Integrated Systems

    Theresienstr. 90

    Building N1
    www.lis.ei.tum.de

    S. Wallentowitz


    Task 6.1: Cache Misses

    Explain the 3 Cs of cache misses. How do they relate?


    3 Cs Model

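    Compulsory: the very first access to a data block. Such a miss cannot be avoided by a larger or more associative cache.

    Capacity: the cache is too small to hold all blocks the program uses, so a block is replaced and later missed again. The miss would not have happened with an unlimited cache.

    Conflict: the block was replaced earlier because other blocks mapped to the same set. The miss would not have happened with a fully associative cache of the same size.

    Relation: the model assigns every miss to exactly one class. Compulsory misses are the misses of an infinite cache, capacity misses are the additional misses of a fully associative cache of the given size, and conflict misses are the remaining misses caused by limited associativity.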
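One common way to measure the 3 Cs is to replay the same address trace through reference models: an infinite cache, a fully associative cache of the same size, and the real cache. The sketch below assumes the trace is already given as block numbers (addresses divided by the block size), a direct-mapped cache as the real cache, and an LRU fully associative cache as the reference; the function name classify_misses is hypothetical.

```python
from collections import OrderedDict


def classify_misses(block_addresses, num_blocks):
    """Classify each miss of a direct-mapped cache with num_blocks lines
    into compulsory, capacity, or conflict (3 Cs model)."""
    seen = set()                   # blocks referenced before (infinite cache)
    fully_assoc = OrderedDict()    # fully associative LRU cache, same capacity
    direct = [None] * num_blocks   # direct-mapped cache: one block per line
    counts = {"compulsory": 0, "capacity": 0, "conflict": 0}

    for block in block_addresses:
        # Reference model: fully associative cache with LRU replacement
        fa_hit = block in fully_assoc
        if fa_hit:
            fully_assoc.move_to_end(block)          # mark as most recently used
        else:
            if len(fully_assoc) >= num_blocks:
                fully_assoc.popitem(last=False)     # evict least recently used
            fully_assoc[block] = True

        # Real cache: direct-mapped, line chosen by block number modulo lines
        line = block % num_blocks
        dm_hit = direct[line] == block
        direct[line] = block

        if not dm_hit:
            if block not in seen:
                counts["compulsory"] += 1   # first reference ever
            elif not fa_hit:
                counts["capacity"] += 1     # even full associativity misses
            else:
                counts["conflict"] += 1     # only the limited mapping misses
        seen.add(block)
    return counts


# Blocks 0 and 4 map to the same line of a 4-line cache: conflict misses
print(classify_misses([0, 4, 0, 4], 4))
# Five distinct blocks do not fit into 4 lines: one capacity miss
print(classify_misses([0, 1, 2, 3, 4, 0], 4))
```

The first trace yields two compulsory and two conflict misses, the second five compulsory misses and one capacity miss.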


    Example: contribution of each of the Cs (figure not reproduced; (c) Rochester Institute of Technology)


    Task 6.2: Multilevel Caches and Separate Instruction and Data Caches

    In this tutorial we focus on the impact of the cache hierarchy on the performance of the system.

    a) You have a simple system with a processor core, no caches, and an external memory. The on-chip interconnect between the processor core and the memory requires 3 + x cycles to transfer x data words. The memory requires 46 cycles to access data. Using a benchmark, you find that on average every fifth instruction accesses data. Your processor has a CPI of 1.8.

    What is the CPI of your whole system?


    Simple System

    (Diagram: processor core connected directly to the memory)
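The arithmetic for a) can be sketched as follows. It assumes (this is not stated on the slide) that every instruction is fetched from the external memory as one word, that every fifth instruction additionally accesses one data word, and that all memory stall cycles simply add to the core's CPI of 1.8. Under these assumptions each access costs 46 + (3 + 1) = 50 cycles. The constant names are illustrative.

```python
BASE_CPI = 1.8
MEM_ACCESS = 46            # memory access latency in cycles
DATA_ACCESS_RATIO = 1 / 5  # every fifth instruction accesses data


def transfer_cycles(words):
    """On-chip interconnect: 3 + x cycles for x data words."""
    return 3 + words


# Without caches, each instruction fetch and each data access moves one word.
cycles_per_access = MEM_ACCESS + transfer_cycles(1)   # 46 + 4 = 50
accesses_per_instruction = 1 + DATA_ACCESS_RATIO      # 1 fetch + 0.2 data
cpi = BASE_CPI + accesses_per_instruction * cycles_per_access
print(f"CPI without caches: {cpi:.1f}")
```

With these assumptions the script reports a CPI of 61.8: the core spends almost all of its time waiting for memory.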


    b)

    You are using a direct-mapped cache of 32 kB with cache blocks of 4 words (32 bits each). The cache is accessed in one clock cycle. For your application you measure a miss rate of 5%. How does the CPI change?

    (Diagram: processor core → cache → memory)
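A sketch of the CPI computation for b), using the average memory access time (AMAT = hit time + miss rate × miss penalty). It assumes that a miss refills the whole 4-word block in one interconnect transfer (46 + (3 + 4) = 53 cycles), that instruction fetches and data accesses both go through the cache with the same 5% miss rate, and that the 1-cycle hit time is paid on every access on top of the base CPI. The names are illustrative.

```python
BASE_CPI = 1.8
MEM_ACCESS = 46                  # memory access latency in cycles
ACCESSES_PER_INSTR = 1 + 1 / 5   # instruction fetch + data every fifth instr.
BLOCK_WORDS = 4                  # 4 words of 32 bit per cache block


def transfer_cycles(words):
    """On-chip interconnect: 3 + x cycles for x data words."""
    return 3 + words


L1_HIT = 1           # cache access time in cycles
L1_MISS_RATE = 0.05
miss_penalty = MEM_ACCESS + transfer_cycles(BLOCK_WORDS)  # 46 + 7 = 53

amat = L1_HIT + L1_MISS_RATE * miss_penalty   # 1 + 0.05 * 53 = 3.65
cpi = BASE_CPI + ACCESSES_PER_INSTR * amat    # 1.8 + 1.2 * 3.65
print(f"miss penalty = {miss_penalty} cycles, AMAT = {amat:.2f}, CPI = {cpi:.2f}")
```

Under these assumptions the CPI drops to 6.18. If the 1-cycle hit time is instead considered part of the base CPI, drop L1_HIT from the AMAT and the result becomes 1.8 + 1.2 × 0.05 × 53 = 4.98.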


    c)

    Impressed by the improvement, you add a second instance of this cache. How does the CPI change?

    (Diagram: processor core → cache → cache → memory)
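A sketch for c), assuming both caches use the same indexing and block size and that the second cache is filled on every miss of the first. Under that assumption the second cache always holds exactly the blocks the first one holds, so it can never hit on a first-level miss: its local miss rate is 100% while the global miss rate stays at 5%. The 1-cycle access time of the second cache and the 53-cycle memory penalty are carried over as assumptions.

```python
BASE_CPI = 1.8
ACCESSES_PER_INSTR = 1 + 1 / 5    # instruction fetch + data every fifth instr.
MISS_PENALTY_MEM = 46 + (3 + 4)   # memory access + transfer of a 4-word block

L1_HIT, L1_MISS_RATE = 1, 0.05
L2_HIT = 1                  # identical cache, assumed 1-cycle access
L2_LOCAL_MISS_RATE = 1.0    # identical geometry: every L1 miss also misses here

amat = L1_HIT + L1_MISS_RATE * (L2_HIT + L2_LOCAL_MISS_RATE * MISS_PENALTY_MEM)
cpi = BASE_CPI + ACCESSES_PER_INSTR * amat
print(f"AMAT = {amat:.2f}, CPI = {cpi:.2f}")
```

The result, 6.24, is slightly worse than with a single cache: the second cache only adds its access time to every first-level miss (0.05 extra cycles per access) without ever avoiding a memory access.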


    Local vs. Global Miss Rate
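The slide's figure is not reproduced; the standard definitions can be written as follows. The symbols are notation chosen here: m for miss rates, t_{L1} and t_{L2} for the access times of the two cache levels, and t_mem for the penalty of going to memory. For the first level, local and global miss rate coincide.

```latex
\begin{aligned}
m_{L2,\text{local}}  &= \frac{\text{misses}_{L2}}{\text{accesses}_{L2}}
                      = \frac{\text{misses}_{L2}}{\text{misses}_{L1}} \\
m_{L2,\text{global}} &= \frac{\text{misses}_{L2}}{\text{accesses}_{L1}}
                      = m_{L1} \cdot m_{L2,\text{local}} \\
\text{AMAT} &= t_{L1} + m_{L1}\left(t_{L2} + m_{L2,\text{local}}\, t_{\text{mem}}\right)
             = t_{L1} + m_{L1}\, t_{L2} + m_{L2,\text{global}}\, t_{\text{mem}}
\end{aligned}
```

The last form is convenient when, as in d), only global miss rates are given.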


    d)

    Alternatively, you can use one of two other caches as the second-level cache, each with a total size of 256 kB. One of the caches is 2-way set associative and has an access time of 4 cycles. The other cache is 4-way set associative and has an access time of 6 cycles. The global miss rate is 4% for the first cache and 3% for the second cache.

    Which cache should be chosen?


    Option 1: Proc. Core → L1 cache (32 kB, direct-mapped, 5% global miss rate, 1 cycle) → L2 cache (256 kB, 2-way, 4% global miss rate, 4 cycles) → Memory


    Option 2: Proc. Core → L1 cache (32 kB, direct-mapped, 5% global miss rate, 1 cycle) → L2 cache (256 kB, 4-way, 3% global miss rate, 6 cycles) → Memory
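A sketch comparing the two second-level options, using AMAT = t_L1 + m_L1 · t_L2 + m_L2,global · t_mem. It assumes the L2 access time already covers the block transfer into the L1, that an L2 miss costs the same 46 + (3 + 4) = 53 cycles as an L1 miss did before, and that the 1-cycle L1 hit time adds to the base CPI. The function name cpi_with_l2 is hypothetical.

```python
BASE_CPI = 1.8
ACCESSES_PER_INSTR = 1 + 1 / 5    # instruction fetch + data every fifth instr.
MISS_PENALTY_MEM = 46 + (3 + 4)   # 53 cycles to fetch a 4-word block
L1_HIT, L1_MISS_RATE = 1, 0.05    # 32 kB direct-mapped L1


def cpi_with_l2(l2_hit, l2_global_miss_rate):
    """CPI for the L1 above followed by an L2 with the given parameters."""
    # AMAT = t_L1 + m_L1 * t_L2 + m_L2,global * t_mem
    amat = L1_HIT + L1_MISS_RATE * l2_hit + l2_global_miss_rate * MISS_PENALTY_MEM
    return BASE_CPI + ACCESSES_PER_INSTR * amat


options = {
    "256 kB 2-way, 4 cycles": cpi_with_l2(4, 0.04),
    "256 kB 4-way, 6 cycles": cpi_with_l2(6, 0.03),
}
for name, value in options.items():
    print(f"{name}: CPI = {value:.3f}")
best = min(options, key=options.get)
print("choose:", best)
```

Under this model the 2-way option yields a CPI of 5.784 and the 4-way option 5.268, so the slower but more associative 4-way cache wins: its lower global miss rate saves 0.01 × 53 = 0.53 cycles per access, more than the 0.05 × 2 = 0.1 cycles its longer access time costs. The corresponding local miss rates are 0.04 / 0.05 = 80% and 0.03 / 0.05 = 60%.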


    e)

    Explain the meaning of spatial and temporal locality in the context of instructions and data.


    Locality of data
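Spatial locality of data means that after accessing a word, the program is likely to access neighboring words soon; a multi-word cache block exploits this. A minimal sketch that makes the effect visible: it replays the word addresses of a row-major 2D array through a direct-mapped cache model, once traversing row by row and once column by column. The array size (64 × 64 words) and the cache geometry (128 lines of 4 words) are hypothetical, and the addresses are computed explicitly because Python lists do not expose a real memory layout.

```python
def count_misses(addresses, num_lines, words_per_block):
    """Direct-mapped cache model: count misses for a sequence of word addresses."""
    tags = [None] * num_lines
    misses = 0
    for addr in addresses:
        block = addr // words_per_block
        line = block % num_lines
        if tags[line] != block:
            misses += 1
            tags[line] = block
    return misses


ROWS, COLS = 64, 64                  # hypothetical 64 x 64 array of words
NUM_LINES, WORDS_PER_BLOCK = 128, 4  # hypothetical cache: 512 words in total


def addr(i, j):
    """Word address of element (i, j) in row-major layout."""
    return i * COLS + j


row_wise = [addr(i, j) for i in range(ROWS) for j in range(COLS)]
col_wise = [addr(i, j) for j in range(COLS) for i in range(ROWS)]

print("row-wise misses:", count_misses(row_wise, NUM_LINES, WORDS_PER_BLOCK))
print("column-wise misses:", count_misses(col_wise, NUM_LINES, WORDS_PER_BLOCK))
```

The row-wise traversal uses all 4 words of each block and misses only 1024 times (once per block), while the column-wise traversal jumps 64 words per access and misses on every one of the 4096 accesses, because each block is evicted before its neighbors are needed.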


    Locality of instructions
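For instructions, spatial locality comes from sequential execution (the next instruction usually lies in the same block) and temporal locality from loops (the same instructions are fetched again and again). The sketch below, assuming one-word instructions, a direct-mapped instruction cache, and a loop body without branches, counts instruction-fetch misses; the function name and all sizes are hypothetical.

```python
def instruction_fetch_misses(loop_start, loop_len, iterations,
                             num_lines, words_per_block):
    """Count I-cache misses for a loop of loop_len sequential one-word
    instructions executed `iterations` times (direct-mapped cache)."""
    tags = [None] * num_lines
    misses = 0
    for _ in range(iterations):
        for pc in range(loop_start, loop_start + loop_len):
            block = pc // words_per_block
            line = block % num_lines
            if tags[line] != block:
                misses += 1
                tags[line] = block
    return misses


# Loop fits into the cache (8 blocks, 128 lines): only the first pass misses
print(instruction_fetch_misses(0, 32, 100, 128, 4))
# Loop of 256 blocks does not fit into 128 lines: every block misses every pass
print(instruction_fetch_misses(0, 1024, 100, 128, 4))
```

A 32-instruction loop executed 100 times causes only 8 misses in 3200 fetches, one per block on the first pass. A 1024-instruction loop in the same cache still benefits from spatial locality (one miss per 4 fetches) but loses all temporal locality, giving 25600 misses.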


    f) Based on your previous findings, replace the level-one cache with separate caches for instructions and data. The miss rate for instructions is 3% and for data 8%. Use the level-two cache from d) and calculate the CPI value.
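A sketch for f), assuming the 4-way, 6-cycle second-level cache from d), that both split L1 caches are accessed in 1 cycle, that the L2 global miss rate stays at 3% of all memory accesses (the global miss rate of a second-level cache depends mainly on its own size and organization), and that an L2 miss still costs 46 + (3 + 4) = 53 cycles. These are assumptions, not values given in the task; the names are illustrative.

```python
BASE_CPI = 1.8
DATA_ACCESS_RATIO = 1 / 5          # data accesses per instruction
MISS_PENALTY_MEM = 46 + (3 + 4)    # 53 cycles to fetch a 4-word block

L1_HIT = 1                         # assumed 1-cycle access for both L1 caches
L1I_MISS_RATE, L1D_MISS_RATE = 0.03, 0.08
L2_HIT, L2_GLOBAL_MISS_RATE = 6, 0.03   # 256 kB 4-way L2 from d)

# Per instruction: one instruction fetch plus 0.2 data accesses
accesses = 1 + DATA_ACCESS_RATIO
l1_misses = 1 * L1I_MISS_RATE + DATA_ACCESS_RATIO * L1D_MISS_RATE  # 0.03 + 0.016
l2_misses = accesses * L2_GLOBAL_MISS_RATE                          # 1.2 * 0.03

stall_cycles = accesses * L1_HIT + l1_misses * L2_HIT + l2_misses * MISS_PENALTY_MEM
cpi = BASE_CPI + stall_cycles
print(f"L1 misses/instr = {l1_misses:.3f}, CPI = {cpi:.3f}")
```

Under these assumptions the CPI is about 5.18. The split L1 caches miss on 0.046 of the 1.2 accesses per instruction, i.e. on about 3.8% of all accesses, compared with 5% for the unified 32 kB cache.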