Dr. Alexandra Fedorova, August 2007
Introduction to Systems Research at SFU
CMPT 401 Summer 2007 © A. Fedorova
Introduction
• Systems: software systems, hardware systems, the interaction between them
• A new research area at SFU: before December 2006, no faculty members at SFU were doing systems research (not counting networking)
• Research opportunities at undergraduate and graduate level:
– Undergraduate honours thesis
– CMPT 415
– Paid research assistantships
– Master’s and Ph.D.
What is Systems Research?
• System – a collection of software and hardware components that accomplish a certain goal
• Usually this does not include applications, but includes system software:
– The operating system
– System libraries
• Systems research is concerned with building these components and structuring their interaction
Systems Research at SFU
• System software design for chip multithreading processors
• Computer architecture
• Distributed systems
System Software Design for Chip Multithreading Processors
• What is chip multithreading?
• Why is this research relevant?
• What research problems are we addressing?
Chip Multithreading (CMT)
• Conventional processor: one software thread runs on the chip at any given instant
• CMT processor: multiple threads run on the same chip simultaneously
[Figure: a chip containing a Level-1 cache and a Level-2 cache]
CMT: The Dominant Architecture
• Most new processors are CMT:
– Intel: 100% of new server processors and 90% of high-performance desktop processors will be CMT by the end of 2007
• All major hardware vendors are in the CMT business:
– Sun Microsystems Niagara (32 threads on the chip)
– IBM Power4, Power5, Power6
– Intel Hyper-threaded Xeon (servers, desktops)
– Intel Core Duo (desktops and laptops)
– Dell Quad core systems (2x Intel Dual-core processors)
– AMD Quad core (coming up in Fall 2007)
Why CMT?
• Running one thread per chip is inefficient
• Due to the nature of modern applications, computational hardware is underutilized
– Modern applications spend 50-60% of their CPU time accessing memory
– While memory is accessed, the CPU pipeline is stalled: it is idle, not doing anything useful
– But while it is stalled, the CPU is still consuming power
– So power is wasted with no benefit
• Idea behind CMT: while one thread stalls the pipeline, let another thread use it
– Sort of like overlapping I/O and computation, but at the micro level
CMT: More Efficient CPU Utilization
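[Figure: timeline of two threads (thread 0 and thread 1) sharing one pipeline. One thread executes 1: add, 2: subtract, 3: load data from cache, 4: load data from memory; the other executes 1: load data from memory, 2: add, 3: subtract, 4: add. A load from memory stalls the issuing thread. Legend: "Stall the pipeline", "Pipeline is busy".]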
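The overlap shown in this timeline can be mimicked with a toy model. The sketch below is only an illustration under assumed numbers: the latencies in `LATENCY`, the round-robin issue policy, and the name `busy_fraction` are hypothetical and do not come from the slides. Each thread issues its instructions in order, a thread that has just issued a memory load cannot issue again until the load completes, and the pipeline issues at most one instruction per cycle.

```python
from typing import List

# Hypothetical latencies in cycles; the slides give no concrete numbers.
LATENCY = {"add": 1, "subtract": 1, "load_cache": 1, "load_memory": 4}


def busy_fraction(threads: List[List[str]]) -> float:
    """Return the fraction of cycles in which the pipeline issues an instruction."""
    n = len(threads)
    next_instr = [0] * n  # index of the next instruction for each thread
    ready_at = [0] * n    # first cycle at which each thread may issue again
    cycle = 0
    issued = 0
    start = 0             # round-robin pointer: which thread is tried first
    while any(next_instr[t] < len(threads[t]) for t in range(n)):
        for offset in range(n):
            t = (start + offset) % n
            if next_instr[t] < len(threads[t]) and ready_at[t] <= cycle:
                op = threads[t][next_instr[t]]
                next_instr[t] += 1
                ready_at[t] = cycle + LATENCY[op]
                issued += 1
                start = (t + 1) % n
                break  # only one issue slot per cycle
        cycle += 1
    total_cycles = max(ready_at, default=0)  # cycle at which the last instruction completes
    return issued / total_cycles if total_cycles else 0.0


# The two instruction streams from the timeline figure.
thread_a = ["add", "subtract", "load_cache", "load_memory"]
thread_b = ["load_memory", "add", "subtract", "add"]

print("one thread :", busy_fraction([thread_a]))
print("two threads:", busy_fraction([thread_a, thread_b]))
```

Under these assumed latencies, a single thread keeps the pipeline busy for 4 of 7 cycles, while the two threads together fill all 8 cycles. Real processors are far more complex; the model only shows why a second thread can hide memory stalls.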
How to Enable CMT?
• How to enable running multiple threads on the same chip?
– Hardware multithreading
– Multicore processing
– Combination of the two
Hardware Multithreading
• Run at least two threads on the same processing core
• Some hardware is duplicated, some is shared
• Shared hardware:
– Pipeline: i.e., functional units, register files, queues
– Caches: Level-1 (L1) instruction and data caches, Level-2 (L2) unified cache
– Interconnects
• Multithreaded processors:
– Intel Hyper-threaded Xeon
– IBM Power5, Power6, Cell
– Sun Microsystems Niagara
[Figure: a chip containing a Level-1 cache and a Level-2 cache]
Multicore Processing
• Multiple processing cores on the same chip
• Threads share the L2 cache (and other lower-level caches), and interconnects
• Multicore processors:
– Intel Core Duo
– AMD Quad Core
– IBM Power4, 5, 6
– Sun Microsystems Niagara
[Figure: a chip with two L1 caches and a shared L2 cache]
Multicore + Multithreading
• A multicore processor
• Each core is multithreaded
• Multicore and multithreaded processors:
– Sun Microsystems Niagara
– IBM Power5, Power6
[Figure: a chip with two L1 caches and a shared L2 cache]
Research on CMT Processors
• Computer architecture research:
– How to design a CMT processor to achieve a good combination of: CPU utilization, application performance, power efficiency
• System software research:
– How to design system software, i.e., the operating system, that enables applications to perform well on these processors?
OS Design for CMT Processors
• Operating systems are traditionally responsible for the allocation of hardware resources
• On CMT processors, on-chip resources are shared among threads that run simultaneously
• How you allocate those resources among threads determines the performance that those threads will achieve
• Let’s look at a few examples…
Constructing Optimal Co-schedules
[Figure: a chip with two L1 caches and a shared L2 cache]
• Blue suffers when it does not have enough L1 cache
• Red uses lots of L1 cache
• Green does not use much L1 cache
• Yellow does not suffer when it does not have much L1 cache
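To make the co-scheduling question concrete, here is a small sketch that searches all ways of pairing the four threads onto two L1 caches. Everything numeric in it is an assumption made for illustration: the `DEMAND` and `SENSITIVITY` values are invented to match the qualitative descriptions above, and the cost model (a thread's slowdown is its own sensitivity times its partner's cache demand) is a deliberately simple stand-in, not a model taken from the slides.

```python
from typing import Iterator, List, Tuple

Pair = Tuple[str, str]

# Hypothetical values in [0, 1], chosen only to reflect the slide's descriptions.
DEMAND = {"Blue": 0.5, "Red": 0.9, "Green": 0.1, "Yellow": 0.5}       # how much L1 a thread uses
SENSITIVITY = {"Blue": 0.9, "Red": 0.5, "Green": 0.5, "Yellow": 0.1}  # how much it suffers without L1


def pair_cost(a: str, b: str) -> float:
    """Assumed slowdown when threads a and b share one L1 cache."""
    return SENSITIVITY[a] * DEMAND[b] + SENSITIVITY[b] * DEMAND[a]


def pairings(items: List[str]) -> Iterator[List[Pair]]:
    """Yield every way of splitting an even-length list into unordered pairs."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for tail in pairings(remaining):
            yield [(first, partner)] + tail


def schedule_cost(schedule: List[Pair]) -> float:
    """Total assumed slowdown over all cores."""
    return sum(pair_cost(a, b) for a, b in schedule)


best = min(pairings(["Blue", "Red", "Green", "Yellow"]), key=schedule_cost)
print(best)
```

With these made-up numbers the search pairs Blue with Green and Red with Yellow: the cache-sensitive thread shares with the light user, and the heavy user shares with the thread that tolerates a small cache. Exhaustive search is only feasible for a handful of threads, and the hard part in practice is obtaining the numbers at all.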
Constructing Optimal Co-schedules (cont.)
• How do we find out applications’ cache behaviour?
– It turns out you need to consider memory access patterns, which are not trivial to measure
• How do you model interactions among applications?
– How do you know if one application’s cache usage patterns are incompatible with another’s?
• These patterns/relationships cannot be measured directly
• Can they be modeled?
– Simple models are inaccurate
– Complex models are too inefficient to use inside an operating system scheduler
• Approach of my group: use learning methods, feedback-directed scheduling
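The slides do not say which learning method the group uses. As one generic illustration of feedback-directed scheduling, the sketch below applies an epsilon-greedy strategy: try each candidate co-schedule, observe a performance measurement, and increasingly favour the schedule with the best observed average. The function `pick_schedule`, the `TRUE_THROUGHPUT` table, and `fake_measure` are all hypothetical; a real scheduler would read hardware performance counters instead of a lookup table.

```python
import random
from typing import Callable, Dict, Sequence, Tuple

Schedule = Tuple[Tuple[str, str], ...]


def pick_schedule(
    candidates: Sequence[Schedule],
    measure: Callable[[Schedule], float],
    rounds: int = 200,
    explore: float = 0.1,
    seed: int = 0,
) -> Schedule:
    """Epsilon-greedy choice of the schedule with the best measured average."""
    rng = random.Random(seed)
    total: Dict[Schedule, float] = {c: 0.0 for c in candidates}
    count: Dict[Schedule, int] = {c: 0 for c in candidates}

    def average(c: Schedule) -> float:
        return total[c] / count[c] if count[c] else float("-inf")

    for step in range(rounds):
        if step < len(candidates):
            choice = candidates[step]              # try every schedule once
        elif rng.random() < explore:
            choice = rng.choice(list(candidates))  # occasional exploration
        else:
            choice = max(candidates, key=average)  # exploit the best so far
        total[choice] += measure(choice)           # feedback from running it
        count[choice] += 1
    return max(candidates, key=average)


# Hypothetical stand-in for hardware feedback: mean throughput plus noise.
TRUE_THROUGHPUT: Dict[Schedule, float] = {
    (("Blue", "Red"), ("Green", "Yellow")): 1.2,
    (("Blue", "Green"), ("Red", "Yellow")): 1.8,
    (("Blue", "Yellow"), ("Red", "Green")): 1.5,
}
_noise = random.Random(1)


def fake_measure(schedule: Schedule) -> float:
    return TRUE_THROUGHPUT[schedule] + _noise.uniform(-0.1, 0.1)


best = pick_schedule(list(TRUE_THROUGHPUT), fake_measure)
print(best)
```

The appeal of this style of approach is that it needs no explicit cache model: the scheduler only has to observe how well each co-schedule performs. The cost is the time spent running poor schedules while learning.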
Heterogeneous Multicore Systems
• One size does not fit all
– Application class A runs best on core with feature set X
– Application class B runs best on core with feature set Y
• Rather than designing a homogeneous multicore system that attempts to satisfy everyone but satisfies no one, design a heterogeneous multicore system (HMC)
[Figure: a chip with two L1 caches and a shared L2 cache]
Scheduling On HMC Systems
[Figure: a chip with two cores, Core 1 and Core 2, each with its own L1 cache, sharing an L2 cache. Threads in Set A want to run on Core 1; threads in Set B want to run on Core 2.]
Scheduling On HMC Systems (cont.)
• If you schedule all threads in Set A on their preferred core, those threads will suffer from:
– Low amount of CPU time
– High response time
• This is because there is high demand for that core, and they’d have to share it with others
• So you might want to schedule threads on their non-preferred core once in a while
• How do you balance between performance, fair CPU allocation and good response time?
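One simple way to picture this trade-off is a placement rule with a tunable limit on load imbalance. The sketch below is a hypothetical illustration, not a policy from the slides: `assign` and `max_imbalance` are invented names, and the rule places each thread on its preferred core unless that would leave the preferred core's run queue more than `max_imbalance` threads longer than the other core's.

```python
from typing import Dict, List, Tuple


def assign(threads: List[Tuple[str, int]], max_imbalance: int = 1) -> Dict[int, List[str]]:
    """Place threads on Core 1 or Core 2, preferring each thread's favourite core.

    threads: (name, preferred_core) pairs, where preferred_core is 1 or 2.
    max_imbalance: how many more threads the preferred core may hold than the
    other core before a thread is diverted (hypothetical tuning knob).
    """
    queues: Dict[int, List[str]] = {1: [], 2: []}
    for name, preferred in threads:
        other = 2 if preferred == 1 else 1
        if len(queues[preferred]) + 1 - len(queues[other]) > max_imbalance:
            queues[other].append(name)      # give up affinity for fairness
        else:
            queues[preferred].append(name)  # keep the thread where it runs best
    return queues


# Four threads from Set A prefer Core 1; one thread from Set B prefers Core 2.
workload = [("A1", 1), ("A2", 1), ("A3", 1), ("A4", 1), ("B1", 2)]

print(assign(workload, max_imbalance=1))
print(assign(workload, max_imbalance=2))
```

A small `max_imbalance` favours fair CPU allocation and response time, while a large one favours running every thread on its preferred core. Choosing the value, or replacing the fixed rule with something adaptive, is exactly the balancing question the slide raises.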
Summary
• CMT systems are new and cool, yet prevalent enough for people to care about them
• Companies are desperate to hire students with experience on CMT systems
• If you are thinking about an academic career: new and hot research area
– Many problems
– Many opportunities to publish
• Talk to me if you are interested in research opportunities
• Tell your friends who might be interested