DESCRIPTION
Invited talk by Lamia Youseff at SRCS 2012 - sponsored by the Awareness Initiative: www.aware-project.eu
Self-awareness and Adaptive Technologies: the Future of Operating Systems?
Lamia Youseff*
fos team: Nathan Beckmann, Harshad Kasture, Charles Gruenwald III, Adam Belay, David Wentzlaff**, Lamia Youseff, Jason Miller, Anant Agarwal, and the fos team in collaboration with the Angstrom team.
Live fos web server: http://fos.csail.mit.edu
* Lamia worked on fos while she was a postdoctoral associate with the Carbon group at CSAIL, MIT.
** David worked on fos while he was a Ph.D. candidate with the Carbon group at CSAIL, MIT.
Emerging Computer Architectures Provide Opportunities for the OS
- Clouds
  - Huge computing power
  - Scalability with a "click"
- Multicore
  - Large number of cores
  - Very fast on-chip core-to-core communication
All logos and trademarks appearing in this presentation are the property of their respective owners.
What is Wrong with Current OSs?
- Unclear how to scale contemporary OSs on multicore and manycore
- Programming on an Infrastructure-as-a-Service (IaaS), Amazon EC2 style, cloud is a challenge
Linux Does Not Scale: Page Allocation Example
- Physical page allocation (toy example)
  - Allocate physical pages as quickly as possible
  - Examine the scalability of where the time goes
- Precision timers & OProfile
- 16 cores: a quad quad-core Intel server

On each core:

    #include <stdlib.h>

    /* Sizes are illustrative; the slide leaves them as symbolic constants. */
    #define MB_IN_BYTES        (1024 * 1024)
    #define MEMORY_SIZE_IN_MB  1024
    #define PAGE_SIZE_IN_BYTES 4096

    int main(void) {
        char *big_array;
        long long count;
        big_array = malloc((size_t)MEMORY_SIZE_IN_MB * MB_IN_BYTES);
        /* Touch one byte per page so every page is actually faulted in. */
        for (count = 0;
             count < ((long long)MEMORY_SIZE_IN_MB * MB_IN_BYTES) / PAGE_SIZE_IN_BYTES;
             count++) {
            big_array[count * PAGE_SIZE_IN_BYTES] = count % 256;
        }
        return 0;
    }
Lock Contention Dominates Runtime

[Figure: cycles (in billions, 0-12) vs. number of cores (1-16), broken down by where time is spent: get_page_from_freelist, __pagevec_lru_add_active, __rmqueue_smallest, free_pages_bulk, rmqueue_bulk, main, page_fault, handle_mm_fault, clear_page_c, and other; categorized as lock contention, architectural overhead, and useful work.]
Problem with Current-Day Clouds

Independent Linux VM instances cobbled together in an ad-hoc manner at the application layer.

[Figure: several multicore blades, each running a hypervisor hosting Linux VMs (including an SMP Linux VM) with applications on top, connected through a network switch.]
OSes Need to Be Rethought
- Shared memory and locks lead to poor scalability
- Non-composable due to locks; hard to modify or add services
- OS and application fight for in-core cache
- OS relies on shared memory, which does not work across a cloud
- No resilience to faults; a single failure causes the OS to crash
Manycore is moving from a few cores to hundreds of cores.
Outline
- fos Introduction
  - Vision, Structure and Architecture
  - Anatomy of a malloc()
- Scalability in fos
  - Fleets of servers
  - Example: File System Stack
- Why adaptable and self-aware OS services?
- Adaptability in fos
  - Hybrid Messaging System
  - Elastic Fleets
- Enabling Automatic Self-Awareness in fos
What is fos?
- Scalability and reliability for future multicores with hundreds to thousands of cores
- Provides a single-system image across cloud machines
- Dynamically grows and shrinks services to meet application demands

[Figure: fos running applications across the cores of multiple multicore blades, on top of hypervisors.]

A novel elastic operating system for multicores and clouds.

Provides typical OS services:
- Process management, scheduling of multiple applications, virtual memory management, protection, file systems, networking
Also provides special multicore services:
- Process migration, fault resilience, dynamic goal optimization, elastic resource allocation/deallocation

An Operating System as a Service (OSaaS)
fos Structure

[Figure: applications and OS service fleets (page servers PS, file servers FS, I/O) spread across cores; a user app's "file read" triggers a message to an FS server, which in turn messages a PS server ("need new page").]

- Internet-inspired structure, based on messaging
- OS is a collection of services (e.g., name service, page service PS, file service FS)
- Each service is implemented by a fleet of distributed servers
- Each server is bound to a core
- Application cores message a particular server core to utilize a service
- Server cores collaborate and communicate to implement the needed OS service
fos Architecture
- Microkernel
  - Executes on all cores
  - Provides protection mechanism but not policy
  - Provides memory management mechanism, not policy
- Naming
  - Translates symbolic names to physical destinations
  - Standard high-level API hides location
  - Provides one-to-many maps
  - Provides redirection
  - Provides resilience when a server crashes

[Figure: applications (app1-app5) and fleet members (page servers PS, file system servers FS, name servers NS, a block device driver blk, and a network interface nif) bound to cores, each core running the microkernel, with libfos linking applications to the services.]
fos Architecture
- System Servers
  - Internet-inspired servers
  - Fleets: cooperating processes that provide a service
  - Communicate only via messaging
  - Facilitates transparent communication, inter-machine or intra-machine
  - Easily move server processes anywhere
- Messaging
  - Microkernel-level messaging for core-to-core communication
  - Lightweight user-space messaging
  - Inter-machine vs. intra-machine communication

[Figure: the same fos architecture diagram as on the previous slide.]
Anatomy of a malloc Call in fos

[Figure: a malloc call served locally. The application calls malloc through libc and libfos; a message travels via the microkernel and its name cache to the paging system server on another core of the same blade, and the reply returns the same way (steps 1-6).]
Anatomy of a malloc Call in fos when the Server is on a Remote Machine in the Cloud

[Figure: the same malloc call when the paging system server lives on another blade. The request now passes through the local proxy-network server, across the network to the remote blade's proxy-network server, and on to the paging system server; the reply retraces the route (steps 1-11).]
Scalability in fos
- The three key design guidelines:
  - Inspired by the Internet, the OS is built as a collection of services (e.g., naming and resource discovery, scheduler, file system)
  - Each service is fully distributed with no global shared state or locks, implemented as a collection of cooperating servers
  - OS servers are bound to cores, which eliminates interference between OS and application
fos Scalable Fleet Example: File System Fleet

[Figure: a file system fleet of FS servers spread across cores and machines, with a file system coordinator; applications (app1-app6) reach the fleet through libfos, alongside page server (PS) fleet members.]
Outline
- fos Introduction
  - Vision, Structure and Architecture
  - Anatomy of a malloc()
- Scalability in fos
  - Fleets of servers
  - Example: File System Stack
- New Challenges for Today's OS
- Adaptability in fos
  - Hybrid Messaging System
  - Elastic Fleets
- Enabling Automatic Self-Awareness in fos
New Challenges for Today's OS
- Unprecedented amounts of resources and variables
  - An increased number of resources means an increased number of failures
- Unprecedented variability in demand
  - For the same application, its performance characteristics and requirements usually change during its runtime (e.g., the allocated number of cores)
  - Different simultaneously-executing applications have different performance requirements (e.g., a memory-intensive vs. a compute-intensive app)
[Figure: a conventional OS stack (applications; scheduler, memory manager, file system, device drivers; cores, caches, disk, I/O devices) annotated with the signals and knobs a self-aware system could use: miss rate, cache size and associativity, power, speed, algorithm choice, heartbeats, and goals.]
Building Adaptability and Self-Awareness into fos

[Figure: "fos: A Self-Aware Operating System." An Analysis & Optimization Engine with a learner runs an Observe-Decide-Act loop over the whole stack: it observes applications (heartbeats, goals), OS subsystems (memory manager, file system, device drivers, scheduler), and hardware (cores, caches, DRAM, disk, devices; activity, power, temperature, miss rate), and acts through knobs such as voltage/frequency/precision, cache size and associativity, speed, and algorithm choice.]
- The fos Analysis and Optimization Engine is a closed feedback loop of Observe, Decide, Act (ODA)
- Each component provides a new OS service:
  - The Observer provides vital-signs services
  - The Decision engine provides performance models and an AI learning engine
  - The Actuators change system status
- Each component is either implemented as a fos fleet or integrated into another fleet
[Figure: the Analysis & Optimization Engine as a loop: Vital Signs (e.g., heartbeats) feed Observe; the Decision Engine (e.g., performance models, control system, AI learner) performs Decide; Actuators perform Act.]
Building Automatic Self-Awareness into fos

The Analysis & Optimization Engine acts on:
- Applications (e.g., algorithms)
- OS sub-systems (e.g., growing fleets)
- Hardware components (e.g., frequency scaling)

Vital Signs Fleet
- Observes the status of software and hardware components
- Provides a global knowledge-base of the system
- Implemented as a fleet, storing the status information in a distributed data object (a key-value store)
Sefos: Building Self-Awareness into fos

The Vital Signs service collects measurements from:
1. Applications, e.g., app heartbeats and application-specific measurements (fps in a video encoder or flops in a scientific app)
2. OS subsystems, e.g., the utilization ratio of the file system fleet
3. Hardware components, e.g., temperature, core frequency, power, cache miss rate
Decision Engine
- A new OS service in fos
- Implemented as a fleet of servers for scalability
- Processes input from the Vital Signs service
- Provides three approaches to runtime decision making:
  - Machine learning
  - Classical control theory
  - Performance models
- A uniform API across the mechanisms allows them to be switched at runtime and composed
Actuators
- Actions that allow the system to adapt at runtime
- Implemented as extensions to fos fleets; hardware actions go through OS tools, plus application-specific actions
- Three sets of actions, based on their impact:
  - Application actions
    - Allocating or de-allocating cores to an application
    - Switching between algorithms
    - Migrating processes for better data locality and cache usage
  - OS service actions
    - Growing and shrinking fleets
    - Migrating servers
  - Hardware actions
    - Frequency scaling to save power
Adaptability in fos
- fos adapts to the varying demand in multicores and clouds:
  - Hybrid Messaging
  - Elastic Fleets
I. Messaging in fos
- Messaging in fos provides IPC through a message-passing abstraction
- Mailbox-based; each mailbox is associated with a name and a capability
- Can be implemented via a variety of mechanisms, multiplexed via the libfos library layer
- Transparent to the application
- fos currently supports three mechanisms
Messaging Mechanisms in fos
- Kernel messaging
  - Implemented in the microkernel via shared memory
  - Messages are sent by trapping into the microkernel
- User-level messaging
  - A channel-based mechanism via URPC
  - Runs entirely in user space
  - Lower messaging latency, at the cost of an initial overhead for channel creation
- Inter-machine messaging
  - The message goes through the proxy server
  - The proxy server routes the message to the correct machine
Hybrid Messaging
- fos adapts automatically among the various messaging mechanisms

[Figure: cumulative cycles (5×10^4 to 1×10^5) vs. number of messages for the different messaging mechanisms.]
II. Elastic Fleets
- fos fleets can grow and shrink to meet varying demand
- A fleet can grow
  - New servers are spawned on new cores
  - The naming service provides load balancing
  - Requests are directed to the new fleet member
- A fleet can shrink
  - Load is distributed to the other fleet members
fos Name Service Enables Elasticity

[Figure: a user app, page server fleet, and file servers FS1 and FS2 registered with a name server NS.]

File System Service elasticity example:
1. The name server points to File Server 1
2. The application request rate rises; the File Service spawns a new File Server 2
3. The Name Server directs file system accesses to the new server, transparently to the application

The Name Service enables service fleets to grow elastically with demand.
fos File System Service Elasticity
- fos services can grow and shrink to match demand
- A synthetic benchmark app makes requests at a varying rate
  - 2 clients, each repeatedly opens a file, reads 1 KB, and closes the file
  - Clients increase the request rate until the midpoint, then decrease it
- The fos file system grows from 1 to 2 servers to match demand, then shrinks as demand lessens
[Figure (built up over three slides): the fleet serves requests with 1 server, grows to 2 servers as the request rate rises, then shrinks back to 1 server as it falls.]
An Early Prototype of a Self-Aware FS Service
- Observer: leverage fs fleet utilization to indicate the load on the OS subsystem
- Decision Engine: use simplified high and low watermarks
- Actions: grow and shrink the fleet
[Figure: system throughput (transactions per million cycles, 0-50) and fleet size (1-4) vs. number of transactions (0-10,000).]
Conclusions
- New manycore and cloud architectures present new challenges for the OS community:
  - Scalability
  - Varying demand
- fos is a new factored operating system that:
  - Addresses scalability through the fleet design
  - Addresses varying demand through the analysis and optimization engine
- We designed the analysis and optimization engine as a new OS service in the form of a closed ODA loop
- We demonstrated how adaptability and self-awareness can be achieved through the analysis and optimization engine in two toy prototypes:
  - Hybrid messaging
  - Elastic fs fleet
Questions

Live fos web server at http://fos.csail.mit.edu
Hidden Slides
Elastic Fleets Example: fos File System
- 2 clients, each repeatedly opens a file, reads 1 KB, and closes the file
- Clients increase the request rate until the midpoint, then decrease it

[Figure: throughput (transactions per 10^6 cycles, 0-4) vs. cycles elapsed (0 to 6×10^9); the fleet grows from 1 to 2 servers, then shrinks back to 1.]
[Repeats of the two "Anatomy of a malloc Call in fos" diagrams shown earlier.]
Overview of the Sefos ODA Loop
O: Heartbeats service (sensors or observers), i.e., the knowledge-base of the system
- Two sets of APIs:
  - Collecting status updates from apps and system services
  - Responding to queries about the apps' and system services' status
D: Decision engine service (deciders or controllers)
- A generalized form of the scheduling services
- Deploys control-theoretic and AI models to make decisions
A: Variables (actions or actuators), e.g., leveraging:
- Elasticity: growing and shrinking fleets
- Spatial locality: by migrating processes
- Improving performance: by controlling the number of cores allocated to a service/app
- Changing core frequency: to save overall power
O: Heartbeats Service
- A new OS service that keeps the current status of the system
  - App process statistics
  - Other system fleets' statistics (utilization level, incoming queue length, etc.)
  - Hardware component status (temperature, core frequencies, etc.)
- Implemented as a fleet of servers
- Functionality:
  1. Collects heartbeats from different applications and other system services, via the heartbeats API
  2. Responds to inquiries about overall system status or a specific application's heartbeats, via the heartbeats_monitor API
D: Decision Engine
- Another OS service; a generalized form of the scheduling service
- Can maximize overall performance, minimize power, or pursue other goals
- Queries the heartbeats service for system status
- Makes decisions using:
  - A closed feedback loop that decides on the best course of action to achieve a certain goal
    - Works best for changing one variable at a time
    - The model can become very difficult to understand as the number of goals and variables increases
  - Alternatively, an AI model
    - Works best for changing several variables at a time
A: Actions
- Each sub-system provides a set of actions (variables)
- Three types of actions, according to their direct impact:
  - App actions, e.g.:
    - Allocating or de-allocating cores to an application
    - Migrating app processes for better spatial allocation
      - To move a process to its data: improve cache efficiency
      - To decrease communication overhead (in asymmetric machines)
  - System service actions, e.g.:
    - Leverage fleet elasticity: grow and shrink the fleet
    - Migrate servers to improve spatial locality
  - Hardware actions, e.g.:
    - Frequency scaling to save power