Upload
ashish-gupta
View
916
Download
0
Tags:
Embed Size (px)
DESCRIPTION
My PhD defense talk. Please see www.ashishgupta.org for more details.
Citation preview
Black Box Methods for Inferring Parallel Applications' Properties in Virtual
Environments
Ashish GuptaMarch 2008
PhD Final Talk
Committee:
Prof. Peter Dinda
Prof. Fabian Bustamante
Prof. Yan Chen
Prof. Dongyan Xu (Purdue University)
Introduction
3 Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Background
• Virtual Machine Distributed Computing
• Virtuoso– Middleware for autonomic Virtual Machine distributed
computing– Presents a simple abstraction for distributed computing,
insulating from underlying computational, networking and middleware complexities
– VMM abstracts computational resources– VNET abstracts different networking domains into one –
also ideal point for monitoring– Autonomic Resource Management
4 Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Problem Introduction
• Adaptation – Resources can be heterogeneous (CPU, memory etc)– If shared, then resources availability can also highly dynamic– Application demands also change !
• Autonomic computing:– What is available ?– What is required ?– How can we effectively match the two ?– One of the major components What does the
distributed application want ?– Should work with existing unmodified applications and OS
6 Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Thesis Statement
My thesis is that it is feasible to infer various useful demands and behavior of a parallel application running inside a collection of VMs to a significant degree using a black box model. To evaluate this thesis, I enumerate and define various demands and types of behavior that can be inferred, and also design, implement and evaluate ideas and approaches towards inferring these.
One of the demands I infer is the communication behavior and the runtime topology of a parallel application. I also show how to infer some very useful runtime properties of a parallel application like its runtime performance, its slowdown under external load and its global bottlenecks.
Significantly all of this is done using black-box assumptions and without specific assumptions about the application or the operating system. I also give evidence of how automatic black box inference can assist in adapting the application and its resource usage resulting in improved performance.
Chapter 2
Chapter 3 Chapter 4
Appendices A, B
7 Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Black box assumption and impact
Black Box – Can make no assumptions about the implementation, behavior or internal state of the Guest OS/module beyond its external interface
• Lowers barrier to adoption of the new inference techniques
• helps deploy my work to legacy applications
• Mainly accomplished by looking at external signals:
• traffic, host load etc.
• Not tied to Virtuoso
• Other systems like softUDC [91], XenoServer [52], SODA [87], Violin [88], In-VIGO [13], VioCluster [136] can also benefit from black box application inference
• Potentially any virtual distributed system
8 Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Components of my dissertation
Virtual Topology and Traffic Inference Framework
Black box metrics for Absolute Performance
Ball in The Court principles to compute application
slowdown
Global Bottlenecks using time decomposition
Increasing Application Performance In Virtual
Environments through Run-time Inference and Adaptation
Free Network Measurement For Adaptive Virtualized Distributed Computing
Inference
Adaptation
1 2
3 4
5 6
9 Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Topics I cover
Virtual Topology and Traffic Inference Framework
Black box metrics for Absolute Performance
Ball in The Court principles to compute application
slowdown
Global Bottlenecks using time decomposition
Increasing Application Performance In Virtual
Environments through Run-time Inference and Adaptation
Free Network Measurement For Adaptive Virtualized Distributed Computing
Inference
Adaptation
2 3
4 5
A B
10 Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
BSP application model
• Multiple processes executing a common kernel
• Execution alternates between one or more computing phases and one or more communication phases
• Combination of a schedule of computation and communication phases Super-step
• A very popular model for implementing a large variety of scientific applications and parallel algorithms
• Original paper: Valiant [167]
11 Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Patterns
• Synthetic workload generator developed before. Models a BSP application
• Can execute many different types of topologies common in BSP programs
Some parameters
Topology
Number of processors
Message size
# of iterations
Flops per element
Memory Reads/Writes per element
Topologies
N-dimensional mesh
N-dimensional torus
N-dimensional hypercube
Binary reduction tree
All to All
12 Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Patterns application’s capabilities
2-D Mesh 3-D Toroid 3-D Hypercube
Reduction Tree All-to-All
13 Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
NAS parallel benchmarks
• Developed by NASA [17, 172]
• Set of programs to evaluate performance of parallel supercomputers
• Representative of CFD applications
• To generate realistic parallel application workloads
• 5 kernel benchmarks: EP, MG, FT, IS, CG– Five very frequently used numeric methods
14 Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Different contributions of my dissertation
Virtual Topology and Traffic Inference Framework
Black box metrics for Absolute Performance
Ball in The Court principles to compute application
slowdown
Global Bottlenecks using time decomposition
Increasing Application Performance In Virtual
Environments through Run-time Inference and Adaptation
Free Network Measurement For Adaptive Virtualized Distributed Computing
Inference
Adaptation
2 3
4 5
A B
Collective Inference for Topology:VTTIF
16 Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Goal of VTTIF
Low Level Traffic Monitoring
?
Application Topology
An online topology inference framework for a VM environment
17 Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Traffic Analyzer
Rate based Change detection
Traffic MatrixQuery Agent
VM Network Scheduling Agent
VNET daemon
VM
VNET overlay network
To other VNET daemons
Physical Host
VTTIF Architecture
18 Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Inferred topology
Parallel Integer Sort
Black Box Measures of performance
20 Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
The problem
• Performance of BSP applications – an important goal
• Lot of work dedicated to improving performance of parallel applications, e.g.
– Virtuoso– VioCluster
• How do we measure performance in a black box fashion ?– The current way in Virtuoso is manual (e.g. Lin et.al. [106])
• Impact– Would enable superior adaptation algorithms– Automated evaluation and adaptation cycle– Generate reports on effectiveness of different
adaptation/scheduling methods
21 Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Cost model for BSP applications
• Popular strategy: break super-step into its components:– computational cost
– Communication cost of the global exchange of the data
– Synchronization cost
Computation cost
Communication cost
Number of super steps
Sync latency cost
Speed of computation in
FLOPS
Static model of performance
+
requires detailed application profiling and access to source
code
22 Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Super-step approach
• Super-step structure an invariant– No. of steps depend on parameters and data
Another possible measure of performance: number of super-steps
executed per second, or the
iteration rate dynamic metric
Multiple super-steps for dynamic applications iteration rate is not
constant
23 Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
A new black box metric: Round Trip Iteration Rate (RIR)
• Based solely on communication behavior of the application
• Correlated to the iteration rate
• Indirectly measures number of process interactions this indicates progress as synchronization happens at end of a super-step
• Approach: Examined various properties of the traffic trace
• Inter send-packet delay exhibits interesting properties
24 Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Inter send-packet delay
176 Receive from 175
176 Send to 175
176 Send to 175
176 Send to 175
176 Receive from 175
176 Send to 175
25 Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Inter send-packet delay
Traffic trace for Patterns
Message size = 4000 bytes
Computation per iteration: 100 MFlops
Clustering based on inter-send
delays
Count in cluster matches actual iteration rate
output by application (325)
26 Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Patterns: clusters without load
Actual: 568
Reported: 569
27 Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Patterns: clusters with external CPU load
Actual: 142
Reported: 153
Actual execution time ratio: 3.922
Ratio from reported iteration
rate: 3.72
Within 5%
28 Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Why Does Circled Bin Correspond To Iteration Rate?
• Each iteration consists of send, receive and compute phases
• Number of items in large inter send-packet delay cluster represent the group that represent an inter-process interaction
• Each inter-process interaction represents progress in the super-step
time
Send packets
29 Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Plotting inter send-packet delay for MG
1. No clean clusters for a more complex application
2. Delays shift towards right on greater load
30 Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Computing the RIR metric
• Previous examples were for Patterns –
• static performance case, easy clustering
• Applications like MultiGrid from NAS benchmarks changing iteration rate
For a given packet time series,
1. Count send pairs whoseinter-packet delay exceeds byc * RTT
2. Send pair must be interleaved by one receive
Based on BSP principles
31 Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
RIR time series
For dynamic applications, RIR changes with time
Need a time series for RIR over the trace
32 Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Outputting RIR time series - Workflow
Sniff Packets
Send packets that obey the conditions
Slide a 1 second window over these send packets
Sliding interval = t
Get a new time series denoting RIR for each
1 second sliding window
Sampling duration = T
Derivative Metrics from the above time series
Average RIR
CDF
Power Spectrum
Super-phase period
Using tcpdump/libpcap for the VM traffic
Send packets satisfying the two conditions
1 sec Slide by t
i1, i2, i3, i4, i5, …. in
Each number represents number of iterations for a particular 1 sec window instance
33 Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Representing Dynamic Performance
• Define a spectrum of metrics for dynamic performance
RIRavg
RIR-CDF
RIR-PS
RIR-PSE
Long term stationary average of RIR time series
CDF of RIR time series indicating spread of iteration rates
Phase structure, periodicities, application fingerprinting, statistical scheduling
Summary of the periodic behavior for multiple supersteps
34 Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Computing the stationary Average – sampling issues
• Sliding window resolution (sampling rate)– Needs to be high enough to capture the any important high frequency
behavior
• Capture duration– Enough to capture the stationarity of the signal (repetition of all super-
steps)
• Assumption:– iteration dynamics of the application are indeed empirically stationary for the long
run.– For a dynamic application that consist of repetitive phases, it means capturing
enough of its performance behavior to capture this repetitive element.
• Capturing stationarity
• Power Spectrum based techniques help us determine the right sampling rate and duration
• Details in dissertation
35 Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Effectiveness of RIRavg
For MG application, running under different load conditions (100% and 60% CPU load), predicted execution
time error rates were 13% and 7% respectively
(completely black box)
Value of c = 1.1 here (c*RTT factor)
36 Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
RIR time series graph
A super-step
Can we predict slowdown of application if we put it under load ?
For the IS and MG applications, which application may be hurt more if one of the processes from each application shares the
physical host with an external computational load?
The impact: we can now determine in advance, the impact of external load if we must choose one of these applications to be influenced by the load.
• Scheduling fact: Depending the scheduling algorithm, affect of load on different RIR regions can be different (Govindan et.al. [60])
• Reason: Scheduling handled differently for CPU bound processes vs I/O bound processes
• “Providing enough CPU is not enough, an equally important consideration is to provide the CPU at the right time”
38 Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Role of CDF
• RIR-CDF can be used to predict which application will be more affected by external load (for dynamic applications)
• Very useful when extra load needs to be introduced over existing applications due to demand
• Scheduling using the CDF:– From the normal CDF, we predict a slowdown CDF based on a slowdown
mapping– Slowdown CDF How will the RIR-CDF of the application look like after
load ?– Slowdown mapping What is the slowdown for a particular RIR under a
particular load ?
Slowdown mapping
39 Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Other metrics
• Power spectrum of RIR– Gives idea about the super-step structure of the application– Length of Super-step– A summary of the power spectrum and the significant frequencies serves
as a fingerprint/super-step snapshot
40 Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
A sample power spectrum for 4 processes
Consistency across processes
41 Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Example for MG
Significant frequency separation
Exec time = 19.44 seconds
Super-phase period = 4.1
seconds (1/0.244)
Number of super-phases
~4
42 Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Recap
• We can deduce performance for a BSP application using black box means.
• We can predict performance for an application, when imposed with load
• Entirely based on Based on packet analysis
• New metric called RIR : Round trip Iteration Rate
• More complex metrics for dynamic applications– RIRavg
– RIR-CDF– RIR Power Spectrum
Ball In the Court Methods
44 Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
The problem
• Last chapter: Predicting performance under load (Slowdown CDF)
• This chapter: Can we predict performance of application if an existing external load was removed ?
• I.e. What is the slowdown of the application under the current load conditions ?
45 Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Ball in the Court
• BSP super-step computation, communication, sync
• Communication according to a certain schedule and then computation in between
• Each process acts (computes) on a message before sending out the next one
• This acting on a message is called “Ball in the Court” (BIC)
• Ball is the responsibility of the process to do some local processing and then interact with other processes , court is the local host.
BIC delay
46 Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Developing a strategy
• Focus on one process
• If just one process is loaded, entire application slowed down because of one process
• All processes operate in sync, and iterations can only proceed if the loaded process does its duties
stretched
Under load
47 Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Why BIC delay?
• The traffic trace captures the behavior of the process for the entire duration
• If the process slows down, the trace time length will increase.
• There will be corresponding changes in the trace as well. What are those changes ? Seed of the idea
48 Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Approach to computing the BIC delay
Let’s compute the time differential for event pairs, for *.176
Some computation
(6 ms)
Some computation
(6ms)
49 Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
With Load, Some BIC Delays Get Larger
Loaded process
Unloaded process
1. Sending an ack process’s responsibility
2. This responsibility was hugely inflated in the loaded case ( 64 us to 23822 us)
3. BIC delays for other receives are similar (receive BIC for other processes)
50 Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
An Algorithm for BIC Delay?
• Can we estimate the total BIC delay from the traffic trace alone ?
• Investigated this question with different approaches.
• Each packet can be of the following types:– 1. Send Packet (SP)– 2. Send Ack (SA)– 3. Receive Packet (SP)– 4. Receive Ack (RA)
• We can pair up consecutive packets to form event pairs SA followed by SP SA-SP
• For 4 event possibilities we have 16 event pairs
51 Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
BIC events
• * - S* event pairs can be classified as BIC events
• Intuitively, – process has either received a packet and is responsible to send the next
one– Just sent a packet and is responsible to send the next one as well
52 Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Computing the BIC delay
• 1. Count all BIC event pairs and sum up their time differentials total BIC delay
• 2. BIC delays partitioned by the recipient IP address– E.g.: RP (175) – SP (173) delay goes into the bucket of 173
• Some special cases and concerns for counting BIC delays are explained in the dissertation (Page 140)
53 Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Using BIC Delays to Measure Application Imbalance
• Get a measure of the local computing/processing time for which its responsible
• Comparing this local time with another unloaded process gives an idea of imbalance for the loaded process
• Gives an idea of how much extra time the loaded process is taking
•The way we compare produces different strategies for computing slowdown
54 Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Using BIC Delays to Measure Application Imbalance
Compute BIC delay
Imbalance Algorithm
Slowdown
55 Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Computing the no-load runtime using BIC delays
•Global BIC imbalance algorithm: – Assumes all processes are executing the same BSP kernel
– 1. Compute the BIC delay for the loaded process
– 2. Select a partner process for comparison
• What if the loaded process was put in the shoes of the partner process ?• Choice of partner process can affect results. Select least loaded for most optimistic
results. Or select one that reflects possible migration conditions
– 3. Compute the “imbalance” by finding the difference in the BIC delays of these two processes
56 Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Global BIC algorithm
57 Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Evaluation with the IS application
Balanced case
Total runtime = 14.36 seconds
Total runtime = 152.67 seconds
Loaded case
Imbalance between 176 and 175 = 124.49 seconds
Actual difference = 138.41 seconds
Using completely black box means, we could determine, how slow is the application running with load,
without knowing about the unloaded case
58 Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
More sophisticated Imbalance Algorithms
• Process-level BIC imbalance algorithm
• A better fit for more dynamic applications like the NAS benchmarks
• Sometimes work done by all processes is not the same
• Algorithm takers into account inter-process level interactions and BIC delays
• Multi-iteration BIC-delay Bias-Aware Imbalance Algorithm
• BIC algorithm for multi-load situations
• Covered in detail in the dissertation
• Gives a range of slowdown (optimistic and pessimistic values)
• Evaluated with MG and IS benchmarks
59 Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Other contributions in the Chapter
• Metrics for computing imbalance amongst processes – Gives idea of heterogeneity experienced by different processes– Draws attention to applications that can use lot of help– Three imbalance metrics
• Two global: Standard Deviation from average, Squared distance• Process Level imbalance metric
Global bottlenecks and Time Decomposition
62 Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Global Bottleneck
• In the BIC discussion, we could identify the slow processes using their BIC delays
• This chapter extends the argument to network and I/O resources
• Looks at the problem of Global Time Decomposition– Where does the application spend its time ?– At network resource and process level
• Time decomposition can point out potential sources of imbalance
• Some metrics output are:– Cumulative Message Latency– Average Latencies observed– Cumulative message transfer time – Average Bandwidth observed
63 Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
BIC analysis for local imbalance
Global Time Decomposition
Measure currentperformance
• Indicates if application is highly imbalanced, • Indicates amount of slowdown, and• Which host to focus on?
• Shows time allocation for network components, and• Which non-host resources to focus on ?
• Is application running at expected speed compared to previous known instances ?• Multiple application case: which application will be slowed down more from load ?
Eliminate a bottleneck using adaptation mechanisms
NO
• Traffic based iteration rate metrics• RIR time series• Dynamic Performance Metrics
• Global BIC imbalance algorithm• Process-level Bias Aware BIC imbalance algorithm• Multi-load BIC imbalance algorithm• Imbalance metrics
• Global Time decomposition techniques for various network metrics• Unix tools to find root case
• Adaptation Mechanisms• Migration • Network overlay links• Routing rules • Network reservation
Flowchart for diagnosis
Chap 3
Chap 4
Chap 5
Appendices A,B
Tools Questions/problems
64 Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Different contributions of dissertation
Virtual Topology and Traffic Inference Framework
Black box metrics for Absolute Performance
Ball in The Court principles to compute application
slowdown
Global Bottlenecks using time decomposition
Increasing Application Performance In Virtual
Environments through Run-time Inference and Adaptation
Free Network Measurement For Adaptive Virtualized Distributed Computing
Inference
Adaptation
2 3
4 5
A B
* An API also defined in the dissertation
65 Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Monitoring and inference
Application performance measure
Adaptation algorithm
Adaptation mechanisms
AdaptationApplications
Optimization metric
1. Overlay topology
2. Forwarding rules
3. VM migration
1. Application Throughput
1. BSP
2. Transactional ecommerce
1. Application throughput
1. VTTIF
2. Network monitoring
1. Single metric
2. Combined metric
66 Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Effect on BSP Application Throughput
For high Compute/Communicate Ratio, Migration + Topology dramatically improves performance
Adapting to External Load Imbalance
External load removed, but I/O still
dominates, so topology helps
External load removed, can drive
higher I/O
67 Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Free Network Measurement For Adaptive Virtualized Distributed Computing
Closest Related Work
• Wood et. al. [178] - Black box and gray box strategies for Virtual Machine Migration, NSDI 2007
• Goal: To improve performance of Stand-alone applications running inside VMs
• Black box and gray box strategies
• Black box externally visible parameters like CPU load, disk swapping activity and network usage
• Focuses on stand-alone applications
• I focus on distributed parallel applications collective properties
Closest Related Work• Aguilera et. al. [14] – Performance debugging for distributed systems of blackboxes, OSP 2003
• Goal: To find components that contribute to high latency amongst critical message paths in a distributed application
• Somewhat analagous to BIC chapter
• Assume constant delay per component
• Unclear how their techniques translate to parallel applications (cyclic computation)
•No estimate of actual slowdown
• Reynolds et. al. [133] – WAP5: black-box performance debugging for wide-area systems, WWW 2006
• Focus on wide area systems now, more focus on the network delay aspects
• Major work focuses on identifying causal relationships between messages
• Assume a one to one message casualty not so in Parallel applications
• Acknowledge that will not work for barrier type communications
70 Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Related Publications
• Gupta, A., Dinda, P. A. Inferring the topology and traffic load of parallel programs running in a virtual machine environment. In Proceedings of the 10th Workshop on Job Scheduling Strategies for Parallel Processing (JSPPS 2004 (June 2004).
• A. Sundararaj, A. Gupta, P. Dinda, Dynamic Topology Adaptation of Virtual Networks of Virtual Machines, Proceedings of the Seventh Workshop on Languages, Compilers and Run-time Support for Scalable Systems (LCR 2004)
• Sundararaj, A. Gupta, P. Dinda, Increasing Distributed Application Performance in Virtual Environments through Run-time Inference and Adaptation, In Proceedings of the 14th IEEE International Symposium on High Performance Distributed Computing (HPDC 2005)
• Ashish Gupta, Ananth Sundararaj, Marcia Zangrilli, Peter Dinda, Bruce B. Lowekamp, Free Network Measurement For Adaptive Virtualized Distributed Computing, In Proceedings of 20th IEEE International Parallel & Distributed Processing Symposium, 2006.
71 Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Other Work
• Robert Schweller et al., Reversible Sketches: Enabling Monitoring and Analysis over High-speed Data Streams, Journal paper in IEEE/ACM Transactions on Networking, 2006.
• Robert Schweller et al., Monitoring Flow-level High-speed Data Streams with Reversible Sketches, In Proceedings of IEEE INFOCOM 2006.
• Robert Schweller, Ashish Gupta, Elliot Parsons, Yan Chen, Reverse Hashing Algorithms for Sketch-based Change Detection on Highspeed Networks:, In Proceedings of ACM SIGCOMM Internet Measurement Conference, October 2004, Taormina, Sicily
• P. Dinda, G. Memik, R. Dick, B. Lin, A. Mallik, A. Gupta, S. Rossoff, The User In Experimental Computer Systems Research, Proceedings of the Workshop on Experimental Computer Science (ExpCS 2007)
• A. Gupta, B. Lin, P. Dinda, Measuring And Understanding User Comfort With Resource Borrowing, In Proceedings of the 13th IEEE International Symposium on High Performance Distributed Computing (HPDC 2004), Honolulu, Hawaii
• Bin Lin, A. Gupta, Peter Dinda, Measuring, Understanding, and Exploiting Direct User Input In Resource Scheduling, Journal paper in submission
Accepted Posters
• Ashish Gupta, Peter Dinda, Fabian Bustamante, Distributed Popularity Indices, Poster Presentation at ACM SIGCOMM 2005, Philadelphia
• Ashish Gupta, Manan Sanghi, Peter Dinda, Fabian Bustamante, Magnolia: a novel DHT architecture for Keyword based search, Poster Presentation at Network System Design and Implementation (NSDI 2005), Boston
• Ashish Gupta et al., Free Network Measurement For Adaptive Virtualized Distributed Computing, Poster Presentation at Supercomputing 2005, Seattle
Sketch-based
reverse hashing
work
User feedback
work
72 Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Components of my dissertation
Virtual Topology and Traffic Inference Framework
Black box metrics for Absolute Performance
Ball in The Court principles to compute application
slowdown
Global Bottlenecks using time decomposition
Increasing Application Performance In Virtual
Environments through Run-time Inference and Adaptation
Free Network Measurement For Adaptive Virtualized Distributed Computing
Inference
Adaptation
1 2
3 4
5 6
Thank you !
Backup slides
75 Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Infrastructure
• Virtuoso cluster, which is an IBM e1350 with 32 compute nodes, each of which is a dual 2.2 GHz Intel HT Xeon Processors (except the management node which is faster), 1.5 GB RAM, and 40 GB of disk.
• The machines also have 1 Gbit interfaces and in some evaluation scenarios in later chapters(Chapters 3, 4, 5), I use 4 of these machines which form a mini-cluster of their own. These machines are then connected via a 100 Mbit switch.
• Virtual Machine Monitor Used:
• I use both VMWare and the popular open source Xen VMM in my evaluations.
• VMWare GSX Server 2.5 (VTTIF evaluation)
• Xen version 3.0.3-rc3-1.2798.f in my evaluations in Linux Kernel release 2.6.18-1.2798.fc6xen. (other chapters)
76 Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Increasing Application Performance In Virtual Environments Through Run-time Inference and Adaptation
Dynamically adapt unmodified applications on unmodified operating systems in virtual environments to available resources
The adaptation mechanisms are application independent and controlled automatically without user or developer help
Demonstrate the feasibility of adaptation at the level of collection of VMs connected by Virtual Networks
Show that its benefits can be significant for two classes of applications
Published in LCR 2004, HPDC 2005
77 Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Informally stated:
Input
• Network traffic load matrix of application
• Topology of the network
Output
• Mapping of VMs to hosts
• Overlay topology connecting hosts
• Forwarding rules on the topology
Such that the application throughput is maximized
Optimization Problem (2/2) Topology + Migration
The algorithm is described in detail in the paper
78 Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Evaluation
Applications
• Patterns: A synthetic BSP benchmark
• TPC-W: Transactional web ecommerce benchmark
Benefits of adaptation (performance speedup)
• Adapting to compute/communicate ratio
• Adapting to external load imbalance
79 Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Effect on BSP Application Throughput
Adapting to Compute/Communicate Ratio
For high Compute/Communicate Ratio, Migration + Topology dramatically improves performance
Less time spent in I/O, so
migration alone is enough
Since I/O dominates, drop in latency
improves performance
Even for small amount of I/O,
it takes up significant time
80 Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
TPCW
No Topology Topology
No Migration 1.216 1.76
Migration 1.4 2.52
Throughput (WIPS) With Image Server Facing External Load
81 Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Free Network Measurement for Adaptive Virtualized Distributed Computing
ADAPTATION : A FOUR STEP PROCESS
1. Automatically infer application demands (network/CPU)
2. Monitor resource availability (bw/latency/CPU)
3. Adapt distributed application for better performance/cost effectiveness
4. Reserve Resources when possible
WREN PAPER, Published in IPDPS 2006
82 Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
What is WREN ?
Developed by my colleague Marcia Zangrilli from WM College
1. Observes incoming/outgoing packets
2. Online analysis to derive latency/bandwidth information for all host pair connections
3. Answers network queries for any pair of hosts
What does it do ?
83 Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
An important Contribution: Problem Formalization
84 Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Two approaches to adaptation
85 Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
VTTIF in a dynamic topology environment
Parameters: Smoothing Window sliding window
duration over which updates are aggregated
Update Rate
Detection Threshold
86 Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Questions
• Are these techniques applicable to other applications ?
• Why do you think they are generic enough even for BSP applications ? How did you test their generality ?
– Because the reasoning for their derivation was based on the BSP model not on any particular application
– Tested on Patterns and then tried on other more complex applications.– VTTIF was used by other components like adaptation
• What modifications would you suggest to the – Network hardware– VMM– Router level ?
• How could you improve time synchronization issues ?
• What is the closed work to you and how is it different from whats out there– Black box inference NSDI 2007– Black box debugging from HP labs
87 Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
• How about applying the techniques beyond BSP applications ?– Distributed apps– P2P– Web apps
• What are the limitations of these techniques ? When will they not work ? What are the assumptions ?
• What is the main research contribution to your work..?
Bin
Evidence of Adaptation Driven by Inference
90 Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Evidence of Adaptation
How can automated inference assist automated adaptation without user
intervention and application knowledge ?
Significant work already done in this area by me
• Using multiple mechanisms for adaptation
• Combining resource availability with application demands to match needs
• Non-scientific apps also investigated (multi-tier web sites)
• More in following slides…
91 Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Informally stated:
Input
• Network traffic load matrix of application
• Topology of the network
Output
• Mapping of VMs to hosts
• Overlay topology connecting hosts
• Forwarding rules on the topology
Such that the application throughput is maximized
Optimization Problem (2/2) Topology + Migration
The algorithm is described in detail in the paper
92 Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Effect on BSP Application Throughput
For high Compute/Communicate Ratio, Migration + Topology dramatically improves performance
Adapting to External Load Imbalance
External load removed, but I/O still
dominates, so topology helps
External load removed, can drive
higher I/O
93 Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Evaluation
Scenario 2 : Large 256 host topology. 32 potential hosts, 8 Virtual Machines
Results for Multi Constraint Cost Function : Bandwidth and Latency
Annealing easy to adapt and finds good mappings compared to heuristic
94 Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
The New Adaptation Process
95 Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Two approaches to adaptation
96 Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Evaluation
Scenario 2 : Large 256 host topology. 32 potential hosts, 8 Virtual Machines
Results for Multi Constraint Cost Function : Bandwidth and Latency
Annealing easy to adapt and finds good mappings compared to heuristic
97 Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Summary of contributions
Virtual Topology and Traffic Inference Framework
Black box metrics for Absolute Performance
Ball in The Court principles to compute application slowdown
Global Bottlenecks using time decomposition
Increasing Application Performance In Virtual
Environments through Run-time Inference and Adaptation
Free Network Measurement For Adaptive Virtualized Distributed
Computing
1. Infer the traffic matrix for a BSP application
2. Spatial Communication Topology of the application
3. Online implementation and low performance overhead
4. Handling dynamic situations and avoiding oscillations
5. Integration with adaptation system
98 Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Summary of contributions
Virtual Topology and Traffic Inference Framework
Black box metrics for Absolute Performance
Ball in The Court principles to compute application slowdown
Global Bottlenecks using time decomposition
Increasing Application Performance In Virtual
Environments through Run-time Inference and Adaptation
Free Network Measurement For Adaptive Virtualized Distributed
Computing
1. How to evaluate BSP application performance under black box conditions ?
2. New set of metrics based on traffic
3. RIR (Round trip Iteration Rate) correlated to the iteration rate
4. Complex metrics for dynamic applications
1. RIRavg
2. RIR-CDF
3. RIR-PS
5. Scheduling using these advanced metrics (Slowdown CDF)
6. Other applications beyond performance
1. Fingerprinting
2. Statistical Scheduling
99 Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Summary of contributions
Virtual Topology and Traffic Inference Framework
Black box metrics for Absolute Performance
Ball in The Court principles to compute application slowdown
Global Bottlenecks using time decomposition
Increasing Application Performance In Virtual
Environments through Run-time Inference and Adaptation
Free Network Measurement For Adaptive Virtualized Distributed
Computing
1. Quantitative estimate of the application slowdown under external load
2. Amount of speedup achievable
3. Can greatly help in adaptation/scheduling decisions predicts the benefit of migration in advance
4. Concept of Ball In the Court delays
5. Using BIC delays to compute imbalance in application
6. Algorithms to compute slowdown
1. Global Imbalance Algorithm
2. Process Level BIC imbalance algo
3. Multi-load BIC imbalance algo
7. Metrics to indicate amount of imbalance in application as a whole
100
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Summary of contributions
Virtual Topology and Traffic Inference Framework
Black box metrics for Absolute Performance
Ball in The Court principles to compute application slowdown
Global Bottlenecks using time decomposition
Increasing Application Performance In Virtual
Environments through Run-time Inference and Adaptation
Free Network Measurement For Adaptive Virtualized Distributed
Computing
1. How can we find the global bottleneck for the application ?
2. Time Decomposition of execution time Helps identify regions of great imbalance
3. Includes the network aspects
4. Many new metrics for latency and bandwidth time components
5. Evaluation with various load scenarios
101
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Summary of contributions
Virtual Topology and Traffic Inference Framework
Black box metrics for Absolute Performance
Ball in The Court principles to compute application slowdown
Global Bottlenecks using time decomposition
Increasing Application Performance In Virtual
Environments through Run-time Inference and Adaptation
Free Network Measurement For Adaptive Virtualized Distributed
Computing
102
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Summary of contributions
Virtual Topology and Traffic Inference Framework
Black box metrics for Absolute Performance
Ball in The Court principles to compute application slowdown
Global Bottlenecks using time decomposition
Increasing Application Performance In Virtual
Environments through Run-time Inference and Adaptation
Free Network Measurement For Adaptive Virtualized Distributed
Computing
106
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Impact
• Further the goal of autonomic computing
• Black box approach impacts huge set of applications
• Help in lower the entry barriers for distributed and parallel autonomic computing
• Shared resources no new tweaks needed to existing applications
•
108
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Application Model
• BSP Parallel applications
• Two benchmarks being considered– Patterns– NAS Benchmarks
Citations
[167] VALIANT, L. A bridging model for parallel computation. Communications of the ACM 33, 8 (Aug. 1990), 103–111.
117
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Contributions
• Understanding the correlation between network traffic and inter-process interactions
• Defining a black box measure of application performance (RIR)– Techniques to derive it
• Advanced metrics for Dynamic Applications and their performance
• Frequency Domain based metrics for greater insight about application
• Applications to advanced scheduling and application fingerprinting
118
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Solving the issues
• Sampling rate:– Compute at a “high enough” sample rate to capture enough high
frequency dynamics
– Compute power spectrum to calculate total energy in the signal (power spectrum is magnitude of Fourier Transform) Integrate the power spectrum
– Sample at a slightly lower rate and see reduction in energy.
– Repeat until we start losing energy beyond a certain threshold (5%) this indicates at this rate we start losing significant part of signal
– If actual sampling rate is much higher (5 to 10 times) than this cutoff point, we are doing alright
Gives us 95% cutoff sampling rate
119
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Solving the issues
120
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Capturing the right time duration
• Again a Power Spectrum based approach
• Helps us determine if our sample size captures most of the energy in the signal
• Ideally we will capture as much as resources allow us
•The process:– After capturing, we test the amount of energy loss, if we shorten the
sample size.– Find the cut-off point as last time– Compare the cut-off sample size and our measured sample size
• After sampling rate and sampling duration is satisfactory,
RIRavg computed by averaging the time series
121
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Effect of load on the RIR time series
1. Expansion effect
2. Basic structure of time series is same
3. Each phase more irregular (more spikes)
4. Captures the time dynamics of the application
122
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
The RIR-CDF metric
• More exhaustive than the RIRavg
• Statistical scheduling
• Application fingerprint
• Performance comparison of runs at more detailed level than RIRavg
Global metric
123
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
The RIR-CDF metric
CDF indicates the RIR region application spends most of its time
in
RIR-CDF can be used to predict which application will be more affected by external load (for
dynamic applications)
Slowdown mapping
I show how we can predict slowdown for MG vs IS
124
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Taking guidance from RIR-CDF : a proof of concept
• CDF indicates the RIR region application spends most of its time in
• Scheduling fact: Depending the scheduling algorithm, affect of load on different RIR regions is different (Govindan et.al.)
• Reason: Scheduling handled differently for CPU bound processes vs I/O bound processes
• “Providing enough CPU is not enough, an equally important consideration is to provide the CPU at the right time”
• Higher RIR more inter-process interaction can use more efficient communication
• Lower RIR more compute intensive or large messages
• How do applications with different RIRs suffer from external load ?
125
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Effect of load on different RIR areas
Difference of performance over 12.46 times at the
extremes under full load
Applications with higher iteration
rates always seem to be more
drastically affected by
external load then those with lower iteration
rates.
126
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Scheduling using the CDF
The slowdown mapping can be complicated and built over a long history. We use results from Patterns benchmark to provide a simpler mapping
Input: RIR value
Output: Fractional slowdown
Assumed load = 100%
More realistic mappings may take bandwidth
and CPU utilization as
input
127
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Scheduling using the CDF
For the IS and MG applications we have seen till now, which application may be hurt more if one of the processes from each application shares the physical host with an external
computational load?
The impact: we can now determine in advance, the impact of external load if we must choose one of these applications to be influenced by the load.
Map each RIR value to its slowdown RIR value
from the mapping
Slowdown CDF
128
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Slowdown CDF
For MG
Slowdown CDF
For IS
More time spent in low
regions
Avg = 0.103
Avg = 0.065
129
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Actual slowdown
• MG 2.4 times
• IS 15.52 times
• IS is affected much more from the load as predicted.
• A better mapping function can yield more powerful slowdown estimates for complex applications.
• Shows how we can make complex decisions for dynamic applications using RIR – CDF.
• Other applications Statistical scheduling ideas– examples
130
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Power Spectrum
• Other applications of the Power Spectrum– Statistical Scheduling decisions– Application categorization– Component separation
• More discussion in dissertation
• Steps in computing the Power Spectrum – numerous measures taken to get a representative power spectrum (Windowing functions, zero padding, eliminating the DC bias)
• Evaluation with another application: IS (Integer Sort) from the NAS benchmarks similar results
• Implementation details and making it work online
133
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Process Level BIC imbalance Algorithm
• A better fit for more dynamic applications like the NAS benchmarks
• Sometimes work done by all processes is not the same
• Algorithm takers into account inter-process level interactions and BIC delays
• Phenomenon of Load Bias also observed A heavily loaded process affects the BIC delay of other unloaded processes too (shifting of work)
• Covered in detail in the dissertation
• Gives a range of slowdown (optimistic and pessimistic values)
• Evaluation with MG– Range of slowdown = [28.67s, 50.78s ]– Actual slowdown = 42.67s