Autonomic Runtime System: Design and Evaluation for SAMR Applications *
Salim Hariri
High Performance Distributed Computing Laboratory, The University of Arizona
http://www.ece.arizona.edu/~hpdc
* Supported by: NSF, DOE, DARPA, Intel, Raytheon and AOL grants

Outline
- Motivation and objectives
- Autonomia: An Autonomic Control and Management Environment
- Self-Optimization
- Self-Protection
- Concluding Remarks

Information Technology and Biology Convergence
- Our system design methods and management tools seem to be inadequate for handling the complexity, size, and heterogeneity of today's and future information systems.
- Biological systems have evolved strategies to cope with dynamic, complex, and highly uncertain constraints.

Current Design and Development of Computing Systems
- Different fields evolved separately, each targeting a few domains or applications.

New System Construction: Part-to-Whole Approach
- Adds complexity
- High cost
- Interoperability issues

Autonomic Computing System: Holistic Approach
[Figure labels: Self-Healing Component, Self-Optimizing Component, Self-Configuring Component, Self-Protecting Component, Autonomic Building Block, Secure Fault-Tolerant System, High-Performance Fault-Tolerant System, Autonomic Computing Systems.]

Autonomia: An Autonomic Control and Management Environment
- Provide dynamically programmable control and management services to support the development and deployment of autonomic applications
- Provide Autonomic Runtime Services (self-healing, self-configuring, self-protecting, self-optimizing)
- Provide automated deployment, registration, and discovery of autonomic components
- Provide automated configuration of autonomic applications and system resources

[Figure: Autonomia architecture. Labels: Users; Application Management Editor; AIK Repository (ACA Specifications, Policy, Component State, Resource State); Autonomic Runtime System with Application Runtime Manager (ARM), Event Server, Coordinator, Monitoring & Analysis Engine, Planning Engine, Scheduling Engine, Policy Engine, and Knowledge; Autonomic Runtime Services (Self-Protecting, Self-Optimizing, Self-Healing, Self-Configuring); High Performance Computing Environment (HPCE) hosting one Virtual Execution Environment per application, VEE(App1) to VEE(Appn), each containing ACAs made of a Component Runtime Manager (CRM) and a computational component.]

Autonomia Process Flow
[Figure: the same architecture diagram, annotated with numbered steps 1 to 4 that mark the process flow.]

Application Execution

Self-Optimizing: Design and Evaluation

Current Implementations Intractable for Large Problems

[Figure: Wildfire Autonomic Runtime Manager (WARM) in the dynamic data-driven wildfire modeling environment. Labels: Wild Fire Model Development Environment; Georeferenced Distributed DB; Dynamic Data Driven Wildfire Model; Analysis Objectives; data sources (Sensors, Survey Flights, GPS, Satellite); inputs (Regional Weather, Terrain Characteristic, Local Weather: temperature, humidity, wind speed, wind direction, clouds, precipitation, lightning; Fire Behavior: location, intensity, geometry, propagation; Fuel Conditions; Smoke: locations and concentration; Firefighting Activities); WARM modules (Application State Monitor, Resource State Monitor, Natural Region Characterization, System Capability Module: CPU, memory, bandwidth, availability, access policy; Resource History Module, Active Performance Model, Planning Engine, Knowledge Repository, Autonomic Scheduling with Virtual Computation Units (VCU) and Virtual Resource Units (VRU)); natural regions NR1 Burned, NR2 Burning, NR3 Unburned; Heterogeneous, Dynamic Computational Environment (IBM SP2, Beowulf Linux cluster, MPP); Execution; actual versus predicted performance.]

Forest Fire Cell Space: Dynamic Repartitioning
[Figure: initial partitioning of the cell space into natural regions (NR2, NR3, NR5); the burning zone (NR2) gets finer gridding and the burned zone gets coarser gridding.]

Wild Fire Simulation Physics
- The entire area is represented as a 2-D cell space. Weather and vegetation conditions are assumed to be uniform within a cell but may vary across the cell space.
- When a cell is ignited, its state changes from unburned to burning. During its burning phase, the fire propagates to its eight neighbors along the eight directions.
- As the simulation time advances, the fire propagates from the first ignition cell to other cells.
[Figure: a cell and its eight propagation directions.]
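The cell-space rule above can be sketched in a few lines of Python. This is only an illustration under a strong simplifying assumption that is not in the slides: a burning cell ignites all eight unburned neighbors in a single time step and then becomes burned, whereas the actual model derives the spread rate in each direction from weather and vegetation. The names `step`, `burning_cells`, and the state constants are hypothetical.

```python
UNBURNED, BURNING, BURNED = 0, 1, 2

# Offsets to the eight neighbors (N, NE, E, SE, S, SW, W, NW).
NEIGHBORS = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]


def step(grid):
    """Advance the cell space by one time step and return the new grid."""
    rows, cols = len(grid), len(grid[0])
    new = [row[:] for row in grid]
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != BURNING:
                continue
            # Assumption: a cell burns for exactly one time step.
            new[r][c] = BURNED
            for dr, dc in NEIGHBORS:
                nr, nc = r + dr, c + dc
                inside = 0 <= nr < rows and 0 <= nc < cols
                if inside and grid[nr][nc] == UNBURNED:
                    new[nr][nc] = BURNING
    return new


def burning_cells(grid):
    """Number of burning cells, i.e. the computational load of this step."""
    return sum(cell == BURNING for row in grid for cell in row)


grid = [[UNBURNED] * 5 for _ in range(5)]
grid[2][2] = BURNING          # first ignition cell
grid = step(grid)
print(burning_cells(grid))
```

Because the number of burning cells changes from step to step and from region to region, a static partition of the grid gives processors unequal work, which is the imbalance the following slides address.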

Parallel Wild Fire Simulation Analysis
[Figure: composition of the execution time at time step t for 4 processors.]
- To decrease T(t), make the computation time on each processor as even as possible, which minimizes the synchronization time.
- The Imbalance Ratio (IR) characterizes the degree of imbalance.
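The slide's formula for the imbalance ratio did not survive extraction, so the sketch below assumes one plausible definition: the spread between the slowest and fastest processor relative to the fastest, in percent. Both that definition and the names `imbalance_ratio` and `step_time` are assumptions for illustration.

```python
def imbalance_ratio(comp_times):
    """Imbalance ratio in percent for one time step.

    Assumed definition (not taken from the slides):
    (max - min) / min * 100 over the per-processor computation times.
    """
    fastest, slowest = min(comp_times), max(comp_times)
    if fastest <= 0:
        raise ValueError("computation times must be positive")
    return (slowest - fastest) / fastest * 100.0


def step_time(comp_times, comm_time=0.0):
    """Time of a synchronized step: every processor waits for the slowest."""
    return max(comp_times) + comm_time


times = [10.0, 12.0, 15.0, 20.0]   # hypothetical computation times, 4 processors
print(imbalance_ratio(times))
print(step_time(times))
```

With perfectly even computation times the ratio is zero and no processor idles at the synchronization point.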

Fire Simulation Example
[Figure: partition of the fire simulation at t = 1, t = N, and t = 2N.]
The example shows the imbalance ratio at different time steps. As the simulation advances, the imbalance gets worse.

Self-Optimization
- Monitors the state of the fire simulation to obtain the computation load at any time step.
- Monitors the state of the underlying system to obtain the computation capacity.
- Monitors the imbalance ratio at any time step. If the imbalance ratio is larger than a given threshold, the workload is dynamically redistributed among processors at run time.

Self-Optimization Algorithm
1. Obtain the total workload at time t.
2. Estimate the computation time of one burning cell on processor p, taking the system load into account, where L(p,t) is the length of the CPU queue on processor p at time t.
3. Calculate the average execution time of one burning cell.

Self-Optimization Algorithm (contd)
4. To balance the load on each processor, the processor allocation factor (PAF) is defined as inversely proportional to the processor execution time relative to the average execution time.
5. Calculate the Processor Load Ratio (PLR), which characterizes the capacities of the processors. Note that: [equation]
6. Calculate the workload assigned to processor p at time step t, workload(p,t).
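The steps of the algorithm can be sketched as follows. The slide equations were lost, so every formula here is an assumption chosen to match the prose: the time for one burning cell grows linearly with the CPU queue length, PAF is the average cell time divided by the processor's cell time, PLR is PAF normalized to sum to one, and the workload is PLR times the total number of burning cells. The function name `partition_workload` and the parameter `base_cell_time` are hypothetical.

```python
def partition_workload(total_cells, queue_lengths, base_cell_time=1.0):
    """Split total_cells burning cells among processors by their current load.

    queue_lengths[p] plays the role of L(p,t), the CPU queue length on
    processor p. Returns (plr, shares), with shares summing to total_cells.
    """
    # Assumed load model: a cell takes (queue length + 1) times longer.
    cell_times = [base_cell_time * (q + 1) for q in queue_lengths]
    avg_time = sum(cell_times) / len(cell_times)

    # PAF: inversely proportional to the processor's time relative to the average.
    paf = [avg_time / t for t in cell_times]

    # PLR: normalized PAF, so faster processors get a larger fraction.
    total_paf = sum(paf)
    plr = [f / total_paf for f in paf]

    # Whole cells only: round down, then give the leftover cells to the
    # processors with the largest PLR.
    shares = [int(total_cells * r) for r in plr]
    leftover = total_cells - sum(shares)
    by_capacity = sorted(range(len(plr)), key=lambda p: plr[p], reverse=True)
    for p in by_capacity[:leftover]:
        shares[p] += 1
    return plr, shares


plr, shares = partition_workload(1000, queue_lengths=[0, 1, 3, 0])
print(shares)
```

In this sketch the average time cancels out when PAF is normalized, so PLR depends only on the relative per-cell times; it is kept as a separate step to mirror the order of the slides.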

Fire Simulation Example with Self-Optimization Algorithm
[Figure: partition of the fire simulation at t = 1, t = N, and t = 2N.]
With the self-optimization algorithm, the imbalance is dramatically reduced.

Wildfire Autonomic Runtime Manager
[Figure: WARM control loop with numbered steps 1 to 8. Labels: Online Monitoring and Analysis (Application State Monitor, Resource State Monitor, Resource History Module); Online Planning (Active Performance Model, Planning Engine, Knowledge Repository, System Capability Module: CPU, memory, bandwidth, availability, access policy); Autonomic Scheduling (Scheduler, Virtual Computation Units, Virtual Resource Units); natural regions NR1 Burned, NR2 Burning, NR3 Unburned; Execution (DDWM VRUs) on a Heterogeneous, Dynamic Computational Environment (IBM SP2, Beowulf Linux cluster, MPP).]

Experimental results
- Problem size is 64K and the number of processors is 8.
- With self-optimization, the imbalance ratio is kept close to the threshold.
- Without self-optimization, the imbalance ratio grows as the simulation advances.

Experimental results (contd)
- Problem size is 64K and the number of processors is 8.
- Without self-optimization, the per-processor execution times for one time step diverge as the simulation advances.
- With self-optimization, the per-processor execution times for one time step stay almost evenly distributed as the simulation advances.

Experimental results (contd)

Problem size 256*256 = 64K
Processors   Static partition (s)   Dynamic partition (s)   Improvement
8            2441.88                1540.58                 36.91%
16           1824.43                1132.79                 37.91%

Problem size 512*512 = 256K
Processors   Static partition (s)   Dynamic partition (s)   Improvement
8            16868.04               11244.40                33.34%
16           11121.66               7859.89                 29.33%
32           9093.39                6092.23                 33%
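The improvement column is consistent with the usual relative reduction in execution time, (static - dynamic) / static. A short check on the 64K rows, with the helper name `improvement` chosen only for this example:

```python
def improvement(static_s, dynamic_s):
    """Relative reduction in execution time, in percent."""
    return (static_s - dynamic_s) / static_s * 100.0


# (processors, static partition time, dynamic partition time) for the 64K problem
rows_64k = [(8, 2441.88, 1540.58), (16, 1824.43, 1132.79)]
for processors, static_s, dynamic_s in rows_64k:
    print(processors, f"{improvement(static_s, dynamic_s):.2f}%")
```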

Memory-based Proactive Runtime Partitioning
- Optimize performance using a memory-based approach: minimize the number of page faults and balance work among processors.
- Memory function model for RM3D: [equation], where W is the application workload and the a_i are PF-based heuristics.
- Memory-based processor grouping and workload partitioning:
  - Processors are placed in lightly (X-), moderately (X), or heavily (X+) loaded groups based on a 2-level threshold, with N-, N, and N+ processors respectively.
  - Work in group X- is transferred to X+, with the unit of work being [equation].
  - Processors in X+ are sorted in ascending order of available memory.
  - Checks are made for the processors with the corresponding least available memory.
  - Threshold conditions for work transfers must be met.
- After work transfers, new memory-based work partitioning ratios are computed as [equation].

Memory-based Proactive Runtime Partitioning
- Better performance in moderately and heavily loaded scenarios:
  - Most processors have less available memory.
  - Frequent page faults result in long application delays.
  - The memory-based algorithm yields better performance.

Memory-based proactive adaptation performance gain for the RM3D application with base grid size 128*32*32 on 8 processors

Scenario            Without memory adaptation (s)   With memory adaptation (s)   Improvement
Lightly loaded      6922.14                         5210.87                      24.72%
Moderately loaded   15890.47                        7401.61                      53.42%
Heavily loaded      16962.1                         8284.84                      51.16%

CPU-based Proactive Runtime Partitioning
- The adaptive system-sensitive partitioner uses system capacities and the obtained performance function to compute the relative computational capacity of each processor.
- System capacity calculation: for N processors, the total work to be assigned is L.
- The runtime monitors application and system state:
  - Application state: level of refinement; number, shape, and aspect ratio of refined patches
  - System state: computational load, memory availability, link bandwidth
- The performance engine selects the appropriate performance function (PF) to predict the execution time of the application on processor k for the next time step.
- The PF of RM3D on processor k for a given load X1 and AMR level X2 is empirically defined as: [equation]

CPU-based Proactive System-Sensitive Runtime Partitioning

CPU-based proactive partitioning performance gain on 16 processors (base grid size: 64*16*16)

Scenario            Without CPU adaptation (s)   With CPU adaptation (s)   Improvement
Lightly loaded      2126.06                      727.17                    65.8%
Moderately loaded   2301.15                      1641.73                   28.66%
Heavily loaded      2378.25                      1624.15                   31.71%

Autonomia Self-Healing
[Figure: self-healing architecture. Labels: User application; Application Management Editor; AIK; Autonomic Runtime System; Application Runtime Manager; Autonomic Middleware Services; Self-Healing Service; Application Fault Manager; Component Fault Manager; self-healing monitoring and analyzing engine (monitoring, analyzer); self-healing planning and execution engine (planning, execution); Knowledge; Event server; Mobile Agent System; Heterogeneous Environment.]

Self-Healing Engine

Self-Protection Methodology
[Figure labels: Online Monitoring; Adaptive Analysis; Self-Healing Engine; Data Mining; Statistics Engine; Real Network Running Environment.]

Measurement Attributes for Different Protocols
Inside a network element, the measurement attributes can be monitored at different protocol layers. During an attack (DoS attack, SQL Slammer worm, email worm, etc.), significant changes in behavior are observed.

Layer         Impacted protocols       Measurement attributes
Application   HTTP, DNS, SMTP, POP3    IF: invocation frequency of emails; NIP/NOP: number of incoming/outgoing PDUs
Transport     TCP/UDP                  NIP/NOP: number of incoming/outgoing PDUs
Network       IP/ICMP/ARP              NIP/NOP: number of incoming/outgoing PDUs; AR: ARP request rate

Observed behaviors, compared with the normal scenario:
- IF increases by 2 or 3 orders of magnitude
- NIP/NOP increases by 1 to 2 orders of magnitude
- AR increases by 1 or 2 orders of magnitude

Illustrative Network Example
- Router-to-router links are 100 Mbps; router-to-client-node links are 30 Mbps and 10 Mbps.
- Client networks: 150 clients, 30 routers.
- Server networks: 12 routers and 30 servers.

Traffic configuration:
- Legitimate client traffic through the same interface as attack traffic, to other servers
- Legitimate client traffic through a different interface, to the attacked server
- Legitimate client traffic through the same interface, to the attacked server and towards attack targets
- Legitimate server traffic (heavy) through a different interface, towards other clients
- Attack traffic

[Figure: topology with Client Net 0, Client Net 1, Client Net 2, Client Net 3, Server Net 1, and Server Net 2.]

Abnormality Distance (AD)
The Abnormality Distance of a measurement attribute is used as an abnormality metric for profile modeling of component behavior: [equation], where the mean and variance are those observed under normal operating conditions for the online measurement of attribute k.
[Figure: AD of tcp_out versus packet number, based on a single measurement attribute. A larger AD magnitude indicates abnormal behavior that might be due to an attack.]
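The AD formula itself is missing from the extracted slide. As an illustration only, the sketch below assumes a variance-normalized squared deviation from the normal-operation mean, which matches the quantities the text names (mean, variance, online measurement). The function names and the sample values are hypothetical.

```python
import statistics


def fit_profile(normal_samples):
    """Mean and variance of an attribute observed under normal operation."""
    mean = statistics.fmean(normal_samples)
    variance = statistics.pvariance(normal_samples, mu=mean)
    return mean, variance


def abnormality_distance(x, mean, variance):
    """Assumed form of AD: squared deviation from the mean over the variance."""
    if variance <= 0:
        raise ValueError("variance must be positive")
    return (x - mean) ** 2 / variance


# Hypothetical outgoing TCP segment rates measured during normal operation.
mean, variance = fit_profile([100.0, 102.0, 98.0, 101.0, 99.0])
print(abnormality_distance(100.0, mean, variance))
print(abnormality_distance(110.0, mean, variance))
```

A measurement at the profile mean gives a distance of zero, and the distance grows quadratically as the measurement moves away from the normal profile.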

Multivariate Analysis Techniques on Network Attack Detection
Measurement attributes:
- tcpOut: rate of legitimate outgoing TCP segments
- tcpTotal: rate of legitimate plus spoofed outgoing TCP segments
- NRC: Normal Region Center, the baseline profile for the normal state
- AD: Abnormality Distance
[Figure: tcpOut versus tcpTotal plane. Labels: UCL tcpout, LCL tcpout, UCL tcptotal, LCL tcptotal, Normal Region, NRC, A, AD.]

Validation on Attacker Side
Spoofed TCP SYN attack:
- Attack intensity and duration are adjustable.
- TCP SYN attack traffic is spoofed.
- The number of incoming/outgoing packets alone will not reveal the attack.
- Analyzing it jointly with the total TCP network activity can reveal the attack.

Autonomia Self-Protection Architecture
[Figure labels: Online Monitoring (raw traffic with respect to metric 1, metric 2, ..., metric n); Analysis Engine (Information Theory; abnormality function with respect to metrics 1..m; Normal/Abnormal Characterization); Autonomic Runtime Engine; Policy Translator; actions: Change Network Topology, Change Network Configuration Parameters.]

Working Flow of the Analysis Engine
1. Information theory is used to identify the most important features that can be extracted from network data.
2. A genetic algorithm is trained on the data to obtain the threshold and coefficients used by the linear rule for detection.
3. The threshold and coefficients are used to detect a wide range of attacks during the testing period.
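Step 3 of the flow above can be sketched as a weighted sum compared against a threshold. The slides do not give the exact form of the linear rule, so this form, the function names, and the sample values are assumptions; the genetic algorithm itself is not implemented here, only the fitness function that such a search could maximize.

```python
def linear_rule(features, coefficients, threshold):
    """Flag a record as an attack when the weighted feature sum exceeds the threshold."""
    score = sum(w * x for w, x in zip(coefficients, features))
    return score > threshold


def fitness(coefficients, threshold, labelled_samples):
    """Fraction of labelled samples the rule classifies correctly.

    labelled_samples is a list of (features, is_attack) pairs. A genetic
    algorithm would search for the coefficients and threshold that
    maximize this value on the training data.
    """
    correct = sum(
        linear_rule(features, coefficients, threshold) == is_attack
        for features, is_attack in labelled_samples
    )
    return correct / len(labelled_samples)


# Hypothetical normalized features and labels.
samples = [([1.0, 0.5], True), ([0.1, 0.2], False)]
print(fitness([0.6, 0.4], 0.5, samples))
```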

Network Attack Feature Extraction

Discrete features, mutual information I(X;Y) between feature X and class Y:

Total Dataset
Feature (X)      I(X;Y)
is_hot_login     0
land             0
root_shell       0
su_attempt       0
is_guest_login   0.006
flag             0.062
protocol_type    0.304
logged_in        0.381
service          0.571

DoS + Normal
Feature (X)      I(X;Y)
is_hot_login     0
land             0
root_shell       6e-06
su_attempt       5.3e-6
is_guest_login   0.0018
flag             0.0629
protocol_type    0.3116
logged_in        0.3931
service          0.5927

U2R + Normal
Feature (X)      I(X;Y)
is_guest_login   0
is_hot_login     0
su_attempt       0
land             0
logged_in        5.2e-5
protocol_type    7.3e-5
flag             0.0001
root_shell       0.003
service          0.003

R2L + Normal
Feature (X)      I(X;Y)
is_hot_login     0
land             0
su_attempt       2.8e-5
root_shell       0.0002
logged_in        0.0021
flag             0.0033
protocol_type    0.0039
is_guest_login   0.0144
service          0.0505

Probe + Normal
Feature (X)      I(X;Y)
is_hot_login     0
land             0
su_attempt       7e-06
root_shell       1.4e-5
is_guest_login   0.0022
protocol_type    0.0386
logged_in        0.0701
flag             0.0807
service          0.1243

Discrete features:
- The base dataset has a larger sample size.
- Discrete features provide little semantic information.

Network Attack Feature Extraction (Cont.)

Discrete Features on Total Dataset
Feature (X)      I(X;Y)
service          0.571
logged_in        0.381
protocol_type    0.304
flag             0.062
is_guest_login   0.006
su_attempt       0
root_shell       0
land             0
is_hot_login     0

Continuous Features on Total Dataset
Feature (X)          I(X;Y)
count                0.613353
dst_bytes            0.504773
srv_count            0.326754
src_bytes            0.282306
same_srv_rate        0.079569
srv_serror_rate      0.066003
serror_rate          0.061391
dst_host_count       0.053339
duration             0.050635
dst_host_srv_count   0.024559
num_root             0.002558
rerror_rate          0.001

Continuous features:
- Compared with the discrete features, some continuous features provide more information for the final detection.
- The information provided by the continuous features is much more meaningful.
- A partition strategy is used to discretize the continuous features.
- Heuristic algorithms (e.g. a genetic algorithm) are used to determine the optimal partition.
- Combining discrete and continuous features provides a better detection rate.
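The I(X;Y) values in these tables are mutual information scores between a feature X and the class label Y. A minimal sketch of the standard estimate from empirical frequencies follows; the base-2 logarithm (bits) is an assumption, since the slides do not state the unit, and the toy records are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y) in bits for two discrete sequences."""
    n = len(xs)
    count_x, count_y = Counter(xs), Counter(ys)
    count_xy = Counter(zip(xs, ys))
    # Sum of p(x,y) * log2(p(x,y) / (p(x) * p(y))) over observed pairs.
    return sum(
        (c / n) * log2(c * n / (count_x[x] * count_y[y]))
        for (x, y), c in count_xy.items()
    )


# Toy records: the protocol fully determines the label in this example.
protocol = ["tcp", "tcp", "udp", "udp"]
label = ["attack", "attack", "normal", "normal"]
print(mutual_information(protocol, label))
```

A feature that is independent of the label scores zero, which is how features such as `land` and `is_hot_login` end up at the bottom of the ranking. Continuous features have to be discretized into partitions before this estimate applies.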

Experimental Results
We compare our approach, which is based on discrete features, with a fuzzy classifier evolved using Ctree and with the winning entry of the KDDCup99 contest.

Class    Our Approach   Ctree    Winner Entry
Normal   98.34%         92.78%   99.5%
DoS      99.33%         98.91%   97.1%
U2R      63.64%         88.13%   13.2%
R2L      5.86%          7.41%    8.4%
PROBE    93.95%         50.35%   83.3%

Results: Discrete vs. Continuous & Combined
We compare the results of using discrete and continuous features respectively.

Class    Discrete features   Continuous features   Third column (unlabeled)
Normal   98.34%              98.45%                99.98%
DoS      99.33%              99.93%                99.98%
U2R      63.64%              75.34%                98%
R2L      5.86%               41.34%                80%
PROBE    93.95%              99.91%

Summary and Concluding Remarks
- Increased complexity, heterogeneity, uncertainty, and scale require new paradigms to design, control, and manage systems and applications.
- Systems and applications need to operate reliably, securely, efficiently, and cost-effectively.
- A holistic approach is needed that can dynamically integrate and address all these issues simultaneously at all layers of the system and application hierarchy.
- Autonomic computing provides an interesting, pragmatic approach to address these issues.
- Many challenges are ahead, including:
  - composing and analyzing in real time the operations and states of systems and applications
  - new bio-inspired metrics that accurately characterize and quantify the normal and abnormal states of systems and applications
