Investigating Survivability Strategies for
Ultra-Large Scale (ULS) Systems
Institute for Software Integrated Systems
Vanderbilt University, Nashville, Tennessee
Jaiganesh [email protected]
www.dre.vanderbilt.edu/~jai
Dr. Aniruddha [email protected]
www.dre.vanderbilt.edu/~gokhale
Dr. Douglas C. Schmidt [email protected]
www.dre.vanderbilt.edu/~schmidt
Dr. Sherif [email protected]
www.isis.vanderbilt.edu/~sherif
Motivating Scenario for ULS
• Impact of Service-Oriented Architectures on enterprise distributed real-time & embedded (DRE) ULS systems
  • Applications composed of an “operational string” of services
  • A service is an assembly of components
  • Dynamic (re)deployment of services into operational strings is necessary
• Performability = performance + survivability requirements
• Key challenges
  • Regulating & adapting to (dis)continuous changes in runtime environments
    • e.g., online prognostics, dependable upgrades
  • Satisfying tradeoffs between multiple (often conflicting) QoS demands
    • e.g., secure, real-time, reliable, etc.
  • Satisfying QoS demands in face of fluctuating and/or insufficient resources
    • e.g., mobile ad hoc networks (MANETs)
Some Performability Challenges for ULS Systems
• Performability challenges in dynamic provisioning of operational strings & services
• Service workloads & resource capacity issues – service placement depends on workloads & available resources
• Service accessibility patterns – service survivability depends on its sharing degree
• Differentiated levels of QoS – affects resource provisioning & survivability strategies
• Operational string & service failover – failover may cover a whole operational string, part of one, or a single service at a time
• No one-size-fits-all dependability strategy – a single survivability strategy cannot be imposed on all services & operational strings
Application performability addressed by resolving service placement & survivability problems
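The dependence of survivability on a service's sharing degree can be made concrete with a small sketch. It assumes, purely for illustration, that each operational string is known as a list of service names; the string and service names below are hypothetical and do not come from the slides.

```python
from collections import Counter

def sharing_degree(operational_strings):
    """Count how many operational strings use each service."""
    counts = Counter()
    for services in operational_strings.values():
        # A set avoids counting a service twice within one string.
        counts.update(set(services))
    return dict(counts)

# Hypothetical operational strings, each an ordered chain of services.
strings = {
    "tracking": ["sensor", "fusion", "display"],
    "targeting": ["sensor", "fusion", "planner"],
    "logging": ["sensor", "archive"],
}

degrees = sharing_degree(strings)
# Services used by more than one string are candidates for stronger protection.
shared = sorted(s for s, d in degrees.items() if d > 1)
```

Here a failure of `sensor` would disturb all three strings, which is one way to read the claim that a service's survivability needs grow with its sharing degree.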
Model of Approach
Model addresses various concerns:
• Per-service concern: Choice of implementation
• Depends on resources, compatibility with other components in assembly
• Coupling concern: Choice of invocation & communication mechanism used
• Sharing concern: Shared services need proactive survivability, since the failure of a shared service affects several services simultaneously
• Failure recovery concern: What is the unit of failover?
• Availability concerns: What is the degree of redundancy? What replication styles to use? Does it apply to whole assembly?
• Deployment concerns: How to select resources? How much sharing?
• Assembly concerns: What components to assemble dynamically? Configurations & optimizations for end-to-end performability?
Service placement & service survivability strategies address these concerns
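One possible way to record the answers to these concerns per service is a small policy record. This is a minimal sketch, not the model used by the authors: the class names, field names, and the two replication styles listed are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from enum import Enum

class FailoverUnit(Enum):
    SERVICE = "service"
    PARTIAL_STRING = "partial_string"
    WHOLE_STRING = "whole_string"

class ReplicationStyle(Enum):
    ACTIVE = "active"
    PASSIVE = "passive"

@dataclass(frozen=True)
class SurvivabilityPolicy:
    service: str
    implementation: str            # per-service concern
    shared: bool                   # sharing concern
    failover_unit: FailoverUnit    # failure recovery concern
    replication_style: ReplicationStyle  # availability concern
    redundancy_degree: int         # replicas in addition to the primary

    def __post_init__(self):
        if self.redundancy_degree < 0:
            raise ValueError("redundancy_degree must be non-negative")

    @property
    def proactive(self):
        # Assumed rule: shared services are handled proactively.
        return self.shared

# Hypothetical policy for a shared service.
policy = SurvivabilityPolicy(
    service="fusion",
    implementation="fusion_fast",
    shared=True,
    failover_unit=FailoverUnit.SERVICE,
    replication_style=ReplicationStyle.PASSIVE,
    redundancy_degree=2,
)
```

Keeping the record immutable means a change of strategy is expressed as a new policy object rather than an in-place edit, which makes redeployment decisions easier to trace.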
Addressing the Service Placement Problem
Service placement algorithms must trade off application performance against application survivability, since resources are allocated either to primaries or to replicas
• Service placement problem must consider:
  • Set of computation nodes attributed by:
    • Processing index or capacity
    • Memory index or capacity
    • Survivability index
  • Set of communication links attributed by:
    • Bandwidth index
    • Survivability index
  • Set of components attributed by:
    • Different implementations offering performance tradeoffs across quality dimensions
    • Different implementations consuming various amounts of resources
    • Constraints on being deployed as an assembly to offer a complete service
• Replica placement issues involve:
  • Different availability requirements for different assemblies of components:
    • Multiple replicas needed; tolerate non-availability of replicas based on importance of assemblies
  • Replica resource provisioning depending on replication schemes used
  • Load balancing of replicas if resources are available, though this introduces run-time consistency problems
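The tradeoff between primaries and replicas can be illustrated with a greedy placement sketch. It is not the placement algorithm from this work: the greedy rule (highest survivability index first), the `Node` and `Component` types, and all numbers are assumptions, and communication links are ignored for brevity.

```python
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    cpu: float            # remaining processing capacity
    mem: float            # remaining memory capacity
    survivability: float  # index in [0, 1], higher is better

@dataclass
class Component:
    name: str
    cpu: float
    mem: float
    replicas: int  # replicas wanted in addition to the primary

def place(components, nodes):
    """Greedy placement: primary plus replicas on distinct nodes.

    Returns {component name: [node names]}, primary first. Replicas that
    do not fit are skipped, so a list may be shorter than requested.
    Raises RuntimeError if a primary cannot be placed.
    """
    placement = {}
    for comp in components:
        chosen = []
        for _ in range(1 + comp.replicas):
            candidates = [
                n for n in nodes
                if n.name not in chosen and n.cpu >= comp.cpu and n.mem >= comp.mem
            ]
            if not candidates:
                break
            # Prefer the most survivable node, then the one with most spare CPU.
            best = max(candidates, key=lambda n: (n.survivability, n.cpu))
            best.cpu -= comp.cpu
            best.mem -= comp.mem
            chosen.append(best.name)
        if not chosen:
            raise RuntimeError(f"no node can host primary of {comp.name}")
        placement[comp.name] = chosen
    return placement

nodes = [Node("n1", 4.0, 8.0, 0.9), Node("n2", 4.0, 8.0, 0.7), Node("n3", 2.0, 4.0, 0.5)]
components = [Component("fusion", 3.0, 4.0, 1), Component("display", 2.0, 2.0, 1)]
result = place(components, nodes)
```

In this example the replica of `fusion` consumes capacity on `n2`, so `display` ends up on the least survivable node and its own replica cannot be placed. That is the kind of performance versus survivability tension a real placement algorithm would have to resolve.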
Addressing the Survivability Problem
• A configurable approach to survivability including micro- (infrastructure) & macro- (assembly & operational string) level strategies
• Micro-level strategies monitor infrastructure state to make proactive decisions at:
  • Component level (swapping & migration)
  • Middleware level (configurations)
  • Component server level (process resource allocations)
  • Node level (multiple components)
• Macro-level strategies monitor assembly health to make failover decisions
  • Failover based on type of failover unit
  • Affects service placement decisions
  • May involve load balancing
  • State synchronization issues
  • Replication styles (hidden by FT strategies)
• Initial prototype developed using Component-Integrated ACE ORB (CIAO) & Deployment & Configuration Engine (DAnCE) (www.dre.vanderbilt.edu)
• Future work on Data Distribution Service (DDS) & Distributed Real-time Specification for Java (DRTSJ)
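The macro-level idea of failing over by failover unit can be sketched as follows. This is an illustrative assumption, not code from the CIAO/DAnCE prototype: the `FailoverUnit` enum, the `failover_set` function, and the service names are hypothetical, and only two of the possible units are modeled.

```python
from enum import Enum

class FailoverUnit(Enum):
    SERVICE = "service"
    WHOLE_STRING = "whole_string"

def failover_set(string_services, failed_service, unit):
    """Return the services to switch to replicas after a failure."""
    if failed_service not in string_services:
        raise ValueError(f"{failed_service} is not part of this operational string")
    if unit is FailoverUnit.WHOLE_STRING:
        # The whole operational string moves to its replicas together.
        return list(string_services)
    # Only the failed service is replaced; the rest of the string stays put.
    return [failed_service]

tracking = ["sensor", "fusion", "display"]
one = failover_set(tracking, "fusion", FailoverUnit.SERVICE)
whole = failover_set(tracking, "fusion", FailoverUnit.WHOLE_STRING)
```

Failing over a single service keeps more of the original deployment in place, while failing over the whole string requires replicas for every service and therefore feeds back into service placement.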