24
Automated Multi-Tier System Design for Service Availability John Janakiraman Jose Renato Santos Yoshio Turner Hewlett Packard Laboratories

Automated Multi-Tier System Design for Service Availability

  • Upload
    buikhue

  • View
    220

  • Download
    2

Embed Size (px)

Citation preview

Page 1: Automated Multi-Tier System Design for Service Availability

Automated Multi-Tier System Design for Service Availability

John JanakiramanJose Renato Santos

Yoshio Turner

Hewlett Packard Laboratories

Page 2: Automated Multi-Tier System Design for Service Availability

page 206/23/2003 DSMS workshop - DSN 2003

Motivation:New Enterprise/Internet Computing Model

• Utility ComputingHP (Adaptive Enterprise), IBM (Autonomic Computing), SUN (N1)

– Shared resources allocated to services on demand• Virtual resources hide details of physical environment

– Self-managing system• Service life-cycle (creation, change, deletion)• Adaptation (load changes, component faults, etc.)

– High level service requirements specification• E.g. desired performance and availability instead of detailed design

• Automated System Design/Configuration– Required for utility computing– Our focus: design for service availability

Page 3: Automated Multi-Tier System Design for Service Availability

page 306/23/2003 DSMS workshop - DSN 2003

access tier

web tier

application tier

database tier

internetinternet internetinternet

access tier

web tier

application tier

database tier

internetinternet internetinternet

allocated physical resourceslogical resources specification

deployment

internetinternet intranetintranet

storage virtualization

network /servervirtualization

utility controller

server poolfirewall pool

switching pool

switching pool

storage pool

internetinternet intranetintranet

storage virtualization

network /servervirtualization

utility controller

server pool

load balancer pool

firewall pool

storage pool

• HP hardware and software solution that enables provision of computing resources to applications on demand.

HP Utility Data Center (UDC)

Page 4: Automated Multi-Tier System Design for Service Availability

page 406/23/2003 DSMS workshop - DSN 2003

Utility Computing Environment

• Higher level service specification– Functional specification– Performance requirement– Availability requirement

• Design/Configuration automation– System determines the required computing

resources for service• type of resources• amount of resources• topology• application and OS configurations

– Dynamically redesign and reconfigure due to changes (load, faults, etc…)

Physical Resources

ResourceAllocation

Virtual Resources

AutomaticDesign and

Configuration

ServiceSpecification

Page 5: Automated Multi-Tier System Design for Service Availability

page 506/23/2003 DSMS workshop - DSN 2003

Automated System Design for Availability

Goal: Automatically explore the space of infrastructure designs and availability mechanisms and select a design that meets availability requirements with minimum cost.

– Numerous availability mechanisms to select/configure (cost/benefit tradeoffs)• host failover, NIC failover, standby/active spare, database

checkpointing, application state checkpointing (on peer, on file, on database), data replication, software rejuvenation, etc…

• Different mechanisms and operating points have different cost, performance overhead and availability characteristics

Page 6: Automated Multi-Tier System Design for Service Availability

page 606/23/2003 DSMS workshop - DSN 2003

• Initial prototype for stand-alone environment• Current Scope:

Application type: Multi-tier services (e.g. 3-tier e-commerce)Availability requirement: Maximum Service Downtime per yearDesign space (limited set):

• Choice of server hw and sw components (no network, no storage)• Number of servers• State of servers (active or spare)• Repair strategies for component failures

• Key challenges: – How to model relevant properties of service and compute infrastructure– How to reason about alternative designs

AVED: Proof of Concept Tool

AVEDService Model

Service Requirements

Infrastructure ModelAvailable

Design

Page 7: Automated Multi-Tier System Design for Service Availability

page 706/23/2003 DSMS workshop - DSN 2003

Modeling ApproachInfrastructure Model• Component types: Basic elements that can fail

– Cost model– A set of failure modes

• A set of repair options for each failure mode (cost/benefit)Example: lp2000r(2-way x86 server) , rp8400 (8-way PA-RISC

server), Linux, WebLogic, etc.• Resource types: Unit of provisioning

– Set of componentsExample: App server = lp2000r+Linux+WebLogic

Service Model• Set of tiers

– Set of valid resource alternatives for each tier– Performance model for each tier

Service Requirements• Expected load (peak)• Maximum Annual Downtime

– System is assumed down when resources cannot sustain expected peak load

Component

Component

Component

Resource

Tier

Tier

Tier

Service

Page 8: Automated Multi-Tier System Design for Service Availability

page 806/23/2003 DSMS workshop - DSN 2003

AVED architecture

• Two availability evaluation engines supported:– AVANTO (HP availability

simulation tool)– Simple Markov model

Design Generator

Translatoravailability

model

AvailabilityEvaluation

Engine

Translator

Service Model Infrastructure Model

final design

candidate design

final design

AVED

availabilityestimate

Output

Service Requirement

GUI REPOSITORY

Page 9: Automated Multi-Tier System Design for Service Availability

page 906/23/2003 DSMS workshop - DSN 2003

Illustrating the use of AVED

• Application tier scenario

• Published component failure rates (hardware)

• Costs based on vendor published prices

• Reasonable assumptions for unavailable information– E.g. SW failure rates

Page 10: Automated Multi-Tier System Design for Service Availability

page 1006/23/2003 DSMS workshop - DSN 2003

Infrastructure ModelComponent Types

Infrastructure ModelResource Types

Service Model

Output DesignServiceRequirements

AVED GUI

Page 11: Automated Multi-Tier System Design for Service Availability

page 1106/23/2003 DSMS workshop - DSN 2003

Page 12: Automated Multi-Tier System Design for Service Availability

page 1206/23/2003 DSMS workshop - DSN 2003

Page 13: Automated Multi-Tier System Design for Service Availability

page 1306/23/2003 DSMS workshop - DSN 2003

Page 14: Automated Multi-Tier System Design for Service Availability

page 1406/23/2003 DSMS workshop - DSN 2003

Page 15: Automated Multi-Tier System Design for Service Availability

page 1506/23/2003 DSMS workshop - DSN 2003

Page 16: Automated Multi-Tier System Design for Service Availability

page 1606/23/2003 DSMS workshop - DSN 2003

0.1

1

10

100

1000

10000

0 500 1000 1500 2000 2500 3000 3500 4000 4500 5000

Ann

ual D

ownt

ime

Lim

it [m

ins]

Units of load (service specific)

1 - lp2000r/linux/AS-A, bronze, no spare2 - lp2000r/linux/AS-A, silver, no spare3 - lp2000r/linux/AS-A, gold, no spare4 - lp2000r/linux/AS-B, gold, no spare5 - lp2000r/linux/AS-A, platinum, no spare6 - lp2000r/linux/AS-A, bronze, 1 cold spare7 - lp2000r/linux/AS-B, bronze, 1 cold spare8 - lp2000r/linux/AS-B, silver, 1 cold spare9 - lp2000r/linux/AS-A, bronze, 1 active spare10 - lp2000r/linux/AS-A, silver, 1 active spare11 - lp2000r/linux/AS-A, gold, 1 active spare12 - lp2000r/linux/AS-B, gold, 1 active spare13 - lp2000r/linux/AS-A, platinum, 1 active spare14 - lp2000r/linux/AS-B, platinum, 1 active spare15 - lp2000r/linux/AS-A, bronze, 2 active spare16 - lp2000r/linux/AS-A, silver, 2 active spare17 - lp2000r/linux/AS-A, bronze, 3 active spare

Example of AVED use – Results

• Identifies optimal solution for a range of service requirements: load and maximum annual downtime

Design 9 optimal for requirement (1000 load units, 100minutes max

downtime)

Design 10 optimal for requirement (2000 load units, 100minutes max

downtime)

Page 17: Automated Multi-Tier System Design for Service Availability

page 1706/23/2003 DSMS workshop - DSN 2003

Example of AVED use – Results

• Availability/Cost tradeoff– e.g., relaxing downtime requirement from 1.5 min/yr to 2.5 min/yr

in a design for 800 load units reduces cost from $9500 to $6600

0

2000

4000

6000

8000

10000

12000

14000

0.1 1 10 100 1000 10000

Add

ition

al A

nnua

l Cos

t for

Ava

ilabi

lity

Downtime (minutes)

Load= 400 unitsLoad= 800 units

Load= 1,600 unitsLoad= 3,200 units

Page 18: Automated Multi-Tier System Design for Service Availability

page 1806/23/2003 DSMS workshop - DSN 2003

Automated Design in Self-Managing System

self-managing system

monitoringinfrastructure

automatedruntime mgmt

managed infrastructure

automateddeployment

infrastructurerepository

resource allocationengine

servicedescriptions

automated designengine

deploy /configure

monitor

service specification

automated designengine

Page 19: Automated Multi-Tier System Design for Service Availability

page 1906/23/2003 DSMS workshop - DSN 2003

Integrating AVED w/ automatic deployment

• Automatic OS deployment using network boot (PXE)• Automatic application deployment/configuration • Automatic configuration of failure detection, repair and

failover mechanisms • Extensions for closed-loop adaptive operation

– Integrate monitoring mechanisms (load, performance, recovery time, failure rates)

– Adapt AVED for incremental design/configuration changes

Page 20: Automated Multi-Tier System Design for Service Availability

page 2006/23/2003 DSMS workshop - DSN 2003

Conclusion

• AVED: proof of concept tool that automates the design and configuration of availability mechanisms– Important building block in utility computing environment– Also useful in stand-alone mode

• Future work– Extended cost models– More complex performance models

• Experimental models based on monitoring– Factor business cost of downtime– Alternative availability metrics

• Outage duration, degraded performance levels, etc.– Support for storage and network resources– Failure dependencies/correlations

Page 21: Automated Multi-Tier System Design for Service Availability
Page 22: Automated Multi-Tier System Design for Service Availability

page 2206/23/2003 DSMS workshop - DSN 2003

0

500

1000

1500

2000

0 1 2 3 4 5 6 7 8

Ann

ual D

ownt

ime

(min

utes

)

OS boot time (minutes)

Load= 3,200 unitsLoad= 1,600 units

Load= 800 unitsLoad= 400 units

Example of AVED use – Results

• downtime is sensitive to repair parameters, e.g., the OS boot time.– sensitivity analysis necessary for parameters with unreliable values– can select designs with lower sensitivity to such parameters

Page 23: Automated Multi-Tier System Design for Service Availability

page 2306/23/2003 DSMS workshop - DSN 2003

parametersused in

example

• design choices: type of machine/OS/application server resource, number of extra machines, state of extra machines (cold or active), repair option

N/A$030 secReset150 daysTransient$93500$85000Machine B M-B2 min$380/node38 hrsBronze1300 daysPermanent

$580/node15 hrsSilver

$750/node8 hrsGold

$1500/node6 hrsPlatinum

$1500/node6 hrsPlatinum

$750/node8 hrsGold

$580/node15 hrsSilver

2 min$380/node38 hrsBronze650 daysPermanent

N/A$030 secReboot90 daysCrash$0$2000App Server AS-B

N/A$02 minReboot30 daysCrash$0$1700App Server AS-A

N/A$04 minReboot365 daysCrash$0$200UnixN/A$02 minReboot60 daysCrash$0$0Linux

N/A$030 secReset75 daysTransient$2640$2400Machine A lp2000r

Failover TimeRepair CostMTTRRepair OptionMTBFFailure TypeCost ActiveCost ColdComponent

Cluster?

true251600 unitsM-B/Unix/AS-B

true25200 unitslp2000r/Linux/AS-B

true251600 unitsM-B/Unix/AS-A

true25200 unitslp2000r/Linux/AS-A

Max NodesNode capacity

Performance ModelResource

Inputs:Component Behaviors &

Costs

Inputs:Service

Characteristics

Page 24: Automated Multi-Tier System Design for Service Availability

page 2406/23/2003 DSMS workshop - DSN 2003

Relative cost

0

20

40

60

80

100

0.1 1 10 100 1000 10000Add

ition

al A

nnua

l Cos

t for

Ava

ilabi

lity

(%)

Downtime (minutes)

Load= 400 unitsLoad= 800 units

Load= 1,600 unitsLoad= 3,200 units