
Azure Architecture Patterns change IT approaches to T<solutions>

Ulrich (Uli) Homann

Partner Software Architect, Microsoft

DCIM-B214

Cloud Data Center – Chicago Designed 2007/ Opened 2009

Generation 2 Deployment (SLA 99.999)

Generation 3 Deployment (SLA 99.9)

Physical Redundancy: N+2, Tier 3

Software Geo-Redundancy: Active/Active nodes – geo-distributed

Raised Floor IT: Servers · Storage · Network

Container: DC in a box

3x9s
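To make the SLA figures on this slide concrete, the sketch below converts an availability percentage into allowed downtime per year and estimates what active/active redundancy does to availability. The function names are illustrative, and the combined figure assumes site failures are independent, which real deployments only approximate.

```python
# Sketch: availability arithmetic for the SLA figures on the slide.
# Assumption: redundant sites fail independently of each other.

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 (ignores leap years)


def downtime_minutes_per_year(availability_pct: float) -> float:
    """Minutes per year a service may be down at the given availability."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)


def combined_availability_pct(availability_pct: float, sites: int) -> float:
    """Availability when the service is up as long as any one site is up."""
    unavailability = 1 - availability_pct / 100
    return 100 * (1 - unavailability ** sites)


if __name__ == "__main__":
    for sla in (99.9, 99.999):
        print(f"{sla}% -> {downtime_minutes_per_year(sla):.1f} min/year")
    print(f"two 99.9% sites -> {combined_availability_pct(99.9, 2):.4f}%")
```

Under these assumptions, 99.9% allows roughly 8.8 hours of downtime a year and 99.999% about 5 minutes, while two independent 99.9% sites in an active/active pair approach 99.9999%.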

Enterprise Architecture Service Architecture

Mainframe

N/S Traffic

E/W Traffic (active/active)

                     Enterprise IT     Cloud-scale
Seats                10,000            1,000,000,000
Talent               Custodians        Designers
Budget               Fixed Cost        Rates
Architectures        Many              Few
App Integration      Loose             Tight
Infrastructure       Overhead          Enabler
Reach                Regional          Global
Cost/Mb              $1.74M            $0.026M
Network $/server     >$200             <$200
Hardware             Custom            Commodity
Availability         Infrastructure    Service
Operability          MTBF              MTTR
Reliability          Hardware          Software
Network Downtime     Impacting         Irrelevant
Network Availability 99.9999%          99.9%
Design               Primary/Backup    Active/Active
Speed                Speed             Performant
Deployment Time      Weeks             Minutes

From the enterprise to the cloud

Changing Behavior with Microsoft Tools

SCRY

Microsoft’s SCRY measurement tool aligns actual resource use with the chargeback model

Tracking Carbon

Tracking Utilization

From Allocating by Space…

…To Allocating by Power

Tracking Power

Billing & Cost Allocation


What Does Moving to an Online Service Mean?

DEPLOYMENT

SERVICE CHANGE

STANDARDIZATION

Single architecture

Limited configuration and customization options

Initial deployment is still required to migrate data to Office 365

AD cleanup and network upgrades are often required

PRIVACY and SECURITY CONSIDERATIONS

Understand your internal security and privacy requirements

Balance continuous innovation against minimizing change

Customer controls IT policies but not feature availability

On-premises

Online

Lessons learned


Extreme Standardization

SLA-Driven Architecture

Process Maturity

Delegation & Control

Re-imagined Processes

Automation Change Control

Scale Out Application

Customer Self Service

DEFINE THE FABRIC

(Mostly) Yesterday’s Platform

Each layer “early bound” to layer below

Must provision entire stack for each layer instance

Difficult to balance isolation and utilization/efficiency

1. Purchase

OS

2. Install

Role

3. Install

App

4. Deploy

Context

5. Configure

Requests

Today's Platform

Virtualization breaks the tight coupling between hardware & software

Software stack is still mostly statically bound though…

[Diagram: two OS / Role / App / Context stacks on a shared Virtualization layer]

“Fabric Based” Computing Platform

[Diagram: many OS/Role instances hosted on a shared Infrastructure Fabric]

Base infrastructure serves multiple workloads / roles

Infrastructure is managed as one resource

Provisioned to aggregate need rather than per project

Hardware becomes fungible

DEFINE THE FABRIC

Storage Consolidation

o Offloaded Data Transfer (ODX)
o Storage Spaces
o Thin Provisioning
o Deduplication
o Tiering

Server Virtualization

o High Performance & Shared Nothing Live Migration
o System Center Multi Hypervisor support (Hyper-V, VMware, Xen)
o BitLocker Encryption
o Up to 64TB Virtual Hard Disk (VHDX) Size

Network Virtualization

o Software Defined Networking
o Virtual IP Address Management
o Datacenter Bridging

Access & Information Protection

o Windows Server & Azure Active Directory
o Active Directory Federation Services

Management

o PowerShell Automation, >3000 cmdlets
o Desired Configuration
o Windows Management Framework: WS-Management, REST, HTTP, PSRP

High Availability

o Hyper-V Replica
o Windows Azure Hyper-V Recovery Manager

System Center

Windows Server 2012

Hardware Stamp

Compute

Networking

Storage

Workloads

SQL

Lync

VDI

SharePoint

Exchange

CRM

Microsoft Private Cloud Fast Track

Guidance Set: http://technet.microsoft.com/en-us/jj572811

Microsoft Azure

App services: Integration · HPC · Analytics · Web sites · Mobile services · Caching · Identity · Service bus · Media · Cloud services

Data services: SQL database · HDInsight · Table · Blob storage

Infrastructure services: Virtual machines · Virtual network · VPN · Traffic manager · CDN

DEFINE THE SERVICE

Microsoft Confidential – Internal Use Only

FailSafe Services


SCALE ^

Health Endpoint Monitoring Pattern

Summary: Implement functional checks within an application that external tools can access through exposed endpoints at regular intervals. This pattern can help to verify that applications and services are performing correctly.

http://aka.ms/Health-Endpoint-Monitoring-Pattern
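A minimal sketch of a health endpoint, using only the Python standard library. The `/health` path, the `check_storage` probe, the `CHECKS` registry and the port are all hypothetical choices for illustration; a real service would probe its actual dependencies and protect the endpoint appropriately.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable, Dict, Tuple


def check_storage() -> bool:
    """Hypothetical functional check; replace with a real dependency probe."""
    return True


# Registry of named checks (an assumption of this sketch, not part of the pattern).
CHECKS: Dict[str, Callable[[], bool]] = {"storage": check_storage}


def run_checks(checks: Dict[str, Callable[[], bool]]) -> Tuple[int, Dict[str, bool]]:
    """Run every check; a check that raises counts as failed."""
    results: Dict[str, bool] = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            results[name] = False
    # 200 when all checks pass, 503 when any check fails.
    status = 200 if all(results.values()) else 503
    return status, results


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path != "/health":
            self.send_error(404)
            return
        status, results = run_checks(CHECKS)
        body = json.dumps(results).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    # An external monitor would poll http://127.0.0.1:8080/health at regular intervals.
    HTTPServer(("127.0.0.1", 8080), HealthHandler).serve_forever()
```

Returning a status code plus a per-check JSON body lets a simple monitor alert on the code alone while an operator can still see which check failed.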

Discover and assess risk using Resilience Modeling and Analysis

Resilience Modeling and Analysis: Phases

Document: Identify failure points. Component interaction diagram.

Discover: Brainstorm failure modes. DIAL categories (Discovery, Auth, Incorrectness, Limits, Component).

Rate: Record failure effects. Assess risk priority using Impact and Likelihood.

Act: Prioritize reliability work. Remediate against effects and validate mitigations.

Resilience Modeling and Analysis: Document - Component interaction diagram

Resilience Modeling and Analysis: Discover

Discovery: Name resolution service health or configuration. Caller configuration.

Auth: Authentication service health or configuration. Resource authorization configuration.

Incorrectness: Protocol and version mismatch. Corruption, data fidelity, poison message. Duplicate request, invalid state, timing errors.

Limits: Timeouts and blocking. Service unavailable or unhealthy, throttling. Flooding, congestion, slow response times.

Component: Code or configuration changes. Hangs, crashes, resource exhaustion. Fault domains.

Resilience Modeling and Analysis: Rate – Assessing Risk

Impact

Effects: When this failure occurs, how deeply is the functionality impaired?

Portion Affected: When this failure occurs, what portion of users or transactions are affected?

Detection: How long does it take until an automated system or human is notified to take corrective measures?

Resolution: How long does it take the automated system or human to restore functionality after the failure has been detected?

Likelihood

Likelihood: How frequently is this failure likely to occur?

Resilience Modeling and Analysis: Act – Prioritize and Mitigate

ID | Component/Dependency Interactions | Failure Short Name | Failure Description | Consequences | Impact: Effects | Impact: Portion Affected | Impact: Detection | Impact: Resolution | Likelihood
3 | Storage Layer -> Azure Storage | Error 5xx from Azure Storage::Service | Azure Storage may respond with error | Return Error to caller. Service closed | Major impairment of core functionality | More than 50% | More than 15 min | More than 45 min | Multiple times a year
4 | Storage Layer -> Azure Storage | No Response from Azure Storage::Service | Azure Storage may fail to respond within the timeout period | No retry. Return Error to caller. | Major impairment of core functionality | More than 50% | More than 15 min | More than 45 min | Multiple times a year
5 | Storage Layer -> Azure Storage | Latency from Azure Storage::Service | Azure Storage component may | When memory pressure is sufficient, return Error to caller. | Major impairment of core functionality | More than 50% | More than 15 min | More than 45 min | Less than once a year
6 | Web Service -> Server API | Latency from Server API | The Server API may be slow to respond | Caller will timeout resulting in a client retry. | Major impairment of core functionality | Less than 2% | More than 15 min | More than 45 min | More than once a month
7 | Storage Layer -> Azure Storage | Error 5xx from Azure Storage::Service | Azure Storage may respond with error | Return Error to caller. Service closed | Major impairment of core functionality | More than 50% | More than 15 min | More than 45 min | Multiple times a year
8 | Storage Layer -> Azure Storage | No Response from Azure Storage::Service | Azure Storage may fail to respond within the timeout period | No retry. Return Error to caller. | Major impairment of core functionality | Less than 50% | More than 15 min | More than 45 min | Multiple times a year
9 | Azure DNS | Azure DNS Failure::ClientAPI | The Azure DNS system may fail to respond | Error DNS not found returned to caller. | Major impairment of core functionality | Less than 2% | Between 5 min and 15 min | More than 45 min | Less than once a year

Risk


ID | Component/Dependency Interactions | Failure Short Name | Failure Description | Consequences | Impact: Effects | Impact: Portion Affected | Impact: Detection | Impact: Resolution | Likelihood
4 | Storage Layer -> Azure Storage | No Response from Azure Storage::Service | Azure Storage may fail to respond within the timeout period | No retry. Return Error to caller. | Major impairment of core functionality | More than 50% | More than 15 min | More than 45 min | Multiple times a year
7 | Storage Layer -> Azure Storage | Error 5xx from Azure Storage::Service | Azure Storage may respond with error | Return Error to caller. Service closed | Major impairment of core functionality | More than 50% | More than 15 min | More than 45 min | Multiple times a year
3 | Storage Layer -> Azure Storage | Error 5xx from Azure Storage::Service | Azure Storage may respond with error | Return Error to caller. Service closed | Major impairment of core functionality | More than 50% | More than 15 min | More than 45 min | Multiple times a year
5 | Storage Layer -> Azure Storage | Latency from Azure Storage::Service | Azure Storage component may | When memory pressure is sufficient, return Error to caller. | Major impairment of core functionality | More than 50% | More than 15 min | More than 45 min | Less than once a year
8 | Storage Layer -> Azure Storage | No Response from Azure Storage::Service | Azure Storage may fail to respond within the timeout period | No retry. Return Error to caller. | Major impairment of core functionality | More than 50% | More than 15 min | More than 45 min | Multiple times a year
6 | Web Service -> Server API | Latency from Server API | The Server API may be slow to respond | Caller will timeout resulting in a client retry. | Major impairment of core functionality | Less than 2% | More than 15 min | More than 45 min | More than once a month
9 | Azure DNS | Azure DNS Failure::ClientAPI | The Azure DNS system may fail to respond | Error DNS not found returned to caller. | Major impairment of core functionality | Less than 2% | Between 5 min and 15 min | More than 45 min | Less than once a year

Risk
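One way to turn worksheet ratings into a priority order is to map each label to an ordinal score and multiply impact by likelihood. The sketch below does this for rows 4, 6 and 9. The numeric scores, the multiplication rule and all names are assumptions for illustration; the slides give only the labels and do not state how Risk is computed. Effects and Resolution are omitted because they carry the same rating in every row.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical ordinal scores (1 = mildest); the slides give only the labels.
PORTION = {"Less than 2%": 1, "More than 50%": 3}
DETECTION = {"Between 5 min and 15 min": 2, "More than 15 min": 3}
LIKELIHOOD = {
    "Less than once a year": 1,
    "Multiple times a year": 2,
    "More than once a month": 3,
}


@dataclass
class Failure:
    id: int
    short_name: str
    portion: str
    detection: str
    likelihood: str

    def risk(self) -> int:
        """Assumed rule: risk = impact x likelihood, impact = portion x detection."""
        impact = PORTION[self.portion] * DETECTION[self.detection]
        return impact * LIKELIHOOD[self.likelihood]


# Ratings copied from worksheet rows 4, 6 and 9.
FAILURES = [
    Failure(6, "Latency from Server API", "Less than 2%",
            "More than 15 min", "More than once a month"),
    Failure(9, "Azure DNS Failure::ClientAPI", "Less than 2%",
            "Between 5 min and 15 min", "Less than once a year"),
    Failure(4, "No Response from Azure Storage::Service", "More than 50%",
            "More than 15 min", "Multiple times a year"),
]


def rank(failures: List[Failure]) -> List[Failure]:
    """Highest risk first, so mitigation work starts at the top of the list."""
    return sorted(failures, key=lambda f: f.risk(), reverse=True)


if __name__ == "__main__":
    for failure in rank(FAILURES):
        print(failure.id, failure.short_name, failure.risk())
```

With these assumed scores the storage timeout (ID 4) ranks ahead of the Server API latency (ID 6) and the DNS failure (ID 9).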

Retry Pattern

Summary: Enable an application to handle anticipated, temporary failures when it attempts to connect to a service or network resource by transparently retrying an operation that has previously failed in the expectation that the cause of the failure is transient. This pattern can improve the stability of the application.

http://aka.ms/Retry-Pattern
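A minimal sketch of the retry idea, assuming exponential backoff with jitter as the delay strategy (the summary above does not prescribe one). `TransientError`, `retry` and its parameters are hypothetical names introduced for this example.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (e.g. a timeout)."""


def retry(
    operation: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation, retrying on TransientError up to `attempts` times in total."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == attempts:
                raise  # out of attempts: surface the failure to the caller
            # Exponential backoff with jitter so clients do not retry in lockstep.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay))
    raise ValueError("attempts must be at least 1")
```

Only errors marked as transient are retried; anything else propagates immediately, since repeating a request that fails for a permanent reason only adds load.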

Circuit Breaker Pattern

Summary: Handle faults that may take a variable amount of time to rectify when connecting to a remote service or resource. This pattern can improve the stability and resiliency of an application.

http://aka.ms/Circuit-Breaker-Pattern
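A minimal sketch of a circuit breaker with the usual closed, open and half-open behavior. The class name, the threshold and timeout defaults, and the injectable `clock` are assumptions for illustration, not an API from any Microsoft library.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without trying the operation."""


class CircuitBreaker:
    def __init__(
        self,
        failure_threshold: int = 3,
        reset_timeout: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None  # None means the circuit is closed

    def call(self, operation: Callable[[], T]) -> T:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open; failing fast")
            # Timeout elapsed: half-open, so let one trial call through.
        try:
            result = operation()
        except Exception:
            self.failures += 1
            half_open = self.opened_at is not None
            if half_open or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open, or re-open after a failed trial
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

While the circuit is open, callers fail immediately instead of waiting on timeouts, which gives the remote service room to recover.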

Throttling Pattern

Summary: Control the consumption of resources used by an instance of an application, an individual tenant, or an entire service. This pattern can allow the system to continue to function and meet service level agreements, even when an increase in demand places an extreme load on resources.

http://aka.ms/Throttling-Pattern
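A minimal sketch of per-tenant throttling, assuming a token bucket as the limiting technique (the summary above does not name one). `TokenBucket`, `TenantThrottle` and their parameters are hypothetical names for this example.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """Allows bursts up to `capacity` and a sustained `rate` of requests per second."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity
        self.updated = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class TenantThrottle:
    """One bucket per tenant, so a noisy tenant cannot starve the others."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._rate = rate
        self._capacity = capacity
        self._clock = clock
        self._buckets: Dict[str, TokenBucket] = {}

    def allow(self, tenant: str) -> bool:
        bucket = self._buckets.get(tenant)
        if bucket is None:
            bucket = TokenBucket(self._rate, self._capacity, self._clock)
            self._buckets[tenant] = bucket
        return bucket.allow()
```

A caller that gets `False` would typically reject the request or queue it; which response fits depends on the service's SLA.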


Solution Design Patterns

http://aka.ms/Cloud-Design-Patterns

Copies of the poster are available at the SCT booth…

Come Visit Us in the Microsoft Solutions Experience!

Look for Datacenter and Infrastructure Management

TechExpo Level 1 Hall CD

For More Information

Windows Server 2012 R2: http://technet.microsoft.com/en-US/evalcenter/dn205286

Microsoft Azure: http://azure.microsoft.com/en-us/

System Center 2012 R2: http://technet.microsoft.com/en-US/evalcenter/dn205295

Azure Pack: http://www.microsoft.com/en-us/server-cloud/products/windows-azure-pack

Resources

Learning

Microsoft Certification & Training Resources

www.microsoft.com/learning

msdn

Resources for Developers

http://microsoft.com/msdn

TechNet

Resources for IT Professionals

http://microsoft.com/technet

Sessions on Demand

http://channel9.msdn.com/Events/TechEd


© 2014 Microsoft Corporation. All rights reserved. Microsoft, Windows, and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.