s317299 Deploy Riskfree Db Replay NO GOOD OCM


  • 7/28/2019 s317299 Deploy Riskfree Db Replay NO GOOD OCM


    Deploy New Features Risk Free Using Database Replay

    Prabhaker Gongloor, Oracle Corporation
    Paula Camporaso, Rajeev Sethi, Solyndra
    Tom Robertson, Nationwide Insurance

    The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle's products remains at the sole discretion of Oracle.

    Agenda

    Deploying New Features: Challenges & Solution
    Database Replay Overview
    Database Replay Enhancements: Oracle Database 11g Release 2
    Strategy and Best Practices
    Real-world Customer Case Studies
    Conclusion

    Please visit us at: OOW Demo Grounds, Moscone West 038/039
    S318966: Database and Application Testing HOL, Wed: 4.45-5.45 pm

    Deploying New Features: Challenges and Solution

    Customers want to take advantage of new features, but
      No easy way to mitigate change risk
      Impossible to test new features with real-world workloads
      No end-to-end testing solution
      Significant risk to production: instabilities, SLAs violated, fire-fighting, etc.

    Higher quality testing
    Rapid technology adoption

    As a result, businesses can adopt new features at
      Lower cost
      Lower risk

    Real Application Testing Features

    End-to-end testing with real workloads

    Diagram: Capture Workload, Create Test System, Replay Workload, Deploy Replay Clients

    SQL Performance Analyzer
      SQL unit testing for response time
      Identify and tune regressed SQL

    Database Replay
      Load, performance testing for throughput
      Remediate application concurrency problems


    Oracle Real Application Testing: Database Replay

    Database load and performance testing with real production workloads
    Production workload characteristics such as timing, transaction dependency, think time, etc., fully maintained
    Test and measure transaction throughput improvements
    Identify application scalability and concurrency problems with new features
    Remediate issues pre-production for risk-free change
    Supports migrations from Oracle 9iR2 and 10gR2*

    Diagram: Production (Clients, Storage), Test (Replay Driver, Storage); Capture, Process, Replay, Analysis & Reporting

    *MOS 560977.1: Real Application Testing for Earlier Releases

    Workload Replay Architecture

    Replay captured workload
      Replayed operations see the same data and perform the same work
      Preserve timing and concurrency characteristics

    Replay Client
      Multithreaded OCI Client
      Drives multiple captured processes
      Scalable Architecture
      Interprets capture into sequence of OCI calls
      Functional replay

    Diagram: Replay Files (File 1 ... File N, Metadata), Replay Clients, connections, Server processes, Background; Timing Preservation, Re-mapping, Commit Order Synchronization, Sequence Replay

    Database Replay - Supported Changes

    Changes Unsupported
      Client
      Middle Tier

    Changes Supported
      Database Upgrades, Patches
      Schema, Parameters
      RAC nodes, Interconnect
      OS Platforms, OS Upgrades
      CPU, Memory
      Storage
      Etc.

    Diagram: Clients, Middle Tier, Storage; Recording of External Client Requests


    Database Replay Enhancements

    New in Oracle Database 11g Release 2

    Earlier restrictions removed with support for
      Shared Server configuration
      Streams apply workload

    Replay filter support to target sub-set workload
      Similar to existing capture filters: Include/Exclude
      API support only in current release

    Integration of SPA and Database Replay
      Allows SQL-centric analysis when using Database Replay
      Simultaneously captures SQL workload into two different STS during workload capture and replay
      SPA Report built from the two STS captured helps understand workload drift
      Uses STS Compare functionality to highlight new, missing, top SQL, changes in execution plans, #SQL executions, etc.

    Database Replay Enhancements

    New in Oracle Database 11g Release 2

    Workload Analyzer
      New tool for assessing quality of workload capture and its replayability
      Identifies potential problems and recommends appropriate remediation
      Provides insight into workload capture: quantifies percentage of captured DB Time that is unreplayable
      Rule-based analysis executed as part of pre-processing
      EM and API support (DB release 11.2.0.2 and above)
      Download for earlier releases and new rules

    Workload Analyzer recommendations example follows

    Workload Analyzer: Recommendations

    Run Workload Analyzer on captured workload and follow recommendations to improve replay quality, e.g.,
      Significant in-flight transactions: Capture for longer duration or restart database
      Workload sensitive to sysdate/time: Reset system clock

    Screenshot of Workload Analyzer output

    Database Replay Enhancements

    New in Oracle Database 11g Release 2

    Replay Compare Period Report
      Provides holistic view of the experiment: covers functional and performance aspects of testing
      Replay Divergence Summary categorization indicates if further analysis is necessary: LOW/MED/HIGH
      Two reports are available
        Capture Vs Replay, Replay Vs Replay
      Identifies interference from other workloads, e.g., maintenance windows or other non-replayed workload
      Automatically runs ADDM
      Reports provide more accurate performance analysis
        Uses enhanced ASH infrastructure for capture/replay sessions


    Testing New Database Features: The Right Approach

    Step 1: Upgrade to Oracle Database 11g
      Test the impact of the 11g upgrade on the peak workload captured on the production system and make sure the upgrade has no negative effects

    Step 2: Introduce new features
      Then introduce one feature at a time on the workload and test the workload impact: RAC, TDE, Advanced Compression

    We'll walk through the mentioned scenario using a Siebel workload, covering:
      Recommended testing strategy
      Best practices
      Replay analysis

    Siebel Workload Description

    Siebel PSPP workload used for testing DB upgrade scenario
      Used internally for upgrade certification and new feature uptake

    Siebel 8.0, 1300 users: 700 financial call center, 600 financial partner manager

    Financial call center scenario:
      Creates new contact
      Creates new opportunity for the contact
      Adds products to the opportunity
      Creates quotes
      Converts quotes to order

    Financial partner manager scenario:
      Creates a new service request
      Assigns the service request

    Real Application Testing: Recommended Method

    Step 1. Setup Test System
    Step 2. SQL response/unit testing: SPA* (DONE? Yes/No)
    Step 3. Load Testing: Database Replay (DONE? Yes/No)
    Deploy Change and Tuning


    Step 1: Setup Test System: Best Practices

    Apply recommended patches and use latest software on test and production
      MOS Note: 560977.1

    Test system should be as close to production as possible
      Similar HW/OS where possible, unless this is being tested
      Full dataset should be close to or the same as production data to avoid divergence

    Validate no missing schema objects (indexes, views, etc.) on test system
      Use Oracle Enterprise Manager 11g Change and Configuration Management Packs to understand drift between test and production
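
    The check for missing schema objects can be sketched as a set difference between object inventories taken from production and test. This is only an illustration in Python, assuming the inventories have already been exported as (owner, type, name) tuples; the function name and the sample objects are hypothetical and are not part of any Oracle tool.

```python
from typing import Iterable, Set, Tuple

SchemaObject = Tuple[str, str, str]  # (owner, object_type, object_name)


def missing_on_test(prod: Iterable[SchemaObject],
                    test: Iterable[SchemaObject]) -> Set[SchemaObject]:
    """Return objects present in production but absent from test."""
    return set(prod) - set(test)


# Hypothetical inventories, for illustration only.
prod_objects = [
    ("SIEBEL", "TABLE", "S_CONTACT"),
    ("SIEBEL", "INDEX", "S_CONTACT_U1"),
    ("SIEBEL", "VIEW", "V_OPEN_ORDERS"),
]
test_objects = [
    ("SIEBEL", "TABLE", "S_CONTACT"),
    ("SIEBEL", "VIEW", "V_OPEN_ORDERS"),
]

for owner, obj_type, name in sorted(missing_on_test(prod_objects, test_objects)):
    print(f"missing on test: {obj_type} {owner}.{name}")
```

    A missing index like the one flagged here is the kind of drift that tends to show up later as plan changes or replay divergence, so it is cheaper to catch before the first replay.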

    Step 1: Setup Test System: Best Practices

    Use Database Flashback, Flash Recovery Area, and Guaranteed Restore Points
      Helps reset database to point of capture

    Use Oracle Enterprise Manager Grid Control Releases 10.2.0.5 or 11.1
      Supports end-to-end workflow including test system creation and cloning
      Best practice based workflows

    Disable maintenance windows and background jobs
      Avoid workload interference; background jobs may already exist on test system

    *Session S317300: Avoiding SQL Performance Regressions: New Techniques for Solving an Old Problem

    Step 1: SQL Performance Analyzer (SPA) Testing

    Always use SPA before Database Replay to help reduce testing cycles
      Most changes, such as patch sets and upgrades, may result in plan changes
      SPA is the best tool to perform SQL-centric analysis
      SPA trials complete quickly relative to DB Replay runs
      SPA trials can be repeated without restoring database

    Use SPA to identify regressed SQL and remediate them
      Gives you a mechanism to revert back to old plans if they are better


    Recommended Testing Methodology with Database Replay (1)

    Capture peak or interesting workload
    Run Workload Analyzer and follow recommendations
    Establish Replay Baseline (first replay) without any change
    Make one change at a time, replay workload
    Review Replay Compare Period, AWR reports, tune until done

    Diagram: Prod, Test

    Recommended Testing Methodology with Database Replay (2)

    Test one change at a time to understand causality
      Exception to this rule is when upgrading to Oracle Database 11g

    Start with a short capture, e.g., 30-60 min, perform end-to-end testing, then iteratively move on to longer duration testing
      This strategy will quickly unravel any test system setup issues
      Makes it easier to debug potential issues

    Establish Replay Baseline
      Use Replay Compare Period Report to understand Baseline (first replay) deviations from production capture
      Perform replay analysis, understand divergence (covered in Replay Analysis)

    Recommended Testing Methodology with Database Replay (3)

    Once Replay Baseline is established, compare two replays in the same environment
      Baseline to Replay N
      Replay N-1 to Replay N for incremental changes and tuning

    In addition to replay divergence, use application metrics to validate replay
      E.g., possible application metrics such as call records processed/hr, orders entered/min

    Save workload, tuning (SQL Profiles), AWR export, and Replay reports after each run
      Test systems may need to be refreshed or testing done at a later time
      Better safe than sorry!
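
    The comparison scheme above (Capture to Baseline, Baseline to Replay N, Replay N-1 to Replay N) can be written out as a small planning helper. This is a sketch in Python, not an Oracle API: the function name and labels are hypothetical, and it assumes each replay adds one change on top of the previous ones.

```python
from typing import List, Tuple


def comparison_plan(changes: List[str]) -> List[Tuple[str, str]]:
    """List the report comparisons to run for a sequence of single changes.

    Replay 0 is the baseline (no change). Replay N introduces change N.
    Every replay is compared against the baseline and, from the second
    change on, against the replay immediately before it.
    """
    baseline = "Replay 0 (baseline)"
    plan = [("Capture", baseline)]
    for n, change in enumerate(changes, start=1):
        current = f"Replay {n} ({change})"
        plan.append((baseline, current))
        if n > 1:
            previous = f"Replay {n - 1} ({changes[n - 2]})"
            plan.append((previous, current))
    return plan


for left, right in comparison_plan(["TDE", "Advanced Compression"]):
    print(f"compare: {left}  vs  {right}")
```

    Keeping the plan explicit makes it easier to see which saved reports are needed if the test system is refreshed between runs.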


    Database Replay - Workload Capture Tips

    How do I estimate capture disk space? Maximum of
      Extrapolated size from smaller duration capture (10-30 min)
      2 * "Bytes received via SQL*Net from client" statistic (from AWR report)

    Do I need to restart the database before capture? In general, the answer is NO
      For high number of in-flight transactions/busy system
        Follow Workload Analyzer recommendations
        Replay can still be done, but replay analysis should factor in possible divergence
        Application validation for capture duration can help determine if replay quality is good

    Database Replay - Workload Capture: Best Practices

    Filter background activity
      E.g., monitoring infrastructure: STATSPACK, OMS, EM

    Save AWR performance data
      Create AWR baseline or export AWR after workload capture to avoid purging of AWR data

    Capture SQL workload into STS along with Database Replay workload capture
      Automated workflow in Oracle Database Release 11.2.0.2
      The same can be done manually using API or EM in earlier releases


    Database Replay - Workload Replay: Best Practices

    Resolve and correct external dependencies, e.g., db links, external files
    Use the number of replay clients recommended by the wrc calibrate command
    Replay clients should not be co-located with database tier to avoid contention

    Database Replay - Workload Replay Analysis

    Functional Analysis
      Investigate errors, data divergence: is it a small percentage of overall calls?
      Target replay for 80-90% user calls successful
      Is divergence limited to few objects, schemas?
      Can divergence be ignored?
        Background jobs
        Belongs to non-critical business flows
      High divergence usually points to test system set up incorrectly (missing objects)
      Use application metrics for validation

    Performance Analysis
      Only after Replay Functional Analysis is performed
      Replay Compare Period, AWR, ADDM reports, etc.
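
    The "80-90% user calls successful" target can be checked with simple arithmetic once the call counts are read off the replay report. The following Python sketch assumes hypothetical counts and function names; it does not query the database or use any Oracle API.

```python
def user_call_success_rate(total_calls: int, failed_calls: int) -> float:
    """Percentage of replayed user calls that completed without error or divergence."""
    if total_calls <= 0:
        raise ValueError("total_calls must be positive")
    if not 0 <= failed_calls <= total_calls:
        raise ValueError("failed_calls must be between 0 and total_calls")
    return 100.0 * (total_calls - failed_calls) / total_calls


def meets_target(rate_pct: float, target_pct: float = 80.0) -> bool:
    """True when the success rate reaches the lower end of the 80-90% target."""
    return rate_pct >= target_pct


# Hypothetical counts, for illustration only.
rate = user_call_success_rate(total_calls=1_000_000, failed_calls=120_000)
print(f"successful user calls: {rate:.1f}%  target met: {meets_target(rate)}")
```

    A rate below the target is a prompt to go back through the functional questions above (which objects, which schemas, which business flows) before looking at performance numbers.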

    Step 2: Database Replay Testing and Analysis (new screenshot to be added)

    Step 2: Replay Summary (Contd.): Errors and Data Divergence

    Step 2: Replay Errors and Data Divergence Analysis

    Step 2: Database Replay Analysis

    Replay Compare Period Report

    Your new best friend in analyzing replay information!!
      Provides holistic view of the experiment: covers functional and performance aspects of testing
      Replay Divergence Summary categorization indicates if further analysis is necessary: LOW/MED/HIGH
      Two reports are available
        Capture Vs Replay, Replay Vs Replay
      Identifies interference from other workloads, e.g., maintenance windows or other non-replayed workload
      Automatically runs ADDM
      Reports provide more accurate performance analysis
        Uses enhanced ASH infrastructure for capture/replay sessions

    Replay Compare Period: Capture Vs Replay Baseline

    Charts (values in chart order: Capture, Replay Baseline)
      Duration (hours): 3.57, 3.73 (Replay Elapsed Time almost same as Capture)
      Divergence (%): 0.04 (Low Divergence)
      Total CPU Time (seconds): 25055.79, 11067.96 (CPU Time is better!)
      I/O Wait Time (seconds): 1486.55, 854.14
      Total Physical Write (GB): 10.945, 10.395

    Compare Period Report Link
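
    The chart values for this slide can be turned into relative changes with a one-line formula. The Python sketch below assumes the first value of each pair is the Capture and the second is the Replay Baseline, following the order of the chart labels; the function name is hypothetical.

```python
def percent_change(capture: float, replay: float) -> float:
    """Relative change from capture to replay, in percent (negative means lower in replay)."""
    return 100.0 * (replay - capture) / capture


# (capture, replay baseline) pairs as they appear in the charts.
metrics = {
    "Total CPU Time (s)": (25055.79, 11067.96),
    "I/O Wait Time (s)": (1486.55, 854.14),
    "Total Physical Write (GB)": (10.945, 10.395),
}

for name, (capture, replay) in metrics.items():
    print(f"{name}: {percent_change(capture, replay):+.1f}%")
```

    Under that assumption the replay baseline used roughly 56% less CPU time and 43% less I/O wait time, with physical writes about 5% lower.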

    Excerpts of Replay Compare Period Report: Important Changes between Capture and Replay Baseline

    Oracle Database 10g Release 2 to 11g Upgrade: Summary

    After upgrading to 11g without any other change, performance remains almost the same
    Very low divergence rate, limited to background

    Change Accepted

    No other issues are noted during replay of the peak workload
    Further tuning can be performed or other new features can be added one at a time.

    Introducing New Features after Upgrade

    After Database Upgrade, introduce new features one at a time
      TDE
      Advanced Compression
      RAC

    Exadata real-world example follows


    Oracle Real Application Testing

    A Solyndra Success Story

    Paula Camporaso
    Vice-President, Information Technology
    Sept, 2010

    Solyndra Confidential

    Solyndra Backgrounder

    5 year old company, Solar Manufacturer

    3 years ago..
      100 Employees, few contractors
      No Manufacturing
      40K sf. total

    Today..
      1200+ Employees, 250+ Mfg Temps
      1 Fab at ~Mfg capacity
      FAB2 in Phase I
      1,350,000 sf.

    Oracle Real Application Testing

    Today's presentation:
      A real story about how Solyndra was able to ramp our factories with minimal risk of change in a 24/7 [...] Application Testing application.

    Show of Hands..
      How many have been able to reproduce Production Workload in a Test Environment?
      How many without Test Scripts?
      How many without Custom Code and/or many man-hours?

    IT Factory Systems - Automated Factory

    Factory Tools/Automation are not only interoperable with Factory systems, they are interdependent.

    Key Measures of Success:
      Factory Systems - 24/7 Production with little to no downtime.
      Ensure Production runs at speed of Tools and Automation
        Not IT Networks or Information Systems
      Must scale without re-engineering.. and in most cases, without downtime..

    Essential 24/7 Factory IT Systems

    MES - Control / Track Product flow (WIP Routes)
    MCS - Automated Material control (AGVs)
    FIS - Quality Tool/Product data
    SPC - Quality/Statistical Process Control
    Many System/Tool Interfaces (100+)

    Must perform faster than Tools/Automation (Real-time robotics)
    Be available 24/7 (1 planned downtime per year)
    Must scale without reengineering and/or downtime

    Solyndra IT Challenges: Hyper-Growth Change Mgmt

    Fab1 was the first of its kind. Everything was new
    FCS in Q3/08, Q4 Factory producing product 24/7
    Growth was 2x+ Qtr over Qtr through most of 2009

    IT Factory Systems required to Automate Production
    Transaction Volume and Databases doubling by Qtr
    Growing DBs quickly showing signs of slowing performance
    Applications and Databases were changing almost daily
    No way to truly reproduce Production workload in Test Env

    Phase I

    Fix what we know - Low-hanging fruit
      Implemented Load Balancing - Eased change mgmt
      Re-architected MES Application for reduced IO by 125%
      Upgraded all Oracle DBs to 11g - Use DB tools
      Nehalem Server ~ 35% improvement
      Oracle DB Compression ~ 40% improvement
      Oracle Partitioning for Xact tables ~ 50% improvement

    Yet we still needed significantly more performance runway to support even the 2009 Production Plan (6x to be exact)

    Phase II: Still needed 6x improvement for FAB1

    Bottom-Line
      Needed to perform iterative tuning scenarios in a Safe/Test Env
      Thus, we needed a tool able to reproduce production workload in Test
      We consulted with Oracle on their Real Application Testing Product

    Oracle Real Application Testing Eval deployed
      Captured and Replayed workload under several Scenarios
      Implemented several tuning changes with predictable results
      Able to isolate Storage IO as primary remaining bottleneck
      This was huge ~ We were seeing the results of multiple change scenarios, previously only possible in a Production Environment

    Summary of RAT Analysis/Findings

    Database Storage was our biggest bottleneck.

    Launched Storage Selection Analysis process
      HP 15% improvement
      Hitachi 20% improvement
      Oracle Exadata 20X improvement

    Oracle Real Application Testing Results: Wow

    IO Waits directly track to Factory Systems performance on the Mfg Floor
    Eliminating IO Waits, we knew, would provide substantial improvements

    Exadata Validation of Test Results: 10X-27X Performance Improvement

    Average 10x

    Real Application Testing results: Exadata

    Overall 4x improvement in statistical performance
    Average 27x improvement in monster query performance
    Almost complete elimination of IO Wait Time
    Super linear scalability (2x load takes 1.15x DB time)
    Overall ~ 20X improvement on the Factory Floor
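
    The scalability figure (2x load takes 1.15x DB time) can be restated as DB time per unit of load. A short worked calculation in Python, with a hypothetical helper name:

```python
def db_time_per_unit_load(load_factor: float, db_time_factor: float) -> float:
    """DB time consumed per unit of load, relative to the original workload."""
    return db_time_factor / load_factor


ratio = db_time_per_unit_load(load_factor=2.0, db_time_factor=1.15)
print(f"relative DB time per unit of load: {ratio:.3f}")
print(f"reduction per unit of load: {100.0 * (1.0 - ratio):.1f}%")
```

    In other words, at double the load each unit of work consumed about 0.575 times the DB time it did originally, a reduction of roughly 42.5% per unit.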

    Learnings - incorporated into FAB2 Architecture (FAB2 = 4x Production volume of FAB1 @ capacity)

    24/7 Apps must be able to reproduce production workload in Test
      ORAT is the only tool we know that can truly Capture and Play transactions exactly, providing needed insight to mitigate the risk of change

    For Solyndra, Flash Storage technology is key for our Automated Factory performance needs
      Exadata tested and realized 10X-20X performance gains over all other Storage evaluated (using ORAT)

    Essential to proactively identify issues while in Test
    (2) Exadatas currently in production in FAB2

    RAT Business Value/Impact Summary

    The risk of making changes to our fast-growing IT Factory Systems without a tool like Real Application Testing would be prohibitively high.

    Final Measure of Success:
      Solyndra has not had a single Factory System/Database performance issue nor a reboot since April 2009.

    Real Application Testing

    Nationwide Insurance

    Tom Robertson
    Database Technology Architect, Infrastructure and Operations
    September, 2010

    Overview

    Challenges
      Applications and systems close to maximum utilization OR due for hardware refresh
      Complexity: reporting instances, increased storage costs and capacity, information not available in timely manner
      Can Real Application Testing provide a method to validate new commodity hardware, platform strategy?
      Can commodity platforms handle high OLTP and mixed workloads?

    Solution Approach
      Use Database Replay to capture batch and OLTP workloads for peak periods
      Use Database Replay to execute workload on new systems, measure system performance, resource utilization, and any SQL regression

    Benefit
      Replay workload on new systems: 2x to 12x production volumes
      Process performance improvements noted: 2x to 10x
      Validated new features: up to 67x reduction in space through Advanced Compression*
      Validated new hardware handles peak workload with excess capacity and dramatically improved performance

    Use Case and Load Testing Results

    System and Workload
      Packaged Application and Custom Reporting Databases
      Oracle Database 9.2.0.8 Release: 2 unique databases and instances
      Moving to RAC Platform, x86, database upgrade 11g: single database with multiple instances

    Legacy Server Model (Capture)
      RISC CPUs: 8x2 @ 2150 MHz
      Memory: 64 GB

    Replacement Server Model (Replay)
      CISC CPUs: Intel Xeon 2.27 GHz 2x4
      Memory: 72 GB RAM

    Validated that migrated system can handle peak/mixed workload at 20% utilization, obtained range of 2x-10x DB time improvement for various databases
    x86 and RAC platforms here generate an average savings on hardware of 85% and 25%+ for software

    Chart: Host Load and DB Time, Capture vs Replay (axis 0-100)

    Conclusion

    Real Application Testing enables businesses to safely deploy new database features using real workloads

    Increases business agility and uptime
      Less firefighting
      More focus on strategic planning and execution

    Increased capital expenditure savings
      232% ROI over 4 years*

    What Munich Services GMBH Is Saying

    The positive results from using the RAT option for the first time enable us to look to the future and the [...] before [...] Facts instead of surprises!

    Christian Duschle, Munich Services GMBH, SAP customer
    Source: Oracle for SAP Technology Update, Vol 19, May 2010

    Backup Slides

    Database Replay Terminology

    wrc: Workload Replay Client. These recreate production workload on test system. Multiple wrc clients can be used to drive large workloads across many replay hosts
    Workload Capture Files: Workload that is captured on production and moved to test system for replay; files are in binary format
    Workload Analyzer: Analyzes captured workload and provides information on how to improve replay quality

    Workload Capture

    Minimal overhead
    Platform and protocol independent
    Workload filters
      Capture interesting workload

    Diagram: Clients, Middle Tier, Production System (Server 1, Server 2, ... Server N; Background; Backup), Capture Infrastructure writing File 1, File 2, ... File N to an OS Directory

    Workload Capture Overhead

    Performance overhead
      Workload dependent
      Proportional to the data sent from the client
      TPC-C throughput degradation about 4.5%

    Workload capture size
      TPC-C 20 min, 100 users, 10 warehouses: 1.2G
      Maximum of (a, b)
        a) Extrapolated size from smaller duration capture (15-30 min)
        b) 2 * "Bytes received via SQL*Net from client" AWR statistic
      Enable capture for few minutes to assess size
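
    The sizing rule, maximum of (a, b), can be sketched in a few lines of Python. This is an illustration under stated assumptions: the function and parameter names are hypothetical, the extrapolation is assumed to be linear in capture duration, and the SQL*Net byte count is assumed to cover the same window as the planned capture.

```python
def estimate_capture_size_bytes(sample_bytes: float,
                                sample_minutes: float,
                                planned_minutes: float,
                                sqlnet_bytes_received: float) -> float:
    """Estimate capture disk space as max(a, b).

    a) size of a short sample capture, scaled linearly to the planned duration
    b) twice the 'bytes received via SQL*Net from client' statistic
    """
    if sample_minutes <= 0:
        raise ValueError("sample_minutes must be positive")
    extrapolated = sample_bytes * (planned_minutes / sample_minutes)
    from_sqlnet = 2 * sqlnet_bytes_received
    return max(extrapolated, from_sqlnet)


# Hypothetical inputs: a 20-minute sample of 1.2 GB, a planned 2-hour capture,
# and 3 GB received from clients over the planned window.
size = estimate_capture_size_bytes(1_200_000_000, 20, 120, 3_000_000_000)
print(f"plan for at least {size / 1e9:.1f} GB of capture space")
```

    Taking the larger of the two estimates errs on the side of over-provisioning the capture directory.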

    Chart: capture overhead scale from LOW to HIGH across workload types: Long Running SQL, Short SQL/DML, Insert Intensive, Large LOBs; DSS, OLTP

    Step 2: Process Workload Files

    Set up test system
      Application data should be same as production system as of capture start time
      Use RMAN, Snapshot Standby, imp/exp, Data Pump, etc. to create test system
      Make change: upgrade db and/or OS, change storage, migrate platforms, etc.

    Processing makes captured data into replay-ready format
      Once processed, workload can be replayed many times
      For RAC, copy all capture files to single location for processing

    Diagram: Capture Files (File 1, File 2, ... File n) processed on the Test System into Replay Files (File 1, File 2, ... File n, Metadata)
