Rob Lancaster, Orbitz Worldwide Survival Analysis & TTL Optimization

Outline
  The Problem
  Survival Analysis
    Intro
    Key Terms
    Techniques & Models:
      Kaplan-Meier Estimates
      Parametric Models
  Optimizing Cache TTL
    Methods
    Results

The Problem
The hotel rate cache and TTL optimization.

The Hotel Rate Cache
  Key/value store
    Key: search criteria (hotel id, host, check-in, check-out, # people, # rooms)
    Value: hotel rate information
  Benefit = reduced looks & latency
  Cost = increased re-price errors

The Hotel Rate Cache
  Each cache entry is given a time-to-live (TTL).
  TTLs were set based on intuition ages ago.
  Goal: optimize TTL to decrease looks while controlling re-price errors.
  How? Ideally, find the greatest TTL value at which the probability of a rate change is below an acceptable threshold.
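
The "How?" step can be sketched as a search over an estimated survival curve. This is a minimal illustration, not the production method: the function name, the curve values, and the 5% threshold are all hypothetical, and S(t) is read as the probability that a cached rate is still unchanged at age t.

```python
from typing import Optional, Sequence, Tuple


def max_ttl_below_threshold(
    survival_curve: Sequence[Tuple[float, float]],
    max_change_probability: float,
) -> Optional[float]:
    """Return the greatest t whose rate-change probability 1 - S(t) is below the threshold.

    survival_curve holds (t, S(t)) pairs sorted by increasing t, with S non-increasing.
    Returns None when no point on the curve qualifies.
    """
    best = None
    for t, s in survival_curve:
        if 1.0 - s < max_change_probability:
            best = t
        else:
            break  # S is non-increasing, so later points cannot qualify either
    return best


# Hypothetical curve: (minutes since caching, probability the rate is unchanged).
curve = [(0, 1.00), (15, 0.99), (30, 0.97), (60, 0.93), (120, 0.85)]
ttl = max_ttl_below_threshold(curve, max_change_probability=0.05)
print(ttl)
```

With these made-up numbers the search stops at the 60-minute point, where the change probability first reaches the threshold, and keeps the last qualifying time before it.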

Survival Analysis
A brief? introduction.

What is Survival Analysis?
  Statistical procedures for predicting time until an event occurs.
  Event: death, relapse, recovery, failure.
  Examples:
    Heart transplant patients: time until death.
    Leukemia patients in remission: time until relapse.
    Prison parolees: re-arrest.

Key Terms
  Survival time: T (the random variable) vs. t (a specific value of T)
  Failure
  Censoring
  Survival function

Censoring
  Period of no information
    Left-censored.
    Right-censored.
  Causes:
    Individual is lost to follow-up
    Death from cause unrelated to event of interest
    Study ends
  Models assume either failure or censoring.

Survival Function
  Survival function: S(t)
  Probability of survival greater than t, i.e. that T > t: S(t) = P(T > t)
  Properties:
    Non-increasing
    S(t) = 1 for t = 0
    S(t) = 0 for t = ∞

Kaplan-Meier Estimates

  tj   mj   qj   nj
   0    0    0   14
   1    1    0   14
   2    1    1   13
   4    2    1   11
   6    0    2    8
   7    1    0    6
   9    1    0    5
  10    2    2    4

  tj: observation time
  mj: number of failures
  qj: number of censored observations
  nj: number at risk

Kaplan-Meier Estimates
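
The estimate itself follows the product-limit rule S(tj) = S(tj-1) × (1 − mj / nj), starting from S(0) = 1. Below is a minimal sketch of that calculation. The 14 individual observations are reconstructed from the counts defined above purely for illustration, and the function name is hypothetical.

```python
from collections import Counter

# (observation time, failed?) for 14 individuals, reconstructed for illustration:
# True marks a failure, False a censored observation.
observations = (
    [(1, True)]
    + [(2, True), (2, False)]
    + [(4, True), (4, True), (4, False)]
    + [(6, False), (6, False)]
    + [(7, True)]
    + [(9, True)]
    + [(10, True), (10, True), (10, False), (10, False)]
)


def kaplan_meier(obs):
    """Return rows (t_j, m_j, q_j, n_j, S(t_j)) for each distinct observation time."""
    failures = Counter(t for t, failed in obs if failed)
    censored = Counter(t for t, failed in obs if not failed)
    at_risk = len(obs)  # everyone is at risk before the first observation time
    survival = 1.0      # S(0) = 1
    rows = []
    for t in sorted(set(failures) | set(censored)):
        m, q = failures[t], censored[t]
        survival *= 1.0 - m / at_risk  # only failures reduce the estimate
        rows.append((t, m, q, at_risk, survival))
        at_risk -= m + q               # failures and censored both leave the risk set
    return rows


for t, m, q, n, s in kaplan_meier(observations):
    print(f"t={t:>2}  m={m}  q={q}  n={n:>2}  S(t)={s:.3f}")
```

Censored observations never lower S(t) directly. They only shrink the number at risk for later times, which is why the estimate stays flat at t = 6.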

Parametric Models
  Accelerated Failure Time
    Assume a distribution.
    Use regression to fit parameters.
    The distribution parameter is expressed in terms of predictor variables and regression parameters.
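
As a rough illustration of "assume a distribution, then fit by regression", the sketch below assumes an exponential model with rate λ and a single binary predictor x, parameterized as 1/λ = exp(β0 + β1·x). With one binary predictor the maximum-likelihood fit reduces to a per-group rate: failures divided by total observed time, with censored time still counted. The data, the predictor, and all names are hypothetical.

```python
import math


def exponential_rate_mle(obs):
    """MLE of the exponential rate under right-censoring: failures / total observed time."""
    failures = sum(1 for _, failed in obs if failed)
    total_time = sum(t for t, _ in obs)  # censored observations still contribute time
    return failures / total_time


# Hypothetical (time, failed?) observations, split on a binary predictor x.
group_x0 = [(2.0, True), (3.0, True), (5.0, False), (10.0, True)]
group_x1 = [(4.0, True), (8.0, False), (12.0, True), (16.0, False)]

rate_x0 = exponential_rate_mle(group_x0)  # 3 failures over 20 time units
rate_x1 = exponential_rate_mle(group_x1)  # 2 failures over 40 time units

# Assumed AFT form: 1 / lambda = exp(beta0 + beta1 * x)
beta0 = -math.log(rate_x0)
beta1 = math.log(rate_x0) - math.log(rate_x1)
acceleration_factor = math.exp(beta1)  # ratio of mean survival times, x = 1 vs. x = 0

print(rate_x0, rate_x1, acceleration_factor)
```

An acceleration factor above 1 means survival times are stretched for x = 1 relative to x = 0. Models with continuous or multiple predictors need a numerical optimizer instead of this closed form.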

  Distribution    S(t)
  Exponential
  Weibull
  Log-logistic
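
For reference, the three distributions have closed-form survival functions. The sketch below uses one common textbook parameterization, which is an assumption here: the parameter names lam (scale) and p (shape) may differ from those in the talk. It also inverts the Weibull form to get the largest t that keeps the event probability under a chosen limit, which is the quantity a TTL choice needs.

```python
import math


def exponential_survival(t, lam):
    """S(t) = exp(-lam * t)"""
    return math.exp(-lam * t)


def weibull_survival(t, lam, p):
    """S(t) = exp(-lam * t**p); p = 1 reduces to the exponential."""
    return math.exp(-lam * t ** p)


def log_logistic_survival(t, lam, p):
    """S(t) = 1 / (1 + lam * t**p)"""
    return 1.0 / (1.0 + lam * t ** p)


def weibull_ttl(max_change_probability, lam, p):
    """Largest t with 1 - S(t) <= max_change_probability under the Weibull model."""
    return (-math.log(1.0 - max_change_probability) / lam) ** (1.0 / p)


# Hypothetical parameters: lam per minute**p, shape p, and a 5% limit.
ttl = weibull_ttl(0.05, lam=0.01, p=1.5)
print(ttl, weibull_survival(ttl, lam=0.01, p=1.5))
```

All three functions start at S(0) = 1 and decrease toward 0, matching the survival-function properties listed earlier.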

Optimizing Cache TTL
Methods and early results.

Data Collection
  Data is collected from service hosts in our hotel stack.
  Includes every live rate search (aka burst) performed by our hotel stack.
  Raw data: ~200 GB, compressed, 10^8 records.
  Extraction: