© 2016 CA. ALL RIGHTS RESERVED. @CAWORLD #CAWORLD
CA World® ’16
Applying Data Science to Your Business Problem
Paul Dulany - VP Data Science - CA Technologies
SCX31S
SECURITY
For Informational Purposes Only
Terms of this Presentation
© 2016 CA. All rights reserved. All trademarks referenced herein belong to their respective companies.
The content provided in this CA World 2016 presentation is intended for informational purposes only and does not form any type of warranty. The information provided by a CA partner and/or CA customer has not been reviewed for accuracy by CA.
Abstract
For a while now, a number of industries have been interested in data science and advanced analytics. But it isn’t always clear how best to use these within the business context. In this session, we’ll discuss how to turn a business problem into a data-science problem, and then back. We’ll use card-not-present payment fraud and login attempts as examples of how to identify the problem, determine if data science and advanced analytics can help (and if the situation warrants them), and then follow through on developing a solution to the problem.
Paul Dulany, PhD
CA Technologies
VP Data Science
Agenda
1. WHAT IS DATA SCIENCE?
2. DETERMINING A PROBLEM OF INTEREST
3. UNDERSTAND THE PRODUCTION ENVIRONMENT AND DEMANDS
4. MODEL CREATION AND EVALUATION
5. Q&A
Key Points for Applying Data Science
§ Identify a high-value Business Problem with High Quality Data
§ Determine the class of the problem to solve
§ Utilize business-domain knowledge
  – Understand the "ecosystem"
  – Define appropriate metrics
  – Understand the data in full
§ Develop features and models / Evaluate / Iterate
§ Always keep the business problem in mind!
What is Data Science?
§ The application of analytical techniques to large and “big” data
  – A wide field encompassing many different aspects of analytics, statistics, and data mining
  – Fundamentally data driven
  – Based upon the scientific method
  – The goal is to use data and analytical techniques to solve problems
§ Requires knowledge in multiple domains
  – Analytics
  – Scientific computations
  – Data formats
  – Business domain
  – Statistics
Steps in Applying Data Science: Business Problem
§ Identify a high-value business problem
  – The business case is critical
§ Intelligent Mainframe Operations
  – Need early detection of issues
    § Best is to predict and avoid issues
  – Currently, false positives (false alarms) are too prevalent
  – Expert-maintained systems of thresholds are hard to maintain
Steps in Applying Data Science: Business Problem
§ Payment Security:
  – Fraud in eCommerce is a significant problem
    § 3-D Secure was developed to combat this
  – Issuers incur the most pain from the current state
    § Fraud losses
    § Loss of income from interest and interchange fees
    § Customer experience and annoyance
    § Cost of inbound calls
  – Merchants feel pain too…
Business Impact for a Client: 3 Year Impact
                                             Credit         Debit
Fraud savings (above current vendor)    £13,441,843   £15,174,365
Interchange fees (above current)           £466,462    £3,441,882
Interest income (above current)          £3,674,803           N/A
Operational savings (not calculated)              -             -
Total                                   £17,583,108   £18,616,247
Steps in Applying Data Science: Identify Data (Results Proportional to Quality!)
§ Identify data related to the business problem
  – The more data, the better!
  – Is it categorical, ordinal, or numerical? What cardinality?
  – Is there a unique advantage over the competition?
§ Payment Security
  – 3DS data: PAReq message, device information, …
  – Wide mix of types of data
  – Time series is important
  – SaaS Deployment allows quality data to be gathered
Steps in Applying Data Science: Identify Data (Results Proportional to Quality!)
§ Intelligent Mainframe Operations
  – Multiple possible data sources
    § VSAM, DB2, IMS DB, IDMS, DATACOM, SMF, Syslogs, Vtape, CICS, …
  – Utilize CA SYSVIEW’s excellence
  – Embed analytics to detect abnormal patterns
Steps in Applying Data Science
§ Determine the general class of problem
  – Classification, regression, anomaly detection, etc.
  – “Supervised” (teaching your children to read, teaching them manners)
  – “Unsupervised” (university)
  – “Semi-supervised” (school lunchroom)
Steps in Applying Data Science: Determine the General Class of Problem
§ Payment Security
  – Supervised classification
  – Fraud information for losses is likely in good shape
  – Complexities happen once you have a system in place
    § Censored problem, both in marking and in changing fraudster behavior
§ Intelligent Mainframe Operations
  – Unsupervised to begin
  – Need to develop baselines of normal behavior
    § But must provide results from day 0
  – Possibility of semi-supervised in the future
Steps in Applying Data Science
§ Understand the ecosystem
  – What actions can be taken?
  – Is getting the result time sensitive?
§ Intelligent Mainframe Operations
  – Predictive analytics needed
  – Multiple possible actions, key is to inform the operator’s actions
  – Different time-scales for problems
  – Reaction time is critical: real-time
Steps in Applying Data Science: Understand the Ecosystem
§ Payment Security
  – Predictive and prescriptive analytics needed
  – Multiple possible actions, at the transaction and the card level
  – Timing is critical: real-time, i.e., <50 ms for vast majority
    § We must be able to take action now
Steps in Applying Data Science
§ Determine the appropriate metrics
  – How do we measure success?
    § Well defined measures are critical
§ Intelligent Mainframe Operations
  – High Availability
  – Problem avoidance
  – Reduced MTTR
  – Reduce SME dependence for issue detection
Steps in Applying Data Science: Metrics
§ Payment Security
  – A number of possibilities, based upon customer’s objectives
  – Consider them all
    § Detection rates
    § “Outsort” rates
    § False-positive ratios
    § False-positive rates
    § Value-based / transaction based / card based
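$$\mathrm{TOR}(S) = \frac{\sum_{s_i \ge S} \left[ F(s_i) + N(s_i) \right]}{\sum_{\text{all } s_i} \left[ F(s_i) + N(s_i) \right]}$$

$$\mathrm{TDR}(S) = \frac{\sum_{s_i \ge S} F(s_i)}{\sum_{\text{all } s_i} F(s_i)}$$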
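A minimal sketch of how the TOR and TDR rates could be computed from scored, tagged transactions. It assumes that F(s) and N(s) count the fraud and non-fraud transactions at score s, that higher scores mean higher risk, and that a transaction is outsorted when its score is at or above the threshold S. The slide does not spell out these definitions, and the function name is illustrative.

```python
from typing import Iterable, Tuple


def outsort_and_detection_rates(
    scored: Iterable[Tuple[float, bool]], threshold: float
) -> Tuple[float, float]:
    """Return (TOR, TDR) for (score, is_fraud) pairs at a score threshold.

    TOR: share of all transactions with score >= threshold (assumed reading).
    TDR: share of fraud transactions with score >= threshold (assumed reading).
    """
    total = fraud_total = outsorted = fraud_outsorted = 0
    for score, is_fraud in scored:
        total += 1
        fraud_total += int(is_fraud)
        if score >= threshold:
            outsorted += 1
            fraud_outsorted += int(is_fraud)
    tor = outsorted / total if total else 0.0
    tdr = fraud_outsorted / fraud_total if fraud_total else 0.0
    return tor, tdr


# Made-up example data: (model score, tagged as fraud?)
transactions = [
    (900, True), (850, False), (700, True), (300, False),
    (200, False), (100, False), (950, False), (50, False),
]
tor, tdr = outsort_and_detection_rates(transactions, threshold=800)
```

Sweeping the threshold over the score range gives the trade-off curve between the outsort rate (operational cost) and the detection rate (fraud caught). The same loop can be weighted by transaction amount for a value-based variant.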
What’s Next?
§ Now we have a well defined problem
§ So we spend a lot of time with the data!
  – “Browse” the data
  – Run some descriptive statistics
  – See if you can surprise yourself
  – If supervised, view the tagging data and the production data separately, and then together.
Review the Tagged Data
§ Browse the data again
§ Run many statistical tests
  – Beware of “Target Leaks”!!
  – Begin getting a feel for the variations, correlations, idiosyncrasies
§ You never want perfectly clean data
  – You want data that simulates production!
    § Beware of any changes to the data, especially non-causal changes
    § Model training is a numerical simulation of production
Data Partitioning: Beware What You Partition and How!
§ Partitioning
  – Often need to use stratified sampling
  – When using multiple entities for tracking behavior, interactions are tricky!
    § Look for irreducibility
    § Go to out-of-time if needed
[Figure: Historical Data partitioned into Training, Validate, and Holdout sets, each containing Fraud and Non-Fraud records]
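A minimal sketch of the two partitioning approaches named on this slide: a stratified split that keeps the fraud/non-fraud mix the same in each partition, and an out-of-time split on a cutoff date. The record layout, the 60/20/20 fractions, and the function names are assumptions for illustration. The sketch does not address the entity-interaction problem (for example, the same card appearing in several partitions).

```python
import random
from collections import defaultdict


def stratified_split(records, label_of, fractions=(0.6, 0.2, 0.2), seed=0):
    """Split records into (training, validate, holdout), class by class."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    by_class = defaultdict(list)
    for rec in records:
        by_class[label_of(rec)].append(rec)

    training, validate, holdout = [], [], []
    for members in by_class.values():
        rng.shuffle(members)
        cut1 = round(len(members) * fractions[0])
        cut2 = cut1 + round(len(members) * fractions[1])
        training.extend(members[:cut1])
        validate.extend(members[cut1:cut2])
        holdout.extend(members[cut2:])  # remainder goes to holdout
    return training, validate, holdout


def out_of_time_split(records, time_of, cutoff):
    """Everything before the cutoff is for development; the rest is held out."""
    earlier = [r for r in records if time_of(r) < cutoff]
    later = [r for r in records if time_of(r) >= cutoff]
    return earlier, later


# Made-up data: 100 transactions, 10 of them tagged as fraud.
data = [{"id": i, "fraud": i < 10, "day": i % 30} for i in range(100)]
train, valid, hold = stratified_split(data, label_of=lambda r: r["fraud"])
dev, future = out_of_time_split(data, time_of=lambda r: r["day"], cutoff=20)
```

With rare classes such as fraud, a purely random split can leave a partition with very few positive examples, which is why the split is done per class here.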
Feature Development
§ “Features” pull the distinguishing characteristics from the data
  – Time series analysis techniques
  – Digital Signal Processing techniques
  – Statistical measures of differences
  – Bayesian approaches
  – Linear discriminants
  – Non-linear transformations
  – …
Feature Development: Simple Example
§ Determine a peer group from the historical data
  – All transactions where there were four previous transactions in the last week, at least two of which were on the same device, but in different countries
§ Determine the distributions of classes for a continuous variable
  – Let’s say, the amount
  – Use a discriminant calculation to determine likelihood of belonging to either class.
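One possible form of such a discriminant calculation, sketched under stated assumptions: within the peer group, the log of the amount is modeled as normally distributed for each class, and the feature is the log-likelihood ratio of fraud versus non-fraud. The slide does not say which distribution or discriminant was used, so the Gaussian choice, the sample amounts, and the function names are illustrative only.

```python
import math
from statistics import NormalDist


def fit_log_amount(amounts):
    """Fit a normal distribution to log(amount) for one class."""
    return NormalDist.from_samples([math.log(a) for a in amounts])


def log_density(dist, x):
    """Log of the normal density, computed directly to avoid underflow."""
    z = (x - dist.mean) / dist.stdev
    return -0.5 * z * z - math.log(dist.stdev) - 0.5 * math.log(2 * math.pi)


def fraud_log_likelihood_ratio(amount, fraud_dist, legit_dist):
    """Positive values favor the fraud class, negative the non-fraud class."""
    x = math.log(amount)
    return log_density(fraud_dist, x) - log_density(legit_dist, x)


# Made-up historical amounts for transactions in the peer group.
fraud_dist = fit_log_amount([400.0, 500.0, 650.0, 800.0])
legit_dist = fit_log_amount([20.0, 35.0, 50.0, 80.0])

high = fraud_log_likelihood_ratio(600.0, fraud_dist, legit_dist)
low = fraud_log_likelihood_ratio(30.0, fraud_dist, legit_dist)
```

The resulting ratio is a single numerical input that a downstream model can consume alongside other features.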
Feature Development: More Complexity
§ At the other end of the scale are online-learning models to determine behaviors
  – Autocorrelation models, exponential weighting, KDE, etc.
  – Many techniques
    § but must keep in mind the CPU constraints, I/O constraints, etc.
§ Conversion of high-cardinality categorical data into numerical inputs
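$$\bar{x}(t_n, t_{n-1}, t_0) = \alpha(t_n, t_0)\, x_n + \beta(t_n, t_{n-1}, t_0)\, \bar{x}_{n-1}$$

$$\alpha(t_n, t_0) = \frac{1 - \gamma}{1 - \gamma^{(t_n - t_0)}}$$

$$\beta(t_n, t_{n-1}, t_0) = \frac{\gamma^{(t_n - t_{n-1})} \left[ 1 - \gamma^{(t_{n-1} - t_0)} \right]}{1 - \gamma^{(t_n - t_0)}}$$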
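The recursive update in the equations above can be sketched as follows. The code assumes that γ is a decay factor strictly between 0 and 1, that times are plain numbers in some fixed unit, and that t_0 is a reference time strictly before the first observation. Function names are illustrative and not taken from the presentation.

```python
def alpha(gamma, t_n, t_0):
    """Weight on the newest observation x_n."""
    return (1.0 - gamma) / (1.0 - gamma ** (t_n - t_0))


def beta(gamma, t_n, t_prev, t_0):
    """Weight on the previous running estimate."""
    decay = gamma ** (t_n - t_prev)
    return decay * (1.0 - gamma ** (t_prev - t_0)) / (1.0 - gamma ** (t_n - t_0))


def exponentially_weighted_estimates(values, times, gamma, t_0):
    """Apply the update once per observation and return every estimate."""
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must be strictly between 0 and 1")
    estimate = 0.0  # ignored at the first step, where beta evaluates to 0
    t_prev = t_0
    estimates = []
    for x_n, t_n in zip(values, times):
        if t_n <= t_prev:
            raise ValueError("times must be strictly increasing and after t_0")
        estimate = (alpha(gamma, t_n, t_0) * x_n
                    + beta(gamma, t_n, t_prev, t_0) * estimate)
        t_prev = t_n
        estimates.append(estimate)
    return estimates


two_step = exponentially_weighted_estimates([1.0, 2.0], [1, 2], gamma=0.5, t_0=0)
constant = exponentially_weighted_estimates([3.0] * 4, [1, 2, 3, 4], gamma=0.5, t_0=0)
```

Only the previous estimate and its timestamp need to be stored per entity, which fits the CPU and I/O constraints noted on the slide. With unit time steps the two weights sum to 1, so a constant input yields a constant estimate.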
Example: Unsupervised Anomaly Detection
§ Utilize Historical data to define bands of different probabilities
  – Map real time metric streams against system defined normal
  – Multi-point alerts generated using industry-standard Western-Electric rules
  – Make static thresholds optional!
[Figure: Metric plotted against Time, with bands labeled Most Likely, Less Likely, and Unlikely]
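A minimal sketch of multi-point alerting with the four classic Western Electric zone rules. It assumes the "normal" band can be summarized by the mean and standard deviation of a historical window, which is a simplification of the probability bands described on the slide. The rule labels and function name are illustrative.

```python
from statistics import mean, stdev

# (label, window length, points required, distance from center in sigmas)
RULES = (
    ("1 point beyond 3 sigma", 1, 1, 3.0),
    ("2 of 3 beyond 2 sigma", 3, 2, 2.0),
    ("4 of 5 beyond 1 sigma", 5, 4, 1.0),
    ("8 in a row on one side", 8, 8, 0.0),
)


def western_electric_alerts(history, stream):
    """Return (index, rule label) for each stream point that trips a rule."""
    center = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        raise ValueError("history has no variation; cannot define bands")
    z = [(x - center) / sigma for x in stream]

    alerts = []
    for i in range(len(z)):
        for label, window, needed, limit in RULES:
            if i + 1 < window:
                continue  # not enough points yet for this rule
            recent = z[i + 1 - window : i + 1]
            above = sum(1 for v in recent if v > limit)
            below = sum(1 for v in recent if v < -limit)
            # Points must fall on the same side of the center line.
            if above >= needed or below >= needed:
                alerts.append((i, label))
                break  # report only the first rule that fires
    return alerts


baseline = [10, 12, 8, 11, 9, 10, 12, 8, 11, 9]
spike = western_electric_alerts(baseline, [10, 16, 10])
drift = western_electric_alerts(baseline, [13.5, 10, 13.5])
```

Because the bands come from the metric's own history, no hand-maintained static threshold is needed for the alert to fire.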
Example: Supervised Model
§ Always remember Occam's Razor!
  – Among competing hypotheses, the one with the fewest assumptions should be selected.
  – Avoid needless complexity
§ Start with simple models, and grow more complex as needed
  – Linear regression
  – Logistic regression
  – Decision trees
  – Neural Networks
  – SVM …
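As an illustration of starting simple, here is a minimal logistic-regression baseline trained with plain batch gradient descent, written with the standard library only. The learning rate, epoch count, toy data, and function names are assumptions for this sketch. A production model would normally use an established library and properly scaled features.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_probability(x, weights, bias):
    return sigmoid(bias + sum(w * v for w, v in zip(weights, x)))


def train_logistic_regression(rows, labels, learning_rate=0.1, epochs=2000):
    """Fit weights and bias by minimizing average log-loss."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            error = predict_probability(x, weights, bias) - y
            grad_b += error
            for j, v in enumerate(x):
                grad_w[j] += error * v
        bias -= learning_rate * grad_b / n
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
    return weights, bias


# Toy one-feature data: the label switches from 0 to 1 between x = 2 and x = 3.
rows = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
labels = [0, 0, 0, 1, 1, 1]
weights, bias = train_logistic_regression(rows, labels)
p_low = predict_probability([1.0], weights, bias)
p_high = predict_probability([4.0], weights, bias)
```

A baseline like this gives a reference point: a more complex model is worth its extra cost only if it clearly beats the simple one on the chosen business metrics.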
Example: Supervised Model: Train the Model(s)
§ There are many aspects of training a neural network
  – different activation and error functions
  – different training algorithms
  – variations of seeds, learning rate, momentum
  – self-regulation
  – number of hidden layers
  – number of nodes
  – boosting/bagging
  – preventing overfitting
  – etc.
Review
§ Review the results of your training, and start all over again!
  – Consider segmentation
  – Try leaving out the variables with the highest sensitivity
  – Subdivide the data to see if there are regions of instability
  – Iterate as needed
§ Finally, select your model(s)!
§ But we’re not done…
  – Now worry about calibration, upgrade/downgrade, priming time, packaging, integration, model report, …
Intelligent Mainframe Operations
[Figure: plot of tasks ready to be dispatched, with regions labeled Typical Volatility and Anomaly]
Key Points for Applying Data Science
§ Identify a high-value Business Problem with High Quality Data
§ Determine the class of the problem to solve
§ Utilize business-domain knowledge
  – Understand the "ecosystem"
  – Define appropriate metrics
  – Understand the data in full
§ Develop features and models / Evaluate / Iterate
§ Always keep the business problem in mind!
Recommended Sessions
SESSION #  TITLE                                                                            DATE/TIME
SCX50S     Convenience and Security for banking customers with CA Advanced Authentication   11/17/2016 at 3:00 pm
SCX34S     Securing Mobile Payments: Applying Lessons Learned in the Real World             11/17/2016 at 3:45 pm
SCT05T     Threat Analytics for Privileged Access Management                                11/17/2016 at 4:30 pm
Don’t Miss Our INTERACTIVE Security Demo Experience!
SNEAK PEEK!
We Want to Hear From You!
§ IT Central is a leading technology review site. CA has them to help generate product reviews for our Security products.
§ ITCS staff may be at this session now! (look for their shirts). If you would like to offer a product review, please ask them after the class, or go by their booth.
Note:
§ Only takes 5-7 mins
§ You have total control over the review
§ It can be anonymous, if required