BIG DATA: What it is, how to do it right!
James Luck, Principal Data Scientist
Clockwork Solutions
805 Las Cimas Parkway #100, Austin, TX, 78746
800-994-1336
AT&T Consulting
TODAY’S PRESENTATION
• I’m not trying to sell you anything.
• This is a high-level approach to understanding and implementing Big Data.
• Based upon my recent experiences talking with people just like you!
• Every organization has the same issues & concerns.
• YOU can avoid the mistakes others have made!
JAMES LUCK BIO
• James is a Principal Data Scientist with ClockWork Solutions.
• He has 25+ years of experience in data analytics, in addition to extensive telecommunications and managed services development. He holds advanced degrees in both Aerospace and Electrical Engineering, and an MBA.
• Previously, James was a Senior Consultant for AT&T Consulting, providing clients with assistance in designing and mapping out their own Big Data programs.
• Prior to AT&T, he was a scientist and PhD candidate at Sandia Labs and Air Force Research Lab. His experience there includes a variety of projects using complex data analytics in orbital systems and Synthetic Aperture Radar.
AGENDA
• What is Big Data?
• Terminology
• From Business Analytics to Big Data
• Implementing Big Data – Infrastructure
• Implementing Big Data – Data Analytics
FOCUS ON A SUCCESSFUL PROJECT!
• All good Big Data projects succeed in the same way.
• All failed Big Data projects fail in their own unique way!
Avoid lists of DON’Ts.
Focus on taking actions that will make you successful.
WHAT IS BIG DATA?
• Big data is high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making. (Gartner, IT Glossary)
• A “term of art” used to describe large data projects.
• Any project where collecting, storing, retrieving, or processing the data becomes a significant part of the problem.
Almost all organizations have a Variety problem, not really a Volume problem.
WE KNOW WHAT IT IS… SO WHAT’S THE PROBLEM?
• Overly broad definition
• No common industry understanding
• Every organization has a slightly different definition
• Vendors can label a wide variety of hardware and software products and services as Big Data
Too much focus on hardware and software products.
You can’t “buy” Big Data.
THE VALUE OF BIG DATA
• You’re sitting on a Gold Mine and you don’t even know it!
• Your data contains a wealth of insights and information unavailable from any other source.
• Use the data you already own to run your organization – it’s FREE!
• There is no outside data you can purchase that will tell you more about your business than your own data.
THE PROMISE OF BIG DATA
What they tell you…
“Give us all your data, everything: sales, marketing, customer surveys, manufacturing, accounting, structured, unstructured, text, logical, numeric data. We’ll crunch it together and produce insights and actionable information that will enable you to run your business better.”
What they don’t tell you…
It’s amazingly expensive and time-consuming to do Big Data that way. Think millions of dollars and several years.
The good news…
It doesn’t have to be all-or-nothing. Organizations get excellent results with a focused, programmatic approach.
KEEP IN MIND…
• Big Data can INFORM your business practices
• It can help you to make informed decisions
• Big Data CANNOT tell you how to run your business!
Big Data cannot create your business goals or your mission, vision or values for you.
So why do Amazon, Google, Yahoo, Microsoft do this so well?
It’s Their Core Business! They ARE Big Data.
The vast majority of organizations have other core missions that they augment with Big Data.
BIG DATA & FRIENDS
• Business Intelligence
  - The aggregation and processing of business data to provide a 360-degree view of the business. Focuses on aggregating, visualizing and reporting on the overall business
• Data Analysis / Analytics
  - The overall process of analyzing data, from collecting data through analysis through visualization
• Data Science
  - The overarching term for tools and techniques to extract information from data
• Data Mining
  - Tools and techniques for discovering patterns in data sets
• Predictive Analytics
  - Tools and techniques that analyze trends and historical data to make predictions
  - NOTE: You can predict trends, you CANNOT predict the future!
• Text Analytics
  - Data analytics for text
• Business Analytics
  - General name for data analytics performed on business data
CURRENT BUSINESS ANALYTICS PARADIGM
• Focus on data products for managing the business
• Typical questions:
  - How many calls did we take yesterday?
  - How much did we sell yesterday?
  - How much inventory do we have?
  - Focus on KPIs, metrics
• Looking for changes from the norm
• Using descriptive statistics
  - Summarize data
  - Mean, variance, trends
• Reporting
  - Charts, graphs, trend plots
• All about “monitoring the machine”
  - Focus on a narrow set of data
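As a rough illustration of the descriptive statistics and "changes from the norm" mentioned on this slide, here is a minimal Python sketch. The call counts, the function names `summarize` and `is_unusual`, and the two-standard-deviation threshold are all assumptions chosen for the example, not part of the original presentation.

```python
from statistics import mean, pstdev, pvariance


def summarize(values):
    """Return the mean, population variance and average change per step."""
    trend = (values[-1] - values[0]) / (len(values) - 1)
    return {"mean": mean(values), "variance": pvariance(values), "trend": trend}


def is_unusual(history, today, threshold=2.0):
    """Flag a value more than `threshold` standard deviations from the mean."""
    spread = pstdev(history)
    if spread == 0:
        return today != history[0]
    return abs(today - mean(history)) > threshold * spread


# Hypothetical daily call counts for one week of a call center.
daily_calls = [100, 104, 98, 102, 96]
summary = summarize(daily_calls)
alert = is_unusual(daily_calls, today=130)
```

A report built this way answers "how many calls did we take?" and "is today out of line?", which is exactly the narrow, monitoring-style view the slide describes.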
BIG DATA ANALYTICS VS “TRADITIONAL” BUSINESS ANALYTICS
• Business Analytics PLUS a whole lot more
• Use a much larger set of data
• Many different data types & combinations
  - Structured, unstructured, logical, text
• Typically can’t process with traditional systems
  - New algorithms and approaches
• Predictive Analytics
  - Who is most likely to buy this widget?
  - If a device fails, how likely is it to fail again in 30 days?
• Data Mining
  - What makes customers unhappy?
• Text Analytics
  - Sentiment analysis
  - Topic modeling
• Visualization
  - Heat maps of customer satisfaction by county
• Finding relationships
  - What factors most affect employee retention?
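The device-failure question above can be given a first, purely empirical answer before any model is built. The sketch below is one assumed way to do it: it takes a hypothetical failure log of (device, date) pairs and reports the share of failures that were followed by another failure of the same device within 30 days. The function name, the log layout and the `as_of` cut-off are illustrative assumptions.

```python
from datetime import date, timedelta


def repeat_failure_rate(failures, as_of, window_days=30):
    """Share of failures followed by a repeat failure within the window.

    `failures` is a list of (device_id, failure_date) pairs. Failures that
    happened less than `window_days` before `as_of` are skipped, because
    their follow-up period has not finished yet.
    """
    window = timedelta(days=window_days)
    by_device = {}
    for device_id, day in failures:
        by_device.setdefault(device_id, []).append(day)

    total = repeats = 0
    for days in by_device.values():
        days.sort()
        for i, day in enumerate(days):
            if day + window > as_of:
                continue  # not enough follow-up time to judge this failure
            total += 1
            if i + 1 < len(days) and days[i + 1] - day <= window:
                repeats += 1
    return repeats / total if total else 0.0


# Hypothetical failure log for three devices.
log = [
    ("A", date(2024, 1, 1)), ("A", date(2024, 1, 15)), ("A", date(2024, 6, 1)),
    ("B", date(2024, 2, 1)),
    ("C", date(2024, 3, 1)), ("C", date(2024, 3, 20)),
]
rate = repeat_failure_rate(log, as_of=date(2024, 6, 15))
```

This is a historical rate, not a prediction for any single device; a predictive model would add device attributes and would still need someone qualified to interpret the result.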
SOME PITFALLS…
• Literally, thousands of techniques
  - Which one(s) should you use?
• These techniques require a lot of skill to use properly
  - Data cleanliness requirements, robustness
  - Have to know how to interpret results
• Generally not possible to verify results
  - How do you check that 100,000 trouble tickets were properly categorized by an algorithm?
  - Can verify a small few, can’t check them all
• Rely upon “goodness-of-fit” to check quality
• Algorithms don’t lend themselves to auto-run tools
• Identified relationships may not actually exist.
  - Artifact of a particular data set
You need experienced algorithms people (data scientists) to pick algorithms, build models and interpret results properly.
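One common way to "verify a small few" is to audit a random sample and attach an error margin to the result. The sketch below assumes a hypothetical set of 100,000 ticket IDs, a sample of 400 checked by hand, and a made-up outcome of 360 correct; it uses the normal-approximation 95% interval, which is only one of several reasonable choices.

```python
import math
import random


def draw_audit_sample(ticket_ids, sample_size, seed=0):
    """Pick tickets at random (without repeats) for a person to check."""
    rng = random.Random(seed)
    return rng.sample(ticket_ids, sample_size)


def estimate_accuracy(num_checked, num_correct, z=1.96):
    """Return (accuracy, low, high) using a normal-approximation interval."""
    p = num_correct / num_checked
    margin = z * math.sqrt(p * (1 - p) / num_checked)
    return p, max(0.0, p - margin), min(1.0, p + margin)


# Hypothetical audit: 400 of 100,000 tickets reviewed, 360 judged correct.
ticket_ids = list(range(100_000))
sample = draw_audit_sample(ticket_ids, 400)
accuracy, low, high = estimate_accuracy(num_checked=400, num_correct=360)
```

With these assumed numbers the estimate is 90% correct, give or take about three percentage points. The sample says nothing about which of the remaining tickets are wrong, which is the limitation the slide points out.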
BUILDING AN INFRASTRUCTURE
• ID Business Goals
• Gather Stakeholders
• Create Use Cases for the business
• Create a Strategy & Roadmap that meets the Use Case requirements
• Implement infrastructure
BIG DATA INFRASTRUCTURE FAILS
• Buying from vendors before you have a plan
• Building an infrastructure BEFORE you define use cases
• Neglecting to engage stakeholders
• Not having a well-defined Strategy & Roadmap (S&R) plan
• Neglecting to use existing systems
• Underestimating storage & processing requirements
• Big Data ≠ Hadoop
• You don’t need Hadoop to implement Big Data
FIRST THINGS FIRST
• No one really knows where their data is or what they have
  → Perform a data survey before you start!
• You will spend 90% of your time doing data clean-up
  → Accept this as a fact. Don’t expect results for the first few months.
Decide if these limitations are workable for you!
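A data survey can start very small. The sketch below is one assumed form of it: for a batch of records it counts how many values are filled in, missing, and distinct for each field, which quickly shows where clean-up effort will go. The record layout and the function name `survey_fields` are hypothetical.

```python
def survey_fields(records):
    """Count filled, missing and distinct values for every field seen."""
    fields = sorted({name for record in records for name in record})
    report = {}
    for name in fields:
        values = [record.get(name) for record in records]
        # Treat None and blank strings as missing.
        filled = [v for v in values if v is not None and str(v).strip() != ""]
        report[name] = {
            "filled": len(filled),
            "missing": len(values) - len(filled),
            "distinct": len(set(filled)),
        }
    return report


# Hypothetical customer records pulled from an existing system.
customers = [
    {"id": 1, "state": "TX", "email": "a@example.com"},
    {"id": 2, "state": "tx", "email": ""},
    {"id": 3, "state": "TX"},
]
report = survey_fields(customers)
```

Even on three records the survey surfaces two typical clean-up jobs: most emails are missing, and the same state appears under two spellings.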
BIG DATA IS A TEAM SPORT
• Business Analyst
  - Gather requirements, create use cases
• Data Engineer
  - Design, build, maintain Big Data infrastructure
• Data Scientists
  - Select algorithms, build, verify models
• Data Curators
  - Acquire and preserve data sets
  - Handle data governance and quality issues
• Data Visualizers
  - Create data products from the information gleaned from the data
BIG DATA IS A PROGRAMMATIC APPROACH
• Identify Business Goals
• Create Use Cases
• Big Data Analytics
• Insights, Data Products
• Implement into business processes
• Measure and Evaluate
• Repeat until results achieved
Why a Programmatic Approach?
Not every use case will produce desired results.
BIG DATA IS A PROGRAMMATIC APPROACH
• Identify Business Goals
• Create Use Cases
• Data Analytics
• Insights, Data Products
• Implement into business processes
• Measure and Evaluate
Most breakdowns occur
DATA ANALYTICS FAILS
• Failing to assemble a team
• Creating random data products and trying to feed those back into the business
• Poor Use Cases & business goals (yes, again!)
• Failing to implement recommendations
• Failing to integrate data products into business processes
  - What are they supposed to do with these things?
• Failing to measure the impact on the business
  - Can’t justify why you’re doing this
• Failing to continually implement/improve until results are achieved