BIG DATA: What it is, how to do it right!

James Luck
Principal Data Scientist
Clockwork Solutions
805 Las Cimas Parkway #100
Austin, TX 78746
800-994-1336


AT&T Consulting

TODAY’S PRESENTATION
• I’m not trying to sell you anything.
• This is a high-level approach to understanding and implementing Big Data.
• Based upon my recent experiences talking with people just like you!
• Every organization has the same issues & concerns.
• YOU can avoid the mistakes others have made!


JAMES LUCK BIO
• James is a Principal Data Scientist with Clockwork Solutions.
• He has 25+ years of experience in data analytics, in addition to extensive telecommunications and managed services development. He holds advanced degrees in both Aerospace and Electrical Engineering, and an MBA.
• Previously, James was a Senior Consultant for AT&T Consulting, providing clients with assistance in designing and mapping out their own Big Data programs.
• Prior to AT&T, he was a scientist and PhD candidate at Sandia Labs and Air Force Research Lab. His experience there includes a variety of projects using complex data analytics in orbital systems and Synthetic Aperture Radar.


AGENDA
• What is Big Data?
• Terminology
• From Business Analytics to Big Data
• Implementing Big Data: Infrastructure
• Implementing Big Data: Data Analytics


FOCUS ON A SUCCESSFUL PROJECT!
• All good Big Data projects succeed in the same way.
• All failed Big Data projects fail in their own unique way!

Avoid lists of DON’TS.

Focus on taking actions that will make you successful.


BIG DATA: What is it?


WHAT IS BIG DATA?
• Big data is high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making. (Gartner, IT Glossary)
• A “term of art” used to describe large data projects.
• Any project where collecting, storing, retrieving, or processing the data becomes a significant part of the problem.

Almost all organizations have a Variety problem, not really a Volume problem.


WE KNOW WHAT IT IS… SO WHAT’S THE PROBLEM?
• Overly broad definition
• No common industry understanding
• Every organization has a slightly different definition
• Vendors can label a wide variety of hardware and software products and services as Big Data

Too much focus on hardware and software products. You can’t “buy” Big Data.


THE VALUE OF BIG DATA
• You’re sitting on a Gold Mine and you don’t even know it!
• Your data contains a wealth of insights and information unavailable from any other source.
• Use the data you already own to run your organization. It’s FREE!
• There is no outside data you can purchase that will tell you more about your business than your own data.


THE PROMISE OF BIG DATA

What they tell you…

“Give us all your data, everything: sales, marketing, customer surveys, manufacturing, accounting, structured, unstructured, text, logical, numeric data. We’ll crunch it together and produce insights and actionable information that will enable you to run your business better.”

What they don’t tell you…

It’s amazingly expensive and time-consuming to do Big Data that way. Think millions of dollars and several years.

The good news…

It doesn’t have to be all-or-nothing. Organizations get excellent results with a focused, programmatic approach.


KEEP IN MIND…
• Big Data can INFORM your business practices
   ◦ Help you to make informed decisions
• Big Data CANNOT tell you how to run your business!

Big Data cannot create your business goals or your mission, vision or values for you.


So why do Amazon, Google, Yahoo, Microsoft do this so well?

It’s Their Core Business! They ARE Big Data.

The vast majority of organizations have other core missions that they augment with Big Data.


SOME TERMINOLOGY…


BIG DATA & FRIENDS
• Business Intelligence
   ◦ The aggregation and processing of business data to provide a 360-degree view of the business. Focuses on aggregating, visualizing, and reporting on the overall business.
• Data Analysis / Analytics
   ◦ The overall process of analyzing data, from collecting data through analysis through visualization
• Data Science
   ◦ The overarching term for tools and techniques to extract information from data
• Data Mining
   ◦ Tools and techniques for discovering patterns in datasets
• Predictive Analytics
   ◦ Tools and techniques that analyze trends and historical data to make predictions
   ◦ NOTE: You can predict trends, you CANNOT predict the future!
• Text Analytics
   ◦ Data analytics for text
• Business Analytics
   ◦ General name for data analytics performed on business data


FROM BUSINESS ANALYTICS TO BIG DATA: Where Are We Today?


CURRENT BUSINESS ANALYTICS PARADIGM
• Focus on data products for managing the business
• Typical questions:
   ◦ How many calls did we take yesterday?
   ◦ How much did we sell yesterday?
   ◦ How much inventory do we have?
   ◦ Focus on KPIs, metrics
• Looking for changes from the norm
• Using descriptive statistics
   ◦ Summarize data
   ◦ Mean, variance, trends
• Reporting
   ◦ Charts, graphs, trend plots
• All about “monitoring the machine”
   ◦ Focus on a narrow set of data
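The descriptive statistics listed on this slide (mean, variance, trends, changes from the norm) can be sketched in a few lines of Python. This is a minimal illustration and not a production monitor: the function names `describe` and `is_unusual`, the 3-standard-deviation threshold, and the sample call counts are all assumptions made up for the example.

```python
from statistics import mean, pstdev, pvariance


def describe(daily_counts):
    """Summarize a daily KPI series: mean, variance, and linear trend per day."""
    n = len(daily_counts)
    avg = mean(daily_counts)
    var = pvariance(daily_counts)
    # Least-squares slope of the count against the day index 0..n-1.
    day_avg = (n - 1) / 2
    num = sum((d - day_avg) * (c - avg) for d, c in enumerate(daily_counts))
    den = sum((d - day_avg) ** 2 for d in range(n))
    return {"mean": avg, "variance": var, "trend_per_day": num / den}


def is_unusual(value, history, threshold=3.0):
    """Flag a value more than `threshold` standard deviations from the mean."""
    sd = pstdev(history)
    return sd > 0 and abs(value - mean(history)) > threshold * sd


# Hypothetical daily call counts for one week.
calls = [100, 102, 98, 101, 99, 103, 97]
summary = describe(calls)
print(summary)
print(is_unusual(140, calls))
```

For this sample series the mean is 100 calls per day with a slightly negative trend, and a day with 140 calls is flagged as a change from the norm.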


BIG DATA ANALYTICS VS “TRADITIONAL” BUSINESS ANALYTICS
• Business Analytics PLUS a whole lot more
• Use much larger set of data
• Many different data types & combinations
   ◦ Structured, unstructured, logical, text
• Typically can’t process with traditional systems
   ◦ New algorithms and approaches
• Predictive Analytics
   ◦ Who is most likely to buy this widget?
   ◦ If a device fails, how likely is it to fail again in 30 days?
• Data Mining
   ◦ What makes customers unhappy?
• Text Analytics
   ◦ Sentiment analysis
   ◦ Topic Modeling
• Visualization
   ◦ Heat maps of customer satisfaction by county
• Finding relationships
   ◦ What factors most affect employee retention?
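As a rough illustration of the predictive question above ("If a device fails, how likely is it to fail again in 30 days?"), the sketch below computes the historical repeat-failure frequency from a failure log. It is only a baseline estimate and not a predictive model. The function name `repeat_failure_rate`, the `(device_id, date)` log layout, and the sample data are assumptions for the example. Failures close to the end of the log are counted as non-repeats, which biases the estimate downward.

```python
from collections import defaultdict
from datetime import date, timedelta


def repeat_failure_rate(failures, window_days=30):
    """Fraction of failures followed by another failure of the same device
    within `window_days`. `failures` holds (device_id, failure_date) pairs."""
    by_device = defaultdict(list)
    for device_id, failed_on in failures:
        by_device[device_id].append(failed_on)

    window = timedelta(days=window_days)
    total = repeats = 0
    for dates in by_device.values():
        dates.sort()
        for i, failed_on in enumerate(dates):
            total += 1
            # The next failure in sorted order is the earliest possible repeat.
            if i + 1 < len(dates) and dates[i + 1] - failed_on <= window:
                repeats += 1
    return repeats / total if total else 0.0


# Hypothetical failure log: device A fails three times, device B once.
log = [
    ("A", date(2024, 1, 1)),
    ("A", date(2024, 1, 20)),
    ("A", date(2024, 4, 1)),
    ("B", date(2024, 2, 1)),
]
rate = repeat_failure_rate(log)
print(rate)
```

In this made-up log, one of the four failures is followed by a repeat within 30 days.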


SOME PITFALLS…
• Literally, thousands of techniques
   ◦ Which one(s) should you use?
• These techniques require a lot of skill to use properly
   ◦ Data cleanliness requirements, robustness
   ◦ Have to know how to interpret results
• Generally not possible to verify results
   ◦ How do you check that 100,000 trouble tickets were properly categorized by an algorithm?
   ◦ Can verify a few, can’t check them all
• Rely upon “goodness-of-fit” to check quality
• Algorithms don’t lend themselves to auto-run tools
• Identified relationships may not actually exist.
   ◦ Artifact of a particular dataset

You need experienced algorithms people (data scientists) to pick algorithms, build models and interpret results properly.
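One common way to work with the "can verify a few, can't check them all" limit is to hand-review a random sample and report an interval for the overall accuracy. The sketch below is one possible approach, not something prescribed by the slides: it uses the Wilson score interval, and the helper names, the sample size of 400, and the count of 368 correct reviews are invented for the example.

```python
import math
import random


def sample_for_review(items, sample_size, seed=0):
    """Draw a reproducible random sample of items for manual review."""
    rng = random.Random(seed)
    return rng.sample(items, min(sample_size, len(items)))


def accuracy_interval(correct, reviewed, z=1.96):
    """Wilson score interval (roughly 95% for z=1.96) for the true accuracy."""
    if reviewed == 0:
        raise ValueError("no items reviewed")
    p = correct / reviewed
    denom = 1 + z * z / reviewed
    center = (p + z * z / (2 * reviewed)) / denom
    half = z * math.sqrt(p * (1 - p) / reviewed + z * z / (4 * reviewed**2)) / denom
    return center - half, center + half


# Hypothetical scenario: 100,000 categorized tickets, 400 reviewed by hand,
# 368 of which the reviewers judged to be categorized correctly.
tickets = list(range(100_000))
to_review = sample_for_review(tickets, 400)
low, high = accuracy_interval(correct=368, reviewed=400)
print(f"estimated accuracy between {low:.3f} and {high:.3f}")
```

The interval only describes the sampled population. It says nothing about whether the categories themselves are meaningful, which still takes an experienced analyst to judge.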


IMPLEMENTING BIG DATA: It’s not about the infrastructure!


THE MOST COMMON BIG DATA FAIL

Failure to (1) create Use Cases that are (2) tied to Business Goals


BUILDING AN INFRASTRUCTURE
1. ID Business Goals, Gather Stakeholders, Create Use Cases for the business
2. Create a Strategy & Roadmap that meets the Use Case requirements
3. Implement infrastructure


BIG DATA INFRASTRUCTURE FAILS
• Buying from vendors before you have a plan
• Building an infrastructure BEFORE you define use cases
• Neglecting to engage stakeholders
• Not having a well-defined Strategy & Roadmap (S&R) plan
• Neglecting to use existing systems
• Underestimating storage & processing requirements
• Big Data ≠ Hadoop
• You don’t need Hadoop to implement Big Data


IMPLEMENTING BIG DATA: Data Analytics


FIRST THINGS FIRST
• No one really knows where their data is or what they have
   → Perform a data survey before you start!
• You will spend 90% of your time doing data clean-up
   → Accept this as a fact. Don’t expect results for the first few months.

Decide if these limitations are workable for you!
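A data survey can start very small. The sketch below is one possible first pass, assuming the data can be exported as CSV: for each column it counts the rows, the missing values, and the distinct values. The function name `survey_csv` and the sample columns are hypothetical.

```python
import csv
import io
from collections import Counter


def survey_csv(text_stream):
    """Report, per column, the row count, missing values, and distinct values."""
    reader = csv.DictReader(text_stream)
    rows = 0
    missing = Counter()
    distinct = {name: set() for name in reader.fieldnames or []}
    for row in reader:
        rows += 1
        for name in distinct:
            # Treat absent, empty, and whitespace-only cells as missing.
            value = (row.get(name) or "").strip()
            if value:
                distinct[name].add(value)
            else:
                missing[name] += 1
    return {
        name: {"rows": rows, "missing": missing[name], "distinct": len(values)}
        for name, values in distinct.items()
    }


# Hypothetical export with a missing state, a missing date, and "TX" vs "tx".
sample = io.StringIO(
    "customer_id,state,signup_date\n"
    "1,TX,2015-01-02\n"
    "2,,2015-01-05\n"
    "3,tx,\n"
)
report = survey_csv(sample)
for column, stats in report.items():
    print(column, stats)
```

Even this tiny sample surfaces typical clean-up work: the `state` column has one missing value, and "TX" and "tx" count as two distinct values until the data is normalized.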


BIG DATA IS A TEAM SPORT
• Business Analyst
   ◦ Gather requirements, create use cases
• Data Engineer
   ◦ Design, build, maintain Big Data infrastructure
• Data Scientists
   ◦ Select algorithms, build, verify models
• Data Curators
   ◦ Acquire and preserve datasets
   ◦ Handle data governance and quality issues
• Data Visualizers
   ◦ Create data products from the information gleaned from the data


BIG DATA IS A PROGRAMMATIC APPROACH

Identify Business Goals → Create Use Cases → Big Data Analytics → Insights, Data Products → Implement into business processes → Measure and Evaluate

Why a Programmatic Approach?

Not every use case will produce desired results. Repeat until results achieved.


BIG DATA IS A PROGRAMMATIC APPROACH

Identify Business Goals → Create Use Cases → Data Analytics → Insights, Data Products → Implement into business processes → Measure and Evaluate

Most breakdowns occur


DATA ANALYTICS FAILS
• Failing to assemble a team
• Creating random data products and trying to feed those back into the business
• Poor Use Cases & business goals (yes, again!)
• Failing to implement recommendations
• Failing to integrate data products into business processes
   ◦ What are they supposed to do with these things?
• Failing to measure the impact on the business
   ◦ Can’t justify why you’re doing this
• Failing to continually implement/improve until results are achieved


QUESTIONS?