Essential Skills for Data Analytics in Healthcare
Susan Fenton, RHIA, PhD, UTHealth School of Biomedical Informatics
Goals/Objectives or Agenda
• Determine the essential skills for effective healthcare data analysis
• Articulate different data types and appropriate uses for each
• Compare and contrast data analytic types and tools
• Practice data analytics skills
Skills Needed
• Soft Skills
– Curiosity
– Critical Thinking
– Listening
• Technical Skills
– Understand Data
– Basic Stats
– Communication
Introduction to Analytics
• Definition
• Types of analytics
– Descriptive
– Diagnostic
– Predictive
– Prescriptive
What is Analytics?
“The discovery of meaningful patterns in data, and is one of the steps in the data life cycle of collection of raw data, preparation of information, analysis of patterns to synthesize knowledge, and action to produce value.”
NIST Big Data. (2015) Public Working Group Definitions and Taxonomies Subgroup. Retrieved from http://bigdatawg.nist.gov/_uploadfiles/NIST.SP.1500-1.pdf. http://dx.doi.org/10.6028/NIST.SP.1500-1
What is Analytics?
• Entire process of data collection, extraction, transformation, analysis, interpretation, and reporting
"Data visualization process v1" by Farcaster at English Wikipedia. Licensed under CC BY-SA 3.0 via Commons https://commons.wikimedia.org/wiki/File:Data_visualization_process_v1.png#/media/File:Data_visualization_process_v1.png
What is Analytics?
“Analytics is used to refer to the methods, their implementations in tools, and the results of the use of the tools as interpreted by the practitioner.”
NIST Big Data. (2015) Public Working Group Definitions and Taxonomies Subgroup. Retrieved from http://bigdatawg.nist.gov/_uploadfiles/NIST.SP.1500-1.pdf. http://dx.doi.org/10.6028/NIST.SP.1500-1
The analytics process is the synthesis of knowledge from information.
Types of Analytics: Overview
• Descriptive: uses business intelligence and data mining to ask: “What has happened?”
• Diagnostic: examines data to answer “Why did it happen?” Gartner. (n.d.) Gartner IT Glossary: Diagnostic Analytics. Retrieved 2/21/2016 from http://www.gartner.com/it-glossary/diagnostic-analytics.
• Predictive: uses statistical models and forecasts to ask: “What could happen?”
• Prescriptive: uses optimization and simulation to ask: “What should we do?” IBM Software. (2013). Descriptive, predictive, prescriptive: Transforming asset and facilities management with analytics. Retrieved 2/21/2016 from http://www-01.ibm.com/common/ssi/cgi-bin/ssialias?infotype=SA&subtype=WH&htmlfid=TIW14162USEN.
Types of Analytics: Overview
http://www.gartner.com/it-glossary/predictive-analytics
Descriptive Analytics
• Describe the data
• Common statistics:
– counts
– averages
• Typical reporting methods:
– Tables
– Pie charts
– Column/bar charts
– Written narratives
http://www.gartner.com/it-glossary/predictive-analytics
Diagnostic Analytics
• Attempts to answer “why did it happen?”
• Drill-down techniques
• Data discovery
• Correlations
http://www.gartner.com/it-glossary/predictive-analytics
Predictive Analytics
• Predicts instead of describing or classifying
• Rapid analysis
• Relevant insights
• Ease of use
http://www.gartner.com/it-glossary/predictive-analytics
What Predictive Analytics Cannot Do
• “The purpose of predictive analytics is NOT to tell you what will happen in the future. It cannot do that. In fact, no analytics can do that. Predictive analytics can only forecast what might happen in the future, because all predictive analytics are probabilistic in nature.”
– Michael Wu as quoted by Jeff Bertolucci in Big Data Analytics: Descriptive vs. Predictive vs. Prescriptive. InformationWeek. December 31, 2014, para 13. Retrieved from http://www.informationweek.com/big-data/big-data-analytics/big-data-analytics-descriptive-vs-predictive-vs-prescriptive/d/d-id/1113279.
Prescriptive Analytics
• Examines data or content to answer the question “What should be done?” or “What can we do to make _______ happen?”
• Is characterized by techniques such as
– graph analysis
– simulation
– complex event processing
– neural networks
– recommendation engines
– heuristics
– machine learning
http://www.gartner.com/it-glossary/prescriptive-analytics
Steps in Data Analytics
1. Identify the problem and the stakeholders
2. Identify what data are needed and where those data are located
3. Develop a plan for analysis and a plan for retrieval
4. Extract/transform/load the data
5. Check, clean, and prepare the data for analysis
6. Analyze and interpret the data
7. Visualize the data
8. Disseminate the new knowledge
9. Implement the knowledge into the organization
1. Identify the Problem or Question and the Stakeholders
• Why is this an important problem?
• How will the results impact patient care or the institution?
• What is the business case?
• Who are the stakeholders?
2. Identify what data are needed
• What data elements, such as date of birth, gender, medications, laboratory results, and so on are needed?
• Where are these data elements located – in what system or systems and what database tables?
• Is there a clinical data warehouse?
• Who is the contact person for each system who will be responsible for retrieving the data?
3. Develop plans for retrieval and analysis
Retrieval
• Enlist database administrator for each system
• Develop specific plan for retrieving the required data elements
• Method for cross-checking number of records as well as completeness – how many should you expect and did you get everything?
Analysis
• Enlist statistician
• Identify population, sample size, statistical tests to be performed
4. Extract/Transform/Load (ETL) Process
Extraction
• May be an iterative process
• The data are retrieved
• Checked for completeness
• Descriptive statistics
• Errors corrected, empty fields addressed
Transformation
• Data synchronized (“transformed”) – e.g. M, F, U vs 1, 2, 9
Loading
• Data then imported into destination system
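The transformation step can be illustrated with a minimal Python sketch. It assumes one source system stores gender as M, F, U and another as 1, 2, 9, as in the example above. The pairing of each number with a letter, the field names, and the sample rows are hypothetical and would need to be confirmed against each system's data dictionary.

```python
# Hypothetical harmonization of gender codes from two source systems.
TARGET_CODES = {"M", "F", "U"}

# Assumed mapping for the system that stores 1, 2, 9. The real meaning of
# each number must be confirmed in that system's data dictionary.
NUMERIC_TO_LETTER = {"1": "M", "2": "F", "9": "U"}


def transform_gender(raw_value):
    """Return the harmonized code, or None if the value is not recognized."""
    value = str(raw_value).strip().upper()
    if value in TARGET_CODES:
        return value
    return NUMERIC_TO_LETTER.get(value)


def transform_records(records):
    """Split records into transformed rows and rows needing manual review."""
    clean, rejected = [], []
    for record in records:
        code = transform_gender(record.get("gender"))
        if code is None:
            rejected.append(record)
        else:
            clean.append({**record, "gender": code})
    return clean, rejected


source_rows = [
    {"id": 1, "gender": "M"},
    {"id": 2, "gender": 2},
    {"id": 3, "gender": "9"},
    {"id": 4, "gender": "D"},  # not a valid code in either system
]
clean_rows, rejected_rows = transform_records(source_rows)
print("transformed:", clean_rows)
print("needs review:", rejected_rows)
```

Keeping unrecognized values in a separate list, rather than silently dropping them, leaves a record of what still has to be corrected before loading.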
5. Check, clean, and prepare the data
• Data are now in the system where analysis will be run
• Should be a complete set of data
• Need to check that everything is ready for analysis
• Descriptive statistics
• Double-check problem or question being investigated
• Double-check against analysis plan
6. Analyze and interpret the data
• Use the data analysis plan
• Perform the actual statistical analyses as described in the plan
• Consult with statistician to confirm interpretations and conclusions
7. Visualize the Data
• Nominal (categorical) data: column or bar charts, tables, pie charts, pivot tables
• Quantitative data: histograms, scatter plots, star plots
• Examples of tools
– Microsoft® Excel Chart function
– Tableau®
Pie chart: https://commons.wikimedia.org/wiki/File:Charts_SVG_Example_5_-_Simple_Pie_Chart.svg
Histogram: https://commons.wikimedia.org/wiki/Histogram#/media/File:Histogram_example.svg
8 & 9: Disseminating and Implementing
Disseminating the new knowledge
• Write up the findings
• Disseminate to the stakeholders
Implementing the new knowledge
• Requires participation of stakeholders
Data, Information, Knowledge, Wisdom Hierarchy
Data: symbols, facts, and measurements
Information: data processed to be useful; provides the “who, what, when, where”
Knowledge: application of data and information; provides the “how”
Wisdom: evaluated understanding; provides the “why”
[Figure: hierarchy levels from top to bottom: Wisdom, Knowledge, Information, Data]
Types of Data in an EHR
• Quantitative data (eg, laboratory values)
• Qualitative data (eg, text-based documents and demographics)
• Transactional data (eg, a record of medication delivery).
Murdoch, T. B., & Detsky, A. S. (2013). The inevitable application of big data to health care. JAMA, 309(13), 1351-1352.
Understanding the Data: Scales of Measure
• Data come in many forms, and those forms determine what can or cannot be done with the data.
• For example, two patient names cannot be added together.
• Likewise, interpreting the relative distance between two measurements can only be done with certain kinds of data and not others.
• There are four scales: Nominal, ordinal, interval, and ratio.
Scales of Measure: Nominal
• From the Latin nomen (“name”)
• Names, labels, categories
• Examples:
– Patient names (John Doe, Maria Garcia)
– Drug names (Ampicillin, Valium)
– Eye color (blue, brown, green, gray)
– Gender: male, female, unknown
– Religious preference (Catholic, Jewish, none)
• May be mapped to a number in a database
– Example: brown eyes = 1, blue eyes = 2
Eye image from https://commons.wikimedia.org/wiki/File:Deep_Blue_eye.jpg
Scales of Measure: Ordinal
• Includes all properties of Nominal (so Ordinal data all have a name of some sort)
• Example: first, second, third, i.e., a ranking or order
• But intervals are not necessarily equal
http://www.cdc.gov/growthcharts/
Photo by Paul Kehrer. https://www.flickr.com/photos/paulkehrer/3659279740 Creative Commons Attribution 2.0 Generic (CC BY 2.0) license.
Scales of Measure: Interval and Ratio
• Continuous
• Has equal intervals; Ratio also has absolute zero.
• Examples: distance, length, temperature, weight
• Includes properties of Nominal and Ordinal
• May be grouped together in one category called “scale”
https://commons.wikimedia.org/wiki/File:Soft_ruler.jpg
"Clinical thermometer 38.7" by Menchi - Own work. Licensed under CC BY-SA 3.0 via Commons -https://commons.wikimedia.org/wiki/File:Clinical_thermometer_38.7.JPG#/media/File:Clinical_thermometer_38.7.JPG
Check Understanding
• Zip Code
• Blood Pressure
• Heart Failure Classification I, II, III, IV
• Age
• Ethnicity
• Marital Status
• Length of Stay
• Discharge Disposition (home, SNF, and so on)
• Weight
• Level of Education
Data Inconsistencies
• Inconsistent naming conventions, such as “systolic blood pressure” versus “blood pressure, systolic”
• Inconsistent definitions, such as how the date of admission is defined across departments
• Varying field lengths for the same data element, such as one system allowing a patient’s last name to be up to 50 characters while another system allows 25 characters
• Varied data elements, such as M, F, or U for patient gender in one system while another system uses 1, 2, or 9 or Male, Female, or Unknown.
[AHIMA. "Managing a Data Dictionary." Journal of AHIMA 83, no. 1 (January 2012): 48-52. Retrieved from http://library.ahima.org/xpedio/groups/public/documents/ahima/bok1_049331.hcsp?dDocName=bok1_049331]
Data Dictionaries
• The first step to understanding the data you are working with
• “a standard definition of data elements”.
[Figure: synthetic data set]
Healthcare Information and Management Systems Society (HIMSS). (November, 2014). Clinical & Business Intelligence: An Analytics Executive Review Needs Assessment. Retrieved 2/21/16 from http://www.himss.org/ResourceLibrary/genResourceDetailPDF.aspx?ItemNumber=34692
Common Terms Used in Statistical Analysis
• Population
• Sample
• Paired samples
• Data set
• Descriptive statistics
• Frequency table
• Histogram
• Chi square
• T-Test
• Correlation vs. causation
Term: Population
• A group of things that have something in common
• Examples:
– Patients in a particular hospital
– Patients with a certain diagnosis
– Patients with a particular attribute (gender, smoking status, age group)
– Patients who had a certain surgical procedure in a given year by a specific surgeon
Term: Sample
A representative portion or subset of a group of things – part of a population
• Example population: babies born in the United States in 2015
• Example sample: a selection of those babies
• Paired samples: before-and-after studies, or matched on one or more characteristics
Image credit: Kernler, D. Simple Random Sampling. Retrieved from https://commons.wikimedia.org/wiki/File:Simple_random_sampling.PNG. Licensed under the Creative Commons Attribution-ShareAlike 4.0 International license.
Confidence Intervals
• How well does a sample approximate the entire population?
• Often set at 95%
• If many samples were drawn and an interval computed from each, the resulting intervals would bracket the true population parameter in approximately 95% of the cases
NIST/SEMATECH e-Handbook of Statistical Methods, http://www.itl.nist.gov/div898/handbook/prc/section1/prc14.htm
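As a rough illustration, a 95% confidence interval for a mean can be sketched in Python from three summary values. The numbers are the mean, standard deviation, and count reported for the Patient Weights example in this deck. The sketch assumes the normal (z) critical value of about 1.96. Excel's Descriptive Statistics tool uses the t distribution instead, so its reported margin (about 5.73) is slightly wider.

```python
import math
from statistics import NormalDist

# Summary values reported for the 500 patient weights in this deck.
mean_weight = 189.1554
std_dev = 65.2257697
count = 500

confidence = 0.95

# Standard error of the mean: sample standard deviation / sqrt(n).
standard_error = std_dev / math.sqrt(count)

# Assumption: normal approximation (z critical value, about 1.96 for 95%).
z_critical = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
margin = z_critical * standard_error

lower, upper = mean_weight - margin, mean_weight + margin
print(f"standard error: {standard_error:.4f}")
print(f"margin of error: {margin:.4f}")
print(f"95% CI for the mean: {lower:.1f} to {upper:.1f}")
```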
Data Set
A data set is a collection of data for a specific purpose. For this presentation, for example, the data set is a collection of 500 records that consists of age, gender, state of residence, marital status, blood type, weight, eye color, and smoking status.
Descriptive Statistics
• Basic overview of the data
• Excel: Data → Data Analysis → Descriptive Statistics
• Should be among the first analyses done on a set of data
• Can identify some errors
• Mean (average), number of records (count), range of values, maximum and minimum values
Patient Weights
Mean                       189.1554
Standard Error             2.916985099
Median                     180.6
Mode                       192.3
Standard Deviation         65.2257697
Sample Variance            4254.401033
Kurtosis                   8.86101958
Skewness                   2.554839369
Range                      475.6
Minimum                    89.4
Maximum                    565
Sum                        94577.7
Count                      500
Confidence Level (95.0%)   5.731086356
Measures of Central Tendency
• Mean – arithmetic average of an interval or ratio variable; very sensitive to outliers
• Median – midpoint of a frequency distribution, with 50% of the observations above and 50% of the observations below
• Mode – the most frequent observation(s) in a frequency distribution; may not be unique; can be used with nominal data
• Example data: 1, 1, 2, 3, 3, 3, 3, 3, 3, 4, 5, 5, 5, 6, 6, 7, 8, 9, 10
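The three measures can be checked on the example values above with Python's standard `statistics` module. The added outlier of 1000 is a hypothetical value, included only to show how much more the mean moves than the median.

```python
import statistics

values = [1, 1, 2, 3, 3, 3, 3, 3, 3, 4, 5, 5, 5, 6, 6, 7, 8, 9, 10]

mean_value = statistics.mean(values)      # arithmetic average
median_value = statistics.median(values)  # middle observation
modes = statistics.multimode(values)      # list, since the mode may not be unique

print(f"mean   = {mean_value:.2f}")
print(f"median = {median_value}")
print(f"mode   = {modes}")

# Hypothetical outlier: the mean shifts sharply, the median barely moves.
with_outlier = values + [1000]
mean_with_outlier = statistics.mean(with_outlier)
median_with_outlier = statistics.median(with_outlier)
print(f"with outlier: mean = {mean_with_outlier:.2f}, median = {median_with_outlier}")
```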
Measures of Variation
• Range – difference between the smallest and largest values in a frequency distribution; simple measure of spread; can also be affected by extreme values or outliers
• Variance – amount of variation of all values or scores for a variable; average of the squared deviations from the mean (variance not meaningful at descriptive level)
• Standard deviation – amount of dispersion around the mean; square root of the variance; most widely used measure of variability in descriptive statistics
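A short Python sketch of these measures, reusing the example values from the central tendency slide. It assumes the sample formulas (dividing by n - 1), which is what Excel labels "Sample Variance". The population variance, which divides by n, is shown for comparison.

```python
import math
import statistics

values = [1, 1, 2, 3, 3, 3, 3, 3, 3, 4, 5, 5, 5, 6, 6, 7, 8, 9, 10]

# Range: largest value minus smallest value.
value_range = max(values) - min(values)

# Sample variance and standard deviation (divide by n - 1).
sample_variance = statistics.variance(values)
sample_std_dev = statistics.stdev(values)

# Population variance (divide by n), for comparison.
population_variance = statistics.pvariance(values)

print(f"range               = {value_range}")
print(f"sample variance     = {sample_variance:.3f}")
print(f"standard deviation  = {sample_std_dev:.3f}")
print(f"population variance = {population_variance:.3f}")
```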
Inferential Statistics
• Inferring from sample to population and drawing conclusions
• Depend upon
– Data type
– Parametric vs. nonparametric data
• Chi-square – categorical data
• Correlation – continuous data – relationships
• ANOVA – continuous data – difference in mean
Correlation and Causation
• Correlation: relationship between two things
• Causation: one causes another
Correlation does not equal causation
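One common way to quantify a relationship between two continuous variables is the Pearson correlation coefficient. The sketch below computes it from first principles. The age and blood pressure values are hypothetical, and a coefficient near 1 says only that the two variables move together, not that one causes the other.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return covariation / math.sqrt(spread_x * spread_y)


# Hypothetical values, for illustration only.
ages = [30, 40, 50, 60, 70]
systolic_bp = [118, 121, 127, 133, 140]

r = pearson_r(ages, systolic_bp)
print(f"r = {r:.3f}")
```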
The Potential of Big Data in Healthcare
1. Expand capacity to generate new knowledge
– the effectiveness of treatments [Schneeweiss, 2014]
– the prediction of outcomes [Schneeweiss, 2014]
2. Knowledge dissemination
3. Using analytics to combine EHR and genomic data to translate personalized medicine to clinical practice
4. Deliver information directly to patients and increase patient participation in their healthcare
Murdoch, T. B., & Detsky, A. S. (2013). The inevitable application of big data to health care. JAMA, 309(13), 1351-1352.
Schneeweiss, S. (2014). Learning from big health care data. New England Journal of Medicine, 370(23), 2161-2163.
What is Big Data?
• Characteristics of big data:
– Volume (i.e., the size of the data set)
– Variety (i.e., data from multiple repositories, domains, or types)
– Velocity (i.e., rate of flow)
– Variability (i.e., the change in other characteristics)
– Value (i.e., is the cost worth it?)
• Traditional data architectures (such as typical relational databases) cannot handle this type of data
• New architectures are required
Source: NIST Special Publication 1500-1. NIST Big Data Interoperability Framework: Volume 1, Definitions. Final Version 1, page 4. NIST Big Data Public Working Group Definitions and Taxonomies Subgroup. Retrieved from http://bigdatawg.nist.gov/_uploadfiles/NIST.SP.1500-1.pdf http://dx.doi.org/10.6028/NIST.SP.1500-1
Tools
• Hadoop
– Runs on clusters of hardware
• MongoDB
– Stores data using documents with fields
• NoSQL utilities
Requirements For Analytics for Learning Systems
• A way to ensure that patient groups being compared are truly similar
• Automated tools for analysis
• Ability to rapidly run automated tools against new data
• Software that can be used with little training and helps prevent errors in interpretation
• Easily understood results
Schneeweiss, S. (2014). Learning from big health care data. New England Journal of Medicine, 370(23), 2161-2163.
Challenges Facing Biomedical Big Data
• Amount of information
• Lack of organization
• Lack of access to data and tools
• Insufficient training in data science methods
National Institutes of Health. What is Big Data? Retrieved from http://datascience.nih.gov/bd2k/about/what, para 3.
Summary
• Types of data.
• Technology or tools for working with different data types.
• Determine whether data fits the definition of Big Data.
• Challenges faced when working with Big Data.
• Common terms used in data analysis, such as sample, paired, histogram, population, correlation vs. causation, and descriptive.
COMMUNICATING AND DATA ANALYTICS
Visualization Objectives
1. Select the best data communication mode, given the analysis goals and results.
2. Interpret data analysis results.
3. Present solutions for a variety of technical data communication challenges.
4. Prepare a simple data visualization using open-source tools.
5. Participate in the design and development of a complex data visualization.
Communication Basics
• Delineate the Problem
• Define Your Audience
• Choose the Right Mode
• Choose the Right Words
• Choose Supporting Visuals
• Have an “Elevator” Speech
http://www.aaas.org/pes/communication-101-communication-basics-scientists-and-engineers
Delineate the Problem
• What are you trying to do with this communication?
• Is there a particular timeframe involved?
• Does the number of people matter?
Define Your Audience
• Colleagues or co-workers
• Staff or supervisors
• Subject matter experts
• Other scientists
• Journalists
• Policymakers
• Others
Choose the Right Mode
• Brainstorm the possible communication channels
– Email
– Website or blog
– Podcast or YouTube
– Peer-reviewed manuscript
– Conference presentation
– Dashboard
Choose the Right Words
• For the audience
• Be careful of acronyms (ONC, EHR or MU, as examples)
• Jargon can be hard to follow
• Short words and short sentences
• Not too many!
Data Visualization
• show the data.
• induce the viewer to think about the substance rather than the methodology, graphic design, the technology, or other things.
• avoid distorting what the data have to say.
• present many numbers in a small space.
• make large data sets coherent.
• encourage the eye to compare different pieces of data.
• reveal the data at several levels of detail.
• serve a reasonably clear purpose.
• are closely integrated with the statistical and verbal descriptions of the data set. (Tufte)
Bar Charts
• Show comparisons between groups
• Can be vertical (aka column charts) or horizontal
• Can be a histogram (next slide)
• Can be Pareto (most to least)
Histogram
This sample is from fictional data.
Stacked Bar Graph
[Figure: Student Grade Distribution, a stacked bar graph showing the share of grades A, B, C, D, and F (0% to 100%) for Yr 1 through Yr 4]
This graph is total fiction. It does not represent actual grade distribution.
Line Charts
• Used for large amounts of data occurring over time
Pie Charts (or Shape Charts)
• Display data as a proportion of a whole
• No axes
• Can explode out for emphasis
• Can be any shape
Polar or Radar Charts
• Multiple series or categories of data
• Larger values are farther from the center
Scatter plots or scatter charts
• Values represented as a series of points on a chart
• Distributions of values and clusters of data
• Displaying and comparing numerical data
Stanfill, M. H., Williams, M., Fenton, S. H., Jenders, R. A., & Hersh, W. R. (2010). A systematic literature review of automated clinical coding and classification systems. Journal of the American Medical Informatics Association: JAMIA, 17(6), 646–651. http://doi.org/10.1136/jamia.2009.001024
Display in Action!
• www.texmed.org/WorkArea/DownloadAsset.aspx?id=24815
The “Elevator” Speech
• Can you tell the story in the time it takes to ride the elevator?
• Three main points
• Meaningful
• Easy to understand
Summary
• Effective data communication requires thought and planning
• All communication is not equal for all audiences
• The visual presentation is particularly important, but can be more difficult
• Have an “elevator” speech
WORKING WITH DATA
Prepare!
• https://uth.instructure.com/courses/27078/pages/16-activity?module_item_id=269525
• Download each data set.
Exercise Objectives
• Describe reasons why data need to be cleaned or modified before analysis
• Demonstrate ability to identify and correct basic errors in data
• Demonstrate ability to perform descriptive statistics
• Demonstrate ability to use pivot tables
• Describe the relationship between a database in an HIT system and data analysis tools
Technologies and Tools
• Common technologies and tools used for data analytics include:
– Spreadsheet programs such as Microsoft Excel®
– Statistical programs such as R, SAS, SPSS, and Stata
– Database management systems such as MySQL and Microsoft SQL Server® - can perform some basic analysis
– Business intelligence applications such as Tableau®, QlikView®, IBM Cognos
Install the Excel Analysis ToolPak
• You must already have Microsoft Office with Excel on your computer
• Click the File tab, then click Options.
• Click Add-Ins, and then in the Manage box, select Excel Add-ins.
• Click Go.
• In the Add-Ins available box, select the Analysis ToolPak check box, and then click OK.
• After you load the Analysis ToolPak, the Data Analysis command is available in the Analysis group on the Data tab.
https://support.office.com/en-us/article/Load-the-Analysis-ToolPak-305c260e-224f-4739-9777-2d86f1a5bd89
Cleaning Data
• Identify errors
– Descriptive statistics
– Categorical data
– Use of pivot tables
• Determine correct values or infer/impute
• If uncorrectable, delete the record
• Work with a copy of your data set and log all changes!
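The advice to work on a copy and log every change can be sketched in a few lines of Python. The field names, sample records, and the rule that a weight must be a positive number are assumptions made for illustration. A real project would take its validity rules from the data dictionary.

```python
import copy


def clean_weights(records):
    """Return a cleaned copy of the records plus a log of every change."""
    cleaned, change_log = [], []
    for record in copy.deepcopy(records):  # never modify the original data
        original = record["weight"]
        try:
            weight = float(original)
        except (TypeError, ValueError):
            change_log.append(f"id {record['id']}: deleted, weight {original!r} is not a number")
            continue
        if weight <= 0:
            change_log.append(f"id {record['id']}: deleted, weight {original!r} is not positive")
            continue
        if not isinstance(original, float):
            change_log.append(f"id {record['id']}: corrected weight {original!r} to {weight}")
        record["weight"] = weight
        cleaned.append(record)
    return cleaned, change_log


# Hypothetical raw records.
raw_records = [
    {"id": 1, "weight": 180.6},
    {"id": 2, "weight": "192.3 "},  # correctable: stored as text
    {"id": 3, "weight": "n/a"},     # uncorrectable
    {"id": 4, "weight": -5.0},      # uncorrectable
]
cleaned_records, log = clean_weights(raw_records)
for entry in log:
    print(entry)
```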
Data Cleaning – Continuous Data Descriptive Statistics
To generate descriptive statistics in Excel: Data → Data Analysis → Descriptive Statistics
Patient Weights
Mean                       189.1554
Standard Error             2.916985099
Median                     180.6
Mode                       192.3
Standard Deviation         65.2257697
Sample Variance            4254.401033
Kurtosis                   8.86101958
Skewness                   2.554839369
Range                      475.6
Minimum                    89.4
Maximum                    565
Sum                        94577.7
Count                      500
Confidence Level (95.0%)   5.731086356
Data Cleaning – Categorical Data
A     B
1     F
2     U
3     M
4     M
5     F
6     D
7     M
8     M
9     F
10    M
COUNTIF function
=COUNTIF(range, criteria)
=COUNTIF($B$1:$B$10, "M") - will give 5
=COUNTIF($B$1:$B$10, "F") - will give 3
=COUNTIF($B$1:$B$10, "U") - will give 1
• Can identify some errors
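The same counts can be reproduced outside Excel. This Python sketch uses `collections.Counter` on the ten values from column B above and assumes M, F, and U are the only valid codes, so anything else is flagged for review.

```python
from collections import Counter

# The ten values from column B in the example above.
gender_column = ["F", "U", "M", "M", "F", "D", "M", "M", "F", "M"]

# Assumed set of valid codes; confirm against the data dictionary.
EXPECTED_CODES = {"M", "F", "U"}

counts = Counter(gender_column)
unexpected = {code: n for code, n in counts.items() if code not in EXPECTED_CODES}

for code in sorted(counts):
    print(code, counts[code])
print("values needing review:", unexpected)
```

Counting every distinct value, rather than only the expected ones, is what surfaces an entry that no COUNTIF formula was written for.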
Filtering Records
• Displays only those records that meet certain criteria
• Click a cell in the column to be filtered
• On the Data tab, click the Filter icon
[Figure: unfiltered and filtered views of the data]
Filtering Records, continued
• Dialog box displays all the values present in the column
• Can check only values you are interested in – Excel will display only those records
Column Graph
• Column graph shows individual weights
• But doesn’t show us how many patients are in a particular weight category
Frequencies and Histograms
• Frequency: “How many of X and Y are there?”
• A frequency calculation gives how many times a particular value occurs
• Can be shown as:
– Frequency table
– Histogram: a graph of the number of times values occur in a set of data
Example Frequency Table and Histogram
Frequency Table
Bin     Frequency
100     4
149     107
199     222
249     136
299     4
349     6
399     5
449     7
499     7
1000    1
More    0
Bin column: categories or “bins”. Frequency column: how many records fell into that category (or “bin”).
[Figure: Histogram of Frequency by bin (100, 149, 199, 249, 299, 349, 399, 449, 499, 1000, More), vertical axis 0 to 250]
Example
How many patients are in each of the following weight categories (in pounds)?
• < 100
• 100-149
• 150-199
• 200-249
• 250-299
• 300-349
• 350-399
• 400-499
• 500-1000
• 1000+
Set up the category bins
• Add a column to your Excel spreadsheet with the bins that you want to use to categorize the patient weights
Creating a Frequency Table and Histogram
• In Microsoft Excel: Click Data, then Data Analysis, then choose Histogram
• In the Input Range field, enter the range of cells that contain the weights
• In the Bin Range field, enter the range of cells that contain the category bins that you created
• Click Chart Output
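For readers working outside Excel, the binning step can be sketched in Python. The sketch assumes each bin value is an upper limit (a weight is counted in the first bin whose limit it does not exceed, with a final "More" bin for anything larger), which mirrors how the Histogram tool's output is laid out. The weights listed are hypothetical.

```python
from bisect import bisect_left


def frequency_table(values, bin_limits):
    """Count values per bin, treating each bin limit as an upper bound."""
    limits = sorted(bin_limits)
    counts = [0] * (len(limits) + 1)  # the extra slot is the "More" bin
    for value in values:
        counts[bisect_left(limits, value)] += 1
    labels = [str(limit) for limit in limits] + ["More"]
    return list(zip(labels, counts))


bins = [100, 149, 199, 249, 299, 349, 399, 449, 499, 1000]

# Hypothetical patient weights in pounds.
weights = [89.4, 120.0, 149.0, 150.5, 180.6, 192.3, 192.3, 250.0, 565.0]

table = frequency_table(weights, bins)
for label, count in table:
    print(f"{label:>5} {count:>3} {'#' * count}")  # simple text histogram
```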
Frequency Table and Histogram Output
Using the Excel Data Analysis ToolPak Frequency function for 500 records
Frequency Table
Bin     Frequency
100     4
149     107
199     222
249     136
299     4
349     6
399     5
449     7
499     7
1000    1
More    0
Bin column: categories or “bins”. Frequency column: how many records fell into that category (or “bin”).
[Figure: Histogram of Frequency by bin (100, 149, 199, 249, 299, 349, 399, 449, 499, 1000, More), vertical axis 0 to 250]
Sorted Histogram (Pareto)
Pivot Tables
• Pivot tables are an Excel tool that lets you summarize, analyze, and create different views of your data. You can arrange how the data is displayed.
• Pivot tables are very useful for identifying trends or relationships among data in large data sets.
• Use the laboratory exercise on pivot tables to explore data on hospital-acquired infections
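The idea behind a pivot table, summarizing records by two fields at once, can be sketched in Python. The units, infection types, and records below are hypothetical and are not taken from the laboratory exercise.

```python
from collections import defaultdict


def pivot_count(rows, row_field, column_field):
    """Count records for each combination of row_field and column_field."""
    table = defaultdict(lambda: defaultdict(int))
    for row in rows:
        table[row[row_field]][row[column_field]] += 1
    return {key: dict(columns) for key, columns in table.items()}


# Hypothetical hospital-acquired infection records.
infections = [
    {"unit": "ICU", "infection": "CLABSI"},
    {"unit": "ICU", "infection": "CLABSI"},
    {"unit": "ICU", "infection": "CAUTI"},
    {"unit": "Surgery", "infection": "SSI"},
    {"unit": "Surgery", "infection": "CAUTI"},
]

pivot = pivot_count(infections, "unit", "infection")
for unit, counts in pivot.items():
    print(unit, counts)
```

Swapping the two field names in the call produces the other view of the same data, which is the "pivot" in a pivot table.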
Example Pivot Table
[Figure: source data ("From this") and the resulting pivot table ("To this")]
Chi-Square Test
• Are two categorical variables related?
• Categorical variable examples:
– Gender
– Ethnicity
– Age group (e.g. 40-49, 50-59)
– Disease stage (I, II, III, IV)
– Presence or absence of a disease
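A minimal Python sketch of the chi-square statistic for a table of observed counts follows. The 2 x 2 counts are hypothetical, and the comparison value 3.841 is the standard critical value for 1 degree of freedom at the 0.05 level. In practice a statistical package would also report the p-value, and the interpretation should be confirmed with a statistician.

```python
def chi_square_statistic(observed):
    """Pearson chi-square statistic for a table of observed counts."""
    row_totals = [sum(row) for row in observed]
    column_totals = [sum(column) for column in zip(*observed)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            # Expected count if the two variables were unrelated.
            expected = row_totals[i] * column_totals[j] / grand_total
            statistic += (count - expected) ** 2 / expected
    return statistic


# Hypothetical counts. Rows: smoker, non-smoker.
# Columns: disease present, disease absent.
observed = [[10, 20], [20, 10]]

chi2 = chi_square_statistic(observed)
degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)
CRITICAL_VALUE_DF1_ALPHA_05 = 3.841

print(f"chi-square = {chi2:.3f}, df = {degrees_of_freedom}")
print("related at the 0.05 level:", chi2 > CRITICAL_VALUE_DF1_ALPHA_05)
```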
Where to Get Skills
• TxHIMA/AHIMA
• Local Colleges
• MOOCs
• Coursera
• MIT OpenCourseWare
Jobs Using These Skills
• Health Care Data Analyst
• Operations Data Analyst
• Revenue Analyst
• Quality Improvement Analyst
• Data Integration Associate
• And So On!!!!
Conclusion
• It’s all about the data!!!!
Question/Answer
• THANK YOU FOR YOUR ATTENTION!
Bibliography
• Health and Medicine Division. (n.d.). Retrieved April 28, 2016, from http://www.nationalacademies.org/hmd/Activities/Quality/LearningHealthCare.aspx
• IBM (2013). Descriptive, predictive, prescriptive: Transforming asset and facilities management with analytics. Retrieved from http://www-01.ibm.com/common/ssi/cgi-bin/ssialias?infotype=SA&subtype=WH&htmlfid=TIW14162USEN.
• Managing a Data Dictionary. (2012). Journal Of AHIMA, 83(1), 48-52. Retrieved from http://library.ahima.org/doc?oid=105176#.VyeKJoQrJaQ
• Murdoch, T. & Detsky, A. (2013). The Inevitable Application of Big Data to Health Care. JAMA, 309(13), 1351. http://dx.doi.org/10.1001/jama.2013.393
• National Institute of Standards and Technology. (2015). NIST Big Data Interoperability Framework: Volume 1, Definitions. Retrieved from http://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.1500-1.pdf
• NIST/SEMATECH e-Handbook of Statistical Methods. (n.d.). Retrieved May 02, 2016, from http://www.itl.nist.gov/div898/handbook/
• Overview - Sepsis - Mayo Clinic. (2016). Mayoclinic.org. Retrieved 2 May 2016, from http://www.mayoclinic.org/diseases-conditions/sepsis/home/ovc-20169784
• Ideal Graphs… Tufte, E. R. (2001). The Visual Display of Quantitative Information (2nd edition). Cheshire, Conn: Graphics Pr.
• Schneeweiss, S. (2014). Learning from big health care data. New England Journal of Medicine, 370(23), 2161-2163.
• Shapira, G. (2016). The Seven Key Steps of Data Analysis. Oracle.com. Retrieved 28 April 2016, from http://www.oracle.com/us/corporate/profit/big-ideas/052313-gshapira-1951392.html
• Six Steps Of An Analytics Project - Quality Assurance and Project Management. (2015). Quality Assurance and Project Management. Retrieved 2 May 2016, from http://itknowledgeexchange.techtarget.com/quality-assurance/six-steps-of-an-analytics-project/
• What is Hadoop?. (2016). Sas.com. Retrieved 2 May 2016, from http://www.sas.com/en_my/insights/big-data/hadoop.html
• What is Big Data? | Data Science at NIH. (2015). Datascience.nih.gov. Retrieved 2 May 2016, from http://datascience.nih.gov/bd2k/about/what
Charts, Tables and Figures
• 1.1 Figure: Smith, K. (2016). Clinical Data Warehouse. Used with permission from Kimberly Smith.
• 1.2-1.6 Figures: Definition, P. (2012). Big Data Analytics - Predictive Analytics - Gartner Glossary. Gartner IT Glossary. Retrieved 28 April 2016, from http://www.gartner.com/it-glossary/predictive-analytics
Images
• Slide 9: Farcaster. (2014). Data visualization process v1 [Online Image]. Retrieved April 28, 2016 from https://commons.wikimedia.org/wiki/File:Data_visualization_process_v1.png#/media/File:Data_visualization_process_v1.png
• Slide 25: Innesw. (2014). Simple pie chart [Online Image]. Retrieved May 2, 2016 from https://commons.wikimedia.org/wiki/File:Charts_SVG_Example_5_-_Simple_Pie_Chart.svg