Upload
others
View
6
Download
0
Embed Size (px)
Citation preview
DataMiningforAvia.onSafety
NikunjC.Oza,[email protected]
hBp://..arc.nasa.gov/people/oza
AmericanSta.s.calAssocia.on,SanFranciscoBayAreaChapter,October21,2010
Outline
• Thebreadth/depthoftheproblems• Combiningdiscreteandcon.nuoussequences:Mul.pleKernelAnomalyDetec.on(MKAD)Derivedfromanomalydetec.onmethodsondiscreteandcon.nuoussequences.
• TextMining:classifica.on,topicmodeling
• Ongoing,futurework
AccidentData
Opera.onalData(DiscreteandCon.nuousData)
Opera.onalSurveys(Text)
AnecdotalReports
Projec.ons
Forensics
WhatHappened?
DiscoveringCausalFactors
WhydiditHappen?
SUBJEC
TIVE
‐‐datacon.
nuum
‐‐‐OBJEC
TIVE
Predic.ons
WhatwillHappenNext?
Avia.onSafetyMapping
FromIrvingStatler,Avia.onSafetyMonitoringandModelingProject
DataArisesfromMoleculartoGlobalScales
VehicleTechnology Opera.ons
Molecules
MaterialsSensors So/ware Engines Aircra/
10‐2
100 101 102 10310‐1
10‐3
104 105 10610‐4
10‐5
10‐6
DataMininginSupportofGlobalOpera.ons
105
DatasharingJustculture/safetyculture
10‐2
100 101 102 10310‐1
10‐3
104 105 10610‐4
10‐5
10‐6
• 6
DataMining• IVHM/SSATprojectgoalsrequireleveragingsubstan.aldata.– Aircrab‐produced:Sensordata,flight‐relateddata(e.g.,origin,des.na.on),coveringmanyflightsovermanyyears.
– Other:Safetyreports,trajectories• Transformdataintousefulknowledge
– ToolsforDetec.on,Diagnosis,Prognosis,Mi.ga.on
– Levelsrangingfromflight‐leveltona.onalairspacelevel
Outline
• Thebreadth/depthoftheproblems• Combiningdiscreteandcon.nuoussequences:Mul.pleKernelAnomalyDetec.on(MKAD)Derivedfromanomalydetec.onmethodsondiscreteandcon.nuoussequences.
• TextMining:classifica.on,topicmodeling
• Ongoing,futurework
• Needtomodelthebehaviorofdiscretesensorsandswitchesinanaircrabduringflight.
• Focusisonprimarysensorsthatrecordpilotac.ons.
• Theaimistodiscoveratypicalbehaviorthathaspossibleopera.onalsignificance.
• WedevelopedsequenceMiner:Eachflightisanalyzedasasequenceofevents,accoun.ngfor
• orderinwhichswitcheschangevalues• frequencyofoccurrenceofswitches
• TwoTasks:
• Givenagroupofflights,findflightsthatareanomalous.
• Givenananomalousflight,describetheanomaliesandthedegreeofanomalousness.
• Methodbasedontechniquesusedinbioinforma.cs.
C=
O=
S1= S2=
S3=
– Sequences Si in a cluster is dependent on a prototype C
– The outlier O is dependent on the Si
– Therefore:
– This forms the basis of our objective function F that allows us to describe why sequences are anomalous.
P(O|C)=ΣiP(O|Si)xP(Si|C)
A B _ _ E _ A B C _ E F A B C _ _ F A _ C D E F
Missinga“B” Hasextra“D”
{ClusterMembers
OutlierSequence}
Alignment
• P(O|Si) is proportional to the normalized length of the longest common subsequence of O and Si • Discovery of missing and extra symbols is done by aligning the
sequences • Missing and extra symbols are those whose addition to or deletion from an
outlier sequence O would make O more normal with respect to the objective function F
SequenceMiner
NormalizedLongestCommonSubsequence(NLCS)
wherethefunc.onsandcalculatethelongestcommonsubsequenceandlengthofthesequences.€
L h si,s j( )( )L si( ) × L s j( )
SME opinion: Possible evidence of mode confusion.
MKADforFleetwideanalysis
How to integrate all information in a concise and intuitive manner?
Compression, Feature extraction,
Fusion, Anomaly detection
Sequences D and continuous data streams C interactions
MKADFramework
Kernelonmeasurements
Preprocessingdiscretedata
Preprocessingcon.nuousdata
Model
SequenceKernelonswitching
SAXrepresenta.on
Compression Fusion Detection
0 20 40 60 80 100 120
CC
baabccbc
Firstconvertthe.meseriestoPiecewiseAggregateApproxima.on(PAA)representa.on,thenconverttosymbols
Ittakeslinear.me
SlidecourtesyofEamonnKeoghandJessicaLin
Op.miza.onproblem
Discretekernel Con.nuouskernelIntheobjec.vefunc.on,eachentryofthediscretekernelandthecon.nuouskernelrepresentsthescoreobtainedusinglongestcommonsubsequence(LCS)ofdiscretesignalsandSAXifiedcon.nuoussignals,respec.vely.
Unique Linear Decision Boundary
PairwiseSimilarityMeasure
Kernelondiscrete:NormalizedLongestCommonSubsequence(NLCS)
Kerneloncon.nuous:Inverselypropor.onaltodistancebetweenSAXrepresenta.onsofsequences
GeneralCase,Mul.pleKernels
€
minQ =12
α iα j βλKλ xi,x j( )λ
∑
i, j∑
Subjectto:
OneclassSVMstrainingalgorithmsrequiresolvingthequadra.cproblem
Dual form
Linear equality constraint
Bounds on design variables
Control parameter
:Lagrangemul.pliersoftheprimalQPproblem
€
βλλ
∑ =1Also:
Anomalyscores
Datapoints with will be the support vectors
Magnitude of h: degree of normality/anomalousness
Sign of h: if negative – outlier if positive - normal
Indicator
Decisionboundaryisdeterminedonlybymarginandnon‐marginsupportvectorsobtainedbysolvingtheQPproblem
Synthe.cExperimentSimulation data
Type 1 – (Missing event) Flaps were not extended to normal full deployment at landing.
Type 2 - (Extra event) Landing gear was retracted after being deployed on final approach.
Type 3 – (Out of order event) Gear deployed before initial flaps below flaps limit.
Type 4 – (Continuous anomaly) High bank angles or rate of descent below 1,000 ft.
Casestudy:FOQAanomalydetec.on
• Thetradi.onmethodscannotdetectandmonitortheseanomalousac.vi.esthatmayhaveoccurredsimultaneouslyandareheterogeneousinnature.
Touchdownpoint
MKADSummary
1. Support flights safety experts 2. Schedule maintenance
Application
.… anomaly detection on multivariate mixed attributes where discrete may influences the system dynamics which is reflected on the continuous data streams.
Performs
.. High detection rate on most operationally significant anomalies in fleet wide analysis on large datasets
.. Discover some “unknown unknowns”
Highlights
Outline
• Combiningdiscreteandcon.nuoussequences:Mul.pleKernelAnomalyDetec.on(MKAD)– Derivedfromanomalydetec.onmethodsondiscreteandcon.nuoussequences.
• TextMining:classifica.on,topicmodeling
• Ongoing,futurework
Avia.onSafetyTextReports
• Pilotsareencouragedtoreportincidents,concerns,orunsafecondi.ons.Tworepositories:– TheNASA‐FAAAvia.onSafetyRepor.ngSystem(ASRS).
– EachindividualairlinehasanAvia.onSafetyAc.onProgram(ASAP).
• Reportscanbeanalyzedtoimproveavia.onsafety.
• Reportsneedtobecorrectlyandconsistentlycategorizedbyeventtypetodeterminedangeroussitua.onsandtracktrendsofincidentsandevents.
• Eventtypesarenotenoughtocaptureallanomalies.Topiciden.fica.onneededtoiden.fyneweventtypes,combina.onsofeventtypes.
ManuallyClassifyingReports• Reviewandanalysisofreportsislaborintensive.
– InASRS,reviewershaveclassifiedabout135,000ASRSreportsof715,000totalreportssubmiBedinto60overlappinganomalycategories.
– Thelengthofthereportsrangefrom0.5‐4pageslong.
– ASRSreportsaccrueatabout3,000permonth.
– Classifica.onsnotalwaysconsistentduetomul.plereviewers,changingexperiences.
• Historicalreportsmustberereadwhennewcategoriesareaddedorchanged.
• Currentsystemsrelyheavilyonhumanmemoriesforhistoricalperspec.ves.
ASRSReportExcerpt
JUST PRIOR TO TOUCHDOWN, LAX TWR TOLD US TO GO AROUND BECAUSE OF THE ACFT IN FRONT OF US. BOTH THE COPLT AND I, HOWEVER, UNDERSTOOD TWR TO SAY, 'CLRED TO LAND, ACFT ON THE RWY.' SINCE THE ACFT IN FRONT OF US WAS CLR OF THE RWY AND WE BOTH MISUNDERSTOOD TWR'S RADIO CALL AND CONSIDERED IT AN ADVISORY, WE LANDED...
Note:Industryspecificvocabularyandabbrevia.ons.
Automa.cCategoriza.onofASRSReports
Non Adherence to ATC Clearance
Critical Equipment Problem
Runway Incursion
Landing without a Clearance
Air Space Violation
Altitude Deviation Overshoot
Fumes
Altitude Deviation Undershoot
Ground Encounter, Less Severe
...
JUST PRIOR TO TOUCHDOWN, LAX TWR TOLD US TO GO AROUND BECAUSE OF THE ACFT IN FRONT OF US. BOTH THE COPLT AND I, HOWEVER, UNDERSTOOD TWR TO SAY, 'CLRED TO LAND, ACFT ON THE RWY.' SINCE THE ACFT IN FRONT OF US WAS CLR OF THE RWY AND WE BOTH MISUNDERSTOOD TWR'S RADIO CALL AND CONSIDERED IT AN ADVISORY, WE LANDED…
ASRS Report ExcerptSample of 60 ASRS Anomaly Categories
Asinglereportcanbeinmul.plecategories.
MarianaSta.s.calModelOp.miza.onFlowchart
Start at Co, γo, wo
Train SVM; Build model
Test Model w/ Validation
Data
Compare to ‘best’ AUROC
Update ‘best’ AUROC;
Save parameters
Move to new C, γ, w
Calculate AUROC
Apply statistical optimization
method
Yes
Save Model; Quit
ConvergedtoaSolu.on?
BeBerthanPrevious‘best’?
Yes
NoNo
AUROC‐areaunderreceiveroperatorcurve
Mariana w/ raw text only
Support Vector Machine w/NLP
Decision Tree w/NLP
Random Forest w/ NLP
Logistic Regression w/NLP
Linear Discriminant Analysis w/NLP
Marianavs.MethodsusingNaturalLanguageProcessing
TheMarianaalgorithmcangivesubstan.allybeBerresultsoverothermethods(greenbox),evenwhenprocessingrawtext.
Areaun
derRO
CCu
rve
Categories*
0.8
1.0
0.6
0.4
0.2
0.0
Outline
• Combiningdiscreteandcon.nuoussequences:Mul.pleKernelAnomalyDetec.on(MKAD)– Derivedfromanomalydetec.onmethodsondiscreteandcon.nuoussequences.
• TextMining:classifica.on,topicmodeling
• Ongoing,futurework
SubspaceApproxima.on
• Goals– Systemobenassumedexplainablebysimplemodelplusnoise.Construc.onofsimpleapproxima.onmayreducenoise.
– Derivesmall,keysetoffeaturestoexplainsystembehavior.
– Lowerstoragerequirements.
• Issue– Derivedfeaturesmoreintui.veiftheyareposi.ve,reflec.ngpresenceofkeycomponentsthatadduptocharacterizetheitemofinterest.
Non‐nega.veMatrixFactoriza.on
• NMFfindsfactorsrepresen.ngpartsoffaces(e.g.,nose,mouth).Easytointerpret.
• VectorQuan.za.onandPCAfindholis.crepresenta.ons(face+/‐otherstuff).Difficulttointerpret.
Non‐nega.veMatrixFactoriza.on
Problem:Givenanonnega.vematrixandaposi.veintegerfindnonnega.vematricestominimizethefunc.on
€
A ∈ Rmxn
€
k <min{m,n}
WHisanonnega.vematrixfactoriza.on.ColumnsofWrepresentkbasisvectorscapturedfromnexampleshavingmfeaturevalueseach.Hgivesfactor‐exampleweights.kisproblem‐specificandchosenbyuser.
NMFforclassifica.on
• NMFfactorscanbeusedforclustering,classifica.on,regression.
• Classifica.onofASAPreportsintooneormorecontribu.ngfactors.
Sample(ASRS)BasisVectorsBasisVector1 BasisVector2
Run 1 2 3 1 2 3
ASRSClassifica.onResults
Outline
• Combiningdiscreteandcon.nuoussequences:Mul.pleKernelAnomalyDetec.on(MKAD)– Derivedfromanomalydetec.onmethodsondiscreteandcon.nuoussequences.
• TextMining:classifica.on,topicmodeling
• Ongoing,futurework
Ongoing,FutureWork
• Anomalydetec.onoverdiscrete,con.nuous,andtext(u.lizewhenavailable,don’tpenalizewhennot).
• Anomalycause/precursoriden.fica.on.
• Predic.onovermul.plescales:withinflight,acrossflights,acrossfleetsoveryears.
HowDoesHumanPerformanceAffectAvia.onSafety?
PhysiologicalMeasurements* Sample Text Report JUST PRIOR TO TOUCHDOWN, LAX TWR TOLD US TO GO AROUND BECAUSE OF THE ACFT IN FRONT OF US. ...
NASADataMiningLab(MountainView,CA)
Luton,UK
ToNASADataMiningTeamDailydata300GBflightspermonthPhysiology,text,cockpit,enginesandflightparameters,flightpath,networkinforma.on.
ToEasyJetNearreal‐.medecisionsupport
Understandingpilotfa.gue
Mul.pleKernelAnomalyDetec.on(KDD2010)
*Diagramforno.onalpurposesonly
MiningHeterogeneousDataistheKey
K(xi,xj)
Discrete
Textual
Networks
Con.nuous
PrimarySource:aircrabCananswerwhathappenedinanaircrabduringanAvia.onSafetyIncident
PrimarySource:Humans.CananswerwhyanAvia.onSafetyIncidenthappened
PrimarySource:Radardata.CananswerwhathappenedintheNa.onalAirspaceduringAvia.onSafetyIncident
Sample Text Report JUST PRIOR TO TOUCHDOWN, LAX TWR TOLD US TO GO AROUND BECAUSE OF THE ACFT IN FRONT OF US. ...
MassiveData:Archivesgrowingat100Kflightspermonth.
TheAnatomyofanAvia.onSafetyIncident
Time
. . . . . .
State k-1 State k State k+1
Transition k-1
State n+1State n-1 State n
Transitionk
Transitionk+1
Transitionn-2
Transitionn-1
Transitionn
Compromised States Safe States or AccidentsSafe
Incident
AnomalousStates
Scenario=(ContextBehaviorOutcome)
FromIrvingStatler,Avia.onSafetyMonitoringandModelingProject
DASHlinkdisseminate.collaborate.innovate.hBps://dashlink.arc.nasa.gov/
DASHlinkisacollabora.vewebsitedesignedtopromote:• Sustainability• Reproducibility• Dissemina.on
• Communitybuilding
Userscancreateprofiles
• Sharepapers,uploadanddownloadopensourcealgorithms• FindNASAdatasets.
ComingSoon…DASHlink2.0.
JoinDASHlink!
ConferenceonIntelligentDataUnderstanding‐2010
• Conferencefocusedontheoryandapplica.onsofdataminingandmachinelearningtoEarthScience,SpaceScience,EngineeringSystems
• Loca.on:ComputerHistoryMuseum,MountainView,CA
• Date:October5‐6,2010• Registra.on:Free✔
• SteeringCommiBee• AshokSrivastava(chair)• StephenBoyd• JiaweiHan• EamonnKeogh• VipinKumar• ZoranObradovic• NikunjOza• RaghuRamakrishnan• RamasamyUthurusamy• RamasubbuVenkatesh• XindongWu
ProgramChairs:NiteshChawlaandPhilipYu
CallforPar.cipa.on
References
• S.Budalako.,A.N.Srivastava,andM.E.Otey,Anomalydetec.onanddiagnosisalgorithmsfordiscretesymbolsequenceswithapplica.onstoairlinesafety,inIEEESMCPartC,39(1),2008.
• SantanuDas,KanishkaBhaduri,NikunjC.Oza,andAshokN.Srivastava,nu‐Anomica:AFastSupportVectorBasedNoveltyDetec.onTechnique,inICDM2009.
• SantanuDas,BryanL.MaBhews,AshokN.Srivastava,andNikunjC.Oza,Mul.plekernellearningforheterogeneousanomalydetec.on:algorithmandavia.onsafetycasestudy,inKDD2010.
• NikunjC.Oza,J.PatrickCastle,andJohnStutz,Classifica.onofAeronau.csSystemHealthandSafetyDocuments,inIEEESMCPartC,39(6),2009.