TAC 2018 Streaming Multimedia KBP Pilot Hoa Trang Dang National Institute of Standards and Technology


TAC 2018 Streaming Multimedia KBP Pilot

Hoa Trang Dang

National Institute of Standards and Technology


Background

• NIST will evaluate performers in DARPA AIDA Program (Active Interpretation of Disparate Alternatives)
• Some AIDA evaluations will be open evaluations in TAC and TRECVID.
• The goal of AIDA is to develop a semantic engine that automatically generates multiple alternative analytic interpretations of a situation, based on a variety of unstructured sources that may be noisy, conflicting, or deceptive.
• Documents can contain a mix of multilingual text, speech, image, video; including metadata.
• A document can be as small as a single tweet, or as large as a Web page containing a news article with text, pictures and video clips.

§ All data will be in streaming mode; systems can access the data only once in raw format, but may access a KB containing a structured semantic representation of all data seen to date


ACTIVE INTERPRETATION OF DISPARATE ALTERNATIVES (AIDA)

• Given a scenario (“Benghazi”), document stream, and several topics. For each topic:
  • TA1 outputs all Knowledge Elements (entities, relations, events, etc., defined in the ontology) in the documents, including alternative interpretations
  • TA2 fuses KEs from TA1 into the TA2 KB, maintaining alternative interpretations
  • TA3 constructs internally consistent hypotheses (partial KBs) from TA2 KB

[Diagram: TA1, TA2, TA3]


Scenario-Specific Ontology

• Scenarios will involve events such as international conflicts, natural disasters, violence at international events, or protests and demonstrations.
• AIDA will extend KBP ontology of entities, relations, events, belief and sentiment to include additional concepts that are needed to cover informational conflicts in each topic in the scenario
• Ideally, would have a single ontology for all topics in the scenario (?)


AIDA KB representation

• Knowledge Element (KE) is a structured representation of entities, relations, events, etc. -- likely an augmented triple like in Cold Start KB
  • Triple is augmented with provenance and confidence
  • Provenance is a set of justifications. Each justification has a justification-level confidence
  • KE-level confidence is explicitly provided by TA1 and TA2, and is an aggregation of justification-level confidences

• KB contains conflicting KEs (as found in the raw documents)
  • Representation -- not reconciliation -- of conflicts
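
A minimal Python sketch of what an augmented triple could look like. The class and field names are hypothetical, the character-offset span is only one possible form of provenance, and noisy-OR is an assumed aggregation function; the slides say only that KE-level confidence is an aggregation of justification-level confidences. The "Tripoli" value is made up to show two conflicting KEs stored side by side.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class Justification:
    doc_id: str
    span: Tuple[int, int]  # hypothetical provenance: character offsets in the raw doc
    confidence: float      # justification-level confidence


@dataclass
class KnowledgeElement:
    subject: str
    predicate: str
    obj: str
    justifications: List[Justification] = field(default_factory=list)

    def confidence(self) -> float:
        """KE-level confidence via noisy-OR (an assumed aggregation, not the AIDA spec)."""
        remaining = 1.0
        for j in self.justifications:
            remaining *= 1.0 - j.confidence
        return 1.0 - remaining


ke = KnowledgeElement(
    "attack_1", "location", "Benghazi",
    [Justification("doc1", (0, 10), 0.5), Justification("doc2", (5, 20), 0.5)],
)
# A conflicting KE (made-up value) is kept alongside the first one, not reconciled.
conflicting = KnowledgeElement(
    "attack_1", "location", "Tripoli", [Justification("doc7", (40, 52), 0.2)]
)
kb = [ke, conflicting]
print(ke.confidence(), conflicting.confidence())
```

Any monotone aggregation (max, mean, noisy-OR) would fit the description on the slide; the choice here is purely illustrative.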


What is allowed in KB representation?

• AIDA: “Although there may be need for some natural language, image thumbnails, featurized media, etc. in the KB for reference, registration, or matching purposes, it is expected that most of the assertions in the KB will be expressible in the structured representation, with elements derived from an ontology.”
• Features accessible to TA1/TA2 in KE cannot be document-level content features (?). Allowable features include
  • Number of supporting docs, and link to docs (but can’t read docs)
  • Time of first supporting doc, most recent supporting doc

• Comments/recommendations from participating teams are welcome regarding what features should be allowed in the KB
• For evaluation purposes, provenance accessible to LDC should be pointers into the raw documents denoting text spans, audio spans, images, or video shots
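
As a rough illustration of the allowable (non-content) features listed above, the following Python sketch derives them from a list of supporting document IDs and timestamps. The function name, the dictionary keys, and the input format are assumptions; nothing here reads document content.

```python
from datetime import datetime
from typing import Dict, List, Tuple


def support_features(support: List[Tuple[str, datetime]]) -> Dict[str, object]:
    """Summarize a KE's support using only doc IDs and timestamps (no doc content)."""
    doc_ids = sorted({doc_id for doc_id, _ in support})
    times = [seen for _, seen in support]
    return {
        "num_supporting_docs": len(doc_ids),
        "doc_links": doc_ids,                    # links only; the docs cannot be re-read
        "first_seen": min(times, default=None),  # time of first supporting doc
        "last_seen": max(times, default=None),   # time of most recent supporting doc
    }


feats = support_features([
    ("doc1", datetime(2018, 1, 5)),
    ("doc2", datetime(2018, 1, 2)),
    ("doc1", datetime(2018, 1, 9)),
])
print(feats)
```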


TAC/TRECVID 2018 tasks (pilot)

• Task 1: Extract all events, subevent or actions, entities, relations, locations, time, and sentiment from multimedia document stream, conditioned on zero or more different contexts, or hypotheses (TAC, TRECVID 2018)
  • Output is a set of all possible KEs, including confidence and provenance
  • Mention-level output, including within-document linking

• Task 2: Build KB by aggregating all KEs from TA1 and “user” (TAC 2018)
  • Output is KB including cross-doc linking
  • Evaluate by queries (with entry points) and assessment

• [Task 3: Create hypotheses from Task 2 KBs (AIDA program-internal in 2018)]


Training/Evaluation data

• One new scenario per evaluation cycle; 4 scenarios total over lifetime of AIDA program.
• 100K docs/scenario, including relevant and irrelevant documents
• 5-20% of docs will be relevant to the scenario
• 200 labeled docs per scenario

• 12-20 topics per scenario
• At least one foreign language per scenario, plus English
• AIDA: “Government will provide linguistic resources and tools of a quality and composition to be determined, but consisting at least of the type and size found in a LORELEI Related Language Pack (LRLP)”


Low Resource Language Packs

• 1Mw - 2Mw+ monotext from news, web text & social media
• 300Kw - 1.1Mw+ parallel text of variable quality (professional, crowd, found, comparable)
• Annotations for 25Kw - 75Kw/language including
  • Simple Named Entity (PER, ORG, GPE, LOC/FAC)
  • KB linking of names to GeoNames and CIA World FactBook
  • Situation Frames: needs/issues for an incident (e.g. Urgent shelter need in Kermanshah province)

  • Full Entity (name, nom, pro) and within-doc coref
  • Predicate-argument annotation of disaster-relevant Acts and States

• Grammatical resources ranging from full grammatical sketch to found resources (dictionaries, grammars, primers, gazetteers) to lexicons
• Basic NLP tools including word, sentence segmenters, encoding converters; name taggers


Related TRECVID Tasks


TRECVID (2001 – Present)

• Shot boundary detection: Identify the shot boundaries in the given video clip(s)
• High-level feature extraction/Semantic Indexing: Given a standard set of shot boundaries and a list of feature (concepts) definitions, return a ranked list of shots according to the highest possibility of detecting the presence of each feature

• Ad-hoc Video Search: Given a statement of information need, return a ranked list of shots which best satisfy the need; similar to semantic indexing, but with complex concepts (combination of concepts); e.g., find group of children playing frisbee in a park.

• Rushes Summarization: Given a video from the rushes test collection, automatically create an MPEG-1 summary clip less than or equal to a maximum duration that shows the main objects and events in the rushes video to be summarized

• Surveillance event detection: detect a set of predefined events and identify their occurrences temporally

• Content-based copy detection: given a test collection of videos and a set of (video, audio, video+audio) queries, determine for each query the place, if any, that some part of the query occurs, with possible transformations, in the test collection


TRECVID (2001 – Present)

• Known-item Search: Given a text-only description of the video desired and a test collection of video with associated metadata, automatically return a list of up to 100 video IDs ranked by probability to be the one sought
• Instance Search: Given a collection of test videos, a master shot reference, and a collection of queries that delimit a person, object, or place entity in some example video, locate for each query the 1000 shots most likely to contain a recognizable instance of the entity [AIDA TA2 cross-doc coref]
• Multimedia Event Detection: Given a collection of test videos and a list of test events, indicate whether each of the test events is present anywhere in each of the test videos and give the strength of evidence for each such judgment
• Localization: Given a video shot, determine the presence of a concept temporally within the shot, with respect to a subset of the frames comprised by the shot, and, spatially, for each such frame that contains the concept, to a bounding rectangle [AIDA provenance?]


Latest task introduced in 2016: Video-to-Text

• Given a set of 2000 URLs of Twitter (Vine) videos and sets of text descriptions (each composed of 2000 sentences), systems are asked to work and submit results for two subtasks:
  • Matching and Ranking: Return for each video URL a ranked list of the most likely text description that correspond (was annotated) to the video from each of the different text description sets.
  • Description Generation: Automatically generate for each video URL a text description (1 sentence) independently and without taking into consideration the existence of text description sets.

• Systems and annotators were encouraged to describe videos using 4 facets:
  • Who is the video describing such as concrete objects and beings (kinds of persons, animals, things)
  • What are the objects and beings doing? (generic actions, conditions/state or events)
  • Where such as locale, site, place, geographic, architectural (kind of place, geographic or architectural)
  • When such as time of day, season, etc


Airplane, Anchorperson, Animal, Basketball, Beach, Bicycling, Boat_Ship, Boy, Bridges, Bus, Car_Racing, Chair, Cheering, Classroom, Computers, Dancing, Demonstration_Or_Protest, Greeting, Hand, Highway

Sitting_Down, Stadium, Swimming, Telephones, Throwing, Baby, Door_Opening, Fields, Flags, Forest, George_Bush, Hill, Lakes, Military_Airplane, Explosion_Fire, Female-Human-Face-Closeup, Flowers, Girl, Government-Leader, Instrumental_Musician

Oceans, Quadruped, Skating, Skier, Soldiers, Studio_With_Anchorperson, Traffic, Kitchen, Meeting, Motorcycle, News_Studio, Nighttime, Office, Old_People, People_Marching, Press_Conference, Reporters, Roadway_Junction, Running, Singing

Examples of concepts used in the TRECVID Semantic INdexing (SIN) task


Multimedia

• Each document can contain a mix of text, speech, image, video; including metadata.
• Multiple languages: English plus 1-2 foreign languages (TBA)
• LDC will provide language packs containing resources for each language

• All participants will be given the same documents
• Participants are allowed to process info in a proper subset of the languages or media types
• NIST may report breakdown evaluation results by language, media type, etc.


Streaming Extraction

• Documents arrive in batches as a chunk.
  • ~100 documents/chunk (?), with cap on length of time covered in a chunk

• TA1 (and TA2?) system emits KE’s (triple + confidence + extras) after each chunk.
• At specified time points in the stream, the set of accumulated KE’s is evaluated.
  • Ranked precision/recall derivatives.

• At some of those points, a wild hypothesis appears!
  • A hypothesis = a set of proposed tuples.
  • TA1 system outputs KE’s primed by the hypothesis, which are evaluated.
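
The chunked, read-once processing described above can be sketched in Python as follows. The function names, the toy extractor, and the choice to keep the maximum confidence per triple are all assumptions made for illustration; hypothesis-primed extraction is left out.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Set, Tuple

Triple = Tuple[str, str, str]
Extractor = Callable[[str], List[Tuple[Triple, float]]]


def chunked(stream: Iterable[str], size: int) -> Iterator[List[str]]:
    """Group a document stream into chunks of at most `size` documents."""
    chunk: List[str] = []
    for doc in stream:
        chunk.append(doc)
        if len(chunk) == size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


def run_stream(stream: Iterable[str], extract: Extractor, chunk_size: int,
               eval_points: Set[int]) -> Dict[int, List[Tuple[Triple, float]]]:
    """Read each raw doc once, accumulate KEs, and snapshot a ranked list at eval points."""
    accumulated: Dict[Triple, float] = {}
    snapshots: Dict[int, List[Tuple[Triple, float]]] = {}
    for index, chunk in enumerate(chunked(stream, chunk_size), start=1):
        for doc in chunk:  # the raw doc is not revisited after this loop
            for triple, conf in extract(doc):
                accumulated[triple] = max(conf, accumulated.get(triple, 0.0))
        if index in eval_points:
            # Ranked by confidence, so ranked precision/recall measures can be applied.
            snapshots[index] = sorted(accumulated.items(), key=lambda kv: kv[1], reverse=True)
    return snapshots


def toy_extract(doc: str) -> List[Tuple[Triple, float]]:
    """Stand-in for a real TA1 system: one KE per document."""
    return [((doc, "mentions", "Benghazi"), 0.5)]


snapshots = run_stream(["d1", "d2", "d3", "d4", "d5"], toy_extract,
                       chunk_size=2, eval_points={1, 3})
print({point: len(kes) for point, kes in snapshots.items()})
```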


TA1 Extraction Conditioned on Context

• TA1 must be capable of accepting alternate contexts and producing alternate analyses for each context.
• For example, the analysis of a certain image produces knowledge elements representing a bus on a road. However, knowledge elements in one or more hypotheses suggest that this is a river rather than a road. The analysis algorithm should use this information for additional analysis of the image with priors favoring a boat.

• Simplifying assumptions for evaluation purposes:
  • Contexts are coherent hypotheses (represented as a partial KB) drawn from a small static set of possible hypotheses that are produced manually by LDC
  • Only “what if” hypotheses are input to TA1; KEs and confidence values resulting from “what if” hypotheses do not get passed on to TA2 but are evaluated separately
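
One simple way to read "priors favoring a boat" is as Bayesian re-weighting of the classifier's scores. The Python sketch below is only an illustration under that assumption: the function name and all numbers are invented, and a real TA1 system would likely condition its analysis in a richer way.

```python
from typing import Dict


def reweight(likelihood: Dict[str, float], prior: Dict[str, float]) -> Dict[str, float]:
    """Posterior over labels, proportional to likelihood times a context-dependent prior."""
    unnormalized = {label: score * prior.get(label, 0.0) for label, score in likelihood.items()}
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("prior assigns zero mass to every candidate label")
    return {label: value / total for label, value in unnormalized.items()}


# Made-up image-classifier scores for the vehicle in the image.
likelihood = {"bus": 0.6, "boat": 0.4}

# No hypothesis: uniform prior, so the bus reading wins.
no_context = reweight(likelihood, {"bus": 0.5, "boat": 0.5})

# "What if" hypothesis says the scene is a river: prior shifts toward boat.
river_context = reweight(likelihood, {"bus": 0.1, "boat": 0.9})

print(no_context, river_context)
```

Both analyses would be emitted: the unconditioned one feeds TA2, while the hypothesis-conditioned one is evaluated separately, as the slide states.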


How is Task 1 different from past TRECVID and TAC component tasks?

• Multimedia
• Streaming input
  • Can’t go back to reanalyze raw docs in previous data chunks
  • TA1 has access to TA2 KB encoding previously added KE’s

• Multiple hypotheses and interpretations
  • Expanded ontology to cover informational conflicts in scenario
  • TA1 outputs all possible extractions and interpretations, not just the most confident ones
  • TA1 extraction from data items may be conditioned on hypothesis


How is Task 2 different from Cold Start KBP?

• Multimedia
• Streaming input
  • TA2 has no access to raw data items to assist in fusing incoming KEs with existing KB; can only use what’s represented in the incoming KE and existing KB

• Multiple hypotheses and interpretations
  • Expanded ontology to cover information conflicts in scenario
  • TA2 KB must maintain all possible KEs (even low-confidence KEs) in order to support creation of multiple hypotheses and disparate interpretations
  • TA2 KEs and confidences theoretically could be conditioned on hypothesis in future, but for 2018 the TA2 KB is independent of any “what if” hypotheses.
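
A hedged Python sketch of a KB that fuses incoming KEs without touching raw documents and without discarding low-confidence alternatives. The class name, the string provenance pointers, the example values, and the use of the maximum as the aggregated confidence are all assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


class FusedKB:
    """Stores every incoming KE; conflicting values are kept, never reconciled."""

    def __init__(self) -> None:
        self._support: Dict[Triple, List[Tuple[float, str]]] = defaultdict(list)

    def add(self, triple: Triple, confidence: float, provenance: str) -> None:
        # Fusion uses only what the incoming KE carries: no raw document access.
        self._support[triple].append((confidence, provenance))

    def alternatives(self, subject: str, predicate: str) -> List[Tuple[str, float]]:
        """All stored values for (subject, predicate), highest confidence first."""
        found = [
            (obj, max(conf for conf, _ in support))
            for (subj, pred, obj), support in self._support.items()
            if subj == subject and pred == predicate
        ]
        return sorted(found, key=lambda pair: pair[1], reverse=True)


kb = FusedKB()
kb.add(("attack_1", "location", "Benghazi"), 0.9, "doc1:0-10")
kb.add(("attack_1", "location", "Tripoli"), 0.2, "doc7:40-52")  # low confidence, still kept
kb.add(("attack_1", "location", "Benghazi"), 0.6, "doc3:5-15")
alts = kb.alternatives("attack_1", "location")
print(alts)
```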


Evaluation by Assessment

• Evaluate using post-submission assessment and clustering of pooled mentions
• To support evaluation of TA1 extraction conditioned on context, ground-truth must be conditioned on a small set of hypotheses, predetermined by LDC.

• Only targeted KEs (relevant to hypotheses) will be evaluated
• Only k highest-confidence mentions/justifications for each KE will be pooled and assessed
• LDC might provide exhaustive annotation of mentions of entities for a small set of documents, for gold-standard based “NER” evaluation
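
Pooling the k highest-confidence justifications per KE might look like the Python sketch below. It assumes the top k are taken per submitted run and then merged across runs, which the slide does not specify; the function name and the pointer format are also hypothetical.

```python
import heapq
from typing import Dict, List, Set, Tuple

Justification = Tuple[float, str]  # (confidence, pointer into a raw document)


def pool_top_k(runs: List[Dict[str, List[Justification]]], k: int) -> Dict[str, Set[str]]:
    """For each KE, merge the k highest-confidence justifications from every run."""
    pooled: Dict[str, Set[str]] = {}
    for run in runs:
        for ke_id, justifications in run.items():
            top = heapq.nlargest(k, justifications, key=lambda j: j[0])
            pooled.setdefault(ke_id, set()).update(pointer for _, pointer in top)
    return pooled


run_a = {"ke1": [(0.9, "doc1:0-10"), (0.4, "doc2:3-9"), (0.7, "doc3:1-5")]}
run_b = {"ke1": [(0.8, "doc1:0-10"), (0.6, "doc4:2-8")], "ke2": [(0.5, "doc5:0-4")]}
pool = pool_top_k([run_a, run_b], k=2)
print(pool)
```

Because the pool is a set, a justification returned by several runs is assessed only once.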


AIDA Evaluation Schedule

• 3 18-month phases
• January 2018 kick-off

• ~Sept 2018: Eval Pilot
• ~May 2019: Eval 1 (Phase 1)
• ~Nov 2020: Eval 2 (Phase 2)
• ~May 2022: Eval 3 (Phase 3)


TAC 2018 Streaming MM KBP Pilot Evaluation Schedule

• Sample/training/eval data release:
  • ~January: scenario and 3 mostly labeled topics for training; all 100K unlabeled docs for the scenario (foreign languages announced at this time)
  • ~April: 3 additional labeled topics for training
  • ~September: 6 “evaluation” topics

• Early September (?): Task 1 evaluation window
• Mid September (?): Task 2 evaluation window