Upload
others
View
1
Download
0
Embed Size (px)
Citation preview
EarthScienceTechnologyForum(ESTF2017),June13-15,2017Pasadena,CA
MiningandUDlizingDatasetRelevancyfromOceanographicDataset(MUDROD)Metadata,UsageMetrics,andUser
FeedbacktoImproveDataDiscoveryandAccess NASAAIST(NNX15AM85G)
Dr.Chaowei(Phil)Yang,Mr.YongyaoJiang,Ms.YunLi,
GeographyandGeoInformaDonScience,GeorgeMasonUniversity
Mr.EdwardMArmstrong,Mr.ThomasHuang,Mr.DavidMoroni,Mr.ChrisFinch,Dr.LewisJ.McGibbney,Mr.FrankGreguska,Mr.GaryChen
JetPropulsionLaboratory,NASA
EarthScienceTechnologyForum(ESTF2017),June13-15,2017Pasadena,CA
• ProjectBackground– Problems– ObjecDves– FuncDons
• System– Logmining– QuerySemanDcs– Ranking– RecommendaDon
• Results• Nextstep
Agenda
3
Data Discovery Problems
• Keyword-based matching (traditional search engines) – User query: ocean wind – Final query: ocean AND wind
• Reveal the real intent of user query – ocean wind = “ocean wind” OR “greco” OR “surface wind” OR “mackerel breeze” …
• PO.DAAC UWG Recommendation 2014-07
• NASA ESDSWG Search Relevance Recommendations 2016 & 2017
EarthScienceTechnologyForum(ESTF2017),June13-15,2017Pasadena,CA
• Analyzeweblogstodiscoveruserknowledge(queryanddatarelaDonships)• ConstructknowledgebasebycombiningsemanDcsandprofileanalyzer• Improvedatadiscoveryby1)be^erranking;2)recommendaDon;3)
ontologynavigaDon
ObjecDves
EarthScienceTechnologyForum(ESTF2017),June13-15,2017Pasadena,CA
• Weblogpreprocessing• SemanDcanalysisofuserqueries&NavigaDon
• Machinelearningbasedsearchranking• DataRecommendaDon
FuncDons/Modules
EarthScienceTechnologyForum(ESTF2017),June13-15,2017Pasadena,CA
Weblogprocessing
EarthScienceTechnologyForum(ESTF2017),June13-15,2017Pasadena,CA
• Requestssentfromcliente.g.browser,cmdlinetool,etc.recordedbyserver
• LogfilesprovidedbyPO.DAAC(HTTP(S),FTP)
ClientIP:68.180.228.99Requestdate/Dme:[31/Jan/2015:23:59:13-0800]Request:"GET/datasetlist/...HTTP/1.1"HTTPCode:200Bytesreturned:84779Referrer/previouspage:“/ghrsst/"Useragent/browser:"Mozilla/5.0...
68.180.228.99--[31/Jan/2015:23:59:13-0800]"GET/datasetlist/...HTTP/1.1"20084779"/ghrsst/""Mozilla/5.0..."
Weblogs
EarthScienceTechnologyForum(ESTF2017),June13-15,2017Pasadena,CA
Goal: reconstruct user browsing pattern (search history & clickstream) from a set of raw logs
Weblogs
UseridenDficaDon
CrawlerdetecDon
StructurereconstrucDon
SessionidenDficaDon Searchhistory
Clickstream
AddiDonalstepsinclude:wordnormalizaDon,stopwordsremoval,andstemming
Datapreprocess
EarthScienceTechnologyForum(ESTF2017),June13-15,2017Pasadena,CA
Reconstructedsessionstructure
EarthScienceTechnologyForum(ESTF2017),June13-15,2017Pasadena,CA
Datapreprocessresults
1.Usersearchhistory2.Clickstream
Jiang,Y.,Y.Li,C.Yang,E.M.Armstrong,T.Huang&D.Moroni(2016)“ReconstrucDngSessionsfromDataDiscoveryandAccessLogstoBuildaSemanDcKnowledgeBaseforImprovingDataDiscovery”ISPRSInternaDonalJournalofGeo-InformaDon,5,54.
EarthScienceTechnologyForum(ESTF2017),June13-15,2017Pasadena,CA
SemanDcsimilarity
EarthScienceTechnologyForum(ESTF2017),June13-15,2017Pasadena,CA
ExisDngontology(SWEET)• SWEET(RaskinandPan2003)• FocusononlytworelaDons• Thecloser,themoresimilar
EarthScienceTechnologyForum(ESTF2017),June13-15,2017Pasadena,CA
Usersearchhistory• Createquery–usermatrix• Calculatebinarycosinesimilarity
Conceptualexample
EarthScienceTechnologyForum(ESTF2017),June13-15,2017Pasadena,CA
Clickstream• Hypothesis:similarqueriescanresultinsimilarclickingbehavior• Iftwoqueriesaresimilar,thedatathatgetclickedaoertheyaresearched
wouldbemorelikelytobesimilar
Queryb
Data
Querya
Similar?
EarthScienceTechnologyForum(ESTF2017),June13-15,2017Pasadena,CA
Metadata• Hypothesis:semanDcallyrelatedtermstendtoappearinthesame
metadatamorefrequently• EssenDallythesameastheclickstreamanalysis• PerformLatentSemanDcAnalyses(LSA)overtheterm–metadata
matrix
Queryb
Metadata
Querya
Similar?
EarthScienceTechnologyForum(ESTF2017),June13-15,2017Pasadena,CA
IntegraDon• Allfourresultscouldbeconvertedto
• Problem:– Noneofthemareperfect(uncertaintyindata,hypothesisandmethod)
– Metadataandontologymighthaveunknowntermstosearchengineendusers
– SomeDmes,similarityvaluesfromdifferentmethodsareinconsistent
ConceptA
ConceptB
Similarity
EarthScienceTechnologyForum(ESTF2017),June13-15,2017Pasadena,CA
IntegraDon
• Themaximumsimilarityofallofthecomponents(largesimilarityappearstobemorereliable)
• Theadjustmentincrementbecomeslargerwhenthesimilarityexistsinmoresources
EarthScienceTechnologyForum(ESTF2017),June13-15,2017Pasadena,CA
SemanDcSimilarityCalculaDonWorkflow
EarthScienceTechnologyForum(ESTF2017),June13-15,2017Pasadena,CA
ResultsandevaluaDonQuery Searchhistory Clickstream Metadata SWEET Integratedlist
oceantemperature
seasurface
temperature(0.66),sea
surfacetopography(0.56),
oceanwind(0.56),
aqua(0.49)
seasurface
temperature(0.94),
sst(0.94),grouphigh
resoluDonseasurface
temperaturedataset(0.89),
ghrsst(0.87)
sst(0.96),ghrsst(0.77),sea
surfacetemperature(0.72),
surfacetemperature(0.63),
reynolds(0.58)
None
sst(1.0),seasurfacetemperature(1.0),
ghrsst(1.0),grouphighresoluDonsea
surfacetemperaturedataset(0.99),
reynoldsseasurfacetemperature(0.74)
Samplegroup Overallaccuracy
Mostpopular10queries 88%
Leastpopular10queries 61%
Randomlyselected10queries 83%
Bydomainexperts
Jiang,Y.,Y.Li,C.Yang,K.Liu,E.M.Armstrong,T.Huang&D.Moroni(2017)AComprehensiveApproachtoDeterminingtheLinkageWeightsamongGeospaDalVocabularies-AnExamplewithOceanographicDataDiscovery.InternaDonalJournalofGeographicalInformaDonScience(minorrevision)
EarthScienceTechnologyForum(ESTF2017),June13-15,2017Pasadena,CA
• QuerysuggesDon• QuerymodificaDon
Whatcanweuseitfor?
EarthScienceTechnologyForum(ESTF2017),June13-15,2017Pasadena,CA
Searchranking
EarthScienceTechnologyForum(ESTF2017),June13-15,2017Pasadena,CA
Background
• Rankingisalong-standingproblemingeospaDaldatadiscovery• Typically,hundreds,eventhousandsofmatches• CangetlargerasmoreEarthobservaDondataisbeingcollected
EarthScienceTechnologyForum(ESTF2017),June13-15,2017Pasadena,CA
ObjecDveandMethods
• Putthemostdesireddatatothetopoftheresultlist• Whatfeaturescanrepresentusers’searchpreferencesforgeospaDal
data?• HowcantherankingfuncDonreachabalanceofallthesefeatures?
• IdenDfiedelevenfeaturesfrom– GeospaDalmetadataa^ributes– Query– metadatacontentoverlap– Userbehaviorfromweblogs
EarthScienceTechnologyForum(ESTF2017),June13-15,2017Pasadena,CA
Rankingfeatures– Metadataa^ributesFeatures Description
Release date The date when the data was published
Processing level (PL) The processing level of image products, ranging from level 0 to level 4.
Version number The publish version of the data
Spatial resolution The spatial resolution of the data
Temporal resolution The temporal resolution of the data
• Fivemetadatafeatures• Verifiedbydomainsexperts• Query-independent:staDc,dependsonthedataitself,won’tchangewiththequery
EarthScienceTechnologyForum(ESTF2017),June13-15,2017Pasadena,CA
SpaDalquery-metadataoverlap
• SpaDalsimilaritybetweenqueryareaandthecoverageofaparDculardata
• Overlapareanormalizedbytheoriginalareaofqueryanddata
EarthScienceTechnologyForum(ESTF2017),June13-15,2017Pasadena,CA
Rankingfeatures– Userbehavior• All-Dme,monthly,userpopularity,andsemanDcpopularity(retrieved
fromweblogs)• SemanDcpopularity:thenumberofDmesthatthedatahasbeen
clickedaoersearchingaparDcularqueryanditshighlyrelatedones(query-dependent)
oceantemperature
oceantemperature
seasurfacetemperature
sst
1.0
1.0
1.0
DataA
10
5
5
EarthScienceTechnologyForum(ESTF2017),June13-15,2017Pasadena,CA
RankSVM• Oneofthewell-recognizedMLrankingalgorithm• ConvertarankingproblemintoaclassificaDonproblemthataregular
SVMalgorithmcansolve• 3mainsteps
Ø 1)Standardize:mean=0,std=1o SVMisnotscaleinvarianto Over-opDmizedo Longertotrain
Ø 2)Foranypairoftrainingdata,calculatethedifferenceØ 3)ArankingproblembecomesabinaryclassificaDonproblem,whereSVMis
appliedtofindtheopDmaldecisionboundary
EarthScienceTechnologyForum(ESTF2017),June13-15,2017Pasadena,CA
ArchitectureSec.15.4.2
• All of these (except for training) can be finished within 2 seconds
• None of the open source mainstream ML library provide any ranking algorithm
• Implemented it by ourselves with the aid of Spark MLlib Index
User query
Semantic query
Top K retrieval
Ranking model
Learning algorithm
Training data
Re-ranked results
Feature extractor
Weblogs
User clicks Knowledge base
EarthScienceTechnologyForum(ESTF2017),June13-15,2017Pasadena,CA
NDCG(K)forfivedifferentrankingmethodsatvaryingK(1-40)
Jiang,Y.,Y.Li,C.Yang,K.Liu,E.M.Armstrong,T.Huang,D.Moroni&L.J.McGibbney(2017)TowardsintelligentgeospaDaldiscovery:amachinelearningrankingframework.InternaDonalJournalofDigitalEarth(minorrevision)
EarthScienceTechnologyForum(ESTF2017),June13-15,2017Pasadena,CA
DatarecommendaDon
EarthScienceTechnologyForum(ESTF2017),June13-15,2017Pasadena,CA
HowtorecommendgeospaDaldata?• UsegeospaDalmetadataforcontent-basedrecommendaDon
-MetadataspaDotemporalsimilarity-Metadataa^ributesimilarity-Metadatacontentsimilarity
• LeverageuserbehaviorsdataforCFrecommendaDon
-Session-basedco-occurrenceofdata
EarthScienceTechnologyForum(ESTF2017),June13-15,2017Pasadena,CA
AFributetype AFributename AFributedescripHon
SpaHotemporalaFributes DatasetCoverage-EastLon TheEastlongitudeoftheboundingrectangle
DatasetCoverage-WestLon TheWestlongitudeoftheboundingrectangle
DatasetCoverage-NorthLat TheNorthlaDtudeoftheboundingrectangle
DatasetCoverage-SouthLat TheSouthlaDtudeoftheboundingrectangle
DatasetCoverage-StartTimeLong ThestartDmeofthedata
DatasetCoverage-StopTimeLong TheendDmeofthedata
Categoricalgeographic
aFributes
DatasetRegion-Region Regionofdataset.Suchasglobal,AtlanDc
Dataset-ProjecDonType Projecttypelikecylindricallat-lon
Dataset-ProcessingLevel Dataprocessinglevel
DatasetPolicy-DataFormat Dataformate.g.HDF,NetCDF
DatasetSource-Sensor-ShortName Shortnameofsensor
OrdinalgeographicaFributes Dataset-TemporalResoluDon TemporalresoluDonofdataset
Dataset-TemporalRepeat TemporalresoluDonofdataset
Dataset-SpaDalResoluDon SpaDalresoluDonofdataset
Descriptive attributes Dataset-description Describe the content of the dataset
Geographicmetadata
EarthScienceTechnologyForum(ESTF2017),June13-15,2017Pasadena,CA
• SpaDalvariables:NorthLat,SouthLat,WestLon,EastLon• Temporalvariables:DatasetCoverage-StartTimeLong,StopTimeLong
• UsevolumeoverlapraDotocalculatesimilarity
𝑠𝑝𝑎𝑡𝑖𝑜𝑡𝑒𝑚𝑝𝑜𝑟𝑎𝑙↓𝑠𝑖𝑚(𝑟↓𝑖 , 𝑟↓𝑗 ) =(𝑣𝑜𝑙𝑢𝑚𝑒(𝑟↓𝑖 ∩ 𝑟↓𝑗 )/𝑣𝑜𝑙𝑢𝑚𝑒(𝑟↓𝑖 ) + 𝑣𝑜𝑙𝑢𝑚𝑒(𝑟↓𝑖 ∩ 𝑟↓𝑗 )/𝑣𝑜𝑙𝑢𝑚𝑒(𝑟↓𝑗 ) )∗0.5𝑣𝑜𝑙𝑢𝑚𝑒(𝑟)=|𝑒𝑎𝑠𝑡𝑙𝑜𝑛−𝑤𝑒𝑠𝑡𝑙𝑜𝑛|∗|𝑠𝑜𝑢𝑡ℎ𝑙𝑎𝑡−𝑛𝑜𝑟𝑡ℎ𝑙𝑎𝑡|∗|𝑒𝑛𝑑𝑡𝑖𝑚𝑒−𝑠𝑡𝑎𝑟𝑡𝑖𝑚𝑒|
SpaDotemporalsimilarity
EarthScienceTechnologyForum(ESTF2017),June13-15,2017Pasadena,CA
• Fixednumberofvalues• Nointrinsicordering• sensor-name:"AMSR-E","MODIS","AVHRR-3"and
"WindSat"
𝑐𝑎𝑡𝑒𝑔𝑜𝑟𝑖𝑐𝑎𝑙_𝑣𝑎𝑟_𝑠𝑖𝑚(𝑣↓𝑖 , 𝑣↓𝑗 )= 𝑣↓𝑖 ∩ 𝑣↓𝑗 /𝑣↓𝑖 ∪ 𝑣↓𝑗
Categoricalsimilarity
EarthScienceTechnologyForum(ESTF2017),June13-15,2017Pasadena,CA
Ordinala^ributeissimilartocategoricala^ributebutitsvalueshasaclearorder,e.g.spaDalresoluDon• Convertedintorankfrom1toR• NominalizetheseranksforsimilaritycalculaDon
𝑜𝑟𝑑𝑖𝑛𝑎𝑙↓𝑣𝑎𝑟↓𝑠𝑖𝑚(𝑣↓𝑖 , 𝑣↓𝑗 ) =1− |𝑛𝑜𝑟𝑚↓𝑟𝑎𝑛𝑘(𝑣↓𝑖 ) −𝑛𝑜𝑟𝑚↓𝑟𝑎𝑛𝑘(𝑣↓𝑗 ) |
𝑛𝑜𝑟𝑚_𝑟𝑎𝑛𝑘(𝑣↓𝑖 )= 𝑅𝑎𝑛𝑘𝑣↓𝑖 +1/𝑅+1
Ordinalsimilarity
EarthScienceTechnologyForum(ESTF2017),June13-15,2017Pasadena,CA
OriginaltextAquariusLevel3seasurfacesalinity(SSS)standardmappedimagedatacontainsgridded1degreespaDalresoluDonSSSaveragedoverdaily,7day,monthly,andseasonalDmescales.ThisparDculardatasetistheseasonalclimatology,Ascendingseasurfacesalinityproductforversion4.0oftheAquariusdataset,whichistheofficialendofprimemissionpublicdatareleasefromtheAQUARIUS/SAC-Dmission.OnlyretrievedvaluesforAscendingpasseshavebeenusedtocreatethisproduct.TheAquariusinstrumentisonboardtheAQUARIUS/SAC-Dsatellite,acollaboraDveeffortbetweenNASAandtheArgenDnianSpaceAgencyComisionNacionaldeAcDvidadesEspaciales(CONAE).Theinstrumentconsistsofthreeradiometersinpushbroomalignmentatincidenceanglesof29,38,and46degreesincidenceanglesrelaDvetotheshadowsideoftheorbit.Footprintsforthebeamsare:76km(along-track)x94km(cross-track),84kmx120kmand96kmx156km,yieldingatotalcross-trackswathof370km.Theradiometersmeasurebrightnesstemperatureat1.413GHzintheirrespecDvehorizontalandverDcalpolarizaDons(THandTV).Asca^erometeroperaDngat1.26GHzmeasuresoceanbacksca^erineachfootprintthatisusedforsurfaceroughnesscorrecDonsintheesDmaDonofsalinity.Thesca^erometerhasanapproximate390kmswath.
ExtractedtermsRadiometersMeasureBrightnessTemperature,AQUARIUS/SACMission,ImageData,BroomAlignment,ResoluDonSSS,AQUARIUS/SACSatellite,Sca^erometer,Sca^erometer,AquariusData,ArgenDnianSpaceAgencyComisionNacional,IncidenceAngles,TimeScales,AcDvidadesEspaciales,Cross-trackSwath,OfficialEnd,AquariusInstrument,ShadowSide,AscendingSeaSurfaceSalinityProduct,Level,Level,SurfaceRoughnessCorrecDons,DataRelease,Salinity,Density,Salinity,Density,AQUARIUS,L3,SSS,SMIA,SEASONAL-CLIMATOLOGY,V4
Step1:PhraseextracDon1. ExtracttermcandidatesfrommetadatadescripDonwithPOS(partofspeech)Tagging2. Introduce“occurrence”and“strength”tofilterouttermsfromcandidates.“occurrence”:occurrencesnumberofterms“strength”:thenumberofwordsinaterm
DescripDvesimilarity
EarthScienceTechnologyForum(ESTF2017),June13-15,2017Pasadena,CA
Step2:Representmetadatainthephrasevectorspace(Thedimensionlowerthanwordfeaturespace)
Term1 Term2 Term3 Term4 Term5 Term6 Term7 … TermN
dataser1 1 0 1 0 0 0 0 1
dataser2
0 0 0 1 1 0 0 0
…
datasetk 1 0 1 0 0 0 0 1
Step3:Calculatecosinesimilarity
MetadataabstractsemanDcsimilarity
EarthScienceTechnologyForum(ESTF2017),June13-15,2017Pasadena,CA
Session1 Session2 … SessionN
Data1 1 0 1Data2 0 0 0…
Datak 1 0 1
Calculatemetadatasimilaritybasedonsessionlevelco-occurrence
Similarity (i,j) = 𝑁(𝑖ᴖ𝑗)/√𝑁(𝑖) ∗√𝑁(𝑗)
N(i):ThenumberofsessionsinwhichdatasetiwasviewedordownloadN(j):ThenumberofsessionsinwhichdatasetjwasviewedordownloadN(𝑖ᴖ𝑗):Thenumberofsessionsinwhichbothdatasetiandjwereviewedordownload
SessionbasedrecommendaDon
EarthScienceTechnologyForum(ESTF2017),June13-15,2017Pasadena,CA
RecommendaHonmethod
Pros Cons Strategy
DescripDvesimilarity
1.NaturallanguageprocessingmethodscanbeadoptedtofindlatentsemanDcrelaDonship
1. Manydatasetshasnearlysameabstractwithfewwords/valueschanged2. Itishardtoextractdetaileda^ributesfromdescripDon
UsedasthebasicmethodofrecommendaDonalgorithm
A^ributesimilarity(spaDotemporal,ordinal,categorical)
1.Asstructureddata,geographicmetadatahavemanyvariables.
1.Variablevaluesmaybenullorwrong2.Thequalitydependsontheweightassignedtoeveryvariable
AssupplementtosemanDcsimilarity
Sessionconcurrence
1.Reflectusers’preference
1.Coldstartproblem:Newlypublisheddatadon’thaveusagedata
Fine-tunerecommendaDonlist
Recommend (i) =𝑊𝑠𝑠∗𝐷𝑒𝑠𝑐𝑟𝑖𝑝𝑡𝑖𝑣𝑒𝑐↓𝑆𝑖𝑚𝑖𝑙𝑎𝑟𝑖𝑡𝑦(𝑖) + ∑𝑊𝑐𝑣∗𝐶𝑎𝑡𝑒𝑔𝑜𝑟𝑖𝑎𝑙↓𝑆𝑖𝑚𝑖𝑙𝑎𝑟𝑖𝑡𝑦(𝑖) +∑𝑊𝑜𝑣∗𝑂𝑟𝑑𝑖𝑎𝑛𝑙↓𝑆𝑖𝑚𝑖𝑙𝑎𝑟𝑖𝑡𝑦(𝑖) +𝑊𝑠𝑡𝑣 ∗𝑆𝑝𝑎𝑡𝑖𝑜𝑇𝑒𝑚𝑝𝑜𝑟𝑎𝑙𝑆𝑖𝑚𝑖𝑙𝑎𝑟𝑖𝑡𝑦+ 𝑊𝑠𝑜∗𝑆𝑒𝑠𝑠𝑖𝑜𝑛𝑆𝑖𝑚𝑖𝑙𝑎𝑟𝑖𝑡𝑦
HybridrecommendaDon
EarthScienceTechnologyForum(ESTF2017),June13-15,2017Pasadena,CA
0
0.2
0.4
0.6
0.8
1
1.2
1 2 3 4 5 6 7 8 9 10
Precision
PosiDon
Hybridsimilarity Word-basedsimilarity A^ributesimilarity Session-basedsimilarity
QuanDtaDveEvaluaDon HybridsimilarityoutperformothersimilariDessinceitintegratesmetadataa^ributesanduserpreference.
Y.Li,Jiang,Y.,C.Yang,K.Liu,E.M.Armstrong,T.Huang,D.Moroni&L.J.McGibbney(2017)AGeospaDalDataRecommenderSystembasedonMetadataandUserBehaviour(revision)
EarthScienceTechnologyForum(ESTF2017),June13-15,2017Pasadena,CA
Conclusion
• LogminingenablesadataportalintegraDngimplicituserpreferences• Wordsimilarityretrievedbydataminingtasksexpandsanygivenquery
toimprovesearchrecallandprecision.• TherichsetofrankingfeaturesandtheMLalgorithmprovide
substanDaladvantagesoverusingotherrankingmethods• TherecommendaDonalgorithmcandiscoverlatentdatarelevancy• Theproposedarchitectureenablesthelooselycoupledsooware
structureofadataportalandavoidsthecostofreplacingtheexisDngsystem
EarthScienceTechnologyForum(ESTF2017),June13-15,2017Pasadena,CA
• PublicaDons• Jiang,Y.,Y.Li,C.Yang,E.M.Armstrong,T.Huang&D.Moroni(2016)ReconstrucDngSessionsfromDataDiscoveryand
AccessLogstoBuildaSemanDcKnowledgeBaseforImprovingDataDiscovery.ISPRSInternaDonalJournalofGeo-InformaDon,5,54.
• Y.Li,Jiang,Y.,C.Yang,K.Liu,E.M.Armstrong,T.Huang&D.Moroni(2016)LeveragecloudcompuDngtoimprovedataaccesslogmining.IEEEOceans2016.
• Yang,C.,etal.,2017.BigDataandcloudcompuDng:innovaDonopportuniDesandchallenges.Interna>onalJournalofDigitalEarth,10(1),pp.13-53.(the2ndmostreadpaperofIJDEinit’sdecadalhistory)
• Jiang,Y.,Y.Li,C.Yang,K.Liu,E.M.Armstrong,T.Huang&D.Moroni(2017)AComprehensiveApproachtoDeterminingtheLinkageWeightsamongGeospaDalVocabularies-AnExamplewithOceanographicDataDiscovery.InternaDonalJournalofGeographicalInformaDonScience(minorrevision)
• Jiang,Y.,Y.Li,C.Yang,K.Liu,E.M.Armstrong,T.Huang,D.Moroni&L.J.McGibbney(2017)TowardsintelligentgeospaDaldiscovery:amachinelearningrankingframework.InternaDonalJournalofDigitalEarth(minorrevision)
• Y.Li,Jiang,Y.,C.Yang,K.Liu,E.M.Armstrong,T.Huang,D.Moroni&L.J.McGibbney(2017)AGeospaDalDataRecommenderSystembasedonMetadataandUserBehaviour(revision)
• Jiang,Y.,Y.Li,C.Yang,K.Liu,E.M.Armstrong,T.Huang,D.Moroni&L.J.McGibbney(2017)Asmartweb-baseddatadiscoverysystemforoceansciences.(ongoing)
• Sourcecode:h^ps://github.com/mudrod/mudrod• PO.DAACLabs:h^p://mudrod.jpl.nasa.gov/• PDLeverage:h^p://pd.cloud.gmu.edu/
Products
EarthScienceTechnologyForum(ESTF2017),June13-15,2017Pasadena,CA
Component CurrentTRL ProjectendTRL DescripHonSemanHcsearchengine
SearchDispatcher 7 7 TranslaDngausersearchqueryintoasetofnewsemanDcqueries
Similaritycalculator 7 7 CalculaDngthesemanDcsimilarityfromweblogs,metadata,andontology
RecommendaDonmodule 7 7 Recommendingsimilardatasetstotheclickeddataset
Rankingmodule 7 7 Re-rankingthesearchresultsbasedonRankSVMMLalgorithm
Knowledgebase
Ontology 7 7 ExtensionsfromSWEETontologyforearthsciencedata
TripleStore 7 7 ESIPontologyrepositoryVocabularylinkagediscoveryengine
Profileanalyzer 7 7 ExtracDnguserbrowsingpa^ernfromrawweblogs
Webservices/GUI
Rankingservice/presenter 7 7 ProvidingandpresenDngtherankedresults
RecommendaDonservice/presenter 7 7 ProvidingandpresenDngtherelateddatasets
OntologynavigaDonservice/presenter 7 7 ProvidingandpresenDngrelatedsearches
EarthScienceTechnologyForum(ESTF2017),June13-15,2017Pasadena,CA
Nextsteps• Addmorefeatures(e.g.,temporalsimilarity)• CreatetrainingdatafromweblogsforRankSVM• Developaqueryunderstandingmoduletobe^erinterpretuser’ssearch
intent(e.g.“oceanwindlevel3”->“oceanwind”AND“level3”)• SupportSolr• Supportnearreal-DmedataingesDontodynamicallyupdateknowledge
base• IntegraDonwithDOMSandOceanXtremesforanoceanscienceanalyDcs
center• LeverageadvancedcompuDngtechniquestospeeduptheprocess
EarthScienceTechnologyForum(ESTF2017),June13-15,2017Pasadena,CA
• YangC.,JiangY.,LY.,ArmstrongE.,HuangT.,andMoroniD.,2015.“UDlizingAdvancedITTechnologiestoSupportMUDRODtoAdvanceDataDiscoveryandAccess”,AGU,SanFrancisco,CA.
• YangC.,JiangY.,LY.,ArmstrongE.,HuangT.,andMoroniD.,2016.“MiningandUDlizingDatasetRelevancyfromOceanographicDataset(MUDROD)Metadata,UsageMetrics,andUserFeedbacktoImproveDataDiscoveryandAccess”,ESIPwintermeeDng2016,WashingtonD.C.
• JiangY.,YangC.,LY.,ArmstrongE.,HuangT.,andMoroniD.,2016.“AComprehensiveApproachtoDeterminingtheLinkageWeightsamongGeospaDalVocabularies-AnExamplewithOceanographicDataDiscovery”,AAG2016,SanFrancisco,CA.
• YangC.,JiangY.,LY.,ArmstrongE.,HuangT.,andMoroniD.,2016.“MiningandUDlizingDatasetRelevancyfromOceanographicDataset(MUDROD)Metadata,UsageMetrics,andUserFeedbacktoImproveDataDiscoveryandAccess”,PO.DAACUWG,Pasadena,CA.
• LY.,YangC.,JiangY.,ArmstrongE.,HuangT.,andMoroniD.,2016.“LeveragingcloudcompuDngtospeedupuseraccesslogmining”,Oceans16MTSIEEE,Monterey,CA.
• JiangY.,YangC.,LY.,ArmstrongE.,HuangT.,andMoroniD.,2017.“TowardsintelligentgeospaDaldiscovery:amachinelearningrankingframework”,AAG2017,Boston,MA.
• LY.,YangC.,JiangY.,ArmstrongE.,HuangT.,andMoroniD.,2017.“Ageographicrecommendersystemusingmetadataanduserfeedbacks”,AAG2017,Boston,MA.
PresentaDons
EarthScienceTechnologyForum(ESTF2017),June13-15,2017Pasadena,CA
1. NASAAISTProgram(NNX15AM85G)2. PO.DAACSWEETOntologyTeam(IniDallyfundedbyESTO)3. HydrologyDAACRahulRamachandran(providingtheearlier
versionofNOESIS)4. ESDISforprovidingtesDnglogsofCMR5. AllteammembersatJPLandGMU
Acknowledgements