Upload
others
View
3
Download
0
Embed Size (px)
Citation preview
VADA:ValueAddedDataSystems–PrinciplesandArchitecture
TheVADAArchitectureforCost-Effec>veDataWrangling
1. Automa*c Bootstrapping: Theuseriden>fiesacollec>onofsourcesand defines a target schema. Thesystem automa>cally orchestrates acollec>on of transducers thattogether generate an ini>al resultdataset.2.Datacontext:Theuserassociatesthe target schema with relatedextents(e.g.masterdata).Suchdataallows various of the steps frombootstrapping to be revisited,including matching and mappingvalida>on. Furthermore, it is nowalso possible to learn quality rules,and thereby to carry out repairs tothemappingresults.TheresultdatashouldnowbeofbeLerquality.
Demonstra>on
The VADA architecture: (i) combines wrangling components thatbetween themcover the completedatawrangling lifecycle; (ii)buildson automa>on wherever possible, using whatever informa>on isavailable;(iii)refinestheresultsofautomatedprocessesinthelightofuserfeedback;and(iv)takesintoaccounttheuser’spriori>es.UserInterface:Thedatascien>stprovidesinforma>onaboutthedatatheyneed,theirpriori>esandfeedback.Transducers:ComponentswithinputandoutputdependenciesdefinedasDatalog±rulesovertheknowledgebase.UserContext:Informa>onabouttherequirementsoftheuser.DataContext:Informa>onabouttheapplica>ondomain.VadalogReasoner:SupportsreasoningovertheknowledgebaseusingVadalog,amemberoftheDatalog±familyoflanguages.
End-To-EndDataWrangling
3.Feedback:Theuserprovidesfeedbacktoindicatethatsomeoftheresultsarecorrectorincorrect.Dependingonthefeedbackprovided,thiswillenablesomeofthepreviousstepsinthewranglingprocesstoberevisited,givingrisetoarevisedresult.4. User context: The result is now hopefully of reasonable quality, but thedata included in the resultmay not be especiallywell-suited to the task athand.Theuserspecifiestheusercontextbyindica>ngtherela>ve(pairwise)importanceofdifferentfeatures intheresult.Thepairwisecomparisonsareusedtoderiveweightsthatinformtheselec>onofmappingsbasedonmul>-dimensionalop>miza>on.
ARealEstateScenarioThe demonstra>on acts on realestatedata,bringingtogether:• Dataaboutproper>esforsale
fromwebdataextrac>onoverdeepwebsources.
• Open government data thatprovides informa>on aboutthe areas in which theproper>esarelocated.
1
43
2
www.vada.org.uk