http://latc-project.eu

D1.6.1 Interface Definitions for 24/7 Platform

Project GA No.: FP7-256975
Project acronym: LATC
Start date of project: 2010-09-01
Document due date: 2011-02-28
Actual date of delivery: 2011-02-28
Lead Partner: DERI
Reply to: Michael Hausenblas, [email protected]
Document status: FINAL


FP7-256975 LOD Around The Clock (LATC)


Project GA No.: FP7-256975
Project acronym: LATC
Project full title: Linking Open Data Around The Clock
Dissemination level: PU
Number of pages: 17
Task responsible: DERI
Other contributors: All partners
Author(s): Michael Hausenblas, Richard Cyganiak
EC Project Officer: Stefano Bertolo
Keywords: platform, design, interfaces, API, linksets


Table of Contents

Executive Summary

24/7 Platform Goals and Design
  1.1 Why Linking?
  1.2 Target User Community
  1.3 Datasets now and then
  1.4 Software as a Service vs. Software Distribution
  1.5 Design Assumptions and Goals

24/7 Platform User Roles
  1.6 End-user Roles
  1.7 Administrator Roles

Interface Definitions
  1.8 24/7 Platform Components
    1.8.1 Workbench
    1.8.2 Metadata Store (MDS)
    1.8.3 Data Source Inventory (DSI)
    1.8.4 Console API
    1.8.5 Console
    1.8.6 Crawler & Indexer
    1.8.7 Runtime
  1.9 Interfaces
    1.9.1 I1: Workbench – Console API
    1.9.2 I2: Workbench – MDS
    1.9.3 I3: Workbench – Crawler & Indexer
    1.9.4 I4: Console API – MDS
    1.9.5 I5: Console API – Runtime
    1.9.6 I6: MDS – Runtime
    1.9.7 I7: MDS – Crawler & Indexer
  1.10 Components Interplay
    1.10.1 Link Generation
    1.10.2 Quality Assurance
  1.11 Dependencies and External Interfaces

Appendix A


Executive Summary

This deliverable motivates the LATC 24/7 Platform design and design goals. It defines the Platform scope as well as the target users. The 24/7 Platform components are introduced, the interfaces between the components are defined and the workflow to generate links using the 24/7 Platform is described. This deliverable establishes an initial understanding of the components and the interfaces and is refined over the project’s runtime in various deliverables (highlighted in the respective places).

24/7 Platform Goals and Design

The LATC 24/7 Interlinking Platform (24/7 Platform, in short) produces linksets [1] based on two major inputs: i) link specifications, and ii) datasets. In the following we describe the why, what, and how of the 24/7 Platform.

1.1 Why Linking?

Linked Data enables lightweight and straightforward data integration scenarios. This is mainly achieved through providing explicit, typed connections between entities in different datasets. An application using different datasets can pull related entities directly (based on the links) from different datasets to solve a certain task. In contrast, with datasets that are not interlinked, one quite often has to use out-of-band information to integrate entities. In a sense, the Linked Data ecosystem – including the LOD cloud, indexer, etc. – is an example of a Data Space Support Platform (DSSP [2]). One of the distinct features of Linked Data is its inherent support for data discovery. By following the links in the LOD cloud, one is able to explore new, related data items that can in turn be integrated, if desired. The type of the link can be used to determine if and how to integrate the “target entity”.

1.2 Target User Community

Experience shows that designing a system without a concrete user group in mind is counterproductive. We have hence identified the primary user group of the 24/7 Platform:

“The community of people with an interest in or that deal with EU-level data”

This group, referred to as EU data users in the following, includes:

  • Original data owners: in general all European institutions and agencies that have data (in all forms, including PDFs, spreadsheets, etc.), such as Eurostat, EEA, etc.
  • Application developers: people who develop (Web) applications and want to benefit from the available Linked Open Data.
  • Data analysts: people who want to use the Linked Open Data to discover relations between entities or perform experiments with it, for example journalists or researchers.
  • Linked Data enthusiasts: early adopters, Semantic Web researchers and practitioners, software developers and data engineers who contribute to the infrastructure and carry out schema-level and data-level tasks (quality, verification, exploration, etc.).

[1] http://www.w3.org/2001/sw/interest/void/#linkset
[2] http://portal.acm.org/citation.cfm?id=1107502

In addition to the above, we note that EU country-level initiatives concerning Linked Open Data publishing (for example, as reported in “Technical workshop on the goals and requirements for a pan-European data portal” [3]) are in scope for the 24/7 Platform, especially in EU member countries that already have an active community, such as the UK. The latter is of importance, as early adopters and testers will very likely be recruited from the pool of people who already have experience with Linked Open Data and can provide suggestions regarding new features and optimisation.

Furthermore, we expect interlinking to happen not only between EU-level datasets, but also from and to the country-level datasets. For example, in the statistical domain, both Eurostat and national statistics bodies provide respective datasets, making them a logical target of mutual interlinking.

Naturally, the usage of the 24/7 Platform is not limited to the EU data users; it is, however, primarily designed to be used by them.

1.3 Datasets now and then

For the primary 24/7 Platform user group identified above, the EU data users, we anticipate a development of the datasets as depicted in Fig. 1.

Figure 1 – EU-level dataset development.

[3] http://cordis.europa.eu/fp7/ict/content-knowledge/docs/report-ws-pan-eu-dat-porta_en.pdf


From the LATC point of view, this means:

  • Before project start (end of 2010), a few, mainly experimental LOD datasets, such as Eurostat, are available.
  • In late 2012, at project end, more than 20 new LOD datasets in the EU-level data area have been made available through LATC WP2, as well as some more directly by the original data owners (with LATC support, for example through the PUBLINK programme together with the LOD2 project).
  • In 2020, the majority of EU data owners publish their data into the LOD cloud; nevertheless, the LATC-provided datasets still act as a backbone.

1.4 Software as a Service vs. Software Distribution

The 24/7 Platform can be understood as SaaS or as a software distribution, that is, individual components (as discussed below) with well-defined interfaces to interact. In general, Cloud Computing is typically divided into three layers (Fig. 2).

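Figure 2 – Cloud Computing categorisation (layers: IaaS, PaaS, SaaS; source: Gartner, “Cloud Computing as Gartner Sees It”, AADI Summit, Dec 2009).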

The 24/7 Platform is in this sense a SaaS, providing users with the functionality to produce links between LOD datasets, with all the advantages that come with it, including scalability, reliability and convenience.

In addition to the generic SaaS attributes, the 24/7 Platform provides (in contrast to the software distribution) some specific advantages that stem from the integrated Sindice Crawler & Indexer and the bespoke way datasets are used in the Platform.

1.5 Design Assumptions and Goals

The primary goal of the 24/7 Platform is to support the user community defined above, the EU data users. A number of secondary goals exist, which include but are not limited to:

  • Provide a demonstration and verification of the Software Distribution.
  • Enlarge the Linked Open Data cloud.
  • Craft high-quality, global linksets for general use.
  • Establish incentives for others to publish their data, raise awareness, and sharpen the link quality concept.

The key design goals for the 24/7 Platform, in order to allow the EU data users to create and use linksets, are:

  1. Make the 24/7 Platform easy to use, hence posing a low entry barrier.
  2. Provide a fast and tight feedback loop for link generation.
  3. Keep the coupling between components to a minimum.
  4. Ensure quality through separation into personal and public workspaces.

24/7 Platform User Roles

We have identified two kinds of roles in the 24/7 Platform: LATC end-users and LATC administrators. The former are typically representatives from the EU data users group, the latter from within the LATC project.

1.6 End-user Roles

We further differentiate the end-user role into:

  • Link Consumer, a casual end-user who is mainly interested in using the links produced in the 24/7 Platform, and
  • Link Author, a type of power user who produces links and is also interested in using their own links as well as links produced by others.

1.7 Administrator Roles

The LATC admin role is sub-divided into:

  • Operator, focusing on the overall monitoring, maintenance and functioning of the 24/7 Platform, incl. minimising down-times, (re)starting and upgrading components, and managing users.
  • Linkset Reviewer, performing the Quality Assessment by reviewing generated linksets and vetting them accordingly.


Interface Definitions

In the following, the 24/7 Platform components, their interaction and overall interplay are explained, along with the user roles defined above.

1.8 24/7 Platform Components

We briefly summarise the function of each component here. Where applicable, screenshots of the current state of UI components are provided. An overview of the 24/7 Platform is depicted in Fig. 3, showing all components conceptually, the system boundaries as well as the external interactions.

Figure 3 – 24/7 Platform overview.

1.8.1 Workbench

The LATC Workbench allows creating link specifications and is typically used by a Link Author. It is a specialised version of the Silk Workbench, operated by FUB. The Workbench provides both a UI component and a backend component to handle Reference Linksets. A Link Author constructs one or more link tasks in the Workbench and typically uses Reference Linksets to assess the quality of the links produced: to enable this, the Workbench operates a local version of Silk, allowing the Link Author to preview a generated Linkset. The current state of the Workbench is shown in Fig. 4 and Fig. 5.
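The deliverable does not spell out how a generated Linkset is compared with a Reference Linkset. A minimal sketch, assuming both are treated as plain sets of (source URI, target URI) pairs and that only the positive reference links are used, could look as follows. The function name and the example URIs are hypothetical.

```python
from typing import Set, Tuple

Link = Tuple[str, str]  # (source URI, target URI)


def precision_recall(generated: Set[Link], reference: Set[Link]) -> Tuple[float, float]:
    """Compare generated links with the positive links of a reference linkset."""
    correct = generated & reference
    precision = len(correct) / len(generated) if generated else 0.0
    recall = len(correct) / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical example data, not taken from an actual LATC link run.
reference = {
    ("http://example.org/a/1", "http://example.org/b/1"),
    ("http://example.org/a/2", "http://example.org/b/2"),
    ("http://example.org/a/3", "http://example.org/b/3"),
    ("http://example.org/a/4", "http://example.org/b/4"),
}
generated = {
    ("http://example.org/a/1", "http://example.org/b/1"),
    ("http://example.org/a/2", "http://example.org/b/2"),
    ("http://example.org/a/3", "http://example.org/b/9"),
}

precision, recall = precision_recall(generated, reference)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Two of the three generated links are in the reference set, and two of the four reference links were found, so the sketch reports a precision of about 0.67 and a recall of 0.50.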


Figure 4 – The LATC Workbench: workspace.

Figure 5 – The LATC Workbench: editor.

1.8.2 Metadata Store (MDS)

The LATC Metadata Store (MDS) is the central hub for all dataset (DS) and Linkset (LS) metadata in the 24/7 Platform. It is a backend component, operated by TALIS, and deals with:

  • List of curated datasets (C-DS) from CKAN
  • List of host-based datasets (H-DS) from Sindice
  • Sindice-coverage statistics for datasets
  • Metadata for generated Linksets, including precision, recall and a pointer to the Reference Linkset.

Internally, the MDS uses VoID [4] to represent DS/LS metadata and to take the C-DS via CKAN into account. It is assumed that C-DS are maintained entirely via CKAN.

[4] http://www.w3.org/2001/sw/interest/void/


The MDS provides a feed of vetted linksets to other components and also to external users.

Additionally, the MDS acts as the backend for the Data Source Inventory (DSI). The DSI and the MDS communicate via an internal protocol, which is not in scope of this document.

1.8.3 Data Source Inventory (DSI)

The LATC Data Source Inventory (DSI) is a UI component operated by TALIS. It supports the following use cases:

  • Allows Link Authors to find datasets to link against.
  • Enables a Link Author to study example resources in order to decide how to write a link specification or whether a link specification is feasible.
  • Helps a Link Consumer to find interesting LATC-generated Linksets.
  • Notifies a Link Consumer about re-generated Linksets via feeds.
  • Enables any user to explore all available datasets.

An early version of the DSI is shown in Fig. 6.

Figure 6 – The LATC Data Source Inventory (DSI).

The M12 deliverable D1.2.1 First Deployment of Data Source Inventory will detail the MDS and DSI as introduced here.

1.8.4 Console API

The LATC Console API controls the execution of link tasks towards the Runtime and acts as an intermediary towards the Workbench. It is a backend component, operated by VUA, and deals with:

  • A list of link tasks to be executed
  • The status of the link runs

Additionally, the Console API acts as the backend for the Console; these two components communicate via an internal protocol, which is not in scope of this document.


The Console API exposes an HTTP API [5] that supports the following operations:

Link Tasks

  GET api/tasks
      Returns the ordered list of configuration files to run. The answer is a JSON array with entries indicating the UUID (“identifier”) and full name (“title”) of the configuration file.

  GET api/task/{UUID}/configuration
      Returns the XML configuration file associated with this UUID.

  PUT api/task/{UUID}/configuration
      Updates the item with a new configuration file whose content is passed in a form variable “configuration”.

  DELETE api/task/{UUID}
      Deletes a configuration file. When executed, the file is removed from the database and from the running queue. All the associated reports are also deleted.

  GET api/task/{UUID}
      Returns extra information about the configuration with the indicated UUID. This currently consists of the title (“title”), a long description (“description”), the identifier (“identifier”) and the position in the processing queue (“position”).

  POST api/tasks
      Proposes an XML file for addition. The file is passed as a multi-part form element with the name “fileToUpload”. Upon insertion, an upload report is automatically generated and the configuration is added to the end of the queue.

Link Runs

  GET api/task/{UUID}/notifications
      Returns a JSON array of reports for the configuration under UUID.

  POST api/task/{UUID}/notifications
      Creates a new report. The API expects a form with the parameters “message” and “severity”. An optional JSON array can be stored in the variable “data”. The date is automatically set to the date and time at which the report was uploaded.
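To illustrate how a client might address the operations listed above, the following sketch builds (but does not send) two requests with the Python standard library and parses a task list. The host is the Console address given in Appendix A; that the API paths hang directly under it, the helper names, the sample UUID and the severity value "info" are assumptions made for illustration only.

```python
import json
from urllib.parse import urlencode, urljoin
from urllib.request import Request

# Console host taken from the example Runtime configuration in Appendix A;
# whether the API lives directly under this path is an assumption.
CONSOLE_HOST = "http://fspc409.few.vu.nl/LATC-console/"


def list_tasks_request(host: str = CONSOLE_HOST) -> Request:
    """Build the request for GET api/tasks."""
    return Request(urljoin(host, "api/tasks"), method="GET")


def notification_request(uuid: str, message: str, severity: str,
                         host: str = CONSOLE_HOST) -> Request:
    """Build the request for POST api/task/{UUID}/notifications."""
    form = urlencode({"message": message, "severity": severity}).encode("utf-8")
    url = urljoin(host, f"api/task/{uuid}/notifications")
    return Request(url, data=form, method="POST")


def parse_task_list(body: str) -> list:
    """Extract (identifier, title) pairs from the JSON array of api/tasks."""
    return [(entry["identifier"], entry["title"]) for entry in json.loads(body)]


# Hypothetical response body; the UUID and title are made up.
sample = '[{"identifier": "0000-demo", "title": "dbpedia-lgd_island"}]'
tasks = parse_task_list(sample)
req = notification_request(tasks[0][0], "42 links generated", "info")
print(req.get_method(), req.full_url)
```

Passing such a request to `urllib.request.urlopen` would perform the actual call; the sketch stops short of that so it can be run without network access.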

1.8.5 Console

The LATC Console is the main access point for an Operator, providing status information about the 24/7 Platform, including health, link runs, errors, quality measures, etc., and control options for link tasks. The Console is a UI component operated by VUA. The current version of the Console is shown in Fig. 7.

[5] Note that the URIs are formatted according to the IETF draft on ‘URI Templates’, see http://tools.ietf.org/id/draft-gregorio-uritemplate


Figure 7 – The LATC Console.

1.8.6 Crawler & Indexer

The Sindice Crawler & Indexer is a backend component operated by DERI. It provides access to host-based datasets (H-DS) and is described separately in the M6 deliverable D1.1 Deployment of Crawler and Indexer Module.

1.8.7 Runtime

The LATC Runtime is a backend component operated by DERI. The Runtime uses a Silk MapReduce version and Hadoop. It takes a list of link tasks and produces Linksets along with metadata (in VoID) as well as log information, collectively known as the link run.

The M12 deliverable D1.2.1 First Deployment of Linking Engine will detail the Runtime as introduced here.

1.9 Interfaces

In order to function, a number of components in the 24/7 Platform need to communicate with each other via a defined interface. An interface in this context is a defined communication exchange between two components. The initial interfaces as of the time of writing are captured in Table 1.

Legend: X … not applicable; – … not defined; Ik … defined interface k

                   Workbench  DSI  Console  Console API  MDS  Runtime  Crawler & Indexer
Workbench          X          –    –        I1           I2   –        I3
DSI                X          X    –        –            –    –        –
Console            X          X    X        –            –    –        –
Console API        X          X    X        X            I4   I5       –
MDS                X          X    X        X            X    I6       I7
Runtime            X          X    X        X            X    X        –
Crawler & Indexer  X          X    X        X            X    X        X

Table 1 – Component Coupling Matrix.
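The coupling matrix is symmetric in meaning (only the upper triangle is filled), so one way to work with it in code is a lookup keyed by unordered component pairs. The sketch below is an illustration of that idea; the dictionary and function names are hypothetical, while the pairs and interface identifiers are those of Table 1.

```python
from typing import Optional

# Defined interfaces from Table 1, keyed by the unordered pair of components.
INTERFACES = {
    frozenset({"Workbench", "Console API"}): "I1",
    frozenset({"Workbench", "MDS"}): "I2",
    frozenset({"Workbench", "Crawler & Indexer"}): "I3",
    frozenset({"Console API", "MDS"}): "I4",
    frozenset({"Console API", "Runtime"}): "I5",
    frozenset({"MDS", "Runtime"}): "I6",
    frozenset({"MDS", "Crawler & Indexer"}): "I7",
}


def interface_between(a: str, b: str) -> Optional[str]:
    """Return the interface identifier for two components, or None if undefined."""
    return INTERFACES.get(frozenset({a, b}))


print(interface_between("Runtime", "MDS"))    # I6
print(interface_between("DSI", "Workbench"))  # None
```

Using `frozenset` keys means the argument order does not matter, which mirrors the fact that each interface is listed once in the matrix.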


1.9.1 I1: Workbench – Console API

The Workbench submits a list of link tasks through the Console API and learns about the link runs via an Atom feed.

1.9.2 I2: Workbench – MDS

The Workbench requires metadata of the datasets to be linked, including name and access methods (SPARQL endpoint, etc.). This dataset metadata is provided by the MDS. The Workbench uses SPARQL to query the MDS, which provides the metadata expressed in VoID.
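As an illustration of what such a metadata lookup might look like, the sketch below builds a SPARQL Protocol GET URL for a query over VoID descriptions. The endpoint address is a placeholder, and the use of dcterms:title for the dataset name is an assumption; the deliverable only states that the metadata is expressed in VoID and queried via SPARQL.

```python
from urllib.parse import urlencode

# Hypothetical MDS SPARQL endpoint; the deliverable does not give its address.
MDS_ENDPOINT = "http://example.org/mds/sparql"

# Lists datasets that advertise a SPARQL endpoint, using VoID and Dublin Core terms.
QUERY = """
PREFIX void: <http://rdfs.org/ns/void#>
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT ?dataset ?title ?endpoint WHERE {
  ?dataset a void:Dataset ;
           void:sparqlEndpoint ?endpoint .
  OPTIONAL { ?dataset dcterms:title ?title }
}
""".strip()


def sparql_get_url(endpoint: str, query: str) -> str:
    """Build a SPARQL Protocol GET URL with the query in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


url = sparql_get_url(MDS_ENDPOINT, QUERY)
print(url[:60])
```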

1.9.3 I3: Workbench – Crawler & Indexer

For the preview of the Linksets, the Workbench needs access to the content of H-DS, provided through the Sindice SPARQL endpoint.

1.9.4 I4: Console API – MDS

The Console API needs the following information from the MDS, provided via SPARQL:

  • Access information, e.g., SPARQL endpoint location
  • Link run statistics, including precision/recall
  • Dataset modification status

1.9.5 I5: Console API – Runtime

The Runtime retrieves a list of link tasks via the Console API and sends status information per link run, again using the Console API.

Upon execution, the LATC Runtime checks the Console API for link tasks that require execution. If any link tasks require execution, the Runtime retrieves the link tasks from the Console API and launches appropriate Silk MapReduce jobs. The Runtime takes care of parallelizing jobs where possible. Upon job completion, the Runtime posts a response back to the Console API and triggers any further required actions (such as loading the results into the Linkset API, etc.).

[Interface diagrams for I1 to I5: Workbench – Console API (link task, LRN feed); Workbench – Metadata Store (DS desc); Workbench – Sindice Crawler & Indexer (DS); Console API – Metadata Store (status); Console API – LATC Runtime (link task list, status).]
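The polling behaviour described for I5 can be sketched as a single cycle that fetches the task list, runs each task and reports the outcome. This is a simplified, sequential illustration in Python (the actual Runtime is written in Java and parallelizes jobs where possible); the callables stand in for the HTTP calls and the Silk MapReduce job, and matching the blacklist on the task title as well as the severity values are assumptions.

```python
from typing import Callable, Dict, Iterable, List, Set


def run_pending_tasks(
    fetch_tasks: Callable[[], Iterable[Dict[str, str]]],
    run_job: Callable[[str], int],
    post_status: Callable[[str, str, str], None],
    blacklist: Set[str],
) -> List[str]:
    """Run one polling cycle and return the identifiers that were executed."""
    executed = []
    for task in fetch_tasks():
        uuid, title = task["identifier"], task["title"]
        if title in blacklist:
            continue  # skip specifications known to time out or fail
        try:
            links = run_job(uuid)
            post_status(uuid, f"{links} links generated", "success")
        except Exception as error:  # report any failure back to the Console API
            post_status(uuid, str(error), "error")
        executed.append(uuid)
    return executed


# Stand-ins for the HTTP calls and the Silk MapReduce job (all hypothetical).
def fake_fetch():
    return [
        {"identifier": "t1", "title": "dbpedia-lgd_island"},
        {"identifier": "t2", "title": "dbpedia-lgd_city"},
    ]


def fake_job(uuid: str) -> int:
    return 9139


reports = []
done = run_pending_tasks(
    fake_fetch, fake_job, lambda u, m, s: reports.append((u, m, s)), {"dbpedia-lgd_city"}
)
print(done, reports)
```

Injecting the three callables keeps the control flow testable without a running Console or Hadoop cluster.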

1.9.6 I6: MDS – Runtime

The Runtime might need dataset access information (such as SPARQL endpoint locations for C-DS) and needs to know from the MDS if datasets were modified. This information is retrieved via SPARQL. Further, the Runtime informs the MDS about newly generated Linksets via VoID/SPARQL Update.

1.9.7 I7: MDS – Crawler & Indexer

The MDS retrieves a list of H-DS and the coverage of H-DS via the Sindice API.

1.10 Components Interplay

In the following, the overall workflow and components interplay to generate a Linkset is explained. A core principle is that the dataset URIs of the datasets involved in the link tasks are passed around in the 24/7 Platform. This ensures that all components have a shared understanding of the datasets and can pull the relevant information for their tasks from the MDS, if needed.

1.10.1 Link Generation

A Link Author (LA) either creates a dataset or decides to interlink existing datasets from the LOD cloud. The LA either manually enters the dataset into CKAN (turning it into a C-DS) or lets it be indexed via Sindice/Sitemaps [6] (H-DS). Then, the LA uses the 24/7 Platform to create links:

  1. The LA selects the datasets to be linked from the DSI.
  2. The LA creates a Link task in the personal workspace.
  3. The LA either creates Reference Linksets in the Workbench or uploads existing Reference Linksets into the Workbench.
  4. The LA previews Linksets and the quality assessment based on the Reference Linkset in the Workbench.
  5. The Workbench transfers the link task to the Console.
  6. The Console informs the Runtime of pending link tasks and receives the status of performed link runs.
  7. The Console notifies the Workbench about link runs.

[6] http://sindice.com/developers/publishing

[Interface diagrams for I6 and I7: Metadata Store – LATC Runtime (LS updates, DS metadata); Metadata Store – Sindice Crawler & Indexer (H-DS list, coverage).]


1.10.2 Quality Assurance

The LATC Description of Work mentions a Quality Assurance (QA) Module. The experience gathered with the experimental 24/7 Platform set up in the first six months has shown that it is more realistic for several components to perform QA together; this is what we call the internal QA.

Additionally, in what is now known as the external QA, a number of approaches (based on discovering dead links, examining the Billion Triple Challenge datasets, game-based discovery, etc.) will be used to enhance the internal QA. This will be detailed in the M12 deliverable D1.4.1 First Deployment of QA Module.

Typically, after having produced a Linkset in the personal workspace, a Link Author wants to make the Linkset available to the wider public. This is where the internal QA kicks in: if a Link Author decides to submit his or her Linksets to the public workspace, the following happens:

  1. The Link Author submits a previously generated Linkset to the public workspace.
  2. The Linkset Reviewer uses the Workbench to gather a list of pending Linksets for the public workspace.
  3. The Linkset Reviewer uses the MDS to mark a Linkset as vetted. Only vetted Linksets are shown in the DSI.
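The three steps above can be read as a small state machine for a Linkset. The sketch below is one possible way to model it; the state and action names are hypothetical and chosen for illustration, and only the rule that vetted Linksets alone are shown in the DSI comes from the text.

```python
from enum import Enum


class LinksetState(Enum):
    PERSONAL = "personal"  # generated in the Link Author's personal workspace
    PENDING = "pending"    # submitted to the public workspace, awaiting review
    VETTED = "vetted"      # marked as vetted by a Linkset Reviewer


# Allowed transitions, following the three steps of the internal QA.
TRANSITIONS = {
    (LinksetState.PERSONAL, "submit"): LinksetState.PENDING,
    (LinksetState.PENDING, "vet"): LinksetState.VETTED,
}


def advance(state: LinksetState, action: str) -> LinksetState:
    """Apply an action and return the new state, or raise if it is not allowed."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(f"cannot {action} a linkset in state {state.value}") from None


def visible_in_dsi(state: LinksetState) -> bool:
    """Only vetted Linksets are shown in the DSI."""
    return state is LinksetState.VETTED


state = advance(LinksetState.PERSONAL, "submit")
print(state.value, visible_in_dsi(state))
state = advance(state, "vet")
print(state.value, visible_in_dsi(state))
```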

1.11 Dependencies and External Interfaces

There are three external interfaces provided and/or used by the 24/7 Platform:

  • The CKAN API, used by the MDS. DERI collaborates with the OKF directly and via LOD2 to ensure the necessary information is provided in a sustainable way.
  • The LOD cloud, which is crawled and indexed by Sindice as described in the M6 deliverable D1.1 Deployment of Crawler and Indexer Module.
  • The Linkset API, where the linksets themselves are provided. It will be part of Sindice (the metadata in the form of VoID descriptions is maintained by the MDS), which also ensures that the produced linksets are indexed in turn.


Appendix A

The interfaces listed above have been derived from an experimental setup of the 24/7 Platform. This setup was performed within the first six months of the LATC project and is, along with observations, described in the following.

The experimental LATC Runtime (available via the LATC SourceForge repository [7]) is written in Java and packaged as a JAR file. It communicates with the Console (also available via the SF repo and online [8]) through an HTTP API for getting Silk link specification files and posting link run results. The MapReduce job runs on the Hadoop Distributed File System (HDFS), producing linksets and linkset metadata [9].

The LATC Runtime has the following inputs:

  • Link specification: the Silk link specification in XML format, which is obtained from the Console.
  • Blacklist file: a list of link specifications that should not be selected for execution (time-outs or failures).
  • A configuration that can be provided as a file or via the command line.

The output of the Runtime is threefold: a linkset, a respective description of the linkset in VoID and a link run status report. There are two kinds of status reports:

  • The success report contains how many links were generated as well as the base URI of the linkset.
  • The failed report mentions the reason why a link run could not generate links, such as SPARQL endpoint down or timeout, HDFS problem or invalid XML.

At present, the specification files are provided by the Console, covering 22 files, where 12 files were executed successfully, 8 files failed due to the SPARQL endpoint and invalid specification files, and four files took too long to generate the linkset. The average time for executing a link run is 326.75 seconds.

Example Runtime configuration file:

    HADOOP_PATH = hadoop-0.20.2
    HDFS_USER = xxx
    LATC_CONSOLE_HOST = http://fspc409.few.vu.nl/LATC-console/
    LINKS_FILE_STORE = links.nt
    RESULTS_HOST = http://demo.sindice.net/latctemp
    RESULT_LOCAL_DIR = results
    SPEC_FILE = spec.xml
    VOID_FILE = void.ttl

Example Runtime blacklist file:

    climb_silk_link_spec
    db-geolinkd-boris
    dbpedia_drugbank_drugs
    dbpedia-lgd_city
    dbpedia-lgd_city2
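Both example files above are line-based, which makes them straightforward to read. The following sketch shows one way to parse them, assuming one "KEY = VALUE" pair per line in the configuration and one specification name per line in the blacklist. It is written in Python for illustration only (the actual Runtime is a Java program), and the function names are hypothetical.

```python
from typing import Dict, Set


def parse_config(text: str) -> Dict[str, str]:
    """Parse 'KEY = VALUE' lines, ignoring blank lines."""
    config = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, _, value = line.partition("=")
        config[key.strip()] = value.strip()
    return config


def parse_blacklist(text: str) -> Set[str]:
    """Return the set of blacklisted link specification names, one per line."""
    return {line.strip() for line in text.splitlines() if line.strip()}


config = parse_config(
    "HADOOP_PATH = hadoop-0.20.2\n"
    "LATC_CONSOLE_HOST = http://fspc409.few.vu.nl/LATC-console/\n"
    "SPEC_FILE = spec.xml\n"
)
blacklist = parse_blacklist("dbpedia-lgd_city\ndbpedia-lgd_city2\n")

print(config["LATC_CONSOLE_HOST"])
print("dbpedia-lgd_city" in blacklist)
```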

[7] http://sourceforge.net/projects/latc/
[8] http://fspc409.few.vu.nl/LATC-console/
[9] http://demo.sindice.net/latctemp


Example VoID file, describing the generated linkset (for the dbpedia-lgd_island.xml link specification):

    @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
    @prefix owl: <http://www.w3.org/2002/07/owl#> .
    @prefix void: <http://rdfs.org/ns/void#> .
    @prefix : <#> .

    :dbpedia a void:Dataset;
        void:sparqlEndpoint <http://live.dbpedia.org/sparql/>;
        .

    :linkedgeodata a void:Dataset;
        void:sparqlEndpoint <http://linkedgeodata.org/sparql/>;
        .

    :dbpedia2linkedgeodata a void:Linkset ;
        void:linkPredicate owl:sameAs;
        void:target :dbpedia;
        void:target :linkedgeodata ;
        void:triples 9139;