Page 1: Linked Data Queries as Jigsaw Puzzles: a Visual Interface ... · Linked Data Queries as Jigsaw Puzzles: a Visual Interface for SPARQL Based on Blockly Library Paolo Bottoni Sapienza,

Linked Data Queries as Jigsaw Puzzles: a Visual Interface for SPARQL Based on Blockly Library

Paolo Bottoni
Sapienza, University of Rome

Roma, [email protected]

Miguel Ceriani
Sapienza, University of Rome

Roma, [email protected]

ABSTRACT
SPARQL is a powerful query language for Semantic Web data sources, but it is quite complex to master. The jigsaw puzzle metaphor has been successfully used in Blockly to teach programming to kids. We discuss its applicability to the problem of building SPARQL queries through the presentation of a dedicated Blockly-based visual user interface.

CCS Concepts
• Information systems → World Wide Web; • Human-centered computing → Empirical studies in HCI;

Keywords
block programming; visual query language; SPARQL; Linked Data; RDF

1. INTRODUCTION
Over the past few years, the Web has been evolving from an interlinked set of documents into an interlinked web of data and services. This trend follows the long-standing vision of the Semantic Web [3] and Linked Data [1]. The structured data available online are increasing in both quantity and diversity [4], thanks in part to the comprehensive data model proposed by the World Wide Web Consortium (W3C) as part of the Semantic Web effort: the Resource Description Framework (RDF) [6].

A key advantage of the Linked Data model is its support for serendipitous exploration and reuse of existing data. Potentially, any domain expert could create a tailored view of a set of Linked Data sources. In practice, however, exploring and querying Linked Data is not trivial and usually requires advanced programming skills.

Block programming languages, in which code is written by dragging and connecting fragments shaped like jigsaw puzzle pieces, have been successfully used to introduce programming to non-experts. Recently, millions of users have been exposed to the basics of programming using Blockly [8] as part of code.org's Hour of Code. The same metaphor was used in Scratch [13] to create animations and games, and in MIT App Inventor [16] to build Android apps.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

CHItaly 2015, September 28 - 30, 2015, Rome, Italy
© 2015 Copyright held by the owner/author(s). Publication rights licensed to ACM.

ISBN 978-1-4503-3684-0/15/09...$15.00

DOI: http://dx.doi.org/10.1145/2808435.2808467

In this paper, we propose to use the block programming paradigm to design queries on Linked Data sources. The language mimics the (textual) syntax of SPARQL [10], the standard query language for RDF. Compared to previous uses of block programming languages, this proposal addresses novel challenges arising from two specific properties: 1) the heterogeneous nature of Linked Data, which requires the ability to explore graph datasets even without any a priori knowledge; 2) the structural difference between the procedural imperative languages for which this paradigm was previously used and a functional query language such as SPARQL.

2. BACKGROUND
We start by introducing the basics of the RDF data model, the SPARQL language, and the Blockly library.

The Resource Description Framework (RDF) [6] is the data model proposed by the W3C for the Semantic Web [3]. In this model, knowledge is represented via RDF statements about resources, where a resource can be anything in the "universe of discourse", e.g. documents, people, physical objects, abstract concepts. An RDF statement is represented by an RDF triple, composed of a subject (a resource), a predicate (also specified by a resource), and an object (a resource or a literal, i.e. a value of a basic type). An RDF graph is therefore a set of RDF triples. Resources are uniquely identified by an Internationalized Resource Identifier (IRI), a generalization of the Uniform Resource Identifier (URI) [2], or by an identifier local to the RDF graph if they have no meaning outside the local context (in which case they are called blank nodes). The resources used to specify predicates are called properties. A resource may have one or more types, given by the predefined property rdf:type. In RDF serializations, prefixes can be used in place of the initial part of an IRI, which represents a specific namespace for a vocabulary or set of resources; prefix declarations at the beginning of a document or query associate prefixes with namespaces. A namespace can be associated with any prefix, but the most common namespaces are usually abbreviated in the same way (e.g., the namespace http://www.w3.org/1999/02/22-rdf-syntax-ns# is almost always abbreviated as rdf:). The website prefix.cc¹ holds a collectively maintained set of standard mappings between prefixes and namespaces. The mappings can be accessed through a Web user interface or a Web API, or downloaded in several formats.

¹ http://prefix.cc/
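As an illustration of the data model only (not code from the tool), the following Python sketch stores an RDF graph as a set of (subject, predicate, object) triples, using full IRIs as plain strings and a hypothetical `Literal` wrapper for literal values; blank nodes are omitted. The DBpedia IRIs serve purely as example data.

```python
from dataclasses import dataclass

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

@dataclass(frozen=True)
class Literal:
    """Hypothetical wrapper for a literal value (allowed only as an object)."""
    value: object

GODFATHER = "http://dbpedia.org/resource/The_Godfather"

# An RDF graph is a set of (subject, predicate, object) triples.
# IRIs are plain strings here; literals are wrapped in Literal.
graph = {
    (GODFATHER, RDF_TYPE, "http://dbpedia.org/ontology/Film"),
    (GODFATHER, RDFS_LABEL, Literal("The Godfather")),
    (GODFATHER, "http://dbpedia.org/ontology/director",
     "http://dbpedia.org/resource/Francis_Ford_Coppola"),
}

def types_of(graph, resource):
    """Return the types assigned to `resource` through rdf:type."""
    return {o for (s, p, o) in graph if s == resource and p == RDF_TYPE}

def is_well_formed(triple):
    """A literal may appear only in object position."""
    s, p, _ = triple
    return not isinstance(s, Literal) and not isinstance(p, Literal)

print(types_of(graph, GODFATHER))             # {'http://dbpedia.org/ontology/Film'}
print(all(is_well_formed(t) for t in graph))  # True
```

Because a graph is a set, adding the same triple twice has no effect, which matches the definition of an RDF graph given above.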

SPARQL [10] is the standard query language for RDF datasets. The basic building block of SPARQL is the Triple Pattern, a triple in which each of the components can be replaced by a variable. A Basic Graph Pattern is a set of triple patterns associated with a specific input graph (the default graph, a graph named by a URI, or a generic named graph referred to via a variable). The Basic Graph Pattern is matched against the input dataset, and the result is a multiset of tuples, each tuple giving a binding for each of the variables. The relations generated through basic graph patterns can be filtered, composed, or grouped using relational operators. The final result of a SPARQL SELECT query (one of the available query types, and the one considered in this work) is a multiset of tuples that can optionally be ordered, making it a list of tuples.
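The matching of a Basic Graph Pattern can be sketched as a nested-loop join: each triple pattern extends the partial bindings produced by the previous ones. The Python code below is a naive illustration under simple assumptions (variables are strings starting with `?`, terms are written in prefixed notation, literals are plain strings, the graph is a list of example triples); real SPARQL engines use far more efficient evaluation strategies.

```python
def is_var(term):
    """In this sketch, variables are strings starting with '?'."""
    return isinstance(term, str) and term.startswith("?")

def match_pattern(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    extended = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            if extended.get(p_term, t_term) != t_term:
                return None  # variable already bound to another value
            extended[p_term] = t_term
        elif p_term != t_term:
            return None      # constant term does not match
    return extended

def match_bgp(bgp, graph):
    """Naive evaluation of a basic graph pattern: one binding dict per solution."""
    solutions = [{}]
    for pattern in bgp:
        next_solutions = []
        for binding in solutions:
            for triple in graph:
                extended = match_pattern(pattern, triple, binding)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions

# Toy data in prefixed notation; literals are plain strings without '?'.
graph = [
    ("dbpedia:The_Godfather", "rdf:type", "dbo:Film"),
    ("dbpedia:The_Godfather", "rdfs:label", "The Godfather"),
    ("dbpedia:The_Godfather", "dbo:director", "dbpedia:Francis_Ford_Coppola"),
    ("dbpedia:Apocalypse_Now", "rdf:type", "dbo:Film"),
    ("dbpedia:Apocalypse_Now", "rdfs:label", "Apocalypse Now"),
    ("dbpedia:Apocalypse_Now", "dbo:director", "dbpedia:Francis_Ford_Coppola"),
    ("dbpedia:Jaws", "rdf:type", "dbo:Film"),
    ("dbpedia:Jaws", "rdfs:label", "Jaws"),
    ("dbpedia:Jaws", "dbo:director", "dbpedia:Steven_Spielberg"),
]

bgp = [
    ("?film", "rdf:type", "dbo:Film"),
    ("?film", "rdfs:label", "?label"),
    ("?film", "dbo:director", "dbpedia:Francis_Ford_Coppola"),
]

# ORDER BY ?label LIMIT 3 turns the multiset of solutions into a short list.
rows = sorted(match_bgp(bgp, graph), key=lambda b: b["?label"])[:3]
print([r["?label"] for r in rows])  # ['Apocalypse Now', 'The Godfather']
```

Each solution is a dictionary from variable names to values, so the list returned by `match_bgp` plays the role of the multiset of tuples described above; sorting and slicing correspond to the optional ordering and limit.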

Blockly [8] is a JavaScript library maintained by Google and based on previous work at MIT. It provides a set of basic blocks covering the structure of typical imperative programs. Most importantly, it can be extended programmatically to define new blocks. The interface of a Blockly application presents: the workspace, the area where blocks can be dragged to and composed; and the toolbox, the area where all the blocks available to the user are shown, organized by categories, and from which they can be dragged to the workspace.

3. RELATED WORK
Several interactive tools have been proposed to support the structured querying of RDF data sources, at various levels of abstraction and using different paradigms. A basic distinction can be made between: 1) tools that require writing and reading SPARQL syntax, and 2) tools that provide other metaphors (usually visual) aiming to lower the learning curve and provide more intuitive interaction. UIs of the first kind include advanced editors such as YASGUI [14] and integrated environments such as Twinkle². With these tools, the user still has to know SPARQL and the vocabularies used in order to design a query.

UIs of the second kind provide interaction with an alternative representation of the query, which is then transformed into SPARQL to be executed. Some of them are text-based, using forms (such as SPARQLViz [5]) or the controlled construction of natural language statements (such as SPARKLIS [7]). These systems do not scale well as query complexity increases and do not easily permit code reuse. Others are visual tools, often using a graph-based paradigm, like NITELIGHT [15]. Graph-based interfaces fit the data model very well, but they use screen space inefficiently and often have interaction issues.

An important earlier proposal for using block programming for SPARQL queries is the SPARQL/CQELS Visual Editor designed for the Super Stream Collider framework [12]. In that case, the blocks strictly follow the language structure and syntax, and the tool requires at least basic knowledge of SPARQL to be used. Conversely, the user interface we propose is designed to provide blocks that should be mostly self-describing and usable without knowing the SPARQL syntax in advance. Finally, in most existing tools the visualization of the result set is passive and often presented in a separate panel or window (e.g., in many Web-based interfaces the result page replaces the query page). In our proposal, on the contrary, results and query share the same workspace to allow for an exploratory pattern of interaction.

² http://www.ldodds.com/projects/twinkle/

4. PROPOSED USER INTERFACE
We propose a visual user interface in which users can build SPARQL queries without having to know SPARQL syntax, requiring only a basic conceptual understanding of the RDF model as a graph-based knowledge representation. This understanding need not include knowledge of the structure of the data against which the queries are being built.

We enumerate here a set of basic requirements based on the kinds of scenarios that we want to support: (1) users should not have to care about syntax, hence visual cues and constraints should prevent syntax errors; (2) the need for users to input text should be minimized; (3) there should be direct ways to build commonly used structures; (4) users should be able to use the tool as a step toward learning the (textual) SPARQL syntax, hence the blocks should follow the structure of the language; and (5) users should be able to work even without prior knowledge of the dataset, hence exploratory queries should be explicitly supported.

Following the organization of Blockly, we use blocks of two different types: value-like blocks, used for expressions, literal values, and variables; and statement-like blocks, used for graph patterns and relational operators.

As this categorisation is not sufficient to distinguish legal compositions from forbidden ones, the connections available on the blocks are given more specific types, restricting attachment to meaningful combinations only. For example, in graph patterns, literals cannot be used as subjects or predicates of a pattern, but only as objects, in line with the RDF data model and SPARQL syntax.
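In Blockly, connections can carry type checks (set, for instance, with `setCheck`), and two blocks snap together only when their checks are compatible. The Python sketch below models just the rule described above, independently of Blockly: the slot names and the `Kind` enumeration are hypothetical, and blank nodes are left out for brevity.

```python
from enum import Enum

class Kind(Enum):
    """Kinds of value-like blocks that can be plugged into a triple pattern."""
    IRI = "iri"
    VARIABLE = "variable"
    LITERAL = "literal"

# Hypothetical type checks for the three inputs of a triple-pattern block.
SLOT_CHECKS = {
    "subject": {Kind.IRI, Kind.VARIABLE},
    "predicate": {Kind.IRI, Kind.VARIABLE},
    "object": {Kind.IRI, Kind.VARIABLE, Kind.LITERAL},
}

def can_attach(slot, kind):
    """Return True if a value block of the given kind may connect to `slot`."""
    return kind in SLOT_CHECKS[slot]

print(can_attach("object", Kind.LITERAL))   # True
print(can_attach("subject", Kind.LITERAL))  # False
```

Since an illegal attachment is simply refused, a literal in subject position can never be built, which is how such constraints prevent syntax errors rather than reporting them afterwards.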

Figure 1 shows a construction representing a query. The query is represented by the purple select all block and its sub-blocks. The where sub-block is a graph pattern and corresponds to the WHERE clause of the query. Inside graph patterns and expressions, different types of graph terms can be used: IRIs (represented in brown and using the prefixed notation), variables (using the Blockly appearance of variables for consistency), and literals (represented in different colours according to their type: numeric, string, or boolean).

The query in Figure 1 is translated to the following SPARQL query.

prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix dbo: <http://dbpedia.org/ontology/>
prefix dbpedia: <http://dbpedia.org/resource/>

select distinct *
where {
  ?film a dbo:Film;
        rdfs:label ?label;
        dbo:director dbpedia:Francis_Ford_Coppola .
}
order by ?label
limit 3
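The translation from blocks to text can be pictured as a walk over the block tree in which each block emits its own SPARQL fragment. The Python sketch below is a hypothetical model, not the tool's JavaScript code: the classes `TriplePattern` and `SelectQuery` stand in for blocks, prefix declarations are left out, and each triple is written in full rather than with the `;` abbreviation, so the output is equivalent to, but not identical with, the query listed above.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class TriplePattern:
    """Statement-like block placed inside the `where` input."""
    subject: str
    predicate: str
    obj: str

    def to_sparql(self) -> str:
        return f"{self.subject} {self.predicate} {self.obj} ."

@dataclass
class SelectQuery:
    """The `select all` block together with its sub-blocks."""
    where: List[TriplePattern]
    distinct: bool = True
    order_by: List[str] = field(default_factory=list)
    limit: Optional[int] = None

    def to_sparql(self) -> str:
        lines = ["select distinct *" if self.distinct else "select *", "where {"]
        lines += ["  " + pattern.to_sparql() for pattern in self.where]
        lines.append("}")
        if self.order_by:
            lines.append("order by " + " ".join(self.order_by))
        if self.limit is not None:
            lines.append(f"limit {self.limit}")
        return "\n".join(lines)

query = SelectQuery(
    where=[
        TriplePattern("?film", "a", "dbo:Film"),
        TriplePattern("?film", "rdfs:label", "?label"),
        TriplePattern("?film", "dbo:director", "dbpedia:Francis_Ford_Coppola"),
    ],
    order_by=["?label"],
    limit=3,
)
print(query.to_sparql())
```

Keeping generation local to each block means that any well-formed sub-tree can be turned into text on its own, which fits the idea of building and evaluating query fragments incrementally.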

Although designing such a query with blocks avoids potential syntax errors, it still requires some knowledge of the specific dataset (that there exists a resource dbpedia:Francis_Ford_Coppola representing the person Francis Ford Coppola) and of the vocabularies used (that movies have type dbo:Film and that the property dbo:director associates a movie with its director).

To address this issue, we designed the execution of queries in the tool so as to allow exploratory access to the dataset. A specific block, the execution block, is designed for the execution of a query (the top block in Figure 1): as soon as a query is attached to its query connection, the SPARQL query is generated and run on the underlying dataset; when the results are ready, they appear in tabular format attached to the results connection of the execution block. Data from the table are represented as IRI blocks and literal blocks that can be dragged and used to refine the same query or to create a new one. The table of results can also be seen in Figure 1³.

Figure 1: Execution of a query to get movies directed by Francis Ford Coppola.

Figure 2: A generic query to look for specific classes.
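The behaviour of the execution block can be sketched as a small observer that re-runs its query whenever the attached blocks change. In the Python model below, everything is hypothetical: `ExecutionBlock` and `on_query_changed` are illustrative names, the endpoint is replaced by an injected function returning canned rows, and skipping a re-run when the generated text is unchanged is a design choice of this sketch.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, str]

class ExecutionBlock:
    """Hypothetical model of the execution block and its results table."""

    def __init__(self, run_query: Callable[[str], List[Row]]):
        self.run_query = run_query  # sends SPARQL text somewhere, returns rows
        self.query_text: Optional[str] = None
        self.columns: List[str] = []
        self.results: List[List[str]] = []

    def on_query_changed(self, sparql: Optional[str]) -> None:
        """Called whenever the blocks attached to the query connection change."""
        if sparql is None:             # query detached: clear the table
            self.query_text, self.columns, self.results = None, [], []
            return
        if sparql == self.query_text:  # same text as before: skip re-running
            return
        self.query_text = sparql
        rows = self.run_query(sparql)
        self.columns = list(rows[0]) if rows else []
        # In the UI, each cell would become a draggable IRI or literal block.
        self.results = [[row.get(col, "") for col in self.columns] for row in rows]

calls = []

def fake_endpoint(sparql: str) -> List[Row]:
    """Stand-in for a SPARQL endpoint, returning canned rows."""
    calls.append(sparql)
    return [{"film": "dbpedia:Apocalypse_Now", "label": "Apocalypse Now"}]

block = ExecutionBlock(fake_endpoint)
block.on_query_changed("select * where { ?film a dbo:Film }")
block.on_query_changed("select * where { ?film a dbo:Film }")  # unchanged
print(len(calls), block.columns, block.results)
# 1 ['film', 'label'] [['dbpedia:Apocalypse_Now', 'Apocalypse Now']]
```

Injecting the query function keeps the sketch testable without a network connection; a real client would send the text to a SPARQL endpoint asynchronously and fill the table when the response arrives.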

To further favour this kind of exploration, the toolbox already contains some pre-built queries that can be used to access the main features of a dataset. These pre-built queries are simply sets of pre-connected blocks that can be freely rearranged and decomposed on the workspace. Figure 2 shows one of the pre-built queries, used to look for classes in the dataset.

For example, to find the class used for movies in the dataset, the user drags the query in Figure 2 to the workspace and replaces the empty string in the provided "that" block (corresponding to the FILTER clause in SPARQL) with the word “movie”. The execution of the query, shown in Figure 3, returns the classes in the dataset whose label contains the word “movie”, shortest labels first and at most five of them, as set by the limit. In this case, the first row contains the class the user was looking for: dbo:Film. The following is the corresponding SPARQL query.

prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix owl: <http://www.w3.org/2002/07/owl#>

select distinct *
where {
  ?type a owl:Class;
        rdfs:label ?label.
  filter( contains(?label, 'movie') ).
}
order by strlen(?label)
limit 5

³ Here and in the following examples, the results shown are from the DBpedia [11] dataset.
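Ordering by label length is what brings the most specific match to the top: any label containing the keyword is at least as long as the keyword, and only a label equal to it reaches that minimum. The Python sketch below mimics the FILTER, ORDER BY and LIMIT clauses on a hypothetical in-memory list of (class, label) pairs; the `ex:` names are invented for illustration and are not DBpedia results.

```python
def find_classes(classes, keyword, limit=5):
    """Mimic FILTER(contains(?label, keyword)) ORDER BY strlen(?label) LIMIT n."""
    matches = [(iri, label) for iri, label in classes if keyword in label]
    # Shortest labels first; Python's sort is stable, so ties keep input order.
    matches.sort(key=lambda pair: len(pair[1]))
    return matches[:limit]

# Hypothetical (class, label) pairs, not actual DBpedia data.
classes = [
    ("ex:MovieDirector", "movie director"),
    ("ex:Film", "movie"),
    ("ex:Person", "person"),
    ("ex:MovieGenre", "movie genre"),
]
print(find_classes(classes, "movie"))
# [('ex:Film', 'movie'), ('ex:MovieGenre', 'movie genre'), ('ex:MovieDirector', 'movie director')]
```

Like SPARQL's `contains`, Python's `in` is case-sensitive, so "Movie" would not match these labels. SPARQL leaves the relative order of equal-length labels unspecified, whereas this sketch keeps them in input order.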

The dbo:Film block can be dragged from the query results and directly reused. Other pre-built queries allow users to retrieve and reuse the other IRI blocks needed for the query in Figure 1. Thus, users do not need external means to explore vocabularies and datasets before designing their queries.

5. IMPLEMENTATION
The presented tool is based on an extension of the Blockly JavaScript library and works entirely client side. We extended the library with the specific blocks needed for SPARQL queries and their execution, and added the code necessary to generate SPARQL fragments from the blocks. The SPARQL execution block listens for changes in its query connection; each time the query changes, the corresponding SPARQL query is generated and sent to a SPARQL endpoint. The endpoint is currently set in the configuration of the client code; in the future, it could instead be chosen through the user interface. The results are used to dynamically generate the result block and its sub-blocks. The standard prefix definitions from prefix.cc are used both to add prefix declarations to the query sent to the endpoint and to convert the IRIs in the results to the prefixed notation.
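The prefix handling described above can be sketched as follows. This Python version is an illustration under stated assumptions, not the tool's code: the mapping is a small hard-coded excerpt rather than the list downloaded from prefix.cc, prefixed names are detected with a rough regular expression instead of a real SPARQL parser, and choosing the longest matching namespace is an assumption of this sketch.

```python
import re

# A tiny excerpt of prefix-to-namespace mappings; prefix.cc provides many more.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "dbo": "http://dbpedia.org/ontology/",
    "dbpedia": "http://dbpedia.org/resource/",
}

# Rough pattern for prefixed names such as dbo:Film (not the full SPARQL grammar).
PREFIXED_NAME = re.compile(r"\b([A-Za-z][\w-]*):\w")

def with_prefix_declarations(body, prefixes=PREFIXES):
    """Prepend a prefix declaration for every known prefix used in `body`."""
    used = {m.group(1) for m in PREFIXED_NAME.finditer(body)}
    decls = [f"prefix {p}: <{prefixes[p]}>" for p in sorted(used) if p in prefixes]
    return "\n".join(decls + [body])

def compact_iri(iri, prefixes=PREFIXES):
    """Rewrite a full IRI in prefixed notation using the longest matching namespace."""
    best = None
    for prefix, ns in prefixes.items():
        if iri.startswith(ns) and (best is None or len(ns) > len(prefixes[best])):
            best = prefix
    if best is None:
        return iri  # no known namespace: keep the full IRI
    return f"{best}:{iri[len(prefixes[best]):]}"

body = "select distinct *\nwhere { ?film a dbo:Film ; rdfs:label ?label . }"
print(with_prefix_declarations(body))
print(compact_iri("http://dbpedia.org/resource/Francis_Ford_Coppola"))
# dbpedia:Francis_Ford_Coppola
```

Full IRIs written in angle brackets are not mistaken for prefixed names, because `http:` is followed by `/` rather than a word character. The sketch does not escape local names that contain characters not allowed in SPARQL prefixed names; such IRIs would need to stay in full form.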

6. EVALUATION
An initial evaluation of the tool is based on the most relevant dimensions enumerated by Green and Petre for the assessment of visual programming languages [9].

Consistency: the block syntax favours internal consistency; external consistency is achieved with respect to the SPARQL textual syntax, because its structure is maintained, and partially with respect to other Blockly-based languages, because the appearance and behaviour of basic expressions are preserved. Diffuseness: the block representation requires more space than the textual one, but the language is designed to be economical in terms of graphic entities (e.g., the type constraint is included by default in the basic block used for graph patterns, since it is commonly used). Error-proneness: the chance of making syntax errors is greatly reduced compared to textual syntax (the affordances are first coarsely constrained by the differences between shapes, then by other visual cues, and finally by the behaviour of visual elements); the use of results for refining existing queries or creating new ones reduces the chance of mismatches between queries and data structure. Hidden dependencies: the dependencies and scope of variables are strictly tied to the SPARQL query structure, which the block structure closely follows. Premature commitment and Progressive evaluation: fragments of queries can be built and partial queries evaluated, even “in parallel”, by using more than one execution block.

Figure 3: Execution of a query to get candidate classes corresponding to the set of movies.

7. CONCLUSIONS AND FUTURE WORK
We briefly presented a new visual user interface for querying Linked Data. The interface potentially allows non-experts to build SPARQL queries. Moreover, prior knowledge of the dataset and of the vocabularies used is not required, as the tool favours an exploratory and constructive way of building queries. An initial evaluation of the approach based on cognitive dimensions of notations appears promising. The presented interactive way of building queries is a novelty that may offer an effective solution for constructively building Linked Data queries. We plan to set up a formal evaluation with users in order to validate the idea.

8. REFERENCES
[1] T. Berners-Lee. Linked data. Design Issues, 2006.

[2] T. Berners-Lee, R. Fielding, and L. Masinter. Uniform Resource Identifier (URI): Generic Syntax. RFC 3986 (INTERNET STANDARD), Jan. 2005. Updated by RFC 6874.

[3] T. Berners-Lee, J. Hendler, and O. Lassila. The Semantic Web. Scientific American, 284(5):34–43, 2001.

[4] C. Bizer, T. Heath, and T. Berners-Lee. Linked data - the story so far. Int. J. on Semantic Web and Information Systems, 5(3):1–22, 2009.

[5] J. Borsje and H. Embregts. Graphical query composition and natural language processing in an RDF visualization interface. Erasmus School of Economics and Business Economics, Vol. Bachelor. Erasmus University, Rotterdam, 2006.

[6] R. Cyganiak, D. Wood, and M. Lanthaler. RDF 1.1 Concepts and Abstract Syntax. W3C REC 25 February 2014.

[7] S. Ferre. Sparklis: a SPARQL Endpoint Explorer for Expressive Question Answering. In ISWC 2014 Posters & Demonstrations Track.

[8] N. Fraser et al. Blockly: A visual programming editor, 2013.

[9] T. R. G. Green and M. Petre. Usability analysis of visual programming environments: a 'cognitive dimensions' framework. Journal of Visual Languages & Computing, 7(2):131–174, 1996.

[10] S. Harris et al. SPARQL 1.1 Query Language. W3C REC 21 March 2013.

[11] J. Lehmann, R. Isele, M. Jakob, A. Jentzsch, D. Kontokostas, P. N. Mendes, S. Hellmann, M. Morsey, P. van Kleef, S. Auer, et al. DBpedia - A large-scale, multilingual knowledge base extracted from Wikipedia. Semantic Web Journal, 5:1–29, 2014.

[12] H. N. M. Quoc, M. Serrano, D. Le-Phuoc, and M. Hauswirth. Super stream collider - linked stream mashups for everyone. Proc. of the Semantic Web Challenge at ISWC 2012, Boston, MA, US, 2012.

[13] M. Resnick, J. Maloney, A. Monroy-Hernandez, N. Rusk, E. Eastmond, K. Brennan, A. Millner, E. Rosenbaum, J. Silver, B. Silverman, et al. Scratch: programming for all. Communications of the ACM, 52(11):60–67, 2009.

[14] L. Rietveld and R. Hoekstra. YASGUI: Not just another SPARQL client. In The Semantic Web: ESWC 2013 Satellite Events, pages 78–86. Springer, 2013.

[15] A. Russell, P. R. Smart, D. Braines, and N. R. Shadbolt. NITELIGHT: A graphical tool for semantic query construction. 2008.

[16] D. Wolber, H. Abelson, E. Spertus, and L. Looney. App Inventor. O'Reilly Media, Inc., 2011.
