

Towards Semantic Recommendation of Biodiversity Datasets based on Linked Open Data

Felicitas Löffler
Dept. of Mathematics and Computer Science
Friedrich Schiller University Jena, Germany

Bahar Sateli
Semantic Software Lab
Dept. of Computer Science and Software Engineering
Concordia University
Montréal, Canada

René Witte
Semantic Software Lab
Dept. of Computer Science and Software Engineering
Concordia University
Montréal, Canada

Birgitta König-Ries
Friedrich Schiller University Jena, Germany and
German Centre for Integrative Biodiversity Research (iDiv)
Halle-Jena-Leipzig, Germany

ABSTRACT
Conventional content-based filtering methods recommend documents based on extracted keywords. They calculate the similarity between keywords and user interests and return a list of matching documents. In the long run, this approach often leads to overspecialization and fewer new entries with respect to a user’s preferences. Here, we propose a semantic recommender system using Linked Open Data for the user profile and adding semantic annotations to the index. Linked Open Data allows recommendations beyond the content domain and supports the detection of new information. One research area with a strong need for the discovery of new information is biodiversity. Due to their heterogeneity, the exploration of biodiversity data requires interdisciplinary collaboration. Personalization, in particular in recommender systems, can help to link the individual disciplines in biodiversity research and to discover relevant documents and datasets from various sources. We developed a first prototype for our semantic recommender system in this field, where a multitude of existing vocabularies facilitate our approach.

Categories and Subject Descriptors
H.3.3 [Information Storage And Retrieval]: Information Search and Retrieval; H.3.5 [Information Storage And Retrieval]: Online Information Services

General Terms
Design, Human Factors

Keywords
content filtering, diversity, Linked Open Data, recommender systems, semantic indexing, semantic recommendation

Copyright © by the paper’s authors. Copying permitted only for private and academic purposes.
In: G. Specht, H. Gamper, F. Klan (eds.): Proceedings of the 26th GI-Workshop on Foundations of Databases (Grundlagen von Datenbanken), 21.10.2014 - 24.10.2014, Bozen, Italy, published at http://ceur-ws.org.

1. INTRODUCTION
Content-based recommender systems observe a user’s browsing behaviour and record the interests [1]. By means of natural language processing and machine learning techniques, the user’s preferences are extracted and stored in a user profile. The same methods are utilized to obtain suitable content keywords to establish a content profile. Based on previously seen documents, the system attempts to recommend similar content. Therefore, a mathematical representation of the user and content profile is needed. A widely used scheme is TF-IDF (term frequency-inverse document frequency) weighting [19]. Computed from the frequency of keywords appearing in a document, these term vectors capture the influence of keywords in a document or preferences in a user profile. The angle between these vectors describes the distance or the closeness of the profiles and is calculated with similarity measures, like the cosine similarity. The recommendation lists of these traditional, keyword-based recommender systems often contain very similar results to those already seen, leading to overspecialization [11] and the “Filter-Bubble” effect [17]: The user obtains only content according to the stored preferences; other related documents not perfectly matching the stored interests are not displayed. Thus, increasing diversity in recommendations has become a research area of its own [21, 25, 24, 18, 3, 6, 23], mainly used to improve the recommendation results in news or movie portals.
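The TF-IDF and cosine similarity scheme described above can be illustrated with a minimal sketch. It uses one common TF-IDF variant (relative term frequency times the logarithm of the inverse document frequency); the weighting actually intended in [19] may differ, and the tiny token lists are made-up examples.

```python
import math
from collections import Counter

def tf_idf_vectors(docs):
    """Return one {term: weight} dict per tokenized, non-empty document."""
    n = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()})
    return vectors

def cosine(u, v):
    """Cosine of the angle between two sparse term vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Made-up user profile and two made-up documents.
profile = "fossil gastropod triassic fossil".split()
doc_a = "fossil gastropod from the triassic".split()
doc_b = "bird migration in europe".split()
vec_profile, vec_a, vec_b = tf_idf_vectors([profile, doc_a, doc_b])

sim_a = cosine(vec_profile, vec_a)
sim_b = cosine(vec_profile, vec_b)
```

Because `doc_b` shares no term with the profile, its similarity is zero and it would never be recommended, which is exactly the overspecialization problem discussed in this section.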

One field where content recommender systems could enhance daily work is research. Scientists need to be aware of relevant research in their own but also neighboring fields. Increasingly, in addition to literature, the underlying data itself and even data that has not been used in publications are being made publicly available. An important example for such a discipline is biodiversity research, which explores the variety of species and their genetic and characteristic diversity [12]. The morphological and genetic information of an organism, together with the ecological and geographical context, forms a highly diverse structure. Collected and stored in different data formats, the datasets often contain or link to spatial, temporal and environmental data [22]. Many important research questions cannot be answered by working with individual datasets or data collected by one group, but require meta-analysis across a wide range of data. Since the analysis of biodiversity data is quite time-consuming, there is a strong need for personalization and new filtering techniques in this research area. Ordinary search functions in relevant data portals or databases, e.g., the Global Biodiversity Information Facility (GBIF)¹ and the Catalog of Life,² only return data that match the user’s query exactly and fail at finding more diverse and semantically related content. Also, user interests are not taken into account in the result list. We believe our semantic-based content recommender system could facilitate the difficult and time-consuming research process in this domain.

Here, we propose a new semantic-based content recommender system that represents the user profile as Linked Open Data (LOD) [9] and incorporates semantic annotations into the recommendation process. Additionally, the search engine is connected to a terminology server and utilizes the provided vocabularies for a recommendation. The result list contains more diverse predictions and includes hierarchical concepts or individuals.

The structure of this paper is as follows: Next, we describe related work. Section 3 presents the architecture of our semantic recommender system and some implementation details. In Section 4, an application scenario is discussed. Finally, conclusions and future work are presented in Section 5.

2. RELATED WORK
The major goal of diversity research in recommender systems is to counteract overspecialization [11] and to recommend related products, articles or documents. More books of an author or different movies of a genre are the classical applications, mainly used in recommender systems based on collaborative filtering methods. In order to enhance the variety in book recommendations, Ziegler et al. [25] enrich user profiles with taxonomical super-topics. The recommendation list generated by this extended profile is merged with a rank in reverse order, called dissimilarity rank. Depending on a certain diversification factor, this merging process supports more or less diverse recommendations. Larger diversification factors lead to more diverse products beyond user interests. Zhang and Hurley [24] favor another mathematical solution and describe the balance between diversity and similarity as a constrained optimization problem. They compute a dissimilarity matrix according to applied criteria, e.g., movie genres, and assign a matching function to find a subset of products that are diverse as well as similar. One hybrid approach by van Setten [21] combines the results of several conventional algorithms, e.g., collaborative and case-based, to improve movie recommendations. Mainly focused on news or social media, approaches using content-based filtering methods try to present different viewpoints on an event to decrease the media bias in news portals [18, 3] or to facilitate the filtering of comments [6, 23].

¹ GBIF, http://www.gbif.org
² Catalog of Life, http://www.catalogueoflife.org/col/search/all/

Apart from Ziegler et al., none of the presented approaches has considered semantic technologies. However, utilizing ontologies and storing user or document profiles in triple stores represents a large potential for diversity research in recommender systems. Frasincar et al. [7] define semantically enhanced recommenders as systems with an underlying knowledge base. This can either be linguistic-based [8], where only linguistic relations (e.g., synonymy, hypernymy, meronymy, antonymy) are considered, or ontology-based. In the latter case, the content and the user profile are represented with concepts of an ontology. This has the advantage that several types of relations can be taken into account. For instance, for a user interested in “geology”, the profile contains the concept “geology” that also permits the recommendation of inferred concepts, e.g., “fossil”. The idea of recommending related concepts was first introduced by Middleton et al. [15]. They developed Quickstep, a recommender system for research papers with ontological terms in the user profile and for paper categories. The ontology only considers is-a relationships and omits other relation types (e.g., part-of). Another simple hierarchical approach from Shoval et al. [13] calculates the distance among concepts in a profile hierarchy. They distinguish between perfect, close and weak match. When the concept appears in both a user’s and document’s profile, it is called a perfect match. In a close match, the concept emerges only in one of the profiles and a child or parent concept appears in the other. The largest distance is called a weak match, where only one of the profiles contains a grandchild or grandparent concept. Finally, a weighted sum over all matching categories leads to the recommendation list. This ontological filtering method was integrated into the news recommender system epaper. Another semantically enhanced recommender system is Athena [10]. The underlying ontology is used to explore the semantic neighborhood in the news domain. The authors compared several ontology-based similarity measures with the traditional TF-IDF approach. However, this system lacks a connection to a search engine that allows querying large datasets.
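The perfect, close and weak matching described above can be sketched as follows. This is only an illustration of the idea, not the published method of [13]: the weights, the toy hierarchy (including the made-up "toy root concept") and the choice to count only the best match per user concept are assumptions.

```python
# Toy hierarchy (child -> parent); "toy root concept" is a made-up placeholder.
PARENT = {
    "fossil": "biotic mesoscopic physical object",
    "biotic mesoscopic physical object": "toy root concept",
}

# Illustrative weights only; real values would have to be tuned or taken from [13].
WEIGHTS = {"perfect": 1.0, "close": 0.5, "weak": 0.25}

def ancestor(concept, depth):
    """Return the ancestor exactly `depth` levels above `concept`, or None."""
    for _ in range(depth):
        concept = PARENT.get(concept)
        if concept is None:
            return None
    return concept

def match_type(user_concept, doc_concept):
    """Classify a pair of concepts as perfect, close, weak or no match (None)."""
    if user_concept == doc_concept:
        return "perfect"
    for depth, label in ((1, "close"), (2, "weak")):
        if (ancestor(user_concept, depth) == doc_concept
                or ancestor(doc_concept, depth) == user_concept):
            return label
    return None

def score(user_profile, doc_profile):
    """Weighted sum over the best match found for each user concept."""
    total = 0.0
    for user_concept in user_profile:
        best = 0.0
        for doc_concept in doc_profile:
            kind = match_type(user_concept, doc_concept)
            if kind is not None:
                best = max(best, WEIGHTS[kind])
        total += best
    return total
```

With these assumed weights, a document annotated with "fossil" scores 0.5 for a user interested in "biotic mesoscopic physical object", since the two concepts are one hierarchy level apart.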

All presented systems use manually established vocabularies with a limited number of classes. None of them utilizes a generic user profile to store the preferences in a semantic format (RDF/XML or OWL). The FOAF (Friend Of A Friend) project³ provides a vocabulary for describing and connecting people, e.g., demographic information (name, address, age) or interests. As one of the first, in 2006 Celma [2] leveraged FOAF in his music recommender system to store users’ preferences. Our approach goes beyond the FOAF interests by incorporating another generic user model vocabulary, the Intelleo User Modelling Ontology (IUMO).⁴ Besides user interests, IUMO offers elements to store learning goals, competences and recommendation preferences. This allows adapting the results to a user’s previous knowledge or recommending only documents for a specific task.

3. DESIGN AND IMPLEMENTATION
In this section, we describe the architecture and some implementation details of our semantic-based recommender system (Figure 1). The user model component, described in Section 3.1, contains all user information. The source files, described in Section 3.2, are analyzed with GATE [5], as described in Section 3.3. Additionally, GATE is connected with a terminology server (Section 3.2) to annotate documents with concepts from the provided biodiversity vocabularies. In Section 3.4, we explain how the annotated documents are indexed with GATE Mímir [4]. The final recommendation list is generated in the recommender component (Section 3.5).

³ FOAF, http://xmlns.com/foaf/spec/
⁴ IUMO, http://intelleo.eu/ontologies/user-model/spec/

Figure 1: The architecture of our semantic content recommender system

3.1 User profile
The user interests are stored in an RDF/XML format utilizing the FOAF vocabulary for general user information. In order to improve the recommendations regarding a user’s previous knowledge and to distinguish between learning goals, interests and recommendation preferences, we incorporate the Intelleo User Modelling Ontology for an extended profile description. Recommendation preferences will contain settings with respect to visualization, e.g., highlighting of interests, and recommender control options, e.g., keyword-search or more diverse results. Another adjustment will adapt the result set according to a user’s previous knowledge. In order to enhance the comprehensibility for a beginner, the system could provide synonyms; for an expert, the recommender could include more specific documents.

The interests are stored in form of links to LOD resources. For instance, in our example profile in Listing 1, a user is interested in “biotic mesoscopic physical object”, which is a concept from the ENVO⁵ ontology. Note that the interest entry in the RDF file does not contain the textual description, but the link to the concept in the ontology, i.e., http://purl.obolibrary.org/obo/ENVO_01000009. Currently, we only support explicit user modelling. Thus, the user information has to be added manually to the RDF/XML file. Later, we intend to develop a user profiling component, which gathers a user’s interests automatically. The profile is accessible via an Apache Fuseki⁶ server.

Listing 1: User profile with interests stored as Linked Open Data URIs

<rdf:Description rdf:about="http://www.semanticsoftware.info/person/felicitasloeffler">
  <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
  <foaf:firstName>Felicitas</foaf:firstName>
  <foaf:lastName>Loeffler</foaf:lastName>
  <foaf:name>Felicitas Loeffler</foaf:name>
  <foaf:gender>Female</foaf:gender>
  <foaf:workplaceHomepage rdf:resource="http://dbpedia.org/page/University_of_Jena"/>
  <foaf:organization>Friedrich Schiller University Jena</foaf:organization>
  <foaf:mbox>felicitas.loeffler@uni-jena.de</foaf:mbox>
  <um:TopicPreference rdf:resource="http://purl.obolibrary.org/obo/ENVO_01000009"/>
</rdf:Description>
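As a rough illustration of how the interest URIs could be read from such a profile, the following sketch parses a shortened copy of Listing 1 with Python's standard library. The listing omits the enclosing rdf:RDF element and the namespace declarations, so both are added here; in particular, the namespace URI bound to the um prefix is an assumption and should be checked against the IUMO specification.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF = "http://xmlns.com/foaf/0.1/"
UM = "http://intelleo.eu/ontologies/user-model/ns/"  # assumed namespace URI

# Shortened profile; the rdf:RDF wrapper and xmlns declarations are added for parsing.
PROFILE = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:foaf="{FOAF}" xmlns:um="{UM}">
  <rdf:Description rdf:about="http://www.semanticsoftware.info/person/felicitasloeffler">
    <foaf:firstName>Felicitas</foaf:firstName>
    <um:TopicPreference rdf:resource="http://purl.obolibrary.org/obo/ENVO_01000009"/>
  </rdf:Description>
</rdf:RDF>"""

def interest_uris(rdf_xml):
    """Return the rdf:resource URI of every um:TopicPreference element."""
    root = ET.fromstring(rdf_xml)
    return [el.get(f"{{{RDF}}}resource") for el in root.iter(f"{{{UM}}}TopicPreference")]

interests = interest_uris(PROFILE)
```

A full RDF library would be the more robust choice for real profiles; plain XML parsing is used here only to keep the sketch dependency-free.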

⁵ ENVO, http://purl.obolibrary.org/obo/envo.owl
⁶ Apache Fuseki, http://jena.apache.org/documentation/serving_data/

3.2 Source files and terminology server
The content provided by our recommender comes from the biodiversity domain. This research area offers a wide range of existing vocabularies. Furthermore, biodiversity is an interdisciplinary field, where the results from several sources have to be linked to gain new knowledge. A recommender system for this domain needs to support scientists by improving this linking process and helping them find relevant content in an acceptable time.

Researchers in the biodiversity domain are advised to store their datasets together with metadata, describing information about their collected data. A very common metadata format is ABCD.⁷ This XML-based standard provides elements for general information (e.g., author, title, address), as well as additional biodiversity related metadata, like information about taxonomy, scientific name, units or gathering. Very often, each taxon needs specific ABCD fields, e.g., fossil datasets include data about the geological era. Therefore, several additional ABCD-related metadata standards have emerged (e.g., ABCDEFG⁸, ABCDDNA⁹). One document may contain the metadata of one or more species observations in a textual description. This provides a basis for annotation and indexing for a semantic search. For our prototype, we use the ABCDEFG metadata files provided by the GFBio¹⁰ project; specifically, metadata files from the Museum für Naturkunde (MfN).¹¹ An example for an ABCDEFG metadata file is presented in Listing 2, containing the core ABCD structure as well as additional information about the geological era. The terminology server supplied by the GFBio project offers access to several biodiversity vocabularies, e.g., ENVO, BEFDATA, TDWGREGION. It also provides a SPARQL endpoint¹² for querying the ontologies.

⁷ ABCD, http://www.tdwg.org/standards/115/
⁸ ABCDEFG, http://www.geocase.eu/efg
⁹ ABCDDNA, http://www.tdwg.org/standards/640/
¹⁰ GFBio, http://www.gfbio.org
¹¹ MfN, http://www.naturkundemuseum-berlin.de/
¹² GFBio terminology server, http://terminologies.gfbio.org/sparql/

3.3 Semantic annotation
The source documents are analyzed and annotated according to the vocabularies provided by the terminology server. For this process, we use GATE, an open source framework that offers several standard language engineering components [5]. We developed a custom GATE pipeline (Figure 2) that analyzes the documents: First, the documents are split into tokens and sentences, using the existing NLP components included in the GATE distribution. Afterwards, an ‘Annotation Set Transfer’ processing resource adds the original markups of the ABCDEFG files to the annotation set, e.g., abcd:HigherTaxon. The following ontology-aware ‘Large KB Gazetteer’ is connected to the terminology server. For each document, all occurring ontology classes are added as specific “gfbioAnnot” annotations that have both instance (link to the concrete source document) and class URI. At the end, a ‘GATE Mímir Processing Resource’ submits the annotated documents to the semantic search engine.

Figure 2: The GFBio pipeline in GATE presenting the GFBio annotations
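The effect of the gazetteer step can be approximated by a small dictionary lookup. The sketch below is not the GATE API; it only mimics the outcome described above, namely annotations that carry a class URI and a link to the source document. The vocabulary entries, their example.org URIs and the field names of the annotation records are hypothetical.

```python
import re

# Hypothetical label -> class URI table; in the described system the classes
# come from the vocabularies of the terminology server.
VOCABULARY = {
    "fossil": "http://example.org/vocab/fossil",
    "triassic": "http://example.org/vocab/triassic",
}

def annotate(doc_uri, text, vocabulary=VOCABULARY):
    """Return 'gfbioAnnot'-style records for every vocabulary label found in the text."""
    annotations = []
    for label, class_uri in vocabulary.items():
        # Whole-word, case-insensitive match of the label.
        pattern = r"\b" + re.escape(label) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            annotations.append({
                "type": "gfbioAnnot",
                "start": match.start(),
                "end": match.end(),
                "class": class_uri,  # ontology class URI
                "inst": doc_uri,     # link to the concrete source document
            })
    return sorted(annotations, key=lambda a: a["start"])

result = annotate(
    "http://example.org/docs/MB.Ga.3895",
    "MfN - Fossil invertebrates from the Triassic",
)
```

A real gazetteer would also handle plural forms, multi-word labels and synonyms, which this sketch leaves out.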

3.4 Semantic indexing
For semantic indexing, we are using GATE Mímir:¹³ “Mímir is a multi-paradigm information management index and repository which can be used to index and search over text, annotations, semantic schemas (ontologies), and semantic metadata (instance data)” [4]. Besides ordinary keyword-based search, Mímir incorporates the previously generated semantic annotations from GATE into the index. Additionally, it can be connected to the terminology server, allowing queries over the ontologies. All index relevant annotations and the connection to the terminology server are specified in an index template.

¹³ GATE Mímir, https://gate.ac.uk/mimir/

3.5 Content recommender
The Java-based content recommender sends a SPARQL query to the Fuseki server and obtains the interests and preferred recommendation techniques from the user profile as a list of (LOD) URIs. This list is utilized for a second SPARQL query to the Mímir server. Presently, this query asks only for child nodes (Figure 3). The result set contains ABCDEFG metadata files related to a user’s interests. We intend to experiment with further semantic relations in the future, e.g., object properties. Assuming that a specific fossil used to live in rocks, it might be interesting to know if other species, living in this geological era, occurred in rocks. Another filtering method would be to use parent or grandparent nodes from the vocabularies to broaden the search. We will provide control options and feedback mechanisms to support the user in steering the recommendation process actively. The recommender component is still under development and has not been added to the implementation yet.
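The child-node expansion described above can be sketched independently of Mímir and SPARQL. The following example walks a toy concept hierarchy and selects documents annotated with any descendant of the interest; the hierarchy (including the made-up "toy fossil subtype"), the document annotations and the `max_depth` parameter are assumptions for illustration.

```python
from collections import deque

# Toy hierarchy (parent -> children); "toy fossil subtype" is a made-up placeholder.
CHILDREN = {
    "biotic mesoscopic physical object": ["fossil"],
    "fossil": ["toy fossil subtype"],
}

# Hypothetical document annotations (document id -> annotated concepts).
DOC_ANNOTATIONS = {
    "MB.Ga.3895": ["fossil"],
    "other-dataset": ["bird"],
}

def descendants(concept, children=CHILDREN, max_depth=None):
    """Breadth-first list of concepts below `concept`, optionally depth-limited."""
    found = []
    seen = {concept}
    queue = deque([(concept, 0)])
    while queue:
        node, depth = queue.popleft()
        if max_depth is not None and depth >= max_depth:
            continue
        for child in children.get(node, []):
            if child not in seen:
                seen.add(child)
                found.append(child)
                queue.append((child, depth + 1))
    return found

def recommend(interest, annotations=DOC_ANNOTATIONS, max_depth=None):
    """Return ids of documents annotated with any descendant of the interest."""
    wanted = set(descendants(interest, max_depth=max_depth))
    return sorted(doc for doc, concepts in annotations.items() if wanted & set(concepts))
```

Broadening the search to parent or grandparent nodes, as considered above, would amount to walking the same hierarchy in the opposite direction.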

Listing 2: Excerpt from a biodiversity metadata file in ABCDEFG format [20]

<abcd:DataSets xmlns:abcd="http://www.tdwg.org/schemas/abcd/2.06" xmlns:efg="http://www.synthesys.info/ABCDEFG/1.0">
  <abcd:DataSet>
    <abcd:Metadata>
      <abcd:Description>
        <abcd:Representation language="en">
          <abcd:Title>MfN - Fossil invertebrates</abcd:Title>
          <abcd:Details>Gastropods, bivalves, brachiopods, sponges</abcd:Details>
        </abcd:Representation>
      </abcd:Description>
      <abcd:Scope>
        <abcd:TaxonomicTerms>
          <abcd:TaxonomicTerm>Gastropods, Bivalves, Brachiopods, Sponges</abcd:TaxonomicTerm>
        </abcd:TaxonomicTerms>
      </abcd:Scope>
    </abcd:Metadata>
    <abcd:Units>
      <abcd:Unit>
        <abcd:SourceInstitutionID>MfN</abcd:SourceInstitutionID>
        <abcd:SourceID>MfN - Fossil invertebrates Ia</abcd:SourceID>
        <abcd:UnitID>MB.Ga.3895</abcd:UnitID>
        <abcd:Identifications>
          <abcd:Identification>
            <abcd:Result>
              <abcd:TaxonIdentified>
                <abcd:HigherTaxa>
                  <abcd:HigherTaxon>
                    <abcd:HigherTaxonName>Euomphaloidea</abcd:HigherTaxonName>
                    <abcd:HigherTaxonRank>Family</abcd:HigherTaxonRank>
                  </abcd:HigherTaxon>
                </abcd:HigherTaxa>
                <abcd:ScientificName>
                  <abcd:FullScientificNameString>Euomphalus sp.</abcd:FullScientificNameString>
                </abcd:ScientificName>
              </abcd:TaxonIdentified>
            </abcd:Result>
          </abcd:Identification>
        </abcd:Identifications>
        <abcd:UnitExtension>
          <efg:EarthScienceSpecimen>
            <efg:UnitStratigraphicDetermination>
              <efg:ChronostratigraphicAttributions>
                <efg:ChronostratigraphicAttribution>
                  <efg:ChronoStratigraphicDivision>System</efg:ChronoStratigraphicDivision>
                  <efg:ChronostratigraphicName>Triassic</efg:ChronostratigraphicName>
                </efg:ChronostratigraphicAttribution>
              </efg:ChronostratigraphicAttributions>
            </efg:UnitStratigraphicDetermination>
          </efg:EarthScienceSpecimen>
        </abcd:UnitExtension>
      </abcd:Unit>
    </abcd:Units>
  </abcd:DataSet>
</abcd:DataSets>
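To show which fields of such a file are useful for indexing, the following sketch extracts the unit identifier, the higher taxon and the geological era from a trimmed copy of the excerpt in Listing 2. It uses only Python's standard library; the function name and the keys of the returned dictionaries are illustrative choices, and the intermediate wrapper elements are left out of the sample for brevity.

```python
import xml.etree.ElementTree as ET

NS = {
    "abcd": "http://www.tdwg.org/schemas/abcd/2.06",
    "efg": "http://www.synthesys.info/ABCDEFG/1.0",
}

# Trimmed excerpt of the metadata file shown in Listing 2.
SAMPLE = """<abcd:DataSets xmlns:abcd="http://www.tdwg.org/schemas/abcd/2.06"
                xmlns:efg="http://www.synthesys.info/ABCDEFG/1.0">
  <abcd:DataSet>
    <abcd:Units>
      <abcd:Unit>
        <abcd:UnitID>MB.Ga.3895</abcd:UnitID>
        <abcd:HigherTaxa>
          <abcd:HigherTaxon>
            <abcd:HigherTaxonName>Euomphaloidea</abcd:HigherTaxonName>
          </abcd:HigherTaxon>
        </abcd:HigherTaxa>
        <abcd:UnitExtension>
          <efg:ChronostratigraphicName>Triassic</efg:ChronostratigraphicName>
        </abcd:UnitExtension>
      </abcd:Unit>
    </abcd:Units>
  </abcd:DataSet>
</abcd:DataSets>"""

def summarize_units(xml_text):
    """Return unit id, higher taxon and geological era for each abcd:Unit."""
    root = ET.fromstring(xml_text)
    units = []
    for unit in root.iterfind(".//abcd:Unit", NS):
        units.append({
            "unit_id": unit.findtext("abcd:UnitID", namespaces=NS),
            "higher_taxon": unit.findtext(".//abcd:HigherTaxonName", namespaces=NS),
            "era": unit.findtext(".//efg:ChronostratigraphicName", namespaces=NS),
        })
    return units

units = summarize_units(SAMPLE)
```

Because the lookups use descendant paths (`.//`), the same function also works on the full nesting of the original file.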


Figure 3: A search for “biotic mesoscopic physical object” returning documents about fossils (child concept)

4. APPLICATION
The semantic content recommender system allows the recommendation of more specific and diverse ABCDEFG metadata files with respect to the stored user interests. Listing 3 shows the query to obtain the interests from a user profile, introduced in Listing 1. The result contains a list of (LOD) URIs to concepts in an ontology.

Listing 3: SPARQL query to retrieve user interests

SELECT ?label ?interest ?syn
WHERE
{
  ?s foaf:firstName "Felicitas" .
  ?s um:TopicPreference ?interest .
  ?interest rdfs:label ?label .
  ?interest oboInOwl:hasRelatedSynonym ?syn
}
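The query in Listing 3 omits its prefix declarations. The sketch below shows one way to assemble a complete query string and the corresponding HTTP GET URL in the style of the SPARQL 1.1 Protocol (the query text is passed in the `query` parameter). The endpoint URL is hypothetical, the namespace URI bound to um is an assumption, and no request is actually sent.

```python
from urllib.parse import urlencode, urlparse, parse_qs

PREFIXES = {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "oboInOwl": "http://www.geneontology.org/formats/oboInOwl#",
    "um": "http://intelleo.eu/ontologies/user-model/ns/",  # assumed namespace URI
}

def build_interest_query(first_name):
    """Return the query of Listing 3 with prefix declarations for a given first name."""
    prefix_block = "\n".join(f"PREFIX {p}: <{uri}>" for p, uri in PREFIXES.items())
    # Escape backslashes and double quotes so the name stays a valid string literal.
    escaped = first_name.replace("\\", "\\\\").replace('"', '\\"')
    return f"""{prefix_block}
SELECT ?label ?interest ?syn
WHERE {{
  ?s foaf:firstName "{escaped}" .
  ?s um:TopicPreference ?interest .
  ?interest rdfs:label ?label .
  ?interest oboInOwl:hasRelatedSynonym ?syn
}}"""

def build_request_url(endpoint, query):
    """Return a GET URL that carries the query in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})

query = build_interest_query("Felicitas")
# Hypothetical endpoint of a local Fuseki dataset named "profiles".
url = build_request_url("http://localhost:3030/profiles/query", query)
# Round trip: decoding the URL yields the original query text again.
decoded = parse_qs(urlparse(url).query)["query"][0]
```

Note that, as in Listing 3, the synonym pattern is mandatory, so interests without a related synonym would not be returned; wrapping that pattern in OPTIONAL would be one way to relax this.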

In this example, the user would like to obtain biodiversity datasets about a “biotic mesoscopic physical object”, which is the textual description of http://purl.obolibrary.org/obo/ENVO_01000009. This technical term might be incomprehensible for a beginner, e.g., a student, who would prefer a description like “organic material feature”. Thus, for a later adjustment of the result according to a user’s previous knowledge, the system additionally returns synonyms.

The returned interest (LOD) URI is utilized for a second query to the search engine (Figure 3). The connection to the terminology server allows Mímir to search within the ENVO ontology (Figure 4) and to include related child concepts as well as their children and individuals. Since there is no metadata file containing the exact term “biotic mesoscopic physical object”, a simple keyword-based search would fail. However, Mímir can retrieve more specific information than stored in the user profile and returns biodiversity metadata files about “fossil”. That ontology class is a child node of “biotic mesoscopic physical object” and represents a semantic relation. Due to a high similarity regarding the content of the metadata files, the result set in Figure 3 contains only documents which closely resemble each other.

Figure 4: An excerpt from the ENVO ontology

5. CONCLUSIONS
We introduced our new semantically enhanced content recommender system for the biodiversity domain. Its main benefit lies in the connection to a search engine supporting integrated textual, linguistic and ontological queries. We are using existing vocabularies from the terminology server of the GFBio project. The recommendation list contains not only classical keyword-based results, but also documents including semantically related concepts.

In future work, we intend to integrate semantic-based recommender algorithms to obtain further diverse results and to support the interdisciplinary linking process in biodiversity research. We will set up an experiment to evaluate the algorithms in large datasets with the established classification metrics Precision and Recall [14]. Additionally, we would like to extend the recommender component with control options for the user [1]. Integrated into a portal, the result list should be adapted according to a user’s recommendation settings or adjusted to previous knowledge. These control functions allow the user to actively steer the recommendation process. We are planning to utilize the new layered evaluation approach for interactive adaptive systems from Paramythis, Weibelzahl and Masthoff [16]. Since adaptive systems present different results to each user, ordinary evaluation metrics are not appropriate. Thus, accuracy, validity, usability, scrutability and transparency will be assessed in several layers, e.g., the collection of input data and their interpretation or the decision upon the adaptation strategy. This should lead to an improved consideration of adaptivity in the evaluation process.
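For reference, the two classification metrics mentioned above can be computed as in the following sketch, using the standard set-based definitions of Precision and Recall. The document identifiers are made up, and returning 0.0 for an empty set is a convention chosen here.

```python
def precision_recall(recommended, relevant):
    """Return (precision, recall) for a recommendation list and a set of relevant items."""
    recommended = set(recommended)
    relevant = set(relevant)
    hits = len(recommended & relevant)
    # Precision: share of recommended items that are relevant.
    precision = hits / len(recommended) if recommended else 0.0
    # Recall: share of relevant items that were recommended.
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Made-up example: four recommended datasets, three relevant ones, two in common.
precision, recall = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d3", "d5"])
```

In this example, two of the four recommended datasets are relevant (precision 0.5) and two of the three relevant datasets were found (recall 2/3).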


6. ACKNOWLEDGMENTS
This work was supported by DAAD (German Academic Exchange Service)¹⁴ through the PPP Canada program and by DFG (German Research Foundation)¹⁵ within the GFBio project.

¹⁴ DAAD, https://www.daad.de/de/
¹⁵ DFG, http://www.dfg.de

7. REFERENCES
[1] F. Bakalov, M.-J. Meurs, B. König-Ries, B. Sateli, R. Witte, G. Butler, and A. Tsang. An approach to controlling user models and personalization effects in recommender systems. In Proceedings of the 2013 international conference on Intelligent User Interfaces, IUI ’13, pages 49–56, New York, NY, USA, 2013. ACM.

[2] O. Celma. FOAFing the music: Bridging the semantic gap in music recommendation. In Proceedings of 5th International Semantic Web Conference, pages 927–934, Athens, GA, USA, 2006.

[3] S. Chhabra and P. Resnick. Cubethat: News article recommender. In Proceedings of the sixth ACM conference on Recommender systems, RecSys ’12, pages 295–296, New York, NY, USA, 2012. ACM.

[4] H. Cunningham, V. Tablan, I. Roberts, M. Greenwood, and N. Aswani. Information extraction and semantic annotation for multi-paradigm information management. In M. Lupu, K. Mayer, J. Tait, and A. J. Trippe, editors, Current Challenges in Patent Information Retrieval, volume 29 of The Information Retrieval Series, pages 307–327. Springer Berlin Heidelberg, 2011.

[5] H. Cunningham et al. Text Processing with GATE (Version 6). University of Sheffield, Dept. of Computer Science, 2011.

[6] S. Faridani, E. Bitton, K. Ryokai, and K. Goldberg. Opinion space: A scalable tool for browsing online comments. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI ’10, pages 1175–1184, New York, NY, USA, 2010. ACM.

[7] F. Frasincar, W. IJntema, F. Goossen, and F. Hogenboom. A semantic approach for news recommendation. Business Intelligence Applications and the Web: Models, Systems and Technologies, IGI Global, pages 102–121, 2011.

[8] F. Getahun, J. Tekli, R. Chbeir, M. Viviani, and K. Yetongnon. Relating RSS News/Items. In M. Gaedke, M. Grossniklaus, and O. Díaz, editors, ICWE, volume 5648 of Lecture Notes in Computer Science, pages 442–452. Springer, 2009.

[9] T. Heath and C. Bizer. Linked Data: Evolving the Web into a Global Data Space. Synthesis Lectures on the Semantic Web: Theory and Technology. Morgan & Claypool, 2011.

[10] W. IJntema, F. Goossen, F. Frasincar, and F. Hogenboom. Ontology-based news recommendation. In Proceedings of the 2010 EDBT/ICDT Workshops, EDBT ’10, pages 16:1–16:6, New York, NY, USA, 2010. ACM.

[11] P. Lops, M. de Gemmis, and G. Semeraro. Content-based recommender systems: State of the art and trends. In F. Ricci, L. Rokach, B. Shapira, and P. B. Kantor, editors, Recommender Systems Handbook, pages 73–105. Springer, 2011.

[12] M. Loreau. Excellence in ecology. International Ecology Institute, Oldendorf, Germany, 2010.

[13] V. Maidel, P. Shoval, B. Shapira, and M. Taieb-Maimon. Ontological content-based filtering for personalised newspapers: A method and its evaluation. Online Information Review, 34(5):729–756, 2010.

[14] C. D. Manning, P. Raghavan, and H. Schütze. Introduction to Information Retrieval. Cambridge University Press, 2008.

[15] S. E. Middleton, N. R. Shadbolt, and D. C. D. Roure. Ontological user profiling in recommender systems. ACM Trans. Inf. Syst., 22(1):54–88, Jan. 2004.

[16] A. Paramythis, S. Weibelzahl, and J. Masthoff. Layered evaluation of interactive adaptive systems: Framework and formative methods. User Modeling and User-Adapted Interaction, 20(5):383–453, Dec. 2010.

[17] E. Pariser. The Filter Bubble - What the internet is hiding from you. Viking, 2011.

[18] S. Park, S. Kang, S. Chung, and J. Song. Newscube: delivering multiple aspects of news to mitigate media bias. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI ’09, pages 443–452, New York, NY, USA, 2009. ACM.

[19] G. Salton and C. Buckley. Term-weighting approaches in automatic text retrieval. Information Processing and Management, 24:513–523, 1988.

[20] Museum für Naturkunde Berlin. Fossil invertebrates, UnitID:MB.Ga.3895. http://coll.mfn-berlin.de/u/MB_Ga_3895.html.

[21] M. van Setten. Supporting people in finding information: hybrid recommender systems and goal-based structuring. PhD thesis, Telematica Instituut, University of Twente, The Netherlands, 2005.

[22] R. Walls, J. Deck, R. Guralnick, S. Baskauf, R. Beaman, et al. Semantics in Support of Biodiversity Knowledge Discovery: An Introduction to the Biological Collections Ontology and Related Ontologies. PLoS ONE, 9(3):e89606, 2014.

[23] D. Wong, S. Faridani, E. Bitton, B. Hartmann, and K. Goldberg. The diversity donut: enabling participant control over the diversity of recommended responses. In CHI ’11 Extended Abstracts on Human Factors in Computing Systems, CHI EA ’11, pages 1471–1476, New York, NY, USA, 2011. ACM.

[24] M. Zhang and N. Hurley. Avoiding monotony: Improving the diversity of recommendation lists. In Proceedings of the 2008 ACM Conference on Recommender Systems, RecSys ’08, pages 123–130, New York, NY, USA, 2008. ACM.

[25] C.-N. Ziegler, G. Lausen, and L. Schmidt-Thieme. Taxonomy-driven computation of product recommendations. In Proceedings of the Thirteenth ACM International Conference on Information and Knowledge Management, CIKM ’04, pages 406–415, New York, NY, USA, 2004. ACM.