

Copyright © 2006, Idea Group Inc. Copying or distributing in print or electronic forms without written permission of Idea Group Inc. is prohibited.

Int’l Journal on Semantic Web & Information Systems, 2(2), 91-119, April-June 2006

Ontology-Based Automatic Annotation of Learning Content

Jelena Jovanović, University of Belgrade, Serbia and Montenegro

Dragan Gašević, Simon Fraser University Surrey, Canada

Vladan Devedžić, University of Belgrade, Serbia and Montenegro

ABSTRACT

This paper presents an ontology-based approach to automatic annotation of learning objects’ (LOs) content units that we tested in TANGRAM, an integrated learning environment for the domain of Intelligent Information Systems. The approach does not primarily focus on automatic annotation of entire LOs, as other relevant solutions do. Instead, it provides a solution for automatic metadata generation for LOs’ components (i.e., smaller, potentially reusable, content units). Here we mainly report on the content-mining algorithms and heuristics applied for determining values of certain metadata elements used to annotate content units. Specifically, the focus is on the following elements: title, description, unique identifier, subject (based on a domain ontology), and pedagogical role (based on an ontology of pedagogical roles). Additionally, as TANGRAM is grounded on an LO content structure ontology that drives the process of an LO decomposition into its constituent content units, each thus generated content unit is implicitly semantically annotated with its role/position in the LO’s structure. Employing such semantic annotations, TANGRAM allows assembling content units into new LOs personalized to the users’ goals, preferences, and learning styles. In order to provide the evaluation of the proposed solution, we describe our experiences with automatic annotation of slide presentations, one of the most common LO types.


INTRODUCTION

Over the past few years we have witnessed a tremendous amount of activity taking place in the development of Web-based e-learning systems (Mohan & Greer, 2003). A substantial percentage of those activities have been related to learning content authoring. As authoring of high-quality learning materials proved to be a highly expensive task in terms of both time and money, reuse of once-created learning content soon became one of the hottest research issues. Learning content represented in the form of reusable learning objects (LOs) promised to significantly reduce the time and cost of authoring high-quality learning materials, making them more affordable and readily available. The principal objective is to enable faster, cheaper, and better learning (Duval & Hodgins, 2003).

Current research efforts are almost exclusively oriented toward reusability of LOs in their entirety. Annotations of LOs with standard-compliant metadata sets (e.g., IEEE Learning Object Metadata [LOM, 2002] and Dublin Core) aim at enabling search and retrieval of existing LOs stored in LO repositories. Accordingly, metadata is seen as the primary means for fostering LO reusability. However, very often a content author needs to reuse just some specific parts of an LO, rather than the entire LO, for example, just a couple of slides out of a slide presentation, or an image or a table out of a text document. Faced with such a need, the content author typically turns to what we call the search-read-copy-paste approach. Specifically, the process of authoring new learning materials typically proceeds in the following steps: an author first searches both LO repositories and the Web to find potentially useful learning content. Then (s)he reads the retrieved materials to determine whether they really contain content relevant for the course under development. Having recognized relevant parts of the retrieved materials, the author copies/pastes them into the new materials (s)he is authoring. The process finishes by fine-tuning content units collected from different sources and optionally adding some new, original content. Obviously, this content authoring process demands a lot of time and effort. Additionally, it is not scalable in terms of maintenance (Verbert, Jovanović, Gašević, & Duval, 2005). (Semi-)automating reuse of individual components of LOs can improve the current practice by reducing the effort that content authors put into preparation of learning materials. However, an approach to such a kind of automation is still an open question.

To enable reusability of content units of varying granularity levels, an explicit definition of the LO’s structure is needed. Additionally, if the process of reusing content units has to be (semi-)automatic, the definition of the LO’s structure must be formally specified and expressed in a machine-understandable language. Furthermore, to facilitate search and retrieval of content units based on the semantics of their content, those content units must be semantically annotated, that is, semantic metadata must be attached to them. Ontologies and Semantic Web languages provide means to achieve both things.

In this paper we present our approach to automatic annotation of LOs’ components based on a number of ontologies. The approach is tested in TANGRAM, an integrated learning environment for the domain of Intelligent Information Systems (IIS).

PROBLEM STATEMENT

The objectives of this paper are:

• To present the rationale for using Semantic Web technologies, ontologies in particular, to annotate LOs and their components, and thus facilitate LO reusability at the component (content unit) level;

• To present how automatic semantic annotation is implemented in a practical learning environment, TANGRAM, developed by applying Semantic Web technologies to enable reusability at the level of LOs’ components;

• To discuss our experiences with automatic annotation of individual content units.

The principles we discuss are implementation-independent. On the other hand, their implementation in TANGRAM helped us reveal important practical details and problems we were not aware of initially.

The rest of the paper is structured to follow the order of the objectives stated above.

THE RATIONALE FOR SEMANTIC ANNOTATION OF LOS

The starting point in our approach to ontology-based LO annotation is the classification of ontologies relevant for the e-learning domain suggested by Stojanović, Staab, and Studer (2001). This classification recognizes the following types of ontologies: (1) structural ontologies that formalize the content structure; (2) context ontologies that specify the pedagogical/instructional role of the content; (3) content (domain) ontologies that formally describe the subject matter (topics) of learning content. In our approach an LO is represented by a structural ontology, whereas the other two types of ontologies are used to semantically annotate the LO. Figure 1 illustrates the proposed approach. Each LO is considered as an aggregate consisting of a number of content units/components. The components can differ in types and levels of granularity. The concepts of a Content Structure Ontology formally define different kinds of content units (e.g., slide, paragraph, list), whereas the properties of such an ontology enable formal expression of aggregation relationships between content units of different granularity and/or type. Additionally, each LO is annotated with a standards-compliant metadata set. Specifically, our proposal is based on the IEEE LOM standard (LOM, 2002). However, we argue for certain enhancements of the standard in order to make the metadata machine understandable. Therefore, we suggest using domain ontology concepts as values of the metadata element describing the content of an LO; for example, we assign concepts of an ontology for the IIS domain (Domain Ontology in Figure 1) to the dc:subject element of our standards-compliant metadata schema (see the section on “TANGRAM’s LOM RDF Binding Profile” for more details). In addition, the concepts from a context ontology (Edu-Context Ontology in Figure 1) are used to mark up LOs with their pedagogical/instructional roles (e.g., definition, illustration). The proposed approach also assumes attaching metadata to each component of an LO, thus making individual components searchable and reusable (this detail is left out of Figure 1 in order to avoid excessive cluttering).

Figure 1. An LO compliant to the proposed ontology-based approach

The rationale for using ontologies in the proposed approach is to enable Semantic Web reasoners to perform an advanced search of LO repositories. The advancement is reflected in the ability to search for content of a certain type (as defined in a context ontology, e.g., “definition”), dealing with a certain topic (from a domain ontology, e.g., “Semantic Web”), and being at a certain level of granularity (as defined in a structure ontology, e.g., “slide”). Besides the benefit of having a more convenient search mechanism that better reflects the searchers’ needs, another important benefit lies in the ability to (semi-)automatically compose the retrieved content units into a new LO compliant with the specific instructional approach of a content author.
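The three-facet search described above (pedagogical role, domain topic, granularity) can be illustrated with a minimal sketch. It uses plain Python objects rather than an ontology reasoner, and the `ContentUnit` class, the `search` function, and the sample repository are hypothetical names and data introduced only for illustration; they are not part of TANGRAM.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContentUnit:
    uri: str
    structure_type: str    # structure-ontology concept, e.g. "slide"
    role: str              # context-ontology concept, e.g. "definition"
    subjects: frozenset    # domain-ontology concepts, e.g. {"Semantic Web"}


def search(units, role=None, topic=None, structure_type=None):
    """Return the units that satisfy every facet the caller supplied."""
    hits = []
    for unit in units:
        if role is not None and unit.role != role:
            continue
        if topic is not None and topic not in unit.subjects:
            continue
        if structure_type is not None and unit.structure_type != structure_type:
            continue
        hits.append(unit)
    return hits


# Hypothetical repository content, for illustration only.
repository = [
    ContentUnit("lo1#slide3", "slide", "definition", frozenset({"Semantic Web"})),
    ContentUnit("lo1#slide4", "slide", "example", frozenset({"RDF"})),
    ContentUnit("lo2#par1", "paragraph", "definition", frozenset({"Semantic Web"})),
]

# "A definition of the Semantic Web, at slide granularity."
slide_definitions = search(repository, role="definition",
                           topic="Semantic Web", structure_type="slide")
print([unit.uri for unit in slide_definitions])
```

Leaving a facet as `None` relaxes it, so the same function covers queries such as "any definition, regardless of topic or granularity". A reasoner-backed search could additionally match subclasses or narrower topics, which this sketch does not attempt.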

The suggested approach is also relevant in terms of learning content personalization. An explicitly defined structure of an LO facilitates adaptation of the LO, as it enables direct access to each of its components and their tailoring to the preferences, objectives, competencies, and/or other specific features of a student that are relevant for the learning process. Moreover, being able to directly access components of an LO, we can dynamically create new, personalized learning content out of those components on the fly.

WHAT IS TANGRAM?

TANGRAM is an ancient Chinese moving-piece puzzle, consisting of seven geometric shapes that can be assembled in different ways to create more elaborate shapes. This ancient game perfectly reflects the basic notion of the approach we propose: building new content out of existing components and shaping that content differently to satisfy the specific needs of individual learners. Accordingly, we gave the name TANGRAM to the application we are developing to evaluate the feasibility of our approach. TANGRAM is a learning Web application intended to be useful to both content authors and students interested in the domain of IIS. In the rest of this section we first describe TANGRAM from two different viewpoints, content authors’ and students’, and then proceed with presenting its architecture.

What Does TANGRAM Provide to a Content Author?

TANGRAM’s aim is to enable content authors to create new LOs out of existing learning content with as few manual operations (copy, paste) as possible. To this end, TANGRAM aims at providing the following functionalities:

• Upload a new LO into the LO Repository with the idea of later being able to reuse its components. The uploaded LO is decomposed into smaller content units in accordance with the content structure ontology in use. The idea is to make each content unit directly accessible, thus facilitating its reuse.

• Describe the uploaded LO and its components with high-quality metadata, but without too much effort for the author. Annotation is based on a subset of the IEEE LOM metadata elements (LOM, 2002), namely only those elements that we found necessary to provide the intended functionalities of our system.

• Search the LO Repository for LOs and/or their components in order to employ them for composing new LOs.

• Compose a new LO using components previously retrieved from the repository.

To be able to use the system, an author has to register first. We made the registration mandatory in order to acquire a basic set of data about the author. Availability of such data facilitates generation of suggested values for metadata elements in the process of LO annotation.

What Does TANGRAM Provide to a Student?

TANGRAM provides adaptation of learning content to the specific needs of individual students. Currently, TANGRAM is focused on enabling personalized learning experiences for students interested in the domain of IIS. Two basic functionalities of the system from the students’ perspective are:

• Provision of learning content adapted to the student’s current level of knowledge of the domain concept of interest, his/her learning style, and other personal preferences.

• Quick access to a particular type of content about a topic of interest, for example, access to examples of RDF documents or definitions of the Semantic Web (both topics belong to the domain of IIS).

Just like a content author, a student must also register with the system during the first session. Through the registration procedure the system acquires information about the student sufficient to create an initial version of his/her profile (i.e., student model). The learner’s learning style is determined from a simplified version of the Felder and Silverman questionnaire¹, whereas for determining the learner’s initial knowledge about the IIS domain, the system relies on the learner’s self-assessment. The system uses this profile to keep track of the student’s preferences, learning style, as well as his/her level of knowledge about concepts from the IIS domain. With this data, the system is able to create personalized learning content (see the “Ontology-Based Approach to Personalization of Learning Content” section).

TANGRAM’s Architecture

Figure 2a illustrates TANGRAM’s architecture. As the figure suggests, TANGRAM has a modular architecture, comprising the following four main modules coordinated by the Coordinator module:

• Content Management Module is generally responsible for handling uploaded LOs and manipulating TANGRAM’s repository of LOs. Figure 2b illustrates the architecture of this module, whose main functionalities include: (a) Decomposition of an uploaded LO into content units of lower granularity levels, according to the content structure ontology (LO Disaggregator); (b) Automatic annotation of content units (Annotator): content units generated out of the uploaded LO are automatically annotated with metadata elements of TANGRAM’s IEEE LOM RDF Binding profile (see the section on “TANGRAM’s LOM RDF Binding Profile” for details). Concepts of appropriate ontologies (domain ontology and the ontology of pedagogical context), set as values of certain metadata elements, facilitate automatic interpretation of the semantics (i.e., meaning) of the content mark-up; (c) Storage of LOs in a format compliant to the applied content structure ontology (Storage Facilitator); (d) Semantic search of the repository and retrieval of content units of a specific type, and/or dealing with a specific domain topic (Search Engine).

• User Model (UM) Management Module is responsible for handling any kind of request for accessing and/or updating the repository of user models (profiles).

• Dynamic Assembly Module is in charge of dynamic (on-the-fly) generation of personalized learning content for a specific user (i.e., student). This module knows how to combine available content units (obtained from the Content Management Module) to form coherent learning content that best suits a particular student (i.e., the information that the system has about the student, acquired from the UM Management Module).

• User Interface Module handles interaction between the system and a user.

The current version of TANGRAM focuses exclusively on the content structure, decomposition, and annotation of slide presentations. Specifically, TANGRAM is presently able to handle only slide presentations authored in OpenOffice, but we are in the midst of providing the same support for the MS PowerPoint authoring tool. Our decision to focus first on slide presentations was motivated by the fact that teachers frequently opt for this type of LO when preparing learning content for in-class lectures and tutorials. A plethora of LOs of this kind is already available on the Web, providing a valuable source of learning content worth reusing. However, our intention is to use the acquired experience to enable decomposition and annotation of other types of LOs as well (e.g., MS Word, HTML).


TANGRAM is implemented in the Java programming language. It is built using Tapestry (http://jakarta.apache.org/tapestry), an open-source framework for creating dynamic, robust, highly scalable Web applications in Java. Additionally, Jena, the Java Semantic Web Framework (http://jena.sourceforge.net/), is used for storing, updating, and searching repositories of ontological instances, as well as for reasoning over the ontologies.

Figure 2. (a) TANGRAM’s architecture; (b) Content Management Module. TANGRAM’s architecture also comprises two repositories: (1) a repository of LOs (stored in a format compliant to the content-structure ontology) and their metadata (based on TANGRAM’s IEEE LOM RDF Binding profile); (2) a repository of user profiles represented in accordance with TANGRAM’s User Model ontology

ONTOLOGICAL FOUNDATION OF TANGRAM

TANGRAM is a fully ontology-based learning environment. In the following subsections we briefly present each of the ontologies upon which it is based. We also describe how the ontologies are used to support personalization of learning content in TANGRAM. All ontologies are expressed in the Web Ontology Language (OWL), W3C’s official recommendation for the standard ontology language. They are available at http://iis.fon.bg.ac.yu/TANGRAM/ontologies.html.

ALOCoM Content Structure Ontology

The ALOCoM Content Structure (ALOCoM CS) ontology is an extension of the Abstract Learning Object Content Model (ALOCoM) (Verbert, Klerkx, Meire, Najjar, & Duval, 2004) with certain concepts of IBM’s Darwin Information Typing Architecture (DITA)². The ontology defines a number of concepts for different types of content units that form the structure of an LO. The first version of the ontology is elaborated in Jovanović, Gašević, Verbert, and Duval (2005). However, having further studied existing LO content models and content packaging formats (e.g., SCORM Content Aggregation Model, CAM³; MPEG-21⁴), we made a major revision of the ontology and split it into two parts: an ontology of content structure and an ontology of educational content types.

The ALOCoM CS ontology distinguishes between content fragments (CFs), content objects (COs), and LOs. CFs, formalized as instances of the alocomcs:ContentFragment class, are content units in their most basic form (e.g., text, audio, and video), and cannot be further decomposed. COs, formally represented as instances of the alocomcs:ContentObject class, aggregate CFs and add navigation. Navigational elements enable sequencing of CFs in a CO. Besides CFs, COs can also include other COs. LOs (alocomcs:LearningObject) aggregate COs around a single learning objective. To enable more fine-grained content structuring we analyzed the structure of widely used content formats (primarily slide presentations and textual documents) and identified a number of specific content structuring types (e.g., slide, slide body, title, table). These types are included in the ontology as subclasses of the three root concepts (i.e., CFs, COs, and LOs). Finally, the ontology defines aggregation and navigational relationships between content units. Aggregation relationships are represented in the form of the alocomcs:hasPart property and its inverse, alocomcs:isPartOf. Navigational relationships are specified as the alocomcs:ordering property that defines the order of components in a CO or an LO in the form of an rdf:List. Figure 3 is a graphical representation of the ontology’s basic classes and properties.
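The aggregation and ordering relationships just described can be sketched as a small in-memory data structure. This is only an illustration in plain Python, not the OWL ontology itself: the `ContentUnit` class and the `ALLOWED_PARTS` table are hypothetical, and the containment rules they encode are an assumption read off the description above (CFs are atomic, COs aggregate CFs and other COs, LOs aggregate COs).

```python
# Which kinds of unit each kind may aggregate.
# Assumption: derived from the prose description, not from the ontology file.
ALLOWED_PARTS = {
    "ContentFragment": set(),
    "ContentObject": {"ContentFragment", "ContentObject"},
    "LearningObject": {"ContentObject"},
}


class ContentUnit:
    def __init__(self, uri, kind):
        if kind not in ALLOWED_PARTS:
            raise ValueError(f"unknown kind: {kind}")
        self.uri = uri
        self.kind = kind
        self.is_part_of = None   # mirrors alocomcs:isPartOf
        self.ordering = []       # mirrors alocomcs:ordering (an ordered list)

    @property
    def has_part(self):
        """Mirror of alocomcs:hasPart, derived from the ordering."""
        return list(self.ordering)

    def add_part(self, part):
        """Aggregate `part`, keeping hasPart, isPartOf and ordering in sync."""
        if part.kind not in ALLOWED_PARTS[self.kind]:
            raise ValueError(f"{self.kind} cannot aggregate {part.kind}")
        part.is_part_of = self
        self.ordering.append(part)
        return part


# A one-slide presentation: LO -> slide (CO) -> title (CF).
presentation = ContentUnit("lo1", "LearningObject")
slide = presentation.add_part(ContentUnit("lo1#slide1", "ContentObject"))
title = slide.add_part(ContentUnit("lo1#slide1/title", "ContentFragment"))
```

Storing the parts once, in order, and deriving the unordered `has_part` view from that list keeps the two relationships consistent by construction; in the ontology they are separate properties.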

ALOCoM Content Type Ontology

The ALOCoM Content Type (CT) ontology is also rooted in the ALOCoM model and has CF, CO, and LO as the basic, abstract content types. However, these concepts are now considered from the perspective of the pedagogical/instructional roles they might have. Therefore, concepts like Definition, Example, Exercise, and Reference are introduced as subclasses of the CO class, whereas concepts such as Tutorial, Lesson, and Test are some of the subclasses of the LO class (Figure 4). The CF class is not sub-classed because, according to the ALOCoM model (Verbert et al., 2004), an instructional role cannot be assigned to a single CF. Creation of this ontology was mostly inspired by a thorough examination of existing LO content models (such as SCORM [SCORM, 2004] or Learnativity [Wagner, 2002]) as well as by closely related work presented in Ullrich (2005). Concepts defined in the ontology are used to annotate LOs and their components with the pedagogical/instructional role(s) for which they were intended. One should note that a CO can be assigned multiple pedagogical roles, each one defined from a different perspective: rhetorical, cognitive, supporting (Figure 4).


Presently, the ALOCoM CT ontology has a rather simple structure. It is more a taxonomy than a real ontology, since it defines only a hierarchy of concepts without specifying any kind of relationships among them. Despite its simplicity, this ontology provided us with the means to formally state the identified pedagogical role(s) of LOs and their components. Nonetheless, our intention is to enrich the ontology with semantic properties as formal expressions of interrelations among different pedagogical roles, and hence enable an advanced level of reasoning.

Figure 3. ALOCoM Content Structure ontology — Basic classes and properties

Figure 4. Graphical representation of a part of the ALOCoM CT ontology


Domain Ontology

The SKOS Core ontology (http://www.w3.org/2004/02/skos/core/) is used as the basis of the IIS course domain ontology. SKOS Core is a member of the SKOS family of ontologies developed through W3C’s Simple Knowledge Organization System (SKOS) and the Semantic Web efforts. It is specifically developed to describe taxonomies and classification schemes and hence has a wide variety of properties to describe relationships between topics in a course.

We used an OWL binding of the SKOS Core ontology to formally represent a sub-domain of IIS⁵. Figure 5 illustrates a segment of the developed domain ontology. Each domain concept is represented as an instance of the skos:Concept class, while the conceptual scheme of the IIS domain is represented as an instance of the skos:ConceptScheme class. The skos:inScheme property is used to associate all defined instances of the skos:Concept class with the conceptual scheme of the IIS domain, that is, with the instance of the skos:ConceptScheme class as its formal representation. Likewise, each identified domain concept is assigned one or more aliases (terms typically used in the literature when referring to a concept) using the SKOS properties skos:prefLabel, skos:altLabel, and skos:hiddenLabel. SKOS semantic properties, that is, properties derived from the skos:semanticRelation property, enabled us to structure the IIS domain in a generalization hierarchy (via the skos:broader property and its inverse, skos:narrower), as well as to define semantic relations between concepts belonging to different branches of the hierarchy (via the skos:related property). We used the skos:hasTopConcept property to relate the most general domain topics (Intelligent Agents, Semantic Web, etc.) to the IIS concept scheme, thus formally stating that these concepts form the top level of the created concept hierarchy.
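The way these SKOS properties organize the domain can be illustrated with a minimal sketch. It models concepts as plain Python objects instead of RDF resources; the `Concept` class, the helper functions, and the sample concepts and their relations are hypothetical and chosen only for illustration (they are not taken from the actual IIS domain ontology).

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    pref_label: str                                  # skos:prefLabel
    alt_labels: list = field(default_factory=list)   # skos:altLabel
    broader: list = field(default_factory=list)      # skos:broader (by label)
    related: list = field(default_factory=list)      # skos:related (by label)


# Hypothetical fragment of a concept scheme, keyed by preferred label.
scheme = {c.pref_label: c for c in [
    Concept("Semantic Web"),
    Concept("Ontology Languages", broader=["Semantic Web"]),
    Concept("OWL", alt_labels=["Web Ontology Language"],
            broader=["Ontology Languages"]),
    Concept("RDF", alt_labels=["Resource Description Framework"],
            broader=["Semantic Web"], related=["OWL"]),
]}


def narrower(label):
    """Inverse of skos:broader: concepts that name `label` as broader."""
    return sorted(c.pref_label for c in scheme.values() if label in c.broader)


def top_concepts():
    """Concepts without a broader concept (what skos:hasTopConcept points to)."""
    return sorted(c.pref_label for c in scheme.values() if not c.broader)


def find_by_label(term):
    """Resolve a term to a concept via its preferred or alternative labels."""
    wanted = term.strip().lower()
    for concept in scheme.values():
        labels = [concept.pref_label, *concept.alt_labels]
        if wanted in (label.lower() for label in labels):
            return concept.pref_label
    return None
```

For example, `find_by_label("web ontology language")` resolves to the concept labelled "OWL", which shows why attaching several aliases to each concept is useful when terms found in learning content have to be mapped to domain concepts.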

One should note that the domain ontology does not contain any information regarding topic sequencing, in terms of the order in which the topics should be presented to the students. That kind of information is stored separately in the Learning Paths ontology.

Other Developed Ontologies

Besides the above-mentioned ontologies, TANGRAM’s functionalities also largely depend on the Learning Paths and the User Model ontologies. Since these two ontologies are not essential for the automatic annotation of content units, we just briefly explain them. For more details about these two ontologies and their roles in TANGRAM one can refer to Jovanović, Gašević, and Devedžić (2006).

The Learning Paths (LP) ontology defines learning trajectories through the topics defined in the domain ontology. We defined this ontology as an extension of the SKOS Core ontology that introduces three new properties: lp:requiresKnowledgeOf, lp:isPrerequisiteFor, and lp:hasKnowledgePonder. The first two are semantic properties defining prerequisite relationships between domain topics, whereas the third one defines the difficulty level of a topic on a scale from 0 to 1. The LP ontology relates instances of the domain ontology through an additional set of semantic relationships reflecting a specific instructional approach to teaching/learning IIS. The main benefit of decoupling the domain knowledge from the pedagogical knowledge is to enable reuse of the domain ontology: even if the applied pedagogical approach changes, the domain ontology remains intact.
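A minimal sketch of how the three LP properties might be represented and queried is given below. It uses plain dictionaries rather than OWL; the topics, the difficulty values, and the helper functions are hypothetical. Two assumptions are made that the text does not state explicitly: that lp:isPrerequisiteFor can be read as the inverse of lp:requiresKnowledgeOf, and that prerequisites may be followed transitively.

```python
# lp:requiresKnowledgeOf, stored as topic -> directly required topics.
# Topics and values are hypothetical, for illustration only.
requires_knowledge_of = {
    "OWL": ["RDF Schema"],
    "RDF Schema": ["RDF"],
    "RDF": ["XML"],
    "XML": [],
}

# lp:hasKnowledgePonder: difficulty of a topic on the scale from 0 to 1.
knowledge_ponder = {"XML": 0.2, "RDF": 0.4, "RDF Schema": 0.6, "OWL": 0.8}


def is_prerequisite_for(topic):
    """Inverse view (assumed): topics that directly require `topic`."""
    return sorted(t for t, reqs in requires_knowledge_of.items() if topic in reqs)


def all_prerequisites(topic):
    """Direct and indirect prerequisites, nearest first (transitivity assumed)."""
    seen = []
    stack = list(requires_knowledge_of.get(topic, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.append(current)
            stack.extend(requires_knowledge_of.get(current, []))
    return seen


def ponders_are_valid():
    """Check that every difficulty value lies on the 0..1 scale."""
    return all(0.0 <= value <= 1.0 for value in knowledge_ponder.values())
```

Because the prerequisite data lives apart from the topic hierarchy, a different instructional approach would replace only these two dictionaries, which mirrors the decoupling argument made above.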

The User Model (UM) ontology formally represents relevant information about TANGRAM users (both content authors and students). To enable interoperability with other learning applications and exchange of users’ data, we based the ontology on the official specifications for user modeling: IEEE PAPI Learner (http://edutool.com/papi/) and IMS LIP (http://www.imsglobal.org/profiles/). Specifically, the ontology focuses only on those elements of the specifications that proved to be essential for TANGRAM’s functionality.


Ontology-Based Approach to Personalization of Learning Content

In this section we briefly explain how TANGRAM leverages the synergy of the presented ontologies and the automatically generated semantic annotations to dynamically build personalized learning content. In other words, having presented TANGRAM’s ontological foundation, we are able to provide more details on the TANGRAM functionalities introduced in the “What Does TANGRAM Provide to a Student?” section.

A learning session starts after a user (registered and authenticated as a learner) selects a sub-domain of IIS to learn about. The system verifies the learner’s knowledge of the chosen sub-domain using the data stored in the learner’s model, the IIS domain ontology, and the LP ontology. Specifically, the LP ontology is queried for the prerequisite topics for the selected sub-domain (i.e., topics related via the lp:requiresKnowledgeOf property to the sub-domain’s topics). Subsequently, the learner model is queried for the learner’s level of knowledge about the topics of the selected sub-domain as well as the identified set of prerequisite concepts. The acquired information enables TANGRAM to build a visual representation of the sub-domain (i.e., its hierarchical organization of concepts) in the form of an annotated tree of links, exploiting link annotation and link hiding techniques (Brusilovsky, 1998). One should note that TANGRAM does not aim to make a choice for a learner. Instead, the system provides adaptive guidance to direct the learner toward the most appropriate topics for him/her, but eventually lets him/her decide on the topic to learn and the content from which to learn.

After the learner selects a domain concept from the topics tree, on-the-fly assembly of learning content begins. First, TANGRAM’s repository of LOs is searched for content units covering the selected domain topic. The search is based on the dc:subject metadata element of the content units stored in the repository. If content units on the selected topic are not available, the learner’s model is consulted for the learner’s learning style, specifically for its Sequential-Global dimension. If the learner is described as a global learner, preferring a holistic approach and learning best when provided with a broad context of the topic of interest (Felder & Silverman, 1988), content units covering advanced topics (as specified in the LP ontology) are used instead. Otherwise⁶, the system informs the learner that learning content on the selected topic is currently unavailable and suggests other suitable topics. Subsequently, the retrieved content units are grouped according to the same-parent-LO criterion (following the containment hierarchy via the content units’ alocomcs:isPartOf property). Then, exploiting the alocomcs:ordering property of the group’s parent LO, each group is sorted to reflect the original order of content units. Each sorted group (referred to as an assembly in the subsequent text) is assigned a relevancy: a decimal number between 0 and 1 that reflects its compliance with the learner’s model, that is, its relevancy for the learner. The computation of an assembly’s relevancy is based on the data stored in the learner’s model, such as the learner’s learning style, preferences, and learning history. The assemblies are then sorted according to the calculated relevancy, and their descriptions are presented to the learner. The description of an assembly is the value of the dc:description metadata element attached to the LO from which the content of the assembly originates. When the learner selects an assembly from the list, the system presents its content using its generic form for presentation of dynamically assembled learning content. Finally, the learner model is updated.

Figure 5. A segment of the domain ontology describing the concepts of the Semantic Web
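The grouping and ranking of content units into assemblies can be sketched as follows. This is only a minimal illustration under stated assumptions, not TANGRAM’s code: the ContentUnit class and its fields are hypothetical stand-ins for the alocomcs:isPartOf and alocomcs:ordering properties, and the relevancy function is passed in as a parameter because the text does not give the actual computation.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToDoubleFunction;

final class AssemblySketch {

    /** Hypothetical stand-in for a content unit retrieved from the repository. */
    static final class ContentUnit {
        final String id;
        final String parentLo;  // value reached via alocomcs:isPartOf (assumed flattened to a string)
        final int position;     // position derived from the parent's alocomcs:ordering (assumed an int)

        ContentUnit(String id, String parentLo, int position) {
            this.id = id;
            this.parentLo = parentLo;
            this.position = position;
        }
    }

    /** Groups units by parent LO, restores the original order in each group, then ranks the groups. */
    static List<List<ContentUnit>> buildAssemblies(
            List<ContentUnit> units,
            ToDoubleFunction<List<ContentUnit>> relevancy) {
        // Step 1: group by the same-parent-LO criterion
        Map<String, List<ContentUnit>> byParent = new LinkedHashMap<>();
        for (ContentUnit cu : units) {
            byParent.computeIfAbsent(cu.parentLo, k -> new ArrayList<>()).add(cu);
        }
        // Step 2: sort each group to reflect the original order of its units
        List<List<ContentUnit>> assemblies = new ArrayList<>(byParent.values());
        for (List<ContentUnit> assembly : assemblies) {
            assembly.sort(Comparator.comparingInt(cu -> cu.position));
        }
        // Step 3: most relevant assembly first (relevancy is assumed to lie in [0, 1])
        assemblies.sort(Comparator.comparingDouble(relevancy).reversed());
        return assemblies;
    }
}
```

Passing the relevancy function as an argument keeps the sketch independent of how the learner’s learning style, preferences, and learning history are actually combined.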

ANNOTATION OF CONTENT UNITS

The majority of metadata required for annotation of an LO are directly (manually) supplied by the content author when uploading the LO to the repository (Figure 6). In other words, LOs are semi-automatically annotated. However, annotation of LOs’ components is fully automated. In this section we first present the profile of the LOM RDF Binding that we developed to annotate content units in TANGRAM and then explain the automatic generation of metadata for LOs’ components.

TANGRAM’s LOM RDF Binding Profile

Each content unit should be annotated in order to be more easily searchable and thus reusable. Annotations of content units in TANGRAM are based on the IEEE LOM standard. However, since TANGRAM is envisioned as an application for the Semantic Web, that is, a Web aimed at both human and machine consumption, all the data it deals with needed to be presented in a machine-comprehensible format. This means that not only content units but also their metadata must be expressed in a Semantic Web language. Accordingly, our starting point was the official proposal for the IEEE LOM RDF Binding specification (Nilsson, 2002). However, we do not use all LOM elements, but only the subset necessary to support the intended functionalities of the system. In other words, we created an application profile of the LOM RDF Binding. Figure 7 illustrates the elements of the profile that we made for TANGRAM.

All metadata elements presented in Figure 7 are fully compliant with the LOM RDF Binding specification, except for two elements: learning resource type and classification.

Learning Resource Type. We introduced the alocom-meta:type property instead of the rdf:type property that the LOM RDF Binding suggests for specifying the learning resource type of a content unit. The reason for this deviation from the official proposal is the following: we introduced the alocom-meta:Metadata class to represent a metadata set attached to a content unit. Since an instance of this class, representing one particular set of metadata, already has its own rdf:type property set (pointing to the alocom-meta:Metadata class, see Figure 14b), adding another property of the same type but with different semantics (type of learning resource) would cause confusion. Furthermore, we do not use the LOM restricted vocabulary as values of this element, as it mixes concepts of an instructional nature (e.g., Exercise, Simulation, Experiment) with concepts of a technical nature (e.g., Diagram, Graph, Image) (Ullrich, 2005). Instead, we use concepts of the ALOCoM CT ontology, as they describe instructional aspects of content units, thus properly reflecting the semantics of this metadata element.

Classification. We use the dc:subject property to point to a concept from the domain ontology. This does not fully conform to the IEEE LOM RDF Binding, since our domain ontology is not an LOM Taxonomy, as the binding specification requires. Furthermore, we use lom-cls:accessibilityRestrictions to specify the features of a student’s learning style that the LO is suitable for. The IEEE LOM RDF Binding specification defines this property without imposing any specific restrictions on the domain of its use. The application of this property in TANGRAM was inspired by the work of Dolog, Gavriloaie, Nejdl, and Brase in the ELENA project (Dolog et al., 2003). They used this property to annotate an LO with access requirements expressed in terms of the knowledge/competencies a student needs to have in order to access the LO. As Figure 7 shows, the range of the lom-cls:accessibilityRestrictions property in TANGRAM’s LOM RDF Binding profile is restricted to instances of the tangram-um:LearningStyle class defined in TANGRAM’s User Model ontology.

Details of Automatic Annotation of LOs and their Components in TANGRAM

Automatic annotation of LOs’ components is performed by the Content Management Module (see Figure 2) as the final step in the process of uploading a new LO to TANGRAM’s repository of LOs. It is preceded by the decomposition step, during which a submitted LO is disaggregated into its constituent content units. Since the ALOCoM CS ontology provides an explicit definition of the LO content structure, formally specifying both LO components and the relationships between those components, it served as the foundation for the disaggregation process. This decomposition can also be regarded as a metadata generation process, since it provides content units with implicit, structure-related metadata.

Figure 6. Screenshot of TANGRAM’s page for annotation of uploaded LOs

The process of automatic annotation of LOs’ components is mostly based on a top-down approach, meaning that the metadata describing the components of an LO are derived from the metadata assigned to their parent LO. The peculiarities of this top-down approach can be summarized as follows:

• The values of some metadata elements are literally copied from an LO to its components. This is how values are assigned to the dc:creator, dcterms:created, and dc:language metadata elements, referring to the author(s), date of creation, and language(s) of a content unit, respectively.

• Some metadata elements of TANGRAM’s LOM RDF Binding profile are meaningful only in the context of an LO as a whole. Therefore, they are not assigned to components smaller than LOs. Those metadata elements are lom-edu:difficulty (the difficulty level of an LO) and lom-cls:accessibilityRestrictions (referring to the learning styles that an LO is particularly suitable for).

• The values of the other metadata elements of a content unit are mined from its content and presentational context. In the next subsection we explain in detail the automatic generation of values for those metadata elements.

Mining Metadata Values

dc:title element

Among COs, the dc:title metadata element is assigned only to those of the alocomcs:Slide and alocomcs:SlideBody types; for the other types of COs covered by the current version of TANGRAM, this metadata element is not meaningful. We use the text of the slide’s title to assign a value to the dc:title metadata element of the corresponding alocomcs:Slide and alocomcs:SlideBody instances. If a slide does not have a title, these instances will not have the dc:title element in their metadata set.

Figure 7. TANGRAM’s profile of the IEEE LOM RDF Binding

Among the CFs that TANGRAM deals with, the dc:title metadata element is assigned only to one type, namely to instances of the alocomcs:Image class. We have noticed that authors of slide presentations very rarely use captions to describe the content (semantics) of the images appearing on their slides. To fill this gap, we generate a textual value that reflects the semantics of the content of an image and could serve as its caption. Therefore, if a slide (i.e., slide body, to be more precise) contains an image, we generate a “caption” for the image using the following template: “Figure <ordinal_num>. illustrating <title_of_the_slide>”. The generated value is assigned to the dc:title metadata element of the corresponding alocomcs:Image instance.
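As a small illustration of the caption template, the substitution can be written as a one-line helper. The class and method names are hypothetical, and the sketch assumes the slide has a title, since the text does not say what is generated for an image on an untitled slide.

```java
final class CaptionSketch {

    /** Fills the template "Figure <ordinal_num>. illustrating <title_of_the_slide>". */
    static String imageCaption(int ordinalNum, String slideTitle) {
        return "Figure " + ordinalNum + ". illustrating " + slideTitle;
    }
}
```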

Finally, it is worth mentioning that dc:title is one of the metadata elements of LOs that we automatically generate a value for. Its value is generated from the title of the whole slide presentation. Still, the author is given a chance to modify the generated value.

dc:subject element

To semantically annotate a CO with concept(s) from the domain ontology, we apply a simple text mining approach. The starting point is the concept(s) of the domain ontology that the author used to semantically mark up the LO whose components (i.e., constituent COs) we intend to annotate. To illustrate this approach, let us assume that the CO to be annotated is the slide shown in Figure 8a. Additionally, we assume that this slide originates from a slide presentation (i.e., LO) manually marked up (using the user interface shown in Figure 6) with the ‘Semantic Web’ concept of the domain ontology. More technically, this means that the LO’s dc:subject metadata element was assigned a reference to the iis:semweb instance of the domain ontology (presented in the center of Figure 5).

The first step is to query the domain ontology for concepts that are semantically related to the starting domain concept(s) (the concept of the Semantic Web in our example). We consider domain concepts semantically related if they are interconnected via the skos:semanticRelation property and/or its subproperties: skos:narrower, skos:broader, or skos:related (see the “Domain Ontology” section for more details). The retrieved concepts and their aliases, that is, labels assigned to them as values of the skos:prefLabel, skos:altLabel, and skos:hiddenLabel properties, are stored in a hashmap and serve as the basis of the subsequent steps of the annotation process. Each entry of the created hashmap consists of a key (the URI of the domain concept) and a value (a list of the concept’s aliases retrieved from the domain ontology). In our example, the hashmap would contain one entry for each domain concept that is narrower in meaning than the Semantic Web concept (specifically, the domain concepts with URIs iis:semweb01, iis:semweb02, and iis:semweb03), as well as for those that are otherwise semantically related to the Semantic Web through the skos:related property (e.g., iis:knowrep002, iis:appl01). We refer readers to Figure 5, which illustrates the segment of the domain ontology discussed in this example and thus makes the example easier to follow. One entry of the hashmap from the example would have the form shown in Figure 9.

Figure 8. Recognition of domain concepts in a slide’s content: (a) slide to be semantically marked-up; (b) a segment of the slide’s metadata: inferred values for the dc:subject element

(a)

Semantic Markup

• Objectives and effects of semantic markup of Web contents
• authoring tools should let Web page authors create markup through selections and forms
• authors should be able to:
  • choose ontologies from a list
  • choose attributes and relations from another list
  • edit, add, remove, and merge ontologies

(b)

<rdf:Description rdf:about="http://alocom/metadata.owl#meta-Slide130074962749_356">
  <!-- ... -->
  <dc:subject>http://tangram/iis-domain.owl#semweb02</dc:subject>
  <dc:subject>http://tangram/iis-domain.owl#knowrep002</dc:subject>
  <!-- ... -->
</rdf:Description>

Subsequently, each component of the slide containing text is searched for the aliases stored in the hashmap, and if some of them are found, the component (i.e., CO or CF) is annotated with the domain concepts to which the aliases refer. Afterward, we apply a bottom-up approach to generate a value for the slide’s dc:subject element: the slide is annotated with the union of the concepts assigned to its components. Figure 8b presents a segment of the metadata set assigned to the slide from the example (Figure 8a). As the figure shows, the slide is annotated with two domain concepts, the Semantic Annotation concept (iis:semweb02) and the Ontology concept (iis:knowrep002), since aliases of those concepts are identified in the slide’s content. If no concept can be mined from the CO’s content, the CO is annotated with the concepts attached to the parent LO during the process of manual annotation.
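The alias lookup and the bottom-up union can be sketched as below. This is a minimal sketch under stated assumptions: the text does not specify how aliases are matched, so the case-insensitive substring test is an assumption, and the class and method names are hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

final class SubjectMiningSketch {

    /** Returns the URIs of all concepts with at least one alias occurring in the text. */
    static Set<String> conceptsIn(String text, Map<String, List<String>> aliasesByConcept) {
        String haystack = text.toLowerCase(Locale.ROOT);
        Set<String> found = new LinkedHashSet<>();
        for (Map.Entry<String, List<String>> entry : aliasesByConcept.entrySet()) {
            for (String alias : entry.getValue()) {
                // Assumption: plain case-insensitive substring matching
                if (haystack.contains(alias.toLowerCase(Locale.ROOT))) {
                    found.add(entry.getKey());
                    break; // one matching alias is enough for this concept
                }
            }
        }
        return found;
    }

    /**
     * Bottom-up step: the slide's subjects are the union of its components' subjects.
     * If nothing can be mined, the subjects of the parent LO are used instead.
     */
    static Set<String> slideSubjects(List<String> componentTexts,
                                     Map<String, List<String>> aliasesByConcept,
                                     Set<String> parentLoSubjects) {
        Set<String> union = new LinkedHashSet<>();
        for (String text : componentTexts) {
            union.addAll(conceptsIn(text, aliasesByConcept));
        }
        return union.isEmpty() ? new LinkedHashSet<>(parentLoSubjects) : union;
    }
}
```

A production matcher would probably need tokenization or stemming to avoid spurious substring hits; the sketch deliberately leaves that out.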

For CFs that do not contain text at all, like CFs of the alocomcs:Image type, this approach is not applicable. Currently, in the absence of a better solution, such CFs directly inherit the value of the dc:subject metadata from the COs in which they are aggregated.

Furthermore, the slide presented in Figure 10a can help us explain the combined top-down and bottom-up approach we apply to provide values for the dc:subject metadata element of TANGRAM’s LOM RDF Binding profile. Observing the figure, one can notice that the domain concept(s) that best describe(s) the semantics of the slide’s content can only be inferred from the title of the slide (the title contains one of the aliases of a domain concept). Performing text analysis of the slide’s title, we can identify XML as a domain concept that should be assigned to the dc:subject metadata element of the title as a content unit (i.e., to the instance of the alocomcs:Title class, as the ontological equivalent of the slide’s title). As we have explained, applying the bottom-up approach we assign the same concept to the dc:subject metadata element of the slide that aggregates the title. Next, we analyze the content of the slide’s body and each of its components, and find that we cannot identify any concept of the domain ontology. Therefore, we apply the top-down approach, meaning that the XML domain concept, previously included in the slide’s metadata set (via the bottom-up approach), is now used to semantically mark up the components that the slide aggregates. Specifically, in this example only the semantic annotation of the paragraph aggregated in the slide’s body is really relevant, since it is the only component of the presented slide that will potentially be reused.
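The combined bottom-up and top-down propagation can be illustrated with the following sketch. It assumes that the concepts mined from each component are already available as a map from a (hypothetical) component name to a set of concept URIs; the names used here are illustrative only.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

final class PropagationSketch {

    /** Bottom-up: the slide receives the union of the concepts mined from its components. */
    static Set<String> slideSubjects(Map<String, Set<String>> minedByComponent) {
        Set<String> union = new LinkedHashSet<>();
        for (Set<String> mined : minedByComponent.values()) {
            union.addAll(mined);
        }
        return union;
    }

    /** Top-down: components for which nothing was mined inherit the slide's concepts. */
    static Map<String, Set<String>> propagate(Map<String, Set<String>> minedByComponent) {
        Set<String> fromSlide = slideSubjects(minedByComponent);
        Map<String, Set<String>> result = new LinkedHashMap<>();
        for (Map.Entry<String, Set<String>> entry : minedByComponent.entrySet()) {
            Set<String> mined = entry.getValue();
            result.put(entry.getKey(),
                    new LinkedHashSet<>(mined.isEmpty() ? fromSlide : mined));
        }
        return result;
    }
}
```

For a slide like the one in Figure 10a, a title annotated with the XML concept and a paragraph with no mined concept would both end up annotated with XML.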

Figure 9. An example entry of the hashmap used in the annotation process

key: iis:semweb02 value: {Semantic Annotation, Semantic Markup, Semantic Description, Ontology-based Annotation}


alocom-meta:type element

This metadata element is used to annotate LOs and COs, but not CFs, since according to the ALOCoM model (Verbert et al., 2004) an instructional role cannot be assigned to a single CF.

Due to the lack of well-defined formats for representing learning content of a certain instructional role (e.g., an explicit format for representing definitions), we opted for a heuristics-based approach to infer the instructional role of learning content units. The heuristics that we use are partially founded on our previous joint research efforts with the ARIADNE group (http://www.ariadne-eu.org/) from K.U. Leuven, Belgium. Together, we did some initial research aimed at defining patterns for recognizing content units having the instructional role of alocomct:Definition, alocomct:Example, and alocomct:Reference (Verbert, Jovanović, Gašević, Duval, & Meire, 2005). These patterns are defined using the experience discussed in Liu, Chin, and Ng (2003). Here, we explain how content units of type alocomct:Example are recognized in TANGRAM. Figure 11 presents the patterns that we use to check whether a content unit is an example of a certain domain concept. In other words, these patterns enable us to test whether a content unit can be marked up with the alocomct:Example concept as its instructional role (i.e., the value of the alocom-meta:type metadata element).

It is important to note that the patterns shown in Figure 11 enable us to identify content units indicating the appearance of an example. In other words, they help us recognize a content unit that precedes an example. Figures 10a and 10b further explain the approach. Figure 10a shows a typical organization of a slide presenting an example of a domain concept: the title of the slide gives information about the domain concept that the example refers to, while the slide’s body actually contains the example. To be more precise, the first component of the slide’s body is a list (an instance of alocomcs:List) with only one list item (an instance of alocomcs:ListItem) that, according to pattern number 4 from Figure 11, would be classified as an example (i.e., as having the instructional role of an example). However, such a conclusion would obviously be incorrect. It is the subsequent component of the slide’s body, a paragraph in this case (an instance of the alocomcs:Paragraph concept), that should be classified as an example. On the other hand, it would hardly be possible to deduce the instructional role of this paragraph just by analyzing the text it contains. Fortunately, its structural context gives us this valuable information. In the same manner, we defined and applied patterns to recognize definitions.
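Two of the patterns from Figure 11 and the structural rule described above (the unit that follows the announcing unit is the example) can be sketched with regular expressions. This is an illustrative approximation, not the patterns as implemented in TANGRAM: the regular expressions, the treatment of the ordinal number as digits, and all names are assumptions.

```java
import java.util.List;
import java.util.regex.Pattern;

final class ExamplePatternSketch {

    /** Pattern 2: {for instance | e.g. | for example | as an example} [, | :] */
    static final Pattern PATTERN_2 = Pattern.compile(
            "\\b(for instance|e\\.g\\.|for example|as an example)\\s*[,:]?",
            Pattern.CASE_INSENSITIVE);

    /** Pattern 4: {Example | example} [ord.num.] [of {concept}] [- | :] */
    static Pattern pattern4(String conceptAlias) {
        return Pattern.compile(
                "^\\s*[Ee]xample(\\s+\\d+)?(\\s+of\\s+" + Pattern.quote(conceptAlias)
                        + ")?\\s*[-:]?\\s*$");
    }

    /** True if the text announces that an example follows. */
    static boolean announcesExample(String text, String conceptAlias) {
        return pattern4(conceptAlias).matcher(text).matches()
                || PATTERN_2.matcher(text).find();
    }

    /**
     * Structural rule: the component right after the announcing one is taken
     * to be the example. Returns its index, or -1 if none is found.
     */
    static int exampleIndex(List<String> bodyComponents, String conceptAlias) {
        for (int i = 0; i + 1 < bodyComponents.size(); i++) {
            if (announcesExample(bodyComponents.get(i), conceptAlias)) {
                return i + 1;
            }
        }
        return -1;
    }
}
```

For a slide body consisting of the list item “Example 2” followed by a paragraph, exampleIndex points at the paragraph, which mirrors the reasoning given for Figure 10a.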

The slide presented in Figure 10a is suitable for explaining another specific feature of our approach to annotation of content units. When this slide is uploaded (as a part of the presentation from which it originates) to TANGRAM’s repository of LOs, the Content Management Module actually stores an instance of the alocomcs:Slide class, as well as an appropriate ontological instance for each component constituting the structure of this slide (alocomcs:Title, alocomcs:SlideBody, alocomcs:List,...). Figure 10b provides a graphical representation of the content uploaded to the repository (the slide structured according to the ALOCoM CS ontology). Furthermore, the Content Management Module uploads metadata to the repository: metadata for the slide as a whole, as well as metadata for each of the slide’s components that can be reused. One should note that metadata are not literally stored for every component of the slide. Instead, we store metadata only for those content units that are really reusable, in the sense that we can realistically expect that someone will be interested in retrieving them from the repository and reusing them. For example, in the case of the slide from Figure 10a, only the metadata assigned to the slide as a whole and to the paragraph containing the text of the example will be uploaded to the repository (the ellipses highlighted in Figure 10b). The rationale is that it is highly unlikely that someone would be interested in reusing a content unit that contains only the text “Example 2” or a content unit holding the title of the slide.


Figure 10. A typical organization of a slide presenting an example of a domain concept (a); the same slide in the ALOCoM CS ontology compliant representation (b)

Besides this pattern-based approach, we apply the following simple heuristics to determine the instructional role of slides (COs of type alocomcs:Slide):

• If the title of a slide contains one of the following terms: “Content,” “Outline,” or “Overview,” and the content of the slide’s body is presented in the form of a list of items, the slide is assumed to have an instructional role of the type alocomct:Overview. Similarly, if the title of a slide is “Summary” or “Conclusion,” while the content of the slide’s body is structured in the form of a list, the alocom-meta:type metadata element of that slide is assigned a reference to the alocomct:Summary concept.

• If we can identify an alias of a certain domain concept(s) in the text of the slide’s title, and the slide’s body contains only an image (i.e., one or more CFs of the alocomcs:Image type), the slide is assumed to be an illustration of the domain concept(s) identified in the slide’s title. Therefore, the slide is marked up with alocomct:Illustration as its instructional role.

• If the content of the slide’s title is one of the following terms/phrases: “Bibliography,” “References,” or “Reference list,” while the content of the slide’s body is structured as a list, the instructional role of the slide is presumed to be of type alocomct:Bibliography. Additionally, each list item appearing in the slide’s body is assumed to be of the alocomct:Reference instructional type.
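The three heuristics above can be summarized in a small decision function. The sketch below is only an approximation under stated assumptions: the BodyKind enumeration and the boolean flag for a recognized domain concept in the title are hypothetical simplifications of what would be derived from the slide’s ALOCoM CS instances, and title comparison is assumed to be case-insensitive.

```java
import java.util.Locale;
import java.util.Optional;

final class SlideRoleSketch {

    /** Coarse, hypothetical description of the structure of a slide's body. */
    enum BodyKind { LIST, IMAGES_ONLY, OTHER }

    /** Returns the inferred instructional role, or an empty Optional if no heuristic applies. */
    static Optional<String> instructionalRole(String title, BodyKind body,
                                              boolean titleNamesDomainConcept) {
        String t = title == null ? "" : title.trim().toLowerCase(Locale.ROOT);
        if (body == BodyKind.LIST) {
            // Title *contains* one of the overview terms
            if (t.contains("content") || t.contains("outline") || t.contains("overview")) {
                return Optional.of("alocomct:Overview");
            }
            // Title *is* one of the summary terms
            if (t.equals("summary") || t.equals("conclusion")) {
                return Optional.of("alocomct:Summary");
            }
            // Title *is* one of the bibliography terms
            if (t.equals("bibliography") || t.equals("references") || t.equals("reference list")) {
                return Optional.of("alocomct:Bibliography");
            }
        }
        if (body == BodyKind.IMAGES_ONLY && titleNamesDomainConcept) {
            return Optional.of("alocomct:Illustration");
        }
        return Optional.empty();
    }
}
```

Annotating each list item of a bibliography slide with alocomct:Reference would be a separate step applied to the slide’s components and is not shown here.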

dc:description element

We generate a value for the dc:description metadata element of a content unit starting from the (inferred) values of other elements of its metadata set. This element is automatically generated both for COs and for LOs, that is, it is one of the metadata elements that are automatically generated even for LOs. Figure 12a shows the template used for generating a description of an LO, that is, a value for the LO’s dc:description metadata element. Note that the metadata elements appearing in angle brackets in the template are replaced by their actual values. Curly brackets indicate that the enclosed element can have multiple values, as the example in Figure 12b illustrates. The figure presents the automatically generated description for the LO from which the slide shown in Figure 10 originates.

To generate a value for the dc:description element of a CO, we apply the template shown in Figure 12c. Note that the element “sketch of the parent LO” in the template refers to a concise version of the template presented in Figure 12a. More precisely, it is a part of an LO’s description made according to the following template: “A <alocom-meta:type> with title: ‘<dc:title>’ authored by <dc:creator>; creation date <dcterms:created>.” Figure 12d shows the automatically generated value for the dc:description metadata element of the slide from Figure 10.

Figure 11. Patterns applied in TANGRAM for recognizing examples

1. {example, instance, case, illustration, sample, specimen} [of {concept}] [:]
2. {for instance | e.g. | for example | as an example} [, | :]
3. {concept} {is | are} [adverb] {illustrated by | demonstrated by | shown by} [:]
4. {Example | example} [ord.num.] [of {concept}] [- | :]

Figure 12. dc:description metadata element, templates (a, c), and examples (b, d)

(a) “A <alocom-meta:type> with title: ‘<dc:title>’ authored by <dc:creator>; creation date <dcterms:created>; evaluated by the author as being of <lom-edu:difficulty> difficulty level and treating issues of {<dc:subject>}”

(b) “A tutorial with title: ‘Languages for the Semantic Web’ authored by Vladan Devedzic; creation date 22-09-2004; evaluated by the author as being of lom-edu:MediumDifficult difficulty level and treating issues of XML, XML Schema, RDF, RDF Schema”.

(c) “A <alocom-meta:type> of {<dc:subject>}; originating from <sketch of the parent LO>”

(d) “A example of XML originating from tutorial with title ‘Languages for the Semantic Web’ authored by Vladan Devedzic; creation date 22-09-2004.”
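Filling the CO template of Figure 12c amounts to simple string substitution, as the following sketch shows. The class and method names are hypothetical, and the sketch follows the templates literally, so its output differs in minor punctuation from the generated value shown in Figure 12d.

```java
import java.util.List;

final class DescriptionSketch {

    /** Concise LO description used as the "sketch of the parent LO". */
    static String parentLoSketch(String type, String title, String creator, String created) {
        return "A " + type + " with title: '" + title + "' authored by " + creator
                + "; creation date " + created;
    }

    /** Figure 12c: "A <alocom-meta:type> of {<dc:subject>}; originating from <sketch of the parent LO>". */
    static String contentUnitDescription(String type, List<String> subjects, String parentSketch) {
        // Multiple dc:subject values are joined with commas, as in Figure 12b
        return "A " + type + " of " + String.join(", ", subjects)
                + "; originating from " + parentSketch;
    }
}
```

Because the article “A” is part of the fixed template text, a role such as “example” yields “A example”, which matches the generated values shown in Figures 12d and 14b.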

dc:identifier element

Each instance of the ALOCoM CS ontology that represents either an LO or a content unit of an LO (CF, CO, or one of their subclasses) is assigned a unique identifier. The identifier is stored in the form of the dc:identifier metadata. To generate a unique value for this metadata element, we use a modified version of the algorithm proposed in Vaucher and Ncho (2004). Figure 13 presents a method of the tangram.utility.BasicUtilities class that we use to generate content units’ IDs.

The top part of Figure 14 shows the OWL XML binding of the slide presented in Figure 10; the bottom part of the figure shows the slide’s metadata. As the figure shows, a slide is related to its metadata set (an instance of the alocom-meta:Metadata class) via the alocom-core:metadata property. Additionally, each slide is related to its components through the alocom-core:hasPart and alocom-core:ordering properties, whereas the alocom-core:isPartOf property is used to establish the relationship between the slide and its parent LO (i.e., slide presentation). The same formalism is used to store all other types of content units (and their metadata) in TANGRAM’s LO repository.

Figure 13. Method for generating a unique value for a content unit’s identifier

public class BasicUtilities {

    // Counter shared by all instances; its value is appended to every generated ID
    protected static int cidCounter = 0;

    // Builds an ID from the content unit type, this object's hash code,
    // the current time in milliseconds modulo 10000, and the shared counter
    public String genComponentID(String cuType) {
        String cidBase = cuType + hashCode() + System.currentTimeMillis() % 10000 + "_";
        return cidBase + (cidCounter++);
    }
}

Figure 14. An instance of the alocomcs:Slide class in XML syntax (a), and its metadata compliant to TANGRAM’s LOM RDF Binding profile (b)

(a)

<rdf:Description rdf:about="http://alocom/content-structure/slide.owl#Slide130074962749_356">
  <alocom-core:metadata rdf:resource="http://alocom/metadata.owl#meta-Slide130074962749_356"/>
  <alocom-core:hasPart rdf:resource="http://alocom/content-structure/slide.owl#Title130074962749_358"/>
  <alocom-core:hasPart rdf:resource="http://alocom/content-structure/slide.owl#SlideBody130074962749_357"/>
  <alocom-core:ordering rdf:nodeID="A333"/>
  <rdf:type rdf:resource="http://alocom/content-structure/slide.owl#Slide"/>
  <alocom-core:isPartOf rdf:resource="http://alocom/content-structure/slide.owl#SlidePres130074962368_0"/>
</rdf:Description>

(b)

<rdf:Description rdf:about="http://alocom/metadata.owl#meta-Slide130074962749_356">
  <dc:format rdf:resource="http://ltsc.ieee.org/2002/09/lom-educational#mime03"/>
  <dc:title>XML</dc:title>
  <lom-edu:difficulty rdf:resource="http://ltsc.ieee.org/2002/09/lom-educational#medium_difficult"/>
  <dc:subject>http://tangram/iis-domain.owl#xmltech01</dc:subject>
  <alocom-meta:type rdf:resource="http://alocom/content-model.owl#Example"/>
  <dc:identifier rdf:resource="http://alocom/metadata.owl#/id/Slide130074962749_356"/>
  <dc:creator rdf:nodeID="A98"/>
  <dc:language rdf:resource="http://ltsc.ieee.org/2002/09/lom-educational#lang01"/>
  <rdf:type rdf:resource="http://alocom/metadata.owl#alocom-metadata"/>
  <dc:description>I contain a example of XML originating from: A tutorial with title:"Languages for The Semantic Web" authored by Vladan Devedzic</dc:description>
</rdf:Description>

EVALUATION

We evaluated TANGRAM’s annotation subsystem. Although limited in scope, the evaluation helped us identify strengths and weaknesses of the current solution. The evaluation was primarily focused on slides, as the content units that proved to be the most reusable. Additionally, we were primarily interested in evaluating the precision of the techniques and heuristics applied for semantic markup of learning content, since semantic metadata proved to be the most relevant for automating content reuse. Accordingly, recognition of domain concept(s) and instructional role(s) of content units was the central point of the conducted evaluation. The evaluation consisted of the following two parts:

1. Quantitative evaluation using the well-known information retrieval measures precision and recall, and involving human subjects to specify the reference standard;

2. Qualitative evaluation involving human subjects who provided their comments on the recognized concepts in both types of ontology-based recognition.

Test Set

In both evaluations of TANGRAM’s semantic annotation subsystem, we used a set of 54 slide presentations, altogether consisting of 1,674 slides. The slide presentations were authored by members of both the GOOD OLD AI Research Group of the University of Belgrade and the Laboratory for Ontological Research of Simon Fraser University. They were developed for different purposes, such as teaching undergraduate and graduate courses, conference presentations, tutorials, and invited talks. All the slide presentations cover topics captured by the discussed domain ontology of IIS.

Quantitative Evaluation

The standard information retrieval evaluation measures, precision and recall, have also been widely adopted by the Semantic Web community for evaluating different tasks, such as semantic annotation (Cimiano, Ladwig, & Staab, 2005) and ontology alignment (Ehrig & Euzenat, 2005). In order to perform the evaluation using this approach, we first had to define a reference standard, that is, a predefined model against which the results of TANGRAM’s annotation subsystem are compared.

Reference Standard

A reference standard is usually defined by human experts. However, it is very hard to reach full agreement among domain experts on different classification decisions (Calvo, Lee, & Li, 2004). In order to define as reliable a reference standard as possible, we asked three human subjects to collaboratively annotate all slides from the sample with respect to the domain and ALOCoM CT ontologies. That is, they had to reach a consensual decision on how each slide from the sample is to be annotated with respect to both ontologies.

Definition of Evaluation Measures

Given the set of answers in the reference standard (R) and the set of answers produced by the system (A), precision (Pre) is defined as the ratio of the number of correct answers (|R ∩ A|) to the total number of answers (|A|), while recall (Rec) is the ratio of the number of correct answers (|R ∩ A|) to the number of answers defined in the reference standard (|R|). Formally, they are defined as follows:

\[
\mathrm{Pre} = \frac{\text{correct answers}}{\text{total answers}} = \frac{|R \cap A|}{|A|} \tag{1}
\]

\[
\mathrm{Rec} = \frac{\text{correct answers}}{\text{answers in reference standard}} = \frac{|R \cap A|}{|R|} \tag{2}
\]
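As a minimal illustration of definitions (1) and (2), the two measures can be computed directly from sets of annotations. The function name and all concept identifiers other than iis:xmltech01 are hypothetical and serve only as an example; this is a sketch of the definitions, not TANGRAM’s evaluation code.

```python
def precision_recall(reference, answers):
    """Return (precision, recall) for two collections of annotations."""
    reference, answers = set(reference), set(answers)
    correct = reference & answers  # R ∩ A
    precision = len(correct) / len(answers) if answers else 0.0
    recall = len(correct) / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical annotations for one slide (identifiers are illustrative).
reference = {"iis:xmltech01", "iis:rdf01", "iis:owl01"}
answers = {"iis:xmltech01", "iis:rdf01", "iis:kdd01", "iis:ml01"}

pre, rec = precision_recall(reference, answers)
print(f"Pre = {pre:.2f}, Rec = {rec:.2f}")
```

Here two of the four system answers are in the reference standard, so precision is 2/4, while recall is 2/3 because the reference standard holds three answers.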

Findings

In Table 1 we give the averaged values of precision and recall obtained by annotating the analyzed test set using TANGRAM’s semantic annotation subsystem. We chose to use microaveraging, where average precision and recall are calculated by summing over all individual decisions for each specific slide (Sebastiani, 2002). That is to say, the annotation of each slide is given equal weight.
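The following sketch shows one way microaveraging can be computed: the counts of correct answers, system answers, and reference answers are summed over all slides before the ratios are taken. A macroaveraged precision is included only for contrast. The function names and the toy data are assumptions made for illustration.

```python
def micro_average(slides):
    """Micro-averaged (precision, recall) over (reference, answers) pairs."""
    correct = sum(len(set(r) & set(a)) for r, a in slides)
    total_answers = sum(len(set(a)) for _, a in slides)
    total_reference = sum(len(set(r)) for r, _ in slides)
    precision = correct / total_answers if total_answers else 0.0
    recall = correct / total_reference if total_reference else 0.0
    return precision, recall


def macro_precision(slides):
    """Mean of per-slide precision values, shown only for contrast."""
    scores = [len(set(r) & set(a)) / len(set(a)) for r, a in slides if a]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical data: (reference annotations, system annotations) per slide.
slides = [
    ({"a", "b", "c"}, {"a", "b", "c", "d"}),  # 3 of 4 answers correct
    ({"e"}, {"f"}),                           # 0 of 1 answers correct
]

micro_pre, micro_rec = micro_average(slides)
macro_pre = macro_precision(slides)
print(micro_pre, micro_rec, macro_pre)
```

With these toy values the microaveraged precision is 3/5 and the microaveraged recall is 3/4, whereas the macroaveraged precision is (0.75 + 0) / 2, which shows how the two averaging schemes can diverge.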


The reason why recall for COs’ (i.e., slides’) annotations with domain ontology concepts has the value 1 is that TANGRAM’s annotation algorithm (see the “Mining Metadata Values” section) attaches the domain ontology concept(s) of the parent LO (i.e., slide presentation) if no concept can be mined from the CO’s content. Therefore, slides are always annotated with respect to the domain ontology. Of course, a slide typically covers a narrower concept than its parent (i.e., a concept related via the skos:narrower property in the domain ontology; see the “Domain Ontology” section), or a concept that is in some other way semantically related to the one assigned to its parent (via the skos:related property of the domain ontology). It may also happen that the topic of a slide (due to its specificity) is not included in the domain ontology at all. For example, a presentation might be annotated with the “KDD”⁷ concept, while one of its slides might define “belief-driven evaluation” and how it is used to verify the relevancy of the knowledge acquired in a KDD process. However, as the domain ontology does not define a concept of “belief-driven evaluation,” TANGRAM’s annotation subsystem is ignorant of this domain concept (all its domain knowledge comes from the employed domain ontology) and hence is not able to determine the slide’s semantics by mining the slide’s content. As a result, the slide is annotated with the concept of “KDD” assigned to the slide’s parent, which is not actually incorrect, but rather not precise enough. However, as the standard evaluation measures cannot distinguish between fully correct answers and almost correct answers (Ehrig & Euzenat, 2005), we decided to count such answers as correct when computing recall. Still, we thought it important to see the influence of such almost correct answers on precision, so we counted them as incorrect when computing precision. Specifically, we had 1494 fully correct answers instead of the 1674 correct answers used for calculating recall. The result was a somewhat lower precision (0.89) than recall, but one still very competitive with the relevant Semantic Web annotators (Uren, Cimiano, Iria, Handschuh, Vargas-Vera, Motta, & Ciravegna, 2006). It should also be noted that the annotation with domain ontology concepts is considerably influenced by the set of domain concepts that an LO author assigns to the LO when annotating it (during the upload procedure). The more precise that initial annotation is, the greater the chance that the domain topics assigned to the LO’s components are satisfactorily precise. This also leads us to the conclusion that we should start exploring the use of more advanced text categorization techniques in order to reduce the strong influence of the manually made annotations of LOs.
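The fallback behavior described above can be sketched as follows. This is only an illustration under stated assumptions: the function name, the plain case-insensitive substring matching, and the identifier iis:kdd01 are hypothetical, and the actual mining algorithm is the one described in the “Mining Metadata Values” section.

```python
def annotate_slide(slide_text, concept_aliases, parent_concepts):
    """Return (concepts, inherited) for one slide.

    concept_aliases maps a concept URI to its aliases. Matching is a plain
    case-insensitive substring test, which is an assumption of this sketch.
    """
    text = slide_text.lower()
    mined = {
        uri
        for uri, aliases in concept_aliases.items()
        if any(alias.lower() in text for alias in aliases)
    }
    if mined:
        return mined, False
    # Nothing recognized: inherit the concepts assigned to the parent LO.
    return set(parent_concepts), True


# Hypothetical ontology fragment; "iis:kdd01" is an invented identifier.
aliases = {"iis:kdd01": ["KDD", "knowledge discovery in databases"]}
parent = {"iis:kdd01"}

concepts, inherited = annotate_slide(
    "Belief-driven evaluation of discovered patterns", aliases, parent
)
print(concepts, inherited)
```

Because the slide text matches no alias, the slide inherits the parent’s concept. Returning the `inherited` flag makes it possible to tell such “almost correct” annotations apart from concepts mined from the slide itself.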

From Table 1 it is obvious that the value of recall is much lower for pedagogical role recognition than for domain concept recognition, as the manually submitted annotation of parent LOs (e.g., slide presentations) could not be applied to child COs (i.e., slides). This is due to the different content types that are allowed to be assigned to LOs and COs according to the ALOCoM CT ontology (shown in Figure 4). The value of recall shown in Table 1 can be slightly increased (to 0.75) if we consider the fact that title slides of slide presentations actually do not have a pedagogical role. Precision for pedagogical role recognition has almost the same value as precision of the domain concept recognition. Precision of the pedagogical role recognition is much higher than the average precision (0.6123) given in the paper (Liu et al., 2003) that was used as an inspiration for TANGRAM’s pedagogical role annotation algorithm. This is probably due to the fact that authors of slide presentations use far fewer presentational patterns than the authors of the many different types of resources on the Web. However, it also shows that there is a lot of room for further evaluation of TANGRAM’s approach in other domains (especially those not related to computer science) as well as on other types of content units.

Table 1. Precision and recall for the analyzed test set

Type of annotation    Recall              Precision
Domain ontology       1 (1674/1674)       0.89 (1494/1674)
Pedagogical role      0.72 (1080/1503)    0.88 (1080/1224)
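The ratios reported in Table 1 can be re-derived from the counts given there, as in the short sketch below. The adjusted recall of 0.75 is reproduced by removing one title slide per presentation (54 in total) from the reference standard; that reading is an assumption of this sketch, since the text does not state how the adjustment was computed.

```python
# Counts reported in Table 1: (correct answers, denominator).
table_1 = {
    ("Domain ontology", "recall"): (1674, 1674),
    ("Domain ontology", "precision"): (1494, 1674),
    ("Pedagogical role", "recall"): (1080, 1503),
    ("Pedagogical role", "precision"): (1080, 1224),
}
ratios = {key: round(c / d, 2) for key, (c, d) in table_1.items()}

# Answers inherited from the parent LO that were counted as correct for
# recall but as incorrect for precision.
almost_correct = 1674 - 1494

# Assumption: one title slide per presentation (54 presentations), each
# counted in the reference standard for pedagogical roles.
title_slides = 54
adjusted_recall = round(1080 / (1503 - title_slides), 2)

for key, value in ratios.items():
    print(key, value)
print("almost correct:", almost_correct, "adjusted recall:", adjusted_recall)
```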

Note also that the most frequently occurring type of slide is the one that explains a domain topic. It is not a definition in a literal sense, but it does in some way define the topic under discussion. This raises a natural question: how should it be classified? We decided to classify it as having definition as its pedagogical role, which resulted in a large number of definitions. This issue is discussed further in the next subsection on qualitative evaluation.

Qualitative Evaluation

To evaluate the effectiveness of the patterns and heuristics we used for recognizing pedagogical role(s) of LOs’ content units, we asked the authors of the slide presentations from the sample to determine the instructional role of each slide they authored. Then we compared their responses with the values that TANGRAM automatically generated to describe instructional roles of the same slides. The system generally proved satisfactorily effective, except for recognizing definitions. This can be explained by the different understanding of the term definition among the interviewed content authors: it turned out that some of them had a strict, “mathematical” approach to this term, whereas others understood a definition as any text that either formally or informally defines a concept from the subject domain. As we had the latter view in mind when formulating patterns for definition mining, it can be said that TANGRAM is well capable of recognizing this kind of content unit.

Additionally, we used the same sample of slide presentations to determine how effectively TANGRAM infers the semantics of content units, that is, the concepts of the subject domain to which they refer. Again, the evaluation was based on a comparative analysis of the authors’ and the system’s “perception” of the semantics of the slides from the sample. We noticed that the system has a problem differentiating between two domain concepts if an alias of one concept is a part of an alias of another concept. For example, the domain concept with URI iis:xmltech01 has “XML” as one of its aliases, whereas the concept with URI iis:xmltech02 has an alias “XML Schema.” When the system encounters a content unit comprising the phrase “XML Schema,” it assigns both the iis:xmltech01 and iis:xmltech02 concepts to the dc:subject metadata of the content unit. We are currently exploring how text mining techniques (e.g., part-of-speech taggers) can help us solve this problem. It is important to note that the algorithm for inferring the semantics of content units is completely ignorant (and independent) of the subject domain; all its knowledge comes from the applied domain ontology. Therefore, the same algorithm can be used to infer the semantics of content from any other domain, provided that an ontology of that specific domain is available.
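The alias-overlap problem can be reproduced with a few lines of code. The sketch below first shows a naive substring matcher that assigns both concepts, and then a longest-match-first heuristic that lets a longer alias claim its text before shorter aliases are tried. The heuristic is a commonly used alternative introduced here for illustration; it is not the part-of-speech-based solution the authors mention, and the function names are hypothetical.

```python
import re


def naive_concepts(text, concept_aliases):
    """Assign every concept that has any alias occurring in the text."""
    lowered = text.lower()
    return {
        uri
        for uri, aliases in concept_aliases.items()
        if any(alias.lower() in lowered for alias in aliases)
    }


def longest_match_concepts(text, concept_aliases):
    """Match aliases longest-first and skip text already claimed."""
    pairs = sorted(
        ((alias, uri) for uri, als in concept_aliases.items() for alias in als),
        key=lambda pair: len(pair[0]),
        reverse=True,
    )
    claimed = [False] * len(text)
    found = set()
    for alias, uri in pairs:
        pattern = re.compile(r"\b" + re.escape(alias) + r"\b", re.IGNORECASE)
        for match in pattern.finditer(text):
            span = range(match.start(), match.end())
            if not any(claimed[i] for i in span):
                found.add(uri)
                for i in span:
                    claimed[i] = True
    return found


aliases = {
    "iis:xmltech01": ["XML"],
    "iis:xmltech02": ["XML Schema"],
}
slide = "Validating documents with XML Schema"

naive = naive_concepts(slide, aliases)
only_schema = longest_match_concepts(slide, aliases)
both = longest_match_concepts("XML documents are validated with XML Schema", aliases)
print(naive, only_schema, both)
```

On the first slide the naive matcher returns both concepts, while the longest-match variant returns only iis:xmltech02. When “XML” also occurs on its own, as in the second text, both concepts are still assigned.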

RELATED WORK

The KIM platform and framework provide a novel Knowledge and Information Management infrastructure and services for automatic semantic annotation, indexing, and retrieval of documents (Popov, Kiryakov, Kirilov, Manov, Ognyanoff, & Goranov, 2003). The platform is based on the PROTON ontology (http://proton.semanticweb.org), a lightweight upper-level ontology developed in the scope of the SEKT⁸ project, as well as on two KIM-specific ontologies: KIM System Ontology and KIM Lexical Ontology. Additionally, KIM is equipped with a Knowledge Base (KB) providing extensive coverage of entities of general importance. The platform comprises an infrastructure for information extraction and ontology-based annotation. This infrastructure is based on the General Architecture for Text Engineering, GATE (http://gate.ac.uk), which has proved to be a mature, extensible, and application-independent framework for information extraction and other natural language processing tasks. The advantage of our approach over the one on which KIM is based is that we automatically generate values for a variety of metadata elements aimed at content markup, and do not focus only on the subject matter of the content as KIM does. On the other hand, KIM applies much more elaborate text mining techniques than we do.

PiggyBank lets Web users extract individual information items from within Web pages and save them in a Semantic Web format (i.e., RDF), together with their metadata (Huynh, Mazzocchi, & Karger, 2005). The items, collected from different Web sites, can then be manipulated (browsed, searched, sorted, organized, etc.) together, regardless of their origins and types. On sites that do not publish RDF, PiggyBank can invoke screen-scrapers to restructure information found within their Web pages into RDF format. Semantic Bank is a repository of RDF triples to which a community of PiggyBank users can contribute and share the information they have collected. The core idea of PiggyBank is similar to ours: collect information resources from various sources, present them in an ontology-based format, and annotate them with metadata. The primary motivation is the need to repurpose such information in order to cater to the individual user’s needs and preferences. Unlike PiggyBank, which targets Web pages and Web sites, we focus on learning resources presented in the form of slide presentations. However, we plan to extend our system to other types of LOs in our future research.

Semi-automatic annotation of learning resources based on document layout features is proposed in Dehors, Faron-Zucker, Stromboni, and Giboin (2005). The approach presumes that each content author has a specific pedagogical approach that is reflected in the structure and layout features of the documents he/she creates. The annotation task begins by interviewing the author of a document, in order to determine the relations between the employed presentational features and the envisioned educational approach. Subsequently, a phase of content re-authoring takes place to ensure that the employed visual features are compliant with the established instructional model. Only then is it possible to automatically identify and annotate content units according to their pedagogical role. The employed pedagogical ontology is generated on the fly and includes concepts that formalize elements of a content author’s specific pedagogical strategy. Although this approach tends to be more precise in recognizing instructional roles of content units than the approach we propose, it is also more restrictive, as it requires content authors to strictly adhere to the once-established authoring styles. Additionally, it requires more human effort: interviewing the author and content re-authoring. Finally, learning resources are annotated only with their instructional roles, whereas we use a range of metadata elements to annotate them.

The Automatic Metadata Generation (AMG) framework (Cardinaels, Meire, & Duval, 2005) is aimed at automatic annotation of LOs with metadata compliant with the IEEE LOM schema. Unlike our system, AMG does not enable semantic annotation of LOs; it cannot formally represent the semantics of LOs. Therefore, the metadata that it generates are intended only for human consumption: they cannot be comprehended and used by intelligent agents or any other piece of software.

Context-driven and Pattern-based ANnotation through Knowledge On the Web (C-PANKOW) is a method for automatic semantic annotation of Web content. The main idea is to approximate semantics by considering information about the statistical distribution of certain syntactic structures over the Web (Cimiano et al., 2005). Ambiguity, an important problem in such an approach, is tackled by taking into account the context in which the entity to be annotated appears. C-PANKOW applies much more elaborate techniques for content annotation than our approach does. However, it focuses only on identifying domain topics in analyzed documents, while we also aim at formally representing the structure of analyzed documents and mining the pedagogical role(s) of the content units comprising that structure. The common feature of the two approaches is that both are quite successful when focused on a specific domain (formalized in a domain ontology), while not as effective for domain-neutral annotations (when working with a general-purpose ontology [i.e., WordNet, http://wordnet.princeton.edu/], C-PANKOW produces significantly poorer results [Cimiano et al., 2005]). Additionally, both are currently restricted to a specific document format: C-PANKOW focuses on HTML documents, whereas TANGRAM focuses on slide presentations.

KNOWITALL is an autonomous, domain-independent system that automates the process of extracting large collections of facts from the Web (Etzioni, Cafarella, Downey, Kok, Popescu, Shaked, et al., 2004a). The only domain-specific input to KNOWITALL is a set of classes and relations to set its focus; no manually tagged training set is required. Information extraction is performed in two stages: (1) a set of domain-independent extraction patterns is used to generate candidate facts, and (2) the plausibility of the candidate facts is evaluated using the pointwise mutual information (PMI)⁹ measure. The authors have provided three extensions to the baseline system, namely rule learning, subclass extraction, and list extraction, hence improving the overall performance of the system (Etzioni, Cafarella, Downey, Kok, Popescu, Shaked, et al., 2004b). The approach of Etzioni et al. can be viewed as being orthogonal to ours: while we are concerned with annotating a given document with appropriate domain concepts, Etzioni et al. aim at learning the complete extension of a certain concept in order to build a search engine “knowing it all.” On the other hand, we believe that their work on automatic learning of domain-specific rules (i.e., patterns) can be equally well applied to solving one of the main challenges we are faced with: inferring pedagogical roles of content units originating from different domains and authoring styles.

Recent research from the knowledge capture field also seems very relevant to the described approach. Carenini, Ng, and Zwart (2005) reported on their experience in extracting knowledge from evaluative text. They tried to employ WordNet for discovering similarities between a domain taxonomy and users’ comments on products, written in plain text. A similar approach could be applied in TANGRAM when automatically annotating content units with respect to the domain ontology.

Finally, we briefly report what a recent comprehensive study on the present state of semantic annotation (Uren et al., 2006) has to say about automatic annotation. The study recognizes three general categories of automation approaches. The most common one uses manually written rules (patterns or wrappers), hence relying on the structure of documents (i.e., texts) for inferring proper markup. Our approach belongs to this category, as do the previously mentioned KIM and PiggyBank systems. The other two kinds of systems apply diverse machine-learning approaches to learn how to annotate content. Supervised systems learn from sample sets of manually marked-up documents. Their main disadvantage is that picking enough good examples is a nontrivial and error-prone task. Unsupervised systems (the third category) are starting to tackle this challenge by exploiting unsupervised learning techniques (e.g., C-PANKOW, KNOWITALL). Uren et al. also identify the present research challenges, among which relation extraction and annotation of multimedia documents (images, audio, video) are the most notable.

CONCLUSION

Aiming at reducing the cost of authoring high-quality learning materials, we first developed an ontological foundation, called ALOCoM, for describing the structure and pedagogical role(s) of LOs and their components. Subsequently, we extended that foundation by developing an approach aimed at automatic annotation of LOs and their content units. In this paper we present the developed approach, which relies on both ALOCoM and domain ontologies. The general principles of ontology-based annotation of LOs’ content units are implemented in TANGRAM, our learning environment for the domain of Intelligent Information Systems. Although the proposed approach is illustrated on a specific domain (Intelligent Information Systems), it is domain independent and can be applied to any other domain just by using another domain ontology. A brief description and demonstration of TANGRAM’s functionalities, as well as the ontologies referred to in the paper, can be found at http://iis.fon.bg.ac.yu/TANGRAM/home.html

Our future work will be directed toward improving existing functionalities of TANGRAM’s annotation subsystem and augmenting it with additional ones required for recognition of pedagogical roles not included in the current solution. Specifically, our intention is to empower TANGRAM with advanced features of the latest frameworks for natural language processing and information extraction tasks, such as the already mentioned GATE and KIM (Popov et al., 2003) frameworks, as well as MontyLingua (http://web.media.mit.edu/~hugo/montylingua), an end-to-end natural language processor for English with common sense. Furthermore, we plan to explore the potential of learning designs and other formal educational modeling languages to serve as sources of context-related metadata for LOs and their content units. We believe that context-relevant metadata can be derived from descriptions of the learning processes in which LOs have actually been used or are intended to be used.

REFERENCES

1484.12.1 IEEE standard for learning object metadata. (2002, June). Retrieved October 15, 2005, from http://ltsc.ieee.org/wg12

Brusilovsky, P. (1998). Adaptive hypertext and hypermedia. The Netherlands: Kluwer Academic Publishers.

Calvo, R. A., Lee, J. M., & Li, X. (2004). Managing content with automatic document classification. Journal of Digital Information, 5(2), Article No. 282. Retrieved December 20, 2005, from http://jodi.tamu.edu/Articles/v05/i02/Calvo/

Cardinaels, K., Meire, M., & Duval, E. (2005). Automating metadata generation: The simple indexing interface. In Proceedings of the 14th International WWW Conference, Chiba, Japan (pp. 548-556).

Carenini, G., Ng, R. T., & Zwart, E. (2005). Extracting knowledge from evaluative text. In Proceedings of the 3rd International Conference on Knowledge Capture, Banff, Canada (pp. 11-18).

Cimiano, P., Ladwig, G., & Staab, S. (2005). Gimme’ the context: Context-driven automatic semantic annotation with C-PANKOW. In Proceedings of the 14th International WWW Conference, Chiba, Japan (pp. 332-341).

Dehors, S., Faron-Zucker, C., Stromboni, J., & Giboin, A. (2005). Semi-automated semantic annotation of learning resources by identifying layout features. In Proceedings of the International Workshop on Applications of Semantic Web Technologies for E-Learning, Amsterdam, The Netherlands (pp. 65-69).

Dolog, P., Gavriloaie, R., Nejdl, W., & Brase, J. (2003). Integrating adaptive hypermedia techniques and open RDF-based environments. In Proceedings of the 12th International WWW Conference, Budapest, Hungary (pp. 88-98).

Duval, E., & Hodgins, W. (2003). A LOM research agenda. In Proceedings of the 12th International WWW Conference, Budapest, Hungary (pp. 1-9).

Ehrig, M., & Euzenat, J. (2005). Relaxed precision and recall for ontology matching. In Proceedings of the KCAP2005 Workshop on Integrating Ontologies, Banff, Canada (pp. 25-32).

Etzioni, O., Cafarella, M., Downey, D., Kok, S., Popescu, A.-M., Shaked, T., et al. (2004a). Web-scale information extraction in KnowItAll (preliminary results). In Proceedings of the 13th World Wide Web Conference, Budapest, Hungary (pp. 100-109).

Etzioni, O., Cafarella, M., Downey, D., Kok, S., Popescu, A.-M., Shaked, T., et al. (2004b). Methods for domain-independent information extraction from the Web: An experimental comparison. In Proceedings of the 19th AAAI National Conference on Artificial Intelligence, San Jose, CA (pp. 391-398).

Felder, R., & Silverman, L. (1988). Learning and teaching styles in engineering education. Journal of Engineering Education, 78(7), 674-681.

Huynh, D., Mazzocchi, S., & Karger, D. (2005). PiggyBank: Experience the Semantic Web inside your Web browser. In Proceedings of the 4th International Semantic Web Conference, Galway, Ireland (pp. 413-430).

Jovanović, J., Gašević, D., & Devedžić, V. (2006). Dynamic assembly of personalized learning content on the Semantic Web. In 3rd European Semantic Web Conference. Budva, Serbia & Montenegro (submitted).

Jovanović, J., Gašević, D., Verbert, K., & Duval, E. (2005). Ontology of learning object content structure. In Proceedings of the 12th International Conference on Artificial Intelligence in Education, Amsterdam, The Netherlands (pp. 322-329).

Liu, B., Chin, C. W., & Ng, H. T. (2003). Mining topic-specific concepts and definitions on the Web. In Proceedings of the 12th International WWW Conference, Budapest, Hungary (pp. 251-260).

Mohan, P., & Greer, J. (2003). Reusable learning objects: Current status and future directions. In Proceedings of the 15th World Conference on Educational Multimedia, Hypermedia and Telecommunications, Honolulu, HI (pp. 257-264).

Nilsson, M. (Ed.). (2002). IEEE learning object metadata RDF binding. Retrieved October 10, 2005, from http://kmr.nada.kth.se/el/ims/md-lomrdf.html

Popov, B., Kiryakov, A., Kirilov, A., Manov, D., Ognyanoff, D., & Goranov, M. (2003). KIM - Semantic annotation platform. In Proceedings of the 2nd International Semantic Web Conference, Florida (pp. 834-849).

Sebastiani, F. (2002). Machine learning in automated text categorization. ACM Computing Surveys, 34(1), 1-47.

Sharable Content Object Reference Model (SCORM). (2004). Retrieved October 5, 2005, from http://www.adlnet.org/scorm/index.cfm

Stojanović, L. J., Staab, S., & Studer, R. (2001). E-learning based on the Semantic Web. In Proceedings of the 6th World Conference on the WWW and the Internet, Orlando, FL.

Ullrich, C. (2005). The learning-resource-type is dead, long live the learning-resource-type! Learning Objects and Learning Designs, 1(1), 7-15.

Uren, V., Cimiano, P., Iria, J., Handschuh, S., Vargas-Vera, M., Motta, E., & Ciravegna, F. (2006). Semantic annotation for knowledge management: Requirements and a survey of the state of the art. Journal of Web Semantics, 4(1), 14-28.

Vaucher, J., & Ncho, A. (2004). JADE tutorial and primer. Retrieved July 5, 2005, from http://www.iro.umontreal.ca/~vaucher/Agents/Jade/JadePrimer.html

Verbert, K., Jovanović, J., Gašević, D., & Duval, E. (2005). Repurposing learning object components. In Proceedings of the OTM 2004 Workshop on Ontologies, Semantics and E-learning, Agia Napa, Cyprus (pp. 1169-1178).

Verbert, K., Jovanović, J., Gašević, D., Duval, E., & Meire, M. (2005). Towards a global component architecture for learning objects: A slide presentation framework. In Proceedings of the 17th World Conference on Educational Multimedia, Hypermedia and Telecommunications, Montreal, Canada (pp. 1429-1436).

Verbert, K., Klerkx, J., Meire, M., Najjar, J., & Duval, E. (2004). Towards a global component architecture for learning objects: An ontology based approach. In Proceedings of the OTM 2004 Workshop on Ontologies, Semantics and E-learning, Agia Napa, Cyprus (pp. 713-722).

Wagner, E. (2002). Steps to creating a content strategy for your organization. The E-Learning Developers’ Journal. Retrieved November 20, 2005, from http://www.elearningguild.com/pdf/2/102902MGT-H.pdf

Winter, M., Brooks, C., & Greer, J. (2005). Towards best practices for Semantic Web student modelling. In Proceedings of the 12th International Conference on Artificial Intelligence in Education, Amsterdam, The Netherlands.

ENDNOTES

1 The questionnaire is known as “Index of Learning Styles” and is available at http://www.engr.ncsu.edu/learningstyles/ilsweb.html.

2 http://www.oasis-open.org/committees/dita

3 http://www.adlnet.org/

4 http://www.chiariglione.org/mpeg/standards/mpeg-21/mpeg-21.htm

5 SKOS Core OWL binding is presented in Winter, Brooks, and Greer (2005): http://ai.usask.ca/mums/schemas/2005/01/27/skos-core-dl.owl

6 The learner is more sequential in his/her learning style, hence tends to be confused/disoriented if the topics are not presented in a linear fashion (Felder & Silverman, 1988).

7 KDD stands for Knowledge Discovery in Databases.

8 SEKT (Semantically-Enabled Knowledge Technologies): http://www.sekt-project.com/

9 The PMI measure can be roughly defined as the ratio of the number of search engine hits obtained by querying with the discriminator phrase (e.g., “Liege is a city”) to the number of hits obtained by querying with the extracted fact (e.g., “Liege”).

Jelena Jovanović (http://iis.fon.bg.ac.yu/Jelena) received a BS and an MS in computer science from the Department of Computer Science, University of Belgrade, Serbia and Montenegro, in 2003 and 2005, respectively. She is a PhD student and a teaching assistant at the Department of Computer Science, University of Belgrade, Serbia and Montenegro. Her major research interests include Semantic Web technologies, learner modeling, adaptive learning environments, reusability of learning objects and learning designs, and automatic metadata generation. She is currently pursuing her PhD thesis in the area of personalized learning on the Semantic Web. She is a member of the GOOD OLD AI research group.

Dragan Gašević (http://www.sfu.ca/~dgasevic) received his BS, MS, and PhD in computer science from the Department of Computer Science, University of Belgrade, Serbia and Montenegro, in 2000, 2002, and 2004, respectively. He is a postdoctoral fellow at the Laboratory for Ontological Research, School of Interactive Arts and Technology, Simon Fraser University Surrey, Canada. His current research interests are in the area of ontologies, Semantic Web, integration between model-driven engineering and ontology engineering techniques, and technology-enhanced learning. So far, he has authored/co-authored more than 120 research papers, several book chapters, and two books. He has been serving on editorial and reviewing boards of several international journals. He has also been a program committee member and reviewer of several international conferences.


Vladan Devedžić (http://fon.fon.bg.ac.yu/~devedzic) received his BS, MS, and PhD in computer science from the Department of Computer Science, University of Belgrade, Serbia and Montenegro, in 1982, 1988, and 1993, respectively. He is a professor of computer science at the Department of Information Systems and Technologies, FON - School of Business Administration, University of Belgrade, Serbia and Montenegro. His main research interests include software engineering, intelligent systems, knowledge representation, ontologies, Semantic Web, intelligent reasoning, and applications of artificial intelligence techniques to education and medicine. So far, he has authored/co-authored more than 260 research papers, several book chapters, and books. He has been serving on editorial and reviewing boards of several international journals. He has also been a chair, PC member, and referee for many international conferences.