6
www.computer.org/intelligent Building a Semantic Wiki Adam Souzis Vol. 20, No. 5 September/October 2005 This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder. © 2005 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE. For more information, please see www.ieee.org/portal/pages/about/documentation/copyright/polilink.html.

Building a Semantic Wiki - OneCommons

  • Upload
    others

  • View
    26

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Building a Semantic Wiki - OneCommons

www.computer.org/intelligent

Building a Semantic Wiki

Adam Souzis

Vol. 20, No. 5 September/October 2005

This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright

holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be

reposted without the explicit permission of the copyright holder.

© 2005 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for

creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained

from the IEEE.

For more information, please see www.ieee.org/portal/pages/about/documentation/copyright/polilink.html.

Page 2: Building a Semantic Wiki - OneCommons

SEPTEMBER/OCTOBER 2005 1541-1672/05/$20.00 © 2005 IEEE 87Published by the IEEE Computer Society

Editor: Steffen StaabUniversity of [email protected]

T h e S e m a n t i c W e b

of ideas: from Diderot’s universal encyclopedia in the 18thcentury to Vannevar Bush’s Memex at the beginning of thecomputer age to Ted Nelson’s Xanadu in the 1970s. How-ever, the Semantic Web’s development so far has focusedprimarily on metadata and carefully designed data struc-tures. To realize Berners-Lee’s vision, the Semantic Webmust capture and represent content created every day bypeople without special training—such content includesblogs, emails, and discussion groups.

Rhizome (www.liminalzone.org; download at http://sourceforge.net/projects/rx4rdf) is an experimental, opensource content management framework I’ve created thatcan capture and represent informal, human-authored con-tent in a semantically rich manner. Rhizome aims to helpbring about a new kind of commons—one of ideas. Thiscommons wouldn’t comprise just a web of interlinkedpages of content, as is the current World Wide Web, but a web of relationships between the underlying ideas and distinctions that the content implies: a permanent,universally accessible interlinking of content based onimputed semantics such as concepts, definitions, or struc-tured argumentation.

The challengesSemantic Web technologies such as RDF provide

a good foundation for creating this kind of network.RDF’s two most important characteristics are that RDFresources are globally unique and therefore can be referenced universally, and that the monotonic semanticmodel doesn’t make the closed-world assumption.2 Morespecifically, one can always make new statements about a resource, enabling a decentralized network in which tosafely discover new facts without invalidating previousconclusions.

However, current Semantic Web technologies fall short

of meeting some of the challenges that informal, human-authored content presents. Rhizome focuses on addressingtwo of these challenges. The first is the ability to handleambiguity, inconsistency, and multiple views and perspec-tives. Related to this, we need ways to model content sothat we can maintain appropriate semantics even as thecontent is replicated and altered. The second is SemanticWeb technology’s ease of use. We need to make this tech-nology dramatically easier to use, for both developers andend users. Semantic Web technologies are difficult con-ceptually, even for experienced software developers. It’sdifficult to imagine nontechnical end users effectivelyauthoring RDF statements without software assistance.Even for sufficiently trained users, writing the requiredprecise statements takes much more time and effort thanwriting informal text. Thus, these two challenges are inter-related, because the less we need precise, consistent state-ments, the easier it is to create semantic content.

The Rhizome application stack lets developers buildWeb applications that use familiar technology such asXML and XSLT (extensible stylesheet language transfor-mations) but run in an environment where the outsideworld (for example, data sources and incoming requests)is presented as a universe of RDF statements. Rhizome isdesigned for applications such as

• a wiki-like application that lets the user create arbitraryRDF resources and the application logic for presentingand interacting with them,

• a personal note management and publishing tool forkeeping track of and extracting metadata and user anno-tations from a variety of local sources such as plain-textfiles and comments in source code, and

• a discussion forum in which users can classify andstructure responses with typed annotations instead ofemail-like quoting.

OverviewRhizome consists of a stack of components (see figure

1). The “Architecture” section describes these componentsin more detail, but first I’ll start with an overview.

The Semantic Web vision of a “unifying logical lan-

guage that enables concepts to be progressively

linked into a universal Web” (as Tim Berners-Lee put it)1

is part of a long lineage of dreams of a universal repository

Building a Semantic Wiki

Adam Souzis, Liminal Systems

Page 3: Building a Semantic Wiki - OneCommons

The componentsAt the top of the stack is Rhizome Wiki, a

wiki-like content management and deliverysystem that treats all content, metadata, andstructure as RDF and lets users edit anyRDF resource as they would a wiki page.Rhizome Wiki uses a couple of custom textformats to make writing semantic contenteasier:

• ZML is a plain-text formatting languagesimilar to those found in wikis, exceptthat users can create ad hoc structure anduse special formatting conventions toindicate semantic intent.

• RxML is a simple, alternative format forRDF that aims to be as easy to use as anapplication properties file. Thus, novicescan author and edit RDF metadata. It’sspecified as XML but designed for author-ing in ZML.

Rhizome Wiki runs on top of Raccoon, asimple application server that uses an RDFmodel for its data store. Raccoon uses RxPathto translate arbitrary requests—currentlyHTTP, XML-RPC, and command line argu-ments—to RDF resources. Each of these canbe associated with style sheets in RxSLT andRxUpdate,3 languages that are subsets ofXSLT and XUpdate.

RxPath is an RDF data access engine thatprovides a deterministic mapping betweenthe RDF abstract syntax and the XPath datamodel. This lets users access RDF data stores

as a (virtual) XML DOM (document objectmodel) and query them using RxPath, a lan-guage syntactically identical to XPath 1.0.Building on this are RxSLT and RxUpdate,languages for transforming and updatingthe RDF data store; they are syntacticallyidentical to XSLT and XUpdate, respectively.Other XML technologies that rely on XPathcan be easily adopted, such as using Schema-tron to validate an RDF model.

Rhizome in actionTo see how Rhizome uses these various

components, let’s take a look at how a usercreates a page of content using RhizomeWiki. Figure 2 shows three different viewsof a page created in Rhizome Wiki, illus-trating how to create structured contentthat’s transformed into and stored as RDF.The page represents a FAQ (frequentlyasked questions), a familiar document con-vention. Rhizome makes it easy to turnrelatively informal document conventionslike FAQs into semantically meaningfulresources—in this case, an RDF resourcefor each question-and-answer pair. Thisfeature helps fulfill the Semantic Web’spromise: If FAQs were exposed on the Webthis way, building applications that couldaggregate them and provide specializedinterfaces for tasks such as browsing, tag-ging, and rating would be much easier.

Figure 2a shows the page’s presentationview, a FAQ with just one question andanswer. Notice that the presentation is spe-

cific to FAQ resources: When users request aFAQ resource, Rhizome invokes an RxSLTstyle sheet to transform the RDF resourceinto XHTML. Because RxSLT is syntacti-cally identical to XSLT, I was able to adopt,with minimal modifications, an XSLT stylesheet for displaying FAQs from an unrelatedproject (Apache Forest, http://forrest.apache.org). This ability to easily adapt XML tech-nology is one way Rhizome strives to makeRDF easier to adopt.

Figure 2b shows the same page in editmode, revealing that the source isn’t RDFbut ZML. Compared to similar wiki textformats, ZML has some unique features: Itlets users create ad hoc structural elementsmapped to XML elements or RDF resourcesand has syntactic constructs for explicitsemantics such as the metadata annotationin figure 2b. When a user saves a page ofZML content, Rhizome Wiki converts theZML to RDF before saving the RDF in itsdata store. We can see this RDF in figure 2c,which shows the sample page’s metadataview. The label “metadata” is a bit mislead-ing because Rhizome stores everything asRDF, so, in a sense this is also the data view.This view displays the RDF using RxML.

The RDF that makes up this page comesfrom three sources: the process of convert-ing the ZML to RDF, called shredding; thestructural properties created by the handlerthat handles the “save” action when an editis completed; and the user-defined meta-data. From this view, users can edit any ofthis RDF directly, thus enabling them tochange the wiki application’s behavior andstructure. To mitigate the inherent dangersof this level of openness, Rhizome Wikiprovides fine-grain authorization and vali-dation alongside the use of contexts.

ArchitectureAs figure 1 illustrates, Rhizome’s com-

ponents are arranged as a stack in whichhigher-level components depend on thelower-level components, but not vice versa.We can see how the components are inte-grated by examining each one from thebottom to the top of the stack.

RxPath data accessThe main motivation for creating RxPath

was to make RDF easier to use, especiallyfor developers more familiar with XMLthan RDF. RxPath does this in two ways.First, it enables users to apply familiar XMLtechnologies to RDF models in a straight-

88 www.computer.org/intelligent IEEE INTELLIGENT SYSTEMS

Rhizome Wiki

Presentation layer

RDF shreddingRxMLZML

Raccoon application server

(Web) applications

Other applications

Other

RxPath data access engine

Interfaces

XML DOMRxUpdateRxSLT Other

SPI (service provider interface)

Data access

P2P lookup (future)Local RDF datastores

Figure 1. The Rhizome architecture stack.

Page 4: Building a Semantic Wiki - OneCommons

forward way. For example, users can extractand format content in an RDF model usinga standard XSLT style sheet without relyingon any extension functions or elements.Users can author and even test and debugthe style sheet with standard XSLT devel-opment tools. Second, RxPath lets us visu-alize an RDF model much like a standardXML DOM, reducing much of the concep-tual impedance between RDF and XML.This should reduce RDF’s learning curvefor a developer with competence in XMLand related technologies and therefore con-tribute to RDF’s widespread adoption.

RxPath maps the set of (subject, predicate,object) triples in an RDF model into a virtualand possibly infinitely recursive tree in which

• the root has a child node correspondingto each resource in the model,

• each resource node has child nodes foreach statement that it’s the subject of, and

• each statement node has a single childnode corresponding to the statement’sobject.

If the object is a resource, it might in turnhave child nodes that correspond to thestatements that the resource is subject of, andso on. Given such a tree, an XPath expressionsuch as /foaf:Document/dc:creator/* will select aset containing all the authors of each docu-ment resource in the RDF model.

RxPath also supports named graphs4 (alsoknown as contexts), a common extension tothe RDF model used to partition RDF state-ments into groups. RxPath uses a uniqueapproach to contexts by treating them not asa one-to-one mapping with a subgraph of anRDF model, but as a collection of subgraphscomposed through union and differenceoperators. This enables Rhizome to use con-texts simultaneously and efficiently to modelmany different concepts, such as metadataversioning, transactions, provenance, appli-cation partitioning, and personalization (usercustomizations). For example, Raccoon’stransaction log of changes made to the RDFstore is represented as a collection of con-texts, each of which adds or subtracts fromthe previous context. Using contexts letsRhizome capture when, where, how, andby whom a set of statements was made.

Raccoon application serverRaccoon’s goal is to present a uniform

and purely semantic environment for appli-cations. This enables the creation of applica-

SEPTEMBER/OCTOBER 2005 www.computer.org/intelligent 89

(b)

(c)

(a)

Ad hocstructural elements

Metadata annotation

Hyperlinks

Unordered list

Propertiesextractedfrom ZML(“shredding”)

Structuralproperties

User-definedmetadata

Figure 2. (a) A sample page created in a fresh installation of Rhizome Wiki. (b) A screenshot of Rhizome Wiki in edit mode on a page created using ZML. (c) The metadata view of the sample page showing the underlying RDF.

Page 5: Building a Semantic Wiki - OneCommons

tions that are easily migrated and distributedand that are resistant to change. Raccoon isdesigned primarily for applications that lookat the world as a universe of RDF state-ments, but it also works with XML-centricapplications.

Raccoon isn’t designed to be a full-featured application server and in fact willoften be embedded in another applicationserver. Raccoon’s job as an applicationserver is a narrow one—to map a request to a response, possibly modifying the stateof the application in the process:

Request � Application (Rules � Store)� Response

A request is a dictionary of simple values,and an application defines a pipeline ofRxPath expressions that transform the requestinto the response. Raccoon presents both therequest and the application’s state using theRxPath data model. This approach enablesthe creation of applications that can be trans-parently distributed and aggressively cached.Application code is always executed withinthe context of a request. There are externalrequests, such as HTTP requests, and internalones, such as the requests sent when an appli-cation starts or stops. Raccoon also providesbasic transaction coordination for managingupdates to the RDF store. Using contextsenables the application to choose an appropri-ate consistency model for its needs. If fullglobal atomic consistency isn’t needed, Rac-coon can cache request responses even moreaggressively and still provide the appropriatelevels of cache coherency.

To get a sense of how an applicationbuilt on Raccoon uses rules to implementfunctionality, take a look at the metadata infigure 2c. In particular, take note of thefollowing properties:

• wiki:name. When Raccoon receives a request for a URL, it uses this sub-property of rdfs:label to find the RDFresource that corresponds to the URL.The rule is written in RxPath as /*[wiki:name=$_path].

• wiki:doctype. After the previous ruleselects the resource, other rules look for a display handler on the basis of the typeof resource and the requested action. Therule that invokes the RxSLT resourcethat transforms the FAQ resource intoHTML is /*[wiki:handle-doctype= $__resource/wiki:doctype].

Each rule I’ve discussed is defined in theapplication configuration file for RhizomeWiki. By simply adding rules and corre-sponding metadata, we can add sophisticatedfunctionality. Many Rhizome Wiki features,such as its release workflow and authoriza-tion model, are implemented in this manner.

ZML and RxMLA common assumption made about the

Semantic Web is that special software willassist in authoring content for the SemanticWeb. But until the day this functionality isbuilt into word processors, text editors, Webpage forms, and all the other places a usercan enter text, this assumption will limit theSemantic Web’s growth. Instead, Rhizomefollows wiki design principles5 and intro-duces ZML, a plain-text format with simpletext conventions to guide presentation, thus

allowing users to create content with explicitsemantics anywhere that text can be entered.Another advantage of using a wiki-like textformat for authoring semantic content isthat, unlike word processor formats or evenHTML with CSS, wiki text formats providevery limited presentational options. Thislimitation makes it hard for naive or undisci-plined users to express semantics throughpresentation (for example, by using texteffects).

However, enabling the authoring of con-tent with explicit semantics requires signifi-cant enhancements to existing wiki formats.First, users must be able to create arbitrarystructures. To this end, users can use ZML asa simple, concise alternative syntax for XML,enabling them to author any XML construct.In fact, the result of parsing ZML is XML.

One advantage of this approach is that itlets arbitrary HTML or XML be converted to

ZML again, enabling roundtrip conversions.For example, users can write content in ZML,edit it in a WYSIWYG HTML editor, or processit by some specialized tools that consumeXML, and then view it as ZML again.

On the other hand, ZML provides nointrinsic translation to RDF—instead, Rhi-zome Wiki maps the resulting XML usingrule-based shredding, which I’ll describe in amoment. Several reasons for this exist. First,current standardized ontology languages,such as OWL, aren’t yet powerful enough toinfer equivalencies between a generic RDFrepresentation and its appropriate domain-specific ontology. Second, the most intuitivemarkup structure for a particular applica-tion doesn’t always submit to a straightfor-ward mapping to RDF; shredding gives usflexibility to handle these cases.

Finally, there’s the pragmatic issue:Alwaysrepresenting all structural elements as RDFcreates a tremendous volume of RDF state-ments, especially if order is preserved. Creat-ing this mass of RDF statements would alsoopen heretofore unexplored (by Rhizome)questions of composition: how to reasonabout, present, modify, and reintegrateresources that are parts of a larger resource.

ZML needs special syntax to make it easyto explicitly express semantic intent. Onerequirement is the ability to create metadataannotations of both content and structuralelements (for example, section headers or listitems). We also need formatting rules that canexpress semantic distinctions that are elidedin current wiki text formats. For example,we must distinguish between creating a ref-erence to a WikiName (which, in the case ofRhizome Wiki, is an RDF resource’s name)and creating a hyperlink, which has explicitpresentational intent and generally implies a relationship between the content and thelink target. Similarly, we must distinguishbetween anchors and their common use as away to name document sections.

Figure 2b includes an example of ZMLthat illustrates some of these features, de-monstrating the syntax for creating struc-tural elements (transformed to XML ele-ments), unordered lists, hyperlinks, andmetadata annotation.

Rhizome Wiki also uses RxML (see fig-ure 2c). Although RxML can express anyset of RDF statements, it presents the RDFin a constrained, simplified manner: a listof resource URIs, each of which has a setof property name-value pairs. Its goal is tolet novices read and edit RDF metadata

90 www.computer.org/intelligent IEEE INTELLIGENT SYSTEMS

Rhizome uses a novel architecture

to enable a unique approach for

addressing some of the challenges

of creating semantically explicit

content, but many more

challenges exist.

Page 6: Building a Semantic Wiki - OneCommons

using a structure conceptually similar toand only incrementally more complicatedthan application properties’ file formatssuch as Microsoft Windows’ .ini files.

Rhizome WikiRhizome Wiki offers all the basic wiki

functionality, such as letting users create andedit pages on an ad hoc basis, along withsome more advanced content managementfeatures such as roles and groups, releaseworkflow, and basic facet navigation.

Almost all of Rhizome’s functionality isimplemented in its dynamic pages, whichare written in RxSLT, XSLT, and RxUp-date. Users can edit these like any otherpages, making it easy to incrementally addand change functionality. They can also use RxUpdate to migrate the underlyingschema at runtime. This flexibility makesaccess control very important—to this end,Rhizome uses a flexible schema for autho-rizing both application-level actions andstatement-level changes to the RDF store.

One factor limiting the Semantic Web’sgrowth is the reliance on generic formats forrepresenting RDF. Although RxML tries tomitigate this (especially compared with thecomplexity of the W3C’s standard XML rep-resentation of RDF), it’s unrealistic to expectthe vast installed base of applications to beenhanced to support RDF anytime soon. Rhi-zome’s approach to this issue is shredding, orproviding a framework for extracting RDFstatements from a variety of content formats.

Rhizome lets users create rules that trig-ger shredding on the basis of the content’stype. For example, shredding an RDF/XMLdocument would consist of parsing theRDF; shredding an HTML document witha GRDDL (Gleaning Resource Descriptionsfrom Dialects of Languages)6 link wouldconsist of invoking the referenced XSLTscript; and shredding an MP3 file wouldconsist of extracting the metadata out of theembedded ID3 tag. Using contexts, Rhi-zome can retain the relationships betweenan instance of content and statements ex-tracted from it, enabling it to know, forexample, that the statements might be out ofdate when content has changed.

For instance, when the user saves theZML in figure 2b, shredding occurs, result-ing in the RDF in figure 2c. A shredder canbe associated with content types, and shred-ding is recursive. So, in this case, the firstshredder invoked was for ZML, and it sim-ply parsed the ZML to XML and invoked

the XML shredder. The generic XMLshredder is an XSLT style sheet that recur-sively invokes shredders specific to anXML namespace. Finally, the shredderassociated with the FAQ XML namespaceis an RxUpdate script that for each FAQelement adds an RDF resource to the storeon the basis of a specific FAQ ontology.

Peer-to-peer futureA “commons” implies shared resources,

equally available to all. If we think con-cretely about how to build the commons ofideas I mentioned in the introduction, werealize it must be built on a robust, decen-tralized, peer-to-peer infrastructure. Onepossible architecture could look like this:

• All (public) RDF statements are storedin a global distributed store and are ac-cessed through the RxPath processorthrough next-generation, distributedhash tables such as PGrid.7

• Application files are stored in a peer-to-peer content distribution network such asCoral.8 Unlike a typical CDN, the nodeswould have Raccoon-like processors thatwould actively process the content usingthe global RDF store. During processing,the cached results would also be added tothe CDN.

Several of Rhizome’s design elementssupport this type of architecture:

• Raccoon presents a purely semanticview of the environment that enablesapplications to be independent of theirphysical environment.

• Raccoon’s rules-based approach makesit easer to distribute and cache applica-tion processing.

• RxPath’s path-based query model can usepeer-to-peer lookup primitives more eas-ily than tuple-based query languages suchas SQL (Structured Query Language) orSPARQL (Query Language for RDF).

• RxPath’s use of contexts lets applica-tions choose a consistency model appro-priate for a decentralized, distributedenvironment (contexts also enable state-ments to be clustered for efficient dis-tributed querying).

Rhizome’s implementation is freelyavailable, open source, and implemented in

Python. Although it’s fully functional, it’sstill very much in an experimental state.Rhizome uses a novel architecture to enablea unique approach for addressing some of the challenges of creating semanticallyexplicit content, but many more challengesexist. Some of the more difficult ones in-clude the stability of names, the composi-tion of resources, change management, andconflict resolution. These challenges willshape Rhizome’s future development.

References

1. T. Berners-Lee, J. Hendler, and O. Lassila,“The Semantic Web,” Scientific Am., May2001, pp. 28–37.

2, P. Hayes, RDF Semantics, World Wide WebConsortium (W3C) recommendation, Feb.2004; www.w3.org/TR/rdf-mt.

3. A. Laux and L. Martin, XUpdate—XMLUpdate Language, XUpdate Working Groupspecification, 14 Sept. 2000; http://xmldb-org.sourceforge.net/xupdate/xupdate-wd.html.

4. J. Carroll et al., “Named Graphs, Provenanceand Trust,” Proc. 14th Int’l Conf. World WideWeb (WWW 05), ACM Press, 2005, pp.613–622.

5. W. Cunningham, Wiki Design Principles,http://c2.com/cgi/wiki?WikiDesignPrinciples.

6. D. Hazael-Massieux and D. Connolly, Glean-ing Resource Descriptions from Dialects ofLanguages (GRDDL), World Wide Web Con-sortium (W3C) note, 13 Apr. 2004, www.w3.org/TR/2004/NOTE-grddl-20040413.

7. K. Aberer et al., “GridVine: Building Inter-net-Scale Semantic Overlay Networks,” Proc.3rd Int’l Semantic Web Conf. (ISWC 04),LNCS 3298, Springer, 2004, pp. 107–121.

8. M.J. Freedman, E. Freudenthal, and D. Maz-ières, “Democratizing Content Publicationwith Coral,” Proc. 1st Symp. Networked Sys-tems Design and Implementation (NSDI 04),Usenix Assoc., 2004, pp. 239–252.

SEPTEMBER/OCTOBER 2005 www.computer.org/intelligent 91

Adam Souzis isfounder of LiminalSystems. Contacthim at [email protected].