Digital library resources as a basis for collaborative work


Robert Wilensky
Department of EECS, UC Berkeley, Berkeley, CA 94720. E-mail: [email protected]

The creation of large, networked, digital document resources has greatly facilitated information access and dissemination. We suggest that such resources can further enhance how we work with information, namely, that they can provide a substrate that supports collaborative work. We focus on one form of collaboration, annotation, by which we mean any of an open-ended number of creative document manipulations that are useful to record and to share with others. Widespread digital document dissemination required technological enablers, such as web clients and servers. The resulting infrastructure is one in which information may be widely shared by individuals across administrative boundaries. To achieve the same ubiquitous availability for annotation requires providing support for spontaneous collaboration, that is, for collaboration across administrative boundaries without significant prior agreements. Annotation is not more commonplace, we suggest, because the technological needs of spontaneous collaboration are challenging. We have developed a document model, called multivalent documents, which provides a means to address these challenges. In the multivalent document model, a document comprises distributed data and program resources, called layers and behaviors, respectively. Because most document functionality is implemented by behaviors, the model is highly extensible, and can accommodate both new document formats and novel forms of functionality. Among other applications, it is possible to use the model to effect a wide class of annotation types, across different document formats, without any administrative provisions. An implementation of the model has allowed us to develop behaviors that currently support some quite different but common digital document types, and a number of quite different annotation capabilities, some familiar and some novel. A related implementation provides some analogous capabilities for geographic data. Such capabilities could have a beneficial impact on the “scholarly information life cycle,” i.e., the process by which researchers and scholars create, disseminate, and use knowledge.

Introduction

Computer technology has made a significant impact on the way in which people work with documents. The changes are most apparent in the areas of document creation, accessing, and dissemination. However, technology has the potential to profoundly enhance other aspects of how we work with information in the form of documents. The impact of the anticipated changes is likely to be broad, but in some areas it will be felt especially deeply.

This article focuses on one potential area of change, namely, ways in which digital libraries, i.e., collections of documents and associated services, might enable collaboration. To do so, it describes some ongoing efforts that were undertaken as part of the UC Berkeley Digital Library Project.1 There are, in fact, a number of rather different ways in which the software and services we have been developing further collaboration, but we focus here on one distinct component, namely, the use of a document model, called multivalent documents, as a technological enabler that facilitates spontaneous collaboration. By spontaneous collaboration, we mean the ability of users to collaborate without first undertaking a substantial mutual administrative commitment. We believe that spontaneous collaboration will have widespread forms and applications, only some of which are suggested by our current work. One area in which it is likely to have a profound impact is the “scholarly information life cycle,” i.e., the process by which researchers and scholars create, disseminate, and use knowledge.

Spontaneous Collaboration

Computer-based collaboration encompasses a large number of working systems and ongoing research efforts. Even within the restricted area of digital documents, support for collaboration has long been recognized as a goal (Halasz, 1998) and there are many efforts that provide such support, including collaborative authoring systems, version features of document processors, enterprise-wide infrastructures such as Lotus Notes, e-mail support systems, and even the many collaborative uses of the Web. However, most systems in existence, and most research efforts of which we are aware, generally envision collaboration within an infrastructure to which an individual or group has made a substantial commitment, which operates on particular document formats, and which restricts use to those within the administrative domain. Such approaches inhibit collaboration among individuals across administrative domains, as they are unlikely to share infrastructure and to have agreed on common document formats, and so forth.

1 http://elib.cs.berkeley.edu.

© 2000 John Wiley & Sons, Inc.
JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE. 51(3):228–245, 2000 CCC 0002-8231/00/030228-18

In addition to administrative hurdles, even today’s highly evolved systems provide facilities that are generally limited in various ways. For example, systems typically provide features that apply only to a given, often proprietary, document format. If a format is not supported by the system, or a feature is not provided by it, it is generally very difficult to add, especially if the feature was not envisioned by the system designer. Extensibility is important for supporting spontaneous collaboration because individuals in different domains are likely to be using different formats, and moreover, have different styles of collaboration, which may require features that are not ubiquitously available.

Our prototypical example of spontaneous collaboration is annotation. By annotation, we mean any of an open-ended number of creative document manipulations that are useful to record and to share with others. Annotation is in itself an interesting mode of document use, falling somewhere in between authoring and browsing. Although annotation is certainly useful to the annotators themselves, sharing annotations is an important way in which we use documents to collaborate. For example, scholarly publication often begins with an author sending a draft document to close colleagues for comment; the journal submission process involves a similar, if more formal, comment sharing process; should the article be accepted, copious annotations are usually passed back and forth between the author and a copyeditor. After publication, articles may be passed between colleagues with additional annotation focusing the reader’s attention and offering comments and criticism.

Annotation seems to be useful for many other kinds of interaction, and with widely different document types. For example, in our own work with practice studies of environmental information workers, we found that those information workers who used maps typically found the need to annotate them.

Despite their evident usefulness, digital annotation capabilities are not very widespread. Indeed, this is one area in which the paper medium still seems to be preferred to the digital (Sellen & Harper, 1997). Perhaps this is because the requirements imposed by a good annotation mechanism are difficult to satisfy, especially when considered from the vantage point of spontaneous collaboration. Specifically, we suggest that an adequate on-line annotation facility support the following functionality:

1. It must be possible to place an annotation in situ, i.e., at the location of the document to which it refers. Studies show that individuals “generally do not take notes by writing their observations down on a separate sheet of paper or in a text editor. . . . Instead, they mark on the documents themselves” (Levy & Marshall, 1995). Excerpting the document, as is done in newsgroups or e-mail, or providing annotations in a separate frame, would not qualify. Given that the document is presumed to live on-line, then marking up a copy of the document would not qualify either, as the copy could change independently from the original.

2. The supported types of annotations must be highly expressive, extensible, and composable. Expressiveness is important because there are many different types of annotations, which operate at very different granularity. Extensibility is important because individuals and groups tend to develop their own styles of annotation, which may be highly varied (Marshall, 1998). Different forms of annotation may occur together in the same document, and hence, annotations must compose coherently. Thus, the facility must allow a wide variety of annotation types, including some not envisioned by the system’s creators, but nevertheless, allow these to work together.

3. Annotations must be essentially format (and platform) independent. We assume that the annotator has no control over the document format in which the document was authored, but would, nevertheless, like to annotate it (and in a way that may not be supported by the document format). Similarly, we must assume that the annotator, the document creator, and subsequent annotation viewer will not necessarily share the same platform.

4. Finally, it must be possible to annotate documents openly and distributedly. By this we mean that there can be no remote server requirement for annotation other than read-access to the document, i.e., that no permission is needed to store something on an annotation server, and no registration is needed within some administrative infrastructure.

As an example, suppose that, upon browsing the Web, we encounter a document upon which we wish to comment. There may be a wide variety of types of annotations that we might find useful, ranging from the rather coarse grained (e.g., a Post-it™-style note attached to a page) to the more fine grained (e.g., a copy-editor-style mark correcting a typographical error). We would, therefore, like to mark up the document in a variety of ways, and save some result on our own server such that, should someone else open our work, they would see the original document with our annotations in place. Note that it would not suffice to make a copy of the original, as that would not qualify as in situ annotation (and would have the undesirable effect of not tracking changes in the original). Moreover, we might want to make these various types of annotations on documents in different formats, from HTML to Acrobat to scanned page images. Finally, we might come up with some new form of annotation, and would like to add that to the mix, and have all the elements still compose coherently.

We seek to facilitate such work practices by providing a digital means for users to annotate one another’s on-line documents. In addition to providing the usual benefits (e.g., high availability, precision, easier distribution), digital support should entail additional functionality to which other media have no analogue. For example, digital annotations can be dynamic rather than static, thus enabling whole new genres of annotation not previously available.

Most approaches to annotation that we have encountered violate one or more (indeed, frequently, most) of the conditions we stipulate. However, we believe that these conditions can be met by the proper document model, the general form of which we now present. We describe this model in some detail, and then describe how it may be used to support the above vision of spontaneous collaboration in the form of extensible, distributed, in situ annotation. In the section below on related work, we compare the characteristics of the work we describe to those of other approaches.

Multivalent Documents

“Multivalent documents” (MVD) (Phelps & Wilensky, 1996a, 1996b) is a highly open, highly distributed, and highly extensible document platform. Our idea was to create a document architecture that could accommodate both new document formats and new ideas for document manipulation as these came into existence. Such a system could evolve to support new functionality and formats without having to be redesigned and implemented for each one. Moreover, by divorcing functionality from document format, the same function might be implemented once, and applied everywhere, both to multiple formats and in multiple applications.

Conceptually, a multivalent document is a set of (possibly distributed) layers and behaviors, denoting a document’s contents and functionality, respectively. Most document functionality is accomplished entirely via its behaviors, the MVD infrastructure merely providing a means for these to compose coherently. Behaviors are used to bridge new document formats into the framework, implement generic document functions, provide new user interface capabilities, and implement more exotic capabilities (such as distributed annotation).

To allow arbitrary extensibility of all aspects of document manipulation, each of the fundamental runtime operations of MVD has been opened to an extensible protocol. The MVD protocols encompass the generic aspects of the document life cycle that are present in one form or another in most document manipulation systems: during restore, behaviors and layers are loaded, and the behavior methods are inserted into their appropriate places in the other protocols; build methods create an internal data structure representing the document, using the information in the layers; format formats the resulting document; paint renders the document on the screen; user events awaits input from the keyboard, mouse, or other input device, and hands it to the methods implementing the protocol; save and clipboard are responsible for their suggested functions. Behaviors work by contributing methods that override the given protocol.
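The way behaviors contribute methods to protocols can be pictured with a minimal Java sketch. It is an illustration under stated assumptions rather than the MVD API: the names `Phase`, `Behavior`, `SimpleBehavior`, and `ProtocolRunner` are hypothetical, and invoking contributed methods in load order is only one plausible composition rule, since the specific structure of each protocol is not given here.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Hypothetical names throughout; this is not the MVD API.
enum Phase { RESTORE, BUILD, FORMAT, PAINT, USER_EVENT, SAVE, CLIPBOARD }

// A behavior contributes a method to each protocol phase it cares about.
interface Behavior {
    List<Phase> phases();                        // phases this behavior takes part in
    void run(Phase phase, List<String> trace);   // records a trace line for illustration
}

final class SimpleBehavior implements Behavior {
    private final String name;
    private final List<Phase> phases;

    SimpleBehavior(String name, Phase... phases) {
        this.name = name;
        this.phases = List.of(phases);
    }

    public List<Phase> phases() { return phases; }

    public void run(Phase phase, List<String> trace) { trace.add(name + ":" + phase); }
}

final class ProtocolRunner {
    private final Map<Phase, List<Behavior>> methods = new EnumMap<>(Phase.class);

    // Assumed analogue of "restore": slot each loaded behavior into its phases.
    void restore(List<Behavior> behaviors) {
        for (Behavior b : behaviors) {
            for (Phase p : b.phases()) {
                methods.computeIfAbsent(p, k -> new ArrayList<>()).add(b);
            }
        }
    }

    // Runs one phase by invoking every contributed method in load order.
    List<String> run(Phase phase) {
        List<String> trace = new ArrayList<>();
        for (Behavior b : methods.getOrDefault(phase, List.of())) {
            b.run(phase, trace);
        }
        return trace;
    }
}
```

In this sketch the runner knows nothing about formats or annotations; everything that happens during a phase comes from whichever behaviors were loaded.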

We omit here a description of the specific structure of each protocol, and instead, simply note that, as all aspects of document processing are open to extension, extensions can be arbitrarily powerful. For example, there is no particular document format of which MVD is aware. Instead, a behavior has to be provided that understands the format of a given layer of data, and bridges this format into the internal structures of the system (by providing methods that override the protocols for building an internal representation, handling special cases of formatting and painting, and so forth). We call behaviors that handle formats media adapters, to note that they are media dependent, although such behaviors have no privileged status in the architecture.

When a multivalent document is open in a compliant client, it is represented as a tree. Internal nodes of the tree represent document structure, and are medium independent; the leaves represent contents, and hence, contain encapsulated typed data to be interfaced with via the behavior that created the leaf and understands its format. Media adapters are responsible for encapsulated media types by properly constructing the tree, enabling behaviors other than media adapters to operate on any medium without special accommodation and, as much as it applies to a given medium, operate on all media types.
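As a rough illustration of this separation, the following Java sketch builds two small trees, one as a scanned-image adapter might and one as an HTML adapter might, and runs the same generic search over both. All type names (`DocNode`, `StructureNode`, `LeafNode`, `TextLeaf`, `ImageLeaf`, `PrefixSearch`) are hypothetical, and exposing leaf content through a single `text()` method is a simplifying assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical types; the real MVD classes are not described in this article.
abstract class DocNode { }

// Internal nodes carry structure only and are medium independent.
final class StructureNode extends DocNode {
    final String kind;
    final List<DocNode> children = new ArrayList<>();

    StructureNode(String kind, DocNode... kids) {
        this.kind = kind;
        children.addAll(List.of(kids));
    }
}

// Leaves encapsulate typed data; only the adapter that built them knows the format.
abstract class LeafNode extends DocNode {
    abstract String text();   // medium-independent view used by generic behaviors
}

// Leaf as an HTML media adapter might build it: the content is the string itself.
final class TextLeaf extends LeafNode {
    private final String s;
    TextLeaf(String s) { this.s = s; }
    String text() { return s; }
}

// Leaf as a scanned-image adapter might build it: an image region plus its OCR word.
final class ImageLeaf extends LeafNode {
    final int x, y, width, height;
    private final String ocrWord;

    ImageLeaf(int x, int y, int width, int height, String ocrWord) {
        this.x = x; this.y = y; this.width = width; this.height = height;
        this.ocrWord = ocrWord;
    }

    String text() { return ocrWord; }
}

// A generic behavior: finds leaves whose text begins with a prefix, on any medium.
final class PrefixSearch {
    static List<LeafNode> find(DocNode root, String prefix) {
        List<LeafNode> hits = new ArrayList<>();
        collect(root, prefix, hits);
        return hits;
    }

    private static void collect(DocNode n, String prefix, List<LeafNode> hits) {
        if (n instanceof LeafNode) {
            LeafNode leaf = (LeafNode) n;
            if (leaf.text().startsWith(prefix)) hits.add(leaf);
        } else if (n instanceof StructureNode) {
            for (DocNode c : ((StructureNode) n).children) collect(c, prefix, hits);
        }
    }
}
```

The search never inspects the leaf's concrete type, which is the property that lets one behavior serve several formats.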

The persistent form of a multivalent document is the “hub” document. The hub document is an XML2 document referencing the layers and behaviors that comprise the single conceptual document; i.e., the restore protocol reads a hub document, fetches the behaviors specified in it, places their methods into the appropriate protocols, and begins following the protocols. Similarly, save writes a hub document (and, perhaps, individual layers).
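To make the idea of a hub document concrete, here is a small Java sketch that reads a made-up hub document with the standard JAXP parser. The element and attribute names (`hub`, `layer`, `behavior`, `type`, `src`, `name`) and the example.org URLs are assumptions invented for illustration; the actual hub syntax is not given in this article.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

final class HubReader {
    // Hypothetical hub document: two layers and two behaviors for one scanned page.
    static final String SAMPLE =
        "<hub>"
      + "<layer type='image' src='http://example.org/page1.gif'/>"
      + "<layer type='ocr' src='http://example.org/page1.ocr.xml'/>"
      + "<behavior name='Search'/>"
      + "<behavior name='TableSort'/>"
      + "</hub>";

    // Returns the named attribute of every element with the given tag, in document order.
    static List<String> attributes(String xml, String tag, String attr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName(tag);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                out.add(((Element) nodes.item(i)).getAttribute(attr));
            }
            return out;
        } catch (Exception e) {
            throw new IllegalArgumentException("not a readable hub document", e);
        }
    }
}
```

A restore step in this style would fetch each listed behavior and layer by reference, so the hub itself stays small and can live on a different server from the content it names.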

A more detailed description of the multivalent documentarchitecture can be found in Phelps (1998).

MVD has been implemented in Java, and hence, runs onany Java-compliant platform.

Enlivening Legacy Documents

Before examining the application of MVD to spontaneous collaboration, we illustrate the architecture by examining its application to a different task, namely, that of improving the functionality of scanned image documents. Because scanned images are so resistant to manipulation, such a data type is a challenging application for a document model. For example, Figure 1 shows a standard Web browser displaying a scanned document image from the UC Berkeley Digital Library Project server.3 The surrounding HTML allows for movement through the document, but no manipulation or exploitation of the document contents is readily available via the Web browser per se.

Figure 2 shows the result of clicking the button in Figure 1 labeled “MVD.” Doing so provides access to this document as a multivalent document. More precisely, a Java applet implementing MVD is called, and is given as an argument an MVD hub document. The hub document describes the layers and behaviors that this particular document should contain. In the case of our scanned images, the hub document specifies as layers the images of the document, and the output of an OCR process, which contains both the inferred text of the document and positional information about where these words appear in the image. The behaviors specified in the hub document include a standard set, plus a number of behaviors that are specific to particular documents or pages.

2 http://www.w3.org/XML/.

3 http://elib.cs.berkeley.edu/docs/.

Users avail themselves of functionality by selecting menu entries, provided by the particular behaviors loaded, or by interacting with regions of the document in which behaviors may have registered an interest. For example, Figure 2 shows some text, “CALIFORNIA DATA EXCHANGE,” that has been selected; the background of this text is highlighted to reflect this status. The text was selected by a mouse click-and-drag on the corresponding portion of the image. As in a text processing system, the text corresponding to the highlighted region is placed in the window system’s selection buffer, from which it is available for pasting into other applications.

In addition, the user has previously chosen the “Search” entry on the “Edit” pull-down menu. This action called into view the search dialog box, as shown. The text entered into this widget is found and highlighted in the image.

Functions such as selection and searching are familiar in conjunction with other document formats. However, they are not readily realizable for scanned images. Using MVD, it was straightforward to provide this functionality, despite the fact that there is nothing in MVD that is specific to scanned images. That is, the exploitation of image and OCR layers needed to achieve the capabilities just illustrated is done by a behavior, not by the MVD infrastructure per se; behaviors like Search are not specific to scanned images, but will work with other data types. Behaviors that add arbitrary forms of additional functionality may be included. For example, the document in Figure 2 happens to also have included a “TableSort” behavior and a table layer. As a result, clicking on a column header of the table in the image will cause the table in the image to be sorted by the (OCR’d) contents of that column. Figure 3 shows the same page after the table regions have been highlighted, and the table sorted by clicking on the column labeled “TODAY.”

Distributed Annotation

Now let us return to the issue of spontaneous annotation, and in particular, to our prototype application, in situ annotation. Just as there is no built-in support for document types in MVD, there is no built-in support for annotation, distributed or otherwise. However, one can construct a multivalent document that includes a layer, and some behaviors that manipulate and associate other data with this layer in various ways. In effect, the layer becomes a “base” document, and the behavior instances have the effect of being a kind of “standoff markup” (Thompson & McKelvic, 1997) of the base, i.e., what we may call “multivalent annotations” are just MVD behavior instances used annotatively. The annotations are intrinsically distributed, as the base document may be created by one author, and live on one server, while the annotative behavior instances are created by another author, and reside elsewhere.

FIG. 1. A scanned page image inside a web browser.

We have developed a number of multivalent behaviors that have annotative (as well as other) uses. We find it convenient to divide these behaviors into three classes: there are behaviors that make use of (1) point-to-point spans of media elements, (2) geometric regions of a document presentation, and (3) structures within the document tree.

Spans

Spans are behavior instances that extend from one point in the document continuously to another point. Spans are implemented as behavior instances with (robustly specified) start and end points in the document tree. As a span is just a behavior instance, there is generally one behavior corresponding to each span type. That behavior can create new spans, influence how these appear, handle user interactions, and save and restore spans from persistent storage.
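A span of this kind might be sketched in Java as follows. The sketch is illustrative only: `TreePoint`, `Span`, `HighlightSpan`, and `ReplaceSpan` are hypothetical names, and a point is simplified to a leaf index in document order plus a character offset, whereas real start and end points are robustly specified places in the tree.

```java
// Hypothetical: a point is a leaf index in document order plus a character offset.
final class TreePoint implements Comparable<TreePoint> {
    final int leaf, offset;

    TreePoint(int leaf, int offset) { this.leaf = leaf; this.offset = offset; }

    public int compareTo(TreePoint o) {
        return leaf != o.leaf ? Integer.compare(leaf, o.leaf)
                              : Integer.compare(offset, o.offset);
    }
}

// A span runs continuously from start (inclusive) to end (exclusive).
abstract class Span {
    final TreePoint start, end;

    Span(TreePoint start, TreePoint end) {
        if (start.compareTo(end) > 0) throw new IllegalArgumentException("start after end");
        this.start = start;
        this.end = end;
    }

    boolean covers(TreePoint p) {
        return start.compareTo(p) <= 0 && p.compareTo(end) < 0;
    }

    abstract String describe();   // how this span type presents itself
}

// One class per span type, mirroring "one behavior corresponding to each span type".
final class HighlightSpan extends Span {
    HighlightSpan(TreePoint start, TreePoint end) { super(start, end); }
    String describe() { return "yellow background"; }
}

final class ReplaceSpan extends Span {
    final String replacement;

    ReplaceSpan(TreePoint start, TreePoint end, String replacement) {
        super(start, end);
        this.replacement = replacement;
    }

    String describe() { return "replace with \"" + replacement + "\""; }
}
```

Because a span refers to points in the tree rather than to pixel coordinates, it can follow its text when the document is laid out again.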

Figure 4 shows a document containing several types of spans. This document is a scanned page image, enlivened by an optical character recognition layer and a set of supporting behaviors, as described in the previous section. One type of span illustrated in the example is a hyperlink. It is presented by having the associated document span be underscored in magenta. It has user interface properties that one has come to expect from hyperlinks, in that moving the cursor over the link changes the cursor to a pointer, and reveals a destination at the bottom of the page. Clicking on the link accesses the resource being referred to.

FIG. 2. A scanned image page “enlivened” as a multivalent document. The text corresponding to “CALIFORNIA DATA EXCHANGE” has been selected. The search widget shows a search for the terms “LAKE” and “67”; words whose corresponding images begin with these strings are highlighted in the image.


Another span on this page is a highlight, meant to resemble a yellow marker pen. In addition, the results of a search are also displayed by spans marked by boxes. The current selection (displayed with a background) is also implemented as a span.

Another class of spans we have implemented are copyediting marks. Figure 5 is an example of the same document, but with copyediting behaviors loaded, and used to create a few spans. One span, in the left-hand column, extends over the words “On the other hand.” This span simply provides a comment. The second span, further down in the same column, specifies replacement text. In this case, the copyediting mark extends through the word “combination,” and recommends that this be replaced with the text “synthesis.”

Note that these spans illustrate a powerful feature of the MVD infrastructure, namely, layout control. These spans effectively change the height of associated lines, and the image management behavior must lay out the document again to accommodate them. It does so by painting the document from its OCR rendering. In this example, only the left-hand column is visibly changed. Because such rendering is imperfect, a warning is displayed to this effect. The user can use menu entries to toggle back and forth between these (and other) presentations of the document.

There are several advantages to using spans of this sort for annotations. Unlike simple overlays, spans are anchored to document contents, so the annotation will be coherent after additional document manipulation. For example, Figure 6 shows the same document in which layout control has been used to add additional line space in the page image (by the use of a spacing behavior made available via the “View” menu). Note that the spans have stayed with the appropriate text. Had this annotation been achieved by a simple bit image overlay (which one can also easily accomplish in the MVD framework), the contents of the overlay would no longer be aligned with the base after such a manipulation.4

In addition, spans can be made robust, so that they still may be positioned correctly in the presence of change in the underlying base document.

4 Of course, one might want to have spans whose annotative content is a bit image, rather than a structured object, as in these examples. A behavior supporting such spans should be easily accommodated within the MVD framework.

FIG. 3. Illustration of a “Table Sorting” behavior. Table regions are highlighted in the image (the table proper in blue and its headers in green), and sorted by clicking on the “TODAY” header. Note that highlighting of components (in this case, of matched search term regions) is preserved as the image is manipulated.

This type of annotation is more interesting when the base document is more readily subject to manipulation. As an example, consider Figure 7, in which the base layer is the DARPA home page on the Web, i.e., an HTML document. Here we have added a number of copyediting marks: We have noted that the acronym has been changed back to “DARPA” again; we suggest that “It” be replaced by “DARPA,” that the span “dramatic advances” be set in bold face, and that the spans “traditional military roles” and “dual-use-applications” each be italicized.

This example illustrates a number of points. First, it demonstrates truly distributed annotation: the hub document containing the annotations resides on the author’s Web site at UC Berkeley. It references the actual DARPA home page as the base document layer; this is loaded dynamically when the hub document is opened, and the annotations are correctly composed with the portions of the base to which they refer.

Second, the example demonstrates support for HTML within the framework. Support for HTML was achieved simply by writing a media adapter for this document format. Third, the same behaviors are seen operating on multiple, rather different, document formats: scanned images above, and HTML here. [We have explicitly illustrated copyeditor marks operating on two document formats; all the other behaviors shown above, e.g., highlighting and searching, also operate across formats. Indeed, HTML hyperlinks (many of which appear in the example) are implemented in MVD by turning the underlying HTML markup into MVD hyperlink spans.]

Fourth, because MVD annotations are supported by behaviors, they can have useful functionality. Copyeditor spans, for example, are executable: moving the cursor over a copyeditor span and clicking carries out the advertised action. For example, clicking on the replacement, boldface, and first italicize spans in the current example produces the result shown in Figure 8.

Fifth, the document fortuitously demonstrates robust references. Note that the page begins enigmatically with the phrase “I missing some.” This text was actually in the DARPA source page at the time this hub document was opened to create this figure, no doubt due to an error by its author. This text was not present when the annotations were created, and no doubt will be removed soon. Indeed, the DARPA home page has changed many times since this hub document was created. Nevertheless, the annotations still align perfectly with their designated text. (Robust references are discussed in the Robust References section.)

FIG. 4. An image document with some spans. The spans shown here include a hyperlink (presented by an underscore), and a highlight (yellow background). Spans corresponding to the current selection (gold background) and search results (red boxes) are also shown. A menu (Anno) has been pulled down, revealing entries for creating hyperlink, highlight, anchor, and blinking text spans (among other things). If one is now chosen, the current selection span will become the chosen form of span.

Executable copyediting marks illustrate the separation between document structure and media type in MVD. The copyediting spans refer to leaves in the MVD document tree. The tree is essentially the same for documents created from scanned images or HTML, although in the former case the tree is created from the (generally simple) structure given to us by an OCR process, and in the latter, from the parse tree for an HTML document. However, the leaves of the tree for a scanned image refer to image regions; the leaves of the tree for HTML contain strings of text. The latter are relatively easy to manipulate; the former are much harder. For example, it is relatively easy for a behavior to implement “italicize” for text, as doing so corresponds to setting a graphics property. If we wanted this same copyediting behavior to be executable in images, we would have to implement a transformation specific to that media type. Doing so would require a considerable amount of effort, for marginal utility, so we have not done so. Of course, someone believing that this functionality would be valuable can implement such behaviors within the framework.

Lenses

Spans allow reference to the fine-grain structure of a document. Another class of multivalent behavior, lenses (Bier, Stone, Pier, Buxton, & DeRose, 1993), affects geometric regions of a document’s appearance. Like spans, MVD lenses can modify content display parameters as well as receive events.

Figure 9(a) illustrates three instances of two kinds of lenses. Toward the left is a “Bit Magnify” lens, which enlarges the image underneath it. In the middle-right of the page is a “Show OCR” lens. Inside this lens, the image text is replaced by the results of an OCR process, rendered in the font the OCR software estimates for the original text. Toward the lower right portion of the screen is a second “Bit Magnify” lens. This lens overlaps the corner of the “Show OCR” lens. Where the lenses overlap, the effects compose (in this case, revealing that the OCR process mistook the symbol “f” for a “J”).
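The composition of overlapping lenses can be suggested by a small Java sketch. It is a toy model under stated assumptions: `Lens` and `LensStack` are hypothetical names, a lens is reduced to a rectangle plus a string-to-string effect, and lenses are assumed to apply bottom to top in stacking order.

```java
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical: a lens is a rectangle plus an effect on what is drawn inside it.
final class Lens {
    final int x, y, w, h;
    final UnaryOperator<String> effect;

    Lens(int x, int y, int w, int h, UnaryOperator<String> effect) {
        this.x = x; this.y = y; this.w = w; this.h = h;
        this.effect = effect;
    }

    boolean contains(int px, int py) {
        return px >= x && px < x + w && py >= y && py < y + h;
    }
}

final class LensStack {
    // Applies, bottom to top, every lens covering the point; overlapping effects compose.
    static String render(String content, int px, int py, List<Lens> lenses) {
        String out = content;
        for (Lens l : lenses) {
            if (l.contains(px, py)) out = l.effect.apply(out);
        }
        return out;
    }
}
```

With a "show OCR" effect below a "magnify" effect, a point inside both rectangles is rendered as magnified OCR text, while a point inside only one sees only that lens's effect.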

A naturally annotative use of lenses is to implement notes, which are just “opaque” lenses. Notes can contain their own document contents, and therefore, make their own use of behaviors. In Figure 9(b) we show an HTML page, annotated with a note whose contents include a hyperlink to a location further down the page, allowing the reader to click on the link to be transported to an off-screen comment.

FIG. 5. Copyediting spans on a scanned image.


Structures

Structural behaviors hook into the tree, representing a function applicable to a structurally meaningful portion of the document. Whenever an action is happening in that area of the document, structural behaviors are given an opportunity to modify the results. Structural behaviors can invest incremental knowledge into a document or leverage existing structure.

As an example of a structural behavior that invests some incremental information, recall that the user can select words in the document image and paste the corresponding OCR. If further structuring can be imputed to a region, it may be useful to paste different text more directly suited to another application’s input. In Figure 10(a), the selection includes a bibliographic entry. To incorporate this entry into another application, one could start by pasting the OCR text and editing it as necessary. Instead, we have created a “Biblio” behavior that automatically performs such functions. First, the behavior automatically identifies an affected region to the user when a relevant structure is included in the selection by highlighting the screen region corresponding to the bibliographic structure. The behavior, having recourse to a semantic description of fields for author, title, pages, and so on, affects the Clipboard protocol, automatically inserting BibTeX- or refer-formatted text, as the user chooses. Once in the selection buffer, of course, this formatted text could be pasted into any application, as evidenced in Figure 10(b), which contains transformations of the text pasted into another application. The formatted text is computed on the fly, so that adding an additional output format merely requires coding the appropriate formatting statements.
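The point that a new output format only requires new formatting statements can be illustrated with a short Java sketch. It is not the “Biblio” behavior itself: the class `BiblioFormats`, the field names (`key`, `author`, `title`, `pages`), and the choice of an `@article` entry are assumptions made for the example, and only a few BibTeX and refer fields are shown.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

final class BiblioFormats {
    // Formatters are looked up by name; adding an output format means adding one entry.
    private static final Map<String, Function<Map<String, String>, String>> FORMATS =
        new LinkedHashMap<>();

    static {
        FORMATS.put("bibtex", f ->
            "@article{" + f.get("key") + ",\n"
          + "  author = {" + f.get("author") + "},\n"
          + "  title = {" + f.get("title") + "},\n"
          + "  pages = {" + f.get("pages") + "}\n"
          + "}");
        FORMATS.put("refer", f ->
            "%A " + f.get("author") + "\n"
          + "%T " + f.get("title") + "\n"
          + "%P " + f.get("pages"));
    }

    // Computes the clipboard text on the fly from the semantic fields of one entry.
    static String format(String name, Map<String, String> fields) {
        Function<Map<String, String>, String> fn = FORMATS.get(name);
        if (fn == null) throw new IllegalArgumentException("unknown format: " + name);
        return fn.apply(fields);
    }
}
```

For instance, calling `BiblioFormats.format("refer", fields)` on a placeholder entry with author "Doe, J." would yield a first line of `%A Doe, J.`; the same fields passed with "bibtex" produce an `@article{...}` record instead.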

A similar “alternative select and paste” has also been implemented for mathematics, with a fixed set of output formats (Lisp and TeX) available at this time.

This example of a structural behavior is not intrinsically annotative. From the point of view of a user, the ability to select bibliographic alternatives can simply be a useful document feature. In this example, the alternatives were added by someone other than the original author, lending them an annotative quality. Similarly, the table sorting behavior described in the previous section can be viewed as an annotative structural behavior, as it uses information from an additional source (in this case, further document analysis) to alter the structure of the document, first by imposing a table on top of what was previously a linear document, and then by modifying the details of this structure given user interactions.

FIG. 6. A scanned image with extra line spacing. The various spans remain coherent.

An example of a type of annotation that combines structural and span annotations is described in Phelps (1997).

Robust References

As MVD documents are composed of distributed layers, they are likely to be under the control of different authorities. Indeed, such is the normative case for virtually all the annotative behavior examples just shown. We are forced to assume that the owner of the annotated content may change it. However, we do not want to impose a requirement of strict coherence between layers, as doing so would violate the spirit of the spontaneous collaboration we seek to facilitate. Nevertheless, we would like an annotation to still apply after asynchronous editing, assuming that it still has a referent. For example, if a copyeditor annotation is attached to a span, and a section is inserted before or after that span, we would still like the copyeditor annotation to apply.

To achieve this end, we provide a means of specifying places in a document that promises to be robust in the presence of asynchronous layer changes. We do so by generating redundant descriptions of document places. Each such place reference includes (a) the place’s structural position in the tree (similar to a HyTime (DeRose & Durand, 1994) TREELOC),5 (b) an excerpt of the underlying text, large enough to make the string unique in the document, and (c) a unique identifier. If the document is restored at a later time with the base document or other layers upon which it depends edited, a series of incrementally permissive back-off strategies tries to resolve the reference to the new appropriate location. That is, if the referenced structural position does not correspond to the excerpted text, a heuristic is employed to try to find the best location that might correspond to the reference.
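The back-off idea can be sketched as follows. This is an illustration under stated assumptions, not the service packaged with MVD: the `Node` and `PlaceRef` types are hypothetical, the order in which the identifier and the excerpt are tried is our assumption, and the final text search stands in for whatever heuristic a real resolver would use.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class Node:
    """Hypothetical document tree node."""
    text: str = ""
    uid: Optional[str] = None
    children: List["Node"] = field(default_factory=list)


@dataclass
class PlaceRef:
    path: List[int]            # (a) child indices from the root
    excerpt: str               # (b) text excerpt, assumed unique in the document
    uid: Optional[str] = None  # (c) unique identifier, if one is available


def walk(node: Node) -> Iterator[Node]:
    """Yield the node and all its descendants in document order."""
    yield node
    for child in node.children:
        yield from walk(child)


def follow(root: Node, path: List[int]) -> Optional[Node]:
    """Return the node at the structural position, or None if it no longer exists."""
    node = root
    for i in path:
        if i < 0 or i >= len(node.children):
            return None
        node = node.children[i]
    return node


def resolve(root: Node, ref: PlaceRef) -> Optional[Node]:
    # Strategy 1: the structural position still holds the excerpt.
    node = follow(root, ref.path)
    if node is not None and ref.excerpt in node.text:
        return node
    # Strategy 2 (assumed order): back off to the unique identifier.
    if ref.uid is not None:
        for n in walk(root):
            if n.uid == ref.uid:
                return n
    # Strategy 3: back off to a search for the excerpt anywhere in the tree.
    for n in walk(root):
        if ref.excerpt in n.text:
            return n
    return None  # no referent found; the caller decides what to do
```

For instance, if a reference records the path `[1]` and the excerpt "dramatic advances", and a new section is later inserted at the front of the document, the first strategy fails because position 1 now holds different text, and the excerpt search recovers the intended node.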

Note that the ability to generate robust place references is not part of the MVD architecture per se. Rather, it is a service that we package along with the infrastructure, because we view the need as significant. Had we not included such a facility, behavior authors could include one of their own. Moreover, should a behavior author not be satisfied with our service, that author is free to include a variant facility, or a different one altogether. For example, a behavior that could not adequately resolve its reference might e-mail a notice to its author advising that individual of the need for manual revision; copyeditor behaviors might include an additional check to see if the correction they describe is still valid, and not show up at all if they align but are no longer necessary.

5 Actually, a place typically includes an offset into a leaf, which has a medium-specific interpretation. For example, in a plain text document, it is useful to make the nodes into words (or larger units) for the sake of efficiency, so the offset is needed to designate an intraword position; if the node is a video, the offset would be needed to designate a place in terms of frames.

FIG. 7. Copyediting marks on an HTML document. The copymarks appear in a magenta font. The first, attached to “DARPA” on the first line of text, merely makes a comment. The next four suggest, in order, the following actions: replacing the word “It” with “DARPA”; setting “dramatic advances” in boldface; and italicizing the spans “traditional military roles” and “dual-use applications.”

In general, we have packaged in with the infrastructure services we suspect behavior writers (including ourselves) will generally find useful. It is important for the reader to bear in mind that properties such as robust references (and indeed, the annotative and other behaviors we have demonstrated) are not part of the model, but merely applications that the model facilitates.

FIG. 8. The HTML page after executing three copyediting marks: The first replaced “It” with “DARPA”; the second set “dramatic advances” in boldface; the third italicized the span “dual-use applications.” Note that the layout of the document was automatically and incrementally recomputed to reflect new text characteristics.

FIG. 9. Geometric or lens behaviors: (a) shows one “Show OCR” and two “Bit Magnify” lenses, in composition in overlapped regions. (b) shows a note and a magnify lens on an HTML document. The note contains a hyperlink to additional annotations offscreen.

Other Media Types as Documents

We have been discussing a document-centric view of collaboration, but have not been explicit about what comprises a document. Instead, we have assumed that documents are essentially familiar, structured, “text-centric” objects. And, indeed, much of the power of MVD is available only when there is some interesting structure with which to work. However, we are interested in exploring whether other data types can be productively viewed within this framework, especially those that have temporal and spatial extent. In particular, geographic information is an interesting example of data with spatial extent. Geographic data fit nicely into the MVD model, as they are conventionally thought of as layers of materials that align based on common geographic references. A layer might be an image, such as a map or aerial photograph, or a set of vectors or points that have some geographic meaning, for example, the location of every dam in the state of California, or all the streets in the San Francisco Bay Area. Behaviors would provide manipulations of these layers, such as panning and zooming. A geographic “document” would be a collection of related layers, for example, one whose vectors denote all the streets in a location and another an image providing high-resolution photographs of the same area. Annotation would consist of adding a new layer of geo-positioned data that would comment on the data in other layers.

FIG. 10. An example of structural behaviors. (a) shows a page image in which behaviors associate the semantic contents of bibliographic entries with the subtrees corresponding to those entries. Here the selection completely spans one entry, which is automatically highlighted (by the labeled box). Now the selection will be transformed according to the entries under the “Selection” menu. In this case, the selection will ignore incomplete bibliography regions, and the content of the bibliographic entry is generated in BibTeX format before being placed in the clipboard. (b) shows the results of choosing the “OCR,” “BibTeX,” and “refer” menu entries, respectively, from this menu, prior to selection, and then pasting the resulting selection in another application.

To experiment with this idea of geographic data types as multivalent documents, we built a separate prototype, called GIS Viewer (Geographic Information System Viewer). Like MVD proper, GIS Viewer is a Java applet.6 The reason for developing a separate code line, rather than attempting to support geographic data with MVD proper, was twofold: first, although we thought the general idea of MVD made sense for GIS, we were agnostic about whether a single architecture for textual and geographic data was possible or desirable: the ways in which the data types are manipulated share little in common, and the two types do not readily compose. Second, there were many issues specific to GIS data manipulation that needed to be addressed whether or not we integrated these functions in the MVD document architecture. Hence, we decided to pursue the development of GIS Viewer in parallel with MVD document infrastructure, putting off into the future exploring their possible integration into a single architecture.

In the GIS Viewer, a “document” is a collection of geo-referenced data sets. Behaviors are currently fixed in the application, i.e., the application allows panning, zooming, issuing queries about the data viewed, and making annotations. Layers can also be turned on and off, and displayed semitransparently, so that a covered layer can bleed through a semitransparent covering layer. The GIS Viewer supports a number of different geographic data types, including several vector and several image formats, the details of which we will not elaborate here. Unlike MVD, GIS Viewer is not designed to be readily extended by users or third parties, and hence, these behaviors are not loaded dynamically.
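The way a covered layer bleeds through a semitransparent covering layer can be illustrated with standard alpha blending of a single pixel. This is a sketch only: the `Layer` type, its fields, and the choice of simple "over" blending are our assumptions, not a description of how the GIS Viewer renders.

```python
from dataclasses import dataclass
from typing import List, Tuple

RGB = Tuple[float, float, float]


@dataclass
class Layer:
    """Hypothetical stand-in for one geo-referenced layer at one pixel."""
    name: str
    color: RGB            # the layer's color at the pixel, components in [0, 1]
    opacity: float = 1.0  # 1.0 is opaque; lower values are semitransparent
    visible: bool = True  # layers can be turned on and off


def composite(layers: List[Layer], background: RGB = (0.0, 0.0, 0.0)) -> RGB:
    """Blend the visible layers bottom-to-top for a single pixel."""
    result = background
    for layer in layers:  # list order is bottom first, top last
        if not layer.visible:
            continue
        a = layer.opacity
        # Weighted mix of the covering layer and what lies beneath it.
        result = tuple(
            a * top + (1.0 - a) * under
            for top, under in zip(layer.color, result)
        )
    return result
```

With a white relief layer at the bottom and a half-transparent blue layer on top, the blended pixel is a light blue in which the relief remains visible; turning the top layer off restores the relief color unchanged.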

As an example, one geographic “document” we authored specifies a large number of distinct layers for northern California, including SPOT imagery, watershed boundaries, vegetation maps, and so forth. This document is meant as a useful application for environmental planning. Figure 11 shows the GIS Viewer displaying this document. On the left are the layers available. The highlighted layers are rendered in the center canvas, and may be turned on or off by the user. For example, the “Shaded Relief (USGS)” layer contributes the large elevation map; the “Major N. Coast Watersheds” layer contributes the large colored regions. The indentation of layers shows containment. For example, the layer labeled “Russian River Region” itself contributes to the image the light box approximately in the center, outlining the Russian River region. The seven layers listed underneath this entry contain data that are within this region. In Figure 11, only the “Vegetation” layer is turned on, causing the small colored regions (shown in black and white here) to be rendered inside the Russian River bounding box.

Note that some of the layers are rendered semitransparent. For example, one can see the texture of the elevation (“Shaded Relief”) layer through the Major North Coast Watershed layer, and through the Russian River Vegetation layer.

The behaviors available allow the user to pan around and zoom in and out of the image. In particular, the user may grab the center image with a mouse click and drag it so that a different portion of the layers is exposed. The small image on the upper right shows the center viewing area as a mesh within an image that bounds all the data. One can also pan by grabbing and dragging the mesh. The user can also zoom to a specific altitude, or zoom in and out. Finally, the user can issue queries that make useful (predefined) computations on the layers turned on.

Geographic Annotations

Our sample document also contains previously created annotations. These annotations are simple geographic layers containing geo-positioned marks and text. Four such layers, all listed as Annotations within the Bay Area Region, are included in this document. First, let us pan over to the Bay Area, zoom in a bit, and turn on the layer labeled “Universities.” The result of these actions is shown in Figure 12.

The small labeled rectangles in Figure 12 are from the “Universities” layer. These show the position of UC Berkeley and Stanford University. These rectangles are “geographic hyperlinks,” meaning that moving the cursor within either of these regions and clicking results in a pan to some coordinates, a zoom to some altitude, and layers being turned off and on.
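A geographic hyperlink of this kind can be sketched as a rectangle plus a description of the target view. The types and field names below (`ViewState`, `GeoHyperlink`, `turn_on`, `turn_off`) are hypothetical and chosen for illustration; they are not taken from the GIS Viewer code.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class ViewState:
    """Hypothetical viewer state: pan position, zoom, and layer switches."""
    center: Tuple[float, float]
    altitude: float
    layers_on: Dict[str, bool] = field(default_factory=dict)


@dataclass
class GeoHyperlink:
    label: str
    bounds: Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)
    target_center: Tuple[float, float]
    target_altitude: float
    turn_on: List[str] = field(default_factory=list)
    turn_off: List[str] = field(default_factory=list)

    def contains(self, x: float, y: float) -> bool:
        min_x, min_y, max_x, max_y = self.bounds
        return min_x <= x <= max_x and min_y <= y <= max_y

    def follow(self, view: ViewState) -> None:
        # Pan, zoom, then switch layers off and on.
        view.center = self.target_center
        view.altitude = self.target_altitude
        for name in self.turn_off:
            view.layers_on[name] = False
        for name in self.turn_on:
            view.layers_on[name] = True


def click(view: ViewState, links: List[GeoHyperlink],
          x: float, y: float) -> Optional[str]:
    """Follow the first link whose rectangle contains the click, if any."""
    for link in links:
        if link.contains(x, y):
            link.follow(view)
            return link.label
    return None
```

A click outside every rectangle leaves the view untouched, whereas a click inside one replaces the pan position and altitude and updates the layer switches in a single step.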

Figure 13 shows the result of clicking within the “U. C. Berkeley” hyperlink. All the layers displayed in Figure 12 have been turned off, and two other layers are turned on, one showing high-altitude geo-rectified photographs of the UC Berkeley area, another showing yet another geographic hyperlink annotation (“North Campus”). Following this link produces the configuration shown in Figure 14. In Figure 14, the layer for the North Campus has been turned off, and another layer has been turned on. This layer shows the location of several buildings on campus, and provides conventional hyperlinks to pages describing those buildings. In addition, a layer containing every street in the city of Berkeley is turned on, and is overlaid on top of the high-altitude photographs. (Note that we have also zoomed in considerably closer; the mesh in the upper right hand canvas is now visible only as a small dot.)

6 It is available for use at http://elib.cs.berkeley.edu. The “GIS Viewer” tour is recommended.

FIG. 11. GIS Viewer showing a set of layers for Northern California. The middle canvas shows at a given apparent altitude the portion of the layers highlighted at the left. The behaviors at the right allow the user to pan, zoom, and issue queries. The small image at the upper right shows the middle canvas as a mesh over the entire extent of the available layers.

FIG. 12. The Bay Area, with the “Universities” annotation layer turned on. (Note that the mesh in the upper right hand canvas is now lower and considerably smaller, as we have panned and zoomed from the previous figure.)

Distributed Spatial Annotations

As the above example illustrates, there are GIS Viewer layers that have a useful annotative function. These layers are data in one type of vector format supported by the GIS Viewer. This data format consists of elements, each of which comprises a geo-positioned point, line, or polygon, an associated text string, and an optional behavior. The example above illustrates two forms of behavior, namely, that of geographic hyperlinks and that of conventional hyperlinks. A third type of behavior causes a query to be issued to a remote service.

In the examples, all the data, including the annotations, happen to reside on the same server. However, to support spontaneous collaboration, it is necessary to allow any user to create spatial annotations, store them separately from other data sources, such as ours, but have them compose with such sources so as to comprise in situ annotations of such data.

To support spontaneous collaboration over geographic resources, we provided the GIS Viewer with behaviors that allow authoring of annotative geographic layers, and with the ability to save state once such layers have been created. For example, in Figure 15, we show the top portion of the GIS Viewer, which we have clipped from the previous figures. Here, our geographic document contains just the high-resolution photograph layer for the Bay Area, to simplify the presentation. The figure shows this layer, but with labeled vector layers of various sorts having been added, and another (“Line”) layer in the process of being added. Each layer can be seen in the layer list. These layers can be either dots, rectangles, or lines, and may have hyperlinks associated with them.

In addition to authoring new layers, the user may remove existing ones, and thus arrive at a desired collection of old and new layers. The user can then save the current set of layer references to a file, similar to the hub document used by MVD. Here, though, because there are only data layers, and no behaviors to save, the document saved is just an HTML document containing an applet call to the GIS Viewer, which supplies the applet with the layer references of the layers in the configuration of the Viewer at the time of the save. As with hub documents, the annotation layers are small, and hence, reside within the applet call, whereas the more substantial layers are large, and are merely referred to by the layer references in the applet call.

FIG. 13. The result of following the geographic hyperlink to the UC Berkeley campus. Note that in addition to changing the pan and zoom, previous layers have been turned off, and new layers turned on, specifically, one showing digital orthophotographs of the UC Berkeley area, and one showing another annotation (“North Campus”).

The resulting HTML page in effect comprises a distributed annotation of geographic information: if opened by a user, it will regenerate the GIS Viewer configuration at the time of the save, and thus show the original data layers and the annotations made on them. Subsequent users can, of course, add their own annotations (or remove previous ones).
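The shape of such a saved page can be sketched as follows, with small annotation layers embedded in the applet call and large layers included only by reference. Everything specific here is an assumption made for illustration: the `SavedLayer` type, the parameter naming scheme, and the `GISViewer.class` placeholder are not the actual format written by the GIS Viewer.

```python
from dataclasses import dataclass
from html import escape
from typing import List, Optional


@dataclass
class SavedLayer:
    """Hypothetical record of one layer in a saved configuration."""
    name: str
    url: Optional[str] = None     # reference to a large, remotely stored layer
    inline: Optional[str] = None  # data of a small annotation layer


def build_saved_page(layers: List[SavedLayer]) -> str:
    """Return an HTML page whose applet call restores the given layers."""
    params = []
    for i, layer in enumerate(layers):
        # Assumed naming: layerN.name plus either layerN.data or layerN.url.
        params.append(
            f'<param name="layer{i}.name" value="{escape(layer.name)}">'
        )
        if layer.inline is not None:
            params.append(
                f'<param name="layer{i}.data" value="{escape(layer.inline)}">'
            )
        else:
            params.append(
                f'<param name="layer{i}.url" value="{escape(layer.url or "")}">'
            )
    body = "\n".join(params)
    # "GISViewer.class" is a placeholder, not the actual applet class name.
    return (
        "<html><body>\n"
        f'<applet code="GISViewer.class">\n{body}\n</applet>\n'
        "</body></html>"
    )
```

Because the page carries only references to the large layers, it can be stored wherever its author has write permission, independently of the servers that hold the underlying data.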

MVD and Collaboration over Library Resources

Above we have given some examples of how MVD can be used to annotate documents of a variety of formats. It is important to emphasize that the technology allows such annotation on documents housed on read-only repositories, without any special services being provided. Although we have annotated our own documents in the examples above, we have done so not as a privileged user. Any networked user may use the same technology to annotate these documents, or any other documents of supported types. The user would simply end up storing a hub document wherever that individual has write permission. Anyone accessing these hub documents using MVD would see that individual’s annotations on the subject document.

There are several reasons why this idea might be especially significant for scholarly information use. One is that it might be used to significantly enhance the reviewing and editing portion of the scholarly information cycle. Reviewers could use MVD to make comments on a document and share the results with appropriate parties. Similarly, an editor could make typesetting comments this way, and the author could execute those that are heeded, and so forth.

More interestingly, one can continue this annotation process once a document is officially posted. Such running commentary has similarities to other collaboration mechanisms, such as newsgroups, but provides for much more precise and highly functional commentary. In addition, extension of MVD and GIS Viewer would allow annotation of and collaboration for primary data sources, as well as derived work. The spontaneous collaboration theme means that users could enter into such forms of collaboration at any point, without any prior administrative coordination between parties.

Mechanisms to find annotative hub documents and manage large collections of annotations on a given document are a focus of current research.

FIG. 14. Following a geographic hyperlink. The layer corresponding to the previous geographic hyperlink is switched off, and another, containing several conventional hyperlinks (corresponding to the home page of each of the named buildings) is turned on. In addition, a layer containing every street in Berkeley is overlaid on top of the high altitude photos.


Related Work

There has been a great deal of prior work in the area of document models, geographic information systems, and computer-based collaborative work that we have drawn upon in this research. Most notably, OpenDoc and OLE both view documents as comprising multiple embedded document segments, each to be interpreted by separate software components. MVD, in effect, provides a third dimension. Thus, it is straightforward to introduce behaviors into MVD that operate over multiple formats, whereas it is not possible to do so readily in OpenDoc or OLE.

There are a number of existing systems that support various kinds of in-place annotation. These include the annotations facility in Lotus Notes™, which requires making available “hooks” for annotation attachment in a given document. ForComment™ supports individuals in a group making comments on documents in most common word processing formats. Markup™7 supports annotation, including copyeditor marks, in the Macintosh environment; Hot Off the Web (Insight Development) provides several different types of annotations on HTML by interoperating with a Web browser; the NeXT OS provides blue-pencil markup over any document rendered as Display PostScript. These operate at the graphics level, and hence, have the nice property that any document can be annotated. However, annotation is superficial, and all these models require buy-in to a particular system. Moreover, they do not readily support open, distributed, or extensible annotations.

HyTime (DeRose & Durand, 1994) and the XML linking language provide means for marking and linking to spans in read-only documents. Despite some restrictions (e.g., documents linked by XML linking conventions must be XML documents), the concepts introduced by these languages motivated aspects of how we designed our document locators. Of course, these languages do not in themselves deal with document mechanisms, such as how to accommodate new annotation functionality.

Microcosm (Fountain, Hall, Heath, & David, 1990), ComMentor (Roscheisen, Mogensen, & Winograd, 1995), and Knowledge Weasel (Lawton & Smith, 1993) are examples of systems mindful of the fact that it takes a great deal of effort to build a document formatter-renderer, and hence, follow a strategy of interoperating with existing formatter-renderers. In contrast, we pursued a strategy that imposes up-front costs to bridge existing application formats into the model, and to reproduce the desired pieces of functionality, in exchange for greater functionality.

ComMentor, Knowledge Weasel, and other systems focus on the server side of annotation support. Such support would be useful in many applications, and may form a useful complement to the functionality we describe. However, in general, we view the requirement of server support as an impediment to the realization of spontaneous collaboration, and hence, have gone to some trouble to provide annotation support that does not require such. In addition, our view is that powerful forms of annotation require deep control over the client, as in situ annotation may affect formatting, and because new forms of annotation may not be possible within the constraints of a given document format, and may require access to client capabilities not subject to ready alteration. This is why systems such as these tend to present annotations in a separate window, rather than in situ.

7 http://www.mstay.com/.

FIG. 15. Geopositioned annotations being added. The document contains just the high-resolution photographs of the Bay Area. (There are also some “utility layers,” one for the crosshair and one for a grid, the latter being turned off.) Then layers for various streets, buildings, and landmarks have been added (not all visible in the viewable image). The GIS Viewer is in the process of adding a new line annotation.

Future Work

We intend to broaden and test the multivalent document architecture to support an ever-widening variety of data types, and to test its use in real and diverse situations. Above, we have shown how multiple document formats can be incorporated in the framework. We would like to make the claim of format independence more forceful by developing media adapters to handle a variety of other common document types, most notably XML and near-image formats like PDF. We have also shown how we can apply similar ideas to geographic data; we plan to investigate whether these same functions can be accommodated within the MVD framework proper. Along with geographic data is image data generally, i.e., photographs, and, more interestingly, temporal media, such as video. For example, by giving behaviors temporal extent, it should be possible for annotations to become available only during portions of a video, or to effect animations of various sorts.

Much work remains to be done to provide better annotative behaviors. For example, many of the behaviors shown above have little capacity to edit an annotation once it is created. More interesting is the need to be able to handle large numbers of annotations on a given document, and to provide annotations upon annotations. Extensions to handle these issues are under investigation.

More broadly, we feel that support for spontaneous collaboration over digital library resources can fundamentally change the way information is created and used. We have focused on annotation, because we feel it is important in its own right and because its technological challenges seem crucial to spontaneous collaboration in general. No doubt there are other modes of annotation, and of collaboration in general, that we have not envisioned. Support for such modes of collaboration will help realize the full promise digital media has to offer communities of scholars, scientists, and students.

Acknowledgments

Many members of the UC Berkeley Digital Library Project contributed to the ideas in this article. Gary Kopec, Loretta Willis, Wojciech Matusik, and Hoon Kang made direct contributions. We would like to thank Carl Staelin, Nic Lyons, and Steven Rosenberg of Hewlett-Packard Laboratories for their many valuable suggestions on this work. The GIS Viewer was written by Loretta Willis, with subsequent enhancements by Jeff Anderson-Lee.

References

Bier, E.A., Stone, M.C., Pier, K., Buxton, W., & DeRose, T.D. (1993, August). Toolglass and magic lenses: The see-through interface. Proceedings of SIGGRAPH ’93 (pp. 73–80).

DeRose, S.J., & Durand, D.G. (1994). Making hypermedia work: A user’s guide to HyTime. New York: Kluwer Academic Publishers.

Fountain, A., Hall, W., Heath, I., & David, H. (1990). Microcosm: An open model for hypermedia with dynamic linking. Proceedings of ECHT ’90.

Halasz, F.G. (1988, July). Reflections on NoteCards: Seven issues for the next generation of hypermedia systems. Communications of the Association for Computing Machinery, 836–852.

Lawton, D.T., & Smith, I.E. (1993). The knowledge weasel hypermedia annotation system. In Hypertext ’93 Proceedings, November 14–18 (pp. 106–117).

Levy, D.M., & Marshall, C.C. (1995, April). Going digital: A look at assumptions underlying digital libraries. Communications of the ACM.

Marshall, C.C. (1997). Annotation: From paper books to the digital library. Proceedings of the Second ACM Conference on Digital Libraries (pp. 23–26), Philadelphia, PA.

Phelps, T.A. (1998, December). Multivalent documents: Anytime, anywhere, any type, every way user-improvable digital documents and systems. UC Berkeley Ph.D. Thesis. UCB Division of Computer Science Technical Report No. UCB/CSD-98-1026.

Phelps, T.A., & Wilensky, R. (1996). Multivalent documents: Inducing structure and behavior in online digital documents. Proceedings of the 29th Hawaii International Conference on System Sciences, January 3–6.

Phelps, T.A., & Wilensky, R. (1996). Multivalent documents: Architecture and applications. Proceedings of the First ACM International Conference on Digital Libraries, March 20–23 (pp. 100–108).

Roscheisen, M., Mogensen, C., & Winograd, T. (1995). Beyond browsing: Shared comments, SOAPs, trails, and on-line communities. Proceedings of the Third World Wide Web Conference, April 10–14.

Sellen, A., & Harper, R. (1997). Paper as an analytic resource for the design of new technologies. Proceedings of CHI ’97, March 22–27.

Thompson, H.S., & McKelvie, D. (1997). Hyperlink semantics for standoff markup of read-only documents. SGML Europe ’97.