
Putting documents into their work context in document analysis


A. Salminen*, V. Lyytikäinen, P. Tiitinen

Department of Computer Science and Information Systems, University of Jyväskylä, PO Box 35 (MaE), FIN-40351 Jyväskylä, Finland

Received 10 September 1999; accepted 9 November 1999

Abstract

In trying to achieve document standardization the goal is to find more effective, consistent, and standardized ways to utilize information technology. The specification and implementation of document standards may take several years requiring a profound analysis and understanding of document management practices. Document standardization does not concern documents only: it concerns workers, their work, business partners, and future systems as well. In this paper we discuss two ways of describing the work context of documents: process modelling and life cycle modelling. In process modelling, documents are regarded as resources produced and used in inter- or intra-organizational business processes. Different types of documents are typically produced and used in a business process. In life cycle modelling work related to processing of a document of a specific type is described. The modelling methods have been tested in an SGML standardization project called RASKE during the analysis of four case domains: the enquiry process in the Finnish Parliament and Government, national Finnish legislative work, budgetary work, and the Finnish participation in EU legislative work. This paper discusses the modelling requirements in document analysis and describes the techniques used in the RASKE project. © 2000 Elsevier Science Ltd. All rights reserved.

Keywords: Document analysis; Document standardization; Process modelling; SGML; XML

Information Processing and Management 36 (2000) 623–641
0306-4573/00/$ - see front matter © 2000 Elsevier Science Ltd. All rights reserved.
PII: S0306-4573(99)00070-9
www.elsevier.com/locate/infoproman

* Corresponding author. Tel.: +358-14-2603031; fax: +358-14-2603068. E-mail address: [email protected].fi (A. Salminen).

1. Introduction

The data volume in the electronic document repositories of organizations is growing fast, but the diversity of the document formats and systems, as well as continuing changes in the information technology, cause problems in the access and use of the information needed in work tasks. The problems concern both companies and public sector organizations. These problems have prompted organizations to start major document standardization projects where the intention is to agree upon rules which define the way information is represented in documents. The rules are needed in order to achieve more effective, consistent, and stable ways to utilize information technology in business processes. Problems with technological changes, and in the maintenance of long-term access to digital documents have motivated the search for application independent formats for documents. SGML (Standard Generalized Markup Language) is an international standard for defining and representing documents in an application-independent form (Goldfarb, 1990). A subset of SGML called XML (Extensible Markup Language) has been developed especially for specifying document standards to be used in Web information systems (Bray, Paoli & Sperberg-McQueen, 1998).

In SGML/XML standardization projects, a profound document analysis is needed. The analysis is usually seen as an analysis of document structures (Travis & Waldt, 1995; Watson & Shafer, 1995; Maler & El Andaloussi, 1996; Magnusson Sjöberg, 1997; Weitz, 1998). Successful implementation of document standards in enterprises however requires understanding of the role of documents in work processes. Especially in cases where the standardization concerns several document types and the document production is part of inter-organizational business processes, the analysts as well as the actors in processes should be able to see the process context of documents. In this paper we discuss the work process modelling as part of document analysis. We will introduce the modelling techniques used in a major standardization project called RASKE where the standardization has concerned the documents created in the Finnish Parliament and ministries (Salminen, Lehtovaara & Kauppinen, 1996; Salminen, Kauppinen & Lehtovaara, 1997; Salminen, Tiitinen & Lyytikäinen, 1999).

The rest of the paper is organized as follows. Section 2 introduces a model for electronic document management environments and defines the notions related to the model. Document standardization of enterprises is discussed in Section 3. As an example of a standardization project the RASKE project is introduced. Work process modelling approaches in other application areas and needs in the document analysis of a document standardization project are discussed in Section 4. The techniques used in the RASKE project are described in Section 5. Experiences and implications from the RASKE project are discussed in Section 6.

2. Electronic document management environments

Organizations use documents as a means for information management: a means to cluster, organize, store, transfer, and use information to fulfill their organizational purposes. The term electronic document management (EDM) refers to the use of modern information technology for this purpose (Sprague, 1995). In document standardization it is important to identify not only documents and their structures, but also the other entities of the EDM environment where the documents are created, manipulated, and used.

Fig. 1 shows a model for an EDM environment using the central notions of information control nets (ICNs): activities and resources (Ellis, 1979). Information is produced and used in activities. The resources are information repositories where the information produced can be stored, or from where information can be taken. The dashed lines in the figure denote the information flow from and to resources. The set of activities is denoted by a circle and the resources by rectangles. The resources are divided into three types: documents, systems, and actors. Documents consist of the recorded data intended for human perception. A document can be identified and handled as a unit in the activities, and it is intended to be understood as information pertaining to a topic. Since the documents in an EDM environment are mostly digital, information technology is needed and utilized to operate on documents. Hence systems, i.e. hardware, software, and applications, are essential resources in an EDM environment. On the other hand, since the information in documents should be available also after system changes, it is also important to separate the documents from systems as resources. Finally, the actors are people and organizations performing activities and using documents as well as systems in the activities. In some fully automated activities a software system may perform an activity (for example, create an email message and send it to a repository). In this paper we will however consider activities where the actors creating and using documents are people and organizations. In relationship to documents and systems, actors are called users. Actors are grouped by roles. A role specifies the tasks, responsibilities, and rights of an actor in an activity, as a user of a system, or as a user of a document repository.

Information pieces needed and produced during an activity are stored in many different ways: in the heads and experience of people, in the organizational culture, as hardware and software solutions, and as data in documents and applications. If the notion of information is understood according to the sense-making theory of Dervin (1992) as 'the sense created in a situation, at a specific moment in time and space by a reader' (where Dervin means a human reader), then information is subjective and the information needed by a person in order to perform an activity may be a complicated combination of pieces coming from different sources.

An EDM environment may be in a single organization. In the current networked world however, business processes often concern several organizations and resources are shared more or less by those organizations. Thus the EDM environments in which a specific organization or person is involved may be quite complex.
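The entities of the model above can be written down as a small data structure. The following Python sketch is only an illustration of the notions of this section (activities, the three resource types, information flow, and roles); the class and field names are our own assumptions, and the system name and the role name in the example are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Set


class ResourceKind(Enum):
    """The three resource types of the EDM model."""
    DOCUMENT = "document"
    SYSTEM = "system"
    ACTOR = "actor"


@dataclass(frozen=True)
class Resource:
    name: str
    kind: ResourceKind


@dataclass
class Activity:
    name: str
    # Information flow from resources to the activity.
    inputs: Set[Resource] = field(default_factory=set)
    # Information flow from the activity to resources.
    outputs: Set[Resource] = field(default_factory=set)
    # Role name -> actor playing that role in this activity.
    roles: Dict[str, Resource] = field(default_factory=dict)

    def assign_role(self, role: str, actor: Resource) -> None:
        """Attach an actor to the activity under a named role."""
        if actor.kind is not ResourceKind.ACTOR:
            raise ValueError(f"{actor.name} is not an actor")
        self.roles[role] = actor

    def resources_of_kind(self, kind: ResourceKind) -> Set[str]:
        """Names of all resources of one type touched by the activity."""
        touched = self.inputs | self.outputs | set(self.roles.values())
        return {r.name for r in touched if r.kind is kind}


# The ministry and the document type are named in the text;
# the system name and the role name are hypothetical.
ministry = Resource("Ministry of Finance", ResourceKind.ACTOR)
proposal = Resource("Budget Proposal", ResourceKind.DOCUMENT)
editor = Resource("SGML editor", ResourceKind.SYSTEM)

budget_work = Activity("Creation of the State Budget")
budget_work.assign_role("author", ministry)
budget_work.inputs.add(editor)
budget_work.outputs.add(proposal)
```

Keeping documents and systems as separate resource kinds mirrors the point made above: the documents should remain usable after the systems change.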

Fig. 1. Components of an electronic document management environment.

3. Document standardization

One of the approaches for improving business processes is document standardization using application-independent standard formats. In the standardization the idea is to plan digital information structures and formats taking into account future changes in systems instead of planning them for a specific software system. The rules associated with a document, document authoring, and its storage format are intended to help consistent understanding of the content by the authors and different readers also in situations where the software and hardware changes. Sprague (1995) suggests the development of an electronic document management strategy in an organization. Standardization can be taken as such a strategy.

3.1. RASKE as a standardization project

One example of a standardization project is RASKE. The term RASKE comes from the Finnish words 'Rakenteisten AsiakirjaStandardien KEhittäminen' meaning the development of standards for structured documents. The project was commenced in spring 1994 by the Finnish Parliament and a software company in cooperation with researchers at the University of Jyväskylä. The Ministry of Foreign Affairs, Ministry of Finance, Prime Minister's Office, and a publishing house also participated in the project.

Starting the RASKE project was motivated by document management problems in the Finnish Parliament and government. Teams studying the legislative work carried out in Parliament identified, for example, the following problems concerning document management (Salminen et al., 1997):

1. Incompatibilities of the systems used caused the need for repeated typing of the same piece of text, which in turn was a potential source of inconsistencies in documents.

2. Inconsistencies in document naming and document identifiers caused problems and extra work.

3. Lack of information management coordination between the ministries, and between the government and Parliament.

4. In spite of the fact that almost all of the documents were digital, documents were mostly distributed on paper.

5. The retrieval techniques of different systems were heterogeneous.

6. The retrieval techniques of the electronic archiving system and the tracking system of Parliament were not satisfactory.

7. Uncertainty concerning the future usability of the information in the archived digital documents.

The document analysis in the RASKE project concerned four domains: the enquiry process, national legislative work, Finnish participation in EU legislative work, and the creation of the state budget. During the case analyses, various methods of analysis were tested and developed. Preliminary DTDs were designed for 21 document types including, for example, Government Bill, Government Decision, Government Communication, Private Bill, Special Committee Report, Budget Proposal, and Communication of Parliament.

The work of the RASKE project during 1994–1998 has been followed by several projects where selected companies have developed and implemented SGML solutions for a specific subset of documents. The first implemented document repository in SGML form was the archive of laws and statutes, which was published by the Ministry of Justice in 1997. In Parliament, the application of SGML started in 1998, and the Ministry of Finance prepared the Budget Proposal for 1999 in SGML form. All of these documents are available to all citizens on the Internet free of charge. The analysis and methodology development work of the RASKE project is continuing in the EULEGIS project (European User Views to Legislative Information in Structured Form) funded by the Telematics Applications Programme of the European Commission and involving companies, universities, and public sector organizations in different European countries.

3.2. The standardization process

Inter-organizational document standardization covering many document types is an extremely complicated task and it may take several years before the first standards are implemented. Despite the availability of predefined standards for some sectors, for example, the CALS DTDs for the armed forces and industry (Smith, 1990), the TEI standard for humanities (Sperberg-McQueen & Burnard, 1994), and the DocBook DTD for books (The Davenport Group, 1998), choosing standards to effectively support business goals and activities nonetheless requires a lot of work and rethinking about document management in relation to business processes. A survey in Finland in 1997 showed that a lot of Finnish companies and public sector organizations were interested in and motivated to use SGML, and were also participating in SGML activities, but few had operational solutions outside HTML (Lyytikäinen, 1998).

Fig. 2 shows a model for SGML-based standardization in an organization. The circles depict activities, also called phases, and the arrows show the control flow specifying the order for starting the activities. The small black circle indicates that all of the following three activities can be started either in parallel or in any order. The analysis phase produces preliminary standards for a specified domain. Design of new solutions usually requires evaluation of the preliminary DTDs parallel with the development of new document production practices and selection of new systems. The new EDM solution will then be implemented. The solution may require a lot of changes in the document processing (Braa & Sandahl, 1998). Hence careful evaluation of new solutions and training of the workers is important. After evaluation, the implementation may need corrections, further redesigning may be called for, or a new domain is selected for analysis.

4. Modelling work processes

4.1. Work process modelling approaches

The major purpose of work process modelling has been to improve the efficiency of work processes in organizations. Means for improvements differ depending on the area where the modelling has been deployed. In business process reengineering work process modelling starts from the current business processes (Kettinger, Teng & Guha, 1997). After the identification of inefficiencies in the processes, process redesign is used as a means for improvements. Office information systems (Ellis & Nutt, 1980), work flow management systems (Bussler, 1995; Medina-Mora, Winograd, Flores & Flores, 1992), process-driven software development environments (Curtis, Kellner & Over, 1992), and some computer supported cooperative work systems (Sarin, Abbott & McCarthy, 1991) offer tools for describing work processes and automated support for the process. The specification of the process may be preceded by redesign. Thus the improvements are achieved by automation on one hand, and by changing the process on the other hand.

Modelling based on speech-act theory (Auramäki, Hirschheim & Lyytinen, 1991) and coordination theory (Malone & Crowston, 1994) describes communication and coordination in work processes, respectively, and improvements in processes are achieved by improving communication and coordination. Modelling by the methods discussed above concerns usually one organization; the methods do not support inter-organizational modelling and improvements in inter-organizational processes (Tolvanen & Lyytinen, 1994).

Fig. 2. A model for the document standardization process.

4.2. Modelling work processes in document analysis

One of the goals of the analysis phase as part of document standardization (Fig. 2) is to provide preliminary document standards. The design of effective and useful standards however requires a lot of knowledge about the context where documents are created and used, and by whom they are created and used. Along with the development of document standards, document analysis thus has other important goals:

Understanding the domain and the role of documents in the domain
The document management environment may be a complex network of organizations, people, activities, and documents related more or less to each other. The standardization work is often at least partly outsourced. Understanding the complex domain may be difficult even for people who have worked in the environment for a long time. The standardization experts and consultants may be unfamiliar with the domain, the terminology used there, and the ways people work in the domain. The analysis should offer tools for gaining this understanding.

Communication
The standardization process may take several years of cooperation by in-house workers and consultants. The analysis should help to find a common understanding of the domain.

Understanding the needs of the actors
To succeed in document standardization, the requirements of organizational actors as well as person actors should be carefully analyzed and taken into account in the new solutions.

Supporting the redesign of document processing
Document standardization often means the redesign of old document types and the redesign of the way documents are produced. To be able to implement effective document processing practices, the old practices have to be understood.

To achieve these goals, the analysis has to provide not only descriptions of documents, but also descriptions of activities and actors as well as their tasks and needs related to documents. Fig. 3 shows a model for the document analysis process. The analysis starts by defining and describing the domain to be analyzed. The domain is an activity whose document management the standardization and improvements will concern. An example of an analysis domain could be Creation of the State Budget or Paper Machine Manufacturing.

The domain definition is followed in parallel by process modelling, document modelling, and role modelling. Within the modelling, work activities are described in two different ways. First, process modelling is used as a means to identify smaller activities of the domain activity, the organizations responsible for them, and documents created or used in those activities. Document modelling gives more detailed information about a chosen subset of the document types identified in the process. Second, the work concerning a document of a given type during its life cycle is described in a state model as part of document modelling. Altogether document modelling consists of three phases: object modelling, life cycle modelling, and content modelling. In object modelling, document objects and their relationships to each other are described. The life cycle model shows the dynamic behavior of a document object over time. In content modelling, the structure of documents is analyzed. Content modelling includes the design of preliminary SGML or XML DTDs. In role modelling the most essential document users, their document management activities, and relationships to each other are described. The user needs analysis studies the needs concerning future document management. At the end of the analysis process, the analysis report is collected on the basis of the descriptions and models produced in earlier phases. In the following section we will discuss in more detail how the modelling of work has been carried out in the RASKE project. The other modelling methods and techniques used for document analysis in the project are described in Salminen et al. (1997) and Salminen (2000).


5. Techniques for modelling work in the RASKE project

A wide variety of techniques have been applied for work process modelling. Most of the techniques originate from the software engineering field where the techniques have been used to describe either automated software processes or the production processes where software is produced by people with the help of some tools. Curtis et al. (1992) listed 12 language types used as bases for process modelling. Among the well-known techniques are, for example, procedural programming languages, data flow diagrams, Petri nets, state-transition diagrams, and control flow diagrams. In recent years there has been a lot of activity aimed at finding a unified approach to the modelling needed for information systems development. The result reached through this work is a set of graphical notations called the Unified Modeling Language, in short UML (Booch, Rumbaugh, & Jacobson, 1999). It is a synthesis of different earlier object-oriented modelling techniques.

Fig. 3. A model for the document analysis process.

5.1. Requirements

In choosing and testing different modelling techniques in the RASKE project we have identified the following as the major requirements for the techniques to be used for modelling work in document analysis:

1. The models have to be descriptive showing how the work is actually done. In the redesign of work processes the same techniques may be used for prescriptive models showing how the work should be done after the changes have been implemented. However, these models should remain as descriptive documents after the implementation phase. The models are intended for human readers to support communication and understanding the role of documents in their work. The models need not show all special cases. The models should allow human judgement in determining the detailed meaning of the models in special cases.

2. The models have to be graphical, as clear as possible having few notions and symbols, supporting intuitive understanding. The major purposes of the models are to support the understanding of the domain and communication between people. The models are not intended to automate the processes described.

3. To avoid too many models in one communication situation the models should allow showing the most important entities in the same model: activities, actors, and documents.

4. In the models describing the creation and use of documents in business processes the capability for separating the control and data flow is needed. The models should be able to show what is the document repository created in a process and how the repository is created. Separate description of data and control flow also allows the use of the same symbols to describe control flow between activities only, or data flow only when needed.

We found two modelling techniques especially suitable for the requirements of the RASKE project: Information Control Nets (Ellis, 1979) and state transition diagrams of the Object-Oriented Analysis method (Shlaer & Mellor, 1992). Techniques very similar to these two are also available in the Unified Modeling Language (Booch, Rumbaugh, & Jacobson, 1999) as discussed below.

5.2. Applying ICN graphical techniques

Compared to other techniques, the important feature in Information Control Nets (ICNs) is the clear separation of the control flow and information flow. The same kind of separation can also be seen in the UML's action-object flow relationships diagram, which is a variation of the basic action diagram (UML-1.1, 1997). In UML, however, if one activity produces output which in turn serves as input to the next activity, the control flow and information flow are drawn as one flow. In ICNs the distinction between the data and control flow is always explicit, and different ICN types can be used to show either data flow only, control flow only, or both of them together. Fig. 1 in the introduction was already an example of an ICN. It was a metamodel for EDM environments and it showed only information flow but no control flow. Figs. 2 and 3 instead were examples of ICNs with control flow but no information flow.

Basically, the ICN models define a set of activities, a set of resources, a binary relation indicating the control flow among the activities, and binary relations indicating the information flow from resources to activities and from activities to resources. The actors of activities can be described in two alternative ways: either as resources or as attributes of activities. In the graphical ICN models, the activities are usually depicted by circles, resources by rectangles, and binary relations by arrows so that control flow is shown by solid lines and data flow by dashed lines. Alternation and concurrency are controlled by special activities denoted by special symbols.
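The set-and-relation definition above translates almost directly into code. The Python sketch below is a minimal illustration under our own assumptions: the class `ICN` and its method names are hypothetical, alternation and concurrency nodes are left out, and the example arcs encode only what the text says about the document analysis process (domain definition followed by three modelling activities, and preliminary DTDs as a product of document modelling).

```python
from dataclasses import dataclass, field
from typing import Set, Tuple


@dataclass
class ICN:
    """An information control net as sets and binary relations."""
    activities: Set[str] = field(default_factory=set)
    resources: Set[str] = field(default_factory=set)
    # (activity, activity) pairs: solid arrows in the graphical notation.
    control_flow: Set[Tuple[str, str]] = field(default_factory=set)
    # (resource, activity) pairs: dashed arrows into an activity.
    info_in: Set[Tuple[str, str]] = field(default_factory=set)
    # (activity, resource) pairs: dashed arrows out of an activity.
    info_out: Set[Tuple[str, str]] = field(default_factory=set)

    def add_control(self, source: str, target: str) -> None:
        self.activities |= {source, target}
        self.control_flow.add((source, target))

    def add_input(self, resource: str, activity: str) -> None:
        self.resources.add(resource)
        self.activities.add(activity)
        self.info_in.add((resource, activity))

    def add_output(self, activity: str, resource: str) -> None:
        self.activities.add(activity)
        self.resources.add(resource)
        self.info_out.add((activity, resource))

    def successors(self, activity: str) -> Set[str]:
        """Activities reached from `activity` by one control flow arrow."""
        return {b for (a, b) in self.control_flow if a == activity}

    def view(self, control: bool = True, information: bool = True) -> "ICN":
        """Copy showing control flow only, information flow only, or both."""
        return ICN(
            activities=set(self.activities),
            resources=set(self.resources) if information else set(),
            control_flow=set(self.control_flow) if control else set(),
            info_in=set(self.info_in) if information else set(),
            info_out=set(self.info_out) if information else set(),
        )


analysis = ICN()
for phase in ("Process modelling", "Document modelling", "Role modelling"):
    analysis.add_control("Domain definition", phase)
analysis.add_output("Document modelling", "Preliminary DTDs")

control_only = analysis.view(information=False)
```

The `view` method corresponds to the different ICN types mentioned above: the same net can be shown with control flow only, with information flow only, or with both.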


In inter-organizational document analysis it is important to identify the organizationsparticipating in the domain activity. A graphical description of the participative organizationscan be given by an ICN as shown in Fig. 4. The ®gure depicts the parties in the domain`Creation of the State Budget' by a model called organizational framework. The domain activityis the only activity in the model, the participating organizations are given as input resources.The broken arrows from the organizations to the activity indicate the information ¯ow. Thearrows are labelled by identi®ers and associated with phrases brie¯y describing the tasks of theorganizations in the activity. Hierarchic relationships of organizations are indicated by nestedrectangles.In the process modelling phase of document analysis (see Fig. 3) the domain activity is

divided into smaller activities, and the ¯ow of control between the activities is indicated. Forthe purposes of document analysis the interpretation of original ICN control ¯ow arrows ishowever too restrictive. In ICNs a control ¯ow arrow from activity A to activity B indicates

Fig. 4. Organizational framework for the Creation of the State Budget.

A. Salminen et al. / Information Processing and Management 36 (2000) 623±641632

Page 11: Putting documents into their work context in document analysis

that A must be completed before B can start (Ellis, 1979). ICNs were designed for office automation, where specifying the relationship between the ending of one activity and the beginning of another was important. In document analysis the intent is not to automate business processes. The major objectives of process modelling are to help in the identification of the activities, actors, and documents, and to facilitate human understanding and communication. Therefore, the selection of the correct level of abstraction in the process models is important. In document analysis, modelling concerns real-life work processes where many activities take place in parallel and the exact specification of the relationship between the ending of one activity and the beginning of another is rarely important. Ellis and Wainer (1994) also identified the need for a looser interpretation of the control flow in ICNs in some cases, but they did not suggest any solution by which the loose interpretation could be included in the semantics of the models. We regard a control flow arrow from activity A to activity B as representing weak control flow, indicating that activity B begins after activity A has begun. Figs. 2 and 3 were examples of the use of weak control flow to show rather complicated processes at a very general level of abstraction. Both processes include a lot of iterative and parallel activity, but this is not indicated in the models by separate control flow arrows. The semantics of the models, however, also covers it. We could use different kinds of control flow arrows for 'strong' and weak control flow, but we have never found the separation of the two important in our analysis cases.

In some process modelling techniques an important requirement for a process is that it must have well defined input and output data flow (Avison & Fitzgerald, 1996; Garvin, 1998). In document analysis, by contrast, it is important to be able to study the creation of the document repository of a process separately from the use of documents. Therefore we have defined two kinds of process models for document analysis: document output models and document input models. A document output model is intended to show the documents created during a process, the activities in which the documents are created, and the actors by which they are created. A document input model instead shows the documents used in the activities of a process. In both models the documents are indicated by labels associated with information flow arcs, not by separate rectangles. As in ICNs in general, the information flow arcs reflect possible rather than necessary information flow.

In Fig. 5 the process model of Fig. 3 for document analysis has been extended to a document output model. The figure shows the products of the different phases of document analysis. The vertical bar on the right represents the documentation created during an actual analysis process. Fig. 5 does not show the actors of the process. In modelling inter-organizational business processes it is also important to show the organizations performing the activities. In document output and input models the actors are given as attributes of activities. Fig. 6 shows a document output model for the Creation of the State Budget. The number in parentheses following Budget Proposal refers to different versions of the same document object.

A document input model is otherwise similar to the document output model except that instead of showing the documents produced by the activities, it shows the major documents used in the activities. In general, the documentation used in an activity by an organization depends on the information needs of the people working in that activity. Therefore, in the document analysis of a standardization project, document input models are designed after the user needs interviews.
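The two process model views and the weak control flow interpretation can be illustrated with a small data structure. The following is only a sketch under our own assumptions: the class and method names, the activity names (Drafting, Review) and the actors (Ministry, Parliament) are hypothetical and are not taken from Fig. 6; only the convention of giving actors as attributes of activities, documents as arc labels, and version numbers in parentheses follows the text.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Activity:
    name: str
    actors: list[str]                                 # actors are attributes of the activity
    produces: set[str] = field(default_factory=set)   # labels on outgoing information flow arcs
    uses: set[str] = field(default_factory=set)       # labels on incoming information flow arcs


@dataclass
class ProcessModel:
    activities: dict[str, Activity] = field(default_factory=dict)
    # (A, B) means weak control flow: B begins after A has begun
    weak_flow: list[tuple[str, str]] = field(default_factory=list)

    def add(self, activity: Activity) -> None:
        self.activities[activity.name] = activity

    def output_model(self) -> dict[str, list[str]]:
        """Document output view: activity -> documents it may create."""
        return {a.name: sorted(a.produces) for a in self.activities.values()}

    def input_model(self) -> dict[str, list[str]]:
        """Document input view: activity -> major documents it may use."""
        return {a.name: sorted(a.uses) for a in self.activities.values()}

    def respects_weak_flow(self, start_times: dict[str, int]) -> bool:
        """Check only start times; when an activity ends is deliberately ignored."""
        return all(start_times[a] < start_times[b] for a, b in self.weak_flow)


# Hypothetical two-activity process, for illustration only.
model = ProcessModel()
model.add(Activity("Drafting", ["Ministry"], produces={"Budget Proposal (1)"}))
model.add(Activity("Review", ["Parliament"],
                   produces={"Budget Proposal (2)"}, uses={"Budget Proposal (1)"}))
model.weak_flow.append(("Drafting", "Review"))
```

Keeping the produced and used documents in separate attributes makes it possible to derive the output view and the input view from the same activities while still presenting them separately.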


5.3. The life cycle modelling

In a life cycle model, the dynamic behavior of a document object over time is described. In the RASKE project we have used the state transition diagrams of OOA (Shlaer & Mellor, 1992) for the purpose. State transition diagrams are also available in UML (UML-1.1, 1997). The main difference between the two techniques is the specification of the timing of an action related to a state. In UML, actions can be specified to happen upon entry into a state, upon exit from a state, or within a state, while in OOA actions are always executed upon entry into a state (Mellor & Lang, 1997). We have used the simple form of state transition diagrams with a single kind of timing, but the capability to specify the timing of an action might be useful in some situations.
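The difference in action timing can be made concrete with a minimal sketch. It assumes a simplified state with optional entry and exit hooks; the names `State`, `move`, `log_action` and the states Draft and Signed are hypothetical. An OOA-style state would set only `on_entry`, whereas a UML-style state may also set `on_exit` (actions within a state are left out for brevity).

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

Action = Callable[[List[str]], None]


@dataclass
class State:
    name: str
    on_entry: Optional[Action] = None  # the only timing used in OOA
    on_exit: Optional[Action] = None   # an additional timing available in UML


def log_action(text: str) -> Action:
    """Build an action that records its own execution in a log."""
    return lambda log: log.append(text)


def move(log: List[str], source: State, target: State) -> None:
    """Leave the source state, then enter the target state."""
    if source.on_exit is not None:
        source.on_exit(log)
    if target.on_entry is not None:
        target.on_entry(log)


# Hypothetical states, for illustration only.
draft = State("Draft", on_exit=log_action("exit Draft"))
signed = State("Signed", on_entry=log_action("enter Signed"))

log: List[str] = []
move(log, draft, signed)
```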

Fig. 5. Document output model for the document analysis process.


Fig. 6. Document output model for the Creation of the State Budget.


A state transition diagram for the Private Bill object is shown in Fig. 7. A state transition diagram consists of states, events, transition rules, and actions. A state represents a situation of a typical instance of the document object during its lifetime. Each document instance can be in only one state at any given time. In its creation state, a document instance comes into existence. An action is an activity that must be done with an instance upon arrival in a state. The activity changes the values of some attributes of the object. One action is associated with each state, but that action may consist of a sequence of activities. Since activities are associated with states, and an object is in only one state at a time in a state transition diagram, the diagram cannot be used to indicate parallel activities graphically. Parallel activities can only be described in textual descriptions. An event represents an incident that causes an instance to move from one state to another. A transition rule specifies which new state is achieved if a particular event happens in a certain state. In its final state, the instance either becomes quiescent, or it vanishes. A quiescent instance continues to exist but has no interesting dynamic behavior.

Fig. 7. A state transition diagram for a Private Bill.

The life cycle of the object in Fig. 7 has seven states, each numbered, named, and represented by a box in the state transition diagram. The number and name of each state are written inside the box in bold. The actors and the action associated with a state are described inside the box. An arrow between the boxes illustrates a transition rule. The arrow is labelled with the event (label and meaning) that causes the instance to move from its present state to its successor state. State 1 is the creation state and state 7 is the final state. In some cases, a document object may have a circular life cycle: it exists all the time and has an operational cycle of behavior. In the case of documents handled in legislative organizations, a document object mostly has a born-and-die life cycle containing one or more creation states and one or more final states. The letter P in the text describing an action or event refers to a paper form and the letter E to an electronic form.

The figure shows that a Private Bill is handled in the early stages of its lifetime mostly as a paper document. It may be written on a word processor but it is usually transmitted in paper form. It is not until its transfer from state 4 to state 5 that electronic means are employed. Thus the digital form is not used effectively. Depending on needs, state transition diagrams may be used to describe different kinds of activities concerning documents in a class. In the RASKE project, life cycle modelling was primarily used to describe document production. In some other cases, diagrams could be used to show the handling of a document in its final content and form.
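The elements of a life cycle model described above (numbered states, events, transition rules, one action per state performed upon arrival, and the P/E marking of the document form) can be sketched as follows. This is an illustration under our own assumptions, not the model of Fig. 7: the three states, their actors, the event labels D1 and D2, and all class and attribute names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    number: int
    name: str
    actors: tuple[str, ...]
    action: str  # the single action performed upon arrival in the state
    form: str    # "P" for paper form, "E" for electronic form


class DocumentInstance:
    """One document instance; it is in exactly one state at any given time."""

    def __init__(self, states, rules, creation, final):
        self.states = {s.number: s for s in states}
        self.rules = rules      # (state number, event label) -> successor state number
        self.final = set(final)
        self.current = creation
        self.log = [self.states[creation].action]  # action of the creation state

    def handle(self, event: str) -> None:
        """Apply the transition rule for this event, then perform the new state's action."""
        key = (self.current, event)
        if key not in self.rules:
            raise ValueError(f"no transition rule for event {event!r} in state {self.current}")
        self.current = self.rules[key]
        self.log.append(self.states[self.current].action)

    @property
    def is_final(self) -> bool:
        return self.current in self.final


# Hypothetical three-state born-and-die life cycle; not the seven states of Fig. 7.
STATES = [
    State(1, "Drafted", ("Author",), "write the text", "P"),
    State(2, "Submitted", ("Registry",), "register the document", "P"),
    State(3, "Archived", ("Archive",), "store the document", "E"),
]
RULES = {(1, "D1"): 2, (2, "D2"): 3}

doc = DocumentInstance(STATES, RULES, creation=1, final={3})
doc.handle("D1")
doc.handle("D2")

# The first state in which the document is handled in electronic form.
first_electronic = min(s.number for s in STATES if s.form == "E")
```

Because the instance holds a single current state, parallel activities cannot be expressed in this structure, which mirrors the limitation of the diagrams noted above.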

6. Experiences

In the previous section we introduced four kinds of models to describe work dealing with documents in a domain: the organizational framework description, the document output model, the document input model, and the life cycle model. The organizational framework shows the organizational actors participating in the work of the domain. The document output model describes the document repository created during the domain activity and the subactivities in which the documents are created or updated. The document input model instead shows the most important documents used in the subactivities. The two models can of course be combined, but in most cases we have found it clearer to keep them separate. The life cycle model shows the behaviour of a document of a specific type over time.

During the document analysis of the RASKE project all four kinds of models were important for the analysts in getting to know the domain and the role of documents there. The life cycle models and document output models in particular offered valuable means for communication between analysts and domain experts. Both models were used to find out the actors and their roles in the domain. The life cycle models were also useful in showing the deficiencies in fully utilizing the electronic format of the documents.

Among the different kinds of models, the document output models have had the most versatile use. During the user needs analysis phase the document output model of a domain was sent to experts of the domain before their interviews. In most cases the interviewees had studied the model and were well prepared to discuss the documents and work tasks in the model. Sometimes they were able to point out misunderstandings in the model. The model clearly helped in identifying the work tasks important in the information needs analysis based on the sense-making theory (Dervin, 1992). The interviews showed that people working in different organizations had different views of the document management in a work process as a whole and of the role of other actors in the process. The differences were partly misunderstandings caused by the lack of information concerning the work in other organizations. The document output model helped in correcting the misunderstandings.

Document output models were also useful in the DTD design. The user needs analyses showed that people often need information from several documents created in a process. Hence references from a document to other documents created in the same process are useful. The document output model was used in checking whether the needed references are included in a DTD. The document input models were also used in the DTD design for planning references from a document to other documents needed in the same activity. The input model can also be used in specifying requirements for future retrieval systems. It shows the major document types needed in the work of the domain.

Documentation of the information collected from different components of a document management environment is important, not only during the document analysis, but also afterwards when the analysis is continued by implementation projects or when the analysis is extended to concern new domains. Documentation is also necessary when new people join the project. In the implementation projects, information in the analysis reports can be used to specify the requirements for new systems. Analysis reports of the RASKE project have also been used for the training of workers. For example, the Finnish participation in the EU legislative work as a member state is a new activity in the Finnish legislative work. Finland joined the EU at the beginning of 1995. During the document analysis of the domain in 1996 there was a lot of uncertainty about the documents, roles, and work processes in the area. The analysis reports were found valuable for understanding the process. Two reports were published from the analysis of the domain, a less technical one by the Parliament (Tiitinen, Salminen & Lyytikäinen, 1997) and a larger one by the Ministry of Foreign Affairs (Tiitinen, Päivärinta, Salminen & Lyytikäinen, 1997). In the training of workers, the descriptions of organizations and work processes have been as important as descriptions of documents. Ideally, the analysis documentation could create a basis for 'document management handbooks' supporting continuous improvements and training in the area of document management.

The success of the organizational framework descriptions and document output models as descriptive models has been clearly demonstrated during the EULEGIS project, where the aim is to develop new means for the retrieval of legal information in Europe. The intent is to offer a single user interface to retrieve legal information from different levels in Europe: the European Union, a country, a region, or a municipality. During the user needs analysis phase of the EULEGIS project it was found that people in different European countries need more information about the legal systems while they are looking for information in legal databases. In the proposed EULEGIS system, the user can start the information retrieval by specifying a legal system, for example, the European Union legal system or the Belgian legal system. The user may then ask for either an actor view or a process view of the system. These views show the legal system by means of the organizational framework description or the document output model, respectively. The graphical models aid users both in understanding the systems and in choosing appropriate EULEGIS search forms. The EULEGIS user groups have given very positive feedback about the graphical interface using the two kinds of models for information retrieval. They have found the graphical interface to be a unique means for legal information retrieval.

7. Conclusion

Electronic document management environments are often complex networks of document repositories, systems, and organizations. People working in different organizations and in different roles participate in the work processes of these environments. Document standardization is intended to support the information exchange between people and organizations, and to improve the management of system changes. In the design of standards, documents should be viewed in their context and the different components of the document management environment should be taken into account. Modelling is a valuable tool for managing the complicated standardization activities.

Document standardization is one of the major business transformation areas of the future. During the RASKE project we have seen that inter-organizational document standardization covering many document types is an extremely complicated task and may take several years. The standardization is not a single project, but a collection of projects and efforts demanding methods and techniques for project and information management. Management of the whole standardization process requires more research. An especially interesting and promising area is the utilization of graphical models in the user interface of information retrieval systems.

The methods introduced in this paper were developed in public sector SGML standardization cases, but the methods themselves are general and most of them are not tied to the use of SGML. The methods have been tested in an EDM project analyzing document management in industry. The case analyses concern document management in power plant production and in paper machine production. The methods created in the RASKE project have clearly been applicable also in industrial environments.

Acknowledgements

The helpful cooperation and extensive knowledge of experts in the Finnish Parliament and ministries as well as in many other organizations have been extremely valuable. The authors would also like to thank Tero Päivärinta, Pasi Tyrväinen, Kalle Lyytinen and Kalervo Järvelin for their help and comments during this work. The financial support of the Finnish Parliament and the Telematics Application Programme of the European Commission is gratefully acknowledged.

References

Auramäki, E., Hirschheim, R., & Lyytinen, K. (1991). Modelling offices through discourse analysis: the SAMPO approach. The Computer Journal, 35(4), 342–352.

Avison, D. E., & Fitzgerald, G. (1996). Information Systems Development: Methodologies, Techniques and Tools (2nd ed.). London: McGraw-Hill.

Booch, G., Rumbaugh, J., & Jacobson, I. (1999). The Unified Modeling Language User Guide. Reading (MA): Addison-Wesley.

Braa, K., & Sandahl, T. I. (1998). Approaches to standardization of documents. In T. Wakayama, S. Kannapan, C. M. Khoong, S. Navanthe, & J. Yates, Information and Process Integration in Enterprises: Rethinking Documents (pp. 125–142). Norwell (MA): Kluwer Academic Publishers.

Bray, T., Paoli, J., & Sperberg-McQueen, C. M. (1998). Extensible Markup Language (XML) 1.0. W3C Recommendation 10-February-1998. URL: http://www.w3.org/TR/REC-xml (Retrieved March 31, 1999).

Bussler, C. (1995). Policy resolution in workflow management systems. Digital Technical Journal, 6(4). URL: http://www.digital.com/info/DTJG02/DTJG02SC.TXT (Retrieved April 19, 1999).

Curtis, B., Kellner, M. A., & Over, J. (1992). Process Modeling. Communications of the ACM, 35(9), 75–90.

Dervin, B. (1992). From the mind's eye of the user: the sense-making qualitative-quantitative methodology. In J. D. Glazier, & R. R. Powell, Qualitative Research in Information Management (pp. 61–84). Englewood (CO): Libraries Unlimited.

Ellis, C. A. (1979). Information Control Nets: A mathematical model of office information flow. Proceedings of the Conference on Simulation, Measurement and Modeling of Computer Systems (special issue). ACM SIGMETRICS Performance Evaluation Review, 8(3), 225–238.

Ellis, C. A., & Nutt, G. J. (1980). Office information systems and computer science. ACM Computing Surveys, 12(1), 27–60.

Ellis, C. A., & Wainer, J. (1994). Goal-based models of collaboration. Collaborative Computing, 1, 61–86.

Garvin, D. A. (1998). The processes of organization and management. Sloan Management Review, 39(4), 33–50.

Goldfarb, C. F. (1990). The SGML Handbook. Oxford, UK: Oxford University Press.

Kettinger, W. J., Teng, J. T. C., & Guha, S. (1997). Business process change: A study of methodologies, techniques, and tools. MIS Quarterly, 21(1), 55–79.

Lyytikäinen, V. (1998). Rakenteisuuden hyödyntäminen elektronisissa dokumenteissa. SGML-pohjaisen dokumentaation tutkimus ja käyttö Suomessa 1997, Teknologiakatsaus 57/98, Tekes.

Magnusson Sjöberg, C. (1997). Corpus Legis: A Legal Document Management Project. International Journal of Law and Information Technology, 5(1), 83–99.

Maler, E., & El Andaloussi, J. (1996). Developing SGML DTDs. From text to model to markup. Upper Saddle River (NJ): Prentice Hall.

Malone, T., & Crowston, K. (1994). The interdisciplinary study of coordination. ACM Computing Surveys, 26(1), 87–119.

Medina-Mora, R., Winograd, T., Flores, R., & Flores, F. (1992). The action workflow approach to workflow management technology. In Proceedings of CSCW 92 Conference (pp. 281–288). ACM Press.

Mellor, S. J., & Lang, N. (1997). Developing Shlaer-Mellor Models Using UML. Technical paper. URL: http://www.projtech.com/info/uml.html (Retrieved March 31, 1999).

Salminen, A. (2000). Methodology for document analysis, to appear in A. Kent (Ed.) Encyclopedia of Library and Information Science. New York: Marcel Dekker, Inc.

Salminen, A., Lehtovaara, M., & Kauppinen, K. (1996). Standardization of digital legislative documents, a case study. In Proceedings of the 29th Annual Hawaii International Conference on System Sciences (pp. 72–81). IEEE Computer Society Press.


Salminen, A., Kauppinen, K., & Lehtovaara, M. (1997). Towards a methodology for document analysis. Journal of the American Society for Information Science, 48(7), 644–655.

Salminen, A., Tiitinen, P., & Lyytikäinen, V. (1999). Usability evaluation of a structured document archive. In Proceedings of the 32nd Annual Hawaii International Conference on System Sciences. IEEE Computer Society Press.

Sarin, S. K., Abbott, K. R., & McCarthy, D. R. (1991). A process model and system for supporting collaborative work. Conference on Organizational Computing Systems, SIGOIS Bulletin, 12(2,3), 213–224.

Shlaer, S., & Mellor, S. J. (1992). Object Lifecycles: Modeling the World in States. Englewood Cliffs (NJ): Yourdon Press.

Smith, J. M. (1990). Introduction to CALS: The Strategy and the Standards. Twickenham: Technology Appraisals Ltd.

Sperberg-McQueen, C. M., & Burnard, L. (1994). Guidelines for electronic text encoding and interchange. Association for Computers and the Humanities, ACH. Also made available by the Electronic Text Center at the University of Virginia at URL: http://etext.virginia.edu/TEI.html (Retrieved Sept. 2, 1998).

Sprague, R. H. (1995). Electronic document management: challenges and opportunities for information systems managers. MIS Quarterly, 19(1), 29–49.

The Davenport Group (1998). The maintainers of the DocBook DTD, URL: http://www.oreilly.com/davenport/ (Retrieved Sept. 2, 1998).

Tiitinen, P., Salminen, A., & Lyytikäinen, V. (1997). EU-lainsäädäntöasiakirjat Suomessa. RASKE-projektin raportti, Eduskunnan kanslian julkaisu 1/1997.

Tiitinen, P., Päivärinta, T., Salminen, A., & Lyytikäinen, V. (1997). Suomalaisten EU-lainsäädäntöasiakirjojen rakenteistaminen. RASKE-projektin raportti, Tietohallinnon selvityksiä, Ulkoasiainministeriö, Tietohallintolinja.

Tolvanen, J.-P., & Lyytinen, K. (1994). Modeling information systems in business development, alternative perspectives on business process re-engineering. In Proceedings of the IFIP TC8 Open Conference on Business Process Re-Engineering: Information Systems Opportunities and Challenges, 8–11 May, Queensland Gold Coast, Australia (pp. 567–580).

Travis, B., & Waldt, D. (1995). The SGML Implementation Guide. Berlin: Springer-Verlag.

UML-1.1 (1997). UML Notation Guide. Version 1.1, 1 September, 1997. URL: http://www.rational.com/uml/resources/documentation/notation/index.jtmpl (Retrieved March 31, 1999).

Watson, B. C., & Shafer, K. (1995). Creating custom SGML DTDs for documentation products. In Proceedings of the 13th Annual International Conference on Systems Documentation. Emerging from Chaos: Solutions for the Growing Complexity of Our Jobs (pp. 189–196).

Weitz, W. (1998). SGML nets: Integrating document and workflow modeling. In Proceedings of the 31st Annual Hawaii International Conference on System Sciences (pp. 185–194). IEEE Computer Society Press.
