Intelligent Content Management System Project Presentation


April 2002

IST-2001-32429
ICONS
Intelligent Content Management System
www.icons.rodan.pl

Project Partners
Rodan Systems (PL)
The Polish Academy of Sciences (PL)
Centro di Ingegneria Economica e Sociale (IT)
InfoVide (PL)
SchlumbergerSema (BE)
University Paris 9 Dauphine (FR)
University of Ulster (UK)


Project name: Intelligent Content Management System
Acronym: ICONS
Workpackage: WP9
Task: T9.1
Document type: report
Title: Intelligent Content Management System
Subtitle: Project Presentation
Document acronym: D01
Author(s): Witold Staniszkis, Nicola Leone, Pasquale Rullo, Łukasz Balcerek, Michał Śmiałek, Witold Litwin, Gérard Levy, Jules Georges, Kazimierz Subieta, Mariusz Momotko, Dorota Depowska, Janusz Charczuk, Waldemar Piszczewiat, Yaxin Bi, David Bell
Reviewer(s): Annette Bleeker, Bartosz Nowicki
Accepting: Witold Staniszkis
Location: I:\WP9 Project Management\ICONS WP9 T1 D01 0115.doc
Version: 1.15
Date: April 2002
Status: final version
Distribution: public


http://www.icons.rodan.pl/

Intelligent Content Management System 1.15History of changes April 2002

IST-2001-32429 ICONS Intelligent Content Management System page 3/86

History of changes

Date | Version | Author | Change description
6.4.02 | 1.14 | Bartosz Nowicki | final packaging
4.4.02 | 1.12 | Witold Staniszkis | integration of partners’ inputs for chapters 5-9
30.3.02 | 1.8 | Mariusz Momotko (5.5, 7, 8.3); Witold Litwin, Gérard Levy (8.1, 8.2); David Bell, Yaxin Bi (5.1-5.4); Nicola Leone, Pasquale Rullo (5.1-5.4); Kazimierz Subieta (5.6); Jules Georges (9.1); Łukasz Balcerek, Michał Śmiałek (9.2); Dorota Depowska, Waldemar Piszczewiat (6) | partners’ inputs provided
30.2.02 | 1.03 | Bartosz Nowicki | ICONS template applied
30.2.02 | 1.02 | Witold Staniszkis | further elaboration; work distribution among partners
1.2.02 | 1.01 | Witold Staniszkis | document creation



Executive summary
The primary objective of the ICONS Project Presentation report is to provide a baseline platform for all ICONS project stakeholders, representing the consensus of the ICONS consortium members with respect to the ICONS research and development strategy. Much effort has gone into interactions among members of the ICONS research and development community, aimed at reconciling diverse views and specialisations in the relevant research realms. We assume that the ensuing research results may require refinements and modifications of the underlying ICONS assumptions, and we plan to reflect them in subsequent versions of the report. Hence, this report is a “live” status document reflecting the current views of the ICONS consortium.

The initial effort has gone into the Knowledge Management System (KMS) feature requirements analysis, in order to establish the compatibility of the requirements voiced by the knowledge management community with the prevailing opinions and conclusions of on-going research work in the IT field. Our motivation has been to verify the ICONS project goals and objectives and, where necessary, to re-orient some of the principal research and development objectives.

The representative results of management science research pertaining to intellectual capital and knowledge management have been examined. We have concentrated on the work of the Knowledge Management Consortium International [Firestone2000, McElroy1999], the seminal work in the area of learning organisations [Garvin1993] and knowledge modelling [Popper1971, Popper1977], as well as the generally accepted views of Nonaka and Takeuchi [Nonaka1995] on knowledge creation and dissemination processes. The principal conclusion is that current KM needs require IT support for KM processes in order to facilitate innovation leading to enhanced competitive advantage. A mapping between the KM processes and the desirable KMS features has been established.

Our findings have been confronted with the prevailing views of the IT research and development community with respect to KMS architecture requirements. We have developed a KMS reference architecture enumerating the desirable KMS features, to provide a “common denominator” representation of current IT research and development work. Principal results of the on-going European KM projects may be found on the European KM Forum web site [KMForum2001]. The principal KMS feature sets include knowledge dissemination features, domain ontology features, content repository features, KMS actor collaboration features, knowledge security features, and content integration features. The semantics of the roles these features play in the KM processes have been specified, in order to compare the prevailing views of the IT community with those of management scientists. We have established that the reference KMS architecture is sufficiently powerful to provide significant enabling leverage for the KM field.

The above complementary views of the KM scene provide a solid reference background for the ICONS architecture specification, which forms the backbone of our research and development work. We concentrate our project work on three key technological areas, namely the Knowledge Management Technologies area, the Human/Computer Interaction (HCI) area, and the Distributed Architecture Technologies area. We further demonstrate that such an approach is fully compatible with the stated ICONS project goal and objectives, and that it enables us to provide the required technical support for the KMS reference architecture. The complete view of the ICONS architecture comprises additional technological areas auxiliary to our project, namely the Content Management Technologies area and the Development Technologies area. The software modules within the auxiliary areas are contributed to the project, preferably as “open source” software or as software proprietary to consortium partners, to be subsequently used and/or modified within the ICONS prototype. The cross-reference between the KMS reference architecture and the proposed ICONS architecture, indicating the research and/or development effort needed, shows that the ICONS features are complete with respect to the established requirements.

Knowledge-based features are an important building block of the ICONS architecture; therefore a multi-paradigm approach has been proposed. The research work covers formal aspects of knowledge representation, including rules and uncertainty, the Dempster-Shafer theory, and the extended relational model. A disjunctive Datalog inference engine, to be extended and integrated into the system, provides the principal knowledge-based platform. Procedural knowledge representation, based on workflow specifications, is to extend the Workflow Management Coalition (WfMC) model with time modelling features and CPM (Critical Path Method) modelling capabilities. Such extensions provide enhanced support for knowledge management processes that are usually ill-suited to the WfMC-based process modelling approach. We propose an advanced graphic HCI interface to support the visualisation and manipulation of structural knowledge comprising semantic nets, UML relationships, and process graphs.
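The CPM capability named above rests on a standard forward and backward pass over a network of activities: the forward pass gives each activity's earliest start, the backward pass its latest start, and activities whose two values coincide (zero slack) form the critical path. The Python sketch below illustrates only that textbook computation, under assumptions: activities, durations and precedence constraints are passed as plain dictionaries (a hypothetical input format, not the ICONS or WfMC process model), and all durations share one time unit.

```python
from graphlib import TopologicalSorter


def critical_path(durations, predecessors):
    """Forward and backward pass of the Critical Path Method (CPM).

    durations:    activity -> duration (same time unit for all activities)
    predecessors: activity -> activities that must finish before it starts
    Returns (project_length, earliest_start, latest_start, critical_activities).
    """
    graph = {a: predecessors.get(a, []) for a in durations}
    # A topological order processes every activity after all of its predecessors.
    order = list(TopologicalSorter(graph).static_order())

    # Forward pass: an activity starts as soon as its last predecessor finishes.
    earliest = {}
    for a in order:
        earliest[a] = max((earliest[p] + durations[p] for p in graph[a]), default=0)
    length = max(earliest[a] + durations[a] for a in durations)

    # Successor lists are needed for the backward pass.
    successors = {a: [] for a in durations}
    for a, preds in graph.items():
        for p in preds:
            successors[p].append(a)

    # Backward pass: latest start that delays neither a successor nor the project end.
    latest = {}
    for a in reversed(order):
        latest_finish = min((latest[s] for s in successors[a]), default=length)
        latest[a] = latest_finish - durations[a]

    # Activities with zero slack (latest start == earliest start) are critical.
    critical = [a for a in order if latest[a] == earliest[a]]
    return length, earliest, latest, critical
```

For activities A (3), B (2), C (4) and D (1), where B and C follow A and D follows both, the sketch yields a project length of 8 with A, C and D critical; B has 2 units of slack. A time-aware workflow engine could use such values to detect activities whose delay would postpone the whole process.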



The knowledge-based capabilities are to be used in the development of intelligent content integration features supporting an open ICONS content repository. The ICONS content management functions are to integrate, under a single knowledge map, information resources stored internally and those stored in Web information sources, legacy information systems, and heterogeneous databases. A wrapper-based architecture is to provide the technological basis for content integration.
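In a wrapper-based integration architecture, each source is typically hidden behind a wrapper that translates its native access method and schema into a common representation, while a mediator forwards queries to the wrappers and merges their answers. The Python sketch below illustrates that general pattern only, under assumptions: the class names, the `ContentItem` fields and the sample sources are hypothetical and do not describe the actual ICONS interfaces or knowledge map.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class ContentItem:
    """Common (global-schema) view of a content item; the fields are illustrative."""
    title: str
    topic: str
    source: str


class Wrapper(ABC):
    """Hides the access method and local schema of one information source."""

    @abstractmethod
    def fetch(self, topic: str) -> list[ContentItem]:
        """Return the source's items on `topic`, translated into ContentItem."""


class TableWrapper(Wrapper):
    """Wraps rows of a hypothetical legacy table with columns (doc_name, subject)."""

    def __init__(self, rows):
        self.rows = rows

    def fetch(self, topic):
        return [ContentItem(title=name, topic=subject, source="legacy-db")
                for name, subject in self.rows if subject == topic]


class WebWrapper(Wrapper):
    """Wraps hypothetical pre-parsed web pages given as dictionaries."""

    def __init__(self, pages):
        self.pages = pages

    def fetch(self, topic):
        return [ContentItem(title=page["headline"], topic=topic, source=page["url"])
                for page in self.pages if topic in page["keywords"]]


class Mediator:
    """Forwards a query to every registered wrapper and merges the answers."""

    def __init__(self, wrappers):
        self.wrappers = list(wrappers)

    def query(self, topic):
        results = []
        for wrapper in self.wrappers:
            results.extend(wrapper.fetch(topic))
        return sorted(results, key=lambda item: item.title)
```

Adding a new kind of source then means writing one more `Wrapper` subclass; the mediator and the common representation stay unchanged, which is the main reason this pattern suits heterogeneous databases, legacy systems and Web sources alike.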

The key features of the ICONS workflow management platform are the dynamic workflow participant assignment functions, the dynamic control flow condition modification capabilities, and the time modelling features. Knowledge-based support for the workflow management engine is to be developed using the disjunctive Datalog inference engine module. Appropriate extensions to the WfMC model will be developed.
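Dynamic participant assignment means that the performer of an activity is resolved from the current organisational state when the activity instance becomes ready, instead of being fixed in the process definition. The Python sketch below illustrates the idea under stated assumptions: the `Participant` fields, the least-loaded selection policy and the optional `rule` predicate (a plain Python stand-in for the disjunctive Datalog rules mentioned above) are all hypothetical and are not the ICONS design.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Participant:
    name: str
    roles: set[str] = field(default_factory=set)
    available: bool = True
    workload: int = 0  # work items currently assigned to this participant


def assign(task_role: str,
           participants: list[Participant],
           rule: Optional[Callable[[Participant], bool]] = None) -> Optional[Participant]:
    """Choose the performer of a ready task at run time.

    task_role: role required by the activity definition.
    rule:      optional extra constraint (hypothetical hook for knowledge-based rules).
    Returns the chosen participant, or None if nobody qualifies.
    """
    candidates = [p for p in participants
                  if task_role in p.roles and p.available and (rule is None or rule(p))]
    if not candidates:
        return None
    # Least-loaded participant first; the name breaks ties deterministically.
    chosen = min(candidates, key=lambda p: (p.workload, p.name))
    chosen.workload += 1
    return chosen
```

Because the candidate set is recomputed on every call, changes in availability, workload or rules take effect for the next task instance without modifying the process definition.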

The ICONS distributed processing organisation, providing for both data and processing distribution, is to be based on the SDDS (Scalable Distributed Data Structures) approach, with appropriate extensions to meet the system requirements. Distributed processing will be enabled by load balancing algorithms embedded in the ICONS control functions. Workflow process distribution and interoperability are to be based on distributed workflow communication and synchronisation features to be developed for the ICONS prototype.
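SDDS schemes such as LH*, a distributed variant of linear hashing, let a file grow across servers by splitting one bucket at a time, so that a record's address can be computed without a central directory. The Python sketch below shows only the single-site linear hashing address computation and bucket split on which LH* builds; the class name, parameters and split trigger are illustrative assumptions, and the client images, request forwarding and image adjustment of the distributed scheme, as well as any ICONS extensions, are not modelled.

```python
class LinearHashFile:
    """Single-site linear hashing, the addressing scheme underlying LH*.

    In LH* each bucket would reside on a different server; here buckets are lists.
    Keys are non-negative integers.
    """

    def __init__(self, initial_buckets=1, capacity=4):
        self.N = initial_buckets   # number of buckets at level 0
        self.i = 0                 # file level
        self.n = 0                 # split pointer: next bucket to split
        self.capacity = capacity   # bucket size that triggers a split
        self.buckets = [[] for _ in range(initial_buckets)]

    def _h(self, level, key):
        # h_level(key) = key mod (N * 2^level)
        return key % (self.N * 2 ** level)

    def address(self, key):
        """a = h_i(key); if a < n then a = h_{i+1}(key)."""
        a = self._h(self.i, key)
        if a < self.n:
            a = self._h(self.i + 1, key)
        return a

    def insert(self, key):
        a = self.address(key)
        self.buckets[a].append(key)
        if len(self.buckets[a]) > self.capacity:
            self._split()

    def _split(self):
        # Split bucket n (not necessarily the overflowing one) using h_{i+1}:
        # its keys stay in bucket n or move to the new bucket n + N * 2^i.
        old = self.buckets[self.n]
        self.buckets.append([])
        self.buckets[self.n] = [k for k in old if self._h(self.i + 1, k) == self.n]
        self.buckets[-1] = [k for k in old if self._h(self.i + 1, k) != self.n]
        self.n += 1
        if self.n >= self.N * 2 ** self.i:
            # Every bucket of the current level has been split: move to the next level.
            self.i += 1
            self.n = 0

    def lookup(self, key):
        return key in self.buckets[self.address(key)]
```

Inserting keys 0 to 19 into a file that starts with one bucket of capacity 2 grows it one bucket at a time, while `address` keeps mapping every key to the bucket that holds it; in LH* the same computation runs on clients, with servers correcting clients whose view of `i` and `n` is out of date.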

ICONS capabilities are to be demonstrated by a knowledge management application, “The NAS Best Practices Portal”, to be developed by the project team. The application development cycle and techniques are to follow a KMS development methodology to be specified within the ICONS project. A preliminary analysis of the state of the art in KMS methodologies shows that, although a sound methodological basis exists in the software engineering area, no generally accepted approach exists in the knowledge management realm.

The conclusions of the report show that the proposed approach to the ICONS project research and development work is compatible with the stated project objectives. The ICONS project activities cover the following research and development areas: (i) knowledge representation techniques and methodologies for a multimedia content repository, (ii) advanced graphic user interface design and management tools, (iii) design and implementation of efficient algorithms for management of large, distributed multimedia content repositories, and (iv) an analysis and design methodology for large, knowledge-based content repository systems.



Table of contents
History of changes
Executive summary
Table of contents
List of figures
List of tables
1. Introduction
1.1 Objectives
1.2 Scope
1.3 Relations to other documents
1.4 Intended audience
1.5 Usage guidelines
1.6 Notation conventions
2. The ICONS Project Goal and Objectives
3. Feature Requirements of a Knowledge Management System
3.1 Knowledge Management: A Framework for User Requirements
3.2 The KMS Reference Architecture
3.2.1 Domain Ontology features
3.2.2 Content Repository features
3.2.3 Knowledge Dissemination features
3.2.4 Content Integration features
3.2.5 Actor Collaboration features
3.2.6 Knowledge Security features
4. Architecture of the Intelligent CONtent management System (ICONS)
4.1 The ICONS architecture specification
4.1.1 Development Technologies
4.1.2 Content Management Technologies
4.1.3 Knowledge Management Technologies
4.1.4 Human Computer Interaction Technologies
4.1.5 Distributed Architecture Technologies
4.2 The ICONS architecture vs. the KMS reference architecture
5. The ICONS Knowledge Representation Features
5.1 Requirements for Knowledge Management (KM)
5.2 Syntax/Semantics
5.3 Formal foundations of knowledge representation
5.3.1 Rules and uncertainty
5.3.2 Data Representation using Dempster-Shafer theory
5.3.3 Extended relational database model
5.3.4 Hyperrelations used for representing mined knowledge
5.3.5 Hyperrelations as knowledge representation
5.3.6 Metadata
5.3.7 Sharing data
5.4 Disjunctive Logic Programming
5.5 Procedural knowledge representation features
5.6 Knowledge representation and manipulation in the graphic user interface
6. The ICONS Intelligent Content Integration Features
6.1 The ICONS Global Knowledge Schema
6.2 The ICONS Content Repository
6.3 Integration of the heterogeneous content sources
7. The ICONS Intelligent Workflow Features
7.1 Dynamic workflow participant assignment
7.2 Dynamic control flow condition definition
7.3 Time management
7.4 Task scheduling
7.5 Extensions with respect to the WfMC's workflow process meta-model
8. The ICONS Distributed Processing Organisation
8.1 The ICONS scalable, distributed architecture
8.2 The ICONS distributed processing optimisation and load balancing
8.3 The ICONS distributed workflow process communication and synchronisation
9. Demonstration of ICONS prototype capabilities
9.1 The “Newly-associated States Best Practices” Portal
9.1.1 Introduction
9.1.2 Key Issues for Application Development
9.1.3 Key Success Factors
9.1.4 Remarks
9.2 The Knowledge Management System Design Methodology
9.2.1 Approaches to Knowledge Management methodologies
9.2.2 Requirements for defining a comprehensive KMS development methodology
9.2.3 The ICONS Development Methodology
10. Conclusions
10.1 Compatibility with the stated ICONS project goals and objectives
10.2 Overview of the ICONS project development plan
Appendix A. List of workpackages and deliverables
Workpackages
Deliverables list
Bibliography
External references
ICONS references
Dictionary



List of figures
Figure 1. The scope of KM activities in 423 corporations surveyed by KPMG [KPMG1999].
Figure 2. The Knowledge Life Cycle (KLC).
Figure 3. Four processes of knowledge conversion [Nonaka1995].
Figure 4. ICONS taxonomy of knowledge.
Figure 5. The Knowledge Management System reference architecture.
Figure 6. The ICONS architecture schematic model.
Figure 7. Treatment relation.
Figure 8. A hyperrelation.
Figure 9. Architecture of the GUI module.
Figure 10. ICONS GUI module with interfaces to databases.
Figure 11. A graph of objects.
Figure 12. The idea of the user basket.
Figure 13. Models of workflow co-operation.
Figure 14. Main Concept of ICONS portal for NAS Best Practice.
Figure 15. The Knowledge life cycle of the NAS Best Practices Portal.

List of tables
Table 1. Cross-reference between the KM processes and the KMS features.
Table 2. Feature roles within the knowledge management processes.
Table 3. Feature requirements of a Knowledge Management System.
Table 4. The ICONS focus technological area modules and the Domain Ontology features cross reference.
Table 5. The ICONS focus technological area modules and the Content Repository features cross reference.
Table 6. The ICONS focus technological area modules and the Knowledge Dissemination features cross reference.
Table 7. The ICONS focus technological area modules and the Content Integration features cross reference.
Table 8. The ICONS focus technological area modules and the Actor Collaboration features cross reference.
Table 9. Checklist of the acquis (chapters in Regular Reports).
Table 10. Overview of Phare.
Table 11. Best practice taxonomy.
Table 12. Key technological issues for development of the NAS Best Practices Portal.
Table 13. The ICONS project focus technological areas and the project objectives cross-reference.
Table 14. The ICONS focus technological area modules and the research stream workpackages.



1. Introduction

1.1 Objectives
The ICONS project presentation represents a refinement of the technical project specification contained in the ICONS project proposal and in the ensuing Work Description [ICONS CONTRACT] document, developed as the addendum to the research contract with the European Commission. It also reflects the commitments of the project partners represented in the Consortium Agreement.

The primary objective is to present the current ICONS consortium views on the scope and directions of the research and development work specified in the project work description, as well as on the methods and techniques to reach the stated project objectives. It is assumed that the project presentation document reconciles diverse approaches to attainment of the project objectives proposed by the project consortium partners, and harmonises the initial research work on standards, research and technological terms of reference of the ICONS project.

Although the preliminary ICONS architecture representing the functional scope of the project has been defined in the Work Description document [ICONS CONTRACT], a flexible approach is adopted to allow for changing views of the project team members, influenced by the ongoing research and development activities in the knowledge management field.

Hence, the ICONS Project Presentation is to evolve, under the constraints of the project change management procedure [ICONS D2], and will be published as new versions of the document. Each new version of the project presentation is to highlight the important changes with respect to the previous technical approach and scope of work. The principal project change management rule states that the scope of the project and the corresponding ICONS architecture may not be changed without the written consent of the ICONS Project Officer representing the European Commission.

1.2 Scope
The scope of this report covers the entire research and development work currently under way in the ICONS project.

1.3 Relations to other documents
This report provides a baseline specification of the principal directions of the research and development work to be carried out within the ICONS project. In this sense the report represents the consensus of the ICONS consortium members regarding the ICONS architecture and principal features, as well as the responsibilities and development tasks comprised in the project development plan.

All ensuing technical documents to be produced within the ICONS project should not contradict the design decisions and research assumptions comprised in this report. Should there arise a need to modify the underlying assumptions of the ICONS project development philosophy, appropriate changes will be applied to this report, to be published as the succeeding version.

1.4 Intended audience
The intended audience comprises all members of the ICONS project consortium as well as the representatives of the European Commission monitoring and evaluating the progress of the project research and development work.

1.5 Usage guidelines
The contents of the ICONS Project Presentation must be known to and evaluated by all members of the project team. Since the document represents the current consensus of the ICONS consortium, no significant deviations from the presented ICONS architecture and the principal technical directions, as represented in the current version of this document, are allowed.

1.6 Notation conventions
No special notation conventions are used in this report.



2. The ICONS Project Goal and Objectives
Turning information into knowledge has been one of the principal goals of advanced information systems developed in all realms of the social and economic life of modern societies. Terms like “knowledge management”, “knowledge engineering” and “knowledge bases” have become ubiquitous in corporate board rooms as well as IT departments. Easy access to information, enabled by the explosion of Internet technologies, has created new problems related to the exponentially growing wealth of information sources flooding information system users. Many advanced information systems are focused on knowledge bases comprising large collections of facts, rules, and heuristics pertaining to a specific application domain. Such knowledge bases are typically divided into two principal parts, namely the content base, comprising repositories of multimedia information objects, and ontologies representing formal knowledge pertaining to the corresponding application domain.

Our goal is to develop a prototype of an Intelligent CONtent management System (ICONS) supporting uniform, knowledge-based access to distributed information resources available in the form of web pages, pre-existing heterogeneous databases (formatted, text, and multimedia), business process specifications and operational information, as well as legacy information processing systems.

The principal objectives of our research and development project are to obtain and present novel results in the areas of knowledge representation and inference, heterogeneous information integration, and user-friendly interfaces based on advanced information architecture techniques.

The overall approach of the ICONS project is to:
(a) provide effective methods for analysing and modelling,
(b) develop practical tools for exploiting and using,
(c) assess in a pilot system the usefulness of
...an intelligent content management system with advanced knowledge management capabilities integrating internal content repositories with external heterogeneous information sources.

To achieve these overall objectives, four streams of technical work comprising the above operational goals can be identified:

Objective 1: Development of knowledge representation techniques and methodologies for a multimedia content repository.
The following specific research problems must be addressed in order to develop the knowledge representation capabilities of ICONS:
(a) Application of semantic data models (UML) and deductive database mechanisms as the domain ontology specification tool.
(b) Extraction of knowledge embedded in XML documents and in the associated RDF specifications.
(c) Representation of knowledge embedded in the schemata of pre-existing heterogeneous databases and in legacy information processing system outputs.
(d) Design and implementation of an efficient, non-procedural content management framework providing content and knowledge model definition and query capabilities.
(e) Development of mechanisms for procedural knowledge definition and its further exploitation in the area of effective knowledge and business process management.

Results obtained in the above research areas will be embedded in the ICONS prototype and verified in the pilot application environment. The principal research approach is to create synergies by integrating known research results in novel configurations and contexts, as well as extending known results in order to meet the identified new requirements.

Objective 2: Development of user interface design and management tools meeting the requirements of the information architecture methodology.
The user interface requirements fall into three distinct areas, namely the user tool set and dialogue model, the content presentation model, and the graphical knowledge presentation and manipulation model. All of these presentation models must incorporate personalisation capabilities in order to enable dynamic adjustment to changing user preferences discerned from system usage patterns.

The information architecture methodologies and techniques are considered to be the prime requirements for the design and implementation of the ICONS user interface management functionality. The multi-disciplinary research involves the skills of industrial designers, psychologists, and computer scientists.

The ICONS prototype and pilot application work is to provide a realistic test-bed for the proposed user interface management techniques.

Objective 3: Design and implementation of efficient algorithms for management of large, distributed multimedia content repositories.
There are two dimensions of ICONS content distribution. The first pertains to the distribution of the system content repository, comprising the Content Base and the Ontology Base, and of the hierarchical storage management processes among the ICONS servers. The second concerns the integration of external information sources, such as pre-existing heterogeneous databases, legacy information processing systems, and web information resources. Distribution of the ICONS components among the system servers requires efficient load balancing algorithms that interoperate with the selective content and ontology replication mechanism. Research will also concentrate on adaptive data caching techniques and multi-criteria data distribution optimisation.
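The load balancing and replication algorithms are research topics here and are not specified. Purely as a hedged illustration of the kind of placement decision involved, the sketch below uses a simple greedy heuristic (largest item first, each copy on a currently least-loaded server). The function name `place_replicas`, the size-based load measure, and the default replication factor are assumptions, not the ICONS design.

```python
import heapq
from typing import Dict, List


def place_replicas(item_sizes: Dict[str, int], servers: List[str],
                   replicas: int = 2) -> Dict[str, List[str]]:
    """Greedy sketch: put each content item on the `replicas` least-loaded servers."""
    if not 0 < replicas <= len(servers):
        raise ValueError("replication factor must be between 1 and the number of servers")
    load = {server: 0 for server in servers}
    placement: Dict[str, List[str]] = {}
    # Largest items first, ties broken by item name so the result is deterministic.
    for item, size in sorted(item_sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        # nsmallest returns distinct servers, so no server stores two copies of one item.
        chosen = heapq.nsmallest(replicas, servers, key=lambda s: (load[s], s))
        for server in chosen:
            load[server] += size
        placement[item] = chosen
    return placement
```

For items of sizes 5, 3 and 2 spread over three servers with two replicas each, item `a` goes to `s1` and `s2`, while both smaller items are first placed on the still-empty `s3`. A realistic scheme would also weigh access frequency, network locality, and the selective ontology replication constraints.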

Integration of the external information resources is to be performed with the use of XML wrapper technology. Wrapper programs producing the required XML envelopes for extracted data are to be enriched with RDF specifications, resulting from extracting semantics from database schemata in the case of external databases, or from representing semantics in the case of legacy information processing system outputs. The wrapper programs will be generated in the form of Enterprise JavaBean modules comprising the necessary query statements.
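The wrappers themselves are to be Enterprise JavaBean modules; the short Python sketch below only illustrates the shape of an XML envelope enriched with an RDF-style description for one extracted database row. The element name `envelope`, the `ex:` namespace URI, and the function `wrap_row` are hypothetical; only the `rdf:` namespace URI is the standard W3C one.

```python
import xml.etree.ElementTree as ET
from typing import Dict

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX_NS = "http://example.org/icons/schema#"  # hypothetical ontology namespace

ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("ex", EX_NS)


def wrap_row(table: str, key: str, row: Dict[str, object]) -> str:
    """Wrap one extracted database row in an XML envelope carrying an RDF description.

    Column names are assumed to be valid XML element names.
    """
    envelope = ET.Element("envelope", {"source": table})
    desc = ET.SubElement(envelope, f"{{{RDF_NS}}}Description",
                         {f"{{{RDF_NS}}}about": f"{EX_NS}{table}/{key}"})
    # rdf:type links the row to the ontology class derived from the table schema.
    ET.SubElement(desc, f"{{{RDF_NS}}}type", {f"{{{RDF_NS}}}resource": f"{EX_NS}{table}"})
    for column, value in row.items():
        ET.SubElement(desc, f"{{{EX_NS}}}{column}").text = str(value)
    return ET.tostring(envelope, encoding="unicode")
```

A row from a hypothetical `Project` table becomes an `rdf:Description` whose `rdf:type` points at the ontology class derived from that table, with one property element per column.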

Objective 4: Develop an analysis and design methodology for large, knowledge-based content repository systems.
Multimedia content repositories with knowledge representation capabilities require a novel approach to analysis and design methodology. An application development life-cycle and the associated methods and techniques will be specified, and a pilot application of ICONS will be developed. The pilot application is to be the “Best practices of PHARE, SAPARD, and ISPA projects developed within the Newly Associated States” content repository, accessible on the Internet. The aim is to demonstrate the viability of the proposed methodology and to provide a starting point for this clearly needed knowledge source.

3. Feature Requirements of a Knowledge Management System

Our objective is to confront the contemporary requirements of the fast-growing knowledge management (KM) field with current views on knowledge management system (KMS) feature architectures, as well as with the existing IT technology pertaining to the KM realm.

3.1 Knowledge Management: A Framework for User Requirements

The knowledge management field has been growing dynamically, fuelled by the intensification of global competition in all principal areas of the world economy. The state of the KM field at the turn of the century is illustrated by a study of 423 corporations performed by KPMG [KPMG1999]. The scope of KM activities in the study sample is presented in Figure 1.

[Figure 1: pie chart of KM status. Categories: KM is currently in operation; KM is currently being implemented (29%); KM is currently considered; KM is not currently planned; KM has been abolished. Remaining segment values shown: 34%, 17%, 19%, 1%.]

Figure 1. The scope of KM activities in 423 corporations surveyed by KPMG [KPMG1999].

High interest in the field was evident at the time of the study (80% of corporations were at some stage of KM activity), and, judging by the increasing number of trade conferences and exhibitions pertaining to the KM field, the discipline has reached maturity. The principal questions from our point of view, to be discussed in this section, are (i) what is the role of IT as the enabling technology, and (ii) what extension of the currently available information management platforms is required in order to meet the growing requirements of the KM field?

The second question has been the root of the ICONS project proposal, so the proper identification of the added value for the KM field emerging from the project is of paramount importance to the project consortium. A critical appraisal of the state of the art in the content management system area, whose vendors widely claim to provide direct support for KM, should provide the initial vantage point for evaluating the ICONS project contribution. We commence with a brief overview of the requirements of the KM field identified in a number of research studies performed within the European KM Forum [KMForum2001]. We also consider the views of the US knowledge management research community, as expressed in research papers representing the current views of the Knowledge Management Consortium International (KMCI) [Firestone2000, McElroy1999] and in work focusing on KM research and practice in the USA [Garvin1993, Quinn1996, Baek1999, Becker1999, Coleman1999, Davenport1999, Huntington1999].

A common fallacy on the IT side of the KM scene is to focus on a purely technological view of the field, with a tendency to highlight features that are already available in advanced content management systems. Such systems are commonly referred to as corporate portal platforms or, more to the point, as knowledge portal platforms. From the KM perspective, as discussed in [McElroy1999], such claims may be justified only with respect to a narrow view of the field focusing on the distribution of existing knowledge throughout the organisation. This view, called by some authors “First Generation Knowledge Management (FGKM)” or “Supply-side KM”, provides a natural link to the realm of currently used content management techniques, such as groupware, information indexing and retrieval systems, knowledge repositories, data warehousing, document management, and imaging systems. We shall briefly refer to existing content management technologies in the ensuing sections of the report to show that, within this narrow view, the existing commercial technologies meet most of the user requirements.

With the growing maturity of the KM field, the emerging opinion is that IT support for accelerating the production of new knowledge is a much more attractive proposition from the point of view of gaining competitive advantage. Such a focus, exemplified in the stated feature requirements for so-called “Second Generation Knowledge Management (SGKM)”, is on enhancing the conditions in which innovation and creativity naturally occur. This does not mean that FGKM features such as system support for knowledge preservation and sharing are to be ignored. A host of new KM concepts, such as the knowledge life cycle, knowledge processes, organisational learning, and complex adaptive systems (CAS), provide the underlying conceptual base for SGKM, thus challenging the architects of the new generation of Knowledge Management Systems (KMS).

The Knowledge Life Cycle (KLC), developed within KMCI-sponsored research [Firestone2000], provides us with a high-level abstraction of feature requirements to be used as the starting point for evaluation of the ICONS architecture. The KLC as proposed by KMCI is presented in Figure 2.

[Figure 2: Knowledge Production yields Knowledge Claims, which pass through Knowledge Validation and Knowledge Integration into Organizational Knowledge, with an experiential feedback loop.
Knowledge Production: individual and group interaction; data/info acquisition; new knowledge claims; initial knowledge codification.
Knowledge Validation: knowledge claim peer review; application of validation criteria; weighting of value in practice; formal knowledge codification.
Knowledge Integration: knowledge sharing and transfer; teaching and training; operationalizing new knowledge; production of knowledge artifacts.]

Figure 2. The Knowledge Life Cycle (KLC).

The concepts underlying the KLC model of knowledge management comprise the notion of a Natural Knowledge Management System (NKMS), defined in [Firestone2000] as “the on-going, conceptually distinct, persistent, adaptive interaction among intelligent agents:
(a) whose interaction properties are not determined by design, but instead emerge from the dynamics of the enterprise interaction process itself,
(b) that produces, maintains, and enhances the knowledge base produced by the interaction”.

The above definition of the knowledge management system fits the notion of a complex adaptive system (CAS), defined as “a goal-directed open system attempting to fit itself to its environment and composed of interacting adaptive agents described in terms of rules applicable with respect to some specified class of environmental inputs” [Holland1995].

In order to keep compatibility with our project terminology, we shall distinguish two classes of actors interacting within the KM environment: human beings, called employees or knowledge workers, and knowledge-based computer programs, called intelligent agents. A thorough discussion of intelligent agent technology may be found in [Baek1999], while a taxonomy of intelligent agent knowledge-based features is presented in [Huntington1999].

The Knowledge Base (KB) of the system is “the set of remembered data, validated propositions and models (along with metadata related to their testing), refuted propositions and models (along with metadata related to their refutation), metamodels, and (if the system produces such an artifact) software used for manipulating these, pertaining to the system and produced by it” [Firestone2000].
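To make the interplay of the KLC and this knowledge base definition concrete, here is a minimal sketch, assuming a knowledge claim moves from production through validation and that only validated claims may be integrated; refuted claims are retained together with their refutation metadata, as the definition requires. The class and method names are illustrative assumptions, and the experiential feedback loop is not modelled.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import List


class Stage(Enum):
    PRODUCED = auto()    # claim formulated and initially codified (knowledge production)
    VALIDATED = auto()   # passed the validation criteria (knowledge validation)
    REFUTED = auto()     # failed validation, kept with its refutation metadata
    INTEGRATED = auto()  # part of organisational knowledge (knowledge integration)


@dataclass
class KnowledgeClaim:
    text: str
    stage: Stage = Stage.PRODUCED
    metadata: List[str] = field(default_factory=list)  # notes on testing or refutation


@dataclass
class KnowledgeBase:
    claims: List[KnowledgeClaim] = field(default_factory=list)

    def produce(self, text: str) -> KnowledgeClaim:
        claim = KnowledgeClaim(text)
        self.claims.append(claim)
        return claim

    def validate(self, claim: KnowledgeClaim, passed: bool, note: str) -> None:
        if claim.stage is not Stage.PRODUCED:
            raise ValueError("only newly produced claims can be validated")
        claim.stage = Stage.VALIDATED if passed else Stage.REFUTED
        claim.metadata.append(note)

    def integrate(self, claim: KnowledgeClaim) -> None:
        # Only validated claims may become organisational knowledge.
        if claim.stage is not Stage.VALIDATED:
            raise ValueError("claim must be validated before integration")
        claim.stage = Stage.INTEGRATED

    def organisational_knowledge(self) -> List[str]:
        return [c.text for c in self.claims if c.stage is Stage.INTEGRATED]
```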



A knowledge base, not necessarily meant as an IT-related concept, constitutes the principal element of any knowledge management system and therefore requires more detailed consideration. There are emerging schools of thought that deviate in several important aspects from the popular definition of knowledge as “justified, true belief” [Goldman1991]. First of all, the knowledge base is to comprise justified knowledge, where justification is specific to the validation criteria used by the system (note that such validation criteria may vary from organisation to organisation). Furthermore, although the definition is consistent with the idea that individual knowledge is a particular kind of belief, the notion of belief extends beyond cognition alone to evaluation.

The concept of the learning organization, defined in [Garvin1993] as “an organization skilled at creating, acquiring, and transferring knowledge, and at modifying its behaviour to reflect new knowledge and insights”, provides an important context for the KMS feature analysis. Garvin introduces five main activities acting as the building blocks of a learning organization, namely: “systematic problem solving, experimentation with new approaches, learning from one’s own experience and past history, learning from experiences and best practices of others, transferring knowledge quickly and efficiently throughout the organization”.

Attributes of a learning organization that are important for the management of professional intellect have been identified in [Quinn1996]. The intellectual capital of an organization comprises such elements as: cognitive knowledge (know what), the basic mastery of a discipline that professionals achieve through extensive training and certification; advanced skills (know how), the ability to apply the rules of a discipline to complex real-world problems; systems understanding (know why), deep knowledge of the web of cause-and-effect relationships underlying a discipline; and self-motivated creativity (care why), the will, motivation and adaptability for success.

An important notion discriminating between content management systems and knowledge management systems is that of the domain ontology, defined in [Becker1999] as “an explicit conceptualization model comprising objects, their definitions, and relationships among objects”. A well-defined terminology, called a taxonomy [Letson2001], is used within a particular ontology to describe the classes of objects, their properties, and their relationships. Domain ontologies are important elements of knowledge management systems, quite similar to the conceptual schema of the database management model, and serve to organize the knowledge of an organization. Thus, the domain ontology management features of a knowledge management system pertain directly to the modelling of knowledge.

We concentrate on two distinct, but compatible, views pertaining to the modelling of knowledge, represented by the seminal work of Popper [Popper1971, Popper1977] and by the generally accepted views of Nonaka and Takeuchi [Nonaka1995]. These results relate directly to the KLC model, thus providing a base for the ensuing discussion of feature requirements for a knowledge management system.

Popper views the body of knowledge existing in an organisation as three distinct worlds, namely: (a) the first world (World 1), made of material entities: things, oceans, towns, etc.; (b) the second world (World 2), made of psychological objects and emergent predispositional attributes of intelligent systems: minds, cognitions, beliefs, perceptions, intentions, evaluations, emotions, etc.; (c) the third world (World 3), made of abstractions created by the second world acting upon first-world objects. This approach provides us with a two-tier view of knowledge:
1. Knowledge viewed as a belief is a second-world predispositional object. This pertains to situations where individuals, groups of individuals, and organizations hold beliefs (subjectively considered to be true) that are immediate precursors of their decisions and actions. Predispositional knowledge is “personal” in the sense that other individuals have no direct access to one’s own knowledge in full detail and therefore can neither “know it” as their own belief nor validate it.

2. Knowledge viewed as validated models, theories, arguments, descriptions, problem statements, etc., is a third-world linguistic object. One can talk about the truth, or nearness to the truth, of such knowledge, judged for these third-world objects in terms of being closer to the truth than the knowledge held by competitors. This kind of knowledge is not an immediate precursor of decisions and actions; rather, it impacts second-world beliefs, and these, in turn, impact the behaviour of the KMS actors. Such knowledge is objective in the sense that it is not agent-specific and is shared among agents. These characteristics bring to the forefront the issue of community validation of shared knowledge.

Looking at the above two distinct categories of knowledge, we may conclude that third-world knowledge is the principal product of a knowledge management system, whereas the knowledge of individuals in a social organisation is not produced by the system alone, although it may be strongly influenced by interaction with the objective knowledge represented by third-world abstractions.



The importance of the widely recognized distinction between tacit and explicit knowledge, first introduced by Polanyi [Polonyi1966], is emphasized by the work of Nonaka and Takeuchi [Nonaka1995]. The principal idea is that knowledge is created by interaction between tacit and explicit knowledge, as presented schematically in Figure 3.

Note that the above two knowledge base models are compatible, since the tacit vs. explicit knowledge distinction corresponds closely to Popper’s subjective (World 2) vs. objective (World 3) knowledge distinction.

Considering the knowledge categorisations and transformations from the organizational knowledge point of view, which constitutes the principal knowledge management perspective, we regard the following aspects of the model as crucial to the knowledge creation process:

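From \ To            Tacit knowledge    Explicit knowledge
Tacit knowledge      Socialisation      Externalisation
Explicit knowledge   Internalisation    Combination

Figure 3. Four processes of knowledge conversion [Nonaka1995].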

1. Transformation from tacit to explicit knowledge. The process corresponds to the externalisation transformation of Nonaka and Takeuchi and, in Popper’s model, to abstracting objective knowledge, i.e. transforming World 2 beliefs into World 3 objective knowledge. The process corresponds to knowledge claim formulation in the KLC. However, in view of the KLC model, knowledge claims do not constitute “objective knowledge” until they successfully pass the knowledge validation process. Only then do the validated knowledge claims become organisational knowledge, after having been formalised and edited in the knowledge integration process of the KLC.

2. Transformation from tacit to tacit knowledge. The process corresponds to the socialisation transformation of Nonaka and Takeuchi, as well as to the sharing of “personal” knowledge through the intelligent agent interactions implied in Popper’s approach. Although the process does not create “new” organisational knowledge, it may be crucial to maintaining and enhancing the competitive advantage of many creative organisations (e.g. a software company). This transformation fits into the knowledge production process of the KLC.

3. Transformation from explicit to tacit knowledge. The process corresponds to the internalisation transformation of Nonaka and Takeuchi and to the “impact” of objective knowledge on World 2 beliefs, and consequently on the organizational decision-making process, presented in Popper’s model. This transformation closely matches the knowledge operationalization step of the knowledge integration process of the KLC. Although no new knowledge is produced at this stage, the transformation may be very important for highly innovative organizations.

We do not consider the combination of explicit knowledge to be relevant to knowledge management, since either externalised knowledge is processed mechanically through some mechanism of information categorisation, or an intelligent agent must be involved in inferring new knowledge from a combination of externalised knowledge artifacts. In the latter case, other transformations, namely the internalisation-externalisation path, would have to be followed.
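The correspondences drawn in the list above can be summarised as a lookup table. This only restates the text as data; the `Form` enum and the `classify` function are hypothetical names, and the combination entry carries no KLC process because it is not considered relevant here.

```python
from enum import Enum
from typing import Optional, Tuple


class Form(Enum):
    TACIT = "tacit"
    EXPLICIT = "explicit"


# (from, to) -> (Nonaka-Takeuchi process, KLC process it is related to in the text)
CONVERSIONS = {
    (Form.TACIT, Form.EXPLICIT): ("externalisation", "knowledge production (claim formulation)"),
    (Form.TACIT, Form.TACIT): ("socialisation", "knowledge production"),
    (Form.EXPLICIT, Form.TACIT): ("internalisation", "knowledge integration (operationalisation)"),
    (Form.EXPLICIT, Form.EXPLICIT): ("combination", None),  # not considered relevant to KM here
}


def classify(source: Form, target: Form) -> Tuple[str, Optional[str]]:
    """Name the conversion process and its KLC counterpart, if any."""
    return CONVERSIONS[(source, target)]
```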

A distinction must be made at this stage between knowledge management, dealing with the above classes of structural and procedural knowledge, and information derived from information systems supporting the daily operation of an organisation. For the sake of our KMS feature requirement analysis, the data and results of such information systems are considered to be representations of Popper’s World 1 entities and their relationships and are, therefore, considered merely objects of the KMS actors’ activities and decisions. A similar view is taken with respect to ad hoc or unstructured business processes whose flows are determined by the subjective knowledge of an intelligent agent rather than by a validated artifact of objective knowledge. An artifact of objective procedural knowledge may be, for example, a formal workflow definition controlling the execution of all processes belonging to a given class.
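As a hedged illustration of such an artifact of objective procedural knowledge, the sketch below treats a workflow definition as a directed graph of allowed activity transitions and checks whether an executed process instance conforms to it. The activity names (a claim review process) and the `conforms` function are hypothetical; real workflow definitions would also carry roles, data, and routing conditions.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical process class: each activity maps to the activities allowed to follow it.
REVIEW_WORKFLOW: Dict[str, Set[str]] = {
    "submit_claim": {"peer_review"},
    "peer_review": {"revise", "accept", "reject"},
    "revise": {"peer_review"},
    "accept": set(),
    "reject": set(),
}


def conforms(definition: Dict[str, Set[str]], trace: List[str],
             start: str, ends: Iterable[str]) -> bool:
    """Check that an executed process instance follows the formal workflow definition."""
    if not trace or trace[0] != start or trace[-1] not in set(ends):
        return False
    return all(nxt in definition.get(cur, set()) for cur, nxt in zip(trace, trace[1:]))
```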

The above discussion sets the stage for an analysis of the principal feature requirements pertaining to the distinct knowledge management processes of the KLC and to the characteristics of the knowledge transformations underlying the knowledge production process.

Note that the KMS features are technological categories providing a taxonomy for user functions, viewed collectively as the KMS architecture; as such, they should be discussed in the context of the knowledge management processes present in the KLC. We relate the KMS features to the knowledge management processes in Table 1.

KMS feature \ KLC process      Knowledge Production (KP)   Knowledge Validation (KV)   Knowledge Integration (KI)
Domain Ontology (DO)           DO-KP                       DO-KV                       DO-KI
Content Repository (CR)        CR-KP                       CR-KV                       -
Knowledge Dissemination (KD)   KD-KP                       KD-KV                       KD-KI
Content Integration (CI)       CI-KP                       CI-KV                       -
Knowledge Security (KS)        -                           -                           KS-KI
Actor Collaboration (AC)       AC-KP                       AC-KV                       AC-KI

Table 1. Cross-reference between the KM processes and the KMS features.
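Table 1 can also be held as data, which makes it easy to ask which features support a given KLC process. The sketch below encodes the table as printed (Table 2 additionally lists a CI-KI role); the dictionary and function names are illustrative only.

```python
from typing import Dict, List, Set

# Table 1 as printed: KMS feature -> KLC processes in which the feature plays a role.
FEATURE_ROLES: Dict[str, Set[str]] = {
    "DO": {"KP", "KV", "KI"},  # Domain Ontology
    "CR": {"KP", "KV"},        # Content Repository
    "KD": {"KP", "KV", "KI"},  # Knowledge Dissemination
    "CI": {"KP", "KV"},        # Content Integration
    "KS": {"KI"},              # Knowledge Security
    "AC": {"KP", "KV", "KI"},  # Actor Collaboration
}


def features_for(process: str) -> Set[str]:
    """Features that support a given KLC process (one column of Table 1)."""
    return {f for f, procs in FEATURE_ROLES.items() if process in procs}


def role_codes() -> List[str]:
    """All feature-role codes such as 'DO-KP', as used in Table 2."""
    return sorted(f"{f}-{p}" for f, procs in FEATURE_ROLES.items() for p in procs)
```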

The user functions clustered in the principal KMS features may play varying support roles within the knowledge management processes. Collectively, the sum of user requirements for a given principal feature, defined within the distinct knowledge management processes, represents the user requirement set for that principal KMS feature. We discuss the support role semantics corresponding to the principal KMS features in Table 2. The principal KMS features serve as the basic building blocks for the reference KMS architecture presented in the ensuing section.

Feature role   Feature role semantics

DO-KP: The domain ontology functionality supports:
1. The externalisation transformation, by providing the KMS actor with the means for initial knowledge codification during the formulation of knowledge claims. Codification is performed on both declarative and procedural knowledge.
2. Referencing the content artifacts that provide supporting evidence or the fact base for knowledge inference. The reference information provides a knowledge map serving as the principal access path to the content repository.

DO-KV: The domain ontology functionality supports:
1. The formal knowledge codification pertaining to the validated knowledge claims.
2. The formal specification of the models and rules supporting the knowledge claim screening and validation activities, in particular those involving complex networks of experts.

DO-KI: The domain ontology functionality supports:
1. The internalisation transformation, by providing means to interpret and learn from objective knowledge as well as to find references to supporting evidence exemplified in the real-world cases contained in the content repository.
2. The socialisation transformation, by providing means to find references to peer expertise and work results, including the formulation of knowledge claims, thus fostering interaction between the KMS actors.

CR-KP: The content repository comprises all content artifacts, actual and virtual, that support the daily operation of an organization. In this sense, the content repository provides the principal platform of information processing support for the activities of the knowledge worker (a KMS actor that uses and/or produces knowledge). The knowledge map, provided by the KMS domain ontology, defines the structure and scope of the content repository.

CR-KV: The content repository provides the body of supporting evidence as well as the documentation means for the knowledge claim validation activities. Information contained in the content repository may be used and processed during the normal activities of knowledge workers, and it may be the basis for new knowledge claim formulations.

CR-KI: N/A

KD-KP: The body of organisational knowledge, formally codified in the domain ontology and supported by information contained in the content repository, must be accessible to knowledge workers in order to influence their subjective beliefs and predispositions (tacit knowledge) and thus to impact their activities and decisions. The quality of system support for this process determines the efficiency of the knowledge externalization transformation, which is fundamental to the knowledge production process.

KD-KV: The knowledge claim validation process may depend heavily on the existing body of information accessible through the content repository, as well as on the already validated and integrated objective knowledge pertaining to the subject domain. The validation process typically involves complex and variable interactions among experts drawing upon declarative as well as procedural knowledge. As above, the quality of system support is of paramount importance to the efficiency of the validation process, which, additionally, must be supported by complex and flexible workflow procedures representing the procedural knowledge.

KD-KI: The dissemination functionality supports the principal facets of the knowledge integration process, namely knowledge sharing and transfer, as well as teaching and training. Both the codified objective knowledge and the supporting information must be made available.

CI-KP: Information represented in content artifacts may either be created and retained in the content repository or be derived from heterogeneous information sources, usually maintained by external information systems. The derived content artifacts may be stored in the repository, or they may be materialized on demand by appropriate interaction with the external source. The content integration functionality entails the selection and retrieval of structured and semi-structured information, homogenization into a common content model, and derivation of semantics into the domain ontology representations.

CI-KV: Same semantics as above.

CI-KI: Same semantics as above.

KS-KP: N/A

KS-KV: N/A

KS-KI: The organisational knowledge contained in the KMS, both in the form of codified objective knowledge artifacts and of supporting information artifacts, represents an important part of the intellectual capital. Hence system integrity and privacy must be maintained.

AC-KP: Interaction of knowledge workers is the basis of socialization processes. Interaction may be spontaneous, or it may result from a more or less formally specified and supported procedure. Automatic support for such interactions may vary from typical groupware functions, such as chat rooms and messaging, to advanced ontology-based workflow procedures. An important by-product of automatic support may be the possibility of capturing operational metrics characterising the knowledge production process.

AC-KV: Knowledge claim validation may entail interactions within a complex network of experts, both internal and external to the organisation, using a variety of information processing environments. As above, supporting expert interaction, possibly also involving intelligent agents, may be a critical success factor of the knowledge claim validation processes.

AC-KI: Production of the objective knowledge artifacts and of the supporting content, inherent in the knowledge integration process, may require well-defined editorial procedures. Such procedures may typically be supported by automatic workflow management functionality. The requirements may vary from simple groupware-like support to complex, ontology-based workflow management environments.

Table 2. Feature roles within the knowledge management processes.

Further analysis of the KMS feature requirements in the context of the knowledge life cycle, leading to the development of Use Case models [Rumbaugh1999] to be used for the design and validation of the ICONS architecture, is to be performed in the succeeding phases of the ICONS project. We believe that the above discussion provides sufficient user requirements context for the ensuing presentation of the KMS reference architecture. The reference architecture is to provide a beacon for the further unfolding of the research and development work of the ICONS project.

Within this document we use several types of knowledge. Figure 4 presents the ICONS knowledge taxonomy, while the Dictionary explains their meaning.

Figure 4. ICONS taxonomy of knowledge.

3.2 The KMS Reference Architecture

The European KM Forum [KMForum2001, KMForum2001_D11, KMForum2001_D11a, KMForum2001_D12] is an IST project with the goal of collecting current KM practices and creating an almost complete overview of the KM domain in Europe. The KMS reference architecture presented in Figure 5 has been developed on the basis of the current KM technologies discussed in the EKMF project reports, as well as on the KMS feature requirements identified in the preceding section.

[Figure 5: a Knowledge Management System decomposed into six feature sets (Domain Ontology, Content Repository, Knowledge Dissemination, Content Integration, KMS Actor Collaboration, Knowledge Security), each broken down into component features, largely those listed in Table 3.]

Figure 5. The Knowledge Management System reference architecture.

Table 3 presents the feature requirements of the KMS reference architecture in tabular form.

[Figure 4 terms: knowledge; declarative knowledge; procedural knowledge; knowledge maps; structural knowledge; knowledge-based reasoning.]



Feature requirements of a Knowledge Management System

Domain Ontology: semantic nets; conceptual trees; semantic data models; process graphs; hyper-text; knowledge-based reasoning; time modelling; taxonomies.
Content Repository: XML; RDF; file systems; version control; DBMS; HSM.
Knowledge Dissemination: push technology; content object repository; knowledge map graphs; full text; semantic data model nets; semantic nets; rendering.
Content Integration: files; data bases; business intelligence; web pages; legacy information systems; document management.
Actor Collaboration: message exchange; discussion forums; knowledge engineering; workflow management; Internet/Intranet; intelligent agents.
Security: encryption; access control; authentication; electronic signature.

Table 3. Feature requirements of a Knowledge Management System.

The KMS features, grouped into six principal feature sets, represent our current views pertaining to KM technology requirements. Some of the features are already common in advanced content management systems, referred to as corporate portal platforms; others are the subject of on-going KMS research efforts. We discuss each of the principal feature sets in more detail in order to define reference feature requirements for the ICONS architecture presented in the succeeding section.

3.2.1 Domain Ontology features

The Domain Ontology features pertain primarily to knowledge representation, including declarative knowledge representation features, such as taxonomies, conceptual trees, semantic nets, and semantic data models, as well as procedural knowledge representation features, exemplified by process graphs. Time modelling and knowledge-based reasoning features pertain to both declarative and procedural knowledge representations. Hyper-text links are considered a mechanism for creating ad hoc relationships between content artifacts contained in the repository.

Taxonomies

Taxonomies provide means to categorise information objects stored in the content repository. Categorisation classes may be arbitrary hierarchical structures grouping information objects selected by the class predicates. Class predicates are defined in the form of queries over information object property values, or as full-text queries comprising keywords and/or phrases. Categorisation classes are not necessarily disjoint.
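A minimal sketch of predicate-based categorisation, assuming information objects carry a property map and a text body. The names `InfoObject`, `Category`, `property_query`, and `fulltext_query` are hypothetical, full-text matching is reduced to case-insensitive substring search, and restricting each child class to its parent's members is a design assumption rather than something the text prescribes.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class InfoObject:
    oid: str
    properties: Dict[str, str]
    text: str = ""


Predicate = Callable[[InfoObject], bool]


def property_query(name: str, value: str) -> Predicate:
    """Class predicate over an information object property value."""
    return lambda o: o.properties.get(name) == value


def fulltext_query(*phrases: str) -> Predicate:
    """Class predicate matching any of the keywords or phrases (case-insensitive)."""
    return lambda o: any(p.lower() in o.text.lower() for p in phrases)


@dataclass
class Category:
    name: str
    predicate: Predicate
    children: List["Category"] = field(default_factory=list)


def categorise(node: Category, objects: List[InfoObject], path: str = "") -> Dict[str, List[str]]:
    """Map each category path to the ids of its members.

    Assumption: a child category only considers the members of its parent.
    Sibling categories may overlap, since classes need not be disjoint.
    """
    here = f"{path}/{node.name}"
    members = [o for o in objects if node.predicate(o)]
    result = {here: [o.oid for o in members]}
    for child in node.children:
        result.update(categorise(child, members, here))
    return result
```

With three hypothetical documents, one PHARE document about water supply lands in both the PHARE class and the Water class, showing that categorisation classes need not be disjoint.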

Dictionaries are a special class of taxonomies, also organized into hierarchical structures, which may comprise any number of categories, usually corresponding to the occurring values of an information object property (e.g. a name directory), with the maximum number of categories equal to the cardinality of the property value domain.
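Class predicates and dictionaries can be illustrated with a minimal Python sketch. It assumes content objects are plain property dictionaries and that a sub-class is tested only for members of its parent class; the class names, properties and the helpers `keyword_predicate` and `dictionary` are hypothetical, not part of any ICONS interface.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class TaxonomyClass:
    """A categorisation class: a named class predicate plus optional sub-classes."""
    name: str
    predicate: Callable[[dict], bool]
    children: list = field(default_factory=list)

    def classify(self, obj, path=()):
        """Yield the path of every class whose predicate selects `obj`.
        Sub-classes are only tested for members of their parent class, and
        classes are not disjoint, so one object may land in several of them."""
        if self.predicate(obj):
            here = path + (self.name,)
            yield "/".join(here)
            for child in self.children:
                yield from child.classify(obj, here)


def keyword_predicate(*keywords):
    """Full text class predicate: true if any (lower-case) keyword occurs in the text."""
    return lambda obj: any(k in obj.get("text", "").lower() for k in keywords)


def dictionary(objects, prop):
    """Dictionary taxonomy: one category per occurring value of `prop`, so there
    are never more categories than values in the property's domain."""
    categories = {}
    for obj in objects:
        categories.setdefault(obj[prop], []).append(obj["id"])
    return dict(sorted(categories.items()))


finance = TaxonomyClass("finance", lambda o: o.get("department") == "finance", [
    TaxonomyClass("invoices", keyword_predicate("invoice")),
    TaxonomyClass("large", lambda o: o.get("amount", 0) > 10000),
])
doc = {"id": "d1", "department": "finance", "amount": 25000,
       "text": "Invoice for server hardware"}
print(list(finance.classify(doc)))  # ['finance', 'finance/invoices', 'finance/large']

authors = dictionary([{"id": "d1", "author": "ann"}, {"id": "d2", "author": "bob"},
                      {"id": "d3", "author": "ann"}], "author")
print(authors)  # {'ann': ['d1', 'd3'], 'bob': ['d2']}
```

The same object falls into three overlapping classes, which is exactly the non-disjoint behaviour described above.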

Automatic categorisation of information objects may also be based on arbitrary functions defined on object property values and/or content, implemented as an analytical algorithm or as a knowledge-based reasoning function. In the latter case, an inference engine performs the actual categorisation of information objects. Analytical algorithms provide for automatic categorisation of formatted data objects and textual objects, as well as multimedia objects, such as audio, images and video frames.

Taxonomies provide a powerful navigation device for browsing content repositories, since they usually represent the intuitive semantics of the users' information requirements.

Conceptual trees

Conceptual trees are another categorisation device, used in conjunction with full text queries, which provide the means to define concepts on the basis of their hierarchical relationships with other concepts, keywords, and phrases. Conceptual trees usually also support relevance ranking of full text query results. This technique allows for easy extension of the domain ontology terminology with (usually abstract) concepts of arbitrarily rich semantics.
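Relevance ranking with a conceptual tree can be sketched as follows. The scoring scheme (weighted term hits plus weighted sub-concept scores) is one plausible choice assumed for illustration, not the method of any particular full text engine; the `Concept` structure, weights and sample texts are hypothetical.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


@dataclass
class Concept:
    """Conceptual tree node: a concept defined by weighted keywords/phrases
    and by weighted sub-concepts."""
    name: str
    terms: dict[str, float] = field(default_factory=dict)
    children: list[tuple[Concept, float]] = field(default_factory=list)

    def score(self, text: str) -> float:
        """Relevance of `text`: weighted term hits plus weighted sub-concept scores."""
        lowered = text.lower()
        total = 0.0
        for term, weight in self.terms.items():
            # count whole-word (or whole-phrase) occurrences of the term
            hits = len(re.findall(r"\b" + re.escape(term.lower()) + r"\b", lowered))
            total += weight * hits
        for child, weight in self.children:
            total += weight * child.score(text)
        return total


def rank(concept: Concept, documents: dict[str, str]) -> list[tuple[str, float]]:
    """Documents matching the concept, most relevant first."""
    scored = [(doc_id, concept.score(text)) for doc_id, text in documents.items()]
    return sorted((s for s in scored if s[1] > 0), key=lambda s: -s[1])


storage = Concept("storage", {"tape": 1.0, "optical disk": 1.0})
archiving = Concept("archiving", {"archive": 2.0}, [(storage, 0.5)])
docs = {"d1": "Archive the scans to an optical disk.",
        "d2": "Tape backup of the mail server.",
        "d3": "Meeting notes."}
ranking = rank(archiving, docs)
print(ranking)  # [('d1', 2.5), ('d2', 0.5)]
```

Note that "d2" never mentions "archive" yet is still retrieved through the abstract concept's sub-concept, which is how conceptual trees extend the query vocabulary.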

Semantic Nets

Semantic networks provide the means to represent binary 1:1 relationships, usually expressed as named arcs of a directed graph whose vertices are information objects belonging to any of the information object classes. Normally, the classes of the linked objects are determined by the semantics of the binary relationship denoted by the corresponding named arc. A simple example of a semantic net is a binary relation Descendants defined as a subset of the Cartesian product Persons × Persons.

Semantic nets may be constructed over an arbitrary number of information object classes and binary relationships.
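A semantic net can be sketched as a set of named, directed arcs between object instances. The `SemanticNet` class below is a hypothetical in-memory structure assumed for illustration; it stores the Descendants example and checks that the relation is indeed a subset of Persons × Persons.

```python
from collections import defaultdict
from itertools import product


class SemanticNet:
    """Named, directed arcs between object instances; each arc links one object to one object."""

    def __init__(self):
        self.classes = {}                 # object id -> information object class
        self._arcs = defaultdict(set)     # (source id, arc name) -> target ids

    def add_object(self, oid, cls):
        self.classes[oid] = cls

    def link(self, source, name, target):
        self._arcs[(source, name)].add(target)

    def follow(self, source, name):
        """Instances reachable from `source` over one arc named `name`."""
        return sorted(self._arcs.get((source, name), set()))

    def relation(self, name):
        """The binary relation denoted by arc name `name`, as a set of pairs."""
        return {(s, t) for (s, n), targets in self._arcs.items() if n == name
                for t in targets}

    def instances(self, cls):
        return {oid for oid, c in self.classes.items() if c == cls}


net = SemanticNet()
for person in ("ann", "bob", "cid"):
    net.add_object(person, "Person")
net.link("ann", "Descendants", "bob")
net.link("ann", "Descendants", "cid")
net.link("bob", "Descendants", "cid")

persons = net.instances("Person")
assert net.relation("Descendants") <= set(product(persons, persons))
print(net.follow("ann", "Descendants"))  # ['bob', 'cid']
```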

Semantic Data Models

The Unified Modelling Language (UML) [Rumbaugh1999] is the currently prevailing specification platform for semantic data models, allowing for the definition of structural as well as behavioural semantics. Class association diagrams provide easy-to-read, intuitive semantics closely matching the mental models of KMS users. To be useful, the UML-based knowledge representation must be supplemented with a navigation facility allowing the user to traverse the network of specified object associations and to view/retrieve the corresponding object sets.

Hyper-text links

Hyper-text links support the referential link semantics that may exist among information objects belonging to arbitrary object classes in the content repository. The ad hoc character of hyper-text links (usually no schema-level information about them exists) limits their usefulness as a knowledge representation feature. However, they are a useful annotation tool for expressing possibly transient referential relationships among information objects stored in the content repository.

Time modelling

Time represented in domain ontologies, as well as in the content repository, conveys important information. Time-valued properties may be important elements of search and automatic categorisation operations. Hence, formal representation of time is of paramount importance for knowledge descriptions and content characterization. Current problems stem from the lack of a standard representation of time instants and periods, and from incompatible time scales, granularities, and periodicity definitions. Precise rules must be established for the representation and treatment of the temporal properties to be included in a knowledge management system.
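Two of the problems named above, incompatible time scales and granularities, can be illustrated with Python's standard `datetime` module. The sketch assumes one possible set of rules (reject naive timestamps, normalise to UTC, represent periods as half-open intervals); the `Period` class and `truncate` helper are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Period:
    """A time period as a half-open interval [start, end), normalised to UTC."""
    start: datetime
    end: datetime

    def __post_init__(self):
        if self.start.tzinfo is None or self.end.tzinfo is None:
            raise ValueError("naive datetimes are ambiguous; attach a time zone")
        # Normalise both ends to one time scale (UTC) so periods from
        # different sources can be compared directly.
        object.__setattr__(self, "start", self.start.astimezone(timezone.utc))
        object.__setattr__(self, "end", self.end.astimezone(timezone.utc))
        if self.end <= self.start:
            raise ValueError("a period must have positive length")

    def overlaps(self, other):
        return self.start < other.end and other.start < self.end


def truncate(instant, granularity):
    """Coarsen an instant to 'day' or 'month' granularity on the UTC scale."""
    instant = instant.astimezone(timezone.utc)
    if granularity == "day":
        return instant.replace(hour=0, minute=0, second=0, microsecond=0)
    if granularity == "month":
        return instant.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    raise ValueError(f"unsupported granularity: {granularity}")


plus_one = timezone(timedelta(hours=1))
meeting = Period(datetime(2002, 4, 4, 9, 0, tzinfo=plus_one),
                 datetime(2002, 4, 4, 11, 0, tzinfo=plus_one))
review = Period(datetime(2002, 4, 4, 9, 30, tzinfo=timezone.utc),
                datetime(2002, 4, 4, 12, 0, tzinfo=timezone.utc))
print(meeting.start, meeting.overlaps(review))  # 2002-04-04 08:00:00+00:00 True
```

Granularity interacts with the time scale: 00:30 on 6 April at UTC+01:00 truncates to 5 April on the UTC scale, which is why such rules must be fixed explicitly.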

Time modelling is also an important element of procedural knowledge representation. CPM-like (Critical Path Method) techniques have been proposed for representing time constraints and for optimising process execution times in advanced workflow management systems.
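A minimal sketch of the Critical Path Method over a process graph is shown below, using the standard library's `graphlib.TopologicalSorter` (Python 3.9+). The activity names and durations are invented; activities with zero slack form the critical path that bounds the process execution time.

```python
from graphlib import TopologicalSorter


def critical_path(durations, predecessors):
    """Critical Path Method over an activity-on-node process graph.
    durations: activity -> duration; predecessors: activity -> set of activities
    that must finish first. Returns (project length, earliest starts, slack, critical)."""
    graph = {a: set(predecessors.get(a, ())) for a in durations}
    order = list(TopologicalSorter(graph).static_order())
    # Forward pass: an activity starts when its last predecessor finishes.
    earliest = {}
    for a in order:
        earliest[a] = max((earliest[p] + durations[p] for p in graph[a]), default=0)
    length = max(earliest[a] + durations[a] for a in durations)
    # Backward pass: an activity must finish before any successor's latest start.
    successors = {a: [] for a in durations}
    for a, preds in graph.items():
        for p in preds:
            successors[p].append(a)
    latest_finish = {}
    for a in reversed(order):
        latest_finish[a] = min((latest_finish[s] - durations[s] for s in successors[a]),
                               default=length)
    slack = {a: latest_finish[a] - durations[a] - earliest[a] for a in durations}
    critical = [a for a in order if slack[a] == 0]
    return length, earliest, slack, critical


durations = {"draft": 3, "review": 2, "legal check": 4, "publish": 1}
predecessors = {"review": {"draft"}, "legal check": {"draft"},
                "publish": {"review", "legal check"}}
length, earliest, slack, critical = critical_path(durations, predecessors)
print(length, critical)  # 8 ['draft', 'legal check', 'publish']
```

The "review" activity has two units of slack, so delaying it by up to two units does not extend the process.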

Knowledge-based reasoning

Knowledge-based (k-b) reasoning systems may be built for a wide range of decision-making problems. The reasoning is based on a collection of facts, usually represented by content property values, and on heuristics represented as rules. The prevailing paradigms are production rules (forward and backward chaining), logic programming, and neural nets (for reasoning about quantitative data). K-b reasoning may be used for expert knowledge representation, for knowledge and content categorisation and distribution, and for intelligent agent implementation.
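Forward chaining over production rules can be sketched in a few lines: rules whose premises are all known facts fire repeatedly until nothing new can be derived. Facts are plain strings standing for content property values; the fact and rule names below are hypothetical.

```python
def forward_chain(facts, rules):
    """Naive forward chaining: fire every rule whose premises are all known,
    adding its conclusion, until a pass derives no new fact."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and premises <= known:
                known.add(conclusion)
                changed = True
    return known


# Each rule is (set of premises, conclusion).
rules = [
    ({"doc_type=invoice"}, "category=finance"),
    ({"category=finance", "amount>10000"}, "route_to=cfo"),
]
derived = forward_chain({"doc_type=invoice", "amount>10000"}, rules)
print(sorted(derived))
# ['amount>10000', 'category=finance', 'doc_type=invoice', 'route_to=cfo']
```

The second rule fires only after the first has categorised the object, illustrating how categorisation and distribution decisions can be chained.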

Intelligent workflow management is a new application area for k-b reasoning, both for process routing and for dynamic role modification.

Process graphs

Business processes are usually represented by process graphs, typically Event-Condition Petri Nets or directed graphs. The Petri net representation allows for expressing richer process semantics, in particular the pre- and post-conditions of process activities. The process specification must also be supplemented by a set of role definitions, one for each process activity, to enable the workflow management engine to properly assign tasks to KMS actors. The process graph representation should comprise a set of process metrics and, possibly, performance constraints and exception conditions.
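The way Petri nets capture pre- and post-conditions can be sketched with a basic place/transition net, where places hold tokens and an activity is enabled only when all its pre-condition places are marked. This is a plain P/T net, not a full Event-Condition variant; the `PetriNet` class and the activity names are hypothetical.

```python
from collections import Counter


class PetriNet:
    """Place/transition net: firing an activity consumes one token from each
    pre-condition place and puts one token into each post-condition place."""

    def __init__(self, marking):
        self.marking = Counter(marking)   # place -> number of tokens
        self.activities = {}              # activity -> (pre places, post places)

    def add_activity(self, name, pre, post):
        self.activities[name] = (Counter(pre), Counter(post))

    def enabled(self):
        """Activities whose pre-conditions are all satisfied by the marking."""
        return sorted(a for a, (pre, _) in self.activities.items()
                      if all(self.marking[p] >= n for p, n in pre.items()))

    def fire(self, name):
        if name not in self.enabled():
            raise RuntimeError(f"activity {name!r} is not enabled")
        pre, post = self.activities[name]
        self.marking.subtract(pre)
        self.marking.update(post)


net = PetriNet({"submitted": 1})
net.add_activity("review", pre=["submitted"], post=["reviewed"])
net.add_activity("approve", pre=["reviewed"], post=["approved"])
print(net.enabled())   # ['review']
net.fire("review")
net.fire("approve")
print(net.marking["approved"])  # 1
```

A workflow engine built on this model would attach a role definition to each activity and offer enabled activities as tasks to that role's members.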



3.2.2 Content Repository features

Extensible Markup Language (XML)

A simplified, tag-oriented meta-language derived from the SGML standard and adapted to the Web, which provides facilities to describe and disseminate structured documents over the Internet. XML is also the emerging industry standard for exchanging data between information systems, as well as for storing and retrieving complex multimedia objects in content repositories.

Resource Description Framework (RDF)

A framework, usually serialised in XML, used to describe resources and to define complex relationships between documents or data. It is popular as the target data structure for mapping UML semantics into content repository data models. RDF Schema is used as a template to define annotations in RDF syntax.

File Systems

File systems are commonly used in multimedia content repositories as containers for large content objects represented as files. The use of file systems is a convenient technique for mapping content onto diverse hardware storage devices in order to exploit their inherent characteristics; e.g., an optical storage device may be used for permanent, non-modifiable storage of electronic documents. File systems are composed into storage hierarchies, usually controlled by the content repository management software.

Hierarchical Storage Management

The hierarchical storage management (HSM) functions control the allocation of storage space available in a hierarchy of storage devices to large content object files. Such systems are based on a directory of all content objects, including information pertaining to storage allocation rules and migration predicates. Content objects are automatically migrated up and down the storage hierarchy, where the top layer is the object-relational database management system and the bottom layer may be an optical storage jukebox or a mass storage tape system. Migration predicates usually determine content object residence time at any given storage hierarchy level and serve to fire the storage allocation rules controlling the file migration operations.
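Migration predicates based on residence time can be sketched as follows. The level names, the idle-day thresholds and the `ContentFile` record are illustrative assumptions, not values or structures from any HSM product; a real HSM would also move the file data and update its directory.

```python
from dataclasses import dataclass

# Storage hierarchy from top (fastest) to bottom (slowest); illustrative only.
LEVELS = ["database", "disk", "optical", "tape"]

# Migration predicates: maximum idle days a file may reside at a level before it
# is moved one level down. The thresholds are assumptions, not ICONS values.
MAX_IDLE_DAYS = {"database": 7, "disk": 90, "optical": 365}


@dataclass
class ContentFile:
    name: str
    level: str
    idle_days: int = 0   # days since last access at the current level


def migrate(files):
    """Fire the allocation rule for every file whose migration predicate holds."""
    moved = []
    for f in files:
        limit = MAX_IDLE_DAYS.get(f.level)       # bottom level has no predicate
        if limit is not None and f.idle_days > limit:
            f.level = LEVELS[LEVELS.index(f.level) + 1]
            f.idle_days = 0
            moved.append((f.name, f.level))
    return moved


def recall(f):
    """On access, bring the file back to the top of the hierarchy."""
    f.level, f.idle_days = LEVELS[0], 0


files = [ContentFile("scan1.tif", "database", 10),
         ContentFile("scan2.tif", "disk", 30),
         ContentFile("old.pdf", "tape", 1000)]
print(migrate(files))  # [('scan1.tif', 'disk')]
```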

Database Management System (DBMS)

Object-relational database management systems serve as an implementation platform for the domain ontology management functions and the content management functions. Solution architectures vary, yet a typical use would be to store all KMS directories and control blocks, to represent the domain ontology data model, and to store content object files and attributes.

Main memory relational database management systems may also be used to store frequently used ontology structures, as well as to provide a platform for the data structures that represent facts in knowledge-based reasoning algorithms.

Version control

Content evolves over time. In some cases the history of content changes is as important as the content itself. The versioning mechanism allows for transparent identification (an incremental revision number) and storage (either full versions or increments) of particular versions of content and content object properties. Access schemes addressing multiuser access problems are a neighbouring subject.
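Storing increments rather than full versions can be sketched with the standard library's `difflib.SequenceMatcher.get_opcodes`. The sketch keeps revision 1 in full and each later revision as a line-level edit script against its predecessor; the `VersionedContent` class and its delta format are hypothetical.

```python
from difflib import SequenceMatcher


class VersionedContent:
    """Keeps revision 1 in full and every later revision as an increment
    (line-level edit script) against its predecessor."""

    def __init__(self, text):
        self._base = text.splitlines(keepends=True)
        self._deltas = []   # self._deltas[n] turns revision n+1 into revision n+2

    @property
    def revision(self):
        return 1 + len(self._deltas)

    def commit(self, text):
        old = self.get(self.revision).splitlines(keepends=True)
        new = text.splitlines(keepends=True)
        delta = []
        for tag, i1, i2, j1, j2 in SequenceMatcher(None, old, new).get_opcodes():
            if tag == "equal":
                delta.append(("copy", i1, i2))     # reuse lines of the predecessor
            elif tag in ("replace", "insert"):
                delta.append(("add", new[j1:j2]))  # store only the new lines
            # "delete": nothing to store; the old lines are simply not copied
        self._deltas.append(delta)
        return self.revision

    def get(self, revision):
        """Rebuild a revision by replaying increments from the base version."""
        lines = self._base
        for delta in self._deltas[: revision - 1]:
            rebuilt = []
            for op in delta:
                if op[0] == "copy":
                    rebuilt.extend(lines[op[1]:op[2]])
                else:
                    rebuilt.extend(op[1])
            lines = rebuilt
        return "".join(lines)


doc = VersionedContent("title\nbody\n")
doc.commit("title\nbody v2\nfooter\n")
doc.commit("title\nfooter\n")
print(doc.revision)  # 3
```

Replaying every increment makes old revisions cheap to store but slower to rebuild; periodically storing a full version is the usual compromise.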

Rendering

Content is held within the repository in a variety of native formats, so it can also be viewed or edited in the tool that originally created it. However, a uniform web-based browser interface requires rendering that presents all of these formats in a consistent way. Renditions include HTML and XML, as well as PDF and other well-known formats.

3.2.3 Knowledge Dissemination features

Push Technology

Push technologies, which automatically supply selected content objects to a predefined group of recipients (a role), usually KMS actors (knowledge workers, intelligent agents), are the best approach to combating the information glut. Push technologies are strongly correlated with such knowledge representation features as automatic content categorisation and knowledge-based reasoning.
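The link between push delivery, roles and categorisation can be sketched as a subscription table: roles subscribe to categories, and a newly categorised object is pushed once to every actor playing a subscribed role. The `PushService` class, role names and categories are hypothetical.

```python
from collections import defaultdict


class PushService:
    """Pushes each newly categorised content object to all members of the roles
    subscribed to any of its categories; each actor receives it once."""

    def __init__(self):
        self.subscriptions = defaultdict(set)  # category -> subscribed roles
        self.members = defaultdict(set)        # role -> actors playing the role
        self.inbox = defaultdict(list)         # actor -> delivered object ids

    def assign(self, actor, role):
        self.members[role].add(actor)

    def subscribe(self, role, category):
        self.subscriptions[category].add(role)

    def publish(self, object_id, categories):
        recipients = set()
        for category in categories:
            for role in self.subscriptions.get(category, ()):
                recipients |= self.members.get(role, set())
        for actor in sorted(recipients):
            self.inbox[actor].append(object_id)
        return sorted(recipients)


push = PushService()
for actor, role in [("eve", "analysts"), ("joe", "analysts"),
                    ("joe", "legal"), ("kim", "legal")]:
    push.assign(actor, role)
push.subscribe("analysts", "finance")
push.subscribe("legal", "contracts")
print(push.publish("doc-7", ["finance", "contracts"]))  # ['eve', 'joe', 'kim']
```

Here the categories would come from the automatic categorisation step, so the quality of push delivery depends directly on categorisation quality.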



Content Object Properties

Content object properties characterize the principal attributes of an object, such as its identifier, origin, author(s), date, etc., and also provide information, usually in the form of keywords, characterizing the content. The latter type of property is usually obtained at object creation (storage) time through automatic content analysis and categorisation, or through a manual content object description process (e.g. the description of an ancient manuscript image). Either way, content object properties provide a convenient access path for content repository queries, taxonomy structure allocations, and the materialisation of content object relationships.

Full Text

Full text indexing and retrieval is a classical approach to content management. Full text retrieval techniques, used in conjunction with conceptual trees, are commonly used in automatic categorisation features. Content object property values are often obtained automatically through a categorisation process based on full text search.
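The core data structure behind full text indexing is the inverted index, mapping each term to the objects that contain it. The minimal sketch below supports only boolean AND queries, without stemming, stop words or ranking; the `InvertedIndex` class is hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case word tokens."""
    return re.findall(r"\w+", text.lower())


class InvertedIndex:
    """Minimal full text index: term -> set of ids of objects containing it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, object_id, text):
        for term in tokenize(text):
            self.postings[term].add(object_id)

    def search(self, query):
        """Ids of objects containing every query term (boolean AND)."""
        terms = tokenize(query)
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


index = InvertedIndex()
index.add("d1", "Optical storage of scanned invoices")
index.add("d2", "Invoices approved in April")
print(sorted(index.search("invoices")))          # ['d1', 'd2']
print(sorted(index.search("optical invoices")))  # ['d1']
```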

Knowledge Map Graphs

Multi-level taxonomy trees, semantic nets and content object associations are usually represented as graphs at the user interface level. This fits nicely with the user's mental model of the domain ontology structure and its relationships with the underlying content object model. Because of the substantial scope and complexity of knowledge maps, advanced graph construction and manipulation techniques must be employed to provide the required ergonomic level of the KMS user interface. Knowledge map graphs are used, usually in query mode, for navigation within semantically meaningful structures and for browsing the associated content.

Semantic Nets

Graphic representation of semantic nets (SN-graphs), although quite straightforward, must be supplemented by manipulation functions supporting traversal, SN-graph node visualisation/retrieval, and SN-graph selection (entry). SN-graphs representing a given semantic net class implementation may either be materialised dynamically or, usually in the case of complex association functions and large scope, be cached as persistent ontology structures. Transient storage and off-line semantic net materialisation techniques may be used to achieve the required KMS performance levels. Note that SN-graph navigation typically occurs at the content object instance level, where an SN-graph arc represents a 1:1 content object relationship.

Semantic Data Model Nets

SDM net graphs (SDM-graphs) are envisaged as a representation of the UML graphic conceptual model notation. Hence, content object classes will represent subsets of the corresponding content object instances, constrained by the class associations used for navigational selection. Navigation, list manipulation, visualisation/retrieval, and SDM structure entry functions are therefore necessary to exploit the rich semantic potential of navigation at the content object class level. Note that, as opposed to the SN-graph navigation presented above, SDM-graph navigation yields subsets of content object instances at each visit to a corresponding SDM-graph node. The only similarity is that SDM-graph selection is effected as the selection of an entry content object instance (e.g. a particular Person occurrence).
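The set-valued character of SDM-graph navigation can be sketched as follows: navigation starts from a single entry instance, but every hop along an association maps a set of instances to a set of instances. Associations are assumed here to be stored as sets of (source, target) pairs; the association names and data are hypothetical.

```python
def navigate(entry, path, associations):
    """SDM-graph navigation: start from one entry instance and follow a path of
    association names; each step maps the current set of instances to the set
    of all instances associated with any of them."""
    current = {entry}
    visits = []
    for name in path:
        pairs = associations[name]                 # set of (source, target) pairs
        current = {t for (s, t) in pairs if s in current}
        visits.append((name, sorted(current)))
    return visits


associations = {
    "authored": {("ann", "doc1"), ("ann", "doc2"), ("bob", "doc3")},
    "cites": {("doc1", "doc3"), ("doc2", "doc3"), ("doc2", "doc4")},
}
for node, instances in navigate("ann", ["authored", "cites"], associations):
    print(node, instances)
# authored ['doc1', 'doc2']
# cites ['doc3', 'doc4']
```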

3.2.4 Content Integration features

All entities participating in the content integration process, regardless of their character (structural, procedural), must be accessible via the knowledge map graph or via other existing access paths to the content repository. Any of the integrated content objects, constrained by the corresponding descriptions in the content repository schema, may either be physically stored in the repository as a content object (a refreshable snapshot) or be dynamically materialised at reference time. The use of these integration modes should be entirely transparent to the KMS user.

Files

Files are among the candidates for content integration, due to the widespread use of file systems as repositories of large multimedia content objects. Little or no analysis of multimedia object content, apart from the automatic categorisation analysis, is performed during the integration process.

Data Bases

Heterogeneous databases are a typical source of data for content integration. Multi-database query and integration techniques, as well as the homogenization of heterogeneous data models, are the underlying technologies. The most straightforward cases entail querying a single database to materialise the required content, to be further exploited in the KMS context either as an element of a content object stored in the repository or as a virtual content object materialised on the fly.



Business Intelligence Systems

Data warehouses and OLAP systems deliver relevant knowledge content that should be integrated into the KMS environment. Content generated by business intelligence systems (BIS) may be integrated into repositories as elements of content objects or may be delivered dynamically.

Legacy Information Systems

Similarly, legacy information systems are a source of content that may be relevant to KMS users. Selected legacy system reports may be made accessible as content objects, or as their elements, via the KMS content repository.

Intelligent Agents

Intelligent agent (IA) technology is a rapidly growing area of research and new application development. Applications of IA technologies in the KMS context are discussed in [Baek1999]. The definition of an intelligent agent proposed by IBM [IBM1995] states that an intelligent agent is “a software entity that carries out some set of operations on behalf of a user or another program with some degree of independence or autonomy, and in so doing, employs some knowledge or representation of the user’s goals or desires”.

IA technologies are clearly useful and applicable in the KMS context, serving two broad functions: that of a personal assistant and that of a communicating/collaborating agent. In both roles, intelligent agents are relevant as knowledge-based support for the content integration features.

Document Management Systems

Document management systems are a particular class of legacy information systems providing a rich content infrastructure directly relevant to KMS users. Electronic documents and image-based information are typically integrated into the KMS content repositories as principal factual knowledge artifacts. In some KMS architectures, the document management functionalities are subsumed by the KMS features.

Web Pages

Paradoxically, genuine knowledge is well hidden in the enormous volumes of data available on web pages. Therefore, ever more intelligent and flexible mechanisms must be developed in the area of external knowledge acquisition and, even more importantly, for keeping such knowledge up to date. Interoperability of systems and the ability to choose the best available content are of primary importance.

3.2.5 Actor Collaboration features

Message Exchange

Instant messaging, relevant to the socialisation process (tacit to tacit knowledge transformation), is an important vehicle supporting the knowledge production process. Hence, the KMS functionality should provide a platform for a semi-disciplined exchange of electronic messages that may subsequently be categorised and stored in the content repository. Some collaboration metrics, similar to the activity measures used in e-learning systems, may also be usefully applied to the management of the knowledge production process.

Discussion Forums

Discussion forums are the electronic equivalent of the water cooler or cafeteria discussions that have long been recognised as vital knowledge production activities. Again, relevant and valuable statements and comments should be categorised, stored in the content repository and measured (e.g. attributed to their originating sources).

Knowledge Engineering

Knowledge-based reasoning applications and intelligent agents require analytical support to glean expert knowledge from individuals (outstanding knowledge workers). The process of obtaining the expert knowledge required to build knowledge-based (or expert) applications, traditionally called knowledge engineering, requires specific methodologies and tools for formal knowledge representation. Such tools may coincide with the knowledge representation paradigms used, both for declarative and for procedural knowledge, within a specific KMS environment.

Workflow Management

Workflow management technology is an important platform supporting both the knowledge management processes of the KLC and the business processes of organizations. In the latter case, the application of workflow technology provides insight into the organization's operations, which is important feedback into the knowledge production process. In fact, it may be argued that, in organizations where knowledge management is an explicit management function, the KLC process may be considered to belong to the realm of business processes. We believe that keeping the above distinction may be advantageous in evaluating alternative KMS architectures viewed as the enabling platforms for KLC-driven knowledge management processes.

Distinct workflow management paradigms have been discussed in [Swenson2001, Eder2001, Stader2001]. It has been pointed out that production business processes, which today represent the principal realm of workflow management applications, have substantially different application requirements than knowledge worker (also called information worker) processes and project-oriented activities such as the development of a new product. In the two latter cases, which pertain directly to knowledge production processes, a workflow management paradigm substantially different from that of the Workflow Management Coalition [WfMC1994] is desirable. Indeed, it has been shown in [Stader2001] that an intelligent, ontology-based workflow management platform is required to support the development of complex new industrial products.

It is an open question to what degree the KMS workflow management processes should interact with the classical workflow management supporting the business processes of an organisation. It may very well be that, as in the case of document management technology, the diverse workflow management paradigms will be reconciled and consequently integrated into the KMS environment.

Internet/Intranet

Web technologies, already prevailing in advanced content management systems, are paramount to KMS architectures for several important reasons. First of all, the application of the web paradigm removes an important initial barrier between the user and the KMS functions (premise: all educated people use the Internet). Secondly, the cost of ownership, which is particularly high in large, distributed organizations deploying complex KMS architectures, may be kept under control. Since any useful KMS must constantly scout the Net for content resources to be integrated, as well as publish information relevant to the organization’s partners and customers, an Internet-oriented system architecture is a must.

3.2.6 Knowledge Security features

The relevance of knowledge security features is as obvious in the case of a KMS as in the case of any information system with an architecture open to the Internet. As a result, any practical KMS must integrate such security features as electronic signatures, encryption, access control and user authentication. Our research is not oriented towards adding value in this particular field; in fact, the use of security features is identical to that in other information systems. Hence, we shall not elaborate on the subject of knowledge security any further.



4. Architecture of the Intelligent CONtent management System (ICONS)

4.1 The ICONS architecture specification

The ICONS schematic architecture model is presented in Figure 6. Consistent with the ICONS project goal and objectives, we aim to develop a complete ICONS prototype to be demonstrated and verified in a realistic application environment. We propose to adopt an integration strategy combining existing modules, existing modules to be extended, and newly developed modules as building blocks for the ICONS architecture. Such an approach allows us to keep the ICONS project scope under control and to obtain research and development results that add value to the selected technological fields representing the project focus (marked with thick border lines).

Figure 6. The ICONS architecture schematic model.

The project technological areas are discussed in more detail below. We concentrate on the primary technological areas of the ICONS project, providing only cursory information on the secondary technological areas, where it represents our view of the project's technological environment prerequisites. We assume that our research efforts will concentrate on the ICONS modules planned to be developed from scratch, whereas the specification and development work will also comprise the planned extensions of the existing functional modules to be adopted.

4.1.1 Development Technologies

Development technologies comprise modules providing the basic functionalities and development tools required for web-oriented software development. All of the modules in this technological area are to be adopted “as is” into the ICONS project.



Since no budget has been planned for the acquisition of development software licences, preference will be given to “open source” software tools. A detailed specification of the technological requirements for the ICONS modules in the DT area will be provided in deliverable [ICONS D5].

4.1.2 Content Management Technologies

The premise of the ICONS project is not to develop solutions in technological areas where a mature commercial technology already exists. Such an approach allows us to plan realistically to achieve the project results on time and within budget. The detailed specification of technological prerequisites will be presented in deliverable [ICONS D5]. We present our current views on the technological requirements of the CM technological area in order to provide a complete overview of the ICONS architecture in this section.

One principal requirement, due to the necessity of developing extensions of the Content Management modules, is that all software must be available in source form and with an appropriate licence to modify it.

Content Repository Manager

The Content Repository Manager (CRM) provides an implementation platform for an XML-based, object-oriented content repository, controlled by an enhanced RDF schema and comprising complex XML objects with embedded multimedia objects. The structure of the repository objects is determined and controlled by the DTD statements comprised in the RDF schema. Objects respond to methods implemented in Java classes, each principal class corresponding to an XML object class. Object class inheritance is supported.

The embedded multimedia objects are stored as files, and their location is managed by the hierarchical storage management functions.

Content Semantic Model Manager

Selected fields of XML objects, as well as the contents of text-oriented multimedia object types, are used to construct auxiliary data structures comprising relational database tables, relational database indices, and full text search engine indices. These auxiliary structures support the representation of content semantics with such structural constructs as binary N:M relationships, N-ary N:M relationships with attributes, taxonomy hierarchical trees, and dictionaries. Operations on the auxiliary storage structures are available to application programmers creating new content repository objects through the CSMM application programmer interface (API). All structural semantic constructs are named and are used to reflect the application semantics to be implemented in the Content Repository.
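Deriving auxiliary relational structures from selected XML fields can be sketched with Python's standard `sqlite3` and `xml.etree.ElementTree` modules. The document element layout, the table names and the `authored_by` N:M relationship are invented for illustration; they are not the ICONS content schema or the CSMM API.

```python
import sqlite3
import xml.etree.ElementTree as ET


def build_auxiliary(xml_objects):
    """Copy selected fields of XML content objects into relational tables:
    one row per object plus an N:M relationship table (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE document (id TEXT PRIMARY KEY, title TEXT)")
    db.execute("CREATE TABLE authored_by (doc_id TEXT, person TEXT, "
               "PRIMARY KEY (doc_id, person))")
    db.execute("CREATE INDEX authored_by_person ON authored_by (person)")
    for xml in xml_objects:
        root = ET.fromstring(xml)
        doc_id = root.get("id")
        db.execute("INSERT INTO document VALUES (?, ?)", (doc_id, root.findtext("title")))
        for author in root.iter("author"):
            db.execute("INSERT INTO authored_by VALUES (?, ?)", (doc_id, author.text))
    db.commit()
    return db


def documents_by(db, person):
    """Property-based selection through the auxiliary N:M table."""
    rows = db.execute("SELECT d.id FROM document d "
                      "JOIN authored_by a ON a.doc_id = d.id "
                      "WHERE a.person = ? ORDER BY d.id", (person,))
    return [row[0] for row in rows]


objects = [
    '<document id="d1"><title>HSM design</title><author>ann</author><author>bob</author></document>',
    '<document id="d2"><title>Ontology notes</title><author>bob</author></document>',
]
db = build_auxiliary(objects)
print(documents_by(db, "bob"))  # ['d1', 'd2']
```

The XML objects remain the primary content; the tables are redundant, query-oriented copies of selected fields, as the CSMM description implies.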

The auxiliary data structures are also used to support property-based selection operations as well as full text search operations.

Workflow Manager

The Workflow Manager supports web-oriented business processes, providing standard access to task lists and process execution information via Internet browsers. The process semantics meet the Workflow Management Coalition [WfMC1994] requirements, with some enhancements in the area of dynamic role modification (roles are sets of potential candidates to execute a specified task within a business process).

Hierarchical Storage Manager

The Hierarchical Storage Manager (HSM) provides the functionality to manage the allocation of storage space, and the subsequent tracking functions, for the multimedia objects stored in the Content Repository. Hence, the Content Repository storage space extends from the object-relational database (objects stored as BLOBs), through an arbitrary path of file systems, to optical or tape mass storage devices. Object migration is performed automatically, triggered by pre-specified events, according to migration predicates defined by the Content Repository administrator.

External Content Integrator

The external content integration functions accept any schema-compliant XML input, as well as the results of predefined parametric queries and procedures developed as Content Manager applications. Such objects are called “integration objects” and are treated as first-class objects with respect to taxonomies and structural semantic constructs. Integration objects may be materialised and subsequently stored in the repository, usually in the form of a report file, or they may be created dynamically as transient objects in response to user-specified parameter values.



Role Manager

Roles are subsets of the Content Management System users, defined by common access rights and operation permissions, as well as by execution rights within specified workflow processes. Roles may be defined by role predicates or by enumeration, and they may be modified on the basis of the processing history.
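The two ways of defining a role, by predicate and by enumeration, can be sketched as below. A predicate-defined role is re-evaluated on each request, so its membership changes automatically as the processing history changes. The `User` attributes (notably `completed_tasks` as a history measure) and the role names are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class User:
    name: str
    department: str
    completed_tasks: int = 0   # a simple processing-history measure (assumption)


@dataclass
class Role:
    """A role given either by a role predicate or by enumeration of user names."""
    name: str
    predicate: Optional[Callable[[User], bool]] = None
    enumerated: set = field(default_factory=set)

    def members(self, users):
        if self.predicate is not None:
            return {u.name for u in users if self.predicate(u)}
        return {u.name for u in users if u.name in self.enumerated}


users = [User("ann", "legal", 12), User("bob", "legal", 2), User("eve", "finance", 30)]
senior_legal = Role("senior legal reviewer",
                    predicate=lambda u: u.department == "legal" and u.completed_tasks >= 10)
approvers = Role("approver", enumerated={"eve"})
print(sorted(senior_legal.members(users)))  # ['ann']

users[1].completed_tasks = 15      # bob's processing history grows
print(sorted(senior_legal.members(users)))  # ['ann', 'bob']
```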

Content Schema Definition Environment

The Content Schema defines the data model of the Content Repository, including both the XML object structure and the auxiliary data model created to represent the content structural semantics and to facilitate selection operations. The RDF schema is additionally annotated with system-defined tags or tag parameters to assign internal significance to selected XML document fields. The XML schema also provides the structural information for generating Electronic Form processing functions.

4.1.3 Knowledge Management Technologies

Ontology Model Manager

The Ontology Model (OM) comprises a formal knowledge representation, as declarative or procedural knowledge, pertaining to a particular application domain; hence we use the term domain ontology interchangeably. Declarative knowledge may be formally represented by structural knowledge representation constructs, such as SDM relationships or Semantic Net links, or by rules supported by an inference engine. The OM Manager is to provide functions to create, maintain, and use the knowledge representation structures, and to make those functions available to other KMS modules.

Structural Knowledge Navigator

The Structural Knowledge Navigator (SKN) is to provide an ontology structure manipulation language, available as an API to developers of other pertinent ICONS modules, which provides the navigation and selection facilities underlying the graphic object selection and graph navigation features available to ICONS users at the HCI level. The relationship and object link structures are to be defined in terms of link predicates, so that the actual navigation is based on dynamically materialised object sets.
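Links defined by predicates rather than by stored link instances can be sketched as follows: each navigation step evaluates the link predicate against candidate objects and materialises the result on the fly. Objects are plain property dictionaries, and the link names `wrote` and `same_project` are hypothetical; this is not the planned SKN language.

```python
from typing import Callable

LinkPredicate = Callable[[dict, dict], bool]

# Hypothetical link definitions: no link instances are stored; each named link
# is a predicate over the properties of the two objects it relates.
LINKS: dict[str, LinkPredicate] = {
    "wrote": lambda person, doc: person["id"] in doc.get("authors", ()),
    "same_project": lambda doc, other: (doc.get("project") is not None
                                        and other.get("project") == doc.get("project")
                                        and other["id"] != doc["id"]),
}


def follow_links(start, path, universe):
    """Follow a path of named links; each step dynamically materialises the
    objects of `universe` related by the link predicate to the current set."""
    current = list(start)
    for name in path:
        link = LINKS[name]
        current = [obj for obj in universe if any(link(src, obj) for src in current)]
    return [obj["id"] for obj in current]


universe = [
    {"id": "ann"},
    {"id": "d1", "authors": ["ann"], "project": "ICONS"},
    {"id": "d2", "authors": ["bob"], "project": "ICONS"},
    {"id": "d3", "authors": ["bob"], "project": "other"},
]
print(follow_links([universe[0]], ["wrote"], universe))                  # ['d1']
print(follow_links([universe[0]], ["wrote", "same_project"], universe))  # ['d2']
```

Evaluating predicates against the whole universe is quadratic; a practical navigator would push such predicates down into indexed queries.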

Content Categorisation Engine

Content categorisation of text files and other multimedia objects is gaining increasing importance in knowledge management systems. Current automatic categorisation methods are based on the evaluation of property values with straightforward SQL-like queries, or on full text queries supported by appropriate full text indices constructed on the fly by full text search engines. In general, content categorisation engines processing formatted (electronic form) or text data address the problem using algorithms to: (i) select words from text that should be used for indexing, (ii) look for close matches to personal names, company names, product names, or places, (iii) extract data from formatted tables or forms, and (iv) search for words that regularly appear in the same context and therefore may be related. In the case of image data, algorithms already exist that search image catalogues and provide face matching, fingerprint identification, or medical image analysis. We are looking at candidate algorithms, open source solutions, and software products to be potentially integrated into the ICONS architecture.
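Steps (ii) and (iv) above can be approximated with the Python standard library: `difflib.get_close_matches` finds close matches to known names, and counting word pairs per sentence gives a crude co-occurrence measure. This is a deliberately simple sketch (no stop-word removal, no statistical significance test); the helper names and sample data are hypothetical.

```python
import re
from collections import Counter
from difflib import get_close_matches
from itertools import combinations


def match_names(tokens, known_names, cutoff=0.8):
    """Step (ii): map each token to its closest known name, if close enough."""
    found = {}
    for token in tokens:
        hits = get_close_matches(token, known_names, n=1, cutoff=cutoff)
        if hits:
            found[token] = hits[0]
    return found


def cooccurrences(sentences, min_count=2):
    """Step (iv): word pairs occurring together in at least `min_count` sentences."""
    counts = Counter()
    for sentence in sentences:
        words = sorted(set(re.findall(r"\w+", sentence.lower())))
        counts.update(combinations(words, 2))
    return {pair: n for pair, n in counts.items() if n >= min_count}


print(match_names(["Rodann", "report"], ["Rodan", "Schlumberger", "Dauphine"]))
# {'Rodann': 'Rodan'}
print(cooccurrences(["Invoice approved by finance",
                     "Finance rejected the invoice",
                     "Meeting notes"]))
# {('finance', 'invoice'): 2}
```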

Datalog Inference Engine

The Datalog Inference Engine is to be based on the DLV system developed by CIES, to be appropriately modified and interfaced to the ICONS architecture. DLV is a deductive database system, based on disjunctive logic programming, which offers front-ends to several advanced knowledge representation (KR) formalisms. Disjunctive Datalog combines databases and logic programming; for this reason, DLV can be seen either as a logic programming system or as a deductive database system. For consistency with deductive database terminology, the input is separated into the extensional database (EDB), which is a collection of facts, and the intensional database (IDB), a set of rules used to deduce new facts.
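The EDB/IDB split can be illustrated with a naive bottom-up evaluator for plain (non-disjunctive) Datalog. This is not DLV and does not use its input language or interfaces; it only shows how IDB rules derive new facts from EDB facts until a fixpoint is reached. Atoms are Python tuples, variables are strings starting with an upper-case letter, and rules are assumed range-restricted (every head variable occurs in the body); all of these are assumptions of the sketch.

```python
def is_var(term):
    """Variables start with an upper-case letter, as in Datalog."""
    return isinstance(term, str) and term[:1].isupper()


def match(atom, fact, binding):
    """Extend `binding` so that `atom` equals `fact`, or return None."""
    if atom[0] != fact[0] or len(atom) != len(fact):
        return None
    extended = dict(binding)
    for term, value in zip(atom[1:], fact[1:]):
        if is_var(term):
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended


def evaluate(edb, idb):
    """Naive bottom-up evaluation: apply every IDB rule to the current facts
    until no new fact is derived (the least fixpoint)."""
    facts = set(edb)
    while True:
        derived = set()
        for head, body in idb:
            bindings = [{}]
            for atom in body:          # join the body atoms left to right
                next_bindings = []
                for b in bindings:
                    for fact in facts:
                        extended = match(atom, fact, b)
                        if extended is not None:
                            next_bindings.append(extended)
                bindings = next_bindings
            for b in bindings:
                derived.add((head[0],) + tuple(b[t] if is_var(t) else t for t in head[1:]))
        if derived <= facts:
            return facts
        facts |= derived


edb = {("parent", "ann", "bob"), ("parent", "bob", "cid")}
idb = [
    (("ancestor", "X", "Y"), [("parent", "X", "Y")]),
    (("ancestor", "X", "Z"), [("parent", "X", "Y"), ("ancestor", "Y", "Z")]),
]
model = evaluate(edb, idb)
print(sorted(f for f in model if f[0] == "ancestor"))
# [('ancestor', 'ann', 'bob'), ('ancestor', 'ann', 'cid'), ('ancestor', 'bob', 'cid')]
```

Disjunctive rules, as supported by DLV, have several minimal models rather than one fixpoint and need substantially different evaluation machinery.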

An in-core (main memory) relational DBMS is to be used to host the extensional database, which is to be materialised as a persistent or transient Content Object comprising the corresponding disjunctive logic programme as one of its methods. Execution of the logic programme on the basis of the EDB structure comprised in the Content Object will be materialised as an in-core relational database structure.

Intelligent Workflow Manager

Extending workflow applications beyond the realm of classical production-level business process support into the realm of knowledge workers’ activities and large project control requires extending current workflow engine capabilities. Possible directions point to such WfMC architecture enhancements as the application of knowledge-based techniques, in conjunction with advanced time modelling capabilities, to process routing problems and to optimal workload allocation problems. Workload allocation problems, in conjunction with the development and maintenance of reliable process metrics, may be solved with the use of knowledge-based techniques.

Semi-structured Content Integrator
Semi-structured information, such as XML (possibly with RDF annotations) and HTML pages and the associated multimedia objects usually downloadable as files, represents a wealth of content that may be directly relevant to a knowledge management system. Information such as competitive content, financial and commercial data, news reports, etc., should be directly accessible via the KMS content repository. Such objects should be associated with the repository content via structural knowledge representations (relationships, links) as well as through taxonomy trees. Mapping the semi-structured objects into a predefined schema representing the corresponding content repository object classes may present a serious structure homogenisation problem, in particular in view of the variety of representations used for the same entities in different, highly volatile Web information sources.

Knowledge-based wrapper technology may provide one of the promising areas for developing solutions to the above problem. We propose the development of a class of intelligent agents, to be called intelligent content integrators, to solve it.
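The homogenisation problem can be sketched as a wrapper that maps source-specific element names onto one predefined schema. The two XML sources, their element names, the target `NewsItem` fields and the mapping table below are hypothetical. A knowledge-based wrapper would derive or adapt such mappings rather than hard-code them.

```python
import xml.etree.ElementTree as ET

# Target schema of a hypothetical repository object class "NewsItem".
TARGET_FIELDS = ("title", "published", "source")

# Per-source mapping (hypothetical): target field -> element path in that source's XML.
MAPPINGS = {
    "feed_a": {"title": "headline", "published": "date", "source": "agency"},
    "feed_b": {"title": "meta/name", "published": "meta/issued", "source": "publisher"},
}


def wrap(source: str, xml_text: str) -> list[dict]:
    """Convert every <item> of a source document into a record of the target schema."""
    mapping = MAPPINGS[source]
    root = ET.fromstring(xml_text)
    records = []
    for item in root.iter("item"):
        record = {}
        for field in TARGET_FIELDS:
            node = item.find(mapping[field])
            # Missing or empty elements become None instead of failing the whole item.
            record[field] = node.text.strip() if node is not None and node.text else None
        records.append(record)
    return records


feed_a = ("<feed><item><headline>Merger announced</headline>"
          "<date>2002-04-02</date><agency>Agency A</agency></item></feed>")
feed_b = ("<rss><item><meta><name>Rates unchanged</name><issued>2002-04-03</issued></meta>"
          "<publisher>Publisher B</publisher></item></rss>")
print(wrap("feed_a", feed_a) + wrap("feed_b", feed_b))
```

Both sources end up as records with identical keys, which is the precondition for linking them to repository objects and taxonomy trees.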

Intelligent Agent Development Environment
Intelligent agents serving as personal assistants and/or communicating/collaborating agents are an important KMS technology. A framework for the specification and development of knowledge-based agents is to constitute an integral part of the ICONS knowledge representation architecture. At this point we expect that the logic programming reasoning features will provide an important ingredient of the knowledge-based IA solution.

4.1.4 Human Computer Interaction Technologies
HCI Personalisation Engine
Sound personalisation facilities already exist in advanced Web content management systems, called corporate portal platforms, with some of the software already available in “open source” form. We plan further enhancements of the existing technology, based principally on two technical areas: (i) advanced logging facilities for KMS user activities, and (ii) knowledge-based analysis of user activities in conjunction with dynamic profiling of user preferences. Apart from the preferred layout of the user interface frames (pages), personalisation should focus on assisting the user in exploiting the complex ontology structures.
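Area (ii), dynamic profiling from logged user activity, can be illustrated by a profile that keeps an exponentially decayed interest score per ontology concept. The event format, the half-life parameter and the concept names are assumptions. Knowledge-based analysis in ICONS would go well beyond this simple weighting.

```python
import math
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class UserProfile:
    half_life_days: float = 14.0  # assumed rate at which old interests fade
    scores: dict = field(default_factory=lambda: defaultdict(float))
    last_update: float = 0.0      # time of the last logged event, in days

    def log(self, concept: str, day: float, weight: float = 1.0) -> None:
        """Record one user action (view, search, edit) that touches an ontology concept."""
        decay = math.pow(0.5, (day - self.last_update) / self.half_life_days)
        for c in self.scores:      # age all existing interests
            self.scores[c] *= decay
        self.scores[concept] += weight
        self.last_update = day

    def top(self, n: int = 3) -> list[str]:
        """Concepts the user is currently most interested in."""
        return sorted(self.scores, key=lambda c: -self.scores[c])[:n]


p = UserProfile(half_life_days=7)
p.log("workflow", day=0)
p.log("workflow", day=1)
p.log("ontology", day=30)   # a month later the user switches topic
print(p.top(2))
```

The decay lets the profile follow shifting interests, so a concept viewed once recently can outrank one viewed repeatedly weeks ago.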

Electronic Form Manager
Electronic forms (EF) are ubiquitous in content management systems, in particular in the Web content management area, as a means to create, update and search content objects. An outstanding problem, in particular for Web-oriented solutions, is the specification and enforcement of complex integrity constraints at the HCI level. This is currently the potential area of EF enhancement research and development to be undertaken in the ICONS project.
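One declarative way to state such integrity constraints is to attach predicates to a form, including cross-field constraints, and to evaluate them before submission. The form fields, thresholds and messages below are hypothetical. In a Web setting the same rule set would typically be checked both in the browser and on the server.

```python
from datetime import date
from typing import Callable

# A rule is (error message, predicate over the whole form); all rules here are hypothetical.
Rule = tuple[str, Callable[[dict], bool]]

CONTRACT_FORM_RULES: list[Rule] = [
    ("title is required", lambda f: bool(f.get("title", "").strip())),
    ("value must be non-negative", lambda f: f.get("value", 0) >= 0),
    # Cross-field constraints relate several inputs to each other.
    ("end date must not precede start date", lambda f: f["end"] >= f["start"]),
    ("contracts above 100000 need an approver",
     lambda f: f.get("value", 0) <= 100_000 or bool(f.get("approver"))),
]


def validate(form: dict, rules: list[Rule]) -> list[str]:
    """Return the messages of all violated constraints (an empty list means valid)."""
    return [msg for msg, check in rules if not check(form)]


ok = {"title": "Support", "value": 5000, "start": date(2002, 4, 1), "end": date(2002, 12, 31)}
bad = {"title": " ", "value": 250000, "start": date(2002, 5, 1), "end": date(2002, 4, 1)}
print(validate(ok, CONTRACT_FORM_RULES))
print(validate(bad, CONTRACT_FORM_RULES))
```

Keeping constraints as data rather than scattered code makes it possible to derive them from the ontology and to report every violation at once.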

Content Presentation Manager
Content presentation pertains to displaying, usually in an Internet browser, the multimedia content objects held in the KMS content repository. Standard viewer technologies exist, and most current content management systems use products from a few global suppliers of viewer technology. An appropriate interface is to be developed to accommodate selected viewer technologies; no further enhancements are planned within the scope of the project.

Knowledge Map Graph Manager
The Knowledge Map graphs are primarily composed of multi-level taxonomy trees providing navigational, entry-level access to the complex ontology structures that combine the taxonomy trees and the structural knowledge graphs. The problem lies in representing large, nested tree structures in a user-friendly graphical way and in providing easy-to-use navigational facilities. HCI level navigation is to be implemented with the use of the Structural Knowledge Navigator API.
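One common way to keep a large nested taxonomy usable is to expand it lazily. The idea is to render only down to a chosen depth and to let the user drill into a node by its path. The `TaxonomyNode` class and its methods are hypothetical and do not describe the Structural Knowledge Navigator API.

```python
from dataclasses import dataclass, field


@dataclass
class TaxonomyNode:  # hypothetical structure, for illustration only
    label: str
    children: list["TaxonomyNode"] = field(default_factory=list)

    def add(self, *labels: str) -> "TaxonomyNode":
        """Add child nodes and return self (for compact tree building)."""
        self.children.extend(TaxonomyNode(name) for name in labels)
        return self

    def find(self, path: list[str]) -> "TaxonomyNode":
        """Follow a path of labels from this node, e.g. ['Projects', 'ICONS']."""
        node = self
        for label in path:
            match = next((c for c in node.children if c.label == label), None)
            if match is None:
                raise KeyError(label)
            node = match
        return node

    def render(self, max_depth: int, depth: int = 0) -> list[str]:
        """Indented outline; subtrees below max_depth are collapsed behind a '+' marker."""
        marker = "+ " if self.children and depth == max_depth else "  "
        lines = ["    " * depth + marker + self.label]
        if depth < max_depth:
            for child in self.children:
                lines.extend(child.render(max_depth, depth + 1))
        return lines


root = TaxonomyNode("Knowledge Map").add("Projects", "Customers")
root.find(["Projects"]).add("ICONS", "Other")
root.find(["Projects", "ICONS"]).add("WP9")
print("\n".join(root.render(max_depth=1)))
```

Rendering stays proportional to the visible part of the tree, not its full size, which matters for multi-level taxonomies.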

Structural Knowledge Graph Manager
The structural knowledge graphs, representing the SDM nets and the Semantic Nets, also pose a hard problem from the point of view of HCI level presentation and manipulation. Structural knowledge graph navigation is considered of paramount importance in communicating the semantics of, and in providing navigational access to, the content repository objects. Intuitive, user-friendly structure navigation and result list manipulation are the cornerstone of a successful ICONS HCI environment. HCI level navigation is to be implemented with the use of the Structural Knowledge Navigator API.
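Navigation over SDM nets or semantic nets can be reduced to following typed relationships from a starting object and collecting a result list. The sketch below keeps the graph as (subject, relationship, object) edges and performs a breadth-first traversal limited to chosen relationship types and a maximum number of hops. The relationship names and objects are hypothetical.

```python
from collections import deque


def navigate(edges, start, allowed=None, max_hops=2):
    """Breadth-first navigation; returns (object, hops, path of relationships) per hit."""
    adjacency = {}
    for s, rel, o in edges:
        adjacency.setdefault(s, []).append((rel, o))
    seen = {start}
    queue = deque([(start, 0, [])])
    results = []
    while queue:
        node, hops, path = queue.popleft()
        if hops == max_hops:
            continue
        for rel, target in adjacency.get(node, []):
            # Only follow permitted relationship types and never revisit an object.
            if (allowed is None or rel in allowed) and target not in seen:
                seen.add(target)
                results.append((target, hops + 1, path + [rel]))
                queue.append((target, hops + 1, path + [rel]))
    return results


edges = [  # hypothetical structural knowledge
    ("deliverable", "partOf", "work package"),
    ("work package", "partOf", "project"),
    ("deliverable", "authoredBy", "author"),
    ("author", "memberOf", "partner"),
]
print(navigate(edges, "deliverable", allowed={"partOf"}))
```

Returning the relationship path with each hit gives the user interface what it needs to explain why an object appears in the result list.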

Process Graph Manager
The Process Graph Manager is to provide the following principal functionalities: (i) graphical design and consistency checking of the intelligent workflow process graphs, and (ii) a graphic interface for monitoring the state of a particular process instance. All principal process parameters and control data should be accessible via the graphic interface.
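Consistency checking of a process graph can start from simple structural rules. The sketch below assumes three of them: exactly one start and one end activity, every activity reachable from the start, and the end reachable from every activity. These rules and the adjacency-list encoding are assumptions for illustration, not a WfMC requirement.

```python
def reachable(graph: dict[str, list[str]], source: str) -> set[str]:
    """All nodes reachable from source (depth-first search)."""
    stack, seen = [source], {source}
    while stack:
        for nxt in graph.get(stack.pop(), []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def check_process(graph: dict[str, list[str]]) -> list[str]:
    """Return the consistency problems found in a process graph (activity -> successors)."""
    nodes = set(graph) | {n for succ in graph.values() for n in succ}
    has_pred = {n for succ in graph.values() for n in succ}
    starts = sorted(nodes - has_pred)
    ends = sorted(n for n in nodes if not graph.get(n))
    problems = []
    if len(starts) != 1:
        problems.append(f"expected one start activity, found {starts}")
    if len(ends) != 1:
        problems.append(f"expected one end activity, found {ends}")
    if len(starts) == 1:
        for n in sorted(nodes - reachable(graph, starts[0])):
            problems.append(f"{n} is unreachable from {starts[0]}")
    if len(ends) == 1:
        reverse = {}
        for src, succ in graph.items():
            for dst in succ:
                reverse.setdefault(dst, []).append(src)
        for n in sorted(nodes - reachable(reverse, ends[0])):
            problems.append(f"{n} cannot reach {ends[0]}")
    return problems


ok = {"receive": ["review"], "review": ["approve", "reject"],
      "approve": ["archive"], "reject": ["archive"], "archive": []}
broken = {"receive": ["review"], "review": ["review"], "audit": ["archive"], "archive": []}
print(check_process(ok))
print(check_process(broken))
```

The `broken` graph shows typical design errors: a second entry point and a self-loop with no exit.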

4.1.5 Distributed Architecture Technologies
Load Balancing Algorithms
Load balancing algorithms should control device/media allocation to the active (i.e. process) elements of the distributed ICONS architecture. Distribution may include the ICONS functional modules, or selected processes of such modules, as well as the application object classes. The object-oriented architecture of the system, at both the ICONS and the application software levels, lends itself well to distribution in peer-to-peer as well as hierarchical computer system architectures. Load balancing is important to system performance because of the widespread use of processor-intensive knowledge-based techniques in ICONS modules.
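A classic greedy heuristic for allocating processes to nodes places each task, largest first, on the currently least-loaded node (longest-processing-time-first). It is shown here only as a baseline. The task names, their costs, the node names and the assumption that costs are known in advance are illustrative; ICONS would need dynamically measured loads.

```python
import heapq


def assign_tasks(costs: dict[str, float], nodes: list[str]) -> dict[str, list[str]]:
    """Longest-processing-time-first: the biggest remaining task goes to the least-loaded node."""
    heap = [(0.0, node) for node in nodes]  # (current load, node)
    heapq.heapify(heap)
    placement = {node: [] for node in nodes}
    for task in sorted(costs, key=lambda t: (-costs[t], t)):
        load, node = heapq.heappop(heap)
        placement[node].append(task)
        heapq.heappush(heap, (load + costs[task], node))
    return placement


# Hypothetical processor costs of module processes.
costs = {"categorise": 7, "infer": 5, "render": 4, "index": 3, "log": 1}
placement = assign_tasks(costs, ["n1", "n2"])
print(placement)
```

Here both nodes end up with a load of 10. In general LPT is a heuristic, not an optimal allocation.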

Distribution Optimisation Algorithms
Distribution optimisation pertains to the static elements of the ICONS architecture, i.e. to control data structures, domain ontology structures, and content object structures. Optimisation of the device/media allocation, with possible replication of the above data structures, may be an important technique for efficient system implementation.
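The replication question can be posed as choosing k sites to hold copies of a data structure so as to minimise total access cost, with each client served by its cheapest replica. The exhaustive search below is feasible only for a handful of sites and uses made-up costs and demands. It illustrates the objective, not a proposed ICONS algorithm.

```python
from itertools import combinations


def place_replicas(cost, demand, sites, k):
    """Return (chosen sites, total cost) minimising sum(demand[c] * cheapest replica cost)."""
    best = None
    for chosen in combinations(sorted(sites), k):
        total = sum(demand[c] * min(cost[c][s] for s in chosen) for c in demand)
        if best is None or total < best[1]:
            best = (chosen, total)
    return best


# Hypothetical access costs between client locations and candidate sites.
cost = {
    "warsaw": {"warsaw": 0, "rome": 5, "paris": 4},
    "rome": {"warsaw": 5, "rome": 0, "paris": 3},
    "paris": {"warsaw": 4, "rome": 3, "paris": 0},
}
demand = {"warsaw": 10, "rome": 2, "paris": 6}  # hypothetical access frequencies
print(place_replicas(cost, demand, ["warsaw", "rome", "paris"], 1))
print(place_replicas(cost, demand, ["warsaw", "rome", "paris"], 2))
```

This is the k-median problem, which is NP-hard in general, so realistic deployments would rely on heuristics or approximation algorithms.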

Scalable Distributed Data Structure (SDDS)
An SDDS system should provide the principal distribution platform for the selected static elements of the ICONS architecture. The system must be adapted to the ICONS requirements, possibly to support distribution of the In-Core relational DBMS module.
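SDDS schemes such as LH* distribute the buckets of linear hashing (LH) over network nodes. The core addressing rule can be sketched in a single process. With N initial buckets, file level i and split pointer n, a key c goes to bucket h_i(c) = c mod (N·2^i), recomputed with h_{i+1} when that bucket has already been split. This is plain LH, not LH*: client images, forwarding and image adjustment messages are omitted, and the bucket capacity is an arbitrary assumption.

```python
class LinearHashFile:
    """Single-process linear hashing; LH* would place each bucket on a different server."""

    def __init__(self, n0: int = 1, capacity: int = 4):
        self.n0 = n0              # N: initial number of buckets
        self.level = 0            # i: file level
        self.split = 0            # n: next bucket to split
        self.capacity = capacity  # assumed bucket capacity that triggers a split
        self.buckets = [[] for _ in range(n0)]

    def address(self, key: int) -> int:
        a = key % (self.n0 * 2 ** self.level)             # h_i(c)
        if a < self.split:                                # bucket already split
            a = key % (self.n0 * 2 ** (self.level + 1))   # h_{i+1}(c)
        return a

    def insert(self, key: int) -> None:
        bucket = self.buckets[self.address(key)]
        bucket.append(key)
        if len(bucket) > self.capacity:
            # Overflow triggers a split of bucket n, not necessarily the overflowing one.
            self._split()

    def _split(self) -> None:
        old = self.buckets[self.split]
        self.buckets[self.split] = []
        self.buckets.append([])                           # new bucket n + N*2^i
        modulus = self.n0 * 2 ** (self.level + 1)
        for key in old:                                   # rehash with h_{i+1}
            self.buckets[key % modulus].append(key)
        self.split += 1
        if self.split == self.n0 * 2 ** self.level:       # all buckets of this level split
            self.level += 1
            self.split = 0

    def lookup(self, key: int) -> bool:
        return key in self.buckets[self.address(key)]


f = LinearHashFile(n0=2, capacity=4)
for key in range(100):
    f.insert(key)
print(len(f.buckets), f.level, f.split)
```

Because the file grows one bucket at a time, each split moves only part of one bucket's keys, which is what makes the scheme attractive for incremental distribution.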

Distributed Workflow Communication
Communication among workflow processes, managed by a common or by different workflow platforms, is currently the subject of research and standardisation work by the Workflow Management Coalition task groups [Hayes2001]. XML-based messaging protocols are proposed as the means to transfer process information among heterogeneous platforms. Messaging standards are to be implemented in the ICONS Intelligent Workflow Manager and experimented with in the ICONS distributed architecture environment.
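To illustrate exchanging process information as XML messages, the sketch below serialises the state of a process instance and parses it back with the standard `xml.etree.ElementTree` module. The element and attribute names are hypothetical and do not follow Wf-XML or any other WfMC schema.

```python
import xml.etree.ElementTree as ET


def to_message(instance_id: str, process: str, state: str, activities: dict[str, str]) -> str:
    """Serialise a process-instance status notification (hypothetical message format)."""
    msg = ET.Element("ProcessInstanceStatus", {"id": instance_id, "process": process})
    ET.SubElement(msg, "State").text = state
    acts = ET.SubElement(msg, "Activities")
    for name, status in activities.items():
        ET.SubElement(acts, "Activity", {"name": name, "status": status})
    return ET.tostring(msg, encoding="unicode")


def from_message(xml_text: str) -> dict:
    """Parse the message back into plain data on the receiving platform."""
    root = ET.fromstring(xml_text)
    return {
        "id": root.get("id"),
        "process": root.get("process"),
        "state": root.findtext("State"),
        "activities": {a.get("name"): a.get("status") for a in root.iter("Activity")},
    }


message = to_message("pi-42", "contract-approval", "running",
                     {"review": "completed", "approve": "active"})
print(message)
back = from_message(message)
```

A real interoperability protocol would also define message envelopes, operations (create, query, notify) and error handling, which is the subject of the WfMC work cited above.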

4.2 The ICONS architecture vs. the KMS reference architecture
The goal of the ICONS project is to develop and demonstrate a KMS prototype meeting most of the feature requirements generally accepted for such systems. We discussed the KMS reference architecture in section 3 in the context of user requirements identified within the principal streams of knowledge management research. We shall now show that the proposed ICONS architecture addresses most of the feature requirements defined in the KMS reference architecture. We relate the ICONS modules to the KMS reference architecture features in cross-reference tables, Table 4 through Table 8, one for each principal feature of the KMS reference architecture. We do not discuss the Knowledge Security principal feature, since it clearly lies outside the project terms of reference and is considered a ready-to-use development technology. We only consider the ICONS focus technology modules, assuming that all the auxiliary technologies will be used as required and appropriately modified or enhanced as indicated in the ICONS architecture discussed in the preceding section.

Rows: ICONS functional modules (Focus Tech. Areas). Columns: Conc. Trees, Semant. Nets, Taxonomies, Time Model., K-B reason., Hyper-text, Process graphs, SDM.

Knowledge Management
Ontology Model Manager: D R D R D R R
Structural Knowledge Navigator: R D R
Content Categorisation Engine: D D
Datalog Inference Engine: R
Intelligent Workflow Manager: R R
Semi-structured Content Integrator: –
Intelligent Agent Development Environment: R R R

Human Computer Interaction (HCI)
HCI Personalisation Engine: R R R R
Electronic Form Manager: –
Content Presentation Manager: –
Knowledge Map Graph Manager: R
Structural Knowledge Graph Manager: R R
Process Graph Manager: R

Distributed Architecture
Load Balancing Algorithms: R
Distribution Optimisation Algorithms: R R R
Scalable Distributed Data Structure (SDDS): R
Distributed Workflow Communication: R

R – research work
S – specification work
D – development work

Note that the work type notations indicate the starting point of the project effort, i.e. R means that research work is necessary and that it will naturally be followed, if successful, by specification (S) and development (D) efforts.

Table 4. The ICONS focus technological area modules and the Domain Ontology features cross reference

All Domain Ontology features are addressed, with most of the work starting at the research level. Development pertains to enhancements of the adopted content management functionality to be utilized in the project.


Rows: ICONS functional modules (Focus Tech. Areas). Columns: XML, RDF, DBMS, File System, HSM, Vers. Contr., Rendering.

Knowledge Management
Ontology Model Manager: S
Structural Knowledge Navigator: –
Content Categorisation Engine: –
Datalog Inference Engine: –
Intelligent Workflow Manager: –
Semi-structured Content Integrator: D D
Intelligent Agent Development Environment: –

Human Computer Interaction (HCI)
HCI Personalisation Engine: –
Electronic Form Manager: D D D
Content Presentation Manager: D D D D
Knowledge Map Graph Manager: –
Structural Knowledge Graph Manager: –
Process Graph Manager: –

Distributed Architecture
Load Balancing Algorithms: –
Distribution Optimisation Algorithms: R
Scalable Distributed Data Structure (SDDS): –
Distributed Workflow Communication: –



Table 5. The ICONS focus technological area modules and the Content Repository features cross reference

Most of the Content Repository features are outside the ICONS project focus technological area, and they are to be supported by the content management platform to be selected as the baseline development environment.



There is some adaptation work to be performed on the existing electronic form management, content presentation, and version control functions to meet the emerging requirements of the XML and RDF standards. Research will be performed in the area of hierarchical storage management (HSM), where distribution optimisation algorithms could substantially enhance HSM functionality and performance.


Rows: ICONS functional modules (Focus Tech. Areas). Columns: Seman. Nets, SDM Nets, K. Map Graphs, Full Text, C.O. Prop., Push Techn.

Knowledge Management
Ontology Model Manager: R D D
Structural Knowledge Navigator: R R
Content Categorisation Engine: S S
Datalog Inference Engine: S
Intelligent Workflow Manager: –
Semi-structured Content Integrator: –
Intelligent Agent Development Environment: R

Human Computer Interaction (HCI)
HCI Personalisation Engine: –
Electronic Form Manager: –
Content Presentation Manager: –
Knowledge Map Graph Manager: R
Structural Knowledge Graph Manager: R R
Process Graph Manager: –

Distributed Architecture
Load Balancing Algorithms: –
Distribution Optimisation Algorithms: –
Scalable Distributed Data Structure (SDDS): –
Distributed Workflow Communication: –



Table 6. The ICONS focus technological area modules and the Knowledge Dissemination features cross reference.

The main thrust of the research effort in the area of Knowledge Dissemination will be directed towards advanced graphic user interfaces to represent the knowledge map nested taxonomical trees and the structural knowledge graphs. The remaining work will focus on adaptation of the existing content management functions.


Rows: ICONS functional modules (Focus Tech. Areas). Columns: Data Bases, Files, Doc. Manag., Intell. Agents, Legacy IS, Web Pages, Busin. Intell. Syst.

Knowledge Management
Ontology Model Manager: –
Structural Knowledge Navigator: –
Content Categorisation Engine: –
Datalog Inference Engine: –
Intelligent Workflow Manager: R
Semi-structured Content Integrator: R R
Intelligent Agent Development Environment: S S S R S S

Human Computer Interaction (HCI)
HCI Personalisation Engine: –
Electronic Form Manager: –
Content Presentation Manager: –
Knowledge Map Graph Manager: –
Structural Knowledge Graph Manager: –
Process Graph Manager: –

Distributed Architecture
Load Balancing Algorithms: R
Distribution Optimisation Algorithms: –
Scalable Distributed Data Structure (SDDS): –
Distributed Workflow Communication: –



Table 7. The ICONS focus technological area modules and the Content Integration features cross reference

The research and specification work in the area of Content Integration will pertain to semi-structured content integration, which may be used to extract information from the Web and possibly from document management information resources. Intelligent agent technologies are candidates for formatted data integration, mainly from pre-existing databases and files, and from legacy information systems or business intelligence systems.


Rows: ICONS functional modules (Focus Tech. Areas). Columns: Know. Eng., Wfk. Manag., Inter. Intranet, Messag. Exchan., Discuss. Forum.

Knowledge Management
Ontology Model Manager: –
Structural Knowledge Navigator: –
Content Categorisation Engine: R S S
Datalog Inference Engine: R
Intelligent Workflow Manager: R
Semi-structured Content Integrator: R R
Intelligent Agent Development Environment: R

Human Computer Interaction (HCI)
HCI Personalisation Engine: S
Electronic Form Manager: S
Content Presentation Manager: –
Knowledge Map Graph Manager: –
Structural Knowledge Graph Manager: –
Process Graph Manager: –

Distributed Architecture
Load Balancing Algorithms: R
Distribution Optimisation Algorithms: –
Scalable Distributed Data Structure (SDDS): –
Distributed Workflow Communication: R



Table 8. The ICONS focus technological area modules and the Actor Collaboration features cross reference.

The major research interests of the ICONS project in the area of KMS agent collaboration pertain to intelligent workflow management and to intelligent agent (IA) technologies. Some enhancement of the existing content management technologies is planned to provide support for the knowledge engineering features.



5. The ICONS Knowledge Representation Features

5.1 Requirements for Knowledge Management (KM)
Current syntactic approaches to searching for information and, in its broadest sense, knowledge over networks have proved useful for many applications – most conspicuously in applications using the Internet. However, they do not retrieve the semantic content of documents.

Semantics are needed if we wish to retrieve facts and other knowledge. They can be used for shared practical problem solving by several agents (computers or people). They support the concatenation of knowledge with knowledge from elsewhere and are therefore well suited to automated access and analysis.

Knowledge representation (KR) and extraction techniques are at the centre of these knowledge management requirements; in particular, the value of shared domain definitions and of the conceptual reasoning approach has been convincingly presented in [O’Leary 1998]. The activities of acquisition (including content-based retrieval of multimedia knowledge and information, such as images), indexing, filtering, linking, distribution and application of knowledge must be supported in ICONS.

To match these requirements, the technical skeleton of ICONS is based on ontologies. Ontologies are “specifications of shared conceptualizations of particular domains”. They support knowledge access, integration and mediation. The present focus is on structured and unstructured textually represented information, but part of our research will seek to widen this scope, with the ultimate goal of representing and accessing multimedia information using semantic methods. The basic vision is of a representation and inference superstructure [Fensel, et al, 2000], based on ontologies, over distributed repositories.

Three components of the structure of this layer can be discerned. The first is a sub-structure providing formal semantics and efficient reasoning support. At this level knowledge is described in terms of concepts, interrelationships and roles. The specific ICONS mechanisms for this will be detailed below.

The second sub-structure supplies a rich set of primitives for modelling the Universe of Discourse. No single technique is adequate for this. The main ICONS technique for this is Datalog; the other methods outlined in the following sections will be invoked to supplement it where necessary.

The third sub-component of the ICONS KM superstructure supports the sharing and co-operative usage of knowledge.

In practice, within ICONS the first two components are combined to allow knowledge to be described in a disciplined manner that supports rich modelling of application domains through the use of ontologies. From this it will be possible to derive classification taxonomies. In the research stream, attention will be paid to the need for grounding of concepts and knowledge (especially knowledge derived by data mining). The idea is to be able to support explanation of the answers supplied to users. However, our initial concern is to provide a backbone modelling capability, and for this reason we focus on Datalog, although other techniques will be used when needed for specific functionality. As suggested earlier, (mined) knowledge has to be shared among compliant applications via an “ontology base” and used as a “content base”. Hence a commonly accepted representation is required.

5.2 Syntax/Semantics
One thing above any other distinguishes syntactic from semantic manipulation of information (including conventional web searching), and it applies to next-generation knowledge management in general. Syntactic manipulation is geared to people rather than computers, while semantic manipulation is intended to bring structure to the meaningful content of pages of information [Berners-Lee 1999]. If suitably represented, this content can be invoked and exploited by application programs.

Building a semantic web, for example, requires:
• access to structured collections of information
• sets of inference rules with which to reason automatically

Sophisticated KR is required. First-generation KR is centralized, although work has been done on distributed heterogeneous expert systems, for example [Zhang and Bell 1990]. Early systems were also ‘shallow’, in that compiled hindsight was recorded rather than deeper principles. A third conspicuous failing of first-generation KR was the absence of an explicit, well-understood representation of the uncertainty of knowledge. All three of these deficiencies will be addressed to varying extents in the ICONS system.

A language that expresses both data and rules is required; such a language makes it possible to export rules from any KR system to the web. The task of developing such a language has been simplified because much of the information we need is of the form

• “An ancestor of a parent is an ancestor”; or
• “A truck is a kind of land vehicle which is a kind of vehicle”

Datalog is the obvious choice for this.

Three important technologies already exist to help in the endeavour of providing a Data/Rules language in a web context:

• XML – tags (hidden labels) can be created to annotate arbitrary parts of pages, and thus structure them. However, this conveys no meaning, although scripts (programs) can use the tags in sophisticated ways.

• RDF – expresses meaning via triples: things, their properties, and their values. For example, “this web page was authored by D. Bell”. Things, properties and values are each identified by Uniform Resource Identifiers (URIs), such as URLs. New concepts can be added simply by defining a URI for them somewhere on the Web.

• Ontologies

As has long been recognised by DDB designers, databases may use different identifiers, names and structures for a single concept, so there is a need to discover common meanings. An ontology can be a document or file that formally defines relations among terms or, more commonly, a taxonomy plus a set of inference rules. An ontology base is a collection of ontologies.
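The combination of RDF-style triples with a taxonomy and a set of inference rules can be sketched with plain Python sets. Two rules are assumed, in the spirit of RDF Schema: `subClassOf` is transitive, and an instance of a class is also an instance of its superclasses. The `example.org` URIs are made up, the `rdf:`/`rdfs:` prefixes are written as plain strings, and the code is not an RDF library.

```python
EX = "http://example.org/kms#"  # hypothetical namespace
TYPE, SUBCLASS = "rdf:type", "rdfs:subClassOf"


def infer(triples: set[tuple[str, str, str]]) -> set[tuple[str, str, str]]:
    """Forward-chain the two taxonomy rules until no new triple is derived."""
    kb = set(triples)
    while True:
        sub = {(s, o) for s, p, o in kb if p == SUBCLASS}
        # Rule 1: (A subClassOf B) and (B subClassOf C) => (A subClassOf C)
        new = {(a, SUBCLASS, c) for a, b in sub for b2, c in sub if b == b2}
        # Rule 2: (x type A) and (A subClassOf C) => (x type C)
        new |= {(x, TYPE, c) for x, p, a in kb if p == TYPE for a2, c in sub if a == a2}
        if new <= kb:
            return kb
        kb |= new


facts = {
    (EX + "truck", SUBCLASS, EX + "landVehicle"),
    (EX + "landVehicle", SUBCLASS, EX + "vehicle"),
    (EX + "truck42", TYPE, EX + "truck"),
    (EX + "page1", EX + "author", "D. Bell"),  # "this web page was authored by D. Bell"
}
kb = infer(facts)
print(sorted(t for t in kb - facts))
```

The derived triples are exactly the "kind of" conclusions from the truck example, obtained from a taxonomy plus two rules.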

A research stream will be carried out on the content base and how it can cooperate with the ontology base to support a range of inference functionality in ICONS.

The goal of this work will be to explore how to capture XML objects (metadata) from external data sources using content models, which will be stored in content repositories, and how to transfer essential metadata as facts to the ontology bases for storage, using formal knowledge representation and manipulation methods.

An ontology base holds domain ontologies, each of which provides a declarative knowledge representation (in Datalog and the formalisms described below), including concepts and semantics that can be exemplified by hierarchical relationships (semantic nets). It is not normally directly associated with specific applications. The theories and technologies described below will be utilized to implement the ontology bases.

A content model, in contrast, may pertain directly to particular applications; it provides a generic way to represent a range of data sources as XML objects, which serve as metadata for the storage and retrieval of complex multimedia objects in external data sources.

In ICONS, a mechanism will be developed to specify all aspects of the data transfer from content bases to ontology bases required by the ontology base, including the different kinds of metadata, such as orders, locations and relational structures. All of these can be represented as facts, rules, and semantics.

In the ICONS context, a content base is assumed to hold a variety of content models, each of which can be represented as an XML DTD associated with external data sources. The content model determines what data is extracted and how it is ultimately represented in the XML object.

A content model contains several pieces of information:
• The original data structure, in the form of a data element. For example, if the data source is a relational table, the data element takes the form of an SQL statement; in this way, the content model can specify that data should be drawn from more than one relational table.
• The overall structure of the XML DTD. This is given by the root element, whose attributes specify the name of the destination root element and the name of the elements that are to represent tuples.
• The names and contents of data elements. These are contained in a series of elements, which include the name and an attribute or content element. These two elements designate the data that should be added and, in the case of attributes, what it should be called.
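The three parts of a content model listed above can be sketched as a small XML document. It holds the SQL source statement, the destination root and tuple element names, and the column-to-attribute or column-to-element mappings, together with a function that applies it to a relational source. The content-model element names, the `projects` and `coordinators` tables and the use of SQLite are hypothetical choices made for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical content model: source statement, XML structure, and data element mappings.
CONTENT_MODEL = """
<contentModel>
  <source>SELECT p.acronym AS acronym, p.title AS title, c.name AS name
          FROM projects p JOIN coordinators c ON p.coord = c.id</source>
  <root name="projects" tuple="project"/>
  <attribute column="acronym" name="id"/>
  <element column="title" name="title"/>
  <element column="name" name="coordinator"/>
</contentModel>
"""


def apply_content_model(model_xml: str, conn: sqlite3.Connection) -> str:
    """Run the model's SQL and build the XML object it describes."""
    model = ET.fromstring(model_xml.strip())
    root_spec = model.find("root")
    out = ET.Element(root_spec.get("name"))
    cursor = conn.execute(model.findtext("source"))
    columns = [d[0] for d in cursor.description]
    for row in cursor:
        values = dict(zip(columns, row))
        item = ET.SubElement(out, root_spec.get("tuple"))
        for spec in model.findall("attribute"):  # data carried as XML attributes
            item.set(spec.get("name"), str(values[spec.get("column")]))
        for spec in model.findall("element"):    # data carried as element content
            ET.SubElement(item, spec.get("name")).text = str(values[spec.get("column")])
    return ET.tostring(out, encoding="unicode")


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE coordinators (id INTEGER, name TEXT);
    CREATE TABLE projects (acronym TEXT, title TEXT, coord INTEGER);
    INSERT INTO coordinators VALUES (1, 'Rodan Systems');
    INSERT INTO projects VALUES ('ICONS', 'Intelligent Content Management System', 1);
""")
xml_object = apply_content_model(CONTENT_MODEL, conn)
print(xml_object)
```

Because the source is a join, this also shows a content model drawing data from more than one relational table.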

The meaning of XML codes used on web pages can be defined by pointers from the pages to an appropriate ontology. More complex applications use ontologies to relate the information on a page to associated knowledge structures and inference rules. The semantic web, in naming every concept simply by a URI, lets users express new concepts with minimum effort. Its unifying language also enables these concepts to be progressively linked into a universal web.

5.3 Formal foundations of knowledge representation
The prevailing approach to representing the knowledge embodied in existing information resources, in particular web information resources, uses metadata representing complex information object relationships and, in some cases, inference rules. A summary and comparative analysis of knowledge management frameworks is presented in [Holsapple 1999].

A knowledge representation approach based on separately defined semantic schemes, usually expressed in special-purpose knowledge representation languages, is increasingly gaining importance. An approach based on conceptual graphs has been proposed in [Martin 2000]. Representation of procedural knowledge and specified domain knowledge is proposed in [Fensel 1998], where two separate knowledge representation languages, for procedural knowledge (P-Karl) and for logic-based inference knowledge (L-Karl), are proposed. The use of logic as a knowledge representation scheme has also been postulated in [Lambrix 1999]. Conceptual reasoning and the semantic net approach have been proposed in [Lassila 1998, Martin 2000]. Prototype system implementations and knowledge management application frameworks have been discussed in [Bassiliades 2000, Bouguetaya 2000, Chang 2001, Goeschka 2001, Hammer 1997, Knoblock 1998, Lawrence 2001]. A novel approach integrating data mining results into the knowledge representation framework was presented in [Buchner 2000].

We now present the main formalisms to be used for KR in ICONS. The research stream of the project will seek to harmonise their use with Datalog methods, both for knowledge acquisition and for knowledge use.

5.3.1 Rules and uncertainty
In recent years, much emphasis has been placed on the “softness” required to model our imperfect world. One aspect of this, on which the University of Ulster has been working for many years (since the ideas of second-generation knowledge representation, e.g. distribution, deep and shallow reasoning (grounding), and uncertainty, first appeared), is reasoning under uncertainty and the implications this has for knowledge representation. This work has been based on the Dempster-Shafer theory of evidence, which we have extended to general Boolean algebras (instead of merely applying it to subsets or propositions). The hypothesis is that the disjunctive nature of DLP matches well the disjunction inherent in the relational representation outlined in Section 3.2/3.3. One aim of the ICONS project is to include uncertainty in data representations (e.g. relations) and, ultimately, after research, to include uncertainty in Datalog representation and use, and in multimedia knowledge representation.

5.3.2 Data Representation using Dempster-Shafer theory
The Dempster-Shafer Theory of Evidence [Guan and Bell 1991] is a well-accepted basis for reasoning under uncertainty. It has been applied to reasoning using both uncertain rules and uncertain evidence.
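Since the theory is used for reasoning with uncertain evidence, its central operation, Dempster's rule of combination, is worth stating concretely. For two mass functions m1 and m2 over the same frame, the combined mass of a non-empty set A is the sum of m1(B)·m2(C) over all B ∩ C = A, normalised by 1 − K, where K is the total mass falling on empty intersections. The sketch below represents focal elements as `frozenset`s. The frame and the numbers are invented for illustration, and the extension to general Boolean algebras mentioned above is not covered.

```python
from collections import defaultdict


def combine(m1: dict[frozenset, float], m2: dict[frozenset, float]) -> dict[frozenset, float]:
    """Dempster's rule of combination for two mass functions on the same frame."""
    raw = defaultdict(float)
    conflict = 0.0  # K: mass that would fall on the empty set
    for b, mb in m1.items():
        for c, mc in m2.items():
            a = b & c
            if a:
                raw[a] += mb * mc
            else:
                conflict += mb * mc
    if conflict >= 1.0:
        raise ValueError("totally conflicting evidence cannot be combined")
    return {a: m / (1.0 - conflict) for a, m in raw.items()}


frame = frozenset({"flu", "pneumonia", "cold"})           # invented frame of discernment
m1 = {frozenset({"flu"}): 0.6, frame: 0.4}                # evidence source 1
m2 = {frozenset({"pneumonia"}): 0.5, frame: 0.5}          # evidence source 2
m = combine(m1, m2)
print({tuple(sorted(k)): round(v, 4) for k, v in m.items()})
```

Here K = 0.3, so the surviving masses 0.3, 0.2 and 0.2 are rescaled by 1/0.7.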

A domain (frame of discernment) is a finite set of mutually exclusive and exhaustive values. Let $t$ be a data object, $a_i$ an attribute of $t$, and $D_j$ the domain of $a_i$ ($i$ and $j$ do not have to be equal). An attribute $a_i$ is a mapping from a set of data objects to $D_j \cup \{\bot\}$, where $\bot$ represents an undefined value, and $t.a_i$ represents the mapped value in $D_j \cup \{\bot\}$. The inclusion of $\bot$ in the range of $a_i$ allows us to handle the special case where applying an attribute $a_i$ to a data object does not make sense.

One major feature of the conventional relational database model is that every attribute value is atomic. In order to represent imprecise and uncertain information, we need to modify this feature: instead of a single attribute value, a set of values should be allowed for the representation of imprecise data, and a probability distribution should be allowed for the representation of uncertain data.

Definition 3.1. For any attribute $a_j$ of a data object $t_i$, let $D_k$ denote the domain the attribute maps into, and let $m_{ij}$ represent the mass function for attribute $a_j$ of data object $t_i$. Then the attribute value is

$$t_i.a_j = \{\langle d, m_{ij}(d) \rangle \mid d \subseteq D_k \cup \{\bot\},\ m_{ij}(d) > 0\}.$$

This definition says that a probability distribution over the power set of a domain is allowed in every attribute value (see the example in Figure 7). Note that $|t_i.a_j| > 1$ implies that $t_i.a_j$ is uncertain.
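An attribute value in the sense of Definition 3.1 can be held as a mapping from subsets of the domain to masses. The belief and plausibility of any set of values then follow directly: bel(A) sums the masses of focal elements contained in A, and pls(A) sums those intersecting A. The sketch uses patient 175 from Figure 7, reading δ as the whole disease domain. The domain is assumed to contain only the diseases named in Figure 7, and the undefined value ⊥ is left out for brevity.

```python
# Assumed disease domain D: only the diseases named in Figure 7 (⊥ omitted).
DISEASES = frozenset({"heart disease", "stomach upset", "flu", "pneumonia"})


def is_mass_function(value: dict[frozenset, float], domain: frozenset) -> bool:
    """Definition 3.1: positive masses on subsets of the domain that sum to 1."""
    return (all(m > 0 and d <= domain for d, m in value.items())
            and abs(sum(value.values()) - 1.0) < 1e-9)


def bel(value: dict[frozenset, float], a: frozenset) -> float:
    """Belief: total mass of focal elements contained in A."""
    return sum(m for d, m in value.items() if d <= a)


def pls(value: dict[frozenset, float], a: frozenset) -> float:
    """Plausibility: total mass of focal elements that intersect A."""
    return sum(m for d, m in value.items() if d & a)


# t_175.disease from Figure 7; the mass 0.11 on the whole domain models ignorance.
patient_175 = {frozenset({"flu"}): 0.25, frozenset({"pneumonia"}): 0.64, DISEASES: 0.11}
flu = frozenset({"flu"})
print(is_mass_function(patient_175, DISEASES), bel(patient_175, flu), pls(patient_175, flu))
```

The gap between bel({flu}) = 0.25 and pls({flu}) = 0.36 is exactly the 11% of belief assigned to ignorance.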

Patient # | Disease
006 | Heart disease (.90), Stomach upset (.10)
175 | Flu (0.25), pneumonia (0.64), δ (0.11)*
… | …

* δ represents the full domain of disease; the implication is that 11% of our belief is assigned to ignorance.

Figure 7. Treatment relation.

This mechanism also provides a solution to the traditional problem of handling null values in databases: a null value can be handled naturally as a set. The null value is subdivided into three cases, unknown, inapplicable, and unknown or inapplicable, denoted by the special strings nk, na and nka, respectively. The string nk represents the domain $D$ itself for an attribute; similarly, na and nka represent $\{\bot\}$ and $D \cup \{\bot\}$, respectively. Refer to [Bell, et al, 1996] for details.

5.3.3 Extended relational database model
In the conventional relational model, information is represented by set-theoretic relations, which are subsets of the Cartesian product of a list of domains $D_1 \times D_2 \times \dots \times D_n$. With the data representation of Definition 3.1, which is a probability distribution on the power set of a domain (a mass function), the definition of a relation changes as follows.

Definition 3.2. A relation (or table) $T$ based on $D_1, D_2, \dots, D_n$ is defined as

$$T \subseteq G_1 \times G_2 \times \dots \times G_n \times CL,$$

where $G_i$ is the set of all probability distributions on the power set of the domain $D_i$, and

$$CL = \{[b, p] \mid b, p \in [0, 1],\ b \le p\}.$$

Each $G_i$ corresponds to a domain; each of its elements can be interpreted as a set of pairs, each pair being a focal element and its value under some mass function $m$. In $CL$, a pair of values $[b, p]$ represents the confidence level of each tuple in a relation $T$. CL is also used as a system attribute name included in every relation. Specifically, $b$ and $p$ represent the bel and pls functions, respectively. For example, in the Treatment relation of Figure 7, CL could represent the strength of a doctor's opinion, which might be valued less strongly for a newly qualified practitioner than for an experienced consultant.

It should be noted at this point that the CL (Confidence Level) value is not, in any way, derived from the attribute value uncertainties. It is an independent measure of the strength of the predicate represented by the tuple. In ICONS, uncertainty will be expressed, again, using “special cases” of conventional relations in a standard DBMS, and can be manipulated by supplementary (application) programs.
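One way to express an extended relation as “special cases” of conventional relations is to store tuples with their confidence level [b, p] in one table and the focal elements of each uncertain attribute value in two more. The schema below (table and column names, and the use of SQLite) is a hypothetical sketch. The confidence level attached to patient 175 is invented, since Figure 7 gives none. A single query then plays the role of the supplementary program, computing plausibility.

```python
import sqlite3

# Hypothetical schema: tuples with CL, focal elements, and the members of each focal element.
SCHEMA = """
CREATE TABLE treatment (
    tid     INTEGER PRIMARY KEY,
    patient TEXT NOT NULL,
    cl_bel  REAL NOT NULL,   -- b of the confidence level [b, p]
    cl_pls  REAL NOT NULL,   -- p of the confidence level [b, p]
    CHECK (0 <= cl_bel AND cl_bel <= cl_pls AND cl_pls <= 1)
);
CREATE TABLE focal (         -- one row per focal element d with m(d) > 0
    fid       INTEGER PRIMARY KEY,
    tid       INTEGER NOT NULL REFERENCES treatment(tid),
    attribute TEXT NOT NULL,
    mass      REAL NOT NULL CHECK (mass > 0 AND mass <= 1)
);
CREATE TABLE focal_member (  -- the domain values making up each focal element
    fid  INTEGER NOT NULL REFERENCES focal(fid),
    item TEXT NOT NULL
);
"""

DOMAIN = ("heart disease", "stomach upset", "flu", "pneumonia")  # assumed disease domain

PLAUSIBILITY = """
SELECT COALESCE(SUM(f.mass), 0)
FROM focal AS f JOIN treatment AS t ON t.tid = f.tid
WHERE t.patient = ? AND f.attribute = 'disease'
  AND EXISTS (SELECT 1 FROM focal_member AS m WHERE m.fid = f.fid AND m.item = ?)
"""


def load(conn, tid, patient, cl, masses):
    """Store one tuple of the Treatment relation; masses maps value tuples to masses."""
    conn.execute("INSERT INTO treatment VALUES (?, ?, ?, ?)", (tid, patient, *cl))
    for values, mass in masses.items():
        fid = conn.execute("INSERT INTO focal (tid, attribute, mass) VALUES (?, 'disease', ?)",
                           (tid, mass)).lastrowid
        conn.executemany("INSERT INTO focal_member VALUES (?, ?)", [(fid, v) for v in values])


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Patient 175 from Figure 7; the confidence level [0.7, 0.9] is invented for the example.
load(conn, 1, "175", (0.7, 0.9), {("flu",): 0.25, ("pneumonia",): 0.64, DOMAIN: 0.11})
print(conn.execute(PLAUSIBILITY, ("175", "flu")).fetchone()[0])  # pls({flu})
```

The `CHECK` constraint enforces b ≤ p from Definition 3.2 inside the DBMS, while mass-function semantics stay in application code.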

5.3.4 Hyperrelations used for representing mined knowledge
Hyperrelations generalise the database concept of relation and are particularly useful for representing rules derived from data mining exercises. There exists a semilattice structure (with a “more inclusive / less inclusive” ordering) on the set of all hypertuples of a domain, where hypertuples generalise traditional tuples from value-based to set-based.

Hyperrelations can represent rules just as decision trees can. We hypothesise that hyperrelations can also represent semantic nets, and this will be investigated in the ICONS research stream.

5.3.5 Hyperrelations as knowledge representation
The semilattice structure of hypertuples can be used as a basis for a hypothesis space. We take a hypothesis to be a hyperrelation, i.e., a set of hypertuples. A hyperrelation can be interpreted as a disjunction of conjunctions of disjunctions of attribute-value pairs. Such a hypothesis space is much more expressive than conjunctions of attribute-value pairs or disjunctions of conjunctions of attribute-value pairs. For a dataset there is a large number of hypertuples which are consistent with the data, some of which can be merged (through the semilattice operation) to form a different consistent hypertuple. By definition, each field in a hypertuple is a set of values. For example, the following table is a hyperrelation where, in the first row, the symptom field consists of the two alternative values “sore throat” or “high temperature”.

Symptom | Disease
Sore throat ∨ high temperature | Flu
High blood pressure ∨ high cholesterol | Heart disease
… | …

Figure 8. A hyperrelation.

In ICONS, we propose to focus on those hypertuples which are consistent with the given data and cannot be merged further; they are said to be maximal. The version space is defined as the set of all these hypertuples, which is clearly a subset of the semilattice. An algorithm exists which is able to construct the version space. Implementation for the ICONS system will include the use of conventional relational systems to represent hyperrelations, again as “special cases”. These represent knowledge mined from databases.
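The semilattice operation and the consistency condition can be sketched directly. The join of two hypertuples is their field-wise union, and a hypertuple labelled with a class is consistent if every data tuple it covers has that class. The greedy procedure below merges same-class hypertuples while the result stays consistent. It is a simplification, not the version-space construction algorithm referred to above, since it returns one set of pairwise unmergeable hypertuples rather than all maximal ones. The data reproduces the hyperrelation of Figure 8.

```python
from itertools import combinations


def singleton(row: dict, label: str):
    """A data tuple viewed as a hypertuple: every field holds exactly one value."""
    return ({a: frozenset([v]) for a, v in row.items()}, label)


def join(f1: dict, f2: dict) -> dict:
    """Semilattice operation: field-wise union of the value sets."""
    return {a: f1[a] | f2[a] for a in f1}


def covers(fields: dict, row: dict) -> bool:
    return all(row[a] in values for a, values in fields.items())


def consistent(fields: dict, label: str, data) -> bool:
    """No data tuple covered by the hypertuple carries a different class label."""
    return all(lab == label for row, lab in data if covers(fields, row))


def greedy_merge(data):
    """Merge same-class hypertuples while the merged hypertuple stays consistent."""
    hyper = [singleton(row, lab) for row, lab in data]
    merged = True
    while merged:
        merged = False
        for i, j in combinations(range(len(hyper)), 2):
            (fi, li), (fj, lj) = hyper[i], hyper[j]
            if li == lj and consistent(join(fi, fj), li, data):
                hyper = [h for k, h in enumerate(hyper) if k not in (i, j)]
                hyper.append((join(fi, fj), li))
                merged = True
                break  # restart the scan on the updated list
    return hyper


data = [
    ({"symptom": "sore throat"}, "flu"),
    ({"symptom": "high temperature"}, "flu"),
    ({"symptom": "high blood pressure"}, "heart disease"),
    ({"symptom": "high cholesterol"}, "heart disease"),
]
for fields, label in greedy_merge(data):
    print(" OR ".join(sorted(fields["symptom"])), "->", label)
```

Each resulting hypertuple maps directly onto an ordinary table row whose fields hold value sets, which is one way to store hyperrelations as “special cases” of relations.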

5.3.6 Metadata

Additional expressiveness for data content is supplied by relationally specified metadata. We can store such useful information in relational format as a series of tables; e.g., in ADDSIA / MISSION [McClean 2002, McClean 2000] we used categorical tables, numerical tables, note tables, correspondence tables, etc. to tackle the heterogeneity inherent in multiple data sources. Each table can be represented as a conventional relation, and a selection from these table types will be available in ICONS.

Metadata is often described as “data about data”. It has increasingly become recognised over the last few decades that such metadata must be encoded alongside data in databases so that it may be used in both a passive and an active role. We consider metadata as providing contextual and operational knowledge about the data in a broad sense, and widen the scope to cover the encoding of general knowledge.

[Grossman, 1996] defines metadata as formatted, structured description elements. Metadata may be used for (1) documentation (passive) and (2) automated support (active). Metadata may contain relevant contextual information concerning issues of comparability or elaboration, and even interoperability.

More generally, we categorise metadata into the following roles (again using database examples for illustration):
1. for data processing, e.g., schema information;
2. for data access, e.g., locational information;
3. for data harmonisation and integration in distributed, heterogeneous environments, e.g., schema matching;
4. providing rules concerning data integrity constraints;
5. providing contextual information to aid interpretation;
6. providing information on quality;
7. providing information on costs.

Agents collaborate within an agency, using metadata concerning processing, access, fusion, rules and context. These can be regarded as forms of knowledge that are utilised by the various agents. Agents compete using metadata on quality and costs. Thus rival agencies may offer higher-quality or lower-cost services to the user.

Time representation issues are an important ingredient of the knowledge representation schemes of a wide class of content repositories. Current results in the area of temporal aspects of knowledge management are presented in [Dyreson 2000, Gregersen 1999]. A pragmatic representation of time will be included in ICONS.

5.3.7 Sharing data

Modelling primitives and their semantics together constitute a key aspect of an ontology-based information/knowledge exchange language. The syntax of such a language must, of course, be formulated using existing web standards for information representation.

The knowledge representation approach based on introducing tags in HTML and/or XML objects to represent the content semantics has been presented in [Dieng 2000, Ginsburg 1999, Shim 2000]. Prototype system solutions based on this approach have been presented in [Corby 1999, Raborijaona 2000]. The disadvantages of the tag-based knowledge representation approach have been discussed in [Martin 2000].

In ICONS, XML will be used as a serial syntax definition language for ontology-based information exchange. RDF and RDFS can serve the same purpose (encoding, exchange and reuse of metadata). The Resource Description Framework (RDF) is the emerging semantic interoperability and knowledge management standard for web information resources. The RDF standard has been exhaustively discussed in [Decker 2000a, Decker 2000b, Lassila 2000]. It provides a means of adding semantics to a document without making assumptions about its structure. RDF has the advantage of providing a standard syntax for writing ontologies and a standard set of modelling primitives.

RDF schemas (RDFS) provide a basic type schema for RDF, in which object-oriented concepts such as objects, classes and properties can be described. Since RDF and RDFS thus combine a standardised syntax for writing ontologies with a standard set of object-oriented modelling primitives, ICONS may offer two syntactic variants: one based on XML schemas and one based on RDF schemas.

5.4 Disjunctive Logic Programming

Disjunctive Logic Programming (DLP) is nowadays widely recognized as a valuable tool for knowledge representation and common-sense reasoning. DLP is, just like Datalog [Ullman 1989], a deductive database language, but, as explained below, it extends Datalog's expressivity by allowing disjunction in the head of rules. In this way, the conclusion of an implication can be indefinite, which creates different possible models of reality, as shown in the examples below. In general, according to the stable model semantics, a DLP program may have several alternative models (possibly none), each corresponding to a possible view of reality. In [Eiter et al. 1997f] it has been shown that, under the stable model semantics, DLP has a very high expressive power: it captures the complexity class $\Sigma_2^P$. This is strictly higher than the expressive power of Datalog with (non-stratified) negation, unless the polynomial hierarchy collapses; hence disjunction cannot, in general, be emulated through (non-stratified) negative rules.

The use of both disjunction and constraints makes DLP a language well suited to represent and solve a wide class of knowledge-based problems, including deductive database queries, incomplete knowledge, classical optimisation problems, planning, abduction, etc., in a very simple and natural way.

For the ICONS project we have selected the DLV system as an implementation for DLP.

In the following, we will briefly discuss the characteristics of knowledge representation with DLP, and the kind of problems that it is suited for. Considering the advantages and disadvantages of this approach, we will propose a way of incorporating the DLV system into the ICONS architecture, and we will address the questions and research issues that arise from this choice.

Syntax and semantics

In this section, we provide a formal definition of the syntax of Disjunctive Logic Programming (DLP). For further background, see [Lobo et al. 1992, Eiter et al. 1997f, Gelfond and Lifschitz 1991]. We also provide a short informal description of the semantics; for the formal definition, see [Gelfond and Lifschitz 1991].

The main notion in DLP is the rule, which is built from variables, constants, atoms, and literals as follows. An atom is an expression $p(t_1, \ldots, t_n)$, where $p$ is a predicate of arity $n$ and $t_1, \ldots, t_n$ are constants or variables. For example, supervisor(barbara, george) is an atom consisting of the 2-ary predicate supervisor and the two constants barbara and george. Similarly, one can use variables X, Y to form atoms like supervisor(X, george) and supervisor(X, Y). Strings starting with lower-case letters denote constants and predicates, while strings starting with upper-case letters denote variables. Such atoms or their negated versions (as in ¬supervisor(X, Y)) are called literals. Finally, a rule is a formula of the form

$$a_1 \vee \cdots \vee a_n \;\mathrel{:\!-}\; b_1, \ldots, b_k,\ \mathrm{not}\ b_{k+1}, \ldots, \mathrm{not}\ b_m$$

where $a_1, \ldots, a_n, b_1, \ldots, b_m$ are literals and $n \ge 0$, $m \ge k \ge 0$. This rule can be read as "the disjunction of $a_1, \ldots, a_n$ is implied by the conjunction of $b_1, \ldots, b_k$ and not $b_{k+1}$, …, not $b_m$". Note that the D in DLP stands for the possible disjunction (i.e., logical "or", written v in the rules below) in the rules. Furthermore, we call the disjunction $a_1 \vee \cdots \vee a_n$ the head of the rule and the conjunction $b_1, \ldots, b_k, \mathrm{not}\ b_{k+1}, \ldots, \mathrm{not}\ b_m$ the body.



For example, the rule employee(X) :- supervisor(X, Y) can be read as "if X is a supervisor of some Y, then X is an employee", and the rule female(X) :- person(X), not male(X) can be read as "if X is a person and not male, then X is female". Finally, as an example with a disjunction and with an empty body, the rule female(X) v male(X) can be read as "X is male or female". Note that, when the body is empty, we leave out the implication sign ":-" at the end.

A disjunctive datalog program is a finite set of such rules. From the definition it can be seen that many kinds of rules are possible, each representing a different kind of knowledge. When the body is empty and the rule contains no variables, we call it a fact. Facts represent the extensional database, and there exists a correspondence between, e.g., rows of a relational table and DLP facts. The rules person(barbara) and supervisor(barbara, george) are examples of facts. In the ICONS project, the translation of relational and other external database data into DLP facts is of great importance.
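A minimal sketch of the row-to-fact correspondence follows. It assumes, for illustration only, that an SQLite in-memory database stands in for the repository, that table names are used directly as predicate names, and that stored values are lower-case strings so that they read as DLP constants. `table_to_facts` is a hypothetical helper, not part of the DLV tools.

```python
import sqlite3
from typing import List


def table_to_facts(conn: sqlite3.Connection, table: str) -> List[str]:
    """Emit one DLP fact per row, using the table name as the predicate name.

    The table name is interpolated into the SQL text, so it must come from a
    trusted source (e.g., the program's own predicate names).
    """
    rows = conn.execute(f"SELECT * FROM {table}").fetchall()
    return [f"{table}({', '.join(str(v) for v in row)})." for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE supervisor (boss TEXT, subordinate TEXT)")
conn.executemany("INSERT INTO supervisor VALUES (?, ?)",
                 [("barbara", "george"), ("george", "tony")])
for fact in table_to_facts(conn, "supervisor"):
    print(fact)
```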

When the head of a rule is empty, the rule is called an (integrity) constraint, as it expresses a condition that should not occur in the model of reality. For example, the constraint :- male(X), female(X) expresses that no X can be both male and female. Integrity constraints play an important role in database systems.

When the head of a rule is either empty or contains only one literal, the rule is called normal. Normal rules express definite knowledge of implications, where instantiations of the conditions in the body lead only to either a contradiction (in the case of a constraint) or an instantiation of one literal, in other words, to a fact. Consider the example rule employee(X) :- supervisor(X, Y), which, using the knowledge supervisor(barbara, george), leads to the sure conclusion employee(barbara), which is a fact.

A rule that is not normal is called disjunctive, and it expresses indefinite knowledge. The rule boss(X, Y) v boss(Y, X) v equal_worker(X, Y) :- same_team(X, Y) expresses that if X and Y are in the same team, then X is Y's boss, or vice versa, or they are equal co-workers. This means that, given the knowledge same_team(tony, beth), we cannot conclude a new fact; we have only the so-called incomplete knowledge that boss(tony, beth) or boss(beth, tony) or equal_worker(tony, beth).
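The rule categories above depend only on the shape of a rule. The following sketch encodes the definitions given in this section, using a hypothetical `Rule` record with a head list and separate lists for the positive and negated body literals, and detecting variables by the upper-case convention introduced earlier. As in the text, the categories overlap: every fact and every constraint is also normal.

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class Rule:
    head: List[str]                                # disjuncts a1 ... an
    pos: List[str] = field(default_factory=list)   # b1 ... bk
    neg: List[str] = field(default_factory=list)   # atoms under "not": bk+1 ... bm


def has_variables(atom: str) -> bool:
    # A variable is an argument starting with an upper-case letter.
    return re.search(r"[(,]\s*[A-Z]", atom) is not None


def is_constraint(r: Rule) -> bool:
    return not r.head


def is_normal(r: Rule) -> bool:
    return len(r.head) <= 1


def is_disjunctive(r: Rule) -> bool:
    return not is_normal(r)


def is_fact(r: Rule) -> bool:
    # Assumption: a fact has exactly one head atom, an empty body and no variables.
    return (len(r.head) == 1 and not r.pos and not r.neg
            and not has_variables(r.head[0]))


examples = {
    "fact": Rule(["supervisor(barbara, george)"]),
    "constraint": Rule([], pos=["male(X)", "female(X)"]),
    "normal": Rule(["employee(X)"], pos=["supervisor(X, Y)"]),
    "disjunctive": Rule(["boss(X, Y)", "boss(Y, X)", "equal_worker(X, Y)"],
                        pos=["same_team(X, Y)"]),
}
for name, r in examples.items():
    print(name, is_fact(r), is_constraint(r), is_normal(r), is_disjunctive(r))
```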

The DLV system gives, for every DLP program (which includes the facts, i.e., the data), zero or more possible models of reality, called answer sets. Informally, a model can be seen as a consistent set of facts, which are interpreted to be true in that model. An answer set of a DLP program is built up from the constants which appear in the program, and it is closed under that program, that is, applying program rules to the facts in the set only leads to facts that are already in that set. Furthermore, answer sets of a DLP program are minimal with respect to set inclusion: no proper subset of an answer set is also closed under the program. (Note that these descriptions are informal: more precise definitions can be found in [Gelfond and Lifschitz 1991].)

As an example, the program consisting of the two rules female(X) v male(X) and person(beth) has only two answer sets: the model {person(beth), female(beth)} and the model {person(beth), male(beth)}. Note that there are no answer sets introducing new constants (like {person(beth), female(beth), male(tony)}), because tony has not been mentioned in the program. Also note that the model {person(beth), female(beth), male(beth)} is not an answer set, even though it is consistent and closed under the program (recall that the disjunction female(X) v male(X) is not exclusive!), because it is a superset of one (in this case both) of the answer sets.
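The informal description can be made concrete with a brute-force sketch for small variable-free (ground) programs, based on the Gelfond-Lifschitz reduct: a candidate set M is an answer set if M is a model of the program obtained by deleting every rule with a negated atom in M and dropping the remaining negated literals, and no proper subset of M is a model of that reduct. The program is grounded by hand over its only constant, beth; the rule encoding and helper names are assumptions for illustration, and the exhaustive search is exponential, unlike the DLV system.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, Iterator, List, Set, Tuple

# A ground rule: (head atoms, positive body atoms, negated body atoms).
GroundRule = Tuple[FrozenSet[str], FrozenSet[str], FrozenSet[str]]


def rule(head: Iterable[str] = (), pos: Iterable[str] = (),
         neg: Iterable[str] = ()) -> GroundRule:
    return frozenset(head), frozenset(pos), frozenset(neg)


def is_model(m: FrozenSet[str], program: List[GroundRule]) -> bool:
    """Every rule whose body holds in m has at least one head atom in m."""
    return all(h & m for h, p, n in program if p <= m and not (n & m))


def reduct(program: List[GroundRule], m: FrozenSet[str]) -> List[GroundRule]:
    """Gelfond-Lifschitz reduct: drop rules blocked by m, then drop negated literals."""
    return [(h, p, frozenset()) for h, p, n in program if not (n & m)]


def subsets(atoms: List[str]) -> Iterator[FrozenSet[str]]:
    for k in range(len(atoms) + 1):
        for combo in combinations(atoms, k):
            yield frozenset(combo)


def answer_sets(program: List[GroundRule]) -> List[Set[str]]:
    atoms = sorted(set().union(*(h | p | n for h, p, n in program)))
    found = []
    for m in subsets(atoms):
        red = reduct(program, m)
        # m must be a model of its reduct, and no proper subset of m may be one.
        if is_model(m, red) and not any(
                is_model(s, red) for s in subsets(sorted(m)) if s < m):
            found.append(set(m))
    return found


# female(X) v male(X).  person(beth).  (grounded over the constant beth)
program = [
    rule(head=["female(beth)", "male(beth)"]),
    rule(head=["person(beth)"]),
]
for a in answer_sets(program):
    print(sorted(a))
```

The superset {person(beth), female(beth), male(beth)} is rejected by the minimality check, matching the discussion above.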

Applications

Note that the language of DLP programs is declarative: there is no need to provide the DLV system with a procedure for finding the matching answer sets; it suffices to tell the system which rules it should obey. Combined with DLP's high expressivity, this allows for a human-understandable description of complex problems. Compared to standard query languages like SQL, which (without recursive extensions) handle only so-called local queries, DLP offers a much more powerful mechanism, with which it is possible to answer questions about the structure of relations, such as reachability and 3-colorability, examples of which we discuss below.

In this section we show how Disjunctive Logic Programming (DLP) allows us to represent and solve a large variety of problems in a simple and highly declarative way. In particular, we concentrate on the following three classes of problems: deductive database queries, incomplete knowledge, and search problems.

Deductive database

A typical deductive database query that is inexpressible in (non-recursive) SQL is the transitive closure of a (binary) relation. As an example, consider the classical reachability problem: given a directed graph G, determine all pairs of nodes (a, b) of G such that there is a (directed) path from a to b. When we use edge(a,b) to denote the fact that there is an edge from node a to node b, the encoding of this problem is the following recursive program:

edge(a,b).
edge(a,c).
...
reach(X,Y) :- edge(X,Y).
reach(X,Y) :- edge(X,Z), reach(Z,Y).

In other words, one can reach Y from X if there is an edge from X to Y, or if there is an edge from X to another node Z from which Y can be reached. Finding ancestors in a family relation defined by the predicate parent is an example of the use of the reachability query.
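The recursive program denotes a least fixpoint: starting from the edge facts, the second rule is applied until no new reach facts appear. The sketch below shows semi-naive bottom-up evaluation of the two rules in Python, as an illustration of the fixpoint only, not of DLV's internal algorithm; the sample edges are hypothetical.

```python
from typing import Set, Tuple

Edge = Tuple[str, str]


def reach(edges: Set[Edge]) -> Set[Edge]:
    # reach(X,Y) :- edge(X,Y).
    result = set(edges)
    delta = set(edges)          # reach facts derived in the previous round
    while delta:
        # reach(X,Y) :- edge(X,Z), reach(Z,Y).
        # Semi-naive step: only join edges with reach facts that are new.
        derived = {(x, y) for (x, z) in edges for (z2, y) in delta if z == z2}
        delta = derived - result
        result |= delta
    return result


edges = {("a", "b"), ("a", "c"), ("c", "d")}
print(sorted(reach(edges)))
```

Because only new facts are re-joined, the loop also terminates on cyclic graphs.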

Incomplete Knowledge

Besides database queries, DLP is suitable for representing common-sense reasoning. The following is a simple example of how DLP enables the treatment of incomplete knowledge.

Consider this situation:

• we saw that Michael has a broken arm, but we do not remember which one;
• we know that Michael writes with his left hand, so Michael is able to write if his left arm is not broken.

The problem is to decide whether Michael can or cannot write. Because of the uncertainty due to our incomplete knowledge about Michael’s arms, we cannot answer definitively. However, we can trace two different scenarios:

• “Michael’s left arm is broken, so he cannot write”.
• “Michael’s right arm is broken, so he can write”.

This situation can briefly be represented by the following disjunctive logic program:

P_Michael:
r1: left_arm_broken v right_arm_broken.
r2: can_write :- not left_arm_broken.

What is represented by P_Michael is very intuitive. It has two answer sets: M1 = {left_arm_broken}, in which right_arm_broken and can_write are false, and M2 = {right_arm_broken, can_write}, in which left_arm_broken is false. M1 and M2 are the two possible meanings of the problem, and they match the scenarios we wanted to represent.

Note that it is possible to represent this situation even with a normal logic program (i.e., without disjunction), simply by replacing rule r1 with the two rules

r1′: left_arm_broken :- not right_arm_broken.
r1″: right_arm_broken :- not left_arm_broken.

It is easy to see that this second variant (which uses unstratified negation instead of disjunction) makes the program less intuitive.
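That both encodings have the same two answer sets can be checked with a compact brute-force sketch (exponential and purely illustrative, not how DLV computes answer sets). A candidate M is accepted if it is a minimal model of the Gelfond-Lifschitz reduct, i.e., of the rules whose negated atoms are all false in M, with those negated literals removed. The helper names and the triple encoding of rules are hypothetical.

```python
from itertools import chain, combinations


def r(head=(), pos=(), neg=()):
    """A ground rule as a (head, positive body, negated body) triple."""
    return frozenset(head), frozenset(pos), frozenset(neg)


def is_model(m, prog):
    return all(h & m for h, p, n in prog if p <= m and not n & m)


def answer_sets(prog):
    atoms = sorted(set(chain.from_iterable(h | p | n for h, p, n in prog)))
    subsets = [frozenset(c) for k in range(len(atoms) + 1)
               for c in combinations(atoms, k)]
    found = []
    for m in subsets:
        red = [(h, p, frozenset()) for h, p, n in prog if not n & m]  # reduct
        if is_model(m, red) and not any(s < m and is_model(s, red) for s in subsets):
            found.append(m)
    return found


disjunctive = [                                   # P_Michael
    r(head=["left_arm_broken", "right_arm_broken"]),
    r(head=["can_write"], neg=["left_arm_broken"]),
]
normal = [                                        # r1 replaced by r1' and r1''
    r(head=["left_arm_broken"], neg=["right_arm_broken"]),
    r(head=["right_arm_broken"], neg=["left_arm_broken"]),
    r(head=["can_write"], neg=["left_arm_broken"]),
]
print([sorted(m) for m in answer_sets(disjunctive)])
print([sorted(m) for m in answer_sets(normal)])
```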

Search Problems

Another class of problems that can naturally be represented and solved by DLP is that of search problems. To this end, we show how the Guess&Check paradigm is a suitable technique which supports a highly declarative problem representation.

The power of disjunctive rules allows one to express, in a uniform way, problems which are even more complex than NP, using a fixed program over varying instances of the problem (i.e., a fixed program containing variables that works on any possible input). Given a set $F_I$ of facts that specify an instance $I$ of some problem $P$, a Guess&Check program $\Pi$ for $P$ consists of the following two parts:

Guessing part: The guessing part $G \subseteq \Pi$ of the program defines the search space, such that the answer sets of $G \cup F_I$ represent “solution candidates” for $I$.
Checking part: The checking part $C \subseteq \Pi$ of the program tests whether a solution candidate is in fact a solution, such that the answer sets of $G \cup C \cup F_I$ represent the solutions for the problem instance $I$.



In general, we may allow both G and C to be arbitrary collections of rules in the program, and which kinds of rules are needed to realize these parts (in particular, the checking part) may depend on the complexity of the problem.

Without imposing restrictions on which rules G and C may contain, in the extreme case we might set G to the full program and let C be empty, i.e., all checking is moved to the guessing part such that solution candidates are always solutions. This is certainly not intended. However, in general the generation of the search space may be guarded by some rules, and such rules might be considered more appropriately placed in the guessing part than in the checking part. We do not pursue this issue any further here, and thus also refrain from giving a formal definition of how to separate a program into a guessing and a checking part.

In order to solve a number of problems, however, it is possible to design a natural Guess&Check program in which the two parts are clearly identifiable and have a simple structure:

• The guessing part G consists of a disjunctive rule which “guesses” a solution candidate S.
• The checking part C consists of integrity constraints which check the admissibility of S, possibly using auxiliary predicates which are defined by normal stratified¹ rules.

Thus, the disjunctive rule defines the search space², in which rule applications are branching points, while the integrity constraints prune illegal branches.

As an example which matches this scheme, let us consider the well-known 3-Colorability problem.

3COL: Given a graph G=(V,E) in the input, assign each node one of three colors (say, red, green, or blue) such that adjacent nodes always have different colors.

3-Colorability is a classical NP-complete problem. Assuming that the set of nodes V and the set of edges E are specified by means of predicates node (which is unary) and edge (binary), respectively, it can be encoded by the following Guess&Check program:

r: col(X,r) v col(X,g) v col(X,b) :- node(X).     % Guess
c: :- edge(X,Y), col(X,C), col(Y,C).               % Check

The rule r nondeterministically guesses color assignments for the nodes in the graph, and the constraint c checks that these choices are legal, i.e., that no two nodes which are connected by an edge have the same color³.

More precisely, let us suppose that the nodes and edges of the graph G are represented by a set F of facts with predicates node and edge. Then the (“guessing”) rule r above states that every node is colored either red or green or blue, while the (“checking”) constraint c forbids the assignment of the same color to two adjacent nodes. The answer sets of F ∪ {r} are all possible ways of coloring the graph. Note that the minimality of answer sets guarantees that every node has only one color.

If an answer set of F ∪ {r} satisfies the constraint c, then it represents an admissible 3-coloring of the graph. There is in fact a one-to-one correspondence between the solutions of the 3-coloring problem and the answer sets of F ∪ {r, c}. The graph is thus 3-colorable if and only if F ∪ {r, c} has some answer set, and each of the answer sets of F ∪ {r, c} represents a (different) legal 3-coloring of G.
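The Guess&Check reading of the 3COL program can be mimicked procedurally: the guess enumerates every assignment of exactly one color per node (the candidates corresponding to the answer sets of F ∪ {r}), and the check discards assignments that violate constraint c. This brute-force sketch only illustrates that correspondence; DLV does not enumerate candidates this way, and the example graphs are hypothetical.

```python
from itertools import product
from typing import Dict, List, Set, Tuple

COLORS = ("r", "g", "b")


def three_colorings(nodes: List[str], edges: Set[Tuple[str, str]]) -> List[Dict[str, str]]:
    solutions = []
    # Guess: each node gets exactly one color (in DLP, minimality ensures this).
    for choice in product(COLORS, repeat=len(nodes)):
        col = dict(zip(nodes, choice))
        # Check: ":- edge(X,Y), col(X,C), col(Y,C)." discards candidates
        # in which two adjacent nodes share a color.
        if all(col[x] != col[y] for x, y in edges):
            solutions.append(col)
    return solutions


triangle = three_colorings(["1", "2", "3"], {("1", "2"), ("2", "3"), ("1", "3")})
print(len(triangle))
k4 = three_colorings(["1", "2", "3", "4"],
                     {(x, y) for x in "1234" for y in "1234" if x < y})
print(len(k4))  # the complete graph on four nodes is not 3-colorable
```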

The problem 3COL is a popular example of an NP-complete problem. We next show that even a harder problem, located at the second level of the polynomial hierarchy, can be encoded in a straightforward way in DLP. To this end, we consider the problem Strategic Companies.

¹ For a definition of stratification, see [Apt et al. 1988].
² In some cases it would be possible to replace the disjunctive guessing rule by rules with unstratified negation. However, this is not possible in general. Disjunctive rules also have the advantage of being more compact and usually also more natural.
³ In this example, we assume that G contains no loops, i.e., edges from a node to itself. Such loops can easily be handled by adding X<>Y to the constraint.



STRATCOMP: Given the collection C of companies owned by a holding, together with information about the products each company produces and about company control, compute the set of the strategic companies in the holding.

Let us recall from [Cadoli et al. 1997] what a “strategic company” is in this context. Each company in the holding produces a collection of goods, such that the holding produces a collection of goods G which consists of all goods produced by its companies. Company control information models the fact that a set of companies D ⊆ C may jointly have control (e.g., by majority in shares) over another company c ∈ C. (Companies not in C, which we do not model here, might have shares in companies as well.) The company control information in STRATCOMP lists records of such control information in terms of “controlling sets” D for “controlled” companies c. Note that, in general, a company might have more than one controlling set, and only non-redundant controlling sets (i.e., sets of which no proper subset is a controlling set) are recorded.

Now, some companies should be sold by the holding, while the following two conditions have to be maintained:
1. After the transaction, the remaining set of companies C’ ⊆ C still allows one to produce all goods.
2. No company is sold which would still be controlled by the holding after the transaction, i.e., if D is a controlling set for c ∈ C and D ⊆ C’ holds, then c ∈ C’ also holds.

A set C’ ⊆ C is called a strategic set if it satisfies both (1) and (2) and is minimal with respect to inclusion, that is, no proper subset of C’ satisfies both (1) and (2). In general, the strategic set is not unique, and multiple solutions for C’ exist. A company c ∈ C is called strategic if it belongs to at least one of these strategic sets.

Computing the set of all strategic companies is relevant when companies are to be sold, since selling any company that is not strategic does not lead to a violation of either condition (1) or (2). This problem is $\Sigma_2^P$-hard in general [Cadoli et al. 1997]; reformulated as a decision problem (“Given a particular company c in the input, is c strategic?”), it is $\Sigma_2^P$-complete. To our knowledge, it is the only KR problem from the business domain of this complexity that has been considered so far.

We next present a program which solves the complex problem STRATCOMP in a surprisingly elegant way with a few rules:

r: strat(Y) v strat(Z) :- produced_by(X,Y,Z).                          % Guess
s: strat(W) :- controlled_by(W,X,Y,Z), strat(X), strat(Y), strat(Z).   % Constraint

Here strat(X) means that X is strategic, produced_by(X,Y,Z) that product X is produced by companies Y and Z, and controlled_by(W,X,Y,Z) that W is jointly controlled by X, Y and Z. We assume that a set of facts for company, controlled_by and produced_by is part of the input, and we have adopted the setting from [Cadoli et al. 1997], where each product is produced by at most two companies and each company is jointly controlled by at most three other companies (in this case, the problem is still $\Sigma_2^P$-hard).

The answer sets of the program together with the encoded facts correspond one-to-one to the strategic sets of the holding. Thus, the set of all strategic companies is given by the set of all companies c for which the fact strat(c) is true under brave reasoning, i.e., in at least one answer set.
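For small inputs, the definition of strategic sets can be checked directly, which is one way to validate the DLP encoding on test data. The sketch below enumerates all subsets of companies, keeps those satisfying conditions (1) and (2), retains the inclusion-minimal ones (the strategic sets), and returns their union, which corresponds to the brave-reasoning reading of strat. The representation (a product-to-producers map and a list of controlled company / controlling set pairs) and the example data are hypothetical, and the search is exponential.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple


def strategic_companies(
    companies: Set[str],
    produced_by: Dict[str, Set[str]],                  # product -> producing companies
    controlled_by: List[Tuple[str, FrozenSet[str]]],   # (controlled company c, controlling set D)
) -> Set[str]:
    def ok(kept: FrozenSet[str]) -> bool:
        # (1) every product is still produced by some kept company
        produces_all = all(prods & kept for prods in produced_by.values())
        # (2) if a controlling set D is kept, the company it controls is kept too
        closed = all(c in kept for c, d in controlled_by if d <= kept)
        return produces_all and closed

    candidates = [frozenset(s) for k in range(len(companies) + 1)
                  for s in combinations(sorted(companies), k)]
    feasible = [s for s in candidates if ok(s)]
    # Strategic sets: feasible sets with no feasible proper subset.
    strategic_sets = [s for s in feasible if not any(t < s for t in feasible)]
    # A company is strategic if it belongs to at least one strategic set.
    return set().union(*strategic_sets)


companies = {"alpha", "beta", "gamma"}
produced_by = {"pasta": {"alpha", "beta"}, "tomato": {"alpha"}}
controlled_by = [("gamma", frozenset({"alpha"}))]
print(sorted(strategic_companies(companies, produced_by, controlled_by)))
```

Here beta is not strategic and could be sold, while gamma becomes strategic only because alpha controls it.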

In fact, it is possible to encode the same problem with the Guess&Check paradigm, in the same shape as the previous example. For details, see [Eiter et al. 2000].

Strategic Companies is a good example of the kind of complex knowledge a user of a knowledge management system may want to extract from the repository. Along the same lines, one could think of personnel allocation and management problems, which could be solved by similarly straightforward programs. Further examples can be found in [Eiter et al. 1997f].

DLV system in the ICONS architecture

As is clear from the above examples, the enhancement of a knowledge management system with DLP techniques is a major innovation, as a number of complex problems can be solved that are not solvable (or even expressible) within existing traditional systems.

Data from the repository can be used within the DLV system by translating relational data into data as modelled by DLP (that is, facts). The DLV system already provides tools for extracting the needed data through SQL queries. Still, the incorporation of DLP techniques within the ICONS project is not entirely straightforward, as some complexity issues have to be taken into account. Tests of the DLV system show that its strength lies in solving complex problems on reasonable amounts of data. Because the system does not have its own internal DBMS, it does not deal effectively with larger amounts of data. However, within the ICONS system the amounts of available and accessible data will be large. For that reason, we plan to couple the DLV system with a main memory database system (MMDB), which would handle data management for DLV. To this end, a mapper is to be developed, which selects the needed data from the data repository and stores it in the MMDB before invoking the DLV system.

Research issues are:
- For what kinds of problems is it necessary to speed up DLV processing by pre-selecting data?
- How can the needed data be selected, given a DLP program and, optionally, a query on the answer sets?
- Given a particular program, can we develop a mapper which selects data using the actual query (or the constants in the query, as focal points) as parameters?
- How can we prove that the selected data give the correct answer, i.e., that in all cases they yield the same answer sets as the full data would have?
- Is it possible to considerably decrease the amount of selected data (and hence increase efficiency) without losing correctness, or will we have to trade a small gain in efficiency for a large loss of correctness?
- Is correctness a discrete notion for all problems, or can we think of applications where a scale of correctness would make sense? Think of optimization problems such as the travelling salesperson problem, where we may not be interested in the optimal solution (which would take a lot of time), but rather in a fast solution which is, say, at least 90% optimal. Are there straightforward ways to decrease the selected data considerably while retaining a level of correctness (or optimization) that is "good enough"?
- Would it be possible to let the user choose a level of correctness (e.g., 100% or 90%) with a semantics that is easy to understand, also for a user who is not a specialist?

Part of the selection can be done in a quite straightforward way, by selecting only the relational tables which are mentioned in the program, or by calculating the maximum needed "distance" from focal points for relations that are defined in a non-recursive manner. Another part is addressed by ongoing research, such as the research on so-called magic sets.
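The first selection step, loading only the relations a program actually uses, can be refined by following predicate dependencies backwards from the query predicate. The sketch below is a minimal illustration under stated assumptions: rules are given as (head predicates, body predicates) pairs already extracted from the program text, and the stored tables are known by name. It conservatively treats the bodies of integrity constraints as always relevant (they can discard any answer set) and treats all predicates in a disjunctive head as depending on each other. It ignores constants and focal points, does not implement magic sets, and does not settle the correctness questions listed above (e.g., an unrelated part of the program without answer sets).

```python
from typing import List, Set, Tuple

# (predicates in the head, predicates in the body) for each rule
PredRule = Tuple[Set[str], Set[str]]


def relevant_tables(rules: List[PredRule], query: str, tables: Set[str]) -> Set[str]:
    """Stored tables the query predicate (transitively) depends on."""
    needed: Set[str] = {query}
    for head, body in rules:
        if not head:               # integrity constraint: always relevant
            needed |= body
    frontier = list(needed)
    while frontier:
        pred = frontier.pop()
        for head, body in rules:
            if pred in head:
                # Other disjuncts in the same head compete with pred, so include them.
                for dep in (body | head) - needed:
                    needed.add(dep)
                    frontier.append(dep)
    return needed & tables


reach_rules = [({"reach"}, {"edge"}), ({"reach"}, {"edge", "reach"}),
               ({"employee"}, {"supervisor"})]
print(relevant_tables(reach_rules, "reach", {"edge", "supervisor", "parent"}))
```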

Usage

The integration of the DLV system as described above will allow for many different user applications. As the complexity of the problems that can be solved with DLP makes the language somewhat complicated, it may be difficult for the occasional user to use it to its full power. This is not a problem in itself: in any computer system, different users will use different capabilities of the system. In the ICONS system, less experienced users can still be offered the possibility of querying via DLV by means of available help schemas or pre-defined queries. In this way, we can distinguish the following three ways of accessing the DLV engine, in decreasing order of required familiarity with the system.

First, there will be a direct user interface to DLV, where users can construct their own DLP programs and queries, possibly enhanced with options for keeping track of the individual search history and for sharing often-used programs with others. Secondly, there could be a shared library of expert programs that can serve as schemas to be edited for individual use by experts or other users. Experts could maintain this library, possibly in co-operation with a database expert. Thirdly, there may be often-used queries that are ready for use without knowledge of DLP. Such queries could be implemented at the system installation phase and maintained by local database experts. Examples include regular dependency checks, as in fraud checks or testing, which could be executed at regular times (once a week, overnight), or individual problem instances such as dividing a set of persons into groups subject to several constraints. Such settings can be generalized and made available to people who do not (yet) have much knowledge of DLP.

On the other hand, this sliding scale can also be seen as a natural means of education: after having used the standard queries several times, one may try to edit an expert query. And after having dealt with several expert queries, one could be ready to write his/her own programs.

5.5 Procedural knowledge representation features

As was stated earlier, there are several types of knowledge representation. One of them is procedural knowledge, which defines algorithms describing how to achieve a given goal. In the context of organisations, such algorithms are called business processes. A business process defines what units of work should be performed, when, and by whom, in order to achieve a given goal, that is, to produce a product or to provide a service. Innovative, efficient and flexible business processes help an organisation to be competitive and to play a leading role in the market.



From the point of view of repeatability, there are two types of business processes: repeatable and non-repeatable processes. The former are well-defined mass processes. Management rarely influences their control, and changes to such processes occur seldom and are evolutionary. The latter require a high degree of flexibility and can be well-defined only at a high level of abstraction. They are unique: usually they are executed only once. Changes to such processes occur frequently and can be revolutionary.

Business processes can be supported (partially or fully) by computer automation. One of the most popular and effective tools for supporting business processes is the workflow management system (WFMS). In a WFMS, the automatable part of a business process is represented as a workflow definition. According to the WfMC meta-model defined in [WfMC2001], the main elements of a workflow definition are:
• activities – pieces of work that form logical steps within a workflow process. An activity is performed by one or more workflow participants;
• transitions – points during the execution of a process instance where one activity completes and the thread of control passes to another, which starts. A transition can have a condition, which may be evaluated in order to decide the sequence of activity execution within a workflow process;
• workflow participants – a resource set, resource (specific resource agent), organisational unit (within an organisational model), role (a function of a human within an organisation), human (a WFMS user) or system (an automatic agent) that performs activities;
• control data – data representing the dynamic state of workflow instances and of the WFMS (e.g., workflow definitions);
• audit data – data representing the history of workflow instance execution;
• relevant data – data used for evaluating conditional expressions, for instance those expressing transitions or participant assignments.
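A workflow definition built from these elements can be sketched as a small data structure. The class and field names below are hypothetical and only loosely follow the WfMC terminology; transition conditions are plain Python predicates over relevant data, standing in for the condition expressions of a real WFMS, and the claim-handling process is an invented example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

RelevantData = Dict[str, object]


@dataclass
class Activity:
    name: str
    participants: List[str]        # roles, units, humans or systems performing it


@dataclass
class Transition:
    source: str
    target: str
    # Condition evaluated on relevant data; None means unconditional.
    condition: Optional[Callable[[RelevantData], bool]] = None


@dataclass
class WorkflowDefinition:
    activities: Dict[str, Activity]
    transitions: List[Transition] = field(default_factory=list)

    def next_activities(self, current: str, data: RelevantData) -> List[str]:
        """Activities that receive the thread of control when `current` completes."""
        return [t.target for t in self.transitions
                if t.source == current and (t.condition is None or t.condition(data))]


claim = WorkflowDefinition(
    activities={
        "register": Activity("register", ["clerk"]),
        "approve": Activity("approve", ["manager"]),
        "pay": Activity("pay", ["payment_system"]),
    },
    transitions=[
        Transition("register", "approve", lambda d: d["amount"] > 1000),
        Transition("register", "pay", lambda d: d["amount"] <= 1000),
        Transition("approve", "pay"),
    ],
)
print(claim.next_activities("register", {"amount": 250}))
```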

WFMSs enable workflows to be designed, executed, monitored and optimised. A workflow process executed for a given case is called a workflow process instance. Other elements of a workflow definition are fully described in the WfMC workflow glossary [WfMC1999]. In the ICONS project, workflow definitions will be stored as ordinary information objects and treated as part of organisational knowledge.

Usually, in order to increase the readability of the defined workflows, a workflow definition is modelled in a graphical tool. Such a tool helps users understand the defined processes and, during execution, check which activities of a given process are being performed. In addition, such a tool is used to simulate and test workflow processes before their implementation at customer sites. In the ICONS project we are going to use well-known commercial workflow modelling tools such as Aris Toolset and iGrafx.

Organisations expect that implementing their business processes as workflow processes in a WFMS can help them to produce a product or to provide a service:
• of optimal quality,
• in an optimal period of time,
• with optimal resource effort,
• at optimal cost.

In this context, optimal means that something is done at the expected level, or at the best level achievable, with respect to the other factors of the workflow process.

In order to satisfy the above factors, WFMSs should support:

• flexibility – a WFMS should be able to adapt to dynamic changes that are required during the execution of workflow process instances in order to satisfy the expected criteria. Dynamic changes can apply to all aspects of a workflow definition, such as control flow, workflow participant assignments and time management. Depending on their durability, dynamic workflow modifications concern workflow definitions or workflow instances. A WFMS should use statistical, heuristic and artificial intelligence techniques to modify workflow definitions or workflow process instances. Adaptation to dynamic changes should be based on relevant data as well as on control and audit data. Flexibility is especially important for non-repeatable processes, since these processes cannot be fully specified a priori, at the workflow definition stage.

In the ICONS project we would like to implement the method of dynamic modification of control flow presented in [Aalst1999], and to extend a language for dynamic workflow participant assignments and control flow conditions (referred to below as WPAs and CFCs, respectively). In order to increase the flexibility of the defined WPAs and CFCs, we would like to use Datalog rules as WPA and CFC functions. The extension to the WfMC definition of WPA has been described in [Momotko2002].

• risk management – the main aim of risk management is to avoid undesirable situations and to minimise the negative consequences of those that have already occurred. In our opinion, risk management should at least take into consideration such aspects of workflow as time management and task scheduling. The former is described in detail in Section 7.3 and the latter in Section 7.4.

As stated in [Koloupulos1995] and [Stader2001], the above requirements are not fully supported by current WFMSs and should be developed in a knowledge-based, or intelligent, WFMS. Moreover, these features of an intelligent WFMS do not yet seem to be well defined in the appropriate WfMC standards. In the ICONS project we will propose some extensions to the WfMC standards and develop a prototype to test their practical usefulness.

5.6 Knowledge representation and manipulation in the graphic user interface

ICONS Graphic User Interface (ICONS GUI) is a tool to be used by a Web application developer for visualisation of user requests and of outputs from the ICONS data/knowledge base. ICONS GUI cannot be separated from other issues related to the general data/knowledge base architecture. Its main role is visualisation of data stored in the data/knowledge base. More precisely, it has to deal with visualisation of user requests to a data/knowledge base, together with visualisation of data retrieved from the database as the result of the requests. The interface should also allow some manipulation of the data/knowledge base, for instance altering, creating or deleting data. Hence, during ICONS GUI design we must deal with the following issues:

• A data model of a data/knowledge base that a graphic user interface will operate on.

• Stored data structures (presented at the proper level of data independence) that will be searched or manipulated during requests. The data structures must be designed at the level of algorithmic precision, as their semantic properties will be directly used by ICONS GUI.

• A user language for data description that will allow the user to see what the data/knowledge base contains. This language can be designed at the level of a database schema (cf. CORBA IDL or ODMG ODL) or at the level of a business ontology that describes not only structural properties of the data/knowledge base, but also some metadata related to the business domain.

• Some universal API (a query language) that will allow one to make retrievals and manipulations on the database. The API must contain not only the specification of a retrieval/manipulation language, but also the specification of the formats that will be returned by retrieval requests. These formats may be (and usually are, cf. ODMG) different from the stored data structures (but based on the same notions). Since the results of requests will be an input to the GUI module, they must also be specified at the level of algorithmic precision.

• ICONS GUI should contain features that will allow an application developer to customise the package to a particular application. The customisation can concern the graphical icons presented to the end user, navigation or browsing paradigms (i.e. additional actions connected with a single navigation act), as well as database views that will simplify the conceptual model of the application.

Figure 9. Architecture of the GUI module. (Components shown: GUI module with its customisation interface, GUI DB, graphic API, application program, and the API to a database for queries and manipulation requests and for the results of requests.)



The general GUI architecture, including the context of its use, is presented in Figure 9. The following elements must be considered during development:

• GUI module: generic software used by a developer of a Web application to prepare programs through which Web end users interact with the data/knowledge base. To this end the developer has to use the following interfaces:

− Customisation: parameterisation of the entire GUI according to the developer's preferences, e.g. fonts, colours, kinds of icons to be displayed, etc. The customisation may also require some (virtual or materialised) views on the database, which will be used by the developer for conceptualisation of an application.

− Graphic API: this interface makes it possible to activate/deactivate particular graphical widgets (buttons, menus, pictures, input/output text fields, tables, etc.) on the Web end user's screen. The graphic API should enable presenting various forms of graphs at many levels of detail and with some possibilities of manipulation, e.g. changing colours to present the user's navigation in the graph.

− API to database: used by the developer to write scripts associated with events that can occur on particular widgets. For example, clicking a button named GetCompanies means issuing the request “select * from Company” to the database. The API includes facilities to process the results of requests received from the database. These facilities are used within an application program prepared by the application developer. The results of the requests are the input to the GUI module.

• GUI DB: a database or a file storing customisation information (e.g. a palette of icons) and the current state of the interaction with a particular user (e.g. the history of operations, current results of search, views, etc.).
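A minimal sketch of the "API to database" idea, assuming SQLite as a stand-in for the data/knowledge base: an application program binds a widget event to a request and hands the result rows to the GUI module. `EventRegistry`, `on` and `fire` are hypothetical names, not part of any ICONS interface.

```python
import sqlite3
from typing import Dict, List, Tuple


class EventRegistry:
    """Hypothetical binding of (widget, event) pairs to database requests."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.handlers: Dict[Tuple[str, str], str] = {}

    def on(self, widget: str, event: str, query: str) -> None:
        """Associate a request with an event that can occur on a widget."""
        self.handlers[(widget, event)] = query

    def fire(self, widget: str, event: str) -> List[tuple]:
        """Issue the bound request; the returned rows are the input to the GUI module."""
        return self.conn.execute(self.handlers[(widget, event)]).fetchall()


# SQLite stands in for the data/knowledge base in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Company (name TEXT)")
conn.executemany("INSERT INTO Company VALUES (?)",
                 [("Brainstorm Ltd.",), ("Example Co.",)])

gui = EventRegistry(conn)
gui.on("GetCompanies", "click", "select * from Company")
rows = gui.fire("GetCompanies", "click")
```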

An important feature of the whole interface is genericity, which means flexibility, robustness and independence of a particular application domain.

The ICONS project architecture assumes multi-paradigm data and knowledge representation and processing. In particular, the architecture assumes (more or less explicitly) the following data models and corresponding paradigms:

• “pure” object-oriented model,

• relational or object-relational model,

• XML model including typing facilities such as DTD and XML Schema and mapping facilities such as XSL and XSLT,

• RDF model,

• Rodan Portal model,

• Datalog model, semantic network,

• temporal model,

• model for process knowledge such as a workflow model assumed by WfMC,

• perhaps other models that will appear as results of contributions of ICONS participants.

This variety of considered and potential models has led us to the necessity of establishing and developing a kind of canonical data model that will present a “common denominator” of the various other models. As a candidate canonical model we have chosen a variant of an object-oriented database model in the spirit of ODMG, with significant improvements: enhancing it with dynamic object roles, cleaning up its semantics, and observing principles such as object relativism, total internal identification and orthogonal persistence. The model will be equipped with a data query/manipulation API based on the query language SBQL, built in the spirit of ODMG OQL but based on fundamentally new semantic principles known as the Stack-Based Approach (SBA). Figure 10 presents the architecture of the wider context of the ICONS GUI interface, which includes the interface to the canonical model through SBQL and wrappers to databases proprietary to particular data/knowledge representation paradigms.
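To illustrate the flavour of the Stack-Based Approach, the sketch below evaluates a query of the form `Emp where salary > 1000` with an environment stack: for each object returned by binding `Emp`, a new stack section with binders to the object's attributes is pushed, the predicate is evaluated by binding names from the top of the stack downwards, and the section is popped. This is a simplified illustration under stated assumptions (objects are plain dictionaries without identifiers, and binders hold values rather than references); it is not SBQL itself, and `EnvStack` and `where` are hypothetical names.

```python
from typing import Any, Callable, Dict, List

# An object is a dict of named attributes (assumption: no object identifiers).
Obj = Dict[str, Any]


class EnvStack:
    """Environment stack: each section maps names to lists of bound values."""

    def __init__(self, root: Dict[str, List[Any]]) -> None:
        self.sections: List[Dict[str, List[Any]]] = [root]

    def bind(self, name: str) -> List[Any]:
        # Search from the top section downwards; the first section holding the name wins.
        for section in reversed(self.sections):
            if name in section:
                return section[name]
        return []

    def push_nested(self, obj: Obj) -> None:
        # nested(obj): open a new section with binders to the object's attributes.
        self.sections.append({k: [v] for k, v in obj.items()})

    def pop(self) -> None:
        self.sections.pop()


def where(envs: EnvStack, name: str, predicate: Callable[[EnvStack], bool]) -> List[Obj]:
    """Evaluate `name where predicate` in the stack-based style."""
    result = []
    for obj in envs.bind(name):
        envs.push_nested(obj)
        try:
            if predicate(envs):
                result.append(obj)
        finally:
            envs.pop()  # the stack is always restored after each object
    return result


store = {"Emp": [{"name": "Smith", "salary": 1200}, {"name": "Brown", "salary": 900}]}
envs = EnvStack(store)
rich = where(envs, "Emp", lambda e: e.bind("salary")[0] > 1000)
```

The point of the design is that the predicate needs no explicit iteration variable: `salary` is resolved by the same name-binding rule as `Emp`, only in a higher stack section.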

We plan that more sophisticated mapping of source data structures into canonical objects will be possible through object-oriented virtual views built on top of SBQL queries. In effect, the ICONS GUI will be conceptually and physically isolated from particular solutions concerning data representation, thus allowing the developer and user of Web applications to have a unified view of the heterogeneous data resources that the ICONS architecture will deal with. This idea is much influenced by the CORBA IIOP bus, but shifted to a higher conceptual level (i.e. the level of a query language).

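Figure 10. ICONS GUI module with interfaces to databases. (Components shown: Web client, HTML page generator, user requests processor, application program, GUI module with customisation and GUI DB, graphic API, SBQL queries API (canonical model), object view processor, object query processor, SBQL DB, and a CRUD API to wrappers: the Rodan Portal wrapper (Rodan Portal DB, Rodan Portal API), the XML/RDF DB wrapper (XML/RDF files, XML/RDF API) and other wrappers for other DBs/files and APIs.)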

Navigation in a graph of inter-linked objects is a very attractive searching paradigm, which so far has not been sufficiently explored in the context of Web applications. We can distinguish two kinds of such navigation:

• Direct manual browsing in a graph of graphically presented objects. For instance, we can present the graph of connected objects on the screen and allow the user to move along named edges of this graph according to his/her wishes. Other examples of this kind of searching are navigation in a network of concepts (a semantic network), navigation in a network of HTML pages, etc.

• Manual browsing and searching in a graph presenting some data description or conceptual model of stored data. In contrast to the previous case, where the navigation concerns explicitly visualised objects, in this case only some description of objects is visualised, e.g. a UML schema. The user navigates in this schema; the effect of navigation is retrieval of objects that are of interest to the user.

There are several problems connected with these kinds of interfaces:

• Size of the end-user screen: usually it is impossible to present a very big and complex graph, hence it must be presented partly, with zooming facilities, perhaps with 3D views, and with details of objects hidden depending on the mode or stage of searching.

• User awareness: the user can very quickly lose orientation during navigation in a complex graph, thus special graphic facilities are necessary to keep him/her aware of current sub-goals or results of the search.

• Combining manual and predicate-based automatic navigation.

• Elliptic queries: for some kinds of navigation it would be useful for the user to omit some details of navigation.

In the graph navigation facility we would like to combine manual browsing in a graph of associated objects, selecting objects by predicates, and collecting results in user baskets. The idea is that during navigation the user collects interesting information within his/her personal baskets. This metaphor is illustrated in Figure 11 and Figure 12.


Figure 11. A graph of objects.

In Figure 11 we present a graph where objects (named A, B, C and D) are connected by directed edges x, y, z, t, w and v. As seen, we do not require the names of objects and the names of edges to be unique. Objects can store information (attributes and their values), which can be displayed for the user. The user can select starting objects for navigation through the following actions:

• Manual choice by clicking and marking the proper objects on the screen (e.g. on the basis of their content, which can optionally be displayed).

• Entering a name of objects and a condition on their contents.

• Taking the proper objects from his/her basket (filled in during a previous search).

After selecting the initial objects, the user can navigate in the graph through named edges (selected from a menu). Suppose the user initially selects the 1st and the 3rd object A, and then follows edge y. This means that she moves to the 2nd and 4th object B. If she then follows edge z, she moves to the 2nd object C. If she then follows edge v, she moves to both objects D. Objects that are selected during this search will be called marked; other objects are unmarked. During this process the user is allowed to perform any action, such as marking/unmarking objects, displaying an object, moving references to objects to her private basket, etc.
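The navigation just described can be sketched as set-at-a-time moves along named edges from the currently marked objects. The encoding below is hypothetical: it reproduces the walk from the text (1st and 3rd A, then edges y, z and v), but the exact edges of Figure 11 are not recoverable, so the graph itself is an assumption.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical graph in the spirit of Figure 11: object id -> object name
# (names need not be unique) and directed edges (source id, edge name, target id).
objects: Dict[str, str] = {
    "a1": "A", "a2": "A", "a3": "A",
    "b1": "B", "b2": "B", "b3": "B", "b4": "B",
    "c1": "C", "c2": "C", "c3": "C",
    "d1": "D", "d2": "D",
}
edges: List[Tuple[str, str, str]] = [
    ("a1", "y", "b2"), ("a3", "y", "b4"), ("a2", "x", "b1"),
    ("b2", "z", "c2"), ("b4", "z", "c2"), ("b3", "t", "c3"),
    ("c2", "v", "d1"), ("c2", "v", "d2"), ("c1", "w", "d1"),
]


def select(name: str) -> Set[str]:
    """Select starting objects by name (a condition on contents could be added)."""
    return {oid for oid, n in objects.items() if n == name}


def follow(marked: Set[str], edge: str) -> Set[str]:
    """Move from all currently marked objects along every edge with the given name."""
    return {dst for src, label, dst in edges if src in marked and label == edge}


marked = {"a1", "a3"}            # the user clicks the 1st and 3rd object A
for edge in ["y", "z", "v"]:     # and then follows edges y, z and v
    marked = follow(marked, edge)
```

After the loop, `marked` holds both D objects, matching the walk in the text.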

The idea of a basket corresponds directly to the shopping-basket metaphor of virtual shops. It is intended to support user awareness. A basket is a graphical element with icons representing the selected objects. A basket has a unique name. Baskets can be organised hierarchically (similarly to operating system directories). They are persistent structures, i.e. they are stored in the database. In this way a single search can be subdivided into many user sessions.

Figure 12. The idea of the user basket (example basket named “My today search” holding icons of selected objects A, B, C and D).

Each basket has a name. The user can also assign a longer description or comment to the basket. The content of the basket can be presented in 3D graphics. Icons representing particular kinds of objects can be different (they could be the subject of customisation). The content of each object in the basket can be displayed. Each object in the basket should be accompanied by the following information (e.g. presented as a table): an icon representing the object, the object identifier and name, a representation of the object content, the date/time at which the object was found, and any string comment (annotation) introduced by the user. An example of the content of a basket is illustrated in the following table.



Icon             Id        Object name   Retrieval date   Object content    Comment
[Person icon]    23156     Person        02.01.03         John Smith        I have checked him yesterday.
[Person icon]    23456     Person        02.04.05         Mike Brown        Smart client!
[Document icon]  766585    Document      02.08.19         Order 234527      Currently processed, ready in 2 days
[Company icon]   3453453   Company       02.07.19         Brainstorm Ltd.   Our best supplier.
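A minimal sketch of a hierarchical, persistent basket holding entries like the rows of the table above. The class and field names are hypothetical, and JSON serialisation merely stands in for storing baskets in the database between user sessions.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class BasketEntry:
    object_id: int
    object_name: str   # e.g. "Person"; also selects the icon shown in the basket
    retrieved: str     # retrieval date as displayed to the user
    content: str       # short representation of the object content
    comment: str = ""  # free-text annotation entered by the user


@dataclass
class Basket:
    name: str
    description: str = ""
    entries: List[BasketEntry] = field(default_factory=list)
    children: List["Basket"] = field(default_factory=list)

    def find(self, path: str) -> "Basket":
        """Locate a sub-basket by a slash-separated path, like an OS directory."""
        node = self
        for part in filter(None, path.split("/")):
            matches = [b for b in node.children if b.name == part]
            if not matches:
                raise KeyError(part)
            node = matches[0]
        return node


root = Basket("root", children=[Basket("My today search")])
root.find("My today search").entries.append(
    BasketEntry(23156, "Person", "02.01.03", "John Smith", "I have checked him yesterday."))

# Persistent form, e.g. a JSON document kept in the GUI DB between sessions.
saved = json.dumps(asdict(root))
restored = json.loads(saved)
```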

Navigation in a graph of objects could be connected with additional options, in particular calling applications. For instance, if the navigation concerns a semantic network used as an intelligent searching index, then after the search within the network the user can display the corresponding objects from the database, display a Word file, go to the Web through a URL, etc.

A similar idea applies to navigation in a database schema, but a schema graph is displayed rather than the graph of objects. The schema graph should correspond to the canonical data model and to the description of the stored data according to that model. The graph will be presented as an improved subset of UML class diagrams (or ODMG ODL), to make it relevant to the data description language assumed for the canonical object model. All other rules of marking, collecting references to objects within baskets and calling applications should be similar to those for navigation within a network of objects described previously.

An unexplored area in graphical querying concerns the paradigm known as Query By Example or Query By Forms (or simply Forms). This paradigm was extremely successful for relational databases, and we consider applying it to object/XML databases. The basic idea of this paradigm is that the system displays to the user an empty form based on a data description statement. For instance, it can present a form derived from a DTD, where the corresponding XML values are initially empty. The user fills in an empty field A in the form with a string value V (possibly with an additional mark determining the kind of comparison). Then the system fills in the rest of the form with values stored in the database for which field A has the value V. This paradigm can easily be adapted to object or XML databases.
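The Query By Forms idea can be sketched for XML data as follows: the empty form is derived from the element names, the user fills in some fields together with a comparison mark, and the system returns complete forms for the matching objects. The XML content, the field names and the set of comparison marks are assumptions made for illustration.

```python
import operator
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional, Tuple

# Comparison marks a user may attach to a filled-in field (assumed notation).
# Values are compared as strings, so "<" and ">" are lexicographic here.
OPS = {"=": operator.eq, "<": operator.lt, ">": operator.gt}

XML_DB = """<objects>
  <object><name>John Smith</name><kind>Person</kind></object>
  <object><name>Brainstorm Ltd.</name><kind>Company</kind></object>
</objects>"""


def load(xml_text: str) -> List[Dict[str, str]]:
    """Flatten each <object> element into a field -> text mapping."""
    return [{child.tag: child.text for child in obj} for obj in ET.fromstring(xml_text)]


def empty_form(fields: List[str]) -> Dict[str, Optional[str]]:
    """The empty form shown to the user, derived from the data description."""
    return {f: None for f in fields}


def query_by_form(records: List[Dict[str, str]],
                  filled: Dict[str, Tuple[str, str]]) -> List[Dict[str, str]]:
    """Complete forms for all records that satisfy every filled-in field."""
    return [r for r in records
            if all(f in r and OPS[op](r[f], v) for f, (op, v) in filled.items())]


records = load(XML_DB)
form = empty_form(["name", "kind"])
answers = query_by_form(records, {"kind": ("=", "Company")})
```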

6. The ICONS Intelligent Content Integration Features

The ICONS Content Repository (ICR) is to comprise content objects (COs) representing knowledge artifacts stored and manipulated by the ICONS content management functions. The knowledge artifacts directly represent results of intellectual work or may be derived from external information sources, such as information systems, databases and web sites.

The ICONS Global Knowledge Schema (IGKS) is to include partial definitions of the content object data structures, content object methods, definitions of the content object relationships, as well as the content object taxonomies. The ICONS content objects are stored as XML documents conforming to the corresponding XML schema and comprising uninterpreted binary elements stored as files in a hierarchical memory system. The IGKS comprises meta-information pertaining to all CO classes represented in the repository, regardless of the storage and access modes used to materialize their values.

An important characteristic of the ICR is its flexible data structure, only partially defined in the repository schema, which escapes the traditional database requirement of a consistent and complete correspondence between the database schema and the database instance. Rather, the IGKS may be treated as a guide for interpreting the structured parts of the content objects and for navigation in the CO relationship structures. On the other hand, all CO methods must be defined and implemented with support of the object model inheritance structure, in order to provide facilities to manipulate the content object values.

There are two dimensions of ICONS content distribution. The first pertains to the distribution of the system content repository, comprising the Content Base and the Ontology Base, and of the hierarchical storage management processes among the ICONS servers. The second concerns integration of external information sources, such as pre-existing heterogeneous databases, legacy information processing systems and web information resources. The first case is addressed in Chapter 8.

Integration of the external information resources is to be performed with the use of XML-based wrapper technology. Wrapper programs producing the required XML documents for the extracted data, which serve as containers for file elements, are to be enriched with RDF specifications obtained by extracting semantics from the database schemata of the external databases, or by appending semantic information in the case of outputs of legacy information processing systems. The wrapper programs will be generated in the form of Enterprise JavaBeans modules including the necessary query statements.

Due to the open nature of the ICR, the content integration features are envisaged as natural extensions of the ICR management features, and they are discussed below in the context of the repository schema as well as in the context of the repository data structure. Finally, the content integration support to be developed within the ICONS project is outlined in the final section of this chapter.

6.1 The ICONS Global Knowledge Schema

The ICONS Global Knowledge Schema (IGKS) is to comprise the structural knowledge representation features, including partial specification of CO data structures, definition of CO methods and inheritance hierarchies, and CO relationship bindings, as well as the knowledge map representation features to be developed as multi-level taxonomic trees.

The XML schema is to provide the partial specification of the CO data structure representing an arbitrary XML document tree. The leaf nodes may represent unstructured binary objects stored as files in the ICONS hierarchical storage structure. The CO class methods are to be defined within Java classes, where a Java class corresponds to a CO class defined in the XML schema. The inheritance structure is to be specified within the Java classes. We propose that special CO methods, called inference methods, are defined as a triple <R, M, F>, where R is a set of Datalog rules, M is the materialization algorithm that dynamically creates F, and F is the relational data structure representing facts. The inference methods are to be executed by the ICONS inference engine based on DLV [Buccafurri1998].
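To make the <R, M, F> triple concrete, the sketch below uses a naive bottom-up fixpoint as a stand-in for M: the rules in R are applied to the facts in F until no new fact can be derived. It handles only safe, positive rules (no negation or disjunction) and is not the DLV algorithm; the `supervisor`/`chief` rules are a hypothetical example.

```python
from typing import Dict, List, Optional, Set, Tuple

Atom = Tuple[str, Tuple[str, ...]]        # (predicate, arguments)
Rule = Tuple[Atom, List[Atom]]            # (head, body); rules assumed safe
Facts = Dict[str, Set[Tuple[str, ...]]]   # F: predicate -> relation (set of tuples)


def is_var(term: str) -> bool:
    """Datalog convention: variables start with an upper-case letter."""
    return term[:1].isupper()


def match(args: Tuple[str, ...], tup: Tuple[str, ...],
          subst: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Extend `subst` so that the atom arguments match the fact tuple, or return None."""
    new = dict(subst)
    for a, v in zip(args, tup):
        if is_var(a):
            if new.setdefault(a, v) != v:
                return None
        elif a != v:
            return None
    return new


def materialize(rules: List[Rule], facts: Facts) -> Facts:
    """M: naive bottom-up evaluation of R, repeated until no new fact is derived."""
    db = {p: set(rel) for p, rel in facts.items()}
    changed = True
    while changed:
        changed = False
        for (head_pred, head_args), body in rules:
            substs = [{}]
            for pred, args in body:  # join the body atoms left to right
                substs = [s2 for s in substs for t in db.get(pred, set())
                          if (s2 := match(args, t, s)) is not None]
            for s in substs:
                fact = tuple(s[a] if is_var(a) else a for a in head_args)
                if fact not in db.setdefault(head_pred, set()):
                    db[head_pred].add(fact)
                    changed = True
    return db


# R: chief(X, Y) :- supervisor(X, Y).   chief(X, Z) :- supervisor(X, Y), chief(Y, Z).
R = [
    (("chief", ("X", "Y")), [("supervisor", ("X", "Y"))]),
    (("chief", ("X", "Z")), [("supervisor", ("X", "Y")), ("chief", ("Y", "Z"))]),
]
F = materialize(R, {"supervisor": {("bean", "smith"), ("smith", "jones")}})
```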

The CO relationship bindings, which represent relationships implementing the structural knowledge meta-information, are to be specified as relationship predicates. The relationship predicates are logical expressions defined on CO properties. The CO relationship bindings are to specify binary and n-ary object relationships with arbitrary relationship cardinalities (1:1, 1:N, N:M). It is proposed that all CO relationships represented in the ICR are materialized dynamically during the corresponding query execution. Appropriate data structures are to be developed to support efficient materialization of CO relationships.
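Dynamic materialization of a relationship binding can be sketched as computing, at query time, all pairs of content objects that satisfy the relationship predicate. The second function shows one possible supporting data structure, a hash index for equality predicates. The `supplier` property and the sample objects are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

CO = Dict[str, object]  # a content object reduced to its properties (assumption)


def bind(left: List[CO], right: List[CO],
         predicate: Callable[[CO, CO], bool]) -> List[Tuple[CO, CO]]:
    """Materialize a binary relationship on demand: every pair satisfying the predicate."""
    return [(x, y) for x in left for y in right if predicate(x, y)]


def bind_equal(left: List[CO], right: List[CO],
               left_prop: str, right_prop: str) -> List[Tuple[CO, CO]]:
    """Same result for the predicate left.left_prop == right.right_prop,
    using a hash index instead of comparing every pair."""
    index: Dict[object, List[CO]] = defaultdict(list)
    for y in right:
        if y.get(right_prop) is not None:
            index[y[right_prop]].append(y)
    return [(x, y) for x in left for y in index.get(x.get(left_prop), [])]


documents = [{"id": 766585, "title": "Order 234527", "supplier": "Brainstorm Ltd."}]
companies = [{"id": 3453453, "name": "Brainstorm Ltd."}, {"id": 1, "name": "Example Co."}]

pairs = bind(documents, companies, lambda d, c: d["supplier"] == c["name"])
```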

The knowledge map consists of multi-level taxonomic trees representing either closed taxonomies, based on a specified list of categories, or open taxonomies, based on an arbitrary value (or values) of CO properties. Taxonomies are defined as logical expressions on CO properties and are to be materialized dynamically. We introduce a special class of implicit taxonomies grouping content objects by CO class and CO identifier. Thus, content objects are always accessible by navigation via some taxonomy.
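The three kinds of taxonomies can be sketched as grouping functions over CO properties that are evaluated on demand. The property names `class` and `id`, the category definitions and the sample objects are assumptions.

```python
from collections import defaultdict
from typing import Callable, Dict, List

CO = Dict[str, object]


def closed_taxonomy(cos: List[CO],
                    categories: Dict[str, Callable[[CO], bool]]) -> Dict[str, List[CO]]:
    """Closed taxonomy: a fixed list of categories, each defined by a logical expression."""
    return {name: [co for co in cos if pred(co)] for name, pred in categories.items()}


def open_taxonomy(cos: List[CO], prop: str) -> Dict[object, List[CO]]:
    """Open taxonomy: one category per distinct value of a CO property."""
    groups: Dict[object, List[CO]] = defaultdict(list)
    for co in cos:
        groups[co.get(prop)].append(co)
    return dict(groups)


def implicit_taxonomy(cos: List[CO]) -> Dict[object, Dict[object, CO]]:
    """Implicit taxonomy: CO class -> CO identifier -> object."""
    tree: Dict[object, Dict[object, CO]] = defaultdict(dict)
    for co in cos:
        tree[co["class"]][co["id"]] = co
    return dict(tree)


cos = [
    {"class": "Person", "id": 23156, "name": "John Smith"},
    {"class": "Person", "id": 23456, "name": "Mike Brown"},
    {"class": "Company", "id": 3453453, "name": "Brainstorm Ltd."},
]
```

Multi-level trees could be obtained by applying a grouping function again within each category.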

6.2 The ICONS Content Repository

The ICONS Content Repository consists of two distinct, strongly inter-related parts: the Content Base and the Ontology Base. The Content Base, organized as a hierarchical storage configuration, is to store content objects in the form of XML documents including binary file elements. The Ontology Base is to be organized as a relational database including tables comprising selected XML object properties represented by table attributes. Appropriate relational tables are to be created for each CO class. The table attributes are to be used for attribute-based CO selection, or as arguments of relationship binding and taxonomy expressions.

The XML document properties will typically represent meta-information pertaining to the contents of the included file elements. Such property values are either to be defined manually or extracted automatically from the contents of the file elements. Properties representing structural or taxonomic knowledge will be replicated in the Ontology Base. Data redundancy is introduced in order to enable efficient manipulation of meta-information and to avoid complex data mappings during XML object manipulation operations.
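A minimal sketch of replicating selected properties of an XML content object into the relational Ontology Base, with one table per CO class. SQLite stands in for the relational database; the sample document, its `<file>` element and the choice of replicated properties are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical content object of class Document; <file> stands for a binary file element.
CO_XML = """<Document id="766585">
  <title>Order 234527</title>
  <status>Currently processed</status>
  <file href="orders/234527.pdf"/>
</Document>"""

# Properties replicated into the Ontology Base (an assumption for this sketch).
REPLICATED = ["title", "status"]


def replicate(conn: sqlite3.Connection, xml_text: str) -> None:
    """Copy selected properties of an XML content object into its class table."""
    root = ET.fromstring(xml_text)
    table = root.tag  # one relational table per CO class
    # Table and column names come from the schema, not from user input.
    columns = ", ".join(["id INTEGER PRIMARY KEY"] + [f"{p} TEXT" for p in REPLICATED])
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table} ({columns})")
    values = [int(root.get("id"))] + [root.findtext(p) for p in REPLICATED]
    placeholders = ", ".join("?" * len(values))
    conn.execute(f"INSERT OR REPLACE INTO {table} VALUES ({placeholders})", values)


conn = sqlite3.connect(":memory:")
replicate(conn, CO_XML)
```

Re-running `replicate` for the same object replaces its row, so the replicated properties can be refreshed whenever the XML document changes.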

Content objects may either be persistent in the ICONS repository or be materialized on request during a repository user session. The life cycle of a persistent content object starts with the object create operation and ends with an explicit destroy operation. Content objects as well as their file elements may be organized in the form of version trees reflecting content modifications taking place during the object life cycle.

The transient content object classes are to be represented by class templates providing means to specify properties of objects to be dynamically materialized during the user session. The object materialization algorithms must be implemented in the object class methods. Transient objects may be stored in the repository for a specified period between user sessions, either as frames comprising the desired content materialization parameters, or as complete content objects. In the latter case, the content object property values and elements may be refreshed at specified intervals.

6.3 Integration of the heterogeneous content sources

Integration of heterogeneous, pre-existing databases was an active research field in the 1980s and early 1990s. A collection of papers comprised in [Hurson1994] provides a good insight into the state of the art in the area of multidatabase systems. Current research and development efforts have moved in the direction of integrating Web information resources, as shown in [Goeschka2001, Hammer1997, Knoblock1998], integrating object-oriented and multimedia databases [Chang2001], and extracting database semantics into a global dictionary [Lawrence2001]. Extracting semantic information from text-based information sources has been presented in [Soderland1997].

Integrating information from legacy information processing systems, in particular dealing with the results of data mining queries, has been discussed in [Buchner2000].

The emerging approach is to represent a common schema of the integrated information resources as an XML repository; the technique for extracting and representing the underlying semantics is based on the construction of wrappers that encapsulate the heterogeneity in accessing the diverse information sources. Wrappers are software modules that can transform data from a less structured representation into a more structured one. Examples of wrapper-based solutions may be found in [Hammer1997, Kushmerick1997, Sahuguet1999].

The ICONS architecture provides facilities in the form of standard interfaces to accommodate diverse wrapper technologies, ranging from Java beans including database queries and the required data mapping algorithms, to intelligent agents scanning predefined information sources for the required information. In all cases, we assume that the required data integration and mapping rules must be specified manually at ICONS application development time. Typically the integrated data will be stored as the XML content object file element, with semantics determined by the integration and mapping rules. The element meta-information may automatically be extracted and stored as the XML content object properties.
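A simple wrapper of the kind described above can be sketched as a function that runs a query against a source database and maps result columns to XML elements according to manually specified mapping rules. SQLite stands in for the external source, and the table, column and element names are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET
from typing import Dict


def wrap(conn: sqlite3.Connection, query: str, root_tag: str, row_tag: str,
         mapping: Dict[str, str]) -> ET.Element:
    """Run `query` and map each result column to an XML element name.

    `mapping` plays the role of the manually specified mapping rules
    (column name -> element name); unmapped columns are dropped.
    """
    cur = conn.execute(query)
    columns = [d[0] for d in cur.description]
    root = ET.Element(root_tag)
    for row in cur:
        item = ET.SubElement(root, row_tag)
        for col, value in zip(columns, row):
            if col in mapping:
                ET.SubElement(item, mapping[col]).text = str(value)
    return root


# A hypothetical legacy table with cryptic column names.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cust (cname TEXT, cregion TEXT, internal_code TEXT)")
conn.execute("INSERT INTO cust VALUES ('Brainstorm Ltd.', 'north', 'X17')")
doc = wrap(conn, "SELECT * FROM cust", "customers", "customer",
           {"cname": "name", "cregion": "region"})
```

The resulting element could then be stored as the file element of an XML content object, with selected values copied into its properties.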

The bulk of our research effort will be directed towards development of knowledge-based wrappers supporting integration of semi-structured information comprised in XML documents, possibly enhanced with RDF semantic information. XML technologies are the emerging information exchange standard, facilitating information interchange and interoperability of web-based as well as legacy information systems. The knowledge-based wrappers will be developed as Datalog programs to be executed by the ICONS DLV module. A similar approach to the integration of semi-structured data has been reported in [Baumgartner2001].

7. The ICONS Intelligent Workflow Features

7.1 Dynamic workflow participant assignment

As reported in [Momotko2002], a modern WFMS needs to adapt to dynamic changes; dynamic changes in WPA are especially important. Some of the main requirements for WPA declared by WFMS customers are:

• Control and audit data – data on finished or currently executed workflows, for example:

− a person that has the lightest workload or the minimal number of tasks to perform,
− a workflow participant that started the workflow,
− a workflow participant that performed the previous/preceding activity,
− a worker that does not have activities that have to be executed by Friday,
− a salesman that in the last week performed more than 30 workflows.

• Relevant data – processed data, organisational structure or other data, for example:
− a user participant that is defined as the tester of a given system bug,
− an employee that is the supervisor of Mr John Bean,
− a manager that is the chief of the sales department,
− a person that knows Java and XML,
− a workflow participant that has the ‘knows English’ role,
− a salesman who is responsible for the region of the customer who sent the claim;

• A WPA should be able to express the situation when workflow participants assigned to a given activity are selected ad hoc, manually, during workflow execution;

• A WPA should be able to express organisational and functional structures, in particular user groups that exist in an organisation;

• A WPA should be able to express the situation when exactly one workflow participant from a selected group should perform an activity;

• A WPA should be able to define a workflow participant who will perform an activity if workflow participant assignments return an inadequate set of workflow participants (e.g. an empty set).

In order to satisfy the above requirements and to assure a high level of WPA flexibility, in the ICONS project we will use the WPAL language for defining dynamic WPAs presented in [Momotko2002]. This approach proposes an extension of the WfMC definition of WPA. Moreover, we consider using Datalog rules as WPA functions, as well as an approach of assigning intelligent agents to activities on the basis of knowledge available from ontologies, described in [Jarvis1999].
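As an illustration only (this is not WPAL syntax), a WPA function that combines relevant data (roles) with control data (current workload), returns exactly one participant, and falls back to a default participant when the assignment yields an empty set might look like this. The function name, participant names, roles and workloads are hypothetical.

```python
from typing import Dict, Set

# Hypothetical relevant data (roles) and control data (pending tasks per participant).
PARTICIPANTS: Dict[str, Set[str]] = {
    "anna": {"knows English", "salesman"},
    "jan": {"knows English"},
    "piotr": {"salesman"},
}
WORKLOAD: Dict[str, int] = {"anna": 5, "jan": 2, "piotr": 7}


def assign(participants: Dict[str, Set[str]], workload: Dict[str, int],
           required_role: str, fallback: str) -> str:
    """Return exactly one participant for an activity.

    Candidates are the participants holding `required_role`; the one with the
    lightest workload is chosen. If the candidate set is empty, `fallback` is
    returned, covering the "inadequate set" requirement.
    """
    candidates = [p for p, roles in participants.items() if required_role in roles]
    if not candidates:
        return fallback
    # Ties on workload are broken by name so that the choice is deterministic.
    return min(candidates, key=lambda p: (workload.get(p, 0), p))
```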

7.2 Dynamic control flow condition definition

Similarly to WPA, we propose to define a procedural language to express control flow conditions (CFCs). A control flow condition is a pre- or post-activity condition, or a transition condition. A flow condition should be built on relevant data as well as on control and audit data. It should also use logical operators (AND, OR, NOT) and predefined functions, for example a function to check whether the activity of testing a repaired car is necessary or can be omitted. We consider using Datalog rules as such functions. In addition, it should be possible to maintain a library of already defined flow conditions in order to reuse them. Such a feature could reduce the cost of implementing a new workflow process.

Moreover, such an approach can express optional activities. The same idea, but with a different implementation, is presented in [Klingemann2000].

7.3 Time management

In the ICONS project we would like to extend the idea of time management presented by Eder and Panagos in [Eder2001], [Eder1999] and [Eder1997]. In order to represent time information, they defined two basic temporal types, namely durations and deadlines. Both durations and deadlines can be defined for individual activities and for the whole workflow process. A duration is the time needed to perform a given activity/process. It can either be calculated from past workflow executions or be assigned by specialists based on their experience and expectations. The most common duration values are minimum, maximum and average. A deadline corresponds to the maximum allowable execution time for an activity/process. Deadlines do not have to be assigned to every activity of a workflow process, but it is beneficial to assign deadlines to all activities.

In our opinion, the above approach to time management in WFMSs seems promising. However, on the basis of our experience we think that waiting time also has to be considered in real workflows. Waiting time is the time between placing an activity in a given workflow participant’s task list and the moment when the participant begins to perform the activity. Especially for workflow participants that have many activities to perform, this time can be significant. Waiting time depends at least on the type of the performed activity, the workflow participant assigned to the activity, and the number of activities that have to be performed by the participant.

Moreover, since in distributed WFMSs the time needed to transfer control flow between two consecutive activities (i.e. between the workflow participants that perform those activities) can also be significant, we suggest considering transfer time as well as waiting time. Transfer time depends mainly on the quality of the communication links between workflow engines.

Users who define a workflow process cannot assign waiting and transfer times; these should be calculated from past and current workflow process executions.
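Calculating waiting and transfer times from executions could look roughly like the sketch below. The audit-record layouts (participant, activity type, time queued, time started; source engine, target engine, time sent, time received) are hypothetical, and a plain mean is only the simplest possible estimator.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple

# Hypothetical audit-log record: (participant, activity_type, queued_at, started_at)
WaitRecord = Tuple[str, str, float, float]
# Hypothetical transfer record: (source_engine, target_engine, sent_at, received_at)
TransferRecord = Tuple[str, str, float, float]


def average_waiting(log: Iterable[WaitRecord]) -> Dict[Tuple[str, str], float]:
    """Mean time an activity sat in a participant's task list before it started."""
    samples: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for participant, activity, queued_at, started_at in log:
        samples[(participant, activity)].append(started_at - queued_at)
    return {key: mean(values) for key, values in samples.items()}


def average_transfer(log: Iterable[TransferRecord]) -> Dict[Tuple[str, str], float]:
    """Mean time needed to pass control flow between two workflow engines."""
    samples: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for source, target, sent_at, received_at in log:
        samples[(source, target)].append(received_at - sent_at)
    return {key: mean(values) for key, values in samples.items()}


waits = average_waiting([("anna", "review", 0.0, 3.0), ("anna", "review", 10.0, 15.0)])
transfers = average_transfer([("engine-A", "engine-B", 0.0, 0.5)])
print(waits[("anna", "review")], transfers[("engine-A", "engine-B")])  # 4.0 0.5
```

Keying waiting time by (participant, activity type) reflects the dependencies listed above; the participant's current task-list length could be added to the key in the same way.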

7.4 Task scheduling

To reduce waiting time, we will adapt well-known task scheduling algorithms to WFMS requirements. In our opinion, the function that prioritises activities should be flexible and defined in the context of a given workflow process. Such a function could use relevant application data as well as control and audit data, for example information about deadlines and durations, the cost of the resources needed to perform a given activity, the significance of the activity, etc. For each type of data, the administrator of the workflow process would be able to define its importance, for example: duration violation 10%, deadline violation 30%, and activity cost overrun 60%.
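A weighted prioritisation function of this kind might be sketched as follows, using the example weights from the text. The indicator names and the assumption that each indicator is a number in [0, 1] measuring how far a criterion is violated are ours.

```python
from typing import Dict, List, Tuple

# Example weights from the text; each process administrator would set their own.
WEIGHTS: Dict[str, float] = {
    "duration_violation": 0.10,
    "deadline_violation": 0.30,
    "cost_overrun": 0.60,
}


def priority(indicators: Dict[str, float], weights: Dict[str, float] = WEIGHTS) -> float:
    """Weighted sum of violation indicators, each assumed to lie in [0, 1]."""
    return sum(weight * indicators.get(name, 0.0) for name, weight in weights.items())


def order_task_list(tasks: List[Tuple[str, Dict[str, float]]]) -> List[str]:
    """Return activity names from a participant's task list, highest priority first."""
    ranked = sorted(tasks, key=lambda task: priority(task[1]), reverse=True)
    return [name for name, _ in ranked]


tasks = [
    ("approve-invoice", {"deadline_violation": 1.0}),  # priority 0.3
    ("check-contract", {"cost_overrun": 1.0}),         # priority 0.6
    ("archive-report", {"duration_violation": 1.0}),   # priority 0.1
]
print(order_task_list(tasks))  # ['check-contract', 'approve-invoice', 'archive-report']
```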

7.5 Extensions with respect to the WfMC's workflow process meta-model

To disseminate the described features of an intelligent WFMS, the following extensions to the WfMC’s standards are needed:
• introducing the WPAL language to express dynamic workflow participant assignments,
• refining the language used to represent CFCs, and introducing CFC functions and a CFC reuse mechanism,
• representing a complete model for time management.



8. The ICONS Distributed Processing Organisation

8.1 The ICONS scalable, distributed architecture

To gain practical acceptance, the ICONS goals require a particularly efficient data storage and processing architecture. This condition is difficult to meet, yet crucial: failing to meet it has prevented most ambitious projects with similar goals from becoming widely used (or used at all). The main prerequisites can be listed as follows:

1. The volume of permanent data is large (many GBs) and grows continuously as new knowledge is added. Current practice shows that the growth rate could easily reach 100% per year.

2. Temporary data can have a largely unpredictable volume. Joins, transitive closures, or more complex recursive computations often lead to an explosion in the number of tuples, and the selectivity of these operations may be impossible to estimate in practice. Even large temporary files must nevertheless be accommodated in real time and without performance deterioration.

3. Queries have to be processed so that response time is as independent of data size as possible. In particular, this time must not grow linearly with the file size.

4. Permanent data are highly valuable. They have to be reliably protected against loss and corruption, and they also have to be highly available. With the Web available anytime and everywhere, 24/7 access is a must today.

It is now widely accepted that no traditional centralized architecture can meet such goals [CACM97], [Gray1996]. The CPU capacity of a single server, even a multi-CPU one or an expensive supercomputer, must eventually be exhausted. Likewise, the available RAM soon suffices for only a fraction of the data. Accessing the data on disk easily degrades response time by two orders of magnitude. For data sets of many GBs, data may overflow from disk to the next level of the storage hierarchy, with a similar performance degradation ratio. The number of disk units that can be connected is also often quickly reached in practice, and must be reached in any case when a scaling data collection has to be managed. Sophisticated data operations often use scans, whose response time at any single server is at least linearly dependent on the data size. These are the constraints that basically no research or industrial system has successfully overcome until now. Finally, a failure of the data server may, at best, entirely prohibit access to the data or, at worst, cause data destruction. Many users at the World Trade Center had this bitter experience on 11 September 2001.

This state of the art, together with technological progress, has brought about a new type of architecture, often termed a scalable distributed architecture (SD-architecture). Today, this framework seems to be the only one able to fulfil the ICONS goals and constraints. Our goal is to base ICONS on an SD-architecture.

The keyword distributed in an SD-architecture is basically quite classical. It means that both data and processing are supported by multiple interconnected nodes. It seems reasonable to assume, and in our case it is necessary, that most of the processing nodes are linked by a high-speed network. This is typically assumed to be a local network, most often a 1 Gb/s Ethernet these days. An important new twist is that the nodes the network links are mass-produced: cheap, widely available computers, workstations, PCs, and the like, and therefore available in large numbers. They also often pre-exist the distributed system to be built. Finally, a node's role can vary widely: it can act as a data server, as a client, or as part of the application tier.

Altogether, such configurations, proposed some time ago by prominent US researchers, e.g., from UC Berkeley [Culler1994], seem today to be the most efficient practical approach, if not the only realistic one for most users, thanks to their unbeatable price-performance ratio. Needless to say, they have attracted growing interest, especially in recent years, at the highest decision-making levels [President1998]. The literature has designated such configurations as multicomputers or as networks of workstations (NOW) [Culler1994]. More and more often, one also hears buzzwords such as peer-to-peer architecture and, most recently, grid computing. Finally, IBM is promoting the concept of autonomic architecture [Gibbs2002].

The distributed architecture also potentially meets the goals of data reliability and high availability much better. Data can be mirrored or partitioned over multiple nodes. With mirroring, the unavailability of a node still leaves all data values accessible through the mirror, i.e., the data remain highly available, perhaps at the expense of some throughput deterioration if both mirrors were regularly in use. With partitioning, the unavailability of a storage node does not block access to the other parts of the collection. Redundant partitioning with parity data may further provide the same high availability as mirroring, with a much smaller storage overhead [Litwin2000].
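The storage advantage of parity over mirroring can be seen in the simplest case: one XOR parity bucket per group of m data buckets costs 1/m extra storage (mirroring costs 100%) and tolerates one unavailable bucket. The sketch below shows only this single-parity case; it is not the scheme of [Litwin2000], and zero-padding buckets to a common length is our simplifying assumption.

```python
from functools import reduce
from typing import List


def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def parity_block(buckets: List[bytes]) -> bytes:
    """Parity of a group of data buckets, zero-padded to a common length."""
    width = max(len(b) for b in buckets)
    return reduce(xor_blocks, (b.ljust(width, b"\0") for b in buckets))


def recover_missing(survivors: List[bytes], parity: bytes) -> bytes:
    """Rebuild the single unavailable bucket (still zero-padded) from the others."""
    width = len(parity)
    return reduce(xor_blocks, (b.ljust(width, b"\0") for b in survivors), parity)


group = [b"alpha", b"beta", b"gamma"]
parity = parity_block(group)
rebuilt = recover_missing([group[0], group[2]], parity)
# Stripping padding assumes the data itself has no trailing zero bytes.
print(rebuilt.rstrip(b"\0"))  # b'beta'
```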

The keyword scalable in an SD-architecture is more novel. It appeared in the early 1990s and basically means that the performance of data unit access should be independent of the data volume; one often speaks of flat scaleup. For the time to build or scan a relation or file, it means that this time should at worst be a linear function of the size; this property is often termed linear scaleup. If the scan time, or more generally an operation time, becomes too long because of the size of the data collection it operates upon, the speed-up resulting from partitioning the collection over more nodes should be linear as well. While all these goals clearly look like wishful thinking in theory, research has shown that they are often reachable in practice.
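The three properties can be stated compactly. In this formulation, which is ours, D is the data volume, n the number of server nodes, T_key the time of a single key access, T_scan(D, n) the time to scan D spread over n nodes, and c_0, c_1 constants:

```latex
\begin{aligned}
T_{\mathrm{key}}(D) &\approx c_0 && \text{flat scaleup: key access independent of } D\\
T_{\mathrm{scan}}(D,1) &\le c_1\, D && \text{linear scaleup: at worst linear in } D\\
S(n) &= \frac{T_{\mathrm{scan}}(D,1)}{T_{\mathrm{scan}}(D,n)} && \text{speed-up from partitioning over } n \text{ nodes}\\
S(n) = n &\iff T_{\mathrm{scan}}(D,n) = \frac{T_{\mathrm{scan}}(D,1)}{n} && \text{linear speed-up}
\end{aligned}
```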

The goal of scalability puts new requirements on distribution management compared with more traditional architectures. Traditionally, distribution was designed for some fixed collection of data server nodes, often called a cluster [Gray1996]. At some level of scale-up, any cluster must progressively exhaust its storage and CPU capacity and start showing the limitations of a centralized system, which necessarily harms scalability. The new, and only, way out is to distribute the data and processing capabilities dynamically over an appropriate collection of nodes. The collection may need to scale up in the number of nodes or, less often, scale down. Research on the underlying technical issues is active these days.

Probably the most advanced approach to building an SD-architecture is the family of techniques for scalable distributed data structures (SDDSs). This concept appeared in the early 1990s [Litwin1993] and has been actively investigated since. Dozens of references are available at the CERIA Web site [CERIA]. An SDDS is a new type of data structure that dynamically partitions the application data over a collection of available server nodes. The number of servers increases with the data size, while the distribution itself is transparent to the application. The data may remain entirely in distributed RAM or on local disks. For high availability, the partitioned data can also be mirrored or provided with parity data.

The CERIA team has widely recognized competence in SD-architectures based on SDDSs. A number of technical papers are available at [CERIA]. Research co-operation with HP Labs in Palo Alto and IBM Almaden Research led to three US patents (see the IBM Patent Repository through http://www.ibm.com/). Recently, in March 2002, CERIA hosted an international workshop on Distributed Data & Structures (WDAS-2002). The first known prototype of an SDDS manager was also developed by CERIA. A version is available for public non-commercial download at the CERIA Web site. It supports very large data sets in distributed RAM, with demonstrated data unit access a hundred times faster than access to disk.

This know-how and performance should be crucial to ICONS efficiency. CERIA will use them to develop the ICONS SD-architecture, which is planned to be based on SDDSs. More precisely, it should obey the following principles, which we now outline.

The ICONS SD-architecture should be multi-tier. The ICONS private and permanent data should be stored at SDDS server nodes, servers for short. The application agents, whether dealing with knowledge or database management, should interact with SDDS client nodes, clients for short. The servers manage data storage (data buckets) and scalable distributed partitioning. More precisely, an overloaded server may split its bucket, evacuating part of it, usually half of the data, to another node allocated dynamically. The main goal of this process is to keep the data to be processed in distributed RAM. The corresponding performance gain with respect to disk storage (and to centralized or cluster processing of scaling data) should give ICONS application data processing a leverage that previous attempts in the domain crucially lacked.
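A hash-based bucket split that evacuates about half of the data can be sketched in the style of the split step of linear hashing, which LH* schemes use. Integer keys, a single initial bucket, and the function name are assumptions; which bucket splits, and when, is governed by the concrete SDDS scheme and is not modelled here.

```python
from typing import List, Tuple


def split_bucket(keys: List[int], address: int, level: int) -> Tuple[List[int], List[int], int, int]:
    """Split bucket `address` at level j into buckets `address` and `address + 2**j`.

    All keys in the bucket satisfy key % 2**j == address, so rehashing with
    key % 2**(j + 1) sends each key either back to `address` or to the new
    bucket; with well-spread keys, roughly half of them move.
    """
    new_address = address + 2 ** level
    modulus = 2 ** (level + 1)
    kept = [k for k in keys if k % modulus == address]
    moved = [k for k in keys if k % modulus == new_address]
    return kept, moved, new_address, level + 1


kept, moved, new_address, new_level = split_bucket(list(range(16)), address=0, level=0)
print(kept, moved, new_address, new_level)
# [0, 2, 4, 6, 8, 10, 12, 14] [1, 3, 5, 7, 9, 11, 13, 15] 1 1
```

Only the keys of the splitting bucket are touched, which is what lets a split proceed locally while all other servers keep serving queries.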

The clients are not made aware of the splitting process. Each client has an image of the data distribution, not necessarily the actual one. The client uses the image to issue key queries. Such queries address (search, insert, update, delete) data units with identifiers (keys): records, tuples, etc. Since its image can differ from the actual one, the client can send a query to an incorrect ICONS server. All servers therefore have the capability to recognize such a query and forward it towards the server that could be the correct one. This process should ultimately lead, possibly in at most a few hops, to the correct server, which processes the query. It also sends a specific message to the client, termed the Image Adjustment Message (IAM). The client uses this message to adjust its image. The adjusted image may still not be the actual one; however, at least the same addressing error should not happen twice.
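The client image, server-side forwarding, and IAM can be sketched in the style of LH* addressing [Litwin1996], as we understand it, for a file that started with one bucket. It is a single-process simulation: the class names are hypothetical, the "network" is a method call, and the IAM here carries the address and level of the bucket the client first addressed.

```python
from typing import Dict, Tuple


def h(level: int, key: int) -> int:
    """Linear-hashing function family with one initial bucket: h_j(C) = C mod 2^j."""
    return key % (2 ** level)


class LHStarFile:
    """Server side: actual file state (i, n) and the level j of every bucket."""

    def __init__(self) -> None:
        self.i, self.n = 0, 0
        self.level: Dict[int, int] = {0: 0}

    def split(self) -> None:
        # Bucket n splits into n and n + 2^i; both move to level i + 1.
        new = self.n + 2 ** self.i
        self.level[self.n] = self.level[new] = self.i + 1
        self.n += 1
        if self.n >= 2 ** self.i:
            self.i, self.n = self.i + 1, 0

    def correct_address(self, key: int) -> int:
        a = h(self.i, key)
        return h(self.i + 1, key) if a < self.n else a

    def check(self, a: int, key: int) -> int:
        """Server a verifies a key; returns a itself or the bucket to forward to."""
        j = self.level[a]
        target = h(j, key)
        if target != a:
            candidate = h(j - 1, key)
            if a < candidate < target:
                target = candidate
        return target


class Client:
    """Client side: a possibly outdated image (i', n') of the file state."""

    def __init__(self) -> None:
        self.i, self.n = 0, 0

    def address(self, key: int) -> int:
        a = h(self.i, key)
        return h(self.i + 1, key) if a < self.n else a

    def adjust(self, a: int, j: int) -> None:
        """Apply an IAM carrying address a and level j of the bucket first addressed."""
        if j > self.i:
            self.i, self.n = j - 1, a + 1
            if self.n >= 2 ** self.i:
                self.i, self.n = self.i + 1, 0

    def send(self, lh_file: LHStarFile, key: int) -> Tuple[int, int]:
        """Route a key query; returns (bucket that processed it, number of forwards)."""
        first = current = self.address(key)
        hops = 0
        while (nxt := lh_file.check(current, key)) != current:
            current, hops = nxt, hops + 1
        if hops:  # addressing error: the correct server replies with an IAM
            self.adjust(first, lh_file.level[first])
        return current, hops


lh_file = LHStarFile()
for _ in range(13):
    lh_file.split()  # the file grows to (i, n) = (3, 6), i.e. 14 buckets
client = Client()
for key in range(200):
    bucket, hops = client.send(lh_file, key)
    assert bucket == lh_file.correct_address(key) and hops <= 2
print((client.i, client.n), (lh_file.i, lh_file.n))
```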

In addition to key queries, the ICONS SD-architecture should support scan queries. A scan addresses in parallel all servers in some data range or, ultimately, all the servers. The processing time is then basically bounded by the size of the data collection at each server instead of the entire data size. As this size remains fixed, scalability should largely be attained. RAM processing speed should add up to new levels of performance in processing complex operations.
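A scan that addresses all buckets at once and merges their partial results might look like the sketch below. Threads stand in for separate server nodes here (in CPython they would not give CPU parallelism for pure-Python work), and the trivial integer records are an assumption.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

Record = int  # hypothetical record type, kept trivial for the sketch


def local_scan(bucket: List[Record], predicate: Callable[[Record], bool]) -> List[Record]:
    """Work done by one server: touches only its own bucket."""
    return [record for record in bucket if predicate(record)]


def parallel_scan(buckets: List[List[Record]], predicate: Callable[[Record], bool]) -> List[Record]:
    """Send the scan to every bucket at once and merge the partial results."""
    with ThreadPoolExecutor(max_workers=len(buckets)) as pool:
        partials = pool.map(lambda bucket: local_scan(bucket, predicate), buckets)
        return [record for part in partials for record in part]


buckets = [[1, 2, 3], [4, 5], [6, 7, 8]]
print(parallel_scan(buckets, lambda r: r % 2 == 0))  # [2, 4, 6, 8]
```

The elapsed time is governed by the largest single bucket rather than by the total data size, which is the scalability argument made above.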

One new problem with scans is that a client may not know all the servers it should address. Hence, it may send the query to only some of them. The servers should forward the query to those that did not get it, and the process should guarantee that each server gets the query once and only once. The client then gets the replies. There are several policies for organizing their reception so as to avoid overloading the client. Furthermore, the client can choose between probabilistic and deterministic termination protocols. The former means that the client terminates when no further reply comes after some time-out. The latter corresponds to a subsumption algorithm that guarantees that all replies were received.
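Both termination styles can be sketched briefly. For the deterministic one we assume an LH*-style file with one initial bucket, where each reply carries the replying bucket's address and level j: such a bucket owns 1/2^j of the hash space, so the scan is complete exactly when the shares of the distinct replying buckets sum to 1. This coverage test is our illustration, not necessarily the ICONS algorithm.

```python
import queue
from fractions import Fraction
from typing import Iterable, List, Tuple


def deterministic_complete(replies: Iterable[Tuple[int, int]]) -> bool:
    """True once the replies, as (bucket address, bucket level) pairs, cover all keys."""
    covered = sum((Fraction(1, 2 ** level) for _, level in set(replies)), Fraction(0))
    return covered == 1


def collect_until_timeout(reply_queue: "queue.Queue[Tuple[int, int]]",
                          timeout: float) -> List[Tuple[int, int]]:
    """Probabilistic termination: stop when no reply arrives within `timeout` seconds."""
    replies: List[Tuple[int, int]] = []
    while True:
        try:
            replies.append(reply_queue.get(timeout=timeout))
        except queue.Empty:
            return replies


# Replies from a file with 14 buckets: 0-5 and 8-13 at level 4, 6-7 at level 3.
replies = [(a, 4) for a in range(6)] + [(6, 3), (7, 3)] + [(a, 4) for a in range(8, 14)]
print(deterministic_complete(replies), deterministic_complete(replies[1:]))  # True False
```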

The servers should also guarantee high availability. In ICONS this should be done by providing parity data to groups of servers. A group with parity should then be able to transparently tolerate k ≥ 1 unavailable servers. The degree of protection k should scale up transparently with the collection size. These properties will be provided by a variant of erasure correcting codes derived from the well-known Reed-Solomon error correcting codes [Litwin2000a].

There are various choices for message passing between clients and servers, as well as for the system architecture at each node; these will be analysed during further work. As an overall assumption, standard and popular components will be used whenever possible. Hence, communication should use TCP/IP stacks, with faster UDP messaging, both unicast and multicast, for service messages and with dedicated flow control when needed. Likewise, multithreaded processing at each node seems to be the best basis [Diene2000].

Summing up, the ICONS SD-architecture should offer a number of novel features to meet stringent performance requirements. These features should allow practical acceptance of the project results, since performance is the key need here.

8.2 The ICONS distributed processing optimisation and load balancing

At the gross architecture level, distributed processing optimization for a data management system traditionally relies on load balancing among the nodes and on inter-query optimization at the clients and servers [Ozsu1999]. The main reason is that at this level the semantics of a query is unknown; hence intra-query optimization can only be quite general. Inter-query optimization then consists in executing one query while another query is waiting for a resource, especially a network transfer. The most widely accepted approach is to organize client and server processing as threads manipulating queues. We adopt this approach as the basis for the ICONS SD-architecture as well, for both clients and servers.

In more depth, there should be a query queue at the client, where an application leaves its request. The request consists of the query and its data, or a local pointer to the data. This queue should be read by a number of threads, still to be determined for a given client. Each thread processes a query, finds the addressed server(s), and places the query in an internal send queue. Its role temporarily ends with the request(s) to the sockets to send out the query, using UDP or TCP/IP messaging depending on the case. Other threads may continue data processing during this time, hence realizing the client-side inter-query optimization.
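The client-side pipeline (query queue, worker threads, send queue) can be sketched with Python's `queue` and `threading` modules. The request shape, the modulo-based `locate` stand-in for the client image, and stopping workers with `None` sentinels are assumptions; a real client would hand each outgoing message to a socket instead of collecting it.

```python
import queue
import threading
from typing import List, Optional, Tuple

Query = Tuple[int, int]          # (query_id, key); hypothetical request shape
Outgoing = Tuple[int, int, int]  # (server, query_id, key)


def locate(key: int, n_servers: int) -> int:
    """Stand-in for the client-image lookup; an assumption, not the ICONS design."""
    return key % n_servers


def client_dispatch(requests: List[Query], n_servers: int = 4, n_threads: int = 3) -> List[Outgoing]:
    query_queue: "queue.Queue[Optional[Query]]" = queue.Queue()
    send_queue: "queue.Queue[Outgoing]" = queue.Queue()

    def worker() -> None:
        while True:
            item = query_queue.get()
            if item is None:  # sentinel: the application has no more requests
                return
            query_id, key = item
            # A real client would now hand the message to a UDP or TCP socket.
            send_queue.put((locate(key, n_servers), query_id, key))

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for request in requests:  # the application leaves its requests here
        query_queue.put(request)
    for _ in threads:
        query_queue.put(None)
    for t in threads:
        t.join()
    return [send_queue.get() for _ in range(send_queue.qsize())]


outgoing = client_dispatch([(1, 10), (2, 7), (3, 12)])
print(sorted(outgoing))  # [(0, 3, 12), (2, 1, 10), (3, 2, 7)]
```

Threads may finish in any order, so the result is sorted before printing; the server-side listen queue would follow the same pattern.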

At the server, all incoming requests are to be placed in the listen queue. Several threads process this queue and search or update the storage. Any data to return, as well as IAMs if any, are sent out. A thread working in pipeline mode can then be blocked while its current reply is being sent. The other threads may continue data processing during this time, hence realizing the server-side inter-query optimization.

Several threads at the client listen to the network buffers and transfer incoming replies as soon as possible into a reply queue. In the case of an SDDS this approach is particularly useful, as a key query may be sent to one server while the reply comes from another one. Other threads explore the reply queue, match the replies against the query queue, and finally reply to the applications. Some processing may be in pipeline mode, leaving a thread blocked while it waits for the next data item. Other replies can be processed in the meantime, hence realizing the other facet of the client-side inter-query optimization.
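Matching replies to outstanding queries by query identifier, rather than by the server that was addressed, handles the forwarded-query case described above. In this sketch the reply shape and the lock-protected `pending` dictionary are assumptions, and the reply queue is filled directly instead of by network listener threads.

```python
import queue
import threading
from typing import Dict, List, Optional, Tuple

Reply = Tuple[int, str, str]  # (query_id, replying server, payload); hypothetical shape


def match_replies(pending: Dict[int, str], replies: List[Reply],
                  n_threads: int = 2) -> Dict[int, Tuple[str, str, str]]:
    """Match replies to outstanding queries, whichever server they come from.

    `pending` maps query_id -> query text and is consumed as queries are answered.
    """
    reply_queue: "queue.Queue[Optional[Reply]]" = queue.Queue()
    results: Dict[int, Tuple[str, str, str]] = {}
    lock = threading.Lock()

    def matcher() -> None:
        while True:
            reply = reply_queue.get()
            if reply is None:  # sentinel: no more replies
                return
            query_id, server, payload = reply
            with lock:
                query = pending.pop(query_id, None)  # None: unknown or late reply
                if query is not None:
                    results[query_id] = (query, server, payload)

    threads = [threading.Thread(target=matcher) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for reply in replies:  # in practice: filled by network listener threads
        reply_queue.put(reply)
    for _ in threads:
        reply_queue.put(None)
    for t in threads:
        t.join()
    return results


pending = {1: "search k7", 2: "search k9"}
results = match_replies(pending, [(2, "server-5", "v9"), (1, "server-3", "v7")])
print(results[1])  # ('search k7', 'server-3', 'v7')
```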

Likewise, the load on each server in the ICONS SD-architecture should, as far as possible, match its processing capability. Numerous research results, especially on load balancing in parallel DBMSs, show that processing load balancing usually follows data load balancing [Vocking2002]. Sophisticated and complex research attempts at processing load balancing based on analysing query frequency, resource consumption, etc. have not yet gained any practical acceptance. An SDDS may therefore allow for load balancing in at least two ways, following similar ideas used in parallel DBMSs. These offer hash partitioning of the application data (e.g., DB2), range partitioning (e.g., SQL Server), or both (e.g., Oracle).

Hash partitioning is the most commonly used. A well-performing hash function randomizes data location and makes the server load naturally uniform. It can be extended to double or triple hashing with symmetric or asymmetric record placement schemes [Vocking2002]. In our case, this type of balancing translates into a scalable distributed hash partitioning scheme. An LH* type of scheme appears to be the best candidate [Litwin1996], especially since variants of this scheme are known that also provide high availability [Litwin2000], among other features (see [CERIA]).
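The effect of double hashing with symmetric placement can be illustrated by comparing one random choice per record with two. Random candidates stand in for independent hash functions, and placing each record on the less loaded of its two candidates assumes both are checked on insertion (and on lookup); the figures are illustrative only.

```python
import random
from typing import List


def place(n_items: int, n_servers: int, choices: int, seed: int = 0) -> List[int]:
    """Place items on servers: each item gets `choices` random candidate servers
    (standing in for independent hash functions) and goes to the least loaded one."""
    rng = random.Random(seed)
    load = [0] * n_servers
    for _ in range(n_items):
        candidates = [rng.randrange(n_servers) for _ in range(choices)]
        target = min(candidates, key=lambda s: load[s])
        load[target] += 1
    return load


single = place(100_000, 100, choices=1)
double = place(100_000, 100, choices=2)
print(max(single), max(double))  # the mean load per server is 1000
```

With one choice the most loaded server deviates noticeably from the mean, while with two choices the maximum stays very close to it.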



Range partitioning is another common type of partitioning. It leads to an ordered collection of data. In our case, an RP* scheme appears to be the best candidate. At present, such schemes provide ranges such that each server stores about the same number of data items. As with hashing, this property usually provides good load balancing. However, the opposite situation is naturally more frequent than with hashing. Consider, for instance, range partitioning of a region's phone book with the city as partitioning key, where some cities host important administrative centres whose phone numbers are therefore retrieved much more often. This would place a higher processing load on the servers whose ranges include those cities.

The solution we plan for the ICONS SD-architecture consists of a modification of the RP* schemes [Diene2000], to be selected later for ICONS needs, so that ranges on overloaded servers are made dynamically smaller, for instance halved. Such a decision can be made locally by each server, on the basis of its own statistics compared with those of other servers. Through the splits triggered by the range change, the data items of an overloaded server spread over several servers, and the processing load rebalances accordingly. Likewise, under-loaded servers could merge.
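Range halving on overloaded servers can be sketched as follows. Numeric keys, a simple access counter per server, and the "more than twice the mean load" threshold are our assumptions. For brevity the decision is taken in one place over all servers; in ICONS each server would decide locally. Merging under-loaded servers is not shown.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RangeServer:
    low: float     # inclusive lower bound of the key range
    high: float    # exclusive upper bound
    hits: int = 0  # accesses counted since the last rebalancing


def rebalance(servers: List[RangeServer], factor: float = 2.0) -> List[RangeServer]:
    """Halve the range of every server whose load exceeds `factor` times the mean."""
    mean_hits = sum(s.hits for s in servers) / len(servers)
    result: List[RangeServer] = []
    for s in servers:
        if s.hits > factor * mean_hits:
            mid = (s.low + s.high) / 2
            # The upper half (and its records) moves to a newly allocated server;
            # both counters restart, since the old statistics no longer apply.
            result += [RangeServer(s.low, mid), RangeServer(mid, s.high)]
        else:
            result.append(s)
    return result


servers = [RangeServer(0, 10, 5), RangeServer(10, 20, 50), RangeServer(20, 30, 5)]
print([(s.low, s.high) for s in rebalance(servers)])
# [(0, 10), (10, 15.0), (15.0, 20), (20, 30)]
```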

Summing up, distributed processing optimization and load balancing are complex matters. At the SD-architecture level in particular, query semantics is unknown, so one should concentrate on inter-query optimization and data load balancing [Ozsu1999], [Vocking2002]. The ICONS solution for the former should rely on threads co-operating through queues, at both servers and clients. The latter will be based on load balancing that generalizes the more traditional, widely used data partitioning techniques to the scalable distributed environment.

8.3 The ICONS distributed workflow process communication and synchronisation

One of the most challenging features of WFMSs is workflow interoperability. Such interoperability enables two or more workflow engines to communicate and work together to co-ordinate their work.

There are several different models of workflow co-operation, namely: the chained process model, the nestedsubprocess model, and the parallel synchronised model.

Figure 13. Models of workflow co-operation.

In the chained process model, after one workflow process is completed, another workflow process inherits the processing and starts. This is the most basic model. In the nested subprocess model, one workflow process has part of its processing done by another workflow process. In the parallel synchronised model, two workflow processes that proceed independently become synchronised at some point, exchange information, and then continue independently. When an activity reaches the synchronisation point, it waits for the other to arrive there, and then they exchange information.
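The parallel synchronised model can be illustrated with a barrier. In this sketch the two "workflow processes" are threads in one program and the exchanged values are made up; between distributed workflow engines, the barrier would be replaced by synchronisation messages.

```python
import threading
from typing import Dict


def run_parallel_synchronised() -> Dict[str, str]:
    """Two workflow processes meet at a synchronisation point and swap data."""
    barrier = threading.Barrier(2)
    outbox: Dict[str, str] = {}  # what each process publishes at the sync point
    inbox: Dict[str, str] = {}   # what each process receives from its partner

    def process(name: str, partner: str, info: str) -> None:
        # ... activities executed independently before the synchronisation point ...
        outbox[name] = info
        barrier.wait()                 # wait for the other process to arrive
        inbox[name] = outbox[partner]  # exchange information
        # ... activities executed independently after the synchronisation point ...

    threads = [
        threading.Thread(target=process, args=("purchase", "delivery", "order approved")),
        threading.Thread(target=process, args=("delivery", "purchase", "goods shipped")),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return inbox


print(run_parallel_synchronised())
```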

On the basis of the WfMC’s reference model and the Interface 4 standard described in [WfMC1996], the Object Management Group (OMG) has developed the JointFlow specification. JointFlow defines a framework for distributed workflow applications in the world of business objects [OMG1998]. This specification enables interoperability of workflow process components, monitoring of workflow execution, and association of workflow components with the resources involved in a workflow process. In the next step, the Simple Workflow Access Protocol (SWAP) was developed. SWAP was envisioned as a binding of the JointFlow object model and related WfMC standards to an HTTP-based interaction protocol. Finally, in 1999, the WfMC presented the Wf-XML specification. This specification enhances some of its predecessors’ capabilities, providing:
• a structured and well-formed XML body protocol consisting of messages that contain headers and data,
• a logical interaction model with synchronous, asynchronous, and batch capabilities,
• independence from transport mechanisms,
• easy extensibility through the use of XML and dynamic workflow context data.

In synchronous messaging, a process A may wish to initiate a sub-process and suspend its normal processing until that sub-process completes. In asynchronous messaging, the initiating process sends a request to the enacting process. The enacting process then sends only an acknowledgement back to the initiator, confirming that the request has been received. At some later point in time, the enacting process sends a response to the initiating process, which then sends an acknowledgement back to the enacting process, confirming that it has received the response. In batch messaging, multiple Wf-XML interactions can be placed in a single message.

In the ICONS project we will implement the Wf-XML specification and use e-mail to transport XML workflow messages.
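Wrapping an XML workflow message in an e-mail can be sketched with Python's standard `xml.etree.ElementTree` and `email.message.EmailMessage`. All element names, addresses, and the operation name are hypothetical placeholders, not the Wf-XML schema; only the header-plus-data structure follows the description above. Sending would be a separate step (e.g. `smtplib.SMTP.send_message`).

```python
import xml.etree.ElementTree as ET
from email.message import EmailMessage
from typing import Dict


def build_workflow_mail(sender: str, recipient: str, operation: str,
                        process_id: str, data: Dict[str, str]) -> EmailMessage:
    """Wrap an XML workflow message (header + data) in an e-mail.

    All element names are hypothetical placeholders, not the Wf-XML schema.
    """
    root = ET.Element("WorkflowMessage")
    header = ET.SubElement(root, "Header")
    ET.SubElement(header, "Operation").text = operation
    ET.SubElement(header, "ProcessInstance").text = process_id
    body = ET.SubElement(root, "Data")
    for name, value in data.items():
        ET.SubElement(body, "Item", name=name).text = value

    mail = EmailMessage()
    mail["From"] = sender
    mail["To"] = recipient
    mail["Subject"] = f"{operation} {process_id}"
    mail.set_content(ET.tostring(root, encoding="unicode"), subtype="xml")
    return mail


mail = build_workflow_mail("engine-a@example.org", "engine-b@example.org",
                           "CreateProcessInstance", "P-17", {"amount": "100"})
print(mail.get_content_type())  # text/xml
```

The receiving engine would recover the XML with `ET.fromstring(mail.get_content())`.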



9. Demonstration of ICONS prototype capabilities

9.1 The “Newly-associated States Best Practices” Portal

9.1.1 Introduction

There is a proliferation of Web content management systems in various application realms. The integration of internal information repositories with external data sources is the current trend in the architecture of management information systems. Examples of active development in the areas of government, the energy industry, and general B2B systems are presented in [Ambite2001, Bouguettaya2001, Elmagarmid2001, Mecella2001, Shim2000].

Although current systems are designed according to disciplined life-cycles based on various design methodologies, there is a clear need to formulate a life-cycle and an underlying methodology for the development of large-scale, knowledge-based content management systems. Such a methodology must be substantiated by at least a pilot development of an application based on an intelligent content management system.

The novelty of the ICONS project with respect to this objective is exemplified by the following solution characteristics:
1. Specification of a prototype life-cycle and the underlying methodology for the design and development of intelligent content management system applications.
2. Demonstration of the viability of the ICONS architecture and application development methodology by developing a pilot knowledge-based content management application.

In terms of project organisation, all this corresponds to Objective 4 of the ICONS project, i.e. to develop an analysis and design methodology for large, knowledge-based content repository systems.

ICONS research results, especially those related to the ten technologies identified in Section 4 as ‘to be developed’, will be demonstrated both at the application level and at the methodological level. The planned work (WP7) includes three tasks and corresponding deliverables: T1 -> D35, T2 -> D24, and T3 -> D25. T1 has already started: D35 “Conceptual analysis of ‘NAS Best Practices’ portal” will be developed first. D35 will contain the essential requirements for the ICONS prototype. These requirements will provide relevant “attraction” points for technology developers active in other WPs. During the last semester of the project, the same requirements, possibly updated, will serve as the basis for the development of the prototype (D25). Another basic input will be provided by D24 “The knowledge-based content management application design methodology”.

To be specific, we first describe the pilot application.

The specific objectives of the ICONS prototype portal and its pilot application ‘NAS Best Practices’ are: development and publishing, for general use over the Internet, of a knowledge repository concerning procedures, management practices, and “best practice” projects funded by PHARE, ISPA, and SAPARD funds. The knowledge repository is to contain public information to be made available to all interested parties over the Internet [ICONS D02].

9.1.1.1 NAS

The term “Newly Associated States (NAS)” in fact denotes the ten candidates for EU membership from Central and Eastern Europe (CEE); see [Enlarg-Report-2001]. These candidates are Bulgaria, the Czech Republic, Estonia, Latvia, Hungary, Lithuania, Poland, Romania, Slovakia, and Slovenia.

“This year’s Regular Reports and the present stage of the accession negotiations do not yet allow the Commission to conclude that the conditions for accession are fulfilled by any of the candidate countries. Among the twelve negotiating⁴ countries, ten have target dates of accession compatible with the Göteborg timeframe. The Union should therefore be prepared to conclude accession negotiations by the end of the Danish Presidency in 2002, in view of accession in 2004, with all countries meeting the necessary conditions. Necessary administrative preparations inside the Institutions are already under way and should be continued.” (Conclusion/§4)

⁴ “The Copenhagen political criteria continue to be met by all presently negotiating candidate countries. Turkey still does not meet these criteria.” (Political criteria/Conclusions of [Enlargement-Rep2001].)

“The 2002 Regular Reports will examine whether the candidate countries will have, by accession, adequateadministrative capacity to implement and enforce the acquis.” (Conclusion/§5)

If we look at such a regular report, e.g. for Poland, we see that progress towards the adoption of the acquis is examined in 29 chapters. Here is the list of examined topics:

1: Free movement of goods
2: Free movement of persons
3: Freedom to provide services
4: Free movement of capital
5: Company law
6: Competition policy
7: Agriculture
8: Fisheries
9: Transport policy
10: Taxation
11: Economic and monetary union
12: Statistics
13: Social policy and employment
14: Energy
15: Industrial policy
16: Small and medium-sized enterprises
17: Science and research
18: Education and training
19: Telecommunications and information technologies
20: Culture and audio-visual policy
21: Regional policy and co-ordination of structural instruments
22: Environment
23: Consumers and health protection
24: Co-operation in the field of justice and home affairs
25: Customs union
26: External relations
27: Common foreign and security policy
28: Financial control
29: Financial and budgetary provisions
Plus: Translation of the acquis into the national languages

Table 9. Checklist of the acquis (chapters in Regular Reports).

In [Enlargement-Rep2001-A] it can be seen that for Poland 11 chapters were still in negotiation in 2001.

9.1.1.2 Phare, ISPA, and Sapard

“During the period 2000-2006 financial assistance from the European Communities to the candidate countries of Central and Eastern Europe will be provided through three instruments: the Phare programme (Council Regulation 3906/89), ISPA (Council Regulation 1267/99) and Sapard (Council Regulation 1268/99)...” The ten countries are listed above. Turkey, Cyprus, and Malta have access to other funds (namely MEDA). Note that Phare funds have existed since 1989.

A synthetic, very simplified view of Phare is given in the following table. We skip the other two instruments to remain focused on the ICONS project, and because Phare is the most important and the best documented instrument.

Item: Phare

Aim/name: To assist the candidate countries of central Europe in their preparations for joining the European Union.

Budget: For the period 1995-99, funding under Phare totalled roughly EUR 6.7 billion and covered fifteen sectors, the main five of which were:
• infrastructure
• development of the private sector
• education, training and research
• environmental protection and nuclear safety
• agricultural restructuring.
The revamped Phare programme, with a budget of over EUR 10 billion for the period 2000-2006, now has two specific priorities, namely institution building and financing investments [EU-Glossary].

Instrument(s): Accession Partnerships; National Programmes for the Adoption of the Acquis (NPAAs); Regular Reports.

Reforms: Phare has existed since 1989. In 1997, important reforms were introduced (decentralisation/deconcentration). An Extended Decentralised Implementation System (EDIS) is currently being prepared. New approaches should help the countries prepare for a smooth transition from pre-accession assistance to Structural Funds.

Web sites: Phare: http://europa.eu.int/comm/enlargement/pas/phare/index.htm; Tenders (EuropeAid): http://europa.eu.int/comm/europeaid/cgi/frame12.pl

Control of the EC: ex-ante.

Main actors: National Aid Co-ordinator (NAC); National Authorising Officer (NAO); National Fund; Implementation Agencies (IAs); Central Financing and Contracting Unit (CFCU); EC Delegations; DG for Enlargement; EuropeAid Cooperation Office (formerly the SCR); final beneficiaries (institutions, municipalities, ministries).

Practical Guide: The main features of the Practical Guide fall into three categories (simplification and harmonisation; increased transparency and more rights for companies participating in tenders; eligibility criteria and other essentials); for Sapard, the guide applies only to procurement. There are three types of contracts: services, supplies, and works. Procedures (more or less complex) vary according to the type of contract and its value. See: Practical Guide to Phare, ISPA and SAPARD Contract Procedures at http://europa.eu.int/comm/enlargement/pas/phare/procedures.htm#6.1

Table 10. Overview of Phare.

9.1.1.3 “Best Practices”

To become members of the EU, candidate countries have to implement a large number of reforms. To help them, funds are made available by the present Member States through the central services of the EC, located mainly in Brussels, and through ‘deconcentrated’ services, i.e. the EC delegations.

Since Phare was established, thousands of projects have been tendered, contracted, implemented, assessed, and audited. The fact that the organisational and procedural context of these projects evolves indicates that lessons learned by various actors have been, explicitly or not, transformed into knowledge and eventually into changes in rules and procedures.

Phare programming (see [Phare_Review_2000]) is obviously complex: it spans at least four years and involves multiple institutions and responsibility functions. Even if we limit ourselves to two phases:
1. “Implementation / Tenders / Contracts and Management”
2. “Monitoring and Assessment Reports”
it is clear that large amounts of information, structured or not, quantitative or not, could be considered as prime material for our prototype application of ICONS. Here are some examples of best practice that could be supported by our prototype:

Elements of context | | Relevant knowledge (adapted to context!)
Step | Main actor/unit | Elements/questions for best practice
Project design | IA and Beneficiary (see [Phare_Review_2000, p. 44] for main requirements) | Which chapters/sections of the acquis are relevant? Criteria for combining supplies with services or separating them. How to estimate the necessary budget and duration (study, tendering, implementation)? Technical specifications or terms of reference drafted by an ad hoc tender expert or by the Beneficiary? Big projects or numerous smaller ones?
Tender preparation | CFCU | Are variants allowed in tenders? Is a clarification meeting desirable? Which sections of the Practical Guide apply? Should a visit of premises by tenderers be organised (instead of more detailed specifications)? How to formulate evaluation criteria for service contracts?
Tender evaluation | CFCU | Composition of the evaluation committee; duration of the evaluation
Prequalification | CFCU & Beneficiary | Optimal length of the “shortlist”
Tendering | Tenderer | How to evaluate own strong and weak points and compare them with other shortlisted firms?
Contracting | CFCU | Addenda, if any; payment schedules; guarantees, certificates of origin: sources of administrative problems?
Project realisation | Beneficiary | Required reporting (frequency, details, languages)
Financial control | CFCU | Was the budget correct?
Assessment of results | Beneficiary and EC | Assessment costs, duration, and results
Assessment of results | Contractor | Added value for “goodwill”, individual experts; need for developing other/new skills

http://europa.eu.int/comm/europeaid/cgi/frame12.pl

Table 11. Best practice taxonomy.

[Figure: data sources (original documents, DB extracts, other knowledge bases; centralised/decentralised) feed data collection processes into the ICONS portal (query manager, end-user interface, information & knowledge base, metadata/ontologies?); knowledge workers and the system manager provide control, information quality and knowledge enhancements; end users (EC representatives central/local, CFCU & IAs, beneficiaries, tenderers/contractors, experts, national coordination & deciders) query within their context of interest and receive usable knowledge, with feedback.]

Figure 14. Main Concept of ICONS portal for NAS Best Practice.

The main functional requirements are outlined hereafter, considering the different ‘actors’ in turn. It must be underlined that during the elaboration of D35 these requirements will be made more precise and also adapted to the data which will actually be made available (see Remarks below).

9.1.1.4 Actors (1): End Users
These end users will be, or belong to, the institutions listed in Table 10. End users will be identified both personally and by their role, if the latter is not unique.

Possible outputs of the system are relevant parts of:
1. NPAA, NPD, Regular Reports & Negotiations
2. Funding programmes
3. Community legislation in force, national legislation
4. Fund request procedures
5. Forms/templates, contact points
6. Success stories
7. Calls for tenders (including technical specifications or terms of reference (ToRs))
8. Contracts and addenda, if any
9. Project implementation reports (from contractors)
10. Project assessment reports (from independent auditors/assessors)
...

selected on the basis of the end user's (assumed) interests combined with his or her current indications.

A second kind of output of the system is advanced queries on DBs describing projects and their progress: EC DBs like DESIREE and PERSEUS, or their successors, plus national DBs such as the newly developed PENELOPA in Poland.

9.1.1.5 Actors (2): Knowledge Workers
Application Developers and Maintainers
On the basis of the common structure of the text documents (e.g. ToR, tender forecast), they will establish the links between classes of document representations and classes of relevant and queriable DBs (where permitted). The management and maintenance of the necessary ontologies is an essential part of their work.



In particular, how end users' own knowledge (or experience) in their domain of interest can be amplified by the system over successive interactions, and how this knowledge can be made reusable by other users facing similar problems, is of the utmost importance.
Data Collectors
Data sources will evolve, URLs are modified, and decision centres can change (centrally, locally); therefore, means to detect these changes have to be established.

The need for, and permission to make, cache copies of original documents (to guarantee permanent access) have to be established. Co-operation protocols therefore need to be defined, especially where classified information has to be accessed, either as such by authorised end users and/or only through statistical queries (reductions).
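The change detection that Data Collectors need can be illustrated with a minimal content-fingerprinting sketch. It assumes that each crawl yields the raw bytes of every monitored URL; the helper names `digest` and `detect_source_changes` and the example URLs are hypothetical, not part of ICONS. Only the standard `hashlib` module is used.

```python
import hashlib
from typing import Dict, List

def digest(content: bytes) -> str:
    """Fingerprint a fetched document so later fetches can be compared cheaply."""
    return hashlib.sha256(content).hexdigest()

def detect_source_changes(known: Dict[str, str], fetched: Dict[str, bytes]) -> Dict[str, List[str]]:
    """Compare stored fingerprints (url -> digest) with a fresh crawl (url -> raw content).

    Hypothetical helper: reports URLs that disappeared, appeared, or whose
    content changed, so that knowledge workers can be alerted.
    """
    current = {url: digest(body) for url, body in fetched.items()}
    return {
        "removed": sorted(known.keys() - current.keys()),
        "added": sorted(current.keys() - known.keys()),
        "changed": sorted(u for u in known.keys() & current.keys() if known[u] != current[u]),
    }

# Made-up example: one page was edited, another moved to a new address.
previous = {
    "http://example.org/tenders": digest(b"v1"),
    "http://example.org/old-guide": digest(b"guide"),
}
crawl = {
    "http://example.org/tenders": b"v2",
    "http://example.org/new-guide": b"guide",
}
print(detect_source_changes(previous, crawl))
```

A moved page shows up as a removal plus an addition carrying the same digest, which is a cheap hint that a URL was modified rather than the content withdrawn.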

9.1.1.6 Actors (3): System Administrators
Their functions are classical. They will be responsible for granting authenticated actors permission to access specific knowledge/data. The management and monitoring of all security and availability aspects of the system will be in their hands.

9.1.2 Key Issues for Application Development
Reminder 1: ICONS => knowledge-based access to pre-existing, distributed information in various forms (web pages, databases, legacy information systems, etc.)

Reminder 2: in general, Knowledge = “understanding gained from experience”, [Weidner2002, p. 18].

Hence a working ICONS has to be developed and put into operation progressively: knowledge needs knowledge to grow, and this growth will be more sustainable if the right information is effectively identified and made accessible in the most efficient way.

9.1.2.1 The Idea
Knowledge growth can be viewed as a spiral made of successive Knowledge Life Cycles (KLCs). Initially, a basic ontology is selected to seed the system, together with a minimal set of information sources and reference documents.

The following cycles will develop and consolidate what has been integrated in the previous cycles:
• Ontology cycle: enrich the initial ontology (domains of interest such as chapters of the acquis, technologies (IT, environment, civil engineering, agro-bio-technologies, etc.), time models (dates), space (countries, regions, borders, rivers...), programmes, actors, projects, etc.); add connections between these main concepts.⁵ The ontology is subject to validation; hence the status of ontological objects has to be managed too.

• Knowledge extraction: establish mechanisms (intelligent agents) (i) to identify existing and accessible sources of information, (ii) to extract knowledge from these sources using standardised RDF, and (iii) to populate an ‘extensional’ base of “facts” (EDB).

• Knowledge derivation: establish knowledge production rules (an ‘intensional’ DB) to derive additional knowledge from the base of facts (EDB plus already derived and integrated facts).

• Intelligent access: combining informational goals expressed by end users, deliver relevant facts and supporting information (original documents or parts thereof).

Finally, current achievements will be assessed, and extensions or improvements proposed.
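The split between an extensional base of facts (EDB) and production rules (the intensional DB) is the classical Datalog one. The sketch below shows naive bottom-up evaluation: every rule is applied to all known facts until no new fact appears. It is a toy under stated assumptions (facts are tuples, variables start with `?`, every head variable also occurs in the rule body), not the ICONS Datalog Inference Engine; the predicates `follow_up` and `lineage` and the project identifiers are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

Atom = Tuple[str, ...]          # (predicate, arg1, arg2, ...); "?x" marks a variable
Rule = Tuple[Atom, List[Atom]]  # (head, body): the head holds if all body atoms hold

def is_var(term: str) -> bool:
    return term.startswith("?")

def match(atom: Atom, fact: Atom, binding: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Extend `binding` so that `atom` equals `fact`; return None if impossible."""
    if atom[0] != fact[0] or len(atom) != len(fact):
        return None
    extended = dict(binding)
    for term, value in zip(atom[1:], fact[1:]):
        if is_var(term):
            if extended.setdefault(term, value) != value:
                return None  # variable already bound to a different constant
        elif term != value:
            return None      # constant mismatch
    return extended

def evaluate(edb: Set[Atom], rules: List[Rule]) -> Set[Atom]:
    """Naive bottom-up evaluation: apply every rule until a fixpoint is reached."""
    facts = set(edb)
    while True:
        derived: Set[Atom] = set()
        for head, body in rules:
            bindings: List[Dict[str, str]] = [{}]
            for atom in body:  # join the body atoms left to right
                next_bindings = []
                for b in bindings:
                    for f in facts:
                        extended = match(atom, f, b)
                        if extended is not None:
                            next_bindings.append(extended)
                bindings = next_bindings
            for b in bindings:
                derived.add((head[0],) + tuple(b[t] if is_var(t) else t for t in head[1:]))
        if derived <= facts:
            return facts  # nothing new: fixpoint reached
        facts |= derived

# Hypothetical EDB: which project follows up which.
edb = {("follow_up", "p2", "p1"), ("follow_up", "p3", "p2")}
rules = [
    (("lineage", "?x", "?y"), [("follow_up", "?x", "?y")]),
    (("lineage", "?x", "?z"), [("follow_up", "?x", "?y"), ("lineage", "?y", "?z")]),
]
print(sorted(f for f in evaluate(edb, rules) if f[0] == "lineage"))
```

The fact that p3 descends from p1 is derived rather than stored. Naive evaluation recomputes every join in each round; semi-naive evaluation, which joins only with facts new in the previous round, is the usual refinement for larger EDBs.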

It must be underlined that the underlying workflow and co-operation mechanisms between human knowledge workers and automated agents (hence also their developers) constitute a pervasive challenge for this project. Schematically, a prototype development cycle is as follows:

5 See references given in Section 9.2.1, e.g. [Holsapple2002]; initial concepts can be drawn from EU and Phare glossaries published on the Web (e.g. [EU-Glossary], [Phare-Glossary], and [PG-Glossary]).



[Figure: knowledge life cycle with the stages INIT, knowledge PRODUCTION, knowledge CLAIMS, knowledge VALIDATION, RELIABLE knowledge, knowledge INTEGRATION, and feedback into the next cycle.]

Initial configuration: content sources: well-known URLs (<50) + reference documents; related intelligent agents: none; ontology: main topics; knowledge production rules: empty; knowledge workers: none (only the administrator); workflow: to manage knowledge workers.

Feedback: adapt/change/extend...; ontology: refine, relate, cut; more content sources; intelligent agents: knowledge extraction, knowledge production rules; knowledge workers: more roles; workflow: to support change processes (ontology, agents, rules, sources).

Figure 15. The Knowledge life cycle of the NAS Best Practices Portal.

N.B. After its validation, knowledge is ready for integration, i.e. for use and dissemination within the user community. It is not restricted to one organisation, nor does it have to be “organisational” in nature. Four cycles will be necessary to reach a situation where “intelligent access” is really meaningful in the prototype application.
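The status management required for ontological objects and the life cycle of Figure 15 can be sketched as a small state machine in which only validated knowledge may be integrated. The state names follow the figure; the `REJECTED` state and its rework transition are assumptions added for illustration, as are the class and method names.

```python
from enum import Enum
from typing import Dict, List, Set

class KStatus(Enum):
    PRODUCED = "produced"
    CLAIM = "claim"
    VALIDATED = "validated"    # "reliable knowledge" in Figure 15
    INTEGRATED = "integrated"
    REJECTED = "rejected"      # assumption: not shown in Figure 15

# Permitted status changes; any other move is refused.
TRANSITIONS: Dict[KStatus, Set[KStatus]] = {
    KStatus.PRODUCED: {KStatus.CLAIM},
    KStatus.CLAIM: {KStatus.VALIDATED, KStatus.REJECTED},
    KStatus.VALIDATED: {KStatus.INTEGRATED},
    KStatus.INTEGRATED: set(),
    KStatus.REJECTED: {KStatus.PRODUCED},  # assumption: a rejected claim may be reworked
}

class KnowledgeItem:
    """A piece of knowledge (e.g. an ontology concept or a derived fact) with a managed status."""

    def __init__(self, label: str) -> None:
        self.label = label
        self.status = KStatus.PRODUCED
        self.history: List[KStatus] = [self.status]

    def move_to(self, new_status: KStatus) -> None:
        if new_status not in TRANSITIONS[self.status]:
            raise ValueError(f"{self.label}: {self.status.value} -> {new_status.value} not allowed")
        self.status = new_status
        self.history.append(new_status)

item = KnowledgeItem("concept: NPAA")
item.move_to(KStatus.CLAIM)
try:
    item.move_to(KStatus.INTEGRATED)  # refused: the claim has not been validated yet
except ValueError as err:
    print(err)
item.move_to(KStatus.VALIDATED)
item.move_to(KStatus.INTEGRATED)
```

Keeping the history list makes it possible to report, for any integrated item, when and through which states it passed, which is one way to support the validation requirement.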

Key Technological Issues
The key issues are further identified in Table 12. Technologies on a grey background are those “to be developed” in the ICONS project.

Technology | Remarks by development cycle focus (Ontology, Knowledge extraction, Knowledge production, Intelligent access)
Ontology Model Manager (OMM): functions to create, maintain, and use knowledge representation structures, plus formal knowledge representation pertaining to a particular application domain | Time modelling has to be available from the outset; example: “concept of NPAA exists since yyyy.mm.dd” (application-domain time, to be distinguished from system time).
Structural Knowledge Navigator (SKN): makes OMM available to other KMS modules | Link predicates can be time dependent.
Content Categorisation Engine (potentially integrated into the ICONS architecture) | Essential when metadata are missing, or to assess the quality of original information.
Datalog Inference Engine | Knowledge extraction: seamless interface with the ontology, i.e. the metadata of the EDB, the extensional DB used to collect facts. Knowledge production: seamless interface with the ontology, i.e. the metadata of the IDB, the intensional database used to deduce facts. Intelligent access: fast queries imply in-core operations on the extensional database (EDB).
Intelligent Workflow Manager | For each cycle a specific workflow has to be developed and implemented. Cooperation between concurrent workflow processes has to be foreseen. Interactions between knowledge workers and intelligent agents will depend on ICONS system time.
Semi-structured Content Integrator (knowledge-based wrapper technology?) | Agents, to be called intelligent content integrators, will provide a uniform view on relevant documents; they also need a seamless interface with the ontology. A uniform view on a large quantity of various documents is a precondition for “intelligent access”.
Intelligent Agent Development Environment (a different specific agent will be central during early KLCs) | Ontology: OMM-A, when the ontology model becomes complex; care must be taken not to undermine integrated knowledge. Knowledge extraction: Ground-A, especially relevant to populate and maintain the EDB⁶ (search for relevant information sources) and to generate uniform descriptors. Knowledge production: Flying-A, to produce derived facts (‘on the fly’, when needed). Intelligent access: IA-A (Intelligent Access Agent), the corner-stone of the prototype HCI; time-dependent, possibly recurrent and/or recursive queries are to be captured and managed.
HCI Personalisation Engine | Dialogue with IA-A essential to end users (not ICONS experts).
Electronic Form Manager | Same relevance as above.
Content Presentation Manager | Ancillary service.
Knowledge Map Graph Manager | Visible interface to OMM.
Structural Knowledge Graph Manager | Visible interface to SKN.
Process Graph Manager: graphical design and monitoring of the state of a particular process instance | Again a challenge, because many users will need to cooperate and will possibly be involved in several KM-related processes at the same time.
Load Balancing Algorithms; Distribution Optimisation Algorithms; Scalable Distributed Data Structure (SDDS) | These three technologies should be totally transparent to end users and to knowledge workers, except that when the usual performance of the (distributed or not) system cannot be achieved, human users (and possibly intelligent agents) should be duly informed.
Distributed Workflow Communication | To support the Intelligent Workflow Manager. During the early cycles of the ICONS prototype, it is better first to ensure good cooperation between workflow processes monitored by a common workflow engine than to tackle heterogeneous workflow engines.

Table 12. Key technological issues for development of the NAS Best Practices Portal.
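Table 12 asks for application-domain time, distinct from system time, and for time-dependent link predicates. A minimal sketch, assuming half-open validity intervals and hypothetical entity names, predicates and dates, keeps both timestamps on each link and answers “which links held on a given day” using application time only.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import Iterable, List, Optional

@dataclass(frozen=True)
class TimedLink:
    subject: str
    predicate: str
    obj: str
    valid_from: date                        # application-domain time: when the link starts to hold
    valid_to: Optional[date] = None         # exclusive end; None means "still holds"
    recorded_at: Optional[datetime] = None  # system time: when the link was stored

    def holds_on(self, day: date) -> bool:
        return self.valid_from <= day and (self.valid_to is None or day < self.valid_to)

def links_on(links: Iterable[TimedLink], day: date) -> List[TimedLink]:
    """Links valid on `day` in the application domain, whatever their system time."""
    return [link for link in links if link.holds_on(day)]

# Hypothetical example: a project changed its implementing agency on 2001-07-01,
# and the old link was recorded in the system only afterwards.
links = [
    TimedLink("project-X", "implemented_by", "IA-1", date(2000, 1, 1), date(2001, 7, 1),
              recorded_at=datetime(2002, 3, 15, 9, 0)),
    TimedLink("project-X", "implemented_by", "IA-2", date(2001, 7, 1)),
]
print([link.obj for link in links_on(links, date(2001, 6, 30))])
print([link.obj for link in links_on(links, date(2002, 1, 1))])
```

Because `recorded_at` is never consulted by `links_on`, facts entered retroactively still answer historical questions correctly; a query “as known to the system on date D” would filter on `recorded_at` instead.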

9.1.3 Key Success Factors
The main one is the capability to provide quality information (or the right document), without duplication or omission.

For filled-in documents (e.g. tender (forecast), ToRs, final reports) it is of course critical to link them together, to the corresponding assessments, and to records in the DBs.

As an example, imagine that a Customs officer (Beneficiary expert) has to create the terms of reference for an MIS development and the corresponding training to introduce Taric (European Customs code) in his administration. With very high probability, similar developments have been completed or are in progress in other candidate countries. Accessing documents related to similar projects, if permitted, would be very helpful. If, in addition, he can access the assessment report(s), he can deduce that similar project X was well “tendered” and realised, except perhaps for the duration of the project (manifestly too short, if it can be observed that the final duration was 3 months longer than in the tender documents).

If nothing similar is found and the Customs officer knows that a similar (Phare-funded) project exists, then it should be possible for him to signal this to the “knowledge workers” in such a way that, next time, the system will be more helpful.

If the system is not allowed to access some known information sources, it should signal this to the user, perhaps with hints about how to reach this information.

9.1.4 Remarks
If the external data are limited to “public data”, it is likely that some of the outlined functionality will have to be reoriented. The project team has to establish contacts with EC officers (in DG Enlargement and the EC delegations) to assess the interest and feasibility of the concept outlined above. It is assumed that, for the prototype, the language of the user interface and of textual contents is English only.
6 Records of the EDB are also known as “ground axioms”.



9.2 The Knowledge Management System Design Methodology

9.2.1 Approaches to Knowledge Management methodologies
Knowledge Management is a very broad area of human interest. It covers various activities, from those of a business and managerial nature to very technical problems associated with building software systems. The result of these activities is a knowledge management system (KMS). Such a system can be treated as a group (corporate) memory [Kuhn1997] that involves not only a software system, but also all the associated business processes and methodologies [Abecker1999]. As presented in previous chapters, the software part of a KMS involves many novel technologies and a complex architecture in which six main groups of functions are identified: domain ontology, knowledge dissemination, content integration, knowledge security, KMS actor collaboration, and content repository management. Each group has its own requirements, typically referring to a particular technology, and each of these features or technologies needs a specific approach to its analysis, design and implementation. On top of that, all the business processes (workflows) associated with acquiring, storing and retrieving knowledge need to be organised appropriately. In fact, IDM, our development methodology introduced below, will consider best practices and specific methodologies and place them within a common framework (methodology architecture).

Design, or rather development, of KM systems is thus a sophisticated process containing many activities and tasks that result in many complex products. This raises an obvious need to arrange these activities systematically and to create guidelines for producing the products, i.e. to specify a KM system development methodology. Such a methodology can be seen as a Knowledge Management metaprocess [Staab2001], in contrast to the Knowledge Management process, which is one of the elements it defines. So far, most of the research in this area has been limited to design methodologies for specific KMS features, such as Domain Ontologies, Content Repositories or Knowledge Dissemination. Other approaches have focused only on the managerial issues associated with introducing a knowledge management system into business organisations [Tiwana2000, Dieng1999]. Especially broad research has been carried out on methodologies for ontology construction, and various methodologies have been proposed [Uschold1995, Sure2001, Maedche2001, Holsapple2002]. An interesting overview of ontology methodologies and their analysis against the IEEE Standard for Developing Software Life Cycle Processes (IEEE Std 1074-1995) can be found in [Lopez1999]. Given such a variety of methods, there are also efforts toward unification [Uschold1996]. The work presented in [Firestone2001] is an exception to the rule of concentrating on one feature of Knowledge Management: it describes a full lifecycle methodology based on an iterative process, with a definition of all the system development disciplines (business modelling, requirements, analysis and design, implementation, project management, etc.). This process can also be seen as a modelling process, where models are constructed incrementally by refining previously built ones (see [Studer1998]). This proposition seems to be a good start for constructing a comprehensive methodology for ICONS-based KMS development projects.

9.2.2 Requirements for defining a comprehensive KMS development methodology
While developing any system, we perform certain tasks, use appropriate techniques, and produce specific deliverables that conform to the technologies used. All these activities are carried out by people playing various roles in a system development project. A Knowledge Management System (KMS) is no exception: such a system contains a software system and a business system surrounding it. To define a methodology for building a KMS we should specify elements from the following three groups (see [Henderson-Sellers1999]):
• technical process
• techniques
• notation (a modelling language).

At a coarser granularity, the process includes not only a methodology but also consideration of the people (organisational culture) and the tools (technology) that are available. The KMS development methodology should thus provide an integrating framework. All projects based on it would use this framework by instantiating the relevant part of it for their own circumstances. This means that “tailoring of the process” should be part of every KMS development project.

9.2.2.1 Technical process
A very important feature of the technical process is incremental (iterative) delivery (see [Firestone2001, Studer1998]). This provides immediate feedback from the users and early verification of the employed architecture and technologies. Such a feature of the software lifecycle is very important when developing systems with new and untested technologies (as is the case for ICONS-based systems).



A project using iterative development has a lifecycle consisting of several iterations. An iteration incorporates a loosely sequential set of activities in business modelling, requirements, analysis and design, implementation, test, and deployment, in various proportions depending on where in the development cycle the iteration is located. Iterations in the inception and elaboration phases focus on management, requirements, and design activities; iterations in the construction phase focus on design, implementation, and test; and iterations in the transition phase focus on test and deployment. Iterations should be managed in a timeboxed fashion, that is, the schedule for an iteration should be regarded as fixed, and the scope of the iteration's content actively managed to meet that schedule. An iterative approach is generally superior to a linear or waterfall approach for many reasons:
• Risks are mitigated earlier, because elements are integrated progressively.
• Changing requirements and tactics are accommodated.
• Improving and refining the product is facilitated, resulting in a more robust product.
• Organisations can learn from this approach and improve their process.
• Reusability is increased.
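Timeboxing fixes the schedule and lets scope give way. A minimal sketch, assuming each work item carries a priority and an effort estimate (hypothetical fields, not defined by the methodology), fills a fixed iteration capacity in priority order and defers whatever does not fit.

```python
from typing import List, NamedTuple, Tuple

class WorkItem(NamedTuple):
    name: str
    priority: int   # 1 = most important (hypothetical convention)
    effort: float   # estimated person-days

def plan_timeboxed_iteration(backlog: List[WorkItem],
                             capacity: float) -> Tuple[List[WorkItem], List[WorkItem]]:
    """Keep the iteration length fixed and cut scope to fit it, most important items first."""
    planned: List[WorkItem] = []
    deferred: List[WorkItem] = []
    remaining = capacity
    for item in sorted(backlog, key=lambda w: w.priority):
        if item.effort <= remaining:
            planned.append(item)
            remaining -= item.effort
        else:
            deferred.append(item)  # pushed to a later iteration instead of extending the timebox
    return planned, deferred

backlog = [
    WorkItem("seed ontology", 1, 5),
    WorkItem("EDB loader agent", 2, 8),
    WorkItem("query form", 3, 4),
]
planned, deferred = plan_timeboxed_iteration(backlog, capacity=10)
print([w.name for w in planned], [w.name for w in deferred])
```

This greedy pass can skip a large high-priority item in favour of a smaller, less important one that still fits; a team may prefer to stop at the first item that does not fit. Either way, the end date of the iteration stays fixed.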

Another feature of the process is requirements management, i.e. a systematic approach to finding, documenting, organising, and tracking a system's changing requirements. More formally, requirements management is a systematic approach to both:
• eliciting, organising, and documenting the requirements of the system, and
• establishing and maintaining agreement between the customer and the project team on the system's changing requirements.

Employing requirements management allows a clear distinction to be drawn between requirements and clear traces to be established between the requirements and their realisations (design models, components). This prevents projects from falling into the following difficulties:
• Requirements are not always obvious, and can come from many sources.
• Requirements are not always easily or clearly expressed in words.
• There are many different types of requirements at different levels of detail.
• The number of requirements can become unmanageable if they are not controlled.
• Requirements are related to one another and also to other deliverables of the software engineering process.
• Requirements have unique properties or property values. For example, they are neither necessarily equally important nor equally easy to meet.
• There are many interested parties, which means requirements need to be managed by cross-functional groups of people.
• Requirements change.

Managing functional requirements is important for KM-type systems, where the value to the user lies not only in the knowledge itself but also in the way this knowledge is used.
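The traces between requirements and their realisations can be checked mechanically. A minimal sketch, assuming trace links are stored as (requirement, design element) pairs with hypothetical identifiers, reports requirements with no realisation, design elements that trace to no requirement, and links that point at unknown items.

```python
from typing import Dict, Iterable, List, Tuple

def trace_report(requirements: Iterable[str], elements: Iterable[str],
                 links: Iterable[Tuple[str, str]]) -> Dict[str, list]:
    """Check (requirement, element) trace links for gaps in both directions."""
    reqs, elems, pairs = set(requirements), set(elements), list(links)
    traced_reqs = {r for r, _ in pairs}
    traced_elems = {e for _, e in pairs}
    return {
        "untraced_requirements": sorted(reqs - traced_reqs),   # nothing realises them yet
        "orphan_elements": sorted(elems - traced_elems),       # design with no stated reason
        "dangling_links": sorted(p for p in pairs if p[0] not in reqs or p[1] not in elems),
    }

report = trace_report(
    requirements=["REQ-search", "REQ-history", "REQ-audit"],
    elements=["QueryManager", "VersionStore", "ThemeEngine"],
    links=[("REQ-search", "QueryManager"),
           ("REQ-history", "VersionStore"),
           ("REQ-export", "QueryManager")],   # REQ-export is not a known requirement
)
print(report)
```

Running such a check at the end of each iteration turns the trace links into an early warning, rather than a document consulted only when something breaks.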

The process should also emphasise constant verification of quality. The quality of all artifacts should be assessed at several points in the project's lifecycle as they mature: artifacts should be evaluated as the activities that produce them are completed and at the conclusion of each iteration. In particular, as executable software is produced, it should be subjected to demonstration and testing of important scenarios in each iteration, which provides a more tangible understanding of design trade-offs and earlier elimination of architectural defects. This is in contrast to a more traditional approach that leaves the testing of integrated software until late in the project's lifecycle.

Finally, the process should also incorporate change management. This is very important in projects (like those based on ICONS) which produce many products that change throughout the lifecycle. Co-ordinating iterations and releases involves establishing and releasing a tested baseline at the completion of each iteration.

Maintaining traceability among the elements of each release, and among elements across multiple parallel releases, is essential for assessing and actively managing the impact of change. Controlling changes to software addresses a number of root causes of software development problems:
• The workflow of requirements change is defined and repeatable.
• Change requests facilitate clear communication.
• Isolated workspaces reduce interference among team members working in parallel.
• Change rate statistics provide good metrics for objectively assessing project status.
• Workspaces contain all artifacts, which facilitates consistency.
• Change propagation is assessable and controlled.
• Changes can be maintained in a robust, customisable system.
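Assessing the impact of a change amounts to following trace links from the changed artifact to everything that depends on it. A breadth-first walk is enough for a sketch; the dependency map and artifact names below are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set

def impacted_artifacts(dependents: Dict[str, List[str]], changed: str) -> Set[str]:
    """All artifacts reachable from `changed` through trace links (breadth-first)."""
    seen = {changed}
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for nxt in dependents.get(current, []):
            if nxt not in seen:  # guards against cycles in the trace graph
                seen.add(nxt)
                queue.append(nxt)
    seen.discard(changed)
    return seen

# Hypothetical trace graph: artifact -> artifacts derived from it.
trace = {
    "REQ-7": ["UseCase-3"],
    "UseCase-3": ["Design-QueryMgr", "Test-12"],
    "Design-QueryMgr": ["Component-QueryMgr"],
    "Component-QueryMgr": ["UseCase-3"],  # a back-reference; the walk still terminates
    "REQ-8": ["UseCase-4"],
}
print(sorted(impacted_artifacts(trace, "REQ-7")))
```

The resulting set is the candidate review list attached to a change request; artifacts outside it (here everything under `REQ-8`) need not be re-examined.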

9.2.2.2 Techniques
To define a development methodology we need to specify “who” does “what” and “how” it should be performed. This leads us to the definition of roles, activities, and, most importantly, techniques. The most central concept in any technical process is that of a role. A role defines the behaviour and responsibilities of an individual, or a set of individuals working together as a team, within the context of a software engineering organisation. Roles are not individuals; instead, they describe how individuals should behave. The mapping from individuals to roles is performed by the project manager when planning and staffing the project.

All roles should have associated activities that define the work they perform. An activity is something that a role does that provides a meaningful result in the context of the project: a unit of work that an individual playing the described role may be asked to perform. An activity has a clear purpose, usually expressed in terms of creating or updating some product, such as a model, a class, or a plan. Every activity is assigned to a specific role. The granularity of an activity is generally a few hours to a few days; it usually involves one role and affects one or only a small number of artifacts. Activities may be repeated several times on the same artifact, especially when going from one iteration to another, refining and expanding the system, by the same role but not necessarily by the same individual.

Activities are broken down into tasks. Tasks fall into three main categories:
• Thinking tasks: the individual performing the role understands the nature of the task, gathers and examines the input artifacts, and formulates the outcome.
• Performing tasks: the individual performing the role creates or updates some artifacts.
• Reviewing tasks: the individual performing the role inspects the results against some criteria.

Tasks have associated techniques, which present practical advice useful to the role performing the activity. Techniques range from project management through to detailed theories and practices for requirements engineering and system modelling. An interesting overview of techniques can be found in [Henderson-Sellers1998].
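The who/what/how structure (roles, activities, tasks, techniques) maps naturally onto a small data model. The sketch below is one hypothetical encoding, not a schema defined by IDM: an activity names exactly one role, each task carries one of the three categories, and the project manager's staffing decisions are kept separately as a role-to-individuals map. All names are illustrative.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List

class TaskKind(Enum):
    THINKING = "thinking"
    PERFORMING = "performing"
    REVIEWING = "reviewing"

@dataclass
class Task:
    kind: TaskKind
    description: str
    techniques: List[str] = field(default_factory=list)  # practical advice attached to the task

@dataclass
class Activity:
    name: str
    role: str             # every activity is assigned to exactly one role
    artifacts: List[str]  # the one or few artifacts it creates or updates
    tasks: List[Task] = field(default_factory=list)

def performers(activity: Activity, staffing: Dict[str, List[str]]) -> List[str]:
    """Individuals the project manager has mapped to the activity's role (roles are not people)."""
    return staffing.get(activity.role, [])

refine = Activity(
    name="Refine domain ontology",
    role="Ontology engineer",
    artifacts=["Domain ontology"],
    tasks=[
        Task(TaskKind.THINKING, "Study change requests against the current ontology"),
        Task(TaskKind.PERFORMING, "Add and relate new concepts", ["concept mapping"]),
        Task(TaskKind.REVIEWING, "Check concepts against the glossary"),
    ],
)
staffing = {"Ontology engineer": ["Anna", "Piotr"]}
print(performers(refine, staffing))
```

Keeping staffing outside the activity definitions means that reassigning people changes one map, while the process description itself stays stable across projects.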

9.2.2.3 Notation
The third important component of a methodology is the notation for the products produced. This gives the development team a common language, which is very important for unambiguous communication when developing a system as complex as a KMS. In a typical project that involves software development and business modelling, we need to produce various artifacts that describe all aspects of the system. The best way to present these aspects is to use a graphical modelling language (visual modelling [Simons1994]). Groups of artifacts that can be represented graphically in a knowledge management system include (for reference, see sections 3.2 and 4):
• Description of the KM business processes and workflows
• Overall architecture of the system
• Definition of the ontology
• Structure of the knowledge base
• Detailed analysis and design models for the software system
• Requirements for human-computer interaction
• Design of a suitable Human Computer Interface (HCI).

Some products cannot be represented graphically alone. These include:
• Vision - defines the stakeholders' view of the product to be developed, specified in terms of the stakeholders' key needs and features. Containing an outline of the envisioned core requirements, it provides the contractual basis for the more detailed technical requirements.

• Glossary - defines important terms used by the project.
• Supplementary specification - captures the non-functional system requirements that are not readily captured in a graphical form. Such requirements include legal and regulatory requirements and application standards; quality attributes of the system to be built, including usability, reliability, performance, and supportability requirements; and other requirements such as operating systems and environments, compatibility requirements, and design constraints.

• Software architecture description - provides a comprehensive architectural overview of the system, using a number of different architectural views to depict different aspects of the system.



• Change request - changes to development artifacts are proposed through Change Requests (CRs). CRs are used to document and track defects, enhancement requests and any other type of request for a change to the product. The benefit of CRs is that they provide a record of decisions and, thanks to their assessment process, ensure that change impacts are understood across the project.

• Software development plan - a comprehensive, composite artifact that gathers all the information required to manage the project. It encloses a number of products developed during the inception phase and is maintained throughout the project.

• Development case - describes the development process chosen for the specific project. This product includes all the decisions associated with tailoring the methodology for the project.

All the above products can be represented in many different forms and notations. It is therefore very important for a development methodology to define a common way of expressing them. The basic role of the methodology here is to prevent different teams from using different notations, which could lead to many communication problems. A methodology should provide precise guidelines for producing coherent and unambiguous products.

9.2.3 The ICONS Development Methodology
One of the important tasks of the ICONS project is to propose a development methodology for systems produced on the basis of its results. We shall call this methodology the ICONS Development Methodology (IDM). Unfortunately, we cannot directly use any of the existing methodologies described in the first section of this chapter. Although none of them fulfils all the requirements presented above, their best features will be reused wherever possible. The complexity and technological scope of ICONS-based systems force us to develop a methodology that is very comprehensive and, at the same time, common to all the paths of the development process. That said, we are convinced that the starting point for developing the IDM should be an existing methodology in the area of software development (see also [Firestone2001]). During the course of the ICONS project, specific decisions have to be made about the development process to be followed, the notations used to represent products, and the techniques used to create the products. In many cases, these decisions mean creating completely new approaches to these three aspects of the development methodology.

We postulate that the technical process and techniques of IDM be based on best practices of software development (see e.g. [DoD1997]), some of which were presented in the previous section. This reinforces our postulate of basing the methodology on an existing one. We should, however, seek methodologies that enable these best practices to be applied. This criterion is certainly met by several existing methodologies, such as ISO 12207 [ISO1995], RUP [RUP2002, Kruchten2000], OPEN [Henderson-Sellers1997] and Adaptive Software Development [Highsmith2000]. The main challenge in this area is to choose those practices that are applicable to KMS construction projects, and possibly to describe new ones based on experience from the ICONS project. Some of them are very general, like incremental development or requirements management. Others are specific to developing knowledge management systems and need a certain amount of research effort. We think that a good starting point would be RUP, OPEN and ASD. These three methodologies introduce an iterative, incremental software construction process that seems to be crucial for the construction of complex systems like KMS. A promising direction we plan to take is to verify the applicability of the adaptive process [Highsmith2000] to knowledge management applications.

The IDM also needs a comprehensive and common notation. We have to remember that we need to create a notation for very disparate models associated with various features of the knowledge management system's reference architecture. However, before building a unified notation, it is necessary to baseline the information contents of all the possible models that can be created when developing an ICONS-based system. The notation should also take into account all the technologies used to develop the software system. An example of such an approach can be found in [Connalen2000], where a notation for systems using different Web technologies is defined. Here again we think that a good approach is to start with an existing notation language. The language should already be well known and widespread throughout the software development and business modelling community. It should also cover broad aspects of system modelling (static structure diagrams, dynamic system behaviour diagrams, temporal diagrams, diagrams and models for describing requirements, diagrams for modelling business processes and workflows). It is also very important for the language to have extension mechanisms for adding notational elements specific to the KM domain. Currently we postulate that the best starting point in this area is UML [Booch1998]. An analysis of all the features in the KMS reference architecture (see section 3.2) shows that this language would already be applicable for modelling many of them. UML is also widely known, comprehensive in its definition of various models, and has flexible extension mechanisms (such as stereotypes). However, it has to be stressed that the applicability of UML in the area of KM is not yet fully explored (for an attempt, see [Aksit2001]). It can already be seen that UML alone is not capable of representing all the aspects of KM systems. Some promise can be seen in applying agent-oriented (as opposed to object-oriented, as in UML) notations and methods [Iglesias1998]. One path of our research would thus be to explore the applicability of different UML (and non-UML) models to KM. Another path would be to create an extension of UML for building knowledge management systems. This second path seems very interesting and challenging, given that KM uses a very broad range of different technologies that often need a very specific approach to their modelling. It is also very promising, as this new extended language could serve as a common means of communication for the KM community (and specifically the ICONS community).



10. Conclusions

10.1 Compatibility with the stated ICONS project goals and objectives

The relationships between the ICONS functional modules within the project focus technological areas (see Figure 6) and the project objectives are shown in a cross-reference table (Table 13). Demonstrating the ICONS prototype capabilities clearly entails using all developed system features. Therefore only the focus technological areas are marked for Objective 4.

All ICONS project objectives are met by the proposed system architecture. Compared with the initial project proposal, we attribute more weight to procedural knowledge representation and the corresponding intelligent workflow functionality. This results from ongoing research by the consortium members, who agree that procedural knowledge pertaining to business processes is an important element of the learning organization’s intellectual capital. New, stringent requirements have also been formulated for the workflow management platform, which is to support knowledge-creating processes.

The ICONS project objectives (columns): Objective 1, Objective 2, Objective 3, Objective 4
ICONS functional modules (Focus Tech. Areas):

Knowledge Management X X
Ontology Model Manager X
Structural Knowledge Navigator X
Content Categorisation Engine X
Datalog Inference Engine X X X
Intelligent Workflow Manager X X
Semi-structured Content Integrator X X
Intelligent Agent Development Environment X X X

Human Computer Interaction (HCI) X
HCI Personalization Engine X
Electronic Form Manager X
Content Presentation Manager X
Knowledge Map Graph Manager X
Structural Knowledge Graph Manager X
Process Graph Manager

Distributed Architecture X
Load Balancing Algorithms X
Distribution Optimisation Algorithms X
Scalable Distributed Data Structure (SDDS) X
Distributed Workflow Communication X

Objective 1: Development of knowledge representation techniques and methodologies for a multimedia content repository.
Objective 2: Development of user interface design and management tools meeting the requirements of the information architecture methodology.
Objective 3: Design and implementation of efficient algorithms for management of large, distributed multimedia content repositories.
Objective 4: Development of an analysis and design methodology for large, knowledge-based content repository systems.

Table 13. The ICONS project focus technological areas and the project objectives cross-reference

While demonstrating that the proposed ICONS architecture meets the stated project objectives, we concentrated on the project focus technological areas. The enhancements that must be developed for the adopted content management functions also represent a substantial specification and development effort. Therefore all such modules will be shown below, cross-referenced with workpackages and their respective tasks.

10.2 Overview of the ICONS project development plan

The overall project objective is to develop a mature prototype of an intelligent content management system (ICONS). The prototype is to be supported by an application design and development methodology and by a realistic pilot application providing the final verification platform.



The project plan comprises three principal phases, namely the theoretical research phase, the ICONS prototype construction phase, and the methodology and pilot application development phase.

The theoretical research phase, comprising workpackages WP1, WP2, WP3 and WP5, aims at integrating and extending existing research results relevant to the overall project objective. The research results will be presented in a series of reports and external publications. The objective of this phase is to provide a sound theoretical base for the ensuing phases of the project. The principal research directions aim at extending existing research results in two areas. The first is logic-based knowledge representation (disjunctive Datalog) integrated with the semantic data model approach represented by the RDF standard. The second is the integration of distributed heterogeneous information resources. Additionally, the consortium plans to work on advanced graphic user interfaces, providing novel tools and techniques in the fields of information architecture and graphic representation of knowledge.

The ICONS prototype construction phase, comprising workpackages WP4 and WP6, aims at developing a fully functional system prototype. The prototype exploits the research results achieved during the preceding phase and provides a test bed for their evaluation and presentation. The ICONS prototype will be developed as an extension of existing software platforms to be selected during WP1. This approach will allow the consortium partners to concentrate on the novel aspects of knowledge-based content management without the need to create the required software environment from scratch.

The methodology and pilot application development phase, comprising workpackage WP7, has two objectives:

(i) to create an Internet portal comprising content, and the corresponding ontologies, of high interest to a large community of potential users, thus attracting attention and consequently securing growth of the selected application realm;

(ii) to develop a design methodology for knowledge-based content management systems.

The proposed “NAS Best Practices” portal is to provide much-needed information and practical examples of projects and procedures required by the EU accession process. The second objective is generally applicable to the fast-growing knowledge management field.

The technical track of the project is divided into nine workpackages, each consisting of several tasks. Each workpackage describes a coherent objective; the task structure details the steps necessary to reach it. Close collaboration between industrial and academic partners is ensured because partners of both types participate in all workpackages.

The work starts with the assessment of tools, standards and methods (WP1) relevant to the project objectives. The aim of the project is to integrate and extend the relevant research results in the area of knowledge management and information integration. The stress is therefore laid upon selecting leading-edge research results to provide the starting point for the project. The principal approach of the project is to extend state-of-the-art technology in the realm of multimedia content management with powerful knowledge representation and information integration capabilities, by providing a fully functional software platform. Thus, in order to contain the size and cost of the project, the consortium plans to select an eligible software platform. This platform will provide the development environment and the test bed for the novel research results produced by the project team.

The two principal research streams of the project are represented by workpackages WP2 and WP5 respectively: multi-paradigm knowledge representation and the distributed content repository.

The knowledge representation research aims at integrating two distinct knowledge representation schemes into a consistent, multi-paradigm knowledge model. These are the logic approach (disjunctive Datalog) and the semantic data model approach (UML and the RDF standard).

The distributed content repository research addresses two distinct problem areas:

(i) distributing the content repository data structures and control processes among an arbitrary number of servers and across the data storage hierarchy;

(ii) integrating pre-existing, heterogeneous information sources accessible over the Internet or intranets.
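The following sketch illustrates the logic side of this integration. A ground, negation-free disjunctive Datalog program is commonly given a minimal-model semantics [Eiter1997]. For example, the fact `c.` together with the rule `a v b :- c.` has exactly two minimal models, {a, c} and {b, c}, each committing to one disjunct.

The Python sketch below illustrates only this definition; it is neither the ICONS Datalog Inference Engine nor the DLV system [Eiter2000]. It rests on the following assumptions:

- It handles only ground programs without negation.
- It encodes each rule as a hypothetical (head, body) pair of atom sets.
- It enumerates interpretations by brute force, which is exponential in the number of atoms.

The names `is_model` and `minimal_models` are illustrative only.

```python
from itertools import combinations

# Hypothetical encoding (not taken from ICONS or DLV): a ground rule is a pair
# (head, body); head is the set of disjuncts, body the set of conjuncts.
# A fact has an empty body.
Rule = tuple[frozenset, frozenset]


def is_model(interp: frozenset, rules: list[Rule]) -> bool:
    """An interpretation satisfies a rule if the body is not fully true
    or at least one head atom is true."""
    return all(not body <= interp or bool(head & interp) for head, body in rules)


def minimal_models(rules: list[Rule]) -> list[frozenset]:
    """Brute-force enumeration of the minimal models of a positive
    disjunctive program (exponential; for illustration only)."""
    atoms = sorted(set().union(*(head | body for head, body in rules)))
    models: list[frozenset] = []
    # Visit interpretations by increasing size, so every strictly smaller
    # model is contained in some minimal model already collected.
    for size in range(len(atoms) + 1):
        for combo in combinations(atoms, size):
            interp = frozenset(combo)
            if is_model(interp, rules) and not any(m < interp for m in models):
                models.append(interp)
    return models


# Program:  c.   a v b :- c.
program = [
    (frozenset({"c"}), frozenset()),
    (frozenset({"a", "b"}), frozenset({"c"})),
]
print(minimal_models(program))
```

Running it prints the two minimal models {a, c} and {b, c}; the display order of atoms inside each frozenset may vary between runs. The interpretation {a, b, c} also satisfies both rules but is rejected because it strictly contains {a, c}.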

The graphic user interface research (WP3) aims at providing advanced solutions in two distinct information architecture problem areas:

- the presentation and manipulation of content maps and multimedia information objects;
- the graphic representation of knowledge.

The technical standards to be used as the GUI platform are XML/XSL and the corresponding software tools.

The ICONS prototype construction phase entails the definition of the system architecture (WP4) and subsequently the development of the prototype (WP6). Both workpackages involve advanced technological issues related to the novelty of the underlying research results. Hence close integration of both types of consortium partners (academic and industrial) is planned. The software prototype must be stable, of high quality, and seamlessly integrated with the selected content management software environment, which provides the initial platform to be extended with new functionality. To ensure this, component-based software development methodology and tools are to be used throughout the entire software development process. The stability of the ICONS prototype is to be ensured by a well-defined, disciplined quality assurance process.

Demonstrating the new, advanced functionality requires the selection of an application area that promises to be highly attractive to the target user community. A desirable by-product is a facility to publish useful knowledge on a problem-ridden area. One such area is the efficient execution of complex technological and organizational projects, funded by EC aid programmes, in newly associated states. The “NAS Best Practices” portal (WP7) amply meets the above requirements. As new, Internet-based information processing systems gain popularity, knowledge-based content management applications will surely benefit from a disciplined design methodology. The methodology should be sufficiently general to prove useful for a wide class of knowledge-based systems featuring advanced knowledge representation capabilities.

Exploitation and dissemination of project results (WP8) has two aims. The first is to take advantage of the project’s potential in the area of KM technology development within the partners’ organisations. The second is to ensure that the project results can be industrialised as soon as possible. The primary activities include:

- co-ordination with other relevant projects;
- publication of the achieved results;
- planning further implementation of the developed technology;
- assessment of the ICONS prototype by end users;
- workshop organisation.

Project management (WP9) is necessary for such a challenging project. It covers both administrative and technical management and is carried out on a strategic and daily basis.

Project milestones

The following milestones are used to measure the progress of the project:

M1 By the 6th month of the project, the technological base for the ICONS project will be selected and accepted by the consortium partners. The technological base underlying the ICONS architecture will comprise the standards and software tools, as well as the content management platform to be extended.

M2 By the 6th month of the project, the feasible research base for the project will be defined and the integration and extension work will commence.

M3 By the 12th month of the project, the multi-paradigm knowledge representation scheme will be specified and accepted as the principal platform for knowledge management in the ICONS prototype.

M4 By the 12th month of the project, the ICONS architecture will be defined and accepted by the consortium partners.

M5 By the 20th month of the project, the ICONS prototype will be developed, tested and installed at selected partners’ sites, as well as made available on the Internet to the consortium partners.

M6 By the 24th month of the project, the “NAS Best Practices” portal will be operational, and the project final report will be accepted by the Commission.

By the end of the project, the ICONS software and methods will be fine-tuned with feedback from the pilot knowledge-based portal application.

To focus project management attention on the principal objectives of the ICONS project, we cross-reference the modules of the project focus technological areas with the workpackages, and their respective tasks, in which the actual research work is carried out. Some modules may not be assigned to any research task. For these, the specification and/or development work will be performed in the prototype development workpackages. ICONS prototype development concentrates on the prototype specification (Workpackage 4) and the prototype implementation (Workpackage 6), and by definition concerns all system modules. There is therefore no point in presenting cross-reference information for these workpackages. It is worth noting that the preliminary analysis shows the enhancement work on the adopted technology modules may represent a substantial specification and development effort, and possibly also some research work.

Note that the distributed workflow communication module has not been included in the principal research stream. This is because the ongoing standardisation work coordinated by the WfMC [Hayes2001] will probably result in an industry standard to be implemented by all workflow engine developers.

Most of the pre-existing database integration work has been re-focused towards the intelligent agent development environment. We believe that IA technology may be a reasonable answer to problems related to extracting information from heterogeneous sources. However, it may not lead to solutions providing a general answer to multidatabase management problems.


Columns: Workpackage 2 (Tasks 1, 2, 3, 4), Workpackage 3 (Tasks 1, 2, 3), Workpackage 5 (Tasks 1, 2, 3)

Knowledge Management
Ontology Model Manager X
Structural Knowledge Navigator X
Content Categorisation Engine X
Datalog Inference Engine X
Intelligent Workflow Manager X
Semi-structured Content Integrator X
Intelligent Agent Development Environment X X

Human Computer Interaction (HCI)
HCI Personalisation Engine X
Electronic Form Manager
Content Presentation Manager
Knowledge Map Graph Manager X X
Structural Knowledge Graph Manager X X
Process Graph Manager X X

Distributed Architecture
Load Balancing Algorithms X
Distribution Optimisation Algorithms X
Scalable Distributed Data Structure (SDDS) X
Distributed Workflow Communication X

Workpackage 2 Multi-paradigm knowledge representation (WP leader Ulster)

Task 1: Representing knowledge about complex content objects in an ontology base with disjunctive Datalog and the underlying relational data model (RDM) (Task leader CIES)

Task 2: Mapping UML semantic data model (SDM) into the Resource Description Framework (RDF) specification (Task leader Ulster)

Task 3: Representing procedural knowledge (WfMC compliant workflow specifications) in an RDM ontology base (Task leader ICS)

Task 4: Specification of the ICONS multi-paradigm integrated knowledge schema and query language (Task leader Ulster)

Workpackage 3 Advanced graphic user interface (WP leader ICS)

Task 1: Methodology and tools for information architecture design (Task leader ICS)

Task 2: Representing knowledge in the graphic user interface (Task leader ICS)

Task 3: Design and implementation environment for the ICONS GUI (Task leader RODAN)

Workpackage 5 Distributed content repository (WP leader CERIA)

Task 1: Access algorithms and data structure supporting the ICONS ontology base (Task leader ICS)

Task 2: Distribution of ICONS processes and data structures (Task leader CERIA)

Task 3: Design of a system architecture for integration of pre-existing, heterogeneous information sources (Task leader CERIA)

Table 14. The ICONS focus technological area modules and the research stream workpackages



Appendix A. List of workpackages and deliverables

Workpackages

Workpackage No | Workpackage title | Leader | Deliverable No

WP1 | Assessment of tools, standards, and methods | ICS | D4, D5, D6
WP2 | Multi-paradigm knowledge representation | UU | D7, D8, D9, D10
WP3 | Advanced graphic user interface | ICS | D11, D12, D13
WP4 | ICONS Architecture | Rodan | D14, D15, D16, D17
WP5 | Distributed Content Repository | CERIA | D18, D19, D20
WP6 | Development of the ICONS prototype | Rodan | D21, D22, D23
WP7 | Design and development of the “NAS Best Practices” Portal | SEMA | D24, D25, D35
WP8 | Exploitation and dissemination of project results | Rodan | D34
WP9 | Project Management | Rodan | D1, D2, D3, D26, D27, D28, D29



Deliverables list

Deliverable No | Deliverable title

D1 | Project presentation
D2 | Consortium agreement
D3 | Evaluation criteria
D4 | Standards base for the ICONS project
D5 | Technological base for the ICONS project
D6 | Research base for the ICONS project
D7 | Extracting knowledge from complex content objects into an ontology base with logic inference capabilities
D8 | Equivalence of UML semantic data model and the RDF content model
D9 | Capturing procedural knowledge from process class definitions and from process instance execution measures
D10 | A multi-paradigm ontology base schema
D11 | Information architecture: Evaluation of tools and methods
D12 | Visualisation of domain knowledge: methods and techniques
D13 | The ICONS graphic interface: design specification
D14 | Specification of the ICONS software development platform
D15 | Installation of the integrated ICONS software development platform
D16 | Specification of the ICONS architecture
D17 | The ICONS prototype implementation plan
D18 | Access algorithms and data structures underlying a distributed knowledge base
D19 | Optimization of a distributed knowledge-based system architecture
D20 | Integration of pre-existing, heterogeneous information sources
D21 | The ICONS software technical design manual
D22 | ICONS installed at selected consortium partners’ sites
D23 | Software test cases and acceptance protocol
D24 | The “NAS Best Practices” portal accessible via Internet
D25 | The knowledge-based content management application design methodology
D26 | 1st Progress Report
D27 | 2nd Progress Report
D28 | 3rd Progress Report
D29 | 1st Management Report
D30 | 2nd Management Report
D31 | 3rd Management Report
D32 | 4th Management Report
D33 | Final Report
D34 | Technology Implementation Plan
D35 | Conceptual analysis of the ‘NAS Best Practices’ portal



Bibliography

External references

Aalst1999 van der Aalst, W.M.P., Flexible Workflow Management Systems: An Approach Based on Generic Process Models, Proceedings of the 10th International Conference on Database and Expert Systems Applications (DEXA'99), volume 1677 of Lecture Notes in Computer Science, pages 186-195, Springer-Verlag, Berlin, 1999.

Abecker1999 Abecker, A., Decker, S., Organizational memory: Knowledge acquisition, integration and retrieval issues, Knowledge-Based Systems, pp. 113-124, 1999.

Aksit2001 Aksit, M., Marcelloni, F., Tekinerdogan, B., Developing object-oriented frameworks using domain models, http://wwwhome.cs.utwente.nl/~bedir/papers/FrameworkDomainModels.ps, 2001.

Ambite2001 Ambite, J.L., Arens, Y., Philpot, A., Gravano, L., Hatzivassiloglou, Klavans, J., Simplifying Data Access: The Energy Data Collection Project, IEEE Computer, February 2001.

Apt1988 Apt, K.R., Blair, H.A., and Walker, A., Towards a Theory of Declarative Knowledge, in Minker, J., editor, Foundations of Deductive Databases and Logic Programming, pages 89-148, Morgan Kaufmann Publishers, Inc., Los Altos, California, USA, 1988.

Baek1999 Baek, S., Liebowitz, J., Prasad, S.Y., and Granger, M., Intelligent Agents for Knowledge Management: Toward Intelligent Web-Based Collaboration within Virtual Teams, in Knowledge Management Handbook, J. Liebowitz (Ed.), CRC Press LLC, 1999, USA.

Baral1994 Baral, C., Gelfond, M., Logic Programming and Knowledge Representation, J. Logic Programming, Vols. 19/20, 1994.

Bassiliades2000 Bassiliades, N., Vlahavas, I., Elmagarmid, A.K., E-DEVICE: An Extensible Active Knowledge Base System with Multiple Rule Type Support, IEEE Transactions on Knowledge and Data Engineering, Vol. 12, No. 5, September/October 2000.

Baumgartner2001 Baumgartner, R., Flesca, S., Gottlob, G., Visual Web Information Extraction with Lixto, in Proceedings of the 27th VLDB Conference, Rome, Italy, 2001.

Becker1999 Becker, G., Knowledge Discovery, in Knowledge Management Handbook, J. Liebowitz (Ed.), CRC Press LLC, 1999, USA.

Becker2001 Becker, S.A., Mottay, F.E., A Global Perspective on Web Site Usability, IEEE Software, January/February 2001.

Bell1996 Bell, D., Guan, J., and Lee, S. (1996). Generalized union and project operations for pooling uncertain and imprecise information. Data & Knowledge Engineering, 18 (1996), pp. 89-117.

Ben-Eliyahu1994 Ben-Eliyahu, R., and Dechter, R., Propositional Semantics for Disjunctive Logic Programs, in Annals of Mathematics and Artificial Intelligence, 12:53-87, 1994.

Berners-Lee1999 Berners-Lee, T.J., Syntax/Semantics, W3C, 1999.

Booch1998 Booch, G., Rumbaugh, J., Jacobson, I., The Unified Modelling Language User Guide, Addison Wesley, 1998.

Bouguettaya2000 Bouguettaya, A., Benatallah, B., Hendra, L., Ouzzani, M., Beard, J., Supporting Dynamic Interactions among Web-Based Information Sources.

Bouguettaya2001 Bouguettaya, A., Ouzzani, M., Medjahed, B., Cameron, J., Managing Government Databases, IEEE Computer, February 2001.

Buccafurri1998 Buccafurri, F., Leone, N., Rullo, P., Disjunctive Ordered Logics: Semantics and Expressiveness, Proceedings of the International Conference on Principles of Knowledge Representation and Reasoning (KR ’98), 1998.

Buchner2000 Buchner, A.G., Baumgarten, M., Mulvenna, M.D., Bohm, R., Anand, S.S., Data Mining and XML: Current and Future Issues, Proc. of the International Conference on Web Information System Engineering (WISE’00).

CACM97 Comm. of ACM, Special Issue on High-Performance Computing, October 1997.

CERIA Centre des Etudes et de Recherches en Informatique Appliquée, U. Paris 9 Dauphine, France, http://ceria.dauphine.fr/

Culler1994 Culler, D., et al., NOW: Towards Everyday Supercomputing on a Network of Workstations, EECS Tech. Rep., UC Berkeley.

Cadoli1997 Cadoli, M., Eiter, T., and Gottlob, G., Default Logic as a Query Language, in IEEE Transactions on Knowledge and Data Engineering, 9(3):448-463, 1997.

Chang2001 Chang, S-K., Znati, T., Adlet: An Active Document Abstraction for Multimedia Information Fusion, IEEE Transactions on Knowledge and Data Engineering, Vol. 13, No. 1, January/February 2001.

Chen1999 Chen, C., Information Visualisation and Virtual Environments, Springer-Verlag, London, 1999.

Chen2001 Chen, C., Paul, R.J., Visualizing a Knowledge Domain’s Intellectual Structure, IEEE Computer, March 2001.

Coleman1999 Coleman, D., Groupware: Collaboration and Knowledge Sharing, in Knowledge Management Handbook, J. Liebowitz (Ed.), CRC Press LLC, 1999, USA.

Connalen2000 Conallen, J., Building Web Applications with UML, Addison Wesley, 2000.

Corby1999 Corby, O., Dieng, R., The Webcokace Knowledge Server, IEEE Internet Computing, November/December 1999.

Davenport1999 Davenport, T.H., Knowledge Management and the Broader Firm: Strategy, Advantage, and Performance, in Knowledge Management Handbook, J. Liebowitz (Ed.), CRC Press LLC, 1999, USA.

Decker2000a Decker, S., Melnik, S., van Harmelen, F., Fensel, D., Klein, M., Broekstra, J., Erdmann, M., Horrocks, I., The Semantic Web: The Roles of XML and RDF, IEEE Internet Computing, September/October 2000.

Decker2000b Decker, S., Mitra, P., Melnik, S., Framework for the Semantic Web: An RDF Tutorial, IEEE Internet Computing, November/December 2000.

Deutsch2000 Deutsch, A., et al., XML-QL: A Query Language for XML, WWW Consortium, www.w3.org/TR/NOTE-xml-ql (current May 2000).

Diene2000 Diène, A., Litwin, W., Performance Measurements of RP*: A Scalable Distributed Data Structure for Range Partitioning, 2000 Intl. Conf. on Information Society in the 21st Century: Emerging Techn. and New Challenges, Aizu City, Japan, 2000.

Dieng1999 Dieng, R., Corby, O., Giboin, A., Ribiere, M., Methods and tools for corporate knowledge management, Int. Journal of Human-Computer Studies, vol. 51, no. 3, pp. 567-598, 1999.

Dieng2000 Dieng, R., Knowledge Management and the Internet, IEEE Intelligent Systems, May/June 2000.

DoD1997 The Program Manager’s Guide to Software Acquisition Best Practices, ver. 2.1, U.S. Department of Defense, 1997.

Düntsch1997 Düntsch, I. and Gediga, G. (1997). Statistical evaluation of rough set dependency analysis. International Journal of Human-Computer Studies, 46:589-604.

Düntsch1998 Düntsch, I. and Gediga, G. (1998). Simple data filtering in rough set systems. International Journal of Approximate Reasoning, 8(1-2):93-106.

Dyreson2000 Dyreson, C.E., Evans, W.S., Lin, H., Snodgrass, R.T., Efficiently Supporting Temporal Granularities, IEEE Transactions on Knowledge and Data Engineering, Vol. 12, No. 4, July/August 2000.

Eder1997 Eder, J., Pozewaunig, H., Liebhart, W., ePERT: Extending PERT for Workflow Management Systems, Proceedings of the 1st East-European Conference on Advances in Databases and Information Systems (ADBIS’97), 1997.

Eder1999 Eder, J., Panagos, E., Pozewaunig, H., Rabinovich, M., Time Management in Workflow Systems, Proceedings of the 3rd International Conference on Business Information Systems (BIS’99), pp. 265-280, 1999.

Eder2001 Eder, J., Panagos, E., Managing Time in Workflow Systems, in Workflow Handbook 2001, Layna Fischer (Ed.), Future Strategies Inc., Book Division, 2001, USA.

Eiter1994a Eiter, T., Gottlob, G., and Mannila, H., Adding Disjunction to Datalog, in Proceedings of the Thirteenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS-94), pages 267-278, ACM Press, 1994.

Eiter1997 Eiter, T., Gottlob, G., and Mannila, H., Disjunctive Datalog, ACM Transactions on Database Systems, 22(3):315-363, 1997.

Eiter2000 Eiter, T., Faber, W., Leone, N., and Pfeifer, G., Declarative Problem-Solving Using the DLV System, in Jack Minker, editor, Logic-Based Artificial Intelligence, pages 79-103, Kluwer Academic Publishers, 2000.

Elmagarmid2001 Elmagarmid, A.K., McIver, W.J., The Ongoing March Toward Digital Government, IEEE Computer, February 2001.

Enlarg-Report2001 Strategy Paper 2001, in Key Documents related to the Enlargement Process, http://europa.eu.int/comm/enlargement/report2001/index.htm

Enlarg-Report2001-Annexes http://europa.eu.int/comm/enlargement/report2001/annexes_en.pdf

EU-Glossary http://europa.eu.int/scadplus/leg/en/cig/g4000.htm

Faber1996 Faber, W., and Pfeifer, G., DLV homepage, URL: http://www.dbai.tuwien.ac.at/proj/dlv/, since 1996.

Fairchild1988 Fairchild, K., Poltrock, S., Furnas, G., SemNet: Three-Dimensional Graphic Representations of Large Knowledge Bases, in R. Guindon (Ed.), Cognitive Science and Its Applications for Human Computer Interaction, Lawrence Erlbaum Associates, Hillsdale, N.J., 1988.

Fensel2000 Fensel, D., van Harmelen, F., Klein, M., Akkermans, H. (2000). On-To-Knowledge: Ontology-based Tools for Knowledge Management. Report of EU-IST project No. 10132. http://www.ontoknowledge.org.

Fensel1998 Fensel, D., Angele, J., Studer, R., The Knowledge Acquisition and Representation Language, KARL, IEEE Transactions on Knowledge and Data Engineering, Vol. 10, No. 4, July/August 1998.

Firestone2000 Firestone, J.M., Knowledge Management: A Framework for Analysis and Measurement, White Paper No. 17, Executive Information Systems, Inc., October 1, 2000, www.dkms.com.

Firestone2001 Firestone, J.M., Knowledge Management Process Methodology: An Overview, Knowledge and Innovation: Journal of the KMCI, vol. 1, no. 2, 2001.

Garvin1993 Garvin, D.A., Building a Learning Organization, Harvard Business Review, July-August 1993.

Gelfond1991 Gelfond, M. and Lifschitz, V., Classical Negation in Logic Programs and Disjunctive Databases, New Generation Computing, 9:365-385, 1991.

Gibbs2002 Gibbs, W., Explore Autonomic Computing, Scientific American, May 06, 2002.

Ginsburg1999 Ginsburg, M., Kambil, A., Annotate: A Web-based Knowledge Management Support System for Document Collections, Proc. of the 32nd Hawaii International Conference on System Sciences, IEEE 1999.

Goeschka2001 Goeschka, K.M., Schranz, M.W., Client and Legacy Integration in Object-Oriented Web Engineering, IEEE Multimedia, January/March 2001.

Goldman1991 Goldman, A.H., Empirical Knowledge, 1991, Berkeley University, USA.

Gray1996 Gray, J., Super-Servers: Commodity Computer Clusters Pose a Software Challenge, Microsoft, 1996, http://www.research.microsoft.com/

Gregersen1999 Gregersen, H., Jensen, Ch., Temporal Entity-Relationship Models: A Survey, IEEE Transactions on Knowledge and Data Engineering, Vol. 11, No. 3, May/June 1999.

Grossman1996 Grossman, W., Metadata, in Proceedings of New Technologies and Techniques in Statistics, pages 183-185.

Hammer1997 Hammer, J., Garcia-Molina, H., Cho, J., Aranha, R., Crespo, A., Extracting semi-structured data from the web, Proc. of Workshop on Management of Semi-structured Data, IEEE 1997.

Hayes2001 Hayes, J.G., Peyorovian, E., Sarin, S., Schmidt, M-T., Swenson, K.D., Weber, R., Workflow Interoperability Standards in the Internet, in Workflow Handbook 2001, Layna Fischer (Ed.), Future Strategies Inc., Book Division, 2001, USA.

Henderson-Sellers1997 Henderson-Sellers, B., Younessi, H., Graham, I.S., The OPEN Process Specification, Addison Wesley, 1997.

Henderson-Sellers1998 Henderson-Sellers, B., Simons, T., Younessi, H., The OPEN Toolbox of Techniques, Addison Wesley, 1998.

Highsmith2000 Highsmith III, J.A., Adaptive Software Development: A Collaborative Approach to Managing Complex Systems, Dorset House, 2000.

Holsapple1999 Holsapple, C.W., Joshi, K.D., Description and Analysis of Existing Knowledge Management Frameworks, Proc. of the 32nd Hawaii International Conference on System Sciences, IEEE 1999.

Holsapple2002 Holsapple, C.W., Joshi, K.D., A collaborative approach to ontology design, Communications of the ACM, vol. 45, no. 2, pp. 42-47, 2002.

Huntington1999 Huntington, D., Knowledge-Based Systems: A Look at Rule-Based Systems, in Knowledge Management Handbook, J. Liebowitz (Ed.), CRC Press LLC, 1999, USA.

Hurson1994 Hurson, A.R., Bright, M.W., Pakzad, S.H. (Editors), Multidatabase Systems: An Advanced Solution for Global Information Sharing, IEEE Computer Society Press, 1994.

IBM1995 IBM, Intelligent Agent Strategy, White Paper, http://activist.gpl.ibm.com:81/WhitePaper/ptc2.htm, 1995.

Iglesias1998 Iglesias, C.A., Garijo, M., Gonzales, J.C., A survey of agent oriented methodologies, In:M. P. Singh J. P. Muller and A. S. Rao, editors, Intelligent Agents V. Agent Theories,Architectures, and Languages - 5th International Workshop, number 1555 in LectureNotes in Artificial Intelligence, Paris, France, Springer Verlag, 1998.

ISO1995 ISO/IEC 12207, Information Technology – Software lifecycle processes, 1995-2001.Jarvis1999 P. Jarvis, J. Stader, A. Macintosh, J. Moore, and P. Chung. 1999: "What Right Do You

Have to Do That? Infusing Adaptive Workflow Technology with Knowledge about theOrganisational and Authority Context of a Task"; In Proceedings of the First InternationalConference on Enterprise Information Systems (ICEIS-99), Setubal, Portugal. (Paper athttp://www.aiai.ed.ac.uk/~jussi/pubs.html)

Kahn2001 Kahn, P., Lenk, K., Mapping Web Sites, Rotovision SA, 2001.

Kirda2001 Kirda, E., Jazayeri, M., Kerer, C., Schranz, M., Experiences in Engineering Flexible Web Services, IEEE Multimedia, January/March 2001.

Klingemann2000 Klingemann, J., Controlled Flexibility in Workflow Management, Proc. of the 12th International Conference on Advanced Information Systems Engineering (CAiSE'00), Stockholm, Sweden, June 5-9, 2000, pp. 126-141, Springer-Verlag.

KMForum2001 Weber, F., Kemp, J., Common Approaches and Standardisation in KM, EKMF Workshop on Standardisation, Brussels, June 2001, www.knowledgeboard.com.

KMForum2001_D11 Kemp, J., Pudlatz, M., Perez, P., Ortega, A.M., KM Technologies and Tools, European KM Forum, IST Project No 2000-26393, March 2000, www.knowledgeboard.com.

KMForum2001_D11a Kemp, J., Pudlatz, M., Perez, P., Ortega, A.M., KM Terminology and Approaches, European KM Forum, IST Project No 2000-26393, March 2000, www.knowledgeboard.com.

KMForum2001_D12 Simpson, J., Aucland, M., Kemp, J., Pudlatz, M., Jenzowsky, S., Brederhorst, B., Toerek, E., Trends and Visions in KM, European KM Forum, IST Project No 2000-26393, April 2000, www.knowledgeboard.com.

Knoblock1998 Knoblock, C.A., Minton, S., Ambite, J.L., Ashish, N., Modi, P.J., Muslea, I., Philpot, A., Tejada, S., Modeling Web sources for information integration, Proc. of the AAAI Conference, 1998.

Koulopoulos1995 Koulopoulos, T.M., The Workflow Imperative, Van Nostrand Reinhold, 1995.

KPMG1999 KPMG Consulting, Knowledge Management Research Report 2000, November 1999, www.kpmg.co.uk.

Kruchten2000 Kruchten, P., The Rational Unified Process: An Introduction, Addison Wesley Longman, 2000.

Kuhn1997 Kuhn, O., Abecker, A., Corporate Memories for Knowledge Management in Industrial Practice: Prospects and Challenges, Journal of Universal Computer Science, vol. 3, no. 8, pp. 929-954, 1997.

Kushmerick1997 Kushmerick, N., Weld, D., Doorenbos, R., Wrapper induction for information extraction, Proc. of the International Joint Conference on Artificial Intelligence (IJCAI), 1997.

Lambrix1997 Lambrix, P., Shahmehri, N., Aberg, J., Towards Creating a Knowledge Base for World-Wide Web Documents, Proc. of the 1997 IASTED International Conference on Intelligent Information Systems (IIS ’97), IEEE, 1997.

Lassila1998 Lassila, O., Web Metadata: A Matter of Semantics, IEEE Internet Computing,July/August 1998.

Lassila2000 Lassila, O., Swick, R.R., Resource Description Framework (RDF) Model and Syntax Specification, WWW Consortium, www.w3.org/TR/REC-rdf-syntax (current May 2000).

Lawrence2001 Lawrence, R., Barker, K., Integrating Data Sources Using a Standardized Global Dictionary, in: Abramowicz, W., Zurada, J. (Eds.), Knowledge Discovery for Business Information Systems, Kluwer Academic Publishers, 2001.

Leone1997 Leone, N., Rullo, P., Scarcello, F., Unfounded Sets, Fixpoint Semantics and Computation of Disjunctive Stable Models, Information and Computation, Academic Press, vol. 135, no. 2, pp. 69-112, 1997.

Letson2001 Letson, R., Find a Match: Taxonomies Put Content in Context, Transform Magazine, December 2001.

Lifschitz1994 Lifschitz, V., Turner, H., Splitting a Logic Program, in: Van Hentenryck, P. (Ed.), Proc. of the 11th International Conference on Logic Programming (ICLP'94), Santa Margherita Ligure, Italy, pp. 23-37, MIT Press, 1994.

Lifschitz1996 Lifschitz, V., Foundations of logic programming, in: Brewka, G. (Ed.), Principles of Knowledge Representation, pp. 69-127, CSLI Publications, Stanford, 1996.

Lin2002 Lin, H., Risch, T., Katchanounov, T., Adaptive data mediation over XML data, to be published in the special issue on "Web Information Systems Applications" of the Journal of Applied System Studies (JASS), Cambridge International Science Publishing, 2002.

Litwin1993 Litwin, W., Neimat, M-A., Schneider, D., LH*: Linear Hashing for Distributed Files, ACM SIGMOD International Conference on Management of Data, 1993.

Litwin1996 Litwin, W., Menon, J., Risch, T., Schwarz, Th., Design Issues for Scalable Availability LH* Schemes with Record Grouping, Distributed Data and Structures, Carleton Scientific, 2000.

Litwin2000 Litwin, W., Menon, J., Risch, T., Schwarz, Th., Design Issues for Scalable Availability LH* Schemes with Record Grouping, Distributed Data and Structures, Carleton Scientific, 2000.

Litwin2000a Litwin, W., Schwarz, T., LH*RS: A High-Availability Scalable Distributed Data Structure using Reed Solomon Codes, ACM SIGMOD International Conference on Management of Data, 2000.

Lobo1992 Lobo, J., Minker, J., Rajasekar, A., Foundations of Disjunctive Logic Programming, MIT Press, Cambridge, Massachusetts, 1992.

Lopez1999 Fernández López, M., Overview of methodologies for building ontologies, in: Proc. of the IJCAI Workshop on Ontologies and Problem-Solving Methods, Stockholm, Sweden, 1999.

Maedche2001 Maedche, A., Staab, S., Stojanovic, N., et al., SEmantic portAL: The SEAL Approach, in: Fensel, D., Hendler, J., Lieberman, H., Wahlster, W. (Eds.), Creating the Semantic Web, MIT Press, Cambridge, MA, 2001.

Martin2000 Martin, Ph., Eklund, P.W., Knowledge Retrieval and the World Wide Web, IEEE Intelligent Systems, May/June 2000.

McClean2002 McClean, S., Páircéir, R., Scotney, B., Greer, K., A Negotiation Agent for Distributed Heterogeneous Statistical Databases, SSDBM, 2002.

McClean2000 McClean, Páircéir, Scotney and Zhang, Adding Context to the Retrieval of Aggregate Data, 2000.

McElroy1999 McElroy, M.W., Second-Generation KM, Knowledge Management, October 1999.

Mecella2001 Mecella, M., Batini, C., Enabling Italian E-Government through a Cooperative Architecture, IEEE Computer, February 2001.

Mitchell1979 Mitchell, T.M., Version Spaces: An Approach to Concept Learning, PhD thesis, Electrical Engineering Dept., Stanford University, Stanford, CA, 1979.

Mitchell1997 Mitchell, T.M., Machine Learning, The McGraw-Hill Companies, Inc., 1997.

Momotko2002 Momotko, M., Subieta, K., Dynamic Change of Workflow Participant Assignment, paper accepted to Advances in Databases and Information Systems (ADBIS 2002), Bratislava, 2002.

Nguyen1998 Nguyen, S.H., Skowron, A., Synak, P., Discovery of data patterns with applications to decomposition and classification problems, in: Polkowski, L., Skowron, A. (Eds.), Rough Sets in Knowledge Discovery, vol. 2, pp. 55-97, Physica-Verlag, Heidelberg, 1998.

Nonaka1995 Nonaka, I., Takeuchi, H., The Knowledge-Creating Company, Oxford University Press, New York, USA, 1995.

O’Leary1998 O’Leary, D., Enterprise Knowledge Management, IEEE Computer, March 1998.

OMG1998 Object Management Group, Workflow Management Facility, 1998.

Ozsu1999 Ozsu, T., Valduriez, P., Principles of Distributed Database Systems, 2nd Ed., Prentice Hall, 1999.

PG - Glossary Annexes to Practical Guide: Glossary of Terms, http://europa.eu.int/comm/europeaid/tender/gestion/pg/a01_en.pdf.

PG-2000 Practical Guide to Phare, Ispa & Sapard contract procedures, http://europa.eu.int/comm/europeaid/tender/gestion/pg/pg_phare_en.pdf (December 2000), 170 pages.

PG-2001 Practical Guide to EC external aid contract procedures, http://europa.eu.int/comm/europeaid/tender/gestion/pg/pg_en.pdf (January 2001), 176 pages.

Phare_Review_2000 Phare 2000 Review, 27.10.2000, C(2000)3103/2, http://europa.eu.int/comm/enlargement/pas/phare/pdf/review_2000.pdf.

Phare-Glossary2001 http://europa.eu.int/comm/enlargement/pas/phare/glossary.htm

Phare-ISPA-Sapard-2001 The Enlargement Process and the three pre-accession instruments: Phare, ISPA, Sapard, Proceedings of a conference, 5th March 2001, http://europa.eu.int/comm/enlargement/pas/phare/pdf/bro-phare-ispa-sapard-2.pdf.

PL-Reg-Rep-2000 2001 Regular Report on Poland’s Progress Towards Accession, 122 pages.

Plumtree2001 Plumtree Software Inc., A Framework for Assessing Return on Investment for a Corporate Portal Deployment, White Paper, 2001, www.plumtree.com.

Polanyi1966 Polanyi, M., The Tacit Dimension, Routledge and Kegan Paul, London, England, 1966.

Popper1972 Popper, K.R., Objective Knowledge, Oxford University Press, London, England, 1972.

Popper1977 Popper, K.R., Eccles, J., The Self and Its Brain, Springer-Verlag, Berlin, Germany, 1977.

President1998 President’s Information Technology Advisory Committee, Interim Report to the President of the United States, August 1998.

Quinn1996 Quinn, J.B., Anderson, P., Finkelstein, S., Managing Professional Intellect, Harvard Business Review, March-April 1996.

Rabarijaona2000 Rabarijaona, A., Dieng, R., Corby, O., Ouddari, R., Building and Searching an XML-Based Corporate Memory, IEEE Intelligent Systems, May 2000.

Ramakrishnan2000 Ramakrishnan, N., PIPE: Web Personalization by Partial Evaluation, IEEE Internet Computing, November/December 2000.

Rumbaugh1999 Rumbaugh, J., Jacobson, I., Booch, G., The Unified Modeling Language Reference Manual, Addison Wesley, 1999.

RUP2002 Rational Unified Process, ver. 2002.05.00, Rational Software Corporation, 2002.

Sahuguet1999 Sahuguet, A., Azavant, F., WysiWyg Web Wrapper Factory (W4F), Proc. of the WWW Conference, 1999.

Shim2000 Shim, S.Y., Pendyala, V.S., Sundaram, M., Gao, J.Z., Business-to-Business E-Commerce Frameworks, IEEE Computer, October 2000.

Simons1994 Simons, G.F., Conceptual modelling versus visual modelling: a technological key to building consensus, Consensus ex Machina, Joint International Conference of the Association for Literary and Linguistic Computing and the Association for Computing and the Humanities, Paris, France, 1994.

Soderland1997 Soderland, S., Learning to extract text-based information from the World Wide Web, Proc. of Knowledge Discovery and Data Mining.

Staab2001 Staab, S., Studer, R., Schnurr, H-P., Sure, Y., Knowledge processes and ontologies, IEEE Intelligent Systems, vol. 16, no. 1, pp. 26-34, 2001.

Stader2001 Stader, J., Moore, J., Chung, P., McBriar, I., Ravinranathan, M., Macintosh, A., Applying Intelligent Workflow Management in the Chemicals Industries, in: Fischer, L. (Ed.), Workflow Handbook 2001, Future Strategies Inc., Book Division, USA, 2001.

Struder1998 Studer, R., Benjamins, V.R., Fensel, D., Knowledge Engineering: Principles and Methods, Data & Knowledge Engineering, vol. 25, no. 1-2, 1998.

Sure2001 Sure, Y., A tool-supported methodology for ontology-based knowledge management, http://www.aifb.uni-karlsruhe.de/WBS/ysu/publications/2001_fgwm_ontokick.pdf, 2001.

Swenson1998 Swenson, K., Simple Workflow Access Protocol (SWAP), 1998.

Swenson2001 Swenson, K., Workflow for the Information Worker, in: Fischer, L. (Ed.), Workflow Handbook 2001, Future Strategies Inc., Book Division, USA, 2001.

Tiwana2000 Tiwana, A., The Knowledge Management Toolkit, Prentice Hall PTR, Upper Saddle River, 2000.

Ullman1989 Ullman, J.D., Principles of Database and Knowledge-Base Systems, Computer Science Press, Rockville, Md., 1989.

Uschold1995 Uschold, M., King, M., Towards a methodology for building ontologies, in: Workshop on Basic Ontological Issues in Knowledge Sharing, held in conjunction with IJCAI-95, Montreal, Canada, 1995.

Uschold1996 Uschold, M., Building Ontologies: Towards a Unified Methodology, Proc. of Expert Systems, 16th Annual Conference of the British Computer Society Specialist Group on Expert Systems, 1996.

Vocking2002 Vocking, B., Symmetric vs. Asymmetric Multiple-Choice Algorithms, invited paper, ARACNE 2001, Carleton Scientific, 2002.

Wang1998 Wang, H., Düntsch, I., Bell, D., Data reduction based on hyper relations, in: Agrawal, R., Stolorz, P., Piatetsky-Shapiro, G. (Eds.), Proc. of KDD'98, New York, pp. 349-353, 1998.

Wang2000 Wang, H., Düntsch, I., Gediga, G., Classificatory filtering in decision systems, International Journal of Approximate Reasoning, vol. 23, pp. 111-136, 2000.

Weidner2002 Weidner, D., Using Connect and Collect to Achieve the KM Endgame, IEEE IT Professional, Jan-Feb 2002, pp. 18-24.

WfMC1994 Workflow Management Coalition, Information Pack, Grenoble, France, July 1994.

WfMC1996 Workflow Management Coalition, Workflow standard: Interoperability abstract specification, WfMC-TC-1012, version 1.0, October 1996.

WfMC1999 Workflow Management Coalition, Workflow standard: Workflow terminology & glossary, WfMC-TC-1011, issue 3.0, February 1999.

WfMC2001_A Workflow Management Coalition, Workflow standard: Workflow process definition language – XML process definition language, WfMC-TC-1025, draft 0.03a, May 2001.

WfMC2001_B Workflow Management Coalition, Workflow standard: Wf-XML Binding, WfMC-TC-1023, version 1.1, November 2001.

WfMC2002 Workflow Management Coalition, Workflow standard: Wf-XML Binding, WfMC-TC-1023, final draft, version 1.1, November 2001.

Zhang1996 Zhang, M., Zheng, C., Analysis Methodologies of Synthesis of Solutions in Distributed Expert Systems, Proc. of ICMAS, Kyoto, AAAI Press, pp. 417-421, 1996.

ICONS references

[ICONSCONTRACT] ICONS Consortium, Intelligent Content Management System, Contract Number IST-2001-32429, Annex I – Description of Work, October 2001.

[ICONS D02] ICONS Consortium, The ICONS project consortium agreement, April 2002.

[ICONS D05] ICONS Consortium, Technological Base for the ICONS project, under development.


Dictionary

Notion Meaning

actor intelligent agent or knowledge worker

CAS Complex Adaptive System; a goal-directed open system attempting to fit itself to its environment, composed of interacting adaptive agents described in terms of rules applicable with respect to some specified class of environmental inputs [Holland1995]

content any type of multimedia object

corporate portal uniform web-based access point to all of the organisation’s data, applications and processes, regardless of geographical and temporal limitations

declarative knowledge knowledge pertaining to static entities such as objects, relationships, taxonomies, rules etc.; non-procedural knowledge

FGKM First Generation Knowledge Management; an approach focusing mainly on the distribution of existing knowledge

intelligent agent a software entity that carries out some set of operations on behalf of a user or another program with some degree of independence or autonomy, and in so doing, employs some knowledge or representation of the user’s goals or desires [IBM1995]

KLC Knowledge Life Cycle; a cyclic activity of production, validation and integration of knowledge

KM Knowledge Management; a set of compound activities aimed at increasing an organisation’s effectiveness and efficiency through better exploitation of its information resources

KMS Knowledge Management System; an IT platform supporting knowledge management processes

knowledge base the set of remembered data, validated propositions and models (along with metadata related to their testing), refuted propositions and models (along with metadata related to their refutation), metamodels, and (if the system produces such an artifact) software used for manipulating these, pertaining to the system and produced by it [Firestone2000]

knowledge engineering a set of methodologies and tools for acquiring expert knowledge and representing it formally, as needed to capture experts’ knowledge

knowledge manager knowledge worker responsible for knowledge production, maintenance and dissemination

knowledge map a visual facility allowing navigation over the complex taxonomy and object relationship structures of a knowledge base

knowledge worker individual whose outstanding performance relies on unique knowledge of a particular application domain

KR knowledge representation

learning organization an organization skilled at creating, acquiring, and transferring knowledge, and at modifying its behaviour to reflect new knowledge and insights [Garvin1993]

mediator intermediate virtual database between the integrated data sources and the applications using them for retrieval and update [Lin2002]

NAS Newly Associated States

NAS Best Practices Portal an ICONS-architecture-compliant portal comprising best practices of PHARE, SAPARD and ISPA projects developed within the Newly Associated States

ontology an explicit conceptualization model comprising objects, their definitions, and relationships among objects [Becker1999]

procedural knowledge knowledge defined in a prescriptive way, i.e. by a step-by-step procedure

RDF Resource Description Framework; an emerging standard for Web metadata that syntactically consists of nodes and attached attribute/value pairs [Lassila1998]

SGKM Second Generation Knowledge Management; an approach that extends FGKM with aspects related to accelerating the production of new knowledge

structural knowledge knowledge embodied in an ontology-based structure of objects and the relations among them

taxonomy 1. a set of means – topics, headings, categories – into which content can be sorted; 2. a well-defined terminology used within a particular ontology to describe the classes of objects, their properties, and relationships

UML Unified Modelling Language

workflow management the automation of a business process, in whole or part, during which documents, information or tasks are passed from one participant to another for action, according to a set of procedural rules [WfMC1994]

wrapper an interface to data sources that translates data into the common data model used by the mediator [Lin2002]

XML Extensible Markup Language