Context and Semantics for Knowledge Management: Technologies for Personal Productivity


Paul Warren, John Davies, Elena Simperl

Editors


Editors

Paul Warren, Eurescom GmbH, Wieblinger Weg 19/4, 69123 Heidelberg, Germany, [email protected]

Dr. John Davies, British Telecommunications plc., Orion G/11, Ipswich, IP5 3RE, Adastral Park, United Kingdom, [email protected]

Dr. Elena Simperl, Karlsruhe Institute of Technology, Institute AIFB, Englerstr. 11, 76128 Karlsruhe, Germany, [email protected]

ACM Codes: H.3, H.4, I.2, J.1

ISBN 978-3-642-19509-9
e-ISBN 978-3-642-19510-5
DOI 10.1007/978-3-642-19510-5
Springer Heidelberg Dordrecht London New York

Library of Congress Control Number: 2011937697

© Springer-Verlag Berlin Heidelberg 2011

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilm or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law.

The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

Printed on acid-free paper

Springer is part of Springer Science+Business Media (www.springer.com)

Foreword

The Web and information technology have become part of our daily lives and an integral part of work. In a short period of time, the way we access and use information has undergone a fundamental change. This is due not only to technology enabling new ways of storing and retrieving information and novel forms of content, but also to the increasing amount of information now being generated continuously.

Knowledge and information are among the biggest assets of enterprises and organizations. However, efficiently managing, maintaining, accessing, and reusing this intangible asset is difficult. The fact that much corporate knowledge resides only in employees’ heads seriously hampers its reuse and conservation. This problem is evident not only on an organization-wide scale but also for the individual user: knowing where information can be found and which data is relevant for a certain workflow or context is typically a human-driven task for which computers provide only limited support. In an age where practically every industry is becoming increasingly information based, the problem of finding, interpreting, and combining information is omnipresent for knowledge workers.

While a human user can interpret and combine information from different sources, integrate data in heterogeneous formats, or extract essential knowledge from distributed chunks of information, a machine cannot easily handle such a complex task. On the other hand, the human user is limited in terms of computational speed. Consequently, both capabilities must be combined: knowledge management systems must automate as much as possible to support users, and make use of human input where needed.

The Semantic Web and semantic technology address these computational challenges and aim to facilitate more intelligent search and smoother data integration. With the recent success of Linked Data, the technology has taken a more data-centric and lightweight approach to semantics. Individual pieces of data are often of little value, while the combination and integration of many pieces creates a new asset. Still, a human contribution is required in several areas, and this contribution can be encouraged by providing incentive mechanisms, either through time savings or through other forms of reward that are made visible to the user. The evolution of the Web to a Web of people, Web 2.0, brought many examples that demonstrate the power of such motivation mechanisms. This socio-technical combination integrates computational power with human intelligence in order to improve and speed up knowledge work and to create increased knowledge-based value.

The ACTIVE project addressed the challenges faced by today’s knowledge workers with a pragmatic approach, integrating semantic technology, the notion of context, and the Web 2.0 paradigm, and supporting informal processes. The selection of technologies and the objectives of the project were driven by the fact that enterprises can only partially conserve and reuse their own knowledge. The outcomes of the project are tools and methods that substantially improve the situation for knowledge workers in their daily tasks and increase individual and collaborative productivity. Validated in case studies in large organizations, ACTIVE technology has been shown to significantly improve the way users interact with and use information. Common problems of knowledge work could be alleviated by a powerful combination of machine and human intelligence. The results of the project will have an impact on individual and collaborative knowledge worker productivity and on the capture, reuse, sharing, and preservation of knowledge in organizations.

Innsbruck Prof. Dieter Fensel


Contents

Part I Addressing the Challenges of Knowledge Work

1 Introduction

Paul Warren, John Davies, and Elena Simperl

2 Web 2.0 and Network Intelligence

Yasmin Merali and Zinat Bennett

Part II ACTIVE Technologies and Methodologies

3 Enterprise Knowledge Structures

Basil Ell, Elena Simperl, Stephan Wolger, Benedikt Kampgen, Simon Hangl, Denny Vrandecic, and Katharina Siorpaes

4 Using Cost-Benefit Information in Ontology Engineering Projects

Tobias Burger, Elena Simperl, Stephan Wolger, and Simon Hangl

5 Managing and Understanding Context

Igor Dolinsek, Marko Grobelnik, and Dunja Mladenic

6 Managing, Sharing and Optimising Informal Knowledge Processes

Jose-Manuel Gomez-Perez, Carlos Ruiz, and Frank Dengler

7 Machine Learning Techniques for Understanding Context and Process

Marko Grobelnik, Dunja Mladenic, Gregor Leban, and Tadej Stajner

Part III Applying and Validating the ACTIVE Technologies

8 Increasing Productivity in the Customer-Facing Environment

Ian Thurlow, John Davies, Jia-Yan Gu, Tom Bosser, Elke-Maria Melchior, and Paul Warren

9 Machine Learning and Lightweight Semantics to Improve Enterprise Search and Knowledge Management

Rayid Ghani, Divna Djordjevic, and Chad Cumby

10 Increasing Predictability and Sharing Tacit Knowledge in Electronic Design

Vadim Ermolayev, Frank Dengler, Carolina Fortuna, Tadej Stajner, Tom Bosser, and Elke-Maria Melchior

Part IV Complementary Activities

11 Some Market Trends for Knowledge Management Solutions

Jesus Contreras

12 Applications of Semantic Wikis

Michael Erdmann, Daniel Hansch, Viktoria Pammer, Marco Rospocher, Chiara Ghidini, Stefanie Lindstaedt, and Luciano Serafini

13 The NEPOMUK Semantic Desktop

Ansgar Bernardi, Gunnar Aastrand Grimnes, Tudor Groza, and Simon Scerri

14 Context-Aware Recommendation for Work-Integrated Learning

Stefanie N. Lindstaedt, Barbara Kump, and Andreas Rath

15 Evolving Metaphors for Managing and Interacting with Digital Information

Natasa Milic-Frayling and Rachel Jones

Part V Conclusions

16 Conclusions

Index

Contributors

Zinat Bennett Independent Consultant, 19 Foxes Way, Warwick, CV34 6AX, UK, [email protected]

Ansgar Bernardi German Research Center for Artificial Intelligence (DFKI) GmbH, Postfach 2080, Kaiserslautern, D-67608, Germany, [email protected]

Tom Bosser kea-pro, Tal, Spiringen CH-6464, Switzerland, [email protected]

Tobias Burger Capgemini, Carl-Wery-Str. 42, Munich D-81739, Germany, [email protected]

Jesus Contreras iSOCO, Intelligent Software Components, S.A., Avenida Del Partenon, 16-18, Madrid 1º 7ª 28042, Spain, [email protected]

Chad Cumby Accenture Technology Labs, Rue des Cretes, Sophia Antipolis, France, [email protected]

John Davies British Telecommunications plc., Orion G/11, Ipswich, IP5 3RE, Adastral Park, United Kingdom, [email protected]

Frank Dengler Karlsruhe Institute of Technology, Englerstr. 11, Building 11.40, Karlsruhe 76131, Germany, [email protected]

Divna Djordjevic Accenture Technology Labs, Rue des Cretes, Sophia Antipolis, France

Igor Dolinsek ComTrade d.o.o, Litijska 51, Ljubljana 1000, Slovenia, igor.dolinsek@comtrade.com

Basil Ell Karlsruhe Institute of Technology, KIT-Campus Sud, Karlsruhe D-76128, Germany, [email protected]

Michael Erdmann Ontoprise GmbH, An der RaumFabrik 29, Karlsruhe 76227, Germany, [email protected]

Vadim Ermolayev Zaporozhye National University, 66 Zhukovskogo st, Zaporozhye 69600, Ukraine, [email protected]

Carolina Fortuna Jozef Stefan Institute, Jamova 39, SI-1000, Ljubljana, Slovenia, [email protected]

Rayid Ghani Accenture Technology Labs, Rue des Cretes, Sophia Antipolis, France

Chiara Ghidini Fondazione Bruno Kessler, Via Sommarive 18, Povo, I-38122 Trento, Italy, [email protected]

Jose-Manuel Gomez-Perez iSOCO, Intelligent Software Components, S.A., Avenida Del Partenon, 16-18, Madrid, 1º 7ª 28042, Spain, [email protected]

Gunnar Grimnes German Research Center for Artificial Intelligence (DFKI) GmbH, Postfach 2080, Kaiserslautern D-67608, Germany, [email protected]

Marko Grobelnik Artificial Intelligence Laboratory, Jozef Stefan Institute, Jamova 39, SI-1000, Ljubljana, Slovenia, [email protected]

Tudor Groza DERI & The University of Queensland, School of ITEE, The University of Queensland Level 7, General Purposes South Building (#78), Staffhouse Road, St. Lucia Campus QLD 4072, Australia, [email protected]

Jia-Yan Gu British Telecommunications plc., Orion G/11, Ipswich, IP5 3RE, Adastral Park, United Kingdom

Simon Hangl STI Innsbruck, University of Innsbruck, Technikerstraße 21a, Innsbruck 6020, Austria, [email protected]

Daniel Hansch ontoprise GmbH, An der RaumFabrik 29, Karlsruhe 76227, Germany

Rachel Jones Instrata Ltd, 12 Warkworth Street, Cambridge, United Kingdom, [email protected]

Benedikt Kampgen Karlsruhe Institute of Technology, KIT-Campus Sud, Karlsruhe D-76128, Germany, [email protected]

Barbara Kump Knowledge Management Institute, TU Graz, Inffeldgasse 21A, Graz, A-8010, Austria, [email protected]

Gregor Leban Artificial Intelligence Laboratory, Jozef Stefan Institute, Jamova 39, Ljubljana, SI-1000, Slovenia, [email protected]

Stefanie Lindstaedt Know-Center and Knowledge Management Institute TU Graz, Inffeldgasse 21A, Graz A-8010, Austria, [email protected]

Elke-Maria Melchior Kea-pro GmbH, Tal, CH-6464 Spiringen, Switzerland, [email protected]

Yasmin Merali Warwick Business School, Warwick University, Coventry, CV4 7AL, UK, [email protected]

Natasa Milic-Frayling Microsoft Research Ltd, 7 J J Thomson Avenue, Cambridge, United Kingdom, [email protected]

Dunja Mladenic Artificial Intelligence Laboratory, Jozef Stefan Institute, Jamova 39, Ljubljana SI-1000, Slovenia, [email protected]

Viktoria Pammer Know-Center and Knowledge Management Institute TU Graz, Inffeldgasse 21A, Graz A-8010, Austria, [email protected]

Andreas Rath Know-Center GmbH, Inffeldgasse 21A, Graz A-8010, Austria, [email protected]

Marco Rospocher Fondazione Bruno Kessler, Via Sommarive 18, Povo, I-38122 Trento, Italy, [email protected]

Carlos Ruiz iSOCO, Intelligent Software Components, S.A., Avenida Del Partenon, 16-18, Madrid, 1º 7ª 28042, Spain, [email protected]

Simon Scerri DERI, National University of Ireland, Galway, Lower Dangan, Galway, Ireland, [email protected]

Luciano Serafini Fondazione Bruno Kessler, Via Sommarive 18, Povo, Trento I-38122, Italy, [email protected]

Elena Simperl Karlsruhe Institute of Technology, KIT-Campus Sud, Karlsruhe D-76128, Germany, [email protected]

Katharina Siorpaes STI Innsbruck, University of Innsbruck, Technikerstraße 21a, Innsbruck 6020, Austria, [email protected]

Tadej Stajner Artificial Intelligence Laboratory, Jozef Stefan Institute, Jamova 39, SI-1000 Ljubljana, Slovenia, [email protected]

Ian Thurlow British Telecommunications plc., Orion G/11, Ipswich, IP5 3RE, Adastral Park, United Kingdom

Denny Vrandecic Karlsruhe Institute of Technology / Wikimedia Deutschland e.V., KIT-Campus Sud, Karlsruhe D-76128, Germany, [email protected]

Paul Warren Eurescom GmbH, Wieblinger Weg 19/4, Heidelberg D-69123, Germany

Stephan Wolger STI Innsbruck, University of Innsbruck, Technikerstraße 21a, Innsbruck 6020, Austria, [email protected]

Part I

Addressing the Challenges of Knowledge Work

1 Introduction


1.1 Motivation for Our Book

Using and interacting with information is an important part of wealth creation in the modern world. For many of us, much of our working time is spent at computer screens and keyboards, and at other information devices. A report by The Radicati Group (2009) indicates that the average corporate worker spends a quarter of his or her time on email-related activities alone. Despite the importance of this activity, we all know that this interaction can be both inefficient and ineffective.

The purpose of this book is to describe how a set of technologies can be used to improve the personal and group productivity of those interacting with information. Much of the material for our book comes from the ACTIVE project, which was motivated precisely by the need to improve personal and group productivity. ACTIVE was a European collaborative project which ran from March 2008 until February 2011; an overview of ACTIVE can be found in Simperl et al. (2010). However, this book is not just about ACTIVE. We also discuss how similar and related technologies are being developed and used elsewhere, and we include contributions from other projects using these technologies to respond to the same challenges.

P. Warren (*)

Eurescom GmbH, Wieblinger Weg 19/4, D-69123 Heidelberg, Germany

e-mail: [email protected]

J. Davies

British Telecommunications plc., Orion G/11, Ipswich, IP5 3RE, Adastral Park, United Kingdom


E. Simperl

Institute AIFB, Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany


P. Warren et al. (eds.), Context and Semantics for Knowledge Management, DOI 10.1007/978-3-642-19510-5_1, © Springer-Verlag Berlin Heidelberg 2011


The people who use our systems undertake what, following the management scientist Peter Drucker, has come to be known as knowledge work. Drucker (1999) identified the increased productivity of manual work as a major distinguishing feature of successful organizations in the twentieth century and saw increased productivity of knowledge work as a similarly distinguishing feature of organizations in the twenty-first century. Our concern in this book is with technology to realize that increased productivity.

1.2 The ACTIVE Project

The ACTIVE project began with the insight that three factors were exerting a significant impact on the efficiency and effectiveness of how we interact with information.

Firstly, despite all the technological and organisational efforts which have been applied to improve the sharing and use of knowledge, organisations still fail to make full use of their own knowledge. Even as technologies are being developed to address this, the problem is becoming more complex through the increasing volume of information available to be shared, through the global nature of many organisations, and through modern working practices such as distributed teams and home-working.

Secondly, all of us face an information overload which makes it difficult to find what is relevant to our current task. It is not just that there is too much information; it is that at any given time we need only a part of what is available to us, and the sheer volume makes it hard to locate that part.

Thirdly, most of us find the focus of our work changing, sometimes from minute to minute, as we face continual interruptions. These may be in the form of emails, instant messages, phone calls, electronic reminders, or even the nagging internal voice which reminds us that there is something we need to do urgently. Each time we switch our task focus we need a different set of information. The overhead of finding and re-finding that information inhibits our productivity.

These were the challenges we faced in ACTIVE. In response, we saw three technological approaches as being of particular importance: the combination of Web 2.0 and semantic technology; the use of context; and the use of informal processes. We do not see these technologies as stand-alone, but as interrelated approaches which together respond to the challenges of personal and group productivity.

The synergy of Web 2.0 and semantic technology is viewed as valuable both for sharing knowledge and for retrieving information. In the last decade, the development of knowledge representation languages based on ontologies, of the corresponding ontology editors, and of reasoners which can make inferences over these languages, has provided powerful ways of finding and manipulating knowledge. There is a perception, though, that the creation and maintenance of ontologies carries a significant overhead. The requirement is to use these technologies in a more accessible way, i.e. in a way more like that of Web 2.0 applications. In ACTIVE we have two responses to this: the Semantic MediaWiki (SMW) and the use of tagging. The SMW offers users the ability to create and share structured knowledge alongside shared text. In addition, ACTIVE’s enhanced approach to tagging helps the user to create tags and to use those tags for information retrieval.

The use of context is crucial both to locating the information relevant to us at any given time and to reducing the overhead of switching tasks. In fact, the word context means different things in different communities. To some it means characteristics such as location, or the kind of device a person is using. Our interest is in task context. By this we mean a grouping of information objects required by a user for a particular task. These information objects could be documents, spreadsheets, emails, images, or even people, e.g., represented by their contact details. The point is that they form a grouping which enables a user, or group of users, to better perform their work. From the system’s perspective, a context is used by an agent to define the current working focus and determine working priorities (Ermolayev et al. 2010). For an overview of how context is used in ACTIVE, see Warren et al. (2010).
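At its simplest, the notion of task context described above is a named grouping of information objects together with a record of which grouping is currently active. The Python sketch below illustrates only that idea and is not the AKWS implementation: the class and method names (`InformationObject`, `TaskContext`, `Workspace`, `switch_to`) and the example URIs are hypothetical, and the machine-learning side (discovering contexts and detecting context switches from user behaviour) is deliberately left out.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class InformationObject:
    """An item a user works with: a document, spreadsheet, email, image or person."""
    kind: str  # illustrative label, e.g. "document", "email", "person"
    uri: str   # hypothetical identifier, e.g. a file path or mailto: address


@dataclass
class TaskContext:
    """A named grouping of the information objects needed for one task."""
    name: str
    objects: set[InformationObject] = field(default_factory=set)

    def add(self, obj: InformationObject) -> None:
        # Sets ignore duplicates, so re-adding the same object is harmless.
        self.objects.add(obj)


class Workspace:
    """Hypothetical workspace holding a user's contexts and the current one."""

    def __init__(self) -> None:
        self.contexts: dict[str, TaskContext] = {}
        self.current: TaskContext | None = None

    def create_context(self, name: str) -> TaskContext:
        ctx = TaskContext(name)
        self.contexts[name] = ctx
        return ctx

    def switch_to(self, name: str) -> set[InformationObject]:
        # Switching task focus brings back the whole grouping at once,
        # rather than the user re-finding each object by hand.
        self.current = self.contexts[name]
        return self.current.objects


if __name__ == "__main__":
    ws = Workspace()
    proposal = ws.create_context("customer proposal")
    proposal.add(InformationObject("document", "file:///proposals/draft.docx"))
    proposal.add(InformationObject("person", "mailto:account.manager@example.com"))
    ws.create_context("team report")
    print(len(ws.switch_to("customer proposal")))  # prints 2
```

In this sketch the cost of gathering information is paid once, when a context is populated, and each later change of task focus is a single lookup rather than a fresh search.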

We use the term informal processes to describe the procedures which all of us create and use to undertake our work. We differentiate between business processes, which are created by the organisation, and the informal processes which we create ourselves. The problem with the latter is that they are frequently not well described. Because of this they are not shared, and hence are reinvented many times in an organisation. Moreover, because they are not shared, they are not subject to the peer review which leads to improvement. Hill et al. (2006) call these artful business processes, because “there is an art to their execution that would be extremely difficult, if not impossible, to codify in an enterprise application”. ACTIVE has developed tools to make it easier to create, share, view and edit such processes. We hope that these tools will help ordinary employees, rather than business process designers, to create and share their own processes.

From the scientific standpoint, in ACTIVE we were testing three hypotheses:

• That the use of lightweight ontologies and tagging offers measurable benefits to the management of corporate knowledge without presenting significant barriers to users.

• That the use of context helps users cope with information overload and mitigates the effects of continual switching of task focus; and that users are further aided by the deployment of machine learning techniques to discover contexts and detect changes of context, based on the user’s behaviour.

• That the productivity of knowledge work would be improved by providing tools to create, view and edit informal processes; that the use of machine learning to learn processes from users’ behaviour would further support knowledge work; and that machine learning could also be used to suggest information resources, based on process and context information.

1.3 The Structure of This Book

Our book is organised into five parts. The next chapter concludes Part I by looking at the opportunities and challenges faced by organisations as they move towards exploiting Web 2.0 capabilities. A challenge for knowledge management is the integration and exploitation of internal and external intelligence. The chapter describes strategies for exploiting Web 2.0 intelligence.

Part II then looks at the technologies, and also some methodologies, developed in ACTIVE. Part III describes how these technologies have been exploited and evaluated in the ACTIVE case studies. Part IV starts with a chapter describing the principal market trends in the areas addressed by our technologies, and then includes a number of chapters describing related work in other projects. Finally, Part V draws some conclusions and indicates some further areas for research.

Parts II to V are all briefly reviewed in the remainder of this chapter.

1.3.1 The Technologies

The first chapter of Part II, Chap. 3, is concerned with the development, maintenance and use of knowledge structures within organisations. Such knowledge structures are currently very diverse, ranging from database tables to files in proprietary formats used by scripts. The chapter describes the use of the Semantic MediaWiki (SMW) to provide an environment for enterprise knowledge structures.

Chapter 4 continues the theme of enterprise knowledge structures. It presents models to estimate the costs and benefits associated with the development of ontologies and related knowledge structures, and of the applications using them. The chapter provides guidelines to assist project managers in using these models throughout the ontology life cycle.

In our next chapter, Chap. 5, we look at another of the key technologies underpinning ACTIVE, that of context. The chapter explains how the concept of context is realised in the ACTIVE Knowledge Workspace (AKWS). It describes the top-down and bottom-up perspectives of context. In the former, users create contexts, set their current context, and associate information objects with contexts. In the latter, ACTIVE’s machine intelligence software is used to discover contexts based on the users’ behaviour, to detect a user’s current context, and to associate information objects with contexts. In ACTIVE, these two perspectives are merged into a single end-user experience.

Chapter 6 looks at the third of our ACTIVE technologies: informal processes. The chapter describes tools to support how these processes can be captured, shared and optimised. It also describes tools that provide the security mechanisms needed to allow knowledge to be shared safely.

The final chapter in Part II, Chap. 7, looks at a fundamental technology which supports ACTIVE’s use of context and process. As already noted, the application of machine learning techniques to support the use of context and process information is one of the project’s principal research challenges. The chapter also includes a discussion of Contextify, an Outlook plug-in which uses the technology to support the management of email.

1.3.2 Applying and Evaluating the Technologies

The central premise of the ACTIVE project is that the use of the three approaches described above, based upon lightweight ontologies and folksonomies, context-based information delivery, and informal processes, will significantly increase the efficiency and effectiveness of knowledge work. In Part III we explain how this has been put to the test. We describe how we have applied these technologies in three case studies, and how in turn we have used these case studies to evaluate our technologies, at the user level as well as the technical level.

Chapter 8 describes a case study with customer-facing staff in BT. They confront the three challenges we discussed earlier: the need to share information; to combat information overload; and to mitigate the effect of continual interruptions. The chapter describes how ACTIVE tools, based both on the AKWS and the SMW, are being used to respond to these challenges; a particular application is the creation of customer proposals.

A case study in Accenture, a global consultancy with over 200,000 employees, is described in Chap. 9. The chapter focuses on two enterprise problems: enterprise search and collaborative document development. It describes how ACTIVE technology has been used to make generic knowledge management tools context- and task-sensitive. The confidentiality of documents is often a bar to their being shared, even where non-confidential aspects could be exploited across an organisation. The chapter also describes the use of machine learning technology to automatically redact documents so that they can be shared.

The third ACTIVE case study, described in Chap. 10, is with Cadence Design Systems in the electronic design sector. Electronic designers do not follow predefined workflows but use their tacit knowledge to navigate their own informal processes. In this case study, ACTIVE’s machine intelligence technology is used to learn and visualize these informal processes. In this way, bottlenecks can be identified, and processes can be optimized and shared. A project navigation metaphor is used, whereby an electronics designer is helped to find a productive execution path through the state space of an engineering design project.

All these case studies have been conducted in large organisations. However, these technologies are also applicable to those working in smaller organisations, or even those working alone. Similarly, the applications described here are accessed via a conventional personal computer, but the approaches developed could equally be used with other devices, including the mobile devices on which an increasing volume of information interaction takes place.

1.3.3 Complementary Activities

Part IV looks beyond the ACTIVE project to see how others are using related approaches to achieve the same or similar goals. The first chapter, Chap. 11, looks particularly at the Web 2.0 marketplace, at what products are available, and at customer needs and perceptions. The chapter looks at some of the challenges being faced by those developing tools to boost knowledge worker productivity and identifies some market trends.

Chapter 12 complements Chap. 3 by describing a range of applications of the Semantic MediaWiki. The chapter is divided into two sections. Section 12.1 presents SMW+, a product for developing semantic enterprise applications, and describes applications in content management, project management, and semantic data integration. Section 12.2 presents MoKi, a semantic wiki for modeling enterprise processes and application domains. Example applications of MoKi include modeling tasks and topics for work-integrated learning, collaboratively building an ontology, and modeling clinical protocols.

The concept of the social semantic desktop is described in Chap. 13, in particular the work of the NEPOMUK project. Based on semantic web technology, the NEPOMUK Social Semantic Desktop allows access to information across various applications within a knowledge worker’s personal computer. It facilitates the interconnection, management, and ontology-based conceptual annotation of information items.

Chapter 14 describes the use of semantic technologies, machine intelligence and heuristics to enable learning support during the execution of work tasks, a paradigm known as work-integrated learning. The work described here, which formed part of the APOSDLE project, includes the automatic detection of a user’s work task, and context-aware recommendation of both relevant content and relevant colleagues. It also includes the inference of a user’s competences based on her past activities.

The final chapter of this Part, Chap. 15, describes the shifts in the metaphors we use to manage and relate to digital content. The chapter notes, for example, the evolution of email from a communication channel to a rich authoring environment, and the increased rate at which knowledge workers are exposed to information through content streams. These changes present challenges to the segmented and application-bound information storage on our PCs. An approach is presented which de-emphasizes storage management and focuses on support for user activities, at the same time generalizing the notions of folders and files.

1.3.4 Concluding Words

In our final chapter we review the chief themes of our book, in particular our three technology themes. For each theme we outline some remaining challenges. We also make some predictions about how these technologies will develop and be used. Yet that is as much for you, the readers, to determine as for the authors of this book. Our hope in writing the book is that you will be as excited by the technologies’ possibilities as we are, and will help to take them forward.

1.4 Knowledge, Knowledge Work and Knowledge Workers

A final comment about language may be of value. The terms knowledge, information and data will be used in the course of the book. A widely quoted articulation of the difference between the first two of these is due to Ackoff (1989). Ackoff sees information as useful data, providing answers to the ‘who’, ‘what’, ‘where’, and ‘when’ questions. Knowledge, on the other hand, enables the application of information; it answers the ‘how’ questions. This is a useful differentiation. However, our approach is pragmatic and in general we will use the word which seems most natural in the particular circumstances.

We have already noted that the term knowledge work was introduced by

Drucker. He also introduced the term knowledge worker for someone who spends

much of his or her time undertaking knowledge work. It is important not to be elitist

about these terms. Knowledge work is not restricted to those with professional

training. For us, knowledge workers are those who spend much of their working

time interacting with information at a computer or similar device. It is for all such

people that our technology is intended.

Acknowledgement Much of the work reported in this book has received funding from the

European Union’s Seventh Framework Programme (FP7/2007–2013) under grant agreement

IST-2007-215040. Further information on ACTIVE is available at http://www.active-project.eu.

References

Ackoff RL (1989) From data to wisdom. J Appl Syst Anal 16:3–9

Drucker P (1999) Knowledge-worker productivity: the biggest challenge. Calif Manage Rev

41:79–94

Ermolayev V, Ruiz C, Tilly M, Jentzsch E, Gomez-Perez J, Matzke W (2010) A context model for

knowledge workers. Proceedings of the second workshop on context, information and

ontologies, 2010. http://ftp.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-626/ last

accessed 8 Aug 2011

Hill C, Yates R, Jones C, Kogan S (2006) Beyond predictable workflows: enhancing productivity

in artful business processes. IBM Syst J 45(4):663–682

Simperl E, Thurlow I, Warren P, Dengler F, Davies J, Grobelnik M, Mladenic D, Gomez-Perez J,

Ruiz Moreno C (2010) Overcoming information overload in the enterprise: the ACTIVE

approach. IEEE Internet Comput 14(6):39–46

The Radicati Group (2009) Email Statistics Report, 2009–2013, Palo Alto, CA

Warren P, Gomez-Perez J, Ruiz C, Davies J, Thurlow I, Dolinsek I (2010) Context as a tool for organizing and sharing knowledge. Proceedings of the second workshop on context, information and ontologies, 2010. http://ftp.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-626/ last accessed 8 Aug 2011


2

Web 2.0 and Network Intelligence

Yasmin Merali and Zinat Bennett

2.1 Introduction

ICTs have been implicated as both the drivers and enablers of the “Information

Society/Economy”, “Network Society/Economy”, and concurrently, in the emer-

gence of Knowledge Management as a distinctive management practice. The

communication and information capabilities and processes enabled by the Internet

and associated technologies are integral to the realisation of the network society and

the network economy (Castells 1996). At the most fundamental level the techno-

logical developments have the potential to increase:

• Connectivity (of people, applications and devices),

• Capacity for distributed storage and processing of data,

• Reach and range of information transmission, and

• Rate (speed and volume) of information transmission.

The exploitation of these capabilities has given rise to the emergence of network

forms of organising as processes, information and expertise are shared across

organisational and national boundaries. Increased global connectivity and speed

of communication have contracted the spatio-temporal separation of world events:

informational changes in one locality can very quickly be transmitted globally,

influencing social, political and economic decisions in geographically remote

places (Merali 2004). In the managerial discourse (Merali 2004; Evans and Wurster

2000; Axelrod and Cohen 1999; Shapiro and Varian 1999), these changes, seen

Y. Merali (*)

Warwick Business School, Warwick University, Coventry CV4 7AL, UK


Z. Bennett

19 Foxes Way, Warwick CV34 6AX, UK




as the harbingers of a “new” economy (or “information economy”), were

characterised by

• The critical role of information and knowledge in competition,

• Increased dynamism, uncertainty and discontinuity in the competitive context,

• Pressures for fast decision making in the absence of complete information, and

• The importance of learning and innovation to afford requisite flexibility

and adaptability for survival.

Consequently, whilst most of the 1990s were characterised by management

engagement with Business Process Reengineering (BPR) and the development of

infrastructures and opportunities for e-commerce and e-business, the end of that

decade and the 2000s saw the shift to a concern with knowledge management,

innovation and business intelligence. Some of us believed that internet-based ICTs

would lead to a qualitative change in management theory and practice:

The “network” form of organising is a signature of the Internet enabled transformation of

economics and society. We find that strategy and managerial discourse are shifting from

focusing solely on the firm as a unit of organisation to networks of firms, from

considerations of industry-specific value systems to considerations of networks of value

systems, and from the concept of discrete industry structures to the concept of ecologies.

In the domain of information systems, the focus on discrete applications development

has become imbued with issues of flexibility, connectivity and compatibility with other

systems. Driven by the business need for intra- and inter-organisational integration of

information processes, we have moved from concentrating on applications development

to engaging with issues of information architectures.

The Internet is implicated as both an enabler and a driver of this interconnected world.

At a more general level, there is an escalation of interest in the idea that information

technology networks and social networks self-organise into a constellation of networks of

networks (Watts 1999, 2003; Barabasi 2002). This is analogous to conceptualising the

interconnected world as a kind of global distributed information system comprising

networks of networks.

Merali 2004, p. 408

Web 2.0 technologies are a gateway to realising the potential of a “global distributed information system” for business and society. However, they also pose

significant challenges for firms with regard to the development of appropriate

enterprise architectures and strategies for dealing with the emergent networked

competitive terrain. This chapter provides a perspective on the implications of Web

2.0 developments for enterprise knowledge processes in this context.

2.2 Enterprise Knowledge Processes and the Network Economy

Knowledge Management initiatives in the 1990s were characterised by their focus

on developing information infrastructures (Knowledge Management Environ-

ments, or KMEs) that would enable cross-boundary processes and collaborative

work to be executed efficiently, and their attempts to codify, organise and extract

value from information assets. Table 2.1 provides an illustration of the kinds


of capabilities afforded by ICTs that were at the heart of most knowledge manage-

ment strategies in that era.

Experience with these ICT-enabled knowledge management environments

(KMEs) clearly demonstrated that whilst technology solutions could provide access

to information, the value of the information could only be leveraged if it could be

applied effectively by those participating in the information-based interactions.

Consequently the strategy literature on the Information Economy in the 1990s

focused on new business models that were enabled by the Internet with an increased

emphasis on customer relationship management, business intelligence and

innovation. Much of the corporate development in the 1990s and the early 2000s

was concerned with providing integrated platforms and architectures for seamless

delivery of service in conventional value chains, or more ambitiously, in value webs

that were well-defined in scope and scale. Grid computing was feted for enabling

models of provision that could accommodate demand irregularities and spikes in

capacity requirements for computing power, but the potential for ubiquitous connectivity offered by the internet and the World Wide Web remained under-realised by most

organisations. The mid-to-late 1990s brought us pioneers like Napster and Amazon

who used the network capabilities of the web not just for delivering services but for

harnessing social intelligence to enhance the value and utility of their provision.

Looking at developments over the past 5 years, it is clear that realising the full

potential of connectivity in the network economy entails a step change in the scale

of business networks and in the richness, speed and reach of communications across

these networks. This leads to a more complex competitive terrain as there are a

number of different ways that organisations can exploit network effects with

different business models. For example, whilst faster diffusion of information and

ideas may lead to speedier imitation and short-lived first mover advantages for

innovators, it is also possible for smart innovators to exploit network effects to

rapidly establish a dominant footprint in the market space, making it difficult for

imitators to usurp them.

Table 2.1 Capabilities and functionality associated with ICT-based KMEs

Discovery and creation:
• Search
• Filter
• Alert
• Analysis and synthesis (content analysis, pattern matching, discovery, “reasoning”)

Sharing:
• Organisation (classification and clustering)
• Creation of summaries and abstracts
• Display
• Dissemination (e.g. profile-based targeting)

Utilisation and development:
• Collaborative work space
• Work flow
• Communication and interaction support (wikis, blogs, social computing environments, etc.)
• Simulation environments
• Learning environments


Similarly, whilst the scale of the internet and the extended reach it affords may

enable large corporations to occupy dominant positions in the global market, it also

enables many more niche players to survive comfortably. The increasing scale and

scope of web-based enterprise intelligence is predicated on requisite corporate

competences associated with the management and analysis of very large databases

and scalable architectures, capable of supporting web-based transactions at any

scale. In turn, the web has spawned an ecology of business models supporting a diversity of business size and scope.

The long tail effect1 (Anderson 2006) is a popularly cited characteristic of the

Web 2.0 landscape (see Fig. 2.1). Even if one is catering for only a relatively minuscule niche market, the global reach of the internet makes it possible to attract

the critical mass of customers necessary for the viable delivery of a service or

product offering (i.e. a small share of a very large market is still large in terms of

absolute numbers). Whilst Anderson pointed out the impact of global reach on

enabling businesses to survive with small numbers of high-value sales, the web has

also spawned a population of business models based on small (low-value)

transactions enabled by the evolution of efficient micro-payment systems lowering

the transaction costs for small purchases. Thus global reach and socio-economic

developments enable many diverse niche providers (ranging from those engaging in

high volume, low value transactions, to those engaging in high value, low volume

transactions) to co-exist on the competitive landscape.
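The arithmetic behind “a small share of a very large market is still large” can be made concrete with a small calculation. The sketch below assumes, purely for illustration, that demand for the item of popularity rank r in a catalogue is proportional to 1/r^s (a Zipf-like power law); the function name `tail_share` and the catalogue and head sizes are hypothetical and are not taken from Anderson (2006).

```python
def tail_share(n_items: int, head_size: int, exponent: float = 1.0) -> float:
    """Fraction of total demand falling outside the `head_size` most popular items.

    Assumes (hypothetically) that demand for the item of popularity rank r
    is proportional to 1 / r**exponent, i.e. a Zipf-like power law.
    """
    if not 0 <= head_size <= n_items:
        raise ValueError("head_size must lie between 0 and n_items")
    weights = [1.0 / r ** exponent for r in range(1, n_items + 1)]
    total = sum(weights)
    tail = sum(weights[head_size:])  # demand for every item beyond the "head"
    return tail / total


if __name__ == "__main__":
    # Hypothetical catalogue of 1,000,000 titles, of which a physical shop stocks 10,000.
    share = tail_share(n_items=1_000_000, head_size=10_000)
    print(f"Share of demand beyond the top 10,000 titles: {share:.1%}")
```

Under these assumptions roughly a third of all demand (about 32%) falls on the 990,000 titles outside the head. No single niche title sells much, but together they form a market that only an online provider with global reach can serve.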

Fig. 2.1 The long tail (modified from http://www.longtail.com/about.html, last accessed 21/02/12)

1 The long tail refers to the statistical property that a larger share of the population rests within the tail of a probability distribution than observed under a “normal” (symmetric about the mean) or Gaussian distribution.


The increase in power and impact of capabilities afforded by successive

generations of ICT over the decades has been accompanied by a change in the

scope and scale of organisational knowledge processes. Whilst the knowledge

management agenda for the 1990s was generally focused on improving knowledge

processes within organisational boundaries, organisations today are seeking to exploit the intelligence of markets and civil society for open innovation and

crowd sourcing. Web 2.0 capabilities and the possibilities of exploiting user-

generated content extend the scope of KMEs beyond organisational boundaries,

supporting employees engaging in novel interactions with external players in

dynamic contexts to make decisions “on the hoof”. The next sections outline the

impact of Web 2.0 and some of the challenges that this raises.

2.3 Web 2.0 Capabilities and Knowledge Work

Although Web 2.0 has been in the public lexicon for several years, its definition is a

“work in progress”. In terms of the Network Economy we can think of Web 2.0 as

the enabler for realising the networkness of networks (Merali 2011). O’Reilly’s

(2007) statement provides a good working definition and captures the features of

Web 2.0 capabilities and use that are the most significant for the current discussion:

Web 2.0 is the network as platform, spanning all connected devices; Web 2.0 applications

are those that make the most of the intrinsic advantages of that platform: delivering

software as a continually-updated service that gets better the more people use it, consuming

and remixing data from multiple sources, including individual users, while providing their

own data and services in a form that allows remixing by others, creating network effects

through an “architecture of participation”, and going beyond the page metaphor of Web 1.0

to deliver rich user experiences.

O’Reilly 2007

Industry observers such as Gartner (Valdes et al. 2010) also point to the ubiquity of the Web and the utility of the web browser, combined with

• Better performing (bandwidth, speed, price) networks and mobile devices,

• Provision of services on the cloud, and

• Social computing and access to the social web;

as the enablers of the transition from what O’Reilly labels “going beyond the page metaphor of Web 1.0” to the business models of the future.

Mobile and pervasive devices and social computing add the quality of dynamic context specificity to the “richness and reach”2 of Internet communications. The

2 In the late 1990s Evans and Wurster’s Blown to Bits underlined the idea that, compared to

traditional business models for reaching customers, where there had to be a trade-off between the

richness and reach of communications (e.g. one-on-one interaction with a salesperson versus a

broadcast advertisement in a national paper with a wide circulation) the Internet made it possible to

have both richness and reach, as it was possible to have customised communications for different


promise of cloud computing is very closely associated with the Web 2.0 vision

because it confers two essential capabilities: the possibility of accessing and

managing distributed resources (data, applications, computing power), and the

flexibility of the utility model (offering platform as a service, infrastructure as a

service, software as a service).

Whilst there has been an explosion of new businesses spawned by the emergence

of Web 2.0 capabilities, it is perhaps not surprising to note that the most cited

incumbent organisations making the transition smoothly to “Web 2.0” practices

were those who

• Integrated access to information from large numbers of different sources to

serve dynamic needs (data management competence), and/or

• Provided space for social/economic interactions between large numbers of

people who were not previously connected.

This applies to both pure information businesses (such as Google and Facebook)

and those whose services are implicated in the sale or exchange of material objects

(such as Amazon and e-bay). They were all orchestrators and hosts of transaction

spaces, and used network effects to amass a large following very rapidly. Most did

not own the information or material objects involved in the transactions, and they had

a variety of business models, but all provided ease of access to network resources combined with an attractive value proposition for non-paying and paying participants

(very often advertisers). It is salutary that whilst their rapid establishment of a critical

mass in the marketspace made it difficult for new entrants to usurp them immediately,

none have ceased to innovate and expand, often acquiring and absorbing innovative

start-ups to enhance their portfolio of capabilities.

Whilst firms are adopting social media and using social networks, blogs, wikis

and tweets to enhance communications and information sharing within the tradi-

tional paradigm of knowledge management, there is an important change in the way

that firms need to perceive enterprise intelligence in the Web 2.0 era. The

challenges for enterprise intelligence in the “pre-Web 2.0 business intelligence

era” were concerned with managing internal data, understanding its value and

leveraging it effectively for competitive positioning, whilst those of the Web 2.0

era are concerned with integration and exploitation of internal and external intelli-

gence for competition and collaboration. The boundary between “business intelli-

gence” and “social intelligence” is no longer a clear one, and this leads to some

interesting tensions in emerging business models. Integrating data from large

numbers of sources and realising the potential for global reach to harness business

and social intelligence poses new challenges for process and data management in

dynamic use contexts.

audiences spread across vast geographical areas. Shapiro and Varian’s Information Rules developed the idea of building value propositions on the versioning of information for different audiences. These ideas were influential in shaping e-business models, but were largely used in the context of the traditional linear value chain.


For process design and implementation, the central challenge is that of develop-

ing scalable, agile architectures for secure inter-operability and integration across

diverse organisational boundaries, enabling mashups to deliver context-sensitive

information in real time. The promise of cloud computing for dealing with issues of scalability and demand-based delivery is important in this respect, and Gartner

(Valdes et al. 2010) predict that content driven architectures building on the

interaction and partitioning styles pioneered by Service Oriented Architectures

and Event Driven Architectures will enter mainstream adoption in the next

5 years. On the software development side, the large-scale development and maintenance of much of this software is often characterised by the Open Source

ethos of collective development and continuous improvement. Dynamic program-

ming languages, syndication and reuse of software all contribute to the speed and

diversity of context-specific assembly that is demanded by users on the hoof.

Use-context information may be provided explicitly by devices or sensors

(e.g. for spatial location) or it may have to be inferred from user queries or dialogue.

This means that application design faces the challenge of having to cater for different

media and device characteristics. However the emerging consensus is that the

greatest challenges are in the domain of data management. There are conceptual

challenges in the organisation and semantic articulation of data originating from

diverse and varied sources and destined for use in undefined future contexts. There

are technical challenges for real-time contextual matching: e.g. anticipating what

data is valuable at any instant in dynamic use-contexts, and folding this in with

served applications. The semantic web literature highlights the challenges of

organisation and seamless integration of data and metadata, but most interestingly

from the perspective of knowledge management are the challenges associated with

the incorporation of social intelligence in corporate business models.

The traditional user-centred and customer relationship management approaches

are compatible with the exploitation of Web 2.0 capabilities to improve the customer

experience or to tailor services and products to meet specific customer requirements.

Similarly the use of social computing and the deployment of wikis, blogs and tweets

may be viewed as enhanced communication capabilities to reinforce the knowledge

management practices inherited from the KM heyday of the 1990s. Distinctive in the

Web 2.0 vision is the notion of leveraging the collective intelligence of customers

and civil society, and the diversity of scale and scope for interactions with potential customers and external sources of intelligence.

2.4 Leveraging Collective Intelligence

The business models of the most cited examples of Web 2.0 success stories are all

predicated on leveraging collective intelligence as a resource by doing some or all

of the following:

• Hosting and orchestrating the emergence of collective intelligence,

• Harnessing collective intelligence to


– Improve product/service offering, and

– Develop product/service offering

• Using social media and networks to market product/service offering, and

• Using social media, networks and collective intelligence to mobilise collective

action.

Examples exist for both informational goods and services, and for those involving physical processes and material objects. Wikipedia, the online encyclopaedia, is

probably the most frequently cited example of a site that hosts and orchestrates the

emergence of collective intellectual property – it provides the space where anybody

can add to, comment on, or correct existing entries to constantly develop the scope

and quality of the stored material.

E-bay, Amazon and Netflix are interesting stalwarts from the e-business era

that embody Web 2.0 characteristics. E-bay provides a trusted transaction space and

facilitates interactions between prospective buyers and sellers, and in doing so it

amasses social capital and has access to social networks enhancing its reputational

standing. It analyses the usage patterns and bidding behaviours of its users to refine

and manage its product categories and their flows. Amazon combines information

from suppliers and customer reviews with an analysis of patterns of customer

behaviours to provide an enhanced purchasing experience. Its considerable foot-

print in the book trade enabled it to have a head start in the e-book market along

with its Kindle e-book reader. Like Amazon, Netflix (a film rental business) uses

collaborative filtering to make recommendations to users based on what other users

like, and, according to the Economist, nearly two-thirds of the film selections by

Netflix’s customers come from referrals made by computer (Economist 2010a).

Social networking sites like Facebook that host and orchestrate social interactions

are attractive to advertisers wanting to tap into the potential of social networks to

act as recommendation channels.
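Collaborative filtering, in which users are recommended what similar users liked, can be sketched in a few lines. The version below is a minimal user-based variant that assumes a small in-memory table of ratings and uses cosine similarity between users. The function names and data are hypothetical, and the sketch does not describe how Amazon or Netflix actually implement their systems.

```python
from math import sqrt

# user -> {item: rating}
Ratings = dict[str, dict[str, float]]


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity of two users' rating vectors, treating unrated items as 0."""
    dot = sum(u[i] * v[i] for i in u.keys() & v.keys())
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    if dot == 0 or norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def recommend(ratings: Ratings, user: str, top_n: int = 3) -> list[str]:
    """Rank items `user` has not rated by the similarity-weighted average
    rating other users gave them (user-based collaborative filtering)."""
    mine = ratings[user]
    weighted_sum: dict[str, float] = {}
    similarity_sum: dict[str, float] = {}
    for other, theirs in ratings.items():
        if other == user:
            continue
        sim = cosine(mine, theirs)
        if sim <= 0:
            continue  # ignore users with nothing in common
        for item, rating in theirs.items():
            if item not in mine:
                weighted_sum[item] = weighted_sum.get(item, 0.0) + sim * rating
                similarity_sum[item] = similarity_sum.get(item, 0.0) + sim
    predicted = {item: weighted_sum[item] / similarity_sum[item] for item in weighted_sum}
    # Highest predicted rating first; ties broken alphabetically for stable output.
    return sorted(predicted, key=lambda item: (-predicted[item], item))[:top_n]
```

For example, if ann rated "Film A" 5 and "Film B" 1, bob rated A, B, C and D as 5, 1, 5, 1, and cat rated them 1, 5, 1, 5, then `recommend(ratings, "ann")` returns `["Film C", "Film D"]`. Bob's tastes match ann's more closely than cat's do, so his high rating of C carries more weight.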

The key feature of these success stories is that in retrospect they all have a pattern

in which they build a large following using their service/product, coupled with

a generative business model in which the more a product or service is used, the

more valuable it becomes. This phenomenon of positive returns (Arthur 1990) where

the adoption of a product generates more demand for it, was associated with network

effects in the 1990s. This dynamic was used to explain why VHS overshadowed

Betamax in video formatting, or why there was a tipping point effect that established

the winners in the diffusion of competing communications technologies.
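The self-reinforcing dynamic of positive returns can be illustrated with a generalised (non-linear) Pólya urn, a standard toy model of increasing returns. In this model each new adopter chooses between two rival technologies with a probability that rises more than proportionally with each technology's installed base. The sketch below is illustrative only: the function `simulate_adoption` and its `strength` parameter are hypothetical, and it does not model the actual VHS/Betamax market.

```python
import random


def simulate_adoption(n_adopters: int, seed: int,
                      strength: float = 2.0,
                      initial: tuple[int, int] = (1, 1)) -> tuple[int, int]:
    """Return the installed bases (a, b) of two rival technologies after
    n_adopters sequential adoption decisions.

    Each newcomer picks technology A with probability a**s / (a**s + b**s),
    where s = strength. With s > 1 the feedback is super-linear, so an early
    random lead tends to snowball into near-total dominance (lock-in).
    """
    rng = random.Random(seed)  # seeded so each run is reproducible
    a, b = initial
    for _ in range(n_adopters):
        p_a = a ** strength / (a ** strength + b ** strength)
        if rng.random() < p_a:
            a += 1
        else:
            b += 1
    return a, b


if __name__ == "__main__":
    # Identical starting conditions; only the chance history differs.
    for seed in range(5):
        a, b = simulate_adoption(10_000, seed)
        print(f"run {seed}: share of technology A = {a / (a + b):.2f}")
```

With `strength=2.0` the runs typically end with one technology holding most of the market, but which one wins varies from seed to seed. This is the tipping-point behaviour described above, in which the outcome depends on early chance events rather than on intrinsic merit.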

In the case of Web 2.0, in addition to raising the popularity of the offering

and generating the non-linear diffusion effects for market penetration of products,

use of the product or service generates data and intelligence which is harnessed to

enhance the quality of the product or service. Google’s PageRank3, and query

3 In Google’s terms, PageRank “. . .reflects our view of the importance of web pages by consider-

ing more than 500 million variables and 2 billion terms. Pages that we believe are important

pages receive a higher PageRank and are more likely to appear at the top of the search


recognition algorithms and Amazon’s recommendation systems are based on data

about user actions and behaviours. Google’s spell checker uses an analysis of user-generated corrections of mis-spelt queries to hone the appropriateness of its

responses and to generate the “Did you mean. . .?” service for its query recognition, whilst its page-ranking algorithm presents the most “valuable” items first, based on

its analysis of the way in which different kinds of users access and spend time on the

various web-pages.
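Footnote 3 describes a link as a “vote” that is weighted by the importance of the page casting it. This is the core of the PageRank algorithm as originally published. The sketch below is a minimal power-iteration version that assumes a tiny hypothetical link graph and the commonly quoted damping factor of 0.85. As the footnote makes clear, Google's production ranking combines many more signals than this.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             tol: float = 1e-10, max_iter: int = 1000) -> dict[str, float]:
    """Classic PageRank computed by power iteration.

    `links` maps each page to the pages it links to. A page's rank is shared
    equally among its outgoing links (its "votes"); pages without outgoing
    links spread their rank evenly over all pages. Ranks sum to 1.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        dangling = sum(rank[p] for p in pages if not links.get(p))
        # Random-jump share plus the evenly spread rank of dangling pages.
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for page, targets in links.items():
            if targets:
                vote = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += vote
        converged = sum(abs(new_rank[p] - rank[p]) for p in pages) < tol
        rank = new_rank
        if converged:
            break
    return rank
```

For the graph `{"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}`, page C comes out on top because three pages link to it. Page A ranks second despite having only one inbound link, because that link comes from the highly ranked C. This is the weighted-vote effect the footnote describes.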

The potential of the web to generate innovative community-based models of

production was recognised in the pre-Web 2.0 era. The Open Source community’s

model of software development and peer-to-peer content sharing sites like Napster

were seen as disruptive innovations, threatening to displace the dominant

incumbents in the software and music industries. In the business community there

was a great deal of interest in open innovation, with customers and users

contributing to the design of products (Chesbrough and Vanhaverbeke 2006).

These movements were harbingers of the Web 2.0 era which ramped up the

potential scale of involvement from the level of the community to the level of the

entire society, and which extended the scope of their involvement by enabling

a range of different modes of engagement.

Crowd-sourcing4 – getting the users to carry out work or provide content that

enhances the value of a product – is a typical model for social engagement.

Wikipedia, the online encyclopaedia, is often cited as an example of crowd-sourcing, as anybody can contribute to its collective, dynamic, universal repository of

knowledge, and entries are open to modification and refinement through an open

process. Flickr, in addition to hosting user-contributed images, uses the intellect of its users to structure and organise its massive image store in a way that enables image

retrieval through multiple and diverse associations – a “folksonomy” emerges as

users tag and classify the images using their own keywords. A more commercial

exploitation is Twitter’s crowd-sourcing of its language translation tool: users

provide translations, giving their time and text for free, whilst Twitter owns the

rights to the translations and their reuse to provide apps in different languages for

different devices, thus increasing its global footprint. The idea of crowd-sourcing

has its variants in finance: in contrast to the traditional route of venture capitalists

with deep pockets backing start-ups, there is a small but growing number of crowd-

financed business ventures where a number of small investors collectively back

a venture in return for equity.
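A folksonomy of the kind described for Flickr can be pictured as an inverted index from user-supplied tags to items. In this picture a tag becomes a stronger association for an item as more users independently apply it. The sketch below is a minimal illustration: the `Folksonomy` class and its methods are hypothetical and do not reflect Flickr's actual API or implementation.

```python
from collections import Counter, defaultdict


class Folksonomy:
    """Toy tag index built from (user, item, tag) assignments."""

    def __init__(self) -> None:
        # tag -> Counter of items, counting how many distinct users applied the tag
        self._index: dict[str, Counter] = defaultdict(Counter)
        self._seen: set[tuple[str, str, str]] = set()

    def tag(self, user: str, item: str, tag: str) -> None:
        """Record that `user` tagged `item` with `tag` (case-insensitive, once per user)."""
        key = (user, item, tag.strip().lower())
        if key in self._seen:
            return  # ignore repeat tagging by the same user
        self._seen.add(key)
        self._index[key[2]][item] += 1

    def search(self, *tags: str) -> list[str]:
        """Items carrying all the given tags, ranked by combined tag count."""
        wanted = [t.strip().lower() for t in tags]
        if not wanted or any(t not in self._index for t in wanted):
            return []
        items = set.intersection(*(set(self._index[t]) for t in wanted))
        score = {i: sum(self._index[t][i] for t in wanted) for i in items}
        # Most widely agreed-upon items first; ties broken alphabetically.
        return sorted(items, key=lambda i: (-score[i], i))
```

No central taxonomy is defined anywhere in this sketch. The vocabulary and its ranking emerge entirely from the users' own keywords, which is the sense in which the folksonomy “emerges”.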

results. . .PageRank also considers the importance of each page that casts a vote, as votes from

some pages are considered to have greater value, thus giving the linked page greater value. We

have always taken a pragmatic approach to help improve search quality and create useful products,

and our technology uses the collective intelligence of the web to determine a page’s importance.”
4 Howe (2006) coined the term “crowd-sourcing” for “the act of taking a job traditionally

performed by a designated agent (usually an employee) and outsourcing it to an undefined,

generally large group of people in the form of an open call.”


The examples above illustrate the way in which Web 2.0 enables

• The harnessing of intelligence and cognitive capabilities of non-corporate agents

(customers, users or enthusiasts) to do “knowledge work”, and

• The collection and leveraging of metadata to improve the business value

proposition.

However the current excitement about Web 2.0 is very strongly linked with the

developments in social media and the exploitation of social networks for a variety

of economic and social goods. The focus on intellectual capital of the pre-Web 2.0

era is now combined with a quest to monetise the social and relational capital

of networks.

2.5 Social Networks: Leveraging Social and Relational Capital

2.5.1 Socially Intelligent Targeting

Social networking sites like Facebook host and facilitate the development of

extensive and dense networks of “friends”. The sharing of personal information

and experiences between peers, and the degree to which users indulge in self-

disclosure on such sites makes “the network” a repository of social and relational

capital. The ease with which users can deploy a diversity of social media to extend

and organise their personal networks serves both to accelerate growth of the site’s

footprint due to network effects5, and to lock in users who will have ended up with

a large chunk of their relational capital invested in that site.

The attraction of such sites for advertising lies in the fact that they combine the

potential of large networks for amplifying the reach of communications with an

inbuilt, socially intelligent targeting mechanism: the network of friends serves to

select and promote particular brands or products and services. Compared to

Amazon’s recommendation system this has a powerful additional social driver:

recommendations from a “friend” with knowledge of a user’s taste, life style and

aspirations will be couched in terms that are meaningful within the context of the

recipient’s self-concept, and have the assurance of “personally known” trusted

sources. The use of viral marketing to exploit network effects was already well-

established, but the increased potency of social media combined with the capacity

of social networking sites to act as “hubs” for targeting marketing messages has led

to the emergence of a much more sophisticated and personalised form of social

marketing.

5 The power of this effect is illustrated by Facebook’s reported trajectory of growth from 50 M active users in October 2007 to 500 M in July 2010 (http://www.facebook.com/press/info.php?timeline accessed 21/02/11).


A recent review in Wired magazine (Rowen and Cheshire 2011) gives examples

of ways in which companies are engineering the social use of blogs and tweets to

mobilise social selling, with models that incorporate a range of different socially-

based mechanisms. Many of these are based on combining conventional promo-

tional techniques (such as coupons and product placement) with detailed individual

disclosures of purchasing choices on social media. There are diverse sources from

which to garner intelligence about individual purchasing behaviours and trends,

ranging from dedicated websites like Blippy.com (which openly publishes all its

members’ credit card transactions along with their product reviews), to spontaneous

postings on personal blogs. However, whilst there is significant investment by

venture capitalists and media companies in social commerce start-ups, mainstream

businesses have yet to develop the competencies and strategies to exploit them

(Econsultancy 2010).

2.5.2 Tensions

Business models predicated on monetising collective intelligence or exploiting the

social, relational or intellectual capital embodied by social networks are subject

to an inherent tension unless the terms of engagement are clearly defined up-front.

The tension is connected with three inter-related issues:

Intellectual property rights: There is likely to be resistance (social and/or legal) to the exploitation of common intellectual property and information for private

profit. A recent example of this is AOL’s $315M acquisition of the Huffington Post,

a news and content aggregation site: there was sensational media coverage of

Huffington Post’s contributors’ fears “that the site’s distinctiveness would be

blunted by its new corporate parent” and their sense of betrayal (Economist

2011). A landmark legal case was the class action lawsuit challenging Facebook’s

Beacon utility, which exposed members’ purchasing history to their contacts: this

resulted in Facebook closing down Beacon and agreeing to a settlement fund of

$9.5 M. In the case of crowdsourcing, corporates are concerned about the escalation

of associated costs: the Economist reports that some firms are realising that

crowdsourcing can be more expensive than doing things themselves, with the most

significant costs being associated with checking the provenance of contributions

and whether or not they infringe copyright.

Trust: Pecuniary rewards for reviewing and recommending products and

services may undermine trust in the impartiality of the recommenders and diminish

the credibility and influence of their recommendations. Similarly, reported abuse or

unauthorised exposure of personal data (as in the case of Facebook and Beacon)

is likely to incur a loss of trust and significant reputational damage.

Ethics: The use to which user-contributed content is put may give rise to ethical

issues. For example, in December 2009, Facebook was challenged for changing the

default settings on its privacy controls so that individuals’ personal information

would be shared with “everyone” rather than selected friends. Google also attracted


legal scrutiny when it was found to be capturing Wi-Fi data without permission: bits

of sensitive private data from 30 countries were collected and stored for years,

without the Street View leaders’ knowledge (Economist 2010b), and in April 2010,

ten privacy and data-protection commissioners from countries including Canada,

Germany and Britain demanded changes in Google Buzz, the social-networking

service, which was dipping into users’ Gmail accounts to find “followers” for them

without clearly explaining what it was doing.

The examples cited here illustrate the absence of precedent for many of the legal

and ethical issues that arise as firms start to exploit Web 2.0 capabilities in powerful

ways. Whilst Google and Facebook responded to criticisms by modifying their

practice, at the industry level we can interpret these episodes as examples of testing

the boundaries, and expect to see more infringements and challenges in the coming

years. For firms wanting to develop social media strategies, these types of issues are

important to consider, as repairing the damage caused by transgressions may be costly.

Whilst the mainstream focus has been on the deployment of social media by

corporates to improve their value propositions and competitive positioning, it is

important to note that the division between commerce and civil society is not so

clear-cut. Blogs, wikis, tweets and social networks can be used to mobilise and

co-ordinate civil society to act in ways that have an economic or political impact,

e.g. by boycotting goods and services or launching denial of service attacks on

corporate websites, or mobilising civil action to bring about constitutional change

as evidenced by the events of February 2011 in Egypt. In the third sector,

social networks and social media have been used to generate new and sustainable

models of social enterprise, as evidenced by the success of peer-to-peer lending

organisations like Kiva and Zopa (http://www.kiva.org, http://www.zopa.com, last

accessed 21/01/11), which mediate social lending for micro-financing.

2.6 Time and Space Matters

The richness and reach of communications in the internet-enabled world combined

with the proliferation of social networking sites, blogs and tweets expose well-

connected individuals to multiple, possibly simultaneous “feeds” of information on

a variety of topics from a diversity of sources. Businesses wanting to use social

media for promoting value propositions are therefore confronted with the challenge

of getting their potential clients to attend preferentially to their promotional

messages.

2.6.1 Context Sensitive Content

One potential solution to the problem of capturing attention lies in exploiting social

networks to send targeted messages as discussed in the last section: the extent to


which the recipient values the opinion of the sender may determine how quickly

and how seriously the message is attended to. However, to do this

effectively the advertiser needs to have detailed knowledge not only of who is

connected to whom, but also of the nature of the relationships embodied in those

connections. Whilst there are companies specialising in mining social network data

to extract or infer this type of knowledge, it is far from being a perfected practice.
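To make the first approach concrete, the sketch below ranks incoming messages by how strongly the recipient is assumed to value each sender. It is a minimal illustration, not a description of any vendor's system: the `tie_strength` scores are hypothetical inputs standing in for whatever relationship knowledge a social-network mining step might infer, and the names `Message` and `prioritise` are our own.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Message:
    sender: str
    text: str


def prioritise(messages: List[Message], tie_strength: Dict[str, float],
               default: float = 0.0) -> List[Message]:
    """Order messages so that those from the most valued senders come first.

    tie_strength maps a sender to a hypothetical score in [0, 1]; unknown
    senders get `default`. sorted() is stable (also with reverse=True), so
    equally valued messages keep their arrival order.
    """
    return sorted(messages, key=lambda m: tie_strength.get(m.sender, default),
                  reverse=True)


inbox = [Message("brand_x", "20% off today"),
         Message("alice", "Tried this cafe, great"),
         Message("bob", "New phone review")]
ranked = prioritise(inbox, {"alice": 0.9, "bob": 0.4})
print([m.sender for m in ranked])  # ['alice', 'bob', 'brand_x']
```

The hard part, as the text notes, is obtaining trustworthy scores; the ranking itself is trivial once they exist.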

Another, more accessible, approach is to provide contextually-relevant informa-

tion – at the right time and the right place for the message to be particularly relevant

and the product or service to be particularly useful or desirable. The ubiquity of

mobile telephones6 makes them the device of choice for delivering context-specific

mobile solutions. Smart phones are increasingly equipped with accelerometers

(sensing the motion of the telephone), Global Positioning System (GPS) receivers,

Internet connectivity and multi-media capabilities, providing the means to collect

and disseminate context-specific information in real-time using a mix of sources

and modes of communication (e.g. voice, video, text, and graphics) to cater for

a diversity of contexts and devices. Gartner (Valdes et al. 2010) suggest that as the

mobile sector becomes more fragmented with the appearance of many competing

platforms (from Apple, Google, Microsoft, Nokia, Research in Motion and HP/

Palm), developers are turning to cross-platform Web applications that will use

HTML5 and mobile-enabled browser engines that support GPS, tilt, proximity

and other sensors.

Context-aware service delivery opens up the possibility of enhancing the user

experience by connecting the cyber-experience with the web of places and things.

Combining this access with access to the social web would deliver the benefits of

social endorsement complemented by the immediacy of the communication: both

in terms of real-time delivery and in terms of relevance to an individual’s current

needs at a specific time and place.
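The combination described above, delivering only what is relevant to the user's current place and time and then ordering it by social endorsement, can be sketched as a simple filter. This is an illustrative assumption rather than anything prescribed by the sources cited: the `Offer` record, the one-kilometre radius and the `endorsements` count (how many of the user's contacts recommend an offer) are hypothetical, and distance is computed with the standard haversine formula on a spherical Earth.

```python
import math
from dataclasses import dataclass
from datetime import time

EARTH_RADIUS_KM = 6371.0


@dataclass
class Offer:
    name: str
    lat: float
    lon: float
    opens: time
    closes: time           # assumption: opening hours do not span midnight
    endorsements: int = 0  # hypothetical: contacts who recommend this offer


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two (lat, lon) points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def relevant_offers(offers, lat, lon, now, radius_km=1.0):
    """Keep offers that are nearby and open now, most socially endorsed first."""
    nearby = [o for o in offers
              if haversine_km(lat, lon, o.lat, o.lon) <= radius_km
              and o.opens <= now <= o.closes]
    return sorted(nearby, key=lambda o: o.endorsements, reverse=True)


here = (51.5074, -0.1278)
offers = [Offer("bakery", 51.5074, -0.1278, time(7), time(18), endorsements=1),
          Offer("bookshop", 51.5080, -0.1280, time(9), time(20), endorsements=5),
          Offer("breakfast_bar", 51.5075, -0.1279, time(6), time(11), endorsements=9),
          Offer("paris_cafe", 48.8566, 2.3522, time(8), time(22), endorsements=7)]
print([o.name for o in relevant_offers(offers, *here, now=time(12, 30))])  # ['bookshop', 'bakery']
```

The breakfast bar is close but already closed and the Paris café is open but far away, so only the two offers that match both place and time survive, ranked by endorsement.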

2.6.2 Dynamic Data Management

The focus on real-time data about behaviours distinguishes Web 2.0 strategies from

those of the earlier e-business era. With regard to enterprise intelligence, current

analysis suggests that effective exploitation of Web 2.0 capabilities entails both

access to very large volumes of data (about user behaviours, personal networks,

spatial orientation and geographic location) and the capacity to mine this data for

meaningful patterns.

Access to the requisite data is likely to be via a number of different channels,

ranging from individual personal blogs and their network of followers to

6 “The 23/04/09 issue of Nature (Kwok 2009) reported that the GSM Association, a mobile

communications industry trade group, announced in February that the number of mobile-phone

connections worldwide had hit 4 billion and was expected to reach 6 billion by 2013.”


commercial information brokers. Promotional efforts are also likely to entail syn-

dication of the focal enterprise’s own content with content from social networks

and that from providers of other complementary products and services. Thus

an important functionality of the enterprise web will be to syndicate and push

content and logic to other sites, and to aggregate in-coming data from other sites,

deploying cloud/web platforms, composite applications, and mashup approaches

and technologies.
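The aggregation side of this functionality can be illustrated with a small merge of incoming feeds. The sketch assumes, purely for illustration, that each source delivers items already sorted newest-first and that syndicated copies of the same content share an identifier; `Item` and `aggregate` are hypothetical names, and a real mashup would also have to deal with fetching, formats and failures.

```python
import heapq
from dataclasses import dataclass, field
from datetime import datetime
from typing import Iterable, List


@dataclass(order=True)
class Item:
    # Only the timestamp takes part in comparisons, so feeds merge by recency.
    published: datetime
    item_id: str = field(compare=False)
    source: str = field(compare=False)


def aggregate(*feeds: Iterable[Item]) -> List[Item]:
    """Merge newest-first feeds into one newest-first stream without duplicates."""
    seen = set()
    merged = []
    for item in heapq.merge(*feeds, reverse=True):
        if item.item_id not in seen:  # same content syndicated via two channels
            seen.add(item.item_id)
            merged.append(item)
    return merged


own_site = [Item(datetime(2011, 2, 21, 12), "launch", "own"),
            Item(datetime(2011, 2, 20, 9), "review", "own")]
partner = [Item(datetime(2011, 2, 21, 8), "bundle", "partner"),
           Item(datetime(2011, 2, 20, 9), "review", "partner")]
print([i.item_id for i in aggregate(own_site, partner)])  # ['launch', 'bundle', 'review']
```

`heapq.merge` consumes the feeds lazily, so the same pattern works for long or streaming sources.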

Providing seamless delivery of context-aware applications requires agile and

scalable enterprise architectures and entails integrating diverse technologies and

defining standards for data formats, semantics and application interfaces. The key

challenges are associated with getting integrated platforms and architectures at the

back end to provide seamless delivery of products and services, eliciting an

enhanced user experience at the front end. This demands competence in both real-time

data analysis and process management (process synchronisation, harmonisation of

interaction channels, real-time analysis of evolving interaction patterns,

etc.). Gartner (Valdes et al. 2010) identify context brokers, state monitors, sensors,

analytic engines and cloud-based transaction processing engines as the requisite

technologies, along with standards for defining data formats, metadata

schemas, interaction and discovery protocols, programming interfaces, and other

formalities to enable context-aware applications to be widely adopted. They also

suggest that the pull for context-aware data architectures will come from packaged-

application and software vendors expanding to integrate communication and

collaboration capabilities, unified communications vendors and mobile device

manufacturers, Web megavendors (e.g. Google), social-networking vendors (e.g.

Facebook), and service providers that expand their roles to become providers

and processors of context.
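To show how some of the components listed above might fit together, the following is a toy, in-process context broker. Sensors publish context updates, the broker keeps the latest state for each user (the state-monitor role) and pushes changes to subscribed context-aware applications. The class and method names are hypothetical and the design is our own simplification; it says nothing about the interfaces or standards the Gartner report anticipates.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, Dict, List

Callback = Callable[[str, Any], None]


class ContextBroker:
    """Toy context broker: latest context per user plus push notifications."""

    def __init__(self) -> None:
        # user -> {attribute -> latest value}: the "state monitor" role
        self._state: DefaultDict[str, Dict[str, Any]] = defaultdict(dict)
        # attribute -> callbacks of context-aware applications interested in it
        self._subscribers: DefaultDict[str, List[Callback]] = defaultdict(list)

    def subscribe(self, attribute: str, callback: Callback) -> None:
        self._subscribers[attribute].append(callback)

    def publish(self, user: str, attribute: str, value: Any) -> None:
        """Called by a sensor (e.g. GPS or accelerometer) with a new reading."""
        self._state[user][attribute] = value
        for callback in self._subscribers[attribute]:
            callback(user, value)

    def context_of(self, user: str) -> Dict[str, Any]:
        """Snapshot of everything currently known about a user's context."""
        return dict(self._state.get(user, {}))


broker = ContextBroker()
alerts = []
broker.subscribe("location", lambda user, where: alerts.append((user, where)))
broker.publish("u1", "location", (51.5074, -0.1278))
broker.publish("u1", "activity", "walking")
print(alerts)                   # [('u1', (51.5074, -0.1278))]
print(broker.context_of("u1"))  # {'location': (51.5074, -0.1278), 'activity': 'walking'}
```

A distributed broker would replace the in-memory dictionaries and callbacks with a shared store and a messaging protocol, which is exactly where the standards mentioned above would come in.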

2.7 Conclusions

The purpose of this chapter was to illustrate the step change in the scope and scale

of the “intelligence” that enterprises needing to exploit Web 2.0 capabilities will

have to engage with. In conclusion, it is clear that there are a number of dimensions

that strategies for Web 2.0 exploitation should address, and these are summarised

below.

Strategies for positioning within the competitive landscape need to address

both the structure and the dynamics of the network economy: the old paradigm

of competition within well-defined industry structures is being displaced by the

imperative to survive by competing and collaborating in an ecology populated by

a diversity of players. This carries with it opportunities for the exploitation of

network effects and the exploitation of long tail distributions to develop new niches,

but it also exposes firms to competition on a global playing field. Understanding

the structure and dynamics of the ecology is therefore an essential feature of the

intelligent enterprise.


The provision of enterprise intelligence in this complex networked context is

likely to entail integration of data and applications from diverse sources across

organisational boundaries, and this carries with it challenges for the development of

standards, interfaces and agile, scalable architectures. However, even more challenging

is the step change in the diversity, volume and granularity of data available

for collection, organisation, analysis and exploitation. This is due to both the

increased scope and scale of web-enabled transactions and the diversity of multimedia

devices that interact with the web. Whilst organisations like Google are pioneering

the development of powerful datamining algorithms, formidable issues of semantic

organisation, analysis and manipulation remain. Web 2.0 business models are likely

to entail collaborations and partnerships between providers and users of data and

services: Google, Facebook and Amazon all have extensive webs of collaboration

with technology and content providers as well as with businesses that want to use

their services.

One of the early defining characteristics of the Web 2.0 era was the proliferation

of user-provided content. The rapid co-evolution of social media and social net-

working sites like Facebook has focused corporate attention on creating business

models that leverage social intelligence by harnessing the intellectual, social and

relational capital that is embodied in social networks. A number of legal, ethical

and philosophical issues have emerged as firms experiment with exploiting user-

generated content and crowd-sourcing. It is strategically important to establish clear

governance frameworks for such engagement particularly with regard to intellec-

tual property rights and the management of personal information, as transgressions

can incur very significant financial and reputational damage.

Finally, the combination of multimedia capabilities, sense and location data,

highly granular data about user behaviours, context-aware applications, and large

scale data analysis combined with the social, relational and intellectual capital

of individuals and collectives presents the opportunity and challenge of defining

business models that can really exploit the synergies between cyberspace and place

that are inherent in the “web of places and things”.

References

Anderson C (2006) The long tail. Hyperion, New York

Arthur B (1990) Positive feedbacks in the economy. Sci Am 262(2):80

Axelrod R, Cohen M (1999) Harnessing complexity: organizational implications of a scientific

frontier. Free Press, New York

Castells M (1996) The rise of the network society. Blackwell, Oxford

Chesbrough HW, Vanhaverbeke W (2006) Open innovation. Oxford University Press, Oxford

Economist (2010a) Clicking for gold. Economist, 2/27/2010, 394(8671), Special section p9–11

Economist (2010b) Dicing with data. Economist, 5/22/2010, 395(8683), p16

Economist (2011) Content couple. Economist, 2/12/2011, 398(8720), p71

Econsultancy (2010) http://econsultancy.com/uk/reports/social-media-and-online-pr-report Accessed

on 21 Feb 2011


Evans P, Wurster T (2000) Blown to bits: how the new economics of information transforms

strategy. Harvard Business School Press, Cambridge, MA

Howe J (2006) The rise of crowdsourcing. Wired 14(6). http://www.wired.com/wired/archive/

14.06/crowds.html. Accessed on 21 Feb 2011, p1–4

Kwok R (2009) Phoning in data. Nature 458(23):959–961

Merali Y (2004) Complexity and information systems. In: Mingers J, Willcocks L (eds) Social

theory and philosophy of information systems. Wiley, Chichester, pp 407–446

Merali Y (2011) Beyond problem solving: realising organisational intelligence in dynamic

contexts. OR Insight, advance online publication 13 July 2011

O’Reilly T (2007) What is web 2.0: design patterns and business models for the next generation of

software. MPRA paper no. 4578. http://mpra.ub.uni-muenchen.de/4578/ posted 07. November

2007 04:01. Accessed on 21 Feb 2011

Rowen D, Cheshire T (2011) Commerce gets social. Wired, 84–91

Shapiro C, Varian H (1999) Information rules: a strategic guide to the network economy. Harvard

Business School Press, Cambridge, MA

Valdes R, Phifer G, Murphy J, Knipp E, Smith DM, Cearley DW (2010) Hype cycle for web

and user interaction technologies, 2010. Gartner Research Report Number 00201568.

Gartner, Stamford


Part II

ACTIVE Technologies and Methodologies

3

Enterprise Knowledge Structures

Basil Ell, Elena Simperl, Stephan Wölger, Benedikt Kämpgen, Simon Hangl, Denny Vrandecic, and Katharina Siorpaes

3.1 Introduction

One of the major aims of knowledge management has always been to facilitate the

sharing and reuse of knowledge. Over the years a long list of technologies and

tools pursuing this aim have been proposed, using different types of conceptual

structures to capture the knowledge that individuals and groups communicate and

exchange. This chapter is concerned with these knowledge structures and their

development, maintenance and use within corporate environments. Enterprise

knowledge management as we know it today often follows a predominantly

community-driven approach to meet its organizational and technical challenges.

It builds upon the power of mass collaboration and social software combined with

intelligent machine-driven information management technology delivered through

formal semantics. The knowledge structures underlying contemporary enterprise

knowledge management platforms are diverse, from database tables deployed

company-wide to files in proprietary formats used by scripts, from loosely defined

folksonomies describing content through tags to highly formalized ontologies

through which new enterprise knowledge can be automatically derived. Lever-

aging such structures requires a knowledge management environment which not

only exposes them in an integrated fashion, but also allows knowledge workers

to adjust and customize them according to their specific needs. We discuss how

Semantic MediaWiki provides such an environment – not only as an easy-to-use,

highly versatile communication and collaboration medium, but also as an

B. Ell (*) • E. Simperl • B. Kämpgen • D. Vrandecic
Karlsruhe Institute of Technology, KIT-Campus Süd, D-76128 Karlsruhe, Germany

e-mail: [email protected]; [email protected]; [email protected];

[email protected]

S. Wölger • S. Hangl • K. Siorpaes
STI Innsbruck, University of Innsbruck, Technikerstraße 21a, 6020 Innsbruck, Austria

e-mail: [email protected]; [email protected]; [email protected]



integration and knowledge engineering tool targeting the full range of enterprise

knowledge structures currently used.

This chapter is split into two parts. In the first part we undertake a comparative

analysis of the different types of knowledge structures used by knowledge workers

and enterprise IT systems for knowledge sharing and reuse purposes. In the second

part we devise a comprehensive approach to develop, manage and use such

structures in a collaborative manner. We present an ontology editor bringing

together Web 2.0-inspired paradigms and functionality such as Flickr (http://

www.flickr.com) and wikis to support laymen in organizing their knowledge as

lightweight ontologies. Integration with related knowledge resources is exemplified

through a series of methods by which arbitrary folksonomies, but also highly

popular knowledge bases such as Wikipedia (http://www.wikipedia.org/) and

Freebase (http://www.freebase.com/) are made accessible in ontological form.

To further optimize the usability of knowledge structures – an issue which becomes

particularly important in a non-expert-driven knowledge engineering scenario

integrating various resources – we design techniques to check for common fallacies

and modeling errors, which offer a solid baseline for cleansing the underlying

knowledge base. The implementation is based on Semantic MediaWiki (http://

semantic-mediawiki.org/), and has been deployed in the three case studies of the

ACTIVE project which are introduced in Chaps. 9–11.
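As an indication of what such checks can look like, the sketch below detects one classic modeling error: a cycle in the subclass hierarchy (for example Consultant, Employee, Person, and back to Consultant), which in a lightweight ontology usually signals a mistake rather than an intended equivalence. It is an illustrative depth-first search over a hypothetical `subclass_of` dictionary, not the ACTIVE implementation.

```python
from typing import Dict, List, Optional, Set


def find_cycle(subclass_of: Dict[str, Set[str]]) -> Optional[List[str]]:
    """Return one subclass-of cycle (first class repeated at the end), or None."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited / on current path / finished
    colour: Dict[str, int] = {}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        colour[node] = GREY
        path.append(node)
        for parent in sorted(subclass_of.get(node, ())):
            state = colour.get(parent, WHITE)
            if state == GREY:  # back edge: parent is on the current path
                return path[path.index(parent):] + [parent]
            if state == WHITE:
                cycle = visit(parent)
                if cycle:
                    return cycle
        path.pop()
        colour[node] = BLACK
        return None

    for node in sorted(subclass_of):
        if colour.get(node, WHITE) == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


hierarchy = {"Consultant": {"Employee"}, "Employee": {"Person"}, "Person": {"Consultant"}}
print(find_cycle(hierarchy))  # ['Consultant', 'Employee', 'Person', 'Consultant']
```

Reporting the offending path, rather than just a yes/no answer, is what makes such a check useful to the non-expert editors the chapter has in mind.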

3.2 Enterprise Knowledge Structures and How They Are Used

The question of how to optimally capture and leverage enterprise knowledge has

engaged the knowledge management community since its inception. As already

discussed in the introductory section of this chapter, the prominence of this topic is

reflected in the different types of conceptual structures which we can find behind

the scenes of enterprise knowledge management platforms, a diversity which is

multiplied by the wide spectrum of methodologies, methods and techniques pro-

posed for their development, maintenance and use. In the present day, enterprise

knowledge management essentially follows a community-driven approach,

implementing solutions for crowdsourcing and social networking in order to opti-

mize communication and collaboration – within the company and its ecosystem of

business partners and end-customers – and knowledge sharing and reuse. In addi-

tion, formal semantics provide intelligent information management technology for

capturing, accessing, managing and integrating knowledge. The approach is based

on ontologies, knowledge structures whose (community-agreed) meaning is expec-

ted to be exploitable by machines, in particular via reasoning facilities by which

implicit knowledge is derived and inconsistencies are detected.
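The two reasoning services mentioned here, deriving implicit knowledge and detecting inconsistencies, can be illustrated on a toy scale. The sketch computes inferred class memberships by following the transitive subclass relation, then flags instances that end up in two classes declared disjoint. It is a hand-rolled illustration with hypothetical class and instance names, not an OWL reasoner, and it covers only these two constructs.

```python
from typing import Dict, Iterable, List, Set, Tuple


def superclasses(cls: str, subclass_of: Dict[str, Set[str]]) -> Set[str]:
    """All direct and inferred superclasses of cls (transitive closure)."""
    seen: Set[str] = set()
    stack = list(subclass_of.get(cls, ()))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(subclass_of.get(parent, ()))
    return seen


def infer_types(asserted: Dict[str, Set[str]],
                subclass_of: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Asserted classes of each instance plus every class they imply."""
    return {inst: set().union(*({c} | superclasses(c, subclass_of) for c in classes))
            for inst, classes in asserted.items()}


def inconsistencies(types: Dict[str, Set[str]],
                    disjoint: Iterable[Tuple[str, str]]) -> List[Tuple[str, str, str]]:
    """Instances that belong to both classes of a disjoint pair."""
    pairs = list(disjoint)
    return sorted((inst, a, b) for inst, ts in types.items()
                  for a, b in pairs if a in ts and b in ts)


subclass_of = {"SeniorConsultant": {"Consultant"}, "Consultant": {"Employee"},
               "Employee": {"Person"}, "Client": {"Organisation"}}
asserted = {"anna": {"SeniorConsultant"}, "acme": {"Client"},
            "entry_17": {"Consultant", "Client"}}
types = infer_types(asserted, subclass_of)
print(sorted(types["anna"]))  # ['Consultant', 'Employee', 'Person', 'SeniorConsultant']
print(inconsistencies(types, [("Person", "Organisation")]))  # [('entry_17', 'Person', 'Organisation')]
```

Nobody asserted that Anna is a Person; that fact is derived. Likewise, the clash for `entry_17` only becomes visible after inference, which is why the two services are usually run together.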

In the following we illustrate how enterprise knowledge structures can be used,

and the various trade-offs which are associated with the different types of struc-

tures, in terms of three motivating scenarios taken from the case studies.

30 B. Ell et al.

3.2.1 Knowledge Management at an International Consulting Company

The first scenario is set in a large, knowledge-intensive enterprise – a consulting

company – where employees collaborate around the globe on various topics to

provide services to clients as efficiently as possible.

Most enterprise knowledge management systems are set up for the ‘prototypical’

user with no specific task in mind. Especially in a large enterprise context,

employees need information for various different tasks. For instance, they may

want to find information on previous projects, get an overview of a specific tech-

nology, or they may be interested in learning about a particular group within the

company. These tasks are particularly relevant in the context of proposal develop-

ment, by which a company creates a description of the products and services it is

offering at an estimated cost to a potential customer.

Proposal writing follows standardized processes and procedures – giving

instructions about the tasks to be undertaken, the information to be gathered, the

documents to be created, etc. Equally important are less formalized practices –

calling contacts that may have information on similar projects, or searching for

similar proposals in the intranet. Often, information about previous projects cannot

simply be obtained from a central data repository. This is due to the fact that many

documents created within the context of a client project are client-proprietary, and

may not be shared within the entire company. There are also many technical

challenges related to the decentralized and heterogeneous nature of the enterprise

IT landscape, and to the limitations of keyword-based information management

technologies. Especially in an enterprise scenario, and in the context of a specific

task, it will often be useful to retrieve actual facts rather than the documents

that mention them. Such facts refer to entities, for example, to experts, locations,

clients, other companies, and relationships among them. Which facts should be

retrieved naturally depends on the task at hand: for instance, in the case of proposal

development, one might want to find clients for which the company has submitted

similar bids.
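The difference between retrieving facts and retrieving documents can be illustrated with a tiny in-memory fact base. In the sketch below, enterprise knowledge is stored as (subject, predicate, object) triples and the proposal-development question is answered by two pattern matches. The data, predicates and function names are hypothetical; in practice such facts would more likely live in an RDF store queried with SPARQL or a similar language.

```python
from typing import Iterator, Optional, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical enterprise facts as (subject, predicate, object) triples.
FACTS: Set[Triple] = {
    ("bid42", "submittedTo", "AcmeBank"),
    ("bid42", "topic", "CRM"),
    ("bid57", "submittedTo", "NorthTel"),
    ("bid57", "topic", "Billing"),
    ("bid63", "submittedTo", "GlobexInsurance"),
    ("bid63", "topic", "CRM"),
}


def match(s: Optional[str] = None, p: Optional[str] = None,
          o: Optional[str] = None) -> Iterator[Triple]:
    """Yield triples matching the pattern; None acts as a wildcard."""
    for t in FACTS:
        if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o):
            yield t


def clients_with_bids_on(topic: str) -> Set[str]:
    """Which clients received bids on this topic? Answered as facts, not documents."""
    bids = {s for s, _, _ in match(p="topic", o=topic)}
    return {o for b in bids for _, _, o in match(s=b, p="submittedTo")}


print(sorted(clients_with_bids_on("CRM")))  # ['AcmeBank', 'GlobexInsurance']
```

The answer is a list of clients, not a list of proposal files that a consultant would still have to open and read.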

Enterprise knowledge structures are the backbone of such sophisticated infor-

mation retrieval facilities. They capture enterprise domain knowledge at various

levels of expressivity and formality. When choosing the most appropriate among

these levels, it is important to weigh the advantages and disadvantages of heavy-

weight ontology-based approaches, supporting reasoning and full-fledged semantic

search, vs. the additional costs associated with the maintenance and usage of the

knowledge structure, which should be integrated into the daily workflow and allow

user participation at large. Enterprise document repositories support bookmarking

and tagging as means to describe the content of documents. The resulting con-

ceptual structures contain knowledge which could prove extremely useful to create

rich, formal ontologies to implement more purposeful information retrieval

solutions.
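One simple way of mining such tagging data, sketched here as an assumption rather than as the method used in the case study, is to count how often pairs of tags are applied to the same document. Frequently co-occurring pairs are candidates for an ontology engineer to review as related concepts. The document identifiers, tags and threshold are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Tuple


def tag_cooccurrence(tagged_docs: Dict[str, List[str]]) -> Counter:
    """Count how often each unordered pair of tags is applied to the same document."""
    pairs: Counter = Counter()
    for tags in tagged_docs.values():
        # sorted(set(...)) makes each pair canonical and ignores repeated tags
        for a, b in combinations(sorted(set(tags)), 2):
            pairs[(a, b)] += 1
    return pairs


def candidate_relations(tagged_docs: Dict[str, List[str]],
                        min_count: int = 2) -> List[Tuple[str, str]]:
    """Tag pairs frequent enough to review as possible ontology relations."""
    return sorted(pair for pair, n in tag_cooccurrence(tagged_docs).items() if n >= min_count)


bookmarks = {"doc1": ["crm", "proposal", "banking"],
             "doc2": ["crm", "proposal"],
             "doc3": ["banking", "crm"],
             "doc4": ["telecom"]}
print(candidate_relations(bookmarks))  # [('banking', 'crm'), ('crm', 'proposal')]
```

Co-occurrence only says that two tags are related, not how; deciding whether a pair denotes a subclass, a part-of or some other relation remains a human modeling decision.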

3 Enterprise Knowledge Structures 31

3.2.2 Knowledge Sharing at a Large Telecom Operator

A similar scenario has been identified at a large telecom operator.

Operating in multiple projects is a reality of modern businesses. As part of their

daily work knowledge workers interact with various systems, information sources and

people. Their work is highly dependent on contextual dimensions as diverse as the

customer, the status of the sales opportunity, current project issues, and the suppliers

involved. To improve productivity, frequently used information such as contact and

customer data and product documentation should be easily available; the knowledge

worker should not have to search around for these things as they change from one

working context to another. Furthermore, when the user resumes an earlier task, her

working context should be restored seamlessly to its previous state.

There is an abundance of information held within the company’s repositories,

much of which may not be easily accessible to technical consultants, solution

consultants, and sales specialists. In addition there is a wealth of tacit knowledge

which may not be captured to best effect. The key problem here is that

knowledge workers may not be aware of earlier solutions; it is possible that

comparable solutions to similar problems are being worked on in isolation rather

than in co-operation, or even that a particular problem has already been solved.

A better awareness of the solutions to specific business problems and the business

domains in which those solutions were applied should enable common patterns of

solutions to be identified.

To support agile knowledge working, several knowledge management features

are required: information such as contacts, relevant (technical) documentation,

emails, and customer-specific information must be captured and easy retrieval

must be enabled. Moreover, the context of a knowledge worker has to be captured.

This involves modeling of general enterprise knowledge as well as appropriate

knowledge representation formalisms, suitable from an information-management

point of view, but also tangible for knowledge workers.
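The requirement that a working context be captured and later restored can be sketched as a small save-and-restore mechanism keyed by task. This is a deliberately minimal illustration under stated assumptions: `WorkingContext` and `ContextManager` are hypothetical names, and a real system would populate contacts and documents by observing the user rather than by explicit calls.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class WorkingContext:
    """Hypothetical snapshot of what a knowledge worker has at hand for one task."""
    task: str
    contacts: List[str] = field(default_factory=list)
    documents: List[str] = field(default_factory=list)


class ContextManager:
    def __init__(self) -> None:
        self._saved: Dict[str, WorkingContext] = {}
        self.current: Optional[WorkingContext] = None

    def switch_to(self, task: str) -> WorkingContext:
        """Save the current context, then restore (or create) the one for `task`."""
        if self.current is not None:
            self._saved[self.current.task] = self.current
        restored = self._saved.get(task)
        self.current = restored if restored is not None else WorkingContext(task)
        return self.current


manager = ContextManager()
proposal = manager.switch_to("AcmeBank proposal")
proposal.contacts.append("j.smith")
proposal.documents.append("crm-spec.pdf")
manager.switch_to("NorthTel billing issue")       # fresh, empty context
resumed = manager.switch_to("AcmeBank proposal")  # earlier state comes back
print(resumed.documents)  # ['crm-spec.pdf']
```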

3.2.3 Process Optimization at a Digital Chip Design Company

The order of the design activities during chip design is hard to determine before

the process starts. Usually, a designer or a team decides on the best possible continuation

of the activity flow in an ad-hoc manner during the process. Problems can occur in

the case of goal changes, requirement changes, environment changes, etc.

It is important to collect data about the actual execution and sequences of design

activities in several concurrent design project flows. Ideally, this should be sup-

ported by a knowledge management application that assists in eliciting the knowl-

edge about how a sequence of design and verification activities is related to

a particular type of designed artifact, the configurations of the design tools used,

and the capabilities of design teams.


The company uses a modeling framework and an upper-level ontology for repre-

senting dynamic engineering design processes and design systems as process

environments. The modeling approach is based on the understanding that an

engineering design process can be conceived as a process of knowledge transformation

which passes through several states. Each state is a state of affairs in which one or

more representations of a design artifact have been added after being elaborated

by the design activity leading to this state. Evidently,

the overall goal of a design process is to reach the (target) state of affairs in which

all the representations are elaborated with enough quality for meeting the require-

ments. The continuation of the process is decided by choosing an activity from the

set of admissible alternatives for that state. Engineering design processes are

situated in and factually executed by the design system comprising designers,

resources, tools, and normative regulations.
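The state-based view of a design process lends itself to a small simulation. In the sketch below, a state is the set of artifact representations elaborated so far. An activity is admissible when its inputs exist and its output does not yet exist, and the process ends when every representation in the target state has been elaborated. The activity names are hypothetical, the `choose` function stands in for the designer's ad-hoc decision, and quality criteria are deliberately ignored. The sketch illustrates the modeling idea only, not the company's framework or ontology.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List


@dataclass(frozen=True)
class Activity:
    name: str
    requires: FrozenSet[str]  # representations that must already be elaborated
    produces: str             # representation this activity elaborates


def admissible(state: FrozenSet[str], activities: List[Activity]) -> List[Activity]:
    """Activities whose inputs are available and whose output is not yet elaborated."""
    return [a for a in activities if a.requires <= state and a.produces not in state]


def run(activities: List[Activity], target: FrozenSet[str],
        choose: Callable[[List[Activity]], Activity] = lambda options: options[0]) -> List[str]:
    """Advance from state to state until all target representations are elaborated."""
    state: FrozenSet[str] = frozenset()
    trace: List[str] = []
    while not target <= state:
        options = admissible(state, activities)
        if not options:
            raise RuntimeError(f"no admissible activity in state {sorted(state)}")
        step = choose(options)  # the designer's ad-hoc decision
        state = state | {step.produces}
        trace.append(step.name)
    return trace


acts = [Activity("write_spec", frozenset(), "specification"),
        Activity("write_rtl", frozenset({"specification"}), "rtl"),
        Activity("verify", frozenset({"rtl"}), "verification_report"),
        Activity("synthesise", frozenset({"rtl"}), "netlist")]
target = frozenset({"rtl", "verification_report", "netlist"})
print(run(acts, target))                                  # ['write_spec', 'write_rtl', 'verify', 'synthesise']
print(run(acts, target, choose=lambda options: options[-1]))  # ['write_spec', 'write_rtl', 'synthesise', 'verify']
```

The two runs reach the same target state by different activity sequences. Recording such traces across projects is the kind of data the text says should be collected.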

The ontology used in the chip design company is a core component of

all processes. The ontology constantly evolves and its evolution needs to be

supported by collaborative ontology engineering tools. The objective is to ensure

that the enterprise knowledge structures and the proprietary ontology suite

are aligned.

3.2.4 Trade-off Analysis

Enterprise knowledge structures vary with respect to a number of aspects, ranging

from expressivity to size, granularity and modeling paradigm followed. These

aspects influence not only the utility of (a category of) knowledge structures in

a particular scenario, but also have direct consequences for the ways in which

a knowledge structure is developed, maintained and used. This section aims to

conduct a baseline analysis of the trade-offs implied by these aspects and to

introduce methods which can be used to perform such an analysis in a systematic

manner.

Particular attention is paid to the use cases discussed above. Considering the

scenario within the large consulting company, enterprise knowledge structures can

be used to allow for the implementation of intelligent knowledge organization and

retrieval techniques. Questions related to the most adequate type of knowledge

structure, its tangible benefits, and the associated development and maintenance

costs are crucial to demonstrate the added value of the technology in this scenario.

The maintainability of knowledge structures is, besides reuse, an essential aspect of

the second scenario we investigate in the project. Here, the additional problem to be

looked into is the extent to which reusability of existing knowledge structures is

economically feasible. The chip design scenario leverages ontologies as a means to

capture domain knowledge and enable communication between designers. Cost-

benefit-motivated quantitative and qualitative means are expected to optimize the

ongoing ontology engineering process (see Chap. 4).


Trade-offs are specified along a number of dimensions used in the literature to

classify and describe knowledge structures:

1. Formality: Uschold and Grueninger (1996) distinguish among four levels of

formality:

– Highly informal: the domain of interest is modeled in a loose form in natural

language.

– Semi-informal: the meaning of the modeled entities is made less ambiguous through

the use of a restricted language.

– Semi-formal: the knowledge structure is implemented in a formal language.

– Rigorously formal: the meaning of the representation language is defined in

detail, with theorems and proofs for soundness or completeness.

McGuiness (2003) defines a ‘semantic spectrum’ specifying a total order

between common types of models. This essentially divides ontologies and ontology-like

structures into informal and formal ones as follows (Fig. 3.1):

– Informal models, in ascending order of formality, are controlled vocabularies,

glossaries, thesauri and informal taxonomies. In this

category we can also include folksonomies, sets of terms which are the result of

collaborative tagging processes.

– Formal models are ordered in the same manner: starting with formal taxonomies, which precisely define the meaning of the specialization/generalization

relationship, more formal models are derived by incrementally adding formal instances, properties/frames, value restrictions, disjointness, formal meronymy, general logical constraints etc.

In the first category we usually encounter thesauri such as WordNet (Fellbaum

1998), taxonomies such as the Open Directory (http://www.dmoz.org) and the ACM

classification (http://www.acm.org/class/1998/), or various eCommerce standards

(Fensel 2001). Most of the available Semantic Web ontologies can be located at

the lower end of the formal continuum (i.e. as formal taxonomies), a category which

overlaps with the semi-formal level in the previous categorization. However, the

use of Semantic Web representation languages does not guarantee a certain degree

of formality: while an increasing number of applications now formalize domain-specific

or application-specific knowledge using languages such as RDFS or OWL, the resulting

ontologies do not necessarily commit to the formal semantics of these languages.

By contrast, Cyc (Lenat 1995) and DOLCE (Gangemi et al. 2002) are clearly

representative of the so-called heavyweight ontologies, which correspond to the

upper end of the continuum.

Fig. 3.1 Semantic spectrum (based on McGuiness [2003])

In (Vrandecic 2009b) we offer a complete formalization of all the above types of knowledge structures, and thus also show how OWL2 (Grau et al. 2008) can be used to represent each of these types, not only ontologies. This allows us to classify knowledge structures automatically, and to check whether they indeed meet the criteria of a specific type of knowledge structure. What is important here is that we can use any of the given structures without restrictions and nevertheless guarantee the integration of all these knowledge structures.

2. Shareability: due to the difficulties encountered in achieving a consensual

conceptualization of a domain of interest, most of the ontologies available today

reflect the view of a restricted group of people or of single organizations.

Standard classifications such as the Open Directory (http://www.dmoz.org),

classifications of job descriptors, products, services or industry sectors have been

developed by renowned organizations in the corresponding fields. For this reason, these knowledge structures are expected to be shared across a wide range of

applications. However, many of them have been developed in isolated settings

without an explicit focus on being shared across communities or software

platforms. Given this state of the art we distinguish among four levels of (expected)

shareability:

– Personal ontologies: the result of an individual development effort, reflecting

the view of the author(s) upon the modeled domain. Personal Semantic Web

ontologies are published online and might be accessed by interested parties, but

their impact is limited, as there is no explicit support for them being reused in

other application contexts. Depending on the complexity of the ontology, they

still might achieve a broad acceptance among a large user community.

– Application ontologies: these are developed in the context of a specific project for predefined purposes and are assumed to reflect the view of the project team (including the community of users) on the modeled domain. Although they are sometimes made public on the Web, they are de facto intended to be used within the original, project-related user community. Their acceptance beyond

these boundaries depends on the impact of the authoring authority in the specific

area, but also on the general reusability of the ontologies. Many of the domain

ontologies available so far can be included in this category.

– Openly developed ontologies: developed by a large, open community of users,

who are free to contribute to the content of the ontology. As a result of continuous refinements and extensions, the ontology emerges as a commonly agreed, widely accepted representation of the domain of interest. The evolution of the Open Directory classification is a good example of collaborative, Web-based ontology development: the core structure of the topic classification, originally proposed by Yahoo! (http://dir.yahoo.com/) and used in slightly modified form by various search engines, was extended by users, who also played a crucial role in the instantiation of the ontology with Web documents. Another prominent example is the Gene Ontology (Gene Ontology Consortium 2000).

– Standard ontologies: developed for standardization purposes by key organizations in the field, usually as the result of an extended agreement process that aims to satisfy a broad range of requirements arising from various user communities. The

majority of standard ontology-like structures currently available are situated in the

area of eCommerce: the United Nations Standard Products and Services Codes UNSPSC (http://www.unspsc.org), the RosettaNet classification (http://www.rosettanet.org) or the North American Industry Classification System NAICS (http://www.census.gov/epcd/www/naics.html). Another example is the FOAF ontology (http://xmlns.com/foaf/0.1/). This simple ontology describing common inter-human relationships enjoys significant visibility, not least as a result of the standardization efforts of the FOAF development team.

3. Domain and scope: according to (Guarino 1998) ontologies can be classified

into four categories:

– Upper-level/top-level ontologies: they describe general-purpose concepts and

their properties. Examples are the Top-Elements Classification by Sowa (Sowa 1995), the Suggested Upper Merged Ontology SUMO (Pease et al. 2002) or the Descriptive Ontology for Linguistic and Cognitive Engineering DOLCE (Gangemi et al. 2002).

– Domain ontologies: they are used to model specific domains such as medicine or academia. A typical example in this area is the Gene Ontology developed by the Gene Ontology Consortium (2000).

– Task ontologies: they describe general or domain-specific activities.

– Application ontologies: they extend domain ontologies to take into account particular application-related task ontologies and application requirements.

A last category of ontologies, which was not covered by the classifications mentioned so far, is that of the so-called meta-ontologies or (knowledge) representation ontologies. They describe the primitives which are used to formalize knowledge in conformity with a specific representation paradigm. Well-known in this category are the Frame Ontology (Gruber 1993) and the representation ontologies of the W3C Semantic Web languages RDFS and OWL (http://www.w3.org/2000/01/rdf-schema, http://www.w3.org/2002/07/owl).

When describing the scope of an ontology, the types of knowledge that should be

available to the engineering team to build the domain conceptualization are highly

relevant. In principle, one can distinguish between ontologies capturing common

and expert knowledge, and based on this distinction determine the composition of

the team engineering a particular ontology.

4. Representation language: a wide range of knowledge structures, based on many different formalisms, emerged in a pre-Semantic Web era. In order to overcome the resulting syntactic and semantic barrier, a plethora of approaches investigate the compatibility between different formalisms, while the aforementioned representation ontologies are intended to capture these differences explicitly. The most popular representation paradigms regarding ontologies are Frames, Description Logics and UML-MOF.1

On the Semantic Web, the classic trade-offs regarding expressivity have been decidability and complexity. A language is decidable if a reasoner can be implemented that, for every question asked against a knowledge base expressed in that language, terminates with an answer. Decidability is a highly desirable property of languages: it guarantees that all questions that can be asked can be answered, and that the associated reasoning algorithms are effectively implementable. Research in Description Logics explores the borders of decidability.

Besides decidability, which guarantees the effective implementation of reasoning algorithms, we also need to consider the complexity of the algorithms that answer questions against the knowledge structures. In general, the more expressive a language is, the higher the complexity. Since neither expressivity nor complexity forms a continuous spectrum, it can happen that expressivity is increased while the complexity stays the same.

In the context of the scenarios introduced earlier, OWL DL fulfills the requirement with regard to decidability, but both decidable OWL languages (OWL DL and OWL Lite) have exponential (or worse) complexity (Horrocks and Patel-Schneider 2004), which makes them possibly unsuitable for our use cases, since we have to expect to deal with a high number of instances. Languages that allow the use of algorithms with tractable complexity are considered more suitable in cases where we can expect a number of instances as high as in enterprise settings. OWL2 introduces language profiles (Motik et al. 2008), which are well-defined subsets of the OWL2 constructs. These profiles have specific properties that are guaranteed for all models adhering to them; for example, the profiles OWL2 EL, OWL2 QL and OWL2 RL were designed so that their core reasoning tasks can be performed in polynomial time.

Other aspects not mentioned in this classification, but relevant when describing

an ontology, or every other knowledge structure, are covered by so-called meta-

ontologies and metadata schemes thereof. Metadata schemes such as OMV

(Hartmann et al. 2005) cover general information about ontologies, such as the

size in terms of specific types of ontological primitives, the domain described, the

usage scenarios, the support software and techniques, and so on. Many of these aspects are interrelated and can be traded against each other, as elaborated later in this section. Potential developers and users of ontologies should be made aware of the trade-offs associated with engineering and using a particular type of knowledge structure. More precisely, these tasks require specific expertise, software and infrastructure, as well as compliance with processes and methodologies, all of which may involve considerable costs.

These trade-offs are summarized in Table 3.1.

The considerations presented in Table 3.1 can be used as general guidelines to be

taken into account and applied in the process of engineering an ontology. Their

operationalization has to rely on methods which allow a quantification of costs and

1 http://www.omg.org/technology/documents/modeling_spec_catalog.htm


benefits involved and their analysis (see Chapter ‘Using Cost-Benefit Information

in Ontology Engineering Projects’).

3.3 How Are Enterprise Knowledge Structures Being Built?

In this section, we first give a short overview of the wiki technology Semantic MediaWiki (SMW) (Krötzsch et al. 2007) as a flexible tool for dealing with

enterprise knowledge structures. Then we describe in more detail selected aspects

of knowledge structure editing, leveraging, and repair.

Social software as a tool for knowledge sharing and collaboration is gaining more and more relevance in the enterprise world (Drakos et al. 2009). This is especially true for so-called enterprise wikis, which, like wikis on the public Web,

Table 3.1 General trade-offs

Formality: A formal ontology is useful in areas which require sophisticated processing of background knowledge and automatic inferencing. This assumes the availability of mature tooling for these tasks. In addition, the more formal an ontology is meant to be, the higher the level of expertise required and the costs of the ontology development process. Finally, heavyweight ontologies cannot be acquired automatically, as properties and axioms cannot feasibly be learned from unstructured knowledge sources using present software.

Shareability: The main advantage of a shared ontology is its capability to enable interoperability at the data and application levels. Developing a commonly agreed ontology implies, however, additional overhead in terms of the development process to be followed, including methodological support and software to support the discussion and consensus-reaching tasks. In addition, a shared ontology will not be able to optimally match the very specific needs of the many usage scenarios in which it is involved. Thus additional overhead to understand and adapt it is required.

Domain and scope: First there is the aforementioned trade-off between the scope and the reusability of particular categories of ontologies. In addition, higher-level ontologies tend to be more costly, as they require specific expertise. The same applies to ontologies dealing with expert knowledge, such as those in the area of chip design. The size of a knowledge artifact (expressed, for example, in the number of concepts, properties, axioms and fixed instances) is an important factor to be aware of, not only because of its direct relationship to the development and maintenance costs, but also because of the difficulties associated with the processing of large artifacts by reasoners and similar tools. There is a trade-off between the domain coverage of an ontology and the additional effort required to build, revise and use it.

Representation language: Besides its link to the formality dimension, the choice of a representation language has consequences for the ways an ontology can be used in knowledge inferencing tasks and for the extent to which particular aspects of the knowledge domain can or cannot be captured by the ontology. In addition, formal, logics-based representation languages require specific expertise within the ontology development and maintenance team.


offer low barriers to use and direct benefits within a company intranet. However, simply providing a Wikipedia-like internal page does not guarantee acceptance by employees; such wiki software needs to be customized to the specifics of the corporate context.

SMW provides this customization by combining the complementary

technologies of Web 2.0 and the Semantic Web (Ankolekar et al. 2007). It enhances

the popular open-source wiki software MediaWiki (http://www.mediawiki.org/wiki/MediaWiki) with semantic capabilities. In addition, its functionality can be

enriched with general-purpose extensions developed by the community2 as well as

custom extensions tailored to the needs of specific enterprise scenarios. The use of the Semantic Web standards RDF, RDFS and OWL, and of ontologies, enables the realization of comprehensive knowledge-management solutions, which provide integrated means to formally describe the meaning and organization of the content and to retrieve, present and navigate information.

In the following, we describe how enterprise knowledge structures are collaboratively built, enriched, and exploited using SMW.

Creating Structured Information Information stored in SMW can be converted into machine-readable RDF, because property-value pairs can be explicitly assigned to wiki pages. Such a property-value pair can indicate a named link (a so-called object property) to another page, e.g., ‘locatedInCountry’, or a typed attribute (a so-called datatype property), e.g., String ‘hasTag’, Date ‘hasFoundingDate’, and Number ‘hasHeight’. Such properties can be freely inserted into a page via wiki syntax or forms. Enterprise knowledge structures can be defined through categories (so-called classes) of pages with certain properties and encoded as an ontology in RDFS and OWL. The resulting ontology can be applied to the wiki automatically or manually (Vrandecic and Krötzsch 2006), for instance, in the form of categories, pages, wiki templates, forms and properties.
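The wiki syntax for such annotations is `[[property::value]]`, optionally followed by `|display text`, while category membership is declared with `[[Category:Name]]`. The following minimal sketch uses simplified regular expressions, not SMW's actual parser, to show how annotated wikitext can be turned into subject-property-value triples. Mapping categories to `rdf:type` reflects their role as classes; the page title and property names are illustrative.

```python
import re

# [[property::value]] or [[property::value|display text]]; the property name may not contain ':'.
ANNOTATION = re.compile(r"\[\[([^:\[\]|]+)::([^\[\]|]+)(?:\|[^\[\]]*)?\]\]")
# [[Category:Name]], optionally followed by a sort key after '|'.
CATEGORY = re.compile(r"\[\[Category:([^\[\]|]+)(?:\|[^\[\]]*)?\]\]")


def page_to_triples(title, wikitext):
    """Extract (subject, property, value) triples from SMW-style annotations."""
    triples = [(title, prop.strip(), value.strip())
               for prop, value in ANNOTATION.findall(wikitext)]
    # Category membership is treated like class membership.
    triples += [(title, "rdf:type", "Category:" + cat.strip())
                for cat in CATEGORY.findall(wikitext)]
    return triples


text = ("Acme is based in [[locatedInCountry::Germany]] and was founded "
        "[[hasFoundingDate::1999-05-01|in 1999]]. [[Category:Customer]]")
for triple in page_to_triples("Acme", text):
    print(triple)
```

Real SMW additionally assigns datatypes to properties, so that a value such as `1999-05-01` is interpreted as a date rather than a plain string.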

Retrieving Information The availability of machine-processable information

facilitates the realization of concept-based search, presentation and navigation

features going beyond traditional keyword-based approaches. The user can issue

structured queries, addressing certain properties of a page, e.g., the customer of a

proposal. All pages belonging to a category having certain properties can be listed

as an overview, including links to those pages, e.g., all products within a specific

price range. Various result formats can be used, ranging from simple tables to more advanced calendars, time lines, and maps. Through faceted search one can incrementally filter lists of pages via keywords and property ranges. More complex, but still user-friendly, queries following patterns similar to those of the standard Semantic Web query language SPARQL (http://www.w3.org/TR/rdf-sparql-query/) are supported as well. When the user enters keywords, the system looks for connections between pages described with the keywords and lists those pages

2 Openly available at http://www.mediawiki.org/wiki/Extension_Matrix (MediaWiki) and http://semantic-mediawiki.org/wiki/Help:Extensions (SMW)


(Haase et al. 2009). In SMW, these queries are possible through forms on special

pages, but can also be embedded as so-called inline queries in single pages.
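Conceptually, a structured query such as "all products within a specific price range" selects pages by category and then filters them on a property value. The sketch below does this over a hypothetical in-memory representation of wiki pages (a dictionary of categories and numeric properties). It illustrates the query semantics only and is not how SMW's query engine is implemented.

```python
def ask(pages, category, prop, low=None, high=None):
    """Return the sorted titles of pages in `category` whose numeric `prop` lies in [low, high].

    `pages` maps a page title to {"categories": set, "properties": dict}; either bound may be None.
    """
    results = []
    for title in sorted(pages):
        page = pages[title]
        if category not in page.get("categories", set()):
            continue
        value = page.get("properties", {}).get(prop)
        if value is None:
            continue  # pages without the property cannot satisfy a range condition
        if (low is None or value >= low) and (high is None or value <= high):
            results.append(title)
    return results


pages = {
    "Router X": {"categories": {"Product"}, "properties": {"hasPrice": 120}},
    "Switch Y": {"categories": {"Product"}, "properties": {"hasPrice": 80}},
    "Acme": {"categories": {"Customer"}, "properties": {}},
}
print(ask(pages, "Product", "hasPrice", low=100, high=200))  # ['Router X']
```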

Integrating External Information External sources can be integrated and their

content merged with existing enterprise knowledge structures. The results can be

organized as new pages or properties, referenced from other pages, and visualized

in new ways.

Enterprise knowledge is rarely represented in RDF, but there are many tools

available that deal with such transformations from established formats and

standards, most notably tabular ones. The same applies to online knowledge sources

such as Freebase (http://www.freebase.com), other SMW installations or the

Linked Open Data cloud, for which a growing number of Web services delivering

RDF are available (http://www.linkedopenservices.org). Orthogonal to the translation to RDF is the question of how to map specific elements of the source

knowledge structure into the wiki model. Simply creating a page for each element

within an external source and copying the data into the wiki may prove suboptimal

for subsequent data usage. In Sect. 3.3.2 we provide additional details on SMW’s

integration features.
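As an illustration of translating a tabular source to RDF, the minimal sketch below turns CSV rows into triples using only Python's standard `csv` module. The base URI, the `Property:` prefix and the choice of a key column are assumptions made for the example. It creates one subject per row, which is exactly the naive "one page per element" mapping that the text cautions may prove suboptimal. A real integration would decide, column by column, whether a value becomes a page, a property or a link.

```python
import csv
import io

BASE = "http://example.org/wiki/"  # hypothetical wiki namespace


def rows_to_triples(csv_text, key_column):
    """Map each CSV row to (subject, property, value) triples, skipping empty cells."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = BASE + row[key_column].strip().replace(" ", "_")
        for column, value in row.items():
            if column == key_column or not value:
                continue
            triples.append((subject, BASE + "Property:" + column, value.strip()))
    return triples


data = "name,customer,value\nProposal 1,Acme,5000\nProposal 2,Globex,\n"
for triple in rows_to_triples(data, "name"):
    print(triple)
```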

Improving Information Quality SMW specifically targets scenarios where

knowledge is created in a decentralized manner – be that by exposing and

integrating external sources, or by supporting collaborative editing and interaction.

In such scenarios information quality can quickly become a problem. A prominent

dimension we discuss here is consistency, both with respect to the primary sources

and with respect to the domain at hand. For the former, SMW adheres to a regime in

which users may only refer to and comment on the primary sources from within

the wiki, while changes may only be undertaken at the level of these sources. For

the latter, one can use an inference service operating on the wiki knowledge base.

Deduction methods on the enterprise knowledge structures can provide insights about the incorrect usage of categories, pages, and properties (Vrandecic 2009a). Most such errors cannot be repaired automatically, but they can at least be made visible to users or administrators. For example, if the imported data contains information about a

proposal with customer X and a wiki page exists about X, which is not a member of

the customer category, adding that page to the category can be automatically

suggested to the administrator. In addition, visualizing information in a structured

way may lead to the identification of missing and incorrect information, which

applies to both genuine wiki content and content from external sources. Users cannot correct the latter directly, but they can rate it and comment on it for revision.
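The customer example amounts to checking a range expectation on a property. The sketch below is a simple rule-based check rather than the deduction methods cited above. It assumes a hypothetical property `hasCustomer`, a hypothetical category `Customer` and a dictionary-based page model. As described in the text, it only produces suggestions and leaves the decision to the administrator.

```python
def suggest_categories(pages, prop="hasCustomer", expected="Customer"):
    """Suggest (page, category) additions for pages referenced via `prop` but missing `expected`.

    `pages` maps a title to {"categories": set, "properties": {name: [values]}}.
    """
    suggestions = set()
    for page in pages.values():
        for target in page.get("properties", {}).get(prop, []):
            target_page = pages.get(target)
            # Only existing pages are considered; referenced but missing pages are skipped.
            if target_page is not None and expected not in target_page.get("categories", set()):
                suggestions.add((target, expected))
    return sorted(suggestions)


pages = {
    "Proposal 7": {"categories": {"Proposal"}, "properties": {"hasCustomer": ["X"]}},
    "Proposal 8": {"categories": {"Proposal"}, "properties": {"hasCustomer": ["Y", "Z"]}},
    "X": {"categories": set()},
    "Y": {"categories": {"Customer"}},
}
print(suggest_categories(pages))  # [('X', 'Customer')]
```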

In Sect. 3.3.2 we will discuss a number of simple measurements whose results

can indicate specific quality issues.

Interplay with other Enterprise Tools To maximize its added value for knowledge workers, SMW should not be used in isolation from existing enterprise

systems and workflows. This is enabled by the information integration functionality

presented earlier, and by a number of additional features targeting application

integration. The content of a semantic wiki can be exported as RDF, as well as

many other structured data formats, e.g., JSON, vCard, and BibTeX. Results of

queries can be monitored for new pages and modified properties, and published as RSS feeds and sent by e-mail. Using HTTP requests to the wiki, external

applications such as office productivity tools can access, add, and modify pages

and properties.
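Semantic MediaWiki adds an `ask` module to the MediaWiki web API, so an external application can retrieve query results over plain HTTP. The sketch below only builds such a request URL with the standard library. The endpoint address, the property names and the query are hypothetical. Sending the request (for instance with `urllib.request.urlopen`) and parsing the JSON response are left out. Adding or modifying pages would instead go through MediaWiki's regular editing API, which additionally requires an edit token.

```python
from urllib.parse import urlencode

API = "https://wiki.example.com/w/api.php"  # hypothetical endpoint


def ask_api_url(conditions, printouts=(), limit=50):
    """Build a URL for SMW's `ask` API module.

    Conditions are concatenated, printout properties are prefixed with '?',
    and further parts are separated by '|', as in an inline query.
    """
    query = "".join(conditions)
    query += "".join("|?" + p for p in printouts)
    query += "|limit=" + str(limit)
    return API + "?" + urlencode({"action": "ask", "query": query, "format": "json"})


print(ask_api_url(["[[Category:Proposal]]", "[[hasCustomer::Acme]]"], ["hasValue"]))
```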

In the following sections we go into more details of how enterprise knowledge

structures can be edited, leveraged, and repaired.

3.3.1 Building Knowledge Structures Manually

The SMW OntologyEditor is an extension of Semantic MediaWiki for developing

and maintaining knowledge structures (so-called vocabularies). As such it inherits

many of the features and the mode of operation of Semantic MediaWiki. It targets

Semantic MediaWiki users, but it also provides a comfortable interface for people

less experienced in using wikis, in particular the wiki syntax. In this section we will

briefly introduce the functionality of this extension – a more detailed account is

given in (Simperl et al. 2010).

Main Page The main page is the entry point of the SMW OntologyEditor (see

Fig. 3.2). It contains the primary navigation structure and links to important pieces

of functionality, including the creation of new vocabularies consisting of categories

and properties, the integration of other knowledge structures, such as folksonomies

and external vocabularies encoded in RDFS and OWL, and knowledge repair (see

Sect. 3.3.2). In addition, the user is provided with a short introduction to the

tool, important links, as well as an overview of the content of the current wiki

installation, in terms of namespaces of the individual vocabularies and a tag cloud.

Vocabulary Creation To create a new vocabulary one can use the corresponding

link in the primary navigation menu, which leads to a form (see Fig. 3.3). There the

user can enter a vocabulary name and a description and add categories and

properties. Once the vocabulary is created the user is presented with a vocabulary

overview, which contains automatically added metadata such as Flickr images in addition to the information manually provided by the user, and a link to the Create Category Form (see Fig. 3.4).

The Create Category Form includes a short explanation and a number of input

fields. They are autocompletion-enabled: the user is presented with a list of entities whose names are similar to the one she is about to type in (see Fig. 3.4). The user can enter a name for the category, refer to an existing vocabulary, define sub- and super-categories, and add new and existing properties to the category. Subsequently the system displays the category overview page illustrated in Fig. 3.5, including a tag cloud to easily access category instances (so-called entities) as well as the most interesting images on Flickr related to the category. Categories and entities are visualized in a tree-like hierarchy, which can be altered by clicking on the Edit links, which open pop-ups for inline editing (see Fig. 3.6).
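The text does not specify how the editor finds names similar to the one being typed. A common and simple strategy, assumed here, is case-insensitive prefix matching over a sorted list of existing entity names, which binary search keeps fast even for large vocabularies. The class below is a hypothetical sketch of that strategy, not the OntologyEditor's implementation.

```python
import bisect


class Autocompleter:
    """Hypothetical prefix-based autocompletion over existing entity names."""

    def __init__(self, names):
        # Sorted (lower-cased key, original name) pairs allow binary search on the key.
        self._entries = sorted((name.lower(), name) for name in names)
        self._keys = [key for key, _ in self._entries]

    def complete(self, prefix, limit=5):
        """Return up to `limit` names starting with `prefix`, ignoring case."""
        p = prefix.lower()
        start = bisect.bisect_left(self._keys, p)
        matches = []
        for key, name in self._entries[start:]:
            if not key.startswith(p) or len(matches) == limit:
                break
            matches.append(name)
        return matches


ac = Autocompleter(["Customer", "Customer Segment", "Cost Center", "Proposal"])
print(ac.complete("cus"))  # ['Customer', 'Customer Segment']
```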

Knowledge Repair The knowledge repair algorithms can be accessed as so-

called SpecialPages in the wiki.


After clicking on Category Statistics the system provides the user with a comprehensive overview of all categories available in the system and of potential modeling issues (see Fig. 3.11). At the top there is a table with explanations, followed by a table with minimum, maximum and average values serving as a basis for error detection. An additional table displays the corresponding values for all categories. Colors and symbols are used to direct the user's focus to potential problems. If the user is not interested in all categories, but rather in a specific one, she can click on the repair tab on the category page, which leads to an overview page of the specific category displaying similar information (see Fig. 3.10).
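The category statistics can be pictured as per-category counts summarized by their minimum, maximum and average. In the sketch below the counts are supplied as plain dictionaries. The two flagging rules are hypothetical heuristics chosen for illustration: a category with neither instances nor subcategories, and a count exceeding twice the average. The text does not state which criteria the tool applies.

```python
from statistics import mean


def category_statistics(categories):
    """Summarize per-category counts by minimum, maximum and average."""
    metrics = sorted({m for counts in categories.values() for m in counts})
    summary = {}
    for m in metrics:
        values = [counts.get(m, 0) for counts in categories.values()]
        summary[m] = {"min": min(values), "max": max(values), "avg": mean(values)}
    return summary


def flag_categories(categories, summary, factor=2.0):
    """Flag potential modeling issues using two hypothetical heuristics."""
    flags = {}
    for name, counts in categories.items():
        issues = []
        if counts.get("instances", 0) == 0 and counts.get("subcategories", 0) == 0:
            issues.append("empty")
        for m, stats in summary.items():
            if stats["avg"] > 0 and counts.get(m, 0) > factor * stats["avg"]:
                issues.append("many " + m)
        if issues:
            flags[name] = issues
    return flags


categories = {
    "Customer": {"instances": 10, "subcategories": 1, "properties": 3},
    "Proposal": {"instances": 2, "subcategories": 1, "properties": 2},
    "Misc": {"instances": 0, "subcategories": 0, "properties": 1},
}
summary = category_statistics(categories)
print(flag_categories(categories, summary))
```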

The SpecialPage Categories in cycles, shown in Fig. 3.9, lists cycles in a category hierarchy. The user then has to decide whether the specific cycle will be accepted or not. Redundancies in the hierarchy are displayed in the SpecialPage Categories with redundant subcategory relations (see Fig. 3.9). The user can decide which link is indeed redundant and delete it on the fly. The SpecialPage Entities with similar names provides the user with information about entities with similar names (Fig. 3.9). For each entity the system calculates the Levenshtein distance (Levenshtein 1966) to other categories and displays the results. In Sect. 3.3.2

Fig. 3.2 Main page


Fig. 3.3 Create vocabulary form and overview page

Fig. 3.4 Create category form


we introduce additional knowledge repair features, such as the Category Histogram, the Property Histogram, Categories with similar property sets and Unsubcategorized categories.
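Both the cycle check and the redundancy check are graph problems over the subcategory relation. The sketch below assumes the hierarchy is given as a mapping from each category to the set of its direct super-categories. It treats a direct link as redundant when the super-category is also reachable through another direct super-category. The text does not describe the algorithms behind the SpecialPages, so this depth-first-search variant is only one plausible realization.

```python
def find_cycles(parents):
    """Return the sorted categories that lie on a cycle of the subcategory relation."""
    on_cycle = set()
    for start in parents:
        # Depth-first search along super-category links, looking for a way back to start.
        stack = list(parents.get(start, ()))
        visited = set()
        while stack:
            node = stack.pop()
            if node == start:
                on_cycle.add(start)
                break
            if node not in visited:
                visited.add(node)
                stack.extend(parents.get(node, ()))
    return sorted(on_cycle)


def ancestors(parents, node):
    """All categories reachable from node via super-category links (cycle-safe)."""
    seen, stack = set(), list(parents.get(node, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def redundant_links(parents):
    """A direct link child -> parent is redundant if parent is also reachable via another direct parent."""
    redundant = set()
    for child, direct in parents.items():
        for parent in direct:
            if any(parent in ancestors(parents, other) for other in set(direct) - {parent}):
                redundant.add((child, parent))
    return sorted(redundant)


hierarchy = {
    "Laptop": {"Computer", "Device"},
    "Computer": {"Device"},
    "Device": set(),
    "A": {"B"},
    "B": {"A"},
}
print(find_cycles(hierarchy))      # ['A', 'B']
print(redundant_links(hierarchy))  # [('Laptop', 'Device')]
```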

Versioning The versioning SpecialPage gives an overview of the history of

changes of vocabularies and categories. When a vocabulary or category is selected,

a pop-up with detailed versioning information is displayed. On the left-hand side

the user can choose between Vocabulary Structure Changes and Category Changes. Different versions are displayed (via AJAX) on the right-hand side of the pop-up. A selected version can be restored by clicking the Restore Selected Version button, as depicted in Fig. 3.7.

Import and export One of the advantages of Semantic MediaWiki as a knowledge management platform is its ability to provide integrated access to a multitude of knowledge structures, most prominently folksonomies and ontologies. The folksonomy import relies on the technique described in Sect. 3.3.2; thus the

Fig. 3.5 Category overview

Fig. 3.6 Changing parameters via inline editing


associated information – a collection of tagged resources such as bookmarks or conventional documents – has to be organized in a specific XML format. Given this, the folksonomy is enriched with additional structuring information and is transformed into a lightweight ontology which can be explored and further revised in the editor just as any other vocabulary. When importing an existing OWL ontology – for instance, one that was developed in a different ontology engineering environment – the system uploads the OWL file specified by the user, extracts all

ontological entities and creates corresponding wiki content following the

instructions defined in a so-called meta-model. This meta-model describes the

types of ontological primitives supported by the editor – in this case, as we are

dealing with lightweight knowledge structures, a subset of OWL consisting of

classes, instances and properties, in particular specialization-generalization – and

how they are mapped to SMW artifacts. Once this step is concluded, the resulting

vocabulary can be further processed in a collaborative fashion in our tool. Every

vocabulary can be locally stored as an OWL file using the export tab on the

vocabulary overview page.
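The meta-model itself is not given in the text. The sketch below shows one hypothetical mapping of the lightweight OWL subset mentioned above (classes, instances, properties and specialization-generalization) onto wiki pages:

- Classes become category pages, and subclass axioms become category links on those pages.
- Properties become property pages, typed with SMW's `Has type` property.
- Individuals become pages carrying category links and property annotations.

Triples use simplified bare names instead of full URIs, and parsing the OWL file itself is out of scope.

```python
def owl_to_wiki(triples):
    """Map a lightweight OWL subset to wiki page texts, keyed by page title."""
    def typed(owl_type):
        return {s for s, p, o in triples if p == "rdf:type" and o == owl_type}

    classes = typed("owl:Class")
    object_props = typed("owl:ObjectProperty")
    datatype_props = typed("owl:DatatypeProperty")
    pages = {}

    def add(title, line=None):
        lines = pages.setdefault(title, [])
        if line:
            lines.append(line)

    for c in classes:
        add("Category:" + c)                           # class -> category page
    for prop in object_props:
        add("Property:" + prop, "[[Has type::Page]]")  # values link to other pages
    for prop in datatype_props:
        add("Property:" + prop, "[[Has type::Text]]")  # literal values (assumed type)
    for s, p, o in triples:
        if p == "rdfs:subClassOf" and s in classes:
            add("Category:" + s, "[[Category:" + o + "]]")  # subcategory link
        elif p == "rdf:type" and o in classes:
            add(s, "[[Category:" + o + "]]")                # instance -> category member
        elif p in object_props or p in datatype_props:
            add(s, "[[" + p + "::" + o + "]]")              # property annotation
    return {title: "\n".join(lines) for title, lines in pages.items()}


triples = [
    ("Customer", "rdf:type", "owl:Class"),
    ("Organization", "rdf:type", "owl:Class"),
    ("Customer", "rdfs:subClassOf", "Organization"),
    ("locatedIn", "rdf:type", "owl:ObjectProperty"),
    ("Acme", "rdf:type", "Customer"),
    ("Acme", "locatedIn", "Germany"),
]
print(owl_to_wiki(triples)["Acme"])
```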

3.3.2 Leveraging External Knowledge Sources

Enterprise knowledge structures come in various forms, from database tables,

standardized taxonomies and loosely defined folksonomies to strictly organized

knowledge bases. To optimally support knowledge management tasks in a corporate environment, Semantic MediaWiki needs to provide mechanisms to access,

Fig. 3.7 Versioning


integrate and use all these different formats. This is important for its acceptance as

a knowledge management solution – as it builds upon established resources and

platforms – and for its efficient use – as reusing existing resources can reduce

costs and improve the quality of the resulting enterprise knowledge structures.

In the previous section we have explained how such knowledge structures can be

manually created and maintained. The techniques introduced in the following are

complementary to this functionality. The first one adds a critical mass of formal

semantics to folksonomies in order to overcome some of their typical limitations,

such as the usage of abbreviations and alternative spelling, synonyms and different

natural languages to tag the same resource. The resulting lightweight ontology can

be explored, further developed and used in Semantic MediaWiki. In contrast, the

focus of the second technique is on leveraging existing knowledge bases, which

might contain significant amounts of (instance) data which could be useful within

SMW. The implementation is based on Freebase as one of the most visible collections

of structured knowledge created in recent years; however, the mediator-based

approach underlying the implementation can be equally applied to other knowledge

bases.

3.3.2.1 Turning Folksonomies into Ontologies

This section gives an overview of our approach to extract lightweight ontologies

from folksonomies. The approach consists of 12 steps that have to be carried out in

the given order (see Table 3.2).

Step 1: Filter Irrelevant Tags In the first step, we eliminate tags that do not improve the information content but have a downgrading effect on the quality of the data basis. Unusual tags that do not start with a letter are therefore filtered out. Additionally, uncommon tags are dismissed. In this context a tag is uncommon if it is used less often than a predefined threshold.
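Step 1 can be expressed as a filter over (user, tag, resource) assignments. In the minimal sketch below, following Table 3.2, the number of distinct users of a tag is compared against the predefined threshold. The threshold value and the data layout are assumptions.

```python
from collections import defaultdict


def filter_tags(assignments, min_users=2):
    """Keep tags that start with a letter and are used by at least `min_users` distinct users.

    `assignments` is an iterable of (user, tag, resource) tuples.
    """
    users_per_tag = defaultdict(set)
    for user, tag, _resource in assignments:
        users_per_tag[tag].add(user)
    return {tag for tag, users in users_per_tag.items()
            if tag[:1].isalpha() and len(users) >= min_users}


assignments = [
    ("u1", "semantic", "r1"), ("u2", "semantic", "r2"),
    ("u1", "2do", "r1"), ("u2", "2do", "r3"),
    ("u3", "wiki", "r1"),
]
print(filter_tags(assignments))  # {'semantic'}
```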

Step 2: Group Tags Using Levenshtein Metric The process of annotating

a certain resource with tags is an uncontrolled operation, which means that no

spell-checking or any other input verification can be assumed to take place. As

a consequence, typing errors, mixed plural and singular forms, annotations in different languages and other minor discrepancies between tags are likely to occur. The Levenshtein similarity metric (Levenshtein 1966) is used to discover

morphologically similar tags.
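The text does not fix how the Levenshtein distance d is turned into a similarity, nor which threshold defines "highly similar". The sketch below makes three assumptions: the common normalization 1 - d/max(|a|, |b|), an illustrative threshold of 0.75, and a greedy single-link grouping in alphabetical order. The distance itself is the standard dynamic-programming computation. Because the grouping is greedy, its result can depend on the order in which tags are visited.

```python
def levenshtein(a, b):
    """Edit distance (insertions, deletions, substitutions) via dynamic programming."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        current = [i]
        for j, cb in enumerate(b, 1):
            current.append(min(previous[j] + 1,                # deletion
                               current[j - 1] + 1,             # insertion
                               previous[j - 1] + (ca != cb)))  # substitution
        previous = current
    return previous[-1]


def similarity(a, b):
    """Normalized similarity in [0, 1]; 1.0 means identical strings."""
    return 1 - levenshtein(a, b) / (max(len(a), len(b)) or 1)


def group_tags(tags, threshold=0.75):
    """Greedy single-link grouping: a tag joins the first group containing a similar member."""
    groups = []
    for tag in sorted(tags):
        for group in groups:
            if any(similarity(tag, member) >= threshold for member in group):
                group.append(tag)
                break
        else:
            groups.append([tag])
    return groups


print(group_tags(["wiki", "semnatic", "semantics", "semantic"]))
```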

Step 3: Enrich Tags with Wordnet Wordnet is a rich resource of lexical information (Fellbaum 1998). The database is organized in so-called synsets and can be accessed locally or remotely through a simple user interface. If a certain tag can be found in Wordnet, one can expect the tag to be a valid English term. All tags which are covered by Wordnet are assigned a flag containing the exact number of occurrences in Wordnet synsets.

Step 4: Enrich Tags with Wikipedia Wikipedia is a large, high-quality, and up-to-date online encyclopedia. If a certain tag can be mapped to a Wikipedia article, this tag can be considered a correct natural language term. In addition, we can benefit from the redirect pages functionality implemented in Wikipedia, so that even when a tag is incorrectly spelled or abbreviated, there is a high chance of finding the correct corresponding Wikipedia article.

Step 5: Spell-check and Translate Spell checking and translating single words (not sentences or pieces of text) can be done automatically with high precision. We apply this additional step because, even after the Levenshtein similarity check and the exploitation of Wikipedia redirects, not all tags can be related to Wordnet or Wikipedia. This might occur when a tag is misspelled or is not in English.

Table 3.2 The 12 steps of our method

Step | Title | Description
1 | Filter irrelevant tags | Consider only tag data that is shared by a sufficiently high number of users to increase the community representativeness of the prospective ontology.
2 | Group tags using Levenshtein metric | Compare relevant tags using the Levenshtein similarity metric and group the highly similar ones. Tags within the same group are considered to have equivalent meaning, and differences are assumed to be the result of spelling mistakes.
3 | Enrich tags with Wordnet information | Check whether a tag is covered by the Wordnet thesaurus, which we consider a feasible indicator of a valid English term.
4 | Enrich tags with Wikipedia information | Use information available on Wikipedia to enrich the tags.
5 | Spell-check and translate | Perform English spell-checking, and translate from foreign languages those tags that were found neither in Wordnet nor in Wikipedia.
6 | Update group assignments | Update the tag groups created in step 2 based on the additional information gathered in steps 3–5.
7 | Find representative for each group | Select a representative for each tag group based on its quality.
8 | Create co-occurrence matrix | Create a symmetric square matrix containing information on the frequency with which two tags (or tag groups, respectively) were used to annotate the same resource.
9 | Calculate similarities | Apply a vector-based algorithm (Pearson correlation coefficient) in order to detect similarities between vectors in the co-occurrence matrix.
10 | Enrich co-occurrence matrix with co-actoring information | Augment the co-occurrence matrix with information about the frequency with which two tags (or tag groups, respectively) were used by the same author.
11 | Create clusters | Create clusters of tags (or tag groups, respectively) on the basis of the calculated correlation coefficients and co-actoring information.
12 | Create ontologies | Transform the tag clusters created in step 11 into SKOS ontologies, exploiting all information gathered in the previous steps.


Step 6: Update Group Assignments In this step we update the tag groups defined in step 2 based on the information collected in steps 3–5, and finally decide which tags are relevant for the ontology to be created. The step can be further divided into three activities: (1) re-grouping based on spell-checking and translation results; (2) re-grouping based on Wikipedia results; and (3) selection of relevant tags.

Re-grouping based on spell-checking and translation results The first group update is triggered by the mappings derived from the spell-checking and translation results. To ensure consistent groups after this update, four different scenarios for mappings of the type tagA → tagB have to be considered:

1. Neither tagA nor tagB is assigned to a group: tagA and tagB, plus all other tags mapped to either of them, form a new group.
2. tagA is assigned to a group, but tagB is not: tagB, and all other tags mapped to it, are included in the group of tagA.
3. tagB is assigned to a group, but tagA is not: as in the previous case, tagA, and all other tags mapped to it, are included in the group of tagB.
4. Both tags are already assigned to the same group or to different groups: in addition to the two tags themselves, the tags already assigned to the corresponding groups have to be considered as well. Existing group members of tagA are assigned to the group of tagB.
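The four scenarios can be handled uniformly if the tag groups are kept in a disjoint-set (union-find) structure: applying a mapping tagA → tagB simply merges the group of tagA into the group of tagB. The sketch below assumes this data structure; the chapter does not say how the groups are stored, and the class and method names are hypothetical. Only group membership matters here, so which tag ends up as the internal root is irrelevant.

```python
from collections import defaultdict


class TagGroups:
    """Union-find (disjoint-set) structure; each root tag identifies one tag group."""

    def __init__(self):
        self.parent = {}

    def find(self, tag):
        # A tag seen for the first time starts out as its own singleton.
        self.parent.setdefault(tag, tag)
        root = tag
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every tag on the path directly at the root.
        while self.parent[tag] != root:
            next_tag = self.parent[tag]
            self.parent[tag] = root
            tag = next_tag
        return root

    def apply_mapping(self, tag_a, tag_b):
        """Apply a mapping tag_a -> tag_b from spell-checking or translation.

        The whole group of tag_a joins the group of tag_b. This covers all four
        scenarios: two ungrouped tags form a new group, an ungrouped tag joins an
        existing group, and existing members of tag_a's group move to tag_b's group.
        """
        root_a, root_b = self.find(tag_a), self.find(tag_b)
        if root_a != root_b:
            self.parent[root_a] = root_b

    def groups(self):
        """Return all groups that contain at least two tags."""
        members = defaultdict(set)
        for tag in list(self.parent):
            members[self.find(tag)].add(tag)
        return [group for group in members.values() if len(group) > 1]
```

For example, applying "colour" → "color", "farbe" → "color" and "colr" → "colour" yields a single group containing all four tags, because the last mapping reaches "color" through the group of "colour".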

Re-grouping based on Wikipedia results The second group update is performed if two tags are assigned to the same Wikipedia article. As in the previous update step based on spell-checking and translation results, we consider existing groups and their members, which means that other group members may also be affected by this update operation.

The selection of relevant tags We assume that all tags, or groups of tags, containing either a Wikipedia or a Wordnet reference are relevant for the generation of ontologies. The relevance of the remaining tags and groups is based on their frequency of occurrence in the folksonomy, i.e., on their usage. If this frequency is below a certain threshold, the tag or tag group is not considered: all affected tags are marked with a corresponding flag indicating that they are not relevant for the subsequent steps towards the generation of ontologies. If the frequency of usage is above the given threshold, the tag, or tag group, is considered to describe a new term created by the tagging community.
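The relevance rule can be expressed as a small predicate. This is a minimal sketch under stated assumptions: the function name and parameters are hypothetical, a group's frequency is taken to be the summed usage of its members, and a frequency exactly equal to the threshold is treated as relevant; the text leaves both details open.

```python
def is_relevant(group, wordnet_terms, wikipedia_terms, usage, threshold):
    """Decide whether a tag group (a set of tags) is kept for ontology generation.

    wordnet_terms / wikipedia_terms: tags for which a reference was found in
    steps 3 and 4. usage: mapping tag -> number of times the tag was used.
    """
    # Any WordNet or Wikipedia reference makes the whole group relevant.
    if any(tag in wordnet_terms or tag in wikipedia_terms for tag in group):
        return True
    # Otherwise fall back on usage; summing over the group is an assumption.
    return sum(usage.get(tag, 0) for tag in group) >= threshold
```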

Step 7: Find Representative for Each Group This step is about finding the most representative tag in a tag group. The tag groups defined in the previous steps can contain many tags. By definition, all tags in a group are equivalent to each other, regardless of whether they are misspelled, occur with a lower or higher frequency, or are translations from other natural languages. For the generation of ontologies, however, we need to identify which of these tags best represents the meaning of the corresponding tag group. Preference is given to tags occurring in Wordnet and to tags with Wikipedia references, in this order. If neither applies, the decision is based on the highest frequency of usage.
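The preference order maps naturally onto a tuple-valued sort key. In the following sketch (function and parameter names are hypothetical), ties between several Wordnet or several Wikipedia tags are also resolved by frequency, and a complete tie goes to the alphabetically first tag; both tie-breaks are assumptions, not part of the described method.

```python
def representative(group, wordnet_terms, wikipedia_terms, usage):
    """Pick the tag that best represents a tag group."""

    def rank(tag):
        # Tuples compare element by element: a Wordnet entry wins first,
        # then a Wikipedia reference, then the higher usage count.
        return (tag in wordnet_terms, tag in wikipedia_terms, usage.get(tag, 0))

    # max() returns the first maximal element, so iterating in sorted order
    # makes a complete tie resolve to the alphabetically first tag.
    return max(sorted(group), key=rank)
```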

Step 8: Create Co-occurrence Matrix Co-occurrence matrices provide the means

to derive some kind of semantic relation between two entities. Amongst many

48 B. Ell et al.

others, this approach was chosen by Begelman et al. (2006), Cattuto et al. (2007a, b), Simpson (2008), and Specia and Motta (2007) to analyze connections between tag entities.

The symmetric $n \times n$ co-occurrence matrix $M$ contains information about how frequently two tag entities are used to annotate the same resource. The value $m_{ij}$, at the intersection of $(entity_i, entity_j)$ for $1 \le i, j \le n$, corresponds to the frequency with which the two tag entities $entity_i$ and $entity_j$ were used to annotate the same resource. The diagonal elements $m_{ii}$ of the matrix $M$ contain information on how often the tag entity $entity_i$ was used at all. The matrix serves as the starting point for steps 9–11.
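A minimal sketch of building $M$ from raw tag assignments follows. It assumes (hypothetically) that the folksonomy is available as (user, resource, tag entity) triples, with tags already replaced by their group representatives; off-diagonal cells count distinct resources annotated with both entities, and the diagonal counts all assignments of an entity.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_matrix(assignments):
    """Build the symmetric co-occurrence matrix from (user, resource, entity) triples.

    Returns (entities, matrix): matrix[i][j] counts the resources annotated with
    both entities i and j; matrix[i][i] counts how often entity i was used at all.
    """
    tags_per_resource = defaultdict(set)
    usage = defaultdict(int)
    for _user, resource, entity in assignments:
        tags_per_resource[resource].add(entity)
        usage[entity] += 1

    entities = sorted(usage)
    index = {entity: k for k, entity in enumerate(entities)}
    matrix = [[0] * len(entities) for _ in entities]
    for tags in tags_per_resource.values():
        for a, b in combinations(sorted(tags), 2):
            i, j = index[a], index[b]
            matrix[i][j] += 1
            matrix[j][i] += 1  # keep the matrix symmetric
    for entity, k in index.items():
        matrix[k][k] = usage[entity]
    return entities, matrix
```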

Step 9: Calculate Similarities The co-occurrence matrix is the starting point for deriving relations between tag entities. From a simplistic point of view, the ratio between co-occurrence values and the total frequency of the tag entities (as proposed by Begelman et al. 2006) can be seen as a good indicator of the relation between two tag entities. This approach, however, has one important disadvantage: it does not take into account the similarities of the two tags to other tags or tag groups. A vector-based similarity measure, as proposed by Specia and Motta (2007), resolves this issue.

A vector represents a row (or column) of the co-occurrence matrix. In our case, the similarity measure is based on the Pearson correlation coefficient. Algorithm 1 shows how the Pearson correlation coefficient is calculated for two variables X and Y with means $\bar{X}$ and $\bar{Y}$ and standard deviations $S_x$ and $S_y$, respectively.

Algorithm 1 Pearson Correlation Coefficient

$$r = \frac{\sum_{i=1}^{n} (X_i - \bar{X}) \cdot (Y_i - \bar{Y})}{(n-1) \cdot S_x \cdot S_y}$$

A positive coefficient value is evidence for a general tendency that large values

of X are related to large values of Y and that small values of X are related to small

values of Y. A correlation above 0.5 is an indicator that the two vectors are strongly

correlated.
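Algorithm 1 translates directly into code; using sample standard deviations matches the $(n-1)$ factor in the denominator. The helper that compares all pairs of matrix rows against the 0.5 threshold is a hypothetical illustration, and returning 0 for a constant vector (where $r$ is undefined) is an assumption of this sketch.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long vectors (Algorithm 1)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two vectors of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sample standard deviations, matching the (n - 1) factor in the denominator.
    s_x = math.sqrt(sum((v - mean_x) ** 2 for v in x) / (n - 1))
    s_y = math.sqrt(sum((v - mean_y) ** 2 for v in y) / (n - 1))
    if s_x == 0 or s_y == 0:
        return 0.0  # assumption: a constant vector is treated as uncorrelated
    covariance_sum = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return covariance_sum / ((n - 1) * s_x * s_y)


def correlated_pairs(entities, matrix, threshold=0.5):
    """Compare matrix rows pairwise; keep pairs whose coefficient exceeds threshold."""
    pairs = []
    for i in range(len(entities)):
        for j in range(i + 1, len(entities)):
            r = pearson(matrix[i], matrix[j])
            if r > threshold:
                pairs.append((entities[i], entities[j], r))
    return pairs
```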

Step 10: Enrich the Co-occurrence Matrix with Co-actoring Information The outcomes of step 9 alone do not allow us to derive all relations between tags. This holds in particular for tags that are used frequently, but only by a limited number of users; this phenomenon is usually caused by spam robots inserting tags. Even though there are many related tags with correlation values below 0.5, the threshold cannot be lowered any further without risking the derivation of faulty relations. To cope with this issue we enrich the co-occurrence matrix with so-called "co-actoring information". This key figure is calculated in a manner similar to the co-occurrence information, the only difference being that the focus lies on the users rather than on the annotated resources. As such, the co-actoring information for two tags is defined as the total number of users who used both tags.
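Co-actoring counts can be computed like the co-occurrence counts, grouping by user instead of by resource. The sketch assumes the same hypothetical (user, resource, tag entity) triples used for the co-occurrence matrix; each user is counted at most once per tag pair.

```python
from collections import defaultdict
from itertools import combinations


def coactoring(assignments):
    """Count, for every pair of tag entities, the users who used both of them.

    assignments: iterable of (user, resource, entity) triples.
    Returns {(entity_a, entity_b): number_of_users} with entity_a < entity_b.
    """
    tags_per_user = defaultdict(set)
    for user, _resource, entity in assignments:
        tags_per_user[user].add(entity)
    counts = defaultdict(int)
    for tags in tags_per_user.values():
        for pair in combinations(sorted(tags), 2):
            counts[pair] += 1
    return dict(counts)
```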

Step 11: Create Clusters In this step, we aim at creating sets of strongly related tags that we refer to as "clusters". To do so, we relate the co-occurrence and co-actoring information of two tag entities to their total usage, and raise the correlation coefficient if these proportions are high enough.


Algorithm 2 shows the exact formula, where ccoeff denotes the correlation coefficient of two tag entities, #(tag1) and #(tag2) denote the total usage of each tag, coac(tag1, tag2) stands for the co-actoring information of the tags tag1 and tag2, and cooc(tag1, tag2) represents the co-occurrence value of the two tags.

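Algorithm 2 Correlation Coefficient Strengthener

$$r = \mathit{ccoeff} \cdot \left( \frac{\mathit{coac}(tag_1, tag_2)}{\#(tag_1) + \#(tag_2) - \mathit{coac}(tag_1, tag_2)} \right) \cdot \left( \frac{\mathit{cooc}(tag_1, tag_2)}{\#(tag_1) + \#(tag_2) - \mathit{cooc}(tag_1, tag_2)} \right) \cdot 100$$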

The algorithm dramatically reduces the problem of spam entries and of related tags with low correlation coefficients. Tag pairs whose base correlation is above the defined threshold th1, or whose strengthened correlation coefficient reaches this threshold, are automatically considered to be related and form the basis for a cluster.
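The following sketch implements Algorithm 2, reading the operators between the factors as multiplications, together with the pairwise decision described above. The function names are hypothetical; the default th1 = 0.5 is an assumption borrowed from the 0.5 correlation threshold mentioned in steps 9 and 10, and leaving the coefficient unchanged for degenerate counts (zero denominators) is also an assumption.

```python
def strengthened_coefficient(ccoeff, usage1, usage2, coac, cooc):
    """Correlation coefficient strengthener (Algorithm 2).

    usage1, usage2: total usage #(tag1) and #(tag2)
    coac: co-actoring value (users who used both tags)
    cooc: co-occurrence value of the two tags
    """
    coac_denominator = usage1 + usage2 - coac
    cooc_denominator = usage1 + usage2 - cooc
    if coac_denominator <= 0 or cooc_denominator <= 0:
        return ccoeff  # assumption: degenerate counts leave the coefficient as is
    coac_share = coac / coac_denominator
    cooc_share = cooc / cooc_denominator
    return ccoeff * coac_share * cooc_share * 100


def are_related(ccoeff, usage1, usage2, coac, cooc, th1=0.5):
    """A pair is related if its base or its strengthened coefficient reaches th1."""
    if ccoeff >= th1:
        return True
    return strengthened_coefficient(ccoeff, usage1, usage2, coac, cooc) >= th1
```

With ccoeff = 0.3, usage 10 for both tags and coac = cooc = 5, both shares are 1/3 and the strengthened value is 0.3 · 1/9 · 100 ≈ 3.33, so the pair counts as related; with usage 100 and coac = cooc = 1 the value drops far below 0.5.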

Tags are merged into one cluster only if the calculated correlations between tag entities that are only indirectly connected (by transitivity) are above another threshold th2. This means that the three tag entities belong to the same cluster only if cooc(tag1, tag2) ≥ th1, cooc(tag2, tag3) ≥ th1 and cooc(tag1, tag3) ≥ th2. Additional tag entities are added to a cluster only if all their correlation values with respect to the other tags in the cluster exceed the threshold th2.

While useful, this strategy results in a relatively high number of very similar clusters that differ in only one or two elements. To solve this issue we apply two smoothing heuristics as follows.

1. If one cluster is completely contained in another one, the smaller cluster is deleted.
2. If the differences between two clusters are within a small margin and, additionally, the number of elements of each cluster exceeds a certain percentage of the total number of elements of both clusters, the smaller cluster is deleted and its tags not yet included in the larger one are added to it.

The second smoothing heuristic is depicted in Algorithm 3, where #(cl1) and #(cl2) denote the number of elements within the clusters cl1 and cl2, respectively. The relevant threshold in this algorithm is thcl.

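Algorithm 3 Second Smoothing Heuristic for Two Clusters

$$\textbf{if } \frac{\#(cl_1 \cap cl_2)}{\#(cl_1 \cup cl_2)} \ge th_{cl} \textbf{ and } th_{cl} \le \frac{\#(cl_1)}{\#(cl_1) + \#(cl_2)} \textbf{ and } th_{cl} \le \frac{\#(cl_2)}{\#(cl_1) + \#(cl_2)}$$

$$\textbf{then } \mathit{remove}(cl_1),\ \mathit{remove}(cl_2) \textbf{ and } \mathit{insert}(cl_1 \cup cl_2)$$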
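The cluster-growth condition and both smoothing heuristics can be sketched as follows. Clusters are modeled as sets of tags and the correlation lookup is passed in as a function; these representations and all names are assumptions of this sketch. Merging into the union of both clusters follows the prose description of the second heuristic (the smaller cluster's remaining tags are added to the larger one).

```python
from itertools import permutations


def can_join(tag, cluster, corr, th2):
    """A tag joins a cluster only if its correlation with every member reaches th2."""
    return all(corr(tag, member) >= th2 for member in cluster)


def should_merge(cl1, cl2, th_cl):
    """Condition of Algorithm 3 for two non-empty clusters."""
    if not cl1 or not cl2:
        return False
    total = len(cl1) + len(cl2)
    return (len(cl1 & cl2) / len(cl1 | cl2) >= th_cl
            and th_cl <= len(cl1) / total
            and th_cl <= len(cl2) / total)


def smooth(clusters, th_cl):
    """Apply both smoothing heuristics until no cluster changes any more."""
    clusters = [frozenset(c) for c in clusters]
    changed = True
    while changed:
        changed = False
        for i, j in permutations(range(len(clusters)), 2):
            a, b = clusters[i], clusters[j]
            if a <= b:
                # Heuristic 1: a cluster contained in another one is deleted.
                del clusters[i]
                changed = True
                break
            if should_merge(a, b, th_cl):
                # Heuristic 2: replace both clusters by their union.
                clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
                clusters.append(a | b)
                changed = True
                break
    return [set(c) for c in clusters]
```

Each pass removes at least one cluster, so the loop terminates. The result of repeated merging can depend on the order in which pairs are visited; the chapter does not specify an order.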

Step 12: Create Ontologies All the terms occurring in a cluster are assumed to be related to each other in some way. The concrete type of the inter-tag connections is, nevertheless, hardly resolvable. We consider this limitation to be of minor importance for creating lightweight ontologies. We use the SKOS standard (http://www.w3.org/2004/02/skos/), which allows establishing associative links between concepts without the need to further specify their semantics. More precisely, the SKOS property skos:related can be used to designate all kinds of relationships amongst terms within one cluster. The clusters themselves are considered to be the domain of the ontology, for which meta-properties (e.g., using Dublin Core) can be included.

In SKOS, skos:ConceptScheme is used to identify a certain ontology. As a consequence, all entities within this scheme have to include a reference to the scheme; this is achieved through the construct skos:inScheme. The terms within a cluster represent the entities the ontology consists of. This direct mapping is possible because SKOS makes no distinction between classes and instances. The construct to designate these entities is skos:Concept (Fig. 3.8).

The SKOS constructs mentioned above allow us to define the basic structure of the ontologies. The information collected with respect to translations, spell-checks, and so on, is used to enrich the ontologies. The preferred label of a concept, denoted by skos:prefLabel, is the representative of the respective tag group. If other terms within the same tag group occur in Wordnet, they can be considered valid substitutes for the preferred label; this information is captured through the skos:altLabel construct. As SKOS supports language distinctions, this feature is also used for both preferred and alternative labels. If a translation was found for a tag, this information is attached to the label; otherwise the label is considered to be English. This is done using the standard XML language annotation, e.g., skos:prefLabel xml:lang="EN". All other tags of a group are considered to be "hidden labels" for the corresponding concept. The set of labels marked by skos:hiddenLabel thus comprises common spelling mistakes.

Fig. 3.8 Example ontology created through our method
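To make the mapping concrete, the sketch below serializes one cluster as a SKOS concept scheme in Turtle, using only the constructs named above. The input format (a dictionary of concepts with preferred, alternative and hidden labels), the base IRI and the function names are hypothetical, and concept ids are assumed to be IRI-safe. Turtle's `"label"@en` language tag plays the role of the `xml:lang` annotation in RDF/XML.

```python
SKOS_PREFIX = "@prefix skos: <http://www.w3.org/2004/02/skos/core#> ."


def literal(text, lang=None):
    """Turtle string literal with an optional language tag."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}' if lang else f'"{escaped}"'


def cluster_to_turtle(base, scheme, concepts):
    """Serialize one tag cluster as a SKOS concept scheme in Turtle.

    concepts maps a concept id to a dict with 'pref': (label, lang) and the
    optional entries 'alt': [(label, lang), ...] and 'hidden': [label, ...].
    """
    scheme_iri = f"<{base}{scheme}>"
    lines = [SKOS_PREFIX, "", f"{scheme_iri} a skos:ConceptScheme ."]
    ids = sorted(concepts)
    for cid in ids:
        concept = concepts[cid]
        lines.append(f"<{base}{cid}> a skos:Concept ;")
        lines.append(f"    skos:inScheme {scheme_iri} ;")
        lines.append(f"    skos:prefLabel {literal(*concept['pref'])} ;")
        for label, lang in concept.get("alt", []):
            lines.append(f"    skos:altLabel {literal(label, lang)} ;")
        for label in concept.get("hidden", []):
            lines.append(f"    skos:hiddenLabel {literal(label)} ;")
        # Every pair of terms within a cluster is linked with skos:related.
        for other in ids:
            if other != cid:
                lines.append(f"    skos:related <{base}{other}> ;")
        lines[-1] = lines[-1][:-2] + " ."  # end the statement: ' ;' becomes ' .'
    return "\n".join(lines)
```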

3.3.2.2 Integrating Freebase into Semantic MediaWiki

This section gives a brief overview of an extension to SMW that allows the use of inline queries to query Freebase (http://www.freebase.com) content via a mediator. The mediator creates an MQL query (Metaweb Query Language, the query language used in Freebase), handles the communication with Freebase, and returns query results in the same way as for conventional SMW inline queries. Full documentation of the extension is available in (Ell 2009).

Imagine you want to create a list of all European countries and their populations within your SMW-based knowledge management system. This information is available in general-purpose knowledge bases such as Freebase, and can be imported into the local SMW installation. The query statement could look as follows, where the source argument is an extension of the original AskQL syntax indicating the external knowledge base to be used.

{{#ask: [[Category:Country]] [[Located in::Europe]]
| ?Population
| source = freebase
}}

The AskQL query has to be translated into an MQL query, which could look

as follows.

[{
  "/type/object/name": null,
  "/location/statistical_region/population": [{
    "number": null
  }],
  "/type/object/type": "/location/country",
  "/location/location/containedby": [{
    "/type/object/name": "Europe"
  }]
}]

In order to perform this translation, additional information is needed. In this case it is necessary to know that

1. the category Country maps to /location/country,
2. the property Located in maps to /location/location/containedby, and
3. the print request Population maps to /location/statistical_region/population, where the field storing the value has the name number.

The transformation, which essentially follows a local-as-view approach, is

presented in detail in (Ell 2009). Mapping information is stored in pages via

properties, thus being editable and reusable for various inline queries.

Category mapping information is stored on category pages using the property freebase category mapping. For example, the page Category:City (the page describing this category in the category namespace) may contain the statement [[freebase category mapping::/location/citytown]].

Page mapping information is stored on pages in the main namespace using the property freebase page mapping. For example, the page Karlsruhe may contain the statement [[freebase page mapping::#9202a8c04000641f800-00000000b283e]].

Property mapping information is stored on property pages using the properties freebase property mapping and freebase property type. For example, the page Property:Population (the page describing this property in the property namespace) may contain the statements [[freebase property mapping::/location/statistical_region/population;number]] and [[freebase property type::number]]. Path elements are separated by ';'. If no type mapping is specified, the standard type string is assumed by default.

Print request mapping information is stored on property pages, since print requests relate to properties. For storing the mapping information, the property freebase pr mapping is used. For example, the property page Property:Located in may contain the statement [[freebase pr mapping::/location/location/containedby]].

If the mapping information is missing or cannot be properly interpreted, the extension behaves as follows.

Ambiguities The page where mapping information is expected may contain the mapping property multiple times. For example, a category page may contain several statements with the property freebase category mapping. In this situation the mapping information is ambiguous, and only the first result returned by the SMW database is used.

Property type If the type of a property is not given using freebase property type, then type string is assumed.

Page mapping information missing If no page mapping exists for page P, then an MQL query is created that requests an entity with name P. If the query is specified with the parameter language = L, then an MQL query is created that requests an entity with the name P in language L.

Category mapping information missing If no category mapping information is found, then the category statement and all subordinate statements in the description object tree returned by the query processor are ignored.

Property mapping information missing If no property mapping information is found, then the property statement and all subordinate statements in the description object tree returned by the query processor are ignored.

This behavior is robust since missing mapping information is ignored. In case of ambiguities or missing mapping information, a warning is displayed to the user. This supports a step-by-step development and improvement of the query.
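The translation and its fallback behavior can be illustrated with a much simplified sketch. It is not the extension's implementation (see Ell 2009): the already-parsed query format, the mapping tables and the `warn` callback are all hypothetical, the `language` parameter is not handled, and using the MQL `"id"` key for mapped pages is an assumption. With the three mappings listed earlier, it reproduces the example MQL query.

```python
import json


def to_mql(category, conditions, printouts, mappings, warn):
    """Translate a simplified, already parsed inline query into an MQL query.

    category: category name, e.g. "Country"
    conditions: [(property, page)], e.g. [("Located in", "Europe")]
    printouts: [print request], e.g. ["Population"]
    mappings: dict with 'category', 'property', 'page' and 'printout' tables
    warn: callback that reports missing mapping information to the user
    """
    query = {"/type/object/name": None}  # null asks Freebase to fill in the name
    category_type = mappings["category"].get(category)
    if category_type is None:
        warn(f"no category mapping for {category!r}; statement ignored")
    else:
        query["/type/object/type"] = category_type
    for prop, page in conditions:
        path = mappings["property"].get(prop)
        if path is None:
            warn(f"no property mapping for {prop!r}; statement ignored")
            continue
        page_id = mappings["page"].get(page)
        # Without a page mapping, the entity is requested by its name.
        query[path] = [{"id": page_id} if page_id else {"/type/object/name": page}]
    for printout in printouts:
        mapped = mappings["printout"].get(printout)
        if mapped is None:
            warn(f"no print request mapping for {printout!r}; ignored")
            continue
        path, field = mapped
        query[path] = [{field: None}]
    return [query]


if __name__ == "__main__":
    example_mappings = {
        "category": {"Country": "/location/country"},
        "property": {"Located in": "/location/location/containedby"},
        "page": {},
        "printout": {"Population": ("/location/statistical_region/population", "number")},
    }
    mql = to_mql("Country", [("Located in", "Europe")], ["Population"],
                 example_mappings, print)
    print(json.dumps(mql, indent=2))
```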

3.3.3 Repairing Knowledge Structures

Quality issues are a natural consequence of the collaborative, integrated knowledge engineering approach followed by Semantic MediaWiki and its extensions. Therefore, our solution also includes techniques to support users in detecting and correcting potential modeling errors or missing information. This section provides an overview of the types of quality issues we deal with and the implementation of the associated knowledge repair functionality.

Similar Names An ontology contains different types of entities. A common issue when adding entities to an ontology is that a user might overlook that the entity she intends to add already exists in the ontology under a name slightly different from the one she would have chosen. By adding the entity, the user introduces redundancy, which makes the ontology unnecessarily large and more error-prone. To avoid such issues, we measure similarities between entities via the Levenshtein distance and present the results to the user, who then has to decide whether the entities under consideration represent the same thing and should thus be merged, or whether they are different and should therefore be kept separate in the ontology.
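A compact dynamic-programming implementation of the Levenshtein distance, together with a pairwise check over entity names, illustrates this measurement. The case-insensitive comparison and the cutoff of two edits are assumptions of this sketch, not values given in the chapter; the quadratic pairwise loop is fine for a sketch but would need indexing for large ontologies.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + (char_a != char_b),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def similar_names(names, max_distance=2):
    """Return (name1, name2, distance) for all name pairs within max_distance."""
    names = sorted(names)
    candidates = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            distance = levenshtein(first.lower(), second.lower())
            if distance <= max_distance:
                candidates.append((first, second, distance))
    return candidates
```

For instance, `similar_names(["Colour", "Color", "Paint"])` reports only the pair ("Color", "Colour") at distance 1, which the user would then confirm or reject.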

Similar Property Sets The idea here is to compare the property sets of ontology classes in order to identify potential similarities. The ontology editor introduced in Sect. 3.3 displays all sibling categories that have at least 50% of their properties in common (see Fig. 3.12), so that the user can decide on appropriate action.
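The 50% criterion can be read in several ways; the sketch below assumes that the shared properties must make up at least half of each category's own property set. The input format (sibling categories mapped to their property sets) and the function name are hypothetical.

```python
from itertools import combinations


def similar_property_sets(siblings, min_share=0.5):
    """Find pairs of sibling categories with largely overlapping property sets.

    siblings: {category: set of properties} for categories sharing a parent.
    Assumption: the shared properties must cover at least min_share of each
    category's own property set.
    """
    result = []
    for (a, props_a), (b, props_b) in combinations(sorted(siblings.items()), 2):
        if not props_a or not props_b:
            continue
        common = len(props_a & props_b)
        if common >= min_share * len(props_a) and common >= min_share * len(props_b):
            result.append((a, b))
    return result
```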

Cycles and Redundancies This measurement identifies cycles within a specialization-generalization hierarchy (see Fig. 3.9). Similarly, the knowledge repair functionality includes means to identify redundant is-a relationships, which are presented to the user as decision support.
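Both checks reduce to reachability in the category graph. In this sketch the hierarchy is given, hypothetically, as a dictionary from each category to its direct supercategories. A category lies on a cycle if it is among its own ancestors, and a direct is-a link is redundant if the supercategory is already reachable through another direct supercategory. Recomputing ancestors per query is quadratic but keeps the sketch simple.

```python
def ancestors(parents, category):
    """All categories reachable from `category` by following is-a links upwards."""
    seen = set()
    stack = list(parents.get(category, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, ()))
    return seen


def categories_in_cycles(parents):
    """A category lies on a cycle if it is among its own ancestors."""
    return {category for category in parents if category in ancestors(parents, category)}


def redundant_is_a(parents):
    """Direct links child -> parent that are already implied via another parent."""
    redundant = []
    for child in sorted(parents):
        direct = parents[child]
        for parent in sorted(direct):
            if any(parent in ancestors(parents, other) for other in direct if other != parent):
                redundant.append((child, parent))
    return redundant
```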

Missing Properties The underlying rationale for this metric is the inherent difficulty knowledge modelers experience in distinguishing between the data level and the schema level of an ontology. Here we display those ontological primitives that do not have any successors in the hierarchy, thus indicating missing specialization-generalization properties or misclassifications of specific entities as classes or instances.

Category knowledge repair The previously discussed methods are used primarily by users who aim at keeping the knowledge base consistent. They give the user an idea of which categories are involved in a problem of the knowledge base, no matter which taxonomy they belong to. However, there are also users who create an ontology because the domain under consideration is of interest to them, and who might therefore be keen on creating an error-free ontology. Instead of applying each method sequentially to resolve the issues of a certain category, the user can also get all information about one category at a time. Besides the results of the previously mentioned methods, the user also gets minimum, average and maximum values, which can be compared to the values of the category under consideration, as well as information about the meaning of certain figures, which is useful for inexperienced users. To draw the user's attention to severe problems, these are marked with a symbol or in red. Minor problems are also marked, while all other information is left unmarked (see Fig. 3.10).

Category statistics Some of the previously described methods provide the user with information on all categories regarding one specific type of problem. The Category knowledge repair method, in contrast, provides the user with information on all types of problems regarding one specific category. This approach combines these two types of problem solving: it displays all categories together with the results of each problem-solving method, so the user gets all information about issues regarding all categories. The purpose of this approach is to provide a global view of the state of the taxonomies and their categories. Without such a comprehensive view it can be rather difficult to solve issues which spread over many categories, since the user would have to jump from one category to another many times to resolve an issue. This becomes more complicated if the branching of the category under consideration is more complex than that of a category with only one supercategory and one subcategory. For each category, the user gets essentially the same information as when using the Category knowledge repair approach for that specific category. To draw the user's attention to severe problems, these are conspicuously marked. Minor problems are highlighted and all other information is left unmarked (see Fig. 3.11).

Fig. 3.9 Categories in cycles, categories with redundant relationships and entities with similar names

Fig. 3.10 Category knowledge repair

Fig. 3.11 Category statistics

Category and Property Histogram An ontology contains many entities whose names start with different letters. To get an overview of how the entities in the ontology are distributed over the letters of the alphabet, a histogram can be very useful. It provides a comprehensive view of how many entities start with a specific letter in comparison to other letters (see Fig. 3.12). A normalized histogram can point out unusual patterns; however, this requires a certain number of entities in the database, because the more entities there are, the more likely they are to follow a specific distribution regarding their first letters.
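Computing such a histogram is straightforward. In this sketch (the function name is hypothetical), names are bucketed by their upper-cased first character; names starting with anything other than an ASCII letter A–Z are skipped, which is an assumption, and normalization divides by the number of names that were counted.

```python
import string
from collections import Counter


def first_letter_histogram(names, normalize=False):
    """Count entity names per initial letter A-Z (other initials are skipped)."""
    counts = Counter(name[0].upper() for name in names if name)
    histogram = {letter: counts.get(letter, 0) for letter in string.ascii_uppercase}
    if normalize:
        total = sum(histogram.values())
        if total:
            histogram = {letter: n / total for letter, n in histogram.items()}
    return histogram
```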

3.4 Conclusions

The chapter has covered the area of enterprise knowledge structures, starting from the requirements and research questions derived from use cases all the way to methodologies and implementations that bridge the different heterogeneous structures in use today.

We expect that a common language for representing knowledge structures will foster further development and research in this area. The research results presented in this chapter are examples of what can be achieved once some foundational questions (such as the representation language or the necessary expressivity) have been settled, and we can move forward towards unifying knowledge management tools and methodologies, further integrating results from heterogeneous areas in order to support the knowledge worker to the fullest possible extent.

Fig. 3.12 Category histogram and categories with similar property sets

Enterprise knowledge structures are heterogeneous in nature, and their integrated use requires a framework that makes the trade-offs between different structures understandable and allows optimizing for given scenarios.

Many enterprises may already apply folksonomy-like systems. We have shown how folksonomies can be used as the foundation for developing lightweight ontologies, which can in turn be used to connect to further knowledge sources. Besides tagging, we have explored further Web 2.0-inspired paradigms, and implemented extensions to a wiki-based system that allow for the seamless integration of external data sources like Flickr or a company database. This system allows for explicit but lightweight management of an ontology within the wiki interface, and provides powerful gardening and knowledge quality assessment tools.

References

Ankolekar A, Krötzsch M, Tran T, Vrandecic D (2007) The two cultures: mashing up web 2.0 and the semantic web. In: WWW '07: proceedings of the 16th international conference on world wide web, ACM Press, New York, pp 825–834. ISBN 9781595936547. doi: http://dx.doi.org/10.1145/1242572.1242684

Begelman G, Keller P, Smadja F (2006) Automated tag clustering: improving search and exploration in the tag space. In: Proceedings of the collaborative web tagging workshop co-located with the 15th international world wide web conference (WWW2006), 2006

Cattuto C, Loreto V, Pietronero L (2007a) Semiotic dynamics and collaborative tagging. Proc Nat Acad Sci U S A 104(5):1461

Cattuto C, Schmitz C, Baldassarri A, Servedio VDP, Loreto V, Hotho A, Grahl M, Stumme G

(2007b) Network properties of folksonomies. AI Commun 20(4):245–262

Drakos N, Rozwell C, Bradley A, Mann J (2009) Magic quadrant for social software in the workplace. Gartner RAS core research note G00171792, Gartner. http://www.gartner.com/technology/media-products/reprints/microsoft/vol10/article4/article4.html. Accessed Jan 2010

Ell B (2009) Integration of external data in semantic wikis. Master thesis, Hochschule Mannheim, 2009

Fellbaum C (1998) WordNet: an electronic lexical database. MIT Press, Cambridge, MA

Fensel D (2001) Ontologies: silver bullet for knowledge management and electronic commerce.

Springer, Berlin

Gangemi A, Guarino N, Masolo C, Oltramari A, Schneider L (2002) Sweetening ontologies with

DOLCE. vol 2473 of Lecture notes in artificial intelligence (LNAI), Springer, Siguenza, Spain,

pp 166–181, ISBN 3-540-44268-5

Gene Ontology Consortium (2000) Gene ontology: tool for the unification of biology. Nat Genet

25:25–30

Grau BC, Horrocks I, Motik B, Parsia B, Patel-Schneider P, Sattler U (2008) OWL 2: the next step for OWL. Web Semant Sci Serv Agent World Wide Web 6(4):309–322. ISSN 1570-8268. doi: http://dx.doi.org/10.1016/j.websem.2008.05.001

Gruber TR (1993) A translation approach to portable ontology specifications. Knowl Acquis 5

(2):199–220

Guarino N (1998) Formal ontology and information systems. In: Guarino N (ed) Proceedings

of the first international conference on formal ontologies in information systems (FOIS),

vol 46 of Frontiers in artificial intelligence and applications, IOS-Press, Trento, Italy, 1998


Haase P, Herzig DM, Musen M, Tran DT (2009) Semantic wiki search. In: 6th annual European semantic web conference, ESWC2009, vol 5554 of LNCS. Springer Verlag, Heraklion, Crete, Greece, pp 445–460, June 2009

Hartmann J, Sure Y, Haase P, Palma R, Suarez-Figueroa MC (2005) OMV – Ontology metadata

vocabulary. In: Welty C (ed) Ontology patterns for the semantic web workshop, Galway,

Ireland, 2005

Horrocks I, Patel-Schneider PF (2004) Reducing OWL entailment to description logic

satisfiability. J Web Semant 1(4):7–26

Krötzsch M, Vrandecic D, Völkel M, Haller H, Studer R (2007) Semantic wikipedia. J Web Semant 5:251–261

Lenat DB (1995) CYC: a large-scale investment in knowledge infrastructure. Commun ACM 38

(11):33–38

Levenshtein VI (1966) Binary codes capable of correcting deletions, insertions, and reversals. Soviet Physics Doklady 10:707–710

McGuinness DL (2003) Ontologies come of age. In: Fensel D, Hendler J, Lieberman H, Wahlster W (eds) Spinning the semantic web: bringing the world wide web to its full potential. MIT Press, Cambridge, MA

Motik B, Grau BC, Horrocks I, Wu Z, Fokoue A, Lutz C (2008) OWL2 web ontology language:

profiles. W3C Working Draft 2 December 2008, Available at http://www.w3.org/TR/2008/

WD-owl2-profiles-20081202/.

Pease A, Niles I, Li J (2002) The suggested upper merged ontology: a large ontology for the

semantic web and its applications. In: Working notes of the AAAI-2002 workshop on

ontologies and the semantic web, 2002

Simperl E, Wölger S, Bürger T, Siorpaes K, Han S-K, Luger M (2010) An ontology authoring tool for the enterprise 3.0. Taylor and Francis Publishing, London, 2010

Simpson E (2008) Clustering tags in enterprise and web folksonomies. Technical Report HPL-

2008-18, HP Labs, 2008

Sowa JF (1995) Top-level ontological categories. International Journal of Human-Computer Studies 43(5/6):669–685. ISSN 1071-5819. doi: http://dx.doi.org/10.1006/ijhc.1995.1068

Specia L, Motta E (2007) Integrating folksonomies with the semantic web. In: Proceedings of the

4th European semantic web conference (ESWC2007), pp 624–639, 2007

Uschold M, Grueninger M (1996) Ontologies: principles, methods and applications. Knowledge Engineering Review 11(2):93–155

Vrandecic D (2009a) Towards automatic content quality checks in semantic wikis. In: Social semantic web: where web 2.0 meets web 3.0, AAAI spring symposium, Springer, Stanford, CA, March 2009

Vrandecic D (2009b) Ontology evaluation. PhD thesis, Karlsruhe Institute for Technology, Germany, 2009

Vrandecic D, Krötzsch M (2006) Reusing ontological background knowledge in semantic wikis. In: Völkel M, Schaffert S (eds) Proceedings of the first workshop on semantic wikis – from wiki to semantics, Workshop on Semantic Wikis. AIFB, ESWC2006, June 2006. URL http://www.aifb.uni-karlsruhe.de/Publikationen/showPublikation?publ_id=1211


4

Using Cost-Benefit Information in Ontology Engineering Projects

Tobias Bürger, Elena Simperl, Stephan Wölger, and Simon Hangl

4.1 Introduction

Knowledge-based technologies are characterized by the use of machine-understandable representations of domain knowledge (in the form of ontologies, taxonomies, thesauri, folksonomies, and others) as a baseline for the realization of advanced mechanisms to organize, manage, and explore information spaces. These technologies have been promoted since the nineties, or even earlier, but experienced a real revival only with the rise of the Semantic Web almost a decade ago. Primarily introduced by Sir Tim Berners-Lee, the originator of the World Wide Web, the idea of providing the current Web with a computer-processable knowledge infrastructure, in addition to its current semi-formal and human-understandable content, foresees the usage of knowledge structures which can be easily integrated into, and exchanged among, arbitrary software environments in an operationalized manner (Berners-Lee et al. 2001). In this context, these structures are formalized using Web-suitable and, at the same time, semantically unambiguous representation languages; are pervasively accessible; and can be, at least theoretically, shared and reused across the World Wide Web. Complementarily, similar forms of knowledge organization, typically referred to as 'folksonomies', emerged in the realm of Web 2.0, most prominently in relation to technologies and platforms promoting community-driven knowledge management and sharing. Just as many other (by-)products of Web 2.0 enterprises, they are mainly user-generated, and can be enriched with formal meaning and transformed into characteristically lightweight, machine-understandable, widely accepted ontologies. Chapter 'Enterprise Knowledge Structures' provides an overview of the various forms of knowledge organization populating the enterprise knowledge management landscape, of their characteristics and relationships, and of the ways they can be created and maintained.

T. Bürger (*)
Capgemini, Carl-Wery-Str. 42, Munich D-81739, Germany

E. Simperl
Karlsruhe Institute of Technology, KIT-Campus Süd, D-76128 Karlsruhe, Germany

S. Wölger • S. Hangl
Semantic Technology Institute, ICT – Technologie Park Innsbruck, Technikerstraße 21a, A-6020 Innsbruck, Austria
e-mail: [email protected]; [email protected]

The benefits of semantic technologies across business sectors and application

scenarios have been investigated at length in the Semantic Web literature of the past

decade (Davis et al. 2003; Fensel 2001; Hepp et al. 2008; McGuinness 2003).

Nevertheless, most discussions have been carried out at the technical level, and an economic analysis of ontology-based applications in terms of their development, deployment and maintenance costs – or any kind of assessment of tangible returns – has only recently been considered. In previous work we

have introduced a series of models that analyze and predict the costs and benefits

associated with the development of knowledge structures such as ontologies,

taxonomies and folksonomies, and of the semantic applications using them (Bürger et al. 2010; Paslaru Bontas Simperl et al. 2006; Popov et al. 2009; Simperl et al.

2009b; Simperl et al. 2010). We will not introduce these models here, as they have been extensively described in our previous publications, to which the reader is referred. This chapter can be seen as a continuation of our work – while in

past publications we have focused on the definition of the models and their evalua-

tion, this chapter is concerned with the application of the models in enterprise

projects and provides guidelines – both scenario- and tool-oriented – that assist

project managers in utilizing the models throughout the ontology life cycle.

The remainder of this chapter is structured in three parts. The first part gives a brief overview of the results of a survey of 148 ontology engineering projects from industry and academia, performed between October 2008 and March 2009. We

have analyzed the findings of this survey in order to identify how cost-related

information could be of use to alleviate some of the issues ontology engineering

is currently confronted with. The second part investigates the life cycle of

ontologies and ontology-based applications and presents 15 scenarios in which

cost-benefit information could support decision-making. The third part proposes

a number of methods and tools by which models such as ONTOCOM, FOLCOM or

SEMACOM can be applied in these scenarios in an operationalized manner.

4.2 The Impact of Cost-Benefit Information on Ontology Engineering Projects

In this section we briefly present the results of a six-month empirical survey

that collected data from 148 ontology engineering projects from industry and

academia in order to give an account of the current ontology engineering practice,


and the effort involved in these activities. Through its size and the range of the

subjects covered, the survey gives a comprehensive overview of the current state

of practice in ontology engineering as of 2009. A detailed report and analysis of

its findings has been published in Simperl et al. (2009a). For the purpose of this

chapter, we will focus on those aspects which refer to activities in which decisions

could be influenced by the availability of reliable cost-benefit information.

The survey pointed out that the use of methodological support for developing

ontologies clearly varies from project to project. Concrete use of an ontology engineering methodology was, for instance, observed in only one out of nine projects. As far as improvements in ontology engineering are concerned, participants suggested the following: project settings in which domain-analysis and evaluation needs run high mandate domain-specific customizations of the generic methodologies available. This is confirmed by our data analysis, which indicates low tool support for these ontology engineering activities, whereas high tool support was shown to reduce development time considerably. Such

customizations might be particularly beneficial for very complex domains, for the

development of ontologies with broad coverage, and for those that involve non-common-sense knowledge, such as in the life sciences. A last issue to be highlighted, in particular as more and more high-quality ontologies become available, is ontology reuse. Our survey showed that reuse is still not performed systematically and is often hampered by technical obstacles. As the adoption of ontology-based technologies continues, reuse scenarios are likely to gain in relevance, and significant effort will need to be invested in revising existing ontology reuse methods, techniques and tools in order to provide an adequate level of support for non-technical users.1

With respect to process-related aspects, we observed discrepancies

between (1) the complexity of particular activities as perceived by ontology engi-

neering practitioners; (2) the significance of these activities as measured in terms

of their impact on the total development costs; and (3) the level of maturity

achieved at present by the R&D community with respect to methods and tools

supporting these activities. To further investigate these discrepancies, we analy-

zed the correlation between various ontology engineering aspects which are

covered by the survey.

1 The survey did not cover reuse efforts related to the usage of ontologies in the context of the

Linked Open Data (LOD) initiative, where a (relatively small) number of vocabularies is reused

through so-called ‘interlinking’. With LOD acting as a real game-changer in the semantic-technologies landscape, a new survey is required in order to fully understand the state of practice of ontology engineering in such data-driven scenarios.


4.2.1 Correlation between Ontology Engineering Aspects and Total Effort

Table 4.1 shows the correlation between ontology engineering aspects and the

actual project effort in person-months.2

DCPLX Out of the six positively correlated factors, domain analysis was shown to have the highest impact on the total effort, with a clearly higher correlation value (0.496) than the other five factors. This reflects the time-consuming nature of the knowledge-acquisition process, which was also confirmed by comments received from the participants in the interviews, and by previous surveys in the field. As our results point out, tool support for this activity was very poor. Many interviewees questioned the utility of available tools, which were perceived as too generic, especially when it came to ontologies developed for highly

specialized domains such as health care, or in projects relying on end-user contri-

butions. In addition, participants shared the view that process guidelines tailored

for such specialized cases are essential for the success of ontology engineering

projects. Existing methodologies are very generic when it comes to issues of

knowledge elicitation. They stress the need for close interaction between

domain experts and ontology engineers, but extensive studies on using techniques

such as concept maps, card sorting and laddering (Cooke 1994) are largely missing.

These particular techniques, complemented with detailed insights on the practices

established in the respective domains, could be very useful for designing specially

targeted methodologies and guidelines for ontology engineering.

Table 4.1 Correlation between ontology engineering aspects and effort

Ontology engineering aspect   Description                                                Correlation
DCPLX                         Complexity of the domain analysis                           0.496
CCPLX                         Complexity of the ontology conceptualization                0.237
ICPLX                         Complexity of the ontology implementation                   0.289
REUSE                         Percentage of integrated reused ontologies                  0.274
DOCU                          Complexity of the documentation task                        0.346
OE                            Ontology evaluation                                         0.362
OCAP/DECAP                    Capability of the ontologists/domain experts               -0.321
OEXP/DEEXP                    Expertise of the ontologists/domain experts                -0.192
PCON                          Personnel continuity                                       -0.134
LEXP/TEXP                     Level of experience with respect to languages and tools    -0.172
SITE                          Communication facilities in decentralized environments     -0.168

2 We use the cost-driver abbreviations defined in the ONTOCOM model (Popov et al. 2009). A positive correlation means that if the value of one factor rises, the other tends to rise as well; the opposite holds for negatively correlated factors. For instance, if OCAP decreases, the effort is expected to rise.
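The coefficients in Table 4.1 express how strongly each cost driver co-varies with project effort. The chapter does not state which coefficient was computed; the following minimal Python sketch assumes a Pearson product-moment coefficient, and the ratings and effort figures in it are hypothetical illustrations, not survey data.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson product-moment correlation between two equally long samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical projects: domain-analysis complexity rating (1 = very low, 5 = very high)
# and total effort in person-months. Illustrative numbers only.
dcplx_rating = [1, 2, 3, 4, 5]
effort_pm = [2.0, 3.5, 3.0, 6.0, 8.5]
# Hypothetical capability ratings of the ontologists on the same projects.
ocap_rating = [5, 4, 4, 2, 1]

print(round(pearson(dcplx_rating, effort_pm), 3))  # positive: effort rises with complexity
print(round(pearson(ocap_rating, effort_pm), 3))   # negative: effort falls as capability rises
```

With real survey data the ratings would come from the questionnaire scales used for each cost driver, and the sign of the result is read exactly as described in the footnote above.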


OE The quality of the implemented ontology remains a major concern among

ontology engineers. Nevertheless, the projects we surveyed seldom used any of the

existing ontology-evaluation methods and techniques, but relied on expert judge-

ment. In projects in which systematic ontology-evaluation practices were observed,

they immediately had a significant impact on the effort. More than 50% of the surveyed projects reported minor effort in formally testing the ontologies they developed. Another 48% reported moderate use of simple testing methods, which were mostly carried

out manually. Only three projects performed extensive testing using several

methods. The survey indicated a combination of manual testing and self-validation

by the engineering team as the preferred and common choice in most projects.

At present, ontology evaluation plays only a passive role in ontologies developed in less formal project settings, such as in academia. However, as ontology-evaluation

practices increase with the demand for quality assurance, the associated impact on

effort can be substantial.

DOCU Documentation proved to be a costly factor as well. The survey results

point out that most of the developers of highly specialized ontologies perceived

documentation as a resource-intensive activity. This was not necessarily true for

less complex ontologies, and in cases where the development process was less

formal.

CCPLX, ICPLX The ontology conceptualization, which is responsible for the

modeling of the application domain in terms of ontological primitives (concepts,

relations, axioms), and the ontology implementation where the conceptual model

is formalized in a knowledge-representation language, are positively correlated

factors. However, their impact on the total effort is not as high as that of the domain analysis or the ontology evaluation. This outcome suggests that these activities of the ontology engineering process are relatively well understood and supported by high-quality tools.

OCAP/DECAP, OEXP/DEEXP, LEXP/TEXP The impact of the personnel-

related aspects suggests that more training programs in the area of ontology

engineering, better collaboration support, and an improved, more fine-grained

documentation of the decisions taken during the ontology engineering process

may have positive effects.

SITE The data analysis produced counter-intuitive results for the SITE parameter, which accounts for the degree of distribution of the team and their communication and collaboration facilities. Here the analysis suggested that email communication lowered the effort needed to build ontologies, while frequent face-to-face meetings increased the effort significantly. One possible explanation is that face-to-face meetings brought out more divergent views on the ontology and resulted in more discussions, which naturally raise the costs of ontology development.

The slight dominance of factors such as DCPLX (domain analysis) and OE (ontology evaluation) indicates that any facilitation of these activities may result in major efficiency gains. Even though tools such as wikis may be helpful, especially in collaborative settings, they are still rarely used for the purpose of domain analysis. More generally, the results of the interviews indicated low tool support


for this task. This situation could be improved by applying methods such

as automated document-analysis, or ontology learning approaches, to support the

analysis of the domain, the assessment of the information sources available and the

knowledge-elicitation process. Extending existing methodologies with specific

empirically determined practices in place in particular vertical domains could

also have a positive effect. A similar conclusion can be drawn for ontology

evaluation. Despite the availability of automated approaches such as unit tests,

consistency checks or taxonomy-cleaning techniques, ontology evaluation still

lacks tools which are easy to use and comprehensible for most users.
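As an illustration of the kind of lightweight automated check mentioned above, the following sketch tests a taxonomy for cycles in its subclass-of relation, one of the structural errors that taxonomy-cleaning approaches look for. It is a hypothetical, minimal example using plain Python dictionaries rather than any particular ontology toolkit; the concept names are invented.

```python
def find_cycle(subclass_of):
    """Return one cycle in a subclass-of graph as a list of concepts, or None.

    subclass_of maps each concept to the list of its direct superclasses.
    A cycle (A below B below ... below A) collapses distinct concepts into
    one and usually signals a modeling error.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {}
    path = []

    def visit(concept):
        color[concept] = GREY
        path.append(concept)
        for parent in subclass_of.get(concept, []):
            state = color.get(parent, WHITE)
            if state == GREY:
                # The parent is already on the current path: close the cycle.
                return path[path.index(parent):] + [parent]
            if state == WHITE:
                cycle = visit(parent)
                if cycle:
                    return cycle
        path.pop()
        color[concept] = BLACK
        return None

    for concept in list(subclass_of):
        if color.get(concept, WHITE) == WHITE:
            cycle = visit(concept)
            if cycle:
                return cycle
    return None

# Hypothetical taxonomy fragments
clean = {"Invoice": ["Document"], "Contract": ["Document"], "Document": ["Thing"]}
broken = {"Invoice": ["Document"], "Document": ["Record"], "Record": ["Invoice"]}

print(find_cycle(clean))   # None
print(find_cycle(broken))  # ['Invoice', 'Document', 'Record', 'Invoice']
```

A check of this kind can run as a unit test after every edit; whether end users find its output comprehensible is exactly the usability gap noted in the text.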

Concluding the correlation analysis between ontology engineering aspects and

effort, we can state that process activities such as domain analysis, conceptuali-

zation, implementation and evaluation, as well as the level of reusability of

the ontology, and the documentation requirements have a well-distributed correla-

tion factor associated with the effort. This means that each of these activities

exhibits a relevant impact on the effort, while at the same time indicating that no

individual activity plays an overwhelmingly dominant role. As expected, the

quality of the ontology engineering team is crucial for the success of a project;

it would be interesting to investigate, however, the effect of such aspects in more

collaborative scenarios, which could become the norm in ontology engineering.

The data set on which this analysis is based does not cover highly decentralized scenarios of community-driven ontology engineering. More research is

needed to assess the state of the art in the area of ontology reuse and associated

activities such as ontology understanding, merging, and integration. In this respect

the survey is not representative and should be revisited once this engineering

approach gains more importance, for instance, as a consequence of the large-scale

development of a critical mass of ontologies in diverse vertical sectors. An addi-

tional, equally interesting scenario is related to the increased popularity of data-

driven, LOD-oriented approaches, which reference vocabularies already in use in

the LOD cloud. Here, reuse is not necessarily defined at the level of ontologies to be assessed, selected, modified and integrated, but in terms of data sets and the mappings between them. Once these new concepts are clearly understood, the costs associated with such knowledge engineering exercises, and the models which potentially apply to the new scenarios, require further investigation.

4.2.2 Correlation between Ontology Engineering Aspects

In addition to the impact on the total development effort we analyzed the correla-

tion between specific aspects of ontology engineering projects. The most important

findings are discussed in the following.

Personnel-related aspects (Table 4.2) were shown to be positively correlated.

This was obvious for those questions referring to the capability and experience of

the ontology-engineering team. In most cases the survey showed that the capability

of the participants was largely based on their project experience. Additionally, the


software support available to projects carried out by the same ontology-engineering

team tended to remain unchanged. When new tools were introduced, the learning period for experienced practitioners was much longer than for novice developers. Similar observations have been made in software engineering, where habits of software use have a significant influence on the acceptance and adoption of new software.

High correlation values were also measured between activities within the ontology engineering process (Table 4.3), in particular between ontology evaluation and ontology documentation. Data analysis showed that

these results mainly apply to large-scale ontology engineering projects. This is

possibly due to the fact that such ontology-development projects run more exten-

sive evaluation tests, which in turn might lead to additional documentation effort.

Domain analysis was most highly correlated with conceptualization and implementation. The majority of the interviewees did not perceive a clear distinction between the

conceptualization and the implementation activities. Conceptualization in most

cases was a lightweight description and classification of the expected outcomes.

In most of the projects surveyed there was no language- or tool-independent

representation of the ontology. Instead, the ontology was implemented with the

help of an ontology editor. In over 40% of the projects the development was

performed mainly by domain experts, who generally agreed that current ontology

editors are relatively easy to learn and use. This finding differs from the observations of previous surveys and comparative studies, and confirms once more that ontology engineering has reached an industry-strength level of maturity.

According to this survey, the main technological building blocks and development platforms for ontology engineering are now available from established vendors. Despite this promising position, little is still known about the process underlying the development of ontologies in practice. Cost-benefit information might lead to improvements along the lines we have argued

above. We have used the findings of the correlation analysis to identify scenarios in

which ontology engineering processes can be improved by paying proper attention

to cost-benefit information delivered through models developed in the ACTIVE

project.

Table 4.2 Correlation between personnel-related aspects

             OCAP/DECAP   OEXP/DEEXP
OCAP/DECAP   1            0.552
OEXP/DEEXP   0.552        1
LEXP/TEXP    0.489        0.584

Table 4.3 Correlation between process-related aspects

      DCPLX   DOCU
OE    0.211   0.389


4.3 Guidelines for Using Cost-Benefit Information in Ontology Engineering

Starting with the findings of the survey discussed in the previous section, we have

performed an analysis of the ontology life cycle with respect to the utility of cost-

benefit information for decision-making purposes.

In order to provide measurements of specific key indicators throughout the

ontology life cycle, clear goals for improvement have to be defined. This is crucial

not only for the selection of the relevant indicators, but also for the definition of

associated methods and tools, and for directing the acquired information to the right

recipient. This can be achieved, for instance, through the GQM (‘goal-question-

metric’) approach (Ebert and Dumke 2007). GQM proposes three steps to identify

suitable metrics: (1) defining the goals to be reached (e.g., business objectives, key performance indicators, improvement goals); (2) reviewing these goals by asking appropriate questions about how they can be reached (e.g., through improvement programs, change management, project management techniques); and (3) defining measurements to assess the current status (e.g., to review the work of employees and the development status of products or processes) (Ebert and Dumke 2007). To

identify how the availability of cost-benefit information could impact the operation

of ontology engineering projects, we will develop GQM templates for a series of

scenarios and activities, in which such information is expected to provide an

improved baseline for decision making.
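The three GQM steps form a simple hierarchy: each goal is refined into questions, and each question is answered by one or more metrics. The following Python sketch models that hierarchy with dataclasses; the goal, questions and metrics are hypothetical examples for an ontology engineering project, not templates taken from the chapter.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Metric:
    name: str
    unit: str

@dataclass
class Question:
    text: str
    metrics: List[Metric] = field(default_factory=list)

@dataclass
class Goal:
    statement: str
    questions: List[Question] = field(default_factory=list)

    def metrics_to_collect(self):
        """Distinct metric names needed to answer all questions, in first-seen order."""
        seen = []
        for question in self.questions:
            for metric in question.metrics:
                if metric.name not in seen:
                    seen.append(metric.name)
        return seen

# Hypothetical goal for an ontology engineering project
effort = Metric("development effort", "person-months")
size = Metric("ontology size", "thousands of entities")
defects = Metric("defects found in evaluation", "count")

goal = Goal(
    "Reduce the effort of developing domain ontologies",
    [
        Question("How much effort does each ontology currently require?", [effort, size]),
        Question("Does systematic evaluation pay off?", [effort, defects]),
    ],
)
print(goal.metrics_to_collect())
# ['development effort', 'ontology size', 'defects found in evaluation']
```

Deriving the list of metrics from the questions, rather than the other way round, is the point of the top-down GQM approach: only measurements that answer a question tied to a goal are collected.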

4.3.1 Using Cost-Benefit Information Throughout the Ontology Life Cycle

From an enterprise perspective the ontology life cycle – just as any other product

life cycle – consists of four phases: (1) business case and strategy development,

(2) project and concept definition, (3) development and market entry, and

(4) maintenance and evolution. In each of these phases cost-benefit information is

typically used to support decision making (Ebert et al. 2005):

Business case and strategy development Cost-benefit information might be

used to support a trade-off analysis according to (development) time, costs, content,

benefit, or return on investment (ROI).

Project and concept definition Here this information is typically used to make

effort estimates, to develop budget plans, and to support buy vs reuse decisions.

Development and market entry Cost-benefit information might be used to

support trade-off analysis and to determine the costs to completion. Additionally, it might be used to influence the development process, staffing and tool acquisition, and to monitor progress.


Maintenance and evolution In this phase one could assess the costs and benefits

of extensions in order to decide between repair and replacement, and to estimate

maintenance costs.
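Two of the quantities named above, ROI and the costs to completion, can be computed with very little machinery. The sketch below uses the common definition ROI = (benefit - cost) / cost and, for the costs to completion, one simple convention borrowed from earned-value analysis that assumes the cost performance observed so far will continue. All figures are hypothetical.

```python
def roi(benefit, cost):
    """Return on investment as a fraction: (benefit - cost) / cost."""
    if cost <= 0:
        raise ValueError("cost must be positive")
    return (benefit - cost) / cost

def cost_to_completion(spent, fraction_complete):
    """Remaining cost, assuming the cost performance observed so far continues.

    Estimate at completion = spent / fraction_complete (earned-value style),
    so the remaining cost is that estimate minus what has already been spent.
    """
    if not 0 < fraction_complete <= 1:
        raise ValueError("fraction_complete must be in (0, 1]")
    return spent / fraction_complete - spent

# Hypothetical figures in thousand euros
print(roi(benefit=150.0, cost=100.0))                          # 0.5
print(cost_to_completion(spent=40.0, fraction_complete=0.25))  # 120.0
```

Both functions only become informative once the benefit and cost inputs are backed by historical or model-based estimates such as those discussed in the following scenarios.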

We can distinguish between a number of recipients of this information. These include the senior management of the company, the project management team, and

the engineers. They use cost-benefit information as follows (Ebert and Dumke

2007):

Senior Management They use the information

– To obtain a reliable view of the business performance;

– For forecasts and indicators where action is needed;

– To drill down into underlying information and commitments; and

– For flexible resource refocus.

Project Management Project managers appreciate cost-benefit information

– For immediate project reviews;

– To check the status and forecast for quality, schedule, and budget;

– To identify follow-up action points; and

– To generate reports based on consistent raw data.

Engineers During the development phase, engineers rely on cost-benefit information

– To gain immediate access to team planning and progress insights;

– To get an overview of their own performance and how it can be improved; and

– To identify weak spots in deliverables.

4.3.2 Scenarios Supported by Cost-Benefit Information

This section defines scenarios in which cost-benefit information can help to support

decisions in knowledge engineering projects. The scenarios will be further detailed

in Sect. 4.3.3.

Supporting the Business Case Development Phase In this phase, cost-benefit

information will mostly be used by the senior management planning a new business

or the introduction of a new technology. The information has to be based on data

from historical projects in order to be able to derive estimations based on experi-

ences from previous endeavors. In some cases, experience reports or analytical

insights may prove useful as well (Bürger and Simperl 2008), but the ultimate goal

is to have an objective baseline to decide for or against the introduction of semantic

technologies at a general level.

Scenario 1.1 Introduction of an ontology-based application Cost-benefit

information can be used to support the decision whether or not an ontology-based

application should be introduced and/or developed inside a company.


Scenario 1.2 Development of certain features Here information about the costs

and benefits of ontology engineering can be used to assess the economic impli-

cations of specific features to be introduced in an application, or to define the scope

of an ontology to be developed.

Scenario 1.3 Expressivity/Type of knowledge structure Based on a set

of initial requirements, insights about the efforts and added value associated

with the development and maintenance of specific knowledge structures can be

used to decide upon the most appropriate type of knowledge structure, in particular

about the required expressivity, to support a certain business plan. The aim is to

perform a trade-off analysis between costs and benefits of the use of ontologies,

taxonomies, folksonomies, and other knowledge structures, using as input the

results delivered by methods such as ONTOCOM (Simperl et al. 2009b) or

FOLCOM (Simperl et al. 2010).
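The trade-off analysis in Scenario 1.3 can be pictured as a filter followed by a comparison: discard the knowledge structures that are not expressive enough for the requirements, then prefer the one with the best balance of estimated benefit and cost. The sketch below assumes that cost estimates (for instance from calibrated ONTOCOM or FOLCOM runs) and benefit estimates are already available; the expressivity ranking, the net-benefit criterion and all numbers are hypothetical simplifications.

```python
from dataclasses import dataclass

# Hypothetical ordinal scale of expressivity (higher = more expressive)
EXPRESSIVITY = {"folksonomy": 1, "taxonomy": 2, "ontology": 3}

@dataclass
class Option:
    structure: str
    estimated_cost: float     # e.g. derived from a calibrated cost model
    estimated_benefit: float  # e.g. expected savings over the planning horizon

    @property
    def net_benefit(self):
        return self.estimated_benefit - self.estimated_cost

def choose_structure(options, required_expressivity):
    """Pick the option with the highest net benefit that is expressive enough."""
    minimum = EXPRESSIVITY[required_expressivity]
    eligible = [o for o in options if EXPRESSIVITY[o.structure] >= minimum]
    if not eligible:
        return None
    return max(eligible, key=lambda o: o.net_benefit)

# Hypothetical estimates in thousand euros
options = [
    Option("folksonomy", estimated_cost=2.0, estimated_benefit=5.0),
    Option("taxonomy", estimated_cost=4.0, estimated_benefit=9.0),
    Option("ontology", estimated_cost=12.0, estimated_benefit=15.0),
]
print(choose_structure(options, "folksonomy").structure)  # taxonomy (net benefit 5.0)
print(choose_structure(options, "ontology").structure)    # ontology (only eligible option)
```

Net benefit is only one possible criterion; a ratio such as ROI, or a threshold on acceptable cost, could be substituted without changing the structure of the comparison.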

Supporting the Project Definition Phase In this phase, the information will be

used by the project management but also by the senior management team. Relevant

decisions include the overall planning of a project, but also rather specific issues

such as buy vs reuse vs in-house development. Decisions should be backed up with

detailed requirements in order to be able to provide accurate estimates.

Scenario 2.1 Effort planning Cost information can primarily be used to estimate

the effort needed to realize a project.
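ONTOCOM builds on the parametric approach of COCOMO, in which a size-based baseline effort is adjusted by multipliers for individual cost drivers. The sketch below assumes a COCOMO-style form PM = A × Size^α × ∏ CD_i; the constants A and α and all multiplier values are invented placeholders, not ONTOCOM's calibrated values, which should be taken from Popov et al. (2009).

```python
from math import prod

def estimate_effort(size_kilo_entities, cost_driver_multipliers, a=2.5, alpha=1.0):
    """COCOMO-style parametric estimate: PM = a * size**alpha * product(multipliers).

    size_kilo_entities: expected ontology size in thousands of entities.
    cost_driver_multipliers: one multiplier per cost driver (1.0 = nominal).
    a, alpha: hypothetical placeholders, NOT calibrated ONTOCOM values.
    """
    if size_kilo_entities <= 0:
        raise ValueError("size must be positive")
    return a * size_kilo_entities ** alpha * prod(cost_driver_multipliers.values())

# Hypothetical ratings: complex domain (raises effort), capable team (lowers effort),
# evaluation at nominal level.
drivers = {"DCPLX": 1.3, "OCAP": 0.8, "OE": 1.0}
print(round(estimate_effort(2.0, drivers), 2))  # 5.2 person-months
```

The direction of the hypothetical multipliers mirrors the signs in Table 4.1: a more complex domain increases the estimate, a more capable team decreases it.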

Scenario 2.2 Project planning Cost information can support project planning

and help to estimate the length of certain project phases.
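Once a total effort estimate exists, the length of individual phases can be approximated by distributing the effort according to phase shares observed in earlier projects and dividing by the team size. The sketch below is a deliberately simple, hypothetical illustration: the shares are invented, and it assumes the whole team works on one phase at a time with effort scaling linearly in team size.

```python
def phase_durations(total_effort_pm, phase_shares, team_size):
    """Split an effort estimate into phases and convert it into calendar months.

    phase_shares: fraction of total effort per phase, e.g. taken from historical
    projects (must sum to 1). Assumes the whole team works on each phase in turn
    and that effort scales linearly with team size, which is a simplification.
    """
    if abs(sum(phase_shares.values()) - 1.0) > 1e-9:
        raise ValueError("phase shares must sum to 1")
    if team_size <= 0:
        raise ValueError("team_size must be positive")
    return {phase: total_effort_pm * share / team_size
            for phase, share in phase_shares.items()}

# Hypothetical phase shares; not taken from the survey
shares = {"domain analysis": 0.4, "conceptualization": 0.2,
          "implementation": 0.2, "evaluation": 0.2}
for phase, months in phase_durations(10.0, shares, team_size=2).items():
    print(f"{phase}: {months:.1f} months")
```

In practice communication overhead means that doubling the team rarely halves the duration, so the linear assumption should be replaced by historical data where it exists.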

Scenario 2.3 Team building Both cost and benefit measurements can be used

to assess the performance of team members with certain skill levels, and, provided such measurements are available from previous projects, the information

can be used to support the initial assignment of staff to project teams.

Scenario 2.4 Tool acquisition Historical cost-benefit information typically

makes it possible to judge the efficiency of the team in different project phases based on the

tools used. This information can be used in the planning phase of new projects to

decide about tools to use or purchase.

Scenario 2.5 Develop vs buy vs reuse Based on methods such as ONTOCOM,

combined with in-depth knowledge of the existing ontologies that are relevant to the

project at hand, the project management decides whether to buy, reuse, or develop

an ontology from scratch.
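On the cost side, the develop vs buy vs reuse decision reduces to comparing the total cost of each sourcing option. The sketch below is hypothetical: each option is described by its cost components (licence fees, and estimated effort multiplied by an assumed rate per person-month), and the cheapest option is reported. It deliberately ignores differences in quality and coverage between candidate ontologies, which, as the scenario notes, require in-depth knowledge of the existing ontologies.

```python
def cheapest_option(options):
    """Return (name, total_cost) of the cheapest sourcing option.

    options maps an option name to a list of cost components in the same
    currency (e.g. licence fees plus estimated effort times a rate).
    """
    if not options:
        raise ValueError("no options given")
    totals = {name: sum(parts) for name, parts in options.items()}
    best = min(totals, key=totals.get)
    return best, totals[best]

RATE_PER_PM = 8.0  # hypothetical cost of one person-month, thousand euros

options = {
    # development effort estimated, e.g., with an ONTOCOM-style model
    "develop": [6.0 * RATE_PER_PM],
    # licence fee plus effort to adapt the purchased ontology
    "buy": [20.0, 1.5 * RATE_PER_PM],
    # no licence, but effort to understand, modify and integrate an existing ontology
    "reuse": [3.5 * RATE_PER_PM],
}
print(cheapest_option(options))  # ('reuse', 28.0)
```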

Supporting the Development Phase Cost-benefit information can be used

to influence decisions in iteratively organized processes, in which, for instance,

the development of ontologies undergoes certain revisions, which is common for

collaborative ontology engineering. During the planning of a new iteration, differ-

ent development options can be assessed against their economic feasibility in

terms of expected costs or expected benefits. In non-iterative projects, the same

information can be used to assess the performance of the development process and

to make necessary adjustments. These adjustments can be based on cost information, but also on benefit information, for instance indicating that certain features do not receive the necessary acceptance among a preliminary user base.


Scenario 3.1 Length of development cycles Cost-benefit information can be

used to adapt the length of revision cycles as a trade-off between the benefit of an

ontology which might not meet the requirements of all its stakeholders and the costs

of its development.

Scenario 3.2 Performance assessment Monitored cost-benefit information can

be used for performance assessment of the team, or of the status of certain features

during project iterations.

Scenario 3.3 Revision planning Information from past iterations can be used to

do just-in-time revision planning and to revise priorities, to re-assign team members

to tasks, or to discontinue the development of certain features.

Supporting the Maintenance Phase The scenarios in this phase reflect

needs arising at the beginning of the maintenance phase, but also during the use

phase of a developed ontology, or ontology-based application. Monitored benefit

information can be used to support extension planning, i.e., to judge what type of

benefit might be expected from an extension. Benefit information can also be used

to decide on the discontinuation of certain application features. Cost information

might most notably be used to decide about repairing or replacing broken features

by newly acquired or developed parts of an ontology or ontology-based application.

Scenario 4.1 Extension planning Information acquired in the use and main-

tenance phase of a project can support the planning of extensions into directions in

which more benefits are expected.

Scenario 4.2 Replace vs repair Information on development performance can

be used to judge whether to replace or repair a broken or insufficiently developed

feature.

Scenario 4.3 Maintenance planning Information acquired in previous projects

can help to estimate maintenance costs and to establish staffing plans for this phase.

Scenario 4.4 (Dis-)continuation of features Especially benefit information

acquired in the usage phase can provide valuable hints on the acceptance of certain

features.

We now turn to the presentation of guidelines supporting these scenarios.

4.3.3 Guidelines for the Realization of the Scenarios

The guidelines will be presented in the form of GQM templates for each of the 15

scenarios, together with an analysis of the cost models developed in the ACTIVE

project with respect to their suitability for the realization of the scenarios and

potential adjustments and extensions.

Scenario 1.1 Introduction of an Ontology-based Application As explained in

the previous section, the purpose of this scenario is to provide methods and tools to

support the introduction of ontology-based applications. This decision might best

be supported using benefit information on semantic technologies in the context of

a concrete business case or application.


In the following we provide a GQM template to grasp the problem at hand

(Table 4.4) and analyze the suitability of ACTIVE cost-benefit models to support

this scenario.

Two models are appropriate to support this scenario. First, the ONTOBEN

model, which, as illustrated in Bürger et al. (2010), can be used to calculate savings

based on the use of ontologies. Here the business case has to be analyzed, necessary

functionalities have to be identified and the benefit of ontology-based applications

to realize the use case at hand has to be highlighted. Finally, the model produces

a quantitative figure reflecting potential savings. Second, the ONTOLOGY-UIS

model can be used as a source of information on end-user satisfaction, which can be interpreted as an indicator of the efficiency of ontology-based applications (Bürger and Simperl 2008). To use this model, important features have to be identified

and their performance has to be assessed. In order to provide decision support for

this concrete scenario, benefit information from previous applications has to be

gathered and used to argue for or against features for similar applications in the

concrete business case based on analogy.

Scenario 1.2 Development of Certain Features The purpose of this scenario is to provide arguments for or against the development or introduction of certain

features in ontologies or ontology-based applications. The associated GQM tem-

plate is illustrated in Table 4.5.

Two models of the ACTIVE cost-benefit framework are appropriate to support

this scenario. First, SEMACOM, which assesses costs of semantic applications.

It can be used to produce estimates of the expected costs necessary for the

Table 4.4 GQM template for scenario 1.1

Object of study   A concrete business case for which technological possibilities are being discussed
Purpose           To judge whether or not the use of ontologies or ontology-based applications makes sense in this case
Focus             To identify and highlight potential business benefits of using ontologies or ontology-based technologies in this case. This might be potential savings or positive user acceptance values
Stakeholder       A business case is typically discussed on a senior management level
Context factors   Factors which influence this scenario are the requirements of the business case, available development options, as well as restrictions in the company


Table 4.5 GQM template for scenario 1.2

Object of study   A concrete application feature that shall be part of a potentially new or existing ontology-based product
Purpose           To judge whether or not to implement a certain feature
Focus             The focus is on the identification of a potential business or end-user benefit arising through this feature and on estimating the costs needed to develop such features
Stakeholder       Senior management
Context factors   Factors which influence this scenario are the requirements of the business case, information on end-user satisfaction with the current application and on the performance of similar features


realization of a semantic application. Provided that the historical data available for SEMACOM is sufficiently fine-grained, analogies could be drawn to assess (isolated) application features and to highlight their expected costs and benefits. This, however, requires historical project data, ideally originating from the company in which the estimates are to be made.

Furthermore the ONTOLOGY-UIS model can be used as an indicator for the effici-

ency of similar features implemented in other applications as already indicated in

the previous sections (B€urger and Simperl 2008). Again, this assumes that historical

project data is available.

Scenario 1.3 Expressivity and Type of Knowledge Structure The goal of this scenario, which is further detailed in the GQM template in Table 4.6, is to provide tools that are easy to use and understand for comparing existing knowledge structures in terms of their prospective costs, based on a number of predefined parameters such as size, distribution of the project team, or complexity of the domain.

A number of ACTIVE cost-benefit models can be used to estimate the costs of developing an ontology (ONTOCOM) or a folksonomy (FOLCOM) and thus to support this scenario. Support for this scenario, of course, requires calibrated cost models in order to deliver the quantitative data necessary to judge for or against a specific type of knowledge structure.

In addition, the ONTOLOGY-UIS model provides quantitative figures on the efficiency of applications based on the different types of knowledge structures used (Bürger and Simperl 2008), which can be used as arguments to influence the decision at hand. To apply ONTOLOGY-UIS reliably, historical data on the perceived benefits of ontology-based applications that make use of different types of knowledge structures has to be available.

Scenario 2.1 Effort Planning In the effort planning scenario, the project management team needs estimates of the effort required to realize a project. This scenario, which is detailed in the GQM template in Table 4.7, can be supported using a number of ACTIVE cost models.

All methods developed throughout ACTIVE that target the various types of knowledge structures are relevant for this scenario. In addition, SEMACOM can be applied to estimate the costs of developing a semantic application. To use the models for this purpose, historical project data for their calibration is assumed to be available. If this is the case, initial requirements for the ontology or ontology-based application can be used as input for the corresponding model to generate effort estimates.

Table 4.6 GQM template for scenario 1.3
Object of study: A concrete business case in which the use of knowledge structures has already been decided and for which the type of structure still needs to be discussed
Purpose: To judge which type of knowledge structure is more appropriate for the current business case
Focus: Influencing this decision based on cost-benefit information
Stakeholder: Senior management
Context factors: Past experiences in using specific types of knowledge structures

Scenario 2.2 Project Planning In the project planning scenario (Table 4.8), the project management team would like to generate estimates of the overall effort expected to be necessary for the development of an ontology or ontology-based application. The estimates should be split according to the phases of the development life cycle.

In order to realize this scenario, and to use the mentioned models for project planning, a number of prerequisites have to hold. First of all, ONTOCOM and SEMACOM would have to be adapted to the concrete methodologies followed in the enterprise (Paslaru Bontas and Tempich 2005). More importantly, historical project data, which is typically needed for the calibration of models, has to be collected at a finer granularity than in (Paslaru Bontas et al. 2006) and (Simperl et al. 2009b), which were concerned with effort estimates at the project level rather than at the level of individual life cycle phases. An alternative approach could be to distribute the overall estimates, calculated using the existing models, across the phases, as was done for the inclusion of ONTOCOM information in the gOntt tool (see Sect. 4.2).

Scenario 2.3 Team Building Here the aim is to support the assignment of staff to project development teams. To realize this scenario, ONTOCOM and SEMACOM can be used to generate estimates that can be adjusted based on different input factors such as team or tool experience. Based on these factors in concrete projects, optimal team member profiles can be determined that minimize the effort needed to develop an ontology or ontology-based application. As usual, calibrated models have to be available to realize this scenario. If team members are to be assigned to distinct phases, the calibration of the model should be adjusted accordingly, so that values of the personnel-related cost drivers can be collected at the level of the individual phases. So far, data collection has been at the level of the overall project, and no lower-level information is taken into account (Table 4.9).

Table 4.7 GQM template for scenario 2.1
Purpose: Estimate the effort required to develop the knowledge structure inside the project
Focus: The costs of the knowledge structures to be used
Context factors: Application requirements, project team, domain, development environment (tools, distributiveness), etc.

Table 4.8 GQM template for scenario 2.2
Object of study: A concrete project which is to be implemented
Purpose: Estimate the length (and effort required) of the development phases in the project plan
Focus: Splitting and distributing the overall effort across distinct project phases
Stakeholder: Project management
Context factors: Concrete development methodology and information on distinct phases. Furthermore, the application requirements of course influence the cost estimates, and thus the splitting as well

Scenario 2.4 Tool Acquisition In the project planning phase a number of decisions have to be taken which influence the project environment later on. This includes the acquisition of tools that support one or more development phases. The aim of Scenario 2.4 is to be able to argue for or against the acquisition of specific tools based on empirical cost-benefit information (Table 4.10).

Cost estimates generated with ONTOCOM are influenced by the prospective tool support, and thus allow reasoning about the impact of certain types of tools on the estimated development effort. In order to provide statements on the performance and usefulness of concrete tools, the model has to be fed with more detailed data. Otherwise it can only provide information on the influence of well- or poorly performing tools on the effort needed to conduct the whole development project; statements on specific tools are thus not possible. Besides gathering data on how well the development process was supported by specific tools, the names of these tools have to be collected as well. This information is not available in the data set described in (Simperl et al. 2009b).

Scenario 2.5 Develop vs Buy vs Reuse Devising the most adequate engineering strategy is based on both technical and economic matters. Scenario 2.5 (Table 4.11) covers the latter.

A model which has been tailored to support this type of scenario is ONTOCOM-R, an extension of the original ONTOCOM model towards ontology reuse (Simperl and Bürger 2010). The ONTOCOM-LITE and SEMACOM models, which account for the costs of taxonomies and semantic applications, respectively, would have to be extended in a similar direction, taking proper account of the characteristics of the reuse process. In order to reliably use ONTOCOM-R in this scenario, a number of issues would need to be tackled. First of all, considerably more data on the potentially reusable ontologies has to be available. This would result in a better calibration of the REUSE cost driver, which accounts for the additional effort required to build ontologies which should be reusable in contexts other than the one they have been created for, and of all other cost drivers capturing aspects related to ontology engineering by reuse. Furthermore, the reuse candidates have to be analyzed with respect to the need to translate or align parts of the reusable ontology. More information about the application of ONTOCOM-R is provided in (Simperl and Bürger 2010).

Table 4.9 GQM template for scenario 2.3
Purpose: Plan and optimize the allocation of team members to the project or to distinct phases
Focus: Identifying optimal team member profiles in order to realize the project in a given time frame
Context factors: Concrete development methodology and project requirements

Table 4.10 GQM template for scenario 2.4
Purpose: To argue for or against the acquisition of a certain tool
Focus: Requirements of the ontology to be developed with respect to tools
Context factors: Project requirements, previous experiences with tools

Scenario 3.1 Length of Development Cycles The purpose of Scenario 3.1 (Table 4.12) is to estimate the length of development cycles in an iterative project based on historical data. Such data could provide insights into the trade-offs between centralized, commonly agreed development, which represents a compromise view over the domain shared by all stakeholders, and localized, modified versions of the ontology, which have to be mediated through alignments.

If the project is run in an agile fashion, the scenario could be realized using the FOLCOM model, which is by design targeted at such environments. For the remaining models, a number of adjustments have to be made, triggered by the slightly different process model followed. In (Paslaru Bontas and Tempich 2005) we provide a detailed example of what such an adjustment could look like. Existing data and calibration results remain valid, as the individual iterations still match the core work breakdown structure underlying ONTOCOM.

Scenario 3.2 Performance Assessment The purpose of Scenario 3.2 (Table 4.13) is to assess the performance of team members at specific points in time during the development of an ontology or ontology-based application.

Both ONTOCOM and FOLCOM can be used for this purpose if data on the costs of development phases is gathered while the project is running. Within the FOLCOM model this requirement is fulfilled by design: the model is meant to estimate the effort needed for future iterations, and these estimates can then be compared to the performance information that was monitored. For ONTOCOM, one would have to carry out the adjustments discussed in the previous scenarios.

Scenario 3.3 Revision Planning In order to plan the next development iteration, cost-benefit information might be useful as a performance indicator. It can thus be considered a valuable source for planning forthcoming iterations in the development cycles of a project, which is the purpose of Scenario 3.3 (Table 4.14).

To be used in this scenario, ONTOCOM would have to be adapted to an (iterative and collaborative) methodology (Paslaru Bontas and Tempich 2005). Furthermore, information on the reusability and on the integration effort needed to consolidate revisions is required. FOLCOM, on the other hand, is by design usable for this purpose, as discussed earlier.

Table 4.11 GQM template for scenario 2.5
Object of study: A concrete (part of a) project which is to be implemented, e.g., an ontology module
Purpose: To assess whether to develop an ontology or a module thereof from scratch or to reuse existing components
Focus: The make vs. reuse decision for a concrete part of the ontology
Context factors: Application requirements, definition of the ontology module, profile of potentially reusable modules or ontologies

Scenario 4.1 Extension Planning The purpose of Scenario 4.1 (Table 4.15) is to assess, using cost-benefit information, the feasibility of integrating new features into the product (ontology or ontology-based application).

To realize this scenario, all models of the ONTOCOM framework, namely ONTOCOM, ONTOCOM-LITE and SEMACOM, can be used to estimate the costs of a planned extension, provided the planned extension is treated as a new development that reuses an existing ontology. ONTOLOGY-UIS can be used to assess the current performance and, based on that, can support decisions on whether new features are required or would be beneficial for an application. In addition, it might provide cues on missing parts of the ontology, if configured accordingly. In order to use the model, an appropriate survey infrastructure, including questionnaires and statistical techniques to analyze the results, has to be set up.

Scenario 4.2 Replace vs Repair In this scenario project managers need to decide whether to replace or repair (parts of) an ontology or ontology-based application based on cost-benefit information (Table 4.16). This information is delivered by the ONTOCOM-R model, which would ideally be calibrated with enterprise-specific data on concrete ontology reuse practices in order to provide optimal results (Simperl and Bürger 2010).

Table 4.12 GQM template for scenario 3.1
Object of study: A currently running development project
Purpose: To estimate the length of development cycles based on historical data (from the same project)
Focus: An estimation of future project cycles at the beginning or in the middle of a running project
Stakeholder: Project management/developers
Context factors: Application requirements, team performance indicators, completed project iterations

Table 4.13 GQM template for scenario 3.2
Purpose: To assess the performance of team members
Focus: Past and current development phases
Stakeholder: Project management/developers
Context factors: Other projects currently running in the company might (negatively or positively) influence the performance of team members. This aspect is typically not assessable


Scenario 4.3 Maintenance Planning Each of our models could be used to support this scenario if extended with cost drivers specific to the maintenance activity. At the conceptual level this extension can be implemented in a straightforward manner, by introducing an additional cost driver accounting for the additional effort, but the challenge lies in acquiring the empirical data which would be necessary to re-calibrate the model (Bürger et al. 2010). The GQM template for this scenario is depicted in Table 4.17.
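Under the assumption that ONTOCOM keeps the multiplicative, COCOMO-style form on which it is based, the extension described above amounts to one additional factor in the estimation formula. In the sketch below, CD_MAINT is a hypothetical name for the new maintenance-related cost driver; its rating values would have to come from the re-calibration with maintenance data:

```latex
\mathit{PM} = A \cdot \mathit{Size}^{\alpha} \cdot \prod_{i=1}^{n} \mathit{CD}_i \cdot \mathit{CD}_{\mathrm{MAINT}}
```

Setting CD_MAINT = 1 for projects without a maintenance component recovers the original, unextended formula.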

Scenario 4.4 (Dis-)continuation of Features During the use of a product, it might turn out that a certain feature does not receive the necessary user support, or is not functioning as desired. The purpose of Scenario 4.4 is to assess the performance of such features during or after their development based on cost-benefit information (Table 4.18).

The ONTOLOGY-UIS model assesses the performance of ontology-based applications in order to isolate the benefits that are generated by ontologies or by particular features of an application. Provided that the performance of specific features can be monitored in isolation, ONTOLOGY-UIS can be applied to decide about the (dis-)continuation of features. In order to do so, one would have to design the questionnaires necessary for assessing the performance of the application using the ontology and its features.



Table 4.14 GQM template for scenario 3.3
Purpose: To plan the next iteration in the development cycles of a project
Focus: Estimating the (optimal) length of revision cycles
Stakeholder: Project management, development team
Context factors: Team structure and distribution, initial size of the ontology, etc.

Table 4.15 GQM template for scenario 4.1
Object of study: A completed product (ontology)
Purpose: To assess the feasibility of integrating new features into the product (ontology or ontology-based application)
Focus: New features to be integrated
Stakeholder: Senior management, project management
Context factors: Performance of the application, available budget

Table 4.16 GQM template for scenario 4.2
Object of study: A previously developed ontology or ontology-based application
Purpose: To decide whether to replace or to repair (parts of) an ontology or ontology-based application
Focus: A broken (part of an) ontology or ontology-based application
Context factors: Existing budget, available ontologies, development team factors, etc.


4.4 Methods and Tools for Cost/Benefit Driven Collaborative Knowledge Creation

In this section we introduce a number of tools to support the realization of the scenarios presented in Sect. 4.3. The tools target all stakeholders in the life cycle of collaboratively developed ontologies and ontology-based applications.

4.4.1 Cost Estimation of Ontology Development

We have developed a tool which can be used to run the ONTOCOM estimation process and to calibrate the model using (company-internal) data. The tool provides an easy-to-use user interface for running calibrations and predictions (Fig. 4.1). The calibration can utilize the existing data set of 148 data points, self-owned data, or a combination of both. Furthermore, users are able to customize the model, selecting the cost drivers to be taken into account for the calibration or adding new cost drivers. The latter is relevant, for instance, in scenarios explicitly targeting iterative engineering.
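In a parametric model of this kind, calibration means fitting the model's constants to historical project data. The sketch below is not the ACTIVE tool's implementation. It assumes the COCOMO-style form PM = A · Size^α · ∏ CD_i on which ONTOCOM is based, and it treats the product of each project's recorded cost-driver multipliers (called `eaf` here, a hypothetical name) as known. Under those assumptions it estimates only A and α, using ordinary least squares on the log-transformed equation. The data points are invented for illustration.

```python
import math


def calibrate(data):
    """Fit A and alpha in PM = A * Size**alpha * EAF by least squares on logs.

    data: list of (size, eaf, effort_pm) tuples. eaf is the product of the
    cost-driver multipliers recorded for that project (assumed known here).
    """
    # ln(PM / EAF) = ln(A) + alpha * ln(Size) is a straight line in ln(Size).
    xs = [math.log(size) for size, _, _ in data]
    ys = [math.log(pm / eaf) for _, eaf, pm in data]
    n = len(data)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    alpha = sxy / sxx
    a = math.exp(mean_y - alpha * mean_x)
    return a, alpha


# Hypothetical history: (size in thousands of ontology entities, EAF, person-months)
history = [(1.0, 1.0, 2.0), (2.0, 1.2, 4.8), (4.0, 0.9, 7.2)]
a, alpha = calibrate(history)
print(f"A = {a:.2f}, alpha = {alpha:.2f}")  # A = 2.00, alpha = 1.00
```

The fit needs at least two distinct sizes; otherwise `sxx` is zero and α is undefined. A full calibration would also re-estimate the cost-driver values themselves rather than take them as given.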

Once the model is calibrated, project managers and engineers can use it to calculate effort estimates. To do so, they have to indicate the rating levels of the cost drivers which best fit the project circumstances and provide an estimate of the size of the ontology (Fig. 4.2). The rating levels are associated with numerical, calibrated values, which serve as input to the ONTOCOM formula. The result is expressed in person-months.
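Once the model is calibrated, an estimate is a single evaluation of the formula. The sketch below again assumes the multiplicative form PM = A · Size^α · ∏ CD_i. The two cost drivers, their rating levels and multipliers, and the defaults A = 2.0 and α = 1.0 are hypothetical placeholders, not ONTOCOM's calibrated values.

```python
import math

# Hypothetical cost drivers: rating level -> calibrated multiplier (illustrative only).
COST_DRIVERS = {
    "domain_complexity": {"low": 0.8, "nominal": 1.0, "high": 1.3},
    "team_experience": {"low": 1.2, "nominal": 1.0, "high": 0.85},
}


def estimate_effort(size, ratings, a=2.0, alpha=1.0):
    """Return the estimated effort in person-months.

    size: ontology size (assumed here to be in thousands of entities).
    ratings: mapping from cost-driver name to the chosen rating level.
    """
    multiplier = math.prod(COST_DRIVERS[name][level] for name, level in ratings.items())
    return a * size ** alpha * multiplier


effort = estimate_effort(3.0, {"domain_complexity": "high", "team_experience": "high"})
print(f"{effort:.2f} person-months")  # 6.63 person-months
```

You can change the rating of one driver, keep the others fixed, and re-run the estimate. The difference isolates that driver's effect on the estimate, which is the kind of what-if comparison that scenarios 2.3 and 2.4 rely on.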

The tool can support scenarios 1.3, 2.1, 2.3, 2.4, and 4.1.

4.4.2 Planning of Ontology Development

gOntt is a project planning tool for ontology-development projects inspired by software-engineering tools such as Microsoft Project (Gomez-Perez et al. 2009) (http://www.neon-toolkit.org/wiki/Gontt). We are extending it with cost information in order to facilitate the realization of scenario 2.2, using the results of ONTOCOM for planning and scheduling purposes.

Table 4.17 GQM template for scenario 4.3
Object of study: Currently developed ontology or planned ontology
Purpose: To estimate the costs incurred in the maintenance phase
Focus: Costs occurring while the ontology/ontology-based application is in use
Context factors: Underlying development requirements, maintenance team

Table 4.18 GQM template for scenario 4.4
Object of study: Ontology-based application
Purpose: To assess the performance of certain features during or after their development
Focus: Particular application features which are based on an ontology
Context factors: User perception, influencing factors of the application

Fig. 4.1 Main user interface of the ONTOCOM tool

In order to integrate the two approaches in a sound manner, one has to take into account several key aspects of the parametric approach adopted by ONTOCOM (and by related models in other areas) to estimate development efforts. First, such models require a critical mass of empirical data for calibration purposes. This inherently limits the number of activities in the NeOn methodology (Suarez-Figueroa et al. 2007), on which gOntt relies, for which accurate estimates can be calculated. Second, refining the cost model to deliver estimates at a finer level of granularity would require the definition of various sub-models, each covering a specific, potentially complex ontology-engineering activity; examples include a model estimating the costs of ontology reuse, a model explicitly targeting the costs of ontology evaluation, and many more. All these models, provided that historical project data is available, could then be applied to produce estimates for specific phases of the life cycle model followed. However, they would be rather focused and restricted in application, and, for certain activities (such as domain analysis), would very likely require deeper insights into the practices actually in place at the enterprise using the corresponding model. Therefore, we opted for a slightly different strategy, which overcomes these issues. In a nutshell, ONTOCOM (in its altered version resulting from the alignment between the NeOn methodology and its current cost drivers) can be used to predict the total cost of an ontology-engineering project. The project manager then devises a distribution function that breaks this estimate down to the individual phases of the project, based on her expertise and on existing insights from case studies in the ontology-engineering literature. The simplest form of this distribution function is based on percentages, which are then used to calculate cost estimates per phase or even per activity. A mock-up is depicted in Fig. 4.3 below.
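The percentage-based distribution function described above is simple enough to state directly. The sketch below splits a total estimate across phases. The phase names and percentages are hypothetical examples of what a project manager might choose; neither gOntt nor the NeOn methodology prescribes them.

```python
def distribute_estimate(total_pm, percentages):
    """Split a total effort estimate (person-months) across phases by percentage.

    percentages: mapping from phase name to its share of the total, in percent.
    """
    total_pct = sum(percentages.values())
    if abs(total_pct - 100.0) > 1e-9:
        raise ValueError(f"percentages must sum to 100, got {total_pct}")
    return {phase: total_pm * pct / 100.0 for phase, pct in percentages.items()}


# Hypothetical distribution chosen by a project manager for a 20 PM estimate.
plan = distribute_estimate(20.0, {
    "specification": 10,
    "conceptualization": 40,
    "implementation": 30,
    "evaluation": 20,
})
for phase, pm in plan.items():
    print(f"{phase}: {pm:.1f} PM")
```

Splitting a phase's share further into per-activity percentages, with the same function applied to that phase's total, gives the per-activity estimates mentioned above.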

When aligning the ONTOCOM cost drivers to the NeOn methodology, we identified a series of issues related to the definition of the former, which will result in a new release of the cost model:

– The need for a more detailed account of project management costs, covering NeOn activities such as feasibility study, scheduling and configuration management.
– The need for ontology location support as part of our ontology reuse cost driver group. This will be implemented as a new cost driver as well.
– Discrepancies in the level of granularity of certain cost drivers, in particular ontology modification and ontology translation. These should be merged in the next version of the model.
– Methodologically sub-optimal support for knowledge reuse, which includes all aspects related to leveraging non-ontological resources in an ontology-engineering project.

Fig. 4.2 Cost prediction user interface

Fig. 4.3 gOntt extension for ONTOCOM


4.4.3 Decision Support for Choosing Appropriate Knowledge Structures

Another envisioned tool combines ONTOCOM, ONTOCOM-LITE and FOLCOM into one application so as to support senior and project management in the task of choosing an appropriate knowledge structure based on the estimated size and prospective costs (see Scenario 1.3).

A sketch of the knowledge structure comparison tool is presented in Fig. 4.4. Essentially, the tool displays how the costs of different knowledge structures develop with their size. This graphical view should be generated from the data sets of the corresponding cost-benefit models, making knowledge structures comparable in terms of the costs associated with their development, as a function of their size. In addition, a filtering mechanism could be offered, allowing potential users to select whether the knowledge structures in the data set are based on reuse or developed collaboratively from scratch, and to see how this influences the costs, as outlined in Sect. 4.2.
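The comparison view amounts to evaluating each model's size-to-effort function over a common range of sizes. In the sketch below, the three functions and their constants are hypothetical stand-ins for the calibrated ONTOCOM, ONTOCOM-LITE and FOLCOM models. The functional forms are assumptions made for illustration, not the models' actual formulas.

```python
from typing import Callable, Dict, List

# Hypothetical stand-ins for calibrated models: each maps a size (thousands of
# elements) to an effort estimate in person-months. Forms and constants are assumed.
ESTIMATORS: Dict[str, Callable[[float], float]] = {
    "ontology": lambda size: 2.0 * size ** 1.1,
    "taxonomy": lambda size: 1.0 * size ** 1.0,
    "folksonomy": lambda size: 0.4 * size,
}


def compare(sizes: List[float]) -> Dict[str, List[float]]:
    """Tabulate the estimated effort of each knowledge-structure type per size."""
    return {name: [estimate(s) for s in sizes] for name, estimate in ESTIMATORS.items()}


def cheapest(size: float) -> str:
    """Return the knowledge-structure type with the lowest estimated effort."""
    return min(ESTIMATORS, key=lambda name: ESTIMATORS[name](size))


table = compare([1, 5, 10])
for name, efforts in table.items():
    print(name, [round(e, 1) for e in efforts])
print("cheapest at size 5:", cheapest(5))
```

The lowest estimated effort does not settle the choice by itself, because the structures differ in expressivity. The output is one input to the decision in Scenario 1.3, not a substitute for it.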

4.4.4 Budgeting Ontology-Development Projects

In order to assist management in estimating the costs required for the development of a knowledge structure, and in choosing the most suitable staff and support tools, we envision a budgeting application which displays the results of, for instance, ONTOCOM in different phases of the ontology-development project.

Fig. 4.4 Knowledge structure comparison tool

The budgeting tool could be adjusted along various parameters. The idea is that potential users provide the parameters of their planned development projects, and the tool then generates estimates of the expected costs in different development phases. As a minimal input, users should indicate the type of knowledge structure and its estimated size. Users should also be able to indicate whether a planned knowledge structure is to be developed based on reuse or developed collaboratively. Furthermore, the tool is supposed to support balancing the prospective efforts by adjusting several parameters, including team or tool parameters. A possible screenshot of such a tool is depicted in Fig. 4.5. The tool should also indicate what an optimal profile of a team member in each phase should look like and which tools are suggested for the implementation of the planned knowledge structure. This functionality, supporting scenarios 2.1–2.5, could be integrated into gOntt.
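One way to picture the balancing of team and tool parameters is as a search over rating combinations. This sketch illustrates the idea and does not describe the envisioned tool. It uses two hypothetical drivers, and each rating level carries an effort multiplier and a cost per person-month. The code enumerates every combination and picks the one with the lowest total budget. All names and numbers are assumptions.

```python
import itertools
import math

# Hypothetical drivers: rating level -> (effort multiplier, cost per person-month in EUR).
DRIVERS = {
    "team_capability": {"low": (1.3, 5000), "nominal": (1.0, 7000), "high": (0.75, 9000)},
    "tool_support": {"low": (1.2, 0), "nominal": (1.0, 300), "high": (0.9, 800)},
}


def nominal_effort(size, a=2.0, alpha=1.0):
    """Unadjusted effort in person-months (hypothetical A and alpha)."""
    return a * size ** alpha


def best_profile(size):
    """Return (budget, effort, ratings) for the cheapest rating combination."""
    names = list(DRIVERS)
    best = None
    for levels in itertools.product(*(DRIVERS[name] for name in names)):
        ratings = dict(zip(names, levels))
        multiplier = math.prod(DRIVERS[n][lvl][0] for n, lvl in ratings.items())
        cost_per_pm = sum(DRIVERS[n][lvl][1] for n, lvl in ratings.items())
        effort = nominal_effort(size) * multiplier
        budget = effort * cost_per_pm
        if best is None or budget < best[0]:
            best = (budget, effort, ratings)
    return best


budget, effort, ratings = best_profile(4.0)
print(ratings, f"{effort:.1f} PM", f"EUR {budget:,.0f}")
```

The search is exhaustive, so its cost grows with the product of the number of levels per driver. That is acceptable for a handful of drivers but would call for a smarter search as the model grows.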

4.4.5 Decision Support for Ontology Reuse

As previously motivated, the reuse of ontologies has long been a much-discussed problem in the ontology-engineering community, and poses technical as well as organizational challenges. A core decision to be taken in the planning phase of many ontology-related projects is whether to develop an ontology from scratch or to reuse existing ontologies. This decision is often taken on economic grounds, considering the cost savings achieved by reuse and/or the prospective benefits of the two strategies.

A tool to support such decisions could look like the one displayed in Fig. 4.6. At the bottom of the interface the user is supposed to enter the estimated size of the ontology to be developed (manually or via reuse). Furthermore, she has to indicate a breakdown of the total size into translated, modified, or aligned ontology elements, as these operations are associated with different amounts of effort. ONTOCOM-R assesses the benefit of a reuse-driven strategy based on these inputs, which is relevant for scenarios 2.5 and 4.2.
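The role of the size breakdown can be illustrated by converting reused elements into an equivalent amount of new development. The sketch below is not the ONTOCOM-R formula. The weights that express translation, modification and alignment as fractions of from-scratch effort are hypothetical assumptions, and so are A and α.

```python
def equivalent_size(new, translated, modified, aligned,
                    w_translate=0.3, w_modify=0.5, w_align=0.2):
    """Weight reused elements by the relative effort of the operation applied to them.

    Sizes are in thousands of ontology elements; the weights are hypothetical
    fractions of the effort needed to build the same elements from scratch.
    """
    return new + w_translate * translated + w_modify * modified + w_align * aligned


def effort(size, a=2.0, alpha=1.0):
    """Effort in person-months for a given (equivalent) size; hypothetical A, alpha."""
    return a * size ** alpha


total = 10.0  # target ontology size: 4 new + 2 translated + 3 modified + 1 aligned
scratch = effort(total)
reuse = effort(equivalent_size(new=4.0, translated=2.0, modified=3.0, aligned=1.0))
print(f"from scratch: {scratch:.1f} PM, reuse-driven: {reuse:.1f} PM")
```

A real comparison would also account for the effort of locating and evaluating reuse candidates, which this sketch ignores.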

Fig. 4.5 Budgeting tool


4.4.6 Tagging Efficiency Monitoring

FOLCOM is a model to estimate tagging costs based on the principles of story points (Simperl et al. 2010). Information on the performance of the development team is of interest to project management, which can then adjust the team if the performance is not satisfactory. FOLCOM produces time estimates in an iterative way; in other words, it calculates an estimate for the next project iteration given data gathered from previous iterations.

A screenshot of the tool monitoring the performance of a team based on FOLCOM is shown in Fig. 4.7. The tool targets scenario 3.2. It is Web-based and measures the time users need to tag selected resources. Efficiency measurements are also required to calibrate the model, as by design the estimation model calculates the effort associated with the remainder of a project based on the actual effort spent so far.
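The iterative estimation principle can be sketched with a velocity in the style of story points. For illustration only, the sketch assumes that each resource in the collection is assigned a number of points reflecting its tagging complexity. It also assumes that velocity is measured as points tagged per hour in the completed iterations. FOLCOM's actual sizing scheme and parameters are described in (Simperl et al. 2010).

```python
def velocity(iterations):
    """Average story points tagged per hour over the completed iterations.

    iterations: list of (points_tagged, hours_spent) tuples.
    """
    points = sum(p for p, _ in iterations)
    hours = sum(h for _, h in iterations)
    return points / hours


def remaining_hours(collection_points, iterations):
    """Estimate the hours still needed to tag the rest of the collection."""
    done = sum(p for p, _ in iterations)
    remaining = sum(collection_points) - done
    return remaining / velocity(iterations)


# Hypothetical points per resource (e.g. 1 = short text, 3 = long PDF, 5 = video).
collection = [1, 3, 3, 5, 1, 1, 3, 5, 5, 3]  # 30 points in total
completed = [(6, 2.0), (10, 2.0)]            # (points tagged, hours) per iteration
print(remaining_hours(collection, completed))  # 3.5
```

After each further iteration, appending its measurements to `completed` updates the velocity. The estimate for the remainder of the project then changes to match, which mirrors the iterative principle described above.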

In a nutshell, the tool shows an excerpt of an information resource (PDF documents, but also images and videos) from a specific collection and asks users to tag it.3 Depending on the selected tagging mode, it is possible to add free tags, tags from a controlled vocabulary, and tags recommended by the tool according to some heuristics. Deletion of tags is also supported.

Fig. 4.6 ONTOCOM-R reuse decision tool

3 FOLCOM calculates the time required to tag the entire collection.


4.4.7 Performance and User Satisfaction Assessment

Assessing the performance of an application, or of one of its features, can provide valuable hints on the (dis-)continuation of the development or the introduction of certain features. As illustrated in Sect. 4.3, the ONTOLOGY-UIS model can be used to assess user satisfaction with the features of an application and thereby make a statement about the performance of the application. The outcome of the method can be visualized in a so-called snake diagram, which shows the gaps between the users' expectations and the actual performance of the application. This is an indicator of the efficiency of certain application features (Bürger and Simperl 2008). A mock-up interface of the tool is depicted in Fig. 4.8. It is relevant for scenarios 1.1, 1.2, 4.1, and 4.4.
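The gap computation behind the snake diagram can be sketched as follows. The feature names, the 1-7 answer scale and the simple mean-difference gap are assumptions made for illustration. The questionnaire design and analysis actually used by ONTOLOGY-UIS are described in (Bürger and Simperl 2008).

```python
from statistics import mean

# Hypothetical questionnaire answers on a 1-7 scale, per application feature:
# how much users expect from the feature and how well they perceive it performs.
responses = {
    "semantic search": {"expectation": [6, 7, 5], "performance": [4, 5, 3]},
    "tag suggestions": {"expectation": [4, 5, 3], "performance": [5, 6, 4]},
}


def gaps(responses):
    """Return mean performance minus mean expectation per feature.

    A negative gap means the feature falls short of what users expect.
    """
    return {
        feature: mean(r["performance"]) - mean(r["expectation"])
        for feature, r in responses.items()
    }


for feature, gap in sorted(gaps(responses).items(), key=lambda item: item[1]):
    print(f"{feature}: {gap:+.2f}")
```

Plotting the two means per feature, one line for expectations and one for performance, gives the snake-shaped profile. The features where the lines diverge most are the first candidates for rework or discontinuation.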

4.4.8 Organizational Impact Evaluation

The Organizational Impact Evaluation Tool (OIET) addresses the impact of semantic applications at an organizational level. Deployed in Pillar 5 of the EVEKS framework, it analyzes the corporate structure based on user ratings (Imtiaz et al. 2009; Imtiaz et al. 2008). The output of the OIET can be used as an input for cost-benefit models, in particular in scenarios 1.1 and 1.2.

Fig. 4.7 Tagging efficiency monitor

The OIET is meant to be applied as follows (Fig. 4.9). First, higher management sets the organizational parameters of measurement. Afterwards, questionnaires are generated automatically based on the first step, refining the scope of the investigation. These questionnaires have to be answered by employees of the company. The output of this activity is a cross-match of variables and questions for every employee. Based on that, an overall matrix can be aggregated showing the company-wide perceived structure and satisfaction with a particular information system. This leads to an assessment of the three key parameters collaboration, knowledge and technology with respect to their impact on variables such as dumb nodes,4 active nodes,5 and external nodes (Imtiaz et al. 2009; Imtiaz et al. 2008).6 Based on these findings, information systems can be evaluated in terms of their company-wide benefit.
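The aggregation step can be pictured as averaging the individual responses into one matrix. In the sketch below, three choices are assumptions and are not taken from the OIET specification: the rating scale, the assignment of each employee to exactly one node type, and the use of a plain mean.

```python
from statistics import mean

PARAMETERS = ["collaboration", "knowledge", "technology"]

# Hypothetical per-employee results: node type plus a 1-5 rating per key parameter.
answers = [
    {"node": "dumb", "ratings": {"collaboration": 2, "knowledge": 3, "technology": 4}},
    {"node": "active", "ratings": {"collaboration": 5, "knowledge": 4, "technology": 3}},
    {"node": "active", "ratings": {"collaboration": 4, "knowledge": 4, "technology": 5}},
    {"node": "external", "ratings": {"collaboration": 3, "knowledge": 5, "technology": 2}},
]


def aggregate(answers):
    """Average ratings per (node type, parameter) into a company-wide matrix."""
    matrix = {}
    for node in sorted({a["node"] for a in answers}):
        group = [a["ratings"] for a in answers if a["node"] == node]
        matrix[node] = {p: mean(r[p] for r in group) for p in PARAMETERS}
    return matrix


for node, row in aggregate(answers).items():
    print(node, row)
```

Each row of the resulting matrix summarizes how one group of employees perceives the three key parameters. Comparing rows is one simple way to spot where an information system serves some roles better than others.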

Fig. 4.8 Performance and UIS meter (based on Remenyi et al. 2001)

4 Dumb nodes are non-specialists who push information/data. These roles may be incorporated into enterprise information systems.
5 Active nodes are specialists who define the process dynamics and can only have an interface to enterprise knowledge portals.
6 External nodes are specialists at a sub-process level. They should therefore be considered when designing enterprise information systems.


Thus, the OIET captures the overall impact at the organizational level in terms of different variables, which can explain the ultimate benefits that organizations gain from adopting such technologies. Bürger et al. (2010) explain in detail the steps which have to be performed in order to apply the tool. At a later stage, its output can help the company to find the proper balance for the parameters in cost-benefit models.

Fig. 4.9 Organizational impact evaluation

In Table 4.19 we summarize which scenarios (as presented in Sect. 4.3) are supported by the different models and tools.

Table 4.19 Models and tools supporting the different scenarios
Scenarios | Models | Tools
1.1, 1.2 | – | OIET
1.1, 1.2, 4.1, 4.4 | ONTOLOGY-UIS | Performance and UIS meter
1.3 | ONTOCOM, ONTOCOM-LITE, FOLCOM | Knowledge structure comparison tool
1.3, 2.1, 2.3, 2.4, 4.1 | ONTOCOM | ONTOCOM Tool/Cost prediction user interface
2.1, 2.2, 2.3, 2.4, 2.5 | ONTOCOM | Budgeting tool, gOntt Extension for ONTOCOM
2.2 | ONTOCOM | gOntt Extension for ONTOCOM
2.5, 4.2 | ONTOCOM-R | ONTOCOM-R Reuse decision tool
3.1 | FOLCOM | –
3.2 | FOLCOM | Tagging efficiency monitor
3.3 | FOLCOM, altered version of ONTOCOM | –
4.3 | Any | –

4.5 Conclusions and Outlook

Industry is starting to acknowledge the technical value of ontologies for enterprises. In recent years early adopters have increasingly been using them in various application settings, ranging from content management to enterprise application integration. The main technological building blocks are now available from established vendors. Despite this promising position, the economic aspects of their development, maintenance and usage are still not an integral part of ontology-engineering projects. In this chapter we analyzed how management and technical staff can use cost-benefit information delivered by models such as ONTOCOM, FOLCOM and ONTOLOGY-UIS at various stages of the ontology life cycle. Some of the scenarios we identified demand additional empirical data. Some require more data by volume, in order to cover cost drivers which the data sets currently used for calibration have neglected. Others require finer-grained data, to allow predictions at the phase rather than the project level. Most notably, the former requires reliable data on ontology maintenance projects, and on the reuse of ontologies or other similar knowledge structures. Such data could not be collected in the past, mainly as a consequence of the evolution of the Semantic Web area. The rise of data-driven approaches in the context of the Linked Open Data (LOD) initiative may quickly change this state of affairs, even if it may also require a re-definition of the existing notion of reuse and of the methodologies therefor. The need for adjustments to provide better coverage of the reuse topic was also acknowledged when aligning ONTOCOM with the NeOn methodology, albeit only at the conceptual level. Realistically, the question of data availability seems tractable only in close relation with developments around the LOD cloud.

Acknowledgements The research leading to this paper was partially supported by the European

Commission under the contract FP7-215040 “ACTIVE”.

References

Berners-Lee T, Hendler J, Lassila O (2001) The semantic web. Sci Am 284(5):34–43

Bürger T, Simperl E (2008) Measuring the benefits of ontologies. In: OTM '08: proceedings of the OTM confederated international workshops and posters on On the move to meaningful internet systems, pp 584–594. Springer

Bürger T, Popov I, Simperl E, Hofer C, Imtiaz A, Krenge J (2010) Calibrated predictive model for costs and benefits. Deliverable D4.1.2, ACTIVE

Cooke N (1994) Varieties of knowledge elicitation techniques. Int J Hum Comput Stud 41:801–849

Davies J, Fensel D, van Harmelen F (eds) (2003) Towards the semantic web: ontology-driven knowledge management. Wiley, The Atrium, Southern Gate, Chichester, West Sussex

Ebert C, Dumke R (2007) Software measurement: establish, extract, evaluate, execute. Springer, Berlin

Ebert C, Dumke M, Schmietendorf A (2005) Best practices in software measurement. Springer, New York

Fensel D (2001) Ontologies: a silver bullet for knowledge management and electronic commerce. Springer Verlag, Berlin

Gomez-Perez A, Suarez-Figueroa M-C, Vigo M (2009) gOntt: a tool for scheduling ontology development projects. In: Proceedings of the fifth international conference on knowledge capture. ACM, New York

Hepp M, De Leenheer P, de Moor A, Sure Y (eds) (2008) Ontology management: semantic web, semantic web services, and business applications (semantic web and beyond). Springer, New York

Imtiaz A, Giernalczyk A, Davies J, Thurlow I (2008) Cost, benefit engineering for collaborative knowledge creation within knowledge workspaces. In: Proceedings of EChallenges 2008

Imtiaz A, Giernalczyk A, Bürger T, Popov I (2009) A predictive framework for value engineering within collaborative knowledge workspaces. In: Proceedings of EChallenges 2009

McGuinness DL (2003) Ontologies come of age. In: Fensel D, Hendler J, Lieberman H, Wahlster W (eds) Spinning the semantic web: bringing the world wide web to its full potential. MIT Press, Cambridge

Paslaru Bontas Simperl E, Tempich C, Sure Y (2006) ONTOCOM: a cost estimation model for ontology engineering. In: Proceedings of the 5th International Semantic Web Conference (ISWC 2006)

Paslaru Bontas E, Tempich C (2005) How much does it cost? Applying ONTOCOM to DILIGENT. Technical Report TR-B-05-20, Free University of Berlin

Popov I, Bürger T, Simperl E, Imtiaz A (2009) Preliminary predictive model for costs and benefits. Deliverable D4.1.1, ACTIVE

Remenyi D, Money A, Sherwood-Smith M, Irani Z (2001) The effective measurement and management of IT costs and benefits

Simperl E, Bürger T (2010) Ontology reuse: is it feasible? In: Jin H, Lv Z (eds) Data management in semantic web. Nova Science Publishers, Inc. (to be published)

Simperl E, Mochol M, Bürger T, Popov I (2009a) Achieving maturity: the state of practice in ontology engineering in 2009. In: Proceedings of Ontologies, DataBases, and Applications of Semantics for Large Scale Information Systems (ODBASE'09)

Simperl E, Popov I, Bürger T (2009b) ONTOCOM revisited: towards accurate cost predictions for ontology development projects. In: Proceedings of the 6th European Semantic Web Conference (ESWC 2009), pp 248–262

Simperl EPB, Bürger T, Hofer C (2010) FOLCOM or the costs of tagging. In: Proceedings of the 17th international conference on Knowledge Engineering and Management by the Masses (EKAW 2010), pp 163–177

Suarez-Figueroa M, de Cea GA, Buil C, Caracciolo C, Dzbor M, Gomez-Perez A, Herrero G, Lewen H, Montiel-Ponsoda E, Presutti V (2007) NeOn development process and ontology life cycle. NeOn deliverable 5.3.1, NeOn


5

Managing and Understanding Context

Igor Dolinsek, Marko Grobelnik, and Dunja Mladenic

5.1 User’s Working Contexts in Knowledge Workspace

Some may say that the word context is rather broad, and that people may have difficulty understanding which specific interpretation is meant when someone talks about context. On the other hand, we were not able to find a more suitable replacement for this word, so we have kept it. To avoid confusion, we first describe how we use the word context.

Our focus is on knowledge workers and their daily work with computer systems. More specifically, we are most interested in their knowledge processes. From our perspective, knowledge workers conduct knowledge processes by using software tools like MS Office, Internet Explorer, Windows File Explorer, CRM and ERP tools, wikis, blogging tools, chat tools, etc. While using these tools they access information from all kinds of sources, such as MS Word documents, MS Outlook messages and Web pages. We will refer to these sources as information resources. Knowledge processes are performed to achieve some personal or business goal. Often several people are jointly involved in achieving a business goal or working on the same assignment. Furthermore, the same person often executes the same knowledge process with different information resources, and in collaboration with different people, to achieve different goals. The proposed framework therefore needs to support such settings.

I. Dolinsek (*)

ComTrade d.o.o., Litijska 51, Ljubljana 1000, Slovenia

M. Grobelnik • D. Mladenic

Artificial Intelligence Laboratory, Jozef Stefan Institute, Jamova 39, Ljubljana SI-1000, Slovenia

Our assumption is that grouping the people, information resources and knowledge processes needed to achieve a specific goal or perform a specific assignment may simplify knowledge workers' interaction with the computer system and their collaboration with others who are working on the same assignment. This grouping

is organized through the concept of context. A person or a group of collaborators defines a context for a particular goal they would like to achieve or an assignment they would like to carry out. Whenever they perform computing activities related to that goal or assignment, we say that 'they are working in that specific context.' While they are working in that context, all relevant information resources (emails, documents, web pages, etc.) and knowledge processes are linked to it. This makes later information retrieval and repeated executions of knowledge processes easier, because users can navigate through the system using the high-level concept of context rather than the detailed level of individual information resources. The assumption is that such an abstraction is closer to the way people usually think. As a consequence, the working context can also be used to filter the information that the software tools present to the user, reducing information overload.

The actual interpretation of the context is left to the users. For example, for a patent lawyer a context could be a particular patent application he is working on together with the inventors. For a sales person it could be a specific proposal she is preparing in response to a request for proposal from a client. Note that proposal preparation can be a lengthy and quite complex undertaking. Several coworkers may be involved, and a number of knowledge processes, such as meeting organization and proposal review, need to be performed. For a researcher a context can be a project he is working on together with a team of co-workers. Large projects can be further structured, so the corresponding context can consist of several sub-contexts. For example, work on the ACTIVE project could have ACTIVE.planning, ACTIVE.development, ACTIVE.testing, ACTIVE.reporting, ACTIVE.meetings and ACTIVE.dissemination sub-contexts to group people, resources and knowledge processes in a more fine-grained way.

In short, in our sense a context is a collection of information resources, processes and people which a knowledge worker finds it convenient to group together in order to carry out his or her work more effectively.

5.1.1 Non-Observed Versus Observed User’s Activities

In order to manage and understand user context in computer environments, we need to conduct some form of user observation. For example, we need to keep track of the various information resources the user has been reading or modifying, the mail messages he has been exchanging with others, etc. This is the price we have to pay for the benefits of better search and reduced information overload. We are aware that this may raise privacy concerns. Nevertheless, in enterprise environments users' activities are already being observed by systems management and data protection tools. We have to ensure that the user activity observation process maintains the same level of privacy protection as the data protection and systems management processes already deployed in the enterprise.

5.1.2 Observation Methods

For useful context management, more data has to be collected about the user's activities than is normally available from off-the-shelf office tools. In addition, relations between people and the resources they use have to be recorded, in a similar way to what e-mail and messaging tools already do. For our purposes, a set of primitive events has been defined to model the most fine-grained user activities we are interested in. Examples of primitive events are:

• User has accessed a particular web page

• User has sent an email message with particular subject, content and attachments

to the specified list of recipients

• User has started/stopped a particular program

• User updated a particular Word file

Each primitive event includes the relevant data that was involved in the event.

For example, the ‘user updated a particular Word file’ event includes the before and

after image of the Word document. Similarly, email-related events carry the subject, mail content and attachments.

On the other hand, we are not interested in low-level user activities such as mouse clicks, window selections, etc.
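The Python sketch below makes the notion of a primitive event more concrete. It shows one possible record structure and the filtering step that discards low-level activities. The class `PrimitiveEvent`, the event-kind names and the payload keys are illustrative assumptions only. They are not the event schema actually used by the ACTIVE software.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any

# Hypothetical event kinds mirroring the examples of primitive events in the text.
PRIMITIVE_KINDS = {
    "web_page_accessed",
    "email_sent",
    "program_started",
    "program_stopped",
    "document_updated",
}


@dataclass
class PrimitiveEvent:
    kind: str
    user: str
    timestamp: datetime
    # Data involved in the event, e.g. recipients and attachments of an email,
    # or the before and after image of an updated document.
    payload: dict[str, Any] = field(default_factory=dict)


def keep_primitive(events):
    """Drop low-level activities (mouse clicks, window selections, ...)."""
    return [e for e in events if e.kind in PRIMITIVE_KINDS]


now = datetime(2011, 1, 10, 9, 30)
stream = [
    PrimitiveEvent("email_sent", "igor", now,
                   {"subject": "Review meeting", "to": ["ian", "paul"],
                    "attachments": ["agenda.docx"]}),
    PrimitiveEvent("mouse_click", "igor", now, {"x": 10, "y": 20}),
    PrimitiveEvent("document_updated", "igor", now,
                   {"file": "report.docx", "before": "draft 1", "after": "draft 2"}),
]
kept = keep_primitive(stream)
print([e.kind for e in kept])  # ['email_sent', 'document_updated']
```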

5.2 Knowledge Workspace

We believe that existing software tools in a typical MS Windows/MS Office setup do not provide adequate support for knowledge processes. Currently, knowledge workers can deal with information at the level of individual documents or file folders, and interact with others at the level of individual email or chat messages. The knowledge that links the relevant people, activities, files and email messages in their respective folders to a specific goal, however, is implicit and often exists only in the person's mind.

For the most frequently used and well-defined business processes (like a typical sales cycle), specialized software tools have been developed to link the necessary process elements and resources together. However, very few suitable software tools are available to support informal and dynamic knowledge processes. For example, in the Unified Activity Management project (Moran et al. 2005) IBM developed a set of specialized tools to cope with generic knowledge processes and collaboration. By contrast, several European research projects, such as

• Nepomuk (http://nepomuk.semanticdesktop.org/) and

• Gnowsis (http://gnowsis.opendfki.de/)

have approached this field from the semantic desktop perspective.

The knowledge workspace we are proposing, called the ACTIVE knowledge workspace (AKWS), is a software prototype which we developed in the ACTIVE project. Our approach to building a collaborative knowledge process management platform differs from the approaches taken by the projects mentioned above. The main guideline for developing this prototype was that it should extend the existing Microsoft Windows platform and MS Office tools with the proposed concepts as far as possible. Support for contexts, knowledge processes and resource meta-data is provided by the newly developed platform. To a large extent, however, it is delivered to users through extensions to popular enterprise office automation tools like Windows File Explorer, Internet Explorer and the MS Office tools.

The proposed knowledge workspace is architected as two parts. The first is a set of cooperative services dealing with contexts, knowledge processes, resource meta-data and other web platform-specific infrastructure. The second is a set of plug-ins for popular programs used in the enterprise environment. Workspace services reside in the enterprise intranet, and plug-ins extend the existing software on users' desktops.

In the scope of the ACTIVE project, plug-ins were developed for the following

software products:

• Windows File Explorer

• Microsoft Internet Explorer

• MS Outlook 2007

• MS Excel 2007

• MS PowerPoint 2007

• MS Word 2007

• Semantic Media Wiki

• LiveNetLife

• LiveOfficeLife

• miKrow

We will refer to them as ACTIVated applications. In addition, the ACTIVE Taskbar tool provides access to the ACTIVE services from the desktop, and the ACTIVE portal provides web access to the services. A trial version of the ACTIVE knowledge workspace software prototype is available for download for research purposes at http://www.active-project.eu (as at January 2011).


5.3 Context Support in the ACTIVE Knowledge Workspace

There are two perspectives for dealing with contexts in the AKWS. The top-down perspective allows users to name and define contexts explicitly, as needed. The bottom-up perspective instead relies on observing the user's computing activities, from which AKWS services infer, using machine learning techniques, the various contexts the user is working in.

5.3.1 Defining Contexts and Context Switching

In the proposed knowledge workspace, a context is represented as a workspace resource which can be defined and shared by users. A context is subject to Workspace access policies in the same way as any other Workspace resource.

When a user starts working on a new assignment, he or she creates a context with a suitable name in the Workspace. If the work will require collaboration with other people in the enterprise, they can be assigned to the context too. The ACTIVE Taskbar lets users see all contexts assigned to them and switch between them explicitly. The screenshot in Fig. 5.1 shows the user's ACTIVE Taskbar at the moment when this user is working in the ACTIVE.meetings context and is about to switch to ACTIVE.reporting. The Taskbar context menu shows the contexts assigned to this user. These are AMRoffer and ACTIVE, which is further structured into the sub-contexts ACTIVE.planning, ACTIVE.development, ACTIVE.testing, ACTIVE.meetings and ACTIVE.reporting. In this way the user explicitly controls which context he/she is working in. Other ACTIVated applications the user is running on the same desktop are automatically aware of the current context.
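The explicit context switching described above can be sketched as a small service. It keeps, for each user, the assigned contexts and the current working context, and it notifies registered applications on every switch. This is a minimal sketch under stated assumptions. `ContextService`, `subscribe`, `switch` and the use of dotted names to express sub-contexts are hypothetical and do not reproduce the AKWS service interface.

```python
from typing import Callable


class ContextService:
    """Hypothetical stand-in for the Workspace context service."""

    def __init__(self):
        self.assigned: dict[str, set[str]] = {}  # user -> assigned context names
        self.current: dict[str, str] = {}        # user -> current working context
        self.listeners: list[Callable[[str, str], None]] = []

    def subscribe(self, listener):
        """Register an ACTIVated application to be told about context switches."""
        self.listeners.append(listener)

    def switch(self, user, context):
        """Explicit switch, as triggered from the Taskbar context menu."""
        if context not in self.assigned.get(user, set()):
            raise ValueError(f"{context!r} is not assigned to {user!r}")
        self.current[user] = context
        for notify in self.listeners:
            notify(user, context)


def sub_contexts(names, parent):
    """Assumed naming rule: sub-contexts of ACTIVE are written ACTIVE.<name>."""
    return sorted(n for n in names if n.startswith(parent + "."))


svc = ContextService()
svc.assigned["igor"] = {"AMRoffer", "ACTIVE", "ACTIVE.planning", "ACTIVE.development",
                        "ACTIVE.testing", "ACTIVE.meetings", "ACTIVE.reporting"}
seen = []
svc.subscribe(lambda user, ctx: seen.append(("PowerPoint", ctx)))
svc.subscribe(lambda user, ctx: seen.append(("Outlook", ctx)))
svc.switch("igor", "ACTIVE.meetings")
svc.switch("igor", "ACTIVE.reporting")
print(svc.current["igor"])  # ACTIVE.reporting
print(sub_contexts(svc.assigned["igor"], "ACTIVE"))
```

The listener list stands in for the ACTIVated applications. Each one is told about a switch as soon as it happens, which is one simple way to keep all applications on a desktop aware of the current context.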

5.3.2 Associating Information Resources

All ACTIVated applications provide a way to associate the data they operate on with the current context. This can be done either explicitly by the user via the 'Associate' button, or implicitly whenever the ACTIVated program accesses the data. The screenshot in Fig. 5.2 shows part of the ACTIVated MS PowerPoint ribbon. There, the current PowerPoint document 'active-knowledge-workspace-session-v1' can be explicitly associated with the current context 'ACTIVE.meetings' simply by clicking on the 'Associate' button.

Similarly, an e-mail can be associated with the context in MS Outlook, a spreadsheet in MS Excel, a URL in Internet Explorer and a text document in MS Word. An arbitrary file in Windows can be associated with the context by using the ACTIVE Shell extension menu option, as shown in the screenshot in Fig. 5.3. Here the user will insert the selected Excel spreadsheet 'server_mem' into the workspace and associate it with the current working context ACTIVE.meetings.
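The two association modes can be sketched as follows. The sketch assumes a hypothetical `AssociationStore` that maps each context to a set of resource identifiers, and the file URIs are invented for illustration. In the implicit mode, every access by an ACTIVated program records an association. The explicit mode corresponds to the 'Associate' button.

```python
from collections import defaultdict


class AssociationStore:
    """Hypothetical store of resource-to-context associations."""

    def __init__(self, implicit=True):
        self.implicit = implicit              # associate on every access?
        self.by_context = defaultdict(set)    # context name -> resource URIs

    def associate(self, resource, context):
        """Explicit association, e.g. via the 'Associate' button."""
        self.by_context[context].add(resource)

    def on_access(self, resource, context):
        """Called by an ACTIVated program whenever it opens a resource."""
        if self.implicit:
            self.associate(resource, context)


store = AssociationStore(implicit=True)
store.associate("file:///slides/active-knowledge-workspace-session-v1.pptx",
                "ACTIVE.meetings")
store.on_access("file:///data/server_mem.xlsx", "ACTIVE.meetings")
print(sorted(store.by_context["ACTIVE.meetings"]))
```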

5.3.3 Associating People

People can be assigned to a context by using the team context management feature of the ACTIVE portal, as shown in the screenshot in Fig. 5.4.

Fig. 5.1 Context switch in ACTIVE taskbar

Fig. 5.2 Associating a PowerPoint presentation to the working context


Fig. 5.3 Associate a windows file with the context

Fig. 5.4 Assigning people to the context


Here user Carlos will be assigned to the ACTIVE.meetings context to which

Igor, Ian, Paul and John are already assigned. Once a person is assigned to a context,

this context will become available through the context selection list in that person’s

ACTIVE Taskbar.
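Assigning people to a context affects each person's Taskbar selection list, and this might be modelled as below. `TeamContexts` and `selection_list` are hypothetical names used only for this sketch. Access policies and the details of the portal are left out.

```python
from collections import defaultdict


class TeamContexts:
    """Hypothetical model of context membership as managed in the ACTIVE portal."""

    def __init__(self):
        self.members = defaultdict(set)  # context name -> assigned people

    def assign(self, person, context):
        self.members[context].add(person)

    def selection_list(self, person):
        """Contexts offered in this person's Taskbar context selection list."""
        return sorted(c for c, people in self.members.items() if person in people)


team = TeamContexts()
for name in ["Igor", "Ian", "Paul", "John"]:
    team.assign(name, "ACTIVE.meetings")
print(team.selection_list("Carlos"))  # []
team.assign("Carlos", "ACTIVE.meetings")
print(team.selection_list("Carlos"))  # ['ACTIVE.meetings']
```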

5.3.4 Associating Knowledge Processes

Similarly, the existing knowledge processes Trip arrangement and Venue arrangement can be associated with the context by using the TaskPane of the ACTIVE Taskbar, as shown in the screenshot in Fig. 5.5.

Here two knowledge processes (Trip arrangements and Venue arrangements) are already associated with the context ACTIVE.meetings. The user can associate more of the predefined knowledge processes by importing them from the knowledge process store or from a file.

5.3.5 Viewing and Searching Through the Context Perspective

Through various features of the ACTIVE Taskbar, the user is given insight into the content of the current context. The screenshot in Fig. 5.6 shows the ACTIVE.meetings context together with two lists of people. The first lists the people currently working in that context. The second lists people who are also assigned to the context but are working in some other context right now. On user request, a list of all information resources associated with the current context is displayed in the 'Resources in current context' form, and the TaskPane shows the currently associated knowledge processes.

Fig. 5.5 Associate knowledge processes to the context

Alternatively, the context can be visualized as shown in the screenshot in Fig. 5.7.

Any resource associated with the context can be accessed or manipulated directly from the respective list: a document can be opened in the appropriate program, a user can be contacted via mail or chat, etc. Similarly, the resource meta-data provided by the Workspace can be displayed.

The context name can be used to search for associated resources with the search option of the ACTIVE Taskbar. This can help to formulate searches in situations like this: 'I remember this data was presented in a PowerPoint slide which we prepared as part of the ILM offer for Nasa.' A quick look at the context cloud in the Workspace shows that there is a context named 'Nasa ILM offer'. A context-restricted search can therefore be made, further restricted to PowerPoint documents, and we are presented with all potentially relevant PowerPoint slides.
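A context-restricted search such as the one in the 'Nasa ILM offer' example can be sketched as a filter over the resources associated with a context, optionally narrowed by document type. The `Resource` record, the in-memory index and the naive substring matching are assumptions made for illustration. A real deployment would use a proper full-text index.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    uri: str
    kind: str   # e.g. "pptx", "xlsx", "email", "url" (hypothetical type tags)
    text: str


def context_search(index, associations, context, query, kind=None):
    """Return URIs associated with `context` whose text contains every query term."""
    terms = query.lower().split()
    hits = []
    for uri in sorted(associations.get(context, ())):
        res = index[uri]
        if kind is not None and res.kind != kind:
            continue  # e.g. restrict the search to PowerPoint documents
        if all(t in res.text.lower() for t in terms):
            hits.append(res.uri)
    return hits


index = {
    "ilm-overview.pptx": Resource("ilm-overview.pptx", "pptx", "ILM storage tiers and costs"),
    "ilm-budget.xlsx": Resource("ilm-budget.xlsx", "xlsx", "ILM costs per year"),
    "ski.pptx": Resource("ski.pptx", "pptx", "Costs of skiing in Austria"),
}
associations = {"Nasa ILM offer": {"ilm-overview.pptx", "ilm-budget.xlsx"}}
print(context_search(index, associations, "Nasa ILM offer", "costs", kind="pptx"))
# ['ilm-overview.pptx']
```

The example shows why the context restriction matters. 'ski.pptx' also matches the query 'costs', but it is excluded because it is not associated with the 'Nasa ILM offer' context.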

One could argue that the same effect can be achieved if all team members use the same conventions for naming the directories where documents are stored, or the same set of characteristic keywords embedded in the documents. In that case, standard desktop search facilities could be used for easier retrieval. However, we would then be relying on people always following the same pattern of manual activities, not to mention the problem of consistently using the agreed markup names. In addition, the team leader would have to use conventional communication tools to inform the others about the conventions. With the context concept in place and the ACTIVE Taskbar used by all team members, this is already well defined and instantly visible to all team members. Furthermore, with automatic associations in place, the 'markup' is also resilient to careless users. The only thing expected from them is to set their working context accurately. As will be shown later in this chapter, the Workspace will also assist the user in setting the appropriate working context.

Fig. 5.6 Viewing workspace resources of a working context

The currently selected context is used to filter the data presented by various applications in the AKWS. For example, all Office tools have a new 'Start button' menu option, 'Open from context'. It lets the user open a document chosen only from among the documents associated with the current context, as shown in the screenshot in Fig. 5.8.

Similarly, a context filtering button is available in Outlook, so that only mail messages associated with the current context are displayed in the message list of the selected Outlook folder. In the screenshot in Fig. 5.9, only the two emails associated with the context ACTIVE.meetings are displayed in the Inbox. This is because the user applied the context filter by pressing the ACTIVE.meetings button in the ACTIVE toolbar in Outlook.
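Both the 'Open from context' dialog and the Outlook filter button reduce a list of items to those associated with the current context. A minimal sketch could look like the one below. It assumes a hypothetical `filter_view` helper and made-up message identifiers, and it preserves the original order of the folder list.

```python
def filter_view(items, associations, context, enabled):
    """Show only items associated with `context` while the filter is switched on."""
    if not enabled or context is None:
        return list(items)                      # unfiltered folder view
    allowed = associations.get(context, set())
    return [item for item in items if item in allowed]


inbox = ["msg-101", "msg-102", "msg-103", "msg-104"]
associations = {"ACTIVE.meetings": {"msg-102", "msg-104"}, "AMRoffer": {"msg-101"}}
print(filter_view(inbox, associations, "ACTIVE.meetings", enabled=False))  # all four messages
print(filter_view(inbox, associations, "ACTIVE.meetings", enabled=True))   # ['msg-102', 'msg-104']
```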

Fig. 5.7 Viewing context resources in context visualizer


Fig. 5.8 Context filtering at document open

Fig. 5.9 Context filtering of mail messages


5.4 Context Discovery

So far we have described the top-down perspective of context management, in which contexts are named, defined and switched explicitly by the users. However, we also deal with contexts from the bottom-up perspective. The user observation methods in place in all ACTIVated applications record a number of primitive events. These event streams are collected by the context mining services of the proposed ACTIVE knowledge workspace. The machine learning technology built into those services analyses the content of the documents, web pages and email messages the users are working on. The goal is to identify clusters of information resources, collaborators and tasks which the users may recognize as their contexts. The algorithms used for this purpose are described in detail in Chap. 7: Machine learning techniques for understanding context and process.

The results of the context mining process are discovered contexts in the ACTIVE knowledge workspace. Discovered contexts are not named. Instead, they are characterized by a set of keywords which the mining algorithms suggest as the most relevant representation of the context. In addition to the representative keywords, the mining software also determines the relevant information resources, collaborators and tasks, and associates them with the discovered context.

The workspace user can review the list of currently discovered contexts and, based on his judgment, convert a discovered context into a top-down context by giving it the name which is most meaningful to him.
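The actual mining algorithms are described in Chap. 7. The sketch below only illustrates how a discovered cluster can be characterized by keywords: it ranks terms by TF-IDF summed over the documents of one cluster. TF-IDF is a stand-in chosen for this example, not necessarily the method used in ACTIVE, and the toy documents are invented.

```python
import math
from collections import Counter


def top_keywords(cluster_docs, all_docs, k=5):
    """Rank terms by TF-IDF summed over the documents of one discovered cluster.

    `cluster_docs` must be a subset of `all_docs`, so every term has df >= 1.
    """
    n = len(all_docs)
    df = Counter()
    for doc in all_docs:
        df.update(set(doc.lower().split()))   # document frequency per term
    scores = Counter()
    for doc in cluster_docs:
        for term, tf in Counter(doc.lower().split()).items():
            scores[term] += tf * math.log(n / df[term])
    return [term for term, _ in scores.most_common(k)]


all_docs = [
    "skiing holiday austria snow skiing",
    "amade skiing austria snow pass",
    "rails discussion meeting notes",
    "project meeting notes review",
]
cluster = all_docs[:2]  # pretend the clustering step grouped these two pages
print(top_keywords(cluster, all_docs, k=3))  # 'skiing' ranks first
```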

The screenshot in Fig. 5.10 shows the current list of discovered contexts for one Workspace user, and the detailed content of a discovered context with the keywords SKIING, HOLIDAY, AMADE, AUSTRIA, SNOW, etc.

Fig. 5.10 List of discovered contexts


This context resulted from a personal investigation of skiing conditions in Austria. Only one user was therefore discovered for it, and all discovered resources are links to popular skiing and travel web sites. The user can give this discovered context a name, for example 'Austria-ski', by selecting the 'Name Context' button on the form. He can then select the most relevant URLs he used during his search and automatically associate them with the newly named context, as shown in the screenshot in Fig. 5.11.
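Naming a discovered context and bulk-associating a selection of its resources, as in the 'Austria-ski' example, could be sketched as follows. The `DiscoveredContext` and `NamedContext` records, the `name_context` helper and the example URLs are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DiscoveredContext:
    keywords: list[str]
    resources: list[str]
    users: list[str]


@dataclass
class NamedContext:
    name: str
    members: set[str] = field(default_factory=set)
    resources: set[str] = field(default_factory=set)


def name_context(discovered, name, selected):
    """Promote a discovered context to a named one, keeping only the selected resources."""
    unknown = set(selected) - set(discovered.resources)
    if unknown:
        raise ValueError(f"not part of the discovered context: {sorted(unknown)}")
    return NamedContext(name, set(discovered.users), set(selected))


found = DiscoveredContext(
    keywords=["SKIING", "HOLIDAY", "AMADE", "AUSTRIA", "SNOW"],
    resources=["http://ski.example/amade", "http://travel.example/tyrol",
               "http://news.example/"],
    users=["igor"],
)
ctx = name_context(found, "Austria-ski",
                   ["http://ski.example/amade", "http://travel.example/tyrol"])
print(ctx.name, sorted(ctx.resources))
```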

5.5 Context Detection

As described above, the context mining service discovers emergent contexts in the incoming stream of primitive events. In addition, it tries to determine the current working context of a user. This process is called context detection, and it helps the user to set the working context automatically. Note that the user can set his current working context explicitly through the ACTIVE Taskbar. However, this setting might be inaccurate, because the user may forget to switch the context when he starts working on something else. The knowledge workspace software maintains explicit associations between information resources and named contexts. It is therefore possible to verify automatically whether the current working context matches the context deduced from the information resource the user is currently processing. In case of a mismatch, a notification on the ACTIVE Taskbar suggests that the user switch to one of the suggested contexts. It is then up to the user either to switch context as suggested or to continue with the current setting.

Fig. 5.11 Bulk association of discovered resources

The screenshot in Fig. 5.12 shows a context detection notification, which can then be followed by a context switch.

In this example the user was working in the 'rails_discussion' context but then started to browse a sport magazine web site. Some time earlier, however, he had associated that web site with two of his contexts, skiing and golf. The Workspace therefore notified him that he might not be working in the most relevant context (this is not visible in the screenshot). It suggested that he switch to either the skiing or the golf context, both of which were directly associated with the sport magazine web site. The final decision to switch context, however, is left to the user.
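The rule-based part of context detection compares the current working context with the contexts explicitly associated with the resource being processed. It can be sketched as below. The `detect` function and the example URLs are hypothetical. As the text describes, the sketch returns suggestions only and leaves the decision to the user. It also ignores the statistical models produced by context mining.

```python
def detect(current, resource, associations):
    """Return contexts to suggest, or [] when the current context already fits.

    `associations` maps a context name to the set of resources associated with it.
    """
    candidates = sorted(c for c, res in associations.items() if resource in res)
    if not candidates or current in candidates:
        return []          # no evidence of a mismatch, so no notification
    return candidates      # suggestions only; the user decides whether to switch


associations = {
    "rails_discussion": {"http://forum.example/rails"},
    "skiing": {"http://sport-magazine.example/"},
    "golf": {"http://sport-magazine.example/"},
}
print(detect("rails_discussion", "http://sport-magazine.example/", associations))
# ['golf', 'skiing']
print(detect("skiing", "http://sport-magazine.example/", associations))  # []
```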

5.6 User-Guided Discovery and Detection

The final judgment of the quality of context discovery and detection is always left to the user. The ACTIVE knowledge workspace includes dialogues in which the user can review the currently discovered contexts and give the mining service feedback about their quality. If a discovered context does not represent a meaningful context for the user, the service can be instructed to discard it. In some situations the discovered context is actually similar to an existing named context, but the mining algorithms failed to recognize this. In such situations the discovered context can be merged manually with the existing context.

Fig. 5.12 Working context suggestions

Similarly, context detection only provides suggestions for switching context, instead of switching the working context for the user automatically. The mining service can use the user's actual decision whether to switch to the suggested context to refine the context detection process.
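The text does not specify how the mining service uses this feedback. The sketch below therefore only records the three kinds of feedback mentioned above: discarding a context, merging it, and accepting or rejecting a suggested switch. It also derives one possible signal, a per-context acceptance rate. `FeedbackLog` and the acceptance-rate idea are illustrative assumptions.

```python
from collections import Counter


class FeedbackLog:
    """Hypothetical record of user feedback passed back to the mining service."""

    def __init__(self):
        self.discarded = set()             # ids of discovered contexts to drop
        self.merged = {}                   # discovered id -> existing named context
        self.switch_outcomes = Counter()   # (suggested context, accepted?) -> count

    def discard(self, discovered_id):
        self.discarded.add(discovered_id)

    def merge(self, discovered_id, named_context):
        self.merged[discovered_id] = named_context

    def record_suggestion(self, suggested, accepted):
        self.switch_outcomes[(suggested, accepted)] += 1

    def acceptance_rate(self, suggested):
        """Share of suggestions for this context that the user accepted, if any."""
        yes = self.switch_outcomes[(suggested, True)]
        no = self.switch_outcomes[(suggested, False)]
        return yes / (yes + no) if yes + no else None


log = FeedbackLog()
log.discard("d-17")
log.merge("d-21", "ACTIVE.meetings")
log.record_suggestion("skiing", accepted=True)
log.record_suggestion("skiing", accepted=False)
log.record_suggestion("golf", accepted=False)
print(log.acceptance_rate("skiing"), log.acceptance_rate("golf"))  # 0.5 0.0
```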

5.7 Discussion

There are several Business Process Management (BPM) and Enterprise 2.0 tools on the market. Many of them support collaboration between knowledge workers through the project concept, for example Basecamp (http://basecamphq.com/)² and @task (http://www.attask.com/)³. At first glance, the concept of working context used in the ACTIVE knowledge workspace may seem to be the same as the project concept in other tools. Indeed, the ACTIVE context has several features in common with the project concept. For example, a project is often used to group people, well-defined business processes and information resources, much as a context groups people, information resources and informal knowledge processes. However, the inclusion of information resources in BPM tools is often limited to emails and documents. Furthermore, these resources are often attached to the project as supplementary information to the underlying BPM data structures, and their relation to the project can be used only within the BPM tool itself. In AKWS the context is maintained by the Workspace but can also be incorporated into other tools. A set of Workspace web services makes it possible to deal with contexts and their relations to other Workspace entities in any software tool.

With project-centric tools, users can often think and work in terms of the project concept only inside the project management tool. When they start using other tools, they are again forced to think and work at the level of individual documents. They must then rely on implicit relations between documents and projects, which they have to remember. The open structure of the AKWS lets us align more software tools around the working context and simplify daily operations for knowledge workers.

Through closer observation of users' activities during their interaction with the most commonly used office automation tools, the Workspace can help users to determine their current working context easily. Once the appropriate context is identified, it can be used to filter and prioritize the information, knowledge processes and collaborators that the tools deliver to the users. This makes it easier for knowledge workers to stay focused on the work at hand.

² As at February 2011.

³ As at February 2011.

References

Moran TP, Cozzi A, Farrell SP (2005) Unified activity management: supporting people in e-business. Commun ACM 48(12):67–70, http://www.almaden.ibm.com/cs/projects/uam/


6

Managing, Sharing and Optimising Informal Knowledge Processes

Jose-Manuel Gomez-Perez, Carlos Ruiz, and Frank Dengler

6.1 Introduction

Knowledge workers are central to an organization's success. That success increasingly depends on tacit individual knowledge: undocumented factual and procedural expertise that is essential for competent performance and is tediously learned through experience and example. To increase economic productivity significantly, it is therefore necessary to increase the productivity of this knowledge-based and knowledge-driven work. However, high performance depends on how knowledge workers and companies deal with factors such as information overload, task switching, context loss and knowledge processes.

Knowledge workers access huge amounts of information available on personal desktops, in company knowledge repositories (ranging from document repositories to knowledge management tools) and also on the Web. As a result, they waste time searching and navigating to find the information needed to complete the task-at-hand. They may even have to try to work out how they carried out a similar task some time ago.

J.M. Gomez-Perez (*) • C. Ruiz

iSOCO, Intelligent Software Components, S.A., Avenida Del Partenon, 16-18, 1º 7a, Madrid 28042, Spain

F. Dengler

Institut AIFB, Karlsruhe Institute of Technology, Englerstr. 11, Building 11.40, Karlsruhe 76131, Germany

¹ http://www.taglocity.com/

Furthermore, knowledge workers depend on personal communication and on ad hoc collaboration and communication tools, including email, to coordinate their work. However, these tools do not unequivocally improve productivity. Taglocity¹ estimates that Intel employees spend 20 h per week managing their email, a third of which are unnecessary. Davenport states that "while all knowledge workers surveyed used e-mail, 26% felt it was overused in their organization,

21% felt overwhelmed by it and 15% felt it actually diminished their productivity”

(Davenport 2005). In addition, knowledge workers constantly deal with a multitude of tasks. As a result, a knowledge worker typically spends a great deal of time switching to a different task before completing the task-at-hand. Whenever there is a task switch, knowledge workers lose the work context that they have manually built up, which leads to distraction, frustration and loss of productivity.

Beyond these factors, knowledge workers are engaged in processes. Some of these are formal business processes, typically defined by the organisation, which has for many years focused on describing how its business processes take place. Languages for such processes now exist, e.g. WS-BPEL² for process execution and BPMN³ for process description. Business processes are also static: once defined, they are used repeatedly in the same application domain and treated as stable, well-defined workflows with little or no variation in their execution beyond a few business rules that guide decisions. In fact, business processes are ill-suited to change management: evolution and adaptability have been neglected in the past, which prevents them from reacting when conditions change.

However, as a consequence of technical evolution and of the increasing exchange of information and collaborative decision making (Kotlarsky and Oshri 2005) among workers (e.g. analysts, researchers) carrying out their activities in knowledge-intensive applications, knowledge workers organize their work around their own informal processes. These processes, often called knowledge processes, are of a totally different nature: they are flexible, neither standardized nor structured, and it is the experience, knowledge and intuition of the knowledge worker that drive the process to success. Examples abound: the processes an employee uses to obtain information on a customer from a variety of sources; the processes for setting up a meeting, including booking a room, arranging refreshments and notifying reception of the names of visitors; or the processes for arranging the flow of technological operations in engineering design based on the specific character of the design artifact and the skill profiles of the designers involved. Such processes are frequently not written down, or only very informally. This hinders their reuse even by their creators, and certainly hinders sharing between colleagues. We therefore need to assist knowledge workers in creating, reusing, sharing and improving these knowledge processes.

While great productivity gains can be achieved in business processes by formalizing fixed and frequently executed processes into computer-processable workflows, standardizing most work processes in a top-down manner is neither economical nor possible in practice. This is probably one of the reasons why informal knowledge processes, though acknowledged as essential, are poorly supported at the enterprise level.

² WS-BPEL (Web Services Business Process Execution Language) is an OASIS standard, see http://docs.oasis-open.org/wsbpel/2.0/wsbpel-v2.0.pdf.
³ BPMN is an OMG standard, see http://www.bpmn.org/.

The purpose of this chapter is twofold. Firstly, we examine the notion of a knowledge process in detail, comparing it closely with business processes, and give a formal description of what we understand by knowledge processes, including the complete knowledge process lifecycle. Secondly, we present different tools developed to deal with and exploit knowledge processes in order to increase knowledge workers' productivity by significantly improving the mechanisms through which enterprise information is created, managed, and used.

6.2 Knowledge Processes Within the ACTIVE Project

6.2.1 Some Definitions: Business Processes vs. Knowledge Processes

The term business process has received a lot of attention in corporations, where the value of creating business processes lies in the intellectual assets that those processes represent. A business process is usually understood as a set of coordinated tasks and activities that leads to the accomplishment of a specific organizational goal, implemented by a workflow that specifies the details of the process. Davenport (2005) had already contrasted this definition of business processes with others: "a structured, measured set of activities designed to produce a specific output for a particular customer or market. It implies a strong emphasis on how work is done within an organization, in contrast to a product focus's emphasis on what. A process is thus a specific ordering of work activities across time and space, with a beginning and an end, and clearly defined inputs and outputs: a structure for action. . . . Taking a process approach implies adopting the customer's point of view. Processes are the structure by which an organization does what is necessary to produce value for its customers." Concepts, methods, and techniques from Business Process Management (BPM) (Weske 2007) support the design, configuration, enactment and analysis of these business processes.

As opposed to traditional business processes, knowledge processes are defined by Strohmaier (2003) as representations of complex knowledge flows. They depict the generation, storage, transfer and application of the knowledge that is necessary to create products or services. In contrast, we do not focus on knowledge flows; following Still (2007), we consider knowledge processes to be "high added value processes in which the achievement of goals is highly dependent on the skills, knowledge and experience of the people carrying them out", e.g. new product development processes. People use these informal processes to perform difficult tasks that require complex, informal decisions among multiple possible strategies to fulfil specific goals.

6 Managing, Sharing and Optimising Informal Knowledge Processes 109

In general, both kinds of processes can be understood as a set of tasks and activities that lead to accomplishing a specific goal. However, while business processes focus on organizational, highly structured goals at different levels of detail, knowledge processes do not take that view and focus instead on flexible, informal activities that depend on the tacit knowledge and experience of users. For example, while a typical business process might be "An insurance claim process", a knowledge process might be "Schedule a meeting" or "Book a trip". The former is based on a set of formalized steps (e.g. fill in an application form, wait for a response), whereas the latter is very flexible, with no predefined steps, or with the steps left entirely up to the worker (experience may lead users to check some web sites rather than others). A comparison is given in Table 6.1.

In addition, knowledge processes are usually created on the fly by knowledge workers in all kinds of situations in their daily work. As soon as complex tasks arise, knowledge workers create these processes based on their experience and skills, but also on information about their current task context, which supports them as they navigate the knowledge process.

There are usually many possible ways to achieve the process objectives or reach a certain goal. Consequently, the knowledge worker has to make many complex decisions to optimize his or her process and reach the goal. For example, the knowledge worker could decide to accept lower quality in order to deliver on time or early.

Furthermore, knowledge workers use their connections to other workers to carry out the process because, most of the time, knowledge processes are collaborative. Performing a process collaboratively makes it possible for each task to be carried out by the worker who is most specialized, experienced and knowledgeable in that specific area. A network of relationships within the organization is therefore a very important asset for people executing knowledge processes.

It is extremely important to continuously improve knowledge processes by creating an environment in which they can evolve. To this end, it is crucial to establish an adequate process context (the combination of technologies, procedures, people, etc. that supports the processes). The process context must incorporate feedback mechanisms, change evaluation procedures, and process improvement methods and techniques, and must be flexible enough to incorporate enhancements in an agile but controlled way.

If the process is instantiated frequently and the instances are homogeneous, it is possible to create process models that dramatically increase the efficiency of the process. The best way to ensure process improvement is to create an environment in which people are motivated, enthusiastic and passionate about providing feedback to the underlying knowledge process management system.

Table 6.1 Comparison: business process vs. knowledge process

Aspect        Business process                  Knowledge process
Goal          Business-goal driven              User-goal driven
Scope         Enterprise                        Individual
Nature        Static                            Dynamic
Description   Formal                            Informal
Guided        Externally coordinated            Ad hoc/spontaneous
Analyzed      Monitored, analyzed, optimized    Not monitored, emerging

There is also the evolving area of case management (Weske 2007) in the BPM community. According to Strohmaier (2003), in case management, events triggered from outside drive a collaborative, knowledge-intensive and dynamic process. These events determine at runtime which activities need to be performed by the knowledge workers handling the case and whether additional steps are required. In contrast to the informal processes considered here, the control flow cannot be expressed in an explicit process diagram defined in advance.

6.2.2 Scenario: Hiring Process

Figure 6.1 presents a scenario concerning a hiring process, a common process within enterprises (adapted from the hiring process described in Hill et al. (2006)). The example shows how a person, in this case Alice, would perform certain tasks of the hiring process in her small company. The company is so small that no business process for hiring people has been defined so far; as a result, the process is neither fixed nor well defined, and it depends heavily on the skills and experience of the hiring expert.

The following actors participate in this process: Dave, the hiring manager (manager of the group with a vacancy), who makes the ultimate decision as to whether to hire a candidate; Alice, the hiring expert, who performs the initial filtering of candidates and forwards to Dave only those she deems appropriate; Bob, Alice's assistant, who performs administrative functions for Alice; Boss, Alice's boss; and Chris, the candidate applying for the job.

[Figure: swimlane diagram with lanes for Alice, Bob, Dave, Boss and Chris across the phases Post and Search, Review, and Screen, showing the process steps]

Fig. 6.1 Knowledge process example

The following list explains the process steps, including with whom and about what Alice communicates:

1. Boss informs Alice that Dave has a vacancy.

2. Alice and Dave exchange e-mails on the text of the job description and then

meet to finalize.

3. Alice posts the vacancy in the online service with the text from the e-mail.

4. Alice asks Bob to search the online service for viable candidates (i.e., those

who have not explicitly applied for the job but have the necessary skills).

5. Bob searches the online service and sends Alice e-mails with viable candidates

by detaching the resumes from the online service and sending them as e-mail

attachments.

6. Alice reviews applicants from the online service (both those who have applied for the position and those whom Bob found).

7. Alice forwards the candidates whom she deems viable in an e-mail to Dave.

8. Dave responds with an e-mail to inform Alice of the candidates whom he

wishes to pursue and those whom he wishes to reject. He also includes his

initial thoughts on the candidates.

9. Alice sets up interviews by phone for herself and the candidates.

10. Alice takes notes during the interviews by phone.

11. Alice informs Dave of the candidates to whom she thinks Dave should send an offer.

12. Dave sends an offer to the candidate.

6.2.3 Knowledge Process: A Formal Extended Definition

As we have seen, previous definitions of knowledge processes are incomplete and do not cover all the relevant features that knowledge workers face nowadays. We have therefore extended the definition as follows. A knowledge process is
• a loosely defined and structurally ramified collection of tasks,
• not fully defined in terms of structure and the order of activities,
• in which activities require a decision by a knowledge worker about the follow-up task,
• in which the acting knowledge worker uses her experience and expertise, as well as her working context, to decide on the successor task,
• in which decisions about the process development path have to be taken at execution time and lead to an emerging structural ramification constituted by admissible alternatives, dynamic ramification thus being one of its key features.
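To make the extended definition more concrete, the following Python sketch models a knowledge process as a map from each task to its admissible successor tasks, with the follow-up task chosen at runtime. It is a minimal illustration under our own assumptions, not the ACTIVE process model: the class `KnowledgeProcess`, its methods and the `decide` callback (which stands in for the knowledge worker's decision) are hypothetical, and the task names are loosely taken from the hiring scenario.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set

# Hypothetical decision callback: given the current task and the admissible
# alternatives known so far, the knowledge worker returns the next task
# (possibly a new, ad hoc one) or None to finish the process.
Decision = Callable[[str, List[str]], Optional[str]]


@dataclass
class KnowledgeProcess:
    """A loosely defined process: each task maps to its admissible successors."""
    name: str
    successors: Dict[str, Set[str]] = field(default_factory=dict)

    def add_alternative(self, task: str, successor: str) -> None:
        # Ramification is not fixed in advance: alternatives can be added at any time.
        self.successors.setdefault(task, set()).add(successor)
        self.successors.setdefault(successor, set())

    def run(self, start: str, decide: Decision) -> List[str]:
        """Execute one instance; the worker decides each follow-up task at runtime."""
        trace = [start]
        current = start
        while True:
            options = sorted(self.successors.get(current, set()))
            choice = decide(current, options)
            if choice is None:
                return trace
            # A choice outside the known options becomes a new admissible alternative.
            self.add_alternative(current, choice)
            trace.append(choice)
            current = choice


hiring = KnowledgeProcess("Hiring")
hiring.add_alternative("Agree job description", "Post vacancy online")
hiring.add_alternative("Post vacancy online", "Review applicants")

# Alice's decisions for this instance, including an ad hoc step not in the model.
plan = {
    "Agree job description": "Post vacancy online",
    "Post vacancy online": "Ask Bob to search candidates",
    "Ask Bob to search candidates": "Review applicants",
}
trace = hiring.run("Agree job description", lambda task, options: plan.get(task))
print(trace)
print(sorted(hiring.successors["Post vacancy online"]))
```

After the run, "Post vacancy online" has two admissible successors: the ad hoc step chosen at execution time has become a new branch, which is the emerging structural ramification described in the last point of the definition.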


6.2.4 Knowledge Process Life Cycle

In general, the competitiveness of organizations depends on how efficiently and effectively they create, locate, capture, and share their knowledge and expertise (Zack 1999). A similar statement applies to knowledge processes, for which we define a lifecycle of four stages:

• Definition – Capture and Acquisition. Knowledge processes are either created by knowledge workers or acquired through interaction with others.
• Share – Storage and Retrieval. This stage bridges upstream repository creation and downstream knowledge distribution.
• Search. This stage covers the mechanisms used to make the content of the knowledge process repository searchable, as well as how useful the retrieved information is.
• Compare – Presentation. The value of knowledge is pervasively influenced by how it is visualized and by the features that the visualization tools provide.

6.3 Tools for Knowledge Processes Within the ACTIVE Project

This section covers the main tools and approaches we adopted in the ACTIVE project to deal with knowledge processes. These tools:
• Let knowledge workers create processes and tasks, and manage and structure them at different levels of detail. For example, a knowledge worker can define a process for scheduling a meeting as a set of tasks: complete the list of participants, check their availability, fill out the request to arrange rooms and refreshments, and wait for confirmation from administration.
• Let knowledge workers create a process by recording their actions at runtime.
• Let knowledge workers follow a step-by-step wizard that automates and simplifies some of the most common activities.
• Let knowledge workers visualize processes and how the different elements relate to each other.

6.3.1 Framework for Knowledge Process Management

The goal of the framework is to support knowledge workers in (re)structuring knowledge processes at runtime, following a top-down, user-driven approach (knowledge workers define their own processes, tasks, and related resources) in a lightweight manner (processes still depend on the knowledge worker who drives them). To this end, the framework provides the conceptual process model and data structures, as well as a set of services and applications that enable knowledge workers to manage knowledge processes.


The main components of the framework and the connections among them are shown in Fig. 6.2. The framework and its tools are integrated into the ACTIVE Knowledge Workspace. The components are:

The Task Pane – It is the front end, offering manual facilities for creating and managing processes at different levels of detail and helping users organize their daily work into processes and more fine-grained tasks. It can also be used to associate resources, such as documents, web addresses, or colleagues, with processes and tasks. In the Task Pane, the knowledge worker can also import and export processes and templates, or set the active task. This approach helps reduce the time spent on process switching, since the Task Pane offers an immediate way to change between processes, and alleviates context loss, since the Task Pane manages the context in terms of the tasks and information resources associated with a process.

[Figure: ACTIVE Knowledge Workspace components (ACTIVE Taskbar, Task Recording, Task Pane, Task Wizard, Template Manager) and services (Task Service, Context Mining Service, Semantic MediaWiki Task Manager) with the Template Repository and Task Repository; flows labelled templates, instances and patterns]

Fig. 6.2 Framework for knowledge process management


Task Recording (Fig. 6.3) – It enables knowledge workers to automatically record the sequence of actions on their system (e.g. opening a document, browsing a web page). Conceptually, it is comparable to the macro recording found in many applications (e.g. Microsoft Word). The knowledge worker can start, pause, and stop the recording session by pressing the corresponding buttons. The recorder collects all events and adds some metadata (e.g. associated resource, title, keywords). After stopping the recording, the user can rearrange the recorded session, structuring the sequence of actions by introducing new processes and tasks or by grouping actions into existing ones.
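Conceptually, the recording behaviour described above can be sketched as a small state machine that accepts events only while recording and lets the user group them into tasks once the session has been stopped. The `TaskRecorder` and `Event` classes, their fields and the example resources are hypothetical; the actual recorder captures real desktop events and richer metadata.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Event:
    action: str                      # e.g. "open document", "browse web page"
    resource: str                    # associated resource (file name, URL, ...)
    keywords: List[str] = field(default_factory=list)


class TaskRecorder:
    """Hypothetical recorder: captures desktop events only while recording."""

    def __init__(self) -> None:
        self.state = "stopped"
        self.events: List[Event] = []

    def start(self) -> None:
        # Starting from "stopped" opens a new session; from "paused" it resumes.
        if self.state == "stopped":
            self.events = []
        self.state = "recording"

    def pause(self) -> None:
        if self.state == "recording":
            self.state = "paused"

    def stop(self) -> None:
        self.state = "stopped"

    def on_event(self, event: Event) -> None:
        # Events arriving while paused or stopped are ignored.
        if self.state == "recording":
            self.events.append(event)

    def group(self, assignment: Dict[str, List[int]]) -> Dict[str, List[Event]]:
        """After stopping, group recorded events (by position) into named tasks."""
        if self.state != "stopped":
            raise RuntimeError("stop the recording before structuring it")
        return {task: [self.events[i] for i in idx] for task, idx in assignment.items()}


rec = TaskRecorder()
rec.start()
rec.on_event(Event("open document", "job_description.docx", ["hiring"]))
rec.pause()
rec.on_event(Event("browse web page", "http://example.org/news"))  # ignored while paused
rec.start()  # resume the same session
rec.on_event(Event("browse web page", "http://example.org/jobs", ["vacancy"]))
rec.stop()
tasks = rec.group({"Prepare vacancy": [0, 1]})
print(len(rec.events), [e.resource for e in tasks["Prepare vacancy"]])
```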

Task Wizard – Manipulating daily activities, in the form of templates or instances, requires considerable effort. Such manipulation is likely to be easier with a wizard, especially for complex or infrequently performed tasks where the user is unfamiliar with the steps involved. The Task Wizard therefore guides knowledge workers through a set of steps, each with a particular purpose: select a template or instance based on keywords, a tag cloud, or a recommendation; create a template or instance from a previous one or from scratch, with the wizard offering suggestions on task balance or related tasks; and, finally, share templates and instances in local and community repositories (e.g. by storing templates in a Semantic MediaWiki).
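The first wizard step, selecting a template by keywords, could be approximated by ranking templates according to the overlap between the user's keywords and each template's tags. The sketch below assumes that templates carry tag sets and uses Jaccard similarity as the ranking measure; both the measure and the function names are our assumptions rather than the Task Wizard's actual recommendation method.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap between two keyword sets, in [0, 1]."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_templates(query: Set[str], templates: Dict[str, Set[str]]) -> List[str]:
    """Return matching template names, best match first (ties alphabetical)."""
    scored = [(jaccard(query, tags), name) for name, tags in templates.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for score, name in scored if score > 0]


templates = {
    "Schedule a meeting": {"meeting", "room", "refreshments"},
    "Hiring": {"vacancy", "candidate", "interview"},
    "Book a trip": {"travel", "hotel", "flight"},
}
print(rank_templates({"meeting", "room"}, templates))     # ['Schedule a meeting']
print(rank_templates({"interview", "room"}, templates))   # ['Hiring', 'Schedule a meeting']
```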

Task Service – It is the core of the framework for knowledge processes. It defines the knowledge process model and the corresponding service interface for managing processes, tasks, and resources, as well as other operations (e.g. tagging). Furthermore, it tracks each task invocation: the actor who invoked it, the time, and the state of the process. This invocation information is exploited by the bottom-up approach, which mines actions and events to derive process prediction models (see Chap. 7).
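A minimal in-memory sketch of such a service, covering processes, tasks, attached resources, tagging and an invocation log (actor, time and process state), might look as follows. All class and method names are hypothetical and do not reproduce the ACTIVE Task Service interface.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Set


@dataclass
class Task:
    name: str
    resources: List[str] = field(default_factory=list)  # documents, URLs, colleagues
    tags: Set[str] = field(default_factory=set)


@dataclass
class Invocation:
    actor: str
    process: str
    task: str
    time: datetime
    state: str  # state of the process when the task was invoked


class TaskService:
    """Hypothetical in-memory task service (not the ACTIVE interface)."""

    def __init__(self) -> None:
        self.processes: Dict[str, Dict[str, Task]] = {}
        self.log: List[Invocation] = []

    def create_process(self, process: str) -> None:
        self.processes.setdefault(process, {})

    def add_task(self, process: str, task: str) -> Task:
        return self.processes[process].setdefault(task, Task(task))

    def attach_resource(self, process: str, task: str, resource: str) -> None:
        self.processes[process][task].resources.append(resource)

    def tag(self, process: str, task: str, *tags: str) -> None:
        self.processes[process][task].tags.update(tags)

    def invoke(self, actor: str, process: str, task: str, state: str = "running") -> Invocation:
        if task not in self.processes.get(process, {}):
            raise KeyError(f"unknown task {task!r} in process {process!r}")
        entry = Invocation(actor, process, task, datetime.now(timezone.utc), state)
        self.log.append(entry)  # raw material for bottom-up process mining
        return entry


svc = TaskService()
svc.create_process("Hiring")
svc.add_task("Hiring", "Post vacancy")
svc.attach_resource("Hiring", "Post vacancy", "job_description.docx")
svc.tag("Hiring", "Post vacancy", "vacancy", "online")
svc.invoke("Alice", "Hiring", "Post vacancy")
print([(e.actor, e.task, e.state) for e in svc.log])
```

The `log` list is the kind of invocation record that a bottom-up mining approach would consume.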

Fig. 6.3 Task recording


Semantic MediaWiki – This tool is used as a collaboration and sharing platform for templates. While knowledge processes are created on the client side, domain experts can create templates (manually, from scratch or from knowledge processes, or using mining techniques) and store them in a Semantic MediaWiki, where they can be visualized, discussed and exchanged with other knowledge workers. A screenshot of the Semantic MediaWiki as a corporate knowledge process template repository can be seen in Fig. 6.4.

Context Visualizer – Beyond these tools, a key factor in increasing the productivity of knowledge workers is helping them manage and understand their daily collaborative processes. These are lightweight, highly flexible processes that depend heavily on the context, which defines the boundaries and environment of the processes and how decisions are taken. Since each complex collaborative process occurs in some context, articulating that context helps in understanding the underlying relationships within the process. However, like any other complex system, contexts can be large and complex, and difficult to understand and control. Thus, although the visualization of complex collaborative processes can be approached from different angles, new ways of visualizing complex contexts are needed to help knowledge workers better understand their dynamics.

The Context Visualizer (Fig. 6.5) helps users understand complex collaborative processes by visualizing their corresponding contexts. It shows the elements related to a working context (knowledge processes, people, and resources), contextual information about the context and its elements, and direct relationships between related elements in the context (in red); icon size reflects each element's relevance in the context, and filtering options (e.g. by resource type) are provided.

Fig. 6.4 Semantic MediaWiki as template repository for knowledge processes
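Two of the visual cues of the Context Visualizer, icon size reflecting relevance and filtering by resource type, can be sketched as a small layout function. The linear size mapping, the pixel bounds, the relevance scores and the rule that filtering applies only to resources (processes and people are always shown) are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ContextElement:
    name: str
    kind: str                     # "process", "person" or "resource"
    resource_type: Optional[str]  # e.g. "document", "web page"; None otherwise
    relevance: float              # assumed relevance score in [0, 1]


def layout(elements: List[ContextElement], resource_type: Optional[str] = None,
           min_px: int = 16, max_px: int = 64) -> List[Tuple[str, int]]:
    """Return (name, icon size in pixels), optionally filtering resources by type."""
    shown = [e for e in elements
             if resource_type is None or e.kind != "resource"
             or e.resource_type == resource_type]
    sizes = []
    for e in shown:
        r = min(max(e.relevance, 0.0), 1.0)  # clamp to [0, 1]
        # Linear interpolation between the smallest and largest icon size.
        sizes.append((e.name, round(min_px + (max_px - min_px) * r)))
    return sizes


context = [
    ContextElement("Hiring", "process", None, 1.0),
    ContextElement("Dave", "person", None, 0.5),
    ContextElement("job_description.docx", "resource", "document", 0.75),
    ContextElement("http://example.org/jobs", "resource", "web page", 0.25),
]
print(layout(context, resource_type="document"))
# [('Hiring', 64), ('Dave', 40), ('job_description.docx', 52)]
```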

6.3.2 Collaborative Process Development

As stated above, knowledge workers should be able to define their own processes and share them with each other as templates. The components of the framework for knowledge process management support the sharing of process templates created by individual knowledge workers. The aggregate knowledge of a large group is often superior to that of one or a few experts. Thus, it is equally important to provide knowledge workers with the means to develop processes collaboratively.

Fig. 6.5 The Context Visualizer

Knowledge workers have varying experience in modelling processes, so those involved usually include novices in process modelling. Recker et al. (2010) investigated how novices model business processes in the absence of tool support. They found that the design representation forms chosen to conceptualize business processes range from predominantly textual, through hybrid, to predominantly graphical, and that combined graphical and textual forms achieve higher quality scores. Another study, analyzing which BPMN modelling constructs are used, shows that most BPMN diagrams regularly use less than 20% of the BPMN vocabulary, and that the most frequently occurring subset is the combination of tasks and sequence flows (zur Muehlen and Recker 2008). From these findings, the following requirements for collaborative process development can be derived:

• Manual modelling support for novice users. Knowledge workers who are novices in process modelling need manual modelling support so that they can create, extend and follow processes without the assistance of an expert. The tool requires a rich user interface that lets the user interact with processes in a highly intuitive manner. This leads to a trade-off between the expressivity offered for developing the formal process model and the usability of the tool.
• Collaboration support. Knowledge workers must be able to discuss process models asynchronously. Changes to the process model have to be tracked, and knowledge workers should be able to access the version history and revert to previous versions. In addition, design rationales should be documented.
• Structured process documentation support. The process models must be stored in a machine-processable documentation format, including semantic representations. Knowledge workers must be able to link process descriptions to external resources.

To support such collaborative, distributed, and iterative process development, we combined Semantic MediaWiki (SMW) (Krötzsch et al. 2007) with the open-source graphical process editor Oryx (Decker et al. 2008), allowing formal semantics to be used in combination with natural language to describe processes. SMW was extended to be compatible with Oryx so that it can act as a process knowledge repository. In addition, the graphical editor was extended to display and edit wiki pages at the bottom of the screen, as shown in Fig. 6.6, so that users can directly access the corresponding wiki page from within the process editor.

Fig. 6.6 Wiki-based process editor


We support the Basic Control-Flow Patterns introduced in Russell et al. (2006). Every single process step (task) is represented as a wiki page belonging to the category Process Element and linked via the property has Type to the corresponding type (Task) and via the property Belongs to Process to the corresponding process, which is itself represented as a wiki page (the process summary page) in SMW. An activity is the basic element of our process. Depending on the granularity of the process, an activity can range from an atomic action, such as opening a web page, to a whole subprocess.

To express the control flow of the process, we use edges in the diagram and special predefined process elements (gateways). If an element has a successor, we draw an edge from the activity to the successor activity in the diagram and store this link with the additional property has Successor on the corresponding wiki page in SMW. If several successors are executed in parallel (parallel-split pattern), a Parallel Gateway is used between the activities. If an activity has several successors but only one of them has to be selected and executed (multi-choice pattern), we use the Data-based Exclusive Gateway without conditions. The Data-based Exclusive Gateway with conditions is used to split based on a condition (exclusive-choice pattern); a condition is stored as an n-ary property. The synchronization and simple-merge patterns are distinguished by using the Parallel Gateway and the Data-based Exclusive Gateway, respectively, in the reverse direction to merge different branches of a process.
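The wiki representation described above can be illustrated by generating the annotations of a process element page. The sketch uses the standard SMW inline annotation syntax (`[[Property::Value]]` and `[[Category:...]]`) with the property names given in the text; representing a gateway as a process element whose has Type value is the gateway kind, the page titles, and the regular-expression read-back (a stand-in for a real SMW `#ask` query) are assumptions.

```python
import re
from typing import List


def process_element_page(process: str, successors: List[str],
                         element_type: str = "Task") -> str:
    """Render SMW annotations for one process element page.

    Property names follow the text (has Type, Belongs to Process,
    has Successor); treating gateways as element types is an assumption.
    """
    lines = [
        "[[Category:Process Element]]",
        f"[[has Type::{element_type}]]",
        f"[[Belongs to Process::{process}]]",
    ]
    lines += [f"[[has Successor::{s}]]" for s in successors]
    return "\n".join(lines)


def successors_of(wikitext: str) -> List[str]:
    """Read the has Successor annotations back out of a page's wikitext."""
    return re.findall(r"\[\[has Successor::([^\]]+)\]\]", wikitext)


# Page title -> wikitext; "Split 1" is a parallel split between two activities.
pages = {
    "Post vacancy": process_element_page("Hiring", ["Split 1"]),
    "Split 1": process_element_page("Hiring", ["Search candidates", "Review applicants"],
                                    element_type="Parallel Gateway"),
}
print(pages["Split 1"])
print(successors_of(pages["Split 1"]))
```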

The advantages of such an approach are:

• The combination of natural language with formal semantics allows collaborative modelling by both novices and experts. Textual and graphical elements can be used interchangeably and complementarily: if users do not know the graphical representation of a process element, they can describe it in natural language.
• This approach uses an extensible underlying schema. Users can introduce their own properties in the wiki by using the SMW property syntax on the process element wiki page. Thus, processes can be linked to existing knowledge structures (e.g. information about which input documents are used can be made available and machine-processable).
• Standard wiki features, such as versioning, watch lists, and reverting, can be used for process modelling.

• SMW acts as a process repository where processes and their process semantics

are stored. Process knowledge can be linked, queried and displayed on process

pages and on other wiki pages.

6.3.3 Refactoring and Optimization

The variety of ways of carrying out knowledge processes, and their underlying complexity, create the need for tools to refactor and optimize them. Such tools help knowledge workers either to restructure existing knowledge processes or to select a follow-up action and, ultimately, to increase the performance, dynamicity and flexibility of knowledge processes.


For this reason, knowledge processes need to be quantified so that they can be compared in terms of the different factors that might influence them. On the whole, three factors influence knowledge processes: firstly, business processes, which might trigger knowledge processes; secondly, knowledge processes, which might trigger knowledge processes of another person through social interaction (e.g. Bob has to prepare a presentation and asks Alice for input because he knows she prepared a related presentation); and thirdly, existing knowledge, whether from user experience or from document repositories or wikis within the organization.

To make different knowledge processes comparable so that they can be refactored and optimized, we developed a three-stage framework as part of the ACTIVE project:

1. Metrics and measures to quantify knowledge processes

To make knowledge processes comparable, a set of measures and metrics is defined to quantify knowledge process instances (also called knowledge process traces). These metrics are important for supporting semi-automatic refactoring of knowledge processes and are based on the factors above. In particular, complexity and cost metrics for business processes and workflows were adapted to knowledge processes. In addition, the metrics cover some knowledge-worker-specific values based on the worker's decisions about follow-up actions within a knowledge process, which draw on his or her skills and context. Because this is an important distinction between business processes and knowledge processes, we added metrics that directly reflect the skills and roles of knowledge workers.

We categorized the metrics into measurable metrics (such as size, performance or external costs), user-dependent metrics (such as skill or feasibility), and qualitative measures (such as quality of the result or satisfaction).

Aggregating these measures, an indicator called the knowledge process trace indicator provides an overall metric for comparing knowledge processes. The aggregation follows an approach based on the Logic Scoring of Preference (LSP) method (Dujmovic 2005), which expresses the knowledge process trace indicator as a function of metrics from all these categories. This provides a highly flexible way of aggregating different metrics: assigning a weight to each value makes the approach adaptable and tunable to different perspectives.

2. Foundation for knowledge process optimisation

Based on the formal definition from the previous stage, basic libraries for quantifying knowledge processes and calculating metrics and measures were implemented so that they can be reused for visualisation, refactoring, and optimization.

3. Tools for refactoring and optimization of knowledge processes

As the final step, we provide a tool that facilitates the refactoring of knowledge processes. A refactoring tool typically lets knowledge workers improve knowledge processes by applying a series of small behaviour-preserving transformations (e.g. joining two tasks), some of which also promote reuse (e.g. copying a task from one knowledge process to another). Although each transformation is small, their cumulative effect can be quite significant. Making such modifications in small steps reduces the risk of introducing errors while improving design, efficiency, and flexibility.
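The LSP-based aggregation used for the knowledge process trace indicator in the first stage can be illustrated with a single LSP aggregator. In the LSP method, elementary preferences in [0, 1] are combined by weighted power means, where the exponent r sets the degree of conjunction or disjunction (r = -inf gives the minimum, r = 0 the weighted geometric mean, r = 1 the weighted arithmetic mean, r = +inf the maximum). The sketch assumes each metric has already been normalised to [0, 1]; the scores and weights (one per metric category) are hypothetical, and a complete LSP model would combine several aggregators in a tree rather than use just one.

```python
import math
from typing import Sequence


def weighted_power_mean(scores: Sequence[float], weights: Sequence[float], r: float) -> float:
    """LSP-style aggregator: (sum_i w_i * e_i**r) ** (1/r), weights summing to 1."""
    if not scores or len(scores) != len(weights):
        raise ValueError("need one weight per score")
    if not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must sum to 1")
    if any(not 0.0 <= e <= 1.0 for e in scores):
        raise ValueError("elementary preferences must lie in [0, 1]")
    if math.isinf(r):
        return min(scores) if r < 0 else max(scores)  # full conjunction / disjunction
    if r == 0:
        return math.prod(e ** w for e, w in zip(scores, weights))  # weighted geometric mean
    if r < 0 and any(e == 0.0 for e in scores):
        return 0.0  # conjunctive aggregators (r < 0): any zero input yields zero
    return sum(w * e ** r for e, w in zip(scores, weights)) ** (1.0 / r)


# Hypothetical normalised scores: measurable, user-dependent, qualitative metric.
scores = [0.8, 0.6, 0.9]
weights = [0.5, 0.3, 0.2]
for r in (-math.inf, -1.0, 0.0, 1.0, math.inf):
    print(r, round(weighted_power_mean(scores, weights, r), 3))
```

Lowering r makes the indicator more demanding (closer to the weakest metric), while raising it lets a strong metric compensate for weak ones; together with the weights, this is where the perspective-dependent tuning happens.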

Figure 6.7 shows a screenshot of the tool, which is divided into three main sections. On the left is the knowledge process instance selected by the user for comparison and refactoring. In the centre is a set of related knowledge process instances, which can come from two different sources: they may have been created from the same template, or they may have been detected by the mining algorithm. On the right is the knowledge process template related to the instance selected by the user.

The combination of these tools can be exploited in several types of scenarios. For example, when a user starts a new knowledge process, typically only a small amount of information about the process and its context is available, so the log data is insufficient for process mining. In this case, user interaction is exploited and the process refactoring becomes semi-automatic, which helps to optimize the process more quickly and reliably. In addition, we might even offer refactoring recommendations that highlight the main differences between knowledge processes.
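A refactoring recommendation of this kind needs some way to highlight the differences between two knowledge processes. One simple possibility, sketched below under the assumption that processes are flat sequences of task names, is to align an instance against its template with the standard library's `difflib.SequenceMatcher`. The function name `process_differences` and the sample tasks are hypothetical and are not taken from the ACTIVE tools.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def process_differences(template: List[str],
                        instance: List[str]) -> List[Tuple[str, List[str], List[str]]]:
    """Return the non-matching regions between a template and an executed instance.

    Each entry is (operation, template_tasks, instance_tasks), where operation
    is one of difflib's opcodes: 'replace', 'delete' or 'insert'.
    """
    matcher = SequenceMatcher(a=template, b=instance, autojunk=False)
    differences = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":
            differences.append((op, template[i1:i2], instance[j1:j2]))
    return differences


template = ["collect ideas", "draft outline", "write text", "review", "submit"]
instance = ["collect ideas", "write text", "ask partner", "review", "submit"]

for op, planned, actual in process_differences(template, instance):
    print(f"{op}: template={planned} instance={actual}")
```

Here the instance skipped "draft outline" and added "ask partner"; a recommendation could then suggest updating either the template or future instances.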

6.3.4 Security and Privacy Issues

In today's knowledge economy, work is increasingly collaborative and heterogeneous: knowledge is defined, used and shared across different groups and domains. Sharing this knowledge allows users to create social relationships and to form working and knowledge-based groups beyond organizational boundaries, but it also poses new challenges and risks in terms of securing and understanding the newly shared information.

Fig. 6.7 The refactoring tool

This scenario raises the following two needs: (1) Knowledge workers need to be

provided with tools to create and manage virtual boundaries where knowledge is

shared with other workers. This means having flexible and powerful security

mechanisms and policies which can be defined dynamically and offer inference

mechanisms; (2) Knowledge workers need to be provided with visualization tools

to understand the complex relationships within knowledge processes.

These statements hold for any company whose typical scenario is collaborative, decentralized and heterogeneous, and where knowledge processes are shared within and across domains and organizations. In this context, we call these virtual boundaries, which connect different users and groups through the sharing of knowledge processes and which are subject to different security mechanisms, Knowledge Spheres.

Building on this concept, we propose the Knowledge Sphere Framework, a flexible ontology-based framework that handles security and privacy and automates the sharing of knowledge processes by knowledge workers across organizations. In a nutshell, the Knowledge Sphere Framework offers the underlying infrastructure to manage Knowledge Spheres and their security policies, together with a graphical tool that represents them and facilitates an understanding of their relations and related contents.

In more detail, the main components are:

Knowledge Sphere Ontology – This ontology provides concepts and properties to define how knowledge-based communities are created and formed, and how security and privacy policies are defined and satisfied. The Knowledge Sphere Ontology combines and extends two other ontologies: SIOC (Semantically-Interlinked Online Communities)4, which describes information from online communities and is used in the Knowledge Sphere Ontology to create Knowledge Spheres; and the KAoS ontologies5, which describe platform-independent policies, were originally oriented to dynamic and complex software agent applications, were later adapted to grid computing and Web Services, and are used in the Knowledge Sphere Ontology to define the security policies applied to Knowledge Spheres. While SIOC is very limited in terms of security or access control mechanisms, KAoS does not cover the actions associated with the security policies required to disclose knowledge processes (e.g. allowing access and granting modification rights). Fig. 6.8 shows an example of the Knowledge Sphere Ontology for the case of defining a Knowledge Sphere – individualKS1 – with a positive authorization policy – policyKS1_A – that has two access actions: a read access action – actioniSOCOLabRead – for people from iSOCOLabGroupMadrid, and a write access action – actioniSOCOLabWrite – for people from iSOCOLabGroupMadrid who have the administration role.
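For readers less familiar with ontologies, the Fig. 6.8 example can be approximated as a set of subject-predicate-object triples. The sketch below is a simplification: plain Python tuples stand in for OWL individuals, and the property names (`hasPolicy`, `hasAction`, `actorGroup`, `requiredRole`, `accessType`) are hypothetical, not the actual vocabulary of the Knowledge Sphere Ontology, SIOC or KAoS. Only the individual names are taken from the example above.

```python
from typing import Optional, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical triples approximating the individuals shown in Fig. 6.8.
TRIPLES: Set[Triple] = {
    ("individualKS1", "rdf:type", "KnowledgeSphere"),
    ("individualKS1", "hasPolicy", "policyKS1_A"),
    ("policyKS1_A", "rdf:type", "PositiveAuthorizationPolicy"),
    ("policyKS1_A", "hasAction", "actioniSOCOLabRead"),
    ("policyKS1_A", "hasAction", "actioniSOCOLabWrite"),
    ("actioniSOCOLabRead", "accessType", "read"),
    ("actioniSOCOLabRead", "actorGroup", "iSOCOLabGroupMadrid"),
    ("actioniSOCOLabWrite", "accessType", "write"),
    ("actioniSOCOLabWrite", "actorGroup", "iSOCOLabGroupMadrid"),
    ("actioniSOCOLabWrite", "requiredRole", "administration"),
}


def objects(subject: str, predicate: str) -> Set[str]:
    """All objects o such that (subject, predicate, o) is asserted."""
    return {o for s, p, o in TRIPLES if s == subject and p == predicate}


def allowed_access(sphere: str, group: str, role: Optional[str]) -> Set[str]:
    """Access types granted in `sphere` to a member of `group` holding `role`."""
    granted: Set[str] = set()
    for policy in objects(sphere, "hasPolicy"):
        if "PositiveAuthorizationPolicy" not in objects(policy, "rdf:type"):
            continue
        for action in objects(policy, "hasAction"):
            if group not in objects(action, "actorGroup"):
                continue
            required = objects(action, "requiredRole")
            if required and role not in required:
                continue  # action is restricted to a role this user lacks
            granted |= objects(action, "accessType")
    return granted


print(sorted(allowed_access("individualKS1", "iSOCOLabGroupMadrid", None)))
print(sorted(allowed_access("individualKS1", "iSOCOLabGroupMadrid", "administration")))
```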

4 http://sioc-project.org/ontology
5 http://ontology.ihmc.us/


Knowledge Sphere Service – This service manages all actions related to Knowledge Spheres, including a security module that infers whether users have permission to carry out a given action in a given Knowledge Sphere. The service runs ontology inference on the Knowledge Sphere Ontology, transforms security policies into semantic rules, and finally resolves the permission request.

The following example illustrates the Knowledge Sphere Service. A user – John Sowa – wants to carry out an action – Delete a knowledge process – on the Knowledge Sphere – Proposal Making. Several policies are assigned to the Proposal Making Knowledge Sphere. For John Sowa to be able to delete a knowledge process, there must exist a policy that grants him permission. In this example, there is a security policy stating that members of the group ProposalAdministrators can execute the action Delete Knowledge Process on this Knowledge Sphere. Since John Sowa is not a member of this group, he has no right to perform the action and cannot delete the knowledge process.
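The deny-by-default reasoning in this example can be expressed as a single rule: a user may perform an action on a Knowledge Sphere only if some policy attached to that sphere grants the action to a group the user belongs to. The Python sketch below encodes exactly that rule. The `Policy` record, the membership table, the `is_permitted` function, the extra group `ProposalWriters`, the action `Edit Knowledge Process` and the user `Ana Admin` are all hypothetical simplifications of what the Knowledge Sphere Service obtains through ontology inference and semantic rules.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class Policy:
    """Hypothetical positive authorization: members of `group` may perform `action`."""
    sphere: str
    group: str
    action: str


POLICIES: List[Policy] = [
    Policy("Proposal Making", "ProposalAdministrators", "Delete Knowledge Process"),
    Policy("Proposal Making", "ProposalWriters", "Edit Knowledge Process"),
]

# Group memberships; in the framework these would be inferred from the ontology.
MEMBERSHIPS: Dict[str, Set[str]] = {
    "John Sowa": {"ProposalWriters"},
    "Ana Admin": {"ProposalAdministrators"},
}


def is_permitted(user: str, action: str, sphere: str) -> bool:
    """Deny by default: permit only if a policy grants `action` to one of the user's groups."""
    groups = MEMBERSHIPS.get(user, set())
    return any(
        p.sphere == sphere and p.action == action and p.group in groups
        for p in POLICIES
    )


print(is_permitted("John Sowa", "Delete Knowledge Process", "Proposal Making"))
```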

Knowledge Sphere Visualization Tool (Fig. 6.9) – This is a graphical tool to represent and visualize Knowledge Spheres, facilitating the understanding of their relations and related contents. It comprises three main parts: the upper menu bar; the Diagram Pane on the right-hand side, which shows the graphical representation of the Knowledge Spheres; and the Information Pane on the left-hand side, which shows information about the elements explored in the Diagram Pane.

6.4 Conclusions

As a consequence of the increasing exchange of information and collaborative decision-making among knowledge workers, interest in knowledge processes has risen. Knowledge processes differ from business processes and have a nature of their own: they are user-driven, informal, dynamic and adaptive, based on communication, not well structured, and dependent on the skills, knowledge and experience of users. Moreover, performance in knowledge processes shapes the productivity of knowledge workers and critically determines the eventual success or failure of a corporation in the knowledge society.

Fig. 6.8 Knowledge sphere ontology example

In this chapter, we provide an insight into knowledge processes as a key challenge for modern organizations, their main differences from business processes, and approaches to managing them. The effective exploitation and support of knowledge processes depend on providing knowledge workers with useful tools and applications for dealing with knowledge processes, tools that facilitate their comprehension and take into account the concrete contextual factors relevant to each process.

With this purpose, we present several building blocks combined into a holistic view of knowledge process management, which enable knowledge workers to improve their daily work with respect to both formal and informal processes (Fig. 6.10). The steps can be combined as part of the knowledge process lifecycle as follows:

• Define: the knowledge worker defines a sequence of tasks and actions as a knowledge process. In addition, a manager or a domain expert can define process templates to be reused.

• Share: the Semantic MediaWiki is used to share process templates with others and to open them up for collaboration and improvement. The same applies to knowledge processes.

• Search: there are various ways to search for processes, such as the knowledge sphere visualisation.

• Compare: the refactoring tool supports the comparison of process templates with instances. This provides feedback about real executions compared with the envisioned templates.

• Redefine: the features supported by the Task Pane and the knowledge gained from the comparison can be used to redefine and improve knowledge processes and templates.

Fig. 6.9 Knowledge sphere visualization tool

Fig. 6.10 Knowledge process lifecycle covered by the ACTIVE tools


7

Machine Learning Techniques for Understanding Context and Process

Marko Grobelnik, Dunja Mladenic, Gregor Leban, and Tadej Stajner

7.1 Introduction

Machine Learning techniques have been under development since the middle of the twentieth century. Their focus was originally on game-playing programs and later moved on to finding regularities in database records (of patients, supermarket transactions, credit card activity, stock exchange activity, etc.), modeling scientific measurements (equation discovery, ecological modelling, etc.), speech recognition, user modeling and spam filtering. Current applications include real-time data modeling (real-time sensing behavior, tracking moving objects), semantic data annotation, analysis of semantic sensor networks and monitoring real-time cyber data to track interests and opinions. The capabilities of the available technology offer practical benefits but also raise privacy issues (Mitchell 2009). In this chapter, we focus on the machine learning techniques themselves and do not deal with privacy or other related issues.

7.2 Machine Learning

In general, we can say that Machine Learning seeks to answer the question “How

can we build computer systems that automatically improve with experience, and

what are the fundamental laws that govern all learning processes?” (Mitchell 1997).

More formally, a machine learns with respect to a particular task T, performance metric P, and type of experience E, if the system reliably improves its performance P at task T following experience E. Depending on how we specify T, P, and E, the learning task might also be called by names such as data mining, autonomous discovery, database updating, programming by example, etc. (Mitchell 2006).

M. Grobelnik (*) • D. Mladenic • G. Leban • T. Stajner

Artificial Intelligence Laboratory, Jozef Stefan Institute, Jamova 39, Ljubljana SI-1000, Slovenia

e-mail: [email protected]; [email protected]; [email protected]; [email protected]

For instance, we can have a task T: categorize e-mail messages as spam or legitimate; a performance metric P: the percentage of e-mail messages correctly classified; and experience E: a database of e-mails, some with human-given labels. Supervised machine learning techniques can be applied to the database of e-mails to build a model for classifying e-mails as spam or legitimate. These techniques usually require the data to be represented as feature vectors, so we need to define a set of features that can be used to describe e-mails and whose values we can obtain from the database of e-mails (e.g., the date the e-mail was sent, words in the e-mail subject, words in the e-mail body), sometimes including additional knowledge sources (e.g., the professional skills of the person receiving the e-mail).
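As a minimal illustration of the T/P/E framing, the sketch below uses a tiny hand-made set of labelled messages as experience E, a multinomial naive Bayes classifier over subject and body words for task T, and accuracy as performance metric P. Naive Bayes is only one of many possible supervised techniques and is chosen here for brevity; the messages, the tokenizer and the function names are all hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Set, Tuple

Email = Tuple[str, str]  # (subject, body)
Model = Tuple[Counter, Dict[str, Counter], Set[str]]


def _words(text: str) -> List[str]:
    return re.findall(r"[a-z]+", text.lower())


def features(email: Email) -> List[str]:
    """Words from subject and body; prefixes keep the two fields apart."""
    subject, body = email
    return [f"subj:{w}" for w in _words(subject)] + [f"body:{w}" for w in _words(body)]


def train(data: List[Tuple[Email, str]]) -> Model:
    """Experience E: estimate class priors and per-class feature counts."""
    priors = Counter(label for _, label in data)
    counts: Dict[str, Counter] = {label: Counter() for label in priors}
    for email, label in data:
        counts[label].update(features(email))
    vocab = {f for c in counts.values() for f in c}
    return priors, counts, vocab


def classify(model: Model, email: Email) -> str:
    """Task T: label with the highest log posterior, using add-one (Laplace) smoothing."""
    priors, counts, vocab = model
    total = sum(priors.values())

    def log_posterior(label: str) -> float:
        denom = sum(counts[label].values()) + len(vocab)
        score = math.log(priors[label] / total)
        return score + sum(math.log((counts[label][f] + 1) / denom) for f in features(email))

    return max(priors, key=log_posterior)


def accuracy(model: Model, data: List[Tuple[Email, str]]) -> float:
    """Performance metric P: percentage of correctly classified e-mails."""
    correct = sum(classify(model, email) == label for email, label in data)
    return 100.0 * correct / len(data)


training = [
    (("Win money now", "claim your free prize money"), "spam"),
    (("Free offer", "cheap pills free shipping"), "spam"),
    (("Project meeting", "agenda for the project meeting tomorrow"), "legitimate"),
    (("Deliverable draft", "please review the deliverable draft"), "legitimate"),
]
held_out = [
    (("Free money", "claim your prize"), "spam"),
    (("Meeting agenda", "review the project agenda"), "legitimate"),
]

model = train(training)
print(accuracy(model, held_out))
```

With four training messages the numbers mean little; the point is how T, P and E map onto code.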

7.2.1 Data Representation and Modeling

Data points used in Machine Learning are usually represented as feature vectors in an n-dimensional space that captures important characteristics of the data. Depending on the data being represented, features can have numerical values (e.g., the size of an e-mail) or categorical values (e.g., the position of an e-mail receiver in the company's organizational structure). When representing text, it is common to assign a feature to each word in a document collection and to represent each document as a vector reflecting word frequencies (Mladenic 2007). When representing images, features can correspond to the pixels of an image.
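The word-frequency representation of text described above can be sketched in a few lines of Python: build a vocabulary over the whole document collection, with one feature per word, and map each document to the vector of its word counts. The tokenizer is a deliberately naive assumption; real systems typically add stop-word removal, stemming and weighting such as TF-IDF.

```python
import re
from collections import Counter
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    """Naive tokenizer: lower-cased alphabetic runs."""
    return re.findall(r"[a-z]+", text.lower())


def bag_of_words(docs: List[str]) -> Tuple[List[str], List[List[int]]]:
    """Return the sorted vocabulary and one word-frequency vector per document."""
    vocabulary = sorted({word for doc in docs for word in tokenize(doc)})
    vectors = []
    for doc in docs:
        counts = Counter(tokenize(doc))
        vectors.append([counts[word] for word in vocabulary])
    return vocabulary, vectors


docs = [
    "Project meeting moved to Friday",
    "Agenda for the project meeting",
    "Free prize, claim your free prize",
]
vocabulary, vectors = bag_of_words(docs)
print(vocabulary)
for vector in vectors:
    print(vector)
```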

The number of features also varies depending on the data and the task. In some cases, the original representation of the data lies in a very high-dimensional space, and some dimensionality reduction needs to be applied in the pre-processing steps of the data analysis (Mladenic 2006).
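One simple form of dimensionality reduction for such high-dimensional text vectors is feature selection: keeping only the k features that score best under some criterion and discarding the remaining dimensions. The sketch below uses document frequency (the number of documents a word occurs in) as the criterion. This is only one of many scoring measures used for feature selection, and the toy data and the value of k are arbitrary illustrative choices.

```python
from typing import List, Tuple


def select_by_document_frequency(
    vocabulary: List[str], vectors: List[List[int]], k: int
) -> Tuple[List[str], List[List[int]]]:
    """Keep the k words that occur in the most documents; ties are broken alphabetically."""
    df = [sum(1 for v in vectors if v[i] > 0) for i in range(len(vocabulary))]
    ranked = sorted(range(len(vocabulary)), key=lambda i: (-df[i], vocabulary[i]))
    keep = sorted(ranked[:k])  # preserve the original column order
    reduced_vocab = [vocabulary[i] for i in keep]
    reduced_vectors = [[v[i] for i in keep] for v in vectors]
    return reduced_vocab, reduced_vectors


vocabulary = ["agenda", "free", "meeting", "prize", "project"]
vectors = [
    [0, 0, 1, 0, 1],
    [1, 0, 1, 0, 1],
    [0, 2, 0, 2, 0],
]
print(select_by_document_frequency(vocabulary, vectors, 2))
```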

In addition, data points may or may not have labels (e.g., an e-mail can be categorized as legitimate or spam). Tasks that need labels call for supervised machine learning techniques (e.g., classification) to model the data, while tasks that do not use labels suggest that unsupervised machine learning techniques (e.g., clustering) should be applied.

Following the previous example on e-mail filtering, suppose we have a collection of e-mails that we would like to organize into folders, assuming that no folders exist yet. The e-mails can be represented as feature vectors in much the same way as for the classification task; the difference is that the output is a model that groups similar e-mails together. These groups can then be used to classify new e-mails as belonging to one of the identified groups.
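The folder-organisation example can be sketched with k-means, one of the most common clustering algorithms: e-mails represented as vectors are grouped into k clusters, and a new e-mail is then assigned to the cluster with the nearest centroid. The two-dimensional vectors, the choice k = 2 and the deterministic initialisation from the first k points are assumptions that keep the example short; practical systems use richer features and better initialisation schemes such as k-means++.

```python
import math
from typing import List, Sequence

Vector = List[float]


def distance(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(point: Sequence[float], centroids: List[Vector]) -> int:
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda c: distance(point, centroids[c]))


def kmeans(points: List[Vector], k: int, iterations: int = 20) -> List[Vector]:
    """Plain Lloyd's k-means; the initial centroids are the first k points."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iterations):
        clusters: List[List[Vector]] = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        new_centroids = [
            [sum(dim) / len(members) for dim in zip(*members)] if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # converged: recomputing the centroids no longer changes them
        centroids = new_centroids
    return centroids


# Hypothetical 2-D e-mail vectors, e.g. (project-term weight, social-term weight).
emails = [[0.9, 0.1], [0.1, 0.8], [0.8, 0.2], [0.2, 0.9], [0.85, 0.15]]
centroids = kmeans(emails, k=2)
new_email = [0.75, 0.3]
print(nearest(new_email, centroids))  # folder (cluster) index for the new e-mail
```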

The latter is similar to the scenario that we have followed in modeling context. First, unsupervised machine learning techniques are used for context discovery, based on the history of users' activity. Then, the current user activity is classified into contexts, and related resources are recommended for probable use in the near future.

The rest of this section describes machine learning techniques that are directly

relevant for understanding context and process.

7.2.2 Multi-relational Clustering

The problem of identifying the parts of data collections relevant to each context can be approached as a multi-relational clustering problem. The motivation for using relational clustering in context modeling is that a domain data model of knowledge work can be represented as a graph with multiple types of nodes: the knowledge workers themselves are one type, the information resources they use are another, and the events that represent the uses of the resources are a third. Since the nodes are of distinct types, we cannot assume that their features are independent and identically distributed, an assumption made implicitly by classic clustering techniques that are agnostic of node type (Banerjee et al. 2007). For instance, assume that we have three tables: (1) a table of people working on the contexts, giving some basic information on each person; (2) a table of documents written in the contexts, containing the text of each document and a list of its authors; and (3) a table of events recording that a person accessed a document at some time (related to the table of people and the table of documents). In general, one can assume that knowledge workers working on a project switch between different tasks related to the project, and that each task consists of a set of actions performed in a more or less fixed order. With multi-relational clustering, we would like to cluster people, documents and events so that events related to the same project are clustered together, and events related to the same action are clustered together.
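Classic clustering algorithms expect a single table of feature vectors, so one simple workaround (sometimes called propositionalisation) for the three-table example is to join each event with its person and document and to keep the feature types apart by prefixing them. The sketch below shows only this feature-construction step plus a cosine similarity that a clustering algorithm could use. It is a simplified stand-in for dedicated multi-relational clustering algorithms, not an implementation of them, and the table contents are invented for illustration.

```python
import math
from collections import Counter
from typing import Dict

# Hypothetical versions of the three tables from the text.
people = {
    "p1": {"name": "Ana", "affiliation": "OrgA"},
    "p2": {"name": "Bob", "affiliation": "OrgB"},
}
documents = {
    "d1": {"text": "context mining deliverable", "authors": ["p1"]},
    "d2": {"text": "context mining evaluation", "authors": ["p1"]},
    "d3": {"text": "budget report", "authors": ["p2"]},
}
events = {
    "e1": {"person": "p1", "document": "d1"},
    "e2": {"person": "p1", "document": "d2"},
    "e3": {"person": "p2", "document": "d3"},
}


def event_features(event_id: str) -> Counter:
    """Join an event with its person and document; prefixes keep node types apart."""
    event = events[event_id]
    person = people[event["person"]]
    document = documents[event["document"]]
    features: Counter = Counter()
    features[f"person:{person['name']}"] += 1
    features[f"affiliation:{person['affiliation']}"] += 1
    for word in document["text"].split():
        features[f"word:{word}"] += 1
    for author in document["authors"]:
        features[f"author:{author}"] += 1
    return features


def cosine(a: Dict[str, int], b: Dict[str, int]) -> float:
    dot = sum(a[f] * b.get(f, 0) for f in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


vectors = {e: event_features(e) for e in events}
print(cosine(vectors["e1"], vectors["e2"]), cosine(vectors["e1"], vectors["e3"]))
```

Events e1 and e2 share a person and most document words, so they end up close, while e3 shares nothing with them.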

Besides choosing an appropriate algorithm to fit the data model, another important aspect of context discovery is selecting appropriate features (Stajner et al. 2010). This varies from domain to domain, but unless we have supervised examples from which to select features, we can resort to the following guideline: at the data level, different contexts can be distinguished by their content keywords and by the resources and people involved in their events. This means that context discovery uses the literal names and affiliations of people as features, as well as the literal contents of documents, since these are the features we have determined to be important for describing a context.

7.2.3 Semi-supervised Clustering

While clustering has proven to be effective in exploratory and large-scale context discovery, context modeling in an enterprise environment sometimes imposes stricter accuracy requirements on context quality than unsupervised methods can meet. From the usability point of view, the semi-supervised aspect also lets knowledge workers feel more in control of the whole context definition process, delivering better quality and more meaningful clusters overall.

In recent years, there has been a lot of interest in the development of semi-supervised clustering methods that accommodate supervision from the user. Most often, such supervision takes the form of constraints (Basu et al. 2008), for instance must-link and cannot-link constraints over pairs of data points (Wagstaff and Cardie 2000) or cluster-level balancing and size control (Davidson and Ravi 2005). However, instance-level models of supervision require the human supervisor to understand and visualize the whole collection of data points in its entirety, which is rarely practical. On the other hand, cluster-level models may be too coarse a tool for guiding the clustering in the desired direction.
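To make instance-level constraints concrete, the sketch below shows a constrained assignment step in the spirit of COP-KMeans: each point goes to the nearest cluster that does not violate a must-link or cannot-link constraint with the points already assigned, and the step fails if no such cluster exists. It is a simplified illustration with invented one-dimensional data and a fixed processing order, not the algorithm of any of the works cited above.

```python
from typing import Dict, List, Optional, Set, Tuple

Pair = Tuple[int, int]


def partner(pair: Pair, point: int) -> Optional[int]:
    """The other point of a constraint pair, or None if `point` is not in it."""
    a, b = pair
    if a == point:
        return b
    if b == point:
        return a
    return None


def violates(point: int, cluster: int, assignment: Dict[int, int],
             must_link: Set[Pair], cannot_link: Set[Pair]) -> bool:
    """True if putting `point` into `cluster` breaks a constraint with an assigned point."""
    for pair in must_link:
        other = partner(pair, point)
        if other is not None and other in assignment and assignment[other] != cluster:
            return True
    for pair in cannot_link:
        other = partner(pair, point)
        if other is not None and assignment.get(other) == cluster:
            return True
    return False


def constrained_assign(points: List[float], centroids: List[float],
                       must_link: Set[Pair], cannot_link: Set[Pair]) -> Optional[Dict[int, int]]:
    """Assign each point to its nearest feasible centroid; None if some point has no option."""
    assignment: Dict[int, int] = {}
    for i, x in enumerate(points):
        order = sorted(range(len(centroids)), key=lambda c: abs(x - centroids[c]))
        feasible = [c for c in order if not violates(i, c, assignment, must_link, cannot_link)]
        if not feasible:
            return None  # constraints cannot be satisfied in this processing order
        assignment[i] = feasible[0]
    return assignment


points = [1.0, 1.2, 5.0, 5.3, 3.2]
centroids = [1.0, 5.0]
print(constrained_assign(points, centroids, must_link={(0, 4)}, cannot_link={(2, 3)}))
```

Without constraints, points 3 and 4 would join the cluster around 5.0; the must-link pulls point 4 towards point 0, and the cannot-link pushes point 3 away from point 2.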

As a complementary supervision technique, we can also apply feedback from the

user as conditional constraints (Dubey et al. 2010). For instance, instead of asserting

the exact pair-wise linkage constraints between data points, the user can give

assignment feedback, where he reassigns a data point to a more relevant cluster,

or alternatively, cluster description feedback, where the user modifies a cluster’s

feature vector to a more meaningful description.

These constraints are commonly implemented as feedback loops that gather supervision data from the user in iterative steps. They can also be integrated into the k-means framework as an additional criterion that minimizes constraint violation while still maintaining low clustering error. Experiments in (Dubey et al. 2010) conclude that cluster-level assignment and description feedback enables faster convergence and better results than pair-wise must-link and cannot-link constraints.
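One way to read "an additional criterion for minimizing constraint violation" is a penalised assignment cost: the usual squared distance to a centroid plus a penalty whenever a point is placed outside the cluster the user assigned it to. The sketch below implements only that assignment rule (a full k-means loop would alternate it with centroid updates). The penalty weight and the feedback format are assumptions made for illustration, not the formulation of Dubey et al. (2010).

```python
from typing import Dict, List


def penalised_assignment(points: List[List[float]], centroids: List[List[float]],
                         feedback: Dict[int, int], penalty: float) -> List[int]:
    """Assign points by squared distance plus `penalty` for ignoring user feedback.

    `feedback` maps a point index to the cluster the user moved it to
    (assignment feedback). A large penalty effectively pins those points;
    a small one lets strong geometric evidence override the user.
    """
    labels = []
    for i, p in enumerate(points):
        def cost(c: int) -> float:
            sq_dist = sum((x - y) ** 2 for x, y in zip(p, centroids[c]))
            violated = i in feedback and feedback[i] != c
            return sq_dist + (penalty if violated else 0.0)
        labels.append(min(range(len(centroids)), key=cost))
    return labels


points = [[0.0, 0.0], [0.4, 0.0], [1.0, 0.0]]
centroids = [[0.0, 0.0], [1.0, 0.0]]
feedback = {1: 1}  # the user says point 1 belongs to cluster 1
print(penalised_assignment(points, centroids, feedback, penalty=0.0))
print(penalised_assignment(points, centroids, feedback, penalty=1.0))
```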

Semi-supervised clustering can be applied to context mining with both assignment and cluster description feedback. When the user explicitly assigns an information resource to a context, we can interpret that as assignment feedback. Cluster description feedback, on the other hand, has been implemented in a different fashion: instead of letting the user change the context description, we let the user decide with a yes-no question whether the context is relevant.

7.2.4 Sequence Mining

Various flavors of Markov models, such as probabilistic deterministic finite automata (Jacquemont et al. 2009), have been used for modeling processes, on the assumption that a process can be seen as a collection of sequences. Modeling a process can then be addressed as identifying sequences in the data and connecting them into a meaningful process model. The general idea is that by learning a conditional probability model for the sequences, we are able to implement predictive functionality in the knowledge worker's workspace. In practice, there are several applications where predictive functionality based on sequence mining can be used (Dong and Pei 2007).

The first example is suggesting the most likely information resource for a given moment. For this application, we look at the history of the knowledge worker's resource usage: internet browsing history, opening various documents, communication via e-mail. We represent these individual events as actions, giving them a level of abstraction. For instance, edits to a deliverable document can be seen as edit-project-document actions. The process model then captures the sequential dependencies between these events and gives us the conditional probability of any resource being required at the current time, given the immediate history of information resource usage. For instance, imagine that we are editing a project-related document (such as a deliverable). The model can then suggest that, of all possible future actions, we will likely send a message to a project partner soon or open a similar project-related document. The advantage of this prediction is that we can maintain a list of the most likely resources at any given time, reducing the overhead of having to look for them.
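The resource-suggestion idea can be illustrated with the simplest member of the Markov family: a first-order model estimated by counting transitions between consecutive actions in the usage history. The top-k suggestion list then consists of the most frequent successors of the current action. The action labels below are invented; real systems would add smoothing and possibly a longer history, as discussed further on.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def fit_transitions(history: List[str]) -> Dict[str, Counter]:
    """Count how often each action is immediately followed by each other action."""
    transitions: Dict[str, Counter] = defaultdict(Counter)
    for current, following in zip(history, history[1:]):
        transitions[current][following] += 1
    return transitions


def suggest(transitions: Dict[str, Counter], current: str, k: int = 2) -> List[str]:
    """Top-k next actions, i.e. those with the highest estimated P(next | current)."""
    return [action for action, _ in transitions.get(current, Counter()).most_common(k)]


def probability(transitions: Dict[str, Counter], current: str, following: str) -> float:
    """Maximum-likelihood estimate of P(following | current)."""
    counts = transitions.get(current, Counter())
    total = sum(counts.values())
    return counts[following] / total if total else 0.0


history = [
    "edit-project-document", "email-project-partner", "edit-project-document",
    "open-related-document", "edit-project-document", "email-project-partner",
    "browse-web", "edit-project-document", "email-project-partner",
]
model = fit_transitions(history)
print(suggest(model, "edit-project-document"))
print(probability(model, "edit-project-document", "email-project-partner"))
```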

Figure 7.1 shows an example of such sequence-based resource suggestion, with two distinct sets of recommendations. It also demonstrates the integration of context and sequence mining: besides sequence information, we also favor resources that are relevant to the current context.

Fig. 7.1 Different sets of resource predictions, the first predicting most likely resources after

opening a “validation and testing” document, and the second after opening a document with “sdk”

technical documentation


When designing a sequence mining application, several aspects of the problem must be considered in order to choose a good model.

The first question that needs to be answered concerns the relationship between events and actions: should we define them atomically as one-to-one, or can a single event represent multiple actions? For instance, an ordinary e-mail message may in some domains only be an e-mail-to-project-partner, but in other domains it may at the same time be the action e-mail-proposal-to-key-client. This requires us to model a single event with multiple features, which calls for a different class of sequence models.

A second consideration is the order of the model: a first-order model only captures direct sequential dependencies, while a higher-order model can predict more interesting and longer sequences, similar to what n-gram language models can produce.

A third consideration is the definition of “sequence” itself. Do we consider a sequence to be strict, as in B-immediately-follows-A, or do we impose the less strict A-before-B? All of these choices yield richer models, but they also increase the complexity of the model and constrain the scalability of the approach.
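The difference between the strict and the looser notion of sequence can be made precise with two small predicates: in the strict reading, the actions of a pattern must occur as a contiguous run; in the looser A-before-B reading, any number of other actions may occur in between. The sketch below counts, over a few hypothetical sessions, how many sessions support each reading of the same pattern. The looser reading finds more support, but checking it for many candidate patterns is more expensive.

```python
from typing import List, Sequence


def occurs_strictly(session: Sequence[str], pattern: Sequence[str]) -> bool:
    """True if `pattern` appears as a contiguous run of actions."""
    n = len(pattern)
    return any(list(session[i:i + n]) == list(pattern) for i in range(len(session) - n + 1))


def occurs_in_order(session: Sequence[str], pattern: Sequence[str]) -> bool:
    """True if the actions of `pattern` appear in order, with gaps allowed."""
    remaining = iter(session)
    # `in` on an iterator consumes it up to the match, so order is enforced.
    return all(action in remaining for action in pattern)


sessions: List[List[str]] = [
    ["open-deliverable", "email-partner", "browse-web"],
    ["open-deliverable", "browse-web", "email-partner"],
    ["email-partner", "open-deliverable"],
]
pattern = ["open-deliverable", "email-partner"]
strict_support = sum(occurs_strictly(s, pattern) for s in sessions)
loose_support = sum(occurs_in_order(s, pattern) for s in sessions)
print(strict_support, loose_support)
```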

7.3 Context and Process

Context, as addressed in this chapter, is a term for grouping or packaging information for a particular need. A criterion for selecting or prioritizing information from a broader pool of information can be called a contextual model. A good contextual model is one that selects the right information for a particular need; in some cases, this may be related to personalized information delivery.

Process as addressed in this chapter is seen as a sequence of tasks or activities

performed in order to achieve some goal. More about knowledge processes is

described in Chap. 6. Here we focus on using machine learning techniques for

automatically grouping the actions of the user into tasks and detecting common

sequences of tasks that can occur independently of the context.

In the research literature (Guo and Sun 2003; ECAI 2006; VUT 2007), we can find different views on context, depending on the aspect being observed and the purpose for which context is used. For instance, Wikipedia describes context analysis (http://en.wikipedia.org/wiki/Context_analysis) as a method to analyze the environment in which a business operates. Depending on the aims of observing context, context can be defined from a formal point of view, or in terms of its dynamics, the scenarios of its usage, the type of data it is modeled from, its usage/user base, the importance of efficiency, the acquisition of context, or the evaluation of context. Examples of context characteristics along several dimensions are the following:


• Formalism: logic based versus probabilistic descriptions

• Dynamics: static versus temporal/dynamic contexts

• Scenarios: global model versus multiple local models

• Cross-modal: modeling contexts across different data modalities (text,

multilinguality, social networks, audio, images, video, sensors)

• User aspects: context for single user versus user communities versus contexts

for machine processing

• Efficiency: expressivity versus scalability

• Acquisition: manual versus semi-automatic versus automatic approaches

• Best practices: economy of different application scenarios

• Evaluation: information compression (not really addressed yet)

7.3.1 Representation

Context representation, as proposed in this chapter, is built on top of the Time-Network-Text (TNT) store (Grobelnik et al. 2009), which captures and stores the data for further processing by context mining and task mining approaches. The context mining capabilities of the TNT store aim at identifying the main context of the knowledge worker from a streamed log of events. Contexts are represented as abstractions of the TNT events, where each event has three components, Text, Network and Time, as defined by the TNT model. Note that several TNT events can overlap in the content of their components. For instance, a person sending five e-mail messages appears in five TNT events as a sender, and such overlaps can be helpful in identifying context and tasks.
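The three-component TNT event and the overlap between events can be sketched with a small data class. The field names, the e-mail addresses and the `events_with_sender` helper are hypothetical and do not reflect the actual schema of the TNT store (Grobelnik et al. 2009); the point is only that a shared network component, such as the same sender, links otherwise separate events.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import FrozenSet, List


@dataclass(frozen=True)
class TNTEvent:
    """Hypothetical TNT event: Text, Network (people involved) and Time components."""
    text: str
    sender: str
    recipients: FrozenSet[str]
    time: datetime


def events_with_sender(events: List[TNTEvent], person: str) -> List[TNTEvent]:
    """All events whose network component lists `person` as the sender."""
    return [e for e in events if e.sender == person]


events = [
    TNTEvent(f"status update {i}", "alice@example.org",
             frozenset({"bob@example.org"}), datetime(2010, 5, 3, 9 + i))
    for i in range(5)
] + [
    TNTEvent("budget question", "carol@example.org",
             frozenset({"alice@example.org"}), datetime(2010, 5, 3, 15)),
]
print(len(events_with_sender(events, "alice@example.org")))
```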

The representation of TNT events captures information about data objects

involved in the TNT event and it is used as a basis for context representation.

Basing context representation on data objects, we examine the characteristics of the

data objects that are available within an enterprise environment. In the proposed

context definition, each data object (e.g. documents, email, etc.) is represented as a

vector of a potentially large number of features. These features encode several

properties of the data; the following are the most important:

• Metadata about the application taking part in the event, email metadata, document metadata

• Social graph of people, organizations, roles, social relations, entities in the

documents

• Terms from documents such as bag of words, named entities, concepts,

annotations

• Temporal information such as absolute and relative time, day of the week, part

of the day, seasons

• Relational information from ontologies or extracted relationships
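As an example of how one of these property groups can be turned into features, the sketch below derives the temporal features listed above (day of the week, part of the day and season) from a timestamp. The boundaries chosen for the parts of the day and the meteorological, Northern-Hemisphere definition of seasons are assumptions made for illustration.

```python
from datetime import datetime
from typing import Dict

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
# Meteorological seasons, Northern Hemisphere (an assumption).
SEASONS = {12: "winter", 1: "winter", 2: "winter", 3: "spring", 4: "spring", 5: "spring",
           6: "summer", 7: "summer", 8: "summer", 9: "autumn", 10: "autumn", 11: "autumn"}


def part_of_day(hour: int) -> str:
    if 6 <= hour < 12:
        return "morning"
    if 12 <= hour < 18:
        return "afternoon"
    if 18 <= hour < 22:
        return "evening"
    return "night"


def temporal_features(ts: datetime) -> Dict[str, str]:
    """Categorical temporal features for one timestamp."""
    return {
        "day_of_week": DAY_NAMES[ts.weekday()],  # locale-independent, unlike strftime("%A")
        "part_of_day": part_of_day(ts.hour),
        "season": SEASONS[ts.month],
    }


print(temporal_features(datetime(2010, 11, 24, 14, 30)))
```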

The underlying data structure for the input data is defined in a general way: to capture the diversity of input data sources, to capture the temporal dynamics of the data, to enable the automatic building of models, to be guided by users' suggestions if available and, last but not least, to be scalable. The data structure we use to support these objectives is a dynamic network with extra information on nodes and links, as detailed in the description of the TNT store (Grobelnik et al. 2009).

7.3.2 Modelling

When considering methods for mitigating the information overload of knowledge workers, a common approach is to partition the space of information objects into several contexts. For our purposes, we define each context as a grouping of information or data objects for a particular need. We refer to this grouping as a context model. A knowledge worker can define a context model manually or resort to modeling methods based on knowledge management tools. In an actual enterprise setting, the effort required for manual context model definition may often outweigh the benefits of partitioning information resources into contexts. To lower the barrier to exploiting the benefits of contextual assistance applications, we can also consider using machine learning methods.

More formally, we define a context model as a function in one of three forms: for classification, $\mathit{Context}_i = \mathit{ContextModel}(\mathit{DataObject}_j)$; for membership queries, $\mathit{Belonging} = \mathit{ContextModel}(\mathit{Context}_i, \mathit{DataObject}_j)$; or as a scoring function, $\mathit{BelongingScore} = \mathit{ContextModel}(\mathit{Context}_i, \mathit{DataObject}_j)$, where $\mathit{DataObject}_j$ is, in our case, the data of a TNT event. As already described, we assume that ContextModel can be:

• Trained automatically from pre-labeled data (supervised learning, e.g. classification algorithms such as SVM)

• Trained semi-automatically from partially labeled data in interactive user processes (semi-supervised learning, e.g. active learning algorithms such as uncertainty sampling)

• Discovered from unlabeled data (unsupervised learning, e.g. clustering algorithms such as K-means)

• Defined manually, for instance as a set of rules
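The three functional forms above can share one underlying scoring function, as the following sketch shows: the score itself gives BelongingScore, thresholding it gives the membership query, and taking the best-scoring context gives the classification form. The keyword-overlap score, the threshold of 0.5, the context names and the class name are illustrative assumptions; any of the approaches listed above (SVM, active learning, K-means or hand-written rules) could supply the score instead.

```python
from typing import Dict, Set


class KeywordContextModel:
    """Hypothetical context model: each context is described by a set of keywords."""

    def __init__(self, contexts: Dict[str, Set[str]], threshold: float = 0.5):
        self.contexts = contexts
        self.threshold = threshold

    def score(self, context: str, data_object: Set[str]) -> float:
        """BelongingScore = ContextModel(Context_i, DataObject_j): keyword overlap in [0, 1]."""
        keywords = self.contexts[context]
        return len(keywords & data_object) / len(keywords) if keywords else 0.0

    def belongs(self, context: str, data_object: Set[str]) -> bool:
        """Belonging = ContextModel(Context_i, DataObject_j): thresholded score."""
        return self.score(context, data_object) >= self.threshold

    def classify(self, data_object: Set[str]) -> str:
        """Context_i = ContextModel(DataObject_j): the best-scoring context."""
        return max(self.contexts, key=lambda c: self.score(c, data_object))


model = KeywordContextModel({
    "project-proposal": {"proposal", "deliverable", "partner", "review"},
    "teaching": {"lecture", "exam", "students"},
})
event_terms = {"deliverable", "review", "draft"}  # terms of a hypothetical TNT event
print(model.classify(event_terms), model.belongs("teaching", event_terms))
```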

Different approaches can be used, exploring the trade-off between the amount of control we allow the user over the context modeling process and the amount of modeling effort required; they also inform possible design decisions for developing solutions based on context models.

7.3.3 Deployment

Once we obtain an appropriate context model, we can deploy it to support the user in her activity. When the context model is available, machine learning techniques can be used to detect the user's current contexts. Once the knowledge worker starts doing something unrelated to her current context, for instance opening a document related to another project, a shift is detected and the system may suggest a switch to a different context, one which better reflects the user's recent activity. If the user accepts this switch, her workspace is put into the other context.
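A minimal version of such shift detection compares how well the recent events fit the current context with how well they fit the alternatives, and suggests a switch only when another context wins clearly over a sliding window. The window size, the margin, the keyword sets and the pluggable `score` callable are assumptions made for this sketch; requiring a margin is one simple way to limit the false-positive suggestions discussed below.

```python
from collections import deque
from typing import Callable, Deque, Iterable, Optional, Set

Score = Callable[[str, Set[str]], float]


class SwitchDetector:
    """Suggest a context switch when another context fits recent events clearly better."""

    def __init__(self, contexts: Iterable[str], score: Score,
                 current: str, window: int = 3, margin: float = 0.2):
        self.contexts = list(contexts)
        self.score = score
        self.current = current
        self.margin = margin
        self.recent: Deque[Set[str]] = deque(maxlen=window)

    def _mean_score(self, context: str) -> float:
        return sum(self.score(context, e) for e in self.recent) / len(self.recent)

    def observe(self, event: Set[str]) -> Optional[str]:
        """Record an event; return a suggested context, or None to stay in the current one."""
        self.recent.append(event)
        if len(self.recent) < self.recent.maxlen:
            return None  # not enough evidence yet
        best = max(self.contexts, key=self._mean_score)
        if best != self.current and self._mean_score(best) - self._mean_score(self.current) > self.margin:
            return best
        return None

    def accept(self, context: str) -> None:
        """The user accepted the suggestion: move the workspace to `context`."""
        self.current = context
        self.recent.clear()


KEYWORDS = {"proposal": {"proposal", "budget"}, "teaching": {"lecture", "exam"}}


def overlap(context: str, event: Set[str]) -> float:
    """Hypothetical score: share of a context's keywords present in the event."""
    return len(KEYWORDS[context] & event) / len(KEYWORDS[context])


detector = SwitchDetector(KEYWORDS, overlap, current="proposal")
stream = [{"budget"}, {"lecture"}, {"exam"}, {"lecture", "exam"}]
suggestions = [detector.observe(e) for e in stream]
print(suggestions)
```

On the third event the teaching context already scores higher, but not by the required margin, so no switch is suggested until the fourth event.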

There are several key points that should be addressed in the whole process of

context discovery and detection to make this functionality more usable.

First, our experience suggests that users expect tightly integrated top-down and bottom-up context definition, meaning that all information they supply to the system should also be integrated in the discovery process. Therefore, newly discovered contexts take into account all previous definitions and associations of contexts, to avoid suggesting a new context that is very similar to an existing one. This aspect led us to improve on scalable online semi-supervised clustering algorithms, which we applied to context mining. Second, we found that the correctness of context detection was critical to knowledge workers, as false positive suggestions disturb their workflow, which is exactly what we wanted to avoid in the first place. As it turns out, when a user associates (or does not associate) a resource with a context, this tells us much more than the user merely accessing the resource while in that context. Therefore, fully supervised models are now used for context switch detection. In terms of user involvement, this means that upon a successful context discovery, the user is presented with a list of possible resources, from which she selects the ones that are relevant for learning a model for that context.
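To illustrate how existing contexts can shape discovery, here is a batch seeded k-means, one of the simplest semi-supervised clustering methods. It is not the scalable online algorithm the authors developed; it only shows the idea that vectors the user already associated with a context stay fixed to that context, so unlabelled activity close to an existing context is absorbed by it instead of being proposed as a near-duplicate new context. The vector representation, the farthest-point initialization and all function names are assumptions.

```python
def _dist2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def seeded_kmeans(points, seeds, n_new, iterations=20):
    """Seeded (constrained) k-means for context discovery.

    points: unlabelled activity vectors (lists of floats).
    seeds:  {existing context name: non-empty list of vectors the user
            already associated with it}.
    n_new:  number of additional contexts to propose.
    Seed vectors never leave their context; only unlabelled points move.
    Returns {cluster name: indices of the unlabelled points assigned to it}.
    """
    centroids = {name: _mean(vecs) for name, vecs in seeds.items()}
    # New clusters start at the points farthest from all current centroids,
    # so they cover activity that the existing contexts do not explain.
    for j in range(n_new):
        far = max(
            range(len(points)),
            key=lambda i: min((_dist2(points[i], c) for c in centroids.values()), default=0.0),
        )
        centroids[f"new-{j}"] = list(points[far])

    names = list(centroids)
    assignment = {}
    for _ in range(iterations):
        assignment = {name: [] for name in names}
        for i, p in enumerate(points):
            nearest = min(names, key=lambda name: _dist2(p, centroids[name]))
            assignment[nearest].append(i)
        for name in names:
            members = [points[i] for i in assignment[name]] + list(seeds.get(name, []))
            if members:  # keep the old centroid if a new cluster empties out
                centroids[name] = _mean(members)
    return assignment
```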

To summarize: for context discovery, which is more exploratory in nature, semi-supervised methods are very practical, suggesting interesting and relevant contexts with very little manual work. However, when actually supporting knowledge workers in real time by suggesting context switches (context detection), highly accurate methods with stricter supervision are crucial.

7.4 Example Application: Context Mining on Emails

As a demonstration of context mining, let us consider the domain of email. In the current technological era, email plays a big part in our everyday lives. A May 2009 report by the research company The Radicati Group (Radicati 2009) estimates that there were 1.4 billion email users in 2009 and that the number will rise to 1.9 billion by 2013. The same source also suggests that 247 billion emails are sent each day (note that the majority of these are spam, which is automatically blocked by spam filters) and predicts that this traffic will double to 507 billion emails per day by 2013. These predictions are concerning, since people are already overloaded with emails that they have to process, organize and make sense of.

Despite email's popularity, email clients have unfortunately not changed significantly since the beginnings of email in the 1970s and still provide only basic functionality. The user interface of a state-of-the-art email client, such as Microsoft Outlook 2010, displays a list of email folders in the user's account, a list of emails and a reading pane. Email clients typically allow the user to sort the list of emails by different criteria, such as the date or sender of the email, but there is no way to display the emails related to a specific context. If the user would like, for example, to see a list of emails that he recently exchanged with partners on a certain project, he has basically two options. He can either search the whole list of emails and filter out the irrelevant ones, or he can create and maintain a special email folder where he puts all emails related to that group of people. Both options are very inefficient since they require the user to do all the work manually.

To help users handle email more efficiently, we developed an add-in for Microsoft Outlook called Contextify (Leban and Grobelnik 2010). It is intended for heavy email users who receive tens of emails per day and possibly work on several projects simultaneously. The main features that are currently supported, and that are described here, are: display of information related to the currently selected email, social network visualization, visualization of discussion threads, contact identity management, recipient suggestion, Facebook and LinkedIn integration, people and email tagging, and automatic folder suggestion. Contextify supports Microsoft Outlook 2007 and 2010 and is freely available at http://contextify.net.

7.4.1 Displaying Contextual Information in the Contextify Sidebar

The Contextify add-in displays information in different windows. One of them is the Contextify sidebar, which is placed at the right side of Microsoft Outlook, next to the email list and the reading pane. Examples of the sidebar are displayed in Fig. 7.2. The sidebar is updated each time the user selects a new email and displays content related to the sender of the currently selected email. The top of the sidebar shows the sender's personal information (Marko Grobelnik in Fig. 7.2), such as phone numbers, job position, address and photo. The main part of the sidebar contains a tab control that displays recent emails sent to or from this person and information extracted from these emails.

The first tab in the tab control contains the list of recent emails (see Fig. 7.2a). For each email we display the sender's name, the email's subject and a short snippet of the email body. For emails that are part of a thread we also display the number of emails in the thread. If the user would like to read the whole thread, he can select the "Show Thread" option from the popup menu and the content of the whole thread slides into view. Below the list of emails is a tag cloud displaying the keywords extracted from the content of the displayed emails. These keywords help the user quickly recognize the topics discussed in the emails.
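The chapter does not state how the tag-cloud keywords are extracted. One common choice, assumed here, is TF-IDF weighting: a word scores highly if it is frequent in the displayed emails but rare in the mailbox as a whole. The stop-word list and `tag_cloud_keywords` are hypothetical.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption of this sketch).
STOP_WORDS = {"the", "a", "an", "and", "or", "to", "of", "in", "on", "for",
              "is", "are", "we", "you", "i", "it", "this", "that", "with"}


def tokenize(text):
    """Lower-case the text and keep alphabetic tokens that are not stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def tag_cloud_keywords(displayed, corpus, top_k=10):
    """Return (keyword, weight) pairs for the displayed emails.

    displayed: bodies of the emails currently shown in the sidebar.
    corpus:    all email bodies, used for the inverse document frequency.
    """
    n_docs = len(corpus)
    doc_freq = Counter()
    for body in corpus:
        doc_freq.update(set(tokenize(body)))

    term_freq = Counter()
    for body in displayed:
        term_freq.update(tokenize(body))

    # TF-IDF: frequent in the displayed emails, rare in the mailbox overall.
    scores = {
        term: tf * math.log((1 + n_docs) / (1 + doc_freq[term]))
        for term, tf in term_freq.items()
    }
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]
```

The returned weights could then be mapped to font sizes in the cloud.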

[Fig. 7.2 panels: (a) related emails; (b) attachments; (c) social networks; (d) web links; (e) …when searching]

Fig. 7.2 Examples of the Contextify sidebar showing the list of related emails (a), attachments (b), social network (c) and web links (d), and Contextify helping the user perform a query by searching on the fly while the user is typing (e)

The other three tabs in the tab control display information extracted from the emails listed in the first tab. The second tab (see Fig. 7.2b) displays the attachments exchanged in the emails. When different people collaborate on a document they often exchange it several times while working on it. To make the whole history of a document easy to see, Contextify groups files with the same filename and displays them all in the same sub-tree. Each attachment can be opened or saved by clicking it. The third tab (see Fig. 7.2c) displays the people who participate in the displayed emails. People are grouped by company, which is deduced from, e.g., their email address or LinkedIn account (see Sect. 7.4.4). Clicking a person updates the sidebar with contextual information regarding this person. The last tab (see Fig. 7.2d) displays the web links that were exchanged in the emails.
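Grouping attachments by filename, as described above, amounts to a simple group-by. In this sketch the `Attachment` record and the case-insensitive comparison of filenames are assumptions; sorting each group by send date yields the document's version history.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Attachment:
    filename: str
    sender: str
    sent: datetime


def group_attachments(attachments):
    """Group attachments by filename, each group ordered oldest version first.

    Filenames are compared case-insensitively (an assumption of this sketch).
    """
    groups = defaultdict(list)
    for att in attachments:
        groups[att.filename.lower()].append(att)
    for versions in groups.values():
        versions.sort(key=lambda a: a.sent)
    return dict(groups)
```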

The Contextify sidebar also provides search capabilities, as shown in Fig. 7.2e. When the user enters a query in the sidebar's search box, emails that match the query are found and information extracted from these emails is displayed in the tab control. Contextify currently supports three types of queries. The first type is the person query, where we search for information related to a particular contact (this type of query is also performed each time the user selects a new email in Outlook). The second type is the keyword query, where we display information about emails that contain all the keywords the user specified. The last type is the tag query. The user can define custom tags and assign them to specific contacts. This is especially useful, for example, when the user would like to see emails from all the people who work on a particular project. In such cases a tag with the project name could be created and assigned to the members of the project. When the user then performs a tag search, emails from all the members are displayed in the sidebar.
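The three query types can be sketched as filters over a list of emails. The `Email` record and the function names are hypothetical, keyword matching is a plain case-insensitive substring test, and a tag query here returns the emails sent by any contact carrying the tag, following the description above.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Email:
    sender: str
    recipients: List[str]
    body: str

    def participants(self) -> Set[str]:
        return {self.sender, *self.recipients}


def person_query(emails: List[Email], contact: str) -> List[Email]:
    """Emails sent to or from the given contact."""
    return [e for e in emails if contact in e.participants()]


def keyword_query(emails: List[Email], keywords: List[str]) -> List[Email]:
    """Emails whose body contains *all* the keywords (case-insensitive substring match)."""
    wanted = [k.lower() for k in keywords]
    return [e for e in emails if all(k in e.body.lower() for k in wanted)]


def tag_query(emails: List[Email], tag: str, tags_by_contact: Dict[str, Set[str]]) -> List[Email]:
    """Emails sent by any contact that carries the user-defined tag."""
    members = {c for c, tags in tags_by_contact.items() if tag in tags}
    return [e for e in emails if e.sender in members]
```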

In the examples described so far, the contextual information to be displayed was determined from contact names, tags or search keywords. Alternatively, we could use similarity measures or machine learning techniques to find related emails. Imagine, for example, that we work in customer support. Each day we receive tens of questions about the use of different features of our product, and we have to answer them. Since different people are likely to have similar questions, we can use the answers to old questions, at least as templates for answering new ones. In such a scenario we could use a similarity measure (such as cosine similarity) to compare a new question with the previous ones. The computed similarity could be used to rank the old questions, and the most similar ones could be listed in the sidebar together with the answers that were given. Such contextualization would be very useful, as it would save the user many hours of manually searching through emails.
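A minimal version of the similarity ranking described above, assuming plain term-frequency vectors (no stemming or IDF weighting) and a history stored as (question, answer) pairs; `similar_questions` is a hypothetical name.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Bag-of-words term-frequency vector of a text."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as Counters."""
    dot = sum(count * v[term] for term, count in u.items())
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


def similar_questions(new_question, history, top_k=3):
    """Rank previously answered (question, answer) pairs by similarity to a new question."""
    q = vectorize(new_question)
    scored = [(cosine(q, vectorize(old_q)), old_q, answer) for old_q, answer in history]
    scored.sort(key=lambda t: t[0], reverse=True)
    return scored[:top_k]
```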

7.4.2 Advanced Searching and Filtering of Conversations

More advanced search and visualization functionality is provided in the Contextify Dialog. The user can use the search box to specify a complex query consisting of a combination of person names, keywords and tag names. Emails that match the query are then displayed and grouped into threads based on the email's subject. Along with the individual emails in each thread, we also extract and display keywords for the thread, which makes it easy to identify its content. An example of the Contextify Dialog is displayed in Fig. 7.3.

The bottom left part of the dialog displays email activity (for the given query)

over time. Each bar represents the number of emails received in a particular time

span. The visualization is interactive – the user can select a subset of bars to display

only those emails that were sent in the selected time period.
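The activity chart amounts to bucketing emails by time span and filtering by the selected bars. This sketch assumes one bar per calendar day, since the dialog's actual bucket size is not stated; the function names are hypothetical.

```python
from collections import Counter
from datetime import date, datetime


def activity_histogram(timestamps):
    """One bar per calendar day: a sorted list of (day, number of emails)."""
    return sorted(Counter(ts.date() for ts in timestamps).items())


def emails_in_period(emails, start: date, end: date):
    """Keep the (timestamp, subject) pairs whose day lies in the selected bars [start, end]."""
    return [(ts, subject) for ts, subject in emails if start <= ts.date() <= end]
```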

The right side of the dialog contains a tab control. The first tab displays the social network of people who participate in the displayed emails. Nodes of the graph represent people, while edges indicate that emails were exchanged between the two connected persons. Font size and color are also used to indicate the intensity of communication: people who sent more emails have their nodes displayed in a larger font, and the darkness of the edge between two persons corresponds to the number of emails exchanged between them. This helps the user quickly identify the main participants in the discussions and whom they frequently communicate with. The social network is also interactive. Selecting a contact highlights all emails from that contact in the list of emails. Double-clicking a contact hides all email threads in which the contact does not participate.
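The weights behind the social network view can be computed in one pass over the emails. In this sketch it is assumed that an edge links a sender with each recipient (recipients of the same email are not linked to each other), that node weight is the number of emails a person sent, and that `font_size` uses a hypothetical linear scale from 8 to 24 points; edge darkness could be scaled the same way.

```python
from collections import Counter


def build_social_network(emails):
    """Return (node_weights, edge_weights) for a list of (sender, recipients) pairs.

    node_weights[p]: number of emails sent by person p (drives font size).
    edge_weights[frozenset({a, b})]: emails exchanged between a and b (drives edge darkness).
    """
    node_weights = Counter()
    edge_weights = Counter()
    for sender, recipients in emails:
        node_weights[sender] += 1
        for r in set(recipients) - {sender}:
            node_weights.setdefault(r, 0)  # recipients appear as nodes even if they never sent
            edge_weights[frozenset((sender, r))] += 1
    return node_weights, edge_weights


def font_size(count, max_count, min_pt=8.0, max_pt=24.0):
    """Linearly scale a node's email count to a font size in points."""
    if max_count == 0:
        return min_pt
    return min_pt + (max_pt - min_pt) * count / max_count
```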

Fig. 7.3 Contextify Dialog where a search was performed using a contact’s name. Since Marko is

selected in the social graph all emails from him are highlighted in the list of emails


7.4.3 Visualization of Discussion Threads

When we collaborate with a group of people, it is common to have long email discussions on a particular subject, often consisting of tens of emails. Such discussions are typically very difficult to follow, since it is hard to see who replied to which email. Worse still, such discussions often have several backchannels in which different subgroups of people participate.

To help users understand and follow such discussions more easily, we provide a thread visualization feature, displayed in the Thread View tab of the Contextify Dialog. As mentioned, Contextify treats emails that have the same subject (after removing prefixes such as "Re:" and "Fwd:") as belonging to the same discussion thread. When a message thread is selected, the emails in the thread are displayed as boxes in a flow diagram. The diagram starts at the top left corner with the first email in the thread. A reply to an email is typically drawn under the previous email. Exceptions to this rule are emails that start a new backchannel, either by adding new participants or by removing existing ones. In such cases a new vertical branch is created, and all following emails with this specific group of participants are displayed in this branch. The box for each email contains the sender's name, attachments if there are any, and a snippet of the email's content. For emails that start a new backchannel we also list the contacts that were added (in green, with a "+" sign), the contacts that were removed (in red, with a "−" sign) and the existing contacts that were kept.
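Two steps of the thread view can be sketched directly: computing the thread key by stripping reply and forward prefixes, and deciding when an email opens a new branch because its group of participants changed. The prefix list, the chronological ordering and the diff against the previous email (rather than the true reply-to parent) are simplifying assumptions; the function names are hypothetical.

```python
import re
from typing import List, Set

# Reply/forward prefixes to strip ("Re:", "Fw:", "Fwd:"), possibly repeated.
_PREFIX = re.compile(r"^\s*(?:(?:re|fwd?)\s*:\s*)+", re.IGNORECASE)


def thread_key(subject: str) -> str:
    """Normalized subject used to group emails into one discussion thread."""
    return _PREFIX.sub("", subject).strip().lower()


def assign_branches(participant_sets: List[Set[str]]):
    """Place the emails of one thread (in chronological order) into branches.

    An email opens a new branch when its group of participants has not been
    seen before in the thread; later emails with the same group reuse that
    branch. For each email we return (branch, added, removed, kept), where
    the diff is taken against the previous email.
    """
    branches: List[frozenset] = []
    result = []
    previous: Set[str] = set()
    for people in participant_sets:
        group = frozenset(people)
        if group not in branches:
            branches.append(group)
        result.append((branches.index(group), group - previous, previous - group, group & previous))
        previous = set(group)
    return result
```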

Fig. 7.4a shows the visualization of a relatively simple discussion with 13 emails. The discussion was started by Tadej Stajner, who sent an email to Gregor Leban. Gregor replied and on the next day also sent an email to Marko Grobelnik in which he removed Tadej from the discussion (hence the new branch). All emails between Gregor and Marko are then displayed in the second branch. For very complex threads with many backchannels, such as the one in Fig. 7.4b, the user can also zoom out to get an overview of the discussion.

7.4.4 Identity Management

Most people own several email accounts. When searching for emails from a specific person, it is therefore important to show emails from all of that person's accounts. To make this possible, Contextify provides a Contact Management Dialog in which different email accounts that belong to the same person can be merged. An example of the dialog is displayed in Fig. 7.5.

Fig. 7.4 Two examples of the thread visualization: (a) a simple thread with 13 emails and (b) a longer thread with 31 emails and many backchannels. The visualization in part (b) is zoomed out in order to provide a general overview of the thread

Merging of contacts can be done manually or automatically. For automatic merging, Contextify uses the contact's email address and name as provided in the emails. This information is first preprocessed: we remove all non-letter characters and try to obtain the words that represent the contact's name and surname. A matching algorithm then identifies contacts that are similar enough to be merged. A more advanced approach to merging contacts would be to also analyze the similarity of the contacts' social networks: if two email addresses belong to the same person, it is likely that both addresses will be used to communicate with a specific group of people.
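The preprocessing and matching steps might look as follows. The chapter does not specify the matching algorithm; this sketch assumes that two addresses are merge candidates when their name words overlap in at least two words, and it adds a Jaccard similarity of correspondents as one possible form of the social-network analysis mentioned above. All names and thresholds are hypothetical.

```python
import re
from itertools import combinations


def name_tokens(display_name: str, address: str) -> frozenset:
    """Name words of a contact, lower-cased, with all non-letter characters removed.

    Falls back to the local part of the address (e.g. 'gregor.leban') when no
    display name is available.
    """
    text = display_name or address.split("@")[0]
    # [^\W\d_] matches any Unicode letter, so names such as 'Jožef' survive.
    return frozenset(w for w in re.findall(r"[^\W\d_]+", text.lower()) if len(w) > 1)


def merge_candidates(contacts, min_shared=2):
    """Pairs of addresses whose name tokens share at least `min_shared` words."""
    tokens = {address: name_tokens(name, address) for name, address in contacts}
    return [
        (a, b)
        for a, b in combinations(sorted(tokens), 2)
        if len(tokens[a] & tokens[b]) >= min_shared
    ]


def correspondent_overlap(peers_a: set, peers_b: set) -> float:
    """Jaccard similarity of the people two addresses exchanged email with."""
    union = peers_a | peers_b
    return len(peers_a & peers_b) / len(union) if union else 0.0
```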

To obtain additional information about contacts, Contextify can also connect to the user's Facebook and LinkedIn accounts. This information is then displayed at the top of the sidebar. Along with contact information, Contextify also manages information about the companies associated with the contacts. When several contacts specify on LinkedIn an association with the same company, Contextify tries to infer which domain is associated with the company by inspecting the email addresses of these contacts. If, for example, five of my contacts specify on LinkedIn an association with Jozef Stefan Institute, I can discover from their email addresses that the associated domain is most likely ijs.si. Using this information, we can then also associate with Jozef Stefan Institute (JSI) other contacts whose email addresses are in this domain, even if they have not specified this association themselves.
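The domain inference can be sketched as a majority vote over the addresses of contacts who list the same company on LinkedIn. The `min_share` threshold and the function names are assumptions of this sketch.

```python
from collections import Counter, defaultdict


def domain_of(address):
    """Lower-cased domain part of an email address."""
    return address.rsplit("@", 1)[-1].lower()


def infer_company_domains(linkedin_company, addresses, min_share=0.5):
    """Map each company to the domain used by most contacts who list it.

    linkedin_company: {contact: company name taken from LinkedIn}
    addresses:        {contact: email address}
    min_share:        hypothetical minimum fraction of those contacts using the domain
    """
    votes = defaultdict(Counter)
    for contact, company in linkedin_company.items():
        if contact in addresses:
            votes[company][domain_of(addresses[contact])] += 1
    result = {}
    for company, counts in votes.items():
        domain, hits = counts.most_common(1)[0]
        if hits / sum(counts.values()) >= min_share:
            result[company] = domain
    return result


def associate_by_domain(addresses, company_domains):
    """Associate every contact whose address lies in a known company domain."""
    company_for = {domain: company for company, domain in company_domains.items()}
    return {
        contact: company_for[domain_of(address)]
        for contact, address in addresses.items()
        if domain_of(address) in company_for
    }
```

In practice, webmail domains such as gmail.com would need to be excluded, since many unrelated people share them; this sketch does not do so.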

Having contacts associated with companies is useful because company names

represent special tags that can be used when performing a search. Searching for

the JSI tag in the Contextify dialog, for example, displays message threads that

contain people who work at Jozef Stefan Institute (JSI).

Fig. 7.5 The Contact Management Dialog that can be used to manage information about contacts

and companies


The user can also use the Contact Management Dialog to create custom tags and associate them with specific contacts. These tags can then be used when performing a search in the sidebar or the Contextify Dialog.

7.4.5 Recipient and Folder Suggestion

Another example of contextualization provided in Contextify is the recipient suggestion feature. When the user composes a new email, Contextify displays an additional sidebar in the email composition window. After the user adds a recipient, the sidebar lists other contacts who have frequently appeared together with that recipient in past emails. Clicking on any contact in the sidebar adds the contact as an additional recipient. The list of suggested recipients is then updated to also show contacts who appeared in emails with the newly added recipient. The list is updated each time a new recipient is added and is sorted so that the contacts most likely to be added next are placed at the top. An example of the sidebar is shown in Fig. 7.6.

Fig. 7.6 A demonstration of the contact suggestion feature. After we added Marko as the

recipient, Contextify automatically lists other contacts that we are also likely to add as recipients
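Recipient suggestion can be sketched as co-occurrence counting: each past email contributes one count to every pair of contacts that appeared on it together, and candidates are ranked by their summed counts with the recipients already added. This scoring rule and the function names are assumptions; the chapter does not give Contextify's exact ranking.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(past_recipient_lists):
    """Count, for every ordered pair of contacts, the emails they appeared on together."""
    counts = Counter()
    for people in past_recipient_lists:
        for a, b in combinations(sorted(set(people)), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    return counts


def suggest_recipients(selected, counts, top_k=5):
    """Rank contacts not yet selected by summed co-occurrence with the selected ones."""
    chosen = set(selected)
    scores = Counter()
    for (a, b), n in counts.items():
        if a in chosen and b not in chosen:
            scores[b] += n
    return [contact for contact, _ in scores.most_common(top_k)]
```

Re-running `suggest_recipients` after each click reproduces the behavior described above, where the list is refreshed every time a recipient is added.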


A new feature that we are currently implementing, and which relies heavily on machine learning methods, is folder suggestion. People often organize emails into folders. Moving each email individually into the corresponding folder can be very time-consuming, and we would like to automate this process. To predict which folder an email should be moved to, we need to build a classification model. There are many possible features for building the model. People often file emails based on the participants in the email; therefore one of the most relevant features is likely to be the sender and all the recipients of the email. Another crucial feature is likely to be the email's subject, not only because of its content but also because emails that belong to the same thread should most likely be in the same folder. Folder prediction could also be based on the content of the email. In this case we would use the current folder structure and treat the content of all emails in one folder as one large document. We could then apply the cosine similarity measure to find the document to which the current email is most similar, and move the email to the corresponding folder.
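The content-based variant of folder prediction, which treats each folder as one large document and picks the most cosine-similar folder, can be sketched as follows. Term-frequency vectors without IDF weighting and the function names are assumptions; the participant and subject features, which the text expects to matter most, are not included here.

```python
import math
import re
from collections import Counter


def terms(text):
    """Term-frequency vector of a text."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as Counters."""
    dot = sum(n * v[t] for t, n in u.items())
    norm = math.sqrt(sum(n * n for n in u.values())) * math.sqrt(sum(n * n for n in v.values()))
    return dot / norm if norm else 0.0


def folder_profiles(folders):
    """Concatenate all emails in each folder into one term-frequency 'document'."""
    profiles = {}
    for name, bodies in folders.items():
        profile = Counter()
        for body in bodies:
            profile.update(terms(body))
        profiles[name] = profile
    return profiles


def suggest_folder(email_body, profiles):
    """Return the folder whose profile is most cosine-similar to the email."""
    vec = terms(email_body)
    return max(profiles, key=lambda name: cosine(vec, profiles[name]))
```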

After representing emails with features, we could use any classification method to build the model. The choice of method depends on our goals. If we are primarily interested in the accuracy of the predictions, we would most likely achieve the best results with the SVM algorithm. If we also need to provide the user with an explanation for each prediction, then methods such as if-then rules or decision trees should be preferred.

References

Banerjee A, Basu S, Merugu S (2007) Multi-way clustering on relation graphs. SDM SIAM,

Minneapolis, MN April 26–28, pp 145–156

Basu S, Davidson I, Wagstaff KL (2008) Constrained clustering: advances in algorithms, theory,

and applications. Data mining and knowledge discovery series. Chapman and Hall/CRC, CRC

Press, Florida, New York

Davidson I, Ravi S (2005) Clustering with constraints: feasibility issues and the k-means algo-

rithm. In: Proceedings of SDM, Newport Beach, CA

Dong G, Pei J (2007) Sequence data mining. Springer-Verlag New York Inc., New York

Dubey A, Bhattacharya I, Godbole S (2010) A cluster-level semi-supervision model for interactive

clustering. In: Proceedings of ECML PKDD 2010, Lecture Notes in AI, Springer-Verlag,

Berlin/Heidelberg

ECAI (2006) Workshop on context representation and reasoning. http://sra.itc.it/events/crr06,

Accessed 5 Aug 2011

Grobelnik M, Mladenic D, Ferlez J (2009) Probabilistic temporal process model for knowledge

processes: handling a stream of linked text. In: Proceedings of the 12th international confer-

ence information society – IS 2009, vol A. Institut Jozef Stefan, Ljubljana, pp 222–227

Guo J, Sun C (2003) Context representation, transformation and comparison for ad hoc product

data exchange. ACM DocEng 2003: proceedings of the 2003 ACM symposium on document

engineering, ACM, Grenoble, France, New York, U.S., pp 121–130

Jacquemont S, Jacquenet F, Sebban M (2009) Mining probabilistic automata: a statistical view of

sequential pattern mining. Machine Learning 75(1):91–127


Leban G, Grobelnik M (2010) Displaying email-related contextual information using Contextify.

In: Proceedings of ISWC 2010, LNCS, Springer, Heidelberg, pp 181–184

Mitchell TM (1997) Machine learning. The McGraw-Hill Companies, Inc., New York

Mitchell TM (2006) The discipline of machine learning. CMU-ML-06-108 July 2006, School of

Computer Science, Carnegie Mellon University, Pittsburgh

Mitchell TM (2009) Mining our reality. Science 326(5960):1644–1645, http://www.sciencemag.org/content/326/5960/1644.short

Mladenic D (2006) Feature selection for dimensionality reduction. In: Subspace, latent structure

and feature selection: statistical and optimization perspectives workshop, vol 3940, Lecture

notes in computer science. Springer, Berlin, Heidelberg, Hershey, USA, pp 84–102

Mladenic D (2007) Text mining: machine learning on document. In: Encyclopedia of data

warehousing and mining. Idea Group Reference, Hershey, pp 1109–1112

Stajner T, Mladenic D, Grobelnik M (2010) Exploring contexts and actions in knowledge

processes. Workshop on context, information and ontologies, Lisbon

The Radicati Group (2009) Email statistics report, 2009–2013. http://www.radicati.com/

VUT – Vienna University of Technology, Austria, 2007, D2.2 design and proof-of-concept

implementation of the inContext context model version 1

Wagstaff K, Cardie C (2000) Clustering with instance-level constraints. In: Proceedings of ICML

2000, Morgan Kaufmann, Massachusetts


Part III

Applying and Validating the ACTIVE Technologies

8

Increasing Productivity in the Customer-Facing Environment

Ian Thurlow, John Davies, Jia-Yan Gu, Tom Bösser, Elke-Maria Melchior, and Paul Warren

8.1 Introduction

The objective of this case study was to evaluate novel information technology from

the ACTIVE project (Simperl et al. 2010) with people working in a demanding

customer-facing environment. The case study took place in BT, a major global

provider of telecommunications services. Specifically, our trialists were from the

sales community (salespeople and associated technical and product specialists) of a

division focussed on providing ICT solutions to the UK enterprise market. Trialists

were distributed across a wide geographical area. Indeed, one of their problems was

that they were not often able to meet face to face to share knowledge.

8.2 Users and User Requirements

We started by talking to the senior managers responsible for our trialists. In general,

as managers, they wanted better knowledge worker productivity. Specifically, they

wanted to shorten the time to create sales proposals and to improve their quality.

I. Thurlow (*) • J. Davies • J.-Y. Gu


e-mail: [email protected]; [email protected]; [email protected]

T. Bösser • E.-M. Melchior

kea-pro, Tal, Spiringen CH-6464, Switzerland


P. Warren





They saw information reuse through knowledge sharing as an important way to

achieve this.

Talking to the senior managers also helped us to further develop our initial

intuition of what information-related problems our trialists were confronted with.

Our task then was to confirm and refine that intuition with the trialists themselves.

We were working with two groups: a large group interacting directly with customers, and a small team responsible for coordinating the creation of the more complex customer proposals. Although their requirements were related, we needed to deal with each group separately.

Both groups shared one characteristic with all BT people: they used the standard Microsoft applications, in particular Outlook, Word, Excel and PowerPoint. It was therefore essential that, if ACTIVE was to be used, it worked with and enhanced these applications. This characteristic was shared with the Accenture case study described in Chap. 9 and was a major reason for the inclusion of these applications in the ACTIVE Knowledge Workspace (AKWS), as described in Chap. 5.

8.2.1 Working with Customers

The larger group consisted of sales consultants and their managers, and technical

specialists. The sales consultants were responsible for groups of customers. Some

had responsibility for hundreds of customers. These were generally desk-based. The

majority were responsible for tens of customers. Members of this group often spent

as much as 3 days a week out of their offices with customers. The technical

specialists had expertise in particular product ranges, and were used to provide

the technical input to business proposals. Our hypothesis was that, whilst working

at their desks, these people would all frequently switch their focus from one task to

another, as they responded to different customer requirements.

Two techniques were used initially to test this hypothesis and to generally gain

an understanding of how these people interacted with information, the problems

they faced in using information systems, and what they needed from such systems.

Firstly, a number of semi-structured interviews were held: some one to one and some in small groups, some face to face and some by audio conference. Secondly, we observed a few of our potential trialists over a period of around two hours. We did this remotely.

Specifically, we observed our subjects’ screens and how they interacted with

information via a Microsoft LiveMeeting session. We used Microsoft Communicator

to provide a separate audio channel for listening to our subjects’ telephone

conversations. We also used this audio channel for occasionally asking questions to

seek clarification, e.g. as to why certain on-screen actions were being taken.

Initially this procedure was adopted to save time, bearing in mind the wide geographical spread of our trialists. However, we believed that the technique had the further advantage of being less intrusive than if we had been physically present. The subject knew, of course, that he or she was being listened to and observed, but was potentially not as constantly conscious of this as would have been the case with an observer physically present. We were not able to test this hypothesis, nor was this our purpose, but in any case we believed we had a very effective way of observing behavior.

These techniques gave rise to a number of conclusions. The subjects’ work was

often driven by email. That is to say, there was heavy use of email; and incoming

emails, along with telephone calls, caused people to switch tasks. In effect, people

were continuously switching their task context.

Despite this, and despite a significant use of the telephone, some of our subjects

reported problems arising from being isolated geographically from their colleagues.

As already noted, opportunities to meet and share knowledge were very rare.

A significant amount of time was spent using BT systems, e.g. CRM systems to record interactions with customers. In addition, there was a great deal of use of spreadsheets. Working with multiple systems forced users to cut and paste from one system or application to another. Effectively, users developed their own processes to accommodate this way of working.

8.2.2 Building Customer Proposals

Our understanding of the proposal creation process and the requirements of the bid

unit who coordinate this process was built up through a number of face to face

meetings.

Whilst straightforward proposals are put together by individual sales

consultants, working with their personally-developed network of colleagues,

the production of more complex proposals is coordinated by a bid unit of five

people. Typically the bid unit receives an invitation to tender which often contains many tens of questions. The majority of these questions can be answered

quickly by adapting text from previous proposals. There then remains a small

number of questions which are much harder to answer. The bid team then seeks

assistance from relevant technical specialists, who provide text for inclusion in

the proposal. This approach gives rise in particular to two problems. Firstly, in

the information-gathering phase, members of the bid unit need to be sure they

are accessing up-to-date information. Secondly, the fact that text has been written

by a variety of different authors creates inconsistencies in the proposals which

significantly affect their quality. Such inconsistencies can be of style as well as of fact. Even where a factual inconsistency does not materially affect

the solution being offered to the customer, it undermines that customer’s confi-

dence. An example might be a reference to how many similar solutions have been

offered in the past. A difference of one or two in the number quoted in different

parts of the proposal might not be material, but serves to reduce the quality of the

document.


8.3 Gaining Feedback

8.3.1 Feedback from the “Mock-Ups”

The techniques described in Sect. 8.2.1 were used during the first year of the project

to give us an understanding of our trialists’ requirements. At the same time,

working with our partners in the ACTIVE project, we were able to gain a good

understanding of what functionality could be made available in our case study,

based on the technology being developed in ACTIVE. At this stage most of this

functionality was not yet available, but we wanted to get feedback from the trialists.

In particular, we wanted to be sure that the key requirements of the user community

were being met and to refine these requirements in the light of feedback. To do this,

we constructed a number of PowerPoint “mock-ups” to illustrate the basic func-

tionality. When presenting these to our trialists we stressed that we were not

concerned with particular questions of style, e.g. the use of icons, menus, fonts,

etc. At this stage we were interested in their feedback on the proposed functionality.

The mock-ups chiefly reflected the more general requirements of our larger

group of trialists, as described in Sect. 8.2.1. However, we did also show them to

the bid unit for their feedback. Apart from their own, very specific, requirements,

the bid unit shared with the other trialists the generic requirement to mitigate the

effects of information overload.

These mock-ups featured the following areas of functionality:

• Tagging, and the ability to search against a combination of tags and contexts, i.e.

for information objects tagged in a certain way and associated with a certain

context.

• Contextualised information delivery, along with the ability to create contexts,

select a particular context, and associate information objects with particular

contexts, as described in Chap. 5. We also described the "bottom-up" features,

i.e. the ability to discover new contexts, to automatically detect a context switch,

and to automatically associate information objects with a context.

• Process recording, as described in Chap. 6.

• A context-sensitive factbook. This was to be an application which could store

information, organised by particular contexts. It was motivated by the observation that trialists were frequently copying information from one application to

another. Sometimes this was done by cutting and pasting. At other times, the

information was manually re-keyed. Some people reported that they would make

a written note of information to be re-keyed, keeping a piece of paper or notepad

on their desk for just that purpose. The factbook was intended to be an electronic

version of these written notes.

We obtained very positive feedback on the first two of these. Moreover, there was also a positive response to the use of context for knowledge sharing. Specifically, if colleagues share a context, e.g. relating to a particular customer, then information placed on a shared server by one of them, and associated with that context, would be available for them all to use. Another advantage of our implementation of context became apparent during the project. A number of trialists commented positively on the ability to associate an information object with more than one context. They contrasted this favourably with the single-folder limitation of their current computer filing system.
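The many-to-many association that trialists liked, together with the tag-and-context search from the mock-ups, can be illustrated with a small in-memory index. `ContextIndex` is hypothetical and is not the AKWS API; it only shows why an object can sit in several contexts at once, which a single-folder hierarchy cannot express.

```python
from collections import defaultdict


class ContextIndex:
    """Hypothetical store linking information objects to tags and contexts.

    Unlike a folder hierarchy, an object may belong to any number of contexts.
    """

    def __init__(self):
        self.contexts = defaultdict(set)  # object id -> context names
        self.tags = defaultdict(set)      # object id -> tags

    def associate(self, obj, context):
        self.contexts[obj].add(context)

    def tag(self, obj, label):
        self.tags[obj].add(label)

    def search(self, context, required_tags=()):
        """Objects associated with `context` and carrying every required tag."""
        wanted = set(required_tags)
        return sorted(
            obj for obj, ctxs in self.contexts.items()
            if context in ctxs and wanted <= self.tags.get(obj, set())
        )
```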

The response to our ideas on process recording was, in general, less enthusiastic. This may have been a problem of presentation. One individual commented that

the system “feels like I need to change the way I work to fit the system”. This, of

course, was not the objective; the system is designed to record how people work and

help them work as they want to work. We still felt that the technology was useful,

and it was developed as described in Chap. 6. However, as explained later, it was

not evaluated with our triallist group because of lack of time.

The response to the factbook was not at all favourable. This seems chiefly to be

because it was seen as another application, and individuals were already working

with a large number of applications; they felt they didn’t need another. As a result, it

was decided not to proceed with the factbook.

We also discussed and obtained feedback about a number of general issues. One

question was whether ACTIVE functionality should be on the desktop or integrated

into applications. There were mixed opinions on this. One comment was that

“integration into working tools will help uptake”. In fact, both approaches have

been adopted; a taskbar is present on the desktop and a toolbar integrated into the

common Microsoft applications.

The issue of trust arose with regard to the sharing of information. One person

asked “how do I know the information others have shared is correct?” Our tools

make clear the provenance of information; over time people build up confidence in

information from particular colleagues.

The view was also expressed that accurate search to find timely and relevant

information is a key determinant of whether a knowledge management system is

used or not. Clearly our system needed to provide high-quality search.

8.3.2 Offline Working

A final requirement emerged at a later stage of the project. As we have already

noted, many of our triallists spent a great deal of time, often 3 days out of 5, out of

the office visiting customers. The original concept of the AKWS was that it would

employ a client–server architecture and, at least as originally developed, the client

functionality would depend on connection to a corporate server running, e.g., the

machine learning applications. It became apparent that, if many of these highly

mobile people were to use the ACTIVE functionality, they needed certain basic

aspects of that functionality to be available at all times. As an example, one person

wanted to use the tagging feature instead of the normal hierarchical file structure.

It was essential, therefore, that this feature was still available when disconnected

from the corporate network, as was often the case when our triallists were on

customer premises or travelling to customers.


As a result, our colleagues in the project developed an offline version of the client–server software. When the client was connected to the server, the user had all the AKWS functionality available. From the end user’s perspective, Workspace operations in offline mode are performed in the same way as in online mode. The ACTIVE Taskbar and all ACTIVated Office applications can be used, but the user is notified of working in offline mode and Workspace operations are performed only on the resources in the Local Workspace. Resource replication from the Enterprise Workspace to the Local Workspace works as follows: whenever a Workspace resource is created, read or updated, it is automatically replicated to both the Enterprise and the Local Workspace. This means that when a user goes offline, his Local Workspace already holds all the Workspace resources he has been using so far. In addition, the user can explicitly specify further Workspace resources he would like to replicate, and perform the replication to the Local Workspace before going offline. While working offline, only the Local Workspace is updated with the changes the user makes; when the user returns online, all new and modified local resources are automatically synchronized to the Enterprise Workspace. This offline version could then be deployed to those people who were frequently out of the office.
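The replication scheme just described can be sketched as a small state machine. This is a minimal, hypothetical illustration rather than the ACTIVE implementation: the class names `Workspace` and `WorkspaceClient`, and the use of in-memory dictionaries in place of the real Enterprise and Local Workspaces, are assumptions. Conflict handling on reconnection is deliberately omitted; local changes simply overwrite the enterprise copy.

```python
from dataclasses import dataclass, field


@dataclass
class Workspace:
    """In-memory stand-in for a workspace: resource id -> content (hypothetical)."""
    resources: dict[str, str] = field(default_factory=dict)


class WorkspaceClient:
    """Hypothetical client mirroring the online/offline behaviour described above."""

    def __init__(self, enterprise: Workspace) -> None:
        self.enterprise = enterprise
        self.local = Workspace()
        self.online = True
        self.pending: set[str] = set()  # ids created or modified while offline

    def read(self, rid: str) -> str:
        if self.online:
            content = self.enterprise.resources[rid]
            self.local.resources[rid] = content  # reading also replicates locally
            return content
        return self.local.resources[rid]  # offline: Local Workspace only

    def write(self, rid: str, content: str) -> None:
        self.local.resources[rid] = content
        if self.online:
            self.enterprise.resources[rid] = content  # online: written to both
        else:
            self.pending.add(rid)  # remember for later synchronization

    def prefetch(self, rids: list[str]) -> None:
        """Explicitly replicate chosen resources; must be called while online."""
        for rid in rids:
            self.local.resources[rid] = self.enterprise.resources[rid]

    def go_offline(self) -> None:
        self.online = False

    def go_online(self) -> None:
        """Push every resource created or modified offline to the Enterprise Workspace."""
        self.online = True
        for rid in sorted(self.pending):
            self.enterprise.resources[rid] = self.local.resources[rid]
        self.pending.clear()
```

Tracking only the ids touched while offline keeps synchronization proportional to the offline changes rather than to the size of the Local Workspace.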

8.4 Deploying the AKWS

Within the ACTIVE project there was inevitably only limited time for deployment

and evaluation of the project functionality. We decided, therefore, to concentrate on

those aspects of the AKWS which previous feedback told us were most relevant to

our triallists. In particular, we did not want to overwhelm our users with a large

range of new features. Our aim was to introduce to the users those features we

believed would be valuable to them, and to evaluate those features through user

feedback.

Specifically, for our evaluation activity, described in Sect. 8.6.1, we included:

• Information filtering by context, in particular in Microsoft Outlook, Word, Excel, PowerPoint, Internet Explorer and File Explorer. Combined with this was the ability for users to create contexts, set their current context

and associate information objects with a particular context, as described in

Chap. 5. With this we included the ability, supported by machine-learning, to

automatically discover contexts, detect potential context switches, and

associate information objects with contexts. We also provided users with the

ContextVisualizer, again as described in Chap. 5, because we received very

positive feedback when we initially demonstrated this tool.

• Tagging, and the ability to search on a combination of tags and context. In particular, the system is able to recommend tags to the user, as illustrated in

Fig. 8.1. These recommendations are made partly on the basis of the contents of

a file and partly on the basis of what tags have been used by the same or other


users to tag similar files. Here, similarity is based on the information retrieval

concept of cosine similarity (Hiemstra 2009). It was believed that automatic tag

recommendations bring two advantages. Firstly, this feature may overcome

some people’s reluctance to tag, caused in part by simply not being able to

think of a tag to use. Secondly, tag recommendation may help the user, or more

particularly groups of users, to converge on a relatively small set of tags, rather

than creating several tags for the same concept. Of course, a counter-argument is

that this feature may limit the user’s creativity in devising new descriptive tags.

• Access to the AKWS management portal. This provides the facility for users,

or an administrator on their behalf, to set their profiles. In particular, settings

exist to determine whether a user should approve the discovery of new contexts,

or whether new contexts can be created automatically; similarly whether users

need to approve a context switch, or whether it should just go ahead automati-

cally; and whether the tag suggestion facility should be initiated every time a file

is closed. The profile options are shown in Fig. 8.2.
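The tag recommendation described in the second bullet combines file content with the tags applied to similar files. The sketch below illustrates only the similarity-based part, under stated assumptions: documents are represented as raw term-frequency vectors (a real system would more likely use TF-IDF weighting), and each candidate tag is scored by summing the cosine similarity of every previously tagged file that carries it. The function names and this scoring rule are hypothetical, not the AKWS algorithm.

```python
import math
from collections import Counter


def term_vector(text: str) -> Counter:
    """Bag-of-words term frequencies (lower-cased, whitespace-tokenized)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend_tags(new_text: str, tagged_files: list[tuple[str, set[str]]], k: int = 3) -> list[str]:
    """Score each tag by the summed similarity of the tagged files that carry it."""
    query = term_vector(new_text)
    scores: Counter = Counter()
    for text, tags in tagged_files:
        sim = cosine(query, term_vector(text))
        if sim > 0:
            for tag in tags:
                scores[tag] += sim
    return [tag for tag, _ in scores.most_common(k)]
```

Because scores accumulate across files, tags that many similar files share rise to the top, which is one way such a recommender could help a group converge on a small tag vocabulary.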

8.5 Creating Better Business Proposals

Sect. 8.2.2 described the work of the bid unit. Their requirement was to create customer proposals more quickly and with higher quality. The bid unit do not use a content management system; rather, the library of proposals is held on a shared server and is searched and browsed with File Explorer. An Access database is used to store key information about each bid. Moreover, once a team was brought together to create a proposal, information was shared by email and by exchanging Word files; no specifically collaborative tools were used.

Fig. 8.1 Suggested tags

The Semantic MediaWiki was seen as an excellent solution. The semantic

features were seen as important to enable the bid team to locate the best previous

proposals. The basic wiki features were seen as valuable to encourage collaboration

between members of the bid unit and the technical specialists.

We also wanted to enhance the way in which the bid team obtained solutions to

the questions they were unable to answer themselves. To do this we used a

MediaWiki extension which enables the export of particular wiki fields to RSS.

Technical specialists throughout BT were encouraged to subscribe to this RSS feed,

which was used to disseminate these questions.

8.5.1 Storing and Accessing Information on the SMW

The SMW was not used to create whole proposals, as its editing features were not regarded as adequate for this task. Instead, it was used to store key information about proposals, customers and products, and also to capture the main requirements set by customers, together with the solutions offered in response to them in the executive summary, which is a decisive part of the proposal. Figure 8.3 shows an example of a bid proposal page in the SMW.

The semantic links between pages describing proposals, customers and products, and also the solutions to questions posed to technical specialists, create a knowledge base which can be searched semantically. Inline queries allow users to ask questions such as which proposals use a particular product set, or which customer requirements have been evaluated before in a particular industry sector. Figure 8.4 shows proposals that offer mobile products, displaying some details that can be used to sort the items, and Fig. 8.5 shows questions posed to technical specialists that relate to a particular industry sector.

Fig. 8.2 ACTIVE profile settings


When the knowledge base is augmented with information about product hierarchies and relationships, it becomes more powerful. For example, if a particular product name in the query is not known in the knowledge base, then a semantically related name might be known, e.g., a generalisation of the product or a related product.

Fig. 8.3 The SMW, showing the key details of a bid proposal

Fig. 8.4 The SMW, showing proposals offering mobile products

Fig. 8.5 The SMW, showing questions related to the education sector
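To make the generalisation idea concrete, the sketch below answers a query of the kind "which proposals offer product X?" over a small set of property annotations, and falls back to a generalisation of X when no proposal offers X itself. The page names, the product hierarchy and the fallback rule (step up one level at a time, and at each level also accept narrower products) are illustrative assumptions; this is neither the BT knowledge base nor SMW's own query engine.

```python
# Hypothetical property annotations: (proposal page, product offered).
OFFERS_PRODUCT = [
    ("Proposal:Customer A", "Mobile Voice Plan"),
    ("Proposal:Customer B", "Broadband Plan"),
    ("Proposal:Customer C", "Mobile Data Plan"),
]

# Hypothetical, acyclic product hierarchy: product -> broader product family.
BROADER = {
    "Mobile Voice Plan": "Mobile",
    "Mobile Data Plan": "Mobile",
    "Mobile Video Plan": "Mobile",
    "Broadband Plan": "Fixed Line",
}


def narrower(family: str) -> set[str]:
    """All products whose chain of broader terms reaches `family`."""
    result = set()
    for product in BROADER:
        parent = BROADER.get(product)
        while parent is not None:
            if parent == family:
                result.add(product)
                break
            parent = BROADER.get(parent)
    return result


def proposals_offering(product: str) -> tuple[str, list[str]]:
    """Return (term actually matched, proposals), generalising if nothing matches."""
    term = product
    while term is not None:
        targets = {term} | narrower(term)
        hits = sorted(p for p, offered in OFFERS_PRODUCT if offered in targets)
        if hits:
            return term, hits
        term = BROADER.get(term)  # step up to the next generalisation
    return product, []
```

Returning the matched term alongside the results lets an interface tell the user that the answer refers to the broader family rather than to the exact product requested.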

To support our users, we made use of a number of extensions to MediaWiki and SMW. For example, one extension enabled us to import comma-separated values (CSV) data into the SMW and associate its content with semantic properties across proposal, product and people pages. We used this to import bid proposal data from an Access database that is constantly updated by the bid unit team during their normal day-to-day work, so that collaboration in the wiki is based around this key information. For example, questions created in the wiki for inclusion in the RSS feeds seen by technical specialists are associated with the relevant proposals and, in turn, with their semantic properties. These semantic property values are leveraged across the wiki by another extension, which provides an improved facility for searching, including facetted search. Where a user was also using the AKWS, the connection between the SMW and the AKWS meant that context could be used as one of the facets of the search. Figure 8.6 shows the facetted search interface, which allows users to find proposals based on keywords, bid type, bid status, product area, contract value, free-form tags created by users in the wiki, and AKWS context. When a user makes a facet selection, the facetted search automatically presents the relevant subset of search criteria, powered by semantic properties in the wiki, to align with the current selection and allow the user to further refine their search.
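The two mechanisms in this paragraph, importing tabular bid data and refining a search by facets, can be illustrated together. The sketch below is a hypothetical stand-in for the wiki extensions, not their actual code: it reads CSV rows into records whose columns play the role of semantic properties, applies the facet values selected so far, and recomputes the counts of the values still available in each remaining facet. The column names and rows are assumptions.

```python
import csv
import io
from collections import Counter

# Hypothetical export from the bid unit's database; columns act as semantic properties.
CSV_DATA = """page,bid_type,bid_status,product_area
Proposal:Customer A,Tender,Won,Mobile
Proposal:Customer B,Renewal,Submitted,Broadband
Proposal:Customer C,Tender,Submitted,Mobile
Proposal:Customer D,Tender,Cancelled,Networking
"""

FACETS = ["bid_type", "bid_status", "product_area"]


def load_records(text: str) -> list[dict[str, str]]:
    """Parse CSV text into one dict per wiki page."""
    return list(csv.DictReader(io.StringIO(text)))


def facet_search(records: list[dict[str, str]], selection: dict[str, str]):
    """Apply the selected facet values, then count the values left in the other facets."""
    matches = [r for r in records if all(r[f] == v for f, v in selection.items())]
    remaining = {
        facet: Counter(r[facet] for r in matches)
        for facet in FACETS
        if facet not in selection
    }
    return matches, remaining
```

Recomputing counts only over the current matches is what makes each further selection shrink to a consistent subset, as in the behaviour described above.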

Figure 8.7 shows the extension used for finding customers that are similar in terms of their industry sector and size, where size is measured both by workforce and by number of sites. The user can then select a customer and view the bid proposals associated with that customer. Another extension visualises bid proposals and associated community questions in timeline formats, so that users can view the preparation time for bid proposals and see which bid proposals have been cancelled or still require submission. This offers another way for users to find proposals, based on a time dimension: the user can make a selection and view the proposal page, and can also gauge the overall volume and completion rate of bid proposals. Figure 8.8 shows an example of this usage. Calendar extensions were used in the wiki for quick access to bid proposals and questions month by month.

Other extensions used in the wiki include one implementing header tabs on pages that contain large amounts of information whose content can be clearly categorised. For example, in Fig. 8.3 bid proposal pages separate key details from post-bid review data, but the tabs still allow faster access to and browsing between this information than scrolling through a single page. Another extension allowed users to log in to the wiki using the BT identification details they use for other systems and tools, thereby reducing barriers to using the wiki and maintaining consistency with other enterprise systems that users may be familiar with.

Another extension offered enhanced capability for the wiki to evaluate conditions and display information dependent on the result. This was used throughout the wiki for purposes ranging from testing whether an editable field contained information, and prompting the user to add information if it did not, to testing whether a person belonged to a team and, if so, showing the relevant team page. This capability ensured that correct page links were displayed and that a user could see a link to further related information where it existed, maximising the value of information in the wiki.

Fig. 8.6 The SMW, showing the facetted search for bid proposals

Fig. 8.7 The SMW, showing the facetted search for customers


The Semantic Forms extension was used on every page in the wiki to present a consistent way of editing information. A wiki page is often compiled from multiple sources, such as information transcluded from other pages using templates, and it also displays values that change depending on the evaluation of a condition in the page source. It was important, therefore, to use forms to lower the barriers for users adding information to a page, to protect the page source, and to enable new data to be associated with semantic properties that can be used throughout the wiki.

8.5.2 Communicating with the Technical Specialists

In fact, we used a number of RSS feeds to pose the more difficult questions to the technical specialists. The specialists were invited to subscribe to particular feeds containing questions, with varying levels of urgency, associated with the latest bid proposals being prepared by the bid unit team. Each question contains a link to the relevant wiki page, where a specialist can contribute a response through a form; this minimises the number of steps a user needs to take to participate and make a valuable response. This method of interaction with the wiki used the RSS reader facility in MS Outlook, which is regularly used by the sales team, and so reduced the barrier to uptake by integrating the RSS feed with software familiar to the users. Each RSS item can also be treated as an email item, and the entire feed can be shared via email through Outlook. Figure 8.9 shows a user’s inbox and the links available on each RSS feed item to make a response and view further detail, such as related uploaded documents.
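A feed of this kind is ordinary RSS 2.0. As a rough sketch of what such a feed might contain, the function below builds one with Python's standard library, with one item per open question and a link back to the wiki page holding the answer form. The field names (`title`, `page_url`, `urgency`), the example URLs, and the use of the RSS `category` element to carry urgency are assumptions for illustration; the wiki itself produced its feeds through a MediaWiki extension, not this code.

```python
import xml.etree.ElementTree as ET


def questions_to_rss(feed_title: str, feed_link: str, questions: list[dict[str, str]]) -> str:
    """Serialize open bid questions as an RSS 2.0 document (hypothetical fields)."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # RSS 2.0 requires title, link and description on the channel.
    ET.SubElement(channel, "title").text = feed_title
    ET.SubElement(channel, "link").text = feed_link
    ET.SubElement(channel, "description").text = "Open questions from the bid unit"
    for q in questions:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = q["title"]
        # Link to the wiki page where a specialist answers through a form.
        ET.SubElement(item, "link").text = q["page_url"]
        # Assumption: urgency carried as a category so readers can filter on it.
        ET.SubElement(item, "category").text = q["urgency"]
    return ET.tostring(rss, encoding="unicode")
```

Keeping the answer on the wiki page, and only a link in the feed item, means every response lands in one place where its semantic properties can be reused.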

An important issue here is that of motivation. Specifically, how do we encourage the specialists to subscribe to the RSS feeds in the first place, and then how do we encourage them, when appropriate, to follow the link to the wiki and input their knowledge? In principle there is even the danger that the technology will encourage inaccurate responses from those who do not have the appropriate expertise, whereas in the previous approach bid unit members targeted those they believed to have the necessary knowledge.

Fig. 8.8 The SMW, showing timeline visualisation of bid proposals

8.6 User Tests and Field Trials

The development of the AKWS and the Bid Unit Wiki was based on extensive consultation with users and testing of partial applications and prototypes. These tests resulted in numerous modifications of the application design and in bug fixes, which ensured that the final prototypes were accepted in the formal acceptance procedure of the BT bid unit.

During the last months of the ACTIVE project we undertook an extensive

evaluation exercise on all aspects of the case study. We wanted to understand

how the users reacted to the technology; to what extent it suited their needs; how easy it was to use; and where there was scope for improvement. We also wanted to know

whether the technology was helping to achieve the organisational goals, e.g.,

whether it was making its users more productive; and whether it was improving

the quality of customer proposals, and the speed with which they could be written.

At the more abstract, scientific level, we wanted to help answer some of the

questions posed by the project. Specifically:

• Did the use of applications which integrate lightweight ontologies and tagging offer measurable benefits to the management of corporate knowledge without presenting significant barriers to users?

• Did the use of context help users cope with information overload, and make it easier to switch between separate tasks, which many of them do frequently? Also, did the use of machine intelligence further support the users by discovering contexts, detecting context switches, and creating associations between information objects and contexts?

Fig. 8.9 RSS feed of bid proposal questions to technical specialists

We organised two separate field trials: one for the AKWS where we were

working with the customer-facing people and their managers; and one for the

SMW where we were working chiefly with the members of the bid unit, but also

talking to those technical specialists who were receiving the RSS feeds.

Our focus here was on three aspects of the ACTIVE technology:

• The explicit (“top-down”) use of context by users. We wanted to know how

useful they found it; how easily they appreciated the concept; and how they

reacted to the user interface.

• The use of machine intelligence to discover contexts, detect context switches

and associate information objects with contexts, i.e. the “bottom-up” approach to

context. Here the principal question was the effectiveness of the algorithms, i.e. how well the results compared with the users’ own natural understanding of

context. We were also interested, of course, in the user interface, e.g., how

discovered contexts were presented to the user.

• The use of tagging to describe information objects. Here again, we wanted to

know how valuable users found the overall concept; how effective the algorithm which suggested tags was; and how people reacted to the user interface.

8.6.1 Developing a Mature Prototype of the AKWS

The first concern was to ensure that the functionality of the AKWS is well adapted to the needs of the professional users for which it is intended. Although the needs and requirements were analyzed carefully, this analysis is a necessary, but not sufficient, condition for ensuring the acceptance of the application by the prospective users. A series of iterations of design and testing was therefore carried out, in which increasingly complete and refined prototypes were developed and tested with appropriate user representatives and users. The results were used to identify bugs and shortcomings of the current prototype specification, to refine the requirements for the application, and to implement the subsequent prototype version.

Tests were conducted first with senior technical user representatives, then with experienced users, then with representative groups of users, and finally in field tests to which the entire prospective user group (all members of the BT bid unit) and a number of further professionals at BT were invited. In this manner we ensured that the final application prototype met the organizational requirements as well as the functional requirements and the needs of the users.


Major results of the tests of advanced prototypes with users and user representatives were significant modifications of the technical components to improve performance (response time), and significant improvements to the way in which tags were recommended and context detection and discovery were presented. Additional requirements were formulated based on early user experience, such as a re-design of the Task Pane and Task Wizard, and the ability to use the AKWS in an offline mode.

The application passed the acceptance test by senior representatives of the BT bid unit, which was a defined condition for proceeding with the field test. In summary, the application was regarded as sufficiently mature to ask the professional users of the BT bid unit to use it in their daily work, which requires a significant amount of engagement from each individual user.

8.6.1.1 User Effort Imposed by New Functionality

One concern of user representatives, and an important barrier for professional users

when adopting new software applications, is the learning effort and the added

continuous effort imposed on users. The functionality of the AKWS demands a

certain amount of user intervention and response. Users have to tag, accept

suggested context switches (aka context detection), accept discovered contexts,

and associate documents with the newly discovered contexts. Because it was

recognized that for users this is a non-trivial added burden, an independent analysis

was carried out.

Under controlled conditions an instance of AKWS was installed and systemati-

cally populated with a set of data (documents and URLs). A number of test persons

carried out defined tasks and the execution time was observed. The results indicate

the time required to execute the various functions of the AKWS (Table 8.1).

A function such as “Recommended tagging” requires a certain amount of time (significantly longer than the time to execute the operation for entering data), due to the need to read, make decisions, and select from the list of recommended tags. In contrast, context tagging (context association) is carried out much faster, provided that the user is already in the context with which a document is to be associated. Otherwise a context switch is needed before association with the desired context can be performed, and in this case the total time to perform context association is much higher. It is notable that handling discovered contexts and responding appropriately is a tedious task and a lengthy procedure.

Table 8.1 Observed execution times for user tasks carried out with AKWS functionality

| Task type | Mean time to perform task (min:s) |
|---|---|
| Recommended tagging | 00:55+ |
| Context association | 00:04 |
| Context switch and context association | 00:41 |
| Context creation | 00:34 |
| Context switching | 00:34 |
| Search | 00:36+ |
| Discovered context | 00:57+ |
| Detected context | 00:27 |

In addition, analytic models of the user procedures were constructed, namely GOMS models (Card et al. 1983). The results show in detail that AKWS functions add cognitive complexity to the tasks of users. In combination with the measured execution times and the subjective assessments of the users, this shows that the AKWS does not provide additional information to the user for free: a certain amount of investment is required from each user to learn to use the new tools and functionality.
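The GOMS family includes the Keystroke-Level Model (KLM), its simplest variant described by Card et al. (1983), which predicts execution time by summing standard operator times. The sketch below is not the analysis carried out in ACTIVE: the operator sequences for the two AKWS responses are hypothetical, and the operator times are the commonly cited textbook values (mouse clicks are counted as keystrokes, as in the original KLM). It illustrates why a procedure with several decisions, such as responding to a discovered context, costs far more than accepting a suggested context switch.

```python
# Commonly cited KLM operator times in seconds.
OPERATOR_TIME = {
    "K": 0.28,  # keystroke or button press, average non-secretarial typist
    "P": 1.10,  # point with the mouse to a target on screen
    "H": 0.40,  # home hands between keyboard and mouse
    "M": 1.35,  # mental preparation for a step
}


def klm_time(sequence: str) -> float:
    """Predicted execution time for a string of KLM operators, e.g. 'MPK'."""
    return sum(OPERATOR_TIME[op] for op in sequence)


# Hypothetical operator sequences, for illustration only.
# Suggested context switch: decide, point at 'accept', click.
accept_switch = "MPK"
# Discovered context: decide and click 'accept' (MPK); home to the keyboard and
# think of a name (HM); type a 10-character name; home back to the mouse (H);
# decide on and click two documents (MPK twice); click 'confirm' (PK).
discover_context = "MPK" + "HM" + "K" * 10 + "H" + "MPK" * 2 + "PK"
```

KLM predicts error-free expert performance and ignores reading time, so such estimates would be expected to fall well below the observed times in Table 8.1; their value lies in comparing procedures rather than predicting absolute durations.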

8.6.1.2 Field Tests of the AKWS and the Bid Unit Wiki

The objective of field tests is to provide conclusive evidence that the new applications developed in ACTIVE can be used as intended, are accepted by the target users, and provide value for the user organization. The specific work context of the target users of the AKWS creates challenging conditions for conducting valid field tests: the bid unit members work at variable locations, mostly out of the office. Professional users do not follow prescribed work procedures, but adapt their work to the current tasks and context of work, and also to their personal capabilities and preferences. Flexible means of collecting valid data remotely therefore had to be constructed.

Thirty test persons were prepared to install and use the AKWS. Of these, 18 users supplied data which were sufficiently complete to be included in the analysis. The experimental design is a repeated measures design, in which data were collected at different points in time from each user.

Measures collected were

– Subjective quality of use of the application

– Objective performance measures and performance self-assessment by users

– Added value assessment of the specific functionality included in the AKWS

– Monitoring of the usage of the functionality, indicating acceptance

The test procedure started with demonstrations of the AKWS via Microsoft Live Meeting, to which all members of the triallist community were invited. The AKWS Client application was installed on users’ workstations. Each user received about one hour of instruction and was asked to become familiar with the AKWS over about 3–5 days. During the test period of 3–5 weeks the users carried out their daily tasks using the AKWS together with their standard tools, mostly MS-Office applications.

User data were collected by means of a number of online questionnaires and user interviews after the users had experienced the technology for a number of weeks. We used proven and standardised instruments such as the Software Usability Measurement Inventory (SUMI) (Kirakowski 1996, 1998), together with focussed questionnaires and rating scales, to assess the added value of the functionality in the AKWS and the Bid Unit Wiki. In the AKWS we also used a user monitor to obtain additional information about how users used the functionality of the AKWS. The events registered allow us to infer the acceptance and usage of the AKWS functionality in the user population. In addition, these usage data determined which further information would be collected from users. The objective was to ensure that users were only asked to provide information about functions which they had actually used. For example, if the user was presented with tag suggestions, the user monitor might be invoked to obtain the user’s feedback on the usefulness of the suggestions. The user monitor was designed so that the user was not queried too frequently, and only about those functions which he had used.
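The behaviour of the user monitor (counting events, and asking for feedback only about a function the user has just used and not too often) might be sketched as below. The event codes follow those in the caption of Fig. 8.10; the class name, the single minimum interval between any two prompts, and the question text are assumptions, not the monitor's actual design.

```python
from collections import Counter
from typing import Optional


class UserMonitor:
    """Hypothetical event monitor with throttled feedback prompts."""

    def __init__(self, min_interval_s: float = 3600.0) -> None:
        self.counts: Counter = Counter()  # events per code, e.g. 'TGRQ', 'CXAS'
        self.min_interval_s = min_interval_s
        self.last_prompt: Optional[float] = None  # time of the last question asked

    def record(self, event: str, now: float) -> Optional[str]:
        """Log an event; return a feedback question if one is due, else None."""
        self.counts[event] += 1
        if self.last_prompt is not None and now - self.last_prompt < self.min_interval_s:
            return None  # the user was asked something too recently
        self.last_prompt = now
        # Only the function that triggered the event is asked about.
        return f"How useful was this function ({event})?"
```

Because a prompt can only be triggered by recording an event, the user is never asked about a function he has not used; the interval check keeps the questioning infrequent.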

8.6.1.3 Results of AKWS Field Tests

The users’ ratings of their task characteristics indicate that context switching, search, and information sharing are carried out frequently each day. Neither tagging nor editing and (re-)using macros seems to be a frequent activity of these users.

Quality of use assessment with SUMI. It is essential that the quality of use of the AKWS is satisfactory, even though it is not a mature product in the same sense as the other tools used by the test users. SUMI provides a standardized profile of the quality of use of the AKWS. The results show that the AKWS fulfils the quality of use requirements for the intended users; no single quality dimension stands out as negative.

Ranking of features of the AKWS according to their usefulness. The test

persons were asked to rank eight major new features of the AKWS according to

their usefulness. The procedure forced the subjects to rank all features (no ties

permitted), as shown in Table 8.2.

The Kendall W statistic (W = 0.22) indicates an acceptable degree of consistency between subjects. A test of significance of the rankings using the Friedman statistic yields an error probability of p < 0.01, i.e. the ranking sequence differs significantly from chance.

The positive assessment of Manual Context Association, Context Filtering and Manual Context Switching is reflected in additional remarks recorded by users during the test, and in the frequency of manual context association and manual context switching events in the logs. Our hypothesis is that manual context association (aka context tagging) can be carried out efficiently by users, and is therefore accepted readily, e.g. compared with recommended tagging.

Table 8.2 Ranking of AKWS features according to usefulness for the users (18 subjects)

| Rank | AKWS feature | Mean rank |
|---|---|---|
| 1 | Context association | 2.9 |
| 2 | Context filtering | 3.1 |
| 3 | Manual context switching | 3.6 |
| 4 | Automatic context detection | 4.6 |
| 5 | Workspace search | 4.8 |
| 6 | Recommended tags | 5.5 |
| 7 | Context visualizer | 5.6 |
| 8 | Automatic context discovery | 5.8 |
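The two statistics quoted for Table 8.2 are linked. For m raters ranking n items without ties, Kendall's W can be computed from each item's mean rank, and Friedman's statistic equals m(n − 1)W with n − 1 degrees of freedom. The sketch below applies these standard formulas to the rounded mean ranks in Table 8.2; because the inputs are rounded, it is only a consistency check, not a re-analysis of the raw data. The critical value 18.475 (χ² with 7 degrees of freedom at the 0.01 level) is taken from standard tables.

```python
def kendalls_w(mean_ranks: list[float]) -> float:
    """Kendall's W for complete rankings without ties, from each item's mean rank.

    With rank sums R_i = m * r_i, W = 12 * sum((R_i - m(n+1)/2)^2) / (m^2 n (n^2 - 1));
    the m^2 cancels, leaving an expression in the mean ranks r_i only.
    """
    n = len(mean_ranks)
    centre = (n + 1) / 2  # overall mean rank
    spread = sum((r - centre) ** 2 for r in mean_ranks)
    return 12 * spread / (n * (n * n - 1))


def friedman_chi2(w: float, m: int, n: int) -> float:
    """Friedman chi-square statistic implied by W (n - 1 degrees of freedom)."""
    return m * (n - 1) * w


mean_ranks = [2.9, 3.1, 3.6, 4.6, 4.8, 5.5, 5.6, 5.8]  # Table 8.2
w = kendalls_w(mean_ranks)
chi2 = friedman_chi2(w, m=18, n=len(mean_ranks))
CRITICAL_7DF_P01 = 18.475
print(f"W = {w:.2f}, chi2 = {chi2:.2f}, significant at 0.01: {chi2 > CRITICAL_7DF_P01}")
# prints: W = 0.22, chi2 = 27.99, significant at 0.01: True
```

The recomputed W matches the reported 0.22, and the implied Friedman statistic lies well above the 0.01 critical value, consistent with the reported p < 0.01.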

Adoption rate and frequency of using the AKWS functions. Monitoring of

event counters gave the test administrators the opportunity to observe how users

were using the AKWS. The events monitored were

– Recommended tagging requests

– Context association

– Context switching

– Workspace search

– Acceptance and denial of Context detection and discovery

Over the usage period we observed which functions were accepted by users (Fig. 8.10). Context association and context switching events increase progressively, while recommended tagging requests, search events and context discovery events are taken up much more slowly.
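Curves like those in Fig. 8.10 are cumulative counts of logged events per counter code. A minimal sketch of that computation, assuming a hypothetical log of (day, event code) pairs that uses the counter codes from the figure caption, is:

```python
from collections import Counter
from itertools import accumulate


def cumulative_counts(log: list[tuple[int, str]], code: str, days: int) -> list[int]:
    """Running total of events with `code` at the end of each day 0..days-1."""
    per_day = Counter(day for day, event in log if event == code)
    # Days with no events contribute 0, so the curve stays flat on those days.
    return list(accumulate(per_day[d] for d in range(days)))
```

A steadily rising curve, as for CXAS and CXSW, indicates continued use; a curve that flattens early indicates a function that was tried and then abandoned.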

8.6.1.4 Discussion of AKWS Field Tests

In summary, the AKWS corresponds to user requirements and can be used to

perform the intended tasks of users. Some functions, especially context services,

were immediately accepted by users.

Users noted the slow response of the application, a technical implementation issue. It seems that the users did not immediately recognize the benefit which tagging in general, and recommended tagging in particular, generates. Before the test, users did not carry out tagging activities frequently. A solution for this problem must be found at the organizational and management level, where incentives for tagging and knowledge sharing must be provided. On the other hand, context tagging (context association), which is a pre-requisite for context switching, is accepted and used with increasing frequency by users.

Fig. 8.10 Cumulated frequency of events over time. Context switching and context association event counts increase throughout the test period. TGRQ, tag recommendation request counter; CXAS, context association counter; SEAR, search counter (this is the workspace search web service invocation counter); CXSW, context switch counter (this is the explicit context switch counter); CDYN, denied context discovery counter

8.6.1.5 User Effort Versus User Benefits for AKWS Functions

Using the AKWS creates benefits for users, but also adds to their costs in terms of time, cognitive complexity and workload, and learning. The results of the lab tests carried out with the AKWS allow us to estimate the specific user costs of using the functionality of the AKWS. Both the execution times under idealized conditions and the analytic results of the user procedures show that the user costs are substantial, and that the user procedures create additional workload for the users.

Using the AKWS functions requires user procedures of different complexity. In the case of “suggested context switch” the user just chooses between two responses, “passive ignore” or “accept”, while “context discovery” must be answered by a procedure comprising several actions.

Users are more likely to accept tasks which have low cost in terms of time and cognitive workload; higher cost must be justified by greater benefits. This relationship varies for different parts of the functionality of the AKWS. We list the functions adding to user cost next to the user benefits in Table 8.3. The benefit is convenience and improved personal productivity (as in the case of context switching), or information quality (as in the case of search).

Table 8.3 Costs and benefits of using AKWS functionality

| AKWS functions adding to user cost | User cost | Benefits for users (personal productivity) |
|---|---|---|
| Context association (with “manual” tagging) + create context | Low | Ability to work in context: context switching, filtering, and visualization |
| Suggested context switch + accept or passive ignore | Low | As above |
| Tag document by type-to-tag | High | Search (with added cost) |
| Tag by recommendation + select tag | High | As above |
| Context discovery + accept context + name context + associate documents + set permission rights | High | Ability to work in context: context switching, filtering, and visualization |
| Search + memorize metadata + select by visual search in tag cloud | | |


The results suggest that users accept context-related functionality immediately because of its low user cost and immediate benefit in terms of personal productivity (context switching and context search). Context tagging also delivers some of the benefit created by tagging (enabling efficient search). This reduces the incentive for users to use the document tagging functions.

8.6.2 Field Test of the Bid Unit Wiki/SMW

The Bid Unit Wiki (BUW) was introduced to the Bid Unit Team (5 core users and a

community of 150 technical specialists) with a focus on the following three aspects

of the functionality:

• The use of the SMW to store and retrieve information about the customer

proposals. How effective are the semantic features in assisting with searching

and browsing the SMW?

• The use of the SMW generally as a collaborative tool. How effective is the BUW in encouraging collaboration between the bid unit team and the technical specialists?

• The use of the RSS feeds to pose questions to the technical specialists.

The initial uptake and usage of the functionality were observed, and questionnaires and interviews were used to collect responses from users.

The uptake of the Bid Unit Wiki progressed steadily after its introduction. The first question to the community of technical specialists was posed immediately after the release and was answered within 3 days. Users initially explored the Bid Unit Wiki to test how well it met their requirements, and then went on to use it as part of their working environment. The users’ responses give a nuanced view of the acceptance and benefit they perceived.

The team sees the value of the BUW primarily in providing a fast means of communication between the bid unit team and the large community of technical experts. Users rate the corresponding functionality as valuable and beneficial for personal productivity. Spontaneous statements made by the users highlight the added value of the Bid Unit Wiki in comparison with the existing enterprise tools. Users of the Bid Unit Wiki described its most attractive features as follows:

• “. . . utilising expertise from the wider BT community in assisting us with

responses . . . saving us time in searching which can be very time consuming.”

• “. . . to get responses to awkward questions promotes greater team working”

• “Asking bespoke questions.”

• “. . . ability to locate previous information”

• “Speed and ease of use to locate information and gather new information.”

• “. . . gives quick and easy access to a wide range of specialists and knowledge that we would otherwise have to mail/phone people individually.”

• “Reaching out to specialist not directly involved in bids.”

• “. . . enabling us to obtain responses from the wider BT community, which is

better for us and our customers of course.”


Based on the very positive reception by users, the decision was made to install

the BUW as a tool for collaboration at short notice.

The full power of the BUW for knowledge exchange in a large community of technical experts is realized only when the community as a whole adopts the BUW as a tool for sustained use. In the short time available, this adoption was only just starting.

8.7 Conclusions

In our case study we set out to evaluate, at both the user and the organisational level, a number of the technologies developed in the ACTIVE project. Our choice of technologies was determined partly by the requirements of our users. It was also determined by the limitations of time, especially the limited ability of our professional users to divert part of their regular working time to adopting new technology. We would have liked, for example, to evaluate the process tools described in Chap. 6.

We consider two of the project’s scientific hypotheses to be confirmed. First, applications incorporating lightweight ontologies, as implemented in the SMW, and tagging, as implemented in the AKWS, can make a material difference to productivity in our two environments. Second, context-based information delivery, supported by our machine learning algorithms, provides functionality that is valuable in managing information overload and can reduce the disruptive effects of switching task focus.

The detailed study of user responses to the AKWS and the SMW in daily work has shown that users readily accept the context services as a means of improving personal productivity. Use of the AKWS varies strongly with individual user habits. For example, users who did not tag much before using the AKWS are likely to limit themselves to accepting contexts. They might adopt tagging with recommended tags as their use of the AKWS intensifies.

The cost/benefit ratio for users is a subject for further research and an important question for future applications. To enable context detection or faceted search, the user clearly has to provide valid and reliable input to the system beforehand. This input has been shown to involve a user cost that may be weighted disproportionately high. The reason is that it demands additional work from the user in a situation where he or she is under high pressure from other tasks.

The user monitor in combination with online questionnaires has been an effective and reliable instrument to collect empirical data in a distributed work environment.

Acknowledgement ACTIVE project partner ComTrade implemented the user monitor described

in Sect. 8.6.1.

The research leading to these results has received funding from the European Union’s Seventh

Framework Programme (FP7/2007-2013) under grant agreement IST-2007-215040.


References

Card S, Moran T, Newell A (1983) The psychology of human–computer interaction. Lawrence

Erlbaum Associates, Hillsdale, NJ

Hiemstra D (2009) Information retrieval models. In: Göker A, Davies J (eds) Information retrieval:

searching in the 21st century. Wiley, Chichester, UK

Kirakowski J (1996) The software usability measurement inventory: background and usage. In:

Jordan PW et al (eds) Usability evaluation in industry. Taylor & Francis, Brighton, pp 169–177

Kirakowski J (1998) SUMI User handbook. Human Factors Research Group, University College

Cork, Ireland

Simperl E, Thurlow I, Warren P, Dengler F, Davies J, Grobelnik M, Mladenic D, Gomez-Perez J,

Ruiz C (2010) Overcoming information overload in the enterprise: the active approach. IEEE

Internet Comput 14(6):39–46


9

Machine Learning and Lightweight Semantics to Improve Enterprise Search and Knowledge Management

Rayid Ghani, Divna Djordjevic, and Chad Cumby

9.1 Introduction

Enterprise search and knowledge management tools are amongst the most commonly used tools within enterprises today. This is mostly because organisations today hold enormous quantities of information in their enterprise data warehouses, and their enterprise document repositories keep getting larger. The challenge facing knowledge workers in these companies is how to harness these enterprise-wide resources and use them when and where they are needed most. The majority of the tools deployed in enterprises for search and knowledge management are fairly generic. They provide the same content to every knowledge worker regardless of that worker’s context and task. A typical enterprise search tool today does not look much different from a typical web search engine. This is despite vast differences between the two types of systems in their users’ goals and functions and in the content they operate on. There has been a lot of research on web search in the past decade (Wang and Zhai 2007; Jansen and Spink 2006; Joachims et al. 2007), while enterprise search has received little attention from the research community (Mukherjee and Mao 2004). The motivation for the work described in this chapter came from several discussions and interviews we conducted in Accenture to understand the needs of enterprise knowledge workers. We identified four key areas that could be improved:

• Context and task-sensitive access to information

• Access to fine-grained reusable modular chunks of information

• Intelligent workflow/process support

• Document analysis and checking support for collaborative document

development

R. Ghani • D. Djordjevic • C. Cumby (*)

Accenture Technology Labs, Rue des Cretes, Sophia Antipolis, France

e-mail: [email protected]; [email protected];

[email protected]



We describe these areas in more detail later. The rest of the chapter describes our approaches to these challenges in the context of two critical enterprise knowledge management tasks: enterprise search and collaborative document development.

We describe our work on augmenting an enterprise search tool with context mining, process mining, and visualization technologies. These technologies use a combination of bag-of-words and lightweight semantic representations, making users more productive and efficient. We also describe two support tools developed to improve both the quality of documents and the efficiency of the collaborative document development process. All three of these were tested extensively at Accenture with enterprise users. As a result of those tests and user feedback, the enterprise search engine (SABLE) and the document development toolkit were deployed and made available to over 150,000 Accenture employees. We present evaluation results from usage logs and questionnaires. These results show that the prototypes make consultants at Accenture more efficient and help them find better information, while being easier to use than existing enterprise tools for these tasks.

9.2 Improving Enterprise Search with Machine Learning and Lightweight Semantics

As mentioned earlier, enterprise search tools are used by a large number of users for a broad set of tasks. Even so, the majority of these tools are still fairly generic and provide the same content and results to every knowledge worker regardless of their context and task. We identified a need for context-sensitive and process-aware access to information that is automatically tailored to the particular needs of the task each user is engaged in. Current enterprise search tools also typically return a list of documents, each of which can be hundreds of pages long. This is common in enterprise document repositories, where Microsoft Word or PowerPoint files can have hundreds of pages or slides, making it difficult for users to find the relevant content within them. Often, enterprise users are trying to find a specific object that they can reuse for their current need, such as certain types of document sections or PowerPoint slides that can be quickly customized and reused. A need that recurred in our discussions with users was the ability to retrieve reusable “chunks” of content from enterprise search engines, instead of large monolithic documents. These “chunks” include objects such as sections of documents or graphics (such as an architecture diagram) that can be reused in future documents.

In order to tackle these shortcomings of existing enterprise search tools, we

developed SABLE, a research prototype for enterprise search with the following

capabilities (illustrated in Fig. 9.1):


1. Document search: Allows users to search for documents

2. Expert search: Allows users to search for experts within the enterprise. More details about this are in Cumby et al. (2009)

3. Graphics search: Allows users to search for graphics of a specific type

4. Fast previews: Fast, visual previews allow users to quickly preview documents before downloading them

5. Personalized contexts: Allows personalized re-ranking of search results based on automatically inferred contexts (illustrated in Fig. 9.2)

6. Visual “soft” filters: Allows users to re-rank documents based on visual soft filters (illustrated in Fig. 9.3)

7. Search collaboration: Allows users to collaborate with and contact other Accenture users who are looking for similar information

This chapter will focus on capabilities 3, 5, 6, and 7.

9.2.1 Graphics Search

Corporate digital libraries consist of business documents that contain not only text but also graphics, which knowledge workers reuse when working on new documents. These graphics include process flow diagrams, organizational charts, architecture diagrams, logos, and graphs. They are embedded in documents but are not individually available for users to search for and retrieve. Users may know that they are looking for a specific type of graphic, say an architecture diagram for the Microsoft Enterprise search solution. Even so, they first have to search for documents, manually browse hundreds of pages of content, and then hope to find the relevant architecture diagram. Access to specific, reusable pieces of graphics was a key requirement that emerged from our discussions with users of enterprise search systems at numerous large companies.

Fig. 9.1 SABLE search engine

Fig. 9.2 Based on the search behavior of similar users, SABLE automatically comes up with personalized clusters for every user. Clicking on a cluster or moving the “Red Ball” to a cluster (or in between clusters) re-ranks search results, biasing them more towards the clusters near the red ball

Fig. 9.3 “List-based” search filters only allow users to select a single value. SABLE contains Visual Filters to let users select multiple filter values by placing the red ball in between several values

We developed a machine learning approach for graphics classification. It automatically extracts graphics from corporate documents and classifies them into an enterprise graphics taxonomy, enabling graphics search functionality that augments traditional enterprise search. We augmented our classification algorithms with active learning (Settles 2009) to make the system adaptive. As a result, we developed three things:

(1) A method to automate the creation (extraction and consolidation) of reusable graphics from legacy MS Office documents.

(2) An analysis of feature extraction techniques that combine text, OCR, visual features and structural features, with an experimental evaluation of the contribution of each of these feature sets to graphics classification.

(3) An empirical evaluation of existing classification approaches and of different sample selection strategies for active learning in graphics classification.

More details of the algorithms and experimental results concerning graphics classification are given in Djordjevic and Ghani (2010).
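One common sample selection strategy for active learning is margin-based uncertainty sampling. It asks a human to label the graphics on which the current classifier is least decided. The sketch below is a generic illustration and not necessarily one of the strategies evaluated in Djordjevic and Ghani (2010). The graphic identifiers and class probabilities are invented, and any probabilistic classifier could supply them.

```python
from typing import Dict, List


def margin(probs: List[float]) -> float:
    """Difference between the two highest class probabilities (small = uncertain)."""
    top = sorted(probs, reverse=True)
    return top[0] - (top[1] if len(top) > 1 else 0.0)


def select_for_labeling(pool: Dict[str, List[float]], k: int) -> List[str]:
    """Return the k unlabeled graphics with the smallest classification margin."""
    return sorted(pool, key=lambda gid: margin(pool[gid]))[:k]


# Hypothetical classifier output over (org chart, process flow, architecture diagram).
pool = {
    "slide12_img1": [0.90, 0.05, 0.05],  # confidently an org chart
    "slide40_img2": [0.40, 0.35, 0.25],  # uncertain between the first two classes
    "slide77_img1": [0.50, 0.50, 0.00],  # maximally uncertain between two classes
}

if __name__ == "__main__":
    print(select_for_labeling(pool, 2))
```

In an active learning loop, the selected graphics would be labeled and added to the training set. The classifier would then be retrained before the next selection round.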

We applied this graphics classification capability to the corporate repository at Accenture, extracting and classifying over one million graphics. These graphics are then indexed by a search engine and integrated into SABLE. There, a user can type in a text query, select a graphic type (org chart, process flow, architecture diagram, etc.) and retrieve relevant graphics. We built this graphics search engine by indexing each graphic with the text that appears near (or within) the image and with the categories our graphics classification system assigned to it. We use different term weighting techniques, such as weighting the titles of slides containing graphics more heavily than the body of the slide. The text from the shape objects is incorporated in the index. If the slide has no text, the rest of the document (global information) is used to capture textual similarity. The classification scores are further used to re-rank the retrieved list of results.
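The indexing and re-ranking ideas above can be sketched in a few lines. Slide titles are weighted more heavily than body and shape text, and the whole document's text is used when a slide has no text. The classifier's score for the requested graphic type then scales the text match. The field weights, the blending parameter `alpha`, and the multiplicative combination are illustrative assumptions, not SABLE's actual settings.

```python
from collections import Counter
from typing import Dict

# Hypothetical field weights: slide titles count three times as much as other text.
FIELD_WEIGHTS = {"title": 3.0, "body": 1.0, "shapes": 1.0}


def graphic_terms(slide: Dict[str, str], document_text: str) -> Counter:
    """Build weighted term counts for one graphic from the slide that contains it."""
    terms: Counter = Counter()
    for field, weight in FIELD_WEIGHTS.items():
        for token in slide.get(field, "").lower().split():
            terms[token] += weight
    if not terms:
        # Slide has no text: fall back to the rest of the document (global information).
        for token in document_text.lower().split():
            terms[token] += 1.0
    return terms


def score(query: str, terms: Counter, class_probs: Dict[str, float],
          graphic_type: str, alpha: float = 0.5) -> float:
    """Text match re-weighted by the classifier's confidence in the requested type."""
    text_match = sum(terms[t] for t in query.lower().split())
    return text_match * ((1 - alpha) + alpha * class_probs.get(graphic_type, 0.0))


if __name__ == "__main__":
    slide = {"title": "CRM architecture", "body": "integration layer", "shapes": "database server"}
    terms = graphic_terms(slide, document_text="")
    print(score("crm architecture", terms, {"architecture diagram": 0.8}, "architecture diagram"))
```

With `alpha = 0.5`, a graphic the classifier considers unlikely to be of the requested type keeps half of its text score rather than being dropped entirely. This is one way to let classification scores re-rank rather than filter results.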

9.2.2 Personalized Contexts

Typical enterprise search systems today return the same search results regardless of

the user, context, or task. We developed the notion of personalized contexts that

integrate context mining and process mining to deliver personalized results that

are relevant to the user and his current context and task. The personalized contexts

feature was enabled by the following technologies:

1. Context Modeling

2. Context Mining


3. Process Mining

4. Context Detection

5. Context Visualization and Switching

6. Resource Re-Ranking

The rest of this subsection describes these components in further detail.

9.2.2.1 Context Modeling

For context modeling, we represent an event as a user U accessing a resource R (typically a document) at time T. U is represented as a set of lightweight semantic features of users at Accenture, such as level, office location, and a set of skills. We experimented with two kinds of representations for R. The first uses lightweight semantics: the metadata with which KM curators had tagged the documents (document type, relevant product offering, relevant industry, etc.). The second is a bag-of-words representation. The experiments conducted to test the effectiveness of context mining showed that the lightweight semantic representation performs as well as, and sometimes better than, the bag-of-words representation. It is also more efficient in terms of storage and processing time. Our deployed implementation therefore uses the lightweight semantic representation for R.

The data used for context modeling and mining consists of three kinds of information:

1. Usage logs: A large database of timestamped search and access logs from our corporate enterprise search engine, spanning 147,000 users, 134,000 documents and 7.2 million actions over 2 years

2. Document data: Information about the document repository, including lightweight semantics in the form of metadata about the content. This metadata was filled in when the content was uploaded, based on a predefined company-wide taxonomy

3. People data: A structured database of skills information for each employee. It holds organizational information (company groups, office location, and career level) and a list of self-selected skills, along with proficiency rating, number of years of experience, etc. This database contains the above-mentioned details for over 170,000 employees
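The event representation described above can be sketched as simple immutable records. Each event is flattened into a set of lightweight semantic features that a clustering step could compare. The class names, field names, and the `field=value` feature encoding are hypothetical. The text specifies only that users are described by level, office location, and skills, and documents by curator-assigned metadata.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class User:
    user_id: str
    level: str
    office: str
    skills: FrozenSet[str] = frozenset()


@dataclass(frozen=True)
class Resource:
    doc_id: str
    # (field, value) pairs from the company-wide taxonomy, e.g. ("industry", "Banking").
    metadata: Tuple[Tuple[str, str], ...] = ()


@dataclass(frozen=True)
class Event:
    """User U accessing resource R at time T."""
    user: User
    resource: Resource
    time: datetime


def features(event: Event) -> FrozenSet[str]:
    """Flatten an event into lightweight semantic features of the user and the document."""
    u, r = event.user, event.resource
    feats = {f"user.level={u.level}", f"user.office={u.office}"}
    feats |= {f"user.skill={s}" for s in u.skills}
    feats |= {f"doc.{name}={value}" for name, value in r.metadata}
    return frozenset(feats)
```

Because the features are short categorical strings rather than full document text, the representation stays compact. That is in line with the storage and processing advantage reported for the lightweight semantic representation.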

9.2.2.2 Context Mining

After representing each event E as <U, R, T> for a user U accessing resource R at time T, we perform context mining using hierarchical clustering algorithms (Manning et al. 2008) on all events in our data set. The details of some of the context mining algorithms are provided in Chap. 7. Context mining yields a list of approximately 1,000 generic contexts for the entire enterprise. These contexts are stored internally as two similarity matrices. One contains a membership score for each <person, context> pair and the other contains a membership score for each <document, context> pair. These matrices are indexed and stored in memory to speed up contextual information delivery. The context mining component runs periodically and updates the list of generic contexts across all users and resources.
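To make the idea concrete, the sketch below clusters events by the overlap of their feature sets using average-linkage agglomerative clustering. It then derives <person, context> and <document, context> membership scores as the fraction of each person's or document's events that fall in each cluster. The Jaccard similarity, average linkage, and fraction-based membership score are assumptions made for illustration; the chapter says only that hierarchical clustering is used. This naive version is at least quadratic in the number of events and would not scale to millions of actions as written.

```python
from collections import Counter, defaultdict
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

EventRow = Tuple[str, str, FrozenSet[str]]  # (person, document, features)


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def agglomerate(feats: List[FrozenSet[str]], k: int) -> List[List[int]]:
    """Average-linkage agglomerative clustering of feature sets into k >= 1 clusters."""
    clusters = [[i] for i in range(len(feats))]

    def link(c1: List[int], c2: List[int]) -> float:
        total = sum(jaccard(feats[i], feats[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        # Merge the most similar pair (a < b, so deleting b leaves index a valid).
        a, b = max(combinations(range(len(clusters)), 2),
                   key=lambda p: link(clusters[p[0]], clusters[p[1]]))
        clusters[a] += clusters[b]
        del clusters[b]
    return clusters


def membership(events: List[EventRow], clusters: List[List[int]],
               pos: int) -> Dict[str, Dict[int, float]]:
    """Fraction of each entity's events per context (pos=0: person, pos=1: document)."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for ctx, members in enumerate(clusters):
        for i in members:
            counts[events[i][pos]][ctx] += 1
    return {ent: {ctx: n / sum(row.values()) for ctx, n in row.items()}
            for ent, row in counts.items()}
```

The two dictionaries returned by `membership` play the role of the person-context and document-context matrices described above.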

9.2.2.3 Process Mining

The process mining component also runs periodically to update the list of informal

sequences of actions across all users and resources. We use the same data as in

context mining. The Process Mining component is based on data mining techniques

to obtain a probabilistic process model. The component constructs probabilistic

temporal models that detect patterns of sequential user actions. For action

modeling, a multi-relational clustering approach is used, where events are logs of

users accessing documents described with a lightweight taxonomy and unstructured

text. The Process Mining component uses Markov Models for discovering frequent

sequences of actions. The processes are internally stored as probabilistic sequences

of meta-data fields. For example:

itemtype = proposal material -> document_type = powerpoint

is a sequence that might be discovered as a result of PowerPoint documents

being accessed frequently after documents tagged as proposal material. The goal

is to integrate this functionality into SABLE to help infer resources that will be

accessed next and make the search process faster.
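A first-order Markov model over metadata values is one simple way to obtain such probabilistic sequences. The model counts how often one metadata value follows another within a user's time-ordered accesses and normalizes the counts. This sketch rests on that first-order assumption. The chapter does not state the model order, how sessions are segmented, or which metadata fields are used, and the example sessions are invented.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Model = Dict[str, Dict[str, float]]


def fit_markov(sessions: List[List[str]]) -> Model:
    """Estimate P(next metadata value | current value) from time-ordered sessions."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for seq in sessions:
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    model: Model = {}
    for current, row in counts.items():
        total = sum(row.values())
        model[current] = {nxt: n / total for nxt, n in row.items()}
    return model


def frequent_transitions(model: Model, min_prob: float) -> List[Tuple[str, str, float]]:
    """Transitions whose conditional probability is at least min_prob."""
    return [(a, b, p) for a, row in model.items() for b, p in row.items() if p >= min_prob]


# Invented example sessions, one list of metadata values per user session.
sessions = [
    ["itemtype=proposal material", "document_type=powerpoint", "document_type=word"],
    ["itemtype=proposal material", "document_type=powerpoint"],
    ["itemtype=proposal material", "document_type=word"],
]

if __name__ == "__main__":
    for a, b, p in frequent_transitions(fit_markov(sessions), 0.6):
        print(f"{a} -> {b}  (p = {p:.2f})")
```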

9.2.2.4 Context Detection

When a user logs on to the SABLE system and submits a search query, we generate a ranked list of the global contexts most applicable to the current user and his current needs. To do so, we use the person-to-context similarity of the logged-in user and the document-to-context similarity of the top 100 documents returned by the query. We then take the top n contexts as relevant contexts, where n is currently set to 8 in SABLE.
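The chapter does not say how the two similarities are combined. The sketch below assumes a simple weighted sum of the user's membership in a context and the average membership of the top-ranked documents in that context. The parameter `alpha` and the dictionary layout are hypothetical; the defaults of n = 8 and, in deployment, 100 top documents follow the text.

```python
from typing import Dict, List

Scores = Dict[str, Dict[int, float]]  # entity -> {context id: membership score}


def detect_contexts(user: str, top_docs: List[str], person_ctx: Scores,
                    doc_ctx: Scores, n: int = 8, alpha: float = 0.5) -> List[int]:
    """Rank contexts for a user and the documents returned by their query."""
    user_row = person_ctx.get(user, {})
    candidates = set(user_row)
    for d in top_docs:
        candidates |= set(doc_ctx.get(d, {}))

    def score(ctx: int) -> float:
        doc_avg = sum(doc_ctx.get(d, {}).get(ctx, 0.0) for d in top_docs) / max(len(top_docs), 1)
        return alpha * user_row.get(ctx, 0.0) + (1 - alpha) * doc_avg

    # Highest score first; ties broken by context id for a deterministic result.
    return sorted(candidates, key=lambda c: (-score(c), c))[:n]
```

In SABLE, `top_docs` would hold the top 100 results of the user's query.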

9.2.2.5 Context Visualization/Switching

The context visualization component receives a ranked list of the contexts identified as most relevant to the user and his current information need. These contexts are displayed in a 2D representation, allowing the user to manually select nearby contexts and get new search results. They are automatically labeled with the top-scoring metadata values of the context centroid. The user can look at the top contexts and move the focus to signal his current context. If the focus is in between several contexts, the retrieved documents are re-ranked based on a weighted sum of the nearby contexts (as shown in Fig. 9.2).
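The text states that documents are re-ranked by a weighted sum of the contexts near the focus (the red ball), but not how the weights are derived. This sketch assumes normalized inverse-distance weights for contexts within a fixed radius of the focus in the 2D layout. The coordinates, the radius, and the weighting scheme are all hypothetical.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def context_weights(focus: Point, positions: Dict[str, Point], radius: float) -> Dict[str, float]:
    """Normalized inverse-distance weights for contexts within `radius` of the focus."""
    raw = {}
    for ctx, pos in positions.items():
        d = math.dist(focus, pos)
        if d <= radius:
            raw[ctx] = 1.0 / (d + 1e-6)  # small constant avoids division by zero
    total = sum(raw.values())
    return {ctx: w / total for ctx, w in raw.items()} if total else {}


def rerank(docs: List[str], doc_ctx: Dict[str, Dict[str, float]],
           weights: Dict[str, float]) -> List[str]:
    """Order documents by a weighted sum of their memberships in the nearby contexts."""
    def score(d: str) -> float:
        return sum(weights.get(c, 0.0) * m for c, m in doc_ctx.get(d, {}).items())
    return sorted(docs, key=score, reverse=True)
```

Moving the focus towards one context increases that context's weight, so its documents rise in the ranking. Placing the focus midway between two contexts blends them roughly equally.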


Overall, this component displays the contexts that are inferred to be closest to

the user’s current context and allows the user to manually select nearby contexts

which results in the re-ranking of search results.

9.2.2.6 Resource Reranking

When a user uses the context visualization/switching component, the search results

get reranked using the context and process inferred by the respective components.

For each context, the context mining component has a document-context member-

ship score. Based on the previous k documents viewed, the process mining compo-

nent also returns a probability distribution over the metadata values for the next

most likely document of interest. We combine both of these components and

compute a ranked list of documents relevant to the current context (for the current

user) as a function of previous documents viewed. This re-ranked list is then shown

in SABLE.
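The chapter does not specify how the context scores and the process model's prediction are combined. The sketch below assumes a linear blend: each document's context relevance is mixed with the highest probability the process model assigns to any of that document's metadata values. The blending weight `lam` and the use of the maximum are illustrative assumptions.

```python
from typing import Dict, Iterable, List


def combined_rerank(docs: List[str], context_score: Dict[str, float],
                    doc_meta: Dict[str, Iterable[str]],
                    next_meta: Dict[str, float], lam: float = 0.7) -> List[str]:
    """Blend context relevance with the process model's next-metadata prediction."""
    def score(d: str) -> float:
        # Best predicted probability among the document's metadata values.
        p_next = max((next_meta.get(v, 0.0) for v in doc_meta.get(d, ())), default=0.0)
        return lam * context_score.get(d, 0.0) + (1 - lam) * p_next
    return sorted(docs, key=score, reverse=True)
```

For example, if two documents are equally relevant to the current context, the process model breaks the tie. It favours the document whose metadata (say, `document_type=powerpoint`) is most likely to be accessed next.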

9.2.3 Visual “Soft” Filters

The context visualization component is extended to display multiple search filters

in a 2D representation labeled with the most frequent metadata values and allows

the user to manually select co-occurring meta-data values and filter search results.

9.2.4 Search Collaboration

LiveNetLife provides semantically enabled social browsing. We use LiveNetLife, together with context information from the context identification component, to create awareness among knowledge workers who are working in similar contexts or on similar tasks.

9.3 Improving Collaborative Document Development with Machine Learning and Lightweight Semantics

Collaborative document creation is a common activity in most product and service organizations. Most large organizations have distributed teams of experts working together to create documents for a variety of purposes. These documents can be project proposals written in response to Requests for Proposals (RFPs), training materials, marketing and sales materials, or product manuals. Typically, developing these kinds of documents requires a combination of formal and informal collaborative processes. Depending on the requirements of each document type, the project manager tasked with developing the final document must identify and assemble skilled teams. Where expertise cannot be identified in time, materials are simply drawn from the central document repository and adapted as necessary. We describe two support tools developed to improve both the quality of collaborative documents and the efficiency of the document development process. The first prototype, the Document Development Toolkit, is designed to help knowledge workers find relevant information (and expertise) for the task at hand. The second prototype, the Document Development Workspace, is designed to help large project teams collaborate on the development of documents.

9.3.1 Document Development Toolkit

We developed the Document Development Toolkit as add-ins to commonly used document creation tools, specifically Microsoft Word and PowerPoint. The main motivation for this decision was to support users in the context of the document they are developing. It also let us embed the support tools in the document creation process, rather than in a separate application that users would have to open alongside it. The Document Development Toolkit consists of the following features:

1. Search and visual previews from within Word and PowerPoint (Document, Graphics, and Expert Search)

2. Expert search: searching for experts, communicating with them using MS Office Communicator, and adding them to the project team

3. Auto-suggestion of search terms: Based on the position of the mouse cursor in the Word/PowerPoint document, a set of keywords is extracted from the current working position and search terms are auto-suggested

4. Search refining based on personalized clusters: The same back-end functionality as the context mining and detection in SABLE, except that the front end shows the current context and the most similar contexts in a simple word cloud. We had to make the front end of context visualization and switching more compact due to screen space limitations in MS Office add-ins

5. Personalized clusters (based on the same context mining described earlier)

6. Interactive document scrubbing/redaction functionality to remove client-confidential information before the document can be shared

7. Team building, section assignment, and document development workspace creation
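The auto-suggestion feature in item 3 can be sketched by taking the words around the cursor position, discarding stopwords and very short tokens, and proposing the most frequent remaining words. The window size, the stopword list, and frequency-based ranking are hypothetical choices; the chapter does not describe the extraction method.

```python
import re
from collections import Counter
from typing import List

# Small illustrative stopword list (an assumption, not the toolkit's list).
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "for", "on",
             "with", "is", "are", "we", "our"}


def suggest_terms(text: str, cursor: int, window: int = 20, k: int = 5) -> List[str]:
    """Suggest up to k search terms from the words around a cursor offset in `text`."""
    matches = list(re.finditer(r"[A-Za-z][A-Za-z-]*", text))
    if not matches:
        return []
    # Index of the word whose start is closest to the cursor.
    center = min(range(len(matches)), key=lambda i: abs(matches[i].start() - cursor))
    nearby = matches[max(0, center - window): center + window + 1]
    counts = Counter(m.group().lower() for m in nearby)
    for word in list(counts):
        if word in STOPWORDS or len(word) < 3:
            del counts[word]
    return [word for word, _ in counts.most_common(k)]


if __name__ == "__main__":
    doc = "Our CRM migration plan covers data migration, CRM testing and migration support."
    print(suggest_terms(doc, cursor=doc.index("data"), k=2))
```

Working on a window of words rather than the whole document keeps the suggestions tied to the section the user is currently editing.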

The Document Development Toolkit was designed to help knowledge workers find the right information (and expertise) for the task at hand. The toolkit is implemented as an add-in for MS Office applications (specifically MS Word and PowerPoint) and is illustrated in Figs. 9.4–9.6.


The Interactive Redaction Tool shown in Fig. 9.7 addresses the problem of sharing documents that contain client-sensitive information. In doing so, it aims to encourage knowledge workers to contribute content to the enterprise repository on a much larger scale. An algorithm for document redaction described in Cumby and Ghani (2011) enables document anonymization in a bottom-up fashion. The approach attempts to optimally perturb a document to maximize the classification error for the sensitive client identity. The classification is made within the set of potential clients for the document (the confusion set), which is known in advance. When a document is loaded, the add-in builds a word frequency vector of the existing text. It returns the most likely client classification based on a Naive Bayes model, along with the words it suggests redacting so that the document resembles members of other client classes. The user can adjust a slider to change the list of highlighted terms to redact. The tool also includes simple named-entity and template-based recognizers for social security numbers and other personal identifiers, which often need to be removed when scrubbing a document for submission.

Fig. 9.4 MS Word and MS PowerPoint with document search (top) and context visualisation in a word cloud with eight closest neighboring contexts displayed (bottom)

Fig. 9.5 Document development toolkit tabs. From left to right: document search, expert search, graphics search

Fig. 9.6 Word cloud based on context detection implemented in the document development toolkit

Fig. 9.7 Screenshot of the interactive redaction component integrated into the document development toolkit. The user analyzed a document about Ford and several sensitive terms have been highlighted for inspection
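The redaction idea can be sketched with a multinomial Naive Bayes classifier over client classes. Words are ranked by how strongly they favour the predicted client over the strongest alternative in the confusion set, and the parameter `k` plays the role of the slider. This greedy log-likelihood-ratio ranking is a simplification chosen for illustration, not the optimization algorithm of Cumby and Ghani (2011). The training texts, the requirement of at least two clients, and the SSN pattern are assumptions.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple

# Template-based recognizer for US social security numbers (e.g. 123-45-6789).
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


class ClientNaiveBayes:
    """Multinomial Naive Bayes over client classes, with add-one smoothing."""

    def __init__(self, docs_by_client: Dict[str, List[str]]):
        self.counts = {c: Counter(w for d in docs for w in d.lower().split())
                       for c, docs in docs_by_client.items()}
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(cnt.values()) for c, cnt in self.counts.items()}
        n_docs = sum(len(docs) for docs in docs_by_client.values())
        self.log_prior = {c: math.log(len(docs) / n_docs) for c, docs in docs_by_client.items()}

    def log_likelihood(self, word: str, client: str) -> float:
        return math.log((self.counts[client][word] + 1) / (self.totals[client] + len(self.vocab)))

    def predict(self, words: List[str]) -> str:
        scores = {c: self.log_prior[c] + sum(self.log_likelihood(w, c)
                                             for w in words if w in self.vocab)
                  for c in self.counts}
        return max(scores, key=scores.get)


def redaction_candidates(nb: ClientNaiveBayes, text: str,
                         k: int) -> Tuple[str, List[str], List[str]]:
    """Predicted client, top-k words favouring it over the confusion set, and SSNs found."""
    words = text.lower().split()
    client = nb.predict(words)
    freq = Counter(w for w in words if w in nb.vocab)

    def evidence(w: str) -> float:
        strongest_other = max(nb.log_likelihood(w, c) for c in nb.counts if c != client)
        return freq[w] * (nb.log_likelihood(w, client) - strongest_other)

    ranked = sorted((w for w in freq if evidence(w) > 0), key=evidence, reverse=True)
    return client, ranked[:k], SSN.findall(text)
```

Words such as "engine" or "plant" that are equally likely under every client carry no identifying evidence and are not suggested. Removing the top-ranked words pushes the classifier towards the other clients in the confusion set.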

9.3.2 Collaborative Document Development Workspace

The Collaborative Document Development Workspace is designed to help large project teams work collaboratively during the rapid development of documents. It is built on Semantic MediaWiki (SMW) and allows each team member to continuously contribute their expertise to the document development process. The prototype automatically analyses a document to identify a possible outline for the response document. It then automatically populates this outline with relevant content retrieved from the enterprise-wide knowledge base. Using the SMW collaborative environment, the project manager can quickly organize tasks and meetings with the appropriate experts and keep track of deadlines. We have developed the following capabilities in the Document Development Workspace:

• Dynamically import data that is relevant to the current document (information about people, credentials, products, similar documents) into the new workspace

• Allow task allocation, tracking, and management for the project team

• Enable a project manager to use a visual process editor to set up or import a process that team members need to follow

• Enable connection between the knowledge stored in the workspace and the tools consultants use to develop the document (typically MS Word and PowerPoint), allowing them to easily access the knowledge and facts stored in the workspace

• Automatic configuration of a new project development workspace, with templates for adding project-specific information from enterprise data sources

• Customization for document development, with templates and forms for adding a project workspace, adding sections, adding and organizing meetings, and adding and organizing tasks, as well as calendar and timeline views, task lists, etc.

• Dynamic data population from live enterprise sources. These data and properties are queried via the inline query language provided by SMW, and the results are rendered into different visualizations (e.g. searching for team members with a certain skill proficiency in order to include them in the team)

• Import capabilities from SMW to MS Word for facts, sections, offerings, etc. through the Office Smart Tag Wiki extension

• A process visualization and editing tool for the SMW, allowing project managers to represent and visualize the formal processes their team members should follow


9.4 Evaluation

We conducted a series of experiments and user tests for the prototypes described in

this chapter. In this section, we focus on describing the results of our evaluation of

the enterprise search tool SABLE. The final release version was deployed in

Accenture and made available to all of the enterprise users. We had over 2,000

users in the first 6 weeks of this release. The results we describe below come from

(1) analyzing usage logs of the SABLE application, (2) a survey that was answered

by 104 users, and (3) a task-based validation session where a group of 20 con-

sultants were asked to perform a series of tasks and respond to questions about

SABLE.

9.4.1 Web-Based SABLE Survey

As the first step of the validation activity we conducted a web-based survey that was

taken by 104 users.

A summary of the results is given below and shown in Fig. 9.8. See Sect. 9.4.2

for an explanation of the questions:

• 89% of respondents agree that they would like to use SABLE frequently and

82% agree that it is easy to use

• Compared to standard enterprise search tools, 86% find SABLE easier to use,

67% believe it gives better results and 71% believe it gives them relevant search

results faster

• Regarding the Graphic Search Feature, 84% of the users believe that they find

the feature useful for finding relevant information, 80% believe it enables them

to get relevant information faster and 76% agree it gives them better results

Fig. 9.8 SABLE survey taken by 104 users. See Sect. 9.4.2 for an explanation of the questions


• Regarding the personalized clusters, 62% of the users believe the feature is useful for finding relevant information, 54% reported they believe the feature enables faster (more efficient) search, and 55% believed that the feature gives them better search results

• Regarding the Visual Filters feature, 66% of the users believe the feature is useful for finding relevant information, 58% that it enables faster (more efficient) search, and 57% say that it enables better search results

• Regarding LiveNetLife as a Collaboration feature, 71% of the users believe it is useful for connecting them with other users looking for similar information, while 55% agree that the feature might help them get relevant information through other users

9.4.2 Task-Based Validation for SABLE

In addition to the large-scale survey, we also conducted a more focused task-based test with 15 “power” users. Based on the requirements gathering process and interviews with potential users, we designed five information needs that typical consultants in Accenture would have. We then asked these 15 users to attempt to fulfill the information needs using SABLE. For each task, the survey asked the users about the overall utility of SABLE for that task compared to the standard enterprise search tools.

The tasks we asked the users to perform were:

Task 1: Find an architecture diagram or solution overview for a content management system

Task 2: Find a credential for Biometrics that shows innovative ideas

Task 3: Find training material on the use of social media for internal employees

Task 4: Find vendors or alliance partners that work on BI

For each task, we asked the users whether SABLE helped them find information faster (Speed) and whether it helped them find better information (Quality). After they finished the tasks, we asked them 17 questions. The first two (Q1 and Q2) asked whether they would like to use SABLE frequently and whether they found it easy to use. The next three (Q3, Q4 and Q5) asked them to compare SABLE with the standard enterprise search engine in Accenture in terms of ease of use, information quality, and speed of getting information. The remaining 12 questions focused on specific new features of SABLE (Graphics Search, Personalized Clusters, Search Collaboration, and Visual Filters) and asked users whether they found each feature useful for finding relevant information, whether it got them relevant information faster, and whether it enabled better search.

Some results are listed here and shown in Fig. 9.9:

Q4: 100% of users agree that SABLE gives them better search results than the standard Accenture enterprise search engine

Q5: 83% agree that it makes the search process faster and saves them valuable time


Graphic search (Q6, Q7, Q8): 100% believe it is useful for finding relevant information, 82% agree that it gives them relevant information faster, and 72% agree it gives them better search results

Personalized clusters (Q9, Q10, Q11): 67% believe the feature is useful for

finding relevant information, enables faster (more efficient) search, and enables

better search results

LiveNetLife as a tool for search collaboration (Q12, Q13, Q14): 67% believe it’s

useful for finding relevant information

Visual filters (Q15, Q16, Q17): 80% of users believe the feature is useful for finding relevant information, and 80% that it enables faster (more efficient) search

9.4.3 Usage Log Analysis for SABLE

We collected extensive usage data on SABLE. We describe our analysis of that

usage data in this section. The data we report on was collected between January 11,

2011 (the full release date) and February 23, 2011. In that period, 2,064 unique

users used SABLE. Some high-level statistics are shown in Figs. 9.10–9.12.

We use the usage logs to measure the impact of using context mining and detection to re-rank search results. We track each use of the “personalized context” feature and calculate the change in rank of every document that was viewed (clicked) by the user. For example, if a user moves the “red ball” to a different context and then views the document at rank 4 that was previously at rank 14 (with standard search), the change in rank is 10. About 66% of the document views did not result in a change in rank. Figure 9.13 shows the “Change in Rank” distribution for the non-zero values. The average change in rank (excluding views with no change) is 14.5 (out of the top 100 results shown in the search). In other words, when results are re-ranked using context mining and detection, the documents viewed are ranked on average 14.5 places (14.5%) higher than they were in the initial search results. This shows that the re-ranking approach improves the search experience: it gives users a faster way to get to relevant results and also reduces the likelihood of missing relevant documents that appear lower down in the search results.

Fig. 9.9 Results of task-based evaluation of ACTIVated SABLE (responses from Strongly Disagree to Strongly Agree for the Speed and Quality ratings of Tasks 1–5 and for questions Q1–Q17)
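The change-in-rank statistic can be illustrated with a short Python sketch. It assumes that each logged document view has already been reduced to a pair (rank in the standard search, rank after re-ranking). The function names and this log format are hypothetical and do not represent the SABLE implementation. A positive value means the viewed document moved up, as in the rank 14 to rank 4 example.

```python
from statistics import mean


def rank_changes(views):
    """Signed change in rank for each viewed document.

    `views` holds (standard_rank, reranked_rank) pairs with 1-based ranks;
    a positive change means the document moved up after re-ranking.
    """
    return [standard - reranked for standard, reranked in views]


def summarize_rank_changes(views, top_n=100):
    """Return (share of views with no change, mean non-zero change,
    mean non-zero change as a fraction of the top_n results shown)."""
    changes = rank_changes(views)
    nonzero = [c for c in changes if c != 0]
    share_unchanged = 1 - len(nonzero) / len(changes)
    avg_change = float(mean(nonzero)) if nonzero else 0.0
    return share_unchanged, avg_change, avg_change / top_n


# Hypothetical log; the first view is the "rank 14 -> rank 4" example.
views = [(14, 4), (3, 3), (20, 2), (5, 5), (7, 7), (9, 1)]
print(summarize_rank_changes(views))  # (0.5, 12.0, 0.12)
```

As in the analysis above, views with no change are counted separately, and only non-zero changes enter the average.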

Fig. 9.10 Visit distribution for users

Fig. 9.11 Query distribution for users

Fig. 9.12 Distribution of number of documents previewed per session (x-axis: average number of documents previewed; y-axis: number of users, log scale)

Fig. 9.13 Distribution of change in rank after using personalized context feature for the documents viewed by users. This only includes the distribution for cases where the change in rank was non-zero. About 66% of documents viewed did not have a change in rank

9.5 Summary

We have described several prototypes that were developed to tackle key challenges in enterprise search and knowledge management. These prototypes used a combination of machine learning and lightweight semantics to make generic knowledge management tools context- and task-sensitive, in order to increase the productivity of knowledge workers. We focused on two enterprise problems: (1) Enterprise search and (2) Collaborative Document Development. We described an enterprise search tool employing context mining, process mining, and visualization technologies that use a combination of bag-of-words and lightweight semantic representations, making users more productive and efficient when performing enterprise search. We also described two support tools that help improve the effectiveness and efficiency of knowledge workers who collaboratively create documents. All three tools were tested extensively at Accenture, and two have been deployed and made available to over 150,000 Accenture employees. Our evaluation results show that these prototypes make consultants at Accenture more efficient and help them find better information, while being easier to use than existing knowledge management tools.

Acknowledgements The research leading to these results has received funding from the European

Union’s Seventh Framework Programme (FP7/2007-2013) under grant agreement IST-2007-

215040.

References

Cumby C, Ghani R (2011) A machine learning based system for semi-automatically redacting

documents. Proceedings of the 23rd innovative applications of artificial intelligence confer-

ence, IAAI 2011, San Francisco, CA

Cumby C, Probst K, Ghani R (2009) Retrieval and ranking of entities for enterprise knowledge

management tasks. Semantic search workshop at WWW2009, Madrid, Spain

Djordjevic D, Ghani R (2010) Graphics classification for enterprise knowledge management.

ICDM workshops 2010, Sydney, Australia, pp 562–569

Jansen BJ, Spink A (2006) How are we searching the world wide web? A comparison of nine

search engine transaction logs. Inf Process Manage, 42(1). Formal methods for information

retrieval, Jan 2006, pp 248–263

Joachims T, Granka L, Pan B, Hembrooke H, Radlinski F, Gay G (2007) Evaluating the accuracy

of implicit feedback from clicks and query reformulations in web search. ACM Trans Inf Syst

25(2), Article 7 (Apr 2007)

Manning CD, Raghavan P, Schütze H (2008) Introduction to information retrieval. Cambridge

University Press

Mukherjee R, Mao J (2004) Enterprise search: tough stuff. Queue 2(2) (Apr 2004)

Settles B (2009) Active learning literature survey. Computer sciences technical report 1648,

University of Wisconsin-Madison

Wang X, Zhai CX (2007) Learn from web search logs to organize search results. Proceedings of

the 30th annual international ACM SIGIR conference on research and development in infor-

mation retrieval, Amsterdam, Netherlands


10

Increasing Predictability and Sharing Tacit Knowledge in Electronic Design

Vadim Ermolayev, Frank Dengler, Carolina Fortuna, Tadej Stajner, Tom Bösser, and Elke-Maria Melchior

10.1 Introduction

In knowledge-intensive sectors of industry, knowledge workers (Drucker 1969) are central to an organisation’s success – yet the tools they must use often stand in the way of optimising their productivity. Remedies for the shortcomings of current knowledge worker tools have recently come into strong demand across industries. In the ACTIVE project, for example, the three case studies in consulting, telecommunications and engineering design have been driven by this requirement. Knowledge workers acting alone, but more importantly in teams that may be distributed geographically and organizationally, are a particular concern and focus of our research themes. One of these themes is support for the acquisition, articulation and sharing of informal process knowledge.

Central to this theme is the notion of an informal (or knowledge) process. The definition by Warren et al. (2009) lays the groundwork for its specific features: “Informal processes are carried out by knowledge workers with their skills, experience and knowledge, often to perform difficult tasks which require complex, informal decisions among multiple possible strategies to fulfil specific goals. In contrast to business processes which are formal, standardized, and repeatable, knowledge processes are often not even written down, let alone defined formally, vary from person to person to achieve the same objective, and are often not repeatable. Knowledge workers create informal processes on the fly in many situations of their daily work”.

V. Ermolayev (*)

Zaporozhye National University, 66 Zhukovskogo st., 69600 Zaporozhye, Ukraine

F. Dengler

Karlsruhe Institute of Technology, B. 11.40 KIT-Campus Süd, D-76128 Karlsruhe, Germany

C. Fortuna • T. Stajner

Jozef Stefan Institute, Jamova 39, SI-1000 Ljubljana, Slovenia

T. Bösser • E.-M. Melchior

kea-pro, Tal, Spiringen CH-6464 Switzerland

ACTIVE has adopted a service-oriented and component-based approach to its architecture. Services and components are defined at a number of levels (Warren et al. 2009). At the bottom level are infrastructure services. At the level above this, machine intelligence technology is used. For example, the process mining service learns repeated sequences of action executions that constitute running processes, and populates the knowledgebase with these. Finally, at the top level are the applications. One of the case study applications is the management of design project (DP) knowledge in microelectronic and integrated circuit (MIC) engineering design. This case study is led by Cadence Design Systems GmbH (www.cadence-europe.com), an engineering design services provider in this domain. The case study application goes beyond existing performance management solutions by providing two kinds of functionality: (1) at the back-end, learning design process execution knowledge from distributed datasets of acquired knowledge; and (2) at the front-end, articulating and sharing design project knowledge by providing a lightweight collaboration platform.

The remainder of the chapter is structured as follows. Section 10.2 surveys related work in informal process representation, mining and extraction, articulation and sharing. It also outlines the unsolved problems that are addressed further in our work. Section 10.3 briefly presents the ACTIVE approach to informal process acquisition, articulation and sharing, which helps knowledge workers in engineering design navigate their projects. Section 10.4 elaborates on the architecture and implementation of our fully functional software prototype. Section 10.5 presents the plan for, and the results of, the validation of the implemented software in an industrial setting. Section 10.6 discusses the results and draws some conclusions.

10.2 Related Work

The research and development work presented in this chapter contributes to several interrelated aspects of managing and productively using informal process knowledge. Our contributions, implemented and integrated in the software prototype (Sect. 10.4), comprise: the representation of informal process knowledge in the form of ontologies; methods for mining and extracting informal processes from process logs; and methods for articulating and sharing informal process knowledge using a visualization and superimposition approach. This section analyses how our results are positioned relative to other work in these directions.

190 V. Ermolayev et al.

10.2.1 Process Knowledge Representation

The mainstream in process modeling is represented by enterprise and business process representations – in the form of ontologies or languages. Among the ontologies, the following results should be mentioned: the Enterprise Ontology (Uschold et al. 1998), the Toronto Virtual Enterprise Ontology (TOVE) (Grüninger et al. 2000), and more recently the theoretical work by Dietz (2006) and the reference ontology for business models developed in Interop, an EU Sixth Framework Programme (FP6) Network of Excellence (Andersson et al. 2006); see also www.interop-vlab.eu. The business process modeling community has developed a variety of languages with the major objective of representing business processes as executable orchestrations of activities. The most prominent examples of such languages are PSL (Bock and Grüninger 2005), BPEL and more recently WS-BPEL (docs.oasis-open.org/wsbpel/2.0/OS/wsbpel-v2.0-OS.pdf), and BPML and more recently BPDM (www.omg.org/spec/BPDM/1.0/). A more comprehensive approach to semantic business process modeling and management has been developed in the FP6 SUPER project (Hepp and Roman 2007). A major shortcoming of these results is that they are not intended to provide a means of modeling informal processes as denoted in the definition by Warren et al. (2009).

One of the relevant approaches to modelling and representing informal processes has been developed in the FP6 Nepomuk project (Grebner et al. 2006). A shortcoming of the process representation in the Nepomuk ontologies is that its scope is limited to tasks performed on the computer desktop.

Our approach to informal process representation builds on the work in dynamic engineering design process modeling of the PSI and PRODUKTIV+ projects (Ermolayev et al. 2008). Our contribution in ACTIVE lies in the development of a lightweight knowledge process representation for engineering design. This representation is essentially a micro-ontology that provides a simplified yet sufficiently expressive view of a design process, which can be visualized for articulation and sharing. This micro-ontology is aligned with the ACTIVE Knowledge Process Model (Tilly 2010) through the PSI Upper-Level Ontology, which is used as a semantic bridge (Ermolayev et al. 2008).

10.2.2 Informal Process Knowledge Mining and Extraction

Process mining is a set of data mining techniques focused on constructing process models out of a large number of events. The purpose of these techniques is to discover process, control, data, organizational and social structures from event logs. The practical usefulness of process mining in our setting is twofold. Firstly, it allows a process model to be inferred when no such model exists in explicit form. Secondly, it allows alternatives to the primary model to be devised. These alternatives enable different possible interpretations to be compared with regard to complexity, and they show the extent to which the primary model is being followed. Knowing the differences between the actual process model and the mined process model is crucial when optimizing the process. In the case of microelectronic design, the most interesting part of the structure is the process itself. Although process mining in general also considers organizational and social structures (van der Aalst and Song 2004), the design process for a particular design artifact is focused mainly around a single knowledge worker and his design decisions. The knowledge process that we are trying to uncover is a product of a designer’s experience and intuition and is rarely documented explicitly. This makes it valuable for ensuring productivity, but at the same time difficult to capture manually.

There exist several different approaches to process mining, each producing models of varying expressivity. The selection of an appropriate model is influenced by the properties of the event log and the expressivity required of the process model. However, all process models are based on the notion of states: at any point in execution the process resides in some state. Multiple-actor models permit several simultaneous states, although in the electronics design domain a single process instance is usually executed by a single designer. Another point that differentiates classes of process models is the semantics of the transition between states, which affects the expressivity of the model.

In informal knowledge processes the states are often not well defined, and this has to be resolved before the process mining problem can be tackled. One approach, which we incorporated into knowledge process mining software from other application domains, was to cluster event logs and use the clusters as proxies for states (Stajner et al. 2010). The complexity of the process model is then controlled via the desired number of clusters.
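The following minimal sketch illustrates how clusters of logged events can act as proxies for process states. It assumes that each event has already been turned into a numeric feature vector. It uses plain k-means with a deterministic farthest-first initialization, which is not necessarily the clustering method used by Stajner et al. (2010), and all names are hypothetical. The parameter `k` plays the role of the desired number of clusters that controls model complexity.

```python
import math
from itertools import groupby


def farthest_first(points, k):
    """Deterministic seeding: start from the first point, then repeatedly
    add the point farthest from the centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def kmeans(points, k, iters=20):
    """Plain k-means; returns one cluster label per point."""
    centroids = farthest_first(points, k)
    labels = []
    for _ in range(iters):
        # Assignment step: nearest centroid for every event vector.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


def events_to_states(event_vectors, k):
    """Label each event with its cluster (proxy state) and collapse runs of
    consecutive events in the same cluster into a single state visit."""
    labels = kmeans(event_vectors, k)
    visits = [state for state, _ in groupby(labels)]
    return labels, visits


# Hypothetical 2-D feature vectors for five logged events.
labels, visits = events_to_states([(0, 0), (0, 1), (10, 10), (10, 9), (0, 0.5)], k=2)
print(labels, visits)  # [0, 0, 1, 1, 0] [0, 1, 0]
```

Raising `k` yields more, finer-grained states and therefore a more complex process model over the collapsed state sequence.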

In terms of transition modeling, the most straightforward approaches adopt the Markovian assumption: each transition to a new state depends only on the previous state. Usually the transition probabilities are statistical estimates of the conditional probability of one state directly following another (Hingston 2002). We have explored this approach in related knowledge worker scenarios and found that simple Markovian models work well for very fine-grained, low-level events (Stajner and Mladenic 2010). A side effect of using such models is that they tend to have many states and transitions, which makes them difficult to interpret. Because of this, we often resort to de-noising the model by pruning the transitions that we consider to carry little information. For this purpose Probabilistic Deterministic Finite Automata are often used, for which statistically well-founded techniques for determining significant transitions are available (Jacquemont et al. 2009).
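The Markovian estimate and the pruning step can be sketched as follows. Here P(b | a) is the number of direct a→b successions divided by the number of successions leaving a. Pruning uses a fixed probability threshold, which is a deliberately simple stand-in and not the statistical significance tests used with Probabilistic Deterministic Finite Automata. The event names are hypothetical.

```python
from collections import Counter, defaultdict


def transition_probs(sequences):
    """Maximum-likelihood estimate of P(next state = b | current state = a)
    from direct successions in each state sequence (Markovian assumption)."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    probs = {}
    for a, successors in counts.items():
        total = sum(successors.values())
        probs[a] = {b: n / total for b, n in successors.items()}
    return probs


def prune(probs, min_prob):
    """Drop low-probability transitions; a fixed threshold is only a simple
    stand-in for statistically founded significance tests."""
    return {a: {b: p for b, p in row.items() if p >= min_prob}
            for a, row in probs.items()}


# Hypothetical state sequences from two design sessions.
logs = [["edit", "compile", "simulate"],
        ["edit", "compile", "edit", "compile", "simulate"]]
p = transition_probs(logs)
print(p["compile"])
print(prune(p, 0.5)["compile"])
```

In this toy log, "compile" is followed by "simulate" with probability 2/3 and by "edit" with probability 1/3. A threshold of 0.5 keeps only the first transition.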

In environments where minute variations in activity order are not critical, this can be further relaxed to the conditional probability of one state following another within a time window. Such a relaxation results in a slight decrease in expressiveness, since a transition then only means that a particular event has occurred within a given time window before some other event. However, this trade-off avoids excessive sparsity, especially when we are constrained by having many distinct activities in a relatively short event log.
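The time-window relaxation can be sketched as follows. It estimates the probability that state b occurs within `window` time units after an occurrence of a, counting each occurrence of a at most once per following state. The timestamps, state names and function name are hypothetical. This is one straightforward reading of the relaxation, not the ACTIVE implementation.

```python
from collections import Counter


def windowed_follow_probs(events, window):
    """Estimate P(b occurs within `window` time units after a | a occurred).

    `events` is a list of (timestamp, state) pairs. Each occurrence of a
    counts at most once towards every state b seen in its window.
    """
    events = sorted(events)
    occurrences = Counter(state for _, state in events)
    followed = Counter()
    for i, (t, a) in enumerate(events):
        seen = set()
        for t_next, b in events[i + 1:]:
            if t_next - t > window:
                break  # events are sorted, so nothing later is in the window
            seen.add(b)
        for b in seen:
            followed[(a, b)] += 1
    return {(a, b): n / occurrences[a] for (a, b), n in followed.items()}


# Hypothetical timestamped log (minutes).
log = [(0, "edit"), (1, "save"), (2, "simulate"), (10, "edit"), (11, "simulate")]
probs = windowed_follow_probs(log, window=3)
print(probs[("edit", "simulate")])  # 1.0
```

Under the strict Markovian estimate, "edit" is directly followed by "simulate" only half of the time, because a "save" intervenes once. The windowed estimate treats that intervening event as irrelevant and reports 1.0.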


When higher expressiveness in control structures is required, we can consider using the family of process mining techniques based on Petri nets implemented in the ProM framework (van Dongen et al. 2005). This approach allows patterns beyond the Markovian assumption to be modeled, including logical structures such as conjunctions, disjunctions, splits and joins. Although these techniques do not operate probabilistically, an important benefit of Petri net-based approaches is that the models can also be transformed into extended Event-driven Process Chain (eEPC) diagrams, which are more familiar to process analysts and more amenable to comparison with formal process models.
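The following sketch makes the contrast with Markovian models concrete. It implements the basic firing rule of a place/transition Petri net and uses it for a parallel split and join (a conjunction). It is a minimal illustration only, not ProM or its API, and the activity and place names are hypothetical.

```python
from collections import Counter


class PetriNet:
    """Place/transition net: firing a transition consumes one token per
    input arc and produces one token per output arc."""

    def __init__(self, transitions):
        # transitions: name -> (list of input places, list of output places)
        self.transitions = transitions

    def enabled(self, marking, name):
        inputs, _ = self.transitions[name]
        return all(marking.get(place, 0) >= n
                   for place, n in Counter(inputs).items())

    def fire(self, marking, name):
        if not self.enabled(marking, name):
            raise ValueError(f"transition {name!r} is not enabled")
        inputs, outputs = self.transitions[name]
        new = Counter(marking)
        new.subtract(inputs)
        new.update(outputs)
        return +new  # keep only places that still hold tokens


# Hypothetical design flow: after "split", simulation and a layout check run
# in parallel (AND-split); "signoff" needs both to finish (AND-join).
net = PetriNet({
    "split": (["start"], ["sim_todo", "drc_todo"]),
    "simulate": (["sim_todo"], ["sim_done"]),
    "check_layout": (["drc_todo"], ["drc_done"]),
    "signoff": (["sim_done", "drc_done"], ["end"]),
})
m = net.fire(Counter({"start": 1}), "split")
m = net.fire(m, "simulate")
print(net.enabled(m, "signoff"))  # False: the layout check is still pending
m = net.fire(m, "check_layout")
m = net.fire(m, "signoff")
print(m)  # Counter({'end': 1})
```

A first-order Markov chain over the same activities can only express "signoff often follows check_layout". The Petri net, by contrast, states that signoff requires both parallel branches to complete.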

All of the aforementioned models can be expressed with particular subsets of PSI

ontology terms. In that sense the PSI Suite of Ontologies provides the common

knowledge representation formalism for knowledge integration, fusion and

visualization.

10.2.3 Informal Process Knowledge Articulation and Sharing

The spiral of knowledge (Nonaka and Takeuchi 1995) introduces different knowledge conversions, which are a fundamental part of sharing knowledge. People can share tacit knowledge with each other (socialization), but this is a rather limited form of knowledge sharing. Knowledge articulation within companies is the process of making tacit knowledge explicit (externalization). This explicit knowledge can be combined with other explicit knowledge (combination) and shared throughout an organization. Other employees extend and reframe their tacit knowledge with explicit knowledge by internalizing it (internalization). There are different ways to articulate and share informal process knowledge, but in all cases the informal process knowledge has to be made explicit. For instance, process knowledge can be visualized manually or with tool support. As an alternative to visualization, process knowledge can also be stored within the system and shared by using it directly for recommendations (Dorn et al. 2010).

10.2.4 Articulation and Sharing using Visualization Approach

It is natural for humans to use visual representations of artifacts in general and of processes in particular. Research in psychology, human memory models, image recognition and perception reveals that graphical representations are comprehended much more easily and with less effort than equivalent textual ones (Crapo et al. 2000). Process visualization is therefore one of the mature instruments for articulating processes, enabling users to easily understand the logic of a process.

Most process visualization techniques are part of process modeling activities, which can be centralized or decentralized. An abundance of modeling methods and tools, such as ARIS (Scheer and Jost 2002) and IDEF3 (Mayer et al. 1995), have been developed to ease the standardization, storage, and sharing of process visualizations. Unfortunately, these tools are not sufficient for modeling collaborative, decentralized processes. Therefore other approaches, such as CPM (Ryu and Yücesan 2007), have been introduced.

In the area of knowledge processes, additional methods and tools such as KMDL (Gronau et al. 2005), PROMOTE (Woitsch and Karagiannis 2005) and CommonKADS (Schreiber et al. 1999) have been developed, extending the methods and tools mentioned above. In addition, semantic wikis combine the collaborative aspects of wikis (Leuf and Cunningham 2001) with Semantic Web technology to enable large-scale and interdepartmental collaboration on knowledge structures. These features of semantic wikis have been extended to support process development (Dengler et al. 2009), enterprise modelling (Ghidini et al. 2009) and workflows (Dello et al. 2008).

Our contribution in process visualization is the enhancement of the existing

Semantic MediaWiki (SMW) process development approach (Dengler et al. 2009)

to visualize and discuss informal processes.

Our project navigation approach is based on offering knowledge workers a collaboration platform that facilitates socializing, externalizing and internalizing design project knowledge using visualization. Visualization of project knowledge helps to combine and internalize explicit project knowledge in a way that suggests productive continuations of project execution.

10.3 The Approach to Project Knowledge Navigation

The goal of the presented case study is to provide a software tool for design project managers in MIC that articulates knowledge about good development practices in this domain and facilitates its sharing.

An objective of a project manager as a knowledge worker is to find a reasonable balance between the available and the achievable, in order to meet the requirements of a customer and accomplish development in his project with the highest possible productivity. The complexity of this task in modern design environments is beyond the analytical capabilities of even an experienced individual. A manager has to find an optimum in a solution space that has many facets: the product structure, including possibilities for block reuse; the composition of the development team, involving required roles and the capabilities of the available individuals; the choice of tools for performing design and the corresponding design methodologies; the resources available for the project; project constraints and business policies, etc. A further complication may arise in the course of project execution: circumstances may change because of external events. Hence a previously good plan may turn out to be unacceptable for the follow-up, and re-planning may therefore be required at any moment.

Project managers use their working experience and intuition to make planning decisions under these complex conditions. In fact, they rely on following good practices and using the suggested development methodologies that they used in the past, which constitute their tacit working knowledge of project management.

Our working hypothesis in this research was that offering a software tool for:

• Eliciting good development practices as stable working patterns from the design

project logs

• Visualizing those practices of past projects, the plan and the state of the execution of a current project

• Facilitating moderated discussions among the members of the development team

on different aspects of a project

will decrease the complexity of making decisions for a project manager and

increase the robustness of his knowledge work. Such a tool would essentially

make the tacit knowledge of project managers within a company explicit – i.e.

articulate and facilitate sharing good project management and engineering design

practices.

To test this hypothesis, a software prototype of a Design Project Visualizer was developed in the case study. The tool implements a project navigation metaphor, helping a knowledge worker find a productive execution path through the state space of an engineering design project.

It is known for informal processes in general, and for engineering design processes in particular, that the paths to a desired outcome cannot be specified in advance, before the process starts. Instead, a knowledge worker has to decide on a follow-up action by choosing among the possible continuation alternatives in an arbitrary process state, much like the decisions made by a driver on the road. Drivers use navigation systems that suggest routes for bypassing traffic jams or for choosing a faster or cheaper way. A similar approach is employed in our work to help a project manager make decisions about the continuations of his design project.

The Design Project Visualizer, like a car navigation system, provides visualized views of the basic “terrain” map. These views are product structures, methodology flows that are either generic or bound to a particular product structure, and Work Breakdown Structures (WBS). These representations are essentially provided by a project manager in a top-down fashion when he plans and kicks off the project.

The Design Project Visualizer also assists in finding out where the project is on the “terrain” at a specific point in time. The knowledge about the execution of the project is mined from the available project log datasets, transformed into the terms of the ontology used, stored in the knowledgebase, and superimposed onto the project execution plans.

Unlike a car navigation system, the Design Project Visualizer is a tool for teamwork. It provides the infrastructure and functionality for moderated discussions attached to a visualized representation of any kind of project constituent. In this way it facilitates more informed decisions that are also more transparent to the team members and are elaborated and approved with their active participation.

To construct the necessary building blocks of the project maps and execution tracks, we first looked at the tasks of project managers in their everyday work and extracted the typical project planning and execution management tasks that may be effectively facilitated. Those typical user tasks (Fig. 10.1) are:

• Analyze the requirements and develop the structure of the product

• Choose development methodology and compose the team

• Develop the work breakdown structure

• Monitor the execution of the project

After extracting the typical tasks, we determined the requirements for the functionality of the software tool. The requirements were elaborated by examining the working practices of a project manager in MIC design and extracting the use case scenarios.

10.4 Prototype Architecture and Implementation

The architecture of the fully functional prototype of the ACTIVE Design Project Visualizer is pictured in Fig. 10.1. It comprises both back-end and front-end components and involves several ACTIVE technologies. Design process knowledge acquisition is done mainly at the back-end, while the functions of knowledge articulation and sharing are offered by the front-end. As shown in Fig. 10.1, the prototype helps users to perform their typical design project management tasks. Therefore it could be classified as a project management tool. The tool monitors the environments of the managed engineering design project, that is, the design systems, and allows for run-time extraction of process-related knowledge in a bottom-up fashion. The normative, methodological and static parts of project knowledge are provided via external tools in a top-down manner. These external tools are the ProjectNavigator and the Cadence Flow Infrastructure (CFI) Framework (cf. Fig. 10.1). Hence, the prototype exploits the superimposition of top-down and bottom-up project knowledge to make its articulation and sharing more efficient and effective.

Fig. 10.1 The configuration of ACTIVE and Cadence technology components for design project knowledge acquisition, articulation and sharing

Acquisition is done by incrementally collecting new knowledge about the executions of design processes: the design processes are monitored in their environments, and the dataset containing the design process execution logs is mined using the ACTIVE Process Mining component, which is based on the probabilistic temporal process model (TNT) (Grobelnik et al. 2009). This approach to process mining is based on the generation of Hidden Markov Models (Rabiner 1990). As outlined in Fig. 10.1, the ACTIVated Design System Framework tools monitor the design environments and the design processes and collect data about low-level events in the respective datasets. These datasets are then fed to the Process Mining Service of the ACTIVE Knowledge WorkSpace (AKWS) Server, which produces instances of the segments of the executed design processes in terms of the PSI Suite of Ontologies. These instances are then stored in the Cadence Knowledgebase.
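The chapter does not spell out how the Hidden Markov Models are used inside the mining component. As a minimal illustration of the underlying technique only, the Python sketch below implements the standard forward algorithm (Rabiner 1990), which computes how likely an observed sequence of logged low-level events is under a given HMM. The two hidden states, the event names and all probabilities are hypothetical; they are not parameters of the ACTIVE Process Mining component or of TNT.

```python
from typing import Dict, List

# Hypothetical two-state HMM: hidden design phases emit observable log events.
STATES = ["synthesis", "verification"]
START_P = {"synthesis": 0.8, "verification": 0.2}
TRANS_P = {
    "synthesis": {"synthesis": 0.6, "verification": 0.4},
    "verification": {"synthesis": 0.3, "verification": 0.7},
}
EMIT_P = {
    "synthesis": {"run_tool": 0.9, "check_errors": 0.1},
    "verification": {"run_tool": 0.2, "check_errors": 0.8},
}


def forward_likelihood(
    observations: List[str],
    states: List[str],
    start_p: Dict[str, float],
    trans_p: Dict[str, Dict[str, float]],
    emit_p: Dict[str, Dict[str, float]],
) -> float:
    """Return P(observations | model) computed with the forward algorithm."""
    if not observations:
        return 1.0  # the empty sequence is certain under any model
    # alpha[s] = P(o_1..o_t, hidden state at time t = s)
    alpha = {s: start_p[s] * emit_p[s].get(observations[0], 0.0) for s in states}
    for obs in observations[1:]:
        alpha = {
            s: sum(alpha[r] * trans_p[r][s] for r in states) * emit_p[s].get(obs, 0.0)
            for s in states
        }
    return sum(alpha.values())


if __name__ == "__main__":
    seq = ["run_tool", "check_errors"]
    print(forward_likelihood(seq, STATES, START_P, TRANS_P, EMIT_P))
```

Because every row of the transition and emission tables sums to one, the likelihoods of all event sequences of a given length sum to one; comparing such likelihoods is one way to judge how well a model explains new logs.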

Articulation and sharing are done by visualizing different facets of DP knowledge in the collaborative front-end, the ACTIVE DP Visualizer, which uses SMW (Krötzsch et al. 2007) as its platform. The visualization functionality is structured around the typical tasks that DP managers perform in their everyday business (upper part of Fig. 10.1). There are visualization pages for: product structures; generic methodologies; product-bound methodologies1; tools; and actor roles. These primary functionalities are supported by decision-making instruments for conducting moderated discussions: the discussion component, an extension of SMW, and LiveNetLife (www.livenetlife.com), an application for contextualized, interactive, real-time communication between the users of a web site.

1 A product-bound methodology is a superimposition of the segments of the generic methodologies appropriate for the particular types of functional blocks in the structure of the product to be designed.

10.4.1 Knowledge Representation

A challenge in developing the knowledge representation for the case study and the software prototype of the Design Project Navigator was finding a proper balance between:

• The PSI Suite of Ontologies for the MIC Engineering Design domain used at Cadence (a background result) and the model of a knowledge process (KPM) developed in ACTIVE (Tilly 2010)

• The expressive power of the knowledge representation of the Cadence knowledge base (PSI Ontologies) and the lightweight enterprise knowledge structures developed in ACTIVE, whose lightweight character is dictated by the SMW platform used for the prototype development

The first aspect was essentially a harmonization problem. To harmonize the KPM with the PSI Suite of Ontologies, the PSI Upper Level ontology (PSI-ULO) was used as a semantic bridge (Ermolayev et al. 2010b). Online documentation of this suite of ontologies is available at isrg.kit.znu.edu.ua/ontodocwiki/.

The harmonization process was bidirectional. On the one hand, the PSI Suite of Ontologies was refined by cleaning up the representations of process patterns and processes; this work led to finalizing the v.2.3 release of the PSI Suite. On the other hand, the KPM was revised by aligning it to the PSI-ULO, which led to the final release of the KPM (Tilly 2010).

The second problem was selecting the minimal required part of the PSI Core Ontologies v.2.3 to serve as the lightweight background knowledge representation for the Design Project Visualizer. To this end, requirements derived from the analysis of the typical user tasks and use cases were applied, which resulted in the micro-ontology for the case study.

10.4.2 The Back-End: Process Mining and Extraction

The implemented workflow is as follows. First, the designer’s workstation is instrumented with logging tools that capture the designer’s activities and their outcomes and measure the time required to complete those activities. Once the data is exported, the process mining software loads the sequence logs and constructs a process model based on probabilistic deterministic finite automata.
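The construction step can be sketched as follows, assuming (hypothetically) that each exported log is a list of activity names and that a state of the automaton is simply the last activity executed. Transition probabilities are then estimated from direct-succession counts. This first-order construction is a simplification chosen for illustration; the actual mining software may build its automaton differently.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List

START, END = "<start>", "<end>"  # hypothetical sentinel states added to every trace


def count_transitions(logs: Iterable[List[str]]) -> Dict[str, Counter]:
    """Count how often each activity is directly followed by another."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for trace in logs:
        path = [START, *trace, END]
        for src, dst in zip(path, path[1:]):
            counts[src][dst] += 1
    return dict(counts)


def build_pdfa(logs: Iterable[List[str]]) -> Dict[str, Dict[str, float]]:
    """Return {state: {next state: empirical transition probability}}.

    A state is named after the last executed activity, so reading the next
    activity leads to exactly one successor state: the automaton is deterministic.
    """
    model = {}
    for src, outgoing in count_transitions(logs).items():
        total = sum(outgoing.values())
        model[src] = {dst: n / total for dst, n in outgoing.items()}
    return model


# Hypothetical execution logs (activity names invented for illustration).
LOGS = [
    ["synth", "place", "route", "verify"],
    ["synth", "place", "verify"],
    ["synth", "route", "verify"],
    ["synth", "place", "route", "verify"],
]

if __name__ == "__main__":
    for state, successors in build_pdfa(LOGS).items():
        print(state, successors)
```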

The fitness of a transition between two generic tasks for inclusion in the process is evaluated as follows. Given an error rate and a sample size, we use a statistical sequence mining technique, presented in Jacquemont et al. (2009), to determine constraints for the inclusion of individual transitions in the process model. We apply a criterion called the proportion constraint. Given a desired risk factor and an actual process execution log, we compute the empirical probability of every possible transition from one state to another. This serves as the basis for deciding whether each transition in the process is statistically significant given its observed grounding in the process execution log. A benefit of this statistical approach is that the only parameter the process analyst needs to specify is the risk factor, which corresponds to the expected false positive rate. This parameter is easier to understand and specify than an arbitrary probability threshold. We have found that pruning the model in this way does not affect its predictive power too adversely, while significantly reducing its complexity.
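The exact proportion constraint is defined in Jacquemont et al. (2009) and is not reproduced here. The sketch below swaps in a simpler, related test as an explicit assumption: a transition is kept only if the one-sided Wald lower confidence bound on its empirical probability, at risk factor alpha, is above zero. With n observations leaving a state and z the (1 − alpha) normal quantile, this amounts to keeping transitions with p̂ > z²/(n + z²), so rarely observed transitions in small samples are pruned without any hand-picked probability threshold.

```python
import math
from collections import Counter
from statistics import NormalDist
from typing import Dict


def prune_transitions(
    counts: Dict[str, Counter], alpha: float = 0.05
) -> Dict[str, Dict[str, float]]:
    """Keep transitions whose empirical probability is significantly above zero.

    Assumed stand-in for the proportion constraint: a one-sided Wald
    confidence bound at risk factor alpha. The surviving transitions of each
    state are renormalised so that they again sum to 1.
    """
    z = NormalDist().inv_cdf(1.0 - alpha)
    pruned: Dict[str, Dict[str, float]] = {}
    for src, outgoing in counts.items():
        n = sum(outgoing.values())
        kept = {}
        for dst, k in outgoing.items():
            p_hat = k / n
            lower = p_hat - z * math.sqrt(p_hat * (1.0 - p_hat) / n)
            if lower > 0.0:
                kept[dst] = k
        total = sum(kept.values())
        if total:
            pruned[src] = {dst: k / total for dst, k in kept.items()}
    return pruned


# Hypothetical transition counts observed after a "verify" step.
COUNTS = {"verify": Counter({"signoff": 95, "route": 3, "debug": 2})}

if __name__ == "__main__":
    print(prune_transitions(COUNTS, alpha=0.05))
    print(prune_transitions(COUNTS, alpha=0.01))
```

Lowering alpha makes the test stricter: with these hypothetical counts, the transition to route survives at alpha = 0.05 but is pruned at alpha = 0.01, while debug is pruned in both cases.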

Following that, the software outputs two sets of results:

• The process pattern model, expressing process state transition patterns as Generic Tasks and Generic Activities (action patterns). The model also specifies statistical dependencies between individual action patterns as possible output and input configurations.

• The actual design process instances in terms of Tasks, States and Activities, to which concrete Actor and Design Artifact Representation instances are related. An instance of the whole design process is considered a top-level Task managed by the Actor. The top-level Task comprises lower-level Tasks for each pass of the design process. The passes are further decomposed into atomic steps that are represented as leaf-level Tasks and Activities. After each of these steps is executed, the resulting Design Artifact Representation changes, reflecting the fact that each step brings the process closer to the target final Representation: the tape-out of the Chip.
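The nesting of process instances described above can be sketched with a few Python dataclasses. The class names echo the PSI terms used in the text (Task, Activity, Actor, Design Artifact Representation), but the classes, fields and example values are hypothetical simplifications, not the PSI Suite of Ontologies; States and the final tape-out are not modeled.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Representation:
    """One state of a design artifact (hypothetical simplification)."""
    artifact: str
    version: int  # incremented by every executed atomic step


@dataclass
class Activity:
    """Leaf-level step: transforms one Representation into the next."""
    name: str
    actor: str
    input_rep: Representation
    output_rep: Representation


@dataclass
class Task:
    """A top-level Task (whole process) or a lower-level Task (one pass)."""
    name: str
    actor: str
    subtasks: List["Task"] = field(default_factory=list)
    activities: List[Activity] = field(default_factory=list)


def run_step(name: str, actor: str, rep: Representation) -> Activity:
    """Executing a step always yields a new, different Representation."""
    return Activity(name, actor, rep, Representation(rep.artifact, rep.version + 1))


def leaf_activities(task: Task) -> List[Activity]:
    """Collect the atomic steps of a Task and all its sub-Tasks, depth first."""
    steps = list(task.activities)
    for sub in task.subtasks:
        steps.extend(leaf_activities(sub))
    return steps


def example_process(n_passes: int = 2) -> Task:
    """Build a hypothetical process in which each pass runs two atomic steps."""
    rep = Representation("BLOCK_A", 0)
    passes = []
    for i in range(n_passes):
        steps = []
        for step in ("synthesize", "verify"):
            activity = run_step(step, "designer_1", rep)
            steps.append(activity)
            rep = activity.output_rep
        passes.append(Task(f"pass {i + 1}", "designer_1", activities=steps))
    return Task("design of BLOCK_A", "designer_1", subtasks=passes)


if __name__ == "__main__":
    for act in leaf_activities(example_process()):
        print(act.name, act.input_rep.version, "->", act.output_rep.version)
```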

This output is the native input for the visualization and sharing infrastructure, provided by the back-end functionality (SMW Connector, Fig. 10.1). At the same time, it provides a common interface through which other PSI-based design project and process management tools can consume process models and instances. As far as process instances are concerned, the solution resembles to some extent the ProM framework, which defines the MXML format for expressing event logs. However, ProM does not prescribe any terms for expressing process models. We observe that using an ontology, the PSI Suite in our case, to express both process models and process execution logs is preferable in terms of integration for the purpose of informal process knowledge management.
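For comparison, the sketch below writes a simple event log in a reduced MXML-style form using Python's standard xml.etree.ElementTree module. The element names follow ProM's MXML as we understand it, but only a subset is produced, and the identifiers and events are hypothetical; check the MXML schema before relying on the output. It illustrates the point made above: such a log carries cases and events but has no place for the process model itself.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

# (activity, event type, ISO 8601 timestamp, originator)
Event = Tuple[str, str, str, str]


def to_mxml(process_id: str, cases: Dict[str, List[Event]]) -> str:
    """Serialise cases into a reduced MXML-style event log (subset only)."""
    log = ET.Element("WorkflowLog")
    process = ET.SubElement(log, "Process", id=process_id)
    for case_id, events in cases.items():
        instance = ET.SubElement(process, "ProcessInstance", id=case_id)
        for activity, event_type, timestamp, originator in events:
            entry = ET.SubElement(instance, "AuditTrailEntry")
            ET.SubElement(entry, "WorkflowModelElement").text = activity
            ET.SubElement(entry, "EventType").text = event_type
            ET.SubElement(entry, "Timestamp").text = timestamp
            ET.SubElement(entry, "Originator").text = originator
    return ET.tostring(log, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical single-case log.
    print(to_mxml("chip_design", {
        "run_1": [
            ("synth", "start", "2009-06-01T09:00:00", "designer_1"),
            ("synth", "complete", "2009-06-01T09:40:00", "designer_1"),
        ]
    }))
```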

Furthermore, the Miner tool allows more complex queries over the mined design process steps, including queries about the time spent on particular steps, iterations, etc. For instance, it shows which designers executed a high number of activities (dcart in the example in Fig. 10.2) and which artifact required these activities (AAMP in Fig. 10.2). The tabular interface allows browsing for more details in the results of a query (Fig. 10.2). Other information, such as CPU time or the time spent by designers per artifact or per step, can also be queried and visualized using the Miner interface. Checking the numbers of errors logged in the design process, we observe that the median value was zero and the average was around 2,600. However, the maximum number of logged errors was over 1.5 million per activity, and the total was almost 13 million errors. The tool allows tracking down the project that generated these errors and the designer who was executing it. On closer inspection, it can be seen that these errors occurred during one iteration of the design of NV_RAIL. Details about the most frequent activities, as well as the sequence in which they were executed within that design process, can be visualized. These visualizations may then be used for taking remedial actions in a design process.
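A query of this kind can be sketched over a list of mined activity records. The field names and values below are hypothetical and only show why a median of zero can coexist with a much larger mean when a few activity executions log very many errors; they do not reproduce the U.S. dataset.

```python
from statistics import mean, median
from typing import Dict, List, NamedTuple


class ActivityRecord(NamedTuple):
    project: str
    designer: str
    activity: str
    errors: int  # number of errors logged by this activity execution


def error_summary(records: List[ActivityRecord]) -> Dict[str, object]:
    """Median, mean, max and total of logged errors, plus the worst offender."""
    counts = [r.errors for r in records]
    worst = max(records, key=lambda r: r.errors)
    return {
        "median": median(counts),
        "mean": mean(counts),
        "max": worst.errors,
        "sum": sum(counts),
        "worst": (worst.project, worst.designer, worst.activity),
    }


# Hypothetical records: most executions log no errors, one logs very many.
RECORDS = [
    ActivityRecord("P1", "designer_a", "extract", 0),
    ActivityRecord("P1", "designer_a", "verify", 0),
    ActivityRecord("P2", "designer_b", "extract", 0),
    ActivityRecord("P2", "designer_b", "signoff", 12),
    ActivityRecord("P3", "designer_c", "signoff", 50_000),
]

if __name__ == "__main__":
    print(error_summary(RECORDS))
```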

In the example dataset the most frequent activity turns out to be Extraction_PRHO. This activity terminates normally (there was no forced manual or automatic abortion), takes on average about 28 min to complete (1,700,869 ms), and has low CPU time and a low number of errors/warnings (Fig. 10.3). However, the second most frequent activity, SI_Analysis_Signoff, generally does not terminate normally, on average shows more errors and warnings, and on average takes over 3 h to complete (Fig. 10.4). Due to the abnormal exits, the duration of this activity is often not known (i.e. the end time is not logged). The abnormal exits are thus correlated with unknown activity durations.
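Both observations can be checked mechanically. The sketch assumes that each logged run of an activity carries an exit status and start/end times in milliseconds, with a missing end time when it was not logged (this field layout is hypothetical). It cross-tabulates exit status against whether the duration is known, averages only the known durations, and converts the 1,700,869 ms quoted for Extraction_PRHO into roughly 28.3 min.

```python
from collections import Counter
from typing import List, Optional, Tuple

# (exit status, start time in ms, end time in ms or None if not logged)
Run = Tuple[str, int, Optional[int]]


def ms_to_minutes(ms: float) -> float:
    return ms / 60_000


def exit_vs_duration(runs: List[Run]) -> Counter:
    """Count runs per (exit status, end time logged?) combination."""
    return Counter((status, end is not None) for status, _start, end in runs)


def mean_known_duration_ms(runs: List[Run]) -> Optional[float]:
    """Average duration over the runs whose end time was logged."""
    durations = [end - start for _status, start, end in runs if end is not None]
    return sum(durations) / len(durations) if durations else None


# Hypothetical runs of one activity.
RUNS = [
    ("normal", 0, 1_000),
    ("normal", 0, 3_000),
    ("abnormal", 0, None),
    ("abnormal", 0, None),
    ("abnormal", 0, 5_000),
]

if __name__ == "__main__":
    print(exit_vs_duration(RUNS))
    print(mean_known_duration_ms(RUNS))
    print(round(ms_to_minutes(1_700_869), 1))  # mean duration of Extraction_PRHO
```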

Fig. 10.2 Designers, design artifacts and development activities mined from the U.S. dataset

Fig. 10.3 Details for the Extraction_PRHO activity


10.4.3 The Front-End: Process Visualization and Discussion

The front-end for process visualization and discussion of design projects, the ACTIVE DP Visualizer, is based on SMW. It has been implemented by extending the existing process visualization approach (Dengler et al. 2009) and by developing additional result printers for SMW to visualize and export the WBS, namely the Gantt chart and XML export result printers that are part of the MediaWiki Semantic Project Management extension (www.mediawiki.org/wiki/Extension:Semantic_Project_Management). Screenshots of the characteristic features of the DP Visualizer are shown in Fig. 10.5.

This extension builds on SMW’s capability to query semantic properties and displays query results as process graphs, Gantt charts or XML files containing the WBS in an XML schema that can then be imported into MS Project. For the back-end, a software connector has been developed that imports the knowledge stored in the Cadence Knowledgebase into SMW pages (Fig. 10.1). Each element of the micro-ontology is represented as a wiki page containing annotated links to other wiki pages (properties). These properties are queried via the inline query language provided by SMW, and the result is rendered into the destination format required by the different visualization libraries.
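The data flow of this paragraph (wiki pages with annotated properties, an inline query, rendering into a target format) can be sketched in Python as follows. In the prototype the query is an SMW inline query and the rendering is done by SMW result printers; here, as an assumption, pages are plain dicts, the query is a toy property filter, and the XML element names (wbs, task, start, finish) are hypothetical rather than the MS Project schema.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# A wiki page: its title plus annotated property values (hypothetical model).
Page = Dict[str, str]


def ask(pages: List[Page], **conditions: str) -> List[Page]:
    """Toy stand-in for an SMW inline query: select pages by property values."""
    return [p for p in pages if all(p.get(k) == v for k, v in conditions.items())]


def render_wbs_xml(rows: List[Page]) -> str:
    """Render query results as a hypothetical flat WBS XML (not MS Project's schema)."""
    root = ET.Element("wbs")
    for row in sorted(rows, key=lambda r: r["start"]):  # ISO dates sort correctly as text
        task = ET.SubElement(root, "task", name=row["title"])
        ET.SubElement(task, "start").text = row["start"]
        ET.SubElement(task, "finish").text = row["finish"]
    return ET.tostring(root, encoding="unicode")


PAGES = [
    {"title": "Verify block A", "category": "Task", "project": "P1",
     "start": "2010-03-08", "finish": "2010-03-12"},
    {"title": "Synthesize block A", "category": "Task", "project": "P1",
     "start": "2010-03-01", "finish": "2010-03-05"},
    {"title": "Block A", "category": "FunctionalBlock", "project": "P1",
     "start": "", "finish": ""},
]

if __name__ == "__main__":
    print(render_wbs_xml(ask(PAGES, category="Task", project="P1")))
```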

To support collaboration, functionality for working with talk pages has been developed as another SMW extension (Fig. 10.6). Talk pages corresponding to visualized project elements can be created to discuss the pros and cons of product structures, methodologies, WBS and project execution progress. This collaborative discussion functionality has been enhanced with semantics so that metadata can be added to each comment and queried. For this purpose, special wiki templates (www.mediawiki.org/wiki/Help:Templates) and semantic forms (www.mediawiki.org/wiki/Extension:Semantic_Forms) have been developed. A summary icon displays the corresponding discussion activities, including the number of pros and cons (Fig. 10.5), within the product structure, methodology and Gantt chart visualizations.
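The per-element summary behind such an icon amounts to counting annotated comments. In this sketch the annotation fields (the element a comment refers to and its stance) are hypothetical stand-ins for the metadata captured by the wiki templates and semantic forms.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

# (annotated project element, stance of the comment: "pro", "con" or "neutral")
Comment = Tuple[str, str]


def discussion_summary(comments: Iterable[Comment]) -> Dict[str, Dict[str, int]]:
    """Number of pro and con arguments per project element, for a summary icon."""
    tally: Dict[str, Counter] = defaultdict(Counter)
    for element, stance in comments:
        tally[element][stance] += 1
    return {el: {"pro": c["pro"], "con": c["con"]} for el, c in tally.items()}


# Hypothetical annotated comments on talk pages.
COMMENTS = [
    ("Methodology:Analog_flow", "pro"),
    ("Methodology:Analog_flow", "con"),
    ("Methodology:Analog_flow", "pro"),
    ("WBS:Project_P1", "con"),
    ("WBS:Project_P1", "neutral"),
]

if __name__ == "__main__":
    print(discussion_summary(COMMENTS))
```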

Fig. 10.4 Details for the SI_Analysis_Signoff activity


Fig. 10.5 Characteristic features of the Design Project Visualizer: (a) product structure visualization; (b) generic development methodology visualization; (c) work breakdown structure with superimposed execution status


10.5 Validation Setup and Results

The fully functional prototype was developed in an iterative design process that focused strongly on user needs and on the organizational requirements for the application. The chip design process is very demanding in terms of adherence to detailed technical requirements and standards; consequently, we had to focus strongly on testing detailed user and design processes. Different types of tests were carried out throughout the development process. Each software version was tested repeatedly in a sequence of tests where, as a general rule, the next level of test was carried out only after the lower-level test had returned a satisfactory result, usually after a number of iterations:

• T1. Dry Runs and Technical Appropriateness: Experts and user representatives who are familiar with the application context assess whether the software is bug-free and consistent with the requirements and suggest design improvements.

• T2. Usability in representative user tasks: User representatives and experts assess the software in representative tasks (Sect. 10.3) according to quality-of-use criteria and conformance to requirements.

• T3a. Information quality for users: Experienced users assess the quality of information generated by the application.

• T3b. User satisfaction, acceptance: Representative samples of users use the application in a realistic working context. User satisfaction and indicators for the likely acceptance of the application are measured.

Fig. 10.6 Collaborative discussion functionality for managing invitations and working with arguments

The validation results described here are part of the T2 test of the final application prototype, whose main objective is to ensure that the software is satisfactory and acceptable as a tool for the typical tasks of the prospective user population. The likely effect on productivity is also considered, but this is the main question to be answered in the subsequent T3 test.

10.5.1 Validation Plan

The validation trials were planned in two phases and are defined by combinations

of: (1) a software component; (2) a representative user task used as a frame for the

validation; (3) an external tool used as a benchmark for the assessment; (4) a source

dataset. The summary of the validation plan is given in Table 10.1.

Generic validation workflows have been developed for all kinds of validation trials (see the example in Fig. 10.7). These workflows, though identical in nature and goals, differ in their validation metrics, instruments (the kinds of questionnaires to be filled in) and collaboration patterns: some are for individual execution, while others are for a group of collaborating trialists.

Within the validation phases, a set of validation tasks has been specified based on the typical user tasks, and the generic workflows have been instantiated for each validation task. For example, the task of validating usability (T2, Fig. 10.7) has been decomposed into four lower-level tasks, as pictured in Fig. 10.8. In turn, each of these lower-level tasks has been performed using a different instantiation of the generic validation workflow for T2, developed according to the validation scenario of that particular lower-level task. One example is given in Fig. 10.9.

The summary of the front-end component validation plan at phase 1 is given in

Table 10.2.

The expert test persons either decided whether specific requirements were met (yes, undecided, no, unable to answer question) or made a quality assessment on a 7-point scale. Each test person answered three separate on-line questionnaires with 75 questions in total.
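Tallying such answers is simple. The sketch below assumes that conformance answers are stored as the codes yes, undecided, no and unable, and 7-point ratings as integers from 1 to 7, with a hypothetical rule that ratings of 5 or more count as positive. The example inputs are invented and do not reproduce the study's responses.

```python
from collections import Counter
from typing import Dict, List

ANSWERS = ("yes", "undecided", "no", "unable")  # "unable" = unable to answer question


def tally_requirement(answers: List[str]) -> Dict[str, int]:
    """Count conformance answers for one requirement, rejecting unknown codes."""
    counts = Counter(answers)
    unknown = set(counts) - set(ANSWERS)
    if unknown:
        raise ValueError(f"unexpected answers: {sorted(unknown)}")
    return {a: counts[a] for a in ANSWERS}


def share_positive(ratings: List[int], cutoff: int = 5) -> float:
    """Fraction of 7-point ratings at or above an assumed 'positive' cutoff."""
    if not ratings or any(not 1 <= r <= 7 for r in ratings):
        raise ValueError("need at least one rating on the 1..7 scale")
    return sum(r >= cutoff for r in ratings) / len(ratings)


if __name__ == "__main__":
    # Invented answers, not the study's data.
    print(tally_requirement(["yes", "yes", "undecided", "no"]))
    print(share_positive([7, 5, 4, 2]))
```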

10.5.2 Validation Procedure

The objectives of the validation were to verify that the Design Project Visualizer corresponds to the specified technical and functional requirements of users, and to validate the quality of use of the prototype with a focus on the appropriateness of the functionality for the tasks of professional users.

Fig. 10.7 The collaborative generic workflow for usability (T2) validation trial (a Trialist (Moderator) and several Trialists (Participants): test a component in a typical task; assess usability in typical tasks; fill in the on-line usability questionnaire)

Table 10.1 Validation phases and types for different validated components

Columns: T1 = dry runs; T2 = usability for typical tasks; T3a = information quality for users (Compl. = completeness, Corr. = correctness); T3b = user satisfaction, acceptance of solutions

Phase 1: Validation based on the simulated data. Verification tool: ProjectNavigator. A trialist assesses the components by performing a typical task on a simulated project.

                  T1   T2   T3a Compl.   T3a Corr.   T3b
Back end   DPE    –    –    –            –           –
Front end  SS     ✓    ✓    *            *           –
           PV     ✓    *    *            *           –
           DC     ✓    ✓    *            *           –
           LNL    ✓    ✓    ✓            ✓           –

Phase 2: Validation based on the U.S. dataset. Verification tool: CFI framework. A trialist assesses the components by performing a typical task on a real project that has been accomplished and logged in the past.

                  T1   T2   T3a Compl.   T3a Corr.   T3b
Back end   DPE    ✓    –    ✓            ✓           ✓
Front end  SS     –    ✓    ✓            ✓           ✓
           PV     –    ✓    ✓            ✓           ✓
           DC     –    –    ✓            ✓           ✓
           LNL    –    –    ✓            ✓           ✓

DPE design process miner and instance extractor, SS semantic search component, PV design project visualizer, DC discussion component, LNL LiveNetLife component, “–” not validated, “*” partially validated because the data is simplified and artificial (simulated project), “✓” validated

The test was performed by six professional experts with different roles at Cadence: engineering director with profound expertise in design, verification and implementation; project manager; design project manager; knowledge engineer. The evaluation was based on a task scenario composed of four typical tasks, each with several sub-tasks (described above). The test users performed the tasks on their own; the discussion task was carried out in cooperation with the other test users. After the completion of each task, an on-line checklist for testing conformance related to the task was completed. After executing the entire task scenario, each expert completed an on-line questionnaire with questions about the quality of use of the prototype and the usefulness of the functionality for the task scenario. The three questionnaires were composed of standard and proven scales, and of questions related to the specific functionality and context of the chip design process.

Fig. 10.8 The hierarchy of validation tasks for usability validation (T2) at Phase 1. The lowest level corresponds to the typical user task based validation (Validation Phase 1 (Simulated Data): (T1) Dry Run; (T2) Usability for Typical Tasks, comprising (T2.1) Usability for “Develop Product Structure”, (T2.2) Usability for “Choose Development Methodology and form Development Team”, (T2.3) Usability for “Generate Work Breakdown Structure” and (T2.4) Usability for “Monitor Project Execution”; (T3a) Info Quality for Users; (T3b) User Satisfaction and Acceptance; (T4) Field Test)

Fig. 10.9 The instantiation of the generic validation workflow (T2) for the validation task T2.1 highlighted in Fig. 10.8 (Trialist (Moderator) and several Trialists (Participants): examine product structure in CDNS ProjectNavigator; test Project Visualizer, Semantic Search, Discussion Component and LiveNetLife; assess usability; fill in the on-line usability questionnaire)

Table 10.2 Validated front-end components per typical user task

Typical user task                                           SS   PV   DC   LNL
Develop product structure                                   *    *    ✓    ✓
Choose development methodology and form development team    *    *    ✓    ✓
Develop work breakdown structure for the project            –    *    ✓    –
Monitor the execution of the project                        –    *    ✓    –

SS semantic search, PV project visualizer, DC discussion component, LNL LiveNetLife; symbols as in Table 10.1

The critical question to be answered is whether the prototype is sufficiently mature for further, full-scale tests. The experts testing the prototype are representative of the decision makers, so their judgment determines the organizational acceptance of the solution.

10.5.3 Validation Results

10.5.3.1 Conformance with Technical Requirements

In total, six senior experts participated in the test. The resulting sample was therefore too small for statistical analysis. Hence, we discussed the results case by case, focusing on the justifications and explanations the experts gave in their assessments of the prototype. Overall, five of the six professional experts accepted and approved the Quality of the Design Project Visualizer, and four experts approved its Completeness. The following reservations were formulated:

One expert did not approve the Quality of the prototype because the software was not able “to fully visualize the complex structures, interfaces and correlations of Design Projects.” Two experts did not approve the Completeness of the prototype because of a “lack of flexibility, agility and completeness” of design project visualizations.

Several experts were undecided about the quality and correctness of the Design Project Visualizer because some specific elements that they considered essential (for example, interfaces between the functional blocks within a design artifact) were not included in the simulated input data. Therefore quality and correctness could not be proven in this respect. For instance, four experts were undecided whether the Gantt chart representation of Work Breakdown Structures indicates the progress of the project appropriately.

10.5.3.2 Conformance with Functional Requirements for the Discussion Component

The majority of experts (4 out of 5) agree that the functionality of the Discussion

Component meets their requirements. One expert was unable to subscribe to

discussions and thus did not exercise all of the functionality. Some experts are

undecided about the functionality for summary boxes because the summary boxes


contained summaries for simulated design project data only. This may be an issue

for further investigation, or at least a further test with real data.

10.5.3.3 Quality of Use of the Design Project Visualizer

Opinions about the visual presentations of product, generic methodology and Work Breakdown Structure descriptions are divided, ranging from somewhat positive to somewhat negative. All experts but one either disagree or are undecided on whether the visual presentations are appropriate for performing the typical tasks of the task scenario.

The information quality of the visualizations was questioned: “. . . a visual representation can only represent a part or a high level view of the overall process or working patterns”. The visualizations of working patterns, dependencies, roles, tools, etc. are “too fragmented and difficult to connect for a non-savvy project manager”. “. . . the WBS or Gantt representation cannot capture the full content and properties which are needed to perform a design activity”.

To conclude, the experts have raised doubts about the quality of use of the

Design Project Visualizer.

10.5.3.4 Quality of Use of the LiveNetLife Component

The quality of use of the LiveNetLife component is judged positively. However, LiveNetLife competes with tools that are currently in use at Cadence, so it would have to demonstrate a differentiating benefit.

It was observed that LiveNetLife does not compute the similarity with other users very reliably, although this feature might otherwise provide added value over competing tools.

10.5.3.5 Conclusions

The results of the validation of the Design Project Visualizer indicate that the solution can provide expert assistance to design project managers performing the typical tasks of project planning and execution management. According to the professional experts, the Design Project Visualizer, tested on the basis of simulated data, meets the technical and functional requirements. The experts’ critical statements relate to added value. This must be proven by the information quality, which is the innovative feature enabled by the semantic back-end processes.

The reservations may also be due to inconsistencies in the Knowledge Base. The next test will be conducted with a real data set. Apart from checking the information quality, this test will evaluate whether the prototype can handle large and complex projects (scalability), another important issue that must be investigated further using real data.


Overall, after repeated iterations the components of the prototype have achieved

a mature level. The functionality for discussions meets the requirements and is

judged to be satisfactory for users.

The quality of use of the prototype overall fulfills minimum usability requirements,

“though some features could have been implemented in a more functional and user

friendly way. The reason is the lightweight nature of the basic platform (SMW)”.

There is potential to improve semantic search, although users can cope with

the shortcomings of semantic search because the navigation and browsing in the

SMW works well.

The added value of an innovative application is important for its acceptance and uptake. The diverse expert opinions about the added value of the Design Project Visualizer can be explained by the different roles of the experts. Engineers prefer to keep administrative work to a minimum and therefore do not directly recognize the added value of the tool. Project managers currently collect administrative information manually (for example in project meetings). Automated data collection and representation in Gantt charts would be an added value for this group of users. The experts see the basic user functionality as acceptable but remain to be persuaded of the benefit of the new technology. Some experts asked how the Design Project Visualizer would improve productivity.

The prototype was compared with competing tools used at Cadence (e.g. the visualization of Functional Blocks, re-used IPs, IP libraries and interfaces is already provided by existing Cadence tools). The experts fear that integrating a Design Project Visualizer into their work processes may cause additional overhead (e.g. having to ensure the consistency of several databases) instead of increasing productivity. High upfront costs for individual users mean a substantial lag before benefits and added value become visible. Therefore users have to be convinced that improved information quality will offset the upfront cost. The added semantic back-end functionality should add significant value for users by providing them with information of higher (pragmatic) quality. This should improve the cost/benefit ratio sufficiently to ensure acceptance by the organization and by individual users at the workplace. Further tests will be conducted to collect data on this issue.

10.6 Summary

The chapter presented the results of the case study of the ACTIVE integrated

project on the use of knowledge process learning, articulation and sharing

technologies for increasing the performance and decreasing the ramp-up efforts

of knowledge workers managing designs of Microelectronics and Integrated

Circuits. One of the most important characteristics of a design project, in general and in this domain in particular, is the very low proportion of predefined workflows in use. As a result, processes in engineering design are to a substantial extent informal. Instead of following rigid working patterns, knowledge workers exploit their tacit knowledge and experience to find the most productive way through the “terrain” of possible process continuations. Design product structure and methodology knowledge is collected from the project manager and the members of the development team in a top-down manner. Design process execution knowledge is mined from process log datasets in a bottom-up fashion, fused, superimposed on the top-down knowledge, and then used to visualize the design project plan and execution information in a way that suggests optimized performance, points to bottlenecks in executions, and fosters collaboration in development teams. A project navigation paradigm has been developed that helps knowledge workers find their way to a reliable outcome more easily. This approach has been implemented in a software prototype, the Design Project Visualizer. The first results of the validation of the software prototype indicate that the solution is helpful in providing expert assistance to design project managers performing their typical tasks of project planning and execution management. The overall cost/benefit improvement remains to be demonstrated, taking into account both organizational objectives and the fact that for some users, notably design engineers, the tools may create additional overhead.

Acknowledgement The research leading to these results has been funded in part by the European

Union’s Seventh Framework Programme (FP7/2007-2013) under grant agreement IST-2007-

215040.

References

Andersson B, Bergholtz M, Edirisuriya A, Ilayperuma T, Johannesson P, Gordijn J, Gregoire B,

Schmitt M, Dubois E, Abels S, Hahn A, Wangler B, Weigand H (2006) Towards a reference

ontology for business models. In: ER’06: Proceedings of the 25th international conference on

conceptual modeling, Springer-Verlag, Berlin, Heidelberg

Bock C, Grüninger M (2005) PSL: a semantic domain for flow models. Softw Syst Model 4(2):209–231

Crapo A W, Waisel L B, Wallace W A, Willemain T R (2000) Visualization and the process of

modeling: a cognitive-theoretic view. In: KDD’00: Proc. 6th ACM SIGKDD international

conference on knowledge discovery and data mining, ACM, New York

Dello K, Nixon L, Tolksdorf R (2008) Extending the Makna Semantic Wiki to support workflows. In: Lange C, Schaffert S, Skaf-Molli H, Völkel M (eds) Proc. 3rd Semantic Wiki Workshop, CEUR-WS.org/Vol-360, ISSN 1613-0073, pp. 119–123, online

Dengler F, Lamparter S, Hefke M, Abecker A (2009) Collaborative process development using

Semantic MediaWiki. In: Proceedings of the 5th conference of professional knowledge

management, Solothurn, Switzerland

Dietz JLG (2006) Enterprise ontology: Theory and methodology. Springer Verlag, Berlin

Dorn C, Burkhart T, Werth D, Dustdar S (2010) Self-adjusting recommendations for people-

driven ad-hoc processes. In: Hull R, Mendling J, Tai S (eds) Business process management,

Lecture Notes in Computer Science. Springer-Verlag, Berlin, Heidelberg

Drucker PF (1969) The age of discontinuity: guidelines to our changing society. Heinemann,

London


Ermolayev V, Keberle N, Matzke W-E (2008) An upper level ontological model for engineering design performance domain. In: ER’08: Proceedings of the 27th international conference on conceptual modeling, Springer-Verlag, Berlin, Heidelberg

Ermolayev V, Ruiz C, Tilly M, Jentzsch E, Gomez-Perez JM (2010b) A context model for knowledge workers. In: Proceedings of the 2nd international workshop on context, information and ontologies (CIAO 2010), CEUR-WS.org/Vol-626, ISSN 1613-0073

Ghidini C, Kump B, Lindstaedt S, Mahbub N, Pammer V, Rospocher M, Serafini L (2009) Moki: the enterprise modelling wiki. In: The semantic web: research and applications, Lecture Notes in Computer Science. Springer, Berlin, Heidelberg

Grebner O, Ong E, Riss U, Brunzel M, Bernardi A, Roth-Berghofer T (2006) Task management model. NEPOMUK project, deliverable D3.1, http://nepomuk.semanticdesktop.org/xwiki/bin/view/Main1/D3-1. Accessed 5 Aug 2011

Grobelnik M, Mladenic D, Ferlez J (2009) Probabilistic temporal process model for knowledge processes: handling a stream of linked text. Conference on data mining and data warehouses (SiKDD 2009), Ljubljana, Slovenia

Gronau N, Müller C, Korf R (2005) KMDL – capturing, analysing and improving knowledge-intensive business processes. J Univ Comput Sci 11(4):452–472. doi:10.3217/jucs-011-04-0452

Grüninger M, Atefy K, Fox MS (2000) Ontologies to support process integration in enterprise engineering. Comp Math Org Theor 6(4):381–394. doi:10.1023/A:1009610430261

Hepp M, Roman D (2007) An ontology framework for semantic business process management. In: Oberweis A, Weinhardt C, Gimpel H, Koschmider A, Pankratius V, Schmizler B (eds) eOrganisation: Service-, Prozess-, Market-Engineering, 1, Universitaetsverlag Karlsruhe

Hingston P (2002) Using finite state automata for sequence mining. Aust Comput Sci Commun 24(1):105–110. doi:10.1145/563857.563814

Jacquemont S, Jacquenet F, Sebban M (2009) Mining probabilistic automata: a statistical view of sequential pattern mining. Mach Learn 75(1):91–127. doi:10.1007/s10994-008-5098-y


Semantics 5(4):251–261. doi:10.1016/j.websem.2007.09.001

Leuf B, Cunningham W (2001) The wiki way: collaboration and sharing on the internet. Addison-Wesley, Upper Saddle River, NJ

Mayer R, Menzel C, Painter M, Witte PD, Blinn T, Perakath B (1995) Information integration for concurrent engineering (IICE) IDEF3 process description capture method report, Knowledge Based Systems Inc., College Station, Texas

Nonaka I, Takeuchi H (1995) The knowledge-creating company. Oxford University Press, New York

Rabiner LR (1990) A tutorial on hidden Markov models and selected applications in speech recognition. In: Readings in speech recognition. Morgan Kaufmann Publishers Inc., San Francisco

Ryu K, Yücesan E (2007) CPM: a collaborative process modeling for cooperative manufacturers. Adv Eng Inf 21(2):231–239. doi:10.1016/j.aei.2006.05.003

Scheer A-W, Jost W (2002) ARIS in der Praxis. Springer Verlag, Berlin

Schreiber G, Akkermans H, Anjewierden A, de Hoog R, Shadbolt N, van de Velde W, Wielinga B (1999) Knowledge engineering and management: the CommonKADS methodology. MIT Press, Cambridge, MA

Stajner T, Mladenic D (2010) Modeling knowledge worker activity. In: Proceedings of the workshop on applications of pattern analysis, Cumberland Lodge

Stajner T, Mladenic D, Grobelnik M (2010) Exploring contexts and actions in knowledge processes. In: Proceedings of the 2nd international workshop on context, information and ontologies (CIAO 2010), CEUR-WS.org/Vol-626, ISSN 1613-0073

Tilly M (2010) Dynamic models for knowledge processes. Final models. ACTIVE project deliverable D1.2.2, http://www.active-project.eu/publications/deliverables.html. Accessed 5 Aug 2011


Uschold M, King M, Moralee S, Zorgios Y (1998) The enterprise ontology. Knowl Eng Rev 13(1):31–89. doi:10.1017/S0269888998001088

van der Aalst WMP, Song M (2004) Mining social networks: uncovering interaction patterns in business processes. In: Proceedings of the international conference on business process management (BPM 2004), Lecture notes in computer science. Springer-Verlag, Berlin

van Dongen BF, de Medeiros AKA, Verbeek HMW, Weijters A, van der Aalst WMP (2005) The ProM framework: a new era in process mining tool support. In: Application and theory of Petri nets. Springer, Berlin, Heidelberg

Warren P, Kings N, Thurlow I, Davies J, Bürger T, Simperl E, Ruiz Moreno C, Gomez-Perez JM, Ermolayev V, Ghani R, Tilly M, Bösser T, Imtiaz A (2009) Improving knowledge worker productivity – the ACTIVE integrated approach. BT Technol J 26(2):165–176

Woitsch R, Karagiannis D (2005) Process oriented knowledge management: a service based approach. J Univ Comput Sci 11(4):565–588. doi:10.3217/jucs-011-04-0565


Part IV

Complementary Activities

11

Some Market Trends for Knowledge Management Solutions

Jesus Contreras

11.1 Introduction

The networked economy rests on the ability of companies to transform knowledge into value and to profit from it. Most companies focus on creating value-added products and services and on optimizing their business processes in order to compete in a globalized market. Companies are interested in three critical factors:

• The ability to create value and differentiate themselves from competitors

• The ability to improve business processes

• The ability to speed up the time-to-market

Knowledge, understood as the collection of experiences, strategies and practices within an organization, can positively influence all of these factors. The more valuable and available the knowledge needed to perform, improve or even create business processes, the more value is created for the organization. Traditionally, Knowledge Management is the discipline in charge of ensuring the availability and quality of internal knowledge in an organization. Today, with the recent penetration of Web and Web 2.0 technology and paradigms into organizations, traditional knowledge management is being refocused as “Collaboration and Communication” or sometimes “Enterprise 2.0”. These labels are not equivalent, but they often refer to very similar solutions and frequently compete for the same budget.

Modern knowledge management, collaboration and communication, or Enterprise 2.0, often includes some of the following competences:

J. Contreras (*)

iSOCO, Intelligent Software Components, S.A., Avenida Del Partenon, 16-18, Madrid, 1º 7a, 28042, Spain




– Group and Community Management

– Discussion and Blogs

– Resource Sharing (Documents, Experiences, News, Ideas, etc.)

– Search and Information Retrieval

– Collaborative Editing

– Profile and Role Management for Employees

– Incentives and Rewarding by Contribution

– Collaborative Project Management

– Business Process Enhancement with Knowledge and Collaboration Concepts

From a technology point of view, these competences require the following functionalities to work well:

• Search: allowing users to search for other users or content

• Links: grouping similar users or content together

• Authoring: including blogs and wikis

• Tags: allowing users to tag content

• Extensions: recommendations of users or content based on profile

• Signals: allowing people to subscribe to users or content with RSS feeds

11.2 Economic, Marketing and Research Challenges

In this section we introduce the economic, marketing and scientific challenges that need to be overcome before knowledge management solutions can evolve further into the mainstream market.

11.2.1 Economic Challenges

Economic conditions affect the availability of capital, costs and demand. Economic and market globalization drives the need for new, more dynamic modes of production in the face of strong international competition. These factors are likely to favor the adoption of social and collaboration solutions, as knowledge management in many industries becomes more diverse and geographically distributed and competitiveness is determined by productivity. This encourages the adoption of the more open and interoperable IT systems that run global business. Competitiveness based on innovation and marketing/branding also provides opportunities for collaborative solutions, particularly for intelligent information management. The recession of 2008–2009 and the unprecedented speed of the collapse in the financial markets – and then of the organisations that relied upon financial institutions’ credit – have created a challenging moment for realising the potential of knowledge management. This, however, will give vendors both an even larger opportunity and a big challenge: demonstrating a clear ROI and business case for deploying this new generation of technologies.

• Cost Reduction: Cost reduction is a universal requirement in current market conditions. It includes cost optimisation: not only lowering expenditure but also providing the quality of services and products expected by end customers or consumers while managing financial and business risk. Cost reduction is driven by the need:

∘ To achieve efficient performance in specific operational business processes,

∘ To achieve or improve a strategic competitive advantage, e.g. cost leadership,

∘ To manage financial distress and corporate recovery.

• Sustainable Innovation: The need for sustainable innovation (new market creation, market share improvement and faster time-to-market) results from a continuously evolving, increasingly internetworked economy that demands interlinked and co-evolving products and services. This requires a constant flow of innovation in a sustainable commercialisation process. Enterprises are under pressure to boost innovation by adapting and responding to market conditions and customer needs, which are also evolving at an ever-increasing pace. This drives the enterprise’s need to access the right information at the right time so as to constantly fine-tune its product and service offering.

11.2.2 Marketing Challenges

According to some experts, any knowledge management solution faces several challenges, which depend on the phase of the project:

• Decision phase: When a knowledge management solution is presented to the organization’s decision-makers, there are two challenges:

∘ Cultural change: Resistance to change is one of the strongest barriers in the decision process. Knowledge management may change the usual way of working and may require additional effort from employees to maintain the system with no immediate return.

∘ Unclear ROI calculation: Since there is no clear method for calculating the return on investment (as for any other intangible value), it is difficult to argue with financial benefit figures when facing a potential buyer. At the very least, the total cost of ownership needs to be calculated in order to ensure full support from the managerial structure. Some recent works1 on process management have identified key performance indicators for KM that allow for clear measurements and trend detection. Introducing a measurement methodology, possibly linked with business indicators (e.g., a balanced scorecard), may help overcome the weakness of ROI calculation.

1 Patrick Lambe: “How to Use Key Performance Indicators in KM Initiatives” 2007 (www.stratsknowledge.com)

• Usage and maintenance phase: When a knowledge management solution is already in place, a number of challenges remain:

∘ Critical mass achievement: A knowledge management initiative may fail before achieving a critical mass of users. Despite good initial interest, it may never spread from a single department or small initial group to the whole company. Consultancy for monitoring and corrective action may help avoid the premature death of the solution.

∘ Focus on tools rather than on the problem: Knowledge management tools are introduced with no clear causal relation to any concrete problem or business process. The purpose is vague and employees have disparate expectations.

∘ Lack of support from managers (lack of resources): Since knowledge management adoption needs time and dedication from employees, direction and managerial support are crucial. Activities such as training, maintenance, moderation or incentives are critical during the whole life cycle of the Enterprise 2.0 solution. It is also necessary to involve different departments in the whole decision and adoption process: IT, HR, Marketing and Finance.

∘ Infrastructure change: IT integration has traditionally been an important inhibitor because of the existence of legacy systems.

11.2.3 Research Challenges

There are several technology and research challenges whose solution may accelerate the adoption of knowledge management solutions in corporations. Traditional technology bottlenecks are usually related to the knowledge acquisition phase, where traditional solutions fail because of the low quality of the information gathered and limited user contribution.

Challenges also arise where traditional technology cannot automate knowledge acquisition, as is the case for multilingual and multimedia sources:

• Multilingualism is an emerging challenge for many suppliers. Current solutions typically handle either English only or a limited number of home-market languages. Multilingual capabilities are expected to come as language-specific modular add-ons to existing products, e.g. parsers and semantic analysis for specific languages.


• Spoken-language understanding (or generation) and machine translation are not the focus of current or near-term planning.

• Video: Visual media have become key to knowledge management applications. Story-telling, lessons learnt and best practices are increasingly stored in audio-visual form, which improves the end-user experience. Videos therefore need to be analyzed automatically and included in the overall knowledge life cycle.

In addition to the traditional knowledge management bottleneck, knowledge representation, storage, manipulation and delivery could also represent a technology challenge, especially for large-scale industrial adoption of these kinds of solutions. The definition of a common, standard data interchange language for knowledge representation across industries is still not satisfactory for enabling interoperation across a broad range of sources. Considerable effort has been made in the last 5 years to achieve syntactic standards for the representation of semantics, i.e. to define languages for describing data content, contextual information, etc. However, standards for expressing sector-specific business vocabulary are still missing (e.g. how to express market intelligence data, personal skills or product descriptions in a specific sector).

With semantic technology, knowledge is represented formally at a high conceptual level, and acquisition happens seamlessly with minimal user involvement. For instance, a semantically enabled idea management portal can automatically identify similar ideas (even when they share no common words or expressions) and suggest that their authors collaborate on a joint initiative. In this case semantic technology reduces the workload, since there are fewer ideas to check, while increasing collaboration and peer review.
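To make the idea-management example concrete, the sketch below compares two idea descriptions first by shared words and then by shared concepts. It is a minimal illustration under stated assumptions, not the portal’s actual method: the concept lexicon `CONCEPTS` is a tiny hand-made stand-in for an ontology or thesaurus, and the Jaccard overlap used as the similarity measure is an assumption chosen for simplicity.

```python
# Hypothetical concept lexicon mapping surface words to concept IDs.
# A real semantic portal would obtain this from an ontology or thesaurus.
CONCEPTS = {
    "car": "Vehicle", "automobile": "Vehicle",
    "pooling": "Sharing", "share": "Sharing",
    "staff": "Employee", "employees": "Employee",
    "cut": "Reduction", "reduce": "Reduction",
    "expenses": "Cost", "cost": "Cost",
}


def tokens(text: str) -> set[str]:
    """Lower-cased words of the text, with simple punctuation stripped."""
    return {w.strip(".,;:!?").lower() for w in text.split()}


def concepts(text: str) -> set[str]:
    """Concepts of the words found in the lexicon; unknown words are ignored."""
    return {CONCEPTS[w] for w in tokens(text) if w in CONCEPTS}


def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection divided by the size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


idea_a = "Staff car pooling to cut expenses"
idea_b = "Employees share automobile, reduce cost"

if __name__ == "__main__":
    print("word overlap:", jaccard(tokens(idea_a), tokens(idea_b)))        # 0.0
    print("concept overlap:", jaccard(concepts(idea_a), concepts(idea_b)))  # 1.0
```

The two ideas share no words, so a keyword match scores them 0.0, while their concept sets coincide and score 1.0. A real system would derive concepts through proper linguistic and semantic analysis rather than a lookup table.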

Knowledge management efforts may seem useless if there is no efficient consumption or exploitation phase in the knowledge life cycle. The possible return on investment depends heavily on how the knowledge is delivered, to whom and under what circumstances. That is why the notion of context gains importance in knowledge delivery. For instance, the employee’s context (where he or she is, the type of device, the type of ongoing task or process, the type of knowledge, the social context, time dependency, etc.) consists of variables that need to be taken into account for precise delivery.
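Context-dependent delivery can be sketched as a ranking function over a few context variables. The example below is hypothetical: it uses only three of the variables named above (location, device and ongoing task), and the `Context` and `KnowledgeItem` types, the sample items and the weights in `WEIGHTS` are illustrative assumptions rather than values from any real system.

```python
from dataclasses import dataclass


@dataclass
class Context:
    """Hypothetical snapshot of an employee's current context."""
    location: str
    device: str
    task: str


@dataclass
class KnowledgeItem:
    """A piece of knowledge annotated with where and when it is useful."""
    title: str
    tasks: set[str]      # tasks the item is relevant to
    devices: set[str]    # devices it can usefully be shown on
    locations: set[str]  # locations where it applies; "any" means everywhere


# Illustrative weights: task relevance counts most (an assumption).
WEIGHTS = {"task": 3.0, "device": 1.0, "location": 1.0}


def score(item: KnowledgeItem, ctx: Context) -> float:
    """Sum the weights of the context variables the item matches."""
    s = 0.0
    if ctx.task in item.tasks:
        s += WEIGHTS["task"]
    if ctx.device in item.devices:
        s += WEIGHTS["device"]
    if "any" in item.locations or ctx.location in item.locations:
        s += WEIGHTS["location"]
    return s


def deliver(items: list[KnowledgeItem], ctx: Context, top_k: int = 2) -> list[str]:
    """Titles of the top_k best-matching items; items scoring 0 are dropped."""
    scored = [(score(it, ctx), it.title) for it in items]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [title for _, title in scored[:top_k]]


ITEMS = [
    KnowledgeItem("Product datasheet", {"sales pitch"}, {"phone", "laptop"}, {"any"}),
    KnowledgeItem("Budget template", {"budgeting"}, {"laptop"}, {"office"}),
    KnowledgeItem("Pricing FAQ", {"sales pitch", "support"}, {"laptop"}, {"any"}),
    KnowledgeItem("Office floor plan", set(), {"phone"}, {"office"}),
]

if __name__ == "__main__":
    ctx = Context(location="customer site", device="phone", task="sales pitch")
    print(deliver(ITEMS, ctx))  # ['Product datasheet', 'Pricing FAQ']
```

Giving the task the largest weight reflects the assumption that relevance to the ongoing work matters more than the device in use; the remaining variables (type of knowledge, social context, time dependency) could be added as further scoring terms in the same way.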

11.3 Potential Market

Today, productivity and efficiency needs are driving technology evolution within companies. Among other concepts, collaboration, innovation and communication are key functionalities demanded by potential customers.

The traditional knowledge management market has been extended into a new market for communication, collaboration and social networking solutions for the enterprise. This new market definition for knowledge management is still a heterogeneous concept and includes solutions for productivity and collaboration support, sometimes called Enterprise 2.0 or Social Software in the Workplace (Drakos et al. 2009). According to Forrester Research forecasts (Young 2008) made before the last financial downturn, the potential market may reach US$4.6 billion by 2013, similar in size to the current Content Management Systems (CMS) market. Despite the downturn, the worldwide market for social software was estimated to grow by about 15% from 2009 to 2010 and by another 16% in 2011, reaching US$769 million worldwide (Gartner 2010).
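A quick back-calculation shows what these growth rates imply. Taking the stated figures at face value (about 15% growth from 2009 to 2010, 16% from 2010 to 2011, and US$769 million in 2011), and writing R for annual revenue, the earlier revenue levels follow as below; these are derived values, not figures quoted from the cited press release.

```latex
\begin{aligned}
R_{2010} &= \frac{R_{2011}}{1.16} = \frac{769}{1.16} \approx 662.9 \ \text{million US\$}\\
R_{2009} &= \frac{R_{2010}}{1.15} \approx \frac{662.9}{1.15} \approx 576 \ \text{million US\$}\\
\frac{R_{2011}}{R_{2009}} &= 1.15 \times 1.16 = 1.334
\end{aligned}
```

In other words, the two stated rates compound to roughly a one-third increase over the two years.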

Knowledge management solutions as discussed in this book provide technology for better knowledge sharing and personnel performance by making traditional applications easier to use, more predictive and more intelligent. Despite this potential technological differentiation, these solutions will compete for the same budget as any traditional knowledge management or Enterprise 2.0 provider, since they aim to solve similar problems. Looking at the individual tools that usually make up knowledge management or collaboration and communication solutions, the Forrester Research report shows clear growth potential for corporate social networking, as seen in Fig. 11.1.

In addition to the strategic forecast, the same report records the buying intentions expressed by potential customers and users. When asked about their buying decision, more than 50% of companies with over 1,000 employees answered that they were currently buying or considering Enterprise 2.0 solutions (see Fig. 11.2).

11.3.1 Customer Needs and Perceptions

Potential customers understand that the use of collaboration tools will help corporations with knowledge sharing and with cost reduction (in effort and time) for some critical processes. Figure 11.3 shows the result of a survey on buying decisions.

Even though teamwork and knowledge sharing are the most popular drivers for adopting collaboration solutions, an overall lack of understanding is one of the most frequently mentioned inhibitors, as shown in Fig. 11.4. Value perception (understanding, priority, cost and return on investment), together with corporate cultural barriers, seems to be the main inhibitor to the adoption of knowledge management solutions.

Fig. 11.1 Sales forecast of social software features, made before the worldwide crisis (data taken from Forrester report, Young 2008)

Fig. 11.2 Buying decision according to company size (Source: Young 2008)

Fig. 11.3 Drivers for collaboration tools (Source: Miles 2009)

Fig. 11.4 Inhibitors for collaboration tools (Source: Miles 2009)

Much like traditional knowledge management solutions, Enterprise 2.0 has difficulty presenting compelling selling arguments. All functionalities are ‘nice to have’ rather than ‘must have’, and there is no clear way to perceive their value. On the one hand, the drivers that would reinforce a decision to buy are related to potential efficiency improvements in knowledge-intensive tasks. On the other hand, new paradigms of corporate value have arisen that include knowledge management and collaboration among business indicators. Although these features are very difficult to measure, some indicators are visible in companies and sectors where knowledge-intensive tasks are mission-critical and where knowledge retention is crucial for cost control.

The current opportunity lies in an apparent contradiction. On the one hand, because of global competition and the ongoing crisis, companies urgently need to boost the productivity of knowledge workers, and knowledge management intuitively seems able to help. On the other hand, any buying decision will need a clear business case with an explicit ROI calculation, which is very difficult to produce for intangible assets such as corporate knowledge. Tools and services able to link collaboration, knowledge management and innovation with financial and market indicators will encounter fewer inhibitors during the buying process.

11.3.2 Supply Side

Gartner’s overview of the supply side (the well-known Magic Quadrant) shows the clear leadership of three companies, Microsoft, Jive and IBM, which offer complete solutions for internal social and collaboration management. After some years the market is maturing: there are fewer vendors in the niche-player and challenger sections, and positions in the leaders’ area are stable.

In the ACTIVE R&D project, researchers performed a closer analysis of 40 of the providers that operate in the knowledge management, collaboration and Enterprise 2.0 market. (The complete list is shown in Table 11.1.)

Table 11.1 List of providers analyzed

Alcatel-Lucent | Ektron | Leverage Software | Realcom
Atlassian | Emantix | Liferay | Saba
Blogtronix | EMC | Liquid Planner | Siteforum
blueKiwi | EpiServer | Microsoft | SmartPoint
BlueNog | eTouchSystems | MindTouch | SocialText
Box | FatWire | Mzinga | Telligent
CentralDesktop | Google | Neighborhood America | ThoughtFarmer
ConnectBean | Huddle | Novell | Tomoye
CubeTree | IBM | Open Text | Traction Software
Customer Vision | iGLOO | Oracle | Twiki


The purpose of this overview was to identify the strategies these companies are using to reach the market. A quick look at the names shows one set of companies offering complete knowledge management and collaboration solutions that cover most of the functionalities listed in the introduction to this chapter (IBM, Microsoft, Oracle, FatWire, etc.) and another set offering a single specialized component that covers just one or a few functionalities (CubeTree, Liferay, Atlassian, etc.).

• Complete solutions versus niche players:

∘ Big players are turning their traditional products into enterprise social software. This is the case for IBM, Novell, Oracle, Microsoft and EMC.

∘ Small players mainly offer SaaS solutions with typical functionalities: blogs, wikis, file sharing, RSS, community and group management, and instant messaging. These have become commodities, and competition is on price (from US$5 per user per month).

∘ Some companies focus on vertical, niche markets, such as health, software development and project management, or e-government.

∘ Integration is a key feature for small players.

∘ Some companies offer complete solutions on their own or in cooperation (e.g. Alcatel-Lucent in partnership with blueKiwi).

From the feature point of view, the findings are the following:

• Features:

∘ Only a few solutions can interoperate with business processes.

∘ No solutions mention ‘informal knowledge processes’ or ‘predictive behavior’.

∘ Only a few solutions (Huddle, SmartPoint) work within the desktop; the standard is to offer a platform or web solution.

∘ Mobile access is becoming a commodity.

∘ Open source platforms (Liferay, Drupal) offer some basic functionalities for free.

11.3.3 Marketing and Pitch

Like any technology-based solution, knowledge-based solutions need to tailor their message to different decision makers:

• End users: People who will use solutions based on knowledge management and related technologies. At the current stage of internal social and collaboration management software, end users are usually considered a positive driver for adopting these kinds of solutions (see Fig. 11.5). Arguments about productivity, reduced effort and ease of use will help in any commercial approach.


• IT managers: These are responsible for integrating and maintaining the solution and need to ensure a smooth integration into their existing architecture, with controlled change management and maintenance. IT departments are usually ahead in adopting new technology for their internal purposes (see Fig. 11.6), which may serve as a beachhead for further expansion within the company.

• Business or financial buyers: Such managers, e.g. the CEO, CFO or Head of HR, are responsible for the strategic aspects of their business and evaluate the cultural and cost/benefit impacts of the solution. Arguments about productivity measures and growth, ROI, innovation culture and employee satisfaction are a good option for a commercial approach.

11.3.4 Unique Selling Proposition

What differentiates collaboration-based solutions enhanced with semantic and context-aware technology from traditional knowledge management and collaboration tools is their use of emerging technologies that handle user behavior in a more intelligent and user-friendly way.

• Informal processes: The integration of knowledge management tools into corporate business processes has become a strong requirement for the success of any initiative. Fulfilling this requirement depends on well-defined and identified business processes already being in place. Only a few organizations are mature enough, from the management point of view, to provide flexible and useful process descriptions that permit the integration of knowledge management tools and/or measures. Semantic and context technology allows both the automatic identification and discovery of personalized processes and the provision of added value on top of them.

Fig. 11.5 Drivers by user group (Data taken from Miles 2009) [bar chart, 0–35%, for Users, IT Managers, CIO/CTO, Senior/Executive Business Managers, Mid-Level Business Managers and CEO]

• Structured and unstructured data integration: A lot of knowledge is stored in structured and unstructured sources within the organization’s information systems. Semantic technology allows for the conceptual discovery and integration of this knowledge across the whole organization, fostering collaboration and value creation.

• Privacy: The typical KM scenario is collaborative, decentralized and heterogeneous, with knowledge defined, used and shared across different groups and domains. Articulating enterprise knowledge in the form of knowledge processes allows users to create social relationships and form working and interest-based groups beyond their own domain and organization (Ruiz-Moreno et al. 2009). Because knowledge then flows across group and organizational boundaries, controlling who may access which knowledge becomes essential.

• Incentives: Cooperative behavior is generally difficult to reward materially. One obstacle is that defining fair and effective incentives requires transparent measures of performance. The organizational context (enterprise strategy, organizational structure and processes) determines many aspects of human resource management principles (and thus the available incentives), cooperation, and knowledge management principles (which may be based on codified or personalized knowledge). In order to define effective incentive systems, these organizational issues must be analyzed and taken as constraints. A number of factors, such as career situation, experience, personality and others, are likely to affect individual cooperative behavior and may not be easily influenced by incentives (Bösser 2009).

• Cost-benefit information: This can influence decisions at many stages of the life cycle of physical or digital products. Moreover, large efforts are spent on collaborative knowledge engineering projects, yet there is no objective way to judge whether these processes are cost-effective or whether certain actions are beneficial to the whole process or beyond.
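The “Informal processes” point above mentions the automatic identification and discovery of personalized processes. One simple way to picture this is to count which action tends to follow which in a worker’s activity log and then follow the strongest transitions. The sketch below is a hypothetical first-order transition model, not the method of any product or project named in this chapter; the session log, the action names and the `min_p` threshold are all assumptions.

```python
from collections import Counter, defaultdict


def transition_probs(sessions):
    """Estimate P(next action | current action) from action sequences."""
    counts = defaultdict(Counter)
    for seq in sessions:
        for cur, nxt in zip(seq, seq[1:]):
            counts[cur][nxt] += 1
    return {
        cur: {nxt: n / sum(c.values()) for nxt, n in c.items()}
        for cur, c in counts.items()
    }


def frequent_path(probs, start, min_p=0.5, max_len=10):
    """Follow the most likely successor while its probability is >= min_p."""
    path = [start]
    while len(path) < max_len and path[-1] in probs:
        nxt, p = max(probs[path[-1]].items(), key=lambda kv: kv[1])
        if p < min_p or nxt in path:  # stop on weak links or cycles
            break
        path.append(nxt)
    return path


# Hypothetical activity log of one knowledge worker (one list per session).
SESSIONS = [
    ["open_email", "read_request", "search_wiki", "edit_report", "send_email"],
    ["open_email", "read_request", "search_wiki", "edit_report", "send_email"],
    ["open_email", "read_request", "call_colleague", "edit_report", "send_email"],
]

if __name__ == "__main__":
    probs = transition_probs(SESSIONS)
    print(frequent_path(probs, "open_email"))
```

With this log, the routine recovered is open_email, read_request, search_wiki, edit_report, send_email, because search_wiki follows read_request in two of three sessions. Real process-mining approaches must also cope with noise, parallel branches and much larger logs; the sketch only shows how a recurring personal routine can emerge from raw action sequences without a predefined process description.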

Fig. 11.6 Departments demanding potential KM and collaboration solutions (Data taken from Miles 2009) [bar chart, 0–80%, for IT, Marketing, Other Operations, Training, Customer Support, R&D, Sales, Human Resources, Admin, Finance, Legal, All Departments and No Department]


11.3.5 Pricing

Since the value of knowledge management, social or collaboration tools is not always fully perceived, but the topic is still on the radar of potential customers, some niche or small providers are achieving good market share by adopting a low-cost strategy. Some of them (mainly small and medium companies) offer their solutions from about US$5 per user per month, with special plans for large companies.

The use of SaaS (Software as a Service) has permitted pay-per-use business models with almost no risk for the buyer. The basic functionalities of Enterprise 2.0, such as wikis and blogs, are adopted by corporations as commodities with no perceived differentiation of value and with no actual interaction with corporate information systems. This is not a mainstream trend, but it could be an excellent opportunity for niche players that are not willing or able to integrate their solutions with the platforms of big players.

11.4 Conclusions

The potential market for knowledge management and collaboration solutions is growing, especially thanks to the expansion of the Web 2.0 culture. Collaboration and social software is penetrating large corporations, even though there are still difficulties at the corporate level in perceiving the added value of knowledge management. In the short term, knowledge management, collaboration and social solutions need to be related to business indicators such as productivity, innovation or customer satisfaction in order to counteract the effects of the current recession on the IT market. Compelling success stories are also needed to convince decision makers.

A few big players lead the market, since they add knowledge management features to well-established platforms with an extensive customer base.

There are several strategies to reach the mainstream market:

• Build on top of existing platforms offering value added features.

• Become a niche player with specific applications or in a specific sector or area.

• Replace existing platforms with alternatives (other desktop platforms, web applications, SaaS model, mobile devices, etc.)

To become a successful player, any provider will need to address some technological and methodological challenges:

• Privacy is becoming a critical issue in any social or collaboration application. Underestimating this topic may lead to the complete failure of any initiative.

• Multimedia and multilingualism are already basic requirements for potential customers.


• Methodology and professional services around deployment, business process management, incentive policies and measurements will enable transformational projects in large organizations.

In the mid- to long-term timeframe, the forthcoming change of organizations and companies towards more open, geographically distributed, collaborative and productivity-based structures is a great opportunity for IT suppliers to position themselves around this change and help companies adapt their workplace and culture to the new reality.

References

B€osser T (2009) ACTIVE D4.3.1 “Analysis of incentives in knowledge management and Web2.0

applications” http://www.active-project.eu. Accessed 16 Jan 2011

Drakos N, Rozwell C, Bradley A, Mann J (2009) Magic quadrant for social software in the

workplace, 22 Oct 2009, Gartner RAS Core Research Note G00171792

Gartner (2010) Gartner says worldwide enterprise social software revenue to surpass $769 Million

in 2011, Gartner press release. http://www.gartner.com/it/page.jsp?id¼1497215. Accessed

16 Jan 2011

Miles D (2009) Collaboration and Enterprise 2009, AIIM Industry Watch. http://www.emc.com/

collateral/analyst-reports/aiim-emc-collaboration. Accessed 26 June 2009

Ruiz-Moreno C, Gomez-Perez J, Contreras J (2009) ACTIVE deliverable 3.4.1 “Security in knowledge processes”. http://www.active-project.eu. Accessed 16 Jan 2011

Young G (2008) Global enterprise Web 2.0 market forecast: 2007 to 2013. Forrester Research


12 Applications of Semantic Wikis

12.1 Business Applications with SMW+, a Semantic Enterprise Wiki

Michael Erdmann, Daniel Hansch

12.1.1 Introduction

MediaWiki (http://www.mediawiki.org/wiki/MediaWiki) is the technical basis of many wikis, including the online encyclopedia “Wikipedia”. This free software is a web-based content management system that supports easy linking of articles, which every user can read and edit. Wikis built on the MediaWiki software establish an asynchronous, web-based communication and collaboration platform. Their users form communities that make information available quickly and jointly develop and share it. Operators of such wikis benefit from easy installation procedures, low operating costs, the robustness of the system and low training requirements for its users. Wikis are thus a flexible tool for collaboration on the Web. However, wikis are not designed for entering, managing or publishing structured data. For example, compare the population of London specified in the German Wikipedia article “List of largest cities in the EU” with the figure given in the “List of megacities”. You will notice different values. It is also impossible to get a list of all cities whose population ranges between two and three million using standard MediaWiki. Such lists must be researched, created and maintained manually, which is time-consuming. Existing data values cannot be explicitly marked up as such, and the unstructured textual information is not machine-readable.

The shortcomings mentioned above are addressed by Semantic MediaWiki,1 the semantic extension to MediaWiki described in Chap. 3. Users can assign categories as well as properties to wiki articles. To stay with the example above, “London” is a “City” and has a specific population number. In addition, explicit relationships between wiki articles can be created. For instance, “London” can be related to its country “England” via a “located in” relationship. The now-explicit data can be accessed from any wiki page, e.g. to generate dynamic lists. Semantic MediaWiki provides a special language for formulating so-called inline queries, which can create lists such as the “List of megacities” automatically. To use these semantic features, however, users must write special wiki markup and therefore learn the query and annotation syntax. This makes the adoption of Semantic MediaWiki hard to achieve in commercial settings, where usability is a success factor.

1 http://semantic-mediawiki.org, cf. (Krötzsch et al. 2007)
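To make the idea of annotations and inline queries concrete, the following Python sketch models wiki pages as titles with categories and numeric property annotations and answers the query from the London example: all pages in the category “City” whose population lies between two and three million. In Semantic MediaWiki itself, such data would be written as wiki markup along the lines of `[[Category:City]]` and `[[Population::...]]` and queried with an `{{#ask: ...}}` inline query; the `Page` class, the `ask` function and the page data below are hypothetical illustrations, not SMW's implementation, and the population figures are invented.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    """A wiki page reduced to its categories and numeric property annotations."""
    title: str
    categories: set = field(default_factory=set)
    properties: dict = field(default_factory=dict)


def ask(pages, category, prop, lower, upper):
    """Return sorted (title, value) pairs for pages in `category` whose
    annotated value of `prop` lies in the closed interval [lower, upper].

    Roughly what an SMW inline query such as
      {{#ask: [[Category:City]] [[Population::>2000000]] [[Population::<3000000]] |?Population }}
    expresses; SMW's exact comparator semantics are not modelled here.
    """
    hits = []
    for page in pages:
        value = page.properties.get(prop)
        if category in page.categories and value is not None and lower <= value <= upper:
            hits.append((page.title, value))
    return sorted(hits)


# Hypothetical pages with invented figures, for illustration only.
pages = [
    Page("City A", {"City"}, {"Population": 2_500_000}),
    Page("City B", {"City"}, {"Population": 8_000_000}),
    Page("Town C", {"Town"}, {"Population": 2_100_000}),
    Page("City D", {"City"}, {"Population": 2_000_000}),
    Page("City E", {"City"}),  # not annotated, so it never matches
]

print(ask(pages, "City", "Population", 2_000_000, 3_000_000))
# [('City A', 2500000), ('City D', 2000000)]
```

Because the values are stored as explicit data rather than as free text, such a list can be recomputed whenever an annotation changes, which is exactly what the manually maintained lists lack.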

12.1.1.1 SMW+ Addresses Enterprise-Level Requirements

SMW+, a professionally developed software product from ontoprise2, includes Semantic MediaWiki and addresses the requirements posed in corporate environments: usability, reliability in daily operation, scalability, expressivity, interoperability and professional services. Usability enhancements to Semantic MediaWiki (Pfisterer et al. 2008), which were developed by ontoprise within Project Halo3, efficiently support casual users in applying the semantic features in their daily work. Scenarios that require access to external data sources or the evaluation of business rules for query answering are supported by the ontoprise product TripleStoreConnector (TSC), which is built on top of OntoBroker, ontoprise’s industry-grade reasoning engine. SMW+ distinguishes itself from other enterprise wikis in the extent to which knowledge can be formalized and evaluated in the wiki. SMW+ also integrates well with the existing tool landscape of an enterprise. For instance, users can include rich text from Microsoft Office documents in the wiki by copying it into the WYSIWYG editor. Connectors to Microsoft Word, Microsoft Excel and Microsoft Project make annotated data in the wiki available in these tools. The semantic data integration features make data from heterogeneous data silos in the enterprise available to wiki users; this data can also be used to generate reports and visualizations. By entering into commercial support contracts with ontoprise, businesses receive immediate and professional support, for example to adapt the wiki to their specific needs.

In this section we report on the experiences that we and our customers have gained using SMW+. We first describe tasks that SMW+ supports and then present actual use cases that demonstrate its versatility.

2 Product web site for SMW+ and TSC: http://wiki.ontoprise.com/

3 Project Halo is funded by Vulcan Inc., http://www.projecthalo.com


12.1.2 Achieving Knowledge Tasks in SMW+

12.1.2.1 User Roles in Semantic Wikis

When applying SMW+ to a concrete use case we typically identify four primary user roles: knowledge consumers, knowledge providers, knowledge architects and collaboration managers. Other, secondary roles are knowledge engineering experts, wiki application experts and administrators.

Knowledge consumers and knowledge providers have complementary roles. Consumers typically form the largest group and use the wiki for reading or exploring its contents. Knowledge providers contribute this content by creating and writing articles or annotating data. Accordingly, consumers value efficient methods to explore the wiki and to retrieve data and text, whereas providers are specifically interested in efficient authoring and annotation tools that lead to high-quality data and appealing wiki articles.

Users taking the role of a knowledge architect create the wiki ontology (i.e. categories and properties), templates and forms, and integrate web services and inline queries into articles. If many users work in the wiki within a particular workflow (e.g. within the same project), their interactions must be coordinated and organized to ensure workflow compliance as well as data quality and consistency. This is the task of the collaboration manager, who uses the framework given by the knowledge architect to guide the community.

In contrast to knowledge architects, who know their domain well, highly specialized knowledge engineering experts are skilled in creating advanced rules and queries and in handling complex ontologies. Wiki application experts can create advanced page layouts, templates and forms (so-called wikitext programming). Finally, administrators take care of all technical aspects, such as installing software upgrades or creating database backups.

In the following, we explain which tools SMW+ provides to serve the particular requirements of the above user roles.

12.1.2.2 Authoring Articles and Annotating Data

Contributors typically interact with the wiki by creating articles and entering rich text into the WYSIWYG editor, which supports formatted tables and page designs in a Word-like manner. It also allows embedding media files such as documents, images, audio and video. Finally, users can insert queries in the WYSIWYG editor; these are rendered as dynamic lists, tables or sophisticated data diagrams (Fig. 12.1).

Besides entering unstructured content with the WYSIWYG editor, contributors can use various tools to author data in articles (so-called annotations). The Semantic Toolbar is displayed alongside the WYSIWYG editor and gives a detailed overview of all annotations present in the article. It provides options to inspect, create and alter the annotations (Fig. 12.2).


The Advanced Annotation Mode, which is also available via the WYSIWYG editor, provides another way to annotate data in an article. The user marks a relevant text area and interactively assigns it to a property. To support users in the annotation process, the software can take schema information about property types into account, e.g. by offering auto-completion when choosing appropriate properties.
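Schema-aware auto-completion can be sketched in Python under simple assumptions: a hypothetical schema maps property names to datatypes, the marked text is classified with a crude regular-expression guess, and only properties whose name starts with the typed prefix and whose declared type fits the marked text are proposed. The schema, the type guesses and the function names are illustrative inventions, not the SMW+ implementation.

```python
import re

# Hypothetical wiki schema: property name -> declared datatype.
SCHEMA = {
    "Population": "Number",
    "Population density": "Number",
    "Postal code": "String",
    "Founded": "Date",
    "Located in": "Page",
}


def guess_type(text):
    """Crude guess at the datatype of the marked text; None means 'unknown'."""
    text = text.strip()
    if re.fullmatch(r"\d{4}-\d{2}-\d{2}", text):
        return "Date"
    if re.fullmatch(r"[\d.,]+", text):
        return "Number"
    return None


def suggest_properties(prefix, marked_text, schema=SCHEMA):
    """Propose properties whose name starts with `prefix` (case-insensitive)
    and whose declared type matches the guessed type of `marked_text`.
    If the type cannot be guessed, only the prefix filter is applied."""
    wanted = guess_type(marked_text)
    return sorted(
        name
        for name, datatype in schema.items()
        if name.lower().startswith(prefix.lower())
        and (wanted is None or datatype == wanted)
    )


print(suggest_properties("pop", "2,500,000"))  # ['Population', 'Population density']
print(suggest_properties("", "2011-01-16"))    # ['Founded']
```

Falling back to the prefix filter alone when the type is unknown keeps the suggestions useful for free-text selections such as page names.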

An alternative way to annotate data in articles is provided by semantic forms (Fig. 12.3). Articles associated with a form can be edited in a form mode, which allows users to fill in predefined form fields with data values. Each form field comes with an auto-completion feature that proposes suitable values.

Knowledge can not only be added directly to wiki articles but can also be expressed implicitly via logical rules. Users formulate rules in rule editors that support the authoring of calculation, definition and property chaining rules. The TripleStoreConnector, which can be installed along with SMW+, executes these rules and automatically generates (i.e. infers) new data, which then becomes accessible in the wiki.
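Property chaining rules can be illustrated with a small forward-chaining sketch in Python. It applies a single rule of the form first(X, Y) ∧ second(Y, Z) → result(X, Z) to a set of (property, subject, object) triples until no new facts appear; using “located in” for all three properties makes the relation transitive, extending the London example. The triple format and the function name are assumptions for illustration; OntoBroker’s actual rule language and evaluation strategy are not reproduced here.

```python
def apply_chain_rule(facts, first, second, result):
    """Naive forward chaining for the rule
        first(X, Y) and second(Y, Z)  ->  result(X, Z)
    over (property, subject, object) triples, iterated to a fixpoint.
    Illustrative sketch only, not OntoBroker's rule engine."""
    facts = set(facts)
    while True:
        inferred = {
            (result, x, z)
            for (p1, x, y) in facts if p1 == first
            for (p2, y2, z) in facts if p2 == second and y2 == y
        }
        if inferred <= facts:  # nothing new: fixpoint reached
            return facts
        facts |= inferred


facts = {
    ("located in", "Westminster", "London"),
    ("located in", "London", "England"),
    ("located in", "England", "United Kingdom"),
}
closure = apply_chain_rule(facts, "located in", "located in", "located in")
for triple in sorted(closure - facts):
    print(triple)
```

Each pass re-scans all facts, which is acceptable for a sketch; production reasoners use more efficient strategies such as semi-naive evaluation.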

12.1.2.3 Exploring the Wiki and Querying for Data

Fig. 12.1 The WYSIWYG editor comes with rich text editing capabilities

Information consumers require efficient tools to explore a wiki and to retrieve data precisely. The full-text search engine in SMW+ returns relevant search hits from wiki articles and from uploaded Microsoft Office or PDF documents. The search interface supports full-text search, Boolean operators, auto-completion and spell checking (“did you mean?”). The search results are augmented with semantic data to help users assess the relevance of search hits immediately.
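The combination of Boolean search and “did you mean?” suggestions can be sketched with an inverted index and the standard-library function `difflib.get_close_matches`. This is a deliberately small toy under stated assumptions: it supports only AND queries over lower-cased word tokens, the documents are hypothetical, and it says nothing about how the SMW+ search engine is actually built.

```python
import difflib
import re
from collections import defaultdict


def build_index(docs):
    """Map each lower-cased word to the set of document titles containing it."""
    index = defaultdict(set)
    for title, text in docs.items():
        for token in re.findall(r"\w+", text.lower()):
            index[token].add(title)
    return index


def search(index, query):
    """AND-combine all query terms; if nothing matches, suggest a spelling
    correction built from the closest indexed word for each term."""
    terms = query.lower().split()
    hits = None
    for term in terms:
        postings = index.get(term, set())
        hits = set(postings) if hits is None else hits & postings
    hits = sorted(hits or ())
    suggestion = None
    if not hits and terms:
        corrected = [
            (difflib.get_close_matches(term, list(index), n=1) or [term])[0]
            for term in terms
        ]
        if corrected != terms:
            suggestion = " ".join(corrected)
    return hits, suggestion


# Hypothetical documents.
docs = {
    "Ocean data": "Managing ocean data and metadata",
    "Buoys": "Drifting buoys measure ocean temperature",
}
index = build_index(docs)
print(search(index, "ocean data"))  # (['Ocean data'], None)
print(search(index, "ocaen"))       # ([], 'ocean')
```

A suggestion is only offered when the query returns nothing and at least one term can be replaced by a different indexed word.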

The semantic tree view supports contributors and consumers in exploring the contents of a wiki. Data (categories, articles or annotated data) are displayed in a hierarchical tree view, which is updated automatically whenever the underlying data change.

On the semantic level, annotated data is accessible via inline queries that can be used within articles for dynamic content generation. The graphical query interface is available for composing queries and for immediately previewing and formatting the results (Fig. 12.4). The query interface can be opened directly from the WYSIWYG editor to embed queries into articles.

Fig. 12.2 Enter and modify annotations with the semantic toolbar


Users can choose from a rich set of visualization methods for rendering query

results in an article. They range from simple lists or tables to calendars, timeline

charts, maps, bar or pie charts, or process diagrams (Fig. 12.5).

12.1.2.4 Organizing and Curating the Ontology

The knowledge architect is in charge of organizing and curating the ontology to

ensure that it addresses the requirements of the knowledge contributors and that

it is consistent with the data which is stored in the wiki articles.

The ontology browser provides a complete overview of the wiki’s ontology

to the knowledge architects. The ontology can be searched, filtered and changed in

this tool, e.g. by adding and renaming categories or deleting instances (Fig. 12.6).

The gardening bots of SMW+ are an extensible set of agents (bots) that detect

anomalies in the ontology such as undefined entities, a mismatch between property

type and property value, or empty categories. The knowledge architect regularly

starts bots to verify the consistency of data and ontology and to apply corrections

where necessary.
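A gardening bot of the kind described above can be approximated in a few lines of Python. The sketch assumes a hypothetical in-memory representation (a set of declared categories, a mapping from declared properties to expected Python types, and pages with their categories and property values) and reports the three kinds of anomaly named in the text: undefined entities, property type mismatches and empty categories. It is not the SMW+ bot framework.

```python
def garden(pages, categories, property_types):
    """Return sorted (kind, detail) reports of ontology anomalies."""
    issues = []
    used_categories = set()
    for title, page in pages.items():
        for cat in page.get("categories", []):
            used_categories.add(cat)
            if cat not in categories:
                issues.append(("undefined category", f"{title}: {cat}"))
        for prop, value in page.get("properties", {}).items():
            expected = property_types.get(prop)
            if expected is None:
                issues.append(("undefined property", f"{title}: {prop}"))
            elif not isinstance(value, expected):
                issues.append(("type mismatch", f"{title}: {prop}={value!r}"))
    for cat in sorted(categories - used_categories):
        issues.append(("empty category", cat))
    return sorted(issues)


# Hypothetical ontology and pages containing one anomaly of each kind.
categories = {"City", "Country", "River"}
property_types = {"Population": int, "Located in": str}
pages = {
    "London": {"categories": ["City"],
               "properties": {"Population": "unknown", "Located in": "England"}},
    "England": {"categories": ["Country"], "properties": {"Capital": "London"}},
    "Atlantis": {"categories": ["Myth"]},
}
for kind, detail in garden(pages, categories, property_types):
    print(f"{kind}: {detail}")
```

The bot only reports anomalies; in keeping with the text, applying corrections is left to the knowledge architect.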

Fig. 12.3 Data editing in articles using semantic forms


12.1.2.5 Reusing Knowledge

A basic idea of wikis is to facilitate knowledge exchange and reuse between people, which also includes supporting as many relevant resource formats and origins as possible. This concerns both the import of external data into the wiki and the use of wiki data in external applications.

Therefore, SMW+ allows calls to external SOAP or RESTful web services and the integration of their results into wiki articles. This is useful for extending articles with external information, e.g. from a bug tracking system, to compile dynamic project status and progress reports. An additional term import feature enables importing data and vocabularies into the wiki (existing terminologies, glossaries, emails, CSV files etc.) by creating wiki pages and populating them automatically.
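The term import can be pictured as a mapping from tabular rows to wiki pages. The Python sketch below, offered only as an assumption about how such a mapping could look, reads a hypothetical glossary from CSV and produces one page per row, writing every non-empty column as a Semantic MediaWiki-style annotation `[[Property::value]]` plus a category tag. The column names, the category and the function name are invented; the real SMW+ import feature may work differently.

```python
import csv
import io


def csv_to_pages(csv_text, title_column, category):
    """Turn each CSV row into (page title, wikitext); every other non-empty
    column becomes an annotation of the form [[Column::value]]."""
    pages = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        title = row.pop(title_column).strip()
        lines = [
            f"[[{column}::{value.strip()}]]"
            for column, value in row.items()
            if column is not None and value and value.strip()
        ]
        lines.append(f"[[Category:{category}]]")
        pages.append((title, "\n".join(lines)))
    return pages


# Hypothetical glossary export.
glossary = (
    "Term,Definition,Domain\n"
    "CTD,Conductivity temperature depth profiler,Oceanography\n"
    "SST,Sea surface temperature,\n"
)
for title, text in csv_to_pages(glossary, "Term", "Glossary term"):
    print(f"== {title} ==\n{text}\n")
```

Empty cells are skipped rather than written as empty annotations, so the generated pages only assert values that actually exist in the source.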

In the other direction, data in the wiki can be reused in other applications via corresponding connectors. These allow users to query data from the wiki within a Microsoft Excel spreadsheet. It is also possible to exchange project and task information, including attributes, hierarchies and interdependencies, with Microsoft Project (Fig. 12.7). With the ontoprise product WikiTags, users of Microsoft Outlook, Word and Excel can access, add and edit information and dynamically query results embedded in wiki articles without leaving their application.

Fig. 12.4 The query interface supports building queries


The most powerful feature in the context of knowledge reuse is semantic data integration (Fig. 12.8). In conjunction with the TripleStoreConnector (containing OntoBroker), SMW+ is employed as the collaboration front end in complex database environments. Existing databases (using RDBMSs such as Oracle, IBM, Microsoft SQL Server or MySQL) are “lifted” to a semantic level and thus made available in SMW+. OntoBroker’s semantic data integration features (Angele and Gesmann 2007) provide a consolidated view of different data sources, which is available in the ontology browser and the query interface.
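What “lifting” a relational database to a semantic level means can be sketched with Python’s built-in `sqlite3` module: each row of a table becomes a resource with a class assertion, and each non-key column becomes a property-value statement. The table, its columns and the `Lot/...` naming scheme are hypothetical, and OntoBroker’s actual mapping facilities (cf. the OntoStudio mapping view in Fig. 12.11) are considerably richer than this one-to-one mapping.

```python
import sqlite3


def lift_table(conn, table, key, class_name):
    """Lift every row of `table` to (subject, predicate, object) triples:
    one class assertion per row plus one triple per non-null, non-key column.
    Table and column names are interpolated directly, so they must be trusted."""
    cursor = conn.execute(f"SELECT * FROM {table} ORDER BY {key}")
    columns = [description[0] for description in cursor.description]
    triples = []
    for row in cursor:
        record = dict(zip(columns, row))
        subject = f"{class_name}/{record[key]}"
        triples.append((subject, "type", class_name))
        for column, value in record.items():
            if column != key and value is not None:
                triples.append((subject, column, value))
    return triples


# Hypothetical production-lot table in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lots (lot_id TEXT PRIMARY KEY, product TEXT, passed_test INTEGER)")
conn.executemany(
    "INSERT INTO lots VALUES (?, ?, ?)",
    [("L-002", "Product X", None), ("L-001", "Product X", 1)],
)
for triple in lift_table(conn, "lots", "lot_id", "Lot"):
    print(triple)
```

Once several tables or databases are lifted into the same triple form, they can be queried together, which is the essence of the consolidated view described above.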

12.1.2.6 Administrating an SMW+ Installation

Several tools facilitate the maintenance of an SMW+ installation, e.g. tools for defining the wiki’s security policy and for the installation and upgrade process.

Fig. 12.5 Query results are visualized in articles

A typical basic administration step is defining access restrictions on different elements of a wiki for single users or user groups. Objects that can be protected in this manner are namespaces, categories and single wiki pages, but also semantic content (e.g. properties) and even specific actions within the wiki (e.g. editing or reading pages). The Access Control List extension implements these security features, allowing users to define private user or department workspaces and to expose selected areas to partners and customers in a controlled way.
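The access-control model can be sketched in Python as a list of rules, each granting or denying an action to a user group on a page, a category or a namespace. The precedence used here (page rules override category rules, which override namespace rules, and access is denied by default) is an assumption chosen for illustration; the text does not specify how the Access Control List extension resolves conflicts, and all rule data is hypothetical.

```python
from dataclasses import dataclass

# Assumed precedence: more specific scopes override less specific ones.
PRECEDENCE = {"namespace": 1, "category": 2, "page": 3}


@dataclass(frozen=True)
class Rule:
    group: str    # user group the rule applies to
    action: str   # e.g. "read" or "edit"
    scope: str    # "namespace", "category" or "page"
    target: str   # name of the namespace, category or page
    allow: bool


def is_allowed(rules, groups, action, page, namespace, page_categories):
    """Decide an action using the most specific matching rule; among equally
    specific rules the first one listed wins. No matching rule means deny."""
    best = None
    for rule in rules:
        if rule.group not in groups or rule.action != action:
            continue
        matches = (
            (rule.scope == "page" and rule.target == page)
            or (rule.scope == "category" and rule.target in page_categories)
            or (rule.scope == "namespace" and rule.target == namespace)
        )
        if matches and (best is None or PRECEDENCE[rule.scope] > PRECEDENCE[best.scope]):
            best = rule
    return best is not None and best.allow


# Hypothetical rules: staff may edit the main namespace except approved
# articles; partners may only read pages in the category "Public".
rules = [
    Rule("staff", "read", "namespace", "Main", True),
    Rule("staff", "edit", "namespace", "Main", True),
    Rule("staff", "edit", "category", "Approved", False),
    Rule("partners", "read", "category", "Public", True),
]
print(is_allowed(rules, {"staff"}, "edit", "Draft", "Main", set()))             # True
print(is_allowed(rules, {"staff"}, "edit", "Glossary", "Main", {"Approved"}))   # False
```

Denying by default means that a partner workspace only exposes what has been explicitly granted, matching the selective exposure described above.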

Installation and upgrade tasks are facilitated by the Deployment Framework that is included in SMW+. It supports the administrator in installing, uninstalling, updating and upgrading the wiki software without altering configuration files or manually applying patches. Extensions are downloaded from the central ontoprise repository and installed, with the tool taking all dependencies into account.
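Dependency-aware installation amounts to computing a topological order over the extension dependency graph. The Python sketch below uses the standard library’s `graphlib.TopologicalSorter` (Python 3.9 or later); the extension identifiers and their dependencies are hypothetical, and the sketch leaves out everything else the Deployment Framework does, such as downloading packages or checking versions.

```python
from graphlib import CycleError, TopologicalSorter


def install_order(dependencies, requested):
    """Return the requested extensions plus everything they transitively
    depend on, ordered so that every dependency precedes its dependents.
    Raises graphlib.CycleError for circular dependencies."""
    needed, stack = set(), list(requested)
    while stack:  # collect the transitive closure of what is needed
        ext = stack.pop()
        if ext not in needed:
            needed.add(ext)
            stack.extend(dependencies.get(ext, ()))
    graph = {ext: set(dependencies.get(ext, ())) for ext in needed}
    return list(TopologicalSorter(graph).static_order())


# Hypothetical extension identifiers and their direct dependencies.
dependencies = {
    "semantic-toolbar": ["halo-base"],
    "halo-base": ["semantic-mediawiki"],
    "semantic-mediawiki": ["mediawiki"],
    "access-control": ["mediawiki"],
}
print(install_order(dependencies, ["semantic-toolbar"]))
# ['mediawiki', 'semantic-mediawiki', 'halo-base', 'semantic-toolbar']

try:
    install_order({"a": ["b"], "b": ["a"]}, ["a"])
except CycleError:
    print("circular dependency detected")
```

Only extensions reachable from the requested ones are installed, so "access-control" is left out unless it is requested explicitly.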

12.1.3 Business Applications

SMW+ is a universal, domain-independent tool that is applicable to a multitude of scenarios. In contrast to common enterprise applications, which are designed to address general business problems, SMW+ focuses on the specific and evolving needs of a work group and thus forms a situational application. Thanks to its ontology modeling and querying features, users can adapt SMW+ to serve their situational and unique requirements.

Fig. 12.6 Inspect and edit the ontology in the ontology browser


Therefore, possible application areas range from enterprise content management (ECM) to integrated collaboration environments (ICE) and business intelligence (BI).

Fig. 12.7 A connector synchronizes data between SMW+ and Microsoft Project

Fig. 12.8 Technical architecture of SMW+, TripleStoreConnector and OntoBroker


ECM comprises technologies to manage an organization’s unstructured information, e.g. via web content and document management. SMW+ is a content management system, and it covers typical customer needs such as collaborative and distributed authoring, version control, a markup language to create layout templates, and management of technical metadata and media files. The semantic features of SMW+ are used to describe wiki articles by means of annotations that, amongst other things, allow the generation of new content. The benefits of the semantically augmented content are (1) increased efficiency of content production by aggregating content using queries and embedding dynamic lists into articles and (2) an improved user experience through better search results based on data in articles.

An Integrated Collaboration Environment (ICE) is defined as an environment in which virtual teams do their work. In the case of project management, typical requirements include team spaces to collect and collaboratively author project documents, support for project managers in monitoring and scheduling tasks and work packages, and means of keeping the team informed about performance, costs and schedule. Customers of SMW+ are already using it as a lightweight, semantic project management system. In contrast to traditional project management systems, which require users to follow rigid workflows and metadata blueprints, SMW+ offers more flexibility to project teams: they can adapt and extend the SMW+ project schema to better satisfy their situational needs.

Recently the term Business Intelligence 2.0 has emerged. It refers to easy-to-use business intelligence systems that provide insights into enterprise data and allow users to collaborate on, share and create reports via web interfaces. The overall objective is to make critical BI information available and accessible not only to CIOs or BI experts, but to everyone who relies on company-relevant information. SMW+ is a BI 2.0 tool, since it integrates heterogeneous enterprise data sources into a unified view. In contrast to other BI 2.0 tools, SMW+ additionally gives users a flexible descriptive layer (i.e. arbitrary wiki pages) to organize reports, queries and external data.

12.1.3.1 Content Management: UNESCO OceanTeacher

UNESCO/IOC (http://ioc-unesco.org) employs SMW+ as the technical basis of the “OceanTeacher Encyclopedia” (http://library.oceanteacher.org), a production system and public library of knowledge related to oceanographic data management. It includes text, images, objects (PDF and Word documents, software packages, audio and video files) and links to other web sites. Currently, 10 editors and 25+ authors are collaboratively generating articles for 300+ content consumers.

UNESCO decided to relaunch OceanTeacher because the former content management system did not deliver search results of sufficient quality. It was nearly impossible for content consumers to find materials on a particular topic for a particular qualification level. UNESCO contracted ontoprise to define an ontology that covers the various topics and allows users to match articles on these topics with their qualification level.


The new OceanTeacher portal is split into a public portal for retrieving, exploring and reading articles and a production portal that is accessible to UNESCO personnel only.

The production portal has been configured to orchestrate the production workflow of editors, contributors and consumers. Editors review and approve content created by members of the contributor group, and they distribute tasks to these contributors, for example creating new articles or improving existing ones. Contributors are responsible for creating content. Separating the user groups makes it possible to assign different user rights. Before publication in OceanTeacher, new content has to be approved by an editor. Every contributor has a private section in the wiki, where articles that have not yet been approved are listed. After approval, an article is published in OceanTeacher and becomes visible to other users.

The target audience of the new OceanTeacher can be divided into three levels of expertise: beginners, intermediate users and experts. When creating an article, contributors assign the corresponding target group to it. Users can then display the content that matches their personal state of knowledge and interests. Target group, category and further metadata are conveniently selected with forms or in the advanced annotation mode (Fig. 12.9).

Consumers are empowered to retrieve content precisely according to their needs. This was realized with the SMW+ query interface, which allows composing and executing custom queries. These queries can easily be added with a few mouse clicks.

Since the relaunch of OceanTeacher, consumers retrieve better search hits in less time, and they can explore the library of articles much more easily with the tree view. The production workflow is better integrated and orchestrated. Finally, editors generate more appealing articles thanks to the WYSIWYG editor.

Fig. 12.9 Editors use the advanced annotation mode of SMW+ to enrich wiki contents with

semantic annotations


12.1.3.2 Project Management: Business Process Re-engineering at an Italian Bank

Business Process Reengineering (BPR) is defined as the analysis and (re-)design of workflows within an organization in order to improve their efficiency. The following example deals with a BPR project that was conducted with SMW+ by an Italian association of banks: the “Association of Cooperative Banks of Emilia Romagna”.

The project aimed at consolidating the various work processes of the 8000 members of the association. This required a survey of the current processes employed by each bank. The data was collected by employees of the banks and had to be evaluated afterwards in a business impact analysis. This yielded an assessment of how tasks were described in theory, how they were actually handled in practice and which processes were the most crucial. Based on that, standard procedures were identified and implemented.

The project was implemented in several phases. After SMW+ had been set up, the employees started collecting facts and descriptive texts in SMW+ about the tasks they execute in their daily work. It was crucial that they entered particular data (e.g. risk level, responsible persons, duration of execution) in forms to ensure that it was normalized, which made the subsequent statistical analysis possible. The collected data was imported into Microsoft Excel, where statistical clustering was performed to find groups of tasks sharing similar characteristics (e.g. the risk of failure). These groups made it possible to spot the most critical operational processes that needed to be re-engineered. In a final step, the selected processes were standardized and then collaboratively consolidated by the bank employees (Fig. 12.10).
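The clustering step was performed in Microsoft Excel, and the text does not specify which clustering method was used. Purely as an illustration of the idea, the following Python sketch groups tasks by two of the form fields with plain k-means. The task names, risk levels and durations are invented, and the choice of features and of k-means itself are assumptions.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain k-means on tuples of numbers. Centroids are initialised with
    the first k points, which keeps the result deterministic."""
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return labels, centroids


# Hypothetical tasks: (risk level 1-5, duration in hours) as entered via forms.
# In practice the features would first be rescaled to comparable ranges.
tasks = {
    "Open account": (1, 0.5),
    "Update address": (1, 0.3),
    "Approve loan": (5, 4.0),
    "Foreign wire transfer": (4, 3.5),
    "Print statement": (2, 0.2),
}
labels, _ = kmeans(list(tasks.values()), k=2)
groups = {}
for name, label in zip(tasks, labels):
    groups.setdefault(label, []).append(name)
print(sorted(groups.values()))
```

With this toy data the high-risk, long-running tasks end up in one group, which is the kind of grouping that lets analysts single out candidate processes for re-engineering.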

SMW+ provided a reliable and agile knowledge base for every single phase of the project. The project required a tool that was easy to use and easy to procure, enabling employees to safely enter data and text about their tasks. Furthermore, the dynamic generation of aggregation lists and overview tables of the collected data, as well as the provision of the data for statistical processing, were required. SMW+ meets these requirements with some key features, e.g.

• semantic forms for easy data entry,

• a flexible data model, which can be created and extended with the OntologyBrowser,

• fine-grained access control (especially crucial for the sensitive banking industry),

• advanced ad-hoc querying of the stored data,

• machine-interpretable SMW+ data, which can seamlessly be extracted and processed with external tools (e.g. MS Excel or MS Access) for advanced statistical analysis.

12.1.3.3 Semantic Data Integration

One of the main knowledge management problems that companies have to deal with is the integration of heterogeneous data sources. Considerable effort is necessary to provide a unified view of distributed data. The following case depicts how a Fortune 100 pharmaceutical company employed SMW+ as a self-service portal for its R&D department, to integrate various databases containing information about production lots, ingredients of individual products, results from tolerance tests, etc.

The R&D department of the company was faced with the task of tracking all

significant events and data that are relevant for Food and Drug Administration

(FDA) submission reports, which are required for approving novel drug products.

Since it was hard to assemble the scattered data from several sources, the compi-

lation of reports was error-prone and took an inordinate amount of time. Con-

sequently, the requirement was to achieve a consolidated view on the relevant data

stores by linking them to a common domain ontology. Easy access methods would

then allow the generation of suitable reports on the fly (Fig. 12.11).

The case was resolved with an internal self-service portal in which the R&D staff could generate reports not only by querying the integrated databases but also by using rules to infer knowledge. The original Oracle databases were connected to an OntoBroker Server as integrating middleware. At the front end, researchers interacted with SMW+ and OntoStudio. The latter served as a collaborative environment for ontology engineering and curation, while the former was the central integration and presentation facility for all activities.

Fig. 12.10 Screenshot of a task which has been entered by an employee

The application of SMW+ combined with the semantic middleware OntoBroker offers an integrated, common view of separate data silos. The OntologyBrowser and the semantic tree view facilitate browsing and navigating the product genealogy in the wiki. Federated semantic queries make the compilation of reports remarkably faster. Moreover, researchers can collaborate on the integrated data and on the reports gathered in the wiki. The quality and coverage of relevant information also improve, since the tracking and validation of products now covers their entire lifecycle and important events automatically trigger alerts (e.g. when new test results are available or the ingredients of drugs have changed). All told, the solution reduces the turnaround time for FDA submissions by several orders of magnitude, from a few weeks to just a few minutes.

12.1.4 Conclusion and Outlook

In Section 12.1 we have presented SMW+, the semantic enterprise wiki of ontoprise, and demonstrated how flexibly it addresses enterprise-level requirements compared to traditional enterprise-level applications.

The versatility of SMW+ opens up a wide area of situational applications in which SMW+ can be employed efficiently. In all cases, the applications are built by the wiki users for the wiki users. Starting from a standard SMW+ installation, they gradually extend and refine the wiki as they work with it. The applications are highly collaborative (e.g. project management), they are driven by the combination of text and data (e.g. content management), and they all include data modeling tasks.

Finally, SMW+ demonstrates its unique strength when combining socially curated unstructured data (i.e. text) with semantically integrated data from external, legacy data sources such as RDBMSs. SMW+ is a novel tool for knowledge workers that can easily be procured and deployed to “start small” and that, over time, serves more and more users and meets more and more requirements.

Fig. 12.11 OntoStudio mapping view for mapping RDBMS schema onto the ontology


12.2 Bringing Complementary Models and People Together: A Semantic Wiki for Enterprise Process and Application Modelling

Viktoria Pammer, Marco Rospocher, Chiara Ghidini,

Stefanie Lindstaedt, and Luciano Serafini

12.2.1 Introduction

Enterprise modelling is the process of formally describing aspects of an enterprise, typically as process models, data models, resource models, competence profiles, etc. The availability of such enterprise models, expressed in computer-interpretable formal languages, is becoming an important factor for enterprises and organisations to make better and more flexible use of the organisation’s knowledge capital and to foster innovation. We describe here the design and applications of MoKi (the Modeling wiKi, http://moki.fbk.eu) and show that complementary models can be created in the same modelling tool by people with complementary skills. MoKi is a semantic wiki for modelling enterprise processes and application domains, intended for knowledge engineers, domain experts and everyone with skills in between.

State-of-the-art modelling tools provide good support for the creation of formal, computer-interpretable models, but they suffer from two critical limitations: different tools are needed to model the various aspects of an enterprise, and different people are needed throughout the modelling stages.

• Different tools and modelling environments are provided for the specification of the different aspects of an enterprise. Most notably, the tools used to model business processes are usually separate and disconnected from the tools used to model the enterprise application domain, which is increasingly often encoded by means of ontology languages. This tool discontinuity produces two types of problems. Firstly, the modelling team needs to learn and interact with completely different tools. Especially in small environments such as SMEs, where only scarce resources may be available for modelling activities, this double learning effort may lead to human resource problems. Secondly, the technical integration of the different parts (i.e. business processes and enterprise application domain) is not directly supported by the different tools and can therefore become unnecessarily complex and time-consuming.

• State-of-the-art tools for ontology and process construction are often tailored to knowledge engineers, that is, to people who know how to create formal models. Domain experts, on the other hand, that is, the people who know the domain to be modelled, often find such knowledge engineering tools hard to use. As a result, the interaction between these different roles in the modelling team is governed by a fairly rigid, iterative waterfall paradigm in which domain experts produce or revise informal descriptions contained in textual documents, and these informal descriptions need to be manually interpreted and transposed into a formal specification by a knowledge engineer, with the obvious problems of duplicated effort, wrong or misleading interpretations, and so on.

To overcome the limitations illustrated above, we have experimented with collaborative Semantic Web tools, especially Semantic MediaWiki. The current version of MoKi, an extension to Semantic MediaWiki, is the result of these experiments. We have used MoKi, at different development stages and in different customisations, in various application cases (see Sect. 12.2.3). MoKi addresses the above two limitations by (1) representing both tasks belonging to a business process and topics belonging to an enterprise application domain in a wiki-based system, and (2) representing both formal and informal content and allowing different visual representations of the same content. A collage of views on MoKi content (tree view of concepts, process editor, a single concept) is depicted in Fig. 12.12.

12.2.2 A Conceptual System Description of MoKi

MoKi is an extension of Semantic MediaWiki, which itself is an extension of MediaWiki. Semantic MediaWiki adds labelled links and RDF interpretation of wiki content to MediaWiki, and MoKi adds OWL4 and BPMN5 interpretation of wiki content, as well as some knowledge engineering support, to Semantic MediaWiki.

Fig. 12.12 Collage of views on MoKi content

4 Web Ontology Language, http://www.w3.org/TR/owl2-overview

5 Business Process Modelling Notation, http://www.bpmn.org

The decision to implement MoKi on top of an existing wiki was taken for several

reasons. Firstly, a wiki environment was chosen because the wiki principles of (1) giving everyone access to content (with both read and write permissions) and (2) making modification of content as easy as possible for everybody are well aligned with MoKi's goal of being a modelling environment not only for knowledge engineers. Secondly, most wiki environments support versioning and standard collaboration features such as discussion threads or comments. Thirdly, most wikis are web-based, which enables geographically distributed modelling, and they allow textual as well as multimedia content, i.e. everything that can be published on the web can be published in a wiki. Finally, most potential users of a system such as MoKi can be expected to know how a wiki looks and feels, and a large portion of them also know how to actively contribute to wiki content. This is partly due to the great success of the online encyclopaedia Wikipedia, but also to the arrival of wiki (and other semantic web) technology in the corporate world, as observed e.g. in Gissing and Tochtermann (2007) and Schachner and Tochtermann (2008).

Extending traditional wikis, semantic wikis (Schaffert et al. 2008) already provide the basic infrastructure for dealing with structured data in addition to traditional human-readable content types such as text or multimedia. Thus, they are technically well suited to accommodate the results of informal as well as formal modelling activities. In choosing a particular semantic wiki on which to build MoKi, we decided on Semantic MediaWiki because MediaWiki has a large community of developers and users, because it is easily extensible through plugins, and because it is generic insofar as it is not adapted to any specific application scenario.

12.2.2.1 The “One Page – One Element” Design Principle

The basic design principle of MoKi is that every model element corresponds to

a wiki page. For application domain modelling, the relevant model elements are

concepts, individuals and properties/relations, while for business processes the

relevant model elements are processes.6 Each model element is internally given

a formal meaning. For instance, a concept is given the meaning of a description

logic concept, i.e. it is a unary predicate and can be interpreted as a set of entities for

which the unary predicate holds. It is essential to be clear about this internal interpretation of model elements, since it forms the technical basis for importing from, and exporting to, various knowledge representation formalisms. A MediaWiki category is used to distinguish different kinds of model element. Table 12.1 shows a synopsis of the model elements available in MoKi, their formal interpretation, and the kind of model in which each model element is expected to be used.

6 For the sake of simplicity, in MoKi we decided not to distinguish explicitly between atomic and non-atomic processes (i.e. processes composed of two or more sub-processes).
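The “one page – one element” principle can be illustrated with a small Python sketch that mirrors the synopsis of Table 12.1 and turns a categorised wiki page into an OWL 2 declaration in functional-style syntax. This is not MoKi’s PHP implementation: the `WikiPage` class, the `:`-prefixed IRIs built from page titles, and the assumption that every “MokiProperty” page denotes an object property are hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Category -> (model element, formal interpretation, type of model), after Table 12.1.
CATEGORIES = {
    "Domain model": ("Concept", "DL concept", "Domain ontology"),
    "MokiProperty": ("Property/Relation", "DL role", "Domain ontology"),
    "Individuals": ("Individual", "DL nominal", "Domain ontology"),
    "Process model": ("Process", "BPMN process", "Process/Task model"),
}

# Assumed OWL 2 entity type per ontology category (processes go to BPMN instead).
OWL_ENTITY = {
    "Domain model": "Class",
    "MokiProperty": "ObjectProperty",  # assumption: relations, not data properties
    "Individuals": "NamedIndividual",
}


@dataclass
class WikiPage:
    """One wiki page = one model element; its category says which kind."""
    title: str
    category: str


def owl_declaration(page: WikiPage) -> Optional[str]:
    """Return an OWL 2 functional-syntax declaration, or None for non-OWL elements."""
    if page.category not in CATEGORIES:
        raise ValueError(f"unknown MoKi category: {page.category!r}")
    entity = OWL_ENTITY.get(page.category)
    if entity is None:
        return None  # e.g. a process page, exported to BPMN rather than OWL
    iri = ":" + page.title.replace(" ", "_")
    return f"Declaration({entity}({iri}))"


for p in [WikiPage("Workshop", "Domain model"),
          WikiPage("hasParticipant", "MokiProperty"),
          WikiPage("Plan a workshop", "Process model")]:
    print(CATEGORIES[p.category][1], "->", owl_declaration(p))
```

Keeping the category table as data, rather than spreading it over code paths, reflects the point made above: the formal interpretation of each kind of page is what import and export build on.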

For every kind of model element supported by MoKi, a specific template is provided. Figure 12.13 shows an excerpt of an already filled-out concept description. The implementation of templates is based on the Semantic Forms extension to Semantic MediaWiki, which allows users to define forms for editing pages and templates for storing semantic data. From a user perspective, a template is displayed as a list of fields to be filled out in order to describe the model element. Fields can differ between kinds of model element. Conceptually, a template asks the user for the information that is typically needed for a specific kind of model element.

Example. When describing a domain concept, it is typical to ask “What is a

superconcept?”, i.e. what are more general notions than the currently described

concept, in which categories does it fall?

Users are not required to fill in all the fields of a page when describing a specific model element. Additionally, some of the templates allow users to add custom fields.

Example. A user may want to describe the concept “Project”, and then express

that a project is typically managed by a person. In this case, the user can add a new

field to the concept “Project” which is called “managed-by” and fill it with the

concept “Person”.
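The two examples above suggest how filled-out template fields can carry a formal meaning. The following Python sketch assumes a template is a mapping from field names to lists of values. It reads the superconcept field as a subsumption axiom, keeps the description as an annotation, and interprets a custom field such as “managed-by” as an existential restriction. That last reading is our assumption (a custom field could equally be mapped to a universal restriction or to a domain/range pair); MoKi defines field semantics in its own PHP code, and the field names and the `template_to_axioms` helper here are hypothetical.

```python
from typing import Dict, List

# Fields with a fixed meaning in this sketch; every other field is treated as a
# custom relation field. The field names are hypothetical, not MoKi's.
STANDARD_FIELDS = {"superconcept", "description"}


def template_to_axioms(concept: str, fields: Dict[str, List[str]]) -> List[str]:
    """Translate a filled-out concept template into OWL 2 functional-syntax axioms."""
    c = ":" + concept
    axioms = []
    # "What is a superconcept?" -> the concept is subsumed by each value.
    for sup in fields.get("superconcept", []):
        axioms.append(f"SubClassOf({c} :{sup})")
    # Free-text descriptions carry no logical meaning; keep them as annotations.
    for text in fields.get("description", []):
        escaped = text.replace("\\", "\\\\").replace('"', '\\"')
        axioms.append(f'AnnotationAssertion(rdfs:comment {c} "{escaped}")')
    # Assumed reading of a custom field: every instance of the concept is related
    # via the field to at least one instance of the filler concept.
    for name, values in fields.items():
        if name in STANDARD_FIELDS:
            continue
        for filler in values:
            axioms.append(f"SubClassOf({c} ObjectSomeValuesFrom(:{name} :{filler}))")
    return axioms


for axiom in template_to_axioms("Project", {
    "superconcept": ["Endeavour"],
    "description": ["A planned piece of work with a fixed goal"],
    "managed-by": ["Person"],
}):
    print(axiom)
```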

The use of templates makes it easy to customize MoKi to hold kinds of models other than domain ontologies and business processes. Although such customization cannot yet be done solely through the user interface (some programming in PHP is required to define the formal meaning of the fields in a new template and to add import and export support for the new model elements), it requires only minimal software development effort.

12.2.2.2 Functionalities

In addition to the features offered by MediaWiki and Semantic MediaWiki, MoKi

provides functionalities for importing/exporting models, navigating, editing, and

validating models. In this section we briefly illustrate these functionalities. A more

extensive description can be found at the MoKi web site and in Ghidini et al. (2009).

Table 12.1 Category names in MoKi for designating different kinds of model elements. “Type of Model” refers to the type of model in which such a model element is expected to occur

Category | Model element | Interpretation | Type of model

“Domain model” | Concept | DL concept | Domain ontology

“MokiProperty” | Property/Relation | DL role | Domain ontology

“Individuals” | Individual | DL nominal | Domain ontology

“Process model” | Process | BPMN process | Process/Task model

Fig. 12.13 Excerpt of a filled-out concept template in MoKi, shown for a concept called “Workshop”. The fields in the Annotation, Hierarchical structure and Notes boxes are available for all domain concepts. The fields in the Properties box are added by the ontology engineer specifically for each domain concept, e.g. “hasParticipant” and “isOrganizedBy” for the concept “Workshop”

The first group of functionalities concerns the import and export of models. The application domain model can be exported into OWL 2 (RDF/XML rendering), and the business process model into BPMN, according to the Oryx eRDF (embedded RDF) serialization. Application domain models can be imported from OWL 2 files. In addition, MoKi supports importing knowledge from less structured sources. Hierarchies of concepts can be imported by writing down the hierarchy as a simple ASCII list of terms, where indentation indicates the hierarchy. Knowledge can be imported from text documents by means of a term extraction functionality. Extracted terms can be added with one click as (candidate) concepts to the MoKi domain ontology. In the backend, this functionality uses the KnowMiner framework, a Java-based framework for knowledge discovery (Granitzer 2006; Klieber et al. 2009).
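The indentation-based import of concept hierarchies can be sketched in a few lines of Python. The sketch assumes that a term’s parent is the nearest preceding term with smaller indentation and accepts any consistent indentation width (tabs are expanded to four spaces). MoKi’s actual parser may apply different rules, and the `parse_hierarchy` name is hypothetical. Each resulting pair could then become a candidate Is-A link.

```python
from typing import List, Optional, Tuple


def parse_hierarchy(text: str) -> List[Tuple[str, Optional[str]]]:
    """Turn an indented ASCII list of terms into (term, parent) pairs.

    A line's parent is the nearest preceding line with smaller indentation;
    top-level terms get None as parent.
    """
    pairs: List[Tuple[str, Optional[str]]] = []
    stack: List[Tuple[int, str]] = []  # (indentation width, term) of open ancestors
    for raw in text.splitlines():
        line = raw.expandtabs(4)
        term = line.strip()
        if not term:
            continue  # ignore blank lines
        indent = len(line) - len(line.lstrip(" "))
        while stack and stack[-1][0] >= indent:
            stack.pop()  # close siblings and deeper branches
        parent = stack[-1][1] if stack else None
        pairs.append((term, parent))
        stack.append((indent, term))
    return pairs


example = """
Event
  Meeting
    Workshop
  Conference
Person
"""
for term, parent in parse_hierarchy(example):
    if parent is not None:
        print(f"{term} is-a {parent}")
```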

The second group of functionalities concerns the navigation of models. In any information or knowledge management system, navigation through content is vital to its success: the best content is useless if it cannot be easily accessed. MoKi content can be accessed through standard MediaWiki functionalities such as search, or by typing the URL of a single page into the address bar of the browser. Apart from this, model elements can be listed in a tabular style (where each element is shown alongside some relevant information characterizing it) and shown in graphical visualizations. For the ontology part, the graphical visualization consists of a tree-view rendering of the specialisation (Is-A) and mereological (Part-Of) hierarchies, and a tree-view rendering of the concept membership of individuals. Hierarchies can also be edited by drag and drop. For the process part, a graphical visualisation of the process workflow and of the different subprocesses it comprises is available on the process element page.
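A tree-view rendering of a hierarchy can be produced from (child, parent) pairs such as those returned by an indentation-based import. In this hypothetical sketch, a concept with several parents is shown under each of them, children are sorted alphabetically, and a path check guards against cycles; the same function would serve for Part-Of pairs. It only illustrates the idea and is not MoKi’s visualisation code.

```python
from collections import defaultdict
from typing import DefaultDict, List, Optional, Set, Tuple


def render_tree(edges: List[Tuple[str, str]], roots: Optional[List[str]] = None) -> str:
    """Render (child, parent) pairs as an indented text tree."""
    children: DefaultDict[str, List[str]] = defaultdict(list)
    nodes: List[str] = []  # all nodes, in first-seen order
    has_parent: Set[str] = set()
    for child, parent in edges:
        children[parent].append(child)
        has_parent.add(child)
        for node in (parent, child):
            if node not in nodes:
                nodes.append(node)
    if roots is None:
        roots = [n for n in nodes if n not in has_parent]

    lines: List[str] = []

    def walk(node: str, depth: int, path: Set[str]) -> None:
        lines.append("  " * depth + node)
        for child in sorted(children.get(node, [])):
            if child not in path:  # skip edges that would close a cycle
                walk(child, depth + 1, path | {child})

    for root in roots:
        walk(root, 0, {root})
    return "\n".join(lines)


is_a = [("Meeting", "Event"), ("Workshop", "Meeting"),
        ("Conference", "Event"), ("Researcher", "Person")]
print(render_tree(is_a))
```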

The third group of functionalities concerns the editing of models. Editing activities relate to single model elements and concern the creation of model elements, their subsequent editing (including renaming a model element or even changing its type), and the deletion of model elements. Depending on the type of element, the corresponding template is loaded when a model element is created or edited.

A fourth important group of functionalities concerns the evaluation of models (described in detail by Pammer (2010), pp. 93–121). Whatever purpose models are created for, they need to be evaluated in order to ensure that they will serve their intended use. The current version of MoKi supports ontology evaluation through (1) a models checklist, (2) a quality indicator, (3) the ontology questionnaire and (4) the display of assertional effects. The models checklist is a list of characteristics that typically point to oversights, together with modelling guidelines; MoKi automatically retrieves the elements that fit these characteristics.

Example. One point on the checklist is “Orphaned concepts”, i.e. concepts that

have no super- or subconcepts, have no parts and are not part of anything. These are

often concepts left-over from brainstorming or another earlier modelling iteration.
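The “Orphaned concepts” check can be expressed directly as a set computation. A minimal sketch, assuming the Is-A and Part-Of hierarchies are available as pairs of concept names (an illustrative data layout, not MoKi’s internal one): a concept is orphaned if it occurs in no pair at all.

```python
from typing import Iterable, List, Set, Tuple

Pair = Tuple[str, str]


def orphaned_concepts(concepts: Iterable[str],
                      is_a: Iterable[Pair],
                      part_of: Iterable[Pair]) -> List[str]:
    """Concepts that have no super- or subconcepts, have no parts and are
    not part of anything, i.e. that occur in no Is-A or Part-Of pair."""
    linked: Set[str] = set()
    for pairs in (is_a, part_of):
        for a, b in pairs:
            linked.update((a, b))
    return sorted(c for c in concepts if c not in linked)


concepts = ["Event", "Workshop", "Agenda", "Coffee break idea"]
is_a = [("Workshop", "Event")]        # (subconcept, superconcept)
part_of = [("Agenda", "Workshop")]    # (part, whole)
print(orphaned_concepts(concepts, is_a, part_of))
```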

The quality indicator is displayed on the page of every element and visualises the completeness and sharedness of the corresponding element as a bar that grows from “short and red” to “long and green”. Both completeness and sharedness are heuristic measures: the first captures how much information (verbal, structural) about the element is available, while the second captures how many people have contributed to the description of the element. The ontology questionnaire displays inferred knowledge, i.e. statements that can be derived from the models contained within MoKi, and provides explanations for them, as well as the possibility to remove them. If explicitly made statements are deleted in order to remove an undesired inference, the ontology questionnaire also displays side-effects, i.e. all inferences that will be lost along with it. It is called a questionnaire in order to point out that domain experts should go through the inferred statements as if going through a questionnaire, asking the question “Is this statement correct?”. Assertional effects (Pammer et al. 2009) are displayed on concept and property pages directly after an ontology edit that causes one or more assertional effects.
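The ideas behind the ontology questionnaire, side-effects and assertional effects can be illustrated on a deliberately tiny fragment of OWL: named-class subsumptions, which entail further subsumptions only by transitivity, plus class assertions for individuals. The Python sketch below stands in for a real DL reasoner and for the explanation techniques described by Pammer (2010); all function names and example data are hypothetical. It lists inferred statements with an explanation (the chain of asserted axioms behind each), computes the inferences lost when an asserted axiom is deleted, and compares the inferred types of individuals before and after an edit.

```python
from collections import deque
from typing import Dict, List, Optional, Set, Tuple

Axiom = Tuple[str, str]  # (A, B) stands for "A SubClassOf B"


def _graph(axioms: Set[Axiom]) -> Dict[str, List[str]]:
    graph: Dict[str, List[str]] = {}
    for a, b in sorted(axioms):  # sorted for deterministic output
        graph.setdefault(a, []).append(b)
    return graph


def entailed(axioms: Set[Axiom]) -> Set[Axiom]:
    """All subsumptions derivable by transitivity (reflexive ones omitted)."""
    graph, result = _graph(axioms), set()
    for start in graph:
        seen: Set[str] = set()
        stack = [start]
        while stack:
            for nxt in graph.get(stack.pop(), []):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        result |= {(start, b) for b in seen if b != start}
    return result


def explain(axioms: Set[Axiom], sub: str, sup: str) -> Optional[List[Axiom]]:
    """Shortest chain of asserted axioms yielding 'sub SubClassOf sup'."""
    graph = _graph(axioms)
    prev: Dict[str, Optional[str]] = {sub: None}
    queue = deque([sub])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    if sub == sup or sup not in prev:
        return None
    chain, node = [], sup
    while prev[node] is not None:
        chain.append((prev[node], node))
        node = prev[node]
    return chain[::-1]


def questionnaire(axioms: Set[Axiom]) -> Dict[Axiom, Optional[List[Axiom]]]:
    """Inferred, non-asserted statements, each with an explanation."""
    return {s: explain(axioms, *s) for s in sorted(entailed(axioms) - axioms)}


def side_effects(axioms: Set[Axiom], removed: Axiom) -> Set[Axiom]:
    """Inferences lost, besides `removed` itself, if `removed` is deleted."""
    return entailed(axioms) - entailed(axioms - {removed}) - {removed}


def inferred_types(types: Set[Tuple[str, str]], axioms: Set[Axiom]) -> Set[Tuple[str, str]]:
    """Asserted class memberships plus those implied by subsumption."""
    closure = entailed(axioms)
    return types | {(i, b) for i, a in types for x, b in closure if x == a}


def assertional_effects(types: Set[Tuple[str, str]], old: Set[Axiom],
                        new: Set[Axiom]) -> Dict[str, Set[Tuple[str, str]]]:
    """Class memberships of individuals gained or lost through an edit."""
    before, after = inferred_types(types, old), inferred_types(types, new)
    return {"gained": after - before, "lost": before - after}


axioms = {("Workshop", "Meeting"), ("Meeting", "Event"), ("Event", "Thing")}
for stmt, why in questionnaire(axioms).items():
    print(stmt, "because", why)
print(side_effects(axioms, ("Meeting", "Event")))
print(assertional_effects({("IWOD2009", "Workshop")}, axioms, axioms - {("Meeting", "Event")}))
```

A real ontology needs a DL reasoner and justification algorithms that handle full OWL; the sketch only shows how the three views relate to the set of entailments.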

12.2.3 Application and Experiences with MoKi

MoKi has been applied in several scenarios, in varying stages of development

and sometimes with small customisations.

12.2.3.1 Model Tasks and Topics for Work-Integrated Learning

Two early versions of MoKi have been successfully applied (APOSDLE 2009, pp.

21–25, 32–47) to develop enterprise models in six different domains: Information

and Consulting on Industrial Property Rights (94 domain concepts and 2 domain

roles; 13 processes), Electromagnetism Simulation (115 domain concepts and 21

domain roles; 13 processes), Innovation and Knowledge Management (146 domain

concepts and 5 domain roles; 31 processes), Requirements Engineering (the RES-

CUE methodology) (78 domain concepts and 2 domain roles; 77 processes),

Statistical Data Analysis (69 domain concepts and 2 domain roles; 10 processes)

and Information Technology Infrastructure Library (100 domain concepts and

2 domain roles; no processes). The enterprise models were created for the purpose of initialising, and serving as the knowledge backend of, a system for work-integrated learning (Lindstaedt et al. 2007; www.aposdle.org). The modelling activities involved people with different modelling skills and different levels of expertise in the application domains, located in different places all over Europe.

The experiences of using the first version of MoKi, essentially Semantic

MediaWiki without much additional convenience functionality for modelling, are

described in Christl et al. (2008). These experiences provided the motivation for adding convenience functionality, i.e. modelling support, to Semantic MediaWiki in order to support enterprise modelling of processes and application domains. A qualitative evaluation of the entire modelling process, including the use of early versions of MoKi in APOSDLE, is documented in APOSDLE (2009).

The evaluation of the second version of MoKi, which is much the same as the current open-source version in functionality if not in design and robustness, took the form of structured interviews. The interviews consisted of both open and closed questions. The interviewees were all the domain experts involved, the on-site knowledge engineers working at the application partners’ sites, and the external knowledge engineers providing additional modelling skills where necessary. Regarding MoKi, questions were asked about its usability, its support for collaboration, the usefulness of its functionalities, the homogeneity of the modelling environment for modelling the different aspects (domain model and processes), etc. Note that before using MoKi, the users had already tried modelling with Semantic MediaWiki for informal, integrated models (domain and business process) and with Protege and a YAWL editor for the formal models.

According to the results of the questionnaire, the users highly appreciated the form-based interface of MoKi, and the fact that they were able to participate in the creation of the models without having to know any particular syntax or have deep knowledge engineering skills. Thus, MoKi was perceived as an adequate tool for actively involving domain experts in the modelling process. It further emerged from the questionnaire that people with some knowledge engineering skills also found MoKi as comfortable to use as other state-of-the-art modelling tools. The answers show that MoKi helped the users to structure and formalize their knowledge in a simple, intuitive and efficient manner. Particularly appreciated were the functionalities, especially the graphical ones, that allow users to navigate through the content defined in the models. Finally, the users found MoKi, as a web-based modelling environment, quite useful for producing a single model with a geographically distributed modelling team.

12.2.3.2 Collaboratively Build an Ontology for Annotating Content

The current version of MoKi has been used by a team of knowledge engineers and

domain experts to collaboratively build an Organic Agriculture and Agroecology

Ontology (61 domain concepts, 30 domain roles, and 222 individuals) within the

FP7 EU-project Organic.Edunet (http://www.organic-edunet.eu/). This experience

was perceived as positive enough for MoKi to be adopted as the central modelling tool in the follow-up EU project, Organic.Lingua.

12.2.3.3 Maintain a Glossary in a Project

The current version of MoKi is being used within the FP7 MIRROR project

(www.mirror-project.eu/) to develop and maintain a common glossary.

12.2.3.4 Model Clinical Protocols in Asbru

Although MoKi as presented here is tailored to the development of ontologies and

business processes, the applicability of MoKi goes beyond typical enterprise

modelling. A preliminary and customized version of MoKi that supports the modelling of clinical protocols in the Asbru modelling language is described in Eccher et al. (2008). This version of MoKi, called CliP-MoKi, provides support for modelling the key elements of an Asbru model (e.g. plans, parameters) as wiki pages, and for exploring the models created according to the mechanisms for structuring knowledge provided by the language (e.g. the plan/plan children decomposition).

12.2.4 Discussion

MoKi supports a variety of knowledge engineering activities, such as knowledge

acquisition, informal modelling, formal modelling, and evaluation of models at

various stages. Knowledge acquisition is supported through the term extraction functionality. The term extraction functionality is state-of-the-art, which unfortunately means that it supports only a limited number of languages (currently English and German) and that the quality of its results depends heavily on the corpus it is given. Informal modelling is supported via the possibility to import simple hierarchies, via the prominently placed possibility to verbally describe and document (“Description”, “Synonym(s)”) all kinds of model elements, and via the possibility to richly document (“Free notes”) all kinds of model elements in all formats that can be held in a web page. Evaluation of informal aspects of the models contained in MoKi is supported, for instance, by explicitly pointing out elements with no verbal description to MoKi users (models checklist) and by giving direct feedback on completeness and sharedness on element pages (quality indicator). Formal modelling is supported from a user perspective by providing ontology engineers with form fields with auto-fill functionality. Fields are given a formal meaning, which is the basis for technically supporting formal modelling.

Evaluation of the formal models is supported through listing and explaining

inferences (ontology questionnaire) and through displaying the effects of editing

formal axioms on data (assertional effects). However, on the formal modelling side, MoKi does not (yet) support the full expressivity of OWL 2. Most importantly, it does not yet support complex concepts. Additionally, there exist MoKi versions that support both domain and process modelling, and a MoKi version that supports modelling clinical protocols. Thus, MoKi does improve on the first limitation of many existing modelling tools and narrows the gap in tool support for different enterprise aspects and for different knowledge engineering activities.

Most requirements for collaborative knowledge construction tools, discussed for instance in Noy et al. (2008) and in Tudorache et al. (2008), are easily met by MoKi, merely by virtue of its being implemented on top of MediaWiki. The requirements on collaboration features satisfied by MoKi are distributed access to a shared ontology, version control, user identity management, tracking the provenance of information, and discussion of model elements. Fine-grained access control is not possible in MoKi, and collaborative protocols that involve rating or voting are not supported either. Coherence is ensured mostly by keeping the informal and formal descriptions of a model element in one place, i.e. in one wiki page. Inconsistencies between the natural language text or rich content on the one hand and the formal descriptions on the other are not detected. Indeed, detecting them would exceed the state of the art in natural language understanding, and even more so in multimedia understanding. However, coherence is supported in another, slightly roundabout way, namely through the “watch” functionality of MediaWiki. Through this functionality, users can be notified when changes occur on a wiki page, which in MoKi means changes concerning a model element. In this way, both domain experts and ontology engineers can easily detect changes to the parts of the ontology in which they hold an interest. Concerning the second limitation of most existing modelling tools, namely that users with different knowledge engineering skills (knowledge engineers, domain experts) are often not able to work within the same modelling environment, MoKi is already able to hold rich, informal content, as any MediaWiki can, as well as formal content that can be exported into formal domain and business process modelling languages (OWL 2 and BPMN, respectively).
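How page watching supports coherence can be pictured with a minimal sketch: since each page holds one model element, watching a page amounts to subscribing to changes of that element. The `Watchlist` class below is a hypothetical in-memory stand-in, not MediaWiki’s own watchlist code, and the rule of not notifying editors about their own edits is an assumption of the sketch.

```python
from collections import defaultdict
from typing import DefaultDict, List, Set


class Watchlist:
    """Hypothetical in-memory model of page watching in a wiki such as MoKi."""

    def __init__(self) -> None:
        self._watchers: DefaultDict[str, Set[str]] = defaultdict(set)
        self.notifications: List[str] = []

    def watch(self, user: str, page: str) -> None:
        self._watchers[page].add(user)

    def record_edit(self, editor: str, page: str, summary: str) -> None:
        # Notify every watcher of the page except the editor (an assumed policy).
        for user in sorted(self._watchers.get(page, set())):
            if user != editor:
                self.notifications.append(f"{user}: {editor} edited {page} ({summary})")


wl = Watchlist()
wl.watch("domain_expert", "Workshop")
wl.watch("ontology_engineer", "Workshop")
wl.record_edit("ontology_engineer", "Workshop", "added superconcept Event")
print(wl.notifications)
```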

References

Angele J, Gesmann M (2007) The information integrator: using semantic technology to provide a single view to distributed data. In: Kemper A et al. (eds) Datenbanksysteme in business, technologie und web (BTW). 12. Fachtagung des GI-Fachbereichs “Datenbanken und Informationssysteme” (DBIS). GI-Edition-Lecture notes in informatics (LNI), p 103. http://www.btw2007.de/paper/p486.pdf

Christl C, Ghidini C, Guss J, Pammer V, Rospocher M, Lindstaedt S, Scheir P, Serafini L (2008) Deploying semantic web technologies for work integrated learning in industry. A comparison: SME vs large sized company. In: Proceedings of the 7th international semantic web conference (ISWC 2008), In use track, vol 5318, Springer, pp 709–722

Eccher C, Ferro A, Seyfang A, Rospocher M, Miksch S (2008) Modeling clinical protocols using semantic MediaWiki: the case of the oncocure project. In: ECAI workshop on knowledge management for healthcare processes (K4HelP)

Ghidini C, Kump B, Lindstaedt S, Mabhub N, Pammer V, Rospocher M, Serafini L (2009) Moki: the enterprise modelling wiki. In: The 6th annual European semantic web conference (ESWC 2009), Springer, pp 831–835, Demo

Gissing B, Tochtermann K (2007) Corporate Web 2.0 Band I: web 2.0 und unternehmen – wie passt das zusammen? Shaker Verlag, Aachen, Germany

Granitzer M (2006) Konzeption und entwicklung eines generischen wissenserschliessungs-frameworks. PhD thesis, Graz University of Technology

Klieber W, Sabol V, Muhr M, Kern R, Ottl G, Granitzer M (2009) Knowledge discovery using the knowminer framework. In: IADIS international conference information systems, pp 307–314

Krötzsch M, Vrandecic D, Völkel M, Haller H, Studer R (2007) Semantic wikipedia. In: WWW ’06: proceedings of the 15th international conference on World Wide Web. ACM, New York (2006), pp 585–594. http://korrekt.org/papers/KroetzschVrandecicVoelkelHaller_SemanticMediaWiki_2007.pdf

Lindstaedt S, Ley T, Mayer H (2007) Aposdle – new ways to work, learn and collaborate. In: Proceedings of the 4th conference on professional knowledge management WM2007, ITO-Verlag, Berlin, Potsdam, Germany, pp 227–234, March 28–30 2007

APOSDLE (2009) Deliverable 1.6. Integrated modelling methodology version 2

Noy NF, Chugh A, Alani H (2008) The CKC challenge: exploring tools for collaborative knowledge construction. IEEE Intell Syst 23(1):64–68

Pammer V (2010) Automatic support for ontology evaluation – review of entailed statements and assertional effects for OWL ontologies. PhD thesis, Graz University of Technology, March 2010

Pammer V, Serafini L, Lindstaedt S (2009) Highlighting assertional effects of ontology editing activities in OWL. In: d’Aquin M, Antoniou G (eds) Proceedings of the 3rd international workshop on ontology dynamics (IWOD 2009), collocated with the 8th international semantic web conference (ISWC 2009), CEUR Workshop Proceedings, vol 519, Washington D.C., USA, October 26 2009

Pfisterer F, Jameson A, Barbu C (2008) User-centered design and evaluation of interface enhancements to the semantic media wiki. In: Proceedings of the workshop on semantic web user interaction at CHI 2008, Florence, Italy. http://swui.webscience.org/SWUI2008CHI/Pfisterer.pdf

Schachner W, Tochtermann K (2008) Corporate web 2.0 band II: web 2.0 und unternehmen – das passt zusammen! Shaker Verlag, Aachen, Germany

Schaffert S, Bry F, Baumeister J, Kiesel M (2008) Semantic wikis. IEEE Softw 25(4):8–11

Tudorache T, Noy N, Tu S, Musen MA (2008) Supporting collaborative ontology development in Protege. In: 7th international semantic web conference, Springer, Karlsruhe, Germany


13

The NEPOMUK Semantic Desktop

Ansgar Bernardi, Gunnar Aastrand Grimnes, Tudor Groza, and Simon Scerri

A. Bernardi (*) • G.A. Grimnes

DFKI GmbH, Knowledge Management Department, Trippstadter Strasse 122, 67663 Kaiserslautern, Germany

T. Groza

School of ITEE, The University of Queensland, St. Lucia 4072, QLD, Australia

S. Scerri

DERI, National University of Ireland, Galway, IDA Business Park, Lower Dangan, Galway, Ireland

13.1 A Tool for Personal Knowledge Work

The crucial role of knowledge work in modern societies is widely recognised. Characterised by the collecting, structuring, and interconnecting of information and by the articulation of new insights, ideas, and results, knowledge work is understood as a comprehensive information processing activity which ultimately leads to decision making according to the goals and processes of the particular work context. Usually, such knowledge work boils down to a personal activity: the individual processes, navigates and enhances a rich information space, communicates and shares with others, makes sense out of available sources, and accounts for decisions taken and actions performed.

Supporting individual, personal knowledge work is thus a promising approach for effective work support. Such support must take into account three core dimensions of personal knowledge work:

• Information in the personal realm is typically available in various formats across different applications on a personal computer – think, e.g., of Web browsers, databases, address managers, mailing tools, file systems, and documents of all kinds. Retrieving information in this complex collection can be cumbersome.

• Human insight and knowledge create relations and dependencies between information items. Facilitating the documentation of such relations is a good way to make the knowledge worker’s thoughts explicit.

• Typically, individual knowledge work is not performed in isolation. On the contrary, creativity and decision-making very often arise out of a vivid exchange with peers and contributors.

To cope with such challenges, the NEPOMUK project1 has developed and implemented a vision of a new support tool for knowledge workers: the Social Semantic Desktop. This work environment (the desktop) allows the representation of knowledge and relations in computer-processable ways by explicit semantic annotation, and the sharing of such information with others in a kind of social exchange. The central element of the Semantic Desktop is the generation and maintenance of a Personal Information Model (PIMO). This PIMO makes a knowledge worker’s concepts and the relations between information items explicit and allows arbitrary information items to be annotated in a formal and consistent way, thus representing the personal interpretations and assessments of the information at hand. Such explicit and formal annotation is then the basis for automated services that support the user.

For example, in Fig. 13.1, the knowledge worker, Claudia, is faced with a multitude of information trapped within different data silos, such as files, e-mails and tools. PIMO leverages a semantic representation of this data and explicitly models Claudia’s knowledge about the CID project, including relations to relevant data as well as relations among the different data silos.
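Claudia’s example can be pictured as a small set of subject–predicate–object statements in which items from different silos point to the same PIMO thing. A minimal Python sketch, assuming plain tuples rather than an RDF store; all identifiers and relation names (`hasTopic`, `isAbout`, `represents`, and so on) are hypothetical and not taken from the NEPOMUK ontologies. Once the statements exist, a single lookup crosses the file system, e-mail and contacts.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Claudia's PIMO as plain statements. Identifiers and relation names are
# illustrative only; they are not taken from the NEPOMUK ontologies.
PIMO: Set[Triple] = {
    ("CID", "hasTopic", "TaskManagement"),
    ("CID", "involves", "Dirk Hagemann"),
    ("CID", "locatedIn", "Karlsruhe"),
    # Items from different data silos annotated with the same PIMO thing:
    ("file:Projects/CID/slides.odp", "isAbout", "CID"),
    ("email:Inbox/42", "isAbout", "CID"),
    ("contact:dirk.hagemann", "represents", "Dirk Hagemann"),
}


def neighbours(triples: Set[Triple], thing: str) -> Set[str]:
    """Everything directly linked to `thing`, in either direction."""
    return ({o for s, p, o in triples if s == thing}
            | {s for s, p, o in triples if o == thing})


def silo_items_about(triples: Set[Triple], thing: str) -> Set[str]:
    """Files, e-mails, contacts etc. annotated as being about `thing`."""
    return {s for s, p, o in triples if p == "isAbout" and o == thing}


print(sorted(neighbours(PIMO, "CID")))
print(sorted(silo_items_about(PIMO, "CID")))
```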

The realization of the Social Semantic Desktop enables better retrieval of

relevant information across different applications and information sources within

the personal (computer) desktop. It makes individual interpretation, perspectives,

and interconnections explicit and sharable, and allows for suitable automatic

activities, like classification of documents (via text/content analysis), grouping

and browsing support, or reminder services (for tasks, deadlines etc.).

To materialise the Social Semantic Desktop in a way which promises easy and widespread uptake, NEPOMUK has identified core aspects which must rely on established standards as far as possible:

• Explicit concepts and relations shall represent the user’s thoughts in the PIMO,

and available information is interpreted and structured accordingly. Thus the

semantics of the information is available in formal structures, and computer

services can rely on that.

• This should be realized based on standard data formats which are independent of proprietary applications and implementation details. We find such data formats in the domain of the Semantic Web.

• All models and annotations must rely on a widely accepted conceptual basis in order to make shared understanding and collaboration possible. To this end, NEPOMUK defines a number of ontologies to be used.

• The implementation realizes a framework architecture with open standard APIs, so developers can easily add their own services.

• The ultimate goal is to enable the maintenance and sharing of personal information models – because knowledge workers are not alone!

1 NEPOMUK was performed 2006–2008 by 16 partners with funding from the FP6 programme of the European Union. Further details and results are available at http://nepomuk.semanticdesktop.org
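As an illustration of relying on standard Semantic Web data formats, the following sketch serialises PIMO-style statements as N-Triples, the line-based RDF syntax standardised by the W3C, which RDF tools can read without knowing the application that produced the data. The namespace, the relation names and the helper functions are hypothetical; only `rdfs:label` is a standard RDF Schema property. The literal escaping covers just the characters this example needs.

```python
from typing import Iterable, Tuple
from urllib.parse import quote

# Hypothetical namespace for Claudia's personal model.
BASE = "http://example.org/pimo/claudia#"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def iri(name: str) -> str:
    """Absolute IRIs pass through; local names are percent-encoded into BASE."""
    if name.startswith(("http://", "https://")):
        return f"<{name}>"
    return f"<{BASE}{quote(name, safe='')}>"


def literal(text: str) -> str:
    """Plain string literal with the N-Triples escapes needed here."""
    for raw, esc in (("\\", "\\\\"), ('"', '\\"'), ("\n", "\\n"), ("\r", "\\r")):
        text = text.replace(raw, esc)  # backslash first, so escapes are not doubled
    return f'"{text}"'


def to_ntriples(statements: Iterable[Tuple[str, str, str, bool]]) -> str:
    """Serialise (subject, predicate, object, object_is_literal) tuples, one triple per line."""
    lines = []
    for s, p, o, is_literal in statements:
        obj = literal(o) if is_literal else iri(o)
        lines.append(f"{iri(s)} {iri(p)} {obj} .")
    return "\n".join(lines)


print(to_ntriples([
    ("CID", "hasTopic", "TaskManagement", False),
    ("CID", RDFS_LABEL, 'The "CID" project', True),
    ("CID", "involves", "Dirk Hagemann", False),
]))
```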

13.1.1 An Example Scenario

We outline the usage of a NEPOMUK Semantic Desktop by following a typical

knowledge worker in some of her daily activities. Within the NEPOMUK project,

the description of personas and associated scenarios (Gudjonsdottir 2010) proved

very useful to clarify and illustrate the use of the tools and their benefits. The

personas and scenarios were developed after interviews with and observation of real

people from NEPOMUK use-case partners (see Sect. 13.6.2). These scenarios and personas served to motivate the developers to think about end-users other than themselves, as well as giving a high-level, but concrete, overview of the kind of things the Semantic Desktop should be able to help the user with.

Fig. 13.1 A personal information model represents the user’s interpretation

We present a short glimpse at the activities of one persona, Mrs. Claudia Stern,

and her Semantic Desktop:

Claudia Stern comes from München in Germany and now lives in Karlsruhe. She has a diploma in information technology and is a project manager at SAP Research. Currently, Claudia is working on a big EU project called CID. She uses various tools to control and follow up on the activities in the project. She books meetings with the project members through her calendar. In meetings, she always uses her laptop to access relevant information and presentations and to write minutes. She often needs to access her own and others’ calendars and the lists of tasks for the project members. Through these tools she has a clear picture of what needs to be done in the project.

On the 27th of February, 2008, Claudia is having a meeting with Klaus, a colleague at SAP Research, and they are planning for a meeting of the CID project that they are going to attend in Belfast. The meeting is organised by Marco Andriotti. There are many things that need to be done before they leave for Belfast in a few weeks.

Claudia goes to her office and adds the meeting to her calendar; furthermore, she creates a task called “Belfast meeting” in her personal task manager. From the work-trip template, a set of relevant sub-tasks is created, including e.g. request travel permission and book travel/accommodation. Besides the practical travel issues, the task list also includes some items connected to the work that needs to be done in the meeting, and Claudia adds some specific tasks to the list, fleshing out the automatically generated sub-tasks.

When Claudia has got permission to travel, she proceeds to book train, flight and hotel; the system knows her preferences and recommends only fitting options. The travel information and electronic documents that result from her successful booking activities are added to her “Belfast meeting package”, which includes everything related to the meeting, such as work and travel documentation. The package is also accessible online. When the trip gets closer, the system checks the weather so that Claudia can pack the right things. The system adds recommendations of restaurants and shops according to her profile.

A travel timeline is created automatically, where she sees all the relevant appointments and dates she needs to observe: from leaving the office in order to make it to the train in Karlsruhe, to the tram when she returns home from the trip. Afterwards she gives her colleagues access to her timeline. As she is preparing for the meeting, Claudia can look up the details of colleagues she met the last time she went to a meeting in Belfast, including notes about their interests as well as about the restaurants where they had dinner.

When Claudia comes back from her trip she needs to make a travel report in

order to claim her expenses. This is automatically created according to the travel

timeline she created before she left. Claudia sends this report, together with the

collected receipts (still on paper!), to SAP’s HR department and is promptly

refunded for her expenses.


This scenario illustrates the support a NEPOMUK Semantic Desktop can offer

to the knowledge worker:

• Information is connected, combined, and accessed across different applications

and tools.

• A new event, insight, or activity results in descriptors or concepts created by the user (“Belfast Meeting Package,” “Belfast Meeting”). However, as the Semantic Desktop supports relating such individual descriptions to pre-given, formal concepts (“work-trip”), the system can offer various kinds of automatic support.

• Some relations and interconnections (e.g. time dependencies and timeline) are observed automatically and can be used, e.g., for report generation.

• Sharing selected information items and structures supports communication and exchange among collaborating colleagues.

In summary, Claudia benefits from an improved way to structure her personal

information space, to access information relevant to the current work context, to

express her own ideas and insights about key concepts and their relations, from the

exchange with her colleagues, and from various automatic helper services within

her personal computer.

13.2 Standards & NEPOMUK Technology

The Social Semantic Desktop (SSD) represents a platform that serves as a foundation for building different social and semantic applications. Such applications share

a series of common functional aspects that the platform needs to capture and then

expose.

At the desktop level, applications need to be able to create and manage resources

and information about resources (in the form of annotations or relations). This

information needs to be stored and then efficiently retrieved when required. In order

to enable the transfer of information, the applications need to interact with the

Social Semantic Desktop. As a result, the platform exposes several services

representing access points to low-level functionalities. These functionalities range from resource or notification management to desktop sharing or off-line access.

However, three of them are essential in providing semantically-enhanced resources.

Firstly, the SSD supports different data analysis mechanisms (e.g., inference) to

enrich the semantics of low-level desktop information elements or resources. For

example, a keyword extraction mechanism enables automatic tagging of resources

or the summarisation of long textual information elements. Secondly, context is

crucial in delivering the right information for a given resource (Schwarz 2010).

Hence, the SSD supports not only user profiling, but also the detection of the current

context, in addition to ways of attaching such context information to resources.

Finally, search makes direct use of the results of the previous two. Thanks to the rich semantic network of desktop resources, the user will profit from new ways of accessing the available information space. Traditional keyword search and hierarchy browsing are complemented by associative browsing in the concept space, grouping along relevant relations, and multiple conceptual views on search results. Search tools will offer support based on the user’s profile or context, and will act proactively by detecting and predicting a particular user’s search patterns.
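The keyword-extraction mechanism mentioned above can be illustrated with a deliberately simple sketch that proposes tags for a text by counting non-trivial words. It is not the NEPOMUK Text Analytics service: the function name `suggest_tags`, the tiny stop-word list and the plain frequency ranking are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real extractor would use a
# language-specific list and a corpus-aware weighting such as TF-IDF.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "for", "on",
              "is", "are", "we", "will", "at", "with", "this", "that"}


def suggest_tags(text: str, max_tags: int = 3) -> list[str]:
    """Return up to max_tags candidate tags, most frequent first."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS and len(w) > 2)
    # Sort by descending frequency, then alphabetically for a stable order.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:max_tags]]


if __name__ == "__main__":
    minutes = ("Belfast meeting: agree the CID deliverables. The Belfast "
               "meeting agenda lists deliverables and travel.")
    print(suggest_tags(minutes))  # ['belfast', 'deliverables', 'meeting']
```

Raw counts favour whatever a single document repeats; weighting terms against the rest of the user's desktop corpus would be the obvious next refinement.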

Going beyond the desktop level, the SSD provides different means of Social

Interaction, among which the most important one is resource sharing. This enables

innovative and more efficient collaboration means among users by providing

context and semantics-based sharing facilities, in addition to the creation of

ad-hoc shared information spaces centred around particular resources or users.

Such functionalities rely on lower-level ones like access control management or user group management.

In order to support the above-listed set of aspects, in the context of the

NEPOMUK project, the Social Semantic Desktop relies on two standards:

• A suite of ontologies, that provide a formal way of encoding the semantics of the

resources participating in or describing these aspects, and

• A standard architecture, that enables a standard specification of the underlying

components, independently of the implementation platform.

In the following sections, we describe both the ontologies and the architecture, as well as two reference implementations.

13.3 Ontologies

In this section we provide an overview of the ontologies engineered for the Social Semantic Desktop. The most important ontologies are the NEPOMUK Representational Language (NRL); the Information Element set of ontologies (NIE), which define common information elements and legacy data that is to be found on the desktop in its various forms; and finally the Personal Information Model Ontology (PIMO), which combines the knowledge in the other ontologies to express the individual’s entire unified personal information model.

In order to correctly manage the creation and integration of the numerous required ontologies, we pursued a serial approach, designing first the higher (general, abstract) layers of the ontology stack, and then the more detailed ontologies. This layered approach is illustrated in the Semantic Desktop Ontologies Stack, a top-down conceptual representation of the required ontologies and their interdependencies, which also served as the road-map for their gradual design (Fig. 13.2). In addition to the ontologies detailed below, the diagram also shows the NEPOMUK Annotation Ontology (NAO) for representing simple tagging, and the NEPOMUK Graph-Metadata Ontology (NGM) for annotating named graphs. We

differentiated between the three ontology layers following the classifications used

in (van Heijst et al. 1995) and (Semy et al. 2004), in order of decreasing generality,

abstraction and stability: The Representational Level provides stable and generic

language elements, upper level ontologies contain domain-independent and widely


agreed-upon concepts, whereas lower-level ontologies contain evolving personal

or group-level views.

13.3.1 Representational Level

The ontology on the representational level defines the concepts and relations

available for expressing the domain models building upon it. For example, RDFS

gives you the ability to express a subclass hierarchy of concepts, while OWL allows

richer class expressions using unions, intersections, or complements of existing

concepts. OWL was deemed too complex for defining the ontologies needed on

the Semantic Desktop, especially as they had to be understandable for software

engineers with little Semantic Web background. However, plain RDF Schema lacks

some basic constructs like cardinality constraints. Another issue is that both RDFS

and OWL are intended for the open-world assumption of the Semantic Web,

whereas on the desktop making a closed-world assumption is more natural. We

therefore created a novel representational language, the NEPOMUK Representational Language (NRL) (Sintek et al. 2007a), as an extension to the Resource Description Framework (RDF) and the associated RDF Schema (RDFS), which imposes no specific semantics on data and supports Named Graphs (Carroll et al. 2005). NRL addresses several limitations of current Semantic Web languages, especially with respect to modularisation and customisation. Aside from fulfilling the basic requirements for a representational language for the SSD, NRL is also of relevance to the general Semantic Web, in particular because of its support for named graphs, which, although a widely popular notion, had not so far been supported by any standard representational language.2

Fig. 13.2 Semantic desktop ontologies stack (the “NEPOMUK Ontologies Pyramid”: a representational layer with RDF, RDFS and NRL; an upper-level layer with NAO, NGM, NIE and PIMO; and a lower-level layer of group- and personal-level ontologies)

Named graphs allow us to handle identifiable, modularised sets of data. Through this intermediate layer, handling and exchanging RDF data, as well as keeping track of provenance information, become much more manageable. All data handling on the semantic desktop, including storage, retrieval and exchange, is carried out through the use of such named graphs. Alongside provenance data, it is also possible to attach other useful information to named graphs. In particular, for better data management we needed named graphs to be distinguishable by their roles, e.g. ontology, instance base, knowledge base, etc.
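One way to picture how named graphs keep data modular is a store of quads (subject, predicate, object, graph) in which metadata such as role and provenance is recorded once per graph rather than per statement. The sketch below is a toy model under that assumption: the `QuadStore` class, the `urn:` names and the plain-string roles are hypothetical stand-ins, not NRL's own graph-role vocabulary.

```python
from dataclasses import dataclass, field


@dataclass
class QuadStore:
    """Toy RDF store in which every statement belongs to a named graph."""

    quads: set = field(default_factory=set)         # {(s, p, o, graph)}
    graph_meta: dict = field(default_factory=dict)  # graph -> metadata dict

    def add_graph(self, graph: str, role: str, source: str) -> None:
        # Role and provenance are attached to the graph as a whole.
        self.graph_meta[graph] = {"role": role, "source": source}

    def add(self, s: str, p: str, o: str, graph: str) -> None:
        if graph not in self.graph_meta:
            raise KeyError(f"unknown graph {graph!r}; call add_graph first")
        self.quads.add((s, p, o, graph))

    def triples_in(self, graph: str) -> set:
        return {(s, p, o) for s, p, o, g in self.quads if g == graph}

    def graphs_with_role(self, role: str) -> list:
        return sorted(g for g, m in self.graph_meta.items() if m["role"] == role)

    def drop_graph(self, graph: str) -> None:
        # A whole module of data (e.g. one import run) is removed in one step.
        self.quads = {q for q in self.quads if q[3] != graph}
        self.graph_meta.pop(graph, None)


store = QuadStore()
store.add_graph("urn:graph:pimo-upper", role="ontology", source="NEPOMUK")
store.add_graph("urn:graph:mail-import", role="instance-base", source="email client")
store.add("urn:ex:Person", "rdfs:subClassOf", "urn:ex:Thing", "urn:graph:pimo-upper")
store.add("urn:ex:klaus", "rdf:type", "urn:ex:Person", "urn:graph:mail-import")
print(store.graphs_with_role("instance-base"))  # ['urn:graph:mail-import']
```

Because provenance lives on the graph, re-importing a mailbox can simply drop and rebuild its graph without touching the ontology graph.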

Although the naming of the NRL formalism might suggest otherwise, it is

a completely domain-independent representational ontology (or language), and

can be applied to other platforms and scenarios. The interested reader might want

to refer to the complete NRL guide and specifications (Sintek et al. 2007b) to learn

more about NRL.
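The difference between the open-world and closed-world assumptions discussed above matters most for constraints such as cardinality. In the sketch below, a meeting is required to have at least one organiser: under an open-world reading a missing organiser is merely unknown, so no violation can be derived, whereas under a closed-world reading, the one argued above to be natural on the desktop, its absence is a violation. The predicate names and the checking function are hypothetical; this is not NRL's validation machinery.

```python
# Triples as (subject, predicate, object); all names are illustrative only.
TRIPLES = {
    ("urn:ex:meeting1", "ex:organiser", "urn:ex:marco"),
    ("urn:ex:meeting2", "ex:title", "Belfast follow-up"),
}


def count_values(triples, subject, predicate):
    return sum(1 for s, p, _ in triples if s == subject and p == predicate)


def violates_min_cardinality(triples, subject, predicate, minimum, closed_world):
    """True if `subject` is known to have fewer than `minimum` values."""
    if not closed_world:
        # Open world: absent statements may simply be unknown, so a
        # missing value is never evidence of a violation.
        return False
    # Closed world: whatever is not stated is taken to be false.
    return count_values(triples, subject, predicate) < minimum


for mode in (False, True):
    flag = violates_min_cardinality(TRIPLES, "urn:ex:meeting2", "ex:organiser", 1, mode)
    print("closed world" if mode else "open world", flag)
# open world False
# closed world True
```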

13.3.2 Upper-Level Ontologies

This layer includes high-level, domain-independent ontologies. They provide

a framework by which disparate systems may utilise a common knowledge base

and from which more domain-specific ontologies may be derived. They are characterised by their representation of common sense concepts, i.e., those that are basic for human understanding of the world. Concepts expressed in upper-level ontologies are intended to be basic and universal to ensure generality and expressivity

for a wide area of domains.

The NEPOMUK ontologies should provide users with the concepts that allow them to capture and represent their mental models, their resources and their activities in a set of well-organised knowledge models, as such formal models are the prerequisite for enabling semantic technologies on the desktop. We set

out to develop a number of upper-level ontologies that fulfil this requirement. The

modularisation of these ontologies in itself was a challenge, keeping the layered

approach described earlier in mind. Even though they are all upper-level ontologies,

2 However, there are non-standard representational languages with support for named graphs,

such as Notation3 or TriG.


dependency relationships exist between them. The design of the upper-level ontologies sought to:

• Represent common desktop entities (objects, people, etc.)

• Represent trivial relationships between these entities, as perceived by the desktop user

• Represent instances of a user’s mental model, consisting of entities and their

relationships as described above, on their desktop

Whereas the representation of high-level concepts like ‘user’, ‘contact’, ‘desktop’, ‘file’ is fairly straightforward, we also need to leverage existing information sources in order to make them accessible to semantic applications on the SSD. This information is contained within various structures maintained by the operating system and a multitude of existing legacy applications. These structures include specific kinds of entities like messages, documents, pictures, calendar entries and contacts in address books. In (Sauermann et al. 2006), van Elst coins the term native structures to describe them, and native resources to describe the pieces of information they contain. One of the core challenges was to integrate existing legacy desktop data into the Social Semantic Desktop, since the latter requires meta-data represented in RDF/NRL in order to operate.

13.3.2.1 NEPOMUK Information Element Ontologies (NIE)

A user’s desktop contains a multitude of applications serving numerous purposes, e.g.

word processing, calendar management, mail user agents, etc. They allow users to

create, store and process information in various ways in order to accomplish a wide

array of tasks. One of the major goals of the semantic desktop is to allow the user to

organise this information (the native resources) and enrich them with annotations,

connect related entities and map them to concepts in the PIMO. In order to implement

this kind of functionality, native resources need to be expressed in a way that allows

for this kind of post-processing, i.e., as RDF graphs. This is the motivation for the

NIE set of ontologies, for which we now provide a brief overview.
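To make the idea of expressing native resources as RDF concrete, the sketch below turns a file's operating-system metadata into N-Triples using NIE and the NEPOMUK File Ontology (NFO, introduced below). The class and property names (nfo:FileDataObject, nfo:fileName, nfo:fileSize, nfo:fileLastModified, nie:mimeType) and the namespace URIs are given as we understand them from the specifications and should be checked against them; using the `file:` URI as the resource identifier is a simplifying assumption.

```python
import mimetypes
from datetime import datetime, timezone
from pathlib import Path
from typing import List, Optional

# Namespace URIs as we understand them from the NEPOMUK specifications;
# verify against the official NIE/NFO documents before relying on them.
NIE = "http://www.semanticdesktop.org/ontologies/2007/01/19/nie#"
NFO = "http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD = "http://www.w3.org/2001/XMLSchema#"


def literal(value: str, datatype: Optional[str] = None) -> str:
    """Serialise a string as an N-Triples literal, optionally typed."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"^^<{datatype}>' if datatype else f'"{escaped}"'


def describe_file(path: Path) -> List[str]:
    """Describe a native file resource as N-Triples lines using NIE/NFO terms."""
    path = path.resolve()
    info = path.stat()
    subject = f"<{path.as_uri()}>"  # assumption: the file URI names the resource
    modified = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc)
    triples = [
        (f"<{RDF_TYPE}>", f"<{NFO}FileDataObject>"),
        (f"<{NFO}fileName>", literal(path.name)),
        (f"<{NFO}fileSize>", literal(str(info.st_size), XSD + "integer")),
        (f"<{NFO}fileLastModified>", literal(modified.isoformat(), XSD + "dateTime")),
    ]
    mime, _ = mimetypes.guess_type(path.name)
    if mime is not None:
        triples.append((f"<{NIE}mimeType>", literal(mime)))
    return [f"{subject} {p} {o} ." for p, o in triples]


if __name__ == "__main__":
    import tempfile
    with tempfile.TemporaryDirectory() as tmp:
        note = Path(tmp) / "belfast-agenda.txt"
        note.write_text("Agenda for the CID meeting")
        print("\n".join(describe_file(note)))
```

Such a crawler only mirrors the native data; linking the file to a project or person would be the job of PIMO, as described below.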

Whereas the goal of PIMO, discussed below, is to be as close to the human

cognitive processes as possible, in order to represent objects as they are seen by the

user, NIE provides vocabulary intended to minimise information loss in the elicitation and representation of data from desktop data sources into RDF graphs.

However, this data lies outside the control of the semantic desktop system. It has

a dynamic nature, whereby new items appear, are modified and disappear in the

course of work. NIE is concerned with the synchronisation between the native data

sources and a data repository that doesn’t contain abstract concepts, and is designed

to describe concepts like “File”, “Email”, or “Contact”. PIMO can then be used to

add value to this data by expressing tags, people, places, projects and anything else

the user might be interested in.

NIE is composed of seven different, but unified, vocabularies. The NIE-core

vocabulary defines the most generic concepts, while six specialised ontologies


extend NIE towards specific domains: NEPOMUK Contact Ontology (NCO) for contact information, NEPOMUK Message Ontology (NMO) for messaging information, NEPOMUK File Ontology (NFO) for basic file meta-data and expressing file system structures in RDF, NEXIF for image meta-data, NID3 for audio meta-data and NCAL for calendaring information. NIE is a unified ontology framework

as there is no overlap between the vocabularies. Classes in all vocabularies are

organised in an explicit inheritance hierarchy when appropriate. The interested

reader is advised to consult the official NIE specifications (Mylka et al. 2007) for

more detailed descriptions about the NIE unified ontology framework.

13.3.2.2 Personal Information Model Ontology (PIMO)

Whereas the previous ontologies focused on specific aspects or domains of the SSD,

the PIMO provides representation for a unified information model which reflects

the users’ Personal Information Model as perceived by them. In other words, it is

a mental conceptualisation of how users organise the domain of their own work: the

projects they work on, the people they know, the products they use, the cities they

travel to, the organisations that employ them, etc. The goal is for the knowledge structure to be independent both of the way the user accesses the data and of the source, format, and author of the data. What this means is that the PIMO is

concerned with real-world things, rather than the technological details of how those

things are implemented in the software.

While each user has their own PIMO and can modify and personalise it according to their own personal needs (i.e., add classes, properties and instances), the

standard PIMO (PIMO-Upper) already contains around 40 classes that represent

what we consider to be the lowest common denominator of typical personal

information models.3 PIMO defines a top-level class pimo:Thing, from which

most other classes in PIMO-Upper (as well as all user-defined classes) are derived,

with examples such as pimo:Person, pimo:City or pimo:PhysicalObject (some

classes in PIMO-Upper are not pimo:Things, such as pimo:Association or pimo:TimeConcept).
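Because most PIMO classes, including user-defined ones, derive from pimo:Thing, an application can test whether a class is a pimo:Thing by following subclass links transitively. The hierarchy below is a simplified sketch: intermediate PIMO-Upper classes are omitted, and `user:ProjectPartner` is a hypothetical user-defined class.

```python
from typing import Dict, Set

# Simplified, partly hypothetical hierarchy (class -> direct superclasses).
# Intermediate PIMO-Upper classes are deliberately omitted.
SUBCLASS_OF: Dict[str, Set[str]] = {
    "pimo:Person": {"pimo:Thing"},
    "pimo:City": {"pimo:Thing"},
    "pimo:PhysicalObject": {"pimo:Thing"},
    "pimo:Association": set(),               # not a pimo:Thing (see text)
    "pimo:TimeConcept": set(),               # not a pimo:Thing (see text)
    "user:ProjectPartner": {"pimo:Person"},  # hypothetical user-defined class
}


def superclasses(cls: str, hierarchy: Dict[str, Set[str]] = SUBCLASS_OF) -> Set[str]:
    """All transitive superclasses of `cls`, via a cycle-safe traversal."""
    seen: Set[str] = set()
    stack = list(hierarchy.get(cls, ()))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(hierarchy.get(parent, ()))
    return seen


def is_pimo_thing(cls: str, hierarchy: Dict[str, Set[str]] = SUBCLASS_OF) -> bool:
    return cls == "pimo:Thing" or "pimo:Thing" in superclasses(cls, hierarchy)


print(is_pimo_thing("user:ProjectPartner"))  # True
print(is_pimo_thing("pimo:Association"))     # False
```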

In order to prevent what is described as the cold start problem in knowledge-

based systems, we also assume that organisations and other employers will ideally

provide their employees with an extension of PIMO-Upper that reflects the work

context of that particular organisation. Within the ontologies stack (Fig. 13.2), such

extensions are called group-level PIMOs.

The reader can refer to the full specification (Sauermann et al. 2009) for further

information about PIMO.

3 It should be noted that PIMO does not try to model every possible world view and achieve

ontological perfection, but instead proposes a simple classification that we consider useful in the

context of knowledge work.


13.3.3 Lower Level Ontologies

This layer consists of group and personal ontologies. Group-level ontologies (e.g. an organisational ontology) are domain-specific and provide more concrete representations of abstract concepts found in the upper ontologies. They serve as a bridge between abstract concepts defined in the upper-level ontologies and concepts specified in personal ontologies at the individual level. Personal ontologies arbitrarily extend (personalise) group-level ontologies to accommodate requirements

specific to an individual or a small group of individuals. Using common group-level

and upper ontologies is intended to ease the process of integrating or mapping

personal ontologies. Given the nature of the Semantic Desktop, a large number of ontologies are either related to, or meant to be used for, personal knowledge

management. A group of individuals or an individual user is free to create new

concepts or modify existing ones for their collective (shared) or individual personal

information models. This personal-user aspect of the ontologies is highlighted

accordingly in the stack, and conceptually it includes all concepts and relationships

that the end desktop user deals with, as opposed to all concepts and relations required

to model every aspect of the semantic desktop.

13.4 Architecture

As noted at the beginning of Sect. 13.2, the Social Semantic Desktop platform

exposes a series of low-level services required for building applications. This

structuring is similar to publishing services on the Web, where each service

exposes an interface used for communication and integration. Hence, our decision

to adopt the same principles and techniques for describing and deploying the

SSD architecture seems natural. The SSD is organised as a Service Oriented

Architecture (SOA), where:

• The SSD service interfaces are defined using the Web Service Description

Language (WSDL),

• XML Schema (XSD) is used for defining primitive types, and

• Simple Object Access Protocol (SOAP) is used for the actual inter-service

communication.

The vision of the NEPOMUK project is to provide a standard architecture

comprising a small set of services (represented by their interfaces), which enable

developers to adopt it and extend it. Ultimately, this will lead to an evolving

ecosystem. Figure 13.3 depicts the structure and the set of services defined by the

NEPOMUK architecture. It is straightforward to observe the two categories of aspects that were targeted (which emerged from the platform description): at the desktop level, the semantic aspects (via Text Analytics or Context Elicitation), and between desktops, the social aspects (enabled by the peer-to-peer (P2P) infrastructure). Both categories are accessible via the Service Registry, which acts as an access point to all the low-level functionalities provided by the SSD platform. The NEPOMUK

architecture is composed of two layers: the Middleware and the Application Layer.

The Middleware groups together the services exposed by the platform and to be

used by the applications present in the Application Layer. The Application Layer

consists of all applications that interact directly with services published by the

Middleware, and creates a bridge between the user and the low-level functionalities

provided by the Social Semantic Desktop. In the following we describe both layers,

starting with the NEPOMUK Middleware.

The Middleware is split into multiple categories of services, managed by the

Service Registry. This represents the access point to the low-level functionalities, both for services using other services and for applications using the Middleware services. Its duties relate strictly to registering and de-registering services, in addition to providing service discovery facilities based on their interfaces.
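The registry's three duties, registering services, de-registering them and discovering them by interface, can be sketched in a few lines. This is a hypothetical in-process stand-in: the actual NEPOMUK middleware describes interfaces in WSDL and communicates over SOAP, neither of which is modelled here, and the interface and endpoint names are invented.

```python
from collections import defaultdict
from typing import Dict, List, Set


class ServiceRegistry:
    """Hypothetical in-process registry: interface name -> service endpoints."""

    def __init__(self) -> None:
        self._endpoints: Dict[str, Set[str]] = defaultdict(set)

    def register(self, interface: str, endpoint: str) -> None:
        self._endpoints[interface].add(endpoint)

    def deregister(self, interface: str, endpoint: str) -> None:
        endpoints = self._endpoints.get(interface)
        if endpoints is not None:
            endpoints.discard(endpoint)
            if not endpoints:  # forget interfaces that have no providers left
                del self._endpoints[interface]

    def discover(self, interface: str) -> List[str]:
        # Lookup only: the registry never invokes the services it lists.
        return sorted(self._endpoints.get(interface, set()))


registry = ServiceRegistry()
registry.register("LocalSearch", "urn:endpoint:local-search")
registry.register("TextAnalytics", "urn:endpoint:text-analytics")
print(registry.discover("LocalSearch"))  # ['urn:endpoint:local-search']
```

Keeping the registry free of business logic matches the text's point that its duties are strictly registration and discovery.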

Fig. 13.3 The architecture of the social semantic desktop (Middleware: the Service Registry; core services comprising the P2P Infrastructure with Event Management, Storage and Search, the Local Data Services with Storage, Search and Mapping, and Helper Services such as Alignment, Publish/Subscribe, Messaging, PIMO Service, Context Elicitation, Text Analytics and Data Wrapper; extension services such as Task Management and Community Management. Application Layer: Task Management, Office Applications, Web Browser, File Browser, Email Client, Wiki, and others)

Two sets of services, i.e., the Local Data Services and the P2P Infrastructure, represent the foundational block of the SSD Middleware. In order to support social aspects, and implicitly communication between several desktops, the SSD Middleware comprises a P2P Infrastructure. The NEPOMUK P2P Infrastructure is based on GridVine (Aberer et al. 2004), which in turn is built on top of P-Grid (Aberer et al. 2003). This manages a distributed RDF store and provides distributed search facilities, hence enabling higher-level functionalities such as meta-data or resource sharing. Additionally, the infrastructure provides an Event Management service, which is responsible for the distribution of events between SSD peers, in addition to supporting the higher-level Publish/Subscribe service. The Event Management service manages subscriptions received from users (via some applications) or from services, in the form of RDF descriptions of the resources of interest, which are stored in the underlying distributed store. Hence, when an event occurs, its RDF payload is matched against all subscriptions and the subscribers are notified.
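The matching step can be illustrated with a simplified, local sketch in which a subscription is a list of triple patterns (with `None` as a wildcard) and an event payload is a set of triples; a subscriber is notified when every one of its patterns matches some payload triple. The pattern format, the `EventManager` class and the vocabulary terms are assumptions; the actual service keeps subscriptions in the distributed store and matches richer RDF descriptions.

```python
from typing import Callable, Iterable, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]  # None = wildcard
Callback = Callable[[Set[Triple]], None]


def pattern_matches(pattern: Pattern, triple: Triple) -> bool:
    return all(p is None or p == t for p, t in zip(pattern, triple))


class EventManager:
    """Hypothetical local stand-in for subscription matching."""

    def __init__(self) -> None:
        self._subscriptions: List[Tuple[str, List[Pattern], Callback]] = []

    def subscribe(self, subscriber: str, patterns: Iterable[Pattern],
                  callback: Callback) -> None:
        self._subscriptions.append((subscriber, list(patterns), callback))

    def publish(self, payload: Set[Triple]) -> List[str]:
        """Notify every subscriber whose patterns all match the payload.

        Simplification: patterns are matched independently, so there are no
        shared variables joining them on the same subject.
        """
        notified = []
        for subscriber, patterns, callback in self._subscriptions:
            if all(any(pattern_matches(p, t) for t in payload) for p in patterns):
                callback(payload)
                notified.append(subscriber)
        return notified


received: List[Set[Triple]] = []
events = EventManager()
events.subscribe("klaus",
                 [(None, "rdf:type", "ncal:Event"), (None, "ex:location", "ex:Belfast")],
                 received.append)
meeting = {("urn:ex:cid-meeting", "rdf:type", "ncal:Event"),
           ("urn:ex:cid-meeting", "ex:location", "ex:Belfast")}
print(events.publish(meeting))  # ['klaus']
```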

The Local Data Services have a similar role to the P2P Infrastructure, but on the

local side of things, i.e., on the desktop. The group consists of three foundational

services: the Local Storage (an RDF Repository), Search and Mapping. The Local

Storage controls the manipulation of the desktop semantic resources, including

their insertion, modification or deletion as RDF graphs. If a resource is shared with

other users in an information space, the meta-data is also uploaded to the distributed

index of the peer-to-peer infrastructure. Querying the Local Storage is done via the Local Search service, which also maintains a context-based search history, as well as user-profiled query templates that can also be shared as resources. Finally, before new meta-data can be added to the repository, one needs to check whether this meta-data describes resources that are already instantiated (i.e., a URI has been assigned) in the RDF repository. If so, instead of duplicating resources, the existing ones should be used by re-using their URIs. The Local Mapping Service handles this responsibility. The Core of the Middleware contains a final category of services that complement the functionalities provided by the foundational ones, called Helper Services. Among these, the Data Wrapper and the Text Analytics services extract information from desktop applications such as email clients or calendars and store it as resources in the Local Storage. Generally, the Data Wrapper handles structured data sources (e.g., email headers, calendar entries, etc.), while the Text Analytics service handles unstructured data sources, like email bodies or generic document contents.
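The Local Mapping check described above, reusing a URI if the resource is already instantiated and minting one only otherwise, can be sketched as a lookup table. How NEPOMUK decides that two descriptions denote the same resource is not specified here, so the identity key (an e-mail address), its normalisation and the `urn:uuid:` scheme are all assumptions.

```python
import uuid
from typing import Dict


class LocalMapper:
    """Reuse an existing URI when new meta-data describes a known resource."""

    def __init__(self) -> None:
        self._uri_by_key: Dict[str, str] = {}

    def uri_for(self, key: str) -> str:
        # Hypothetical identity key, normalised so that trivially different
        # spellings (case, surrounding spaces) map to the same resource.
        # Lower-casing a whole e-mail address is itself a simplification.
        norm = key.strip().lower()
        if norm not in self._uri_by_key:
            self._uri_by_key[norm] = f"urn:uuid:{uuid.uuid4()}"  # mint once
        return self._uri_by_key[norm]


mapper = LocalMapper()
from_mail = mapper.uri_for("Klaus@example.org")       # first sighting: new URI
from_calendar = mapper.uri_for(" klaus@example.org ")  # same person again
print(from_mail == from_calendar)  # True
```

Both the e-mail wrapper and the calendar wrapper thus attach their meta-data to a single resource instead of creating two.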

Other Helper Services include:

• The Alignment service that is used by the other Middleware services or

applications to transform RDF graphs from a source ontology to a target

ontology (a facility required due to the different partly overlapping ontologies

or vocabularies possibly being used on the desktop),

• The Context Elicitation service that acquires the current user context environment

by analysing the logs created by the Middleware and stored in the Local Storage,

• The Publish / Subscribe service that enables subscription-based events (for

services or users) on a local or distributed (via the P2P Event Management) basis,

• The Messaging Service providing synchronous and asynchronous communication mechanisms between SSD users, or

• The PIMO Service that represents an abstraction for an easier manipulation of

RDF graphs.
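At its simplest, the Alignment service's task can be pictured as rewriting a graph term by term through a mapping from the source to the target vocabulary. The sketch assumes a one-to-one term mapping between hypothetical `src:` and `tgt:` vocabularies; real alignments also have to handle structural differences, such as one source property corresponding to a path of target properties, which this does not attempt.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]


def align(triples: Set[Triple], mapping: Dict[str, str]) -> Set[Triple]:
    """Rewrite every term through `mapping`; unmapped terms pass through unchanged.

    Literals are rewritten too if they happen to equal a mapped term; a real
    implementation would distinguish IRIs from literals.
    """
    return {tuple(mapping.get(term, term) for term in triple) for triple in triples}


# Hypothetical source and target vocabularies ("src:" and "tgt:").
SRC_TO_TGT = {"src:Colleague": "tgt:Person", "src:mail": "tgt:email"}

graph = {("urn:ex:klaus", "rdf:type", "src:Colleague"),
         ("urn:ex:klaus", "src:mail", "mailto:klaus@example.org")}
for triple in sorted(align(graph, SRC_TO_TGT)):
    print(triple)
```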

The second group of the Middleware services consists of Extensions. These

are services created by third-party developers that use the Core services to

provide certain functionalities for the Application layer. For example, a Task

Management service could use the Context Elicitation and Messaging services to

notify a group of users about an upcoming project deadline. While the actual

business logic would be encapsulated within this service, the interaction with the


user could be realized either via existing applications such as an Email Client, or via specific applications, like a Task Management application (as seen in the Application layer and discussed below).

The top layer of the architecture is the Application layer. This layer includes the

programs and tools employed by the user. This ranges from typical knowledge

workbench applications of rather generic nature (e.g., File Browser, Web Browser,

and Office Applications) to specialized and domain-specific programs. Both legacy

tools and new developments need to be integrated with the Social Semantic

Desktop Middleware in order to profit from its functionalities. For available legacy

applications and third-party programs, this interfacing (or integration) is usually done by developing plug-ins complying with the data structures and user interfaces used by the respective applications. As an example, in NEPOMUK, we developed plug-ins for the Email Clients Thunderbird and Microsoft Outlook. These extensions allow, e.g., e-mails to be automatically enriched (“Semantic Email”) with formal descriptions of the concepts concerned (thus offering the sender and recipient of such an e-mail an unambiguous reference to shared entities). Furthermore, the enhanced clients provide simple workflow functionalities, like delegation and tracking of tasks (Task Management) or automated calendar functionalities. Finally, Semantic Search capabilities are a promising feature, as targeted, e.g., in the KDE Dolphin file browser.

13.5 Implementation

In NEPOMUK the above architecture was implemented in the Personal Semantic Workbench (PSEW). PSEW is the central user interface to the NEPOMUK Semantic Desktop; it allows configuring data sources and connections to other programs, as well as editing and browsing the personal ontology. PSEW was implemented in Java as an Eclipse-RCP application, an application framework built on top of the Eclipse code-base, providing user-interface controls as well as the high-level user-interface organisational concepts of perspectives, views and multi-document editors. A screenshot showcasing some of the main views of PSEW is shown in Fig. 13.4. Various versions of PSEW can be downloaded from http://dev.nepomuk.semanticdesktop.org/download/.

13.5.1 KDE

In addition to the Java-based research prototype PSEW, the NEPOMUK technologies have also been embedded in the KDE Desktop Environment (http://kde.org/), which provides fundamental desktop functionalities like task and file management, and common core applications like web browsers, internet chat, basic document editing, etc. Traditionally, KDE has been used with the Linux operating system, but the newest version 4 is also available for Mac or Windows. In version 4 of the KDE system, NEPOMUK also plays a central role (see http://nepomuk.kde.org/): it is now available to millions of KDE users world-wide and has thus become the most enduring concrete outcome of the NEPOMUK project.

In KDE4 NEPOMUK, an RDF store (based on Virtuoso, see http://virtuoso.openlinksw.com/) is available to all applications, and tagging, rating and commenting on files are available throughout the operating system. For example, one can replace the normal save/load file dialog with a Semantic view as shown in Fig. 13.5, where rather than saving a file into a particular folder, a file can be saved with particular tags, removing from the user the burden of maintaining a strict hierarchical folder structure.
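The semantic save dialog replaces a single folder location with a set of tags, so that a document can later be found through any combination of them. The sketch below illustrates that idea with an in-memory index; it is not KDE's implementation, which keeps such annotations in the RDF store described above, and the class, method names and file paths are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


class TagIndex:
    """Hypothetical tag index: files are filed under tags instead of one folder."""

    def __init__(self) -> None:
        self._files_by_tag: Dict[str, Set[str]] = defaultdict(set)

    def save(self, file_uri: str, tags: Iterable[str]) -> None:
        for tag in tags:
            self._files_by_tag[tag.strip().lower()].add(file_uri)

    def find(self, *tags: str) -> Set[str]:
        """Files carrying all of the given tags, in any order."""
        if not tags:
            return set()
        matches = [self._files_by_tag.get(t.strip().lower(), set()) for t in tags]
        return set.intersection(*matches)  # returns a new set


index = TagIndex()
index.save("file:///home/claudia/report.odt", ["CID", "Belfast", "travel"])
index.save("file:///home/claudia/agenda.odt", ["CID", "Belfast"])
print(sorted(index.find("belfast", "travel")))  # ['file:///home/claudia/report.odt']
```

Unlike a folder path, the tag combination has no fixed order, so "Belfast, CID" and "CID, Belfast" retrieve the same files.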

KDE also offers faceted browsing for desktop data, file indexing, full-text and structured desktop search, and many other features.

13.6 Experiences & Applications

The ultimate goal of a Semantic Desktop is to increase the efficiency of information

handling for its users, based on additional modelling efforts. Careful evaluation

must therefore demonstrate whether the perceived benefits ultimately outweigh the

perceived additional costs.

Fig. 13.4 The personal semantic workbench – showing the PIMO class hierarchy, the PIMO

editor editing the Claudia Stern concept, a timeline and a map-view


From the scientific perspective, evaluations observe the user when they store,

retrieve, and process information. To gain relevant insight into real-world scenarios, it is beneficial to conduct evaluations with real users working with their real data. Measurement in such experiments is done mostly by observation or interviewing users in long-term case studies. Short-term laboratory experiments can also give insights into the immediate benefit of a certain feature, or be used to measure usability indicators.

13.6.1 Evaluating the Core NEPOMUK Application

The core NEPOMUK Application, the Personal SEmantic Workbench (PSEW),

was also evaluated independently of any of the case-studies. Several long-term

studies with up to eight participants were carried out. Additionally, qualitative

interviews with 22 members about possible use cases of the system have been

carried out (Sauermann 2009). These studies conclude that relations are the key feature for retrieving information. Long-term users evolve a combined pattern of using text search, relations, and semantic wiki features in order to find information. The system supports an intuitive behaviour of navigation by taking small steps, a phenomenon previously observed by other researchers and described as “orienteering” (Teevan et al. 2004).

Fig. 13.5 Semantic Save/load dialog in NEPOMUK-KDE: instead of specifying a folder for saving, the user only associates a number of tags with the document

In an earlier evaluation of NEPOMUK (Papailiou et al. 2008), problems were found when the terms in the user interface were too technical, as users prefer terms from their daily work processes. A unification of various interfaces was also desired, which in turn overlaps with Quan’s findings in his thesis (Quan 2003).

Unfortunately, the NEPOMUK-KDE Semantic Desktop implementation, which has the largest user group, has seen the least formal evaluation so far. Informal reactions collected from individual users range from praising the unified tagging and the powerful search now available to criticising performance problems, which have since become one focus of further community development.

13.6.2 NEPOMUK Case-Studies

In addition to the formal evaluation, NEPOMUK has proved its worth in practical

applications. The prototypes developed in the NEPOMUK project were customised

and deployed with four case-study partners:

• SAP Research, Germany, developed the Kasimir task management prototype

and additional productivity tools for Organisational Knowledge Management.

The Kasimir prototype was evaluated at the SAP Research lab in Karlsruhe during a 4-week test phase. The experience of the test users was compiled in a survey and analysed using an activity-theoretic evaluation approach. The results show that, although the prototype did not yet provide full functionality, it was perceived as a useful and promising work tool.

• Cognium Systems, France, developed a prototype called BioNote, a software

system whose goal is to help biomedical researchers at the Institut Pasteur to manage

the information they collect or use during their work. The implemented prototype

was assessed through iterative expert and user evaluations. The test user comments

were generally encouraging: people understood and approved the general concept

and added value of the prototype and they were able to easily carry out a simple

scenario using the available user interfaces. Commercial products resulting from

this work can be seen at: http://www.cogniumsystems.com/.

• EDGE-IT, France, developed a prototype help-desk application for the members

of the Mandriva Linux community. It features collaborative semantic annotation

at the desktop or at the Web levels using a dedicated vocabulary, and semantic

search across a set of peers. This allows interlinking a personal Semantic Desktop

with a public Semantic Web maintained by a large community, improving the

learning and problem-solving processes. The help-desk application was evaluated by Mandriva Linux users through an online questionnaire after they had experimented with the system using evaluation scenarios. The users’ answers show a real interest in the approach: users consider that the help desk makes the community more efficient at collectively answering technical questions.

• TMI, a Greek consulting company, developed a collection of light-weight add-ons

to the basic PSEW application, especially for providing social and semantic

functionalities customised for knowledge workers in the sector of Professional

Business Services. The tools are collectively known as Sponge and take the form of widgets that can be placed on the desktop, providing quick and easy access to common tasks such as search and annotation. The prototypes were evaluated both by unobtrusive observation using the “think aloud” protocol and by a more traditional questionnaire. Eleven employees at different offices took part in the evaluation under realistic work conditions, and the results confirm that the prototype was simple, intuitive and easy to use, and delivered most of the benefits users expected.

13.7 Outlook

The NEPOMUK project ended in 2008, but the activities around the Semantic

Desktop continue. The KDE implementation is still being actively maintained and

enhanced, and the Open Semantic Collaboration Architecture Foundation (OSCAF)

(http://www.oscaf.org/) was created to maintain the ontologies and other standards.

Several of the components of the NEPOMUK implementation, such as the metadata extraction framework Aperture (http://aperture.sf.net), are being used by many people, and development continues. The research directions started in NEPOMUK are being pursued in other research projects, such as Perspecting (http://www.dfki.uni-kl.de/perspecting/).

In addition to the open-source and standardisation work, several spin-off companies were created to commercialise semantic desktop technologies. For instance, Gnowsis (http://gnowsis.com) recently launched their product Refinder (http://www.getrefinder.com/), a web-based productivity tool centred around the PIMO ideas.

Acknowledgements This work was supported by the European Union IST fund (Grant FP6-027705, Project NEPOMUK, http://nepomuk.semanticdesktop.org/) and the German BMBF in Project Perspecting (Grant 01IW08002).

References

Aberer K, Cudre-Mauroux P, Datta A, Despotovic Z, Hauswirth M, Punceva M, Schmidt R (2003)

P-Grid: a self-organizing structured P2P system. SIGMOD Record 32(3):29–33

Aberer K, Cudre-Mauroux P, Hauswirth M, Pelt TV (2004) GridVine: building internet-scale semantic overlay networks. 3rd International Semantic Web Conference ISWC 2004, Springer-Verlag, pp 107–121. http://lsirpeople.epfl.ch/aberer/PAPERS/ISWC2004.pdf, Accessed on 9 Aug 2011


Carroll JJ, Bizer C, Hayes P, Stickler P (2005) Named graphs, provenance and trust. WWW ’05: Proceedings of the 14th international conference on World Wide Web, ACM Press, New York, NY, USA, pp 613–622

Gudjonsdottir R (2010) Personas and Scenarios in Use. Doctoral Dissertation, Department of

Human–Computer Interaction, Royal Institute of Technology, KTH, Stockholm, Sweden.

Mylka A, Sauermann L, Sintek M, van Elst L (2007) Nepomuk information element ontology,

http://www.semanticdesktop.org/ontologies/2007/01/19/nie/, Accessed on 9 Aug 2011

Papailiou N, Christidis C, Apostolou D, Mentzas G, Gudjonsdottir R (2008) Personal and group knowledge management with the social semantic desktop. In Cunningham P, Cunningham M (eds) Collaboration and the knowledge economy: issues, applications and case studies, eChallenges e-2008 conference, 22–24 October 2008. Stockholm, Sweden, pp 1475–1482

Quan D (2003) Designing end user information environments built on semistructured data models, PhD thesis, Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science

Sauermann L (2009) The gnowsis semantic desktop approach to personal information management, PhD thesis, University of Kaiserslautern. http://www.dfki.uni-kl.de/~sauermann/papers/Sauermann2009phd.pdf, Accessed on 9 Aug 2011

Sauermann L, Dengel A, van Elst L, Lauer A, Maus H, Schwarz S (2006) Personalization in the EPOS project. Proceedings of the semantic web personalization workshop at the ESWC 2006 conference, pp 42–52. http://www.dfki.uni-kl.de/~sauermann/papers/Sauermann+2006a.pdf, Accessed on 9 Aug 2011

Sauermann L, van Elst L, Möller K (2009) Personal information model (PIMO), v1.1, Recommendation, NEPOMUK. http://www.semanticdesktop.org/ontologies/2007/11/01/pimo/v1.1/pimo_v1.1.pdf, Accessed on 9 Aug 2011

Schwarz S (2010) Context-awareness and context-sensitive interfaces for knowledge work, Dissertation, Technische Universität Kaiserslautern, Fachbereich Informatik. http://www.dr.hut-verlag.de/978-3-86853-388-0.html, Accessed on 9 Aug 2011

Semy SK, Pulvermacher MK, Obrst LJ (2004) Towards the use of an upper ontology for U.S. government and military domains: An evaluation, Technical report, MITRE. http://www.mitre.org/work/tech_papers/tech_papers_04/04_0603/index.html, Accessed on 9 Aug 2011

Sintek M, van Elst L, Scerri S, Handschuh S (2007a) Distributed knowledge representation on the social semantic desktop: Named graphs, views and roles in NRL. ESWC’07: Proceedings of the 4th European conference on The Semantic Web, Springer-Verlag, Berlin, Heidelberg, pp 594–608

Sintek M, van Elst L, Scerri S, Handschuh S (2007b) NEPOMUK representational language specification. NEPOMUK specification. http://www.semanticdesktop.org/ontologies/nrl/, Accessed on 9 Aug 2011

Teevan J, Alvarado C, Ackerman MS, Karger DR (2004) The perfect search engine is not enough: a study of orienteering behavior in directed search. CHI ’04: Proceedings of the SIGCHI conference on Human factors in computing systems. ACM, New York, NY, USA, pp 415–422. http://portal.acm.org/citation.cfm?id=985745, Accessed on 9 Aug 2011

van Heijst G, Falasconi S, Abu-Hanna A, Schreiber G, Stefanelli M (1995) A case study in

ontology library construction. Artificial Intelligence in Medicine 7(3):227–255


14

Context-Aware Recommendation for Work-Integrated Learning

Stefanie N. Lindstaedt, Barbara Kump, and Andreas Rath

14.1 Introduction

In order to improve knowledge work productivity we need to understand the factors

influencing the use of knowledge within organizations. A host of management

science research highlights the importance of ‘soft factors’ which influence the

individual knowledge worker to apply her abilities, invest effort, face frustrating experiences and perform for the sake of the organization. Wall et al. (1992) extend the fundamental equation of organizational psychology to include the factor opportunity:

$$\text{Performance} = \text{Ability} \times \text{Motivation} \times \text{Opportunity}$$

That is, the performance of a knowledge worker is enhanced by organizational practices that increase the individual’s knowledge (ability), the individual’s motivation to use this knowledge, or the individual’s opportunity to do so in the workplace. This equation highlights the assumed non-compensatory relationship between the factors: if an individual lacks motivation, for example (motivation = 0), performance is zero no matter how high the other factors are.
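To make the non-compensatory property concrete, the following Python sketch contrasts the multiplicative model of Wall et al. (1992) with a hypothetical additive alternative. Scoring each factor on a 0 to 1 scale is our assumption for illustration; the equation itself does not prescribe units.

```python
def performance(ability, motivation, opportunity):
    """Multiplicative model: a zero in any factor yields zero performance."""
    return ability * motivation * opportunity


def additive_performance(ability, motivation, opportunity):
    """Hypothetical compensatory alternative, shown only for contrast."""
    return (ability + motivation + opportunity) / 3


if __name__ == "__main__":
    # High ability and opportunity cannot compensate for missing motivation.
    print(performance(0.9, 0.0, 0.9))           # 0.0
    # The additive model stays clearly positive for the same inputs.
    print(additive_performance(0.9, 0.0, 0.9))
```

In the multiplicative form, raising ability only pays off when motivation and opportunity are non-zero, which is why the chapter considers all three factors even while targeting ability.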

Our work focuses specifically on increasing the factor ability (while also taking the other two factors into account). That is, we strive to build computational support (so-called knowledge services) which enables users to continuously improve their competencies during work. Our approach builds upon the notion of Work-Integrated Learning (WIL, see Sect. 14.2), which has the potential to improve knowledge work productivity by building awareness of relevant knowledge resources, pointing out learning opportunities, and improving task completion ability. We engaged in a 4-year quest to build a computational environment which supports WIL. This quest was undertaken within the integrated EU-funded project APOSDLE¹. Applying a multitude of participatory design methods (briefly sketched in Sect. 14.3), we identified three key challenges: learning within real work situations, utilization of real work resources for learning, and learning support within the user’s everyday computational work environment.

S.N. Lindstaedt (*)
Know-Center, Graz University of Technology, Inffeldgasse 21a, Graz A-8010, Austria
Knowledge Management Institute, Graz University of Technology, Inffeldgasse 21a, Graz, A-8010, Austria

B. Kump
Knowledge Management Institute, Graz University of Technology, Inffeldgasse 21a, Graz, A-8010, Austria

A. Rath
Know-Center, Graz University of Technology, Inffeldgasse 21a, Graz A-8010, Austria

This chapter mainly addresses the first challenge (real-time learning). Our participatory design studies indicate that learners require varying degrees of learning support in different ‘contexts’. Specifically, it is important to know and understand what the user is working on or is trying to achieve. In our work we distinguish between two types of context. On the one hand, we take the work situation into account by identifying the concrete work task a user is currently executing (e.g. preparing a project kick-off event), i.e. the work task specifies the short-term context of the user. On the other hand, we strive to infer the competencies a user possesses (e.g. experienced project manager), i.e. the accumulated experiences from work task executions specify the long-term user context. Our approach to supporting WIL in real time is to provide a variety of recommendation services which are adapted to the short-term as well as the long-term user context. Specifically, we recommend not only documents but also fine-grained parts of documents (so-called snippets), people, possible learning goals, application opportunities, relationships between tasks, etc. In doing so, these recommendation services positively influence productivity.

Section 14.4 introduces three different learning situations in the form of three WIL scenarios. These scenarios mainly differ in the time available to the user for engaging in a learning situation. Different types of recommendations (e.g. content versus learning opportunities) are helpful in these different situations, but all of them need to be adapted to the user’s current work task and her competencies in order to improve productivity effectively. Our goal is to design an environment which can provide learning support with varying degrees of learning guidance (see Sect. 14.3.3) independently of the application domain, and which utilizes knowledge resources from within an organization for learning, thus keeping the costs and effort of generating learning material low.

Section 14.5 offers a conceptual view of the APOSDLE environment, which represents our ontology-based approach to designing domain-independent learning support. Within this chapter we focus specifically on three knowledge services which implement our approach to context-aware recommendation for improving knowledge work productivity, namely the task detection service (Sect. 14.5.2), the user competence inference service (Sect. 14.5.3), and the content recommendation service (Sect. 14.5.4).

¹Advanced Process-Oriented Self-Directed Learning, www.aposdle.org.

In other words, we present a design environment (Eisenberg and Fischer 1994) which enables the creation of environments for WIL support specifically tailored to the unique needs of a company and a concrete learning domain. We see the “APOSDLE solution” as consisting of (a) modelling the learning domain and the work processes, (b) annotating documents and other sources of information available in the company repository, (c) training the prospective users of the APOSDLE environment, and (d) using the APOSDLE environment with its varying learning guidance functionalities at the workplace.

We have evaluated our approach to WIL by embedding the APOSDLE environ-

ment for 3 months into three application organizations – not only technically but

also socially by building the relevant processes around it. Section 14.6 briefly presents the results of our summative workplace evaluation and discusses the modeling efforts needed during APOSDLE instantiation in a specific application domain.

14.2 Work-Integrated Learning

Building on theories of workplace learning (such as Lave and Wenger 1991; Eraut and Hirsh 2007), we conceptualize learning as a dimension of knowledge work which varies in focus (from focus on work performance to focus on learning performance), time available for learning, and the extent of learning guidance required. This learning dimension of knowledge work describes a continuum of learning practices which starts on one side with brief questions and task-related informal learning (work processes with learning as a by-product), and extends on the other side to more formal learning processes (learning processes at or near the workplace). This continuum emphasizes that support for learning must enable a knowledge worker to seamlessly switch from one learning practice to another as time and other context factors permit or demand.

Research on supporting workplace learning and lifelong learning so far has

focused predominantly on the formal side of this spectrum, specifically on course

design applicable for the workplace and combinations of face-to-face and online

learning elements (blended-learning). In contrast, the focus of our work is on the

informal side of the spectrum, specifically covering work processes with learning as

a by-product and learning activities located within work processes. We have coined

the term work-integrated learning (WIL) to refer to this type of informal learning practice at the workplace that is truly integrated in current work processes and activities. WIL is relatively brief and unstructured (in terms of learning

objectives, learning time, or learning support). The main aim of WIL activities is to

enhance task performance. From the learner’s perspective WIL can be intentional

or unintentional, and the learner can be aware of the learning experience or not

(Schugurensky 2010).


WIL makes use of existing resources – knowledge artifacts (e.g., reports, project

results) as well as humans (e.g., peers, communities). Learning in this case is a by-

product of the time spent at the workplace. This conceptualization enables a shift

from the training perspective of the organization to the learning perspective of the

individual.

14.3 Distributed Participatory Design

This section gives an overview of the activities and their settings leading to the

development of the APOSDLE environment. The development process involved

three prototyping iterations lasting 1 year each and involving in-depth requirements

elicitation, conceptual design, implementation, and evaluation phases.

14.3.1 Application Domains and Settings

The APOSDLE environment was designed with the close involvement of five different knowledge-intensive application domains. Three application domains

were chosen in collaboration with different enterprises participating in the

project: simulation of effects of electromagnetism on aircraft (EADS – European

Aeronautic Defense and Space Company, Paris, France), innovation management

(ISN – Innovation Service Network, Graz, Austria), and intellectual property rights

consulting (CCI Darmstadt – Chamber of Commerce and Industry, Darmstadt,

Germany). Two additional domains refer to general methodologies that can be used

in different settings and disciplines: the RESCUE process for the thorough elicitation

and specification of consistent requirements for socio-technical systems, and the

domain of statistical data analysis (SDA).

14.3.2 Design and Development Process

The design process employed is an example of distributed participatory design,

which has been carried out in the context of real-world project constraints on time

and cost. The process consisted of instances of synchronous and asynchronous,

distributed and non-distributed design activities, and also integrated activities

designed to stimulate creative inputs to requirements. For example, the activities

included workplace learning studies (Kooken et al. 2007) and surveys (Kooken

et al. 2008), iterative use case writing and creativity workshops (Jones and

Lindstaedt 2008) resulting in 22 use cases and more than 1,000 requirements. In

order to improve the system throughout the development process, formative

evaluations of the second prototype (Lichtner et al. 2009) were carried out that


triggered a re-design of APOSDLE using the personas approach (Dotan et al. 2009).

The third prototype was then exposed to extensive usability studies with students,

application partners in real world settings and usability labs, and a variety of

evaluations of individual components. A final summative evaluation spanning

3 months of real-world application concluded the process. The main findings of

this summative evaluation are reported in Sect. 14.6.

14.3.3 Challenges for Supporting WIL

Through the participatory design and development activities described above, three major

challenges for WIL support were identified (see also Sect. 14.1): real time learning,

i.e. learning within real work situations; utilization of real work resources for

learning; and learning within a user’s everyday (real) computational environment.

Real-time learning: WIL support should make knowledge workers aware of learning opportunities relevant to their current work task and support them in pursuing these opportunities. WIL support needs to be adapted to a user’s work context and her experiences, and should be short and easy to apply.

Real knowledge resources: WIL support should dynamically provide and make

users aware of available knowledge resources (both human and material) within the organization. By providing ‘real’ resources, the effort for learning transfer is reduced and the likelihood of offering opportunities to learn on different trajectories is increased.

Real computational environment: WIL support should be provided through a

variety of tools and services which are integrated seamlessly within the user’s

desktop and allow one-point access to relevant back-end organizational systems.

These tools and services need to be inconspicuous, tightly integrated, and easy to

use. They must support the knowledge worker in effortlessly switching between a variety of learning practices.

Concerning real time learning, our participatory design activities and workplace

learning studies suggest that learners need different support and guidance within

different work situations. Specifically, it became apparent that learning guidance is

needed on varying levels ranging from descriptive to prescriptive support. While

prescriptive learning guidance has the objective of providing clear directions, or

rules of usage to be followed, descriptive learning guidance offers a map of the

learning topic and its neighboring topics, their relationships, and possible

interactions to be explored. In other words, prescriptive learning support is based on a clearly structured learning process which imposes an order on learning activities, while descriptive support does not.
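The difference between the two kinds of guidance can be sketched as two views over the same topic structure. In the hypothetical Python example below, descriptive guidance returns the unordered neighbourhood of a topic, whereas prescriptive guidance returns a fixed sequence in which every topic follows its prerequisites. The topic names and the `PREREQUISITES` relation are invented for illustration, and ordering by prerequisites is only one possible way of imposing an order on learning activities. The sketch uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9+).

```python
from graphlib import TopologicalSorter

# Hypothetical domain structure: topic -> topics that should be learned first.
PREREQUISITES = {
    "applying creativity techniques": {"brainstorming", "mind mapping"},
    "brainstorming": {"moderation"},
    "mind mapping": set(),
    "moderation": set(),
}


def descriptive_guidance(topic):
    """A map of neighbouring topics; no order is imposed on the learner."""
    neighbours = set(PREREQUISITES.get(topic, set()))
    neighbours |= {t for t, pre in PREREQUISITES.items() if topic in pre}
    return neighbours


def prescriptive_guidance(goal):
    """A fixed learning sequence ending in `goal`, prerequisites first."""
    needed, stack = set(), [goal]
    while stack:  # collect the goal and all of its transitive prerequisites
        topic = stack.pop()
        if topic not in needed:
            needed.add(topic)
            stack.extend(PREREQUISITES.get(topic, set()))
    # Restrict the graph to the needed topics and order it topologically.
    graph = {t: PREREQUISITES.get(t, set()) & needed for t in needed}
    return list(TopologicalSorter(graph).static_order())
```

Returning a `set` in the descriptive case and a `list` in the prescriptive case makes the distinction visible in the types: only the latter commits the learner to a sequence.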

The learning guidance discussed here is applicable to WIL situations and covers

the more informal side of the learning dimension of knowledge work. Clearly the

spectrum of learning guidance could be extended to much more formal learning

guidance approaches such as predefined learning modules and enforced learning

processes. However, this is beyond the scope of our work. The learning guidance


we consider and explore ranges from building awareness, through descriptive learning support (exposing knowledge structures and contextualizing cooperation), to partially prescriptive learning support (triggering reflection and systematically developing competencies at work). The following section illustrates how these different degrees of learning guidance are realized within the APOSDLE environment.

14.4 Providing Recommendations with Varying Degrees of Learning Guidance

Within this section we present varying types of context-aware recommendation

functionalities for WIL in the form of three scenarios based on the ISN application

case. These scenarios also provide an overview of the overall APOSDLE environ-

ment from the user’s point of view. Section 14.5 then examines the conceptual

architecture together with the knowledge services which implement these

recommendations.

The following scenario illustrates varying types of learning guidance that we

have developed. Our scenario is located in a company called ‘Innovation Service

Network’ (ISN). ISN is a network of small consultancy firms in the area of

innovation management. Consultants at ISN support customer companies in the

introduction of innovation management processes and the application of creativity

techniques. One of these consultants for innovation management is Eva. Eva has

been assigned to lead a new project with a client from the automotive industry. The

objective is to come up with creative solutions for fastening rear-view mirrors to the

windshield.

14.4.1 Building Awareness: Recommendation of Snippets and People

Let us assume that our innovation management consultant from the scenario, Eva, is

in a hurry. She needs to plan the kick-off meeting for her new innovation project

within the next hour. As Eva begins to create the agenda using her favorite word

processor, APOSDLE automatically recognizes the topic of Eva’s work activity,

namely “moderation”. A small pop-up notification (see Fig. 14.1) unobtrusively

informs Eva that her work topic has been detected and that relevant information is

available.

Over the years the APOSDLE knowledge base has collected a large variety of

resources (e.g. project documents, checklists, videos, pictures) about innovation

management which Eva and her colleagues have produced or used. In the back-

ground, APOSDLE proactively searches the knowledge base utilizing the detected

work topic together with information from Eva’s User Profile (see User Profile and


Experiences in Sect. 14.5.3) to form a personalized query. Eva is interested in

getting some help to speed up her work. Therefore, she accesses APOSDLE

Suggests by clicking on the notification or the APOSDLE tray icon. APOSDLE

Suggests (Fig. 14.2) displays a list of resources related to the topic “moderation”

(see Sect. 14.5.2). This list of resources is ranked based on her expertise in

moderation techniques.
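How the detected topic and the User Profile might be combined into a ranked list can be sketched as follows. This Python example is a hypothetical simplification, not the APOSDLE algorithm (the actual services are described in Sect. 14.5): we assume each resource is annotated with topics and an invented difficulty `level`, and we reuse the three experience levels of the Experiences Tab (Learner, Worker, Supporter) as the user's level per topic. Resources whose difficulty is closest to the user's level are ranked first.

```python
from dataclasses import dataclass

# The three experience levels shown in APOSDLE's Experiences Tab.
LEVELS = {"Learner": 0, "Worker": 1, "Supporter": 2}


@dataclass(frozen=True)
class Resource:
    title: str
    topics: frozenset
    level: int  # hypothetical difficulty: 0 introductory ... 2 advanced


def suggest(resources, detected_topic, profile):
    """Rank resources on the detected topic by fit to the user's level.

    `profile` maps topics to experience levels; unknown topics are
    assumed to be at Learner level.
    """
    user_level = LEVELS[profile.get(detected_topic, "Learner")]
    hits = [r for r in resources if detected_topic in r.topics]
    # Smallest level distance first; ties broken alphabetically by title.
    return sorted(hits, key=lambda r: (abs(r.level - user_level), r.title))


if __name__ == "__main__":
    kb = [
        Resource("Moderation checklist", frozenset({"moderation"}), 1),
        Resource("Intro to moderation (video)", frozenset({"moderation"}), 0),
        Resource("Advanced facilitation", frozenset({"moderation"}), 2),
        Resource("Budget template", frozenset({"budgeting"}), 1),
    ]
    for r in suggest(kb, "moderation", {"moderation": "Worker"}):
        print(r.title)
```

Because no query is typed, the detected topic plays the role of the search term, and the profile only changes the order, not the set, of results.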

Eva finds a checklist for moderating a meeting or a workshop which was put together by a colleague in the past. She opens the checklist in the APOSDLE Reader (Fig. 14.3), which jumps to and highlights the most relevant part (called ‘Snippet’) for her. In addition, the APOSDLE Reader indicates other parts of the checklist that are also relevant to Eva’s work task via a theme river (see the yellow blocks in the center part of Fig. 14.3). Eva finds some ideas suitable for her current project and integrates them into her own workshop preparation.

Fig. 14.1 APOSDLE automatically detects a user’s context and displays notifications. In this case APOSDLE detected the Topic “Moderation” Eva is currently working on

Fig. 14.2 “APOSDLE Suggests” recommends Knowledge Resources for Eva’s current context (Topic “Moderation”). The tree view of Topics in the Browse Tab is shown on the left
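The Reader's behaviour, jumping to the most relevant Snippet and marking other relevant parts, can be approximated by scoring each part of a document against the current topic. The Python sketch below is hypothetical: it assumes a document is already split into segments, each carrying a list of topic annotations, uses the share of matching annotations as a relevance score, and renders a crude text "theme river" with an invented threshold.

```python
def segment_scores(segments, topic):
    """Share of each segment's topic annotations that match `topic`."""
    return [
        annotations.count(topic) / len(annotations) if annotations else 0.0
        for annotations in segments
    ]


def most_relevant(segments, topic):
    """Index of the best-matching segment (the Snippet to jump to)."""
    scores = segment_scores(segments, topic)
    return max(range(len(scores)), key=scores.__getitem__)


def theme_river(segments, topic, strong=0.5):
    """One character per segment: '#' strong, '+' weak, '.' no match."""
    marks = []
    for score in segment_scores(segments, topic):
        marks.append("#" if score >= strong else "+" if score > 0 else ".")
    return "".join(marks)


if __name__ == "__main__":
    doc = [
        ["agenda"],
        ["moderation", "moderation", "timing"],
        ["moderation", "budget", "timing"],
        [],
    ]
    print(most_relevant(doc, "moderation"))  # 1
    print(theme_river(doc, "moderation"))    # .#+.
```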

APOSDLE supports Eva in performing her work task without her even having to

type a query. Moreover, her own expertise level is taken into account when making

recommendations. This unobtrusive, proactive information delivery raises awareness of knowledge resources which Eva would not have searched for otherwise. It

provides learning guidance in that it highlights possible learning opportunities

within the current work task without imposing constraints on the learner.

14.4.2 Descriptive Learning Guidance: Recommendation of Learning Goals and Communication Channels

After the kick-off meeting Eva prepares for the next step in the innovation project,

namely a creativity workshop. Since she has never moderated such a workshop

before, she takes some more time to explore different possibilities and their

implications. Eva opens APOSDLE Suggests and searches for the keywords “creativity techniques” and “creativity workshop”. Eva selects the task “applying creativity techniques in a workshop”. APOSDLE Suggests analyzes her User Profile and recommends a number of possible Learning Goals she could pursue in the context of the chosen work task. These Learning Goals are related to competencies which she would need to execute the task properly and for which Eva has not exhibited a sufficient level of expertise, i.e. for which she has a competency gap. Eva selects one of the Learning Goals (e.g. basic knowledge about creativity techniques in Fig. 14.4) and thus refines the snippet and people recommendations.

Fig. 14.3 APOSDLE Reader showing relevant information for Eva’s context. It highlights the relevant part in a document (left), suggests other relevant parts (Snippets) throughout the document as a ThemeRiver (center), and offers a more detailed list view of Snippets (right) which can be sorted according to different criteria
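The notion of a competency gap lends itself to a short sketch. In the hypothetical Python example below, a task is mapped to the competencies it requires and a minimum experience level for each; the Learning Goals offered for that task are the competencies where the user's profile falls short. The task-to-competency mapping, the required levels, and the rule that competencies without recorded evidence count as Learner level are all assumptions for illustration.

```python
# Experience levels as used in APOSDLE's Experiences Tab.
LEVELS = {"Learner": 0, "Worker": 1, "Supporter": 2}

# Hypothetical task model: task -> {competency: minimum required level}.
TASK_REQUIREMENTS = {
    "applying creativity techniques in a workshop": {
        "basic knowledge about creativity techniques": "Worker",
        "brainstorming": "Worker",
        "moderation": "Worker",
    },
}


def learning_goals(task, profile):
    """Competencies required for `task` in which the user has a gap.

    `profile` maps competencies to levels; competencies with no recorded
    evidence are assumed to be at Learner level.
    """
    required = TASK_REQUIREMENTS.get(task, {})
    return sorted(
        competency
        for competency, needed in required.items()
        if LEVELS[profile.get(competency, "Learner")] < LEVELS[needed]
    )


if __name__ == "__main__":
    eva = {"moderation": "Supporter", "brainstorming": "Learner"}
    print(learning_goals("applying creativity techniques in a workshop", eva))
```

Selecting one of the returned goals would then narrow the recommended resources to that competency, as in Fig. 14.4.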

Eva opens a video which contains an introduction to creativity and creativity techniques. The APOSDLE Reader again highlights relevant parts of the video and provides an overview of the video by displaying a theme river similar to the one shown in Fig. 14.3. The video helps Eva to get a better understanding of basic creativity theories and methods. However, she still has some more concrete questions, in particular about the snippet she has found.

By simply clicking on ‘contact snippet author’, Eva contacts Pierre (another consultant in her company) to ask him about his experiences. APOSDLE supports Eva in selecting a cooperation tool by knowing Pierre’s preferred means of cooperation (e.g. asynchronous vs. synchronous, tools he uses, etc.). APOSDLE also contextualizes the cooperation by providing Pierre with the parts of Eva’s work context which are relevant to her question. That is, Pierre can review which resources Eva has already accessed (assuming Eva’s privacy settings allow this). Pierre accepts Eva’s request, and Pierre and Eva communicate via Skype (Pierre’s preferred means of communication). Eva can take notes during the cooperation and can reflect on the cooperation afterwards in a dedicated (Wiki) template. If Eva and Pierre decide to share this cooperation result with other APOSDLE users, the request, resources, notes and reflections will be fed into the APOSDLE knowledge base. After talking to Pierre, Eva continues with preparations for the upcoming creativity workshop.

Fig. 14.4 Recommended resources can be refined according to a user’s Learning Goals listed in the drop down box. Learning Goals allow narrowing down large lists of resources to specific needs of users

By exposing the relationships between topics and tasks of the application domain

the learner is enabled to informally explore the underlying formal knowledge

structures and to learn from them. Specifically, users can be made aware of topics

relevant to the current task. These might constitute relevant learning goals for the

future. In addition, APOSDLE supports communication between peers by helping to

identify the right person to contact, to select the preferred communication channel,

to contextualize the cooperation, and to document it if desired. It is up to the user whether, in which situations, and in which order to take advantage of this support.

14.4.3 Partially Prescriptive Learning Guidance: Recommendation of Reflection Situations and Learning Paths

Eva has some additional time which she wants to spend on acquiring in-depth

knowledge about creativity techniques. She opens the APOSDLE Experiences Tab

(see Sect. 14.5.3 for more details on the User Profile) and reflects on her past

activities. This tab (Fig. 14.5) visualizes her own User Profile indicating which

topics she routinely works with (middle layer), which topics she needs to learn more

about (top layer), and in which topics she has expertise (bottom layer).
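The three layers of the Experiences Tab can be illustrated with a deliberately simple rule. APOSDLE's actual competence inference is described in Sect. 14.5.3; the Python sketch below instead assumes, purely for illustration, that each logged work activity is tagged with one topic and that fixed, invented count thresholds separate Learner, Worker and Supporter.

```python
from collections import Counter


def experience_levels(activity_topics, worker_min=3, supporter_min=10):
    """Map each topic to Learner, Worker or Supporter by activity count.

    The thresholds are illustrative assumptions, not APOSDLE parameters.
    """
    levels = {}
    for topic, count in Counter(activity_topics).items():
        if count >= supporter_min:
            levels[topic] = "Supporter"
        elif count >= worker_min:
            levels[topic] = "Worker"
        else:
            levels[topic] = "Learner"
    return levels


def layers(levels):
    """Group topics into the top, middle and bottom layers of the tab."""
    grouped = {"Learner": [], "Worker": [], "Supporter": []}
    for topic, level in sorted(levels.items()):
        grouped[level].append(topic)
    return grouped


if __name__ == "__main__":
    log = ["moderation"] * 12 + ["brainstorming"] * 4 + ["mind mapping"]
    print(layers(experience_levels(log)))
```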

She realizes that she is a learner in many of the common creativity techniques

and therefore she decides to approach this topic systematically by creating a

personalized Learning Path. Eva opens the Learning Path Wizard (Fig. 14.6) and

browses through the task list. She selects the task “applying creativity techniques in

a workshop”. Based on this task, the Learning Path Wizard suggests a list of topics

to include in her Learning Path (see Sect. 14.5.1.1), and Eva adds some more

creativity techniques Pierre mentioned in their last conversation. Eva saves the

Learning Path and also makes it public so that other colleagues can benefit from it.
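The Learning Path workflow just described (topics suggested for a task, the user's selection plus her own additions, optional publishing, and later use in APOSDLE Suggests) can be captured by a small data structure. The Python sketch below is hypothetical: `TASK_TOPICS`, `RESOURCES`, and the `LearningPath` class are invented for illustration and do not reflect APOSDLE's actual data model.

```python
from dataclasses import dataclass, field

# Hypothetical domain knowledge: topics suggested for a task.
TASK_TOPICS = {
    "applying creativity techniques in a workshop": [
        "basic knowledge about creativity techniques",
        "brainstorming",
        "moderation",
    ],
}

# Hypothetical knowledge base: topic -> annotated resources.
RESOURCES = {
    "brainstorming": ["Brainstorming checklist"],
    "six thinking hats": ["Six thinking hats (video)"],
}


@dataclass
class LearningPath:
    owner: str
    topics: list = field(default_factory=list)
    public: bool = False


def build_learning_path(owner, task, chosen, extra=(), public=False):
    """Keep chosen suggestions in suggested order, then append extra topics."""
    topics = [t for t in TASK_TOPICS.get(task, []) if t in chosen]
    topics += [t for t in extra if t not in topics]
    return LearningPath(owner, topics, public)


def resources_for(path, step):
    """Resources to recommend while working on the path's `step`-th topic."""
    return RESOURCES.get(path.topics[step], [])


if __name__ == "__main__":
    path = build_learning_path(
        "Eva",
        "applying creativity techniques in a workshop",
        chosen={"brainstorming", "moderation"},
        extra=["six thinking hats"],
        public=True,
    )
    print(path.topics)
    print(resources_for(path, 2))
```

Keeping the path as plain data leaves the order in which Eva works through it, and when, up to her, in line with the partially prescriptive character of this guidance.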

To execute the Learning Path, Eva then activates it in APOSDLE Suggests. APOSDLE Suggests then recommends relevant knowledge resources for the topic she selected from her Learning Path. Eva now follows her Learning Path at dedicated times during her working hours. Whenever new relevant resources are added to the knowledge base, Eva is made aware of them.

APOSDLE explicitly triggers the learner to reflect upon her activities as a learner, while the reflection process itself is not formally supported but left to the user’s discretion. In addition, the creation of semi-formal Learning Paths for longer-term and more systematic competence development is supported and partially automated. However, the time and method of Learning Path execution are not predetermined, and the path can be followed flexibly.


Fig. 14.5 The Experiences Tab provides users with an overview of their experiences with topics in the domain. APOSDLE uses three levels (Learner [top layer], Worker [middle layer], Supporter [bottom layer]) to indicate different levels of knowledge


14.5 Conceptual View on the APOSDLE Environment

As illustrated in the above scenarios, the APOSDLE environment provides a variety of recommendations to support WIL. In this section we present our approaches to three key challenges for context-aware recommendation: automatic detection of a user’s work task, inference of a user’s competences based on her interactions, and the computation of the recommendations themselves.

All components have been designed and implemented as domain-independent knowledge services (Lindstaedt et al. 2008). That is, none of the components embodies application domain knowledge; together, they constitute a generic design environment for WIL environments. To create a domain-specific WIL environment for a specific company, all application-specific domain knowledge has to be added to APOSDLE in the form of three ontologies (see Sect. 14.5.1) and the different knowledge resources within the knowledge base. Our approach to ontology engineering support using the Modelling Wiki (MoKi) is presented in Chap. 12.

In the present chapter, we first briefly present the structure and knowledge resources of the knowledge base, which provide a conceptual basis for all other components (Sect. 14.5.1). Then, we describe our approach to automatically detecting a user’s work context (Sect. 14.5.2). We briefly introduce our method of unobtrusively diagnosing a user’s knowledge and skills, and the recommendation services that make use of the information in the user profile (Sect. 14.5.3). Finally, we describe how an associative network is used to retrieve relevant content for a task at hand (Sect. 14.5.4).

Fig. 14.6 Wizard for creating a Learning Path based on the topics available in the domain. The left column contains topics which have been recommended to reach a specific Learning Goal. Users can choose from this list of Topics to assemble their individual Learning Path (right column)

14.5.1 Knowledge Base

Different types of Knowledge Resources are presented to the user within

APOSDLE: Topics, Tasks, Learning Paths, Documents, Snippets, Cooperation

Transcripts, and Persons. All of these resources can be organized into Collections

which can be shared with others and thus may serve as Knowledge Resources

themselves.

14.5.1.1 Topics, Tasks, Learning Goals, and Learning Paths

Topics, Tasks and Learning Paths are structural elements which are presented to the

users and which can be used for accessing further Knowledge Resources. All of

them are encoded within an integrated OWL ontology within the Knowledge Base

and provide the basis for intelligent recommendation of resources and for

inferences on the user’s competencies (Lindstaedt et al. 2009).

Topics (domain model) are core concepts which knowledge workers in a company need to know about in order to do their jobs. For instance, Topics in the ISN domain include “creativity technique” and “workshop”. Each Topic has a description. A Topic can be added to a Collection, its relations with other Topics and with Tasks can be browsed, and it can trigger recommendations in APOSDLE Suggests.

Tasks (process model) are typical working tasks within a specific company. Examples of ISN Tasks include “applying creativity techniques in a workshop” and “identifying potential cooperation partners”. Each Task has a description. In addition, each Task is assigned a set of Learning Goals which are required for performing the Task successfully. For instance, for the ISN Task “applying creativity techniques in a workshop”, one required Learning Goal might be “basic knowledge about creativity techniques”. Each of these Learning Goals is related to one Topic. In this way, Tasks and Topics are inherently linked. A Task in APOSDLE can be added to a Collection, its relations with other Tasks and with Topics can be browsed, and it can trigger suggestions in APOSDLE Suggests.
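The way Tasks reach Topics through their Learning Goals can be pictured as a small data model. The following Python sketch is illustrative only: the class names, fields, and the `topics_for_task` helper are hypothetical and do not reflect APOSDLE's actual OWL encoding. The first Learning Goal is the example from the text; the second is invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Topic:
    name: str


@dataclass(frozen=True)
class LearningGoal:
    # Each Learning Goal is related to exactly one Topic.
    description: str
    topic: Topic


@dataclass
class Task:
    name: str
    # Learning Goals required to perform the Task successfully.
    learning_goals: list[LearningGoal] = field(default_factory=list)


def topics_for_task(task: Task) -> set[Topic]:
    """Return the Topics a Task is linked to via its Learning Goals (hypothetical helper)."""
    return {goal.topic for goal in task.learning_goals}


creativity = Topic("creativity technique")
workshop = Topic("workshop")
task = Task(
    "applying creativity techniques in a workshop",
    [
        LearningGoal("basic knowledge about creativity techniques", creativity),
        # Hypothetical second Learning Goal, added only for illustration.
        LearningGoal("knowing how to moderate a workshop", workshop),
    ],
)
print(sorted(t.name for t in topics_for_task(task)))  # prints ['creativity technique', 'workshop']
```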

In essence, a Learning Path is a sequence of Topics for which recommendations can be obtained in APOSDLE Suggests. The sequence of Topics about which knowledge should be acquired is intended to maximize learning transfer and follows a prerequisite relation computed on the Learning Goal model based on competence-based knowledge space theory (Ley et al. 2008). Learning Paths are generated by APOSDLE users themselves with the help of a Learning Path Wizard (Sect. 14.4.3), starting from a Task, a Collection, or a Topic. Learning Paths can be shared with others or added to a Collection.
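How the prerequisite relation is derived from the Learning Goal model is not shown here. Assuming it is already available as a mapping from each Topic to the Topics that should be learned before it, one simple way to order a user's selected Topics consistently with that relation is a topological sort. The Python sketch below uses the standard-library `graphlib` module (Python 3.9+); the function name and the prerequisite data are hypothetical, and this is not APOSDLE's cbKST-based computation.

```python
from graphlib import TopologicalSorter


def order_learning_path(selected, prerequisites):
    """Order the selected Topics so that each Topic comes after its prerequisites.

    `prerequisites` maps a Topic to the Topics that should be learned first.
    Simplifying assumption: only prerequisites that were themselves selected
    are taken into account.
    """
    chosen = set(selected)
    graph = {
        topic: {p for p in prerequisites.get(topic, ()) if p in chosen}
        for topic in selected
    }
    # static_order() raises graphlib.CycleError if the relation contains a cycle.
    return list(TopologicalSorter(graph).static_order())


# Hypothetical prerequisite relation between creativity-technique Topics.
prerequisites = {
    "brainstorming": {"creativity technique"},
    "mind mapping": {"creativity technique"},
    "reverse brainstorming": {"brainstorming"},
}
path = order_learning_path(
    ["reverse brainstorming", "mind mapping", "creativity technique", "brainstorming"],
    prerequisites,
)
```

A topological sort only fixes the order where the relation demands it; Topics with no prerequisite link between them (here "mind mapping" and "brainstorming") may appear in either order.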


14.5.1.2 Documents, Snippets, and Cooperation Transcripts

Documents, Snippets and Cooperation Transcripts are the actual ‘learning content’ within APOSDLE. They are accessible results of previous work by knowledge workers in the company. Such context-related knowledge artifacts improve the likelihood of offering highly relevant information which can be directly applied to the work situation with little or no learning transfer required. In addition, they have the advantage that no additional learning content has to be created.

By Documents, we understand both textual and multimedia documents which

can be accessed in the APOSDLE Reader. Documents can be opened and saved,

shared with others, added to a Collection or rated.

Snippets are parts of (textual or multimedia) documents, each annotated with one Topic, which can be viewed in the APOSDLE Reader. Users can create Snippets by highlighting a part of a Document and annotating it with one Topic from the domain ontology, share Snippets with their colleagues, add them to a Collection, and rate them. In addition, APOSDLE automatically generates Snippets that match the provided domain ontology.

Cooperation Transcripts are textual records of the information exchanged during cooperation. Cooperation Transcripts can be fed back into APOSDLE and made available in APOSDLE Search or APOSDLE Suggests.

14.5.1.3 Knowledgeable Persons

All APOSDLE users are potential sources of knowledge and hence constitute Knowledge Resources. Knowledgeable Persons are identified topic-wise, i.e. there are no ‘overall Knowledgeable Persons’ but only Knowledgeable Persons for a Topic at hand. For instance, a person can be knowledgeable with respect to the Topic “workshop” but might have little knowledge about the Topic “creativity technique”. Information about who is knowledgeable about a Topic at a certain point in time is obtained from the APOSDLE User Profile (see Sect. 14.5.3). Persons can be contacted directly or added to Collections for future contact.

14.5.2 Automatically Determining the User’s Work Task (Short-Term Context)

In this section we present our approach to automatically determining a user’s current work task based on her interactions with the computer (e.g. keystrokes, application-specific events). The unique characteristic of our approach is the use of a generic ontology to represent the user interaction context, which preserves and establishes the relationships between knowledge resources. This ontology is used to engineer features which improve task detection precision, also for knowledge-intensive tasks.


The identification of the current work task on the one hand triggers the context-

aware recommendation services (see Sect. 14.5.4) and on the other hand serves as

input for inferring a user’s competences (see Sect. 14.5.3).

14.5.2.1 User Interaction Context Ontology

We refine Dey’s definition of context (Dey et al. 2001) by focusing on the user interaction context, which we define as “all interactions of the user with resources, applications and the operating system on the computer desktop” (Rath et al. 2009). Various context modelling approaches have been proposed, such as key-value models, markup scheme models, graphical models, object-oriented models, logic-based models, and ontology-based models (Strang and Linnhoff-Popien 2004). Among these, the ontology-based approach has been advocated as the most promising (Baldauf et al. 2007), mainly because of its dynamicity, expressiveness, and extensibility.

We have defined a user interaction context ontology (UICO) (Rath et al. 2009) which represents the user interaction context through 88 concepts, 215 datatype properties and 57 object properties. It is modelled in the Web Ontology Language (OWL), a W3C standard for modelling ontologies that is widely accepted in the Semantic Web community. The majority of concepts represent the types of user interactions and the types of resources. The high number of datatype properties represents data and metadata about resources and application user interface elements the user interacted with. The object properties relate (1) user interactions with resources, (2) resources with other resources or parts of resources, and (3) user interactions with each other, in order to model the aggregation of user interactions. Context observers, also referred to as context sensors, are programs, macros and plug-ins that record the user’s interactions on the computer desktop. It is important to note that the UICO is a generic ontology for personal information management which covers most resource types, interaction types, and applications available to the user on a typical Windows desktop computer. It can easily be extended to include more specialized applications.

From a data perspective, the UICO is a much richer representation of the user’s interaction context than is typically preserved in simple sensor streams. From a semantic perspective, the UICO goes beyond top-down approaches to semantic desktop ontology design: it not only provides high-level concepts but also connects them to low-level sensor data. We argue that by relating semantics to sensor data, the UICO lends itself naturally to (1) a variety of context-aware applications, and (2) “mining” activities for in-depth analyses of user characteristics, actions, preferences, interests, goals, etc.

14.5.2.2 Automatic Population of the UICO

We developed a broad range of context sensors for standard office applications and the Microsoft Windows operating system (XP, Vista and 7). A complete list of sensors is given in Rath et al. (2009). The contextual data sent by the context sensors is used as a basis for automatically populating the UICO. Automatic population here means autonomously instantiating concepts and creating properties between concept instances of the UICO, based on the observed and the automatically inferred user interaction context. For example, if a user copies a piece of text from an e-mail to a Word document, we automatically create an instance of the concept e-mail and an instance of the concept Word document within the UICO and connect them via a relationship. The automatic population exploits the structure of the user interface elements of standard office applications and preserves data types and relationships through a combination of rule-based, information extraction and supervised learning techniques. We also use our knowledge discovery framework, the KnowMiner (Klieber et al. 2009), to perform named entity recognition of persons, locations and organizations, as well as to extract data and metadata of various resource types. Hence, the UICO is a much richer representation of the user interaction context than is typically stored in attention metadata sensor streams (Wolpers et al. 2007), since it preserves relationships that would otherwise be lost.
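The copy-and-paste example can be sketched as the creation of two instances plus an object property linking them. The Python sketch below stores the populated context as plain subject-predicate-object triples. The event fields, the concept names `EMail` and `WordDocument`, and the property names `subject`, `fileName` and `hasTextCopiedTo` are hypothetical stand-ins, not the actual UICO vocabulary, and none of the rule-based, information extraction, or supervised learning machinery is reproduced.

```python
from itertools import count

_ids = count(1)  # simple generator of unique instance identifiers


def new_instance(triples, concept, **datatype_props):
    """Create an instance of `concept` and attach its datatype properties."""
    instance = f"{concept.lower()}_{next(_ids)}"
    triples.add((instance, "rdf:type", concept))
    for prop, value in datatype_props.items():
        triples.add((instance, prop, value))
    return instance


def populate_from_copy_paste(triples, event):
    """Turn one observed copy-paste event into two instances and a linking relation.

    All concept and property names here are hypothetical stand-ins.
    """
    source = new_instance(triples, "EMail", subject=event["source_title"])
    target = new_instance(triples, "WordDocument", fileName=event["target_title"])
    # The object property preserves the relationship between the two resources.
    triples.add((source, "hasTextCopiedTo", target))
    return source, target


triples = set()
src, tgt = populate_from_copy_paste(
    triples, {"source_title": "Workshop agenda", "target_title": "report.docx"}
)
```

Keeping the relation as an explicit triple, rather than two unrelated log entries, is what lets later feature engineering follow links between resources.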

14.5.2.3 Task Detection as a Classification Problem

Performing task detection consists of training machine learning algorithms on

classes corresponding to task models. This means that each training instance

presented to the machine learning algorithms represents a task that has to be

‘labelled’. Thus, training instances have to be built from features and feature

combinations derived from the user context data on the task level. In our ontol-

ogy-based approach, this means deriving features from the data associated with a

Task concept. Based on our UICO, we have engineered 50 features for constructing

the training instances. They are grouped in six categories: (1) action, (2) applica-

tion, (3) content, (4) ontology structure, (5) resource, and (6) switching sequences.

We use the machine learning toolkit Weka (Witten and Frank 2005) for parts of the

feature engineering and classification processes. A number of standard pre-

processing steps are performed on the content of text-based features.
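To make the idea of a task-level training instance concrete, the Python sketch below turns the interaction events recorded during one labelled task into a sparse feature dictionary. It illustrates five of the six feature categories (action, application, content, resource, switching sequences); ontology-structure features are omitted. The event keys and feature names are hypothetical and are not the 50 features engineered for the UICO.

```python
from collections import Counter


def task_instance_features(events):
    """Build a sparse feature dictionary for one labelled task instance.

    Each event is a dict with hypothetical keys 'action', 'application',
    'window_title' and 'resource'. Feature names are prefixed with the
    category they illustrate.
    """
    features = Counter()
    for event in events:
        features[f"action={event['action']}"] += 1
        features[f"application={event['application']}"] += 1
        features[f"resource={event['resource']}"] += 1
        # Simple content feature: lower-cased window title tokens.
        for token in event["window_title"].lower().split():
            features[f"title_token={token}"] += 1
    # Switching feature: how often consecutive events change application.
    features["app_switches"] = sum(
        1 for prev, cur in zip(events, events[1:])
        if prev["application"] != cur["application"]
    )
    return dict(features)


# Hypothetical events recorded while a student registers for an exam.
events = [
    {"action": "open", "application": "Word", "window_title": "Exam registration form", "resource": "form.docx"},
    {"action": "type", "application": "Word", "window_title": "Exam registration form", "resource": "form.docx"},
    {"action": "send", "application": "Outlook", "window_title": "Registration", "resource": "mail_42"},
]
features = task_instance_features(events)
```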

14.5.2.4 Evaluations

We have performed three laboratory experiments to evaluate how the following factors influence our ontology-based user task detection approach: (1) the classifier used, (2) the selected features, (3) the task type, and (4) the method chosen for training the classifiers. We have gained several insights from our evaluation.

First, in all three experiments, the J48 decision tree and Naïve Bayes classifiers provided better classification accuracy than the other classifiers tested.

Second, we have isolated six features that have good discriminative power for classifying tasks, namely the accessibility object name feature, the window title feature, the used resource metadata feature, the accessibility object value feature, the datatype properties feature and the accessibility object role feature. The low standard deviations associated with these six features indicate that their performance is also stable across datasets.

Third, even though routine tasks might seem easier to classify, our experiments show that knowledge-intensive tasks can be classified as well as routine tasks. For example, within the domain of computer science students, the following can be considered routine tasks: registering for an exam, finding course dates, reserving a book from the university library, etc. Knowledge-intensive tasks, on the other hand, include programming an algorithm, preparing a scientific talk, planning a study trip, etc. We attribute this result to the fine granularity of the usage data captured within the UICO.

Fourth, we have shown that a classifier trained by a group of experts on standardized tasks also performs well when classifying personal tasks performed by users. This result suggests that the classifier can be trained by a domain expert and then used by other users, eliminating the time-intensive training period for individual users. For more details, please refer to Rath et al. (2010).

In future work we will investigate combining unsupervised learning mechanisms for identifying boundaries in the user interaction context data, based on the six discovered context features, with the J48 decision tree and Naïve Bayes learning algorithms for classifying the resulting clusters into task classes.
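The J48 and Naïve Bayes classifiers discussed above were used through Weka. To show what a Naïve Bayes classifier does with bag-of-token features such as window-title words, here is a minimal multinomial Naïve Bayes with add-one (Laplace) smoothing, written from scratch in Python. It is an illustration, not the Weka implementation; the two task classes are taken from the examples above, but their training tokens are hypothetical.

```python
import math
from collections import Counter, defaultdict


class MultinomialNaiveBayes:
    """Multinomial Naive Bayes over token lists, with add-one (Laplace) smoothing."""

    def fit(self, documents, labels):
        self.class_counts = Counter(labels)
        self.token_counts = defaultdict(Counter)
        for tokens, label in zip(documents, labels):
            self.token_counts[label].update(tokens)
        self.vocabulary = {t for counts in self.token_counts.values() for t in counts}
        return self

    def _log_score(self, tokens, label):
        """Log prior plus smoothed log likelihood of the tokens for one class."""
        n_docs = sum(self.class_counts.values())
        total = sum(self.token_counts[label].values())
        score = math.log(self.class_counts[label] / n_docs)
        for token in tokens:
            if token not in self.vocabulary:
                continue  # tokens never seen in training carry no evidence
            count = self.token_counts[label][token]
            score += math.log((count + 1) / (total + len(self.vocabulary)))
        return score

    def predict(self, tokens):
        return max(self.class_counts, key=lambda label: self._log_score(tokens, label))


# Hypothetical training instances: window-title tokens per task execution.
docs = [
    ["exam", "registration", "portal"],
    ["register", "exam", "form"],
    ["slides", "talk", "conference"],
    ["presentation", "slides", "outline"],
]
labels = [
    "registering for an exam",
    "registering for an exam",
    "preparing a scientific talk",
    "preparing a scientific talk",
]
model = MultinomialNaiveBayes().fit(docs, labels)
```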

14.5.3 Inferring the User’s Competencies (Long-Term Context)

In this section we present our approach to inferring a user’s competencies based on Knowledge Indicating Events (KIE, see below) and thus automatically maintaining her user profile. The major KIE we have examined is the execution of work tasks. The KIE approach differs from other approaches in the user modeling field in that it takes into account user events which occur naturally during everyday work, as opposed to being restricted to events within specialized eLearning systems.

The inferred user competencies are used to rank the learning goal and content recommendations and form the basis for the recommendation of subject matter experts.

14.5.3.1 Knowledge Indicating Events

We suggest tackling the challenge of user profile maintenance by observing the

naturally occurring actions of the user (Jameson 2003) which we interpret as

knowledge indicating events (KIE). KIE denote user activities which indicate that

the user has knowledge about a certain topic (Ley et al. 2008).

In our first two prototypes, we based the maintenance of the user profile solely on information about past tasks performed (task-based knowledge assessment). The algorithms for maintaining the user profile and ranking learning goals were based on competence-based knowledge space theory (cbKST, Korossy 1997), which in turn builds on Doignon and Falmagne’s knowledge space theory (Doignon and Falmagne 1985). cbKST is a framework that formalizes the relationship between overt behavior (e.g. task performance) and latent variables (the knowledge and competencies needed for performance).

While there is some evidence that most learning at the workplace is in fact connected to performing a task, and that task performance is a good indicator of available knowledge in the workplace, this restriction to tasks performed certainly limits the types and number of assessment situations that are taken into account. We have therefore extended our approach and looked at a variety of additional potential KIE. Examples include communication with other users about a topic or the creation of documents which deal with a topic. KIE are thus based on usage data. Our approach is similar in direction to that of Wolpers et al. (2007), who suggested using attention metadata for knowledge management and learning management approaches. The idea is also similar to the approach of evidence-bearing events (e.g. Brusilovsky 2004). So far, the approaches of attention metadata and evidence-bearing events have been discussed from a rather technical point of view, and with regard to how the approaches, once implemented, could be used in different settings. With our approach to KIE, we extend the technical perspective by taking into account the entire process of implementing the KIE approach in a learning system, from identifying potential KIE to evaluating them.

14.5.3.2 User Profile

We model the user profile as an overlay of the topics in the knowledge base (Sect. 14.5.1; see also Lindstaedt et al. 2009). In APOSDLE’s second prototype, the user profile was maintained using the following rule: whenever a user executes a task (e.g. workshop preparation) within the environment, the counter of that task within her user profile is incremented. The user profile thus counts how often the user has executed each task. It therefore constitutes a simple numeric model of the tasks, which are related to one or several topics in the domain. Based on the learning goal model, we can infer that the user has knowledge about all the topics related to that task. By means of an inference service (see below), information is propagated along the relationships defined by the learning goal model (see Sect. 14.5.1; Lindstaedt et al. 2009), and the counters of all topics related to the task are also incremented. Consequently, the user profile contains a value for each user and each topic at any time during system usage.
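The second-prototype rule can be sketched in a few lines of Python, assuming the learning goal model is available as a mapping from each task to its related topics and that one profile object is kept per user. The function name and the task-topic data are hypothetical, and the propagation is reduced to a direct increment rather than APOSDLE's actual inference service. Using a `Counter` means every topic has a value (zero if never used) at any time, as the text requires.

```python
from collections import Counter


def record_task_execution(profile, task, task_topics):
    """Increment the task counter and propagate the increment to related topics."""
    profile[("task", task)] += 1
    for topic in task_topics[task]:
        profile[("topic", topic)] += 1


# Hypothetical excerpt of a learning goal model: task -> related topics.
task_topics = {
    "workshop preparation": ["workshop", "creativity technique"],
    "identifying potential cooperation partners": ["cooperation partner"],
}

profile = Counter()  # one profile per user
record_task_execution(profile, "workshop preparation", task_topics)
record_task_execution(profile, "workshop preparation", task_topics)
record_task_execution(profile, "identifying potential cooperation partners", task_topics)
```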

In APOSDLE’s third prototype, the simple numeric user profile was changed into a qualitative user profile that distinguishes three qualitatively different knowledge levels (learner, worker, supporter). To this end, we extended our set of KIE and assigned to each knowledge level those events which indicate that level. For example, ‘asking for a learning hint’ is an event that indicates the ‘learner’ knowledge level, ‘performing a task’ indicates the ‘worker’ knowledge level, and ‘being contacted about a certain topic’ indicates the ‘supporter’ knowledge level. We then tracked all KIE that occurred for a user and developed an algorithm that allowed us to diagnose one of the three knowledge levels for each topic in the domain that the user has used within APOSDLE (e.g. by opening a document, carrying out a task, etc.).
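The diagnosis algorithm itself is not specified here, so the following Python sketch uses a loudly labelled placeholder rule: for each topic, the knowledge level indicated most often by the observed KIE wins, and ties go to the higher level. The event-to-level mapping follows the three examples in the text, but the event type identifiers and the decision rule are assumptions, not the published algorithm.

```python
from collections import Counter, defaultdict

LEVELS = ["learner", "worker", "supporter"]  # ordered from lowest to highest

# Mapping taken from the examples in the text; identifiers are hypothetical.
KIE_LEVEL = {
    "asked_for_learning_hint": "learner",
    "performed_task": "worker",
    "contacted_about_topic": "supporter",
}


def diagnose_levels(events):
    """Diagnose one knowledge level per topic from (event_type, topic) pairs.

    Assumed rule (placeholder): the level indicated most often for a topic
    wins; ties are resolved in favour of the higher level.
    """
    evidence = defaultdict(Counter)
    for event_type, topic in events:
        level = KIE_LEVEL.get(event_type)
        if level is not None:  # ignore events that are not KIE
            evidence[topic][level] += 1
    return {
        topic: max(LEVELS, key=lambda lvl: (counts[lvl], LEVELS.index(lvl)))
        for topic, counts in evidence.items()
    }


events = [
    ("asked_for_learning_hint", "creativity technique"),
    ("asked_for_learning_hint", "creativity technique"),
    ("performed_task", "creativity technique"),
    ("performed_task", "workshop"),
    ("contacted_about_topic", "workshop"),
]
levels = diagnose_levels(events)
```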

14.5.3.3 Learning Goal Ranking

A user’s learning need is inferred in three steps (Lindstaedt et al. 2009). Starting with the user’s current task, the learning goal model is queried to retrieve the required learning goal vector r for this task (step 1). For every learning goal of the domain, the vector r indicates whether or not it is required to perform the user’s current task. In step 2, the user profile is queried with the required learning goal vector r as parameter to retrieve the current knowledge levels vector k for the user. The vector k contains one of the three knowledge levels for each topic referenced within vector r. The third step generates the learning need g by ranking the knowledge levels of all required topics (vector k) from learner topics (top of the list) to supporter topics (bottom of the list). The ‘lower’ the knowledge level of a topic in the learning goal vector g, the higher the rank of the learning goal. The ‘most required’ learning goal is therefore listed at the top of the learning need. If there are multiple topics with the same knowledge level, those topics which are assigned to more tasks in the knowledge base are ranked higher. This simple rule ensures that the prerequisite relation that is assumed between topics of the knowledge base is taken into account (see Ley et al. 2009). The learning need is used by APOSDLE in two ways. First, an application running in the working environment of the user visualizes the result as a ranked list (see Fig. 14.3). Second, the ‘top’ learning goal is automatically pre-selected, which invokes the Associative Retrieval Service (see Sect. 14.5.4) to find knowledge resources relevant for the ‘most pressing’ learning need. The user can select other learning goals from the ranked list and thus filter the offered knowledge resources accordingly.
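The three steps can be condensed into a short Python sketch, with learning goals identified by their (single) related topic for simplicity. The function name, the dictionaries standing in for the learning goal model and user profile, and the example data are hypothetical. Two further assumptions are labelled in the comments: a topic without a diagnosed level is treated as ‘learner’, and the topic name serves as a last tie-breaker so that the order is deterministic.

```python
LEVEL_ORDER = {"learner": 0, "worker": 1, "supporter": 2}


def rank_learning_goals(task, required, user_levels, task_count):
    """Return the topics required for `task`, most pressing learning need first.

    required:    task -> set of required topics               (step 1, vector r)
    user_levels: topic -> 'learner' | 'worker' | 'supporter'  (step 2, vector k)
    task_count:  topic -> number of tasks in the knowledge base using the topic
    """
    r = required[task]
    # Assumption: a topic without a diagnosed level is treated as 'learner'.
    k = {topic: user_levels.get(topic, "learner") for topic in r}
    # Step 3: lower knowledge level first; ties go to topics used by more tasks;
    # the topic name is a final tie-breaker (assumption) for a stable order.
    return sorted(r, key=lambda t: (LEVEL_ORDER[k[t]], -task_count.get(t, 0), t))


required = {
    "applying creativity techniques in a workshop": {
        "creativity technique", "workshop", "brainstorming",
    }
}
user_levels = {"creativity technique": "learner", "workshop": "supporter", "brainstorming": "learner"}
task_count = {"creativity technique": 5, "workshop": 3, "brainstorming": 2}
ranking = rank_learning_goals(
    "applying creativity techniques in a workshop", required, user_levels, task_count
)
```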

14.5.3.4 Identifying Knowledgeable People

Within APOSDLE, a People Recommender Service aims at finding people within the organization who have expertise related to the current learning goal of the user. The People Recommender Service is naturally based on the information in the user profile. Users specialising in certain topics are represented in the user profile with high knowledge levels for these topics. Other users can then individually be provided with colleagues having equal or higher experience. Compared to, e.g., the MetaDoc system (Boyle 1994), this service identifies experts in a more dynamic way. Knowledgeable users are identified by comparing the current knowledge level vectors of all users with the knowledge level vector of the user who will receive the recommendation. To infer knowledgeable users, the People Recommender Service uses the Learning Need Service (see Sect. 14.5.3.3) to retrieve the current knowledge level vectors of all users. The next step removes all users with lower knowledge levels than the user receiving the recommendations. The remaining users are then ranked according to their knowledge levels in the current knowledge level vectors; the most knowledgeable user is ranked highest. The service can also be configured to use the availability status of users as a ranking criterion.
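A minimal Python sketch of the filtering and ranking steps, simplified to a single topic (the user's current learning goal) instead of full knowledge level vectors. The function name, the per-user level dictionaries, and the way availability is combined with level (available users first among equals) are hypothetical choices, not the service's actual configuration.

```python
LEVEL_ORDER = {"learner": 0, "worker": 1, "supporter": 2}


def recommend_people(current_user, topic, levels, available=None):
    """Rank colleagues whose level for `topic` is at least the current user's.

    levels:    user -> {topic: level}; a missing topic counts as 'learner' (assumption)
    available: optional user -> bool; if given, available users rank first
               among users with the same level (hypothetical weighting)
    """
    def level_of(user):
        return LEVEL_ORDER[levels[user].get(topic, "learner")]

    own = level_of(current_user)
    # Remove all users with a lower knowledge level than the recipient.
    candidates = [u for u in levels if u != current_user and level_of(u) >= own]

    def key(user):
        is_available = available.get(user, False) if available else False
        return (-level_of(user), not is_available, user)

    return sorted(candidates, key=key)


levels = {
    "Eva": {"creativity technique": "learner"},
    "Pierre": {"creativity technique": "supporter"},
    "Anna": {"creativity technique": "worker"},
    "Tom": {"creativity technique": "learner"},
    "Lisa": {"creativity technique": "supporter"},
}
by_level = recommend_people("Eva", "creativity technique", levels)
with_availability = recommend_people("Eva", "creativity technique", levels, {"Pierre": True})
```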

14.5.3.5 Evaluation

We conducted a variety of studies to evaluate the automatically maintained user profile and its related services (Lindstaedt et al. 2009); here we report on only one of them, which addresses the accuracy and usefulness of the learning goal ranking.

In a lab study, psychology students were observed and interviewed while they were trying to learn with APOSDLE in the learning domain of statistical data analysis. Our aim was to investigate the effects of the learning goal ranking on the actual performance of users in realistic tasks which they were not able to solve before the study (pre-test). In the pilot study, control groups were used to compare three different versions of the ranking algorithm: (a) the ranking algorithm as it was designed for APOSDLE (taking into account both the requirements of the task and the knowledge of the user), (b) a shuffled list of learning goals required for the task at hand (taking into account the requirements of the task but not the knowledge state of the user), and (c) a set of randomly selected learning goals (taking into account neither the requirements of the task nor the knowledge state of the user). Each participant had to solve three different tasks, one for each version of the algorithm. With versions (a) and (b), the previously unknown tasks could be solved by all participants, whereas none of them could solve the task when algorithm (c) was applied. Additionally, a slight difference in the users’ behavior was found between versions (a) and (b): with version (a), users tended to select fewer learning goals and more frequently carried out learning activities for learning goals at the top of the list than with version (b). This suggests that the ranking algorithm is useful, but clearly more studies are needed.

14.5.4 Recommending Content Relevant to the Work Task

We introduce a hybrid context-based search mechanism on the basis of an associative network. The network consists of a semantic layer, which is built from the domain ontology, and a content-based layer, which is built from the textual knowledge artifacts of the organization. This hybrid approach combines semantic similarity measures with text-based similarity measures in order to improve retrieval performance. The novelty of our approach lies in combining spreading activation search in both layers using one and the same model.

Here spreading activation (Crestani 1997) is used for all three types of search:

search for documents based on a set of Topics, associative search for Topics, and

associative search for documents/snippets. This allows uniform search based on the

detected user work task.


14.5.4.1 Associative Retrieval Service

The Associative Retrieval Service (Scheir et al. 2008) relies on knowledge

contained in the domain ontology and the statistical information in a collection of

documents. The service is queried with a set of Topics from the ontology and

returns a set of snippets. As discussed above, snippets in the environment are

annotated with ontological Topics. In APOSDLE, the annotation process is

performed manually for a training set which is then utilized to train a classifier to

automatically create and annotate snippets.

Topics from the ontology are used as metadata for snippets in the system. In

contrast to classical metadata, the ontology specifies relations between the Topics

(Fig. 14.7). For example, class-subclass relationships are defined and domain-

specific relations between Topics are modeled. The structure of the ontology is

used for calculating the similarity between two Topics in the ontology (see below).

This similarity is then used to expand a query with similar Topics before retrieving

documents dealing with this set of Topics. After thus retrieving documents based on

metadata, the result set is expanded by means of textual similarity. The implemen-

tation of the associative network allowed us to develop and test different

combinations of query and result expansions that are based on the spreading

activation algorithm.
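The generic spreading activation technique (Crestani 1997) behind these expansions can be sketched over a graph whose edges mirror the three kinds of links in the associative network: concept to concept (ontology similarity), concept to document (annotation), and document to document (textual similarity). This Python sketch is a textbook variant, not the Associative Retrieval Service; the decay factor, threshold, step limit, node names, and edge weights are all hypothetical.

```python
from collections import defaultdict


def spread_activation(edges, seeds, decay=0.5, threshold=0.05, max_steps=3):
    """Basic spreading activation over an undirected weighted graph.

    edges: iterable of (node_a, node_b, weight) with weights in [0, 1]
    seeds: nodes that receive initial activation 1.0 (e.g. the query Topics)
    Returns a dict mapping each reached node to its accumulated activation.
    """
    neighbours = defaultdict(list)
    for a, b, w in edges:
        neighbours[a].append((b, w))
        neighbours[b].append((a, w))
    activation = {node: 1.0 for node in seeds}
    frontier = dict(activation)
    for _ in range(max_steps):
        next_frontier = defaultdict(float)
        for node, energy in frontier.items():
            for neighbour, weight in neighbours[node]:
                passed = energy * weight * decay
                if passed >= threshold:  # stop spreading weak signals
                    next_frontier[neighbour] += passed
        for node, energy in next_frontier.items():
            activation[node] = activation.get(node, 0.0) + energy
        frontier = next_frontier
    return activation


edges = [
    ("creativity technique", "brainstorming", 0.8),  # concept to concept
    ("brainstorming", "doc_a", 1.0),                 # concept to document
    ("creativity technique", "doc_b", 1.0),          # concept to document
    ("doc_b", "doc_c", 0.6),                         # document to document
]
scores = spread_activation(edges, ["creativity technique"])
ranked_docs = sorted((n for n in scores if n.startswith("doc_")), key=scores.get, reverse=True)
```

Because the query is seeded with Topics but activation can flow into the document layer and across document-to-document links, one model covers both the metadata-based retrieval and the text-based result expansion.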

The three concept similarity measures used are a measure based on the shortest path between two Topics in the same class hierarchy, the measure of Resnik (1999), and a vector measure based on the properties which relate Topics in an ontology. All three measures require that the two Topics whose semantic similarity is calculated originate from the same ontology.
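As an illustration of the first measure, the Python sketch below computes the shortest path between two Topics in a class hierarchy by breadth-first search and converts it into a similarity with the common form 1 / (1 + d). The text does not give the exact formula used, so this conversion, the function names, and the example hierarchy are assumptions. Topics that are not connected receive similarity 0, which reflects the requirement that both Topics come from the same ontology.

```python
from collections import deque


def shortest_path_length(subclass_of, a, b):
    """Breadth-first search over the class hierarchy, treated as undirected."""
    graph = {}
    for child, parent in subclass_of:
        graph.setdefault(child, set()).add(parent)
        graph.setdefault(parent, set()).add(child)
    if a == b:
        return 0
    seen, queue = {a}, deque([(a, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt == b:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # not connected in this hierarchy


def path_similarity(subclass_of, a, b):
    """Assumed conversion: similarity = 1 / (1 + shortest path length)."""
    d = shortest_path_length(subclass_of, a, b)
    return 0.0 if d is None else 1.0 / (1.0 + d)


# Hypothetical (child, parent) pairs from a domain class hierarchy.
hierarchy = [
    ("brainstorming", "creativity technique"),
    ("mind mapping", "creativity technique"),
    ("creativity technique", "method"),
]
```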

Fig. 14.7 Anatomy of the associative network: Concept to Concept layer (C2C), Concept to Document layer (C2D), and Document to Document layer (D2D)


14.5.4.2 Evaluation

In our evaluation we compare the search performance of this associative network using a number of different similarity measures. All associative search approaches employing semantic similarity, text-based similarity, or both, increase retrieval performance compared to the baseline. In addition, since no standardized test corpora exist for semantic retrieval, we built such a test corpus based on an application case within the APOSDLE project.

We tentatively conclude that text-based methods for associative retrieval increase retrieval performance (Scheir et al. 2008). We therefore want to explore the approach of attaching a set of terms to every Topic in our domain ontology at modelling time, in order to provide search results even for Topics that are not used for annotation. In addition, we want to extend our research towards the application of different semantic similarity measures within our service (Stern et al. 2010). A key research question for the future is the appropriate selection of similarity measures for given ontologies, which requires further experiments with different ontologies.

14.6 Summative Evaluation

As mentioned in the introduction, we see the “APOSDLE solution” as consisting of (a) modelling the learning domain and the work processes, (b) annotating documents and other sources of information available in the company repository, (c) training the prospective users of the APOSDLE environment, and (d) using the resulting domain-specific WIL environment with its varying learning guidance functionalities at the workplace. This means that a comprehensive summative evaluation of the APOSDLE solution requires a summative evaluation of each of these aspects. This is all the more necessary because the aspects depend on each other: if the domain modelling has not been done correctly, the annotation will fall short of what is needed; if the annotation is done badly, retrieval of relevant information will be unsatisfactory; if the users are not well trained, their use of the APOSDLE system will be sub-optimal.

In this chapter we briefly report on results of the workplace evaluation (step d) and discuss the effort required for model creation (step a). The modelling of the learning domain and the work processes was conducted over a period of 2–3 months (see also Chap. 12).

The workplace evaluation assessed the third APOSDLE prototype in use at the

sites of three application organizations, EADS, ISN and CCI (for brief descriptions

see Sect. 14.3). It spanned a period of about 3 months and involved 19 persons.

EADS, CCI, and ISN reported modelling efforts between 59 and 304 person hours

for domain, task, and learning goal models ranging from 94 to 145 Topics, 13–100

Tasks, and 59–291 Learning Goals.


For the evaluation, a multi-method data collection approach was followed, using a questionnaire, interviews, log data, user diaries kept while working with APOSDLE, and site visits. This allowed for triangulation of results. An overview of the findings is given in the following; please refer to Dotan et al. (2010) for details.

14.6.1 Workplace Evaluation

At the beginning of the APOSDLE project all application partners were asked to

express which goals they would like to reach by employing computational support

for WIL. The following goals were expressed by all three application partner

organizations:

1. Learning material relevant to current task

2. Aware of learning material

3. Existing knowledge improved

4. High quality learning material provided

5. Learning helped task completion

6. Learning time planned and managed

7. Experts accurately sorted by relevance

In the exit questionnaire, the participants from the three companies involved in the usage evaluation were asked to what extent they thought these goals had been reached by APOSDLE. The exit questionnaire was filled out by 19 participants. Figure 14.8 shows the mean and standard deviation on a 5-point Likert scale for each question. According to the participants, all goals were reached to a large extent.

Fig. 14.8 Exit questionnaire results

Two main findings were observed:

1. APOSDLE was used most frequently by trainees and new employees who were expected to learn and were also given the time to do so. The highly specialized content recommended was appreciated, and end users reported that they could not have accessed it as effectively by other means. These learners used all features of the system in various combinations to access and manage their learning process. In contrast, the work schedule of ‘regular’ and ‘expert’ knowledge workers was to a large extent dictated by clients’ needs. This meant that work activities were often unexpected, performed under time pressure and left little room for learning activities. This led to sporadic and less frequent interaction with APOSDLE.

2. APOSDLE was especially effective in highly specialized domains in which much of the domain knowledge is documented. APOSDLE proved less effective in broad, customer-driven domains where knowledge was largely shared in person. One probable reason for this result was that in those domains documentation was not part of established work practice, so the system did not have access to all the relevant knowledge. This was aggravated by the fact that users in these domains worked in the same offices and extensively shared knowledge face-to-face; therefore, the collaboration support provided by APOSDLE was not used often.

During the evaluation period and across all three sites there was clear evidence that using the APOSDLE environment improved people’s knowledge in various ways. The assessment of whether, and to what extent, knowledge had improved was based on personal accounts (diary entries and interviews) describing enhanced understanding of topics and tasks following different kinds of interaction with APOSDLE. Overall, APOSDLE supported users in acquiring new knowledge by making them aware of learning material and learning opportunities and by providing different degrees of learning guidance. In EADS especially, users reported on numerous occasions in their diaries that explicit and implicit learning material enabled them to gain useful insight, improve their knowledge, and complete the task they were working on.

In all application cases, users utilized the awareness building and descriptive

learning guidance (e.g. exposing knowledge structures) more often than the more

prescriptive learning guidance (e.g. triggering reflection, learning paths). Learners

extensively used the different functionalities to browse and search the knowledge

structures (descriptive learning guidance), followed the provided content

suggestions (awareness), and collected relevant learning content within

Collections. Learning Paths were used by trainees and new employees in order to

structure and plan their own learning process. Their potential as a teaching tool for

experts was not realized since experts only rarely used the system. The reflection

tool (MyExperiences, see Fig. 14.5) was employed by learners mainly to examine the environment’s perception of their usage behaviour. The extent to which this also led to reflection on their learning activities could not be determined.

14.7 Conclusion

One overall conclusion of the workplace evaluation is that the WIL approach has proven effective for (a) end users in explicit learner roles (e.g. trainees) in (b) highly specialized domains (such as EADS’s Electromagnetism Simulation domain) in which much of the knowledge to be learned is documented in work documents. In those circumstances, APOSDLE delivered an effective work-integrated learning solution that enabled relatively inexperienced knowledge workers to improve their knowledge efficiently by using the whole spectrum of learning guidance provided.

Our results concerning the productivity of experienced knowledge workers are inconclusive. Further evaluations need to be conducted to determine the effect on topic experts over time.

Secondly, our findings show that awareness building and descriptive learning guidance effectively support learning that is tightly intertwined with task execution. These ‘soft’ learning guidance mechanisms were used frequently by users at all levels of expertise. Moreover, people did not consider them additional steps or effort but rather an inherent part of task execution. On the other hand, our partially prescriptive support mechanisms were used almost exclusively by novices and new employees. This suggests that supportive measures derived from instructional theories that focus on longer-term learning processes require a ‘learner’s mindset’ and time dedicated to learning.

Finally, we can conclude that the domain-independent WIL design environment approach was successful. Relying on existing material instead of tailor-made learning material proved to be effective and cost-efficient. Crucial for this are good modelling tools, experienced modellers, and high-quality annotations of snippets (see Chap. 12). In a recent instantiation within a new organization and a new application domain, we were able to further reduce instantiation efforts (compare the efforts reported above for EADS, ISN, and CCI) to 120 person hours for 51 Topics, 41 Tasks, and 124 Learning Goals. We believe that these numbers are quite competitive when compared with the effort needed to instantiate a traditional Learning Management System at a site and to develop custom learning material.

Acknowledgements The Know-Center is funded within the Austrian COMET Program (Competence Centers for Excellent Technologies) under the auspices of the Austrian Ministry of Transport, Innovation and Technology, the Austrian Ministry of Economics and Labor and by the State of Styria. COMET is managed by the Austrian Research Promotion Agency FFG.

APOSDLE (www.aposdle.org) has been partially funded under grant 027023 in the IST work

programme of the European Community.

References

Baldauf M, Dustdar S, Rosenberg F (2007) A survey on context-aware systems. Int J Ad Hoc

Ubiquitous Comput 2(4):263–277

Boyle C (1994) An adaptive hypertext reading system. User Model User Adapt Interact 4(1):1–19

Brusilovsky P (2004) KnowledgeTree: A Distributed Architecture for Adaptive E-Learning. In:

Proceedings of WWW 2004, May 17–22, 2004, New York, New York, USA, 104–113

Crestani F (1997) Application of spreading activation techniques in information retrieval. Artif

Intell Rev 11:453–482


Dey AK, Abowd GD, Salber D (2001) A conceptual framework and a toolkit for supporting the

rapid prototyping of context-aware applications. Human Comput Interact 16(2):97–166

Doignon JP, Falmagne JC (1985) Spaces for the assessment of knowledge. Int J Man Mach Stud 23:175–196

Dotan A, Maiden NAM, Lichtner V, Germanovich L (2009) Designing with only four people in

mind? – A case study of using personas to redesign a work-integrated learning support system.

In: Gross T et al (eds) Proceedings of INTERACT 2009, Part II, Uppsala, Sweden, pp 497–509

Dotan A, Maiden N, Lockerbie J, de Hoog R, Leemkuil H, Ghidini C, Rospocher M, Lindstaedt SN,

Kump B, Pammer V, Faatz A, Zinnen A (2010) Summative evaluation report, deliverable

D6.12, EU project 027023 APOSDLE, City University, London, 2010

Eisenberg M, Fischer G (1994) Programmable design environments: integrating end-user programming with domain-oriented assistance. In: Adelson B, Dumais S, Olson J (eds) CHI’94. Conference proceedings, human factors in computing systems. ACM, New York, pp 431–437

Eraut M, Hirsh W (2007) The significance of workplace learning for individuals, groups and

organisations, SKOPE is based at Oxford and Cardiff universities

Jameson A (2003) Adaptive interfaces and agents. In: Jacko J, Sears A (eds) The human-computer interaction handbook: fundamentals, evolving technologies and emerging applications. Erlbaum, Mahwah, NJ, pp 305–330

Jones S, Lindstaedt S (2008) A multi-activity distributed participatory design process for

stimulating creativity in the specification of requirements for a work-integrated learning

system. Workshop at CHI 2008, Inderscience, www.inderscience.com

Klieber W, Sabol V, Muhr M, Kern R, Ottl G, Granitzer M (2009) Knowledge discovery using the KnowMiner framework. In: Proceedings of the IADIS’09, 2009, Inderscience, www.inderscience.com

Kooken J, Ley T, de Hoog R (2007) How do people learn at the workplace. Investigating four

workplace learning assumptions. In: Proceedings of EC-TEL 2007, Crete, Greece, pp 158–171

Kooken J, de Hoog R, Ley T, Kump B, Lindstaedt SN (2008) Workplace learning study 2.

Deliverable D2.5, EU project 027023 APOSDLE, Know-Center, Graz, 2008

Korossy K (1997) Extending the theory of knowledge spaces: a competence-performance approach. Zeitschrift für Psychologie 205:53–82

Lave J, Wenger E (1991) Situated learning: legitimate peripheral participation. Cambridge University Press, Cambridge

Ley T, Ulbrich A, Scheir P, Lindstaedt SN, Kump B, Albert D (2008) Modelling competencies for

supporting work-integrated learning in knowledge work. J Knowledge Manage 12(6):31–47

Ley T, Kump B, Maas A, Maiden N, Albert D (2009) Evaluating the adaptation of a learning system before the prototype is ready: a paper-based lab study. In: UMAP 2009, Trento, Italy, pp 331–336

Lichtner V, Kounkou A, Dotan A, Kooken J, Maiden N (2009) An online forum as a user diary for

remote workplace evaluation of a work-integrated learning system. Proceedings of CHI 2009,

2009, Boston, MA

Lindstaedt SN, Beham G, Kump B, Ley T (2009) Getting to know your user – Unobtrusive user

model maintenance within work-integrated learning environments. In: Proceedings of ECTEL

2009, Nice, France, pp 73–87

Lindstaedt SN, Ley T, Scheir P, Ulbrich A (2008) Applying scruffy methods to enable work-

integrated learning. Upgrade Eur J Inform Prof 9(3):44–50

Rath AS, Devaurs D, Lindstaedt SN (2009) UICO: an ontology-based user interaction context

model for automatic task detection on the computer desktop. In: Workshop on context,

information and ontologies, ESWC’09, 2009, Springer, Berlin and Heidelberg

Rath AS, Devaurs D, Lindstaedt SN (2010) Studying the factors influencing automatic user task

detection on the computer desktop. In: Sustaining TEL: from innovation to learning and

practice, Proceedings of EC-TEL 2010 (Lecture Notes in Computer Science), vol 6383.

Springer, Barcelona, Spain, pp 292–307

Resnik P (1999) Semantic similarity in a taxonomy: an information-based measure and its application to problems of ambiguity in natural language. J Artif Intell Res 11:95–130


Scheir P, Lindstaedt SN, Ghidini C (2008) A network model approach to retrieval in the Semantic

Web. Int J Semantic Web Inform Syst 4(4):56–84

Schugurensky D (2010) The forms of informal learning: towards a conceptualization of the field.

http://hdl.handle.net/1807/2733. Retrieved 10 July 2010

Stern H, Kaiser R, Hofmair P, Kraker P, Lindstaedt SN, Scheir P (2010) Content recommendation

in APOSDLE using the associative network. J Univ Comput Sci 16(16):2214–2231

Strang T, Linnhoff-Popien C (2004) A context modeling survey. In: Workshop on advanced

context modelling, reasoning and management, UbiComp’04, Nottingham, 2004

Wall TD, Jackson PR, Davids K (1992) Operator work design and robotics system performance: a

serendipitous field study. J Appl Psychol 77:353–362

Witten IH, Frank E (2005) Data mining: practical machine learning tools and techniques, 2nd edn.

Morgan Kaufmann, San Francisco

Wolpers M, Najjar J, Verbert K, Duval E (2007) Actual usage: the attention metadata approach.

Educ Technol Soc 10(3):106–121


15

Evolving Metaphors for Managing and Interacting with Digital Information

Natasa Milic-Frayling and Rachel Jones

N. Milic-Frayling
Microsoft Research Ltd, 7 J J Thomson Avenue, Cambridge, United Kingdom

R. Jones
Instrata Ltd, 12 Warkworth Street, Cambridge, United Kingdom

15.1 Introduction

We are in the midst of a digital revolution that permeates all areas of human endeavour. From the information management point of view, digital content is the most ephemeral of all the media that we have historically used to store and transfer information. Digital documents can be replicated easily and shared at unprecedented speeds. They can be aggregated, augmented, and transformed to generate new value. Yet digital media exhibit vulnerabilities that require special attention from the users and designers of digital systems. Indeed, digital media depend on a dedicated infrastructure to sustain their existence, access, and consumption.

One dominant aspect is the need to store digital content for reuse. Many usage scenarios rely upon persisting digital content while it is being authored or when it is viewed and organized for subsequent use. Thus, it is not surprising that the management of content storage has been an important part of the user’s interaction with information systems. It has distinctly shaped the design of applications and played a significant role in defining information management paradigms. As users move from the traditional Desktop environment to Internet services, we see increased reliance upon services to secure persistent data storage, even for the user’s personal content; e.g., Flickr (www.flickr.com) stores and organizes photo albums.

Similarly, authoring and publishing of digital information have changed with the wide adoption of the Web. Once two distinct functions, they have morphed into a more unified activity, with the focus on the efficiency of online publishing and communication. The traditional metaphors of documents and database records have been complemented by hypertext and content streams. Indeed, wikis, blogs, and micro-blogging services have emerged as self-sufficient authoring and publishing environments that produce distinctly different information formats. Wikis expand by growing the number of interconnected wiki pages and enforce access and editing control to coordinate collective authoring. Blogs provide a simple linear structure and support threaded conversations, similar to those in forums and online discussion groups.

While new authoring environments come with less flexibility in organizing and

reusing information, they are optimized for social interaction, communication, and

broad exposure. The design of services and user experience are shaped by the

continuous influx of information through content streams, ranging from email

and instant message threads to RSS feeds and dynamic Web pages (Milic-Frayling

et al. 2002; Fetterly et al. 2003; Teevan et al. 2009). This is particularly apparent with micro-blogs like Twitter (www.twitter.com), designed for the live broadcast of information snippets that spread through pre-defined types of social interaction, such as retweets and the follower relationship among participants.

Finally, the means of accessing information have also been affected by the paradigm shift. The substantial influx of information through email and Web services has had an impact on user practices, putting a significant strain on the manual filing of content. Indeed, the filing and classifying paradigm has caved under the volume of information and given way to search and, more recently, to lightweight tagging of user-generated content with keywords that are then exposed through tag clouds for content browsing (Cutrell et al. 2006; Dumais et al. 2003; Jones et al. 2001).

As we embark on yet another computing paradigm, data storage and computation in the cloud, information management models and practices are being redefined once again. Storage and computational infrastructure have become a commodity that can be “hired” as required. Content now resides in data centres that can optimize data processing and facilitate integration with related content repositories and services. This will have implications for the ways enterprises manage and leverage their information assets in the new environment. At the moment, much of the content management in the cloud is based on previous-generation software, and new applications and practices are still to evolve. How will one reconcile existing data stores and user practices with the emerging digital environments that are abundant with services, data repositories, and social engagement? What should be the principles of managing and interacting with information and data?

We explore some of these issues by reflecting upon the user experience and practices in the early days of Internet penetration and integration with the Desktop environment. We anchor our discussion around the findings of a user observation study that we conducted in 2004. The study highlights the changes in processes and practices that users developed to interact with digital content, from authoring, storage, and viewing to organizing and sharing information. This led to the consideration of user activities as a driver for information management (Bardram et al. 2006). That approach is particularly amenable to connecting the Desktop with the Web environment and services in the cloud. We discuss the metaphors of collections and compositions that allow for gathering information across distributed information repositories and support both rich representations of context and effective access to the collected resources (Oleksik et al. 2009; Kerne et al. 2008). In essence, we take the Web hypertext model to another level by supporting integrated content and activity spaces.

In the following sections we discuss the paradigm shifts and evolving metaphors

for information management through examples of user practices observed in two

studies. We conclude by reflecting upon future directions.

15.2 Information Management Processes

Information management systems are designed to facilitate interaction with information resources. Their characteristics affect the efficiency with which information workers are able to accomplish their everyday tasks. At the same time, the digital environments in which information is created and exchanged are evolving continuously. The self-contained and well-defined desktop paradigm has been challenged by the new opportunities for acquiring information through desktop connectivity to the Internet. Similarly, new ways of communication have emerged, ranging from synchronous messaging via Instant Messenger and voice over IP to discussions in community forums, blogs, and, most recently, social media sites (LinkedIn, Facebook, etc.). This increased exposure to information and services makes new demands on the user’s engagement with technology in terms of the complexity and variety of tasks that need to be accomplished. At the same time, the applications and tools that are supposed to support their work remain disconnected and inflexible.

We shall reflect upon the emergence of these issues by discussing an early study of the Web-enabled desktop environment. Back in 2004 we observed a dynamic, fast-paced workplace and had a chance to see how the change had impacted the

workers. The task overload and the increased volume of received and produced data

were apparent. That led to frequent breakdowns in the use of existing applications.

The users continued to push the boundaries of the available facilities in order

to handle fast changing and demanding situations. Yet, they were experiencing

difficulties. We analysed the sources of their inefficiencies and reported the detailed

findings in (Milic-Frayling et al. 2006). Here we reflect on selected aspects that

are relevant to our discussion of the changing information management practices.

15.2.1 In-Situ Observations of Information Workers’ Practices

In October 2004 we gained access to a workplace at a leading international public relations firm and conducted in-situ observations of nine employees over a period of 8 days. In selecting the participants, we made a conscious decision to improve on previous studies that looked at individual workers in isolation. We chose a group of co-workers whose activities were intricately interwoven. They were collocated in an open-plan office, with their desks put together in an elongated area. Sitting across from each other, separated only by their personal computers, they could easily collaborate on projects and assist each other with individual tasks. In their work they used Microsoft desktop applications, including Instant Messenger, MS Internet Explorer, MS Excel, MS Word, MS PowerPoint, and MS Outlook.

In order to get a broader view and investigate both offline and online information

and communication activities, we applied a hybrid method described by (Jones

et al. 2007). We combined complementary methods: repeated interviews with the

participants, on-site observations, and activity logging, and analyzed these three

sources of information in concert to gain deeper insights.

The researcher who conducted the observations spent 8 days on site. She was provided with a desk close to the participants. That enabled her to observe activities continuously and to approach an individual or a group whenever she judged it important for the study. She recorded activities unobtrusively using a portable video camera and made a conscious effort to minimize the impact on the participants. By installing logging software on each participant’s machine, we continuously captured the application-level events that resulted from the user’s interaction with the Windows computing environment. The logging software also captured screenshots of the desktop display every 5 min. Synchronized with the rest of the data, this visual aid proved invaluable for viewing and interpreting the log data.

Through integrated data analysis we arrived at rich profiles of participants’

practices (Milic-Frayling et al. 2006). Here we discuss in more detail the emerging

workflows, multi-tasking, and interruptions that the participants dealt with and

the rise of communication media as an important factor in shaping information

management practices.

15.2.1.1 Workflow and Task Management

Most of the participants maintained paper or electronic to-do lists to keep track of

their work plans. However, our observations revealed that their typical workflows

were sensitive to various triggers and required flexibility and re-adjusting of plans

according to the emerging circumstances.

Indeed, the participants worked in teams, on tasks with multiple overlapping

activities. They were responsible for press releases, briefing clients before media

interviews, and preparing information packs for other agencies that were dealing

with their clients. Their business required them to be informed of any media that

mentioned or affected their clients and to respond to new information and new

developments as they occur. They employed RSS news services to help them stay

abreast of the happenings. The main such service, NewsNow, delivered news via

email. They also subscribed to services like Media Disk and Factiva to access

information about local agencies and monitored alternative news sites where news

sometimes breaks first, such as the Recorder site. Essentially, they checked their

email continuously, either for news or for information requests from agencies,

media contacts, and clients.


Fig. 15.1 depicts a typical task workflow. A task begins with a trigger event: a request from a colleague or client, or a personal reminder that prompts a planned activity. The participants would first try to locate relevant information and then continue with authoring or communication. During this process they might consult or collaborate with colleagues by exchanging information electronically or face-to-face, sitting next to a person or seeking advice by conversing across the desk.

The tasks normally required focus and concentration. Dealing with interruptions was one of the apparent challenges, and the participants used various methods to keep track of multiple tasks and respond to changes. Furthermore, the tasks required multiple desktop applications, and the participants were constantly switching between them. While this was a persistent phenomenon, it exhibited highly variable patterns, even for the same individual. This is apparent from the time log analysis in Fig. 15.2, which shows the patterns of switching between application windows on three different days for the same participant. We explicitly indicate the three major categories of activities: authoring, searching and browsing, and communicating via email. The dark lines and boxes in the timelines (Fig. 15.2) show the periods that the user spent switching between application windows without remaining in any one for 5 s or more. Some of the observed patterns can be traced to specific characteristics of the desktop user interface.
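The rule behind the dark boxes in Fig. 15.2, periods spent switching between windows without staying in any one for 5 s or more, can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the analysis code used in the study: the log is assumed to be a time-ordered list of focus-change events, and the names `FocusEvent` and `switching_periods` are hypothetical. Only the 5-second threshold comes from the text.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class FocusEvent:
    timestamp: float  # seconds since logging started (assumed log format)
    window: str       # window or application that gained focus


def switching_periods(events: List[FocusEvent], end_time: float,
                      threshold: float = 5.0) -> List[Tuple[float, float]]:
    """Return (start, end) intervals in which every window kept focus for less than `threshold` s."""
    periods: List[Tuple[float, float]] = []
    run_start: Optional[float] = None
    for i, ev in enumerate(events):
        # A window keeps focus until the next event, or until the log ends.
        next_t = events[i + 1].timestamp if i + 1 < len(events) else end_time
        dwell = next_t - ev.timestamp
        if dwell < threshold:
            if run_start is None:
                run_start = ev.timestamp  # a burst of short visits begins
        elif run_start is not None:
            # The burst ends when some window is held for at least `threshold` seconds.
            periods.append((run_start, ev.timestamp))
            run_start = None
    if run_start is not None:
        periods.append((run_start, end_time))  # burst still running at the end of the log
    return periods


log = [FocusEvent(0, "Word"), FocusEvent(60, "Outlook"), FocusEvent(62, "Outlook: message 1"),
       FocusEvent(65, "Outlook: message 2"), FocusEvent(68, "Word"), FocusEvent(200, "Internet Explorer")]
print(switching_periods(log, end_time=300))  # [(60, 68)]
```

In this made-up log, Word holds focus for a minute, three Outlook windows are each glanced at for 2–3 s, and Word is then used for over two minutes, so the three short visits collapse into a single switching period.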

For example, the participants maintained high awareness of the applications that were open and frequently switched between them. Still, they often confused documents with one another and would maximize the wrong window. The tabs on the MS Windows taskbar were the primary means of accessing open documents and applications. However, the tabs truncated the visible file information and thus provided suboptimal support. We also observed that participants managed email by opening each message in a separate window and deleting unimportant ones within a few seconds. Thus, it is not surprising that dark areas, corresponding to many reviewed email messages, are adjacent to the longer intervals of communication that involved writing emails for sustained periods of time.

The analysis of user logs also showed switching between different applications (Fig. 15.3). Tasks like the preparation of press releases, interview briefings, and information packs required the use of MS Word, MS Outlook, MS Excel, IE, and MS PowerPoint together. Since these applications are disconnected and optimized for work on a specific type of document, the participants experienced many problems when trying to use them in concert and move content between them. Often they would compose a document in MS Word and then export it to MS Outlook, only to find that fonts had changed, bullet points had moved, and spacing was irregular.

Fig. 15.1 The workflow diagram shows the task trigger and the “task loop” with typical activities that involve locating, communicating, authoring, and filing information

Fig. 15.2 Beth’s time log for three separate days (15-Oct, 20-Oct, 26-Oct). The logged activities are identified in the key: Communication, Authoring, Browse/Search, Other, and Idle. The black boxes indicate periods in which all applications were viewed for less than 5 s. Published with permission from (Milic-Frayling et al. 2006)

Some of the participants’ practices were shaped by the customers’ demands.

For example, some customers avoided email attachments and expected information

to be included in the body of the email message. That meant either using email as

the main authoring environment or moving content from MS Word and MS PowerPoint documents into the email message.

15.2.1.2 Impact of Communication on Content Storage and Organization

The predominant use of email as a communication medium had ripple effects not only on authoring practices but equally on the storage and organization of project- and customer-related information (Fig. 15.3). Incoming emails were prioritized and often time-stamped by moving them into the Outlook calendar. Once read and dealt with, important messages were filed and kept as accountable records, especially those concerning communication with clients or journalists. The participants often filed emails into folders named after clients. This expanded role of email resonated well with the notion of email as a habitat (Ducheneaut and Bellotti 2001) and with earlier observations (Mackay 1988).

Fig. 15.3 Aggregate statistics across participants show the typical time breakdown across desktop

applications and implied task types on a typical work day


A female participant took the lead in filing and organizing information for the group. If an email arrived from someone new, she would set up a new folder. Her colleagues often asked her for the location of information they were looking for. The team primarily used the folder system within MS Outlook but also filed emails on the shared network G: drive, especially those that were needed by other team members.

Participants frequently “lost” documents and emails, and spent time looking for

them across Outlook folders, personal network drives, and the shared drive. They

searched Outlook by sender or by using the “advanced search” feature. This was

often a frustrating experience: “Outlook searches against names and sometimes

keywords, but never by dates. It would be good if search could break out the results

by months rather than just a long list.”
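The participant’s wish for search results broken out by month amounts to a simple grouping step over the result list. A minimal Python sketch follows, assuming each search hit carries its received date; `EmailHit` and `group_by_month` are hypothetical names, and the sketch makes no claim about how Outlook itself performs search.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Tuple


@dataclass
class EmailHit:
    subject: str
    sender: str
    received: datetime


def group_by_month(hits: List[EmailHit]) -> Dict[Tuple[int, int], List[EmailHit]]:
    """Bucket search hits by (year, month): newest month first, newest hit first within a month."""
    buckets: Dict[Tuple[int, int], List[EmailHit]] = defaultdict(list)
    for hit in hits:
        buckets[(hit.received.year, hit.received.month)].append(hit)
    # Dicts keep insertion order, so inserting keys in descending order yields newest-first groups.
    return {
        month: sorted(buckets[month], key=lambda h: h.received, reverse=True)
        for month in sorted(buckets, reverse=True)
    }


hits = [
    EmailHit("Press release draft", "client@example.com", datetime(2004, 10, 15, 9, 30)),
    EmailHit("Media list", "agency@example.com", datetime(2004, 9, 2, 14, 0)),
    EmailHit("Interview briefing", "client@example.com", datetime(2004, 10, 20, 11, 5)),
]
for (year, month), group in group_by_month(hits).items():
    print(f"{year}-{month:02d}: {[h.subject for h in group]}")
```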

In many instances, the information was embedded in received and sent emails that were carefully archived in network folders. However, the context of an attached document or email was lost once it was removed from Outlook and placed on a local or shared drive. There it was not possible to search on all the email attributes: “It is not as easy to search on the H (personal network) drive as it is in Outlook. Explorer does not recognize names, just keywords in files. No option to search by asking who did you get it from or who sent it to you.” Indeed, our study pre-dated the desktop search releases by Google and Microsoft. Google delivered this feature in the form of a plug-in associated with the Web browser toolbar. Previous research did show the importance of search as a unifying access medium across application data silos (Dumais et al. 2003).

15.2.2 Changing Information Management Metaphors

Dynamic, fast-paced environments, such as the one we observed, are becoming more commonplace (Gonzalez and Mark 2004). Workers have to accommodate emergent activities and work reactively when needed. They are required to accomplish multiple tasks in parallel, despite constant interruptions. As the study revealed, their experience is affected by several distinct issues:

• Increased rate of acquiring and handling information leads to higher demands for

multi-tasking and micro-switching between applications

• Information silos created by desktop applications hinder the user tasks that

involve multiple applications and services

• Communication media, such as email, are emerging as a unifying function that includes and contextualizes digital artifacts but lack appropriate access and archiving support.

Over the years, researchers have proposed alternatives to the traditional PC desktop metaphor and designed tools to address similar information management issues. Kaptelinin (2003) provides a comprehensive review of proposed models and tools. Among them are attempts to create dedicated project spaces, such as Rooms (Henderson and Card 1986), Task Gallery (Robertson et al. 2000), Manufaktur, Kimura, and similar systems. Well aligned with our observations are recommendations to create communication-based work environments, such as Taskmaster and ContactMap. These approaches either imposed a significant overhead on the workers’ time or excluded important work scenarios. Systems like Lifestreams, Presto, the MIT Semantic File System, and MIT Haystack attempted to remove the overhead of maintaining the hierarchical organization of the file system but had limited success. Because they relied on search and filtering, users had to define formal criteria to select relevant information, which has proven to be difficult in general.

In search of appropriate metaphors to support the user across the evolving paradigms, from desktop and Internet to cloud and multi-device computing, we consider two important aspects: the shift from storage management to content organization, and the benefit of activity-focused information management for capturing a rich usage context.

15.2.2.1 Content Storage and Organization

Considering the use of Web-based authoring, e.g., through blogs, forums, online wikis, and social sites like Facebook and Flickr, we observe a separation between storing and organizing the content. The storage is tightly coupled with the Web application and hidden from the author. Thus, the user is mainly focussed on organizing information through the facilities provided in the user interface. Content is organized across Web services through Internet browser bookmarks that hold URL references to the content. Instead of managing the digital artefacts themselves, users organize resource links that the Internet services resolve in order to present the content upon user request.

Within the desktop environment, the separation of storage and content organization can be observed in the design of applications such as MS OneNote, which is based on the notebook metaphor. The storage of individual OneNote pages, and of the files embedded in them, is not exposed to the user. Yet, the user can organize information in a variety of ways. Storage and organization of the content are completely enclosed within a single application interface.

We also note that user activities in forums, discussion groups, Facebook, and Twitter tightly integrate publishing and authoring and emphasize instant exposure of the content. This is facilitated by constraining the user interaction and content format to a predefined organizational structure. By supplying templates that determine organization and interaction, these services redirect the focus from content storage and organization to the information broadcast.

15.2.2.2 Application and Activity Management

User interaction with digital information comprises authoring, storing, viewing,

organizing, and sharing content through communication and publishing media.

Optimization of these functions within and across applications directly impacts

the user experience and the outcome of the user activities.


Furthermore, we observed in the study that email, as a content-rich communication medium, captured resources related to the user tasks and provided valuable context for embedded documents. Generalizing this principle, we anticipate that mechanisms for capturing associations among resources based on user activities would be beneficial.

Refocusing information organization on activities rather than software applications was proposed as early as 1983 (Bannon et al. 1983). However, this shift has to be approached carefully. Resorting to the workspace metaphor and clearly delineating individual activities may cause problems with activity switching, similar to those observed with application switching. Furthermore, user activities may be transient, dynamic, or not yet completely formed (Bardram et al. 2006; Kaptelinin 2003). Thus, users may have difficulty pre-defining their activities. The same problem was observed with the organization of applications and documents into “working spheres” suggested by Gonzalez and Mark (2004) and Mark et al. (2005).

Persisting the activity state is likely to help with interruptions, which disrupt the planned workflow and often originate from communication. Czerwinski et al. (2000) looked at the implications of instant messaging (IM) for interruptions in task management. In a diary study, Czerwinski et al. (2004) analyzed the requirements for tools that aid recovery from interruptions, and the GroupBar tool (Smith et al. 2003) was developed to let the user organize application windows into sets that can be easily folded away or recalled when switching from one task to another.

15.2.2.3 Unified User Experience

Considering the interconnection of authoring, storing, viewing, organizing, and sharing digital content, one faces the dilemma of whether to
• Support all or most of these phases in each application, or
• Provide a meta-layer that interconnects applications and leverages complementary functions across them.

Reflecting upon the work that has been done so far, we see several trends. Bellotti et al. (2003) implemented Taskmaster, a system designed for tight integration of email, task, and project management. They introduced thrasks, i.e., threaded task-centered collections of items, and aggregated rich metadata about task-related content. Boardman and Sasse (2004) suggested looking at a broader spectrum of user activities to understand the benefits of alternative design strategies. They contrast the Taskmaster approach with the approach taken by Isaacs et al. (2002) and Kaptelinin (2003), where the design involves a consolidated interface that unifies interaction across the tools. In the Stuff I’ve Seen system (Dumais et al. 2003), this is accomplished through a unified search interface. In UMEA (User-Monitoring Environment for Activities) (Kaptelinin 2003), the system provides a project-centered overview of the information space with an integrated user monitoring function and user data.

With these insights, we decided to explore a hybrid approach. We designed a unifying tagging facility for gathering activity-related resources across desktop applications and online services. At the same time, we extended each application with activity management features that enable task switching and micro-switching from the context of the current application. We essentially move towards the metaphor of virtual collections, where light-weight tagging is a mechanism for collecting relevant content from remote or local repositories in the context of the user’s work.

15.3 Supporting Activities in Dynamic Work Environments

The user’s perspective and objectives evolve over time, and this is reflected in the reuse of existing documents or the creation of new ones. A collection of digital resources used at a given time represents a stage in the user’s activity and offers a valuable context. To capture these stages, we implemented the TAGtivity prototype, which enables light-weight tagging of resources relevant to the user’s work, including local and remote documents, email messages, Web pages, and storage locations such as folders, services, and databases (Oleksik et al. 2009). Resources can be labelled at the desktop level and within individual applications, enabling tagging and micro-switching without leaving the current application context.

A similar approach was taken by Voida et al. (2008) with the Giornata system, and was promoted through the Placeless Documents project and the resulting Presto system (Dourish et al. 1999). With Giornata, the user can operate in multiple virtual desktops and separate the resources associated with distinct activities. Any file accessed within a specific desktop is automatically linked to the corresponding desktop tag. One can also assign tags to individual files, but this was designed as a less accessible feature, available through the file property settings. Presto, on the other hand, allowed users to specify and apply attributes to individual documents and use them to retrieve, index, and organize documents for specific tasks. While users could browse and further tag the resulting collections through a purpose-designed Vista browser, the tagging facility was not closely integrated with the desktop applications and the user’s workflow.

We have learnt from these systems and based the TAGtivity prototype on two

essential design principles:

1. A user activity is represented as a set of references and metadata rather than the document files themselves, and
2. Document tagging is light-weight, based on user-generated labels. This reduces the overhead of maintaining a strictly controlled vocabulary of tags and maximizes the flexibility of use.
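To make the two principles concrete, here is a minimal Python sketch of a reference-based tag store. It is an illustration under stated assumptions, not the TAGtivity implementation: the names `Resource`, `TagStore`, `rename_tag`, and `delete_tag` are hypothetical, and an in-memory dictionary stands in for the database that the prototype used. Because tags hold only references, one resource can belong to several activities, and renaming or deleting a tag leaves the underlying files untouched.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Resource:
    """A reference to a resource; the document itself is never copied or moved."""
    uri: str          # hypothetical: local path, URL, or mail message ID
    title: str = ""


@dataclass
class TagStore:
    """Hypothetical in-memory store mapping user-chosen labels to references."""
    tags: dict[str, set[Resource]] = field(default_factory=dict)

    def tag(self, label: str, resource: Resource) -> None:
        # Free-form labels: a tag is created on first use, no controlled vocabulary.
        self.tags.setdefault(label, set()).add(resource)

    def resources(self, label: str) -> set[Resource]:
        return set(self.tags.get(label, set()))

    def rename_tag(self, old: str, new: str) -> None:
        # Renaming changes only the label; existing associations are kept
        # (and merged if the new label is already in use).
        if old == new:
            return
        moved = self.tags.pop(old, set())
        self.tags.setdefault(new, set()).update(moved)

    def delete_tag(self, label: str) -> None:
        # Deleting a tag discards references only; the files are untouched.
        self.tags.pop(label, None)


store = TagStore()
report = Resource("file:///home/user/report.docx", "Report")
store.tag("NodeXL Study", report)
store.tag("Q3 planning", report)          # same resource in a second activity
store.rename_tag("NodeXL Study", "NodeXL Study - UMD")
store.delete_tag("Q3 planning")
```

Deleting "Q3 planning" removes only that label; the report stays reachable through the renamed tag, which mirrors the low-risk deletion the study participants valued.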

We deployed TAGtivity in a natural user setting and observed the use of tagging over time. We analyzed the emerging usage patterns and the principles of organizing information that the users applied. The details of the study and the results are reported in Oleksik et al. (2009). Here we discuss the findings that are pertinent to our reflections on the changing desktop metaphor. In particular, we demonstrate the use of references to create a logical organization of distributed and heterogeneous resources that complements the file system organization. We also point to the affordances of tagging that help with the issues identified in Sect. 15.2: the management of dynamic content streams and the increased demand for multi-tasking and attentiveness to emerging tasks.

15.3.1 TAGtivity Features

The TAGtivity prototype consists of two UI components: the TAGtivity Manager, implemented as a deskbar, and the TAGtivity Toolbar, associated with individual applications. Both enable users to assign existing tags to resources or create a new tag as needed. The prototype is compatible with the Microsoft Windows 7, Vista, and XP operating systems and the Microsoft Office 2007 suite. In addition to MS Office documents, one can tag a broad range of document types by using a drag-and-drop facility to associate them with the activity.

Figure 15.4 shows the TAGtivity deskbar that includes the TAGtivity Manager

(TM), a centralized place where users can manage tags and access their activities and

resources. The TM displays a list of the user’s tags in a selected order: alphabetically,

by recency of use, or by the number of associated resources. One can access a resource

by clicking on the title. The reference to the resource file is resolved via a database that

stores the association of tag labels and corresponding files or locations. The file or

location is then opened in the corresponding software application.
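The following sketch shows how a tag-to-location database of this kind might serve the three list orders and resolve a tag to its resources. The table name `tag_resource`, its columns, and the helper functions are assumptions made for illustration, since the actual TAGtivity schema is not described here. The sketch uses Python's built-in `sqlite3` module with an in-memory database.

```python
from __future__ import annotations

import sqlite3

# Hypothetical schema: one row per (tag label, resource location) association.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE tag_resource (
           label     TEXT NOT NULL,
           location  TEXT NOT NULL,   -- file path, folder, URL, or message ID
           last_used REAL NOT NULL,   -- time of the most recent use
           PRIMARY KEY (label, location))"""
)

# ORDER BY clauses for the three list orders; only these fixed strings are
# ever interpolated into SQL, never user input.
ORDERINGS = {
    "alphabetical": "label COLLATE NOCASE ASC",
    "recency": "MAX(last_used) DESC",
    "count": "COUNT(*) DESC",
}


def list_tags(order: str) -> list[str]:
    """Return tag labels in the requested order, ties broken by label."""
    sql = (
        "SELECT label FROM tag_resource GROUP BY label "
        f"ORDER BY {ORDERINGS[order]}, label"
    )
    return [row[0] for row in conn.execute(sql)]


def resolve(label: str) -> list[str]:
    """Resolve a tag label to the stored locations of its resources."""
    rows = conn.execute(
        "SELECT location FROM tag_resource WHERE label = ? ORDER BY location",
        (label,),
    )
    return [row[0] for row in rows]


conn.executemany(
    "INSERT INTO tag_resource VALUES (?, ?, ?)",
    [
        ("robotics", "https://example.org/paper.pdf", 100.0),
        ("Budget", "C:/finance/budget.xlsx", 300.0),
        ("Budget", "C:/finance/notes.docx", 200.0),
        ("archive", "C:/old", 50.0),
    ],
)
```

Opening a resolved location in its associated application is platform-specific (for example, `os.startfile` on Windows), so the sketch leaves that step out.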

The text box at the top of the list allows the user to access a specific tag or to create a new one. As the user types into the text box, the list of tags is filtered to show only matching tags. If the keyword is not found in the list, the user can choose to use it as a new tag. Once the tag is created or found, the user can drag and drop a resource onto the tag to create the association.

Fig. 15.4 TAGtivity Manager comprising the list of user tags, tagged resources, and thumbnail previews with metadata about each item
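Below is a minimal sketch of the text box's filter-or-create behaviour. The text above does not specify the details, so two rules are assumptions: matching is a case-insensitive substring match, and a new tag is offered when no existing tag equals the typed keyword.

```python
from __future__ import annotations


def filter_tags(tags: list[str], typed: str) -> tuple[list[str], bool]:
    """Return the tags matching the typed text and whether to offer a new tag."""
    query = typed.strip().casefold()
    # Assumed matching rule: case-insensitive substring match.
    matches = [t for t in tags if query in t.casefold()]
    # Assumed rule: offer to create a tag when no existing tag equals the keyword.
    offer_new = bool(query) and all(t.casefold() != query for t in tags)
    return matches, offer_new


tags = ["NodeXL Study - UMD", "robotics", "Budget 2010"]
print(filter_tags(tags, "Rob"))       # (['robotics'], True)
print(filter_tags(tags, "robotics"))  # (['robotics'], False)
print(filter_tags(tags, "study"))     # (['NodeXL Study - UMD'], True)
```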

We also designed and implemented a TAGtivity Toolbar as an extension of

the main MS Office 2007 applications: Word, Excel, PowerPoint and Outlook, and

Internet Explorer 7 (IE7) (Fig. 15.5). Within the IE7 browser, each browser tab

is handled independently. TAGtivity Toolbars are located at the bottom of each

application window.

Like the TAGtivity Manager, the toolbar provides a text box in which the user can type keywords to find existing tags or create new ones. The user can attach a tag to the current resource by selecting a tag from the list or by typing a new one into the text box.

One important aspect of TAGtivity is its integration with the file system. TAGtivity enables the user to associate files and folders with tags. The user can simply drag and drop a folder onto the tag and confirm whether to associate the individual files or the folder as a whole with the tag. In the former case, all the files from the folder are added to the activity list and can be accessed independently. In the latter case, the folder location is added to the list and the user can access the folder content through the file system hierarchy presented in Windows Explorer.
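The two folder-association options can be sketched as follows, assuming that tags map labels to sets of absolute path strings. `associate_folder` is a hypothetical helper. In the per-file case it adds only the files directly inside the folder, because the text does not say whether subfolders are included.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def associate_folder(tags: dict[str, set[str]], label: str,
                     folder: Path, as_whole: bool) -> None:
    """Associate a dropped folder with a tag (hypothetical helper).

    as_whole=True  -> store the folder location itself; its content is browsed
                      later through the file system hierarchy.
    as_whole=False -> store each file directly inside the folder as its own
                      reference, so the files can be accessed independently.
    """
    refs = tags.setdefault(label, set())
    if as_whole:
        refs.add(str(folder.resolve()))
    else:
        refs.update(str(p.resolve()) for p in folder.iterdir() if p.is_file())


with tempfile.TemporaryDirectory() as d:
    root = Path(d)
    (root / "a.txt").write_text("x")
    (root / "b.txt").write_text("y")
    (root / "sub").mkdir()                  # subfolders are not expanded
    tags: dict[str, set[str]] = {}
    associate_folder(tags, "per-file", root, as_whole=False)
    associate_folder(tags, "whole", root, as_whole=True)
    per_file_names = sorted(Path(p).name for p in tags["per-file"])  # ['a.txt', 'b.txt']
```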

Furthermore, the user can “export” a tag. The TAGtivity application then creates a tag folder within the file system containing links to the resources and metadata about the files associated with the tag. The metadata typically includes the title, the author, the storage location, and a thumbnail image of the application displaying the file or location. If desired, the export function can create a copy of all the associated files, thus providing archiving support for past activities.

Fig. 15.5 TAGtivity Toolbar for Microsoft PowerPoint, showing the set of tags associated with the document and the list of items associated with the specific tag “NodeXL Study – UMD.” The thumbnail of a specific item is displayed on mouse hover over the title
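To illustrate the export function, the sketch below writes a tag folder that holds a JSON manifest of item metadata and, optionally, copies of the local files. The manifest format, the `files` subfolder, and the file-name sanitizing rule are assumptions. The text states only that the export contains links and metadata (title, author, location, thumbnail) and can optionally copy the files. Thumbnails are omitted for brevity.

```python
from __future__ import annotations

import json
import shutil
import tempfile
from pathlib import Path


def export_tag(label: str, items: list[dict], dest_root: Path,
               copy_files: bool = False) -> Path:
    """Export a tag as a folder with a metadata manifest and optional file copies.

    Each item is a dict with the assumed keys 'title', 'author' and 'location'.
    """
    # Replace characters that are awkward in folder names (assumed rule).
    safe = "".join(c if c.isalnum() or c in " -_" else "_" for c in label)
    tag_dir = dest_root / safe
    tag_dir.mkdir(parents=True, exist_ok=True)
    manifest = {"tag": label, "items": items}
    (tag_dir / "manifest.json").write_text(json.dumps(manifest, indent=2),
                                           encoding="utf-8")
    if copy_files:
        archive = tag_dir / "files"
        archive.mkdir(exist_ok=True)
        for i, item in enumerate(items):
            location = item["location"]
            if "://" in location:
                continue                    # URLs stay as references only
            src = Path(location)
            if src.is_file():
                # Prefix with the item index so equal file names do not collide.
                shutil.copy2(src, archive / f"{i:03d}_{src.name}")
    return tag_dir


with tempfile.TemporaryDirectory() as d:
    root = Path(d)
    slides = root / "slides.pptx"
    slides.write_bytes(b"demo")
    items = [
        {"title": "Slides", "author": "A. User", "location": str(slides)},
        {"title": "Project page", "author": "", "location": "https://example.org/"},
    ]
    out = export_tag("NodeXL Study: UMD", items, root / "exports", copy_files=True)
    manifest = json.loads((out / "manifest.json").read_text(encoding="utf-8"))
    copied = sorted(p.name for p in (out / "files").iterdir())
```

Keeping the manifest separate from the copies lets the same export serve both purposes described above: a lightweight set of links, or an archive of a past activity.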

15.3.2 TAGtivity User Study

We observed the use of the TAGtivity prototype by 16 participants over a period of 3 weeks (Oleksik et al. 2009). The participants included four employees of a small software development company, seven research interns, three full-time research scientists, one intern in a legal department, one independent market researcher, and one small business owner. Participants were aged between 20 and 60; 14 were male and 2 female. They were compensated for their participation with computer software or hardware accessories at the end of the study.

During the study we conducted four interviews with each participant, first to

capture information about the existing data management practices and then to learn

about their use of TAGtivity over time. We recorded and transcribed the interviews

and analysed TAGtivity logs to study the relationship between the tags and the

projects, tasks, and activities that the users conducted during the time of the study.

This revealed that:

• Tagging extends the file system function by providing additional views, or logical organizations, of the content stored in the file system.

• With tags, the users can capture ephemeral information that would otherwise

be unrecorded since the users would not create a folder to hold it.

• Tagging supports activity management. It helps with collecting resources related

to a task, enables flexible switching between tasks, and allows association of the

same resource across multiple tasks.

A detailed study report is provided by Oleksik et al. (2009). Here we outline several aspects that are of interest to our discussion of the changing paradigms and metaphors.

The tags were used for the management of ephemeral information, as assistance for time saving, planning, and emerging activities, and for reorganizing and repurposing content across tasks through alternative logical groupings of the data.

These aspects are directly relevant to the heightened demand for responsiveness and multi-tasking that users are exposed to, as observed in Sect. 15.2. In the extended PC environment, the relatively stable organization of files and folders has been replaced by a new metaphor: the flow of information. Diverse content streams have penetrated the PC environment through the Internet browser, email, and RSS feeds, and they require support for capturing, organizing, and revisiting relevant content. Similarly, increased communication triggers new tasks and requires support for planning activities.

From the interviews we found, for example, that tags offered the means of low-

overhead planning and time saving:


• Place holding. Eleven users created tags as place holders for future activities. For example, one of the participants created a tag to gather interesting papers and online links related to robotics. However, he did not add any items to the tag until a week later. In this instance, the tag was created in anticipation of finding relevant resources at some future point.

• New project. Eleven users created a tag at the beginning of a new project. Unlike

place holding tags, these were created with the intention to label relevant

documents right away.

• Time saving. Ten users created a tag to mark resources that were difficult to find.

This was often the case with documents found through search and browsing.

By bookmarking the item using a tag, the users circumvented the need to engage

in the same process again.

Furthermore, the tags were used for gathering information across desktop and online services, which is normally not well supported by standard desktop facilities. They enabled groupings of heterogeneous resources that could not have been put together through a single application, e.g., the file browser or the Internet browser.

During the study, 742 resources were tagged and 608 were accessed through the TAGtivity user interfaces. They covered a broad range of resource types, including 157 email messages, 98 Web pages, and 174 Word or PDF documents. This confirmed an important property of tags as a unifying mechanism for accessing resources across data silos and capturing relevant information from email streams.

Finally, the tags provided a unique advantage over the existing, more static and permanent filing metaphor. By making it convenient to capture references to relevant content, they supported short-term, transient, and emerging activities and enabled the meta-organization of resources needed in the user’s tasks.

Indeed, TAGtivity was found effective for managing short-term tasks and the early stages of longer-term activities. This was observed with 12 users. Tags enabled them to collect and associate resources before a task was well formulated, and the tag names could easily be modified as the task progressed. For short-term tasks, TAGtivity helped to manage resources up to task completion, at which point the tag would be removed if the resources were no longer deemed relevant. Generally, however, the tags were kept and left traces of transient activities that would normally not warrant the creation of a file or a bookmark folder.

TAGtivity was also used to create alternative views, i.e., logical organizations of the content in the file system or other data stores. One of the participants, who conducts market research for various customers, stated that TAGtivity enabled alternative ways to “organize my files without creating them [folders]; so it helped me group them based on my processes and my needs.”

15.3.3 Discussion

The study of TAGtivity revealed that simple desktop tagging can provide significant benefits. Participants often commented on the ease and low overhead of creating tags. This perception made tagging attractive even for the most transient activities. It enabled the creation of collections that could be easily dispersed when no longer needed. Indeed, since tags only reference the content, deleting a tag is low risk and almost inconsequential. This fine interplay between persistent file storage and tags supports a rich set of new practices and accommodates transient user needs and early-stage information management.

While TAGtivity is not an activity management application in the sense of Smith et al. (2003), Boardman and Sasse (2004), and Bardram et al. (2006), it has proven helpful in supporting users in their everyday tasks. By enabling tagging from within applications, it supports multi-tasking. Users can tag a document with multiple tags and easily access related resources without changing the current context. This also helps with interruptions, since users can capture the resources of an emerging task without shifting the focus of their work. Finally, through tagging, users brought together and made easily accessible content that was otherwise buried in application-specific data stores, such as email exchanges, or in the file system hierarchy. The visibility of tags and tagged resources raised awareness and served as a reminder of activities that required attention.

Through the observed practices and feedback from users, we gained valuable insights about the very essence of tagging, i.e., the use of references to create associations among digital objects. The importance of handling “broken” references and access control became apparent. If tagging were to be adopted broadly, it would need to be supported by extensions of the file system with a notification mechanism for tracking changes in file status, such as deletion or movement to another storage location. With regard to access control, the tagged items currently inherit it from the file management system. In the future, this can be reversed by enabling users to specify access properties at the tag level and have them propagated to the files and folders of the tagged resources. Finally, sharing tagged resource collections and reusing tags among multiple individuals opens up a host of additional design considerations. For example, Mendes Rodrigues et al. (2008) outline practices that evolve around the social tagging of user-generated content in online communities.
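Until the file system provides such notifications, one simple stand-in is to poll tagged references and report those whose target no longer exists. The sketch below does this for local paths. It is a polling substitute for the notification mechanism proposed above, not an implementation of it, and the function name is hypothetical. A production system would more likely subscribe to operating-system change notifications.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def find_broken_refs(tags: dict[str, set[str]]) -> dict[str, list[str]]:
    """Return, per tag, the local file or folder references whose target is gone."""
    broken: dict[str, list[str]] = {}
    for label, refs in tags.items():
        # URLs are skipped: checking them would need a network request.
        missing = sorted(r for r in refs if "://" not in r and not Path(r).exists())
        if missing:
            broken[label] = missing
    return broken


with tempfile.TemporaryDirectory() as d:
    kept = Path(d) / "kept.txt"
    moved = Path(d) / "moved.txt"
    kept.write_text("still here")
    moved.write_text("about to move")
    tags = {"review": {str(kept), str(moved), "https://example.org/"}}
    moved.rename(Path(d) / "elsewhere.txt")   # simulate a move outside the tagging tool
    report = find_broken_refs(tags)
```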

In summary, activity management as a principle of organizing information, and tagging as a supporting mechanism, facilitate an important shift from content management to activity management. They form important building blocks for the emerging computing paradigms: the computational cloud with distributed data and services, and multi-device personal computing. We reflect on these in the following section.

15.4 Glimpse into the Future

As our studies have shown, individuals in a Web-enabled desktop environment access information and media from distributed data repositories and services. The use of the Web has caused a significant shift in their practices, placing more emphasis on publishing and consuming content. For that reason, the notions of document location and access control have not been central to the design of information management support on the Web. Indeed, the document location is important only insofar as the document URL ensures repeated access. Thus, using bookmarks and search engines to revisit documents has been the user’s primary means of accessing and managing Web information.

With regard to content authoring, Web blogs and discussion posts are facilitated by a simplified authoring environment and limited control over content storage and organization. On the other hand, the broad adoption of Web mail services shaped the authoring experience through the features of email editors and introduced a new storage paradigm for personal information. Indeed, services like Gmail (www.gmail.com) by Google (www.google.com), Yahoo! Mail by Yahoo! (www.yahoo.com), and Hotmail (www.hotmail.com) by Microsoft (www.microsoft.com) enabled a broad user community to create substantial repositories of emails and associated content. The observed transformation of email from a communication medium into a context-rich data store (Sect. 15.2) has now become widespread, primarily shaped by the archiving and content management capabilities of Web mail services.

Considering the continued expansion of personal computing environments with novel devices and new modes of accessing services, e.g., through mobile phone applets, unified resource management will become essential for an optimal user experience. Providing adequate user support will require new metaphors to guide the design of information management models and features.

15.4.1 Metaphor Transformations

In this section we reflect upon the new notions that are emerging and the role

they are expected to play in shaping future information systems and practices.

15.4.1.1 From Folders to Collections of Distributed Resources

The broad adoption of Web and mobile platforms has highlighted issues with the user experience in highly distributed, heterogeneous, and disconnected data stores. The TAGtivity study showed the benefit of abstracting the notions of files, folders, and bookmarks and introducing mechanisms for flexible content organization based on references to digital objects.

This points to collections of digital items, represented by metadata that refers to the data files and their locations, as the fundamental element of data organization. Such collections arise in different contexts and are assembled using various mechanisms, from hand gestures on multi-touch devices to tagging, as demonstrated through TAGtivity.


15.4.1.2 From Lists to Context Rich Views

In most content management systems, the basic organizational structure for resources is a list, optimized for easy access through sorting and browsing. However, lists provide limited support for capturing and describing relationships among individual resources in the collection or for adding information about the collection itself. At the same time, it is often important to record the meaning and purpose of the collected resources.

Building on the TAGtivity model, we propose descriptive representations of the collections, called resource maps (Fig. 15.6). Resource maps can be created automatically from the metadata captured during user interaction with the digital items. By importing the desired content descriptors into an authoring or interactive environment, one can generate maps through established templates or allow users to compose them manually. Once persisted, the maps provide a rich access mechanism to the collection items and become an integral part of the collection.

The notions of collections and their corresponding resource maps are important vehicles for creating knowledge in different contexts, from gathering and reflecting upon content during brainstorming to sharing knowledge through discussions and publishing.

Fig. 15.6 Creating visual representations of a distributed and heterogeneous collection of

resources by tagging the resources, exporting the metadata into an authoring application and

saving the map in a sharable format with active links to the resource files, Web pages, or services
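Following Fig. 15.6, one possible way to persist a resource map in a sharable format with active links is to render the collection metadata as an HTML page. In the sketch below, the grouping of items by a `kind` field, the dictionary keys, and the HTML layout are all assumptions. Local locations are also assumed to be absolute POSIX paths, so that they can be turned into `file://` links.

```python
from __future__ import annotations

import html
from collections import defaultdict
from urllib.parse import quote


def resource_map_html(title: str, items: list[dict]) -> str:
    """Render a collection as an HTML resource map with active links.

    Items are dicts with the assumed keys 'title', 'location' and 'kind'.
    """
    groups: dict[str, list[dict]] = defaultdict(list)
    for item in items:
        groups[item.get("kind", "other")].append(item)

    parts = [f"<h1>{html.escape(title)}</h1>"]
    for kind in sorted(groups):
        parts.append(f"<h2>{html.escape(kind)}</h2>")
        parts.append("<ul>")
        for item in groups[kind]:
            href = item["location"]
            if "://" not in href:
                # Local files become file:// links (assumes an absolute POSIX path).
                href = "file://" + quote(href)
            parts.append(f'<li><a href="{html.escape(href)}">'
                         f'{html.escape(item["title"])}</a></li>')
        parts.append("</ul>")
    return "\n".join(parts)


page = resource_map_html("NodeXL Study", [
    {"title": "Study notes", "kind": "document", "location": "/home/user/notes.docx"},
    {"title": "Q & A thread", "kind": "web", "location": "https://example.org/?a=1&b=2"},
])
```

A plain HTML page is only one choice of "sharable format". Richer maps could also record relationships among items, which is what the text argues lists cannot capture.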


15.4.1.3 From Documents to Composite Information Constructs

To support a broader range of information needs, it is often necessary to relinquish the traditional boundaries of files and applications. Users organize information into compositions created by combining parts of documents rather than into collections of files. Indeed, knowledge is often conveyed by bringing together text passages, images, and media content that, in the new assembly, convey a meaning beyond that of the individual parts.

In summary, we suggest that traditional file organization, database schemas, and hypertext forms can be further enhanced with context-rich representations of digital collections in terms of resource maps and content compositions. These new representations capture semantics and enable the sharing and propagation of knowledge. At the same time, they remain connected with the resources and make it easy to switch to a detailed view of individual files. They become an essential means for browsing and reusing information from distributed information environments and services.

15.5 Concluding Remarks

In this chapter we reflected upon the evolution of information management practices over the past decade and extrapolated into the next era of information management in the computational cloud and across distributed data stores, services, and devices. Direct observations of users revealed issues with the support for content management, from authoring, storing, and viewing to publishing and sharing. These have been amplified by the transformation of the PC desktop through connectivity to the Internet and the prolific use of Web services. Based on related research and our own user studies, we hypothesize about the metaphors that are likely to shape content management in the cloud computing environment and across multiple personal computing devices. We promote an activity-based approach in which the fundamental elements are collections of resources, including content locations and applications relevant to the user’s task. We demonstrate how such collections can be created through light-weight tagging. With its low overhead and flexibility in creating, naming, and dispersing collections, this approach minimizes the disruption to the main user task. However, it is important to further explore the implications of managing and organizing a potentially large number of tagged collections and of dealing with the complexities that arise from the multiple contexts in which digital objects may be used.

We conclude by mentioning several key aspects that are likely to shape the future of content management and require further consideration. First, as more valuable information is stored in digital form, there is a need for its secure persistence over long periods of time. At the same time, we are moving away from static files towards “live” and dynamic documents. Second, a substantial amount of information is now generated through social media and various forms of crowdsourcing. This trend will continue to shape future content services and information management practices.

Acknowledgments Prototypes and research studies described in this chapter were conducted by the Integrated Systems Group at the Microsoft Research Lab in Cambridge, UK, in collaboration with Instrata Ltd., Cambridge, UK. Special acknowledgments go to Yvonne Sanderson and Gerard Oleksik from Instrata Ltd. for their work on the user studies. Eduarda Mendes Rodrigues conducted quantitative analyses and, together with Gabriella Kazai and Annika Hupfeld, worked on the design of the TAGtivity features. We particularly recognize the work by Gavin Smyth, who is solely responsible for the implementation and deployment of the TAGtivity software. The prototype is publicly released as the Microsoft Research Project Colletta http://research.microsoft.com/en-us/um/cambridge/projects/ResearchDesktop/ProjectColletta.

References

Bannon L, Cypher A, Greenspan S, Monty ML (1983) Evaluation and analysis of users’ activity

organization. In: Proceedings of CHI 1983. ACM Press, New York, pp 54–57

Bardram JE, Bunde-Pedersen J, Soegaard M (2006) Support for activity-based computing in

a personal computing operating system. In: Proceedings of CHI 2006. ACM Press,

New York, pp 211–220

Bellotti V, Ducheneaut N, Howard M, Smith I (2003) Taking email to task: the design

and evaluation of a task management centered email tool. In: Proceedings of CHI 2003.

ACM Press, New York, pp 345–352

Boardman R, Sasse MA (2004) “Stuff goes in the Computer but it doesn’t come out”: a cross-tool

study of personal information management. In: Proceedings of CHI 2004. ACM Press,


Cutrell E, Robbins DC, Dumais S, Sarin R (2006) Fast, flexible filtering with phlat – personal

search and organization made easy. In: Proceedings of CHI 2006. ACM Press, New York,

pp 261–270

Czerwinski M, Cutrell E, Horvitz E (2000) Instant messaging and interruptions: influence of task

type on performance. In: Proceedings of OZCHI 2000. ACM Press, New York, pp 356–361

Czerwinski M, Horvitz E, Wilhite S (2004) A diary study of task switching and interruptions.

In: Proceedings of CHI 2004. ACM Press, New York, pp 175–182

Dourish P, Edwards K, LaMarca A, Salisbury M (1999) Presto: an experimental architecture for

fluid interactive document space. ACM Trans Comput-Hum Inter 6(2):133–161

Ducheneaut N, Bellotti V (2001) E-mail as habitat: an exploration of embedded personal infor-

mation management. Interactions 8(5):30–38

Dumais S, Cutrell E, Cadiz J, Jancke G, Sarin R, Robbins D (2003) Stuff I’ve seen: a system

for personal information retrieval and re-use. In: Proceedings of SIGIR 2003. ACM Press,


Fetterly D, Manasse M, Najork M, Wiener J (2003) A large-scale study of the evolution of web pages. In: Proceedings of WWW 2003, pp 669–678

Gonzalez VM, Mark G (2004) Constant, constant multi-tasking craziness: managing multiple

working spheres. In: Proceedings of CHI 2004. ACM Press, New York, pp 113–120

Henderson DA, Card S (1986) Rooms: The use of multiple virtual workspaces to reduce space

contention in a window-based graphical user interface. ACM Transactions on Graphics 5,3,

211–243


Isaacs E, Walendowski A, Whittaker S, Schiano DJ, Kamm C (2002) The character, function,

and styles of instant messaging in the workplace. In: Proceedings of CSCW’02. ACM Press,


Jones WP, Bruce H, Dumais S (2001) Keeping found things found on the web. In: Proceedings

of CIKM 2001. ACM Press: New York, pp 119–126

Jones R, Milic-Frayling N, Rodden K, Blackwell A (2007) Contextual method for the redesign

of existing software products. Int J Hum Comp Interact 22(1–2)

Kaptelinin V (2003) UMEA: translating interaction histories into project contexts. In: Proceedings

of CHI 2003. ACM Press, New York, pp 353–360

Kerne A, Koh E, Smith SM, Webb A, Dworaczyk B (2008) CombinFormation: mixed-initiative

composition of image and text surrogates promotes information discovery. ACM Trans Inf

Syst 27(1):1–45, Article 5

Mackay WE (1988) More than Just a communication system: diversity in the use of electronic

mail. In: Proceedings of CSCW 1988. ACM Press, New York, pp 26–28

Mark G, Gonzalez VM, Harris J (2005) No task left behind? Examining the nature of fragmented

work. In: Proceedings of CHI 2005. ACM Press, New York, pp 321–330

Mendes Rodrigues E, Milic-Frayling N, Fortuna B (2008) Social tagging behaviour in community-

driven question answering. In: Proceedings of the 2008 IEEE/WIC/ACM International

Conference on Web Intelligence, WI’08, IEEE 2008, Sydney, Australia, pp 112–119

Milic-Frayling N, Sommerer R, Tucker R (2002) MS WebScout: web navigation aid and personal

web history explorer. Poster paper 170, WWW 2002

Milic-Frayling N, Jones R, Mendes Rodrigues E (2006) User study of interconnection

among communication, authoring, and information management processes. Microsoft research

technical report MSR-TR-2006-96

Oleksik G, Wilson ML, Tashman C, Mendes Rodrigues E, Kazai G, Smyth G, Milic-Frayling N,

Jones R (2009) Lightweight tagging expands information and activity management practices.

In: Proceedings of CHI 2009, ACM Press, New York, U.S., pp 279–288

Robertson G, van Dantzich M, Robbins D, Czerwinski M, Hinckley K, Risden K, Thiel D,

Gorokhovsky V (2000) The task gallery: a 3-D window manager. In: Proceedings of CHI

2000. ACM Press, New York, pp 494–501

Smith G, Baudisch P, Robertson GG, Czerwinski M, Meyers B, Robbins D, Andrews D (2003)

GroupBar: the taskBar evolved. In Proc. of the 2003 Australasian Computer-Human Confer-

ence, OzCHI 2003, S. Viller and P. Wyethh (eds), CHISIG 2003, Available on web at http://

www.ozchi.org/proceedings/2003/ozchi2003.pdf, Accessed date 15 Aug 2011, pp 34–43

Teevan J, Dumais ST, Liebling DJ, Hughes R (2009) Changing the way people view changes

on the Web. In: UIST 2009. ACM Press, New York, pp 237–246

Voida S, Mynatt ED, Edwards WK (2008) Re-framing the desktop interface around the activities

of knowledge work. In: Proceedings of UIST 2008. ACM Press, New York, pp 211–220


Part V

Conclusions

16 Conclusions


16.1 Introduction

We started this book by describing three challenges which we saw as important for increasing the efficiency and effectiveness of knowledge work: the failure to fully share knowledge in organisations, including knowledge about informal processes; the problem of information overload; and the disruptive effect of continual changes of task focus. The importance of these challenges was confirmed by our work in the ACTIVE case studies, described in Part III of this book. We also began by identifying three technology areas which we thought were important in tackling these problems: the synergy between the informal approach of Web 2.0 and a more formal approach to semantics based on ontologies; the use of context to deliver information related to the user’s current task; and tools to support the informal processes which underlie our daily work. ACTIVE’s work to develop these technologies was described in Part II, and the case studies also confirmed the importance of these technologies. Moreover, we found that others shared our intuition about the problems of knowledge work and were investigating related technological solutions; some of this work is described in Part IV. In this chapter we briefly review the three technology areas, in each case looking at likely future trends and indicating some research challenges.

P. Warren (*)



J. Davies



E. Simperl

Institute AIFB, Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany



327

16.2 Web2.0 and Semantic Technologies

A prominent feature of Web 2.0 has been the popularity of wikis. Semantic wikis, representing a synergy of Web 2.0 and semantic technology, are now an important topic; at recent conferences addressing the industrial application of semantic technologies, semantic wikis have been a major theme. This theme was taken up in Chap. 3, which described work in ACTIVE to extend the use of the Semantic MediaWiki (SMW). Complementing this, Chap. 12 described both how the SMW concept is being developed to make it better suited to the enterprise and how the SMW is being used for enterprise process and application modelling.

Another important aspect of Web 2.0 is the use of tags to describe web pages, photos and all forms of information objects, thereby creating folksonomies. Folksonomies can be contrasted with the ontologies of more formal semantics, or with the semi-formal taxonomies frequently used in content management systems. Folksonomies are perceived as offering a lower barrier to use, particularly for occasional users. Chapter 15 describes a tagging system and users’ reaction to it. In ACTIVE we have supported the creation of folksonomies by using machine intelligence to make tag suggestions. Another way in which Web 2.0 and more formal semantic technologies come together is through algorithms that learn taxonomic or ontological structures from folksonomies. There has been significant work in this area. For example, Heymann and Garcia-Molina (2006) have developed an algorithm which converts a tag cloud into a hierarchical taxonomy. The starting point is to create a vector for each tag, with dimensionality equal to the number of objects, such that the component in each dimension is the number of times the tag has been applied to the corresponding object. The similarity between two tags is then calculated as the cosine similarity between their tag vectors, and the algorithm uses these similarities to create a taxonomy. Hotho and Jäschke (2010) provide an overview of the state of the art in ontology learning from folksonomies, as of late 2010.
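The tag-vector construction just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Heymann and Garcia-Molina’s code. As we understand their report, tags are added to the hierarchy in decreasing order of a centrality score, and each tag is attached to the most similar tag already placed, or to the root if no similarity clears a threshold. Here “centrality” is simplified to a tag’s summed cosine similarity to all other tags, and the function names, the threshold of 0.3 and the `ROOT` label are hypothetical choices.

```python
import math
from collections import defaultdict


def tag_vectors(annotations):
    """Build one sparse vector per tag from (tag, object_id) pairs.

    The component for an object is the number of times the tag was applied to it.
    """
    vectors = defaultdict(lambda: defaultdict(int))
    for tag, obj in annotations:
        vectors[tag][obj] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(count * v.get(obj, 0) for obj, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def build_taxonomy(annotations, threshold=0.3):
    """Greedily attach each tag to its most similar, already placed tag.

    Returns a dict mapping each tag to its parent ("ROOT" for top-level tags).
    """
    vectors = tag_vectors(annotations)
    tags = sorted(vectors)
    sim = {(a, b): cosine(vectors[a], vectors[b])
           for a in tags for b in tags if a != b}
    # Simplification (assumption): centrality = summed similarity to other tags.
    centrality = {t: sum(sim[(t, o)] for o in tags if o != t) for t in tags}
    parent, placed = {}, []
    for tag in sorted(tags, key=lambda t: (-centrality[t], t)):
        best = max(placed, key=lambda p: sim[(tag, p)], default=None)
        if best is not None and sim[(tag, best)] >= threshold:
            parent[tag] = best
        else:
            parent[tag] = "ROOT"
        placed.append(tag)
    return parent


if __name__ == "__main__":
    annotations = [
        ("programming", "a"), ("programming", "b"), ("programming", "c"),
        ("python", "a"), ("python", "b"), ("java", "c"),
        ("cooking", "d"), ("recipes", "d"),
    ]
    print(build_taxonomy(annotations))
```

With this sample data, `python` and `java` end up under `programming`, and `recipes` under `cooking`, because those pairs share tagged objects while the two groups share none.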

A related theme is that of the social semantic desktop, discussed in Chap. 13. Here we saw how the semantic interpretation of available metadata could enrich the user’s experience. This approach requires neither the creation of formal, ontological metadata nor informal tags. Instead, it makes use of metadata which is already available but not normally exploited.

In the coming years we are likely to see these synergistic approaches, combining Web 2.0 and formal semantics, adopted more widely in the enterprise and also in personal applications on the WWW. Amongst the research questions which remain are:

• What kinds of informal semantic structures, and modes of presentation, are most natural to users? How does this vary between applications and types of users?

• How far can we improve machine learning algorithms that create ontologies from user tagging? We have to bear in mind that the quality of such an algorithm is in part subjective; what matters is the user’s perception of the improved experience made possible by the ontology.

328 P. Warren et al.

• To what extent could tags replace conventional hierarchical folders? Could this trend be encouraged by automatically learning ontological structures from tag classes?

• To what extent does the notion of informal semantics, as implemented in the SMW, scale to the WWW? Could web page editors be encouraged to associate informal semantics with their hyperlinks, creating a kind of global SMW? The rel attribute, as used in RDFa, already permits semantics to be associated with hyperlinks, but this belongs to the world of formal semantics and requires a relatively sophisticated user. Can we create user interfaces which encourage the creation and use of vocabularies to describe web links and resources?
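To make the RDFa point concrete, the following Python sketch shows a hyperlink whose `rel` attribute carries a typed relationship (`foaf:knows`, from the FOAF vocabulary) next to an ordinary, untyped link, and extracts the (relation, target) pairs using only the standard library’s `html.parser`. It is an illustration under simplifying assumptions, not an RDFa processor: it does not expand the `foaf:` prefix, does not work out the subject of each statement and ignores attributes such as `about` and `property`. The class name `RelLinkExtractor` and the example URLs are hypothetical.

```python
from html.parser import HTMLParser


class RelLinkExtractor(HTMLParser):
    """Collect (rel term, href) pairs from <a> elements.

    Deliberately minimal: prefixes are not expanded and subjects are not computed,
    so this is not a conforming RDFa processor.
    """

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        rel, href = attributes.get("rel"), attributes.get("href")
        if rel and href:
            # rel may hold several space-separated terms
            for term in rel.split():
                self.links.append((term, href))


PAGE = """
<p prefix="foaf: http://xmlns.com/foaf/0.1/">
  Alice <a rel="foaf:knows" href="http://example.org/bob">knows Bob</a>
  and also links to <a href="http://example.org/news">some news</a>.
</p>
"""

if __name__ == "__main__":
    extractor = RelLinkExtractor()
    extractor.feed(PAGE)
    print(extractor.links)  # [('foaf:knows', 'http://example.org/bob')]
```

Only the first link yields a pair. The second is an ordinary link with no stated relationship, which is exactly the kind of link the question above asks whether editors could be encouraged to describe.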

16.3 Context

We have already observed that context is a very broad concept, encompassing ideas such as location as well as the task focus which was central in ACTIVE. During the last decade of the twentieth century and the first decade of the twenty-first there was a series of conferences devoted to modelling and using context. The most recent of these was held in 2007 (http://context-07.ruc.dk). A glance at the list of papers reveals the range of topics covered, one of which was the use of ontologies. The joint themes of context and ontologies have been taken up in the CIAO workshop series on Context, Information And Ontologies; see http://ftp.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-626/ for the proceedings of the 2010 workshop.

Outside the research arena, the analyst firm Forrester (http://www.forrester.com) has highlighted the importance of context in information delivery. Rugullies et al. (2007) report on a survey which asked North American IT and business professionals about the importance of context. When asked “how important is it that content is delivered to users within the context of the business process in which they are involved?”, 60% responded that it was ‘extremely important’ and 38% that it was ‘somewhat important’. There was also considerable support for just-in-time eLearning. When the same people were asked “how important is it that eLearning is available at the point in time when the user needs it?”, 32% thought it was ‘extremely important’ and 51% thought it was ‘somewhat important’. This supports the importance of the work-integrated learning approach described in Chap. 14. Rugullies et al. also commented on the need to “provide context with a sufficient level of simplicity to make implementation practical while at the same time addressing the intent of the worker”.

In ACTIVE, work on context integrated the explicit creation and manipulation of context by the user with the automatic creation and management of context using machine learning techniques. Chapter 5 described the former approach, which depends on user intervention, whilst Chap. 7 described some of the algorithms used to exploit context automatically.

In a world where people’s information needs change rapidly, as does their mode of interacting with information, context-based information delivery will remain

16 Conclusions 329

high on the research agenda and will increasingly find its way into user applications. In this book we have outlined two complementary approaches. In ACTIVE we have combined a relatively simple approach to modelling context with sophisticated machine learning techniques. The approach described in Chap. 14 makes use of more sophisticated ontological modelling, while also employing machine learning techniques. Each approach will be appropriate for different domains.

However it is implemented, context is natural to the way people think and work, and we are likely to see greater use of context both within specific applications, e.g., eLearning, as pioneered in APOSDLE, and for the general organisation of information, as pioneered by ACTIVE.

Two research themes stand out as particularly important. First, algorithm research needs to continue in order to provide high-quality results, acknowledging that this quality is subjective and must be judged by its impact on the user experience. Second, the overall user experience needs to be improved to make it more acceptable to users. That means the experience needs to be simple and natural, and oriented towards achieving the user’s work goals. Moreover, where machine learning is used to create recommendations, this must be done in a way which does not intrude on the user. However good the algorithms, the recommendations will not always be right, and false recommendations must be easy to ignore. In an extreme approach, the user might not need to be aware of context at all: machine learning algorithms could use the concept of context to make recommendations without users being aware of the context-related basis of those recommendations.
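The requirement that recommendations be unobtrusive and easy to ignore can be illustrated with a small Python sketch. It assumes a hypothetical bag-of-words representation in which each user context is summarised by the word counts of the resources already assigned to it. A new resource is scored against each context by cosine similarity, and a suggestion is made only when the best score clears a minimum (`min_score`) and beats the runner-up by a margin (`min_margin`); otherwise the function returns `None` and the interface can simply stay silent. This is not the context-mining approach of Chap. 7, and the function name, parameters and threshold values are illustrative assumptions.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity of two word-count vectors (Counters)."""
    dot = sum(count * v.get(word, 0) for word, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def suggest_context(text, contexts, min_score=0.4, min_margin=0.1):
    """Return the best-matching context name, or None if the evidence is weak.

    contexts maps a context name to the texts already assigned to it.
    """
    doc = Counter(text.lower().split())
    scores = []
    for name, texts in contexts.items():
        profile = Counter()
        for t in texts:
            profile.update(t.lower().split())
        scores.append((cosine(doc, profile), name))
    scores.sort(reverse=True)
    if not scores or scores[0][0] < min_score:
        return None  # nothing is a convincing match: stay silent
    if len(scores) > 1 and scores[0][0] - scores[1][0] < min_margin:
        return None  # two contexts are too close to call
    return scores[0][1]


if __name__ == "__main__":
    contexts = {
        "bid": ["bid proposal pricing", "proposal deadline pricing"],
        "hiring": ["interview candidate cv", "candidate offer"],
    }
    print(suggest_context("pricing for the bid proposal", contexts))  # bid
    print(suggest_context("lunch plans", contexts))                   # None
```

The margin test is the design choice that matters here: an ambiguous resource such as “candidate pricing” scores similarly against both contexts, so no suggestion is shown rather than a likely wrong one.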

16.4 Informal Processes

The study of informal processes has come as much from the ethnographic community as from the technological one; see, e.g., Hill et al. (2006), which was referenced in Chap. 1, and Kogan and Muller (2006), both published in the same issue of the IBM Systems Journal. Kogan and Muller make some observations about the kind of tools required for what they call ‘personal work processes’. Chapter 15 also reported on observations by Microsoft researchers of people’s working practices and the workflows they used. Chapter 6 described the tools which ACTIVE has developed to assist in these personal work processes.

Motivated by the role that email plays in most people’s working processes, there has been research on extracting actions from incoming emails, e.g., Sow et al. (2006) and Tagg et al. (2009). Work in ACTIVE has taken a different but related direction. The project has developed Contextify (http://babaji.ijs.si/contextify/), an Outlook plug-in which provides contextual information about a selected email. For example, a sidebar displays information about the sender; a graphical display allows threads to be visualized; attachments in a particular thread can be easily identified; the social network of the people participating in a thread can be displayed; and recipients of an email can be suggested.
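The text does not say how Contextify suggests recipients. As a hedged illustration of one simple way such a feature could work, the Python sketch below ranks candidate addresses by how often they appeared alongside the recipients already entered in previously sent emails. The co-occurrence heuristic, the function name `suggest_recipients` and the sample addresses are assumptions, not Contextify’s implementation.

```python
from collections import Counter


def suggest_recipients(history, current, k=3):
    """Rank addresses that co-occurred with the chosen recipients in past emails.

    history: list of recipient lists, one per previously sent email.
    current: addresses the user has already entered.
    """
    chosen = set(current)
    scores = Counter()
    for recipients in history:
        group = set(recipients)
        overlap = len(group & chosen)
        if overlap == 0:
            continue  # unrelated email: contributes nothing
        for address in group - chosen:
            scores[address] += overlap  # weight by how many chosen people were present
    # Highest score first; ties broken alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [address for address, _ in ranked[:k]]


if __name__ == "__main__":
    sent = [
        ["ann", "bob", "carl"],
        ["ann", "bob"],
        ["ann", "dora"],
        ["eve", "fred"],
    ]
    print(suggest_recipients(sent, ["ann"]))  # ['bob', 'carl', 'dora']
```

Adding a second recipient sharpens the ranking: with `["ann", "bob"]` already entered, `carl` scores highest because he appeared with both of them.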


Citing a number of field studies, Kogan and Muller (2006) also contend that even where we expect people to follow a formal approach, actual practice often requires them to deviate from the prescribed process. The ProM framework (http://prom.win.tue.nl/tools/prom/) exists to help understand how business processes are actually executed; several hundred plug-ins are available for learning and visualizing actual process flows. The informal processes with which we have been concerned in ACTIVE are more varied than business processes executing on a workflow engine. Chapter 6 described, in part, ACTIVE’s response to the challenge this poses.
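Many process-discovery techniques start from an event log, a record of which activity happened in which case, and count how often one activity directly follows another within the same case. The minimal Python sketch below computes such a directly-follows relation from events assumed to be sorted by time. It illustrates the general idea only and neither uses nor reproduces any ProM plug-in; the function name and the example log are hypothetical. In the log, one case loops back from review to draft, the kind of deviation from a prescribed sequence discussed above.

```python
from collections import Counter


def directly_follows(event_log):
    """Count how often activity b directly follows activity a within a case.

    event_log: iterable of (case_id, activity) pairs, already sorted by time.
    Events from different cases may be interleaved.
    """
    last_activity = {}  # most recent activity seen for each case
    edges = Counter()
    for case_id, activity in event_log:
        previous = last_activity.get(case_id)
        if previous is not None:
            edges[(previous, activity)] += 1
        last_activity[case_id] = activity
    return edges


if __name__ == "__main__":
    log = [
        ("c1", "draft"), ("c2", "draft"), ("c1", "review"), ("c2", "review"),
        ("c2", "draft"), ("c1", "send"), ("c2", "review"), ("c2", "send"),
    ]
    for (a, b), count in sorted(directly_follows(log).items()):
        print(f"{a} -> {b}: {count}")
```

The rework loop in case `c2` shows up as a review-to-draft edge with a count of 1, alongside the expected draft-to-review and review-to-send edges; a visualization would draw these counts as a weighted graph.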

Processes underlie all knowledge work, and understanding informal processes is crucial to improving the productivity of knowledge work. Hence process learning will remain an important topic. Algorithm development needs to continue, both for algorithms that detect processes and for algorithms that understand the user’s information needs at each stage of a process. Here the user need not be seen in isolation; an open research question is how much can be learned from users as a group, and how much can be learned from one user and applied to another.

Hill et al. (2006) identified the user as an integrator, frequently cutting and pasting between applications. Our current approach to knowledge work is application-centric; our information systems are a set of poorly communicating applications. The need for a different approach was recognized as long ago as the 1990s; for example, Kolojejchick et al. (1997) describe an information-centric paradigm for the user interface. The obstacle to implementation is not just technical; it also has to do with the nature of a market which quite naturally offers a set of independent, often isolated applications. Given the importance of this problem to knowledge work, we are likely to see innovative solutions in the coming years.

The same authors also saw the need for applications to exchange semantic information. This theme is taken up by the work on the social semantic desktop described in Chap. 13.

16.5 Final Words

The information landscape has changed in the years since ACTIVE, and the other initiatives described in this book, were conceived. The volume of information has increased, which has only exacerbated the problems we identified at the beginning of our work. The arrival of Linked Open Data, not just as a concept but as a reality, has expanded the potential offered by information on the internet. To realise that potential, it is all the more important to understand the user’s information needs in context; to use semantics, but in a way which is accessible; and to understand and exploit users’ processes.

So the challenges outlined in our book remain pertinent today. We believe the solutions we have described are pertinent also, and that in the coming years these solutions will be exploited and further developed to contribute significantly to increasing the effectiveness and efficiency of knowledge work.


References

Heymann P, Garcia-Molina H (2006) Collaborative creation of communal hierarchical taxonomies in social tagging systems. Technical Report 2006-10, Stanford University. http://ilpubs.stanford.edu:8090/775/. Accessed 9 Aug 2011

Hill C, Yates R, Jones C, Kogan S (2006) Beyond predictable workflows: enhancing productivity in artful business processes. IBM Syst J 45(4):663–682

Hotho A, Jäschke R (2010) Ontology learning from folksonomies. Tutorial at EKAW 2010, Lisbon

Kogan S, Muller M (2006) Ethnographic study of collaborative knowledge work. IBM Syst J 45(4):759–772

Kolojejchick J, Roth S, Lucas P (1997) Information appliances and tools: simplicity and power tradeoffs in the Visage exploration environment. IEEE Comput Graph Appl 17(4):32–41

Rugullies E, Moore C, Markham R (2007) Context is king in the new world of work. Forrester Research Inc, Cambridge, MA, USA

Sow D, David J, Ebling M, Misra A, Bergman L (2006) Uncovering the to-dos hidden in your in-box. IBM Syst J 45(4):739–757

Tagg R, Gandhi P, Raaj S (2009) Recognizing work priorities and tasks in incoming messages through personal ontologies supplemented by lexical clues. ECIS 2009 Proceedings, paper 163


Index

A

Ability, 275

ACTIVE

initial challenges, 4

scientific hypotheses, 5

technologies, 4, 325

ACTIVE knowledge work space (AKWS), 6, 7, 94, 154–155, 158, 161–167, 197

enterprise workspace, 154

local workspace, 154

management portal, 155

Microsoft applications, 154

Amazon, 13, 18, 19

AOL, 21

Aperture metadata extraction framework, 272

APOSDLE, 8, 250, 276–299, 328

Apple, Google, Microsoft, Nokia, Research in Motion, 23

ASBRU, 251

Assertional effects, 250, 252

Associative browsing, 260

Associative network, 286, 294, 295

Attention, 22

Authoring, 310

B

Beacon, 21

Benefits, 62, 69–71, 73, 78, 84, 88

Bid proposals, 151, 155–161

Bid unit, 151, 155, 161, 162, 164–165, 168–169

Blippy.com, 21

BPMN, 245, 247, 249, 253

Business indicators, 222, 226

Business models, 13, 14, 16, 21

Business processes, 108

Business process re-engineering, 241

C

Cadence Flow Infrastructure (CFI), 197

Cadence ProjectNavigator, 197

Case study

Accenture, 7

BT, 7

Cadence Design Systems, 7

Challenges, domain of data management, 17

Civil society, 22

Classification of knowledge structures, 34

Cloud, 302

Collaboration, 215, 216, 219, 220, 222–226

Collaboration support, 118

Collaborative filtering, 18

Collaborative process development, 117–119

Collective action, 18

Collective intelligence, 17–20

Competencies, 276, 280, 287, 291–294

Computational cloud, 316

Context, 5, 6, 152, 259, 276–283, 286–294, 327–328

association, 163, 165, 168

detection, 5, 103, 154, 162, 163, 166, 167

discovery, 5, 154, 162–164, 166, 167

elicitation, 265, 267

for knowledge-sharing, 152

loss, 107

mining, 130, 133, 135–144

model, 288

ontology, 289

sensors, 289

visualizer, 116–117, 154

Context-aware applications, 24

P. Warren et al. (eds.), Context and Semantics for Knowledge Management, DOI 10.1007/978-3-642-19510-5, © Springer-Verlag Berlin Heidelberg 2011

333

Context-aware service delivery, 23

Contextify, 6, 136–138, 328

Context-rich representations, 319

Context-rich views, 317–318

Context-sensitive content (information), 17, 22–23

Context-specific mobile solutions, 23

Cooperation, 278, 280, 283, 284, 287–288

Co-ordination, civil society, 22

Costs, 62–79, 81–84

estimation, 79

model, 71, 73, 80, 81

Crowd-sourcing, 19

D

Data representation, 128–129, 133–134

Descriptive learning guidance, 279, 298

Design environment, 277, 286, 299

Design Project Visualizer, 195

architecture, 196

back-end, 198–201

front-end, 202

validation, 203–209

Digital content, 310

Discovered contexts, 102

Discussion threads, 140

Domain and scope, 36

Domain of data management, challenges, 17

Dynamic data management, 23–24

Dynamic use contexts, 16, 17

E

eBay, 18

eLearning, 327, 328

Email, 135–144, 328

evolution of, 8

threads (see Discussion threads)

time spent on, 3

Emerging activities, 314

Enterprise 2.0, 215, 220

Enterprise intelligence, 14

Enterprise knowledge processes, 12–15

Enterprise knowledge structures, 29–58

Enterprise modelling, 244

Enterprise search, 171–178

Ephemeral information, 314

Ethics, 21

Exploiting social, relational or intellectual capital, 21

Expressivity, 33

F

Facebook, 18, 20, 21, 25

Facebook and Beacon, 21

Factbook, 152, 153

Field trials, 161–169

Flickr, 19

FOLCOM, 62, 70, 73, 76, 83, 85, 88, 89

Folksonomy, 19, 45

Formality, 34

Formal models, 34

Freebase, 52–54

G

Gnowsis, 272

Google, 18, 21

Google Buzz, 22

Google, Facebook, Amazon, 25

Granularity, 33

Graphics classification, 175

H

Hidden Markov model, 197

HP/Palm, 23

Huffington Post, 21

I

IBM, 328

Identity management, 140–143

Incentives, 216, 225

Informal learning, 277

Informal models, 34

Informal processes, 5, 6, 224, 328, 329. See also Knowledge processes

articulation and sharing, 193

knowledge representation, 191, 197

mining and extraction, 191–193, 198–201

visualization and discussion, 201–203

Information-centric paradigm, 329

Information economy, 12

Information integration, 40

Information management metaphors, 308–311

Information management models, 302

Information management paradigms, 301

Information overload, 107

Information quality, 40

Innovation, 216, 217, 219, 222, 224, 226

Intellectual property rights, 21

Interruptions, 304

334 Index

K

KDE Desktop Environment, 268–272

Kiva, 22

Knowledge

base, 280, 284, 286, 292, 293

differentiation from information, 9

intensive tasks, 288, 290

work(er), 9, 108

work productivity, 275, 276

Knowledge indicating events (KIE), 291

Knowledge management, 171

environments, 13

Knowledge processes, 107–125, 189

management, 113–117

model, 198

optimisation, 120

Knowledge spheres, 122

ontology, 122–123

KnowMiner, 249

L

Learning

content, 287, 298

goals, 283, 287, 296, 299

guidance, 276, 277, 279–286, 296, 298, 299

need, 292

Light-weight tagging, 311

LiveNetLife, 197

Lock in, 20

Long tail effect, 14

Long-term user context, 276

M

Machine learning, 127–144, 175, 187

supervised learning, 128

unsupervised learning, 128

Measures, 120

Metaphors, 302

Metrics, 120

Microsoft, 328

MIRROR, 251

Mobile telephones, 23

Mobilise, 22

Modeling paradigm, 33

Models of social enterprise, 22

MoKi, 8, 244–248, 250–252

Monetising collective intelligence, 21

Multi-device personal computing, 316

Multilingualism, 226

Multimedia, 226

Multi-relational clustering, 129

Multi-tasking, 304

N

Named graphs, 262

Napster, 13, 19

Native resources, 263

Native structures, 263

NEPOMUK, 8, 255–272

NEPOMUK Annotation Ontology (NAO), 260

NEPOMUK architecture, 265–268

NEPOMUK Graph-Metadata Ontology, 260

NEPOMUK Information Element Ontologies (NIE), 260, 263–264

NEPOMUK Representational Language (NRL), 260, 261, 263

Netflix, 18

Network economy, 12–15

Network effects, 13, 20

Network forms of organising, 11, 12

O

OceanTeacher Encyclopedia, 239

Offline working, 153–154

OntoBroker, 236

ONTOCOM, 62, 64, 70, 73–75, 77, 79–81, 83, 88, 89

Ontology, 61–89

questionnaire, 250, 252

OntoStudio, 242

Open innovation, 19

Open Semantic Collaboration Architecture Foundation (OSCAF), 272

Open source, 19

Organizing, 310

Orphaned concepts, 249

OWL, 245, 249, 252, 253

ontology, 286

P

Participatory design, 276, 278–280

PC environment, 314

Peer-to-peer content sharing, 19

Peer-to-peer lending, 22

Personal information management, 289

Personal Information Model (PIMO), 256, 260, 263–265, 267

Personalized clusters, 179, 185

Index 335

Personalized query, 281

Personal SEmantic Workbench (PSEW), 268–270, 272

Personas approach, 279

Positive returns, 18

Potential market, 219–226

Prediction, 131

Prescriptive learning guidance, 279, 298

Primitive events, 93

Privacy, 121–123, 225, 226

Process mining, 130, 132–135, 176, 177

Process recording, 152, 153

Project knowledge navigation, 194–196

ProM framework, 329

PSI suite of ontologies, 198

PSI Upper-Level Ontology, 198

Pure information businesses, 16

Q

Quality issues, 54

R

RDF. See Resource Description Framework (RDF)

RDFa, 327

Recommendations, 276, 280–287, 291, 293

services, 276, 286, 288

Redaction, 179, 180

Refactoring and optimization, 119–121

Refactoring tool, 121

References, 316

Reflect, 283, 284

Repairing knowledge structures, 54–57

Representation language, 36–37

Requisite corporate competences, 14

Resource Description Framework (RDF), 245, 249

Resource maps, 318

Richness and reach, 15

RSS, 156, 158, 160, 162

S

Search engine, 171–173, 175, 176, 184

Security, 121–123

policies, 122

Semantic data integration, 236, 241–243

Semantic desktop, 289

Semantic Email, 268

Semantic forms, 160, 232, 247

Semantic MediaWiki (SMW), 4, 6–8, 38, 116, 156, 158, 162, 168–169, 182, 197, 245–247, 250, 326

extensions, 158

Semantic-similarity, 294

Semantic toolbar, 231

Semi-supervised clustering, 129–130

Shareability, 35–36

Sharing, 310

Short-term context, 276

Size, 33

Smart phones, 23

SMW+, 8

deployment framework, 237

ontology browser, 234

query interface, 233

semantic tree view, 233

WikiTags, 235

WYSIWYG editor, 231, 232

SMW Ontology Editor, 41

Social and relational capital of networks, 20–22

Social intelligence, 13, 16

Social lending, 22

Socially intelligent targeting mechanism, 20

Social marketing, 20

Social media, 16, 18

Social networking sites, 20

Social networks, 12, 16, 20–22, 137, 139

Social selling, 21

Social Semantic Desktop (SSD), 8, 255–272, 326, 329

Social Semantic Desktop ontologies stack, 260, 261

Social web, 23

Socio-technical systems, 278

Software Usability Measurement Inventory (SUMI), 164, 165

Spreading activation, 294, 295

Storing, 310

Street View, 22

Summative workplace evaluation, 277

Syndication, 24

T

Tags (tagging), 152, 154, 155, 162, 163, 165, 166, 168, 169, 326, 327

TAGtivity deskbar, 312


TAGtivity manager (TM), 312

TAGtivity prototype, 311

TAGtivity toolbar, 313

Task, 314

completion ability, 276

detection, 276, 288, 290

pane, 114

recording, 115

service, 115

switching, 107

wizard, 115

Text-based similarity, 294, 295

Third sector, 22

TNT (text, network, time), 133

Trade-off, 33–38

Transient activities, 315

Triple store connector (TSC), 230

Trust, 21

Twitter, 19

U

Use-context information, 17

User experience, 310–311

User profile, 280, 283, 284, 286, 288, 291–293

User study, 314–315

User tests, 161–169

V

Validation, 183

Value propositions, 22

Viewing, 310

Viral marketing, 20

Virtual collections, 311

Visualization, 141, 176–178

W

Web 2.0, 4, 5, 8, 326

Web 2.0 capabilities, 15–17

Wikipedia, 18, 19

Work environments, 311–315

Workflows, 304

Work-Integrated Learning (WIL), 8, 275–299, 327

Workplace learning, 277–279

Z

Zopa, 22
