Context and Semantics for Knowledge Management
Paul Warren • John Davies • Elena Simperl
Editors
Context and Semantics for Knowledge Management
Technologies for Personal Productivity
Editors
Paul Warren, Eurescom GmbH, Wieblinger Weg 19/4, 69123 [email protected]
Dr. John Davies, British Telecommunications plc., Orion G/11, Ipswich, IP5 3RE, Adastral Park, United [email protected]
Dr. Elena Simperl, Karlsruhe Institute of Technology, Institute AIFB, Englerstr. 11, 76128 [email protected]
ACM Codes: H3, H.4, I.2, J.1
ISBN 978-3-642-19509-9
e-ISBN 978-3-642-19510-5
DOI 10.1007/978-3-642-19510-5
Springer Heidelberg Dordrecht London New York
Library of Congress Control Number: 2011937697
© Springer-Verlag Berlin Heidelberg 2011
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilm or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law.
The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.
Printed on acid-free paper
Springer is part of Springer Science+Business Media (www.springer.com)
Foreword
The Web and information technology have become part of our daily lives and an
integral part of work. In a short period of time, the way we access and use
information has undergone a fundamental change. This is not only due to the fact
that technology has enabled us to create new ways of storage and retrieval, and
novel forms of content, but it is also related to the increasing amount of information
now generated on a constant basis.
Knowledge and information form part of the biggest assets of enterprises and
organizations. However, efficiently managing, maintaining, accessing, and reusing
this intangible asset is difficult. The fact that much of corporate knowledge only
resides in employees’ heads seriously hampers reuse and conservation. This prob-
lem is not only evident on an organization-wide scale but also for the individual
user: knowing where information can be found and which data is relevant for a
certain workflow or context is typically a human-driven task where computers
provide only limited computational support. In an age where practically every
industry is becoming increasingly information based, the problem of information
finding, interpreting, and combining is omnipresent for knowledge workers.
While a human user can interpret and combine information from different
sources, integrate data using heterogeneous formats, or extract essential knowledge
from distributed chunks of information, a machine cannot easily handle such a
complex task. On the other hand, however, the human user is limited in terms of
computational speed. Consequently, both capabilities must be combined and
knowledge management systems must allow as much automation as possible to
support users and make use of human input where needed.
The Semantic Web and semantic technology address these computational chal-
lenges and aim to facilitate more intelligent search and smoother data integration.
With the recent success of Linked Data the technology has taken a more data-
centric and lightweight approach to semantics. Individual pieces of data are often of
little value, while the combination and integration of many create a new asset. Still,
a human contribution is required in several areas and this contribution can be
encouraged by providing incentive mechanisms: either through time saving or
other forms of rewards that are made visible to the user. The evolution of the
Web to a Web of people, Web 2.0, brought many examples that demonstrate the
power of such motivation mechanisms. This socio-technical combination integrates
computational power with human intelligence in order to improve and speed up
knowledge work and to create increased knowledge-based value.
The ACTIVE project acknowledged the challenge of today’s knowledge work-
ers with a pragmatic approach, integrating semantic technology, the notion of
context, the Web 2.0 paradigm, and supporting informal processes. The selection
of technologies and the objectives of the project were driven by the fact that
enterprises can only partially conserve and reuse their own knowledge. The out-
comes of the project are tools and methods that substantially improve the situation
for knowledge workers in their daily tasks and increase individual and collaborative
productivity. Validated in case studies in large organizations, ACTIVE technology
has proven to significantly improve the way users interact with and use information.
Common problems of knowledge work could be alleviated by a powerful combi-
nation of machine and human intelligence. The results of the project will have an
impact on individual and collaborative knowledge worker productivity and on the
capture, reuse, sharing, and preservation of knowledge in organizations.
Innsbruck Prof. Dieter Fensel
Contents
Part I Addressing the Challenges of Knowledge Work
1 Introduction
Paul Warren, John Davies, and Elena Simperl
2 Web 2.0 and Network Intelligence
Yasmin Merali and Zinat Bennett
Part II ACTIVE Technologies and Methodologies
3 Enterprise Knowledge Structures
Basil Ell, Elena Simperl, Stephan Wolger, Benedikt Kampgen,
Simon Hangl, Denny Vrandecic, and Katharina Siorpaes
4 Using Cost-Benefit Information in Ontology Engineering Projects
Tobias Burger, Elena Simperl, Stephan Wolger, and Simon Hangl
5 Managing and Understanding Context
Igor Dolinsek, Marko Grobelnik, and Dunja Mladenic
6 Managing, Sharing and Optimising Informal Knowledge Processes
Jose-Manuel Gomez-Perez, Carlos Ruiz, and Frank Dengler
7 Machine Learning Techniques for Understanding Context and Process
Marko Grobelnik, Dunja Mladenic, Gregor Leban, and Tadej Stajner
Part III Applying and Validating the ACTIVE Technologies
8 Increasing Productivity in the Customer-Facing Environment
Ian Thurlow, John Davies, Jia-Yan Gu, Tom Bosser,
Elke-Maria Melchior, and Paul Warren
9 Machine Learning and Lightweight Semantics to Improve Enterprise Search and Knowledge Management
Rayid Ghani, Divna Djordjevic, and Chad Cumby
10 Increasing Predictability and Sharing Tacit Knowledge in Electronic Design
Vadim Ermolayev, Frank Dengler, Carolina Fortuna,
Tadej Stajner, Tom Bosser, and Elke-Maria Melchior
Part IV Complementary Activities
11 Some Market Trends for Knowledge Management Solutions
Jesus Contreras
12 Applications of Semantic Wikis
Michael Erdmann, Daniel Hansch, Viktoria Pammer,
Marco Rospocher, Chiara Ghidini, Stefanie Lindstaedt,
and Luciano Serafini
13 The NEPOMUK Semantic Desktop
Ansgar Bernardi, Gunnar Aastrand Grimnes, Tudor Groza,
and Simon Scerri
14 Context-Aware Recommendation for Work-Integrated Learning
Stefanie N. Lindstaedt, Barbara Kump, and Andreas Rath
15 Evolving Metaphors for Managing and Interacting with Digital Information
Natasa Milic-Frayling and Rachel Jones
Part V Conclusions
16 Conclusions
Paul Warren, John Davies, and Elena Simperl
Index
Contributors
Zinat Bennett Independent Consultant, 19 Foxes Way, Warwick, CV34 6AX,
Ansgar Bernardi German Research Center for Artificial Intelligence (DFKI)
GmbH, Postfach 2080, Kaiserslautern, D-67608, Germany, [email protected]
Tom Bosser kea-pro, Tal, Spiringen CH-6464, Switzerland, [email protected]
Tobias Burger Capgemini Carl-Wery-Str. 42, Munich D-81739, Germany,
Jesus Contreras iSOCO, Intelligent Software Components, S.A, Avenida
Del Partenon, 16-18, Madrid 1º 7a 28042, Spain, [email protected]
Chad Cumby Accenture Technology Labs, Rue des Cretes, Sophia Antipolis,
France, [email protected]
John Davies British Telecommunications plc., Orion G/11, Ipswich, IP5 3RE,
Adastral Park, United Kingdom, [email protected]
Frank Dengler Karlsruhe Institute of Technology, Englerstr. 11, Building 11.40,
Karlsruhe 76131, Germany, [email protected]
Divna Djordjevic Accenture Technology Labs, Rue des Cretes, Sophia Antipolis,
France, [email protected]
Igor Dolinsek ComTrade d.o.o, Litijska 51, Ljubljana 1000, Slovenia,
igor.dolinsek@comtrade.com
Basil Ell Karlsruhe Institute of Technology, KIT-Campus Sud, Karlsruhe
D-76128, Germany, [email protected]
Michael Erdmann Ontoprise GmbH, An der RaumFabrik 29, Karlsruhe 76227,
Germany, [email protected]
Vadim Ermolayev Zaporozhye National University, 66 Zhukovskogo st, Zapor-
ozhye 69600, Ukraine, [email protected]
Carolina Fortuna Jozef Stefan Institute, Jamova 39, SI-1000, Ljubljana, Slovenia,
Rayid Ghani Accenture Technology Labs, Rue des Cretes, Sophia Antipolis,
France, [email protected]
Chiara Ghidini Fondazione Bruno Kessler, Via Sommarive 18, Povo, I-38122
Trento, Italy, [email protected]
Jose-Manuel Gomez-Perez iSOCO, Intelligent Software Components, S.A.,
Avenida Del Partenon, 16-18, Madrid, 1º 7a 28042, Spain, [email protected]
Gunnar Grimnes German Research Center for Artificial Intelligence (DFKI)
GmbH, Postfach 2080, Kaiserslautern D-67608, Germany, [email protected]
Marko Grobelnik Artificial Intelligence Laboratory, Jozef Stefan Institute,
Jamova 39, SI-1000, Ljubljana, Slovenia, [email protected]
Tudor Groza DERI & The University of Queensland, School of ITEE, The
University of Queensland Level 7, General Purposes South Building (#78), Staff-
house Road, St. Lucia Campus QLD 4072, Australia, [email protected]
Jia-Yan Gu British Telecommunications plc., Orion G/11, Ipswich, IP5 3RE,
Adastral Park, United Kingdom, [email protected]
Simon Hangl STI Innsbruck, University of Innsbruck, Technikerstraße 21a, 6020,
Innsbruck Austria, [email protected]
Daniel Hansch ontoprise GmbH, An der RaumFabrik 29, Karlsruhe 76227,
Germany, [email protected]
Rachel Jones Instrata Ltd, 12 Warkworth Street, Cambridge, United Kingdom,
Benedikt Kampgen Karlsruhe Institute of Technology, KIT-Campus Sud,
Karlsruhe D-76128, Germany, [email protected]
Barbara Kump Knowledge Management Institute, TU Graz, Inffeldgasse 21A,
Graz, A-8010, Austria, [email protected]
Gregor Leban Artificial Intelligence Laboratory, Jozef Stefan Institute, Jamova
39, Ljubljana, SI-1000, Slovenia, [email protected]
Stefanie Lindstaedt Know-Center and Knowledge Management Institute TU
Graz, Inffeldgasse 21A, Graz A-8010, Austria, [email protected]
Elke-Maria Melchior Kea-pro GmbH, Tal, CH-6464 Spiringen, Switzerland,
Yasmin Merali Warwick Business School, Warwick University, Coventry, CV4
7AL, UK, [email protected]
Natasa Milic-Frayling Microsoft Research Ltd, 7 J J Thomson Avenue,
Cambridge, United Kingdom, [email protected]
Dunja Mladenic Artificial Intelligence Laboratory, Jozef Stefan Institute, Jamova
39, Ljubljana SI-1000, Slovenia, [email protected]
Viktoria Pammer Know-Center and Knowledge Management Institute TU Graz,
Inffeldgasse 21A, Graz A-8010, Austria, [email protected]
Andreas Rath Know-Center, GmbH, Inffeldgasse 21A, Graz A-8010, Austria,
Marco Rospocher Fondazione Bruno Kessler, Via Sommarive 18, Povo, I-38122
Trento, Italy, [email protected]
Carlos Ruiz iSOCO, Intelligent Software Components, S.A, Avenida Del Partenon,
16-18, Madrid, 1º 7a 28042, Spain, [email protected]
Simon Scerri DERI, National University of Ireland, Galway, Lower Dangan,
Galway, Ireland, [email protected]
Luciano Serafini Fondazione Bruno Kessler, Via Sommarive 18, Povo, Trento
I-38122, Italy, [email protected]
Elena Simperl Karlsruhe Institute of Technology, KIT-Campus Sud, Karlsruhe
D-76128, Germany, [email protected]
Katharina Siorpaes STI Innsbruck, University of Innsbruck, Technikerstraße
21a, Innsbruck 6020, Austria, [email protected]
Tadej Stajner Artificial Intelligence Laboratory, Jozef Stefan Institute, Jamova
39, SI-1000 Ljubljana, Slovenia, [email protected]
Ian Thurlow British Telecommunications plc., Orion G/11, Ipswich, IP5 3RE,
Adastral Park, United Kingdom, [email protected]
Denny Vrandecic Karlsruhe Institute of Technology / Wikimedia Deutschland e.V,
KIT-Campus Sud, Karlsruhe D-76128, Germany, [email protected]
Paul Warren Eurescom GmbH, Wieblinger Weg 19/4, Heidelberg D-69123,
Germany, [email protected]
Stephan Wolger STI Innsbruck, University of Innsbruck, Technikerstraße 21a,
Innsbruck 6020, Austria, [email protected]
Part I
Addressing the Challenges of Knowledge Work
1
Introduction
Paul Warren, John Davies, and Elena Simperl
1.1 Motivation for Our Book
Using and interacting with information is an important part of wealth creation in the
modern world. For many of us, much of our working time is spent at computer
screens and keyboards, and at other information devices. A report by The Radicati
Group (2009) indicates that the average corporate worker spends a quarter of his or
her time on email-related activities alone. Despite the importance of this activity,
we all know that this interaction can be both inefficient and ineffective.
The purpose of this book is to describe how a set of technologies can be used to
improve the personal and group productivity of those interacting with information.
Much of the material for our book comes from the ACTIVE project, which was
motivated precisely by the need to improve personal and group productivity.
ACTIVE was a European collaborative project which ran from March 2008 until
February 2011; an overview of ACTIVE can be found in Simperl et al. (2010).
However, this book is not just about ACTIVE. We also discuss how similar and
related technologies are being developed and used elsewhere, and we include
contributions from other projects using these technologies to respond to the same
challenges.
P. Warren (*)
Eurescom GmbH, Wieblinger Weg 19/4, D-69123 Heidelberg, Germany
e-mail: [email protected]
J. Davies
British Telecommunications plc., Orion G/11, Ipswich, IP5 3RE, Adastral Park, United Kingdom
e-mail: [email protected]
E. Simperl
Institute AIFB, Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany
e-mail: [email protected]
P. Warren et al. (eds.), Context and Semantics for Knowledge Management, DOI 10.1007/978-3-642-19510-5_1, © Springer-Verlag Berlin Heidelberg 2011
The people who use our systems undertake what, following the management
scientist Peter Drucker, has come to be known as knowledge work. Drucker (1999)
identified the increased productivity of manual work as a major distinguishing
feature of successful organizations in the twentieth century and saw increased
productivity of knowledge work as a similarly distinguishing feature of
organizations in the twenty-first century. Our concern in this book is with technology
to realize that increased productivity.
1.2 The ACTIVE Project
The ACTIVE project began with the insight that three factors were exerting a
significant impact on the efficiency and effectiveness of how we interact with
information.
Firstly, despite all the technological and organisational efforts which have been
applied to improve the sharing and use of knowledge, organisations still fail to
make full use of their own knowledge. Even as technologies are being
developed to address this, the problem is becoming more complex through the
increasing volume of information available to be shared, through the global nature
of many organisations, and through modern working practices such as distributed
teams and home-working.
Secondly, we all face an information overload which makes it difficult to
find what is relevant to our current task. It is not just that there is too much
information; it is that at any given time we only need a part of what is available
to us and the sheer volume makes it hard to locate that part.
Thirdly, most of us find the focus of our work changing, sometimes from
minute to minute, as we face continual interruptions. These may be in the form of
emails, instant messages, phone calls, electronic reminders, or even the nagging
internal voice which reminds us that there is something we need to do urgently.
Each time we switch our task focus we need a different set of information. The
overhead of finding and re-finding that information inhibits our productivity.
These were the challenges we faced in ACTIVE. In response we saw three
technological approaches as being of particular importance: the combination of
Web 2.0 and semantic technology; the use of context; and the use of informal
processes. We do not see these technologies as stand-alone, but as interrelating to
respond to the challenges of personal and group productivity.
The synergy of Web 2.0 and semantic technology is viewed as valuable both to
share knowledge and to retrieve information. In the last decade, the development of
knowledge representation languages based on ontologies, of the corresponding
ontology editors, and of reasoners which can make inferences over these languages,
has provided powerful ways of finding and manipulating knowledge. There is a
perception, though, that the creation and maintenance of ontologies carries a
significant overhead. The requirement is to use these technologies but in a more
accessible way, i.e. in a way more like that of Web 2.0 applications. In ACTIVE we
have two responses to this: through the Semantic MediaWiki (SMW) and through
the use of tagging. The SMW offers users the ability to create and share structured
knowledge alongside shared text. In addition, ACTIVE’s enhanced approach to
tagging helps the user to create tags and to use those tags for information retrieval.
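To make the idea of retrieval by tag concrete, the following is a minimal sketch in Python. It is not ACTIVE's tagging component; the class name TagIndex, its methods, and the sample file names are all hypothetical, chosen only to illustrate how user-created tags can drive retrieval.

```python
from collections import defaultdict


class TagIndex:
    """Hypothetical tag index: maps each tag to the items that carry it."""

    def __init__(self):
        self._by_tag = defaultdict(set)

    def tag(self, item, *tags):
        # Tags are stored in lower case so that retrieval ignores case.
        for t in tags:
            self._by_tag[t.lower()].add(item)

    def find(self, *tags):
        """Return the items carrying all of the given tags, sorted by name."""
        sets = [self._by_tag.get(t.lower(), set()) for t in tags]
        if not sets:
            return []
        return sorted(set.intersection(*sets))


index = TagIndex()
index.tag("proposal.doc", "customer", "2011")
index.tag("notes.txt", "Customer")

both = index.find("customer")          # items tagged "customer"
narrowed = index.find("customer", "2011")  # items carrying both tags
```

Each additional tag in a query narrows the result, which is the behaviour a user would typically expect when combining tags.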
The use of context is crucial both to locating the information relevant to us at any
given time and to reducing the overhead of switching tasks. In fact, the word context means different things in different communities. To some it means characteristics
such as location, or the kind of device a person is using. Our interest is in task context. By this we mean a grouping of information objects required by a user for a
particular task. These information objects could be documents, spreadsheets,
emails, images, or even people, e.g., represented by contact details. The point is
that they form a grouping which enables a user, or group of users, to better perform
their work. Viewed from the system, a context is used by an agent to define the
current working focus and determine working priorities (Ermolayev et al. 2010).
For an overview of how context is used in ACTIVE, see Warren et al. (2010).
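The notion of a task context as a grouping of information objects can be sketched as a small data structure. The Python below is an illustration under our own assumptions, not the ACTIVE implementation: the names InfoObject, TaskContext and Workspace, and the sample objects, are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class InfoObject:
    """One information object: a document, email, image, person, etc."""
    kind: str   # e.g. "document", "email", "person"
    name: str


@dataclass
class TaskContext:
    """A named grouping of the information objects needed for one task."""
    name: str
    objects: set = field(default_factory=set)

    def add(self, obj: InfoObject) -> None:
        self.objects.add(obj)


class Workspace:
    """Hypothetical workspace that tracks contexts and the current focus."""

    def __init__(self) -> None:
        self.contexts = {}
        self.current = None

    def context(self, name: str) -> TaskContext:
        # Create the context on first use, then reuse it.
        return self.contexts.setdefault(name, TaskContext(name))

    def switch_to(self, name: str) -> list:
        # Switching focus returns everything associated with the new task,
        # sorted by kind and name for stable display.
        self.current = self.context(name)
        return sorted(self.current.objects, key=lambda o: (o.kind, o.name))


ws = Workspace()
ws.context("customer proposal").add(InfoObject("document", "proposal-draft.doc"))
ws.context("customer proposal").add(InfoObject("person", "account manager"))
ws.context("budget review").add(InfoObject("spreadsheet", "q3-costs.xls"))

proposal_items = ws.switch_to("customer proposal")
```

The point of the sketch is that a switch of task focus becomes a single operation that brings back the whole grouping, rather than a search for each object in turn.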
We use the term informal processes to describe the procedures which we all
create and use to undertake our work. We differentiate between business processes,
which are created by the organisation, and the informal processes which we create
ourselves. The problem with the latter is that they are frequently not well described.
Because of this they are not shared and hence reinvented many times in an
organisation. Moreover, because they are not shared, they are not subject to the
peer-review which leads to improvement. Hill et al. (2006) call these artful business processes, because “there is an art to their execution that would be extremely
difficult, if not impossible, to codify in an enterprise application”. ACTIVE
has developed tools to make it easier to create, share, view and edit such processes.
We hope that these tools will help ordinary employees, rather than business process
designers, to create and share their own processes.
From the scientific standpoint, in ACTIVE we were testing three hypotheses:
• That the use of lightweight ontologies and tagging offers measurable benefits to the
management of corporate knowledge without presenting significant barriers to users.
• That the use of context helps users cope with information overload, and mitigate
the effects of continual switching of task focus; and that users are further aided
by the deployment of machine learning techniques to discover contexts and
detect changes of context, based on the user’s behaviour.
• That the productivity of knowledge work would be improved by providing tools
to create, view and edit informal processes; that the use of machine learning to
learn processes from users’ behaviour would further support knowledge work;
and that machine learning could also be used to suggest information resources,
based on process and context information.
1.3 The Structure of This Book
Our book is organised into five parts. The next chapter concludes Part I by looking
at the opportunities and challenges faced by organisations as they move towards
exploiting Web 2.0 capabilities. A challenge for knowledge management is the
integration and exploitation of internal and external intelligence. The chapter
describes strategies for exploiting Web 2.0 intelligence.
Part II then looks at the technologies, and also some methodologies, developed
in ACTIVE. Part III describes how these technologies have been exploited and
evaluated in the ACTIVE case studies. Part IV starts with a chapter describing the
principal market trends in the areas addressed by our technologies, and then
includes a number of chapters describing related work in other projects. Finally,
Part V draws some conclusions and indicates some further areas for research.
Parts II to V are all briefly reviewed in the remainder of this chapter.
1.3.1 The Technologies
The first chapter of Part II, Chap. 3, is concerned with the development, mainte-
nance and use of knowledge structures within organisations. Such knowledge
structures are currently very diverse, ranging from database tables to files in
proprietary formats used by scripts. The chapter describes the use of the Semantic
MediaWiki (SMW) to provide an environment for enterprise knowledge structures.
Chapter 4 continues the theme of enterprise knowledge structures. It presents
models to estimate the costs and benefits associated with the development of
ontologies and related knowledge structures, and of the applications using them.
The chapter provides guidelines to assist project managers in using these models
throughout the ontology life cycle.
In our next chapter, Chap. 5, we look at another of the key technologies
underpinning ACTIVE, that of context. The chapter explains how the concept of
context is realised in the ACTIVE Knowledge Workspace (AKWS). It describes the
top-down and bottom-up perspectives of context. In the former, the users create
context, set their current context, and associate information objects with contexts.
In the latter, ACTIVE’s machine intelligence software is used to discover contexts,
based on the users’ behaviour; to detect a user’s current context; and to associate
information objects with context. In ACTIVE, these two perspectives are merged
into a single end-user experience.
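One way to picture the merging of the two perspectives is sketched below in Python. This is an assumption-laden illustration, not the AKWS design: a simple word-overlap heuristic stands in for the machine intelligence software, and the function names, keyword sets and sample titles are all hypothetical.

```python
def suggest_context(title, context_keywords):
    """Bottom-up stand-in: pick the context sharing most words with the title.

    Returns None when no context shares any word with the title.
    """
    words = set(title.lower().split())
    best, best_overlap = None, 0
    for name in sorted(context_keywords):  # sorted so that ties are deterministic
        overlap = len(words & context_keywords[name])
        if overlap > best_overlap:
            best, best_overlap = name, overlap
    return best


def resolve_context(title, user_assigned, context_keywords):
    """Merge both perspectives: an explicit user choice wins over a suggestion."""
    if title in user_assigned:
        return user_assigned[title], "top-down"
    return suggest_context(title, context_keywords), "bottom-up"


context_keywords = {
    "customer proposal": {"proposal", "customer", "quote"},
    "budget review": {"budget", "costs", "forecast"},
}
user_assigned = {"meeting notes": "budget review"}

print(resolve_context("meeting notes", user_assigned, context_keywords))
print(resolve_context("draft customer proposal", user_assigned, context_keywords))
```

Letting the user's explicit association take precedence is a design choice made for this sketch; a real system could equally present the discovered association as a suggestion for the user to confirm.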
Chapter 6 looks at the third of our ACTIVE technologies: informal processes.
The chapter describes tools to support how these processes can be captured, shared
and optimised. It also describes tools to provide the security mechanisms to allow
knowledge to be shared safely.
The final chapter in Part II, Chap. 7, looks at a fundamental technology which
supports ACTIVE’s use of context and process. As already noted, the application of
machine learning techniques to support the use of context and process information
is one of the project’s principal research challenges. The chapter also includes a
discussion of Contextify, an Outlook plug-in which uses the technology to support
the management of email.
1.3.2 Applying and Evaluating the Technologies
The central premise of the ACTIVE project is that the use of the three approaches
described above, based upon lightweight ontologies and folksonomies, context-
based information delivery, and informal processes, will significantly increase the
efficiency and effectiveness of knowledge work. In Part III we explain how this has
been put to the test. We describe how we have applied these technologies in three
case studies, and how in turn we have used these case studies to evaluate our
technologies, at the user level as well as the technical level.
Chapter 8 describes a case study with customer-facing people in BT. They
confront the three challenges we discussed earlier: the need to share information;
to combat information overload; and to mitigate the effect of continual
interruptions. The chapter describes how ACTIVE tools, based both on the
AKWS and the SMW, are being used to respond to these challenges; a particular
application is the creation of customer proposals.
A case study in Accenture, a global consultancy with over 200,000 employees, is
described in Chap. 9. The chapter focuses on two enterprise problems: enterprise
search and collaborative document development. It describes how ACTIVE tech-
nology has been used to make generic knowledge management tools context and
task sensitive. The confidentiality of documents is often a bar to their being shared,
even where non-confidential aspects could be exploited across an organisation. The
chapter also describes the use of machine learning technology to automatically
redact documents to enable their being shared.
The third ACTIVE case study, described in Chap. 10, is with Cadence Design
Systems in the electronic design sector. Electronic designers do not follow
predefined workflows but use their tacit knowledge to navigate their own informal
processes. In this case study, ACTIVE’s machine intelligence technology is used to
learn and visualize these informal processes. In this way, bottlenecks can be
identified, and processes can be optimized and shared. A project navigation meta-
phor is used, whereby an electronics designer is helped to find a productive
execution path through the state space of an engineering design project.
All these case studies have been conducted in large organisations. However,
these technologies are also applicable to those working in smaller organisations, or
even those working alone. Similarly, the applications described here are accessed
via a conventional personal computer, but equally the approaches developed could
be used with other devices, including the mobile devices on which an increasing
volume of information interaction takes place.
1.3.3 Complementary Activities
Part IV looks beyond the ACTIVE project to see how others are using related
approaches to achieve the same or similar goals. The first chapter, Chap. 11, looks
particularly at the Web 2.0 marketplace, at what products are available, and at
customer needs and perceptions. The chapter looks at some of the challenges being
faced by those developing tools to boost knowledge worker productivity and
identifies some market trends.
Chapter 12 complements Chap. 3 by describing a range of applications of the
Semantic MediaWiki. The chapter is divided into two sections. Section 12.1
presents SMW+, a product for developing semantic enterprise applications, and
describes applications in content management, project management, and semantic
data integration. Section 12.2 presents MoKi, a semantic wiki for modeling enter-
prise processes and application domains. Example applications of MoKi include
modeling tasks and topics for work-integrated learning, collaboratively building an
ontology and modeling clinical protocols.
The concept of the social semantic desktop is described in Chap. 13, and in
particular the work of the NEPOMUK project. Based on semantic web technology,
the NEPOMUK Social Semantic Desktop allows access to information across
various applications within a knowledge worker’s personal computer. It facilitates
the interconnection, management, and ontology-based conceptual annotation of
information items.
Chapter 14 describes the use of semantic technologies, machine intelligence and
heuristics to enable learning support during the execution of work tasks, a paradigm
known as work-integrated learning. The work described here, which formed part of
the APOSDLE project, includes the automatic detection of a user’s work task, and
context-aware recommendation both of relevant content and relevant colleagues.
The work described also includes the inference of a user’s competences based on
her past activities.
The final chapter of this Part, Chap. 15, relates the shifts in the metaphors we use
to manage and relate to digital content. The chapter notes, for example, the
evolution of email from a communication channel to a rich authoring environment
and the increased rate at which knowledge workers are exposed to information
through content streams. These changes present challenges to the segmented and
application-bound information storage on our PCs. An approach is presented which
de-emphasizes storage management and focuses on support for user activities, at
the same time generalizing the notions of folders and files.
1.3.4 Concluding Words
In our final chapter we review the chief themes of our book, in particular our three
technology themes. For each theme we outline some remaining challenges. We also
make some predictions about how these technologies will develop and be used. Yet
that is as much for you, the readers, to determine as for the authors of this book. Our
hope in writing the book is that you will be as excited by the technologies’
possibilities as we are, and help to take them forward.
1.4 Knowledge, Knowledge Work and Knowledge Workers
A final comment about language may be of value. The terms knowledge, information and data will be used in the course of the book. A widely quoted articulation of
the difference between the first two of these is due to Ackoff (1989). Ackoff sees
information as useful data, providing answers to the ‘who’, ‘what’, ‘where’, and
‘when’ questions. Knowledge, on the other hand, enables the application of infor-
mation; it answers the ‘how’ questions. This is a useful differentiation. However,
our approach is pragmatic and in general we will use the word which seems most
natural in the particular circumstances.
We have already noted that the term knowledge work was introduced by
Drucker. He also introduced the term knowledge worker for someone who spends
much of his or her time undertaking knowledge work. It is important not to be elitist
about these terms. Knowledge work is not restricted to those with professional
training. For us, knowledge workers are those who spend much of their working
time interacting with information at a computer or similar device. It is for all such
people that our technology is intended.
Acknowledgement Much of the work reported in this book has received funding from the
European Union’s Seventh Framework Programme (FP7/2007–2013) under grant agreement
IST-2007-215040. Further information on ACTIVE is available at http://www.active-project.eu.
References
Ackoff RL (1989) From data to wisdom. J Appl Syst Anal 16:3–9
Drucker P (1999) Knowledge-worker productivity: the biggest challenge. Calif Manage Rev
41:79–94
Ermolayev V, Ruiz C, Tilly M, Jentzsch E, Gomez-Perez J, Matzke W (2010) A context model for
knowledge workers. Proceedings of the second workshop on context, information and
ontologies, 2010. http://ftp.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-626/ last
accessed 8 Aug 2011
Hill C, Yates R, Jones C, Kogan S (2006) Beyond predictable workflows: enhancing productivity
in artful business processes. IBM Syst J 45(4):663–682
Simperl E, Thurlow I, Warren P, Dengler F, Davies J, Grobelnik M, Mladenic D, Gomez-Perez J,
Ruiz Moreno C (2010) Overcoming information overload in the enterprise: the ACTIVE
approach. IEEE Internet Comput 14(6):39–46
The Radicati Group (2009) Email Statistics Report, 2009–2013, Palo Alto, CA
Warren P, Gomez-Perez J, Ruiz C, Davies J, Thurlow I, Dolinsek I (2010) Context as a tool for
organizing and sharing knowledge. Proceedings of the second workshop on context, information
and ontologies, 2010. http://ftp.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-626/ last
accessed 8 Aug 2011
2
Web 2.0 and Network Intelligence
Yasmin Merali and Zinat Bennett
Y. Merali (*)
Warwick Business School, Warwick University, Coventry CV4 7AL, UK
Z. Bennett
19 Foxes Way, Warwick CV34 6AX, UK
P. Warren et al. (eds.), Context and Semantics for Knowledge Management, DOI 10.1007/978-3-642-19510-5_2, © Springer-Verlag Berlin Heidelberg 2011
2.1 Introduction
ICTs have been implicated as both the drivers and enablers of the “Information
Society/Economy”, “Network Society/Economy”, and concurrently, in the emergence
of Knowledge Management as a distinctive management practice. The
communication and information capabilities and processes enabled by the Internet
and associated technologies are integral to the realisation of the network society and
the network economy (Castells 1996). At the most fundamental level the technological
developments have the potential to increase:
• Connectivity (of people, applications and devices),
• Capacity for distributed storage and processing of data,
• Reach and range of information transmission, and
• Rate (speed and volume) of information transmission.
The exploitation of these capabilities has given rise to the emergence of network
forms of organising as processes, information and expertise are shared across
organisational and national boundaries. Increased global connectivity and speed
of communication have contracted the spatio-temporal separation of world events:
informational changes in one locality can very quickly be transmitted globally,
influencing social, political and economic decisions in geographically remote
places (Merali 2004). In the managerial discourse (Merali 2004; Evans and Wurster
2000; Axelrod and Cohen 1999; Shapiro and Varian 1999), these changes, seen
as the harbingers of a “new” economy (or “information economy”), were
characterised by
• The critical role of information and knowledge in competition,
• Increased dynamism, uncertainty and discontinuity in the competitive context,
• Pressures for fast decision making in the absence of complete information, and
• The importance of learning and innovation to afford requisite flexibility
and adaptability for survival.
Consequently, whilst most of the 1990s were characterised by management
engagement with Business Process Reengineering (BPR) and the development of
infrastructures and opportunities for e-commerce and e-business, the end of that
decade and the 2000s saw the shift to a concern with knowledge management,
innovation and business intelligence. Some of us believed that internet-based ICTs
would lead to a qualitative change in management theory and practice:
The “network” form of organising is a signature of the Internet enabled transformation of
economics and society. We find that strategy and managerial discourse are shifting from
focusing solely on the firm as a unit of organisation to networks of firms, from
considerations of industry-specific value systems to considerations of networks of value
systems, and from the concept of discrete industry structures to the concept of ecologies.
In the domain of information systems, the focus on discrete applications development
has become imbued with issues of flexibility, connectivity and compatibility with other
systems. Driven by the business need for intra- and inter-organisational integration of
information processes, we have moved from concentrating on applications development
to engaging with issues of information architectures.
The Internet is implicated as both an enabler and a driver of this interconnected world.
At a more general level, there is an escalation of interest in the idea that information
technology networks and social networks self-organise into a constellation of networks of
networks (Watts 1999, 2003; Barabasi 2002). This is analogous to conceptualising the
interconnected world as a kind of global distributed information system comprising
networks of networks.
Merali 2004, p. 408
Web 2.0 technologies are a gateway to realising the potential of this “global
distributed information system” for business and society. However, they also pose
significant challenges for firms with regard to the development of appropriate
enterprise architectures and strategies for dealing with the emergent networked
competitive terrain. This chapter provides a perspective on the implications of Web
2.0 developments for enterprise knowledge processes in this context.
2.2 Enterprise Knowledge Processes and the Network Economy
Knowledge Management initiatives in the 1990s were characterised by their focus
on developing information infrastructures (Knowledge Management Environ-
ments, or KMEs) that would enable cross-boundary processes and collaborative
work to be executed efficiently, and their attempts to codify, organise and extract
value from information assets. Table 2.1 provides an illustration of the kinds
of capabilities afforded by ICTs that were at the heart of most knowledge manage-
ment strategies in that era.
Experience with these ICT-enabled knowledge management environments
(KMEs) clearly demonstrated that whilst technology solutions could provide access
to information, the value of the information could only be leveraged if it could be
applied effectively by those participating in the information-based interactions.
Consequently the strategy literature on the Information Economy in the 1990s
focused on new business models that were enabled by the Internet with an increased
emphasis on customer relationship management, business intelligence and
innovation. Much of the corporate development in the 1990s and the early 2000s
was concerned with providing integrated platforms and architectures for seamless
delivery of service in conventional value chains, or more ambitiously, in value webs
that were well-defined in scope and scale. Grid computing was feted for enabling
models of provision that could accommodate demand irregularities and spikes in
capacity requirements for computing power, but the potential for ubiquitous con-
nectivity offered by the internet and www remained under-realised by most
organisations. The mid-to-late 1990s brought us pioneers like Napster and Amazon
who used the network capabilities of the web not just for delivering services but for
harnessing social intelligence to enhance the value and utility of their provision.
Looking at developments over the past 5 years, it is clear that realising the full
potential of connectivity in the network economy entails a step change in the scale
of business networks and in the richness, speed and reach of communications across
these networks. This leads to a more complex competitive terrain as there are a
number of different ways that organisations can exploit network effects with
different business models. For example, whilst faster diffusion of information and
ideas may lead to speedier imitation and short-lived first mover advantages for
innovators, it is also possible for smart innovators to exploit network effects to
rapidly establish a dominant footprint in the market space, making it difficult for
imitators to usurp them.
Table 2.1 Capabilities and functionality associated with ICT-based KMEs
Capability                    Functionality
Discovery and creation        • Search
                              • Filter
                              • Alert
                              • Analysis and Synthesis (content analysis, pattern matching, discovery, “reasoning”)
Sharing                       • Organisation (classification and clustering)
                              • Creation of summaries and abstracts
                              • Display
                              • Dissemination (e.g. profile-based targeting)
Utilisation and development   • Collaborative work space
                              • Work flow
                              • Communication and interaction support (wikis, blogs, social computing environments, etc.)
                              • Simulation environments
                              • Learning environments
Similarly, whilst the scale of the internet and the extended reach it affords may
enable large corporations to occupy dominant positions in the global market, it also
enables many more niche players to survive comfortably. The increasing scale and
scope of web-based enterprise intelligence is predicated on requisite corporate
competences associated with the management and analysis of very large databases
and scalable architectures, capable of supporting web-based transactions at any
scale. In particular, it has spawned an ecology of business models supporting a
diversity of business size and scope.
The long tail effect¹ (Anderson 2006) is a popularly cited characteristic of the
Web 2.0 landscape (see Fig. 2.1). Even if one is catering for only a relatively minuscule
niche market, the global reach of the internet makes it possible to attract
the critical mass of customers necessary for the viable delivery of a service or
product offering (i.e. a small share of a very large market is still large in terms of
absolute numbers). Whilst Anderson pointed out the impact of global reach on
enabling businesses to survive with small numbers of high-value sales, the web has
also spawned a population of business models based on small (low-value)
transactions enabled by the evolution of efficient micro-payment systems lowering
the transaction costs for small purchases. Thus global reach and socio-economic
developments enable many diverse niche providers (ranging from those engaging in
high volume, low value transactions, to those engaging in high value, low volume
transactions) to co-exist on the competitive landscape.
Fig. 2.1 The long tail (modified from http://www.longtail.com/about.html, last accessed 21/02/12)
¹ The long tail refers to the statistical property that a larger share of the population rests within the
tail of a probability distribution than observed under a “normal” (symmetric about the mean) or
Gaussian distribution.
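The arithmetic behind the long tail argument can be sketched in a few lines. The sketch assumes, purely for illustration, that sales per item fall off as a Zipf-like power law of popularity rank; the source does not specify a distribution, and the function names, exponent and catalogue sizes are hypothetical.

```python
from __future__ import annotations


def zipf_shares(n_items: int, exponent: float = 1.0) -> list[float]:
    """Sales share of each item, ranked 1..n_items, under an assumed Zipf-like law."""
    raw = [1.0 / (rank ** exponent) for rank in range(1, n_items + 1)]
    total = sum(raw)
    return [value / total for value in raw]


def tail_share(shares: list[float], head_size: int) -> float:
    """Fraction of all sales contributed by items ranked below the top `head_size`."""
    return sum(shares[head_size:])


if __name__ == "__main__":
    # Hypothetical catalogues: a shop limited to 1,000 titles versus a
    # web-scale catalogue of 100,000 titles, with the same top-100 "hits".
    for catalogue_size in (1_000, 100_000):
        shares = zipf_shares(catalogue_size)
        print(catalogue_size, round(tail_share(shares, head_size=100), 3))
```

Under this assumed distribution the share of sales coming from outside the top 100 items grows as the catalogue grows, which is the sense in which global reach makes niche offerings viable.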
The increase in power and impact of capabilities afforded by successive
generations of ICT over the decades has been accompanied by a change in the
scope and scale of organisational knowledge processes. Whilst the knowledge
management agenda for the 1990s was generally focused on improving knowledge
processes within organisational boundaries, organisations today are engaging with
exploiting the intelligence of markets and civil society for open innovation and
crowd sourcing. Web 2.0 capabilities and the possibilities of exploiting user-
generated content extend the scope of KMEs beyond organisational boundaries,
supporting employees engaging in novel interactions with external players in
dynamic contexts to make decisions “on the hoof”. The next sections outline the
impact of Web 2.0 and some of the challenges that this raises.
2.3 Web 2.0 Capabilities and Knowledge Work
Although Web 2.0 has been in the public lexicon for several years, its definition is a
“work in progress”. In terms of the Network Economy we can think of Web 2.0 as
the enabler for realising the networkness of networks (Merali 2011). O’Reilly’s
(2007) statement provides a good working definition and captures the features of
Web 2.0 capabilities and use that are the most significant for the current discussion:
Web 2.0 is the network as platform, spanning all connected devices; Web 2.0 applications
are those that make the most of the intrinsic advantages of that platform: delivering
software as a continually-updated service that gets better the more people use it, consuming
and remixing data from multiple sources, including individual users, while providing their
own data and services in a form that allows remixing by others, creating network effects
through an “architecture of participation”, and going beyond the page metaphor of Web 1.0
to deliver rich user experiences.
O’Reilly 2007
Industry observers like Gartner (Valdes et al. 2010) also point to the ubiquity of the Web
and the utility of the web browser, combined with
• Better performing (bandwidth, speed, price) networks and mobile devices,
• Provision of services on the cloud, and
• Social computing and access to the social web;
as the enablers of the transition from what O’Reilly labels as “going beyond the
page metaphor of Web 1.0” to business models for the future.
Mobile and pervasive devices and social computing add the quality of dynamic
context specificity to the “richness and reach”² of Internet communications. The
promise of cloud computing is very closely associated with the Web 2.0 vision
because it confers two essential capabilities: the possibility of accessing and
managing distributed resources (data, applications, computing power), and the
flexibility of the utility model (offering platform as a service, infrastructure as a
service, software as a service).
² Evans and Wurster’s Blown to Bits (2000) underlined the idea that compared to
traditional business models for reaching customers, where there had to be a trade-off between the
richness and reach of communications (e.g. one-on-one interaction with a salesperson versus a
broadcast advertisement in a national paper with a wide circulation) the Internet made it possible to
have both richness and reach, as it was possible to have customised communications for different
audiences spread across vast geographical areas. Shapiro and Varian’s Information Rules (1999)
developed the idea of building value propositions based on the versioning of information for different
audiences. These ideas were influential in shaping e-business models, but were largely used in
the context of the traditional linear value chain.
Whilst there has been an explosion of new businesses spawned by the emergence
of Web 2.0 capabilities, it is perhaps not surprising to note that the most cited
incumbent organisations making the transition smoothly to “Web 2.0” practices
were those who
• Integrated access to information from large numbers of different sources to
serve dynamic needs (data management competence), and/or
• Provided space for social/economic interactions between large numbers of
people who were not previously connected.
This applies to both pure information businesses (such as Google and Facebook)
and those whose services are implicated in the sale or exchange of material objects
(such as Amazon and eBay). They were all orchestrators and hosts of transaction
spaces, and used network effects to amass a large following very rapidly. Most did
not own the information or material objects involved in the transactions, and they had
a variety of business models, but all provided ease of access to network resources
combined with an attractive value proposition for non-paying and paying participants
(very often advertisers). It is salutary that whilst their rapid establishment of a critical
mass in the marketspace made it difficult for new entrants to usurp them immediately,
none have ceased to innovate and expand, often acquiring and absorbing innovative
start-ups to enhance their portfolio of capabilities.
Whilst firms are adopting social media and using social networks, blogs, wikis
and tweets to enhance communications and information sharing within the traditional
paradigm of knowledge management, there is an important change in the way
that firms need to perceive enterprise intelligence in the Web 2.0 era. The
challenges for enterprise intelligence in the “pre-Web 2.0 business intelligence
era” were concerned with managing internal data, understanding its value and
leveraging it effectively for competitive positioning, whilst those of the Web 2.0
era are concerned with integration and exploitation of internal and external intelligence
for competition and collaboration. The boundary between “business intelligence”
and “social intelligence” is no longer a clear one, and this leads to some
interesting tensions in emerging business models. Integrating data from large
numbers of sources and realising the potential for global reach to harness business
and social intelligence poses new challenges for process and data management in
dynamic use contexts.
For process design and implementation, the central challenge is that of develop-
ing scalable, agile architectures for secure inter-operability and integration across
diverse organisational boundaries, enabling mashups to deliver context-sensitive
information in real time. The promise of cloud computing for dealing with issues of
scalability and demand-based delivery is important in this respect, and Gartner
(Valdes et al. 2010) predict that content driven architectures building on the
interaction and partitioning styles pioneered by Service Oriented Architectures
and Event Driven Architectures will enter mainstream adoption in the next
5 years. On the software development side, the large scale development and
maintenance for much of the software is often characterised by the Open Source
ethos of collective development and continuous improvement. Dynamic program-
ming languages, syndication and reuse of software all contribute to the speed and
diversity of context-specific assembly that is demanded by users on the hoof.
Use-context information may be provided explicitly by devices or sensors
(e.g. for spatial location) or it may have to be inferred from user queries or dialogue.
This means that application design faces the challenge of having to cater for different
media and device characteristics. However the emerging consensus is that the
greatest challenges are in the domain of data management. There are conceptual
challenges in the organisation and semantic articulation of data originating from
diverse and varied sources and destined for use in undefined future contexts. There
are technical challenges for real-time contextual matching: e.g. anticipating what
data is valuable at any instant in dynamic use-contexts, and folding this in with
served applications. The semantic web literature highlights the challenges of
organisation and seamless integration of data and metadata, but most interesting
from the perspective of knowledge management are the challenges associated with
the incorporation of social intelligence in corporate business models.
The traditional user-centred and customer relationship management approaches
are compatible with the exploitation of Web 2.0 capabilities to improve the customer
experience or to tailor services and products to meet specific customer requirements.
Similarly, the use of social computing and the deployment of wikis, blogs and tweets
may be viewed as enhanced communication capabilities to reinforce the knowledge
management practices inherited from the KM heyday of the 1990s. Distinctive in the
Web 2.0 vision is the notion of leveraging the collective intelligence of customers
and civil society, and the diversity of scale and scope for interactions with potential
customers and external sources of intelligence.
2.4 Leveraging Collective Intelligence
The business models of the most cited examples of Web 2.0 success stories are all
predicated on leveraging collective intelligence as a resource by doing some or all
of the following:
• Hosting and orchestrating the emergence of collective intelligence,
• Harnessing collective intelligence to
– Improve product/service offering, and
– Develop product/service offering
• Using social media and networks to market product/service offering, and
• Using social media, networks and collective intelligence to mobilise collective
action.
Examples exist for both informational goods and services, and for those involving
physical processes and material objects. Wikipedia, the online encyclopaedia, is
probably the most frequently cited example of a site that hosts and orchestrates the
emergence of collective intellectual property – it provides the space where anybody
can add to, comment on, or correct existing entries to constantly develop the scope
and quality of the stored material.
eBay, Amazon and Netflix are interesting stalwarts from the e-business era
that embody Web 2.0 characteristics. eBay provides a trusted transaction space and
facilitates interactions between prospective buyers and sellers, and in doing so it
amasses social capital and has access to social networks enhancing its reputational
standing. It analyses the usage patterns and bidding behaviours of its users to refine
and manage its product categories and their flows. Amazon combines information
from suppliers and customer reviews with an analysis of patterns of customer
behaviours to provide an enhanced purchasing experience. Its considerable footprint
in the book trade enabled it to have a head start in the e-book market along
with its Kindle e-book reader. Like Amazon, Netflix (a film rental business) uses
collaborative filtering to make recommendations to users based on what other users
like, and, according to the Economist, nearly two-thirds of the film selections by
Netflix’s customers come from referrals made by computer (Economist 2010a).
Social networking sites like Facebook that host and orchestrate social interactions
are attractive to advertisers wanting to tap into the potential of social networks to
act as recommendation channels.
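The idea of recommending items “based on what other users like” can be illustrated with a minimal sketch of user-based collaborative filtering. It is not a description of the systems run by Amazon or Netflix: the similarity measure (a simple count of shared likes), the function name and the sample data are all assumptions made for illustration.

```python
from __future__ import annotations

from collections import Counter


def recommend(likes: dict[str, set[str]], user: str, top_n: int = 3) -> list[str]:
    """Suggest items the user has not seen, weighted by how similar each other user is."""
    own = likes[user]
    scores: Counter[str] = Counter()
    for other, items in likes.items():
        if other == user:
            continue
        overlap = len(own & items)  # crude similarity: number of shared likes
        if overlap == 0:
            continue  # users with nothing in common contribute no evidence
        for item in items - own:
            scores[item] += overlap
    # Highest score first; ties broken alphabetically so the result is stable.
    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in ranked[:top_n]]


if __name__ == "__main__":
    # Hypothetical viewing histories.
    sample = {
        "ana": {"Alien", "Blade Runner", "Solaris"},
        "ben": {"Alien", "Blade Runner", "Brazil"},
        "cy": {"Alien", "Amelie"},
        "dee": {"Amelie", "Chocolat"},
    }
    print(recommend(sample, "ana"))
```

The point of the sketch is that every recorded choice by one user becomes evidence for recommendations to others, so the service improves as a side effect of being used.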
The key feature of these success stories is that in retrospect they all have a pattern
in which they build a large following using their service/product, coupled with
a generative business model in which the more a product or service is used, the
more valuable it becomes. This phenomenon of positive returns (Arthur 1990) where
the adoption of a product generates more demand for it, was associated with network
effects in the 1990s. This dynamic was used to explain why VHS overshadowed
Betamax in video formatting, or why there was a tipping point effect that established
the winners in the diffusion of competing communications technologies.
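The positive-returns dynamic described above can be sketched as a toy model. The sketch tracks expected adoption only (no randomness) and assumes that a product's attractiveness to the next adopter grows faster than linearly with its installed base; the `feedback` exponent, the function name and the starting figures are hypothetical and are not taken from Arthur (1990).

```python
from __future__ import annotations


def adoption_path(base_a: float, base_b: float, steps: int, feedback: float = 2.0) -> list[float]:
    """Market share of product A after each new adopter, using expected values."""
    shares = [base_a / (base_a + base_b)]
    for _ in range(steps):
        weight_a = base_a ** feedback
        weight_b = base_b ** feedback
        # Probability that the next adopter picks A rises with A's installed base.
        p_a = weight_a / (weight_a + weight_b)
        base_a += p_a
        base_b += 1.0 - p_a
        shares.append(base_a / (base_a + base_b))
    return shares


if __name__ == "__main__":
    # Hypothetical start: format A holds 51 of the first 100 adopters.
    path = adoption_path(51, 49, steps=5000)
    print(f"start {path[0]:.3f}, end {path[-1]:.3f}")
```

In this toy model a small initial lead is never eroded: the leader's share rises with every step, whereas an exactly even start stays even. That is one simple way to picture the tipping-point behaviour mentioned in the text.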
In the case of Web 2.0, in addition to raising the popularity of the offering
and generating the non-linear diffusion effects for market penetration of products,
use of the product or service generates data and intelligence which is harnessed to
enhance the quality of the product or service. Google’s PageRank³ and query
recognition algorithms, and Amazon’s recommendation systems, are based on data
about user actions and behaviours. Google’s spell checker uses an analysis of
user-generated corrections of mis-spelt queries to hone the appropriateness of its
responses and to generate the “Did you mean...?” service for its query recognition,
whilst its page-ranking algorithm presents the most “valuable” items first, based on
its analysis of the way in which different kinds of users access and spend time on the
various web-pages.
The potential of the web to generate innovative community-based models of
production was recognised in the pre-Web 2.0 era. The Open Source community’s
model of software development and peer-to-peer content sharing sites like Napster
were seen as disruptive innovations, threatening to displace the dominant
incumbents in the software and music industry. In the business community there
was a great deal of interest in open innovation, with customers and users
contributing to the design of products (Chesbrough and Vanhaverbeke 2006).
These movements were harbingers of the Web 2.0 era which ramped up the
potential scale of involvement from the level of the community to the level of the
entire society, and which extended the scope of their involvement by enabling
a range of different modes of engagement.
Crowd-sourcing⁴ – getting the users to carry out work or provide content that
enhances the value of a product – is a typical model for social engagement.
Wikipedia, the online encyclopaedia, is often cited as an example of crowd-sourcing,
as anybody can contribute to its collective, dynamic, universal repository of
knowledge, and entries are open to modification and refinement through an open
process. Flickr, in addition to hosting user-contributed images, uses the intellect of
its users to structure and organise its massive image store in a way that enables image
retrieval through multiple and diverse associations – a “folksonomy” emerges as
users tag and classify the images using their own keywords. A more commercial
exploitation is Twitter’s crowd-sourcing of its language translation tool: users
provide translations, giving their time and text for free, whilst Twitter owns the
rights to the translations and their reuse to provide apps in different languages for
different devices, thus increasing its global footprint. The idea of crowd-sourcing
has its variants in finance: in contrast to the traditional route of venture capitalists
with deep pockets backing start-ups, there is a small but growing number of
crowd-financed business ventures where a number of small investors collectively back
a venture in return for equity.
³ In Google’s terms, PageRank “...reflects our view of the importance of web pages by considering
more than 500 million variables and 2 billion terms. Pages that we believe are important
pages receive a higher PageRank and are more likely to appear at the top of the search
results... PageRank also considers the importance of each page that casts a vote, as votes from
some pages are considered to have greater value, thus giving the linked page greater value. We
have always taken a pragmatic approach to help improve search quality and create useful products,
and our technology uses the collective intelligence of the web to determine a page’s importance.”
⁴ Howe (2006) coined the term “crowd-sourcing” for “the act of taking a job traditionally
performed by a designated agent (usually an employee) and outsourcing it to an undefined,
generally large group of people in the form of an open call.”
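The “vote” idea in Google’s description of PageRank, quoted in the footnote above, can be sketched with the textbook link-based formulation. This is only an illustrative approximation: the damping factor of 0.85 is the conventionally quoted value, the three-page link graph is hypothetical, and Google’s production ranking draws on many more signals than links alone.

```python
from __future__ import annotations


def pagerank(links: dict[str, list[str]], damping: float = 0.85, iterations: int = 50) -> dict[str, float]:
    """Iteratively share each page's score among the pages it links to."""
    pages = sorted(set(links) | {target for targets in links.values() for target in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page receives a small baseline score regardless of links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A link is a vote; the vote's weight depends on the voter's own rank.
                vote = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += vote
            else:
                # A page with no outgoing links spreads its score evenly.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Hypothetical web: pages a and b both link to c, and c links back to a.
    for page, score in sorted(pagerank({"a": ["c"], "b": ["c"], "c": ["a"]}).items()):
        print(page, round(score, 3))
```

In the example graph, page c collects two votes and ranks highest, while page a ranks above b because its single vote comes from the highly ranked c. This shows how the linking choices of many authors are pooled into one measure of importance.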
The examples above illustrate the way in which Web 2.0 enables
• The harnessing of intelligence and cognitive capabilities of non-corporate agents
(customers, users or enthusiasts) to do “knowledge work”, and
• The collection and leveraging of metadata to improve the business value
proposition.
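The second point, leveraging user-supplied metadata, can be illustrated with the folksonomy example mentioned earlier. The sketch below is a toy tag index, not a description of any real photo-sharing service: the class name, method names and sample tags are all hypothetical.

```python
from __future__ import annotations

from collections import Counter, defaultdict


class Folksonomy:
    """Toy tag store: free-form user keywords become a shared retrieval index."""

    def __init__(self) -> None:
        self._images_by_tag: dict[str, set[str]] = defaultdict(set)
        self._tags_by_image: dict[str, Counter[str]] = defaultdict(Counter)

    def tag(self, image: str, keyword: str) -> None:
        """Record that some user attached `keyword` to `image`."""
        key = keyword.strip().lower()
        self._images_by_tag[key].add(image)
        self._tags_by_image[image][key] += 1

    def find(self, *keywords: str) -> set[str]:
        """Images that carry every one of the given keywords."""
        matches = [self._images_by_tag.get(k.strip().lower(), set()) for k in keywords]
        return set.intersection(*matches) if matches else set()

    def related(self, keyword: str) -> list[str]:
        """Other tags that co-occur with `keyword`, most frequent first."""
        key = keyword.strip().lower()
        counts: Counter[str] = Counter()
        for image in self._images_by_tag.get(key, set()):
            for other, uses in self._tags_by_image[image].items():
                if other != key:
                    counts[other] += uses
        return [tag for tag, _ in sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))]


if __name__ == "__main__":
    store = Folksonomy()
    store.tag("img1", "Sunset")
    store.tag("img1", "beach")
    store.tag("img2", "sunset")
    store.tag("img2", "City")
    print(store.find("sunset", "beach"), store.related("sunset"))
```

No central taxonomy is defined anywhere in the sketch. The categories and the associations between them emerge entirely from what users choose to type.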
However the current excitement about Web 2.0 is very strongly linked with the
developments in social media and the exploitation of social networks for a variety
of economic and social goods. The focus on intellectual capital of the pre-Web 2.0
era is now combined with a quest to monetise the social and relational capital
of networks.
2.5 Social Networks: Leveraging Social and Relational Capital
2.5.1 Socially Intelligent Targeting
Social networking sites like Facebook host and facilitate the development of
extensive and dense networks of “friends”. The sharing of personal information
and experiences between peers, and the degree to which users indulge in self-
disclosure on such sites makes “the network” a repository of social and relational
capital. The ease with which users can deploy a diversity of social media to extend
and organise their personal networks serves both to accelerate growth of the site’s
footprint due to network effects⁵, and to lock in users who will have ended up with
a large chunk of their relational capital invested in that site.
The attraction of such sites for advertising lies in the fact that they combine the
potential of large networks for amplifying the reach of communications with an
inbuilt, socially intelligent targeting mechanism: the network of friends serves to
select and promote particular brands or products and services. Compared to
Amazon’s recommendation system this has a powerful additional social driver:
recommendations from a “friend” with knowledge of a user’s taste, life style and
aspirations will be couched in terms that are meaningful within the context of the
recipient’s self-concept, and have the assurance of “personally known” trusted
sources. The use of viral marketing to exploit network effects was already well-
established, but the increased potency of social media combined with the capacity
of social networking sites to act as “hubs” for targeting marketing messages has led
to the emergence of a much more sophisticated and personalised form of social
marketing.
⁵ The power of this effect is illustrated by Facebook’s reported trajectory of growth from 50 M
active users in October 2007 to 500 M in July 2010
(http://www.facebook.com/press/info.php?timeline accessed 21/02/11).
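As a rough worked check on the growth figures in the footnote, the sketch below assumes steady compound growth over the 33 months from October 2007 to July 2010. This is an idealisation, since the source reports only the two end points, and the function names are hypothetical.

```python
import math


def monthly_growth_rate(start_users: float, end_users: float, months: int) -> float:
    """Constant monthly growth rate that takes start_users to end_users."""
    return (end_users / start_users) ** (1.0 / months) - 1.0


def doubling_time(rate: float) -> float:
    """Months needed to double at a constant monthly growth rate."""
    return math.log(2.0) / math.log(1.0 + rate)


if __name__ == "__main__":
    rate = monthly_growth_rate(50e6, 500e6, months=33)
    print(f"implied monthly growth: {rate:.1%}")
    print(f"implied doubling time: {doubling_time(rate):.1f} months")
```

On these assumptions the tenfold increase corresponds to growth of roughly 7% a month, or a doubling of the user base about every ten months.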
A recent review in Wired magazine (Rowen and Cheshire 2011) gives examples
of ways in which companies are engineering the social use of blogs and tweets to
mobilise social selling, with models that incorporate a range of different socially-
based mechanisms. Many of these are based on combining conventional promo-
tional techniques (such as coupons and product placement) with detailed individual
disclosures of purchasing choices on social media. There are diverse sources from
which to garner intelligence about individual purchasing behaviours and trends,
ranging from dedicated websites like Blippy.com (which openly publishes all its
members’ credit card transactions along with their product reviews), to spontaneous
postings on personal blogs. However, whilst there is significant investment by
venture capitalists and media companies in social commerce start-ups, mainstream
businesses have yet to develop the competencies and strategies to exploit them
(Econsultancy 2010).
2.5.2 Tensions
Business models predicated on monetising collective intelligence or exploiting the
social, relational or intellectual capital embodied by social networks are subject
to an inherent tension unless the terms of engagement are clearly defined up-front.
The tension is connected with three inter-related issues:
Intellectual property rights: There is likely to be a resistance (social and/or legal) to the exploitation of common intellectual property and information for private
profit. A recent example of this is AOL’s $315M acquisition of the Huffington Post,
a news and content aggregation site: there was sensational media coverage of
Huffington Post’s contributors’ fears “that the site’s distinctiveness would be
blunted by its new corporate parent” and their sense of betrayal (Economist
2011). A landmark legal case was the class action lawsuit challenging Facebook’s
Beacon utility which exposed members’ purchasing history to their contacts: this
resulted in Facebook closing down Beacon and agreeing to a settlement fund of
$9.5 M. In the case of crowd sourcing, corporates are concerned about the escala-
tion of associated costs: the Economist reports that some firms are realising that
crowdsourcing can be more expensive than doing things themselves, with the most
significant costs being associated with checking the provenance of contributions
and whether or not they infringe copyright.
Trust: Pecuniary rewards for reviewing and recommending products and
services may undermine trust in the impartiality of the recommenders and diminish
the credibility and influence of recommendation. Similarly, reported abuse or
unauthorised exposure of personal data (as in the case of Facebook and Beacon)
is likely to incur a loss of trust and significant reputational damage.
Ethics: The use to which user-contributed content is put may give rise to ethical
issues. For example, in December 2009, Facebook was challenged for changing the
default settings on its privacy controls so that individuals’ personal information
would be shared with “everyone” rather than selected friends. Google also attracted
legal scrutiny when it was found to be capturing Wi-Fi data without permission: bits
of sensitive private data from 30 countries were collected and stored for years,
without the Street View leaders’ knowledge (Economist 2010b), and in April 2010,
ten privacy and data-protection commissioners from countries including Canada,
Germany and Britain demanded changes in Google Buzz, the social-networking
service, which was dipping into users’ Gmail accounts to find “followers” for them
without clearly explaining what it was doing.
The examples cited here illustrate the absence of precedent for many of the legal
and ethical issues that arise as firms start to exploit Web 2.0 capabilities in powerful
ways. Whilst Google and Facebook responded to criticisms by modifying their
practice, at the industry level we can interpret these episodes as examples of testing
the boundaries, and expect to see more infringements and challenges in the coming
years. For firms wanting to develop social media strategies, these types of issues are
important to consider as making repairs for transgressions may be costly.
Whilst the mainstream focus has been on the deployment of social media by
corporates to improve their value propositions and competitive positioning, it is
important to note that the division between commerce and civil society is not so
clear-cut. Blogs, wikis, tweets and social networks can be used to mobilise and
co-ordinate civil society to act in ways that have an economic or political impact,
e.g. by boycotting goods and services or launching denial of service attacks on
corporate websites, or mobilising civil action to bring about constitutional change
as evidenced by the events of February 2011 in Egypt. In the third sector,
social networks and social media have been used to generate new and sustainable
models of social enterprise, as evidenced by the success of peer-to-peer lending
organisations like Kiva and Zopa (http://www.kiva.org, http://www.zopa.com, last
accessed 21/01/11), who mediate social lending for micro-financing.
2.6 Time and Space Matters
The richness and reach of communications in the internet-enabled world combined
with the proliferation of social networking sites, blogs and tweets exposes well-
connected individuals to multiple, possibly simultaneous “feeds” of information on
a variety of topics from a diversity of sources. Businesses wanting to use social
media for promoting value propositions are therefore confronted with the challenge
of getting their potential clients to attend preferentially to their promotional
messages.
2.6.1 Context Sensitive Content
One potential solution to the problem of capturing attention lies in exploiting social
networks to send targeted messages as discussed in the last section: the extent to
which the recipient values the opinion of the sender may serve to prioritise the
speed and seriousness with which the message is attended to. However, to do this
effectively the advertiser needs to have detailed knowledge not only of who is
connected to whom, but also of the nature of the relationships embodied in those
connections. Whilst there are companies specialising in mining social network data
to extract or infer this type of knowledge, it is far from being a perfected practice.
Another, more accessible, approach is to provide contextually-relevant informa-
tion – at the right time and the right place for the message to be particularly relevant
and the product or service to be particularly useful or desirable. The ubiquity of
mobile telephones6 makes them the device of choice for delivering context-specific
mobile solutions. Smart phones are increasingly equipped with accelerometers
(sensing the motion of the telephone), Global Positioning System (GPS) receivers,
Internet connectivity and multi-media capabilities, providing the means to collect
and disseminate context-specific information in real-time using a mix of sources
and modes of communication (e.g. voice, video, text, and graphics) to cater for
a diversity of contexts and devices. Gartner (Valdes et al. 2010) suggest that as the
mobile sector becomes more fragmented with the appearance of many competing
platforms (from Apple, Google, Microsoft, Nokia, Research in Motion and HP/
Palm), developers are turning to cross-platform Web applications that will use
HTML5 and mobile-enabled browser engines that support GPS, tilt, proximity
and other sensors.
Context-aware service delivery opens up the possibility of enhancing the user
experience by connecting the cyber-experience with the web of places and things.
Combining this access with access to the social web would deliver the benefits of
social endorsement complemented by the immediacy of the communication: both
in terms of real-time delivery and in terms of relevance to an individual’s current
needs at a specific time and place.
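The idea of delivering a message only at the right time and place can be sketched as a simple filter over candidate messages. The following is a minimal illustration rather than a description of any real platform: the `Offer` type, the `relevant_offers` function, the coordinates, the radius and the time windows are all assumptions invented for the example, and the device position is taken as given (for instance from a GPS receiver).

```python
from dataclasses import dataclass
from datetime import time
from math import asin, cos, radians, sin, sqrt


@dataclass(frozen=True)
class Offer:
    """A hypothetical promotional message tied to a place and a time window."""
    text: str
    lat: float
    lon: float
    radius_km: float
    start: time
    end: time


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points (haversine formula)."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))  # 6371 km: mean Earth radius


def relevant_offers(offers, lat, lon, now):
    """Keep only offers whose time window and catchment area match the user's context."""
    return [
        o for o in offers
        if o.start <= now <= o.end
        and distance_km(lat, lon, o.lat, o.lon) <= o.radius_km
    ]


# Invented sample data: two offers near the user, one in another town.
offers = [
    Offer("Lunch deal", 52.0, 1.0, 1.0, time(11, 30), time(14, 0)),
    Offer("Evening show", 52.0, 1.0, 1.0, time(18, 0), time(22, 0)),
    Offer("Lunch deal (other town)", 53.0, 1.0, 1.0, time(11, 30), time(14, 0)),
]
matches = relevant_offers(offers, 52.001, 1.001, time(12, 15))
```

In this sketch only the nearby lunch offer survives the filter at 12:15. A real service would combine such a filter with the social signals discussed above, which are left out here.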
2.6.2 Dynamic Data Management
The focus on real-time data about behaviours distinguishes Web 2.0 strategies from
those of the earlier e-business era. With regard to enterprise intelligence, current
analysis suggests that effective exploitation of Web 2.0 capabilities entails both
access to very large volumes of data (about user behaviours, personal networks,
spatial orientation and geographic location) and the capacity to mine this data for
meaningful patterns.
6 “The 23/04/09 issue of Nature (Kwok 2009) reported that the GSM Association, a mobile
communications industry trade group, announced in February that the number of mobile-phone
connections worldwide had hit 4 billion and was expected to reach 6 billion by 2013.”
Access to the requisite data is likely to be via a number of different channels,
ranging from individual personal blogs and their network of followers to
commercial information brokers. Promotional efforts are also likely to entail
syndication of the focal enterprise’s own content with content from social networks
and that from providers of other complementary products and services. Thus
an important functionality of the enterprise web will be to syndicate and push
content and logic to other sites, and to aggregate in-coming data from other sites,
deploying cloud/web platforms, composite applications, and mashup approaches
and technologies.
Providing seamless delivery of context-aware applications requires agile and
scalable enterprise architectures and entails integrating diverse technologies and
defining standards for data formats, semantics and application interfaces. The key
challenges are associated with getting integrated platforms and architectures at the
back end to provide seamless delivery of products and services, eliciting an
enhanced user experience at the front end. This demands competence in both real-time
data analysis and process management (process synchronisation, harmonisation
of interaction channels, real-time analysis of evolving interaction patterns,
etc.). Gartner (Valdes et al. 2010) identify context brokers, state monitors, sensors,
analytic engines and cloud-based transaction processing engines as the requisite
technologies, along with standards for defining data formats, metadata
schemas, interaction and discovery protocols, programming interfaces, and other
formalities to enable context-aware applications to be widely adopted. They also
suggest that the pull for context-aware data architectures will come from packaged-
application and software vendors expanding to integrate communication and
collaboration capabilities, unified communications vendors and mobile device
manufacturers, Web megavendors (e.g. Google), social-networking vendors (e.g.
Facebook), and service providers that expand their roles to become providers
and processors of context.
2.7 Conclusions
The purpose of this chapter was to illustrate the step change in the scope and scale
of the “intelligence” that enterprises needing to exploit Web 2.0 capabilities will
have to engage with. In conclusion, it is clear that there are a number of dimensions
that strategies for Web 2.0 exploitation should address, and these are summarised
below.
Strategies for positioning within the competitive landscape need to address
both the structure and the dynamics of the network economy: the old paradigm
of competition within well-defined industry structures is being displaced by the
imperative to survive by competing and collaborating in an ecology populated by
a diversity of players. This carries with it opportunities for the exploitation of
network effects and the exploitation of long tail distributions to develop new niches,
but it also exposes firms to competition on a global playing field. Understanding
the structure and dynamics of the ecology is therefore an essential feature of the
intelligent enterprise.
The provision of enterprise intelligence in this complex networked context is
likely to entail integration of data and applications from diverse sources across
organisational boundaries, and this carries with it challenges for the development of
standards, interfaces and agile, scalable architectures. However, even more challenging
is the step change in the diversity, volume and granularity of data available
for collection, organisation, analysis and exploitation. This is due both to the
increased scope and scale of web-enabled transactions and to the diversity of multimedia
devices that interact with the web. Whilst organisations like Google are pioneering
the development of powerful datamining algorithms, formidable issues of semantic
organisation, analysis and manipulation remain. Web 2.0 business models are likely
to entail collaborations and partnerships between providers and users of data and
services: Google, Facebook and Amazon all have extensive webs of collaboration
with technology and content providers as well as with businesses that want to use
their services.
One of the early defining characteristics of the Web 2.0 era was the proliferation
of user-provided content. The rapid co-evolution of social media and social net-
working sites like Facebook has focused corporate attention on creating business
models that leverage social intelligence by harnessing the intellectual, social and
relational capital that is embodied in social networks. A number of legal, ethical
and philosophical issues have emerged as firms experiment with exploiting user-
generated content and crowd-sourcing. It is strategically important to establish clear
governance frameworks for such engagement, particularly with regard to intellectual
property rights and the management of personal information, as transgressions
can incur very significant financial and reputational damage.
Finally, the combination of multimedia capabilities, sense and location data,
highly granular data about user behaviours, context-aware applications, and large
scale data analysis combined with the social, relational and intellectual capital
of individuals and collectives presents the opportunity and challenge of defining
business models that can really exploit the synergies between cyberspace and place
that are inherent in the “web of places and things”.
References
Anderson C (2006) The long tail. Hyperion, New York
Arthur B (1990) Positive feedbacks in the economy. Sci Am 262(2):80
Axelrod R, Cohen M (1999) Harnessing complexity: organizational implications of a scientific
frontier. Free Press, New York
Castells M (1996) The rise of the network society. Blackwell, Oxford
Chesbrough HW, Vanhaverbeke W (2006) Open innovation. Oxford University Press, Oxford
Economist (2010a) Clicking for gold. Economist, 2/27/2010, 394(8671), Special section p9–11
Economist (2010b) Dicing with data. Economist, 5/22/2010, 395(8683), p16
Economist (2011) Content couple. Economist, 2/12/2011, 398(8720), p71
Econsultancy (2010) http://econsultancy.com/uk/reports/social-media-and-online-pr-report. Accessed
on 21 Feb 2011
Evans P, Wurster T (2000) Blown to bits: how the new economics of information transforms
strategy. Harvard Business School Press, Cambridge, MA
Howe J (2006) The rise of crowdsourcing. Wired 14(6). http://www.wired.com/wired/archive/
14.06/crowds.html. Accessed on 21 Feb 2011, p1–4
Kwok R (2009) Phoning in data. Nature 458(23):959–961
Merali Y (2004) Complexity and information systems. In: Mingers J, Willcocks L (eds) Social
theory and philosophy of information systems. Wiley, Chichester, pp 407–446
Merali Y (2011) Beyond problem solving: realising organisational intelligence in dynamic
contexts, OR Insight advance online publication 13 July 2011
O’Reilly T (2007) What is web 2.0: design patterns and business models for the next generation of
software. MPRA paper no. 4578. http://mpra.ub.uni-muenchen.de/4578/ posted 07. November
2007 04:01. Accessed on 21 Feb 2011
Rowen D, Cheshire T (2011) Commerce gets social. Wired, 84–91
Shapiro C, Varian H (1999) Information rules: a strategic guide to the network economy. Harvard
Business School Press, Cambridge, MA
Valdes R, Phifer G, Murphy J, Knipp E, Smith DM, Cearley DW (2010) Hype cycle for web
and user interaction technologies, 2010. Gartner Research Report Number 00201568.
Gartner, Stanford
Part II
ACTIVE Technologies and Methodologies
3
Enterprise Knowledge Structures
Basil Ell, Elena Simperl, Stephan Wölger, Benedikt Kämpgen, Simon Hangl, Denny Vrandecic, and Katharina Siorpaes
3.1 Introduction
One of the major aims of knowledge management has always been to facilitate the
sharing and reuse of knowledge. Over the years a long list of technologies and
tools pursuing this aim have been proposed, using different types of conceptual
structures to capture the knowledge that individuals and groups communicate and
exchange. This chapter is concerned with these knowledge structures and their
development, maintenance and use within corporate environments. Enterprise
knowledge management as we know it today often follows a predominantly
community-driven approach to meet its organizational and technical challenges.
It builds upon the power of mass collaboration and social software combined with
intelligent machine-driven information management technology delivered through
formal semantics. The knowledge structures underlying contemporary enterprise
knowledge management platforms are diverse, from database tables deployed
company-wide to files in proprietary formats used by scripts, from loosely defined
folksonomies describing content through tags to highly formalized ontologies
through which new enterprise knowledge can be automatically derived. Lever-
aging such structures requires a knowledge management environment which not
only exposes them in an integrated fashion, but also allows knowledge workers
to adjust and customize them according to their specific needs. We discuss how
the Semantic MediaWiki provides such an environment - not only as an easy-to-
use, highly versatile communication and collaboration medium, but also as an
integration and knowledge engineering tool targeting the full range of enterprise
knowledge structures currently used.
B. Ell (*) • E. Simperl • B. Kämpgen • D. Vrandecic, Karlsruhe Institute of Technology, KIT-Campus Süd, D-76128 Karlsruhe, Germany
S. Wölger • S. Hangl • K. Siorpaes, STI Innsbruck, University of Innsbruck, Technikerstraße 21a, 6020 Innsbruck, Austria
P. Warren et al. (eds.), Context and Semantics for Knowledge Management, DOI 10.1007/978-3-642-19510-5_3, © Springer-Verlag Berlin Heidelberg 2011
This chapter is split into two parts. In the first part we undertake a comparative
analysis of the different types of knowledge structures used by knowledge workers
and enterprise IT systems for knowledge sharing and reuse purposes. In the second
part we devise a comprehensive approach to develop, manage and use such
structures in a collaborative manner. We present an ontology editor bringing
together Web 2.0-inspired paradigms and functionality such as Flickr (http://
www.flickr.com) and wikis to support laymen in organizing their knowledge as
lightweight ontologies. Integration with related knowledge resources is exemplified
through a series of methods by which arbitrary folksonomies, but also highly
popular knowledge bases such as Wikipedia (http://www.wikipedia.org/) and
Freebase (http://www.freebase.com/) are made accessible in ontological form.
To further optimize the usability of knowledge structures – an issue which becomes
particularly important in a non-expert-driven knowledge engineering scenario
integrating various resources – we design techniques to check for common fallacies
and modeling errors, which offer a solid baseline for cleansing the underlying
knowledge base. The implementation is based on Semantic MediaWiki (http://
semantic-mediawiki.org/), and has been deployed in the three case studies of the
ACTIVE project which are introduced in Chaps. 9–11.
3.2 Enterprise Knowledge Structures and How They Are Used
The question of how to optimally capture and leverage enterprise knowledge has
engaged the knowledge management community since its inception. As already
discussed in the introductory section of this chapter, the prominence of this topic is
reflected in the different types of conceptual structures which we can find behind
the scenes of enterprise knowledge management platforms, a diversity which is
multiplied by the wide spectrum of methodologies, methods and techniques pro-
posed for their development, maintenance and use. In the present day, enterprise
knowledge management essentially follows a community-driven approach,
implementing solutions for crowdsourcing and social networking in order to opti-
mize communication and collaboration – within the company and its ecosystem of
business partners and end-customers – and knowledge sharing and reuse. In addi-
tion, formal semantics provide intelligent information management technology for
capturing, accessing, managing and integrating knowledge. The approach is based
on ontologies, knowledge structures whose (community-agreed) meaning is expec-
ted to be exploitable by machines, in particular via reasoning facilities by which
implicit knowledge is derived and inconsistencies are detected.
In the following we illustrate how enterprise knowledge structures can be used,
and the various trade-offs which are associated with the different types of struc-
tures, in terms of three motivating scenarios taken from the case studies.
3.2.1 Knowledge Management at an International Consulting Company
The first scenario is set in a large, knowledge-intensive enterprise – a consulting
company – where employees collaborate around the globe on various topics to
provide services to clients as efficiently as possible.
Most enterprise knowledge management systems are set up for the ‘prototypical’
user with no specific task in mind. Especially in a large enterprise context,
employees need information for various different tasks. For instance, they may
want to find information on previous projects, get an overview of a specific tech-
nology, or they may be interested in learning about a particular group within the
company. These tasks are particularly relevant in the context of proposal develop-
ment, by which a company creates a description of the products and services it is
offering at an estimated cost to a potential customer.
Proposal writing follows standardized processes and procedures – giving
instructions about the tasks to be undertaken, the information to be gathered, the
documents to be created, etc. Equally important are less formalized practices –
calling contacts that may have information on similar projects, or searching for
similar proposals in the intranet. Often, information about previous projects cannot
simply be obtained from a central data repository. This is due to the fact that many
documents created within the context of a client project are client-proprietary, and
may not be shared within the entire company. There are also many technical
challenges related to the decentralized and heterogeneous nature of the enterprise
IT landscape, and to the limitations of keyword-based information management
technologies. Especially in an enterprise scenario, and in the context of a specific
task, it will often be useful to retrieve actual facts rather than the documents
that mention them. Such facts refer to entities, for example, to experts, locations,
clients, other companies, and relationships among them. Which facts should be
retrieved naturally depends on the task at hand: for instance, in the case of proposal
development, one might want to find clients for which the company has submitted
similar bids.
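The difference between retrieving documents and retrieving facts can be illustrated with a small in-memory store of subject, predicate, object statements. This is only a sketch under stated assumptions: the bid identifiers, client names, topics and the predicates `submitted_to` and `has_topic` are invented for the example and are not taken from the case study, and "similar" is reduced here to "sharing a topic".

```python
# Facts about entities and their relationships, stored as
# (subject, predicate, object) triples. All values are invented sample data.
FACTS = {
    ("bid-17", "submitted_to", "Client A"),
    ("bid-17", "has_topic", "network migration"),
    ("bid-23", "submitted_to", "Client B"),
    ("bid-23", "has_topic", "billing systems"),
    ("bid-31", "submitted_to", "Client C"),
    ("bid-31", "has_topic", "network migration"),
}


def subjects(predicate, obj, facts=FACTS):
    """All subjects that stand in `predicate` relation to `obj`."""
    return {s for s, p, o in facts if p == predicate and o == obj}


def objects(subject, predicate, facts=FACTS):
    """All objects that `subject` is related to via `predicate`."""
    return {o for s, p, o in facts if s == subject and p == predicate}


def clients_with_bids_on(topic, facts=FACTS):
    """Answer the task-specific question: which clients received a bid on this topic?"""
    clients = set()
    for bid in subjects("has_topic", topic, facts):
        clients |= objects(bid, "submitted_to", facts)
    return sorted(clients)


similar_clients = clients_with_bids_on("network migration")
```

The query returns the clients themselves rather than the proposal documents that mention them. A keyword search over the same proposals would return documents and leave the user to extract the client names by hand.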
Enterprise knowledge structures are the backbone of such sophisticated infor-
mation retrieval facilities. They capture enterprise domain knowledge at various
levels of expressivity and formality. When choosing the most appropriate among
these levels, it is important to weigh the advantages and disadvantages of heavy-
weight ontology-based approaches, supporting reasoning and full-fledged semantic
search, vs. the additional costs associated with the maintenance and usage of the
knowledge structure, which should be integrated into the daily workflow and allow
user participation at large. Enterprise document repositories support bookmarking
and tagging as means to describe the content of documents. The resulting con-
ceptual structures contain knowledge which could prove extremely useful to create
rich, formal ontologies to implement more purposeful information retrieval
solutions.
3.2.2 Knowledge Sharing at a Large Telecom Operator
A similar scenario has been identified at a large telecom operator.
Operating in multiple projects is a reality of modern businesses. As part of their
daily work knowledge workers interact with various systems, information sources and
people. Their work is highly dependent on contextual dimensions as diverse as the
customer, the status of the sales opportunity, current project issues, and the suppliers
involved. To improve productivity, frequently used information such as contact and
customer data and product documentation should be easily available; the knowledge
worker should not have to search around for these things as they change from one
working context to another. Furthermore, as the user resumes an earlier task, her
working context should be restored to its earlier state without difficulty.
There is an abundance of information held within the company’s repositories,
much of which may not be easily accessible to technical consultants, solution
consultants, and sales specialists. In addition there is a wealth of tacit knowledge
which may not be captured to best effect. The key problem here is that
knowledge workers may not be aware of earlier solutions; it is possible that
comparable solutions to similar problems are being worked on in isolation rather
than in co-operation, or even that a particular problem has already been solved.
A better awareness of the solutions to specific business problems and the business
domains in which those solutions were applied should enable common patterns of
solutions to be identified.
To support agile knowledge working, several knowledge management features
are required: information such as contacts, relevant (technical) documentation,
emails, and customer-specific information must be captured and easy retrieval
must be enabled. Moreover, the context of a knowledge worker has to be captured.
This involves modeling of general enterprise knowledge as well as appropriate
knowledge representation formalisms, suitable from an information-management
point of view, but also tangible for knowledge workers.
3.2.3 Process Optimization at a Digital Chip Design Company
The order of the design activities during chip design is hard to determine before
process start. Usually a designer or a team decides on the best possible continuation
of the activity flow in an ad-hoc manner during the process. Problems can occur in
the case of goal changes, requirement changes, environment changes, etc.
It is important to collect data about the actual execution and sequences of design
activities in several concurrent design project flows. Ideally, this should be sup-
ported by a knowledge management application that assists in eliciting the knowl-
edge about how a sequence of design and verification activities is related to
a particular type of a designed artifact, the configurations of used design tools,
and the capabilities of design teams.
The company uses a modeling framework and an upper-level ontology for repre-
senting dynamic engineering design processes and design systems as process
environments. The modeling approach is based on the understanding that an
engineering design process can be conceived as a process of knowledge transfor-
mation which passes through several states. Each state is the state of affairs in
which a particular representation of a design artifact or several representations are
added after being elaborated by a design activity leading to this state. Evidently,
the overall goal of a design process is to reach the (target) state of affairs in which
all the representations are elaborated with enough quality for meeting the require-
ments. The continuation of the process is decided by choosing an activity from the
set of admissible alternatives for that state. Engineering design processes are
situated in and factually executed by the design system comprising designers,
resources, tools, and normative regulations.
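The view of a design process as a sequence of states, each extended by one admissible activity, can be sketched as follows. This is a simplified illustration and not the company's modeling framework or ontology: the activity names, the representations and the `choose` policy are hypothetical, and the quality of representations, which the text treats as part of the target state, is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Activity:
    """A hypothetical design activity that elaborates one new representation."""
    name: str
    requires: frozenset  # representations that must already exist
    produces: str        # representation added when the activity completes


def admissible(state, activities):
    """Activities whose inputs are available and whose output is still missing."""
    return [a for a in activities if a.requires <= state and a.produces not in state]


def run(state, activities, target, choose):
    """Extend the state one activity at a time until the target representations exist."""
    state = frozenset(state)
    trace = []
    while not target <= state:
        options = admissible(state, activities)
        if not options:
            raise RuntimeError("no admissible activity leads on from this state")
        chosen = choose(options)  # stands in for the designer's ad-hoc decision
        trace.append(chosen.name)
        state = state | {chosen.produces}
    return state, trace


# Invented sample activities for a toy design flow.
activities = [
    Activity("rtl_design", frozenset({"specification"}), "rtl"),
    Activity("testbench_design", frozenset({"specification"}), "testbench"),
    Activity("synthesis", frozenset({"rtl"}), "netlist"),
]
final_state, trace = run(
    {"specification"},
    activities,
    target=frozenset({"netlist"}),
    choose=lambda options: min(options, key=lambda a: a.name),
)
```

Recording `trace` for many concurrent projects gives the kind of execution data that the knowledge management application described above would relate to artifact types, tool configurations and team capabilities.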
The ontology used in the chip design company is a core component of
all processes. The ontology constantly evolves and its evolution needs to be
supported by collaborative ontology engineering tools. The objective is to ensure
that the enterprise knowledge structures and the proprietary ontology suite
are aligned.
3.2.4 Trade-off Analysis
Enterprise knowledge structures vary with respect to a number of aspects, ranging
from expressivity to size, granularity and modeling paradigm followed. These
aspects not only influence the utility of (a category of) knowledge structures in
a particular scenario, but also have direct consequences for the ways in which
a knowledge structure is developed, maintained and used. This section aims to
conduct a baseline analysis of the trade-offs implied by these aspects and to
introduce methods which can be used to perform such an analysis in a systematic
manner.
Particular attention is paid to the use cases discussed above. Considering the
scenario within the large consulting company, enterprise knowledge structures can
be used to allow for the implementation of intelligent knowledge organization and
retrieval techniques. Questions related to the most adequate type of knowledge
structure, its tangible benefits, and the associated development and maintenance
costs are crucial to demonstrate the added value of the technology in this scenario.
The maintainability of knowledge structures is, besides reuse, an essential aspect of
the second scenario we investigate in the project. Here, the additional problem to be
looked into is the extent to which reusability of existing knowledge structures is
economically feasible. The chip design scenario leverages ontologies as means to
capture domain knowledge and enable communication between designers. Cost-
benefit-motivated quantitative and qualitative means are expected to optimize the
ongoing ontology engineering process (see Chap. 4).
Trade-offs are specified along a number of dimensions used in the literature to
classify and describe knowledge structures:
1. Formality: (Uschold and Grueninger 1996) distinguish among four levels of
formality:
– Highly informal: the domain of interest is modeled in a loose form in natural
language.
– Semi-informal: the meaning of the modeled entities is made less ambiguous through
the use of a restricted language.
– Semi-formal: the knowledge structure is implemented in a formal language.
– Rigorously formal: the meaning of the representation language is defined in
detail, with theorems and proofs for soundness or completeness.
(McGuiness 2003) defines a ‘semantic spectrum’ specifying a total order
between common types of models. This basically divides ontologies or ontology-like
structures into informal and formal models as follows (Fig. 3.1):
– Informal models are ordered in ascending order of their formality degree as
controlled vocabularies, glossaries, thesauri and informal taxonomies. In this
category we can also include folksonomies, sets of terms which are the result of
collaborative tagging processes.
– Formal models are ordered in the same manner: starting with formal taxonomies,
which precisely define the meaning of the specialization/generalization
relationship, more formal models are derived by incrementally adding formal
instances, properties/frames, value restrictions, disjointness, formal meronymy,
general logical constraints etc.
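Because the spectrum is a total order, it can be represented directly as an ordered enumeration. The sketch below is one possible encoding, not part of the cited work: the member names and numeric values are assumptions chosen for the example, and folksonomies are left out because the text places them in the informal category without fixing their position in the order.

```python
from enum import IntEnum


class Spectrum(IntEnum):
    """Model types in ascending order of formality, as listed in the text."""
    CONTROLLED_VOCABULARY = 1
    GLOSSARY = 2
    THESAURUS = 3
    INFORMAL_TAXONOMY = 4
    FORMAL_TAXONOMY = 5
    FORMAL_INSTANCES = 6
    PROPERTIES_FRAMES = 7
    VALUE_RESTRICTIONS = 8
    DISJOINTNESS = 9
    FORMAL_MERONYMY = 10
    GENERAL_LOGICAL_CONSTRAINTS = 11


def is_formal(kind):
    """Formal models start at formal taxonomies; everything below is informal."""
    return kind >= Spectrum.FORMAL_TAXONOMY


def most_formal(kinds):
    """Pick the most formal model type among several candidates."""
    return max(kinds)


candidates = [Spectrum.THESAURUS, Spectrum.DISJOINTNESS, Spectrum.GLOSSARY]
ranked = sorted(candidates)
```

With this encoding, comparing two knowledge structures by formality reduces to an integer comparison, which is convenient when weighing expressivity against maintenance cost.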
In the first category we usually encounter thesauri such as WordNet (Fellbaum
1998), taxonomies such as the Open Directory (http://www.dmoz.org) and the ACM
classification (http://www.acm.org/class/1998/) or various eCommerce standards
(Fensel 2001). Most of the available Semantic Web ontologies can be localized at
the lower end of the formal continuum (i.e. as formal taxonomies), a category which
overlaps with the semi-formal level in the previous categorizations. However, the
usage of Semantic Web representation languages does not guarantee a certain degree
of formality: while an increasing number of applications are currently deciding to
formalize domain or application-specific knowledge using languages such as RDFS
or OWL, the resulting ontologies do not necessarily commit to the formal semantics
of these languages. By contrast, Cyc (Lenat 1995) and DOLCE (Gangemi et al. 2002)
are definitely representative of the so-called heavyweight ontologies category,
which corresponds to the upper end of the continuum.
Fig. 3.1 Semantic spectrum (based on McGuiness [2003])
In (Vrandecic 2009b) we offer a complete formalization of all the above types of
knowledge structures, and thus also show how OWL2 (Grau et al. 2008) can be used to
represent each of the other types besides ontologies. This allows us to classify
knowledge structures automatically, and to check if they indeed meet the criteria of
a specific type of knowledge structure. What is important here is that we can use any
of the given structures without restrictions and nevertheless guarantee the integra-
tion of all these knowledge structures.
2. Shareability: due to the difficulties encountered in achieving a consensual
conceptualization of a domain of interest, most of the ontologies available today
reflect the view of a restricted group of people or of single organizations.
Standard classifications such as the Open Directory (http://www.dmoz.org),
classifications of job descriptors, products, services or industry sectors have been
developed by renowned organizations in the corresponding fields. For this reason,
these knowledge structures are expected to be shared across a wide range of
applications. However, many of them have been developed in isolated settings
without an explicit focus on being shared across communities or software
platforms. Given this state of the art we distinguish among four levels of (expected)
shareability:
– Personal ontologies: the result of an individual development effort, reflecting
the view of the author(s) upon the modeled domain. Personal Semantic Web
ontologies are published online and might be accessed by interested parties, but
their impact is limited, as there is no explicit support for them being reused in
other application contexts. Depending on the complexity of the ontology, they
still might achieve a broad acceptance among a large user community.
– Application ontologies: developed in the context of a specific project for pre-
defined purposes and are assumed to reflect the view of the project team
(including the community of users) upon the modeled domain. Although they may
be made public on the Web, they are de facto intended to be used
within the original, project-related user community. Their acceptance beyond
these boundaries depends on the impact of the authoring authority in the specific
area, but also on the general reusability of the ontologies. Many of the domain
ontologies available so far can be included in this category.
– Openly developed ontologies: developed by a large, open community of users,
who are free to contribute to the content of the ontology. Through
continuous refinements and extensions, the ontology evolves into a commonly agreed,
widely accepted representation of the domain of interest. The evolution of the
Open Directory classification is a good example of collaborative, Web-based
ontology development: the core structure of the topic classification, originally
proposed by Yahoo! (http://dir.yahoo.com/) and used in slightly modified form
3 Enterprise Knowledge Structures 35
by various search engines, was extended by users, who also played a crucial
role in the instantiation of the ontology with Web documents. Another promi-
nent example is the Gene Ontology (Gene Ontology Consortium 2000).
– Standard ontologies: developed for standardization purposes by key organizations in the field, usually being the result of an extended agreement process in order to
satisfy a broad range of requirements arising from various user communities. The
majority of standard ontology-like structures currently available are situated in the
area of eCommerce: The United Nations Standard Products and Services Codes
UNSPSC (http://www.unspsc.org), the RosettaNet classification (http://www.
rosettanet.org) or the North American Industry Classification System NAICS
(http://www.census.gov/epcd/www/naics.html). Another example is the FOAF
ontology (http://xmlns.com/foaf/0.1/). The simple ontology describing common
inter-human relationships enjoys significant visibility, not only as a result of the
standardization efforts of the FOAF development team.
3. Domain and scope: according to (Guarino 1998) ontologies can be classified
into four categories:
– Upper-level/top-level ontologies: they describe general-purpose concepts and
their properties. Examples are the Top-Elements Classification by Sowa (Sowa
1995), the Suggested Upper Level Merged Ontology SUMO (Pease et al. 2002)
or the Descriptive Ontology for Linguistic and Cognitive Engineering DOLCE
(Gangemi et al. 2002).
– Domain ontologies: they are used to model specific domains, such as medicine or
academia. A typical example in this area is the Gene Ontology developed by the
Gene Ontology Consortium (2000).
– Task ontologies: they describe general or domain-specific activities.
– Application ontologies: they are extensions of domain ontologies having regard
to particular application-related task ontologies and application requirements.
A last category of ontologies, which was not covered by the classifications
mentioned so far, are the so-called meta-ontologies or (knowledge) representation ontologies. They describe the primitives which are used to formalize knowledge in
conformity with a specific representation paradigm. Well-known in this category
are the Frame Ontology (Gruber 1993) or the representation ontologies of the W3C
Semantic Web languages RDFS and OWL (http://www.w3.org/2000/01/
rdf-schema, http://www.w3.org/2002/07/owl).
When describing the scope of an ontology, the types of knowledge that should be
available to the engineering team to build the domain conceptualization are highly
relevant. In principle, one can distinguish between ontologies capturing common
and expert knowledge, and based on this distinction determine the composition of
the team engineering a particular ontology.
4. Representation language: a wide range of knowledge structures emerged in
a pre-Semantic Web era. In order to overcome this syntactic and semantic barrier
a plethora of approaches investigate the compatibility between different forma-
lisms, while the aforementioned representation ontologies are intended to capture
these differences explicitly. The most popular representation paradigms regarding
ontologies are Frames, Description Logics and UML-MOF.1
On the Semantic Web, the classic trade-offs regarding expressivity have been
decidability and complexity. A language is decidable if a reasoner can be
implemented that answers every question that can be asked against a knowledge base
expressed in that language. Decidability as a property of
languages is highly desirable: it guarantees that all questions that can be asked can be
answered, and that the associated reasoning algorithms are effectively imple-
mentable. Research in Description Logics explores the borders of decidability.
Besides decidability, which guarantees the effective implementation of reason-
ing algorithms, we further need to regard the complexity of the algorithms that
can answer questions against the knowledge structures. In general it can be said
that the more expressive a language is, the higher the complexity. Since neither
expressivity nor complexity is a continuous spectrum, it can happen that we can
increase expressivity but retain the same complexity.
In the context of the scenarios introduced earlier, OWL DL fulfills the requirement
with regard to decidability, but both decidable OWL languages (OWL DL
and OWL Lite) have an exponential (or worse) complexity (Horrocks and Patel-Schneider
2004), which makes them possibly unsuitable for our use cases, since we
have to expect to deal with a high number of instances. Languages that allow the
use of algorithms that can be implemented with a tractable complexity are consid-
ered more suitable in cases where we can expect such a high number of instances as
in enterprise settings. OWL2 introduces language profiles (Motik et al. 2008),
which are well-defined subsets of the OWL2 constructs. These profiles have
specific properties that are also guaranteed for all models adhering to these profiles.
Other aspects not mentioned in this classification, but relevant when describing
an ontology, or every other knowledge structure, are covered by so-called meta-
ontologies and metadata schemes thereof. Metadata schemes such as OMV
(Hartmann et al. 2005) cover general information about ontologies, such as the
size in terms of specific types of ontological primitives, the domain described, the
usage scenarios, the support software and techniques, and so on. Many of these
aspects are interrelated and can be traded against each other, as will be elaborated
later in this section. Potential developers and users of ontologies should be made
aware of the trade-offs associated with engineering and using a particular type of
knowledge structure. More precisely, these tasks require specific expertise, software
and infrastructure, as well as compliance with processes and methodologies,
all of which may entail considerable costs.
These trade-offs are summarized in Table 3.1.
The considerations presented in Table 3.1 can be used as general guidelines to be
taken into account and applied in the process of engineering an ontology. Their
operationalization has to rely on methods which allow a quantification of costs and
1 http://www.omg.org/technology/documents/modeling_spec_catalog.htm
benefits involved and their analysis (see Chapter ‘Using Cost-Benefit Information
in Ontology Engineering Projects’).
3.3 How are Enterprise Knowledge Structures Being Built
In this section, we first give a short overview of the wiki technology Semantic
MediaWiki (SMW) (Krötzsch et al. 2007) as a flexible tool for dealing with
enterprise knowledge structures. Then we describe in more detail selected aspects
of knowledge structure editing, leveraging, and repair.
Table 3.1 General trade-offs
Formality: A formal ontology is useful in areas which require sophisticated processing of background knowledge and automatic inferencing. This assumes the availability of mature tooling for these tasks. In addition, the more formal an ontology is meant to be, the higher the required level of expertise and the costs of the ontology development processes. Finally, heavyweight ontologies cannot be acquired automatically, as properties and axioms cannot be feasibly learned from unstructured knowledge structures using the present software.
Shareability: The main advantage of a shared ontology is its capability to enable interoperability at the data and interoperability levels. Developing a commonly agreed ontology implies, however, additional overhead in terms of the development process to be followed, including methodological support and software to support the discussions and consensus reaching task. In addition, a shared ontology will not be able to optimally match very specific needs of many usage scenarios in which it is involved. Thus additional overhead to understand and adapt is required.
Domain and scope: First there is the aforementioned trade-off between the scope and the reusability of particular categories of ontologies. In addition, higher-level ontologies tend to be more costly, as they require specific expertise. The same applies for ontologies dealing with expert knowledge, such as those in areas of chip design. The size of a knowledge artifact (expressed, say, in the number of concepts, properties, axioms and fixed instances) is an important factor to be aware of, not only because of the direct relationship to the development and maintenance costs, but also because of the difficulties associated with the processing of large artifacts by reasoners and the like. There is a trade-off between the domain coverage of an ontology and the additional effort required to build, revise and use it.
Representation language: Besides the link to the formality dimension, the choice of a representation language has consequences with respect to the ways an ontology can be used in knowledge inferencing tasks and the extent to which particular aspects of the knowledge domain can or cannot be captured by the ontology. In addition, formal, logics-based representation languages require specific expertise within the ontology development and maintenance team.
Social software as a tool for knowledge sharing and collaboration is gaining
more and more relevance in the enterprise world (Drakos et al. 2009). This is
especially true for so-called enterprise wikis, which, just like wikis on the public Web,
offer the advantages of low barriers to use and direct benefits within a company
intranet. However, simply providing a Wikipedia-like internal page does not
guarantee acceptance by employees; such wiki software needs to be customized
to the specificities of the corporate context.
SMW provides this customization by combining the complementary
technologies of Web 2.0 and the Semantic Web (Ankolekar et al. 2007). It enhances
the popular open-source wiki software MediaWiki (http://www.mediawiki.org/
wiki/MediaWiki) with semantic capabilities. In addition its functionality can be
enriched with general-purpose extensions developed by the community2 as well as
custom extensions tailored to the needs of specific enterprise scenarios. The usage
of the Semantic Web standards RDF, RDFS and OWL, and of ontologies enables
the realization of comprehensive knowledge-management solutions, which provide
integrated means to formally describe the meaning and organization of the content
and to retrieve, present and navigate information.
In the following, we describe how enterprise knowledge structures are collabo-
ratively built, enriched, and exploited using SMW.
Creating Structured Information Information stored in SMW can be converted
into machine-readable RDF. In other words, it is possible to have property-value
pairs explicitly assigned to wiki pages. Such a property-value pair can indicate
a named link (a so-called object property) to another page, e.g., ‘locatedInCountry’, or a typed attribute (a so-called datatype property), e.g., String ‘hasTag’, Date
‘hasFoundingDate’, and Number ‘hasHeight’. Such properties can be freely
inserted into a page via wiki syntax or forms. Enterprise knowledge structures
can be defined through categories (so-called classes) of pages with certain
properties and encoded as an ontology in RDFS and OWL. The resulting ontology
can be automatically or manually applied to the wiki (Vrandecic and Krötzsch 2006), for instance, in the form of categories, pages, wiki templates, forms and
properties.
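As a rough illustration of how property-value pairs embedded in page text can be read out as subject-property-value triples, the following sketch parses annotations written in SMW's `[[property::value]]` wiki syntax. The page title, the sample text and the helper name `extract_annotations` are assumptions made for this example; it is not SMW's own parser, and it ignores the typing of values.

```python
import re

# Matches SMW-style in-text annotations such as [[locatedInCountry::Germany]].
# An optional "|label" part (alternative display text) is tolerated and ignored.
ANNOTATION = re.compile(r"\[\[([^\[\]:|]+)::([^\[\]|]+)(?:\|[^\[\]]*)?\]\]")


def extract_annotations(page_title, wiki_text):
    """Return (subject, property, value) triples found in the page text."""
    return [
        (page_title, prop.strip(), value.strip())
        for prop, value in ANNOTATION.findall(wiki_text)
    ]


# Hypothetical page content, used only for illustration.
sample = (
    "ACME is located in [[locatedInCountry::Germany]] and was founded on "
    "[[hasFoundingDate::1999-05-01]]. Tagged as [[hasTag::supplier|a supplier]]."
)
for triple in extract_annotations("ACME", sample):
    print(triple)
```

Each triple could then be serialized as RDF; the sketch stops at the extraction step.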
Retrieving Information The availability of machine-processable information
facilitates the realization of concept-based search, presentation and navigation
features going beyond traditional keyword-based approaches. The user can issue
structured queries, addressing certain properties of a page, e.g., the customer of a
proposal. All pages belonging to a category having certain properties can be listed
as an overview, including links to those pages, e.g., all products within a specific
price range. Various result formats can be used, starting from simple tables to more
advanced calendars, time lines, and maps. Through faceted search one can incrementally
filter lists of pages via keywords and property ranges. More complex, but
still user-friendly, querying that follows patterns similar to those of the standard Semantic
Web query language SPARQL (http://www.w3.org/TR/rdf-sparql-query/) is
supported as well. When the user enters a keyword, the system looks for
connections between pages described with the keywords and lists those pages
2 Openly available at http://www.mediawiki.org/wiki/Extension_Matrix (MediaWiki) and http://semantic-mediawiki.org/wiki/Help:Extensions (SMW)
(Haase et al. 2009). In SMW, these queries are possible through forms on special
pages, but can also be embedded as so-called inline queries in single pages.
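A structured query of the kind described here, for example listing all products within a specific price range, can be pictured as a filter over pages, their categories and their property values. The sketch below is plain Python over an in-memory list; the page records, the property name `hasPrice` and the function `ask` are hypothetical and do not reproduce SMW's query engine or its query syntax.

```python
def ask(pages, category, prop, low, high):
    """List titles of pages in `category` whose `prop` value lies in [low, high]."""
    return sorted(
        page["title"]
        for page in pages
        if category in page["categories"]
        and prop in page["properties"]
        and low <= page["properties"][prop] <= high
    )


# Hypothetical wiki content: each page has categories and property values.
pages = [
    {"title": "Router X", "categories": {"Product"}, "properties": {"hasPrice": 120}},
    {"title": "Switch Y", "categories": {"Product"}, "properties": {"hasPrice": 480}},
    {"title": "Proposal 7", "categories": {"Proposal"}, "properties": {"hasCustomer": "ACME"}},
]

print(ask(pages, "Product", "hasPrice", 100, 200))
```

The result is a list of page titles, which a wiki would render as links in one of the result formats mentioned in the text.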
Integrating External Information External sources can be integrated and their
content merged with existing enterprise knowledge structures. The results can be
organized as new pages or properties, referenced from other pages, and visualized
in new ways.
Enterprise knowledge is rarely represented in RDF, but there are many tools
available that deal with such transformations from established formats and
standards, most notably tabular ones. The same applies to online knowledge sources
such as Freebase (http://www.freebase.com), other SMW installations or the
Linked Open Data cloud, for which a growing number of Web services delivering
RDF are available (http://www.linkedopenservices.org). Orthogonal to the transla-
tion to RDF is the question of how to map specific elements of the source
knowledge structure into the wiki model. Simply creating a page for each element
within an external source and copying the data into the wiki may prove suboptimal
for subsequent data usage. In Sect. 3.3.2 we provide additional details on SMW’s
integration features.
Improving Information Quality SMW specifically targets scenarios where
knowledge is created in a decentralized manner – be that by exposing and
integrating external sources, or by supporting collaborative editing and interaction.
In such scenarios information quality can quickly become a problem. A prominent
dimension we discuss here is consistency, both with respect to the primary sources
and with respect to the domain at hand. For the former, SMW adheres to a regime in
which users may only refer to, and comment upon the primary sources from within
the wiki, while changes may only be undertaken at the level of these sources. For
the latter, one can use an inference service operating on the wiki knowledge base.
Deduction methods on the enterprise knowledge structures can provide insights
about the wrong usage of categories, pages, and properties (Vrandecic 2009a). Most
such errors cannot be repaired automatically, but they can at least be made visible to the users
or administrators. For example, if the imported data contains information about a
proposal with customer X and a wiki page exists about X, which is not a member of
the customer category, adding that page to the category can be automatically
suggested to the administrator. In addition, visualizing information in a structured
way may lead to the identification of missing and incorrect information, which
applies to both genuine wiki content and content from external sources. Users may
not directly correct the latter, but they can rate it, and comment on it for revision.
In Sect. 3.3.2 we will discuss a number of simple measurements whose results
can indicate specific quality issues.
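The proposal and customer example can be sketched as a simple consistency check: every existing page that appears as the value of a customer property should be a member of the customer category. The data layout and the names `hasCustomer`, `Customer` and `suggest_category_members` are assumptions for illustration; an actual deployment would rely on an inference service operating on the wiki knowledge base.

```python
def suggest_category_members(pages, prop, category):
    """Suggest existing pages used as `prop` values but missing from `category`."""
    suggestions = set()
    for page in pages.values():
        target = page["properties"].get(prop)
        # Only suggest when a wiki page about the target already exists.
        if target in pages and category not in pages[target]["categories"]:
            suggestions.add(target)
    return sorted(suggestions)


# Hypothetical wiki state after importing two proposals.
pages = {
    "Proposal 7": {"categories": {"Proposal"}, "properties": {"hasCustomer": "X"}},
    "Proposal 8": {"categories": {"Proposal"}, "properties": {"hasCustomer": "Y"}},
    "X": {"categories": set(), "properties": {}},
    "Y": {"categories": {"Customer"}, "properties": {}},
}

# Page X is referenced as a customer but is not in the Customer category.
print(suggest_category_members(pages, "hasCustomer", "Customer"))
```

The function only produces suggestions; in line with the text, the decision to add a page to the category stays with the administrator.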
Interplay with other Enterprise Tools To maximize its added value for knowl-
edge workers, SMW should not be used in isolation from existing enterprise
systems and workflows. This is enabled by the information integration functionality
presented earlier, and by a number of additional features targeting application
integration. The content of a semantic wiki can be exported as RDF, as well as
many other structured data formats, e.g., JSON, vCard, and BibTeX. Results of
queries can be monitored for new pages and modified properties, and published as
RSS feeds and sent by e-mail. Using HTTP requests to the wiki, external
applications such as office productivity tools can access, add, and modify pages
and properties.
In the following sections we go into more detail on how enterprise knowledge
structures can be edited, leveraged, and repaired.
3.3.1 Building Knowledge Structures Manually
The SMW OntologyEditor is an extension of Semantic MediaWiki for developing
and maintaining knowledge structures (so-called vocabularies). As such it inherits
many of the features and the mode of operation of Semantic MediaWiki. It targets
Semantic MediaWiki users, but it also provides a comfortable interface for people
less experienced in using wikis, in particular the wiki syntax. In this section we will
briefly introduce the functionality of this extension – a more detailed account is
given in (Simperl et al. 2010).
Main Page The main page is the entry point of the SMW OntologyEditor (see
Fig. 3.2). It contains the primary navigation structure and links to important pieces
of functionality, including the creation of new vocabularies consisting of categories
and properties, the integration of other knowledge structures, such as folksonomies
and external vocabularies encoded in RDFS and OWL, and knowledge repair (see
Sect. 3.3.2). In addition, the user is provided with a short introduction to the
tool, important links, as well as an overview of the content of the current wiki
installation, in terms of namespaces of the individual vocabularies and a tag cloud.
Vocabulary Creation To create a new vocabulary one can use the corresponding
link in the primary navigation menu, which leads to a form (see Fig. 3.3). There the
user can enter a vocabulary name and a description and add categories and
properties. Once the vocabulary is created the user is presented with a vocabulary
overview, which contains automatically added metadata such as Flickr images in
addition to the information manually provided by the user and a link to the Create Category Form (see Fig. 3.4).
The Create Category Form includes a short explanation and a number of input
fields. The fields support autocompletion: the user is presented with a list
of entities whose names are similar to the one she is typing (see Fig. 3.4). The
user can enter a name for the category, refer to an existing vocabulary, define sub-
and super-categories, and add new and existing properties to the category. Subse-
quently the system displays the category overview page illustrated in Fig. 3.5,
including a tag cloud to easily access category instances (so-called entities) as
well as the most interesting images on Flickr related to the category. Categories and
entities are visualized in a tree-like hierarchy, which can be altered by clicking on
the Edit links which open pop-ups for inline editing (see Fig. 3.6).
Knowledge Repair The knowledge repair algorithms can be accessed as so-
called SpecialPages in the wiki.
After clicking on Category Statistics the system provides the user with a com-
prehensive overview of all categories available in the system, and potential model-
ing issues (see Fig. 3.11). At the top there is a table with explanations followed by
a table with minimum, maximum and average values serving as basis for error
detection. An additional table displays the corresponding values for all categories.
Colors and symbols are used to direct the user focus to potential problems. If the
user is not interested in all categories, but rather in a specific one she can click on
the tab repair on the category page which will lead to an overview page of the
specific category displaying similar information (see Fig. 3.10).
The SpecialPage Categories in cycles, shown in Fig. 3.9, lists cycles in a category hierarchy. The user then has to decide whether the specific cycle will be accepted or
not. Redundancies in the hierarchy are displayed in the SpecialPage Categories with redundant subcategory relations (see Fig. 3.9). The user can decide which
link is indeed redundant and delete it on-the-fly. The SpecialPage Entities with similar names provides the user with information about entities with similar
names (Fig. 3.9). For each entity the system calculates the Levenshtein distance
(Levenshtein 1966) to other categories and displays the results. In Sect. 3.3.2
Fig. 3.2 Main page
Fig. 3.3 Create vocabulary form and overview page
Fig. 3.4 Create category form
we introduce additional knowledge repair features, such as the Category Histogram, the Property Histogram, Categories with similar property sets and Unsubcategorized categories.
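Two of the checks named above, cycles in the category hierarchy and redundant subcategory relations, can be illustrated on a small directed graph. The sketch below is an assumption about how such checks might be written, not the extension's actual code. The hierarchy is a hypothetical dict that maps each category to its direct super-categories, and a link is treated as redundant when the super-category is also reachable through another direct super-category.

```python
def reachable(graph, start):
    """All categories reachable from `start` by following super-category links."""
    seen, stack = set(), list(graph.get(start, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, ()))
    return seen


def categories_in_cycles(graph):
    """Categories that can reach themselves, i.e. that lie on a cycle."""
    return sorted(c for c in graph if c in reachable(graph, c))


def redundant_links(graph):
    """Direct links (sub, sup) where sup is also reachable via another parent."""
    found = []
    for sub, supers in graph.items():
        for sup in supers:
            others = [p for p in supers if p != sup]
            if any(sup in reachable(graph, p) for p in others):
                found.append((sub, sup))
    return sorted(found)


# Hypothetical hierarchy: category -> list of direct super-categories.
hierarchy = {
    "Laptop": ["Computer", "Product"],  # Laptop -> Product is implied via Computer
    "Computer": ["Product"],
    "Product": [],
    "A": ["B"],
    "B": ["A"],                         # A and B form a cycle
}

print(categories_in_cycles(hierarchy))
print(redundant_links(hierarchy))
```

Both functions only report candidates; as in the text, whether a cycle is accepted or a link deleted remains the user's decision.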
Versioning The versioning SpecialPage gives an overview of the history of
changes of vocabularies and categories. When a vocabulary or category is selected,
a pop-up with detailed versioning information is displayed. On the left-hand side
the user can choose between Vocabulary Structure Changes and Category Changes. Different versions are displayed (via AJAX) on the right-hand side of
the pop-up. A selected version can be restored by clicking the Restore Selected Version button, as depicted in Fig. 3.7.
Import and export One of the advantages of Semantic MediaWiki as a knowl-
edge management platform is its ability to provide integrated access to a multitude
of knowledge structures, most prominently folksonomies and ontologies. The
folksonomy import relies on the technique described in Sect. 3.3.2; thus the
Fig. 3.5 Category overview
Fig. 3.6 Changing parameters via inline editing
associated information – a collection of tagged resources such as bookmarks or
conventional documents – has to be organized in a specific XML format. Given
this, the folksonomy is enriched with additional structuring information and is
transformed into a lightweight ontology which can be explored and further revised
in the editor just as any other vocabulary. When importing an existing OWL
ontology – for instance, one that was developed in a different ontology engineering
environment – the system uploads the OWL file specified by the users, extracts all
ontological entities and creates corresponding wiki content following the
instructions defined in a so-called meta-model. This meta-model describes the
types of ontological primitives supported by the editor – in this case, as we are
dealing with lightweight knowledge structures, a subset of OWL consisting of
classes, instances and properties, in particular specialization-generalization – and
how they are mapped to SMW artifacts. Once this step is concluded, the resulting
vocabulary can be further processed in a collaborative fashion in our tool. Every
vocabulary can be locally stored as an OWL file using the export tab on the
vocabulary overview page.
3.3.2 Leveraging External Knowledge Sources
Enterprise knowledge structures come in various forms, from database tables,
standardized taxonomies and loosely defined folksonomies to strictly organized
knowledge bases. To optimally support knowledge management tasks in a corpo-
rate environment Semantic MediaWiki needs to provide mechanisms to access,
Fig. 3.7 Versioning
integrate and use all these different formats. This is important for its acceptance as
a knowledge management solution – as it builds upon established resources and
platforms – and for its efficient use – as reusing existing resources can reduce
costs and improve the quality of the resulting enterprise knowledge structures.
In the previous section we have explained how such knowledge structures can be
manually created and maintained. The techniques introduced in the following are
complementary to this functionality. The first one adds a critical mass of formal
semantics to folksonomies in order to overcome some of their typical limitations,
such as the usage of abbreviations and alternative spelling, synonyms and different
natural languages to tag the same resource. The resulting lightweight ontology can
be explored, further developed and used in Semantic MediaWiki. In contrast, the
focus of the second technique is on leveraging existing knowledge bases, which
might contain significant amounts of (instance) data which could be useful within
SMW. The implementation is based on Freebase as one of the most visible collections
of structured knowledge created in recent years; however, the mediator-based
approach underlying the implementation can be equally applied to other knowledge
bases.
3.3.2.1 Turning Folksonomies into Ontologies
This section gives an overview of our approach to extract lightweight ontologies
from folksonomies. The approach consists of 12 steps that have to be carried out in
the given order (see Table 3.2).
Step 1: Filter Irrelevant Tags In the first step, we eliminate tags that do not
improve the information content but instead degrade the quality of
the data. Unusual tags, i.e. tags that do not start with a letter, are therefore filtered
out. Additionally, uncommon tags are dismissed. In this context a tag is
uncommon if it is used less often than a predefined threshold.
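A minimal sketch of this filter, assuming the tag assignments are available as a flat list of tag strings and that usage is counted per assignment. The threshold value and the function name are illustrative only. Table 3.2 phrases the criterion in terms of the number of users sharing a tag; counting distinct users instead would only change how the counter is filled.

```python
from collections import Counter


def filter_tags(assignments, min_count=2):
    """Keep tags that start with a letter and occur at least `min_count` times."""
    counts = Counter(assignments)
    return sorted(
        tag for tag, n in counts.items()
        if tag[:1].isalpha() and n >= min_count
    )


# Hypothetical tag assignments, one entry per use of a tag.
assignments = ["python", "python", "2010", "2010", "!todo", "wiki", "wiki", "rare"]
print(filter_tags(assignments))
```

Here "2010" and "!todo" are dropped because they do not start with a letter, and "rare" is dropped because it falls below the threshold.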
Step 2: Group Tags Using Levenshtein Metric The process of annotating
a certain resource with tags is an uncontrolled operation, which means that no
spell-checking or any other input verification can be assumed to take place. As
a consequence typing errors, mixing of plural and singular forms, annotations in
different languages and other possible minor discrepancies between tags are likely
to occur. The Levenshtein similarity metric (Leveshtein 1966) is used to discover
morphologically similar tags.
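The following sketch shows one way to compute the Levenshtein distance and to use it for grouping morphologically similar tags. The normalisation to a similarity in [0, 1], the greedy grouping strategy and the threshold of 0.65 are assumptions chosen for the example; the text does not specify how the similarity is thresholded or how groups are formed.

```python
def levenshtein(a, b):
    """Edit distance: minimum number of insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def similarity(a, b):
    """Normalise the distance to [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def group_tags(tags, threshold=0.65):
    """Greedy grouping: a tag joins the first group whose first member is similar enough."""
    groups = []
    for tag in tags:
        for group in groups:
            if similarity(tag, group[0]) >= threshold:
                group.append(tag)
                break
        else:
            groups.append([tag])
    return groups


print(levenshtein("kitten", "sitting"))
print(group_tags(["ontology", "ontologies", "onotlogy", "wiki", "wikis"]))
```

With these sample tags, the plural form and the misspelling end up in the same group as "ontology", while "wiki" and "wikis" form a second group.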
Step 3: Enrich Tags with Wordnet Wordnet is a rich resource of lexical information
(Fellbaum 1998). The database is organized in so-called synsets and can be
accessed locally or remotely over a simple user interface. If a certain tag can be
found in Wordnet, one can expect the tag to be a valid English term. All tags which
are covered by Wordnet are assigned a flag containing the exact number of
occurrences in Wordnet synsets.
Step 4: Enrich Tags with Wikipedia Wikipedia is a large, high-quality, and up-
to-date online encyclopedia. If a certain tag can be mapped to a Wikipedia article,
this tag can be considered a correct natural language term. In addition, we can
benefit from the redirect pages functionality implemented in Wikipedia, so that
even when a tag is incorrectly spelled or abbreviated, there is a high chance to find
the correct corresponding Wikipedia article.
Step 5: Spell-check and Translate Spell checking and translating single words
(not sentences or pieces of text) can be done automatically with high precision. We
apply this additional step because after the Levenshtein similarity check and the
exploitation of Wikipedia redirects, not all tags can be related to these resources.
This might occur when a tag is misspelled, or a tag is not in English.
Table 3.2 The 12 steps of our method
Step Title: Description
1 Filter irrelevant tags: Consider only tag data that is shared between a sufficiently high number of users to increase the community representativeness of the prospective ontology.
2 Group tags using Levenshtein metric: Compare relevant tags using the Levenshtein similarity metric and group the highly similar ones. Tags within the same group are considered to have equivalent meaning and differences are assumed to be the result of spelling mistakes.
3 Enrich tags with Wordnet information: Check whether a tag is covered by the Wordnet thesaurus, which we consider a feasible indicator for a valid English term.
4 Enrich tags with Wikipedia information: Use information available on Wikipedia to enrich the tags.
5 Spell-check and translate: Perform English spell-checking and translate those tags that were found neither in Wordnet nor in Wikipedia from foreign languages.
6 Update group assignments: Update the tag groups created in step 2 based on the additional information gathered in steps 3–5.
7 Find representative for each group: Select a representative for each tag group based on its quality.
8 Create co-occurrence matrix: Create a symmetric square matrix containing information on the frequency with which two tags (or tag groups, respectively) were used to annotate the same resource.
9 Calculate similarities: Apply a vector-based algorithm (Pearson correlation coefficient) in order to detect similarities between vectors in the co-occurrence matrix.
10 Enrich co-occurrence matrix with co-actoring information: Augment the co-occurrence matrix with information about the frequency with which two tags (or tag groups, respectively) were used by the same author.
11 Create clusters: Create clusters of tags (or tag groups, respectively) on the basis of the calculated correlation coefficients and co-actoring information.
12 Create ontologies: Transform the tag clusters created in step 11 into SKOS ontologies exploiting all information gathered in the previous steps.
Step 6: Update Group Assignments In this step we update the tag groups defined in step 2 based on the information collected in steps 3–5 and eventually decide which
tags are relevant for the ontology to be created. The step can be further divided into
3 activities: (1) the re-grouping based on spell-checking and translation results;
(2) the re-grouping based on Wikipedia results; and (3) the selection of relevant tags.
Re-grouping based on spell-checking and translation results The first group update is triggered by the mapping defined according to spell-checking and translation results. In order to ensure consistent groups after this update, four different scenarios for mappings of the type tagA → tagB have to be considered:
1. Neither tagA nor tagB is assigned to a group: in this case tagA and tagB, plus all other tags mapped to any of them, form a new group.
2. tagA is assigned to a group, but tagB is not: in this case tagB, and all other tags mapped to it, are included in the group of tagA.
3. tagB is assigned to a group, but tagA is not: just as in the previous case, tagA, and all other tags mapped to it, are included in the group of tagB.
4. Both tags are already assigned to the same or different groups: in addition to the group updates, those tags which are already assigned to one of the corresponding groups have to be considered as well. Existing group members of tagA will be assigned to the group of tagB.
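The four scenarios can be sketched as a small update routine. This is only an illustration under stated assumptions: the function name `apply_mapping` and the representation of groups as a dictionary from tag to group id are hypothetical, and tags "mapped to" a tag are handled simply by applying one mapping after another.

```python
def apply_mapping(group_of, tag_a, tag_b):
    """Update group assignments for one mapping tag_a -> tag_b.

    group_of maps each grouped tag to a group id (hypothetical representation).
    """
    ga, gb = group_of.get(tag_a), group_of.get(tag_b)
    if ga is None and gb is None:
        # Scenario 1: neither tag is grouped, so both form a new group.
        new_id = max(group_of.values(), default=-1) + 1
        group_of[tag_a] = group_of[tag_b] = new_id
    elif gb is None:
        # Scenario 2: tag_b joins the group of tag_a.
        group_of[tag_b] = ga
    elif ga is None:
        # Scenario 3: tag_a joins the group of tag_b.
        group_of[tag_a] = gb
    elif ga != gb:
        # Scenario 4: existing members of tag_a's group move to tag_b's group.
        for tag, gid in list(group_of.items()):
            if gid == ga:
                group_of[tag] = gb
    return group_of
```

Applying the mappings one at a time keeps every intermediate state consistent, which is the property the four scenarios are meant to guarantee.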
Re-grouping based on Wikipedia results The second group update is performed
if two tags are assigned to the same Wikipedia article. Just as in the previous
update step based on spell-checking and translation results, we consider existing
groups and their members, which means that other group members may also be
affected by this update operation.
The selection of relevant tags We assume that all tags, or groups of tags,
containing either a Wikipedia or a Wordnet reference, are relevant for the genera-
tion of ontologies. The relevancy of the remaining tags and groups thereof is based
on their frequency of occurrence in the folksonomy, i.e., on their usage. If this
frequency is below a certain threshold, the tag or the tag group will not be
considered. All affected tags will, therefore, be marked with a corresponding flag,
indicating that the tag is not relevant for future steps towards the generation of
ontologies. If the frequency of usage is above the given threshold, the tag, or tag
group, will be considered to describe a new term created by the tagging community.
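The selection rule can be sketched as follows. The data layout, the name `select_relevant`, the use of the summed member usage as the group frequency, and the treatment of a frequency exactly equal to the threshold as relevant are all assumptions made for illustration; the text does not fix them.

```python
def select_relevant(groups, usage, has_reference, threshold):
    """Return the ids of the tag groups kept for ontology generation.

    groups:        dict group id -> list of tags in the group
    usage:         dict tag -> frequency of occurrence in the folksonomy
    has_reference: set of tags with a Wordnet or Wikipedia reference
    """
    relevant = set()
    for gid, tags in groups.items():
        if any(tag in has_reference for tag in tags):
            # A Wordnet or Wikipedia reference makes the group relevant.
            relevant.add(gid)
        elif sum(usage.get(tag, 0) for tag in tags) >= threshold:
            # Frequently used, unreferenced tags count as new community terms.
            relevant.add(gid)
    return relevant
```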
Step 7: Find Representative for Each Group This step is about finding the most
representative tag in a tag group. This decision is taken as follows: the tag groups
defined in the previous steps can contain many single tags. By definition all tags in
a group are equivalent to each other, regardless of whether they are misspelled,
occur with a certain lower or higher frequency, or are translations from other natural
languages. For the generation of ontologies, however, we need to identify which of
these tags is the most representative for the meaning of the corresponding tag group.
Preference is given to tags occurring in Wordnet and Wikipedia references, in this
order. If neither is the case, the decision is based on the highest frequency of usage.
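A minimal sketch of this preference order is shown below. The function name and arguments are hypothetical, and breaking ties between several Wordnet (or several Wikipedia) tags by usage frequency is an assumption that goes beyond what the text specifies.

```python
def pick_representative(tags, in_wordnet, in_wikipedia, usage):
    """Choose the representative tag of a group.

    Preference: Wordnet tags first, then Wikipedia tags, then highest usage.
    """
    def rank(tag):
        # Tuples compare element-wise; False sorts before True.
        return (tag not in in_wordnet, tag not in in_wikipedia, -usage.get(tag, 0))
    return min(tags, key=rank)
```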
Step 8: Create Co-occurrence Matrix Co-occurrence matrices provide the means
to derive some kind of semantic relation between two entities. Amongst many
others, this approach was chosen by (Begelman et al. 2006; Cattuto et al. 2007a, b;
Simpson 2008; Specia and Motta 2007) to analyze connections between tag entities.
The symmetric n × n co-occurrence matrix M contains information about how
frequently two tag entities are used to annotate the same resource. The value m_ij,
representing the intersection of (entity_i, entity_j) for 1 ≤ i, j ≤ n, corresponds to the
frequency with which the two tag entities entity_i and entity_j were used to annotate
the same resource. The diagonal elements m_ij, where i = j, of the matrix M contain
information on how often the tag entity entity_i was used at all. This serves as
a starting point for steps 9–11.
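The construction of M can be sketched in a few lines. The input format (one set of tag entities per annotated resource) and the function name are assumptions made for illustration.

```python
from itertools import combinations

def cooccurrence_matrix(annotations, entities):
    """Build the symmetric co-occurrence matrix M as a list of lists.

    annotations: iterable of sets of tag entities, one set per resource
    entities:    ordered list of tag entities defining rows and columns
    """
    index = {entity: i for i, entity in enumerate(entities)}
    n = len(entities)
    m = [[0] * n for _ in range(n)]
    for tags in annotations:
        present = sorted(index[t] for t in set(tags) if t in index)
        for i in present:
            m[i][i] += 1  # diagonal: how often the entity was used at all
        for i, j in combinations(present, 2):
            m[i][j] += 1
            m[j][i] += 1  # mirror the entry to keep M symmetric
    return m
```

Each row (or column) of the result is the vector that step 9 compares.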
Step 9: Calculate Similarities The co-occurrence matrix is a starting point to
derive relations between tag entities. From a simplistic point of view, the relation
between co-occurrence values and the total frequency of tag entries (as proposed
by (Begelman et al. 2006)) can be seen as a good indicator for the relation of two tag
entities. This approach, however, has one important disadvantage: it does not take
into account similarities of the two tags to other tags or tag groups. A vector-based
similarity measurement, as proposed in (Specia and Motta 2007), resolves this issue.
A vector represents a row (or column) of the co-occurrence matrix. The similarity
measure is based in our case on the Pearson correlation coefficient. Algorithm 1
below shows how the Pearson correlation coefficient is calculated for two variables
X and Y, with means $\bar{X}$ and $\bar{Y}$ and standard deviations $S_x$ and $S_y$, respectively.
Algorithm 1 Pearson Correlation Coefficient
\[ r = \frac{\sum_{i=1}^{n} (X_i - \bar{X}) \cdot (Y_i - \bar{Y})}{(n - 1) \cdot S_x \cdot S_y} \]
A positive coefficient value is evidence for a general tendency that large values
of X are related to large values of Y and that small values of X are related to small
values of Y. A correlation above 0.5 is an indicator that the two vectors are strongly
correlated.
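For two rows of the co-occurrence matrix, the coefficient can be computed as in the sketch below. It uses sample standard deviations (denominator n − 1), which matches the n − 1 factor in Algorithm 1; returning 0.0 for constant vectors is an assumption, since the coefficient is undefined in that case.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equally long vectors (n >= 2)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sample standard deviations, consistent with the (n - 1) factor.
    s_x = sqrt(sum((v - mean_x) ** 2 for v in x) / (n - 1))
    s_y = sqrt(sum((v - mean_y) ** 2 for v in y) / (n - 1))
    if s_x == 0 or s_y == 0:
        return 0.0  # assumption: treat constant vectors as uncorrelated
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return covariation / ((n - 1) * s_x * s_y)
```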
Step 10: Enrich the Co-occurrence Matrix with Co-actoring Information The
outcomes of step 9 alone do not allow us to reliably derive relations between tags. This holds in
particular for tags that are used frequently, but only by a limited number of users.
This phenomenon is usually caused by spam robots inserting tags. Even
though there are many related tags with correlation values below 0.5, the threshold
cannot be lowered any further without risking the derivation of faulty relations as
well. To cope with this issue we enrich the co-occurrence matrix with so-called “co-actoring information”.
This key figure can be calculated in a manner similar to the co-occurrence information,
the only difference being that the focus lies on the users rather than
on the tags. As such, the co-actoring information for two tags is defined as the total
number of users who used both tags.
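As a small sketch, co-actoring can be computed from a list of (user, tag) assignments; this input format and the function name are assumptions. Because users are counted once, a single spam account that repeats a tag pair many times contributes only 1.

```python
def coactoring(assignments, tag1, tag2):
    """Number of distinct users who used both tag1 and tag2.

    assignments: list of (user, tag) pairs
    """
    users1 = {user for user, tag in assignments if tag == tag1}
    users2 = {user for user, tag in assignments if tag == tag2}
    return len(users1 & users2)
```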
Step 11: Create Clusters In this step, we aim at creating sets of strongly related
tags that we refer to as “clusters”. To do so we calculate the relation of a tag entity
to the total number of usage and the co-occurrence/co-actoring information and
raise the correlation coefficient if the relation proportions are high enough.
Algorithm 2 shows the exact formula, where ccoeff denotes the correlation coefficient
of two tag entities, #(tag1) and #(tag2) denote the total usage of a certain tag,
coac(tag1,tag2) stands for the co-actoring information of the tags tag1 and tag2,
and cooc(tag1,tag2) represents the co-occurrence value of the two tags.
Algorithm 2 Correlation Coefficient Strengthener
\[ r = \mathrm{ccoeff} \cdot \left( \frac{\mathrm{coac}(tag_1, tag_2)}{\#(tag_1) + \#(tag_2) - \mathrm{coac}(tag_1, tag_2)} \right) \cdot \left( \frac{\mathrm{cooc}(tag_1, tag_2)}{\#(tag_1) + \#(tag_2) - \mathrm{cooc}(tag_1, tag_2)} \right) \cdot 100 \]
The algorithm dramatically reduces the problems caused by spam entries and by related tags with
low correlation coefficients. Tag pairs with either a base correlation
above the defined threshold th1 or a strengthened correlation coefficient that
reaches this threshold are then automatically considered to be related and form the
basis for a cluster.
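The following sketch reads Algorithm 2 as the product of the correlation coefficient, the two overlap ratios, and the factor 100; that reading, together with the function and parameter names, is an assumption. The example values in the usage note are made up.

```python
def strengthen(ccoeff, coac, cooc, n1, n2):
    """Strengthened correlation for two tags with total usages n1 and n2."""
    # Overlap ratios: shared users and shared resources relative to total usage.
    coac_ratio = coac / (n1 + n2 - coac)
    cooc_ratio = cooc / (n1 + n2 - cooc)
    return ccoeff * coac_ratio * cooc_ratio * 100

def is_related(ccoeff, coac, cooc, n1, n2, th1):
    """A pair is related if the base or the strengthened value reaches th1."""
    return ccoeff >= th1 or strengthen(ccoeff, coac, cooc, n1, n2) >= th1
```

With these definitions, a pair used 50 times each on the same resources but by a single user (`coac = 1`) stays at roughly its base coefficient, whereas a pair shared by many users is raised.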
Tags are merged into one cluster only if the calculated correlations between tag
entities, which are indirectly connected by the transitive law, are above another
threshold th2. This means that the three tag entities belong to the same cluster only if
cooc(tag1,tag2) ≥ th1, cooc(tag2,tag3) ≥ th1 and cooc(tag1,tag3) ≥ th2. Additional
tag entities are added to a cluster only if all correlation values, with respect to
the other tags in the cluster, exceed the defined threshold th2.
While useful, applying this strategy results in a relatively high number of very
similar clusters differing only in one or two elements. To solve this issue we
apply two smoothing heuristics as follows.
1. If one cluster is completely contained in another one, the smaller cluster is
deleted.
2. If the differences between two clusters are within a small margin and, addition-
ally, the number of elements of both clusters exceeds a certain percentage with
respect to the total number of elements of both clusters, the smaller cluster is
deleted and the tags not included in the larger one are added to it.
The second smoothing heuristic is depicted in Algorithm 3, where #(cl1) and
#(cl2) denote the number of elements within the clusters cl1 and cl2, respectively.
The relevant threshold in this algorithm is thcl.
Algorithm 3 Second Smoothing Heuristic for Two Clusters
if $\frac{\#(cl_1 \cap cl_2)}{\#(cl_1 \cup cl_2)} \geq th_{cl}$
and $th_{cl} \leq \frac{\#(cl_1)}{\#(cl_1) + \#(cl_2)}$
and $th_{cl} \leq \frac{\#(cl_2)}{\#(cl_1) + \#(cl_2)}$
then remove($cl_1$), remove($cl_2$) and insert($cl_1 \cap cl_2$)
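Both smoothing heuristics can be sketched for a pair of clusters represented as Python sets. The function name is hypothetical. As an explicit assumption, the merged cluster is built as described in the prose of heuristic 2 (the tags missing from the larger cluster are added to it, i.e. the union is kept).

```python
def smooth_pair(cl1, cl2, th_cl):
    """Return the cluster replacing cl1 and cl2, or None if both are kept."""
    # Heuristic 1: a cluster fully contained in the other one is dropped.
    if cl1 <= cl2:
        return set(cl2)
    if cl2 <= cl1:
        return set(cl1)
    # Heuristic 2: high overlap and comparable cluster sizes.
    overlap = len(cl1 & cl2) / len(cl1 | cl2)
    share1 = len(cl1) / (len(cl1) + len(cl2))
    share2 = len(cl2) / (len(cl1) + len(cl2))
    if overlap >= th_cl and share1 >= th_cl and share2 >= th_cl:
        return cl1 | cl2  # assumption: merge as described in the prose
    return None
```

Since the two size shares add up to 1, the second heuristic can only fire for thresholds of at most 0.5.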
Step 12: Create Ontologies All the terms occurring in a cluster are assumed to be
related to each other in some way. The concrete type of the inter-tag connections is,
nevertheless, hardly resolvable. We consider this limitation to be of less importance
for creating lightweight ontologies. We use the SKOS standard (http://www.w3.org/2004/02/skos/),
which allows establishing associative links between concepts without
the need to further specify their semantics. More precisely, the SKOS property
skos:related can be used to designate all kinds of relationships amongst terms
within one cluster. The clusters themselves are considered to be the domain of the
ontology, for which meta-properties (e.g., by using Dublin Core) can be included.
In SKOS, skos:ConceptScheme is used to identify a certain ontology. As
a consequence, all entities within this scheme have to include a reference to this
scheme; this is achieved through the construct skos:inScheme. The terms
within a cluster represent the entities the ontology consists of. This direct mapping
is possible as within the SKOS language, there is no distinction between classes and
instances. The construct to designate these entities is skos:Concept (Fig. 3.8).
The SKOS constructs previously mentioned allow us to define the basic structure
of the ontologies. The information that was collected with respect to translations,
spell-checks, and so on, is used to enrich the ontologies. The preferred label for
a concept is the respective representative of a tag group, which is denoted by
skos:prefLabel. If there are other terms within the same group of tags,
which do occur in Wordnet, the corresponding term can be considered as a valid
substitute for the preferred label; this information is captured through the
skos:altLabel construct. As SKOS allows language distinctions, this feature is
also used for both preferred labels and alternative labels. If a translation was found
for a tag, this information is attached to the label, otherwise the label is considered
to be English. This is done by using standard XML annotation, e.g.,
skos:prefLabel xml:lang = "EN". All other tags of a certain group are considered
to be “hidden labels” for the corresponding concept. The set of labels marked by
skos:hiddenLabel comprises common spelling mistakes.
Fig. 3.8 Example ontology created through our method
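The mapping from a tag group to SKOS can be illustrated by emitting Turtle for one concept. This is a sketch, not the serialization used by the authors: the `ex:` namespace, the function name, and the choice of Turtle instead of RDF/XML are assumptions, and labels are not escaped.

```python
PREFIXES = (
    "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .\n"
    "@prefix ex: <http://example.org/tags#> .\n"  # hypothetical namespace
)

def concept_to_turtle(name, scheme, pref, alt=(), hidden=(), related=(), lang="en"):
    """Serialize one tag group as a skos:Concept in Turtle."""
    lines = [
        f"ex:{name} a skos:Concept ;",
        f"    skos:inScheme ex:{scheme} ;",
        f'    skos:prefLabel "{pref}"@{lang} ;',
    ]
    lines += [f'    skos:altLabel "{label}"@{lang} ;' for label in alt]
    lines += [f'    skos:hiddenLabel "{label}"@{lang} ;' for label in hidden]
    lines += [f"    skos:related ex:{other} ;" for other in related]
    lines[-1] = lines[-1][:-1] + "."  # close the statement
    return "\n".join(lines)
```

For example, `print(PREFIXES + concept_to_turtle("ontology", "cluster1", "ontology", alt=["ontologies"], hidden=["ontologie"], related=["taxonomy"]))` yields one concept with a preferred, an alternative and a hidden label plus one skos:related link.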
3.3.2.2 Integrating Freebase into Semantic MediaWiki
This section gives a brief overview of an extension to SMW that allows the use of
inline queries to query Freebase (http://www.freebase.com) content via a mediator.
The mediator creates an MQL query (Metaweb Query Language, the query lan-
guage used in Freebase), handles the communication with Freebase, and returns
query results in the same way as for conventional SMW inline queries. A full
documentation of the extension is available in (Ell 2009).
Imagine you want to create a list of all European countries and their populations
within your SMW-based knowledge management system. This information is
available in general-purpose knowledge bases such as Freebase, and can be impor-
ted into the local SMW installation. The query statement could look as follows,
where the source argument is an extension of the original AskQL syntax indicating
the external knowledge base to be used.
{{#ask: [[Category:Country]] [[Located in::Europe]]
| ?Population
| source = freebase
}}
The AskQL query has to be translated into an MQL query, which could look
as follows.
[{
  "/type/object/name" : null,
  "/location/statistical_region/population" : [{
    "number" : null
  }],
  "/type/object/type" : "/location/country",
  "/location/location/containedby" : [{
    "/type/object/name" : "Europe"
  }]
}]
In order to be able to perform this translation additional information is needed.
In this case it is necessary to know that
1. the category Country maps to /location/country,
2. the property Located in maps to /location/location/containedby, and
3. the print request Population maps to /location/statistical_region/population,
where the field storing the value has the name number.
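The role of the three mappings can be illustrated with a small sketch that assembles the MQL structure for the example query. The dictionaries and the function `build_mql` are hypothetical stand-ins for the mapping information stored in the wiki; the actual transformation is described in (Ell 2009) and is not reproduced here.

```python
import json

# Hypothetical in-memory copies of the mapping information from the example.
CATEGORY_MAP = {"Country": "/location/country"}
PROPERTY_MAP = {"Located in": "/location/location/containedby"}
PRINT_MAP = {"Population": ("/location/statistical_region/population", "number")}

def build_mql(category, conditions, print_requests):
    """Assemble an MQL query (as Python data) for a simple AskQL query."""
    query = {"/type/object/name": None}
    for name in print_requests:
        path, field = PRINT_MAP[name]
        query[path] = [{field: None}]  # null marks a value to be returned
    query["/type/object/type"] = CATEGORY_MAP[category]
    for prop, value in conditions.items():
        query[PROPERTY_MAP[prop]] = [{"/type/object/name": value}]
    return [query]

mql = build_mql("Country", {"Located in": "Europe"}, ["Population"])
print(json.dumps(mql, indent=2))
```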
The transformation, which essentially follows a local-as-view approach, is
presented in detail in (Ell 2009). Mapping information is stored in pages via
properties, thus being editable and reusable for various inline queries.
Category mapping information is stored on category pages using the property
freebase category mapping. For example the page Category:City (the page
describing this category in the category namespace) may contain the statement
[[freebase category mapping::/location/citytown]].
Page mapping information is stored on pages in the main namespace using the
property freebase page mapping. For example the page Karlsruhe may contain the
statement [[freebase page mapping::#9202a8c04000641f80000000000b283e]].
Property mapping information is stored on property pages using the properties
freebase property mapping and freebase property type. For example the page
Property:Population (the page describing this property in the property namespace)
may contain the statements [[freebase property mapping::/location/statistical_region/population ;number]]
and [[freebase property type::number]]. Path elements are separated by ‘;’. If
no type mapping is specified then the standard type string is assumed by default.
Print request mapping information is stored on property pages since print
requests relate to properties. For storing the mapping information the property
freebase pr mapping is used. For example the property page Property:Located in
may contain the statement [[freebase pr mapping::/location/location/containedby]].
In case the mapping information is missing or cannot be properly interpreted,
the extension behaves as follows.
Ambiguities The page where mapping information is expected to be contained
may contain the mapping property multiple times. For example a category page
may contain several properties with the property name freebase category mapping.
In this situation the mapping information is ambiguous and only the first result
returned by the SMW database is used.
Property type If the property type of a property is not given using freebase
property type then type string is assumed.
Page mapping information missing If no page mapping exists for page P then
an MQL query is created where an entity is requested with name P. If the query is
specified with parameter language = L then an MQL query is created that requests
an entity that has the name P in language L.
Category mapping information missing If no category mapping information
is found then the category statement and all subordinated statements in the description
object tree returned by the query processor are ignored.
Property mapping information missing If no property mapping information
is found then the property statement and all subordinated statements in the descrip-
tion object tree returned by the query processor are ignored.
This behavior is robust since missing mapping information is ignored. In case
of ambiguities or missing mapping information, a warning is displayed to the user.
Thereby a step-by-step development and improvement of the query is supported.
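The fallback behavior for categories and properties can be sketched as a lookup that collects warnings instead of failing. The function name, the `stored` dictionary and the warning texts are hypothetical; only the rules (ignore missing mappings, use the first of several ambiguous values, warn the user) come from the description above.

```python
def resolve_mapping(stored, name, kind, warnings):
    """Look up a mapping and record warnings for the user.

    stored: dict name -> list of mapping values found on the page
    kind:   "category" or "property", used only in the warning text
    """
    values = stored.get(name, [])
    if not values:
        # Missing mapping: the statement is ignored, the query still runs.
        warnings.append(f"missing {kind} mapping for '{name}': statement ignored")
        return None
    if len(values) > 1:
        # Ambiguous mapping: only the first result is used.
        warnings.append(f"ambiguous {kind} mapping for '{name}': using first result")
    return values[0]
```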
3.3.3 Repairing Knowledge Structures
Quality issues are a natural consequence of the collaborative, integrated knowledge
engineering approach followed by Semantic MediaWiki and its extensions. There-
fore, our solution also includes techniques to support users in detecting and
correcting potential modeling errors or missing information. This section provides
an overview of the types of quality issues we deal with and the implementation of
the associated knowledge repair functionality.
Similar Names An ontology contains different types of entities. A common
issue with adding entities to an ontology is that a user might overlook that the entity
she intends to add is already in the ontology under a name slightly different from the
name she would have chosen. By adding the entity, the user introduces
redundancy into the ontology, which makes the ontology unnecessarily large and
more error-prone. To avoid such issues we measure similarities between entity names via
the Levenshtein distance, and present the results to the user, who then has to decide
whether the entities under consideration represent the same thing and thus should be
merged, or whether they differ and therefore should be kept
separately in the ontology.
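A minimal sketch of such a check is given below. The distance function is the standard dynamic-programming formulation of the Levenshtein distance; the cut-off `max_distance=2` and the case-insensitive comparison are assumptions, not values taken from the implementation.

```python
def levenshtein(a, b):
    """Edit distance between strings a and b (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def similar_names(names, max_distance=2):
    """Pairs of entity names that are candidates for merging."""
    names = sorted(names)
    return [(a, b) for i, a in enumerate(names) for b in names[i + 1:]
            if levenshtein(a.lower(), b.lower()) <= max_distance]
```

The candidate pairs are only suggestions; as described above, the decision to merge remains with the user.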
Similar Property Sets The idea here is to compare the property sets of ontology
classes in order to identify potential similarities. The ontology editor introduced in
Sect. 3.3 displays all the sibling categories which have at least 50% of their
properties in common (see Fig. 3.12) for the user to decide for appropriate action.
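The comparison can be sketched as follows. Interpreting "at least 50% of their properties in common" as the shared properties making up at least half of each category's property set (i.e. half of the larger set) is an assumption, as are the names used.

```python
from itertools import combinations

def similar_siblings(properties, min_share=0.5):
    """Pairs of sibling categories with largely overlapping property sets.

    properties: dict sibling category -> set of property names
    """
    hits = []
    for a, b in combinations(sorted(properties), 2):
        pa, pb = properties[a], properties[b]
        if not pa or not pb:
            continue  # categories without properties cannot be compared
        shared = len(pa & pb)
        if shared / max(len(pa), len(pb)) >= min_share:
            hits.append((a, b))
    return hits
```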
Cycles and Redundancies This measurement identifies cycles within a special-
ization-generalization hierarchy (see Fig. 3.9). Similarly the knowledge repair
functionality includes means to identify redundant is-a relationships, which are
presented as decision support to the user.
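Both checks reduce to reachability in the is-a graph, as the following sketch shows. The representation of the hierarchy as a dictionary from category to its direct supercategories and the function names are assumptions.

```python
def ancestors(parents, node):
    """All categories reachable from node via is-a links."""
    seen, stack = set(), list(parents.get(node, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen

def categories_in_cycles(parents):
    """Categories that are their own (indirect) supercategory."""
    return sorted(c for c in parents if c in ancestors(parents, c))

def redundant_links(parents):
    """Direct is-a links already implied via another direct supercategory."""
    redundant = []
    for child, supers in parents.items():
        for sup in supers:
            others = (s for s in supers if s != sup)
            if any(sup in ancestors(parents, other) for other in others):
                redundant.append((child, sup))
    return sorted(redundant)
```

For instance, if Poodle is a direct subcategory of both Dog and Animal, and Dog is a subcategory of Animal, the link from Poodle to Animal is reported as redundant.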
Missing Properties The underlying rationale for this metric lies in the inherent
difficulties experienced by knowledge modelers in distinguishing between the
data and the schema level of an ontology. Here we display those ontological
primitives that do not have any successors in the hierarchy, thus indicating missing
specialization-generalization properties or misclassifications of specific entities
as classes or instances.
Category knowledge repair The previously discussed attempts to solve
problems are used primarily by certain users who aim at keeping the knowledge
base consistent. The methods mentioned enable the user to get an idea which
categories are part of a problem of the knowledge base no matter which taxonomy
they belong to. However, there are also users who create an ontology because
the domain under consideration is a domain of interest of such a user. Therefore
the user might be keen on creating an error-free ontology. Instead of using each
approach sequentially in order to resolve the issues about a certain category, the
user also has the possibility to get all information about one category at a time.
In addition to the results of the previously mentioned methods, the user also gets information about
minimum, average and maximum values, which can be compared to the values of
the category under consideration, as well as information about the meaning of
certain figures, which is useful for inexperienced users. In order to guide
the attention of the user to severe problems, these are marked with a symbol or red
color. Minor problems are marked and all the other information is not marked
(see Fig. 3.10).
Category statistics Some of the previously described methods provide the
user with information of all categories regarding one specific type of problem.
The method Category knowledge repair in contrast provides the user with informa-
tion of all the types of problems regarding one specific category. This approach
combines these two types of problem solving attempts. It displays all categories
together with the results of each problem solving attempt. Therefore the user gets
all information about issues regarding all categories. The use of this approach is to
have a global view on the situation of taxonomies and the categories. Without such
a comprehensive view it can be rather difficult to solve issues which spread over
many categories. Then a user would have to jump from one category to another
many times to resolve an issue. This gets more complicated if the branching of the
category under consideration is more complex than it is with a category only having
one supercategory and one subcategory. So far, the user gets quite the same
information for all categories, as he does when using the Category knowledge
repair approach for one specific category. In order to guide the attention of the
user to severe problems these are eye-catchingly marked. Minor problems are
highlighted and all the other information is not marked (see Fig. 3.11).
Fig. 3.9 Categories in cycles, categories with redundant relationships and entities with similar names
Fig. 3.10 Category knowledge repair
Fig. 3.11 Category statistics
Category and Property Histogram In an ontology we have many entities
starting with different letters. In order to get an overview of the distribution of
entities starting with a specific letter in the ontology in relation to the alphabet
a histogram can be very useful. It provides a comprehensive view on how many
entities start with a specific letter in comparison to other letters (see Fig. 3.12).
A normalized histogram can point out unusual things, however this requires that
there is a certain number of entities in the database. The more entities there are
the more likely they will follow a specific distribution regarding their first letters.
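A first-letter histogram is straightforward to compute; the sketch below, with a hypothetical function name, returns either absolute counts or normalized shares.

```python
from collections import Counter

def first_letter_histogram(names, normalize=False):
    """Count entity names by their (upper-cased) first letter."""
    counts = Counter(name[0].upper() for name in names if name)
    if normalize:
        total = sum(counts.values())
        # Shares make ontologies of different sizes comparable.
        return {letter: count / total for letter, count in counts.items()}
    return dict(counts)
```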
3.4 Conclusions
The chapter has covered the area of enterprise knowledge structures, starting from
the requirements and research questions derived from use cases all the way to
methodologies and implementations to bridge the different heterogeneous structures
that are in use today.
We expect that a common language for representing knowledge structures will
foster further development and research in this area. The research results presented
in this chapter are examples of what can be achieved once some foundational
questions (such as the representation language or the necessary expressivity) have
been settled, and we can move forward towards unifying knowledge management
tools and methodologies, further integrating results from heterogeneous areas in
order to support the knowledge worker to the fullest possible extent.
Fig. 3.12 Category histogram and categories with similar property sets
Enterprise knowledge structures are heterogeneous in nature, and their inte-
grated use requires a framework that allows understanding the trade-offs between
different structures, and optimizes for given scenarios.
Many enterprises may already apply folksonomy-like systems. We have shown
how folksonomies can be used as the foundation for developing lightweight
ontologies which can then be in turn used to connect to further knowledge sources.
Besides tagging, we have explored further Web 2.0 inspired paradigms, and imple-
mented extensions to a wiki-based system that allows for the seamless integration
of external data sources like Flickr or a company database. This system allows
for explicit but lightweight management of an ontology within the wiki-interface,
and powerful gardening and knowledge quality assessment tools.
References
Ankolekar A, Krötzsch M, Tran T, Vrandecic D (2007) The two cultures: mashing up web 2.0 and
the semantic web. In: WWW ’07: proceedings of the 16th international conference on world
wide web, ACM Press, New York, pp 825–834, ISBN 9781595936547. doi: 10.1145/
1242572.1242684, URL http://dx.doi.org/10.1145/1242572.1242684. 2007
Begelman G, Keller P, Smadja F (2006) Automated tag clustering: improving search and explora-
tion in the tag space. In: Proceedings of the collaborative web tagging workshop co-located
with the 15th international world wide web conference (WWW2006), 2006
Cattuto C, Loreto V, Pietronero L (2007a) Semiotic dynamics and collaborative tagging. Proc Nat
Acad Sci U S A 104(5):1461
Cattuto C, Schmitz C, Baldassarri A, Servedio VDP, Loreto V, Hotho A, Grahl M, Stumme G
(2007b) Network properties of folksonomies. AI Commun 20(4):245–262
Drakos N, Rozwell C, Bradley A, Mann J (2009) Magic quadrant for social software in the workplace.
Gartner RAS core research note G00171792, Gartner.
http://www.gartner.com/technology/media-products/reprints/microsoft/vol10/article4/article4.html. Accessed Jan 2010
Ell B (2009) Integration of external data in semantic wikis. Master thesis, Hochschule Mannheim, 2009
Fellbaum C (1998) WordNet: an electronic lexical database. MIT Press, Cambridge, MA
Fensel D (2001) Ontologies: silver bullet for knowledge management and electronic commerce.
Springer, Berlin
Gangemi A, Guarino N, Masolo C, Oltramari A, Schneider L (2002) Sweetening ontologies with
DOLCE. vol 2473 of Lecture notes in artificial intelligence (LNAI), Springer, Siguenza, Spain,
pp 166–181, ISBN 3-540-44268-5
Gene Ontology Consortium (2000) Gene ontology: tool for the unification of biology. Nat Genet
25:25–30
Grau BC, Horrocks I, Motik B, Parsia B, Patel-Schneider P, Sattler U (2008) OWL 2: the next step
for OWL. Web Semant Sci Serv Agent World Wide Web 6(4):309–322. ISSN 1570–8268. doi:
http://dx.doi.org/10.1016/j.websem.2008.05.001
Gruber TR (1993) A translation approach to portable ontology specifications. Knowl Acquis 5
(2):199–220
Guarino N (1998) Formal ontology and information systems. In: Guarino N (ed) Proceedings
of the first international conference on formal ontologies in information systems (FOIS),
vol 46 of Frontiers in artificial intelligence and applications, IOS-Press, Trento, Italy, 1998
Haase P, Herzig DM, Musen M, Tran DT (2009) Semantic wiki search. In: 6th annual european
semantic web conference, ESWC2009, vol 5554 of LNCS. Springer Verlag, Heraklion, Crete,
Greece, pp 445–460, June 2009
Hartmann J, Sure Y, Haase P, Palma R, Suarez-Figueroa MC (2005) OMV – Ontology metadata
vocabulary. In: Welty C (ed) Ontology patterns for the semantic web workshop, Galway,
Ireland, 2005
Horrocks I, Patel-Schneider PF (2004) Reducing OWL entailment to description logic
satisfiability. J Web Semant 1(4):7–26
Krötzsch M, Vrandecic D, Völkel M, Haller H, Studer R (2007) Semantic wikipedia. J Web
Semant 5:251–261
Lenat DB (1995) CYC: a large-scale investment in knowledge infrastructure. Commun ACM 38
(11):33–38
Levenshtein VI (1966) Binary codes capable of correcting deletions, insertions, and reversals.
Soviet Physics Doklady 10:707–710
McGuinness DL (2003) Ontologies come of age. In: Fensel D, Hendler J, Lieberman H, Wahlster W
(eds) Spinning the semantic web: bringing the world wide web to its full potential. MIT Press,
Cambridge, MA
Motik B, Grau BC, Horrocks I, Wu Z, Fokoue A, Lutz C (2008) OWL2 web ontology language:
profiles. W3C Working Draft 2 December 2008, Available at http://www.w3.org/TR/2008/
WD-owl2-profiles-20081202/.
Pease A, Niles I, Li J (2002) The suggested upper merged ontology: a large ontology for the
semantic web and its applications. In: Working notes of the AAAI-2002 workshop on
ontologies and the semantic web, 2002
Simperl E, Wölger S, Bürger T, Siorpaes K, Han S-K, Luger M (2010) An ontology authoring tool
for the enterprise 3.0. Taylor and Francis Publishing, 2010, London
Simpson E (2008) Clustering tags in enterprise and web folksonomies. Technical Report HPL-
2008-18, HP Labs, 2008
Sowa JF (1995) Top-level ontological categories. International Journal of Human-Computer
Studies 43(5/6):669–685. ISSN 1071–5819. doi: http://dx.doi.org/10.1006/ijhc.1995.1068
Specia L, Motta E (2007) Integrating folksonomies with the semantic web. In: Proceedings of the
4th European semantic web conference (ESWC2007), pp 624–639, 2007
Uschold M, Grueninger M (1996) Ontologies Principles, Methods and Applications. Knowledge
Engineering Review 11(2):93–155
Vrandecic D (2009a) Towards automatic content quality checks in semantic wikis. In: Social
semantic web: where web 2.0 meets web 3.0, AAAI spring symposium, Springer, Stanford,
CA, March 2009
Vrandecic D (2009b) Ontology evaluation. PhD thesis, Karlsruhe Institute for Technology,
Germany, 2009
Vrandecic D, Krötzsch M (2006) Reusing ontological background knowledge in semantic wikis.
In: Völkel M, Schaffert S (eds) Proceedings of the first workshop on semantic wikis – from
wiki to semantics, Workshop on Semantic Wikis. AIFB, ESWC2006, June 2006. URL
http://www.aifb.uni-karlsruhe.de/Publikationen/showPublikation?publ_id=1211
4
Using Cost-Benefit Information in Ontology Engineering Projects
Tobias Bürger, Elena Simperl, Stephan Wölger, and Simon Hangl
4.1 Introduction
Knowledge-based technologies are characterized by the use of machine-understandable
representations of domain knowledge – in the form of ontologies, taxonomies,
thesauri, folksonomies, and others – as a baseline for the realization of advanced
mechanisms to organize, manage, and explore information spaces. These technologies
have been promoted since the nineties, or even earlier, but experienced a real
revival only with the rise of the Semantic Web almost a decade ago. Primarily
introduced by Sir Tim Berners-Lee, the originator of the World Wide Web, the idea of
providing the current Web with a computer-processable knowledge infrastructure in
addition to its current, semi-formal and human-understandable content foresees the
usage of knowledge structures which can be easily integrated into, and exchanged
among, arbitrary software environments in an operationalized manner (Berners-Lee
et al. 2001). In this context, these structures are formalized using Web-suitable, and at
the same time semantically unambiguous, representation languages; are pervasively
accessible; and can be – at least theoretically – shared and reused across the World
Wide Web. Complementarily, similar forms of knowledge organization, typically
referred to as ‘folksonomies’, emerged in the realm of Web 2.0, most prominently in
relation to technologies and platforms promoting community-driven knowledge
management and sharing. Just as many other (by-)products of Web 2.0 enterprises,
they are mainly user-generated, and can be enriched with formal meaning and
transformed into characteristically lightweight, machine-understandable, widely
accepted ontologies. Chapter ‘Enterprise Knowledge Structures’ provides an overview
of the various forms of knowledge organization populating the enterprise
knowledge management landscape, of their characteristics, relationships, and of the
ways they can be created and maintained.
T. Bürger (*)
Capgemini, Carl-Wery-Str. 42, D-81739 Munich, Germany
E. Simperl
Karlsruhe Institute of Technology, KIT-Campus Süd, D-76128 Karlsruhe, Germany
S. Wölger • S. Hangl
Semantic Technology Institute, ICT – Technologie Park Innsbruck, Technikerstraße 21a, A-6020 Innsbruck, Austria
P. Warren et al. (eds.), Context and Semantics for Knowledge Management, DOI 10.1007/978-3-642-19510-5_4, © Springer-Verlag Berlin Heidelberg 2011
The benefits of semantic technologies across business sectors and application
scenarios have been investigated at length in the Semantic Web literature of the past
decade (Davis et al. 2003; Fensel 2001; Hepp et al. 2008; McGuinness 2003).
Nevertheless, most discussions have been carried out at the technical level, and an
economic analysis of ontology-based applications in terms of their development,
deployment and maintenance costs – or any kind of tangible return value
assessment – has only recently been considered. In previous work we
have introduced a series of models that analyze and predict the costs and benefits
associated with the development of knowledge structures such as ontologies,
taxonomies and folksonomies, and of the semantic applications using them (Bürger
et al. 2010; Paslaru Bontas Simperl et al. 2006; Popov et al. 2009; Simperl et al.
2009b; Simperl et al. 2010). We will not introduce these models here, as they have
been described extensively in those publications, to which the reader is
referred. This chapter can be seen as a continuation of our work – while in
past publications we have focused on the definition of the models and their evalua-
tion, this chapter is concerned with the application of the models in enterprise
projects and provides guidelines – both scenario and tool-oriented – that assist
project managers in utilizing the models throughout the ontology life cycle.
The remainder of this chapter is structured in three parts. The first part gives
a brief overview of the results of a survey performed between October 2008 and
March 2009 of 148 ontology engineering projects from industry and academia. We
have analyzed the findings of this survey in order to identify how cost-related
information could be of use to alleviate some of the issues ontology engineering
is currently confronted with. The second part investigates the life cycle of
ontologies and ontology-based applications and presents 15 scenarios in which
cost-benefit information could support decision-making. The third part proposes
a number of methods and tools by which models such as ONTOCOM, FOLCOM or
SEMACOM can be applied in these scenarios in an operationalized manner.
4.2 The Impact of Cost-Benefit Information on Ontology Engineering Projects
In this section we briefly present the results of a six-month empirical survey
that collected data from 148 ontology engineering projects from industry and
academia in order to give an account of the current ontology engineering practice,
and the effort involved in these activities. Through its size and the range of the
subjects covered, the survey gives a comprehensive overview of the current state
of practice in ontology engineering as of 2009. A detailed report and analysis of
its findings has been published in (Simperl et al. 2009a). For the purpose of this
chapter, we will focus on those aspects which refer to activities in which decisions
could be influenced by the availability of reliable cost-benefit information.
The survey pointed out that the use of methodological support for developing
ontologies clearly varies from project to project. Concrete use of an ontology
engineering methodology was, for instance, observed in only one out of nine
projects. As far as improvements in ontology engineering are concerned, participants
suggested the following: project settings in which domain-analysis and evaluation
needs run high mandate domain-specific customizations of the generic
methodologies available. This is confirmed by our data analysis, which
indicates low tool support for these ontology-engineering activities, whereas high
tool support was shown to reduce development time considerably. Such
customizations might be particularly beneficial for very complex domains, for the
development of ontologies with broad coverage, and for those that involve
non-common-sense knowledge such as life sciences. A last issue to be highlighted, in
particular as more and more high-quality ontologies are becoming available, is
ontology reuse. Our survey showed that reuse is still not systematically performed,
and is often hampered by technical obstacles. As the adoption of ontology-based
technologies continues, it is likely that such scenarios will gain in relevance and
that significant efforts will need to be invested in revising existing ontology reuse
methods, techniques and tools to provide an adequate level of support
for non-technical users.1
With respect to process-related aspects we could observe discrepancies
between (1) the complexity of particular activities as perceived by ontology engi-
neering practitioners; (2) the significance of these activities as measured in terms
of their impact on the total development costs; and (3) the level of maturity
achieved at present by the R&D community with respect to methods and tools
supporting these activities. To further investigate these discrepancies, we analy-
zed the correlation between various ontology engineering aspects which are
covered by the survey.
1 The survey did not cover reuse efforts related to the usage of ontologies in the context of the
Linked Open Data (LOD) initiative, where a (relatively small) number of vocabularies is reused
through so-called ’interlinking’. With LOD acting as a real game-changer in the semantic-
technologies landscape, a new survey is required in order to fully understand the state-of-practice
of ontology engineering in such data-driven scenarios.
4 Using Cost-Benefit Information in Ontology Engineering Projects 63
4.2.1 Correlation between Ontology Engineering Aspects and Total Effort
Table 4.1 shows the correlation between ontology engineering aspects and the
actual project effort in person-months.2
DCPLX Out of the six positively correlated factors, domain analysis was shown
to have the highest impact on the total effort, achieving a significantly higher
correlation value than the other five activities. This reflects the time-
consuming nature of the knowledge-acquisition process, which was also confirmed
by comments received from the participants in the interviews, and by previous
surveys in the field. As our results point out, tool support for this activity was very
poor. Many interviewees questioned the utility of available tools, which were
perceived as too generic, especially when it came to ontologies developed for highly
specialized domains such as health care, or in projects relying on end-user
contributions. In addition, participants shared the view that process guidelines tailored
to such specialized cases are essential for the success of ontology engineering
projects. Existing methodologies are very generic when it comes to issues of
knowledge elicitation. They state the imperative need for close interaction between
domain experts and ontology engineers, but extensive studies on using techniques
such as concept maps, card sorting and laddering (Cooke 1994) are largely missing.
These particular techniques, complemented with detailed insights into the practices
established in the respective domains, could be very useful for designing specially
targeted methodologies and guidelines for ontology engineering.
Table 4.1 Correlation between ontology engineering aspects and effort
Ontology engineering aspect   Description                                                  Correlation
DCPLX                         Complexity of the domain analysis                             0.496
CCPLX                         Complexity of the ontology conceptualization                  0.237
ICPLX                         Complexity of the ontology implementation                     0.289
REUSE                         Percentage of integrated reused ontologies                    0.274
DOCU                          Complexity of the documentation task                          0.346
OE                            Ontology evaluation                                           0.362
OCAP/DECAP                    Capability of the ontologists/domain experts                 -0.321
OEXP/DEEXP                    Expertise of the ontologists/domain experts                  -0.192
PCON                          Personnel continuity                                         -0.134
LEXP/TEXP                     Level of experience with respect to languages and tools      -0.172
SITE                          Communication facilities in decentralized environments       -0.168
2 We use the cost-driver abbreviations defined in the ONTOCOM model (Popov et al. 2009).
A positive correlation means that if the value of one factor rises, the other rises as well; the
opposite holds for negatively correlated factors. For instance, if OCAP decreases, the effort is
expected to rise.
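To make the reading of Table 4.1 concrete, the following minimal sketch computes a correlation coefficient between a cost-driver rating and project effort. It assumes a Pearson coefficient; the chapter does not state which coefficient was used in the survey analysis. The ratings and efforts are invented for illustration and are not survey data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical data for five projects (NOT taken from the survey):
# ratings on a 1 (very low) to 5 (very high) scale, effort in person-months.
dcplx_ratings = [1, 2, 3, 4, 5]
ocap_ratings = [5, 4, 4, 2, 1]
effort_pm = [2.0, 3.5, 5.0, 9.0, 12.0]

r_dcplx = pearson(dcplx_ratings, effort_pm)  # rises with effort: positive
r_ocap = pearson(ocap_ratings, effort_pm)    # falls as effort rises: negative
```

With these invented numbers the domain-analysis rating is positively correlated with effort and the capability rating negatively, which mirrors the signs reported for DCPLX and OCAP/DECAP.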
OE The quality of the implemented ontology remains a major concern among
ontology engineers. Nevertheless, the projects we surveyed seldom used any of the
existing ontology-evaluation methods and techniques, but relied on expert
judgement. In projects in which systematic ontology-evaluation practices were observed,
they immediately had a significant impact on the effort. More than 50% of surveyed
projects reported minor effort in formally testing the ontologies they developed.
Another 48% reported fair use of simple testing methods, which were mostly carried
out manually. Only three projects performed extensive testing using several
methods. The survey indicated a combination of manual testing and self-validation
by the engineering team as the preferred and most common choice.
At this juncture, ontology evaluation plays a passive role in ontologies developed in
less formal project settings such as in academia. However, as ontology-evaluation
practices increase with the demand for quality assurance, the associated impact on
effort can be substantial.
DOCU Documentation proved to be a costly factor as well. The survey results
point out that most of the developers of highly-specialized ontologies perceived
documentation as a resource-intensive activity. This was not necessarily true for
less complex ontologies, and in cases where the development process was less
formal.
CCPLX, ICPLX The ontology conceptualization, which is responsible for the
modeling of the application domain in terms of ontological primitives (concepts,
relations, axioms), and the ontology implementation where the conceptual model
is formalized in a knowledge-representation language, are positively correlated
factors. However, their impact on the total effort is not as high as the one of the
domain analysis or the ontology evaluation. This outcome speaks for the relatively
good understanding of, and high-quality tool support for, these activities of the ontology
engineering process.
OCAP/DECAP, OEXP/DEEXP, LEXP/TEXP The impact of the personnel-
related aspects suggests that more training programs in the area of ontology
engineering, better collaboration support, and an improved, more fine-grained
documentation of the decisions taken during the ontology engineering process
may have positive effects.
SITE The data analysis produced counter-intuitive results for the SITE
parameter, which accounts for the degree of distribution of the team and their
communication and collaboration facilities. Here the analysis suggested that email
communication lowered the effort needed to build ontologies, while frequent
face-to-face meetings increased the effort significantly. A possible explanation is
that face-to-face meetings produced more divergent views on the
ontology and resulted in more discussions, which raise the costs of ontology
development.
The slight dominance of factors such as DCPLX (domain analysis) and OE
(ontology evaluation) indicates that any facilitation in these activities may result
in major efficiency gains. Even though tools such as wikis may be helpful, especially
in collaborative settings, they are still rarely used for the purpose of domain
analysis. More generally, the results of the interviews indicated low tool support
for this task. This situation could be improved by applying methods such
as automated document-analysis, or ontology learning approaches, to support the
analysis of the domain, the assessment of the information sources available and the
knowledge-elicitation process. Extending existing methodologies with specific
empirically determined practices in place in particular vertical domains could
also have a positive effect. A similar conclusion can be drawn for ontology
evaluation. Despite the availability of automated approaches such as unit tests,
consistency checks or taxonomy-cleaning techniques, ontology evaluation still
lacks tools which are easy to use and comprehensible for most users.
Concluding the correlation analysis between ontology engineering aspects and
effort, we can state that process activities such as domain analysis, conceptuali-
zation, implementation and evaluation, as well as the level of reusability of
the ontology, and the documentation requirements have a well-distributed correla-
tion factor associated with the effort. This means that each of these activities
exhibits a relevant impact on the effort, while at the same time indicating that no
individual activity plays an overwhelmingly dominant role. As expected, the
quality of the ontology engineering team is crucial for the success of a project;
it would be interesting to investigate, however, the effect of such aspects in more
collaborative scenarios, which could become the norm in ontology engineering.
The data set on which this analysis is based is not representative of highly
decentralized scenarios of community-driven ontology engineering. More research is
needed to assess the state of the art in the area of ontology reuse and associated
activities such as ontology understanding, merging, and integration. In this respect
the survey is not representative and should be revisited once this engineering
approach gains more importance, for instance, as a consequence of the wide scale
development of a critical mass of ontologies in diverse vertical sectors. An addi-
tional, equally interesting scenario is related to the increased popularity of data-
driven, LOD-oriented approaches, which reference vocabularies already in use in
the LOD cloud. Reuse is defined not necessarily at the level of ontologies to be
assessed, selected, modified and integrated, but in terms of data sets and mappings
between them. Given a clear understanding of these new concepts, the costs
associated with such knowledge engineering exercises, and models which poten-
tially apply to the new scenarios, require further investigation.
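The ranking discussed in this section can be reproduced directly from the values in Table 4.1. The sketch below copies the correlation values from the table (negative signs as explained in footnote 2) and orders the factors by the strength of their correlation with effort; the function and variable names are our own illustrative choices.

```python
# Correlation with total effort, copied from Table 4.1.
CORRELATION_WITH_EFFORT = {
    "DCPLX": 0.496,
    "CCPLX": 0.237,
    "ICPLX": 0.289,
    "REUSE": 0.274,
    "DOCU": 0.346,
    "OE": 0.362,
    "OCAP/DECAP": -0.321,
    "OEXP/DEEXP": -0.192,
    "PCON": -0.134,
    "LEXP/TEXP": -0.172,
    "SITE": -0.168,
}


def rank_by_strength(correlations):
    """Sort (factor, correlation) pairs by absolute correlation, strongest first."""
    return sorted(correlations.items(), key=lambda item: abs(item[1]), reverse=True)


ranking = rank_by_strength(CORRELATION_WITH_EFFORT)

# Factors whose increase goes along with higher effort, and those that go
# along with lower effort, each ordered by strength.
effort_raising = [name for name, r in ranking if r > 0]
effort_lowering = [name for name, r in ranking if r < 0]
```

The two strongest effort-raising factors in this ordering are DCPLX and OE, in line with the discussion above, and OCAP/DECAP is the strongest effort-lowering factor.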
4.2.2 Correlation between Ontology Engineering Aspects
In addition to the impact on the total development effort we analyzed the correla-
tion between specific aspects of ontology engineering projects. The most important
findings are discussed in the following.
Personnel-related aspects (Table 4.2) were shown to be positively correlated.
This was obvious for those questions referring to the capability and experience of
the ontology-engineering team. In most cases the survey showed that the capability
of the participants was largely based on their project experience. Additionally, the
software support available to projects carried out by the same ontology-engineering
team tended to remain unchanged. When new tools were introduced, the learning
period for experienced practitioners was much longer than for novice developers.
Similar observations were made in software engineering, in which habits of
software use have a significant influence on acceptance and adoption of new
software.
High correlation values were also measured between activities within the
ontology engineering process (Table 4.3), in particular between ontology
evaluation and ontology documentation. Data analysis showed that
these results mainly apply to large-scale ontology engineering projects. This is
possibly due to the fact that such ontology-development projects run more exten-
sive evaluation tests, which in turn might lead to additional documentation effort.
Domain analysis was most highly correlated with conceptualization and
implementation. The majority of the interviewees did not perceive a clear-cut distinction between the
conceptualization and the implementation activities. Conceptualization in most
cases was a lightweight description and classification of the expected outcomes.
In most of the projects surveyed there was no language- or tool-independent
representation of the ontology. Instead, the ontology was implemented with the
help of an ontology editor. In over 40% of the projects the development was
performed mainly by domain experts, who generally agreed that current ontology
editors are relatively easy to learn and utilize. This finding differs from the
observations of previous surveys and comparative studies, and confirms once
more that ontology engineering has reached an industry-strength level of
maturity.
According to this survey, the main technological building
blocks and development platforms for ontology engineering are by now available from established
vendors. Despite this promising position, the information known about the process
underlying the development of ontologies in practice is still very limited. Cost-
benefit information might lead to improvements along the lines we have argued
above. We have used the findings of the correlation analysis to identify scenarios in
which ontology engineering processes can be improved by paying proper attention
to cost-benefit information delivered through models developed in the ACTIVE
project.
Table 4.2 Correlation between personnel-related aspects
              OCAP/DECAP   OEXP/DEEXP
OCAP/DECAP    1            0.552
OEXP/DEEXP    0.552        1
LEXP/TEXP     0.489        0.584
Table 4.3 Correlation between process-related aspects
      DCPLX   DOCU
OE    0.211   0.389
4.3 Guidelines for Using Cost-Benefit Information in Ontology Engineering
Starting with the findings of the survey discussed in the previous section, we have
performed an analysis of the ontology life cycle with respect to the utility of cost-
benefit information for decision-making purposes.
In order to provide measurements of specific key indicators throughout the
ontology life cycle, clear goals for improvement have to be defined. This is crucial
not only for the selection of the relevant indicators, but also for the definition of
associated methods and tools, and for directing the acquired information to the right
recipient. This can be achieved, for instance, through the GQM
(‘goal-question-metric’) approach (Ebert and Dumke 2007). GQM identifies suitable
metrics in three steps: (1) defining the goals to reach (i.e., business objectives,
key performance indicators, improvement goals); (2) reviewing the goals by
asking appropriate questions about how to reach them (i.e., improvement programs,
change management, project management techniques); and (3) defining
measurements to assess the current status (i.e., to review the work of employees,
and the development status of products or processes) (Ebert and Dumke 2007). To
identify how the availability of cost-benefit information could impact the operation
of ontology engineering projects, we will develop GQM templates for a series of
scenarios and activities, in which such information is expected to provide an
improved baseline for decision making.
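The three GQM steps can be represented as a small goal, question and metric hierarchy. The following is a minimal sketch under our own assumptions: the class names, field names and the example goal are hypothetical and are not prescribed by GQM or by (Ebert and Dumke 2007).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Metric:
    name: str
    unit: str


@dataclass
class Question:
    text: str
    metrics: List[Metric] = field(default_factory=list)


@dataclass
class Goal:
    statement: str
    questions: List[Question] = field(default_factory=list)

    def all_metrics(self) -> List[Metric]:
        """All metrics reachable from this goal, in question order."""
        return [m for q in self.questions for m in q.metrics]


# Hypothetical example content (illustration only, not from the survey).
goal = Goal(
    statement="Keep ontology development effort within the planned budget",
    questions=[
        Question(
            text="How much effort has each process activity consumed so far?",
            metrics=[Metric("effort per activity", "person-months")],
        ),
        Question(
            text="How far is the ontology from its planned size?",
            metrics=[
                Metric("implemented ontology entities", "count"),
                Metric("planned ontology entities", "count"),
            ],
        ),
    ],
)

metric_names = [m.name for m in goal.all_metrics()]
```

Keeping metrics attached to the question they answer makes it easy to check that every collected measurement can be traced back to a stated goal.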
4.3.1 Using Cost-Benefit Information Throughout the Ontology Life Cycle
From an enterprise perspective the ontology life cycle – just as any other product
life cycle – consists of four phases: (1) business case and strategy development,
(2) project and concept definition, (3) development and market entry, and
(4) maintenance and evolution. In each of these phases cost-benefit information is
typically used to support decision making (Ebert et al. 2005):
Business case and strategy development Cost-benefit information might be
used to support a trade-off analysis according to (development) time, costs, content,
benefit, or return on investment (ROI).
Project and concept definition Here this information is typically used to make
effort estimates, to develop budget plans, and to support buy vs reuse decisions.
Development and market entry Cost-benefit information might be used to
support trade-off analysis and to determine the costs to completion. Additionally
it might be used to influence the development process, staffing and tool acquisition,
and to perform progress controls.
Maintenance and evolution In this phase one could assess the costs and benefits
of extensions in order to decide between repair and replacement, and to estimate
maintenance costs.
We can distinguish between a number of recipients of this information. This
includes the senior management of the company, the project management team, and
the engineers. They use cost-benefit information as follows (Ebert and Dumke
2007):
Senior Management They use the information
– To obtain a reliable view of the business performance;
– For forecasts and indicators where action is needed;
– To drill down into underlying information and commitments; and
– For flexible resource refocus.
Project Management Project managers appreciate cost-benefit information
– For immediate project reviews;
– To check the status and forecast for quality, schedule, and budget;
– To identify follow-up action points; and
– To generate reports based on consistent raw data.
Engineers During the development phase engineers rely on cost-benefit
information
– To gain immediate access to team planning and progress insights;
– To get an overview of their own performance and how it can be improved; and
– To identify weak spots in deliverables.
4.3.2 Scenarios Supported by Cost-Benefit Information
This section defines scenarios in which cost-benefit information can help to support
decisions in knowledge engineering projects. The scenarios will be further detailed
in Sect. 4.3.3.
Supporting the Business Case Development Phase In this phase, cost-benefit
information will mostly be used by the senior management planning a new business
or the introduction of a new technology. The information has to be based on data
from historical projects in order to be able to derive estimations based on experi-
ences from previous endeavors. In some cases, experience reports or analytical
insights may prove useful as well (Bürger and Simperl 2008), but the ultimate goal
is to have an objective baseline to decide for or against the introduction of semantic
technologies at a general level.
Scenario 1.1 Introduction of an ontology-based application Cost-benefit
information can be used to support the decision whether or not an ontology-based
application should be introduced and/or developed inside a company.
Scenario 1.2 Development of certain features Here information about the costs
and benefits of ontology engineering can be used to assess the economic
implications of specific features to be introduced in an application, or to define the scope
of an ontology to be developed.
Scenario 1.3 Expressivity/Type of knowledge structure Based on a set
of initial requirements, insights about the efforts and added value associated
with the development and maintenance of specific knowledge structures can be
used to decide upon the most appropriate type of knowledge structure, in particular
about the required expressivity, to support a certain business plan. The aim is to
perform a trade-off analysis between costs and benefits of the use of ontologies,
taxonomies, folksonomies, and other knowledge structures, using as input the
results delivered by methods such as ONTOCOM (Simperl et al. 2009b) or
FOLCOM (Simperl et al. 2010).
Supporting the Project Definition Phase In this phase, the information will be
used by the project management but also by the senior management team. Relevant
decisions include the overall planning of a project, but also rather specific issues
such as buy vs reuse vs in-house development. Decisions should be backed up with
detailed requirements in order to be able to provide accurate estimates.
Scenario 2.1 Effort planning Cost information can foremost be used to estimate
the effort needed to realize a project.
Scenario 2.2 Project planning Cost information can support project planning
and help to estimate the length of certain project phases.
Scenario 2.3 Team building Both cost and benefit measurements can be used
to assess the performance of team members with certain skill levels, and, provided
the availability of such measurements from previous projects, the information
can be used to support the initial assignment of staff to project teams.
Scenario 2.4 Tool acquisition Historical cost-benefit information typically
makes it possible to judge the efficiency of the team in different project phases based on the
tools used. This information can be used in the planning phase of new projects to
decide about tools to use or purchase.
Scenario 2.5 Develop vs buy vs reuse Based on methods such as ONTOCOM,
associated with a deep knowledge of the existing ontologies that are relevant for the
project at hand, the project management decides whether to buy, reuse, or develop
an ontology from scratch.
Supporting the Development Phase Cost-benefit information can be used
to influence decisions in iteratively organized processes, in which, for instance,
the development of ontologies undergoes certain revisions, which is common for
collaborative ontology engineering. During the planning of a new iteration, differ-
ent development options can be assessed against their economic feasibility in
terms of expected costs or expected benefits. In non-iterative projects, the same
information can be used to assess the performance of the development process and
to make necessary adjustments. This can be based either on cost information or
on benefit information indicating that certain features do not receive the
necessary acceptance among a preliminary user base.
Scenario 3.1 Length of development cycles Cost-benefit information can be
used to adapt the length of revision cycles as a trade-off between the benefit of an
ontology which might not meet the requirements of all its stakeholders and the costs
of its development.
Scenario 3.2 Performance assessment Monitored cost-benefit information can
be used for performance assessment of the team, or of the status of certain features
during project iterations.
Scenario 3.3 Revision planning Information from past iterations can be used to
do just-in-time revision planning and to revise priorities, to re-assign team members
to tasks, or to discontinue the development of certain features.
Supporting the Maintenance Phase The scenarios introduced so far reflect
needs arising at the beginning of the maintenance phase, but also during the use
phase of a developed ontology, or ontology-based application. Monitored benefit
information can be used to support extension planning, i.e., to judge what type of
benefit might be expected from an extension. Benefit information can also be used
to decide on the discontinuation of certain application features. Cost information
might most notably be used to decide about repairing or replacing broken features
by newly acquired or developed parts of an ontology or ontology-based application.
Scenario 4.1 Extension planning Information acquired in the use and main-
tenance phase of a project can support the planning of extensions into directions in
which more benefits are expected.
Scenario 4.2 Replace vs repair Information on development performance can
be used to judge whether to replace or repair a broken or insufficiently developed
feature.
Scenario 4.3 Maintenance planning Information acquired in previous projects
can help to estimate maintenance costs and to establish staffing plans for this phase.
Scenario 4.4 (Dis-)continuation of features Especially benefit information
acquired in the usage phase can provide valuable hints on the acceptance of certain
features.
We now turn to the presentation of guidelines supporting these scenarios.
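For orientation, the scenarios introduced in this section can be collected in a simple lookup structure keyed by life-cycle phase. This is only a sketch: the phase labels reuse the four phases of Sect. 4.3.1, the scenario titles are copied from the text above, and the helper name phase_of is our own.

```python
# Scenario catalogue of Sect. 4.3.2, grouped by the life-cycle phases of Sect. 4.3.1.
SCENARIOS = {
    "Business case and strategy development": {
        "1.1": "Introduction of an ontology-based application",
        "1.2": "Development of certain features",
        "1.3": "Expressivity/Type of knowledge structure",
    },
    "Project and concept definition": {
        "2.1": "Effort planning",
        "2.2": "Project planning",
        "2.3": "Team building",
        "2.4": "Tool acquisition",
        "2.5": "Develop vs buy vs reuse",
    },
    "Development and market entry": {
        "3.1": "Length of development cycles",
        "3.2": "Performance assessment",
        "3.3": "Revision planning",
    },
    "Maintenance and evolution": {
        "4.1": "Extension planning",
        "4.2": "Replace vs repair",
        "4.3": "Maintenance planning",
        "4.4": "(Dis-)continuation of features",
    },
}


def phase_of(scenario_id):
    """Return the life-cycle phase a scenario identifier such as '2.4' belongs to."""
    for phase, scenarios in SCENARIOS.items():
        if scenario_id in scenarios:
            return phase
    raise KeyError(scenario_id)


total_scenarios = sum(len(scenarios) for scenarios in SCENARIOS.values())
```

Counting the entries gives the 15 scenarios for which guidelines are presented in Sect. 4.3.3.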
4.3.3 Guidelines for the Realization of the Scenarios
The guidelines will be presented in the form of GQM templates for each of the 15
scenarios, together with an analysis of the cost models developed in the ACTIVE
project with respect to their suitability for the realization of the scenarios and
potential adjustments and extensions.
Scenario 1.1 Introduction of an Ontology-based Application As explained in
the previous section, the purpose of this scenario is to provide methods and tools to
support the introduction of ontology-based applications. This decision might best
be supported using benefit information on semantic technologies in the context of
a concrete business case or application.
In the following we provide a GQM template to grasp the problem at hand
(Table 4.4) and analyze the suitability of ACTIVE cost-benefit models to support
this scenario.
Two models are appropriate to support this scenario. First, the ONTOBEN
model can be used, as illustrated in (Bürger et al. 2010), to calculate savings
based on the use of ontologies. Here the business case has to be analyzed, necessary
functionalities have to be identified and the benefit of ontology-based applications
to realize the use case at hand has to be highlighted. At the end, the model produces
a quantitative figure reflecting potential savings. Second, the ONTOLOGY-UIS
model can be used as a source of information on end user satisfaction, which can be
interpreted as an indicator of the efficiency of ontology-based applications (Bürger
and Simperl 2008). To use this model, important features have to be identified
and their performance has to be assessed. In order to provide decision support for
this concrete scenario, benefit information from previous applications has to be
gathered and used to argue for or against features for similar applications in the
concrete business case based on analogy.
Scenario 1.2 Development of Certain Features The purpose of this scenario is
to provide arguments for or against the development or introduction of certain
features in ontologies or ontology-based applications. The associated GQM tem-
plate is illustrated in Table 4.5.
Two models of the ACTIVE cost-benefit framework are appropriate to support
this scenario. The first is SEMACOM, which assesses the costs of semantic applications.
It can be used to produce estimates of the prospective costs necessary for the
Table 4.4 GQM template for scenario 1.1
Object of study   A concrete business case for which technological possibilities are being discussed
Purpose           To judge whether or not the use of ontologies or ontology-based applications makes sense in this case
Focus             To identify and highlight potential business benefits of using ontologies or ontology-based technologies in this case. This might be potential savings or positive user acceptance values
Stakeholder       A business case is typically discussed on a senior management level
Context factors   Factors which influence this scenario are the requirements of the business case, available development options, as well as restrictions in the company
Table 4.5 GQM template for scenario 1.2
Object of study   A concrete application feature that shall be part of a potentially new or existing ontology-based product
Purpose           To judge whether or not to implement a certain feature
Focus             The focus is on the identification of a potential business or end user benefit arising through this feature and to estimate costs needed to develop such features
Stakeholder       Senior management
Context factors   Factors which influence this scenario are the requirements of the business case, information on end user satisfaction of the current application and on the performance of similar features
realization of a semantic application. Provided that the granularity of the available
historical data for SEMACOM is sufficient, analogies could be drawn to judge
(isolated) application features and to highlight their prospective costs and benefits.
This, however, requires the availability of historical project data, which
ideally originates from the company in which the estimations are to be made.
Furthermore, the ONTOLOGY-UIS model can be used as an indicator for the
efficiency of similar features implemented in other applications, as already indicated in
the previous sections (Bürger and Simperl 2008). Again, this assumes that historical
project data is available.
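Drawing analogies from historical data, as described above, can be illustrated with a generic nearest-neighbour estimate. This sketch is not SEMACOM or ONTOLOGY-UIS; it only shows the idea of estimating the cost of a new feature from the most similar features of past projects. The attributes (size, complexity), the records and the function name are all hypothetical.

```python
from math import sqrt

# Hypothetical historical feature records (NOT real project data):
# size and complexity are ratings on a 1 to 5 scale, cost is in person-months.
HISTORY = [
    {"name": "tag-based browsing", "size": 1, "complexity": 1, "cost_pm": 2.0},
    {"name": "semantic search", "size": 3, "complexity": 4, "cost_pm": 6.0},
    {"name": "rule-based recommendation", "size": 4, "complexity": 4, "cost_pm": 8.0},
    {"name": "bulk metadata import", "size": 5, "complexity": 2, "cost_pm": 5.0},
]


def estimate_by_analogy(target, history, k=2):
    """Mean cost of the k historical features most similar to the target.

    Similarity is the Euclidean distance over the 'size' and 'complexity' ratings.
    """
    if not history or k < 1:
        raise ValueError("need at least one historical record and k >= 1")

    def distance(record):
        return sqrt(
            (record["size"] - target["size"]) ** 2
            + (record["complexity"] - target["complexity"]) ** 2
        )

    nearest = sorted(history, key=distance)[:k]
    return sum(record["cost_pm"] for record in nearest) / len(nearest)


new_feature = {"size": 3, "complexity": 5}
estimated_cost_pm = estimate_by_analogy(new_feature, HISTORY, k=2)
```

Here the two closest historical features are the semantic search and the rule-based recommendation, so the estimate is the mean of their costs. The quality of such an estimate depends entirely on how fine-grained and how comparable the historical records are, which is the caveat raised in the text.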
Scenario 1.3 Expressivity and Type of Knowledge Structure The goal of this
scenario, which is further detailed in the GQM template available in Table 4.6, is
to provide tools that are easy to use and understand in order to compare existing
knowledge structures in terms of their prospective costs based on a number of
predefined parameters such as size, distribution of the project team, or complexity
of a domain.
A number of ACTIVE cost-benefit models can be used to estimate the costs
needed to develop an ontology (ONTOCOM) or a folksonomy (FOLCOM) and
thus to support this scenario. Support for this scenario, of course, requires
calibrated cost models in order to deliver the
quantitative data needed to decide for or against a specific type of knowledge structure.
In addition, the ONTOLOGY-UIS model provides quantitative figures on the
efficiency of applications based on the different types of knowledge structures
used (Bürger and Simperl 2008), which can be used as arguments to inform the
decision at hand. To be able to apply ONTOLOGY-UIS reliably, historical data on
the perceived benefits of ontology-based applications that make use of different
types of knowledge structures has to be available.
Scenario 2.1 Effort Planning In the effort planning scenario, the project
management team needs estimates of the effort needed to realize a project. This
scenario, which is detailed in the GQM template in Table 4.7, can be supported
using a number of ACTIVE cost models.
All methods targeting various types of knowledge structures developed through-
out ACTIVE are relevant for this scenario. In addition, SEMACOM can be applied
to estimate the costs needed for the development of a semantic application. In order
to use the models for that purpose, historical project data for the calibration of the
models is assumed to be available. If this is the case, initial requirements for the
Table 4.6 GQM template for scenario 1.3
Object of study A concrete business case in which the use of knowledge structures has already
been decided and for which the type of the structure still needs to be
discussed
Purpose To judge which type of knowledge structure is more appropriate for the current
business case
Focus The focus here lies on influencing this decision based on cost benefit information
Stakeholder Senior management
Context factors Past experiences in using specific types of knowledge structures
4 Using Cost-Benefit Information in Ontology Engineering Projects 73
ontology or ontology-based application can be used as input for the corresponding
model to generate effort estimates.
Scenario 2.2 Project Planning In the project planning scenario in Table 4.8 the
project-management team would like to generate estimates for the overall development
effort expected to be necessary for the development of an ontology, or
ontology-based application. The estimates should be split according to the phases
of the development life cycle.
In order to realize this scenario, and to use the mentioned models for project
planning, a number of pre-requisites have to hold. First of all, ONTOCOM and
SEMACOM would have to be adapted to the concrete methodologies followed in
the enterprise (Paslaru Bontas and Tempich 2005). More importantly, historical
project data, which is typically needed for the calibration of models, has to be
collected with a finer granularity than in Paslaru Bontas et al. (2006) and Simperl et al.
(2009b), which were concerned with effort estimates at the project level rather than
at the level of individual life cycle phases. An alternative approach could be the
distribution of the overall estimates, calculated using the existing models, across the
phases, as it was done for the inclusion of ONTOCOM information in the gOntt tool
(see Sect. 4.2).
Scenario 2.3 Team Building Here the aim is to support staff assignment to
project development teams. In order to realize this scenario ONTOCOM and
SEMACOM can be used to generate estimates that can be adjusted based on
different input factors such as team or tool experience. Based on these factors in
concrete projects, optimal team member profiles can be determined that minimize
the effort needed to develop an ontology or ontology-based application.
As usual, calibrated models have to be available to realize this scenario. If team
members are to be assigned to distinct phases, the calibration of the model should be
Table 4.8 GQM template for scenario 2.2
Object of study A concrete project which shall be implemented
Purpose Estimate the length (and effort required) of development phases in the project
plan
Focus The focus here lies on splitting and distributing of the overall effort along
distinct project phases
Stakeholder Project management
Context factors Concrete development methodology and information on distinct phases.
Furthermore the application requirements of course influence the cost
estimates and thus the splitting as well
Table 4.7 GQM template for scenario 2.1
Object of study A concrete project which shall be implemented
Purpose Estimate the effort required to develop the knowledge structure inside the project
Focus The focus here lies on the costs of knowledge structures to be used
Stakeholder Project management
Context factors Application requirements, project team, domain, development environment (tool,
distributiveness), etc
adjusted accordingly, so values of the personnel-related cost drivers can be col-
lected at the level of the individual phases. So far, data collection has been at the
level of the overall project, and no lower-level information is taken into account
(Table 4.9).
Scenario 2.4 Tool Acquisition In the project planning phase a number of
decisions have to be taken which influence the project environment later on. This
includes the acquisition of tools that support one or more development phases. The
aim of Scenario 2.4 is to be able to argue for or against the acquisition of specific
tools based on empirical cost-benefit information (Table 4.10).
Cost estimates generated with ONTOCOM are influenced by the expected tool
support, and thus allow reasoning about the impact of certain types of tools on the
estimated development effort. In order to provide statements on the performance
and usefulness of concrete tools, the model has to be fed with more detailed data.
Otherwise it can only provide information on the influence of well-performing or
poorly performing tools on the effort needed to conduct the whole development project.
Statements on specific tools are thus not possible. Besides gathering data on how
well the development process was supported by specific tools, the names of the
specific tools have to be collected as well. This information is not available in the
data set described in (Simperl et al. 2009b).
Scenario 2.5 Develop vs Buy vs Reuse Devising the most adequate engineering
strategy is based on both technical and economic matters. Scenario 2.5 (Table 4.11)
covers the latter.
A model which has been tailored to support this type of scenario is
ONTOCOM-R, which is an extension of the original ONTOCOM model towards
ontology reuse (Simperl and Bürger 2010).
The ONTOCOM-LITE and the SEMACOM models, which account for the costs
of taxonomies and semantic applications, respectively, would have to be extended
in a similar direction, taking proper account of the characteristics of the reuse
Table 4.9 GQM template for scenario 2.3
Object of study A concrete project which shall be implemented
Purpose Plan and optimize the allocation of team members to the project or distinct
phases
Focus The focus here lies on identifying optimal team member profiles in order to
realize the project in a given time frame
Stakeholder Project management
Context factors Concrete development methodology and project requirements
Table 4.10 GQM template for scenario 2.4
Object of study A concrete project which shall be implemented
Purpose To argue for or against the acquisition of a certain tool
Focus Requirements of the ontology to be developed with respect to tools
Stakeholder Project management
Context factors Project requirements, previous experiences with tools
process. In order to use ONTOCOM-R reliably in this scenario, a number
of issues would need to be tackled. First of all, considerably more
data on the potentially reusable ontologies has to be available. This would result
in a better calibration of the REUSE cost driver, which accounts for the additional
effort required to build ontologies that should be reusable in contexts other
than the one they were created for, and of all other cost drivers capturing aspects
related to ontology engineering by reuse. Furthermore, the reuse
candidates have to be analyzed with respect to the need to translate or align parts of the
reusable ontology. More information about the application of ONTOCOM-R is
provided in Simperl and Bürger (2010).
Scenario 3.1 Length of Development Cycles The purpose of Scenario 3.1
(Table 4.12) is to estimate the length of development cycles in an iterative project
based on historical data. Such data could provide insights on the trade-offs between
centralized, commonly agreed development, which represents a compromise view
over the domain shared by all stakeholders, and localized, modified versions of the
ontology, which have to be mediated through alignments.
If the project is run in an agile fashion, then the scenario could be realized using
the FOLCOM model, which is by design targeted at such environments. For the
remaining models, a number of adjustments have to be made, triggered by the
slightly different process model followed. In Paslaru Bontas and Tempich (2005)
we provide a detailed example of what such an adjustment could look like. Existing
data and calibration results remain valid, as the individual iterations still match the
core work breakdown structure underlying ONTOCOM.
Scenario 3.2 Performance Assessment The purpose of Scenario 3.2 (Table 4.13)
is to assess the performance of team members at specific points in time during the
development of an ontology or ontology-based application.
Both ONTOCOM and FOLCOM can be used for this purpose if data on the costs of
development phases is gathered while the project is running. Within the FOLCOM
model this requirement is fulfilled by design, which means that the model is supposed
to be used to estimate the effort needed for future iterations, which can then be compared
to performance information that was monitored. For ONTOCOM one would
have to carry out the adjustments discussed in the previous scenarios.
Scenario 3.3 Revision Planning In order to plan the next development iteration,
cost-benefit information might be useful as a performance indicator. Thus it can be
Table 4.11 GQM template for scenario 2.5
Object of study A concrete (part of a) project which shall be implemented, e.g., an ontology
module
Purpose To assess whether to develop an ontology or a module thereof from scratch
or to reuse existing components
Focus The focus lies on the make vs. reuse decision for a concrete part of the
ontology
Stakeholder Project management
Context factors Application requirements, definition of the ontology module, profile of
potentially reusable modules or ontologies
considered as a valuable source for planning forthcoming iterations in development
cycles of a project, which is the purpose of Scenario 3.3 (Table 4.14).
To be used in this scenario, ONTOCOM would have to be adapted to an
(iterative and collaborative) methodology (Paslaru Bontas and Tempich 2005).
Furthermore, information on the reusability and integration effort needed to consolidate
revisions is required. FOLCOM, on the other hand, is usable for
this purpose by design, as discussed earlier.
Scenario 4.1 Extension Planning The purpose of Scenario 4.1 (Table 4.15) is to
assess the feasibility of the integration of new features into the product (ontology or
ontology-based application) based on cost-benefit information.
To realize this scenario all models of the ONTOCOM framework, meaning
ONTOCOM, ONTOCOM-LITE and SEMACOM can be used to estimate the
costs of a planned extension, provided the planned extension is perceived as a
new development including reuse of an existing ontology. ONTOLOGY-UIS can
be used to assess the current performance and, based on that, can support decisions
on whether new features are required, or would be beneficial for an application. In
addition, it might provide cues on missing parts of the ontology, if configured
accordingly. In order to use the model an appropriate survey infrastructure, includ-
ing questionnaires and statistical techniques to analyze the results, has to be set up.
Scenario 4.2 Replace vs Repair In this scenario project managers need to
decide whether to replace or repair (parts of) an ontology or ontology-based
application based on cost-benefit information (Table 4.16). This information
is delivered by the ONTOCOM-R model, which ideally would be calibrated with
enterprise-specific data on concrete ontology reuse practices in order to provide
optimal results (Simperl and Bürger 2010).
Table 4.12 GQM template for scenario 3.1
Object of study A currently running development project
Purpose To estimate the length of development cycles based on historical
data (from the same project)
Focus The focus lies on an estimation of future project cycles at
the beginning or in the middle of a running project
Stakeholder Project management/ developers
Context factors Application requirements, team performance indicators, completed
project iterations
Table 4.13 GQM template for scenario 3.2
Object of study A currently running development project
Purpose To assess the performance of team members
Focus The focus lies on past and current development phases
Stakeholder Project management/ developers
Context factors Other projects currently running in the company might (negatively or
positively) influence the performance of team members. This aspect is
typically not assessable
Scenario 4.3 Maintenance Planning Each of our models could be used to
support this scenario if extended with cost drivers specific to the maintenance
activity. At the conceptual level this extension can be implemented in a straightforward
manner, by introducing an additional cost driver accounting for the extra
effort, but the challenge lies in acquiring the empirical data which would be
necessary to re-calibrate the model (Bürger et al. 2010). The GQM template for this
scenario is given in Table 4.17.
Scenario 4.4 (Dis-)continuation of Features During the use of a product, it
may turn out that a certain feature does not receive the necessary user support,
or is not functioning as desired. The purpose of Scenario 4.4 is to assess the performance
of such features during or after their development based on cost-benefit
information (Table 4.18).
The ONTOLOGY-UIS model assesses the performance of ontology-based
applications in order to isolate the benefits that are generated by ontologies or
particular features of an application. Given that the performance of specific features
can be monitored in isolation, ONTOLOGY-UIS can be applied to decide on the
(dis-)continuation of features. In order to do so, one would have to design the
questionnaires necessary for assessing the performance of the application
using the ontology and its features.
Table 4.14 GQM template for scenario 3.3
Object of study A currently running development project
Purpose To plan a next iteration in development cycles of a project
Focus The focus lies on estimating the (optimal) length of revision cycles
Stakeholder Project management, development team
Context factors Team structure and distribution, initial size of the ontology, etc
Table 4.15 GQM template for scenario 4.1
Object of study A completed product (ontology)
Purpose To assess the feasibility of the integration of new features into the product
(ontology or ontology-based application)
Focus New features to be integrated
Stakeholder Senior management, project management
Context factors Performance of the application, available budget
Table 4.16 GQM template for scenario 4.2
Object of study A previously developed ontology or ontology-based application
Purpose To decide whether to replace or to repair (parts of) an ontology or
ontology-based application
Focus A broken (part of an) ontology or ontology-based application.
Stakeholder Senior management, project management
Context factors Existing budget, available ontologies, development team factors, etc
4.4 Methods and Tools for Cost/Benefit Driven Collaborative Knowledge Creation
In this section we introduce a number of tools to support the realization of the
scenarios presented in Sect. 4.3. The tools target all stakeholders in the life cycle of
collaboratively developed ontologies and ontology-based applications.
4.4.1 Cost Estimation of Ontology Development
We have developed a tool which can be used to run the ONTOCOM estimation
process, and the calibration of the model using (company-internal) data. The tool
provides an easy-to-use user interface for running calibrations and predictions
(Fig. 4.1). The calibration can utilize the existing data set of 148 data points, self-
owned data, or a combination of both. Furthermore, users are able to customize the
model, selecting the cost drivers to be taken into account for the calibration or
adding new cost drivers. The latter is relevant, for instance, in scenarios explicitly
targeting iterative engineering.
Once the model is calibrated, project managers and engineers can use the model
to calculate effort estimates. To do so, they have to indicate the rating levels of the
cost drivers which best fit the project circumstances and provide an estimate of the
size of the ontology (Fig. 4.2). The rating levels are associated with numerical,
calibrated values, which serve as input to the ONTOCOM formula. The result is
expressed in person-months.
The tool can support scenarios 1.3, 2.1, 2.3, 2.4, and 4.1.
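The prediction step described above can be illustrated with a minimal sketch. It assumes a COCOMO-style parametric form, in which the effort in person-months is a constant multiplied by the (possibly exponentiated) size and by the product of the calibrated cost-driver values. The constants `a` and `alpha`, the cost-driver names and all numerical values below are hypothetical and chosen for illustration only; they are not the calibrated values of the ONTOCOM tool.

```python
from math import prod


def estimate_effort(size_kilo_entities, cost_driver_values, a=2.0, alpha=1.0):
    """Return an effort estimate in person-months.

    Assumed form: a * size**alpha * product of the cost-driver multipliers.
    `a` and `alpha` stand in for calibration constants (hypothetical values).
    """
    if size_kilo_entities <= 0:
        raise ValueError("size must be positive")
    return a * size_kilo_entities ** alpha * prod(cost_driver_values.values())


# Hypothetical multipliers associated with the rating levels a user selected.
drivers = {
    "domain_complexity": 1.2,  # rated above nominal, so effort increases
    "team_experience": 0.9,    # experienced team, so effort decreases
    "tool_support": 1.0,       # nominal rating, no effect
}

# Size is assumed to be given in thousands of ontology entities.
effort = estimate_effort(size_kilo_entities=1.5, cost_driver_values=drivers)
print(f"Estimated effort: {effort:.2f} person-months")
```

A multiplier of 1.0 leaves the estimate unchanged, which is why customizing the model by adding or dropping cost drivers only changes the set of factors in the product.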
4.4.2 Planning of Ontology Development
gOntt is a project planning tool for ontology-development projects inspired by
software-engineering tools such as Microsoft Project (Gomez-Perez et al. 2009)
Table 4.17 GQM template for scenario 4.3
Object of study Currently developed ontology or planned ontology
Purpose To estimate the costs needed in the maintenance phase.
Focus Costs occurring while the ontology/ ontology-based application is in use.
Stakeholder Senior management, project management
Context factors Underlying development requirements, maintenance team
Table 4.18 GQM template for scenario 4.4
Object of study Ontology-based application
Purpose To assess the performance of certain features during or after
their development.
Focus Particular application features which are based on an ontology.
Stakeholder Senior management, project management
Context factors User perception, influencing factors of the application
(http://www.neon-toolkit.org/wiki/Gontt). We are extending it with cost information in
order to facilitate the realization of scenario 2.2, using the results of ONTOCOM for
planning and scheduling purposes. In order to integrate the two approaches in
a sound manner, one has to take into account several key aspects related to the
parametric approach adopted by ONTOCOM (and related models in other areas) to
estimate development efforts. First, such models require a critical mass of empirical
data for calibration purposes, which inherently reduces the number of ontology-
engineering activities covered by the NeOn methodology (Suarez-Figueroa et al.
2007) – which gOntt relies upon – for which accurate estimates can be calculated.
Second, refining the cost model to deliver estimates at a finer level of granularity
would call for the definition of various sub-models, each covering a specific, potentially
complex ontology engineering activity; examples of such models include a model
estimating the costs of ontology reuse, a model explicitly targeting the costs of
ontology evaluation, and many more. All these models, provided
historical project data is available, could then be applied to produce estimates for specific
phases of the life cycle model followed. However, they would be rather focused
and restricted in application and, for certain activities (such as domain analysis),
would very likely require deeper insights into the practices that are actually in place
Fig. 4.1 The main user interface of the ONTOCOM tool
at the enterprise using the corresponding model. Therefore, we opted for a slightly
different strategy, which overcomes these issues. In a nutshell, ONTOCOM (in its
altered version resulting from the alignment between the NeOn methodology and
its current cost drivers) can be used to predict the total cost of an ontology
engineering project; the project manager subsequently devises a distribution func-
tion by which this estimate is broken down to the individual phases of the project,
based on her expertise and on existing insights from case studies in ontology
engineering literature. The simplest form of this distribution function is based on
percentages, which are then taken into account to calculate cost estimates per phase
or even activity. A mock-up is depicted in Fig. 4.3 below.
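The percentage-based distribution function mentioned above can be sketched as follows. The phase names and percentages are hypothetical placeholders; in practice the project manager would set them based on her expertise and on case-study insights, as described in the text.

```python
def distribute_effort(total_person_months, phase_shares):
    """Break a total effort estimate down into per-phase estimates.

    `phase_shares` maps a phase name to its share of the total in percent;
    the shares are required to add up to 100.
    """
    if abs(sum(phase_shares.values()) - 100.0) > 1e-9:
        raise ValueError("phase shares must add up to 100 percent")
    return {
        phase: total_person_months * share / 100.0
        for phase, share in phase_shares.items()
    }


# Hypothetical shares; not taken from the NeOn methodology or from ONTOCOM.
shares = {
    "requirements": 20,
    "conceptualization": 30,
    "implementation": 35,
    "evaluation": 15,
}

per_phase = distribute_effort(10.0, shares)
for phase, person_months in per_phase.items():
    print(f"{phase}: {person_months:.1f} person-months")
```

Rejecting shares that do not add up to 100 keeps the per-phase figures consistent with the overall ONTOCOM estimate they are derived from.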
When aligning ONTOCOM cost drivers to the NeOn methodology we identi-
fied a series of issues related to the definition of the former, which will result in
a new release of the cost model:
– The need for a more detailed account of project management costs, covering NeOn
activities such as feasibility study, scheduling and configuration management.
– The need for ontology location support as part of our ontology reuse cost drivers
group. This will be implemented as a new cost driver as well.
Fig. 4.2 Cost prediction user interface
– Discrepancies in the level of granularity of certain cost drivers, in particular
ontology modification and ontology translation. These should be merged in a
next version of the model.
– Methodologically sub-optimal support for knowledge reuse, which includes all
aspects related to leveraging non-ontological resources in an ontology engineer-
ing project.
Fig. 4.3 gOntt extension for ONTOCOM
4.4.3 Decision Support for Choosing Appropriate Knowledge Structures
Another envisioned tool combines ONTOCOM, ONTOCOM-LITE and FOLCOM
into one application so as to support senior and project management in the task of
choosing an appropriate knowledge structure based on the estimated size and
prospected costs (see Scenario 1.3).
A sketch of the knowledge structure comparison tool is presented in Fig. 4.4.
Essentially, the tool displays the cost development of different knowledge
structures based on their size. This graphical view should be generated from the
data sets of the corresponding cost benefit models, making knowledge structures
comparable in terms of the costs associated with their development, as a function of
their size. In addition, a filtering mechanism could be offered, allowing potential
users to select if the knowledge structures in the data set are based on reuse or
developed collaboratively from scratch, and how this influences the costs as
outlined in Sect. 4.2.
4.4.4 Budgeting Ontology-Development Projects
In order to assist the management in estimating the costs required for the develop-
ment of a knowledge structure, and in choosing the most suitable staff and support
tools, we envision a budgeting application which displays the results of, for
instance, ONTOCOM in different phases of the ontology-development project.
The budgeting tool could be adjusted along various parameters; the idea is that
potential users should provide parameters of their planned development projects,
and the tool should then generate estimates for the expected costs in different
development phases. Basically, users should indicate the type of knowledge struc-
ture and the estimated size which serves as a minimal input to the tool. Users should
Fig. 4.4 Knowledge structure comparison tool
be able to indicate if a planned knowledge structure shall be developed based on
reuse or developed collaboratively. Furthermore the tool is supposed to provide
support for balancing the prospected efforts by adjusting several parameters,
including team or tool parameters. A possible screenshot of such a tool is depicted
in Fig. 4.5. The tool should furthermore indicate what an optimal profile of a team
member in each phase should look like and which tools are suggested for the
implementation of the planned knowledge structure. This functionality, supporting
scenarios 2.1–2.5, could be integrated into gOntt.
4.4.5 Decision Support for Ontology Reuse
As previously motivated, the reuse of ontologies has always been a much-discussed
problem in the ontology-engineering community, and poses technical as well as organizational
challenges. A core decision to be taken in the planning phase of many ontology-related
projects is whether to develop an ontology from scratch or to foster reuse of
existing ontologies. This decision is often taken on economic premises, considering the
cost savings achieved by reuse and/or the prospected benefits of the two strategies.
A tool to support such decisions could look like the one displayed in Fig. 4.6.
At the bottom of the interface the user is supposed to insert the estimated size of the
ontology to be developed (manually or via reuse). Furthermore, she has to indicate
a breakdown of the total size into translated, modified, or aligned ontology elements,
as these operations are associated with different amounts of effort. ONTOCOM-R
assesses the benefit of a reuse-driven strategy based on these inputs, which is
relevant for scenarios 2.5 and 4.2.
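To illustrate why the breakdown into translated, modified and aligned elements matters, the following sketch weights each category by a relative effort factor and compares the result with building everything from scratch. This is a deliberately simplified illustration and not the ONTOCOM-R formula; the weights and element counts are hypothetical.

```python
def equivalent_size(breakdown, weights):
    """Weight each reused element category by its relative effort.

    A weight of 1.0 means an element costs as much as building it from
    scratch; smaller weights mean the reuse operation is cheaper.
    """
    return sum(count * weights[category] for category, count in breakdown.items())


# Hypothetical breakdown of the estimated ontology size (number of elements).
breakdown = {"translated": 200, "modified": 300, "aligned": 500}

# Hypothetical relative effort per element, compared with manual development.
weights = {"translated": 0.5, "modified": 0.75, "aligned": 0.25}

reuse_size = equivalent_size(breakdown, weights)
scratch_size = sum(breakdown.values())
saving = 1 - reuse_size / scratch_size

print(f"Effort-equivalent size with reuse: {reuse_size:.0f} of {scratch_size} elements")
print(f"Relative saving under these assumptions: {saving:.0%}")
```

Under this simplification, reuse pays off only if the weighted size stays below the size of a manual development; a calibrated model would additionally account for the cost drivers related to reuse that are discussed in Sect. 4.3.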
Fig. 4.5 Budgeting tool
4.4.6 Tagging Efficiency Monitoring
FOLCOM is a model to estimate tagging costs based on the principles of story
points (Simperl et al. 2010). Information on the performance of the development
team is of interest to project management, which can adjust the team if the
performance is not satisfactory. FOLCOM produces time estimates in an iterative
way; in other words, it calculates an estimate for the next project iteration given data
gathered from previous iterations.
A screenshot of the tool monitoring the performance of a team based on
FOLCOM is shown in Fig. 4.7. The tool targets scenario 3.2. It is Web-based and
measures the time users need to tag selected resources. Efficiency measurements are
also required to calibrate the model, as by design the estimation model calculates
the effort associated with the remainder of a project based on the actual effort spent
so far.
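The idea of estimating the remainder of a project from the effort spent so far can be sketched as a simple extrapolation. The sketch assumes that the average tagging time observed so far also holds for the untagged resources; it does not reproduce the actual FOLCOM model, and the function name and figures are hypothetical.

```python
def estimate_remaining_minutes(tagged_resources, minutes_spent, total_resources):
    """Extrapolate the remaining tagging effort from the effort spent so far."""
    if tagged_resources <= 0:
        raise ValueError("at least one resource must have been tagged")
    if total_resources < tagged_resources:
        raise ValueError("total must not be smaller than the tagged count")
    minutes_per_resource = minutes_spent / tagged_resources
    return minutes_per_resource * (total_resources - tagged_resources)


# Hypothetical measurements: 40 of 200 resources tagged in 100 minutes.
remaining = estimate_remaining_minutes(
    tagged_resources=40, minutes_spent=100.0, total_resources=200
)
print(f"Estimated remaining effort: {remaining:.0f} minutes")
```

Re-running the estimate after each iteration lets the figure track the team's actual efficiency, which is the kind of comparison between estimated and monitored effort that scenario 3.2 relies on.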
In a nutshell the tool shows an excerpt of an information resource (PDF
documents, but also images and videos) from a specific collection and asks users
to tag it.3 Depending on the selected tagging mode, it is possible to add free tags, as
well as tags from a controlled vocabulary, and tags recommended by the tool
according to some heuristics. Deletion of tags is also supported.
Fig. 4.6 ONTOCOM-R reuse decision tool
3 FOLCOM calculates the time required to tag the entire collection.
4.4.7 Performance and User Satisfaction Assessment
Assessing the performance of an application or of one of its features can provide valuable
hints on the (dis-)continuation of the development or the introduction of certain
features. As illustrated in Sect. 4.3, the ONTOLOGY-UIS model can be used to
assess user satisfaction with features of an application and thereby make a statement
about the performance of the application. The outcome of the method can be
visualized in a so-called snake diagram, which shows the gaps between the
expectations of the users and the actual performance of the application. This is an
indicator of the efficiency of certain application features (Bürger and Simperl
2008). A mock-up interface of the tool is depicted in Fig. 4.8. It is relevant for
scenarios 1.1, 1.2, 4.1, and 4.4.
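The gaps plotted in such a diagram can be computed along the following lines. The sketch assumes that users rate both their expectation and the perceived performance of each feature on the same numerical scale; the feature names, the five-point scale and the ratings are hypothetical, and the actual ONTOLOGY-UIS questionnaire design is not reproduced here.

```python
from statistics import mean


def satisfaction_gaps(ratings):
    """Return, per feature, mean perceived performance minus mean expectation.

    A negative gap means that the feature falls short of what users expect.
    """
    return {
        feature: mean(scores["performance"]) - mean(scores["expectation"])
        for feature, scores in ratings.items()
    }


# Hypothetical survey results on a five-point scale.
ratings = {
    "semantic_search": {"expectation": [5, 4, 5], "performance": [3, 4, 3]},
    "tag_recommendation": {"expectation": [3, 3, 4], "performance": [4, 4, 4]},
}

gaps = satisfaction_gaps(ratings)
for feature, gap in sorted(gaps.items(), key=lambda item: item[1]):
    print(f"{feature}: gap {gap:+.2f}")
```

Sorting by gap puts the features that fall furthest short of expectations first, which are the natural candidates for a (dis-)continuation discussion.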
4.4.8 Organizational Impact Evaluation
The Organizational Impact Evaluation Tool (OIET) addresses the impact of semantic
applications at an organizational level. Deployed in Pillar 5 of the EVEKS
framework, it analyzes the corporate structure based on user ratings (Imtiaz et al.
Fig. 4.7 Tagging efficiency monitor
2009; Imtiaz et al. 2008). The output of the OIET can be used as an input for cost-
benefit models, in particular in scenarios 1.1 and 1.2.
The procedure to apply the OIET is supposed to be as follows (Fig. 4.9): First,
organization parameters of measurement are set by higher management. Afterwards
questionnaires will automatically be generated based on the first step, refining the
scope of the investigation. These questionnaires have to be answered by employees
of the company. The output of this activity is a cross-match of variables and
questions for every employee. Based on that, an overall matrix can be aggregated
showing the company-wide perceived structure and satisfaction of a particular
information system. This leads to an assessment of the three key parameters
(collaboration, knowledge and technology) with respect to the impact of variables
such as dumb nodes,4 active nodes,5 and external nodes (Imtiaz et al. 2009; Imtiaz
et al. 2008).6 Based on these findings, information systems can be evaluated in
terms of their company-wide benefit.
Fig. 4.8 Performance and UIS meter (based on [Remenyi et al. 2001])
4 Dumb nodes are non-specialists which are information/data pushers. These roles may be
incorporated into enterprise information systems.
5 Active nodes are specialists that define the process dynamics and can only have an interface to
enterprise knowledge portals.
6 External nodes are specialists on a sub-process level. They should therefore be considered while
designing enterprise information systems.
Thus, OIET provides an overall view of the impact at the organizational level by describing
this impact in terms of different variables which can explain the ultimate
benefits of adopting such technologies for organizations. Bürger et al. (2010)
explain in detail the steps which have to be performed in order to apply the tool.
At a later stage, its output can help the company to analyze the proper balance for
the parameters in cost-benefit models.
In Table 4.19 we summarize which scenarios (as presented in Sect. 4.3) are
supported by the different models and tools.
4.5 Conclusions and Outlook
Industry is starting to acknowledge the technical value of ontologies for enterprises.
In the last years early adopters have been increasingly using them in various
application settings ranging from content management to enterprise application
Fig. 4.9 Organizational impact evaluation
Table 4.19 Models and tools supporting the different scenarios
Scenarios                 Models                              Tools
1.1, 1.2                  –                                   OIET
1.1, 1.2, 4.1, 4.4        ONTOLOGY-UIS                        Performance and UIS meter
1.3                       ONTOCOM, ONTOCOM-LITE, FOLCOM       Knowledge structure comparison tool
1.3, 2.1, 2.3, 2.4, 4.1   ONTOCOM                             ONTOCOM Tool/Cost prediction user interface
2.1, 2.2, 2.3, 2.4, 2.5   ONTOCOM                             Budgeting tool, gOntt Extension for ONTOCOM
2.2                       ONTOCOM                             gOntt Extension for ONTOCOM
2.5, 4.2                  ONTOCOM-R                           ONTOCOM-R Reuse decision tool
3.1                       FOLCOM                              –
3.2                       FOLCOM                              Tagging efficiency monitor
3.3                       FOLCOM, altered version of ONTOCOM  –
4.3                       Any                                 –
integration. The main technological building blocks are meanwhile available from
established vendors. Despite this promising position, the economic side of their
development, maintenance and usage is still not an integral part of ontology-engineering
projects. In this chapter we analyzed how cost-benefit information
delivered by models such as ONTOCOM, FOLCOM and ONTOLOGY-UIS can
be used at various stages of the ontology life cycle by management and technical
staff. Some of the scenarios we identified require additional empirical data,
either with respect to volume (in order to cover cost drivers which have been
neglected by the data sets currently used for calibration purposes) or with respect
to granularity (to allow for predictions at the phase, rather than project, level). Most
notably, the former requires reliable data on ontology
maintenance projects, and on the reuse of ontologies or other similar knowledge
structures. While such data could not be collected in the past, mainly as a consequence
of the evolution of the Semantic Web area, the rise of data-driven
approaches in the context of the Linked Open Data (LOD) initiative may quickly change
this state of affairs, even if it may also require a redefinition of the existing notion
of reuse and the methodologies therefor. The need for adjustments to provide better
coverage of the reuse topic was also acknowledged when aligning ONTOCOM
with the NeOn methodology, albeit only at the conceptual level; the
question of data availability seems realistically addressable only in close
connection with developments around the LOD cloud.
Acknowledgements The research leading to this paper was partially supported by the European
Commission under the contract FP7-215040 “ACTIVE”.
References
Berners-Lee T, Hendler J, Lassila O (2001) The semantic web. Sci Am 284(5):34–43
Bürger T, Simperl E (2008) Measuring the benefits of ontologies. In OTM ’08: proceedings of the OTM confederated international workshops and posters on on the move to meaningful internet systems, pp 584–594. Springer
Bürger T, Popov I, Simperl E, Hofer C, Imtiaz A, Krenge J (2010) Calibrated predictive model for costs and benefits. Deliverable D4.1.2, ACTIVE
Cooke N (1994) Varieties of knowledge elicitation techniques. Int J Hum Comput Stud 41:801–849
Davies J, Fensel D, van Harmelen F (eds) (2003) Towards the semantic web: ontology-driven knowledge management. Wiley, The Atrium, Southern Gate, Chichester, West Sussex
Ebert C, Dumke R (2007) Software measurement: establish – extract – evaluate – execute. Springer, Berlin
Ebert C, Dumke M, Schmietendorf A (2005) Best practices in software measurement. Springer, New York
Fensel D (2001) Ontologies: a silver bullet for knowledge management and electronic commerce. Springer Verlag, Berlin
Gomez-Perez A, Suarez-Figueroa M-C, Vigo M (2009) Gontt: a tool for scheduling ontology development projects. In Proceedings of the fifth international conference on knowledge capture, ACM, New York
Hepp M, De Leenheer P, de Moor A, Sure Y (eds) (2008) Ontology management: semantic web, semantic web services, and business applications (semantic web and beyond). Springer, New York
Imtiaz A, Giernalczyk A, Davies J, Thurlow I (2008) Cost, benefit engineering for collaborative knowledge creation within knowledge workspaces. In Proceedings of EChallenges 2008
Imtiaz A, Giernalczyk A, Bürger T, Popov I (2009) A predictive framework for value engineering within collaborative knowledge workspaces. In Proceedings of EChallenges 2009
McGuinness DL (2003) Ontologies come of age. In Fensel D, Hendler J, Lieberman H, Wahlster W (eds) Spinning the semantic web: bringing the world wide web to its full potential. MIT Press, Cambridge
Paslaru Bontas Simperl E, Tempich C, Sure Y (2006) Ontocom: a cost estimation model for ontology engineering. In Proceedings of the 5th International Semantic Web Conference ISWC2006
Paslaru Bontas E, Tempich C (2005) How much does it cost? Applying ONTOCOM to DILIGENT. Technical Report TR-B-05-20, Free University of Berlin
Popov I, Bürger T, Simperl E, Imtiaz A (2009) Preliminary predictive model for costs and benefits. Deliverable D4.1.1, ACTIVE
Remenyi D, Money A, Sherwood-Smith M, Irani Z (2001) The effective measurement and management of IT costs and benefits
Simperl E, Bürger T (2010) Ontology reuse – is it feasible? In Jin H, Lv Z (eds) Data management in semantic web. Nova Science Publishers, Inc. (to be published)
Simperl E, Mochol M, Bürger T, Popov I (2009a) Achieving maturity: the state of practice in ontology engineering in 2009. In Proceedings of Ontologies, DataBases, and Applications of Semantics for Large Scale Information Systems (ODBASE’09)
Simperl E, Popov I, Bürger T (2009b) ONTOCOM revisited: towards accurate cost predictions for ontology development projects. In Proceedings of the 6th European Semantic Web Conference (ESWC 2009), pp 248–262
Simperl EPB, Bürger T, Hofer C (2010) Folcom or the costs of tagging. In Proceedings of the 17th international conference on Knowledge Engineering and Management by the Masses (EKAW2010), pp 163–177
Suarez-Figueroa M, de Cea GA, Buil C, Caracciolo C, Dzbor M, Gomez-Perez A, Herrero G, Lewen H, Montiel-Ponsoda E, Presutti V (2007) Neon development process and ontology life cycle. NeOn deliverable 5.3.1, NeOn
5
Managing and Understanding Context
Igor Dolinsek, Marko Grobelnik, and Dunja Mladenic
I. Dolinsek (*)
ComTrade d.o.o., Litijska 51, Ljubljana 1000, Slovenia
M. Grobelnik • D. Mladenic
Artificial Intelligence Laboratory, Jozef Stefan Institute, Jamova 39, Ljubljana SI-1000, Slovenia
P. Warren et al. (eds.), Context and Semantics for Knowledge Management, DOI 10.1007/978-3-642-19510-5_5, © Springer-Verlag Berlin Heidelberg 2011
5.1 User’s Working Contexts in Knowledge Workspace
Some may say that the word context is rather broad and that people may have difficulty
understanding the specific interpretation intended when it is used.
However, we were not able to find a more suitable word, so we have stayed with
context. To avoid confusion, we first describe our usage of the word.
Our focus is on knowledge workers and on their daily work with computer
systems. More specifically, we are most interested in their knowledge processes.
From our perspective, knowledge workers conduct knowledge processes by
using software tools like MS Office, Internet Explorer, Windows File Explorer,
CRM and ERP tools, Wikis, blogging tools, chatting tools, etc. While using the tools
they access information from all kinds of sources, like MS Word documents,
MS Outlook messages, Web pages, etc. We will refer to these as information
resources. Furthermore, knowledge processes are performed to achieve some
personal or business goal. Often several people are involved in
jointly achieving a business goal or working on the same assignment. Moreover,
the same knowledge process is often executed by the same person with different
information resources and in collaboration with different people to achieve different
goals. The proposed framework thus needs to be able to support such settings.
Our assumption is that grouping the people, information resources and knowledge
processes needed to achieve a specific goal or perform a specific assignment may
simplify the knowledge worker’s interaction with the computer system and their
collaboration with others who are working on the same assignment. This grouping
is organized through the concept of context. A person or a group of collaborators
defines a context for a particular goal they would like to achieve or assignment they
would like to perform. Whenever they perform computing activities
which are related to that goal or assignment we say that ‘they are working in that
specific context.’ While they work in that context all relevant information
resources (emails, documents, web pages, etc.) and knowledge processes are linked
to it. This makes later information retrieval and repeated executions
easier because the users can navigate through the system by using the high-level
concept of context rather than the detailed level of individual information
resources. The assumption is that such an abstraction is closer to the usual thinking
process. As a consequence, the working context can also be used to filter the
information which is presented to the user by the software tools, reducing
information overload.
The actual interpretation of the context is left to the users. For example, for a
patent lawyer a context could be a particular patent application he is working on
together with the patent inventors. For a sales person this could be a specific
proposal she is preparing in response to the request for proposal from the client.
Note that proposal preparation can be a lengthy and quite complex undertaking
where several coworkers are involved and a number of knowledge processes like
meeting organization and proposal review need to be performed. For a researcher a
context can be a project he is working on together with a team of co-workers. Large
projects can be further structured, and the corresponding context can therefore consist
of several sub-contexts. For example, work on the ACTIVE project could have
ACTIVE.planning, ACTIVE.development, ACTIVE.testing, ACTIVE.reporting,
ACTIVE.meeting and ACTIVE.dissemination sub-contexts to group people,
resources and knowledge processes in a more fine-grained way.
In short, in our sense a context is a collection of information resources, processes
and people which a knowledge worker finds convenient to group together in order
to be able to carry out his or her work more effectively.
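The notion of context summarized above can be made concrete with a small data model. The following Python sketch is illustrative only and is not the AKWS implementation: the class name Context, its fields and its method names are assumptions, as is the choice to let a parent context aggregate the resources of its sub-contexts.

```python
from dataclasses import dataclass, field


@dataclass
class Context:
    """A named grouping of people, information resources and knowledge processes."""
    name: str
    people: set[str] = field(default_factory=set)
    resources: set[str] = field(default_factory=set)
    processes: set[str] = field(default_factory=set)
    sub_contexts: dict[str, "Context"] = field(default_factory=dict)

    def add_sub_context(self, short_name: str) -> "Context":
        """Create (or return) the sub-context named '<parent>.<short_name>'."""
        if short_name not in self.sub_contexts:
            self.sub_contexts[short_name] = Context(f"{self.name}.{short_name}")
        return self.sub_contexts[short_name]

    def all_resources(self) -> set[str]:
        """Resources of this context plus those of every sub-context (an assumption)."""
        collected = set(self.resources)
        for sub in self.sub_contexts.values():
            collected |= sub.all_resources()
        return collected


# Hypothetical example data, loosely following the ACTIVE project example.
active = Context("ACTIVE")
meetings = active.add_sub_context("meetings")
meetings.people.update({"Igor", "Paul"})
meetings.resources.add("agenda.docx")
active.add_sub_context("reporting").resources.add("report-q1.docx")
```

The dotted naming (ACTIVE.meetings) mirrors the sub-context names used in the text; everything else about the structure is a design choice made for the sketch.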
5.1.1 Non-Observed Versus Observed User’s Activities
In order to manage and understand user context in computer environments we need
to conduct some form of user observation. For example, we need to keep track of
the various information resources the user has been reading or modifying, of the mail
messages he has been exchanging with others, etc. This is the price we have to
pay to gain the benefits of better search and reduced information overload. We are
aware that this may raise privacy concerns. Nevertheless, in enterprise
environments users’ activities are already being observed by systems management
and data protection tools. We have to ensure that the user activity observation
process maintains the same level of privacy protection as is maintained by the
data protection and systems management processes which are already deployed in
the enterprise.
5.1.2 Observation Methods
For useful context management more data has to be collected about the user’s
activities than is normally available from off-the-shelf office tools. In addition,
relations between the people and the resources they are using have to be recorded in
a similar way as already done by e-mail and messaging tools. For our purposes a set
of primitive events has been defined to model the most fine-grained user activities
we are interested in. Examples of primitive events are:
• User has accessed a particular web page
• User has sent an email message with particular subject, content and attachments
to the specified list of recipients
• User has started/stopped a particular program
• User updated a particular Word file
Each primitive event includes the relevant data that was involved in the event.
For example, the ‘user updated a particular Word file’ event includes the before and
after image of the Word document. Another example is email-related events that
carry the subject, mail content and attachments.
On the other hand, we are not interested in low-level user activities like mouse
clicks, window selections, etc.
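As an illustration of how such primitive events might be represented, the following Python sketch defines an event record that carries its relevant data, together with a filter that drops low-level activities. The type name PrimitiveEvent, the event kind strings and the record function are hypothetical and are not taken from the ACTIVE software.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any


@dataclass
class PrimitiveEvent:
    """One fine-grained user activity together with the data involved in it."""
    user: str
    kind: str                      # e.g. "web_page_accessed", "email_sent"
    payload: dict[str, Any]        # event-specific data (subject, URL, ...)
    timestamp: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


# Low-level activities that are deliberately not recorded (hypothetical names).
IGNORED_KINDS = {"mouse_click", "window_selected"}


def record(log: list[PrimitiveEvent], event: PrimitiveEvent) -> bool:
    """Append the event to the log unless it is a low-level activity."""
    if event.kind in IGNORED_KINDS:
        return False
    log.append(event)
    return True


log: list[PrimitiveEvent] = []
record(log, PrimitiveEvent(
    user="igor",
    kind="email_sent",
    payload={"subject": "Agenda", "recipients": ["paul"], "attachments": []},
))
record(log, PrimitiveEvent(user="igor", kind="mouse_click", payload={}))
```

Only the email event ends up in the log; the mouse click is discarded, in line with the distinction drawn in the text.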
5.2 Knowledge Workspace
We believe that existing software tools in a typical MS Windows/MS Office setup
are not providing adequate support to deal with knowledge processes. Currently,
knowledge workers can deal with information on the individual document or file
folder level and interact with others on individual email message or chatting
message level. On the other hand the knowledge that links the relevant people,
activities, files and email messages in respective folders to the specific goal is
implicit and often exists only in the person’s mind.
For the most frequently used and well-defined business processes (like a typical
sales cycle), specialized software tools have been developed to link the necessary
process elements and resources together. However, for the support of informal and
dynamic knowledge processes there are very few suitable software tools available.
For example, with the Unified Activity Management project (Moran et al. 2005)
IBM has developed a set of specialized tools to cope with generic knowledge
processes and collaboration. On the other hand some European research projects
such as:
• Nepomuk (http://nepomuk.semanticdesktop.org/) and
• Gnowsis (http://gnowsis.opendfki.de/)
have approached this field from the semantic desktop perspective.
The knowledge workspace we are proposing (called the ACTIVE knowledge
workspace – AKWS) is a software prototype which we have developed in the
ACTIVE project. Our approach to building a collaborative knowledge process
management platform differs from the approaches taken by the projects
mentioned above. The main guideline for developing this prototype was that it
should extend the existing Microsoft Windows platform and MS Office tools with
the proposed concepts as much as possible. While context, knowledge process and
resource meta-data support are provided in the newly developed platform, they are
to a large extent delivered to the users through extensions to popular enterprise
office automation tools like Windows File Explorer, Internet Explorer and the MS
Office tools.
The proposed knowledge workspace is architected as a set of cooperative
services, dealing with contexts, knowledge processes, resource’s meta-data and
other web platform-specific infrastructure and as a set of plug-ins for popular
programs in use in the enterprise environment. Workspace services reside in the
enterprise intranet and plug-ins extend the existing software on user’s desktops.
In the scope of the ACTIVE project, plug-ins were developed for the following
software products:
• Windows File Explorer
• Microsoft Internet Explorer
• MS Outlook 2007
• MS Excel 2007
• MS PowerPoint 2007
• MS Word 2007
• Semantic Media Wiki
• LiveNetLife
• LiveOfficeLife
• miKrow
We will refer to them as ACTIVated applications. In addition, the ACTIVE
Taskbar tool provides access to the ACTIVE services from the desktop and the
ACTIVE portal provides web access to the services. A trial version of the ACTIVE
knowledge workspace software prototype is available for download for research
purposes at http://www.active-project.eu.¹
¹ As at January 2011.
5.3 Context Support in the ACTIVE Knowledge Workspace
There are two perspectives for dealing with contexts in the AKWS. The top-down
perspective allows users to name and define contexts explicitly on an as-needed
basis. The bottom-up perspective relies instead on observation of the user’s
computing activities, from which AKWS services use computer intelligence
techniques to infer the various contexts the user is working in.
5.3.1 Defining Contexts and Context Switching
In the proposed knowledge workspace, context is represented as a workspace
resource which can be defined and shared by the users. Context is subject to
Workspace access policies in the same way as any other Workspace resource.
When a user starts working on a new assignment a context with a suitable name
is created in the Workspace by the user. In the case that this work will require
collaboration with other people in the enterprise, they can be assigned to this
context, too. AKWS provides the means to see all contexts which are assigned to
the user and gives the user the possibility to switch between them explicitly. These
features are provided by the ACTIVE Taskbar. The screenshot in Fig. 5.1 shows
the user’s ACTIVE Taskbar at the moment when the user is working in the
ACTIVE.meetings context and is about to switch to the ACTIVE.reporting context.
The Taskbar context menu shows a list of assigned contexts for this user:
AMRoffer and ACTIVE which is further structured into sub-contexts ACTIVE.
planning, ACTIVE.development, ACTIVE.testing, ACTIVE.meetings and
ACTIVE.reporting. This way the user explicitly controls in which context he/she
is working. Other ACTIVated applications the user is running on this desktop are
automatically aware of the current context.
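The behaviour described in this section, where the user switches explicitly and other applications become aware of the new context, resembles a simple publish/subscribe arrangement. The Python sketch below illustrates that idea under stated assumptions: the class name Taskbar and the methods subscribe and switch_to are hypothetical and do not reflect the actual ACTIVE Taskbar interfaces.

```python
from typing import Callable, Optional


class Taskbar:
    """Holds the user's assigned contexts and the current working context."""

    def __init__(self, assigned_contexts: list[str]) -> None:
        self.assigned = list(assigned_contexts)
        self.current: Optional[str] = None
        self._listeners: list[Callable[[str], None]] = []

    def subscribe(self, listener: Callable[[str], None]) -> None:
        """Register an application that wants to hear about context switches."""
        self._listeners.append(listener)

    def switch_to(self, context_name: str) -> None:
        """Explicitly switch the working context and notify all listeners."""
        if context_name not in self.assigned:
            raise ValueError(f"context not assigned to user: {context_name}")
        self.current = context_name
        for listener in self._listeners:
            listener(context_name)


taskbar = Taskbar(["AMRoffer", "ACTIVE.meetings", "ACTIVE.reporting"])
seen_by_word: list[str] = []          # stands in for an ACTIVated application
taskbar.subscribe(seen_by_word.append)
taskbar.switch_to("ACTIVE.meetings")
taskbar.switch_to("ACTIVE.reporting")
```

Rejecting contexts that are not assigned to the user is a choice made for the sketch; the text only states that the Taskbar lists the assigned contexts.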
5.3.2 Associating Information Resources
All ACTIVated applications provide a way to associate the data they operate on
with the current context. This can be done either explicitly by the user via the
‘associate’ button or implicitly whenever the data is accessed by the ACTIVated
program. The screenshot in Fig. 5.2 shows a part of the ACTIVated MS PowerPoint
ribbon where explicit association of the current PowerPoint document ‘active-
knowledge-workspace-session-v1’ with the current context ‘ACTIVE.meetings’
can be performed simply by clicking on the ‘Associate’ button.
Similarly, an e-mail can be associated with the context in MS Outlook, a
spreadsheet in MS Excel, a URL in Internet Explorer and a text document in MS
Word. An arbitrary file in Windows can be associated with the context by using the
ACTIVE Shell extension menu option, as shown in the screenshot in Fig. 5.3.
Here the user will insert the selected Excel spreadsheet ‘server_mem’ into the
workspace and associate it with the current working context ACTIVE.meetings.
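The two association modes described in this section, explicit (via the ‘Associate’ button) and implicit (on access), can be sketched as follows. This is a minimal illustration in Python; the class AssociationStore and its methods are hypothetical names, and the in-memory dictionary stands in for whatever storage the Workspace actually uses.

```python
from collections import defaultdict
from typing import Optional


class AssociationStore:
    """Keeps track of which resources are associated with which contexts."""

    def __init__(self, implicit: bool = True) -> None:
        self.implicit = implicit   # whether plain access also creates a link
        self._by_context: dict[str, set[str]] = defaultdict(set)

    def associate(self, resource: str, context: str) -> None:
        """Explicit association, e.g. triggered by an 'Associate' button."""
        self._by_context[context].add(resource)

    def on_access(self, resource: str, current_context: Optional[str]) -> None:
        """Implicit association whenever a resource is accessed in a context."""
        if self.implicit and current_context is not None:
            self.associate(resource, current_context)

    def resources_in(self, context: str) -> set[str]:
        return set(self._by_context.get(context, set()))

    def contexts_of(self, resource: str) -> set[str]:
        return {c for c, rs in self._by_context.items() if resource in rs}


store = AssociationStore()
store.associate("active-knowledge-workspace-session-v1.pptx", "ACTIVE.meetings")
store.on_access("server_mem.xlsx", "ACTIVE.meetings")   # implicit link
store.on_access("notes.txt", None)                      # no context set: ignored
```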
5.3.3 Associating People
People can be assigned to a context by using the team context management feature
of the ACTIVE portal, as shown in the screenshot in Fig. 5.4.
Fig. 5.1 Context switch in ACTIVE taskbar
Fig. 5.2 Associating a PowerPoint presentation to the working context
Fig. 5.3 Associate a windows file with the context
Fig. 5.4 Assigning people to the context
Here user Carlos will be assigned to the ACTIVE.meetings context to which
Igor, Ian, Paul and John are already assigned. Once a person is assigned to a context,
this context will become available through the context selection list in that person’s
ACTIVE Taskbar.
5.3.4 Associating Knowledge Processes
Similarly, the existing knowledge processes Trip arrangement and Venue arrangement
can be associated with the context by using the TaskPane of the ACTIVE
Taskbar, as shown by the screenshot in Fig. 5.5.
Here two knowledge processes (Trip arrangements and Venue arrangements)
are already associated with the context ACTIVE.meeting and the user can associate
more of the predefined knowledge processes by importing them from the knowledge
process store or from a file.
5.3.5 Viewing and Searching Through the Context Perspective
Through various features of the ACTIVE Taskbar the user is given insight into the
content of the current context. The screenshot in Fig. 5.6 shows the ACTIVE.meeting
context together with the list of people currently working in that context and a list of
people who are also assigned to the context but are working in some other context
right now. On user request, a list of all information resources which are associated
with the current context is displayed in the ‘Resources in current context’ form and
the TaskPane shows the currently associated knowledge processes.
Fig. 5.5 Associate knowledge processes to the context
Alternatively, the context can be visualized as shown in the screenshot in
Fig. 5.7.
Each associated resource can be directly accessed or
manipulated from the respective list (a document can be opened by the appropriate
program, a user can be contacted via mail or chat, etc.). Similarly, the resource’s
meta-data as provided by the Workspace can be displayed.
The context name can be used to search for associated resources by using the
search option of the ACTIVE Taskbar. This can help to formulate searches in
situations like this: ‘I remember this data was presented in a PowerPoint slide which
we prepared in scope of the ILM offer for Nasa.’ A quick look at the context cloud
in the Workspace shows that there is a context named ‘Nasa ILM offer’ so a
context-restricted search can be made with further restriction to the PowerPoint
documents and we are presented with all potentially relevant PowerPoint slides.
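A context-restricted search of the kind just described amounts to filtering resources first by context and then by further criteria such as document type. The Python sketch below shows this under stated assumptions: the Resource record, the kind labels and the search function are hypothetical and only illustrate the filtering logic, not the Workspace search service.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Resource:
    name: str
    kind: str                  # e.g. "powerpoint", "word", "email"
    contexts: frozenset[str]   # names of the contexts it is associated with


def search(resources: list[Resource], context: str,
           kind: Optional[str] = None, text: str = "") -> list[Resource]:
    """Return resources in the given context, optionally narrowed by kind and name."""
    hits = []
    for resource in resources:
        if context not in resource.contexts:
            continue
        if kind is not None and resource.kind != kind:
            continue
        if text.lower() not in resource.name.lower():
            continue
        hits.append(resource)
    return hits


catalogue = [
    Resource("ILM pricing slides", "powerpoint", frozenset({"Nasa ILM offer"})),
    Resource("ILM cover letter", "word", frozenset({"Nasa ILM offer"})),
    Resource("Meeting slides", "powerpoint", frozenset({"ACTIVE.meetings"})),
]
slides = search(catalogue, "Nasa ILM offer", kind="powerpoint")
```

Here slides contains only the PowerPoint resource associated with the ‘Nasa ILM offer’ context, matching the scenario in the text.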
One can argue that the same effect can be achieved when all team members
use the same conventions for naming the directories where the documents are
stored, or embed the same set of characteristic keywords in the documents. In
that case standard desktop search facilities could be used for easier retrieval.
However, we would then be relying on people always following the same pattern
of manual activities, not to mention the problem of consistently using the agreed
markup names. In addition, the team leader has to use conventional communication
tools to inform the others about the conventions. With the context concept in place
and with the ACTIVE Taskbar being used by all team members, the grouping is already
well defined and instantly visible to all team members. Furthermore, with automatic
associations in place the ‘markup’ is also resilient to sloppy users. The only thing
expected from them is to set their working context accurately. As will be shown
later in this chapter, the Workspace also assists the user in setting the appropriate
working context.
Fig. 5.6 Viewing workspace resources of a working context
The currently selected context is used to filter the data presentation in various
applications in the AKWS. For example, all Office tools have a new ‘Start button’
menu option ‘Open from context’, where the user can open a document only
from among the documents which are associated with the current context, as shown
in the screenshot in Fig. 5.8.
Similarly, a context filtering button is available in Outlook so that only mail
messages associated with the current context are displayed in the selected Outlook
message folder list. In the screenshot in Fig. 5.9 only two emails, those associated
with the context ACTIVE.meetings, are displayed in the Inbox because the user
applied the context filter by pressing the ACTIVE.meetings button in the
ACTIVE toolbar in Outlook.
Fig. 5.7 Viewing context resources in context visualizer
Fig. 5.8 Context filtering at document open
Fig. 5.9 Context filtering of mail messages
5.4 Context Discovery
So far we have described the top-down perspective of context management, where
contexts are named, defined and switched explicitly by the users. However, we
also deal with contexts from the bottom-up perspective. The user observation
methods which are in place in all ACTIVated applications record a number
of primitive events. These streams of events are collected by the context mining
services of the proposed ACTIVE knowledge workspace. The machine learning technology
built into those services analyses the content of the documents, web pages and email
messages the users are working on. The goal is to identify the clusters of information
resources, collaborators and tasks which the users may recognize as their
context. The algorithms which are used for this purpose are described in detail in
Chap. 7: Machine learning techniques for understanding context and process.
The results of the context mining process are discovered contexts in the ACTIVE
knowledge workspace. Discovered contexts are not named, but they are characterized
by a set of keywords which were suggested by the mining algorithms as
the most relevant representation of the context. In addition to the representative
keywords, the mining software also determines the list of relevant information
resources, collaborators and tasks and associates them with the discovered context.
The workspace user has the option of reviewing the list of currently discovered
contexts and, based on his judgment, can convert a discovered context into
a top-down context by giving it a name which is most meaningful to him.
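To illustrate what a discovered context and its conversion into a named context might look like, the following Python sketch characterizes a cluster of texts by its most frequent terms and then promotes it to a named context. The plain term-frequency count is only a stand-in for the mining algorithms of Chap. 7, which are not reproduced here, and all names (top_keywords, DiscoveredContext, NamedContext, name_context, STOPWORDS) are hypothetical.

```python
import re
from collections import Counter
from dataclasses import dataclass
from typing import Optional

# A tiny illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "in", "of", "and", "for", "to"}


def top_keywords(texts: list[str], k: int = 5) -> list[str]:
    """Pick the k most frequent non-stopword terms as representative keywords."""
    counts: Counter[str] = Counter()
    for text in texts:
        words = re.findall(r"[a-z]+", text.lower())
        counts.update(w for w in words if w not in STOPWORDS)
    return [word.upper() for word, _ in counts.most_common(k)]


@dataclass
class DiscoveredContext:
    keywords: list[str]
    resources: list[str]
    collaborators: list[str]


@dataclass
class NamedContext:
    name: str
    resources: list[str]
    people: list[str]


def name_context(discovered: DiscoveredContext, name: str,
                 keep: Optional[set[str]] = None) -> NamedContext:
    """Turn a discovered context into a named one, keeping selected resources."""
    kept = [r for r in discovered.resources if keep is None or r in keep]
    return NamedContext(name, kept, list(discovered.collaborators))


pages = ["Skiing holiday in Austria", "Austria snow report for skiing",
         "Skiing lift prices"]
found = DiscoveredContext(top_keywords(pages, k=2),
                          ["http://ski.example/a", "http://travel.example/b"],
                          ["igor"])
austria_ski = name_context(found, "Austria-ski", keep={"http://ski.example/a"})
```

The example URLs are placeholders. The keep argument models the user selecting only the most relevant resources when naming the context.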
The screenshot in Fig. 5.10 shows the current list of discovered contexts for one
Workspace user and the detailed content of a discovered context with keywords
SKIING, HOLIDAY, AMADE, AUSTRIA, SNOW, etc.
Fig. 5.10 List of discovered contexts
Since this was the result of a personal investigation of skiing conditions in
Austria, only one user was discovered for this context and all discovered resources
are links to popular skiing and travel web-sites. The user can give this discovered
context a name, for example ‘Austria-ski’, by selecting the ‘Name
Context’ button on the form, and can then select the most relevant URLs he was
using during his search and automatically associate them with the newly named
context, as shown in the screenshot in Fig. 5.11.
5.5 Context Detection
As described above, the context mining service discovers emergent contexts in the
incoming stream of primitive events. In addition, it tries to determine the current
working context of a user. This process is called context detection and serves
as an aid for setting the working context automatically. Note that the user can set his
current working context explicitly through the ACTIVE Taskbar. However, this
setting might be inaccurate because the user forgets to switch the context when he
starts working on something else. Since the knowledge workspace software
maintains explicit associations between the information resources and the named
contexts, it is possible to verify automatically whether the current working context matches
the context deduced from the information resource currently being processed by
the user. In case of a mismatch a notification is displayed on the ACTIVE Taskbar
suggesting that the user switch to one of the suggested contexts. It is then up to the
user either to switch context as suggested or to continue with the current setting.
The screenshot in Fig. 5.12 shows a context detection notification, which can then
be followed by the context switch.
Fig. 5.11 Bulk association of discovered resources
In this example the user was working in the ‘rails_discussion’ context but then
started to browse a Sport magazine website. However, some time ago he had
associated that website with two of his contexts, skiing and golf. Therefore the
Workspace notified him that he might not be working in the most relevant
context (this cannot be seen in the screenshot) and suggested that he switch to
either the skiing or the golf context, both of which were directly associated with the
Sport magazine website. The final decision to switch the context is left to the user.
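The mismatch check described in this section can be expressed compactly. The Python sketch below assumes a simple mapping from each resource to the set of contexts it is associated with; the function name suggest_contexts and the example URL are hypothetical, and the sketch covers only the association-based check, not any learning-based detection.

```python
def suggest_contexts(current_context: str, resource: str,
                     associations: dict[str, set[str]]) -> list[str]:
    """Suggest contexts when the accessed resource does not fit the current one.

    Returns an empty list when the resource has no associations or is already
    associated with the current working context.
    """
    linked = associations.get(resource, set())
    if not linked or current_context in linked:
        return []
    return sorted(linked)


associations = {"http://sport-magazine.example": {"skiing", "golf"}}
suggestions = suggest_contexts("rails_discussion",
                               "http://sport-magazine.example",
                               associations)
```

In this example the function suggests golf and skiing, in alphabetical order. The function only returns suggestions; as in the text, whether to switch remains the user’s decision.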
5.6 User-Guided Discovery and Detection
The final judgment of the quality of context discovery and detection is always left to
the user. The ACTIVE knowledge workspace includes dialogues
in which the user can review the currently discovered contexts and provide feedback
to the mining service about their quality. Where
a discovered context does not represent a meaningful context for the user, the
service can be instructed to discard it. In some situations the discovered
context is actually similar to an existing named context but the mining algorithms
failed to recognize this. In such a situation it is possible to manually merge the
discovered context with the existing context.
Similarly, context detection is used to provide context switch suggestions
instead of automatically switching the working context for the user. The user’s
actual decision on whether to switch to the suggested context can be used
by the mining service to refine the context detection process.
Fig. 5.12 Working context suggestions
5.7 Discussion
There are several Business Process Management (BPM) and Enterprise 2.0 tools on
the market, and many attempt to assist collaboration between knowledge
workers by using the project concept, for example Basecamp
(http://basecamphq.com/)² and @task (http://www.attask.com/)³. At first glance it seems
that the concept of working context used in the ACTIVE knowledge
workspace is the same as the project in other tools. Indeed, the ACTIVE
context has several features in common with the project concept in other tools.
For example, a project is often used to group people, well-defined business process(es)
and information resources in a similar way as the context groups people,
information resources and informal knowledge processes. However, the inclusion
of information resources in BPM tools is often limited to emails and
documents. Furthermore, these resources are often attached to the project as supplementary
information to the underlying BPM data structures, and their relation to the project
can be used only within the BPM tool itself. In AKWS the context is maintained by
the Workspace but can also be incorporated into other tools. A set of Workspace
web services makes it possible to deal with contexts and their relations to other
Workspace entities in any software tool.
When working with project-centric tools it is often the case that users can
think and work in terms of the project concept only inside the project management
tool. When they start using other tools they are again forced to think and work at
the level of individual documents and to rely on the implicit relations between
the documents and projects, which they have to remember. With the open structure
of the AKWS we can align more software tools around the working context and
simplify daily operations for knowledge workers.
With closer observation of users’ activities during their interaction with the most
commonly used office automation software tools, the Workspace can help users
to determine their current working context easily. Once the appropriate context is
identified it can be used for filtering and prioritizing the information, knowledge
processes and collaborators which are presented to the users through the tools. This
makes it easier for knowledge workers to stay focused on the work at hand.
² As at February 2011.
³ As at February 2011.
References
Moran TP, Cozzi A, Farrell SP (2005) Unified activity management: supporting people in
e-business. Commun ACM 48(12):67–70, http://www.almaden.ibm.com/cs/projects/uam/
6
Managing, Sharing and Optimising Informal Knowledge Processes
Jose-Manuel Gomez-Perez, Carlos Ruiz, and Frank Dengler
6.1 Introduction
Knowledge workers are central to determine an organization’s success which more
and more depends on tacit individual knowledge, undocumented factual and proce-
dural expertise, essential for competent performance, and tediously learned by
experience and example. To significantly increase economic productivity it is
necessary, therefore, to increase the productivity of this knowledge-based and
knowledge-driven work. However, a high performance relies on how knowledge
workers and companies deal with some factors such as information overload, task
switching, context loss, and knowledge processes.
Knowledge workers access huge amounts of information available in personal
desktops, company knowledge repositories (ranging from document repositories to
knowledge management tools) and, eventually, on the Web. The result is that they
waste time searching and navigating to find the information needed to complete the
task-at-hand, or even attempting to figure out how they realized a similar task
related to the task-at-hand some time ago.
Furthermore, knowledge workers have to depend on personal communication
and ad hoc collaboration and communication tools, including email, to coordinate
their work. However, these tools do not unequivocally improve productivity:
Taglocity1 estimates that Intel employees spend 20 h per week managing their
email, a third of which is unnecessary; Davenport affirms that “while all
knowledge workers surveyed used e-mail, 26% felt it was overused in their
organization, 21% felt overwhelmed by it and 15% felt it actually diminished their
productivity” (Davenport 2005). In addition, knowledge workers constantly deal
with a multitude of tasks, and typically spend a considerable amount of time
switching to a different task before completing the task-at-hand. Whenever there
is a task switch, knowledge workers lose the work context that they have manually
built up, leading to distraction, frustration, and loss of productivity.
J.M. Gomez-Perez (*) • C. Ruiz
iSOCO, Intelligent Software Components, S.A., Avenida Del Partenon, 16-18, Madrid 1º 7a 28042, Spain
e-mail: [email protected]; [email protected]
F. Dengler
Institut AIFB, Karlsruhe Institute of Technology, Englerstr. 11, Building 11.40, Karlsruhe 76131,
Germany
e-mail: [email protected]
1 http://www.taglocity.com/.
P. Warren et al. (eds.), Context and Semantics for Knowledge Management, DOI 10.1007/978-3-642-19510-5_6, © Springer-Verlag Berlin Heidelberg 2011
Besides all these factors, knowledge workers are engaged in processes. Some of
these are formal business processes, typically defined by the organisation.
Describing how business processes take place has been a focus of attention for
many years, and languages for such processes now exist, e.g. WS-BPEL2 for process
execution and BPMN3 for process description. Moreover, business processes are
static: once defined, they are used recurrently in the same application domain and
treated as stable, well-defined workflows with little or no variation in their
execution, with decisions guided by business rules. As a matter of fact, business
processes are ill-suited to change management; evolution and adaptability have
been neglected in the past, which makes it hard to react when circumstances
change.
However, as a consequence of technical evolution and of the increasing exchange
of information and collaborative decision making (Kotlarsky and Oshri 2005) among
workers (e.g. analysts, researchers, etc.) carrying out their activities in
knowledge-intensive applications, knowledge workers organise their work around
their own informal processes. These processes are often called knowledge processes
and have a totally different nature: they are flexible, neither standardised nor
structured, and it is the experience, knowledge and intuition of the knowledge
worker that drive the process to success. There is a huge variety of examples: the
processes which an employee of an organisation uses to obtain information on a
customer from a variety of sources; the processes to set up a meeting, including
booking a room, arranging refreshments and notifying reception of the names of
visitors; or the processes of arranging the flow of technological operations in
engineering design based on the specific character of the design artifact and the
skill profiles of the designers involved. They are frequently not written down, or
if they are, then only very informally. This hinders their reuse even by their
creators, and certainly hinders sharing between colleagues. We therefore need to
provide assistance to knowledge workers in creating, reusing, sharing and also
improving these knowledge processes.
While great productivity gains can be achieved in business processes by
formalizing fixed and frequently executed processes into computer processable
workflows, standardizing most work processes in a top-down manner is neither
economical nor possible in practice. This is probably one of the reasons why
informal knowledge processes, though acknowledged as essential, are poorly
supported at the enterprise level.
2 WS-BPEL (Web Services Business Process Execution Language) is an OASIS standard, see
http://docs.oasis-open.org/wsbpel/2.0/wsbpel-v2.0.pdf.
3 BPMN is an OMG standard, see http://www.bpmn.org/.
The purpose of this chapter is twofold. Firstly, we examine the notion of
knowledge process, compare it in detail with business processes, and give a formal
description of what we understand by knowledge processes, covering the complete
knowledge process lifecycle. Secondly, we present the tools developed to manage
and exploit knowledge processes in order to increase knowledge workers’
productivity by significantly improving the mechanisms through which enterprise
information is created, managed, and used.
6.2 Knowledge Processes Within the ACTIVE Project
6.2.1 Some Definitions: Business Processes vs. Knowledge Processes
The term business process has received a lot of attention in corporations, where
the value of defining such processes lies in the intellectual assets that they
represent. A business process is usually understood as a set of coordinated tasks
and activities that lead to accomplishing a specific organizational goal,
implemented by a workflow which specifies the details of the process. Davenport
(2005), however, had already given a definition of business processes that
contrasts with some others: “a structured, measured set of activities designed to
produce a specific output for a particular customer or market. It implies a strong
emphasis on how work is done within an organization, in contrast to a product
focus’s emphasis on what. A process is thus a specific ordering of work activities
across time and space, with a beginning and an end, and clearly defined inputs and
outputs: a structure for action. . . . Taking a process approach implies adopting the
customer’s point of view. Processes are the structure by which an organization does
what is necessary to produce value for its customers.” Moreover, concepts,
methods, and techniques from Business Process Management (BPM) (Weske
2007) support the design, configuration, enactment and analysis of these business
processes.
As opposed to traditional business processes, knowledge processes are defined
by Strohmaier (2003) as representations of complex knowledge flows. They depict
the generation, storage, transfer and application of knowledge that is necessary to
create products or services. In contrast, we do not focus on knowledge flows but
follow Still (2007) in considering knowledge processes as “high added value
processes in which the achievement of goals is highly dependent on the skills,
knowledge and experience of the people carrying them out”, e.g. a new product
development process. People use these informal processes to perform difficult
tasks that require complex, informal decisions among multiple possible strategies
to fulfil specific goals.
In general, both kinds of processes can be understood as a set of tasks and
activities that lead to accomplishing a specific goal. However, while business
processes focus on organizational and highly structured goals at different
levels of detail, knowledge processes do not take that view and focus instead on
flexible and informal activities that depend on the tacit knowledge and
experience of users. For example, a typical business process might be “An
insurance claim process,” whereas a knowledge process might be “Schedule a
meeting” or “Book a trip.” The former is based on a set of formalized steps (e.g.
fill in an application form, wait for a response, etc.); the latter is very
flexible, with no predefined steps, and how it is carried out is up to the worker
(for instance, experience leads users to check some web sites rather than others).
A comparison can be found in Table 6.1.
In addition, knowledge processes are usually created on the fly by knowledge
workers in the course of their daily work. As soon as complex tasks arise,
knowledge workers create these processes based on their experience and skills,
but also taking into account information about their current task context, which
supports them as they navigate a knowledge process.
Usually there are many possible ways to achieve the process objectives or to
reach a certain goal. The knowledge worker therefore has to make many complex
decisions to optimize his or her process and reach the goal. For example, the
knowledge worker could decide to accept lower quality in order to deliver on
time or to deliver earlier.
Furthermore, knowledge workers use their connections to other workers to carry
out the process because, most of the time, knowledge processes are collaborative.
When a process is performed collaboratively, each task can be carried out by
the most specialized, experienced and knowledgeable worker in that specific area.
Having a network of relationships within the organization is therefore a very
important asset for people executing knowledge processes.
It is extremely important to continuously improve knowledge processes, by
creating an environment through which they can evolve. It is crucial to establish
an adequate process context (the combination of technologies, procedures, people,
etc. that supports the processes). The process context must incorporate feedback
mechanisms, change evaluation procedures, process improvement methods and
techniques and must be flexible in order to be able to incorporate enhancements
in an agile but controlled way.
If the process is instantiated frequently and the instances are homogeneous, it is
possible to create great process models that dramatically increase the efficiency of
the process. The best way to ensure process improvement is to generate an
environment in which people are motivated, enthusiastic and passionate to provide
feedback to the underlying knowledge process management system.
Table 6.1 Comparison: business process vs. knowledge process
              Business process                  Knowledge process
Goal          Business-goal driven              User-goal driven
Scope         Enterprise                        Individual
Nature        Static                            Dynamic
Description   Formal                            Informal
Guided        Externally coordinated            Ad hoc/spontaneous
Analyzed      Monitored, analyzed, optimized    Not monitored, emerging
There is also the evolving area of case management (Weske 2007) in the BPM
community. According to Strohmaier (2003), events triggered from outside drive a
collaborative, knowledge intensive and dynamic process. Those events determine at
runtime which activities need to be performed by knowledge workers handling the
case and whether additional steps are required. In contrast to informal processes, the
control flow cannot be expressed in an explicit process diagram defined in advance.
6.2.2 Scenario: Hiring Process
Figure 6.1 shows a scenario based on a hiring process, a common process within
an enterprise (it is adapted from the hiring process described in Hill et al.
(2006)). The example shows how a person, in this case Alice, would perform certain
tasks of the hiring process in her small company. The company is so small that, so
far, no business process has been defined for hiring people; because of this, the
process is not fixed or well-defined and depends heavily on the skills and
experience of the hiring expert.
The following actors participate in this process:
• Dave, the hiring manager (manager of the group with a vacancy): makes the
ultimate decision as to whether to hire a candidate.
• Alice, the hiring expert: performs the initial filter of candidates and only
forwards to Dave those that she deems appropriate.
• Bob, Alice’s assistant: performs administrative functions for Alice.
• Boss, Alice’s boss.
• Chris, the candidate applying for the job.
Fig. 6.1 Knowledge process example (diagram not reproduced; it shows the actors Alice, Bob, Dave, Boss and Chris and their tasks in the phases Post and Search, Review, and Screen)
The following list explains the process steps and with whom and what Alice is
communicating:
1. Boss informs Alice that Dave has a vacancy.
2. Alice and Dave exchange e-mails on the text of the job description and then
meet to finalize.
3. Alice posts the vacancy in the online service with the text from the e-mail.
4. Alice asks Bob to search the online service for viable candidates (i.e., those
who have not explicitly applied for the job but have the necessary skills).
5. Bob searches the online service and sends Alice e-mails with viable candidates
by detaching the resumes from the online service and sending them as e-mail
attachments.
6. Alice reviews applicants from the online service (both those who have applied
for the position and those whom Bob found).
7. Alice forwards the candidates whom she deems viable in an e-mail to Dave.
8. Dave responds with an e-mail to inform Alice of the candidates whom he
wishes to pursue and those whom he wishes to reject. He also includes his
initial thoughts on the candidates.
9. Alice sets up interviews by phone for herself and the candidates.
10. Alice takes notes during the interviews by phone.
11. Alice informs Dave of the candidates to whom she thinks Dave should send an
offer.
12. Dave sends an offer to the candidate.
6.2.3 Knowledge Process: A Formal Extended Definition
As seen above, previous definitions of knowledge processes are incomplete and do
not cover all the relevant aspects that knowledge workers face nowadays. We have
therefore extended the definition as follows. A knowledge process is
• a loosely defined and structurally ramified collection of tasks,
• not fully defined in terms of structure and the order of activities,
• in which activities require a decision by a knowledge worker about the follow-up
task,
• in which the acting knowledge worker uses her experience and expertise as well
as her working context to decide on the successor task,
• in which decisions about the path the process will follow have to be taken at
execution time and lead to an emerging structural ramification made up of
admissible alternatives, so that dynamic ramification is one of its key
features.
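The definition above can be illustrated with a minimal Python sketch. It is not the ACTIVE implementation: the names `KnowledgeProcess`, `add_alternative` and `execute`, the decision function `alice` and the task titles are all assumptions introduced here for illustration. The process only records admissible alternatives; the path itself emerges at execution time from the worker's decisions.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class KnowledgeProcess:
    """A loosely defined collection of tasks with admissible follow-ups."""
    name: str
    # Maps each task to the follow-up tasks considered admissible so far.
    alternatives: Dict[str, List[str]] = field(default_factory=dict)

    def add_alternative(self, task: str, follow_up: str) -> None:
        """Ramify the process by allowing a new follow-up for a task."""
        options = self.alternatives.setdefault(task, [])
        if follow_up not in options:
            options.append(follow_up)
        self.alternatives.setdefault(follow_up, [])


# A decision function stands in for the knowledge worker: given the current
# task, the admissible alternatives and the working context, it returns the
# chosen successor, or None to finish the process.
Decision = Callable[[str, List[str], dict], Optional[str]]


def execute(process: KnowledgeProcess, start: str,
            decide: Decision, context: dict) -> List[str]:
    """Follow the process from `start`, asking the worker at every step."""
    path = [start]
    current = start
    while True:
        options = process.alternatives.get(current, [])
        chosen = decide(current, options, context)
        if chosen is None:
            return path
        # The worker may pick a task that was not foreseen, so the
        # structure of the process emerges during execution.
        process.add_alternative(current, chosen)
        path.append(chosen)
        current = chosen


# Hypothetical usage based on the hiring scenario.
hiring = KnowledgeProcess("Hiring")
hiring.add_alternative("Post vacancy", "Review applicants")
hiring.add_alternative("Review applicants", "Phone interview")


def alice(current: str, options: List[str], context: dict) -> Optional[str]:
    if current == "Review applicants" and context.get("urgent"):
        return "Send offer"  # unforeseen shortcut chosen from experience
    return options[0] if options else None


path = execute(hiring, "Post vacancy", alice, {"urgent": True})
```

In this sketch the decision function must eventually return `None`; a real tool would leave that choice to the knowledge worker interactively.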
6.2.4 Knowledge Process Life Cycle
In general, the competitiveness of organizations depends on how efficiently and
effectively they create, locate, capture, and share the organization’s knowledge
and expertise (Zack 1999). A similar statement can be made about knowledge
processes, and four different stages can accordingly be defined as the knowledge
process lifecycle:
• Definition – Capture and Acquisition. Knowledge processes are either created by
knowledge workers or acquired from interaction with others.
• Share – Storage and Retrieval. This stage bridges upstream repository creation
to downstream knowledge distribution.
• Search. This stage represents the mechanisms used to make knowledge process
repository content searchable and how useful that information is.
• Compare – Presentation. The value of knowledge is pervasively influenced by
how it is visualized and the features such visualization tools provide.
6.3 Tools for Knowledge Processes within the ACTIVE Project
This section covers the main tools and approaches we have followed in the
ACTIVE project to deal with knowledge processes. These tools:
• Let knowledge workers create processes and tasks, and manage and structure
them at different levels of detail. For example, a knowledge worker can define a
process for scheduling a meeting as a set of tasks: complete the list of
participants, check their availability, fill out the request to arrange rooms and
refreshments, and wait for confirmation from administration.
• Let knowledge workers create a process by recording their actions at runtime.
• Let knowledge workers follow a step-by-step application to automate and
simplify some of the most common activities.
• Let knowledge workers visualize processes and how the different elements
relate to each other.
6.3.1 Framework for Knowledge Process Management
The goal of the framework is to support knowledge workers in (re)structuring
knowledge processes at runtime, following a top-down, user-driven approach
(knowledge workers define their own processes, tasks, and related resources) in a
lightweight manner (processes still depend on the knowledge worker who is driving
them). To this end, the framework provides the conceptual process model and data
structures as well as a set of services and applications that enable knowledge
workers to manage knowledge processes.
The main components of the framework, along with the connections among them,
are displayed in Fig. 6.2. The framework and its tools are integrated into the
ACTIVE Knowledge Workspace. The components are:
The Task Pane – This is the front-end, offering manual facilities to create and
manage processes at different levels of detail and helping users to organize their
daily work into processes and more fine-grained tasks. It can also be used to
associate resources, such as documents, web addresses, or colleagues, with
processes and tasks. The knowledge worker can also import and export processes and
templates, or set the active task in the Task Pane. This approach reduces the time
needed for process switching, since the Task Pane offers an immediate way to
change between processes, and alleviates context loss, since the Task Pane manages
the context in terms of the tasks and information resources associated with a
process.
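The idea of keeping resources attached to processes and tasks, so that switching the active process restores its context, can be sketched in a few lines of Python. This is an illustrative in-memory model only, not the ACTIVE Task Pane API; the class names `Task` and `TaskPaneModel`, the method `switch_to` and the example resources are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Task:
    title: str
    # Associated resources: documents, web addresses, colleagues, ...
    resources: List[str] = field(default_factory=list)
    subtasks: List["Task"] = field(default_factory=list)


class TaskPaneModel:
    """Hypothetical model of processes, tasks and their resources."""

    def __init__(self) -> None:
        self.processes: Dict[str, Task] = {}
        self.active: Optional[str] = None

    def add_process(self, process: Task) -> None:
        self.processes[process.title] = process

    def switch_to(self, title: str) -> List[str]:
        """Make a process active and return its context (all resources)."""
        process = self.processes[title]  # raises KeyError if unknown
        self.active = title
        return self._collect(process)

    def _collect(self, task: Task) -> List[str]:
        # Gather the resources of the task and, recursively, its subtasks.
        found = list(task.resources)
        for sub in task.subtasks:
            found.extend(self._collect(sub))
        return found


# Hypothetical usage: two processes, each with its own context.
pane = TaskPaneModel()
pane.add_process(Task("Schedule a meeting", ["agenda.doc"],
                      [Task("Book a room", ["http://intranet.example/rooms"])]))
pane.add_process(Task("Hiring", ["job-description.doc"]))

meeting_context = pane.switch_to("Schedule a meeting")
hiring_context = pane.switch_to("Hiring")
```

Because the context is stored with the process rather than rebuilt by hand, switching back to an earlier process returns the same set of resources.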
Fig. 6.2 Framework for knowledge process management (diagram not reproduced; labels recoverable from the figure: ACTIVE Knowledge Workspace Services, ACTIVE Taskbar, Recording, Task Pane, Task Wizard, Template Manager, Template Repository, Task Repository, Task Manager, Context Mining Service, Task Service, Semantic MediaWiki; exchanged items: patterns, templates, instances)
The Task Recording (Fig. 6.3) – This enables knowledge workers to automatically
record the sequence of actions performed on their system (e.g. opening a document,
browsing a web page). Conceptually, it is comparable to the macro recording
available in many applications (e.g. Office Word).
The knowledge worker can start, pause, and stop the recording session by
pressing the corresponding buttons. The recording collects all the events and adds
some metadata (e.g. associated resource, title, keywords, etc.). After stopping the
recording, the user can rearrange the recorded session, structuring the sequence of
actions by introducing new processes and tasks or by grouping actions into existing
ones.
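The recording workflow described above (start, pause, stop, then regroup the captured actions into tasks) can be sketched as follows. This is a simplified illustration rather than the ACTIVE recorder; the names `Event`, `Recorder` and `group_into_tasks`, and the example actions and URLs, are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Event:
    action: str    # e.g. "open document", "browse web page"
    resource: str  # associated resource (file, URL, ...)
    keywords: List[str] = field(default_factory=list)


class Recorder:
    """Hypothetical recorder with start, pause and stop controls."""

    def __init__(self) -> None:
        self.state = "stopped"
        self.events: List[Event] = []

    def start(self) -> None:
        # From "stopped" a new session begins; from "paused" it resumes.
        if self.state == "stopped":
            self.events = []
        self.state = "recording"

    def pause(self) -> None:
        if self.state == "recording":
            self.state = "paused"

    def stop(self) -> List[Event]:
        self.state = "stopped"
        return list(self.events)

    def capture(self, event: Event) -> None:
        # Events are only kept while the session is actively recording.
        if self.state == "recording":
            self.events.append(event)


def group_into_tasks(events: List[Event],
                     assignment: Dict[int, str]) -> Dict[str, List[Event]]:
    """Group recorded events into tasks chosen by the user.

    `assignment` maps the position of an event in the session to a task
    title; events without an assignment are collected under "Unassigned".
    """
    tasks: Dict[str, List[Event]] = {}
    for index, event in enumerate(events):
        tasks.setdefault(assignment.get(index, "Unassigned"), []).append(event)
    return tasks


# Hypothetical usage: one event is ignored because recording was paused.
recorder = Recorder()
recorder.start()
recorder.capture(Event("open document", "job-description.doc"))
recorder.pause()
recorder.capture(Event("browse web page", "http://news.example/"))
recorder.start()
recorder.capture(Event("browse web page", "http://jobs.example/post"))
session = recorder.stop()

tasks = group_into_tasks(session, {0: "Write job description", 1: "Post vacancy"})
```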
Task Wizard – Manipulating daily activities, in the form of templates or
instances, requires considerable effort. It is likely to be easier with a wizard,
especially for complex or infrequently performed tasks where the user is
unfamiliar with the steps involved. The Task Wizard guides knowledge workers
through a set of steps, each with a particular purpose: select a template or
instance based on keywords, a tag cloud, or a recommendation; create a template or
instance from a previous one or from scratch, with the wizard offering suggestions
on task balance or related tasks; and, finally, share templates and instances in
local and community repositories (e.g. storing templates in a Semantic MediaWiki).
Task Service – It is the core of the framework for knowledge processes.
It defines the knowledge process model and the corresponding service interface to
manage processes, tasks, and resources, besides other operations (e.g. tagging).
Furthermore it tracks the invocation of tasks defined by an actor, the time, and the
state the process is in. Such invocation information is exploited by the bottom-up
approach (mining actions and events to come up with process prediction models,
see Chap. 7).
Fig. 6.3 Task recording
Semantic MediaWiki – This tool is used as a collaboration and sharing platform
for templates. While knowledge processes are created on the client side, domain
experts can create templates (manually from scratch, from knowledge processes, or
using mining techniques) and store them in a Semantic MediaWiki, where they can be
visualized, discussed and exchanged with other knowledge workers. A screenshot of
the Semantic MediaWiki as a corporate knowledge process template repository can be
seen in Fig. 6.4.
Context Visualizer – In addition to these tools, a key factor in increasing the
productivity of knowledge workers is to help them manage and understand their
daily collaborative processes. These are lightweight and highly flexible processes
that depend heavily on their context, which defines the boundaries and environment
of the processes and how decisions are taken. Since each complex collaborative
process occurs in some context, articulating that context helps in understanding
the underlying relationships within collaborative processes. However, like any
other complex system, contexts can be large and complex, and therefore difficult
to understand and control. Thus, although the visualization of complex
collaborative processes can be addressed from different angles, there is a need
for new ways of visualizing complex contexts which can help knowledge workers to
better understand their dynamics.
The Context Visualizer (Fig. 6.5) helps users to understand complex collaborative
processes by visualizing their contexts. The visualization includes the elements
related to a working context (knowledge processes, people, and resources),
contextual information about the context and its elements, direct relationships
between related elements in the context (in red), icon sizes that reflect the
relevance of each element in the context, and filtering options (e.g. based on
resource type).
Fig. 6.4 Semantic MediaWiki as template repository for knowledge processes
6.3.2 Collaborative Process Development
As stated above, knowledge workers should be able to define their own processes
and to share them as templates with each other. The components of the framework
for knowledge process management support the sharing of process templates
created by individual knowledge workers. However, the aggregate knowledge of a
large group is superior to the knowledge of one or a few experts, so it is equally
important to provide knowledge workers with the means to develop processes
collaboratively.
Fig. 6.5 The Context Visualizer
Knowledge workers differ in their experience of modelling processes, and there
are usually knowledge workers involved who are novices in process modelling.
Recker et al. (2010) have investigated how novices model business processes in the
absence of tool support. Their findings are that the design representation forms
chosen to conceptualize business processes range from predominantly textual, to
hybrid, to predominantly graphical types. They have also found that the combined
graphical and textual types achieve higher quality scores. Another survey,
analyzing the modelling constructs of BPMN used in practice, shows that in most
BPMN diagrams less than 20% of the BPMN vocabulary is regularly used and that the
most frequently occurring subset is the combination of tasks and sequence flows
(zur Muehlen and Recker 2008). Based on these findings, requirements for
collaborative process development can be derived:
• Manual modelling support for novice users. Knowledge workers who are novices
in process modelling need manual modelling support so that they can create,
extend and follow processes without the assistance of an expert. The tool
requires a rich user interface that lets the user interact with processes in a
highly intuitive manner. This leads to a trade-off between the expressivity
offered for developing the formal process model and the usability of the tool.
• Collaboration support. Knowledge workers must be able to discuss process
models asynchronously. Changes to the process model have to be tracked, and
knowledge workers should be able to access the version history and to revert
to previous versions. In addition, design rationales should be documented.
• Structured process documentation support. The process models must be stored
in a machine-processable documentation format, including semantic
representations. Knowledge workers must be able to create links between
process descriptions and external resources.
To support such collaborative, distributed, and iterative process development,
we combined Semantic MediaWiki (SMW) (Krötzsch et al. 2007) with the graphical
open-source process editor Oryx (Decker et al. 2008), allowing the use of formal
semantics in combination with natural language to describe processes. SMW was
extended to be compatible with Oryx so that it can act as a process knowledge
repository. In addition, the graphical editor was extended to display and edit
wiki pages at the bottom of the screen, as shown in Fig. 6.6; as a consequence,
users can directly access the corresponding wiki page from within the process
editor.
Fig. 6.6 Wiki-based process editor
We support the Basic Control-Flow Patterns introduced in Russell et al. (2006).
Every single process step (task) is represented as a wiki page belonging to the
category Process Element and linked via the property has Type to the
corresponding type (Task) and via the property Belongs to Process to the
corresponding process, which is itself represented as a wiki page (process summary
page) in SMW. An activity is the basic element of our process. Depending on the
granularity level of the process, this can vary from an atomic activity, such as
opening a web page, to an activity describing a whole subprocess. To express the
control flow of the process, we use edges in the diagram and special predefined
process elements (gateways). If an element has a successor, we draw an edge from
the activity to the successor activity in the diagram and store this with the
additional property has Successor on the corresponding wiki page in SMW. For
several successors executed in parallel (parallel-split pattern), a Parallel
Gateway is placed between the activities. An activity can also have several
successors of which only one has to be selected and executed (multi-choice
pattern); for this we use the Data-based Exclusive Gateway without conditions. The
Data-based Exclusive Gateway with conditions is used to split based on a condition
(exclusive-choice pattern). A condition is stored as an n-ary property. The
distinction between the synchronization pattern and the simple-merge pattern is
realized by using the Parallel Gateway and the Data-based Exclusive Gateway the
other way round, i.e. to merge different branches of a process.
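To make the wiki representation more concrete, the following Python sketch generates the annotations that such a process element page could carry. It uses the standard MediaWiki category syntax and the SMW inline property syntax (`[[Property::value]]`) with the category and property names mentioned in the text. The exact page layout used in the ACTIVE extension is not given in this chapter, so the function name `process_element_wikitext`, the page titles and the type value used for gateways are assumptions.

```python
from typing import Dict, Iterable


def process_element_wikitext(element_type: str, process: str,
                             successors: Iterable[str] = ()) -> str:
    """Return wiki text annotating one process element page."""
    lines = [
        "[[Category:Process Element]]",
        f"[[has Type::{element_type}]]",
        f"[[Belongs to Process::{process}]]",
    ]
    # One annotation per outgoing edge in the process diagram.
    lines += [f"[[has Successor::{successor}]]" for successor in successors]
    return "\n".join(lines)


# Hypothetical parallel split: one task is followed by a gateway page that
# has two successors to be carried out in parallel.
pages: Dict[str, str] = {
    "Post vacancy": process_element_wikitext(
        "Task", "Hiring", ["Hiring split 1"]),
    "Hiring split 1": process_element_wikitext(
        "Parallel Gateway", "Hiring",
        ["Search online service", "Review applicants"]),
}
```

Storing each edge as a property value means that the control flow can later be queried from the wiki like any other semantic annotation.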
The advantages of such an approach are:
• The combination of natural language with formal semantics allows collaborative
modelling by both novices and experts. Textual and graphical elements can be
used interchangeably and complementarily. If the user does not know the
graphical representation of a process element, natural language can be used to
describe it.
• The approach uses an extensible underlying schema. Users can introduce their
own properties in the wiki by using the SMW property syntax on the process
element wiki page. Thus, processes can be linked to existing knowledge
structures (e.g. which input documents are used), and this information becomes
available to, and processable by, computers.
• Standard wiki features can be used for process modelling, like versioning, watch
lists, reverting, etc.
• SMW acts as a process repository where processes and their process semantics
are stored. Process knowledge can be linked, queried and displayed on process
pages and on other wiki pages.
6.3.3 Refactoring and Optimization
The variety of ways of carrying out knowledge processes, and their underlying
complexity, create a need for tools to refactor and optimize them. Such tools help
knowledge workers either to restructure existing knowledge processes or to select
a follow-up action, and ultimately increase the performance, dynamicity and
flexibility of knowledge processes.
For this reason, knowledge processes need to be quantified to make them
comparable in terms of the different factors which might influence them. On the
whole, three factors influence knowledge processes: firstly, business processes,
which might trigger knowledge processes; secondly, other knowledge processes,
since one person’s knowledge process might trigger a knowledge process of another
person through their social interaction (e.g. Bob has to prepare a presentation
and asks Alice to provide some input because he knows she prepared a related
presentation); thirdly, existing knowledge, drawn from user experience or from
document repositories or wikis within the organization.
To make different knowledge processes comparable so that they can be refactored
and optimized, we have developed a three-stage framework as part of the ACTIVE
project:
1. Metrics and measures to quantify knowledge processes
In order to make knowledge processes comparable, a set of measures and metrics
is defined to quantify knowledge process instances (also called knowledge process
traces). These metrics are important to support semi-automatic refactoring of
knowledge processes and are based on the aforementioned factors. In particular,
business process and workflow metrics for complexity and cost were adapted to
knowledge processes. In addition, the metrics cover some values specific to the
knowledge worker, who decides on follow-up actions within a knowledge process
using his or her skills and context. This is an important distinction between
business processes and knowledge processes, and we therefore added metrics which
directly reflect the skills and roles of knowledge workers.
We categorized the metrics into measurable metrics (like size, performance or
external costs), user-dependent metrics (like skill or feasibility), and qualifiable
measures (like quality of the result or satisfaction).
Furthermore, an indicator called the knowledge process trace indicator aggregates
these measures into an overall metric for comparing knowledge processes. The
aggregation is performed by a logic score of preferences (LSP)-based approach
(Dujmovic 2005), which makes it possible to express the knowledge process trace
indicator as a function of metrics from all these categories. This provides a
highly flexible methodology for aggregating different metrics: the use of a weight
for each of the values makes the approach adaptable and tuneable to different
perspectives.
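The chapter does not give the aggregation formula. LSP-style aggregation is commonly built on the weighted power mean, E = (w1·x1^r + ... + wn·xn^r)^(1/r), where the weights sum to 1 and the exponent r tunes the aggregator between conjunctive behaviour (low r, a poor value on any metric pulls the score down) and disjunctive behaviour (high r). The sketch below assumes this form and assumes that every metric has already been normalised to [0, 1]; the function name, metric names, values and weights are hypothetical.

```python
import math
from typing import Sequence


def weighted_power_mean(values: Sequence[float],
                        weights: Sequence[float],
                        r: float) -> float:
    """Aggregate normalised metric values with a weighted power mean."""
    if len(values) != len(weights) or len(values) == 0:
        raise ValueError("values and weights must be non-empty and equal in length")
    if any(w < 0 for w in weights) or not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must be non-negative and sum to 1")
    if any(not 0.0 <= v <= 1.0 for v in values):
        raise ValueError("values must be normalised to [0, 1]")
    # Metrics with zero weight do not influence the result.
    pairs = [(v, w) for v, w in zip(values, weights) if w > 0]
    # For r <= 0 a single zero value forces the aggregate to zero.
    if r <= 0 and any(v == 0.0 for v, _ in pairs):
        return 0.0
    if r == 0:
        # Limit case of the power mean: the weighted geometric mean.
        return math.exp(sum(w * math.log(v) for v, w in pairs))
    return sum(w * v ** r for v, w in pairs) ** (1.0 / r)


# Hypothetical metric values and weights for one knowledge process trace.
metrics = {"size": 0.8, "performance": 0.6, "skill": 0.9, "quality": 0.4}
weights = {"size": 0.1, "performance": 0.3, "skill": 0.2, "quality": 0.4}
names = sorted(metrics)
values = [metrics[n] for n in names]
shares = [weights[n] for n in names]

neutral = weighted_power_mean(values, shares, r=1.0)   # weighted arithmetic mean
strict = weighted_power_mean(values, shares, r=-1.0)   # weighted harmonic mean
```

Changing the weights or the exponent gives a different view of the same trace, which is one way to read the adaptability mentioned above.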
2. Foundation for knowledge process optimisation
From the formal definition in the previous stage, the basic libraries for quantifi-
cation and calculation of metrics and measures were implemented in order to be
reused in visualisation, refactoring, and optimization.
3. Tools for refactoring and optimization of knowledge processes
As a final step, we provide a tool to facilitate the refactoring of knowledge
processes. A refactoring tool typically lets knowledge workers improve knowledge
processes by applying a series of small behaviour-preserving transformations
(e.g. joining two tasks), and can also promote reuse (e.g. copying a task from one
knowledge process to another). Each transformation is small, but their cumulative
effect is quite significant. Making such modifications in small steps reduces the
risk of introducing errors while improving design, efficiency, and flexibility.
Figure 6.7 shows a screenshot of the tool divided into three main sections: on the
left, the knowledge process instance selected by the user to be compared and
refactored; in the centre, a set of related knowledge process instances; and on the
right, the knowledge process template related to the knowledge process instance
selected by the user. The related knowledge process instances can come from two
different sources: they may have been created from the same template, or they
may have been detected by the mining algorithm.
The combination of these tools might be exploited in several types of scenarios.
For example, when a user starts a knowledge process, typically only a small
amount of information about the process and the context is available. Thus, the log
data is insufficient for process mining. In this case, user interaction is exploited and
the process refactoring becomes semi-automatic, which helps to optimize the process
more quickly and reliably. In addition, we might even offer refactoring
recommendations highlighting the main differences between knowledge processes.
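The LSP-based aggregation of metrics into a knowledge process trace indicator, described in the first stage above, can be illustrated with a weighted power mean, the aggregator commonly used in LSP. The following is only a minimal sketch and not the ACTIVE implementation: the metric names, their normalisation to the interval [0, 1], the weights and the exponent r are all assumptions chosen for illustration.

```python
from math import prod


def weighted_power_mean(scores, weights, r):
    """Aggregate scores in [0, 1] with a weighted power mean.

    r = 1 gives the weighted arithmetic mean. Smaller r pushes the result
    towards conjunction (every metric must be good), larger r towards
    disjunction (one good metric is enough).
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    if r == 0:
        # Limit case r -> 0: weighted geometric mean.
        return prod(s ** w for s, w in zip(scores, weights))
    if r < 0 and any(s == 0 for s in scores):
        # For negative exponents a zero score forces the result to zero.
        return 0.0
    return sum(w * s ** r for s, w in zip(scores, weights)) ** (1.0 / r)


# Hypothetical, already normalised metric values for one knowledge process,
# one from each metric category (measurable, user-dependable, qualifiable).
metrics = {"performance": 0.8, "skill": 0.6, "satisfaction": 0.9}
# Hypothetical weights expressing one particular perspective.
weights = {"performance": 0.5, "skill": 0.2, "satisfaction": 0.3}

names = sorted(metrics)
scores = [metrics[n] for n in names]
ws = [weights[n] for n in names]

indicator = weighted_power_mean(scores, ws, r=1.0)
stricter_indicator = weighted_power_mean(scores, ws, r=-1.0)
```

Changing the weights or the exponent changes the perspective from which two knowledge processes are compared, without touching the underlying metrics.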
6.3.4 Security and Privacy Issues
The scenario in this knowledge economy is more and more collaborative and
heterogeneous, where knowledge is defined, used and shared across different
groups and domains. Sharing this knowledge allows users to create social
relationships, forming working and knowledge-based groups beyond organizations,
but it also poses new challenges and risks in terms of the security and understanding
of that new shared information.
Fig. 6.7 The refactoring tool
This scenario raises the following two needs: (1) Knowledge workers need to be
provided with tools to create and manage virtual boundaries where knowledge is
shared with other workers. This means having flexible and powerful security
mechanisms and policies which can be defined dynamically and offer inference
mechanisms; (2) Knowledge workers need to be provided with visualization tools
to understand the complex relationships within knowledge processes.
These statements hold for any company where the typical scenario is collabora-
tive, decentralized and heterogeneous, and knowledge processes are shared across
the same and different domains and organizations. In this context, these virtual
boundaries, which connect different users and groups through the sharing of
knowledge processes, and which are tied by different security mechanisms, are
here called Knowledge Spheres.
In addition to the term, we propose a Knowledge Sphere Framework, a flexible
ontology-based framework to handle security and privacy and automate sharing
knowledge processes by knowledge workers across organizations. In a nutshell,
the Knowledge Sphere Framework offers the underlying infrastructure to
manage Knowledge Spheres and their security policies, and a graphical tool to
represent them, facilitating an understanding of their relations and related contents.
In more detail, the main components are:
Knowledge Sphere Ontology – This ontology provides concepts and properties to
define how knowledge-based communities are created and formed, and how security
and privacy policies are defined and satisfied. The Knowledge Sphere Ontology
combines and extends two other ontologies: SIOC (Semantically-Interlinked
Online Communities)4, which describes information from online communities and
is used in the Knowledge Sphere Ontology to create Knowledge Spheres; and the
KAoS ontologies5, which describe platform-independent policies, were originally
oriented to dynamic and complex software agent applications and later adapted to
grid computing and Web Services, and are used in the Knowledge Sphere Ontology to
define the security policies to be applied to Knowledge Spheres. While SIOC is
very limited in terms of security or access control mechanisms, KAoS does not
cover the actions associated with security policies required to disclose knowledge
processes (e.g. allowing access and giving modification rights). An example of what
the Knowledge Sphere Ontology looks like is shown in Fig. 6.8 for the case of
defining a Knowledge Sphere – individualKS1 – with a positive authorization policy –
policyKS1_A – with two access actions: a read access action – actioniSOCOLabRead –
for people from iSOCOLabGroupMadrid and a write access action
– actioniSOCOLabWrite – for people from iSOCOLabGroupMadrid having an
administration role.
4 http://sioc-project.org/ontology.
5 http://ontology.ihmc.us/.
Knowledge Sphere Service – This service handles all the actions for managing
Knowledge Spheres, including a security module which infers whether users have
permission to carry out an action in a given Knowledge Sphere. The service
runs ontology inference on the Knowledge Sphere Ontology, transforms security
policies into semantic rules, and eventually resolves the permission request.
An example of the Knowledge Sphere Service is as follows: a given user –
John Sowa – wants to carry out an action – Delete a knowledge process – on the
Knowledge Sphere – Proposal Making. The Knowledge Sphere – Proposal Making
– has several policies assigned. For John Sowa to be able to delete a knowledge
process, there must exist a policy that grants him permission. In this example, there
exists a security policy stating that members of the group –
ProposalAdministrators – can execute the action – Delete Knowledge Process –
related to the Knowledge Sphere. Since John Sowa is not a member of this group, he
has no right to perform the action and delete the knowledge process.
Knowledge Sphere Visualization Tool (Fig. 6.9) – This is a graphical tool to
represent and visualize Knowledge Spheres, facilitating the understanding of their
relations and related contents. It comprises three main parts: the upper menu bar,
the right-hand side Diagram Pane, which shows the graphical representation of the
Knowledge Spheres, and the left-hand side Information Pane, which shows information
about the elements explored in the Diagram Pane.
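The permission check from the Knowledge Sphere Service example above can be sketched in a few lines. This is a deliberately simplified illustration: the actual service resolves requests through ontology inference and semantic rules, whereas the sketch uses plain Python sets. The `Policy` class, the `is_permitted` function, the group `ProposalAuthors` and the user `Alice` are hypothetical names introduced here only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """A positive authorization policy: a group may perform an action in a sphere."""
    sphere: str
    group: str
    action: str


def is_permitted(user, action, sphere, policies, memberships):
    """Return True if some policy grants `action` in `sphere` to a group of `user`."""
    user_groups = memberships.get(user, set())
    return any(
        p.sphere == sphere and p.action == action and p.group in user_groups
        for p in policies
    )


policies = [
    Policy("Proposal Making", "ProposalAdministrators", "Delete Knowledge Process"),
]
memberships = {
    "John Sowa": {"ProposalAuthors"},      # hypothetical group, not an administrator
    "Alice": {"ProposalAdministrators"},   # hypothetical user
}

john_can_delete = is_permitted(
    "John Sowa", "Delete Knowledge Process", "Proposal Making", policies, memberships
)
alice_can_delete = is_permitted(
    "Alice", "Delete Knowledge Process", "Proposal Making", policies, memberships
)
```

As in the example, the request by John Sowa is denied because no policy grants the action to any group he belongs to.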
6.4 Conclusions
As a consequence of the increasing exchange of information and collaborative
decision-making among knowledge workers, interest in knowledge processes
has risen. These knowledge processes differ from business processes and have a
nature of their own: they are user-driven, informal, dynamic and adaptive, based on
communication, not well structured, and dependent on the skills, knowledge and
experience of users. Moreover, performance on knowledge processes marks the
productivity of knowledge workers and critically determines the eventual success
or failure of a corporation in the knowledge society.
Fig. 6.8 Knowledge sphere ontology example
In this chapter, we provide an insight into knowledge processes as a key
challenge for modern organizations, the main differences with business processes,
and approaches to address their management. Effective exploitation and support
of knowledge processes depend on providing knowledge workers with useful tools
and applications for dealing with knowledge processes, facilitating their
comprehension, and taking into account the concrete contextual factors which are
relevant to the process.
With this purpose, we present several building blocks combined in a holistic view of
knowledge process management to enable the knowledge worker to improve his
daily work in terms of formal and informal processes (Fig. 6.10). Basically, the
steps can be combined as part of the knowledge process lifecycle as follows:
• Define: the knowledge worker defines a sequence of tasks and actions as a
knowledge process. In addition, a manager or a domain expert can define some
process templates to be reused.
• Share: the Semantic MediaWiki is used for sharing the process templates with
others and opening them up for collaboration and improvement. The same
applies to knowledge processes.
• Search: there are various ways to search for processes, such as the knowledge
sphere visualisation.
• Compare: the refactoring tool supports the comparison of process templates with
instances. This provides feedback about real executions compared to the envisioned
templates.
• Redefine: the features supported by the Task Pane and the knowledge
from the comparison can be used to redefine and improve knowledge processes and
templates.
Fig. 6.9 Knowledge sphere visualization tool
References
Davenport TH (2005) Thinking for a living: how to get better performances and results from
knowledge workers. Harvard Business School Press, Cambridge
Decker G, Overdick H, Weske M (2008) Oryx – an open modeling platform for the BPM
community. In: Dumas M, Reichert M, Shan MC (eds) Business process management, vol
5240, Lecture notes in computer science. Springer, Heidelberg, pp 382–385.
doi:10.1007/978-3-540-85758-7_29
Dujmovic J (2005) Continuous preference logic for system evaluation. Department of Computer
Science, San Francisco State University, San Francisco. doi:10.1109/TFUZZ.2007.902041
Hill C, Yates R, Jones C, Kogan SL (2006) Beyond predictable workflows: enhancing productivity
in artful business processes. http://www.research.ibm.com/journal/sj/454/hillref.html. Accessed
24 Nov 2010
Fig. 6.10 Knowledge process lifecycle covered by the ACTIVE tools
Kotlarsky J, Oshri I (2005) Social ties, knowledge sharing and successful collaboration in globally
distributed system development projects. Eur J Inform Syst 14:37–48.
doi:10.1057/palgrave.ejis.3000520
Krötzsch M, Vrandecic D, Völkel M, Haller H, Studer R (2007) Semantic wikipedia. J Web
Semantics 5:251–261. doi:10.1145/1135777.1135863
Recker J, Safrudin N, Rosemann M (2010) How novices model business processes. In: Hull R,
Mendling J, Tai S (eds) Business process management, vol 6336, Lecture notes in computer
science. Springer, Hoboken, pp 29–44. doi:10.1007/978-3-642-15618-2_5
Russell N, Ter Hofstede AHM, van der Aalst WMP, Mulyar N (2006) Workflow control-flow
patterns: a revised view. Technical report BPM center report BPM-06-22, BPMcenter.org,
doi: 10.1.1.93.6974
Still K (2007) Exploring knowledge processes in user-centred design. Electron J Knowl Manag
5:105–114. doi:10.1.1.93.1675
Strohmaier M (2003) A business process oriented approach for the identification and support of
organizational knowledge processes. In: 4. Oldenburger Fachtagung Wissensmanagement,
Potenziale – Konzepte – Werkzeuge
Weske M (2007) Business process management: concepts, languages, architectures. Springer-
Verlag, Berlin, Heidelberg. doi:10.1007/978-3-540-73522-9
Zack MH (1999) Managing codified knowledge. Sloan Manage Rev 40(4):45–58
zur Muehlen M, Recker J (2008) How much language is enough? Theoretical and practical use of
the business process modeling notation. In: Bellahsène Z, Léonard M (eds) Advanced
information systems engineering, vol 5074. Springer, Montpellier, pp 465–479.
doi:10.1007/978-3-540-69534-9
7
Machine Learning Techniques for Understanding Context and Process
Marko Grobelnik, Dunja Mladenic, Gregor Leban, and Tadej Stajner
7.1 Introduction
Machine Learning techniques have been developing since the middle of the twenti-
eth century. Their original focus was on game-playing programs and then moved on
to finding regularities in database records (of patients, supermarket transactions,
credit card activity, stock exchange activity, etc.), modeling scientific measure-
ments (equation discovery, ecological modelling, etc.), speech recognition, user
modeling and spam filtering. Current applications include real-time data modeling
(real-time sensing behavior, tracking moving objects), semantic data annotation,
analysis of semantic sensor networks and monitoring real-time cyber data to track
interest and opinions. Capabilities of the available technology offer practical
benefits but also raise privacy issues (Mitchell 2009). In this chapter we focus on
the machine learning techniques themselves and do not deal with privacy or other
related issues.
7.2 Machine Learning
In general, we can say that Machine Learning seeks to answer the question “How
can we build computer systems that automatically improve with experience, and
what are the fundamental laws that govern all learning processes?” (Mitchell 1997).
More formally, a machine learns with respect to a particular task T, performance
metric P, and type of experience E, if the system reliably improves its performance
P at task T, following experience E. Depending on how we specify T, P, and E, the
learning task might also be called by names such as data mining, autonomous
discovery, database updating, programming by example, etc. (Mitchell 2006).
For instance, we can have a task T: Categorize email messages as spam or
legitimate, a performance metric P: Percentage of email messages correctly
classified, and experience E: Database of emails, some with human-given labels.
Supervised machine learning techniques can be applied on the database of
e-mails to build a model for classifying e-mails as spam or legitimate. Usually
these would require representation of the data as feature-vectors, so we need to
define a set of features that can be used for describing e-mails and whose values we
can obtain from the database of e-mails (e.g., date of sending the e-mail, words in
the e-mail subject, words in the e-mail body), sometimes including additional
knowledge sources (e.g., professional skills of a person receiving e-mail).
M. Grobelnik (*) • D. Mladenic • G. Leban • T. Stajner
Artificial Intelligence Laboratory, Jozef Stefan Institute, Jamova 39, Ljubljana SI-1000, Slovenia
P. Warren et al. (eds.), Context and Semantics for Knowledge Management,
DOI 10.1007/978-3-642-19510-5_7, © Springer-Verlag Berlin Heidelberg 2011
7.2.1 Data Representation and Modeling
Data points that are used in Machine Learning are usually represented as feature-
vectors in an n-dimensional space capturing important characteristics of the data.
Depending on the data that is represented, features can have numerical values
(e.g., size of an e-mail) or categorical values (position of an e-mail receiver in a
company organizational structure). When representing text, it is common to assign
a feature to each word in a document collection and represent each document as a
vector reflecting word frequencies (Mladenic 2007). When representing images,
features can correspond to pixels on an image.
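The word-frequency representation of text described above can be sketched as follows. This is a minimal illustration using only the Python standard library: the three example e-mails are invented, and tokenisation is reduced to lower-casing and splitting on whitespace, which is an assumption rather than the preprocessing used in the chapter.

```python
from collections import Counter


def build_vocabulary(documents):
    """Assign one feature index to each distinct word in the collection."""
    words = sorted({w for doc in documents for w in doc.lower().split()})
    return {word: index for index, word in enumerate(words)}


def to_vector(document, vocabulary):
    """Represent a document as a vector of word frequencies."""
    counts = Counter(document.lower().split())
    vector = [0] * len(vocabulary)
    for word, count in counts.items():
        if word in vocabulary:
            vector[vocabulary[word]] = count
    return vector


# Hypothetical e-mail texts.
emails = ["cheap pills cheap offer", "project meeting agenda", "meeting offer"]

vocabulary = build_vocabulary(emails)
vectors = [to_vector(e, vocabulary) for e in emails]
```

Each e-mail becomes a point in a space with one dimension per distinct word, which is why realistic collections quickly lead to very high-dimensional representations.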
The number of features also varies depending on the data and task. In some cases
the original representation of the data is in a very high dimensional space and some
dimensionality reduction needs to be applied in the pre-processing steps of the data
analysis (Mladenic 2006).
In addition, data points may or may not have labels (e.g., an e-mail can be
categorized as legitimate or spam). Tasks that need labels call for
supervised machine learning techniques (e.g., classification) for modeling the
data, while tasks that do not depend on labels call for unsupervised
machine learning techniques (e.g., clustering).
Following the previous example on e-mail filtering, we can have a collection of
e-mails that we would like to organize into folders assuming no folders exist.
Representation of e-mails using feature vectors can be similar to that for the
classification task, just that the output is a model grouping similar e-mails together.
These can be further used for classifying new e-mails as belonging to one of the
identified groups.
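The two steps in this example, grouping e-mails without labels and then assigning a new e-mail to one of the discovered groups, can be sketched with a plain k-means procedure. The choice of k-means, the two-dimensional toy feature vectors and the fixed initial centroids are assumptions made to keep the sketch short and deterministic.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda i: squared_distance(point, centroids[i]))


def k_means(points, initial_centroids, iterations=10):
    """Plain k-means: alternate between assignment and centroid update."""
    centroids = [list(c) for c in initial_centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


# Hypothetical e-mails described by the counts of two words.
points = [[5, 0], [4, 1], [0, 5], [1, 4]]
centroids = k_means(points, initial_centroids=[points[0], points[2]])

# A new e-mail is classified into the group with the closest centroid.
new_email_group = nearest([3, 1], centroids)
```

The learned centroids play the role of the model: they summarise each folder and are all that is needed to place new e-mails.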
The latter is similar to the scenario that we have followed in modeling context.
First, unsupervised machine learning techniques are used for context discovery,
based on the history of users’ activity. Then the current user activity is classified
into contexts and related resources are recommended for probable use in the near
future.
The rest of this section describes machine learning techniques that are directly
relevant for understanding context and process.
7.2.2 Multi-relational Clustering
The problem of identifying parts of data collections relevant for each context can be
approached as a multi-relational clustering problem. The motivation behind using
relational clustering for context modeling is the fact that a domain data model of
knowledge work can be modeled as a graph with multiple types of nodes: the
knowledge workers themselves are one type, the information resources they are
using are another, and the events that represent the uses of the resources are another.
Since the different nodes are also of distinct types, we cannot make the assumption
of identical and independent distribution of features, which is implicitly assumed
when using classic clustering techniques which are agnostic of node type (Banerjee
et al. 2007). For instance, assume that we have three tables. (1) A table of people
working on the contexts, giving some basic information on each person. (2) A table
of documents written in the contexts, having text of each document and a list of its
authors. (3) A table of events recording that a person has accessed a document at
some time (related to a table of people and a table of documents). In general, one
can assume that knowledge workers, while working on one project, switch between
different tasks related to the project and that each task consists of a set of actions
performed in a more or less fixed order. With multi-relational clustering we would
like to cluster people, documents and events so that events related to the same
project are clustered together and events related to the same action are clustered
together.
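The three tables in this example can be read as a graph with typed nodes, which is the input a multi-relational clustering method works on. The sketch below only builds that graph and does not perform the clustering itself. All field names and the sample records are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    person_id: str
    name: str


@dataclass(frozen=True)
class Document:
    doc_id: str
    text: str
    author_ids: tuple


@dataclass(frozen=True)
class Event:
    event_id: str
    person_id: str
    doc_id: str
    timestamp: int


def build_graph(documents, events):
    """Build an undirected graph over typed nodes such as ("person", "p1")."""
    graph = defaultdict(set)

    def link(a, b):
        graph[a].add(b)
        graph[b].add(a)

    for d in documents:
        for author in d.author_ids:
            link(("document", d.doc_id), ("person", author))
    for e in events:
        link(("event", e.event_id), ("person", e.person_id))
        link(("event", e.event_id), ("document", e.doc_id))
    return graph


people = [Person("p1", "Ana"), Person("p2", "Bob")]
documents = [Document("d1", "validation and testing plan", ("p1",))]
events = [Event("e1", "p2", "d1", timestamp=1)]

graph = build_graph(documents, events)
```

Keeping the node type in every key makes explicit that people, documents and events come from different distributions and should not be treated as interchangeable data points.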
Besides choosing an appropriate algorithm to fit the data model, another impor-
tant aspect of context discovery is selecting appropriate features (Stajner et al.
2010). This varies from domain to domain, but unless we have supervised examples
to do feature selection from, we can resort to the following guidelines: at the data
level, we can distinguish between different contexts by the different content keywords,
resources and people involved in the events. This means that context discovery
uses literal names and affiliations of people as features, as well as the literal
contents of the document, since these are the features we have determined to be
important for describing a context.
7.2.3 Semi-supervised Clustering
While clustering has proven to be effective in exploratory and large-scale
context discovery, context modeling for the enterprise environment in some cases
imposes stricter standards of accuracy and context quality than unsupervised
methods can deliver. From the usability point of view, the semi-supervised aspect
also lets the knowledge worker feel more in control of the whole context
definition process, delivering overall better quality and more meaningful clusters.
In recent years, there has been a lot of interest in the development of semi-
supervised methods for clustering data which accommodate supervision from the
user. Most often, such supervision is in the form of constraints (Basu et al. 2008),
for instance using must-link and cannot-link constraints over pairs of data points
(Wagstaff and Cardie 2000) or cluster-level balancing and size control (Davidson
and Ravi 2005). However, instance-level models for supervision require that the
human supervisor understands and visualizes the whole collection of data points in
its entirety, which is rarely practical. On the other hand, cluster-level models may
be too coarse a tool for guiding the clustering in a desired direction.
As a complementary supervision technique, we can also apply feedback from the
user as conditional constraints (Dubey et al. 2010). For instance, instead of asserting
the exact pair-wise linkage constraints between data points, the user can give
assignment feedback, where he reassigns a data point to a more relevant cluster,
or alternatively, cluster description feedback, where the user modifies a cluster’s
feature vector to a more meaningful description.
These constraints are commonly implemented as feedback loops that gather
supervision data from the user in iterative steps. They can also be integrated
into the k-means framework as an additional criterion for minimizing constraint
violation while still maintaining low error. Experiments in (Dubey et al. 2010)
conclude that cluster-level assignment and description feedback enables faster
convergence and better results than pair-wise cannot-link and must-link constraints.
Semi-supervised clustering can be applied to context mining both as
assignment feedback and as cluster description feedback. When the user explicitly
assigns an information resource to a context, we can interpret that as assignment
feedback. Cluster description feedback, on the other hand, has been implemented
in a different fashion: instead of letting the user change the context description,
we let the user decide with a yes-no question whether the context is relevant.
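Assignment feedback inside a k-means loop can be illustrated as follows. This is a simplified sketch and not the method of Dubey et al. (2010): instead of adding a constraint-violation term to the objective, it treats each user reassignment as a hard constraint and pins the data point to the cluster the user chose. The one-dimensional-like toy data and the function name are assumptions.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def constrained_k_means(points, initial_centroids, pinned, iterations=10):
    """k-means in which user-reassigned points stay in the cluster the user chose.

    `pinned` maps a point index to the cluster index given as assignment feedback.
    """
    centroids = [list(c) for c in initial_centroids]
    assignment = [0] * len(points)
    for _ in range(iterations):
        for i, p in enumerate(points):
            if i in pinned:
                assignment[i] = pinned[i]  # honour the user's feedback
            else:
                assignment[i] = min(
                    range(len(centroids)),
                    key=lambda c: squared_distance(p, centroids[c]),
                )
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


# Hypothetical resources; the third one lies between the two groups.
points = [[0, 0], [1, 0], [4, 0], [10, 0], [11, 0]]
start = [[0, 0], [10, 0]]

unsupervised, _ = constrained_k_means(points, start, pinned={})
# The user reassigns the third resource (index 2) to the second context (index 1).
with_feedback, _ = constrained_k_means(points, start, pinned={2: 1})
```

A single reassignment moves the borderline resource and also shifts the centroid of the receiving cluster, so later unpinned points are judged against the corrected context.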
7.2.4 Sequence Mining
Various flavors of Markov models, such as probabilistic deterministic finite
automata (Jacquemont et al. 2009), have been used for modeling processes, assuming
that a process can be seen as a collection of sequences. Process modeling can then
be addressed as identifying sequences in the data and connecting them into a
meaningful process model. The general idea behind this is that by learning a
conditional probability model for the sequences, we are able to implement
predictive functionality in the knowledge worker’s workspace. In practice, there are
several applications where predictive functionality based on sequence mining can
be used (Dong and Pei 2007). The first example is suggesting the most likely
information resource for a given moment. For this application, we look at the
history of the knowledge worker’s resource usage – internet browsing history,
opening various documents, communication via e-mail. From that, we represent
these individual events as actions, giving them a level of abstraction. For instance,
some deliverable document edits can be seen as edit-project-document actions. The
process model then captures the sequential dependencies between these events and
gives us a conditional probability of any resource being required at the current time,
given that we are aware of the immediate history of information resource usage. For
instance, imagine that we are editing a project-related document (such as a
deliverable). The model can then suggest that, of all possible future actions, we
will likely send a message to a project partner soon or open a similar project-related
document. The advantage that this prediction provides is that we can maintain a list
of the most likely resources at a given time, reducing the overhead of having to
look for them.
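The idea of a conditional probability model over action sequences can be sketched with a first-order Markov model estimated from transition counts. This is a minimal illustration rather than the model used in the chapter: the action names `open-project-document` and `browse-web` and the three example histories are invented, while `edit-project-document` and `e-mail-to-project-partner` follow the examples in the text.

```python
from collections import Counter, defaultdict


def train_first_order(sequences):
    """Count how often each action is directly followed by each other action."""
    transitions = defaultdict(Counter)
    for sequence in sequences:
        for current, following in zip(sequence, sequence[1:]):
            transitions[current][following] += 1
    return transitions


def predict_next(transitions, current, top_n=3):
    """Return the most likely next actions with their conditional probabilities."""
    counts = transitions.get(current, Counter())
    total = sum(counts.values())
    return [(action, count / total) for action, count in counts.most_common(top_n)]


# Hypothetical usage histories of one knowledge worker.
history = [
    ["edit-project-document", "e-mail-to-project-partner", "browse-web"],
    ["edit-project-document", "e-mail-to-project-partner"],
    ["edit-project-document", "open-project-document"],
]

transitions = train_first_order(history)
suggestions = predict_next(transitions, "edit-project-document")
```

The ranked list returned by `predict_next` corresponds to the short list of likely resources that can be kept at hand in the workspace.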
Figure 7.1 shows an example of such a sequence-based resource suggestion,
showing two distinct sets of recommendations. Furthermore, it demonstrates inte-
gration of both context and sequence mining – besides sequence information, we
also favor resources which are relevant for the current context.
Fig. 7.1 Different sets of resource predictions, the first predicting most likely resources after
opening a “validation and testing” document, and the second after opening a document with “sdk”
technical documentation
When designing a sequence mining application, there are several different
aspects of the problem that require consideration in choosing a good model for
implementing it.
The first question that needs to be answered is the relationship between events
and actions: should we define them atomically as one-to-one, or can a single event
represent multiple actions? For instance, an ordinary email message may in some
domains only be an e-mail-to-project-partner, but in other domains, it may at the
same time be the action of e-mail-proposal-to-key-client. This requires us to model
a single event with multiple features, which is a different class of sequence models.
A second consideration is on the order of the model – a first-order model only
captures direct sequential dependencies, while a higher-order model can predict
more interesting and longer sequences, similar to what we can produce with n-gram
language modeling.
A third consideration is on the definition of “sequence” itself. Do we consider
the sequence to be strict, as in A-strictly-follows-B, or do we impose a less strict
A-before-B?
All of these variables affect the complexity of the model and impose constraints
on the scalability of the approach while giving us richer models.
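The difference between the strict and the relaxed definition of a sequence, and its effect on model size, can be made concrete by counting the pairs each definition produces. The function names and the three-action example are hypothetical.

```python
from collections import Counter


def strict_pairs(sequence):
    """Pairs in which the second action directly follows the first."""
    return Counter(zip(sequence, sequence[1:]))


def before_pairs(sequence):
    """Pairs in which the first action occurs anywhere before the second."""
    pairs = Counter()
    for i, first in enumerate(sequence):
        for second in sequence[i + 1:]:
            pairs[(first, second)] += 1
    return pairs


sequence = ["A", "C", "B"]
strict = strict_pairs(sequence)
relaxed = before_pairs(sequence)
```

For a sequence of length n the strict definition yields n - 1 pairs, while the relaxed one yields n(n - 1)/2, which is one way the richer definition puts pressure on scalability.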
7.3 Context and Process
Context as addressed in this chapter is used as a term for grouping/packaging
information for a particular need. A criterion for selecting or prioritizing
information from a broader pool of information can be called a contextual model. We
can say that a good contextual model is one which selects the right information for
a particular need; in some cases that may be related to personalized information
delivery.
Process as addressed in this chapter is seen as a sequence of tasks or activities
performed in order to achieve some goal. More about knowledge processes is
described in Chap. 6. Here we focus on using machine learning techniques for
automatically grouping the actions of the user into tasks and detecting common
sequences of tasks that can occur independently of the context.
In the research literature (Guo and Sun 2003; ECAI 2006; VUT 2007) we
can find different views on context depending on the aspect that is observed
and the purpose of using context. For instance, in Wikipedia context analysis
(http://en.wikipedia.org/wiki/Context_analysis) is described as a method to analyze
the environment in which a business operates. Depending on the aims of observing
context, context can be defined from a formal point of view, in terms of its
dynamics, scenarios of its usage, type of data it is modeled from, its usage/user
base, importance of efficiency when talking about context, acquisition of context, or
evaluation of context. Examples of context characteristics along several dimensions
are the following:
• Formalism: logic based versus probabilistic descriptions
• Dynamics: static versus temporal/dynamic contexts
• Scenarios: global model versus multiple local models
• Cross modal: modeling contexts across different data modalities (text,
multilinguality, social networks, audio, images, video, sensors)
• User aspects: context for single user versus user communities versus contexts
for machine processing
• Efficiency: expressivity versus scalability
• Acquisition: manual versus semi-automatic versus automatic approaches
• Best practices: economy of different application scenarios
• Evaluation: information compression (not really addressed yet)
7.3.1 Representation
Context representation as proposed in this chapter is built on top of the
Time-Network-Text (TNT) store (Grobelnik et al. 2009), which enables capturing and
storing the data for further processing using context mining and task mining
approaches. The context mining capabilities of the TNT store aim at identifying
the main context of the knowledge worker from a streamed log of events. Contexts
are represented as abstractions of the TNT events, where each event has three
components: Text, Network, and Time, as defined by the TNT model. Notice that
several TNT events can overlap in the content of their components. For instance,
a person sending five e-mail messages appears in five TNT events as a sender and
that can be helpful in identifying context and tasks.
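An event with the three components named above can be sketched as a small record type. The internal structure chosen here, in particular representing the Network component as (role, entity) pairs, is an assumption for illustration and not the actual schema of the TNT store. The people and message texts are invented.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class TNTEvent:
    text: str        # Text: textual content, e.g. subject and body of an e-mail
    network: tuple   # Network: (role, entity) pairs, e.g. ("sender", "ana")
    time: datetime   # Time: when the event happened


# One person sending five e-mail messages yields five events that share
# the same sender in their Network component.
events = [
    TNTEvent(
        text=f"status update {i}",
        network=(("sender", "ana"), ("receiver", "bob")),
        time=datetime(2010, 11, 24, 9, i),
    )
    for i in range(5)
]

sender_counts = Counter(
    entity for e in events for role, entity in e.network if role == "sender"
)
```

Overlaps of this kind, where the same entity recurs across events, are what later steps can exploit when grouping events into contexts and tasks.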
The representation of TNT events captures information about data objects
involved in the TNT event and it is used as a basis for context representation.
Basing context representation on data objects, we examine the characteristics of the
data objects that are available within an enterprise environment. In the proposed
context definition, each data object (e.g. documents, email, etc.) is represented as a
vector of a potentially large number of features. These features encode several
properties of the data; the following are the most important:
• Meta data about the application taking part in the event, email metadata,
document metadata
• Social graph of people, organizations, roles, social relations, entities in the
documents
• Terms from documents such as bag of words, named entities, concepts,
annotations
• Temporal information such as absolute and relative time, day of the week, part
of the day, seasons
• Relational information from ontologies or extracted relationships
The underlying data structure for the input data is defined in a general way to
capture the diversity of input data sources, to capture temporal dynamics of data, to
enable automatic building of models, to be guided by users’ suggestions if available
and, last but not least, to be scalable. The data structure we use to support the
above objectives is a dynamic network with extra information on nodes and links,
as detailed in the description of the TNT store (Grobelnik et al. 2009).
7.3.2 Modelling
When considering methods for mitigating information overload of knowledge
workers, a common approach is partitioning the space of information objects
into several contexts. For our purposes, we define each context as a
grouping of information or data objects for a particular need. We refer to this
grouping as a context model. A knowledge worker can resort to manual context
model definition or modeling methods, based on knowledge management tools.
In an actual enterprise setting, the effort required for manual context model defini-
tion may often outweigh the benefits provided by partitioning information resources
in contexts. To lower the barrier to entry to exploiting the benefits of contextual
assistance applications, we can also consider using machine learning methods.
More formally, we define a context model as a function in a form for
classification, $\mathit{Context}_i = \mathit{ContextModel}(\mathit{DataObject}_j)$,
in a form for membership queries,
$\mathit{Belonging} = \mathit{ContextModel}(\mathit{Context}_i, \mathit{DataObject}_j)$,
or in the form of a scoring function,
$\mathit{BelongingScore} = \mathit{ContextModel}(\mathit{Context}_i, \mathit{DataObject}_j)$,
where $\mathit{DataObject}_j$ is in our case the data on a TNT event. As already
described, we assume that ContextModel can be
• Trained automatically from pre-labeled data (supervised learning, e.g. classifi-
cation algorithms like SVM)
• Trained semi-automatically from partially labeled data in user interactive
processes (semi-supervised learning, e.g. active learning algorithms like uncer-
tainty sampling)
• Discovered from unlabeled data (unsupervised learning, e.g. clustering
algorithms like K-means)
• Defined manually, for instance as a set of rules
Different approaches can be used that explore the trade-offs on varying amounts
of control that we allow the user to have over the context modeling process versus
the amount of modeling effort required, as well as describing possible design
decisions for developing solutions based on context models.
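The three forms of the context model function defined above (classification, membership query and scoring function) can be sketched with a single class. The representation of each context as a centroid feature vector, the use of cosine similarity as the score and the membership threshold are all assumptions made for illustration. Any of the training approaches listed above could produce such centroids.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class ContextModel:
    def __init__(self, centroids, threshold=0.5):
        self.centroids = centroids    # context name -> feature vector
        self.threshold = threshold    # hypothetical membership cut-off

    def score(self, context, data_object):
        """Scoring form: BelongingScore = ContextModel(Context_i, DataObject_j)."""
        return cosine(self.centroids[context], data_object)

    def belongs(self, context, data_object):
        """Membership form: Belonging = ContextModel(Context_i, DataObject_j)."""
        return self.score(context, data_object) >= self.threshold

    def classify(self, data_object):
        """Classification form: Context_i = ContextModel(DataObject_j)."""
        return max(self.centroids, key=lambda c: self.score(c, data_object))


# Hypothetical contexts over a two-dimensional feature space.
model = ContextModel({"proposal": [1, 0], "testing": [0, 1]})
event_features = [3, 1]

best_context = model.classify(event_features)
in_testing = model.belongs("testing", event_features)
```

The scoring form is the most general of the three: thresholding it gives the membership query, and taking the best-scoring context gives the classification.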
7.3.3 Deployment
Once we obtain an appropriate context model, we can deploy it for supporting the
user in her activity. When the context model is available, machine learning
techniques can be used to detect the user’s current contexts. Once the knowledge
worker starts doing something unrelated to his current context, for instance opening
a document related to another project, a shift is detected and the system may suggest
a switch to a different context – one which better reflects the user’s recent activity.
If the user then accepts this switch, her workspace is then put in another context.
There are several key points that should be addressed in the whole process of
context discovery and detection to make this functionality more usable.
First, experience suggests that users expect tightly integrated top-down and
bottom-up context definition, meaning that all information that they supplied to the
system should be also integrated in the discovery process. Therefore, newly discov-
ered contexts take into account all previous definitions and associations of contexts,
to avoid suggestion of a new context very similar to some existing one. This aspect
led us to improve on scalable online semi-supervised clustering algorithms which
we applied to context mining. Second, we found that correctness of context detection
was critical to knowledge workers, as false positive suggestions disturb their
workflow – an issue we wanted to avoid in the first place. As it turns out, when a
user associates (or does not associate) a resource with a context, it means much more
than a user accessing the information resource when in that particular context.
Therefore, fully supervised models are now used for context switch detection. In
terms of user involvement, this means that upon a successful context discovery, the
user is presented with a list of possible resources, out of which he selects the ones
which are relevant for learning a model for that context.
To summarize – for context discovery, which has a more exploratory nature,
semi-supervised methods are very practical in terms of suggesting interesting
relevant contexts with very little manual work. However, when actually supporting
knowledge workers in real-time by suggesting context switches (context detection),
highly accurate methods with stricter supervision are crucial.
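Because false positive suggestions disturb the workflow, a detector can be made deliberately conservative. The following is a minimal sketch of one way to do this, not the detection method used in the system: a switch is suggested only when the last few events are all classified, with high confidence, into the same context other than the current one. The class `SwitchDetector`, the window size and the confidence cut-off are assumptions; `classify` stands for any supervised model that returns a context and a confidence.

```python
from collections import deque
from typing import Callable, Optional


class SwitchDetector:
    """Suggest a context switch only after sustained, confident evidence."""

    def __init__(self, classify: Callable[[str], tuple[str, float]],
                 window: int = 3, min_score: float = 0.8):
        self.classify = classify            # returns (context, confidence)
        self.recent = deque(maxlen=window)  # contexts of the latest events
        self.min_score = min_score          # assumed confidence cut-off
        self.current: Optional[str] = None

    def observe(self, event: str) -> Optional[str]:
        """Return a suggested context, or None if no switch is warranted."""
        context, score = self.classify(event)
        # Low-confidence predictions are stored as None so they break a run.
        self.recent.append(context if score >= self.min_score else None)
        full = len(self.recent) == self.recent.maxlen
        unanimous = len(set(self.recent)) == 1 and self.recent[0] is not None
        if full and unanimous and self.recent[0] != self.current:
            return self.recent[0]
        return None

    def accept(self, context: str) -> None:
        """Called when the user accepts the suggested switch."""
        self.current = context
        self.recent.clear()
```

A larger window or a higher cut-off lowers the number of false suggestions at the cost of reacting later to a genuine switch.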
7.4 Example Application: Context Mining on Emails
As a demonstration of context mining let us consider the domain of emails. In the
current technological era, emails definitely play a big part in our everyday life.
A May 2009 report by the research company The Radicati Group (Radicati 2009)
estimates that there were 1.4 billion email users in 2009 and the number is expected
to rise to 1.9 billion by 2013. The same source also suggests that there are 247
billion emails sent each day (note that the majority of these emails are spam, which
is automatically blocked by spam filters) and predicts that the traffic will double to
507 billion emails per day by 2013. These predictions are definitely concerning,
since people are already overloaded with emails that they have to process, organize
and make sense of.
Despite email’s popularity, email clients unfortunately have not changed significantly
since the beginnings of email in the 1970s and still provide only the basic
functionality. The user interface of a state-of-the-art email client, such as Microsoft
Outlook 2010, displays a list of email folders on the user’s account, a list of emails
and a reading pane. Email clients typically allow the user to sort the list of emails by
different criteria, such as the date or sender of the email, but there are no ways to
display emails related to a specific context. If the user would like, for example, to
see a list of emails that he recently exchanged with other partners on a certain
project then he basically has two options. He can either search the whole list of
emails and filter out irrelevant emails or he can create and maintain a special email
folder where he puts all emails related to that group of people. Both options are very
inefficient since they require the user to do all the work manually.
To help users be more efficient when handling emails we developed an
add-in for Microsoft Outlook called Contextify (Leban and Grobelnik 2010). It is
intended for heavy email users who receive tens of emails per day and possibly
work on several projects simultaneously. The main features that are currently
supported and will be described are: display of information related to the currently
selected email, social network visualization, visualization of discussion threads,
contact identity management, recipient suggestion, Facebook and LinkedIn inte-
gration, people and email tagging, and automatic folder suggestion. Contextify
supports Microsoft Outlook 2007 and 2010 and is freely available at
http://contextify.net.
7.4.1 Displaying Contextual Information in the Contextify Sidebar
The Contextify add-in displays information in different windows. One of the
windows is the Contextify sidebar which is placed at the right side of Microsoft
Outlook, next to the email list and the reading pane. Examples of the sidebar are
displayed in Fig. 7.2. The sidebar is updated each time the user
selects a new email and displays content related to the sender of the currently
selected email. The top of the sidebar shows the sender’s personal information (Marko
Grobelnik in Fig. 7.2) such as phone numbers, job position, address, photo, etc. The
main part of the sidebar contains a tab control that displays recent emails sent to or
from this person and information extracted from these emails.
The first tab in the tab control contains the list of recent emails (see Fig. 7.2a).
For each email we display the sender’s name, the email’s subject and a short snippet
of the email body. For emails that are a part of a thread we also display a number
indicating the number of emails in the thread. If the user would like to read the
whole thread, he can select the “Show Thread” option from the popup menu and
the content of the whole thread will slide into view. Below the list of emails there is
also a tag cloud displaying the keywords extracted from the content of the displayed
emails. These keywords can help us to quickly recognize which topics are
discussed in the emails.
Fig. 7.2 Examples of the Contextify sidebar showing the list of related emails (a), attachments
(b), social network (c) and web links (d), and Contextify searching on the fly while the user is
typing a query (e)
The other three tabs in the tab control display information extracted from the
emails displayed in the first tab. The second tab in the control (see Fig. 7.2b)
displays the attachments exchanged in the emails. When different people collaborate
on a document they often exchange it several times while working on it. To be
able to easily see the whole history of a document, Contextify groups files with the
same filename and displays them all in the same sub-tree. Each attachment can be
opened or saved by clicking it. The third tab (see Fig. 7.2c) displays people who are
participating in the displayed emails. People are grouped based on their company
which is deduced from, e.g., e-mail address or LinkedIn account (see Sect. 7.4.4).
Clicking a person updates the sidebar with contextual information regarding this
person. The last tab in the control (see Fig. 7.2d) displays the web links that were
exchanged in the emails.
The Contextify sidebar also provides search capabilities as shown in Fig. 7.2e.
When the user enters the query in the sidebar’s search box, emails that match the
query are found and information extracted from these emails is displayed in the tab
control. Contextify currently supports three types of queries. The first type is the
person query where we search for information related to a particular contact (this
type of query is also performed each time we select a new email in Outlook). The
second type is the keyword query where we display information about emails that
contain all keywords that the user specified. The last type of query is a tag query.
The user is able to define custom tags and assign them to specific contacts. An
example where this can be especially useful is when the user would like to be able
to see emails from all people who work on a particular project. In such cases a tag
with the project name could be created and assigned to the members of the project.
When the user then performs a tag search, emails from all the members would be
displayed in the sidebar.
In the examples described we determined the to-be-displayed contextual infor-
mation based on the contact names, tags or search keywords. Alternatively, we
could also use similarity measures or machine learning techniques to find related
emails. Imagine, for example, that we are a person providing customer support.
Each day we receive tens of questions regarding the use of different features of our
product and we have to provide answers to these questions. Since it is likely that
different people have similar questions we can use answers to old questions at least
as a template for answering new questions. In such a scenario we could use a
similarity measure (such as the cosine similarity) and compare a new question with
the previous ones. The computed similarity could be used to rank the questions and
the most similar questions could be listed in the sidebar together with the provided
answer. Such contextualization would be extremely useful for the user as it would
save him countless hours of manually searching through the emails.
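The customer support scenario can be sketched in a few lines. The example below assumes a plain bag-of-words representation and an archive of (question, answer) pairs; the function names and the tokenization are illustrative choices, and a real deployment would more likely use TF-IDF weighting and stop-word removal.

```python
import math
import re
from collections import Counter


def vectorize(text: str) -> Counter:
    """Bag-of-words term counts over lower-cased alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) \
        * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_answers(new_question: str, archive: list[tuple[str, str]], top_k: int = 3):
    """Return the top_k (score, question, answer) triples, most similar first."""
    query = vectorize(new_question)
    scored = [(cosine_similarity(query, vectorize(q)), q, a) for q, a in archive]
    return sorted(scored, key=lambda item: item[0], reverse=True)[:top_k]
```

The ranked list returned by `rank_answers` corresponds to what would be listed in the sidebar, with the stored answer serving as a template for the reply.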
7.4.2 Advanced Searching and Filtering of Conversations
More advanced search and visualization functionality is provided in the Contextify
Dialog. The user can use the search box to specify a complex query consisting of a
combination of person names, keywords and tag names. Emails that match the
query are then displayed and grouped into threads based on the email’s subject.
Along with the individual emails in the thread we also extract and display keywords
for each thread which makes it easy to identify the content of the thread. An
example of the Contextify Dialog is displayed in Fig. 7.3.
The bottom left part of the dialog displays email activity (for the given query)
over time. Each bar represents the number of emails received in a particular time
span. The visualization is interactive – the user can select a subset of bars to display
only those emails that were sent in the selected time period.
The right side of the dialog contains a tab control. The first tab displays the social
network of people who participate in the displayed emails. Nodes of the graph
represent people while the edges indicate that there were emails exchanged between
the two connected persons. Font size and color are also used to indicate the intensity
of communication – people who sent more emails have nodes displayed using a
larger font and the darkness of the edge between two persons corresponds to the
number of exchanged emails between them. This helps to quickly identify who are
the main participants in the discussions and who they frequently communicate with.
The social network is also interactive. By selecting a contact all emails from that
contact are highlighted in the list of emails. By double-clicking a contact you can
even hide all email threads where the contact is not participating.
Fig. 7.3 Contextify Dialog where a search was performed using a contact’s name. Since Marko is
selected in the social graph all emails from him are highlighted in the list of emails
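The data behind such a social network view can be derived with two counters. This is a sketch under the assumption that each email is available as a dictionary with a `from` address and a list of `to` addresses; these field names are hypothetical and not Contextify's internal format.

```python
from collections import Counter


def build_social_graph(emails: list[dict]) -> tuple[Counter, Counter]:
    """Return (emails sent per person, emails exchanged per unordered pair)."""
    sent = Counter()
    exchanged = Counter()
    for email in emails:
        sender = email["from"]
        sent[sender] += 1  # node weight: would drive the font size
        for recipient in set(email["to"]):
            if recipient != sender:
                # A frozenset key makes the edge undirected, so A->B and
                # B->A add to the same count (would drive edge darkness).
                exchanged[frozenset((sender, recipient))] += 1
    return sent, exchanged
```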
7.4.3 Visualization of Discussion Threads
When we collaborate with a group of people it is common to have long email
discussions on a particular subject, often consisting of tens of emails. Such
discussions are typically very difficult to follow since it is hard to see who replied
to which email. What is even worse is that such discussions often have several
backchannels in which different subgroups of people are participating.
To help users more easily understand and follow the discussions we provide a
thread visualization feature. It is displayed in the Thread View tab in the Contextify
dialog. As mentioned, Contextify treats emails that have the same subject (after
removing prefixes such as “Re:”, “Fwd:”) as belonging to the same discussion
thread. When a message thread is selected, emails in the thread are displayed as
boxes in a flow diagram. The diagram starts at the top left corner with the first email
in the thread. The reply to an email is typically drawn under the previous email.
Exceptions to this rule are emails that start a new backchannel – either by adding
new participants or by removing some existing ones. In such cases, a new vertical
branch is created and all following emails with this specific group of participants
are then displayed in this branch. Boxes for emails contain the sender’s name,
attachments if they exist and a snippet of the email’s content. For emails that started
a new backchannel we also list the contacts that were added (in green color with a
“+” sign), contacts that were removed (in red color with a “−” sign) and the existing
contacts that are kept.
In Fig. 7.4a we see an example of a visualization of a relatively simple discussion
with 13 emails. We can see that the discussion was started by Tadej Stajner who sent
an email to Gregor Leban. Gregor replied and on the next day also sent an email to
Marko Grobelnik in which he removed Tadej from the discussion (hence the new
branch). All emails between Gregor and Marko are then displayed in the second
branch. For threads that are very complex with many backchannels, such as the one in
Fig. 7.4b, you can also use the zooming feature to get an overview of the discussion.
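The grouping and branching rules described in this section can be sketched as follows. The sketch assumes emails are dictionaries with `subject`, `from` and `to` fields, sorted by time, and it compares each email with the previous one in the thread rather than with the email it actually replies to; both are simplifying assumptions, and the function names are illustrative.

```python
import re
from collections import defaultdict

# One or more leading "Re:" / "Fw:" / "Fwd:" prefixes, in any letter case.
PREFIX = re.compile(r"^\s*((re|fwd?)\s*:\s*)+", re.IGNORECASE)


def normalize_subject(subject: str) -> str:
    """Strip reply/forward prefixes so all emails of a thread share one key."""
    return PREFIX.sub("", subject).strip().lower()


def group_threads(emails: list[dict]) -> dict[str, list[dict]]:
    """Group emails by normalized subject, keeping their original order."""
    threads = defaultdict(list)
    for email in emails:
        threads[normalize_subject(email["subject"])].append(email)
    return dict(threads)


def assign_branches(thread: list[dict]) -> list[tuple[int, set, set]]:
    """For each email return (branch index, added contacts, removed contacts)."""
    branches: list[frozenset] = []  # participant group that defines each branch
    layout = []
    previous = None
    for email in thread:
        people = frozenset([email["from"], *email["to"]])
        if people in branches:
            # Same group as an existing branch: continue that branch.
            layout.append((branches.index(people), set(), set()))
        else:
            # A new participant group opens a new backchannel (branch).
            branches.append(people)
            added = set(people - previous) if previous is not None else set()
            removed = set(previous - people) if previous is not None else set()
            layout.append((len(branches) - 1, added, removed))
        previous = people
    return layout
```

Applied to a thread like the one in Fig. 7.4a, the email in which Tadej is dropped and Marko is added would be the first entry with branch index 1.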
7.4.4 Identity Management
Most people own several email accounts. When searching for emails from a specific
person it is therefore important that we are able to show emails from all accounts. In
order to be able to do that, Contextify provides a Contact Management Dialog
where different email accounts that belong to the same person can be merged
together. An example of the dialog is displayed in Fig. 7.5.
Fig. 7.4 Two examples of the thread visualization: (a) a simple thread with 13 emails and (b) a
longer thread with 31 emails and many backchannels. The visualization in part (b) is zoomed out
in order to provide a general overview of the thread
Merging of contacts can be done manually or automatically. For automatic
merging Contextify uses the contact’s email address and name as provided in the
emails. This information is first preprocessed: we remove all non-letter
characters and try to obtain words that represent the contact’s name and surname.
A matching algorithm is then used that identifies contacts that are similar enough
and should be merged. A more advanced approach for merging contacts would be to
also analyze the similarity of their social networks. If two email addresses
belong to the same person then it is likely that both email addresses will be used to
communicate with the same group of people.
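The preprocessing and matching steps for automatic merging could look roughly like the sketch below. The matching rule (two accounts are merged when they share at least two name words) is an assumption made for illustration; the text does not specify the matching algorithm that Contextify uses, and the sample addresses in the usage are hypothetical.

```python
import re


def name_tokens(display_name: str, address: str) -> frozenset:
    """Words that plausibly form the contact's name and surname."""
    local_part = address.split("@")[0]
    # Replace every run of non-letter characters with a space.
    text = re.sub(r"[\W\d_]+", " ", f"{display_name} {local_part}".lower())
    # Single letters (initials) are too ambiguous to match on.
    return frozenset(word for word in text.split() if len(word) > 1)


def merge_contacts(contacts: list[tuple[str, str]], min_shared: int = 2) -> list[set]:
    """Greedily group (display name, address) pairs that share name words."""
    groups: list[tuple[set, set]] = []  # (name words seen, addresses)
    for name, address in contacts:
        tokens = name_tokens(name, address)
        for seen, addresses in groups:
            if len(seen & tokens) >= min_shared:
                seen |= tokens          # extend the group's known name words
                addresses.add(address)
                break
        else:
            groups.append((set(tokens), {address}))
    return [addresses for _, addresses in groups]
```

For example, `("Ana Novak", "ana.novak@example.org")` and `("", "novak.ana77@example.com")` both reduce to the words `ana` and `novak` and would therefore be merged.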
To obtain additional information about the contacts, Contextify can also connect
to the user’s Facebook and LinkedIn accounts. This information is then displayed at the
top of the sidebar. Along with contact information, Contextify also manages
information about the companies associated with the contacts. When several
contacts specify on LinkedIn an association with the same company, Contextify
tries to infer which domain is associated with the company by inspecting the
email accounts of these contacts. If, for example, five of my contacts specify on
LinkedIn an association with Jozef Stefan Institute, I can discover from their email
accounts that the associated domain is most likely ijs.si. Using this information we
can now associate with Jozef Stefan Institute (JSI) also other contacts with email
accounts from this domain, even if they don’t manually specify this association.
Having contacts associated with companies is useful because company names
represent special tags that can be used when performing a search. Searching for
the JSI tag in the Contextify dialog, for example, displays message threads that
contain people who work at Jozef Stefan Institute (JSI).
Fig. 7.5 The Contact Management Dialog that can be used to manage information about contacts
and companies
The user can use the Contact Management Dialog also to create custom tags and
associate them with specific contacts. These tags can be then used when performing
a search in the sidebar or the Contextify Dialog.
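The inference of a company's email domain described earlier in this section amounts to a majority vote. The sketch below assumes that the addresses of all contacts who list the same company on LinkedIn are already collected; the function names and the majority threshold are illustrative assumptions.

```python
from collections import Counter
from typing import Optional


def infer_company_domain(addresses: list[str], min_share: float = 0.5) -> Optional[str]:
    """Most common email domain among contacts who list the same company."""
    domains = Counter(a.rsplit("@", 1)[1].lower() for a in addresses if "@" in a)
    if not domains:
        return None
    domain, count = domains.most_common(1)[0]
    # Require a clear majority, since some contacts may have registered
    # with a private address that is unrelated to the company.
    return domain if count / sum(domains.values()) > min_share else None


def tag_by_domain(all_addresses: list[str], domain: str) -> list[str]:
    """Addresses that can be associated with the company via its domain."""
    return [a for a in all_addresses if a.lower().endswith("@" + domain)]
```

Once a domain such as ijs.si has been inferred, `tag_by_domain` yields the other contacts that can receive the company tag without having specified the association themselves.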
7.4.5 Recipient and Folder Suggestion
Another nice example of contextualization provided in Contextify is the recipient
suggestion feature. When composing a new email, Contextify displays an addi-
tional sidebar in the email composition window. After the user adds a recipient we
list in the sidebar other contacts that have in past emails frequently appeared
together with the added recipient. By clicking on any contact in the sidebar, the
contact is added as an additional recipient. The list of suggested recipients is then
updated and also shows contacts who appeared in emails with the newly added
recipient. The list is updated each time a new recipient is added and is sorted so that
the contacts that are most likely to be added next are placed at the top of the list. An
example of the sidebar is shown in Fig. 7.6.
Fig. 7.6 A demonstration of the contact suggestion feature. After we added Marko as the
recipient, Contextify automatically lists other contacts that we are also likely to add as recipients
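Recipient suggestion of this kind can be approximated with simple co-occurrence counts. The sketch below assumes that each past email is reduced to the list of its participants and ranks candidates by how often they appeared together with the recipients chosen so far; the actual ranking used by Contextify is not specified in the text, so this scoring is an assumption.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_cooccurrence(past_emails: list[list[str]]) -> dict[str, Counter]:
    """Count how often each pair of contacts appeared on the same email."""
    together = defaultdict(Counter)
    for participants in past_emails:
        for a, b in permutations(set(participants), 2):
            together[a][b] += 1
    return together


def suggest_recipients(together, chosen: list[str], top_k: int = 5) -> list[str]:
    """Rank contacts by total co-occurrence with the recipients chosen so far."""
    scores = Counter()
    for recipient in chosen:
        scores.update(together.get(recipient, {}))
    for recipient in chosen:
        scores.pop(recipient, None)  # never suggest someone already added
    return [contact for contact, _ in scores.most_common(top_k)]
```

Calling `suggest_recipients` again after each added recipient reproduces the behavior described above, where the list is refreshed and re-sorted every time.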
A new feature that we are currently implementing and which relies heavily on
machine learning methods is folder suggestion. People often organize emails into
folders. Moving each email individually into the corresponding folder can be very
time consuming and we would like to automate this process. In order to be able to
predict to which folder an email should be moved we need to build a classification
model. There are many possible learning features for building the model. People
often place emails based on the participants in the email; therefore one of the most
relevant features will definitely be the sender and all the recipients of the email.
A crucial feature will also be the email’s subject – not only because of the content of
the subject but also because emails that belong to the same thread should most
likely be in the same folder. Folder prediction could also be based on the content of
the email. In this case we would use the current folder structure and treat the content
of all emails in one folder as one large document. We could then apply the cosine
similarity measure to find to which document the current email is most similar and
move the email to the corresponding folder.
After representing e-mails with features we could use any of the classification
methods to build a classification model. The choice of the model depends on our
goals. If we are primarily interested in the accuracy of the predictions, we would
most likely achieve best results by applying the SVM algorithm. If we also need
to provide the user with the explanation for our prediction, then methods such as
if-then rules or decision trees should be preferred.
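To make the feature representation concrete, the sketch below encodes the sender, the recipients and the subject words as categorical features and scores each folder by how typical those features are for it. This frequency-based scorer is only a simple stand-in chosen because its evidence is easy to show to the user; it is neither an SVM nor a decision tree, and the class and field names are hypothetical.

```python
from collections import Counter, defaultdict


def email_features(email: dict) -> set[str]:
    """Sender, recipients and subject words as categorical features."""
    features = {f"from:{email['from']}"}
    features |= {f"to:{r}" for r in email["to"]}
    features |= {f"subj:{w}" for w in email["subject"].lower().split()
                 if w not in ("re:", "fwd:")}
    return features


class FolderSuggester:
    """Illustrative stand-in for a trained classification model."""

    def __init__(self):
        self.counts = defaultdict(Counter)  # folder -> feature -> frequency
        self.sizes = Counter()              # folder -> number of filed emails

    def fit(self, filed: list[tuple[dict, str]]) -> None:
        """Learn from emails the user has already moved into folders."""
        for email, folder in filed:
            self.counts[folder].update(email_features(email))
            self.sizes[folder] += 1

    def explain(self, email: dict) -> dict[str, dict[str, float]]:
        """Per folder: feature -> share of that folder's emails having it."""
        features = email_features(email)
        return {
            folder: {f: counts[f] / self.sizes[folder]
                     for f in features if counts[f]}
            for folder, counts in self.counts.items()
        }

    def suggest(self, email: dict) -> str:
        """Folder with the largest total evidence for this email."""
        evidence = self.explain(email)
        return max(evidence, key=lambda folder: sum(evidence[folder].values()))
```

The dictionary returned by `explain` plays the role of the explanation mentioned above: it lists which participants and subject words speak for each folder.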
References
Banerjee A, Basu S, Merugu S (2007) Multi-way clustering on relation graphs. SDM SIAM,
Minneapolis, MN April 26–28, pp 145–156
Basu S, Davidson I, Wagstaff KL (2008) Constrained clustering: advances in algorithms, theory,
and applications. Data mining and knowledge discovery series. Chapman and Hall/CRC, CRC
Press, Florida, New York
Davidson I, Ravi S (2005) Clustering with constraints: feasibility issues and the k-means algo-
rithm. In: Proceedings of SDM, Newport Beach, CA
Dong G, Pei J (2007) Sequence data mining. Springer-Verlag New York Inc., New York
Dubey A, Bhattacharya I, Godbole S (2010) A cluster-level semi-supervision model for interactive
clustering. In: Proceedings of ECML PKDD 2010, Lecture Notes in AI, Springer-Verlag,
Berlin/Heidelberg
ECAI (2006) Workshop on context representation and reasoning. http://sra.itc.it/events/crr06,
Accessed date 5 Aug 2011
Grobelnik M, Mladenic D, Ferlez J (2009) Probabilistic temporal process model for knowledge
processes: handling a stream of linked text. In: Proceedings of the 12th international confer-
ence information society – IS 2009, vol A. Institut Jozef Stefan, Ljubljana, pp 222–227
Guo J, Sun C (2003) Context representation, transformation and comparison for ad hoc product
data exchange. ACM DocEng 2003: proceedings of the 2003 ACM symposium on document
engineering, ACM, Grenoble, France, New York, U.S., pp 121–130
Jacquemont S, Jacquenet F, Sebban M (2009) Mining probabilistic automata: a statistical view of
sequential pattern mining. Machine Learning 75(1):91–127
Leban G, Grobelnik M (2010) Displaying email-related contextual information using Contextify.
In: Proceedings of ISWC 2010, LNCS, Springer, Heidelberg, pp 181–184
Mitchell TM (1997) Machine learning. The McGraw-Hill Companies, Inc., New York
Mitchell TM (2006) The discipline of machine learning. CMU-ML-06-108 July 2006, School of
Computer Science, Carnegie Mellon University, Pittsburgh
Mitchell TM (2009) Mining our reality. Science 326(5960):1644–1645,
http://www.sciencemag.org/content/326/5960/1644.short
Mladenic D (2006) Feature selection for dimensionality reduction. In: Subspace, latent structure
and feature selection: statistical and optimization perspectives workshop, vol 3940, Lecture
notes in computer science. Springer, Berlin, Heidelberg, Hershey, USA, pp 84–102
Mladenic D (2007) Text mining: machine learning on document. In: Encyclopedia of data
warehousing and mining. Hershey [etc.], Idea Group Reference, cop., pp 1109–1112
Stajner T, Mladenic D, Grobelnik M (2010) Exploring contexts and actions in knowledge
processes. Workshop on context, information and ontologies, Lisbon
The Radicati Group Releases, Email statistics report, 2009–2013. http://www.radicati.com/
VUT – Vienna University of Technology, Austria, 2007, D2.2 design and proof-of-concept
implementation of the inContext context model version 1
Wagstaff K, Cardie C (2000) Clustering with instance-level constraints. In: Proceedings of ICML
2000, Morgan Kaufmann, Massachusetts
Part III
Applying and Validating the ACTIVE Technologies
8
Increasing Productivity in the Customer-Facing Environment
Ian Thurlow, John Davies, Jia-Yan Gu, Tom Bösser, Elke-Maria Melchior, and Paul Warren
8.1 Introduction
The objective of this case study was to evaluate novel information technology from
the ACTIVE project (Simperl et al. 2010) with people working in a demanding
customer-facing environment. The case study took place in BT, a major global
provider of telecommunications services. Specifically, our trialists were from the
sales community (salespeople and associated technical and product specialists) of a
division focussed on providing ICT solutions to the UK enterprise market. Trialists
were distributed across a wide geographical area. Indeed, one of their problems was
that they were not often able to meet face to face to share knowledge.
8.2 Users and User Requirements
We started by talking to the senior managers responsible for our trialists. In general,
as managers, they wanted better knowledge worker productivity. Specifically, they
wanted to shorten the time to create sales proposals and to improve their quality.
I. Thurlow (*) • J. Davies • J.-Y. Gu
British Telecommunications plc., Orion G/11, Ipswich, IP5 3RE, Adastral Park, United Kingdom
e-mail: [email protected]; [email protected]; [email protected]
T. Bösser • E.-M. Melchior
kea-pro, Tal, Spiringen CH-6464, Switzerland
e-mail: [email protected]; [email protected]
P. Warren
Eurescom GmbH, Wieblinger Weg 19/4, D-69123 Heidelberg, Germany
e-mail: [email protected]
P. Warren et al. (eds.), Context and Semantics for Knowledge Management, DOI 10.1007/978-3-642-19510-5_8, © Springer-Verlag Berlin Heidelberg 2011
They saw information reuse through knowledge sharing as an important way to
achieve this.
Talking to the senior managers also helped us to further develop our initial
intuition of what information-related problems our trialists were confronted with.
Our task then was to confirm and refine that intuition with the trialists themselves.
We were working with two groups; a large group interacting directly with
customers and a small team responsible for coordinating creation of the more
complex customer proposals. Although their requirements were related, we needed
to deal with each group separately.
Both groups did share one characteristic, along with all BT people. They both
used the standard Microsoft applications, in particular Outlook, Word, Excel and
PowerPoint. It was essential therefore that, if ACTIVE was to be used, it worked
with and enhanced these applications. This characteristic was shared with the
Accenture case study described in Chap. 9 and was a major reason for the inclusion
of these applications into the ACTIVE Knowledge Workspace (AKWS) as
described in Chap. 5.
8.2.1 Working with Customers
The larger group consisted of sales consultants and their managers, and technical
specialists. The sales consultants were responsible for groups of customers. Some
had responsibility for hundreds of customers. These were generally desk-based. The
majority were responsible for tens of customers. Members of this group often spent
as much as 3 days a week out of their offices with customers. The technical
specialists had expertise in particular product ranges, and were used to provide
the technical input to business proposals. Our hypothesis was that, whilst working
at their desks, these people would all frequently switch their focus from one task to
another, as they responded to different customer requirements.
Two techniques were used initially to test this hypothesis and to generally gain
an understanding of how these people interacted with information, the problems
they faced in using information systems, and what they needed from such systems.
Firstly, a number of semi-structured interviews were held; some one to one, some in
small groups; some face to face; some by audio conference. Secondly, we observed
a few of our potential trialists over a period of around two hours. We did this remotely.
Specifically, we observed our subjects’ screens and how they interacted with
information via a Microsoft LiveMeeting session. We used Microsoft Communicator
to provide a separate audio channel for listening to our subjects’ telephone
conversations. We also used this audio channel for occasionally asking questions to
seek clarification, e.g. as to why certain on-screen actions were being taken.
Initially this procedure was adopted to save time, bearing in mind the wide
geographical spread of our trialists. However we believed that the technique had
the further advantage of being less intrusive than had we been physically present.
The subject knew that he or she was being listened to and observed, of course, but
was potentially not so constantly conscious of this as would have been the case if
there had been an observer physically present. We were not able to test this
hypothesis, and this was not our purpose, but in any case we believed we had a
very effective mode of observing behavior.
These techniques gave rise to a number of conclusions. The subjects’ work was
often driven by email. That is to say, there was heavy use of email; and incoming
emails, along with telephone calls, caused people to switch tasks. In effect, people
were continuously switching their task context.
Despite this, and despite a significant use of the telephone, some of our subjects
reported problems arising from being isolated geographically from their colleagues.
As already noted, opportunities to meet and share knowledge were very rare.
A significant amount of time was spent using BT systems, e.g. CRM systems
to record interaction with customers. In addition, there was a great deal of use
of spreadsheets. The effect of multiple systems was to necessitate user
processes involving cutting and pasting from one system or application to another.
Effectively, users developed their own processes to accommodate this way of
working.
8.2.2 Building Customer Proposals
Our understanding of the proposal creation process and the requirements of the bid
unit who coordinate this process was built up through a number of face to face
meetings.
Whilst straightforward proposals are put together by individual sales
consultants, working with their personally-developed network of colleagues,
the production of more complex proposals is coordinated by a bid unit of five
people. Typically the bid unit receives an invitation to tender which often contains
many tens of questions. The majority of these questions can be answered
quickly by adapting text from previous proposals. There then remains a small
number of questions which are much harder to answer. The bid team then seeks
assistance from relevant technical specialists, who provide text for inclusion in
the proposal. This approach gives rise in particular to two problems. Firstly, in
the information-gathering phase, members of the bid unit need to be sure they
are accessing up-to-date information. Secondly, the fact that text has been written
by a variety of different authors creates inconsistencies in the proposals which
significantly affect their quality. Such inconsistency can be at the level both of
style, and also of fact. Even where a factual inconsistency does not materially affect
the solution being offered to the customer, it undermines that customer’s confi-
dence. An example might be a reference to how many similar solutions have been
offered in the past. A difference of one or two in the number quoted in different
parts of the proposal might not be material, but serves to reduce the quality of the
document.
8.3 Gaining Feedback
8.3.1 Feedback from the “Mock-Ups”
The techniques described in Sect. 8.2.1 were used during the first year of the project
to give us an understanding of our trialists’ requirements. At the same time,
working with our partners in the ACTIVE project, we were able to gain a good
understanding of what functionality could be made available in our case study,
based on the technology being developed in ACTIVE. At this stage most of this
functionality was not yet available, but we wanted to get feedback from the trialists.
In particular, we wanted to be sure that the key requirements of the user community
were being met and to refine these requirements in the light of feedback. To do this,
we constructed a number of PowerPoint “mock-ups” to illustrate the basic func-
tionality. When presenting these to our trialists we stressed that we were not
concerned with particular questions of style, e.g. the use of icons, menus, fonts,
etc. At this stage we were interested in their feedback to the proposed functionality.
The mock-ups chiefly reflected the more general requirements of our larger
group of trialists, as described in Sect. 8.2.1. However, we did also show them to
the bid unit for their feedback. Apart from their own, very specific, requirements,
the bid unit shared with the other trialists the generic requirement to mitigate the
effects of information overload.
These mock-ups featured the following areas of functionality:
• Tagging, and the ability to search against a combination of tags and contexts, i.e.
for information objects tagged in a certain way and associated with a certain
context.
• Contextualised information delivery, along with the ability to create contexts,
select a particular context, and associate information objects with particular
contexts, as described in Chap. 5. We also described the “bottom-up” features,
i.e. the ability to discover new contexts, to automatically detect a context switch,
and to automatically associate information objects with a context.
• Process recording, as described in Chap. 6.
• A context-sensitive factbook. This was to be an application which could store
information, organised by particular contexts. It was motivated by the observa-
tion that triallists were frequently copying information from one application to
another. Sometimes this was done by cutting and pasting. At other times, the
information was manually re-keyed. Some people reported that they would make
a written note of information to be re-keyed, keeping a piece of paper or notepad
on their desk for just that purpose. The factbook was intended to be an electronic
version of these written notes.
We obtained very positive feedback on the first two of these. Moreover, there was
also a positive response to the use of context for knowledge-sharing. Specifically, if
colleagues share a context, e.g. relating to a particular customer, then information
placed on a shared server by one, and associated with that context, would be
available for them all to use. Another advantage of our implementation of context
became apparent during the project. A number of triallists commented positively on
the ability to associate an information object with more than one context. They
contrasted this favourably with the single folder limitation of their current computer
filing system.
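The combination of tags and contexts described above, including the ability to associate one information object with several contexts, can be illustrated with a minimal sketch. The class, the function and the sample data below are hypothetical and are not taken from the ACTIVE implementation; they only show why a many-to-many association differs from a single-folder filing system.

```python
from dataclasses import dataclass, field


@dataclass
class InfoObject:
    """An information object (file, email, URL) with tags and contexts."""
    name: str
    tags: set[str] = field(default_factory=set)
    # Many-to-many: an object may belong to several contexts, unlike a folder.
    contexts: set[str] = field(default_factory=set)


def search(objects, tags=(), contexts=()):
    """Return objects carrying ALL requested tags and ALL requested contexts."""
    wanted_tags, wanted_contexts = set(tags), set(contexts)
    return [
        o for o in objects
        if wanted_tags <= o.tags and wanted_contexts <= o.contexts
    ]


# Hypothetical sample data, for illustration only.
objects = [
    InfoObject("pricing.xlsx", {"pricing"}, {"Customer A", "Customer B"}),
    InfoObject("minutes.docx", {"meeting"}, {"Customer A"}),
    InfoObject("roadmap.pptx", {"pricing", "roadmap"}, {"Customer B"}),
]

hits = search(objects, tags=["pricing"], contexts=["Customer A"])
names = [o.name for o in hits]
```

Here `pricing.xlsx` is found under both customer contexts without being copied, which is the property the triallists contrasted with their folder-based filing.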
The response to our ideas on process recording was, in general, less enthusiastic.
This may have been a problem in presentation. One individual commented that
the system “feels like I need to change the way I work to fit the system”. This, of
course, was not the objective; the system is designed to record how people work and
help them work as they want to work. We still felt that the technology was useful,
and it was developed as described in Chap. 6. However, as explained later, it was
not evaluated with our trialist group because of lack of time.
The response to the factbook was not at all favourable. This seems chiefly to be
because it was seen as another application, and individuals were already working
with a large number of applications; they felt they didn’t need another. As a result, it
was decided not to proceed with the factbook.
We also discussed and obtained feedback about a number of general issues. One
question was whether ACTIVE functionality should be on the desktop or integrated
into applications. There were mixed opinions on this. One comment was that
“integration into working tools will help uptake”. In fact, both approaches have
been adopted; a taskbar is present on the desktop and a toolbar integrated into the
common Microsoft applications.
The issue of trust arose with regard to the sharing of information. One person
asked “how do I know the information others have shared is correct?” Our tools
make clear the provenance of information; over time people build up confidence in
information from particular colleagues.
The view was also expressed that accurate search to find timely and relevant
information is a key determinant of whether a knowledge management system is
used or not. Clearly our system needed to provide high quality search.
8.3.2 Offline Working
A final requirement emerged at a later stage of the project. As we have already
noted, many of our triallists spent a great deal of time, often 3 days out of 5, out of
the office visiting customers. The original concept of the AKWS was that it would
employ a client–server architecture and, at least as originally developed, the client
functionality would depend on connection to a corporate server running, e.g., the
machine learning applications. It became apparent that, if many of these highly
mobile people were to use the ACTIVE functionality, they needed certain basic
aspects of that functionality to be available at all times. As an example, one person
wanted to use the tagging feature instead of the normal hierarchical file structure.
It was essential, therefore, that this feature was still available when disconnected
from the corporate network, as was often the case when our triallists were on
customer premises or travelling to customers.
As a result, our colleagues in the project developed an offline version of the
client–server software. When the client was connected to the server, the user had all
the AKWS functionality available. From the end user’s perspective, Workspace
operations in offline mode are performed in the same way as in online mode. The
ACTIVE Taskbar and all ACTIVated Office applications can be used, but the user is
notified of working in offline mode, and Workspace operations are performed only on
the resources in the Local Workspace. Resource replication works as follows:
whenever a Workspace resource is created, read or updated, it is automatically
replicated to both the Enterprise and the Local Workspace. This means that when
users go offline, their Local Workspace holds all the Workspace resources they have
been using so far. In addition, users can explicitly specify further Workspace
resources and replicate them to the Local Workspace before going offline. While a
user is working offline, only the Local Workspace is updated with the user’s
changes; on returning online, all resources created or modified locally are
synchronized to the Enterprise Workspace automatically. This offline version was
deployed to those people who were frequently out of the office.
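The replication behaviour just described can be summarised in a small sketch. This is a toy model under our own assumptions, not the AKWS code: the class and method names are hypothetical, resources are reduced to dictionary entries, and conflicting edits made by other users while the client is offline are not handled.

```python
class Workspace:
    """Toy model of the Local/Enterprise Workspace pair named in the text."""

    def __init__(self):
        self.enterprise = {}   # resource id -> content, held on the server
        self.local = {}        # replica on the user's machine
        self.online = True
        self._dirty = set()    # ids created or modified while offline

    def write(self, rid, content):
        """Create or update a resource; online writes reach both workspaces."""
        self.local[rid] = content
        if self.online:
            self.enterprise[rid] = content
        else:
            self._dirty.add(rid)

    def read(self, rid):
        """Read a resource; an online read also replicates it locally."""
        if self.online:
            self.local[rid] = self.enterprise[rid]
        return self.local[rid]   # raises KeyError offline if never replicated

    def replicate(self, rids):
        """Explicitly copy chosen resources before going offline."""
        for rid in rids:
            self.local[rid] = self.enterprise[rid]

    def go_offline(self):
        self.online = False

    def go_online(self):
        """Reconnect and push everything changed locally to the server."""
        self.online = True
        for rid in self._dirty:
            self.enterprise[rid] = self.local[rid]
        self._dirty.clear()


ws = Workspace()
ws.enterprise.update({"bid-1": "v1", "bid-2": "v1"})
ws.read("bid-1")            # replicated as a side effect of reading
ws.go_offline()
ws.write("bid-1", "v2")     # only the local copy changes
server_copy_while_offline = ws.enterprise["bid-1"]
ws.go_online()              # automatic synchronisation
```

In this sketch `bid-2` is never read or explicitly replicated, so it would not be available offline, which is why the explicit replication step matters for highly mobile users.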
8.4 Deploying the AKWS
Within the ACTIVE project there was inevitably only limited time for deployment
and evaluation of the project functionality. We decided, therefore, to concentrate on
those aspects of the AKWS which previous feedback told us were most relevant to
our triallists. In particular, we did not want to overwhelm our users with a large
range of new features. Our aim was to introduce to the users those features we
believed would be valuable to them, and to evaluate those features through user
feedback.
Specifically, for our evaluation activity, described in Sect. 8.6.1, we included:
• Information filtering by context, in particular in Microsoft Outlook, Word, Excel, PowerPoint, Internet Explorer and File Explorer. Combined
with this was the ability for users to create contexts, set their current contexts
and associate information objects with a particular context, as described in
Chap. 5. With this we included the ability, supported by machine-learning, to
automatically discover contexts, detect potential context switches, and
associate information objects with contexts. We also provided users with the
ContextVisualizer, again as described in Chap. 5, because we received very
positive feedback when we initially demonstrated this tool.
• Tagging, and the ability to search on a combination of tags and context. In
particular, the system is able to recommend tags to the user, as illustrated in
Fig. 8.1. These recommendations are made partly on the basis of the contents of
a file and partly on the basis of what tags have been used by the same or other
users to tag similar files. Here, similarity is based on the information retrieval
concept of cosine similarity (Hiemstra 2009). It was believed that automatic tag
recommendations bring two advantages. Firstly, this feature may overcome
some people’s reluctance to tag, caused in part by simply not being able to
think of a tag to use. Secondly, tag recommendation may help the user, or more
particularly groups of users, to converge on a relatively small set of tags, rather
than creating several tags for the same concept. Of course, a counter-argument is
that this feature may limit the user’s creativity in devising new descriptive tags.
• Access to the AKWS management portal. This provides the facility for users,
or an administrator on their behalf, to set their profiles. In particular, settings
exist to determine whether a user should approve the discovery of new contexts,
or whether new contexts can be created automatically; similarly whether users
need to approve a context switch, or whether it should just go ahead automati-
cally; and whether the tag suggestion facility should be initiated every time a file
is closed. The profile options are shown in Fig. 8.2.
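To make the tag recommendation idea concrete, the following is a minimal sketch of recommending tags from similar, already-tagged files using cosine similarity over term-frequency vectors. It is an illustration only: the function names and the sample corpus are hypothetical, the weighting scheme is our assumption, and the sketch covers only the "tags used on similar files" part of the recommendation, not the part based directly on file contents.

```python
import math
from collections import Counter


def vectorise(text):
    """Bag-of-words term-frequency vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def recommend_tags(new_text, tagged_files, top_n=3):
    """Score each known tag by the similarity of the files that carry it."""
    query = vectorise(new_text)
    scores = Counter()
    for text, tags in tagged_files:
        sim = cosine(query, vectorise(text))
        for tag in tags:
            scores[tag] += sim
    return [tag for tag, score in scores.most_common(top_n) if score > 0]


# Hypothetical corpus of already-tagged files.
tagged_files = [
    ("mobile handset pricing for retail customer", {"mobile", "pricing"}),
    ("broadband network upgrade schedule", {"broadband"}),
    ("mobile network coverage proposal", {"mobile", "proposal"}),
]

suggestions = recommend_tags("pricing proposal for mobile handset", tagged_files)
```

Because the suggestions are drawn from tags that already exist, a scheme of this kind naturally nudges a group of users towards a shared vocabulary, which is the convergence effect discussed above.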
8.5 Creating Better Business Proposals
Sect. 8.2.2 described the work of the bid unit. Their requirement was to create
customer proposals more quickly and with greater quality. The bid unit do not use a
content management system; rather, the library of proposals is on a shared server
and is searched and browsed with File Explorer. An Access database is used to
store key information about each bid. Moreover, once a team was brought together
to create a proposal, sharing of information was by email and exchange of Word
files; no specifically collaborative tools were used.
Fig. 8.1 Suggested tags
The Semantic MediaWiki was seen as an excellent solution. The semantic
features were seen as important to enable the bid team to locate the best previous
proposals. The basic wiki features were seen as valuable to encourage collaboration
between members of the bid unit and the technical specialists.
We also wanted to enhance the way in which the bid team obtained solutions to
the questions they were unable to answer themselves. To do this we used a
MediaWiki extension which enables the export of particular wiki fields to RSS.
Technical specialists throughout BT were encouraged to subscribe to this RSS feed,
which was used to disseminate these questions.
8.5.1 Storing and Accessing Information on the SMW
The SMW was not used to create whole proposals. Its editing features were not
regarded as adequate for this task. Instead, it was used to store key information
about proposals, customers and products; and also to capture the main requirements
set by customers and create the response solutions to these in the executive
summary which is a decisive part of the proposal. Figure 8.3 shows an example
of a bid proposal page in the SMW.
The semantic links between pages describing proposals, customers, products,
and also the solutions to questions posed to technical specialists, create a knowledge
base which can be searched semantically. Inline queries allow users to ask
questions such as which proposals use a particular product set; or which customer
requirements have been evaluated before in a particular industry sector. Figure 8.4
shows proposals that offer mobile products, displaying some details that can be
used to sort the items and Fig. 8.5 shows questions posed to technical specialists
that relate to a particular industry sector.
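As an illustration of the kind of query involved, the sketch below builds a request to Semantic MediaWiki's `ask` API module, which accepts the same condition and printout syntax as an inline query. The endpoint URL, the category and the property names are hypothetical; the chapter does not state the property names used in the bid unit wiki.

```python
from urllib.parse import urlencode, urlparse, parse_qs


def build_ask_url(api_endpoint, conditions, printouts, limit=20):
    """Build a URL for Semantic MediaWiki's `ask` API module."""
    query = "".join(f"[[{c}]]" for c in conditions)      # selection conditions
    query += "".join(f"|?{p}" for p in printouts)        # properties to display
    query += f"|limit={limit}"
    params = {"action": "ask", "query": query, "format": "json"}
    return f"{api_endpoint}?{urlencode(params)}"


url = build_ask_url(
    "https://wiki.example.org/w/api.php",                 # hypothetical endpoint
    conditions=["Category:Proposal", "Offers product::Mobile"],  # hypothetical names
    printouts=["Customer", "Contract value"],             # hypothetical properties
)
decoded_query = parse_qs(urlparse(url).query)["query"][0]
```

The decoded query reads as "pages in the Proposal category whose product is Mobile, showing customer and contract value", which corresponds to the first example question in the text.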
Fig. 8.2 ACTIVE profile settings
Fig. 8.3 The SMW, showing the key details of a bid proposal
Fig. 8.4 The SMW, showing proposals offering mobile products
Fig. 8.5 The SMW, showing questions related to the education sector
When the knowledge base is augmented with information about product
hierarchies and relationships, then this becomes more powerful. For example, if
a particular product name in the query is not known in the knowledge base, then a
semantically related name might be known, e.g., a generalisation of the product or a
related product.
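The fallback to a generalisation can be sketched as a walk up a product hierarchy. The hierarchy, the product names and the lookup function below are invented for illustration and do not reflect the actual BT product data or the SMW mechanism used.

```python
# Hypothetical product hierarchy: child -> parent (generalisation).
PARENT = {
    "Handset X": "Mobile handsets",
    "Mobile handsets": "Mobile products",
    "Mobile broadband": "Mobile products",
}

# Hypothetical knowledge base: product name -> proposals that mention it.
PROPOSALS = {
    "Mobile handsets": ["Bid 17"],
    "Mobile products": ["Bid 3", "Bid 21"],
}


def find_proposals(product):
    """Look up a product; if unknown, climb to the nearest known generalisation."""
    current = product
    while current is not None:
        if current in PROPOSALS:
            return current, PROPOSALS[current]
        current = PARENT.get(current)   # None once the top of the hierarchy is passed
    return None, []


matched, bids = find_proposals("Handset X")
```

A query for the unknown "Handset X" is answered with proposals recorded against "Mobile handsets", and the matched name is returned so that the user can see that a generalisation was used.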
To support our users, we made use of a number of extensions to the MediaWiki
and SMW. For example, one extension enabled us to import comma separated
variable information into the SMW and associate its content with semantic
properties across proposal, product and people pages. We used this to import bid
proposal data from an Access database that is constantly updated by the bid unit
team during their normal day-to-day work, so collaboration in the wiki is
based around this key information. For example, questions created in the wiki for
inclusion in the RSS feeds seen by technical specialists are associated with the
relevant proposals and, in turn, with their semantic properties. These semantic property values
across the wiki are leveraged by using another extension to provide an improved
facility for searching, including facetted search. Where a user was also using the
AKWS, connection between the SMW and AKWS meant that context could be
used as one of the facets of the search. Figure 8.6 shows the facetted search
interface that allows users to find proposals based on keywords, bid type, bid status,
product area, contract value, free form tags created by users in the wiki, and AKWS
context. Where a user makes a facet selection, the facetted search options automat-
ically present the relevant subset of search criteria powered by semantic properties
in the wiki to align with the current selection and allow the user to further refine
their search.
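The behaviour of the facetted search, where each selection narrows both the result list and the remaining options, can be sketched as follows. The records and field values are hypothetical; the facet names follow those listed above, and the real interface is provided by a MediaWiki extension rather than by code of this kind.

```python
from collections import defaultdict

# Hypothetical proposal records; facet names follow those listed in the text.
PROPOSALS = [
    {"title": "Bid 3", "bid type": "New", "bid status": "Won", "product area": "Mobile"},
    {"title": "Bid 17", "bid type": "Renewal", "bid status": "Open", "product area": "Mobile"},
    {"title": "Bid 21", "bid type": "New", "bid status": "Open", "product area": "Broadband"},
]
FACETS = ("bid type", "bid status", "product area")


def facet_search(records, selection):
    """Filter by the selected facet values, then list the values still available."""
    hits = [r for r in records
            if all(r[f] == v for f, v in selection.items())]
    remaining = defaultdict(set)
    for r in hits:
        for f in FACETS:
            if f not in selection:      # only offer facets not yet chosen
                remaining[f].add(r[f])
    return hits, dict(remaining)


hits, remaining = facet_search(PROPOSALS, {"product area": "Mobile"})
refined, _ = facet_search(PROPOSALS, {"product area": "Mobile", "bid status": "Open"})
```

After "Mobile" is selected, only the bid types and statuses that occur among mobile proposals are offered, so the user cannot refine the search into an empty result.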
Figure 8.7 shows the extension used for finding customers that are similar in
industry sector and in size, measured both by workforce and by number of
sites; the user can then select a customer and view the bid proposals
associated with that customer. Another extension visualises bid proposals and
associated community questions in timeline formats, so that users can view the
preparation time for bid proposals and see which bid proposals are cancelled or still
require submission. This offers another way for users to find proposals, based on a
time dimension: the user can make a selection and view the proposal page, and
can also gauge the overall volume and completion rate of bid proposals. Figure 8.8
shows an example of this usage. Calendar extensions were used in the wiki for
quick access to bid proposals and questions month by month.
Other extensions used in the wiki include one implementing header tabs on
pages that contained large amounts of information whose content could be
clearly categorised. For example, in Fig. 8.3 the bid proposal page separates key details
from post-bid review data, but the tabs still allow faster access and browsing between
these than scrolling on a single page. Another extension allowed
users to log in to the wiki using the BT identification details that they use for other
systems and tools, so reducing barriers to using the wiki and maintaining
consistency with other enterprise systems that users may be familiar with.
Another extension offered enhanced capability for the wiki to evaluate
conditions and display information dependent on the result. This was used through-
out the wiki for various purposes that range from testing if an editable field
contained information and prompting the user to add information if tested false,
to testing whether a person belonged to a team and showing the relevant team page
if tested true. This capability ensured that correct page links were displayed and a
user was able to view a link to further related information if this exists, maximising
the value of information in the wiki.
Fig. 8.6 The SMW, showing the facetted search for bid proposals
Fig. 8.7 The SMW, showing the facetted search for customers
The Semantic Forms extension was used on every page in the wiki to present a
consistent way of editing information. A wiki page is often compiled from
multiple sources, such as information transcluded from other pages using templates,
and may also display values that change depending on the evaluation of a condition in
the page source. It was important, therefore, to use forms to lower the barriers for
users to add information to a page, protect the source and enable new data to be
associated with semantic properties that can be used throughout the wiki.
8.5.2 Communicating with the Technical Specialists
In fact, we used a number of RSS feeds to pose the more difficult questions to the
technical specialists. The specialists were invited to subscribe to particular feeds
containing questions, with varying levels of urgency, associated with the latest bid
proposals being prepared by the bid unit team. Each question
contains a link to the relevant wiki page, where a specialist can contribute a
response through a form; this minimises the number of steps needed to
participate and make a valuable response. This method of interaction with the wiki
used the RSS reader facility in MS Outlook, which is regularly used by the sales team,
and so reduced the barrier to uptake by integrating the RSS feed with software that is
familiar to the users. Each RSS item can also be treated as an email item, and
the entire feed can be shared via email through Outlook. Figure 8.9 shows a user’s
inbox and the links that are available on each RSS feed item to make a response and
view further detail such as related uploaded documents.
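The structure of such a feed can be sketched with the Python standard library. The field names of the question records, the URLs and the sample question are hypothetical; in the case study the feed was generated by a MediaWiki extension exporting wiki fields to RSS, not by code of this kind.

```python
import xml.etree.ElementTree as ET


def build_feed(title, link, questions):
    """Render bid questions as an RSS 2.0 document (returned as a string)."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = "Open questions for technical specialists"
    for q in questions:
        item = ET.SubElement(channel, "item")
        # Urgency is shown in the item title so it is visible in the RSS reader.
        ET.SubElement(item, "title").text = f"[{q['urgency']}] {q['question']}"
        ET.SubElement(item, "link").text = q["respond_url"]   # response form on the wiki
        ET.SubElement(item, "category").text = q["proposal"]
    return ET.tostring(rss, encoding="unicode")


# Hypothetical question record, for illustration only.
feed_xml = build_feed(
    "Bid questions",
    "https://wiki.example.org/",
    [{
        "urgency": "High",
        "question": "Can the service be delivered to 40 sites within 3 months?",
        "respond_url": "https://wiki.example.org/Question_12?action=formedit",
        "proposal": "Bid 17",
    }],
)
```

Each item carries a single link that leads straight to the response form, reflecting the aim of keeping the number of steps for a specialist as small as possible.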
An important issue here is that of motivation. Specifically, how do we encourage
the specialists to subscribe to the RSS feeds in the first place, and then how do we
encourage them, when appropriate, to follow the link to the wiki and input their
knowledge? In principle there is even the danger that the technology will encourage
inaccurate responses from those who do not have the appropriate expertise; whereas
in the previous approach bid unit members targeted those they believed to have the
necessary knowledge.
Fig. 8.8 The SMW, showing timeline visualisation of bid proposals
8.6 User Tests and Field Trials
The development of AKWS and the Bid Unit Wiki was based on extensive
consultation with users and testing of partial applications and prototypes. The
results of these tests were numerous modifications of the application design and
bug fixes which assured that the finally completed prototypes were accepted in the
formal acceptance procedure of the BT bid unit.
During the last months of the ACTIVE project we undertook an extensive
evaluation exercise on all aspects of the case study. We wanted to understand
how the users reacted to the technology; to what extent it suits their needs; how
easy it is to use; and where there is scope for improvement. We also wanted to know
whether the technology was helping to achieve the organisational goals, e.g.,
whether it was making its users more productive; and whether it was improving
the quality of customer proposals, and the speed with which they could be written.
At the more abstract, scientific level, we wanted to help answer some of the
questions posed by the project. Specifically:
• Did the use of applications which integrate lightweight ontologies and tagging
offer measurable benefits to the management of corporate knowledge without
presenting significant barriers to users?
• Did the use of context help users cope with information overload, and make it
easier to switch between separate tasks, which many of them do
frequently? Also, did the use of machine intelligence further support the users
by discovering contexts, detecting context switching, and creating associations
between information objects and contexts?
Fig. 8.9 RSS feed of bid proposal questions to technical specialists
We organised two separate field trials; one for the AKWS where we were
working with the customer-facing people and their managers; and one for the
SMW where we were working chiefly with the members of the bid unit, but also
talking to those technical specialists who were receiving the RSS feeds.
Our focus here was on three aspects of the ACTIVE technology:
• The explicit (“top-down”) use of context by users. We wanted to know how
useful they found it; how easily they appreciated the concept; and how they
reacted to the user interface.
• The use of machine intelligence to discover contexts, detect context switches
and associate information objects with contexts, i.e. the “bottom-up” approach to
context. Here the principal question was the effectiveness of the algorithms, i.e.
how well did the results compare to the users’ own natural understanding of
context. We were also interested, of course, in the user interface, e.g., how
discovered contexts were presented to the user.
• The use of tagging to describe information objects. Here again, we wanted to
know how valuable users found the overall concept; how effective was the
algorithm which suggested tags; and how people reacted to the user interface.
8.6.1 Developing a Mature Prototype of the AKWS
The first concern was to ensure that the functionality of the AKWS is well adapted
to the needs of the professional users for whom it is intended. Although the needs
and requirements were analyzed carefully, these constitute necessary, but not
sufficient conditions for assuring the acceptance of the application by the prospec-
tive users. A series of iterations of design and testing was carried out, where
increasingly complete and refined prototypes were developed and tested with
appropriate user representatives and users. The results were used for identifying
shortcomings of the current prototype specification and bugs, for refining the
requirements for the application and for implementing the subsequent prototype
version.
Tests were firstly conducted with senior technical user representatives, then with
experienced users, with representative groups of users, and finally in field tests to
which the entire prospective user group (all members of the BT bid unit) and a
number of further professionals at BT were invited. In this manner it was ensured
that the final application prototype corresponded first to the organizational
requirements, and then to both the functional requirements and the needs of the users.
Major results of the tests of advanced prototypes with users and user
representatives were significant modifications of the technical components to
improve performance (response time), and significant improvements in the way in
which tags were recommended and in which context detection and discovery were presented.
Additional requirements were formulated based on early user experience, such as a
re-design of the Task Pane and Task Wizard, and the ability to use the AKWS in an
offline mode.
The application passed the acceptance test by senior representatives of the BT
bid unit, which was a defined condition for proceeding with the field test. In
summary, the application was regarded as sufficiently mature to ask the professional
users of the BT bid unit to use it in their daily work, which
requires a significant amount of engagement from each individual user.
8.6.1.1 User Effort Imposed by New Functionality
One concern of user representatives, and an important barrier for professional users
when adopting new software applications, is the learning effort and the added
continuous effort imposed on users. The functionality of the AKWS demands a
certain amount of user intervention and response. Users have to tag, accept
suggested context switches (aka context detection), accept discovered contexts,
and associate documents with the new discovered contexts. Because it was
recognized that for users this is a non-trivial added burden, an independent analysis
was carried out.
Under controlled conditions an instance of AKWS was installed and systemati-
cally populated with a set of data (documents and URLs). A number of test persons
carried out defined tasks and the execution time was observed. The results indicate
the time required to execute the various functions of the AKWS (Table 8.1).
A function such as “Recommended tagging” requires a certain amount of time
(significantly longer than the time to execute the operation for entering data), due to
the need to read, make decisions, and select from the list of recommended tags. In
contrast, context tagging (context association) is carried out much faster than
recommended tagging, provided that the user is in the context with which a
document is associated. Otherwise a context switch is needed before context
association with the desired context can be performed, and in this case the total time
to perform context association is much higher. It is notable that using discovered
contexts and responding appropriately is a tedious task and a lengthy procedure.
Table 8.1 Observed execution times for user tasks carried out with AKWS functionality
Task type                                 Mean time to perform task (mm:ss)
Recommended tagging                       00:55+
Context association                       00:04
Context switch and context association    00:41
Context creation                          00:34
Context switching                         00:34
Search                                    00:36+
Discovered context                        00:57+
Detected context                          00:27
In addition, analytic models of the user procedures were constructed (GOMS
models (Card et al. 1983)). The results show in detail that AKWS functions add
cognitive complexity to the tasks of users. In combination with the measures of
execution times and the subjective assessments of the users this shows that the use
of AKWS does not provide additional information to the user for free, but that a
certain amount of investment is required by each user to learn to use new tools and
functionality.
8.6.1.2 Field Tests of the AKWS and the Bid Unit Wiki
The objective of field tests is to provide conclusive evidence which shows that the
new applications developed in ACTIVE can be used as intended, are accepted by
the target users, and provide value for the user organization. The specific context of
work of the target users of AKWS provides challenging conditions for conducting
valid field tests: the bid unit members work at variable locations, mostly out of the
office. Professional users do not follow prescribed work procedures, but adapt their
work to the current tasks and context of work, and also to their personal capabilities
and preferences. Flexible means of collecting valid data remotely therefore had to be
constructed.
Thirty test persons were prepared to install and use AKWS. Of these, 18 users
supplied data which were sufficiently complete to be included in the analysis of
data. The experimental design is a repeated measures design, where data were
collected at different points in time from each user.
Measures collected were
– Subjective user quality of the application
– Objective performance measures and performance self-assessment by users
– Added value assessment of the specific functionality included in the AKWS
– Monitoring of the usage of the functionality, indicating acceptance
The test procedure started by giving demonstrations of the AKWS via Microsoft
Live Meeting, and by inviting all members of the trialist community. The AKWS
Client application was installed on users’ workstations. Each user received
instructions for about one hour and was asked to become familiar with AKWS over about
3–5 days. During the test period of 3–5 weeks the users carried out their daily tasks
using AKWS together with their standard tools, mostly MS-Office applications.
User data were collected by means of a number of online questionnaires and user
interviews after the users had experienced the technology for a number of weeks.
We used proven and standardised instruments such as Software Usability Measure-
ment Inventory (SUMI) (Kirakowski 1996, 1998) and focussed questionnaires
and rating scales to assess the added value of the functionality in the AKWS
and Bid Unit Wiki. In the AKWS we also used a user monitor to allow us to obtain
additional information about the usage of the functionality of the AKWS by users.
The events registered allow us to infer the acceptance and usage of the AKWS
functionality in the user population. In addition, user data determined which further
information would be collected from users. The objective was to assure that users
were only requested to provide information about functions which they had actually
used. As an example, if the user were presented with tag suggestions, the user
monitor might be invoked to obtain the user’s feedback on the usefulness of the
suggestions. The user monitor was designed so that the user was not queried too
frequently, and only about those functions which he had used.
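The two constraints on the user monitor (ask only about functions actually used, and do not ask too often) can be expressed in a few lines. This is a sketch under our own assumptions: the class name, the method names and the minimum gap between prompts are hypothetical and are not taken from the AKWS user monitor.

```python
class UserMonitor:
    """Toy model: ask for feedback only on used functions, and not too often."""

    def __init__(self, min_gap_s=3600.0):
        self.min_gap_s = min_gap_s      # assumed minimum gap between prompts
        self.used = set()               # functions the user has actually invoked
        self.last_prompt = None         # time of the most recent prompt, if any

    def record_event(self, function):
        """Register that the user has used a function."""
        self.used.add(function)

    def should_prompt(self, function, now):
        """Decide whether to ask the user about `function` at time `now` (seconds)."""
        if function not in self.used:
            return False
        if self.last_prompt is not None and now - self.last_prompt < self.min_gap_s:
            return False
        self.last_prompt = now
        return True


monitor = UserMonitor(min_gap_s=60.0)
before_use = monitor.should_prompt("tag suggestion", now=0.0)
monitor.record_event("tag suggestion")
first = monitor.should_prompt("tag suggestion", now=0.0)
too_soon = monitor.should_prompt("tag suggestion", now=30.0)
later = monitor.should_prompt("tag suggestion", now=100.0)
```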
8.6.1.3 Results of AKWS Field Tests
The ratings of user task characteristics indicate that context switching, search, and
information sharing are carried out frequently each day. Neither tagging nor editing
and (re-)using macros seem to be frequent activities of these users.
Quality of use assessment with SUMI. It is essential that the quality of use of
AKWS, even though it is not a mature product in the same sense as the other tools
used by the test users, is satisfactory. SUMI provides a standardized profile of the
quality of use of AKWS. The results show that AKWS fulfils the quality of use
requirements for the intended users; no single quality dimension stands out as
negative.
Ranking of features of the AKWS according to their usefulness. The test
persons were asked to rank eight major new features of the AKWS according to
their usefulness. The procedure forced the subjects to rank all features (no ties
permitted), as shown in Table 8.2.
The Kendall W statistic (W = 0.22) indicates an acceptable degree of consistency
between subjects. A test of significance of the rankings using the Friedman
statistic yields an error probability of p < 0.01, i.e. the ranking sequence differs
significantly from chance.
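These two statistics can be checked approximately from the mean ranks published in Table 8.2. The sketch below uses the standard no-ties formula for Kendall's W and the standard relation between W and the Friedman chi-square statistic; the variable names are ours, and because the published mean ranks are rounded to one decimal place the result is only an approximation of the original calculation.

```python
# Mean ranks from Table 8.2 (18 subjects, 8 features, no ties permitted).
mean_ranks = [2.9, 3.1, 3.6, 4.6, 4.8, 5.5, 5.6, 5.8]
m = 18                      # number of subjects (rankers)
n = len(mean_ranks)         # number of ranked features

# Kendall's W from mean ranks: W = 12 * sum((r_i - (n+1)/2)^2) / (n^3 - n)
expected = (n + 1) / 2
W = 12 * sum((r - expected) ** 2 for r in mean_ranks) / (n ** 3 - n)

# Friedman chi-square statistic, related to W by chi2 = m * (n - 1) * W.
chi2 = m * (n - 1) * W

# Tabulated critical value of chi-square with n - 1 = 7 degrees of freedom
# at alpha = 0.01.
CHI2_CRIT_7DF_001 = 18.475
significant = chi2 > CHI2_CRIT_7DF_001

print(f"W = {W:.2f}, chi2 = {chi2:.1f}, significant at 0.01: {significant}")
```

The recomputed W is about 0.22, in agreement with the reported value, and the corresponding Friedman statistic of about 28 exceeds the critical value, consistent with p < 0.01.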
The positive assessments of Manual Context Association, Context Filtering and
Manual Context Switching are reflected in additional remarks recorded by users
during the test, and in the frequency of manual context association and manual
context switching events in the logs. Our hypothesis is that manual context association
(aka context tagging) can be carried out efficiently by users, and is therefore
accepted readily, e.g. compared with recommended tagging.
Table 8.2 Ranking of AKWS features according to usefulness for the users (18 subjects)
Rank  AKWS feature                  Mean rank
1     Context association           2.9
2     Context filtering             3.1
3     Manual context switching      3.6
4     Automatic context detection   4.6
5     Workspace search              4.8
6     Recommended tags              5.5
7     Context visualizer            5.6
8     Automatic context discovery   5.8
Adoption rate and frequency of using the AKWS functions. Monitoring of
event counters gave the test administrators the opportunity to observe how users
were using the AKWS. The events monitored were
– Recommended tagging requests
– Context association
– Context switching
– Workspace search
– Acceptance and denial of Context detection and discovery
Over the usage period we observed which functions were accepted by users
(Fig. 8.10). Context association and context switching events increase progressively,
while recommended tagging requests, search and context discovery are taken up
much more slowly.
8.6.1.4 Discussion of AKWS Field Tests
In summary, the AKWS corresponds to user requirements and can be used to
perform the intended tasks of users. Some functions, especially context services,
were immediately accepted by users.
Users noted the slow response of the application, a technical implementation
issue. It seems that the users did not immediately recognize the benefit which
tagging in general, and recommended tagging in particular, generates. Before the test,
users did not carry out tagging activities frequently. A solution for this problem
must be found at the organizational and management level, where incentives for
tagging and knowledge sharing must be provided. On the other hand, context
tagging (context association), which is a pre-requisite for context switching, is
accepted and used with an increasing frequency by users.
Fig. 8.10 Cumulated frequency of events over time. Context switching and context association
event counts increase throughout the test period. TGRQ, tag recommendation request counter;
CXAS, context association counter; SEAR, search counter (this is the workspace search web service
invocation counter); CXSW, context switch counter (this is the explicit context switch counter);
CDYN, denied context discovery counter
8.6.1.5 User Effort Versus User Benefits for AKWS Functions
Using the AKWS creates benefits for users, but also adds to the cost for users in
terms of time, cognitive complexity and workload, and learning cost. The results of
the lab tests carried out with the AKWS allow us to estimate the specific user costs
of using the functionality of the AKWS. Both the execution times under idealized
conditions and the analytic results of the user procedures show that the user costs are
substantial, and that the user procedures create additional workload for the users.
Using the AKWS functions requires user procedures of different complexity. In
the case of “suggested context switch” the user just chooses between two responses,
“passive ignore” or “accept”, while “context discovery” must be answered by a
procedure comprising several actions.
Users are more likely to carry out tasks which have a low cost in
terms of time and cognitive workload. Higher cost must be justified by greater
benefits. This relationship varies for different parts of the functionality of the
AKWS. We list the functions adding to user cost next to the user benefits in Table 8.3.
The benefit is convenience and improved personal productivity (such as in the case
of context switching), or information quality (such as in the case of search).
Table 8.3 Costs and benefits of using AKWS functionality
AKWS functions adding to user cost | User cost | Benefits for users (personal productivity)
Context association (with “manual” tagging) + create context | Low | Ability to work in context: context switching, filtering, and visualization
Suggested context switch + accept or passive ignore | Low | As above
Tag document by type-to-tag | High | Search (with added cost)
Tag by recommendation + select tag | High | As above
Context discovery + accept context + name context + associate documents + set permission rights | High | Ability to work in context: context switching, filtering, and visualization
Search + memorize metadata + select by visual search in tag cloud | |
The results suggest that users accept context related functionality immediately
due to low user cost, and immediate benefit in terms of personal productivity
(context switching and context search). Some of the benefit created by tagging
(enabling efficient search) is also obtained from context tagging, which reduces the
incentives for users to use document tagging functions.
8.6.2 Field Test of the Bid Unit Wiki/SMW
The Bid Unit Wiki (BUW) was introduced to the Bid Unit Team (5 core users and a
community of 150 technical specialists) with a focus on the following three aspects
of the functionality:
• The use of the SMW to store and retrieve information about the customer
proposals. How effective are the semantic features in assisting with searching
and browsing the SMW?
• The use of the SMW generally as a collaborative tool. How effective is the BUW
in encouraging collaboration between the bid unit team and the technical
specialists?
• The use of the RSS feeds to pose questions to the technical specialists.
The initial uptake and usage of the functionality was observed, and
questionnaires and interviews were used to collect responses from users.
The uptake of the Bid Unit Wiki progressed steadily after introduction. The first
question to the community of technical specialists was posed immediately after the
release and was answered within 3 days. Users initially explored the Bid Unit Wiki
to test how it corresponds to their requirements, and then proceeded to use it as part
of their working environment. The responses of users give a differentiated view of
the acceptance and benefit perceived by the users.
The team sees the value of the BUW primarily in establishing a
fast means of communication between the bid unit team and the large community of
technical experts. The corresponding functionality is rated as valuable and beneficial
for personal productivity. Spontaneous statements made by the users highlight
the added value of the Bid Unit Wiki in comparison with the existing enterprise
tools. Users of the Bid Unit Wiki described its most attractive features as follows:
• “. . . utilising expertise from the wider BT community in assisting us with
responses . . . saving us time in searching which can be very time consuming.”
• “. . . to get responses to awkward questions promotes greater team working”
• “Asking bespoke questions.”
• “. . . ability to locate previous information”
• “Speed and ease of use to locate information and gather new information.”
• “. . . gives quick and easy access to a wide range of specialists and knowledge
that we would otherwise have to mail/phone people individually.”
• “Reaching out to specialist not directly involved in bids.”
• “. . . enabling us to obtain responses from the wider BT community, which is
better for us and our customers of course.”
Based on the very positive reception by users, the decision was made to install
the BUW as a tool for collaboration at short notice.
The full power of the BUW for knowledge exchange in a large community of
technical experts will unfold only when the community as a whole adopts the BUW as a
tool for sustained use, which in the short time available was only just starting.
8.7 Conclusions
In our case study we set out to evaluate, both at the user and the organisational level,
a number of the technologies developed in the ACTIVE project. Our choice of
technologies was determined in part by the requirements of our users, but also by
the limitations of time, especially the limited ability of our professional users to
divert part of their time during their regular work to the adoption of new technology.
We would have liked, for example, to have been able to evaluate the process
tools described in Chap. 6.
We consider two of the project’s scientific hypotheses to be confirmed. It was
clear that the use of applications incorporating lightweight ontologies, as
implemented in the SMW, and tagging, as implemented in the AKWS, can make
a material difference to productivity in our two environments. Moreover, context-
based information delivery, supported by our machine learning algorithms,
provides functionality which is valuable in managing information overload and
can reduce the disruptive effects of switching task focus.
The results of the detailed study of the user responses to the use of the AKWS
and the SMW in their daily work have shown that the context services are accepted
by users readily as a means to improve personal productivity. The use of the AKWS
varies strongly with individual user habits. For example, users who did not tag
much before they used the AKWS are likely to limit themselves to accepting
contexts, and might adopt tagging with recommended tags as their use of the
AKWS intensifies.
A subject of further research, and an important question to answer for future
applications, will be the cost/benefit ratio for users. It is clear that, in order to enable
context detection or faceted search, the user has to provide valid and reliable input to
the system beforehand. This has been shown to involve a user cost which may be weighted
disproportionately high because it demands additional work from the user in a work
situation where he or she is under high pressure imposed by other tasks.
The user monitor in combination with online questionnaires has been an effective
and reliable instrument to collect empirical data in a distributed work
environment.
Acknowledgement ACTIVE project partner ComTrade implemented the user monitor described
in Sect. 8.6.1.
The research leading to these results has received funding from the European Union’s Seventh
Framework Programme (FP7/2007-2013) under grant agreement IST-2007-215040.
References
Card S, Moran T, Newell A (1983) The psychology of human–computer interaction. Lawrence
Erlbaum Associates, Hillsdale, NJ
Hiemstra D (2009) Information retrieval models. In: Göker A, Davies J (eds) Information retrieval:
searching in the 21st century. Wiley, Chichester, UK
Kirakowski J (1996) The software usability measurement inventory: background and usage. In:
Jordan PW et al (eds) Usability evaluation in industry. Taylor & Francis, Brighton, pp 169–177
Kirakowski J (1998) SUMI User handbook. Human Factors Research Group, University College
Cork, Ireland
Simperl E, Thurlow I, Warren P, Dengler F, Davies J, Grobelnik M, Mladenic D, Gomez-Perez J,
Ruiz C (2010) Overcoming information overload in the enterprise: the active approach. IEEE
Internet Comput 14(6):39–46
9
Machine Learning and Lightweight Semantics to Improve Enterprise Search and Knowledge Management
Rayid Ghani, Divna Djordjevic, and Chad Cumby
9.1 Introduction
Enterprise search and knowledge management tools are amongst the most
commonly used tools within enterprises today. This is mostly because
most organisations now hold enormous quantities of information in their
enterprise data warehouses, and enterprise document repositories keep getting
larger. The challenge facing knowledge workers in these companies is how to
harness these enterprise-wide resources and how to use them when and where
they are needed most. The majority of the tools deployed in enterprises for search
and knowledge management are fairly generic and provide the same content to
every knowledge worker regardless of their context and task. A typical enterprise
search tool today doesn’t look much different than a typical web search engine,
even though the goals and functions of the users of these two types of systems and
the content they are operating on are vastly different. There has been a lot of
research focused on web search in the past decade (Wang and Zhai 2007; Jansen
and Spink 2006; Joachims et al. 2007) while enterprise search hasn’t received much
attention from the research community (Mukherjee and Mao 2004). The motivation
for the work described in this chapter came from several discussions and interviews
we conducted in Accenture to understand the needs of enterprise knowledge
workers. There were four key areas we identified that could be improved:
• Context and task-sensitive access to information
• Access to fine-grained reusable modular chunks of information
• Intelligent workflow/process support
• Document analysis and checking support for collaborative document
development
R. Ghani • D. Djordjevic • C. Cumby (*)
Accenture Technology Labs, Rue des Cretes, Sophia Antipolis, France
P. Warren et al. (eds.), Context and Semantics for Knowledge Management, DOI 10.1007/978-3-642-19510-5_9, © Springer-Verlag Berlin Heidelberg 2011
We describe these areas in more detail later, while the rest of the chapter
describes our approaches to addressing these challenges in the context of two critical
enterprise knowledge management tasks: enterprise search and collaborative document
development.
We describe our work on augmenting an enterprise search tool with context
mining, process mining, and visualization technologies that use a combination of
bag-of-words and lightweight semantic representations, making users more productive
and efficient. We also describe two support tools that were developed to help
improve the quality of documents as well as the efficiency of the collaborative
document development process. All three of these tools were tested extensively at
Accenture with enterprise users. As a result of those tests and user feedback,
the enterprise search engine (SABLE) as well as the document development
toolkit were deployed and made available to over 150,000 Accenture employees.
We present evaluation results using usage logs as well as questionnaires that show
that these prototypes are effective at making consultants at Accenture more efficient
as well as helping them find better information while at the same time being easier
to use than existing enterprise tools for these tasks.
9.2 Improving Enterprise Search with Machine Learning and Lightweight Semantics
As mentioned earlier, even though enterprise search tools get used by a large
number of users for a broad set of tasks, the majority of these tools are still fairly
generic and provide the same content and results to every knowledge worker
regardless of their context and task. We identified a need for providing context-
sensitive and process-aware access to information that is automatically tailored to
the particular needs of the task each user is engaged in. Current enterprise search
tools also typically return a list of documents where each document can be hundreds
of pages long. This is common among enterprise document repositories where
Microsoft Word or PowerPoint files can have hundreds of pages or slides making
it difficult for users to find the relevant content within them. Often, enterprise users
are trying to find a specific object that they can reuse for their current need. This
is common when users are looking for certain types of document sections or
PowerPoint slides that can be quickly customized and re-used. Based on our
discussions with users, an important need that recurred was the need to retrieve
reusable “chunks” of content from enterprise search engines, instead of large
monolithic documents. These “chunks” include objects such as sections of
documents or graphics (such as an architecture diagram) that can be reused in
future documents.
In order to tackle these shortcomings of existing enterprise search tools, we
developed SABLE, a research prototype for enterprise search with the following
capabilities (illustrated in Fig. 9.1):
1. Document search: Allows users to search for documents
2. Expert search: Allows users to search for experts within the enterprise. More
details about this are in Cumby et al. (2009)
3. Graphics search: Allows users to search for graphics of a specific type
4. Fast previews: Fast, visual previews of documents allow users to quickly
preview documents before downloading them
5. Personalized contexts: Allows personalized re-ranking of search results based
on automatically inferred contexts (illustrated in Fig. 9.2)
6. Visual “soft” filters: Allows users to re-rank documents based on visual soft filters
(illustrated in Fig. 9.3)
7. Search collaboration: Allows users to collaborate with and contact other
Accenture users who are looking for similar information
This chapter will focus on capabilities 3, 5, 6, and 7.
9.2.1 Graphics Search
Corporate digital libraries consist of business documents comprising not only
text data but also graphics that get reused by knowledge workers working on new
documents. These graphics include process flow diagrams, organizational charts,
architecture diagrams, logos, and graphs which are embedded in documents but not
individually available for users to search for and retrieve. Even if a user knows that
they are looking for a specific type of graphic, say an architecture diagram for the
Microsoft Enterprise search solution, they first have to search for documents,
manually browse hundreds of pages of content and then hope to find the relevant
architecture diagram. Getting access to specific, reusable pieces of graphics was
a key requirement that came out of our discussions with users of enterprise search
systems at numerous large companies.
Fig. 9.1 SABLE search engine
Fig. 9.2 Based on the search behavior of similar users, SABLE automatically comes up with
personalized clusters for every user. Clicking on a cluster or moving the “Red Ball” to a cluster (or
in between clusters) re-ranks search results biasing them more towards the clusters near the red ball
Fig. 9.3 “List-based” search filters only allow users to select a single value. SABLE contains
Visual Filters to let users select multiple filter values by placing the red ball in between several
values
We developed a machine learning approach for graphics classification that
automatically extracts graphics from corporate documents and classifies them
into an enterprise graphics taxonomy, enabling graphics search functionality that
augments traditional enterprise search. We augmented our classification algorithms
with active learning (Settles 2009) to make the system adaptive. As a result, we
developed (1) a method to automate the creation (extraction and consolidation) of
reusable graphics from legacy MS Office documents, (2) an analysis of feature extraction
techniques that combine text, OCR, visual features and structural features, together
with an experimental evaluation of the contributions of these different feature sets to
graphics classification, and (3) an empirical evaluation of existing classification
approaches and different sample selection strategies for active learning for graphics
classification. More details of the algorithms and experimental results concerning
the graphics classification are described in Djordjevic and Ghani (2010).
We applied this graphics classification capability to the corporate repository at
Accenture, resulting in the extraction and classification of over one million graphics.
These graphics are then indexed by a search engine and integrated into SABLE,
where a user can type in a text query, select a graphic type (org chart, process
flow, architecture diagram, etc.) and retrieve relevant graphics. We developed this
graphics search engine by indexing the graphics using the text that appears near (or
within) the image, and the categories that were assigned to it by our graphics
classification system. We use different term weighting techniques, such as weighting
the titles of slides containing graphics more than the body of the slide. The text from
the shape objects is incorporated in the indexing and, if the slide has no text, the
rest of the document (global information) is used for capturing textual similarity.
The classification scores are further used to re-rank the retrieved list of results.
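The indexing and re-ranking idea described above can be illustrated with a small sketch. It assumes hypothetical field weights (titles three times the body) and a simple product of text match and classifier score; neither the weights nor the combination rule is given in the chapter, and the field names are invented for the example.

```python
from collections import Counter

# Illustrative field weights: slide titles count more than body text.
# The actual weights used in SABLE are not given in the chapter.
FIELD_WEIGHTS = {"title": 3.0, "body": 1.0, "shape_text": 1.0}


def text_score(query, graphic):
    """Weighted count of query terms over the graphic's text fields."""
    terms = query.lower().split()
    score = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        counts = Counter(graphic.get(field, "").lower().split())
        score += weight * sum(counts[t] for t in terms)
    return score


def search_graphics(query, graphic_type, graphics):
    """Rank graphics by text match, re-ranked by classifier confidence."""
    scored = []
    for g in graphics:
        match = text_score(query, g)
        if match > 0:
            # class_scores holds the classifier's score per graphic type.
            confidence = g["class_scores"].get(graphic_type, 0.0)
            scored.append((match * confidence, g["id"]))
    return [gid for score, gid in sorted(scored, reverse=True) if score > 0]


graphics = [
    {"id": "g1", "title": "Search architecture", "body": "overview",
     "class_scores": {"architecture diagram": 0.9}},
    {"id": "g2", "title": "Team", "body": "search architecture owners",
     "class_scores": {"org chart": 0.8, "architecture diagram": 0.1}},
]
results = search_graphics("search architecture", "architecture diagram", graphics)
```

Here the first graphic ranks higher both because the query terms appear in its slide title and because the classifier is more confident that it is an architecture diagram.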
9.2.2 Personalized Contexts
Typical enterprise search systems today return the same search results regardless of
the user, context, or task. We developed the notion of personalized contexts that
integrate context mining and process mining to deliver personalized results that
are relevant to the user and his current context and task. The personalized contexts
feature was enabled by the following technologies:
1. Context Modeling
2. Context Mining
3. Process Mining
4. Context Detection
5. Context Visualization and Switching
6. Resource Re-Ranking
The rest of this subsection describes these components in further detail.
9.2.2.1 Context Modeling
For context modeling, we represent an event as User U accessing a resource
(typically a document) R at time T. U is represented as a set of lightweight semantic
features of users at Accenture such as level, office location, and a set of skills. We
experimented with two kinds of representations for R: a model using lightweight
semantics in the form of metadata that KM curators had tagged documents with
(document type, relevant product offering, relevant industry, etc.) and a bag of
words representation. The experiments conducted to test the effectiveness of
context mining showed that the lightweight semantics representation performs as
well as (and sometimes better than) the bag of words representation while being more
efficient in terms of storage and processing time. Our deployed implementation
therefore uses the lightweight semantic representation for R.
The data used for context modeling and mining consists of three kinds of
information:
1. Usage logs: Large database of search and access logs with timestamps from our
corporate enterprise search engine spanning 147,000 users, 134,000 documents
and 7.2 million actions over 2 years
2. Document data: Information about the document repository including lightweight
semantics (in the form of metadata about the content, filled in when the
content was uploaded based on a predefined company-wide taxonomy)
3. People data: Structured database of skills information with organizational information
(company groups, office location, and career level) for each employee
and a list of self-selected skills, along with proficiency rating and the number of
years of experience etc. This database contains the above-mentioned details
for over 170,000 employees
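The event representation described in this subsection can be sketched as a small set of data classes. This is only an illustration under stated assumptions: the class names, field names and the flattening into "field=value" strings are hypothetical and are not taken from the SABLE implementation.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class User:
    level: str
    office: str
    skills: frozenset


@dataclass(frozen=True)
class Resource:
    # Lightweight semantics: metadata fields assigned by KM curators,
    # stored as (field, value) pairs.
    metadata: tuple


@dataclass(frozen=True)
class Event:
    user: User
    resource: Resource
    time: datetime

    def features(self):
        """Flatten the event into a set of 'field=value' features."""
        feats = {f"level={self.user.level}", f"office={self.user.office}"}
        feats |= {f"skill={s}" for s in self.user.skills}
        feats |= {f"{k}={v}" for k, v in self.resource.metadata}
        return feats


# Hypothetical event: a consultant opens a retail proposal.
event = Event(
    User("consultant", "Sophia Antipolis", frozenset({"java"})),
    Resource((("document_type", "proposal"), ("industry", "retail"))),
    datetime(2010, 3, 1, 9, 30),
)
```

A flat feature set of this kind is far smaller than a bag of words over the full document text, which is consistent with the storage and processing advantage reported above.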
9.2.2.2 Context Mining
After we represent an event E as <U,R,T> for each user U accessing resource R at
time T, we then perform context mining using hierarchical clustering algorithms
(Manning et al. 2008) on all events present in our data set. The details of some of the
context mining algorithms are provided in Chap. 7. As a result of context mining,
we obtain a list of approximately 1,000 generic contexts for the entire enterprise.
These contexts are internally stored as a set of two similarity matrices: one that
contains a membership score for each <person, context> pair and another that
contains a membership score for each <document, context> pair. These matrices
are indexed and stored in memory to speed up contextual information delivery.
The context mining component runs periodically and updates the list of generic
contexts across all users and resources.
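To make the clustering step concrete, the following is a toy sketch of agglomerative (hierarchical) clustering over event feature sets, followed by a membership score of the kind stored in the two matrices. Jaccard similarity, average linkage, the stopping threshold and all function names are assumptions chosen for illustration; the algorithms actually used are those referenced in Chap. 7, and a quadratic-time loop like this one would not scale to millions of events.

```python
def jaccard(a, b):
    """Similarity between two feature sets."""
    return len(a & b) / len(a | b) if a | b else 0.0


def avg_link(c1, c2):
    """Average pairwise similarity between two clusters of feature sets."""
    return sum(jaccard(x, y) for x in c1 for y in c2) / (len(c1) * len(c2))


def mine_contexts(events, min_similarity=0.3):
    """Agglomerative clustering: merge the two most similar clusters
    until no pair is at least `min_similarity` alike."""
    clusters = [[e] for e in events]
    while len(clusters) > 1:
        pairs = [(avg_link(clusters[i], clusters[j]), i, j)
                 for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        best, i, j = max(pairs)
        if best < min_similarity:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def membership(features, context):
    """Membership score of a person or document in a context."""
    return avg_link([features], context)


# Hypothetical events, each flattened to a set of 'field=value' features.
events = [
    frozenset({"industry=retail", "document_type=proposal"}),
    frozenset({"industry=retail", "document_type=proposal", "level=manager"}),
    frozenset({"industry=banking", "document_type=training"}),
]
contexts = mine_contexts(events)
```

Evaluating `membership` for every person and every document against every mined context would yield the <person, context> and <document, context> matrices described above.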
9.2.2.3 Process Mining
The process mining component also runs periodically to update the list of informal
sequences of actions across all users and resources. We use the same data as in
context mining. The Process Mining component is based on data mining techniques
to obtain a probabilistic process model. The component constructs probabilistic
temporal models that detect patterns of sequential user actions. For action
modeling, a multi-relational clustering approach is used, where events are logs of
users accessing documents described with a lightweight taxonomy and unstructured
text. The Process Mining component uses Markov Models for discovering frequent
sequences of actions. The processes are internally stored as probabilistic sequences
of meta-data fields. For example:
itemtype = proposal material -> document_type = powerpoint
is a sequence that might be discovered as a result of PowerPoint documents
being accessed frequently after documents tagged as proposal material. The goal
is to integrate this functionality into SABLE to help infer resources that will be
accessed next and make the search process faster.
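A first-order Markov model over metadata values is one simple way to discover sequences such as the one above. The sketch below estimates transition probabilities from per-user access sessions; the session data, the probability threshold and the function names are hypothetical, and the deployed component may use a richer model than this.

```python
from collections import Counter, defaultdict


def fit_markov(sessions):
    """Estimate first-order transition probabilities P(next | current)."""
    counts = defaultdict(Counter)
    for session in sessions:
        for current, nxt in zip(session, session[1:]):
            counts[current][nxt] += 1
    return {
        state: {nxt: n / sum(followers.values()) for nxt, n in followers.items()}
        for state, followers in counts.items()
    }


def frequent_sequences(model, min_prob=0.5):
    """Return (current, next, probability) triples above a threshold."""
    return sorted(
        (cur, nxt, p)
        for cur, nxt_probs in model.items()
        for nxt, p in nxt_probs.items()
        if p >= min_prob
    )


# Hypothetical sessions: metadata values of documents accessed in order.
sessions = [
    ["itemtype=proposal material", "document_type=powerpoint"],
    ["itemtype=proposal material", "document_type=powerpoint", "document_type=word"],
    ["itemtype=proposal material", "document_type=word"],
]
model = fit_markov(sessions)
```

In this toy data, PowerPoint documents follow proposal material in two of three cases, so that transition passes the 0.5 threshold and would be reported as a frequent sequence.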
9.2.2.4 Context Detection
When a user logs on to the SABLE system and does a search query, we use the
person-to-context similarity of the logged-in user and the document-to-context
similarity of the top 100 documents returned by the user’s search query to generate
a ranked list of global contexts that are most applicable for the current user and
his current needs. We then take the top n contexts as relevant contexts where n is
currently set to 8 for SABLE.
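The detection step can be sketched as a scoring function over the two similarity matrices. The chapter does not say how the person-to-context and document-to-context similarities are combined, so the product of the user's similarity and the mean similarity of the top documents is an assumption made here, as are the data structures.

```python
def detect_contexts(person_sim, doc_sim, top_docs, n=8):
    """Rank contexts for a user and the top documents of their query.

    person_sim: {context_id: similarity of the logged-in user}
    doc_sim:    {doc_id: {context_id: similarity}}
    The product of user similarity and mean document similarity is an
    assumed combination rule; the chapter does not specify one.
    """
    scores = {}
    for ctx, p in person_sim.items():
        doc_scores = [doc_sim.get(d, {}).get(ctx, 0.0) for d in top_docs]
        mean_doc = sum(doc_scores) / len(doc_scores) if doc_scores else 0.0
        scores[ctx] = p * mean_doc
    # Highest score first; ties are broken by context id for determinism.
    ranked = sorted(scores, key=lambda c: (-scores[c], c))
    return ranked[:n]


# Hypothetical similarities for one user and the top two search results.
person_sim = {"c1": 0.9, "c2": 0.4, "c3": 0.1}
doc_sim = {"d1": {"c1": 0.2, "c2": 0.8}, "d2": {"c2": 0.6}}
relevant = detect_contexts(person_sim, doc_sim, ["d1", "d2"], n=2)
```

With the default `n=8` and the top 100 documents as `top_docs`, the function mirrors the parameters quoted in the text.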
9.2.2.5 Context Visualization/Switching
The context visualization component is given a ranked list of contexts that have
been identified as most relevant to the user and his current information need. These
contexts are displayed in a 2D representation, allowing the user to manually select
nearby contexts and get new search results. They are automatically labeled with the
top scoring metadata values of the context centroid. The user has the ability to look
at the top contexts and move the focus to signal his current context. If the focus
is in-between several contexts, the documents retrieved are re-ranked based on
a weighted sum of the nearby contexts (as shown in Fig. 9.2).
Overall, this component displays the contexts that are inferred to be closest to
the user’s current context and allows the user to manually select nearby contexts
which results in the re-ranking of search results.
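The weighted sum used when the focus lies between several contexts can be sketched as follows. Inverse-distance weighting in the 2D display is an assumption made for this example; the chapter states only that a weighted sum of nearby contexts is used, and the names and coordinates below are hypothetical.

```python
import math


def context_weights(focus, positions, eps=1e-9):
    """Inverse-distance weights of contexts around the focus, summing to 1."""
    inv = {c: 1.0 / (math.dist(focus, xy) + eps) for c, xy in positions.items()}
    total = sum(inv.values())
    return {c: w / total for c, w in inv.items()}


def rerank(docs, doc_sim, weights):
    """Order documents by the weighted sum of their context memberships."""
    def score(doc):
        return sum(w * doc_sim.get(doc, {}).get(c, 0.0) for c, w in weights.items())
    return sorted(docs, key=lambda d: (-score(d), d))


# Hypothetical layout: two contexts on a line, focus closer to c1.
positions = {"c1": (0.0, 0.0), "c2": (4.0, 0.0)}
doc_sim = {"d1": {"c1": 0.2, "c2": 0.9}, "d2": {"c1": 0.8}}
near_c1 = rerank(["d1", "d2"], doc_sim, context_weights((1.0, 0.0), positions))
near_c2 = rerank(["d1", "d2"], doc_sim, context_weights((3.0, 0.0), positions))
```

Moving the focus from one side to the other reverses the order of the two documents, which is the behavior the red ball in Fig. 9.2 is meant to give the user.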
9.2.2.6 Resource Reranking
When a user uses the context visualization/switching component, the search results
get re-ranked using the context and process inferred by the respective components.
For each context, the context mining component has a document-context membership
score. Based on the previous k documents viewed, the process mining component
also returns a probability distribution over the metadata values for the next
most likely document of interest. We combine both of these components and
compute a ranked list of documents relevant to the current context (for the current
user) as a function of previous documents viewed. This re-ranked list is then shown
in SABLE.
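One way to combine the two signals is a linear blend of the document-context membership score and the process model's probability for the document's metadata value. The blend, the weight `alpha` and the single metadata value per document are assumptions made for this sketch; the chapter does not give the actual combination function.

```python
def combined_rank(docs, context_membership, doc_metadata, next_value_prob, alpha=0.5):
    """Blend context membership with the process model's prediction.

    context_membership: {doc_id: membership in the current context}
    doc_metadata:       {doc_id: metadata value, e.g. "document_type=powerpoint"}
    next_value_prob:    {metadata value: P(next document has this value)}
    alpha and the linear blend are assumptions made for this sketch.
    """
    def score(doc):
        ctx = context_membership.get(doc, 0.0)
        proc = next_value_prob.get(doc_metadata.get(doc), 0.0)
        return alpha * ctx + (1 - alpha) * proc
    return sorted(docs, key=lambda d: (-score(d), d))


# Hypothetical inputs for two candidate documents.
membership_scores = {"d1": 0.6, "d2": 0.5}
metadata = {"d1": "document_type=word", "d2": "document_type=powerpoint"}
next_prob = {"document_type=powerpoint": 0.7, "document_type=word": 0.3}
ranked = combined_rank(["d1", "d2"], membership_scores, metadata, next_prob)
```

With `alpha=1.0` the ranking depends on context membership alone, and with `alpha=0.0` on the process model alone, so the parameter controls how strongly recently viewed documents influence the order.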
9.2.3 Visual “Soft” Filters
The context visualization component is extended to display multiple search filters
in a 2D representation labeled with the most frequent metadata values and allows
the user to manually select co-occurring meta-data values and filter search results.
9.2.4 Search Collaboration
LiveNetLife provides semantically enabled social browsing. We use LiveNetLife,
together with context information from the context identification component, to
create awareness among knowledge workers who are working in similar contexts or
on similar tasks.
9.3 Improving Collaborative Document Development with Machine Learning and Lightweight Semantics
Collaboratively creating documents is a common activity in most
product and service organizations. Most large organizations have a distributed
team of experts working together to create documents for a variety of purposes.
These documents can be project proposals written in response to Requests for
Proposals (RFPs), training materials, marketing and sales materials or product
manuals. Typically, the development of these kinds of documents requires the use
of a combination of collaborative formal and informal processes. Depending on the
requirements of each document type, skilled teams must be identified and assembled
by the project manager tasked with developing the final document. Where
expertise cannot be identified in time, materials are simply drawn from the central
document repository and adapted as necessary. We describe two support tools
developed to help improve the quality of collaborative documents as well as
the efficiency of the document development process. The first prototype,
the Document Development Toolkit, is designed to help knowledge workers find
relevant information (and expertise) for the task at hand. The second prototype,
the Document Development Workspace, is designed to help large project teams work
collaboratively, supporting the development of documents.
9.3.1 Document Development Toolkit
We developed the Document development toolkit as add-ins to commonly used
document creation tools, specifically Microsoft Word and PowerPoint. The main
motivation for this decision was to support the users in the context of the document
they’re developing and to embed the support tools in the document creation process
as opposed to in a separate application that the users would have to use. The
document development toolkit consists of the following features:
1. Search and visual previews from within Word and PowerPoint (Document,
Graphics, and Expert Search)
2. Search for experts, communicate with them using MS Office Communicator
and add them to the project team
3. Auto-suggestion of search terms: Based on the position of the mouse
cursor in the Word/PowerPoint document, a set of keywords is extracted from
the current working position and search terms are auto-suggested
4. Search refining based on personalized clusters: Same backend functionality as
the context mining and detection in SABLE except the front-end shows the
current context and most similar contexts in a simple word cloud. We had to
modify the front end of context visualization and switching into something more
compact due to screen space limitations in MS Office add-ins
5. Personalized clusters (based on the same context mining described earlier)
6. Interactive document Scrubbing/Redaction functionality to remove client confidential
information before the document can be shared
7. Team building, section assignment, and document development workspace
creation
The Document Development Toolkit was designed to help knowledge workers
find the right information (and expertise) for the task at hand. The toolkit is implemented
as an add-in for MS Office applications (MS Word and PowerPoint
specifically) and is illustrated in Figs. 9.4–9.6.
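The auto-suggestion feature listed above extracts keywords from the text around the current working position. A minimal sketch of that idea, assuming a character window around the cursor and plain term frequency with a small stopword list, is shown below. The window size, the stopword list and the function name are illustrative; the chapter does not describe the extraction method used in the toolkit.

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "for", "is", "on", "with"}


def suggest_terms(text, cursor, window=200, k=3):
    """Suggest the k most frequent non-stopword terms near the cursor."""
    start = max(0, cursor - window)
    nearby = text[start:cursor + window]
    words = [w for w in re.findall(r"[a-z]+", nearby.lower())
             if w not in STOPWORDS and len(w) > 2]
    # Most frequent first; ties are broken alphabetically.
    ranked = sorted(Counter(words).items(), key=lambda kv: (-kv[1], kv[0]))
    return [w for w, _ in ranked[:k]]


# Hypothetical document text with the cursor at the end.
text = "Our proposal covers the supply chain. The supply chain proposal for retail."
suggestions = suggest_terms(text, cursor=len(text))
```

The suggested terms could then be passed to the document, graphics or expert search as a ready-made query.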
The Interactive Redaction Tool shown in Fig. 9.7 addresses the problem of
sharing documents that contain client-sensitive information, and encourages knowledge
workers to contribute content to the enterprise repository at a much larger scale.
An algorithm for document redaction described in Cumby and Ghani (2011)
enables document anonymization in a bottom-up fashion. The approach attempts
to optimally perturb a document to maximize the classification error for the
sensitive client identity within the set of potential clients for a document (the
confusion set), which is known in advance. When a document is loaded, the add-in
builds a word frequency vector of the existing text and returns the most likely
client classification based on a Naive Bayes model, along with the words suggested
for redaction so that the document resembles a member of other client classes.
The user can adjust a slider to change the highlighted list of terms to redact. The tool also
includes simple named-entity and template-based recognizers for social security
numbers and other personal identifiers, which often need to be removed when
scrubbing a document for submission.
Fig. 9.4 MS Word and MS PowerPoint with document search (top) and context visualisation in
a word cloud with eight closest neighboring contexts displayed (bottom)
Fig. 9.5 Document development toolkit tabs. From left to right: document search, expert search,
graphics search
Fig. 9.6 Word cloud based on context detection implemented in the document development
toolkit
Fig. 9.7 Screenshot of the interactive redaction component integrated into the document
development toolkit. The user analyzed a document about Ford and several sensitive terms
have been highlighted for inspection
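The general idea of suggesting terms to redact can be illustrated with a small multinomial Naive Bayes model. The sketch below is not the algorithm of Cumby and Ghani (2011): it uses a simple heuristic that ranks words by how much more likely they are under the predicted client than under the best alternative in the confusion set. The training data, function names and the social security number pattern are assumptions made for the example, and at least two clients are required.

```python
import math
import re
from collections import Counter


def train(docs_by_client):
    """Per-client word counts for a multinomial Naive Bayes model."""
    return {c: Counter(w for d in docs for w in d.lower().split())
            for c, docs in docs_by_client.items()}


def log_prob(word, counts, vocab_size):
    """Laplace-smoothed log P(word | client)."""
    return math.log((counts[word] + 1) / (sum(counts.values()) + vocab_size))


def classify(text, model):
    """Most likely client under a uniform prior."""
    vocab = len({w for counts in model.values() for w in counts})
    words = text.lower().split()
    return max(sorted(model),
               key=lambda c: sum(log_prob(w, model[c], vocab) for w in words))


def suggest_redactions(text, model, k=3):
    """Words that most favor the predicted client over the best alternative."""
    vocab = len({w for counts in model.values() for w in counts})
    client = classify(text, model)
    others = [c for c in model if c != client]

    def evidence(w):
        return (log_prob(w, model[client], vocab)
                - max(log_prob(w, model[c], vocab) for c in others))

    words = sorted(set(text.lower().split()), key=lambda w: (-evidence(w), w))
    return client, [w for w in words[:k] if evidence(w) > 0]


# Template-based recognizer for US social security numbers (illustrative).
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

# Hypothetical training documents for a two-client confusion set.
model = train({
    "ford": ["mustang trucks assembly plant", "mustang dealer network"],
    "acme": ["retail stores loyalty program", "retail dealer network"],
})
client, terms = suggest_redactions("mustang assembly dealer network", model)
```

The parameter `k` plays the role of the slider described above: raising it highlights more terms, while words that are equally likely under several clients are never suggested.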
9.3.2 Collaborative Document Development Workspace
The Collaborative Document Development Workspace is designed to help large
project teams work collaboratively during the rapid development of documents. It is
built on Semantic MediaWiki (SMW) and allows each team member to
continuously contribute their expertise to the document development process. The
prototype automatically analyses a document to identify a possible response document
outline and automatically populates it with relevant content retrieved from
the enterprise-wide knowledge base. Using the SMW collaborative environment,
the project manager can quickly organize tasks and meetings with the appropriate
experts, and keep track of deadlines. We have developed the following capabilities
in the Document Development Workspace:
• Dynamically import data that is relevant to the current document (information
about people, credentials, products, similar documents) into the new workspace
• Allow task allocation, tracking, and management for the project team
• Enable a project manager to set up a process or import a process that team
members need to follow using a visual process editor
• Enable connection between the knowledge stored in the workspace and the tools
consultants use to develop the document (typically MS Word and PowerPoint)
allowing them to easily access the knowledge and facts stored in the workspace
• Automatic configuration of a new project development workspace, templates for
adding project specific information from enterprise data sources
• Customized for document development with templates and forms enabling
functions such as adding project workspace, adding sections, adding and orga-
nizing meetings, adding and organizing tasks, calendar and timeline view, task
list, etc.
• Dynamic data population from live enterprise sources. These data and properties
are queried via an inline query language provided by SMW and the results
rendered into the different visualizations (e.g. searching for team members
with certain skill proficiency for including them in the team)
• Import capabilities from SMW to MS Word for facts, sections, offerings, etc.
through Office Smart Tag Wiki extension
• A visual process visualization and editing tool for the SMW allowing project
managers to represent and visualize formal processes their teammembers should
follow
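As an illustration of the inline queries mentioned in the list above, the following sketch assembles an SMW `#ask` query string for finding team members with a given skill. The category and property names (`Category:Person`, `Has skill`, `Has proficiency`) and the helper function are hypothetical; the chapter does not give the actual vocabulary used in the workspace.

```python
def build_ask_query(conditions, printouts, fmt="table", limit=10):
    """Assemble a Semantic MediaWiki inline #ask query.

    conditions: selection conditions such as "Category:Person" or "Has skill::Java"
    printouts:  properties to display for each matching page
    """
    condition_part = " ".join(f"[[{c}]]" for c in conditions)
    printout_part = "".join(f" |?{p}" for p in printouts)
    options = f" |format={fmt} |limit={limit}"
    return "{{#ask: " + condition_part + printout_part + options + "}}"

# Hypothetical query: people with a Java skill, showing their proficiency.
query = build_ask_query(
    ["Category:Person", "Has skill::Java"],
    ["Has proficiency"],
)
print(query)
```

The resulting string can be placed on a wiki page, where SMW evaluates it and renders the result in the requested format.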
9.4 Evaluation
We conducted a series of experiments and user tests for the prototypes described in
this chapter. In this section, we focus on describing the results of our evaluation of
the enterprise search tool SABLE. The final release version was deployed in
Accenture and made available to all of the enterprise users. We had over 2,000
users in the first 6 weeks of this release. The results we describe below come from
(1) analyzing usage logs of the SABLE application, (2) a survey that was answered
by 104 users, and (3) a task-based validation session where a group of 20 consultants
were asked to perform a series of tasks and respond to questions about
SABLE.
9.4.1 Web-Based SABLE Survey
As the first step of the validation activity we conducted a web-based survey that was
taken by 104 users.
A summary of the results is given below and shown in Fig. 9.8. See Sect. 9.4.2
for an explanation of the questions:
• 89% of respondents agree that they would like to use SABLE frequently and
82% agree that it is easy to use
• Compared to standard enterprise search tools, 86% find SABLE easier to use,
67% believe it gives better results and 71% believe it gives them relevant search
results faster
• Regarding the Graphic Search feature, 84% of the users find
the feature useful for finding relevant information, 80% believe it enables them
to get relevant information faster and 76% agree it gives them better results
• Regarding the personalized clusters, 62% of the users believe the feature is
useful for finding relevant information, 54% believe it
enables faster (more efficient) search and 55% believe that it gives
them better search results
• Regarding the Visual Filters feature, 66% of the users believe the feature is
useful for finding relevant information, 58% that it enables faster (more efficient)
search and 57% that it enables better search results
• Regarding LiveNetLife as a collaboration feature, 71% of the users believe it is
useful for connecting them with other users looking for similar information,
while 55% agree that the feature might help them get relevant information through
other users
Fig. 9.8 SABLE survey taken by 104 users. See Sect. 9.4.2 for an explanation of the questions
9.4.2 Task-Based Validation for SABLE
In addition to the large-scale survey, we also conducted a more focused task-based
test with 15 “power” users. Based on the requirements gathering process and
interviews with potential users, we designed five information needs that typical
consultants in Accenture would have. We asked 15 users to attempt to fulfill these
information needs using SABLE. The survey conducted for each task asked the
users about the utility of SABLE overall for this task, compared to the standard
enterprise search tools.
The tasks we asked the users to perform were:
Task 1: Find an architecture diagram or solution overview for a content manage-
ment system
Task 2: Find a credential for Biometrics that shows innovative ideas
Task 3: Find training material on the use of social media for internal employees
Task 4: Find vendors or alliance partners that work on BI
For each task, we asked the users to tell us if SABLE helps them find information
faster (Speed) and if it helps them find better information (Quality). After finishing
the tasks, we asked them 17 questions. The first two (Q1 and Q2) asked them if they
would like to use SABLE frequently and if they found it easy to use. The next
three (Q3, Q4 and Q5) asked them to compare SABLE to the standard enterprise search
engine in Accenture in terms of ease of use, information quality, and speed of
getting information. The next 12 questions focused on specific new features of
SABLE (Graphic Search, Personalized Clusters, Search Collaboration, and Visual
Filters) and asked users if they found the features useful for finding relevant
information, if it gets them relevant information faster, and if it enables better
search.
Some results are listed here and shown in Fig. 9.9:
Q4: 100% of users agree that SABLE gives them better search results than the
standard Accenture enterprise search engine
Q5: 83% agree that it makes the search process faster and saves them valuable time
Graphic search (Q6, Q7, Q8): 100% believe it is useful for finding relevant
information, 82% agree that it gives them relevant information faster,
and 72% agree it gives them better search results
Personalized clusters (Q9, Q10, Q11): 67% believe the feature is useful for
finding relevant information, enables faster (more efficient) search, and enables
better search results
LiveNetLife as a tool for search collaboration (Q12, Q13, Q14): 67% believe it’s
useful for finding relevant information
Visual filters (Q15, Q16, Q17): 80% of users believe the feature is useful for finding
relevant information, 80% that it enables faster (more efficient) search
9.4.3 Usage Log Analysis for SABLE
We collected extensive usage data on SABLE. We describe our analysis of that
usage data in this section. The data we report on was collected between January 11,
2011 (the full release date) and February 23, 2011. In that period, 2,064 unique
users used SABLE. Some high level statistics are shown in Figs. 9.10–9.12.
We use the usage logs to measure the impact of using context mining and
detection to re-rank search results. We track each use of the “personalized context”
feature and calculate the change in rank of every document that was viewed
(clicked) by the user. For example, if a user moves the “red ball” to a different
context and then views the document at rank 4 that was previously at rank 14 (with
standard search), the change in rank is 10. We notice that about 66% of the
document views did not result in a change in rank. Figure 9.13 shows the “Change
in Rank” distribution for the non-zero values. The average change in rank (excluding
views that did not result in any change) is 14.5 (out of the top 100 results shown in
the search). In other words, when results are re-ranked using context mining and
detection, the documents viewed are ranked on average 14.5 places (14.5%) higher
than they were in the initial search results. This indicates that the re-ranking
approach improves the search experience: it gives users a faster
way to get relevant results and reduces the likelihood of missing relevant
documents that sit lower down in the search results.
Fig. 9.9 Results of task-based evaluation of ACTIVated SABLE (percentage of responses, 0–100%, for Task 1–5 Speed and Quality and for Q1–Q17; response scale: Strongly Disagree, Disagree, About the same, Agree, Strongly Agree)
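The change-in-rank measure used in this analysis can be written down as a short sketch. It assumes each log entry for a viewed document has already been reduced to a pair (rank under standard search, rank after re-ranking); the function name and the sample values are illustrative and are not taken from the SABLE logs.

```python
from statistics import mean

def change_in_rank_summary(views):
    """views: list of (rank_before, rank_after) pairs, one per viewed document.

    Returns the per-view change in rank, the share of views with no change,
    and the average change over the views whose rank did change.
    """
    changes = [before - after for before, after in views]
    nonzero = [c for c in changes if c != 0]
    unchanged_share = (len(changes) - len(nonzero)) / len(changes)
    average_nonzero = mean(nonzero) if nonzero else 0.0
    return changes, unchanged_share, average_nonzero

# Illustrative data: the first pair is the example from the text (rank 14 -> rank 4).
changes, unchanged_share, average_nonzero = change_in_rank_summary(
    [(14, 4), (3, 3), (7, 7), (25, 5)]
)
print(changes, unchanged_share, average_nonzero)
```

A positive change means the viewed document moved up the result list after re-ranking; excluding the zero changes mirrors the way the average of 14.5 is reported above.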
Fig. 9.10 Visit distribution for users
Fig. 9.11 Query distribution for users
Fig. 9.12 Distribution of number of documents previewed per session (axes: Number of Users, logarithmic scale from 1 to 10,000, against Average Number of Documents Previewed, 0 to 70)
Fig. 9.13 Distribution of change in rank after using personalized context feature for the documents viewed by users. This only includes the distribution for cases where the change in rank was non-zero. About 66% of documents viewed did not have a change in rank
9.5 Summary
We have described several prototypes that were developed to tackle key challenges
in enterprise search and knowledge management. These prototypes used a combination
of machine learning and lightweight semantics to make generic knowledge
management tools context and task sensitive in order to increase the productivity of
knowledge workers. We focused on two enterprise problems: (1) Enterprise search
and (2) Collaborative Document Development. We describe an enterprise search
tool employing context mining, process mining, and visualization technologies that
use a combination of bag of words and lightweight semantic representations,
making users more productive and efficient when performing enterprise search.
We also describe two support tools to help improve the effectiveness and efficiency
of knowledge workers collaboratively creating documents. All three of these were
tested extensively at Accenture and two have been deployed and made available to
over 150,000 Accenture employees. Our evaluation results show that these
prototypes are effective at making consultants at Accenture more efficient as well
as helping them find better information while being easier to use than existing
knowledge management tools.
Acknowledgements The research leading to these results has received funding from the European
Union’s Seventh Framework Programme (FP7/2007-2013) under grant agreement IST-2007-
215040.
References
Cumby C, Ghani R (2011) A machine learning based system for semi-automatically redacting
documents. Proceedings of the 23rd innovative applications of artificial intelligence confer-
ence, IAAI 2011, San Francisco, CA
Cumby C, Probst K, Ghani R (2009) Retrieval and ranking of entities for enterprise knowledge
management tasks. Semantic search workshop at WWW2009, Madrid, Spain
Djordjevic D, Ghani R (2010) Graphics classification for enterprise knowledge management.
ICDM workshops 2010, Sydney, Australia, pp 562–569
Jansen BJ, Spink A (2006) How are we searching the world wide web? A comparison of nine
search engine transaction logs. Inf Process Manage, 42(1). Formal methods for information
retrieval, Jan 2006, pp 248–263
Joachims T, Granka L, Pan B, Hembrooke H, Radlinski F, Gay G (2007) Evaluating the accuracy
of implicit feedback from clicks and query reformulations in web search. ACM Trans Inf Syst
25(2), Article 7 (Apr 2007)
Manning CD, Raghavan P, Schütze H (2008) Introduction to information retrieval. Cambridge
University Press
Mukherjee R, Mao J (2004) Enterprise search: tough stuff. Queue 2(2) (Apr 2004)
Settles B (2009) Active learning literature survey. Computer sciences technical report 1648,
University of Wisconsin-Madison
Wang X, Zhai CX (2007) Learn from web search logs to organize search results. Proceedings of
the 30th annual international ACM SIGIR conference on research and development in infor-
mation retrieval, Amsterdam, Netherlands
10 Increasing Predictability and Sharing Tacit Knowledge in Electronic Design
Vadim Ermolayev, Frank Dengler, Carolina Fortuna, Tadej Stajner, Tom Bösser, and Elke-Maria Melchior
10.1 Introduction
In knowledge-intensive sectors of industry, knowledge workers (Drucker 1969) are
central to an organisation’s success, yet the tools they must use often stand in the
way of optimising their productivity. Demand for a remedy to the defects of current
knowledge worker tools has recently grown substantially across industries. For
example, in the ACTIVE project the three case studies in consulting, telecommunications
and engineering design have been driven by this requirement. Knowledge
workers acting alone but, more importantly, in teams that can be distributed
geographically and organizationally are a particular concern and focus of our
research themes. One of the themes is the support for informal process knowledge
acquisition, articulation and sharing.
To this theme the notion of an informal (or knowledge) process is central. The
definition by Warren et al. (2009) lays out the ground for the specific features:
“Informal processes are carried out by knowledge workers with their skills, experience
and knowledge, often to perform difficult tasks which require complex,
informal decisions among multiple possible strategies to fulfil specific goals.
In contrast to business processes which are formal, standardized, and repeatable,
knowledge processes are often not even written down, let alone defined formally,
vary from person to person to achieve the same objective, and are often not
repeatable. Knowledge workers create informal processes on the fly in many
situations of their daily work”.
V. Ermolayev (*)
Zaporozhye National University, 66 Zhukovskogo st., 69600 Zaporozhye, Ukraine
e-mail: [email protected]
F. Dengler
Karlsruhe Institute of Technology, B. 11.40 KIT-Campus Süd, D-76128 Karlsruhe, Germany
e-mail: [email protected]
C. Fortuna • T. Stajner
Jozef Stefan Institute, Jamova 39, SI-1000 Ljubljana, Slovenia
e-mail: [email protected]; [email protected]
T. Bösser • E.-M. Melchior
kea-pro, Tal, Spiringen CH-6464 Switzerland
e-mail: [email protected]; [email protected]
P. Warren et al. (eds.), Context and Semantics for Knowledge Management, DOI 10.1007/978-3-642-19510-5_10, © Springer-Verlag Berlin Heidelberg 2011
ACTIVE has adopted a service-oriented and component-based approach to its
architecture. Services and components are defined at a number of levels (Warren
et al. 2009). At the bottom level are infrastructure services. At the level above this,
machine intelligence technology is used. For example the process mining service
learns repeated sequences of action executions which constitute running processes
and populates the knowledgebase with these. Finally at the top level are the
applications. One of the case study applications is the management of design
project (DP) knowledge in microelectronic and integrated circuit (MIC) engineering
design. This case study is led by Cadence Design Systems GmbH
(www.cadence-europe.com), an engineering design services provider in this domain.
It goes beyond the existing performance management solutions by providing
two kinds of functionality: (1) at the back-end, the learning of
design process execution knowledge from distributed datasets of acquired knowledge;
and (2) at the front-end, design project knowledge articulation and sharing
through a lightweight collaboration platform.
The remainder of the chapter is structured as follows. Section 10.2 surveys
the related work in informal process representation, mining and extraction, articu-
lation and sharing. It also outlines the unsolved problems that are further addressed
in our work. Section 10.3 presents briefly the ACTIVE approach to informal
process acquisition, articulation and sharing that helps the knowledge workers
in engineering design navigate their projects. Section 10.4 elaborates on the
architecture and implementation of our fully functional software prototype. Section
10.5 presents the plan for and the results of the validation of the implemented
software in an industrial setting. Section 10.6 discusses the results and draws some
conclusions.
10.2 Related Work
The research and development work presented in this chapter provided
contributions in several interrelated aspects relevant to managing and productively
using informal process knowledge. Our contributions implemented and integrated
in the software prototype (Sect. 10.4) comprise: the representation of informal
process knowledge in the form of ontologies; the methods for informal process
mining and extraction from process logs; the methods for informal process knowl-
edge articulation and sharing using a visualization and superimposition approach.
This section analyses how our results are positioned relative to other work in these
directions.
10.2.1 Process Knowledge Representation
The mainstream in process modeling is represented by enterprise and business
process representations – in the form of ontologies or languages. Among the
ontologies the following results have to be mentioned: the Enterprise Ontology
(Uschold et al. 1998), Toronto Virtual Enterprise Ontology (TOVE) (Grüninger
et al. 2000), and more recently the theoretical work by Dietz (2006) and the
reference ontology for business models developed in Interop, an EU Sixth Frame-
work Programme (FP6) Network of Excellence (Andersson et al. 2006), see also
www.interop-vlab.eu. The business process modeling community has developed a
variety of languages with the major objective of representing business processes as
the executable orchestrations of activities. The most prominent examples of such
languages are: PSL (Bock and Grüninger 2005), BPEL and more recently
WS-BPEL (docs.oasis-open.org/wsbpel/2.0/OS/wsbpel-v2.0-OS.pdf), BPML and
more recently BPDM (www.omg.org/spec/BPDM/1.0/). A more comprehensive
approach to semantic business process modeling and management has been developed
in the FP6 SUPER project (Hepp and Roman 2007). A major shortcoming of
the listed results is that they are not designed to model informal
processes in the sense of the definition by Warren et al. (2009).
One of the relevant approaches to modelling and representing informal processes
has been developed in the FP6 Nepomuk project (Grebner et al. 2006). A short-
coming of the process representation in Nepomuk ontologies is the limitation of the
scope only to the tasks performed on the computer desktop.
Our approach to informal process representation builds on the work in dynamic
engineering design process modeling of the PSI and PRODUKTIV+ projects
(Ermolayev et al. 2008). Our contribution in ACTIVE lies in the development of
the lightweight knowledge process representation for engineering design that is
essentially a micro-ontology providing a simplified yet sufficiently expressive view
of a design process to be visualized for articulation and sharing. This micro-
ontology is aligned with the ACTIVE Knowledge Process Model (Tilly 2010)
through the PSI Upper-Level Ontology where the latter is used as a semantic bridge
(Ermolayev et al. 2008).
10.2.2 Informal Process Knowledge Mining and Extraction
Process mining is a set of data mining techniques, focused on constructing process
models out of a large number of events. The purpose of these techniques is to
discover process, control, data, organizational and social structures from event logs.
Practical usefulness of process mining in our setting is twofold. Firstly, it allows
inferring a process model when such a model did not exist in an explicit form.
Secondly, it allows devising alternative models to the primary one to enable compar-
ison of different possible interpretations with regard to complexity and observe the
extent to which the primary model is being followed. Knowing the differences
between the actual process model and the mined process model is crucial when
optimizing the process. In the case of microelectronic design the most interesting
part of the structure is the process itself. Although process mining in general also
considers more organizational and social structures (van der Aalst and Song 2004),
the design process for a particular design artifact is focused mainly around a single
knowledge worker and his design decisions. The knowledge process which we are
trying to uncover is a product of a designer’s experience and intuition and is rarely
explicitly documented. This makes it valuable to ensure productivity and at the
same time difficult to capture manually.
There exist several different approaches to process mining each producing
models of varying expressivity. The selection of an appropriate model is influenced
by the properties of the event log and the expressivity requirement of the process
model. However all process models are based on the notion of states – at any point
in execution the process resides in some state. Multiple-actor models permit several
simultaneous states although in the electronics design domain a single process
instance is usually executed by a single designer. Another differentiating point
between various classes of process models is the semantics of the transition
between states which affects the expressivity of the model.
In informal knowledge processes, the states are often not well-defined which
requires solving this issue before tackling the process mining problem. One
approach that we incorporated in knowledge process mining software from other
application domains was to perform clustering on event logs and use the clusters as
proxies for states (Stajner et al. 2010) controlling the complexity of the process
model via the desired number of clusters.
In terms of transition modeling the most straightforward approaches consider
the Markovian assumption: each transition to a new state is dependent only
on the previous state. Usually the transition probabilities are statistical estimates
of the conditional probability of one state directly following another (Hingston
2002). We have explored this approach in related knowledge worker scenarios and
discovered that simple Markovian models work well for very fine-grained low-level
events (Stajner and Mladenic 2010). A side effect of using such models is that they
tend to have many states and transitions which make them difficult to interpret.
Because of this we often resort to de-noising the model by pruning the transitions
which we consider to have little information. For this purpose Probabilistic Deter-
ministic Finite Automata are often used, for which statistically well-founded
techniques for determining significant transitions are available (Jacquemont et al.
2009).
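The Markovian transition estimate described above can be sketched as follows. The sketch assumes the event log has already been split into one sequence of state labels per process instance; the function name, the state labels and the fixed probability threshold are illustrative assumptions. In particular, the threshold is a simple stand-in for the statistically founded pruning tests cited above, not an implementation of them.

```python
from collections import Counter, defaultdict

def estimate_transitions(sequences, min_probability=0.0):
    """Estimate P(next state | current state) from state sequences.

    Transitions whose estimated probability falls below min_probability are
    pruned; the remaining probabilities are not renormalized.
    """
    pair_counts = defaultdict(Counter)
    for sequence in sequences:
        # Count each pair of directly consecutive states.
        for current, following in zip(sequence, sequence[1:]):
            pair_counts[current][following] += 1

    model = {}
    for state, followers in pair_counts.items():
        total = sum(followers.values())
        kept = {nxt: n / total for nxt, n in followers.items()
                if n / total >= min_probability}
        if kept:
            model[state] = kept
    return model

# Hypothetical design activity logs, one list per process instance.
logs = [
    ["edit", "simulate", "edit", "simulate", "verify"],
    ["edit", "simulate", "verify"],
    ["edit", "verify"],
]
print(estimate_transitions(logs, min_probability=0.3))
```

With these logs, the rare direct transition from `edit` to `verify` (probability 0.25) is pruned, which is the kind of de-noising that keeps the model readable.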
In environments where minute variations in activity order are not critical these
can be further relaxed to the conditional probability of one state following another
within a time window. Such a relaxation results in a slight decrease in expressive-
ness since a transition only means that a particular event has occurred within a
given time window before some other event. However this payoff avoids too much
sparsity especially when we are constrained by having many distinct activities in a
relatively short event log.
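The time-window relaxation can be sketched in the same style. Here a transition from activity A to activity B is counted whenever B occurs within a given window after an occurrence of A, regardless of what happens in between. The event format (timestamp, activity), the function name and the sample values are assumptions made for illustration.

```python
from collections import Counter, defaultdict

def windowed_follow_probabilities(events, window):
    """events: list of (timestamp, activity) pairs; window: time span in the same units.

    Returns, for each activity A, the estimated probability that activity B
    occurs within `window` after an occurrence of A.
    """
    events = sorted(events)
    occurrences = Counter()
    followed_by = defaultdict(Counter)
    for i, (timestamp, activity) in enumerate(events):
        occurrences[activity] += 1
        seen = set()
        for later_timestamp, later_activity in events[i + 1:]:
            if later_timestamp - timestamp > window:
                break  # events are sorted, so nothing later can fall in the window
            seen.add(later_activity)
        # Count each follower at most once per occurrence of the activity.
        for later_activity in seen:
            followed_by[activity][later_activity] += 1
    return {a: {b: n / occurrences[a] for b, n in followers.items()}
            for a, followers in followed_by.items()}

# Hypothetical log with timestamps in minutes.
log = [(0, "edit"), (2, "simulate"), (3, "verify"), (10, "edit"), (11, "verify")]
print(windowed_follow_probabilities(log, window=5))
```

In this example `verify` follows `edit` within the window in both cases, even though `simulate` intervenes once. The exact order is lost, but the counts are less sparse than those of direct successors.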
When higher expressiveness in control structures is required we can consider
using the family of process mining techniques based on Petri nets implemented in
the ProM framework (van Dongen et al. 2005). This approach provides for
modeling patterns beyond the Markovian assumption, allowing logical structures
such as conjunctions, disjunctions, splits and joins. The compromise is
that these models do not operate probabilistically. An important benefit of Petri net-based
approaches, however, is that the models can also be transformed into extended Event-driven
Process Chain (eEPC) diagrams, which are more familiar to process analysts and more
amenable to comparison with formal process models.
All of the aforementioned models can be expressed with particular subsets of PSI
ontology terms. In that sense the PSI Suite of Ontologies provides the common
knowledge representation formalism for knowledge integration, fusion and
visualization.
10.2.3 Informal Process Knowledge Articulation and Sharing
The spiral of knowledge (Nonaka and Takeuchi 1995) introduces different knowledge
conversions, which are a fundamental part of sharing knowledge. People can
share tacit knowledge with each other (socialization), but this is a rather limited
form of sharing knowledge. Knowledge articulation within companies is the pro-
cess of making tacit knowledge explicit (externalization). This explicit knowledge
can be combined with other explicit knowledge (combination) and shared through-
out an organization. Other employees extend and reframe their tacit knowledge
with explicit knowledge by internalizing it (internalization). There are different
ways to articulate and share informal process knowledge, but in all cases the
informal process knowledge has to be made explicit. For instance process knowl-
edge can be visualized manually or with tool support. In contrast to the visualiza-
tion approach, the process knowledge can also be stored and shared within the
system by using it directly for recommendations (Dorn et al. 2010).
10.2.4 Articulation and Sharing using Visualization Approach
It is natural for a human to use visualized representations of artifacts in general and
of processes in particular. Research in psychology, human memory models, image
recognition and perception reveals that graphical representations are comprehended
much more easily and with less effort than equivalent textual ones (Crapo et al. 2000).
Process visualization is therefore one of the mature instruments for articulating
processes, enabling users to easily understand the logic of a process.
Most process visualization techniques are included in process modeling
activities, which can be centralized or decentralized. An abundance of modeling
methods and tools like ARIS (Scheer and Jost 2002) and IDEF3 (Mayer et al. 1995)
have been developed to ease the standardization, storage, and sharing of process
visualization. Unfortunately these tools are not sufficient for modeling collaborative,
decentralized processes. Therefore other approaches like CPM (Ryu and
Yücesan 2007) have been introduced.
In the area of knowledge processes additional methods and tools like KMDL
(Gronau et al. 2005), PROMOTE (Woitsch and Karagiannis 2005) and CommonKADS
(Schreiber et al. 1999) have been developed extending the methods and tools mentioned
above. In addition, semantic wikis combine the collaborative aspects of wikis (Leuf
and Cunningham 2001) with Semantic Web technology to enable large-scale and
inter-departmental collaboration on knowledge structures. Such features of semantic wikis
have been extended to support process development (Dengler et al. 2009), enterprise
modelling (Ghidini et al. 2009) and workflows (Dello et al. 2008).
Our contribution in process visualization is the enhancement of the existing
Semantic MediaWiki (SMW) process development approach (Dengler et al. 2009)
to visualize and discuss informal processes.
Our project navigation approach is based on offering a collaboration platform to
knowledge workers that facilitates socializing, externalizing and internalizing
design project knowledge using visualization. Visualization of project knowledge
helps to combine and internalize explicit project knowledge in a way that suggests
productive continuations of project execution.
10.3 The Approach to Project Knowledge Navigation
The goal of the presented case study is providing a software tool for design project
managers in MIC that will articulate and facilitate sharing knowledge about good
development practices in this domain.
An objective of a project manager as a knowledge worker is finding a reasonable
balance between the available and the achievable in order to meet the requirements
of a customer and accomplish development in his project with the highest possible
productivity. The complexity of this task in modern design environments is beyond
the analytical capabilities of even an experienced individual. A manager has to find
an optimum in a solution space that has many facets: product structure comprising
possibilities for block reuse; the compositions of the development team involving
required roles and capabilities of the available individuals; the choices of the tools
for performing design and corresponding design methodologies; the resources
available for the project; project constraints and business policies, etc. One more
complication may appear in the course of the execution of the project – the
circumstances may change because of external events. Hence a previously good
plan may turn out to be not acceptable for the follow-up. Re-planning may therefore
be required at any moment.
Project managers use their working experience and intuition for taking planning
decisions under these complex conditions. In fact they rely on following good
practices and using the suggested development methodologies that they used in
the past and which constitute their tacit working knowledge of project management.
Our working hypothesis in this research was that offering a software tool for:
• Eliciting good development practices as stable working patterns from the design
project logs
• Visualizing those practices of past projects, the plan and the state of the execu-
tion of a current project
• Facilitating moderated discussions among the members of the development team
on different aspects of a project
will decrease the complexity of making decisions for a project manager and
increase the robustness of his knowledge work. Such a tool would essentially
make the tacit knowledge of project managers within a company explicit – i.e.
articulate and facilitate sharing good project management and engineering design
practices.
For checking this hypothesis the software tool prototype of a Design Project
Visualizer has been developed in the case study. The tool implements a project
navigation metaphor – helping a knowledge worker find a productive execution
path through the state space of an engineering design project.
It is known for informal processes in general and the processes of engineering
design in particular that the paths to a desired outcome cannot be specified in
advance – before the process starts. Instead, a knowledge worker has to make his
decision about a follow-up action by choosing among the possible continuation
alternatives in an arbitrary process state – very similarly to the decisions made by a
driver on the road. Drivers use navigation systems that suggest the ways to go for
bypassing traffic jams, choosing a faster or a cheaper way. A similar approach is
employed in our work for helping a project manager to make decisions about the
continuations of his design project.
The Design Project Visualizer, like a car navigation system, provides the
visualized views of the basic “terrain” map. These views are product structures,
methodology flows that are either generic or bound to a particular product structure,
Work Breakdown Structures (WBS). These representations are essentially provided
by a project manager in a top-down fashion when he plans and kicks off the project.
The Design Project Visualizer also assists in finding out where the project is on
the “terrain” at a specific point in time. The knowledge about the execution of the
project is mined from the available project log datasets, transformed to the terms of
the used ontology, stored to the knowledgebase, and superimposed onto the project
execution plans.
Unlike a car navigation system, the Design Project Visualizer is a tool for team
work. It provides the infrastructure and the functionality for moderated discussions
attached to a visualized representation of any kind of project constituent. It thereby
facilitates making more informed decisions that are also more transparent to the
team members and are elaborated and approved with their active participation.
For constructing the necessary building blocks of the project maps and execution
tracks we first looked at the tasks of the project managers in their everyday work
10 Increasing Predictability and Sharing Tacit Knowledge in Electronic Design 195
and extracted the typical tasks of the project planning and execution management
that may be effectively facilitated. Those typical user tasks (Fig. 10.1) are
• Analyze the requirements and develop the structure of the product
• Choose development methodology and compose the team
• Develop the work breakdown structure
• Monitor the execution of the project
After extracting the typical tasks we defined the requirements for the
functionality of the software tool. The requirements were elaborated by looking at
the working practices of a project manager in MIC design and extracting the use
case scenarios.
10.4 Prototype Architecture and Implementation
The architecture of the fully functional prototype of the ACTIVE Design Project
Visualizer is pictured in Fig. 10.1. It comprises both the back-end and the front-end
components and involves several ACTIVE technologies. Design process knowl-
edge acquisition is done mainly at the back-end while the functions of knowledge
articulation and sharing are offered by the front-end. As shown in Fig. 10.1, the
prototype helps users to perform their typical tasks of design project management.
It can therefore be classified as a project management tool. The tool monitors the
environments of the managed engineering design project, that is, the design systems,
[Figure 10.1 (diagram). Labelled elements: Design Environment and Design Process;
DP Monitor and Data Collector; DP Execution Logs; AKWS Server with the Process Mining
and Extraction Component (Temporal Probabilistic Process Model); DP Execution
Instances; Cadence Knowledgebase (PSI); SMW Connector; Collaborative Platform (SMW)
with the DP Visualizer (Project Map, WBS, Project Execution Trace); Cadence Project
Management Tools (CFI Framework, ProjectNavigator); typical tasks (Develop Product
Structure, Select Methodology, Compose Development Team, Generate WBS, Track DP
Execution). The legend distinguishes ACTIVE, ACTIVated and Cadence components,
back-end and front-end components, static top-down knowledge, dynamic bottom-up
knowledge, and DP representation instances (top-down and bottom-up).]
Fig. 10.1 The configuration of ACTIVE and Cadence technology components for design project
knowledge acquisition, articulation and sharing
196 V. Ermolayev et al.
and allows for run-time extraction of the process-related knowledge in a bottom-up
fashion. The normative, methodological and static parts of project knowledge are
provided via the external tools in a top-down manner. The external tools are the
ProjectNavigator and the Cadence Flow Infrastructure (CFI) Framework (cf.
Fig. 10.1). Hence, the prototype exploits the superimposition of the top-down and
bottom-up project knowledge for making its articulation and sharing more efficient
and effective.
Acquisition is done by incremental collection of the new knowledge about the
executions of design processes through monitoring design processes in their
environments and mining the dataset containing design process execution logs –
using the ACTIVE Process Mining component based on the probabilistic temporal
process model (TNT) (Grobelnik et al. 2009). The approach to process mining is
based on the generation of the Hidden Markov Models (Rabiner 1990). As outlined
in Fig. 10.1 the ACTIVated Design System Framework tools monitor the design
environments and the design processes and collect the data about the low level
events in the respective datasets. The datasets are further fed to the Process Mining
Service of the ACTIVE Knowledge WorkSpace (AKWS) Server that produces the
instances of the segments of the executed design processes in terms of the PSI Suite
of Ontologies. These instances are further stored in the Cadence Knowledgebase.
Articulation and sharing are done by visualizing different facets of DP knowledge
in the collaborative front-end using the SMW (Krötzsch et al. 2007) as a
platform – the ACTIVE DP Visualizer. Visualization functionality is structured
around the typical tasks that DP managers perform in their everyday business
(upper part of Fig. 10.1). The kinds of visualization pages are those for: product
structures; generic methodologies; product-bound methodologies1; tools; and actor
roles. These primary functionalities are supported by decision making instruments
for conducting moderated discussions – the discussion component as an extension
of SMW and LiveNetLife (www.livenetlife.com), an application for contextualized
interactive real-time communication between the users of a web site.
10.4.1 Knowledge Representation
A challenge in the development of knowledge representation for the case study and
the software prototype of the Design Project Navigator was finding a proper balance
between:
1 A product-bound methodology is a superimposition of the segments of the generic methodologies
appropriate for the particular types of functional blocks in the structure of the product to be
designed.
• The background result, the PSI Suite of Ontologies for MIC Engineering Design
domain used at Cadence, and the model of a knowledge process (KPM)
developed in ACTIVE (Tilly 2010)
• The expressive power of the knowledge representation of the Cadence knowl-
edge base (PSI Ontologies) and the lightweight character of the enterprise
knowledge structures developed in ACTIVE caused by the lightweight character
of the SMW used for the prototype development
The first aspect was essentially a harmonization problem. For harmonizing the
KPM with the PSI Suite of Ontologies the PSI Upper Level ontology (PSI-ULO)
has been used as a semantic bridge (Ermolayev et al. 2010b). Please refer to
isrg.kit.znu.edu.ua/ontodocwiki/ for the online documentation of this suite of ontologies.
The harmonization process was bidirectional. On one hand the Suite of PSI
Ontologies has been refined by cleaning the representations of process patterns
and processes. This work led to fixing the v.2.3 release of the PSI Suite. On the
other hand the KPM has been revised by aligning it to the PSI-ULO. This work led
to the final release of the KPM (Tilly 2010).
The second problem was the selection of the minimal required part of the PSI
Core Ontologies v.2.3 as the lightweight background knowledge representation for
the Design Project Visualizer. To this end, the requirements derived from the analysis of the
typical user tasks and use cases were applied, which resulted in the development
of the micro ontology for the case study.
10.4.2 The Back-End: Process Mining and Extraction
The actual implemented workflow is as follows: first, the designer’s workstation is
instrumented with logging tools that capture his activities and their outcomes and
measure the time that was required to complete those activities. Once the data is
exported, the process mining software loads the sequence logs and constructs a
process model based on probabilistic deterministic finite automata.
The fitness for inclusion of a transition between two generic tasks in the process
is evaluated using the following procedure: given an error rate and sample size,
we use a statistical sequence mining technique to determine constraints for inclu-
sion of individual transitions in the process model as presented in Jacquemont et al.
(2009). We apply a criterion called proportion constraint. Given a desired risk
factor and an actual process execution log, we can compute the empirical probabil-
ity of every possible transition from one state to another. This can then be used as a
basis for determining whether every transition in the process is statistically signifi-
cant given the observed grounding in the process execution log. A benefit of using a
statistical approach is that the only parameter that the process analyst needs to
specify is the risk factor which corresponds to the expected false positive rate. This
parameter is easier to understand and specify than some arbitrary probability
threshold. We have found that pruning the model following this approach does
not affect the predictive power too adversely, while significantly reducing complexity.
Following that, the software outputs two sets of results:
• The process pattern model expressing process state transition patterns as Generic
Tasks and Generic Activities – action patterns. The model also specifies statistical
dependencies between individual action patterns as possible output and input
configurations.
• The actual design process instances in terms of Tasks, States and Activities to
which concrete Actor and Design Artifact representation instances are related.
An instance of the whole design process is considered a top-level Task managed
by the Actor. The top-level Task comprises the lower-level Tasks for each pass
of the design process. The passes are further decomposed into atomic steps that
are represented as leaf-level Tasks and Activities. After the execution of each of
these steps the resulting Design Artifact Representation becomes different
reflecting the fact that each step of the process brings the process closer to the
target final Representation – the tape-out of the Chip.
This output is the native input for the visualization and sharing infrastructure,
provided by the back-end functionality (SMW Connector, Fig. 10.1). At the same
time, this provides a common interface for the consumption of process models and
instances from other PSI-based design project and process management tools. From
the perspective of process instances the solution resembles to some extent the ProM
framework, which defines the MXML format for expressing event logs. However,
ProM does not prescribe any terms for expressing process models. We observe that
the use of the ontology, PSI Suite in our case, for expressing both process models
and process execution logs is preferable in terms of integration for the purpose of
informal process knowledge management.
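The transition-pruning step described earlier in this section can be illustrated with a minimal Python sketch. It is not the exact proportion constraint of Jacquemont et al. (2009). As a stand-in, it assumes a one-sided normal-approximation test on the share of each transition among all transitions observed in the log, and keeps a transition only if that share is significantly greater than zero at the chosen risk level. The function names, the trace format (lists of generic task names) and the toy log are hypothetical.

```python
from collections import Counter
from math import sqrt
from statistics import NormalDist


def count_transitions(traces):
    """Count how often each (source, target) pair occurs in consecutive steps."""
    counts = Counter()
    for trace in traces:
        for src, dst in zip(trace, trace[1:]):
            counts[(src, dst)] += 1
    return counts


def prune_transitions(traces, risk=0.05):
    """Keep transitions whose share of the log is significantly above zero.

    Assumption: a one-sided normal-approximation bound stands in for the
    proportion constraint; `risk` plays the role of the expected false
    positive rate and is the only parameter the analyst has to choose.
    """
    counts = count_transitions(traces)
    total = sum(counts.values())
    z = NormalDist().inv_cdf(1.0 - risk)  # one-sided critical value
    kept = {}
    for transition, k in counts.items():
        p = k / total
        lower_bound = p - z * sqrt(p * (1.0 - p) / total)
        if lower_bound > 0.0:
            kept[transition] = k
    return kept


# Toy log: twenty regular passes and one rare shortcut from A to C.
toy_log = [["A", "B", "C"]] * 20 + [["A", "C"]]
kept_transitions = prune_transitions(toy_log, risk=0.05)
print(sorted(kept_transitions))
```

With this toy log the single A-to-C shortcut fails the test and is dropped, while the two frequent transitions remain. A smaller risk value prunes more aggressively, which mirrors the trade-off between model complexity and predictive power mentioned above.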
Furthermore, the Miner tool allows more complex queries over the mined design
process steps, including queries on the time spent on particular steps,
iterations, etc. For instance, it can be seen which designers executed a high number
of activities (dcart in the example in Fig. 10.2) and which artifact required these
activities (AAMP in Fig. 10.2). The tabular interface allows browsing for more
details in the results of a query (Fig. 10.2). Other information, such as
CPU time or the time spent by the designers per artifact or per step, can also
be queried and visualized using the Miner interface. Checking the numbers
of errors logged in the design process, we observe that the median value was zero
and the average value was around 2,600. However, the maximum number of logged
errors was over 1.5 million per activity, while the sum was almost 13 million errors.
The tool allows tracking down the project that generated these errors and the
designer who was executing it. On closer inspection, it can be seen that these
errors occurred during one iteration of the design of NV_RAIL. Details about the
most frequent activities, as well as their sequence within that design
process, can be visualized. These visualizations may further be used for taking
remedial actions in a design process.
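A median of zero next to a much larger mean is typical of a heavily skewed distribution, in which a few activities account for nearly all logged errors. The following minimal sketch uses made-up records, not the case-study data, and hypothetical field names that are not the Miner schema. It shows how such summary values, and the project and designer behind the largest error count, could be pulled out of a flat activity log.

```python
from statistics import mean, median

# Made-up activity records; field names and values are illustrative only.
activities = [
    {"project": "P1", "designer": "d1", "activity": "a1", "errors": 0},
    {"project": "P1", "designer": "d1", "activity": "a2", "errors": 0},
    {"project": "P2", "designer": "d2", "activity": "a3", "errors": 0},
    {"project": "P2", "designer": "d2", "activity": "a4", "errors": 3},
    {"project": "P3", "designer": "d3", "activity": "a5", "errors": 9997},
]


def error_summary(records):
    """Summarize logged errors and locate the record with the most errors."""
    counts = [r["errors"] for r in records]
    worst = max(records, key=lambda r: r["errors"])
    return {
        "median": median(counts),
        "mean": mean(counts),
        "max": max(counts),
        "sum": sum(counts),
        "worst": (worst["project"], worst["designer"], worst["activity"]),
    }


summary = error_summary(activities)
print(summary)
```

In the toy data a single activity produces almost all errors, so the median stays at zero while the mean is pulled far upward, the same pattern as in the figures reported above.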
In the example dataset the most frequent activity turns out to be Extraction_PRHO.
This activity has normal accomplishments (there was no forced manual or automatic
abortion), on average takes about 30 min to complete (1,700,869 ms), and shows low CPU time
and a low number of errors/warnings (Fig. 10.3). However, the second most frequent
activity – SI_Analysis_Signoff – generally does not terminate normally, on
average produces more errors and warnings, and on average takes over 3 h to
complete (Fig. 10.4). Due to the abnormal exits the duration of this activity is often
not known (i.e. the end time is not logged). The abnormal exits are thus correlated with
unknown durations of activities.
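Because abnormal exits often leave the end time unlogged, an average duration can only be computed over the runs whose end time is known. The sketch below assumes a hypothetical record layout (activity name, start and end timestamps in milliseconds, with `None` for a missing end time) and invented values. For each activity it reports the number of runs, the share of runs with a known end time, and the mean duration in minutes.

```python
from collections import defaultdict

MS_PER_MINUTE = 60_000


def activity_stats(runs):
    """Aggregate runs per activity, ignoring missing end times for durations."""
    grouped = defaultdict(list)
    for name, start_ms, end_ms in runs:
        grouped[name].append((start_ms, end_ms))

    stats = {}
    for name, spans in grouped.items():
        # Only runs with a logged end time contribute a duration.
        durations = [end - start for start, end in spans if end is not None]
        mean_minutes = None
        if durations:
            mean_minutes = sum(durations) / len(durations) / MS_PER_MINUTE
        stats[name] = {
            "runs": len(spans),
            "known_share": len(durations) / len(spans),
            "mean_minutes": mean_minutes,
        }
    return stats


# Invented runs: (activity, start in ms, end in ms or None if not logged).
runs = [
    ("extract", 0, 1_800_000),
    ("extract", 0, 1_200_000),
    ("signoff", 0, 10_800_000),
    ("signoff", 0, None),
    ("signoff", 0, None),
]
stats = activity_stats(runs)
print(stats)
```

By the same conversion, the 1,700,869 ms reported for Extraction_PRHO corresponds to roughly 28.3 min. Reporting the share of runs with a known end time alongside the mean makes it visible when an average rests on only a few completed runs.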
Fig. 10.2 Designers, design artifacts and development activities mined from the U.S. dataset
Fig. 10.3 Details for the Extraction_PRHO activity
10.4.3 The Front-End: Process Visualization and Discussion
The front-end for process visualization and discussion of design projects, the
ACTIVE DP Visualizer, is based on SMW. It has been implemented by extending the
existing process visualization approach (Dengler et al. 2009) and by developing
additional result printers for SMW to visualize and export the WBS, namely the
Gantt chart and the XML export result printers that are part of the MediaWiki
Semantic Project Management extension
(www.mediawiki.org/wiki/Extension:Semantic_Project_Management). Screenshots of the
characteristic features of the DP Visualizer are shown in Fig. 10.5.
This extension builds on the capability to query for semantic properties, which is
provided by SMW, and displays query results as process graphs, Gantt charts or
XML files containing the WBS in an XML schema that can be imported into MS
Project. For the back-end, a software connector has been developed that imports the
knowledge stored in the Cadence Knowledgebase into the SMW pages (Fig. 10.1).
Each element of the micro-ontology is represented as a wiki page containing
annotated links to other wiki pages (properties). These properties are queried via an
inline query language provided by SMW and the result is rendered into the
destination format required by the different visualization libraries.
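As an illustration of this page-per-element representation, the following Python sketch generates SMW wikitext for one element and an inline `#ask` query over such pages. The `[[Property::Value]]` annotation and the `{{#ask: ...}}` form are standard SMW syntax. The category, property and page names are hypothetical and are not taken from the micro-ontology, and the result format is left as a parameter because the format names registered by the prototype's result printers are not given here.

```python
def element_page(category, properties):
    """Build wikitext for one element: a category tag plus annotated links.

    `properties` maps a property name to the list of pages it links to.
    """
    lines = [f"[[Category:{category}]]"]
    for prop, values in properties.items():
        for value in values:
            # An annotated link: the page links to `value` via property `prop`.
            lines.append(f"* {prop}: [[{prop}::{value}]]")
    return "\n".join(lines)


def ask_query(category, printouts, result_format):
    """Build an inline query that selects a category and prints properties."""
    parts = [f"[[Category:{category}]]"]
    parts += [f"?{p}" for p in printouts]
    parts.append(f"format={result_format}")
    return "{{#ask: " + " | ".join(parts) + " }}"


# Hypothetical element and property names, for illustration only.
page = element_page(
    "Task",
    {"Has predecessor": ["Floorplanning"], "Has actor": ["Designer A"]},
)
query = ask_query("Task", ["Has predecessor", "Has actor"], "table")
print(page)
print(query)
```

Keeping the result format as a parameter reflects the design described above: the same property query can be rendered as a table, a process graph or a Gantt chart, depending on which result printer is selected.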
For supporting collaboration the functionality for working with talk pages has
been developed as another SMW extension (Fig. 10.6). Talk pages corresponding to
visualized project elements can be created to discuss pros and cons of product
structures, methodologies, WBS and project execution progress. This collaborative
discussion functionality has been enhanced with semantics to add metadata to each
comment and allow querying. For this purpose, special wiki templates
(www.mediawiki.org/wiki/Help:Templates) and semantic forms
(www.mediawiki.org/wiki/Extension:Semantic_Forms) have been developed. A summary icon
is used to display the corresponding discussion activities, including the sum of pros and
cons (Fig. 10.5), within the product structure, methodology, and Gantt chart visualizations.
Fig. 10.4 Details for the SI_Analysis_Signoff activity
Fig. 10.5 Characteristic features of the Design Project Visualizer: (a) product structure
visualization; (b) generic development methodology visualization; (c) work breakdown structure
with superimposed execution status
10.5 Validation Setup and Results
The development of the fully functional prototype was conducted in an iterative
design process focussed strongly on user needs and on the organizational
requirements for the application. The chip design process is very demanding in
terms of adherence to detailed technical requirements and standards; consequently,
we had to focus strongly on testing detailed user and design processes. Different
types of tests were carried out throughout the development process. Each software
version was tested repeatedly in a sequence of tests, where as a general rule the
next level of test was carried out only after the lower-level test had returned a
satisfactory result, usually after a number of iterations:
• T1. Dry Runs and Technical Appropriateness: Experts and user representatives
who are familiar with the application context assess if the software is bug-free
and consistent with the requirements and suggest design improvements.
Fig. 10.6 Collaborative discussion functionality for managing invitations and working with
arguments
• T2. Usability in representative user tasks: User representatives and experts assess
the software in representative tasks (Sect. 10.3) according to quality-of-use
criteria and conformance to requirements.
• T3a. Information quality for users: Experienced users assess the quality of
information generated by the application.
• T3b. User satisfaction, acceptance: Representative samples of users use the
application in a realistic working context. User satisfaction and indicators for
the likely acceptance of the application are measured.
The validation results described here are part of T2 of the final application
prototype, with a main objective to assure that the software is satisfactory and
acceptable as a tool for the typical tasks of the prospective user population. The
likely effect on productivity will be considered, but this is the main question to be
answered in the subsequent T3 test.
10.5.1 Validation Plan
The validation trials were planned in two phases and are defined by combinations
of: (1) a software component; (2) a representative user task used as a frame for the
validation; (3) an external tool used as a benchmark for the assessment; (4) a source
dataset. The summary of the validation plan is given in Table 10.1.
Generic validation workflows have been developed for all kinds of validation
trials (example in Fig. 10.7). These workflows, though identical in their nature and
goals, differ in the validation metrics used, the instruments (kinds of
questionnaires to be filled in) and the collaboration patterns: some are for individual
execution while others are for a group of collaborating trialists.
Within the validation phases, a set of validation tasks has been specified based
on the typical user tasks. The generic workflows have been instantiated for each
validation task. For example, the task of validating usability (T2, Fig. 10.7) has been
decomposed into four lower-level tasks as pictured in Fig. 10.8. In turn, each of these
lower-level tasks has been performed using a different instantiation of the generic
validation workflow for T2, developed according to the validation scenario of the
particular lower-level task. One example is given in Fig. 10.9.
The summary of the front-end component validation plan at phase 1 is given in
Table 10.2.
The expert test persons either decided whether specific requirements were met
(yes, undecided, no, unable to answer question), or made a quality assessment on a
7-point scale. Each test person answered three separate on-line questionnaires with
75 questions in total.
10.5.2 Validation Procedure
The objective of the validation was to verify that the Design Project Visualizer
corresponds to the specified technical and functional requirements of users, and to
[Figure 10.7 (workflow diagram). Steps: Test Component in a Typical Task; Assess Usability
in Typical Tasks; Fill in the on-line Questionnaire (On-line Usability Questionnaire).
Roles: Trialist (Moderator); Several Trialists (Participants).]
Fig. 10.7 The collaborative generic workflow for usability (T2) validation trial
Table 10.1 Validation phases and types for different validated components

Columns: T1 = dry runs; T2 = usability for typical tasks; T3a = information quality
for users (completeness, correctness); T3b = user satisfaction, acceptance of solutions

Component    T1   T2   T3a completeness   T3a correctness   T3b

Phase 1: Validation based on the simulated data. Verification tool – ProjectNavigator
Comment: A trialist assesses the components by performing a typical task on a
simulated project.
Back end
  DPE        –    –    –                  –                 –
Front end
  SS         +    +    *                  *                 –
  PV         +    *    *                  *                 –
  DC         +    +    *                  *                 –
  LNL        +    +    +                  +                 –

Phase 2: Validation based on the U.S. dataset. Verification tool – CFI framework
Comment: A trialist assesses the components by performing a typical task on a real
project that has been accomplished and logged in the past.
Back end
  DPE        +    –    +                  +                 +
Front end
  SS         –    +    +                  +                 +
  PV         –    +    +                  +                 +
  DC         –    –    +                  +                 +
  LNL        –    –    +                  +                 +

DPE design process miner and instance extractor, SS semantic search component, PV design
project visualizer, DC discussion component, LNL LiveNetLife component, “–” not validated, “*”
partially validated because the data is simplified and artificial (simulated project), “+” validated
validate the quality of use of the prototype with focus on the appropriateness of the
functionality for the tasks of professional users.
The test was performed by six professional experts with different roles in
Cadence: Engineering director with profound expertise in design, verification and
implementation; project manager; design project manager; knowledge engineer.
The evaluation was based on a task scenario composed of four typical tasks with
several sub-tasks each (described above). The test users performed the tasks on their
own; the discussion task was carried out in cooperation with the other test users.
After the completion of each task an on-line checklist for testing conformance
[Figure 10.8 (hierarchy diagram). Labelled elements: Validation Phase 1 (Simulated Data);
(T1) Dry Run; (T2) Usability for Typical Tasks; (T3a) Info Quality for Users; (T3b) User
Satisfaction and Acceptance; (T4) Field Test. Lowest level: (T2.1) Usability for “Develop
Product Structure”; (T2.2) Usability for “Choose Development Methodology and form
Development Team”; (T2.3) Usability for “Generate Work Breakdown Structure”; (T2.4)
Usability for “Monitor Project Execution”.]
Fig. 10.8 The hierarchy of validation tasks for usability validation (T2) at Phase 1. The lowest
level corresponds to the typical user task based validation
[Figure 10.9 (workflow diagram). Steps: Test Project Visualizer; Test Discussion Component;
Test Semantic Search; Test LiveNetLife; Examine Product Structure in CDNS ProjectNavigator;
Assess Usability; Fill in the on-line Questionnaire (On-line Usability Questionnaire).
Roles: Trialist (Moderator); Several Trialists (Participants).]
Fig. 10.9 The instantiation of the generic validation workflow (T2) for the validation task T2.1
highlighted in Fig. 10.8
Table 10.2 Validated front-end components per typical user task

Front-end components: SS = semantic search; PV = project visualizer;
DC = discussion component; LNL = LiveNetLife

Typical user task                                           SS   PV   DC   LNL
Develop product structure                                   *    *    +    +
Choose development methodology and form development team    *    *    +    +
Develop work breakdown structure for the project            –    *    +    –
Monitor the execution of the project                        –    *    +    –
related to the task was completed. After executing the entire task scenario each
expert completed an on-line questionnaire with questions about the quality of use of
the prototype and the usefulness of the functionality for the task scenario. The three
questionnaires were composed of standard and proven scales, and questions related
to the specific functionality and context of the chip design process.
The critical question to be answered is whether the prototype is sufficiently
mature for further, full scale tests. The experts testing the prototype are representa-
tive of the decision makers who will decide the acceptance, and thus their decision
determines the organizational acceptance of the solution.
10.5.3 Validation Results
10.5.3.1 Conformance with Technical Requirements
In total, 6 senior experts participated in the test. The resulting sample size was
therefore too small for a statistical analysis. Hence, we discussed the results on a
case-by-case basis putting focus on the justifications and explanations given by
these experts in their assessments of the prototype. Overall, five of six professional
experts have accepted and approved the Quality of the Design Project Visualizer,
and four experts approved the Completeness of the Design Project Visualizer. The
following reservations were formulated:
One expert did not approve the Quality of the prototype because the software
was not able “to fully visualize the complex structures, interfaces and correlations
of Design Projects.” Two experts did not approve the Completeness of the proto-
type because of a “lack of flexibility, agility and completeness” of design project
visualizations.
Several experts were undecided about the quality and correctness of the Design
Project Visualizer because some specific elements (for example interfaces between
the functional blocks within a design artifact) while considered essential were not
included in the simulated input data. Therefore quality and correctness could not be
proven in this respect. For instance four experts were undecided whether the Gantt
chart representation of Work Breakdown Structures indicates the progress of the
project appropriately.
10.5.3.2 Conformance with Functional Requirements for the Discussion Component
The majority of experts (4 out of 5) agree that the functionality of the Discussion
Component meets their requirements. One expert was unable to subscribe to
discussions and thus did not exercise all of the functionality. Some experts are
undecided about the functionality for summary boxes because the summary boxes
contained summaries for simulated design project data only. This may be an issue
for further investigation, or at least a further test with real data.
10.5.3.3 Quality of Use of the Design Project Visualizer
The opinions about the visual presentations of product, generic methodology and
Work Breakdown Structure descriptions are divided, ranging from somewhat
positive to somewhat negative. All experts but one disagree or are undecided that
the visual presentations are appropriate for performing the typical tasks of the task
scenario.
The information quality of the visualizations was doubted: “. . . a visual repre-
sentation can only represent a part or a high level view of the overall process or
working patterns”. The visualization of working patterns, dependencies, roles,
tools, etc. are “too fragmented and difficult to connect for a non-savvy project
manager”. “. . . the WBS or Gantt representation cannot capture the full content and
properties which are needed to perform a design activity”.
To conclude, the experts have raised doubts about the quality of use of the
Design Project Visualizer.
10.5.3.4 Quality of Use of the LiveNetLife Component
The quality of use of the LiveNetLife component is judged positively. However,
LiveNetLife is competing with tools which are currently in use at Cadence.
LiveNetLife would have to demonstrate a differentiating benefit.
It was observed that LiveNetLife does not compute the similarity with other
users very reliably, a feature which might be an added value over competing tools.
10.5.3.5 Conclusions
The results of the validation of the Design Project Visualizer indicate that the
solution can provide expert assistance to design project managers performing
the typical tasks of project planning and execution management. According to the
professional experts, the Design Project Visualizer, as tested on the basis of
simulated data, meets the technical and functional
requirements. The critical statements of the experts relate to added value. This must be
proven by the information quality, which is the innovative feature enabled by
semantic backend processes.
The reservations may also be due to inconsistencies in the Knowledge Base. The
next test will be conducted with a real data set. Apart from checking the information
quality the objective of this test will be evaluating if the prototype can handle large
and complex projects (scalability) – another important issue which must be
investigated further using real data.
Overall, after repeated iterations the components of the prototype have achieved
a mature level. The functionality for discussions meets the requirements and is
judged to be satisfactory for users.
The quality of use of the prototype overall fulfills minimum usability requirements,
“though some features could have been implemented in a more functional and user
friendly way. The reason is the lightweight nature of the basic platform (SMW)”.
There is potential to improve semantic search, although users can cope with
the shortcomings of semantic search because the navigation and browsing in the
SMW works well.
The added value of an innovative application is important for its acceptance and
uptake. Diverse expert opinions about the added value of the Design Project
Visualizer can be explained by the different roles of the experts. Engineers prefer
to keep administrative work at a minimum level and therefore do not recognize the
added value of the tool directly. Project managers currently collect administrative
information manually (for example in project meetings). Automated data collection
and representation in Gantt charts would be an added value for this group of users.
The experts see the basic user functionality as acceptable but remain to be per-
suaded of the benefit of the new technology. Some experts asked how the Design
Project Visualizer will improve the productivity.
The prototype was compared with competing tools used at Cadence (e.g. the
visualization of Functional Blocks, re-used IPs, IPs libraries and interfaces are
already captured in existing Cadence tools). The experts fear that the integration
of a Design Project Visualizer into their work processes may cause additional
overhead (e.g. by having to ensure the consistency of several databases) instead
of increasing productivity. High upfront cost for individual users incurs a substantial
lag before benefits and added value are visible. Therefore users have to be con-
vinced that improved information quality will offset the upfront cost. The added
semantic backend-functionality should add significant value for users by providing
them with information with higher (pragmatic) information quality. This should
improve the cost/benefit ratio sufficiently to assure acceptance by the organization
and individual users at the workplace. Further tests will be conducted to collect data
on this issue.
10.6 Summary
The chapter presented the results of the case study of the ACTIVE integrated
project on the use of knowledge process learning, articulation and sharing
technologies for increasing the performance and decreasing the ramp-up efforts
of knowledge workers managing designs of Microelectronics and Integrated
Circuits. One of the most important characteristics of a design project in general
and in this domain in particular is a very low proportion of the use of predefined
workflows. Due to that the processes in engineering design are to a substantial
extent informal. Instead of following rigid working patterns, the knowledge
workers exploit their tacit knowledge and experience for finding the most produc-
tive way through the “terrain” of possible process continuations. Design product
structure and methodology knowledge is collected from the project manager and
the members of the development team in a top-down manner. Design process
execution knowledge is mined from process log datasets in a bottom-up fashion,
fused, superimposed on the top-down knowledge, and further used for visualizing
the design project plan and execution information in a way that suggests optimized
performance, points to the bottlenecks in executions, and fosters collaboration in
development teams. A project navigation paradigm has been developed that helps
knowledge workers more easily find their way to a reliable outcome. This approach
has been implemented in a software prototype – the Design Project Visualizer. The
first results of the validation of the software prototype indicate that the solution is
helpful in providing expert assistance to design project managers performing their
typical tasks of project planning and execution management. The total cost/benefit
improvement remains to be verified, taking into account both organizational
objectives and the fact that for some users, notably design engineers, the tools may
create additional overhead.
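The bottom-up step summarized above, mining execution knowledge from process logs, can be illustrated with a minimal sketch. It assumes a hypothetical log format (one list of activity names per design session) and simply counts first-order transitions between activities. It is not the mining method used by the Design Project Visualizer, and the activity names are invented for illustration.

```python
from collections import Counter, defaultdict


def transition_counts(traces):
    """Count how often activity b directly follows activity a across all traces."""
    counts = defaultdict(Counter)
    for trace in traces:
        for current, following in zip(trace, trace[1:]):
            counts[current][following] += 1
    return counts


def most_likely_next(counts, activity):
    """Return the most frequent successor of an activity, or None if none was seen."""
    successors = counts.get(activity)
    if not successors:
        return None
    return successors.most_common(1)[0][0]


# Hypothetical design-session logs (activity names are illustrative only).
traces = [
    ["specify", "layout", "simulate", "verify"],
    ["specify", "simulate", "layout", "simulate", "verify"],
    ["specify", "layout", "simulate", "layout", "simulate", "verify"],
]
counts = transition_counts(traces)
print(most_likely_next(counts, "layout"))
```

Counts of this kind could then be laid over a top-down project plan to show which continuations team members actually take most often.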
Acknowledgement The research leading to these results has been funded in part by the European
Union’s Seventh Framework Programme (FP7/2007-2013) under grant agreement IST-2007-
215040.
References
Andersson B, Bergholtz M, Edirisuriya A, Ilayperuma T, Johannesson P, Gordijn J, Gregoire B,
Schmitt M, Dubois E, Abels S, Hahn A, Wangler B, Weigand H (2006) Towards a reference
ontology for business models. In: ER’06: Proceedings of the 25th international conference on
conceptual modeling, Springer-Verlag, Berlin, Heidelberg
Bock C, Gr€uninger M (2005) PSL: a semantic domain for flow models. Softw Syst Model 4
(2):209–231
Crapo A W, Waisel L B, Wallace W A, Willemain T R (2000) Visualization and the process of
modeling: a cognitive-theoretic view. In: KDD’00: Proc. 6th ACM SIGKDD international
conference on knowledge discovery and data mining, ACM, New York
Dello K, Nixon L, Tolksdorf R (2008) Extending the Makna Semantic Wiki to support workflows.
In: Lange C, Schaffert S, Skaf-Molli H, V€olkel M (eds) Proc. 3rd Semantic Wiki Workshop,
CEUR-WS.org/Vol-360, ISSN 1613–0073, pp. 119–123, online
Dengler F, Lamparter S, Hefke M, Abecker A (2009) Collaborative process development using
Semantic MediaWiki. In: Proceedings of the 5th conference of professional knowledge
management, Solothurn, Switzerland
Dietz JLG (2006) Enterprise ontology: Theory and methodology. Springer Verlag, Berlin
Dorn C, Burkhart T, Werth D, Dustdar S (2010) Self-adjusting recommendations for people-
driven ad-hoc processes. In: Hull R, Mendling J, Tai S (eds) Business process management,
Lecture Notes in Computer Science. Springer-Verlag, Berlin, Heidelberg
Drucker PF (1969) The age of discontinuity: guidelines to our changing society. Heinemann,
London
210 V. Ermolayev et al.
Ermolayev V, Keberle N, Matzke W-E (2008) An upper level ontological model for engineering
design performance domain. In: ER’08: Proceedings of the 27th international conference on
conceptual modeling, Springer-Verlag, Berlin, Heidelberg
Ermolayev V, Ruiz C, Tilly M, Jentzsch E, Gomez-Perez JM (2010b) A context model for
knowledge workers. In: Proceedings of the 2nd international workshop on context, information
and ontologies (CIAO 2010), CEUR-WS.org/Vol-626, ISSN 1613-0073
Ghidini C, Kump B, Lindstaedt S, Mahbub N, Pammer V, Rospocher M, Serafini L (2009) Moki:
the enterprise modelling wiki. In: The semantic web: research and applications, Lecture Notes
in Computer Science. Springer, Berlin, Heidelberg
Grebner O, Ong E, Riss U, Brunzel M, Bernardi A, Roth-Berghofer T (2006) Task management
model. NEPOMUK project, deliverable D3.1, http://nepomuk.semanticdesktop.org/xwiki/bin/
view/Main1/D3-1, Accessed date 5 Aug 2011
Grobelnik M, Mladenic D, Ferlez J (2009) Probabilistic temporal process model for knowledge
processes: handling a stream of linked text. Conference on data mining and data warehouses
(SiKDD 2009), Ljubljana, Slovenia
Gronau N, M€uller C, Korf R (2005) KMDL – capturing, analysing and improving knowledge-
intensive business processes. J Univ Comput Sci 11(1):452–472. doi:10.3217/jucs-011-04-
0452
Gr€uninger M, Atefy K, Fox MS (2000) Ontologies to support process integration in enterprise
engineering. Comp Math Org Theor 6(4):381–394. doi:10.1023/A:1009610430261
Hepp M, Roman D (2007) An ontology framework for semantic business process management. In:
Oberweis A, Weinhardt C, Gimpel H, Koschmider A, Pankratius V, Schmizler B (eds):
eOrganisation: Service-, Prozess, Market-Engineering, 1, Universitaetsverlag Karlsruhe
Hingston P (2002) Using finite state automata for sequence mining. Aust Comput Sci Commun 24
(1):105–110. doi:10.1145/563857.563814
Jacquemont S, Jacquenet F, Sebban M (2009) Mining probabilistic automata: a statistical view of
sequential pattern mining. Mach Learn 75(1):91–127. doi:10.1007/s10994-008-5098-y
Kr€otzsch M, Vrandecic D, V€olkel M, Haller H, Studer R (2007) Semantic wikipedia. J Web
Semantics 5(4):251–261. doi:10.1016/j.websem.2007.09.001
Leuf B, Cunningham W (2001) The wiki way: collaboration and sharing on the internet. Addison-
Wesley, Upper Saddle River, NJ
Mayer R, Menzel C, Painter M, Witte PD, Blinn T, Perakath B (1995) Information integration for
concurrent engineering (IICE) IDEF3 process description capture method report, Knowledge
Based Systems Inc., College Station, Texas
Nonaka I, Takeuchi H (1995) The knowledge-creating company. Oxford University Press,
New York
Rabiner LR (1990) A tutorial on hidden Markov models and selected applications in speech
recognition. In: Readings in speech recognition. Morgan Kaufmann Publishers Inc., San
Francisco
Ryu K, Y€ucesan E (2007) CPM: a collaborative process modeling for cooperative manufacturers.
Adv Eng Inf 21(2):231–239. doi:10.1016/j.aei.2006.05.003
Scheer A-W, Jost W (2002) ARIS in der Praxis. Springer Verlag, Berlin
Schreiber G, Akkermans H, Anjewierden A, de Hoog R, Shadbolt N, van de Velde W, Wielinga B
(1999) Knowledge engineering and management: the CommonKADS methodology. MIT
Press, Cambridge, MA
Stajner T, Mladenic D (2010) Modeling knowledge worker activity. In: Proceedings of the
workshop on applications of pattern analysis, Cumberland Lodge
Stajner T, Mladenic D, Grobelnik M (2010) Exploring contexts and actions in knowledge
processes. In: Proceedings of the 2nd international workshop on context, information and
ontologies (CIAO 2010), CEUR-WS.org/Vol-626, ISSN 1613-0073
Tilly M (2010) Dynamic models for knowledge processes. Final models. ACTIVE project
deliverable D1.2.2, http://www.active-project.eu/publications/deliverables.html, Accessed
date 5 Aug 2011
10 Increasing Predictability and Sharing Tacit Knowledge in Electronic Design 211
Uschold M, King M, Moralee S, Zorgios Y (1998) The enterprise ontology. Knowl Eng Rev 13
(1):31–89. doi:10.1017/S0269888998001088
van der Aalst WMP, Song M (2004) Mining social networks: uncovering interaction patterns in
business processes. In: Proceedings of the international conference on business process man-
agement (BPM 2004), Lecture notes in computer science. Springer-Verlag, Berlin
van Dongen BF, de Medeiros AKA, Verbeek HMW, Weijters A, van der Aalst WMP (2005) The
ProM framework: a new era in process mining tool support. In: Application and theory of Petri
nets. Springer, Berlin, Heidelberg
Warren P, Kings N, Thurlow I, Davies J, B€urger T, Simperl E, Ruiz Moreno C, Gomez-Perez JM,
Ermolayev V, Ghani R, Tilly M, B€osser T, Imtiaz A (2009) Improving knowledge worker
productivity – the ACTIVE integrated approach. BT Technol J 26(2):165–176
Woitsch R, Karagiannis D (2005) Process oriented knowledge management: a service based
approach. J Univ Comput Sci 11(4):565–588. doi:10.3217/jucs-011-04-0565
212 V. Ermolayev et al.
Part IV
Complementary Activities
11
Some Market Trends for Knowledge Management Solutions
Jesus Contreras
11.1 Introduction
The networked economy is based on the ability of companies to transform knowledge
into value and profit from it. Most companies focus on
creating added-value products and services and optimizing all business processes to
be able to compete in a globalized market. There are three critical factors
companies are interested in:
• The ability to create value and differentiate themselves from competitors
• The ability to improve business processes
• The ability to speed up the time-to-market
Knowledge, as a collection of experiences, strategies and practices within an
organization, may positively influence all of these factors. The more valuable and
available the knowledge needed to perform, improve or even create business
processes, the more value is created for the organization. Traditionally, Knowledge
Management is the discipline in charge of ensuring the availability and quality
of internal knowledge in an organization. Today, with the recent penetration of web
and web 2.0 technology and paradigms into organizations, traditional knowledge
management is being re-focused as “Collaboration and Communication” or sometimes
“Enterprise 2.0”. These labels are not equivalent, but they often refer to
very similar solutions and often compete for the same budget.
Modern knowledge management, collaboration and communication or Enter-
prise 2.0 often includes some of the following competences:
J. Contreras (*)
iSOCO, Intelligent Software Components, S.A., Avenida Del Partenon, 16-18, 1º 7ª,
Madrid 28042, Spain
e-mail: [email protected]
P. Warren et al. (eds.), Context and Semantics for Knowledge Management, DOI 10.1007/978-3-642-19510-5_11, © Springer-Verlag Berlin Heidelberg 2011
– Group and Community Management
– Discussion and Blogs
– Resource Sharing (Documents, Experiences, News, Ideas, etc.)
– Search and Information Retrieval
– Collaborative Editing
– Profile and Role Management for Employees
– Incentives and Rewarding by Contribution
– Collaborative Project Management
– Business Process Enhancement with Knowledge and Collaboration Concepts
From the technology point of view, these competences rely on the following
functionalities:
• Search: allowing users to search for other users or content
• Links: grouping similar users or content together
• Authoring: including blogs and wikis
• Tags: allowing users to tag content
• Extensions: recommendations of users or content based on profile
• Signals: allowing people to subscribe to users or content with RSS feeds
11.2 Economic, Marketing and Research Challenges
In this section we introduce the economic, marketing and scientific challenges that
need to be overcome to permit further evolution of knowledge management solutions
into the mainstream market.
11.2.1 Economic Challenges
Economic conditions affect availability of capital, cost and demand. Economic and
market globalization drives the need for new, more dynamic modes of production in
the face of strong international competition. These factors are likely to favor the
adoption of social and collaboration solutions, as knowledge management in many
industries becomes more diverse and geographically distributed and competitiveness
is determined by productivity. This encourages the adoption of more open
and interoperable IT systems that run global business. Competitiveness based on
innovation and marketing/branding also provides opportunity for using collaborative
solutions, particularly for intelligent information management. The recession of
2008–2009 and the unprecedented speed of the collapse of the financial markets,
and then of the organisations that relied upon financial institutions’ credit, have
created a challenging moment for realising the potential of knowledge management.
This, however, will provide vendors with an even larger opportunity, and a big
challenge, to demonstrate a clear ROI and business case for deploying this new
generation of technologies.
• Cost Reduction: Cost reduction is a universal requirement in current market
conditions, including cost optimisation: not only lowering expenditures but also
providing the quality in services and products expected by end customers or
consumers while managing the financial and business risk. Cost reduction is
driven by the need:
∘ To achieve efficient performance in specific operational business processes,
∘ To achieve or improve a strategic competitive advantage, i.e. cost leadership,
∘ To manage financial distress and corporate recovery.
• Sustainable Innovation: Sustainable innovation for new market creation, market
share improvement and faster time-to-market is the result of an increasingly
internetworked economy, which dictates the need for interlinked and co-evolving
products and services in the market. This demands a constant flow of
innovation in a sustainable commercialisation process. Enterprises are under
pressure to boost innovation by adapting and responding to market conditions
and customer needs, which also are evolving at an ever increasing pace. This
drives the enterprise’s need to have access to the right information at the right
time so as to constantly fine-tune the product/service offering.
11.2.2 Marketing Challenges
According to some experts, there are several challenges that need to be addressed by
any knowledge management solution, depending on the phase of the project:
• Decision phase: When a knowledge management solution is presented to the
organization’s decision-makers, there are two challenges:
∘ Cultural change: The resistance to change is one of the strongest barriers in
the decision process. Knowledge management may change the usual way of
working and may require an additional effort from the employees to maintain
the system with no immediate return.
∘ Unclear ROI calculation: Since there is no clear method for calculating return on
investment (as in the knowledge management market or for any other intangible
value), it is difficult to argue with financial benefit figures when facing a
potential buyer. At least the total cost of ownership needs to be calculated in
order to ensure full support from the managerial structure. Some recent work¹ on
process management has identified key performance indicators for KM, allowing for
clear measurement and trend detection. Introducing a measurement methodology,
possibly linked with business indicators (e.g., a balanced scorecard), may
help overcome the ROI calculation weakness.
¹ Patrick Lambe: “How to Use Key Performance Indicators in KM Initiatives” 2007
(www.stratsknowledge.com)
• Usage and maintenance phase: When a knowledge management solution is
already in place, there are a number of remaining challenges:
∘ Critical mass achievement: Knowledge management may fail before achieving
a critical mass of users. Despite good initial interest, knowledge management
may never jump from a single department or small initial group to the whole
company. Consultancy for monitoring and corrective actions may help
avoid the premature death of the solution.
∘ Focus on tools rather than on the problem: Knowledge management
tools are introduced with no clear causal relation to any concrete problem
or business process. The purpose is vague and employees have disparate
expectations.
∘ Lack of support from managers (lack of resources): Since knowledge
management adoption needs time and dedication from the employees, it is
crucial to have direction and managerial support. Activities such as training,
maintenance, moderation or incentives are critical during the whole life cycle
of the enterprise 2.0 solution. It is also necessary to include different
departments into the whole decision and adoption process: IT, HR, Marketing
and Finance.
∘ Infrastructure change: IT integration used to be an important inhibitor
due to the existence of legacy systems.
11.2.3 Research Challenges
There are several technology and research challenges, the solution to which may
accelerate the adoption of knowledge management solutions into corporations.
Traditional technology bottlenecks are usually related to the knowledge acquisition
phase, where traditional solutions fail due to low quality of information gathered
and limited user contribution.
The challenge may arise where traditional technology is not able to automate
the knowledge acquisition as could be the case for multilingual and multimedia
sources:
• Multilingualism is an upcoming challenge for many suppliers. Typically, current
solutions are applied either to English only or to a limited number of home
market languages. Multilingual support is expected to come from
language-specific modular add-ons to existing products, i.e. parsers and semantic
analysis for specific languages.
• Spoken-language understanding (or generation) as well as machine translation
are not in the focus of current or near-term planning.
• Video: visual support has become key to knowledge management applications.
Story-telling, lessons learnt and best practices are increasingly stored
in audio-visual form, improving the end-user consumption experience. Videos
need to be analyzed automatically and included in the overall knowledge
life cycle.
In addition to the traditional knowledge management bottleneck, knowledge
representation, storage, manipulation and delivery could also represent a technol-
ogy challenge, especially for large industrial adoption of these kinds of solutions.
The definition of a common and standard data interchange language for knowledge
representation across industries is still not satisfactory to enable interoperation
across a broad range of sources. A considerable effort has been made in the last
5 years to achieve syntactical standards for the representation of semantics; i.e. for
the definition of languages useful for the description of data content, contextual
information, etc. But standards are still missing for expressing business
vocabulary for specific sectors (e.g., how to express market intelligence data,
personal skills or product descriptions in a specific sector).
With semantic technology, knowledge is represented formally
at a high conceptual level, and acquisition is done in a seamless way with
minimal user involvement. For instance, a semantically enabled idea management
portal can automatically identify similar ideas (even when they do not share common
words or expressions) and suggest that their authors collaborate on a joint
initiative. In this case semantic technology reduces workload by providing
fewer ideas to check, with a higher degree of collaboration and peer review.
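The idea-matching example can be made concrete with a small sketch. It assumes a hypothetical hand-written term-to-concept table standing in for an ontology or thesaurus, and compares two idea descriptions by the overlap of their concepts rather than their words. The table, the function names and the sample ideas are all illustrative and not taken from any particular product.

```python
# Hypothetical mini-thesaurus mapping surface terms to shared concepts.
# In a real system this role would be played by an ontology or thesaurus.
TERM_TO_CONCEPT = {
    "car": "vehicle", "automobile": "vehicle",
    "cheap": "low_cost", "affordable": "low_cost", "inexpensive": "low_cost",
    "rent": "rental", "hire": "rental", "leasing": "rental",
}


def concepts(text):
    """Map each known word of a text to its concept; unknown words are ignored."""
    words = text.lower().split()
    return {TERM_TO_CONCEPT[w] for w in words if w in TERM_TO_CONCEPT}


def concept_similarity(idea_a, idea_b):
    """Jaccard overlap of the concept sets of two idea descriptions."""
    a, b = concepts(idea_a), concepts(idea_b)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


# Two ideas that share no words but describe the same proposal.
idea_1 = "affordable automobile leasing"
idea_2 = "cheap car hire"
print(concept_similarity(idea_1, idea_2))
```

A portal could flag pairs whose score exceeds some threshold and propose them to their authors as candidates for a joint initiative. The threshold would have to be tuned for each deployment.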
Knowledge management efforts may seem useless if there is no efficient
consumption or exploitation phase in the knowledge life cycle. The possible return
on the investment is highly dependent on the way the knowledge is delivered,
to whom and under what circumstances. That is why the notion of context gains
importance when talking about knowledge delivery. For instance, the employee’s
context (location, type of device, type of ongoing task or process, type of
knowledge, social context, time dependency, etc.) comprises variables that need
to be taken into account for precise delivery.
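As a minimal sketch of context-dependent delivery, the code below ranks knowledge items against a snapshot of the employee's context. The `Context` and `KnowledgeItem` types, the two context variables used (device and task) and the scoring rule are assumptions made for illustration. The other variables named in the text (location, social context, time) could be added in the same way.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Context:
    """Hypothetical snapshot of an employee's working context."""
    device: str
    task: str


@dataclass(frozen=True)
class KnowledgeItem:
    title: str
    tasks: frozenset    # tasks the item is relevant to
    devices: frozenset  # devices the item can be consumed on


def relevance(item, ctx):
    """Score an item: 0 if unusable on the device, else 1 plus 1 for a task match."""
    if ctx.device not in item.devices:
        return 0
    return 1 + (1 if ctx.task in item.tasks else 0)


def deliver(items, ctx, limit=3):
    """Return the titles of the most relevant usable items for this context."""
    usable = [item for item in items if relevance(item, ctx) > 0]
    ranked = sorted(usable, key=lambda item: relevance(item, ctx), reverse=True)
    return [item.title for item in ranked[:limit]]


# Illustrative items and context (all values are invented).
items = [
    KnowledgeItem("Lessons-learnt video", frozenset({"project review"}), frozenset({"desktop"})),
    KnowledgeItem("Bid checklist", frozenset({"proposal writing"}), frozenset({"desktop", "phone"})),
    KnowledgeItem("Travel policy", frozenset({"travel booking"}), frozenset({"desktop", "phone"})),
]
ctx = Context(device="phone", task="proposal writing")
print(deliver(items, ctx))
```

Here the device acts as a hard filter and the task only affects ranking. A real system would more likely learn such weights from usage data than fix them by hand.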
11.3 Potential Market
Nowadays productivity and efficiency needs are driving technology
evolution within companies. Among other concepts, collaboration, innovation and
communication are key functionalities demanded by potential customers.
The traditional knowledge management market has been extended into a new
market for communication, collaboration and social networking solutions for
the enterprise. This new market definition for knowledge management is still
a heterogeneous concept, and includes solutions for productivity and collaboration
support, sometimes called Enterprise 2.0 or Social Software in the Workplace
(Drakos et al. 2009). According to Forrester Research forecasts (Young 2008)
made before the last financial downturn, the potential market may reach US$4.6
billion by 2013, a similar size to the current Content Management Systems
(CMS) market. Despite the downturn, the worldwide market
for social software was estimated to grow about 15% from 2009 to 2010, and
another 16% from 2010 to 2011, reaching US$769 million in 2011 (Gartner 2010).
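The growth figures quoted above imply approximate market sizes for the earlier years. The sketch below works backwards from the 2011 value, assuming that the two percentages are year-over-year growth rates applied to the whole market. The derived 2009 and 2010 values are therefore estimates, not figures from the cited report.

```python
def back_calculate(final_value, growth_rates):
    """Work backwards from a final value through successive year-over-year growth rates.

    growth_rates are listed oldest first; the returned values are also oldest first.
    """
    values = [final_value]
    for rate in reversed(growth_rates):
        values.append(values[-1] / (1.0 + rate))
    return list(reversed(values))


# Figures quoted in the text: +15% (2009 to 2010), +16% (2010 to 2011),
# US$769 million in 2011. Values are in millions of US dollars.
market = back_calculate(769.0, [0.15, 0.16])
for year, value in zip((2009, 2010, 2011), market):
    print(year, round(value, 1))
```

Under these assumptions the market would have been roughly US$576 million in 2009 and US$663 million in 2010.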
Knowledge management solutions as discussed in this book provide technology
for better knowledge sharing and personnel performance by enhancing traditional
applications to become easier to use, predictive and intelligent. Despite
this possible technology differentiation, these solutions will be competing for
the same budget as any traditional knowledge management or enterprise 2.0 provider,
since they aim to solve similar problems. Looking in detail at the tools that usually
make up knowledge management or collaboration and communication solutions, and
according to the Forrester Research report, there is clear growth potential for
corporate social networking, as seen in Fig. 11.1.
In addition to the strategic forecast, the same report records the buying intentions
expressed by potential customers and users. When asked about the buying
decision, more than 50% of companies with over 1,000 employees answered that they are
currently buying or considering enterprise 2.0 solutions (see Fig. 11.2).
11.3.1 Customer Needs and Perceptions
Potential customers understand that the usage of collaboration tools will help the
corporations with knowledge sharing and cost reduction (effort and time) for some
critical processes. Figure 11.3 shows the result of a survey on buying decisions.
Even if team-work and knowledge-sharing are the most popular drivers for
collaboration solution adoption, the overall lack of understanding is one of the most
frequently mentioned inhibitors, as shown in Fig. 11.4. Value perception (understanding,
priority, cost and return on investment) together with corporate cultural barriers
seem to be the main inhibitors for the adoption of knowledge management solutions.
Fig. 11.1 Sales forecast, made before the worldwide crisis, of social software features (data taken
from the Forrester report, Young 2008)
Fig. 11.2 Buying decision according to company size (Source: Young 2008)
Fig. 11.3 Drivers for collaboration tools (Source: Miles 2009)
Fig. 11.4 Inhibitors for collaboration tools (Source: Miles 2009)
Very similar to traditional knowledge management solutions, enterprise 2.0
encounters difficulties presenting compelling selling arguments. All functionalities
are ‘nice to have’ instead of ‘must have’ and there is no clear way to perceive
value. On the one hand, drivers that would reinforce a possible decision to buy are
related to the potential efficiency improvement on knowledge-intensive tasks.
On the other hand, new paradigms of corporate value have arisen which
include knowledge management and collaboration as part of business indicators.
Despite the fact that these features are very difficult to measure, some indicators are
visible in companies and sectors where knowledge-intensive tasks are part of the
critical mission and where knowledge retention is crucial to cost control.
The current opportunity lies in an apparent contradiction. On the
one hand companies, due to global competition and the ongoing crisis, urgently need
to boost the productivity of knowledge workers, and knowledge management intuitively
may help. On the other hand, any buying decision will need a clear
business case with an explicit ROI calculation, which is very difficult to perform for
such intangible assets as corporate knowledge. Tools and services able to link
collaboration, knowledge management and innovation with financial and market
indicators will find fewer inhibitors during the buying process.
11.3.2 Supply Side
Gartner’s overview of the supply side (the well-known Magic Quadrant) shows the clear
leadership of three companies, Microsoft, Jive and IBM, which offer complete
solutions for internal social and collaboration management. After some years the
market is becoming more mature: there are fewer players in the niche players and
challengers sections, and positions in the leaders’ area are stable.
In the ACTIVE R&D project, researchers have performed a closer analysis of 40
of the providers that operate in the knowledge management, collaboration and
enterprise 2.0 market. (The complete list is shown in Table 11.1.)
Table 11.1 List of providers analyzed
Alcatel-Lucent   Ektron         Leverage Software     Realcom
Atlassian        Emantix        Liferay               Saba
Blogtronix       EMC            Liquid Planner        Siteforum
blueKiwi         EpiServer      Microsoft             SmartPoint
BlueNog          eTouchSystems  MindTouch             SocialText
Box              FatWire        Mzinga                Telligent
CentralDesktop   Google         Neighborhood America  ThoughtFarmer
ConnectBean      Huddle         Novell                Tomoye
CubeTree         IBM            Open Text             Traction Software
Customer Vision  iGLOO          Oracle                Twiki
The purpose of this overview was to identify some possible ongoing strategies
that these companies use to reach the market. A quick look at the names shows that
there is a set of companies offering complete solutions for knowledge management
and collaboration, covering most of the functionalities listed in the introduction
of this chapter (IBM, Microsoft, Oracle, FatWire, etc.), and another set of
companies that offer a single piece or specialized functionality (CubeTree, Liferay,
Atlassian, etc.), covering just one or a few functionalities.
• Complete solutions versus niche player:
∘ Big players turn their traditional products into social software for enterprises.
That is the case for IBM, Novell, Oracle, Microsoft and EMC.
∘ Small players are mainly offering SaaS solutions with typical functionalities:
blogs, wikis, file sharing, RSS, community and group management, instant
messaging. These have become a commodity and the competition is on price
(from US$5 per user per month).
∘ Some companies focus on vertical, niche markets, such as health, software
development and project management or e-government.
∘ Integration is a key feature for small players.
∘ Some companies offer complete solutions on their own or in cooperation
(e.g., Alcatel-Lucent in partnership with blueKiwi).
From the feature point of view the findings are the following:
• Features:
∘ Only a few solutions can interoperate with business processes.
∘ No solutions mention ‘informal knowledge processes’ or ‘predictive
behavior’.
∘ Only a few solutions (Huddle, SmartPoint) work within the desktop. The
norm is to offer a platform or web solution.
∘ Mobile access is becoming a commodity.
∘ Open source platforms (Liferay, Drupal) are offering some basic functiona-
lities for free.
11.3.3 Marketing and Pitch
Like any technology-based solution, knowledge-based solutions need to tailor
the message to different decision makers:
• Final user: People who will use solutions based on knowledge management and
related technologies. At the current stage of internal social and collaboration
management software, final users are usually considered a positive driver for
adopting these kinds of solutions (see Fig. 11.5). Arguments about productivity,
reduced effort and ease of use will help in any commercial approach.
• IT managers: These are responsible for integration and maintenance of the
solution and need to ensure a smooth integration into their existing architecture
with controlled change management and maintenance. IT departments usually
are ahead in new technology adoption for their internal purposes (see Fig. 11.6)
and that may serve as a beachhead for further extension within the company.
• Business or financial buyer: Such managers, e.g., CEO, CFO, Head of HR, etc.,
are responsible for the strategic aspects of their business, evaluating the cultural
and cost/benefit impacts of the solution. Arguments about productivity measures
and growth, ROI, innovation culture and employee satisfaction are a good
option for a commercial approach.
11.3.4 Unique Selling Proposition
Collaboration-based, semantically enhanced and context-aware
solutions differentiate themselves from traditional knowledge management and
collaboration tools through the use of emerging technologies that are able to handle
behavior in a more intelligent and user-friendly way.
• Informal processes: The integration of knowledge management tools into
corporate business processes has become a strong requirement to ensure the
success of any initiative. The pre-existence of well defined and identified
business processes is critical to fulfill this requirement. Only a few organizations
are mature enough, from the management point of view, to provide flexible and
useful process descriptions to permit the integration of knowledge management
tools and/or measures. Semantic and context technology allows both the auto-
matic identification and discovery of personalized processes and the provision of
added value on top of them.
Fig. 11.5 Drivers by user group (Data taken from Miles 2009). Bar chart over the categories Users, IT Managers, CIO/CTO, Senior/Executive Business Managers, Mid-Level Business Managers and CEO; horizontal axis from 0% to 35%.
• Structured and unstructured data integration: A lot of knowledge is stored in
structured and unstructured sources within the organization’s information
systems. The use of semantic technology allows for conceptual discovery
and integration of knowledge across the whole organization, fostering collaboration
and value creation.
• Privacy: The typical KM scenario is collaborative, decentralized and heteroge-
neous, where knowledge is defined, used and shared across different groups and
domains. Articulating enterprise knowledge in the form of knowledge processes
allows users to create social relationships and form working and interest-based
groups beyond their own domain and organization (Ruiz-Moreno et al. 2009).
• Incentives: Cooperative behavior is generally not easily rewarded with material
rewards. An obstacle is that the definition of fair and effective incentives
requires the availability of transparent measures of performance. The organizational
context (enterprise strategy, organizational structure, and processes)
determines many aspects of human resource management principles (and thus
available incentives), cooperation, and knowledge management principles
(which may be based on codified or personalized knowledge). In order to define
effective incentive systems, these organizational issues must be analyzed and
taken as constraints. A number of factors, such as career situation, experience,
personality, and others are likely to affect individual cooperative behavior,
and may not be easily affected by incentives (Bösser 2009).
• Cost-benefit information: This can influence decisions in many stages of the
life cycle of physical or digital products. Besides that, large efforts are spent on
collaborative knowledge engineering projects, yet no objective judgment is
possible as to whether these processes are cost-effective or whether certain actions
are beneficial to the whole process or beyond.
Fig. 11.6 Departments demanding potential KM and collaboration solution (Data taken from Miles 2009). Bar chart over the categories IT, Marketing, Other Operations, Training, Customer Support, R&D, Sales, Human Resources, Admin, Finance, Legal, All Departments and No Department; horizontal axis from 0% to 80%.
11.3.5 Pricing
Since the value of knowledge management, social or collaboration tools is not
always fully perceived, but the topic is still on the radar of potential customers,
there are some niche or small providers that are achieving good market share figures
adopting a low-cost strategy. Some providers (mainly small and medium companies)
offer their solutions from about US$5 per user per month, with special plans for
large companies.
The use of SaaS (Software as a Service) has permitted pay-per-use business
models with almost no risk for the buyer. The basic functionalities of enterprise 2.0,
like wikis and blogs, are adopted by corporations as commodities, with no actual
interaction with corporate information systems and no perceived differentiation of
value. This is not a mainstream trend but could be an excellent opportunity for niche
players that are not willing or able to integrate their solution with big players.
11.4 Conclusions
The potential market for knowledge management and collaboration solutions is
growing, especially thanks to the expansion of the Web 2.0 culture. Collaboration
and social software is penetrating large corporations even if there are still
difficulties at the corporate level in perceiving the added value of knowledge
management. In a short time horizon there is a need to relate knowledge manage-
ment, collaboration and social solutions with business indicators such as productiv-
ity, innovation or customer satisfaction in order to counteract the effects of the
current recession on the IT market. There is also a need for compelling success stories
to convince decision takers.
A few big players are leading the offerings since they include knowledge
management features over well established platforms with an extensive customer
base.
There are several strategies to reach the mainstream market:
• Build on top of existing platforms offering value added features.
• Become a niche player with specific applications or in a specific sector or area.
• Replace existing platforms with alternatives (other desktop platform, web
applications, SaaS model, mobile devices, etc.)
To become a successful player there are some technological or methodological
challenges any provider will need to address:
• Privacy is becoming a critical issue in any social or collaboration application.
Underestimation of this topic may lead to the complete failure of any initiative.
• Multimedia and multilingualism are already basic requirements for potential
customers.
• Methodology and professional services around deployment, business process
management, incentive policies and measurements will allow for transformational
projects in large organizations.
In the mid or long-term timeframe, the forthcoming change of organizations and
companies to more open, geographically distributed, collaborative and productivity
based structures is a great opportunity for IT suppliers to position themselves
around this change and help companies to adapt their workplace and culture
to the new reality.
References
Bösser T (2009) ACTIVE D4.3.1 “Analysis of incentives in knowledge management and Web2.0
applications” http://www.active-project.eu. Accessed 16 Jan 2011
Drakos N, Rozwell C, Bradley A, Mann J (2009) Magic quadrant for social software in the
workplace, 22 Oct 2009, Gartner RAS Core Research Note G00171792
Gartner (2010) Gartner says worldwide enterprise social software revenue to surpass $769 Million
in 2011, Gartner press release. http://www.gartner.com/it/page.jsp?id=1497215. Accessed
16 Jan 2011
Miles D (2009) Collaboration and Enterprise 2009, AIIM Industry Watch. http://www.emc.com/
collateral/analyst-reports/aiim-emc-collaboration. Accessed 26 June 2009
Ruiz-Moreno C, Gomez-Perez J, Contreras J (2009) ACTIVE deliverable 3.4.1 security in
knowledge processes: http://www.active-project.eu. Accessed 16 Jan 2011
Young G (2008) Global enterprise web 2.0 market forecast: 2007 to 2013. Forrester Research
12
Applications of Semantic Wikis
12.1 Business Applications with SMW+, a Semantic Enterprise Wiki
Michael Erdmann, Daniel Hansch
12.1.1 Introduction
MediaWiki (http://www.mediawiki.org/wiki/MediaWiki) is the technical basis of
many wikis, including the online encyclopedia “Wikipedia”. The free software is
a web-based content management system that supports easy linking of articles,
which every user can read and edit. Wikis, which are built upon the MediaWiki
software, establish an asynchronous web-based communication and collaboration
platform. Their users become communities, make information available quickly and
jointly develop and share information. Operators of such wikis benefit from easy
installation procedures, low cost of operation, the robustness of the system and from
low training requirements for its users. Wikis are thus a flexible tool for collaboration
on the Web. However, wikis are not designed for entering, managing or making
structured data available. For example, compare the population of London specified
in the German Wikipedia article “List of largest cities in the EU” with the figure
given in the “List of megacities”. You will notice different values. It is also
impossible to get a list of all cities whose population ranges between two and
three million using standard MediaWiki. Such lists must be researched, created
and maintained in a manual, i.e. time-consuming way. Existing data values cannot
be explicitly marked-up as such and the unstructured textual information is not
machine-readable.
The above mentioned shortcomings are addressed by Semantic MediaWiki,1 the
semantic extension to MediaWiki described in Chap. 3. Users can assign categories
as well as properties to wiki articles. To stay with the example above, “London” is
a “City” and has a specific population number. In addition, explicit relationships
between wiki articles can be created. For instance, “London” can be related to its
country “England” via a “located in” relationship. The now explicit data can be
accessed in all wiki pages, e.g. to generate dynamic lists. Semantic MediaWiki
provides a special language to formulate so-called inline queries, which can automatically
create lists, e.g. the “List of megacities”. Users can make use of these
semantic features by utilizing special wiki markup, which requires them to learn the
query and annotation syntax. This makes the adoption of Semantic MediaWiki in
commercial settings, where usability is a success factor, hard to achieve.
1 http://semantic-mediawiki.org, cf. (Krötzsch et al. 2007)
P. Warren et al. (eds.), Context and Semantics for Knowledge Management, DOI 10.1007/978-3-642-19510-5_12, © Springer-Verlag Berlin Heidelberg 2011
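The annotation and query mechanism described above can be illustrated with a minimal Python sketch. It assumes the `[[Property::value]]` annotation markup of Semantic MediaWiki; the property names (`Population`, `Located in`), the page texts and the population figures are invented for illustration, and the range filter only mimics what an inline query would return. It is not the wiki's query engine.

```python
import re

# Hypothetical wiki pages; the [[Property::value]] markup follows Semantic
# MediaWiki, while property names and figures are purely illustrative.
PAGES = {
    "London": "London is a [[Category:City]] in [[Located in::England]] "
              "with [[Population::8000000]] inhabitants.",
    "Paris": "Paris is a [[Category:City]] in [[Located in::France]] "
             "with [[Population::2200000]] inhabitants.",
    "Vienna": "Vienna is a [[Category:City]] in [[Located in::Austria]] "
              "with [[Population::1700000]] inhabitants.",
}

ANNOTATION = re.compile(r"\[\[([^\]:]+)::([^\]]+)\]\]")
CATEGORY = re.compile(r"\[\[Category:([^\]]+)\]\]")


def parse_page(text):
    """Extract the categories and property values from annotated wikitext."""
    properties = {name.strip(): value.strip()
                  for name, value in ANNOTATION.findall(text)}
    categories = set(CATEGORY.findall(text))
    return categories, properties


def cities_with_population_between(pages, low, high):
    """Emulate a dynamic list: cities whose population lies in [low, high]."""
    hits = []
    for title, text in pages.items():
        categories, properties = parse_page(text)
        if "City" not in categories or "Population" not in properties:
            continue
        if low <= int(properties["Population"]) <= high:
            hits.append(title)
    return sorted(hits)


print(cities_with_population_between(PAGES, 2_000_000, 3_000_000))
```

In Semantic MediaWiki itself, a comparable list would come from an inline query, roughly of the form `{{#ask: [[Category:City]] [[Population::>2000000]] [[Population::<3000000]] }}`; the exact property names depend on the ontology of the wiki in question.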
12.1.1.1 SMW+ Addresses Enterprise-Level Requirements
SMW+, a professionally developed software product from ontoprise,2 includes
Semantic MediaWiki and addresses the requirements posed in corporate environments:
usability, reliability in daily operation, scalability, expressivity, interoperability
and professional services. Usability enhancements (Pfisterer et al. 2008) to
Semantic MediaWiki, which were developed by ontoprise within project Halo,3
support casual users efficiently in applying the semantic features in their daily
work. Scenarios, which require access to external data sources or the evaluation
of business rules for query answering are supported by the ontoprise product
TripleStoreConnector (TSC) which is realized on top of OntoBroker, ontoprise’s
industry-grade reasoning engine. SMW+ distinguishes itself from other enterprise
wikis regarding the extent to which knowledge can be formalized and evaluated
in the wiki. SMW+ integrates well with the existing tool-landscape of an enterprise.
For instance, users can include rich text from Microsoft Office documents in the
wiki by copying it into the WYSIWYG editor. Connectors to Microsoft Word,
Microsoft Excel and Microsoft Project make annotated data in the wiki available in
these tools. Data from heterogeneous data silos in the enterprise is made available
to wiki users by the semantic data integration features. This data can also be used
to generate reports and visualize data. By entering commercial support contracts
with ontoprise, businesses receive immediate and professional support, for example
to adapt the wiki to their specific needs.
In this chapter we will report on the experiences we and our customers gained
by using SMW+. We will first describe tasks which can be supported by SMW+
and then present actual use cases that demonstrate the versatility of SMW+.
2 Product web site for SMW+ and TSC: http://wiki.ontoprise.com/
3 Project Halo is funded by Vulcan Inc., http://www.projecthalo.com
12.1.2 Achieving Knowledge Tasks in SMW+
12.1.2.1 User Roles in Semantic Wikis
When applying SMW+ to a concrete use case we typically identify four primary
user roles: knowledge consumers, knowledge providers, knowledge architects
and collaboration managers. Other, secondary roles are knowledge engineering
experts, wiki application experts and administrators.
Knowledge consumers and knowledge providers have complementary roles. The
group of consumers typically is the largest group and uses the wiki for reading or
exploring its contents. The knowledge providers contribute this content by creating
and writing articles or annotating data. Accordingly, consumers value efficient
methods to explore the wiki and to retrieve data and text. Providers, however,
are specifically interested in efficient authoring and annotation tools, which lead to
high-quality data and appealing wiki articles.
Users taking the role of a knowledge architect create the wiki ontology (i.e.
categories and properties) and templates and forms, or integrate web service and
inline queries into articles. If many users are working in the wiki within a particular
workflow (e.g. within the same project) then their interactions must be adjusted
and organized to ensure workflow compliance and data quality and consistency.
The collaboration manager takes this role and uses the framework given by the
knowledge architect to guide the community.
In contrast to the knowledge architects who know their domain well, highly
specialized knowledge engineering experts are skilled in creating advanced rules
and queries as well as in the handling of complex ontologies. Wiki application
experts can create advanced page layouts, templates and forms (so-called wikitext
programming). Finally, administrators take care of all technical aspects like
installing software upgrades or creating database back-ups.
In the following we explain which tools are provided by SMW+ to serve the
particular requirements of the above user roles.
12.1.2.2 Authoring Articles and Annotating Data
Contributors typically interact with the wiki by creating articles and entering rich
text into the WYSIWYG editor that supports formatted tables and page designs in
a Word-like manner. It also allows embedding media files such as documents,
images, audio and videos. Finally, users can include queries into the WYSIWYG
editor which are rendered as dynamic lists, tables or sophisticated data diagrams
(Fig. 12.1).
Besides the unstructured content that contributors provide using the WYSIWYG
editor, they are given various tools to author data in articles (so-called
annotations). The Semantic Toolbar is displayed alongside the WYSIWYG editor
and gives a detailed overview of all annotations which are present in the article.
It provides options to inspect, create and alter the annotations (Fig. 12.2).
The Advanced Annotation Mode, available also via the WYSIWYG editor,
provides a method to annotate data in an article. The user marks a relevant text
area and assigns it to a property interactively. To support users in the annotation
process, the software can consider schema information about the type of properties,
e.g., providing auto-completion when choosing appropriate properties.
An alternative way to annotate data in articles is provided by semantic forms
(Fig. 12.3). Articles, which are associated with a form, can be edited in a form mode
allowing users to fill in pre-defined form fields with data values. Each form field
comes with an auto-completion feature proposing suitable values.
Knowledge can not only be added directly to wiki articles but can also be
expressed implicitly via logical rules. The user formulates rules by means of rule
editors that support the authoring of calculation, definition and property chaining
rules. The TripleStoreConnector, which can be installed along with SMW+,
executes these rules and automatically generates (i.e. infers) new data, which
become accessible in the wiki.
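To make the idea of a property chaining rule more concrete, the following Python sketch infers new facts from existing ones. It is only an illustration under stated assumptions: the triple representation, the function `apply_chaining_rule` and the property names are hypothetical and do not reflect the rule language or the API of the TripleStoreConnector.

```python
def apply_chaining_rule(facts, first, second, inferred):
    """Infer (x, inferred, z) whenever (x, first, y) and (y, second, z) hold.

    `facts` is a set of (subject, property, object) triples. Only triples
    that are not already present are returned.
    """
    derived = set()
    for subject, prop, middle in facts:
        if prop != first:
            continue
        for other_subject, other_prop, target in facts:
            if other_prop == second and other_subject == middle:
                derived.add((subject, inferred, target))
    return derived - facts


# Illustrative annotations as they might be entered in wiki articles.
FACTS = {
    ("London", "located in", "England"),
    ("England", "part of", "United Kingdom"),
    ("Paris", "located in", "France"),
}

# Hypothetical rule: "located in" followed by "part of" yields "located in state".
new_facts = apply_chaining_rule(FACTS, "located in", "part of", "located in state")
print(new_facts)
```

Nothing is inferred for Paris, because the sample data contain no "part of" annotation for France.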
12.1.2.3 Exploring the Wiki and Querying for Data
Information consumers require efficient tools to explore a wiki and to retrieve data
precisely. The full-text search engine in SMW+ returns relevant search hits coming
from wiki articles or from uploaded Microsoft Office or PDF documents. The
search interface supports full text search, Boolean operators, auto-completion and
spell checking (“did you mean?”). The search results are augmented with semantic
data to help assess the relevance of search hits immediately.
Fig. 12.1 The WYSIWYG editor comes with rich text editing capabilities
The semantic tree view supports contributors and consumers in exploring the
contents of a wiki. Data (categories, articles or annotated data) are displayed
in the hierarchical (tree-) view which is always automatically updated whenever
the underlying data change.
On the semantic level, annotated data is accessible via inline queries that can be
used within articles for dynamic content generation. The graphical query interface
is available for composing queries and for immediately previewing and formatting
the results (Fig. 12.4). The query interface is directly accessible from the
WYSIWYG editor for embedding the queries into articles.
Fig. 12.2 Enter and modify annotations with the semantic toolbar
Users can choose from a rich set of visualization methods for rendering query
results in an article. They range from simple lists or tables to calendars, timeline
charts, maps, bar or pie charts, or process diagrams (Fig. 12.5).
12.1.2.4 Organizing and Curating the Ontology
The knowledge architect is in charge of organizing and curating the ontology to
ensure that it addresses the requirements of the knowledge contributors and that
it is consistent with the data which is stored in the wiki articles.
The ontology browser provides a complete overview of the wiki’s ontology
to the knowledge architects. The ontology can be searched, filtered and changed in
this tool, e.g. by adding and renaming categories or deleting instances (Fig. 12.6).
The gardening bots of SMW+ are an extensible set of agents (bots) that detect
anomalies in the ontology such as undefined entities, a mismatch between property
type and property value, or empty categories. The knowledge architect regularly
starts bots to verify the consistency of data and ontology and to apply corrections
where necessary.
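The kind of checks a gardening bot performs can be sketched in a few lines of Python. The data structures, the function `run_gardening_checks` and the sample ontology below are assumptions made for illustration; they are not the SMW+ bot API.

```python
# Hypothetical ontology: known categories and the declared type of each property.
ONTOLOGY = {
    "categories": {"City", "Country", "River"},
    "properties": {"Population": "number", "Located in": "page"},
}

# Hypothetical articles with their category and annotated values.
ARTICLES = {
    "London": {"category": "City",
               "annotations": {"Population": "8000000", "Located in": "England"}},
    "Paris": {"category": "City",
              "annotations": {"Population": "about two million", "Mayor": "unknown"}},
    "England": {"category": "Country", "annotations": {}},
}


def run_gardening_checks(ontology, articles):
    """Report undefined properties, type mismatches and empty categories."""
    issues = []
    used_categories = set()
    for title, article in sorted(articles.items()):
        used_categories.add(article["category"])
        for prop, value in sorted(article["annotations"].items()):
            expected_type = ontology["properties"].get(prop)
            if expected_type is None:
                issues.append(("undefined property", title, prop))
            elif expected_type == "number" and not value.isdigit():
                issues.append(("type mismatch", title, prop))
    for category in sorted(ontology["categories"] - used_categories):
        issues.append(("empty category", category, None))
    return issues


for issue in run_gardening_checks(ONTOLOGY, ARTICLES):
    print(issue)
```

A knowledge architect would review such a report and then decide whether to correct the annotation, extend the ontology or remove the unused category.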
Fig. 12.3 Data editing in articles using semantic forms
12.1.2.5 Reusing Knowledge
A basic wiki idea is to facilitate knowledge interchange and re-use between people,
which also includes supporting as many relevant resource formats and origins as
possible. This concerns the import of external data into the wiki on the one hand,
as well as the usage of wiki-data in external applications on the other hand.
Therefore, SMW+ allows calls to external SOAP or RESTful web services and
integration of their results into wiki articles. This is useful to extend articles with
external information, e.g. from a bug tracking system to compile dynamic project
state and progress reports. An additional term import feature enables importing
data and vocabularies into the wiki (existing terminologies, glossaries, emails,
CSV files etc.) by creating wiki pages and automatically populating them.
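As a rough illustration of a term import, the following Python sketch turns the rows of a CSV glossary into annotated wiki page texts. The CSV columns, the glossary entries, the property name `Definition` and the function `build_wiki_pages` are all hypothetical; the actual import feature of SMW+ may work differently.

```python
import csv
import io

# Hypothetical glossary export; column names and terms are invented.
GLOSSARY_CSV = """Term,Definition,Category
Salinity,Amount of dissolved salt in sea water,Measurement
Buoy,Floating platform carrying instruments,Instrument
"""


def build_wiki_pages(csv_text):
    """Create one annotated wiki page text (title -> wikitext) per CSV row."""
    pages = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        pages[row["Term"]] = (
            f"[[Definition::{row['Definition']}]]\n"
            f"[[Category:{row['Category']}]]"
        )
    return pages


for title, text in build_wiki_pages(GLOSSARY_CSV).items():
    print(f"== {title} ==\n{text}\n")
```

Each generated page carries its data as annotations, so the imported terms are immediately available to queries in the same way as manually authored articles.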
In the other direction, data in the wiki can be re-used in other applications via
corresponding connectors. They allow querying data from the wiki within a spread-
sheet in Microsoft Excel. It is also possible to exchange project and task informa-
tion, including attributes, hierarchies and interdependencies, with Microsoft Project
(Fig. 12.7). With the ontoprise product WikiTags, users of Microsoft Outlook,
Word, and Excel can access, add and edit information and dynamically query
results embedded in wiki articles without leaving their application.
Fig. 12.4 The query interface supports building queries
The most powerful feature in the knowledge re-use context is semantic data
integration (Fig. 12.8). In conjunction with the TripleStoreConnector (containing
OntoBroker), SMW+ is employed as the collaboration front-end in complex database
environments. Existing databases (using RDBMSs like Oracle, IBM,
Microsoft SQL Server or MySQL) are “lifted” to a semantic level and thus made
available in SMW+. OntoBroker’s semantic data integration features (Angele and
Gesmann 2007) provide a consolidated view on different data sources, which are
available in the ontology browser and the query interface.
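What “lifting” a relational table to a semantic level can mean is sketched below with Python's built-in `sqlite3` module. The table, the column-to-property mapping and the function `lift_table` are assumptions chosen for illustration; OntoBroker's actual mapping mechanism is not shown here.

```python
import sqlite3


def lift_table(connection, table, key_column, mapping):
    """Turn each row of `table` into (subject, property, value) triples.

    `mapping` assigns an ontology property name to each relevant column.
    Table and column names are interpolated directly, which is acceptable
    for this sketch but not for untrusted input.
    """
    cursor = connection.execute(f"SELECT * FROM {table} ORDER BY {key_column}")
    columns = [description[0] for description in cursor.description]
    triples = []
    for row in cursor:
        record = dict(zip(columns, row))
        subject = str(record[key_column])
        for column, prop in mapping.items():
            triples.append((subject, prop, record[column]))
    return triples


# Hypothetical legacy table with production lots.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lots (lot_id TEXT, product TEXT, tested INTEGER)")
conn.executemany(
    "INSERT INTO lots VALUES (?, ?, ?)",
    [("L-001", "Product A", 1), ("L-002", "Product A", 0)],
)

triples = lift_table(conn, "lots", "lot_id",
                     {"product": "Has product", "tested": "Is tested"})
for triple in triples:
    print(triple)
```

Once rows from several databases are expressed against the same property names, they can be queried together, which is the consolidated view the text refers to.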
12.1.2.6 Administrating an SMW+ Installation
Several tools facilitate the maintenance of an SMW+ installation, e.g., tools for
definition of the wiki’s security policy and for the installation and upgrade process.
Fig. 12.5 Query results are visualized in articles
A typical basic administration step is the definition of access restrictions to
different elements of a wiki for single users or user groups. Objects that can be
protected in this manner are namespaces, categories, single wiki pages, but also
semantic content (e.g. properties) and even specific actions within the wiki (e.g.
editing or reading pages). The Access Control List extension implements these
security features, allowing users to define private user or department workspaces,
and to restrictively expose selected areas to partners and customers.
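The access restrictions described above can be pictured as a list of rules that tie an object and an action to the groups allowed to perform it. The following Python sketch is only a simplified model under that assumption; the rule format, the group names and the function `is_allowed` are hypothetical and are not taken from the Access Control List extension.

```python
# Hypothetical rules: (object type, object name, action, allowed groups).
ACL = [
    ("namespace", "Finance", "read", {"finance"}),
    ("page", "Finance:Budget", "edit", {"finance-managers"}),
    ("property", "Salary", "read", {"hr"}),
]


def is_allowed(user_groups, object_type, name, action, acl=ACL):
    """Return True if one of the user's groups may perform the action.

    Objects without a matching rule are treated as unprotected, which is
    an assumption of this sketch.
    """
    for rule_type, rule_name, rule_action, groups in acl:
        if (rule_type, rule_name, rule_action) == (object_type, name, action):
            return bool(groups & set(user_groups))
    return True


print(is_allowed({"finance"}, "namespace", "Finance", "read"))
print(is_allowed({"sales"}, "property", "Salary", "read"))
```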
Installation and upgrade tasks are facilitated with the Deployment Framework
that is included in SMW+. It supports the administrator in (de-)installing, updating
and upgrading the wiki software without altering configuration files or manually
applying patches. Extensions are downloaded from the central ontoprise repository
and added while the tool considers all possible dependencies.
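Taking dependencies into account when installing extensions amounts to ordering them so that each one comes after everything it requires. A minimal Python sketch of this idea uses the standard library's `graphlib` module (Python 3.9 or later); the extension names and the function `installation_order` are invented, and the sketch makes no claim about how the Deployment Framework is implemented.

```python
from graphlib import TopologicalSorter

# Hypothetical extensions, each mapped to the extensions it depends on.
DEPENDENCIES = {
    "core": set(),
    "forms": {"core"},
    "query-ui": {"core"},
    "project-connector": {"query-ui", "forms"},
}


def installation_order(dependencies):
    """Return an order in which every extension follows its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


order = installation_order(DEPENDENCIES)
print(order)
```

If the dependency declarations contained a cycle, `static_order()` would raise `graphlib.CycleError`, which a deployment tool could report to the administrator.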
12.1.3 Business Applications
SMW+ is a universal and domain independent tool which is applicable for a
multitude of scenarios. In contrast to common enterprise applications, which are
designed to address general business problems, SMW+ focuses on the specific and
evolving needs of a work group and, thus, forms a situational application. Thanks
to the ontology modeling and querying features, users can adapt SMW+ to serve
their situational and unique requirements.
Fig. 12.6 Inspect and edit the ontology in the ontology browser
Therefore, possible application areas range from enterprise content management
(ECM), to integrated collaborative environments (ICE) and business intelligence
(BI).
Fig. 12.7 A connector synchronizes data between SMW+ and Microsoft Project
Fig. 12.8 Technical architecture of SMW+, TripleStoreConnector and OntoBroker
ECM comprises technologies to manage an organization’s unstructured information,
e.g. via web content and document management. SMW+ is a content management
system and it covers typical customer needs like collaborative and distributed
authoring, version control, a markup language to create layout templates and management
of technical metadata and media files. The semantic features of SMW+ are used
to describe wiki articles by means of annotations that amongst other things allow the
generation of new content. The benefits of the semantically augmented content are
(1) increased efficiency of content production by aggregating content using queries
and embedding dynamic lists into articles and (2) improved user experience by
providing better search results based on data in articles.
An Integrated Collaboration Environment (ICE) is defined as an environment
in which virtual teams do their work. In the case of project management, typical
requirements address the need of team spaces to collect and collaboratively author
project documents, support project managers in monitoring and scheduling tasks
and work packages and keeping the team informed about performance, costs and
schedule. Customers of SMW+ are already using SMW+ as a light-weight, semantic
project management system. In contrast to traditional project management
systems, which require users to follow rigid workflows and metadata blueprints,
SMW+ offers more flexibility to project teams. They flexibly adopt and extend the
SMW+ project schema to better satisfy their situational needs.
Recently the term Business Intelligence 2.0 has emerged. It involves the idea of
having easy-to-use business intelligence systems that provide insights on enterprise
data and allow users to collaborate, share and create reports via web-interfaces. The
overall objective is to make critical BI information available and accessible not
only to special CIOs or BI experts, but for everyone who relies on company relevant
information. SMW+ is a BI 2.0 tool, since it integrates heterogeneous enterprise
data sources into a unified view. In contrast to other BI 2.0 tools, SMW+ additionally
gives users a flexible descriptive layer (i.e. arbitrary wiki pages) to organize
reports, queries and external data.
12.1.3.1 Content Management: UNESCO OceanTeacher
UNESCO/IOC (http://ioc-unesco.org) employs SMW+ as the technical basis of the
“OceanTeacher Encyclopedia” (http://library.oceanteacher.org), a production
system and public library of knowledge related to oceanographic data management.
It includes text, images, objects (PDF and Word documents, software packages,
audio and video files) and links to other web sites. Currently, 10 editors and 25+
authors are collaboratively generating articles for 300+ content consumers.
UNESCO decided to re-launch OceanTeacher because the former content man-
agement system did not provide a sufficient quality of search hits. It was nearly
impossible for content consumers to find materials on a particular topic for a
particular qualification level. UNESCO has contracted ontoprise to define an
ontology that covers the various topics and that allows users to match articles in
these topics with their qualification level.
The new OceanTeacher portal splits up into a public portal for retrieving,
exploring and reading articles and a production portal that is accessible for
UNESCO personnel only.
The production portal has been configured to orchestrate the production
workflow of the editors, contributors and consumers. Editors review and approve
content created by the members of the contributor-group. They distribute tasks to
these contributors, for example creating new articles or improving existing articles.
Contributors are responsible for creating content. The separation of the user groups
provides the opportunity to assign different user rights. Before publication in
OceanTeacher, created content has to be approved by an editor. Every contributor
has his own private section in the wiki, where not-yet approved articles are listed.
After successful approval, the article will be published in OceanTeacher and is
visible for other users.
The target audience of the new OceanTeacher can be separated into three levels
of expertise: beginners, intermediate and experts. While creating an article,
contributors assign the corresponding target group to the article. The user can
display the content, which relates to his personal state of knowledge and interests.
The selection of target group, category and further metadata is done very con-
veniently with forms or in the advanced annotation mode (Fig. 12.9).
Consumers are empowered to retrieve content precisely according to their needs.
This was realized with the SMW+ query interface which allows composing and
executing custom queries. These queries can be easily added with a few mouse clicks.
After the re-launch of OceanTeacher consumers retrieve better search hits in less
time and they can explore the library of articles much more easily with the tree
view. The production workflow is better integrated and orchestrated. Finally,
editors generate more appealing articles thanks to the WYSIWYG editor.
Fig. 12.9 Editors use the advanced annotation mode of SMW+ to enrich wiki contents with
semantic annotations
12.1.3.2 Project Management: Business Process Re-engineering at an Italian Bank
Business Process Reengineering (BPR) is defined as the analysis and (re-)design of
workflows within an organization in order to improve their efficiency. The following
example deals with a BPR project that was conducted with SMW+ by an Italian
association of banks: the “Association of Cooperative Banks of Emilia Romagna”.
The project aimed at consolidating the various work processes of the 8000
members of the association. This required conducting a survey of the current
processes employed by each bank. The data was collected by employees of the
bank and had to be evaluated afterwards in a business impact analysis. By these
means an assessment was achieved of how tasks were described in theory, how
they were actually handled in practice and what the most crucial processes were.
Based on that, standard procedures were identified and implemented.
The implementation of the project was split up into several phases. After setting up
SMW+, the employees started collecting facts and descriptive texts in SMW+ about
the tasks they execute in their daily work. It was crucial that they entered particular
data in forms to ensure that it is normalized (e.g. risk level, responsible persons,
duration of execution etc.) to allow the subsequent statistical analysis. The collected
data was imported into Microsoft Excel where the statistical clustering was done to
get the groups of tasks, which share similar characteristics (e.g. the risk of failure).
These groups allowed spotting the most critical operational processes that needed to
be re-engineered. In a final step, the selected processes were standardized and then
collaboratively consolidated by the bank employees (Fig. 12.10).
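Why normalized form data matter for the subsequent analysis can be shown with a small Python sketch. It does not reproduce the statistical clustering that was carried out in Microsoft Excel; as a deliberately simpler stand-in it groups tasks by their risk level and averages their duration. The task records, field names and the function `summarize_by_risk` are invented for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical task records as they might be entered through a semantic form.
TASKS = [
    {"task": "Approve loan", "risk": "high", "duration_min": 120},
    {"task": "Wire transfer abroad", "risk": "high", "duration_min": 60},
    {"task": "Open savings account", "risk": "low", "duration_min": 30},
    {"task": "Issue debit card", "risk": "low", "duration_min": 20},
]


def summarize_by_risk(tasks):
    """Group tasks by risk level and compute the mean duration per group."""
    groups = defaultdict(list)
    for task in tasks:
        groups[task["risk"]].append(task)
    return {
        risk: {
            "tasks": sorted(item["task"] for item in items),
            "mean_duration_min": mean(item["duration_min"] for item in items),
        }
        for risk, items in groups.items()
    }


for risk, summary in summarize_by_risk(TASKS).items():
    print(risk, summary)
```

Because every record uses the same controlled values for the risk level, the grouping needs no manual clean-up; free-text entries such as "rather risky" would first have to be interpreted by hand.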
SMW+ provided the reliable and agile knowledge base for every single phase of
the depicted case. The project required an easy to use and easy to procure tool
enabling employees to safely enter data and text about their tasks. Furthermore, the
dynamic generation of aggregation lists and overview tables of the collected data,
as well as the data’s provision for statistical processing, was stipulated. SMW+
meets these requirements with some key features, e.g.
• semantic forms that enable easy data entry,
• a flexible data model which can be created and extended with the OntologyBrowser,
• fine-grained access control (especially crucial for the sensitive banking industry),
• advanced ad-hoc querying of the stored data,
• machine-interpretable data which can seamlessly be extracted and
processed with external tools (e.g. MS Excel or MS Access) for advanced
statistical analysis.
12.1.3.3 Semantic Data Integration
One of the main problems companies have to deal with regarding knowledge
management is the integration of heterogeneous data sources. Considerable efforts
are necessary to provide a unified view on distributed data. The following case
depicts how a Fortune 100 pharmaceutical company employed SMW+ as a self-service
portal for its R&D department, to integrate various databases containing information
about production lots, ingredients of individual products, results from
tolerance tests, etc.
The R&D department of the company was faced with the task of tracking all
significant events and data that are relevant for Food and Drug Administration
(FDA) submission reports, which are required for approving novel drug products.
Since it was hard to assemble the scattered data from several sources, the compi-
lation of reports was error-prone and took an inordinate amount of time. Con-
sequently, the requirement was to achieve a consolidated view on the relevant data
stores by linking them to a common domain ontology. Easy access methods would
then allow the generation of suitable reports on the fly (Fig. 12.11).
The case was resolved with an internal self-service portal where the R&D staff
could not only generate reports via querying the integrated databases but also by
using rules for inferring knowledge. The original Oracle databases were connected
to an OntoBroker Server as integrating middleware. At the front end, researchers
interacted with SMW+ and OntoStudio. The latter served as a collaborative environment
for ontology engineering and curation while the former was a central
integration and presentation facility for all of the activities.
The application of SMW+ combined with the semantic middleware OntoBroker
offers an integrated, common view on separated data silos. OntologyBrowser and
semantic tree view facilitate browsing and navigating the product genealogy in the
wiki. The compilation of reports in this way is remarkably faster through federated
semantic queries. Moreover, researchers can collaborate on integrated data and
Fig. 12.10 Screenshot of a task which has been entered by an employee
reports that were gathered in the wiki. Also, the quality and coverage of relevant
information improves since the tracking and validation of products involves their
entire lifecycle now and important events always automatically trigger alerts
(e.g. when new test results are available or the ingredients of drugs have changed).
All told, the solution reduces the turn-around time for FDA submissions by several
orders of magnitude, from something that took a few weeks, to just a few minutes.
12.1.4 Conclusion and Outlook
In Section 12.1 we have presented SMW+, the semantic enterprise wiki of
ontoprise, and demonstrated how flexibly it addresses enterprise-level requirements
compared to traditional enterprise-level applications.
The versatility of SMW+ opens a wide area of situational applications where
SMW+ can be efficiently employed. In all cases the applications are built by the
wiki users for the wiki users. Starting from a standard SMW+ installation they
gradually extend and refine the wiki as they work with it. The applications are
highly collaborative (e.g. project management); they are driven by the combination
of text and data (e.g. content management) and they all include data modeling tasks.
Finally, SMW+ demonstrates its unique strength when combining socially curated
unstructured data (i.e. text) with semantically integrated data from external, legacy
data sources, like RDBMSs. SMW+ is a novel tool for knowledge workers that is
easily procured and deployed to “start small” and which over time serves more and
more users and suits more and more requirements.
Fig. 12.11 OntoStudio mapping view for mapping RDBMS schema onto the ontology
12.2 Bringing Complementary Models and People Together: A Semantic Wiki for Enterprise Process and Application Modelling
Viktoria Pammer, Marco Rospocher, Chiara Ghidini,
Stefanie Lindstaedt, and Luciano Serafini
12.2.1 Introduction
Enterprise modelling is the process of formally describing aspects of an enterprise,
typically as process models, data models, resource models, competence profiles,
etc. The availability of such enterprise models, expressed in computer-interpretable
formal languages, is becoming an important factor for enterprises and organisations
to make better and more flexible usage of the organisation’s knowledge capital and
foster innovation. We describe here the design and applications of MoKi (the
Modeling wiKi, http://moki.fbk.eu), and illustrate that complementary models
can be created in the same modelling tool by people with complementary skills.
MoKi is a semantic wiki for modelling enterprise processes and application
domains, and intended to be used by knowledge engineers and domain experts,
and everyone with skills in between.
State of the art modelling tools provide good support towards the creation of
formal, computer interpretable models, but they suffer from two critical limitations:
Different tools and different people are needed to model various enterprise aspects,
and throughout all modelling stages.
• Different tools and modelling environments are provided for the specification of
the different aspects of an enterprise. Most notably, the tools that are used to
model business processes are usually separate and disconnected from the tools
used to model the enterprise application model, which is more and more often
encoded by means of ontology languages. This tool discontinuity produces two
types of problems: Firstly, the modelling team needs to learn and interact with
completely different tools. Especially in small environments like SMEs, where
there may be scarce resources to allocate to modelling activities, this double-
learning may lead to human resource problems. Secondly, the technical integration
of the different parts (i.e., business processes and enterprise application
domain) is not directly supported by the different tools and can therefore become
unnecessarily complex and time-consuming.
• State-of-the-art tools for ontology and process construction are often tailored to
knowledge engineers, that is, to people who know how to create formal models.
Domain experts, that is, the people who know the domain to be modelled,
often find such knowledge engineering tools hard to use.
As a result, the interaction between these different roles in the modelling team
is regulated by a fairly rigid iterative waterfall paradigm: domain
experts produce or revise informal descriptions in textual documents,
and a knowledge engineer must manually interpret these descriptions and transpose
them into a formal specification, with the obvious problems
of duplicated effort, wrong or misleading interpretations, and so on.
To overcome the limitations illustrated above, we have experimented with
Semantic Web collaborative tools, and especially Semantic MediaWiki. The
current version of MoKi, an extension to Semantic MediaWiki, is the result of
these experiments. We have used MoKi in different development stages and
customisations, in various application cases (see Sect. 12.2.3). MoKi addresses
the above two limitations through (1) representing both tasks belonging to
a business process and topics belonging to an enterprise application domain in
a wiki based system, and (2) representing both formal and informal content, and
allowing different visual representations of the same content. A collage of views
(tree view of concepts, process editor, a single concept) on MoKi content is
depicted in Fig. 12.12.
12.2.2 A Conceptual System Description of MoKi
MoKi is an extension of Semantic MediaWiki, which itself is an extension of
MediaWiki. Semantic MediaWiki adds labelled links and RDF interpretation of
wiki content to MediaWiki, and MoKi adds OWL⁴ and BPMN⁵ interpretation
of wiki content, as well as some knowledge engineering support, to Semantic
MediaWiki.
Fig. 12.12 Collage of views on MoKi content
⁴ Web Ontology Language, http://www.w3.org/TR/owl2-overview
⁵ Business Process Modelling Notation, http://www.bpmn.org
The decision to implement MoKi on top of an existing wiki was taken for several
reasons. Firstly, a wiki environment was chosen since the wiki principles of (1)
giving access to content (both with read and write permissions) to all, and (2)
making modifications of content as easy as possible for everybody, are well-aligned
with the goal of MoKi to be a modelling environment not only for knowledge
engineers. Secondly, most wiki environments support versioning and standard
collaboration features like discussion threads or comments. Thirdly, most wikis
are web-based, which enables geographically distributed modelling, and allows
textual as well as multimedia content, i.e. everything which can be published on the
web can be published in a wiki. Finally, most potential users of a system such as
MoKi can be expected to know how a wiki looks and feels, and a large portion of
these people also know how to actively contribute to wiki content. This is partly due
to the large success of the online encyclopaedia Wikipedia, but also to the arrival
of wiki (and other semantic web) technology in the corporate world, as observed
e.g. in Gissing and Tochtermann (2007) and Schachner and Tochtermann (2008).
Extending traditional wikis, semantic wikis (Schaffert et al. 2008) already
provide the basic infrastructure to deal with structured data in addition to traditional
human-readable content-types like text or multimedia. Thus, they are technically
well suited to accommodate the results from informal as well as formal modelling
activities. In choosing a particular semantic wiki upon which to build MoKi, we
chose Semantic MediaWiki because MediaWiki has a large community of
developers and users, is easily extensible through plugins, and is generic
insofar as it is not adapted to any specific application scenario.
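To make the idea of labelled links concrete, the following minimal Python sketch reads Semantic MediaWiki's in-text annotation syntax, [[property::value]], and turns each annotation into a subject-property-value triple. It only illustrates the principle and is not how Semantic MediaWiki or MoKi is implemented; the function name, the sample page text and the property "Is a" are assumptions made for the example.

```python
import re

# Semantic MediaWiki writes a labelled link as [[property::value]], optionally
# followed by |display text. A plain wiki link [[target]] has no "::" separator.
ANNOTATION = re.compile(r"\[\[([^\[\]:|]+)::([^\[\]|]+)(?:\|[^\[\]]*)?\]\]")


def extract_triples(page_title, wikitext):
    """Return (subject, property, value) triples for one wiki page."""
    return [
        (page_title, prop.strip(), value.strip())
        for prop, value in ANNOTATION.findall(wikitext)
    ]


# Hypothetical page content for a page called "Workshop".
text = "A [[Is a::Event]] organised by a [[isOrganizedBy::Person]]; see also [[Meeting]]."
triples = extract_triples("Workshop", text)
for triple in triples:
    print(triple)
```

The plain link to "Meeting" yields no triple, which is the difference between a traditional wiki link and a labelled one.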
12.2.2.1 The “One Page – One Element” Design Principle
The basic design principle of MoKi is that every model element corresponds to
a wiki page. For application domain modelling, the relevant model elements are
concepts, individuals and properties/relations, while for business processes the
relevant model elements are processes.⁶ Each model element is internally given
a formal meaning. For instance, a concept is given the meaning of a description
logic concept, i.e. it is a unary predicate and can be interpreted as the set of entities for
which the unary predicate holds. It is essential to be clear about this internal
interpretation of model elements, since it forms the basis for technically dealing
with imports from, and exports to, various knowledge representation formalisms.
A MediaWiki category is used to
distinguish different kinds of model element. Table 12.1 shows a synopsis of the
model elements available in MoKi, their formal interpretation, and the
kind of model in which each model element is expected to be used.
⁶ For the sake of simplicity, in MoKi we decided not to distinguish explicitly between atomic and
non-atomic processes (i.e. those composed of two or more sub-processes).
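The mapping from MediaWiki categories to kinds of model element can be pictured as a simple lookup table. The sketch below transcribes the four rows of Table 12.1 into Python; the class and function names are invented for illustration, and MoKi's actual PHP implementation may organise this quite differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ElementKind:
    model_element: str
    interpretation: str
    type_of_model: str


# Transcribed from Table 12.1: MediaWiki category -> kind of model element.
CATEGORY_TO_KIND = {
    "Domain model": ElementKind("Concept", "DL concept", "Domain ontology"),
    "MokiProperty": ElementKind("Property/Relation", "DL role", "Domain ontology"),
    "Individuals": ElementKind("Individual", "DL nominal", "Domain ontology"),
    "Process model": ElementKind("Process", "BPMN process", "Process/Task model"),
}


def kind_of_page(categories):
    """Return the model-element kind of a page from its category names, or None."""
    for category in categories:
        if category in CATEGORY_TO_KIND:
            return CATEGORY_TO_KIND[category]
    return None


workshop_kind = kind_of_page(["Domain model"])
print(workshop_kind.interpretation)
```

A page that carries none of the four categories is simply not treated as a model element in this sketch.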
For every kind of model element supported by MoKi, a specific template is
provided. Figure 12.13 shows an excerpt of an already filled-out concept description.
The implementation of templates is based on the Semantic Forms extension to
Semantic MediaWiki, which allows users to define forms for editing pages and
templates for storing semantic data. From a user perspective, a template is displayed
as a list of fields to be filled out in order to describe the model element.
Fields can differ between model elements. Conceptually, a template asks the user
for information which is typically needed for a specific kind of model element.
Example. When describing a domain concept, it is typical to ask “What is a
superconcept?”, i.e. what are more general notions than the currently described
concept, in which categories does it fall?
Users are not required to fill in all the fields of a page
when describing a specific model element. Additionally, some of the templates
allow users to add custom-defined fields.
Example. A user may want to describe the concept “Project”, and then express
that a project is typically managed by a person. In this case, the user can add a new
field to the concept “Project” which is called “managed-by” and fill it with the
concept “Person”.
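A template can be thought of as a record with a few standard fields plus an open-ended set of user-defined ones. The following Python sketch models the "Project" example under that assumption; the class, its field names and the choice of which fields count as standard are hypothetical and do not reproduce MoKi's actual templates.

```python
from dataclasses import dataclass, field


@dataclass
class ConceptPage:
    """One wiki page describing one domain concept."""
    name: str
    description: str = ""
    superconcepts: list = field(default_factory=list)
    # Custom fields added by the user: field name -> concept filling it.
    custom_fields: dict = field(default_factory=dict)

    def add_field(self, field_name, filler):
        self.custom_fields[field_name] = filler

    def missing_fields(self):
        """Names of the standard fields that are still empty."""
        missing = []
        if not self.description:
            missing.append("description")
        if not self.superconcepts:
            missing.append("superconcepts")
        return missing


project = ConceptPage("Project")
project.add_field("managed-by", "Person")
print(project.custom_fields)
print(project.missing_fields())
```

Leaving standard fields empty is allowed here, mirroring the point that users need not fill in every field.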
The use of templates allows for an easy customization of MoKi to hold additional
kinds of models other than domain ontologies and business processes.
Such a customization cannot yet be done solely at the user interface:
some programming in PHP is required in order to define the formal meaning of the
fields in a new template and to add import and export support for the new model
elements. The software development effort involved is, however, minimal.
12.2.2.2 Functionalities
In addition to the features offered by MediaWiki and Semantic MediaWiki, MoKi
provides functionalities for importing/exporting models, navigating, editing, and
validating models. In this section we briefly illustrate these functionalities. A more
extensive description can be found at the MoKi web site and in Ghidini et al. (2009).
Table 12.1 Category names in MoKi for designating different kinds of model elements.
“Type of model” refers to the type of model in which such a model element is expected to occur

Category          Model element       Interpretation   Type of model
“Domain model”    Concept             DL concept       Domain ontology
“MokiProperty”    Property/Relation   DL role          Domain ontology
“Individuals”     Individual          DL nominal       Domain ontology
“Process model”   Process             BPMN process     Process/Task model

Fig. 12.13 Excerpt of a filled-out concept template in MoKi, shown for a concept
called “Workshop”. The fields in the Annotation, Hierarchical structure and Notes boxes are
available for all domain concepts. The fields in the Properties box are added by the ontology
engineer specifically for each domain concept, e.g. “hasParticipant” and “isOrganizedBy” for
the concept “Workshop”

The first group of functionalities concerns the import/export of models. The
application domain model can be exported into OWL 2 (RDF/XML rendering), and
the business process model can be exported into BPMN, according to the Oryx eRDF
(embedded RDF) serialization. Import of application domain models is possible from
OWL 2 files. In addition, MoKi also supports importing knowledge from less structured
sources. Hierarchies of concepts can be imported by writing down the hierarchy as
a simple ASCII list of terms, where indentation indicates the hierarchy. Knowledge
can be imported from text documents by means of a term extraction functionality.
Extracted terms can be added with one click as (candidate) concepts to the MoKi domain
ontology. At the backend, this functionality uses the KnowMiner framework, a Java-
based framework for knowledge discovery (Granitzer 2006; Klieber et al. 2009).
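The import of an indented term list can be sketched in a few lines of Python. The version below assumes that each level of the hierarchy is indented by two spaces; the source does not specify MoKi's exact input conventions, so the function name, the indentation unit and the sample terms are all assumptions.

```python
def parse_hierarchy(text, indent="  "):
    """Turn an indented term list into (term, parent) pairs.

    Top-level terms get parent None. Each level is one `indent` deep.
    """
    pairs = []
    stack = []  # stack[d] is the most recent term seen at depth d
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        stripped = line.lstrip(" ")
        depth = (len(line) - len(stripped)) // len(indent)
        term = stripped.strip()
        del stack[depth:]  # forget terms at this depth or deeper
        parent = stack[-1] if stack else None
        pairs.append((term, parent))
        stack.append(term)
    return pairs


listing = """Event
  Workshop
  Meeting
    Project meeting
Person
"""
pairs = parse_hierarchy(listing)
for term, parent in pairs:
    print(term, "<-", parent)
```

Each resulting pair could then become one page with its superconcept field filled in, although how MoKi stores the result is not described here.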
The second group of functionalities concerns the navigation of models. In any
information or knowledge management system, navigation through content is vital
to its success: the best content is useless if it cannot be easily accessed. MoKi
content can be accessed through standard MediaWiki functionalities such as search, or
by typing the URL of a single page into the address bar of the browser. Apart
from this, model elements can be listed in tabular style
(where each element is shown alongside some relevant information characterizing it)
or shown in graphical visualizations. For the ontology part, the graphical visualization
consists of a tree-view rendering of the specialisation (Is-A) and mereological (Part-Of)
hierarchies, and a tree-view rendering of the membership of individuals in concepts.
Hierarchies can also be edited by drag and drop. For the process part,
a graphical visualisation of the process workflow and of the different subprocesses
it comprises is available on the process element page.
The third group of functionalities concerns the editing of models. Editing activities
relate to single model elements and comprise the creation of model elements, their
repeated editing (including renaming a model element or even changing its type),
and the deletion of model elements. Depending on the type of element, the
corresponding template is loaded when a model element is created or edited.
A fourth important group of functionalities concerns the evaluation of models
(described in detail by Pammer (2010), pp. 93–121). Whatever purpose models are
created for, they need to be evaluated in order to ensure that they will serve their
intended use. The current version of MoKi supports ontology evaluation through
(1) a models checklist, (2) a quality indicator, (3) the ontology questionnaire
and (4) the display of assertional effects. The models checklist is a list of
characteristics that typically point to oversights and modelling guidelines;
for each characteristic, it automatically retrieves the elements that fit.
Example. One point on the checklist is “Orphaned concepts”, i.e. concepts that
have no super- or subconcepts, have no parts and are not part of anything. These are
often concepts left-over from brainstorming or another earlier modelling iteration.
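The "Orphaned concepts" check can be expressed directly from its definition: a concept is orphaned if it appears in neither the Is-A nor the Part-Of hierarchy. The Python sketch below assumes the two hierarchies are available as sets of pairs; this representation, the function name and the sample concepts are illustrative assumptions rather than MoKi's own data model.

```python
def orphaned_concepts(concepts, is_a, part_of):
    """Concepts that occur in neither the Is-A nor the Part-Of hierarchy.

    is_a holds (subconcept, superconcept) pairs, part_of holds (part, whole) pairs.
    """
    connected = set()
    for a, b in is_a | part_of:
        connected.update((a, b))
    return sorted(set(concepts) - connected)


concepts = {"Event", "Workshop", "Room", "Building", "Coffee"}
is_a = {("Workshop", "Event")}
part_of = {("Room", "Building")}
orphans = orphaned_concepts(concepts, is_a, part_of)
print(orphans)
```

Here "Coffee" is the only concept with no super- or subconcept, no parts and no whole, so it is the one a modeller would be asked to review.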
The quality indicator is displayed on the page of every element and visualises the
completeness and sharedness of the corresponding element as a bar that grows from
“short and red” to “long and green”. Both completeness and sharedness are heuristic
measures: the first captures how much information (verbal, structural)
about the element is available, while the second captures how many people have
contributed to the description of the element. The ontology questionnaire displays
inferred knowledge, i.e. statements that can be derived from the models contained
within MoKi, and provides explanations for them, as well as the possibility to
remove them. If explicitly made statements are deleted in order to remove an
undesired inference, the ontology questionnaire also displays side-effects, i.e. all
inferences that will be lost alongside. It is called a questionnaire in order to point out
that domain experts should go through the inferred statements as they would go through
a questionnaire, asking of each one “Is this statement correct?”.
Assertional effects (Pammer et al. 2009) are
displayed on concept and property pages directly after an ontology edit that causes
one or more assertional effects.
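The notions of inferred statements and side-effects can be illustrated with a deliberately small example. The Python sketch below considers only one kind of inference, the transitivity of subconcept statements, whereas a real OWL reasoner derives far more; the function names and the sample hierarchy are assumptions made for illustration.

```python
def inferred(explicit):
    """Subconcept statements entailed by transitivity but not stated explicitly."""
    closure = set(explicit)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and a != d and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure - set(explicit)


def side_effects(explicit, to_delete):
    """Inferences that are lost when one explicit statement is deleted."""
    remaining = set(explicit) - {to_delete}
    return inferred(explicit) - inferred(remaining)


# Hypothetical explicit statements: (subconcept, superconcept).
explicit = {("Workshop", "Event"), ("Event", "Activity"), ("Activity", "Thing")}
questions = sorted(inferred(explicit))
lost = sorted(side_effects(explicit, ("Activity", "Thing")))
print(questions)
print(lost)
```

Each entry in `questions` corresponds to one item a domain expert would confirm or reject; `lost` lists what else disappears if the expert removes the statement that "Activity" is a "Thing".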
12.2.3 Application and Experiences with MoKi
MoKi has been applied in several scenarios, in varying stages of development
and sometimes with small customisations.
12.2.3.1 Model Tasks and Topics for Work-Integrated Learning
Two early versions of MoKi have been successfully applied (APOSDLE 2009, pp.
21–25, 32–47) to develop enterprise models in six different domains: Information
and Consulting on Industrial Property Rights (94 domain concepts and 2 domain
roles; 13 processes), Electromagnetism Simulation (115 domain concepts and 21
domain roles; 13 processes), Innovation and Knowledge Management (146 domain
concepts and 5 domain roles; 31 processes), Requirements Engineering (the RESCUE
methodology) (78 domain concepts and 2 domain roles; 77 processes),
Statistical Data Analysis (69 domain concepts and 2 domain roles; 10 processes)
and Information Technology Infrastructure Library (100 domain concepts and
2 domain roles; no processes). The enterprise models were created for the purpose
of initialising and serving as the knowledge backend of a system for
work-integrated learning (Lindstaedt et al. 2007; www.aposdle.org). The modelling
activities involved people with different modelling skills and levels of expertise
in the application domains, located in different places all over Europe.
The experiences of using the first version of MoKi, essentially Semantic
MediaWiki without much additional convenience functionality for modelling, are
described in Christl et al. (2008). These experiences provided the motivation for
adding convenience, i.e. modelling support, functionality to Semantic MediaWiki
in order to support enterprise modelling for processes and application domains.
A qualitative evaluation on the entire modelling process including the usage of
early versions of MoKi applied in APOSDLE is documented in APOSDLE (2009).
The evaluation of the second version of MoKi, much the same as the current
open-source version in functionality if not in design and robustness, took the form
of structured interviews. The interviews were composed of both open and closed
questions. The interview partners were all the domain experts involved, the on-site knowledge
engineers working at the application partners, and the external knowledge engineers
providing additional modelling skills where necessary. Regarding MoKi, questions
were asked about its usability, the support for collaboration, the usefulness of its functionalities,
the homogeneity of the modelling environment for modelling the different aspects
(domain model and processes), etc. Note that before using MoKi, the
users had already tried modelling with Semantic MediaWiki for informal, integrated
models (domain and business process) and with Protege and a YAWL editor for the
formal models.
According to the results of the questionnaire, the users highly appreciated the
form-based interface of MoKi, and the fact that they were able to participate in the
creation of the models without having to know any particular syntax or deep
knowledge engineering skills. Thus, MoKi was perceived as an adequate tool to
actively involve domain experts in the modelling process. From the questionnaire it
further emerged that people with some knowledge engineering skills also found
MoKi as comfortable to use as other state-of-the-art modelling tools. The answers
show that MoKi helped the users in structuring and formalizing their knowledge in
a simple, intuitive and efficient manner. The functionalities for navigating through the
content defined in the models, in particular the graphical ones, were especially
appreciated. Finally, the users found MoKi, as a web-based modelling environment,
quite useful for producing a single
model with a geographically distributed modelling team.
12.2.3.2 Collaboratively Build an Ontology for Annotating Content
The current version of MoKi has been used by a team of knowledge engineers and
domain experts to collaboratively build an Organic Agriculture and Agroecology
Ontology (61 domain concepts, 30 domain roles, and 222 individuals) within the
FP7 EU-project Organic.Edunet (http://www.organic-edunet.eu/). This experience
was perceived as positive enough to use MoKi as the central modelling tool in the
follow-up EU project, Organic.Lingua.
12.2.3.3 Maintain a Glossary in a Project
The current version of MoKi is being used within the FP7 MIRROR project
(www.mirror-project.eu/) to develop and maintain a common glossary.
12.2.3.4 Model Clinical Protocols in Asbru
Although MoKi as presented here is tailored to the development of ontologies and
business processes, the applicability of MoKi goes beyond typical enterprise
modelling. A preliminary and customized version of MoKi that supports the
modelling of clinical protocols in the Asbru modelling language is described
in Eccher et al. (2008). This version of MoKi, called CliP-MoKi, provides support for
modelling the key elements of an Asbru model (e.g. plans, parameters) as wiki pages,
and for exploring the models created according to the mechanisms for structuring
knowledge provided by the language (e.g. the plan/plan children decomposition).
12.2.4 Discussion
MoKi supports a variety of knowledge engineering activities, such as knowledge
acquisition, informal modelling, formal modelling, and evaluation of models at
various stages. Knowledge acquisition is supported through the term extraction
functionality. The term extraction functionality is state-of-the-art, which unfortunately
means that it supports only a limited number of languages (currently English and
German) and that the quality of results depends heavily on the corpus it is
given. Informal modelling is supported via the possibility to import simple
hierarchies, via the prominently placed possibility to verbally describe and document
(“Description”, “Synonym(s)”) all kinds of model elements, as well as via the
possibility to richly document (“Free notes”) all kinds of model elements in all
formats which can be held in a webpage. Evaluation of informal aspects of the
models contained in MoKi is supported, in that for instance elements with no verbal
description are explicitly pointed out to MoKi users (models checklist) and by
giving direct feedback on completeness and sharedness on element pages (quality
indicator). Formal modelling is supported from a user perspective by providing
form fields with auto-fill functionality to ontology engineers. Fields are given
a formal meaning, which is the basis of technically supporting formal modelling.
Evaluation of the formal models is supported through listing and explaining
inferences (ontology questionnaire) and through displaying the effects of editing
formal axioms on data (assertional effects). However, on the formal modelling side,
MoKi does not (yet) support the full expressivity of OWL 2. Most importantly,
it does not yet support complex concepts. Additionally, MoKi versions that support
both domain and process modelling, and a MoKi version that supports modelling
clinical protocols, exist. Thus, MoKi does improve on the first limitation of many
existing modelling tools and decreases the gap in tool support between tools for
different enterprise aspects, and for different knowledge engineering activities.
Most requirements for collaborative knowledge construction tools, discussed for
instance in Noy et al. (2008) and in Tudorache et al. (2008), are easily met by MoKi,
merely by its being implemented on top of MediaWiki. The requirements on
collaboration features satisfied by MoKi are distributed access to a shared ontology,
version control, user identity management, tracking the provenance of information,
and discussion on model elements. Fine-grained access control is not
possible in MoKi, and collaborative protocols that involve rating or voting are
not supported either. Coherence is ensured mostly through keeping the
informal and formal model element descriptions in one place, i.e. in one wiki page.
Inconsistencies between the natural language text or rich content on the one hand
and the formal descriptions on the other hand are not detected. Indeed, this would
exceed the state of the art in natural language understanding, and even more so in
multimedia understanding. However, coherence is supported in another, slightly
roundabout way, namely through the “watch” functionality of MediaWiki. Through
this functionality, users can be notified if changes occur on a wiki page, which in
MoKi means changes concerning a model element. In this way, both domain experts
and ontology engineers can easily detect changes to parts of the ontology in which
they hold an interest. Concerning the second limitation of most existing modelling
tools, namely that users with different knowledge engineering skills (knowledge
engineers, domain experts) are often not able to work within the same modelling
environment, MoKi is already able to hold rich, informal content, as any
MediaWiki can, as well as formal content that can be exported into formal domain
and business process modelling languages (OWL 2 and BPMN respectively).
References
Angele J, Gesmann M (2007) The information integrator: using semantic technology to provide
a single view to distributed data. In: Kemper A et al. (eds) Datenbanksysteme in business,
technologie und web (BTW). 12. Fachtagung des GI-Fachbereichs “Datenbanken und
Informationssysteme” (DBIS). GI-Edition-Lecture notes in informatics (LNI), p 103.
http://www.btw2007.de/paper/p486.pdf
APOSDLE (2009) Deliverable 1.6. Integrated modelling methodology version 2
Christl C, Ghidini C, Guss J, Pammer V, Rospocher M, Lindstaedt S, Scheir P, Serafini L (2008)
Deploying semantic web technologies for work integrated learning in industry. A comparison:
SME vs large sized company. In: Proceedings of the 7th international semantic web conference
(ISWC 2008), In use track, vol 5318, Springer, pp 709–722
Eccher C, Ferro A, Seyfang A, Rospocher M, Miksch S (2008) Modeling clinical protocols using
semantic MediaWiki: the case of the oncocure project. In: ECAI workshop on knowledge
management for healthcare processes (K4HelP)
Ghidini C, Kump B, Lindstaedt S, Mabhub N, Pammer V, Rospocher M, Serafini L (2009) Moki:
the enterprise modelling wiki. In: The 6th annual european semantic web conference
(ESWC2009), Springer, pp 831–835, Demo
Gissing B, Tochtermann K (2007) Corporate Web 2.0 Band I: web 2.0 und unternehmen – wie
passt das zusammen? Shaker Verlag, Aachen, Germany
Granitzer M (2006) Konzeption und entwicklung eines generischen wissenserschliessungsframeworks.
PhD thesis, Graz University of Technology
Klieber W, Sabol V, Muhr M, Kern R, Ottl G, Granitzer M (2009) Knowledge discovery using
the knowminer framework. In: IADIS international conference information systems,
pp 307–314
Krötzsch M, Vrandecic D, Völkel M, Haller H, Studer R (2007) Semantic wikipedia. In: WWW ’06:
proceedings of the 15th international conference on World Wide Web. ACM, New York (2006),
pp 585–594. http://korrekt.org/papers/KroetzschVrandecicVoelkelHaller_SemanticMediaWiki_2007.pdf
Lindstaedt S, Ley T, Mayer H (2007) Aposdle – new ways to work, learn and collaborate.
In: Proceedings of the 4th conference on professional knowledge management WM2007,
ITO-Verlag, Berlin, Potsdam, Germany, pp 227–234, March 28–30 2007
Noy NF, Chugh A, Alani H (2008) The CKC challenge: exploring tools for collaborative
knowledge construction. IEEE Intell Syst 23(1):64–68
Pammer V (2010) Automatic support for ontology evaluation – review of entailed statements
and assertional effects for OWL ontologies. PhD thesis, Graz University of Technology,
March 2010
Pammer V, Serafini L, Lindstaedt S (2009) Highlighting assertional effects of ontology editing
activities in OWL. In: d’Acquin M, Antoniou G (eds) Proceedings of the 3rd international
workshop on ontology dynamics (IWOD 2009), collocated with the 8th international semantic
web conference (ISWC2009), CEUR Workshop Proceedings, vol 519, Washington D.C, USA,
October 26 2009
Pfisterer F, Jameson A, Barbu C (2008) User-centered design and evaluation of interface
enhancements to the semantic media wiki. In: The proceedings of the workshop on semantic
web user interaction at CHI 2008, Florence, Italy.
http://swui.webscience.org/SWUI2008CHI/Pfisterer.pdf
Schachner W, Tochtermann K (2008) Corporate web 2.0 band II: web 2.0 und unternehmen – das
passt zusammen! Shaker Verlag, Aachen, Germany
Schaffert S, Bry F, Baumeister J, Kiesel M (2008) Semantic wikis. IEEE Softw 25(4):8–11
Tudorache T, Noy N, Tu S, Musen MA (2008) Supporting collaborative ontology development in
Protege. In: 7th international semantic web conference, Springer, Karlsruhe, Germany
13
The NEPOMUK Semantic Desktop
Ansgar Bernardi, Gunnar Aastrand Grimnes, Tudor Groza, and Simon Scerri
A. Bernardi (*) • G.A. Grimnes
DFKI GmbH, Knowledge Management Department, Trippstadter Strasse 122, 67663
Kaiserslautern, Germany
T. Groza
School of ITEE, The University of Queensland, St. Lucia 4072, QLD, Australia
S. Scerri
DERI, National University of Ireland, Galway, IDA Business Park, Lower Dangan, Galway,
Ireland
P. Warren et al. (eds.), Context and Semantics for Knowledge Management, DOI 10.1007/978-3-642-19510-5_13, © Springer-Verlag Berlin Heidelberg 2011
13.1 A Tool for Personal Knowledge Work
The crucial role of knowledge work in modern societies is widely recognised.
Characterised by the collecting, structuring, and interconnecting of information
and by the articulation of new insights, ideas, and results, knowledge work is
understood as a comprehensive information processing activity which ultimately
leads to decision making according to the goals and processes of the particular work
context. Usually, such knowledge work ultimately boils down to a personal activity:
the individual processes, navigates and enhances a rich information space,
communicates and shares with others, makes sense out of available sources, and
accounts for decisions taken and actions performed.
Supporting individual, personal knowledge work is thus a promising approach
for effective work support. Such support must take into account three core dimensions
of personal knowledge work:
• Information in the personal realm is typically available in various formats across
different applications on a personal computer – think e.g. about Web browsers,
databases, address managers, mailing tools, file systems, and documents of all
kinds. Retrieving information in this complex collection can be cumbersome.
• Human insight and knowledge create relations and dependencies between information
items. Facilitating the documentation of such relations is a good way to
make the knowledge worker’s thoughts explicit.
• Typically, individual knowledge work is not performed in isolation. On the
contrary, creativity and decision-making very often arise out of a vivid exchange
with peers and contributors.
To cope with such challenges, the NEPOMUK project¹ has developed and
implemented a vision of a new support tool for knowledge workers: the Social
Semantic Desktop. This work environment (the desktop) allows the representation
of knowledge and relations in computer-processable ways by explicit semantic
annotation and the sharing of such information with others in a kind of social
exchange. The central element of the Semantic Desktop is the generation
and maintenance of a Personal Information Model (PIMO). This PIMO makes
a knowledge worker’s concepts and relations between information items explicit
and allows for annotating arbitrary information items in a formal and consistent
way, thus representing the personal interpretations and assessments of the information
at hand. Such explicit and formal annotation is then the basis for automated
services to support the user.
For example, in Fig. 13.1, the knowledge worker, Claudia, is faced with
a multitude of information trapped within different data silos, such as files, e-mails,
tools etc. The PIMO leverages a semantic representation of this data and explicitly models
Claudia’s knowledge about the CID project, including relations to relevant
data, as well as relations among the different data silos.
The realization of the Social Semantic Desktop enables a better retrieval of
relevant information across different applications and information sources within
the personal (computer) desktop. It makes individual interpretation, perspectives,
and interconnections explicit and sharable, and allows for suitable automatic
activities, like classification of documents (via text/content analysis), grouping
and browsing support, or reminder services (for tasks, deadlines etc.).
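The retrieval benefit can be illustrated with a toy model. The Python sketch below represents a fragment of a personal information model as subject-relation-object statements and looks up everything linked to one concept, regardless of the application the item came from. The concept names follow Fig. 13.1, but the relation labels, the e-mail identifier and the function are invented for illustration and are not the actual PIMO ontology vocabulary.

```python
# Each statement is (subject, relation, object). Names follow Fig. 13.1;
# the relation labels and "mail-0815" are hypothetical.
pimo = {
    ("CID", "is a", "Project"),
    ("CID slides", "is a", "Document"),
    ("CID slides", "has topic", "CID"),
    ("Claudia", "works on", "CID"),
    ("Dirk", "works on", "CID"),
    ("CID", "located in", "Karlsruhe"),
    ("mail-0815", "is a", "Email"),
    ("mail-0815", "has topic", "CID"),
}


def related_items(model, concept):
    """Everything directly linked to `concept`, whatever silo it came from."""
    hits = set()
    for subject, relation, obj in model:
        if obj == concept:
            hits.add((subject, relation))
        elif subject == concept:
            hits.add((obj, relation))
    return sorted(hits)


cid_items = related_items(pimo, "CID")
for item, relation in cid_items:
    print(item, "-", relation)
```

A single lookup on "CID" returns a document, an e-mail, two people and a place, which is the kind of cross-application view the explicit model is meant to enable.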
To materialise the Social Semantic Desktop in a way which promises easy and
wide-spread uptake, NEPOMUK has identified core aspects which must rely on
established standards as far as possible:
• Explicit concepts and relations shall represent the user’s thoughts in the PIMO,
and available information is interpreted and structured accordingly. Thus the
semantics of the information is available in formal structures, and computer
services can rely on that.
• This should be realized based on standard data formats which are independent of
proprietary applications and implementation details. We find such data formats
in the domain of the Semantic Web.
• All models and annotations must rely on a widely-accepted conceptual basis in
order to make shared understanding and collaboration possible. To this end
NEPOMUK defines a number of ontologies to be used.
• The implementation realizes a framework architecture with open standard
APIs, so developers can easily add their own additional services.
• The ultimate goal is to enable the maintenance and sharing of personal information
models – because knowledge workers are not alone!
¹ NEPOMUK was performed 2006–2008 by 16 partners with funding from the FP6 programme of the
European Union. Further details and results are available at http://nepomuk.semanticdesktop.org
13.1.1 An Example Scenario
We outline the usage of a NEPOMUK Semantic Desktop by following a typical
knowledge worker in some of her daily activities. Within the NEPOMUK project,
the description of personas and associated scenarios (Gudjonsdottir 2010) proved
very useful to clarify and illustrate the use of the tools and their benefits. The
personas and scenarios were developed after interviews with and observation of real
people from NEPOMUK use-case partners (see Sect. 13.6.2). These scenarios and
personas served to motivate the developers to think about end-users other than
themselves, as well as giving a high-level but concrete overview of the kind of
things the Semantic Desktop should be able to help the user with.
Fig. 13.1 A personal information model represents the user’s interpretation
We present a short glimpse at the activities of one persona, Mrs. Claudia Stern,
and her Semantic Desktop:
Claudia Stern originates from the town München in Germany and now lives in
Karlsruhe. She has a diploma in information technology and is a project manager
at SAP Research. Currently, Claudia is working on a big EU project called CID.
She utilizes various tools to control and follow up on the activities in the project.
She books meetings with the project members through her calendar. In meetings,
she always uses her laptop to access relevant information and presentations and to
write minutes. She often needs to access her own and others’ calendar and the lists
of tasks for the project members. Through these tools she has a clear picture of what
needs to be done in the project.
On the 27th of February, 2008, Claudia is having a meeting with Klaus,
a colleague at SAP Research, and they are planning a meeting in the project called
CID they are going to attend in Belfast. The meeting is organised by Marco
Andriotti. There are many things that need to be done before they leave for Belfast
in a few weeks.
Claudia goes to her office and adds the meeting to her calendar; furthermore she
creates a task called “Belfast meeting” in her personal task manager. From the
work-trip template, a set of relevant sub-tasks is created, including e.g. request
travel permission and book travel/accommodation. Besides the practical travel issues,
the task list also includes some items connected to the work that needs to be done in
the meeting and Claudia adds some specific tasks to the list, fleshing out the
automatically generated sub-tasks.
When Claudia has got permission to travel, she proceeds to book train, flight and
hotel; the system knows her preferences and only recommends fitting options. The
travel information and electronic documents that result from her successful booking
activities are added to her “Belfast meeting package” which includes everything
related to the meeting, like work and travel documentation. The package is also
accessible online. When the trip gets closer the system checks the weather so that
Claudia can pack the right things. The system adds recommendations of restaurants
and shops according to her profile.
A travel timeline is created automatically, where she sees all the relevant
appointments and dates she needs to observe: from leaving the office in order to
make it to the train in Karlsruhe, to the tram when she returns home from the trip.
Afterwards she gives her colleagues access to her timeline. As she is preparing for
the meeting, Claudia can look up the details of colleagues she met last time she went
to a meeting in Belfast, including notes about their interests as well as about the
restaurants where they had dinner.
When Claudia comes back from her trip she needs to make a travel report in
order to claim her expenses. This is automatically created according to the travel
timeline she created before she left. Claudia sends this report, together with the
collected receipts (still on paper!), to SAP’s HR department and is promptly
refunded for her expenses.
This scenario illustrates the support a NEPOMUK Semantic Desktop can offer
to the knowledge worker:
• Information is connected, combined, and accessed across different applications
and tools.
• A new event, insight, or activity results in descriptors or concepts created by the
user (“Belfast Meeting Package,” “Belfast Meeting”). However, as the Semantic
Desktop supports relating such individual descriptions to pre-given, formal
concepts (“work-trip”), the system can offer various kinds of automatic support.
• Some relations and interconnections (e.g. time dependencies and timeline) are
observed automatically and can be used e.g. for report generation.
• Sharing of selected information items and structures supports the communication
and exchange among collaborating colleagues.
In summary, Claudia benefits from an improved way to structure her personal
information space, to access information relevant to the current work context, to
express her own ideas and insights about key concepts and their relations, from the
exchange with her colleagues, and from various automatic helper services within
her personal computer.
13.2 Standards & NEPOMUK Technology
The Social Semantic Desktop (SSD) represents a platform that serves as a founda-
tion for building different social and semantic applications. Such applications share
a series of common functional aspects that the platform needs to capture and then
expose.
At the desktop level, applications need to be able to create and manage resources
and information about resources (in the form of annotations or relations). This
information needs to be stored and then efficiently retrieved when required. In order
to enable the transfer of information, the applications need to interact with the
Social Semantic Desktop. As a result, the platform exposes several services
representing access points to low-level functionalities. These functionalities range
from resource or notification management, to desktop sharing or off-line access.
However, three of them are essential in providing semantically-enhanced resources.
Firstly, the SSD supports different data analysis mechanisms (e.g., inference) to
enrich the semantics of low-level desktop information elements or resources. For
example, a keyword extraction mechanism enables automatic tagging of resources
or the summarisation of long textual information elements. Secondly, context is
crucial in delivering the right information for a given resource (Schwarz 2010).
Hence, the SSD supports not only user profiling, but also the detection of the current
context, in addition to ways of attaching such context information to resources.
Finally, searching makes direct use of the results of the previous two. Thanks
to the rich semantic network of desktop resources, the user will profit from new
ways of accessing the available information space. Traditional keyword search and
hierarchy browsing are complemented by associative browsing in the concept space,
grouping along relevant relations, and multiple conceptual views on search results.
Search tools will offer support based on the user’s profile or context, and will act
proactively by detecting and predicting a particular user’s search patterns.
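The keyword-extraction idea mentioned above can be illustrated with a minimal sketch. It is not the NEPOMUK Text Analytics implementation: the function name `extract_keywords`, the stop-word list and the frequency-based ranking are all assumptions made for illustration only.

```python
import re
from collections import Counter

# Illustrative stop-word list (an assumption); a real service would use a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "on", "for", "is", "are", "with"}


def extract_keywords(text, max_tags=3):
    """Return the most frequent non-stop-words of `text` as candidate tags."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS and len(w) > 2)
    return [word for word, _ in counts.most_common(max_tags)]
```

Candidate tags produced this way could then be attached to a resource as annotations; a production service would more likely rely on linguistic analysis than on raw word counts.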
Going beyond the desktop level, the SSD provides different means of Social
Interaction, among which the most important one is resource sharing. This enables
innovative and more efficient collaboration means among users by providing
context and semantics-based sharing facilities, in addition to the creation of
ad-hoc shared information spaces centred around particular resources or users.
Such functionalities obviously rely on more low-level ones like access control
management or user group management.
In order to support the above-listed set of aspects, in the context of the
NEPOMUK project, the Social Semantic Desktop relies on two standards:
• A suite of ontologies, that provide a formal way of encoding the semantics of the
resources participating in or describing these aspects, and
• A standard architecture, that enables a standard specification of the underlying
components, independently of the implementation platform.
In the following sections, we describe both the ontologies and the architecture,
as well as detailing two reference implementations.
13.3 Ontologies
In this section we provide an overview of the ontologies engineered for the Social
Semantic Desktop. The most important ontologies are the NEPOMUK Representational
Language (NRL); the Information Element set of ontologies (NIE) – which
define common information elements and legacy data that is to be found on the
desktop in its various forms; and finally the Personal Information Model Ontology
(PIMO) – which combines the knowledge in the other ontologies to express the
individual’s entire unified personal information model.
In order to correctly manage the creation and integration of the numerous
required ontologies, we pursued a serial approach, designing first the higher (gen-
eral, abstract) layers of the ontology stack, and then the more detailed ontologies.
This layered approach is illustrated in the Semantic Desktop Ontologies Stack –
a top-down conceptual representation of the required ontologies and their inter-
dependencies, which also served as the road-map for their gradual design
(Fig. 13.2). In addition to the ontologies detailed below, the diagram also shows
the NEPOMUK Annotation Ontology (NAO) for representing simple tagging, and
the NEPOMUK Graph-Metadata Ontology for annotating named graphs. We
differentiated between the three ontology layers following the classifications used
in (van Heijst et al. 1995) and (Semy et al. 2004), in order of decreasing generality,
abstraction and stability: The Representational Level provides stable and generic
language elements, upper level ontologies contain domain-independent and widely
agreed-upon concepts, whereas lower-level ontologies contain evolving personal
or group-level views.
13.3.1 Representational Level
The ontology on the representational level defines the concepts and relations
available for expressing the domain models building upon it. For example, RDFS
gives you the ability to express a subclass hierarchy of concepts, while OWL allows
richer class expressions using unions, intersections, or complements of existing
concepts. OWL was deemed too complex for defining the ontologies needed on
the Semantic Desktop, especially as they had to be understandable for software
engineers with little Semantic Web background. However, plain RDF Schema lacks
some basic constructs like cardinality constraints. Another issue is that both RDFS
and OWL are intended for the open-world assumption of the Semantic Web,
whereas on the desktop making a closed-world assumption is more natural. We
therefore created a novel representational language – the NEPOMUK Representational
Language (NRL) (Sintek et al. 2007a), as an extension to the Resource
Description Framework (RDF) and the associated RDF Schema (RDFS), that
imposes no specific semantics on data, and supports Named Graphs (Carroll et al.
2005). NRL addresses several limitations of current Semantic Web languages,
especially with respect to modularisation and customisation. Aside from fulfilling
the basic requirements for a representational language for the SSD, NRL is also of
relevance to the general Semantic Web, in particular because of its support for
named graphs, which although being a widely-popular notion, had not been
supported by any standard representational language so far.2
Fig. 13.2 Semantic desktop ontologies stack
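To make the closed-world reading mentioned above more concrete, the following sketch checks a maximum-cardinality constraint against a set of triples that is treated as complete. It is a general illustration and not NRL's actual constraint semantics; the function name and the `ex:` identifiers are hypothetical.

```python
def violates_max_cardinality(triples, subject, predicate, max_count):
    """Closed-world check: the given triples are treated as the complete data."""
    values = {o for (s, p, o) in triples if s == subject and p == predicate}
    return len(values) > max_count


# Hypothetical data: two distinct values for a property assumed to allow one.
data = [
    ("ex:claudia", "ex:birthPlace", "ex:Muenchen"),
    ("ex:claudia", "ex:birthPlace", "ex:Karlsruhe"),
]
print(violates_max_cardinality(data, "ex:claudia", "ex:birthPlace", 1))  # True
```

Under an open-world reading the same data would not necessarily count as an error, because the two values could turn out to denote the same thing; on the desktop, flagging the conflict directly is usually the more helpful behaviour.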
Named graphs allow us to handle identifiable, modularised sets of data. Through
this intermediate layer, handling and exchanging RDF data, as well as keeping track
of provenance information is much more manageable. All data handling on the
semantic desktop including storage, retrieval and exchange, is carried out through
the use of such named-graphs. Alongside provenance data, it is also possible to
attach other useful information to named graphs. In particular, for better data
management we felt the need for named graphs to be distinguishable by
their roles, e.g. an ontology, instance-base, knowledge-base, etc.
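The idea of named graphs carrying a role can be sketched with a small in-memory store. The class `NamedGraphStore`, its methods and the role labels used in the test are assumptions for illustration; they do not reproduce the NRL vocabulary or any NEPOMUK API.

```python
from collections import defaultdict


class NamedGraphStore:
    """Toy store: every triple lives in a named graph, and a graph may have a role."""

    def __init__(self):
        self._graphs = defaultdict(set)  # graph name -> set of (s, p, o) triples
        self._roles = {}                 # graph name -> role label

    def add(self, graph, triple, role=None):
        self._graphs[graph].add(triple)
        if role is not None:
            self._roles[graph] = role

    def triples(self, graph):
        return set(self._graphs.get(graph, set()))

    def graphs_with_role(self, role):
        return sorted(g for g, r in self._roles.items() if r == role)

    def remove_graph(self, graph):
        # Dropping a whole graph removes a modular unit of data in one step.
        self._graphs.pop(graph, None)
        self._roles.pop(graph, None)
```

Because provenance and role are attached to the graph rather than to single statements, a whole ontology or instance base can be exchanged or removed as a unit.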
Although the naming of the NRL formalism might suggest otherwise, it is
a completely domain-independent representational ontology (or language), and
can be applied to other platforms and scenarios. The interested reader might want
to refer to the complete NRL guide and specifications (Sintek et al. 2007b) to learn
more about NRL.
13.3.2 Upper-Level Ontologies
This layer includes high-level, domain-independent ontologies. They provide
a framework by which disparate systems may utilise a common knowledge base
and from which more domain-specific ontologies may be derived. They are charac-
terised by their representation of common sense concepts, i.e., those that are basic
for human understanding of the world. Concepts expressed in upper-level onto-
logies are intended to be basic and universal to ensure generality and expressivity
for a wide area of domains.
The NEPOMUK ontologies shall provide users with the concepts which allow
them to capture and represent their mental models, their resources and their
activities in a set of well-organised knowledge models, as such formal models
are the prerequisite for enabling semantic technologies on the desktop. We set
out to develop a number of upper-level ontologies that fulfil this requirement. The
modularisation of these ontologies in itself was a challenge, keeping the layered
approach described earlier in mind. Even though they are all upper-level ontologies,
2 However, there are non-standard representational languages with support for named graphs,
such as Notation3 or TriG.
dependency relationships exist between them. The design of the upper-level onto-
logies sought to:
• Represent common desktop entities (objects, people, etc.)
• Represent trivial relationships between these entities, as perceived by the desk-
top user
• Represent instances of a user’s mental model, consisting of entities and their
relationships as described above, on their desktop
Whereas the representation of high-level concepts like ‘user’, ‘contact’, ‘desk-
top’, ‘file’ is fairly straightforward, we also need to leverage existing information
sources in order to make them accessible to semantic applications on the SSD.
This information is contained within various structures maintained by the operating
system and a multitude of existing legacy applications. These structures include
specific kinds of entities like messages, documents, pictures, calendar entries and
contacts in address books. In (Sauermann et al. 2006) van Elst coins the term native
structures to describe them, and native resources to describe the pieces of information
they contain. One of the core challenges was to integrate existing legacy
Desktop data into the Social Semantic Desktop, since in order to operate, it requires
meta-data represented in RDF/NRL.
13.3.2.1 NEPOMUK Information Element Ontologies (NIE)
A user’s desktop contains a multitude of applications serving numerous purposes, e.g.
word processing, calendar management, mail user agents, etc. They allow users to
create, store and process information in various ways in order to accomplish a wide
array of tasks. One of the major goals of the semantic desktop is to allow the user to
organise this information (the native resources) and enrich them with annotations,
connect related entities and map them to concepts in the PIMO. In order to implement
this kind of functionality native resources need to be expressed in a way that allows
for this kind of post-processing, i.e., as RDF graphs. This is the motivation for the
NIE set of ontologies, for which we now provide a brief overview.
Whereas the goal of PIMO, discussed below, is to be as close to the human
cognitive processes as possible, in order to represent objects as they are seen by the
user, NIE provides vocabulary intended to minimise information loss in the elici-
tation and representation of data from desktop data sources into RDF graphs.
However, this data lies outside the control of the semantic desktop system. It has
a dynamic nature, whereby new items appear, are modified and disappear in the
course of work. NIE is concerned with the synchronisation between the native data
sources and a data repository that doesn’t contain abstract concepts, and is designed
to describe concepts like “File”, “Email”, or “Contact”. PIMO can then be used to
add value to this data by expressing tags, people, places, projects and anything else
the user might be interested in.
NIE is composed of seven different, but unified, vocabularies. The NIE-core
vocabulary defines the most generic concepts, while six specialised ontologies
extend NIE towards specific domains: NEPOMUK Contact Ontology (NCO) for
contact information, NEPOMUK Message Ontology (NMO) for messaging information,
NEPOMUK File Ontology (NFO) for basic file meta-data and expressing file
system structures in RDF, NEXIF for image meta-data, NID3 for audio meta-data
and NCAL for calendaring information. NIE is a unified ontology framework
as there is no overlap between the vocabularies. Classes in all vocabularies are
organised in an explicit inheritance hierarchy when appropriate. The interested
reader is advised to consult the official NIE specifications (Mylka et al. 2007) for
more detailed descriptions about the NIE unified ontology framework.
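As a rough illustration of how a native resource might be expressed with NIE-style vocabulary, the sketch below turns basic file meta-data into triples. The prefixed names (`nfo:FileDataObject`, `nfo:fileName`, `nfo:fileSize`, `nie:mimeType`) are used here on the assumption that they match the published vocabularies; the function `file_to_triples` itself is hypothetical and not part of NEPOMUK.

```python
import mimetypes


def file_to_triples(uri, file_name, size_bytes):
    """Describe one file as a list of (subject, predicate, object) triples."""
    triples = [
        (uri, "rdf:type", "nfo:FileDataObject"),
        (uri, "nfo:fileName", file_name),
        (uri, "nfo:fileSize", size_bytes),
    ]
    # The MIME type is guessed from the file extension and may be unknown.
    mime, _ = mimetypes.guess_type(file_name)
    if mime is not None:
        triples.append((uri, "nie:mimeType", mime))
    return triples
```

A data wrapper would emit such triples whenever a file appears or changes, so that the repository stays in step with the native data source.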
13.3.2.2 Personal Information Model Ontology (PIMO)
Whereas the previous ontologies focused on specific aspects or domains of the SSD,
the PIMO provides representation for a unified information model which reflects
the users’ Personal Information Model as perceived by them. In other words, it is
a mental conceptualisation of how users organise the domain of their own work: the
projects they work on, the people they know, the products they use, the cities they
travel to, the organisations that employ them, etc. The goal is for the knowledge
structure to be independent from both the way the user accesses the data, as well as
the source, format, and author of the data. What this means is that the PIMO is
concerned with real-world things, rather than the technological details of how those
things are implemented in the software.
While each user has their own PIMO and can modify and personalise it accord-
ing to their own personal needs (i.e., add classes, properties and instances), the
standard PIMO (PIMO-Upper) already contains around 40 classes that represent
what we consider to be the lowest common denominator of typical personal
information models.3 PIMO defines a top-level class pimo:Thing, from which
most other classes in PIMO-Upper (as well as all user-defined classes) are derived,
with examples such as pimo:Person, pimo:City or pimo:PhysicalObject (some
classes in PIMO-Upper are not pimo:Things, such as pimo:Association or pimo:TimeConcept).
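The relationship between PIMO-Upper classes and user-defined classes can be sketched as a simple subclass table. This is a simplification under stated assumptions: the direct subclass links shown here are illustrative (the real hierarchy may contain intermediate classes), and `claudia:ProjectPartner` is a hypothetical user-defined class.

```python
# Child class -> parent class. Only a small, simplified excerpt.
SUBCLASS_OF = {
    "pimo:Person": "pimo:Thing",
    "pimo:City": "pimo:Thing",
    "pimo:PhysicalObject": "pimo:Thing",
    "claudia:ProjectPartner": "pimo:Person",  # hypothetical user extension
}


def is_a(cls, ancestor):
    """Return True if `cls` equals `ancestor` or is derived from it."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False
```

With this table, `is_a("claudia:ProjectPartner", "pimo:Thing")` holds, while a class such as `pimo:Association`, which is not listed as a `pimo:Thing`, does not.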
In order to prevent what is described as the cold start problem in knowledge-
based systems, we also assume that organisations and other employers will ideally
provide their employees with an extension of PIMO-Upper that reflects the work
context of that particular organisation. Within the ontologies stack (Fig. 13.2), such
extensions are called group-level PIMOs.
The reader can refer to the full specification (Sauermann et al. 2009) for further
information about PIMO.
3 It should be noted that PIMO does not try to model every possible world view and achieve
ontological perfection, but instead proposes a simple classification that we consider useful in the
context of knowledge work.
13.3.3 Lower Level Ontologies
This layer consists of group and personal ontologies. Group-level ontologies (e.g. an
organisational ontology) are domain-specific and provide more concrete represen-
tations of abstract concepts found in the upper ontologies. They serve as a bridge
between abstract concepts defined in the upper level ontologies and concepts
specified in personal ontologies at the individual level. Personal ontologies arbi-
trarily extend (personalise) group-level ontologies to accommodate requirements
specific to an individual or a small group of individuals. Using common group-level
and upper ontologies is intended to ease the process of integrating or mapping
personal ontologies. Given the nature of the Semantic Desktop, a large number
of ontologies are either related to, or meant to be used for personal knowledge
management. A group of individuals or an individual user is free to create new
concepts or modify existing ones for their collective (shared) or individual personal
information models. This personal-user aspect of the ontologies is highlighted
accordingly in the stack, and conceptually it includes all concepts and relationships
that the end desktop user deals with, as opposed to all concepts and relations required
to model every aspect of the semantic desktop.
13.4 Architecture
As noted in the beginning of Sect. 13.2, the Social Semantic Desktop platform
exposes a series of low-level services required for building applications. This
structuring is similar to publishing services on the Web, where each service
exposes an interface used for communication and integration. Hence, our decision
to adopt the same principles and techniques for describing and deploying the
SSD architecture seems natural. The SSD is organised as a Service Oriented
Architecture (SOA), where:
• The SSD service interfaces are defined using the Web Service Description
Language (WSDL),
• XML Schema (XSD) is used for defining primitive types, and
• Simple Object Access Protocol (SOAP) is used for the actual inter-service
communication.
The vision of the NEPOMUK project is to provide a standard architecture
comprising a small set of services (represented by their interfaces), which enable
developers to adopt it and extend it. Ultimately, this will lead to an evolving
ecosystem. Figure 13.3 depicts the structure and the set of services defined by the
NEPOMUK architecture. It is straightforward to observe the two categories of
aspects that were targeted (emerged from the platform description): at the desktop
level, the semantics (via Text Analytics or Context Elicitation), and between
desktops, the social (enabled by the peer-to-peer (P2P) infrastructure). Both
categories are accessible via the Service Registry, which acts as an access point to
all the low-level functionalities provided by the SSD platform. The NEPOMUK
architecture is composed of two layers: the Middleware and the Application Layer.
The Middleware groups together the services exposed by the platform and to be
used by the applications present in the Application Layer. The Application Layer
consists of all applications that interact directly with services published by the
Middleware, and creates a bridge between the user and the low-level functionalities
provided by the Social Semantic Desktop. In the following we describe both layers,
starting with the NEPOMUK Middleware.
The Middleware is split into multiple categories of services, managed by the
Service Registry. This represents the access point to the low-level functionalities,
both for services using other services and for applications using the Middleware
services. Its duties relate strictly to registering and de-registering services,
in addition to providing service discovery facilities based on their interfaces.
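The three duties of the Service Registry described above (registering, de-registering and interface-based discovery) can be sketched as follows. The class, the method names and the endpoint strings are assumptions for illustration; the NEPOMUK registry is specified via WSDL interfaces rather than a Python API.

```python
class ServiceRegistry:
    """Toy registry: services are looked up by the interface they implement."""

    def __init__(self):
        self._services = {}  # service name -> (interface name, endpoint)

    def register(self, name, interface, endpoint):
        self._services[name] = (interface, endpoint)

    def deregister(self, name):
        self._services.pop(name, None)

    def discover(self, interface):
        """Return the endpoints of all services offering `interface`."""
        return sorted(
            endpoint
            for (iface, endpoint) in self._services.values()
            if iface == interface
        )
```

Keeping the registry limited to these operations means that both applications and other services depend only on interface names, not on concrete implementations.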
Two sets of services, i.e., the Local Data Services and the P2P Infrastructure,
represent the foundational block of the SSD Middleware. In order to support
social aspects, and implicitly communication between several desktops, the SSD
Middleware comprises a P2P Infrastructure. The NEPOMUK P2P Infrastructure
is based on GridVine (Aberer et al. 2004), which in turn is built on top of P-Grid
(Aberer et al. 2003). This manages a distributed RDF store and provides distributed
search facilities, hence enabling higher-level functionalities such as meta-data or
resources sharing. Additionally, the infrastructure also provides an Event Management
service, which is responsible for the distribution of events between SSD peers,
in addition to supporting the higher-level Publish/Subscribe service. The Event
Management manages subscriptions received from users (via some applications) or
from services, in form of RDF descriptions of the resources of interest, which are
stored in the underlying distributed store. Hence, when an event occurs, its RDF
payload is matched against all subscriptions and the subscribers are notified.
Fig. 13.3 The architecture of the social semantic desktop
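The matching of an event payload against stored subscriptions can be sketched in a few lines. This is a simplified, local stand-in: the payload and the subscription pattern are plain property-value dictionaries rather than RDF graphs, nothing is distributed, and the class `EventManager` and its methods are hypothetical.

```python
class EventManager:
    """Toy event manager: notifies subscribers whose pattern matches an event."""

    def __init__(self):
        self._subscriptions = []  # list of (subscriber, pattern) pairs

    def subscribe(self, subscriber, pattern):
        # `pattern` maps properties to the values the subscriber is interested in.
        self._subscriptions.append((subscriber, dict(pattern)))

    def publish(self, payload):
        """Return the subscribers whose whole pattern is contained in `payload`."""
        notified = []
        for subscriber, pattern in self._subscriptions:
            if all(payload.get(prop) == value for prop, value in pattern.items()):
                notified.append(subscriber)
        return notified
```

A subscriber interested in all task updates for one project would register a pattern with two properties and be notified only when both match.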
The Local Data Services have a similar role to the P2P Infrastructure, but on the
local side of things, i.e., on the desktop. The group consists of three foundational
services: the Local Storage (an RDF Repository), Search and Mapping. The Local
Storage controls the manipulation of the desktop semantic resources, including
their insertion, modification or deletion as RDF graphs. If a resource is shared with
other users in an information space, the meta-data is also uploaded to the distributed
index of the peer-to-peer infrastructure. Querying the Local Storage is done via
the Local Search service, which also maintains a context-based search history, as
well as user-profiled query templates that can also be shared as resources. Finally,
before new meta-data can be added to the repository, one needs to check whether
this meta-data describes resources that are already instantiated (i.e., a URI has
been assigned) in the RDF repository. If so, instead of duplicating
resources, the already existing ones should be used, by re-using their URIs. The
Local Mapping Service handles this responsibility. The Core of the Middleware
contains a last category of Services that complement the functionalities provided by
the foundational ones, called Helper Services. Among these, the Data Wrapper or
the Text Analytics services extract information from desktop applications such as
email clients or calendars and store it as resources in the Local Storage. Generally,
the Data Wrapper handles structured data sources (e.g., email headers, calendar
entries, etc.), while the Text Analytics handles unstructured data sources, like email
bodies or generic document contents.
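The responsibility of the Local Mapping Service, re-using an existing URI instead of creating a duplicate resource, can be sketched as a lookup keyed on an identifying value. How NEPOMUK actually decides that two descriptions denote the same resource is not stated here, so the identifying key (for instance an e-mail address or a file path) and the class `LocalMapping` are assumptions.

```python
import uuid


class LocalMapping:
    """Toy mapping service: one URI per identifying key, re-used on later requests."""

    def __init__(self):
        self._uri_by_key = {}

    def uri_for(self, identifying_key):
        # Mint a new URI only if this key has not been seen before.
        if identifying_key not in self._uri_by_key:
            self._uri_by_key[identifying_key] = "urn:uuid:" + str(uuid.uuid4())
        return self._uri_by_key[identifying_key]
```

A data wrapper would call `uri_for` before storing new meta-data, so that the same contact extracted from two e-mails ends up as a single resource.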
Other Helper Services include:
• The Alignment service that is used by the other Middleware services or
applications to transform RDF graphs from a source ontology to a target
ontology (a facility required due to the different partly overlapping ontologies
or vocabularies possibly being used on the desktop),
• The Context Elicitation service that acquires the current user context environment
by analysing the logs created by the Middleware and stored in the Local Storage,
• The Publish / Subscribe service that enables subscription-based events (for
services or users) on a local or distributed (via the P2P Event Management) basis,
• The Messaging Service providing synchronous and asynchronous communi-
cation mechanisms between SSD users, or
• The PIMO Service that represents an abstraction for an easier manipulation of
RDF graphs.
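The Alignment service in the list above transforms RDF graphs from a source ontology to a target ontology. A minimal sketch of the simplest case, renaming predicates according to a mapping table, is given below. The mapping shown is purely illustrative and hypothetical; real alignments usually need more than one-to-one predicate renaming.

```python
# Hypothetical one-to-one predicate mapping from a source to a target vocabulary.
PREDICATE_MAP = {"foaf:name": "nco:fullname"}


def align(triples, predicate_map):
    """Rewrite predicates found in `predicate_map`; leave all other triples as they are."""
    return [(s, predicate_map.get(p, p), o) for (s, p, o) in triples]
```

Triples whose predicate has no entry in the table pass through unchanged, which keeps the transformation safe to apply to mixed-vocabulary graphs.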
The second group of the Middleware services consists of Extensions. These
are services created by third-party developers that use the Core services to
provide certain functionalities for the Application layer. For example, a Task
Management service could use the Context Elicitation and Messaging services to
notify a group of users about an upcoming project deadline. While the actual
business logic would be encapsulated within this service, the interaction with the
user could be realized either via existing applications such as an Email Client, or via
specific applications, like a Task Management application (as seen in the Applica-
tion layer and discussed below).
The top layer of the architecture is the Application layer. This layer includes the
programs and tools employed by the user. This ranges from typical knowledge
workbench applications of rather generic nature (e.g., File Browser, Web Browser,
and Office Applications) to specialized and domain-specific programs. Both legacy
tools and new developments need to be integrated with the Social Semantic
Desktop Middleware in order to profit from its functionalities. For available legacy
applications and third-party programs, this interfacing (or integration) is usually
done by developing plug-ins that comply with the data structures and user interfaces
used by the respective applications. As an example, in NEPOMUK, we developed
plug-ins for the Email Clients Thunderbird and Microsoft Outlook. These extensions
make it possible, e.g., to automatically enrich e-mails (“Semantic Email”) with formal
descriptions of the concepts concerned (thus offering sender and recipient of such
e-mail an unambiguous reference to shared entities). Furthermore, the enhanced
clients provide simple workflow functionalities, like delegation and tracking
of tasks (Task Management) or automated calendar functionalities. Finally, Seman-
tic Search capabilities are a promising feature, as targeted e.g. in the KDE Dolphin
file browser.
13.5 Implementation
In NEPOMUK the above architecture was implemented in the Personal Semantic
Workbench (PSEW). PSEW is the central user-interface to the NEPOMUK Semantic
Desktop; it allows configuring data-sources and connections to other programs
as well as editing and browsing the personal ontology. PSEW was implemented in
Java as an Eclipse-RCP application – an application framework built on top of the
Eclipse code-base, providing user-interface controls as well as the high-level
user-interface organisational concepts of perspectives, views and multi-document
editors. A screenshot showcasing some of the main views of PSEW is shown in
Fig. 13.4. Various versions of PSEW can be downloaded from
http://dev.nepomuk.semanticdesktop.org/download/.
13.5.1 KDE
In addition to the Java based research prototype PSEW, the NEPOMUK
technologies have also been embedded in the KDE Desktop Environment
(http://kde.org/), which provides fundamental desktop functionalities like task- and
file-management, and common core applications like web-browsers, internet chat, basic
document editing etc. Traditionally, KDE has been used with the Linux operating
system, but the newest version 4 is also available for Mac or Windows. In version 4
of the KDE system, NEPOMUK also plays a central role (see
http://nepomuk.kde.org/). It is now available to millions of KDE users world-wide
and has as such become the most enduring concrete outcome of the NEPOMUK project.
In KDE 4 NEPOMUK, an RDF store (based on Virtuoso, see
http://virtuoso.openlinksw.com/) is available to all applications, and tagging, rating and
commenting on files is available throughout the operating system. For example, one can
replace the normal save/load file dialog with a Semantic view as shown in Fig. 13.5,
where rather than saving a file into a particular folder, a file can be saved with
particular tags, removing the burden of maintaining a strict hierarchical folder
structure from the user.
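The tag-based alternative to folder hierarchies can be illustrated with a small index that stores files under tags and retrieves them by tag combination. This is a sketch of the idea only and says nothing about how the KDE dialog is implemented; the class `TagIndex` and the file names in the test are hypothetical.

```python
from collections import defaultdict


class TagIndex:
    """Toy index: files are saved with tags and found by any combination of tags."""

    def __init__(self):
        self._files_by_tag = defaultdict(set)

    def save(self, file_name, tags):
        for tag in tags:
            self._files_by_tag[tag].add(file_name)

    def find(self, *tags):
        """Return the files that carry all of the given tags."""
        if not tags:
            return set()
        candidate_sets = [self._files_by_tag.get(tag, set()) for tag in tags]
        return set.intersection(*candidate_sets)
```

Unlike a folder path, the order of the tags does not matter, and the same file can be reached through several independent tag combinations.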
KDE also offers faceted browsing for desktop data, file indexing, full-text and
structured desktop search and many other features.
13.6 Experiences & Applications
The ultimate goal of a Semantic Desktop is to increase the efficiency of information
handling for its users, based on additional modelling efforts. Careful evaluation
must therefore demonstrate whether the perceived benefits ultimately outweigh the
perceived additional costs.
Fig. 13.4 The personal semantic workbench – showing the PIMO class hierarchy, the PIMO
editor editing the Claudia Stern concept, a timeline and a map-view
From the scientific perspective, evaluations observe users when they store,
retrieve, and process information. To gain relevant insight into real-world
scenarios, it is beneficial to conduct evaluations with real users working with
their real data. Measurement in such experiments is done mostly by observation
or by interviewing users in long-term case studies. Short-term laboratory experiments
can also give insights into the immediate benefit of a certain feature, or be used
to measure usability indicators.
13.6.1 Evaluating the Core NEPOMUK Application
The core NEPOMUK application, the Personal Semantic Workbench (PSEW),
was also evaluated independently of any of the case studies. Several long-term
studies with up to eight participants were carried out. Additionally, qualitative
interviews with 22 members about possible use cases of the system were
carried out (Sauermann 2009). These studies conclude that relations
are the key feature for retrieving information. Long-term users develop a combined
pattern of using text search, relations, and semantic wiki features in order to find
information. The system supports an intuitive navigation behaviour of taking
small steps, a phenomenon previously observed by other researchers and described
as “orienteering” (Teevan et al. 2004).
Fig. 13.5 Semantic Save/load dialog in NEPOMUK-KDE – instead of specifying a folder for
saving, the user only associates a number of tags with the document
In an earlier evaluation of NEPOMUK (Papailiou et al. 2008), problems were
found when the terms in the user interface were too technical, whereas
users prefer terms from their daily work processes. Also, a unification of various
interfaces was desired, which in turn overlaps with Quan’s findings in his thesis
(Quan 2003).
Unfortunately, the NEPOMUK-KDE Semantic Desktop implementation, which has
the largest user group, has seen the least formal evaluation so far.
Informal reactions collected from individual users range from praise for the unified
tagging and the powerful search now available to criticism of performance problems,
which have now become one focus of further community development.
13.6.2 NEPOMUK Case-Studies
In addition to the formal evaluation, NEPOMUK has proved its worth in practical
applications. The prototypes developed in the NEPOMUK project were customised
and deployed with four case-study partners:
• SAP Research, Germany, developed the Kasimir task management prototype
and additional productivity tools for Organisational Knowledge Management.
The Kasimir prototype was evaluated at the SAP Research lab in Karlsruhe in
a test phase of 4 weeks. The experience of the test users was compiled in
a survey and analysed on the basis of an activity-theoretic evaluation approach.
The results show that, although the prototype does not yet provide full
functionality, it was nevertheless perceived as a useful and promising work tool.
• Cognium Systems, France, developed a prototype called BioNote, a software
system whose goal is to help biomedical researchers at the Institut Pasteur manage
the information they collect or use during their work. The implemented prototype
was assessed through iterative expert and user evaluations. The test user comments
were generally encouraging: people understood and approved the general concept
and added value of the prototype and they were able to easily carry out a simple
scenario using the available user interfaces. Commercial products resulting from
this work can be seen at: http://www.cogniumsystems.com/.
• EDGE-IT, France, developed a prototype help-desk application for the members
of the Mandriva Linux community. It features collaborative semantic annotation
at the desktop or at the Web levels using a dedicated vocabulary, and semantic
search across a set of peers. This allows interlinking a personal Semantic Desktop
with a public Semantic Web maintained by a large community, improving the
learning and problem-solving processes. The help-desk application was evaluated
by Mandriva Linux users through an online questionnaire after experimenting
with the system through evaluation scenarios. User answers to the questionnaire
show a real interest in the approach: users consider that the help-desk improves the
efficiency of the community in collectively answering technical
questions.
• TMI, a Greek consulting company, developed a collection of light-weight add-ons
to the basic PSEW application, especially for providing social and semantic
functionalities customised for knowledge workers in the sector of Professional
Business Services. The tools are collectively known as Sponge, and take the form
of widgets that can be placed on the desktop, providing quick and easy access to
common tasks such as search and annotation. The prototypes were evaluated both
through unobtrusive observation using the “think aloud” protocol and
through a more traditional questionnaire. Eleven employees in realistic work conditions
at different offices took part in the evaluation, and the results confirm that the
prototype was simple, intuitive and easy to use, and that it delivers most of the
benefits users expected.
13.7 Outlook
The NEPOMUK project ended in 2008, but the activities around the Semantic
Desktop continue. The KDE implementation is still being actively maintained and
enhanced, and the Open Semantic Collaboration Architecture Foundation (OSCAF)
(http://www.oscaf.org/) was created to maintain the ontologies and other standards.
Several of the components of the NEPOMUK implementation, such as the metadata
extraction framework Aperture (http://aperture.sf.net), are being used by many people
and development continues. The research directions started in NEPOMUK go on in
other research projects, such as Perspecting (http://www.dfki.uni-kl.de/perspecting/).
In addition to the open-source and standardisation work, several spin-off companies
were created to commercialise semantic desktop technologies. For instance,
Gnowsis (http://gnowsis.com) recently launched their product Refinder
(http://www.getrefinder.com/), a web-based productivity tool centred around the PIMO ideas.
Acknowledgements This work was supported by the European Union IST fund (Grant
FP6-027705, Project NEPOMUK, http://nepomuk.semanticdesktop.org/) and the German BMBF in
Project Perspecting (Grant 01IW08002).
References
Aberer K, Cudre-Mauroux P, Datta A, Despotovic Z, Hauswirth M, Punceva M, Schmidt R (2003)
P-Grid: a self-organizing structured P2P system. SIGMOD Record 32(3):29–33
Aberer K, Cudre-Mauroux P, Hauswirth M, Pelt TV (2004) Gridvine: building internet-scale
semantic overlay networks. 3rd International Semantic Web Conference ISWC 2004, Springer-Verlag,
pp 107–121. http://lsirpeople.epfl.ch/aberer/PAPERS/ISWC2004.pdf, Accessed on
9 Aug 2011
Carroll JJ, Bizer C, Hayes P, Stickler P (2005) Named graphs, provenance and trust. WWW ’05:
Proceedings of the 14th international conference on World Wide Web, ACM Press, New York,
NY, USA, pp 613–622
Gudjonsdottir R (2010) Personas and Scenarios in Use. Doctoral Dissertation, Department of
Human–Computer Interaction, Royal Institute of Technology, KTH, Stockholm, Sweden.
Mylka A, Sauermann L, Sintek M, van Elst L (2007) Nepomuk information element ontology,
http://www.semanticdesktop.org/ontologies/2007/01/19/nie/, Accessed on 9 Aug 2011
Papailiou N, Christidis C, Apostolou D, Mentzas G, Gudjonsdottir R (2008) Personal and group
knowledge management with the social semantic desktop. In Cunningham P, Cunningham M
(eds) Collaboration and the knowledge economy: issues, applications and case studies,
echallenges e-2008 conference, 22–24 October 2008. Stockholm, Sweden, pp 1475–1482
Quan D (2003) Designing end user information environments built on semistructured data models,
PhD thesis, Massachusetts Institute of Technology, Department of Electrical Engineering
and Computer Science
Sauermann L (2009) The gnowsis semantic desktop approach to personal information management,
PhD thesis, University of Kaiserslautern.
http://www.dfki.uni-kl.de/~sauermann/papers/Sauermann2009phd.pdf, Accessed on 9 Aug 2011
Sauermann L, Dengel A, van Elst L, Lauer A, Maus H, Schwarz S (2006) Personalization in the
EPOS project. Proceedings of the semantic web personalization workshop at the ESWC 2006
conference, pp. 42–52. http://www.dfki.uni-kl.de/~sauermann/papers/Sauermann+2006a.pdf,
Accessed on 9 Aug 2011
Sauermann L, van Elst L, Möller K (2009) Personal information model (PIMO), v1.1, Recommendation,
NEPOMUK. http://www.semanticdesktop.org/ontologies/2007/11/01/pimo/v1.1/pimo_v1.1.pdf,
Accessed on 9 Aug 2011
Schwarz S (2010) Context-awareness and context-sensitive interfaces for knowledge work,
Dissertation, Technische Universität Kaiserslautern, Fachbereich Informatik.
http://www.dr.hut-verlag.de/978-3-86853-388-0.html, Accessed on 9 Aug 2011
Semy SK, Pulvermacher MK, Obrst LJ (2004) Towards the use of an upper ontology for U.S.
government and military domains: An evaluation, Technical report, MITRE.
http://www.mitre.org/work/tech_papers/tech_papers_04/04_0603/index.html, Accessed on 9 Aug 2011
Sintek M, Elst L, Scerri S, Handschuh S (2007a) Distributed knowledge representation on the
social semantic desktop: Named graphs, views and roles in nrl, ESWC’07: Proceedings of
the 4th European conference on The Semantic Web, Springer-Verlag, Berlin, Heidelberg,
pp. 594–608
Sintek M, Elst LV, Scerri S, Handschuh S (2007b) Nepomuk representational language
specification. Nepomuk specification. http://www.semanticdesktop.org/ontologies/nrl/, Accessed on
9 Aug 2011
Teevan J, Alvarado C, Ackerman MS, Karger DR (2004) The perfect search engine is not enough:
a study of orienteering behavior in directed search. CHI ’04: Proceedings of the SIGCHI
conference on Human factors in computing systems. ACM, New York, NY, USA, pp 415–422.
http://portal.acm.org/citation.cfm?id=985745, Accessed on 9 Aug 2011
van Heijst G, Falasconi S, Abu-Hanna A, Schreiber G, Stefanelli M (1995) A case study in
ontology library construction. Artificial Intelligence in Medicine 7(3):227–255
14
Context-Aware Recommendation for
Work-Integrated Learning
Stefanie N. Lindstaedt, Barbara Kump, and Andreas Rath
14.1 Introduction
In order to improve knowledge work productivity we need to understand the factors
influencing the use of knowledge within organizations. A host of management
science research highlights the importance of ‘soft factors’ which influence the
individual knowledge worker to apply her abilities, invest efforts, face frustrating
experiences and to perform for the sake of the organization. Wall et al. (1992)
extend the fundamental equation of organizational psychology to include the factor
opportunity: Performance = Ability × Motivation × Opportunity.
That is, the performance of a knowledge worker is enhanced by organizational
practices that increase the individual’s knowledge (ability), the individual’s motivation
to use this knowledge, or the individual’s opportunity to do so in the workplace.
This equation highlights the assumed non-compensatory relationship between the
factors. If an individual lacks, for example, motivation (motivation = 0), then
performance would still be zero, however high the other factors are.
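The non-compensatory character of the equation can be checked with a few lines of Python. This is only an illustrative sketch: the function name and the 0 to 1 scale for the three factors are assumptions made for the example and are not part of the cited model.

```python
def performance(ability, motivation, opportunity):
    """Multiplicative model: Performance = Ability x Motivation x Opportunity."""
    return ability * motivation * opportunity


# Illustrative values on an assumed 0..1 scale (not taken from the chapter).
balanced = performance(0.8, 0.5, 0.5)

# Maximal ability and opportunity cannot compensate for missing motivation.
unmotivated = performance(1.0, 0.0, 1.0)
```

Because the factors are multiplied rather than added, a single factor at zero drives the whole product to zero, whereas an additive model would let the other two factors make up for it.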
S.N. Lindstaedt (*)
Know-Center, Graz University of Technology, Inffeldgasse 21a, Graz A-8010, Austria
Knowledge Management Institute, Graz University of Technology, Inffeldgasse 21a, Graz,
A-8010, Austria
e-mail: [email protected]
B. Kump
Knowledge Management Institute, Graz University of Technology, Inffeldgasse 21a, Graz,
A-8010, Austria
e-mail: [email protected]
A. Rath
Know-Center, Graz University of Technology, Inffeldgasse 21a, Graz A-8010, Austria
e-mail: [email protected]
P. Warren et al. (eds.), Context and Semantics for Knowledge Management,
DOI 10.1007/978-3-642-19510-5_14, © Springer-Verlag Berlin Heidelberg 2011
Our work focuses specifically on increasing the factor ability (while also taking the
other two factors into account). That is, we strive to build computational support
(so-called knowledge services) which enables users to continuously improve their
competencies during work. Our approach builds upon the notion of Work-
Integrated Learning (WIL, see Sect. 14.2) which has the potential to improve
knowledge work productivity by building awareness of relevant knowledge
resources, pointing out learning opportunities, and improving task completion
ability. We engaged in a 4-year quest to build a computational environment
which supports WIL. This quest was undertaken within the integrated EU-funded
project APOSDLE¹. Applying a multitude of participatory design methods (briefly
sketched in Sect. 14.3) we identified three key challenges: learning within real work
situations, utilization of real work resources for learning, and learning support
within the user’s everyday computational work environment.
This chapter addresses mainly the first challenge (real time learning). Our
participatory design studies indicate that learners require varying degrees of
learning support in different ‘contexts’. Specifically, it is important to know and
understand what the user is working on or is trying to achieve. In our work we
distinguish between two types of context. On the one hand, we take the work situation
into account by identifying the concrete work task a user is currently executing (e.g.
preparing a project kick-off event); that is, the work task specifies the short-term
context of the user. On the other hand, we strive to infer the competencies a user
possesses (e.g. experienced project manager); that is, the accumulated experiences
from work task executions specify the long-term user context. Our approach to
supporting WIL in real time is to provide a variety of recommendation services which
are adapted to the short-term as well as the long-term user context. Specifically, we
recommend not only documents but also fine-grained parts of documents (so-called
snippets), people, possible learning goals, application opportunities, relationships
between tasks, etc. In doing so, these recommendation services positively influence
productivity.
Section 14.4 introduces three different learning situations in the form of three
WIL scenarios. These scenarios mainly differ in the available time which the user
can spend to engage in a learning situation. Different types of recommendations
(e.g. content versus learning opportunities) are helpful in these different situations
but all of them need to be adapted to the current work task of the user and her
competencies in order to improve productivity effectively. Our goal is the design of
an environment which can provide learning support of varying degrees of learning
guidance (see Sect. 14.3.3) independently from the application domain and which
utilizes knowledge resources from within an organization for learning – thus
keeping costs and efforts for learning material generation low.
Section 14.5 offers a conceptual view of the APOSDLE environment which
represents our ontology-based approach to designing domain-independent learning
support. Within this chapter we focus specifically on three knowledge services
which implement our approach to context-aware recommendation for improving
knowledge work productivity, namely task detection service (Sect. 14.5.2), user
competence inference service (Sect. 14.5.3), and content recommendation service
(Sect. 14.5.4).
¹ Advanced Process-Oriented Self-Directed Learning, www.aposdle.org.
In other words, we present a design environment (Eisenberg and Fischer 1994)
which enables the creation of environments for WIL support specifically tailored to
the unique needs of a company and concrete learning domain. We see the
“APOSDLE solution” as consisting of (a) modelling the learning domain and the
work processes, (b) annotating documents and other sources of information available
in the company repository, (c) training the prospective users of the APOSDLE
environment, and (d) using the APOSDLE environment with its varying learning
guidance functionalities at the workplace.
We have evaluated our approach to WIL by embedding the APOSDLE environment
for 3 months into three application organizations, not only technically but
also socially by building the relevant processes around it. Section 14.6 briefly
presents the results of our summative workplace evaluation and discusses the modeling
efforts needed during APOSDLE instantiation in a specific application domain.
14.2 Work-Integrated Learning
Building on theories of workplace learning (such as Lave and Wenger 1991 and
Eraut and Hirsh 2007) we conceptualize learning as a dimension of knowledge
work which varies in focus (from focus on work performance to focus on learning
performance), time available for learning, and the extent of learning guidance
required. This learning dimension of knowledge work describes a continuum of
learning practices which starts at one side with brief questions and task related
informal learning (work processes with learning as a by-product), and extends at the
other side to more formal learning processes (learning processes at or near the
workplace). This continuum emphasizes that support for learning must enable a
knowledge worker to seamlessly switch from one learning practice to another as
time and other context factors permit or demand.
Research on supporting workplace learning and lifelong learning so far has
focused predominantly on the formal side of this spectrum, specifically on course
design applicable for the workplace and combinations of face-to-face and online
learning elements (blended-learning). In contrast, the focus of our work is on the
informal side of the spectrum, specifically covering work processes with learning as
a by-product and learning activities located within work processes. We have coined
the term work-integrated learning (WIL) to refer to this type of informal
learning practice at the workplace that is truly integrated in current work processes
and activities. WIL is relatively brief and unstructured (in terms of learning
objectives, learning time, or learning support). The main aim of WIL activities is to
enhance task performance. From the learner’s perspective WIL can be intentional
or unintentional, and the learner can be aware of the learning experience or not
(Schugurensky 2010).
WIL makes use of existing resources – knowledge artifacts (e.g., reports, project
results) as well as humans (e.g., peers, communities). Learning in this case is a
by-product of the time spent at the workplace. This conceptualization enables a shift
from the training perspective of the organization to the learning perspective of the
individual.
14.3 Distributed Participatory Design
This section gives an overview of the activities and their settings leading to the
development of the APOSDLE environment. The development process involved
three prototyping iterations lasting 1 year each and involving in-depth requirements
elicitation, conceptual design, implementation, and evaluation phases.
14.3.1 Application Domains and Settings
The APOSDLE environment was designed in close cooperation with, and with the
involvement of, five different knowledge-intensive application domains. Three application domains
were chosen in collaboration with different enterprises participating in the
project: simulation of effects of electromagnetism on aircraft (EADS – European
Aeronautic Defense and Space Company, Paris, France), innovation management
(ISN – Innovation Service Network, Graz, Austria), and intellectual property rights
consulting (CCI Darmstadt – Chamber of Commerce and Industry, Darmstadt,
Germany). Two additional domains refer to general methodologies that can be used
in different settings and disciplines: the RESCUE process for the thorough elicitation
and specification of consistent requirements for socio-technical systems, and the
domain of statistical data analysis (SDA).
14.3.2 Design and Development Process
The design process employed is an example of distributed participatory design,
which has been carried out in the context of real-world project constraints on time
and cost. The process consisted of instances of synchronous and asynchronous,
distributed and non-distributed design activities, and also integrated activities
designed to stimulate creative inputs to requirements. For example, the activities
included workplace learning studies (Kooken et al. 2007) and surveys (Kooken
et al. 2008), iterative use case writing and creativity workshops (Jones and
Lindstaedt 2008) resulting in 22 use cases and more than 1,000 requirements. In
order to improve the system throughout the development process, formative
evaluations of the second prototype (Lichtner et al. 2009) were carried out that
triggered a re-design of APOSDLE using the personas approach (Dotan et al. 2009).
The third prototype was then exposed to extensive usability studies with students,
application partners in real world settings and usability labs, and a variety of
evaluations of individual components. A final summative evaluation spanning
3 months of real-world application concluded the process. The main findings of
this summative evaluation are reported in Sect. 14.6.
14.3.3 Challenges for Supporting WIL
With the described participatory design and development activities, three major
challenges for WIL support were identified (see also Sect. 14.1): real time learning,
i.e. learning within real work situations; utilization of real work resources for
learning; and learning within a user’s everyday (real) computational environment.
Real time learning: WIL support should make knowledge workers aware of learning
opportunities relevant to their current work task and support them throughout
these opportunities. WIL support needs to be adapted to a user’s work context and
her experiences, and should be short and easy to apply.
Real knowledge resources: WIL support should dynamically provide and make
users aware of available knowledge resources (both human and material)
within the organization. By providing ‘real’ resources the effort for learning
transfer is reduced and the likelihood of offering opportunities to learn on different
trajectories is increased.
Real computational environment: WIL support should be provided through a
variety of tools and services which are integrated seamlessly within the user’s
desktop and allow one-point access to relevant back-end organizational systems.
These tools and services need to be inconspicuous, tightly integrated, and easy to
use. They must support the knowledge worker in effortlessly switching between
a variety of learning practices.
Concerning real time learning, our participatory design activities and workplace
learning studies suggest that learners need different support and guidance within
different work situations. Specifically, it became apparent that learning guidance is
needed on varying levels ranging from descriptive to prescriptive support. While
prescriptive learning guidance has the objective of providing clear directions, or
rules of usage to be followed, descriptive learning guidance offers a map of the
learning topic and its neighboring topics, their relationships, and possible
interactions to be explored. Specifically, prescriptive learning support is based on
a clearly structured learning process which imposes an order on learning activities
while descriptive support does not do so.
The learning guidance discussed here is applicable to WIL situations and covers
the more informal side of the learning dimension of knowledge work. Clearly the
spectrum of learning guidance could be extended to much more formal learning
guidance approaches such as predefined learning modules and enforced learning
processes. However, this is beyond the scope of our work. The learning guidance
we consider and explore ranges from building awareness, through descriptive learning
support (exposing knowledge structures and contextualizing cooperation), to partially
prescriptive support (triggering reflection and systematically developing
competencies at work). The following section illustrates how these different
degrees of learning guidance are realized within the APOSDLE environment.
14.4 Providing Recommendations with Varying Degrees
of Learning Guidance
Within this section we present varying types of context-aware recommendation
functionalities for WIL in the form of three scenarios based on the ISN application
case. These scenarios also provide an overview of the overall APOSDLE environment
from the user’s point of view. Section 14.5 then examines the conceptual
architecture together with the knowledge services which implement these
recommendations.
The following scenario illustrates varying types of learning guidance that we
have developed. Our scenario is located in a company called ‘Innovation Service
Network’ (ISN). ISN is a network of small consultancy firms in the area of
innovation management. Consultants at ISN support customer companies in the
introduction of innovation management processes and the application of creativity
techniques. One of these consultants for innovation management is Eva. Eva has
been assigned to lead a new project with a client from the automotive industry. The
objective is to come up with creative solutions for fastening rear-view mirrors to the
windshield.
14.4.1 Building Awareness: Recommendation of Snippets and People
Let us assume that our innovation management consultant from the scenario, Eva, is
in a hurry. She needs to plan the kick-off meeting for her new innovation project
within the next hour. As Eva begins to create the agenda using her favorite word
processor, APOSDLE automatically recognizes the topic of Eva’s work activity,
namely “moderation”. A small pop-up notification (see Fig. 14.1) unobtrusively
informs Eva that her work topic has been detected and that relevant information is
available.
Over the years the APOSDLE knowledge base has collected a large variety of
resources (e.g. project documents, checklists, videos, pictures) about innovation
management which Eva and her colleagues have produced or used. In the background,
APOSDLE proactively searches the knowledge base utilizing the detected
work topic together with information from Eva’s User Profile (see User Profile and
Experiences in Sect. 14.5.3) to form a personalized query. Eva is interested in
getting some help to speed up her work. Therefore, she accesses APOSDLE
Suggests by clicking on the notification or the APOSDLE tray icon. APOSDLE
Suggests (Fig. 14.2) displays a list of resources related to the topic “moderation”
(see Sect. 14.5.2). This list of resources is ranked based on her expertise in
moderation techniques.
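The combination of detected topic and user expertise can be sketched as follows. The scenario does not spell out the ranking function, so everything here is an assumption made for illustration: the `Resource` fields, the three-step level scale, and the rule of preferring resources whose level is closest to the user's own are hypothetical and are not the APOSDLE implementation.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    title: str
    topic: str
    level: int  # assumed scale: 1 = introductory, 3 = advanced


def suggest(resources, detected_topic, user_levels):
    """Return resources on the detected topic, closest to the user's level first."""
    # A topic missing from the profile is treated as beginner level (assumption).
    user_level = user_levels.get(detected_topic, 1)
    on_topic = [r for r in resources if r.topic == detected_topic]
    return sorted(on_topic, key=lambda r: abs(r.level - user_level))


knowledge_base = [
    Resource("Advanced facilitation patterns", "moderation", 3),
    Resource("Checklist for moderating a workshop", "moderation", 2),
    Resource("Patent search basics", "ipr", 1),
]
eva_levels = {"moderation": 2}

ranked = suggest(knowledge_base, "moderation", eva_levels)
titles = [r.title for r in ranked]
```

The detected topic acts as the filter (short-term context) and the profile acts as the ordering criterion (long-term context), so the user never has to type a query.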
Eva finds a checklist for moderating a meeting or a workshop which was put
together by a colleague in the past. She opens the checklist in the APOSDLE
Reader (Fig. 14.3) which jumps to and highlights the most relevant part (called
‘Snippet’) for her. In addition, the APOSDLE Reader indicates other parts of the
checklist also relevant to Eva’s work task via a theme river (see yellow blocks
within the center part of Fig. 14.3). Eva finds some ideas suitable for her current
project and integrates them into her own workshop preparation.
Fig. 14.1 APOSDLE automatically detects a user’s context and displays notifications. In this case
APOSDLE detected the Topic “Moderation” Eva is currently working on
Fig. 14.2 “APOSDLE Suggests” recommends Knowledge Resources for Eva’s current context
(Topic “Moderation”). The tree view of Topics in the Browse Tab is shown on the left
APOSDLE supports Eva in performing her work task without her even having to
type a query. Moreover, her own expertise level is taken into account when making
recommendations. This unobtrusive proactive information delivery raises awareness
of knowledge resources which Eva would not have searched for otherwise. It
provides learning guidance in that it highlights possible learning opportunities
within the current work task without imposing constraints on the learner.
Fig. 14.3 APOSDLE Reader showing relevant information for Eva’s context. It highlights the
relevant part in a document (left), suggests other relevant parts (Snippets) throughout the document
as a ThemeRiver (center), and offers a more detailed list view of Snippets (right) which can be
sorted according to different criteria
14.4.2 Descriptive Learning Guidance: Recommendation of Learning Goals and Communication Channels
After the kick-off meeting Eva prepares for the next step in the innovation project,
namely a creativity workshop. Since she has never moderated such a workshop
before, she takes some more time to explore different possibilities and their
implications. Eva opens APOSDLE Suggests and starts searching for the keywords
“creativity techniques” and “creativity workshop”. Eva selects the task “applying
creativity techniques in a workshop”. APOSDLE Suggests analyzes her User
Profile and recommends a number of possible Learning Goals she could pursue in
the context of the chosen work task. These Learning Goals are related to
competencies which she would need to execute the task properly and for which
Eva has not exhibited a sufficient level of expertise, i.e. for which she has a
competency gap. Eva selects one of the Learning Goals (e.g. basic knowledge
about creativity techniques in Fig. 14.4) and thus refines the snippet and people
recommendations.
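The notion of a competency gap described above can be made concrete with a short sketch. The numeric levels, the dictionary layout and the function name are hypothetical choices for this example; the text only states that a Learning Goal is suggested where the user has not exhibited the level of expertise the task needs.

```python
def learning_goals(required, exhibited):
    """Return competencies whose exhibited level is below the level the task requires."""
    return sorted(
        competency
        for competency, needed in required.items()
        if exhibited.get(competency, 0) < needed
    )


# Hypothetical levels: 0 = none, 1 = learner, 2 = practitioner, 3 = expert.
task_requirements = {
    "creativity techniques": 2,
    "moderation": 2,
    "workshop planning": 1,
}
eva_profile = {"moderation": 3, "creativity techniques": 1, "workshop planning": 1}

goals = learning_goals(task_requirements, eva_profile)
```

With these illustrative values only "creativity techniques" is returned, since Eva already meets or exceeds the assumed requirements for the other two competencies.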
Eva opens a video which contains an introduction about creativity techniques
and creativity. The APOSDLE Reader again highlights relevant parts of the video
and provides an overview of the video by displaying a theme river similar to the one
shown in Fig. 14.3. The video helps Eva to get a better understanding of basic
creativity theories and methods. But she still has some more concrete questions in
particular in the context of the snippet she has found.
By simply clicking on ‘contact snippet author’, Eva contacts Pierre (another
consultant in her company) to ask him about his experiences. APOSDLE supports
Eva in selecting a cooperation tool by knowing Pierre’s preferred means of
cooperation (e.g. asynchronous vs. synchronous, tools he uses, etc.). APOSDLE also
contextualizes the cooperation by providing Pierre with the parts of Eva’s work
context which are relevant to her question. That is, Pierre can review which
resources Eva has already accessed (assuming Eva’s privacy settings allow this).
Pierre accepts Eva’s request and Pierre and Eva communicate via Skype (Pierre’s
preferred means of communication). Eva can take notes during the cooperation, and
can reflect on the cooperation afterwards in a dedicated (Wiki) template. If Eva and
Pierre decide to share this cooperation result with other APOSDLE users, the
request, resources, notes and reflections will be fed into the APOSDLE knowledge
base. After talking to Pierre, Eva continues with preparations for the upcoming
creativity workshop.
Fig. 14.4 Recommended resources can be refined according to a user’s Learning Goals listed in the
drop down box. Learning Goals allow narrowing down large lists of resources to specific needs of
users
By exposing the relationships between topics and tasks of the application domain
the learner is enabled to informally explore the underlying formal knowledge
structures and to learn from them. Specifically, users can be made aware of topics
relevant to the current task. These might constitute relevant learning goals for the
future. In addition, APOSDLE supports communication between peers by helping to
identify the right person to contact, to select the preferred communication channel,
to contextualize the cooperation, and to document it if desired. It is up to the user whether,
in which situations, and in which order to take advantage of this support.
14.4.3 Partially Prescriptive Learning Guidance: Recommendation of Reflection Situations and Learning Paths
Eva has some additional time which she wants to spend on acquiring in-depth
knowledge about creativity techniques. She opens the APOSDLE Experiences Tab
(see Sect. 14.5.3 for more details on the User Profile) and reflects on her past
activities. This tab (Fig. 14.5) visualizes her own User Profile indicating which
topics she routinely works with (middle layer), which topics she needs to learn more
about (top layer), and in which topics she has expertise (bottom layer).
She realizes that she is a learner in many of the common creativity techniques
and therefore she decides to approach this topic systematically by creating a
personalized Learning Path. Eva opens the Learning Path Wizard (Fig. 14.6) and
browses through the task list. She selects the task “applying creativity techniques in
a workshop”. Based on this task, the Learning Path Wizard suggests a list of topics
to include in her Learning Path (see Sect. 14.5.1.1), and Eva adds some more
creativity techniques Pierre mentioned in their last conversation. Eva saves the
Learning Path and also makes it public so that other colleagues can benefit from it.
To execute the Learning Path Eva then activates it in APOSDLE Suggests. At
this time APOSDLE Suggests recommends relevant knowledge resources for the
topic she selected from her Learning Path. Eva now follows her Learning Path in
dedicated times during her working hours. Whenever new relevant resources are
added to the knowledge base Eva is made aware of them.
APOSDLE explicitly triggers the learner to reflect upon her (learner) activities
while the reflection process itself is not formally supported but left to the user’s
discretion. In addition, the creation of semi-formal Learning Paths for longer term
and more systematic competence development is supported and partially
automated. However, the time and method of Learning Path execution is not
predetermined and can be performed flexibly.
Fig. 14.5 The Experiences Tab provides users with an overview about their experiences with topics in
the domain. APOSDLE uses three levels (Learner [top layer], Worker [middle layer], Supporter
[bottom layer]) to indicate different levels of knowledge
14.5 Conceptual View on the APOSDLE Environment
As was illustrated in the above scenarios the APOSDLE environment provides a
variety of recommendations to support WIL. Within this section we present our
approaches to three key challenges for context-aware recommendation: automatic
detection of a user’s work task, inference of a user’s competences based on her
interactions, and the computation of the recommendation itself.
All components have been designed and are implemented as domain-indepen-
dent knowledge services (Lindstaedt et al. 2008). That is, none of the components
embody application domain knowledge and thus constitute a generic design envi-
ronment for WIL environments. In order to create a domain-specific WIL environ-
ment for a specific company, all application-specific domain knowledge has to be
added to APOSDLE in the form of three ontologies (see Sect. 14.5.1) and the
different knowledge resources within the knowledge base. Our approach to ontology
engineering support using the Modelling Wiki (MoKi) is presented in Chap. 12.
In the remainder of this section, we first briefly present the structure and knowledge
resources of the knowledge base which provide a conceptual basis for all other
components (Sect. 14.5.1). Then, we describe our approach to automatically
detecting a user’s work context (Sect. 14.5.2). We briefly introduce our method
of unobtrusively diagnosing a user’s knowledge and skills, and recommendation
services that make use of the information in the user profile (Sect. 14.5.3). Finally,
we describe how an associative network is used to retrieve relevant content for a
task at hand (Sect. 14.5.4).
Fig. 14.6 Wizard for creating a Learning Path based on the topics available in the domain. The
left column contains topics which have been recommended to reach a specific Learning Goal.
Users can choose out of this list of Topics to assemble their individual Learning Path (right column)
14.5.1 Knowledge Base
Different types of Knowledge Resources are presented to the user within
APOSDLE: Topics, Tasks, Learning Paths, Documents, Snippets, Cooperation
Transcripts, and Persons. All of these resources can be organized into Collections
which can be shared with others and thus may serve as Knowledge Resources
themselves.
14.5.1.1 Topics, Tasks, Learning Goals, and Learning Paths
Topics, Tasks and Learning Paths are structural elements which are presented to the
users and which can be used for accessing further Knowledge Resources. All of
them are encoded within an integrated OWL ontology within the Knowledge Base
and provide the basis for intelligent recommendation of resources and for
inferences on the user’s competencies (Lindstaedt et al. 2009).
Topics (domain model) are core concepts which knowledge workers in a com-
pany need to know about in order to do their jobs. For instance, Topics in the ISN
domain are “creativity technique” or “workshop”. Each Topic has a description. A
Topic can be added to a Collection, its relations with other Topics and with Tasks
can be browsed or it can trigger recommendations in APOSDLE Suggests.
Tasks (process model) are typical working tasks within a specific company.
Examples for ISN Tasks include “applying creativity techniques in a workshop” or
“identifying potential cooperation partners”. Each Task has a description. In addi-
tion, to each Task a set of Learning Goals is assigned which are required for
performing the Task successfully. For instance, for the ISN task “applying creativ-
ity techniques in a workshop” one required Learning Goal might be “basic knowl-
edge about creativity techniques”. Each of these Learning Goals is related to one
Topic. That way, Tasks and Topics are inherently linked. A Task in APOSDLE can
be added to a Collection, its relations with other Tasks and with Topics can be
browsed and it can trigger suggestions in APOSDLE Suggests.
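The way Learning Goals link Tasks to Topics can be pictured with a small data-structure sketch. It is only an illustration: the class and field names are assumptions, the second Learning Goal is hypothetical, and APOSDLE itself encodes these elements in an OWL ontology rather than in Python objects.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Topic:
    name: str
    description: str = ""


@dataclass(frozen=True)
class LearningGoal:
    label: str
    topic: Topic  # each Learning Goal is related to one Topic


@dataclass
class Task:
    name: str
    description: str = ""
    learning_goals: list[LearningGoal] = field(default_factory=list)

    def topics(self) -> set[Topic]:
        """Topics reachable from this Task via its Learning Goals."""
        return {goal.topic for goal in self.learning_goals}


creativity = Topic("creativity technique")
workshop = Topic("workshop")
task = Task(
    "applying creativity techniques in a workshop",
    learning_goals=[
        LearningGoal("basic knowledge about creativity techniques", creativity),
        # Hypothetical second goal, added only to show a Task with two Topics.
        LearningGoal("basic knowledge about workshops", workshop),
    ],
)
```

Because every Learning Goal carries exactly one Topic, the link between a Task and its Topics falls out of the Learning Goal assignment and needs no separate bookkeeping.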
In essence, a Learning Path is a sequence of Topics for which recommendations
can be obtained in APOSDLE Suggests. The sequence of Topics about which
knowledge should be acquired shall maximize learning transfer and follows a
prerequisite relation computed on the Learning Goal model based on compe-
tence-based knowledge space theory (Ley et al. 2008). Learning Paths are
generated by APOSDLE users themselves with the help of a Learning Path Wizard
(Sect. 14.4.3) starting from a Task, a Collection, or a Topic. Learning Paths can be
shared with others or added to a Collection.
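As a rough illustration of ordering Topics along a prerequisite relation, the following sketch sorts a selection of Topics so that prerequisites come first. It assumes the prerequisite relation is already given as a mapping; the topic names, the mapping and the function `order_learning_path` are hypothetical, and the sketch does not reproduce how APOSDLE derives the relation from the Learning Goal model.

```python
from graphlib import TopologicalSorter

# Hypothetical prerequisite relation: topic -> topics that should be learned first.
prerequisites = {
    "brainstorming": {"creativity technique"},
    "creativity technique": {"creativity"},
    "creativity": set(),
}


def order_learning_path(selected_topics, prerequisites):
    """Return the selected topics ordered so that prerequisites come first."""
    graph = {topic: set(before) for topic, before in prerequisites.items()}
    for topic in selected_topics:
        graph.setdefault(topic, set())  # topics without known prerequisites
    full_order = TopologicalSorter(graph).static_order()
    selected = set(selected_topics)
    return [topic for topic in full_order if topic in selected]


learning_path = order_learning_path(["brainstorming", "creativity"], prerequisites)
```

Sorting the whole relation before filtering keeps indirect prerequisites in the right order even when the intermediate Topic was not selected. `TopologicalSorter` raises a `CycleError` if the relation is cyclic.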
14.5.1.2 Documents, Snippets, and Cooperation Transcripts
Documents, Snippets and Cooperation Transcripts are the actual ‘learning content’
within APOSDLE. They constitute previous work results of knowledge workers in
the company which can be accessed. Such context-related knowledge artifacts
improve the likelihood of offering highly relevant information which can be directly
applied to the work situation with little or no learning transfer required. In addition,
they have the advantage that no additional learning content has to be created.
By Documents, we understand both textual and multimedia documents which
can be accessed in the APOSDLE Reader. Documents can be opened and saved,
shared with others, added to a Collection or rated.
Snippets are parts of (textual or multi-media) documents annotated with one
Topic which can be viewed in the APOSDLE Reader. Users can create Snippets by
highlighting a part of a Document and annotating it with one Topic from the domain
ontology, share Snippets with their colleagues, add them to a Collection, and rate
them. In addition, APOSDLE automatically generates Snippets fitting to the
domain ontology provided.
Cooperation Transcripts are textual documentation of information exchanged
during cooperations. Cooperation Transcripts can be fed back into APOSDLE and
made available in APOSDLE Search or APOSDLE Suggests.
14.5.1.3 Knowledgeable Persons
All APOSDLE users are potential sources of knowledge and hence constitute Knowl-
edge Resources. Knowledgeable Persons are identified topic-wise, i.e. there are no
‘overall Knowledgeable Persons’ but only Knowledgeable Persons for a Topic at
hand. For instance, a person can be knowledgeable with respect to the Topic “work-
shop” but might have little knowledge about the Topic “creativity technique”. The
information of who is a knowledgeable person at a certain point in time for a Topic at
hand is obtained from the APOSDLE User Profile (see Sect. 14.5.3). Persons can be
contacted directly or be added to Collections for future contact.
14.5.2 Automatically Determining the User’s Work Task (Short-Term Context)
Within this section we present our approach to automatically determining a user’s
current work task based on her interactions (e.g. keystrokes, application specific
events) with the computer. The unique characteristic of our approach is the use of a
generic ontology to represent the user interaction context which preserves and
establishes the relationships between knowledge resources. It is utilized to engineer
features which improve task detection precision also for knowledge intensive tasks.
The identification of the current work task on the one hand triggers the context-
aware recommendation services (see Sect. 14.5.4) and on the other hand serves as
input for inferring a user’s competences (see Sect. 14.5.3).
14.5.2.1 User Interaction Context Ontology
We refine Dey’s definition of context (Dey et al. 2001) by focusing on the user
interaction context that we define as “all interactions of the user with resources,
applications and the operating system on the computer desktop” (Rath et al. 2009).
Various context model approaches have been proposed, such as key-value models,
markup scheme models, graphical models, object oriented models, logic-based
models, or ontology-based models (Strang and Linnhoff-Popien 2004). However,
the ontology-based approach has been advocated as being the most promising one
(Baldauf et al. 2007) mainly because of its dynamicity, expressiveness, and
extensibility.
We have defined a user interaction context ontology (UICO) (Rath et al. 2009)
which represents the user interaction context through 88 concepts, 215 datatype and 57
objecttype properties. It is modelled in the Web Ontology Language (OWL), a W3C
standard for modelling ontologies widely accepted in the Semantic Web community.
The majority of concepts represent the types of user interactions and the types of
resources. The high number of datatype properties represents data and metadata about
resources and application user interface elements the user interacted with. The
objecttype properties relate (1) the user interactions with resources, (2) resources
with other resources or parts of resources, and (3) user interactions with themselves
for modelling the aggregation of user interactions. Context observers, also referred to
as context sensors, are programs, macros and plug-ins that record the user’s
interactions on the computer desktop. It is important to note that the UICO is a generic
ontology for personal information management which covers most resource types,
interaction types, and applications available to the user on a typical Windows desktop
computer. It can easily be extended to include more specialized applications.
From a data perspective, the UICO is a much richer representation of the user’s
interaction context than is typically preserved in simple sensor streams. From a
semantic perspective, the UICO transcends top–down approaches to semantic
desktop ontology design in not only providing high level concepts but connecting
them to low level sensor data. We argue that by relating semantics with sensor data
the UICO lends itself naturally to: (1) a variety of context-aware applications, and
(2) “mining” activities for in-depth analyses of user characteristics, actions,
preferences, interests, goals, etc.
14.5.2.2 Automatic Population of the UICO
We developed a broad range of context sensors for standard office applications and
the operating system Microsoft Windows (XP, Vista and 7). A complete list of
sensors is given in (Rath et al. 2009). The sensed contextual data sent by the context
sensors is used as a basis for automatically populating the UICO. Automatic
population here means an autonomous instantiation of concepts and creation of
properties between concept instances of the UICO based on the observed and the
automatically inferred user interaction context. For example, if a user copies a piece
of text from an e-mail to a word document we will automatically create an instance
of the concept e-mail and the concept word document within the UICO and connect
them via a relationship link. The automatic population exploits the structure of user
interface elements of standard office applications and preserves data types and
relationships through a combination of rule-based, information extraction and
supervised learning techniques. We also use our knowledge discovery framework,
the KnowMiner (Klieber et al. 2009) to perform named entity recognition of
persons, locations and organizations as well as for extracting data and metadata
of various resource types. Hence, the UICO is a much richer representation of the
user interaction context than is typically stored in attention metadata sensor streams
(Wolpers et al. 2007) since it preserves relationships that otherwise are lost.
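The copy-and-paste example above can be sketched as a single population rule. This is a toy illustration under stated assumptions: the `ContextStore` class, the event format, the concept names and the property name `containsContentFrom` are all hypothetical and are not taken from the UICO.

```python
from dataclasses import dataclass, field


@dataclass
class ContextStore:
    """Toy stand-in for the populated ontology: instances plus relation triples."""
    instances: dict[str, str] = field(default_factory=dict)  # instance id -> concept
    relations: set[tuple[str, str, str]] = field(default_factory=set)

    def ensure_instance(self, concept: str, resource_id: str) -> str:
        # Create the instance only if this resource has not been seen before.
        self.instances.setdefault(resource_id, concept)
        return resource_id

    def relate(self, subject: str, predicate: str, obj: str) -> None:
        self.relations.add((subject, predicate, obj))


def on_copy_paste(store: ContextStore, source: dict, target: dict) -> None:
    """Rule for one sensed event: text copied from `source` and pasted into `target`."""
    src = store.ensure_instance(source["concept"], source["id"])
    dst = store.ensure_instance(target["concept"], target["id"])
    store.relate(dst, "containsContentFrom", src)  # hypothetical property name


store = ContextStore()
on_copy_paste(
    store,
    source={"concept": "EMail", "id": "mail:42"},
    target={"concept": "WordDocument", "id": "doc:report.docx"},
)
```

The point of the sketch is that the relation between the two resources is stored explicitly, which is the information a flat sensor stream would lose.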
14.5.2.3 Task Detection as a Classification Problem
Performing task detection consists of training machine learning algorithms on
classes corresponding to task models. This means that each training instance
presented to the machine learning algorithms represents a task that has to be
‘labelled’. Thus, training instances have to be built from features and feature
combinations derived from the user context data on the task level. In our ontol-
ogy-based approach, this means deriving features from the data associated with a
Task concept. Based on our UICO, we have engineered 50 features for constructing
the training instances. They are grouped in six categories: (1) action, (2) applica-
tion, (3) content, (4) ontology structure, (5) resource, and (6) switching sequences.
We use the machine learning toolkit Weka (Witten and Frank 2005) for parts of the
feature engineering and classification processes. A number of standard pre-
processing steps are performed on the content of text-based features.
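To make the classification view concrete, the following sketch trains a small Naïve Bayes classifier on a single text-based feature, the window title. It is a self-contained stand-in written in plain Python, not the Weka setup described above; the training titles, the tokenisation and the class name `NaiveBayesTaskClassifier` are assumptions for illustration.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesTaskClassifier:
    """Multinomial Naive Bayes over bag-of-words features with add-one smoothing."""

    def fit(self, instances, labels):
        self.class_counts = Counter(labels)
        self.token_counts = defaultdict(Counter)
        for tokens, label in zip(instances, labels):
            self.token_counts[label].update(tokens)
        self.vocabulary = {t for counts in self.token_counts.values() for t in counts}
        return self

    def predict(self, tokens):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / total)  # log prior of the task class
            denom = sum(self.token_counts[label].values()) + len(self.vocabulary)
            for token in tokens:
                if token in self.vocabulary:  # ignore tokens never seen in training
                    score += math.log((self.token_counts[label][token] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def window_title_feature(title):
    """Very simple pre-processing: lower-case and split on whitespace."""
    return title.lower().split()


# Hypothetical labelled training instances: (window title, task label).
training = [
    ("Exam registration - Browser", "registering for an exam"),
    ("Register for exam - Student portal", "registering for an exam"),
    ("quicksort.py - Editor", "programming an algorithm"),
    ("Debugging sort algorithm - Editor", "programming an algorithm"),
]
clf = NaiveBayesTaskClassifier().fit(
    [window_title_feature(title) for title, _ in training],
    [label for _, label in training],
)
predicted = clf.predict(window_title_feature("Exam portal - Browser"))
```

In a fuller setting each training instance would combine features from all six categories rather than one text feature.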
14.5.2.4 Evaluations
We have performed three laboratory experiments for evaluating the influence on
our ontology-based user task detection approach of the following factors: (1) the
classifier used, (2) the selected features, (3) the task type, and (4) the method chosen
for training the classifiers. We have gained several insights from our evaluation.
First, the J48 decision tree and Naïve Bayes classifiers provide better classification
accuracy than other classifiers in our three experiments.
Second, we have isolated six features that present a good discriminative power
for classifying tasks, namely the accessibility object name feature, the window title
feature, the used resource metadata feature, the accessibility object value feature,
the datatype properties feature and the accessibility object role feature. Because of
the low standard deviation values associated with them, the performance of these
six features also proves to be stable across datasets.
Third, even though it could seem easier to classify routine tasks, our experiments
show that knowledge intensive tasks can be classified as well as routine tasks. For
example, within the domain of computer science students the following tasks can be
considered routine tasks: registering for an exam, finding course dates, reserving a
book from the university library, etc. On the other hand knowledge intensive tasks
here include: programming an algorithm, preparing a scientific talk, planning a
study trip, etc. We attribute this result to the fine granularity of the usage data
captured within the UICO.
Fourth, we have shown that a classifier trained by a group of experts on
standardized tasks also performs well while classifying personal tasks performed
by users. This result suggests that the classifier can be trained by a domain expert
and then utilized by other users – eliminating the time intensive training period for
individual users. For more details please refer to (Rath et al. 2010).
In future work we will investigate combining unsupervised learning mechanisms
for identifying boundaries in the user interaction context data, based on the six
discovered context features and applying the J48 decision tree and Naïve Bayes
learning algorithms for classifying these clusters to task classes.
14.5.3 Inferring the User’s Competencies (Long-Term Context)
Within this section we present our approach to inferring a user’s competencies based
on Knowledge Indicating Events (KIE, see below) and thus automatically maintaining
her user profile. The major KIE we have examined is the execution of work tasks. The
KIE approach distinguishes itself from other approaches in the user modeling field in
that it takes into account user events which occur naturally during everyday work as
opposed to being restricted to events within specialized eLearning systems.
The inferred user competencies are used to rank the learning goal and content
recommendations and form the basis for the recommendation of subject matter experts.
14.5.3.1 Knowledge Indicating Events
We suggest tackling the challenge of user profile maintenance by observing the
naturally occurring actions of the user (Jameson 2003) which we interpret as
knowledge indicating events (KIE). KIE denote user activities which indicate that
the user has knowledge about a certain topic (Ley et al. 2008).
Within our first two prototypes we have based the maintenance of the user profile
solely on information on past tasks performed (task-based knowledge assessment).
The algorithms for maintaining the user profile and the ranking of learning goals
were based on competence-based knowledge space theory (cbKST, Korossy 1997)
which is based on Doignon and Falmagne’s knowledge space theory (Doignon and
Falmagne 1985). It is a framework that formalizes the relationship between overt
behavior (e.g. task performance) and latent variables (knowledge and competencies
needed for performance).
While there is some evidence that in fact most learning at the workplace is
connected to performing a task, and that task performance is a good indicator for
available knowledge in the workplace, this restriction to tasks performed certainly
limits the types and number of assessment situations that are taken into account. We
have therefore extended our approach and looked at a variety of additional potential
KIEs. Examples include communication with other users about a topic or the
creation of documents which deal with a topic. KIE thus are based on usage data.
Our approach goes in a similar direction to that of Wolpers et al. (2007), who suggested
using attention metadata for knowledge management and learning management
approaches. This idea is similar to the approach of evidence-bearing events (e.g.
Brusilovsky 2004). So far, the approaches of attention metadata and evidence-bearing
events have been discussed from a rather technical point of view, and with regard to how
the approaches, once implemented, could be used in different settings. With our
approach to KIE, we extend the technical perspective by taking into account the
entire process of implementing the KIE approach in a learning system, from
identifying potential KIE to their evaluation.
14.5.3.2 User Profile
We model the user profile as an overlay of the topics in the knowledge base
(Sect. 14.5.1; see also Lindstaedt et al. 2009). In APOSDLE’s second prototype, the
user profile was maintained using the following rule: whenever a user executes a
task (e.g. workshop preparation) within the environment the counter of that task
within her user profile is incremented. The user profile counts how often the user
has executed the task in question. It therefore constitutes a simple numeric model of
the tasks which are related to one or several topics in the domain. Based on the
learning goal model we can infer that the user has knowledge about all the topics
related to that task. By means of an inference service (see below), information is
propagated along the relationships defined by the learning goal model (see
Sect. 14.5.1; Lindstaedt et al. 2009), and the counters of all topics related to the
task are also incremented. Consequently, the user profile contains a value for each
user and each topic at any time during system usage.
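The counting rule of the second prototype can be sketched in a few lines. This is a minimal illustration, assuming the learning goal model is available as a plain task-to-topics mapping; the mapping `TASK_TOPICS`, the topic "cooperation" and the class name `NumericUserProfile` are hypothetical.

```python
from collections import Counter

# Hypothetical excerpt of a learning goal model: task -> related topics.
TASK_TOPICS = {
    "applying creativity techniques in a workshop": ["creativity technique", "workshop"],
    "identifying potential cooperation partners": ["cooperation"],
}


class NumericUserProfile:
    """Counts task executions and propagates them to the related topics."""

    def __init__(self):
        self.task_counts = Counter()
        self.topic_counts = Counter()

    def record_task_execution(self, task):
        self.task_counts[task] += 1
        # Propagate along the task-to-topic relationships of the learning goal model.
        for topic in TASK_TOPICS.get(task, []):
            self.topic_counts[topic] += 1


profile = NumericUserProfile()
profile.record_task_execution("applying creativity techniques in a workshop")
profile.record_task_execution("applying creativity techniques in a workshop")
```

Using a `Counter` means that every topic has a defined value (zero if never touched) at any time, which mirrors the statement that the profile contains a value for each user and each topic.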
In APOSDLE’s third prototype, the simple numeric user profile was changed
into a qualitative user profile that distinguishes three qualitatively different knowl-
edge levels (learner, worker, supporter). Therefore, we extended our set of KIE and
assigned to each knowledge level those events which indicated the respective knowl-
edge level. For example, ‘asking for a learning hint’ is an event that indicates ‘learner’
knowledge level, ‘performing a task’ would be ‘worker’ knowledge level and ‘being
contacted about a certain topic’ would be ‘supporter’ knowledge level. We then tracked
all KIE that occurred for a user and developed an algorithm that allowed us to diagnose
one of the three knowledge levels for each topic in the domain the user has used within
APOSDLE (e.g. by opening a document, carrying out a task etc.).
14.5.3.3 Learning Goal Ranking
A user’s learning need is inferred in three steps (Lindstaedt et al. 2009). Starting
with the user’s current task, the learning goal model is queried to retrieve the
required learning goal vector r for this task (step 1). The vector r represents for
all learning goals of the domain whether or not they are required to perform the
user’s current task. In step 2, the user profile is queried with the required learning
goal vector r as parameter to retrieve the current knowledge levels vector k for the
user. The vector k consists of one of three knowledge levels for all topics which are
referenced within vector r. The third step generates the learning need g by ranking
the knowledge levels of all required topics (vector k) from learner topics (top of the
list) to supporter topics (bottom of the list). The ‘lower’ the knowledge level of a
topic in the learning goal vector g, the higher the rank of the learning goal. The
‘most required’ learning goal is therefore listed on the top of the learning need. If
there are multiple topics with the same knowledge level, those topics which are
assigned to more tasks in the knowledge base are ranked higher. This simple rule
ensures that the prerequisite relation that is assumed between topics of the knowl-
edge base is taken into account (see Ley et al., 2009). The learning need is used by
APOSDLE in two ways. An application running in the working environment of the
user visualizes the result as a ranked list (see Fig. 14.3). The ‘top’ learning goal is
automatically pre-selected, which invokes the Associative Retrieval Service (see
Sect. 14.5.4) to find knowledge resources relevant for the ‘most pressing’ learning
need. The user can select other learning goals from the ranked list and thus filter the
offered knowledge resources accordingly.
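The three steps above can be sketched as follows. The sketch assumes the learning goal model is a task-to-topics mapping and the user profile a topic-to-level mapping; treating topics missing from the profile as ‘learner’ topics is our assumption, and the second task and the topic "moderation" are hypothetical.

```python
LEVEL_ORDER = {"learner": 0, "worker": 1, "supporter": 2}


def rank_learning_need(task, learning_goal_model, user_levels):
    """Return the required topics of `task`, most pressing learning goal first."""
    # Step 1: required learning goals for the task (the vector r).
    required = learning_goal_model[task]
    # Step 2: knowledge levels of the user for these topics (the vector k).
    # Assumption: a topic the user has never touched counts as 'learner'.
    levels = {topic: user_levels.get(topic, "learner") for topic in required}
    # Tie-breaker: number of tasks in the model a topic is assigned to.
    task_count = {
        topic: sum(topic in topics for topics in learning_goal_model.values())
        for topic in required
    }
    # Step 3: learner topics first; among equals, topics used by more tasks first.
    return sorted(
        required,
        key=lambda t: (LEVEL_ORDER[levels[t]], -task_count[t], t),
    )


model = {
    "applying creativity techniques in a workshop": [
        "creativity technique", "workshop", "moderation",
    ],
    "preparing a workshop agenda": ["workshop"],  # hypothetical second task
}
eva = {"workshop": "learner", "moderation": "supporter"}
ranked = rank_learning_need("applying creativity techniques in a workshop", model, eva)
```

Here "workshop" and "creativity technique" are both learner topics, and "workshop" is ranked first because it is assigned to two tasks; the first entry of the result corresponds to the learning goal that would be pre-selected.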
14.5.3.4 Identifying Knowledgeable People
Within APOSDLE, a People Recommender Service aims at finding people within the
organization who have expertise related to the current learning goal of the user.
Obviously, the People Recommender Service is based on the information in the user
profile. Users specialising in certain topics are represented in the user profile with
high knowledge levels for these topics. Other users can now individually be provided
with colleagues having equal or higher experience. Compared to e.g. the MetaDoc
system (Boyle 1994) this service uses a more dynamic way of identifying experts.
Knowledgeable users are identified by comparing the current knowledge levels
vectors of all users with the knowledge level vector of the user who will receive the
recommendation. To infer knowledgeable users, the People Recommender Service
utilises the Learning Need Service (see Sect. 14.5.3.3) to retrieve current knowledge
levels vectors for all users. The next step removes all users with lower knowledge
levels compared to the user receiving the recommendations. The remaining users are
then ranked according to their knowledge levels in the current knowledge level
vectors. The most knowledgeable user will be ranked highest. The service can be
configured to also use the availability status of users as a ranking criterion.
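The filtering and ranking steps of the People Recommender Service can be sketched as below. For brevity the sketch compares knowledge levels for a single topic instead of whole knowledge level vectors; the user names other than Eva and Pierre, the profile format and the function `recommend_people` are hypothetical.

```python
LEVELS = {"learner": 0, "worker": 1, "supporter": 2}


def recommend_people(requester, topic, profiles, available=None):
    """Rank colleagues with equal or higher knowledge of `topic` than `requester`."""
    own_level = LEVELS[profiles[requester].get(topic, "learner")]
    candidates = [
        (user, LEVELS[levels.get(topic, "learner")])
        for user, levels in profiles.items()
        if user != requester
    ]
    # Remove all users with a lower knowledge level than the requester.
    candidates = [(user, level) for user, level in candidates if level >= own_level]

    def ranking_key(item):
        user, level = item
        is_available = available is not None and user in available
        # Highest level first; availability is an optional secondary criterion.
        return (-level, not is_available, user)

    return [user for user, _ in sorted(candidates, key=ranking_key)]


profiles = {
    "Eva": {"creativity technique": "learner"},
    "Pierre": {"creativity technique": "supporter"},
    "Anna": {"creativity technique": "worker"},
    "Tom": {"creativity technique": "supporter"},
}
for_eva = recommend_people("Eva", "creativity technique", profiles)
for_anna = recommend_people("Anna", "creativity technique", profiles)
```

Passing a set of user names as `available` moves available colleagues ahead of unavailable ones with the same knowledge level, without overriding the knowledge-based ranking.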
14.5.3.5 Evaluation
We conducted a variety of studies to evaluate the automatically maintained user
profile and its related services (Lindstaedt et al. 2009) of which we here only report
on one that addresses the accuracy and usefulness of the learning goal ranking.
In a lab study psychology students were observed and interviewed while they were
trying to learn with APOSDLE in the learning domain of statistical data analysis. Our
aim was to investigate the effects of the learning goal ranking on the actual performance
of users in realistic tasks which they were not able to solve before the study
(pre-test). In the pilot study, control groups were used to compare three different
versions of the ranking algorithm: (a) the ranking algorithm as it was designed for
APOSDLE (taking into account both the requirements of the task and the knowledge
of the user), (b) a shuffled list of learning goals required for the task at hand (taking into
account the requirements of the task but not the knowledge state of the user), and (c) a
set of learning goals randomly selected (neither taking into account requirements of
the task, nor the knowledge state of the user). Each of the participants had to solve
three different tasks, one for each version of the algorithm. With versions (a) and (b),
the previously unknown tasks could be solved by all participants, whereas the task
could be solved by none of them when algorithm (c) was applied. Additionally, a slight
difference in the users’ behavior was found between versions (a) and (b): in the case of
version (a), users tended to select fewer learning goals and more frequently carried
out learning activities for learning goals on the top of the list in comparison with
version (b) of the ranking algorithm. This serves as an indicator that the ranking
algorithm is useful but clearly more studies are needed.
14.5.4 Recommending Content Relevant to the Work Task
We introduce a hybrid context-based search mechanism on the basis of an associa-
tive network. The network consists of a semantic layer which is built from the
domain ontology and a content-based layer which is built from the textual knowl-
edge artifacts of the organization. This hybrid approach combines semantic-simi-
larity with text-based similarity measures in order to improve retrieval
performance. The novelty of our approach lies in combining spreading activation
search in both layers using one and the same model.
Here spreading activation (Crestani 1997) is used for all three types of search:
search for documents based on a set of Topics, associative search for Topics, and
associative search for documents/snippets. This allows uniform search based on the
detected user work task.
14.5.4.1 Associative Retrieval Service
The Associative Retrieval Service (Scheir et al. 2008) relies on knowledge
contained in the domain ontology and the statistical information in a collection of
documents. The service is queried with a set of Topics from the ontology and
returns a set of snippets. As discussed above, snippets in the environment are
annotated with ontological Topics. In APOSDLE, the annotation process is
performed manually for a training set which is then utilized to train a classifier to
automatically create and annotate snippets.
Topics from the ontology are used as metadata for snippets in the system. In
contrast to classical metadata, the ontology specifies relations between the Topics
(Fig. 14.7). For example, class-subclass relationships are defined and domain-
specific relations between Topics are modeled. The structure of the ontology is
used for calculating the similarity between two Topics in the ontology (see below).
This similarity is then used to expand a query with similar Topics before retrieving
documents dealing with this set of Topics. After thus retrieving documents based on
metadata, the result set is expanded by means of textual similarity. The implemen-
tation of the associative network allowed us to develop and test different
combinations of query and result expansions that are based on the spreading
activation algorithm.
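The combination of query expansion and result expansion can be sketched with a simple one-step spreading activation over the three layers of the network. All weights, the decay factor, the topic "brainstorming" and the snippet identifiers are assumptions for illustration; the sketch does not reproduce the configurations evaluated in APOSDLE.

```python
def spread(activation, edges, decay=0.5):
    """One spreading-activation step along weighted outgoing edges."""
    result = dict(activation)
    for node, value in activation.items():
        for neighbour, weight in edges.get(node, {}).items():
            result[neighbour] = result.get(neighbour, 0.0) + decay * weight * value
    return result


def retrieve(query_topics, c2c, c2d, d2d):
    """Rank documents for a set of Topics using all three layers."""
    # Query expansion: activate semantically similar Topics (concept layer).
    topics = spread({topic: 1.0 for topic in query_topics}, c2c)
    # Pass activation from Topics to the documents annotated with them.
    docs = {}
    for topic, value in topics.items():
        for doc, weight in c2d.get(topic, {}).items():
            docs[doc] = docs.get(doc, 0.0) + weight * value
    # Result expansion: activate textually similar documents (document layer).
    docs = spread(docs, d2d)
    return sorted(docs, key=lambda doc: (-docs[doc], doc))


# Hypothetical network: similarity weights between Topics and between documents.
c2c = {"creativity technique": {"brainstorming": 0.8}}
c2d = {
    "creativity technique": {"snippet-1": 1.0},
    "brainstorming": {"snippet-2": 1.0},
}
d2d = {"snippet-2": {"snippet-3": 0.5}}
results = retrieve(["creativity technique"], c2c, c2d, d2d)
```

In this example "snippet-2" is only reached through query expansion and "snippet-3" only through result expansion, so both rank below the directly annotated "snippet-1".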
The three concept similarity measures used are: a measure based on the shortest
path between two Topics in the same class hierarchy, the measure of Resnik (1999)
and a vector measure based on the properties which relate Topics in an ontology.
All three measures share the requirement that, for calculating the semantic similarity
between two Topics, these two Topics have to originate from the same ontology.
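The first of these measures can be illustrated with a breadth-first search over a class hierarchy. The hierarchy below is hypothetical, and converting a path length d into a similarity via 1/(1+d) is one common choice that we assume here; the source does not state the exact formula used.

```python
from collections import deque


def shortest_path_length(hierarchy, start, goal):
    """Number of edges between two Topics, treating subclass links as undirected."""
    neighbours = {}
    for child, parent in hierarchy.items():
        neighbours.setdefault(child, set()).add(parent)
        neighbours.setdefault(parent, set()).add(child)
    queue, seen = deque([(start, 0)]), {start}
    while queue:
        node, distance = queue.popleft()
        if node == goal:
            return distance
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, distance + 1))
    return None  # the two Topics are not connected in this hierarchy


def path_similarity(hierarchy, a, b):
    """Assumed conversion: similarity 1/(1+d), or 0.0 if no path exists."""
    distance = shortest_path_length(hierarchy, a, b)
    return 0.0 if distance is None else 1.0 / (1.0 + distance)


# Hypothetical class hierarchy given as child -> parent.
hierarchy = {
    "brainstorming": "creativity technique",
    "mind mapping": "creativity technique",
    "creativity technique": "method",
}
```

Two sibling Topics such as "brainstorming" and "mind mapping" are two edges apart, while a Topic outside the hierarchy yields a similarity of zero, which reflects the requirement that both Topics come from the same ontology.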
Fig. 14.7 Anatomy of the associative network, showing the Concept to Concept layer (C2C), the Concept to Document layer (C2D), and the Document to Document layer (D2D)
14.5.4.2 Evaluation
In our evaluation we compare the search performance of this associative network
using a number of different similarity measures. All associative search approaches
employing semantic similarity, text-based similarity or both, increase retrieval
performance compared to the baseline. In addition, since there exist no standardized
test corpora for semantic retrieval we built such a test corpus based on an applica-
tion case within the APOSDLE project.
We tentatively conclude that text-based methods for associative retrieval result
in an increase in retrieval performance (Scheir et al. 2008); therefore we want to
explore the approach of attaching a set of terms to every Topic in our domain
ontology during modeling time to provide search results even for Topics that are not
used for annotation. In addition we want to extend our research towards the
application of different semantic similarity measures within our service (Stern
et al. 2010). A key research question for the future is the appropriate selection of
similarity measures for given ontologies. Therefore further experiments with dif-
ferent ontologies have to be conducted.
14.6 Summative Evaluation
As mentioned in the introduction, we see the “APOSDLE solution” as consisting of
(a) modelling the learning domain and the work processes, (b) annotating
documents and other sources of information available in the company repository,
(c) training the prospective users of the APOSDLE environment, and (d) using the
resulting domain-specific WIL environment with its varying learning guidance
functionalities at the workplace. This means that a comprehensive summative
evaluation of the APOSDLE solution requires a summative evaluation of each of
these aspects. This is all the more necessary as the aspects depend on each other. If
the domain modelling has not been done correctly, the annotation will fall short of
what is needed; if the annotation is done badly, retrieval of relevant information will
be unsatisfactory; if the users are not well trained, their use of the APOSDLE
system will be sub-optimal.
Within this chapter we briefly report on results of the workplace evaluation (step
d) and discuss the efforts required for model creation (step a). The modelling
of the learning domain and the work processes was conducted over a period of
2–3 months (see also Chap. 12).
The workplace evaluation assessed the third APOSDLE prototype in use at the
sites of three application organizations, EADS, ISN and CCI (for brief descriptions
see Sect. 14.3). It spanned a period of about 3 months and involved 19 persons.
EADS, CCI, and ISN reported modelling efforts between 59 and 304 person hours
for domain, task, and learning goal models ranging from 94 to 145 Topics, 13–100
Tasks, and 59–291 Learning Goals.
For the evaluation a multi-method data collection approach was followed using a
questionnaire, interviews, log data, user diaries kept while working with APOSDLE,
and site visits. This allowed for triangulation of results. An overview of the findings is
given in the following. Please refer to Dotan et al. (2010) for details.
14.6.1 Workplace Evaluation
At the beginning of the APOSDLE project all application partners were asked to
express which goals they would like to reach by employing computational support
for WIL. The following goals were expressed by all three application partner
organizations:
1. Learning material relevant to current task
2. Aware of learning material
3. Existing knowledge improved
4. High quality learning material provided
5. Learning helped task completion
6. Learning time planned and managed
7. Experts accurately sorted by relevance
In the exit questionnaire, the participants from the three companies that
took part in the usage evaluation were asked to what extent they thought that
these goals were reached by APOSDLE. The exit questionnaire was filled out by 19
participants. Figure 14.8 shows the mean and standard deviation on a 5-point Likert
scale for each question. It shows that all goals were reached to a large extent.
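The per-question summary described above can be reproduced with the Python standard library. The responses below are hypothetical and serve only to show the calculation; the text does not state whether the sample or the population standard deviation was reported, so the use of the sample standard deviation here is an assumption.

```python
from statistics import mean, stdev


def summarize_likert(responses):
    """Return (mean, sample standard deviation) of 5-point Likert responses."""
    for value in responses:
        if value not in (1, 2, 3, 4, 5):
            raise ValueError(f"not a 5-point Likert response: {value!r}")
    return mean(responses), stdev(responses)


# Hypothetical answers to a single question, not data from the study.
example_responses = [4, 5, 3, 4, 4]
example_mean, example_sd = summarize_likert(example_responses)
```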
Two main findings were observed:
1. APOSDLE was used most frequently by trainees and new employees who were
expected to learn and were also given the time to do so. The highly-specialized
content recommended was appreciated and end users reported they could not
have accessed it that effectively by other means. These learners used all features
of the system in various combinations to access and manage their learning
Fig. 14.8 Exit questionnaire results
process. In contrast, the work schedule of ‘regular’ and ‘expert’ knowledge
workers was to a large extent dictated by clients’ needs. This meant that work
activities were often unexpected, performed under time pressure and left little
room for learning activities. This led to sporadic and less-frequent interaction
with APOSDLE.
2. APOSDLE was especially effective within highly specialized domains in which
much of the domain knowledge is documented. APOSDLE proved less effective
in broad customer-driven domains where knowledge was shared to a large extent in
person. One reason for this result probably was that in those domains documenta-
tion was not part of the established work practice and thus the system did not have
access to all knowledge. This was aggravated by the fact that users in these domains
worked in the same offices and extensively shared knowledge face-to-face; there-
fore collaboration support provided by APOSDLE was not utilized often.
During the evaluation period and across all three sites there was clear evidence
that using the APOSDLE environment improved people’s knowledge in various
ways. Assessing whether knowledge has been improved and to what extent has
been based on personal accounts (diary entries and interviews) describing enhanced
understanding of topics and tasks following different kinds of interactions with
APOSDLE. Overall APOSDLE supported the acquisition of new knowledge by the
users by making them aware of learning material, learning opportunities and by
providing different degrees of learning guidance. In EADS especially, it was
reported on numerous occasions in the user diary that explicit and implicit learning
material enabled knowledge workers to gain useful insight, improve their knowl-
edge, and complete a task they were working on.
In all application cases, users utilized the awareness building and descriptive
learning guidance (e.g. exposing knowledge structures) more often than the more
prescriptive learning guidance (e.g. triggering reflection, learning paths). Learners
extensively used the different functionalities to browse and search the knowledge
structures (descriptive learning guidance), followed the provided content
suggestions (awareness), and collected relevant learning content within
Collections. Learning Paths were used by trainees and new employees in order to
structure and plan their own learning process. Their potential as a teaching tool for
experts was not realized since experts only rarely used the system. The reflection
tool (MyExperiences, see Fig. 14.5) was employed by learners mainly to examine
the environment’s perception of their usage behaviour. To what extent this also
led to reflection on their learning activities could not be identified.
14.7 Conclusion
One overall conclusion of the workplace evaluation is that the WIL approach has
proven effective for (a) end users in explicit learner roles (e.g. trainees) in (b)
highly-specialized domains (such as EADS’s Electromagnetism Simulation
domain) in which much of the knowledge to be learned is documented within work
documents. In those circumstances, APOSDLE delivered an effective work-
integrated learning solution that enabled relatively inexperienced knowledge
workers to efficiently improve their knowledge by utilizing the whole spectrum
of learning guidance provided.
Our results concerning improving productivity of experienced knowledge
workers are inconclusive. Further evaluations have to be conducted to determine
the effect on topic experts over time.
Secondly, our findings show that awareness building and descriptive learning
guidance effectively support learning tightly intertwined with task executions.
These ‘soft’ learning guidance mechanisms were used frequently by users of all
levels of expertise. Moreover, people did not consider them as additional steps or
efforts but rather as inherently part of task execution. On the other hand, our
partially prescriptive support mechanisms were used nearly exclusively by novices
and new employees. This suggests that supportive measures derived from instructional
theories that focus on longer-term learning processes require a
‘learner’s mindset’ and time dedicated to learning.
Finally, we can conclude that the domain-independent WIL design environment
approach was successful. Relying on existing material instead of tailor-made
learning material proved to be effective and cost-efficient. Crucial for this is
having good modelling tools, experienced modellers, and high quality annotations
of snippets (see Chap. 12). In a recent instantiation within a new organization and
new application domain we were able to further reduce instantiation efforts (com-
pare above for reported efforts of EADS, ISN, and CCI) to 120 person hours for 51
Topics, 41 Tasks, and 124 Learning Goals. We believe that these numbers are quite
competitive when comparing them to efforts needed to instantiate a traditional
Learning Management System at a site and to develop custom learning material.
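As a rough worked example, the figures quoted for the most recent instantiation can be normalized to an effort per model element. Treating Topics, Tasks, and Learning Goals as equally costly to model is a simplifying assumption made here for illustration and is not a claim from the evaluation; the function name is likewise hypothetical.

```python
def effort_per_element(person_hours, element_counts):
    """Return (total model elements, person hours per element)."""
    total = sum(element_counts.values())
    return total, person_hours / total


# Figures quoted in the text for the most recent instantiation.
total_elements, hours_each = effort_per_element(
    120, {"Topics": 51, "Tasks": 41, "Learning Goals": 124}
)
minutes_each = 60 * hours_each
summary = (f"{total_elements} model elements, {hours_each:.2f} person hours "
           f"each (about {minutes_each:.0f} minutes)")
```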
Acknowledgements The Know-Center is funded within the Austrian COMET Program - Com-
petence Centers for Excellent Technologies - under the auspices of the Austrian Ministry of
Transport, Innovation and Technology, the Austrian Ministry of Economics and Labor and by
the State of Styria. COMET is managed by the Austrian Research Promotion Agency FFG.
APOSDLE (www.aposdle.org) has been partially funded under grant 027023 in the IST work
programme of the European Community.
References
Baldauf M, Dustdar S, Rosenberg F (2007) A survey on context-aware systems. Int J Ad Hoc
Ubiquitous Comput 2(4):263–277
Boyle C (1994) An adaptive hypertext reading system. User Model User Adapt Interact 4(1):1–19
Brusilovsky P (2004) KnowledgeTree: A Distributed Architecture for Adaptive E-Learning. In:
Proceedings of WWW 2004, May 17–22, 2004, New York, New York, USA, 104–113
Crestani F (1997) Application of spreading activation techniques in information retrieval. Artif
Intell Rev 11:453–482
Dey AK, Abowd GD, Salber D (2001) A conceptual framework and a toolkit for supporting the
rapid prototyping of context-aware applications. Human Comput Interact 16(2):97–166
Doignon JP, Falmagne JC (1985) Spaces for the assessment of knowledge. International Journal of
Man-Machine Studies 23:175–196
Dotan A, Maiden NAM, Lichtner V, Germanovich L (2009) Designing with only four people in
mind? – A case study of using personas to redesign a work-integrated learning support system.
In: Gross T et al (eds) Proceedings of INTERACT 2009, Part II, Uppsala, Sweden, pp 497–509
Dotan A, Maiden N, Lockerbie J, de Hoog R, Leemkuil H, Ghidini C, Rospoche M, Lindstaedt SN,
Kump B, Pammer V, Faatz A, Zinnen A (2010) Summative evaluation report, deliverable
D6.12, EU project 027023 APOSDLE, City University, London, 2010
Eisenberg M, Fischer G (1994) Programmable design environments: integrating end-user pro-
gramming with domain-oriented assistance. In: Adelson B, Dumais S, Olson J (eds) CHI’94.
Conference proceedings, human factors in computing systems. ACM, New York, pp 431–437
Eraut M, Hirsh W (2007) The significance of workplace learning for individuals, groups and
organisations, SKOPE is based at Oxford and Cardiff universities
Jameson A (2003) Adaptive interfaces and agents. In: Jacko J, Sears A (eds), The human-computer
interaction handbook: Fundamentals, evolving technologies and emerging applications.
Mahwah, NJ: Erlbaum, pp 305–330
Jones S, Lindstaedt S (2008) A multi-activity distributed participatory design process for
stimulating creativity in the specification of requirements for a work-integrated learning
system. Workshop at CHI 2008, Inderscience, www.inderscience.com
Klieber W, Sabol V, Muhr M, Kern R, Ottl G, Granitzer M (2009) Knowledge discovery using the
KnowMiner framework. In: Proceedings of the IADIS’09, 2009, Inderscience, www.
inderscience.com
Kooken J, Ley T, de Hoog R (2007) How do people learn at the workplace. Investigating four
workplace learning assumptions. In: Proceedings of EC-TEL 2007, Crete, Greece, pp 158–171
Kooken J, de Hoog R, Ley T, Kump B, Lindstaedt SN (2008) Workplace learning study 2.
Deliverable D2.5, EU project 027023 APOSDLE, Know-Center, Graz, 2008
Korossy K (1997) Extending the theory of knowledge spaces: a competence-performance-approach.
Zeitschrift für Psychologie 205:53–82
Lave J, Wenger E (1991) Situated learning: legitimate peripheral participation. Cambridge
University Press, Cambridge
Ley T, Ulbrich A, Scheir P, Lindstaedt SN, Kump B, Albert D (2008) Modelling competencies for
supporting work-integrated learning in knowledge work. J Knowledge Manage 12(6):31–47
Ley T, Kump B, Maas A, Maiden N, Albert D (2009) Evaluating the Adaptation of a Learning System
before the Prototype Is Ready: A Paper-Based Lab Study. UMAP 2009: 331–336, Trento, Italy
Lichtner V, Kounkou A, Dotan A, Kooken J, Maiden N (2009) An online forum as a user diary for
remote workplace evaluation of a work-integrated learning system. Proceedings of CHI 2009,
2009, Boston, MA
Lindstaedt SN, Beham G, Kump B, Ley T (2009) Getting to know your user – Unobtrusive user
model maintenance within work-integrated learning environments. In: Proceedings of ECTEL
2009, Nice, France, pp 73–87
Lindstaedt SN, Ley T, Scheir P, Ulbrich A (2008) Applying scruffy methods to enable work-
integrated learning. Upgrade Eur J Inform Prof 9(3):44–50
Rath AS, Devaurs D, Lindstaedt SN (2009) UICO: an ontology-based user interaction context
model for automatic task detection on the computer desktop. In: Workshop on context,
information and ontologies, ESWC’09, 2009, Springer, Berlin and Heidelberg
Rath AS, Devaurs D, Lindstaedt SN (2010) Studying the factors influencing automatic user task
detection on the computer desktop. In: Sustaining TEL: from innovation to learning and
practice, Proceedings of EC-TEL 2010 (Lecture Notes in Computer Science), vol 6383.
Springer, Barcelona, Spain, pp 292–307
Resnik P (1999) Semantic Similarity in a Taxonomy: An Information-Based Measure and its
Application to Problems of Ambiguity in Natural Language, Volume 11, pages 95–130
Scheir P, Lindstaedt SN, Ghidini C (2008) A network model approach to retrieval in the Semantic
Web. Int J Semantic Web Inform Syst 4(4):56–84
Schugurensky D (2010) The forms of informal learning: towards a conceptualization of the field.
http://hdl.handle.net/1807/2733. Retrieved 10 July 2010
Stern H, Kaiser R, Hofmair P, Kraker P, Lindstaedt SN, Scheir P (2010) Content recommendation
in APOSDLE using the associative network. J Univ Comput Sci 16(16):2214–2231
Strang T, Linnhoff-Popien C (2004) A context modeling survey. In: Workshop on advanced
context modelling, reasoning and management, UbiComp’04, Nottingham, 2004
Wall TD, Jackson PR, Davids K (1992) Operator work design and robotics system performance: a
serendipitous field study. J Appl Psychol 77:353–362
Witten IH, Frank E (2005) Data mining: practical machine learning tools and techniques, 2nd edn.
Morgan Kaufmann, San Francisco
Wolpers M, Najjar J, Verbert K, Duval E (2007) Actual usage: the attention metadata approach.
Educ Technol Soc 10(3):106–121
15
Evolving Metaphors for Managing and Interacting with Digital Information
Natasa Milic-Frayling and Rachel Jones
15.1 Introduction
We are in the midst of the digital revolution which permeates all the areas of human
endeavour. From the information management point of view, digital content is the
most ephemeral of all the media that we have historically used to store and transfer
information. Digital documents can be replicated easily and shared at unprece-
dented speeds. They can be aggregated, augmented, and transformed to generate
new value. Yet, digital media exhibits vulnerabilities that require special attention
by the users and designers of digital systems. Indeed, digital media depends on
a dedicated infrastructure to sustain its existence, access, and consumption.
One dominant aspect is the need to store the digital content for reuse. Many usage
scenarios rely upon persisting digital content while it is being authored or when viewed
and organized for subsequent use. Thus, it is not surprising that the management of
content storage has been an important part of the user’s interaction with the informa-
tion systems. It has distinctly shaped the design of applications and played a significant
role in defining the information management paradigms. As users move from the
traditional Desktop environment to Internet services, we see increased reliance upon
services to secure persistent data storage, even in the case of the user’s personal
content, e.g., Flickr (www.flickr.com) for the storage and organization of photo albums.
Similarly, authoring and publishing of digital information have changed with the
wide adoption of the Web. From two distinct functions they have morphed into a more
unified activity, with the focus on efficiency of on-line publishing and communi-
cation. The traditional metaphors of documents and database records have been
N. Milic-Frayling (*)
Microsoft Research Ltd, 7 J J Thomson Avenue, Cambridge, United Kingdom
e-mail: [email protected]
R. Jones
Instrata Ltd, 12 Warkworth Street, Cambridge, United Kingdom
e-mail: [email protected]
P. Warren et al. (eds.), Context and Semantics for Knowledge Management, DOI 10.1007/978-3-642-19510-5_15, © Springer-Verlag Berlin Heidelberg 2011
complemented by hypertext and content streams. Indeed, wikis, blogs, and micro-
blogging services have emerged as self-sufficient authoring and publishing environ-
ments that produce distinctly different information formats. The wikis expand by
growing the number of interconnected wiki pages and enforce access and editing
control to coordinate collective authoring. The blogs provide a simple linear
structure and support threaded conversations, similar to those in forums and online
discussion groups.
While new authoring environments come with less flexibility in organizing and
reusing information, they are optimized for social interaction, communication, and
broad exposure. The design of services and user experience are shaped by the
continuous influx of information through content streams, ranging from email
and instant message threads to RSS feeds and dynamic Web pages (Milic-Frayling
et al. 2002; Fetterly et al. 2003; Teevan et al. 2009). That is particularly apparent
with the micro-blogs like Twitter (www.twitter.com), designed for live broadcast of
information snippets that spread through pre-defined types of social interaction
such as retweets and the follower relationship among participants.
Finally, the means of accessing information have also been affected by the
paradigm shift. Substantial information influx through email and Web services
has impact on the user practices, putting a significant strain on the manual filing
of content. Indeed, the filing and classifying paradigm has caved under the volume
of information and given way to search and, more recently, to light-weight tagging of
user-generated content with keywords that are then exposed through tag clouds
for content browsing (Cutrell et al. 2006; Dumais et al. 2003; Jones et al. 2001).
As we embark on yet another computing paradigm, data storage and
computation in the cloud, the information management models and practices are being
redefined once again. The storage and computational infrastructure have become
a commodity that can be “hired” as required. The content is now resident in data
centres that can optimize data processing and facilitate integration with related content
repositories and services. This will have implications for the ways enterprises manage
and leverage their information assets in the new environment. At the moment, much of
the content management in the cloud is based on the previous generation of software,
and new applications and practices are still to evolve. How will one reconcile the
existing data stores and user practices with the emerging digital environments that
are abundant with services, data repositories, and social engagement? What should be
the principles of managing and interacting with information and data?
We explore some of these issues by reflecting upon the user experience and
practices in the early days of Internet penetration and integration with the Desktop
environment. We anchor our discussion around the findings of a user observation
study that we conducted in 2004. The study highlights the changes in processes and
practices that the users developed to interact with digital content, from authoring,
storage, and viewing to organizing and sharing information. This led to the con-
sideration of user activities as a driver for information management (Bardram
et al. 2006). That approach is particularly amenable to connecting the Desktop
with the Web environment and services in the cloud. We discuss the metaphors
of collections and compositions that allow for gathering information across
distributed information repositories and support both rich representations of context
and effective access to the collected resources (Oleksik et al. 2009; Kerne et al.
2008). In essence, we take the Web hypertext model to another level by supporting
integrated content and activity spaces.
In the following sections we discuss the paradigm shifts and evolving metaphors
for information management through examples of user practices observed in two
studies. We conclude by reflecting upon future directions.
15.2 Information Management Processes
Information management systems are designed to facilitate interaction with informa-
tion resources. Their characteristics impact the efficiency with which information
workers are able to accomplish their everyday tasks. At the same time, the digital
environments in which information is created and exchanged are evolving continu-
ously. The self-contained and well defined desktop paradigm has been challenged by
the new opportunities for acquiring information through the desktop connectivity to
the Internet. Similarly, new ways of communication have emerged, ranging from
synchronous messaging via Instant Messenger and voice over IP to discussions in
community forums, blogs, and most recently social media sites (LinkedIn, Facebook,
etc.). This increased exposure to information and services makes new demands on
the user’s engagement with technology in terms of the complexity and variety of tasks
that need to be accomplished. At the same time, the applications and tools that are
supposed to provide support for their work remain disconnected and inflexible.
We shall reflect upon the emergence of these issues by discussing an early study
of the Web enabled desktop environment. Back in 2004 we observed a dynamic,
fast-paced workplace and had a chance to see how the change had impacted the
workers. The task overload and the increased volume of received and produced data
were apparent. That led to frequent breakdowns in the use of existing applications.
The users continued to push the boundaries of the available facilities in order
to handle fast changing and demanding situations. Yet, they were experiencing
difficulties. We analysed the sources of their inefficiencies and reported the detailed
findings in (Milic-Frayling et al. 2006). Here we reflect on selected aspects that
are relevant to our discussion of the changing information management practices.
15.2.1 In-Situ Observations of Information Workers’ Practices
In October 2004 we gained access to a workplace at a leading international public
relations firm and conducted in-situ observations of nine employees over a period of
8 days. In selecting the participants, we made a conscious decision to improve
on previous studies that looked at individual workers in isolation. We chose a group
of co-workers whose activities were rather intricately interwoven. They were
collocated within an open plan, with their desks put together in an elongated area.
Sitting across from each other, separated only by the personal computers, they could
easily collaborate on projects and assist each other with individual tasks. In their
work they used Microsoft desktop applications, including Instant Messenger,
MS Internet Explorer, MS Excel, MS Word, MS PowerPoint, and MS Outlook.
In order to get a broader view and investigate both offline and online information
and communication activities, we applied a hybrid method described by (Jones
et al. 2007). We combined complementary methods: repeated interviews with the
participants, on-site observations, and activity logging, and analyzed these three
sources of information in concert to gain deeper insights.
The researcher who conducted the observations spent 8 days on site. She was
provided with a desk in the proximity of the participants. That enabled her to observe
the activities continuously and approach an individual or a group when she judged it
important for the study. She recorded the activities unobtrusively using a portable
video camera and made a conscious effort to minimize the impact on the participants.
By installing the logging software on the individual participant’s machine, we contin-
uously captured the application level events that resulted from the user’s interaction
with the Windows computing environment. The logging software also captured
screenshots of the desktop displays every 5 min. Synchronized with the rest of the
data, this visual aid proved invaluable for viewing and interpreting the log data.
Through integrated data analysis we arrived at rich profiles of participants’
practices (Milic-Frayling et al. 2006). Here we discuss in more detail the emerging
workflows, multi-tasking, and interruptions that the participants dealt with and
the rise of communication media as an important factor in shaping information
management practices.
15.2.1.1 Workflow and Task Management
Most of the participants maintained paper or electronic to-do lists to keep track of
their work plans. However, our observations revealed that their typical workflows
were sensitive to various triggers and required flexibility and re-adjusting of plans
according to the emerging circumstances.
Indeed, the participants worked in teams, on tasks with multiple overlapping
activities. They were responsible for press releases, briefing clients before media
interviews, and preparing information packs for other agencies that were dealing
with their clients. Their business required them to be informed of any media that
mentioned or affected their clients and to respond to new information and new
developments as they occur. They employed RSS news services to help them stay
abreast of the happenings. The main such service, NewsNow, delivered news via
email. They also subscribed to services like Media Disk and Factiva to access
information about local agencies and monitored alternative news sites where news
sometimes broke first, such as the Recorder site. Essentially, they checked their
email continuously, either for news or for information requests from agencies,
media contacts, and clients.
In Fig. 15.1 we depict a typical task workflow. A task begins with a trigger event:
a request from a colleague or client, or a personal reminder about
a planned activity. The participants would first try to locate relevant information and
then continue with authoring or communication. During this process they may consult
or collaborate with other colleagues by exchanging information electronically or face-
to-face, sitting next to a person or seeking advice by conversing across the desk.
The tasks normally required focus and concentration. Dealing with interruptions
was one of the apparent challenges and they used various methods to keep track of
multiple tasks and respond to changes. Furthermore, the tasks required multiple
desktop applications and the participants were constantly switching between them
in their work. While this was a persistent phenomenon it did exhibit highly variable
patterns, even for the same individual. This is apparent from the time log analysis in
Fig. 15.2, showing the patterns of switching between different application windows
on three different days for the same participant. We indicate explicitly the three
major categories of activities: authoring, searching and browsing, and communi-
cating via email. The dark lines and boxes in the timelines (Fig. 15.2) show the
periods that the user spent switching between application windows, without remain-
ing in any one for 5 s or more. Some of the observed patterns can be traced
to specific characteristics of the desktop user interface.
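The rapid-switching periods described above can be derived from a window focus log with a single pass. The sketch below is an illustration under stated assumptions rather than the analysis code used in the study: it assumes the log is a time-ordered list of (timestamp in seconds, window name) focus events, and the function name and the end_time parameter are introduced here. A period is a maximal run of consecutive focus events in which no window stayed in focus for 5 s or more.

```python
def rapid_switching_periods(events, end_time, min_dwell=5.0):
    """Find periods in which no window kept the focus for min_dwell seconds.

    events is a time-ordered list of (timestamp_seconds, window_name) focus
    events; end_time closes the dwell time of the last event.
    Returns a list of (period_start, period_end) tuples.
    """
    periods = []
    run_start = None
    for i, (start, _window) in enumerate(events):
        stop = events[i + 1][0] if i + 1 < len(events) else end_time
        if stop - start < min_dwell:
            if run_start is None:
                run_start = start  # a new run of short dwell times begins
        elif run_start is not None:
            periods.append((run_start, start))  # run ends at this long dwell
            run_start = None
    if run_start is not None:
        periods.append((run_start, end_time))
    return periods


# Hypothetical focus log, for illustration only.
log = [(0, "Word"), (60, "Outlook"), (62, "IE"),
       (63, "Outlook"), (66, "Word"), (200, "Excel")]
switching = rapid_switching_periods(log, end_time=203)
```

In this hypothetical log, the user works in one window for a minute, flips through three windows within six seconds, and then settles again, which yields one switching period from 60 s to 66 s and a second, short one at the end of the log.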
For example, the participants maintained high awareness of the applications
that were open and frequently switched between them. Still, they often confused the
documents with one another and would maximize the wrong window. The tabs on
the MS Windows taskbar were the primary means of accessing documents and
applications that were open. However, the tabs truncated the visible file information
and thus provided suboptimal support. We also observed that participants managed
email by opening each message in a separate window and deleting unimportant
ones within a few seconds. Thus, it is not surprising that dark areas, corresponding to
many reviewed email messages, are adjacent to the longer intervals of communi-
cation that involved writing emails for sustained periods of time.
The analysis of user logs also showed switching between different applications
(Fig. 15.3). Tasks like preparation of press releases, interview briefings, and
Fig. 15.1 The workflow diagram shows the task trigger and the “task loop” with typical activities
that involve locating, communicating, authoring, and filing information
[Fig. 15.2 timelines for 15-Oct, 20-Oct, and 26-Oct; key: Communication, Authoring, Browse/Search, Other, Idle]
Fig. 15.2 Beth’s time log for three separate days. The logged activities are identified in the key.
The black boxes indicate periods in which all applications were viewed for less than 5 s. Published
with permission from (Milic-Frayling et al. 2006)
information packs required the use of MS Word, MS Outlook, MS Excel, IE, and MS
PowerPoint together. Since these applications are disconnected and optimized for
work on a specific type of document, the participants experienced many problems
when trying to use them in concert and move content between them. Often they
would compose a document in MS Word and then export it to MS Outlook to find
that fonts had changed, bullet points had moved, and spacing was irregular.
Some of the participants’ practices were shaped by the customers’ demands.
For example, some customers avoided email attachments and expected information
to be included in the body of the email message. That meant either using email as
the main authoring environment or moving content from MS Word and MS PowerPoint
documents into the email message.
15.2.1.2 Impact of Communication on Content Storage and Organization
The predominant use of email as the communication medium had ripple effects not only on
authoring practices but equally on the storage and organization of project and
customer related information (Fig. 15.3). Incoming emails were prioritized and often
time-stamped by moving them into the Outlook calendar. Once read and dealt with, important
messages were filed and kept as accountable records, especially regarding communication
with clients or journalists. The participants often filed emails into folders named
after clients. This expanded role of email resonates well with the notion of email as
a habitat (Ducheneaut and Bellotti 2001) and with observations by Mackay (1988).
Fig. 15.3 Aggregate statistics across participants show the typical time breakdown across desktop
applications and implied task types on a typical work day
A female participant took a lead in filing and organizing information for the group.
If email arrived from someone new, she would set up a new folder. Her colleagues
often asked her for the location of information they were looking for. The team
primarily used the folder system within MS Outlook but also filed emails on the
shared network G: drive, especially those that were needed by other team members.
Participants frequently “lost” documents and emails, and spent time looking for
them across Outlook folders, personal network drives, and the shared drive. They
searched Outlook by a sender or by using the “advanced search” feature. This was
often a frustrating experience: “Outlook searches against names and sometimes
keywords, but never by dates. It would be good if search could break out the results
by months rather than just a long list.”
In many instances, the information was embedded in received and sent emails
that were carefully archived in network folders. However, the context of an attached
document or email was lost once removed from Outlook and placed on a local or
a shared drive. There it was not possible to search on all the email attributes: “It is not
as easy to search on the H (personal network) drive as it is in Outlook. Explorer does
not recognize names, just keywords in files. No option to search by asking who did
you get it from or who sent it to you.” Indeed, our study pre-dated the desktop search
releases by Google and Microsoft. Google delivered this feature in the form of a plug-in
associated with the Web browser toolbar. Previous research did show the importance of
search as a unifying access medium across application data silos (Dumais et al. 2003).
15.2.2 Changing Information Management Metaphors
Dynamic, fast-paced environments, such as the one we observed, are becoming
more commonplace (Gonzalez and Mark 2004). The workers have to accommodate
emergent activities and work reactively when needed. They are required to accom-
plish multiple tasks in parallel, despite the constant interruptions. As the study
revealed, their experience is affected by several distinct issues:
• Increased rate of acquiring and handling information leads to higher demands for
multi-tasking and micro-switching between applications
• Information silos created by desktop applications hinder the user tasks that
involve multiple applications and services
• Communication media, such as email, are emerging as a unifying function that
includes and contextualizes digital artifacts but lack appropriate access and
archiving support.
Over the years, researchers have proposed alternatives to the traditional PC
desktop metaphor and designed tools to address similar information management
issues. (Kaptelinin 2003) provides a comprehensive review of proposed models and
tools. Among them are attempts to create dedicated project spaces, such as Rooms
(Henderson and Card 1986), Task Gallery (Robertson et al. 2000), Manufaktur,
Kumira, and similar. Well aligned with our observations are recommendations to
create communication based work environments such as the Taskmaster and
ContactMap. These approaches either imposed a significant overhead onto the
workers’ time or excluded important work scenarios. Systems like Lifestreams,
Presto, MIT Semantic File System, and MIT Haystack attempted to remove the
overhead of maintaining the hierarchical organization of the file system but had
limited success. By relying on search and filtering, the users had to define formal
criteria to select relevant information and that has proven to be difficult in general.
In search of appropriate metaphors to support the user across the evolving
paradigms, from desktop and Internet to cloud and multi-device computing, we
consider two important aspects: the shift from storage management to content
organization and the benefit of activity-focused management of information to capture
a rich usage context.
15.2.2.1 Content Storage and Organization
Considering the use of Web based authoring, e.g., through blogs, forums, online
wikis, and social sites like Facebook and Flickr, we observe a separation between
storing and organizing the content. The storage is tightly coupled with the Web appli-
cation and hidden from the author. Thus, the user is mainly focussed on organizing
information through the facilities provided in the user interface. Organizing content
across Web services is accomplished through the Internet browser bookmarks that
include URL references to the content. Instead of managing the digital artefacts
themselves, the users are organizing resource links that are resolved through the
Internet services in order to present the content upon the user request.
Within the desktop environment, the separation of storage and content organi-
zation can be observed in the design of applications such as MS OneNote that is
based on the notebook metaphor. The storage of individual OneNote pages and
files embedded in the pages is not transparent. Yet, the user can organize infor-
mation in a variety of ways. Storage and organization of the content are completely
enclosed within the single application interface.
We also note that user activities in forums, discussion groups, Facebook, and
Twitter tightly integrate publishing and authoring and emphasize instant exposure
of the content. This is facilitated by constraining the user interaction and content
format to a predefined organizational structure. By supplying templates that deter-
mine organization and interaction, these services redirect the focus from content
storage and organization to the information broadcast.
15.2.2.2 Application and Activity Management
User interaction with digital information comprises authoring, storing, viewing,
organizing, and sharing content through communication and publishing media.
Optimization of these functions within and across applications directly impacts
the user experience and the outcome of the user activities.
Furthermore, from the study we observed that email, as a content rich communication
medium, captured resources related to the user tasks and provided valuable context
for embedded documents. Generalizing this principle, we anticipate that mechanisms
for capturing association of resources based on user activities would be beneficial.
Refocusing information organization towards activities rather than software
applications was proposed as early as 1983 (Bannon et al. 1983). However, this
has to be approached carefully. Resorting to the workspace metaphor and clearly
delineating individual activities may cause problems with activity switching, similar
to those observed with application switching. Furthermore, user activities may be
transient, dynamic, or not yet completely formed (Bardram et al. 2006; Kaptelinin
2003). Thus, the user may have difficulty pre-defining their activities. The same
problem was observed in the organization of applications and documents into “work-
ing spheres” suggested by (Gonzalez and Mark 2004; Mark et al. 2005).
Persisting the activity state is likely to help with interruptions that disrupt the
planned workflow and often originate from communication. Czerwinski et al.
(2000) looked at the implications of IM for interruptions in task management.
In a diary study, Czerwinski et al. (2004) analyzed the requirements for tools
to aid with the recovery from interruptions and developed the GroupBar tool
(Smith et al. 2003), which enables the user to organize application windows in sets
that can be easily folded or evoked to switch from one task to another.
15.2.2.3 Unified User Experience
Considering the interconnection of authoring, storing, viewing, organizing, and
sharing digital content, one is presented with a dilemma whether to
• Support all or most of these phases in each application
• Provide a meta-layer that enables interconnection of applications and leverages
complementary functions across them.
Reflecting upon the work that has been done so far, we see several trends.
(Bellotti et al. 2003) implemented the Taskmaster, a system designed for tight
integration of email, task, and project management. They introduced thrasks—threaded task-centered collections of items and aggregated rich metadata about the
task related content. (Boardman and Sasse 2004) suggested looking at a broader
spectrum of user activities to understand the benefits of alternative design strate-
gies. They contrast the Taskmaster approach with the approach taken in (Isaacs
et al. 2002) and (Kaptelinin 2003). There the design involves a consolidated
interface that unifies interaction across the tools. In the Stuff I’ve Seen system by
(Dumais et al. 2003) this is accomplished through the unified search interface.
In UMEA (User-Monitoring Environment for Activities) by (Kaptelinin 2003)
the system provides a project centered overview of the information space with
integrated user monitoring function and user data.
With these insights, we decided to explore a hybrid approach. We designed
a unifying tagging facility for gathering activity related resources across desktop
applications and on-line services. At the same time, we extended each application with
the activity management features that enable task switching and micro-switching from
the context of the current application. We essentially move towards the metaphor of
virtual collections where light-weight tagging is a mechanism for collecting relevant
content from remote or local repositories, in the context of the user work.
15.3 Supporting Activities in Dynamic Work Environments
The user’s perspective and objectives evolve over time and that is reflected in the
reuse of existing documents or creation of new ones. A collection of digital
resources used at a given time represents a stage in the user’s activity and offers
a valuable context. In order to capture these stages, we implemented a TAGtivity
prototype that enables light-weight tagging of resources relevant to the user work
including local and remote documents, email messages, Web pages and storage
locations such as folders, services, and databases (Oleksik et al. 2009). Labelling of
resources can be done at the desktop level and within individual applications to
enable tagging and micro-switching without leaving the current application context.
A similar approach has been taken by (Voida et al. 2008), with the Giornata
system, and promoted through the Placeless Documents project and the resulting
Presto system (Dourish et al. 1999). With Giornata, the user can operate in multiple
virtual desktops and separate resources associated with distinct activities. Any file
accessed within a specific desktop is automatically linked to the corresponding
desktop tag. One can also assign tags to individual files but that was designed
as a less accessible feature, through the file property settings. Presto, on the other
hand, allowed users to specify and apply attributes to individual documents and use
them to retrieve, index, and organize documents for specific tasks. While the users
could browse and further tag the resulting collections through a purposefully
designed Vista browser, the tagging facility was not closely integrated with the
desktop applications and the user’s workflow.
We have learnt from these systems and based the TAGtivity prototype on two
essential design principles:
1. A user activity is represented as a set of references and metadata rather than the
document files themselves, and
2. Document tagging is light-weight, based on the user generated labels. That
reduces the overhead of maintaining a strictly controlled vocabulary of tags
and maximizes the flexibility of use.
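
As an illustration only, the two principles could be sketched in Python as follows. The class and field names (ResourceRef, TagStore, location, kind) are hypothetical and are not taken from the TAGtivity implementation; the sketch merely shows an activity held as a set of references under a free-form label.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ResourceRef:
    """A reference to a resource, not the resource itself (hypothetical fields)."""
    location: str  # file path, URL, or folder location
    title: str
    kind: str      # e.g. "document", "email", "web page", "folder"


@dataclass
class TagStore:
    """Maps free-form, user-generated labels to sets of references."""
    tags: dict[str, set[ResourceRef]] = field(default_factory=dict)

    def tag(self, label: str, ref: ResourceRef) -> None:
        # No controlled vocabulary: any label the user types becomes a tag.
        self.tags.setdefault(label, set()).add(ref)

    def remove_tag(self, label: str) -> None:
        # Only the association is dropped; the referenced content is untouched.
        self.tags.pop(label, None)

    def tags_for(self, ref: ResourceRef) -> set[str]:
        # The same resource may belong to several activities at once.
        return {label for label, refs in self.tags.items() if ref in refs}
```

Because only references are stored, removing a tag in this sketch never deletes a file, and one resource can appear under any number of tags.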
We deployed TAGtivity in a natural user setting and observed the use of
tagging over time. We analyzed the emerging usage patterns and the principles of
organizing information that the users applied. The details of the study and the
results are reported in (Oleksik et al. 2009). Here we discuss the findings that are
pertinent to our reflections on the changing desktop metaphor. In particular, we
demonstrate the use of references to create logical organization of distributed
and heterogeneous resources that complement the file system organization. We
also point to the affordances of tagging that help with the issues identified in
Sect. 15.2: the management of dynamic content streams and the increased demand
for multi-tasking and attentiveness to emerging tasks.
15.3.1 TAGtivity Features
The TAGtivity prototype consists of two UI components, the TAGtivity Manager
implemented as a deskbar and the TAGtivity Toolbar associated with individual
applications. They both enable the users to assign existing tags to resources or
create a new tag as needed. The prototype is compatible with Microsoft Windows 7,
Vista and XP operating systems, and the Microsoft Office 2007 suite. In addition to
the MS Office documents, one can tag a broad range of document types by using
a drag-and-drop facility to associate them with the activity.
Figure 15.4 shows the TAGtivity deskbar that includes the TAGtivity Manager
(TM), a centralized place where users can manage tags and access their activities and
resources. The TM displays a list of the user’s tags in a selected order: alphabetically,
by recency of use, or by the number of associated resources. One can access a resource
by clicking on the title. The reference to the resource file is resolved via a database that
stores the association of tag labels and corresponding files or locations. The file or
location is then opened in the corresponding software application.
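
The tag list orderings and the database lookup described above can be approximated with a small SQLite sketch. The table layout, column names, and helper functions below are assumptions made for illustration; the chapter does not specify the schema used by the prototype.

```python
import sqlite3

# Hypothetical ORDER BY clauses for the three orderings named in the text.
ORDERINGS = {
    "alphabetical": "tag COLLATE NOCASE ASC",
    "recency": "MAX(last_used) DESC, tag COLLATE NOCASE ASC",
    "count": "COUNT(*) DESC, tag COLLATE NOCASE ASC",
}


def open_store():
    """Create an in-memory store of tag-to-resource associations."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tagged (tag TEXT, title TEXT, location TEXT, last_used TEXT)"
    )
    return conn


def add(conn, tag, title, location, last_used):
    """Record one association; last_used is an ISO date string."""
    conn.execute(
        "INSERT INTO tagged VALUES (?, ?, ?, ?)", (tag, title, location, last_used)
    )


def list_tags(conn, order="alphabetical"):
    """Return the tag labels in the selected order."""
    sql = f"SELECT tag FROM tagged GROUP BY tag ORDER BY {ORDERINGS[order]}"
    return [row[0] for row in conn.execute(sql)]


def resolve(conn, tag, title):
    """Look up the stored file or location behind a clicked title."""
    row = conn.execute(
        "SELECT location FROM tagged WHERE tag = ? AND title = ?", (tag, title)
    ).fetchone()
    return row[0] if row else None
```

In this sketch the caller would pass the resolved location to the operating system to open it in the corresponding application; that step is omitted here.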
The text box at the top of the list allows the user to access a specific tag or to
create a new one. By typing text into the text box, the list of tags is filtered to show
only matching tags. If the keyword is not found in the list, the user can choose to
Fig. 15.4 TAGtivity Manager comprising the list of user tags, tagged resources, and thumbnail
previews with metadata about each item
use it as a new tag. Once the tag is created or found, the user can drag and drop
a resource onto the tag to create the association.
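
A minimal sketch of this filter-or-create behaviour is given below. The matching rule (case-insensitive substring match, with creation offered when no tag matches the typed text exactly) is our assumption for illustration; the chapter does not state how the prototype matches text.

```python
def filter_tags(existing, typed):
    """Return (matching tags, whether the typed text may become a new tag)."""
    needle = typed.strip().lower()
    if not needle:
        # Nothing typed yet: show the full list and offer no new tag.
        return list(existing), False
    matches = [tag for tag in existing if needle in tag.lower()]
    already_exists = any(tag.lower() == needle for tag in existing)
    return matches, not already_exists
```

For example, with the tags "robotics" and "Robot arm", typing "robo" would narrow the list to both tags while still allowing "robo" to be created as a new tag.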
We also designed and implemented a TAGtivity Toolbar as an extension of
the main MS Office 2007 applications: Word, Excel, PowerPoint and Outlook, and
Internet Explorer 7 (IE7) (Fig. 15.5). Within the IE7 browser, each browser tab
is handled independently. TAGtivity Toolbars are located at the bottom of each
application window.
Similarly to the TAGtivity Manager, the toolbar also provides a text box for the
user to type in keywords and find existing tags or create new ones. The user can
attach a tag to the current resource by selecting a tag from the list or by typing a new
one into the text box.
One important aspect of TAGtivity is the integration with the file system.
TAGtivity enables the user to associate files and folders with tags. The user can
simply drag and drop a folder onto the tag and confirm whether to associate
individual files with the tag or the folder as a whole. In the former case, all the
files from the folder are added to the activity list and can be accessed independently.
In the latter case, the folder location is added to the list and the user can access the
folder content through the file system hierarchy presented in Windows Explorer.
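
The two ways of associating a dropped folder can be sketched as follows. The function name and the choice to include only the files directly inside the folder (not those in subfolders) are assumptions; the chapter does not say whether the prototype recurses.

```python
from pathlib import Path


def references_for_folder(folder, as_individual_files):
    """Return the references created when a folder is dropped onto a tag."""
    folder = Path(folder)
    if as_individual_files:
        # Former case: each file becomes its own entry in the activity list.
        return sorted(p for p in folder.iterdir() if p.is_file())
    # Latter case: a single entry pointing at the folder location itself.
    return [folder]
```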
Furthermore, the user can “export” a tag. The TAGtivity application then
creates a tag folder within the file system containing the links to resources and
metadata about files associated with the tag. The metadata typically includes the
title, the author, the storage location, and a thumbnail image of the application
Fig. 15.5 TAGtivity Toolbar for Microsoft PowerPoint, showing the set of tags associated with
the document and the list of items associated with the specific tag “NodeXL Study – UMD.” The
thumbnail of a specific item is displayed on mouse hover over the title
displaying the file or location. If desired, the export function can create a copy of
all the associated files, thus providing the archiving support for the past activities.
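
To make the export step concrete, the following sketch writes a tag folder containing a metadata manifest and, optionally, copies of the local files. The manifest file name, its JSON format, and the item fields are hypothetical choices for this illustration; thumbnails are left out, and a real implementation would also need to sanitize labels that are not valid folder names.

```python
import json
import shutil
from pathlib import Path


def export_tag(label, items, dest_root, copy_files=False):
    """Write a folder for the tag holding links and metadata for its items.

    items is a list of dicts with "title", "author", and "location" keys.
    """
    tag_dir = Path(dest_root) / label
    tag_dir.mkdir(parents=True, exist_ok=True)
    manifest = []
    for item in items:
        entry = dict(item)
        source = Path(item["location"])
        if copy_files and source.is_file():
            # Optional archiving: keep a copy of the file next to the manifest.
            shutil.copy2(source, tag_dir / source.name)
            entry["archived_copy"] = source.name
        manifest.append(entry)
    manifest_path = tag_dir / "manifest.json"
    manifest_path.write_text(json.dumps(manifest, indent=2), encoding="utf-8")
    return tag_dir
```

Remote resources such as Web pages are kept as links only in this sketch, since there is no local file to copy.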
15.3.2 TAGtivity User Study
We observed the use of the TAGtivity prototype by 16 participants over the period
of 3 weeks (Oleksik et al. 2009). The participants included four employees of
a small software development company, seven research interns, three full-time
research scientists, one intern with a legal department, one independent market
researcher, and one small business owner. Participants were aged between 20 and
60, 14 male and 2 female. For their participation, they were compensated with
computer software or hardware accessories at the end of the study.
During the study we conducted four interviews with each participant, first to
capture information about the existing data management practices and then to learn
about their use of TAGtivity over time. We recorded and transcribed the interviews
and analysed TAGtivity logs to study the relationship between the tags and the
projects, tasks, and activities that the users conducted during the time of the study.
This revealed that:
• Tagging extends the file system function by providing additional views or
logical organizations of the content in the file system.
• With tags, the users can capture ephemeral information that would otherwise
be unrecorded since the users would not create a folder to hold it.
• Tagging supports activity management. It helps with collecting resources related
to a task, enables flexible switching between tasks, and allows association of the
same resource across multiple tasks.
A detailed study report is provided in (Oleksik et al. 2009). Here we outline
several aspects that are of interest to our discussion of the changing paradigms and
metaphors.
The tags were used for the management of ephemeral information, as assistance
for time saving, planning and emerging activities, and for reorganizing and repurposing
content across tasks through alternative logical grouping of the data.
These aspects are directly relevant to the heightened demand on the respon-
siveness and multi-tasking that the users are exposed to, as observed in Sect. 15.2.
In the extended PC environment, the relatively stable organization of files and
folders has been replaced by a new metaphor, the flow of information. Diverse
content streams have penetrated the PC environment through the Internet browser,
email, and RSS feeds and require support for capturing, organizing, and revisiting
relevant content. Similarly, increased communication triggers new tasks and
requires support for planning of activities.
From the interviews we found, for example, that tags offered the means of low-
overhead planning and time saving:
• Place holding. Eleven users created tags as place holders for future activities.
For example, one of the participants created a tag to gather interesting papers
and online links related to robotics. However, he did not add any items to the tag
until a week later. In this instance, the tag was created in anticipation of
finding relevant resources at some future point.
• New project. Eleven users created a tag at the beginning of a new project. Unlike
place holding tags, these were created with the intention to label relevant
documents right away.
• Time saving. Ten users created a tag to mark resources that were difficult to find.
This was often the case with documents found through search and browsing.
By bookmarking the item using a tag, the users circumvented the need to engage
in the same process again.
Furthermore, the tags were used for information gathering across desktop and
online services, normally not well supported by the standard desktop facilities.
They enabled groupings of heterogeneous resources that could not have been put
together through a single application, i.e., the file browser or the Internet browser.
During the study 742 resources were tagged and 608 were accessed through the
TAGtivity user interfaces. They covered a broad range of tagged resources, includ-
ing 157 email messages, 98 Web pages, and 174 Word or PDF documents. This
confirmed the important property of the tags as a unifying mechanism for accessing
resources across data silos and capturing relevant information from email streams.
Finally, the tags provided a unique advantage over the existing, more static
and permanent filing metaphor. By convenient capture of references to relevant
content, they supported short term, transient, and emerging activities and enabled
meta-organization of resources needed in the user’s tasks.
Indeed, TAGtivity was found effective for managing short term tasks and early
stages of longer term activities. This was observed with 12 users. Tags enabled
them to collect and associate resources before a task was well formulated. The tag
names could be easily modified as the task progressed. For short term tasks,
TAGtivity helped to manage resources up to the task completion, at which point
the tag would be removed if the resources were not deemed relevant any more.
Generally, the tags were kept and left traces of transient activities that would
normally not warrant creation of a file or a bookmark folder.
TAGtivity was also used to create alternative views, i.e., logical organizations
of the content in the file system or other data stores. One of the participants, who
conducts market research for various customers, stated that TAGtivity enabled
alternative ways to “organize my files without creating them [folders]; so it helped
me group them based on my processes and my needs.”
15.3.3 Discussion
The study of TAGtivity revealed that simple desktop tagging can provide significant
benefits. Participants often commented on the ease and low overhead of creating tags.
This perception made tagging attractive even for the most transient activities.
It enabled the creation of collections that could be easily dispersed when not further
needed. Indeed, since tags only reference the content, deleting a tag is low risk, almost
non-consequential. This fine interplay between the persistent file storage and the tags
supports a rich set of new practices and accommodates transient user needs and early-stage
information management.
While TAGtivity is not an activity management application in the sense of
(Smith et al. 2003), (Boardman and Sasse 2004) and (Bardram et al. 2006), it has
proven to support users in performing their everyday tasks. By enabling tagging
from within applications, it supports multi-tasking. The users can tag the document
with multiple tags and easily access related resources without changing the current
context. This also helped with interruptions since the user can capture resources of
an emerging task without shifting the focus of their work. Finally, through tagging
the users brought together and made easily accessible the content that was other-
wise buried in the application specific data stores, like email exchange, or the file
system hierarchy. The visibility of tags and tagged resources raised awareness
and served as a reminder of activities that required attention.
Through the observed practices and feedback from the users, we gained valu-
able insights about the very essence of tagging, i.e., the use of references to create
associations among digital objects. The importance of the “broken” references and
the access control became apparent. If tagging were to be adopted broadly,
it would need to be supported by extensions of the file system with a notification
mechanism for tracking changes in the file status such as deletion or movement
to another storage location. With regard to the access control, the tagged items
currently inherit it from the file management system. In the future, this can be
reversed by enabling the users to specify access properties at the tag level and have
them propagated to the file and folders of the tagged resources. Finally, sharing of
tagged resource collections and reuse of tags by multiple individuals opens up
a host of additional design considerations. For example, (Mendes Rodrigues et al.
2008) outline practices that evolve around social tagging of user-generated content in
online communities.
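
The “broken” reference problem can be illustrated with a simple check. Note that this sketch polls the file system on demand, which is a stand-in for, not an implementation of, the notification mechanism proposed above; the function name and the rule for skipping remote locations are assumptions.

```python
from pathlib import Path


def find_broken(locations):
    """Return the local references whose target no longer exists."""
    broken = []
    for location in locations:
        if "://" in location:
            # Remote resources (e.g. Web pages) are not checked in this sketch.
            continue
        if not Path(location).exists():
            broken.append(location)
    return broken
```

A polling check of this kind can only report that a file has disappeared; it cannot tell whether the file was deleted or moved, which is why tracking changes at the file system level would be preferable.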
In summary, activity management as a principle of organizing information and
tagging as a supporting mechanism facilitate an important shift from content
to activity management. They form important building blocks for the emerging
computing paradigm, the computational cloud with distributed data and services
and multi-device personal computing. We reflect on these in the following section.
15.4 Glimpse into the Future
As our studies have shown, through the Web enabled desktop environment indivi-
duals are accessing information and media from distributed data repositories
and services. The use of the Web has caused a significant shift in their practices,
placing more emphasis on publishing and consuming the content. For that reason,
the notions of the document location and access control have not been central to
the design of information management support on the Web. Indeed, the document
location is important only insofar as the document URL ensures a repeated access.
Thus, the use of bookmarks and search engines to revisit documents has been the
primary means of accessing and managing Web information by the user.
With regard to content authoring, Web blogs and discussion posts are facilitated
by a simplified authoring environment and a limited control over the content storage
and organization. On the other hand, a broad adoption of Web mail services shaped
the authoring experience through the features of email editors and introduced a new
storage paradigm for personal information. Indeed, services like Gmail (www.gmail.com)
by Google (www.google.com), Yahoo! Mail by Yahoo! (www.yahoo.com), and
Hotmail (www.hotmail.com) by Microsoft (www.microsoft.com) enabled a broad
user community to create substantial repositories of emails and associated content.
The observed transformation of email from a communication medium into a context
rich data store (Sect. 15.2) has now reached broad adoption, primarily shaped by the
archiving and content management capabilities of the Web mail services.
Considering the continued expansion of personal computing environments with
novel devices and new modes of accessing services, e.g., through mobile phone
applets, a unified resource management will become essential for the optimal
user experience. Providing an adequate user support will require new metaphors
to guide the design of the information management models and features.
15.4.1 Metaphor Transformations
In this section we reflect upon the new notions that are emerging and the role
they are expected to play in shaping future information systems and practices.
15.4.1.1 From Folders to Collections of Distributed Resources
Broad adoption of Web and mobile platforms has highlighted issues with the user
experience in the highly distributed, heterogeneous, and disconnected data stores.
The TAGtivity study showed the benefit of abstracting the notions of files, folders,
and bookmarks and introducing mechanisms for flexible content organizations
based on references to digital objects.
This points to collections of digital items as the fundamental element of the
data organization, represented by the metadata that refers to the data files and their
locations. Such collections arise in different contexts, assembled using various
mechanisms from hand gestures on the multi-touch devices to tagging as demon-
strated through TAGtivity.
15.4.1.2 From Lists to Context Rich Views
In most content management systems, the basic organization structure of resources
is a list, optimized for easy access through sorting and browsing. However, the lists
provide limited support for capturing and describing relationships among individual
resources in the collection or adding information about the collection itself. At the same
time, it is often important to record the meaning and the purpose of the collected
resources.
Building on the TAGtivity model, we propose descriptive representations of the
collections called resource maps (Fig. 15.6). Resource maps can be created auto-
matically from the metadata captured during user interaction with the digital items.
By importing desired content descriptors into an authoring or interactive environ-
ment, one can generate maps through established templates or allow the users to
compose them manually. Once persisted, the maps provide a rich access mecha-
nism to the collection items and become an integral part of the collection.
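
As a rough sketch of template-based generation, the code below turns captured metadata into a plain-text outline grouped by resource type. The resource maps we envisage are visual; the text outline, the item fields, and the template used here are simplifying assumptions for illustration.

```python
from collections import defaultdict
from string import Template

# Hypothetical template for one entry in the map.
ITEM_TEMPLATE = Template("  - $title <$location>")


def render_resource_map(collection_title, items):
    """Render a collection as an outline grouped by resource kind.

    items is a list of dicts with "title", "kind", and "location" keys.
    """
    groups = defaultdict(list)
    for item in items:
        groups[item["kind"]].append(item)
    lines = [collection_title]
    for kind in sorted(groups):
        lines.append(f"{kind}:")
        for item in sorted(groups[kind], key=lambda entry: entry["title"]):
            lines.append(
                ITEM_TEMPLATE.substitute(
                    title=item["title"], location=item["location"]
                )
            )
    return "\n".join(lines)
```

Each rendered entry keeps the location of the resource, so the map can serve as an access mechanism as well as a description of the collection.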
The notions of the collections and the corresponding resource maps are impor-
tant vehicles for creating knowledge in different contexts, from gathering and
reflecting upon content during brain-storming to sharing knowledge through
discussions and publishing.
Fig. 15.6 Creating visual representations of a distributed and heterogeneous collection of
resources by tagging the resources, exporting the metadata into an authoring application and
saving the map in a sharable format with active links to the resource files, Web pages, or services
15.4.1.3 From Documents to Composite Information Constructs
In order to support a broader range of information needs, it is often necessary to
relinquish the traditional boundaries of files and applications. The users organize
information into compositions of digital information created by combining parts
of documents rather than using collections of files. Indeed, knowledge is often
conveyed by bringing together text passages, images, and media content that,
in the new assembly, convey a meaning beyond individual parts.
In summary, we suggest that the traditional file organization, database schemas,
and hypertext forms can be further enhanced with the context-rich representations
of digital collections in terms of resource maps and content compositions. These
new representations capture the semantics and enable sharing and propagation of
knowledge. At the same time they are connected with the resources and facilitate
an easy switch to a detailed view of individual files. They become the essential means
for browsing and reusing information from distributed information environments
and services.
15.5 Concluding Remarks
In this chapter we reflected upon the evolution of information management
practices over the past decade and extrapolated into the next era of information
management in the computational cloud and across distributed data stores, services,
and devices. Direct observations of users revealed issues with the support for
content management, from authoring, storing, and viewing to publishing and
sharing. These have been amplified by the transformation of the PC desktop through
the connectivity to the Internet and the prolific use of Web services. Based on
the related research and our own user studies, we hypothesize about the metaphors
that are likely to shape content management in the cloud computing environment
and across multiple personal computing devices. We promote an activity based
approach in which the fundamental elements are collections of resources, including
content locations and applications relevant for the user task. We demonstrate how
such collections can be created through light-weight tagging. With low overhead
and flexibility in creating, naming, and dispersing collections, this approach
minimizes the disruption to the main user task. However, it is important to explore
further the implications of managing and organizing a potentially large number of
tagged collections and dealing with the complexities that arise from multiple
contexts in which digital objects may be used.
We conclude by mentioning several key aspects that are likely to shape
the future of content management and require further consideration. First, as more
valuable information is stored in digital form, there is a need for its secure
persistence over long periods of time. At the same time, we are moving away
from static files towards “live” and dynamic documents. Second, a substantial
amount of information is now generated through social media and various forms of
crowdsourcing efforts. This trend will continue to shape the future content services
and information management practices.
Acknowledgments Prototypes and research studies described in this chapter were conducted by
the Integrated Systems Group at the Microsoft Research Lab in Cambridge, UK, in collaboration
with Instrata Ltd., Cambridge, UK. Special acknowledgments go to Yvonne Sanderson and Gerard
Oleksik from Instrata Ltd. for their work on the user studies. Eduarda Mendes Rodrigues
conducted quantitative analyses and, together with Gabriella Kazai and Annika Hupfeld, worked
on the design of the TAGtivity features. We particularly recognize the work of Gavin Smyth, who
is solely responsible for the implementation and deployment of the TAGtivity software. The prototype
is publicly released as the Microsoft Research Project Colletta:
http://research.microsoft.com/en-us/um/cambridge/projects/ResearchDesktop/ProjectColletta.
References
Bannon L, Cypher A, Greenspan S, Monty ML (1983) Evaluation and analysis of users’ activity
organization. In: Proceedings of CHI 1983. ACM Press, New York, pp 54–57
Bardram JE, Bunde-Pedersen J, Soegaard M (2006) Support for activity-based computing in
a personal computing operating system. In: Proceedings of CHI 2006. ACM Press,
New York, pp 211–220
Bellotti V, Ducheneaut N, Howard M, Smith I (2003) Taking email to task: the design
and evaluation of a task management centered email tool. In: Proceedings of CHI 2003.
ACM Press, New York, pp 345–352
Boardman R, Sasse MA (2004) “Stuff goes in the Computer but it doesn’t come out”: a cross-tool
study of personal information management. In: Proceedings of CHI 2004. ACM Press,
New York, pp 583–590
Cutrell E, Robbins DC, Dumais S, Sarin R (2006) Fast, flexible filtering with phlat – personal
search and organization made easy. In: Proceedings of CHI 2006. ACM Press, New York,
pp 261–270
Czerwinski M, Cutrell E, Horvitz E (2000) Instant messaging and interruptions: influence of task
type on performance. In: Proceedings of OZCHI 2000. ACM Press, New York, pp 356–361
Czerwinski M, Horvitz E, Wilhite S (2004) A diary study of task switching and interruptions.
In: Proceedings of CHI 2004. ACM Press, New York, pp 175–182
Dourish P, Edwards K, LaMarca A, Salisbury M (1999) Presto: an experimental architecture for
fluid interactive document space. ACM Trans Comput-Hum Inter 6(2):133–161
Ducheneaut N, Bellotti V (2001) E-mail as habitat: an exploration of embedded personal infor-
mation management. Interactions 8(5):30–38
Dumais S, Cutrell E, Cadiz J, Jancke G, Sarin R, Robbins D (2003) Stuff I’ve seen: a system
for personal information retrieval and re-use. In: Proceedings of SIGIR 2003. ACM Press,
New York, pp 72–79
Fetterly D, Manasse M, Najork M, Wiener J (2003) A large-scale study of the evolution of
web pages. In: Proceedings of WWW 2003, pp 669–678
Gonzalez VM, Mark G (2004) Constant, constant multi-tasking craziness: managing multiple
working spheres. In: Proceedings of CHI 2004. ACM Press, New York, pp 113–120
Henderson DA, Card S (1986) Rooms: the use of multiple virtual workspaces to reduce space
contention in a window-based graphical user interface. ACM Trans Graph 5(3):211–243
Isaacs E, Walendowski A, Whittaker S, Schiano DJ, Kamm C (2002) The character, function,
and styles of instant messaging in the workplace. In: Proceedings of CSCW’02. ACM Press,
New York, pp 11–20
Jones WP, Bruce H, Dumais S (2001) Keeping found things found on the web. In: Proceedings
of CIKM 2001. ACM Press, New York, pp 119–126
Jones R, Milic-Frayling N, Rodden K, Blackwell A (2007) Contextual method for the redesign
of existing software products. Int J Hum Comp Interact 22(1–2)
Kaptelinin V (2003) UMEA: translating interaction histories into project contexts. In: Proceedings
of CHI 2003. ACM Press, New York, pp 353–360
Kerne A, Koh E, Smith SM, Webb A, Dworaczyk B (2008) CombinFormation: mixed-initiative
composition of image and text surrogates promotes information discovery. ACM Trans Inf
Syst 27(1):1–45, Article 5
Mackay WE (1988) More than just a communication system: diversity in the use of electronic
mail. In: Proceedings of CSCW 1988. ACM Press, New York, pp 26–28
Mark G, Gonzalez VM, Harris J (2005) No task left behind? Examining the nature of fragmented
work. In: Proceedings of CHI 2005. ACM Press, New York, pp 321–330
Mendes Rodrigues E, Milic-Frayling N, Fortuna B (2008) Social tagging behaviour in community-
driven question answering. In: Proceedings of the 2008 IEEE/WIC/ACM International
Conference on Web Intelligence, WI’08, IEEE 2008, Sydney, Australia, pp 112–119
Milic-Frayling N, Sommerer R, Tucker R (2002) MS WebScout: web navigation aid and personal
web history explorer. Poster paper 170, WWW 2002
Milic-Frayling N, Jones R, Mendes Rodrigues E (2006) User study of interconnection
among communication, authoring, and information management processes. Microsoft research
technical report MSR-TR-2006-96
Oleksik G, Wilson ML, Tashman C, Mendes Rodrigues E, Kazai G, Smyth G, Milic-Frayling N,
Jones R (2009) Lightweight tagging expands information and activity management practices.
In: Proceedings of CHI 2009. ACM Press, New York, pp 279–288
Robertson G, van Dantzich M, Robbins D, Czerwinski M, Hinckley K, Risden K, Thiel D,
Gorokhovsky V (2000) The task gallery: a 3-D window manager. In: Proceedings of CHI
2000. ACM Press, New York, pp 494–501
Smith G, Baudisch P, Robertson GG, Czerwinski M, Meyers B, Robbins D, Andrews D (2003)
GroupBar: the taskBar evolved. In: Viller S, Wyeth P (eds) Proceedings of the 2003
Australasian Computer-Human Conference, OzCHI 2003. CHISIG, pp 34–43.
http://www.ozchi.org/proceedings/2003/ozchi2003.pdf. Accessed 15 Aug 2011
Teevan J, Dumais ST, Liebling DJ, Hughes R (2009) Changing the way people view changes
on the Web. In: UIST 2009. ACM Press, New York, pp 237–246
Voida S, Mynatt ED, Edwards WK (2008) Re-framing the desktop interface around the activities
of knowledge work. In: Proceedings of UIST 2008. ACM Press, New York, pp 211–220
Part V
Conclusions
16
Conclusions
Paul Warren, John Davies, and Elena Simperl
16.1 Introduction
We started this book by describing three challenges which we saw as important for
increasing the efficiency and effectiveness of knowledge work: the failure to fully
share knowledge in organisations, including knowledge about informal processes;
the problem of information overload; and the disruptive effect of continual changes
of task focus. These beliefs were confirmed by our work in the ACTIVE case
studies, described in Part III of our book. We also started our book by talking about
three technology areas which we thought were important in tackling these
problems: the synergy of the informal approach of Web 2.0 and a more formal
approach to semantics based on the use of ontologies; the use of context to deliver
information related to the user’s current task; and tools to support the informal
processes which underlie how we undertake our daily work. ACTIVE’s work to
develop these technologies has been described in Part II; the work in the case
studies has also confirmed the importance of these technologies. Moreover, we
found that others shared the same intuition about the problems of knowledge work
and were investigating related technological solutions. Some of this work is
described in Part IV of our book. In this chapter we briefly review these three
technology areas, in each case looking at the likely trends for the future and
indicating some research challenges.
P. Warren (*)
Eurescom GmbH, Wieblinger Weg 19/4, D-69123 Heidelberg, Germany
e-mail: [email protected]
J. Davies
British Telecommunications plc., Orion G/11, Ipswich, IP5 3RE, Adastral Park, United Kingdom
e-mail: [email protected]
E. Simperl
Institute AIFB, Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany
e-mail: [email protected]
P. Warren et al. (eds.), Context and Semantics for Knowledge Management, DOI 10.1007/978-3-642-19510-5_16, © Springer-Verlag Berlin Heidelberg 2011
16.2 Web 2.0 and Semantic Technologies
A prominent feature of Web 2.0 has been the popularity of wikis. Semantic wikis,
representing a synergy of Web 2.0 and semantic technology, are now an important
topic. At recent conferences addressing the application of semantic technologies to
industry, the use of semantic wikis has been a major theme. This theme was taken
up in Chap. 3, which described work in ACTIVE to extend the use of the Semantic
MediaWiki (SMW). Complementing this, Chap. 12 described both how the concept of the
SMW is being developed to make it better adapted for use in the enterprise and how
the SMW is being used for enterprise process and application modelling.
Another important aspect of Web 2.0 is the use of tags to describe web pages,
photos and all forms of information objects, thereby creating folksonomies.
Folksonomies can be contrasted with the ontologies of more formal semantics, or
the semi-formal taxonomies frequently used in content management systems.
Folksonomies are perceived as offering a lower barrier to use, in particular by
occasional users. Chapter 15 describes a tagging system and users’ reaction to it. In
ACTIVE we have supported the creation of folksonomies through the use of
machine intelligence to make tag suggestions. Another way in which Web 2.0 and
more formal semantic technologies come together is through the use of algorithms
for learning taxonomic or ontological structures from folksonomies. There has been
significant work in this area. For example, Heymann and Garcia-Molina (2006) have
developed an algorithm which converts a tag cloud into a hierarchical taxonomy.
The starting point is to create a tag vector for each tag, of dimensionality equal to
the number of objects, and such that the component in each dimension is the
number of times the tag has been applied to a particular object. From this, the
cosine similarity between tag vectors is used to calculate the similarity between
tags. These similarities are used by the algorithm to create a taxonomy. Hotho and
Jäschke (2010) provide an overview of the state of the art in ontology learning from
folksonomies, as of late 2010.
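The first two steps of this approach, building a tag vector per tag and comparing tags by cosine similarity, can be sketched as follows. This is a minimal illustration and not the published implementation: the tagging data, the function names tag_vectors and cosine_similarity, and the sparse dictionary representation are all assumptions made for the example, and the later step that turns the similarities into a taxonomy is not shown.

```python
from collections import Counter
from math import sqrt

# Hypothetical tagging events: one (tag, object) pair per application of a tag.
TAGGINGS = [
    ("python", "doc1"), ("python", "doc2"), ("python", "doc2"),
    ("programming", "doc1"), ("programming", "doc2"), ("programming", "doc3"),
    ("cooking", "doc4"),
]


def tag_vectors(taggings):
    """Map each tag to a sparse vector {object: times the tag was applied to it}."""
    vectors = {}
    for tag, obj in taggings:
        vectors.setdefault(tag, Counter())[obj] += 1
    return vectors


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse tag vectors (0.0 if either is empty)."""
    dot = sum(count * v[obj] for obj, count in u.items() if obj in v)
    norm_u = sqrt(sum(count * count for count in u.values()))
    norm_v = sqrt(sum(count * count for count in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    vectors = tag_vectors(TAGGINGS)
    tags = sorted(vectors)
    # Print the similarity of every pair of distinct tags.
    for i, first in enumerate(tags):
        for second in tags[i + 1:]:
            print(first, second, round(cosine_similarity(vectors[first], vectors[second]), 3))
```

In this toy data, tags that are applied to overlapping sets of objects ("python" and "programming") receive a high similarity, while tags that never share an object ("cooking" and the others) receive zero. A taxonomy-building step would then use these pairwise values to decide which tags to place close together in the hierarchy.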
A related theme is that of the social semantic desktop, discussed in Chap. 13.
Here we saw how the semantic interpretation of available metadata could enrich the
user’s experience. This approach requires neither the creation of formal, ontological
metadata nor informal tags. Instead, it makes use of metadata that is available
but normally not exploited.
In the coming years we are likely to see these synergistic approaches between
Web 2.0 and formal semantics being adopted more widely in the enterprise and also
in personal applications on the WWW. Amongst the research questions which
remain are:
• What kind of informal semantic structures, and modes of presentation, are most
natural to users? How does this vary between applications and types of users?
• How far can we improve machine learning algorithms to create ontologies from
user tagging? We have to bear in mind here that the quality of such an algorithm
is in part subjective; what matters is the user’s perception of the improved
experience made possible by the ontology.
• To what extent could the use of tags replace the use of conventional hierarchical
folders? Could this trend be encouraged by the automatic learning of ontological
structures from tag classes?
• To what extent does the notion of informal semantics, as implemented in the
SMW, scale to the WWW? Could web page editors be encouraged to associate
informal semantics with their hyperlinks, creating a kind of global SMW? Of
course, the rel attribute in RDFa permits the association of semantics with
hyperlinks, but this is in the world of formal semantics and requires a relatively
sophisticated user. Can we create user interfaces which encourage the creation
and use of vocabularies to describe web links and resources?
16.3 Context
We have already observed that context is a very broad concept, incorporating ideas
such as location as well as the task focus which was central in ACTIVE. During the
last decade of the previous century and the first decade of the current century there
was a series of conferences devoted to modelling and using context. The most
recent of these was held in 2007 (http://context-07.ruc.dk). A glance at the list of
papers reveals the range of topics covered. One topic included was the use of
ontologies. The joint themes of context and ontologies have been taken up in the
CIAO workshop series on Context, Information And Ontologies; see
http://ftp.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-626/ for the proceedings
of the 2010 workshop. Outside of the research arena, the analysts Forrester
(http://www.forrester.com) have highlighted the importance of context in informa-
tion delivery. Rugullies et al. (2007) report on a survey which asked North
American IT and business professionals about the importance of context. When
asked “how important is it that content is delivered to users within the context of the
business process in which they are involved?” 60% responded that it was
‘extremely important’ and 38% thought it was ‘somewhat important’. There was
also considerable support for just-in-time eLearning. When the same people were
asked “how important is it that eLearning is available at the point in time when the
user needs it?” 32% thought it was ‘extremely important’ and 51% thought it
‘somewhat important’. This supports the importance of the work-integrated
learning approach described in Chap. 14. Rugullies et al. also commented on the
need to “provide context with a sufficient level of simplicity to make implementa-
tion practical while at the same time addressing the intent of the worker”.
In ACTIVE, work on context integrated the explicit creation and manipulation
of context by the user with the automatic creation and management of context using
machine learning techniques. Chapter 5 described the former approach, depending
on user intervention; whilst Chap. 7 described some of the algorithms used to
automatically exploit context.
In a world where people’s information needs change rapidly, as does their mode
of interacting with information, context-based information delivery will remain
high on the research agenda and will increasingly find its way into user applications.
In this book we have outlined two complementary approaches. In ACTIVE we have
combined a relatively simple approach to modelling context with the use of
sophisticated machine-learning techniques. The approach described in Chap. 14
makes use of more sophisticated ontological modelling, while also employing
machine-learning techniques. Each approach will be appropriate for different domains.
However implemented, context is natural to the way people think and work and
we are likely to see greater use of context, both within applications, e.g., eLearning
as has been pioneered in APOSDLE; and for the general organization of informa-
tion, as has been pioneered by ACTIVE.
Two research themes stand out in their importance. On the one hand, algorithm
research needs to continue, to provide high quality results; acknowledging that this
quality is subjective and must be judged by its impact on the user experience. On
the other hand, the overall user experience needs to be improved to increase its
acceptability to the user. That means the user experience needs to be simple and
natural, and oriented towards achieving the user’s work goals. Moreover, where
machine learning is used to create recommendations, this must be done in a way
which does not intrude on the user. However good the algorithms, the
recommendations will not always be right, and false recommendations must be
easily ignored. In an extreme approach the user might not need to be aware of
context at all. Machine learning algorithms could make use of the concept
of context to make recommendations to users, without users having an awareness
of the context-related basis of those recommendations.
16.4 Informal Processes
The study of informal processes has come as much from the ethnographical
community as from the technological; see, e.g., Hill et al. (2006), which was referenced
in Chap. 1, and Kogan and Muller (2006), both from the same issue of the IBM
Systems Journal. The latter make some observations about the kind of tools
required for what they call ‘personal work processes’. Chapter 15 also reported
on observations by Microsoft researchers of people’s working practices and the
workflows they used. Chapter 6 described the tools which ACTIVE has developed
to assist in these personal work processes.
Motivated by the role that email plays in most people’s working processes, there
has been research to extract actions from incoming emails, e.g., Sow et al. (2006)
and Tagg et al. (2009). Work in ACTIVE has taken a different but related direction.
The project has developed Contextify (http://babaji.ijs.si/contextify/), an Outlook
plug-in which provides contextual information about a selected email. For example,
a sidebar displays information about the sender; a graphical display allows threads
to be visualized; attachments in a particular thread can be easily identified; the
social network of those participating in a thread can be displayed; and recipients of an
email can be suggested.
Supported by references to a number of pieces of field work, Kogan and Muller
(2006) also contend that even where we expect people to be following a formal
approach, actual practice often requires them to deviate from the prescribed pro-
cess. The ProM framework (http://prom.win.tue.nl/tools/prom/) exists to enable an
understanding of how business processes are actually executed. Several hundred
plug-ins are available to allow learning and visualization of actual process flows.
The informal processes which we have been concerned with in ACTIVE are more
varied than business processes executing on a workflow engine. Chapter 6 has in
part described ACTIVE’s response to the challenge this poses.
Processes underlie all knowledge work, and understanding informal processes is
crucial to improving the productivity of knowledge work. Hence process learning
will remain an important topic. Algorithm development needs to continue, both for
algorithms to detect processes and for algorithms to understand the user’s information
needs at each stage of a process. Here the user need not be seen in isolation; an open
research question is how much can be learned from users as a group, and how much
can be learned from one user and applied to another.
Hill et al. (2006) identified the user as an integrator, frequently cutting and pasting
between applications. Our current approach to knowledge work is application-centric;
our information systems are a set of badly-communicating applications. The need
for a different approach was recognized as long ago as the 1990s; e.g., Kolojejchick
et al. (1997) describe an information-centric paradigm for the user interface. The
problem of implementation is not just technical, but has to do with the nature of a
market which quite naturally offers a set of independent, often isolated applications.
Given the importance of this problem to knowledge work, it is likely we will see
innovative solutions in the coming years.
The same authors also saw the need for applications to exchange semantic
information. This is a theme taken up by the work on the social semantic desktop
described in Chap. 13.
16.5 Final Words
The information landscape has changed in the years which have elapsed since
ACTIVE, and the other initiatives described in this book, were conceived. The
volume of information has increased, and this has only served to increase the
problems we identified at the beginning of our work. The arrival of Linked Open
Data, not just as a concept but as a reality, has expanded the potential offered by
information on the internet. To achieve that potential we need even more to
understand the user’s information needs in context; to use semantics, but in a way
which is accessible; and to understand and exploit the users’ processes.
So the challenges outlined in our book remain pertinent today. We believe the
solutions we have described are pertinent also, and that in the coming years these
solutions will be exploited and further developed to contribute significantly to
increasing the effectiveness and efficiency of knowledge work.
References
Heymann P, Garcia-Molina H (2006) Collaborative creation of communal hierarchical taxonomies
in social tagging systems. Technical report 2006-10, Stanford University.
http://ilpubs.stanford.edu:8090/775/. Accessed 9 Aug 2011
Hill C, Yates R, Jones C, Kogan S (2006) Beyond predictable workflows: enhancing productivity
in artful business processes. IBM Syst J 45(4):663–682
Hotho A, Jäschke R (2010) Ontology learning from folksonomies. Tutorial at EKAW, Lisbon
Kogan S, Muller M (2006) Ethnographic study of collaborative knowledge work. IBM Syst J
45(4):759–772
Kolojejchick J, Roth S, Lucas P (1997) Information appliances and tools: simplicity and power
tradeoffs in the visage exploration environment. IEEE Comput Graph Appl 17(4):32–41
Rugullies E, Moore C, Markham R (2007) Context is king in the new world of work. Forrester
Research Inc, Cambridge, MA, USA
Sow D, David J, Ebling M, Misra A, Bergman L (2006) Uncovering the to-dos hidden in your
in-box. IBM Syst J 45(4):739–757
Tagg R, Gandhi P, Raaj S (2009) Recognizing work priorities and tasks in incoming messages
through personal ontologies supplemented by lexical clues. ECIS 2009 Proceedings, paper 163
Index
A
Ability, 275
ACTIVE
initial challenges, 4
scientific hypotheses, 5
technologies, 4, 325
ACTIVE knowledge work space (AKWS),
6, 7, 94, 154–155, 158, 161–167, 197
enterprise workspace, 154
local workspace, 154
management portal, 155
Microsoft applications, 154
Amazon, 13, 18, 19
AOL, 21
Aperture metadata extraction
framework, 272
APOSDLE, 8, 250, 276–299, 328
Apple, Google, Microsoft, Nokia, Research
in Motion, 23
ASBRU, 251
Assertional effects, 250, 252
Associative browsing, 260
Associative network, 286, 294, 295
Attention, 22
Authoring, 310
B
Beacon, 21
Benefits, 62, 69–71, 73, 78, 84, 88
Bid proposals, 151, 155–161
Bid unit, 151, 155, 161, 162, 164–165,
168–169
Blippy.com, 21
BPMN, 245, 247, 249, 253
Business indicators, 222, 226
Business models, 13, 14, 16, 21
Business processes, 108
Business process re-engineering, 241
C
Cadence Flow Infrastructure (CFI), 197
Cadence ProjectNavigator, 197
Case study
Accenture, 7
BT, 7
Cadence Design Systems, 7
Challenges, domain of data management, 17
Civil society, 22
Classification of knowledge structures, 34
Cloud, 302
Collaboration, 215, 216, 219, 220, 222–226
Collaboration support, 118
Collaborative filtering, 18
Collaborative process development, 117–119
Collective action, 18
Collective intelligence, 17–20
Competencies, 276, 280, 287, 291–294
Computational cloud, 316
Context, 5, 6, 152, 259, 276–283, 286–294,
327–328
association, 163, 165, 168
detection, 5, 103, 154, 162, 163, 166, 167
discovery, 5, 154, 162–164, 166, 167
elicitation, 265, 267
for knowledge-sharing, 152
loss, 107
mining, 130, 133, 135–144
model, 288
ontology, 289
sensors, 289
visualizer, 116–117, 154
Context-aware applications, 24
Context-aware service delivery, 23
Contextify, 6, 136–138, 328
Context-rich representations, 319
Context-rich views, 317–318
Context-sensitive content (information), 17,
22–23
Context-specific mobile solutions, 23
Cooperation, 278, 280, 283, 284, 287–288
Co-ordination, civil society, 22
Costs, 62–79, 81–84
estimation, 79
model, 71, 73, 80, 81
Crowd-sourcing, 19
D
Data representation, 128–129, 133–134
Descriptive learning guidance, 279, 298
Design environment, 277, 286, 299
Design Project Visualizer, 195
architecture, 196
back-end, 198–201
front-end, 202
validation, 203–209
Digital content, 310
Discovered contexts, 102
Discussion threads, 140
Domain and scope, 36
Domain of data management, challenges, 17
Dynamic data management, 23–24
Dynamic use contexts, 16, 17
E
E-bay, 18
eLearning, 327, 328
Email, 135–144, 328
evolution of, 8
threads (see Discussion threads)
time spent on, 3
Emerging activities, 314
Enterprise 2.0, 215, 220
Enterprise intelligence, 14
Enterprise knowledge processes, 12–15
Enterprise knowledge structures, 29–58
Enterprise modelling, 244
Enterprise search, 171–178
Ephemeral information, 314
Ethics, 21
Exploiting social, relational or intellectual
capital, 21
Expressivity, 33
F
Facebook, 18, 20, 21, 25
Facebook and Beacon, 21
Factbook, 152, 153
Field trials, 161–169
Flickr, 19
FOLCOM, 62, 70, 73, 76, 83, 85, 88, 89
Folksonomy, 19, 45
Formality, 34
Formal models, 34
Freebase, 52–54
G
Gnowsis, 272
Google, 18, 21
Google Buzz, 22
Google, Facebook Amazon, 25
Granularity, 33
Graphics classification, 175
H
Hidden Markov model, 197
HP/Palm, 23
Huffington post, 21
I
IBM, 328
Identity management, 140–143
Incentives, 216, 225
Informal learning, 277
Informal models, 34
Informal processes, 5, 6, 224, 328, 329.
See also Knowledge processes
articulation and sharing, 193
knowledge representation, 191, 197
mining and extraction, 191–193, 198–201
visualization and discussion, 201–203
Information-centric paradigm, 329
Information economy, 12
Information integration, 40
Information management metaphors,
308–311
Information management models, 302
Information management paradigms, 301
Information overload, 107
Information quality, 40
Innovation, 216, 217, 219, 222, 224, 226
Intellectual property rights, 21
Interruptions, 304
K
KDE Desktop Environment, 268–272
Kiva, 22
Knowledge
base, 280, 284, 286, 292, 293
differentiation from information, 9
intensive tasks, 288, 290
work(er), 9, 108
work productivity, 275, 276
Knowledge indicating events (KIE), 291
Knowledge management, 171
environments, 13
Knowledge processes, 107–125, 189
management, 113–117
model, 198
optimisation, 120
Knowledge spheres, 122
ontology, 122–123
KnowMiner, 249
L
Learning
content, 287, 298
goals, 283, 287, 296, 299
guidance, 276, 277, 279–286, 296,
298, 299
need, 292
Light-weight tagging, 311
LiveNetLife, 197
Lock in, 20
Long tail effect, 14
Long-term user context, 276
M
Machine learning, 127–144, 175, 187
supervised learning, 128
unsupervised learning, 128
Measures, 120
Metaphors, 302
Metrics, 120
Microsoft, 328
MIRROR, 251
Mobile telephones, 23
Mobilise, 22
Modeling paradigm, 33
Models of social enterprise, 22
MoKi, 8, 244–248, 250–252
Monetising collective intelligence, 21
Multi-device personal computing, 316
Multilingualism, 226
Multimedia, 226
Multi-relational clustering, 129
Multi-tasking, 304
N
Named graphs, 262
Napster, 13, 19
Native resources, 263
Native structures, 263
NEPOMUK, 8, 255–272
NEPOMUK Annotation Ontology
(NAO), 260
NEPOMUK architecture, 265–268
NEPOMUK Graph-Metadata
Ontology, 260
NEPOMUK Information Element
Ontologies (NIE), 260, 263–264
NEPOMUK Representational Language
(NRL), 260, 261, 263
Netflix, 18
Network economy, 12–15
Network effects, 13, 20
Network forms of organising, 11, 12
O
OceanTeacher Encyclopedia, 239
Offline working, 153–154
OntoBroker, 236
ONTOCOM, 62, 64, 70, 73–75, 77, 79–81,
83, 88, 89
Ontology, 61–89
questionnaire, 250, 252
OntoStudio, 242
Open innovation, 19
Open Semantic Collaboration Architecture
Foundation (OSCAF), 272
Open source, 19
Organizing, 310
Orphaned concepts, 249
OWL, 245, 249, 252, 253
ontology, 286
P
Participatory design, 276, 278–280
PC environment, 314
Peer-to-peer content sharing, 19
Peer-to-peer lending, 22
Personal information management, 289
Personal Information Model (PIMO),
256, 260, 263–265, 267
Personalized clusters, 179, 185
Personalized query, 281
Personal SEmantic Workbench (PSEW),
268–270, 272
Personas approach, 279
Positive returns, 18
Potential market, 219–226
Prediction, 131
Prescriptive learning guidance, 279, 298
Primitive events, 93
Privacy, 121–123, 225, 226
Process mining, 130, 132–135,
176, 177
Process recording, 152, 153
Project knowledge navigation, 194–196
ProM framework, 329
PSI suite of ontologies, 198
PSI Upper-Level Ontology, 198
Pure information businesses, 16
Q
Quality issues, 54
R
RDF. See Resource Description Framework (RDF)
RDFa, 327
Recommendations, 276, 280–287,
291, 293
services, 276, 286, 288
Redaction, 179, 180
Refactoring and optimization, 119–121
Refactoring tool, 121
References, 316
Reflect, 283, 284
Repairing knowledge structures, 54–57
Representation language, 36–37
Requisite corporate competences, 14
Resource Description Framework (RDF),
245, 249
Resource maps, 318
Richness and reach, 15
RSS, 156, 158, 160, 162
S
Search engine, 171–173, 175,
176, 184
Security, 121–123
policies, 122
Semantic data integration, 236, 241–243
Semantic desktop, 289
Semantic Email, 268
Semantic forms, 160, 232, 247
Semantic Media Wiki (SMW), 4, 6–8, 38,
116, 156, 158, 162, 168–169, 182,
197, 245–247, 250, 326
extensions, 158
Semantic-similarity, 294
Semantic toolbar, 231
Semi-supervised clustering,
129–130
Shareability, 35–36
Sharing, 310
Short-term context, 276
Size, 33
Smart phones, 23
SMW+, 8
deployment framework, 237
ontology browser, 234
query interface, 233
semantic tree view, 233
WikiTags, 235
WYSIWYG editor, 231, 232
SMW Ontology Editor, 41
Social and relational capital of networks,
20–22
Social intelligence, 13, 16
Social lending, 22
Socially intelligent targeting mechanism, 20
Social marketing, 20
Social media, 16, 18
Social networking sites, 20
Social networks, 12, 16, 20–22,
137, 139
Social selling, 21
Social Semantic Desktop (SSD), 8, 255–272,
326, 329
Social Semantic Desktop ontologies stack,
260, 261
Social web, 23
Socio-technical systems, 278
Software Usability Measurement Inventory
(SUMI), 164, 165
Spreading activation, 294, 295
Storing, 310
Street View, 22
Summative workplace evaluation, 277
Syndication, 24
T
Tags(tagging), 152, 154, 155, 162, 163, 165,
166, 168, 169, 326, 327
TAGtivity deskbar, 312
TAGtivity manager (TM), 312
TAGtivity prototype, 311
TAGtivity toolbar, 313
Task, 314
completion ability, 276
detection, 276, 288, 290
pane, 114
recording, 115
service, 115
switching, 107
wizard, 115
Text-based similarity, 294, 295
Third sector, 22
TNT (text, network, time), 133
Trade-off, 33–38
Transient activities, 315
Triple store connector (TSC), 230
Trust, 21
Twitter, 19
U
Use-context information, 17
User experience, 310–311
User profile, 280, 283, 284, 286,
288, 291–293
User study, 314–315
User tests, 161–169
V
Validation, 183
Value propositions, 22
Viewing, 310
Viral marketing, 20
Virtual collections, 311
Visualization, 141, 176–178
W
Web 2.0, 4, 5, 8, 326
Web 2.0 capabilities, 15–17
Wikipedia, 18, 19
Work environments, 311–315
Workflows, 304
Work-Integrated Learning (WIL),
8, 275–299, 327
Workplace learning, 277–279
Z
Zopa, 22