copyright zAgile Inc., 2011 1
Enterprise Knowledge Management with
Wikidsmart®1 and Confluence®2
1 Wikidsmart is a trademark of zAgile Inc.
2 Confluence is a trademark of Atlassian Pty Ltd.
All other names mentioned in this document are trademarks of their respective companies. © Copyright zAgile Inc. 2011. All information contained in this document is subject to change without notice. No parts of this document may be reproduced in any form without the prior written permission of zAgile Inc., 101 California Street, Suite 2450, San Francisco, CA 94111.
Summary
Introduction
A Few Typical Problems with Wikis
The Promise of Semantic Wikis
zAgile’s Approach to Semantic Enablement of Wikis
    Smart ‘Semantic’ Templates
    Semantic Macros
    Faceted search
    SPARQL Query
    Machine-based Annotation
    Machine-Readable Content
Wikidsmart Architecture
    zAgile’s Semantic Repository
    zAgile’s Semantic Interface Layer (zSLayer)
    zAgile’s RPC Server
    zAgile’s Semantic Plugin for Confluence (Wikidsmart)
Conclusion
Summary

Wikis have become entrenched in the enterprise as the most common ‘groupware’ application for collaboration amongst individuals and teams. They are easy to acquire, easy to set up, and easy to use for capturing ad hoc content that teams desire to share amongst themselves. It is not unusual to find dozens and sometimes even hundreds of wiki instances within enterprises. Whether it is Human Resources, Sales, Marketing, or IT, teams find ways of leveraging the convenience of wikis to publish and share information with each other.

While wikis permeate the enterprise, the convenience of capturing information, the ad hoc nature of this information and its lack of structure also contribute to the general failure of the wiki as an effective information collaboration tool. As the published content grows into hundreds and thousands of pages, it becomes increasingly difficult to organize, maintain, access, and search. It quickly loses credibility and becomes stale. Since there is no consistency or discipline with which information must be published or organized, different groups within an organization may experience varying degrees of success with their wiki.

Traditional wikis also present additional limitations that often constrain their usage as an enterprise groupware application. These constraints are not a critique of the inherent design of the wiki, since it was intended as a simple tool for allowing people to capture information. We discuss them mostly in the context of the evolution of the wiki in the enterprise.

For example, the page paradigm for representing content does not support ‘formal type’ declaration of specific content in a page, nor does it capture any classification or taxonomic relationship between pages of information. Rather, it restricts the organization to simple page-level hierarchies. In multi-dimensional taxonomies, it should even be possible for a page to be represented in more than one classification scheme.

And what about inheritance, where information in a page may draw from that already represented in the parent page, or from somewhere external to the wiki? Many commercial wikis support information integration from external sources through mashups. However, the context of such integration is implied by its location rather than expressed through any formal, interpretable relationship with the page where it renders. There is no inherent capability for dynamically integrating content from external applications based upon page-level context.

Wikis do not support any federation of content. When there are hundreds of wikis in an enterprise, there is no mechanism for sharing information across them. Wikis in such scenarios represent team-level and department-level information silos.
In spite of these shortcomings, wikis have the potential to become an Information or Knowledge Portal within an enterprise and its users demand it. The emergence of semantic technologies, particularly as applied to wikis (aka Semantic Wikis) has gone a long way to address these gaps. Here, we describe the current limitations of wikis and how zAgile’s Wikidsmart semantic technology provides ways of overcoming them, specifically in the context of Atlassian’s Confluence enterprise wiki.
Introduction

Wikis have become entrenched in the enterprise as the most common ‘groupware’ application for collaboration amongst individuals and teams. They are easy to acquire, easy to set up, and easy to use for capturing ad hoc content that teams desire to share amongst themselves. It is not unusual to find dozens and sometimes even hundreds of wiki instances within enterprises. Whether it is Human Resources, Sales, Marketing, or IT, teams across the organization find ways of leveraging the convenience of wikis to publish and share information with each other.

While wikis permeate the enterprise, the convenience of capturing information, the ad hoc nature of this information and its lack of structure also contribute to the failure of the wiki as an effective information collaboration tool. As the published content grows into hundreds and thousands of pages, it becomes increasingly difficult to organize, maintain, access, and search. It quickly loses credibility and becomes stale. Since there is no consistency or discipline with which information must be published or organized, different groups within an organization may experience varying degrees of success with their wiki.
A Few Typical Problems with Wikis

Some of the key problems that users face with enterprise wikis can be categorized into the following major areas:
1. Content Organization is limited to mostly page-level hierarchies, quite analogous to binders on a shelf. Designing and maintaining the page hierarchies, cross-referencing pages across major sections, tracking and updating them with the most current information, and maintaining cross-reference links all require significant manual processing. For example, in Confluence, the most typical method of content organization is through the use of Spaces, which may imply or mimic some topic, team or department. The organization of a space or its contents is arbitrary. Any type of organization requires significant up-front investment, occasionally leveraging the consulting services of knowledge management experts. Conversely, if the organization of wiki content structure is ad hoc, then the success of its deployment declines as usage and adoption grow.
2. Content Consistency or Integrity can only be achieved through diligent effort. Constantly maintaining and updating the same information in multiple areas of the wiki is often unrealistic. As a result, the wiki content quickly becomes outdated and unreliable. There are some shortcuts that allow for the inclusion of a page in another, but this forces the reusable section to be developed as a separate page. While content management is certainly the responsibility of each department or team, the integrity and freshness of the content vary with the dedication and diligence of any given team. Hence the lack of consistency in the value of such content across the enterprise.
3. Content Categorization may be implied in the page hierarchy (or Confluence Spaces) but there is no inherent formal structure to support it. Does a page depict a requirement, a process, the profile of a team member, an invoice, a customer account, information about business partners, or instructions for a holiday party? There isn’t a way to structure this information consistently and, more importantly, in a reusable and machine-readable format.
4. Content Attribution reflects some information or knowledge about a page: what is it describing, and what are its relationships with other information concepts? You may be able to glean that from the page title, its place in the page hierarchy, its labels or its Space container, but the page itself does not contain any attribute-level information. Wikis typically provide some page-level metadata, such as page author, creation and modification timestamps, and tags. However, they lack the ability to capture formal attributes associated with the page content. For example, if a page represents a customer account, then there is no easy way to provide attributes such as the relationship of the account with a contact, department, or industry domain.
5. Information Access may be accomplished by searching the content or page labels for specific matches, navigating through the space and page hierarchies, or looking for a specific page title or URL. However, wildcard matches to character strings in content are often not what enterprise users desire. They are looking for a specific topic, such as information related to a project, document, bug or account. Furthermore, since there is no support for formal attribution, there is no way to search by the context of the page itself, for example, for all pages relating to a customer account, invoice, project or release. Similarly, there is no way to find all pages reviewed by J. Smith before October 10, 2011 that are associated with the Wikidsmart project. This limitation results in imprecise search and a significant amount of effort on the part of users to locate relevant information contained in wiki pages. Nor is there a way to query for specific information on a page. For example, if a page contains a list of use cases pertaining to a feature, or contacts for an account, it isn’t possible to access a specific use case or contact.
6. Information Integration with other applications and vice versa is mostly limited to data sharing via RSS feeds and mashups, with neither offering any formal context or relationship to the page. If you are capturing information about product requirements in Confluence then you may also want to integrate them with corresponding test cases, tasks, and check-ins. And you may want to do it both ways, i.e. integrate information from other tools into Confluence pages but also pull some information from Confluence into other tools. However, other than page, space and section-level URLs, there is no mechanism for such integration because the wiki conventionally does not support machine-readable formats for the content that it captures.
7. Static vs Dynamic Content. The lack of integration with other applications and the lack of attribute-level support limit the wiki to a static information or knowledge repository. If information changes on one page, that change isn’t automatically reflected everywhere else it appears unless the entire page is included in the referring pages. It has to be updated manually. In the absence of such updates, the content quickly becomes inconsistent and unreliable.
8. Content Federation can address the problems many organizations face due to the presence of dozens and hundreds of departmental wikis. By participating in a federated architecture, content becomes easily shareable across wikis and becomes more meaningful across the enterprise. Teams often struggle with keeping track of what information to capture in which wiki and how to make it shareable. Of course, searching across wikis is not possible.
The Promise of Semantic Wikis

Over the past decade, a number of initiatives have been undertaken to tackle some combination of the wiki limitations outlined above. The common focus is to retain the ease of use and collaborative nature of the wiki while allowing for the creation of ‘semi-structured’ content. Semi-structured here mostly refers to the ease of authorship of the content while supporting formal representation of its semantics, which may be its type or category, position in the hierarchy, inherited and direct attributes, and relationships to other objects. In these efforts, it is also assumed that the ‘semi-structured’ content would be machine-interpretable, hence easy to share and access across wiki pages and even across applications.

The goal of the semantic wiki is to provide support for formal categorization and attribution of content. The technologies and features vary with implementations, but these, along with the ability to publish machine-readable content, are the characteristics that (based on the discussion above) semantic technologies in wikis are most clearly expected to support. Semantic technologies can provide much more than these capabilities, depending upon the implementation. We discuss them in the context of zAgile’s Wikidsmart semantic extension for Confluence in the sections that follow.
zAgile’s Approach to Semantic Enablement of Wikis

zAgile’s technologies enable semantic capabilities in popular best-of-class wikis, such as Confluence, to address the natural limitations of wikis discussed earlier. They also bring more power to the wiki in its role as an enterprise collaboration application and knowledge portal. The semantic extension, available via Wikidsmart, turns Confluence into one of many sources of semantically structured, shareable content. This provides an effective mechanism for creating ‘knowledge’ out of the ‘relatively unstructured’ wiki pages, while also allowing integration of the wiki content with other applications. Furthermore, this integration occurs both ways: specific, typed, and semantically relevant wiki content is readily accessible to external applications, and Confluence is also able to contextually incorporate structured data from external applications into its pages, to render itself as a knowledge portal.

zAgile’s Wikidsmart allows users to leverage their existing Confluence-based content repositories. It allows users to easily capture semantic annotations of existing content as well as create new content using semantic templates and semantic forms. These extensions can also be developed for other commercial or open source wikis, provided that the wiki supports a mechanism for development of user extensions and form templates.

zAgile’s approach also separates the semantic repository from the wiki content, so that the wiki functions not as the central and sole repository for both unstructured content and semantic annotations but as one of many applications that contribute semantically relevant data to a central repository accessible to all applications. This further allows users to have distributed and/or federated semantic databases.
And finally, because the semantic repository is accessible to all applications, including the wiki, through the same interfaces, it facilitates two-way integration between wiki-based content and external applications that are also creators and consumers of related information. This level of integration not only allows the wiki to share common information with other applications but also to extend it, thereby creating a richer knowledge repository. Such semantic integration between the wiki and other applications becomes a viable framework for unifying an environment of disparate and heterogeneous applications, content and processes.

zAgile’s solution for semantic enablement of wikis consists of the following high-level functional components:
Smart ‘Semantic’ Templates

Through the use of customizable form-based page templates, Wikidsmart provides an easy way for users to create consistent and semi-structured content. The elements of any such form are mapped directly to and captured in the underlying metamodel. This allows each element on the page to represent a formal semantic concept or attribute. For example, if the page represents a Requirements Document, then another element on the page may represent a related Requirement Item, related Project, the Author, Stakeholder, or Task assigned to an individual Requirement Item.

While leveraging the ease of the wiki-based paradigm for user-based content creation, the templates provide a powerful medium for the creation of semantically structured content. Since this content is captured in a centralized repository, it is easily accessible to other applications. These templates also contribute significantly to overall wiki adoption, since they minimize user-level page formatting (via wiki markup) and result in consistent page layouts.

An example of such a form is illustrated below for creating a test case as part of a test suite. The test case has elements that formally link it to requirements (Confluence), components (JIRA) and test execution tasks (JIRA). The list of Related Components is derived from the JIRA project associated with the test suite (represented by the page). The Related Requirement list is obtained from the feature linked to the test suite.
Similarly, the following screenshot shows how a form-based template in Confluence is used to capture structured information related to a medical device.
The value of the template-based forms is further enhanced through field-level validation that may be added using macro-level parameters in the templates. The validation may also be tied to workflow states and roles. Thus the forms can almost behave like enterprise applications, while still retaining the wiki page paradigm.
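As a rough illustration of how form elements could map onto metamodel properties, the following Python sketch validates a submitted form and turns each filled-in field into a subject-property-value statement. The field labels, the `ex:` property names and the validation rule are hypothetical; they are assumptions made for this example and are not taken from Wikidsmart.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Field:
    """One form element and the metamodel property it maps to (hypothetical names)."""
    label: str
    prop: str
    required: bool = False
    check: Callable[[str], bool] = lambda value: True


# Hypothetical template for a "TestCase" page.
TEST_CASE_TEMPLATE = [
    Field("Title", "ex:title", required=True),
    Field("Related Requirement", "ex:verifies", required=True),
    Field("Priority", "ex:priority", check=lambda v: v in {"Low", "Medium", "High"}),
]


def submit(page_id, category, template, values):
    """Validate form values, then return (subject, property, value) statements."""
    errors = []
    for f in template:
        value = values.get(f.label, "")
        if f.required and not value:
            errors.append(f"{f.label} is required")
        elif value and not f.check(value):
            errors.append(f"{f.label} has an invalid value: {value!r}")
    if errors:
        raise ValueError("; ".join(errors))
    # The page itself is typed with its category; each field becomes one statement.
    statements = [(page_id, "rdf:type", category)]
    statements += [(page_id, f.prop, values[f.label]) for f in template if values.get(f.label)]
    return statements


statements = submit(
    "page:TC-12", "ex:TestCase", TEST_CASE_TEMPLATE,
    {"Title": "Login succeeds", "Related Requirement": "page:REQ-7", "Priority": "High"},
)
for statement in statements:
    print(statement)
```

Keeping the mapping in the template, rather than in the page text, is what would let every page created from the same form produce the same kind of statements.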
Semantic Macros

zAgile’s Wikidsmart provides macros that may be used alongside wiki macros and markup to interact with the semantic database and retrieve contextual information inline within the page content. These macros provide a number of capabilities that collectively support the implementation of knowledge management solutions using wikis. The following are a few of the key capabilities provided via these macros, at the page or paragraph level:

Categorization: to assign a formal category to the content. Categorization allows you to identify a page or section as representing a category, such as Requirement, Hip Implant, Invoice, or IT service request. Categories may represent hierarchical structures and formal taxonomies. Therefore, it is possible to have a page (or section) that represents a structure such as the following:
Medical Device -> Implant -> Hip Implant -> Primary Hip Implant -> Primary Hip Implant (cemented)
A page or section may also belong to multiple categories, as a result of inference, based on its properties.
Attribution: to associate the content with formal attributes that correspond to the definition of the Category in the metamodel or ontology. These attributes or properties may be created, or associated if they already exist. Examples include the priority associated with a requirement, the surface coating of a hip implant, or the description of an invoice. Through categorization and attribution, you can create formal relationships between pages or sections in the wiki that represent complex logical hierarchies and part-whole relationships, without the constraint of the physical location of any of the pages. For example, a page may be a Sub Document to one or many Documents in other areas of the wiki. Such cross-referencing is automatic, dynamic and independent of the physical location of the content. It also does not impose any tedious maintenance overhead.
Reference: for the inclusion, by contextual reference, of information that may exist elsewhere in the wiki or in an application external to the wiki. Examples include the description and status of a bug (logged in JIRA) on the corresponding requirements page in Confluence, article summaries offered by a journal for an implant, and vendor information related to an invoice (from an accounting application). Embedding references in content makes them contextual and dynamic, i.e. if the information changes at the source, then its references are automatically refreshed. This eliminates the need to manually update information in wiki pages that is derived from outside those pages, whether from other pages in the wiki or from external applications. It improves the integrity and reliability of the overall content.
External Integrations: for two-way direct and ‘contextual’ integration with external applications. For example, creating a JIRA approval task from within a requirement defined in a Confluence page. This level of integration automatically creates a two-way link between the task and the requirement for ongoing traceability. It provides the needed context, which would be missing if task-related information were simply pasted into that page. Similarly, a template-based Confluence page may be created from within Salesforce to represent an account. This page can be used for collaborative inputs related to the account. Using some of the capabilities discussed above, this page can also automatically display cases reported for that account, the internal resolution status of each case, and product releases that address specific issues raised on behalf of this account. The page becomes the focal point for collaborative activities related to the account.
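The Reference capability described above can be pictured with a small Python sketch in which a page stores only a pointer to a record and the value is looked up each time the page is rendered. The `{{ref:...}}` placeholder syntax, the record id and the in-memory `SOURCES` dictionary are all hypothetical stand-ins; Wikidsmart’s actual macro syntax is not shown in this paper.

```python
import re

# Stand-in for records owned by external applications (hypothetical data).
SOURCES = {
    "JIRA-101": {"summary": "Login fails on timeout", "status": "Open"},
}

# Hypothetical placeholder syntax: {{ref:<record id>.<field>}}
REF_PATTERN = re.compile(r"\{\{ref:([\w-]+)\.(\w+)\}\}")


def render(page_body, sources):
    """Resolve each reference against the source at render time."""
    def resolve(match):
        record_id, field = match.group(1), match.group(2)
        record = sources.get(record_id, {})
        return str(record.get(field, f"[unresolved {record_id}.{field}]"))
    return REF_PATTERN.sub(resolve, page_body)


page = "Blocking bug: {{ref:JIRA-101.summary}} (status: {{ref:JIRA-101.status}})"
before = render(page, SOURCES)
SOURCES["JIRA-101"]["status"] = "Resolved"  # the change happens at the source only
after = render(page, SOURCES)
print(before)
print(after)
```

Because the page body never holds a copy of the status, nothing in the wiki has to be edited when the bug is resolved; the next render simply picks up the new value.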
Faceted search

zAgile’s Smart Search allows users to specify the category of information being searched. The search criterion is applied to a specific category rather than to all content in the wiki. Since the underlying metamodel supports inheritance, this category may represent any level in the taxonomy. For example, a Functional Requirement may also be searched using its parent category, Requirement, or other super categories (Abstract Requirement, WorkProduct) in its hierarchy.

The search returns a structured result set comprising objects that match the category specified in the search parameter. The result set also contains known attributes and relationships of the object, across applications. For example, when searching for a requirement, the result set may contain its author, reviewer, related component and stakeholder, as shown in the example below.
Since each category has an associated and formal set of attributes, it is also possible to filter the search results based on the values in these attributes.
The ‘Power Search’ mode presents the list of relevant attributes based upon the category being searched, as shown in the adjacent screenshot. Here, all the attributes for the ‘Requirement’ category are retrieved from the metamodel definition. The combo boxes contain possible objects to which the requirement may have a formal relationship, for example, an associated JIRA issue or the document in which it is described.

And finally, search isn’t limited to just wiki content. Since the wiki is integrated into a common information platform, search has access to information from all sources. This is in contrast to the typical search in a wiki, which is limited to matching character strings against page content, tags or titles. Therefore, it is possible in Confluence to search for specific projects or tasks defined in JIRA, or accounts defined in Salesforce.
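To make the idea of category-scoped search with inheritance concrete, here is a minimal Python sketch. The Requirement hierarchy follows the example given above; the TestCase category, the sample objects and the `search` function are assumptions introduced only for illustration.

```python
# Child category -> parent category. The Requirement chain follows the text above;
# placing TestCase under WorkProduct is an assumption made for this sketch.
PARENT = {
    "FunctionalRequirement": "Requirement",
    "Requirement": "AbstractRequirement",
    "AbstractRequirement": "WorkProduct",
    "TestCase": "WorkProduct",
}


def ancestors(category):
    """Yield the category itself, then each super category up to the root."""
    while category is not None:
        yield category
        category = PARENT.get(category)


# Hypothetical annotated objects as they might sit in a shared repository.
OBJECTS = [
    {"id": "REQ-7", "category": "FunctionalRequirement", "author": "J. Smith"},
    {"id": "REQ-9", "category": "Requirement", "author": "A. Jones"},
    {"id": "TC-12", "category": "TestCase", "author": "J. Smith"},
]


def search(category, **filters):
    """Return objects in `category` or any subcategory whose attributes match all filters."""
    return [
        obj for obj in OBJECTS
        if category in ancestors(obj["category"])
        and all(obj.get(key) == value for key, value in filters.items())
    ]


requirement_ids = [obj["id"] for obj in search("Requirement")]
smith_work_products = [obj["id"] for obj in search("WorkProduct", author="J. Smith")]
print(requirement_ids)
print(smith_work_products)
```

Searching on a super category widens the result set, while attribute filters narrow it again, which is the combination the Power Search form exposes.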
SPARQL Query

SPARQL, an RDF query language similar in syntax to SQL, supports declarative queries against graph databases, such as the one implemented in zAgile’s semantic repository. SPARQL support in Confluence via Wikidsmart opens up a number of possibilities for querying data from any source, including externally published content, as long as it is published in RDF. This provides interesting opportunities for bringing information together in a page from a number of sources. The query result set may be combined with other formatting macros in Confluence, as shown below, to create dashboards that bring information together from a variety of sources.
The illustration above is a section of a dynamically generated page in Confluence, comprising a number of SPARQL queries and Confluence markup macros. It represents information related to a JIRA project release (or Version) and its related features, requirements and test cases defined in Confluence. In this example, a single query returns the Features being delivered in a Release, together with the associated Requirements, approval status and development tasks.
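The kind of query described above can be sketched as follows. The SPARQL text uses an invented `ex:` vocabulary, since the actual zAgile ontology terms are not given in this paper, and it is shown for reference only. The Python function beneath it is a deliberately simplified stand-in for a SPARQL engine: it matches the same three triple patterns against a small in-memory list so that the joining behaviour is visible.

```python
# Hypothetical triples; the ex: vocabulary is invented for this sketch.
TRIPLES = [
    ("ex:release2", "ex:delivers", "ex:featureA"),
    ("ex:release2", "ex:delivers", "ex:featureB"),
    ("ex:featureA", "ex:hasRequirement", "ex:req7"),
    ("ex:featureB", "ex:hasRequirement", "ex:req9"),
    ("ex:req7", "ex:approvalStatus", "Approved"),
    ("ex:req9", "ex:approvalStatus", "Pending"),
]

# The SPARQL text that the patterns below correspond to (not executed here).
QUERY = """
PREFIX ex: <http://example.org/>
SELECT ?feature ?req ?status WHERE {
  ex:release2 ex:delivers ?feature .
  ?feature ex:hasRequirement ?req .
  ?req ex:approvalStatus ?status .
}
"""

PATTERNS = [
    ("ex:release2", "ex:delivers", "?feature"),
    ("?feature", "ex:hasRequirement", "?req"),
    ("?req", "ex:approvalStatus", "?status"),
]


def match(patterns, triples, binding=None):
    """Yield variable bindings that satisfy every pattern (terms starting with '?' are variables)."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(binding)
        consistent = True
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # Bind the variable, or check it against an earlier binding.
                if candidate.setdefault(term, value) != value:
                    consistent = False
                    break
            elif term != value:
                consistent = False
                break
        if consistent:
            yield from match(rest, triples, candidate)


rows = sorted((b["?feature"], b["?req"], b["?status"]) for b in match(PATTERNS, TRIPLES))
for row in rows:
    print(row)
```

Each shared variable acts as a join across statements, which is how one query can walk from a release to its features, requirements and approval status regardless of which application contributed each statement.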
Machine-based Annotation

Using Natural Language Processing techniques, machine-based annotation of unstructured wiki content automatically tags words or phrases in a page if they match any references in the semantic database. A mouse click on the tagged word identifies its category, source, and any other contextual information associated with that term. This eliminates the need to manually tag specific references in paragraphs and pages in the wiki. The annotation is dynamic and does not impact the original content.

Examples of its implementation include tagging clinical terms in article abstracts in a Confluence page, with a pop-up displaying the procedures, videos and conditions associated with those terms. In the context of the software engineering lifecycle, meeting notes related to a project could be similarly annotated. The tags will automatically identify any references to the project, its related artifacts or tasks.
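A much-simplified picture of this annotation step is sketched below in Python. It substitutes plain dictionary lookup with a regular expression for real Natural Language Processing, and the known terms and their metadata are hypothetical. The point it illustrates is that annotations are computed as offsets alongside the text, so the original content is left untouched.

```python
import re

# Hypothetical entries as they might be known to the semantic database.
KNOWN_TERMS = {
    "hip implant": {"category": "Medical Device", "source": "Product catalog"},
    "Wikidsmart": {"category": "Project", "source": "JIRA"},
}


def annotate(text, known_terms):
    """Return (start, end, matched text, info) for each known term found; `text` is not modified."""
    # Longest terms first, so a longer phrase wins over a shorter term it contains.
    terms = sorted(known_terms, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(term) for term in terms) + r")\b",
        re.IGNORECASE,
    )
    lookup = {term.lower(): info for term, info in known_terms.items()}
    return [
        (m.start(), m.end(), m.group(0), lookup[m.group(0).lower()])
        for m in pattern.finditer(text)
    ]


notes = "Wikidsmart review: the hip implant page needs an owner."
annotations = annotate(notes, KNOWN_TERMS)
for start, end, term, info in annotations:
    print(start, end, term, info["category"], info["source"])
```

A rendering layer could use the offsets to draw highlights and pop-ups at display time, recomputing them whenever the page or the set of known terms changes.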
Machine-Readable Content

Linked Open Data (LOD) is a set of guidelines for publishing machine-readable content that can be interpreted by external agents, including search engines. This can be extremely valuable, especially when publishing product-related content to an external audience. A number of implementations in e-commerce (for example, Best Buy) have successfully leveraged this capability to improve search engine optimization (SEO).

Wikidsmart supports Linked Open Data based content publishing. Within Confluence pages, in non-display mode, machine-readable content can be automatically generated for any semantic categories represented on the page. The example below shows a Confluence page source in LOD-based format, representing information related to a medical device (ACF Femoral Component) and the manufacturer of that device (Biomet Inc.). The representation includes a number of standard ontologies including GoodRelations, vCard and eClassOWL.
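For readers unfamiliar with what such machine-readable output looks like, the Python sketch below builds a small description of the same device and manufacturer using GoodRelations terms. It is an assumption-laden illustration: JSON-LD is used here only because it is easy to generate and read, the paper does not state which serialization Wikidsmart emits, and the `example.org` identifiers are placeholders.

```python
import json


def to_jsonld(device_name, manufacturer_name, device_iri, manufacturer_iri):
    """Build a JSON-LD document describing a product model and its manufacturer."""
    return {
        "@context": {"gr": "http://purl.org/goodrelations/v1#"},
        "@id": device_iri,
        "@type": "gr:ProductOrServiceModel",
        "gr:name": device_name,
        "gr:hasManufacturer": {
            "@id": manufacturer_iri,
            "@type": "gr:BusinessEntity",
            "gr:legalName": manufacturer_name,
        },
    }


doc = to_jsonld(
    "ACF Femoral Component",
    "Biomet Inc.",
    "http://example.org/device/acf-femoral-component",  # placeholder identifier
    "http://example.org/company/biomet",                # placeholder identifier
)
print(json.dumps(doc, indent=2))
```

An external agent that understands the GoodRelations vocabulary can read the product name and manufacturer from this structure without parsing the visible page text.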
Wikidsmart Architecture

The following key components describe the architecture of Wikidsmart and further illustrate the ways in which it allows Confluence to become integrated with other enterprise applications.
zAgile’s Semantic Repository

The semantic repository comprises a set of formal ontologies and metamodels specific to a domain of interest. Specifically, the repository supports formal ontologies based on OWL (Web Ontology Language), a W3C specification. The ontologies provide a highly structured framework for the capture and annotation of semantically relevant data pushed from the participating applications. They also facilitate integration of this data across related metamodels, and allow for inferencing and reasoning to derive implied categorizations and inheritance of attributes.
zAgile’s Semantic Interface Layer (zSLayer)

The semantic interface layer provides wikis and other applications with an interface to this repository. It is middleware that resides between the ontology repository and the client applications. It provides a consistent way to interact with the ontology through lightweight serializable objects that can be exchanged across the web. It also maintains a Lucene index, intended as a fast way to search and retrieve individual pieces of information, which is transparent to the client applications.
zSLayer also provides an option to query the ontologies it handles, allowing the client application to choose which ontology it wants to query, or to use the main merged ontology that holds all of them. Via connectors, all applications and consumers of the repository can use a consistent API to access the semantic repository. This layer also supports standard query languages like SPARQL, which can be embedded within wiki pages for querying and navigating the semantic graphs in the repository.
zAgile’s RPC Server

This server provides a set of XML-RPC methods for accessing zSLayer, which applications (XML-RPC clients) can use to query or save semantic data. The server organizes and prepares the semantic data so that it is readable by a client application written in any language. It also supports access in JSON format, using URLs to query the repository.
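XML-RPC is language-neutral because each call is just a small XML document. The Python sketch below uses the standard library to show what a request looks like on the wire and how a server would decode it. The method name `semantic.query` and its parameters are hypothetical; the actual methods exposed by zAgile’s RPC server are not listed in this paper.

```python
import xmlrpc.client

# Hypothetical method name and parameters, chosen only for illustration.
request_xml = xmlrpc.client.dumps(
    ("Requirement", {"author": "J. Smith"}),
    methodname="semantic.query",
)
print(request_xml)

# A server would decode the request like this before dispatching the call.
params, method = xmlrpc.client.loads(request_xml)
print(method, params)
```

In a real deployment a client would normally use `xmlrpc.client.ServerProxy` with the server’s URL rather than building the request by hand; the encoding step is shown here only because it needs no running server.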
zAgile’s Semantic Plugin for Confluence (Wikidsmart)

This plugin provides the interface between Confluence and the zAgile Semantic Server. It supports macros for creating templates and forms in Confluence for annotating wiki pages. Templates are based upon the ontologies and metamodels defined in the semantic repository. They allow annotation of existing content as well as creation of new forms and pages. This plugin also supports macros for embedding SPARQL queries within wiki pages, as discussed above.
Conclusion

This paper highlights the power and capabilities of Wikidsmart as it extends Confluence for deeper integration within the enterprise as a collaboration application and an information portal. This level of integration results from the implementation of a metamodel-driven architecture and semantic capabilities built on an integration platform hosted by zAgile’s zCALM Server. Wikidsmart is a commercial open source product, available on SourceForge and from zAgile.
For more information on Wikidsmart, please contact info at zAgile.com.