Transcript
Page 1: Putting Controlled Vocabulary To Work I Davis 2008

© Copyright 2008 Dow Jones and Company, Inc.

Putting Structured Business Vocabularies to Work

November 4, 2008
Data Management and Information Quality Conference

IRM UK

Ian Davis
Global Project Manager, Dow Jones & Company

Page 2: Putting Controlled Vocabulary To Work I Davis 2008

What we’ll cover today:

Understanding the challenges of controlled versus uncontrolled vocabularies

Developing a strategy to create and maintain controlled vocabularies

Identifying how you want to integrate your controlled vocabularies into your systems

Understanding the requirements of integrating controlled vocabularies into multiple applications

Page 3: Putting Controlled Vocabulary To Work I Davis 2008

Setting the Context

Page 4: Putting Controlled Vocabulary To Work I Davis 2008

Once upon a time…

Most of the business was IT-enabled. There was some degree of “sharing” of information and content; there were even some large, well-structured document repositories.

Yet no one could find anything. Actually, they found things, but not what they wanted when they wanted it, and they were never sure they had found the “best” or “saw it all”.

Page 5: Putting Controlled Vocabulary To Work I Davis 2008

Once upon a time…

The C-level executives were a bit irritated. They’d spent lots on the technology, yet people really weren’t much more efficient; the pinch point in the workflow had simply moved further downstream.

So, what happened next?

Page 6: Putting Controlled Vocabulary To Work I Davis 2008

Once upon a time…

They SPENT <more> MONEY and bought the best-in-class search utilities.

Yet no one could find anything. Actually, they found things, but not what they wanted when they wanted it, and they were never sure they had found the “best” or “saw it all”.

Page 7: Putting Controlled Vocabulary To Work I Davis 2008

Once upon a time…

The C-level executives became a bit more irritated.

Everyone was a bit frustrated. What was missing?

Page 8: Putting Controlled Vocabulary To Work I Davis 2008

Optimized?

Is the search utility optimized using all the bells and whistles it came with?

  Relevancy rankings
  “Thesaurus” files (synonym lists)
  Multi-lingual capabilities
  Common searches saved and presented to users
  Logs reviewed to understand user issues

Page 9: Putting Controlled Vocabulary To Work I Davis 2008

Usable?

Is the user interface considerate to users?
  Was it designed with YOUR users in mind?
    Designed for occasional users?
    Designed for power users?
  Was it designed with YOUR business in mind?
    Task-based views for context-sensitive searches
    Present results in a format readily used within workflows

Page 10: Putting Controlled Vocabulary To Work I Davis 2008

Metadata?

Are there required metadata fields within the CMS?
  Author, Title, Language, Topic, Product/Service, etc.
Are the entry values to those fields controlled?
  Lookups against authority files, taxonomies, thesauri
Does the search utility support fielded searches?
Does the search utility weight terms within metadata fields higher than free-text?
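The last question on this slide, weighting metadata matches above free-text matches, can be sketched roughly as follows. This is only an illustration and not the ranking of any particular search product; the field names, weights, and sample documents are assumptions invented for the example.

```python
# Hypothetical weights: a hit in a controlled metadata field counts for more
# than a hit in free text. The numbers are illustrative only.
FIELD_WEIGHTS = {"topic": 5.0, "title": 2.0, "body": 1.0}


def score(doc, term):
    """Sum the weights of every field that contains the query term as a word."""
    term = term.lower()
    total = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        if term in doc.get(field, "").lower().split():
            total += weight
    return total


def search(docs, term):
    """Return matching documents, highest score first."""
    hits = [(score(d, term), d) for d in docs]
    hits = [(s, d) for s, d in hits if s > 0]
    return [d for s, d in sorted(hits, key=lambda h: h[0], reverse=True)]


# Made-up documents: one only mentions the term in free text,
# the other carries it in a controlled "topic" field.
DOCS = [
    {"id": 1, "topic": "cooking", "title": "Winter soups",
     "body": "Simmer the stock for an hour"},
    {"id": 2, "topic": "stock", "title": "Equity outlook",
     "body": "Analysts expect gains"},
]

results = search(DOCS, "stock")
```

With these assumed weights, the document tagged with the controlled topic value is listed ahead of the one that only mentions the word in its body.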

Page 11: Putting Controlled Vocabulary To Work I Davis 2008

Metadata?

For example:
  If a financial analyst enters the query term “stock” within the company’s knowledge base,
    Will he get back results with the documents specifically discussing “stock” as a financial instrument listed first?
    Or will he have to look through hundreds of documents: those relevant to him as well as every document whose body text mentions
      soup stock (food industry),
      cows (livestock industry),
      or stock car racing (professional sports industry)?

Page 12: Putting Controlled Vocabulary To Work I Davis 2008

Metadata?

Precise and comprehensive searches
  Only if controlled vocabularies have been used to populate metadata fields
  AND
    The search utility takes advantage of that by giving priority to query term occurrence within controlled-value metadata fields
    OR
    Fielded searches are enabled
      e.g. <Author = Smith> + <Service = Consulting> + <Industry = Automotive> + <Date = January 2006> + <Content Type = Proposal>
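A fielded search like the one in the example can be sketched as an exact match on every requested field. The record layout and the `fielded_search` helper below are hypothetical, and the two records are made up; a real search utility would have its own query syntax.

```python
def fielded_search(records, **criteria):
    """Return records whose metadata matches every field=value criterion."""
    return [
        r for r in records
        if all(r.get(field) == value for field, value in criteria.items())
    ]


# Made-up records whose metadata fields hold controlled values.
RECORDS = [
    {"id": "A", "author": "Smith", "service": "Consulting",
     "industry": "Automotive", "date": "January 2006",
     "content_type": "Proposal"},
    {"id": "B", "author": "Smith", "service": "Consulting",
     "industry": "Retail", "date": "January 2006",
     "content_type": "Proposal"},
]

matches = fielded_search(
    RECORDS,
    author="Smith",
    service="Consulting",
    industry="Automotive",
    date="January 2006",
    content_type="Proposal",
)
```

Exact matching only works this cleanly because the field values are controlled; if one author typed “Automotive” and another “Auto industry”, the same query would silently miss content.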

Page 13: Putting Controlled Vocabulary To Work I Davis 2008

Challenges: Controlled versus Uncontrolled

Page 14: Putting Controlled Vocabulary To Work I Davis 2008

Controlled Vocabularies Explained

Authority files
  e.g. Company’s Active Directory, ISO standard for languages
  Typically a flat list of allowed values
Taxonomies
  e.g. Linnaean classification (kingdom, phylum, class, order, family, genus, and species)
  Typically includes only hierarchical relationships between terms
Thesauri
  e.g. NASA Thesaurus (http://www.sti.nasa.gov/thesfrm1.htm)
  Includes full set of semantic relationships defined between terms (hierarchical, associative, equivalence)
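One way to picture the difference between the three kinds of vocabulary is by the shape of data each one needs. The structures below are a simplified sketch with a handful of sample values; the variable and function names are assumptions, and real vocabularies are far larger.

```python
# Authority file: a flat list of allowed values (a few language codes here).
LANGUAGES = {"en", "fr", "de"}


def is_allowed(value):
    """An authority file only answers: is this value permitted?"""
    return value in LANGUAGES


# Taxonomy: each term points to its parent, and nothing else is recorded.
PARENT = {
    "Homo sapiens": "Homo",
    "Homo": "Hominidae",
    "Hominidae": "Primates",
    "Primates": "Mammalia",
}


def ancestors(term):
    """Walk up the hierarchy from a term towards the top."""
    chain = []
    while term in PARENT:
        term = PARENT[term]
        chain.append(term)
    return chain


# Thesaurus: each term may carry hierarchical, equivalence and
# associative relationships at the same time.
THESAURUS = {
    "mammals": {"broader": ["vertebrates"], "used_for": [], "related": []},
    "dogs": {"broader": [], "used_for": ["canines"], "related": []},
    "accounting": {"broader": [], "used_for": [], "related": ["accountant"]},
}
```

The richer the structure, the more a search utility can do with it, and the more there is to build and maintain.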

Page 15: Putting Controlled Vocabulary To Work I Davis 2008

NASA Thesaurus – Sample Entry

Page 16: Putting Controlled Vocabulary To Work I Davis 2008

Semantic Relationships

Hierarchical
  Superordination - representing a class or a whole, and subordination - referring to members or parts
    e.g. mammals and vertebrates
    e.g. cherry pie and cherry pie slices
Equivalence
  One concept expressed by two or more terms
    e.g. dogs and canines
Associative
  Terms that are conceptually linked, but not through hierarchy or equivalence
    e.g. accounting and accountant
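In thesaurus practice these relationships are usually recorded reciprocally: a broader term (BT) on one side implies a narrower term (NT) on the other, related terms (RT) point both ways, and a “used for” (UF) entry has a matching USE pointer back to the preferred term. A minimal sketch of keeping both sides in step, assuming a simple in-memory structure (the `Thesaurus` class and its method names are hypothetical):

```python
from collections import defaultdict

# Relationship codes in common thesaurus notation:
# BT = broader term, NT = narrower term, RT = related term,
# UF = "used for" (non-preferred synonym), USE = pointer to preferred term.
INVERSE = {"BT": "NT", "NT": "BT", "RT": "RT", "UF": "USE", "USE": "UF"}


class Thesaurus:
    def __init__(self):
        # term -> relationship code -> set of target terms
        self.links = defaultdict(lambda: defaultdict(set))

    def relate(self, term, rel, target):
        """Record a relationship and its reciprocal in one step."""
        self.links[term][rel].add(target)
        self.links[target][INVERSE[rel]].add(term)

    def get(self, term, rel):
        """Return the targets of one relationship type, sorted by name."""
        return sorted(self.links[term][rel])


t = Thesaurus()
t.relate("mammals", "BT", "vertebrates")    # hierarchical
t.relate("dogs", "UF", "canines")           # equivalence
t.relate("accounting", "RT", "accountant")  # associative
```

Recording the reciprocal automatically is one of the things a spreadsheet does not do for you.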

Page 17: Putting Controlled Vocabulary To Work I Davis 2008

Challenges – Uncontrolled Vocabularies

Uncontrolled vocabularies are:
  Comprehensive but noisy
    Only comprehensive if synonym lists are used
  Limited in their precision and relevancy
    Time lost scanning through hundreds of “miss” hits
    Reduced effectiveness of cross-repository searches
    Limited ways to disambiguate ‘soup stock’ from ‘stock car’

Page 18: Putting Controlled Vocabulary To Work I Davis 2008

Challenges - Controlled Vocabularies

Controlled vocabularies can produce:
  Potentially significant overhead effort (manual and technical)
  Organizational politics can add YEARS to establishing an initial set of controlled vocabularies
  A lack of basic understanding of what the controlled vocabularies are and how they work impedes effective development and utilization

Page 19: Putting Controlled Vocabulary To Work I Davis 2008

Challenges - Controlled Vocabularies

Controlled vocabularies:
  Richness and power come from a full set of semantic relationships, not just hierarchical ones
    Hierarchy supports the ability to narrow and broaden search queries
    Association supports “did you mean” and “you might also want to look at”
    Equivalence enables the use of familiar language to retrieve content which is conceptually on target but never uses the user’s term
      e.g. user enters dog and search utility expands query to include “canine, k-9, puppy”
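The dog example can be sketched as a small query-expansion step. The equivalents come from the slide; the narrower, broader, and related terms, the sample document titles, and all function names are assumptions added purely for illustration.

```python
# Equivalence terms from the slide's example; the other relationship
# values below are invented for illustration.
EQUIVALENT = {"dog": ["canine", "k-9", "puppy"]}
NARROWER = {"dog": ["terrier"]}
BROADER = {"dog": ["mammal"]}
RELATED = {"dog": ["veterinary care"]}


def expand(term):
    """Equivalence: add synonyms so familiar language still finds content."""
    return [term] + EQUIVALENT.get(term, [])


def search(docs, term):
    """Naive substring search over the expanded set of query terms."""
    terms = expand(term)
    return [d for d in docs if any(t in d.lower() for t in terms)]


def suggestions(term):
    """Hierarchy and association drive narrow / broaden / see-also prompts."""
    return {
        "narrow to": NARROWER.get(term, []),
        "broaden to": BROADER.get(term, []),
        "you might also want to look at": RELATED.get(term, []),
    }


DOCS = [
    "Canine nutrition guidelines",
    "K-9 unit training manual",
    "Cat grooming tips",
]

found = search(DOCS, "dog")
```

Neither of the two documents found contains the word “dog”; they are retrieved only because the equivalence relationship expands the query.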

Page 20: Putting Controlled Vocabulary To Work I Davis 2008

Challenges - Controlled Vocabularies

Controlled vocabularies:
  Richness and power come at the cost of added complexity of development, implementation, integration and maintenance
  Utilization of controlled vocabularies can produce performance issues
    During search index creation
    During query run time

Page 21: Putting Controlled Vocabulary To Work I Davis 2008

Tackling the Challenges

Page 22: Putting Controlled Vocabulary To Work I Davis 2008

Strategy – Creation and Maintenance

State the business case clearly
  Benefits
    Reduced time for knowledge discovery
    Increased richness of knowledge discovery
    Decreased risk to firm of making business decisions with partial information
  Scope
    One business unit or enterprise-wide?
  Resource requirements
    Skill sets (IS, IT, business knowledge)
    Time commitment

Page 23: Putting Controlled Vocabulary To Work I Davis 2008

Strategy – Creation and Maintenance

Tackle organizational politics head-on
  Gain credibility and ensure usability by establishing a cross-functional working committee that will become the Review Committee
  Include all major stakeholder groups and any interested parties (even the non-supporters)
  Establish methods of broadly soliciting end-user input that will become a source of change requests during maintenance phases

Page 24: Putting Controlled Vocabulary To Work I Davis 2008

Strategy – Creation and Maintenance

Additional considerations before you start:
  How rigorous does it need to be?
    What external standards should be adopted?
      ANSI/NISO Z39.19-2005
      British Standard – BS 8723
    What internal standards should be developed?
      Editorial Guidelines
      Usage Guidelines
  How extensive will it be?
    Depth and breadth within and across facets
  What about adaptability and flexibility?
    Will there be a need for local extensions?

Page 25: Putting Controlled Vocabulary To Work I Davis 2008

Strategy – Creation and Maintenance

Additional considerations before you start:
  Projected frequency of revisions
    How quickly does the content base change with respect to concepts; is there significant content drift?
    How volatile is the language?
      Management consulting vs. accounting
  Vocabulary Management Software
    DON’T spend money just to spend money
    However, you CAN’T manage controlled vocabularies in a spreadsheet
    Buy the tool you need based on your documented functional requirements

Page 26: Putting Controlled Vocabulary To Work I Davis 2008

Strategy – Integration Choices

Performance trade-offs
  Store UIDs within content, then use look-up table at query run time
  Store full text of a term, then touch all content when taxonomy value changes (must re-assign new term value)
Version control
  Use static versions of controlled vocabularies within CMS and search utilities, releasing new versions periodically
  Use dynamic version of controlled vocabularies with continuous revisions occurring
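The performance trade-off between storing UIDs and storing term text can be made concrete with a term rename. The sketch below is a toy model, not a CMS integration; the UID `T100`, the labels, and the function names are all made up.

```python
# Option 1: content stores a UID; labels live in one look-up table.
LABELS = {"T100": "Automobiles"}
docs_by_uid = [{"id": 1, "topic": "T100"}, {"id": 2, "topic": "T100"}]


def rename_uid(uid, new_label):
    """One write to the look-up table, however many documents use the term."""
    LABELS[uid] = new_label
    return 1  # number of records touched


def display_topic(doc):
    """The price: an extra look-up whenever the label is needed."""
    return LABELS[doc["topic"]]


# Option 2: content stores the term text itself.
docs_by_text = [
    {"id": 1, "topic": "Automobiles"},
    {"id": 2, "topic": "Automobiles"},
]


def rename_text(old_label, new_label):
    """Every document carrying the old value must be re-assigned."""
    touched = 0
    for doc in docs_by_text:
        if doc["topic"] == old_label:
            doc["topic"] = new_label
            touched += 1
    return touched


uid_writes = rename_uid("T100", "Cars")
text_writes = rename_text("Automobiles", "Cars")
```

With two documents the difference is trivial; with a large repository, the second option means rewriting and re-indexing every affected record whenever the vocabulary changes.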

Page 27: Putting Controlled Vocabulary To Work I Davis 2008

Strategy – Integration Choices

Utilizing semantic relationships
  Store full set (term values or UIDs) within content record
  OR
  Store single UID and have search utility use reference tables to determine related terms
Display of semantic relationships
  User interface considerations for effective presentation of non-hierarchically related terms
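The two storage options for semantic relationships differ mainly in when the related terms are resolved: at indexing time or at query time. A rough sketch under assumed names (the UIDs, the `RELATED` reference table, and the functions are hypothetical):

```python
# Hypothetical reference table: UID -> UIDs of related terms.
RELATED = {"T1": ["T2", "T3"], "T2": ["T1"], "T3": ["T1"]}


def index_with_full_set(uid):
    """Option A: copy the term plus all related terms into the content record."""
    return {"terms": [uid] + RELATED.get(uid, [])}


def index_with_single_uid(uid):
    """Option B: store only the assigned UID."""
    return {"terms": [uid]}


def matches_a(record, query_uid):
    """No look-up at query time; the record already carries related terms."""
    return query_uid in record["terms"]


def matches_b(record, query_uid):
    """Look-up at query time: expand the query through the reference table."""
    wanted = {query_uid, *RELATED.get(query_uid, [])}
    return any(t in wanted for t in record["terms"])


record_a = index_with_full_set("T1")
record_b = index_with_single_uid("T1")
```

Option A pays at index creation and again whenever relationships change, since records must be rebuilt; option B pays a look-up on every query. This mirrors the two points at which the deck says performance issues can appear.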

Page 28: Putting Controlled Vocabulary To Work I Davis 2008

Strategy – Integration Choices

Browse navigation options

Query entry (including ability to broaden or narrow current search results)

Query results listing

Related topics (defined through associative relationships)

Previous query statement user entered plus any auto-expansion done by engine

Page 29: Putting Controlled Vocabulary To Work I Davis 2008

Strategy – Multiple Applications

Expanding the adoption and use of controlled vocabularies
  Know the business objectives of the applications
    In conjunction with the search utility, does the controlled vocabulary enable this objective?
    Are there metadata fields available within current application for the controlled vocabulary?
    Does the business have resources to assign the controlled vocabulary?
    What format does the controlled vocabulary need to be in to be integrated with the application?

Page 30: Putting Controlled Vocabulary To Work I Davis 2008

Strategy – Multiple Applications

Additional considerations
  Will there be conflicting version management needs?
  How does search currently index these applications and will that change with the use of controlled vocabularies?

Page 31: Putting Controlled Vocabulary To Work I Davis 2008

Five Key Points

1. Controlled vocabularies are a lever to improve precision and comprehensiveness

2. Controlled vocabularies are never finished – they are always a work in process

3. Search utilities can only be tweaked so far

4. Tapping into the richness of the semantic relationships between terms can be extremely powerful

5. There are lots of options for implementing and integrating controlled vocabularies

Page 32: Putting Controlled Vocabulary To Work I Davis 2008

Thank you for your attention!

Ian Davis
[email protected]

