Putting Controlled Vocabulary To Work I Davis 2008

  • Published on
    12-May-2015


Transcript

<ul><li>1. Putting Structured Business Vocabularies to Work. November 4, 2008. Data Management and Information Quality Conference, IRM UK. Ian Davis, Global Project Manager, Dow Jones &amp; Company. Copyright 2008 Dow Jones and Company, Inc.</li></ul> <p>2. What we'll cover today: understanding the challenges of controlled versus uncontrolled vocabularies; developing a strategy to create and maintain controlled vocabularies; identifying how you want to integrate your controlled vocabularies into your systems; understanding the requirements of integrating controlled vocabularies into multiple applications.
3. Setting the Context
4. Once upon a time: Most of the business was IT enabled. There was some degree of sharing of information and content; there were even some large, well-structured document repositories. Yet, no one could find anything. Actually, they found things, but not what they wanted when they wanted it, and they were never sure they found the best or saw it all.
5. Once upon a time: The C-level executives were a bit irritated. They'd spent lots on the technology and people really weren't much more efficient; the pinch point in the workflow had simply moved further downstream. So, what happened next?
6. Once upon a time: They SPENT MONEY and bought the best-in-class search utilities. Yet, no one could find anything. Actually, they found things, but not what they wanted when they wanted it, and they were never sure they found the best or saw it all.
7. Once upon a time: The C-level executives became a bit more irritated. Everyone was a bit frustrated. What was missing?
8. Optimized? Is the search utility optimized using all the bells and whistles it came with?
Relevancy rankings; thesaurus files (synonym lists); multi-lingual capabilities; common searches saved and presented to users; logs reviewed to understand user issues.
9. Usable? Is the user interface considerate to users? Was it designed with YOUR users in mind? Designed for occasional users? Designed for power users? Was it designed with YOUR business in mind? Task-based views for context-sensitive searches. Present results in a format readily used within workflows.
10. Metadata? Are there required metadata fields within the CMS? Author, Title, Language, Topic, Product/Service, etc. Are the entry values to those fields controlled? Lookups against authority files, taxonomies, thesauri. Does the search utility support fielded searches? Does the search utility weight terms within metadata fields higher than free text?
11. Metadata? For example: if a financial analyst enters the query term "stock" within the company's knowledge base, will he get back results with the documents specifically discussing stock as a financial instrument listed first? Or will he have to look through 100s of documents discussing what's relevant to him as well as every document that references free text in the body of the document about soup stock (food industry), cows (livestock industry), or stock car racing (professional sports industry)?
12. Metadata? Precise and comprehensive searches: only if controlled vocabularies have been used to populate metadata fields AND the search utility takes advantage of that by giving priority to query term occurrence within controlled-value metadata fields, OR fielded searches are enabled, e.g. + + + +
13. Challenges: Controlled versus Uncontrolled
14. Controlled Vocabularies Explained. Authority files, e.g.
Company's active directory, ISO standard for languages. Typically a flat list of allowed values. Taxonomies, e.g. Linnaean Classification (kingdom, phylum, class, order, family, genus, and species). Typically includes only hierarchical relationships between terms. Thesauri, e.g. NASA Thesaurus (http://www.sti.nasa.gov/thesfrm1.htm). Includes a full set of semantic relationships defined between terms (hierarchical, associative, equivalence).
15. NASA Thesaurus Sample Entry
16. Semantic Relationships. Hierarchical: superordination (representing a class or a whole) and subordination (referring to members or parts), e.g. mammals and vertebrates, e.g. cherry pie and cherry pie slices. Equivalence: one concept expressed by two or more terms, e.g. dogs and canines. Associative: terms that are conceptually linked, but not through hierarchy or equivalence, e.g. accounting and accountant.
17. Challenges - Uncontrolled Vocabularies. Uncontrolled vocabularies are: Comprehensive but noisy. Only comprehensive if synonym lists are used. Limited in their precision and relevancy. Time lost scanning through hundreds of miss hits. Reduced effectiveness of cross-repository searches. Limited ways to disambiguate "soup stock" from "stock car".
18. Challenges - Controlled Vocabularies. Controlled vocabularies can produce: Potentially significant overhead effort (manual and technical). Organizational politics can add YEARS to establishing an initial set of controlled vocabularies. A lack of basic understanding of what the controlled vocabularies are and how they work impedes effective development and utilization.
19.
Challenges - Controlled Vocabularies. Controlled vocabularies: Richness and power come from a full set of semantic relationships, not just hierarchical ones. Hierarchy supports the ability to narrow and broaden search queries. Association supports "did you mean" and "you might also want to look at". Equivalence enables the use of familiar language to retrieve content which is conceptually on target but never uses their term, e.g. user enters "dog" and the search utility expands the query to include "canine", "k-9", "puppy".
20. Challenges - Controlled Vocabularies. Controlled vocabularies: Richness and power come at the cost of added complexity of development, implementation, integration and maintenance. Utilization of controlled vocabularies can produce performance issues: during search index creation; during query run time.
21. Tackling the Challenges
22. Strategy - Creation and Maintenance. State the business case clearly. Benefits: reduced time for knowledge discovery; increased richness of knowledge discovery; decreased risk to the firm of making business decisions with partial information. Scope: one business unit or enterprise-wide? Resource requirements: skill sets (IS, IT, business knowledge); time commitment.
23. Strategy - Creation and Maintenance. Tackle organizational politics head-on. Gain credibility and ensure usability by establishing a cross-functional working committee that will become the Review Committee. Include all major stakeholder groups and any interested parties (even the non-supporters). Establish methods of broadly soliciting end-user input that will become a source of change requests during maintenance phases.
24. Strategy - Creation and Maintenance. Additional considerations before you start: How rigorous does it need to be? What external standards should be adopted?
ANSI/NISO Z39.19-2005; British Standard BS 8723. What internal standards should be developed? Editorial Guidelines; Usage Guidelines. How extensive will it be? Depth and breadth within and across facets. What about adaptability and flexibility? Will there be a need for local extensions?
25. Strategy - Creation and Maintenance. Additional considerations before you start: Projected frequency of revisions. How quickly does the content base change with respect to concepts; is there significant content drift? How volatile is the language? Management consulting vs. accounting. Vocabulary management software: DON'T spend money just to spend money. However, you CAN'T manage controlled vocabularies in a spreadsheet. Buy the tool you need based on your documented functional requirements.
26. Strategy - Integration Choices. Performance trade-offs: store UIDs within content, then use a look-up table at query run time; store the full text of a term, then touch all content when a taxonomy value changes (must re-assign new term value). Version control: use static versions of controlled vocabularies within the CMS and search utilities, releasing new versions periodically; use a dynamic version of controlled vocabularies with continuous revisions occurring.
27. Strategy - Integration Choices. Utilizing semantic relationships: store the full set (term values or UIDs) within the content record OR store a single UID and have the search utility use reference tables to determine related terms. Display of semantic relationships: user interface considerations for effective presentation of non-hierarchically related terms.
28.
Strategy - Integration Choices. [Search interface diagram. Labels: query entry; previous query statement the user entered, plus any auto-expansion done by the engine; related topics (defined through associative relationships); browse navigation options (including the ability to broaden or narrow current search results); query results listing.]
29. Strategy - Multiple Applications. Expanding the adoption and use of controlled vocabularies: Know the business objectives of the applications. In conjunction with the search utility, does the controlled vocabulary enable this objective? Are there metadata fields available within the current application for the controlled vocabulary? Does the business have resources to assign the controlled vocabulary? What format does the controlled vocabulary need to be in to be integrated with the application?
30. Strategy - Multiple Applications. Additional considerations: Will there be conflicting version management needs? How does search currently index these applications, and will that change with the use of controlled vocabularies?
31. Five Key Points. 1. Controlled vocabularies are a lever to improve precision and comprehensiveness. 2. Controlled vocabularies are never finished; they are always a work in process. 3. Search utilities can only be tweaked so far. 4. Tapping into the richness of the semantic relationships between terms can be extremely powerful. 5. There are lots of options for implementing and integrating controlled vocabularies.
32. Thank you for your attention! Ian Davis, ian.davis@dowjones.com </p>
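
The three semantic relationships described on slides 16 and 19, together with the "store UIDs within content" option from slide 26, can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not how any particular search utility or vocabulary tool works: the `Concept` class, the `VOCAB` and `DOCS` tables, the UIDs, and every function name are hypothetical. The sample terms (dog, canine, k-9, puppy; mammals and vertebrates; accounting and accountant) are the ones used in the slides.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    uid: str                                        # stable ID stored in content records
    preferred: str                                  # preferred term for the concept
    equivalents: set = field(default_factory=set)   # equivalence: synonyms and variants
    broader: set = field(default_factory=set)       # hierarchical: UIDs of parent concepts
    related: set = field(default_factory=set)       # associative: UIDs of linked concepts


# Hypothetical controlled vocabulary, keyed by UID (the look-up table).
VOCAB = {
    "C1": Concept("C1", "dogs", {"dog", "canine", "canines", "k-9", "puppy"}, broader={"C2"}),
    "C2": Concept("C2", "mammals", {"mammal"}, broader={"C3"}),
    "C3": Concept("C3", "vertebrates", {"vertebrate"}),
    "C4": Concept("C4", "accounting", related={"C5"}),
    "C5": Concept("C5", "accountant", {"accountants"}, related={"C4"}),
}

# Hypothetical content records: each document stores only concept UIDs.
DOCS = {"doc-1": {"C1"}, "doc-2": {"C4"}, "doc-3": {"C2"}}


def find_concept(term):
    """Resolve a user's term to a concept via preferred or equivalent terms."""
    t = term.strip().lower()
    for concept in VOCAB.values():
        if t == concept.preferred or t in concept.equivalents:
            return concept
    return None


def expand_query(term):
    """Equivalence: expand a query term to every term for the same concept."""
    concept = find_concept(term)
    if concept is None:
        return {term.strip().lower()}   # uncontrolled term: no expansion possible
    return {concept.preferred} | concept.equivalents


def narrower(uid):
    """Hierarchy: UIDs of concepts directly below the given concept."""
    return sorted(c.uid for c in VOCAB.values() if uid in c.broader)


def suggestions(term):
    """Association: 'you might also want to look at' suggestions."""
    concept = find_concept(term)
    return sorted(VOCAB[u].preferred for u in concept.related) if concept else []


def search_by_term(term):
    """Look up the UID at query run time, then match it against stored UIDs."""
    concept = find_concept(term)
    if concept is None:
        return []
    return sorted(doc for doc, uids in DOCS.items() if concept.uid in uids)
```

Because the documents carry only UIDs, renaming a preferred term in `VOCAB` would not require touching any content record, at the cost of a look-up on every query; that is the performance trade-off slide 26 names.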
