Quality Taxonomies
Dr. Claude Vogel, Founder & CTO
KM World 2000
Ontology / Taxonomy
Root Ontology
Taxonomy Generation
Static Discovery
Dynamic Discovery
What is Quality?
“Best value for the money”
According to this definition, you are entitled to get high performance from a costly product; likewise, a low-cost product or service can be expected to deliver less. For example, a loose demo delivery is both predictable and acceptable, since its quality is low conformance at low cost.
What is Quality?
“Good Quality is Nominal Conformance”
Taxonomy Quality is defined as Taxonomy Conformance to:
– Valid requirements;
– Explicitly documented development standards; and,
– Implicit characteristics that are expected of all professionally developed taxonomies, such as the desire for good maintainability.
Standards
ISO 2788-1986
– International Organization for Standardization. Documentation - Guidelines for the Establishment and Development of Monolingual Thesauri. 2nd ed. n.p.: ISO, 1986. (ISO 2788-1986(E)). (Available in the U.S. from American National Standards Institute)
ISO 5964-1985
– International Organization for Standardization. Documentation - Guidelines for the Establishment and Development of Multilingual Thesauri. n.p.: ISO, 1985. (ISO 5964-1985(E)). (Available in the U.S. from American National Standards Institute)
ANSI/NISO Z39.19-1993
– National Information Standards Organization. Guidelines for the Construction, Format, and Management of Monolingual Thesauri. Bethesda, MD: NISO Press, 1994. 69p. (ANSI/NISO Z39.19-1993)
SEMIO Quality Plan v1 2000
ISO/IEC 13250 Topic Maps
RDF
– Please refer to RDF at http://www.w3.org/RDF and XML at http://www.w3.org/XML
Project Plan
1. Kick-off
2. Requirements Review
3. Lexicon Review
4. Taxonomy Review
5. Tags Review
6. Final Review
1. Kick-off
Objectives
– Purpose
– Scope
– Scale
– Users
– Conditions of receipt
Roles
– Supplier
– Customer
• Admin
• KE
• Experts
• Users
Planning
Training and Transfer
2. Requirements Review
– Sources
– Lexicon
– Ontology
– Install
Sources
– Dispersion (Multiplicity, Size, Homogeneity)
– Refresh
– Access

Features                    Internet, News, E-Mail   Reports, Patents   E-Trade, Logs
Informative content         -                        +                  +
Number of topics covered    +                        +                  -
Structured information      -                        +                  +
Size of records             -                        +                  -
Number of records           +                        -                  +

Typical Patterns
– Disparity
• Adjust sources
• Adjust crawl strategy
• Isolate communities / taxonomies
Lexicon
– Vocabularies, etc.
– Substitutions: Acronyms, Synonyms, etc.
– Preferred Keywords: Brand Names, etc.
– Banned Keywords
Typical Patterns
– Lack of requirements
• Use Librarian Resources
Ontology
– Thesaurus?
– Is the information domain analysis complete, consistent, and accurate?
– Is the partitioning of the problem complete?
Typical Patterns
– Directory versus Taxonomy
• Isolate “directory” branches
– Thesaurus versus Taxonomy
• Put an ontology on top of thesaurus
• Check ASAP match of thesaurus generics with extracted lexicon
– Very high level design for top categories requirements
• Plan to work bottom-up
See also Taxonomy (functions, combinations, etc.)
Install
Implementation / Integration:
– Are external and internal interfaces properly defined?
– Are all requirements traceable to the system level?
– Has prototyping been conducted for the user/customer?
– Is performance achievable within the constraints imposed by other system elements?
– Are requirements consistent with schedule, resources, and budget?
Typical Patterns
– Scale
– Security
– Missing Documents
3. Lexicon Review
Coverage
– Extracted words / Words
– (Extracted Index / Index)
Sources benchmarking
– Coverage
– Extraction quality
– Topic distribution
Structure
– Most Frequent Phrases
– Most Productive Generics
Substitutions
Exceptions
Typical Patterns
– Low level of frequency / quality for the most meaningful content
• Increase size of value corpus
• Filter and re-import lexicon
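The coverage ratio named in the Lexicon Review (Extracted words / Words) can be sketched as below. This is only an illustration: the function name `coverage` and the sample word lists are hypothetical, and it assumes coverage is measured on distinct words rather than on occurrences.

```python
def coverage(extracted, reference):
    """Fraction of distinct reference items that also appear among the extracted items."""
    reference = set(reference)
    if not reference:
        return 0.0
    return len(reference & set(extracted)) / len(reference)


# Hypothetical data: words found in the source corpus vs. words kept by extraction.
corpus_words = ["loan", "tax", "liability", "asset", "salary"]
extracted_words = ["loan", "tax", "liability"]

lexical_coverage = coverage(extracted_words, corpus_words)
print(f"Lexical coverage: {lexical_coverage:.0%}")
```

The same ratio could be applied to index entries (Extracted Index / Index) by passing index terms instead of words.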
4. Taxonomy Review
Taxonomy Operation
– Correctness
– Reliability
– Usability
– Integrity
– Efficiency
Taxonomy Revision
– Maintainability
– Flexibility
– Testability
Taxonomy Transition
– Portability
– Reusability
– Interoperability
Folk Taxonomies Design
The Berlin and Kay model: Taxonomy = Nomenclature + Terminology

Rank               Example
Unique Beginner    Tax
Life Form          Liability
Generic            Loan
Specific           Term loan
Varietal           Short-term loan
Correctness
– Accuracy
– Completeness
– Consistency
Accuracy
– Precision
– Recall
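A minimal sketch of how precision and recall might be computed for one taxonomy category, assuming a reviewed set of documents that truly belong to the category is available. The document identifiers and the function name `precision_recall` are hypothetical.

```python
def precision_recall(assigned, relevant):
    """Precision and recall of the documents assigned to a category.

    assigned: documents the taxonomy placed in the category
    relevant: documents a reviewer says belong in the category
    """
    assigned, relevant = set(assigned), set(relevant)
    hits = len(assigned & relevant)
    precision = hits / len(assigned) if assigned else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical review of a single category.
assigned_docs = {"doc1", "doc2", "doc3", "doc4"}
relevant_docs = {"doc2", "doc3", "doc5"}

precision, recall = precision_recall(assigned_docs, relevant_docs)
print(f"precision={precision:.2f} recall={recall:.2f}")
```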
Completeness
[Figure: Taxonomy, Maps, Lexicon, Collection]
Concentration Works Against Quality
[Figure: Lexicon, Document Collection, Maps, Taxonomy, Tagging]
– Tagging Coverage
– Ontology Coverage
– Hook Coverage
– Map Coverage
– Lexical Coverage
– Collection Coverage
Consistency: Typical Patterns
– Objectivization
– Hyperonymy
– Speciation
– Necessity
Objectivization
[Figure: Employment > Firing, Hiring, Salaries]
– Avoid functional categories
– Don’t mix functions / objects
– Exhaust scripts
– Match idiomatic phrases
Genericity
[Figure: Parts > Air Conditioning, Belts and Hoses, Body, Brake System, Chassis, Engine, Exhaust System, Fuel System, Glass, Ignition]
– Avoid meronymy
– Don’t mix meronymy / hyperonymy
– Exhaust prototypes
Speciation
[Figure (WordNet): Person > Unwelcome person > Unpleasant person > Selfish person > Opportunist > Backscratcher]
– Avoid “strings” of categories
– Avoid (non-idioms) properties for categories
Necessity
[Figure: Tax > Individuals, Corporations; each of these > Assets, Liability]
[Figure: Tax > Individuals, Corporations; Assets, Liability > Individuals, Corporations]
Avoid non-productive categories
Avoid combinations of categories
Nomenclature (Design Structure) Quality Index
[Figure: taxonomy tree spanning Level 0 (the unique beginner) through Level 4]
UB = unique beginner, lf = life-form, g = generic, s = specific, v = varietal
Width
Depth
Balance
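The slide names width, depth, and balance without defining them. The sketch below shows one possible reading, under stated assumptions: width is the largest number of categories on any level, depth is the deepest leaf level (root at level 0), and balance is the ratio of the shallowest to the deepest leaf level. The sample tree and all function names are hypothetical.

```python
# Hypothetical taxonomy as nested dicts: category name -> children.
taxonomy = {
    "Tax": {
        "Individuals": {"Assets": {}, "Liability": {}},
        "Corporations": {"Assets": {}, "Liability": {"Loan": {}}},
    }
}


def leaf_depths(tree, level=0):
    """Yield the level of every leaf, counting the root as level 0."""
    for children in tree.values():
        if children:
            yield from leaf_depths(children, level + 1)
        else:
            yield level


def level_widths(tree, level=0, widths=None):
    """Count the categories found on each level."""
    widths = {} if widths is None else widths
    for children in tree.values():
        widths[level] = widths.get(level, 0) + 1
        level_widths(children, level + 1, widths)
    return widths


depths = list(leaf_depths(taxonomy))
widths = level_widths(taxonomy)

depth = max(depths)                  # deepest leaf level
width = max(widths.values())         # widest level
balance = min(depths) / max(depths)  # assumed definition: 1.0 means all leaves on one level

print(f"width={width} depth={depth} balance={balance:.2f}")
```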
Complexity Index
Cyclomatic complexity increases with the number of Cross References within the Taxonomy, giving an indication of complexity and difficulty of testing.
Taxonomy Complexity Index combines:
– autonomy
– closure
– similarity
– typicality
– commonality
– redundancy
– stability
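To illustrate why cross references raise cyclomatic complexity, the sketch below applies McCabe's formula V(G) = E - N + 2P to a taxonomy treated as a graph. Treating each cross reference as one extra edge is an assumption made for illustration, and the sample categories are hypothetical; the combined Taxonomy Complexity Index is not defined in the slides and is not modeled here.

```python
def cyclomatic_complexity(nodes, edges, components=1):
    """McCabe's V(G) = E - N + 2P for a graph with P connected components."""
    return len(edges) - len(nodes) + 2 * components


# Hypothetical taxonomy: parent/child links form a tree.
nodes = ["Tax", "Individuals", "Corporations", "Assets", "Liability"]
tree_edges = [
    ("Tax", "Individuals"),
    ("Tax", "Corporations"),
    ("Individuals", "Assets"),
    ("Corporations", "Liability"),
]
# Assumption: each cross reference adds one edge to the graph.
cross_refs = [("Assets", "Liability"), ("Individuals", "Corporations")]

base = cyclomatic_complexity(nodes, tree_edges)
with_refs = cyclomatic_complexity(nodes, tree_edges + cross_refs)
print(base, with_refs)
```

A pure tree has N - 1 edges, so its complexity is 1; under this reading every cross reference adds exactly 1.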
Maturity Index
The IEEE standard 982.1-1988 suggests a taxonomy maturity index to provide an indication of the stability of the taxonomy.
Maturity Index combines:
– number of modules in current ontology / taxonomy.
– number of modules in current ontology / taxonomy that have been changed.
– number of modules added to current ontology / taxonomy.
– number of modules deleted from the previous version of the ontology / taxonomy.
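The four counts above match the inputs of the software maturity index in IEEE 982.1-1988, SMI = (M_T - (F_a + F_c + F_d)) / M_T. The sketch below applies that formula to a taxonomy, assuming a "module" means a taxonomy branch or category; the function name and the sample counts are hypothetical.

```python
def maturity_index(current, changed, added, deleted):
    """SMI = (M_T - (F_a + F_c + F_d)) / M_T.

    current: modules in the current version (M_T)
    changed: modules in the current version that were changed (F_c)
    added:   modules added to the current version (F_a)
    deleted: modules of the previous version that were deleted (F_d)
    """
    if current <= 0:
        raise ValueError("current must be a positive module count")
    return (current - (added + changed + deleted)) / current


# Hypothetical release: 200 categories, 10 changed, 6 added, 4 deleted.
smi = maturity_index(current=200, changed=10, added=6, deleted=4)
print(f"Maturity index: {smi:.2f}")
```

Values approaching 1.0 indicate that the taxonomy is stabilizing between versions.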
5. Tags Review
– Document coverage
– Concepts coverage

<tagset>
  <document>
    <docurl>http://www.TaxSource.com</docurl>
    <tag>
      <tagname>Liability</tagname>
      <weight>1.289</weight>
    </tag>
    <tag>
      <tagname>Federal Funds</tagname>
      <weight>0.746</weight>
    </tag>
  </document>
</tagset>
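As a rough sketch, a tagset like the one above could be read with Python's standard `xml.etree.ElementTree` module and turned into the two coverage figures named in this review. The list of collection URLs and the set of taxonomy concepts are hypothetical, and defining coverage as "tagged / total" is an assumption.

```python
import xml.etree.ElementTree as ET

TAGSET = """<tagset>
  <document>
    <docurl>http://www.TaxSource.com</docurl>
    <tag><tagname>Liability</tagname><weight>1.289</weight></tag>
    <tag><tagname>Federal Funds</tagname><weight>0.746</weight></tag>
  </document>
</tagset>"""


def read_tags(xml_text):
    """Return {document URL: {tag name: weight}} from a tagset document."""
    root = ET.fromstring(xml_text)
    result = {}
    for doc in root.iter("document"):
        url = doc.findtext("docurl")
        result[url] = {
            tag.findtext("tagname"): float(tag.findtext("weight"))
            for tag in doc.iter("tag")
        }
    return result


tags_by_doc = read_tags(TAGSET)

# Hypothetical collection and concept list, for illustration only.
collection = ["http://www.TaxSource.com", "http://www.example.com/untagged"]
taxonomy_concepts = {"Liability", "Federal Funds", "Loan"}

tagged_docs = [url for url in collection if tags_by_doc.get(url)]
document_coverage = len(tagged_docs) / len(collection)

used_concepts = {name for tags in tags_by_doc.values() for name in tags}
concept_coverage = len(used_concepts & taxonomy_concepts) / len(taxonomy_concepts)

print(f"document coverage={document_coverage:.2f} concept coverage={concept_coverage:.2f}")
```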
6. Final Review
– Receipt
– Maintenance
Quality Taxonomies
Claude Vogel
cvogel@semio.com
KM World 2000