1 Building An Ontology of the NHIN: Status Report 3 Brand Niemann Co-Chair, Semantic Interoperability Community of Practice (SICoP) Best Practices Committee (BPC), CIO Council, and Enterprise Architecture Team, Office of Environmental Information U.S. Environmental Protection Agency April 5, 2005


1

Building An Ontology of the NHIN: Status Report 3

Brand Niemann
Co-Chair, Semantic Interoperability Community of Practice (SICoP)

Best Practices Committee (BPC), CIO Council, and
Enterprise Architecture Team, Office of Environmental Information

U.S. Environmental Protection Agency
April 5, 2005


2

Overview

• 1. The National Health Information Network (NHIN) Request for Information (RFI):
  – 1.1 Scope & Quality
  – 1.2 Statistics
  – 1.3 Analysis & Reporting Strategy
  – 1.4 Business Cases
  – 1.5 Leadership Statements
  – 1.6 Related Activities
  – 1.7 Building Ontologies

• 2. Results and Next Steps
• Appendices


3

1. NHIN RFI: 1.1 Scope & Quality

• The NHIN RFI stimulated substantial and unprecedented interest.
  – Cumulatively, the 512 responses yielded nearly 5,000 pages of information.

• The National Coordinator established a federal government-wide RFI review task force (RTF) to review, summarize, and analyze the RFI responses.
  – The RTF consists of more than 120 Federal officials from 17 agencies.


4

1. NHIN RFI: 1.1 Scope & Quality

• The responses to these initial questions yielded the richest and most descriptive collection of thoughts on interoperability and health information exchange that has likely ever been assembled in the United States.

• The responses to the general questions are a treasure trove of the best thinking on the topic.


5

1. NHIN RFI: 1.2 Statistics

Type of Respondent | Count | Percent
Individual Consumers | 174 | 34%
Individual - Health Professionals | 109 | 21%
Vendors - Software, hardware, system integrators | 94 | 18%
Associations - Medical, Patient Interests, Vendors | 54 | 11%
Multistakeholder Respondents | 16 | 3%
Provider Organizations (Hospitals, clinics, labs, homecare, hospice, pharmaceutical firms, etc.) | 16 | 3%
Research Org (think tanks, non-hospital Universities, etc.) | 15 | 3%
RHIOs | 10 | 2%
Payers (HMO, PPO) | 9 | 2%
Standards Development Organizations | 7 | 1%
Federal, State, Local Government agencies | 4 | 1%
Foundations | 4 | 1%
Total | 512 | 100%
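
The percentages in the table can be rechecked from the counts. The following is a minimal sketch, assuming the percentages were rounded to the nearest whole number; the shortened category labels are ours, not the RFI's.

```python
# Respondent counts copied from the statistics table (labels shortened).
counts = {
    "Individual Consumers": 174,
    "Individual - Health Professionals": 109,
    "Vendors": 94,
    "Associations": 54,
    "Multistakeholder Respondents": 16,
    "Provider Organizations": 16,
    "Research Org": 15,
    "RHIOs": 10,
    "Payers": 9,
    "Standards Development Organizations": 7,
    "Government agencies": 4,
    "Foundations": 4,
}

total = sum(counts.values())

# Assumption: each percentage is the share of the total, rounded to a whole number.
percents = {name: round(100 * n / total) for name, n in counts.items()}

# The two "Individual" rows together form the Individuals group.
individuals = counts["Individual Consumers"] + counts["Individual - Health Professionals"]

for name, n in counts.items():
    print(f"{name:40s} {n:4d} {percents[name]:3d}%")
print(f"{'Total':40s} {total:4d}")
```

The two individual categories sum to 283 and the remaining categories to 229, which matches the Individuals/Organizations split reported on the Analysis & Reporting Strategy slide.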


6

1. NHIN RFI: 1.3 Analysis & Reporting Strategy

• The NHIN RFI consisted of:
  – Twenty-four (24) questions, in
  – Six (6) basic groups

• The NHIN Team divided the RFI responses into two basic groups:
  – Individuals (283)
  – Organizations (229)

• The NHIN Team organized the Organization responses for review in:
  – Thirty (30) sets with 2-3 reviewers for each set
  – Templates (matrices) with 13 entities by about 4 categories of the 24 questions mapped to each of the three Work Groups (see next slide).

• For example: WG1 – Standards (Questions 4b, 14-18), Technical Development/Architecture (Questions 2-4a, 23), Technical Services/Operations (Questions 9-11), and General Comments by Federal Government, Industry – Software/Hardware Vendors, etc.


7

1. NHIN RFI: 1.3 Analysis & Reporting Strategy

• NHIN Team divided the participants into three Work Groups:
  – Technical and Architecture
  – Organization and Business Framework
  – Finance, Privacy, Regulatory, and Legal

• Each Work Group created Major Themes:
  – WG1: 3, WG2: 2, and WG3: 3

• Each Work Group reported out on Sub-teams:
  – WG1: 5, WG2: 5, and WG3: 4

• NHIN Team mapped the Work Group results to new structures for two reports:
  – Report 1 - Sections: 7, Sub-sections: 17, and Sub-Sub-sections: 18
  – Report 2 - Sections: 4, Sub-sections: 16, and Sub-Sub-sections: 86


8

1. NHIN RFI: 1.3 Analysis & Reporting Strategy

• There is and will be criticism:
  – “It is important to note, in the front when talking about the process, that approximately 270 RFIs were not reviewed by the interagency process. The process that ONCCHIT used to select and review these responses should be made clear.” (name withheld)

• There will be responses to criticism:
  – Statistical Summary Analysis of Responses from Individuals:
    • 85% of the responses had strong concerns about the potential loss of privacy along with 53% of health officials who had the same concern.
    • 17% of health officials shared their experiences with implementations of EHR systems.
    • Only about 4% expressed enthusiasm for the creation of a system that would facilitate interoperability.


9

1. NHIN RFI: 1.4 Business Cases

• Veterans Can Personalize Medical Records on VA Web Site, GCN, November 9, 2004:
  – My HealtheVet (also copy parts of VistA)
  – Could allow the VA to share patient data with other providers.
  – Patients can request changes to their medical records and allow their loved ones or their physicians to access portions of their records.

• iHealthBeat, November 13, 2004.


10

1. NHIN RFI: 1.4 Business Cases

• Canadian Health Infoway*:
  – An EHR solution is a combination of people, organizational entities, business processes, systems, technology and standards that interact and exchange clinical data. A network of interoperable EHR solutions (one that links clinics, hospitals, pharmacies, and other points of care) will help enhance quality of care and patient safety, improve Canadians' access to health services, and make the health care system more efficient.
  – Interoperability for electronic health records is the capability of computer and software systems to seamlessly communicate with each other. It is central to Infoway's mission, making clinical data available across the continuum of care and across health delivery organizations and regions, promoting reusable and replicable solutions that can be aligned with jurisdictional priorities and deployed across the country more cost-efficiently. Without a common framework and sets of standards, EHR systems across Canada would be a patchwork of incompatible systems and technologies.

*Accelerating the development of Electronic Health Information Systems for Canadians
http://www.infoway-inforoute.ca/ehr/index.php?lang=en


11

1. NHIN RFI: 1.4 Business Cases

Canadian Health Infoway Standards Collaboration
http://www.infoway-inforoute.ca/ehr/standards_overview.php?lang=en#


12

1. NHIN RFI: 1.4 Business Cases

• One recent study estimated that national implementation of fully standardized interoperability between providers and five other types of organizations could yield net savings of $77.8 billion annually, or approximately 5 percent of the projected $1.7 trillion spent on U.S. health care in 2003.
  – Source: J. Walker et al., “The Value of Health Care Information Exchange and Interoperability,” Health Affairs, January 19, 2005.
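
As a quick arithmetic check on the "approximately 5 percent" figure, the two dollar amounts quoted on this slide can be divided directly. This is only a sketch of the calculation; the variable names are ours.

```python
# Figures quoted on the slide (Walker et al., Health Affairs, 2005).
annual_savings = 77.8e9       # $77.8 billion per year
national_spending = 1.7e12    # $1.7 trillion projected U.S. health care spending, 2003

share = annual_savings / national_spending
print(f"Savings as a share of spending: {share:.1%}")
```

The ratio comes out a little under 5 percent, consistent with the slide's rounded figure.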


13

1. NHIN RFI: 1.5 Leadership Statements

• HHS Secretary Leavitt’s Keynote Address at AFCEA International’s Homeland Security Conference, February 22, 2005 (See http://www.fcw.com/article88110):
  – The next frontier of human productivity is the Interoperability Era.
  – Collaboration is the premium leadership skill that’s needed in this new era.
  – Interoperability begins by setting standards and should be organically grown through the "messy, complex, difficult process called collaboration.”
  – Several elements (8) will improve the chances for success (a “common pain”, a “convener of stature”, a committed leader, openness, transparency, and voluntary participation, a critical mass of stakeholders, representative of substance, a clearly defined purpose and goal, and a formally written and signed charter).


14

1. NHIN RFI: 1.5 Leadership Statements

• Dr. Brailer’s Keynote Address at HIMSS Conference, February 17, 2005:
  – Interoperability Themes from RFIs:
    • Standards (WG1 & WG2)*
    • Governance (WG2)
    • Privacy (WG3)
    • Regionalization (Initially none, then WG2)
    • Financing (WG3)
    • Architecture (WG1)
    • Regulation (WG3)

*Mappings to WGs added by author of this presentation.


15

1. NHIN RFI: 1.6 Related Activities

• Federal Health Architecture (FHA) Interoperability Work Group, March 17 and 24, 2005:
  – Goal: Technology Standards Harmonization
    • Strive for consensus on some of the potential technical specifications (see next slide)
    • Draft Health Information Interoperability Standards Profile
    • Present standards to OMB as Draft Standards for Trial Use (DSTU)
    • Follow-up with more detailed guidance on implementation
  – Concern: Narrow focus of Work Group is on the less crucial aspect of interoperability (technical standards)


16

Approach for Technology Classification

Message Oriented Interchange:

Layer | Web Services | ebXML | Other
Transport | HTTP | HTTP | HTTP, SMTP, FTP
Message | SOAP | SOAP w/ attach., ebMS | SOAP
Description | WSDL | CPP/A |
Discovery | UDDI | Registry (RIM) |
Business Process | BPEL | BPSS |

Security: XML Digital Signature, XKMS, SAML, WS-Security, XACML, PKI, SSL

Data:
  – Other: ASCII, Binary (e.g., image)
  – XML: XSLT, XSL, etc.
  – HL7: V 3.0, V 2.x

Source: FHA Health Interoperability Work Group, March 24, 2005.


17

1. NHIN RFI: 1.6 Related Activities

• FHA Architectural Peer Review Group (APRG) Initial Meeting, February 11, 2005:
  – Scope – Health Domains as identified by the FHA Health Domain WG and incorporated into the FHA BRM (see FEA’06 Revision Summary, page 4).
  – Semantics – Recommendations were made to consider an ontology that is being developed for this purpose by the CIO Council (actually by GSA, TopQuadrant, and SICoP).
    • See Slide 18 for Example.


18

1. NHIN RFI: 1.6 Related Activities

• Healthcare Informatics Online, January 2004 Cover Story on Emerging Technologies:
  – The Semantic Web concept was introduced in a 2001 Scientific American article and described using the scenario of a man who goes online, employing intelligent agents on the Semantic Web to set up a series of physician appointments and physical therapy sessions for his ailing mother. (It could be 10 years before such agent-enabled scenarios play out, but simpler semantic functions are already emerging.)

• My Note: Semantic Web Applications for National Security (SWANS), April 7-8, 2005, Crystal City, Virginia.


19

1. NHIN RFI: 1.6 Related Activities

• Healthcare Informatics Online, January 2004 Cover Story on Emerging Technologies:
  – “It’s not a Web replacement, it’s an evolution based largely on eXtensible Markup Language (XML) with added technologies that allow computers to interpret and process data “ontologies”, or relationships between disparate pieces of information.”
  – “The Semantic Web would represent a worldwide Web of connected data, radically different from today’s Web of discrete documents, which is why it could be the affordable answer to the electronic health record.”

• My Note: The Semantic Web could also deal with the privacy and security concerns expressed in the RFI Individual Responses.


20

1. NHIN RFI: 1.6 Related Activities


21

1. NHIN RFI: 1.7 Building Ontologies

• The Mind Map Book: How to Use Radiant Thinking to Maximize Your Brain’s Untapped Potential (Tony Buzan):
  – Before the web came hypertext. And before hypertext came mind maps.
  – A mind map consists of a central word or concept. Around the central word you draw the 5 to 10 main ideas that relate to that word. You then take each of those child words and again draw the 5 to 10 main ideas that relate to it.
  – Mind maps allow associations and links to be recorded and reinforced.
  – The non-linear nature of mind maps makes it easy to link and cross-reference different elements of the map.

• See next slide for examples from the “Explorer’s Guide to the Semantic Web,” Thomas Passin, Manning Publications, 2004, pages 106 and 141.
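
The central-word-and-branches structure described above maps naturally onto a nested dictionary. The following is a minimal sketch, not the tooling used in the pilot: it borrows two branch labels (WORK GROUPS and FRAMEWORKS) from the NHIN mind map shown later in this deck, and the helper name `paths` is hypothetical.

```python
# A mind map as a nested dict: each key is a word, each value holds its child words.
# Branch labels are taken from the deck's NHIN mind map; the structure is illustrative.
mind_map = {
    "NHIN": {
        "WORK GROUPS": {
            "technical & architecture": {},
            "organization & business": {},
            "financial, regulatory, & legal": {},
        },
        "FRAMEWORKS": {
            "organizational": {},
            "technical": {},
            "semantic": {},
        },
    }
}


def paths(tree, prefix=()):
    """Yield every path from the central word down to a leaf word."""
    for label, children in tree.items():
        current = prefix + (label,)
        if children:
            yield from paths(children, current)
        else:
            yield current


for path in paths(mind_map):
    print(" > ".join(path))
```

Each printed path is a candidate search phrase or a candidate chain of broader-to-narrower concepts, which is how the mind maps are meant to feed both the searches and the ontology.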


22

Mind Maps for Searching and Ontologies

Searching
  – Branch labels: ENVIRONMENT, STRATEGIES
  – Terms: huge, changing, growing, inconsistent; keywords, ontologies, classification, metadata, semantic focusing, social analysis, multiple passes, clustering

Ontologies
  – Branch labels: CLASSIFICATION, ONTOLOGIES, NAMES, LANGUAGES, KINDS
  – Terms: informal, formal, distinctions, multiple, trees, hierarchies, taxonomies, vocabularies; combining, specifying, commitment; properties, relationships, constraints, identifiers; RDFS, OWL, DAML, Description Logics; adhoc, categories, internet, predefined

Note: These are not complete.


23

1. NHIN RFI: 1.7 Building Ontologies

NHIN (mind map branches):
  – RFI: general; organizational & business; management & operational; standards & policies; financial, regulatory, & legal; other
  – DR. BRAILER: standards; governance; privacy; regionalization; financing; architecture; regulation
  – WORK GROUPS: technical & architecture; organization & business; financial, regulatory, & legal
  – STRATEGIC PLAN GOALS: Inform Clinical Practice; Interconnect Clinicians; Personalize Care; Improve Population Health
  – ORGANIZATIONAL STRUCTURE: regional initiatives; clinical practice; population health; health interoperability; Federal Health Architecture
  – FRAMEWORKS: organizational; technical; semantic
  – STANDARDS ORGANIZATIONS: NCVHS; CCHIT; Etc.
  – OTHER: other

Possible/probable interrelationships


24

1. NHIN RFI: 1.7 Building Ontologies

• An ontology organizes things into types and categories with a well-defined structure; ontologies are “networks of concepts”.

• Specific ontologies must be constructed with known vocabularies and rules of construction.

• A good ontology requires:
  – The ability to conceptualize and articulate the underlying ideas.
  – Skill at modeling abstractions.
  – Knowledge of the syntax of the modeling language.
    • OWL is poised to become the major ontology language for the Web.
  – Use of well-developed and accepted ontologies whenever possible.
    • The Suggested Upper Merged Ontology (SUMO) is a best practice example.
  – A Community of Practice with all of these skills that can collaborate to develop the ontology.
    • The Ontolog Forum is a best practice example (see next slide).


25

1. NHIN RFI: 1.7 Building Ontologies

• A key aspect of successful large scale interoperability is shared meaning.
  – Shared meaning requires not only a common syntax (XML), but a common vocabulary.
  – That common vocabulary should be defined in terms of the broadest and most general foundation concepts and be in a formal and computable language not subject to human interpretation in English alone.
  – Formal ontologies, defined in logic, and a hierarchy of ontologies that build from a common semantic foundation are needed (see next slide).


26

Current Ontology-Driven Information System for FHA/NHIN

[Figure: a hierarchy of ontologies in three levels (Upper Ontology, Mid-Level Ontology, Domain Ontology), descending from "Most General Thing" through "Process" and "Location" and "Geographic Area of Interest" to "Airspace" and "Target Area of Interest".]

Examples: SUMO; HL7 RIM; FEA-RMO; EON, SNOMED CT, LOINC

Source: Netcentric Semantic Linking (Mapping): An Approach for Enterprise Semantic Interoperability, Mary Pulvermacher, et al., MITRE, October 2004.


27

1. NHIN RFI: 1.7 Building Ontologies

• Strategy for the NHIN Ontology:
  – Compile repository/library of NHIN public and RFI documents in their native file formats.
  – Repurpose the documents:
    • Proprietary to text formats.
    • Proprietary to XML documents.
    • Chunk large documents into sub-documents.
  – Compile the NHIN “Mind Maps” for defining searches and building the ontology.
  – Work with ontology communities of practice to draw in their expertise.
    • Proposed new Ontology and Taxonomy Coordinating Group (ONTACG) of SICoP.
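
The "chunk large documents into sub-documents" step can be illustrated with a short sketch. This is not the method used by the pilot tools; it assumes the document has already been converted to plain text with blank lines between paragraphs, and the function name `chunk_document` and the `max_words` parameter are hypothetical.

```python
def chunk_document(text, max_words=200):
    """Group consecutive paragraphs into sub-documents of about max_words words.

    Paragraphs are never split, so a single paragraph longer than max_words
    becomes a sub-document of its own.
    """
    chunks, current, count = [], [], 0
    for paragraph in text.split("\n\n"):
        if not paragraph.strip():
            continue  # skip empty paragraphs
        words = len(paragraph.split())
        # Start a new sub-document when adding this paragraph would exceed the limit.
        if current and count + words > max_words:
            chunks.append("\n\n".join(current))
            current, count = [], 0
        current.append(paragraph)
        count += words
    if current:
        chunks.append("\n\n".join(current))
    return chunks


sample = "a b c\n\nd e f\n\ng h"
for i, chunk in enumerate(chunk_document(sample, max_words=5), start=1):
    print(f"sub-document {i}: {chunk!r}")
```

Splitting on paragraph boundaries keeps each sub-document readable on its own, which matters when the chunks are later indexed and categorized separately.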


28

2. Results and Next Steps

• 2.1 The Challenge

• 2.2 A Suggested Solution

• 2.3 The Content

• 2.4 The Pilot

• 2.5 Sample Results

• 2.6 Next Steps


29

2.1 The Challenge

• Extract and organize the semantic concepts from about 5000 pages of semi-structured content in support of a comprehensive analysis to recommend the plan for the National Health Information Network (NHIN).
  – For example: Dr. Brailer, ONCHIT Technical Assistance Call December 6, 2004, “NHIN refers to a specific bundle of technologies, business frameworks, financing arrangements, legal contracting or other mechanisms, policy requirements, organizational issues and related things that allow for network interoperability. So NHIN is the middleware in the grand schema of these pieces.”


30

2.2 A Suggested Solution

• Besides manual human extraction individually and in the Work Group environment, there are machine-aided extraction, analysis, and visualization tools that could and should be brought to bear on this problem and that would lead to the building of an ontology.

• This approach was taken with the Federal Enterprise Architecture Reference Models to produce an ontology that has been released.
  – http://web-services.gov/fea-rmo.html


31

2.3 The Content

Content Category | NextPage (1) | FAST (2) | Content Analyst (3) | Comments
Background (pre RFI) | Done | Done | Done | Contains structured relationships
Organizations | About 50% | Done | Done | Complex concepts and relationships
Individuals | Done | Done | Done | Only about 5 simple categories!
Workgroups | Done | Done | Done | Needs simplification

(1) Indexing, categorization, and relationship linking.
(2) Indexing, keyword/concept extraction, and taxonomy.
(3) Same as (2).


32

2.4 The Pilot

• A Recommended Start to the NHIN Ontology:
  – The European Interoperability Framework:
    • Organisational
    • Technical, &
    • Semantic
  – How Leavitt sees interoperability: interoperability should be organically grown through the "messy, complex, difficult process called collaboration.”
    • http://www.fcw.com/article88110


33

2.4 The Pilot

• Tools:
  – Selection Criteria:
    • Selected for participation in the SWANS Conference, April 7-8, 2005, because of support for Semantic Technologies (RDF/OWL).
    • Willing to provide hardware, software, and advice for proof of concept.
    • Two or more vendors initially – more after SWANS Conference
  – Selection:
    • NextPage FolioViews and LivePublish (recently acquired by FAST Search & Transfer)
    • FAST Data Search and ProPublish
      – http://www.fastsearch.com
    • Content Analyst
      – http://www.contentanalyst.com


34

2.4 The Pilot

• Ontology Expertise:
  – Ontolog Forum:
    • Submitted Response to the RFI
      – Available on the Internet
    • Providing Ontology Engineering Advice
    • Suggests Brainstorming Session
  – Proposed New SICoP Ontology and Taxonomy Coordinating Work Group (ONTACG)


35

2.5 Sample Results

http://web-services.gov, See Best Practices


36

2.5 Sample Results

http://web-services.gov, See Best Practices


37

2.5 Sample Results

http://web-services.gov, See Best Practices


38

2.5 Sample Results

http://web-services.gov, See Best Practices


39

2.5 Sample Results

Folio Views Infobase of RFI’s


40

2.5 Sample Results

Content Analyst: Compute Taxonomy


41

2.5 Sample Results

Content Analyst: Run Queries


42

2.5 Sample Results

Content Analyst: Set Training Documents


43

2.5 Sample Results

FAST ProPublish: Production Manager


44

2.5 Sample Results

FAST ProPublish: Build Progress


45

2.5 Sample Results

FAST Data Search: Search View


46

2.5 Sample Results

FAST Data Search: Taxonomy Results Saved in Excel Spreadsheet


47

2.6 Next Steps

• NHIN to suggest a series of queries:
  – Results can be provided in Excel spreadsheets for further analysis and reuse

• Add content from those agencies interviewed by the FHA Interoperability Work Group recently:
  – VA, DoD, EPA, CDC, FDA, NIH-NCI/DHS/HIS

• See future demonstrations with the initial public domain databases for semantic searching and ontology building (see next slide):
  – SWANS Conference, April 7-8, 2005
  – SICoP Meeting at KM Conference, April 22, 2005
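
To make the "series of queries with spreadsheet output" idea concrete, here is a minimal sketch under stated assumptions. It uses plain case-insensitive substring matching rather than the semantic search of the pilot tools, writes CSV (which Excel can open) instead of a native workbook, and the document names, sample texts, and function names are all hypothetical.

```python
import csv
import io


def run_queries(documents, queries):
    """For each query term, list the documents whose text contains it (case-insensitive)."""
    rows = []
    for term in queries:
        hits = sorted(name for name, text in documents.items() if term.lower() in text.lower())
        rows.append({"query": term, "documents": len(hits), "matches": "; ".join(hits)})
    return rows


def to_csv(rows):
    """Serialize query results as CSV text that a spreadsheet program can open."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["query", "documents", "matches"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical stand-ins for RFI response texts.
documents = {
    "response_001.txt": "Privacy of patient records is our main concern.",
    "response_002.txt": "Interoperability depends on shared standards and privacy safeguards.",
    "response_003.txt": "Financing models for regional networks.",
}

results = run_queries(documents, ["privacy", "interoperability", "governance"])
print(to_csv(results))
```

A query with no matches still produces a row (with a count of zero), so the spreadsheet shows which suggested queries found nothing.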


48

2.6 Next Steps

Content Source | Content Type | Pilot Example
Web Site | Topics | Children’s Health, Mercury, Etc.
Web Site | Registries | System of Registries
Exchange Network | Nodes | Pacific Water Quality
E-Gov | E-Rulemaking | Samples
Data Mart | TBD | TBD
Indicators | Reports on the Indicators | EPA, Heinz, Etc.
GIS | Maps | Region 4 GeoBook
GIS | Metadata | Clearinghouse

Initial Public Domain Databases for Semantic Searching and Ontology Building


49

Appendices

• A. Ontology Engineering

• B. FAST Data Search and ProPublish

• C. Content Analyst


50

Appendix A: Ontology Engineering

• A.1 What Is An Ontology?
• A.2 Basic Requirements For an Ontology
• A.3 Ontology Examples
• A.4 Formal Taxonomies for the U.S. Government
• A.5 Medical Informatics Ontologies: Examples and Design Decisions
• A.6 GLIF in Protégé
• A.7 Why Develop an Ontology?
• A.8 Ontology-Development Process
• A.9 What Is “Ontology Engineering”?
• A.10 Ontology-Driven Information Systems


51

A.1 What Is An Ontology?

• An ontology is an explicit description of a domain:
  – concepts
  – properties and attributes of concepts
  – constraints on properties and attributes
  – individuals (often, but not always)

• An ontology defines
  – a common vocabulary
  – a shared understanding
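
The four ingredients listed above (concepts, properties, constraints, individuals) can be sketched as plain data classes. This is an illustration only, not an ontology language: the class names, the `violations` method, and the "Respondent" example are hypothetical, and the only constraint modeled is a value type.

```python
from dataclasses import dataclass, field


@dataclass
class Property:
    name: str
    value_type: type  # constraint: values of this property must have this type


@dataclass
class Concept:
    name: str
    properties: dict = field(default_factory=dict)  # property name -> Property


@dataclass
class Individual:
    name: str
    concept: Concept
    values: dict = field(default_factory=dict)  # property name -> value

    def violations(self):
        """Return a list of constraint violations for this individual."""
        problems = []
        for key, value in self.values.items():
            prop = self.concept.properties.get(key)
            if prop is None:
                problems.append(f"{key}: not defined for {self.concept.name}")
            elif not isinstance(value, prop.value_type):
                problems.append(f"{key}: expected {prop.value_type.__name__}")
        return problems


# Hypothetical concept with two properties and their type constraints.
respondent = Concept(
    "Respondent",
    {"category": Property("category", str), "pages": Property("pages", int)},
)

ok = Individual("response_001", respondent, {"category": "Vendor", "pages": 12})
bad = Individual("response_002", respondent, {"pages": "twelve", "color": "blue"})

print(ok.violations())
print(bad.violations())
```

The shared vocabulary is the set of concept and property names; the constraints are what let software, and not only people, detect when a statement does not fit the shared understanding.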


52

A.2 Basic Requirements For an Ontology

• 1. Finite controlled (extensible) vocabulary.

• 2. Unambiguous interpretation of classes and term relationships.

• 3. Strict hierarchical subclass relationships between classes.

• 4. Few others…

Source: Deborah McGuinness, Ontologies Come of Age, in the Semantic Web: Why, What, and How, MIT Press, 2002, page 6.
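
One way to read requirement 3 is that subclass links must be strict: if A is a subclass of B, then anything that is an A is also a B, all the way up the chain, and the chain must not loop back on itself. The sketch below checks that for the Transportation classes used in A.4. It assumes each class has at most one direct superclass, which is a simplification of ours, and the function names are hypothetical.

```python
def ancestors(cls, parent_of):
    """Return the superclasses of cls, nearest first; raise ValueError on a cycle."""
    chain = []
    seen = {cls}
    while cls in parent_of:
        cls = parent_of[cls]
        if cls in seen:
            raise ValueError(f"cycle in subclass hierarchy at {cls}")
        seen.add(cls)
        chain.append(cls)
    return chain


def is_a(cls, other, parent_of):
    """True if cls is the same class as other or a (transitive) subclass of it."""
    return cls == other or other in ancestors(cls, parent_of)


# Direct subclass links from the Transportation example (child -> parent).
parent_of = {
    "AirVehicle": "Transportation",
    "GroundVehicle": "Transportation",
    "Automobile": "GroundVehicle",
}

print(ancestors("Automobile", parent_of))
print(is_a("Automobile", "Transportation", parent_of))
print(is_a("Automobile", "AirVehicle", parent_of))
```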


53

A.3 Ontology Examples

• Taxonomies on the Web

– Yahoo! categories

• Catalogs for on-line shopping

– Amazon.com product catalog

• Domain-specific standard terminology

– SNOMED Clinical Terms – terminology for clinical medicine

– UNSPSC - terminology for products and services


54

A.4 Formal Taxonomies for the U.S. Government

OWL Listing:

<?xml version="1.0"?>
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:daml="http://www.daml.org/2001/03/daml+oil#"
    xmlns="http://www.owl-ontologies.com/unnamed.owl#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xml:base="http://www.owl-ontologies.com/unnamed.owl">
  <owl:Ontology rdf:about=""/>
  <owl:Class rdf:ID="Transportation"/>
  <owl:Class rdf:ID="AirVehicle">
    <rdfs:subClassOf rdf:resource="#Transportation"/>
  </owl:Class>
  <owl:Class rdf:about="#GroundVehicle">
    <rdfs:subClassOf rdf:resource="#Transportation"/>
  </owl:Class>
  <owl:Class rdf:about="#Automobile">
    <rdfs:subClassOf>
      <owl:Class rdf:ID="GroundVehicle"/>
    </rdfs:subClassOf>
  Etc.

Source: Formal Taxonomies for the U.S. Government, Michael Daconta, Metadata Program Manager, US Department of Homeland Security, XML.Com, http://www.xml.com/pub/a/2005/01/26/formtax.html

Transportation Class Hierarchy
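
The class hierarchy can be read back out of such a listing with a standard XML parser. The sketch below uses Python's `xml.etree.ElementTree`. Because the slide's listing is truncated at "Etc.", the embedded text is a shortened copy in which we have closed the open elements ourselves; that completion, and the helper names, are assumptions rather than part of the source article. It handles only the two subclass forms that appear in the excerpt, not RDF/XML in general.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"

# Shortened copy of the excerpt; the closing tags after Automobile are our assumption.
OWL_TEXT = """<?xml version="1.0"?>
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:ID="Transportation"/>
  <owl:Class rdf:ID="AirVehicle">
    <rdfs:subClassOf rdf:resource="#Transportation"/>
  </owl:Class>
  <owl:Class rdf:about="#GroundVehicle">
    <rdfs:subClassOf rdf:resource="#Transportation"/>
  </owl:Class>
  <owl:Class rdf:about="#Automobile">
    <rdfs:subClassOf>
      <owl:Class rdf:ID="GroundVehicle"/>
    </rdfs:subClassOf>
  </owl:Class>
</rdf:RDF>
"""


def class_name(element):
    """Return the local class name from rdf:ID, rdf:about, or rdf:resource."""
    for attr in ("ID", "about", "resource"):
        value = element.get(f"{{{RDF}}}{attr}")
        if value is not None:
            return value.lstrip("#")
    return None


def subclass_pairs(owl_text):
    """Return (subclass, superclass) pairs found in top-level owl:Class elements."""
    root = ET.fromstring(owl_text)
    pairs = []
    for cls in root.findall(f"{{{OWL}}}Class"):
        child = class_name(cls)
        for sub in cls.findall(f"{{{RDFS}}}subClassOf"):
            parent = class_name(sub)  # form 1: rdf:resource on subClassOf
            if parent is None:        # form 2: nested owl:Class element
                nested = sub.find(f"{{{OWL}}}Class")
                parent = class_name(nested) if nested is not None else None
            if child and parent:
                pairs.append((child, parent))
    return pairs


for child, parent in subclass_pairs(OWL_TEXT):
    print(f"{child} is a subclass of {parent}")
```

For real ontologies a dedicated RDF or OWL toolkit would be the more robust choice, since the same graph can be serialized in many equivalent ways.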


55

A.5 Medical Informatics Ontologies: Examples and Design Decisions

• Foundational Model of Anatomy (FMA):
  – Developed at University of Washington as part of the Digital Anatomist project.
  – Contains: ~70,000 distinct concepts, ~110,000 terms, and 140 relations

• Gene Ontology (GO):
  – A controlled vocabulary for describing genes and gene products with three organizing components: Molecular function, Biological process, and Cellular component.

• Health Level 7 (HL7) Data Types and Top-Level RIM Classes:
  – HL7 data types as Protégé classes

• Guideline Interchange Format (GLIF) (See next slide):
  – A format for sharing clinical guidelines independent of platforms and systems:
    • Designed to support multiple vocabularies and medical knowledge bases.
    • Designed to work with different patient information models.

A.6 GLIF in Protégé

A.7 Why Develop an Ontology?

• To share common understanding of the structure of information
– among people
– among software agents
• To enable reuse of domain knowledge
– to avoid “re-inventing the wheel”
– to introduce standards to allow interoperability

A.8 Ontology-Development Process

• In this tutorial: determine scope → consider reuse → enumerate terms → define classes → define properties → define constraints → create instances

In reality, an iterative process: determine scope → consider reuse → enumerate terms → define classes → consider reuse → enumerate terms → define classes → define properties → create instances → define classes → define properties → define constraints → create instances → define classes → consider reuse → define properties → define constraints → create instances

A.9 What Is “Ontology Engineering”?

• Ontology Engineering: Defining terms in the domain and relations among them

– Defining concepts in the domain (classes)

– Arranging the concepts in a hierarchy (subclass-superclass hierarchy)

– Defining which attributes and properties (slots) classes can have and constraints on their values

– Defining individuals and filling in slot values
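
The four activities above can be sketched in a few lines of Python. This is an illustrative sketch only, not how Protégé or any ontology editor is implemented: the names OntClass, Slot, and create_individual are hypothetical, and the vehicle classes are borrowed from the transportation example in A.4.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class Slot:
    """A property a class can have, with simple constraints on its value."""
    name: str
    value_type: type
    minimum: Optional[float] = None

    def check(self, value: Any) -> None:
        if not isinstance(value, self.value_type):
            raise TypeError(f"{self.name} must be {self.value_type.__name__}")
        if self.minimum is not None and value < self.minimum:
            raise ValueError(f"{self.name} must be >= {self.minimum}")


@dataclass
class OntClass:
    """A concept, optionally placed under a superclass."""
    name: str
    parent: Optional["OntClass"] = None
    slots: Dict[str, Slot] = field(default_factory=dict)

    def all_slots(self) -> Dict[str, Slot]:
        """Slots are inherited down the subclass-superclass hierarchy."""
        inherited = self.parent.all_slots() if self.parent else {}
        return {**inherited, **self.slots}

    def is_a(self, other: "OntClass") -> bool:
        """True if this class is `other` or one of its descendants."""
        node: Optional[OntClass] = self
        while node is not None:
            if node is other:
                return True
            node = node.parent
        return False


def create_individual(cls: OntClass, **values: Any) -> Dict[str, Any]:
    """Create an instance by filling in slot values, enforcing constraints."""
    slots = cls.all_slots()
    for key, value in values.items():
        if key not in slots:
            raise KeyError(f"{cls.name} has no slot {key!r}")
        slots[key].check(value)
    return {"type": cls.name, **values}


# 1) classes, 2) hierarchy, 3) slots with constraints, 4) an individual
transportation = OntClass("Transportation", slots={"wheels": Slot("wheels", int, minimum=0)})
ground_vehicle = OntClass("GroundVehicle", parent=transportation)
automobile = OntClass("Automobile", parent=ground_vehicle, slots={"make": Slot("make", str)})
my_car = create_individual(automobile, wheels=4, make="ExampleMake")
print(my_car)
```

Here Automobile inherits the wheels slot from Transportation, and a negative wheel count is rejected by the slot's constraint.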

A.10 Ontology-Driven Information Systems

• Methodology Side – the adoption of a highly interdisciplinary approach:
– Analyze the structure at a high level of generality.
– Formulate a clear and rigorous vocabulary.
• Architectural Side – the central role in the main components of an information system:
– Information resources.
– User interfaces.
– Application programs.

See for example: Nicola Guarino, Formal Ontology and Information Systems, Proceedings of FOIS ’98, Trento, Italy, 6-8 June 1998.

Appendix B: FAST Data Search

• B.1 Gartner Magic Quadrant for Enterprise Search, 2004

• B.2 FAST Data Search:
– Categorization and Taxonomy Support
– Integration

• B.3 FAST ProPublish System Overview:
– Gather Content
– Process Content
– Deliver Content

B.1 Gartner Magic Quadrant for Enterprise Search, 2004

Source: Gartner Research ID Number: M-22-7894, Whit Andrews, 17 May 2004.

B.1 Gartner Analysis: Leaders

• Fast Search & Transfer (FAST) now is counted in the Leaders quadrant, moving from the Visionaries quadrant. The vendor has experienced explosive growth, providing better-than-average means and an expanding list of approaches for determining relevancy. Its architecture is superior among search vendors, and sales are strong. (Sales of enterprise search technology were $42 million in 2003, up from $36 million in 2002.) Its acquisition of the remainder of AltaVista's business has had no real impact on operations.

• Critical questions include whether FAST will:
– 1) remain a specialist in search technologies;
– 2) pursue "search-derivative applications" (FAST's term for the general application category founded on search platforms, including customer relationship management (CRM) knowledge base support tools and scientific research managers); or
– 3) focus on original equipment manufacturer arrangements or on a broader suite of applications, such as those included in a smart enterprise suite. Search vendors typically follow an arc that leads to their acquiring a company, to failure or to a position as an enduring leader. FAST has the opportunity to pursue the last path.

• Note added by Brand Niemann: In December 2004 FAST acquired NextPage, which provides electronic publishing software to 6 of the 9 leading electronic publishers in the world. I have used NextPage in the pilots to date.

B.2 FAST Data Search: Categorization and Taxonomy Support

B.2 FAST Data Search: Integration

B.3 FAST ProPublish System Overview

Figure labels: Gather Content; Process Content; Deliver Content.

B.3 FAST ProPublish System Overview

• Searches in the online FAST ProPublish system are powered by FAST's proven search technology. Search results are displayed in a results list, and additional navigation interfaces such as keywords, dynamic drill-down lists, metadata structures, and hierarchy are also provided. When documents are retrieved, they are pulled from the content repository. Search hits are highlighted in HTML and XML documents.

• FAST ProPublish is designed to be a distributed application. Nearly every component may be run on a separate machine (or multiple machines) for extreme scalability and reliability. However, this same flexibility also allows all of the components to be run on a single server.

• FAST ProPublish provides the following services:
– Search and query.
– Data and text mining and analysis.
– Exploration and static reporting.

B.3 FAST ProPublish System Overview

• Gather Content:
– The Production Manager is the tool you use to create a collection. Also, through the Production Manager graphical user interface, you can establish a library. A library consists of a collection or group of related collections and enables you to structure content. That is, you can define a library hierarchically with folders, sub-folders, and collection nodes the way you want the content to appear on your site.
– Production Manager can build libraries from existing collections, or from collections that you define and build within the Production Manager interface from various sources of content.

B.3 FAST ProPublish System Overview

• Process Content:
– A collection is, as the name implies, a collection of content/documents and is fully indexed, structured, and searchable. Documents within a collection reside in their native formats. Collections house three "chunks" of information:
  • The table of contents (TOC)
  • An index of the content
  • A copy of the content
– Because collections contain this information, they are self-contained and portable.
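
To make the "three chunks" idea concrete, here is a minimal Python sketch of a self-contained collection that carries its own table of contents, index, and copy of the content. It is an assumption-laden illustration, not the FAST ProPublish data model: the Collection class, its methods, and the sample documents are all hypothetical, and the <b> tags merely stand in for search-hit highlighting.

```python
import re
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Collection:
    """Hypothetical collection: TOC, index, and a copy of the content."""
    name: str
    toc: List[str] = field(default_factory=list)
    index: Dict[str, Set[str]] = field(default_factory=lambda: defaultdict(set))
    content: Dict[str, str] = field(default_factory=dict)

    def add_document(self, title: str, text: str) -> None:
        """Store the text, list it in the TOC, and index every word."""
        self.toc.append(title)
        self.content[title] = text
        for word in re.findall(r"\w+", text.lower()):
            self.index[word].add(title)

    def search(self, term: str) -> List[str]:
        """Return titles containing the term, in table-of-contents order."""
        hits = self.index.get(term.lower(), set())
        return [title for title in self.toc if title in hits]

    def highlight(self, title: str, term: str) -> str:
        """Wrap each whole-word hit in <b> tags."""
        pattern = re.compile(rf"\b({re.escape(term)})\b", re.IGNORECASE)
        return pattern.sub(r"<b>\1</b>", self.content[title])


# Made-up sample documents for illustration only.
rfi = Collection("Sample responses")
rfi.add_document("Response 1", "Interoperability requires shared standards.")
rfi.add_document("Response 2", "Privacy and security are prerequisites.")
print(rfi.search("standards"))
print(rfi.highlight("Response 1", "standards"))
```

Because the object holds all three parts, it can be serialized and moved as a unit, which is the sense in which the slide calls collections portable.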

B.3 FAST ProPublish System Overview

• Process Content:

– Each node in the content tree is a library, folder, sub-folder, or collection.

– Folder nodes can contain other content nodes (such as sub-folders and collections).

– You can organize these nodes (folder and collection) within this pane according to your content and business needs to create a hierarchy of content for the library.
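
The content tree described above maps naturally onto a small recursive data structure. The following Python sketch is illustrative only: the Node class and the sample names are hypothetical, and treating collection nodes as leaves is an assumption drawn from the statement that folder nodes contain other content nodes.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple

# Node kinds that may contain other nodes (assumption: collections are leaves).
CONTAINER_KINDS = {"library", "folder", "sub-folder"}


@dataclass
class Node:
    name: str
    kind: str  # "library", "folder", "sub-folder", or "collection"
    children: List["Node"] = field(default_factory=list)

    def add(self, child: "Node") -> "Node":
        """Attach a child node and return it so calls can be chained."""
        if self.kind not in CONTAINER_KINDS:
            raise ValueError(f"{self.kind} node {self.name!r} cannot contain other nodes")
        self.children.append(child)
        return child

    def walk(self, depth: int = 0) -> Iterator[Tuple[int, "Node"]]:
        """Yield (depth, node) pairs in display order, parents first."""
        yield depth, self
        for child in self.children:
            yield from child.walk(depth + 1)


# Made-up sample hierarchy for illustration only.
library = Node("Health IT Library", "library")
folder = library.add(Node("RFI Responses", "folder"))
batch = folder.add(Node("Responses 1-100", "collection"))

for depth, node in library.walk():
    print("  " * depth + f"{node.name} ({node.kind})")
```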

B.3 FAST ProPublish System Overview

Process Content: Content Tab Icons and Descriptions

• Library: The library node contains all folder and content collection nodes for a given library.
• Folder and Sub-folder: Folder and sub-folder nodes enable you to create structure within the library and help you organize content.
• Collection: Collection nodes represent collections that Production Manager builds and updates.

B.3 FAST ProPublish System Overview

• Deliver Content:
– The user interface is composed of individual components built using Velocity templates and the Struts framework. Some of the components are:
  • Search components – search forms (simple, advanced, and custom), search results page (configurable), parametric search.
  • Navigation components – hierarchical table of contents, browse-by-category, dynamic drill down for search refinement, breadcrumb trails.
  • Document display components – document retrieval, search hit highlighting, next / previous document, next / previous hit document.

B.3 FAST ProPublish System Overview

Deliver Content: Default User Interface

B.3 FAST ProPublish System Overview

Deliver Content: Advanced Search Page

Appendix C: Content Analyst

• C.1 Definitions
• C.2 Conceptual Mapping
• C.3 Document Proximity Conceptual Similarity
• C.4 Term Proximity Conceptual Similarity
• C.5 No Auxiliary Structures Required
• C.6 Retrieval Using Conceptual Comparison
• C.7 Terminology Variant Clustering
• C.8 Conceptual Generalization

Appendix C: Content Analyst (continued)

• C.9 Deep Conceptual Generalization
• C.10 Cross-lingual Operations
• C.11 Cross-lingual Capabilities
• C.12 Automated Information Organization
• C.13 Category Creation by Example
• C.14 Automatic Categorization
• C.15 Categorizing Items of Interest
• C.16 Automated Taxonomy Generation

Appendix C: Content Analyst (continued)

• C.17 Instant Context Display

• C.18 Alias Identification

• C.19 Automated Thematic Decomposition

• C.20 Conceptual Interlingua

• C.21 Product Status

• C.22 Performance

• C.23 For More Information

C.1 Definitions

• Content Analyst:
– …is a Machine Learning Technique…
– …that allows Conceptual Comparison of Text Objects…
– …based on the Technique of Latent Semantic Indexing.
• Latent Semantic Indexing is a patented machine learning technique that enables technology to identify, represent, and compare concepts that exist within a collection of documents or data.
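
For readers unfamiliar with Latent Semantic Indexing, the sketch below shows the general textbook idea in pure Python: build a term-document matrix, keep only its strongest latent dimensions, and compare terms in that reduced space. It is not Content Analyst's implementation. The toy corpus, the function names, and the use of orthogonal iteration in place of a full singular value decomposition are all assumptions made to keep the example short and dependency-free.

```python
import math
import random

# Hypothetical toy corpus: "car" and "automobile" never share a document,
# but both co-occur with "engine".
DOCS = ["car engine", "automobile engine", "farm crop", "farm wheat"]


def term_document_matrix(docs):
    """Return (sorted vocabulary, rows of raw term counts per document)."""
    vocab = sorted({word for doc in docs for word in doc.split()})
    return vocab, [[doc.split().count(term) for doc in docs] for term in vocab]


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def cosine(u, v):
    norm = math.sqrt(dot(u, u) * dot(v, v))
    return dot(u, v) / norm if norm else 0.0


def orthonormalize(vectors):
    """Modified Gram-Schmidt over a list of vectors."""
    basis = []
    for vec in vectors:
        for q in basis:
            scale = dot(vec, q)
            vec = [x - scale * y for x, y in zip(vec, q)]
        length = math.sqrt(dot(vec, vec))
        basis.append([x / length for x in vec])
    return basis


def latent_basis(matrix, k, iterations=100, seed=0):
    """Approximate the top-k right singular vectors by orthogonal iteration."""
    docs_as_rows = [list(col) for col in zip(*matrix)]
    gram = [[dot(a, b) for b in docs_as_rows] for a in docs_as_rows]  # A^T A
    rng = random.Random(seed)
    vectors = [[rng.gauss(0, 1) for _ in gram] for _ in range(k)]
    for _ in range(iterations):
        vectors = orthonormalize(vectors)
        vectors = [[dot(row, vec) for row in gram] for vec in vectors]
    return orthonormalize(vectors)


def project_terms(matrix, basis):
    """Coordinates of each term (matrix row) in the k-dimensional space."""
    return [[dot(row, q) for q in basis] for row in matrix]


vocab, A = term_document_matrix(DOCS)
raw = dict(zip(vocab, A))
latent = dict(zip(vocab, project_terms(A, latent_basis(A, k=2))))

print(round(cosine(raw["car"], raw["automobile"]), 3))        # no shared documents
print(round(cosine(latent["car"], latent["automobile"]), 3))  # close in latent space
print(round(cosine(latent["car"], latent["farm"]), 3))        # unrelated topic
```

In the raw counts, "car" and "automobile" look unrelated because they never appear together. After reduction to two dimensions they coincide, because both are tied to "engine". This is the effect the "Term Proximity" slide (C.4) depicts.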

C.2 Conceptual Mapping

Figure labels: Documents; Biological Weapons; Transportation; Agriculture.

C.3 Document Proximity Conceptual Similarity

Figure labels: documents containing "missile", "fuel", "rocket", and "propellant"; Content Analyst Representation Space.

C.4 Term Proximity Conceptual Similarity

Figure labels: Car; Automobile; Content Analyst Representation Space.

C.5 No Auxiliary Structures Required

Figure labels: Taxonomies; Grammars; Thesauri; Ontologies.

C.6 Retrieval Using Conceptual Comparison

Figure labels: Query; Documents in Relevance Order; Proximity; Conceptual Similarity; Natural Ranking.

C.7 Terminology Variant Clustering

Figure labels: Osama bin Laden; Usama bin Laden; Osama Binladen; Osama BinLadin; Usama Binladen; Osama bin Ladin; Usama bin Ladin; Usama Binladin.

C.8 Conceptual Generalization

Figure labels: User’s Terminology: "Bomb"; Author’s Terminology: "…devices that spread shrapnel…"; CA Space.

C.9 Deep Conceptual Generalization

Figure labels: document text "…Methods of armed struggle not accepted internationally…"; War Crimes.

C.10 Cross-lingual Operations

Figure labels: English Query; Documents in Multiple Languages (Farsi, Arabic, English); Results; Retrieved Documents in Correct Relevance Order.

C.11 Cross-lingual Capabilities

• Arabic
• Chinese
• English
• Farsi
• French
• Korean
• Russian
• Spanish

• Pashtu
• Urdu
• Italian
• German
• Portuguese
• Dutch
• Japanese

Column headings: Current; Near-term; Future.

C.12 Automated Information Organization

• Sorting into Predetermined Categories

• Determining the Natural Topical Breakdown of Information

C.13 Category Creation by Example

Figure labels: example document text "…anthrax…smallpox…"; "Documents like this correspond to the category Bioterrorism…"; CA Representation Space.

C.14 Automatic Categorization

Figure labels: Newly Acquired Document; Exemplar Document; "Document will be assigned to this category"; CA Space.
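
Categorization by exemplar, as pictured in C.13 and C.14, can be sketched as a nearest-exemplar rule: a new document receives the category of the exemplar it lies closest to in the representation space. The Python below is a minimal sketch under stated assumptions. The two-dimensional coordinates are invented for illustration, cosine similarity is assumed as the closeness measure, and the function names are hypothetical; the slides do not specify how Content Analyst measures proximity.

```python
import math
from typing import Dict, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def categorize(document: Sequence[float], exemplars: Dict[str, Sequence[float]]) -> str:
    """Return the category whose exemplar vector is most similar to the document."""
    return max(exemplars, key=lambda name: cosine(document, exemplars[name]))


# Hypothetical 2-D coordinates in the representation space.
EXEMPLARS = {"Bioterrorism": [0.9, 0.1], "Agriculture": [0.1, 0.9]}
new_document = [0.8, 0.3]
print(categorize(new_document, EXEMPLARS))  # Bioterrorism
```

A production system would typically also apply a similarity threshold so that documents far from every exemplar are left uncategorized.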

C.15 Categorizing Items of Interest

Figure labels: Sept. Report; Newly Acquired Document; Precursors; Hamas; Hamas Exemplar Document.

C.16 Automated Taxonomy Generation

Figure labels: New Content; Taxonomy.

C.17 Instant Context Display

Last February Qatada and seven other men, said to be members of the GSPC's British cell, were arrested in London after the discovery of plans to bomb or use GB against an unspecified target in Strasbourg. Charges against Qatada were not pursued. During the investigation, codenamed Operation Odin, Special Branch officers raided Qatada's home in Acton, west London.

Figure labels: gb; sarin; organophosphorous; poisonous; vapors; cholinesterase; resorptive; bezhenar.

C.18 Alias Identification

Five men, three of whom identified themselves as Algerian, were arrested Thursday by federal officials wanting to question them about their possible links to Ahmed Ressam, an Algerian arrested in Washington state on explosive smuggling charges.

Figure labels: ressam; ressam’s; ahmed; benni; charkaoui; zubeir; abdelrazik; zoubeida.

C.19 Automated Thematic Decomposition

The hardware, software, and bandwidth currently installed are adequate to support this level of downloading activity. Three people currently are engaged in developing a comprehensive list of URLs to be monitored. This is a labor-intensive task, as existing Internet indexes of online newspapers are very incomplete. Final decisions have not yet been made as to the eventual level of caching that will be done, or the total number of users to be supported. One of the most important aspects of the existing implementation is a web crawler that we have developed and refined over the past five years that is optimized for this application. This crawler can deal with the many idiosyncrasies of this type of download activity: primitive communications in some countries, bizarre naming conventions, inconsistent and partial postings, and frequent changes in web page structure. The current implementation of this crawler reflects five years of lessons learned in carrying out newspaper downloads from the Internet. One of the functions to be carried out with the downloaded data is entity and relationship extraction. In support of this effort, SAIC personnel have conducted a comparison of current entity and relationship software packages. The test involved processing of actual downloaded material. Of the half dozen packages tested, the product from Attensity was, by far, the most complete and accurate. This package is being procured for use in the download processing. It should be noted that even the best of the entity and relationship packages still miss many entities and relationships of interest and still generate an undesirably high number of false relations. We have a current task to examine the ways in which Content Analyst and Attensity can be used together to provide significantly improved overall entity and relationship extraction capabilities. 
Although not addressed in the RFI, one topic that we have paid considerable attention to is processing of images of newspapers using optical character recognition (OCR). At present, approximately 13% of all foreign newspapers posted to the web consist of images of pages, as opposed to character-encoded representations. This includes some important newspapers; for example, most of the Urdu material on the web is only available as images. In order to automatically filter these articles, and to make them available for retrieval, an OCR process must be carried out. At various times over the past five years we have implemented such capabilities for Arabic, Chinese, Farsi, and Russian materials. OCR of newspaper articles is a challenging but not impossible task. The biggest problem is caused by the low resolution of images posted to the web

Figure labels: Topic #1; Topic #2; Topic #3.

C.20 Conceptual Interlingua

Figure labels: Arbitrary Documents; Biological Weapons; Transportation; Agriculture.

C.21 Product Status

• 6 Years Development

• 3 Years Operational Experience

• 24X7 Operations

• Multi-million Document Databases

• Conforms to Modern Standards:
– J2EE
– UNICODE
– XML

C.22 Performance

• Can Fully Index > 1M Documents in 14 Hours on a Single PC

• Can Categorize > 1 Million Documents per Day on a Single PC

• Can Distribute Index Creation and Retrieval Operations across Multiple PCs
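
For scale, the first two figures can be converted into per-second rates. The short Python calculation below uses exactly 1,000,000 documents as the lower bound implied by "> 1M"; the variable names are illustrative.

```python
SECONDS_PER_HOUR = 3600
DOCUMENTS = 1_000_000  # lower bound implied by "> 1M"

# Indexing: more than 1M documents in 14 hours on a single PC.
index_rate = DOCUMENTS / (14 * SECONDS_PER_HOUR)

# Categorization: more than 1M documents per day (24 hours) on a single PC.
categorize_rate = DOCUMENTS / (24 * SECONDS_PER_HOUR)

print(f"indexing: at least {index_rate:.1f} documents/second")             # 19.8
print(f"categorization: at least {categorize_rate:.1f} documents/second")  # 11.6
```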

C.23 For More Information

• Roger Bradford, 703-391-8700 x110, [email protected]