LA-UR-09-02971
Approved for public release; distribution is unlimited.

Title: Using Text Analysis to Reduce Information Overload in Pandemic Influenza Planning

Author(s): Linn Marks Collins, Jorge H. Roman, James E. Powell, Mark L. B. Martinez, Ketan K. Mane, Xiang Yao, A. Shelly Spearing, Miriam E. Blake (Los Alamos National Laboratory); Geoffrey Hoare, Rhonda White (Florida Department of Health)

Intended for: 6th International Conference on Information Systems for Crisis Response and Management, Special Session on Solutions for Information Overload, May 10-13, 2009, Göteborg, Sweden

Los Alamos National Laboratory, an affirmative action/equal opportunity employer, is operated by the Los Alamos National Security, LLC for the National Nuclear Security Administration of the U.S. Department of Energy under contract DE-AC52-06NA25396. By acceptance of this article, the publisher recognizes that the U.S. Government retains a nonexclusive, royalty-free license to publish or reproduce the published form of this contribution, or to allow others to do so, for U.S. Government purposes. Los Alamos National Laboratory requests that the publisher identify this article as work performed under the auspices of the U.S. Department of Energy. Los Alamos National Laboratory strongly supports academic freedom and a researcher’s right to publish; as an institution, however, the Laboratory does not endorse the viewpoint of a publication or guarantee its technical correctness.

Form 836 (7/06)

Using Text Analysis to Reduce Information Overload in Pandemic Influenza Planning


DESCRIPTION

The proliferation of plans can result in debilitating information overload in public health and medical emergencies. In the case of pandemic influenza, the US Department of Health and Human Services (HHS) and its Centers for Disease Control and Prevention (CDC) have pan flu plans for coordinating the 50 states, and each of the 50 states has its own pan flu plan. Plans need to be analyzed, compared, and revised so that they are in alignment with one another. Human analysis of plans is time-consuming and difficult, so text analysis software tools are needed that can help humans (a) compare plans to find gaps or discrepancies and (b) locate relevant sections of plans and display links to them. This research-in-progress describes two text analysis tools being developed at the Los Alamos National Laboratory as part of E-SOS (Emergency Situation Overview and Synthesis): the Theme Awareness Tool (THEMAT) and the Content Awareness Tool (CAT). Both tools were used to analyze pan flu plans from the White House, the US Department of Health and Human Services, the Centers for Disease Control and Prevention, and the 50 states.



Proceedings of the 6th International ISCRAM Conference – Gothenburg, Sweden, May 2009 J. Landgren and S. Jul, eds. 

Using Text Analysis to Reduce Information Overload in Pandemic Influenza Planning

James E. Powell, Presenting Author
Los Alamos National Laboratory

Linn Marks Collins, Jorge H. Roman, Mark L. B. Martinez, Ketan K. Mane, Xiang Yao, A. Shelly Spearing, Miriam E. Blake
Los Alamos National Laboratory

Geoffrey Hoare, Rhonda White
Florida Department of Health


Problem – Information Overload

The proliferation of plans can result in debilitating information overload in public health and medical emergencies

Pandemic influenza (pan flu) example
• The US Department of Health and Human Services (HHS) and its Centers for Disease Control and Prevention (CDC) have pan flu plans intended to coordinate the 50 states’ responses
• Each of the 50 states has its own pan flu plan
• These plans and related implementing materials are often hundreds of pages long and get updated frequently
• Plans need to be analyzed, compared, and revised so that they are in alignment with one another
• Human analysis of plans is time-consuming and difficult


Solution – Text Analysis and Document Linking

Text analysis software tools are needed that can help humans

1. Compare plans to find gaps or discrepancies by:
• Extracting key concepts or themes from each plan
• Identifying unique and common themes
• Displaying the themes in a table and in a network visualization to facilitate comparison

2. Locate relevant sections of plans by:
• Conducting a federated search of indexed plans
• Displaying links to relevant plans
• Executing both tasks while users are writing sections of a plan, based on what they are writing


Research in Progress at the Los Alamos National Laboratory, Los Alamos, NM, US

Two text analysis tools are being developed as part of E-SOS: Emergency Situation Overview and Synthesis
• Project began in 2007
• Builds on several years of prior work by team members

E-SOS employs a number of tools
• Collaborative workspaces
• Semantic web technologies
• Digital library technologies
• Awareness tools

Two of the awareness tools are being used to analyze pan flu plans
• THEMAT (Theme Awareness Tool)
• CAT (Content Awareness Tool)


E-SOS – System Goals

Collaborative workspaces where users can report and discuss information
“Awareness tools” which display information that’s relevant to what users are currently reporting and discussing
Technologies that synthesize information from heterogeneous sources


Texts – Pan Flu Plans

The White House Implementation Plan for the National Strategy for Pandemic Influenza
US Health and Human Services Federal Guidance to Assist States in Improving State-Level Pandemic Influenza Operating Plans
The Centers for Disease Control and Prevention (CDC) Influenza Pandemic Operation Plan (OPLAN)
Pan flu plans for the states
• In the initial study (December 2008 – February 2009) the Florida Department of Health team provided the LANL team with the above pan flu plans as well as pan flu plans for six states in the US
• In the subsequent study (March 2009 – present) the LANL team expanded the project to include pan flu plans for all 50 states in the US, in keeping with its mission as a national laboratory


THEMAT – Methodology

Extract themes, called knowledge signatures (kSigs), from each document
Create sets of kSigs (taxonomies) for each document or set of documents
Compare taxonomies to identify unique and common themes
Compute the network analytic relationships among themes
Generate theme network (tNet) visualizations for each document or set of documents
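
The extract-and-compare steps above can be sketched in Python under strong simplifying assumptions. The slides do not say how THEMAT derives kSigs, so this sketch substitutes plain word frequency for theme extraction; the stop list, the function names, and the `top_n` parameter are all hypothetical and are not part of THEMAT.

```python
import re
from collections import Counter

# Hypothetical stop list for illustration only; a real tool would need a fuller one.
STOP_WORDS = {"the", "and", "for", "with", "will", "its", "are", "that", "this"}

def extract_themes(text, top_n=5):
    """Return the top_n most frequent content words as a stand-in for kSigs."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS and len(w) > 2)
    return {word for word, _ in counts.most_common(top_n)}

def compare_themes(themes_by_plan):
    """Split themes into those common to every plan and those unique to one plan."""
    common = set.intersection(*themes_by_plan.values())
    unique = {}
    for name, themes in themes_by_plan.items():
        # Everything that appears in any plan other than this one.
        others = set().union(*(t for n, t in themes_by_plan.items() if n != name))
        unique[name] = themes - others
    return common, unique
```

Given one theme set per plan, `compare_themes` returns the material for a two-column display: the shared themes in one column and each plan's unique themes in the other.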


THEMAT – Results

A set of knowledge signatures (kSigs) was generated for each document
The common themes in pan flu plans were identified
The unique themes in pan flu plans were identified
The common and unique themes were displayed side-by-side in columns to facilitate comparison
The kSigs were highlighted in all documents
Theme networks (tNets) were created for each plan to display key concepts and their relationships

Knowledge signatures (kSigs) for pan flu plans from the US White House, Health and Human Services, Centers for Disease Control, and six states showing similarities and differences: for example, “authorities” and “countries” are important in the federal plans but not the state plans

Knowledge signatures (kSigs) for pan flu plans from the US White House, Health and Human Services, and Centers for Disease Control showing similarities and differences: for example, “global influenza preparedness” is important in the WH and HHS plans but not in the CDC plan

Interactive interface for finding the specific files where a knowledge signature (kSig) is located: in this case, “global influenza preparedness” can be found in Appendix C of the Health and Human Services pan flu plan and in Chapter 5 of the US White House pan flu plan

Interactive interface for locating highlighted knowledge signatures (kSigs) within a document: in this case, “global influenza preparedness” in the US White House pan flu plan

Navigation column with links to all of the pages and kSigs in a document

Interactive interface visualizing the theme network (tNet) for the US Centers for Disease Control pan flu plan, showing which themes are related to each other and the importance of each theme (most important to least important = red, blue, green, yellow)
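
One way to picture how a theme network of this kind could be built is sketched below. The slides do not state how THEMAT relates themes or scores their importance, so the sketch assumes two stand-ins: themes are linked when they occur in the same section of a plan, and importance is the weighted degree of a theme in that network. The function names and the equal-tier color split are hypothetical; only the four colors and their order come from the caption.

```python
from collections import Counter
from itertools import combinations

# Color order from the caption: most important to least important.
TIER_COLORS = ["red", "blue", "green", "yellow"]

def build_theme_network(sections, themes):
    """Link two themes each time both occur (as substrings) in the same section."""
    edges = Counter()
    for section in sections:
        text = section.lower()
        present = sorted(t for t in themes if t in text)
        for pair in combinations(present, 2):
            edges[pair] += 1
    return edges

def rank_themes(edges):
    """Rank themes by weighted degree: the summed weights of their edges."""
    degree = Counter()
    for (a, b), weight in edges.items():
        degree[a] += weight
        degree[b] += weight
    return [theme for theme, _ in degree.most_common()]

def color_by_rank(ranked):
    """Split the ranking into four roughly equal tiers and color each tier."""
    n = len(ranked)
    return {t: TIER_COLORS[min(i * 4 // n, 3)] for i, t in enumerate(ranked)}
```

Themes that never co-occur with another theme do not appear in the ranking at all, which is a limitation of using degree as the importance score.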

Visualization of the relative number of themes in the plans and the degree of similarity among and between them: in this case, the CDC plan and the California plan are least similar to each other and appear on the X-axis; the Arizona plan is a few degrees closer to the California plan; the White House plan is a few degrees closer to the CDC plan; and the White House plan contains more of the core themes than the other plans do
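
The slides do not say which similarity measure underlies this visualization. As an illustration only, the sketch below uses Jaccard similarity between theme sets, a common choice for comparing sets, to find the least similar pair of plans. The function names are hypothetical, and any plan names and theme sets passed in are the caller's own data, not results from the study.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard similarity: shared themes divided by all themes in either set."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def pairwise_similarity(themes_by_plan):
    """Compute the similarity for every unordered pair of plans."""
    return {
        (p, q): jaccard(themes_by_plan[p], themes_by_plan[q])
        for p, q in combinations(sorted(themes_by_plan), 2)
    }

def least_similar(themes_by_plan):
    """Return the pair of plans with the lowest similarity score."""
    sims = pairwise_similarity(themes_by_plan)
    return min(sims, key=sims.get)
```

In a layout like the one described, the least similar pair would anchor the two ends of the axis and the remaining plans would be placed by their similarity to each end.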


THEMAT Analysis of Pan Flu Plans – Preliminary Conclusions

Displays of common and unique themes reveal similarities and differences in the plans
• Some of the differences reflect differences in the authorities and responsibilities of the government agencies that created the plans
• Some of the differences reflect inconsistencies in terminology, which is a potential human factors problem

Displays of knowledge signatures (kSigs) highlighted in the text reflect the original document structure
• A common structure for all of the documents and plans would make it easier for users to read through the highlighted kSigs and compare plans

Theme networks (tNets) provide a more compact view of themes than lists
• Visualizations are not as familiar as lists, so some users may find them difficult to understand


CAT – Methodology

Step 1: Create a repository of content at LANL
Step 2: Configure the CAT to include this repository as one of the targets of the federated search
Step 3: See what links to relevant content the CAT retrieves as users compose text about pan flu plans
Step 4: Revise the targets as needed

CAT Components


CAT – Results

A repository of content was created at LANL
The CAT was configured to include this repository as one of the targets of the federated search
The number of links returned depends on the number of targets
• One link per target is returned for each segment of parsed text (aka REST query)
• In order to increase the number of links, the number of targets has to be increased

The repository of pan flu documents was subsequently divided into a number of smaller targets in order to increase the number of links
This allows for finer-grained access
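
The relationship between targets and links described above can be illustrated with a small in-memory sketch. This is not the CAT's implementation: the real tool issues REST queries against search targets, whereas here each target is a hypothetical dictionary mapping a made-up link name to document text, and relevance is crude word overlap. What the sketch preserves is the rule stated on the slide, at most one link per target for each segment of text.

```python
def split_into_segments(text):
    """Split composed text into sentence-like segments; each becomes one query."""
    return [s.strip() for s in text.split(".") if s.strip()]

def best_link(segment, target):
    """Return the link in this target sharing the most words with the segment, or None."""
    query = set(segment.lower().split())
    best, best_score = None, 0
    for link, body in target.items():
        score = len(query & set(body.lower().split()))
        if score > best_score:
            best, best_score = link, score
    return best

def federated_search(text, targets):
    """Query every target with every segment, keeping at most one link per target."""
    results = []
    for segment in split_into_segments(text):
        for name, target in targets.items():
            link = best_link(segment, target)
            if link is not None:
                results.append((segment, name, link))
    return results
```

With a single repository as one target, a one-sentence query yields at most one link. Dividing the same documents across two targets lets the same query return up to two, which is the effect the slide describes.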


CAT – Screen Shot

Results of a federated search of (1) one local repository of pandemic flu plans and (2) four websites with pandemic flu content


CAT Linking of Pan Flu Plans – Preliminary Conclusions

The resulting output, showing links to relevant texts, reflects the target structure
• Creating a collection of smaller, finer-grained targets optimizes the federated search capabilities of the CAT

Under these conditions, the CAT facilitates:
• Locating content that is related to the topic the user is writing about
• Following links to that content
• Creating a list of references for the topic


Future Work

Use another E-SOS tool, the Web Awareness Tool (WEBAT), to collect more pan flu content from the web
• Focus on adding pan flu plans from other countries

Use the THEMAT to analyze the additional content
Use the CAT to make the additional content a target of a federated search
Bridge the two tools in order to better support planning
Share the resulting text analysis with the appropriate federal agencies in the US
Share relevant portions of the text analysis with the appropriate state agencies in the US and request that the person(s) responsible for writing pan flu plans complete a questionnaire providing feedback


Questions?


CAT – Drupal – Screen Shot