Extracting patient data from tables in clinical literature Case study on extraction of BMI, weight and number of patients Nikola Milosevic, Cassie Gregson, Robert Hernandez, Goran Nenadic


Page 1: Extracting patient data from tables in clinical literature

Extracting patient data from tables in clinical literature

Case study on extraction of BMI, weight and number of patients

Nikola Milosevic, Cassie Gregson, Robert Hernandez, Goran Nenadic

Page 2: Extracting patient data from tables in clinical literature

Clinical trial literature

• PubMed contains nearly 800 000 clinical trial publications

• Researchers are challenged by the amount of published literature

Page 3: Extracting patient data from tables in clinical literature

Help from text mining?

• Text mining provides methods to process text on a large scale

• Text mining efforts have so far focused mainly on body text rather than on tables and figures

Page 4: Extracting patient data from tables in clinical literature

Tables in clinical documents

• A clinical trial publication contains 2.1 tables on average

• Tables often contain information about the settings and findings of experiments

Page 5: Extracting patient data from tables in clinical literature

Challenges for table mining

• Dense content

• Variety of layouts

• Variety of value representation formats

• Misleading visualization markup

• Lack of resources (labelled datasets)

• How to automatically make sense of tables

Page 6: Extracting patient data from tables in clinical literature

Aim – a case study

• Extract information about the number of patients, patients' BMI and weight from tables in clinical trial literature

• A multi-layered approach to mining information from tables
– to facilitate large-scale semi-automated extraction
– curation of data stored in tables

Page 7: Extracting patient data from tables in clinical literature

Methodology overview

• Rule-based methodology
– Rules created based on a manual analysis of a small subset of tables

• Five processing layers
– Detection
– Functional
– Structural
– Syntactic
– Semantic

Page 8: Extracting patient data from tables in clinical literature

Methodology overview

Page 9: Extracting patient data from tables in clinical literature

Table model

• We model 4 main types of tables
– List
– Matrix
– Super-row
– Multi-tables

• Based on table dimensionality

Page 10: Extracting patient data from tables in clinical literature

Table types (1)

• List table

Page 11: Extracting patient data from tables in clinical literature

Table types (2)

• Matrix table

Page 12: Extracting patient data from tables in clinical literature

Table types (3)

• Super-row table

Page 13: Extracting patient data from tables in clinical literature

Table types (4)

• Multi-table

Page 14: Extracting patient data from tables in clinical literature

1. Functional analysis

• Classifies cells into functional classes
– Header
– Super-row
– Stub
– Data

• Uses heuristics based on content and position
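
The slides do not spell out the heuristics themselves, so the following is only a minimal sketch of what a content-and-position classifier could look like. The three rules it uses (first row is the header, a row with only its first cell filled is a super-row, the first column of the remaining rows is the stub) and the name `classify_cells` are assumptions for illustration, not the authors' rules.

```python
def classify_cells(rows):
    """Label each cell of a table given as a list of rows of strings.

    Assumed heuristics (illustrative only, not taken from the slides):
    - the first row is the header;
    - a later row whose only non-empty cell is the first one is a super-row;
    - in every other row the first cell is the stub and the rest are data.
    """
    labels = []
    for r, row in enumerate(rows):
        first_filled = bool(row) and bool(row[0].strip())
        rest_empty = not any(cell.strip() for cell in row[1:])
        if r == 0:
            labels.append(["header"] * len(row))
        elif len(row) > 1 and first_filled and rest_empty:
            labels.append(["super-row"] * len(row))
        else:
            labels.append(["stub" if c == 0 else "data" for c in range(len(row))])
    return labels


table = [
    ["", "Placebo", "Drug"],
    ["Demographics", "", ""],
    ["Age", "54", "56"],
]
for row_labels in classify_cells(table):
    print(row_labels)
```

Real tables need more than this (multi-row headers, spanning cells, markup hints), which is where the content-based part of the heuristics would come in.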

Page 15: Extracting patient data from tables in clinical literature

2. Structural analysis

• Determines relationships between cells

• Using cell functions and table structure, classifies the table into one of the structural table types:
– List
– Matrix
– Super-row
– Multi-table

• Based on the type, a set of rules resolves the relationships

Page 16: Extracting patient data from tables in clinical literature

3.1 Extracting number of patients

• Heuristic-based approach

• Searches captions, headers, cells

• In captions, 2 rules:
– n=%d
– %d Adj*(patients|participants|subjects|individuals)
– Usually the total number of patients is found

• In headers
– usually n=%d
– can be partial, needs adding up

• In cells
– stub contains a defined word or phrase
– can be partial, needs adding up
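
The two caption rules and the "adding up" of partial header counts can be sketched with regular expressions. This is an approximation under stated assumptions: `Adj*` is modelled as up to three arbitrary words because no part-of-speech tagger is used here, whitespace around `=` and upper-case `N` are tolerated, and the function names are hypothetical.

```python
import re

# Rule 1 from the slide: "n=%d" (optional spaces and upper-case N are an assumption).
N_EQUALS = re.compile(r"\bn\s*=\s*(\d+)", re.IGNORECASE)

# Rule 2 from the slide: "%d Adj*(patients|participants|subjects|individuals)".
# "Adj*" is approximated by zero to three arbitrary words (assumption).
COUNT_NOUN = re.compile(
    r"\b(\d+)\s+(?:[A-Za-z-]+\s+){0,3}?"
    r"(?:patients|participants|subjects|individuals)\b",
    re.IGNORECASE,
)


def counts_in_text(text):
    """Return all counts matched by rule 1, followed by those matched by rule 2."""
    return [
        int(match.group(1))
        for pattern in (N_EQUALS, COUNT_NOUN)
        for match in pattern.finditer(text)
    ]


def total_from_headers(headers):
    """Add up partial counts found in column headers, e.g. one per study arm."""
    return sum(n for header in headers for n in counts_in_text(header))


print(counts_in_text("Baseline characteristics of 120 obese adult patients"))
print(total_from_headers(["Placebo (n = 30)", "Treatment (n=32)"]))
```

Summing header counts is only valid when the columns are disjoint groups; a header row that also contains a "Total (n=...)" column would need to be excluded before adding up.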

Page 17: Extracting patient data from tables in clinical literature

3.2 Extracting BMI

• Based on a trigger phrase list (BMI, body mass index) and a blacklist (change, increase)

• Trigger words in the stub or header signal that a BMI value may appear

• If a blacklisted word is in the vicinity, the value is discarded

• Accepted values are in the range of 14-40
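
A minimal sketch of these rules follows. The trigger phrases, blacklist words and the 14-40 range come from the slide; treating the stub or header label itself as the "vicinity", taking the first number in the cell as the value, and the name `extract_bmi` are assumptions.

```python
import re

TRIGGERS = ("bmi", "body mass index")  # trigger phrases named on the slide
BLACKLIST = ("change", "increase")     # blacklisted words named on the slide
BMI_MIN, BMI_MAX = 14.0, 40.0          # accepted range from the slide

NUMBER = re.compile(r"\d+(?:\.\d+)?")


def extract_bmi(label, cell):
    """Return the BMI value found in `cell`, or None if a rule rejects it.

    `label` is the stub or header text that describes the cell. Checking the
    blacklist against this label only is an assumption about "vicinity".
    """
    text = label.lower()
    if not any(trigger in text for trigger in TRIGGERS):
        return None
    if any(word in text for word in BLACKLIST):
        return None
    match = NUMBER.search(cell)  # first number in the cell (assumption)
    if match is None:
        return None
    value = float(match.group())
    return value if BMI_MIN <= value <= BMI_MAX else None


print(extract_bmi("BMI (kg/m2)", "27.4 ± 3.1"))
print(extract_bmi("Change in BMI", "1.2"))
```

The range check is what makes BMI comparatively easy to extract: a value such as 72 under a BMI label is far more likely to be a count or a percentage than a body mass index.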

Page 18: Extracting patient data from tables in clinical literature

3.3 Extracting weights

• Based on trigger words and blacklists

• Looks in the stub and header for words from the lists, and in data cells for values

• Not useful to set a range
– A person can weigh 40-150 kg
– In lbs: 80-350 lbs
– A baby can weigh 1500-5000 g
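
Because the plausible range depends on the unit and the population, a sketch for weight can keep the unit alongside the value instead of filtering by range. The slides do not list the weight trigger or blacklist words, so the lists below are assumptions, as are the unit pattern and the name `extract_weight`.

```python
import re

WEIGHT_TRIGGERS = ("weight",)                              # assumed trigger list
WEIGHT_BLACKLIST = ("change", "increase", "loss", "gain")  # assumed blacklist

UNIT = re.compile(r"\b(kg|lbs?|g)\b", re.IGNORECASE)
NUMBER = re.compile(r"\d+(?:\.\d+)?")


def extract_weight(label, cell):
    """Return (value, unit) for a weight cell, or None if a rule rejects it.

    The unit is looked up in the stub/header label first, then in the cell;
    it is None when neither states one. No range check is applied.
    """
    text = label.lower()
    if not any(trigger in text for trigger in WEIGHT_TRIGGERS):
        return None
    if any(word in text for word in WEIGHT_BLACKLIST):
        return None
    number = NUMBER.search(cell)  # first number in the cell (assumption)
    if number is None:
        return None
    unit = UNIT.search(label) or UNIT.search(cell)
    return float(number.group()), (unit.group(1).lower() if unit else None)


print(extract_weight("Weight (kg)", "82.5 (10.1)"))
print(extract_weight("Birth weight, g", "3250"))
```

Returning the unit leaves it to a later curation step to decide whether, say, 3250 g is a plausible birth weight or 180 lbs a plausible adult weight.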

Page 19: Extracting patient data from tables in clinical literature

Results

• Corpus contained 3573 tables in 1273 documents

• Each table had on average 80 cells

• Evaluating functional and structural processing:
– Selected 100 random tables of each type and evaluated them

• Evaluating information extraction:
– Number of patients:
  • 758 contained data
  • 50 random documents
– BMI and weight:
  • 113 documents containing this information

Page 20: Extracting patient data from tables in clinical literature

Functional analysis results

Page 21: Extracting patient data from tables in clinical literature

Results for information extraction

• Extracting number of patients:

• Extracting weight and BMI:

Page 22: Extracting patient data from tables in clinical literature

Discussion

• Better-scoped values, such as BMI, can be modelled, giving better performance

• Define exhaustive white lists and blacklists

• Variety of presentation formats and means

• Misleading markup

• However, promising results

Page 23: Extracting patient data from tables in clinical literature

Summary

• Large-scale table mining to harvest population details from clinical trials

• Classified tables based on layout

• Case study on clinical trial patient number, BMI and weight

• Promising performance

Page 24: Extracting patient data from tables in clinical literature

Nikola Milosevic