Transcript
Page 1: Building a massive biomedical knowledge graph with citizen science

Building a massive biomedical knowledge graph with citizen science

Benjamin Good

The Scripps Research Institute @bgood

Not paying attention? Be a citizen scientist at http://mark2cure.org

Page 2: Building a massive biomedical knowledge graph with citizen science

High-level goal: improve access to published knowledge

22

articles added to PubMed per year

1 every 30 seconds, more than a million a year

knowledge graph

Page 3: Building a massive biomedical knowledge graph with citizen science

Chemicals & drugs, Genes, Organisms, Area of study, Biological process

Auto! Knowledge Graph

~10,000 articles

Ngly1 gene

?

New drug candidate?

Page 4: Building a massive biomedical knowledge graph with citizen science

Knowledge graph problems

• Assigning meaning to relations

• Incorrect relations

• Missing relations

• …

Page 5: Building a massive biomedical knowledge graph with citizen science

Facts of life in computer processing of human language

• False positives and false negatives always occur

• Human annotators remain the gold standard

• There are not nearly enough professional human annotators to process every document published


Page 6: Building a massive biomedical knowledge graph with citizen science

Observations

• There are about 2.92 billion Internet users

• Lots of them can read English

http://www.statista.com/statistics/273018/number-of-internet-users-worldwide/

Page 7: Building a massive biomedical knowledge graph with citizen science

Hypothesis

• We can generate the equivalent of massive numbers of professional annotators by aggregating the labor of large numbers of non-professional CITIZEN SCIENTISTS!!!


Page 8: Building a massive biomedical knowledge graph with citizen science

Building a Knowledge Graph

1. Find mentions of concepts in text

2. Identify relationships between concepts
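
As a rough illustration of the two steps above, the following minimal Python sketch finds concept mentions by simple dictionary lookup and then links any two concepts that appear in the same sentence. The toy lexicon, the `co_occurs_with` relation label, and all function names are hypothetical and invented for this example; the slides do not describe how the actual pipeline is implemented.

```python
import itertools
import re

# Hypothetical toy lexicon mapping surface terms to concept types.
LEXICON = {
    "cdk9": "gene",
    "leukemia": "disease",
    "cancer": "disease",
}


def find_mentions(sentence):
    """Step 1: return sorted (term, type) pairs for lexicon terms in the sentence."""
    tokens = set(re.findall(r"[a-z0-9]+", sentence.lower()))
    return sorted((t, LEXICON[t]) for t in tokens if t in LEXICON)


def extract_relations(sentence):
    """Step 2: link every pair of concepts mentioned in the same sentence."""
    mentions = find_mentions(sentence)
    return [
        (a[0], "co_occurs_with", b[0])
        for a, b in itertools.combinations(mentions, 2)
    ]


def build_graph(sentences):
    """Collect relation triples from all sentences into a set of edges."""
    graph = set()
    for sentence in sentences:
        graph.update(extract_relations(sentence))
    return graph
```

Same-sentence co-occurrence is only a stand-in for real relation extraction: it says nothing about what the relation means, which is one of the knowledge graph problems listed earlier in the talk.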


Page 9: Building a massive biomedical knowledge graph with citizen science

Before we try with citizens...

• Can non-scientists collectively identify concepts in biomedical texts with high quality?

• We used the Amazon Mechanical Turk crowdsourcing platform to answer the question


Page 10: Building a massive biomedical knowledge graph with citizen science

Highlight the “disease”.

Page 11: Building a massive biomedical knowledge graph with citizen science

Answer was yes

• By combining the responses of multiple non-professional members of ‘the crowd’, we achieved equivalent quality to professional annotators

Good et al. “Microtask crowdsourcing for disease mention annotation in PubMed abstracts.” Pacific Symposium on Biocomputing 2015

http://psb.stanford.edu/psb-online/proceedings/psb15/good.pdf

Page 12: Building a massive biomedical knowledge graph with citizen science

Mark2Cure.org

Page 13: Building a massive biomedical knowledge graph with citizen science

Same task, different context

Page 14: Building a massive biomedical knowledge graph with citizen science

Experiment 1 in progress

Evaluating quality and quantity of volunteer annotators

Goal is to complete about 600 abstracts, with 15 volunteers per abstract

Almost there!

Page 15: Building a massive biomedical knowledge graph with citizen science

[Chart: mark2cure experiment 1. Series: Tasks/10, New users. Marked events: Launch, Tweet, Blog post, San Diego Union Tribune article]

11:00am Feb. 9: 5,423 tasks complete

230 signups, 130 have completed a task


Page 16: Building a massive biomedical knowledge graph with citizen science

Next steps

• Implement and test a relation extraction workflow

• Start disease-focused knowledge capture missions

• First disease: NGLY1 deficiency

• http://ngly1.org

Page 17: Building a massive biomedical knowledge graph with citizen science

Thanks to the mark2cure team!

Max Nanis

Andrew Su

@bgood

Ginger Tsueng

Chunlei Wu

Thank you to the citizen scientists making this possible!

Page 18: Building a massive biomedical knowledge graph with citizen science

Why do I Mark2Cure?

In memory of my daughter who had Cystic Fibrosis

Studied biology in college and I really miss it! My 4 year old daughter Phoebe is living with and battling rare disease.

I have Ehlers Danlos Syndrome. I hope to help people learn about this painful and debilitating disorder, so that others like me can receive more effective medical care.

I am retired, have a doctorate in medical humanities, and have two children with Gaucher disease. I am just looking for some way to put my education to use.

To give back

I Mark2Cure in memory of my son Mike who had type 1 diabetes.

Take part in something that helps humanity.

Page 19: Building a massive biomedical knowledge graph with citizen science
Page 20: Building a massive biomedical knowledge graph with citizen science

Increase precision with voting


Example passage shown at four vote thresholds: 1 or more votes (K=1), K=2, K=3, K=4

This molecule inhibits the growth of a broad panel of cancer cell lines, and is particularly efficacious in leukemia cells, including orthotopic leukemia preclinical models as well as in ex vivo acute myeloid leukemia (AML) and chronic lymphocytic leukemia (CLL) patient tumor samples. Thus, inhibition of CDK9 may represent an interesting approach as a cancer therapeutic target especially in hematologic malignancies.

Aggregation function
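
One simple way to read the voting thresholds on this slide is as a minimum-vote filter: a highlighted span is kept only if at least K annotators marked it. The sketch below assumes annotations are exact-match (start, end) character spans; the function name, the span representation, and the sample data are hypothetical, and the aggregation function actually used may differ (for example, in how partially overlapping spans are handled).

```python
from collections import Counter


def aggregate_by_votes(annotations, k):
    """Keep each span marked by at least k annotators.

    annotations: one collection of (start, end) character spans per annotator.
    Returns the surviving spans in sorted order.
    """
    votes = Counter()
    for spans in annotations:
        # A set ensures each annotator votes at most once per span.
        votes.update(set(spans))
    return sorted(span for span, n in votes.items() if n >= k)


# Hypothetical annotations from three annotators on one abstract.
example = [
    [(0, 8), (20, 28)],
    [(0, 8)],
    [(0, 8), (40, 45)],
]
```

With this sample data, `aggregate_by_votes(example, 1)` keeps all three spans, while `aggregate_by_votes(example, 2)` keeps only `(0, 8)`. Raising K discards spans that few annotators agree on, which tends to raise precision at the cost of recall.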

Page 21: Building a massive biomedical knowledge graph with citizen science

AMT results: 589 abstracts compared to gold standard


F = 0.87, k = 6
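
For readers unfamiliar with the F score, the sketch below shows how a set of aggregated annotations could be scored against a gold standard. It assumes F denotes the balanced F-measure (the harmonic mean of precision and recall), which the slide does not specify, and it counts only exact span matches as correct. The function name and the sample spans are hypothetical and are not data from the experiment.

```python
def precision_recall_f(predicted, gold):
    """Return (precision, recall, F) for predicted spans against gold spans.

    Only exact matches count as true positives (an assumption of this sketch).
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Hypothetical spans: one of two predictions matches one of two gold spans.
scores = precision_recall_f({(0, 8), (20, 28)}, {(0, 8), (40, 45)})
```

Here one of the two predicted spans is correct and one of the two gold spans is found, so precision, recall, and F are all 0.5.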

