Building a massive biomedical knowledge graph with citizen science

  • Published on
    14-Jul-2015

Transcript

  • Building a massive biomedical knowledge graph with citizen science

    Benjamin Good

    The Scripps Research Institute @bgood

    Not paying attention? be a citizen scientist at http://mark2cure.org

  • High level goal: improve access to published knowledge

    Articles added to PubMed: 1 every 30 seconds, more than a million a year

    knowledge graph

  • Chemicals & drugs, Genes, Organisms, Area of study, Biological Process

    Auto! Knowledge Graph

    ~10,000 articles

    Ngly1 gene

    ?

    New drug candidate?

  • Knowledge graph problems

    Assigning meaning to relations

    Incorrect relations

    Missing relations

  • Facts of life in computer processing of human language

    False Positives and False Negatives always

    Human annotators remain the gold standard

    There are not nearly enough professional human annotators to process every document published

  • Observations

    There are about 2.92 billion Internet users

    Lots of them can read English

    http://www.statista.com/statistics/273018/number-of-internet-users-worldwide/

  • Hypothesis

    We can generate the equivalent of massive numbers of professional annotators by aggregating the labor of large numbers of non-professional CITIZEN SCIENTISTS!!!

  • Building a Knowledge Graph

    1. Find mentions of concepts in text

    2. Identify relationships between concepts
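
The two steps above can be sketched in a few lines. This is only an illustrative toy, not the pipeline used in the talk: the `LEXICON` entries, the function names, and the use of same-sentence co-occurrence as a stand-in for relation identification are all assumptions made for the example.

```python
import re
from itertools import combinations

# Hypothetical toy lexicon: lowercase surface form -> (concept type, name).
LEXICON = {
    "ngly1": ("gene", "NGLY1"),
    "cdk9": ("gene", "CDK9"),
    "leukemia": ("disease", "leukemia"),
}


def find_mentions(text):
    """Step 1: return (start, end, type, name) for each lexicon term in text."""
    mentions = []
    for term, (ctype, name) in LEXICON.items():
        pattern = r"\b" + re.escape(term) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            mentions.append((m.start(), m.end(), ctype, name))
    return sorted(mentions)


def candidate_relations(text):
    """Step 2 (naive): link distinct concepts mentioned in the same sentence."""
    edges = set()
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        names = sorted({name for _, _, _, name in find_mentions(sentence)})
        for a, b in combinations(names, 2):
            edges.add((a, "co-occurs_with", b))
    return edges
```

Co-occurrence edges like these say nothing about what the relation means and will include spurious pairs, which is where human judgment comes in.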

  • Before we try this with citizens...

    Can non-scientists collectively identify concepts in biomedical texts with high quality?

    We used the Amazon Mechanical Turk crowdsourcing platform to answer the question

  • Highlight the disease.

  • Answer was yes

    By combining the responses of multiple non-professional members of the crowd, we achieved equivalent quality to professional annotators

    Good et al. Microtask crowdsourcing for disease mention annotation in PubMed abstracts. Pacific Symposium on Biocomputing 2015.

    http://psb.stanford.edu/psb-online/proceedings/psb15/good.pdf

  • Mark2Cure.org

  • Same task, different context

  • Experiment 1 in progress

    Evaluating quality and quantity of volunteer annotators

    Goal is to complete about 600 abstracts, with 15 volunteers per abstract

    Almost there!

  • mark2cure experiment 1

    [Chart series: Tasks/10, New users. Annotated events: Launch, Tweet, Blog post, San Diego Union Tribune article]

    11:00am Feb. 9: 5423 tasks complete

    230 signups, 130 have completed a task

  • Next steps

    Implement and test a relation extraction workflow

    Start disease-focused knowledge capture missions

    First disease: NGLY1 deficiency

    http://ngly1.org

  • Thanks to the mark2cure team!

    Max Nanis

    Andrew Su

    Ginger Tsueng

    Chunlei Wu

    Thank you to the citizen scientists making this possible!

    @bgood bgood@scripps.edu

  • Why do I Mark2Cure?

    In memory of my daughter who had Cystic Fibrosis

    Studied biology in college and I really miss it! My 4 year old daughter Phoebe is living with and battling rare disease.

    I have Ehlers Danlos Syndrome. I hope to help people learn about this painful and debilitating disorder, so that others like me can receive more effective medical care.

    I am retired, have a doctorate in medical humanities, and have two children with Gaucher disease. I am just looking for some way to put my education to use.

    To give back

    I Mark2Cure in memory of my son Mike who had type 1 diabetes.

    Take part in something that helps humanity.

  • Increase precision with voting

    1 or more votes (K=1), K=2, K=3, K=4: the same passage is shown at each vote threshold.

    This molecule inhibits the growth of a broad panel of cancer cell lines, and is particularly efficacious in leukemia cells, including orthotopic leukemia preclinical models as well as in ex vivo acute myeloid leukemia (AML) and chronic lymphocytic leukemia (CLL) patient tumor samples. Thus, inhibition of CDK9 may represent an interesting approach as a cancer therapeutic target especially in hematologic malignancies.

    Aggregation function
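
A minimal sketch of one possible vote-based aggregation function, assuming each annotator's highlights are stored as (start, end) character spans and that a span is kept when at least K annotators marked exactly the same span. The function name, the exact-match rule, and the sample data are hypothetical; the slides do not specify how overlapping highlights were merged.

```python
from collections import Counter


def aggregate_by_votes(annotations, k):
    """Keep each span that at least k annotators marked.

    annotations: one collection of (start, end) spans per annotator.
    """
    votes = Counter()
    for spans in annotations:
        votes.update(set(spans))  # each annotator counts once per span
    return sorted(span for span, n in votes.items() if n >= k)


# Hypothetical highlights from four annotators over the same passage.
crowd = [
    [(10, 18), (40, 62)],
    [(10, 18), (40, 62), (70, 75)],
    [(10, 18)],
    [(10, 18), (40, 62)],
]
```

Raising K drops spans that only a few annotators marked, trading recall for precision: with the sample data, `aggregate_by_votes(crowd, 1)` keeps all three spans while `aggregate_by_votes(crowd, 4)` keeps only `(10, 18)`.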

  • AMT results: 589 abstracts compared to gold standard

    F = 0.87, k = 6
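
F here is presumably the F-measure, the harmonic mean of precision and recall against the gold standard. A short sketch of that calculation follows, assuming annotations are compared as exact-match (start, end) spans; the matching rule and the function name are assumptions, and the sample spans are made up for illustration.

```python
def precision_recall_f(predicted, gold):
    """Return (precision, recall, F) for predicted spans against gold spans."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f = 2 * precision * recall / (precision + recall)
    return precision, recall, f


# Hypothetical example: two of three predicted spans match the gold standard.
p, r, f = precision_recall_f(
    predicted=[(10, 18), (40, 62), (70, 75)],
    gold=[(10, 18), (40, 62), (90, 99)],
)
```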
