Creating structured biomedical knowledge networks via crowdsourcing
Tong Shu Li, Su Lab, The Scripps Research Institute. Bio-Ontologies SIG, ISMB 2015, 2015-07-10.
1. Creating structured biomedical knowledge networks via crowdsourcing
   - Tong Shu Li, Su Lab, The Scripps Research Institute
   - Bio-Ontologies SIG, ISMB 2015, 2015-07-10
2. Knowledge networks allow for result interpretation (Bainbridge 2011)
3. Network creation process
4. Relationship extraction subproblems
5. Crowdsourcing introduction
   - Members of the public perform small tasks for small amounts of money
   - Tasks are usually difficult for computers
   - Workers contribute as a way of earning supplemental income
   - Useful source of labor for academics and companies
6. Crowdsourcing-driven biocuration
   - Goal: replicate work done by PhD biocurators with members of the crowd
   - Advantages:
     - Scalability
     - Faster results at a lower cost
     - Well suited for non-automatable tasks where an expert is not necessary
7. Crowdsourcing relies on gold standards for validation
   - Crowdsourcing methods need to be validated with gold standards
   - Gold standard: EU-ADR corpus [1]
     - Positive: known relationship
     - Speculative: uncertain relationship
     - Negative: known lack of relationship
     - False: no claim of relationship
   - Sentence-bound relationships
   - 300 abstracts annotated with relationships between genes/diseases/drugs
   [1] van Mulligen et al. (2012) J. Biomed. Inform. 45: 879
8. Platform interface for relation annotation
9. Crowd agreement with the EU-ADR
   - Strict agreement with EU-ADR: 71.67% (43/60 sentences)
   - Agreement after combining speculative and positive: 76.67%
   - 10 judgements/sentence
   - 10 cents/judgement
   - Time to complete: 2 hours
   - Total cost: $182.21 USD
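A minimal sketch of how agreement figures like those on slide 9 could be computed. The slides do not say how the 10 judgements per sentence were aggregated, so plurality voting is an assumption here. The sentence IDs and labels are toy data, and the helper names `top_choice` and `agreement` are hypothetical.

```python
from collections import Counter

def top_choice(judgements):
    """Return (label, fraction of workers choosing it) for one sentence.

    Assumption: the crowd answer is the plurality label.
    """
    label, count = Counter(judgements).most_common(1)[0]
    return label, count / len(judgements)

def agreement(crowd_labels, gold_labels, merge_speculative=False):
    """Fraction of sentences where the crowd label equals the gold label.

    With merge_speculative=True, "speculative" is counted as "positive".
    """
    def norm(label):
        if merge_speculative and label == "speculative":
            return "positive"
        return label
    matches = sum(norm(c) == norm(g) for c, g in zip(crowd_labels, gold_labels))
    return matches / len(gold_labels)

# Toy data (not from the talk): 10 judgements for each of 3 sentences.
judgements = {
    "s1": ["positive"] * 7 + ["speculative"] * 3,
    "s2": ["speculative"] * 6 + ["positive"] * 4,
    "s3": ["negative"] * 8 + ["false"] * 2,
}
gold = {"s1": "positive", "s2": "positive", "s3": "negative"}

ids = sorted(judgements)
crowd = [top_choice(judgements[i])[0] for i in ids]
gold_list = [gold[i] for i in ids]

strict = agreement(crowd, gold_list)
merged = agreement(crowd, gold_list, merge_speculative=True)
print(f"strict: {strict:.2%}, merged: {merged:.2%}")

# The reported strict figure is consistent with 43 of 60 sentences:
print(round(100 * 43 / 60, 2))  # 71.67
```

The second value returned by `top_choice` is the per-sentence crowd agreement for the top choice, the quantity named on slide 12.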
10. Variability of gold standards
   - Figure axes: number of experts who chose that relationship type; percent of raw EU-ADR relations
11. Crowd agreement as a proxy for clarity
   - Figure axis: percent of crowd which chose published EU-ADR answer
12. Crowd agreement and accuracy probability
   - Figure axes: percent crowd agreement for the top choice; percent of annotations which agreed with EU-ADR
13. Abstract level relationship extraction
14. Preliminary results
   - AUC of 0.904
   - Max F-score of 0.791 (0.773 precision, 0.809 recall)
   - Max F-score achieved at a voting score of 0.407
   - 4.5 hours, $54.72 USD to annotate 30 abstracts
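A small sketch of how a maximum F-score and its threshold could be found by sweeping a cutoff over per-relation voting scores. The slides do not define the voting score, so treating it as a number in [0, 1] where higher means more crowd support is an assumption. The scores and truth labels are toy data, and `best_threshold` is a hypothetical helper.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def best_threshold(scores, truths):
    """Sweep each observed score as a cutoff and return the best
    (f_score, threshold, precision, recall) tuple.

    A relation is predicted true when its score is >= the cutoff.
    """
    best = (0.0, None, 0.0, 0.0)
    for t in sorted(set(scores)):
        predicted = [s >= t for s in scores]
        tp = sum(p and y for p, y in zip(predicted, truths))
        fp = sum(p and not y for p, y in zip(predicted, truths))
        fn = sum((not p) and y for p, y in zip(predicted, truths))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f = f_score(precision, recall)
        if f > best[0]:
            best = (f, t, precision, recall)
    return best

# The reported precision and recall reproduce the reported F-score:
print(round(f_score(0.773, 0.809), 3))  # 0.791

# Toy data (not from the talk): voting scores and gold-standard truth.
scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
truths = [True, True, False, True, False, False]
best = best_threshold(scores, truths)
print(best)
```

On the toy data the sweep selects a cutoff of 0.4. The 0.407 on the slide would be the analogous cutoff on the real voting scores.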
15. Conclusion and next steps
   - Gold standards are variable and imperfect
   - Binary agreement may hide interesting information
   - Expert and crowd agreement can be used to measure gold standard consistency
   - Ambiguous portions of a gold standard may need to be treated differently during evaluations
   - Integration with machine learning methods
   - Data generation
   - Feature extraction
   - Semantically typed relationships
16. Acknowledgements
   - Dr. Andrew Su
   - Dr. Benjamin Good
   - Dr. Laura Furlong
   - Dr. Zhiyong Lu
   - The Su Lab
   - We're hiring!
17. EU-ADR relationship examples
   - Positive: For exposure levels within standard recommended guidelines, radioisotopes are far more likely to play a role in the occurrence of spontaneous abortions than X-rays.
   - Speculative: Information from the SITE Cohort Study should clarify whether use of these immunosuppressive drugs for ocular inflammation increases the risk of mortality and fatal cancer.
   - Negative: We found no evidence of impaired control of the carbohydrate and lipid metabolism or aggravation of vascular lesions during the two years an etonogestrel implant was used by diabetic women.
   - False: The frequency of PONV did not correlate to the amounts of alfentanil, propofol, postoperative antiemetics consumed, or to female gender, non-smoking status, and history of PONV or motion sickness.
18. Data for all 244 drug-disease sentences
19. Crowd agreement and accuracy probability
   - Figure axes: percent of annotations which agreed with EU-ADR; percent crowd agreement for the top choice