Functional Genomic Hypothesis Generation and Experimentation by a Robot Scientist
King et al., Nature 2004, 427:247-252
Presented by Monica C. Sleumer
February 5, 2004
Scientific Discovery
“Branch of AI devoted to developing algorithms for acquiring scientific knowledge”
Current applications:
– Analysis of mass-spec data
– Discovering structure-activity relationships for compounds
– Making semantic connections in published literature
– Predicting mechanisms for chemical reactions
– Revising taxonomies to accommodate new data
Connect to laboratory instrumentation
Accomplishment
Automated entire scientific process
Robotic system that uses AI to “carry out cycles of scientific experimentation”:
– Originates hypotheses
– Designs experiments
– Performs the experiments
– Interprets the results
Application: Functional genomics
Function unknown for 30% of yeast genes
Complete laboratory automation possible
Goal: connect genes to their function
Using:
– Logical model of aromatic amino acid synthesis pathway
– 8 deletion mutants
– 9 metabolites
– Auxotrophic growth experiments
Aromatic Amino Acid Pathway
Classical vs Robot Science
Classical method:
– Scientific expertise and imagination used to form hypotheses
– Consequences of hypotheses tested by experiment
Robot Scientist:
– Hypotheses formed by abduction
– Tested by deduction
Deduction and Abduction
Deduction
– Rule: P → Q, Fact: ~Q, Infer: ~P
– E.g. If a cell grows on minimal medium, then it can synthesise tryptophan
– Fact: Cell cannot synthesise tryptophan
– ∴ Cell cannot grow on minimal medium
Abduction
– Rule: P → Q, Fact: ~P, Hypothesize: ~Q
– E.g. If a cell grows on minimal medium, then it can synthesise tryptophan
– Fact: Cell cannot grow on minimal medium
– ∴ Cell cannot synthesise tryptophan (a hypothesis to be tested, not a certain conclusion)
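The two inference patterns on this slide can be sketched in a few lines of Python. This is only an illustration, not the Robot Scientist's code: the function names `deduce` and `abduce` and the tuple encoding of negated propositions are assumptions made for the example.

```python
# Rule "P -> Q" from the slide, stored as a (P, Q) pair.
P = "cell grows on minimal medium"
Q = "cell can synthesise tryptophan"
RULE = (P, Q)


def deduce(rule, fact):
    """From P -> Q and the fact ~Q, infer ~P (logically certain)."""
    p, q = rule
    if fact == ("not", q):
        return ("not", p)
    return None  # the rule says nothing about this fact


def abduce(rule, fact):
    """From P -> Q and the fact ~P, hypothesize ~Q.

    The result is only a candidate explanation: it still has to be
    tested by experiment.
    """
    p, q = rule
    if fact == ("not", p):
        return ("not", q)
    return None


print(deduce(RULE, ("not", Q)))  # inferred: the cell cannot grow
print(abduce(RULE, ("not", P)))  # hypothesis: the cell cannot synthesise tryptophan
```

The two functions are deliberately symmetric: the only difference is the status of the result, a conclusion in one case and a hypothesis in the other.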
Implementation
Software:
– Background knowledge
– Logical inference engine
– Hypothesis generation code
– Experiment selection code
– LIMS code
Hardware:
– Liquid-handling robot
– Plate reader
– CPU to do the scientific reasoning
No human intellectual input into:
– Experimental design
– Data interpretation
Robot Scientist
Logical Process
Prolog used to model data
Metabolic pathway represented as a directed graph
Deduction: a knockout mutant will grow IFF a path can be found from the given metabolites to the 3 needed amino acids
Abduction: if a knockout mutant doesn’t grow using the given metabolites, hypothesize which enzyme is missing
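The graph-based reasoning on this slide can be sketched in Python (the original system used Prolog). Everything concrete here is an assumption for illustration: the five-reaction graph is a heavily simplified toy, not the model from the paper, and the enzyme labels `E1` to `E5` and all function names are hypothetical.

```python
# Toy pathway: each reaction is (substrate, product, enzyme).
# Simplified for illustration; enzyme labels E1..E5 are hypothetical.
REACTIONS = [
    ("chorismate", "prephenate", "E1"),
    ("prephenate", "phenylalanine", "E2"),
    ("prephenate", "tyrosine", "E3"),
    ("chorismate", "anthranilate", "E4"),
    ("anthranilate", "tryptophan", "E5"),
]
TARGETS = {"phenylalanine", "tyrosine", "tryptophan"}


def reachable(start, missing_enzymes):
    """Metabolites reachable from `start` using only reactions whose enzyme is present."""
    seen = set(start)
    frontier = list(start)
    while frontier:
        node = frontier.pop()
        for substrate, product, enzyme in REACTIONS:
            if substrate == node and enzyme not in missing_enzymes and product not in seen:
                seen.add(product)
                frontier.append(product)
    return seen


def predicts_growth(metabolites, missing_enzymes):
    """Deduction: growth is predicted iff all three target amino acids are reachable."""
    return TARGETS <= reachable(metabolites, missing_enzymes)


def abduce_missing_enzyme(observations):
    """Abduction: enzymes whose single loss agrees with every (metabolites, grew) observation."""
    enzymes = sorted({enzyme for _, _, enzyme in REACTIONS})
    return [
        e for e in enzymes
        if all(predicts_growth(m, {e}) == grew for m, grew in observations)
    ]


# A mutant fails on chorismate alone but grows when anthranilate is added.
observations = [
    ({"chorismate"}, False),
    ({"chorismate", "anthranilate"}, True),
]
print(abduce_missing_enzyme(observations))
```

In this toy graph only the loss of `E4` is consistent with both observations, which shows how each extra growth experiment prunes the set of candidate hypotheses.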
Machine Learning
Improves performance based on prior experience
Each hypothesis has
– Cost of testing
– Probability of being correct
Goals
– Find out which gene goes with which enzyme
– Use the fewest possible resources
Experiment Choosing
3 ways:
– Intelligent: “ASE”
– Cheapest Experiment: Naïve
– Random Experiment
Performance:
– Accuracy: # of correct predictions made
– Cost and number of experiments required
Both real experiments and simulations
Comparison to human
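The difference between the three ways of choosing an experiment can be sketched as follows. This is not the ASE algorithm from the paper: the "intelligent" chooser below is a simple greedy stand-in that picks the experiment with the most expected probability mass ruled out per unit cost. The hypotheses, priors, costs, and predictions are all made-up values for illustration.

```python
import random

# Hypothetical prior probabilities for three candidate hypotheses.
PRIORS = {"h1": 0.5, "h2": 0.3, "h3": 0.2}

# Hypothetical experiments: a cost and the growth outcome each hypothesis predicts.
EXPERIMENTS = {
    "exp_a": {"cost": 1.0, "predicts": {"h1": True, "h2": True, "h3": True}},
    "exp_b": {"cost": 2.0, "predicts": {"h1": True, "h2": False, "h3": False}},
    "exp_c": {"cost": 10.0, "predicts": {"h1": True, "h2": False, "h3": True}},
}


def expected_eliminated(experiment, priors):
    """Expected probability mass ruled out by running the experiment."""
    mass = {}
    for hypothesis, p in priors.items():
        outcome = experiment["predicts"][hypothesis]
        mass[outcome] = mass.get(outcome, 0.0) + p
    # An outcome with probability p eliminates the remaining mass (1 - p).
    return sum(p * (1.0 - p) for p in mass.values())


def choose_greedy(experiments, priors):
    """Stand-in for the intelligent strategy: best expected elimination per unit cost."""
    return max(
        experiments,
        key=lambda name: expected_eliminated(experiments[name], priors)
        / experiments[name]["cost"],
    )


def choose_naive(experiments):
    """Naive strategy: always run the cheapest experiment."""
    return min(experiments, key=lambda name: experiments[name]["cost"])


def choose_random(experiments, rng):
    """Random strategy: pick any experiment uniformly."""
    return rng.choice(sorted(experiments))


print(choose_greedy(EXPERIMENTS, PRIORS))
print(choose_naive(EXPERIMENTS))
print(choose_random(EXPERIMENTS, random.Random(0)))
```

With these made-up numbers the naive chooser picks `exp_a`, which is cheap but cannot distinguish any hypotheses, while the greedy chooser pays more for `exp_b` because it splits the hypotheses evenly.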
Accuracy of the Experiment Choosers
[Figure: accuracy curves for the ASE, Naive, and Random experiment choosers]
Results of Computer Simulations
[Figure: simulation results for ASE, Naive, and Random; panels: No noise, Noise]
Conclusions
Scientific process can be automated
Experiment selection strategies have significant impact on cost
ASE outperforms, in terms of cost:
– Naïve by 3-fold
– Random by 100-fold
Performance is competitive with human
Cost-effectiveness of science can be improved
Future Work
Extend system to uncover function of other metabolic genes
Would need to:
– Extend model to entire biochemical pathway in KEGG
– Become more robust in terms of possible errors in KEGG
– Include prediction of previously unknown enzymes
Criticisms
Downplays how little of the pathway was actually tested
Not clear how deletion mutants were chosen
No example of experiment cycle
Too large a jump from theory to results
Results graphs too crowded
Discussion Questions
Would computer-generated experiments and results be accepted?
How much would we have to understand about a computer-generated discovery process?
Compare this system to the currently common method of:
– Large-scale generation of data
– Extraction of knowledge by data-mining systems
What other aspects of genome analysis could scientific discovery be applied to?