LEARNING PROPORTIONS IN A SEMI-SUPERVISED SETTING: A CASE STUDY IN PRECISION MEDICINE
Predrag Radivojac
DEPARTMENT OF COMPUTER SCIENCE AND INFORMATICS, INDIANA UNIVERSITY, BLOOMINGTON
November 28, 2016
At some prediction threshold: [One] Third of the Alternative Splicing Isoforms Predicted to Produce Functional Proteins…
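The headline above only holds "at some prediction threshold": counting how many predictions clear a cutoff gives a proportion that moves with the cutoff. One standard correction from the quantification literature (the "adjusted count", not necessarily the approach taken in this talk) divides out the classifier's error rates. The sketch below assumes the true and false positive rates at the chosen threshold are known (e.g. estimated on labeled data) and carry over to the unlabeled set; the function names are hypothetical.

```python
def classify_and_count(scores, threshold):
    """Naive estimate: fraction of unlabeled scores at or above the threshold."""
    if not scores:
        raise ValueError("need at least one score")
    return sum(s >= threshold for s in scores) / len(scores)


def adjusted_count(observed_rate, tpr, fpr):
    """Correct the naive rate using the classifier's true/false positive rates.

    observed = alpha * tpr + (1 - alpha) * fpr  =>  alpha = (observed - fpr) / (tpr - fpr)
    """
    if tpr <= fpr:
        raise ValueError("classifier must satisfy tpr > fpr")
    alpha = (observed_rate - fpr) / (tpr - fpr)
    return min(1.0, max(0.0, alpha))  # clip to a valid proportion


# Hypothetical scores from some classifier on unlabeled proteins.
scores = [0.9, 0.8, 0.75, 0.4, 0.3, 0.2, 0.1, 0.05, 0.65, 0.55]
print(classify_and_count(scores, threshold=0.5))  # 0.5
# Hypothetical classifier with 80% sensitivity and 10% false positive rate:
print(adjusted_count(0.31, tpr=0.8, fpr=0.1))  # approximately 0.3
```

If the true proportion were 0.3, such a classifier would flag 0.3 · 0.8 + 0.7 · 0.1 = 0.31 of the data, which is what the correction inverts.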
WHAT IS THE FRACTION OF ENZYMES IN A GENOME?
Charles Dann, Chemistry; Yuzhen Ye, Computer Science; Tuli Mukhopadhyay, Biology
EXAMPLE FROM PSYCHOLOGY
Greene MR. Estimations of object frequency are frequently overestimated. Cognition (2016) 149:6-10.
AND SO WE GO…
Hi Pedja, You pose interesting questions. I'd expect yeast to have the highest enzyme fraction, as it does not need to have conserved genes for multicellular development, cognition, etc. (though many of these processes require signaling pathways with enzymes). So here are my estimates for enzyme fraction, based entirely on intuition.
Yeast ~45%; E. coli ~35%; Mouse ~25%; Human ~25%; Arabidopsis ~40% (no idea here)
I imagine I may hit low on all of these...
CD3
SUPERVISED LEARNING PROBLEM
SEMI-SUPERVISED LEARNING PROBLEM
UNSUPERVISED LEARNING PROBLEM
POSITIVE-UNLABELED LEARNING PROBLEM (PU)
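In the PU setting, the labeled examples are positives drawn from a density f_1, and the unlabeled examples come from the mixture f_u = α f_1 + (1 − α) f_0, where α is the unknown fraction of positives. As a minimal illustration (not the estimator used in this work), the sketch below assumes discrete feature values and labeled positives sampled completely at random, and returns the plug-in value of the largest α consistent with the data, min over values b of f_u(b) / f_1(b). The function names are hypothetical, and the estimate is noisy for small samples.

```python
from collections import Counter


def empirical_distribution(samples):
    """Relative frequency of each discrete feature value."""
    counts = Counter(samples)
    n = len(samples)
    return {value: c / n for value, c in counts.items()}


def max_positive_proportion(labeled_positives, unlabeled):
    """Plug-in estimate of the largest alpha with f_u = alpha*f_1 + (1-alpha)*f_0.

    Takes the minimum of f_u(b) / f_1(b) over values b seen among the labeled
    positives, capped at 1.
    """
    f1 = empirical_distribution(labeled_positives)
    fu = empirical_distribution(unlabeled)
    return min(1.0, min(fu.get(b, 0.0) / p for b, p in f1.items()))


# Positives take values a/b; negatives only take value c; half of U is positive.
print(max_positive_proportion(list("aabb"), list("abcc")))  # 0.5
```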
IDENTIFIABILITY
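Why identifiability is an issue in the PU setting can be shown with a short, standard argument (a sketch, not taken from the slides). Write f_1 for the positive density, f_0 for the negative density and α for the positive fraction among unlabeled data. Only f_u and f_1 are observable, and any smaller α' also fits, with a valid replacement density f_0' (non-negative and integrating to 1). The largest consistent proportion α* therefore equals α only under an extra assumption, namely that the infimum of f_0 / f_1 is zero (f_0 is "irreducible" with respect to f_1).

```latex
\begin{align}
f_u(x) &= \alpha f_1(x) + (1-\alpha) f_0(x), \\
f_0'(x) &= \frac{(\alpha-\alpha') f_1(x) + (1-\alpha) f_0(x)}{1-\alpha'}, \qquad 0 \le \alpha' < \alpha, \\
\alpha' f_1(x) + (1-\alpha') f_0'(x) &= \alpha f_1(x) + (1-\alpha) f_0(x) = f_u(x), \\
\alpha^{*} = \inf_{x:\, f_1(x)>0} \frac{f_u(x)}{f_1(x)} &= \alpha + (1-\alpha) \inf_{x:\, f_1(x)>0} \frac{f_0(x)}{f_1(x)}.
\end{align}
```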
ANTICIPATED LIKELIHOOD FUNCTION
ENZYMES: EXPERIMENTAL PROTOCOL
>sp|P04637|P53_HUMAN Cellular tumor antigen p53
MEEPQSDPSVEPPLSQETFSDLWKLLPENNVLSPLPSQAMDDLMLSPDDIEQWFTEDPGP
DEAPRMPEAAPPVAPAPAAPTPAAPAPAPSWPLSSSVPSQKTYQGSYGFRLGFLHSGTAK
SVTCTYSPALNKMFCQLAKTCPVQLWVDSTPPPGTRVRAMAIYKQSQHMTEVVRRCPHHE
RCSDSDGLAPPQHLIRVEGNLRVEYLDDRNTFRHSVVVPYEPPEVGSDCTTIHYNYMCNS
SCMGGMNRRPILTIITLEDSSGNLLGRNSFEVRVCACPGRDRRTEEENLRKKGEPHHELP
PGSTKRALPNNTSSSPQPKKKPLDGEYFTLQIRGRERFEMFRELNEALELKDAQAGKEPG
GSRAHSSHLKSKKGQSTSRHKKLMFKTEGPDSD
Develop an SVM predictor
- Linear kernel
- AUC ≈ 75%
- Platt's correction
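The two evaluation and calibration steps listed above can be sketched without any ML library. AUC is computed here as the Mann-Whitney probability that a random positive outscores a random negative. Assuming "Platt's correction" refers to Platt scaling, the second function fits P(y = 1 | f) = 1 / (1 + exp(A·f + B)) to SVM decision values f, using Platt's smoothed targets and his suggested starting point for B. Platt's paper fits A and B with a Newton-type method; plain gradient descent is used here as a simplification, and all names and hyperparameters are illustrative.

```python
import math


def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney pairwise statistic (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def _prob(z):
    """Numerically stable 1 / (1 + exp(z))."""
    if z >= 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))


def platt_scale(scores, labels, lr=0.1, epochs=5000):
    """Fit P(y=1 | f) = 1 / (1 + exp(A*f + B)) by gradient descent on cross-entropy,
    using Platt's smoothed targets instead of hard 0/1 labels."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    t_pos = (n_pos + 1.0) / (n_pos + 2.0)
    t_neg = 1.0 / (n_neg + 2.0)
    targets = [t_pos if y == 1 else t_neg for y in labels]
    a, b = 0.0, math.log((n_neg + 1.0) / (n_pos + 1.0))  # Platt's initial A and B
    n = len(scores)
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for f, t in zip(scores, targets):
            r = t - _prob(a * f + b)  # d(loss)/dz for z = A*f + B
            grad_a += r * f
            grad_b += r
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return lambda f: _prob(a * f + b)


scores = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]  # hypothetical SVM decision values
labels = [0, 0, 0, 1, 1, 1]
print(auc(scores, labels))  # 1.0
calibrate = platt_scale(scores, labels)
print(round(calibrate(1.5), 3))
```

The smoothed targets keep A finite even when the decision values separate the classes perfectly, as in this toy example.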
AAAA  AAAC  …  MEEP  …  VVVP  …  YYYY
0     0     …  1     …  1     …  0
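The feature table above appears to encode each protein as a binary vector over all 4-mers of amino acids, ordered from AAAA to YYYY, with a 1 for every 4-mer that occurs in the sequence (p53 contains both MEEP and VVVP). The sketch below assumes k = 4, the 20 standard residues in alphabetical order, and plain presence/absence with no weighting; the helper names are hypothetical. Such vectors could then be fed to a linear-kernel SVM.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues in alphabetical order
_POS = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def kmer_index(kmer):
    """Column of a k-mer in lexicographic order (AAAA -> 0, AAAC -> 1, ..., YYYY -> 20**k - 1)."""
    idx = 0
    for aa in kmer:
        idx = idx * len(AMINO_ACIDS) + _POS[aa]
    return idx


def kmer_presence(sequence, k=4):
    """Binary vector of length 20**k: 1 if that k-mer occurs in the sequence, else 0.

    k-mers containing non-standard residues (e.g. X or U) are skipped.
    """
    vector = [0] * len(AMINO_ACIDS) ** k
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        if all(aa in _POS for aa in kmer):
            vector[kmer_index(kmer)] = 1
    return vector


x = kmer_presence("MEEPQSDPSVEPPLSQETFSDLWKLLPENNVLSPLPSQAMDD")  # start of p53
print(len(x))                 # 160000
print(x[kmer_index("MEEP")])  # 1
print(x[kmer_index("AAAA")])  # 0
```

A dense list is used for readability; with 160,000 columns per protein, a sparse set of indices would be the more practical choice.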
RESULTS: ENZYMES
DISEASE MUTATIONS IN HUMAN EXOME
PRECISION MEDICINE
“So tonight, I’m launching a new Precision Medicine Initiative to bring us closer to curing diseases like cancer and diabetes, and to give all of us access to the personalized information we need to keep ourselves and our families healthier. We can do this.” – President Barack Obama
01/20/2015
Precision Medicine: the science and practice of matching the best diagnostic, therapeutic and prevention strategies to promote health that are tailored to an individual’s genetic, biological, behavioral and psychosocial characteristics
PRECISION MEDICINE
www.nih.gov/AllofUs-Research-Program
nih.gov
http://bgiamericas.com
GENOME SEQUENCING
The Atlantic
… AGCATACCGA …
HUMAN GENOME AND ITS IMPACT ON PHENOTYPE: WHAT IS THE MOLECULAR BASIS OF IT?
GENOME / PHENOME
… TTTACCGAGC …
… AGCATAGCGA …
Adapted from: http://snp.ims.u-tokyo.ac.jp/samplesMethods.html#SNP
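The two fragments AGCATACCGA and AGCATAGCGA above differ at a single base, which is what a single-nucleotide polymorphism looks like once sequences are aligned. A minimal sketch of listing such differences, assuming the sequences are already aligned to equal length with no insertions or deletions (real pipelines use read aligners and variant callers); the function name is hypothetical.

```python
def single_nucleotide_differences(reference, variant):
    """Return (position, ref_base, alt_base) for every mismatch between two
    equal-length, already aligned DNA sequences (positions are 1-based)."""
    if len(reference) != len(variant):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (i, r, a)
        for i, (r, a) in enumerate(zip(reference, variant), start=1)
        if r != a
    ]


print(single_nucleotide_differences("AGCATACCGA", "AGCATAGCGA"))  # [(7, 'C', 'G')]
```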
Gene
Exon DNA
>40 million known unique sites of variation!
Yue et al. J Mol Biol, 353:459 (2005).
G38D in 1tag
Another gene…
regulatory, non-synonymous, intronic, genomic, synonymous
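For variants inside a coding exon, the synonymous versus non-synonymous distinction comes down to whether the mutated codon still encodes the same amino acid. The sketch below uses the standard genetic code and reports the change in the short protein notation used on these slides (e.g. G38D, T98*). It assumes a coding sequence (CDS) that starts at the first base of a codon and contains only A, C, G and T; regulatory and intronic variants lie outside the CDS and are not handled. The function name is hypothetical.

```python
BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); codons in the order TTT, TTC, TTA, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_coding_snv(cds, position, alt_base):
    """Classify a single-base substitution at a 1-based CDS position.

    Returns (category, protein_change), e.g. ("non-synonymous", "G2D");
    a stop codon is written as "*".
    """
    cds = cds.upper()
    idx = position - 1
    offset = idx % 3
    codon_start = idx - offset
    ref_codon = cds[codon_start:codon_start + 3]
    alt_codon = ref_codon[:offset] + alt_base.upper() + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    change = f"{ref_aa}{codon_start // 3 + 1}{alt_aa}"
    category = "synonymous" if ref_aa == alt_aa else "non-synonymous"
    return category, change


print(classify_coding_snv("ATGGGCTAA", 5, "A"))  # ('non-synonymous', 'G2D')
print(classify_coding_snv("ATGGGCTAA", 6, "T"))  # ('synonymous', 'G2G')
```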
BASE CHANGES RESULTING IN DIFFERENT PROTEINS
When applied to 43 nsSNPs of 18 drug-related genes from the Thai SNP resequencing project, there were strong correlations. Slide from Sean Mooney’s group.
CURRENT TOOLS PREDICT EFFECTS OF VARIANTS
GROWTH OF DATA
HapMap Phase I
1000 Genomes project Phase I
TUMOR BOARD
http://www.med.umich.edu/cancer/images/urologic-oncology-tumor-board.jpg
Person: 68-year-old woman
Cancer type: colon cancer, metastatic
Mutations: KRAS C27F, BRCA1 H57R, TP53 T98*
Treatment options:
- clinical trial at MD Anderson
- continue with chemotherapy
- treat with new drug for breast cancer
MOLECULAR CONSEQUENCES ON P53
R175H: Metal-binding
V143A: Stability
K120R: Acetylation
R273H: DNA-binding
G245S: Protein-binding
p53 – tumor suppressor protein
PDB structures: 2ybg, 2j1w, 1ycs and 1tup
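Variant names such as R175H pack three facts: the reference residue, its 1-based position in the protein, and the substituted residue ("*" for a stop). A quick sanity check before any prediction is that the reference residue actually matches the sequence. The sketch below parses this short notation and checks the five p53 variants listed above against the p53 sequence shown on the ENZYMES: EXPERIMENTAL PROTOCOL slide (that copy has P at position 72, which does not affect these positions). The function names are hypothetical.

```python
import re

# p53 sequence (UniProt P04637) as printed on the experimental-protocol slide.
P53 = (
    "MEEPQSDPSVEPPLSQETFSDLWKLLPENNVLSPLPSQAMDDLMLSPDDIEQWFTEDPGP"
    "DEAPRMPEAAPPVAPAPAAPTPAAPAPAPSWPLSSSVPSQKTYQGSYGFRLGFLHSGTAK"
    "SVTCTYSPALNKMFCQLAKTCPVQLWVDSTPPPGTRVRAMAIYKQSQHMTEVVRRCPHHE"
    "RCSDSDGLAPPQHLIRVEGNLRVEYLDDRNTFRHSVVVPYEPPEVGSDCTTIHYNYMCNS"
    "SCMGGMNRRPILTIITLEDSSGNLLGRNSFEVRVCACPGRDRRTEEENLRKKGEPHHELP"
    "PGSTKRALPNNTSSSPQPKKKPLDGEYFTLQIRGRERFEMFRELNEALELKDAQAGKEPG"
    "GSRAHSSHLKSKKGQSTSRHKKLMFKTEGPDSD"
)

# Reference residue, 1-based position, alternative residue (* = stop codon).
VARIANT = re.compile(r"([A-Z])(\d+)([A-Z*])")


def parse_variant(text):
    """Split short protein notation such as 'R175H' or 'T98*' into (ref, position, alt)."""
    match = VARIANT.fullmatch(text)
    if match is None:
        raise ValueError(f"unrecognized variant notation: {text!r}")
    ref, pos, alt = match.groups()
    return ref, int(pos), alt


def reference_matches(sequence, variant):
    """True if the sequence carries the variant's reference residue at its position."""
    ref, pos, _ = parse_variant(variant)
    return 1 <= pos <= len(sequence) and sequence[pos - 1] == ref


for v in ["R175H", "V143A", "K120R", "R273H", "G245S"]:
    print(v, reference_matches(P53, v))  # each line ends with True
```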
MUTPRED 2.0
Vikas Pejaver, Indiana University
Conservation, Funct. Prop., Sequence, Struct. Prop.
Neural networks
>sp|P04637|P53_HUMAN
MEEPQSDPSVEPPLSQETFSDLWKLLPENNVLSPLPSQAMDDLMLSPDDIEQWFTEDPGP
DEAPRMPEAAPRVAPAPAAPTPAAPAPAPSWPLSSSVPSQKTYQGSYGFRLGFLHSGTAK
SVTCTYSPALNKMFCQLAKTCPVQLWVDSTPPPGTRVRAMAIYKQSQHMTEVVRRCPHHE
RCSDSDGLAPPQHLIRVEGNLRVEYLDDRNTFRHSVVVPYEPPEVGSDCTTIHYNYMCNS
...
...
Physicochemical, Substitution matrices
Neural network ensemble:
• Z-score normalized and PCA’d
• 30 feed-forward networks
• bootstrap aggregating, balanced training
• trained using resilient propagation
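The ensemble recipe listed above (z-score normalization, bagging with class-balanced bootstrap samples, training with resilient propagation, averaging 30 models) can be sketched in plain Python. This is an illustration, not MutPred2's implementation: PCA is omitted, and each base model is a single sigmoid unit (a feed-forward network with no hidden layer) so that the iRprop− update stays short. The step-size factors 1.2 and 0.5 are the commonly used Rprop defaults; all function names are hypothetical.

```python
import math
import random


def zscore_fit(rows):
    """Per-feature mean and standard deviation (a zero std is replaced by 1)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) or 1.0 for j in range(d)]
    return means, stds


def zscore_apply(rows, means, stds):
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def output(w, x):
    """Unit output; the last weight is the bias."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + w[-1])


def train_unit_rprop(X, y, epochs=100, eta_plus=1.2, eta_minus=0.5,
                     step0=0.1, step_min=1e-6, step_max=50.0):
    """Train one sigmoid unit on cross-entropy with iRprop-: only the sign of the
    full-batch gradient is used, and each weight has its own adaptive step size."""
    d = len(X[0]) + 1
    w, steps, prev = [0.0] * d, [step0] * d, [0.0] * d
    for _ in range(epochs):
        grad = [0.0] * d
        for x, t in zip(X, y):
            err = output(w, x) - t  # d(cross-entropy)/d(unit input)
            for j, xj in enumerate(x):
                grad[j] += err * xj
            grad[-1] += err
        for j in range(d):
            if prev[j] * grad[j] > 0:
                steps[j] = min(steps[j] * eta_plus, step_max)
            elif prev[j] * grad[j] < 0:
                steps[j] = max(steps[j] * eta_minus, step_min)
                grad[j] = 0.0  # iRprop-: no update right after a sign change
            if grad[j] > 0:
                w[j] -= steps[j]
            elif grad[j] < 0:
                w[j] += steps[j]
            prev[j] = grad[j]
    return w


def balanced_bootstrap(X, y, rng):
    """Sample with replacement the same number of positives and negatives."""
    pos = [i for i, t in enumerate(y) if t == 1]
    neg = [i for i, t in enumerate(y) if t == 0]
    m = min(len(pos), len(neg))
    idx = [rng.choice(pos) for _ in range(m)] + [rng.choice(neg) for _ in range(m)]
    return [X[i] for i in idx], [y[i] for i in idx]


def train_ensemble(X, y, n_models=30, seed=0):
    """Z-score the features, train n_models units on balanced bootstrap samples,
    and return a predictor that averages their outputs (bagging)."""
    means, stds = zscore_fit(X)
    Xz = zscore_apply(X, means, stds)
    rng = random.Random(seed)
    models = [train_unit_rprop(*balanced_bootstrap(Xz, y, rng)) for _ in range(n_models)]

    def predict(x):
        xz = zscore_apply([x], means, stds)[0]
        return sum(output(w, xz) for w in models) / len(models)

    return predict


# Toy imbalanced data: 5 positives, 15 negatives, one feature.
X = [[2.0 + 0.1 * i] for i in range(5)] + [[-2.0 - 0.1 * i] for i in range(15)]
y = [1] * 5 + [0] * 15
predict = train_ensemble(X, y)
print(predict([5.0]), predict([-5.0]))
```

Balanced resampling keeps each base model from simply favoring the majority class, and averaging the 30 outputs smooths out the variance of the individual bootstrap fits.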
Catalytic residues of PCSK9 (2qtw)
• a member of the proteinase K subfamily of subtilases that reduces the number of LDL receptors in the liver through a posttranscriptional mechanism.
• D374Y leads to a 10-fold increase in catalytic activity that causes hypercholesterolemia.
Lagace et al. J Clin Invest, 116:2995 (2006). Xin et al. Bioinformatics, 26:1975-1982 (2010).
GAIN OF CATALYTIC ACTIVITY CAUSES DISEASE