Data mining for Dummies
Melanie Ganz-Benjaminsen, Assistant Professor
Neurobiology Research Unit, Copenhagen University Hospital/Rigshospitalet
Department of Computer Science, University of Copenhagen
Melanie Ganz-Benjaminsen, PhD
NRU, Copenhagen University Hospital, Rigshospitalet
Who am I?
• MSc in Physics
• PhD in CS
• PostDoc in USA
• PostDoc at RH
• Asst. Prof. at DIKU
Data mining
• a process for extracting usable information from a larger set of “raw” data
• works at a scale that greatly exceeds the data analysis you can do manually
Data science
From http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram
Machine learning
• think of machine learning as a means of building models of data
• mathematical models that help understand the data
• “learning” since there are parameters in the model that get tuned based on the available data
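To make "parameters that get tuned based on the available data" concrete, here is a minimal sketch that fits a straight line by least squares. The function name `fit_line` and the toy numbers are invented for illustration; the two tuned parameters are the slope and the intercept.

```python
def fit_line(xs, ys):
    """Tune slope and intercept of y = slope * x + intercept by least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both without the 1/n factor)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Invented toy data that roughly follows y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.0]

slope, intercept = fit_line(xs, ys)
print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```

The "learning" step is the call to `fit_line`: different data would give a different slope and intercept, while the form of the model (a straight line) stays fixed.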
Machine learning
• Supervised learning:
– Classification
– Regression
• Unsupervised learning:
– Dimensionality reduction
– Clustering
• Semi-supervised learning
Classification: Predicting discrete labels
Contextualization
People who suffered a stroke
Healthy controls
Models of existing data
Prediction on new data
Categorize/make risk profiles for new patients
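The two steps, building a model from existing data and then predicting for new patients, can be sketched with a nearest-centroid classifier. This is only one possible classifier, chosen here because it is short; the slides do not say which model was used. The (age, blood pressure) values are invented and are NOT real patient data, and the names `fit` and `predict` are hypothetical.

```python
import math

def centroid(points):
    """Mean of a list of equally long feature tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def fit(samples, labels):
    """Model of existing data: one centroid per class."""
    by_label = {}
    for x, y in zip(samples, labels):
        by_label.setdefault(y, []).append(x)
    return {label: centroid(pts) for label, pts in by_label.items()}

def predict(model, x):
    """Prediction on new data: pick the class with the closest centroid."""
    return min(model, key=lambda label: math.dist(x, model[label]))

# Invented (age, systolic blood pressure) pairs, for illustration only
train_x = [(72, 165), (68, 158), (80, 170), (45, 120), (38, 115), (52, 125)]
train_y = ["stroke", "stroke", "stroke", "control", "control", "control"]

model = fit(train_x, train_y)
new_patients = [(75, 160), (40, 118)]
predictions = [predict(model, p) for p in new_patients]
print(predictions)
```

A real risk profile would need far more data, validation on held-out patients, and probably a model that outputs probabilities rather than hard labels.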
Too easy?
• The benefit of the machine learning approach is that it generalizes to much larger datasets with many more dimensions!
• More dimensions? -> more variables, e.g. gender, family history, etc.
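As a sketch of what "more dimensions" means in practice, the k-nearest-neighbours classifier below works unchanged whether each patient has two variables or four. The coding of gender and family history as 0/1, the standardization step, and all the numbers are assumptions made for illustration; none of them come from real data.

```python
import math
from collections import Counter

def column_stats(rows):
    """Per-variable mean and standard deviation (std of 0 is replaced by 1)."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return means, stds

def scale(row, means, stds):
    """Put every variable on a comparable scale before measuring distances."""
    return tuple((v - m) / s for v, m, s in zip(row, means, stds))

def knn_predict(train_x, train_y, x, k=3):
    """Majority vote among the k training points closest to x."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], x))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Invented rows: (age, systolic blood pressure, gender 0/1, family history 0/1)
rows = [
    (72, 165, 1, 1), (68, 158, 0, 1), (80, 170, 1, 0), (77, 150, 0, 1),
    (45, 120, 0, 0), (38, 115, 1, 0), (52, 125, 0, 0), (41, 130, 1, 0),
]
labels = ["stroke"] * 4 + ["control"] * 4

means, stds = column_stats(rows)
train_x = [scale(r, means, stds) for r in rows]

new_patients = [(70, 160, 1, 1), (40, 118, 0, 0)]
predictions = [knn_predict(train_x, labels, scale(p, means, stds)) for p in new_patients]
print(predictions)
```

Adding a fifth or fiftieth variable only means adding a column to `rows`; the distance computation does not change.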
Real example - clustering
Images taken from Beliveau et al., JNS (2017)
Real example - clustering
Clustering results for K = 7 and K = 18
Images courtesy of Vincent Beliveau
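How the chosen number of clusters K changes the result can be sketched with plain k-means. This is an illustration only and not necessarily the algorithm used by Beliveau et al.; the 2-D points are invented, and the farthest-point initialization is an assumption chosen to keep the example deterministic.

```python
import math

def init_centroids(points, k):
    """Farthest-point initialization: start at the first point, then spread out."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids

def kmeans(points, k, n_iter=100):
    """Plain k-means (Lloyd's algorithm); returns (labels, centroids)."""
    centroids = init_centroids(points, k)
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels, centroids

# Invented 2-D points forming three loose groups
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
          (5.0, 5.0), (5.1, 4.9), (4.8, 5.2),
          (9.0, 1.0), (9.2, 0.9), (8.9, 1.2)]

labels_k2, _ = kmeans(points, k=2)
labels_k3, _ = kmeans(points, k=3)
print("K = 2:", labels_k2)
print("K = 3:", labels_k3)
```

The same data yields a coarser or finer partition depending on K, which is the effect the K = 7 and K = 18 panels show on the cortex. The code accepts points of any dimension, so 5-dimensional vertex data would be handled the same way.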
Recap Machine learning
• means building mathematical models that describe the relation between “input” and “output” data
– input can be age and blood pressure, and output stroke status
– or input can be 5-dimensional serotonin data at ca. 10,000 vertices of the cortex, and output the regions (as many as I choose) that the cortical data is clustered into
• BUT mathematical models can be limited and need to be appropriate for your data
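The last point, that a model must be appropriate for the data, can be shown with a small sketch: a straight line fitted to data that actually follows a parabola explains none of the variation, while the same fitting procedure on a squared input explains all of it. The data and the helper names `fit_line` and `r_squared` are invented for illustration.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    return slope, mean_y - slope * mean_x

def r_squared(xs, ys, slope, intercept):
    """Fraction of the variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Invented data following y = x^2 exactly
xs = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
ys = [x ** 2 for x in xs]

# Inappropriate model: a straight line in x
r2_line = r_squared(xs, ys, *fit_line(xs, ys))

# Appropriate model: a straight line in the transformed input x^2
xs_sq = [x ** 2 for x in xs]
r2_quadratic = r_squared(xs_sq, ys, *fit_line(xs_sq, ys))

print(f"R^2 with x:   {r2_line:.2f}")
print(f"R^2 with x^2: {r2_quadratic:.2f}")
```

Both fits use the same algorithm; only the assumed form of the relation differs, and that choice decides whether the model is useful.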
Bottom line?
High-dimensional clinical and epidemiological data & statistical models with computing power behind them (aka machine learning)
Personalized medicine?
KU Artificial Intelligence centre
• The Data Science Laboratory (DSL) acts as the entrance for researchers and students to the AI Centre.
• Its overall aim is to enhance the quality of data analyses in research carried out at SCIENCE.
Thank you for your attention!
Questions?
References
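• Brown, M.S., 2014. Data mining for dummies. John Wiley & Sons.
• VanderPlas, J., 2016. Python Data Science Handbook. O'Reilly Media. https://jakevdp.github.io/PythonDataScienceHandbook/
• Conway, D. The Data Science Venn Diagram. http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram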
• Beliveau, V., Ganz, M., Feng, L., Ozenne, B., Højgaard, L., Fisher, P.M., Svarer, C., Greve, D.N. and Knudsen, G.M., 2017. A high-resolution in vivo atlas of the human brain's serotonin system. Journal of Neuroscience, 37(1), pp.120-128.
• Data Science lab: https://datalab.science.ku.dk/english/