Upload
dansk-bibliotekscenter
View
992
Download
0
Embed Size (px)
Citation preview
U N I V E R S I T Y O F C O P E N H A G E N D E P A R T M E N T O F C O M P U T E R S C I E N C E
Faculty of Science
Machine Learning for Science and SocietySome recent results
Christian IgelDepartment of Computer Science
Slide 1/30
U N I V E R S I T Y O F C O P E N H A G E N D E P A R T M E N T O F C O M P U T E R S C I E N C E
Why machine learning?
• Sometimes problems cannot be solved in the traditional waybecause
• the designer’s knowledge is limited, and/or• the sheer complexity and variability precludes an
accurate description.
• However, large amounts of data describing the task are oftenavailable or can be automatically obtained – and machinelearning can solve the problem.
Slide 4/30 — Christian Igel — Machine Learning for Science and Society — [email protected]
U N I V E R S I T Y O F C O P E N H A G E N D E P A R T M E N T O F C O M P U T E R S C I E N C E
Why machine learning?
• Sometimes problems cannot be solved in the traditional waybecause
• the designer’s knowledge is limited, and/or• the sheer complexity and variability precludes an
accurate description.
• However, large amounts of data describing the task are oftenavailable or can be automatically obtained – and machinelearning can solve the problem.
Slide 5/30 — Christian Igel — Machine Learning for Science and Society — [email protected]
Machine learning turns data into knowledge
Machine learning automates the process ofinductive inference:
1 Observe a phenomenon
2 Construct a model of that phenomenon
3 Make predictions using this model
U N I V E R S I T Y O F C O P E N H A G E N D E P A R T M E N T O F C O M P U T E R S C I E N C E
Alzheimer’s Disease progression
Disease progression
Cells
healthy neurons formation of A� plaques
and NFTsneuronal death
NFTs
A� plaques
MRI
structure normal size/shape normal size/shape abnormal size/shape
(atrophy)
MRI
intensity normal variation abnormal variation abnormal variation
Sca
le
Slide 7/30 — Christian Igel — Machine Learning for Science and Society — [email protected]
U N I V E R S I T Y O F C O P E N H A G E N D E P A R T M E N T O F C O M P U T E R S C I E N C E
Hippocampal texture from structural MRI
• Image analysis of brain scans• Segmentation of brain regions
(hippocampi)• Extraction of image features
• Classification using SVMs
Sørensen, Igel, Hansen, Osler, Lauritzen, Rostrup, Nielsen. Early detection of Alzheimer’s disease using MRI hippocampaltexture. Human Brain Mapping, 2016
Slide 8/30 — Christian Igel — Machine Learning for Science and Society — [email protected]
U N I V E R S I T Y O F C O P E N H A G E N D E P A R T M E N T O F C O M P U T E R S C I E N C E
CADDementia Challenge
• Computer-Aided Diagnosis of Dementia based on structuralMRI data (CADDementia) competition
• Goal: Diagnosis into Alzheimer’s disease (AD), mild cognitiveimpairment (MCI), and normal controls (NC)
• No labelled training data was provided, we trained on freelyavailable data sets.
• We combined several biomarkers (cortical thicknessmeasurements, hippocampal shape, hippocampal texture,and volumetric measurements).
Sørensen, Pai, Anker, Balas, Lillholm, Igel, Nielsen. Dementia Diagnosis using MRI Cortical Thickness, Shape, Texture,and Volumetry. MICCAI 2014 – Challenge on Computer-Aided Diagnosis of Dementia Based on Structural MRI Data, 2014
Bron et al. Standardized evaluation of algorithms for computer-aided diagnosis of dementia based on structural MRI: TheCADDementia challenge. NeuroImage, 2015
Slide 9/30 — Christian Igel — Machine Learning for Science and Society — [email protected]
U N I V E R S I T Y O F C O P E N H A G E N D E P A R T M E N T O F C O M P U T E R S C I E N C E
CADDementia: Results
Slide 10/30 — Christian Igel — Machine Learning for Science and Society — [email protected]
• Our biomarker outperforms the state-of-the-art.
• The texture score remains significant if combined withother biomarkers.
• The biomarker generalizes across patient populations.
U N I V E R S I T Y O F C O P E N H A G E N D E P A R T M E N T O F C O M P U T E R S C I E N C E
Deep learning
• Deep learning refers to ML architecturescomposed of multiple levels of non-lineartransformations.
• Idea: Extracting more and more abstractfeatures from input data, learning moreabstract representations.
• Representations can be learned in asupervised and/or unsupervised manner.
• Example: Convolutional neural networks(CNNs) are popular deep learningarchitectures.LeCun, Bottou, Bengio, Haffner. Gradient-based learning applied todocument recognition. Proceedings of the IEEE, 1998
Bengio. Learning Deep Architecturesfor AI. Foundations and Trends in Ma-chine Learning 2(1): 1–127, 2009
Slide 12/30 — Christian Igel — Machine Learning for Science and Society — [email protected]
U N I V E R S I T Y O F C O P E N H A G E N D E P A R T M E N T O F C O M P U T E R S C I E N C E
Breast cancer
• Breast cancer is a frequent cause of death among women
• Screening programs halve the risk of death1 and reducemortality by 28-36 %2
• Still: 33 % of cancers are missed,3 70 % of referrals are falsepositives,4 and 25 % of cancers could have been detectedearlier5
• Goal: More accurate image-based biomarkers allowingpersonalized breast cancer screening
1Otto et al. Cancer Epidemiol Biomarkers Prev 21, 20122Broeders et al. J Med Screen 19, 20123Karssemeijer et al. Radiology 227, 20034Yankaskas et al. Am J Roentgenol 177, 20015Timmers et al. Eur J Public Health, 2012
Slide 13/30 — Christian Igel — Machine Learning for Science and Society — [email protected]
U N I V E R S I T Y O F C O P E N H A G E N D E P A R T M E N T O F C O M P U T E R S C I E N C E
Breast density scoringBreast density is related to breast cancer risk, the risk of missingbreast cancer, and the risk of false positive referral.
input manual CNN
Current work: Better biomarkers using CNN density scores andCNN texture scoresPetersen, Nielsen, Diao, Karssemeijer, Lillholm. Breast Tissue Segmentation and Mammographic Risk Scoring Using DeepLearning. Digital Mammography / IWDM, 2014
Slide 14/30 — Christian Igel — Machine Learning for Science and Society — [email protected]
U N I V E R S I T Y O F C O P E N H A G E N D E P A R T M E N T O F C O M P U T E R S C I E N C E
Knee cartilage segmentation
• Cartilage segmentation in knee MRI is the method of choicefor quantifying cartilage deterioration.
• Cartilage deterioration implies osteoarthritis, one of the majorreasons for work disability in the western world.
input manual CNNPrasoon, Petersen, Igel, Lauze, Dam, Nielsen. Deep Feature Learning for Knee Cartilage Segmentation Using a TriplanarConvolutional Neural Network. Medical Image Computing and Computer Assisted Intervention (MICCAI), LNCS 8150, 2013
Slide 16/30 — Christian Igel — Machine Learning for Science and Society — [email protected]
U N I V E R S I T Y O F C O P E N H A G E N D E P A R T M E N T O F C O M P U T E R S C I E N C E
Nuclear test ban treaty verification
• CTBTO (Peparatory Commission for the ComprehensiveNuclear-Test-Ban Treaty Organisation) watches out fornuclear weapon tests
• Verification of the comprehensive nuclear-test-ban treaty
• 15 GB of data every day from world-wide sensor network
• Data analysis must be partly automatic
Tuma et al. Integrated optimization of long-range underwater signal detection, feature ex-traction, and classification for nuclear treaty monitoring. IEEE Transactions on Geoscienceand Remote Sensing, to appear
Slide 18/30 — Christian Igel — Machine Learning for Science and Society — [email protected]
U N I V E R S I T Y O F C O P E N H A G E N D E P A R T M E N T O F C O M P U T E R S C I E N C E
Typical tasks in astronomy
Slide 20/30 — Christian Igel — Machine Learning for Science and Society — [email protected]
“Can you group all detected objects into classes?For instance, into stars, galaxies, and other objects. . . ’ ?’
“Can you find weird objects (which look differentfrom other ones)?”
“Can you find weird objects (which look differentfrom other ones)?”
“I am interested in these distant galaxies. Can youfind more of them? Can you find the most distantones?”
U N I V E R S I T Y O F C O P E N H A G E N D E P A R T M E N T O F C O M P U T E R S C I E N C E
Big data
Today’s and future projects
• Sloan Digital Sky Survey (SDSS) = 50TB in total
• Large Synoptic Survey Telescope (LSST) = 30TB per night
• Low Frequency Array (LOFAR) = 250TB per night
• Square Kilometer Array (SKA) = 250TB per second
Huge amounts of photometric data, however, detailedspectroscopic follow-ups are expensive.
Slide 21/30 — Christian Igel — Machine Learning for Science and Society — [email protected]
U N I V E R S I T Y O F C O P E N H A G E N D E P A R T M E N T O F C O M P U T E R S C I E N C E
Specific star formation rate
• Goal: Predict specific star formation rate based onphotometric data
• Approach: Machine learning (ML) augmenting standardfeatures with image texture features
Slide 22/30 — Christian Igel — Machine Learning for Science and Society — [email protected]
U N I V E R S I T Y O F C O P E N H A G E N D E P A R T M E N T O F C O M P U T E R S C I E N C E
Texture features
Original image Scale-space Curvedness Shape Index Shape Indexweighted by
Curvedness
Slide 23/30 — Christian Igel — Machine Learning for Science and Society — [email protected]
U N I V E R S I T Y O F C O P E N H A G E N D E P A R T M E N T O F C O M P U T E R S C I E N C E
Predicting the specific star formation rate
• ML achieves higher accuracy than physical modelling• New data structures and GPU programming allow to process
huge amounts of data
Stensbo-Smidt, Igel, Zirm, Steenstrup Pedersen. Nearest Neighbour Regression Outperforms Model-based Prediction ofSpecific Star Formation Rate. IEEE Big Data 2013, 2013
Steenstrup Pedersen, Stensbo-Smidt, Zirm, Igel. Shape Index Descriptors Applied to Texture-Based Galaxy Analysis.International Conference on Computer Vision (ICCV), 2013
Gieseke, Heinermann, Oancea, Igel. Buffer k-d Trees: Processing Massive Nearest Neighbor Queries on GPUs. InternationalConference on Machine Learning (ICML), 2014
Slide 24/30 — Christian Igel — Machine Learning for Science and Society — [email protected]
U N I V E R S I T Y O F C O P E N H A G E N D E P A R T M E N T O F C O M P U T E R S C I E N C E
Outline
1 Machine Learning
2 MedicineDiagnosis and Prognosis of Alzheimer’s DiseaseAnalysis of MammogramsKnee Cartilage Segmentation
3 Nuclear-Test-Ban Treaty Monitoring
4 Astronomy
5 End Titles
Slide 26/30 — Christian Igel — Machine Learning for Science and Society — [email protected]
U N I V E R S I T Y O F C O P E N H A G E N D E P A R T M E N T O F C O M P U T E R S C I E N C E
Shark
image.diku.dk/sharkIgel, Glasmachers, Heidrich-Meisner: Shark, Journal of Machine Learning Research 9:993–996, 2008
Gold Prize at Open Source Software Wold Challenge 2011
Slide 27/30 — Christian Igel — Machine Learning for Science and Society — [email protected]
U N I V E R S I T Y O F C O P E N H A G E N D E P A R T M E N T O F C O M P U T E R S C I E N C E
Buffer k-d trees
http://bufferkdtree.readthedocs.orgGieseke, Heinermann, Oancea, Igel. Buffer k-d Trees: Processing Massive Nearest Neighbor Queries on GPUs. International
Conference on Machine Learning (ICML), 2014
Slide 28/30 — Christian Igel — Machine Learning for Science and Society — [email protected]
U N I V E R S I T Y O F C O P E N H A G E N D E P A R T M E N T O F C O M P U T E R S C I E N C E
Industrial Data Analysis Service (IDAS)
Slide 29/30 — Christian Igel — Machine Learning for Science and Society — [email protected]
U N I V E R S I T Y O F C O P E N H A G E N D E P A R T M E N T O F C O M P U T E R S C I E N C E
Thanks
Machine Learning Labhttp://image.diku.dk/MLLab
Image Sectionhttp://www.diku.dk/english/research/imagesection
Special thanks: Sune Darkner, Fabian Gieseke, Mads Nielsen,Cosmin Oancea, Akshay Pai, Kim Steenstrup Pedersen, KaiLars Polsterer, Lauge Sørensen, Kristoffer Stensbo-Smidt
Slide 30/30 — Christian Igel — Machine Learning for Science and Society — [email protected]