22
UNIVERSITY OF COPENHAGEN DEPARTMENT OF COMPUTER SCIENCE Faculty of Science Machine Learning for Science and Society Some recent results Christian Igel Department of Computer Science [email protected] Slide 1/30

Machine learning for Science and Society

Embed Size (px)

Citation preview

U N I V E R S I T Y O F C O P E N H A G E N D E P A R T M E N T O F C O M P U T E R S C I E N C E

Faculty of Science

Machine Learning for Science and SocietySome recent results

Christian IgelDepartment of Computer Science

[email protected]

Slide 1/30

U N I V E R S I T Y O F C O P E N H A G E N D E P A R T M E N T O F C O M P U T E R S C I E N C E

Why machine learning?

• Sometimes problems cannot be solved in the traditional waybecause

• the designer’s knowledge is limited, and/or• the sheer complexity and variability precludes an

accurate description.

• However, large amounts of data describing the task are oftenavailable or can be automatically obtained – and machinelearning can solve the problem.

Slide 4/30 — Christian Igel — Machine Learning for Science and Society — [email protected]

U N I V E R S I T Y O F C O P E N H A G E N D E P A R T M E N T O F C O M P U T E R S C I E N C E

Why machine learning?

• Sometimes problems cannot be solved in the traditional waybecause

• the designer’s knowledge is limited, and/or• the sheer complexity and variability precludes an

accurate description.

• However, large amounts of data describing the task are oftenavailable or can be automatically obtained – and machinelearning can solve the problem.

Slide 5/30 — Christian Igel — Machine Learning for Science and Society — [email protected]

Machine learning turns data into knowledge

Machine learning automates the process ofinductive inference:

1 Observe a phenomenon

2 Construct a model of that phenomenon

3 Make predictions using this model

U N I V E R S I T Y O F C O P E N H A G E N D E P A R T M E N T O F C O M P U T E R S C I E N C E

Alzheimer’s Disease progression

Disease progression

Cells

healthy neurons formation of A� plaques

and NFTsneuronal death

NFTs

A� plaques

MRI

structure normal size/shape normal size/shape abnormal size/shape

(atrophy)

MRI

intensity normal variation abnormal variation abnormal variation

Sca

le

Slide 7/30 — Christian Igel — Machine Learning for Science and Society — [email protected]

U N I V E R S I T Y O F C O P E N H A G E N D E P A R T M E N T O F C O M P U T E R S C I E N C E

Hippocampal texture from structural MRI

• Image analysis of brain scans• Segmentation of brain regions

(hippocampi)• Extraction of image features

• Classification using SVMs

Sørensen, Igel, Hansen, Osler, Lauritzen, Rostrup, Nielsen. Early detection of Alzheimer’s disease using MRI hippocampaltexture. Human Brain Mapping, 2016

Slide 8/30 — Christian Igel — Machine Learning for Science and Society — [email protected]

U N I V E R S I T Y O F C O P E N H A G E N D E P A R T M E N T O F C O M P U T E R S C I E N C E

CADDementia Challenge

• Computer-Aided Diagnosis of Dementia based on structuralMRI data (CADDementia) competition

• Goal: Diagnosis into Alzheimer’s disease (AD), mild cognitiveimpairment (MCI), and normal controls (NC)

• No labelled training data was provided, we trained on freelyavailable data sets.

• We combined several biomarkers (cortical thicknessmeasurements, hippocampal shape, hippocampal texture,and volumetric measurements).

Sørensen, Pai, Anker, Balas, Lillholm, Igel, Nielsen. Dementia Diagnosis using MRI Cortical Thickness, Shape, Texture,and Volumetry. MICCAI 2014 – Challenge on Computer-Aided Diagnosis of Dementia Based on Structural MRI Data, 2014

Bron et al. Standardized evaluation of algorithms for computer-aided diagnosis of dementia based on structural MRI: TheCADDementia challenge. NeuroImage, 2015

Slide 9/30 — Christian Igel — Machine Learning for Science and Society — [email protected]

U N I V E R S I T Y O F C O P E N H A G E N D E P A R T M E N T O F C O M P U T E R S C I E N C E

CADDementia: Results

Slide 10/30 — Christian Igel — Machine Learning for Science and Society — [email protected]

• Our biomarker outperforms the state-of-the-art.

• The texture score remains significant if combined withother biomarkers.

• The biomarker generalizes across patient populations.

U N I V E R S I T Y O F C O P E N H A G E N D E P A R T M E N T O F C O M P U T E R S C I E N C E

Deep learning

• Deep learning refers to ML architecturescomposed of multiple levels of non-lineartransformations.

• Idea: Extracting more and more abstractfeatures from input data, learning moreabstract representations.

• Representations can be learned in asupervised and/or unsupervised manner.

• Example: Convolutional neural networks(CNNs) are popular deep learningarchitectures.LeCun, Bottou, Bengio, Haffner. Gradient-based learning applied todocument recognition. Proceedings of the IEEE, 1998

Bengio. Learning Deep Architecturesfor AI. Foundations and Trends in Ma-chine Learning 2(1): 1–127, 2009

Slide 12/30 — Christian Igel — Machine Learning for Science and Society — [email protected]

U N I V E R S I T Y O F C O P E N H A G E N D E P A R T M E N T O F C O M P U T E R S C I E N C E

Breast cancer

• Breast cancer is a frequent cause of death among women

• Screening programs halve the risk of death1 and reducemortality by 28-36 %2

• Still: 33 % of cancers are missed,3 70 % of referrals are falsepositives,4 and 25 % of cancers could have been detectedearlier5

• Goal: More accurate image-based biomarkers allowingpersonalized breast cancer screening

1Otto et al. Cancer Epidemiol Biomarkers Prev 21, 20122Broeders et al. J Med Screen 19, 20123Karssemeijer et al. Radiology 227, 20034Yankaskas et al. Am J Roentgenol 177, 20015Timmers et al. Eur J Public Health, 2012

Slide 13/30 — Christian Igel — Machine Learning for Science and Society — [email protected]

U N I V E R S I T Y O F C O P E N H A G E N D E P A R T M E N T O F C O M P U T E R S C I E N C E

Breast density scoringBreast density is related to breast cancer risk, the risk of missingbreast cancer, and the risk of false positive referral.

input manual CNN

Current work: Better biomarkers using CNN density scores andCNN texture scoresPetersen, Nielsen, Diao, Karssemeijer, Lillholm. Breast Tissue Segmentation and Mammographic Risk Scoring Using DeepLearning. Digital Mammography / IWDM, 2014

Slide 14/30 — Christian Igel — Machine Learning for Science and Society — [email protected]

U N I V E R S I T Y O F C O P E N H A G E N D E P A R T M E N T O F C O M P U T E R S C I E N C E

Knee cartilage segmentation

• Cartilage segmentation in knee MRI is the method of choicefor quantifying cartilage deterioration.

• Cartilage deterioration implies osteoarthritis, one of the majorreasons for work disability in the western world.

input manual CNNPrasoon, Petersen, Igel, Lauze, Dam, Nielsen. Deep Feature Learning for Knee Cartilage Segmentation Using a TriplanarConvolutional Neural Network. Medical Image Computing and Computer Assisted Intervention (MICCAI), LNCS 8150, 2013

Slide 16/30 — Christian Igel — Machine Learning for Science and Society — [email protected]

U N I V E R S I T Y O F C O P E N H A G E N D E P A R T M E N T O F C O M P U T E R S C I E N C E

Nuclear test ban treaty verification

• CTBTO (Peparatory Commission for the ComprehensiveNuclear-Test-Ban Treaty Organisation) watches out fornuclear weapon tests

• Verification of the comprehensive nuclear-test-ban treaty

• 15 GB of data every day from world-wide sensor network

• Data analysis must be partly automatic

Tuma et al. Integrated optimization of long-range underwater signal detection, feature ex-traction, and classification for nuclear treaty monitoring. IEEE Transactions on Geoscienceand Remote Sensing, to appear

Slide 18/30 — Christian Igel — Machine Learning for Science and Society — [email protected]

U N I V E R S I T Y O F C O P E N H A G E N D E P A R T M E N T O F C O M P U T E R S C I E N C E

Typical tasks in astronomy

Slide 20/30 — Christian Igel — Machine Learning for Science and Society — [email protected]

“Can you group all detected objects into classes?For instance, into stars, galaxies, and other objects. . . ’ ?’

“Can you find weird objects (which look differentfrom other ones)?”

“Can you find weird objects (which look differentfrom other ones)?”

“I am interested in these distant galaxies. Can youfind more of them? Can you find the most distantones?”

U N I V E R S I T Y O F C O P E N H A G E N D E P A R T M E N T O F C O M P U T E R S C I E N C E

Big data

Today’s and future projects

• Sloan Digital Sky Survey (SDSS) = 50TB in total

• Large Synoptic Survey Telescope (LSST) = 30TB per night

• Low Frequency Array (LOFAR) = 250TB per night

• Square Kilometer Array (SKA) = 250TB per second

Huge amounts of photometric data, however, detailedspectroscopic follow-ups are expensive.

Slide 21/30 — Christian Igel — Machine Learning for Science and Society — [email protected]

U N I V E R S I T Y O F C O P E N H A G E N D E P A R T M E N T O F C O M P U T E R S C I E N C E

Specific star formation rate

• Goal: Predict specific star formation rate based onphotometric data

• Approach: Machine learning (ML) augmenting standardfeatures with image texture features

Slide 22/30 — Christian Igel — Machine Learning for Science and Society — [email protected]

U N I V E R S I T Y O F C O P E N H A G E N D E P A R T M E N T O F C O M P U T E R S C I E N C E

Texture features

Original image Scale-space Curvedness Shape Index Shape Indexweighted by

Curvedness

Slide 23/30 — Christian Igel — Machine Learning for Science and Society — [email protected]

U N I V E R S I T Y O F C O P E N H A G E N D E P A R T M E N T O F C O M P U T E R S C I E N C E

Predicting the specific star formation rate

• ML achieves higher accuracy than physical modelling• New data structures and GPU programming allow to process

huge amounts of data

Stensbo-Smidt, Igel, Zirm, Steenstrup Pedersen. Nearest Neighbour Regression Outperforms Model-based Prediction ofSpecific Star Formation Rate. IEEE Big Data 2013, 2013

Steenstrup Pedersen, Stensbo-Smidt, Zirm, Igel. Shape Index Descriptors Applied to Texture-Based Galaxy Analysis.International Conference on Computer Vision (ICCV), 2013

Gieseke, Heinermann, Oancea, Igel. Buffer k-d Trees: Processing Massive Nearest Neighbor Queries on GPUs. InternationalConference on Machine Learning (ICML), 2014

Slide 24/30 — Christian Igel — Machine Learning for Science and Society — [email protected]

U N I V E R S I T Y O F C O P E N H A G E N D E P A R T M E N T O F C O M P U T E R S C I E N C E

Outline

1 Machine Learning

2 MedicineDiagnosis and Prognosis of Alzheimer’s DiseaseAnalysis of MammogramsKnee Cartilage Segmentation

3 Nuclear-Test-Ban Treaty Monitoring

4 Astronomy

5 End Titles

Slide 26/30 — Christian Igel — Machine Learning for Science and Society — [email protected]

U N I V E R S I T Y O F C O P E N H A G E N D E P A R T M E N T O F C O M P U T E R S C I E N C E

Shark

image.diku.dk/sharkIgel, Glasmachers, Heidrich-Meisner: Shark, Journal of Machine Learning Research 9:993–996, 2008

Gold Prize at Open Source Software Wold Challenge 2011

Slide 27/30 — Christian Igel — Machine Learning for Science and Society — [email protected]

U N I V E R S I T Y O F C O P E N H A G E N D E P A R T M E N T O F C O M P U T E R S C I E N C E

Buffer k-d trees

http://bufferkdtree.readthedocs.orgGieseke, Heinermann, Oancea, Igel. Buffer k-d Trees: Processing Massive Nearest Neighbor Queries on GPUs. International

Conference on Machine Learning (ICML), 2014

Slide 28/30 — Christian Igel — Machine Learning for Science and Society — [email protected]

U N I V E R S I T Y O F C O P E N H A G E N D E P A R T M E N T O F C O M P U T E R S C I E N C E

Industrial Data Analysis Service (IDAS)

Slide 29/30 — Christian Igel — Machine Learning for Science and Society — [email protected]

U N I V E R S I T Y O F C O P E N H A G E N D E P A R T M E N T O F C O M P U T E R S C I E N C E

Thanks

Machine Learning Labhttp://image.diku.dk/MLLab

Image Sectionhttp://www.diku.dk/english/research/imagesection

Special thanks: Sune Darkner, Fabian Gieseke, Mads Nielsen,Cosmin Oancea, Akshay Pai, Kim Steenstrup Pedersen, KaiLars Polsterer, Lauge Sørensen, Kristoffer Stensbo-Smidt

Slide 30/30 — Christian Igel — Machine Learning for Science and Society — [email protected]