Big Data and Machine Learning Klaus-Robert Müller et al.

Big Data and Machine Learning - TU Berlin...Some Remarks •Machine Learning •small data (expensive!) •big data •big data in neuroscience: BCI et al. •social media data •physics


Big Data and Machine Learning

Klaus-Robert Müller et al.


Some Remarks

• Machine Learning

• small data (expensive!)

• big data

• big data in neuroscience: BCI et al.

• social media data

• physics & materials


Toward Brain Computer Interfacing

Klaus-Robert Müller, Siamac Fazli, Jan Mehnert, Stefan Haufe, Frank Meinecke, Paul von Bünau, Franz Kiraly, Felix Biessmann, Sven Dähne, Johannes Höhne, Michael Tangermann, Carmen Vidaurre, Gabriel Curio, Benjamin Blankertz et al.


Invasive BCI at its best

[From Schwartz]

Remark: 24*1000*3600*30000 ~ 2 TB/day


Noninvasive Brain-Computer Interface

DECODING


BCI for communication


‚Brain Pong‘ with BBCI

Remark: 3*100*3600*1000 ~ 1-2 GB/experiment
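
The two data-volume remarks (invasive recording and the Brain Pong experiment) can be checked with a few lines of arithmetic. The sketch below assumes the factors mean hours × channels × seconds per hour × sampling rate in Hz, stored at one byte per sample. The slides do not label the factors, so this reading, the byte width, and the helper name `raw_bytes` are assumptions.

```python
def raw_bytes(hours, channels, sampling_rate_hz, bytes_per_sample=1):
    """Raw data volume in bytes for a continuous multichannel recording."""
    seconds = hours * 3600
    return seconds * sampling_rate_hz * channels * bytes_per_sample


# Assumed reading of 24*1000*3600*30000: one day, 1000 channels, 30 kHz.
invasive_per_day = raw_bytes(hours=24, channels=1000, sampling_rate_hz=30000)

# Assumed reading of 3*100*3600*1000: three hours, 100 channels, 1 kHz.
brain_pong = raw_bytes(hours=3, channels=100, sampling_rate_hz=1000)

print(f"invasive: {invasive_per_day / 1e12:.2f} TB/day")
print(f"Brain Pong: {brain_pong / 1e9:.2f} GB per experiment")
```

Under the one-byte assumption the products come out slightly above 2 TB and 1 GB; storing two bytes per sample would double both, which is consistent with the "1-2" range quoted for the experiment.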


BBCI paradigms

- healthy subjects untrained for BCI

A: training <10min: right/left hand imagined movements

→ infer the respective brain activities (ML & SP)

B: online feedback session

Leitmotiv: ›let the machines learn‹


Machine learning approach to BCI: infer prototypical pattern

Inference by CSP Algorithm
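
The slide names the CSP (common spatial patterns) algorithm without showing its formula. As a rough illustration, the sketch below uses one common textbook formulation: average the trace-normalized trial covariances per class, whiten their sum, and eigendecompose the whitened covariance of one class. The function names, the synthetic data, and the log-variance features are assumptions for illustration, not the BBCI implementation.

```python
import numpy as np


def csp_filters(trials_a, trials_b, n_pairs=1):
    """Spatial filters from two classes of trials.

    trials_*: arrays of shape (n_trials, n_channels, n_samples).
    Returns (filters, eigvals); each row of filters is one spatial filter.
    """
    def mean_cov(trials):
        covs = [x @ x.T / np.trace(x @ x.T) for x in trials]
        return np.mean(covs, axis=0)

    cov_a, cov_b = mean_cov(trials_a), mean_cov(trials_b)

    # Whiten the composite covariance so that cov_a + cov_b becomes identity.
    evals, evecs = np.linalg.eigh(cov_a + cov_b)
    whitener = evecs @ np.diag(evals ** -0.5) @ evecs.T

    # Eigenvalues d lie in [0, 1]: the share of variance owned by class A.
    d, u = np.linalg.eigh(whitener @ cov_a @ whitener.T)
    filters = u.T @ whitener

    # Keep the filters from both ends of the spectrum (most discriminative).
    picks = np.r_[np.arange(n_pairs), np.arange(len(d) - n_pairs, len(d))]
    return filters[picks], d[picks]


def log_var_features(trials, filters):
    """Normalized log-variance of each filtered trial."""
    projected = np.einsum("fc,tcs->tfs", filters, trials)
    var = projected.var(axis=2)
    return np.log(var / var.sum(axis=1, keepdims=True))


# Synthetic stand-in for left/right imagined movements (not real EEG).
rng = np.random.default_rng(0)
scale_a = np.array([3.0, 1.0, 1.0, 1.0])[:, None]
scale_b = np.array([1.0, 1.0, 1.0, 3.0])[:, None]
trials_a = rng.standard_normal((30, 4, 200)) * scale_a
trials_b = rng.standard_normal((30, 4, 200)) * scale_b

filters, eigvals = csp_filters(trials_a, trials_b)
feats_a = log_var_features(trials_a, filters)
feats_b = log_var_features(trials_b, filters)
print(eigvals)
```

The resulting two-dimensional features could then be passed to any linear classifier; that step is left out here.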


The cerebral cocktail party problem

• use ICA/NGCA projections for artifact and noise removal

• feature extraction and selection

[cf. Ziehe et al. 2000, Blanchard et al. 2006]


BBCI Set-up

Artifact removal

[cf. Müller et al. 2001, 2007, 2008, Dornhege et al. 2003, 2007, Blankertz et al. 2004, 2005, 2006, 2007, 2008]


Shifting distributions within experiment


Correlating apples and oranges

[Biessmann et al. Neuroimage 2012, Machine Learning 2010]


Temporal Dynamics of Web Data


Motivation

[Biessmann et al, 2012, and submitted]


Canonical Trend Analysis for Social Networks


Data Extraction


Data Extraction: Retweet Location


Mean Location of Retweeted News Articles


Downsampling of Geographic Information


Canonical Trend Model


Why project onto the canonical subspace?

Recent development: tkCCA makes it possible to correlate data optimally and nonlinearly over time

[Biessmann et al 2010]
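
To make the idea of a canonical subspace concrete, here is a minimal sketch of plain linear CCA, which finds one projection per data set such that the projected signals are maximally correlated. This is not tkCCA and not the canonical trend method of the slides: there is no kernel and no time lag. The function name `linear_cca`, the small regularizer `reg`, and the synthetic data are assumptions for illustration.

```python
import numpy as np


def linear_cca(x, y, reg=1e-6):
    """Linear CCA via whitening and SVD.

    x: (n_samples, dx), y: (n_samples, dy).
    Returns projection matrices wx, wy and the canonical correlations.
    """
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    n = x.shape[0]
    cxx = x.T @ x / (n - 1) + reg * np.eye(x.shape[1])
    cyy = y.T @ y / (n - 1) + reg * np.eye(y.shape[1])
    cxy = x.T @ y / (n - 1)

    def inv_sqrt(c):
        vals, vecs = np.linalg.eigh(c)
        return vecs @ np.diag(vals ** -0.5) @ vecs.T

    kx, ky = inv_sqrt(cxx), inv_sqrt(cyy)
    # Singular values of the whitened cross-covariance are the correlations.
    u, s, vt = np.linalg.svd(kx @ cxy @ ky, full_matrices=False)
    return kx @ u, ky @ vt.T, s


# Two synthetic "views" that share a single latent trend plus noise.
rng = np.random.default_rng(0)
latent = rng.standard_normal(500)
x = np.outer(latent, [1.0, 0.5, 0.0]) + 0.3 * rng.standard_normal((500, 3))
y = np.outer(latent, [0.0, 1.0]) + 0.3 * rng.standard_normal((500, 2))

wx, wy, corrs = linear_cca(x, y)
trend_x = x @ wx[:, 0]  # first canonical component of x
trend_y = y @ wy[:, 0]  # first canonical component of y
print(corrs)
```

Only the first canonical pair recovers the shared trend here; the second correlation stays near zero because the two views have nothing else in common.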


Canonical Trend Analysis



Efficient Computation of Canonical Trends

[Schölkopf, Smola & Müller 98; Boser, Guyon & Vapnik 92]



Comparisons: Mean, PCA and Canonical Trends



Canonical Convolution


Spatiotemporal Analysis of Retweets of News


And now for something completely different

[Montavon et al 13, Rupp et al 2012 ….]


ML4Physics @ IPAM 2011

Klaus-Robert Müller, Matthias Rupp, Anatole von Lilienfeld and Alexandre Tkatchenko et al.


Machine Learning for chemical compound space

Ansatz:

instead of

[from von Lilienfeld]


Machine Learning for chemical compound space

Ansatz:

– Provide the same information to ML as to the Schrödinger equation (SE):

• XYZ-file

• cast data similarly as in the SE:

– Unique and continuous in all of chemical compound space (CCS)

– Translationally, rotationally, permutationally invariant

– Symmetrical atoms contribute equally

→ "Coulomb" matrix [energy]

→ fill up with zeros for smaller molecules

→ diagonalize OR sort rows according to their norm

→ measure distance between molecules

[from von Lilienfeld]


Coulomb representation of molecules

Coulomb Matrix (Rupp12):

$M_{ii} = 0.5\, Z_i^{2.4}$

$M_{ij} = \dfrac{Z_i Z_j}{|R_i - R_j|}$ for $i \neq j$

Atoms: {Z1, R1}, {Z2, R2}, {Z3, R3}, {Z4, R4}, ... + phantom atoms {0, R21}, {0, R22}, {0, R23}

$M \in \mathbb{R}^{23 \times 23}$
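
The Coulomb matrix construction, the zero padding with phantom atoms, and the row-norm sorting can be sketched in a few lines. The sketch assumes the definition from Rupp et al. 2012 (diagonal 0.5·Z^2.4, off-diagonal Z_i·Z_j divided by the interatomic distance) with positions in atomic units. The function names and the three-atom example geometry are made up for illustration and are not taken from the data set.

```python
import numpy as np


def coulomb_matrix(charges, positions, size):
    """Zero-padded Coulomb matrix for one molecule.

    charges: nuclear charges Z_i; positions: (n_atoms, 3) coordinates R_i.
    size: padded dimension (number of atoms in the largest molecule).
    """
    z = np.asarray(charges, dtype=float)
    r = np.asarray(positions, dtype=float)
    n = len(z)
    if n > size:
        raise ValueError("molecule has more atoms than the padded size")
    dist = np.linalg.norm(r[:, None, :] - r[None, :, :], axis=-1)
    np.fill_diagonal(dist, 1.0)  # placeholder, the diagonal is overwritten below
    block = np.outer(z, z) / dist
    np.fill_diagonal(block, 0.5 * z ** 2.4)
    m = np.zeros((size, size))  # remaining rows/columns act as phantom atoms
    m[:n, :n] = block
    return m


def sort_by_row_norm(m):
    """Reorder rows and columns by descending row norm."""
    order = np.argsort(-np.linalg.norm(m, axis=1))
    return m[order][:, order]


# Illustrative linear three-atom molecule (coordinates are made up).
charges = [1, 6, 7]
positions = [[0.0, 0.0, 0.0], [0.0, 0.0, 2.0], [0.0, 0.0, 4.2]]
m = sort_by_row_norm(coulomb_matrix(charges, positions, size=5))

# The same molecule with its atoms listed in a different order.
perm = [2, 0, 1]
m_perm = sort_by_row_norm(
    coulomb_matrix([charges[i] for i in perm], [positions[i] for i in perm], size=5)
)
print(np.allclose(m, m_perm))
```

Sorting is one way to make the representation independent of atom ordering; the alternative named on the slides, diagonalizing and using the eigenvalue spectrum, is not shown here.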


Kernel ridge regression

Distances between M define Gaussian kernel matrix K

Predict energy as sum over weighted Gaussians

using weights that minimize error in training set

Exact solution

As many parameters as molecules, plus 2 global parameters: the characteristic length-scale or kT of the system (σ) and the noise level (λ)

[from von Lilienfeld]
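
The formulas on this slide did not survive extraction. A standard form of kernel ridge regression that matches the description is: Gaussian kernel K_ij = exp(-d_ij² / (2σ²)), weights α = (K + λI)⁻¹ y as the exact solution, and prediction as the kernel-weighted sum over training points. The sketch below implements that form on a one-dimensional toy problem. In the setting of the slides the inputs would be molecular descriptors and the targets energies; the function names and the toy data here are assumptions.

```python
import numpy as np


def gaussian_kernel(a, b, sigma):
    """Gaussian kernel matrix between the rows of a and the rows of b."""
    sq = (a ** 2).sum(axis=1)[:, None] + (b ** 2).sum(axis=1)[None, :] - 2 * a @ b.T
    return np.exp(-np.maximum(sq, 0.0) / (2 * sigma ** 2))


def krr_fit(x_train, y_train, sigma, lam):
    """Closed-form weights: one parameter per training point."""
    k = gaussian_kernel(x_train, x_train, sigma)
    return np.linalg.solve(k + lam * np.eye(len(x_train)), y_train)


def krr_predict(x_new, x_train, alpha, sigma):
    """Prediction as a sum over weighted Gaussians centered on training points."""
    return gaussian_kernel(x_new, x_train, sigma) @ alpha


# Toy stand-in for (descriptor, energy) pairs.
x_train = np.linspace(-3, 3, 40)[:, None]
y_train = np.sin(x_train[:, 0])
sigma, lam = 1.0, 1e-6  # the two global parameters

alpha = krr_fit(x_train, y_train, sigma, lam)
x_test = np.linspace(-2.5, 2.5, 11)[:, None]
pred = krr_predict(x_test, x_train, alpha, sigma)
mae = np.mean(np.abs(pred - np.sin(x_test[:, 0])))
print(f"toy MAE: {mae:.4f}")
```

In practice σ and λ would be chosen by cross-validation on the training set rather than fixed by hand as they are here.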


The data

GDB-13 database of all organic molecules (within stability & synthetic constraints) of 13 heavy atoms or fewer: 0.9B compounds

Blum & Reymond, JACS (2009)

[from von Lilienfeld]


Results

March 2012: Rupp et al., PRL: 9.99 kcal/mol (kernels + eigenspectrum)

December 2012: Montavon et al., NIPS: 3.51 kcal/mol (deep neural nets + Coulomb sets)

More fun is yet to come...

Prediction considered chemically accurate when MAE is below 1 kcal/mol

Dataset available at http://quantum-machine.org


Conclusion

• Machine Learning is a versatile and ready-to-use tool for data analysis

• small data vs. big data

• fields of ML & databases will hit a limit in the near future

• time for a new marriage
