© 2014 The MathWorks, Inc.

BIG DATA: Data Analytics with MATLAB

Christophe POUILLOT

Senior Consultant

MathWorks

[email protected]



Definition of Big Data

Name         Symbol   Value (bytes)
gigabyte     GB       10^9
terabyte     TB       10^12
petabyte     PB       10^15
exabyte      EB       10^18
zettabyte    ZB       10^21
yottabyte    YB       10^24

Worldwide: 2.8 ZB/year (2012)

Social networks: xx TB/day

Data “so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications.” (Wikipedia)

Data, structured or unstructured, from different sources: web, text, and image mining

Volume

Velocity

Variety
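The prefixes in the table are decimal: each step multiplies by 1000. As a quick sense of scale, the following lines convert the 2012 worldwide figure into a daily volume. The 365-day year is an assumption made for the arithmetic.

```matlab
% Decimal (SI) byte prefixes from the table: each step is a factor of 1000.
prefixes = {'GB', 'TB', 'PB', 'EB', 'ZB', 'YB'};
bytesPer = 10.^(9:3:24);                                % 1e9 ... 1e24 bytes

% Worldwide volume quoted on the slide: 2.8 ZB per year (2012).
yearlyBytes = 2.8 * bytesPer(strcmp(prefixes, 'ZB'));  % 2.8e21 bytes
dailyBytes  = yearlyBytes / 365;                        % assumes a 365-day year
dailyEB     = dailyBytes / bytesPer(strcmp(prefixes, 'EB'));
fprintf('%.2f EB per day\n', dailyEB);                  % prints 7.67 EB per day
```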


Data containers

Collection of files

Databases

Huge single file


Big Data Analytics with MATLAB

Memory and Data Access

64-bit processors

Memory Mapped Variables

Disk Variables

Databases

Datastores

Platforms

Desktop (Multicore, GPU)

Clusters

Cloud Computing (MATLAB Distributed Computing Server on Amazon EC2)

Hadoop

Programming Constructs

Streaming

Block Processing

Parallel-for loops

GPU Arrays

SPMD and Distributed Arrays

MapReduce

Analysis

Machine learning

Analysis domain

Statistics


Agenda

Machine Learning

Datastore/MapReduce

Integration with Hadoop

Databases

Huge file

Deployment

Key takeaways


Overview – Machine Learning

Machine Learning (type of learning: categories of algorithms)

Supervised Learning: Classification, Regression. Develop a predictive model based on both input and output data.

Unsupervised Learning: Clustering. Group and interpret data based only on input data.
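The two branches can be contrasted in a few lines of base MATLAB. This is a toy sketch, not the demo from the talk: the supervised part fits a straight line to known input/output pairs with the backslash operator, and the unsupervised part groups unlabeled points with a hand-written two-cluster k-means. The data and the naive seeding are made up for illustration; in practice, Statistics Toolbox (now Statistics and Machine Learning Toolbox) provides ready-made functions such as `kmeans`.

```matlab
% --- Supervised learning (regression): both inputs x and outputs y are known.
x = (1:10)';                          % inputs
y = 2*x + 1;                          % outputs (noise-free, for illustration)
A = [x, ones(size(x))];               % design matrix for the model y = a*x + b
coef = A \ y;                         % least-squares estimate of [a; b]
yNew = [12, 1] * coef;                % predicted output for a new input x = 12

% --- Unsupervised learning (clustering): only the inputs are known.
pts = [1 1; 1.2 0.9; 0.8 1.1; 5 5; 5.1 4.9; 4.9 5.2];   % unlabeled points
centers = pts([1 4], :);              % naive seeding with two of the points
for iter = 1:10
    % Squared distance from every point to each of the two centers.
    d = zeros(size(pts, 1), 2);
    for k = 1:2
        d(:, k) = sum(bsxfun(@minus, pts, centers(k, :)).^2, 2);
    end
    [~, labels] = min(d, [], 2);      % assign each point to its nearest center
    for k = 1:2
        centers(k, :) = mean(pts(labels == k, :), 1);   % move center to cluster mean
    end
end
```

The regression model needs the outputs `y` to be fitted; the clustering loop never sees any labels and discovers the two groups from the point coordinates alone.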


Demo: Machine Learning


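Datastore

datastore: import text files and collections of text files that don’t fit into memory

ds = datastore('file1.mat');                            % MAT-file of key-value data written by mapreduce
ds = datastore('*.csv');                                % all CSV files matching a wildcard
ds = datastore('/shared/data_repository/');             % every file in a folder
ds = datastore('hdfs://myserver:7867/data/file1.txt');  % a file stored in HDFS
ds = datastore({'/shared01/','/shared02/'});            % several folders at once

while hasdata(ds)
    T = read(ds);    % read the next chunk (a table for text files)
    % ... process T here, then let it be overwritten by the next chunk ...
end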
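A datastore takes care of the chunking bookkeeping. To show the underlying read-a-chunk, process, discard pattern, here is a base-MATLAB sketch using `textscan` on an open file; it illustrates the idea and is not how `datastore` is implemented. The file name, its hypothetical carrier/distance records, and the chunk size are assumptions; `chunkSize` plays roughly the role of a datastore's `ReadSize` property.

```matlab
% Write a small hypothetical input file: one "carrier,distance" record per line.
fname = [tempname '.csv'];
carriers  = {'UA', 'PS', 'DL', 'DL', 'US', 'PS'};
distances = [2356 186 1876 568 359 176];
fid = fopen(fname, 'w');
for i = 1:numel(distances)
    fprintf(fid, '%s,%d\n', carriers{i}, distances(i));
end
fclose(fid);

% Read the file a few records at a time, keeping only running totals in memory.
chunkSize = 4;                          % records per read
total = 0;
nRecords = 0;
fid = fopen(fname, 'r');
while ~feof(fid)
    C = textscan(fid, '%s %f', chunkSize, 'Delimiter', ',');   % next chunk only
    total = total + sum(C{2});
    nRecords = nRecords + numel(C{2});
end
fclose(fid);
delete(fname);
meanDistance = total / nRecords;        % statistic over the whole file
```

Memory use is bounded by `chunkSize` records rather than by the file size, which is the property that makes the approach scale to files that do not fit into memory.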


[Figure: Data Store → Map → Reduce. A datastore reads flight records such as "1503 UA LAX -5 -10 2356" in chunks; the map step emits key-value pairs keyed by carrier (UA 2356, PS 186, DL 1876, ...); the reduce step groups the values by carrier (UA, PS, DL, US).]
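The Data Store → Map → Reduce flow can be imitated in memory with base MATLAB. In this sketch the twelve sample records are split into chunks of four (as a datastore might deliver them), the map step emits one (carrier, value) pair per record using the last column of each record, the pairs are grouped by key, and the reduce step keeps the largest value per carrier. Using `max` as the reduction is an assumption for illustration; the slide does not say what its reducer computes.

```matlab
% Sample records: carrier code and the last numeric column of each flight record.
carriers = {'UA','PS','DL','DL','US','PS','PS','UA','US','UA','DL','DL'};
values   = [2356 186 1876 568 359 176 237 1867 245 1365 789 914];

chunkSize = 4;                          % records per chunk, as read from a datastore
groups = containers.Map();              % shuffle: carrier -> list of values
for first = 1:chunkSize:numel(values)
    idx = first:min(first + chunkSize - 1, numel(values));
    % Map: emit one (carrier, value) pair per record in this chunk.
    for i = idx
        key = carriers{i};
        if isKey(groups, key)
            groups(key) = [groups(key), values(i)];
        else
            groups(key) = values(i);
        end
    end
end

% Reduce: collapse each carrier's list of values into a single result.
carrierKeys = keys(groups);
maxByCarrier = containers.Map();
for k = 1:numel(carrierKeys)
    maxByCarrier(carrierKeys{k}) = max(groups(carrierKeys{k}));
end
```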


Demo: datastore/mapreduce
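With `datastore` and `mapreduce` (available since MATLAB R2014b), the same kind of computation runs chunk by chunk on real files. The following is a minimal sketch, assuming the `airlinesmall.csv` sample file that ships with MATLAB and its `UniqueCarrier` and `Distance` columns; the function names are hypothetical. Save it as `maxDistanceByCarrier.m`. By default, `mapreduce` writes its output MAT-files to the current folder.

```matlab
function result = maxDistanceByCarrier
% Largest flight distance per carrier, computed chunk by chunk with mapreduce.
ds = datastore('airlinesmall.csv', 'TreatAsMissing', 'NA');
ds.SelectedVariableNames = {'UniqueCarrier', 'Distance'};   % read only these columns
outds = mapreduce(ds, @maxDistanceMapper, @maxDistanceReducer);
result = readall(outds);          % table with variables Key and Value
end

function maxDistanceMapper(data, ~, intermKVStore)
% data is one chunk of the file, delivered as a table.
[carriers, ~, idx] = unique(data.UniqueCarrier);
for k = 1:numel(carriers)
    % Emit one intermediate pair per carrier: the chunk-local maximum.
    add(intermKVStore, carriers{k}, max(data.Distance(idx == k)));
end
end

function maxDistanceReducer(carrier, intermValIter, outKVStore)
% Combine the chunk-local maxima of one carrier into a single value.
best = -Inf;
while hasnext(intermValIter)
    best = max(best, getnext(intermValIter));
end
add(outKVStore, carrier, best);
end
```

Doing a partial reduction inside the mapper keeps the intermediate data small: each chunk contributes at most one pair per carrier instead of one pair per flight.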


Integration with Hadoop

[Figure: MATLAB code (main.m, map.m, reduce.m) reads data stored in HDFS through a datastore; each Hadoop node runs the Map and Reduce steps on the data it holds.]
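To run on Hadoop, the `mapreduce` call stays the same and only the execution environment changes. A minimal sketch, assuming MATLAB Distributed Computing Server configured for the cluster; the Hadoop installation path and the HDFS output folder are placeholders, and `map`/`reduce` stand for the functions in the slide's map.m and reduce.m.

```matlab
% Tell MATLAB where Hadoop is installed (placeholder path).
setenv('HADOOP_HOME', '/usr/local/hadoop');

% Use the Hadoop cluster as the execution environment for mapreduce.
cluster = parallel.cluster.Hadoop;
mr = mapreducer(cluster);

% Input and output both live in HDFS (host, port, and output folder are placeholders).
ds = datastore('hdfs://myserver:7867/data/file1.txt');
outds = mapreduce(ds, @map, @reduce, mr, ...
    'OutputFolder', 'hdfs://myserver:7867/results/');
```

Because the data stays in HDFS, each node processes the blocks it already stores instead of shipping them back to the MATLAB client.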


Databases

Relational databases (ODBC/JDBC-compliant)

NoSQL databases

DatabaseDatastore (Database Toolbox)

conn = database.ODBCConnection('MySQL','username','pwd');   % ODBC data source name, user name, password
dbds = datastore(conn, 'select * from productTable');       % query results read in chunks

MATLAB calls external functions: C/C++ shared libraries, Java libraries, .NET libraries, COM objects (ActiveX, ...), Python libraries, and WSDL web services
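A DatabaseDatastore is consumed with the same loop as any other datastore. A minimal sketch, assuming an ODBC data source named 'MySQL' and a `productTable` as on the slide; the credentials are placeholders and the block size of 1000 rows is an arbitrary choice.

```matlab
% Connect through ODBC (data source name, user name, and password are placeholders).
conn = database.ODBCConnection('MySQL', 'username', 'pwd');
dbds = datastore(conn, 'select * from productTable');
dbds.ReadSize = 1000;               % rows fetched per read (arbitrary choice)

rowCount = 0;
while hasdata(dbds)
    T = read(dbds);                 % next block of query results as a table
    rowCount = rowCount + height(T);
    % ... process T here ...
end
close(conn);                        % release the database connection
```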


Huge flat files: Mapping memory within MATLAB

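matfile: MAT-files (version 7.3 MAT-files support efficient partial reads and writes)

m = matfile('myFile.mat', 'Writable', true);   % 'Writable' is required to modify an existing file
z = m.x(85:94,85:94);                          % read from disk
m.x(81:100,81:100) = magic(20);                % write on disk

memmapfile: any binary file

m = memmapfile('records.dat', 'Offset', 9000, ...
    'Format', {'int32', [100 100], 'x'}, ...   % assumes a 100x100 int32 matrix after the offset
    'Writable', true);
z = m.Data(1).x(85:94,85:94);                  % read from disk
m.Data(1).x(81:100,81:100) = int32(magic(20)); % write on disk (class must match the format)

Read/write variables in files without loading them into memory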
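Both functions can be tried on throwaway files. This self-contained sketch writes a hypothetical 100-by-100 matrix to a version 7.3 MAT-file and to a raw int32 file behind a 9000-byte dummy header (matching the offset used above), modifies a block through each mapping, and reads the files back with ordinary I/O to check the writes.

```matlab
% --- matfile: partial read/write of a variable in a version 7.3 MAT-file.
matName = [tempname '.mat'];
x = zeros(100);                               % hypothetical variable
save(matName, 'x', '-v7.3');                  % v7.3 supports efficient partial access
m = matfile(matName, 'Writable', true);
m.x(81:100, 81:100) = magic(20);              % writes only this block
zMat = m.x(85:94, 85:94);                     % reads only this block
clear m
S = load(matName, 'x');                       % full load, only to verify the write
delete(matName);

% --- memmapfile: the same idea for a raw binary file with a header.
datName = [tempname '.dat'];
fid = fopen(datName, 'w');
fwrite(fid, zeros(1, 9000, 'uint8'), 'uint8');              % dummy 9000-byte header
fwrite(fid, int32(reshape(1:10000, 100, 100)), 'int32');    % column-major data
fclose(fid);
mm = memmapfile(datName, 'Offset', 9000, ...
    'Format', {'int32', [100 100], 'x'}, 'Writable', true);
zDat = mm.Data(1).x(85:94, 85:94);                          % read from disk
mm.Data(1).x(81:100, 81:100) = int32(magic(20));            % write to disk
clear mm                                                    % unmap the file
fid = fopen(datName, 'r');
fseek(fid, 9000, 'bof');
A = fread(fid, [100 100], 'int32=>int32');                  % verify with plain file I/O
fclose(fid);
delete(datName);
```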


Deployment: Compiling MATLAB to go everywhere

[Figure: a MATLAB analytic compiled into standalone executables (.exe), libraries (.lib, .dll), and Java components.]
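From the MATLAB prompt, these targets correspond to different `mcc` build options. A minimal sketch, assuming a hypothetical analytic `myAnalytic.m` on the path; depending on the release, the library and Java targets need add-on products beyond MATLAB Compiler (in 2014, MATLAB Builder JA for Java; from R2015a, MATLAB Compiler SDK). The library, package, and class names are placeholders.

```matlab
% Hypothetical analytic to deploy, saved as myAnalytic.m:
%   function y = myAnalytic(x)
%       y = mean(x(:));
%   end

mcc('-m', 'myAnalytic.m');                                  % standalone executable
mcc('-W', 'lib:libmyAnalytic', '-T', 'link:lib', ...
    'myAnalytic.m');                                        % C shared library (.dll/.lib on Windows)
mcc('-W', 'java:myanalytic,Analytic', 'myAnalytic.m');      % Java package myanalytic, class Analytic
```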


Key takeaways

MATLAB is the framework for BIG DATA analytics

MathWorks services can help you:

Training services

Consulting services

http://www.mathworks.fr/discovery/big-data-matlab.html?s_tid=gn_loc_drop


Questions?