Talk given at the R Rosetta Stone meetup in NYC on 1/7/2010 about MATLAB and R. Co-authored with Harlan Harris. Video of the talk is available at: http://www.vcasmo.com/video/drewconway/7211
MATLAB/R Dictionary
R meetup NYC, January 7, 2010
Harlan Harris [email protected]
@HarlanH
Marck Vaisman [email protected]
@wahalulu
MATLAB and the MATLAB logo are registered trademarks of The Mathworks.
About MATLAB
What is MATLAB
• Commercial numerical programming language, simulation and visualization
• One million users (engineers, scientists, academics)
• MATrix LABoratory – specializes in matrix operations
• Mathworks - base & add-ons
• Open-source Octave project
MATLAB History
• Developed by Cleve Moler (Math/CS Prof at UNM) in the 1970s as a higher-level numerical programming language (vs. Fortran LINPACK)
• Adopted by engineers for signal processing, control modeling
• Multipurpose programming language
Notes
• Today’s focus: Compare MATLAB & R for data analysis, contrast as programming languages
• MATLAB is Base plus many toolboxes
  – Base includes: descriptive stats, covariance and correlation, linear and nonlinear regression
  – Statistics toolbox adds: dataset and category arrays (like data.frames and factors), more visualizations, distributions, ANOVA, multivariate regression, hypothesis tests
• Interactive programming: Scripts and Read-Evaluate-Print Loop
• Similar representations of data
  – Both use vectors/arrays as the primary data structures
    • MATLAB is based on 2-D matrices; R is based on 1-D vectors
  – Both prefer vectorized functions to for loops
  – Variables are declared dynamically
• Can do most MATLAB functionality in R; can do most R functionality in MATLAB.
The basics: vectors, matrices and indexing
Task: Create a row vector
  MATLAB: v = [1 2 3 4]
  R: v <- c(1,2,3,4)

Task: Create a column vector
  MATLAB: v = [1;2;3;4] or v = [1 2 3 4]'
  R: v <- c(1,2,3,4)
  Note: R does not distinguish between row and column vectors

Task: Enter a matrix A
  MATLAB: A = [1 2 3; 4 5 6]
  R, entering values by row: A <- matrix(c(1,2,3,4,5,6), nrow=2, byrow=TRUE)
  R, entering values by column: A <- matrix(c(1,4,2,5,3,6), nrow=2)

Task: Access third element of vector v
  MATLAB: v(3)
  R: v[3] or v[[3]]

Task: Access element of matrix A
  MATLAB: A(2,3)
  R: A[2,3]

Task: "Glue" two matrices a1 and a2, same number of rows, side by side
  MATLAB: A = [a1 a2]
  R: A <- cbind(a1,a2)

Task: "Stack" two matrices a1 and a2, same number of columns
  MATLAB: A = [a1;a2]
  R: A <- rbind(a1,a2)

Task: Reshape matrix A, making it an m x n matrix with elements taken columnwise from A
  MATLAB: A = reshape(A,m,n)
  R: dim(A) <- c(m,n)
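The R column above can be tried as a single session. This is only a small sketch: the names A_byrow, A_bycol, wide, tall and B, and the sample values in a1 and a2, are illustrative and not from the slides.

```r
# R has a single vector type; there is no row/column distinction
v <- c(1, 2, 3, 4)
third <- v[3]

# Entering a matrix by row or by column yields the same object
A_byrow <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, byrow = TRUE)
A_bycol <- matrix(c(1, 4, 2, 5, 3, 6), nrow = 2)
same <- identical(A_byrow, A_bycol)

elem <- A_byrow[2, 3]      # row 2, column 3

# Glue side by side (cbind) and stack (rbind)
a1 <- matrix(1:4, nrow = 2)
a2 <- matrix(5:8, nrow = 2)
wide <- cbind(a1, a2)      # 2 x 4
tall <- rbind(a1, a2)      # 4 x 2

# Reshape: elements are taken columnwise, as in MATLAB's reshape
B <- A_byrow
dim(B) <- c(3, 2)
```

Because R stores matrices column by column, assigning to dim() reinterprets the same values without copying them into a new order.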
Operators
Task: Assignment
  MATLAB: =
  R: <- or =

Task: Whole matrix operations (multiplication; square the matrix; raise to power k)
  MATLAB: A*B, A^2, A^k
  R: A %*% B, A %*% A, A %*% A %*% A …

Task: Element-by-element operations
  MATLAB: A.*B, A./B, A.^k
  R: A*B, A/B, A^k

Task: Compute A^(-1) B
  MATLAB: A\B
  R: solve(A, B) or solve(A) %*% B

Task: Sums (columns of matrix; rows of matrix)
  MATLAB: sum(A), sum(A,2)
  R: colSums(A), rowSums(A)

Task: Logical operators (element-by-element on vectors/matrices)
  MATLAB:
    a < b, a > b, a <= b, a >= b
    a == b
    a ~= b
    AND: a & b (element-wise), a && b (short-circuit, scalars only)
    OR: a | b (element-wise), a || b (short-circuit, scalars only)
    XOR: xor(a,b)
    NOT: ~a
  R:
    a < b, a > b, a <= b, a >= b
    a == b
    a != b
    AND: a && b (short-circuit), a & b (element-wise)
    OR: a || b (short-circuit), a | b (element-wise)
    XOR: xor(a,b)
    NOT: !a
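A short R sketch of the operator differences that most often trip up MATLAB users. The matrices and the variable names are made up for illustration. Note that A^(-1) B is obtained with solve(A, B), which solves the system A X = B directly.

```r
A <- matrix(c(2, 0, 0, 2), nrow = 2)   # 2 * identity
B <- matrix(c(1, 3, 2, 4), nrow = 2)   # rows: (1 2) and (3 4)

matprod  <- A %*% B    # matrix product (MATLAB: A*B)
elemprod <- A * B      # element-by-element (MATLAB: A.*B)
X <- solve(A, B)       # solves A X = B (MATLAB: A\B)

col_totals <- colSums(B)   # MATLAB: sum(B)
row_totals <- rowSums(B)   # MATLAB: sum(B,2)

# Element-wise logical operators work on whole vectors
a <- c(TRUE, FALSE)
b <- c(TRUE, TRUE)
both   <- a & b
either <- a | b
differ <- xor(a, b)

# Short-circuit forms are meant for single values, e.g. in if()
scalar_and <- TRUE && FALSE
```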
Working with data structures
Task: Build a structure v of length n, capable of containing different data types in different elements (MATLAB: cell array; R: list)
  MATLAB: v = cell(1,n)
    In general, cell(m,n) makes an m × n cell array. Then you can do e.g.:
    v{1} = 12
    v{2} = 'hi there'
    v{3} = rand(3)
  R: v <- vector('list',n)
    Then you can do e.g.:
    v[[1]] <- 12
    v[[2]] <- 'hi there'
    v[[3]] <- matrix(runif(9),3)

Task: Create a matrix-like object with different named columns (MATLAB: struct array; R: data.frame)
  MATLAB:
    avals = 2*ones(1,6);
    yyvals = 6:-1:1;
    v = [1 5 3 2 3 7];
    d = struct('a', avals, 'yy', yyvals, 'fac', v);
  R:
    v <- c(1,5,3,2,3,7)
    d <- data.frame(cbind(a=2, yy=6:1), v)
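A minimal R sketch of both structures. It names the third data.frame column fac to mirror the MATLAB field name, which is a small departure from the R snippet above; the helper variables kinds and big are illustrative.

```r
# A list can hold a different type in each element
n <- 3
v <- vector("list", n)
v[[1]] <- 12
v[[2]] <- "hi there"
v[[3]] <- matrix(runif(9), 3)

# First class name of each element (a matrix has more than one in R >= 4.0)
kinds <- vapply(v, function(x) class(x)[1], character(1))

# A data.frame has named columns of equal length; the scalar 2 is recycled
fac <- c(1, 5, 3, 2, 3, 7)
d <- data.frame(a = 2, yy = 6:1, fac = fac)

# Columns are accessed by name, rows by logical condition
big <- d[d$fac > 2, ]
```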
Condi,onals, control structures, loops
Task: for loop over values in vector v
  MATLAB:
    for i=v
      command1
      command2
    end
  R:
    If only one command:
    for (i in v) command
    If multiple commands:
    for (i in v) {
      command1
      command2
    }

Task: If/else statement
  MATLAB:
    if cond
      command1
      command2
    else
      command3
      command4
    end
    MATLAB also has the elseif statement.
  R:
    if (cond) {
      command1
      command2
    } else {
      command3
      command4
    }
    R uses chained "else if" statements.

Task: ifelse() function
  R:
    > print(ifelse(c(T,F), 2, 3))
    [1] 2 3
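To make the placeholders concrete, here is a small R sketch that sums absolute values twice: once with an explicit loop and chained else if, once with the vectorized ifelse(). The data and variable names are invented for illustration.

```r
v <- c(3, -1, 0, 4)

# Loop version: for over the values of v, with chained else if
total <- 0
for (i in v) {
  if (i > 0) {
    total <- total + i
  } else if (i < 0) {
    total <- total - i
  } else {
    # zero contributes nothing
    total <- total + 0
  }
}

# Vectorized version: ifelse() picks element by element
total_vec <- sum(ifelse(v >= 0, v, -v))
```

Both languages prefer the second form where it applies, since it avoids the interpreted loop.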
Help!
Task: Get help on a function
  MATLAB: help fminsearch
  R: help(pmin) or ?pmin

Task: Search the help for a word
  MATLAB: lookfor inverse
  R: ??inverse

Task: Describe a variable
  MATLAB: class(a)
  R: class(a), str(a)

Task: Show variables in environment
  MATLAB: who
  R: ls()

Task: Underlying type of variable
  MATLAB: whos('a')
  R: typeof(a)
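In R, class() and typeof() can give different answers for the same object, which is why both appear above. A brief sketch with made-up variables:

```r
m <- matrix(c(1.5, 2.5, 3.5, 4.5), nrow = 2)
f <- factor(c("low", "high", "low"))

m_class <- class(m)[1]   # how R treats the object: a matrix
m_type  <- typeof(m)     # how it is stored: double

f_class <- class(f)      # a factor
f_type  <- typeof(f)     # stored as integer codes

str(f)                   # compact description of structure and contents
```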
Example: k-means clustering of Fisher Iris data
Fisher Iris Dataset
sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,setosa
4.9,3.0,1.4,0.2,setosa
4.7,3.2,1.3,0.2,setosa
4.6,3.1,1.5,0.2,setosa
…
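The code shown in the talk for this example is not part of the transcript. As a rough stand-in, the following R sketch clusters the copy of the iris data that ships with R, whose column names (Sepal.Length, ..., Species) differ from the CSV header above. The choice of three centers, the seed and nstart = 20 are assumptions, not the talk's settings.

```r
# Built-in copy of the Fisher iris data: 150 rows, 4 measurements + Species
data(iris)

set.seed(42)   # k-means starts from random centers; fix the seed to repeat a run
fit <- kmeans(iris[, 1:4], centers = 3, nstart = 20)

# Compare cluster assignments with the known species labels
confusion <- table(cluster = fit$cluster, species = iris$Species)
print(confusion)
```

The cluster numbers themselves are arbitrary, so the table is read by looking at which species dominates each row.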
MATLAB and R as programming languages

MATLAB: Scripting, real-time analysis
R: Scripting, real-time analysis

MATLAB: File-based environments
R: Files unimportant

MATLAB: Imperative programming style
R: Functional programming style (impure)

MATLAB: Statically scoped
R: Lexically (statically) scoped; environments allow dynamic lookup

MATLAB: Functions with multiple return values
R: Functions with named arguments, lazy evaluation

MATLAB: Evolving OOP system
R: Multiple competing OOP systems

MATLAB: Can be compiled
R: Cannot be compiled

MATLAB: Large library of functions; professionally developed, cost money
R: Large library of functions; varying quality and support

MATLAB: Can embed (in) many other languages
R: Can embed (in) many other languages
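Three of the R traits listed above (named arguments, lazy evaluation, functional style) can be seen in a few lines. The function names power and first_only are hypothetical, chosen only for this sketch.

```r
# Named arguments can be supplied in any order; exponent has a default
power <- function(base, exponent = 2) base^exponent
eight <- power(exponent = 3, base = 2)

# Lazy evaluation: an argument is evaluated only if the body uses it,
# so the stop() call below never runs
first_only <- function(x, y) x
one <- first_only(1, stop("y is never evaluated"))

# Functional style: functions are ordinary values passed to other functions
squares <- sapply(1:4, function(k) power(k))
```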
Func,ons
MATLAB:
  function [a, b] = minmax(z)
  % one function per .m file!
  % assign to formal return names
  a = min(z);
  b = max(z);
  end

  % if minmax.m in path
  [smallest, largest] = ...
      minmax([1 30 3])

R:
  minmax <- function(z, opt=12) {
    # functions are assigned to
    # variables
    ret <- list(min = min(z), max = max(z))
    ret  # last statement is
         # return value
  }

  # if minmax was created in current
  # environment
  x <- minmax(c(1, 30, 3))
  smallest <- x$min
Object-‐Oriented Programming
MATLAB:
• Formerly: objects were defined by a directory tree, with one method per file
• As of 2008: new classdef syntax resembles other languages

R:
• S3 classes: attributes + syntax
  – class(object)
  – plot.lm()
• S4 classes: definitions + methods
• R.oo, proto, etc…
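The "attributes + syntax" description of S3 is easiest to see in code: an object is an ordinary value with a class attribute, and a method is a function named generic.class, just as plot.lm() is the plot method for class lm. The circle class and the area generic below are invented for this sketch; S4 and the other systems are not shown.

```r
# Constructor: a plain list tagged with a class attribute
new_circle <- function(r) structure(list(r = r), class = "circle")

# A new generic; UseMethod dispatches on the class of its first argument
area <- function(shape, ...) UseMethod("area")

# The method for class "circle" is found by its name alone
area.circle <- function(shape, ...) pi * shape$r^2

# A method for an existing generic works the same way
print.circle <- function(x, ...) {
  cat("circle of radius", x$r, "\n")
  invisible(x)
}

c1 <- new_circle(2)
c1_class <- class(c1)
c1_area <- area(c1)
```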
Other notes
• R.matlab package
• Graphics
  – MATLAB has much better 3-D/interactive graphics support
  – R has ggplot2 and much better statistical graphics
Additional Resources
• Will Dwinell, Data Mining in MATLAB
• Computerworld article on Cleve Moler
• Mathworks
• Matlabcentral
• Comparison of Data Analysis packages (http://anyall.org/blog/2009/02/comparison-of-data-analysis-packages-r-matlab-scipy-excel-sas-spss-stata/)
• R.matlab package
• stackoverflow
References used for this talk
• David Hiebeler MATLAB/R Reference document: http://www.math.umaine.edu/~hiebeler/comp/matlabR.html
• http://www.cyclismo.org/tutorial/R/index.html
• http://www.stat.berkeley.edu/~spector/R.pdf
• MATLAB documentation
• http://www.r-cookbook.com/node/23
Thank You!