Data Transformation: Summer Data Jam. Chris Orwa, 14th July 2015




Principal Component Analysis

Principal component analysis (PCA) is a technique used to emphasize variation and bring out strong patterns in a dataset. It's often used to make data easy to explore and visualize.

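Statistically, the principal components are the eigenvectors of the data's covariance matrix, ordered by decreasing eigenvalue. Each eigenvalue is the variance captured along its component.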
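The link between PCA and the covariance matrix can be checked numerically. The sketch below is an illustration rather than part of the original slides: it assumes the built-in `faithful` dataset as input and compares `prcomp()` with a direct `eigen()` decomposition of `cov()`. The names `X`, `e`, `p`, `max_diff` and `var_diff` are made up for this example.

```r
# Built-in Old Faithful data as a numeric matrix (illustrative choice of data)
X <- as.matrix(faithful)

# Eigen-decomposition of the sample covariance matrix
e <- eigen(cov(X))

# PCA via prcomp(), which centers the columns by default and does not scale
p <- prcomp(X)

# The loadings agree with the eigenvectors up to sign, so compare absolute values
max_diff <- max(abs(abs(p$rotation) - abs(e$vectors)))

# The squared standard deviations of the components agree with the eigenvalues
var_diff <- max(abs(p$sdev^2 - e$values))

print(c(max_diff = max_diff, var_diff = var_diff))
```

Both differences should be zero up to floating-point error. The sign of each eigenvector is arbitrary, which is why the comparison uses absolute values.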


Let us Look at Some Concepts

Covariance

The covariance of two variables x and y in a data sample measures how they vary together. It is positive when the two tend to increase together and negative when one tends to decrease as the other increases.

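R code

# Eruption durations and waiting times from the built-in faithful dataset
duration <- faithful$eruptions
waiting <- faithful$waiting

# Sample covariance between the two variables
cov(duration, waiting)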
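To make the definition concrete, the sample covariance can also be computed by hand. This is a minimal sketch, not taken from the slides: it averages the products of deviations from the means with the n - 1 divisor that `cov()` uses. The names `n` and `manual_cov` are introduced only for illustration.

```r
duration <- faithful$eruptions
waiting <- faithful$waiting
n <- length(duration)

# Sum of products of deviations from the means, divided by n - 1
manual_cov <- sum((duration - mean(duration)) * (waiting - mean(waiting))) / (n - 1)

# Should agree with the built-in function up to floating-point error
print(all.equal(manual_cov, cov(duration, waiting)))
```

The result is positive here, reflecting that longer eruptions tend to go with longer waiting times.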


Covariance Matrix
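A covariance matrix collects the covariances of every pair of variables: the diagonal holds each variable's variance, and the off-diagonal entries hold the pairwise covariances. As a hedged illustration (the original slide's content is not available here), calling `cov()` on the whole `faithful` data frame gives a 2 x 2 example. The name `S` is an assumption made for this sketch.

```r
# Covariance matrix of all columns of the built-in faithful dataset
S <- cov(faithful)
print(S)

# Diagonal entries are the variances of the individual variables
diag_ok <- all.equal(unname(diag(S)),
                     c(var(faithful$eruptions), var(faithful$waiting)))

# Off-diagonal entries are the pairwise covariances, so the matrix is symmetric
sym_ok <- isSymmetric(S)

print(c(diag_ok = isTRUE(diag_ok), sym_ok = sym_ok))
```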


Eigenvectors

An eigenvector of a square matrix A is a nonzero vector v whose direction is unchanged by the associated linear transformation: A v = λ v, where the scalar λ is the corresponding eigenvalue.

R code

B <- matrix(1:9, 3)

eigen(B)
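The defining property can be verified on the output of `eigen()`. The sketch below is an illustrative check rather than part of the original deck: it takes the first eigenvalue and eigenvector pair of `B` and confirms that multiplying by `B` only rescales the vector. The names `e`, `lambda`, `v`, `lhs` and `rhs` are chosen for this example.

```r
B <- matrix(1:9, 3)
e <- eigen(B)

# First eigenvalue and its eigenvector (eigenvectors are the columns of e$vectors)
lambda <- e$values[1]
v <- e$vectors[, 1]

# Applying B to v should give the same result as scaling v by lambda
lhs <- as.vector(B %*% v)
rhs <- lambda * v

print(all.equal(lhs, rhs))
```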


Principal Component Analysis

R code

# load data
a <- read.csv("my_data.csv")

# perform PCA
c <- prcomp(a)
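Since `my_data.csv` is a placeholder, here is a self-contained sketch of the same workflow on the built-in `faithful` dataset. It is an illustration under assumptions, not the author's analysis: `scale. = TRUE` is chosen because the two columns are measured on different scales, and the names `pca`, `var_explained` and `scores` are introduced for this example.

```r
# PCA on the built-in faithful dataset, centering and standardizing each column
pca <- prcomp(faithful, center = TRUE, scale. = TRUE)

# Proportion of total variance explained by each principal component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
print(var_explained)

# Coordinates of each observation in the principal component space
scores <- pca$x
print(head(scores))
```

`summary(pca)` reports the same proportions in tabular form, and `biplot(pca)` gives a quick visual check of the components.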