Data transformation


Data Transformation

Summer Data Jam

Chris Orwa

14th July 2015

Principal Component Analysis

Principal component analysis (PCA) is a technique used to emphasize variation and bring out strong patterns in a dataset. It's often used to make data easy to explore and visualize.

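Statistically, the principal components are the eigenvectors of the data's covariance matrix, ordered by decreasing eigenvalue. Each eigenvalue is the variance of the data along its component.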

Let Us Look at Some Concepts

Covariance

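The covariance of two variables x and y in a data sample measures how the two variables vary together. It is positive when they tend to increase together and negative when one tends to decrease as the other increases.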

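R code

# eruption durations and waiting times from the built-in faithful data set
duration = faithful$eruptions
waiting = faithful$waiting

# sample covariance of the two variables
cov(duration, waiting)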
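As a rough sketch of what cov() computes, the sample covariance can be written out from its definition: the sum of the products of deviations from the means, divided by n - 1. The names x, y, n and cov_manual are illustrative and do not come from the slides.

```r
# Sample covariance computed directly from its definition
x <- faithful$eruptions
y <- faithful$waiting
n <- length(x)

# Sum of products of deviations from the means, divided by n - 1
cov_manual <- sum((x - mean(x)) * (y - mean(y))) / (n - 1)

# Should agree with the built-in cov() up to rounding error
all.equal(cov_manual, cov(x, y))
```

A positive result here indicates that longer eruptions tend to go with longer waiting times.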

Covariance Matrix
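The covariance matrix collects the covariances between every pair of variables in a dataset. Its diagonal holds the variance of each variable, and it is symmetric. A minimal sketch, assuming the built-in faithful data set as the example data (the name S is illustrative):

```r
# Covariance matrix of all columns in the built-in faithful data set
S <- cov(faithful)
S

# Diagonal entries are the variances of each column
diag(S)

# The matrix is symmetric: cov(x, y) equals cov(y, x)
isSymmetric(S)
```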

Eigenvectors

An eigenvector of a square matrix A is a nonzero vector v whose direction is invariant under the associated linear transformation: A v = λ v. The scalar λ is the corresponding eigenvalue.

R code

# 3 x 3 matrix filled column by column with 1..9
B <- matrix(1:9, 3)

# eigenvalues and eigenvectors of B
eigen(B)
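To connect the output of eigen() to the definition, the sketch below checks that the first eigenvector is only rescaled by the matrix, not rotated. The names e, lambda1 and v1 are illustrative; eigen() returns the eigenvectors as the columns of its vectors component.

```r
B <- matrix(1:9, 3)
e <- eigen(B)

# First eigenvalue and its eigenvector (eigenvectors are the columns)
lambda1 <- e$values[1]
v1 <- e$vectors[, 1]

# Check the defining property B v = lambda v, up to rounding error
all.equal(as.vector(B %*% v1), lambda1 * v1)
```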

Principal Component Analysis

R code

# load data
a = read.csv('my_data.csv')

# perform PCA
c = prcomp(a)
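The link between prcomp() and the eigenvectors of the covariance matrix can be checked directly. This is a sketch under stated assumptions: it uses the built-in faithful data set as a stand-in for my_data.csv, and the names pca and e are illustrative.

```r
# PCA on the built-in faithful data set (stand-in for my_data.csv)
pca <- prcomp(faithful)

# Eigen decomposition of the covariance matrix of the same data
e <- eigen(cov(faithful))

# Principal-component variances equal the eigenvalues
all.equal(pca$sdev^2, e$values)

# Loadings equal the eigenvectors, up to the sign of each column
all.equal(abs(unname(pca$rotation)), abs(e$vectors))

# Proportion of total variance explained by each component
pca$sdev^2 / sum(pca$sdev^2)
```

When the variables are measured on very different scales, calling prcomp() with scale. = TRUE standardises them first, which corresponds to working with the correlation matrix instead of the covariance matrix.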
