Yeast Cell Cycles, Freq. 2 Proj. PCA on Freq. 2 Periodic Component Of Data

Yeast Cell Cycles, Freq. 2 Proj.

PCA on Freq. 2 Periodic Component Of Data

Source Batch Adj: Source Colors

Source Batch Adj: PC 1-3 & DWD direction

Source Batch Adj: DWD Source Adjustment

NCI 60: Raw Data, Platform Colored

NCI 60: Fully Adjusted Data, Platform Colored

Matlab Software

Want to try similar analyses?
Matlab Available from UNC Site License
Download Software: Google "Marron Software"
Choose (selection shown on slide)
Download .zip File, & Expand to 3 Directories
Put these in Matlab Path

Matlab Basics

Matlab has Modalities:
Interpreted (Type Commands & Run Individually)
Batch (Run Script Files = Command Sets)

Matlab in Interpreted Mode:

For description of a function:

>> help [function name]

To Find Functions:

>> help [category name]

e.g.

>> help stats

For Serious Scientific Computing: Always Run Scripts

Matlab Script File: Just a List of Matlab Commands
Matlab Executes Them in Order

Why Bother (Why Not Just Type Commands)?
Reproducibility (Can Find Mistakes & Use Again Much Later)

Matlab Script Files

An Example:

Recall Brushing Analysis of Next Generation Sequencing Data

Functional Data Analysis

Simple 1st View: Curve Overlay (log scale)
Often Useful Population View: PCA Scores
Suggestion Of Clusters??? Which Are These?
Manually Brush Clusters
Clear Alternate Splicing

Matlab Script Files

An Example: Recall Brushing Analysis of Next Generation Sequencing Data
Analysis In Script File: VisualizeNextGen2011.m
  (".m" is the Matlab Script File Suffix)
  (On Course Web Page)

String of Text

Command to Display String to Screen
Notes About Data (Maximizes Reproducibility)
Have Index for Each Part of Analysis
So Keep Everything Done (Maximizes Reproducibility)
Note Some Are Graphics Shown (Can Repeat)
Set Graphics to Default
Put Different Program Parts in IF-Block
Comment Out Currently Unused Commands
Read Data from Excel File
For Generic Functional Data Analysis:
Input Data Matrix
Structure, with Other Settings
Make Scores Scatterplot
Uses Careful Choice of Color Matrix
Start with PCA
Then Create Color Matrix
Black Red Blue
Run Script Using Filename as a Command

Cornea Data

Main Point: OODA Beyond FDA
Recall Interplay: Object Space and Descriptor Space

Cornea: Outer surface of the eye
Driver of Vision: Curvature of Cornea
Data Objects: Images on the unit disk
Radial Curvature as Heat Map
Special Thanks to K. L. Cohen, N. Tripoli, UNC Ophthalmology

Cornea Data: Raw Data
Decompose Into Modes of Variation?

Reference: Locantore, et al (1999)

Visualization (generally true for images):
More challenging than for curves (since can't overlay)
Instead view sequence of images
Harder to see population structure (than for curves)
So PCA type decomposition of variation is more important

Nature of images (on the unit disk, not usual rectangle):
Color is curvature
Along radii of circle (direction with most effect on vision)
Hotter (red, yellow) for more curvature
Cooler (blue, green) for less curvature
Feature vector is coefficients of Zernike expansion
Zernike basis: ~ Fourier basis, on disk
Conveniently represented in polar coordinates

Data Representation - Zernike Basis:
Pixels as features is large and wasteful
Natural to find more efficient representation
Polar Coordinate Tensor Product of:
  Fourier basis (angular)
  Special Jacobi (radial, to avoid singularities)
See: Schwiegerling, Greivenkamp & Miller (1995); Born & Wolf (1980)

Descriptor Space is Vector Space of Zernike Coefficients
So Perform PCA There

PCA of Cornea Data

Recall: PCA can find (often insightful) direction of greatest variability
Main problem: display of result (no overlays for images)

Solution: show movie of marching along the direction vector

PC1 Movie:

PC1 Summary:
Mean (1st image): mild vertical astigmatism
  known population structure called "with the rule"
Main direction: more curved & less curved
Corresponds to first optometric measure
  (89% of variation, in Mean Residual SS sense)
Also: stronger astigmatism & no astigmatism
Found correlation between astigmatism and curvature
Scores (blue): Apparent Gaussian distribution
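The "marching along the direction vector" idea can be sketched in a few lines. This is only an illustration in Python (the course software itself is Matlab), and the function name `pc_movie_frames`, the mean vector, the direction vector and the scores are all hypothetical stand-ins. Each returned feature vector would be rendered as one image of the movie.

```python
def pc_movie_frames(mean, direction, scores, n_frames=5):
    """Return feature vectors mean + c * direction for evenly spaced c.

    c runs from the smallest to the largest observed score, so the frames
    stay within the range of the data along this direction.
    """
    lo, hi = min(scores), max(scores)
    step = (hi - lo) / (n_frames - 1)
    frames = []
    for k in range(n_frames):
        c = lo + k * step
        frames.append([m + c * u for m, u in zip(mean, direction)])
    return frames


# Hypothetical 2-d example: mean (1, 2), direction along the first axis.
frames = pc_movie_frames([1.0, 2.0], [1.0, 0.0], scores=[-2.0, 0.0, 2.0])
```

In the cornea setting the vectors would hold Zernike coefficients, and each frame would be converted back to a heat map on the unit disk before display.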

PC2 Movie:

Mean: same as above
  Common centerpoint of point cloud
  Are studying directions from mean
Images along direction vector:
  Looks terrible???
  Why?

Reason made clear in Scores Plot (blue):
Single outlying data object drives PC direction
A known problem with PCA
Recall finds direction with max variation
In sense of variance
Easily dominated by single large observation

Toy Example: Single Outlier Driving PCA
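A small numerical version of such a toy example may help. The sketch below is Python rather than the course's Matlab, uses made-up 2-d data, and relies on the closed-form angle of the leading eigenvector of a 2x2 covariance matrix. The helper name `first_pc_direction` is an assumption for illustration only.

```python
import math


def first_pc_direction(points):
    """Unit vector of the first principal component for 2-d points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    # Angle of the leading eigenvector of the 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return (math.cos(theta), math.sin(theta))


# Hypothetical toy data: points spread along the horizontal axis ...
clean = [(float(x), 0.5 if x % 2 else -0.5) for x in range(-5, 6)]
# ... plus one observation far away in the vertical direction.
with_outlier = clean + [(0.0, 50.0)]

pc1_clean = first_pc_direction(clean)
pc1_outlier = first_pc_direction(with_outlier)
```

Without the extra point the first direction follows the horizontal spread of the data. With it, the single large observation dominates the variance and the first direction turns to point at the outlier.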

PC2 Affected by Outlier: How bad is this problem?

View 1: Statistician: Arrggghh!!!!
Outliers are very dangerous
Can give arbitrary and meaningless directions

View 2: Ophthalmologist: No Problem
Driven by edge effects (see raw data)
Artifact of light reflection data gathering (eyelid blocking, and drying effects)
Routinely visually ignore those anyway
Found interesting (& well known) direction: steeper superior vs steeper inferior

Cornea Data: Raw Data
Which one is the outlier?
Will say more later

PC3 Movie (ophthalmologist's view):
Edge Effect Outlier is present
But focusing on central region shows changing direction of astigmatism (3% of MR SS)
"with the rule" (vertical) vs. "against the rule" (horizontal)
most astigmatism is "with the rule"
most of rest is "against the rule" (known folklore)

PC4 Movie (continuing with ophthalmologist's view):
Other direction of astigmatism???
Location (i.e. registration) effect???
Harder to interpret
OK, since only 1.7% of MR SS
Substantially less than for PC2 & PC3

Ophthalmologist's View (cont.)
Overall Impressions / Conclusions:
Useful decomposition of population variation
Useful insight into population structure

Now return to Statistician's View:
How can we handle these outliers?
Even though not fatal here, can be for other examples
Recall Simple Toy Example (in 2d):

Outliers in PCA

Deeper Toy Example:
Why is that an outlier?
Never leaves range of other data
But Euclidean distance to others very large relative to other distances
Also major difference in terms of shape
And even smoothness
Important lesson: many directions in $\mathbb{R}^d$

Much like earlier Parabolas Example, but with 1 outlier thrown in

PCA for Deeper Toy Example Data:
At first glance, mean and PC1 look similar to no outlier version
PC2 clearly driven completely by outlier
PC2 scores plot (on right) gives clear outlier diagnostic
Outlier does not appear in other directions
Previous PC2 now appears as PC3
Total Power (upper right plot) now spread farther

Closer Look at Deeper Toy Example:
Mean influenced a little by the outlier
  Appearance of corners at every other coordinate
PC1 substantially influenced by the outlier
  Clear wiggles

What can (should?) be done about outliers?
Context 1: Outliers are important aspects of the population
  They need to be highlighted in the analysis
  Although could separate into subpopulations
Context 2: Outliers are bad data, of no interest
  Recording errors? Other mistakes?
  Then should avoid distorted view of PCA

Standard Statistical Approaches to Dealing with Outliers:
Outlier Deletion: Kick out bad data
Robust Statistical methods:
  Work with full data set, but downweight bad data
  Reduce influence, instead of deleting

Example Cornea Data:
Can find PC2 outlier (by looking through data (careful!))
Problem: after removal, another point dominates PC2
Could delete that, but then another appears
After 4th step have eliminated 10% of data (n = 43)

Motivates alternate approach: Robust Statistical Methods
Recall main idea: Downweight (instead of delete) outliers
There is a large literature. Good intros (from different viewpoints) are:
  Huber (1981)
  Hampel, et al (1986)
  Staudte & Sheather (1990)

Simple robustness concept: breakdown point
How much of data moved to $\infty$ will destroy estimate?
Usual mean has breakdown 0
Median has breakdown 1/2 (best possible)
Conclude:
Median much more robust than mean
Median uses all data
Median gets good breakdown from equal vote

Mean has breakdown 0
Single Outlier Pulls Mean Outside range of data
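The breakdown idea can be checked numerically. The following is a minimal Python sketch (not part of the course's Matlab software) with made-up data, where the hypothetical helper `corrupt` moves the last k observations to a very large value as a stand-in for "moved to infinity".

```python
import statistics


def corrupt(data, k, value=1e9):
    """Copy of data with its last k entries moved to a huge value."""
    return data[: len(data) - k] + [value] * k


data = [1.0, 2.0, 3.0, 4.0, 5.0]

# One bad point is enough to drag the mean outside the range of the data.
mean_one_bad = statistics.mean(corrupt(data, 1))
# With fewer than half the points corrupted, the median stays put.
median_two_bad = statistics.median(corrupt(data, 2))
# Once more than half are corrupted, the median follows the bad points.
median_three_bad = statistics.median(corrupt(data, 3))
```

With five observations the median tolerates two corrupted values but not three, which is the finite-sample face of the 1/2 breakdown point.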

Controversy: Is median's equal vote scheme good or bad?
Huber: Outliers contain some information,
  so should only control influence (e.g. median)
Hampel, et al.: Outliers contain no useful information
  Should be assigned weight 0 (not done by median)
  Using proper robust method (not simply deleted)

Robustness Controversy (cont.):
Both are right (depending on context)
Source of major (unfortunately bitter) debate!
Application to Cornea data:
  Huber's model more sensible
  Already know some useful info in each data point
  Thus median type methods are sensible

Robust PCA

What is multivariate median?
There are several! (median generalizes in different ways)
i. Coordinate-wise median
  Often worst
  Not rotation invariant (2-d data uniform on "L")
  Can lie on convex hull of data (same example)
  Thus poor notion of center
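Both complaints about the coordinate-wise median can be seen on a small "L"-shaped data set. The sketch below is in Python with made-up points, and the helper names are hypothetical. It compares taking the median and then rotating against rotating and then taking the median.

```python
import math
import statistics


def coordinatewise_median(points):
    """Median taken separately in each coordinate."""
    return (statistics.median(x for x, _ in points),
            statistics.median(y for _, y in points))


def rotate(point, angle):
    """Rotate a 2-d point counterclockwise about the origin."""
    x, y = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y)


# Hypothetical "L"-shaped data: points along two perpendicular arms.
data = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0),
        (0.0, 1.0), (0.0, 2.0), (0.0, 3.0)]
angle = math.pi / 4

median_then_rotate = rotate(coordinatewise_median(data), angle)
rotate_then_median = coordinatewise_median([rotate(p, angle) for p in data])
```

For the unrotated data the coordinate-wise median is the corner of the "L", a vertex of the convex hull. After a 45 degree rotation the two orders of operation give different answers, so the estimate depends on the coordinate system.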

What is multivariate median (cont.)?
ii. Simplicial depth (a.k.a. data depth): Liu (1990)
  Paint Thickness of simplices with $d + 1$ corners at data
  Nice idea
  Good invariance properties
  Slow to compute
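For 2-d data the simplices are triangles, and the "paint thickness" at a point is the fraction of data triangles that cover it. The following Python sketch is a brute-force illustration under that assumption. The function names and the four data points are hypothetical, and the loop over all triples shows why the computation is slow.

```python
from itertools import combinations


def _cross(o, a, b):
    """Signed area test: positive if o -> a -> b turns counterclockwise."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def in_triangle(p, a, b, c):
    """True if p lies in the closed triangle with corners a, b, c."""
    d1, d2, d3 = _cross(a, b, p), _cross(b, c, p), _cross(c, a, p)
    has_neg = d1 < 0 or d2 < 0 or d3 < 0
    has_pos = d1 > 0 or d2 > 0 or d3 > 0
    return not (has_neg and has_pos)


def simplicial_depth_2d(p, data):
    """Fraction of triangles with corners at data points that contain p."""
    triangles = list(combinations(data, 3))
    inside = sum(in_triangle(p, a, b, c) for a, b, c in triangles)
    return inside / len(triangles)


# Hypothetical data: the four corners of a square.
data = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0), (4.0, 4.0)]
depth_inside = simplicial_depth_2d((1.0, 2.0), data)
depth_outside = simplicial_depth_2d((10.0, 10.0), data)
```

The deepest point plays the role of the median. With n data points there are on the order of n^3 triangles in 2-d, and the count grows much faster in higher dimension.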

What is multivariate median (cont.)?
iii. Huber's $L^p$ M-estimate:
Given data $X_1, \ldots, X_n \in \mathbb{R}^d$, estimate center of population by

$$\hat{\theta} = \arg\min_{\theta} \sum_{i=1}^{n} \| X_i - \theta \|_2^p$$

where $\| \cdot \|_2$ is the usual Euclidean norm
Here: use only $p = 1$ (minimal impact by outliers)

Huber's $L^p$ M-estimate (cont.):
Case $p = 2$: Can show $\hat{\theta} = \bar{X}$ (sample mean) (also called Fréchet Mean)
Here: use only $p = 1$ (minimal impact by outliers)

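$L^1$ M-estimate (cont.):
A view of minimizer: solution of

$$0 = \frac{\partial}{\partial \theta} \sum_{i=1}^{n} \| X_i - \theta \|_2 = -\sum_{i=1}^{n} \frac{X_i - \theta}{\| X_i - \theta \|_2}$$

A useful viewpoint is based on:
$P_{Sph(\theta, 1)}$ = Projection of data onto sphere centered at $\theta$ with radius 1
And representation:

$$P_{Sph(\theta, 1)} X_i = \theta + \frac{X_i - \theta}{\| X_i - \theta \|_2}$$

Thus the solution of

$$0 = \sum_{i=1}^{n} \frac{X_i - \theta}{\| X_i - \theta \|_2}$$

is the solution of:

$$0 = \frac{1}{n} \sum_{i=1}^{n} \left( P_{Sph(\theta, 1)} X_i - \theta \right)$$

So $\hat{\theta}$ is location where projected data are centered
Slide sphere around until mean (of projected data) is at center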

$L^1$ M-estimate (cont.):
Additional literature:
Called geometric median (long before Huber) by: Haldane (1948)
Shown unique for $d > 1$ by: Milasevic and Ducharme (1987)
Useful iterative algorithm: Gower (1974) (see also Sec. 3.2 of Huber)
Cornea Data experience: works well for [value missing in source]
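One way to compute the geometric median is sketched below in Python. It uses the classical Weiszfeld iteration, which is named here as a standard choice and is not claimed to be the algorithm of Gower (1974) or the one in the course software. The data, function names and tolerances are hypothetical. The second helper checks the "sphere" characterization: at the solution, the unit vectors from the center to the data average to zero.

```python
import math


def geometric_median(points, n_iter=1000, tol=1e-12):
    """Weiszfeld iteration for the minimizer of sum_i ||x_i - theta||_2."""
    dim = len(points[0])
    # Start from the sample mean.
    theta = [sum(p[j] for p in points) / len(points) for j in range(dim)]
    for _ in range(n_iter):
        # Inverse-distance weights; the floor avoids division by zero
        # if the iterate lands exactly on a data point.
        weights = [1.0 / max(math.dist(p, theta), 1e-12) for p in points]
        total = sum(weights)
        new_theta = [sum(w * p[j] for w, p in zip(weights, points)) / total
                     for j in range(dim)]
        shift = math.dist(new_theta, theta)
        theta = new_theta
        if shift < tol:
            break
    return theta


def mean_of_sphere_projections(points, theta):
    """Average of (x_i - theta) / ||x_i - theta||_2 over the data."""
    dim = len(theta)
    total = [0.0] * dim
    for p in points:
        dist = math.dist(p, theta)
        for j in range(dim):
            total[j] += (p[j] - theta[j]) / dist
    return [t / len(points) for t in total]


# Hypothetical toy data: four corners of a square plus one far outlier.
data = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0), (100.0, 100.0)]
center = geometric_median(data)
residual = mean_of_sphere_projections(data, center)
```

For this data the sample mean is (20.8, 20.8), far outside the square, while the geometric median stays inside it. The outlier still shifts the estimate somewhat toward itself, in line with the remark that outliers keep some influence.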

$L^1$ M-estimate for Cornea Data:
Sample Mean, M-estimate
Definite improvement
But outliers still have some influence
Improvement? (will suggest one soon)

Now have robust measure of center, how about spread?
I.e. how can we do robust PCA?

Approaches to Robust PCA:
1. Robust Estimation of Covariance Matrix
2. Projection Pursuit
3. Spherical PCA

Robust PCA 1: Robust Estimation of Covariance Matrix
A. Component-wise Robust Covariances:
  Major problem: Hard to get non-negative definiteness
B. Minimum Volume Ellipsoid: Rousseeuw & Leroy (2005)
  Requires $n > d$ (in available software)
  Needed for simple definition of affine invariant

Important Aside

Classical Approach to HDLSS (High Dimension, Low Sample Size) data: "Don't have enough data for analysis, get more"
Unworkable (and getting worse) for many modern settings:
  Medical Imaging (e.g. Cornea Data)
  Micro-arrays & gene expression
  Chemometric spectra data

Robust PCA 2: Projection Pursuit
Idea: focus on finding direction of greatest variability
Reference: Li and Chen (1985)
Problems:
  Robust estimates of spread are nonlinear
  Results in many local optima
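A crude 2-d illustration of the projection pursuit idea follows, written in Python with made-up data. It is a sketch under simple assumptions and not the method of Li and Chen (1985): directions are searched on a grid of angles, and the robust spread is taken to be the median absolute deviation (MAD). All names are hypothetical. Passing the ordinary standard deviation instead gives the classical PCA direction for comparison.

```python
import math
import statistics


def mad(values):
    """Median absolute deviation from the median (a robust spread)."""
    center = statistics.median(values)
    return statistics.median(abs(v - center) for v in values)


def projection_pursuit_direction(points, spread, n_angles=180):
    """Grid search over 2-d unit vectors for the largest projected spread."""
    best_dir, best_val = None, -1.0
    for k in range(n_angles):
        angle = math.pi * k / n_angles
        u = (math.cos(angle), math.sin(angle))
        val = spread([x * u[0] + y * u[1] for x, y in points])
        if val > best_val:
            best_dir, best_val = u, val
    return best_dir


# Hypothetical toy data: spread along the horizontal axis, one vertical outlier.
data = [(float(x), 0.5 if x % 2 else -0.5) for x in range(-5, 6)] + [(0.0, 50.0)]

robust_dir = projection_pursuit_direction(data, mad)
classical_dir = projection_pursuit_direction(data, statistics.pstdev)
```

The standard deviation is pulled to the outlier's direction, while the MAD-based search stays close to the horizontal spread of the bulk of the data. A grid search is only feasible in very low dimension. In high dimension the objective has to be optimized numerically, which is where the many local optima become a problem.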

Robust PCA 3: Spherical PCA