R documentation of package SCUBA January 21, 2011


  • Contents

    Background

    Information about SCUBA

    SCUBA-package

    Description of implemented functions

    displayKMeans

    displaySpectClust

    plotEigenvalues

    PlotOutData

    plotPairsEigenvectors

    saveAllFilesForRun

    scuba

    spectClust

    ZoomArea

    Index

  • Background

    The analysis of data from micro-array experiments is a major problem in countless biological situations. For example, a common need is to cluster cancer samples using gene expression profiles (see, for example, Chapter 9 of Wit and McClure [2004]). Typically, gene expression values are recorded in a matrix (table) W whose elements are non-negative numbers; the entry in the i-th row and j-th column of W, say $W_{ij}$, represents the activity of gene i in sample j. Given the information in W, one might want to classify (or cluster) the samples, or the genes, according to how “similar” they are to each other, where similarity is measured in terms of the activity of the genes across the samples.

    Because of the large number of genes usually involved in such analyses, W has very many elements. As a consequence, manipulating, visualising or clustering micro-array data becomes a very hard problem. Common tools for overcoming this problem are Spectral Clustering, Multidimensional Scaling (MDS) and Principal Component Analysis (PCA); the main idea behind all of these methods is to reduce the dimension of the data.

    Our software is based on Spectral Clustering (cf. Higham et al. [2005], von Luxburg [2007]); more details and references for this technique are given in the next paragraph. Micro-array data are, however, well known to be noisy, so it is important to know how this noise affects the classifications produced by Spectral Clustering. The program SCUBA (Spectral Clustering Uncertainty via Bayesian Analysis) uses techniques from Bayesian Statistics to give the user a measure of how the uncertainty in the micro-array data affects Spectral Clustering.

    The algorithm of Spectral Clustering applied to a gene-sample data matrix, as described in Higham et al. [2005] and Higham et al. [2007], is based on the Singular Value Decomposition (SVD) of a scaled version of the gene-sample matrix W. Roughly speaking, the idea behind the algorithm is to represent the data in a low-dimensional space in such a way that Euclidean distances in the low-dimensional space are inversely related to the similarities derived from the data matrix W, so that similar genes (or samples) are placed close together. This is achieved by using the coordinates of the singular vectors of the scaled version of W. In this respect, Spectral Clustering closely resembles MDS and PCA: all three methods use the spectral decomposition (or the SVD) of a distance or similarity matrix associated with the data, and thus all three provide low-rank approximations of such a distance (or similarity) matrix derived from W. (In fact, it can be proved that Spectral Clustering is mathematically equivalent to a scaled version of MDS; cf. Bavaud [2006].)
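    The SVD-based construction described above can be sketched in a few lines of base R. This is a minimal illustration, not SCUBA's code: it assumes the row/column-sum normalisation $W_{sc} = D_1^{-1/2} W D_2^{-1/2}$ (with $D_1$, $D_2$ the diagonal matrices of gene and sample totals), which is common for gene-sample spectral clustering; the exact scaling used by SCUBA is described in the cited references. The helper name spectral_sketch is hypothetical.

```r
# Minimal sketch of SVD-based spectral clustering of a gene-sample matrix W.
# ASSUMPTION: W is scaled as D1^{-1/2} W D2^{-1/2}, where D1 and D2 hold the
# row (gene) and column (sample) sums; illustrative only, not SCUBA code.
spectral_sketch <- function(W, k = 2) {
  stopifnot(is.matrix(W), all(W >= 0),
            all(rowSums(W) > 0), all(colSums(W) > 0),
            k + 1 <= min(dim(W)))
  r  <- rowSums(W)                    # diagonal of D1 (gene totals)
  cs <- colSums(W)                    # diagonal of D2 (sample totals)
  Wsc <- W / sqrt(outer(r, cs))       # entry (i, j) is W[i, j] / sqrt(r_i * c_j)
  s <- svd(Wsc)
  # Under this scaling the leading singular value is 1 and its rescaled
  # singular vectors are constant, so coordinates use vectors 2, ..., k + 1,
  # rescaled by D^{-1/2}.
  idx <- 2:(k + 1)
  list(values  = s$d,
       genes   = s$u[, idx, drop = FALSE] / sqrt(r),
       samples = s$v[, idx, drop = FALSE] / sqrt(cs))
}
```

    For example, res <- spectral_sketch(W) followed by plot(res$samples) gives a 2D picture of the samples comparable in spirit to the low-dimensional configurations discussed here.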

    Oh and Raftery [2001] introduced the idea of using Markov Chain Monte Carlo (MCMC) simulations together with MDS. Given pairwise distances between some data, the aim of their approach was to take into account possible uncertainties, or inaccuracies, in the measurement of these distances and, based on this, to find an optimal low-dimensional configuration of the data. SCUBA uses some of these ideas (cf. Hurn et al. [2011]) and applies MCMC techniques to Spectral Clustering in order to provide a tool for visualising the effects of the noise in gene-expression data. As output, SCUBA plots a low-dimensional configuration of points, obtained by applying Spectral Clustering to the gene-expression data, together with “clouds” around each point. These “clouds” are a visual representation of the uncertainty in Spectral Clustering induced by the noise in the micro-array data.

  • Bibliography

    F. Bavaud. Spectral clustering and multidimensional scaling: a unified view. Proceedings of the IFCS 2006 Conference: “Data Science and Classification”, Ljubljana, Slovenia, July 25 - 29, 2006.

    D. J. Higham, G. Kalna, and J. K. Vass. Analysis of the singular value decomposition as a tool for

    processing microarray expression data. Algoritmy, pages 250–259, 2005.

    D. J. Higham, G. Kalna, and M. Kibble. Spectral clustering and its use in bioinformatics. Journal of

    Computational and Applied Mathematics, 204(1), 2007.

    M. A. Hurn, S. Shaw, A. Spence, and Z. Stoyanov. Spectral Clustering Uncertainty via Bayesian Analysis (SCUBA): Report. University of Bath, 2011.

    Man-Suk Oh and A. E. Raftery. Bayesian multidimensional scaling and choice of dimension. Journal

    of the American Statistical Association, 96(455):1031–1044, 2001.

    U. von Luxburg. A tutorial on spectral clustering. Stat. Comput., 17:395–416, 2007.

    E. Wit and J. McClure. Statistics for Microarrays. Wiley, 2004.


  • Information about SCUBA

    SCUBA-package Spectral Clustering Uncertainty via Bayesian Analysis

    Description

    The package provides functionality for performing an uncertainty analysis of Spectral Clustering on a data matrix, and implements various plot functions for representing the results.

    Details

    Package: SCUBA

    Type: Package

    Version: 1.0

    Date: 2011-01-20

    License: GPL-3

    The main function is scuba. This function applies MCMC techniques to Spectral Clustering in the following way: it takes the low-dimensional configuration computed by Spectral Clustering as the initial configuration for an MCMC algorithm, which updates that configuration (and other parameters) iteratively. The function scuba calls a Fortran subroutine, which performs the MCMC uncertainty algorithm. The package also provides various functions for performing Spectral Clustering on large data matrices (see, for example, the function spectClust). It was originally developed to analyse gene-sample matrices obtained from micro-array experiments, but it can easily be adapted to data from other applications, as long as the data can be represented as a rectangular matrix with non-negative elements. It also provides several functions for visualising the results as 2D plots (see, for example, the functions displaySpectClust, displayKMeans, scuba and PlotOutData). For more details about the main function of the package, scuba, and about the package in general, please see our report, Spectral Clustering Uncertainty via Bayesian Analysis (SCUBA): Report.

    Author(s)

    Merrilee Hurn, Simon Shaw, Alastair Spence, Zhivko Stoyanov, Ivelina Stoyanova

    Maintainer: Zhivko Stoyanov, Ivelina Stoyanova

    References

    1. F. Bavaud Spectral clustering and multidimensional scaling: a unified view, Proceedings of the IFCS 2006 Conference, 2006.

    2. S. Butler Spectral Graph Theory: Three common spectra, Nankai University, Tianjin, 2006.

    3. D. J. Higham, G. Kalna and J. K. Vass Analysis of the singular value decomposition as a tool

    for processing microarray expression data, Algoritmy, pages 250–259, 2005.

    4. D. J. Higham, G. Kalna and M. Kibble Spectral Clustering and its use in Bioinformatics,

    Journal of Computational and Applied Mathematics, 204(1), 2007.

    5. M. A. Hurn, S. Shaw, A. Spence and Z. V. Stoyanov Spectral Clustering Uncertainty via

    Bayesian Analysis (SCUBA): Report, University of Bath, 2011.

    6. P. Grindrod, D. J. Higham, G. Kalna, A. Spence, Z. V. Stoyanov and J. K. Vass DNA Meets

    the SVD, Mathematics today, 2008.

    7. Man-Suk Oh and A. E. Raftery Bayesian Multidimensional Scaling and Choice of dimension,

    Journal of the American Statistical Association, 96(455), 2001.

    8. R. Sibson Studies in the Robustness of Multidimensional Scaling: Procrustes Statistics, J. R.

    Statist. Soc. B, 40, No. 2, pp. 234-238, 1978.

    9. E. Wit and J. McClure Statistics for Microarrays: Design, Analysis and Inference, Wiley,

    2004.

    See Also

    scuba, spectClust, displaySpectClust, displayKMeans, PlotOutData

  • Description of implemented functions

    displayKMeans Displays clustering of data from the K-means algorithm

    Description

    The function performs K-means clustering on the low-dimensional representation of a data matrix obtained by spectral clustering.

    Usage

    displayKMeans(X, q="samples", nclust = 2, d = 2)

    Arguments

    X A rectangular matrix of non-negative elements: rows of X represent genes and

    its columns are samples.

    q Either "samples" or "genes". Shows whether clustering is performed on

    samples (columns) or genes (rows).

    nclust Number of clusters. This specifies the number of points (centres) to be selected

    on the plot.

    d The dimension of the low-dimensional representation; the first d dimensions are used.

    Details

    Given a data matrix X and the desired number of clusters nclust, the function plots the low-dimensional representation obtained by spectral clustering and lets the user select the centres of the clusters interactively. It then performs K-means clustering using the algorithm implemented in R’s package stats.

    The function thus provides a graphical enhancement to the K-means clustering function (kmeans in package stats).
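    The non-interactive core of displayKMeans can be sketched as K-means started from given centres. In this sketch the coordinates and centres are passed in directly (kmeans_from_centres is a hypothetical helper); in an interactive session the centres could be collected by clicking with graphics::locator(nclust), which is what displayKMeans does in spirit, though its internals are not shown in this manual.

```r
# Sketch: K-means on spectral coordinates, started from user-chosen centres.
# 'coords' (one row per sample) and 'centres' (one row per cluster) are
# hypothetical inputs; displayKMeans obtains the centres by mouse clicks.
kmeans_from_centres <- function(coords, centres, d = 2) {
  coords  <- as.matrix(coords)[, seq_len(d), drop = FALSE]   # first d dimensions
  centres <- as.matrix(centres)[, seq_len(d), drop = FALSE]
  fit <- stats::kmeans(coords, centers = centres)
  fit$cluster                                              # cluster label per row
}
```

    Supplying the centres as a matrix fixes both the number of clusters and their initial positions, so the labels follow the order in which the centres were chosen.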

    Author(s)

    Zhivko Stoyanov, Ivelina Stoyanova

    See Also

    spectClust

    Examples

    # using the preloaded data matrix Leukemia38

    displayKMeans(Leukemia38)

    # varying the number of clusters

    displayKMeans(Leukemia38,nclust=3)

    displaySpectClust Displays results of the spectral clustering algorithm

    Description

    The function plots the results of spectral clustering on either the "samples" or the "genes". To do so, it uses selected singular vectors of Xsc, a scaled version of the matrix X.

    Usage

    displaySpectClust(X, d=2, q="samples", keepFirst = FALSE)


    Arguments

    X A rectangular matrix of non-negative elements: rows of X represent genes and

    its columns are samples.

    d Either 1 or 2. The plot is d-dimensional, i.e. it uses the first d eigenvectors. If d = 1, the first eigenvector is plotted against its indices.

    q Either "samples" or "genes". Shows whether clustering is performed on

    samples (columns) or genes (rows).

    keepFirst Either TRUE or FALSE. It is set to TRUE if the user wants to keep the first eigenvalue (equal to 1) and its corresponding eigenvector, and FALSE if the first eigenvalue and eigenvector are to be ignored.

    Details

    The function plots the results of spectral clustering on either the "samples" (columns) or the "genes" (rows), using selected singular vectors of Xsc.

    The function goes through the following stages:

    1. Displays a d-dimensional plot of the result of spectral clustering and, next to it, a plot of the eigenvalues.

    2. Lets the user select an eigenvalue interactively from the plot and obtains its index. Displays 2-dimensional plots of all pairs of eigenvectors up to the index of the selected eigenvalue.

    3. Lets the user select the subplot of a particular pair of eigenvectors of interest.

    4. Provides command line options for going back to step 1 or step 3, or exiting the program.

    Author(s)

    Zhivko Stoyanov, Ivelina Stoyanova

    Examples

    # testing on some random matrix

    displaySpectClust(matrix(runif(8000), nrow = 80, ncol = 100), 2, "samples")


    #testing on the preloaded data matrix Leukemia38

    displaySpectClust(Leukemia38,2,"genes")

    # using the Zachary Karate Club set

    displayKMeans(ZKC,nclust=2)

    plotEigenvalues Plots eigenvalues against their indices

    Description

    The function displays the eigenvalues obtained by spectral clustering.

    Usage

    plotEigenvalues(data_matrix = NULL, Evals = NULL,

    keepFirst = TRUE,

    selecting = FALSE)

    Arguments

    data_matrix The data matrix to be analysed and its eigenvalues plotted. It can be NULL if

    Evals is specified. In this case Evals contains the eigenvalues.

    Evals The eigenvalues to be plotted. If this is provided, the eigenvalues are used directly, i.e. they are not computed.

    keepFirst Either TRUE or FALSE (default TRUE). It is set to TRUE if the user wants to

    keep the first eigenvalue (equal to 1) and its corresponding eigenvector. FALSE

    if the first eigenvalue and eigenvector are to be ignored.

    selecting Either TRUE or FALSE (default is FALSE). If TRUE, the user is prompted to select an eigenvalue by clicking on the plot, and the index of the selected eigenvalue is returned.


    Details

    The function displays the eigenvalues obtained by spectral clustering. If data_matrix is specified, the function calculates the eigenvalues using spectral clustering. If Evals is provided with the pre-calculated eigenvalues, they are used directly.

    Optionally, the first eigenvalue and eigenvector are omitted, as they are not relevant for the clustering problem. The function can also optionally return the index of the interactively selected eigenvalue.

    Author(s)

    Zhivko Stoyanov, Ivelina Stoyanova

    See Also

    spectClust, plotPairsEigenvectors

    Examples

    # the preloaded data matrix Leukemia38 is used

    # first eigenvalue ignored

    # allows selection of an eigenvalue from the plot

    plotEigenvalues(data_matrix = Leukemia38, keepFirst = FALSE, selecting = TRUE)

    PlotOutData Plots the result of an MCMC uncertainty algorithm

    Description

    The main plotting function. It performs setup for plotting and calls the relevant plotting functions.


    Usage

    PlotOutData(Xin, XinS, XoutR, XoutS, use_p,

    cols=NULL,

    first=2, second=NULL,

    scaleOutData = FALSE, zoomOutData = FALSE)

    Arguments

    Xin Preprocessed input data matrix obtained from spectral clustering. This is the

    initial configuration for the MCMC algorithm.

    XinS Preprocessed input data matrix, scaled.

    XoutR Postprocessed (rotated) output matrix from the MCMC algorithm.

    XoutS Postprocessed (rotated and scaled) output matrix (a set of configurations).

    use_p The dimension of the output configuration. This count includes the first eigenvector, which is ignored in spectral clustering.

    first The number of the eigenvector which is used for plotting. This defines the first coordinate in the low-dimensional representation obtained from spectral clustering.

    second The number of the other eigenvector used for plotting. If NULL, the first eigenvector is plotted against its indices.

    scaleOutData

    To be set to TRUE if the scaled result is required.

    zoomOutData To be set to TRUE to zoom in on certain area(s) of the plot, centred at a selected point. The magnification factor for zooming is user-defined (entered on the command line).

    Details

    This function is called within the main function scuba, but it can also be called directly with data obtained by an independent method not implemented in the package.
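    PlotOutData works with "rotated" output configurations, and the package references include Sibson's study of Procrustes statistics. A common way to make sampled configurations comparable with the initial configuration Xin is an orthogonal Procrustes alignment. The sketch below assumes that SCUBA's postprocessing is of this kind, which this manual does not confirm; procrustes_rotate is a hypothetical helper.

```r
# Sketch: orthogonal Procrustes alignment of a configuration Y onto a target X
# (both n x p, rows are points). ASSUMPTION: SCUBA's "rotated" output is
# obtained by a Procrustes-type alignment; this is not the package's code.
procrustes_rotate <- function(Y, X) {
  stopifnot(all(dim(Y) == dim(X)))
  Xc <- scale(X, center = TRUE, scale = FALSE)   # centre the target
  Yc <- scale(Y, center = TRUE, scale = FALSE)   # centre the configuration
  s <- svd(t(Yc) %*% Xc)
  R <- s$u %*% t(s$v)   # orthogonal matrix minimising ||Yc %*% R - Xc||
  # rotate (or reflect) Y, then move it to the centroid of X
  Yc %*% R + matrix(attr(Xc, "scaled:center"), nrow(Y), ncol(Y), byrow = TRUE)
}
```

    Applying such an alignment to every sampled configuration before plotting removes the arbitrary rotations and reflections of the low-dimensional representation, so that the remaining spread of each point forms the "cloud".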


    Author(s)

    Zhivko Stoyanov, Ivelina Stoyanova

    See Also

    ZoomArea

    plotPairsEigenvectors Plots pairs of eigenvectors

    Description

    Displays all pairs of eigenvectors for a given matrix up to a specified index.

    Usage

    plotPairsEigenvectors(S, evalNumber, keepFirst = TRUE, selecting = FALSE)

    Arguments

    S The matrix containing the eigenvectors, as returned by spectral clustering.

    evalNumber The pairs of the first evalNumber eigenvectors will be plotted.

    keepFirst Either TRUE or FALSE (default TRUE). It is set to TRUE if the user wants to keep the first eigenvalue (equal to 1) and its corresponding eigenvector, and FALSE if the first eigenvalue and eigenvector are to be ignored.

    selecting Either TRUE or FALSE. If TRUE, the user is prompted to select a subplot, which is then displayed enlarged.


    Details

    This function displays all pairs of eigenvectors for a given matrix up to a specified index, evalNumber, which can be selected using plotEigenvalues. A subplot of a particular pair can then be selected and displayed on its own.

    Author(s)

    Zhivko Stoyanov, Ivelina Stoyanova

    See Also

    plotEigenvalues

    Examples

    # example for using the preloaded data matrix Leukemia38

    # first, we get the matrix of the eigenvectors

    # then plot the pairs of eigenvectors

    sc

    saveAllFilesForRun

    Usage

    saveAllFilesForRun(runName, outsideCall = TRUE)

    Arguments

    runName This argument can be set to a character string which is the name of the current

    run of the program (with the current data matrix and parameters). All output

    files with the result from the run will have their filenames prefixed by the name

    of the run. This allows the user to distinguish between results from different

    runs.

    outsideCall TRUE if this is an independent call from outside the main function of the program. If FALSE, the function is called from inside the main function and the working directory does not need to be changed.

    Details

    The main MCMC algorithm uses Fortran and C subroutines which read and write data from and to predefined files. Consequently, each run of the program overwrites these files and the previous results are lost.

    This function makes it possible to name a run (i.e. a run with specific data and/or parameters); the output filenames are then prefixed with the name of the run. For example, if we run the program with the matrix Leukemia38 and save the result as run 'L38', and then run the program with the matrix ColonAlon and save it as run 'CA', we can compare the results from the two runs, stored in the files L38-xs.out and CA-xs.out.
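    The naming scheme can be illustrated with a small sketch that copies fixed-name output files to run-prefixed names. Only the "<run>-xs.out" pattern comes from the example above; the helper save_run_files, its arguments and the choice to copy (rather than rename) the files are assumptions of this sketch, not the behaviour of saveAllFilesForRun.

```r
# Sketch of run naming: copy fixed-name output files to "<runName>-<file>"
# so that the next run, which rewrites the fixed names, cannot overwrite them.
# ASSUMPTION: save_run_files and its file list are hypothetical.
save_run_files <- function(runName, files, dir = ".") {
  stopifnot(is.character(runName), length(runName) == 1, nzchar(runName))
  src  <- file.path(dir, files)
  dest <- file.path(dir, paste0(runName, "-", files))
  ok <- file.copy(src, dest, overwrite = TRUE)   # TRUE for each file copied
  stats::setNames(ok, dest)
}
```

    Copying keeps the original files in place for any code that still expects the predefined names; renaming would instead leave nothing behind under those names.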

    Author(s)

    Zhivko Stoyanov, Ivelina Stoyanova

    Examples

    # after finishing a run that was not named, save its results,

    # e.g. because they would take considerable time to recompute

    saveAllFilesForRun(runName="someRun", outsideCall=TRUE)

    scuba Display results of the MCMC uncertainty algorithm

    Description

    This is the main function of the SCUBA package. It performs Spectral Clustering of a data matrix and uses the result as the initial configuration of an MCMC algorithm (written in Fortran and C), which updates that configuration (and other parameters of the model) iteratively. The output of the function is a 2D plot, on which “clouds” serve as a visual representation of the noise in Spectral Clustering induced by uncertainty in the micro-array data.

    Usage

    scuba(filename = NULL, data_matrix = NULL,

    nburn = 1000, nsamp = 1000,

    usePreviousResultFilename = NULL,

    scaleOutData = FALSE, zoomOutData = FALSE,

    runName = NULL)

    Arguments

    filename Name of file containing a data matrix. It can be NULL if data_matrix is

    specified instead.

    data_matrix Preloaded data matrix. It can be NULL if data are loaded from a file.

    use_p Dimension of result.

    seed This is the seed used by the MCMC algorithm.

    nsweep The number of iterations performed by the MCMC algorithm.


    nburn The number of burn-in iterations used in the MCMC algorithm.

    nsamp The number of samples to be output from the MCMC algorithm; it should therefore be a positive integer less than nsweep-nburn. This parameter becomes necessary when nsweep, the total number of iterations performed by the MCMC algorithm, is “too large”: the configurations output by the MCMC algorithm would then be too big for R to handle. Choosing nsamp < nsweep-nburn thus selects fewer samples to plot, spread uniformly over the iterations.

    usePreviousResultFilename

    If this is specified, a file with results from a previous run is used and the call to the MCMC algorithm is omitted. This saves time in repeated analyses of a large data matrix.

    scaleOutData To be set to TRUE if the scaled result needs to be plotted.

    zoomOutData To be set to TRUE if it is required to perform some zooming on the plot with the

    result.

    runName This argument can be set to a character string which is the name of the current

    run of the program (with the current data matrix and parameters). All output

    files with the result from the run will have their filenames prefixed by the name

    of the run. This allows the user to distinguish between results from different

    runs.
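    The role of nburn and nsamp can be made concrete by computing which iterations would be kept. This sketch assumes evenly spaced thinning of the post-burn-in iterations, matching the "uniformly, with respect to the number of the iteration" description; the exact rule inside the Fortran code may differ, and thin_indices is a hypothetical helper.

```r
# Sketch: choose which post-burn-in MCMC iterations to keep when only nsamp
# samples are stored. ASSUMPTION: evenly spaced thinning; SCUBA's exact rule
# is not documented here. thin_indices is a hypothetical helper.
thin_indices <- function(nsweep, nburn, nsamp) {
  stopifnot(nburn >= 0, nsamp >= 1, nsamp <= nsweep - nburn)
  # nsamp points spread evenly from the first to the last post-burn-in sweep
  kept <- seq(from = nburn + 1, to = nsweep, length.out = nsamp)
  # the spacing is at least 1, so rounding keeps the indices distinct
  as.integer(round(kept))
}
```

    For instance, with nsweep = 3000, nburn = 1000 and nsamp = 1000, roughly every second post-burn-in iteration is kept, halving the size of the output passed back to R.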

    Details

    This is the main function of the SCUBA package. It applies the MCMC uncertainty algorithm to the output of Spectral Clustering as follows: it takes the low-dimensional configuration computed by Spectral Clustering as the initial configuration for an MCMC algorithm, which updates that configuration (and other parameters of the Bayesian model) iteratively. The function scuba calls a Fortran subroutine (via CallMCMCFortran), which performs the MCMC uncertainty algorithm. The Fortran file containing the MCMC uncertainty algorithm is positive.f; experienced Fortran users may edit positive.f, and in this way adapt the MCMC algorithm to their own needs. The random number generators used by the MCMC algorithm are those of R, called from the C file rng.c. The output of the function is a 2D plot, on which “clouds” serve as a visual representation of the noise in Spectral Clustering induced by uncertainty in the micro-array data. Various plotting options are available (for more details, see the function PlotOutData). The function also provides the option of zooming in on particular regions of the 2D plot (see the description of the function ZoomArea and the argument zoomOutData).

    Author(s)

    Zhivko Stoyanov, Ivelina Stoyanova

    References

    1. D. J. Higham, G. Kalna and J. K. Vass Analysis of the singular value decomposition as a tool

    for processing microarray expression data, Algoritmy, pages 250–259, 2005.

    2. D. J. Higham, G. Kalna and M. Kibble Spectral Clustering and its use in Bioinformatics,

    Journal of Computational and Applied Mathematics, 204(1), 2007.

    3. M. A. Hurn, S. Shaw, A. Spence and Z. V. Stoyanov Spectral Clustering Uncertainty via

    Bayesian Analysis (SCUBA): Report, University of Bath, 2011.

    4. P. Grindrod, D. J. Higham, G. Kalna, A. Spence, Z. V. Stoyanov and J. K. Vass DNA Meets

    the SVD, Mathematics today, 2008.

    5. Man-Suk Oh and A. E. Raftery Bayesian Multidimensional Scaling and Choice of dimension,

    Journal of the American Statistical Association, 96(455), 2001.

    6. R. Sibson Studies in the Robustness of Multidimensional Scaling: Procrustes Statistics, J. R.

    Statist. Soc. B, 40, No. 2, pp. 234-238, 1978.

    7. E. Wit and J. McClure Statistics for Microarrays: Design, Analysis and Inference, Wiley,

    2004.

    Examples

    scuba(data_matrix = Leukemia38, zoomOutData = TRUE)

    scuba(data_matrix = Leukemia38, runName = "TestRun")

    spectClust Spectral clustering algorithm


    Description

    The function returns the singular values and singular vectors of a scaled version of the input matrix.

    Usage

    spectClust(X, econ = 0, keepFirst=FALSE)

    Arguments

    X A rectangular matrix of non-negative elements: rows of X represent genes and

    its columns are samples

    econ Either 0, 1 or 2 (default 0). Controls whether economy of calculation and output is applied. Suppose $X_{sc} = U S V^T$ is the SVD of the scaled matrix. If econ is 0, both matrices U and V are output; if 1, only matrix U; if 2, only matrix V (V is calculated first, so a larger value of econ means fewer calculations).

    keepFirst Either TRUE or FALSE. Shows whether the first eigenvalue and corresponding

    eigenvector need to be kept or not.

    Details

    The input matrix X is a gene-sample matrix, whose entries are non-negative and represent the

    activity of a given gene in a particular sample. We assume that the i-th row of X represents the

    activity of the i-th gene, and the j-th column of X gives the activity of all the genes in the j-th

    sample.

    The algorithm finds the spectrum (SVD) of a scaled version of the matrix X, that is $X_{sc} = U S V^T$, and returns a list of:

    1. the array of eigenvalues;

    2. the matrix U, whose columns are the eigenvectors of $X_{sc} X_{sc}^T$; and/or

    3. the matrix V, whose columns are the eigenvectors of $X_{sc}^T X_{sc}$.

    For more details see the references below.
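    The saving offered by econ can be illustrated with base R's svd(), whose nu and nv arguments control how many left and right singular vectors are computed. The sketch assumes Xsc is already the scaled matrix and maps econ to (U, V), U only, or V only as in the argument description; svd_econ is a hypothetical helper, not spectClust itself.

```r
# Sketch of the 'econ' option with base R's svd(): request only the singular
# vectors that are needed. ASSUMPTION: Xsc is already the scaled matrix and
# svd_econ is a hypothetical helper, not spectClust itself.
svd_econ <- function(Xsc, econ = 0) {
  stopifnot(econ %in% 0:2)
  k  <- min(dim(Xsc))
  nu <- if (econ %in% c(0, 1)) k else 0   # left singular vectors U
  nv <- if (econ %in% c(0, 2)) k else 0   # right singular vectors V
  s  <- svd(Xsc, nu = nu, nv = nv)
  s[c("d", if (nu > 0) "u", if (nv > 0) "v")]
}
```

    For a genes-by-samples matrix with many more rows than columns, econ = 2 avoids forming the large gene-side matrix U altogether.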


    Author(s)

    Zhivko Stoyanov, Ivelina Stoyanova

    References

    1. D. J. Higham, G. Kalna and J. K. Vass Analysis of the singular value decomposition as a tool

    for processing microarray expression data, Algoritmy, pages 250–259, 2005.

    2. D. J. Higham, G. Kalna and M. Kibble Spectral Clustering and its use in Bioinformatics,

    Journal of Computational and Applied Mathematics, 204(1), 2007.

    3. Peter Grindrod, Desmond J. Higham, Gabriela Kalna, Alastair Spence, Zhivko Stoyanov and

    Keith Vass DNA Meets the SVD, Mathematics today, 2008.

    Examples

    spectClust(matrix(runif(8000), nrow = 80, ncol = 100), 0)

    spectClust(Leukemia38)

    ZoomArea Zooms the plot centered at a selected point

    Description

    Allows the user to zoom in on a particular region of a given plot, using a user-defined magnification factor and centre point.

    Usage

    ZoomArea(Xin, Xout, use_p=3,

    cols=NULL,

    first = 2, second = NULL,

    resultType = "rotated", zoom = NULL)


    Arguments

    Xin Preprocessed input data matrix obtained from spectral clustering. This is the

    initial configuration for the MCMC uncertainty algorithm.

    Xout Postprocessed (rotated and/or scaled) output matrix (a set of configurations).

    use_p The dimension of the output configuration. This includes the first eigenvector

    which is ignored in spectral clustering.

    first The number of the eigenvector which is used for plotting. This defines the first coordinate in the low-dimensional representation obtained from spectral clustering.

    second The number of the other eigenvector used for plotting. If NULL, the first eigenvector is plotted against its indices.

    resultType The type of the result to be plotted; it is needed for correct labelling of the axes. The default is "rotated".

    zoom The magnification factor for zooming. If NULL, the user is prompted to enter a magnification factor on the command line.

    Details

    Allows the user to zoom in on a particular region of a plot, centred at a selected point. The magnification factor is user-defined (entered on the command line).

    A click on the zoomed plot returns to the original (unzoomed) plot.
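    One simple way to realise such a zoom is to keep the selected point at the centre and shrink the current axis ranges by the magnification factor. The sketch below computes the new axis limits under that assumption; ZoomArea's internal rule is not documented here, and zoom_limits is a hypothetical helper.

```r
# Sketch: axis limits for zooming a plot by factor 'zoom' about a centre point.
# ASSUMPTION: the centre stays fixed and both axis ranges shrink by 'zoom';
# zoom_limits is a hypothetical helper, not ZoomArea itself.
zoom_limits <- function(xlim, ylim, centre, zoom) {
  stopifnot(zoom > 0, length(centre) == 2)
  hx <- diff(xlim) / (2 * zoom)   # half-width of the zoomed window
  hy <- diff(ylim) / (2 * zoom)   # half-height of the zoomed window
  list(xlim = centre[1] + c(-hx, hx), ylim = centre[2] + c(-hy, hy))
}
```

    In an interactive session the centre could come from p <- graphics::locator(1), after which the data would be redrawn with plot(..., xlim = z$xlim, ylim = z$ylim) for z <- zoom_limits(par("usr")[1:2], par("usr")[3:4], c(p$x, p$y), zoom).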

    Author(s)

    Zhivko Stoyanov, Ivelina Stoyanova

  • Index

    ∗Topic Gene-expression data: plotEigenvalues, PlotOutData, plotPairsEigenvectors, saveAllFilesForRun, ZoomArea

    ∗Topic MCMC: plotEigenvalues, PlotOutData, plotPairsEigenvectors, saveAllFilesForRun, scuba, SCUBA-package, ZoomArea

    ∗Topic MDS: displaySpectClust, spectClust

    ∗Topic PCA: displaySpectClust, spectClust

    ∗Topic SVD: displayKMeans, displaySpectClust, spectClust

    ∗Topic Spectral Clustering: displayKMeans, displaySpectClust, plotEigenvalues, PlotOutData, plotPairsEigenvectors, saveAllFilesForRun, scuba, SCUBA-package, spectClust, ZoomArea

    ∗Topic gene-expression data: displayKMeans, displaySpectClust, scuba, SCUBA-package, spectClust

    ∗Topic k-means: displayKMeans

    Background

    CallMCMCFortran

    Description of implemented functions

    displayKMeans

    displaySpectClust

    Information about SCUBA

    plotEigenvalues

    PlotOutData

    plotPairsEigenvectors

    saveAllFilesForRun

    SCUBA (SCUBA-package)

    scuba

    SCUBA-package

    spectClust

    ZoomArea
