Introduction to caArray

caBIG® Molecular Analysis Tools

Knowledge Center

April 3, 2011

caArray Overview

• More than a simple repository for microarray data.
• Supports data management throughout the life of an experiment.
• Allows collaborative sharing of pre-publication data with partners.
• Provides data to other biomedical/clinical tools to form a comprehensive solution for array data management, search, and analysis.

Why use caArray?

• Target Users:
  • Bench scientists performing microarray data collection and annotation
  • Microarray core facility scientists and technicians
  • Bioinformatics and data management coordinators
  • Multi-institutional data coordinating center informaticians

• Addressing Critical Needs:
  • Manage all aspects of array data: raw data, derived data, sample annotation, and experimental design
  • Ensure data remain private (in a local instance) until published
  • Support array data sharing using a federated model
  • Find what you are looking for fast: query annotated data within, and across, datasets
  • Facilitate data integration: provide annotated data to other analytical caBIG® tools

Key Functions of caArray

• Query annotated data within and across datasets with search and navigation features

• Upload of array files in industry-standard formats (e.g., Affymetrix, GenePix, Illumina, Agilent)

• Annotation of data to harmonize datasets and reduce the time needed to aggregate data

• MAGE-TAB import and export functionality

• GEO SOFT export functionality

• Security and authentication features that include group-based permissions

• Provision of annotated data to other caBIG® tools that support data analysis

• Rich programmatic APIs that allow analytical tools (on and off the Grid) to pull data from caArray and visualize or analyze it

Web Interface: Find Things Fast

• User-friendly web interface for browsing and searching

Platform Support: Growing Toward All-Inclusive Coverage

• caArray includes most available Affymetrix, Illumina, and Agilent array platforms/designs, so most native data files can be stored, parsed, and associated with samples.

Parsed Data Formats: the More, the Better for Users

• MAGE-TAB format
• Agilent raw TXT for aCGH, expression, and miRNA assays
• Agilent GEML/XML array designs
• NimbleGen Pair Report TXT (raw and normalized)
• NimbleGen NDF array designs
• Illumina CSV
• Illumina Sample Probe Profile TXT
• Illumina genotyping processed data matrix TXT
• Illumina BGX/TXT array designs
• Affymetrix CEL and CHP in AGCC/Calvin formats, in addition to the GCOS formats
• Affymetrix CNCHP copy number data (CN4 and CN5)
• Copy number data in a prescribed MAGE-TAB Data Matrix format

MAGE-TAB: Save Time on Sample Annotation

• IDF (Investigation Description Format) and SDRF (Sample and Data Relationship Format)
• Excel-like format with controlled vocabulary
• http://www.mged.org/mage-tab/
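To illustrate what the "Excel-like" SDRF layout looks like in practice, here is a minimal Python sketch. It reads a tiny SDRF fragment with the standard `csv` module and links each sample to its raw data file. The column headers follow MAGE-TAB conventions. The sample values, file names, and the helper `samples_to_data_files` are hypothetical. This is not caArray's own importer.

```python
import csv
import io

# Hypothetical, minimal SDRF fragment (tab-delimited). Real SDRF files usually
# carry many more columns; these sample and file names are made up.
SDRF_TEXT = (
    "Source Name\tCharacteristics[OrganismPart]\tHybridization Name\tArray Data File\n"
    "sample_01\tliver\thyb_01\tsample_01.CEL\n"
    "sample_02\tkidney\thyb_02\tsample_02.CEL\n"
)


def samples_to_data_files(sdrf_text):
    """Map each Source Name to the list of Array Data File entries on its rows."""
    reader = csv.DictReader(io.StringIO(sdrf_text), delimiter="\t")
    mapping = {}
    for row in reader:
        # A sample can appear on several rows (e.g., several hybridizations),
        # so collect all of its data files in a list.
        mapping.setdefault(row["Source Name"], []).append(row["Array Data File"])
    return mapping


if __name__ == "__main__":
    for sample, files in samples_to_data_files(SDRF_TEXT).items():
        print(sample, "->", ", ".join(files))
```

Real SDRF files often repeat a header (for example, several "Protocol REF" columns). `csv.DictReader` would silently keep only the last of those, so a fuller parser would need to work on the raw header list instead.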

Data Management: Loading Data

Data Management: Sample Annotation and Datasets

Data Export: Zip, MAGE-TAB, or GEO SOFT

Collaboration and Data Sharing

• Investigators define collaboration groups for sharing of pre-publication data with a set of partners.

• Access control at the experiment level or at individual samples.

• Data is private until made public by the Data Owner.

Data Analysis: Tool Integration

Diagram (not reproduced): caArray supplies gene expression data, gene expression and SNP data, and gene expression and copy number data to integrated analysis tools, with cross-query over many caArray instances.

A Glance at the Technology

• Tool Platform:
  • Enterprise web-based system that works within a Firefox or Internet Explorer browser

• CBIIT-Hosted Installation of caArray:
  • Only limited computer skills are required to use the application, which is aimed at laboratory researchers

• Local Installation of caArray:
  • Moderate technical expertise is required to install the tool

• Upgrade Availability:
  • To make upgrades as seamless as possible, an upgrade installer, available in both GUI and command-line formats, upgrades an installed caArray instance while maintaining data integrity.

The Next Step: Accessing Online Resources for caArray

Molecular Analysis Tools Knowledge Center

https://wiki.nci.nih.gov/x/R5GNAg

caArray User Forum https://cabig-kc.nci.nih.gov/Molecular/forums/viewforum.php?f=6

Tool Landing Page https://cabig.nci.nih.gov/tools/caArray

Access to Demo caArray Instance

https://array-train.nci.nih.gov/caarray/home.action (register from that site for a training account)

Application Support Email: ncicb@pop.nci.nih.gov

Phone: 301-451-4384

Toll-free: 888-478-4423