Reproducible Workflows with Jupyter Notebook and Cytoscape
Keiichiro Ono
Cytoscape Core Developer Team
UC San Diego, Trey Ideker Lab / National Resource for Network Biology
5/19/2016 Advanced Cytoscape Workshop
Course Materials: Clone/Fork/Download this repository!
https://github.com/idekerlab/tsri-lecture
Setup Guide:
https://github.com/idekerlab/tsri-lecture/blob/master/documents/Setup%20Guide.pdf
Cytoscape 3.4.0:
http://www.cytoscape.org/download.php
Keiichiro Ono
Cytoscape Core Developer since 2005 @UCSD Trey Ideker Lab
Area of Interest: Biological Data Integration & Visualization
Agenda
• Reproducible Analysis & Visualization
• Introduction to Jupyter Notebook
• Creating reproducible network visualization workflows with Python
Review
- Network analysis and visualization is a powerful method for gaining biological insights from your screening results
- Cytoscape is the de facto standard tool for this type of analysis
- Core features of Cytoscape
  - Navigation (Pan/Zoom/Select)
  - Network / Table Data Import
  - Automatic Layout
  - Visual Style
Creating Visualizations in Cytoscape
Name Type
BRCA1 gene
MAP2K1 gene
C05981 compound
• Mapping from Type to Node Shape
• Mapping from Type to Node Color
[Figure: nodes C05981, BRCA1, and MAP2K1 drawn with shape and color mapped from Type]
Creating mappings from data points to Visual Properties
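A discrete mapping is a lookup table from the values in one column (here "Type") to values of one visual property. The sketch below builds the two mappings from the slide as Python dictionaries in the JSON shape we believe cyREST uses for visual styles (the `mappingType`, `mappingColumn`, `mappingColumnType`, `visualProperty`, and `map` keys should be checked against the cyREST wiki). The helper names, style title, shapes, and colors are hypothetical illustrative choices, not the author's.

```python
import json


def discrete_mapping(column, visual_property, key_to_value, column_type="String"):
    """Build one discrete (lookup-table) mapping in the assumed cyREST style JSON shape."""
    return {
        "mappingType": "discrete",
        "mappingColumn": column,
        "mappingColumnType": column_type,
        "visualProperty": visual_property,
        "map": [{"key": key, "value": value} for key, value in key_to_value.items()],
    }


def resolve(mapping, data_value, default=None):
    """Return the visual value a discrete mapping assigns to one data value."""
    table = {entry["key"]: entry["value"] for entry in mapping["map"]}
    return table.get(data_value, default)


# Hypothetical shapes and colors for the two values found in the "Type" column.
shape_mapping = discrete_mapping("Type", "NODE_SHAPE", {"gene": "ELLIPSE", "compound": "DIAMOND"})
color_mapping = discrete_mapping("Type", "NODE_FILL_COLOR", {"gene": "#1F77B4", "compound": "#FF7F0E"})

# Hypothetical style title; the whole dict is what would be sent to cyREST as JSON.
style = {"title": "Type Style", "mappings": [shape_mapping, color_mapping]}

if __name__ == "__main__":
    rows = {"BRCA1": "gene", "MAP2K1": "gene", "C05981": "compound"}
    for name, node_type in rows.items():
        print(name, resolve(shape_mapping, node_type), resolve(color_mapping, node_type))
    print(json.dumps(style, indent=2))
```

Keeping the mapping as data rather than as GUI clicks means the same style can be versioned and re-applied; the exact cyREST endpoint for posting a style is best confirmed in the cyREST documentation for your Cytoscape version.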
http://www.the-scientist.com/?articles.view/articleNo/43632/title/Get-With-the-Program/
https://theconversation.com/how-computers-broke-science-and-what-we-can-do-to-fix-it-49938
http://www.nature.com/nature/journal/v483/n7391/full/483531a.html
Reproducibility…it’s a known issue
Problems
- Reproducibility of biological research, especially for in vivo/in vitro experiments, is a hard problem
- But this is true even for in silico analysis!
  - OS version
  - Revision of scripts
  - Data analysis software versions
  - Version of data files
  - Command line parameters written on a paper napkin
  - “Black magic” only a grad student knows
- This is something we need to fix, using the latest technologies and best practices
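Several of the in silico pitfalls listed above (OS version, software versions, data file versions, command line parameters) can be captured automatically at the start of each run. A minimal standard-library sketch; `record_environment` and its fields are a hypothetical helper, not part of any tool in this talk:

```python
import hashlib
import json
import platform
import sys
from importlib import metadata


def sha256_of(path, chunk_size=1 << 16):
    """Fingerprint a data file so later runs can detect a changed version."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def package_versions(names):
    """Installed version of each named package, or None if it is not installed."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def record_environment(data_files=(), packages=(), params=None):
    """Hypothetical helper: collect provenance for one analysis run."""
    return {
        "os": platform.platform(),                  # OS name and version
        "python": platform.python_version(),        # interpreter version
        "argv": list(sys.argv),                     # command line, instead of a napkin
        "packages": package_versions(packages),     # analysis library versions
        "data_files": {path: sha256_of(path) for path in data_files},
        "params": dict(params or {}),               # analysis parameters
    }


if __name__ == "__main__":
    env = record_environment(packages=["pandas", "networkx"], params={"threshold": 0.05})
    print(json.dumps(env, indent=2))
```

Saving this dictionary next to the results (or printing it in the first notebook cell) gives a later reader the facts that would otherwise live only in someone's memory.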
Data Preparation
- Cleansing
- Normalization
- Missing values
- Corrupted values
- Reformat
- Conversion
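The cleansing and normalization steps above can be sketched in a few lines. This standard-library example assumes a hypothetical expression column `expr` stored as text; missing values ("", "NA") and corrupted values (non-numeric or infinite) are dropped, and the remaining values are z-score normalized. Dropping rows is only one choice; imputation would be another.

```python
import math
import statistics

MISSING = {"", "NA", "NaN", "nan"}


def parse_value(raw):
    """Return a float, or None for a missing or corrupted entry."""
    if raw is None:
        return None
    text = str(raw).strip()
    if text in MISSING:
        return None  # missing value
    try:
        value = float(text)
    except ValueError:
        return None  # corrupted value: not a number
    return value if math.isfinite(value) else None


def clean_and_normalize(rows, column):
    """Drop rows whose `column` is missing/corrupted, then add a z-scored copy of it."""
    kept = []
    for row in rows:
        value = parse_value(row.get(column))
        if value is not None:
            kept.append({**row, column: value})  # conversion: text -> float
    if not kept:
        return []
    values = [row[column] for row in kept]
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    for row in kept:
        row[column + "_z"] = (row[column] - mean) / sd if sd > 0 else 0.0
    return kept


if __name__ == "__main__":
    # Hypothetical screening values.
    raw = [
        {"gene": "BRCA1", "expr": "2.0"},
        {"gene": "MAP2K1", "expr": "4.0"},
        {"gene": "KRAS", "expr": ""},
        {"gene": "TP53", "expr": "n/a?"},
    ]
    for row in clean_and_normalize(raw, "expr"):
        print(row)
```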
Analysis
- Filtering
- Standard graph statistics
- Density
- Betweenness centrality
- Clustering
- Community Detection
- GO enrichment analysis
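Two of the standard graph statistics above, density and betweenness centrality, plus a simple degree filter, sketched with the standard library on a graph stored as `{node: set_of_neighbours}`. Betweenness uses Brandes' algorithm for unweighted, undirected graphs without normalization; its values should agree with NetworkX's `betweenness_centrality(G, normalized=False)`, which in practice is the more convenient route. Clustering, community detection, and GO enrichment are out of scope here.

```python
from collections import deque


def density(adj):
    """Fraction of possible undirected edges that are present: 2m / (n(n-1))."""
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) / 2
    return 0.0 if n < 2 else 2 * m / (n * (n - 1))


def betweenness(adj):
    """Brandes' algorithm for an unweighted, undirected graph (unnormalized)."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        stack, preds = [], {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies in order of decreasing distance from s.
        delta = {v: 0.0 for v in adj}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each undirected pair was counted once from each endpoint.
    return {v: c / 2 for v, c in bc.items()}


def filter_by_degree(adj, min_degree):
    """Induced subgraph on nodes whose degree is at least `min_degree`."""
    keep = {v for v, nbrs in adj.items() if len(nbrs) >= min_degree}
    return {v: adj[v] & keep for v in keep}


if __name__ == "__main__":
    # Hypothetical toy network.
    g = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
    print(density(g), betweenness(g))
```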
Visualization
- Mapping
  - Data points to visual variables
- Layout
  - For graphs:
    - Force-directed
    - Tree
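To make "force-directed" concrete: nodes repel each other, edges pull their endpoints together, and positions are updated until the system settles. Below is a minimal Fruchterman-Reingold style sketch (repulsion k²/d, attraction d²/k, moves capped by a cooling "temperature"), without the frame clamping of the original algorithm. It only illustrates the idea; in a workflow you would normally apply one of Cytoscape's built-in layouts instead. The function name and parameters are our own.

```python
import math
import random


def force_directed_layout(adj, iterations=100, size=1.0, seed=0):
    """Minimal Fruchterman-Reingold layout for {node: set_of_neighbours}."""
    rng = random.Random(seed)
    nodes = list(adj)
    pos = {v: [rng.uniform(0.0, size), rng.uniform(0.0, size)] for v in nodes}
    k = size / math.sqrt(max(len(nodes), 1))  # ideal distance between nodes
    edges, seen = [], set()
    for u in nodes:
        for v in adj[u]:
            key = frozenset((u, v))
            if u != v and key not in seen:  # keep each undirected edge once
                seen.add(key)
                edges.append((u, v))
    temperature = size / 10.0  # maximum move per iteration
    for _ in range(iterations):
        disp = {v: [0.0, 0.0] for v in nodes}
        # Every pair of nodes repels with force k^2 / d.
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
                d = math.hypot(dx, dy) or 1e-9
                f = k * k / d
                disp[u][0] += dx / d * f
                disp[u][1] += dy / d * f
                disp[v][0] -= dx / d * f
                disp[v][1] -= dy / d * f
        # Connected nodes attract with force d^2 / k.
        for u, v in edges:
            dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
            d = math.hypot(dx, dy) or 1e-9
            f = d * d / k
            disp[u][0] -= dx / d * f
            disp[u][1] -= dy / d * f
            disp[v][0] += dx / d * f
            disp[v][1] += dy / d * f
        # Move each node along its net force, capped by the temperature.
        for v in nodes:
            length = math.hypot(*disp[v])
            if length > 0:
                step = min(length, temperature)
                pos[v][0] += disp[v][0] / length * step
                pos[v][1] += disp[v][1] / length * step
        temperature *= 0.95  # simple geometric cooling
    return {v: (x, y) for v, (x, y) in pos.items()}


if __name__ == "__main__":
    print(force_directed_layout({"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}))
```

Fixing the random seed makes the layout itself reproducible, which is the same concern this talk raises for the rest of the workflow.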
Language-Agnostic
- From the next version (4.x), IPython Notebook will be implemented as part of Jupyter
- You can switch to other language kernels
- In this lecture, we will use Python, but you can use the language of your choice to control Cytoscape
Question
• Cytoscape is a desktop application
• Point & click GUI operation
• Easy to use, but how can we make our workflow reproducible?
What is cyREST?
- Platform-independent, RESTful API module for Cytoscape
- Means you can access basic Cytoscape data objects programmatically
- Now it’s a Cytoscape Core feature!
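Accessing Cytoscape data objects from Python can be as small as one HTTP GET. A minimal standard-library sketch, assuming cyREST is listening on its default port 1234 and that `GET /v1/networks` returns a list of network SUIDs (check both against the cyREST wiki for your version); py2cytoscape wraps calls like this in a higher-level API.

```python
import json
import urllib.request

# Assumption: cyREST listens on its default port (1234) on this machine.
CYREST_BASE = "http://localhost:1234/v1"


def cyrest_url(path, base=CYREST_BASE):
    """Join the cyREST base URL and a resource path such as 'networks'."""
    return base.rstrip("/") + "/" + path.lstrip("/")


def get_json(path, base=CYREST_BASE, timeout=10):
    """Send a GET request to cyREST and decode the JSON response body."""
    with urllib.request.urlopen(cyrest_url(path, base), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Needs a running Cytoscape with cyREST enabled.
    suids = get_json("networks")  # assumed: a list of network SUIDs
    print("Networks:", suids)
```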
[Figure: cyREST architecture. Cytoscape Desktop (Core, Apps, REST) on your workstation, accessed over REST from interactive data analysis environments (IPython Notebook: NumPy, SciPy, Pandas, NetworkX; RStudio: igraph, rCurl), command line tools (sed, awk, grep, curl), and web browsers, and linked via the Internet ("data bus") to in-house databases, external computing resources (graph layout, statistical analysis, data pre-processing), file/code hosting services, public data repositories, PSICQUIC services, the EBI RDF Platform, other bioinformatics web applications/services, data repository & collaboration services, and the Cytoscape App Store.]
curl http://mygene.info/v2/query?q=kras
{
  "hits": [
    { "taxid": 9606, "entrezgene": 3845, "symbol": "KRAS", "_id": "3845", "name": "Kirsten rat sarcoma viral oncogene homolog" },
    { "taxid": 10090, "entrezgene": 16653, "symbol": "Kras", "_id": "16653", "name": "Kirsten rat sarcoma viral oncogene homolog" },
    { "taxid": 10116, "entrezgene": 24525, "symbol": "Kras", "_id": "24525", "name": "Kirsten rat sarcoma viral oncogene" },
    { "taxid": 10090, "entrezgene": 110836, "symbol": "Kras2-rs2", "_id": "110836", "name": "Kirsten rat sarcoma oncogene 2, related sequence 2" },
    { "taxid": 10090, "entrezgene": 110832, "symbol": "Kras2-rs1", "_id": "110832", "name": "Kirsten rat sarcoma oncogene 2, related sequence 1" },
    { "taxid": 10090, "entrezgene": 111117, "symbol": "Kras1-ps", "_id": "111117", "name": "Kirsten rat sarcoma oncogene 1, pseudogene" }
  ],
  "max_score": 391.5175,
  "took": 4,
  "total": 6
}
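The same query from Python: the response is plain JSON, so it can be decoded and filtered directly. This sketch builds the URL used in the curl example and keeps only hits for one taxon (9606 is human). It runs on an abbreviated copy of the response above (first two hits only), so no network access is needed; `query_url` and `hits_for_taxon` are hypothetical helper names.

```python
import json
import urllib.parse
import urllib.request

# Abbreviated copy of the mygene.info response shown above (first two hits only).
SAMPLE = (
    '{"hits": ['
    '{"taxid": 9606, "entrezgene": 3845, "symbol": "KRAS", "_id": "3845",'
    ' "name": "Kirsten rat sarcoma viral oncogene homolog"},'
    '{"taxid": 10090, "entrezgene": 16653, "symbol": "Kras", "_id": "16653",'
    ' "name": "Kirsten rat sarcoma viral oncogene homolog"}'
    '], "max_score": 391.5175, "took": 4, "total": 6}'
)


def query_url(term):
    """URL of the query used in the curl example above."""
    return "http://mygene.info/v2/query?" + urllib.parse.urlencode({"q": term})


def fetch(term, timeout=10):
    """Live request; requires network access to mygene.info."""
    with urllib.request.urlopen(query_url(term), timeout=timeout) as response:
        return response.read().decode("utf-8")


def hits_for_taxon(response_text, taxid):
    """(symbol, Entrez Gene ID) pairs for the hits belonging to one taxon."""
    data = json.loads(response_text)
    return [(h["symbol"], h["entrezgene"]) for h in data.get("hits", []) if h.get("taxid") == taxid]


if __name__ == "__main__":
    print(hits_for_taxon(SAMPLE, 9606))  # [('KRAS', 3845)]
```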
Mapping Cytoscape API to HTTP Methods
Cytoscape Operation   HTTP Method
Create                POST
Read                  GET
Update                PUT
Delete                DELETE
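In code, the CRUD-to-HTTP mapping above becomes a choice of request method. This sketch builds (but does not send) `urllib.request.Request` objects for each operation. The example URL assumes cyREST's `/v1/networks/{SUID}` resource on the default port, the SUID 52 is hypothetical, and any payload you pass is up to the caller.

```python
import json
import urllib.request

# CRUD operations mapped to HTTP methods, following the mapping above.
CRUD_TO_HTTP = {"create": "POST", "read": "GET", "update": "PUT", "delete": "DELETE"}


def build_request(operation, url, payload=None):
    """Build (but do not send) a request for one Cytoscape CRUD operation."""
    method = CRUD_TO_HTTP[operation.lower()]
    data, headers = None, {}
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")  # JSON request body
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(url, data=data, headers=headers, method=method)


if __name__ == "__main__":
    # Hypothetical network SUID 52; sending this would delete that network.
    req = build_request("delete", "http://localhost:1234/v1/networks/52")
    print(req.get_method(), req.full_url)
    # To send it (requires a running Cytoscape): urllib.request.urlopen(req)
```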
REST
- Lab notebook to record your workflow
- Make Cytoscape controllable via scripts
- Manage multiple versions of your notebooks and other scripts
Missing: Environment to execute your workflow
[Figure: Docker stack. A bare metal machine running an OS (Linux) and Docker, which hosts several containers, each with its own frameworks and application.]
What is Docker?
- Container to run applications in an isolated environment
- Application = Layer of images
- Sharable Environments
- Environments as code
Docker Hub
- Sharing environments as code!
- Dockerfile - Definition of your container
- “GitHub of Images”
Further Readings
• My presentation slides
• http://www.slideshare.net/keiono
• cyREST websites
• http://apps.cytoscape.org/apps/cyrest
• https://github.com/idekerlab/cyREST/wiki
• py2cytoscape
• https://github.com/idekerlab/py2cytoscape