CIPRes in Kepler: An integrative workflow package for streamlining phylogenetic data analyses

Zhijie Guan 1, Alex Borchers 1, Timothy McPhillips 2, Shirley Cohen 3, Mark A. Miller 1, Ilkay Altintas 1

1 San Diego Supercomputer Center, UCSD
2 University of California, Davis
3 University of Pennsylvania

biology.sdsc.edu

What is a Scientific Workflow?

A combination of data integration, analysis, and visualization steps into a larger, automated "scientific process"

Mission of scientific workflow systems
• Promote "scientific discovery" by providing tools and methods to generate scientific workflows
• Create an extensible and customizable graphical user interface for scientists from different scientific domains
• Support computational experiment creation, execution, sharing, reuse, and provenance
• Design frameworks that define efficient ways to connect to existing data and integrate heterogeneous data from multiple resources

Make technology useful through the user's monitor!

Promoter Identification Workflow

Source: Matt Coleman (LLNL)

A Workflow for Phylogeny Analysis

Kepler is a Scientific Workflow System

… and a cross-project collaboration

June 2, 2006: Beta release

www.kepler-project.org

Ptolemy II: A software system used for prototyping engineering systems

KEPLER: A platform to design and execute scientific workflows

KEPLER = “Ptolemy II + X” for Scientific Workflows

Builds upon the open-source Ptolemy II framework

Some Kepler Contributors

Ptolemy II

Resurgence

Griddles

SRB

LOOKING

SKIDL

NLADR

Contributor names and funding information are on the Kepler website.

Other contributors:
- Cheshire (UK Text Mining Center)
- DART (Great Barrier Reef, Australia)
- National Digital Archives + UCSD-TV (US)
- …

A co-development in KEPLER: GEON Dataset Generation & Registration

SQL database access (JDBC)

% Makefile
$> ant run

Phylogeny Analysis Workflows

Local Disk

Multiple Sequence Alignment

Phylogeny Analysis

Tree Visualization

Kepler Workflow: Actors

Actor
• Encapsulation of parameterized actions
• Interface defined by ports and parameters

Port
• Communication between input and output data
• The place where data get in/out

Model of computation
• Flow of control
• Sequential / parallel execution
• Implementation is a framework

Actor-Oriented Design
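The actor, port, and model-of-computation ideas above can be sketched in a few lines of Python. This is only an illustration of actor-oriented design, not Kepler's or Ptolemy II's actual Java API: the class names `Port`, `Actor`, and `SequentialDirector`, and the toy "scale" and "offset" steps, are all hypothetical.

```python
from collections import deque


class Port:
    """A named endpoint where tokens enter or leave an actor."""

    def __init__(self, name):
        self.name = name
        self.queue = deque()    # tokens waiting on an input port
        self.receivers = []     # input ports fed by this output port

    def connect(self, other):
        self.receivers.append(other)

    def send(self, token):
        for port in self.receivers:
            port.queue.append(token)


class Actor:
    """Encapsulates a parameterized action behind ports and parameters."""

    def __init__(self, name, action, **parameters):
        self.name = name
        self.action = action
        self.parameters = parameters
        self.input = Port(name + ".in")
        self.output = Port(name + ".out")

    def fire(self):
        # Consume one input token, apply the action, emit the result.
        token = self.input.queue.popleft()
        self.output.send(self.action(token, **self.parameters))


class SequentialDirector:
    """Model of computation: fires actors one at a time, in the given order."""

    def __init__(self, actors):
        self.actors = actors

    def run(self):
        for actor in self.actors:
            while actor.input.queue:
                actor.fire()


# Hypothetical three-step workflow: scale a number, offset it, collect it.
results = []
scale = Actor("scale", lambda x, factor: x * factor, factor=3)
offset = Actor("offset", lambda x, amount: x + amount, amount=1)
sink = Actor("sink", lambda x: results.append(x))

scale.output.connect(offset.input)
offset.output.connect(sink.input)

scale.input.queue.append(2)
SequentialDirector([scale, offset, sink]).run()
print(results)
```

The actors know nothing about execution order; swapping `SequentialDirector` for a parallel scheduler would change the flow of control without touching the actors, which is the point of keeping the model of computation separate.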

CIPRes Workflow: Actors

Input Port:
• Nexus File Content

Output Ports:
• Data Matrix
• Tree
• Taxa Info
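As a rough illustration of what such an actor does, the sketch below splits NEXUS file content into its named blocks so that the taxa, data matrix, and tree parts can be routed to separate outputs. It is not the CIPRes actor's implementation: the function name `split_nexus` and the sample data are hypothetical, and the regular expression assumes that no block body contains the word "end" followed by a semicolon.

```python
import re

# Matches "BEGIN <name>; ... END;" blocks, case-insensitively.
BLOCK_PATTERN = re.compile(
    r"begin\s+(\w+)\s*;(.*?)\bend(?:block)?\s*;",
    re.IGNORECASE | re.DOTALL,
)


def split_nexus(content):
    """Return a dict mapping lower-cased block names to their raw bodies."""
    if not content.lstrip().upper().startswith("#NEXUS"):
        raise ValueError("content does not start with #NEXUS")
    return {
        name.lower(): body.strip()
        for name, body in BLOCK_PATTERN.findall(content)
    }


# Hypothetical input standing in for the "Nexus File Content" port.
sample = """#NEXUS
BEGIN TAXA;
  DIMENSIONS NTAX=3;
  TAXLABELS A B C;
END;
BEGIN CHARACTERS;
  DIMENSIONS NCHAR=4;
  FORMAT DATATYPE=DNA;
  MATRIX
    A ACGT
    B ACGA
    C ACTT
  ;
END;
BEGIN TREES;
  TREE t1 = ((A,B),C);
END;
"""

blocks = split_nexus(sample)
taxa_info = blocks["taxa"]            # would go to the Taxa Info port
data_matrix = blocks["characters"]    # would go to the Data Matrix port
tree = blocks["trees"]                # would go to the Tree port
print(tree)
```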

Some actors in place for…
• Generic Web Service Client and Web Service Harvester
• Customizable RDBMS query and update
• Command Line wrapper tools (local, ssh, scp, ftp, etc.)
• Some Grid actors: Globus Job Runner, GridFTP-based file access, Proxy Certificate Generator
• SRB support
• Native R and Matlab support
• Interaction with Nimrod and APST
• Communication with ORBs through actors and services
• Imaging, Gridding, Vis Support
• Textual and Graphical Output
• …more generic and domain-oriented actors…
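A command-line wrapper of the kind listed above boils down to running an external program and passing its output on as data. The sketch below shows the idea with Python's standard `subprocess` module; the helper name `run_command` is hypothetical, and a Python one-liner stands in for a real analysis tool so the example runs anywhere.

```python
import subprocess
import sys


def run_command(args, stdin_text=""):
    """Run an external program and return its standard output as text."""
    completed = subprocess.run(
        args,
        input=stdin_text,
        capture_output=True,
        text=True,
        check=True,   # raise CalledProcessError on a non-zero exit status
    )
    return completed.stdout


# Stand-in for a real tool: a Python one-liner that upper-cases its input.
stand_in_tool = [
    sys.executable,
    "-c",
    "import sys; print(sys.stdin.read().upper(), end='')",
]
output = run_command(stand_in_tool, stdin_text="acgt\n")
print(output)
```

Wrapping a real program would mean replacing `stand_in_tool` with that program's own command line, which is not shown here.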

CIPRes Workflow

[Figure: annotated CIPRes workflow]

Step labels:
• Choose the input file
• Run ClustalW
• Get the subset of the aligned sequences
• Run PAUP for Tree Inference
• Read the tree
• Parse the tree
• Display the tree

Legend:
• Actor
• Channel: Convey the data
• GUIGen: Parameter Setting
• Results
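The last three steps in the workflow (read the tree, parse the tree, display the tree) can be illustrated with a small Newick parser. This is a minimal sketch, not the code behind the CIPRes actors: `parse_newick` and `render` are hypothetical names, the sample tree is made up, and quoted labels and comments are not handled.

```python
def parse_newick(text):
    """Parse a Newick string into nested (label, children) tuples."""
    text = text.strip()
    if not text.endswith(";"):
        raise ValueError("Newick string must end with ';'")
    pos = 0

    def parse_node():
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1                      # consume "("
            children.append(parse_node())
            while text[pos] == ",":
                pos += 1                  # consume ","
                children.append(parse_node())
            if text[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1                      # consume ")"
        start = pos
        while text[pos] not in ",();":
            pos += 1
        # Keep the label and drop any ":branch_length" suffix.
        label = text[start:pos].split(":")[0].strip()
        return (label, children)

    tree = parse_node()
    if text[pos] != ";":
        raise ValueError(f"unexpected character at position {pos}")
    return tree


def render(node, depth=0):
    """Return the tree as indented lines; unlabeled nodes print as '*'."""
    label, children = node
    lines = ["  " * depth + (label or "*")]
    for child in children:
        lines.extend(render(child, depth + 1))
    return lines


parsed = parse_newick("((A:0.1,B:0.2)AB:0.3,C);")
lines = render(parsed)
print("\n".join(lines))
```

An indented text listing is the simplest possible display; a graphical tree viewer would consume the same nested structure.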

CIPRes Workflows: Demo

• Read Sequences
• Multiple Sequence Alignment
• Display the Alignment
• Matrix Alignment
• Tree Inference
• Consensus Tree
• Tree Visualization

Summary

Kepler is good at:
• Integrating data, programs, and computing resources
• Capturing your ideas and realizing them
• Supporting computational experiment creation, execution, sharing, and reuse
• Quickly prototyping scientific workflows
• Building streamlining applications

Visual programming language
• Don't write your application, "draw"/compose it

The Cipres-Kepler package can be used to build scientific workflows for phylogenetic data analyses

Future Work

Cipres-Kepler can help you

There is (always) a lot more to work on:
• More actors for phylogeny analyses
• Automatically generating actors based on CORBA services
• Database (TreeBase) support to store large amounts of data
• More computing power for large dataset processing

Need your collaboration:
• Sharing experiences
• Teaching each other the domain knowledge
• Locating a specific problem and solving it

Questions?

Zhijie [email protected]

Cipres-Kepler Release:

ftp://ftp.sdsc.edu/outgoing/borchers/cipresReleases/20060621/cipresKepler_Dist.tgz