JIP - PIPELINE SYSTEM ACCESSIBLE HIGH THROUGHPUT COMPUTING

JIP Pipeline System Introduction


DESCRIPTION

This talk covers some of the basic aspects of the JIP pipeline system (http://pyjip.readthedocs.org) and its command line interface. JIP is a system to manage jobs on a cluster and simplify the process of building computational pipelines. JIP can interact with Slurm, SGE, PBS/Torque, or LSF clusters and comes with a small local scheduler to run without any remote grid engine.


Page 1: JIP Pipeline System Introduction

JIP - PIPELINE SYSTEM
ACCESSIBLE HIGH THROUGHPUT COMPUTING

Page 2: JIP Pipeline System Introduction

WHY?
SERIOUSLY

• Job Management

• Implementation

• Batch job handling

• Reusable and…

• … documented tools

Page 3: JIP Pipeline System Introduction

LOCATIONS

PLEASE TAKE A LOOK

• Documentation http://pyjip.rtfd.org

• Source Code https://github.com/thasso/pyjip

• Examples https://github.com/thasso/pyjip/tree/master/examples

Page 4: JIP Pipeline System Introduction

CLI OR API

• Commands to run and submit jobs

• List and query jobs

• Manipulate jobs (delete, archive, cancel, edit,…)

• Cleanup jobs and list profiles and tools

• Start your own server

Page 5: JIP Pipeline System Introduction

Commands
========
    run       Locally run a jip script
    submit    submit a jip script to a remote cluster
    bash      Run or submit a bash command

List and query jobs
===================
    jobs      list and update jobs from the job database

Manipulate jobs
===============
    delete    delete the selected jobs
    archive   archive the selected jobs
    cancel    cancel selected and running jobs
    hold      put selected jobs on hold
    restart   restart selected jobs
    logs      show log files of jobs
    edit      edit job commands for a given job
    show      show job options and command for jobs

Miscellaneous
=============
    tools     list all tools available through the search paths
    profiles  list all available profiles
    clean     remove job logs
    check     check job status
    server    start the jip grid server


Page 6: JIP Pipeline System Introduction

HELLO WORLD

Let's get started

Page 7: JIP Pipeline System Introduction

HELLO WORLD

#!/usr/bin/env jip
# Prints hello world
echo "Hello world"

#!/usr/bin/env jip
# Prints hello world using perl
#%begin command perl
print "Hello world\n";
#%end

#!/usr/bin/env jip
#%begin command python
print "Hello world"
#%end

@pytool()
def hello_world():
    """Prints hello world"""
    print "Hello python"

Page 8: JIP Pipeline System Introduction

#%begin command [perl|RScript|…]

• command block to run scripts

• specify an interpreter (default bash)

• use templates to access options and variables

#%end
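The block syntax above can be illustrated with a small parser. This is only a sketch under stated assumptions: the regular expression and the `parse_command_blocks` helper are hypothetical and are not JIP's own parser. It shows how a command block and its optional interpreter (defaulting to bash, as the slide states) could be pulled out of a script.

```python
import re

# Hypothetical pattern for "#%begin command [interpreter] ... #%end".
# Illustration only; JIP's real parser may work differently.
COMMAND_BLOCK = re.compile(
    r"^#%begin[ \t]+command(?:[ \t]+(?P<interpreter>\S+))?[ \t]*\n"
    r"(?P<body>.*?)"
    r"^#%end[ \t]*$",
    re.MULTILINE | re.DOTALL,
)


def parse_command_blocks(script, default_interpreter="bash"):
    """Return a list of (interpreter, body) pairs found in the script text."""
    blocks = []
    for match in COMMAND_BLOCK.finditer(script):
        interpreter = match.group("interpreter") or default_interpreter
        blocks.append((interpreter, match.group("body").strip()))
    return blocks


PERL_SCRIPT = r'''#!/usr/bin/env jip
# Prints hello world using perl
#%begin command perl
print "Hello world\n";
#%end
'''

BASH_SCRIPT = "#%begin command\necho hi\n#%end\n"

perl_blocks = parse_command_blocks(PERL_SCRIPT)
bash_blocks = parse_command_blocks(BASH_SCRIPT)
```

With the perl script from the Hello World slide, the interpreter comes from the block header. With no interpreter given, the sketch falls back to bash.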

Page 9: JIP Pipeline System Introduction

OPTIONS AND DOCUMENTATION

• Options are specified in your documentation

• Specify Inputs, Outputs, and other Options

• Options are available as ${variables}

Page 10: JIP Pipeline System Introduction

OPTIONS AND DOCUMENTATION

#!/usr/bin/env jip
#
# BWA/Samtools pileup
#
# Usage:
#     pileup.jip -i <input> -r <reference> -o <output>
#
# Inputs:
#     -i, --input <input>          The input file
#     -r, --reference <reference>  The genomic reference
#
# Outputs:
#     -o, --output <output>        The .bcf output file
#
# Options:
#     --fast                       Enable fast mode
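To make the idea of "options specified in your documentation" concrete, here is a minimal sketch that reads the Inputs, Outputs, and Options sections of a header like the one above. The `parse_header` function and its regular expressions are assumptions made for illustration; they are not JIP's option parser and handle only the simple layout shown on the slide.

```python
import re

HEADER = """\
# Usage:
#     pileup.jip -i <input> -r <reference> -o <output>
#
# Inputs:
#     -i, --input <input>          The input file
#     -r, --reference <reference>  The genomic reference
#
# Outputs:
#     -o, --output <output>        The .bcf output file
#
# Options:
#     --fast                       Enable fast mode
"""

# Hypothetical patterns covering only the layout used on the slide.
SECTION_RE = re.compile(r"^(Inputs|Outputs|Options):\s*$")
OPTION_RE = re.compile(r"^(?:-(\w),\s+)?--([\w-]+)(?:\s+<(\w+)>)?")


def parse_header(header):
    """Map each long option name to its section, short flag, and value flag."""
    options = {}
    section = None
    for raw in header.splitlines():
        line = raw.lstrip("#").strip()
        section_match = SECTION_RE.match(line)
        if section_match:
            section = section_match.group(1).lower()
            continue
        option_match = OPTION_RE.match(line)
        if option_match and section:
            short, name, value = option_match.groups()
            options[name] = {
                "section": section,
                "short": short,
                "takes_value": value is not None,
            }
    return options


options = parse_header(HEADER)
```

Each parsed name (`input`, `reference`, `output`, `fast`) corresponds to a variable that a template could then reference as `${input}`, `${fast}`, and so on.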

Page 11: JIP Pipeline System Introduction

TEMPLATES AND VARIABLES

• Access variables and options ${variable}

• Apply filters:

• arg: ${bool|arg} ${file|arg(">")}

• pre / suf: ${input|suf(".txt")}

• name, ext, and abs: ${input|name|ext}
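Filter chains such as `${input|name|ext}` read left to right, each filter transforming the previous result. The sketch below models that with plain Python functions. The semantics are assumptions inferred from the filter names (`name` takes the base name, `ext` strips the last extension, `suf` and `pre` add a suffix or prefix); the `apply_filters` helper is hypothetical and is not how JIP renders templates.

```python
import posixpath


def name(path):
    """Assumed meaning of the name filter: keep only the file's base name."""
    return posixpath.basename(path)


def ext(path):
    """Assumed meaning of the ext filter: strip the last file extension."""
    return posixpath.splitext(path)[0]


def suf(value, suffix):
    """Assumed meaning of the suf filter: append a suffix."""
    return value + suffix


def pre(value, prefix):
    """Assumed meaning of the pre filter: prepend a prefix."""
    return prefix + value


def apply_filters(value, *filters):
    """Apply filters left to right; a tuple is (function, extra arguments...)."""
    for item in filters:
        if isinstance(item, tuple):
            func, *args = item
            value = func(value, *args)
        else:
            value = item(value)
    return value


# Rough equivalent of ${input|name|ext}.bw for a hypothetical input path
bigwig_name = apply_filters("/data/sample.txt", name, ext, (suf, ".bw"))
prefixed = apply_filters("/data/sample.txt", name, (pre, "out_"))
```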

Page 12: JIP Pipeline System Introduction

SINGLE TOOLS

• Inputs, Outputs, Options

• Phases:

• init: initialise the tool and its options

• setup: perform setup using the option values

• validate: check input files and options

• execute: execute through the interpreter
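The order of the phases can be sketched as a small driver that calls each one in turn. Everything here is hypothetical (the `CopyTool` class, its options, and `run_phases`); it only illustrates the init, setup, validate, execute sequence listed above and is not JIP's tool API.

```python
class CopyTool:
    """Hypothetical tool used only to show the order of the phases."""

    def __init__(self):
        self.options = {}
        self.phases = []

    def init(self):
        # init: declare the tool's options
        self.options = {"input": None, "output": None}
        self.phases.append("init")

    def setup(self):
        # setup: derive defaults from the option values
        if self.options["output"] is None and self.options["input"]:
            self.options["output"] = self.options["input"] + ".out"
        self.phases.append("setup")

    def validate(self):
        # validate: reject incomplete configurations before anything runs
        if not self.options["input"]:
            raise ValueError("input is required")
        self.phases.append("validate")

    def execute(self):
        # execute: produce the command that an interpreter would run
        self.phases.append("execute")
        return "cat {input} > {output}".format(**self.options)


def run_phases(tool, **values):
    """Drive a tool through init, setup, validate, and execute."""
    tool.init()
    tool.options.update(values)
    tool.setup()
    tool.validate()
    return tool.execute()


tool = CopyTool()
command = run_phases(tool, input="a.txt")
```

Validation runs before execution, so a tool without an input fails early instead of producing a broken command.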

Page 13: JIP Pipeline System Introduction

EXECUTION

• Check all inputs (dependency aware)

• Update the DB and run the command block

SUCCESS

• Update DB

FAILURE

• Remove output

• Update DB
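The execution steps on this slide can be sketched as follows. This is a simplified model, not JIP's code: the `run_job` function is hypothetical, a plain dict stands in for the job database, and the state names `running`, `done`, and `failed` are assumptions.

```python
import os
import subprocess
import sys
import tempfile


def run_job(job_id, argv, inputs, outputs, db):
    """Check inputs, run the command, and clean up outputs on failure."""
    # Check all inputs before doing anything else
    if any(not os.path.exists(path) for path in inputs):
        db[job_id] = "failed"
        return False

    # Update the DB and run the command
    db[job_id] = "running"
    returncode = subprocess.call(argv)

    if returncode == 0:
        # Success: update the DB
        db[job_id] = "done"
        return True

    # Failure: remove partial outputs, then update the DB
    for path in outputs:
        if os.path.exists(path):
            os.remove(path)
    db[job_id] = "failed"
    return False


# A command that writes its output file and then fails
workdir = tempfile.mkdtemp()
out_path = os.path.join(workdir, "result.txt")
failing = [sys.executable, "-c",
           "import sys; open(sys.argv[1], 'w').write('x'); sys.exit(1)",
           out_path]
db = {}
ok = run_job("job-1", failing, inputs=[], outputs=[out_path], db=db)
```

Removing the output after a failure prevents a downstream job from mistaking a partial file for a finished result.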

Page 14: JIP Pipeline System Introduction

GEM TO BED

#!/usr/bin/env jip
# Delegates to gem-2-bed to create BED graphs from .map files
#
# Usage:
#     gem2bed -i <input> -I <index>
#
# Inputs:
#     -i, --input <input>  The .map input file (can be compressed)
#     -I, --index <index>  The .gem index

#%begin init
add_output('graph', '${input|name|re("\.map(.gz)?", ".bg")}')
add_output('sizes', '${input|name|re("\.map(.gz)?", ".sizes")}')
#%end

zcat -f ${input} | \
    ${__file__|parent}/gem-2-bed blocks-coverage -I ${index} \
    -o ${graph|ext} -T $JIP_THREADS

Sections of the script, top to bottom: DOCUMENTATION, INITIALISATION, EXECUTION

Page 15: JIP Pipeline System Introduction

BED2BIGWIG

#!/usr/bin/env jip
# Delegates to gem-2-bed to create BED graphs from .map files
#
# Usage:
#     bed2wig -g <graph> -s <sizes> [-o <output>]
#
# Inputs:
#     -g, --graph <graph>    The graph file generated with gem-2-bed
#     -s, --sizes <sizes>    The sizes file generated with gem-2-wig
#
# Outputs:
#     -o, --output <output>  The output file name
#                            [default: ${graph|ext}.bw]

#%begin init
add_output('output', '${graph|name|ext}.bw')
#%end

#%begin setup
profile.threads = 1
#%end

${__file__|parent}/bedGraphToBigWig ${graph} ${sizes} ${output}

Page 16: JIP Pipeline System Introduction

PIPELINES

• Inputs, Outputs, Options

• Phases

• init, setup, validate

• create pipeline

Page 17: JIP Pipeline System Introduction

GEM2BIGWIG

#!/usr/bin/env jip
# Creates a bed graph from a .map file and converts it to wig
#
# Usage:
#     gem2wig -i <input> -I <index>
#
# Inputs:
#     -i, --input <input>  The .map input file (can be compressed)
#     -I, --index <index>  The .gem index

#%begin pipeline
bed = job(temp=True).run('gem2bed', input=input, index=index)
run('bed2wig', graph=bed.graph, sizes=bed.sizes)

Page 18: JIP Pipeline System Introduction

GEM2BIGWIG

The same gem2wig script as on the previous slide, annotated with its two sections: DOCUMENTATION (the header comments) and PIPELINE (the pipeline block).

Page 19: JIP Pipeline System Introduction

#%begin pipeline

bed = job(temp=True).run('gem2bed', input=input, index=index)

#%end

Page 20: JIP Pipeline System Introduction

#%begin pipeline

bed = job(temp=True).run('gem2bed', input=input, index=index)

run('bed2wig', graph=bed.graph, sizes=bed.sizes)

#%end
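Passing `bed.graph` and `bed.sizes` to `bed2wig` implies that `bed2wig` must run after `gem2bed`. The sketch below shows one way such an order could be derived from matching outputs to inputs. The `execution_order` function and the file names are hypothetical; this is a generic dependency sort, not JIP's pipeline implementation.

```python
def execution_order(jobs):
    """Order jobs so that every producer runs before its consumers.

    jobs maps a job name to {"inputs": [...], "outputs": [...]}.
    """
    # Which job produces which file
    producer = {out: job_name
                for job_name, job in jobs.items()
                for out in job["outputs"]}
    # A job depends on the producers of its inputs
    depends_on = {job_name: {producer[i] for i in job["inputs"] if i in producer}
                  for job_name, job in jobs.items()}

    order, done = [], set()

    def visit(job_name, stack=()):
        if job_name in done:
            return
        if job_name in stack:
            raise ValueError("cyclic dependency at " + job_name)
        for dependency in sorted(depends_on[job_name]):
            visit(dependency, stack + (job_name,))
        done.add(job_name)
        order.append(job_name)

    for job_name in sorted(jobs):
        visit(job_name)
    return order


# Hypothetical file names for the gem2bed / bed2wig pipeline
jobs = {
    "bed2wig": {"inputs": ["reads.bg", "reads.sizes"], "outputs": ["reads.bw"]},
    "gem2bed": {"inputs": ["reads.map.gz", "genome.gem"],
                "outputs": ["reads.bg", "reads.sizes"]},
}
order = execution_order(jobs)
```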

Page 21: JIP Pipeline System Introduction

DEMO

Page 22: JIP Pipeline System Introduction

MULTIPLEXING

STREAMS

Page 23: JIP Pipeline System Introduction

MULTIPLEXING AND STREAMS

BASH

echo "Hello World" | \
    (tee producer_out.txt | (tee >(wc -w) | wc -l))

JIP

bash('echo "Hello World"', output='producer_out.txt') \
    | (bash('wc -l') + bash('wc -w'))

JIP

producer = bash('echo "Hello World"', output='producer_out.txt')
word_count = bash("wc -w", input=producer)
line_count = bash("wc -l", input=producer)
producer | (word_count + line_count)
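The `|` and `+` notation in the JIP examples can be modelled with Python operator overloading. The sketch below is an assumption about how such a notation could be built, using hypothetical `Node` and `Group` classes; it only records the graph and runs nothing, and it is not JIP's implementation.

```python
class Group:
    """A set of nodes that should all receive the same stream."""

    def __init__(self, nodes):
        self.nodes = list(nodes)

    def __add__(self, other):
        extra = other.nodes if isinstance(other, Group) else [other]
        return Group(self.nodes + extra)


class Node:
    """A command in the graph; children receive this node's output."""

    def __init__(self, command):
        self.command = command
        self.children = []

    def __or__(self, other):
        # producer | consumer (or a group of consumers)
        targets = other.nodes if isinstance(other, Group) else [other]
        self.children.extend(targets)
        return other

    def __add__(self, other):
        # a + b groups two consumers so that one producer feeds both
        return Group([self]) + other


producer = Node('echo "Hello World"')
word_count = Node("wc -w")
line_count = Node("wc -l")
producer | (word_count + line_count)

consumers = [child.command for child in producer.children]
```

After the last expression, `producer.children` holds both consumers, which is the fan-out that the bash version expresses with `tee`.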

Page 24: JIP Pipeline System Introduction

Common Questions

Page 25: JIP Pipeline System Introduction

SUBMIT SINGLE COMMANDS

• The jip bash command wraps single executions

• You can run or submit

• Dry runs and multiplexing are supported

DEMO

Page 26: JIP Pipeline System Introduction

SUBMIT FOR MULTIPLE FILES

• Fan-Out operations work for all tools

• Define a single input option

• Specify multiple values

• Also works for the jip bash command
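A fan-out can be pictured as expanding one option with several values into one job per value. The sketch below illustrates that idea with a hypothetical `fan_out` function and made-up file names; it is not JIP's implementation.

```python
def fan_out(tool, options, fan_option):
    """Create one (tool, options) job per value of the fan-out option."""
    values = options[fan_option]
    if not isinstance(values, (list, tuple)):
        values = [values]
    jobs = []
    for value in values:
        job_options = dict(options)  # copy so that jobs do not share state
        job_options[fan_option] = value
        jobs.append((tool, job_options))
    return jobs


# Hypothetical inputs: two .map files, one shared index
jobs = fan_out(
    "gem2bed",
    {"input": ["a.map.gz", "b.map.gz"], "index": "genome.gem"},
    fan_option="input",
)
```

Each resulting job keeps the shared options (here `index`) and receives exactly one of the input values.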

DEMO

Page 27: JIP Pipeline System Introduction

WHAT WAS THE COMMAND

• jip show shows job properties and the command

• jip edit loads the job command in an editor

DEMO

Page 28: JIP Pipeline System Introduction

RESTARTING AND MOVING

• jip restart resubmits jobs after failure

• jip restart can also move jobs and pipelines to other queues/partitions

DEMO

Page 29: JIP Pipeline System Introduction

CUSTOMISE LOG FILES

• The job profile covers stdout and stderr log files

• jip logs finds and shows log files for jobs

DEMO

Page 30: JIP Pipeline System Introduction

QUESTIONS?

Thank You