Part 6 of "Introduction to linux for bioinformatics": Productivity tips

This is part 6 of the training "Introduction to linux for bioinformatics". Here we show basic tips to become rapidly more efficient on the command line. Interested in following this training session? Please contact me at http://www.jakonix.be/contact.html

This presentation is available under the Creative Commons Attribution-ShareAlike 3.0 Unported License. Please refer to http://www.bits.vib.be/ if you use this presentation or parts hereof.

Introduction to Linux for Bioinformatics

Productivity

Joachim Jacob
5 and 12 May 2014

Multiple commands

In bash, several commands can be put on one line when separated by ";".

$ wget http://homepage.tudelft.nl/19j49/t-SNE_files/tSNE_linux.tar.gz ; tar xvfz tSNE_linux.tar.gz

Commands on a oneliner can also be separated by && or ||.

&& Only execute the command if the preceding one finished correctly.

$ curl corz.org/ip && echo

|| (not a pipe!) - Inverse of the above. Only execute the command if the preceding one did not end successfully.
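The three separators can be compared side by side. A minimal sketch, assuming only the built-in commands true (always succeeds) and false (always fails); the variable names are made up for the illustration:

```bash
# ';' runs the next command no matter how the previous one ended.
semicolon="skipped"
false ; semicolon="ran"

# '&&' runs the next command only if the previous one succeeded.
and_ok="skipped"
true && and_ok="ran"
and_fail="skipped"
false && and_fail="ran"

# '||' runs the next command only if the previous one failed.
or_ok="skipped"
true || or_ok="ran"
or_fail="skipped"
false || or_fail="ran"

echo "after ';'            : $semicolon"
echo "after success and && : $and_ok"
echo "after failure and && : $and_fail"
echo "after success and || : $or_ok"
echo "after failure and || : $or_fail"
```

A common combination is "command && echo done || echo failed", which reports the outcome of a long-running step.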

Page 4: Part 6 of "Introduction to linux for bioinformatics": Productivity tips

4 of 17

Piping a list of files with xargs

A pipe passes the output of one command to the input of the next.

Some commands require file names to be passed as arguments, instead of the content of the file. For example, this works, because less reads its input:

$ ls | less

But this doesn't work, because file expects file names as arguments and only prints its usage message:

$ ls | file
Usage: file [-bchikLlNnprsvz0] [--apple] [--mime-encoding] [--mime-type]
            [-e testname] [-F separator] [-f namefile] [-m magicfiles] file ...
       file -C [-m magicfiles]
       file [--help]

xargs passes the output of a command as a list of arguments to another program.

$ ls | xargs file
bin: directory
buddy.sh: Bourne-Again shell script, ASCII text executable
Compression_exercise: directory
Desktop: directory
Documents: directory
Downloads: directory
FastQValidator.0.1.1.tgz: gzip compressed data, from Unix, last modified: Fri Oct 19 16:44:23 2012
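File names that contain spaces break the plain "ls | xargs" form, because xargs splits its input on whitespace. A hedged sketch of a more robust variant, assuming a find and xargs that support -print0 and -0 (GNU and BSD versions do). The scratch directory and file names are invented, and wc -l stands in for any command that takes file names as arguments:

```bash
# Work in a throw-away directory so the example is self-contained.
workdir=$(mktemp -d)
printf '>seq1\nACGT\n' > "$workdir/my reads.fasta"
printf 'hello\n' > "$workdir/notes.txt"

# -print0 and -0 separate the names with a NUL byte instead of whitespace,
# so "my reads.fasta" reaches wc as one single argument.
report=$(find "$workdir" -type f -print0 | xargs -0 wc -l)
echo "$report"

rm -r "$workdir"
```

The report lists one line count per file followed by a total, which shows that both files arrived as separate, intact arguments.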

.bashrc

~/.bashrc is a hidden configuration file for bash in your home directory.

It configures the prompt in your terminal. It contains aliases to commands.
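As an illustration, a prompt setting in ~/.bashrc could look like the fragment below. The chosen layout (user, host, working directory) is only an assumption for this sketch; \u, \h and \w are standard bash prompt escapes.

```bash
# Fragment for ~/.bashrc: show user@host:directory in front of the $ sign.
# \u = user name, \h = host name, \w = current working directory.
PS1='\u@\h:\w\$ '
```

Changes to ~/.bashrc take effect in a new terminal, or in the current one after running "source ~/.bashrc".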

alias example

When you enter a command line, bash first checks whether the first word is an alias. If it is, bash replaces the word with the text of the alias before running the command.

You can specify aliases in .bashrc. An example:

Some interesting aliases

alias ll='ls -lh'
alias dirsize="du -sh */"
alias uncom='grep -v -E "^#|^$"'
alias hosts="cat /etc/hosts"
alias dedup="awk '! x[\$0]++' "

Aliases are perfectly suited for storing one-liners: find some at https://wikis.utexas.edu/display/bioiteam/Scott%27s+list+of+linux+one-liners
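To see what an alias such as dedup does, it helps to run the command behind it once by hand. A small sketch, assuming awk is installed; the sample chromosome names are invented:

```bash
# Define the alias (in an interactive shell, or in ~/.bashrc).
# \$0 is escaped so that awk, not bash, interprets it.
alias dedup="awk '!x[\$0]++'"

# Show the definition that bash stored for the alias.
alias dedup

# The command behind the alias: print a line only the first time it is seen.
unique=$(printf 'chr1\nchr2\nchr1\nchr3\nchr2\n' | awk '!x[$0]++')
echo "$unique"
```

By default bash expands aliases only in interactive shells, which is why the sketch calls awk directly instead of using dedup inside a script.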

Finding stuff: locate

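Extremely quick and convenient: locate

However, it searches a database of file names, so it won't find the newest files you created. First you need to update the database by running updatedb (this usually requires root privileges).

It accepts wildcards. Quote the pattern so that the shell does not expand it first. Example:
$ locate '*.sam'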

Bonus: How to filter on a certain location?
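One possible answer to the bonus question is to filter the output of locate through a second command. A sketch, assuming locate is installed and its database exists; the helper name under_dir is made up for this example:

```bash
# Hypothetical helper: keep only the paths that start with the given directory.
# index(...) == 1 is a literal prefix match, so dots in names are not wildcards.
under_dir() {
    awk -v dir="$1/" 'index($0, dir) == 1'
}

# Quote the pattern so that locate, not the shell, handles the wildcard.
if command -v locate >/dev/null 2>&1; then
    locate '*.sam' | under_dir "$HOME"
fi
```

The same filter works on any list of paths, for example the output of find.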

Finding stuff: find

A more elaborate tool to find stuff:
$ find -name alignment.sam

Without options, find lists everything below the current directory. Options narrow the search:
-name : to search on the name of the file
-type : to search for the type: (f)ile, (d)irectory, (l)ink
-perm : to search for the permissions (octal, e.g. 111, or symbolic, e.g. u=rwx)
…

This is the power tool to find stuff.
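The options can be combined; find then reports only the entries that pass every test. A sketch in a scratch directory, assuming a find that supports -mindepth (GNU and BSD versions do); all file names are invented for the example:

```bash
workdir=$(mktemp -d)
mkdir "$workdir/alignments"
touch "$workdir/alignments/sample1.sam" "$workdir/notes.txt"
printf '#!/bin/bash\necho hi\n' > "$workdir/run.sh"
chmod 755 "$workdir/run.sh"

# Regular files whose name ends in .sam (pattern quoted for find).
sam_files=$(find "$workdir" -type f -name '*.sam')

# Directories only, leaving out the starting directory itself.
dirs=$(find "$workdir" -mindepth 1 -type d)

# Regular files with exactly the permissions rwxr-xr-x (octal 755).
executables=$(find "$workdir" -type f -perm 755)

echo "$sam_files"
echo "$dirs"
echo "$executables"
rm -r "$workdir"
```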

The most powerful option of find:
-exec : execute a command on the found entities.

$ find -name \*.gz
./DRR000542_2.fastq.subset.gz
./DRR000542_1.fastq.subset.gz
./DRR000545_2.fastq.subset.gz
./DRR000545_1.fastq.subset.gz
$ find -name \*.gz -exec gunzip {} \;
$ ls
DRR000542_1.fastq.subset  DRR000545_1.fastq.subset
DRR000542_2.fastq.subset  DRR000545_2.fastq.subset
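Ending -exec with \; starts the command once for every file found. Ending it with + instead passes many file names to a single invocation, much like xargs does. A sketch, assuming gzip and gunzip are installed; the file names are made up:

```bash
workdir=$(mktemp -d)
printf 'read1\n' > "$workdir/a.fastq"
printf 'read2\n' > "$workdir/b.fastq"
gzip "$workdir/a.fastq" "$workdir/b.fastq"

# {} is replaced by the found names; '+' hands them to one gunzip call.
find "$workdir" -name '*.gz' -exec gunzip {} +

remaining=$(find "$workdir" -type f | sort)
content=$(cat "$workdir/a.fastq")
echo "$remaining"
rm -r "$workdir"
```

For a handful of files the difference is invisible, but with thousands of matches the + form avoids starting thousands of processes.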

Command substitution in bash

In bash, the output of commands can be directly stored in a variable. Put the command between back-ticks.

$ test=`ls -l`
$ echo $test
total 7929624 -rw-rw-r-- 1 joachim joachim 15326 May 10 2013 0538c2b.jpg -rw-rw-r-- 1 joachim joachim 4914797 Nov 8 16:15 18d7alY
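Bash also accepts the $(...) form, which does the same as back-ticks but is easier to read and to nest. Quoting the variable when printing it keeps the line breaks that an unquoted echo collapses into spaces. A small sketch with invented file names:

```bash
workdir=$(mktemp -d)
touch "$workdir/a.txt" "$workdir/b.txt"

# $(...) is equivalent to back-ticks.
listing=$(ls "$workdir")

# Unquoted: word splitting turns the newlines into single spaces.
one_line=$(echo $listing)
# Quoted: the output keeps one file name per line.
multi_line=$(echo "$listing")

echo "$one_line"
echo "$multi_line"
rm -r "$workdir"
```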

A variable can also contain a list. A list contains several entities (e.g. files).

Extracting the first 1000000 lines from compressed text files:

for filename in `ls DRR00054*tar.gz`; \
do zcat $filename | head -n 1000000 \
> ${filename%.gz}.subset; done

The output of ls is put in a list. 'for' assigns the file names one after the other to the variable filename. This variable is used in the one-liner zcat | head, and ${filename%.gz} strips the .gz ending to build the name of the output file.
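The loop also works without ls: bash expands the wildcard into the list itself, which stays correct when file names contain spaces. A sketch, assuming gzip and a zcat that reads .gz files (as on Linux) are installed; the file names and the 3-line cut-off are invented to keep the example small:

```bash
workdir=$(mktemp -d)
printf 'line1\nline2\nline3\nline4\nline5\n' | gzip > "$workdir/sample_1.txt.gz"
printf 'lineA\nlineB\n' | gzip > "$workdir/sample_2.txt.gz"

for filename in "$workdir"/sample_*.gz; do
    # ${filename%.gz} strips the trailing .gz from the name.
    zcat "$filename" | head -n 3 > "${filename%.gz}.subset"
done

lines_1=$(wc -l < "$workdir/sample_1.txt.subset")
lines_2=$(wc -l < "$workdir/sample_2.txt.subset")
echo "$lines_1 $lines_2"
rm -r "$workdir"
```

The first file is cut down to 3 lines, while the second one has only 2 lines and is copied completely.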

Keywords

.bashrc

;

alias

prompt

locate

find

Command substitution

Write in your own words what the terms mean