Shell Scripting Basics, by Arun Sethuraman


  • Slide 1
  • Shell Scripting Basics Arun Sethuraman
  • Slide 2
  • What's a shell? A command-line interpreter for Unix. Common shells include Bourne (sh), Bourne-again (bash), and the C shells (csh, tcsh). A handful of commands makes text mining easy!
  • Slide 3
  • Before we get started. Unix/Mac users: open a terminal. Windows users: you should have installed VMware Player and downloaded the virtual machine with Unix pre-loaded on it (if not, do it now!).
  • Slide 4
  • VMware Player basics. VMware Player lets you create and play virtual machines. We will use a standalone version of GNU/Linux called SliTaz, which is very minimalist (< 40 MB) but should work for all our exercises. Download all example files from my website, www.sites.google.com/site/arunsethuraman1/teaching, instead of from Blackboard. You can save the state of the virtual machine, suspend it, restart it, etc. Switch environments using CTRL+ALT. File sharing is a little complicated, so before you submit your assignment for next week, VMware users should email me and stop by my office with their laptop to submit it (unless you can get Gmail to work without any glitches inside Midori).
  • Slide 5
  • Slide 6
  • Slide 7
  • Working at the prompt. The prompt refers to Unix's native command-line interface. Your prompt should look something like: username@prompt:~$ Prompt commands are similar to Python scripts: you can specify variables, run one-liner commands, specify entire program flows, etc.
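The three things mentioned above (variables, one-liners, and program flow) can be sketched in a few lines of sh. This is only an illustration: the variable names and values are made up and are not part of the course files.

```shell
#!/bin/sh
# Assign variables (no spaces around "=") and read them back with "$name".
greeting="Hello"
user_name="student"   # hypothetical value, chosen for illustration
message="$greeting, $user_name"
echo "$message"

# A one-liner: count how many words are in the message.
word_count=$(echo "$message" | wc -w | tr -d ' ')
echo "Words: $word_count"

# A small program flow: loop over a list and branch with "if".
total=0
for n in 1 2 3 4; do
    if [ $((n % 2)) -eq 0 ]; then
        total=$((total + n))   # add up only the even numbers
    fi
done
echo "Sum of even numbers: $total"
```

You can type these lines one at a time at the prompt, or save them in a file and run the file as a script.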
  • Slide 8
  • Unix 101. Try: man, ls, pwd, clear, Ctrl+C, echo, ps, cat, tail, head, cd, mkdir, rm, cp, mv, cal, kill, vi/vim, find, set, who.
  • Slide 9
  • Piping. Piping (|) chains several commands in one go: the output of each command becomes the input of the next. For example, to read a file and then print only its last line, try: cat example1.txt | tail -n 1 Also try: ls | grep exam and: cat example4.txt | head Important: each piped command works only on the output of the previous command!
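A minimal sketch of the piping idea, assuming a small stand-in file (sample1.txt is hypothetical; the real example1.txt is not reproduced here). It works in a temporary directory so nothing in your own folder is touched.

```shell
#!/bin/sh
LC_ALL=C; export LC_ALL          # keep sort order predictable
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical stand-in for example1.txt.
printf 'alpha\nbeta\ngamma\n' > sample1.txt

# cat writes the file to its output; tail -n 1 keeps only the last line.
last_line=$(cat sample1.txt | tail -n 1)
echo "$last_line"

# ls lists the directory; grep keeps only names containing "sample".
matches=$(ls | grep sample)
echo "$matches"

# Three stages: sort in reverse order, keep the first line, uppercase it.
top=$(sort -r sample1.txt | head -n 1 | tr 'a-z' 'A-Z')
echo "$top"
```

Each stage only ever sees what the stage before it printed, which is why the order of the commands matters.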
  • Slide 10
  • Regular expressions. Describe a pattern (a sequence of characters). Character classes with repetition: [A-Z]*, [a-z]*, [0-9]*, [0-9]\{n\} (exactly n digits). Escaped (special) characters start with \. ^ marks the start of a line; $ marks the end of a line.
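To see these pieces in action, here is a short sketch that uses grep on a few made-up lines. The file name and its contents are assumptions chosen for illustration only.

```shell
#!/bin/sh
LC_ALL=C; export LC_ALL          # make [A-Z] and [a-z] mean plain ASCII ranges
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Made-up sample lines.
printf 'Gene42\nabc\n2024 samples\nend of line 7\n123\n' > sample.txt

# ^ anchors the match to the start of the line: lines starting with a capital.
starts_upper=$(grep -c '^[A-Z]' sample.txt)

# $ anchors the match to the end of the line: lines ending in a digit.
ends_digit=$(grep -c '[0-9]$' sample.txt)

# [0-9]\{3\} means exactly three digits; with ^ and $ it must be the whole line.
three_digits=$(grep '^[0-9]\{3\}$' sample.txt)

# [a-z]* means zero or more lowercase letters; anchored, nothing else is allowed.
only_lower=$(grep '^[a-z]*$' sample.txt)

echo "$starts_upper $ends_digit $three_digits $only_lower"
```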
  • Slide 11
  • Examples. Eg. {bicycle, bidirectional, biology, binary, bigotry, bill, big, bin, bionic, …} Eg. {Sunday, Monday, …, Saturday} Eg. {121, 123, 124, …, 129}
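One possible pattern for each of the three sets above is sketched below. These are not necessarily the patterns intended on the slide, and the extra non-matching words in the test list are made up to show what each pattern rejects.

```shell
#!/bin/sh
LC_ALL=C; export LC_ALL

# A mixed list: some members of each set plus a few deliberate non-members.
words='bicycle
biology
bin
cab
Sunday
Monday
someday soon
121
129
1290'

# Words beginning with "bi".
bi_words=$(printf '%s\n' "$words" | grep -c '^bi')

# A capital letter, then lowercase letters, ending in "day".
day_names=$(printf '%s\n' "$words" | grep -c '^[A-Z][a-z]*day$')

# "12" followed by exactly one more digit.
numbers=$(printf '%s\n' "$words" | grep -c '^12[0-9]$')

echo "bi: $bi_words, days: $day_names, numbers: $numbers"
```

Note that `^12[0-9]$` also accepts 120 and 122, so it is slightly broader than the set as listed.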
  • Example 3: Your first shell script! Copy example3.sh to your folder. Explore its contents: #!/bin/sh sed -e 's/a/A/g' -e 's/b/B/g' example2.txt > example3.txt Execute this script using ./example3.sh Oops, what happened here?
  • Slide 16
  • Permissions in Unix. Unix has three permission (file access) modes for all files: read (r), write (w), and execute (x). Execute permission must be set explicitly for executables. Try chmod +x example3.sh, then try ./example3.sh again.
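The effect of chmod +x can be checked with a throwaway script. In this sketch hello.sh is a hypothetical file created on the spot; it stands in for example3.sh.

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Create a tiny script. Newly created files normally lack the execute bit.
printf '#!/bin/sh\necho "it ran"\n' > hello.sh

# [ -x file ] is true only if the file is executable.
if [ -x hello.sh ]; then before="executable"; else before="not executable"; fi
ls -l hello.sh                   # the mode column shows no "x" yet

# Grant execute permission, then check again.
chmod +x hello.sh
if [ -x hello.sh ]; then after="executable"; else after="not executable"; fi
ls -l hello.sh                   # now "x" appears in the mode column

# With the execute bit set, the script can be run directly.
output=$(./hello.sh)
echo "$before -> $after: $output"
```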
  • Slide 17
  • Example 3, contd. Add a command to the script to change all lowercase letters to capital letters in example2.txt and save the result as a new file, example3.txt. Execute it at the command line. Then write a script to find all numbers and replace them with [ref].
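One way to approach both tasks is sketched here on a made-up two-line file (sample2.txt is hypothetical and stands in for example2.txt). The uppercase step uses tr rather than sed, which is a swapped-in but common alternative.

```shell
#!/bin/sh
LC_ALL=C; export LC_ALL
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical stand-in for example2.txt.
printf 'see figure 12 and table 3\nno digits here\n' > sample2.txt

# tr maps every character in the first range to the matching one in the second.
upper=$(tr 'a-z' 'A-Z' < sample2.txt)
echo "$upper"

# [0-9][0-9]* matches one or more digits, so "12" becomes a single [ref].
sed 's/[0-9][0-9]*/[ref]/g' sample2.txt > refs.txt
cat refs.txt
```

Using `[0-9][0-9]*` instead of `[0-9]*` avoids matching the empty string between every pair of characters.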
  • Slide 18
  • Example 4: awk. Syntax: awk 'pattern { action }' file. Used to mine column-formatted data. Columns are denoted by $ ($1 is the first column, $2 the second, and so on). Copy example4.txt to your folder. Use awk to print only the third column of the file and save it to a file. Use awk to print the 4th and 5th columns, separated by a tab character, to a new file.
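A minimal sketch of both awk tasks, assuming a whitespace-separated file with at least five columns. The file sample4.txt and its contents are made up; the layout of the real example4.txt may differ.

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical five-column data standing in for example4.txt.
printf 'g1 chr1 100 200 +\ng2 chr2 300 400 -\n' > sample4.txt

# $3 is the third whitespace-separated column of each line.
awk '{ print $3 }' sample4.txt > col3.txt

# Placing "\t" between the fields puts a tab between columns 4 and 5.
awk '{ print $4 "\t" $5 }' sample4.txt > col45.txt

cat col3.txt col45.txt
```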
  • Slide 19
  • Example 5: a FASTA file. Copy example5.fasta from /usr/home to your folder. Explore its contents: what is the FASTA file format? What does it contain? Do you see a pattern? Now use any of the commands we just learned to extract only the gene ID from the FASTA file and print it. Count the number of AC repeats and save the count to a file. Save only the first 5 lines of example5.fasta to a file.
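The sketch below works on a tiny made-up FASTA file (sample5.fasta is hypothetical, as are its IDs and sequences). It assumes the gene ID is the first word after ">" on each header line, and it reads "AC repeats" as the number of non-overlapping occurrences of AC within sequence lines; both are assumptions about what the exercise intends.

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Made-up records: header lines start with ">", the rest is sequence.
cat > sample5.fasta <<'EOF'
>gene001 hypothetical protein
ACACGTAC
GGACTT
>gene002 another made-up record
TTACAC
GGGG
EOF

# Keep header lines, strip the leading ">", keep the first word.
gene_ids=$(grep '^>' sample5.fasta | sed 's/^>//' | awk '{ print $1 }')
echo "$gene_ids"

# Drop headers, print every "AC" match on its own line, count the lines.
ac_count=$(grep -v '^>' sample5.fasta | grep -o 'AC' | wc -l | tr -d ' ')
echo "$ac_count" > ac_count.txt

# Save only the first 5 lines.
head -n 5 sample5.fasta > first5.txt
```

Because grep works line by line, an AC pair split across two sequence lines is not counted by this approach.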
  • Slide 20
  • Example 6: Executing commands in the shell. What is BLAST? Write a shell script to BLAST against all nucleotide BLAST databases. Save the output of BLAST to a separate file. What hits do you get? Explore the BLAST output, pull out only the gene IDs for all your hits with e-value = 0.0 and with GenBank accessions (gb), and save them to a new file. HINT: You'll notice that there are multiple IDs, separated by |. To tell awk to use this as a delimiter, start your awk program with BEGIN { FS="|" }. HINT: To sort a list, use the sort command.
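The two hints can be combined as in the sketch below. The input is not real BLAST output: hits.txt is a made-up file whose lines have the form gi|number|database|accession| followed by an e-value, which is only an assumption about the layout. Real BLAST output depends on the output format you request, so the field numbers will likely need adjusting.

```shell
#!/bin/sh
LC_ALL=C; export LC_ALL
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Made-up hit lines (hypothetical IDs and e-values).
cat > hits.txt <<'EOF'
gi|222222|gb|XY000002.1| 0.0
gi|111111|gb|AB000001.1| 0.0
gi|333333|ref|NM_000003.1| 0.0
gi|444444|gb|CD000004.1| 2e-10
EOF

# FS="|" splits each line on "|": $3 is the database tag, $4 the accession,
# and $5 the e-value. Adding 0 forces $5 to be compared as a number.
awk 'BEGIN { FS = "|" } $3 == "gb" && $5 + 0 == 0 { print $4 }' hits.txt \
    | sort > gb_hits.txt

cat gb_hits.txt
```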
  • Slide 21
  • Slide 22
  • Example 7: Advanced scripts (Assignment). Write a Python script to pull all gene IDs, search for these gene IDs against NCBI, obtain all hits, and save them to a file. Execute this Python script, then use a shell script to parse out only the protein IDs (the gene/protein= values) into a separate file. Copy all these protein IDs (they should be GenBank accession IDs), paste them into the query at www.pantherdb.org, select all species on the list, and add PANTHER-GO-Slim Biological Process to your columns.
  • Slide 23
  • Slide 24
  • Assignment (contd.) Save the output of PANTHER as a file. Now parse this file using grep/sed/awk to print only the GO terms; they should be separated by ; characters. Make a unique list of these GO terms using the uniq command, and save it to a final assignment submission file. HINT: Before pulling unique values, try replacing the ; characters with something else, say a newline character (\n).
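The hint can be sketched as follows. The file panther.txt and the terms in it are made up, and the sketch assumes the GO terms have already been isolated, one semicolon-separated list per line. Note that uniq only collapses adjacent duplicate lines, so the list is sorted first.

```shell
#!/bin/sh
LC_ALL=C; export LC_ALL
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical GO-term lists, separated by ";".
printf 'metabolic process;transport\ntransport;cell cycle;metabolic process\n' > panther.txt

# tr turns every ";" into a newline, giving one term per line.
# sort brings identical terms together; uniq then keeps one copy of each.
tr ';' '\n' < panther.txt | sort | uniq > unique_go.txt

cat unique_go.txt
```

`sort -u` would do the sort and the de-duplication in one step, if you prefer a shorter pipeline.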