INTRODUCTION TO UNIX
Vivek Krishnakumar
JCVI Genomic Science and Leadership Workshop
Presented on: 05/26/2016
Overview
• What is Unix? Where is it used?
• Bioinformatics and Unix
• Paradigm of Local Computers vs Remote Servers
• Unix Command Line Interface (CLI) and issuing commands
• Brief overview of Directory structures
• Brief overview of Unix File permissions
• General Unix rules
• FIN
• Skim through the supplementary slides (reference material for the hands-on sessions)
WHAT IS UNIX?
• Portable, multi-tasking, multi-user, time-sharing operating system
• It is the leading operating system for servers and supercomputers.
• More than 90% of the top 500 fastest supercomputers run Unix or a Unix-like operating system (mostly Linux).
• Free (open source) or nearly free Unix-like operating systems are available, e.g. Linux, Minix, BSD and Android
• Mac computers run OS X, which is itself based on Unix
• Depending on its purpose, a Unix machine may or may not have the kind of desktop environment we are familiar with on our personal computers.
• Unix systems typically use the X Window System to provide the graphical desktop environment.
Source: http://www.jcsystemsconsulting.com/pictures
BIOINFORMATICS AND UNIX
• Many core bioinformatics tools are written for use with Unix
• BLAST, CLUSTALW, PHRAP, etc.
• Many web applications are also served from web servers hosted on Unix-based machines
• Unix supports development and use of software using many
different programming languages (Python, Perl, Java, R, C, C++).
• Multiple users can log in at the same time (multi-user)
• A user logging in over the network can do just about anything a user sitting in front of the computer can do
• Unix is also multi-tasking: it can run many processes concurrently
APPLICATIONS AND SERVERS
[Figure: a Personal/Local Computer (Mac/Windows/Unix) and a Server/Remote Computer (Unix). Labels include: Terminal window, SSH, SCP, Web, Data files, DB, BLAST, IGV, Web App, Web Service]
Thanks to Manpreet Katari (NYU) for slide(s)
UNIX COMMAND LINE INTERFACE (SHELL)
• Users communicate with the OS via the shell
• The shell interprets the commands typed by the user on the keyboard
• Different shells are available for Unix systems
o sh - Bourne shell
o csh - C shell
o bash - Bourne-again shell
o zsh - Z shell
• You can type commands directly at the shell prompt or write scripts to accomplish certain tasks
* During the workshop hands-on sessions, we will be working with the bash shell, one of the most popular Unix shells.
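As a small sketch of the "write scripts" idea, the commands below save a two-line bash script to a temporary file and run it with bash. The script name greet.sh and its arguments are hypothetical; any name ending in .sh would work.

```shell
#!/usr/bin/env bash
# Work in a throwaway directory so nothing in $HOME is touched
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Write a tiny script (hypothetical name: greet.sh) using a here-document;
# the quoted 'EOF' keeps $0 and $# from being expanded now
cat > "$workdir/greet.sh" <<'EOF'
#!/usr/bin/env bash
echo "Hello from $(basename "$0")"
echo "Number of arguments: $#"
EOF

# Run the script with bash, passing it two arguments
output=$(bash "$workdir/greet.sh" alpha beta)
echo "$output"
```

The same commands typed one by one at the prompt would behave identically; a script just saves them for reuse.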
ISSUING COMMANDS
• At the command prompt (signified by a $ symbol), type the command followed by its arguments (if any)
• If the program produces output, it is usually printed to the screen through standard output (stdout)
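A minimal sketch of the command-plus-arguments pattern using the standard date and echo utilities. Capturing stdout in a variable with $( ) is an extra bash feature, shown here only to make the idea of "output on stdout" concrete.

```shell
#!/usr/bin/env bash
# A command with no arguments: date prints the current date and time to stdout
date

# A command followed by one argument: the +%Y format asks date for the year only
year=$(date +%Y)

# echo writes its arguments back to stdout, separated by single spaces
echo Hello Unix

# $( ) captures a command's stdout in a variable instead of printing it
greeting=$(echo Hello Unix)
echo "Captured: $greeting (year $year)"
```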
DIRECTORY STRUCTURES IN UNIX
• Directories are used to organize files. They are analogous to folders in Windows and Mac OS.
• Important directories:
o / - root (the top of the directory tree)
o /home - Personal user directories
o /bin - Common programs, shared by the system and users
o /lib - Library files required by the programs
o /usr - Programs, libraries, documentation etc. for all user-related programs
o /mnt - Mount point for external file systems
o /tmp - Temporary space; typically cleaned up periodically and on reboot
Reference: http://www.redhat.com/mirrors/LDP/LDP/intro-linux/html/sect_03_01.html
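GRANULAR FILE PERMISSIONS
READ (R), WRITE (W), EXECUTE (X)
• Each file and directory has read, write and execute permissions, set separately for its owner (user), its group and all other users
Learn more: http://ccn.ucla.edu/wiki/index.php/UNIX_Permissions, http://sc.tamu.edu/help/general/unix/unix.html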
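The read/write/execute bits can be inspected with ls -l and changed with chmod. A minimal sketch, assuming a Linux or macOS shell; the file name notes.txt is hypothetical. In the octal mode 640, each digit is the sum of read (4), write (2) and execute (1), for the owner, the group and all others respectively.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

# Create an empty file (hypothetical name) and set its permissions explicitly:
# owner read+write (6), group read (4), others nothing (0)
touch notes.txt
chmod 640 notes.txt

# The first 10 characters of ls -l: file type, then rwx for user, group, others
perms=$(ls -l notes.txt | cut -c1-10)
echo "$perms"          # -rw-r-----

# Symbolic mode: add execute permission for the owner (u) only
chmod u+x notes.txt
perms_after=$(ls -l notes.txt | cut -c1-10)
echo "$perms_after"    # -rwxr-----
```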
GENERAL UNIX RULES
• Unix is CaSe sEnSiTiVe
o file.txt and File.txt are two different files
o This applies to the commands as well (e.g.: ls is not Ls)
o In any directory, you can have only one file with a given name
• Filenames should contain only letters [A-Za-z], numbers [0-9], underscores [_], dots [.] and hyphens [-].
Absolutely no spaces whatsoever (e.g. no-spAc3s_what_so.ever)
• File extensions are optional but recommended, because many programs and users rely on naming conventions. Examples:
o .txt - File containing plain text data
o .sh - File containing commands to be executed
o .pl/.py - Perl/Python scripts
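A small sketch of the case-sensitivity and no-spaces rules, run in a throwaway directory. It assumes a case-sensitive file system, as is usual on Linux; the default macOS file system is case-insensitive, so there file.txt and File.txt would name the same file. All file names are hypothetical.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

# Case sensitivity: on a case-sensitive file system these are two files
touch file.txt File.txt
file_count=$(ls | wc -l)
echo "Files created: $file_count"

# Spaces: the shell splits unquoted words on whitespace
touch "my notes.txt"              # quotes keep the space inside one name
if ls my notes.txt >/dev/null 2>&1; then
    unquoted=found
else
    unquoted=missing              # ls looked for 'my' and 'notes.txt' separately
fi
quoted=$(ls "my notes.txt")       # quoted: a single argument, so the file is found
echo "unquoted: $unquoted, quoted: $quoted"
```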
The following slides are for your reference
To be used in the hands-on session(s)
GETTING HELP IN UNIX
• Unix is not a very user-friendly system, but it has extensive built-in help
• The whatis database provides a one-line description of each command:
$ whatis COMMAND
• If you don't know which command to use, search by keyword:
$ apropos KEYWORD
• Similar to the user manual that comes with household appliances and electronic devices, Unix offers help in the form of "manual" (man) pages for most commands
• These man pages describe all available command line options and
how each option modifies its behavior
• For help at the shell, type man followed by the name of the
command:
$ man ls
AUTO-COMPLETION & HISTORY
• Unix offers file/directory name auto-completion
o When typing out a file/directory name partially, hit the <tab> key for possible matches
• The Unix shell keeps a history of executed commands (up to a limit; in bash the size is controlled by the HISTSIZE variable).
o Get the last 5 executed commands
$ history | tail -n5
o Re-execute the 25th command
$ !25
$ cd /export
NAVIGATING THROUGH DIRECTORIES
• pwd - Print the present working directory (often also visible in the prompt)
• cd DIR - Change to the specified directory
• cd .. - Change directory one level up
• cd or cd ~ - Change directory to your home directory
• Special directory names:
o . (current directory)
o .. (parent directory)
o ~ (home directory)
• mkdir DIR - Make a directory
• rmdir DIR - Remove a directory (only if it is empty)
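A sketch of the navigation commands inside a temporary directory. The directory names project and data are hypothetical, and mkdir -p (create any missing parent directories) is a standard option not listed on the slide.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

mkdir -p project/data        # -p also creates the parent directory 'project'
cd project/data || exit 1
here=$(basename "$(pwd)")    # name of the current directory: data

cd ..                        # one level up, into project
up=$(basename "$(pwd)")

cd ..                        # back to the temporary work directory
rmdir project/data           # succeeds because data is empty
rmdir project                # succeeds now that data has been removed
echo "$here $up"
```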
VIEWING FILE CONTENTS
• less FILE - Page through a file (alternative to more). Use the up/down arrow keys to scroll, space to page down, and q to quit
• cat FILE - Dump the entire file contents to standard output (stdout)
• wc FILE - Count the lines, words and bytes in the file(s)
• head FILE - Show the first 10 lines of a file (-n to control the number of lines)
• tail FILE - Show the last 10 lines of a file (-n to control the number of lines)
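A non-interactive sketch of cat, head, tail and wc on a generated 30-line file (less is interactive, so it is left out). The file name sample.txt is hypothetical.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
sample="$workdir/sample.txt"

# Build a 30-line sample file: "line 1" ... "line 30"
for i in $(seq 1 30); do
    printf 'line %d\n' "$i"
done > "$sample"

cat "$sample"                            # dump all 30 lines to stdout
first_three=$(head -n 3 "$sample")       # line 1 .. line 3
last_two=$(tail -n 2 "$sample")          # line 29, line 30
default_head=$(head "$sample" | wc -l)   # head shows 10 lines by default

# wc counts lines, words and bytes; reading from stdin (<) omits the file name
lines=$(wc -l < "$sample")
words=$(wc -w < "$sample")
bytes=$(wc -c < "$sample")
echo "lines=$lines words=$words bytes=$bytes"
```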
PIPING & FILE REDIRECTIONS
Unix allows chaining commands with the pipe (|) operator and redirecting the standard input/output streams (>, >> and <)
• | - Send the stdout of one command to the stdin of the next, e.g. retrieve lines 15 through 20 of a file:
$ head -n 20 FILE | tail -n 6
• > - Redirect standard output (stdout) to a file (overwriting it)
• >> - Append stdout to a file
• < - Redirect a file into a command's standard input (stdin)
• cat - Concatenate files to stdout
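The pipe and redirection operators can be combined in a short sketch; the file names numbers.txt, slice.txt and twice.txt are hypothetical.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

# > creates (or overwrites) numbers.txt with the loop's stdout
for i in $(seq 1 30); do printf 'line %d\n' "$i"; done > numbers.txt

# | feeds head's stdout into tail: the first 20 lines, then the last 6 of
# those, i.e. lines 15 through 20
head -n 20 numbers.txt | tail -n 6 > slice.txt

# >> appends to the end of slice.txt instead of overwriting it
echo "end of slice" >> slice.txt

# < connects slice.txt to wc's standard input
slice_count=$(wc -l < slice.txt)

# cat concatenates files to stdout; here slice.txt twice into a new file
cat slice.txt slice.txt > twice.txt
cat slice.txt
```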
PATTERN SEARCHING (GREP)
• grep searches through a file or the standard input (stdin) stream for patterns
• All matching lines are written to standard output (stdout)
• Syntax: grep [-options] <pattern> <file name>
• Options:
o -c - Count the number of matching lines
o -i - Make the search case-insensitive
o -v - Invert the match (return the lines that do not match)
o Providing search context:
-A4 - print 4 lines after each match
-B3 - print 3 lines before each match
-C4 - print 4 lines before and after each match
• Pattern (plain-text string or regular expression)
o ^ - Anchor the match to the beginning of a line
o $ - Anchor the match to the end of a line
Nice introductory article on using regular expressions with grep:
http://www.cyberciti.biz/faq/grep-regular-expressions/
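A minimal sketch of the grep options and anchors on a made-up four-line input (features.txt is a hypothetical name; the content is toy data, not a real annotation file).

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

# Toy input, made up for illustration: one feature per line
printf '%s\n' 'gene1 exon' 'gene2 intron' 'Gene3 EXON' 'gene4 exon' > features.txt

grep exon features.txt                      # lines containing "exon" (case-sensitive)
n_exact=$(grep -c exon features.txt)        # gene1 and gene4
n_nocase=$(grep -ci exon features.txt)      # also matches "EXON"
n_inverted=$(grep -vc exon features.txt)    # lines WITHOUT "exon"
n_starts=$(grep -c '^gene' features.txt)    # lines beginning with lowercase "gene"
n_ends=$(grep -c 'exon$' features.txt)      # lines ending in "exon"
context=$(grep -A1 intron features.txt)     # the match plus 1 line after it
echo "$context"
```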
CUTTING & PASTING
cut - Extract specific columns from a multi-column delimited file
• Syntax: cut [-options] FILE
• Available options:
o -f1,2,3,6 - Specify the numbers of the columns (fields) to extract
o -d"," - Specify the input column delimiter (default: tab)
o --output-delimiter=$'\t' - Change the output delimiter (GNU cut; $'\t' is how bash writes a tab character, whereas "\t" would be a literal backslash and t)
paste - Join multiple files column-wise, in the desired order
• Syntax: paste FILE1 FILE2 ...
• Writes lines consisting of the corresponding lines from each input file (FILE1, FILE2, ...), separated by tabs
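A minimal sketch of cut and paste on a made-up three-line CSV file; the file names are hypothetical. It assumes GNU cut (standard on Linux), since --output-delimiter is not available in the BSD cut shipped with macOS.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

# Toy comma-separated input (hypothetical data)
printf 'id,name,score\n1,alpha,90\n2,beta,75\n' > scores.csv

# Columns 1 and 3, using "," as the input delimiter
ids_scores=$(cut -d',' -f1,3 scores.csv)

# Same columns, tab-separated output ($'\t' is a real tab character in bash)
tabbed=$(cut -d',' -f1,3 --output-delimiter=$'\t' scores.csv)

# paste joins line N of names.txt with line N of values.txt, using a tab
cut -d',' -f2 scores.csv > names.txt
cut -d',' -f3 scores.csv > values.txt
joined=$(paste names.txt values.txt)
echo "$joined"
```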
SORTING & UNIQ-ING
• sort sorts the lines of a file or of standard input
• Syntax: sort [-options] FILE
• Several options are available (investigate using man or invoke --help):
o -k2,2 - sort by the 2nd column
o -n - sort in numeric order
o -r - reverse the sort order
o -t';' - specify an alternative field separator (here ;, quoted so the shell does not treat it as a command separator)
o -u - print only unique lines
• uniq discards all but one of successive identical lines from the input (so the input is usually sorted first)
• Syntax: uniq [-options] FILE
• Available options:
o -c - prefix each line with its number of occurrences
o -d - print only duplicate lines
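A sketch of the common sort | uniq combinations on a made-up list of chromosome names (chroms.txt is hypothetical). LC_ALL=C is set so that text ordering does not depend on the user's locale; the here-string (<<<) used with read is a bash feature.

```shell
#!/usr/bin/env bash
# Byte-wise (C locale) ordering keeps the results locale-independent
export LC_ALL=C
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

# Toy input, one name per line (made up for illustration)
printf '%s\n' chr2 chr1 chr2 chr10 chr1 chr2 > chroms.txt

unique=$(sort -u chroms.txt)           # text order: chr1, chr10, chr2
dupes=$(sort chroms.txt | uniq -d)     # names seen more than once

# uniq only collapses ADJACENT repeats, so sort first; then rank the
# counts numerically (-n), largest first (-r)
sort chroms.txt | uniq -c | sort -k1,1nr
read -r top_count top_name <<< "$(sort chroms.txt | uniq -c | sort -k1,1nr | head -n 1)"

# -n compares numbers; -t';' sets the field separator, -k2,2n sorts on field 2
numeric=$(printf '10\n9\n100\n' | sort -n)
by_field=$(printf 'b;20\na;3\nc;100\n' | sort -t';' -k2,2n)
```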
STRING SUBSTITUTION USING SED
• sed (stream editor) is used to modify input streams (stdin or file contents)
• Syntax: sed [-options] SCRIPT FILE
• An example command:
$ sed 's/exon/CDS/' old.gff > new.gff
There are four parts to this substitute command:
s Substitute command
/../../ Delimiter
exon Search pattern
CDS Replacement string
Changes are made only to the first occurrence of the pattern on each line
• Some more examples:
$ sed -i 's/exon/CDS/' old.gff # in-place change (GNU sed; macOS sed needs -i '')
$ sed -i 's/exon/CDS/g' old.gff # change every occurrence on each line
$ sed '1,50 s/exon/CDS/' old.gff # modify only the first 50 lines
• Enclose the substitute command in quotes, especially when dealing with complex patterns
Comprehensive sed manual: http://www.grymoire.com/Unix/Sed.html
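A minimal sketch contrasting the first-match, global (g) and line-range forms of the substitute command. The three-line input is toy text rather than a real GFF file, and the in-place step assumes GNU sed (on macOS the flag is written -i '').

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

# Toy three-line input (not real GFF, just text containing "exon")
printf '%s\n' 'exon exon' 'gene exon' 'exon' > old.gff

first_only=$(sed 's/exon/CDS/' old.gff)      # first match on each line only
every=$(sed 's/exon/CDS/g' old.gff)          # g flag: every match on each line
first_two=$(sed '1,2 s/exon/CDS/g' old.gff)  # address 1,2: only lines 1-2 edited

# Redirect to a new file, leaving old.gff unchanged
sed 's/exon/CDS/g' old.gff > new.gff

# In-place edit of a copy (GNU sed syntax)
cp old.gff inplace.gff
sed -i 's/exon/CDS/g' inplace.gff
```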
FINDING/LOCATING FILES
• Syntax: find PATH EXPRESSION
• Searches PATH recursively, including all subdirectories
• Options:
o -name - specify the file/folder name (quote wildcards such as *)
o -iname - case-insensitive name search
o -type f finds only files and -type d only directories
o -print prints the path of each file found (the default action)
o -exec runs a command on each file found ('{}' is replaced by the path; the command ends with \;)
• Examples
$ find /home/train01 -name "file.txt"
$ find . -type f -iname "*.sh"
$ find . -name "rc.conf" -print
$ find . -name "*.sh" \
-exec chmod +x '{}' \;
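A sketch of the find examples on a small, hypothetical directory tree, ending with the same -exec chmod +x pattern. Output is piped through sort only to make the order predictable.

```shell
#!/usr/bin/env bash
export LC_ALL=C
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

# Hypothetical layout: two scripts (one with an upper-case extension) and a text file
mkdir -p scripts/sub docs
touch scripts/run.sh scripts/sub/Setup.SH docs/readme.txt

# Quote wildcards so that find, not the shell, matches them
by_name=$(find . -name "*.sh" | sort)             # case-sensitive: only run.sh
by_iname=$(find . -type f -iname "*.sh" | sort)   # also matches Setup.SH
dirs=$(find . -type d -name "s*" | sort)          # ./scripts and ./scripts/sub

# Make every *.sh file executable; '{}' is replaced by each path found
find . -name "*.sh" -exec chmod +x '{}' \;
```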
USEFUL KEYBOARD SHORTCUTS
Manipulate the current command
Ctrl + A Go to the beginning of the command prompt line
Ctrl + E Go to the end of the command prompt line
Ctrl + L Clears the screen, similar to the ‘clear’ command
Ctrl + U Deletes the text before the cursor position
Ctrl + K Deletes the text after the cursor position
Ctrl + W Deletes the word before the cursor position
Search
Ctrl + R Lets you search through previously used commands
Job Control
Ctrl + C Interrupts (usually terminates) the currently running foreground job
Ctrl + Z Suspends the current foreground job
Ctrl + D Sends end-of-file; at an empty prompt, exits the current shell
JOB CONTROL
• The jobs command lists all jobs started from the current shell, with their status
$ jobs
[1]- Stopped vi run.sh
[2]+ Stopped less file.txt
• Each job is given a number and can be run in the background or foreground:
$ bg 2 # Resume job 2 in the background; you keep control of the shell
$ fg 2 # Bring job 2 to the foreground
• Launch a job in the background directly like so:
$ ./run.sh &
• List running processes (filter by user if necessary):
$ ps -u train01
PID TTY TIME CMD
19231 pts/21 00:00:00 vi
19233 pts/21 00:00:00 less
• Kill a process by PID, a job by job number (%N), or all processes with a given name (killall):
$ kill 19233
$ kill %2
$ killall less
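A non-interactive sketch of the same ideas, using sleep as a stand-in for a long-running job (run.sh above is only an example name). $!, kill -0 and wait are standard shell features not covered on the slide.

```shell
#!/usr/bin/env bash
# Launch a long-running command in the background with &
sleep 30 &
pid=$!            # $! holds the PID of the most recent background job

jobs              # lists the job with its number and status

# kill -0 sends no signal; it only checks that the process exists
if kill -0 "$pid" 2>/dev/null; then
    was_running=yes
fi

kill "$pid"       # the default signal is SIGTERM (15)
wait "$pid"       # collect the job's exit status
status=$?         # 128 + 15 = 143 for a process ended by SIGTERM
echo "exit status: $status"
```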
QUESTIONS?