
COURSE OVERVIEW - Amazon S3 · from Bio import SeqIO from Bio import pairwise2 from Bio.Seq import Seq ##This library allow to create a new type of variable called Bio.Seq.Seq. This


COURSE OVERVIEW


LESSON 2: BASICS OF PYTHON


- Variables
- Operators
- Loops & Conditionals
- Functions


VARIABLES

Numbers = Integer, float

String = "string"

List = []

Tuple = ()

Dictionary = {"A": 1}

Boolean = True, False

variable

Function drink()

Let's write your variable!


VARIABLES

Numbers = Integer, float

String = "string" - sequence of characters

List = []

Tuple = () - read-only list

Dictionary = {"A": 1}

Boolean = True, False

- Function type()
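The built-in type() function confirms which of the types above a variable holds. Below is a minimal sketch. The variable names and values are made up for illustration. The last two lines show why number-like text must be converted before it behaves like a number.

```python
# Illustrative values (hypothetical names), one per type listed on the slide
count = 10            # Numbers: integer
ratio = 0.5           # Numbers: float
name = "ATG"          # String
bases = ["A", "C"]    # List
pair = (1, 2)         # Tuple
codes = {"A": 1}      # Dictionary
flag = True           # Boolean

for value in (count, ratio, name, bases, pair, codes, flag):
    print(type(value).__name__)   # int, float, str, list, tuple, dict, bool

# Text that looks like a number stays a str until it is converted
print(type("10").__name__, type(int("10")).__name__)  # str int
print("10" * 2, int("10") * 2)                        # 1010 20
```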


VARIABLES

print(my_list)       # Prints complete list
print(my_list[0])    # Prints first element of the list
print(my_list[1:3])  # Prints elements starting from 2nd till 3rd
print(my_list[2:])   # Prints elements starting from 3rd element

print(my_string)       # Prints complete string
print(my_string[0])    # Prints first element of the string
print(my_string[1:3])  # Prints elements starting from 2nd till 3rd
print(my_string[2:])   # Prints elements starting from 3rd element

print(my_tuple)      # Prints complete tuple
print(my_tuple[0])   # Prints first element of the tuple
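The indexing rules above can be checked with concrete values. Below is a minimal sketch that uses hypothetical sample data in place of the slide's placeholders. The end index of a slice is excluded, so `[1:3]` returns the elements at indices 1 and 2.

```python
# Hypothetical sample values standing in for the slide's placeholders
my_list = ["A", "C", "G", "T"]
my_string = "ACGT"
my_tuple = ("A", "C", "G", "T")

print(my_list)         # ['A', 'C', 'G', 'T']
print(my_list[0])      # A
print(my_list[1:3])    # ['C', 'G']  (indices 1 and 2; index 3 is excluded)
print(my_list[2:])     # ['G', 'T']
print(my_string[1:3])  # CG  (strings slice the same way)
print(my_tuple[0])     # A
```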


OPERATORS

Arithmetic Operators: +, -, *, /, %, **

Comparison Operators: ==, !=, <, >, <=, >=

Assignment Operators: =, +=, -=, *=, /=, %=, **=
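Each operator group on the slide can be tried on two small integers. Below is a minimal sketch with arbitrarily chosen values.

```python
a, b = 7, 2

# Arithmetic operators
print(a + b, a - b, a * b)   # 9 5 14
print(a / b)                 # 3.5  (true division always returns a float)
print(a % b)                 # 1    (remainder)
print(a ** b)                # 49   (power)

# Comparison operators return booleans
print(a == b, a != b, a > b) # False True True

# Assignment operators update a variable in place
total = 10
total += 5    # same as total = total + 5
total *= 2    # same as total = total * 2
print(total)  # 30
```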


LOOPS & CONDITIONAL

Repetition: while, for (loop control: break, continue, pass)

for LOOP_VARIABLE in SEQUENCE:
    STATEMENTS

Selection: if, if … else, if … elif … else

if BOOLEAN_EXPRESSION:
    STATEMENTS_1
elif BOOLEAN_EXPRESSION:
    STATEMENTS_2
else:
    ALTERNATIVE_STATEMENTS
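The two templates above fit together in a single loop. Below is a minimal sketch that uses a hypothetical DNA string. Each template line is marked in a comment.

```python
# Hypothetical example: classify each base in a short DNA string
sequence = "ATGC"
labels = []
for base in sequence:               # for LOOP_VARIABLE in SEQUENCE:
    if base in "AG":                # if BOOLEAN_EXPRESSION:
        labels.append("purine")
    elif base in "CT":              # elif BOOLEAN_EXPRESSION:
        labels.append("pyrimidine")
    else:                           # else:
        labels.append("unknown")
print(labels)  # ['purine', 'pyrimidine', 'purine', 'pyrimidine']
```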


LOOPS & CONDITIONAL

Repetition: while, for (loop control: break, continue, pass)

for LOOP_VARIABLE in SEQUENCE:
    STATEMENTS

if LOOP_VARIABLE in SEQUENCE:
    STATEMENTS
else:
    STATEMENTS

Iterations
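The slide lists break, continue and pass, plus the `in` keyword, which works both in loops and in membership tests. Below is a minimal sketch that uses all of them on a hypothetical list of codons.

```python
codons = ["ATG", "GCC", "NNN", "TGG", "TAA", "GGC"]
stop_codons = ("TAA", "TAG", "TGA")

kept = []
for codon in codons:
    if codon in stop_codons:   # membership test with 'in'
        break                  # leave the loop entirely
    if "N" in codon:
        continue               # skip the rest of this iteration
    if codon == "GCC":
        pass                   # placeholder that does nothing; the loop carries on
    kept.append(codon)

print(kept)  # ['ATG', 'GCC', 'TGG']
```

The codon "GGC" after the stop codon is never reached, because `break` ends the loop at "TAA".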


FUNCTIONS

def functionname(parameters):
    "function_docstring"
    statements
    return [expression]

A function is a block of organized, reusable code that is used to perform a single, related action. Functions provide better modularity for your application and a high degree of code reuse.

https://www.tutorialspoint.com/python/python_functions.htm
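Below is the template above filled in: a name, one parameter, a docstring, some statements and a return value. The function `gc_content` is a hypothetical example and is not part of the course material.

```python
def gc_content(sequence):
    """Return the fraction of G and C bases in a DNA sequence (0.0 for empty input)."""
    if not sequence:
        return 0.0
    sequence = sequence.upper()
    gc = sequence.count("G") + sequence.count("C")
    return gc / len(sequence)

print(gc_content("AGTACACTGGT"))  # 5 of the 11 bases are G or C
```

The function can be reused on any sequence. That reuse is the "code reuse" the slide refers to.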


WHERE TO GET HELP

World Wide Web
• www.google.com

Developer Community
• www.stackoverflow.com

Python Website (www.python.org)
• Python 3 Documentation: https://docs.python.org/3/
• Python 3 Tutorials: https://docs.python.org/3/tutorial/index.html


Variables

Types of variables

In [1]:
# defining a variable and displaying its type
variable = [1, 3, 4, 5]
type(variable)

Out[1]:
list

In [2]:
# The 'input' function prompts the user to enter a value.
# The value is stored in 'variable' once Enter is pressed.
variable = int(input("insert variable: "))
type(variable)  # Sometimes how variables are defined is important

insert variable: 10

Out[2]:
int

In [3]:
# multiple assignments on the same line are possible
a, b, c = 1, 2, 3
print(a, b, c, sep=" , ")

1 , 2 , 3

Working with lists


In [33]:
# Defining a string (a sequence of individual characters)
variable = "1,2,3,4,5,6"
type(variable)

Out[33]:
str

In [34]:
print(variable)

1,2,3,4,5,6

In [35]:
# Displaying a subset of a sequence by specifying indices.
# An index refers to the location of an element in a list (or string).
# Indices start at 0 in Python, i.e. printing mylist[0] will display the first element of the variable mylist.
# The end index of a slice is excluded: variable[0:3] holds the elements at indices 0, 1 and 2.
print(variable[0:3], variable[4:])

1,2 3,4,5,6

In [36]:
# Assigning a list of numbers to a variable
variable2 = [1, 2, 3, 4, 5, 6]

In [37]:
# Printing a subset of elements in the list of numbers
print(variable2[0:3], variable2[4:])

[1, 2, 3] [5, 6]

In [32]:
# Retrieving individual list elements from their indices
start = variable2[1]  # 2nd element of the list stored in 'variable2' (again, Python starts counting at 0)
end = variable2[5]    # 6th element of the list stored in 'variable2'
print(start, end)

2 6
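A string can be indexed and sliced like a list. One difference matters in practice: lists are mutable, but strings, like tuples, are read-only. Below is a minimal sketch that reuses the values from the cells above.

```python
variable = "1,2,3,4,5,6"           # str: indexable like a list...
variable2 = [1, 2, 3, 4, 5, 6]     # list

variable2[0] = 100                 # lists are mutable: this works
print(variable2)                   # [100, 2, 3, 4, 5, 6]

try:
    variable[0] = "9"              # ...but strings are immutable
except TypeError as error:
    print("TypeError:", error)

# To "change" a string, build a new one instead
new_variable = "9" + variable[1:]
print(new_variable)                # 9,2,3,4,5,6
```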


In [38]:
mylist = []                # initializing an empty list
mylist.append([1, 2, 3])   # appending an element to the list; here the element is another list of numbers
mylist.append(2)           # appending another element to the list
print(mylist)

[[1, 2, 3], 2]

Operators

In [40]:
# Addition operator
variable = 10
print(variable + 5)

15

In [41]:
# Now let's try with a str variable
variable = "four"
print(variable + "five")

fourfive

In [26]:
# A string is like a list of individual characters.
# We said that strings are like lists, right? So what happens if we add two lists with the '+' operator?
variable1 = [1, 2, 3]
variable2 = [4, 5, 6]
print(variable1 + variable2)

[1, 2, 3, 4, 5, 6]

Exercise: Try adding an int to a str or to a list and see what happens.

What would happen if we multiplied them?

Loops and Conditionals


In [1]:
# for loop example to construct a list dynamically
variablelist = []
for number in [0, 1, 2]:
    # print(number)
    variablelist.append(number * 5)
print(variablelist)

[0, 5, 10]

list() and range() functions

In [28]:
# converting a string into a list of its characters and printing it
lista = list("1234")
print(lista)

['1', '2', '3', '4']

In [46]:
# using the 'range' function to automatically generate a list
# here we define a list of integer values that start at 10 and go up to (but not including) 100 in increments of 5
listb = list(range(10, 100, 5))
print(listb)

[10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95]
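As the output above shows, 100 itself never appears. range(start, stop, step) stops before `stop`. Below is a minimal sketch of the three-argument form and its defaults.

```python
# range(start, stop, step): stop is excluded
listb = list(range(10, 100, 5))
print(listb[0], listb[-1], len(listb))   # 10 95 18

print(list(range(5)))          # [0, 1, 2, 3, 4]   start defaults to 0
print(list(range(2, 6)))       # [2, 3, 4, 5]      step defaults to 1
print(list(range(10, 0, -3)))  # [10, 7, 4, 1]     a negative step counts down
```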


In [30]:
# using a for loop to print the individual values in a range of 10 consecutive integers
for value in range(10):
    print(value)

0
1
2
3
4
5
6
7
8
9

In [12]:
# Converting a range of values to a list, then printing it
print(list(range(50, 100, 5)))

[50, 55, 60, 65, 70, 75, 80, 85, 90, 95]

In [48]:
# Comparing variables with the comparison operator '<' to check whether one variable is smaller than another
variable = 1
variable2 = 5
print(variable < variable2)

True

While

In [49]:
# Using a while loop to count from 0 up to (but not including) 5, printing the count value at every iteration
count = 0
while count < 5:
    print(count)
    count += 1  # This is the same as count = count + 1

0
1
2
3
4

Conditional


In [33]:
# Small program to decide what to wear based on a user-specified temperature
temperature = int(input('What is the temperature? '))
if temperature > 70:
    print('Wear shorts.')
elif temperature < 70:
    print('Wear long pants.')
else:
    print('You are wearing no pants!!')

What is the temperature? 30
Wear long pants.

Break, Continue, and Pass

In [14]:
# For loop that repeats a code block up to 10 times
# The loop is halted when the counter variable reaches 5, forcing the code to exit the loop
number = 0
for number in range(10):
    number = number + 1
    if number == 5:
        break
    print('Number is ' + str(number))
print('Out of loop')

Number is 1
Number is 2
Number is 3
Number is 4
Out of loop

Below is extra material not covered in class, try it at your own risk ;)

Let's start coding!!


In [3]:
# Biopython is a Python library that can work with .ab1 files. It can be installed with the command: conda install biopython
from Bio import SeqIO
from Bio import pairwise2
from Bio.Seq import Seq

In [4]:
## This library lets us create a new type of variable called Bio.Seq.Seq. It behaves like a string but has extra functions, such as giving you the complement of a given sequence.
my_seq = Seq("AGTACACTGGT")
type(my_seq)

Out[4]:
Bio.Seq.Seq

In [5]:
my_seq.complement()

Out[5]:
Seq('TCATGTGACCA')

Opening the files

In [8]:
# We can open .ab1 files with this library using SeqIO.read, passing the path of the file on our computer and the format name 'abi'.
file1 = SeqIO.read("/Users/Rafa/Desktop/file1.ab1", 'abi')

In [9]:
# This is everything stored inside the file; it is like a dictionary of dictionaries.
file1.annotations.keys()

Out[9]:
dict_keys(['sample_well', 'dye', 'polymer', 'machine_model', 'run_start', 'run_finish', 'abif_raw'])
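To see what `complement()` does to a sequence, below is a plain-Python sketch that needs no Biopython. It assumes uppercase A/C/G/T input only. Biopython's own method covers more cases, such as lowercase letters and IUPAC ambiguity codes. The helper names are hypothetical.

```python
# Map each base to its Watson-Crick partner (uppercase A/C/G/T only; an assumption)
COMPLEMENT_TABLE = str.maketrans("ACGT", "TGCA")

def complement(sequence):
    """Return the base-by-base complement of a DNA string."""
    return sequence.translate(COMPLEMENT_TABLE)

def reverse_complement(sequence):
    """Return the complement read in the opposite direction (the other strand, 5' to 3')."""
    return complement(sequence)[::-1]

print(complement("AGTACACTGGT"))          # TCATGTGACCA  (matches Out[5])
print(reverse_complement("AGTACACTGGT"))  # ACCAGTGTACT
```

Biopython's Seq objects also provide a `reverse_complement()` method, which is usually what you want when reading the opposite strand.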


In [10]:
# There are even more...
file1.annotations['abif_raw'].keys()

Out[10]:
dict_keys(['AEPt1', 'AEPt2', 'APFN2', 'APXV1', 'APrN1', 'APrV1', 'APrX1', 'ARTN1', 'ASPF1', 'ASPt1', 'ASPt2', 'AUDT1', 'B1Pt1', 'B1Pt2', 'BCTS1', 'BufT1', 'CTID1', 'CTNM1', 'CTOw1', 'CTTL1', 'CpEP1', 'DATA1', 'DATA2', 'DATA3', 'DATA4', 'DATA5', 'DATA6', 'DATA7', 'DATA8', 'DATA9', 'DATA10', 'DATA11', 'DATA12', 'DCHT1', 'DSam1', 'DySN1', 'Dye#1', 'DyeN1', 'DyeN2', 'DyeN3', 'DyeN4', 'DyeW1', 'DyeW2', 'DyeW3', 'DyeW4', 'EPVt1', 'EVNT1', 'EVNT2', 'EVNT3', 'EVNT4', 'FTab1', 'FVoc1', 'FWO_1', 'Feat1', 'GTyp1', 'HCFG1', 'HCFG2', 'HCFG3', 'HCFG4', 'InSc1', 'InVt1', 'LANE1', 'LIMS1', 'LNTD1', 'LsrP1', 'MCHN1', 'MODF1', 'MODL1', 'NAVG1', 'NLNE1', 'NOIS1', 'PBAS1', 'PBAS2', 'PCON1', 'PCON2', 'PDMF1', 'PDMF2', 'PLOC1', 'PLOC2', 'PSZE1', 'PTYP1', 'PXLB1', 'RGNm1', 'RMXV1', 'RMdN1', 'RMdV1', 'RMdX1', 'RPrN1', 'RPrV1', 'RUND1', 'RUND2', 'RUND3', 'RUND4', 'RUNT1', 'RUNT2', 'RUNT3', 'RUNT4', 'Rate1', 'RunN1', 'S/N%1', 'SCAN1', 'SMED1', 'SMLt1', 'SMPL1', 'SPAC1', 'SPAC2', 'SPAC3', 'SVER1', 'SVER2', 'SVER3', 'Scal1', 'Scan1', 'TUBE1', 'Tmpr1', 'User1', 'phAR1', 'phCH1', 'phDY1', 'phQL1', 'phTR1', 'phTR2'])

In [11]:
# We can find the date when our file was created.
print(file1.annotations['run_start'])

2017-04-21 21:54:21

In [12]:
# The sequence in the file is in: file1.annotations['abif_raw']['PBAS1']
# but we can also access the raw fluorescence data inside: file1.annotations['abif_raw']["DATA9"], file1.annotations['abif_raw']["DATA10"], file1.annotations['abif_raw']["DATA11"] and file1.annotations['abif_raw']["DATA12"].
# Each DATA list holds the values for one nucleotide; the information needed to decode them is in: file1.annotations['abif_raw']['FWO_1']

In [13]:
# Besides the raw data (Sample), there is also peak data. It tells where the peaks are (e.g. Probability.peak_index(100)) and how good they are. This will be very helpful. The peak positions are saved here:
PLOC1 = file1.annotations['abif_raw']['PLOC1']


In [14]:
# FWO_1 tells how the DATA lists DATA9, DATA10, DATA11 and DATA12 are decoded
print(file1.annotations['abif_raw']['FWO_1'])
# So DATA9 is G
# DATA10 is A
# ...

GATC

In [15]:
# Give them easy names
G = file1.annotations['abif_raw']["DATA9"]
A = file1.annotations['abif_raw']["DATA10"]
T = file1.annotations['abif_raw']["DATA11"]
C = file1.annotations['abif_raw']["DATA12"]

# make a list with the 4 nucleotides together
datamerged = []
for e in range(len(G)):
    datamerged.append([G[e], A[e], T[e], C[e]])

# Keep only the values of the list that are at the peaks
peak_positions = set(PLOC1)  # a set makes the 'in' lookup fast
peaks = []
for pos, val in enumerate(datamerged):
    if pos in peak_positions:
        peaks.append(val)

# Call the base whose signal is strictly the highest at each peak
# (peaks where two channels tie for the maximum are skipped)
total = []
for value in peaks:
    if value[0] > value[1] and value[0] > value[2] and value[0] > value[3]:
        total.append("G")
    if value[1] > value[0] and value[1] > value[2] and value[1] > value[3]:
        total.append("A")
    if value[2] > value[0] and value[2] > value[1] and value[2] > value[3]:
        total.append("T")
    if value[3] > value[0] and value[3] > value[1] and value[3] > value[2]:
        total.append("C")

secuence = ""
for value in total:
    secuence += value
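The peak-based base calling above can be tried without an .ab1 file. Below is a minimal sketch that runs it on made-up toy traces. `call_bases` is a hypothetical helper and not a Biopython function. One design choice differs from the notebook: when two channels tie, this version takes the first channel in GATC order, while the cell above skips that peak entirely.

```python
def call_bases(g, a, t, c, peak_positions):
    """Hypothetical helper: call one base per peak by picking the strongest channel.

    g, a, t, c: intensity lists of equal length (like DATA9..DATA12 decoded via FWO_1 = "GATC").
    peak_positions: trace indices where peaks were called (like PLOC1).
    """
    channels = "GATC"  # order matches the [G, A, T, C] grouping used in the notebook
    called = []
    for pos in sorted(set(peak_positions)):
        values = [g[pos], a[pos], t[pos], c[pos]]
        strongest = values.index(max(values))  # ties resolve to the first channel in GATC order
        called.append(channels[strongest])
    return "".join(called)

# Toy traces (made-up numbers): 6 data points, peaks called at positions 1, 3 and 5
G = [0, 90, 5, 10, 0, 3]
A = [0, 10, 5, 80, 0, 2]
T = [0, 5, 5, 10, 0, 70]
C = [0, 1, 5, 15, 0, 4]
print(call_bases(G, A, T, C, [1, 3, 5]))  # GAT
```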


In [16]:
print(secuence)

GGGGGAGCTGCATTGTTGTCAAGGCCATAGAGCCTCCCTAATTCTTTACAGTGATATCACACTCACTGATAAAAACCTCATTATCTTCTCCAGCATAGGTAAGGAAGGATATAAATCCATCTAGATGCCCAACCCACCCACTCACCTTGAGCTTGGCTAGCTCTTAGTTGGTGCACCACTTTCAGTGACAAATCTCACTTCCTGCCCTCACTGCTCTAAACCTTGTCCACTCTGTGTACTTCTGACCATTGATGTTGGCTCTGCTCTCTCCACACAGGTCACAACATTAACTCAGGAGGTTTCCCAGCTAGGGAGAGATATGAGAAGCATCATGCAACTTCTGGAAAACATCTTGTCACCTCAGCAGCCATCCCAGTTTTGTTCTCTACATCCCACCCCAATGTGTCCTTCCAGAGAAAGTTTACAGACTAGGGTGAGTTGGAGTGCTCACCAGCCTTGCCTACATTTGCAGGCAGGTGGAGCACATCTTTACCATGGTAATGTCGCCTCTGGATCCTGGAGTAGCGGTGGGAAGTTGTTATCCGCTCACAATTCCACACAACTTGCTAGCCGGAAGCAAAAAGCTAAAGCCCGTGGAGCCT

In [17]:
sec = file1.annotations['abif_raw']['PBAS1']

In [18]:
# The pairwise2 module lets us compare (align) two sequences.
# localms performs a local alignment with match = 2, mismatch = -1, gap opening = -5 and gap extension = -5
pairwise2.align.localms(secuence, sec, 2, -1, -5, -5)

Out[18]:
[('GGGGGAGCTGCATTGTTGTCAAGGCCATAGAGCCTCCCTAATTCTTTACAGTGATATCACACTCACTGATAAAAACCTCATTATCTTCTCCAGCATAGGTAAGGAAGGATATAAATCCATCTAGATGCCCAACCCACCCACTCACCTTGAGCTTGGCTAGCTCTTAGTTGGTGCACCACTTTCAGTGACAAATCTCACTTCCTGCCCTCACTGCTCTAAACCTTGTCCACTCTGTGTACTTCTGACCATTGATGTTGGCTCTGCTCTCTCCACACAGGTCACAACATTAACTCAGGAGGTTTCCCAGCTAGGGAGAGATATGAGAAGCATCATGCAACTTCTGGAAAACATCTTGTCACCTCAGCAGCCATCCCAGTTTTGTTCTCTACATCCCACCCCAATGTGTCCTTCCAGAGAAAGTTTACAGACTAGGGTGAGTTGGAGTGCTCACCAGCCTTGCCTACATTTGCAGGCAGGTGGAGCACATCTTTACCATGGTAATGTCGCCTCTGGATCCTGGAGTAGCGGTGGGAAGTTGTTATCCGCTCACAATTCCACACAACTTGCTAGCCGGAAGCAAAAAGCTAAAGCCCGTGGAGCCT', 'GCCGGAGCTGCATTGATGTCAAGGCCATAGAGCCTCCCTAATTCTTTACAGTGATATCACACTCACTGATAAAAACCTCATTATCTTCTCCAGCATAGGTAAGGAAGGATATAAATCCATCTAGATGCCCAACCCACCCACTCACCTTGAGCTTGGCTAGCTCTTAGTTGGTGCACCACTTTCAGTGACAAATCTCACTTCCTGCCCTCACTGCTCTAAACCTTGTCCACTCTGTGTACTTCTGACCATTGATGTTGGCTCTGCTCTCTCCACACAGGTCACAACATTAACTCAGGAGGTTTCCCAGCTAGGGAGAGATATGAGAAGCATCATGCAACTTCTGGAAAACATCTTGTCACCTCAGCAGCCATCCCAGTTTTGTTCTCTACATCCCACCCCAATGTGTCCTTCCAGAGAAAGTTTACAGACTAGGGTGAGTTGGAGTGCTCACCAGCCTTGCCTACATTTGCAGGCAGGTGGAGCACATCTTTACCATGGTAATGTCGCCTCTGGATCCTGGAGTAGCGGTGGGAGGTTGTTATCCGCTCACAATTCCACTCAACATGCTAGCCGGAAGCAAAAAGCTAAAGCCTGTGGAGCCA', 1181.0, 3, 601)]
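The `localms` call above scores a local alignment. Gap opening and gap extension are both -5 there, so a gap of length k costs -5 × k. That means a textbook Smith-Waterman recurrence with a linear gap penalty should reach the same best score. Below is a small plain-Python sketch of that score calculation only, with no traceback. `local_alignment_score` is a hypothetical helper, not part of Biopython. If your Biopython version warns that `pairwise2` is deprecated, `Bio.Align.PairwiseAligner` is the suggested replacement.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-5):
    """Best Smith-Waterman local alignment score with a linear gap penalty.

    With gap opening == gap extension (-5 and -5 in the localms call above),
    a gap of length k costs k * gap, so a linear-gap recurrence suffices.
    """
    prev = [0] * (len(b) + 1)   # scores for the previous row of the DP matrix
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = cur[j - 1] + gap   # gap in a
            cur[j] = max(0, diag, up, left)  # 0 lets a local alignment restart anywhere
            best = max(best, cur[j])
        prev = cur
    return best

print(local_alignment_score("ACGTTT", "ACGAAA"))  # 6: the shared "ACG" scores 3 * 2
```

The running time grows with len(a) × len(b). That is fine for reads of a few hundred bases but slow in pure Python for much longer sequences.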

In [19]:
import matplotlib.pyplot as plt
from collections import defaultdict


In [20]:
# Plot the four raw traces over a window of the run (data points 3000 to 3500)
plt.plot(G, color='blue', label='G')
plt.plot(A, color='red', label='A')
plt.plot(T, color='green', label='T')
plt.plot(C, color='yellow', label='C')
plt.xlim(3000, 3500)
plt.ylim(0, 1500)
plt.legend()
plt.show()

In [21]:
# More information about this library:
# http://biopython.org/DIST/docs/tutorial/Tutorial.pdf