An Introduction to Python and Its Use in Bioinformatics Csc 487/687 Computing for Bioinformatics


An Introduction to Python and Its Use in Bioinformatics

Csc 487/687 Computing for Bioinformatics

Scopes

Scopes define the “visibility” of a variable

Variables defined outside of a function are visible to all of the functions within a module (file)

Variables defined within a function are local to that function

To assign to a global (module-level) variable from within a function, declare it with the global keyword

Ex 1:

x = 5

def fnc():
    x = 2
    print x,

fnc()
print x

>>> ?

Ex 2:

x = 5

def fnc():
    global x
    x = 2
    print x,

fnc()
print x

>>> ?
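
The two exercises above can be checked with a small sketch. It is written for Python 3, where print is a function (the slides use the Python 2 print statement), and the function names local_assign and global_assign are made up for illustration:

```python
x = 5  # module-level (global) variable

def local_assign():
    # Plain assignment creates a new local x; the global x is untouched.
    x = 2
    return x

def global_assign():
    # 'global' makes the assignment rebind the module-level x.
    global x
    x = 2
    return x

inside_1 = local_assign()
after_1 = x                 # global x is still 5 here
inside_2 = global_assign()
after_2 = x                 # global x has been rebound to 2

print(inside_1, after_1)    # 2 5  (corresponds to Ex 1)
print(inside_2, after_2)    # 2 2  (corresponds to Ex 2)
```

Without the global declaration, the assignment inside the function only affects a local name, which is why Ex 1 leaves the outer x unchanged.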

Modules

Why use?
– Code reuse
– System namespace partitioning (avoid name clashes)
– Implementing shared services or data

Module
– A file containing Python definitions and statements.
– The file name is the module name with the suffix .py appended.
– Definitions from a module can be imported into other modules or into the main module.

Modules

How to structure a program
– One top-level file: main control flow of program
– Zero or more supplemental files known as modules: libraries of tools

Modules - Import

Import – used to gain access to tools in modules

Ex:

contents of file b.py

def spam(text):
    print text, 'spam'

contents of file a.py

import b
b.spam('gumby')

Modules - Import

Within a module, the module's name (as a string) is available as the value of the global variable __name__

>>> import b
>>> print b.__name__
?
>>> spam = b.spam
>>> spam("How are you")
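
To try the import example without creating files by hand, the sketch below writes a throwaway b.py into a temporary directory and imports it. It assumes Python 3, and its spam returns a string instead of printing so the result can be inspected; the temporary-directory setup is an illustration, not part of the slides:

```python
import importlib
import os
import sys
import tempfile

# Write a throwaway module b.py into a temporary directory (illustrative setup).
module_dir = tempfile.mkdtemp()
with open(os.path.join(module_dir, "b.py"), "w") as handle:
    handle.write("def spam(text):\n    return text + ' spam'\n")

# Make the directory importable, then import the module by its file name.
sys.path.insert(0, module_dir)
importlib.invalidate_caches()
import b

print(b.__name__)             # b
spam = b.spam                 # bind the module's function to a shorter name
print(spam("How are you"))    # How are you spam
```

The module's name is simply the file name without the .py suffix, which is what b.__name__ reports.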

The Module Search Path

– The current directory
– The list of directories specified by the environment variable PYTHONPATH
– When PYTHONPATH is not set, or when the file is not found there, the search continues in an installation-dependent default path
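
The search path can be inspected from a running interpreter. A minimal sketch, assuming Python 3; the variable names search_path and extra_dirs are illustrative:

```python
import os
import sys

# sys.path holds the directories searched, in order, when a module is imported.
search_path = list(sys.path)

# PYTHONPATH may be unset, in which case the default "" is used here.
pythonpath = os.environ.get("PYTHONPATH", "")
extra_dirs = [d for d in pythonpath.split(os.pathsep) if d]

for entry in search_path:
    print(entry)
print("PYTHONPATH entries:", extra_dirs)
```

The output depends on the installation, so no particular listing should be expected.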

Standard Modules

– Python Library Reference
– Built into the interpreter

>>> import sys
>>> sys.ps1
'>>> '
>>> sys.ps2
'... '
>>> sys.ps1 = 'L> '
L> print 'Hello!'
Hello!
L> sys.path.append("C:\\")

Python Documentation Sources

– # comments: in-file documentation
– The dir function: lists of attributes available on objects
– Docstrings (__doc__): in-file documentation attached to objects

The dir() Function

dir() is used to find out which names a module defines. It returns a sorted list of strings.

dir() does not list the names of built-in functions and variables. If you want a list of those, they are defined in the standard module __builtin__.
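
A quick way to see both points is sketched below. It assumes Python 3, where the module the slides call __builtin__ is named builtins; the try/except falls back to the Python 2 name. The math module is used only as a convenient example:

```python
import math

try:
    import builtins                   # Python 3 name
except ImportError:
    import __builtin__ as builtins    # Python 2 name used in the slides

names = dir(math)
print(names == sorted(names))   # True: dir() returns a sorted list
print("sqrt" in names)          # True: sqrt is defined by the math module
print("len" in dir(builtins))   # True: built-in names live in their own module
```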

The dir() Function Example

The contents of fibo.py

def fib(n):
    a, b = 0, 1
    while b < n:
        print b,
        a, b = b, a+b

def fib2(n):
    result = []
    a, b = 0, 1
    while b < n:
        result.append(b)
        a, b = b, a+b
    return result

The dir() Function Example

>>> import fibo, sys
>>> dir(fibo)
['__name__', 'fib', 'fib2']
>>> dir(sys)
['__displayhook__', '__doc__', '__excepthook__', '__name__',
'__stderr__', '__stdin__', '__stdout__', '_getframe', 'api_version', 'argv', 'builtin_module_names', 'byteorder', 'callstats', 'copyright', 'displayhook', 'exc_clear', 'exc_info', 'exc_type', 'excepthook', 'exec_prefix', 'executable', 'exit', 'getdefaultencoding', 'getdlopenflags', 'getrecursionlimit', 'getrefcount', 'hexversion', 'maxint', 'maxunicode', 'meta_path', 'modules', 'path', 'path_hooks', 'path_importer_cache', 'platform', 'prefix', 'ps1', 'ps2', 'setcheckinterval', 'setdlopenflags', 'setprofile', 'setrecursionlimit', 'settrace', 'stderr', 'stdin', 'stdout', 'version', 'version_info', 'warnoptions']

The dir() Function Example (2)

>>> import __builtin__
>>> dir(__builtin__)
['ArithmeticError', 'AssertionError', 'AttributeError', 'DeprecationWarning', 'EOFError',
'Ellipsis', 'EnvironmentError', 'Exception', 'False', 'FloatingPointError', 'FutureWarning', 'IOError', 'ImportError', 'IndentationError', 'IndexError', 'KeyError', 'KeyboardInterrupt', 'LookupError', 'MemoryError', 'NameError', 'None', 'NotImplemented', 'NotImplementedError', 'OSError', 'OverflowError', 'OverflowWarning', 'PendingDeprecationWarning', 'ReferenceError', 'RuntimeError', 'RuntimeWarning', 'StandardError', 'StopIteration', 'SyntaxError', 'SyntaxWarning', 'SystemError', 'SystemExit', 'TabError', 'True', 'TypeError', 'UnboundLocalError', 'UnicodeDecodeError', 'UnicodeEncodeError', 'UnicodeError', 'UnicodeTranslateError', 'UserWarning', 'ValueError', 'Warning', 'WindowsError', 'ZeroDivisionError', '_', '__debug__', '__doc__', '__import__', '__name__', 'abs', 'apply', 'basestring', 'bool', 'buffer', 'callable', 'chr', 'classmethod', 'cmp', 'coerce', 'compile', 'complex', 'copyright', 'credits', 'delattr', 'dict', 'dir', 'divmod', 'enumerate', 'eval', 'execfile', 'exit', 'file', 'filter', 'float', 'frozenset', 'getattr', 'globals', 'hasattr', 'hash', 'help', 'hex', 'id', 'input', 'int', 'intern', 'isinstance', 'issubclass', 'iter', 'len', 'license', 'list', 'locals', 'long', 'map', 'max', 'min', 'object', 'oct', 'open', 'ord', 'pow', 'property', 'quit', 'range', 'raw_input', 'reduce', 'reload', 'repr', 'reversed', 'round', 'set', 'setattr', 'slice', 'sorted', 'staticmethod', 'str', 'sum', 'super', 'tuple', 'type', 'unichr', 'unicode', 'vars', 'xrange', 'zip']

Python Docstrings

Docstrings provide a convenient way of associating documentation with Python modules, functions, classes, and methods.

An object's docstring is defined by including a string constant as the first statement in the object's definition.

Docstrings can be accessed from the interpreter and from Python programs using the "__doc__" attribute.

Docstring Example

Ex: b.py

# Internal comment
"""Module Docstring comment """
def fn():
    """Function Docstring comment """

>>> import b
>>> print b.__doc__
Module Docstring comment
>>> print b.fn.__doc__
Function Docstring comment
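
The same mechanism works for any function defined in a script. A minimal sketch, assuming Python 3; gc_content is a hypothetical helper chosen to fit the course topic, not something from the slides:

```python
def gc_content(sequence):
    """Return the fraction of G and C bases in a DNA sequence."""
    sequence = sequence.upper()
    if not sequence:
        return 0.0
    gc_count = sequence.count("G") + sequence.count("C")
    return gc_count / float(len(sequence))

# The first string literal in the function body becomes its __doc__ attribute.
print(gc_content.__doc__)
print(gc_content("ATGC"))   # 0.5
```

The # comment style is invisible at run time, whereas the docstring travels with the object and is also what help() displays.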

Class Definition Syntax

class ClassName:
    <statement-1>
    .
    .
    .
    <statement-N>

Class Objects

– Attribute references: valid attribute names are all the names that were in the class's namespace when the class object was created.
– Instantiation uses function notation: just pretend that the class object is a parameterless function that returns a new instance of the class.
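
Both operations can be seen in a short sketch. It assumes Python 3, and the class Sequence with its attribute and method names is hypothetical, used only for illustration:

```python
class Sequence:
    """A hypothetical class used only for illustration."""
    alphabet = "ACGT"              # a name in the class's namespace

    def length_of(self, text):
        return len(text)

# Attribute references on the class object itself.
print(Sequence.alphabet)           # ACGT
print(Sequence.__doc__)

# Instantiation: call the class object like a parameterless function.
seq = Sequence()
print(isinstance(seq, Sequence))   # True
print(seq.length_of("GATTACA"))    # 7
```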

Inheritance

class DerivedClassName(BaseClassName):
    <statement-1>
    .
    .
    .
    <statement-N>

The name BaseClassName must be defined in a scope containing the derived class definition. When the base class is defined in another module, qualify it with the module name:

class DerivedClassName(modname.BaseClassName):
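
A minimal sketch of single inheritance, assuming Python 3; the classes Sequence and DNASequence and their methods are hypothetical examples rather than part of the slides:

```python
class Sequence(object):
    def __init__(self, residues):
        self.residues = residues

    def describe(self):
        return "sequence of length %d" % len(self.residues)

class DNASequence(Sequence):
    # Inherits __init__ and describe; adds one DNA-specific method.
    def complement(self):
        pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
        return "".join(pairs[base] for base in self.residues)

dna = DNASequence("ATGC")
print(dna.describe())     # sequence of length 4
print(dna.complement())   # TACG
```

The derived class reuses everything the base class defines and only adds what is specific to it.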

Multiple Inheritance

class DerivedClassName(Base1, Base2, Base3):
    <statement-1>
    .
    .
    .
    <statement-N>
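
A small sketch of a class with two bases, assuming Python 3. The class names are hypothetical; the point illustrated is that when both bases define the same method, the base listed first is searched first:

```python
class Named(object):
    def label(self):
        return "named"

class Annotated(object):
    def label(self):
        return "annotated"

    def note(self):
        return "note"

class Gene(Named, Annotated):
    pass

gene = Gene()
print(gene.label())   # named  (Named is listed before Annotated)
print(gene.note())    # note   (found only in Annotated)
```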

Iterators

for element in [1, 2, 3]:
    print element
for element in (1, 2, 3):
    print element
for key in {'one':1, 'two':2}:
    print key
for char in "123":
    print char
for line in open("myfile.txt"):
    print line

Iterators

>>> s = 'abc'
>>> it = iter(s)
>>> it
<iterator object at 0x00A1DB50>
>>> it.next()
'a'
>>> it.next()
'b'
>>> it.next()
'c'
>>> it.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
    it.next()
StopIteration
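
The session above uses the Python 2 method it.next(). The sketch below shows the same steps with the built-in next() function, which is what Python 3 provides, and catches StopIteration the way a for loop does internally; the variable names are illustrative:

```python
s = "abc"
it = iter(s)

# Pull the three characters out one at a time.
collected = [next(it), next(it), next(it)]

# A fourth call signals exhaustion by raising StopIteration.
try:
    next(it)
    exhausted = False
except StopIteration:
    exhausted = True

print(collected)   # ['a', 'b', 'c']
print(exhausted)   # True
```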

Debugging

Can use print statements to “manually” debug

Can use debugger in PythonWin