
BMMB 597D - Practical Data Analysis for Life Scientists

Week 15 - Lecture 29

István Albert

BMB, Bioinformatics

Databases and Object Persistence

• Maintaining data across several runs

• Storing data in a relational database

Object persistence with shelve()

Read back the object in a different program

You can save and recall just about any python object
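Here is a minimal sketch of the shelve workflow using Python's standard `shelve` module. The file name `mydata` and the stored objects are hypothetical. In practice the two halves would live in separate programs; here they share one script so the example stays self-contained.

```python
import os
import shelve
import tempfile

# Hypothetical file name; shelve may create one or more files based on it.
db_path = os.path.join(tempfile.mkdtemp(), "mydata")

# Program 1: store objects under string keys (values are pickled).
with shelve.open(db_path) as db:
    db["genes"] = {"MYCN": ("chr2", 15998133, 16004580)}
    db["counts"] = [1, 2, 3]

# Program 2 (normally run later, separately): read the objects back.
with shelve.open(db_path) as db:
    genes = db["genes"]
    counts = db["counts"]

print(genes["MYCN"], counts)
```

Keys must be strings. The values can be almost any picklable Python object, including dictionaries, lists and tuples.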

Relational databases

• Data is organized in tables (rows and columns); each column is described by a type (integer, float, string)

• Standardized query language called SQL

• Some people say SQL stands for Structured Query Language. True, with some caveats:

– SQL is not actually structured

– SQL is not just for queries

– SQL is not a programming language

Most common model: client - server

• There is a server (for example the UCSC MySQL server) that you can connect to with the MySQL client

• You can send queries to the server

• The MySQL client needs to be installed on your machine

• You need to know how the data is modeled on the UCSC servers.

MySQL query example

mysql -h genome-mysql.cse.ucsc.edu -A -u genome -D hg18 -e 'select * from refGene where name="NM_005378"'

*************************** 1. row ***************************
         bin: 707
        name: NM_005378
       chrom: chr2
      strand: +
     txStart: 15998133
       txEnd: 16004580
    cdsStart: 15999637
      cdsEnd: 16003670
   exonCount: 3
  exonStarts: 15998133,15999520,16003065,
    exonEnds: 15998316,16000427,16004580,
          id: 0
       name2: MYCN
cdsStartStat: cmpl
  cdsEndStat: cmpl
  exonFrames: -1,0,1,
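Knowing how the data is modeled matters as soon as you use a result. For example, the exon columns are stored as comma-separated text with a trailing comma. The sketch below uses only the values from the row above and a hypothetical helper, `parse_coords`. It assumes UCSC's zero-based, half-open coordinates, so end minus start gives each exon's length.

```python
# Values copied from the refGene row shown above (NM_005378, MYCN).
exon_starts = "15998133,15999520,16003065,"
exon_ends = "15998316,16000427,16004580,"

def parse_coords(text):
    # Split on commas and skip the empty string left by the trailing comma.
    return [int(value) for value in text.split(",") if value]

# Pair each exon start with its end.
exons = list(zip(parse_coords(exon_starts), parse_coords(exon_ends)))

for start, end in exons:
    # Half-open intervals: length is simply end - start.
    print(start, end, end - start)
```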

SQLite relational database in Python

We run this only once to initialize the database table that stores our data

Populate the database

Query the database
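The three steps (initialize once, populate, query) can be sketched with Python's built-in `sqlite3` module. The table name `genes`, its columns, and the second row are hypothetical; the MYCN row reuses the coordinates from the UCSC example. The database lives in memory here. Passing a file name such as `genes.db` to `connect` would make the data persist between runs.

```python
import sqlite3

def init_db(conn):
    # Run once: create the table that stores our data (hypothetical schema).
    conn.execute(
        "CREATE TABLE IF NOT EXISTS genes ("
        "name TEXT, chrom TEXT, tx_start INTEGER, tx_end INTEGER)"
    )

def populate(conn, rows):
    # "?" placeholders let sqlite3 quote each value safely.
    conn.executemany("INSERT INTO genes VALUES (?, ?, ?, ?)", rows)
    conn.commit()

def query(conn, chrom):
    cursor = conn.execute(
        "SELECT name, tx_start, tx_end FROM genes WHERE chrom = ? ORDER BY tx_start",
        (chrom,),
    )
    return cursor.fetchall()

conn = sqlite3.connect(":memory:")  # use e.g. "genes.db" to keep the data on disk
init_db(conn)
populate(conn, [
    ("MYCN", "chr2", 15998133, 16004580),
    ("hypothetical_gene", "chr1", 100, 500),  # made-up row for illustration
])
print(query(conn, "chr2"))
```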

Populate your database with the gff file

For simplicity, we store the feature type in a column called name

Query your database
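A minimal sketch of loading GFF lines into SQLite follows. GFF files are tab separated with nine columns: seqid, source, type, start, end, score, strand, phase and attributes. The third column, the feature type, is stored as `name`. The GFF snippet, the table name `features`, and the helper `load_gff` are made up for illustration. A real script would iterate over the lines of an open file instead.

```python
import sqlite3

# A tiny made-up GFF snippet standing in for a real file.
GFF_TEXT = (
    "##gff-version 3\n"
    "chr1\texample\tgene\t100\t900\t.\t+\t.\tID=g1\n"
    "chr1\texample\texon\t100\t300\t.\t+\t.\tParent=g1\n"
    "chr1\texample\texon\t500\t900\t.\t+\t.\tParent=g1\n"
    "chr2\texample\tgene\t50\t400\t.\t-\t.\tID=g2\n"
)

def load_gff(conn, lines):
    conn.execute(
        "CREATE TABLE IF NOT EXISTS features ("
        "chrom TEXT, name TEXT, start INTEGER, stop INTEGER, strand TEXT)"
    )
    rows = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/directive lines
        cols = line.rstrip("\n").split("\t")
        # Column 3 (index 2) is the feature type; we store it as "name".
        rows.append((cols[0], cols[2], int(cols[3]), int(cols[4]), cols[6]))
    conn.executemany("INSERT INTO features VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load_gff(conn, GFF_TEXT.splitlines())

exons = conn.execute(
    "SELECT chrom, start, stop FROM features WHERE name = 'exon' ORDER BY start"
).fetchall()
print(exons)
```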

Exercise: explore various queries

You may want to google: SQL tutorial
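As a starting point for exploring queries, here are a few common patterns run against a small made-up feature table: counting with `GROUP BY`, computing a derived column with `ORDER BY ... LIMIT`, and an interval overlap test. The table layout mirrors the hypothetical `features` table (chrom, name, start, stop). The lengths use `stop - start + 1` because GFF coordinates are 1-based and inclusive.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE features (chrom TEXT, name TEXT, start INTEGER, stop INTEGER)")
conn.executemany(
    "INSERT INTO features VALUES (?, ?, ?, ?)",
    [("chr1", "gene", 100, 900), ("chr1", "exon", 100, 300),
     ("chr1", "exon", 500, 900), ("chr2", "gene", 50, 400)],  # made-up rows
)

queries = {
    # How many features of each type?
    "counts": "SELECT name, COUNT(*) FROM features GROUP BY name ORDER BY name",
    # The two longest features (1-based, inclusive coordinates).
    "longest": "SELECT name, stop - start + 1 AS length FROM features "
               "ORDER BY length DESC LIMIT 2",
    # Features on chr1 that overlap the interval 250-450.
    "overlap": "SELECT name, start, stop FROM features "
               "WHERE chrom = 'chr1' AND start <= 450 AND stop >= 250 "
               "ORDER BY start, stop",
}

results = {key: conn.execute(sql).fetchall() for key, sql in queries.items()}
for key, rows in results.items():
    print(key, rows)
```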

Week 15 - Lecture 30

The Zen of Python

import this

Write and run: import this

About the course

• I hope it was interesting

• I hope it was useful

• Future plans: expand on the subjects and tackle more difficult problems in a second lecture series

• This course may or may not be offered in the future.

• It depends on you, the potential audience, as well as on advisors and the administration.

• If you liked it, mention this to your advisor, committee members, etc.

Computation == Thought

Advice

If you know what an object IS, then you will know what it DOES

Print it. Check its type. Check its content.
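The print / type / content routine looks like this in practice. The tab-separated line below is made up for illustration. Once `type()` tells you the object is a list, you know it supports indexing, `len()` and the list methods that `dir()` reveals.

```python
# A made-up tab-separated line, e.g. read from a file.
line = "chr2\tucsc\texon\t15998133\t15998316"
fields = line.split("\t")

print(fields)                  # print it
print(type(fields))            # check its type
print(len(fields), fields[0])  # check its content

# Knowing what it IS tells you what it DOES: list the public methods.
print([name for name in dir(fields) if not name.startswith("_")])
```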

Simplicity – the essential ingredient

• It is always easier to create a process than to comprehend it!

• Keep it simple! Don’t repeat yourself.

• Don’t be afraid to toss the program away. If you can’t debug it, toss it away and start fresh with a slightly different perspective.
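"Don't repeat yourself" usually means moving copied logic into a single function. Below is a small illustration with made-up sequences and a hypothetical `gc_fraction` helper. If the logic needs to change, it now changes in one place only.

```python
# Repetitive version: the same logic pasted for every sequence.
seq1 = "ATGCGC"
seq2 = "ATATAT"
gc1 = (seq1.count("G") + seq1.count("C")) / len(seq1)
gc2 = (seq2.count("G") + seq2.count("C")) / len(seq2)

# DRY version: write the logic once and reuse it.
def gc_fraction(seq):
    """Fraction of G and C bases in a DNA string."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

sequences = {"seq1": "ATGCGC", "seq2": "ATATAT"}
fractions = {name: gc_fraction(seq) for name, seq in sequences.items()}
print(fractions)
```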

Visit Biostar (over 1000 questions!): http://biostar.stackexchange.com/

Ask your questions there! We’ll try to build it into an extensive knowledge base!

Bioinformatics with Python

• BioPython – has parsers to a large number of bioinformatics formats

• PyFasta – indexes very large FASTA files so you can quickly access any part of a genome

• bx-python – very good interval-handling data structures

• Pygr – graph representation for biological data

• PyCogent – comparative genomics and molecular evolution analyses

Search for the name to find them
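To give a feel for how fast random access into a huge FASTA file can work, here is a standard-library sketch of the general idea. It is not PyFasta's implementation or API. It is closer in spirit to the `.fai` index used by `samtools faidx`: record where each sequence starts and how its lines are laid out, then compute a byte offset and `seek` straight to it. The helper names `build_index` and `fetch` and the tiny FASTA file are hypothetical, and the sketch assumes every sequence line except the last has the same length.

```python
import os
import tempfile

def build_index(path):
    """Record, for each sequence, where it starts and how its lines are laid out."""
    index = {}
    name = None
    offset = 0  # byte position of the current line in the file
    with open(path, "rb") as fp:
        for line in fp:
            if line.startswith(b">"):
                name = line[1:].split()[0].decode()
                # [sequence start byte, sequence length, bases per line, bytes per line]
                index[name] = [offset + len(line), 0, None, None]
            elif name is not None:
                entry = index[name]
                bases = len(line.rstrip(b"\r\n"))
                if entry[2] is None:
                    entry[2], entry[3] = bases, len(line)
                entry[1] += bases
            offset += len(line)
    return index

def fetch(path, index, name, start, end):
    """Return bases start..end-1 (0-based) without reading the whole file."""
    seq_start, length, line_bases, line_bytes = index[name]
    end = min(end, length)

    def byte_pos(pos):
        # Skip whole lines, then move to the column within the line.
        return seq_start + (pos // line_bases) * line_bytes + pos % line_bases

    with open(path, "rb") as fp:
        fp.seek(byte_pos(start))
        raw = fp.read(byte_pos(end) - byte_pos(start))
    return raw.replace(b"\n", b"").replace(b"\r", b"").decode()

# Made-up example file; a real genome file would be gigabytes in size.
path = os.path.join(tempfile.mkdtemp(), "example.fa")
with open(path, "wb") as fp:
    fp.write(b">chr1\nACGTACGTAC\nGGGGCCCCTT\nAA\n>chr2\nTTTTT\n")

index = build_index(path)
print(fetch(path, index, "chr1", 8, 14))  # → ACGGGG
```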

Optimizing programs

• All programs have bottlenecks

• You only need to optimize if there is a problem (or you foresee one)

• Don’t optimize prematurely

• Make it work, make it right, make it fast.

How to collect profiling information from a program

Display the statistics
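Here is a minimal sketch of both steps using the standard-library `cProfile` and `pstats` modules. The `windowed_gc` workload and the `gc.prof` file name are hypothetical. From the shell, `python -m cProfile -o gc.prof myscript.py` would collect the same kind of file for a whole script.

```python
import cProfile
import io
import os
import pstats
import tempfile

def gc_count(seq):
    # Count G and C bases in one window.
    return sum(1 for base in seq if base in "GC")

def windowed_gc(seq, size=100):
    # A simple workload to profile: GC count in consecutive windows.
    return [gc_count(seq[i:i + size]) for i in range(0, len(seq), size)]

# Step 1: collect profiling information and save it to a file.
profile_path = os.path.join(tempfile.mkdtemp(), "gc.prof")
profiler = cProfile.Profile()
profiler.enable()
result = windowed_gc("ACGT" * 50000)
profiler.disable()
profiler.dump_stats(profile_path)

# Step 2: load the saved statistics and display the five most expensive entries.
stream = io.StringIO()
stats = pstats.Stats(profile_path, stream=stream)
stats.strip_dirs().sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```

Sorting by `"cumulative"` shows which calls, including everything they call, take the most time. That is usually the quickest way to spot the bottleneck.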

From XKCD: http://xkcd.com/353/

Thanks!