BMMB 597D - Practical Data Analysis for Life Scientists
Week 15 - Lecture 29
István Albert
BMB, Bioinformatics
Databases and Object Persistence
• Maintaining data across several runs
• Storing data in a relational database
Object persistence with the shelve module
Read the object back in a different program
You can save and recall just about any Python object
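The code from the original slides is not preserved here, so the following is only a minimal sketch of the idea using the standard library shelve module. The file name, the key "genes" and the stored dictionary are made up for illustration.

```python
import os
import shelve
import tempfile

# Hypothetical storage location and data, chosen only for this sketch.
path = os.path.join(tempfile.mkdtemp(), "results")
genes = {"MYCN": ("chr2", 15998133, 16004580)}

# "First program": store the object under a string key.
with shelve.open(path) as db:
    db["genes"] = genes

# "Second program": reopen the same file and read the object back.
with shelve.open(path) as db:
    restored = db["genes"]

print(restored)
```

In practice the two halves would live in separate scripts that share only the file path.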
Relational databases
• Data is organized in tables (rows and columns), and each column is described by a type (integer, float, string)
• Standardized query language called SQL
• Some people say SQL stands for Structured Query Language. True, with some caveats:
– SQL is not actually structured
– SQL is not just for query
– SQL is not a programming language
Most common model: client - server
• There is a server (for example UCSC MySQL server) that can be connected to via the MySQL client
• You can send a query to the server
• The MySQL client needs to be installed on your computer
• You need to know how the data is modeled at the UCSC servers.
MySQL query example
mysql -h genome-mysql.cse.ucsc.edu -A -u genome -D hg18 -e 'select * from refGene where name="NM_005378"'

*************************** 1. row ***************************
bin: 707
name: NM_005378
chrom: chr2
strand: +
txStart: 15998133
txEnd: 16004580
cdsStart: 15999637
cdsEnd: 16003670
exonCount: 3
exonStarts: 15998133,15999520,16003065,
exonEnds: 15998316,16000427,16004580,
id: 0
name2: MYCN
cdsStartStat: cmpl
cdsEndStat: cmpl
exonFrames: -1,0,1,
Sqlite relational database in Python
We run this only once to initialize the database table that stores our data
Populate the database
Query the database
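The three steps above (initialize, populate, query) can be sketched with the standard library sqlite3 module. This is an illustration under assumptions, not the code from the original slides: the table name genes, its columns and the second data row are made up, and an in-memory database is used so the sketch runs anywhere.

```python
import sqlite3

# ":memory:" keeps the sketch self-contained; a file name such as
# "genes.db" (hypothetical) would keep the data across runs.
conn = sqlite3.connect(":memory:")

# Step 1, run once: create the table that stores our data.
conn.execute(
    "CREATE TABLE IF NOT EXISTS genes "
    "(name TEXT, chrom TEXT, txStart INTEGER, txEnd INTEGER)"
)

# Step 2: populate the table. The "?" placeholders let sqlite3 do the quoting.
rows = [
    ("NM_005378", "chr2", 15998133, 16004580),  # coordinates from the refGene example
    ("demo_gene", "chr2", 100, 200),            # made-up row for illustration
]
conn.executemany("INSERT INTO genes VALUES (?, ?, ?, ?)", rows)
conn.commit()

# Step 3: query the table.
cursor = conn.execute(
    "SELECT name, txStart FROM genes "
    "WHERE chrom = ? AND txStart > ? ORDER BY txStart",
    ("chr2", 1000),
)
result = cursor.fetchall()
print(result)
```

With a file-backed database, step 1 belongs in a separate setup script so that later runs only insert and query.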
Populate your database with the gff file
For simplicity we call the feature type name
Query your database
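A possible sketch of loading GFF data into SQLite and querying it follows. The three sample lines, the table name features and the column names are assumptions made for illustration; a real run would read the course GFF file instead of the inline text. As on the slide, the feature type is stored in a column called name.

```python
import io
import sqlite3

# A tiny made-up GFF sample that stands in for the real file.
gff_text = (
    "##gff-version 3\n"
    "chr1\tdemo\tgene\t100\t900\t.\t+\t.\tID=gene1\n"
    "chr1\tdemo\texon\t100\t300\t.\t+\t.\tParent=gene1\n"
    "chr1\tdemo\texon\t500\t900\t.\t+\t.\tParent=gene1\n"
)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE features "
    "(chrom TEXT, name TEXT, start_pos INTEGER, end_pos INTEGER, strand TEXT)"
)

def gff_rows(stream):
    """Yield (chrom, feature type, start, end, strand) for each data line."""
    for line in stream:
        if line.startswith("#") or not line.strip():
            continue  # skip comments, directives and blank lines
        col = line.rstrip("\n").split("\t")
        yield col[0], col[2], int(col[3]), int(col[4]), col[6]

conn.executemany(
    "INSERT INTO features VALUES (?, ?, ?, ?, ?)",
    gff_rows(io.StringIO(gff_text)),
)
conn.commit()

# Example query: how many features of each type were loaded?
counts = conn.execute(
    "SELECT name, COUNT(*) FROM features GROUP BY name ORDER BY name"
).fetchall()
print(counts)
```

To load a file on disk, replace io.StringIO(gff_text) with an open file handle.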
Exercise: explore various queries
You may want to google: SQL tutorial
BMMB 597D - Practical Data Analysis for Life Scientists
Week 15 - Lecture 30
István Albert
BMB, Bioinformatics
The Zen of Python
import this
Write and run: import this
About the course
• I hope it was interesting
• I hope it was useful
• Future plans: expand on the subjects and tackle more difficult problems in a second lecture series
• This course may or may not be offered in the future.
• It depends on you, the potential audience, and on advisors and administration.
• If you liked it, mention this to your advisor, committee members, etc.
Computation == Thought
Advice
If you know what an object IS then you will know what it DOES
Print it. Check its type. Check its content.
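As a small illustration of this advice, the sketch below inspects an object whose contents are made up for the example.

```python
# A made-up object; in practice this would be whatever a function returned.
record = {"name": "MYCN", "exonStarts": [15998133, 15999520, 16003065]}

# Print it.
print(record)

# Check its type.
kind = type(record)
print(kind)

# Check its content: the available attributes, then the actual values.
methods = [m for m in dir(record) if not m.startswith("_")]
print(methods)
for key, value in record.items():
    print(key, type(value), value)
```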
Simplicity – the essential ingredient
• It is always easier to create a process than to comprehend it!
• Keep it simple! Don’t repeat yourself.
• Don’t be afraid to toss the program away. If you can’t debug it, toss it away and start fresh with a slightly different perspective.
Visit Biostar – over 1000 questions! http://biostar.stackexchange.com/
Ask your questions there! We’ll try to build it into an extensive knowledge base!
Bioinformatics with Python
• BioPython – has parsers for a large number of bioinformatics formats
• PyFasta – indexes very large files so you can quickly access any part of a genome
• BX-python – very good interval handling data structures
• Pygr – graph representation for biological data
• Pycogent - evolutionary algorithms
Search for the name to find them
Optimizing programs
• All programs have bottlenecks
• You only need to optimize if there is a problem (or you foresee one)
• Don’t optimize prematurely
• Make it work, make it right, make it fast
How to collect profiling information from a program
Display the statistics
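The original slides' code is not preserved here. One way to do both steps with the standard library is cProfile to collect the data and pstats to display it. The functions slow_sum and main below are made-up stand-ins for a real program.

```python
import cProfile
import io
import pstats

def slow_sum(n):
    """Deliberately naive loop so that it shows up in the profile."""
    total = 0
    for i in range(n):
        total += i * i
    return total

def main():
    return [slow_sum(20000) for _ in range(20)]

# Collect profiling information while main() runs.
profiler = cProfile.Profile()
profiler.enable()
main()
profiler.disable()

# Display the statistics, sorted by cumulative time, top five entries.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```

A whole script can also be profiled without editing it by running python -m cProfile -s cumulative followed by the script name.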
From XKCD: http://xkcd.com/353/
Thanks!