
Genome Revolution: COMPSCI 004G 2.1

Bioinformatics Vocabulary

Processing, analyzing, and experimenting with data
• Where does the data come from? How do we get it?
• What does it mean? What do we do with it?

From nucleotide to protein to gene
• Identification is important
• Annotation is important

Genome Revolution: COMPSCI 004G 2.2

What does DNA (data) look like?

TGAAC vs. ACTTG: which direction is right?

What is a base-pair? nucleotide?

What is a protein, and how is it coded? How is it identified?

What is an amino acid? Codon? Coding?

Why are proteins important? Finding? Using?…

http://www.blc.arizona.edu/Molecular_Graphics/DNA_Structure/DNA_Tutorial.HTML
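The strand-direction question can be made concrete in code. Under the standard base-pairing rules (A with T, C with G), ACTTG is the base-by-base complement of TGAAC, i.e. the partner strand lined up under it; read in its own 5' to 3' direction, that partner strand is the reverse complement, GTTCA. The sketch below assumes uppercase A/C/G/T input; the class and method names are hypothetical, not from the course.

```java
// Hypothetical helper class, not part of the course materials.
class DnaStrands {
    // Returns the base-pairing partner of one base (A<->T, C<->G).
    static char complement(char base) {
        switch (base) {
            case 'A': return 'T';
            case 'T': return 'A';
            case 'C': return 'G';
            case 'G': return 'C';
            default: throw new IllegalArgumentException("not a base: " + base);
        }
    }

    // Complements every base, keeping the original left-to-right order.
    static String complement(String dna) {
        StringBuilder sb = new StringBuilder();
        for (int k = 0; k < dna.length(); k++) {
            sb.append(complement(dna.charAt(k)));
        }
        return sb.toString();
    }

    // Complements every base and reverses the order, i.e. reads the
    // partner strand in its own 5' to 3' direction.
    static String reverseComplement(String dna) {
        return new StringBuilder(complement(dna)).reverse().toString();
    }

    public static void main(String[] args) {
        System.out.println(complement("TGAAC"));        // prints ACTTG
        System.out.println(reverseComplement("TGAAC")); // prints GTTCA
    }
}
```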

Genome Revolution: COMPSCI 004G 2.3

How do we get CGATC into software?

http://seqcore.brcf.med.umich.edu/doc/educ/dnapr/sequencing.html

Genome Revolution: COMPSCI 004G 2.4

From Shotgun to Gene

Comparing two approaches: the HGP (Human Genome Project) and Celera Genomics

Why was there a race? Is the race over? Who owns the data? What public good does the data serve?

Should scientists be concerned about public policy? Was the Manhattan project like the HGP?

Genome Revolution: COMPSCI 004G 2.5

What is a program? What is code?

Instructions in a language that a computer executes
• Languages have different characteristics, strengths, and weaknesses
• Scheme, BASIC, C++, Fortran, Java, Perl, PHP, …

The computer executes one instruction at a time
• Memory and the state of the machine change
• Execute the next instruction
• Repeat
• Stop, run out of memory, pull the plug, …
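The execute-then-repeat cycle can be watched in a tiny Java program. This is an illustrative sketch (the class name and the values are made up): each statement changes the value stored in memory for total, and printing after each step shows the machine's state changing one instruction at a time.

```java
class StepByStep {
    // Runs three instructions in order and returns the final state of total.
    static int run() {
        int total = 0;                          // state: total is 0
        System.out.println("start: total = " + total);
        total = total + 5;                      // state: total is 5
        System.out.println("after add: total = " + total);
        total = total * 2;                      // state: total is 10
        System.out.println("after multiply: total = " + total);
        return total;                           // no instructions left: stop
    }

    public static void main(String[] args) {
        run();
    }
}
```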

Genome Revolution: COMPSCI 004G 2.6

From browser to genome analysis

Netscape, one of the first widely distributed browsers: who wrote it? What operating systems did it run on? What does it mean for a program to run?

When you execute a Google query what happens? Where does code run? How do you see the results?

Searching at NCBI, jimwatsonsequence, …: where does the code execute?

Genome Revolution: COMPSCI 004G 2.7

Writing a program

Create the program using a computer language
• Design, test, document, maintain, …

Test and debug the program
• Does the program do what you want?
• How do you know what the program does? How do you fix it?
• What skills are needed?

Genome Revolution: COMPSCI 004G 2.8

More on understanding programs

You write code in Java, or Perl, or C++, or PHP, or …
• The code must run/execute somewhere
• You must understand what it does (how?)

In your mind and on paper, simulate/understand the computer's execution of your code
• What you wrote, not what you meant
• How do you make a drawing?

Genome Revolution: COMPSCI 004G 2.9

Creating a Program

Specify the problem
• Remove ambiguities
• Identify constraints

Develop algorithms, design classes, design software architecture

Implement the program
• Revisit design
• Test, code, debug
• Revisit design

Documentation, testing, maintenance of the program

From ideas to electrons

Genome Revolution: COMPSCI 004G 2.10

Writing and Understanding Java

Language-independent skills in programming
• What is a loop? How do you design a program?
• What is an array? How do you access files?

However, writing programs in any language requires understanding the syntax and semantics of that programming language. Syntax is similar to the rules of spelling and grammar:
• i before e except after c
• Two spaces after a period, then use a capital letter
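Loops and arrays together: a sketch that counts each nucleotide in a sequence. The class name, the method name, and the choice to store counts in a fixed A, C, G, T order are assumptions made for illustration.

```java
import java.util.Arrays;

class BaseCounter {
    private static final String BASES = "ACGT"; // array slot k counts BASES.charAt(k)

    // Returns counts in the order A, C, G, T; other characters are ignored.
    static int[] countBases(String dna) {
        int[] counts = new int[BASES.length()];   // one slot per base, all start at 0
        for (int k = 0; k < dna.length(); k++) {  // loop over every position
            int slot = BASES.indexOf(Character.toUpperCase(dna.charAt(k)));
            if (slot >= 0) {
                counts[slot]++;                   // array access by index
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(countBases("AGTCCG"))); // prints [1, 2, 2, 1]
    }
}
```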

Genome Revolution: COMPSCI 004G 2.11

Syntax and Semantics

Semantics is what a program (or English sentence) means
• You ain't nothing but a hound dog.
• La chienne de ma tante est sur votre tête. (My aunt's dog is on your head.)

At first it seems like the syntax is hard to master, but the semantics are much harder
• Natural languages are more forgiving than programming languages.
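A small Java illustration of "syntactically fine, semantically surprising": both methods below compile, but because 3 + 4 and 2 are int values, (3 + 4) / 2 is integer division and truncates to 3. The computer runs what was written, not what was meant. The class and method names are made up for this sketch.

```java
class Semantics {
    // Legal Java, but int / int discards the fractional part.
    static double intAverage() {
        return (3 + 4) / 2;        // evaluates to 3, then widened to 3.0
    }

    // Making one operand a double gives the intended result.
    static double realAverage() {
        return (3 + 4) / 2.0;      // evaluates to 3.5
    }

    public static void main(String[] args) {
        System.out.println(intAverage());  // prints 3.0
        System.out.println(realAverage()); // prints 3.5
    }
}
```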

Genome Revolution: COMPSCI 004G 2.12

Toward an Understanding of Java

The traditional first program doesn't convey the power of computing, but it illustrates the basic components of a simple program

public class SayHello {
    // traditional first program
    public static void main(String[] args) {
        System.out.println("Hello World!");
    }
}

This program must be edited/typed, compiled and executed

Genome Revolution: COMPSCI 004G 2.13

How Things Work: PrintLots.java

public class PrintLots {
    // …
    public void once() {
        twice();
        twice();
    }

    public static void main(String[] args) {
        PrintLots printer = new PrintLots();
        printer.once();
    }
}
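The slide elides part of PrintLots. One way to make it runnable is sketched below; the body of twice() and the line() helper are hypothetical, chosen only to make the call chain visible. Calling once() calls twice() two times, and each twice() calls line() two times, so four lines are printed.

```java
class PrintLotsSketch {
    private int linesPrinted = 0;  // counts output so the call chain can be checked

    // Hypothetical helper: prints one numbered line.
    public void line() {
        linesPrinted++;
        System.out.println("line " + linesPrinted);
    }

    // Hypothetical body for twice(): two calls to line().
    public void twice() {
        line();
        line();
    }

    // As on the slide: two calls to twice().
    public void once() {
        twice();
        twice();
    }

    public int getLinesPrinted() {
        return linesPrinted;
    }

    public static void main(String[] args) {
        PrintLotsSketch printer = new PrintLotsSketch();
        printer.once();  // prints "line 1" through "line 4"
    }
}
```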

Genome Revolution: COMPSCI 004G 2.14

Java Vocabulary

Variable, object, identifier, method, call
• Name of something: object or method
• The car starts, the dog barks, I speak

Invoke or call a method: a method lives in an object (or in a class)
• An object is an instance of a class
• My car is a Volvo 850, yours is a BMW …
• My car starts, yours stops:

v850.start();
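The car analogy maps directly onto one class and two objects. In this sketch Car, start(), stop(), isRunning(), and getModel() are made-up names; only v850 echoes the slide.

```java
class Car {
    private final String model;       // which kind of car this object is
    private boolean running = false;  // each object keeps its own state

    Car(String model) {
        this.model = model;
    }

    void start() { running = true; }  // method invoked on one particular object
    void stop() { running = false; }
    boolean isRunning() { return running; }
    String getModel() { return model; }

    public static void main(String[] args) {
        Car v850 = new Car("Volvo 850");  // my car: one instance of class Car
        Car bmw = new Car("BMW");         // yours: another instance
        v850.start();                     // my car starts
        bmw.stop();                       // yours stops
        System.out.println(v850.getModel() + " running: " + v850.isRunning()); // Volvo 850 running: true
        System.out.println(bmw.getModel() + " running: " + bmw.isRunning());   // BMW running: false
    }
}
```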

Genome Revolution: COMPSCI 004G 2.15

Methods/Functions can return values

What does the square root function do? What does it return when called with parameters of 4, 6.2, -1?

What does the method getGcount() return?

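public class DNAstuff {
    // Counts occurrences of the base G; accepts 'g' or 'G'
    // (the slides also write DNA in uppercase, e.g. "AGTCCG").
    public int getGcount(String dna) {
        int total = 0;
        for (int k = 0; k < dna.length(); k++) {
            if (Character.toLowerCase(dna.charAt(k)) == 'g') {
                total = total + 1;
            }
        }
        return total;
    }
}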
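The square-root question can be answered with Java's standard Math.sqrt, which returns a double: 2.0 for 4, roughly 2.49 for 6.2, and NaN ("not a number") for -1, since a negative number has no real square root. The SquareRoots class and its roots method are just a hypothetical harness around that library call.

```java
class SquareRoots {
    // Applies Math.sqrt to each input and returns the results in order.
    static double[] roots(double[] inputs) {
        double[] results = new double[inputs.length];
        for (int k = 0; k < inputs.length; k++) {
            results[k] = Math.sqrt(inputs[k]);  // NaN for negative inputs
        }
        return results;
    }

    public static void main(String[] args) {
        double[] inputs = {4, 6.2, -1};
        double[] results = roots(inputs);
        for (int k = 0; k < inputs.length; k++) {
            System.out.println("sqrt(" + inputs[k] + ") = " + results[k]);
        }
    }
}
```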

Genome Revolution: COMPSCI 004G 2.16

Lydia Kavraki Awards

Awards: Grace Murray Hopper, Brilliant 10

"I like to work on problems that will generally improve the quality of our life."

What's the thing you love most about science?

“Working with students and interacting with people from diverse intellectual backgrounds. Discovery and the challenge of solving a tough problem, especially when it can really affect the quality of our lives. I find the whole process energizing.”

Genome Revolution: COMPSCI 004G 2.17

John Kemeny (1926-1982)

Co-invented BASIC (with Thomas Kurtz); assistant to Einstein; professor and president of Dartmouth

"If you have a large number of unrelated ideas, you have to get quite a distance away from them to get a view of all of them, and this is the role of abstraction."

"...it is the greatest achievement of a teacher to enable his students to surpass him."

Genome Revolution: COMPSCI 004G 2.18

Anatomy of for-loop

String s = new String("AGTCCG");
String rs = new String("");
for (int k = 0; k < 3; k++) {
    rs = rs + s.charAt(k);
}

• Initialization happens once
• Loop test evaluated: if true, the body executes; if false, execution skips to after the loop
• After the loop body, the increment is executed and the test is re-evaluated

What should be true about the test? What about the body? What about the two together?
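One answer to "what should be true about the test?" is that it must eventually become false and must keep charAt in bounds. The sketch below generalizes the slide's loop into a hypothetical prefix method whose test checks both the requested count and the string's length. It uses a StringBuilder instead of repeated + on Strings, a common Java idiom for building a string inside a loop.

```java
class LoopAnatomy {
    // Copies the first n characters of s (or all of s if it is shorter).
    static String prefix(String s, int n) {
        StringBuilder rs = new StringBuilder();
        // initialization once; test before every pass; increment after every pass
        for (int k = 0; k < n && k < s.length(); k++) {
            rs.append(s.charAt(k));  // body: k is always a valid index here
        }
        return rs.toString();
    }

    public static void main(String[] args) {
        System.out.println(prefix("AGTCCG", 3));  // prints AGT, as in the slide's loop
        System.out.println(prefix("AG", 3));      // prints AG instead of throwing
    }
}
```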

Genome Revolution: COMPSCI 004G 2.19

Program Style

People who use your program don't read your code
• You'll write programs to match user needs

People who maintain or modify your program do read code
• Must be readable and understandable without you next door
• Use a consistent programming style; adhere to conventions

Identifiers are names of functions, parameters, (variables, classes, …)
• Sequence of letters, numbers, and underscore (_) characters
• Cannot begin with a number (we won't begin with _)
• big_head vs. BigHead: we'll use AlTeRnAtInG format
• Make identifiers meaningful, not droll and witty
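The identifier rules can be checked with the standard Character.isJavaIdentifierStart and Character.isJavaIdentifierPart methods. This hypothetical checker is only a sketch: Java's real rules are slightly broader than the slide's (they also allow $ and non-ASCII letters), and it does not reject reserved words such as class.

```java
class Identifiers {
    // True if name has the shape of a Java identifier (reserved words are not checked).
    static boolean looksLikeIdentifier(String name) {
        if (name.isEmpty() || !Character.isJavaIdentifierStart(name.charAt(0))) {
            return false;  // empty, or starts with a digit or other illegal character
        }
        for (int k = 1; k < name.length(); k++) {
            if (!Character.isJavaIdentifierPart(name.charAt(k))) {
                return false;  // e.g. a '-' or a space inside the name
            }
        }
        return true;
    }

    public static void main(String[] args) {
        for (String name : new String[] {"bigHead", "big_head", "gCount2", "2fast", "g-count"}) {
            System.out.println(name + " -> " + looksLikeIdentifier(name));
        }
    }
}
```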

Genome Revolution: COMPSCI 004G 2.20

Equality of values and objects

int x = 3*12;

if (x == 36) {is-executed}

String s = new String("genetic");

String t = s.substring(0,4);

if (t == "gene") {not executed}

if (t.equals("gene")) {is-executed}

Primitive types are boxes; object types are labels on boxes
• If we don't call new, there's no box for the label
• No box is called null: it means no object is referred to or referenced by the variable/pointer/reference
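A runnable version of the comparisons above, with the slide's {is-executed} placeholders replaced by print statements. On current JDKs, substring builds a new String object, so == (same box?) is normally false here even though equals (same contents?) is true; the exact == result for strings is an implementation detail, which is one more reason to compare strings with equals. The class name is made up.

```java
class Equality {
    public static void main(String[] args) {
        int x = 3 * 12;
        if (x == 36) {                       // primitives: == compares the values in the boxes
            System.out.println("x == 36 is executed");
        }

        String s = new String("genetic");
        String t = s.substring(0, 4);        // "gene", held in a newly created String object
        if (t == "gene") {                   // compares labels (references): normally false here
            System.out.println("t == \"gene\" is executed");
        }
        if (t.equals("gene")) {              // compares contents: true
            System.out.println("t.equals(\"gene\") is executed");
        }
    }
}
```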

Genome Revolution: COMPSCI 004G 2.21


Objects and values

Primitive variables are boxes: think of a memory location holding a value

Object variables are labels that are put on boxes

String s = new String("genome");

String t = new String("genome");

if (s == t) {they label the same box}

if (s.equals(t)) {contents of boxes the same}

What's in the boxes? "genome" is in the boxes

Genome Revolution: COMPSCI 004G 2.22

Objects, values, classes

For primitive types: int, char, double, boolean
• Variables have names and are themselves boxes (metaphorically)
• Two int variables assigned 17 are equal with ==

For object types: String, Sequence, others
• Variables have names and are labels for boxes
• If no box is assigned or created, the label is applied to null
• Can assign a label to an existing box (via another label)
• Can create a new box using new

Object types are references or pointers or labels to storage
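The box-and-label picture can be checked in code. In this sketch (class and method names are made up), copying an int copies the value into a second box, while copying an object variable puts a second label on the same box. StringBuilder is used because, unlike String, it can be changed in place, so a change made through one label is visible through the other. A label with no box holds null.

```java
class BoxesAndLabels {
    // Primitive copy: b gets its own box holding a copy of a's value.
    static int[] copyPrimitive() {
        int a = 17;
        int b = a;      // copy the value 17 into a second box
        b = b + 1;      // changing b leaves a untouched
        return new int[] {a, b};
    }

    // Object copy: second is another label on the same box as first.
    static StringBuilder[] copyReference() {
        StringBuilder first = new StringBuilder("gen");  // new creates a box
        StringBuilder second = first;                    // no new: same box
        second.append("ome");                            // change the box via one label
        return new StringBuilder[] {first, second};
    }

    public static void main(String[] args) {
        int[] ints = copyPrimitive();
        System.out.println(ints[0] + " " + ints[1]);     // prints 17 18

        StringBuilder[] labels = copyReference();
        System.out.println(labels[0]);                   // prints genome
        System.out.println(labels[0] == labels[1]);      // prints true: one box, two labels

        StringBuilder none = null;                       // a label on no box
        System.out.println(none == null);                // prints true
    }
}
```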

Genome Revolution: COMPSCI 004G 2.23

Don Knuth (The Art of Computer Programming)

“My feeling is that when we prepare a program, it can be like composing poetry or music; as Andrei Ershov has said, programming can give us both intellectual and emotional satisfaction, because it is a real achievement to master complexity and to establish a system of consistent rules.”

“We have seen that computer programming is an art, because it applies accumulated knowledge to the world, because it requires skill and ingenuity, and especially because it produces objects of beauty.”

Genome Revolution: COMPSCI 004G 2.24

Ada Lovelace, 1815-1852

Daughter of Byron; advocate of the work of Charles Babbage, designer of an early “computer” (the Analytical Engine)

Made Babbage’s work accessible

“It would weave algebraic patterns the way the Jacquard loom weaved patterns in textiles”

Tutored in mathematics by Augustus de Morgan

Marched around the billiard table playing the violin

Ada, a notable programming language, is named after her