
CS 177 Hands-on lab with databases

Quiz #1
Summary: Nucleotide and protein databases
Sequence formats
Lab exercises


Quiz #1

Homework #1

Al-Bawardy, Rasha F. 13
Antonio, Dion 13
Berro, Reem G. 14
Chien, Yu Fung 11
Dharker, Nachiket S. 12
Eunkyung, An 13
Gansberger, Kristen M.
Gupta, Madhur V. 12
Hand, Damon 12
Hua, Dong 13
Karim, Halima R. 11
Kebede, Mikael 11
Koyama, Kaori 9
Kwak, Yoon I. 13
Marwin, Victor M. 5
Mody, Manali 10
Moorjani, Priya G. 14
Qukub, Dunia 12
Ryan, Caitlyn E. 14
Williams, Bernadette 10
Yahan, Lin 12
Yawo, Akrodou 6
Zhou, Leming 14


The International Nucleotide Sequence Database Collaboration

[Figure: the three partner databases, which exchange submissions and updates with one another]

- GenBank (NCBI, NIH): retrieval via Entrez; submissions via Sequin, BankIt, ftp

- EMBL (EBI): retrieval via SRS

- DDBJ (NIG, CIB): retrieval via getentry


Primary vs. Derivative Databases

[Figure: sequencing centers and labs submit sequences to GenBank (primary database); curators and algorithms derive RefSeq, UniGene and genome assemblies from those records (derivative databases)]


The Entrez Databases


The (ever) Expanding Entrez System

Nucleotide

Protein

Structure

PubMed

PopSet

Genome

OMIM

Taxonomy

Books

ProbeSet

3D Domains

UniSTS

SNP

CDD

Entrez

UniGene

Journals

PubMed Central


GenBank

Search and retrieval of sequences

Entrez is a retrieval system for searching several linked databases. It provides access to: PubMed, Nucleotide, Protein, Structure, Genome, PopSet, OMIM, Taxonomy and more.

BLAST® (Basic Local Alignment Search Tool) is a set of similarity search programs designed to explore all of the available sequence databases regardless of whether the query is protein or DNA.
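
An Entrez search can also be expressed as a URL rather than typed into the web form. The sketch below only builds such a URL and does not contact NCBI. It assumes NCBI's E-utilities `esearch` endpoint and its `db`, `term` and `retmax` parameters; the function name `esearch_url` and the default values are illustrative and not part of the slides.

```python
from urllib.parse import urlencode

# Assumed endpoint: NCBI E-utilities "esearch" (not mentioned on the slides).
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(db, term, retmax=20):
    """Build (but do not send) an Entrez search URL.

    db     -- Entrez database name, e.g. "nucleotide" or "protein"
    term   -- the search term exactly as it would be typed into Entrez
    retmax -- maximum number of record identifiers to return
    """
    params = {"db": db, "term": term, "retmax": retmax}
    # urlencode percent-escapes characters such as "*" and turns spaces into "+".
    return EUTILS_ESEARCH + "?" + urlencode(params)


if __name__ == "__main__":
    print(esearch_url("nucleotide", "poliovirus"))
```

Opening the printed URL in a browser should return an XML document with the hit count and a list of record identifiers, provided the endpoint behaves as assumed.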


BLAST selections


GenBank format


FASTA format
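
The example record on this slide did not survive as text. As a rough illustration of the format, the sketch below assumes the usual FASTA convention: each record starts with a header line beginning with ">", followed by one or more lines of sequence. The function name `parse_fasta` and the record identifier in the example are hypothetical.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        elif header is not None:
            # Sequence lines are concatenated without the line breaks.
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Hypothetical record; the identifier is made up for illustration.
example = """>sample_1 hypothetical identifier
CCATGCATATAAGCATGTACATAATATT
ATATTCTTACATAGGACATATTAACTCA
"""

if __name__ == "__main__":
    for name, seq in parse_fasta(example):
        print(name, len(seq))
```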


Sequence formats

ASN.1

DNAStrider

EMBL

Fitch

GCG

GenBank/GB

IG/Stanford

MSF

NBRF

Olsen

PAUP/NEXUS

Pearson/Fasta

Phylip

PIR/CODATA

Plain/Raw

Pretty

Zuker

Convertible in ReadSeq (Web based)

http://bimas.dcrt.nih.gov/molbio/readseq/

or ForCon (stand-alone application)

http://www.hgmp.mrc.ac.uk/embnet.news/vol6_1/ForCon/forcon.html

NOTE:

- FASTA is a popular sequence format

- it is also a sequence similarity and homology search tool (similar to BLAST) used by EMBL-EBI


Lab exercises

1) How many sequences are available in GenBank for Neanderthals? Depends on your search strategy …

2) Go to Entrez nucleotide. Find all sequences for the following terms:

neander: 1

Neanderthals: 0

Neanderthal: 1

neanderthal: 1

neanderthal*: 6

Homo sapiens neanderthalensis: 6

3) Go to Entrez taxonomy. Try to find all sequences for Neanderthals!

6


Lab exercises

4) How many nucleotide sequences are available for the house mouse Mus musculus? Try both Entrez nucleotides and Entrez taxonomy. How do you explain the difference?

Entrez taxonomy

5,403,701

Entrez nucleotides

5,458,506 (Mus musculus)

5,393,552 (house mouse)

5,458,527 (Mus musculus OR house mouse)

5) A man is found murdered in Yellowstone National Park. A few hairs of unidentified origin are recovered from the victim’s clothes. The samples arrive in the lab and DNA is isolated and sequenced:

CCATGCATATAAGCATGTACATAATATTATATTCTTACATAGGACATATTAACTCAATCTCATAATTCAT

Formulate a hypothesis regarding the origin of the recovered hairs and potential links with the killing!

Canis lupus (Gray Wolf)
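
For exercise 4, the three Entrez nucleotides counts can be related to one another by inclusion-exclusion: the OR query returns the union of the two result sets, so the overlap is the sum of the two single-term counts minus the OR count. The sketch below does that arithmetic on the counts from the slide; the function and variable names are illustrative, and the counts reflect the database at the time of the lecture.

```python
def overlap(count_a, count_b, count_a_or_b):
    """Number of records matching both terms, by inclusion-exclusion."""
    return count_a + count_b - count_a_or_b


# Counts taken from the slide (Entrez nucleotides).
mus_musculus = 5_458_506
house_mouse = 5_393_552
either_term = 5_458_527

both = overlap(mus_musculus, house_mouse, either_term)
only_mus_musculus = mus_musculus - both
only_house_mouse = house_mouse - both

if __name__ == "__main__":
    print("both terms:        ", both)               # 5393531
    print("only 'Mus musculus':", only_mus_musculus)  # 64975
    print("only 'house mouse': ", only_house_mouse)   # 21
```

So almost every "house mouse" record is also found under "Mus musculus", while about 65,000 records match only the scientific name. Why none of these counts equals the Entrez taxonomy figure is the question the exercise leaves open.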


The Poliovirus Problem

Science, Vol. 297, 9 August 2002

Cello, J.; Paul, A.V. & Wimmer, E.:

Chemical Synthesis of Poliovirus cDNA: Generation of Infectious Virus in the Absence of Natural Template

- they generated about 7.7 kilobases of single-stranded RNA genome based on the known genetic map

- DNA fragments were synthesized from purified oligonucleotides (average length: 69 bases)

- the cDNA was then transcribed into highly infectious RNA
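
The two figures above give a feel for the scale of the synthesis. The sketch below is a back-of-the-envelope estimate only: it assumes the oligonucleotides tile a single strand end to end with no overlap, which is a simplification and not the authors' actual assembly scheme. The variable names are illustrative.

```python
import math

genome_bases = 7_700  # "about 7.7 kilobases", from the slide
oligo_bases = 69      # average oligonucleotide length, from the slide

# Simplifying assumption: non-overlapping oligos covering one strand only.
min_oligos_one_strand = math.ceil(genome_bases / oligo_bases)

if __name__ == "__main__":
    print(min_oligos_one_strand)  # 112
```

Overlapping oligos and coverage of the complementary strand would push the real number higher, so 112 is a lower bound under these assumptions.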


The Poliovirus Problem

The Washington Post, 17 July 2002

Weiss, R.:

Mail-Order Molecules Brew a Terrorism Debate

- mail-order oligonucleotides can be used to manufacture a deadly virus

- because they are so small, most oligos lack a “fingerprint”

- call for more control and/or institutional oversight


The Poliovirus Problem

- search in GenBank for nucleotide sequences of the poliovirus

- copy about 100 bp from a sequence of your choice and paste it into the search window of blastn. Is the fragment identifiable as poliovirus?

- if so, do a blastn search with a 90 bp, 80 bp, 70 bp … fragment

- what is the length of the shortest fragment still identifiable as poliovirus?

- is this fragment shorter than the average length of 69 bp used to synthesize the poliovirus?

- do these oligos have a “fingerprint” (i.e. can ‘typical’ oligos with lengths of 20-50 bp be assigned to a particular organism)?


Are these oligos so small that they lack a “fingerprint”?
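
The shortening steps of this exercise can be prepared in advance, and a simple probability estimate helps frame the “fingerprint” question. The sketch below is only an aid for the exercise: `shrinking_fragments` produces the 100, 90, 80 … bp fragments to paste into blastn, and `expected_random_hits` estimates how often a fragment of a given length would match a database purely by chance. The estimate assumes uniformly random, independent bases, which real genomes are not, and both function names and the default lengths are illustrative.

```python
def shrinking_fragments(sequence, start_len=100, step=10, min_len=20):
    """Yield prefixes of `sequence` of decreasing length (100, 90, 80, ...)."""
    length = min(start_len, len(sequence))
    while length >= min_len:
        yield sequence[:length]
        length -= step


def expected_random_hits(fragment_len, database_bases):
    """Expected number of exact chance matches for one fragment.

    Assumes every base is drawn uniformly and independently from A, C, G, T,
    so a given fragment matches a given position with probability 4**-n.
    """
    return database_bases / 4 ** fragment_len


if __name__ == "__main__":
    # Hypothetical placeholder sequence; use a real poliovirus record instead.
    placeholder = "ACGT" * 30
    for fragment in shrinking_fragments(placeholder):
        print(len(fragment), fragment[:20] + "...")
```

Under this simplified model a 20-base fragment has 4^20, roughly 1.1 × 10^12, possible sequences, so chance matches only become likely once the database holds a comparable number of bases. Whether blastn actually reports a short fragment as poliovirus also depends on its word size and scoring settings, which the model ignores.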


Homework assignment lecture #4

Explain in your own words and in simple terms the basics of the BLAST tool!

- assignment is due on 6 Oct 2003, 3:30 PM

- send your assignment as e-mail attachment to [email protected]

(type your name and the term “homework” in the subject line)

- maximum size: 500 words
