hum TZF p 1 MSLPPIRLPSPYGSDRLVQLAARLRPALCDTLITVGSQEFPAHSLVLAGVSQQLG----RRGQWALGEGISPSTFAQLLNFVYGESVELQPGELR 91hum pLZF p 1 MDLTKMGMIQLQNPSHPTGLLCKANQMRLAGTLCDVVIMVDSQEFHAHRTVLACTSKMFEILFHRNSQHYTLDFLSPKTFQQILEYAYTATLQAKAEDLD 100mus pLZF p 1 MDLTKMGMIQLQNPSHPTGLLCKANQMRLAGTLCDVVIMVDSQEFHAHRTVLACTSKMFEILFHRNSQHYTLDFLSPKTFQQILEYAYTATLQAKAEDLD 100 M : :: PS RL :LCD :I V SQEF AH VLA S: R Q : :SP TF Q:L : Y ::: : :L
hum TZF p 92 PLQEAARALGVQSLEEACW------RARGD---RAKKPDPG----------------LKKHQEEPEKPSRNPERELGDPGEKQKP--------------- 151hum pLZF p 101 DLLYAAEILEIEYLEEQCLKMLETIQASDDNDTEATMADGGAEEEEDRKARYLKNIFISKHSSEESGYASVAGQSLPGPMVDQSPSVSTSFGLSAMSPTK 200mus pLZF p 101 DLLYAAEILEIEYLEEQCLKILETIQASDDNDTEATMADGGGEEEDDRKARYLKNIFISKHSSEESGYASVAGQSLPGPMVDQSPSVSTSFGLSAMSPTK 200 L AA L :: LEE C :A D A D G : KH E : : L P Q P
hum TZF p 152 EQVSRTGGREQEMLH-KHSPPRG--RPEMAG-----ATQEAQQEQTRSKEKRLQ-AP------VG--------QRGADG-----KHGVLTWLRENPGGSE 223hum pLZF p 201 AAVDSLMTIGQSLLQGTLQPPAGPEEPTLAGGGRHPGVAEVKTEMMQVDEVPSQDSPGAAESSISGGMGDKVEERGKEGPGTPTRSSVITSARELHYGRE 300mus pLZF p 201 AAVDSLMSIGQSLLQGTLQPPAGPEEPTLAGGGRHPGVAEVKMEMMQVDEAPCQDSPGAAESSISGGMGDKFEERSKEGPGTPTRRSVITSARELHYGRE 300 V Q :L: PP G P :AG E : E : E Q :P : :R :G : V:T RE G E
hum TZF p 224 ESLRKLPGPLP----PAGSLQTSVTP--RP--SWAEAP----WLVGGQP-ALWSILLMPPRYGIPFYHST-----PTTGAWQEVWR-----------EQR 294hum pLZF p 301 ESAEQVPPPAEAGQAPTGRPEHPAPPPEKHLGIYSVLPNHKADAVLSMPSSVTSGLHVQPALAVSMDFSTYGGLLPQGFIQRELFSKLGELAVGMKSESR 400mus pLZF p 301 ESGEQLSPPVEAGQGPPGRQEPLAPPVEKHLGIYSVLPNHKADAVLSMPSSVTSGLHVQPALAVSMDFSTYGGLLPQGFIQRELFSKLGELAVGMKAESR 400 ES :: P P G : P : : P V P :: S L : P : ST P :E: E R
hum TZF p 295 ----------IPLSLN--------APKGLWSQ----------N-----Q--LASSSPTPGSLP-QGPAQLSP-GEMEESDQGHTGALAT-----CAG--- 349hum pLZF p 401 TIGEQCSVCGVELPDNEAVEQHRKLHSGMKTYGCELCGKRFLDSLRLRMHLLAHSAGAKAFVCDQCGAQFSKEDALETHRQTHTGTDMAVFCLLCGKRFQ 500mus pLZF p 401 PLGEQCSVCGVELPDNEAVEQHRKLHSGMKTYGCELCGKRFLDSLRLRMHLLAHSAGAKAFVCDQCGAQFSKEDALETHRQTHTGTDMAVFCLLCGKRFQ 500 : L N G: : LA S: : : Q AQ S :E Q HTG: : C
hum TZF p 350 --------HEDKAG--------CP---P---------RPHPPPAPPARS------R----------------PYACSVCGKRFSLKHQMETHYRVHTGEK 399hum pLZF p 501 AQSALQQHMEVHAGVRSYICSECNRTFPSHTALKRHLRSHTGDHPYECEFCGSCFRDESTLKSHKRIHTGEKPYECNGCDKKFSLKHQLETHYRVHTGEK 600mus pLZF p 501 AQSALQQHMEVHAGVRSYICSECNRTFPSHTALKRHLRSHTGDHPYECEFCGSCFRDESTLKSHKRIHTGEKPYECNGCGKKFSLKHQLETHYRVHTGEK 600 E :AG C P R H P R PY C C K:FSLKHQ:ETHYRVHTGEK
hum TZF p 400 PFSCSLCPQRSRDFSAMTKHLRTH-GAAPYRCSLCGAGCPSLASMQAHMRGHSPSQLPPGWTIRSTFLYSSSRPSRPSTSPCCPSSSTT 487hum pLZF p 601 PFECKLCHQRSRDYSAMIKHLRTHNGASPYQCTICTEYCPSLSSMQKHMKGHKPEEIPPDWRIEKTYLYLCY-V 673mus pLZF p 601 PFECKLCHQRSRDYSAMIKHLRTHNGASPYQCTICTEYCPSLSSMQKHMKGHKPEEIPPDWRIEKTYLYLCYV 673 PF C LC QRSRD:SAM KHLRTH GA:PY:C::C CPSL:SMQ HM:GH P ::PP W I T:LY :
Hs.99430 Homo sapiens
EXPRESSION INFORMATION
cDNA sources: Blood, Ovary, Testis
EST SEQUENCES (8)
AI150041 cDNA clone IMAGE:1751830 Testis 3' read 1.1 kb
AA927876 cDNA clone IMAGE:1541369 3' read 1.1 kb
AI223414 cDNA clone IMAGE:1838461 Testis 3' read 1.0 kb
AI150330 cDNA clone IMAGE:1751988 Testis 3' read 0.6 kb
AA868505 cDNA clone IMAGE:1408687 Testis 3' read
AA476210 cDNA clone IMAGE:771312 Ovary 3' read
AA456628 cDNA clone IMAGE:809583 Ovary 3' read
AI361709 cDNA clone IMAGE:2021901 Blood 3' read
Northern Blotting
LOCUS       AF130255 1960 bp mRNA PRI 22-FEB-1999
DEFINITION  Homo sapiens testis zinc finger protein (TZFP) mRNA, complete cds.
ACCESSION   AF130255
KEYWORDS    .
SOURCE      human.
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Vertebrata; Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1 (bases 1 to 1960)
  AUTHORS   Tang,Tang K., Lai,Chun-Hung, Tang,Chieh-Ju C., Huang,Chang-Jen and Lin,Wen-chang.
  TITLE     Identification and gene structure of a novel human PLZF related transcription factor gene, TZFP
  JOURNAL   Unpublished
REFERENCE   2 (bases 1 to 1960)
  AUTHORS   Tang,T. K., Tang,C.-J. C. and Lin,W.-c.
  TITLE     Direct Submission
  JOURNAL   Submitted (22-FEB-1999) Institute of Biomedical Sciences, Academia Sinica, No. 128, Sec. 2, Academia Road, Taipei, Taiwan 11529, TAIWAN
dbEST Id: 1659486
IDENTIFIERS
EST name: om18b09.s1
GenBank Acc: AA927876
GenBank gi: 3076620
CLONE INFO
Clone Id: IMAGE:1541369 (3')
Source: NCI
Insert length: 1074
DNA type: cDNA
PRIMERS
Sequencing: -40m13 fwd. ET from Amersham
SEQUENCE
TTTGACGGGAGGGCACAGGAAACTCTTTATTATGGTGATGAGATCGACAATCTCCCCTAC
TGTTAACCTTCGCTCCTGCACACTTCAGTGTCCTCACTCTGTAGGGCTCGCTGGCCTGGG
CTTCTGCGACCCGCGATCGTCCAGGAGAGGGCACTCGGCGCCCTTCCTGGGGTNNTCTGG
GGCGGAATTTGCTAGGCCGCCGTAGCAGCTGTGCCAGGTCAGAAGCCGAGCCGGNCCGCT
TTTCGTTCTTTAATTGGACTCTTGGCTAAGACGCTACCGACACCCCGTCAGTGGTGGAGG
AAGAAGGACAACAGGGAGAGGTCGAGG
Quality: High quality sequence stops at base: 318
Entry Created: Apr 17 1998
Last Updated: Jun 10 1998
COMMENTS
This clone is available royalty-free through LLNL; contact the IMAGE Consortium ([email protected]) for further information.
LIBRARY
dbEST lib id: 1042
Lib Name: Soares_NFL_T_GBC_S1
Organism: Homo sapiens
Organ: pooled
Lab host: DH10B
Vector: pT7T3D-Pac (Pharmacia) with a modified polylinker
R. Site 1: Not I
R. Site 2: Eco RI
Description: Equal amounts of plasmid DNA from three normalized libraries (fetal lung NbHL19W, testis NHT, and B-cell NCI_CGAP_GCB1) were mixed, and ss circles were made in vitro. Following HAP purification, this DNA was used as tracer in a subtractive hybridization reaction. The driver was PCR-amplified cDNAs from pools of 5,000 clones made from the same 3 libraries. The pools consisted of I.M.A.G.E. clones 297480-302087, 682632-687239, 726408-728711, and 729096-731399. Subtraction by Bento Soares and M. Fatima Bonaldo.
Human cDNA Library Details:
470 different libraries so far, covering more than 40 tissues
Stomach
202. NCI_CGAP_Gas1 gastric tumor
203. NCI_CGAP_Gas4 gastric tumor
Testis
204. Barstead HPL-RB5 testis
205. Soares testis NHT
206. Life Tech. testis (10426-013)
Thymus
207. NCI_CGAP_Thym1 thymoma
Thyroid
208. NCI_CGAP_Thy1 invasive thyroid tumor
Uterus
209. NCI_CGAP_Ut1 uterine tumor
210. NCI_CGAP_Ut2 uterine tumor
211. NCI_CGAP_Ut3 uterine tumor
212. NCI_CGAP_Ut4 uterine tumor
213. Soares pregnant uterus NbHPU
Q & A
CGAP
Why CGAP? In the last two decades we have learned that genetic changes lie at the root of all cancers. In response, the Cancer Genome Anatomy Project (CGAP) will unite the newest technologies, along with those both cost-effective and capable of high-throughput, to identify all the genes responsible for the establishment and growth of cancer.
Project Goals To achieve a comprehensive molecular characterization of normal, precancerous, and malignant cells.
Normal Cells
Cancer Cells
Comparing the fingerprints of a normal versus a cancer cell will highlight genes that by their suspicious absence or presence (such as Gene H ) deserve further scientific scrutiny to determine whether such suspects play a role in cancer, or can be exploited in a test for early detection.
Identifying the genetic differences among normal cells, precancerous cells, and cancer cells will contribute to our understanding of cancer as it:
fosters the discovery of genes that directly cause cancer;
provides us with a way to identify early precancerous cells and thus enhances our methods for early detection;
improves our ability to match patients with appropriate treatment.
Time line
Malignant Tumor
Pre-cancer
The research results displayed in this graph demonstrate that for patients suffering from the cancer neuroblastoma, the presence or absence of a specific set of genes found on Chromosome 1 strongly correlates with patient outcome. Therefore, in the future this characteristic of the tumor can be used to identify those patients that would benefit from more aggressive treatment, and those best served by the current treatment protocol.
Sequencing of Expressed Sequence Tags (ESTs)
Serial Analysis of Gene Expression
Differential Display Approaches
Hybridization Analysis
Digital Differential Display
The foundation of DDD is UniGene. UniGene employs a conservative method to assign all the human EST sequences that meet minimal standards of quality to distinct "clusters", each representing a unique human expressed gene. DDD takes advantage of UniGene by comparing the number of times sequences from different libraries were assigned to a particular UniGene cluster. This has the advantage that DDD will only report on sequences that we have confidence represent bona fide human expressed genes. There will of course be many differences in the number of sequences contained in each library that are assigned to a particular UniGene cluster, but only some of these differences are likely to reflect biological reality. Therefore DDD employs a statistical method of comparison, the Fisher Exact Test, to identify only those differences that are likely to be real. One important factor in determining statistical relevance is the absolute number of sequences in each library that have been successfully assigned to a UniGene cluster. In many cases there are not enough sequences in dbEST libraries to meet the threshold of significance employed in the Fisher Exact Test. Since DDD will only yield a report if there are differences that exceed this threshold, it is expected that many comparisons will yield nothing.
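The comparison described above can be sketched as a two-sided Fisher exact test on a 2x2 table of EST counts (in the cluster / not in the cluster, for each of two libraries). This is only an illustration: the function name, the example counts, and the 0.05 cutoff are assumptions, not values taken from DDD itself.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    a, b: library A ESTs inside / outside the UniGene cluster
    c, d: library B ESTs inside / outside the UniGene cluster
    """
    row1, row2 = a + b, c + d      # library sizes
    col1 = a + c                   # total ESTs assigned to the cluster
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability that library A holds x of the cluster's ESTs
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over every table that is no more probable than the observed one
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-9))

ALPHA = 0.05  # hypothetical significance threshold, for illustration only

# Small libraries: a 2-versus-0 difference cannot reach significance
print(fisher_exact_two_sided(2, 98, 0, 100) < ALPHA)
# Larger libraries: 20 versus 2 hits out of 1,000 ESTs each
print(fisher_exact_two_sided(20, 980, 2, 998) < ALPHA)
```

The first call shows why many comparisons report nothing: with only 100 sequences per library, even a cluster seen exclusively in one library stays above the cutoff.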
the fraction of sequences within the pool
visual aid that reflects the numerical values
statistically significant pairwise comparison
THREE PRINCIPLES UNDERLIE THE SAGE TECHNOLOGY:
One short oligonucleotide sequence from a defined location within a transcript ("tag") allows accurate quantitation.
Tag size (10-14bp) is optimal for high throughput while maintaining accurate gene identification and quantitation.
The combined power of serial and parallel processing increases data throughput by orders of magnitude when compared to conventional approaches.
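As a minimal sketch of the first two principles, the snippet below pulls one short tag from a defined position in each transcript and counts tags. It assumes the defined location is the 3'-most CATG site and a 10 bp tag; both choices, and the function names, are illustrative assumptions rather than details given on the slide.

```python
from collections import Counter

def sage_tag(transcript, anchor="CATG", tag_len=10):
    """Return the tag that follows the 3'-most anchor site, or None."""
    seq = transcript.upper()
    pos = seq.rfind(anchor)            # last (3'-most) occurrence of the anchor
    if pos == -1:
        return None
    start = pos + len(anchor)
    tag = seq[start:start + tag_len]
    return tag if len(tag) == tag_len else None   # skip truncated tags

def count_tags(transcripts):
    """Tally tags across transcripts; the counts estimate expression levels."""
    return Counter(t for t in map(sage_tag, transcripts) if t is not None)

example = "AAACATGTTTTCATGACGTACGTACGGG"
print(sage_tag(example))          # tag taken after the second CATG
print(4 ** 10)                    # number of distinct 10 bp tags
```

With 10 bp there are 4^10 (about one million) possible tags, which is why such a short sequence can still identify most transcripts.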
Ortholog: Homologous genes that have diverged from each other after speciation events (e.g., human beta- and chimp beta-globin)
Paralog: Homologous genes that have diverged from each other after gene duplication events (e.g., human beta- and gamma-globin)
Xenolog: Homologous genes that have diverged from each other after lateral gene transfer events (e.g., antibiotic resistance genes in bacteria)
Homolog: Genes that are descended from a common ancestor (e.g., all globins)
Dec. 11, 1998:
C. elegans: Sequence to Biology
-Jonathan Hodgkin, H. Robert Horvitz, Barbara R. Jasny, Judith Kimble*
This special issue of Science celebrates a landmark in biology: determination of the essentially complete DNA sequence of an animal genome. The animal is a small invertebrate, the nematode (or roundworm) Caenorhabditis elegans, and the sequence consists of about 97 million base pairs of DNA, approximately one-thirtieth the number in the human genome. Nonetheless, the information content is enormous--eight times that of the budding yeast Saccharomyces cerevisiae, the only other eukaryote with a sequenced genome.
Genome sequence of the nematode C. elegans: A platform for investigating biology
The C. elegans Sequencing Consortium
97 Mb
257 YACs (20% only in YAC)
2527 cosmids
113 fosmids
44 PCR
19,099 predicted genes
18,891 proteins here (16,260 reviewed)
EST: 67,815 EST from 40,379 clones
7432 genes
A multicellular organism genome
Genefinder program:** transplicing**
40% of predicted genes have EST matches
16,260/19,099 genes have been interactively reviewed.
Average of one gene per 5 kb.
Average of five introns per gene.
27% of genome resides in exons.
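The summary figures above follow from simple arithmetic on the numbers quoted in these slides (97 Mb genome, 19,099 predicted genes, 16,260 reviewed, 27% exonic). The short check below is a sketch; the variable names are illustrative, and the exon-per-gene figure is a derived estimate, not a number from the source.

```python
genome_bp = 97_000_000       # about 97 million base pairs
predicted_genes = 19_099
reviewed_genes = 16_260
exon_fraction = 0.27         # share of the genome residing in exons

bp_per_gene = genome_bp / predicted_genes
reviewed_fraction = reviewed_genes / predicted_genes
exon_bp_per_gene = exon_fraction * genome_bp / predicted_genes  # derived estimate

print(f"one gene per {bp_per_gene / 1000:.1f} kb")
print(f"{reviewed_fraction:.0%} of predicted genes reviewed")
print(f"about {exon_bp_per_gene:.0f} bp of exon sequence per gene")
```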
Pfam protein family search:
Intracellular communication
Transcriptional regulation
Table 1. The 20 most common protein domains in C. elegans (41). RRM, RNA recognition motif; RBD, RNA binding domain; RNP, ribonuclear protein motif; UDP, uridine 5'-diphosphate.
-------------------------------------------------------------------
Number  Description
-------------------------------------------------------------------
650     7 TM chemoreceptor
410     Eukaryotic protein kinase domain
240     Zinc finger, C4 type (two domains)
170     Collagen
140     7 TM receptor (rhodopsin family)
130     Zinc finger, C2H2 type
120     Lectin C-type domain short and long forms
100     RNA recognition motif (RRM, RBD, or RNP domain)
 90     Zinc finger, C3HC4 type (RING finger)
 90     Protein-tyrosine phosphatase
 90     Ankyrin repeat
 90     WD domain, G-beta repeats
 80     Homeobox domain
 80     Neurotransmitter-gated ion channel
 80     Cytochrome P450
 80     Helicases conserved C-terminal domain
 80     Alcohol/other dehydrogenases, short-chain type
 70     UDP-glucoronosyl and UDP-glucosyl transferases
 70     EGF-like domain
 70     Immunoglobulin superfamily
-------------------------------------------------------------------
Worming secrets from the C. elegans genome: Dec 11, 1998. Science
Washington University Genome Sequencing Center.
Sanger Centre
8-year effort: Sydney Brenner started it all.
By 1992, they were doing a million bases per year.
~$200 M
High-throughput sequencing.
Human genome project.
“We will be doing a lot of jumping back and forth between species” - F. Collins
Ping-Pong homology search
In silico cloning: In order to perform an electronic cDNA library screen, the EST sequences retrieved in this way can be used as queries in a BLASTN search of dbEST to identify overlapping ESTs. This procedure can be reiterated with the newly identified ESTs until no additional hits are found. The ESTs isolated can be assembled into sequence contigs using computer software.
[Figure: ESTs labeled EST 1, EST 2, and EST 3]
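The iterate-until-no-new-hits procedure can be sketched as follows. This toy version replaces the BLASTN search of dbEST with an exact suffix/prefix overlap test on an in-memory list, and uses a greedy merge in place of real assembly software; every function name and the 8-base minimum overlap are assumptions made for illustration. Because real ESTs contain sequencing errors, an actual screen needs tolerant alignment rather than exact matching.

```python
def overlap(a, b, min_len=8):
    """Length of the longest suffix of a equal to a prefix of b (0 if shorter than min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def electronic_screen(seed, db, min_len=8):
    """Collect every EST reachable from the seed through chained overlaps."""
    found, frontier = {seed}, [seed]
    while frontier:                      # stop when a round yields no new hits
        query = frontier.pop()
        for est in db:
            if est not in found and (overlap(query, est, min_len)
                                     or overlap(est, query, min_len)):
                found.add(est)
                frontier.append(est)     # newly found ESTs become queries
    return found

def assemble(ests, min_len=8):
    """Greedily merge the best-overlapping pair until no overlaps remain."""
    contigs = list(ests)
    while True:
        best = (0, None, None)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            return contigs
        merged = contigs[i] + contigs[j][n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]

# Toy data: three overlapping fragments of one made-up transcript plus a decoy
transcript = "ATGGCGTACGTTAGCCTAGGATCCGATTACAGGCTTAACGGTCA"
est1, est2, est3 = transcript[0:20], transcript[12:32], transcript[24:44]
db = [est1, est2, est3, "TTTTTTTTTTTTTTTTTTTT"]
hits = electronic_screen(est2, db)
print(assemble(hits) == [transcript])
```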
There are many sequencing related errors in the dbEST.
Query= (597 letters)
Sequences producing significant alignments:   (bits)  Value
lcl|THC200240   224  4e-58
lcl|THC151579   181  3e-45
lcl|AA099787    127  8e-29
lcl|THC200240 Length = 764 Score = 224 bits (565), Expect = 4e-58 Identities = 106/187 (56%), Positives = 136/187 (72%)
Query: 248 SGMKKNKYGNIEDLVVHLNFVCPKGIIQKQCQVPRMSSGPDIHQIILGSEGTLGVVSEVT 307 SGMKKN YGNIEDLVVH+ V P+GII+K CQ PRMS+GPDIH I+GSEGTLGV++E TSbjct: 3 SGMKKNIYGNIEDLVVHIKXVTPRGIIEKSCQGPRMSTGPDIHHFIMGSEGTLGVITEAT 182
lcl|THC151579 Length = 698 Score = 181 bits (455), Expect = 3e-45 Identities = 81/142 (57%), Positives = 106/142 (74%)
Query: 446 LGMNHGVLGESFETSVPWDKVLSLCRNVKELMKREAKAQGVTHPVLANCRVTQVYDAGAC 505 L + + VLGESFETS PWD+V+ LCRNVKE + RE K +GV + CRVTQ YDAGACSbjct: 41 LALEYXVLGESFETSAPWDRVVDLCRNVKERITRECKEKGVQFAPFSTCRVTQTYDAGAC 220
sp|O00116|ADAS_HUMAN ALKYLDIHYDROXYACETONEPHOSPHATE SYNTHASE PRECURSOR (ALKYL-DHAP SYNTHASE) (ALKYLGLYCERONE-PHOSPHATE SYNTHASE) Length = 658 Score = 124 bits (309), Expect = 5e-29 Identities = 59/60 (98%), Positives = 59/60 (98%) 248Query: 1 SGMKKNIYGNIEDLVVHIKXVTPRGIIEKSCQGPRMSTGPDIHHFIMGSEGTLGVITEAT 60 SGMKKNIYGNIEDLVVHIK VTPRGIIEKSCQGPRMSTGPDIHHFIMGSEGTLGVITEATSbjct: 319 SGMKKNIYGNIEDLVVHIKMVTPRGIIEKSCQGPRMSTGPDIHHFIMGSEGTLGVITEAT 378
THC200240
sp|O00116|ADAS_HUMAN ALKYLDIHYDROXYACETONEPHOSPHATE SYNTHASE PRECURSOR (ALKYL-DHAP SYNTHASE) (ALKYLGLYCERONE-PHOSPHATE SYNTHASE) Length = 658 Score = 127 bits (315), Expect = 1e-29 Identities = 59/60 (98%), Positives = 59/60 (98%) 446Query: 1 LALEYXVLGESFETSAPWDRVVDLCRNVKERITRECKEKGVQFAPFSTCRVTQTYDAGAC 60 LALEY VLGESFETSAPWDRVVDLCRNVKERITRECKEKGVQFAPFSTCRVTQTYDAGACSbjct: 517 LALEYYVLGESFETSAPWDRVVDLCRNVKERITRECKEKGVQFAPFSTCRVTQTYDAGAC 576
THC151579
Offset between the two hits in the query: 446 - 248 = 198
Offset between the two hits in ADAS_HUMAN: 517 - 319 = 198
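The two subtractions above amount to a collinearity check: the two THC hits start 198 residues apart in the 597-letter query, and their matches start 198 residues apart in ADAS_HUMAN, which is consistent with both THCs being pieces of the same gene. A minimal sketch of that check, using only the coordinates shown in the BLAST output (the function and variable names are illustrative assumptions):

```python
def collinear(hit_a, hit_b):
    """Each hit is (start in query, start in reference protein).

    Two hits are collinear when they are the same distance apart
    in the query as in the reference.
    """
    return hit_b[0] - hit_a[0] == hit_b[1] - hit_a[1]

# (query start, ADAS_HUMAN start) taken from the alignments above
thc200240 = (248, 319)
thc151579 = (446, 517)

print(thc151579[0] - thc200240[0])       # spacing in the query
print(thc151579[1] - thc200240[1])       # spacing in ADAS_HUMAN
print(collinear(thc200240, thc151579))
```

Exact equality is a strict criterion; when gaps are present between the two hits, a tolerance of a few residues would be a reasonable relaxation.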
[THC195737---------------------------------------------
MTRHGKNSTAASVYTYHERRRDAKASGYGTLHARLGADSIKEFHCCSLTLQPCRNPVISPTGYI
F
--------]
DREAILENILAQKKAYAKKLKEYEKQVAEESAAAKIAEGQAETFTKRTQFSAIESTPSRTGAVA
T
[THC195737--------------------
PRPEVGSLKRQGGVMSTEIAAKVKAHGEEGVMSNMKGDKSTSLPSFWIPELNPTAVATKLEKPS
S
----------------------------------------------------]
KVLCPVSGKPIKLKELLEVKFTPMPGTETAAHRKFLCPVTRDELTNTTRCAYLKKSKSVVKYDV
V
[THC195737----------------------]
EKLIKGDGIDPINGEPMSEDDIIELQRGGTGYSATNETKAKLIRPQLELQ*
U58746
Translation of 1 MTRHGKNCTAGAVYTYHEKKKDTAASGYGTQNIRLSRDAVKDFDCCCLSLQPCHD 55U58746 1 MTRHGKNSTAASVYTYHERRRDAKASGYGTLHARLGADSIKEFHCCSLTLQPCRN 55 *******.** .******...*. ****** . ** *..*.* **.*.****.
Translation of 56 PVVTPDGYLYEREAILEYILHQKKEIARQMKAYEKQRGTRREEQKELQRAASQDH 110U58746 56 PVISPTGYIFDREAILENILAQKKAYAKKLKEYEKQVAEESAAAKIAEGQAETFT 110 **..* **...****** ** *** *...* **** * . *
Translation of 111 VRGFLEKESAIVSRPLNPFTAKALSGTSPD-----------DVQPGPSVGPPSKD 154U58746 111 KRTQFSAIESTPSRTGAVATPRPEVGSLKRQGGVMSTEIAAKVKAHGEEGVMSNM 165 * . ** * . *. *. * *
Translation of 155 K-DK--VLPSFWIPSLTPEAKATKLEKPSRTVTCPMSGKPLRMSDLTPVHFTPLD 206U58746 166 KGDKSTSLPSFWIPELNPTAVATKLEKPSSKVLCPVSGKPIKLKELLEVKFTPMP 220 * ** ******* *.* * ******** * **.****... .* *.***.
Translation of 207 SSVDRVGLITRSER-YVCAVTRDSLSNATPCAVLRPSGAVVTLECVEKLIRKDMV 260U58746 221 ------GTETAAHRKFLCPVTRDELTNTTRCAYLKKSKSVVKYDVVEKLIKGDGI 269 * * . * ..* **** *.*.* ** *. * .** . *****. * .
Translation of 261 DPVTGDKLTDRDIIVLQRGGTGFAGSGVKLQAEKSRPVMQA 301U58746 270 DPINGEPMSEDDIIELQRGGTGYSAT-NETKAKLIRPQLELQ 310 **..*. ... *** *******.. . .* ** ..
(44%/59%)
[THC171302--MVFGENQDLIRTHFQKEADKVRAMKTNWGLFTRTRMIAQSDYDFIVTYQQAENEAERSTVLSVFKEK-------------------------------------------------------------------AVYAFVHLMSQISKDDYVRYTLTLIDDMLREDVTRTIIFEDVAVLLKRSPFSFFMGLLHRQDQYIVH-------------------------------------------------------------------ITFSILTKMAVFGNIKLSGDELDYCMGSLKEAMNRGTNNDYIVTAVRCMQTLFRFDPYRVSFVNING-------------------------------------------------------------------YDSLTHALYSTRKCGFQIQYQIIFCMWLLTFNGHAAEVALSGNLIQTISGILGNCQKEKVIRIVVST-----------------] [THC177150--------------------------------------------LRNLITSNQDVYMKKQAALQMIQNRIPTKLDHLENRKFTDVDLVEDMVYLQTELKKVVQVLTSFDEY-------------------------------------------------------------------ENELRQGSLHWSPAHKCEVFWNENAHRLNDNRQELLKLLVAMLEKSNDPLVLCVAAHDIGEFVRYYP------------------------------------------------]RGKLKVEQLGGKEAMMRLLTVKDPNVRYHALLAAQKLMINNWKDLGLEI
U50199
gi|2895578 (AF041338) vacuolar proton pump subunit SFD alpha is...  927  0.0
gi|2895576 (AF041337) vacuolar proton pump subunit SFD beta iso...  885  0.0
gi|1213557 (U50199) coded for by C. elegans cDNA yk89e9.5; code...  468  e-131
gi|1086810 (U41109) similar to S. cerevisiae vacular H(+)-ATPas...  335  5e-91
gnl|PID|e351278 (Z99532) hypothetical protein [Schizosaccharomy...  185  5e-46
sp|P41807|VM13_YEAST VACUOLAR ATP SYNTHASE 54 KD SUBUNIT (V-ATP...  123  2e-27
gi|1213557 (U50199) coded for by C. elegans cDNA yk89e9.5; coded for by C. elegans cDNA cm7g5; coded for by C. elegans cDNA cm14b9; coded for by C. elegans cDNA yk52g5.5; coded for by C. elegans cDNA yk76e5.5; coded for by C. elegans cDNA yk131f11.5; c... Length = 470 Score = 468 bits (1192), Expect = e-131 Identities = 243/477 (50%), Positives = 314/477 (64%), Gaps = 20/477 (4%)
Human gene: 483 aa
gi|2895578 (AF041338) vacuolar proton pump subunit SFD alpha isoform [Bos taurus] Length = 483 Score = 927 bits (2369), Expect = 0.0 Identities = 460/483 (95%), Positives = 465/483 (96%)
Query: 1 MTKMDIRGAVDAAVPTNIIAAKAAEVRANKVNWQSYLQGQMISAEDCEFIQRFEMKRSPE 60 MTKMDIRGAVDAAVPTNIIAAKAAEVRANKVNWQSYLQGQMIS+EDCEFIQRFEMKRSPESbjct: 1 MTKMDIRGAVDAAVPTNIIAAKAAEVRANKVNWQSYLQGQMISSEDCEFIQRFEMKRSPE 60
Query: 61 EKQEMLQTEGSQCAKTFINLMTHICKEQTVQYILTMVDDMLQENHQRVSIFFDYARCSKN 120 EKQEMLQTEGSQ AKTFINLMTHI KEQTVQYILT+VDD LQENHQRVSIFFDYA+ SKNSbjct: 61 EKQEMLQTEGSQRAKTFINLMTHISKEQTVQYILTLVDDTLQENHQRVSIFFDYAKRSKN 120
Query: 121 TAWPYFLPILNRQDPFTVHMAARIIAKLAAWGKELMEGSDLNYYFNWIKTQLSSQKLRGS 180 TAW YFLP+LNRQD FTVHM ARIIAKLAAWGKELMEGSDLNYYFNWIKTQLSSQKLRGSSbjct: 121 TAWSYFLPMLNRQDLFTVHMTARIIAKLAAWGKELMEGSDLNYYFNWIKTQLSSQKLRGS 180
Query: 181 GVAVETGTVSSSDSSQYVQCVAGCLQLMLRVNEYRFAWVEADGVNCIMGVLSNKCGFQLQ 240 GV ETGTVSSSDSSQYVQCVAGCLQLMLRVNEYRFAWVEADGVNCIMGVLSNKCGFQLQSbjct: 181 GVTAETGTVSSSDSSQYVQCVAGCLQLMLRVNEYRFAWVEADGVNCIMGVLSNKCGFQLQ 240
Query: 241 YQMIFSIWLLAFSPQMCEHLRRYNIIPVLSDILQESVKEKVTRIILAAFRNFLEKSTERE 300 YQMIFS+WLLAFSPQMCEHLRRYNIIPVLSDILQESVKEKVTRIILAAFRNFLEKS ERESbjct: 241 YQMIFSVWLLAFSPQMCEHLRRYNIIPVLSDILQESVKEKVTRIILAAFRNFLEKSVERE 300
Query: 301 TRQEYALAMIQCKVLKQLENLEQQKYDDEDISEDIKFLLEKLGESVQDLSSFDEYSSELK 360 TRQEYALAMIQCKVLKQLENLEQQKYDDEDISEDIKFLLEKLGESVQDLSSFDEYSSELKSbjct: 301 TRQEYALAMIQCKVLKQLENLEQQKYDDEDISEDIKFLLEKLGESVQDLSSFDEYSSELK 360
Query: 361 SGRLEWSPVHKSEKFWRENAVRLNEKNYELLKILTKLLEVSDDPQXLAVAAHDVGXYVRX 420 SGRLEWSPVHKSEKFWREN RLNEKNYELLKILTKLLEVSDDPQ LAVAAHDVG YVR Sbjct: 361 SGRLEWSPVHKSEKFWRENPARLNEKNYELLKILTKLLEVSDDPQVLAVAAHDVGEYVRH 420
Query: 421 YPRGKRVIEQXGGKQLVMNHMHHEXQQVRYNALLAVQKLMVHNWEYLGKQLQSEQPQTXA 480 YPRGKRVIEQ GGKQLVMNHMHHE QQVRYNALLAVQKLMVHNWEYLGKQLQSEQPQT ASbjct: 421 YPRGKRVIEQLGGKQLVMNHMHHEDQQVRYNALLAVQKLMVHNWEYLGKQLQSEQPQTAA 480
Query: 481 ARS 483 ARSSbjct: 481 ARS 483
[AA134689-----------------------------------------------MSLNGFGEHTRSASHAGSWYNANQRDLDRQLTKWLDNAGPRIGTARALISPHAGYSYCGETAAYAF--------------------------]KQVVSSAVERVFILGPSHVVALNGCAITTCSKYRTPLGDLIVDHKINEELRATRHFDLMDRRDEES [THC196496-------------------------------------EHSIEMQLPFIAKVMGSKRYTIVPVLVGSLPGSRQQTYGNIFAHYMEDPRNLFVISSDFCHWGERF------------------------------------------------------------------SFSPYDRHSSIPIYEQITNMDKQGMSAIETLNPAAFNDYLKKTQNTICGRNPILIMLQAAEHFRIS-----------------------------------]NNHTHEFRFLHYTQSNKVRSSVDSSVSYASGVLFVHPN
U64857
Translation of 1 MSNR---VVCREASHAGSWYTASGPQLNAQLEGWLSQVQSTKRPARAIIAPHAGY 52U64857 1 MSLNGFGEHTRSASHAGSWYNANQRDLDRQLTKWLDNAGPRIGTARALISPHAGY 55 ** .* ********.* * ** ** . ***.*.*****
Translation of 53 TYCGSCAAHAYKQVDPSITRRIFILGPSHHVPLSRCALSSVDIYRTPLYDLRIDQ 107U64857 56 SYCGETAAYAFKQVVSSAVERVFILGPSHVVALNGCAITTCSKYRTPLGDLIVDH 110 .*** .** *.*** * *.******* * * **... ***** ** .*.
Translation of 108 KIYGELWKTGMFERMSLQTDEDEHSIEMHLPYTAKAMESHKDEFTIIPVLVGALS 162U64857 111 KINEELRATRHFDLMDRRDEESEHSIEMQLPFIAKVMGSKR--YTIVPVLVGSLP 163 ** ** * *. * . .* ******.**. ** * *.. .**.*****.*
Translation of 163 ESKEQEFGKLFSKYLADPSNLFVVSSDFCHWGQRFRYSYYD-ESQGEIYRSIEHL 216U64857 164 GSRQQTYGNIFAHYMEDPRNLFVISSDFCHWGERFSFSPYDRHSSIPIYEQITNM 218 *..* .* .*..*. ** ****.********.** .* ** * ** * ..
Translation of 217 DKMGMSIIEQLDPVSFSNYLKKYHNTICGRHPIGVLLNAITELQK-NGMNMSFSF 270U64857 219 DKQGMSAIETLNPAAFNDYLKKTQNTICGRNPILIMLQAAEHFRISNNHTHEFRF 273 ** *** ** * * .* **** .******.** ..*.* . *. . * *
Translation of 271 LNYAQSSQCRNWQDSSVSYAAGALTVH 297U64857 274 LHYTQSNKVRSSVDSSVSYASGVLFVHPN 302 *.*.** . * *******.* * **
gi|1465834 (U64857) No definition line found [Caenorhabditis el...  300  1e-80
sp|Q10212|YAY4_SCHPO HYPOTHETICAL 34.8 KD PROTEIN C4H3.04C IN C...  215  3e-55
sp|P47085|YJX8_YEAST HYPOTHETICAL 38.5 KD PROTEIN IN SUI2-TDH2 ...  195  3e-49
gi|2425141 (AF020286) similar to C. elegans CEESS08F encoded by...  155  4e-37
gnl|PID|d1031681 (AP000006) 294aa long hypothetical protein [Py...  87  1e-16
gi|2983422 (AE000712) hypothetical protein [Aquifex aeolicus]  85  7e-16
gi|2621080 (AE000796) conserved protein [Methanobacterium therm...  79  4e-14
gnl|PID|e283857 (Y08257) orf c05005 [Sulfolobus solfataricus]  78  9e-14
sp|Q57846|Y403_METJA HYPOTHETICAL PROTEIN MJ0403 >gi|2129073|pi...  77  2e-13
gi|2983762 (AE000735) hypothetical protein [Aquifex aeolicus]  68  1e-10
gi|1465834 (U64857) No definition line found [Caenorhabditis elegans] Length = 302 Score = 300 bits (759), Expect = 1e-80 Identities = 153/292 (52%), Positives = 198/292 (67%), Gaps = 4/292 (1%)
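When many such hits need to be compared, the summary line can be read programmatically. The sketch below parses the "Identities" and "Positives" fields from a line in the format shown above and recomputes the fractions; the function name and the returned dictionary layout are illustrative assumptions, and the pattern only covers this particular BLAST text format.

```python
import re

STATS = re.compile(
    r"Identities = (\d+)/(\d+) \(\d+%\), Positives = (\d+)/(\d+) \(\d+%\)"
)

def parse_blast_stats(line):
    """Extract identity and positive counts from a BLAST alignment summary."""
    m = STATS.search(line)
    if m is None:
        return None
    ident, ident_len, pos, pos_len = map(int, m.groups())
    return {
        "identities": (ident, ident_len),
        "positives": (pos, pos_len),
        "identity_fraction": ident / ident_len,
        "positive_fraction": pos / pos_len,
    }

line = ("Length = 302 Score = 300 bits (759), Expect = 1e-80 "
        "Identities = 153/292 (52%), Positives = 198/292 (67%), Gaps = 4/292 (1%)")
stats = parse_blast_stats(line)
print(stats["identities"], round(stats["identity_fraction"], 3))
```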
Query: 8 REASHAGSWYTASGPQLNAQLEGWLSQVQSTKRPARAIIAPHAGYTYCGSCAAHAYKQVD 67 R ASHAGSWY A+ L+ QL WL ARA+I+PHAGY+YCG AA+A+KQV Sbjct: 11 RSASHAGSWYNANQRDLDRQLTKWLDNAGPRIGTARALISPHAGYSYCGETAAYAFKQVV 70
BLASTP (Jan. 10, 1999)
[THC132858-------------------]MKQFKRGIERDGTGFVVLMAEEAEDMWHIYNLIRIGDIIKASTIRKVVSETSTGTTSSQRVHTM
LTVSVESIDFDPGAQELHLKGRNIEENDIVKLGAYHTIDLEPNRKFTLQKTEWDSIDLERLNLA
[THC85433------------------------------------------LDPAQAADVAAVVLHEGLANVCLITPAMTLTRAKIDMTIPRKRKGFTSQHEKGLEKFYEAVSTA--------------------------------------------] {AA938998*****************FMRHVNLQVVKCVIVASRGFVKDAFMQHLIAHADANGKKFTTEQRAKFMLTHSSSGFKHALKEV*******} [THC200182----------------------------------------------------LETPQVALRLADTKAQGEVKALNQFLELMSTEPDRAFYGFNHVNRANQELAIETLLVADSLFRA-----------------------------------------------]QDIETRRKYVRLVESVREQNGKVHIFSSMHVSGEQLAQLTGCAAILRFPMPDLDDEPMDEN
Z36238
Translation of 1 MKLVRKNIEKDNAGQVTLVPEEPEDMWHTYNLVQVGDSLRASTIRKVQTESSTGS 55Z36238 1 MKQFKRGIERDGTGFVVLMAEEAEDMWHIYNLIRIGDIIKASTIRKVVSETSTGT 55 ** ...**.*..* * *. ** ***** ***...** ..******* .*.***.
Translation of 56 VGSNRVRTTLTLCVEAIDFDSQACQLRVKGTNIQENEYVKMGAYHTIELEPNRQF 110Z36238 56 TSSQRVHTMLTVSVESIDFDPGAQELHLKGRNIEENDIVKLGAYHTIDLEPNRKF 110 *.**.* **..**.**** * .*..** **.**. **.******.*****.*
Translation of 111 TLAKKQWDSVVLERIEQACDPAWSADVAAVVMQEGLAHICLVTPSMTLTRAKVEV 165Z36238 111 TLQKTEWDSIDLERLNLALDPAQAADVAAVVLHEGLANVCLITPAMTLTRAKIDM 165 ** * .***. ***. * *** .*******..****..**.**.*******...
Translation of 166 NIPRKRKGNCSQHDRALERFYEQVVQAIQRHIHFDVVKCILVASPGFVREQFCDY 220Z36238 166 TIPRKRKGFTSQHEKGLEKFYEAVSTAFMRHVNLQVVKCVIVASRGFVKDAFMQH 220 .******* .***.. **.*** * * **.. ****..*** ***.. *
Translation of 221 MFQQAVKTDNKLLLGNRSKFLQVHASSGHKYSLKEALCDPTVLARLSDTKAAGEV 275Z36238 221 LIAHADANGKKFTTEQRAKFMLTHSSSGFKHALKEVLETPQVALRLADTKAQGEV 275 . .* . * .*.**. *.*** * .*** * * * **.**** ***
Translation of 276 KALDDSYKMLQHEPDRAFYGLKQVEKANEAMAIDTLLISDELFRHQDVATRSRYV 330Z36238 276 KALNQFLELMSTEPDRAFYGFNHVNRANQELAIETLLVADSLFRAQDIETRRKYV 330 *** .. ******** .* .**. .**.***..* *** **. ** .**
Translation of 331 RLVDSVKENAGTVRIFSSLHVSGEQLSQLTGVAAILRFPVPELSDQEGDS-SSEE 384Z36238 331 RLVESVREQNGKVHIFSSMHVSGEQLAQLTGCAAILRFPMPDLDDEPMDEN 381 ***.**.*. * *.****.*******.**** *******.*.* *. *
Translation of 385 D 385Z36238 382 381
sp|P48612|PELO_DROME PELOTA PROTEIN >gi|973224 (U27197) pelota ...  520  e-147
sp|P50444|YNU6_CAEEL HYPOTHETICAL 42.9 KD PROTEIN R74.6 IN CHRO...  446  e-125
gi|3941543 (AF069497) pelota [Arabidopsis thaliana]  385  e-106
pir||S45456 DOM34 protein - yeast (Saccharomyces cerevisiae) >g...  236  2e-61
sp|P33309|DO34_YEAST DOM34 PROTEIN >gi|295608 (L11277) DOM34 [S...  212  2e-54
gnl|PID|e304505 (Z86109) unknown [Saccharomyces pastorianus]  199  3e-50
gi|2622770 (AE000923) cell division protein [Methanobacterium t...  155  4e-37
gnl|PID|d1031529 (AP000006) 356aa long hypothetical protein [Py...  146  3e-34
sp|Q57638|Y174_METJA HYPOTHETICAL PROTEIN MJ0174 >gi|2127805|pi...  145  6e-34
gi|2649765 (AE001046) cell division protein pelota (pelA) [Arch...  116  3e-25
sp|P50444|YNU6_CAEEL HYPOTHETICAL 42.9 KD PROTEIN R74.6 IN CHROMOSOME III >gi|3879163|gnl|PID|e1348805 (Z36238) Similar to the DOM34 protein of saccharomyces cerevisiae (Swiss Prot accession number P33309) [Caenorhabditis elegans] Length = 381 Score = 446 bits (1136), Expect = e-125 Identities = 215/371 (57%), Positives = 282/371 (75%)
BLASTP (Jan. 10, 1999)
[Figure: scatter plot of C. elegans protein length (y-axis, 0 to 1200) against CGI protein length (x-axis, 0 to 1000)]
[Figure: scatter plot of match area length (y-axis, 0 to 800) against CGI protein length (x-axis, 0 to 1000)]
[Figure: scatter plot of protein similarity between CGI and C. elegans (y-axis, 30 to 100) against CGI protein length (x-axis, 0 to 1000)]
C. elegans from WormPept: 18,452 entries
HGI searches (5 days for TBLASTN analysis)
* Families 3,934
* Known Gene 7,954
* New Contig 3,456
* Undetermined 2,070
<100 aa 1,038
* 150 full length genes so far, more expected following gap closure and 5' RACE.
83% between Human & C. elegans
11% C. elegans specific
C. elegans from WormPept: 18,452 entries
MGI searches (5 days for TBLASTN analysis)
* Families 5,602
* Known Gene 4,151
* New Contig 5,805
* Undetermined 1,856
<100 aa 1,038
84% between Mouse & C. elegans
10% C. elegans specific
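The percentages on the HGI and MGI slides can be reproduced from the category counts. The sketch below assumes that the matched share is Families + Known Gene + New Contig and that the "C. elegans specific" share corresponds to the Undetermined category; that mapping is an inference from the arithmetic, not something the slides state.

```python
TOTAL_WORMPEPT = 18_452

human = {"Families": 3934, "Known Gene": 7954, "New Contig": 3456,
         "Undetermined": 2070, "<100 aa": 1038}
mouse = {"Families": 5602, "Known Gene": 4151, "New Contig": 5805,
         "Undetermined": 1856, "<100 aa": 1038}

def summarize(counts, total=TOTAL_WORMPEPT):
    """Return (matched %, undetermined %) as whole percentages of all entries."""
    matched = counts["Families"] + counts["Known Gene"] + counts["New Contig"]
    return (round(100 * matched / total),
            round(100 * counts["Undetermined"] / total))

for name, counts in (("Human", human), ("Mouse", mouse)):
    # The five categories should account for every WormPept entry
    assert sum(counts.values()) == TOTAL_WORMPEPT
    print(name, summarize(counts))
```

Under these assumptions the five categories add up to exactly 18,452 entries in both searches.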