FUNCTIONAL GENOMICS
Methods to understand gene function
1. Comparison with genes in other organisms (Comparative Genomics)
2. Comparison of structural motifs
3. Generation of mutations and study of the resulting phenotype
4. Study of the gene(s) responsible for a phenotype
Genomics
According to the WHO: the study of genes and their functions, and all the related techniques
The human genome was first published in 2001
• The International Consortium, made up of 20 groups from different countries, and the private company Celera simultaneously made public, on 12 February 2001, the provisional map of the human genome, assembled from the genomes of different individuals.
The International Consortium calculated that the human genome contains 31,780 protein-coding genes, of which only 22,000 have been found to date. On the other hand, Celera indicated the existence of 26,000 genes, while the total number would be 38,000.
The sequence obtained represented 90% of the genome.
Some Facts:
• To sequence the human genome, the Celera Genomics team used DNA samples from three women and two men (an African American, a Chinese, an Asian, a Hispanic-Mexican and a Caucasian).
• Each person shares about 99.9 percent of the genetic code with every other human being: on average, only about one nucleotide in 1,250 differs between two people.
• ∼223 human genes show similarity to bacterial genes.
• Only about 5% of the genome codes for proteins; about 25% consists of almost gene-free "desert" regions between genes.
• An estimated 250,000-300,000 different proteins exist, so each gene could encode about 10 different proteins.
• About 35% of the genome consists of repeated sequences.
• A huge number of small variations in genes, known as single-nucleotide polymorphisms (SNPs), have been found. Celera calculated 2.1 million SNPs in the genome and the Consortium 1.4 million. The vast majority of these polymorphisms have no concrete clinical effect, but they probably underlie differences in sensitivity to certain drugs and in predisposition to certain diseases.
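The proteins-per-gene estimate in the list above is simple arithmetic on the quoted figures. A quick check in Python, using only numbers that appear on these slides (the function name is illustrative):

```python
# Figures quoted on the slides; they are not independently verified here.
CONSORTIUM_GENES = 31_780                      # protein-coding genes (Consortium estimate)
PROTEINS_LOW, PROTEINS_HIGH = 250_000, 300_000  # estimated number of distinct proteins


def proteins_per_gene(n_proteins: int, n_genes: int) -> float:
    """Average number of distinct proteins each gene would have to encode."""
    return n_proteins / n_genes


low = proteins_per_gene(PROTEINS_LOW, CONSORTIUM_GENES)
high = proteins_per_gene(PROTEINS_HIGH, CONSORTIUM_GENES)
print(f"{low:.1f} to {high:.1f} proteins per gene")
```

With the Consortium gene count the ratio comes out slightly below 10, which is consistent with the rounded figure given on the slide.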
Just consider that
Approximately 36% of deciphered genes have an unknown function.
A living being is not the result of reading a pre-existing code but of complex interactions within the organism itself and with the environment.
Personal Genomics:
PLoS Biology 2007 5:e254, September 4, 2007
Diploid Reconstruction
Half of the genome is in haplotype blocks of >200 kb
Functional Genomics
It is the field of molecular biology that attempts to make systematic use of the information and data collected by genome sequencing projects to describe the functions carried out by genes.
Aims to:
• identify and define the function of genes
• examine the inter-relations and interactions between thousands of genes
• determine when and why certain phenotypes are expressed
• understand which set of genes is specifically responsible for a phenotype, and under what conditions
Focuses on dynamic aspects such as transcription, translation and protein-protein interactions, as opposed to the genomic information of genes and structures.
[Figure: DNA, RNA, cDNA and protein shown for two samples, with a comparison between them]
Most common technologies used
1. cDNA transfection into cells and search for resulting function
2. Library transfection into cells and search for specific function: Expression-cloning
3. Site-directed mutagenesis
4. siRNA
5. DNA microarrays
6. SAGE for mRNA
A fundamental approach to studying gene expression is through cDNA libraries.
• Isolate RNA (always from a specific organism, region, and time point)
• Convert RNA to complementary DNA
• Subclone into a vector
[Figure: vector with cDNA insert]
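The "convert RNA to complementary DNA" step can be sketched at the sequence level. This is a minimal illustration, not a model of the enzymology: it only applies base pairing, and the toy transcript is hypothetical.

```python
# Base pairing used when copying RNA into DNA: RNA base -> complementary DNA base.
RNA_TO_CDNA = {"A": "T", "U": "A", "G": "C", "C": "G"}


def first_strand_cdna(mrna: str) -> str:
    """Return the first-strand cDNA (written 5'->3') for an mRNA written 5'->3'."""
    mrna = mrna.upper()
    # Read the mRNA backwards and complement each base, so the result reads 5'->3'.
    return "".join(RNA_TO_CDNA[base] for base in reversed(mrna))


mrna = "AUGGCCAAAUAA"  # hypothetical toy transcript
print(first_strand_cdna(mrna))  # TTATTTGGCCAT
```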
1. Comparative Genomics: Analysis of gene expression in cDNA libraries
Requisites for the library plasmid
• Have a prokaryotic origin of replication
• Carry an antibiotic resistance gene
• Work under the control of a strong eukaryotic promoter (CMV) and contain a Kozak sequence for ribosome binding
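As an illustration of the Kozak requirement, the sketch below checks the two positions usually considered most important in the consensus gccRccATGG: a purine at -3 and a G at +4. The function name, the strong/adequate/weak labels as used here, and the example insert are assumptions made for this example.

```python
def kozak_strength(seq: str, atg_index: int) -> str:
    """Classify the context of the ATG that starts at atg_index (0-based)."""
    seq = seq.upper()
    if seq[atg_index:atg_index + 3] != "ATG":
        raise ValueError("no ATG at the given index")
    # Position -3 is three bases upstream of the A of ATG; +4 follows the G of ATG.
    purine_at_minus3 = atg_index >= 3 and seq[atg_index - 3] in "AG"
    g_at_plus4 = atg_index + 3 < len(seq) and seq[atg_index + 3] == "G"
    if purine_at_minus3 and g_at_plus4:
        return "strong"
    if purine_at_minus3 or g_at_plus4:
        return "adequate"
    return "weak"


insert = "GCCACCATGGCT"  # hypothetical start of a cDNA insert
print(kozak_strength(insert, insert.index("ATG")))  # strong
```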
2. Expression-cloning
Allows a gene to be isolated according to its function, i.e. its ability to induce a given phenotype in a system (or a cell) that normally does not have it.
Sequential enrichment of positive fractions
Fundamental requirements
• Select a cell line silent for the phenotype of interest, e.g. HEK or COS cells.
• Select an organism or tissue from which the gene will be isolated. Previous characterization of the tissue should warrant a physiologically abundant presence of the gene.
• Prepare total mRNA to be divided into different fractions. Alternatively, cDNA can be prepared and inserted into appropriate vectors.
To keep in mind:
• The functional assay is crucial to identify positive hits.
• Primary tests must measure simple parameters, to provide fast results with great sensitivity for detecting positive pools.
• Each primary test must be followed by confirmatory tests, more elaborate and of greater specificity, to exclude false positives.
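The sequential enrichment of positive fractions can be sketched as a loop that splits a pool, tests each fraction, and keeps the positive one. This is a toy simulation under strong assumptions: exactly one clone carries the function, the assay is perfect, and `functional_assay` is a stand-in for the real primary test.

```python
import random


def functional_assay(pool: list[str], target: str) -> bool:
    """Stand-in for the primary test: positive if the pool contains the target clone."""
    return target in pool


def sib_selection(clones: list[str], target: str, n_fractions: int = 10, seed: int = 0):
    """Split the positive pool into fractions, round after round, until one clone remains."""
    rng = random.Random(seed)
    pool = list(clones)
    rounds = 0
    while len(pool) > 1:
        rng.shuffle(pool)
        # Deal the clones into n_fractions sub-pools of (almost) equal size.
        fractions = [pool[i::n_fractions] for i in range(n_fractions)]
        positives = [f for f in fractions if f and functional_assay(f, target)]
        if not positives:
            raise ValueError("no positive fraction: the target is not in the library")
        pool = positives[0]
        rounds += 1
    return pool[0], rounds


library = [f"clone_{i}" for i in range(10_000)]  # hypothetical library
print(sib_selection(library, "clone_4242"))       # ('clone_4242', 4)
```

With 10 fractions per round, a library of 10,000 clones is narrowed to a single clone in four rounds, which is why a fast and sensitive primary test matters.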
A few examples of cDNAs cloned by this method
cDNA | Tissue | Organism | Selection method
Neutral amino acid transporter | Intestine | Manduca sexta | Amino acid absorption
Delayed rectifier K+ channel | Brain | Rat | Voltage clamp
Na+/glucose cotransporter | Intestine | Rabbit | Na+-dependent glucose absorption
Cloning of TRPV1
A design with Antibody neutralization
Alternatively
• Functional complementation for cloning multimeric proteins, e.g. the epithelial Na+ channel (Canessa et al.)
• Hybrid depletion: use cDNA library fractions and RNA, and test their capacity to diminish the expression of a function, e.g. ClC-0 (Jentsch et al.)
3. Site-directed Mutagenesis
3.1. Generate site-specific mutations in vitro and analyze the resulting phenotype
3.2. Transgenic and knockout animals
3.3. Random mutagenesis
3.1. Site specific mutations in vitro
• Allows functional elements and specific interactions to be elucidated.
• Validates simple biological processes, e.g. that Shaker was a K+ channel, or elucidates developmental processes.
• Structure-function studies.
Not valid for multigenic processes.
Many mutants result in non-functional genes: uninformative.
Silent heterologous expression system required.
3.2. Mutagenesis in vivo
Function of a given gene in the whole animal.
Transgenesis: introduce exogenous DNA into the germline.
Method: DNA microinjection into the pronucleus of a fertilized oocyte. Random insertion of multiple copies (head-to-tail) at a single site of the genome.
[Figure: transgene construct with promoter, cDNA, artificial introns and polyA signal]
Transgenesis does not alter (or delete) the endogenous gene.
Gene Targeting
Alteration of an endogenous gene by homologous recombination (HR) with an exogenous gene designed in vitro. HR takes place in stem cells of the inner cell mass of a blastocyst.
HR events are selected positively (and sometimes negatively) with antibiotics.
Exogenous DNA is introduced into the germ line of an animal, which then transmits it to its progeny according to Mendel's laws.
Putative problems:
• Lethal phenotype
• No phenotype
Conditional Transgenics/Knockouts
Use Cre recombinase, which recognizes loxP sequences, under the control of a tissue-specific promoter.
Cre excises fragments flanked by loxP sites.
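The excision step can be sketched on plain strings. This is a minimal sketch assuming two loxP sites in the same orientation on the same molecule; the tissue names and the toy allele are hypothetical, and tissue specificity is modelled simply as "Cre is expressed or not".

```python
LOXP = "ATAACTTCGTATAGCATACATTATACGAAGTTAT"  # canonical 34-bp loxP site


def cre_excise(dna: str) -> str:
    """Delete the segment between the first two loxP sites, leaving a single loxP."""
    first = dna.find(LOXP)
    if first == -1:
        return dna
    second = dna.find(LOXP, first + len(LOXP))
    if second == -1:
        return dna  # fewer than two sites: nothing to excise
    # Keep everything up to and including the first site, then resume after the second.
    return dna[:first + len(LOXP)] + dna[second + len(LOXP):]


def conditional_knockout(dna: str, tissue: str, cre_tissues: set[str]) -> str:
    """Excise the floxed fragment only in tissues where the promoter drives Cre."""
    return cre_excise(dna) if tissue in cre_tissues else dna


floxed = "AAAA" + LOXP + "GGGGCCCC" + LOXP + "TTTT"  # toy floxed allele
print(conditional_knockout(floxed, "brain", {"brain"}) == "AAAA" + LOXP + "TTTT")   # True
print(conditional_knockout(floxed, "kidney", {"brain"}) == floxed)                   # True
```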
3.3. Random mutagenesis
• At the single-gene level: use a special error-prone polymerase, e.g. Mutazyme®, or modify the amount of template, the number of cycles in the reaction, or the concentration of dNTPs or Mg2+.
• In mice: use the alkylating agent N-ethyl-N-nitrosourea (ENU). ENU is a supermutagen that generates AT-to-TA transversions or AT-to-GC transitions.
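The effect of error rate and cycle number on a single gene can be sketched with a toy simulation. It assumes a uniform substitution model (each base is replaced by any other base with the same probability), which is not the real mutational spectrum of any particular polymerase or of ENU; all names and default values are illustrative.

```python
import random


def error_prone_copy(template: str, error_rate: float, rng: random.Random) -> str:
    """Copy a DNA template, substituting each base with probability error_rate."""
    product = []
    for base in template:
        if rng.random() < error_rate:
            # Pick one of the three other bases at random.
            product.append(rng.choice([b for b in "ACGT" if b != base]))
        else:
            product.append(base)
    return "".join(product)


def mutagenize(template: str, error_rate: float = 0.01, cycles: int = 10, seed: int = 0) -> str:
    """Apply several rounds of error-prone copying to one lineage of the template."""
    rng = random.Random(seed)
    seq = template
    for _ in range(cycles):
        seq = error_prone_copy(seq, error_rate, rng)
    return seq


gene = "ACGT" * 25  # hypothetical 100-bp template
mutant = mutagenize(gene)
print(sum(a != b for a, b in zip(gene, mutant)), "substitutions in", len(gene), "bp")
```

Raising `error_rate` or `cycles` increases the mutational load, which mirrors the levers listed on the slide (polymerase choice, cycle number, reaction conditions).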
Intervention at RNA level
Gene expression is regulated in several basic ways
• by region (e.g. brain versus kidney)
• in development (e.g. fetal versus adult tissue)
• in dynamic response to environmental signals (e.g. immediate-early response genes)
• in disease states
• by gene activity
4. siRNA
Mechanism of post-transcriptional silencing observed in plants, fungi and nematodes, and responsible for cellular responses in eukaryotes that silence invading viral RNAs. Useful to repress endogenous protein translation and to safeguard genome stability. Designed siRNAs can inhibit protein translation or promote mRNA degradation.
RNAi relies on double-stranded RNA, which produces specific silencing of homologous genes.
Dicer (an enzyme with RNase III-type activity) cleaves the double-stranded precursors.
The resulting siRNAs act as recognition sequences for the RNA-Induced Silencing Complex (RISC), which recognizes the homologous mRNA and induces its degradation.
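The recognition step rests on base complementarity between the siRNA guide strand and the target mRNA. The sketch below shows only that: it derives an antisense strand for a chosen site and finds where it pairs perfectly. Real siRNA design also weighs GC content, overhangs and off-target matches, none of which is modelled here; the 21-nt length and the toy mRNA are assumptions.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA sequence, both written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna.upper()))


def guide_strand(mrna: str, start: int, length: int = 21) -> str:
    """Antisense (guide) strand for the target site mrna[start:start + length]."""
    site = mrna[start:start + length]
    if len(site) < length:
        raise ValueError("target site runs past the end of the mRNA")
    return reverse_complement(site)


def find_targets(mrna: str, guide: str) -> list[int]:
    """Start positions in mrna that are perfectly complementary to the guide."""
    site = reverse_complement(guide)
    positions = []
    i = mrna.find(site)
    while i != -1:
        positions.append(i)
        i = mrna.find(site, i + 1)
    return positions


mrna = "GGGAUGGCUAGCUAGGCUUACGGAUCCGAUAAACCC"  # hypothetical toy mRNA
guide = guide_strand(mrna, 3)
print(guide, find_targets(mrna, guide))
```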
Methods to synthesize siRNA
• 2´-ACE chemical synthesis: uses a modification of the classic phosphoramidite method of oligonucleotide synthesis, 2´-ACE, producing a water- and nuclease-resistant intermediate. It is the method of choice for large-scale synthesis. It gives great purity, and any modification can be obtained.
• Test-tube (in vitro) transcription: needs a template representing the target sequence. Sequence modifications are limited. Increased probability of nonspecific siRNA. Low scale, low cost.
• Plasmids and retroviral vectors: require the use of transcribed plasmids or of viruses that form shRNAs. Promoters are based on RNA polymerase III. Used in inducible systems; modifications cannot be incorporated.
5. DNA and RNA Microarrays
• Based on the hybridisation of labeled DNA or RNA from the sample to DNA bound and immobilized on a specific surface.
• Immobilized DNA will hybridize to complementary sequences in the sample.
• The sequence of the immobilized DNA at each position is precisely known.
• The presence/absence of hybrids is indicated by the color intensity, and the relative abundance by different colors.
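Reading relative abundance from spot colors is usually expressed as a log ratio of two intensities. The sketch below assumes a two-colour array (one channel for the sample, one for a reference); the function names, the example intensities and the 2-fold threshold are illustrative choices, not part of the slides.

```python
import math


def log2_ratio(sample_intensity: float, reference_intensity: float) -> float:
    """Log2 of the sample/reference intensity ratio for one spot."""
    return math.log2(sample_intensity / reference_intensity)


def call_spot(sample_intensity: float, reference_intensity: float, threshold: float = 1.0) -> str:
    """Label a spot as up, down or unchanged; threshold=1.0 means a 2-fold change."""
    ratio = log2_ratio(sample_intensity, reference_intensity)
    if ratio >= threshold:
        return "up"
    if ratio <= -threshold:
        return "down"
    return "unchanged"


# Hypothetical background-corrected intensities: (sample, reference) per spot.
spots = {"gene_A": (800.0, 200.0), "gene_B": (100.0, 400.0), "gene_C": (300.0, 250.0)}
for name, (sample, reference) in spots.items():
    print(name, round(log2_ratio(sample, reference), 2), call_spot(sample, reference))
```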
Microarray databases
Two main repositories:
Gene Expression Omnibus (GEO) at NCBI
ArrayExpress at the European Bioinformatics Institute (EBI)
6. SAGE
Serial Analysis of Gene Expression (SAGE) gives an overview of a cell’s complete gene activity. It captures mRNAs, identifies and counts them, and produces a “snapshot” of the mRNA population (transcriptome) in a sample of interest.
By comparing different types of cells, expression profiles are generated that are potentially useful for understanding healthy and diseased cells.
SAGE gives more quantitative data than microarrays.
Since SAGE is not based on hybridization, the mRNA sequences do not need to be known a priori.
Some of the steps of SAGE:
• Trap RNAs with beads
• Convert RNA into cDNA
• Digest each cDNA at one end
• Attach a "docking module" to this end; here a new enzyme can dock and cut off a short tag
• Combine two tags into a unit, a di-tag
• Pick the best concatemers and sequence them
• Identify how many different cDNAs there are, and count them
• Match the sequence of each tag to the gene that produced the RNA
http://www.embl-heidelberg.de/info/sage
Pick the best concatemers and sequence them: 14 nt are enough to match an RNA to the precise gene that produced it
Example of a concatemer:
CATGTTGGGTAGCATAG 4
CACCGAAACCTATGTAG 3
CATGGTACGATGATTAG 2
AGGACCCACGAGCTAG 1
CATG
CATGGGACAATGCTTAG 6
GTTAGGACGAGGTAG 5
Tag_Sequence | Count
TACGTTTCCA | 66
GCGATATTGT | 66
GCCTTGTTTA | 80
TAGCCCAGAT | 83
GCGATGGCGG | 91
TAGGACGAGG | 92
TCCCCGTACA | 112
GCGCAGACTT | 125
ATCTGAGTTC | 1075
A computer program generates a list of tags and tells how many times each one has been found in the cell.
Identify the RNA and the gene that produced each of the tags by comparing the tags to a database containing all known genes from the organism.
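The counting and matching steps described above can be sketched in a few lines. This is a simplified sketch: it treats a concatemer as a series of CATG anchoring sites each followed by a 10-nt tag and ignores the di-tag orientation of real SAGE data. The tag-to-gene dictionary is a toy stand-in for the gene database, filled with two entries taken from the example table on these slides.

```python
from collections import Counter

ANCHOR = "CATG"  # anchoring-enzyme site assumed here (NlaIII, commonly used in SAGE)
TAG_LEN = 10


def extract_tags(concatemer: str) -> list[str]:
    """Return the TAG_LEN bases that follow each anchoring site."""
    tags = []
    i = concatemer.find(ANCHOR)
    while i != -1:
        start = i + len(ANCHOR)
        tag = concatemer[start:start + TAG_LEN]
        if len(tag) == TAG_LEN:
            tags.append(tag)
        # Resume after the tag so a CATG inside a tag is not read as an anchor.
        i = concatemer.find(ANCHOR, start + TAG_LEN)
    return tags


def annotate(counts: Counter, tag_to_gene: dict[str, str]) -> list[tuple[str, int, str]]:
    """List (tag, count, gene name), most abundant first; unknown tags get 'no match'."""
    return [(tag, n, tag_to_gene.get(tag, "no match")) for tag, n in counts.most_common()]


TAG_TO_GENE = {  # toy database
    "GCCTTGTTTA": "rpa1 mRNA fragment for r ribosomal protein",
    "GCCGAAGTTG": "ribosomal protein S5 homolog (M(1)15D)",
}

tags_in = ["GCCTTGTTTA", "TAGGACGAGG", "GCCTTGTTTA", "ACCGCCTTCG"]
concatemer = "".join(ANCHOR + tag for tag in tags_in)
for row in annotate(Counter(extract_tags(concatemer)), TAG_TO_GENE):
    print(row)
```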
Tag_Sequence | Count | Gene Name
TAACGACCGC | 4 | BcDNA.GM12270
GCCGAAGTTG | 50 | ribosomal protein S5 homolog (M(1)15D)
GCCCGCAACA | 134 | ribosomal protein S3
GGAGCCCGCC | 45 | ribosomal protein L18a
GCAAAACCGG | 63 | rpL21
TTTTTGTTAA | 99 | NADH dehydrogenase 3 (ND3) gene
CCGCCGTGGG | 9 | SF1 protein (SF1 gene)
GTTAACCATC | 45 | ubiquitin 52-AA extension protein
GCCTTGTTTA | 81 | rpa1 mRNA fragment for r ribosomal protein
ACCGCCTTCG | 1 | no match
AAATCGGAAT | 2 | T-complex protein 1, z-subunit
ATATTGTCAA | 5 | translation elongation factor 1 gamma
These techniques produce large quantities of data. Given the desire to find biologically meaningful patterns, bioinformatics and reference libraries become crucial tools for analysis.
1. Comparative Genomics. EST
Expressed Sequence Tags (ESTs).
Some commercially available libraries are obtained from EST cDNAs.
cDNAs are generated from all the mRNAs of a single cell or a given tissue. Hundreds or thousands of genes are selected and sequenced a single time, without verification. Sequences are incomplete.
Tissue mRNA
cDNA library
Sequence from the 5´ to the 3´ end
RT-PCR
[Figure: experiment workflow: experimental design; sample acquisition (purify RNA, label); data acquisition (hybridize, wash, image); data analysis; data confirmation; biological insight; data storage]
SAGE and EST databases
Main repository:
UniGene at NCBI: www.ncbi.nlm.nih.gov/UniGene
UniGene data come from many cDNA libraries. Obtain information on its abundance and its regional
http://mgc.nci.nih.gov