PR9 Genomic technologies: methods for genome analysis

Genome sequencing was driven by technological progress

Genome projects

Overview of a genome project. First, the genome must be selected, which involves several factors, including cost and relevance. Second, the sequence is generated and assembled at a sequencing center (such as BGI or DOE JGI). Third, the genome sequence is annotated at several levels: DNA, protein, gene pathways, and comparative analysis.
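The assembly step can be illustrated with a deliberately naive sketch: a greedy merger that repeatedly joins the two reads with the longest suffix-prefix overlap. It assumes error-free reads from one strand and a hypothetical minimum overlap of 3 bases; the function names are illustrative. Production assemblers typically use overlap-layout-consensus or de Bruijn graph methods instead, so treat this only as a picture of what "assembling" reads means.

```python
from typing import List

def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b (at least min_len), else 0."""
    start = 0
    while True:
        start = a.find(b[:min_len], start)   # candidate start of a suffix of a
        if start == -1:
            return 0
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1

def greedy_assemble(reads: List[str], min_len: int = 3) -> List[str]:
    """Greedy shortest-superstring heuristic; returns one or more contigs."""
    reads = list(dict.fromkeys(reads))                       # drop exact duplicates
    reads = [r for r in reads if not any(r != o and r in o for o in reads)]  # drop contained reads
    while len(reads) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break                                            # no overlaps left: separate contigs
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)] + [merged]
    return reads

# Hypothetical overlapping reads sampled from one short sequence.
reads = ["ATTAGACCTG", "AGACCTGCCG", "CCTGCCGGAA", "GCCGGAATAC"]
print(greedy_assemble(reads))
```

Greedy merging is easy to follow, but it can misassemble repeats, which is one reason graph-based assemblers are preferred in practice.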

Increase in the number of genomes completed per year, separated by bacterial phylum. Data source: NCBI, complete genomes.

Lagesen K, Ussery DW, Wassenaar TM. 2010. Genome update: the 1000th genome - a cautionary tale. Microbiology 156:603-608.

Genome sequencing

Processing and analysis of genome data is a "big data problem"
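The scale of the problem follows from the standard coverage arithmetic: mean coverage is C = N·L/G for N reads of length L on a genome of size G, and under the Lander–Waterman (Poisson) assumption the expected fraction of bases that receive no reads is e^(-C). The sketch below uses hypothetical run parameters (620 million 150-bp reads on a roughly human-sized 3.1 Gb genome) and a rough assumption of about 2 bytes per base for uncompressed FASTQ (one sequence and one quality character, headers ignored).

```python
import math

def mean_coverage(n_reads: int, read_len: int, genome_size: int) -> float:
    """Mean sequencing depth C = N * L / G."""
    return n_reads * read_len / genome_size

def expected_fraction_uncovered(coverage: float) -> float:
    """Poisson (Lander-Waterman) approximation: P(a base gets zero reads) = e^(-C)."""
    return math.exp(-coverage)

def raw_fastq_gigabytes(n_reads: int, read_len: int, bytes_per_base: float = 2.0) -> float:
    """Rough uncompressed size, assuming ~1 sequence + ~1 quality character per base."""
    return n_reads * read_len * bytes_per_base / 1e9

# Hypothetical run parameters, chosen for illustration only.
n, length, genome = 620_000_000, 150, 3_100_000_000
c = mean_coverage(n, length, genome)
print(f"mean coverage: {c:.1f}x")
print(f"expected uncovered fraction: {expected_fraction_uncovered(c):.1e}")
print(f"raw FASTQ size: ~{raw_fastq_gigabytes(n, length):.0f} GB")
```

Real coverage is not uniform, so the Poisson estimate understates how many bases end up poorly covered.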

Bioinformatics tools for whole-genome analysis

Whole-genome alignments available online

Selected web-based bioinformatic tools and web services for integrative cancer genome analysis

Genome Browsers

AGBL5 (ATP/GTP binding protein-like 5) is a protein-coding gene. Diseases associated with AGBL5 include intrahepatic cholangiocarcinoma and cholangiocarcinoma. GO annotations related to this gene include metallocarboxypeptidase activity and tubulin binding. An important paralog of this gene is AGTPBP1.

HAVANA: Human and Vertebrate Analysis and Annotation (at the Sanger Institute)

The HAVANA group provides the manual annotation of human, mouse, zebrafish and other vertebrate genomes that appears in the Vega browser. A genome is only as valuable as its annotation. To create a gold-standard reference annotation, the Human and Vertebrate Analysis and Annotation (HAVANA) team uses tools developed in-house to manually annotate human, mouse and zebrafish genomes. The team aims to develop accurate and comprehensive annotation representing the full complexity of gene loci and their features. Manual annotation is especially important in areas that automated annotation systems handle poorly, such as splice variation, pseudogenes, conserved gene families, duplications and non-coding genes. The HAVANA team constantly updates its methods by incorporating new data sources as new technologies are developed. HAVANA annotation is freely available through genome browsers, including Vega, Ensembl and UCSC.

Mouse (GENCODE M2) and human (GENCODE 19) freezes, both dated 10/12/2013

Category                                        Mouse (M2)   Human (19)
Total number of genes                                38924        57820
Protein-coding genes                                 22572        20345
Long non-coding RNA genes                             4074        13870
Small non-coding RNA genes                            5853         9013
Pseudogenes                                           5948        14206
  - processed pseudogenes                             4556        10532
  - unprocessed pseudogenes                           1157         2942
  - unitary pseudogenes                                 14          161
  - polymorphic pseudogenes                             15           45
  - pseudogenes                                        204          296
Immunoglobulin/T-cell receptor gene segments
  - protein-coding segments                            477          386
  - pseudogenes                                          2          230
Total number of transcripts                          94545       196520
Protein-coding transcripts                           47394        81814
  - full-length protein-coding                       38260        57005
  - partial-length protein-coding                     9134        24809
Nonsense-mediated decay transcripts                   4134        13052
Long non-coding RNA loci transcripts                  6053        23898
Total number of distinct translations                38862        61559
Genes with more than one distinct translation         7946        13600

The GENCODE Project: Encyclopaedia of genes and gene variants (http://www.gencodegenes.org/)
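The GENCODE counts above are internally consistent, and a short check makes the bookkeeping explicit. Gene biotypes sum to the total gene count when the Ig/TR protein-coding segments are counted as genes, and the Pseudogenes total equals its listed subtypes plus the Ig/TR pseudogene segments. The sketch does arithmetic on the numbers in the table only; the dictionary keys and the `check` function are hypothetical names chosen for illustration.

```python
# Counts copied from the mouse (M2) and human (19) GENCODE freezes above.
FREEZES = {
    "mouse_M2": dict(total_genes=38924, protein_coding=22572, lncrna=4074,
                     small_ncrna=5853, pseudogenes=5948,
                     pseudo_subtypes=[4556, 1157, 14, 15, 204],
                     ig_tr_coding=477, ig_tr_pseudo=2,
                     pc_transcripts=47394, pc_full=38260, pc_partial=9134),
    "human_19": dict(total_genes=57820, protein_coding=20345, lncrna=13870,
                     small_ncrna=9013, pseudogenes=14206,
                     pseudo_subtypes=[10532, 2942, 161, 45, 296],
                     ig_tr_coding=386, ig_tr_pseudo=230,
                     pc_transcripts=81814, pc_full=57005, pc_partial=24809),
}

def check(f: dict) -> dict:
    # Ig/TR protein-coding segments are counted as genes in the total.
    gene_sum = (f["protein_coding"] + f["lncrna"] + f["small_ncrna"]
                + f["pseudogenes"] + f["ig_tr_coding"])
    # Ig/TR pseudogene segments are counted inside the Pseudogenes total.
    pseudo_sum = sum(f["pseudo_subtypes"]) + f["ig_tr_pseudo"]
    return {
        "genes_add_up": gene_sum == f["total_genes"],
        "pseudogenes_add_up": pseudo_sum == f["pseudogenes"],
        "pc_transcripts_add_up": f["pc_full"] + f["pc_partial"] == f["pc_transcripts"],
        "pct_protein_coding": round(100 * f["protein_coding"] / f["total_genes"], 1),
    }

for name, freeze in FREEZES.items():
    print(name, check(freeze))
```

The last field shows how differently the two annotations are composed: protein-coding genes make up a much smaller share of the human gene set, largely because it lists many more lncRNA genes and pseudogenes.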

‘Omics’ data are providing comprehensive descriptions of nearly all components and interactions within the cell.

Omics data sets that describe virtually all biomolecules in the cell are starting to become available. These data can be generally classified into three categories: components, interactions and functional-states data. Components data detail the molecular content of the cell or system, interactions data specify links between molecular components, and functional-states data provide an integrated readout of all omics data types by revealing the overall cellular phenotype. The central pathway traces the biological information flow from the genome to the ultimate cellular phenotype, and the available omics data types that are used to describe these processes are indicated in the adjacent boxes. From the top, DNA (genomics) is first transcribed to mRNA (transcriptomics) and translated into protein (proteomics), which can catalyse reactions that act on and give rise to metabolites (metabolomics), glycoproteins and oligosaccharides (glycomics), and various lipids (lipidomics). Many of these components can be tagged and localized within the cell (localizomics). The processes that are responsible for generating and modifying these cellular components are generally dictated by molecular interactions, for example by protein–DNA interactions in the case of transcription, and protein–protein interactions in translational processes as well as enzymatic reactions. Ultimately, the metabolic pathways comprise integrated networks, or flux maps (fluxomics), which dictate the cellular behaviour, or phenotype (phenomics).

'Omics' data repositories

Data visualization. The University of California-Santa Cruz (UCSC) Genome Browser is a tool for viewing genomic data sets. A vast amount of data is available for viewing through this browser. This example from the browser shows numerous data types in K562 cells from the ENCODE Consortium. A random gene, katanin p60 subunit A-like 1 (KATNAL1), was selected to show several features that can be identified by using this tool. The promoter has a typical chromatin structure (a peak of histone H3 lysine 4 trimethylation (H3K4me3) between the bimodal peaks of H3K4me1), is bound by RNA polymerase II (RNAPII) and is DNase hypersensitive. The gene is transcribed, as indicated by RNA sequencing (RNA–seq) data as well as by H3K36me3 localization. The gene lies between two CCCTC-binding factor (CTCF)-bound sites that could be tested for insulator activity. An intronic H3K4me1 peak (highlighted) predicts an enhancer element, corroborated by the DNase I hypersensitivity site peak. There is a broad repressive domain of H3K27me3 downstream, which could have an open chromatin structure in another cell type.
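Each browser track is, at bottom, a set of genomic intervals, and much of the interpretation above amounts to asking where a peak falls relative to a gene. A minimal sketch, assuming BED-style 0-based half-open coordinates, a hypothetical ±1 kb promoter window around the TSS, and made-up coordinates (not the real KATNAL1 locus), classifies a peak by its midpoint as promoter, exonic, intronic or intergenic. `Gene` and `classify_peak` are illustrative names.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Gene:
    chrom: str
    start: int          # 0-based, inclusive (BED convention)
    end: int            # 0-based, exclusive
    strand: str         # "+" or "-"
    exons: List[Tuple[int, int]] = field(default_factory=list)  # half-open (start, end)

    @property
    def tss(self) -> int:
        # First transcribed base: start on the "+" strand, end - 1 on the "-" strand.
        return self.start if self.strand == "+" else self.end - 1

def classify_peak(chrom: str, p_start: int, p_end: int, gene: Gene,
                  promoter_window: int = 1000) -> str:
    if chrom != gene.chrom:
        return "other chromosome"
    mid = (p_start + p_end) // 2
    if abs(mid - gene.tss) <= promoter_window:
        return "promoter"        # checked first, so a TSS-proximal exon counts as promoter
    if gene.start <= mid < gene.end:
        if any(s <= mid < e for s, e in gene.exons):
            return "exonic"
        return "intronic"        # e.g. a candidate enhancer such as an intronic H3K4me1 peak
    return "intergenic"

# Hypothetical gene and peaks, for illustration only.
g = Gene("chrA", 10_000, 20_000, "+", [(10_000, 10_500), (15_000, 15_200), (19_500, 20_000)])
for peak in [(9_800, 10_200), (12_000, 12_400), (15_050, 15_150), (30_000, 30_400)]:
    print(peak, classify_peak("chrA", *peak, g))
```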

The interplay and codependence of experimental and computational approaches.

The centrally located yellow box labeled “Validation of Biological Mechanisms” depicts the ultimate goal for researchers studying a biological pathway. Approaches illustrated from the top of the image downward depict high-throughput analyses used to predict transcription factor binding sites and to determine the functional activity of those elements. Approaches illustrated from the bottom of the image upward signify more conventional “locus-specific” analyses that start from a narrowly defined hypothesis of biological function and can include the use of animal models.

Evolutionary analyses of genomes

The phylogenetic inference process

Methods of phylogenomic inference

The flowchart shows steps in the inference of evolutionary trees from genomic data. Genomic information is obtained by large-scale DNA sequencing. In general, sets of orthologous genes are then assembled from specific sets of species for phylogenetic analysis. This homology or orthology assessment is a crucial step that is almost always based on simple similarity comparisons (for example, BLAST searches). Most methods used for the subsequent reconstruction of phylogenetic trees are either sequence-based or are based on whole-genome features.
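Because orthology assessment usually rests on similarity searches, a common heuristic (one of several) is the reciprocal best hit: gene a in genome A and gene b in genome B are called putative orthologs when b is a's top-scoring hit and a is b's top-scoring hit. The sketch assumes the search output has already been reduced to a dictionary of bit scores per (query, subject) pair; the gene names and scores are hypothetical.

```python
from typing import Dict, List, Tuple

Scores = Dict[Tuple[str, str], float]   # (query, subject) -> bit score

def best_hits(scores: Scores) -> Dict[str, str]:
    """Top-scoring subject for each query (the first one seen wins on ties)."""
    best: Dict[str, Tuple[str, float]] = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {q: s for q, (s, _) in best.items()}

def reciprocal_best_hits(a_vs_b: Scores, b_vs_a: Scores) -> List[Tuple[str, str]]:
    ab, ba = best_hits(a_vs_b), best_hits(b_vs_a)
    return sorted((a, b) for a, b in ab.items() if ba.get(b) == a)

# Hypothetical bit scores from searches in both directions.
a_vs_b = {("a1", "b1"): 250, ("a1", "b2"): 80, ("a2", "b2"): 120, ("a3", "b2"): 150}
b_vs_a = {("b1", "a1"): 245, ("b2", "a3"): 148, ("b2", "a2"): 118}
print(reciprocal_best_hits(a_vs_b, b_vs_a))
```

Here a2 is left unpaired because b2's best hit is a3, which is the kind of case (for example, a paralog) that the reciprocal criterion is meant to filter out.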

Structural genomics: analysis of segmental duplications (SD) and copy-number variants (CNV), i.e. the variable genome

Array-based, genome-wide methods for the identification of copy-number variants
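Array-based CNV detection, such as array CGH, compares test and reference signal per probe as a log2 ratio. In a diploid genome a one-copy gain gives log2(3/2) ≈ 0.58 and a one-copy loss gives log2(1/2) = -1. The sketch calls runs of consecutive probes beyond hypothetical thresholds (±0.3, at least 3 probes) on made-up ratios. Real pipelines usually smooth the data with a segmentation algorithm such as circular binary segmentation first, so this only illustrates the idea.

```python
import math
from typing import List, Optional, Tuple

def log2_ratio(test_signal: float, reference_signal: float) -> float:
    return math.log2(test_signal / reference_signal)

def call_cnv(ratios: List[float], gain: float = 0.3, loss: float = -0.3,
             min_probes: int = 3) -> List[Tuple[int, int, str]]:
    """(first_probe, last_probe_exclusive, 'gain' or 'loss') for runs of consecutive probes."""
    states: List[Optional[str]] = [
        "gain" if r >= gain else "loss" if r <= loss else None for r in ratios
    ]
    calls, i = [], 0
    while i < len(states):
        j = i
        while j < len(states) and states[j] == states[i]:
            j += 1                                  # extend the run of identical states
        if states[i] is not None and j - i >= min_probes:
            calls.append((i, j, states[i]))
        i = j
    return calls

print(round(log2_ratio(3, 2), 2), log2_ratio(1, 2))  # one-copy gain / loss in a diploid
ratios = [0.0, 0.1, 0.6, 0.55, 0.62, 0.0, -1.0, -0.9, 0.05, -1.1]  # hypothetical, genome order
print(call_cnv(ratios))
```

Requiring several adjacent probes trades sensitivity for fewer false calls from single noisy probes; with `min_probes=2` the two-probe loss in this example would also be reported.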

Functional genome analysis: ENCODE (in human) and modENCODE (in the fruit fly and the nematode C. elegans)

The Organization of the ENCODE Consortium.

(A) Schematic representation of the major methods that are being used to detect functional elements (gray boxes), represented on an idealized model of mammalian chromatin and a mammalian gene.

(B) The overall data flow from the production groups after reproducibility assessment to the Data Coordinating Center (UCSC) for public access and to other public databases. Data analysis is performed by production groups for quality control and research, as well as at a cross-Consortium level for data integration.

modENCODE (www.modencode.org/)

Finding function in novel targets: C. elegans as a model organism

Overview of the RNA interference-supported target identification process.

Studying interactions at the genome level

• Y1H (protein (TF)–DNA interaction)
• Y2H (protein–protein interactions)
• Y3H (RNA–protein interaction), also used to detect small ligands
• membrane Y2H (interactions between membrane proteins in their natural environment)

Comparison of experimental protocols. Experiments to detect different aspects of DNA-binding proteins share many of the same steps; simplified schematics of the main steps are shown.

a| Chromatin immunoprecipitation followed by sequencing (ChIP–seq) for DNA-binding proteins such as transcription factors. Recent variations on the standard protocol include adding an exonuclease digestion step (ChIP–exo) to increase the resolution of binding-site detection and to eliminate contaminating DNA, and DNA amplification after ChIP for samples with limited cells.

b| ChIP–seq for histone modifications uses micrococcal nuclease (MNase) digestion to fragment DNA and can also now be run on low-quantity samples when combined with the additional post-ChIP amplification.

c| DNase–seq relies on digestion by the DNaseI nuclease to identify regions of nucleosome-depleted open chromatin where there are binding sites for all types of factors, but it cannot identify which specific factors are bound.

d| Formaldehyde-assisted identification of regulatory elements (FAIRE–seq) similarly identifies nucleosome-depleted regions by extracting fragmented DNA that is not crosslinked to nucleosomes.

LinDA, single-tube linear DNA amplification; T7, T7 phage RNA polymerase.
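After sequencing, ChIP–seq reads are compared with a control (input) library to find regions that are enriched more than chance would predict. A much simplified sketch of that idea, loosely in the spirit of MACS's local-lambda approach but not its actual algorithm: count reads in fixed bins, set each bin's expected count to the larger of the depth-scaled control count and the genome-wide average, and flag bins whose Poisson tail probability falls below a hypothetical cutoff. Bin counts and function names are made up for illustration.

```python
import math
from typing import List

def poisson_sf(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1).
    Adequate for the small counts here; large lam would need a log-space or library
    implementation (e.g. scipy.stats.poisson.sf)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)            # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i              # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)

def call_enriched_bins(chip: List[int], control: List[int], p_cutoff: float = 1e-5) -> List[int]:
    """Indices of bins whose ChIP count is unexpectedly high given the control."""
    scale = sum(chip) / sum(control)         # normalise control depth to ChIP depth
    global_bg = sum(chip) / len(chip)        # uniform-background expectation per bin
    enriched = []
    for i, (c, n) in enumerate(zip(chip, control)):
        lam = max(global_bg, n * scale)      # local expectation, never below the global one
        if poisson_sf(c, lam) < p_cutoff:
            enriched.append(i)
    return enriched

# Hypothetical read counts in six equal-width bins.
chip_counts = [5, 4, 6, 50, 5, 4]
control_counts = [5, 5, 5, 6, 5, 4]
print(call_enriched_bins(chip_counts, control_counts))
```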

DNaseI footprints correspond to bound proteins.

The distribution of DNaseI digestion sites within DNaseI hypersensitive regions is not uniform; peaks and troughs occur in the signal, where troughs are due to the protection of DNA sequences by bound proteins. Transcription factor binding motif databases such as JASPAR can be searched using the sequence from each footprint to predict which factor is bound. Shown here are data from the proximal promoter region of the human fragile-X mental retardation 1 (FMR1) gene, with motif-matching results for one footprint indicating that the potentially bound factors are interferon regulatory factor 1 (IRF1) or IRF2. DNaseI footprints had been identified previously at this locus in lymphoblastoid cells. More recent DNase–seq data were used to recapitulate these results in a single experiment.
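Searching a footprint sequence against a motif database usually means scoring it with a position weight matrix (PWM). A minimal sketch, assuming a toy count matrix for a hypothetical 4-bp motif (not a real JASPAR profile such as the IRF motifs), a uniform 0.25 background, a pseudocount of 0.5 and a hypothetical score threshold of 6 bits, converts counts to log-odds scores and scans both strands. All function names are illustrative.

```python
import math
from typing import Dict, List, Tuple

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def log_odds_matrix(counts: List[Dict[str, int]], background: float = 0.25,
                    pseudocount: float = 0.5) -> List[Dict[str, float]]:
    """Convert per-position base counts into log2(p(base) / background) scores."""
    pwm = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        pwm.append({b: math.log2((column[b] + pseudocount) / total / background)
                    for b in BASES})
    return pwm

def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]

def scan(seq: str, pwm: List[Dict[str, float]], threshold: float) -> List[Tuple[int, str, float]]:
    """Return (forward-strand start, strand, score) for every window scoring >= threshold."""
    width = len(pwm)
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for i in range(len(s) - width + 1):
            score = sum(pwm[j][s[i + j]] for j in range(width))
            if score >= threshold:
                # Map reverse-strand windows back to forward-strand coordinates.
                start = i if strand == "+" else len(seq) - i - width
                hits.append((start, strand, round(score, 2)))
    return hits

# Toy counts for a hypothetical motif "GAAA" (NOT a real JASPAR matrix).
toy_counts = [
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 10, "C": 0, "G": 0, "T": 0},
    {"A": 10, "C": 0, "G": 0, "T": 0},
    {"A": 10, "C": 0, "G": 0, "T": 0},
]
pwm = log_odds_matrix(toy_counts)
print(scan("CCGAAACC", pwm, threshold=6.0))
```

Scanning both strands matters because a footprint sequence is read from one strand, while a factor can bind its motif in either orientation.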

Detecting chromatin interactions. In three-dimensional space, distal genomic regions on the same or different chromosomes interact, and this can be mediated by one or more DNA-binding proteins.

a| Chromatin conformation capture experiments use a ligation step to join distant fragments that are interacting in three-dimensional chromatin space, thus providing information on possible targets for DNA-bound proteins.

b| Chromatin interaction analysis with paired-end tag sequencing (ChIA-PET) similarly detects chromatin interactions using a ligation step to pair non-adjacent interacting regions. However, ChIA-PET uses a chromatin immunoprecipitation (ChIP) step to more specifically identify interactions with a particular bound protein, such as RNA polymerase II. It should be noted that the DNA that is actually sequenced as part of the paired-end sequencing does not necessarily correspond to the precise region of interaction but is dictated by the presence of restriction enzyme targets.
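Once the two tags of each ligated pair have been mapped, interactions can be counted by asking which pairs link two different anchor regions (for example, ChIP peaks) while discarding pairs whose tags lie so close together that they probably come from a single self-ligated fragment. A minimal sketch, assuming a hypothetical 8 kb self-ligation cutoff and invented anchor and tag coordinates; `anchor_of` and `count_interactions` are illustrative names, not a published tool's API.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

Anchor = Tuple[str, int, int]   # (chrom, start, end), half-open

def anchor_of(chrom: str, pos: int, anchors: Dict[str, Anchor]) -> Optional[str]:
    """Name of the anchor containing this position, if any."""
    for name, (a_chrom, start, end) in anchors.items():
        if chrom == a_chrom and start <= pos < end:
            return name
    return None

def count_interactions(pets: List[Tuple[str, int, str, int]], anchors: Dict[str, Anchor],
                       self_ligation_max: int = 8000) -> Counter:
    """Count PETs linking two different anchors, skipping likely self-ligation products."""
    links: Counter = Counter()
    for c1, p1, c2, p2 in pets:
        if c1 == c2 and abs(p2 - p1) <= self_ligation_max:
            continue   # tags close together: probably one fragment ligated to itself
        a1, a2 = anchor_of(c1, p1, anchors), anchor_of(c2, p2, anchors)
        if a1 is not None and a2 is not None and a1 != a2:
            links[tuple(sorted((a1, a2)))] += 1
    return links

# Hypothetical anchors and mapped tag pairs (coordinates invented for illustration).
anchors = {"promoter": ("chr21", 1_000, 2_000), "enhancer": ("chr21", 20_000, 21_000)}
pets = [
    ("chr21", 1_500, "chr21", 20_500),   # promoter-enhancer
    ("chr21", 20_100, "chr21", 1_200),   # same pair, tags in the other order
    ("chr21", 1_500, "chr21", 1_900),    # short span: treated as self-ligation
    ("chr21", 1_500, "chr1", 500),       # one end outside any anchor
]
print(count_interactions(pets, anchors))
```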

Identification of protein–DNA interactions using the ChIP-chip approach.

Transcription factors are cross-linked in vivo to their binding sites, the chromatin is sonicated, and DNA fragments that are covalently bound to a transcription factor are enriched and purified by immunoprecipitation. The isolated DNA is subsequently labelled with fluorescent tags and hybridized to a DNA microarray, allowing the identification of genomic regulatory DNA elements. Control experiments are carried out to detect non-specific background.

Methods to analyze transcription factor function.

(A) The modular structure of TFs. Domains which function in activation, repression, dimerization or protein–protein interaction, and DNA binding are color-coded in red, green, yellow, or blue, respectively.

(B) Overview of methods which can be used to elucidate TF functions.

For details see text. Y2H, yeast two-hybrid; Y1H, yeast one-hybrid; B1H, bacterial one-hybrid; P2H, protoplast two-hybrid; BiFC, bimolecular fluorescence complementation; FRET, fluorescence resonance energy transfer; Co-IP, co-immunoprecipitation; PTA, protoplast transactivation; EMSA, electrophoretic mobility shift assay; SELEX, systematic evolution of ligands by exponential enrichment; DamID, DNA adenine methylation identification; ChIP, chromatin immunoprecipitation; ChIP-chip, ChIP combined with tiling array technology; ChIP-seq, ChIP combined with sequencing of immunoprecipitated DNA fragments; CRES-T, chimeric repressor gene silencing technology; RNAi, RNA interference; TSS, transcription start site.

Next-generation sequencing and its applications

High-throughput sequencing technologies. The read length, the number of reads and the total amount of sequence generated in a typical run are indicated for each of the new-generation high-throughput sequencing technologies.

a| 454 GS FLX pyrosequencing. Oligonucleotide adaptors are ligated to fragmented DNA and immobilized to the surface of microscopic beads before PCR amplification in an oil–droplet emulsion. Beads are isolated in picolitre wells and incubated with dNTPs, DNA polymerase and beads bearing enzymes for the chemiluminescent reaction. Incorporation of a nucleotide into the complementary strand releases pyrophosphate, which is used to produce ATP. This, in turn, provides the energy for the generation of light. The light emitted is recorded as an image for analysis.

b| SOLiD sample preparation is similar to that of 454 pyrosequencing. After amplification, the beads are immobilized onto a custom substrate. A primer that is complementary to the adaptor sequence (green), random oligonucleotides with known 3' dinucleotides and a corresponding fluorophore are hybridized sequentially along the sequence and image data collected. After five repeats, the complementary strand is melted away and a new primer is added to the adaptor sequence, ending at a position one nucleotide upstream of the previous primer. Second-strand synthesis is repeated, allowing two-colour encoding and double reading of each of the target nucleotides. Repeats of these cycles ensure that nucleotides in the gap between known dinucleotides are read. Knowledge of the first base in the adaptor reveals the dinucleotide using the colour-space scheme. In other words, knowing that the last adaptor nucleotide is T and the colour is red means that the first base to be sequenced must be A. Knowing that the first base is A and the colour is green means that the next base must be C, and so on.

c| For Solexa GA sequencing, adaptors are ligated onto DNA and used to anchor the fragments to a prepared substrate. Fold-back PCR results in isolated spots of DNA of a large enough quantity that the amassed fluorophore can be detected. Terminator nucleotides and DNA polymerase are then used to create complementary-strand DNA. Images are collected at the end of each cycle before the terminator is removed.

d| Heliscope sequencing immobilizes unamplified DNA with ligated adaptors to a substrate. Each species of dNTP with a bright fluorophore attached is used sequentially to create second-strand DNA; a 'virtual terminator' prevents the inclusion of more than one nucleotide per strand and cycle, and background signal is reduced by removal of 'used' fluorophore at the start of each cycle.

e| The new sequencing method developed by Pacific Biosciences occurs in zeptolitre wells that contain an immobilized DNA polymerase. DNA and dNTPs are added for synthesis. Fluorophores are cleaved from the complementary strand as it grows and diffuse away, allowing single nucleotides to be read. Continuous detection of fluorescence in the detection volume and high dNTP concentration allow extremely fast and long reading.
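The colour-space rule in panel b can be written as a small decoder. The sketch assumes the commonly used 2-bit base codes A=0, C=1, G=2, T=3, with the colour of a dinucleotide equal to the XOR of its two codes and colours 0 to 3 named blue, green, yellow and red. Under that assumption the caption's example works out: T (3) followed by red (3) gives A (0), and A followed by green (1) gives C.

```python
from typing import List

# Assumed 2-bit codes; with them, the colour of a dinucleotide is the XOR of its two codes.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"
COLOUR_NAME = {0: "blue", 1: "green", 2: "yellow", 3: "red"}

def encode(seq: str) -> List[int]:
    """Colour call for each overlapping dinucleotide of seq."""
    return [CODE[a] ^ CODE[b] for a, b in zip(seq, seq[1:])]

def decode(first_base: str, colours: List[int]) -> str:
    """Rebuild bases from a known first (adaptor) base: next = previous XOR colour."""
    bases = [first_base]
    for c in colours:
        bases.append(BASES[CODE[bases[-1]] ^ c])
    return "".join(bases)

# The caption's example: last adaptor base T, then red (3), then green (1).
read = decode("T", [3, 1])
print(read, "-> sequenced bases:", read[1:])
print([COLOUR_NAME[c] for c in encode("TAC")])
```

Because each base is derived from the previous one, a single wrong colour in naive decoding changes every downstream base, which is why colour-space reads are usually aligned in colour space rather than decoded first.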

Sequencing protocols are the molecular steps by which specific information in these cells is captured and transformed into a population of adaptor-flanked DNA fragments for sequencing. Future applications of sequencing may arise through new combinations of steps, the introduction of new steps or entirely new approaches to sequencing.

High-throughput sequencing platforms. The schematic shows the main high-throughput sequencing platforms available to microbiologists today, and the associated sample preparation and template amplification procedures.

Comparison of next-generation sequencing platforms

Sequencing technologies and their uses. Various NGS methods can precisely map and quantify chromatin features, DNA modifications and several specific steps in the cascade of information from transcription to translation. These technologies can be applied in a variety of medically relevant settings, including uncovering regulatory mechanisms and expression profiles that distinguish normal and cancer cells, and identifying disease biomarkers, particularly regulatory variants that fall outside of protein-coding regions. Together, these methods can be used for integrated personal omics profiling to map all regulatory and functional elements in an individual. Using this basal profile, dynamics of the various components can be studied in the context of disease, infection, treatment options, and so on. Such studies will be the cornerstone of personalized and predictive medicine.

A High Diversity of Next-Generation or Deep Sequencing Approaches Is Currently Available for Profiling Genomes, Epigenomes, Methylomes, and Transcriptomes. A plethora of deep sequencing approaches are now available, ranging from approaches to map the primary sequence of DNA (whole-genome-seq and exome-seq), map DNA methylation marks (MeDIP-seq, 5-hmC-seq, and many others), profile chromatin structure (MNase-seq, DNase-seq, and FAIRE-seq), profile all the different stages of the transcriptome (GRO-seq, RNA-seq, and ribo-seq), profile transcription factors, cofactors, and histone marks (ChIP-seq), profile RNA interactions with the genome or the transcriptome (ChIRP-seq, CLIP-seq and variants), and finally profile the structure of the genome in three-dimensional space (ChIA-PET, Hi-C, and several others). All these approaches are now available to the neurobiology community and are primed to revolutionize the field.

Next-generation DNA sequencing of paired-end tags (PET) for transcriptome and genome analyses. PET applications to address genome biology questions. Cells have many different mechanisms for processing, modifying, controlling, and transducing information encoded in the genome. The PET technology can be applied to investigate many questions regarding nuclear processes, such as transcriptomes by RNA-PET, transcription and epigenetic regulation by ChIP-PET and ChIA-PET, as well as genome structural variation by DNA-PET. Examples of PET data from GIS-PET (an early version of RNA-PET), ChIP-PET, and ChIA-PET experiments of human breast cancer MCF-7 cells with estrogen induction treatment at the TFF1 locus (chr21:42,653,000-42,673,000) are shown: the high level of TFF1 gene expression and the low level of TMPRSS3 gene expression; the ERα binding at the TFF1 promoter and enhancer sites; and the long-range chromatin interactions between the two ERα binding sites. An example of DNA-PET data at the TNFRSF14 locus in the genome of MCF-7 cells shows an inversion event detected by two clusters of discordantly mapped DNA-PETs.
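Structural variants such as the inversion above are detected from pairs whose mapped orientation or span disagrees with the library's expectation. A minimal sketch, assuming an Illumina-style forward-reverse (FR) library with a hypothetical insert size of 400 ± 50 bp (DNA-PET libraries use longer fragments, so real parameters would differ), classifies single pairs. At a simple inversion one expects two clusters of same-orientation pairs, one ++ and one --, which matches the "two clusters" in the caption.

```python
def classify_pair(chrom1: str, pos1: int, strand1: str,
                  chrom2: str, pos2: int, strand2: str,
                  insert_mean: int = 400, insert_sd: int = 50, n_sd: int = 3) -> str:
    """Classify one read pair, assuming a forward-reverse (FR) library convention."""
    if chrom1 != chrom2:
        return "inter-chromosomal (translocation candidate)"
    if strand1 == strand2:
        return "same orientation (inversion candidate)"
    # Put the leftmost read first; a concordant FR pair is then (+, -).
    (left_pos, left_strand), (right_pos, _) = sorted([(pos1, strand1), (pos2, strand2)])
    if left_strand == "-":
        return "outward-facing (tandem duplication candidate)"
    span = right_pos - left_pos
    if span > insert_mean + n_sd * insert_sd:
        return "span too long (deletion candidate)"
    if span < insert_mean - n_sd * insert_sd:
        return "span too short (insertion candidate)"
    return "concordant"

# Hypothetical mapped pairs: (chrom, position, strand) for each read.
examples = [
    ("chr1", 1_000, "+", "chr1", 1_400, "-"),
    ("chr1", 1_000, "+", "chr1", 5_000, "-"),
    ("chr1", 1_000, "+", "chr1", 3_000, "+"),
    ("chr1", 1_000, "+", "chr2", 10, "-"),
]
for pair in examples:
    print(pair, "->", classify_pair(*pair))
```

A single discordant pair is weak evidence; callers look for clusters of pairs with consistent signatures, as the caption describes.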

High-throughput sequencing: HTS or HT-Seq

Overview of NGS-based analysis strategies.

Primary analysis: This part describes analysis steps that are based directly on the reads and are physically derived from sequence comparisons. These include CNV (copy-number variation) and chromosomal InDels, i.e. insertions or deletions (including translocations). They also include SNP annotation, which concerns SNPs already known and described (e.g. in dbSNP), and de novo SNP detection, which results from alleles detected via multiple sequencing coverage of the SNP position. Alternative splice sites can be detected via mapping to a splice junction library or by direct genomic alignment of exon-spanning reads (when reads or clusters are longer than 50 nucleotides). New transcripts/loci are derived by direct mapping of novel exons and splice overlaps.

Downstream analysis: This part differs for the three major application areas. Genomic DNA-seq is genomic resequencing, employed in genome-wide association studies (GWAS), the definition of haplotypes, and tumor typing, usually via tumor-specific chromosomal InDels. ChIP-Seq determines genome-wide patterns of modified chromatin, such as histone methylation or acetylation status. It also determines binding regions for DNA-binding proteins, usually DNA-dependent RNA polymerases or transcription factors (TFs), leading to the definition of patterns such as TF binding sites (TFBSs). RNA-Seq determines the genome-wide expression of known as well as unknown transcripts, which can be identified by mapping the RNA-Seq sequence tags to the genome and the transcriptome; this also identifies most splice variants. All three strategies converge into the biology-oriented downstream analysis, which involves identification of pathways, cis-regulatory modules and regulatory networks. This analysis also integrates prior knowledge (depicted in the flanking database areas) in addition to the experimental data. Finally, meta-analysis allows merging of several lines of evidence [NGS results and other results (e.g. proteomics, metabolomics)] into a more complete description of the underlying biology. It does so via network reconstruction, multiple correlations of various lines of evidence (such as histone modifications, Pol II binding and transcription rates) and the cross-examination of multiple experiments, such as transcriptional profiles from several patients.
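De novo SNP detection from multiple sequencing coverage, as described under primary analysis, amounts to counting the alleles observed in all reads covering a position. The following minimal Python sketch assumes a simplified pileup (one base character per read, not the samtools mpileup format). The thresholds for depth, alternative-allele read count and allele fraction are hypothetical, as is the name `call_snp`. Real variant callers use probabilistic models that also account for base and mapping quality.

```python
from collections import Counter
from typing import Optional, Tuple


def call_snp(
    ref_base: str,
    pileup: str,
    min_depth: int = 8,
    min_alt_reads: int = 3,
    min_alt_frac: float = 0.2,
) -> Optional[Tuple[str, float, str]]:
    """Call a de novo SNP at one position from the bases of all overlapping reads.

    `pileup` is a simplified string of read bases at this position (one
    character per read). Returns (alt_base, alt_fraction, genotype) or None.
    """
    bases = [b for b in pileup.upper() if b in "ACGT"]
    depth = len(bases)
    if depth < min_depth:
        return None  # too little coverage to separate a variant from errors
    alt_counts = Counter(b for b in bases if b != ref_base.upper())
    if not alt_counts:
        return None  # every read agrees with the reference
    alt, n_alt = alt_counts.most_common(1)[0]
    frac = n_alt / depth
    if n_alt < min_alt_reads or frac < min_alt_frac:
        return None  # more likely a sequencing error than a real allele
    genotype = "homozygous" if frac >= 0.8 else "heterozygous"
    return alt, round(frac, 3), genotype


print(call_snp("A", "AAAAAGGGGG"))  # ('G', 0.5, 'heterozygous')
print(call_snp("C", "TTTTTTTTTC"))  # ('T', 0.9, 'homozygous')
print(call_snp("A", "AAAAAAAAAG"))  # None
```

The depth and minimum-read filters express why multiple coverage matters. A single mismatching read cannot be told apart from a sequencing error, whereas several independent reads carrying the same allele can.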

How we are getting there: a subway map of sequencing technology.

Despite their disparate goals, the great variety of sequencing experiments results from distinct combinations of a relatively small set of core techniques, which are represented as open circles or ‘stations’. Like subway lines, individual sequencing experiments move from station to station until they ultimately arrive at a common terminal: DNA sequencing. For example, the initial demonstration of Hi-C was a comparative experiment that progressed through cell culture, cross-linking, proximity ligation, mechanical shearing, affinity purification, adaptor ligation and PCR amplification, before finally arriving at sequencing. Other examples shown correspond to sequencing applications in Table 1. For visual clarity, not all stations and routes are shown. New routes are being added regularly. TrAP, translating ribosome affinity purification.

Strategies for next-generation sequencing in cancer. This schematic shows potential strategies for applying next-generation sequencing (NGS) in clinical oncology testing and research. Whole-genome sequencing evaluates the entire genome, including both gene-coding and non-coding regions. Exome sequencing uses baits that hybridize to and capture the corresponding regions of the genome, focusing on its coding regions. Exome sequencing can cover the whole exome (about 20,000 genes), comprising just over 1% of the genome; alternatively, it can focus on a panel of genes (hundreds of genes or more). Amplicon-based sequencing uses PCR amplification to isolate a smaller region for sequencing. Transcriptome sequencing (RNA-seq) evaluates the expressed RNA and can be used to measure gene expression and splice variants and to nominate candidate gene fusions. As in exome sequencing, complementary baits can be used to hybridize to and capture portions of the transcriptome, focusing on selected genes of interest.

NGS transforms today’s biology

● Genome sequencing
● Comparative genomics
● Mutation discovery
● Gene expression
● DNA barcoding
● Metagenomics
● Epigenomics

• 2nd- and 3rd-generation sequencing instruments are revolutionizing biological research.
• The earliest impacts have been on cancer genomics and metagenomics.
• The extreme need for bioinformatics-based analytical approaches to interpret these large data sets has revitalized the field and introduced statistical and mathematical rigor.
• Integrating data sets from DNA, RNA, methylation, proteomics, etc. is the next challenge, but it provides comprehensive analytical power to inform biology.
• With newer instruments, clinical applications can potentially be implemented, provided appropriate interpretive algorithms are available.

1D and 3D genome analysis

Dimensionality of the genome. Our understanding of the human genome has expanded with advances in sequencing technologies: from (A) 1D sequencing of the human genome, to (B) 2D mapping of structural variants (SVs) using methods such as paired-end sequencing, (C) 3D genome-wide chromosome conformation capture using ChIA-PET and Hi-C, and (D) a fourth dimension, time.

1D genome analysis. Mapping various genomic and chromatin features along chromosomes yields 1D data. The top part of the figure shows a screenshot of the UCSC Genome Browser (http://genome.ucsc.edu/), presenting genome tracks for the ENCODE region Enm005. All data were generated by the ENCODE pilot project and are publicly available through the UCSC Genome Browser. Methods used to collect data are indicated on the left. Tracks indicate the presence of genes (GENCODE genes); the presence of DNase-I-hypersensitive sites; the level of RNA (expression profiling data from HeLa cells); the level of acetylated H3 (H3Ac; generated by ChIP-chip); and the levels of RNA polymerase II (Pol2) binding, acetylated H4 (H4Ac) and H3 trimethylated at lysine 27 (H3K27me3) in HeLa cells, as determined by ChIP-chip.

The bottom part of the figure shows an analysis of chromosomal domains using a combination of wavelet analysis and Hidden Markov Model segmentation of 1D datasets, including replication timing, DNase I hypersensitivity and histone modifications.
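The HMM segmentation step in the bottom panel can be illustrated with a two-state Viterbi decoder. This sketch assumes a hypothetical model in which each genomic bin has already been reduced to a 'high' or 'low' signal, with hidden states for 'active' and 'repressed' domains. The probabilities, the function names and the toy signal are invented for illustration, and the wavelet-analysis step of the original study is not reproduced.

```python
import math
from typing import Dict, List, Sequence, Tuple


def viterbi(
    obs: Sequence[str],
    states: Sequence[str],
    start: Dict[str, float],
    trans: Dict[str, Dict[str, float]],
    emit: Dict[str, Dict[str, float]],
) -> List[str]:
    """Most probable hidden-state path for a sequence of discrete observations."""
    # score[s]: best log-probability of any path ending in state s at the current bin.
    score = {s: math.log(start[s]) + math.log(emit[s][obs[0]]) for s in states}
    back: List[Dict[str, str]] = []
    for o in obs[1:]:
        prev = score
        score, ptr = {}, {}
        for s in states:
            best = max(states, key=lambda p: prev[p] + math.log(trans[p][s]))
            score[s] = prev[best] + math.log(trans[best][s]) + math.log(emit[s][o])
            ptr[s] = best
        back.append(ptr)
    # Trace back from the best-scoring final state.
    state = max(states, key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]


def segments(path: List[str]) -> List[Tuple[str, int, int]]:
    """Collapse a per-bin state path into (state, first_bin, last_bin) domains."""
    runs = []
    begin = 0
    for i in range(1, len(path) + 1):
        if i == len(path) or path[i] != path[begin]:
            runs.append((path[begin], begin, i - 1))
            begin = i
    return runs


# Hypothetical two-state model: domains are "sticky" (0.95 self-transition).
STATES = ("active", "repressed")
START = {"active": 0.5, "repressed": 0.5}
TRANS = {
    "active": {"active": 0.95, "repressed": 0.05},
    "repressed": {"active": 0.05, "repressed": 0.95},
}
EMIT = {
    "active": {"high": 0.8, "low": 0.2},
    "repressed": {"high": 0.2, "low": 0.8},
}

signal = ["high", "high", "low", "high", "high", "low", "low", "high", "low", "low"]
path = viterbi(signal, STATES, START, TRANS, EMIT)
print(segments(path))  # [('active', 0, 4), ('repressed', 5, 9)]
```

The high self-transition probability makes short outliers, such as the isolated 'low' bin at position 2, too costly to explain by a domain switch. The decoder therefore returns two contiguous domains rather than following every fluctuation in the signal.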

3D genome analysis. The spatial organization of genomes can be studied using single-cell or population-based methods, and at different resolutions or length scales (A-D).

(A) A hypothetical pair of metaphase chromosomes, with 1D compartmentalization indicated: constitutive heterochromatin domains include the centromere (cen), pericentromeric heterochromatin (subcen het) and telomeres (tel). Chromosome arms further consist of alternating active and repressed domains (indicated by different colors). Numbers indicate the chromosomal regions analysed by 3D methods in panels B, C and D.

(B) Spatial organization of the chromosomes shown in A in the interphase nucleus. Chromosomal regions that are located far apart on the same chromosome (2 and 3) or on different chromosomes (1 and 2, 1 and 3) can colocalize in 3D to form spatial compartments.

(C) A higher-resolution (Mb-scale) analysis of cis and trans associations of chromosomal regions 1 to 3 shown in B. At this resolution, associations of groups of genes can be detected around subnuclear structures such as transcription factories (green circles) and splicing bodies [characterized by the presence of the splicing factor SC35 (red)]. For example, a trans interaction between regions 1 and 2 can occur through colocalization to the same transcription factory, or to two different transcription factories that are both associated with one SC35 granule.

(D) High-resolution (kb-scale) analysis of 3D folding and long-range associations that can be studied using 3C-based methods. At this scale, specific looping interactions can be detected between genes and regulatory elements. The scheme shows an example of a 3C analysis in which the interaction probability of a single defined genomic element is mapped throughout the larger region 3 (right). Peaks in this 3C map indicate long-range interactions that suggest a looped conformation (indicated on the left).
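Reading peaks from a 3C map, as in panel D, requires comparing the observed signal against the background expected from linear distance alone, because contact frequency generally decreases with distance from the viewpoint. The following Python sketch assumes a power-law background fitted by least squares on a log-log scale. It calls a peak where the observed/expected ratio is a local maximum above a hypothetical threshold of 2. The function name `call_3c_peaks` and the toy profile are illustrative only.

```python
import math
from typing import List, Tuple


def call_3c_peaks(
    distances: List[float], counts: List[float], min_ratio: float = 2.0
) -> List[Tuple[float, float]]:
    """Flag fragments whose 3C signal clearly exceeds the distance-decay trend.

    Fits log(count) = a + b * log(distance) by least squares (an assumed
    power-law background), then reports local maxima of observed/expected
    that reach `min_ratio`. Distances and counts must be positive.
    """
    xs = [math.log(d) for d in distances]
    ys = [math.log(c) for c in counts]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    # Observed / expected for every fragment.
    ratios = [c / math.exp(intercept + slope * x) for c, x in zip(counts, xs)]
    peaks = []
    for i, r in enumerate(ratios):
        left = ratios[i - 1] if i > 0 else 0.0
        right = ratios[i + 1] if i + 1 < n else 0.0
        if r >= min_ratio and r >= left and r >= right:
            peaks.append((distances[i], round(r, 2)))
    return peaks


# Hypothetical viewpoint profile: counts fall as 1/distance (kb), except a
# fivefold excess at 60 kb that mimics a looping interaction.
distances = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
counts = [1000 / d for d in distances]
counts[5] *= 5
print(call_3c_peaks(distances, counts))  # a single peak, at 60 kb
```

Without the distance normalization, the fragments closest to the viewpoint would always dominate, so a raw-count maximum would point at the viewpoint's immediate neighbourhood rather than at the loop.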

Schematic representation of 3C-based methods. There are many methods derived from the original 3C design; a few popular ones are presented here. In brief, cells are cross-linked, and chromatin is digested by restriction enzymes or sonicated. Cross-linking preserves the structures of DNA-containing protein complexes. These complexes are then diluted to a very low concentration, which favors ligation between DNA ends within the same complex, and ligation reactions are performed. Different amplification strategies are used to measure the relative cross-linking efficiency between loci.

3C is used to detect one specific interaction.

4C detects all possible interacting regions of one given locus.

5C and Hi-C provide “many-to-many” interaction efficiencies in a large genomic region or the whole genome.

ChIA-PET includes immunoprecipitation to specifically examine the long-range interactions associated with a specific protein.

3C (chromosome conformation capture) and its high-throughput derivatives are molecular biology techniques used to analyze the organization of chromosomes in a cell's natural state. Studying the structural properties and spatial organization of chromosomes is important for understanding and evaluating the regulation of gene expression, DNA replication and repair, and recombination.
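The difference between the "one-to-one", "one-to-all" and "many-to-many" designs can be sketched with a binned contact matrix. In this minimal Python example, ligation junctions are assumed to be already mapped to (position_a, position_b) pairs within a single region. A Hi-C-like experiment corresponds to the whole matrix, a 4C-like experiment to one row (the viewpoint), and a 3C-like experiment to a single cell. The bin size, the function names and the junction coordinates are hypothetical.

```python
from typing import Iterable, List, Tuple

Matrix = List[List[int]]


def contact_matrix(
    pairs: Iterable[Tuple[int, int]], region_len: int, bin_size: int
) -> Matrix:
    """Bin ligation junctions (pos_a, pos_b) into a symmetric contact matrix."""
    n_bins = (region_len + bin_size - 1) // bin_size
    m = [[0] * n_bins for _ in range(n_bins)]
    for a, b in pairs:
        i, j = a // bin_size, b // bin_size
        m[i][j] += 1
        if i != j:
            m[j][i] += 1  # keep the matrix symmetric
    return m


def three_c(m: Matrix, i: int, j: int) -> int:
    """3C-like 'one-to-one' view: contacts between two chosen loci."""
    return m[i][j]


def four_c(m: Matrix, viewpoint: int) -> List[int]:
    """4C-like 'one-to-all' view: every contact of a single viewpoint locus."""
    return list(m[viewpoint])


# Hi-C-like 'many-to-many' view: the whole matrix.
junctions = [(500, 4200), (700, 4100), (1500, 1800), (4500, 900), (2500, 2600)]
hic = contact_matrix(junctions, region_len=5000, bin_size=1000)
print(three_c(hic, 0, 4))  # 3
print(four_c(hic, 0))      # [0, 0, 0, 0, 3]
```

In a real experiment, each view is produced by a different amplification or enrichment strategy rather than by slicing one matrix. The sketch only shows which part of the interaction space each method measures.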

Three-dimensional interpretation (left) of regulatory and transcriptional complexity in a one-dimensional genome representation (right).

(A) The genome forms large complex clusters and introspective folded clusters with specialized transcription compartments. Each of these clusters correlates with a collection of transcripts and “background” ChIP-seq enrichment.

(B) Within each cluster, the genome is folded to associate with subnuclear structures containing transcription factors and machinery, splicing machinery, and other accessory proteins. These associations coregulate genes to generate interleaved complex transcriptional networks of coding (blue) and noncoding (green) transcripts. Proximal cross-linking with ChIP-seq results in a complex landscape of enrichment across loci that reflects the folded genome structure.

(C) Within each gene, local dynamic chromatin folding determines the association of alternative promoters and local noncoding RNAs with a shared regulatory architecture, thereby mediating coregulated gene expression.

Hype cycles in genome analysis (Biosciences)

Hype cycles

The Hype Cycle is a branded graphical tool developed and used by the IT research and advisory firm Gartner to represent the maturity, adoption and social application of specific technologies. Hype in new media (in the more general media sense of the term "hype") plays a large part in the adoption of new media forms by society. Terry Flew states that hype, generally the enthusiastic and strong feeling around new forms of media and technology in which people expect everything to change for the better, is a common characteristic of new media technologies and their popularization, along with the development of the Internet. But shortly after the period of 'inflated expectations', as per the diagram above, new media technologies quickly fall into a period of disenchantment, which marks the end of the primary, and strongest, phase of hype.
