Upload
others
View
15
Download
0
Embed Size (px)
Citation preview
Darwin-WGA: A Co-processor Provides Increased Sensitivity in Whole Genome Alignments with High
Speedup(to appear in HPCA 2019)
Yatish Turakhia*Sneha Goenka*
Prof. Gill BejeranoProf. William J. Dally
Stanford University
*equal contribution
Darwin-WGA: A Co-processor for Whole Genome Alignments
Sections
1. Introduction and Applications of Whole Genome Alignment
2. Whole Genome Alignment: Algorithm and Case for Gapped Filtering
3. Darwin-WGA: Algorithm and Hardware-accelerator
4. Experimental Methodology and Results
2
Darwin-WGA: A Co-processor for Whole Genome Alignments
Major Breakthroughs in Genomics
3
Genome Sequencing
~$1K per human genome
Darwin-WGA: A Co-processor for Whole Genome Alignments
Major Breakthroughs in Genomics
4
Genome Sequencing Genome Editing
~$1K per human genome ~$15K per CRISPR knockout experiment per gene per human stem cell line
Darwin-WGA: A Co-processor for Whole Genome Alignments
Number of Species in Genome Databases Are Growing Rapidly
5
Darwin-WGA: A Co-processor for Whole Genome Alignments
Number of Species in Genome Databases Are Growing Rapidly
6
Lots more sequencing underway!
Darwin-WGA: A Co-processor for Whole Genome Alignments
Number of Species in Genome Databases Are Growing Rapidly
7
Comparing genomic sequences imperative for extracting knowledge (identifying functional elements and understanding evolution at the molecular level)
Lots more sequencing underway!
Darwin-WGA: A Co-processor for Whole Genome Alignments
N species => O(N2) pairwise comparisons
• In ~5 years, with ~10K vertebrate genomes available, up to 100M pairwise comparisons!
8
Darwin-WGA: A Co-processor for Whole Genome Alignments
N species => O(N2) pairwise comparisons
• In ~5 years, with ~10K vertebrate genomes available, up to 100M pairwise comparisons!
• Single human-mouse pairwise comparison with a state-of-the-art tool (LASTZ) requires 36 hours on AWS c4.x8large instance (costing around $57)
• Cost of all pairwise comparisons >> Sequencing + Assembly costs
9
~$57 per mammalian pair!
Darwin-WGA: A Co-processor for Whole Genome Alignments
Evolution = Mutation + Selection
10
Neutral Drift Positive SelectionNegative Selection
Time • Harmful mutations are ”weeded out” of the population (negative selection)
• Beneficial mutations spread throughout the population and remain “conserved” (positive selection)
• Mutations lacking a function undergo a “neutral drift” with time
• Higher than expected sequence conservationacross species implies all sorts of biological function*
*Note: lack of conservation does not imply lack of function
Population
Darwin-WGA: A Co-processor for Whole Genome Alignments
Whole Genome Alignments help predict functional elements
11
•Function elements include genes (protein recipes) and regulatory sequences (“control logic” for expressing genes, including enhancers, promoters, repressors etc.)
• In comparative genomics, we assume excess conservation implies function
Regions with sequence conservation
Darwin-WGA: A Co-processor for Whole Genome Alignments
Whole Genome Alignments help predict functional elements
12
•Function elements include genes (protein recipes) and regulatory sequences (“control logic” for expressing genes, including enhancers, promoters, repressors etc.)
• In comparative genomics, we assume excess conservation implies functionApolipoprotein A1 geneEnhancer
Regions with sequence conservation
Functional screen (eg.CRISPR knockout or mRNA-seq)
Darwin-WGA: A Co-processor for Whole Genome Alignments
Sections
1. Introduction and Applications of Whole Genome Alignment
2. Whole Genome Alignment: Algorithm and Case for Gapped Filtering
3. Darwin-WGA: Algorithm and Hardware-accelerator
4. Experimental Methodology and Results
13
Darwin-WGA: A Co-processor for Whole Genome Alignments
Smith-Waterman Algorithm
14
H i, j = max)H i − 1, j − 1 +W(r0, q2)
H i − 1, j + gapH i, j − 1 + gap
A C G TA 2 -1 -1 -1C -1 2 -1 -1G -1 -1 2 -1T -1 -1 -1 2
W =
gap = 1
Smith-waterman equations: Scoring Parameters:1 2
Darwin-WGA: A Co-processor for Whole Genome Alignments
Smith-Waterman Algorithm
15
* G C G A C T T T* 0 0 0 0 0 0 0 0 0G 0 2 1 2 1 0 0 0 0T 0 1 1 1 1 0 2 2 2C 0 0 3 2 1 3 2 1 1G 0 2 2 5 4 3 2 1 0T 0 1 1 4 4 3 5 4 3T 0 0 0 3 3 3 5 7 6T 0 0 0 2 2 2 5 7 9
Quer
y
Target
H i, j = max)H i − 1, j − 1 +W(r0, q2)
H i − 1, j + gapH i, j − 1 + gap
A C G TA 2 -1 -1 -1C -1 2 -1 -1G -1 -1 2 -1T -1 -1 -1 2
W =
gap = 1
Smith-waterman equations: Scoring Parameters:
Matrix Fill:
1 2
3
Darwin-WGA: A Co-processor for Whole Genome Alignments
Smith-Waterman Algorithm
16
* G C G A C T T T* 0 0 0 0 0 0 0 0 0G 0 2 1 2 1 0 0 0 0T 0 1 1 1 1 0 2 2 2C 0 0 3 2 1 3 2 1 1G 0 2 2 5 4 3 2 1 0T 0 1 1 4 4 3 5 4 3T 0 0 0 3 3 3 5 7 6T 0 0 0 2 2 2 5 7 9
Quer
y
Target
G - C G A C T T T| | | | | |G T C G - - T T T
Alignment
Score = 9
H i, j = max)H i − 1, j − 1 +W(r0, q2)
H i − 1, j + gapH i, j − 1 + gap
A C G TA 2 -1 -1 -1C -1 2 -1 -1G -1 -1 2 -1T -1 -1 -1 2
W =
gap = 1
Smith-waterman equations: Scoring Parameters:
Matrix Fill: Traceback:
1 2
3 4
Darwin-WGA: A Co-processor for Whole Genome Alignments
Smith-Waterman Algorithm: Intractable on Whole Genomes
•O(L1 . L2) for sequences of lengths L1and L2
•For human genome (L1 ~ 3 x 109 bp) and mouse genome (L2 ~ 3 x 109 bp), Smith-Waterman would require computing O(1018) cells
•Intractable on whole genomes!
•All software whole-genome aligners based on the heuristic seed-filter-extend paradigm
17
Some very long alignments (several Mbp) with large indels Many small alignments (30-1Kbp)
Darwin-WGA: A Co-processor for Whole Genome Alignments
Whole Genome Alignments Using LASTZ
18
Que
ry
Target
Seed hits
Seeding
Darwin-WGA: A Co-processor for Whole Genome Alignments
Whole Genome Alignments Using LASTZ
19
Que
ry
Target
Seed hits
Que
ry
Target
Seeding Filtering (Ungapped)
Anchors (score > h)
No gaps allowed in extension
Darwin-WGA: A Co-processor for Whole Genome Alignments
Whole Genome Alignments Using LASTZ
20
Que
ry
Target
Seed hits
Que
ry
Target
Que
ry
Target
Seeding Filtering (Ungapped) Extension (Y-drop)
Anchors (score > h)
No gaps allowed in extension Adjacent anchors can be merged during extension
Darwin-WGA: A Co-processor for Whole Genome Alignments
Ungapped Filtering in LASTZ is Problematic for Sensitivity
21
Darwin-WGA: A Co-processor for Whole Genome Alignments
Ungapped Filtering in LASTZ is Problematic for Sensitivity
22
• Both papers reduced upgapped filtering threshold in LASTZ for higher sensitivity
• ~100X slowdown!
Darwin-WGA: A Co-processor for Whole Genome Alignments
Ungapped Filtering in LASTZ is Problematic for Sensitivity
23
• Both papers reduced upgapped filtering threshold in LASTZ for higher sensitivity
• ~100X slowdown!• Sensitivity increase higher for more distant
species• Many alignments in the “twilight zone”
Darwin-WGA: A Co-processor for Whole Genome Alignments
Frequent Indels: Case for Gapped Filtering
24
Darwin-WGA: A Co-processor for Whole Genome Alignments
Frequent Indels: Case for Gapped Filtering
25
Indel frequency increases for more distantly related species!
Ungappedthreshold in
LASTZ
Darwin-WGA: A Co-processor for Whole Genome Alignments
Sections
1. Introduction and Applications of Whole Genome Alignment
2. Whole Genome Alignment: Algorithm and Case for Gapped Filtering
3. Darwin-WGA: Algorithm and Hardware-accelerator
4. Experimental Methodology and Results
26
Darwin-WGA: A Co-processor for Whole Genome Alignments
Darwin-WGA Framework
27
•Darwin-WGA replaces the ungapped filtering used in LASTZ by gapped filtering using Banded Smith-Waterman (BSW) algorithm
•First WGA tool to incorporate dynamic programming in the filtering stage•Extension uses novel GACT-X algorithm that combines previously proposed GACT with X-drop
•Dynamic programming stages (filtering and extension) are accelerated in hardware
Darwin-WGA: A Co-processor for Whole Genome Alignments
Darwin-WGA: Algorithm Overview
28
Que
ry
Target
Chunk 1
Chunk 2
Chunk 3
Seed hit
Target
Que
ry!max
Outside tile overlapInside tile overlapOn tile overlap border
Anchor
T1
T2
T3
Que
ry
Target
Seeding Filtering (BSW) Extension (GACT-X)
Darwin-WGA: A Co-processor for Whole Genome Alignments
Darwin-WGA: Hardware-acceleration using Systolic Arrays
29
Darwin-WGA: A Co-processor for Whole Genome Alignments
Darwin-WGA: Hardware-acceleration using Systolic Arrays
30
Darwin-WGA: A Co-processor for Whole Genome Alignments
Darwin-WGA: Hardware-acceleration using Systolic Arrays
31
Darwin-WGA: A Co-processor for Whole Genome Alignments
Darwin-WGA: Hardware-acceleration using Systolic Arrays
32
Darwin-WGA: A Co-processor for Whole Genome Alignments
Darwin-WGA: Hardware-acceleration using Systolic Arrays
33
Darwin-WGA: A Co-processor for Whole Genome Alignments
Darwin-WGA: Hardware-acceleration using Systolic Arrays
34
GACT-X• Stripe boundaries
where wavefrontscore drops below (max-Y)
Banded-SW• Stripe boundaries
deterministic based on band size (B)
• No traceback
Darwin-WGA: A Co-processor for Whole Genome Alignments
Sections
1. Introduction and Applications of Whole Genome Alignment
2. Whole Genome Alignment: Algorithm and Case for Gapped Filtering
3. Darwin-WGA: Algorithm and Hardware-accelerator
4. Experimental Methodology and Results
35
Darwin-WGA: A Co-processor for Whole Genome Alignments
Experimental Setup
36
Configuration Area (mm2) Power (W)
BSW Logic 64 x (64PE array) 16.6 25.6
GACT-X Logic 12 x (64PE array) 4.2 6.72Traceback SRAM 12 x (64PE x 16KB/PE) 15.1 7.92
DRAM DDR4-2400R 4 x 32GB - 3.10
TOTAL 35.9 43.340
50100150200250
CPU FPGA ASIC
Pow
er (i
n W
)
CPU (baseline)• AWS c4.8xlarge instance• 36 vCPUs (18 physical cores)• LASTZ as software baseline,
Parasail to estimate iso-sensitive runtime, cores fully-utilized
• $1.59/hour
FPGA (Darwin-WGA)• AWS F1 f1.2xlarge instance• 1 Xilinx Virtex Ultrascale+ FPGA
(50 BSW and 2 GACT-X arrays with 32PEs)
• 8 vCPUs • $1.65/hour
ASIC (Darwin-WGA)• TSMC 40nm DC synthesis (not a chip prototype)
Darwin-WGA: A Co-processor for Whole Genome Alignments
Species and Genome Assembly
37
Target species Size (Mbp)
Query Species Size (Mbp)
Caenorhabditis elegans (ce11)
100 Caenorhabditis briggsae (cb4)
105
Drosophila melanogaster
(dm6)
137
Drosophila simulans(droSim1)
110
Drosophila yakuba(droYak2)
120
Drosophila pseudoobscura
(dp4)
127
• dm6-droSim1 molecular distance comparable to human-monkey• dm6-dp4 molecular distance comparable to human-chicken
Darwin-WGA: A Co-processor for Whole Genome Alignments
Darwin-WGA Sensitivity Improvement Versus LASTZ
38
Species pair Top-10 Alignment Chain Scores
Matching Base-pairs within Alignments
Number of Aligning Exons (protein-coding genes)
ce11-cb4 +5.73% 3.12x +2.70%dm6-droSim1 +0.03% 1.25x +0.20%dm6-droYak2 +0.05% 1.41x +0.09%
dm6-dp4 +1.86% 1.42x +0.41%
orthologous sequences (derived from “speciation”), mostly neutrally evolving
Darwin-WGA: A Co-processor for Whole Genome Alignments
Darwin-WGA Sensitivity Improvement Versus LASTZ
39
Species pair Top-10 Alignment Chain Scores
Matching Base-pairs within Alignments
Number of Aligning Exons (protein-coding genes)
ce11-cb4 +5.73% 3.12x +2.70%dm6-droSim1 +0.03% 1.25x +0.20%dm6-droYak2 +0.05% 1.41x +0.09%
dm6-dp4 +1.86% 1.42x +0.41%
orthologous sequences (derived from “speciation”), mostly neutrally evolving
paralagous sequences (derived from “duplication”), mostly neutrally evolving
Darwin-WGA: A Co-processor for Whole Genome Alignments
Darwin-WGA Sensitivity Improvement Versus LASTZ
40
Species pair Top-10 Alignment Chain Scores
Matching Base-pairs within Alignments
Number of Aligning Exons (protein-coding genes)
ce11-cb4 +5.73% 3.12x +2.70%dm6-droSim1 +0.03% 1.25x +0.20%dm6-droYak2 +0.05% 1.41x +0.09%
dm6-dp4 +1.86% 1.42x +0.41%
orthologous sequences (derived from “speciation”), mostly neutrally evolving
paralagous sequences (derived from “duplication”), mostly neutrally evolving
functionally relevant orthologous sequences, under some selective pressure (at least in the target species)
Darwin-WGA: A Co-processor for Whole Genome Alignments
Darwin-WGA Sensitivity Improvement Versus LASTZ
41
Species pair Top-10 Alignment Chain Scores
Matching Base-pairs within Alignments
Number of Aligning Exons (protein-coding genes)
ce11-cb4 +5.73% 3.12x +2.70%dm6-droSim1 +0.03% 1.25x +0.20%dm6-droYak2 +0.05% 1.41x +0.09%
dm6-dp4 +1.86% 1.42x +0.41%
orthologous sequences (derived from “speciation”), mostly neutrally evolving
paralagous sequences (derived from “duplication”), mostly neutrally evolving
functionally relevant orthologous sequences, under some selective pressure (at least in the target species)
False positive rate (2-mer shuffled genome): 0.0007%
Darwin-WGA: A Co-processor for Whole Genome Alignments
Example: Improved Darwin-WGA Sensitivity
42
Indels (shown by arrows) around each seed hit – dropped by ungapped filtering (LASTZ) but retained by gapped filtering (Darwin-WGA)
Darwin-WGA: A Co-processor for Whole Genome Alignments
Runtime and Cost Comparison
43
Species pair LASTZ runtime
(sec)
Iso-sensitive s/w runtime
(sec)
Darwin-WGA runtime (sec) Darwin-WGA ImprovementFPGA ASIC FPGA
(Perf/$)ASIC
(Perf/W)ce11-cb4 481 64,960 3,823 219 19.1x 1,478x
dm6-droSim1 643 142,627 5,936 461 23.2x 1,547xdm6-droYak2 654 144,454 6,001 469 23.2x 1,539x
dm6-dp4 557 125,700 4,987 404 24.3x 1,553x
200x slowdown22x speedup
306x speedup
Darwin-WGA: A Co-processor for Whole Genome Alignments
Summary
•WGA using LASTZ on a single pair of mammals costs ~$57 on AWS instances
•1K genomes => 1M WGA pairs possible
•Darwin-WGA replaces ungapped (gapless) filtering in LASTZ by Banded Smith-Waterman algorithm for higher sensitivity (up to 3x matching base-pairs, 5.7% more orthologs, 2.1% more exons)
•Additional sensitivity costs 200x slowdown in software
•FPGA: 24x performance/$ improvement
•ASIC: 1,500x performance/Watt improvement
44
Darwin-WGA: A Co-processor for Whole Genome Alignments
Thank You!
45