Darwin-WGA: A Co-processor Provides Increased Sensitivity in Whole Genome Alignments with High Speedup (to appear in HPCA 2019)

Yatish Turakhia*, Sneha Goenka*, Prof. Gill Bejerano, Prof. William J. Dally
Stanford University
*equal contribution


Darwin-WGA: A Co-processor for Whole Genome Alignments

Sections

1. Introduction and Applications of Whole Genome Alignment

2. Whole Genome Alignment: Algorithm and Case for Gapped Filtering

3. Darwin-WGA: Algorithm and Hardware-accelerator

4. Experimental Methodology and Results

Major Breakthroughs in Genomics

• Genome Sequencing: ~$1K per human genome
• Genome Editing: ~$15K per CRISPR knockout experiment per gene per human stem cell line

Number of Species in Genome Databases Is Growing Rapidly

• Lots more sequencing underway!
• Comparing genomic sequences is imperative for extracting knowledge (identifying functional elements and understanding evolution at the molecular level)

N species => O(N^2) pairwise comparisons

• In ~5 years, with ~10K vertebrate genomes available, up to 100M pairwise comparisons!
• A single human-mouse pairwise comparison with a state-of-the-art tool (LASTZ) requires 36 hours on an AWS c4.8xlarge instance, costing around $57 per mammalian pair
• Cost of all pairwise comparisons >> sequencing + assembly costs
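The arithmetic behind these figures can be checked with a short sketch. It assumes the $1.59/hour c4.8xlarge price quoted in the experimental setup later in the talk, and it reads "up to 100M" as the N^2 upper bound; the function and variable names are illustrative only.

```python
def pairwise_comparisons(n_species: int, upper_bound: bool = True) -> int:
    """Return N^2 (the O(N^2) upper bound) or the N*(N-1)/2 unordered pairs."""
    if upper_bound:
        return n_species * n_species
    return n_species * (n_species - 1) // 2


HOURS_PER_PAIR = 36     # LASTZ human-mouse runtime quoted on the slide
PRICE_PER_HOUR = 1.59   # c4.8xlarge price quoted in the experimental setup

cost_per_pair = HOURS_PER_PAIR * PRICE_PER_HOUR
upper_bound_pairs = pairwise_comparisons(10_000)
unordered_pairs = pairwise_comparisons(10_000, upper_bound=False)

print(f"cost per pair:      ${cost_per_pair:.2f}")
print(f"pairs (N^2 bound):  {upper_bound_pairs:,}")
print(f"pairs (unordered):  {unordered_pairs:,}")
print(f"cost at N^2 bound:  ${cost_per_pair * upper_bound_pairs:,.0f}")
```

Even counting each unordered pair once, the total is roughly half the upper bound, so the conclusion that alignment cost dominates does not depend on which count is used.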

Evolution = Mutation + Selection

[Figure: population over time under negative selection, neutral drift, and positive selection]

• Harmful mutations are "weeded out" of the population (negative selection)
• Beneficial mutations spread throughout the population and remain "conserved" (positive selection)
• Mutations lacking a function undergo "neutral drift" with time
• Higher than expected sequence conservation across species implies all sorts of biological function*

*Note: lack of conservation does not imply lack of function

Whole Genome Alignments help predict functional elements

• Functional elements include genes (protein recipes) and regulatory sequences ("control logic" for expressing genes, including enhancers, promoters, repressors, etc.)
• In comparative genomics, we assume excess conservation implies function

[Figure: regions with sequence conservation around the Apolipoprotein A1 gene and an enhancer, shown alongside a functional screen (e.g. CRISPR knockout or mRNA-seq)]


Smith-Waterman Algorithm

1. Smith-Waterman equations:

$$
H(i,j) = \max
\begin{cases}
0 \\
H(i-1,j-1) + W(r_i, q_j) \\
H(i-1,j) - \mathrm{gap} \\
H(i,j-1) - \mathrm{gap}
\end{cases}
$$

2. Scoring parameters:

    W =     A   C   G   T
        A   2  -1  -1  -1
        C  -1   2  -1  -1
        G  -1  -1   2  -1
        T  -1  -1  -1   2

    gap = 1

3. Matrix fill (rows: query GTCGTTT, columns: target GCGACTTT):

           *  G  C  G  A  C  T  T  T
        *  0  0  0  0  0  0  0  0  0
        G  0  2  1  2  1  0  0  0  0
        T  0  1  1  1  1  0  2  2  2
        C  0  0  3  2  1  3  2  1  1
        G  0  2  2  5  4  3  2  1  0
        T  0  1  1  4  4  3  5  4  3
        T  0  0  0  3  3  3  5  7  6
        T  0  0  0  2  2  2  5  7  9

4. Traceback:

    Target:  G - C G A C T T T
             |   | |     | | |
    Query:   G T C G - - T T T

    Alignment score = 9
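The four steps on this slide can be reproduced with a minimal sketch. It assumes the slide's parameters (match 2, mismatch -1, linear gap penalty 1, scores floored at 0) and uses rows for the query and columns for the target, as in the matrix; the function name and return layout are illustrative choices, not taken from the talk.

```python
def smith_waterman(query, target, match=2, mismatch=-1, gap=1):
    """Fill the local-alignment matrix H and trace back from its maximum."""
    rows, cols = len(query) + 1, len(target) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Matrix fill: each cell takes the best of restart (0), diagonal, up, left.
    for i in range(1, rows):
        for j in range(1, cols):
            w = match if query[i - 1] == target[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + w,
                          H[i - 1][j] - gap,
                          H[i][j - 1] - gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Traceback: follow the move that produced each cell until a 0 is reached.
    i, j = best_pos
    t_aln, q_aln = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        w = match if query[i - 1] == target[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + w:
            t_aln.append(target[j - 1]); q_aln.append(query[i - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i][j - 1] - gap:
            t_aln.append(target[j - 1]); q_aln.append("-")
            j -= 1
        else:
            t_aln.append("-"); q_aln.append(query[i - 1])
            i -= 1
    return best, H, "".join(reversed(t_aln)), "".join(reversed(q_aln))


score, H, t_aln, q_aln = smith_waterman("GTCGTTT", "GCGACTTT")
print(score)
print(t_aln)
print(q_aln)
```

Both the matrix fill and the traceback storage grow with the product of the two sequence lengths, which is the cost the following slides address.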

Smith-Waterman Algorithm: Intractable on Whole Genomes

• O(L1 · L2) for sequences of lengths L1 and L2
• For human genome (L1 ~ 3 x 10^9 bp) and mouse genome (L2 ~ 3 x 10^9 bp), Smith-Waterman would require computing O(10^18) cells
• Intractable on whole genomes!
• All software whole-genome aligners are based on the heuristic seed-filter-extend paradigm

[Figure: some very long alignments (several Mbp) with large indels; many small alignments (30 bp to 1 Kbp)]

Whole Genome Alignments Using LASTZ

• Seeding: seed hits are located between the query and the target
• Filtering (ungapped): no gaps allowed in extension; seed hits with score > h become anchors
• Extension (Y-drop): adjacent anchors can be merged during extension
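A toy version of the seeding and ungapped filtering stages makes the threshold h concrete. This is a simplified sketch, not LASTZ's implementation: it assumes contiguous k-mer seeds (LASTZ itself supports spaced seeds), a simple drop-off rule for the gapless extension, and made-up values for k, h and the scoring; all names are hypothetical.

```python
from collections import defaultdict


def seed_hits(target, query, k=4):
    """Return (target_pos, query_pos) pairs where the two share an exact k-mer."""
    index = defaultdict(list)
    for t in range(len(target) - k + 1):
        index[target[t:t + k]].append(t)
    return [(t, q)
            for q in range(len(query) - k + 1)
            for t in index.get(query[q:q + k], [])]


def ungapped_score(target, query, t, q, k, match=2, mismatch=-1, xdrop=5):
    """Score a seed hit by extending along its diagonal, with no gaps allowed."""
    def extend(ti, qi, step):
        score = best = 0
        while 0 <= ti < len(target) and 0 <= qi < len(query):
            score += match if target[ti] == query[qi] else mismatch
            best = max(best, score)
            if best - score > xdrop:   # stop once the score has dropped too far
                break
            ti += step
            qi += step
        return best

    return k * match + extend(t + k, q + k, 1) + extend(t - 1, q - 1, -1)


def anchors(target, query, k=4, h=12):
    """Keep only the seed hits whose ungapped score exceeds the threshold h."""
    return [(t, q) for t, q in seed_hits(target, query, k)
            if ungapped_score(target, query, t, q, k) > h]


same = anchors("ACGTTGCA", "ACGTTGCA")        # identical sequences
with_indel = anchors("ACGTTGCA", "ACGTATGCA") # one inserted base in the query
print(len(same), with_indel)
```

In the second call a single inserted base splits the match across two diagonals, so neither seed hit reaches h and both are discarded.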

Ungapped Filtering in LASTZ is Problematic for Sensitivity

• Both papers reduced the ungapped filtering threshold in LASTZ for higher sensitivity
• ~100X slowdown!
• Sensitivity increase is higher for more distant species
• Many alignments in the "twilight zone"

Frequent Indels: Case for Gapped Filtering

• Indel frequency increases for more distantly related species!

[Figure annotation: ungapped threshold in LASTZ]


Darwin-WGA Framework

• Darwin-WGA replaces the ungapped filtering used in LASTZ with gapped filtering using the Banded Smith-Waterman (BSW) algorithm
• First WGA tool to incorporate dynamic programming in the filtering stage
• Extension uses the novel GACT-X algorithm, which combines the previously proposed GACT with X-drop
• Dynamic programming stages (filtering and extension) are accelerated in hardware
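A score-only banded Smith-Waterman pass shows why gapped filtering can keep a seed hit that ungapped filtering drops. This is a minimal software sketch under assumptions: the band is centred on the main diagonal of two short windows, cells outside the band are treated as unreachable restarts, and the scoring and band values are toy numbers rather than Darwin-WGA's parameters.

```python
def banded_sw_score(target, query, band=2, match=2, mismatch=-1, gap=1):
    """Best local score using only cells with |i - j| <= band (no traceback)."""
    best = 0
    prev = {}  # scores of the previous row, keyed by column
    for i in range(1, len(query) + 1):
        cur = {}
        lo = max(1, i - band)
        hi = min(len(target), i + band)
        for j in range(lo, hi + 1):
            w = match if query[i - 1] == target[j - 1] else mismatch
            # Missing neighbours (matrix border or outside the band) count as 0,
            # which is harmless because every cell is floored at 0 anyway.
            score = max(0,
                        prev.get(j - 1, 0) + w,
                        prev.get(j, 0) - gap,
                        cur.get(j - 1, 0) - gap)
            cur[j] = score
            best = max(best, score)
        prev = cur
    return best


target, query = "ACGTTGCA", "ACGTATGCA"   # query has one inserted base
gapless = banded_sw_score(target, query, band=0)
gapped = banded_sw_score(target, query, band=2)
print(gapless, gapped)
```

With band=0 the computation degenerates to a single diagonal, which is the ungapped case; widening the band lets the alignment step over the inserted base at the cost of one gap penalty.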

Darwin-WGA: Algorithm Overview

[Figure: three stages on query-versus-target plots. Seeding: seed hits, with the sequence split into Chunk 1, Chunk 2 and Chunk 3. Filtering (BSW): a band around each seed hit yields an anchor. Extension (GACT-X): tiles T1, T2 and T3 extend from the anchor; the legend distinguishes positions outside the tile overlap, inside the tile overlap, and on the tile overlap border.]

Darwin-WGA: Hardware-acceleration using Systolic Arrays

GACT-X
• Stripe boundaries where wavefront score drops below (max-Y)

Banded-SW
• Stripe boundaries deterministic based on band size (B)
• No traceback
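The (max-Y) criterion can be illustrated in software with a score-only gapped extension that discards any cell scoring more than Y below the best score seen so far. This is only a sketch of the drop rule under stated assumptions: it processes one region row by row instead of in tiles and stripes, keeps no traceback, and uses toy scoring; the function name and the parameter `y` are illustrative.

```python
NEG = float("-inf")


def ydrop_extend(target, query, y=4, match=2, mismatch=-1, gap=1):
    """Extend an alignment from (0, 0); return the best score and where it ends."""
    best, best_pos = 0, (0, 0)

    # Row 0: leading gaps, kept only while within y of the best score.
    prev = {0: 0}
    j = 1
    while j <= len(target) and -gap * j >= best - y:
        prev[j] = -gap * j
        j += 1

    for i in range(1, len(query) + 1):
        cur = {}
        max_prev = max(prev)
        for j in range(min(prev), len(target) + 1):
            cands = [prev.get(j, NEG) - gap]              # gap in the target
            if j > 0:
                w = match if query[i - 1] == target[j - 1] else mismatch
                cands.append(prev.get(j - 1, NEG) + w)    # match or mismatch
                cands.append(cur.get(j - 1, NEG) - gap)   # gap in the query
            score = max(cands)
            if score >= best - y:
                cur[j] = score                            # cell stays live
                if score > best:
                    best, best_pos = score, (i, j)
            elif j > max_prev:
                break                                     # nothing further is reachable
        if not cur:
            break                                         # every cell dropped: stop
        prev = cur
    return best, best_pos


score, end = ydrop_extend("ACGTACGT" + "TTTTTTTT", "ACGTACGT" + "GGGGGGGG")
print(score, end)
```

The set of live cells in each row moves with the data, in contrast to the fixed band of the filtering stage, where the boundaries are known in advance from B.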


Experimental Setup

Component        Configuration            Area (mm2)   Power (W)
BSW Logic        64 x (64PE array)        16.6         25.6
GACT-X Logic     12 x (64PE array)        4.2          6.72
Traceback SRAM   12 x (64PE x 16KB/PE)    15.1         7.92
DRAM             DDR4-2400R, 4 x 32GB     -            3.10
TOTAL                                     35.9         43.34

[Figure: bar chart of power (in W) for CPU, FPGA and ASIC]

CPU (baseline)
• AWS c4.8xlarge instance
• 36 vCPUs (18 physical cores)
• LASTZ as software baseline, Parasail to estimate iso-sensitive runtime, cores fully-utilized
• $1.59/hour

FPGA (Darwin-WGA)
• AWS F1 f1.2xlarge instance
• 1 Xilinx Virtex Ultrascale+ FPGA (50 BSW and 2 GACT-X arrays with 32PEs)
• 8 vCPUs
• $1.65/hour

ASIC (Darwin-WGA)
• TSMC 40nm DC synthesis (not a chip prototype)

Species and Genome Assembly

Target species                   Size (Mbp)   Query species                     Size (Mbp)
Caenorhabditis elegans (ce11)    100          Caenorhabditis briggsae (cb4)     105
Drosophila melanogaster (dm6)    137          Drosophila simulans (droSim1)     110
                                              Drosophila yakuba (droYak2)       120
                                              Drosophila pseudoobscura (dp4)    127

• dm6-droSim1 molecular distance comparable to human-monkey
• dm6-dp4 molecular distance comparable to human-chicken

Darwin-WGA Sensitivity Improvement Versus LASTZ

Species pair   Top-10 Alignment Chain Scores   Matching Base-pairs within Alignments   Number of Aligning Exons (protein-coding genes)
ce11-cb4       +5.73%                          3.12x                                   +2.70%
dm6-droSim1    +0.03%                          1.25x                                   +0.20%
dm6-droYak2    +0.05%                          1.41x                                   +0.09%
dm6-dp4        +1.86%                          1.42x                                   +0.41%

Annotations (in order of appearance on the slide):
• orthologous sequences (derived from "speciation"), mostly neutrally evolving
• paralogous sequences (derived from "duplication"), mostly neutrally evolving
• functionally relevant orthologous sequences, under some selective pressure (at least in the target species)

False positive rate (2-mer shuffled genome): 0.0007%


Example: Improved Darwin-WGA Sensitivity


Indels (shown by arrows) around each seed hit – dropped by ungapped filtering (LASTZ) but retained by gapped filtering (Darwin-WGA)

Runtime and Cost Comparison

                                                                 Darwin-WGA runtime (sec)   Darwin-WGA Improvement
Species pair   LASTZ runtime (sec)   Iso-sensitive s/w runtime (sec)   FPGA      ASIC      FPGA (Perf/$)   ASIC (Perf/W)
ce11-cb4       481                   64,960                            3,823     219       19.1x           1,478x
dm6-droSim1    643                   142,627                           5,936     461       23.2x           1,547x
dm6-droYak2    654                   144,454                           6,001     469       23.2x           1,539x
dm6-dp4        557                   125,700                           4,987     404       24.3x           1,553x

• 200x slowdown (iso-sensitive software versus LASTZ)
• 22x speedup (FPGA versus iso-sensitive software)
• 306x speedup (ASIC versus iso-sensitive software)
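The three callouts can be related to the table with a short calculation. It assumes they are averages of per-pair runtime ratios across the four species pairs, which is a reading of the slide rather than something it states; the arithmetic means below come out close to the quoted 200x, 22x and 306x.

```python
# (LASTZ, iso-sensitive software, Darwin-WGA FPGA, Darwin-WGA ASIC) runtimes in seconds
runtimes = {
    "ce11-cb4":    (481, 64_960, 3_823, 219),
    "dm6-droSim1": (643, 142_627, 5_936, 461),
    "dm6-droYak2": (654, 144_454, 6_001, 469),
    "dm6-dp4":     (557, 125_700, 4_987, 404),
}


def mean(values):
    values = list(values)
    return sum(values) / len(values)


# Cost of matching Darwin-WGA's sensitivity in software, relative to LASTZ.
software_slowdown = mean(iso / lastz for lastz, iso, _, _ in runtimes.values())
# Hardware speedups relative to that iso-sensitive software runtime.
fpga_speedup = mean(iso / fpga for _, iso, fpga, _ in runtimes.values())
asic_speedup = mean(iso / asic for _, iso, _, asic in runtimes.values())

print(f"software slowdown: {software_slowdown:.1f}x")
print(f"FPGA speedup:      {fpga_speedup:.1f}x")
print(f"ASIC speedup:      {asic_speedup:.1f}x")
```

The Perf/$ and Perf/W columns additionally fold in instance prices and power figures, so they are not recomputed here.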

Summary

• WGA using LASTZ on a single pair of mammals costs ~$57 on AWS instances
• 1K genomes => 1M WGA pairs possible
• Darwin-WGA replaces ungapped (gapless) filtering in LASTZ with the Banded Smith-Waterman algorithm for higher sensitivity (up to 3x matching base-pairs, 5.7% more orthologs, 2.1% more exons)
• Additional sensitivity costs 200x slowdown in software
• FPGA: 24x performance/$ improvement
• ASIC: 1,500x performance/Watt improvement


Thank You!
