seattle-daml-meetup
Next generation of big data
7/29/2016 2
Video + sensors
Genomics data
© Microsoft Research. All rights reserved.
The digital universe is growing
[Chart: Digital Universe vs. Installed Capacity vs. Capacity Shipped, in petabytes (y-axis 0 to 50,000,000). Source: IDC]
Dense, really dense
Vs.
Cold Storage: 1EB, Size: Two Walmart Supercenters
It’s here!
3-4 orders of magnitude denser than tape
Durable
(Illustration: Philipp Stössel/ETH Zurich)
DNA synthetic fossils survive:
Time              Temperature
1 week            70˚C
2,000 years       10˚C
2,000,000 years   -18˚C
Source: Grass et al., "Robust Chemical Preservation of Digital Information on DNA in Silica with Error-Correcting Codes"
And readers never become obsolete
A DNA storage primer
bases: G A C T
0101000101011100
[Figure: a DNA strand with labels "address" and "data"]
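To make the bits-to-bases idea concrete, here is a minimal sketch that maps the bit string from the slide onto the four bases. The two-bits-per-base table is an assumption chosen for illustration; the slides do not state which encoding is used, and the function names `encode` and `decode` are hypothetical.

```python
# Assumed mapping: two bits per base. Illustrative only; the slides do not
# specify the actual encoding.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(bits: str) -> str:
    """Translate an even-length bit string into a string of bases."""
    if len(bits) % 2 != 0:
        raise ValueError("bit string length must be even")
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> str:
    """Translate a string of bases back into the original bit string."""
    return "".join(BASE_TO_BITS[base] for base in strand)


payload = "0101000101011100"
strand = encode(payload)
print(strand)                      # CCACCCTA
print(decode(strand) == payload)   # True
```

A fixed table like this is the simplest possible choice: it stores two bits per base and round-trips exactly, but it does nothing about errors introduced during synthesis or sequencing.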
[Figure: storage pipeline. Write: Encoding, Synthesis. Preservation. Read: Random Access, Sequencing, Decoding.]
Copying DNA
[Figure: a strand with labels "address" and "primer target" (twice), plus a "primer"; the same strand is then shown four times as copies]
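The copying step can be sketched as an idealized amplification in which every strand that carries the matching primer target doubles on each cycle. This is a simplified model under stated assumptions: the primer is matched as a plain string prefix, doubling is perfect, and the strands, the `amplify` function, and the cycle count are all hypothetical.

```python
from collections import Counter


def amplify(pool: Counter, primer: str, cycles: int) -> Counter:
    """Return a new pool in which strands starting with `primer` double each cycle.

    Idealized model: real amplification is neither perfectly efficient nor
    limited to a prefix match.
    """
    amplified = Counter(pool)
    for strand, count in pool.items():
        if strand.startswith(primer):
            amplified[strand] = count * 2 ** cycles
    return amplified


# Hypothetical pool: one copy each of two strands with different primer targets.
pool = Counter({"CTGAGACTGACT": 1, "TTACGACTAAGA": 1})
after = amplify(pool, "CTGA", cycles=10)
print(after["CTGAGACTGACT"])  # 1024
print(after["TTACGACTAAGA"])  # 1
```

After ten idealized cycles the targeted strand outnumbers the other by about a thousand to one, which is why a primer can act as a selector.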
Random Access with DNA
[Figure: the bit string 0101000101011100 and two strands, each with an address and two primer targets; a primer matches the second strand, which is then shown four times as copies]
selecting one item out of two
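Selecting one item out of a mixed pool can be sketched as a lookup keyed by primer, with the address field used to put the chunks back in order. This is a minimal model, not the system from the slides: the `Strand` type, the `store` and `read` functions, the chunk size, and the example sequences are all assumptions made for illustration.

```python
import random
from typing import List, NamedTuple


class Strand(NamedTuple):
    primer: str   # identifies which stored item the strand belongs to
    address: int  # position of this chunk within the item
    data: str     # payload bases


def store(pool: List[Strand], primer: str, bases: str, chunk: int = 4) -> None:
    """Split `bases` into chunks and add one addressed strand per chunk."""
    for address, start in enumerate(range(0, len(bases), chunk)):
        pool.append(Strand(primer, address, bases[start:start + chunk]))


def read(pool: List[Strand], primer: str) -> str:
    """Select strands by primer, order them by address, and join the payloads."""
    selected = [s for s in pool if s.primer == primer]
    return "".join(s.data for s in sorted(selected, key=lambda s: s.address))


pool: List[Strand] = []
store(pool, "CTGA", "GACTGACTGCCT")   # hypothetical item 1
store(pool, "TTAC", "GACTAAGAAACG")   # hypothetical item 2
random.Random(0).shuffle(pool)        # a pool has no inherent order

print(read(pool, "CTGA"))  # GACTGACTGCCT
print(read(pool, "TTAC"))  # GACTAAGAAACG
```

The shuffle stands in for the fact that strands in a pool are unordered, so the address field is what makes reassembly possible.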
DNA storage works
[Figure: storage pipeline. Write: Encoding, Synthesis. Read: Random Access, Sequencing, Decoding.]
(write) 200MB, latency: ~day
(read) 200MB, latency: ~hours
Archival Storage
Improvements by biotechnology industry
[Chart: Carlson's Curves. Log-scale y-axis from 1.00E+02 to 1.00E+12; x-axis: Year, 1965 to 2020. Series: Transistors per chip (Moore's Law), Reading DNA in bases/person/day (DNA Reads), Writing DNA in bases/person/day (DNA Writes). Source: Robert Carlson]
DNA: ultimate storage
Dense
Durable
Never obsolete
How would you use it?