Ten Years and Change the MX data archive at ALS 8.3.1

Page 1: Ten Years and Change the MX data archive at ALS 8.3.1

Ten Years and Change

the MX data archive at ALS 8.3.1

Page 2: Ten Years and Change the MX data archive at ALS 8.3.1

ALS 8.3.1 data collection history

[Figure: terabytes (uncompressed), 0 to 70, versus year, 2001 to 2013. Series: actual; doubling = 2.8 years.]
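The doubling time quoted on this slide can be turned into a per-year growth rate. The following is a minimal sketch that assumes the growth is purely exponential; the function name `growth_factor` is illustrative and not from the talk.

```python
# Assumption: data volume grows exponentially with the 2.8-year
# doubling time quoted on the slide.
DOUBLING_TIME_YEARS = 2.8


def growth_factor(years: float, doubling_time: float = DOUBLING_TIME_YEARS) -> float:
    """Multiplier on data volume after `years` of exponential growth."""
    return 2.0 ** (years / doubling_time)


if __name__ == "__main__":
    annual = growth_factor(1.0)
    print(f"annual growth: {100 * (annual - 1):.0f}% per year")
    print(f"growth over 10 years: {growth_factor(10.0):.1f}x")
```

A 2.8-year doubling time works out to a bit under 30% growth per year.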

Page 3: Ten Years and Change the MX data archive at ALS 8.3.1

ALS 8.3.1 data collection history

[Figure: terabytes (uncompressed), 0 to 70, versus year, 2001 to 2013. Series: Proteum 300; Q210; Q315 (907); Q315r (926).]

Page 4: Ten Years and Change the MX data archive at ALS 8.3.1

ALS 8.3.1 data collection history

[Figure: images × 10^6, 0 to 5, versus year, 2001 to 2013. Series: Proteum 300; Q210; Q315 (907); Q315r (926).]

Page 5: Ten Years and Change the MX data archive at ALS 8.3.1

ALS 8.3.1 data collection history

[Figure: y-axis: images × 10^6.]

Page 6: Ten Years and Change the MX data archive at ALS 8.3.1

ALS 8.3.1 data collection history

[Figure: left axis: images × 10^6, 0 to 5; right axis: PDB entries, 0 to 1250. Series: images collected; PDB data collection; PDB deposition.]

Page 7: Ten Years and Change the MX data archive at ALS 8.3.1

ALS 8.3.1 data collection history

[Figure: left axis: images × 10^6; right axis: PDB entries, 0 to 1250.]

Page 8: Ten Years and Change the MX data archive at ALS 8.3.1

[Figure: left axis: images × 10^6; right axis: PDB entries, 0 to 1250.]

Page 9: Ten Years and Change the MX data archive at ALS 8.3.1

DVD data archive: 82 TB

Page 10: Ten Years and Change the MX data archive at ALS 8.3.1

Which data go with which PDB?

• 260,000 images are called “test”

• cell 48 62 84 90 101 104 is within 5 Å and 5° of 16,000 PDBs

focusing on 2001-2006

• 490 PDBs credit ALS 8.3.1 with data

• 44 of these didn’t actually collect data

• 64 collected data, but no credit
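The "within 5 Å and 5°" test in the list above can be sketched as a per-parameter tolerance check. This is an illustrative reading, not the code used for the talk: it compares cells exactly as given (no cell reduction or axis permutation), and the candidate cells and their names are hypothetical.

```python
from typing import Sequence


def cell_matches(cell_a: Sequence[float], cell_b: Sequence[float],
                 length_tol: float = 5.0, angle_tol: float = 5.0) -> bool:
    """True if all three edges agree within length_tol (Å) and all
    three angles agree within angle_tol (degrees)."""
    if len(cell_a) != 6 or len(cell_b) != 6:
        raise ValueError("a unit cell needs six parameters: a b c alpha beta gamma")
    lengths_ok = all(abs(x - y) <= length_tol for x, y in zip(cell_a[:3], cell_b[:3]))
    angles_ok = all(abs(x - y) <= angle_tol for x, y in zip(cell_a[3:], cell_b[3:]))
    return lengths_ok and angles_ok


# The cell quoted on the slide.
query = (48, 62, 84, 90, 101, 104)

# Hypothetical candidates, for illustration only.
candidates = {
    "hypothetical_A": (50, 60, 86, 90, 100, 105),
    "hypothetical_B": (48, 62, 120, 90, 90, 90),
}

matches = [name for name, cell in candidates.items() if cell_matches(query, cell)]
print(matches)
```

With tolerances this loose, a single cell can match thousands of deposited entries, which is the point the slide makes.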

Page 11: Ten Years and Change the MX data archive at ALS 8.3.1

Which data go with which PDB?

1. images from 2001-2006          1,604,031
2. collected “near” edges         682,712
3. find “runs” of >10 images      3602
4. unify multi-wedge sets         3331
5. run labelit & XDS              2524
6. >70% complete?                 1479
7. I/σ > 10                       1054
8. reduced cell vs PDB            1 to 200+
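Step 3, finding "runs" of more than 10 images, can be sketched by grouping file names on their prefix and looking for consecutive frame numbers. This is a minimal sketch that assumes names of the form `prefix_NNN.img`; the regular expression, the function name `find_runs`, and the example file names are illustrative assumptions, not the protocol's actual code.

```python
import re
from collections import defaultdict

# Assumed naming scheme: everything before the last "_" is the prefix,
# the digits after it are the frame number.
FRAME_RE = re.compile(r"^(?P<prefix>.+)_(?P<frame>\d+)\.img$")


def find_runs(filenames, min_images=11):
    """Return (prefix, first_frame, last_frame) for every stretch of
    consecutive frame numbers containing at least min_images images."""
    frames = defaultdict(list)
    for name in filenames:
        m = FRAME_RE.match(name)
        if m:
            frames[m.group("prefix")].append(int(m.group("frame")))

    runs = []
    for prefix, numbers in sorted(frames.items()):
        numbers = sorted(set(numbers))
        start = prev = numbers[0]
        for n in numbers[1:]:
            if n != prev + 1:  # a gap closes the current run
                if prev - start + 1 >= min_images:
                    runs.append((prefix, start, prev))
                start = n
            prev = n
        if prev - start + 1 >= min_images:
            runs.append((prefix, start, prev))
    return runs


# Hypothetical file names: one 90-frame sweep and three stray test shots.
names = [f"xtal1_1_{i:03d}.img" for i in range(1, 91)]
names += [f"test_1_{i:03d}.img" for i in range(1, 4)]
runs = find_runs(names)
print(runs)
```

The default `min_images=11` encodes "more than 10"; short sequences such as test shots are dropped.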

Page 12: Ten Years and Change the MX data archive at ALS 8.3.1

Responses to inquiries

“I have to find my old note book as I have no idea what that is.”

“I have changed jobs a few times since and am really far away from crystallography now.”

“Will see what I can find.”

“We solved it but never published it. Sorry!”

Page 13: Ten Years and Change the MX data archive at ALS 8.3.1

DVD data archive

Page 14: Ten Years and Change the MX data archive at ALS 8.3.1
Page 15: Ten Years and Change the MX data archive at ALS 8.3.1
Page 16: Ten Years and Change the MX data archive at ALS 8.3.1

Primary failure mode of DVDs

Page 17: Ten Years and Change the MX data archive at ALS 8.3.1
Page 18: Ten Years and Change the MX data archive at ALS 8.3.1
Page 19: Ten Years and Change the MX data archive at ALS 8.3.1

dataset identification protocol

1. images from 2001-2006          1,604,031
2. collected “near” edges         682,712
3. find “runs” of >10 images      3602
4. sort out multi-wedge sets      3331
5. run XDS                        2524
6. >70% complete?                 1479
7. I/σ > 10                       1054
8. reduced cell vs PDB            1 to 200+
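The quality cuts in steps 6 and 7, and the attrition between steps, can be sketched as follows. The pairing of each count with a step is read off the slide layout and is an assumption, as are the function names; only steps 3 to 7 are compared, because the first two counts are images rather than datasets.

```python
def passes_quality(completeness, i_over_sigma,
                   min_completeness=0.70, min_i_over_sigma=10.0):
    """Steps 6 and 7: keep a dataset only if it is more than 70%
    complete and has I/sigma greater than 10."""
    return completeness > min_completeness and i_over_sigma > min_i_over_sigma


# Counts as listed on the slide (the step-to-count pairing is assumed).
FUNNEL = [
    ("runs of >10 images", 3602),
    ("multi-wedge sets sorted out", 3331),
    ("run XDS", 2524),
    (">70% complete", 1479),
    ("I/sigma > 10", 1054),
]


def attrition(funnel):
    """Fraction surviving each step, relative to the previous step."""
    return [(label, n / prev_n)
            for (_, prev_n), (label, n) in zip(funnel, funnel[1:])]


if __name__ == "__main__":
    for label, kept in attrition(FUNNEL):
        print(f"{label:28s} kept {kept:.0%}")
    print(f"overall: {FUNNEL[-1][1] / FUNNEL[0][1]:.0%}")
```

On these numbers the completeness cut is the most severe single step.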

Page 20: Ten Years and Change the MX data archive at ALS 8.3.1

Unit Cell: 90.9 90.9 46.8 90 90 120

[Figure: best Rcryst after rigid-body refinement (y-axis, 0.3 to 0.6) versus RMS unit cell length deviation in Å (x-axis, 0.00 to 2.00). Labels: 1hh7 M. TB CSOR; 1rb5; myoglobin.]
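The x-axis quantity, RMS unit cell length deviation, is not defined on the slide. One plausible reading is the root-mean-square difference over the three cell edges, sketched below; the formula choice, the function name, and the comparison cell are all assumptions.

```python
import math


def rms_length_deviation(cell_a, cell_b):
    """Root-mean-square difference of the three cell edges (Å).
    Angles are ignored; cells are compared in the order given."""
    diffs = [x - y for x, y in zip(cell_a[:3], cell_b[:3])]
    return math.sqrt(sum(d * d for d in diffs) / 3)


reference = (90.9, 90.9, 46.8, 90, 90, 120)  # unit cell quoted on the slide
other = (91.4, 91.4, 46.8, 90, 90, 120)      # hypothetical cell for illustration

rms = rms_length_deviation(reference, other)
print(f"RMS deviation: {rms:.2f} Å")
```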

Page 21: Ten Years and Change the MX data archive at ALS 8.3.1

MAD/SAD datasets

[Figure: Riso vs PDB deposit (y-axis, 0 to 1) versus best Rcryst after rigid-body refinement (x-axis, 0.20 to 0.60). Labels: Published; non-isomorphous; Unsolved?]

Page 22: Ten Years and Change the MX data archive at ALS 8.3.1

EGDA

Dec 01 19:45:12 2001 egda46_*1_E#_###.img (1112 images, Se MAD)
Dec 02 15:10:06 2001 egda27_*1_###.img (180, 1A, native?)
Dec 02 19:21:55 2001 egdau1_*1_###.img (427, 8000eV (U?) SAD)
Dec 02 20:58:26 2001 egdau1_*2_###.img (360, 8000eV (U?) SAD)
Jun 01 14:07:43 2002 egda60_*1_###.img (360, Lutetium SAD)

“I think that these EGDA data sets are very likely some of xxx’s data sets, he was working on E.coli guanine deaminase, something he brought from yyy. No structure was ever published James, xxx was unable to solve the structure from these data.”

Page 23: Ten Years and Change the MX data archive at ALS 8.3.1

~2.9 Å, P21212

R = 0.32, Rfree = 0.39

PDB ID: ????

E. coli guanine deaminase

Page 24: Ten Years and Change the MX data archive at ALS 8.3.1

Summary

• saving data could double productivity

• unit cell is not a good score

• lossy compression: rallying cry?

• backup vs archive

• metadata: what do we really know?

Page 25: Ten Years and Change the MX data archive at ALS 8.3.1

Brief Summary

• this is a lot of work.

• who is going to pay for it?

Page 26: Ten Years and Change the MX data archive at ALS 8.3.1

backblaze.com “pod” server

backblaze.com offers “unlimited storage” data backup for $5/month.

Page 27: Ten Years and Change the MX data archive at ALS 8.3.1

backblaze offers “unlimited storage” data backup for $5/month.

Page 28: Ten Years and Change the MX data archive at ALS 8.3.1

backblaze does not sell these “pods”, but “protocase.com” will.

Page 29: Ten Years and Change the MX data archive at ALS 8.3.1
Page 30: Ten Years and Change the MX data archive at ALS 8.3.1

compresses 4.2x

Page 31: Ten Years and Change the MX data archive at ALS 8.3.1

compresses 337x

Page 32: Ten Years and Change the MX data archive at ALS 8.3.1

compresses 5x, but only one per dataset!

Page 33: Ten Years and Change the MX data archive at ALS 8.3.1

compresses 3.5x

Page 34: Ten Years and Change the MX data archive at ALS 8.3.1

compressed ~50x

Page 35: Ten Years and Change the MX data archive at ALS 8.3.1

compresses 5.2x

Page 36: Ten Years and Change the MX data archive at ALS 8.3.1

Lossy compression vs R/Rfree

[Figure: R factor (y-axis, 0.18 to 0.38) versus compression ratio (x-axis, 1 to 100, log scale). Series: R_cryst; R_free.]