State-of-the-art France GBF-Toulouse Sequencing Team BAC selection and Finishing Murielle Philippot Pierre Frasse Genome Assembly Vincent Cahais Sana Hakim Simo Zouine Infrastructure and involvement of Bioinformatic Plateform INRA- Toulouse


State-of-the-art France

GBF-Toulouse Sequencing Team

BAC selection and Finishing
• Murielle Philippot
• Pierre Frasse

Genome Assembly
• Vincent Cahais
• Sana Hakim
• Simo Zouine

Infrastructure and involvement of the Bioinformatics Platform, INRA-Toulouse


– A single library yields both shotgun and L-PET sequences at the same time

– Longer L-PET sequences, ~100 bases for each end


State-of-the-art France

• GBF: 12 Runs
– 12 runs 3 kb home-made different library + 8 runs Shotgun (planned for July-August, lack of DNA)
– 2 716 911 sequences (3 runs)

• WUR: 27.5 Runs
– 22 486 227 sequences
– 15 runs Shotgun
– 6 runs 3 kb
– 6.5 runs 20 kb

• Italy: 2 Runs
– 1 366 781 sequences
– 1 run 3 kb
– 1 run 20 kb
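The run totals above can be turned into an average yield per run with simple arithmetic. The sketch below is only an illustration: the function name `reads_per_run` and the dictionary layout are assumptions made here, and the GBF entry uses the 3 runs for which the slide reports a sequence count.

```python
# Totals copied from the slide: (number of sequences, number of runs they cover).
# For GBF the slide reports a sequence count for 3 runs only.
RUN_TOTALS = {
    "GBF": (2_716_911, 3),
    "WUR": (22_486_227, 27.5),
    "Italy": (1_366_781, 2),
}


def reads_per_run(total_reads, runs):
    """Average number of sequences produced by one run (hypothetical helper)."""
    return total_reads / runs


if __name__ == "__main__":
    for site, (total, runs) in RUN_TOTALS.items():
        print(f"{site}: {reads_per_run(total, runs):,.0f} sequences per run")
```

These averages mix library types (shotgun, 3 kb, 20 kb), so they give an order of magnitude rather than a per-library yield.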


[Figure: 20 kb library, Holland. Number of sequences (y-axis, 0 to 5000) against sequence length (x-axis, 1 to 897), one curve per run: FUCQJOD02_Ho, FW0TE9I01_Ho, FW0TE9I02_Ho, FW6J5DV01_Ho, FW6J5DV02_Ho, FW95G3M01_Ho, FW95G3M02_Ho, FWEXUQG01_Ho, FWGOYO101_Ho, FWGOYO102_Ho, FWLY2VR01_Ho, FWLY2VR02_Ho, FXDORVL01_Ho]


[Figure: Mean quality per SFF, France (axis 0 to 35)]

[Figure: Mean length per SFF, France (3 kb) (axis 0 to 700)]

[Figure: Mean quality per SFF, Italy (20 kb / 3 kb) (axis 0 to 35)]

[Figure: Mean length per SFF, Italy (20 kb / 3 kb) (axis 0 to 700)]


[Figure: Mean quality per SFF, Holland WGS (axis 0 to 35)]

[Figure: Mean quality per SFF, Holland 3 kb (axis 0 to 35)]

[Figure: Mean length per SFF, Holland WGS (axis 0 to 700)]

[Figure: Mean length per SFF, Holland 3 kb (axis 0 to 700)]


[Figure: Mean number of N per SFF and mean length, Holland shotgun (axes 0 to 800 and 0 to 20)]

[Figure: Mean number of N per SFF and mean length, Holland 3 kb (axes 0 to 600 and 0 to 20)]

[Figure: Mean number of N per SFF and mean length, France (axes 0 to 600 and 0 to 20); files shown: FT7TVTG01.fasta, FV59KIZ01.fasta, FWZTXF202.fasta]

[Figure: Mean number of N per SFF and mean length, Italy (axes 0 to 500 and 0 to 20)]
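The two quantities plotted per file, mean read length and mean number of N per read, can be computed directly from a FASTA export. The following is a minimal sketch and not the pipeline used for the slides: `parse_fasta` and `read_stats` are hypothetical helper names, and the toy reads are invented for illustration.

```python
def parse_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def read_stats(lines):
    """Return (mean read length, mean number of N per read) for one file."""
    lengths, n_counts = [], []
    for _, seq in parse_fasta(lines):
        lengths.append(len(seq))
        n_counts.append(seq.upper().count("N"))
    if not lengths:
        return 0.0, 0.0
    return sum(lengths) / len(lengths), sum(n_counts) / len(n_counts)


if __name__ == "__main__":
    # Invented toy reads, only to show the call; real input would be one
    # FASTA file per run, e.g. open("FT7TVTG01.fasta").
    toy = [">read1", "ACGTN", "ACG", ">read2", "NNAC"]
    print(read_stats(toy))
```

Passing an open file handle instead of the toy list works the same way, since the parser only iterates over lines.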


Conclusions

• Mean sequence length: 400 – 500 nt

• Mean sequence quality: 25 – 30

• Shotgun gives:

- longer reads (550 nt)

- higher frequency of long reads

• Chloroplast and mitochondrial genome contamination:

- estimated to be very low (1600 – 1800 reads per 500k reads, corresponding to 1 run)

• The ratio of 2 runs for 1x coverage has been slightly over-estimated
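The contamination and coverage figures in these conclusions follow from simple ratios. The sketch below works them through under stated assumptions: the function names are hypothetical, 500 000 reads per run and the 400 to 500 nt mean length come from the slide, and `PLACEHOLDER_GENOME_SIZE_NT` is an arbitrary value chosen only to make the example run, since the slides do not state the genome size.

```python
def contamination_fraction(organelle_reads, reads_per_run):
    """Share of reads in one run that come from organelle genomes."""
    return organelle_reads / reads_per_run


def bases_per_run(reads_per_run, mean_length_nt):
    """Total bases produced by one run."""
    return reads_per_run * mean_length_nt


def runs_for_coverage(genome_size_nt, reads_per_run, mean_length_nt, coverage=1.0):
    """Number of runs needed to reach the requested coverage depth."""
    return coverage * genome_size_nt / bases_per_run(reads_per_run, mean_length_nt)


# ASSUMPTION: placeholder genome size, not taken from the slides.
PLACEHOLDER_GENOME_SIZE_NT = 500_000_000

if __name__ == "__main__":
    for organelle_reads in (1600, 1800):
        share = contamination_fraction(organelle_reads, 500_000)
        print(f"{organelle_reads} organelle reads: {share:.2%} of a run")
    for mean_length in (400, 500):
        runs = runs_for_coverage(PLACEHOLDER_GENOME_SIZE_NT, 500_000, mean_length)
        print(f"mean length {mean_length} nt: {runs:.1f} runs for 1x")
```

With the slide's numbers the organelle share stays below 0.4% of a run. The runs-per-1x result depends entirely on the placeholder genome size and should be recomputed with the real value.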


Suggestions - Questions

• 1 run of 454 sequencing of the new 8 or 20 kb PET libraries

• BAC-end sequencing of the sheared library (50 000 clones; 5-6 x)

• Whole-genome draft assembly with non-Newbler assemblers