Transcript
Page 1: Barbera van Schaik Bioinformatics Laboratory, AMC b.d.vanschaik@amc.uva.nl

Barbera van Schaik

Bioinformatics Laboratory, AMC

b.d.vanschaik@amc.uva.nl

Page 2:

Current sequencing projects

- Neurogenetics Laboratory: somatic mutation detection (Frank Baas, Marja Jakobs)
- Laboratory of Experimental Virology: virus discovery (Michel de Vries)
- Dept. Clinical Immunology & Rheumatology: TCR-beta variant detection (Niek de Vries, Paul Klarenbeek)
- Clinical Virology: hepatitis C (Richard Molenkamp)
- Erasmus MC, Rotterdam: re-sequencing of tumor samples (Ernie de Boer, Michael Moorhouse)

Page 3:

TCR-beta

Page 4:

Recombination of gene segments:

From the germline:

- One variant is selected from each gene segment

- These are joined together

Only functional recombinations give rise to functional T cells

Paul Klarenbeek

Page 5:

Total theoretical variation (in vivo this turns out to be much lower)

Paul Klarenbeek
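The recombination and theoretical-variation slides can be illustrated with a toy model: pick one variant from each gene-segment family, join them, and count the possible combinations as the product of the family sizes. This is a minimal Python sketch with hypothetical segment names (`V1`, `D1`, `J1`, and so on), not the real IMGT TRB gene lists. It also ignores nucleotide deletions and insertions at the junctions, which add diversity on top of the combinatorial count.

```python
import itertools
import math
import random

# Hypothetical placeholder families; the real TRB V, D and J gene lists come from IMGT.
SEGMENTS = {
    "V": ["V1", "V2", "V3"],
    "D": ["D1", "D2"],
    "J": ["J1", "J2", "J3", "J4"],
}
ORDER = ("V", "D", "J")


def recombine(segments, rng=random):
    """Select one variant from each gene-segment family and join them in V-D-J order."""
    return "-".join(rng.choice(segments[family]) for family in ORDER)


def theoretical_combinations(segments):
    """Combinatorial diversity only: the product of the family sizes."""
    return math.prod(len(segments[family]) for family in ORDER)


def all_combinations(segments):
    """Enumerate every V-D-J combination (feasible only for small toy inputs)."""
    return ["-".join(combo) for combo in itertools.product(*(segments[f] for f in ORDER))]


if __name__ == "__main__":
    print(recombine(SEGMENTS, random.Random(0)))  # one random V-D-J choice
    print(theoretical_combinations(SEGMENTS))     # 3 * 2 * 4 = 24
```

With real gene lists the same product gives the combinatorial upper bound; as the slide notes, the variation actually observed in vivo is much lower.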

Page 6:

CDR3 region

Unique for each clonal expansion

How to identify clonal expansions?

[Figure labels: germline DNA, mRNA, thymocytes]

Paul Klarenbeek

Page 7:

Dept. Clinical Immunology & Rheumatology

Goal: identify and enumerate TCR-beta variants

[Figure: two amplicon layouts built from the Vn, CDR3, J and C segments, with Primer A, Primer B and a barcode]

Paul Klarenbeek

Page 8:

Roche (454) sequencing

Run 07-07-2008: 110,509 sequences, 26,445,844 nucleotides in total

Run 30-09-2008: 106,234 sequences, 24,983,646 nucleotides in total
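Dividing the nucleotide total by the number of sequences gives the average read length of each run. A small Python check that uses only the numbers on this slide:

```python
# Totals for the two runs, copied from the slide.
RUNS = {
    "07-07-2008": {"sequences": 110_509, "nucleotides": 26_445_844},
    "30-09-2008": {"sequences": 106_234, "nucleotides": 24_983_646},
}


def mean_read_length(sequences: int, nucleotides: int) -> float:
    """Average read length in nt: total nucleotides divided by the number of reads."""
    return nucleotides / sequences


if __name__ == "__main__":
    for run, totals in RUNS.items():
        print(run, round(mean_read_length(totals["sequences"], totals["nucleotides"]), 1))
    # 07-07-2008 239.3
    # 30-09-2008 235.2
```

So both runs averaged roughly 235 to 240 nt per read.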

Page 9:

Sequence lengths

[Figure: frequency of sequence lengths (x-axis: sequence length; y-axis: frequency) for runs 20080707 and 20080930]
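Histograms like these can be produced from the FASTA output of the first pipeline step by counting read lengths. The following is a minimal Python sketch that assumes the reads are already in FASTA format. The 10-nt `bin_width` is a hypothetical choice and not necessarily the binning used for these plots.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Minimal FASTA parser: yields (header, sequence); multi-line sequences are joined."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def length_histogram(lines: Iterable[str], bin_width: int = 10) -> Counter:
    """Count reads per length bin (bin start in nt -> number of reads)."""
    counts = Counter()
    for _, seq in read_fasta(lines):
        counts[(len(seq) // bin_width) * bin_width] += 1
    return counts


if __name__ == "__main__":
    example = [">r1", "ACGTACGTAC", ">r2", "ACGT", "ACGTACG", ">r3", "ACG"]
    print(sorted(length_histogram(example).items()))
```

To use it on a real file, pass an open file handle, for example `length_histogram(open("reads.fasta"))`. The file name here is illustrative.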

Page 10:

Pipeline Rheumatology

Convert sff to fasta+quality scores

Chop sequences into: MID, fragment

Divide sequences based on MID and region

Identify the V, J and C segments

Locate highly variable area

Count variants

Quality control

Tools: Perl scripts, Roche software, BLAT, Access/Excel

Page 11:

Recognize MID and primer

MID = barcode for the sample
Primer = primer for the J or C segment

# MIDs for the 3 regions that are sequenced with primer for J segment:
MID=(TAGT|ACTA|CGAC|CTCG)
# Primer for J segment (first) or C segment (last two):
Primer=(CTTACCTACAACGGTTAACCTGGTC|AGCTCAAACACAGCGACCTC|GGAACACCTTGTTCAGGTC)
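The same MID and primer patterns can be used in Python to split each read into MID, primer and fragment, and then to bin reads by MID and primer as in the "Divide sequences based on MID and region" step. This sketch assumes the MID is at the very start of the read and is immediately followed by the primer. The slide does not give the exact read layout, so both that anchoring and the use of the primer as the region key are assumptions. Reads whose prefix does not match are labelled `nomatch`.

```python
import re
from collections import defaultdict

# MIDs and primers copied from the slide.
MIDS = ("TAGT", "ACTA", "CGAC", "CTCG")
PRIMERS = (
    "CTTACCTACAACGGTTAACCTGGTC",  # primer for the J segment
    "AGCTCAAACACAGCGACCTC",       # primers for the C segment
    "GGAACACCTTGTTCAGGTC",
)
# Assumption: read = MID + primer + fragment, anchored at the read start.
MID_PRIMER_RE = re.compile("^(" + "|".join(MIDS) + ")(" + "|".join(PRIMERS) + ")")


def split_read(seq: str):
    """Return (mid, primer, fragment), or ("nomatch", None, seq) if the prefix is not recognised."""
    m = MID_PRIMER_RE.match(seq)
    if m is None:
        return "nomatch", None, seq
    return m.group(1), m.group(2), seq[m.end():]


def demultiplex(reads):
    """Group (name, sequence) pairs by (MID, primer); the primer stands in for the region."""
    bins = defaultdict(list)
    for name, seq in reads:
        mid, primer, fragment = split_read(seq)
        bins[(mid, primer)].append((name, fragment))
    return bins
```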

Page 12:


Page 13:

Identify V, J and C segments

IMGT website: reference sequences

BLAT all Roche sequences against 3 references

Selection: only store the first hit per reference

MID C J CDR3 V

MID J CDR3 V
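BLAT writes its alignments in PSL format: 21 tab-separated columns, preceded by a short header in the default output. Below is a Python sketch of the "first hit per reference" selection, meant to be run once per reference file (V, J, C). For each read it keeps the first alignment line it encounters. As on the slide, "first" means file order. The slide does not say whether the original Perl scripts sorted hits by score beforehand, and `PslHit` is a hypothetical helper type.

```python
from typing import Dict, Iterable, NamedTuple, Optional


class PslHit(NamedTuple):
    q_name: str   # read name (PSL column 10)
    t_name: str   # reference name, e.g. an IMGT TRBV entry (column 14)
    strand: str
    matches: int
    q_start: int  # 0-based, half-open read coordinates
    q_end: int
    t_size: int
    t_start: int  # 0-based, half-open reference coordinates
    t_end: int


def parse_psl_line(line: str) -> Optional[PslHit]:
    """Parse one PSL data line; return None for header, separator or blank lines."""
    f = line.rstrip("\n").split("\t")
    if len(f) < 21 or not f[0].isdigit():
        return None
    return PslHit(q_name=f[9], t_name=f[13], strand=f[8], matches=int(f[0]),
                  q_start=int(f[11]), q_end=int(f[12]),
                  t_size=int(f[14]), t_start=int(f[15]), t_end=int(f[16]))


def first_hit_per_read(lines: Iterable[str]) -> Dict[str, PslHit]:
    """Keep only the first alignment seen for each read in one PSL file."""
    first: Dict[str, PslHit] = {}
    for line in lines:
        hit = parse_psl_line(line)
        if hit is not None and hit.q_name not in first:
            first[hit.q_name] = hit
    return first
```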

Page 14:

Locate highly variable region (CDR3)

Get CDR3 sequence -> count

Determine deletions from V and J segment

Determine reading frame

MID C J CDR3 V

MID J CDR3 V
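Once the V and J alignments of a read are known, a candidate CDR3 can be taken as the stretch of the read between them. This minimal Python sketch assumes 0-based, half-open read coordinates, as in the PSL `qStart`/`qEnd` columns. The read diagrams show reads that start with J (primed from the J or C side), so the function sorts the two intervals and works for either order. It illustrates the idea only and is not the pipeline's actual boundary definition.

```python
from typing import Optional, Tuple

Interval = Tuple[int, int]  # (start, end) on the read, 0-based half-open


def cdr3_between(read: str, v_aln: Interval, j_aln: Interval) -> Optional[str]:
    """Return the read segment between the V and J alignments, or None if they overlap."""
    left, right = sorted([v_aln, j_aln])  # order-agnostic: J may come first in the read
    if left[1] > right[0]:
        return None  # overlapping alignments leave no unaligned stretch
    return read[left[1]:right[0]]
```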

Page 15:


Page 16:

Count variants: BLAT hits

[Figure: bar chart "Blat hits TRBV reference", showing frequency (y-axis) per region/MID/hit (x-axis). All bars are region 1; the MIDs are ACTA, CGAC, CTCG, TAGT and nomatch; the hits are X07192|TRBV12-3*01|Homo, X07223|TRBV12-5*01|Homo, M14264|TRBV12-4*02|Homo and X74844|TRBV7-9*06|Homo]
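The chart tallies reads per region, MID and TRBV reference hit. Given the per-read MID/region assignment and the per-read BLAT hit, that tally is a simple counter. This Python sketch assumes both are available as dictionaries keyed by read name; the `nohit` label for reads without a V hit is hypothetical. The sorted rows could then be exported to a spreadsheet, in line with the Access/Excel step of the pipeline.

```python
from collections import Counter
from typing import Dict, List, Tuple


def count_hits(read_bins: Dict[str, Tuple[str, str]], read_hits: Dict[str, str]) -> Counter:
    """Count reads per (region, MID, reference hit); reads without a hit count as 'nohit'."""
    counts: Counter = Counter()
    for read, (region, mid) in read_bins.items():
        counts[(region, mid, read_hits.get(read, "nohit"))] += 1
    return counts


def to_rows(counts: Counter) -> List[Tuple[str, str, str, int]]:
    """Flatten to (region, MID, hit, freq) rows, most frequent first."""
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(region, mid, hit, n) for (region, mid, hit), n in ordered]
```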

Page 17:

Count CDR3 variants

region cdr3_sequence freq

1 767

1 CTCCCTTTTTGGGGG 32

1 CCCTCAGGATTTCAGGG 30

1 TGGGTC 29

1 CAAACATGA 23

1 GGGACGGAGA 20

1 GGACAGT 20

1 CCCAGACAGG 19

1 AACCGGA 18

1 CTCCACTGGACACGT 18

1 ACGGG 18

1 CGCCCGGGACAGGGCCCTTCGGGG 18

1 CCACCCCGCGGCAGGAGGG 17

1 CCCACCGGGACAGGGGCGTC 17

1 GGTATACGGGCAGCGG 16

1 GACCTTGTGGTC 16

1 ACAGGGGGAG 16

1 GCGGG 16

Example of CDR3 sequences
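The table counts identical CDR3 nucleotide sequences within a region. This Python sketch does the same with an exact-match counter and sorts each region by descending frequency. Because matching is exact, a single sequencing error in this sketch produces a separate variant.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def count_cdr3(records: Iterable[Tuple[str, str]]) -> List[Tuple[str, str, int]]:
    """Count (region, cdr3_sequence) pairs; sort by region, then descending frequency."""
    counts = Counter(records)
    rows = [(region, cdr3, n) for (region, cdr3), n in counts.items()]
    return sorted(rows, key=lambda row: (row[0], -row[2], row[1]))
```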

Page 18:

Count deleted nt of V and J segment

Check where alignment stops with respect to the reference

[Figure: histograms "V segment deletions" and "J segment deletions", showing frequency (y-axis) per number of deleted nt (x-axis)]
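Deletions can be read off from where the alignment stops on the reference. For V, count the reference nucleotides after the alignment end; for J, count the reference nucleotides before the alignment start. This Python sketch uses PSL target coordinates (`tStart`, `tEnd`, `tSize`) and assumes the IMGT reference sequences are stored in coding (5' to 3') orientation. Alignment ends can be shifted by sequencing errors, or by inserted nucleotides that happen to match the reference, and either will bias these counts.

```python
from collections import Counter
from typing import Iterable


def v_deletion(t_end: int, t_size: int) -> int:
    """nt missing from the 3' end of the V reference: reference length minus alignment end."""
    return t_size - t_end


def j_deletion(t_start: int) -> int:
    """nt missing from the 5' end of the J reference: the 0-based alignment start."""
    return t_start


def deletion_histogram(values: Iterable[int]) -> Counter:
    """Frequency of each deletion length, as plotted per segment."""
    return Counter(values)
```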

Page 19:


Page 20:

To do next

- Determine reading frame
- Quality control

Future plans

- Detection of all TCR-beta variants
- TCR-alpha receptor variants
- Same procedure for B cells
- Screen patients for receptor variations

Page 21:

Page 22: