VarSeq WGS CNV Caller Tutorial Release 2.2.1 Golden Helix, Inc. June 02, 2020

CONTENTS

1 Setup

2 WGS CNV Calling Algorithm Overview

3 Importing Variant and Alignment Data

4 Running the CNV Caller

5 Performing Sample QC

6 Plotting CNV Data

7 Annotating CNVs

8 Conclusion

Updated: May 2020

Level: Advanced

Version: 2.2.1 or higher

Product: VarSeq

This tutorial covers the basics of the VarSeq Whole Genome CNV calling algorithm with an emphasis on visualization and interpretation of results.

Requirements

To complete this tutorial you will need to download and unzip the following file, which includes a starter project.

Important: This workflow requires an active VarSeq license with the CNV Caller on Binned Regions feature included. You can go to Discover VarSeq or email [email protected] to request an evaluation license with the CNV functionality included.

Download: VarSeq_WGS_CNV_Tutorial.zip

Files included in the above ZIP file:

• VarSeq WGS CNV Caller Tutorial - Starter project containing the variant and coverage data for 21 samples.

Note: VarSeq version 2.2.1 was used to create this tutorial. While every attempt will be made to keep this content relevant, it is possible that certain features or icons may change with newer releases.

CHAPTER

ONE

SETUP

The most recent version of VarSeq can be downloaded from here: VarSeq Download.

Select your operating system and download. Additional information for platform-specific installation can be found in the Installing and Initializing section of the manual.

The Setup Wizard will then guide you through the setup process.

On the final page of the Setup Wizard, select Finish with the Launch VarSeq option checked.

This will bring up the introductory VarSeq page, where new users can register their information. A confirmation email will then be sent to verify the email address.

Once the email has been confirmed, users can select the Login tab and enter their login email and password.

At this point, VarSeq Viewer mode is active and can be used. If the user already has a license key, it can be activated by selecting Help on the title bar and then selecting Activate a VarSeq License Key.

This will bring up a dialog where the license key can be entered. Enter your license key and select Verify.

Once the license key is verified, read the license agreement, select the I accept the license agreement option, and select Verify.

Congratulations! At this point, the product license is activated and you are ready to start an example project or a tutorial!

Note: During the initial installation process, the user will be asked where to store the AppData folder. Although this location can be changed after installation, it is recommended that multiple-user organizations select a shared drive location to increase ease of project sharing and to decrease redundancy.

CHAPTER

TWO

WGS CNV CALLING ALGORITHM OVERVIEW

VarSeq® software supports calling CNVs from coverage data computed from imported BAM files. This tutorial focuses on calling and interpretation of CNVs using VarSeq from whole genome sequencing (WGS) data.

In this tutorial, we will begin by opening an existing project containing computed binned coverage data for a number of samples. Using this coverage data, we will call CNVs, plot the CNV data, and interpret the results.

The project files are contained in the ZIP folder that accompanies this tutorial. This project contains variant and coverage data for 21 samples. After the ZIP folder has been downloaded, extract the contents to a convenient location.

The VarSeq WGS CNV calling algorithm relies on coverage information computed from BAM files. The algorithm uses changes in coverage relative to a collection of reference samples as evidence of CNV events. Using these reference samples, the algorithm computes two evidence metrics: Z Score and Ratio. The Z Score measures the number of standard deviations from the reference sample mean, while the Ratio is the normalized mean for the sample of interest divided by the average normalized mean for the reference samples. The utility of these metrics can be seen by looking at the het deletion event shown in Figure 1-1.

There are some strong recommendations for the composition of the reference samples:

• Use 30 or more reference samples

• Use samples derived from the same library prep methods, though they need not come from the same run

Figure 1-1. Ratio and Z Score for a Het Deletion in ATM gene.

In Figure 1-1, the drop in both Z Score and Ratio over multiple exons of the ATM gene provides supporting evidence for the called het deletion event.
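As a rough illustration of the two evidence metrics described above, the following Python sketch computes a Z Score and a Ratio for a single bin. It assumes each coverage value has already been normalized (for example, bin depth divided by the sample's mean depth) and uses the sample standard deviation. The function name `evidence_metrics` and the numbers are hypothetical, and VarSeq's internal normalization may differ.

```python
from statistics import mean, stdev


def evidence_metrics(sample_cov, reference_covs):
    """Return (z_score, ratio) for one bin.

    sample_cov: normalized coverage of the sample of interest (assumed input).
    reference_covs: normalized coverage of each reference sample for that bin.
    """
    ref_mean = mean(reference_covs)
    ref_sd = stdev(reference_covs)  # sample standard deviation (an assumption)
    z_score = (sample_cov - ref_mean) / ref_sd
    ratio = sample_cov / ref_mean
    return z_score, ratio


# Hypothetical bin: references sit near 1.0, the sample has about half the coverage.
refs = [0.98, 1.02, 1.00, 0.99, 1.01]
z, ratio = evidence_metrics(0.5, refs)
print(f"Z Score = {z:.1f}, Ratio = {ratio:.2f}")
```

A Ratio near 0.5 combined with a strongly negative Z Score is the pattern one would expect for a het deletion, where one of two copies is lost.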

The WGS Binned CNV caller is generally looking for large CNV events (on the scale of multiple genes or even an entire chromosome). The Z Score and Ratio metrics can exhibit some noise over these larger event regions and register a larger event as multiple smaller events. The solution to this problem is segmentation, which groups the smaller events into one larger event.

You can easily see this in Figure 1-2, where an entire chromosome 2 duplication run without segmentation has many small duplication events, but with segmentation, the large aneuploidy event is accurately represented.

Figure 1-2. Segmentation to detect large CNV events.

Using these metrics, the algorithm assigns a CNV state to each binned region and then merges these regions to obtain contiguous CNV events.
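The merging step can be pictured with a small Python sketch. This is not the segmentation algorithm used by VarSeq; it is a simple run-length merge that joins consecutive bins sharing a chromosome and CNV state. It assumes the bins are sorted and contiguous, and the `Diploid` label for the normal state is an assumption made for illustration.

```python
from itertools import groupby


def merge_bins(bins, normal_state="Diploid"):
    """Merge consecutive bins with the same chromosome and CNV state.

    bins: list of (chrom, start, end, state) tuples, sorted by position and
          assumed contiguous. Returns one (chrom, start, end, state) per event.
    """
    events = []
    for (chrom, state), run in groupby(bins, key=lambda b: (b[0], b[3])):
        run = list(run)
        if state != normal_state:
            # The event spans from the first bin's start to the last bin's end.
            events.append((chrom, run[0][1], run[-1][2], state))
    return events


# Hypothetical per-bin states.
bins = [
    ("2", 0, 1000, "Diploid"),
    ("2", 1000, 2000, "Duplicate"),
    ("2", 2000, 3000, "Duplicate"),
    ("2", 3000, 4000, "Diploid"),
    ("3", 0, 1000, "Duplicate"),
]
events = merge_bins(bins)
print(events)
```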

Once a set of CNV events has been called, quality control flagging is performed to identify unreliable samples and potentially problematic CNV calls. These QC flags are applied to both CNV events and samples.

The following are examples of CNV event flags:

• Low reference sample read depth in the surrounding region;

• High variation in the region between reference samples; and

• Ratio or Z Score falling within the noise of the surrounding region.

The following are examples of Sample flags:

• Samples whose metrics have extremely high variation;

• Samples have very low mean depth; and

• Samples differ significantly from the selected reference samples.

By flagging these events and samples, we provide a second layer of heuristics, which can be used to reduce false positives and identify questionable CNV calls.

CHAPTER

THREE

IMPORTING VARIANT AND ALIGNMENT DATA

Important: The starter project provided in this tutorial already contains the variants and coverage data for 21 samples. In this portion of the tutorial we will show you how the import of the VCF variant data was completed and how the coverage data was computed on the BAM files, so you can also follow along using your own data instead of using the provided project.

If you are already familiar with this process or will be working with the project provided for this tutorial, please skip to the Running the CNV Caller section of the tutorial.

As mentioned earlier, the VS-CNV algorithm uses changes in coverage relative to a collection of reference samples as evidence of CNV events. To create a set of reference samples to be used as a basis for CNV calling, users can compute coverage on BAM files using the Reference Sample Manager.

• Open VarSeq and click Tools > Manage Reference Samples. This menu computes coverage on BAM files and subsequently adds CNV Reference samples to the reference sample library.

Figure 2-1. Opening the Reference Sample Manager

• Click on the Add References button and select Add Files on the first screen of Add CNV References to add sample BAMs.

• Ensure that Binned References is selected. Next, if there are regions to exclude, click on Select Track to browse to the interval track (BED file) that defines the regions over which coverage will not be calculated. Note that users can import their own BED files using the Convert Wizard. Once an interval track has been selected, click Create to create a set of reference samples to be used as a basis for CNV calling.

Figure 2-2. Adding BAM files

Figure 2-3. Selecting Interval Track.

Now that you have added samples to the reference sample set, you can create a VarSeq Project and import samples to call CNVs on.

• Open VarSeq and click Create New Project. Select the Empty Project option. Select your genome assembly and a name for the project and click OK.

• Click on the Import Variants button and select Add Files on the first screen of the Import Variants Wizard.

• Navigate to the directory where your VCF files are saved and select them for import (as seen in Figure 2-5). Then click Next >.

Note: If you do not use the Manage Reference Samples option to import your reference samples as mentioned above, you will need to import enough samples to build your Reference Panel. 30 samples is the recommended number of reference samples. Therefore, you will want to import at least 31 samples: 30 used for reference and an additional sample for analysis.

Once the 31 samples are processed through the CNV tool, VarSeq will save the coverage profile for these samples in the Coverage Reference Samples folder found in the VarSeq User Data location on your computer (Tools > Open Folder > Reference Samples Folder).

For any subsequent run of the algorithm you can import any number of samples for analysis, and VarSeq will pull a reference set of samples from those available in the Reference Sample Folder.

• If importing into an Empty Project you can select the Individual Samples option in the Sample Relationships dialog. Click Next >.

On the next dialog we will be associating the BAM files with the imported VCF files so that Targeted Region Coverage can be computed.

Figure 2-4. Create New Project Dialog.

Figure 2-5. Select VCF files for Import.

Figure 2-6. Select Sample Relationships.

• Click Associate BAM File at the top of the dialog and navigate to the directory where your BAM files are stored. If your BAM file names match the sample name or file name for the VCF file, they should be automatically associated; if not, manually select each BAM file. Click OK once done.

Figure 2-7. Associate BAM Files to VCFs.

The BAM file paths should now be filled out for each sample on the import dialog.
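Before importing, it can help to check which VCF files have a BAM with a matching name. The Python sketch below pairs files by base name only; the helper `associate_bams` and the example paths are hypothetical, and VarSeq's own matching rule (which can also use the sample name inside the VCF) is not reproduced here.

```python
from pathlib import Path


def associate_bams(vcf_paths, bam_paths):
    """Map each VCF path to the BAM path with the same base name, or None."""
    bams_by_stem = {Path(b).stem: b for b in bam_paths}
    matches = {}
    for vcf in vcf_paths:
        name = Path(vcf).name
        # Strip ".vcf.gz" as a unit; otherwise drop only the final suffix.
        if name.endswith(".vcf.gz"):
            stem = name[: -len(".vcf.gz")]
        else:
            stem = Path(vcf).stem
        matches[vcf] = bams_by_stem.get(stem)
    return matches


# Hypothetical file names.
matches = associate_bams(
    ["data/Female_1.vcf.gz", "data/Male_2.vcf"],
    ["bams/Female_1.bam"],
)
for vcf, bam in matches.items():
    print(vcf, "->", bam if bam else "no match, select manually")
```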

• Click Next > and Finish to complete the VCF variant data import.

Next, compute the binned coverage required to detect CNVs.

• Go to Add > Computed Data... to bring up the different algorithm options for the project.

Scroll near the bottom to the Sample section, select Binned Region Coverage, and then click OK.

The Binned Region Parameters dialog then appears with different options, such as Additional Depth Threshold and the option to mask specific regions. For this tutorial, leave the default options and select OK.

Note: Samples will only be matched to reference samples with the same bin size.

Once this computation finishes you are ready to begin CNV calling.

Figure 2-8. BAM Files Associated in Dialog.

Figure 2-9. VarSeq Variant Table.

Figure 2-10. Adding Coverage Tables.

Figure 2-11. Selecting the Binned Region Coverage algorithm.

Figure 2-12. Running the Binned Region Coverage algorithm.

CHAPTER

FOUR

RUNNING THE CNV CALLER

When you open the example project accompanying this tutorial, you will be greeted by the VarSeq Coverage Regions table. This table includes information about the read depth of each coverage region for the sample of interest.

To call CNVs over these coverage regions:

• Click the Add button in the upper left-hand corner of the window

• Select Computed Data

• Change the dropdown menu to Coverage Regions

• Select CNV Caller on Binned Regions.

This will open up the Binned CNV Caller dialog window.

The options presented here include the following:

• Minimum Number of Reference Samples: The minimum number of reference samples to be selected by the algorithm.

• Maximum Number of Reference Samples: The maximum number of reference samples to be selected by the algorithm.

• Exclude reference samples with percent difference greater than: This option will filter reference samples with a percent difference above the specified value after a minimum of 10 samples have been selected.

• Add samples to reference set: This option adds the current project's samples to the set of reference samples.

• Independently normalize non-autosomal targets: If this option is selected, non-autosomal targets will not be normalized using the autosomal targets, but will instead be normalized separately. This option should be used if few non-autosomal targets are present, or if the entire X or Y chromosomes are likely to be deleted or duplicated.

• Reference Sample Folder: Specifies the file location where the reference samples are stored.

• Z-Score Threshold: Specifies the Z Score cutoff threshold for calling CNV events.

• Controls average target mean depth below: Flags targets with average reference sample depth below the specified value.

• Controls variation coefficient above: Flags targets for which the variation coefficient is above the specified value. A high variation coefficient indicates that there is extreme variation in reference sample coverage for the target region.

• Use optimal segmentation algorithm (slower): Uses the optimal segmentation algorithm, which takes more time to complete.
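To make the two "Controls" options above more concrete, here is a minimal Python sketch of how a target could be flagged from its reference sample depths. The variation coefficient is taken to be the standard deviation divided by the mean, which is the usual definition; the flag labels, the thresholds, and the function name `target_flags` are hypothetical and are not VarSeq's own.

```python
from statistics import mean, stdev


def target_flags(reference_depths, min_mean_depth, max_variation_coefficient):
    """Return a list of illustrative flags for one target region."""
    flags = []
    avg = mean(reference_depths)
    if avg < min_mean_depth:
        flags.append("Low Controls Depth")  # hypothetical label
    # Coefficient of variation: standard deviation relative to the mean.
    cv = stdev(reference_depths) / avg if avg > 0 else float("inf")
    if cv > max_variation_coefficient:
        flags.append("High Controls Variation")  # hypothetical label
    return flags


# Hypothetical reference depths for two targets, with made-up thresholds.
stable = target_flags([100, 102, 98, 101], min_mean_depth=30, max_variation_coefficient=0.3)
noisy = target_flags([5, 40, 2, 60], min_mean_depth=30, max_variation_coefficient=0.3)
print(stable, noisy)
```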

Leave the default options and run the CNV calling algorithm:

• Click OK

Figure 3-1. Coverage Region Table.

Figure 3-2. Select Binned Region CNV algorithm.

Figure 3-3. Binned Region CNV Algorithm Options.

When the algorithm runs, it will select a set of reference samples for each sample in the project. The reference set is chosen from the collection of samples in the reference folder that share the same binned regions as the sample of interest. The algorithm selects those samples that are most similar to the sample of interest in terms of normalized coverage.

Because we chose to Add samples to reference set, the 21 samples in our coverage table will first be placed in our reference set and then used by the algorithm. If one of the project samples was already added to the reference sample set, it will not be duplicated in the CNV analysis.
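The reference selection described above can be sketched in Python as a ranking by similarity. The percent difference formula used here (summed absolute difference in normalized coverage, relative to the sample's total) is an assumption chosen for illustration, as are the function and parameter names; the actual similarity measure is not specified in this tutorial.

```python
def percent_difference(sample, reference):
    """Assumed similarity measure: summed absolute difference, as a percentage."""
    total_diff = sum(abs(s - r) for s, r in zip(sample, reference))
    return 100.0 * total_diff / sum(sample)


def select_references(sample, candidates, min_refs, max_refs, max_pct_diff):
    """Pick the most similar candidates.

    candidates: dict mapping reference name -> normalized coverage per bin.
    The percent difference cutoff only applies once min_refs are selected.
    """
    ranked = sorted(candidates, key=lambda n: percent_difference(sample, candidates[n]))
    chosen = []
    for name in ranked[:max_refs]:
        too_different = percent_difference(sample, candidates[name]) > max_pct_diff
        if len(chosen) >= min_refs and too_different:
            break  # remaining candidates are even less similar
        chosen.append(name)
    return chosen


# Hypothetical normalized coverage over four bins.
sample = [1.0, 1.0, 1.0, 1.0]
candidates = {
    "ref_a": [1.0, 1.0, 1.0, 1.0],
    "ref_b": [1.1, 0.9, 1.0, 1.0],
    "ref_c": [1.5, 0.5, 1.0, 1.0],
    "ref_d": [2.0, 0.0, 1.0, 1.0],
}
chosen = select_references(sample, candidates, min_refs=2, max_refs=3, max_pct_diff=20)
print(chosen)
```

In this sketch, raising `min_refs` to 3 would keep `ref_c` even though it exceeds the cutoff, mirroring the rule that the filter applies only after the minimum number of samples has been selected.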

CHAPTER

FIVE

PERFORMING SAMPLE QC

Once the CNV caller finishes computing results, a new table will be created labeled CNVs. This table contains the information related to each CNV called by the algorithm, but before examining these results, users should always perform sample-level quality control. This can be done by exploring the sample table, which is now populated with several useful metrics related to the CNV algorithm.

To open the sample table in VarSeq, click on the plus sign on the tab bar and then select Table.

This will open up a new blank table tab. From the Select Table Type... dropdown menu, select Samples to display the Samples table.

You will notice that the Samples table has the column groups (from left to right) of "Sample Info", "Binned Coverage Statistics", and "Copy Number Variants". The first group, "Sample Info", displays the Sample Name, Affection status, Sex, and BAM Path. The next group shows the coverage statistic information associated with the Binned Coverage algorithm used to compute the Coverage Statistics. The third group displays the information for each CNV called. Scroll over to the "Copy Number Variants" heading.

The most useful field for sample QC is the “Sample Flags” field. This field will list one or more of the following flagsif the sample fails any of our quality tests:

• High IQR: High interquartile range for Z-score and ratio. This flag indicates that there is high variance between targets for one or more of the evidence metrics.

• Low Sample Mean Depth: Sample mean depth below 30.

• Mismatch to reference samples: Match score indicates low similarity to control samples.

• Mismatch to non-autosomal reference samples: Match score indicates low similarity to non-autosomal control samples.

• Few Gender Matches: Not enough reference samples with matching gender to call X and Y CNVs.

If any of the first three flags are listed for a given sample, then all CNV calls associated with the sample will most likely be unreliable, while if either of the last two flags is present, then CNV calls in non-autosomal regions will be unreliable.
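The rule above can be written down as a small Python helper for triaging samples. The flag names are taken from the list above; representing the "Sample Flags" field as a list of strings, and the function name `unreliable_scope`, are assumptions made for this sketch.

```python
# Flags that make every CNV call for the sample suspect.
ALL_CALLS_FLAGS = {
    "High IQR",
    "Low Sample Mean Depth",
    "Mismatch to reference samples",
}
# Flags that only affect calls on non-autosomal chromosomes.
NON_AUTOSOMAL_FLAGS = {
    "Mismatch to non-autosomal reference samples",
    "Few Gender Matches",
}


def unreliable_scope(sample_flags):
    """Return which CNV calls to treat as unreliable: 'all', 'non-autosomal', or 'none'."""
    flags = set(sample_flags)
    if flags & ALL_CALLS_FLAGS:
        return "all"
    if flags & NON_AUTOSOMAL_FLAGS:
        return "non-autosomal"
    return "none"


print(unreliable_scope(["High IQR"]))
print(unreliable_scope(["Few Gender Matches"]))
print(unreliable_scope([]))
```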

Notice the five highlighted samples in Figure 4-3 with the High IQR flag. The low matching quality of these samples may warrant rerunning the samples to improve their quality to better match the additional reference samples.

In addition to QC flags, the sample table also provides summary information about the number of CNVs called, the inferred gender of the sample, the reference samples chosen, and the percent difference between each sample and its reference set.

Figure 4-1. Adding a new table.

Figure 4-2. Selecting the Samples Table.

Figure 4-3. Samples table view.

CHAPTER

SIX

PLOTTING CNV DATA

Now that we have performed sample-level QC, we can filter and plot our CNV calls, along with the relevant evidence.

To switch from the sample table to the CNV table:

• Select the CNVs tab from the tab bar.

The CNV table provides many useful fields for filtering CNV calls, and plotting the CNV results can be helpful when performing analysis.

Before plotting, the CNV State column can be queried to exclude missing values. This can be done by right-clicking on the CNV State column header and selecting Query Column Values.

This opens up a new filter tag along the top of the CNV tab. Click on the question mark to display the different options in this column.

Checking all of the options in this list keeps them in the column and removes any missing values. The selected options in this example are Duplicate, Het Deletion, Deletion, and CN LoH. The selected options will now appear in the query filter tab in the header; clicking anywhere on the screen outside of the query value selection window sets the currently selected configuration and closes the window.

Although there are no CNVs called in this sample (Female 1), we will continue to set up the CNV analysis and then look at a different sample with a CNV called.

Now that the missing CNV State values are not being shown, the CNV State column can be added to the filter chain to isolate the specific events per sample. This is done by right-clicking on the CNV State column header and selecting Add to Filter Chain.

This allows for the selection of CN LoH events, Deletions, Duplications, and Het Deletions for the given sample.
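For readers who think in code, the effect of this filter can be sketched in Python on a list of rows. The `CNV State` column name and the four state values come from the tutorial; treating the table as a list of dictionaries with `None` for missing values is an assumption, and this is not how VarSeq stores its tables.

```python
CALLED_STATES = frozenset({"Duplicate", "Het Deletion", "Deletion", "CN LoH"})


def filter_cnvs(rows, keep_states=CALLED_STATES):
    """Keep rows whose CNV State is one of the selected values.

    Rows with a missing state (None or absent key) are dropped.
    """
    return [row for row in rows if row.get("CNV State") in keep_states]


# Hypothetical rows.
rows = [
    {"Region": "chr3:1-1000", "CNV State": "Duplicate"},
    {"Region": "chr3:1000-2000", "CNV State": None},
    {"Region": "chr5:1-1000", "CNV State": "Het Deletion"},
]
called = filter_cnvs(rows)
deletions_only = filter_cnvs(rows, keep_states={"Het Deletion", "Deletion"})
print(len(called), len(deletions_only))
```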

This field can also be plotted by right-clicking on the CNV State column header and selecting Plot for Current Sample.

This will open a GenomeBrowse view containing the CNV State of the current sample plotted alongside the gene track. You may have to click on the CNV row in the CNV table to navigate directly to the event.

In addition to the CNV state, it is also useful to plot the evidence used to call the CNVs. To do this:

• Open the coverage table by selecting the Coverage Regions tab.

This table contains the CNV data associated with each bin coverage region. This includes the regional CNV State, Flags, Z Score, Ratio, and the number of variants for which VAF was considered.

The two primary pieces of evidence used to call CNVs are the Z Score and Ratio.

To plot these fields:

• Right-click on the Z Score column, then select Plot for Current Sample.

• Then, right-click on the Ratio column, then select Plot for Current Sample.

Figure 5-1. CNVs Table View.

Figure 5-2. Query the CNV State column values.

Figure 5-3. Query the CNV State column values options.

Figure 5-4. Query the CNV State column values selection.

Figure 5-5. Building CNV Filter Chain.

Figure 5-6. CNV Filtering.

Figure 5-7. Right-Click Column Menu.

Figure 5-8. GenomeBrowse window with CNV State.

Figure 5-9. Z Score and Ratio added to GenomeBrowse.

Next we will take a look at one of the CNV entries in a different sample. To switch samples:

• Open the dropdown menu on the title bar and select the sample Female 2.

Figure 5-10. Changing current sample.

Next, take a look at the CNV calls associated with this sample:

• Select the CNVs table tab.

There are two CNVs called for this sample. Both are Duplications found in chromosome 3, and they are highlighted in Figure 5-11.

Now take a closer look by selecting the first row of the CNV table.

When the table view window and GenomeBrowse window are both displayed, the GenomeBrowse view will move to the detailed genomic region around the selected CNV. Here, the user can more easily notice the elevated Z Score and Ratio values for the CNV event compared to the surrounding diploid regions.

Figure 5-11. CNV table for Female 2 sample.

Figure 5-12. Closer look at CNV.

CHAPTER

SEVEN

ANNOTATING CNVS

The CNVs that are found with the Binned Region coverage algorithm are usually large events on the scale of entire genes or even entire chromosomes. For the previous example, we can determine which genes are overlapped by the CNV by taking the following steps:

• Click on the Add title bar icon and then Computed Data...

• Change the dropdown menu on the top from Variants to CNVs

• Select Annotate Overlapping Genes from the Project/Cohort section.

Figure 6-1. Add CNV annotations menu.

This brings up the Select Data Source window, which allows the user to select a gene annotation track. For this example, select RefSeq Genes 105 Interim v3, NCBI and then click on Select.

Figure 6-2. CNV select data source menu for overlapping genes selection.

This brings up the option to select preferred transcripts next, but we will stick with the default options, so click OK.

The overlapping gene results can now be found by scrolling to the right on the CNV tab. The first field listed under the Overlapping Genes RefSeq Genes 105 Interim v3, NCBI heading is Gene Names. Looking at our example CNV in chromosome 3, the list of genes overlapped includes 19 different genes.
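Conceptually, the overlapping genes annotation is an interval overlap test between the CNV and each gene in the annotation track. The Python sketch below shows that test under the assumption of half-open coordinates; the gene names and positions are made up, and the real annotation also handles transcripts and other fields not modeled here.

```python
def overlapping_genes(cnv, genes):
    """Return sorted names of genes overlapping a CNV.

    cnv: (chrom, start, end); genes: iterable of (chrom, start, end, name).
    Intervals are treated as half-open [start, end), an assumption of this sketch.
    """
    chrom, start, end = cnv
    return sorted(
        {
            name
            for g_chrom, g_start, g_end, name in genes
            if g_chrom == chrom and g_start < end and start < g_end
        }
    )


# Hypothetical gene track and CNV.
genes = [
    ("3", 100, 200, "GENE_A"),
    ("3", 150, 400, "GENE_B"),
    ("3", 500, 600, "GENE_C"),
    ("4", 100, 200, "GENE_D"),
]
hits = overlapping_genes(("3", 180, 450), genes)
print(hits)
```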

Figure 6-3. CNV overlapping genes options dialog.

Figure 6-4. Viewing overlapping gene data.

CHAPTER

EIGHT

CONCLUSION

This tutorial was designed to provide a demonstration of VarSeq’s WGS CNV calling capabilities.

If you are interested in getting a demo license to try out this and other features, please request a demo from: Discover VarSeq

Additional features and capabilities are being added all the time, so if you do not see a feature you need for your workflows, please do not hesitate to let us know at Golden Helix Support!
