The Variants tab is used to set parameters for the variant analysis phase of the assembly. To access the tab from the Analysis Options screen, click the Advanced Analysis Options button, then click the Variants tab. The options available in this tab vary depending on the workflow.
The table below shows editable options in alphabetical order; each workflow includes a subset of these options. Default parameters vary according to the sequencing technology and project type specified elsewhere in the wizard, and values seldom need to be changed.
Parameter | Description |
---|---|
Heterozygous peak threshold | This option is only available if you are using Sanger trace data. It is designed to identify positions in a read that contain two different bases that are both real. This can occur, for example, when you sequence a PCR product from a diploid genome at sites that are heterozygous. The percentage threshold is the minimum height of the secondary peak relative to the primary peak that’s required to call the second base. Increasing the percentage increases the stringency at the cost of potentially increasing false negatives; decreasing the percentage calls more positions at the cost of potentially increasing false positives. |
Editable Variant Filters section | |
This section lets you specify the non-permanent “soft” filters for SNP data. SNPs that do not meet thresholds specified in this section are removed from certain displays (e.g., tables) but are still retained in the final project and may be displayed in downstream analysis, if desired. | |
Filter stringency | Use the drop-down menu to specify low, medium, high or custom stringency. Choosing custom enables the next three options in the dialog. Otherwise, these options are disabled and are instead populated with unchangeable default values based on your stringency selection. |
Minimum variant percentage | The minimum percent of non-reference bases required to call a SNP. When it performs SNP passes, SeqMan NGen will include regions in an assembly that have coverage less than or equal to the specified value. The default value is 5. A non-zero value is recommended when using Ion Torrent data, or working with larger genomes or doing population studies. Very low values will lead to larger files, but do not necessarily result in better SNP calls. This is only enabled when Custom is chosen as the Filter stringency. |
P not ref | The minimum SNP quality score (Qcall) required to include a position as a putative SNP. For more information on the several ways to set P not ref, see the topic Filter based on P not ref. This is only enabled when Custom is chosen as the Filter stringency. |
Depth | The minimum depth of coverage required to include a position as a putative SNP. This is only enabled when Custom is chosen as the Filter stringency. |
Fixed Variant Filters section | |
This section lets you specify permanent “hard” filters for SNP data. SNPs that do not meet thresholds specified in this section are permanently deleted without saving and will not be displayed at any point downstream. | |
Minimum variant percentage | The minimum percent of non-reference bases required to call a SNP. When it performs SNP passes, SeqMan NGen will include regions in an assembly that have coverage less than or equal to the specified value. The default value is 5. A non-zero value is recommended when using Ion Torrent data, or working with larger genomes or doing population studies. Very low values will lead to larger files, but do not necessarily result in better SNP calls. Minimum variant percentage and Minimum variant count can be used in tandem to control the number of reportable SNPs, and by extension, the size of the SNP table. |
P not ref | |
Minimum variant count | The minimum number of non-reference bases required to call a SNP. When it performs SNP passes, SeqMan NGen will include regions in an assembly that have coverage less than or equal to the specified value. Minimum variant percentage and Minimum variant count can be used in tandem to control the number of reportable SNPs, and by extension, the size of the SNP table. |
Minimum base quality score | The minimum quality score a base must have to be considered. Bases with a quality score below this value are ignored. |
Minimum strand coverage | The minimum number of reads from each strand required to call a variant at a given position. |
Maximum strand bias | Strand bias (SB) for a SNP is the bias for the SNP appearing on one strand versus the other. It is measured relative to the strand bias in the assembly at the location of the SNP. For example, in a column with 60 forward reads and 40 reverse reads, 6 SNP bases on the forward strand and 4 on the reverse strand would be unbiased. SB is given by the formula $\mathrm{SB} = \lvert \mathrm{SNP\%}_{f} - \mathrm{SNP\%}_{r} \rvert / \mathrm{SNP\%}_{total}$, where $\mathrm{SNP\%}_{f}$ and $\mathrm{SNP\%}_{r}$ are the percentages of reads containing the variant on the forward (top) and reverse (bottom) strands, respectively, and $\mathrm{SNP\%}_{total}$ is the total percentage of reads containing the variant. Because SB is calculated from an absolute value, it is never negative. |
Bases to mask at ends of reads | The specified number of bases from both the 5’ and 3’ ends of each read will be masked from the SNP caller and will not be considered during variant calling. |
Bayesian-based removal of heterozygous indels | Check this box to turn on H-factor, a Bayesian-based model that excludes heterozygous calls. If you want to view the MID column in the ArrayStar SNP Report, you must check this box. By default, the box is unchecked. |
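
The arithmetic behind two of the options above, Maximum strand bias and Heterozygous peak threshold, can be sketched in a few lines of Python. This is only an illustration of the formulas as described in the table, not SeqMan NGen's actual implementation. The function names, the argument names, the handling of zero coverage, and the choice to treat the peak threshold as inclusive are all assumptions made for this example.

```python
def strand_bias(fwd_reads, rev_reads, fwd_variant, rev_variant):
    """SB = |SNP%_f - SNP%_r| / SNP%_total (hypothetical helper).

    Percentages are handled as fractions; the ratio is the same either way.
    """
    total_variant = fwd_variant + rev_variant
    if fwd_reads == 0 or rev_reads == 0 or total_variant == 0:
        # Assumption: the formula is undefined without coverage on both
        # strands or without any variant reads, so refuse to compute it.
        raise ValueError("need reads on both strands and at least one variant read")
    pct_f = fwd_variant / fwd_reads            # variant fraction, forward strand
    pct_r = rev_variant / rev_reads            # variant fraction, reverse strand
    pct_total = total_variant / (fwd_reads + rev_reads)
    return abs(pct_f - pct_r) / pct_total


def calls_second_base(primary_height, secondary_height, threshold_percent):
    """True if the secondary peak reaches the threshold (hypothetical helper).

    The threshold is the minimum secondary peak height as a percentage
    of the primary peak height; equality is assumed to pass.
    """
    return secondary_height * 100 >= threshold_percent * primary_height


# The example from the table: 60 forward and 40 reverse reads,
# with 6 and 4 variant bases respectively, is unbiased.
print(strand_bias(60, 40, 6, 4))

# All 10 variant bases on the forward strand gives a large bias.
print(strand_bias(60, 40, 10, 0))

# A secondary peak at 40% of the primary passes a 30% threshold.
print(calls_second_base(1000, 400, 30))
```

In this sketch, a SNP would be kept only if its strand bias is at or below the Maximum strand bias value, and raising the peak threshold makes `calls_second_base` return `True` for fewer positions, matching the stringency trade-off described for Sanger trace data.
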
Once you are finished, click OK to save changes and return to the Assembly Options screen, or Cancel to return without saving changes.