The Variants tab is used to set parameters for the variant analysis phase of the assembly. To access the tab from the Analysis Options screen, click the Advanced Analysis Options button, then click the Variants tab. The options available in this tab vary depending on the workflow.

  • The Editable Variant Filters section lets you specify the non-permanent “soft” filters for SNP data. SNPs that do not meet thresholds specified in this section are removed from certain displays (e.g., tables) but are still retained in the final project and may be displayed in downstream analysis, if desired.
  • The Fixed Variant Filters section lets you specify permanent “hard” filters for SNP data. SNPs that do not meet thresholds specified in this section are permanently deleted without saving and will not be displayed at any point downstream.
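
The difference between the two filter types can be illustrated with a minimal sketch. This is not SeqMan NGen's implementation; the `Snp` class, the `passes_soft` flag, and the threshold values are all hypothetical and exist only to show that a hard filter discards a SNP outright, while a soft filter merely marks it as hidden from certain displays.

```python
from dataclasses import dataclass


@dataclass
class Snp:
    position: int
    variant_pct: float
    passes_soft: bool = True  # hypothetical flag: shown in filtered displays?


def apply_filters(snps, hard_min_pct, soft_min_pct):
    """Drop SNPs that fail the hard threshold; flag those that fail the soft one."""
    kept = []
    for snp in snps:
        if snp.variant_pct < hard_min_pct:
            continue  # hard (fixed) filter: SNP is not retained at all
        # soft (editable) filter: SNP is retained but may be hidden from tables
        snp.passes_soft = snp.variant_pct >= soft_min_pct
        kept.append(snp)
    return kept


# Hypothetical example values, chosen only for illustration
snps = [Snp(10, 2.0), Snp(20, 8.0), Snp(30, 30.0)]
kept = apply_filters(snps, hard_min_pct=5.0, soft_min_pct=15.0)
```

In this sketch the SNP at position 10 is gone for good, the SNP at position 20 is kept but flagged as failing the soft filter, and the SNP at position 30 passes both.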

Editable options differ by workflow, and some workflows feature only a subset of the options below. Default parameters vary according to the sequencing technology and project type specified elsewhere in the wizard, and the values seldom need to be changed.

Parameter Description
Heterozygous peak threshold This option is only available if you are using Sanger trace data. It is designed to identify positions in a read that contain two different bases that are both real. This can occur, for example, when you sequence a PCR product from a diploid genome at sites that are heterozygous. The percentage threshold is the minimum height of the secondary peak relative to the primary peak that’s required to call the second base. Increasing the percentage increases the stringency at the cost of potentially increasing false negatives; decreasing the percentage calls more positions at the cost of potentially increasing false positives.
Filter stringency Use the drop-down menu to specify low, medium, high, or custom stringency. Choosing custom enables the next three options in the dialog. Otherwise, these options are disabled and populated with unchangeable default values based on your stringency selection.
Minimum variant percentage The minimum percentage of non-reference bases required to call a SNP. When it performs SNP passes, SeqMan NGen will include regions in an assembly that have coverage less than or equal to the specified value. The default value is 5. A non-zero value is recommended when using Ion Torrent data, working with larger genomes, or doing population studies. Very low values lead to larger files but do not necessarily result in better SNP calls. This option is only enabled when Custom is chosen as the Filter stringency.
P not ref (Editable Variant Filters) The minimum SNP quality score (Qcall) required to include a position as a putative SNP. For more information on the several ways to set P not Ref, see the topic Filter based on P not ref. This is only enabled when Custom is chosen as the Filter stringency.
Depth The minimum depth of coverage required to include a position as a putative SNP. This is only enabled when Custom is chosen as the Filter stringency.
Minimum variant percentage The minimum percentage of non-reference bases required to call a SNP. When it performs SNP passes, SeqMan NGen will include regions in an assembly that have coverage less than or equal to the specified value. The default value is 5. A non-zero value is recommended when using Ion Torrent data, working with larger genomes, or doing population studies. Very low values lead to larger files but do not necessarily result in better SNP calls. Minimum variant percentage and Minimum variant count can be used in tandem to control the number of reportable SNPs and, by extension, the size of the SNP table.
P not ref (Fixed Variant Filters) The minimum SNP quality score (Qcall) required to include a position as a putative SNP. For more information on the several ways to set P not Ref, see the topic Filter based on P not ref.
Minimum variant count The minimum number of non-reference bases required to call a SNP. When it performs SNP passes, SeqMan NGen will include regions in an assembly that have coverage less than or equal to the specified value. Minimum variant percentage and Minimum variant count can be used in tandem to control the number of reportable SNPs and, by extension, the size of the SNP table.
Minimum base quality score The minimum quality score below which a base will not be considered.
Check strands Place a checkmark in the Check strands box to enable the following options:

  • Minimum strand coverage – The minimum number of reads from each strand required to call a variant at a given position.

  • Maximum strand bias – The Strand Bias (SB) for a SNP is the bias for the SNP appearing on one strand versus the other. It is measured relative to the strand bias in the assembly at the location of the SNP. For example, in a column with 60 forward reads and 40 reverse reads, 6 SNP bases on the forward strand and 4 on the reverse strand would be unbiased.

    SB is given by the formula: SB = |SNP%_f – SNP%_r| / SNP%_total

    …where SNP%_f and SNP%_r are the percentages of reads containing the variant on the forward (top) and reverse (bottom) strands, respectively, and SNP%_total is the total percentage of reads containing the variant. Because SB is calculated from an absolute value, it is never negative.

    • 0 – Perfectly balanced (unbiased) strands. Reads with variants are present on both strands, and variants appear equally on both strands.

    • 0-1, not inclusive – As the value approaches 1, more variants are called whose variant-containing reads are unbalanced between the strands at that position.

    • 1 – All variant-containing reads are on a single strand.

Note: If Maximum strand bias is blank or absent in the wizard, this indicates that the corresponding scripting parameter has been turned off in the script. For more information and an example, search this help document for the scripting parameter snp_maxStrandBias.
Bases to mask at ends of reads The specified number of bases from both the 5’ and 3’ ends of each read will be masked from the SNP caller and will not be considered during variant calling.
Bayesian-based removal of heterozygous indels Check this box to turn on H-factor, a Bayesian-based model that excludes heterozygous calls. If you want to view the MID column in the ArrayStar SNP Report, you must check this box. By default, the box is unchecked.
Force uniform weights This option pertains to Illumina data in cases where the forward and reverse reads are largely or completely overlapping. In this situation, SeqMan NGen first aligns the two reads of each pair and generates a single consensus for that pair. Most of the bases will have good quality scores, but near the ends, the quality may be poorer. As a result, some real variants could be missed. To mitigate this risk, check the box.
Combine multi-base variants This option is of interest mainly when following a resequencing workflow. From the menu, select one of the following options:

  • All – Combine multi-base variants for both insertions and deletions (default).

  • None – Never combine multi-base variants.

  • Insertions only – Combine multi-base variants only in the case of an insertion.
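
Several of the parameters in the table above amount to simple per-position or per-read checks, and the sketch below illustrates three of them: the heterozygous peak threshold, the combined use of Depth, Minimum variant percentage, and Minimum variant count, and masking bases at read ends. It is an illustration only, not SeqMan NGen's code. All function names are hypothetical, the choice of inclusive comparisons (`>=`) is an assumption, and the example values are invented apart from the documented default of 5 for Minimum variant percentage.

```python
def is_heterozygous_peak(primary_height, secondary_height, threshold_pct):
    """Call a second base if the secondary peak is tall enough relative to the primary."""
    if primary_height <= 0:
        return False
    return 100.0 * secondary_height / primary_height >= threshold_pct


def passes_thresholds(variant_reads, total_reads,
                      min_depth, min_variant_pct, min_variant_count):
    """Check depth, variant percentage, and variant count together."""
    if total_reads == 0 or total_reads < min_depth:
        return False
    variant_pct = 100.0 * variant_reads / total_reads
    return variant_reads >= min_variant_count and variant_pct >= min_variant_pct


def unmasked_range(read_length, mask_bases):
    """Return (start, end) indices of bases left visible after masking both ends."""
    start = min(mask_bases, read_length)
    end = max(start, read_length - mask_bases)
    return start, end


# Hypothetical example values
het_call = is_heterozygous_peak(1000, 400, threshold_pct=30)   # secondary is 40% of primary
low_pct = passes_thresholds(3, 100, min_depth=20, min_variant_pct=5, min_variant_count=2)
ok_site = passes_thresholds(6, 100, min_depth=20, min_variant_pct=5, min_variant_count=2)
visible = unmasked_range(100, mask_bases=5)
```

Raising `threshold_pct` in the first function makes the second-base call stricter, which mirrors the stringency trade-off described for the heterozygous peak threshold. In the second function, a site must satisfy the percentage and the count at the same time, which is how the two settings can be used in tandem to limit the number of reportable SNPs.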

Once you are finished, click OK to save changes and return to the Assembly Options screen, or Cancel to return without saving changes.
