Disagreements Between the ArrayStar SNP Table and the SeqMan SNP Report

In the ArrayStar SNP Table, the column “Contig Pos” collapses all permutations for a given reference position into a single record with start and end positions separated by an ellipsis (e.g. 27…31). If multiple bases are listed, this indicates a multi-base insertion into the sample relative to the reference.
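
When working with an exported copy of the table, a "Contig Pos" entry can be split back into its start and end positions. The following is a minimal sketch, not part of ArrayStar itself. The function name parse_contig_pos is hypothetical, and it assumes that a single-position record is written as a bare integer and that the separator is either the ellipsis character or two to three periods.

```python
import re
from typing import Tuple


def parse_contig_pos(value: str) -> Tuple[int, int]:
    """Return (start, end) for a "Contig Pos" entry such as "27…31" or "27".

    Assumption: a record covering one position is written as a bare integer.
    """
    # Split on the single-character ellipsis (U+2026) or on 2-3 ASCII periods.
    parts = re.split(r"\s*(?:…|\.{2,3})\s*", value.strip())
    start = int(parts[0])
    end = int(parts[-1])
    if end < start:
        raise ValueError(f"end position precedes start position: {value!r}")
    return start, end
```

For the example above, parse_contig_pos("27…31") gives the pair (27, 31), which covers reference positions 27 through 31.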

 

In most cases, the single record denotes that all included reference positions contain the same SNP. In some cases, however, the record may include one or more reference positions that do not have that SNP when the corresponding assembly is viewed in the SeqMan Alignment View or SNP Report.

 

This is due to a difference in how ArrayStar and SeqMan call SNPs. SeqMan checks the P value of each putative SNP. If a putative SNP has roughly a 50% chance of not being a real change, it is excluded from SeqMan Pro’s Alignment View and SNP Report. Low-probability columns are often present because of sequencing errors or alignment issues, and they do not indicate an actual insertion. ArrayStar, however, still includes these “possible” SNPs in the ellipsis-separated range of the “Contig Pos” column.

 

The scores for ArrayStar’s “Contig Pos” records are aggregated as follows:

 

      Depth: The mean depth among all of the columns considered.

 

      SNP %: The mean percentage among all of the columns considered.

 

      Q-call: The lowest confidence score among all of the columns considered.

 

      P-value: The lowest probability among the columns passing the filter.
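
The four aggregation rules above can be sketched as follows. This is an illustration under stated assumptions, not ArrayStar's actual implementation. The Column class, its field names, and the passes_filter flag are hypothetical. The sketch also assumes that "columns considered" means every column in the record, while the P-value is taken only from columns that pass the filter.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Optional


@dataclass
class Column:
    """One alignment column in a record (hypothetical field names)."""
    depth: int
    snp_pct: float
    q_call: float
    p_value: float
    passes_filter: bool


def aggregate(columns: List[Column]) -> Dict[str, Optional[float]]:
    """Combine per-column scores into one "Contig Pos" record."""
    # Only columns that pass the filter contribute to the P-value.
    passing = [c.p_value for c in columns if c.passes_filter]
    return {
        "depth": mean(c.depth for c in columns),      # mean depth
        "snp_pct": mean(c.snp_pct for c in columns),  # mean SNP %
        "q_call": min(c.q_call for c in columns),     # lowest confidence
        "p_value": min(passing) if passing else None, # lowest passing probability
    }
```

Because Q-call and P-value take the minimum rather than the mean, a single weak column lowers the score of the whole record.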

 

An indel in a translated region is always considered a frameshift unless it is a triplet, in which case it is considered non-synonymous.
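
The rule above can be written as a small function. This is a sketch only. The name classify_indel is hypothetical, and "triplet" is read here as an indel of exactly three bases. Whether longer multiples of three are handled the same way is not stated, so the sketch does not assume it.

```python
def classify_indel(length: int) -> str:
    """Classify an indel in a translated region by its length in bases.

    Assumption: "triplet" means exactly three bases.
    """
    if length <= 0:
        raise ValueError("indel length must be a positive number of bases")
    # A triplet keeps the reading frame, so it is reported as non-synonymous.
    return "non-synonymous" if length == 3 else "frameshift"
```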

 

If any of the columns used is heterozygous, the entire record is considered heterozygous. The alleles are not correlated in the data used by ArrayStar, so every combination of the individual bases' alleles is possible unless manual investigation of the assembly shows otherwise.
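
To see what "all combinations" means in practice, the possible sequences for a record can be enumerated from the alleles observed in each column. This is a minimal sketch with hypothetical function names, and it assumes each column is given as a list of its observed alleles.

```python
from itertools import product
from typing import List


def is_heterozygous(alleles_per_column: List[List[str]]) -> bool:
    """A record is heterozygous if any column has more than one allele."""
    return any(len(set(alleles)) > 1 for alleles in alleles_per_column)


def candidate_sequences(alleles_per_column: List[List[str]]) -> List[str]:
    """List every combination of per-column alleles.

    The columns are treated as uncorrelated, so no combination is ruled out.
    """
    return ["".join(combo) for combo in product(*alleles_per_column)]
```

For a three-column record with alleles A/G, T, and C/T, this yields four candidates: ATC, ATT, GTC, and GTT. Only inspection of the assembly can show which of these actually occur together on the same reads.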