XNG Assembly Report Contents

The Assembly Report for XNG assemblies will contain a subset of the following results:

Run Statistics

Reference Seq Cnt

The total number of sequences in the reference (template).

Sequence Cnt

The total number of reads in the sample.

Total Reads Assembled

Pair Seqs Cnt

The number of paired sequences included in the assembly.

Single Pair Seq Cnt

The number of paired sequences for which only one member of the pair was included in the assembly.

Split Seq Cnt

The number of sequences that were split in the assembly.

Bad Split Seq Cnt

The number of sequences that were split, and of which only one portion was included in the assembly.

Single Seq Cnt

The number of single (unpaired) sequences in the assembly.

Consistent Pair Cnt

The number of paired sequences that met the pair constraints. In this statistic, one “pair” represents two sequences.

Inconsistent Pair Cnt

The number of putative paired sequences that did not meet pair constraints.

Seqs score < 80%

Seqs score < 90%

Seqs score < 100%

Seqs score 100%

These rows group the assembled reads by alignment score, i.e., the percentage of each read that exactly matches the template.
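The score rows above can be illustrated with a short sketch. It assumes that the alignment score is the percentage of a read's bases that exactly match the template and that the four rows are non-overlapping bins (a read scoring 85% is counted only under "< 90%"). Both are assumptions; the report itself may compute or bin scores differently. The function names are hypothetical.

```python
def alignment_score(matching_bases: int, read_length: int) -> float:
    """Percentage of the read's bases that exactly match the template (assumed definition)."""
    return 100.0 * matching_bases / read_length


def score_bin(score: float) -> str:
    """Place a score in one report row, assuming the bins do not overlap."""
    if score >= 100.0:
        return "Seqs score 100%"
    if score >= 90.0:
        return "Seqs score < 100%"
    if score >= 80.0:
        return "Seqs score < 90%"
    return "Seqs score < 80%"


# Hypothetical 150-base reads with different numbers of matching bases.
for matches in (150, 140, 125, 100):
    print(matches, score_bin(alignment_score(matches, 150)))
```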

Unassembled Sequences

Unaligned Cnt

The total number of reads not included in the finished assembly.

LayoutMiss Cnt

The number of reads that did not match the template at all, i.e., reads containing no mer that matches a mer in any template sequence. This number is affected by the assembly parameters merSize and merSkip.

Example: A sequence that has no 21-mer in common with the template but does have a matching 17-mer would be included in LayoutMiss Cnt at a mer size of 21, but not at a mer size of 17.
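The example above can be reproduced with a simplified sketch that checks whether a read shares any mer of a given size with the template. It takes mers at every offset and ignores merSkip, so it is only an approximation of the assembler's behavior; the function names and sequences are hypothetical.

```python
def mers(seq: str, mer_size: int) -> set:
    """Return every substring of length mer_size, taken at every offset (no skipping)."""
    return {seq[i:i + mer_size] for i in range(len(seq) - mer_size + 1)}


def is_layout_miss(read: str, template: str, mer_size: int) -> bool:
    """True if the read shares no mer of length mer_size with the template."""
    return mers(read, mer_size).isdisjoint(mers(template, mer_size))


# Hypothetical sequences: read and template share one 17-base stretch, but every
# 21-base window of the read contains a "CC" run, which never occurs in the template.
shared = "GATTACAGATTACAGAT"  # 17 bases
template = "TTTT" + shared + "TTTT"
read = "CCCC" + shared + "CCCC"

print(is_layout_miss(read, template, 21))  # True: counted in LayoutMiss Cnt
print(is_layout_miss(read, template, 17))  # False: one shared 17-mer
```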

LayoutPoor Cnt

The number of reads with an insufficient number of mer matches to be included in the assembly. This number is affected by the assembly parameter merLayoutMin.
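The distinction between LayoutMiss and LayoutPoor can be sketched as a threshold on mer hits. This sketch assumes that merLayoutMin acts as a minimum count of read offsets whose mer also occurs in the template; the real parameter may be defined differently (for example, relative to read length), and the function names are hypothetical.

```python
def count_matching_mers(read: str, template: str, mer_size: int) -> int:
    """Count read offsets whose mer also occurs somewhere in the template."""
    template_mers = {template[i:i + mer_size] for i in range(len(template) - mer_size + 1)}
    return sum(
        1 for i in range(len(read) - mer_size + 1)
        if read[i:i + mer_size] in template_mers
    )


def layout_category(read: str, template: str, mer_size: int, mer_layout_min: int) -> str:
    """Classify a read by its mer hits (hypothetical thresholding rule)."""
    hits = count_matching_mers(read, template, mer_size)
    if hits == 0:
        return "LayoutMiss"   # no shared mer at all
    if hits < mer_layout_min:
        return "LayoutPoor"   # some hits, but fewer than the minimum
    return "candidate"        # enough hits to be considered for the assembly


template = "AAAACCCCGGGG"
for read in ("TTTTTTTT", "CCCCTTTT", "AAAACCCC"):
    print(read, layout_category(read, template, mer_size=4, mer_layout_min=3))
```

With these toy sequences, "TTTTTTTT" has no hits, "CCCCTTTT" has one hit (CCCC), and "AAAACCCC" has five, so the three reads fall into LayoutMiss, LayoutPoor, and candidate respectively.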

Bad Seq Cnt

The number of reads in which ambiguous bases (Ns) make up 25% or more of the sequence. Filtered Illumina reads are sometimes also included in this count.
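The 25% rule can be expressed as a simple check on the fraction of N calls in a read. This is a minimal sketch with hypothetical function names; it does not model the Illumina filtering mentioned above.

```python
def n_fraction(read: str) -> float:
    """Fraction of bases in the read that are ambiguous N calls (case-insensitive)."""
    if not read:
        return 0.0  # an empty read has no N calls
    return read.upper().count("N") / len(read)


def is_bad_seq(read: str, threshold: float = 0.25) -> bool:
    """True if at least `threshold` of the read's bases are Ns."""
    return n_fraction(read) >= threshold


print(is_bad_seq("ACGTNNNN"))  # True  (4 of 8 bases are N)
print(is_bad_seq("NACG"))      # True  (exactly 25%)
print(is_bad_seq("ACGTACGN"))  # False (1 of 8 bases is N)
```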

Excluded Seq Cnt

The number of reads excluded because they were identified as contaminated.

ExcessiveCov. Seq Cnt

The number of reads unused due to excessive coverage.

SNP Info

Found SNP Cnt (incl. indel lengths)

The number of SNP positions plus the total number of coalesced bases, minus the number of multi-base indel entries.
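The formula can be checked with a small worked example. This sketch assumes that each multi-base indel appears once among the SNP positions and that its individual bases make up the coalesced bases; under that reading, subtracting the multi-base entries avoids counting each indel twice, so the result equals the sum of all entry lengths. The VariantEntry type is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class VariantEntry:
    position: int
    length: int  # 1 for a single-base SNP, >1 for a coalesced multi-base indel


def found_snp_cnt(entries: list) -> int:
    snp_positions = len(entries)                         # each entry counted once
    multi = [e for e in entries if e.length > 1]
    coalesced_bases = sum(e.length for e in multi)       # every base of each multi-base indel
    return snp_positions + coalesced_bases - len(multi)  # remove the double-counted entries


entries = [VariantEntry(100, 1), VariantEntry(250, 1), VariantEntry(400, 3)]
print(found_snp_cnt(entries))  # 5 = 3 positions + 3 coalesced bases - 1 multi-base entry
```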

Found User SNP

The number of SNPs found that match those in the user-supplied VCF SNP file.

Missing User SNP coverage

The number of SNPs from the user-supplied VCF SNP file that were not found, even though the area had coverage.

Missing User SNP zero coverage

The number of SNPs from the user-supplied VCF SNP file that were not found because the area had no coverage.
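The three user-SNP statistics partition the sites in the user-supplied VCF file. The sketch below assumes sites are matched on (chromosome, position) only, ignoring alleles, and that coverage is available as a per-site read depth; the real comparison may be stricter. The function name and inputs are hypothetical.

```python
def classify_user_snps(user_snps: set, found_snps: set, coverage: dict) -> dict:
    """Count user VCF sites as found, missing with coverage, or missing with zero coverage."""
    counts = {"Found User SNP": 0,
              "Missing User SNP coverage": 0,
              "Missing User SNP zero coverage": 0}
    for site in user_snps:
        if site in found_snps:
            counts["Found User SNP"] += 1
        elif coverage.get(site, 0) > 0:
            counts["Missing User SNP coverage"] += 1
        else:
            counts["Missing User SNP zero coverage"] += 1
    return counts


user_snps = {("chr1", 100), ("chr1", 200), ("chr1", 300)}
found_snps = {("chr1", 100)}
coverage = {("chr1", 100): 30, ("chr1", 200): 12}  # read depth; ("chr1", 300) is uncovered
print(classify_user_snps(user_snps, found_snps, coverage))
```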

Assembly Parameters

merSize

merSkip

merSkipQuery

merLayoutMin

templateHitCntThresh

These rows list the values specified for each parameter in the SeqMan NGen wizard prior to assembly.