SNG Assembly Report Contents

The Assembly Report for SNG assemblies will contain a subset of the following results:

 

Assembly Totals

Contigs

Total number of contigs assembled.

Contigs > 2K

Total number of assembled contigs that are more than 2000 base pairs in length.

Contigs to Reach Genome Length ‘x’

The number of contigs needed to cover the genome length specified in the Workflow pane.
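As an illustration, the count described above can be sketched in a few lines of Python. This is a minimal sketch, not SeqMan NGen's implementation: it assumes contigs are counted from largest to smallest, and the function name `contigs_to_reach` and the sample lengths are hypothetical.

```python
def contigs_to_reach(contig_lengths, genome_length):
    """Count the largest contigs needed for their summed length to reach genome_length.

    Assumption: contigs are taken largest-first. Returns None if the
    contigs together are shorter than genome_length.
    """
    running_total = 0
    ordered = sorted(contig_lengths, reverse=True)
    for count, length in enumerate(ordered, start=1):
        running_total += length
        if running_total >= genome_length:
            return count
    return None


# Hypothetical contig lengths, in base pairs.
lengths = [100, 80, 50, 30, 20, 20]
print(contigs_to_reach(lengths, 200))  # prints 3 (100 + 80 + 50 = 230)
```

If the contigs together are shorter than the specified genome length, the sketch returns `None` rather than a count.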

Contigs removed due to small size

The number of contigs removed due to being smaller than the threshold value.

Assembled Sequences

The number of sequences utilized in the assembly.

Unassembled Sequences

The number of sequences excluded from the assembly. These may be further categorized as: 1) Sequences not assembled due to complete trimming, and 2) Sequences removed due to small contig size.

All Sequences

Total number of sequences in the project.

Contig N50

The contig length at which 50% of the assembled sequence data is contained in contigs of that length or longer.

 

Note: In a typical microbial genome assembly, Contig N50 values exceed 80K base pairs and genome coverage is attained in fewer than 100 contigs. In many assemblies, Contig N50 exceeds 100K base pairs, with genome coverage attained in 25 contigs. If paired-end Roche 454 Life Sciences data are used, contigs can be ordered into a handful of large scaffolds to attain genome coverage, which greatly facilitates gap closure and completion of the genome assembly.
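The Contig N50 statistic can be sketched using its conventional definition: sort the contigs from longest to shortest and report the length of the contig at which the running total first reaches half of all assembled bases. This is a minimal sketch; the function name `contig_n50` and the sample lengths are hypothetical, and SeqMan NGen's exact handling of edge cases is not documented here.

```python
def contig_n50(contig_lengths):
    """Return the N50 of a list of contig lengths (0 for an empty list)."""
    total = sum(contig_lengths)
    running_total = 0
    for length in sorted(contig_lengths, reverse=True):
        running_total += length
        # Stop at the first contig where at least half of all bases are covered.
        if 2 * running_total >= total:
            return length
    return 0


# Hypothetical contig lengths, in base pairs (total = 300).
lengths = [100, 80, 50, 30, 20, 20]
print(contig_n50(lengths))  # prints 80 (100 + 80 = 180 >= 150)
```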

Average Coverage

Average depth of coverage in the assembly.
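A common way to estimate average depth of coverage is to divide the total number of assembled read bases by the total contig length. The sketch below assumes that formula. The report may compute the value differently (for example, from aligned rather than full read lengths), and the function name `average_coverage` and the sample values are hypothetical.

```python
def average_coverage(assembled_read_lengths, contig_lengths):
    """Estimate average depth as total assembled read bases / total contig bases."""
    total_contig_bases = sum(contig_lengths)
    if total_contig_bases == 0:
        return 0.0  # no contigs, so no meaningful depth
    return sum(assembled_read_lengths) / total_contig_bases


# Hypothetical example: 30 reads of 100 bp assembled into one 1000 bp contig.
print(average_coverage([100] * 30, [1000]))  # prints 3.0
```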

Average Totals

Sequences Per Contig

Average number of sequences used for each contig.

Average Lengths

Contigs

Average contig length.

Assembled Sequences

Average length of sequences used in the assembly.

Unassembled Sequences

Average length of sequences excluded from the assembly.

All Sequences

Average length of all sequences in the project.

Average Quality

Assembled Sequences

Average quality score of sequences used in the assembly.

Unassembled Sequences

Average quality score of sequences excluded from the assembly.

All Sequences

Average quality score of all sequences in the project.

Assembled Pair Statistics

Read Pairs

Total number of paired reads in the project.

Assembled Pairs

The number of paired reads included in the assembly.

Pairs Consistent Within a Contig

The number of paired sequences within a single contig that met pair constraints. One “pair” in this statistic represents two sequences.

Pairs Inconsistent Within a Contig

The number of putative paired sequences within a contig that did not meet pair constraints.

Split Pair Statistics (Ion Torrent paired reads and 454 data only)

Reads Split into Pairs

The number of reads that were split into pairs at the linker.

Unsplit Reads with Pair Linker(s)

The number of reads that contained a linker but were not split into pairs because the linker was too far to one side.

Unsplit Reads without Pair Linker(s)

The number of reads that were not split because they were not associated with a linker.

Assembly Parameters

The values specified in the SeqMan NGen wizard prior to assembly for each of the following parameters:

Match Size

Match Spacing

Minimum Match Percentage

Match Score

Mismatch Penalty

Gap Penalty

Max Gap

Genome Length

Expected Coverage