Assembly Options (Special Reference-Guided, Most De Novo)

The Assembly Options dialog allows you to specify the parameters to use for your assembly. If you are following most de novo or special reference-guided workflows, you will see the following version of the dialog. Depending on your workflow, some of the options described below may not be available.


      Repeat Handling – (De novo, non-Transcriptome/RNA-Seq workflows only) Check this box to have SeqMan NGen automatically compute a threshold: the number of times an identical subsequence of bases (a mer) must occur before it is treated as a putative repeat. (For more information, see the Repeat Handling section.)

 

Expected genome length – If you know the approximate length of the genome/fragment being assembled, select this button and specify a length. SeqMan NGen will then calculate the expected average coverage empirically from the amount of data. This, in turn, allows repeat regions to be identified and handled more accurately, resulting in a better assembly. If the approximate genome length is not known, use the Expected coverage option.

 

Expected coverage – If you do not know the length of the genome/fragment, select this button and provide an estimate of the sequencing depth. The default value for this field is 20, and the maximum allowable value is 65,535. If you enter a larger value, you may receive an error message and be unable to continue until you enter a value of 65,535 or less.

 

Note: Use caution when estimating the value for Expected coverage. If the value you enter is significantly lower than the actual depth, the assembly may take much longer to complete and too many mers may be flagged as repeats. We recommend using Expected genome length whenever possible.
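
The relationship between genome length, amount of data, and coverage can be illustrated with a short Python sketch. This is not SeqMan NGen's internal calculation, which is not documented here; it assumes the usual definition of average coverage (total sequenced bases divided by genome length). The function names and the lower bound of 1 are hypothetical; only the maximum of 65,535 comes from the text above.

```python
MAX_EXPECTED_COVERAGE = 65_535  # maximum allowable value stated above


def estimate_coverage(total_read_bases: int, genome_length: int) -> float:
    """Average depth, assuming coverage = total bases / genome length."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return total_read_bases / genome_length


def validate_expected_coverage(value: int) -> int:
    """Reject values outside 1..65,535 (the lower bound is an assumption)."""
    if value < 1 or value > MAX_EXPECTED_COVERAGE:
        raise ValueError(
            f"Expected coverage must be between 1 and {MAX_EXPECTED_COVERAGE}"
        )
    return value


# Example: 2,000,000 reads of 150 bases each against a 5 Mb genome.
coverage = estimate_coverage(2_000_000 * 150, 5_000_000)
print(coverage)  # 60.0
```

An estimate made this way can help you choose a realistic value when you must use Expected coverage instead of Expected genome length.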

 

      Mer size – The minimum length of a mer (overlapping region of a fragment read), in bases, required to be considered a match when arranging reads into contigs. Mer size information is used to identify matches during the assembly layout phase. The default mer size is determined by the selected read technology and is shown in the window. For more information, see the Mer Tags section.

 

Automatic – Select this button to automatically set the size based on assembly type and sequencing technology.

 

Custom – Select this button to choose the size yourself. You must enter the desired number of bases in the field at right. Lowering the mer size increases the sensitivity of finding matches, but also increases the likelihood of finding spurious matches in addition to the correct match. In large projects, lowering the mer size can also greatly increase the storage needed for intermediate and temporary files.
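
The trade-off between mer size and sensitivity can be seen in a small Python sketch. It is only an illustration of exact mer matching between two reads, not the assembler's actual layout algorithm; the function names and the example reads are hypothetical.

```python
def mers(seq: str, k: int) -> set:
    """Return the set of all length-k subsequences (mers) of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_mers(a: str, b: str, k: int) -> set:
    """Mers of size k that occur in both reads."""
    return mers(a, k) & mers(b, k)


read_a = "ACGTACGTTAGC"
read_b = "TTAGCACGTAAA"

# Smaller mer sizes find more shared mers, including chance matches.
for k in (4, 5, 6):
    print(k, sorted(shared_mers(read_a, read_b, k)))
```

With these example reads, the number of shared mers falls as the mer size grows, which is why a smaller mer size is more sensitive but also more prone to spurious matches.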

 

      Minimum match percentage – Specifies the minimum percentage of matches in an overlap required to join two sequences in the same contig. SeqMan NGen determines the percentage to use based on the sequencing technology you specified in the Assembly Options dialog. For more information, see the Match Percentage section.

 

Automatic – Select this button to automatically set the percentage based on assembly type and sequencing technology.

 

Custom – Select this button to designate the percentage yourself. You must enter a number in the field at right.
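
As a rough illustration of what a minimum match percentage means, the following Python sketch scores an overlap between two sequences. It assumes the overlapping regions are already aligned, equal in length, and free of gaps, which a real assembler does not require; the function names are hypothetical and this is not SeqMan NGen's scoring method.

```python
def match_percentage(overlap_a: str, overlap_b: str) -> float:
    """Percentage of identical positions in two aligned, gap-free overlaps."""
    if not overlap_a or len(overlap_a) != len(overlap_b):
        raise ValueError("overlaps must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(overlap_a, overlap_b))
    return 100.0 * matches / len(overlap_a)


def can_join(overlap_a: str, overlap_b: str, minimum_percent: float) -> bool:
    """True if the overlap meets the minimum match percentage."""
    return match_percentage(overlap_a, overlap_b) >= minimum_percent


# One mismatch in a 10-base overlap gives 90% identity.
print(match_percentage("ACGTACGTAC", "ACGTTCGTAC"))  # 90.0
print(can_join("ACGTACGTAC", "ACGTTCGTAC", 93))      # False
```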

 

 

      Realign reads after assembly – Check this box to include a realignment step after the assembly. This step analyzes each sequence at the nucleotide level to determine the exact position of each sequence in the alignment and realigns contigs as needed. For templated assemblies, this option may improve the accuracy of the final assembly by correcting occasional misalignments that can occur in gapped regions. However, this step may significantly increase the time needed to assemble.

 

      De novo assemble remaining unassembled reads – (Special reference-guided workflow only) Specifies whether the sequences that remain unassembled after a templated assembly should then be assembled de novo into contigs. If the template has been split and this box is checked, SeqMan NGen will attempt to join the split contigs together in new arrangements.

 

      Split reference at zero coverage – (Special reference-guided workflow only) Splits the template where there is a coverage of zero. Any split contigs will be grouped into scaffolds with a defined position to allow for easy sorting when the project is viewed in SeqMan Pro. By default, during reference-guided assembly with gap closure, the XNG assembler first finds structural variations (SVs) then splits the contig after each SV.
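
The idea of splitting at zero coverage can be sketched in Python as finding the stretches of a reference that have at least one read covering them. This is a simplified illustration only: the per-base coverage list and the function name are hypothetical, and the sketch does not model scaffolds or the structural-variation step described above.

```python
def covered_intervals(coverage: list) -> list:
    """Return half-open (start, end) intervals where depth is above zero."""
    intervals = []
    start = None
    for pos, depth in enumerate(coverage):
        if depth > 0:
            if start is None:
                start = pos  # a covered stretch begins here
        elif start is not None:
            intervals.append((start, pos))  # zero coverage ends the stretch
            start = None
    if start is not None:
        intervals.append((start, len(coverage)))
    return intervals


# Per-base depth along a short reference with two zero-coverage regions inside.
depths = [0, 3, 5, 0, 0, 2, 2, 0]
print(covered_intervals(depths))  # [(1, 3), (5, 7)]
```

Each interval in the result corresponds to one piece the reference would be split into under this simplified model.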

 

      Remove small contigs after assembly – Removes assembled, untemplated contigs that do not meet minimum thresholds. This can lead to a desirable decrease in project size. To select this option, check the box and then type values in one or both boxes: Minimum sequences disassembles any untemplated contigs with fewer than the specified number of sequences; Minimum length disassembles any untemplated contigs shorter than the specified length. Both options affect only untemplated contigs. No templated contigs are removed.
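
The filtering rule described above can be expressed as a short Python sketch. The Contig class and function name are hypothetical, and treating the two thresholds as independent (a contig is removed if it fails either one that is set) is an assumption about how the options combine.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Contig:
    name: str
    length: int          # contig length in bases
    num_sequences: int   # number of reads in the contig
    templated: bool      # True if assembled against a reference


def remove_small_contigs(
    contigs: List[Contig],
    min_sequences: Optional[int] = None,
    min_length: Optional[int] = None,
) -> List[Contig]:
    """Keep templated contigs and untemplated contigs that meet the thresholds."""
    kept = []
    for c in contigs:
        too_few = min_sequences is not None and c.num_sequences < min_sequences
        too_short = min_length is not None and c.length < min_length
        if c.templated or not (too_few or too_short):
            kept.append(c)
    return kept


contigs = [
    Contig("ref1", 300, 2, templated=True),      # kept: templated
    Contig("novel1", 5000, 40, templated=False), # kept: meets both thresholds
    Contig("novel2", 250, 3, templated=False),   # removed: too few, too short
]
print([c.name for c in remove_small_contigs(contigs, 10, 1000)])
```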

 

      Advanced Assembly Options – Click this button to open the Advanced Assembly Options dialog, which allows you to select additional assembly parameters and SNP handling options.

 

Once you are finished, click Next > to continue to the next wizard screen.

 

Note: If you check Repeat Handling and do not specify an Expected genome length, clicking Next > will result in an error message: “Please enter the estimated genome length…” If you receive this message, click OK and adjust the dialog parameters before clicking Next > again.