Medium-Sized Data Sets

Note: This topic does not apply to next-gen assemblies, large data sets, or BAM-based projects. For large or next-gen assemblies, please use SeqMan NGen.

 

The following considerations apply to medium-sized data sets, the largest sets that can be assembled via SeqMan Pro:

 

      When adding one or more medium-sized sequence files to SeqMan Pro, we strongly recommend dragging and dropping the files. This loads the files much faster than using the Add Sequences button or the Sequence > Add or Sequence > Add One menu commands.

 

      Use the Pro Assembler instead of the Classic Assembler: go to Project > Parameters > Assembling and choose “Use Pro Assembler”.

 

      Make sure that the “Don’t add single sequence contigs” option is selected in the Preassembly and Assembly Options dialog. Allowing single sequence contigs to be added to your project can greatly slow the assembly. This option is on by default.

 

      Always perform vector scanning and quality trimming. The default Medium Quality Stringency trimming works well for most data sets.

 

      Increase the consistency of sequence depth throughout your assembly by choosing the Repeat Handling settings that match the type of coverage you expect:

 

If you expect fairly even coverage, select the “Use Repeat Handling” option under the Assembling parameters. Choose the “Fragment Length” radio button and enter the length of the genome or fragment being assembled.

 

If you expect deep coverage, select the “Use Repeat Handling” option under the Assembling parameters. Choose the “Fixed” radio button and enter your expected coverage.

 

      When adding only a few sequences to an existing assembly (iterative assembly), it is usually better to turn off the “Use Repeat Handling” option.

 

      Changing the “Match Size” assembly parameter is not advised, because it will increase false joins.

 

      Upon completion of a de novo assembly, all single sequences that did not enter a contig remain in the Unassembled Sequences window, which can be viewed by going to Sequence > Add. To add these sequences to the existing assembly, first adjust the parameters, and then click Assemble from the Unassembled Sequences window. Any sequences still not included in your assembly will remain in the Unassembled Sequences window and will be saved within your SeqMan Pro project file.

 

      Assembly speed depends on your computer’s memory and processor speed: the more RAM and the faster the processor, the faster your assemblies will complete. On Windows, increasing your virtual memory may help.
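
If you are unsure what coverage to expect when choosing the Repeat Handling settings described above, a rough estimate can be worked out before you assemble. The sketch below uses the standard average-depth formula (total sequenced bases divided by the length of the genome or fragment). It is not SeqMan Pro's internal calculation, and the function name and the example numbers are hypothetical.

```python
def expected_coverage(read_count: int, mean_read_length: float, target_length: int) -> float:
    """Estimate average depth: total sequenced bases / length of the target.

    read_count       -- number of sequences in the project (hypothetical input)
    mean_read_length -- average trimmed read length, in bases
    target_length    -- length of the genome or fragment being assembled, in bases
    """
    if target_length <= 0:
        raise ValueError("target_length must be positive")
    return read_count * mean_read_length / target_length


if __name__ == "__main__":
    # Hypothetical project: 20,000 reads averaging 700 bases over a 2,000,000-base genome.
    depth = expected_coverage(20_000, 700, 2_000_000)
    print(f"Expected average coverage: {depth:.1f}x")
```

This gives only an average. Actual depth varies along the assembly, so treat the result as a starting point for the settings above.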