I have only long read or only NGS data
Depending on the data type, use either the NGS-Based de novo assembly or PacBio/Nanopore de novo assembly workflows.
For PacBio/Nanopore data, assemblies can be done using either “raw” sequence reads or “corrected” reads. Corrected reads are a set of overlapping reads spanning each contig whose base accuracy has first been improved by alignment with other homologous reads in the data set.
The goal is to assemble reads into one topologically correct contig for each of the chromosomes and large plasmids/organelles. In practice, large chromosomes are often broken up into multiple contigs. In addition, small plasmids are often not represented in the data due to library preparation procedures.
Output files:
– An SQD project file that can be edited using SeqMan Ultra.
– A FASTA file of the consensus sequences that can be used as a starting point for downstream polishing/finishing workflows.
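One way to check how close an assembly comes to the goal of one contig per chromosome or large plasmid/organelle is to summarize the consensus FASTA file. The following is a minimal sketch and not part of SeqMan NGen: the function names are illustrative, and it assumes a standard FASTA file with one record per contig.

```python
from typing import Dict, Iterable, List


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA text lines into {record name: sequence}."""
    records: Dict[str, List[str]] = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            # Use the first word of the header as the record name.
            name = header[0] if header else f"record_{len(records) + 1}"
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def n50(lengths: List[int]) -> int:
    """Largest length L such that contigs of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


def summarize(contigs: Dict[str, str]) -> Dict[str, int]:
    """Contig count, total length, longest contig and N50 for a set of contigs."""
    lengths = [len(seq) for seq in contigs.values()]
    return {
        "contigs": len(lengths),
        "total_bp": sum(lengths),
        "longest": max(lengths, default=0),
        "n50": n50(lengths),
    }


# Tiny in-memory example with three contigs of 10, 6 and 4 bases.
example = [">contig_1", "ACGTACGTAC", ">contig_2", "ACGTAC", ">contig_3", "ACGT"]
print(summarize(read_fasta(example)))
```

To run it on a real file, pass an open file handle to read_fasta (for example, a hypothetical assembly.fasta). A contig count well above the expected number of chromosomes and plasmids suggests that large chromosomes were broken up.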
I have both long read and paired-end NGS data
Use the PacBio/Nanopore workflow De novo assembly and polishing. This workflow effectively combines two other workflows that you would otherwise need to do sequentially. In this workflow, you must use paired-end NGS data from the same isolate or sample that the long read data came from.
Behind the scenes, this single workflow first assembles the long read data using the algorithm from the PacBio/Nanopore workflow De novo assembly. Next, it polishes the consensus sequences using the algorithm from NGS polishing of a draft genome.
Output files:
– An editable SQD file of the de novo assembly.
– An SQD project file of the polished assembly for review and initial editing in SeqMan Ultra.
– FASTA files of both the de novo assembled and polished consensus sequences.
I have paired-end NGS data and an initial draft sequence from a de novo long read assembly in DNASTAR SeqMan NGen (or from a third-party assembler)
Perform an initial polishing of the consensus sequences using either the PacBio/Nanopore workflow NGS polishing of a draft genome or the NGS-Based workflow Genome finishing – initial error correction. The two workflows are functionally the same. We present them as separate options because some users are more familiar with the term “polishing” and others with “genome finishing” for the correction of internal assembly errors in the contig consensus sequences.
Users with long read assemblies from HGAP, Canu or Unicycler may find the options in the NGS polishing of a draft genome workflow to be more familiar. The Genome finishing – initial error correction workflow can be used to polish NGS or long read assemblies with additional NGS data.
Both workflows serve as a comprehensive first-phase finishing step: they take de novo assembled draft consensus sequences and use a series of automated steps with high-accuracy NGS data to correct the mis-assemblies and misalignments within each sequence. Both workflows involve a two-step process:
1) Paired-end NGS data are used to correct internal errors (both small and large mis-assemblies/misalignments) in an existing set of contig consensus sequences from a draft assembly of long read data. Again, the NGS data should come from the same isolate or sample as was used to generate the initial draft assembly.
2) A final de novo assembly of unaligned NGS reads identifies pieces of the genome missed during initial assembly (e.g. small plasmids or gaps between sequences caused by low coverage in a long read data set).
Output files:
– An SQD project file of the polished assembly for review and editing in SeqMan Ultra.
– A FASTA file of the consensus sequences after polishing, but prior to any manual editing.
I have both paired-end NGS data and an initially polished set of consensus sequences
Correct any remaining errors and/or small gaps between contigs using Genome finishing – refinement.
This workflow is a rapid final finishing step that aligns high-accuracy NGS data to polished consensus sequences. It allows for rapid cycles of editing and confirmation, ensuring that the final sequence(s) have no ambiguous bases and that there is uniform coverage of consistently placed paired-end data across each of the contigs.
The workflow begins with an existing set of polished contig consensus sequences and corrects any remaining internal errors in each contig using paired-end NGS data. It will also close any remaining small gaps between contigs using the pair information. There is also a setup option to extend and align read data off the ends of each contig, which facilitates closing small gaps between sequences.
Output file:
– An SQD project file for review and editing in SeqMan Ultra.
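The "no ambiguous bases" criterion can also be checked outside SeqMan Ultra. The sketch below is illustrative only: it assumes the finished consensus has been exported as plain sequence text, and the function names are not part of any DNASTAR tool. Any character other than A, C, G or T (including IUPAC codes such as N or R, and gap characters) is reported.

```python
from collections import Counter
from typing import Dict, List

UNAMBIGUOUS = set("ACGT")


def ambiguous_bases(sequence: str) -> Dict[str, int]:
    """Count every character in the sequence that is not A, C, G or T."""
    counts = Counter(sequence.upper())
    return {base: n for base, n in counts.items() if base not in UNAMBIGUOUS}


def ambiguous_positions(sequence: str, limit: int = 10) -> List[int]:
    """Return up to `limit` 1-based positions of characters other than A, C, G or T."""
    hits = [i + 1 for i, base in enumerate(sequence.upper()) if base not in UNAMBIGUOUS]
    return hits[:limit]


consensus = "ACGTNNRacgt"
print(ambiguous_bases(consensus))      # which symbols occur, and how often
print(ambiguous_positions(consensus))  # where to look during the next editing cycle
```

An empty result from ambiguous_bases for every contig indicates that this part of the finishing criteria is met. Coverage uniformity and pair consistency still need to be reviewed in the project file.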
I have both paired-end NGS data from a new strain/isolate and a closely related reference sequence
Construct the new sequence using the NGS-Based workflow Combined reference-guided/de novo assembly.
If you have NGS data, this workflow is an alternative to de novo assembly for constructing a new genome. It uses an existing reference sequence from a closely related strain or isolate as the starting template. It then applies the same series of comprehensive steps as the PacBio/Nanopore workflow NGS polishing of a draft genome or the NGS-Based workflow Genome finishing – initial error correction to replace sequence differences from the reference with those of the new genome.
This workflow can also be used to discover and determine the sequence of plasmids/organelles specific to the new genome. The consensus sequences from this workflow are also suitable for final finishing using the NGS-Based workflow Genome finishing – refinement.
Output file:
– An SQD project file for review and editing in SeqMan Ultra.
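The scenarios above amount to a lookup from the data you have to the workflow(s) to run. The following sketch restates that mapping in code. It is a summary aid only, not a SeqMan NGen interface: the Inputs class and suggest_workflows function are hypothetical, and the order in which overlapping cases are checked is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import List

LONG_DE_NOVO = "PacBio/Nanopore: De novo assembly"
NGS_DE_NOVO = "NGS-Based: De novo assembly"
ASSEMBLE_AND_POLISH = "PacBio/Nanopore: De novo assembly and polishing"
NGS_POLISHING = "PacBio/Nanopore: NGS polishing of a draft genome"
INITIAL_CORRECTION = "NGS-Based: Genome finishing – initial error correction"
REFINEMENT = "NGS-Based: Genome finishing – refinement"
COMBINED = "NGS-Based: Combined reference-guided/de novo assembly"


@dataclass
class Inputs:
    long_reads: bool = False
    # Must be paired-end for every scenario except NGS-only de novo assembly.
    ngs_reads: bool = False
    draft_assembly: bool = False       # draft consensus from a long read assembly
    polished_consensus: bool = False   # consensus that has already been polished
    related_reference: bool = False    # closely related reference sequence


def suggest_workflows(data: Inputs) -> List[str]:
    """Map available data to workflow names; the check order is an assumption."""
    if data.ngs_reads and data.polished_consensus:
        return [REFINEMENT]
    if data.ngs_reads and data.draft_assembly:
        # Functionally equivalent options; pick whichever is more familiar.
        return [NGS_POLISHING, INITIAL_CORRECTION]
    if data.ngs_reads and data.related_reference:
        return [COMBINED]
    if data.ngs_reads and data.long_reads:
        return [ASSEMBLE_AND_POLISH]
    if data.long_reads:
        return [LONG_DE_NOVO]
    if data.ngs_reads:
        return [NGS_DE_NOVO]
    return []


print(suggest_workflows(Inputs(long_reads=True, ngs_reads=True)))
```

When two names are returned, either workflow can be used, as described in the draft-sequence scenario.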