Variant Analysis/Resequencing workflows - User Guide to SeqMan NGen - 18.0

The following table describes each of the workflows available in the Variant Analysis/Resequencing tab of the Workflow screen.

Group	Workflow	Description
ABI / Sanger	Whole genome	Align Sanger trace data from one or multiple samples to a genomic reference or genome template package for accurate SNP/Indel analysis. This type of assembly can include billions of reads and large eukaryotic genomes. After assembly, compare results in ArrayStar using the SNP Report.
	Amplicon	Align Sanger trace data from one or multiple samples to targeted genes or genomic regions for accurate SNP/Indel analysis. Assembles a region of interest produced by PCR amplification.
	Clone verification	Align reads to confirm clone integrity and insert orientation. (Note: For a dedicated clone verification workflow, see the clone verification topic in the SeqBuilder Pro User Guide.)
	Haplotag generation	This option was developed for genome-assisted breeding (GAB), a modern plant breeding technique that uses genetic information to accelerate the development of new crop varieties with desired traits. Haplotag generation can help researchers identify specific genes associated with traits like yield, disease resistance, or nutritional content. This workflow also has utility for studying the genomes in a population.
NGS-based	Whole genome	Align NGS sequence data from one or multiple samples to a genomic reference or genome template package for accurate SNP/Indel analysis. This type of assembly can include billions of reads and large eukaryotic genomes.
	Amplicon, gene panel, exome	Align NGS sequence data from one or multiple samples to targeted genes or genomic regions for accurate SNP/Indel analysis. Gene panels look at specific gene regions, usually those corresponding to known defects. Exome assembly saves assembly time and resources by specifically targeting only exons and coding regions, but does require you to have the corresponding .bed file from the capture kit. For instance, if you used Human Genome build 38 as the reference, the corresponding .bed file might be called Human genome build38.bed. If using this workflow with cancer samples, check the box next to Somatic/Cancer/Heterogeneous in the Analysis Options screen. In most cases, downstream analysis of the finished assembly will take place in ArrayStar.
	Viral-host integration detection	Locate prophage and retroviral insertion sites in a host genome. Available in Lasergene 17.1 and later. Used to locate putative viral insertion sites or to predict the location of other inserted sequences, such as transposable elements. When you select this workflow, SeqMan NGen automatically sets up a templated assembly that is optimized for locating viral insertion sites. Since chimeric reads (sequences consisting of both host and viral DNA) usually indicate viral insertion sites, SeqMan NGen looks for chimeric reads in a multi-step process. First, the viral genome is used as the initial assembly template. Next, the subset of reads that mapped to the viral genome is reassembled against the host template. During both reference-guided assembly steps, SeqMan NGen “masks” (trims) whichever half of the chimeric read does not match the template for that step. The host template assembly results are output in BAM file format. SeqMan Ultra is used to explore possible viral insertion sites post-assembly. Launch SeqMan Ultra and use Contig > Contig Coverage to view tabular data for the individual contigs. Navigate to positions with multiple reads, as shown in the depth column. The reads at these positions should be trimmed to the same base, which indicates the insertion site. You may “untrim” the reads to verify that they also contain viral sequence.
PacBio / Nanopore	Whole genome	The long read version of the NGS workflow described above. This workflow uses a new long read alignment algorithm released with Lasergene 17.5 (July 2023). This aligner performs fast and accurate alignments for PacBio HiFi and ONT data while simultaneously calling variants.
	Amplicon, gene panel, exome	The long read version of the NGS workflow described above. This workflow uses a new long read alignment algorithm released with Lasergene 17.5 (July 2023). This aligner performs fast and accurate alignments for PacBio HiFi and ONT data while simultaneously calling variants.
	ARTIC Amplicon	Choose this workflow if you are running any templated assembly using Oxford Nanopore or PacBio CLR/HiFi long read data. The parameters for this workflow are tailored to viral long read data from PCR-amplified fragments generated using ARTIC primer sets but, despite the name, will work with any long read data.
Variant Call Format (VCF) files	Functional annotation of a single sample	Annotates the variant positions with functional information from a database, including affected genes and impact on protein encoding regions and/or splice sites.
Variant Call Format (VCF) files	Annotation and comparison of multiple samples	Allows multiple samples in VCF format to be annotated and then compared to identify genes and/or variants of interest in ArrayStar. This workflow is designed for use with assemblies created outside SeqMan NGen (e.g., using BWA + GATK). Such assemblies often have .vcf files as their only output.
Phylogenetics / Genome Alignment	Identify gene homologs and build trees	With the release of Lasergene 17.6, a gene homology workflow for nucleotide sequences was added to the MegAlign Pro application. While the setup and computations appeared to take place within MegAlign Pro, they actually used the SeqMan NGen wizard and assembler. As of Lasergene 18, the workflow can also be initiated from within SeqMan NGen using this workflow option. However, the assembly output must then be manually imported into MegAlign Pro, and you will need to initiate the alignment there. For a seamless alignment and downstream analysis experience, we strongly recommend running this workflow from within MegAlign Pro.
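
The inspection step described for the Viral-host integration detection workflow (looking for host positions where several reads are trimmed to the same base) can be illustrated with a minimal Python sketch. This is not SeqMan NGen's implementation and does not read its output files: the `TrimmedRead` record, its field names, and the `min_reads` threshold are all hypothetical, chosen only to show the idea of counting shared trim boundaries.

```python
from collections import Counter
from typing import NamedTuple


class TrimmedRead(NamedTuple):
    """Hypothetical record for one read aligned to the host template."""
    contig: str
    start: int       # first aligned host position (1-based, inclusive)
    end: int         # last aligned host position (1-based, inclusive)
    left_trim: int   # number of bases masked at the left end of the read
    right_trim: int  # number of bases masked at the right end of the read


def candidate_insertion_sites(reads, min_reads=2):
    """Return sorted (contig, position, count) tuples for host positions
    where at least `min_reads` reads are trimmed at the same base."""
    boundaries = Counter()
    for r in reads:
        # A trimmed left end means the alignment breaks just before `start`.
        if r.left_trim > 0:
            boundaries[(r.contig, r.start)] += 1
        # A trimmed right end means the alignment breaks just after `end`.
        if r.right_trim > 0:
            boundaries[(r.contig, r.end)] += 1
    return sorted(
        (contig, pos, n)
        for (contig, pos), n in boundaries.items()
        if n >= min_reads
    )


# Made-up example reads (assumption: values are illustrative only).
reads = [
    TrimmedRead("chr1", 100, 180, 0, 40),
    TrimmedRead("chr1", 120, 180, 0, 55),
    TrimmedRead("chr1", 181, 250, 30, 0),
    TrimmedRead("chr1", 400, 480, 0, 0),
    TrimmedRead("chr2", 50, 90, 12, 0),
]
sites = candidate_insertion_sites(reads)
print(sites)
```

In this made-up data, two reads are trimmed at position 180 of chr1, so that position is the only one reported at the default threshold. Positions found this way would still need to be checked in SeqMan Ultra by untrimming the reads, as described in the table.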

Variant calling accuracy workflow
