TIPS FOR SUCCESSFUL DE NOVO TRANSCRIPTOME SEQUENCE ASSEMBLY
USING RNA-SEQ DATA
Chapter 3: Third-Party Comparison: Lasergene Genomics vs. CLC Genomics Workbench
We think the de novo transcriptome workflow in Lasergene Genomics is a leader in its class, and a 2018 study found that SeqMan NGen performed this workflow better than CLC Genomics Workbench in several areas.
REFERENCE: Jonathan Chacon and Math P. Cuajungco. “Comparative De novo Transcriptome Assembly of Notophthalmus viridescens RNA-seq Data using Two Commercial Software Programs.” Calif J Health Promot. 2018 Jun; 16(1): 46–53.
Below, we expand on some of the findings identified in the paper.
Flexible assembly setup
The study authors report that SeqMan NGen “…allows users to specify rRNA or other input contaminant sequences prior to assembly. This option is not currently available in the CLC GW de novo transcriptome workflow.”
In addition to letting you specify rRNA and other contaminant sequences, SeqMan NGen’s wizard also lets you remove specific vector or adapter sequences (Figure 2). Alternatively, you can elect to perform fully automated adapter removal by checking the “Remove universal adapter” option.
Fewer and longer contigs
With other applications, de novo assembly of RNA-Seq data can potentially result in thousands of unlabeled contigs representing the expressed transcripts. Performing meaningful downstream analysis on this many unannotated contigs is nearly impossible.
Using its proprietary assembly algorithm, however, SeqMan NGen creates fewer and longer contigs than CLC Genomics Workbench. The study authors noted that “… the Lasergene SMN Trace Evidence consensus-calling algorithm generated longer contigs on average…Meanwhile, CLC GW had assembled over nine times the amount of contigs…”
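To make "fewer and longer contigs" concrete, here is a minimal Python sketch of the summary figures commonly used to compare two assemblies: contig count, mean contig length, and N50. It does not come from the study or from either product. The function name and the contig lengths are invented for illustration only.

```python
from statistics import mean


def contig_summary(lengths):
    """Return contig count, mean length, and N50 for a list of contig lengths."""
    if not lengths:
        return {"count": 0, "mean_length": 0.0, "n50": 0}
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    n50 = 0
    # N50 is the length of the contig at which the running total,
    # taken from longest to shortest, first reaches half of all bases.
    for length in ordered:
        running += length
        if running >= half_total:
            n50 = length
            break
    return {"count": len(ordered), "mean_length": mean(ordered), "n50": n50}


# Hypothetical contig lengths (in bases), not data from the paper.
assembly_a = [2400, 1800, 1500, 900]
assembly_b = [900, 700, 600, 500, 400, 400, 300, 300, 250, 250]

print(contig_summary(assembly_a))
print(contig_summary(assembly_b))
```

In this made-up example, the first assembly has fewer contigs with a higher mean length and N50, which is the pattern the study authors describe.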
How does SeqMan NGen do it? SeqMan NGen automatically attempts to group contigs from the same gene, and then name and annotate them based on the best match to a collection of annotated reference sequences (the “Transcript Annotation Database”) extracted from data on NCBI’s RefSeq website. The total count of transcript fragments that aligned and matched RefSeq sequences provides the sequencing coverage. Many data sets assembled with SeqMan NGen produce a large number of long transcripts that are likely full-length transcripts.
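The general idea of grouping contigs by gene and labeling them by best match can be sketched in a few lines of Python. This is only an illustration of the concept, not SeqMan NGen's proprietary algorithm. The function name, the (contig, gene, score) hit format, and the example values are all assumptions.

```python
from collections import defaultdict


def group_contigs_by_best_hit(contig_ids, hits):
    """Group contigs under the gene of their highest-scoring reference match.

    hits is a list of (contig_id, gene_name, score) tuples (assumed format).
    Contigs with no hit at all are returned separately as candidate novel
    transcripts.
    """
    best = {}
    for contig_id, gene, score in hits:
        # Keep only the highest-scoring match seen for each contig.
        if contig_id not in best or score > best[contig_id][1]:
            best[contig_id] = (gene, score)

    groups = defaultdict(list)
    novel = []
    for contig_id in contig_ids:
        if contig_id in best:
            groups[best[contig_id][0]].append(contig_id)
        else:
            novel.append(contig_id)
    return dict(groups), novel


# Hypothetical contigs and alignment scores.
contig_ids = ["c1", "c2", "c3", "c4"]
hits = [
    ("c1", "GAPDH", 950.0),
    ("c1", "ACTB", 120.0),
    ("c2", "GAPDH", 400.0),
    ("c3", "ACTB", 610.0),
]

groups, novel = group_contigs_by_best_hit(contig_ids, hits)
print(groups)
print(novel)
```

Here contigs c1 and c2 end up under the same gene name, while c4, which matched nothing in the reference set, is set aside as a possible novel transcript.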
Contaminant sequence reporting
Software that cannot report which reads it excluded may be oversampling reads, which reduces the precision of the transcriptome assembly. By contrast, SeqMan NGen reports which reads were excluded. The comparison study found that SeqMan NGen “…clearly defines excluded reads in its project report…”
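As a rough illustration of why an excluded-read report is useful, the Python sketch below screens reads against user-supplied contaminant sequences and records which reads were dropped and why. It is a simplified stand-in, not the method used by either product. It assumes exact substring matching on the forward strand only, and the function name, the min_match parameter, and the sequences are hypothetical.

```python
def screen_reads(reads, contaminants, min_match=20):
    """Split reads into kept IDs and (read_id, contaminant_name) exclusions.

    A read is excluded if it shares any exact substring of length min_match
    with a contaminant sequence. Reverse complements are not checked.
    """
    # Index every min_match-length substring of each contaminant.
    kmers = {}
    for name, seq in contaminants.items():
        for i in range(len(seq) - min_match + 1):
            kmers.setdefault(seq[i:i + min_match], name)

    kept, excluded = [], []
    for read_id, seq in reads:
        source = None
        for i in range(len(seq) - min_match + 1):
            source = kmers.get(seq[i:i + min_match])
            if source is not None:
                break
        if source is None:
            kept.append(read_id)
        else:
            excluded.append((read_id, source))
    return kept, excluded


# Hypothetical contaminant and reads; min_match is lowered for the short example.
contaminants = {"rRNA_fragment": "ACGTACGTTAGCCGATAGGCTA"}
reads = [
    ("r1", "TTTTAGCCGATAGGAAAA"),
    ("r2", "GGGGCCCCAAAATTTTGGGG"),
]

kept, excluded = screen_reads(reads, contaminants, min_match=8)
print("kept:", kept)
print("excluded:", excluded)
```

Keeping the excluded list alongside the kept reads makes it possible to check how many reads were removed and which contaminant each one matched.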
Downstream analysis capabilities
After de novo transcriptome assembly, other applications in the Lasergene Genomics package allow different types of downstream analysis.
Comprehensive reports
Want to know if you’re seeing something new? Open the finished assembly in SeqMan Ultra to view known and novel transcripts separately in two highly customizable and sortable reports (Figure 11).
According to the study authors, SeqMan NGen “produced both annotated and novel transcripts lists. The NCBI RefSeq database was used to obtain a number of known or homologous genes from the assembled transcript sequences.” By contrast, “The CLC GW assembly output contained a list of assembled transcripts and unassembled sequence reads.”
To see DNASTAR’s benchmarks comparing identified and novel transcripts assembled for different data sets, see this blog post.
By the way, if you’re curious why the average transcript length found by software is often shorter than the length of the organism’s mRNA, the blog post above also explains this phenomenon. The short answer is that read length makes a huge difference in de novo transcriptome assemblies. Illumina reads longer than 150 bp typically produce much longer assembled transcripts, up to full length, while reads shorter than 150 bp may produce transcripts as short as half the length of the mRNA.
Integrated heat maps and gene ontology
You can use ArrayStar to view transcriptome results as a heat map (Figure 12) and to perform gene expression analysis on the transcripts.
You can also use ArrayStar to explore gene ontology. According to the paper’s authors, “Gene ontology (GO) analysis provides functional description of the genes and existing relationship or functional nodes among genes.” SeqMan NGen “has an integrated tool to perform GO analysis, but not CLC Genomics Workbench.”