In part A of the tutorial, you will use SeqMan NGen to de novo assemble and annotate the RNA-Seq data.
- Download T5_RNA-Seq_DeNovo_Transcriptome.zip (147 MB) and extract it to any convenient location (e.g., your desktop). The tutorial data consist of the paired-end read files Yeast_RNASeq_1Mreads_1.fastq and Yeast_RNASeq_1Mreads_2.fastq.
- Launch SeqMan NGen and choose RNA-Seq/Transcriptomics on the left. On the right, click on De novo transcriptome.
- In the Set Contaminant screen, take the opportunity to verify that you are logged in by looking at the key icon in the bottom left corner. If there is a green check mark, click Next. If there is a yellow triangle, click the icon and enter the same login credentials you use for the DNASTAR website. Once you return to the Set Contaminant screen, click Next.
- In the Input Sequences screen, press Add and add the Yeast_RNASeq_1Mreads_1.fastq and Yeast_RNASeq_1Mreads_2.fastq files. Click Next.
- In the Transcript Annotation Database screen:
- Check the box next to Use database to annotate transcripts. This step is very important!
- Click the Download Database button to open the following popup.
- Choose RefSeq Fungi and press Select.
- Click Next.
- In the Assembly Options screen, click Next.
- In the Assembly Output screen, type Transcriptome into the Project Name text box, then use the Browse button to specify a Project Folder for your assembly output files. Click Next.
- In the Run Assembly Project screen, note that:
- The estimated disk requirement of 2.1 TB is based on the total length of the fungal Transcript Annotation Database, which is 4.2 GB, larger than a human genome. The estimate assumes a reference-guided genome assembly with fixed 50X coverage, not a reference-guided transcriptome assembly, where coverage is highly variable. The assembly in this tutorial has extremely low coverage and uses far less disk space than the estimate shown.
- Cloud Assembly is not offered for the de novo transcriptome workflow because most data sets exceed the 48-hour time limit.
Click the link “Run assembly on this computer.” The assembly will take approximately one hour on a standard laptop.
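Before starting a run of about an hour, it can be worth confirming outside SeqMan NGen that the two read files are intact and that the project drive has room. The following is a minimal sketch, not part of the DNASTAR workflow: the function names are hypothetical, it assumes uncompressed FASTQ files with exactly four lines per read, and the `needed_gb` value is a placeholder you choose yourself, since the tutorial does not state the actual disk usage.

```python
import shutil


def count_fastq_reads(path):
    """Count records in an uncompressed FASTQ file, assuming 4 lines per record."""
    with open(path, "rb") as handle:
        line_count = sum(1 for _ in handle)
    if line_count % 4 != 0:
        raise ValueError(f"{path}: {line_count} lines is not a multiple of 4")
    return line_count // 4


def free_gigabytes(folder):
    """Return free space, in GB (10**9 bytes), on the disk that holds `folder`."""
    return shutil.disk_usage(folder).free / 1e9


def preflight(read1, read2, project_folder, needed_gb):
    """Return a list of problems found; an empty list means both checks passed."""
    problems = []
    n1 = count_fastq_reads(read1)
    n2 = count_fastq_reads(read2)
    # Paired-end files should contain the same number of reads.
    if n1 != n2:
        problems.append(f"read counts differ: {n1} vs {n2}")
    # needed_gb is a user-chosen placeholder, not a figure from SeqMan NGen.
    if free_gigabytes(project_folder) < needed_gb:
        problems.append(f"less than {needed_gb} GB free in {project_folder}")
    return problems
```

For example, `preflight("Yeast_RNASeq_1Mreads_1.fastq", "Yeast_RNASeq_1Mreads_2.fastq", ".", needed_gb=50)` returns an empty list when the files pair up and the chosen amount of space is available; the value 50 here is purely illustrative.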
- Wait until you are notified that the assembly has finished, then click Next.
During assembly, any assembled transcripts with a database match exceeding the specified thresholds were termed “Identified Transcripts,” while assembled transcripts that did not have a database match were called “Novel Transcripts.”
- In the lower half of the Assembly Summary screen, do a quick check of the results. Your numbers will likely differ from those in the image below, but you should have at least 1300 Identified Transcripts and over 40 Novel Transcripts.
- In the upper part of the same screen, click the button View assembled transcripts to open transcriptome results in SeqMan Ultra.
- (Optional) If desired, you may now return to SeqMan NGen and close the application by clicking Finish and then Yes.
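The split between Identified and Novel Transcripts described above can be pictured with a small sketch. This is an illustration of the idea only, not SeqMan NGen's actual algorithm: the `Transcript` record, the single `best_match_score` field, and the `min_score` threshold are all hypothetical, and the sketch assumes that a transcript whose best match falls below the threshold is treated the same as one with no match. The second function simply encodes the sanity check from this tutorial (at least 1300 Identified and over 40 Novel Transcripts).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Transcript:
    name: str
    best_match_score: Optional[float]  # None means no database hit (hypothetical field)


def classify(transcripts, min_score):
    """Split transcripts into (identified, novel) lists using one score threshold."""
    identified, novel = [], []
    for t in transcripts:
        # Assumption: a sub-threshold match counts as "no match".
        if t.best_match_score is not None and t.best_match_score > min_score:
            identified.append(t)
        else:
            novel.append(t)
    return identified, novel


def meets_tutorial_expectations(n_identified, n_novel):
    """Check the counts against the rough minimums quoted in this tutorial."""
    return n_identified >= 1300 and n_novel > 40
```

After reading the two counts from the Assembly Summary screen, calling `meets_tutorial_expectations` with them gives a quick yes or no on whether the run looks reasonable.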
Proceed to Part B: Viewing annotated transcripts in SeqMan Ultra.