The Transcriptome view opens automatically when you open a .Transcriptome file, the output from a de novo RNA-Seq assembly in SeqMan NGen. This view is a two-part report summarizing information about “Identified Transcripts” and “Novel Transcripts.” Use the drop-down menu at the upper-left of the view to switch between the two reports.
- Identified Transcripts – shows transcripts that had a Transcript Annotation Database match exceeding the specified thresholds. These are labeled with information from the best matching database entry following the default convention: [gene name]_[accession]_co_[assembly ID]_[contig ID]. Clicking a column header sorts the table alphabetically by that column and color-codes it for ease of reading.
- Novel Transcripts – shows assembled transcripts that either did not have a database match or had a preliminary match that then fell below thresholds upon further processing. The former is labeled following the convention cl_[assembly ID]_[contig ID] and the latter as [gene name]_[accession]_co_[assembly ID]_[contig ID] to give a hint as to the possible identity of that sequence.
Table columns:
Column name | Description |
---|---|
% Gene Match | Length of the matching segment in the database entry x 100, divided by the total length of the database entry. |
% Identity | Total number of identical bases in the matching region x 100, divided by the total number of bases in the matching region. |
% Transcript Match | Length of the matching segment in the transcript x 100, divided by the total length of the transcript. |
% of Full Length | Length of the assembled sequence x 100, divided by the length of the corresponding database entry. Values greater than 100% indicate that the assembled sequence is longer than the database entry. |
Accession Number * | Accession number of the best match. |
Assembled Reads | Total number of assembled reads for that sequence. |
Assembly ID | Name assigned to the assembled sequence, using the criteria specified in the wizard. |
Bit Score | Normalized value calculated from the raw score and expressed in units of “bits,” a common measure in information theory. |
Database | Database (e.g., RefSeq or Custom) from which the best matching gene came. |
Description * | Description of the best match. |
Gene End | Position in the database entry where the match ends. |
Gene Length | Length of the database entry, in bases. |
Gene Name * | Best matching gene meeting criteria defined in the wizard. |
Gene Start | Position in the database entry where the match begins. |
Organism Name * | Organism from which the best matching gene came. |
Transcript End | Position in the assembled sequence where the match ends. |
Transcript Length | Length of the assembled sequence, in bases. |
Transcript Start | Position in the assembled sequence where the match begins. |
eValue | “Expectation value,” an estimate of the number of alignments scoring at least as well as the observed alignment that would be expected by chance. Lower values indicate more significant matches. Expectation values are less sensitive to length than Bit scores and are therefore a better general measure of alignment quality. |
* Four columns use default names (e.g., Gene Name, Organism Name) if one of the default RefSeq databases was used in the SeqMan NGen assembly. However, if you used a custom GREP expression or a custom database that did not include these fields, these columns may have different names or be absent from the table.
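The percentage columns in the table can be reproduced from the length and position columns. The following is a minimal sketch, not SeqMan NGen's own code. It assumes that start and end positions are 1-based and inclusive, and the dictionary keys (including `identical_bases` and `aligned_bases`, which are not table columns) are hypothetical names chosen for illustration.

```python
def segment_length(start: int, end: int) -> int:
    """Length of a matching segment, ASSUMING 1-based inclusive coordinates."""
    return abs(end - start) + 1


def percent(part: float, whole: float) -> float:
    """Return part x 100 / whole, the form used by every percentage column."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return part * 100 / whole


def match_percentages(row: dict) -> dict:
    """Compute the four percentage columns from one (hypothetical) table row."""
    gene_segment = segment_length(row["gene_start"], row["gene_end"])
    transcript_segment = segment_length(row["transcript_start"], row["transcript_end"])
    return {
        "% Gene Match": percent(gene_segment, row["gene_length"]),
        "% Identity": percent(row["identical_bases"], row["aligned_bases"]),
        "% Transcript Match": percent(transcript_segment, row["transcript_length"]),
        "% of Full Length": percent(row["transcript_length"], row["gene_length"]),
    }


# Made-up values for illustration only.
example_row = {
    "gene_start": 101, "gene_end": 900, "gene_length": 1000,
    "transcript_start": 1, "transcript_end": 800, "transcript_length": 1200,
    "identical_bases": 784, "aligned_bases": 800,
}
example_result = match_percentages(example_row)
```

In this made-up row, the assembled sequence is longer than the database entry, so % of Full Length exceeds 100%, as described in the table.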
About .Transcriptome files:
During the assembly stage of SeqMan NGen’s de novo transcriptome/RNA-Seq workflow, transcript consensus sequences are annotated and named via a search of a transcript annotation database (e.g., NCBI’s RefSeq database). De novo transcriptome assembly output is saved to a package called [project name].Transcriptome. The package contains three subfolders, as well as a text file with a high-level summary of the results. The three subfolders are:
- Assemblies – Composed of a series of subfolders named sub_0, sub_1, sub_2, … each of which contains editable SQD documents of the Identified Transcripts assemblies. Each of these SQD documents contains one or more contigs that match the same database entry. The Assemblies folder also contains a separate SQD entitled [project name]_NovelTranscripts.sqd, composed of the assembled contigs that did not have matches to the database, as well as a [project name]_AllUnassembled.fastq file containing the unassembled reads from the project in .fastq format.
- Reports – Contains a [project name].AllTranscripts.SearchResults text file with summary information on both the Identified and Novel transcripts.
- Transcripts – Contains multi-sequence FASTA files for the Identified and Novel Transcript consensus sequences.
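The package layout described above can be inspected with a short script. This is a sketch under stated assumptions: it treats the .Transcriptome package as an ordinary directory on disk, and it assumes the file name patterns match the descriptions above exactly (for example, a lowercase `.sqd` extension). The function name `list_package_contents` is hypothetical.

```python
from pathlib import Path


def list_package_contents(package: Path) -> dict:
    """Group the files of a .Transcriptome package by the roles described above."""
    assemblies = package / "Assemblies"
    return {
        # SQD documents for Identified Transcripts live in sub_0, sub_1, ...
        "identified_sqd": sorted(assemblies.glob("sub_*/*.sqd")),
        # Contigs without a database match.
        "novel_sqd": sorted(assemblies.glob("*_NovelTranscripts.sqd")),
        # Reads that were not assembled.
        "unassembled_fastq": sorted(assemblies.glob("*_AllUnassembled.fastq")),
        # Summary of Identified and Novel transcripts.
        "reports": sorted((package / "Reports").glob("*.AllTranscripts.SearchResults")),
        # Multi-sequence FASTA files (any file in the folder is listed here).
        "transcripts": sorted(p for p in (package / "Transcripts").glob("*") if p.is_file()),
    }
```

Calling `list_package_contents(Path("MyProject.Transcriptome"))`, where `MyProject` is a placeholder project name, returns a dictionary of sorted path lists that can be printed or passed to other tools.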
Assembled transcripts with a database match exceeding the specified thresholds are referred to as “Identified Transcripts,” and are labeled with information from the best matching database entry following the default convention: [gene name]_[accession]_co_[assembly ID]_[contig ID]. In cases where a gene name is not provided for a database entry, the name will be truncated to [accession]_co_[assembly ID]_[contig ID].
Assembled transcripts that either did not have a database match or had a preliminary match that then fell below thresholds upon further processing are referred to as “Novel Transcripts.” The former are labeled following the convention cl_[assembly ID]_[contig ID] and the latter as [gene name]_[accession]_co_[assembly ID]_[contig ID] to give a hint as to the possible identity of that sequence.
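The two label conventions can be told apart programmatically. The sketch below is illustrative only and relies on assumptions: labels beginning with `cl_` are taken to be the no-match form, and the last occurrence of `_co_` is taken to separate the annotation from the assembly and contig IDs. Because gene names, accessions and IDs may themselves contain underscores, the sketch does not try to split those parts further. The function name and the sample labels are hypothetical.

```python
def split_transcript_label(label: str) -> dict:
    """Split a transcript label into its annotation part and its assembly part."""
    if label.startswith("cl_"):
        # cl_[assembly ID]_[contig ID]: no database match, so no annotation.
        return {"form": "cl", "annotation": None, "assembly_part": label[len("cl_"):]}

    # [gene name]_[accession]_co_[assembly ID]_[contig ID], or the truncated
    # [accession]_co_[assembly ID]_[contig ID] when no gene name is available.
    annotation, separator, assembly_part = label.rpartition("_co_")
    if not separator:
        raise ValueError(f"label follows neither convention: {label!r}")
    return {"form": "co", "annotation": annotation, "assembly_part": assembly_part}


# Made-up labels for illustration only.
no_match = split_transcript_label("cl_12_3")
with_match = split_transcript_label("GENE1_NM_000000_co_12_3")
```

Note that the label form alone does not distinguish an Identified Transcript from a Novel Transcript with a preliminary match, since both use the `_co_` convention; the report a transcript appears in determines that.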