Creating a Custom Transcript Annotation Database

The Transcript Annotation Database screen allows you to upload a FASTA-formatted (.fasta) database for use in the Transcript Annotation workflow.

 

A custom database must meet the same formatting specifications as NCBI RefSeq files. It must:

 

      Be in FASTA format (both single-sequence and multi-sequence files are supported)

 

      Use the field delimiter ‘|’ (without quotes) between fields

 

      Have a header line for each entry, written in the format:

 

ref|[Accession]|[Organism Name] [Description] ([Gene Name])

 

… where:

 

Accession - All characters between the field delimiter that follows "ref" and the next field delimiter

 

Organism Name - The first two words after the field delimiter that follows the accession

 

Description - All words after Organism Name up to the end of the line or, if a gene name is present, up to the first comma or opening parenthesis

 

Gene Name - All characters in parentheses after Description

 

 

Example:

 

ref|XM_005842486.1|Chlorella variabilis hypothetical protein (CHLNCDRAFT_144668) mRNA, partial cds
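
To check headers before uploading, the field rules above can be sketched in a few lines of Python. This is only an illustration of the rules as written, not the application's own parser: the function name parse_header, the regular expression, and the tolerance for a leading ">" (the usual FASTA header marker) are all assumptions. It also assumes that the description stops at a comma or parenthesis only when a gene name is present.

```python
import re

# Assumed pattern for "ref|[Accession]|[Organism Name] [Description] ([Gene Name])".
# Accepting an optional leading ">" is an assumption based on standard FASTA headers.
HEADER_RE = re.compile(r"^>?ref\|(?P<accession>[^|]+)\|\s*(?P<rest>.+)$")


def parse_header(line):
    """Split one header line into its fields; return None if it does not match."""
    match = HEADER_RE.match(line.strip())
    if match is None:
        return None

    words = match.group("rest").split()
    if len(words) < 2:
        return None  # not enough words for a two-word organism name

    organism = " ".join(words[:2])
    remainder = " ".join(words[2:])

    # Gene name: the characters inside the first pair of parentheses, if any.
    gene_match = re.search(r"\(([^)]*)\)", remainder)
    if gene_match:
        gene_name = gene_match.group(1)
        # With a gene name present, the description ends at the first
        # comma or opening parenthesis.
        description = re.split(r"[,(]", remainder, maxsplit=1)[0].strip()
    else:
        gene_name = None
        description = remainder

    return {
        "accession": match.group("accession"),
        "organism": organism,
        "description": description,
        "gene_name": gene_name,
    }


example = (
    "ref|XM_005842486.1|Chlorella variabilis hypothetical protein "
    "(CHLNCDRAFT_144668) mRNA, partial cds"
)
fields = parse_header(example)
print(fields)
```

A header that does not start with the "ref" field, or that has fewer than two words after the accession, returns None, which can be used to flag entries that need correcting before upload.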