Creating a Custom Transcript Annotation Database

The Transcript Annotation Database screen allows you to upload a FASTA-formatted (.fasta) database for use in the Transcript Annotation workflow.

A custom database must meet the same formatting specifications as NCBI RefSeq files. It must:

• Be in FASTA format (both single-sequence and multi-sequence files are supported)

• Use the field delimiter ‘|’ (without quotes) between fields

• Have a header line for each entry, written in the format:

ref|[Accession]|[Organism Name] [Description] ([Gene Name])

… where:

Accession - All characters between the third and fourth field delimiters

Organism Name - The first two words after the fourth field delimiter

Description - All words after Organism Name up to the end of the line or, if a gene name is present, up to the first comma or opening parenthesis

Gene Name - All characters inside the parentheses that follow Description
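When building a custom database, it can help to generate each header line from its parts rather than typing it by hand. The following is a minimal Python sketch, not part of the software: the function name format_header is hypothetical, the leading '>' is the standard FASTA header marker (the format line above omits it), and the checks on commas, parentheses, and the two-word organism name are assumptions derived from the field definitions above.

```python
def format_header(accession: str, organism: str, description: str,
                  gene_name: str = "") -> str:
    """Build one header line: >ref|Accession|Organism Description (Gene)."""
    for value in (accession, organism, description, gene_name):
        # '|' is reserved as the field delimiter, so it may not appear in a field.
        if "|" in value:
            raise ValueError(f"field contains the '|' delimiter: {value!r}")
    # Organism Name is read back as exactly the first two words.
    if len(organism.split()) != 2:
        raise ValueError("organism name must be exactly two words")
    # A comma or parenthesis would cut the description short when it is read back.
    if any(ch in description for ch in ",()"):
        raise ValueError("description must not contain commas or parentheses")
    header = f">ref|{accession}|{organism} {description}"
    if gene_name:
        header += f" ({gene_name})"
    return header


print(format_header("XM_005842486.1", "Chlorella variabilis",
                    "hypothetical protein", "CHLNCDRAFT_144668"))
```

Write the returned line immediately above the corresponding sequence in the .fasta file.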

Example:

ref|XM_005842486.1|Chlorella variabilis hypothetical protein (CHLNCDRAFT_144668) mRNA, partial cds
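To see how the example header breaks down into its fields, the rules above can be sketched in Python. This is an illustration only, not the software's own parser: the names parse_header and HeaderFields are hypothetical, and the sketch assumes the header shape shown in the example, where the accession is the second '|'-separated field and the free text follows the last delimiter. Headers that carry additional leading fields would need a different index.

```python
import re
from typing import NamedTuple, Optional


class HeaderFields(NamedTuple):
    accession: str
    organism: str
    description: str
    gene_name: Optional[str]


def parse_header(line: str) -> HeaderFields:
    """Split a header shaped like ref|Accession|Organism Description (Gene)."""
    # Drop the FASTA '>' marker if present, then split on the '|' delimiter.
    fields = line.strip().lstrip(">").split("|")
    if len(fields) < 3 or fields[0] != "ref":
        raise ValueError(f"unexpected header layout: {line!r}")
    accession = fields[1]  # assumption: accession is the second field
    words = fields[-1].split()
    if len(words) < 2:
        raise ValueError(f"organism name needs two words: {line!r}")
    # Organism Name: the first two words after the last delimiter.
    organism = " ".join(words[:2])
    rest = " ".join(words[2:])
    # Gene Name: the first parenthesised group after the organism name, if any.
    match = re.search(r"\(([^)]*)\)", rest)
    if match:
        # With a gene name, Description stops at the first comma or '('.
        description = re.split(r"[,(]", rest, maxsplit=1)[0].strip()
        gene_name = match.group(1)
    else:
        # Without a gene name, Description runs to the end of the line.
        description = rest
        gene_name = None
    return HeaderFields(accession, organism, description, gene_name)


example = ("ref|XM_005842486.1|Chlorella variabilis hypothetical protein "
           "(CHLNCDRAFT_144668) mRNA, partial cds")
print(parse_header(example))
```

For the example header this yields the accession XM_005842486.1, the organism name "Chlorella variabilis", the description "hypothetical protein", and the gene name CHLNCDRAFT_144668.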