To open the Annotation Options dialog, press the Transcript Annotation Options button in the Transcript Annotation Database screen. The Annotation Options dialog can be used to change the default naming convention used for RefSeq packages, or to specify a custom GREP and naming convention for non-RefSeq packages.
Annotation Naming Convention:
Check the Use default naming convention box if you want to keep or to return to the default naming convention (geneName accession). Uncheck the box to enable dialog options allowing you to customize the naming convention.
If you uncheck the Use default naming convention box, the dialog provides two ways to customize names.
- Manual selection of naming components – Using the Primary, Secondary and Tertiary name boxes, you can select up to three “compound name” components in any order. Click in each box you want to use to enable it, then make a selection from the list: accession, description, geneName, organismName or none. The Example below the boxes shows your current selection(s).
If you made the selections as shown in the preceding image, for example, then a sample E. coli transcript would automatically be assigned a name like: thrB - Escherichia coli str. K-12 substr. DH10B - 6058639
- Annotation name matching using grep syntax – If you uploaded a transcript annotation database, a GREP string is automatically defined that uses the FASTA headers to set the rules for the naming convention. This same string defines the “default naming convention”. To edit the GREP string, and thereby change the rule for extracting contig name fragments from the FASTA headers, check the Use custom annotation name matching check box. You can then edit the regular expression by typing in the Custom grep match text box. An example is provided in the text box: gi\|.*\|ref\|(?'accession'.*)\|[ ]*(PREDICTED:)*\s*(?'organismName'\w* \w*)\s*((?'description'.*)([^(]*)(\((?'geneName'\w*)\)[^(]*,.*)|((?'description'.*),.*)|(?'description'.*))
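The two naming approaches above can be illustrated with a short Python sketch. It is not the application's own code. The example pattern in the text box appears to use a named-group syntax (with repeated group names) that Python's re module does not accept, so the pattern below is a simplified translation that keeps only the "description (geneName)" branch. The FASTA header, the function names parse_header and compound_name, and the " - " separator (inferred from the sample name shown earlier) are all assumptions made for illustration.

```python
import re

# Simplified Python translation of the example pattern (an assumption, not the
# application's own expression). Python writes named groups as (?P<name>...)
# and does not allow a group name to be reused, so only the
# "description (geneName)" branch is kept.
HEADER_PATTERN = re.compile(
    r"gi\|.*?\|ref\|(?P<accession>[^|]*)\|\s*"
    r"(?:PREDICTED:)?\s*"
    r"(?P<organismName>\w+ \w+)\s+"
    r"(?P<description>[^(]*?)\s*"
    r"\((?P<geneName>\w+)\)"
)


def parse_header(header):
    """Return the named fields found in a FASTA header, or None if no match."""
    match = HEADER_PATTERN.search(header)
    return match.groupdict() if match else None


def compound_name(fields, primary, secondary="none", tertiary="none"):
    """Join up to three selected components, skipping any set to 'none'."""
    chosen = [c for c in (primary, secondary, tertiary) if c != "none"]
    return " - ".join(fields[c] for c in chosen)


# Hypothetical header, invented for this example.
header = "gi|123456|ref|NM_000001.1| PREDICTED: Homo sapiens example protein (EXMP), mRNA"
fields = parse_header(header)
name = compound_name(fields, "geneName", "organismName", "accession")
print(name)  # EXMP - Homo sapiens - NM_000001.1
```

Testing a custom expression against a few representative headers in this way, before typing it into the Custom grep match text box, can help confirm that each named group captures the intended fragment.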
Annotation Match Quality Settings:
This section of the dialog supplies two metrics for specifying what constitutes a valid match between the reference sequence (the “query”) and the database entry.
- Minimum percent of reference sequence matching the transcript – Enter the minimum percentage of the query that must match the database entry. The default is 80%.
- Minimum percent of transcript matching the reference sequence – Enter the minimum percentage of the database entry that must match the query. The default is 50%.
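As a rough illustration of how the two thresholds interact, the following Python sketch checks a candidate match against both. How the application actually measures the matching portion is not described here, so the sketch assumes a single matched length divided by each sequence's total length, and assumes that both thresholds must be met. The function name and parameters are hypothetical.

```python
def passes_match_quality(matched_len, reference_len, transcript_len,
                         min_reference_pct=80.0, min_transcript_pct=50.0):
    """Return True if a match meets both minimum-percent thresholds.

    Assumption: each percentage is the matched length divided by the full
    length of the sequence in question, and both thresholds must be met.
    """
    reference_pct = 100.0 * matched_len / reference_len
    transcript_pct = 100.0 * matched_len / transcript_len
    return (reference_pct >= min_reference_pct
            and transcript_pct >= min_transcript_pct)


# 900 matched bases cover 90% of a 1000-base reference and 60% of a
# 1500-base transcript, so both defaults (80% and 50%) are satisfied.
print(passes_match_quality(900, 1000, 1500))  # True

# The same 900 bases cover only 45% of a 2000-base transcript.
print(passes_match_quality(900, 1000, 2000))  # False
```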
****************
To save the current selections and close the dialog, click Save. To return to the default settings, press Reset to Default. To close the dialog without saving changes, click Cancel.