The Sample Sequences template, located in the Templates panel, is used to make an output file that contains a filtered set of sequences from the source file. Source file sequences can be filtered according to one or more specified conditions, such as length, contents, and start/end sequence characters.
Initially, the template options are pre-selected (or pre-filled) to show an example of how to filter for sequences that are at least 375 nt in length and contain the sequence “GATCT”. Overwrite these selections to fit your own needs.
- At least one Filter row is needed to specify the sampling criteria. Two Filter rows are provided as examples and can be edited or removed.
- To add or delete a Filter row, click the plus or minus tool on the right of that row.
- To edit a Filter row, make selections from the Filter drop-down menus and fill in the corresponding Value boxes. The Filter drop-down menus offer the following options:
Use this filter: | To include: | Allowable values |
---|---|---|
Minimum Length | Only sequences whose length is equal to or greater than the specified length. | Positive integer |
Contains | Only sequences containing a specified sequence fragment. For sequences of DNA or unknown type, matches can occur on either strand. | DNA or protein sequence fragment using 1-letter IUPAC codes. |
From Sequence Index | All sequences beginning with the sequence of this name. | Sequence name |
To Sequence Index | All sequences up to and including the sequence of this name. | Sequence name |
Sample Every | Every ‘nth’ sequence in the source file, where ‘n’ is a positive integer. | Positive integer |
Sequence Name | All sequences with this name. | Sequence name |
Probability to Include | A random subset of sequences. Each member of the source set individually has the given probability of being included. | A single-quoted decimal value from 0.0-1.0 (e.g., ‘0.7’) |
Maximum Length | Only sequences whose length is equal to or less than the specified length. | Positive integer |
Starts With | Only sequences beginning with a specified sequence fragment. For sequences of DNA or unknown type, matches can occur at the beginning of either strand. | DNA or protein sequence fragment using 1-letter IUPAC codes. |
Ends With | Only sequences ending with a specified sequence fragment. For sequences of DNA or unknown type, matches can occur at the end of either strand. | DNA or protein sequence fragment using 1-letter IUPAC codes. |
- In the Choose Sequence(s) button row, enter the sequence you wish to sample (see Add and modify a sequence).
- In the Save Results As area, choose the name and location for the output file (see Specify output format and location).
- Use the Format drop-down menu to select the file type for the output file.
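The length and content filters in the table above can be illustrated with a short Python sketch. This is not DNASTAR code: the function names are hypothetical, every sequence is treated as DNA, IUPAC ambiguity codes are matched literally rather than expanded, and multiple Filter rows are assumed to combine with a logical AND (as the pre-filled example of "at least 375 nt and containing GATCT" suggests).

```python
# Illustrative sketch only; all names here are hypothetical, not a DNASTAR API.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def min_length(n: int):
    """Minimum Length: keep sequences of length >= n."""
    return lambda seq: len(seq) >= n


def max_length(n: int):
    """Maximum Length: keep sequences of length <= n."""
    return lambda seq: len(seq) <= n


def contains(fragment: str):
    """Contains: the fragment may occur on either strand."""
    frag = fragment.upper()
    rc = reverse_complement(frag)
    return lambda seq: frag in seq.upper() or rc in seq.upper()


def starts_with(fragment: str):
    """Starts With: the fragment may begin either strand."""
    frag = fragment.upper()
    rc = reverse_complement(frag)
    # The opposite strand starts with frag when this strand ends with rc.
    return lambda seq: seq.upper().startswith(frag) or seq.upper().endswith(rc)


def ends_with(fragment: str):
    """Ends With: the fragment may end either strand."""
    frag = fragment.upper()
    rc = reverse_complement(frag)
    # The opposite strand ends with frag when this strand starts with rc.
    return lambda seq: seq.upper().endswith(frag) or seq.upper().startswith(rc)


def passes(seq: str, filters) -> bool:
    """Assumption: a sequence is kept only if every Filter row accepts it."""
    return all(f(seq) for f in filters)


# Usage: the pre-filled example (at least 375 nt and containing GATCT).
records = {
    "long_forward": "GATCT" + "A" * 400,  # fragment on the given strand
    "long_reverse": "AGATC" + "A" * 400,  # fragment on the opposite strand
    "too_short": "GATCT",                 # contains the fragment but is only 5 nt
}
filters = [min_length(375), contains("GATCT")]
kept = [name for name, seq in records.items() if passes(seq, filters)]
print(kept)  # ['long_forward', 'long_reverse']
```

Checking the reverse complement of the fragment against the given strand is equivalent to checking the fragment against the opposite strand, which avoids reverse-complementing every source sequence.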
Example #1 input:
Example #1 output:
A .fasta file containing the name and length of each output sequence (sequences #1, 3, 5, 7 and 9), followed by the sequence itself.
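The input settings for this example are not reproduced here. The output listed (sequences #1, 3, 5, 7 and 9) is what one would expect from the Sample Every filter with n = 2 applied to a source file of nine or ten sequences, assuming that counting starts at the first sequence. A minimal Python sketch under that assumption, using a hypothetical `sample_every` helper:

```python
# Illustrative sketch only; sample_every is a hypothetical helper.
def sample_every(records, n):
    """Keep every nth record, starting with the first one (assumed)."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return records[::n]


# Ten placeholder sequence names standing in for the source file.
names = [f"seq{i}" for i in range(1, 11)]
print(sample_every(names, 2))  # ['seq1', 'seq3', 'seq5', 'seq7', 'seq9']
```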
Example #2 input:
Example #2 output:
All sequences from contig01.fas that contain the sequence segment “TTGTT” have the bases “ATG” added to the beginnings of their sequences and “TAG” added to the ends of their sequences.