The Collect Sequences template, located in the Templates panel, allows you to collect all files in the same folder that share a particular naming pattern, and then either save them as a sequence file or convert them to a sequence variable to be referenced as a data source in later steps.
- Specify whether you wish to save the output as a sequence variable or a sequence file.
- To save as a sequence variable: By default, the output is saved as a sequence variable named “collected_genomes”. To use a different name, type it next to Define sequence variable. To use the output from this step as the input for a later step, reference the variable name you chose here.
Note: If you do not see Define sequence variable at the top of the step, bring it back by right-clicking on the blue frame around the step and choosing Assign to Variable.
- To save as a sequence file: Right-click on the blue frame around the step and choose Write to File.
The sequence variable input box at the top of the step disappears and is replaced with a “Save Results” section at the bottom of the step. If you wish to save to a non-default location, click Save Results As, navigate to the desired location for the output sequence, and then click Save. By default, the sequence is saved in GenBank format and is called test.gbk. To change the format, use the Format drop-down menu; to change the filename, enter a new filename in the center textbox.
- To specify a pattern common to each of the sequence names, click the Edit Pattern button. This launches the Edit File Pattern dialog.
Within the Edit File Pattern dialog:
- Use the Filename Pattern dropdown menu to choose from a variety of file extensions and naming patterns.
- In the Filename Pattern text box, type in the common portion of the filename for the files you wish to collect. Use an asterisk as a wildcard to signify which portion of the name varies between sequences.
- Click OK to return to the Collect Sequences template.
- The Extract Features as Sequences step is initially included as an “example step” after the Collect Sequences step. It can be used, removed (by right-clicking and choosing Delete Current Selection), or replaced with a different step, as desired.
- Click Browse and navigate to the folder containing the files that will be collected. After selecting the folder, press OK to close the Browse dialog and OK again to close the Edit File Pattern dialog.
Example 1:
To collect two files named ysch2b1.seq and ysch2b2.seq, you would first choose the .seq extension from the Filename Pattern dropdown menu. In the Filename Pattern text box, you would then type ysch2b before the asterisk, yielding the following entry: ysch2b*.seq.
Example 2:
The goal is to collect all files in the specified directory that have names beginning with “alpha” and extensions beginning with “fas”. The specified directory contains the files alpha.fas, alphabet.fasta, alpha.fap, and beta.fas.
SeqNinja collects all the sequences from the multi-sequence files alpha.fas and alphabet.fasta and saves them to the multi-sequence file Alpha Results.fasta.
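The wildcard behavior in both examples can be checked outside SeqNinja with a short Python sketch. It is only an illustration of how an asterisk pattern selects filenames, not SeqNinja's own implementation. The helper name `collect` is hypothetical, and the pattern `alpha*.fas*` for Example 2 is an assumption inferred from the stated goal; the text above does not give the exact entry.

```python
from fnmatch import fnmatchcase


def collect(filenames, pattern):
    """Return the filenames matching a shell-style wildcard pattern.

    Hypothetical helper for illustration only. fnmatchcase compares
    case-sensitively on every platform, which may differ from how
    SeqNinja treats letter case.
    """
    return [name for name in filenames if fnmatchcase(name, pattern)]


# Example 1: both files share the prefix "ysch2b" and the .seq extension.
example1 = collect(["ysch2b1.seq", "ysch2b2.seq"], "ysch2b*.seq")

# Example 2: "alpha*.fas*" is an assumed pattern (not stated in the text).
# The first asterisk covers the rest of the name, the second the rest
# of the extension.
folder = ["alpha.fas", "alphabet.fasta", "alpha.fap", "beta.fas"]
example2 = collect(folder, "alpha*.fas*")

print(example1)
print(example2)
```

Under this assumed pattern, alpha.fap is skipped because its extension does not begin with “fas”, and beta.fas is skipped because its name does not begin with “alpha”, which agrees with the result described in Example 2.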