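Functions resulting in sequence expressions are listed below. Their results can be assigned to a file (for example, `bar.fasta=cut(foo.gb, 180, 60)`) or to a variable; variable names always start with a dollar sign ($), as in `$rc` or `$A`.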
Objective | Expression | Example | Comments |
---|---|---|---|
Reverse-complement sequences, or assign the reverse complement of one file to another file | `~(sequence_set)` `complement(sequence_set)` | `$rc=~(foo.fasta)` `foo_rc.fasta=complement(foo.fasta)` `MG1655_rc.fasta=~("C:\data\MG1655_e_coli_k_12substrands.fasta")` | Although the command is named “complement” (for brevity), SeqNinja actually calculates the reverse complement of the selection. |
Cut a sequence into smaller pieces | `cut(sequence_set, size)` | `bar.fasta=cut(foo.gb, %i)` | |
Specify an overlap when cutting a sequence | `cut(sequence_set, size, offset)` | `bar.fasta=cut(foo.gb, 180, 60)` | This can be used to create faux reads from an assembled sequence. |
Extract sub-sequences from sequences corresponding to the given features | `extract(sequence_set, 'feature_type[,...]')` | `bar.fasta=extract(foo.gb, 'CDS')` `bar.fasta=extract(foo.gb, 'CDS,gene')` | Single quotes are required for arguments other than the sequence set. |
Extract matching features | `extract(sequence_set, 'feature_type:/tag="value"[,...]')` | `bar.fasta=extract(foo.gb, 'CDS:/gene="thrC"')` `bar.fasta=extract(foo.gb, 'CDS:/gene="thrC",CDS:/gene="thrL"')` | When extracting features, a qualifier can optionally be specified. If no qualifier is specified, the result includes all features of the given type; if one is specified, the result includes only features with a matching qualifier. Note: writing many small extractions to a file format that supports features can be slow, because all of the features are evaluated for intersection with each extraction. |
Extract matching features using wildcards | | `bar.fasta=extract(foo.gb, 'CDS:/gene="thr?"')` | Wildcards can be used in the qualifier value. A ‘?’ matches exactly one arbitrary character, and a ‘*’ matches zero or more arbitrary characters. The example matches CDS features with four-character gene names beginning with “thr”. |
Ignore source sequence qualifiers | `translate(sequence_set, '/codon_start=ignore')` | `$A=translate("myfile.gb", '/codon_start=ignore')` | Including “/codon_start=ignore” in the qualifiers causes any “/codon_start” qualifiers in the source sequence to be ignored. |
Translate all source sequences from DNA/RNA to protein | `translate(sequence_set)` | `$A=translate("myfile.fasta")` `$B=translate("myfile.fasta", '/transl_table=11')` `Test.gb=translate("myfile.gb", '/transl_table=11')` `$C=translate("myfile.fasta", '/transl_table=VERTM')` | If the input sequence is annotated, each CDS feature is translated separately, and any translation table and/or codon_start annotations are honored. If the input is unannotated, or contains no CDS features, the entire sequence is translated. Only complete codons are translated: if a sequence length is not a multiple of three, the one or two extra bases at the end are not reflected in the output. The standard genetic code is used by default unless a different code is specified in the file or in the qualifier overrides. The first codon in a sequence is translated as a start codon if the genetic code recognizes it as one; otherwise, the default translation is used. Translating a DNA or RNA sequence that contains ambiguities may or may not produce an amino acid sequence with ambiguities. For example, `translate("RAT")` returns the ambiguity code “B”, because RAT can be AAT (asparagine) or GAT (aspartic acid), but `translate("ACN")` returns “T”, because every ACN codon encodes threonine. |
Override defaults or values specified in the file | `translate(sequence_set, '/tag=value[,/tag=value...]')` | `$A=translate("myfile.gb", '/transl_table=11,/codon_start=ignore')` | This argument is a comma-separated list of qualifiers. |
Mark the sequences in a set as being DNA, RNA, or protein | `dna(sequence_set)` `rna(sequence_set)` `protein(sequence_set)` | `example.gb=protein("myfile.fasta")` `bar.fasta=protein("myfile.fasta")("NAN",rend)` | This can be useful for sequences from formats that do not specify sequence type, such as FASTA. If you use any of these functions, the type information is added to the output file where the format allows it (e.g., .gbk). Sequence type can affect the results of searching in an endpoint expression: for example, in DNA “N” matches any base, whereas in protein “N” means asparagine. |
Mark the sequences in a set as being circular or linear | `circular(sequence_set)` `linear(sequence_set)` | `$A=circular(myfile.fasta)` `$B=linear(myfile.gb)` `myfile.gb=circular(myfile.fasta)("TAG",lend+4)` `myfile.gb=circular("ATG"+$B+"TAG")` | This can be useful for sequences from formats that do not specify topology, such as FASTA. If you use either of these functions, the topology is added to the output file where the format allows it (e.g., .gbk). Topology can affect the results of searching in an endpoint expression: for example, a match can cross the origin in a circular sequence, but not in a linear one. |
Collect multiple sequence sets into one | `collect(sources)` (specified either as individual arguments or as single-quoted file patterns) | `$a=collect(foo.fasta, bar.fasta, baz.fasta)` `$a=collect("KSLLQQLLTE", "ARTKQTAR", "RPKPLVDP")` `$a=collect('C:/MyFolder/*.fasta')` `collect('*.gb', '*.genbank')` | This function accepts one or more arguments, each of which may be a sequence expression or a file pattern. A file pattern is a single-quoted search string that can match zero or more file paths. Wildcards (asterisks) may be used in the filename part of this string. This function allows: |
Strip specified data out of sequences in a set | `strip(sequence expression) (named arguments)+` | `strip("foo.gb")` `strip("foo.gb", data='features')` `strip("foo.gb", data='features,comments')` `strip("foo.gb", features='CDS')` `strip("foo.gb", features!='CDS')` `strip("foo.gb", features='CDS,gene')` `strip("foo.gb", features='CDS:/gene=yaaA')` | If no named arguments are provided, the function strips the sequences down to their names and residues; most meta-information, including features and comments, is removed. Arguments: |
Annotate sequences in the first argument with features obtained from a `from*` argument (`fromFile` or `fromSequences`) | `annotate(sequence expression)+ (named arguments)+` | `annotate(to.fasta, fromFile='myfile.vcf')` `annotate(to.fasta, fromFile='myfile.vcf', ids='chr1,chr2,chr3,chrX')` `annotate(to.fasta, fromSequences=from.gb)` `annotate(to.gb, fromSequences=from.gb)` `annotate(to.gb, fromSequences=from.gb, features='CDS')` `annotate(to.gb, fromSequences=from.gb, features!='CDS')` `annotate(to.gb, fromSequences=from.gb, features='CDS:/gene=yaaA')` | Arguments: |
Sample the sequences in the input set | `sample(sequence_set, argument=value)` (see the description of arguments below) | `sample("foo.fasta", from=10000, to=20000, by=10)` `sample("foo.fasta", p='0.95')` `sample("foo.fasta", name='GEK*')` | This can be useful for separating reads into different sets, or for reducing a very large number of reads to a smaller number (e.g., because of software limitations). Each argument is optional, and any combination of arguments can be used in any order, but each argument may be used at most once. The output sequences are in the same order as in the original set. To sample everything other than the specified value, precede the value with an exclamation mark (!). Arguments for the expression 'sample': |
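
The reverse complement produced by `~` and `complement()` can be illustrated with a short Python sketch. It is not SeqNinja's implementation: it assumes a plain DNA string that uses IUPAC nucleotide codes (no RNA `U`), and the function name `reverse_complement` is hypothetical.

```python
# Complement table for IUPAC DNA codes. An ambiguity code maps to the code for
# the complementary set of bases, e.g. R (A/G) -> Y (C/T), B (C/G/T) -> V (A/C/G).
_COMPLEMENT = str.maketrans(
    "ACGTRYSWKMBDHVNacgtryswkmbdhvn",
    "TGCAYRSWMKVHDBNtgcayrswmkvhdbn",
)


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the result (hypothetical helper)."""
    return seq.translate(_COMPLEMENT)[::-1]


print(reverse_complement("ATGC"))  # GCAT
```

Applying the function twice returns the original sequence, which is a quick sanity check for any complement table.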
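
The overlapping cut can be sketched in Python as a sliding window. The table calls the third argument of `cut` both an offset and an overlap; this sketch assumes it is the number of bases shared by consecutive pieces (so `cut(foo.gb, 180, 60)` would start a new 180-base piece every 120 bases) and that the last piece may be shorter than `size`. Both points are assumptions, and the function here is a hypothetical stand-in rather than SeqNinja's `cut`.

```python
def cut(seq: str, size: int, overlap: int = 0) -> list:
    """Split seq into pieces of `size` bases; consecutive pieces share `overlap` bases.

    Assumption: the final piece may be shorter than `size`.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    step = size - overlap  # distance between the starts of consecutive pieces
    pieces = []
    for start in range(0, len(seq), step):
        pieces.append(seq[start:start + size])
        if start + size >= len(seq):
            break  # this piece already reaches the end of the sequence
    return pieces


print(cut("ABCDEFGHIJ", 4, 2))  # ['ABCD', 'CDEF', 'EFGH', 'GHIJ']
```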
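
The feature-matching rules for `extract` (type match, optional qualifier match, and `?`/`*` wildcards in the qualifier value) can be sketched in Python as below. The `Feature` class, its 0-based half-open coordinates, and the single `(tag, pattern)` filter are hypothetical simplifications: the sketch ignores strand and handles one feature specification per call rather than a comma-separated list.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Feature:
    # Hypothetical minimal feature model: type, 0-based half-open span, qualifiers.
    type: str
    start: int
    end: int
    qualifiers: dict = field(default_factory=dict)


def wildcard_to_regex(pattern: str):
    """'?' matches exactly one character, '*' zero or more; all else is literal."""
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(".")
        elif ch == "*":
            parts.append(".*")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def extract(seq, features, ftype, qualifier=None):
    """Return sub-sequences for features of type `ftype`.

    qualifier: optional (tag, wildcard_pattern); when given, only features whose
    qualifier value fully matches the pattern are kept.
    """
    matcher = wildcard_to_regex(qualifier[1]) if qualifier else None
    out = []
    for f in features:
        if f.type != ftype:
            continue
        if matcher is not None:
            value = f.qualifiers.get(qualifier[0])
            if value is None or matcher.fullmatch(value) is None:
                continue
        out.append(seq[f.start:f.end])
    return out
```

With features whose gene qualifiers are `thrL`, `thrA`, and `thrAB`, the pattern `thr?` keeps only the first two, mirroring the wildcard example in the table.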
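
The different meaning of “N” in DNA and protein searches can be illustrated with a simple Python matcher. It is a hypothetical stand-in for endpoint-expression searching, assuming uppercase input and a DNA sequence containing only A, C, G, and T; in DNA mode each pattern letter is treated as an IUPAC code, while in protein mode letters match literally.

```python
_IUPAC_DNA = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def find_all(pattern: str, seq: str, kind: str) -> list:
    """Return 0-based start positions of pattern in seq.

    kind == "dna": pattern letters are IUPAC codes (N matches any base).
    kind == "protein": pattern letters are literal residues (N is asparagine).
    """
    hits = []
    for i in range(len(seq) - len(pattern) + 1):
        window = seq[i:i + len(pattern)]
        if kind == "dna":
            ok = all(s in _IUPAC_DNA[p] for p, s in zip(pattern, window))
        else:
            ok = window == pattern
        if ok:
            hits.append(i)
    return hits


print(find_all("NAN", "CATAGA", "dna"))      # [0, 2]
print(find_all("NAN", "CATAGA", "protein"))  # []
```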
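
The effect of circular topology on searching can be sketched in Python by letting the search window wrap past the end of the sequence. This hypothetical `find_exact` function does literal matching only and assumes a pattern no longer than the sequence.

```python
def find_exact(pattern: str, seq: str, circular: bool) -> list:
    """Return 0-based start positions of a literal pattern in seq.

    In a circular sequence a match may start near the end and continue past
    the origin, so the first len(pattern) - 1 bases are appended for the search.
    """
    n, m = len(seq), len(pattern)
    if m == 0 or m > n:
        return []  # assumption: no matches for empty or over-long patterns
    text = seq + seq[:m - 1] if circular else seq
    last_start = n if circular else n - m + 1
    return [i for i in range(last_start) if text[i:i + m] == pattern]


print(find_exact("TAG", "AGCCCCT", circular=False))  # []
print(find_exact("TAG", "AGCCCCT", circular=True))   # [6]
```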
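
The sampling options can be sketched in Python, but the semantics of each argument here are assumptions rather than SeqNinja's definitions: `start` and `stop` are taken as 1-based inclusive positions in the set, `step` keeps every `step`-th record from `start`, `p` is a per-record keep probability given as a number, and `name` is a wildcard pattern that a leading `!` negates. Only the order-preserving output and the `!` negation come from the table; the function and parameter names are hypothetical.

```python
import fnmatch
import random


def sample(records, start=None, stop=None, step=1, p=None, name=None, seed=None):
    """Select (name, residues) records, preserving input order (hypothetical sketch)."""
    rng = random.Random(seed)  # seeded so a given sample can be reproduced
    first = start if start is not None else 1
    last = stop if stop is not None else len(records)
    selected = []
    for pos, (rec_name, residues) in enumerate(records, start=1):
        # Position filter: within [first, last] and on the step grid from `first`.
        if not (first <= pos <= last and (pos - first) % step == 0):
            continue
        if name is not None:
            negated = name.startswith("!")
            pattern = name[1:] if negated else name
            # Skip when the match result equals the negation flag.
            if fnmatch.fnmatchcase(rec_name, pattern) == negated:
                continue
        if p is not None and rng.random() >= p:
            continue  # dropped by the random keep probability
        selected.append((rec_name, residues))
    return selected
```

Note that `fnmatch.fnmatchcase` also treats `[...]` as a character class, which goes beyond the `?` and `*` rules described in the table.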