The tables below list regular expression elements that are useful when writing paired end naming specifications. This is not a complete list of regular expressions, and the definitions given here apply only to their use in SeqMan NGen paired end naming specifications.
Special Characters | |
---|---|
[ ] | Character class used to enclose a list of alternatives. For example: [Aa]bc matches abc and Abc. If the first character is a caret (^), the class matches any character except those on the list. Thus: [^a]bc matches xbc but not abc. |
\ | A switch that makes special characters literal and literal characters special. For example, “\.” matches a literal period, while “\d” matches any digit. |
( ) | Grouping--used to delimit a string comprising a “phrase.” Phrases are necessary in paired end specification so you can match a pair of forward and reverse reads while still distinguishing their orientation. In SeqMan NGen, phrases in parentheses must match for two reads to qualify as a pair; phrases outside the parentheses are used to distinguish members of the same pair. |
\d | Any digit (0-9) |
\D | Any non-digit character. |
\w | Any alphanumeric “word” character (including “_”) |
. | Any character |
\| | Alternate--either the term before “\|” or the term after it
^ | Match at the beginning of the line only. |
$ | Match at the end of the line only. |
Numerical Modifiers | |
* | 0 or more |
+ | 1 or more |
? | 1 or 0 |
{n} | Exactly n |
{n,} | At least n |
{n,m} | At least n but not more than m |
Example Expressions and Their Meanings | |
d | Literally the letter d |
\d | Any digit (0-9) |
\d* | Zero or more digits |
\d+ | One or more digits |
(\d+) | A phrase comprising one or more digits--same as “\d+”, but causes SeqMan NGen to pair read names on the digits inside the phrase even when other characters in the names do not match. |
\. | Literally the period symbol (.) |
. | Any character |
.+ | One or more of any characters |
.* | Zero or more of any characters |
a\|b | a OR b |
ab[i1] | abi or ab1 |
abi$ | Ends with abi |
[\.\d] | A period OR a digit |
[abc] | a OR b OR c |
[abc]+ | One or more characters from the set a, b, c |
.*f | Any number of any characters followed by the letter “f” |
(.*)f | A phrase comprising any number of any characters, followed by the letter “f”--same as “.*f”, but causes SeqMan NGen to pair reads on the phrase in parentheses without including the “f” in the comparison |
(\D+)r(\d+) | One or more non-digit characters followed by “r” followed by one or more digits. |
(\d{2,4})f(\.abi) | Two, three or four digits followed by “f” followed by “.abi” |
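
The grouping rule described above (phrases in parentheses must match for two reads to qualify as a pair, while characters outside the parentheses distinguish the members) can be illustrated with a short sketch. This is not SeqMan NGen's implementation: it uses Python's `re` module, whose syntax may differ in detail from the engine SeqMan NGen uses, and it assumes the whole read name must match the expression. The read names, the `FORWARD` and `REVERSE` patterns, and the `pair_reads` helper are all hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical naming specifications. The parentheses capture the parts of
# the name that both reads of a pair share; the "f" or "r" outside the
# parentheses marks the orientation.
FORWARD = re.compile(r"(.*)f(\.abi)")
REVERSE = re.compile(r"(.*)r(\.abi)")


def pair_reads(names):
    """Group read names whose parenthesized phrases are identical.

    Assumption: the entire name must match the expression (fullmatch).
    """
    candidates = defaultdict(dict)
    for name in names:
        for orientation, pattern in (("forward", FORWARD), ("reverse", REVERSE)):
            match = pattern.fullmatch(name)
            if match:
                # The tuple of captured phrases is the key shared by a pair.
                candidates[match.groups()][orientation] = name
                break
    # Keep only keys for which both a forward and a reverse read were found.
    return {key: reads for key, reads in candidates.items() if len(reads) == 2}


names = ["clone12f.abi", "clone12r.abi", "clone7f.abi"]
print(pair_reads(names))
```

In this sketch, `clone12f.abi` and `clone12r.abi` are paired because both capture the phrases `clone12` and `.abi`, while `clone7f.abi` is left unpaired because no reverse read shares its captured phrases.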