The regular expressions below may be useful when writing paired end naming specifications. This is not a complete list of regular expression syntax, and the definitions given here apply only to SeqMan NGen paired end naming specifications.
Special Characters | |
[ ] |
Character class used to enclose a list of alternatives. For example:
[Aa]bc matches abc and Abc.
If the first character is a caret (^), the class matches any character except those on the list. Thus, [^a]bc matches xbc but not abc. |
\ |
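An escape character that makes special characters literal and gives certain literal characters a special meaning. For example, \. matches a literal period, while \d matches any digit. |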
( ) |
Grouping--used to delimit a string comprising a “phrase.” Phrases are necessary in paired end specification so you can match a pair of forward and reverse reads while still distinguishing their orientation. In SeqMan NGen, phrases in parentheses must match for two reads to qualify as a pair; phrases outside the parentheses are used to distinguish members of the same pair. |
\d |
Any digit (0-9) |
\D |
Any non-digit character |
\w |
Any alphanumeric “word” character (including “_”) |
. |
Any character |
| |
Alternate--either the term before “|” or after “|” |
^ |
Match at the beginning of the line only |
$ |
Match at the end of the line only |
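The special characters above can be tried out in most regular expression engines. The following is a minimal sketch using Python's `re` module, on the assumption that it treats these constructs the same way SeqMan NGen does. The sample strings and the `matches` helper are illustrative only and are not part of SeqMan NGen.

```python
import re

def matches(pattern, text):
    """Return True if the pattern matches anywhere in the text."""
    return re.search(pattern, text) is not None

# Each entry: (pattern, sample text, expected result)
checks = [
    (r"[Aa]bc", "Abc", True),      # character class: A or a
    (r"[^a]bc", "xbc", True),      # negated class: anything but a
    (r"[^a]bc", "abc", False),
    (r"\.", "read.abi", True),     # escaped period is literal
    (r"\.", "readabi", False),
    (r"\d", "read7", True),        # any digit
    (r"\D", "1234", False),        # no non-digit character present
    (r"\w", "_", True),            # word characters include "_"
    (r"f|r", "r", True),           # alternation
    (r"^abi", "read.abi", False),  # must match at the beginning
    (r"abi$", "read.abi", True),   # must match at the end
]

# True for every entry whose actual result equals the expected one
results = [matches(p, t) == expected for p, t, expected in checks]
print(all(results))
```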
Numerical Modifiers | |
* |
0 or more |
+ |
1 or more |
? |
0 or 1 (the preceding item is optional) |
{n} |
Exactly n |
{n,} |
At least n |
{n,m} |
At least n but not more than m |
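Each numerical modifier applies to the item immediately before it. As a rough illustration, the sketch below checks a few quantified patterns against whole strings with Python's `re.fullmatch`. It assumes Python's quantifier behavior matches that of SeqMan NGen, and the `full` helper and sample strings are hypothetical.

```python
import re

def full(pattern, text):
    """Return True if the pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None

# Each entry: (pattern, sample text, expected result)
quantifier_checks = [
    (r"\d*", "", True),           # * allows zero digits
    (r"\d+", "", False),          # + requires at least one digit
    (r"\d+", "2024", True),
    (r"ab?c", "ac", True),        # ? makes the b optional
    (r"ab?c", "abbc", False),     # but allows at most one b
    (r"\d{3}", "123", True),      # exactly three digits
    (r"\d{3}", "12", False),
    (r"\d{2,}", "12345", True),   # at least two digits
    (r"\d{2,4}", "12345", False), # two to four digits only
]

# True for every entry whose actual result equals the expected one
quantifier_results = [full(p, t) == e for p, t, e in quantifier_checks]
print(all(quantifier_results))
```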
Example Expressions and Their Meanings | |
d |
Literally the letter d |
\d |
Any digit (0-9) |
\d* |
Zero or more digits |
\d+ |
One or more digits |
(\d+) |
A phrase comprising one or more digits. It matches the same text as “\d+”, but the parentheses cause SeqMan NGen to pair read names by the digits inside the phrase, even when other characters in the names do not match. |
\. |
Literally the period symbol (.) |
. |
Any character |
.+ |
One or more of any characters |
.* |
Zero or more of any characters |
a|b |
a OR b |
ab[i1] |
abi or ab1 |
abi$ |
Ends with abi |
[\.\d] |
A period OR a digit |
[abc] |
a OR b OR c |
[abc]+ |
One or more characters from the set a, b, c |
.*f |
Any number of any characters followed by the letter “f” |
(.*)f |
A phrase comprising any number of any characters, followed by the letter “f”. It matches the same text as “.*f”, but the parentheses cause SeqMan NGen to match read names on the phrase alone, without the “f”. |
(\D+)r(\d+) |
One or more non-digit characters followed by “r” followed by one or more digits. |
(\d{2,4})f(\.abi) |
Two, three or four digits followed by “f” followed by “.abi” |
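To make the role of parenthesized phrases concrete, the sketch below pairs read names whose phrases agree while the text outside the parentheses ("f" or "r") tells the two members apart. This is only an illustration of the rule described above, not SeqMan NGen's actual implementation. The `pair_reads` function, the reverse pattern, and the read names are hypothetical, and Python's `re` module is assumed to behave like the SeqMan NGen expression engine.

```python
import re

def pair_reads(names, forward_pattern, reverse_pattern):
    """Hypothetical helper: pair names whose parenthesized phrases are identical."""
    forward, reverse = {}, {}
    for name in names:
        f = re.fullmatch(forward_pattern, name)
        r = re.fullmatch(reverse_pattern, name)
        if f:
            # Key each read by the text captured inside the parentheses
            forward[f.groups()] = name
        elif r:
            reverse[r.groups()] = name
    # A pair exists only where both orientations captured the same phrases
    return sorted((forward[k], reverse[k]) for k in forward if k in reverse)

names = ["clone01f.abi", "clone01r.abi", "clone02f.abi", "clone03r.abi"]
pairs = pair_reads(names, r"(.*)f(\.abi)", r"(.*)r(\.abi)")
print(pairs)
```

Here only clone01 has both a forward and a reverse read whose phrases agree, so it is the only pair reported; clone02 and clone03 each lack a partner.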