Example Regular Expressions

The following regular expressions may be useful when writing paired end naming specifications. This is not a complete list of regular expressions, and the definitions given here apply only to their use in SeqMan NGen paired end naming specifications.


Special Characters

[ ]

Character class used to enclose a list of alternatives. For example:


[Aa]bc matches abc and Abc.


If the first character is a caret (^), the class matches any character except those on the list. Thus, [^a]bc matches xbc but not abc.


\

A switch that makes special characters literal and literal characters special

( )

Grouping--used to delimit a string comprising a “phrase.” Phrases are necessary in paired end specification so you can match a pair of forward and reverse reads while still distinguishing their orientation. In SeqMan NGen, phrases in parentheses must match for two reads to qualify as a pair; phrases outside the parentheses are used to distinguish members of the same pair.


\d

Any digit (0-9)

\D

Any non-digit character

\w

Any alphanumeric “word” character (including “_”)

.

Any character

|

Alternate--either the term before “|” or after “|”

^

Match at the beginning of the line only

$

Match at the end of the line only
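
As a rough illustration, the special characters described above can be tried out in Python's re module. Python is an assumption made here for demonstration only; SeqMan NGen's own expression engine may differ in its details, and the sample strings and the helper name found are invented for this sketch.

```python
import re

# Each case: (pattern, sample text, whether a match is expected).
# The sample strings are made up for illustration.
CASES = [
    (r"[Aa]bc", "abc", True),      # character class: a or A
    (r"[Aa]bc", "Abc", True),
    (r"[^a]bc", "xbc", True),      # negated class: anything except a
    (r"[^a]bc", "abc", False),
    (r"\d", "read7", True),        # any digit
    (r"\D", "1234", False),        # any non-digit character
    (r"\w", "_", True),            # word character, including underscore
    (r"a.c", "a-c", True),         # . matches any single character
    (r"a\.c", "a-c", False),       # \. matches only a literal period
    (r"fwd|rev", "rev", True),     # alternation: either term
    (r"^abi", "xabi", False),      # ^ anchors the match at the start
    (r"abi$", "read.abi", True),   # $ anchors the match at the end
]


def found(pattern, text):
    """Return True if the pattern matches anywhere in the text."""
    return re.search(pattern, text) is not None


for pattern, text, expected in CASES:
    assert found(pattern, text) == expected, (pattern, text)
    print(f"{pattern!r} on {text!r}: {found(pattern, text)}")
```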

Numerical Modifiers


*

0 or more

+

1 or more

?

0 or 1

{n}

Exactly n

{n,}

At least n

{n,m}

At least n but not more than m
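
The numerical modifiers can be checked the same way. The sketch below uses Python's re module as a stand-in (an assumption, not a statement about SeqMan NGen's engine) and requires the whole sample string to match, so that the upper and lower limits are visible. The helper name full and the sample strings are hypothetical.

```python
import re


def full(pattern, text):
    """Return True only if the entire text matches the pattern."""
    return re.fullmatch(pattern, text) is not None


# Each case: (pattern, sample text, whether a full match is expected).
CASES = [
    (r"\d*", "", True),            # * : zero or more digits
    (r"\d+", "", False),           # + : at least one digit required
    (r"\d+", "2024", True),
    (r"ab?c", "ac", True),         # ? : the b may be absent
    (r"ab?c", "abbc", False),      #     but may not appear twice
    (r"\d{3}", "123", True),       # {n} : exactly three digits
    (r"\d{3}", "1234", False),
    (r"\d{2,}", "1", False),       # {n,} : at least two digits
    (r"\d{2,}", "12345", True),
    (r"\d{2,4}", "123", True),     # {n,m} : two to four digits
    (r"\d{2,4}", "12345", False),
]

for pattern, text, expected in CASES:
    assert full(pattern, text) == expected, (pattern, text)
    print(f"{pattern!r} on {text!r}: {full(pattern, text)}")
```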

Example Expressions and Their Meanings


d

Literally the letter d

\d

Any digit (0-9)

\d*

Zero or more digits

\d+

One or more digits

(\d+)

A phrase comprising one or more digits--same as “\d+”, but causes SeqMan NGen to match read names on the string inside the phrase, even when other characters in the names do not match.

\.

Literally the period symbol (.)

.

Any character

.+

One or more of any characters

.*

Zero or more of any characters


a|b

a OR b

ab[i1]

abi or ab1

abi$

Ends with abi

[.\d]

A period OR a digit

[abc]

a OR b OR c

[abc]+

One or more characters from the set a, b, c


.*f

Any number of any characters followed by the letter “f”

(.*)f

A phrase comprising any number of any characters, followed by the letter “f”--same as “.*f”, but causes SeqMan NGen to match the phrase in parentheses without matching the “f” in a read name

\D+r\d+

One or more non-digit characters followed by “r” followed by one or more digits.

\d{2,4}f\.abi

Two, three or four digits followed by “f” followed by “.abi”.
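
To make the role of phrases concrete, here is a minimal sketch of pairing reads by the text captured in parentheses. It is an illustrative model written with Python's re module, not SeqMan NGen's actual implementation. It assumes a forward expression that wraps “.*f” in a phrase and a corresponding reverse expression ending in “r”. The read names and the function name pair_reads are hypothetical.

```python
import re

# Assumed naming scheme for this sketch: forward reads end in "f",
# reverse reads end in "r", and everything before that is the shared phrase.
FORWARD = re.compile(r"(.*)f")
REVERSE = re.compile(r"(.*)r")


def pair_reads(names):
    """Pair read names whose captured phrases are identical."""
    forward = {}
    reverse = {}
    for name in names:
        match = FORWARD.fullmatch(name)
        if match:
            forward[match.group(1)] = name
            continue
        match = REVERSE.fullmatch(name)
        if match:
            reverse[match.group(1)] = name
    # A pair needs the same phrase on both the forward and reverse side.
    shared = forward.keys() & reverse.keys()
    return sorted((forward[key], reverse[key]) for key in shared)


names = ["clone12f", "clone12r", "clone7f", "clone9r"]
print(pair_reads(names))  # [('clone12f', 'clone12r')]
```

In this model the text outside the parentheses (the trailing “f” or “r”) tells the two members of a pair apart, while the captured phrase decides which reads belong together. Reads such as clone7f and clone9r stay unpaired because no partner shares their phrase.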