Example Regular Expressions

The following regular expressions may be useful when writing paired end naming specifications. Please note that this is not a complete list of regular expressions, and the definitions of the terms used are limited to their application to SeqMan NGen paired end naming specifications.

Special Characters

[ ]

Character class used to enclose a list of alternative characters, any one of which may match. For example, [Aa]bc matches abc and Abc.

If the first character in the class is a caret (^), the class matches any character except those on the list. Thus, [^a]bc matches xbc but not abc.

\

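Escape character: makes a special character literal (for example, \. matches a period) and gives certain literal characters a special meaning (for example, \d matches any digit)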

( )

Grouping--used to delimit a string comprising a “phrase.” Phrases are necessary in paired end specification so you can match a pair of forward and reverse reads while still distinguishing their orientation. In SeqMan NGen, phrases in parentheses must match for two reads to qualify as a pair; phrases outside the parentheses are used to distinguish members of the same pair.

\d

Any digit (0-9)

\D

Any non-digit character

\w

Any alphanumeric “word” character (including “_”)

.

Any character

|

Alternation--either the term before “|” or the term after “|”

^

Match at the beginning of the line only

$

Match at the end of the line only
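The special characters above can be tried out in a short script. This is a minimal sketch that assumes Python's built-in re module, which supports all of the constructs listed here; SeqMan NGen's own matching may differ in details. The helper name matches and the sample read names are hypothetical.

```python
import re

def matches(pattern: str, text: str) -> bool:
    """Return True if the pattern is found anywhere in the text."""
    return re.search(pattern, text) is not None

# Character class and its negated form
print(matches(r"[Aa]bc", "Abc"))      # True
print(matches(r"[^a]bc", "xbc"))      # True
print(matches(r"[^a]bc", "abc"))      # False

# Backslash: "\." is a literal period, "\d" is any digit
print(matches(r"\.", "abi"))          # False (no period present)
print(matches(r"\d", "read7"))        # True

# Anchors and alternation
print(matches(r"^read", "myread"))    # False (not at the beginning)
print(matches(r"abi$", "x.abi"))      # True
print(matches(r"fwd|rev", "s1_rev"))  # True
```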

Numerical Modifiers

*

0 or more

+

1 or more

?

0 or 1 (the preceding item is optional)

{n}

Exactly n

{n,}

At least n

{n,m}

At least n but not more than m
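The numerical modifiers apply to the item immediately before them. The following sketch illustrates each one, again assuming Python's re module rather than SeqMan NGen itself. The helper name full is hypothetical; it tests whether the whole string matches the pattern.

```python
import re

def full(pattern: str, text: str) -> bool:
    """Return True if the entire text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

print(full(r"\d*", ""))           # True:  zero digits is allowed
print(full(r"\d+", ""))           # False: at least one digit is required
print(full(r"ab?c", "ac"))        # True:  the "b" is optional
print(full(r"\d{3}", "12"))       # False: exactly three digits required
print(full(r"\d{2,}", "12345"))   # True:  at least two digits
print(full(r"\d{2,4}", "12345"))  # False: more than four digits
```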

Example Expressions and Their Meanings

d

Literally the letter d

\d

Any digit (0-9)

\d*

Zero or more digits

\d+

One or more digits

(\d+)

A phrase comprising one or more digits--matches the same text as “\d+”, but causes SeqMan NGen to pair read names based on the string inside the phrase, even when other characters in the names do not match.

\.

Literally the period symbol (.)

.

Any character

.+

One or more of any character

.*

Zero or more of any character

a|b

a OR b

ab[i1]

abi or ab1

abi$

Ends with abi

[\.\d]

A period OR a digit

[abc]

a OR b OR c

[abc]+

One or more characters from the set a, b, c

.*f

Any number of any characters followed by the letter “f”

(.*)f

A phrase comprising any number of any characters, followed by the letter “f”--same as “.*f”, but causes SeqMan NGen to match the phrase in parentheses without matching the “f” in a read name

(\D+)r(\d+)

One or more non-digit characters followed by “r” followed by one or more digits.

(\d{2,4})f(\.abi)

Two, three or four digits followed by “f” followed by “.abi”.
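To illustrate how phrases in parentheses identify a pair while the text outside them distinguishes its members, the sketch below emulates that idea in Python. It is an illustration under stated assumptions, not SeqMan NGen's actual implementation: the reverse pattern, the function names pair_key and is_pair, and the sample read names are all hypothetical.

```python
import re
from typing import Optional, Pattern, Tuple

# Forward pattern from the last example above; the reverse pattern is an
# assumed counterpart that differs only in the character outside the phrases.
FORWARD = re.compile(r"(\d{2,4})f(\.abi)")
REVERSE = re.compile(r"(\d{2,4})r(\.abi)")

def pair_key(pattern: Pattern[str], name: str) -> Optional[Tuple[str, ...]]:
    """Return the text captured by each phrase, or None if the name does not match."""
    m = pattern.search(name)
    return m.groups() if m else None

def is_pair(fwd_name: str, rev_name: str) -> bool:
    """Two reads pair when both names match and their phrases are identical."""
    fwd_key = pair_key(FORWARD, fwd_name)
    rev_key = pair_key(REVERSE, rev_name)
    return fwd_key is not None and fwd_key == rev_key

print(is_pair("0123f.abi", "0123r.abi"))  # True:  phrases agree
print(is_pair("0123f.abi", "0456r.abi"))  # False: digit phrases differ
print(pair_key(FORWARD, "7f.abi"))        # None:  fewer than two digits
```

Here the “f” and “r” sit outside the parentheses, so they do not take part in the comparison and serve only to tell the forward read from the reverse read.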