Here, four digits identify the sequence, followed by “f” for forward or “r” for reverse, followed by “.abi”. Two reads whose four digits match form a pair:
Forward Name Examples | Corresponding Reverse Names
0001f.abi | 0001r.abi
0124f.abi | 0124r.abi
9999f.abi | 9999r.abi
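The pairing rule in the table can be sketched in a few lines of Python. This is only an illustration of the naming convention, not how the software itself pairs reads; the function name `group_pairs` and the sample file list are hypothetical.

```python
from collections import defaultdict

def group_pairs(names):
    """Group names of the form NNNNf.abi / NNNNr.abi by their four-digit ID."""
    reads = defaultdict(dict)
    for name in names:
        stem, _, ext = name.rpartition(".")
        # Expect exactly four digits plus one orientation letter before ".abi".
        if ext != "abi" or len(stem) != 5:
            continue
        digits, orientation = stem[:4], stem[4]
        if digits.isascii() and digits.isdigit() and orientation in ("f", "r"):
            reads[digits][orientation] = name
    # Keep only IDs for which both a forward and a reverse read were seen.
    return {d: (o["f"], o["r"]) for d, o in reads.items() if "f" in o and "r" in o}

# Hypothetical file list: 9999f.abi has no reverse partner, so it is left out.
pairs = group_pairs(["0001f.abi", "0124r.abi", "0001r.abi", "0124f.abi", "9999f.abi"])
```

Reads without a partner are dropped from the result rather than reported as an error, which is a design choice of this sketch.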
One expression used for this naming system is:
Forward Name | Reverse Name
(.*)f.* | (.*)r.*
This means that two reads are considered a pair if the part of the name preceding the “f” matches the part preceding the “r”. It works as long as the naming system is rigorously adhered to, and it works whether the part of the name before the “f” or “r” consists of digits or other characters. However, this expression is highly permissive: it treats the two sequence reads named fox.abi and rabbit.abi as a pair, because the string preceding the “f” or “r” in these names matches perfectly (in both cases it is empty). Instead, use:
Forward Name | Reverse Name
(\d{4})f\.abi | (\d{4})r\.abi
(Recall that “\.” is a literal period.) This expression defines a forward/reverse pair as two reads whose names begin with the same four digits, with one ending in “f.abi” and the other in “r.abi”. Because four digits must precede the “f” or “r”, this expression does not consider fox.abi and rabbit.abi to be a pair.
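The difference between the permissive and the stricter expression can be checked with a short Python sketch. It assumes that each expression must match the whole read name and that two reads pair when the text captured by the parentheses is identical. That is one plausible reading of the rule described here, not a statement of how the software evaluates the expressions. The helper `is_pair` is hypothetical.

```python
import re

def is_pair(forward_name, reverse_name, forward_expr, reverse_expr):
    """True if both names match in full and their first capture groups are equal."""
    f = re.fullmatch(forward_expr, forward_name)
    r = re.fullmatch(reverse_expr, reverse_name)
    return bool(f and r and f.group(1) == r.group(1))

PERMISSIVE = (r"(.*)f.*", r"(.*)r.*")
STRICT = (r"(\d{4})f\.abi", r"(\d{4})r\.abi")

# The permissive expression accepts the real pair, but also fox/rabbit,
# because both capture groups are the empty string.
permissive_real = is_pair("0124f.abi", "0124r.abi", *PERMISSIVE)
permissive_bogus = is_pair("fox.abi", "rabbit.abi", *PERMISSIVE)

# The strict expression accepts only four digits followed by f.abi or r.abi.
strict_real = is_pair("0124f.abi", "0124r.abi", *STRICT)
strict_bogus = is_pair("fox.abi", "rabbit.abi", *STRICT)
strict_mismatch = is_pair("0001f.abi", "0124r.abi", *STRICT)
```

Under these assumptions, `permissive_bogus` is true while `strict_bogus` and `strict_mismatch` are false.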
However, you may need a more flexible system. For example, the above expression requires that every paired sequence name in the project have exactly four digits preceding the “f” or “r”, followed by “.abi”.