Convention 1A

Here, four digits specify the sequence name, followed by “f” for forward or “r” for reverse, followed by the extension “.abi”. Two reads whose four digits match form a pair:

 

Forward Name Examples    Corresponding Reverse Names
0001f.abi                0001r.abi
0124f.abi                0124r.abi
9999f.abi                9999r.abi

 

One expression used for this naming system is:

 

Forward Name    Reverse Name
(.*)f.*         (.*)r.*

 

This simply means that two reads are considered a pair if the parts of their names preceding the “f” and the “r” match. This works if your naming system is rigorously adhered to, and it works whether the part of the name before the “f” or “r” consists of digits or other characters. However, this expression is highly permissive: it treats the two reads named fox.abi and rabbit.abi as a pair, because the strings preceding the “f” and the “r” in these names match perfectly (in both cases, the string is empty). Instead, use:

 

Forward Name     Reverse Name
(\d{4})f\.abi    (\d{4})r\.abi

 

(Recall that “\.” is a literal period.) This system defines a forward/reverse pair as two reads whose names begin with the same four digits, with one name ending in “f.abi” and the other in “r.abi”. Because exactly four digits must precede the “f” or “r”, this expression does not consider fox.abi and rabbit.abi to be a pair.
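
The difference between the two expressions can be checked outside the application. The following is a minimal sketch in Python, not the application's own matching code. It assumes that an expression must match the entire read name (Python's re.fullmatch) and that two reads pair when their captured groups are equal; the helper names pair_key and is_pair are hypothetical and exist only for this illustration.

```python
import re

def pair_key(name, pattern):
    """Return the captured group if the pattern matches the whole name, else None."""
    match = re.fullmatch(pattern, name)
    return match.group(1) if match else None

def is_pair(forward, reverse, forward_pattern, reverse_pattern):
    """Treat two reads as a pair when both names match and their captured groups are equal."""
    forward_key = pair_key(forward, forward_pattern)
    reverse_key = pair_key(reverse, reverse_pattern)
    return forward_key is not None and forward_key == reverse_key

# (forward expression, reverse expression), as given in the text.
PERMISSIVE = (r"(.*)f.*", r"(.*)r.*")
STRICT = (r"(\d{4})f\.abi", r"(\d{4})r\.abi")

print(is_pair("0001f.abi", "0001r.abi", *PERMISSIVE))  # True
print(is_pair("fox.abi", "rabbit.abi", *PERMISSIVE))   # True: both captured groups are empty
print(is_pair("0001f.abi", "0001r.abi", *STRICT))      # True
print(is_pair("fox.abi", "rabbit.abi", *STRICT))       # False: no four digits before f or r
print(is_pair("0001f.abi", "0124r.abi", *STRICT))      # False: the digits differ
```

Under these assumptions, the permissive expressions accept fox.abi and rabbit.abi because an empty captured group still counts as a match, while the strict expressions reject any name that lacks four leading digits.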

 

However, you may need a more flexible system. For example, the above system requires that all paired sequence names in the project have exactly four digits preceding the “f” or “r”, followed by “.abi”.