Convention 1C

Imagine a variant of Convention 1B in which you obtained some data files from a collaborator midway through your project. All of your collaborator’s files have the “.scf” extension but otherwise follow your naming convention. You are concerned that some of your collaborator’s sequence names may coincidentally match some of yours, which end in “.abi”, even though they are not the same sequences. In this case, the following expressions could be used:

 

Forward Name        Reverse Name
(\d+)f(\..*)        (\d+)r(\..*)

 

Here the extension is a period followed by anything (or nothing). Because both the part of the expression in front of the “f” or “r” and the part following it are in parentheses, the system requires that both the first and second parenthesized phrases match for two reads to be considered a pair. For example, the system “knows” that the sequences named 123f.abi and 123r.scf do not form a pair of reads.
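
The pairing rule described above can be illustrated with a short Python sketch. This is not the system's own implementation: the function name is_pair is hypothetical, and the sketch assumes that each expression must match the entire sequence name and that two reads pair only when every parenthesized phrase captures identical text.

```python
import re

# The forward and reverse expressions given for Convention 1C.
FORWARD = re.compile(r"(\d+)f(\..*)")
REVERSE = re.compile(r"(\d+)r(\..*)")


def is_pair(forward_name, reverse_name):
    """Hypothetical helper: True when both names match and all captured phrases agree."""
    # Assumption: the expression must match the whole name, not just part of it.
    fwd = FORWARD.fullmatch(forward_name)
    rev = REVERSE.fullmatch(reverse_name)
    if fwd is None or rev is None:
        return False
    # Compare the first phrase (the digits) and the second phrase (the extension).
    return fwd.groups() == rev.groups()


print(is_pair("123f.abi", "123r.abi"))  # True: same number, same extension
print(is_pair("123f.abi", "123r.scf"))  # False: the extensions differ
print(is_pair("123f.scf", "123r.scf"))  # True: both reads come from the collaborator
```

Under these assumptions, 123f.abi pairs only with 123r.abi, so a collaborator's 123r.scf is never mistaken for its partner.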