Use Paired End Data

Note: The following information does not apply to the normal templated workflow.

 

Paired-end reads are typically stored in two files, with the forward reads in one file and the reverse reads in the other. SeqMan NGen assumes that the two reads of a pair come from opposite ends of the same DNA fragment and were sequenced from the ends of the fragment inwards.

 

To add paired reads, go to the Input Sequence Files and Define Experiments or Individual Replicates dialog, check the Paired-end data box, and add your read files to the lower pane.

 

To enable SeqMan NGen to identify pairs, the sequence naming convention must systematically distinguish the two reads of a pair while indicating which reads belong together. Forward and reverse sequences must have identical names except for the unique portion that indicates the direction. Expressions for these naming conventions are written using a subset of regular expressions that draws on elements of the Grep language. The following rules apply:

 

      The two parallel files must use a standard naming convention (e.g., s_7_1_sequence and s_7_2_sequence).

 

      “Forward” and “reverse” reads must be in exactly the same order in the two files.

 

      Both forward and reverse reads must be present for every pair, including pairs where one of the reads failed or is of very low quality.
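
The ordering rules above can be checked before loading the files. The following Python sketch is not part of SeqMan NGen. It assumes the read names have already been extracted from the two files into lists, and that the two reads of a pair differ only by a trailing /1 or /2, which is a hypothetical suffix style chosen for illustration. The function names are also hypothetical.

```python
def strip_direction(read_name):
    """Drop a trailing /1 or /2 (an assumed suffix style) from a read name."""
    if read_name.endswith(("/1", "/2")):
        return read_name[:-2]
    return read_name


def check_parallel(forward_names, reverse_names):
    """Return a list of problems found; an empty list means the files agree."""
    problems = []
    if len(forward_names) != len(reverse_names):
        problems.append(
            f"read counts differ: {len(forward_names)} vs {len(reverse_names)}"
        )
    # zip stops at the shorter list, so only positions present in both are compared.
    for index, (fwd, rev) in enumerate(zip(forward_names, reverse_names)):
        if strip_direction(fwd) != strip_direction(rev):
            problems.append(f"position {index}: {fwd} does not pair with {rev}")
    return problems


# Same order in both lists: no problems reported.
ok = check_parallel(["read1/1", "read2/1"], ["read1/2", "read2/2"])
# Reverse reads listed in a different order: every position is flagged.
bad = check_parallel(["read1/1", "read2/1"], ["read2/2", "read1/2"])
print(ok, bad)
```

A check like this reports a missing read as a count mismatch, which is why a failed or low-quality read still needs a placeholder entry in its file.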

 

As an example, suppose forward and reverse Sanger pair files are named 01f.abi and 01r.abi. Here, “01” indicates that the files are members of the same pair, and the “f” or “r” at the end of each sequence name indicates the orientation.

 

In Grep, the naming convention would be written as follows:

 

      Forward convention: (.*)f\..*$

 

      Reverse convention: (.*)r\..*$
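
To see how the two conventions above pick out the shared part of each name, the sketch below applies them with Python's re module. Python is only a stand-in here: SeqMan NGen accepts a subset of regular expressions, and its matching behavior may differ in details. The function name pair_files and the sample file names beyond 01f.abi and 01r.abi are hypothetical.

```python
import re

# Patterns copied from the conventions above. The parenthesized group
# captures the part of the name shared by both members of a pair.
FORWARD_PATTERN = re.compile(r"(.*)f\..*$")
REVERSE_PATTERN = re.compile(r"(.*)r\..*$")


def pair_files(names):
    """Group file names into (forward, reverse) pairs by their shared prefix."""
    forward, reverse = {}, {}
    for name in names:
        fwd = FORWARD_PATTERN.match(name)
        rev = REVERSE_PATTERN.match(name)
        if fwd:
            forward[fwd.group(1)] = name
        elif rev:
            reverse[rev.group(1)] = name
    # Keep only prefixes for which both directions were found.
    return {key: (forward[key], reverse[key])
            for key in forward if key in reverse}


pairs = pair_files(["01f.abi", "01r.abi", "02f.abi", "02r.abi", "03f.abi"])
print(pairs)
```

In this sketch, 03f.abi is left out of the result because no file matches the reverse convention with the same captured prefix.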

 

Note: For more information on Grep name patterns, see Example Regular Expressions.

 

SeqMan NGen considers two read pairs to be clonal when their forward reads start at the same position and their reverse reads also start at the same position. In these cases, the reads with the highest scores are retained, while the other reads are ignored.
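
The general idea of clonal filtering can be sketched in Python as follows. This is not SeqMan NGen's implementation, and the text does not say how scores are calculated. The sketch assumes each pair is a dictionary with a forward start, a reverse start, and a single numeric score, and all field names are hypothetical.

```python
def remove_clonal_pairs(pairs):
    """Keep one pair per (forward_start, reverse_start): the highest-scoring one."""
    best = {}
    for pair in pairs:
        key = (pair["forward_start"], pair["reverse_start"])
        # Replace the stored pair only when the new one scores strictly higher.
        if key not in best or pair["score"] > best[key]["score"]:
            best[key] = pair
    return list(best.values())


example_pairs = [
    {"name": "a", "forward_start": 100, "reverse_start": 350, "score": 38},
    {"name": "b", "forward_start": 100, "reverse_start": 350, "score": 41},
    {"name": "c", "forward_start": 120, "reverse_start": 350, "score": 30},
]
kept = remove_clonal_pairs(example_pairs)
print([pair["name"] for pair in kept])
```

Pairs a and b share both start positions, so only the higher-scoring b is kept. Pair c differs in its forward start and is kept as a separate fragment.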

 

See the links below for read technology-specific information about using paired-end data.