Illumina Pairs

Paired-end reads are typically stored in two files, or in a small number of files if they come from multiple runs or lanes. Mates are identified by a naming convention in each read's header (comment) line in the FASTA or FASTQ file.

 

For SNG assemblies with paired-end reads, SeqMan NGen automatically adds the following information to the script:

 

setPairSpecifier pairs:
{ {
forward: "(.*)/1"
reverse: "(.*)/2"
min: 0
max: 750
key: Illumina
} }
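As a rough illustration of what the forward and reverse specifiers do, the Python sketch below applies the two patterns from this script to a few read names and groups reads that share the captured base name. It assumes that a specifier must match the whole read name and that its first capture group is the key shared by both mates; the helper name `pair_by_specifier` and the read names are hypothetical, and the min, max, and key settings are not modeled.

```python
import re

# Forward and reverse specifiers copied from the setPairSpecifier script.
FORWARD = re.compile(r"(.*)/1")
REVERSE = re.compile(r"(.*)/2")


def pair_by_specifier(read_names):
    """Hypothetical helper: group read names into (forward, reverse) pairs.

    Assumes each specifier must match the whole name and that capture
    group 1 is the key shared by both mates.
    """
    forward, reverse = {}, {}
    unmatched = []
    for name in read_names:
        if m := FORWARD.fullmatch(name):
            forward[m.group(1)] = name
        elif m := REVERSE.fullmatch(name):
            reverse[m.group(1)] = name
        else:
            unmatched.append(name)
    # A pair exists only when both mates produced the same key.
    pairs = [(forward[k], reverse[k]) for k in forward if k in reverse]
    return pairs, unmatched


# Made-up read names for illustration.
reads = ["HWI-1:1:101/1", "HWI-1:1:101/2", "HWI-1:1:202/1", "orphan_read"]
pairs, unmatched = pair_by_specifier(reads)
print(pairs)      # [('HWI-1:1:101/1', 'HWI-1:1:101/2')]
print(unmatched)  # ['orphan_read']
```

In this sketch, HWI-1:1:202/1 matches the forward specifier but has no mate, so it ends up in neither list.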

 

If a read's name does not match one of the pair specifiers, or if the forward and reverse specifiers are empty strings (""), SNG attempts to match reads using the whole sequence name. If exactly two reads have the same name, they are treated as a pair.
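The whole-name fallback can be sketched in a few lines of Python. This is an illustration, not SeqMan NGen's code: the helper name `pair_by_whole_name` is hypothetical, and treating names that occur three or more times as unpaired is an assumption based on the "exactly two" wording. Because both mates share the same name, the sketch reports pairs as positions in the input list.

```python
from collections import defaultdict


def pair_by_whole_name(read_names):
    """Hypothetical helper: pair reads that share an identical full name.

    A name that occurs exactly twice yields one pair of input positions;
    names seen once, or more than twice, stay unpaired (an assumption).
    """
    groups = defaultdict(list)
    for index, name in enumerate(read_names):
        groups[name].append(index)
    pairs = [tuple(idx) for idx in groups.values() if len(idx) == 2]
    unpaired = [i for idx in groups.values() if len(idx) != 2 for i in idx]
    return pairs, unpaired


names = ["readA", "readB", "readA", "readC", "readC", "readC"]
pairs, unpaired = pair_by_whole_name(names)
print(pairs)     # [(0, 2)]
print(unpaired)  # [1, 3, 4, 5]
```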

 

For XNG assemblies, SeqMan NGen adds the following information:

 

{
isPair: true
file: "****"
SeqTech: "Illumina"
minDist: 0
maxDist: 750
}

 

For XNG assemblies with paired-end reads, SeqMan NGen recognizes pairs by their file names. The following examples show some of the filename formats that SeqMan NGen supports for XNG pairs. Within each pair, the two filenames differ only in the region that identifies the forward or reverse reads (for example, R1/R2, _1/_2, or forward_pe1/reverse_pe2):

 

R_2011_11_21_11_06_08_user_C29-100_PE_DH10B_11_Auto_C29-100_PE_DH10B_11_4120_reverse_pe2.fastq
R_2011_11_21_11_06_08_user_C29-100_PE_DH10B_11_Auto_C29-100_PE_DH10B_11_4120_forward_pe1.fastq

Strain1234_L7_R1_ATCACG_Index1.fastq
Strain1234_L7_R2_ATCACG_Index1.fastq

K12-1-B_TGACCA_L006_R1.fastq
K12-1-B_TGACCA_L006_R2.fastq

GBBC920_GGCTAC_L008_R1.filt.50bp.fastq
GBBC920_GGCTAC_L008_R2.filt.50bp.fastq

tiny_1.txt
tiny_2.txt

tiny_1_sequence.txt
tiny_2_sequence.txt

tiny1._qseq
tiny2._qseq

s_1_1_sequence.txt
s_1_2_sequence.txt

C29-129_forward_pe1.fastq
C29-129_forward_pe2.fastq

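SeqMan NGen uses the following regular expressions to match paired file names. In each pair, the first line is the forward pattern and the second is the reverse pattern: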

 

"(?'name'.*?)_R1_(?'ext'.*)\.fastq"
"(?'name'.*?)_R2_(?'ext'.*)\.fastq"

"(?'name'.*?)_R1\.(?'ext'.*)\.fastq"
"(?'name'.*?)_R2\.(?'ext'.*)\.fastq"

"(?'name'.*?)_forward_pe1(?'ext_p'\.fastq)"
"(?'name'.*?)_reverse_pe2(?'ext_p'\.fastq)"

"(?'name'.*?)_{0,1}1\.fastq"
"(?'name'.*?)_{0,1}2\.fastq"

"(?'name'.*?)1\.fastq"
"(?'name'.*?)2\.fastq"

"(?'name'.*?)1_sequence\.txt"
"(?'name'.*?)2_sequence\.txt"

"(?'name'.*?)1\.txt"
"(?'name'.*?)2\.txt"

"(?'name'.*?)1\._qseq"
"(?'name'.*?)2\._qseq"

"(?'name'.*?)1\.fq"
"(?'name'.*?)2\.fq"
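To see how these pattern pairs could group real file names, the Python sketch below applies them to some of the example filenames. The matching strategy is an assumption, not SeqMan NGen's documented behavior: pattern pairs are tried in the listed order, each pattern must match the entire filename, a forward file and a reverse file are paired when all of their captured groups agree, and paired files are removed before the next pattern pair is tried. The patterns use .NET-style named groups `(?'name'...)`; Python's `re` module spells named groups as `(?P<name>...)`, so the hypothetical helper `dotnet_to_python` rewrites them. The helper `pair_files` is also hypothetical.

```python
import re

# Forward/reverse filename patterns from the list above, in the same order.
DOTNET_PATTERN_PAIRS = [
    (r"(?'name'.*?)_R1_(?'ext'.*)\.fastq", r"(?'name'.*?)_R2_(?'ext'.*)\.fastq"),
    (r"(?'name'.*?)_R1\.(?'ext'.*)\.fastq", r"(?'name'.*?)_R2\.(?'ext'.*)\.fastq"),
    (r"(?'name'.*?)_forward_pe1(?'ext_p'\.fastq)",
     r"(?'name'.*?)_reverse_pe2(?'ext_p'\.fastq)"),
    (r"(?'name'.*?)_{0,1}1\.fastq", r"(?'name'.*?)_{0,1}2\.fastq"),
    (r"(?'name'.*?)1\.fastq", r"(?'name'.*?)2\.fastq"),
    (r"(?'name'.*?)1_sequence\.txt", r"(?'name'.*?)2_sequence\.txt"),
    (r"(?'name'.*?)1\.txt", r"(?'name'.*?)2\.txt"),
    (r"(?'name'.*?)1\._qseq", r"(?'name'.*?)2\._qseq"),
    (r"(?'name'.*?)1\.fq", r"(?'name'.*?)2\.fq"),
]


def dotnet_to_python(pattern):
    """Rewrite .NET-style named groups (?'x'...) as Python's (?P<x>...)."""
    return re.sub(r"\(\?'(\w+)'", r"(?P<\1>", pattern)


def pair_files(filenames):
    """Hypothetical helper: return ((forward, reverse) pairs, unpaired files)."""
    remaining = list(filenames)
    pairs = []
    for fwd_src, rev_src in DOTNET_PATTERN_PAIRS:
        fwd_re = re.compile(dotnet_to_python(fwd_src))
        rev_re = re.compile(dotnet_to_python(rev_src))
        forward, reverse = {}, {}
        for name in remaining:
            # The captured groups (name, ext, ...) form the pairing key.
            if m := fwd_re.fullmatch(name):
                forward[m.groups()] = name
            elif m := rev_re.fullmatch(name):
                reverse[m.groups()] = name
        for key, fwd_name in forward.items():
            if key in reverse:
                pairs.append((fwd_name, reverse[key]))
                remaining.remove(fwd_name)
                remaining.remove(reverse[key])
    return pairs, remaining


files = [
    "Strain1234_L7_R1_ATCACG_Index1.fastq",
    "Strain1234_L7_R2_ATCACG_Index1.fastq",
    "GBBC920_GGCTAC_L008_R1.filt.50bp.fastq",
    "GBBC920_GGCTAC_L008_R2.filt.50bp.fastq",
    "C29-129_forward_pe1.fastq",
    "C29-129_forward_pe2.fastq",
    "tiny_1_sequence.txt",
    "unrelated.fastq",
]
pairs, unpaired = pair_files(files)
for fwd, rev in pairs:
    print(fwd, "<->", rev)
print(unpaired)  # ['tiny_1_sequence.txt', 'unrelated.fastq']
```

Under these assumptions, the C29-129 files are not paired by the forward_pe1/reverse_pe2 patterns (the second file says forward_pe2), but they are paired later by the generic `_{0,1}1\.fastq` / `_{0,1}2\.fastq` patterns, which both capture the name C29-129_forward_pe.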

 

Use the following script command to add support for a new filename format. The command must be run before assembly; the pattern is then used for all subsequent assembleTemplate commands in that run of XNG.

 

pairFilePattern
forward: "(?'name'.*?)_R1_(?'ext'.*)\.fastq"
reverse: "(?'name'.*?)_R2_(?'ext'.*)\.fastq"
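Before adding a pattern with pairFilePattern, it can help to check offline that the forward and reverse patterns capture identical groups from both files of a pair. The Python sketch below does this for a hypothetical naming scheme (sampleA.fwd.fastq / sampleA.rev.fastq) that none of the built-in patterns listed earlier match. The helper names `dotnet_to_python` and `check_pattern_pair`, the patterns, and the filenames are all made up for illustration, and whole-filename (full) matching is assumed.

```python
import re


def dotnet_to_python(pattern):
    """Rewrite .NET-style named groups (?'x'...) as Python's (?P<x>...)."""
    return re.sub(r"\(\?'(\w+)'", r"(?P<\1>", pattern)


def check_pattern_pair(forward_pattern, reverse_pattern, forward_file, reverse_file):
    """Hypothetical helper: True if both files match and capture identical groups."""
    fwd = re.fullmatch(dotnet_to_python(forward_pattern), forward_file)
    rev = re.fullmatch(dotnet_to_python(reverse_pattern), reverse_file)
    return fwd is not None and rev is not None and fwd.groupdict() == rev.groupdict()


# Hypothetical naming scheme, written the way it would appear in pairFilePattern.
forward_pattern = r"(?'name'.*?)\.fwd\.fastq"
reverse_pattern = r"(?'name'.*?)\.rev\.fastq"

print(check_pattern_pair(forward_pattern, reverse_pattern,
                         "sampleA.fwd.fastq", "sampleA.rev.fastq"))  # True
print(check_pattern_pair(forward_pattern, reverse_pattern,
                         "sampleA.fwd.fastq", "sampleB.rev.fastq"))  # False
```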