454 Data

Note: We recommend using SeqMan NGen to assemble all next-gen data, including 454 data.

 

SeqMan Pro can assemble 454 data supplied either as .sff files or as a .fas file with its associated .qual file.

 

      For .fas files, SeqMan Pro automatically checks the same folder for a .qual file with the same name. If one is found, SeqMan Pro reads its quality values into the project and uses them for trimming.

 

      For .sff files, SeqMan Pro uses the quality scores within the .sff file for trimming.
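
Before adding a .fas file, it can be useful to confirm that its companion .qual file sits where SeqMan Pro will look for it. The following is a minimal sketch in Python, not part of SeqMan Pro. The function name find_qual_file is hypothetical, and the check assumes "same name" means the same base file name with a .qual extension.

```python
from pathlib import Path
from typing import Optional


def find_qual_file(fas_path: str) -> Optional[Path]:
    """Return the .qual file that shares the .fas file's name and folder.

    Assumption: the companion file differs only in its extension,
    for example reads.fas and reads.qual. Returns None if no such file exists.
    """
    candidate = Path(fas_path).with_suffix(".qual")
    return candidate if candidate.is_file() else None
```

If the function returns None, no matching .qual file is present under this assumption, so the .fas file may be assembled without quality values.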

 

For optimal handling of quality scores, use .sff files.

 

Note: SeqMan Pro does not handle 454 paired-end data in either .fas or .sff format.

 

SeqMan Pro will display the flowgram and corresponding quality scores for each sequence within an .sff file. For more information, see Viewing Flowgram Data.

 

For additional information, see Medium-Sized Data Sets.

 

 

If using 454 data:

 

      Enter the .sff file. The .sff format includes the base calls, quality scores and flowgram data. In the Unassembled Sequences Window, the .sff file will be listed as a ‘Flow’ type.

 

      Set the “Minimum Sequence Length” Assembling parameter to a length reasonable for your data set. The default of 100 is often too long for 454 data. A value of 70 is typically a good place to start.

 

      Set the “Maximum Mismatch End Bases” Assembling parameter to 0. This parameter is the number of bases from each end of a sequence within which mismatches are not counted when calculating pairwise similarity.

 

      Set the “Match Spacing” Assembling parameter to 15. The default of 150 is often too large for 454 data.

 

      Set the “Maximum Expected Coverage” parameter, found under Project > Parameters > Strategy Viewing, to the maximum depth that you expect in your assembly. After assembly, areas exceeding this value will be indicated in the Strategy View by thick, red areas in the Coverage Threshold graph.

 

      SeqMan Pro does not handle 454 paired-end data sets.

 

      Keep in mind that 454 technology has difficulty reading long homopolymeric regions accurately.
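
To judge whether a “Minimum Sequence Length” such as 70 or 100 suits a particular data set, it may help to see what fraction of reads would meet each candidate value. The following is a minimal sketch in Python, not part of SeqMan Pro. The function names are hypothetical, the input is assumed to be a plain FASTA (.fas) file, and lengths are measured on the raw reads before any trimming that SeqMan Pro applies, so the result is only a rough guide.

```python
from typing import Dict, Iterable, List, Sequence


def read_lengths(fasta_lines: Iterable[str]) -> List[int]:
    """Return the length of each sequence in FASTA-formatted lines."""
    lengths: List[int] = []
    current = None  # length of the record being read; None before the first header
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def fraction_meeting(lengths: Sequence[int],
                     thresholds: Sequence[int] = (70, 100)) -> Dict[int, float]:
    """Return, for each threshold, the fraction of reads at least that long."""
    total = len(lengths)
    return {
        t: (sum(1 for n in lengths if n >= t) / total if total else 0.0)
        for t in thresholds
    }
```

For example, opening a file (here a hypothetical reads.fas) with `open("reads.fas")` and passing the file object to read_lengths, then passing the result to fraction_meeting, gives the share of reads at or above 70 and 100 bases. If far fewer reads meet 100 than 70, the lower setting is likely the better starting point.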