Optimizing EST Assembly Parameters

Note: This topic is not applicable to BAM-based projects.

 

Optimizing EST assembly parameters is largely an empirical process. Adjust the Minimum Match Percentage to change how ESTs from different homologs or alleles assemble. Lowering the value increases the chance that distinct homologs will assemble into the same contig, but also raises the risk that unrelated sequences will be assembled together. Raising the value too far, however, may prevent alleles of the same gene from assembling into the same contig.
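As a rough illustration of how a match-percentage threshold behaves, the following Python sketch scores two pre-aligned reads and compares the result with a cutoff. The function names, the example sequences, and the cutoff values of 85 and 95 are hypothetical. SeqMan Pro's actual scoring is not described in this topic and may differ.

```python
def match_percentage(aligned_a: str, aligned_b: str) -> float:
    """Percent of aligned columns in which both reads carry the same base.

    Assumes the reads are already aligned (equal length, '-' marks a gap).
    This is an illustrative measure, not SeqMan Pro's internal formula.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("aligned sequences must not be empty")
    # A column counts as a match only when both reads have the same base,
    # so a gap aligned against a gap is not counted.
    matches = sum(1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-")
    return 100.0 * matches / len(aligned_a)


def would_join(aligned_a: str, aligned_b: str, minimum_match_percentage: float) -> bool:
    """True if the pair meets the (hypothetical) threshold for joining a contig."""
    return match_percentage(aligned_a, aligned_b) >= minimum_match_percentage


# Two made-up 20-base reads that differ at two positions (18 of 20 match).
allele_1 = "ACGTACGTACGTACGTACGT"
allele_2 = "ACGTACGAACGTACGTACCT"

print(match_percentage(allele_1, allele_2))   # 90.0
print(would_join(allele_1, allele_2, 85))     # True: the alleles share a contig
print(would_join(allele_1, allele_2, 95))     # False: the alleles are kept apart
```

In this toy case a lower cutoff lets the two alleles assemble together, while a stricter cutoff separates them, which is the trade-off described above.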

 

If you use the default Minimum Match Percentage value for EST assemblies, you will frequently observe distinct alleles assembling into the same contig, as in the example below from a zebrafish EST assembly. This segment of otherwise very similar reads in the same contig shows compelling evidence for allelic variation, because each of the two consistent sequence differences is supported by multiple reads. You may be able to separate the reads for two or more alleles or homologs into different contigs using the Suggest Conflict Splits command.

 

Note: SeqMan Pro cannot unequivocally distinguish between alleles and homologs. Whether sequences are alleles or homologs must be verified through additional scientific analysis.

 

 

Suggested allele for each read, from top to bottom: 1, 2, 1, 1, 2, 1, 2
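The reasoning behind an allele assignment like this one can be sketched in Python. The sketch finds alignment columns where at least two different bases are each supported by multiple reads, then labels the reads by their bases at those columns. The reads below are invented for illustration, the function names and the min_support parameter are hypothetical, and this is not the algorithm used by the Suggest Conflict Splits command.

```python
from collections import Counter


def variable_columns(reads, min_support=2):
    """Columns where two or more bases each occur in at least min_support reads.

    Assumes all reads are aligned to the same coordinates and equal in length.
    Requiring repeated support filters out isolated sequencing errors.
    """
    columns = []
    for i in range(len(reads[0])):
        counts = Counter(read[i] for read in reads)
        supported = [base for base, n in counts.items() if n >= min_support]
        if len(supported) >= 2:
            columns.append(i)
    return columns


def assign_alleles(reads, min_support=2):
    """Label each read 1, 2, ... by its bases at the variable columns.

    Labels are handed out in order of first appearance, top to bottom.
    """
    cols = variable_columns(reads, min_support)
    labels = {}
    result = []
    for read in reads:
        key = tuple(read[i] for i in cols)
        if key not in labels:
            labels[key] = len(labels) + 1
        result.append(labels[key])
    return result


# Seven made-up reads carrying two consistent differences (columns 3 and 8).
reads = [
    "ACGTTGCAAGCT",
    "ACGATGCAGGCT",
    "ACGTTGCAAGCT",
    "ACGTTGCAAGCT",
    "ACGATGCAGGCT",
    "ACGTTGCAAGCT",
    "ACGATGCAGGCT",
]

print(variable_columns(reads))  # [3, 8]
print(assign_alleles(reads))    # [1, 2, 1, 1, 2, 1, 2]
```

Because the two differences always occur together and each is seen in several reads, the reads fall cleanly into two groups. As the note above states, this kind of grouping alone cannot tell alleles from homologs.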

 

If you are assembling ESTs onto a genomic backbone, you may wish to edit the Assembling parameters to accommodate large gaps corresponding to introns, as follows:

 

      Decrease the Minimum Match Percentage.

 

      Decrease the Gap Length Penalty.
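To see why a lower Gap Length Penalty helps with introns, consider a minimal Python sketch of a gap cost that grows with gap length. The formula (a fixed opening cost plus a per-base cost), the function name, and all numeric values are assumptions made for illustration. This topic does not state how SeqMan Pro actually computes gap costs.

```python
def gap_cost(gap_length: int, gap_open_penalty: float, gap_length_penalty: float) -> float:
    """Cost of one gap under an assumed affine model: open + per-base * length.

    This model is a common textbook choice, used here only as an illustration.
    """
    if gap_length < 0:
        raise ValueError("gap_length must be non-negative")
    if gap_length == 0:
        return 0.0
    return gap_open_penalty + gap_length_penalty * gap_length


# A hypothetical 1000-base intron that the EST must span as a single gap.
intron_length = 1000

print(gap_cost(intron_length, gap_open_penalty=10.0, gap_length_penalty=0.5))   # 510.0
print(gap_cost(intron_length, gap_open_penalty=10.0, gap_length_penalty=0.25))  # 260.0
```

Under this assumed model, the per-base term dominates the cost of an intron-sized gap, so reducing it makes long gaps far less costly than they would otherwise be.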