Multiple sequence alignment methods and options - User Guide to MegAlign Pro

MegAlign Pro offers both gene-level and genome-level multiple sequence alignment algorithms. (Gene homology alignments are covered in a separate topic.)

Gene-level alignment of either protein or nucleotide sequences:

In general, the four gene-level aligners are more accurate than the genome-level (Mauve) aligner. They offer editable options for speed, capacity, algorithm, etc., and are the only methods available for “profile” alignments. Their disadvantages are that all sequences must be on the same strand and that large rearrangements (e.g., inversions, translocations) are not allowed.
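Because the gene-level aligners require all sequences to be on the same strand, it can help to check orientation before importing sequences. The sketch below is a hypothetical helper, not a MegAlign Pro feature. It counts the k-mers a sequence shares with a chosen reference in both orientations and returns the reverse complement when that orientation matches better. The function names and the default k of 8 are illustrative assumptions.

```python
# Hypothetical orientation check; not part of MegAlign Pro.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the result."""
    return seq.translate(COMPLEMENT)[::-1]


def kmer_set(seq: str, k: int) -> set:
    """All distinct substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def orient_to_reference(seq: str, reference: str, k: int = 8) -> str:
    """Return seq or its reverse complement, whichever shares more k-mers with reference."""
    ref_kmers = kmer_set(reference.upper(), k)
    forward_hits = len(kmer_set(seq.upper(), k) & ref_kmers)
    rc = reverse_complement(seq)
    reverse_hits = len(kmer_set(rc.upper(), k) & ref_kmers)
    return rc if reverse_hits > forward_hits else seq
```

For example, `reverse_complement("ATGC")` returns `"GCAT"`. A sequence that was entered on the opposite strand of the reference is flipped back before alignment.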

Clustal Omega – Clustal Omega (Sievers F et al., 2011) was developed at University College Dublin, and is the most advanced version of Clustal. It can align hundreds of thousands of sequences in just a few hours. This method has few editable options, but has high, 64-bit capacity on both Windows and Macintosh.

Clustal W – Clustal W aligns sequences using the method of Thompson et al. (1994). It was designed to create more accurate alignments than Clustal V when alignments include highly diverged sequences. However, Clustal W does not always handle end gaps ideally. Also note that Clustal W achieves its full accuracy only with the default “Slow-Accurate” option, not the “Fast-Approximate” option.
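In Clustal W, the “Slow-Accurate” choice computes the pairwise scores behind the guide tree with full dynamic programming, and “Fast-Approximate” uses a quicker k-tuple heuristic. The sketch below shows the dynamic-programming idea as a simplified Needleman-Wunsch score. It also shows why end-gap treatment matters. It assumes a linear gap cost and a simple match/mismatch score, whereas Clustal W itself uses affine gap penalties and substitution matrices, so treat it as an illustration only.

```python
def pairwise_score(a: str, b: str, match: int = 1, mismatch: int = -1,
                   gap: int = -1, free_end_gaps: bool = False) -> int:
    """Optimal global alignment score by full dynamic programming.

    Simplified: linear gap cost and match/mismatch scores (Clustal W uses
    affine gaps and substitution matrices). If free_end_gaps is True,
    leading and trailing gaps cost nothing.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    if not free_end_gaps:
        # Leading gaps are penalized like any other gap.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    if free_end_gaps:
        # Trailing gaps are free: take the best cell in the last row or column.
        return max(max(score[-1]), max(row[-1] for row in score))
    return score[-1][-1]


print(pairwise_score("ACGT", "TTACGTTT"))                      # 0
print(pairwise_score("ACGT", "TTACGTTT", free_end_gaps=True))  # 4
```

The short sequence matches the longer one perfectly. Whether its four overhanging end gaps are charged changes the score from 0 to 4, and that in turn can change how sequences are grouped in the guide tree.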

MAFFT – MAFFT (Multiple Alignment using Fast Fourier Transform; Katoh K et al., 2002) was developed by the Computational Biology Research Center (CBRC) and generously donated to the public domain. This method has many editable options, provides a variety of algorithms for different scenarios, and offers a choice of speeds from very slow to very fast. It has 32-bit capacity on Windows and 64-bit capacity on Macintosh. See the CBRC’s MAFFT page for additional references. As of the Lasergene 17.3 release (August 2021), MegAlign Pro uses MAFFT version 7. This powerful aligner can typically align thousands of viral genomes, for example, in under two minutes. The ability to align to a specified reference sequence was added in the Lasergene 17.3.1 release (January 2022). Aligning to a reference is ideal for the small percentage of MegAlign Pro users who need an extremely high capacity alignment algorithm.
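The “Fast Fourier Transform” in MAFFT’s name refers to how the original method quickly locates candidate homologous segments. Sequences are encoded as numeric vectors, and their correlation at every relative offset is computed with an FFT instead of by sliding one sequence along the other. The sketch below is a simplified illustration of that idea for nucleotides only. It uses one indicator vector per base and a small pure-Python radix-2 FFT. It is not MAFFT’s code, and the function names are hypothetical.

```python
import cmath


def fft(a, invert=False):
    """Recursive radix-2 Cooley-Tukey FFT; len(a) must be a power of two.

    With invert=True this computes the unnormalized inverse transform.
    """
    n = len(a)
    if n == 1:
        return list(a)
    sign = 1 if invert else -1
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + w
        out[k + n // 2] = even[k] - w
    return out


def match_counts_by_offset(x: str, y: str) -> dict:
    """Return {offset k: number of positions i with x[i] == y[i + k]}."""
    size = 1
    while size < len(x) + len(y) - 1:  # zero-pad so circular wrap-around cannot occur
        size *= 2
    total = [0j] * size
    for base in "ACGT":
        # One 0/1 indicator vector per nucleotide; correlations are summed over bases.
        xs = [1.0 if c == base else 0.0 for c in x] + [0.0] * (size - len(x))
        ys = [1.0 if c == base else 0.0 for c in y] + [0.0] * (size - len(y))
        fx, fy = fft(xs), fft(ys)
        for j in range(size):
            total[j] += fx[j].conjugate() * fy[j]
    corr = [v.real / size for v in fft(total, invert=True)]
    # Negative offsets land at the end of the circular result, hence k % size.
    return {k: round(corr[k % size]) for k in range(-(len(x) - 1), len(y))}


counts = match_counts_by_offset("ACGT", "TTACGTTT")
best = max(counts, key=counts.get)
print(best, counts[best])  # 2 4
```

A peak in the correlation marks an offset where the two sequences share many identical positions. In the published method, only those regions are then aligned in detail, which is part of what makes MAFFT fast.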

MUSCLE – The MUSCLE alignment algorithm was developed by Dr. Robert Edgar (Edgar RC, 2004 & 2004), who very kindly donated it to the public domain. It is one of the faster aligners and has numerous selectable options. MUSCLE uses iterative cycles of guide tree refinement and realignment. Like MAFFT, it has 32-bit capacity on Windows and 64-bit capacity on Macintosh. Note that the MUSCLE algorithm assumes that the sequences to be aligned have a certain degree of relatedness. Groups of relatively divergent sequences, especially in very large data sets, may require considerably more computer resources, particularly RAM. When working with these types of data, we recommend using Align > Align with Options and changing Maximum iterations to 1 or 2. This reduces the amount of memory needed and improves the chance that the alignment completes successfully.
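One quick way to gauge how related a set of sequences is before running MUSCLE is a k-mer distance. MUSCLE’s first stage uses a measure of this kind to build its draft guide tree. The sketch below is a simplified version: the distance is 1 minus the fraction of shared k-mers. The helper names, the default k of 3, and the idea of averaging over all pairs are illustrative assumptions. MegAlign Pro does not define a cutoff, so any threshold you pick is your own judgment call.

```python
from collections import Counter
from itertools import combinations


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every substring of length k."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_distance(x: str, y: str, k: int = 3) -> float:
    """1 - (shared k-mers / k-mers in the shorter sequence); 0 = identical, 1 = none shared."""
    denom = min(len(x), len(y)) - k + 1
    if denom <= 0:
        raise ValueError("both sequences must be at least k long")
    shared = sum((kmer_counts(x, k) & kmer_counts(y, k)).values())  # & keeps the minimum count
    return 1.0 - shared / denom


def mean_pairwise_distance(seqs, k: int = 3) -> float:
    """Average k-mer distance over all pairs; higher means a more divergent set."""
    pairs = list(combinations(seqs, 2))
    return sum(kmer_distance(a, b, k) for a, b in pairs) / len(pairs)
```

A set whose mean distance is close to 1 shares few k-mers. That is the divergent case for which the text above recommends lowering Maximum iterations.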

Genome-level alignment of nucleotide sequences:

Compared with the methods above, the genome-level aligner allows large rearrangements. However, it accepts only nucleotide sequences, and its fine-scale gapping may not be as good as that of the gene-level aligners.

Mauve was developed in the Genome Evolution Laboratory at the University of Wisconsin-Madison (Darling AE, Mau B, and Perna NT, 2010) and is licensed under the GNU General Public License. Mauve has high capacity and uses MUSCLE to create multiple alignments for each block that contains more than a single sequence. The Mauve algorithm is currently the only MegAlign Pro alignment method that is:

suitable for aligning very long sequences, up to genome length.
capable of producing an alignment when one or more of the sequences are rearranged relative to one another.
capable of producing a multi-block alignment (see Overview).
restricted to nucleotide sequences.
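Mauve handles rearrangements by representing the alignment as locally collinear blocks: regions that occur in the same order and orientation in each genome. This is what produces a multi-block alignment. The toy sketch below groups anchor matches between two genomes into such blocks. It starts a new block whenever the strand flips or the order in the second genome breaks. It is a hypothetical illustration of the concept only. Mauve’s actual algorithm is considerably more involved.

```python
from typing import List, Tuple

# Hypothetical anchor format: (position in genome A, position in genome B, strand "+" or "-").
Anchor = Tuple[int, int, str]


def collinear_blocks(anchors: List[Anchor]) -> List[List[Anchor]]:
    """Split anchors, ordered along genome A, into runs that stay collinear in genome B."""
    blocks: List[List[Anchor]] = []
    for anchor in sorted(anchors):
        if blocks:
            prev = blocks[-1][-1]
            same_strand = anchor[2] == prev[2]
            # Forward matches should advance in B; inverted ("-") matches should move backward.
            in_order = anchor[1] > prev[1] if anchor[2] == "+" else anchor[1] < prev[1]
            if same_strand and in_order:
                blocks[-1].append(anchor)
                continue
        blocks.append([anchor])
    return blocks


# Genome B carries an inversion between the two forward-strand regions.
anchors = [(100, 100, "+"), (200, 200, "+"),
           (300, 700, "-"), (400, 600, "-"), (500, 500, "-"),
           (800, 800, "+")]
print([len(block) for block in collinear_blocks(anchors)])  # [2, 3, 1]
```

The inverted segment becomes its own block. A gene-level aligner, which must keep everything in one collinear order on one strand, cannot represent this arrangement.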

There are two minor issues involving Mauve that you should be aware of. These issues originate with the Mauve algorithm, not with its implementation in MegAlign Pro.

An alignment can stall when the sequences are in one order but succeed when they are in another. If this happens, you may see up to three error messages after beginning a Mauve alignment: “progressiveMauve-[OS].exe has stopped working,” “Do you want to send more information about the problem?,” and “Unexpected error while running the alignment.” The last message provides a Details button that reveals the text: “Mauve quit unexpectedly. Reported error: Unrecognized file format.” In addition, a progress bar will open, but the alignment will not finish. To work around this issue, click Cancel, drag the sequences into a different order in the Sequences view or Overview, and then run the alignment again.

Sequences aligned with Mauve may give slightly different results on Windows and Macintosh.

——————————————————————

When performing a multiple alignment, take the following tips into consideration.

The order in which sequences appear in the Overview and Sequences view may affect the results of the multiple sequence alignment. If you are not satisfied with an alignment, try reordering sequences and running the alignment again.

We caution against attempting a multiple alignment in MegAlign Pro using high-throughput sequencing reads. Analysis of these reads should instead be handled through DNASTAR’s SeqMan NGen and ArrayStar applications.
