Which is better, multiple or pairwise alignment?
This question is difficult to answer because it very much depends on how the alignment is going to be used.
Mechanistically, the best sequence alignment is the one that produces the fewest mismatches. That metric can be misleading, especially if minimizing mismatches entails extreme amounts of gapping. Consider an example where the goal is to identify particular conserved domains or a large insertion. Here the placement of gaps outside the regions of interest may well be of limited concern. Note that with MegAlign Pro, you can select the interesting regions identified by a multiple alignment and copy them as subsequences to a new document for further analysis.
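As a rough illustration of why the mismatch count alone can mislead, the following Python sketch tallies mismatches and gapped columns for two hand-made alignments of the same pair of short sequences. The function name and the toy sequences are invented for this example and are not part of MegAlign Pro.

```python
def count_mismatches_and_gaps(row_a, row_b, gap="-"):
    """Count mismatched columns and gapped columns in a pairwise alignment.

    row_a and row_b are the two aligned rows (same length, gaps as '-').
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    mismatches = 0
    gaps = 0
    for a, b in zip(row_a, row_b):
        if a == gap or b == gap:
            gaps += 1          # a column that contains a gap in either row
        elif a != b:
            mismatches += 1    # two different residues in the same column
    return mismatches, gaps


# Two alignments of the same sequences, ACGT and AGCT (toy data).
ungapped = count_mismatches_and_gaps("ACGT", "AGCT")
gapped = count_mismatches_and_gaps("ACG-T", "A-GCT")
print("ungapped:", ungapped)
print("gapped:  ", gapped)
```

The gapped version has no mismatches at all, but only because two gaps were introduced. Whether that is a better alignment depends on what the alignment is for.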
Now consider a situation where a multiple sequence alignment is used to represent the actual relatedness of a group of sequences. Here the alignment is essentially a model, typically of an evolutionary process. In this case the “best” alignment is the one that is most plausible in the light of some biological theory or model. One way to visualize this, of course, is to use the alignment to make an evolutionary tree.
When should I use a pairwise alignment?
The answer to this question may seem obvious: use pairwise alignment when you are only interested in two sequences. In some cases, however, pairwise alignment is simply more suitable than multiple alignment, even when more than two sequences are available. Additionally, there are situations where a multiple sequence alignment (MSA) might help identify pairs of sequences or subsequences that are worth a more detailed, pairwise comparison.
Beyond workflow considerations, there are some fundamental differences between the two categories of alignment that might make pairwise alignment a better option for some sequence comparisons. Due to the nature of progressive multiple aligners (including Clustal, MUSCLE and MAFFT), the final sequence alignment can contain inappropriately placed gaps, which adversely affect the interpretation of the results. To understand this, consider how progressive multiple alignments work. The process invariably begins with a single pairwise alignment, adding gaps as necessary in order to minimize the number of mismatches. As the aligner proceeds, additional gaps are added as single sequences and groups of sequences aligned during an earlier stage of the process are included in the growing multiple sequence alignment. During this phase, gaps may be added but are never removed.
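The “gaps may be added but are never removed” behavior can be sketched in a few lines of Python. This is a simplified illustration, not the code of any particular aligner: the function name `insert_gap_columns` and the toy rows are assumptions made for this example. It shows that when a later step needs a new gap in an already aligned group, the gap is inserted into every row of that group and the existing gaps stay where they are.

```python
def insert_gap_columns(group, positions, gap="-"):
    """Insert a gap column before each listed column index in every row.

    group is a list of already aligned rows of equal length. Existing gaps
    are left untouched; the rows can only grow.
    """
    widths = {len(row) for row in group}
    if len(widths) != 1:
        raise ValueError("all rows in the group must have the same length")
    width = widths.pop()
    new_rows = []
    for row in group:
        pieces = []
        for i in range(width + 1):
            if i in positions:
                pieces.append(gap)      # new gap shared by the whole group
            if i < width:
                pieces.append(row[i])   # original column, gaps included
        new_rows.append("".join(pieces))
    return new_rows


# A pair aligned in an earlier step (toy data); the first row already has a gap.
early_group = ["AC-GT", "ACTGT"]
# A later step needs one more column before index 2.
print(insert_gap_columns(early_group, {2}))
```

The gap from the earlier step survives in the first row, and both rows gain the new column.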
This “once a gap, always a gap” approach is a potential drawback that is shared by all progressive multiple alignment algorithms. The heart of the problem is that gap placement (and therefore the alignment) might be affected by the order in which sequences are aligned to each other, because sequences added later in the process might be incorrectly aligned. All of the multiple alignment engines used by MegAlign Pro use a “guide tree” based on pairwise similarities of sequences to determine the order in which to align sequences. The first pair chosen consists of the two sequences that are least distant on the guide tree. If the nearest neighbor of this pair is more distant from it than the members of some other pair are from each other, that other pair is aligned next. If not, the neighbor is aligned with the first pair and gaps are added as necessary. In later rounds there may be no singleton sequences left, just clusters of two or more sequences that have already been aligned. Imagine a case where one of a group of early aligned sequences should have been added later, or where a close relative was added too late. It’s hard to know when this has happened unless you have some a priori information, such as knowledge of the evolutionary relationship of your group of sequences.
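To make the ordering concrete, here is a minimal Python sketch that derives a join order from a table of pairwise distances. It uses simple average linkage as a stand-in; the actual guide-tree methods used by Clustal, MUSCLE and MAFFT differ in detail, and the function name and distance values below are invented for illustration.

```python
from itertools import combinations


def merge_order(names, dist):
    """Return the order in which sequences or clusters would be joined.

    dist maps frozenset({a, b}) to the distance between sequences a and b.
    Clusters are compared by the average of their pairwise distances.
    """
    clusters = [(name,) for name in names]

    def cluster_distance(c1, c2):
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    order = []
    while len(clusters) > 1:
        # Pick the two closest clusters among those remaining.
        c1, c2 = min(combinations(clusters, 2),
                     key=lambda pair: cluster_distance(*pair))
        order.append((c1, c2))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return order


# Hypothetical distances between four sequences.
names = ["A", "B", "C", "D"]
dist = {
    frozenset(("A", "B")): 0.10,
    frozenset(("C", "D")): 0.20,
    frozenset(("A", "C")): 0.60,
    frozenset(("A", "D")): 0.70,
    frozenset(("B", "C")): 0.65,
    frozenset(("B", "D")): 0.70,
}
for step, (left, right) in enumerate(merge_order(names, dist), start=1):
    print(step, left, "+", right)
```

With these values, A and B are joined first. C is closer to D than to the A/B pair, so C and D are aligned to each other next, and the two clusters are joined last. Changing a single distance can change this order, and with it the placement of gaps.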
The bottom line is that when you examine just a pair from a multiple sequence alignment, you may not see the same result as in a pairwise alignment of just those two sequences. So the direct approach under these circumstances might give a better picture of the relatedness of the pair.
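One way to see this effect is to pull two rows out of a multiple alignment and drop the columns in which both rows have a gap. The Python sketch below does that for a small, made-up alignment; the function name `project_pair` and the rows are assumptions for illustration only.

```python
def project_pair(msa, name_a, name_b, gap="-"):
    """Extract two rows from an MSA, removing columns where both are gaps.

    msa maps sequence names to aligned rows of equal length.
    """
    row_a, row_b = msa[name_a], msa[name_b]
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    kept = [(a, b) for a, b in zip(row_a, row_b)
            if not (a == gap and b == gap)]
    return ("".join(a for a, _ in kept), "".join(b for _, b in kept))


# Toy multiple alignment; s1 and s2 are identical once gaps are removed.
msa = {
    "s1": "AC-G-TA",
    "s2": "AC-GT-A",
    "s3": "ACTGTTA",
}
pair_a, pair_b = project_pair(msa, "s1", "s2")
print(pair_a)
print(pair_b)
print(pair_a.replace("-", "") == pair_b.replace("-", ""))
```

In this contrived case the two sequences are identical, yet the pair taken from the multiple alignment still carries two gapped columns that were placed to accommodate the third sequence. A direct pairwise alignment of the two would need no gaps at all.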