MegAlign Pro supports the following multiple alignment methods: Clustal Omega, Clustal W, MAFFT, MUSCLE and Mauve. Click here for descriptions and a comparison of the different methods.
To learn how to perform, modify, and troubleshoot multiple alignments, see the following topics:
- Perform an initial multiple alignment
- Multiple alignment methods and options
- Modify a multiple alignment
- Unalign aligned sequences
Multiple alignment tutorials:
The following tutorials all use free data that can be downloaded from the DNASTAR website.
- Try it! – Perform a Clustal Omega alignment
- Try it! – Perform a MUSCLE alignment with multi-segment sequences
- Try it! – Perform a genomic alignment with Mauve
This application supports local, global, semi-global, and chromosome-based pairwise alignment methods. Pairwise alignments can only be performed when exactly two sequences are selected. A common workflow is to first perform a multiple alignment on an entire group of sequences. From the resulting Tree view, two closely related sequences can then be analyzed further by selecting them and performing a pairwise alignment.
To perform a pairwise sequence alignment:
- Decide which two sequences you want to align. The sequences can be any length (DNASTAR has successfully aligned protein sequences up to 35,000 residues in length), but both must belong to the same category: DNA/RNA or protein. If one sequence is significantly longer than the other, use drag & drop in the Sequences view or Overview to arrange them so that the longer sequence is above the shorter sequence.
- Select the two sequences and choose Align > Pairwise or right-click on the selection and choose Align Pairwise.
- Choose the desired settings.
- Align – Use this drop-down menu to specify which sequences to align.
- Using – Use this drop-down menu to choose the desired pairwise alignment method: Local, Global, or Semi-Global.
- Substitution matrix – Use this drop-down menu to choose the substitution matrix. A substitution matrix describes the rate at which a nucleotide or amino acid changes to another nucleotide or amino acid over time. Different matrices are available for nucleotide vs. protein sequences, as shown in the table below these instructions. See also the “Notes” section below the table for additional information.
- Gap open penalty – Specify the amount that should be deducted from the alignment score for each gap in the alignment. Gaps of different sizes carry the same penalty. Default is 10.
- Gap extension penalty – Specify the amount to deduct from the alignment score for each position in a gap. This amount is multiplied by the gap length, so longer gaps carry a greater penalty than shorter gaps. Default is 1.
- Require minimum word match – If you want to specify the length of the smallest perfect match of contiguous bases/residues to consider in building an alignment, check the box and enter a value. The default is for the box to be unchecked. If checked, the default value is 7.
- Press OK to begin the alignment.
During the alignment, MegAlign Pro displays a progress window. In most cases, this window appears and disappears too quickly to notice. For longer alignments, you can interrupt the alignment by clicking the Cancel button, or view a console window showing the start time and progress of the alignment by clicking the Show Console button.
If an alignment finishes successfully, a Pairwise view opens. If an alignment fails, you will receive a message with recommendations on how to obtain a successful alignment (e.g., by modifying options or choosing a different alignment method).
- (optional) If the Console is not already open, and you wish to view alignment statistics and other information there, select View > Console.
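As a rough illustration of two of the settings above, the following Python sketch shows how an open-plus-extension gap penalty and a minimum word match could be computed. The function names are hypothetical, and the formula (open penalty charged once per gap, plus the extension penalty multiplied by the gap length) is an assumption based on the descriptions above, not a statement of how MegAlign Pro is implemented internally.

```python
def gap_penalty(length, gap_open=10.0, gap_extend=1.0):
    """Penalty for one gap of `length` positions (assumed affine model)."""
    if length <= 0:
        return 0.0
    # Fixed cost charged once per gap, plus a cost that grows with gap length.
    return gap_open + gap_extend * length


def has_word_match(seq_a, seq_b, word_size=7):
    """Return True if both sequences share an exact run of `word_size` characters."""
    # Collect every contiguous word of the requested size from the first sequence.
    words_a = {seq_a[i:i + word_size] for i in range(len(seq_a) - word_size + 1)}
    # Look for any word of the second sequence in that set.
    return any(seq_b[j:j + word_size] in words_a
               for j in range(len(seq_b) - word_size + 1))
```

Under these assumptions and the default values, a single five-position gap costs 15, whereas five separate one-position gaps cost 55, which is why such a scheme tends to favor fewer, longer gaps.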
Substitution matrix descriptions and options:
Available for Sequence Type | Matrix | Description | Secondary Option |
---|---|---|---|
Nucleotide | NUC44 | DNASTAR’s modified version of NCBI’s NUC.4.4 matrix, in which U is treated as a synonym of T. In NUC44, exact matches and T:U matches score 5, and mismatches between unambiguous bases [G A T C U] score -4. Matches between bases and ambiguous symbols [S W R Y K M B V H D N] have intermediate scores. A base versus a 2-way ambiguity group [R Y W S K M] to which it belongs scores +1, and a base versus a 2-way group to which it doesn’t belong scores -4. Example: C is in [S Y M] but not in [W R K]. The 3-way groups are [B V H D]; C is in all of them except D (which means not C). Therefore, C vs. [B V H] scores -1, while C vs. [D] scores -4. | |
Protein | BLOSUM | (Henikoff & Henikoff, 1992). These matrices are ideal for carrying out similarity searches. | Available matrices range from 30 to 100 in increments of 5, plus BLOSUM62. Choose larger numbers for less divergent sequences. |
Protein | GONNET | Derived from PAM matrices (Dayhoff et al., 1978) but more sensitive, and based on a much larger data set. | (Unchangeable default of 250) |
Protein | IDENTITY | Scores two identical amino acids as 1, and anything else as -10,000. | N/A |
Protein | MATCH | Scores two identical amino acids as 1, and anything else as -1. | N/A |
Protein | PAM | (Dayhoff et al., 1978). Widely used since the late 1970s. | Available matrices range from 10-500, and are provided in increments of 10. Choose larger numbers for more divergent sequences. |
Protein | VTML | Derived from PAM matrices (Dayhoff et al., 1978) by Müller et al. (2002). | Available matrices range from 10-500, and are provided in increments of 10. |
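The NUC44 scoring rules described in the table can be sketched in Python as follows. This is a minimal illustration, not DNASTAR's implementation: the function name `nuc44_score` is hypothetical, the group memberships come from the standard IUPAC nucleotide codes rather than from the table, and only the cases the table spells out are covered (a base versus a base, a 2-way group, or a 3-way group). Scores involving N or two ambiguous symbols are not stated in the table, so the sketch raises an error for them instead of guessing.

```python
# Standard IUPAC ambiguity codes (2-way and 3-way groups only).
IUPAC = {
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
}
BASES = set("ACGT")


def nuc44_score(a, b):
    """Score one pair of symbols using the rules described in the table."""
    # U is treated as a synonym of T.
    a, b = (s.upper().replace("U", "T") for s in (a, b))
    if a in BASES and b in BASES:
        return 5 if a == b else -4
    # Put the unambiguous base first so the order of arguments does not matter.
    if a not in BASES:
        a, b = b, a
    if a not in BASES or b not in IUPAC:
        raise ValueError("pair not covered by this sketch")
    members = IUPAC[b]
    if a not in members:
        return -4
    # Belonging to a 2-way group scores +1; to a 3-way group, -1.
    return 1 if len(members) == 2 else -1
```

For example, `nuc44_score("C", "S")` returns 1, `nuc44_score("C", "B")` returns -1, and `nuc44_score("C", "D")` returns -4.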
Notes:
- BLOSUM, PAM, GONNET, IDENTITY, and MATCH are part of NCBI’s BLAST distribution. For more information, see NCBI’s matrix page.
- The PAM, GONNET and VTML numbers are based on the presumed millions of years of divergence.
- In BLOSUM, the matrix number is proportional to the presumed degree of similarity. Therefore, BLOSUM100 would be the preferred matrix for near-identical sequences.
- VTML and GONNET are considered to be updated versions of PAM250.
- In BLOSUM, PAM, and GONNET, match/mismatch scores vary with the series number. Exact-match scores also vary with the particular amino acid. For example, BLOSUM30 scores W:W as 20 and S:S as 4, while BLOSUM100 scores these as 17 and 9, respectively.
Pairwise alignment tutorials:
The following tutorials use data that can be downloaded from the DNASTAR website: