Substitution matrices - User Guide to MegAlign Pro

A substitution matrix describes the rate at which a nucleotide or amino acid changes to another nucleotide or amino acid over time. When performing a pairwise alignment, you can specify the desired substitution matrix in the (Pairwise) Alignment Options dialog.

Available matrices for nucleotide sequences:

Matrix	Description
NUC44	DNASTAR’s modified version of NCBI’s NUC.4.4 matrix, the modification being that U is treated as a synonym of T. In NUC44, exact matches and T:U matches score +5, and mismatches between unambiguous bases [G A T C U] score -4. Comparisons between bases and ambiguity symbols [S W R Y K M B V H D N] receive intermediate scores. A base versus a 2-way ambiguity group [R Y W S K M] to which it belongs scores +1, and a base versus a 2-way group to which it doesn’t belong scores -4. Example: C is in [S Y M] but not in [W R K]. A base versus a 3-way group [B V H D] to which it belongs scores -1, and otherwise -4. Example: C is in all of these except D (which means “not C”), so C vs [B V H] scores -1 while C vs [D] scores -4.
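
The NUC44 scoring rules described above can be sketched in a few lines of Python. This is only an illustration of the stated rules, not DNASTAR’s implementation: the function name `nuc44_score` is hypothetical, the group memberships are the standard IUPAC ones, and the sketch deliberately rejects N and ambiguity-versus-ambiguity comparisons because their scores are not given here.

```python
# Standard IUPAC ambiguity groups (an assumption of this sketch).
TWO_WAY = {
    "R": "AG", "Y": "CT", "W": "AT",
    "S": "CG", "K": "GT", "M": "AC",
}
THREE_WAY = {
    "B": "CGT",  # not A
    "D": "AGT",  # not C
    "H": "ACT",  # not G
    "V": "ACG",  # not T
}
BASES = set("ACGT")


def nuc44_score(base: str, symbol: str) -> int:
    """Score an unambiguous base against a base or a 2-/3-way symbol."""
    # U is treated as a synonym of T.
    base = base.upper().replace("U", "T")
    symbol = symbol.upper().replace("U", "T")
    if base not in BASES:
        raise ValueError(f"expected an unambiguous base, got {base!r}")
    if symbol in BASES:
        # Exact match (including T:U) scores 5, any other base scores -4.
        return 5 if base == symbol else -4
    if symbol in TWO_WAY:
        return 1 if base in TWO_WAY[symbol] else -4
    if symbol in THREE_WAY:
        return -1 if base in THREE_WAY[symbol] else -4
    # N and anything else is outside the rules sketched here.
    raise ValueError(f"symbol {symbol!r} is not covered by this sketch")
```

For instance, `nuc44_score("C", "Y")` returns 1 and `nuc44_score("C", "D")` returns -4.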

Available matrices for protein sequences:

Matrix	Description	Secondary option
BLOSUM	(Henikoff & Henikoff, 1992). These matrices are ideal for carrying out similarity searches.	Available matrices range from 30 to 100 in increments of 5, plus 62. Choose larger numbers for less divergent sequences.
GONNET	Derived from PAM matrices (Dayhoff et al., 1978) but more sensitive, and based on a much larger data set.	(Unchangeable default of 250)
IDENTITY	Scores two identical amino acids as 1, and anything else as -10,000.	N/A
MATCH	Scores two identical amino acids as 1, and anything else as -1.	N/A
PAM	(Dayhoff et al., 1978). Widely used since the late 1970s.	Available matrices range from 10-500, and are provided in increments of 10. Choose larger numbers for more divergent sequences.
VTML	Derived from PAM matrices (Dayhoff et al., 1978) by Müller T et al. (2002).	Available matrices range from 10-500, and are provided in increments of 10.
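
To see how differently IDENTITY and MATCH treat a single mismatch, the following minimal Python sketch scores two equal-length, ungapped sequences using the values from the table above. The function names `pair_score` and `ungapped_score` are hypothetical, the case-insensitive comparison is an assumption, and gap handling is left out entirely.

```python
# Mismatch penalties taken from the table; a match scores 1 in both.
MISMATCH = {"IDENTITY": -10_000, "MATCH": -1}


def pair_score(a: str, b: str, matrix: str) -> int:
    """Score one residue pair under IDENTITY or MATCH."""
    if matrix not in MISMATCH:
        raise ValueError(f"unsupported matrix: {matrix!r}")
    return 1 if a.upper() == b.upper() else MISMATCH[matrix]


def ungapped_score(seq1: str, seq2: str, matrix: str) -> int:
    """Sum pair scores over two equal-length sequences (no gaps)."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have the same length")
    return sum(pair_score(a, b, matrix) for a, b in zip(seq1, seq2))
```

With one mismatch in four positions, `ungapped_score("MKTW", "MKSW", "MATCH")` gives 2, while the same call with `"IDENTITY"` gives -9997, so IDENTITY in effect forbids aligning non-identical residues.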

BLOSUM, PAM, GONNET, IDENTITY, and MATCH are part of NCBI’s BLAST distribution. For more information, see NCBI’s matrix page.

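The PAM, GONNET, and VTML numbers are based on the presumed evolutionary divergence between sequences, expressed in PAM units (accepted point mutations per 100 residues) rather than elapsed time, so larger numbers suit more divergent sequences.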

In BLOSUM, the matrix number is proportional to the presumed degree of similarity. Therefore, BLOSUM100 would be the preferred matrix for near-identical sequences.
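
The series numbers listed in the protein matrix table can be enumerated programmatically, for example to validate a choice before running an alignment. The sketch below is illustrative only: the `SERIES` dictionary and the `matrix_name` helper are hypothetical, and the `FAMILYnumber` naming simply follows the way this page writes names such as BLOSUM30.

```python
# Series numbers as described in the table of protein matrices.
SERIES = {
    "BLOSUM": sorted(set(range(30, 101, 5)) | {62}),  # 30..100 by 5, plus 62
    "PAM": list(range(10, 501, 10)),                  # 10..500 by 10
    "VTML": list(range(10, 501, 10)),                 # 10..500 by 10
    "GONNET": [250],                                  # fixed default
}


def matrix_name(family: str, number: int) -> str:
    """Return e.g. 'BLOSUM62' after checking the number is in the series."""
    family = family.upper()
    if number not in SERIES.get(family, []):
        raise ValueError(f"{family}{number} is not an available matrix")
    return f"{family}{number}"
```

Note the opposite directions of the two scales: a higher BLOSUM number suits more similar sequences, whereas a higher PAM number suits more divergent ones.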

VTML and GONNET are considered to be updated versions of PAM250.

In BLOSUM, PAM, and GONNET, match and mismatch scores vary with the series number. Exact-match scores also vary with the particular amino acid. For example, BLOSUM30 scores W:W as 20 and S:S as 4, whereas BLOSUM100 scores these as 17 and 9, respectively.

Comparison of pairwise alignment methods

Try It! – Follow a multiple alignment with Global pairwise alignments
