A substitution matrix describes the rate at which a nucleotide or amino acid changes to another nucleotide or amino acid over time. When performing a pairwise alignment, you can specify the desired substitution matrix in the (Pairwise) Alignment Options dialog.
Available matrices for nucleotide sequences:
Matrix | Description |
---|---|
NUC44 | DNASTAR’s modified version of NCBI’s NUC.4.4 matrix; the modification is that U is treated as a synonym of T. In NUC44, exact matches and T:U matches score 5, and mismatches between unambiguous bases [G A T C U] score -4. Matches between bases and ambiguity symbols [S W R Y K M B V H D N] have intermediate scores. A base versus a 2-way ambiguity group [R Y W S K M] to which it belongs scores +1, and a base versus a 2-way group to which it does not belong scores -4. Example: C is in [S Y M] but not in [W R K]. The 3-way groups are [B V H D]; C is in all of them except D (which means "not C"). Therefore, C vs [B V H] scores -1, while C vs [D] scores -4. |
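
The NUC44 rules described above can be sketched in a few lines of Python. This is an illustrative sketch only, not DNASTAR's implementation: the function name `nuc44_score` is hypothetical, the ambiguity-code memberships are taken from the standard IUPAC nomenclature rather than from this page, and only the cases spelled out in the table (an unambiguous base versus a base, a 2-way group, or a 3-way group) are covered. Scores involving N, or two ambiguity symbols, are deliberately left out.

```python
# Standard IUPAC ambiguity codes and the bases each one stands for.
TWO_WAY = {
    "R": "AG", "Y": "CT", "S": "CG",
    "W": "AT", "K": "GT", "M": "AC",
}
THREE_WAY = {
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
}
BASES = set("ACGT")


def nuc44_score(base: str, symbol: str) -> int:
    """Score an unambiguous base against a base or a 2-/3-way ambiguity code.

    Hypothetical helper covering only the cases described in the table;
    it is not the full NUC44 matrix.
    """
    # U is treated as a synonym of T, so T:U counts as an exact match.
    base = base.upper().replace("U", "T")
    symbol = symbol.upper().replace("U", "T")

    if base not in BASES:
        raise ValueError(f"first argument must be an unambiguous base: {base!r}")

    if symbol in BASES:
        # Exact match scores 5; any other unambiguous base scores -4.
        return 5 if base == symbol else -4
    if symbol in TWO_WAY:
        # Member of a 2-way group scores +1; non-member scores -4.
        return 1 if base in TWO_WAY[symbol] else -4
    if symbol in THREE_WAY:
        # Member of a 3-way group scores -1; non-member scores -4.
        return -1 if base in THREE_WAY[symbol] else -4

    raise ValueError(f"symbol not covered by this sketch: {symbol!r}")
```

For example, `nuc44_score("C", "S")` returns 1, `nuc44_score("C", "W")` returns -4, `nuc44_score("C", "B")` returns -1, and `nuc44_score("T", "U")` returns 5.
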
Available matrices for protein sequences:
Matrix | Description | Secondary option |
---|---|---|
BLOSUM | (Henikoff & Henikoff, 1992). These matrices are ideal for carrying out similarity searches. | Available matrices range from 30 to 100 in increments of 5, plus BLOSUM62. Choose larger numbers for less divergent sequences. |
GONNET | Derived from PAM matrices (Dayhoff et al., 1978) but more sensitive, and based on a much larger data set. | (Unchangeable default of 250) |
IDENTITY | Scores two identical amino acids as 1, and anything else as -10,000. | N/A |
MATCH | Scores two identical amino acids as 1, and anything else as -1. | N/A |
PAM | (Dayhoff et al., 1978). Widely used since the late 1970s. | Available matrices range from 10-500, and are provided in increments of 10. Choose larger numbers for more divergent sequences. |
VTML | Derived from PAM matrices (Dayhoff et al., 1978) by Müller et al. (2002). | Available matrices range from 10-500, and are provided in increments of 10. |
- BLOSUM, PAM, GONNET, IDENTITY, and MATCH are part of NCBI’s BLAST distribution. For more information, see NCBI’s matrix page.
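- The PAM, GONNET and VTML numbers reflect the presumed evolutionary distance between the sequences, measured in PAM units (accepted point mutations per 100 residues). Larger numbers correspond to greater divergence.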
- In BLOSUM, the matrix number is proportional to the presumed degree of similarity. Therefore, BLOSUM100 would be the preferred matrix for near-identical sequences.
- VTML and GONNET are considered to be updated versions of PAM250.
- In BLOSUM, PAM, and GONNET, match and mismatch scores vary with the series number. Exact-match scores also vary with the particular amino acid. For example, BLOSUM30 scores W:W as 20 and S:S as 4, whereas BLOSUM100 scores these as 17 and 9, respectively.
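
The difference between the MATCH and IDENTITY matrices in the table above is easiest to see on a small example. The sketch below is illustrative only: the function `ungapped_score` and the two constants are hypothetical names, the sequences are made up, and gaps are ignored entirely. It sums per-position scores for two pre-aligned sequences of equal length, using the values from the table (1 for identical residues, and -1 or -10,000 otherwise).

```python
# Mismatch scores taken from the MATCH and IDENTITY rows of the table.
MATCH_MISMATCH = -1
IDENTITY_MISMATCH = -10_000


def ungapped_score(seq1: str, seq2: str, mismatch: int) -> int:
    """Sum per-position scores for two pre-aligned, gap-free sequences.

    Identical residues score 1; any other pair scores `mismatch`.
    Hypothetical helper for illustration, not part of any DNASTAR API.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have the same length")
    return sum(1 if a == b else mismatch for a, b in zip(seq1, seq2))


# Two made-up peptides that differ at one position (T vs S).
a, b = "MKTAY", "MKSAY"
match_total = ungapped_score(a, b, MATCH_MISMATCH)        # 4 matches, 1 mismatch
identity_total = ungapped_score(a, b, IDENTITY_MISMATCH)  # same pairs, harsher penalty
```

Here `match_total` is 3 and `identity_total` is -9996. Under MATCH a single mismatch only offsets one match, while under IDENTITY it outweighs any realistic number of matches, so an aligner would be expected to avoid pairing non-identical residues wherever it can.
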