Descriptions of table data columns - User Guide to SeqMan Ultra

When you click the Change options in this view tool from within the Contig Report and Features views, you can select which columns appear in the view’s table. Only a subset of column types is available for each view, and availability may further depend on the type of assembly that is open. Descriptions of each column appear below in alphabetical order.

Column Name	Description
‘N’ Cnt	The residue count, i.e., the number of a particular residue called in the aligned column. A dash (-) represents the reference base.
Ambiguous	A single base call could not be made.
Amino Acid Change	This column is only available if Show Codon Bases & Distance to feature is unchecked. The column shows the change(s) in the amino acid sequence, using the nomenclature established by the Human Genome Variation Society and The Sequence Ontology Project. This includes: Conservative in-frame insertions. Example: p.K2_M3insQSK denotes that the sequence GlnSerLys (QSK) was inserted between amino acids Lysine-2 (K) and Methionine-3 (M). Disruptive in-frame insertions. Example: p.C28delinsWV denotes a 3 bp insertion in the codon for Cysteine-28, generating codons for Tryptophan (W) and Valine (V). Conservative in-frame deletions. Example: p.(C28_M30del) denotes a deletion of three amino acids, from Cysteine-28 to Methionine-30. Disruptive in-frame deletions. Example: p.(C28_M30delinsL) denotes a 6 bp deletion including 2 bp from the codon for Cysteine-28, the whole of codon 29, and 1 bp from the codon for Methionine-30, resulting in replacement of C28 to M30 with Leucine (L).
Called Base	The dominant variant in the aligned column. In the case of a heterozygote call, both bases at the position are shown, separated by a vertical bar. For multi-base insertions, the inserted string is shown. For multi-base deletions, the deleted bases are represented with dashes (-).
Coding Feature Distance	Shows whether variants are within or near a named feature, and the distance from that feature. For .assembly files and certain .sqd files (e.g., from de novo or special templated workflows), the following color scheme may be used: Gray + feature name – Variant is within the named feature. Pink + arrow + feature name – Distance from the variant to the closest upstream coding feature. Orange + arrow + feature name – Distance from the variant to the closest downstream coding feature.
Codon	When a translated feature is present on the reference sequence at the position of a variant, a codon change is displayed. The codon and amino acid translation are shown for the reference sequence and compared to the codon and amino acid translation for the selected variant. The position number of the amino acid change is also displayed. If more than one translated feature is present at the variant position, SeqMan Ultra uses the first feature based on the current sorting in the Features view.
Cons Pos	Consensus position that includes gaps.
Contig ID	The name of the reference sequence or chromosome.
Contig Pos	Consensus position that includes gaps.
COSMIC	The Catalogue of Somatic Mutations in Cancer (COSMIC) ID for positions with known variants. Double-clicking on the entry opens the corresponding page at COSMIC. For human assemblies only.
Count	(isoforms only) The total number of aligned reads assigned to an isoform feature, after adjustments for repeat distribution.
Coverage Depth	The average depth of coverage across the feature, calculated as the sum over all segments of the total number of bases covering each segment divided by the length of that segment. Note that the coverage for a gene may differ substantially from that of its corresponding CDS (e.g., in exome or transcriptome sequencing). Values range from zero to the maximum coverage in the project and are shown to two decimal places to account for areas of very low, but non-zero, coverage.
Coverage Strand Depth	(stranded RNA-seq only) The total number of sequenced bases within the feature and on the correct strand, divided by the length of the feature.
dbSNP ID	The dbSNP rs ID, if available, for positions with known variants. Double-clicking on the entry opens the corresponding page at dbSNP.
Deletion	The number of deleted bases in the aligned column.
Depth	The number of reads overlapping the aligned column. Since this calculation disregards bases below the quality threshold, the Alignment view may show a greater number of sequences than the Depth shown in tabular views. The default quality threshold for assembly in SeqMan NGen is 5. The threshold can be changed either pre-assembly, in SeqMan NGen, or post-assembly, in SeqMan Ultra.
DNA Change	Change(s) in the DNA sequence affecting either CDS features or splice sites are indicated using the nomenclature established by the Human Genome Variation Society. A “c.” prefix, followed by coordinates taken from the ORF, denotes a change in a CDS feature. Examples: Substitution: c.76A>C denotes that at nucleotide 76 an A is changed to a C. Insertion within a coding region: c.76_77insT denotes that a T is inserted between nucleotides 76 and 77. Deletion within a coding region: c.76_78delACT denotes an ACT deletion from nucleotides 76 to 78. A “g.” prefix followed by genomic coordinates denotes a change in the intronic region of a splice site. Note: When a multibase variant affects both the intron and exon portions of a splice site, it is represented under two separate entries: one with g. coordinates and the other with c. coordinates.
Feature Name	If a variant is located within an annotated feature in the reference sequence, the feature type and name are displayed. A single nucleotide change may sometimes be reported as affecting multiple overlapping features. These can include different overlapping genes on the same or opposite strands, as well as alternatively spliced messages from the same gene. In this case, SeqMan NGen produces multiple VCF Variant table entries at the same position, one for each reported feature. A bracketed number follows the Feature Name to indicate which isoform from the Feature view table was used (e.g., TP53 [2]). Note: If a non-gene feature (“mRNA”, “CDS”, etc.) exists in the template file, but has no corresponding “gene” feature, SeqMan NGen adds the “gene” feature automatically during assembly. The locations of any automatically added “gene” annotations are indicated by asterisks (*) in this column.
Feature Type	For variants within a gene feature, the feature type is shown in the following order of precedence: CDS, mRNA, Gene. If Show Codon Bases & Distance to feature is selected, this column also contains a feature designation if the variant is within 150 bases of the nearest exon. Therefore, it is possible for a variant that is in a gene to also be listed as a CDS, mRNA, etc. When Show Codon Bases & Distance to feature is checked, the Feature Type column also shows the distance to the nearby exon and an arrow indicating the direction of the feature. Feature types for different variant locations are as follows: gene – Within a gene feature, but not included in an mRNA or CDS feature for that gene. (Variants within the intron portion of a splice site are indicated as CDS features.) CDS – Within an exon or splice site. mRNA – In the 5’ and 3’ untranslated portions of an mRNA.
GERP	The Genomic Evolutionary Rate Profiling (GERP) score representing the calculated evolutionary constraint at that position. GERP data is automatically delivered when you use DNASTAR’s human template package prior to performing a templated assembly in SeqMan NGen. To limit the size of the data file required, only positions with scores of 1.0 or greater are displayed. GERP is a tool that provides a score for each position in the human genome that estimates whether that position is under purifying selection or not (Davydov et al. 2010). GERP uses alignments between the human genome and 33 other mammalian genomes to quantitate the position-specific constraint in terms of rejected substitutions, defined as the difference between the neutral rate of substitution and the observed rate, estimated by maximum likelihood. Substitutions in sites under selection are assumed to be more deleterious than those not under selection. Scores range from negative values to ~6. Positions with scores below or near zero are not under selection. Conversely, the more positive the score, the more constrained the position. GERP information can be useful in evaluating the impact of non-synonymous variants in coding regions and the impact of changes in or near promoter elements, among others.
Genotype	When the “Diploid” SNP detection method is used in a SeqMan NGen assembly, there are four possibilities: 1) homozygous variant (both alleles have the same base and it is different from the reference), 2) reference (both alleles have the same base and it is the same as the reference), 3) heterozygous reference (two different alleles are called, one with the same base as the reference, the other with a variant base), and 4) heterozygous not reference (two different alleles, neither of which match the reference base). It is quite rare for the reference case to occur in the table. This only happens in cases where there is sufficient evidence of the possibility of a variant to pass the filtering threshold, but where the evidence is still quite weak. These cases are usually eliminated by even modest filtering. When the Haploid SNP detection method is used, only variant and reference are possible. Note: In this column, if one or more of the adjacent variants is called as a heterozygote, the coalesced variant is also called a heterozygote. Therefore, for a coalesced variant to be called homozygous, all positions must be called homozygous.
Group	Specifies whether the variant was called from the NGS or the Sanger data. The NGS group is named using the Read technology selected in the SeqMan NGen wizard. If you are investigating a putative variant that was identified in the initial run of the NGS data, you can find that position here and open it in the Alignment view to check whether there is a Sanger-called variant at the same position.
Homopolymer	Indicates whether the variant occurs within a homopolymeric run, which is defined as two or more identical bases in a row. When using Pacific Biosciences (PacBio) or Ion Torrent data, SeqMan Pro and SeqMan Ultra may not list all homopolymeric indels. Note: When possible, insertions or deletions are placed at the 5’ end (top strand) of the run during alignment.
Impact	The impact of the variant or indel on the protein, displayed as: Synonymous – No amino acid changes. Non Synonymous – Amino acid substitution only. Nonsense – Amino acid to translational stop. Frameshift – An indel within a coding region whose length is not a multiple of 3, thereby changing the reading frame. No Start – A change that disrupts the start codon. No Stop – A change that converts a stop codon to an amino acid, and thereby extends the reading frame. Inframe Insertion – An insertion within a coding region whose length is divisible by 3. The type is followed by the word Conservative if the insertion occurs between two codons, and Disruptive if it occurs within a codon. Inframe Deletion – A deletion within a coding region whose length is divisible by 3. The type is followed by the word Conservative if the deletion occurs between two codons, and Disruptive if it begins within one codon and ends within another. If sorting by the Impact column, the column is ordered by severity. For example, a Frameshift is more severe than a Nonsense change.
Location	The range of sequence associated with the feature, including gaps.
MID	Displays variants separately for each MID sample. If the same variant (same base change, same position) occurs in more than one sample, there is an entry for each sample. Similarly, if the same position is affected but the base change is different, there are separate entries, and the values in each entry’s columns correspond to that sample only, not to all the reads at that position.
Name	The name of the feature. The /dnas_title qualifier is used for the feature name. If no /dnas_title is available, SeqMan Ultra will use the value of the first qualifier listed for the name.
P Not Ref	The probability that the called base at this position is not the reference base. For coalesced variants, this value is equal to the minimum value of all “child” values. The minimum allowed value is 30%.
PDB ID	Worldwide Protein Data Bank (PDB) ID number.
Q Call	The Phred-like quality score of the called genotype. It is a measure of the probability that the called genotype is correct.
Quality
Raw_count	(isoforms only) The number of reads that were initially and uniquely assigned to an isoform, fragment or peak. This value is not normalized and does not account for repeat reads.
Raw_repeat_count	(isoforms only) The total fraction of repeat reads that map to the peak. For example, if one read maps equally well to two peaks, each peak would have a Raw_repeat_count of 0.5. If multiple repeat reads map to a single peak, the proportions are summed to get the Raw_repeat_count. This column is only available if repeat handling was specified in SeqMan NGen.
Ref
Ref Base	The reference sequence base in this position. For multi-base deletions, the reference sequence of the string is shown, beginning with the base at the Ref Pos coordinate. If there is no reference sequence present, the Ref Base column displays the most frequently occurring non-ambiguous base at this position. If no such base exists, the consensus base at this position is shown.
Ref ID	Reference sequence or chromosome.
Ref Pos	Reference position that does not include gaps. Coordinates matching entries in the VCF Variant table are shown. For deletions, Ref Pos is the genomic coordinate of the first deleted base. For insertions, Ref Pos is the genomic coordinate of the base preceding the insertion.
Region Capture	Indicates whether the variant occurs within a region specified in the .bed or manifest file used. Values are Yes and No.
Repeat_distrib_count	(isoforms only) The proportional number of repeated reads assigned to this exon, gene or isoform.
Repeat_distrib_percent	(isoforms only) An approximate percentage of the repeat reads that could have been assigned to this feature that actually were assigned; in other words, Repeat_distrib_count expressed as a percentage of Raw_repeat_count.
Replicate set total count	(RNA-seq with no normalization specified only) The total number of aligned reads averaged over the set of replicate samples. Only available if replicates and replicate sets were specified in SeqMan NGen.
Replicate set total RPK	(RNA-seq with RPK normalization specified only) The RPK value averaged over the set of replicate samples. Only available if replicates and replicate sets were specified in SeqMan NGen.
Replicate set total RPKM	(RNA-seq with RPKM normalization specified only) The RPKM value averaged over the set of replicate samples. Only available if replicates and replicate sets were specified in SeqMan NGen.
Replicate set total RPM	(RNA-seq with RPM normalization specified only) The RPM value averaged over the set of replicate samples. Only available if replicates and replicate sets were specified in SeqMan NGen.
Sequence	The name of the sequence that contains the feature.
Sequence Pos
Skew	When a maximum “strand bias” (also called “skew”) is set for a templated SeqMan NGen assembly (i.e., by setting the scripting parameter snp_maxStrandBias to ‘true’), SeqMan NGen calculates the strand bias for each variant. The results can be viewed in the Skew column of this table. The strand bias for a variant is a bias in the variant appearing on one strand instead of the other. It is measured relative to the strand bias in the assembly at the location of the variant. For example, in a column with 60 forward reads and 40 reverse reads, 6 variant bases on the forward strand and 4 on the reverse strand would be unbiased and have a skew of zero. SeqMan NGen calculates strand bias using the following formula: Strand bias = (SNP%f - SNP%r) / SNP%, where SNP%f is the strand-specific SNP percentage for the forward strand, SNP%r is the strand-specific SNP percentage for the reverse strand, and SNP% is the overall SNP percentage. Interpretations of several Skew values: 0 = unbiased; 2 = all SNPs on one strand, where strands are equally abundant; > 2 = SNPs are present in large numbers on a strand that is, itself, rare. The maximum theoretical strand bias is equal to the depth of coverage. In practice, however, values over four are seldom seen, as they require such low variant percentages that they are unlikely to be called as variants. When all reads at the variant position come from one strand direction, no strand bias can be calculated; in this case, the variant is not filtered. This situation can be prevented by setting a Minimum strand coverage value in SeqMan NGen prior to assembly.
SNP	Shows the symbol for whether this is a confirmed (checkmark), putative (question mark), rejected (‘x’) or mixed (dot) variant.
SNP %	The percentage of the single most prevalent non-reference base in the aligned column. As a very general rule, significant variants tend to occur at 25% and higher.
Splice	Variant is in or near an exon splice site. Splice site variations are changes to the 5’ (“donor”) or 3’ (“acceptor”) consensus splice site sequences. The DNA sequence for the donor is 5’-AGGTRAGT-3’ and for the acceptor is 5’-YYYYYYYYCAGGT-3’. Note that the AG dinucleotide on the 5’ end of the donor and the GT dinucleotide on the 3’ end of the acceptor are within the exon. Therefore, changes at these positions can also cause changes in the amino acid sequence of the resulting proteins. Changes in the intron portion of the splice site are marked as “Splice” in the column while those in the exon portion of the site are labeled “Splice in CDS.” Only the position where the change occurs is considered, not the identity of the base.
Total_count	(RNA-seq with no normalization specified only) The sum of the counts for all of the isoform features within a gene.
Target Length	The target length of the fragment, equal to the end position minus the start position.
Total raw count	(genes only) The total number of reads which were initially, uniquely assigned to any isoform within the gene.
Total_repeat_distrib_count	(genes only) The total count of reads which were proportionally distributed to any isoform within the gene.
Total RPK	(RNA-seq with RPK normalization specified only) RPK (reads assigned per kilobase of target) = the signal value for each experiment divided by the target length in kilobases (the total bases of target sequence divided by one thousand).
Total RPKM	(RNA-seq with RPKM normalization specified only) Reads assigned per kilobase of target per million mapped reads (RPKM) = the signal value for each experiment divided by the target length in kilobases (the total bases of target sequence divided by one thousand); the resulting number is then divided by the total number of mapped reads divided by one million.
Total RPM	(RNA-seq with RPM normalization specified only) Reads assigned per million mapped reads (RPM) = the signal value for each experiment divided by the total number of mapped reads divided by one million.
Trace%
Transcript ID	Transcript ID number from Ensembl.
Type	Specifies the variation type as SNP, Del (deletion) or Ins (insertion). For .assembly files and certain .sqd files (e.g., from de novo or special templated workflows), pink typeface may be used to indicate non-synonymous variants.
User ID	Positions corresponding to a custom VCF Variant Table are labeled with the ID from that set.
Variant Count	The number of called variants in the feature. Note that a gene feature in an exome project can have a very low Coverage Depth value and still have one or many variants.
Visible	A checkmark in this column indicates that the feature is currently displayed in the Alignment view (for features on constituent sequences) or the Strategy view (for features on the consensus sequence). The absence of a checkmark indicates that this feature is hidden. Note: The Visible column does not apply to variant features, which are available only in .sqd projects. To control the display of variants, select Variant > Show/Hide Variants. For more information, see Work with Variants.
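The c. notation described under DNA Change follows a fixed pattern. The sketch below is a minimal set of hypothetical helpers; the function names are not part of SeqMan Ultra. It rebuilds the substitution, insertion and deletion examples from that row, assuming 1-based coordinates taken from the ORF.

```python
# Hypothetical helpers that format HGVS-style c. strings, mirroring the
# DNA Change examples (1-based coordinates taken from the ORF are assumed).


def hgvs_substitution(pos: int, ref: str, alt: str) -> str:
    """Single-base substitution, e.g. c.76A>C."""
    return f"c.{pos}{ref}>{alt}"


def hgvs_insertion(after_pos: int, inserted: str) -> str:
    """Insertion between after_pos and the next nucleotide, e.g. c.76_77insT."""
    return f"c.{after_pos}_{after_pos + 1}ins{inserted}"


def hgvs_deletion(start: int, deleted: str) -> str:
    """Deletion beginning at start, listing the deleted bases, e.g. c.76_78delACT."""
    end = start + len(deleted) - 1
    if start == end:
        # A single-base deletion is written with one coordinate, not a range.
        return f"c.{start}del{deleted}"
    return f"c.{start}_{end}del{deleted}"


print(hgvs_substitution(76, "A", "C"))  # c.76A>C
print(hgvs_insertion(76, "T"))          # c.76_77insT
print(hgvs_deletion(76, "ACT"))         # c.76_78delACT
```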
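The four diploid Genotype categories, and the rule for coalesced variants, reduce to a few comparisons. This sketch assumes each call is available as a reference base plus two allele bases. The function names and string labels are illustrative and do not reflect SeqMan NGen's internal representation.

```python
# Hypothetical helpers illustrating the Genotype column rules.


def classify_genotype(ref: str, allele1: str, allele2: str) -> str:
    """Map a diploid call to one of the four Genotype categories."""
    if allele1 == allele2:
        # Both alleles agree: either the reference or a homozygous variant.
        return "reference" if allele1 == ref else "homozygous variant"
    if ref in (allele1, allele2):
        # One allele matches the reference, the other does not.
        return "heterozygous reference"
    # Two different alleles, neither matching the reference.
    return "heterozygous not reference"


def coalesced_zygosity(genotypes: list[str]) -> str:
    """A coalesced variant is homozygous only if every position is homozygous."""
    if any(g.startswith("heterozygous") for g in genotypes):
        return "heterozygous"
    return "homozygous"


print(classify_genotype("A", "A", "G"))  # heterozygous reference
```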
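The Homopolymer definition (two or more identical bases in a row) can be checked by extending left and right from the variant position. In these hypothetical helpers, positions are 0-based. The run's start index corresponds to the 5’ end (top strand), where the text says indels are placed when possible.

```python
# Hypothetical helpers for the Homopolymer column definition (0-based positions).


def homopolymer_run(seq: str, pos: int) -> tuple[int, int]:
    """Return the inclusive (start, end) of the run of identical bases containing pos."""
    base = seq[pos]
    start = pos
    while start > 0 and seq[start - 1] == base:
        start -= 1  # walk toward the 5' end of the top strand
    end = pos
    while end < len(seq) - 1 and seq[end + 1] == base:
        end += 1  # walk toward the 3' end
    return start, end


def in_homopolymer(seq: str, pos: int) -> bool:
    """True if pos lies in a run of two or more identical bases."""
    start, end = homopolymer_run(seq, pos)
    return end - start + 1 >= 2


print(homopolymer_run("ACGGGT", 3))  # (2, 4)
```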
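For the Impact column, the frameshift and in-frame rules depend only on the indel length and on where the indel starts relative to the codon. The sketch below uses a hypothetical `codon_offset` argument: 0 at a codon boundary, and 1 or 2 partway through a codon. It covers only the indel categories. It does not attempt the substitution categories or the severity ordering used for sorting.

```python
# Hypothetical classifier for the indel categories of the Impact column.


def classify_indel(kind: str, length: int, codon_offset: int) -> str:
    """Classify a coding-region indel.

    kind: "ins" or "del".
    length: number of inserted or deleted bases.
    codon_offset: 0 if the indel starts at a codon boundary, 1 or 2 if it
        starts partway through a codon (a convention assumed for this sketch).
    """
    if kind not in ("ins", "del"):
        raise ValueError("kind must be 'ins' or 'del'")
    if length <= 0:
        raise ValueError("length must be positive")
    if length % 3 != 0:
        return "Frameshift"  # the reading frame changes
    label = "Inframe Insertion" if kind == "ins" else "Inframe Deletion"
    qualifier = "Conservative" if codon_offset % 3 == 0 else "Disruptive"
    return f"{label} {qualifier}"


# p.K2_M3insQSK: 9 bases inserted between two codons.
print(classify_indel("ins", 9, 0))  # Inframe Insertion Conservative
```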
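The Raw_repeat_count example generalizes to splitting each multi-mapping read evenly among its peaks. In that example, one read that maps equally well to two peaks contributes 0.5 to each. This sketch assumes equal weighting and a simple read-to-peaks mapping; how SeqMan NGen weights reads in practice is not described here. The second helper expresses Repeat_distrib_count as a percentage of Raw_repeat_count, following the Repeat_distrib_percent description.

```python
from collections import defaultdict

# Hypothetical helpers illustrating Raw_repeat_count and Repeat_distrib_percent.


def raw_repeat_counts(read_hits: dict[str, list[str]]) -> dict[str, float]:
    """Split each multi-mapping read equally among the peaks it maps to."""
    counts: defaultdict[str, float] = defaultdict(float)
    for peaks in read_hits.values():
        if len(peaks) < 2:
            continue  # uniquely placed reads count toward Raw_count instead
        share = 1 / len(peaks)
        for peak in peaks:
            counts[peak] += share
    return dict(counts)


def repeat_distrib_percent(distrib_count: float, raw_repeat_count: float) -> float:
    """Repeat_distrib_count as a percentage of Raw_repeat_count."""
    if raw_repeat_count == 0:
        return 0.0  # assumption: report 0% when no repeat reads map here
    return 100 * distrib_count / raw_repeat_count


print(raw_repeat_counts({"r1": ["A", "B"], "r2": ["A", "B"], "r3": ["A"]}))
```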
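The Skew formula can be checked against the worked example in that row. This sketch takes variant and total read counts for each strand; the function name is hypothetical. The text does not say whether the displayed Skew is signed, so the sketch keeps the sign, and a bias toward the reverse strand comes out negative.

```python
from typing import Optional

# Hypothetical implementation of Strand bias = (SNP%f - SNP%r) / SNP%.


def strand_bias(var_fwd: int, var_rev: int, depth_fwd: int, depth_rev: int) -> Optional[float]:
    """Return the strand bias, or None if it cannot be calculated.

    None is returned when reads come from only one direction (one strand
    has zero depth) or when no variant bases were observed.
    """
    if depth_fwd == 0 or depth_rev == 0:
        return None
    total_var = var_fwd + var_rev
    if total_var == 0:
        return None
    snp_f = 100 * var_fwd / depth_fwd                # SNP%f, forward strand
    snp_r = 100 * var_rev / depth_rev                # SNP%r, reverse strand
    snp = 100 * total_var / (depth_fwd + depth_rev)  # overall SNP%
    return (snp_f - snp_r) / snp


print(strand_bias(6, 4, 60, 40))   # 0.0, the unbiased example
print(strand_bias(10, 0, 50, 50))  # 2.0, all SNPs on one of two equal strands
```

With a single forward read carrying the variant and 99 reverse reads, the result equals the depth of coverage (100). This matches the stated theoretical maximum.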
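The consensus sequences in the Splice row use IUPAC codes (R = A or G, Y = C or T), which can be translated into regular expressions. This is only an aid to reading the notation. The text states that the Splice label depends on the position of the change, not the identity of the base, so this is not how SeqMan Ultra assigns it. The names below are hypothetical.

```python
import re

# IUPAC codes used in the consensus sequences: R = A or G, Y = C or T.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]"}


def consensus_pattern(consensus: str) -> re.Pattern:
    """Translate an IUPAC consensus string into a compiled regular expression."""
    return re.compile("".join(IUPAC[base] for base in consensus))


DONOR = consensus_pattern("AGGTRAGT")          # 5'-AGGTRAGT-3'
ACCEPTOR = consensus_pattern("YYYYYYYYCAGGT")  # 5'-YYYYYYYYCAGGT-3'

print(DONOR.fullmatch("AGGTAAGT") is not None)  # True
```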
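The Total RPK, Total RPKM and Total RPM definitions translate directly into arithmetic. This sketch makes two assumptions: that the "signal value" is the number of reads assigned to the feature, and that the Replicate set total columns use an arithmetic mean over replicates. The function names are hypothetical.

```python
from statistics import mean

# Hypothetical helpers for the RPK, RPKM and RPM normalizations.


def rpk(signal: float, target_bases: int) -> float:
    """Reads per kilobase of target: signal / (target bases / 1,000)."""
    return signal / (target_bases / 1_000)


def rpm(signal: float, total_mapped_reads: int) -> float:
    """Reads per million mapped reads: signal / (mapped reads / 1,000,000)."""
    return signal / (total_mapped_reads / 1_000_000)


def rpkm(signal: float, target_bases: int, total_mapped_reads: int) -> float:
    """RPK further divided by (mapped reads / 1,000,000)."""
    return rpk(signal, target_bases) / (total_mapped_reads / 1_000_000)


def replicate_set_mean(values: list[float]) -> float:
    """Average a normalized value over a replicate set (arithmetic mean assumed)."""
    return mean(values)


print(rpk(500, 2_000), rpm(500, 10_000_000), rpkm(500, 2_000, 10_000_000))
```

For example, 500 assigned reads on a 2,000-base target with 10 million mapped reads give an RPK of 250, an RPM of 50 and an RPKM of 25.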
