Viral-Host Integration Workflow

Viral-Host Integration is a special type of assembly used to locate putative viral insertion sites. It can also be used to predict the location of other inserted sequences, such as transposable elements.

 

To follow this workflow, select Viral-Host Integration from the Choose Assembly Workflow screen. When you make this selection, SeqMan NGen automatically sets up a templated assembly that is optimized for locating viral insertion sites.

 

1)  In the Input Host Files screen, input one or more host sequence files.

2)  In the Input Viral Genomes screen, add the viral genome(s) of interest.

3)  In the Input Sequence Files and Define Experiments or Individual Replicates screen, input your sequencing reads. These should be reads from the virus-infected host DNA in which you want to locate likely viral insertion sites.

 

Since chimeric reads (sequences consisting of both host and viral DNA) usually indicate viral insertion sites, SeqMan NGen looks for chimeric reads in a three-step process:

 

1)  The viral genome is used as the initial assembly template.

 

2)  The subset of reads that mapped to the viral genome is then re-assembled against the host template.

 

3)  The host template assembly results are output in BAM file format.
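
The two-pass idea in the steps above can be illustrated with a toy sketch. This is not SeqMan NGen's actual algorithm: exact substring matching stands in for real read alignment, and every name here (longest_match, find_chimeric_reads, HOST, VIRAL, READS) is hypothetical and invented for illustration. Reads are first screened against the viral sequence, and only the reads that hit it are then placed on the host sequence, where the end of the host match marks a candidate junction.

```python
def longest_match(read, template, min_len=8):
    """Find the longest read prefix or suffix that occurs verbatim in template.

    Returns (read_start, read_end, template_pos) or None if no prefix or
    suffix of at least min_len bases is found. Exact matching is a stand-in
    for real alignment (assumption for illustration only).
    """
    n = len(read)
    for length in range(n, min_len - 1, -1):
        for start in (0, n - length):  # try the prefix, then the suffix
            pos = template.find(read[start:start + length])
            if pos != -1:
                return start, start + length, pos
    return None


def find_chimeric_reads(reads, viral, host, min_len=8):
    """Return {read_name: (junction, masked_side)} for chimeric reads.

    junction is the 0-based host index immediately after the insertion
    point; masked_side says which end of the read did not match the host.
    """
    # Pass 1: keep only the reads with a segment matching the viral genome.
    viral_subset = {name: seq for name, seq in reads.items()
                    if longest_match(seq, viral, min_len)}

    # Pass 2: place that subset on the host template.
    results = {}
    for name, seq in viral_subset.items():
        hit = longest_match(seq, host, min_len)
        if hit is None:
            continue  # purely viral read, no host portion
        start, end, pos = hit
        if start == 0 and end == len(seq):
            continue  # whole read matches the host, so it is not chimeric
        if start == 0:
            # Host portion is on the left; the right end is masked.
            results[name] = (pos + (end - start), "right")
        else:
            # Host portion is on the right; the left end is masked.
            results[name] = (pos, "left")
    return results


# Made-up toy sequences, not real genomes.
HOST = "ACGTTGCAAGGCTTAACCGGTTAGCATCGA"
VIRAL = "TTTTCCCCGGGGAAAATATATCGCGC"
READS = {
    "host_left": HOST[:12] + VIRAL[5:17],     # host bases, then viral bases
    "host_only": HOST[5:25],                  # no viral content
    "host_right": VIRAL[:10] + HOST[15:27],   # viral bases, then host bases
}

if __name__ == "__main__":
    for name, (junction, side) in find_chimeric_reads(READS, VIRAL, HOST).items():
        print(name, junction, side)
```

Screening against the much smaller viral sequence first means that only the few reads carrying viral content need to be placed on the host, which is the main reason for ordering the passes this way in the sketch.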

 

To explore possible viral insertion sites, launch SeqMan Pro and view the Coverage Reports for the individual contigs (Contig > Coverage Report).

 

During both templated assembly steps, SeqMan NGen “masks” (trims) whichever portion of a chimeric read does not match the template for that step. Use the Coverage Report to navigate to positions covered by multiple reads, as shown in the depth column. The reads at such a position should all be trimmed at the same base, which marks the insertion site. You may “untrim” the reads to verify that they also contain viral sequence.
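
The same check can be approximated outside SeqMan Pro. The sketch below assumes that the trimmed read ends appear as soft clips (the S operation) in the CIGAR strings of the output BAM file; this has not been confirmed for SeqMan NGen output and should be verified on your own data. It takes (POS, CIGAR) pairs that have already been extracted from the alignments and counts how many reads are clipped at each host position. The function names and the example records are hypothetical.

```python
import re
from collections import Counter

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # operations that advance along the reference


def clip_boundaries(pos, cigar):
    """Return the 1-based reference positions where a read is soft-clipped.

    pos is the 1-based leftmost aligned position (the SAM POS field).
    A left clip is reported at the first aligned base, a right clip at the
    last aligned base. Hard clips are ignored.
    """
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar) if op != "H"]
    boundaries = []
    if ops and ops[0][1] == "S":
        boundaries.append(pos)
    ref_len = sum(n for n, op in ops if op in REF_CONSUMING)
    if len(ops) > 1 and ops[-1][1] == "S":
        boundaries.append(pos + ref_len - 1)
    return boundaries


def candidate_sites(records, min_reads=2):
    """Count clip boundaries and keep positions shared by min_reads or more reads."""
    counts = Counter(b for pos, cigar in records
                     for b in clip_boundaries(pos, cigar))
    return {site: n for site, n in counts.items() if n >= min_reads}


# Hypothetical (POS, CIGAR) records, invented for illustration.
RECORDS = [
    (100, "30M20S"),
    (110, "20M30S"),
    (95, "35M15S"),
    (200, "50M"),      # untrimmed read, contributes nothing
    (130, "25S25M"),
]

if __name__ == "__main__":
    for site, n in sorted(candidate_sites(RECORDS).items()):
        print(site, n)
```

Positions where several reads share one clip boundary correspond to the pattern described above, in which reads at an insertion site are all trimmed at the same base.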