Emanuele Marchi, Mathew Jones, Paul Klenerman, John Frater, Gkikas Magiorkinis, Robert Belshaw.
Abstract
BACKGROUND: Retroviruses replicate by integrating a DNA copy into a host chromosome. Detecting novel retroviral integrations (ones not in the reference genome sequence of the host) from genomic NGS data is bioinformatically challenging and frequently produces many false positives. One common method of confirmation is visual inspection of an alignment of the chimaeric (split) reads that span a putative novel retroviral integration site. We perceived the need for a program that would facilitate this by producing a multiple alignment containing both the viral and host regions that flank an integration.
Keywords: Detection; Insertion; Integration; NGS; Provirus; Retrovirus
Year: 2022 PMID: 35428171 PMCID: PMC9013057 DOI: 10.1186/s12859-022-04621-1
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1. Sample output from BreakAlign showing an illustrative HERV-K integration. The top line is the section of the host genome spanning the integration site. Numbered sequences below show individual NGS reads aligned to the host genome, with viral DNA in lower case, red and underlined (viral sequences on the left are the 3′ end of the 3′ LTR; viral sequences on the right are the 5′ end of the 5′ LTR). Reads containing insertions or deletions (indels) are indicated, with insertions and substitutions shown in lower case and deletions shown as hyphens. The intervening 'TGGT' motif is the target site duplication (TSD); see Marchi et al. [9] for a detailed explanation of how the TSD is formed and of the adjacent HERV-K LTR sequences in both forward- and reverse-orientation integrations. Note that in this integration there is a known A to G substitution in the upstream TSD [7, 8], which makes the characteristic six-nucleotide HERV-K TSD (in this case TGGTAA) appear shortened to only four nucleotides. The image shown is the output HTML file; a text version without colours or underlining is also output.
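The display convention described in the Fig. 1 caption (host bases in upper case, viral bases in lower case) can be illustrated with a minimal sketch. This is not BreakAlign's own code: the function name `render_chimaeric_read`, its parameters, and the example read are hypothetical, and the junction position is assumed to be already known from an earlier alignment step.

```python
def render_chimaeric_read(read: str, breakpoint: int, viral_side: str) -> str:
    """Return a split read with host bases upper-cased and viral bases lower-cased.

    Hypothetical helper for illustration only (not part of BreakAlign).
    read        -- the read sequence
    breakpoint  -- index of the first base to the right of the host/virus junction
    viral_side  -- "left" if the viral part precedes the junction, "right" otherwise
    """
    if viral_side not in ("left", "right"):
        raise ValueError("viral_side must be 'left' or 'right'")
    if not 0 <= breakpoint <= len(read):
        raise ValueError("breakpoint must lie within the read")

    left, right = read[:breakpoint], read[breakpoint:]
    if viral_side == "left":
        # Viral sequence on the left of the junction, host sequence on the right.
        return left.lower() + right.upper()
    # Host sequence on the left of the junction, viral sequence on the right.
    return left.upper() + right.lower()


# Made-up 12-base read with the junction after the fourth base.
example = render_chimaeric_read("ACGTTGGTCCAT", 4, "left")
print(example)
```

A text-only rendering like this corresponds to the plain-text output mentioned in the caption; the colour and underlining of the HTML output are not reproduced here.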
Fig. 2. Flow chart illustrating the basic steps within BreakAlign. Parts of chimaeric reads that align to the virus are shown in red. Generating the local blastn database can be time-consuming (see text). Other stages in BreakAlign will typically take a minute or less.