Matthew Hobbs, Andrew King, Ryan Salinas, Zhiliang Chen, Kyriakos Tsangaras, Alex D Greenwood, Rebecca N Johnson, Katherine Belov, Marc R Wilkins, Peter Timms.
Abstract
The koala retrovirus (KoRV) is implicated in several diseases affecting the koala (Phascolarctos cinereus). KoRV provirus can be present in the genome of koalas as an endogenous retrovirus (present in all cells via germline integration) or as exogenous retrovirus responsible for somatic integrations of proviral KoRV (present in a limited number of cells). This ongoing invasion of the koala germline by KoRV provides a powerful opportunity to assess the viral strategies used by KoRV in an individual. Analysis of a high-quality genome sequence of a single koala revealed 133 KoRV integration sites. Most integrations contain full-length, endogenous provirus; KoRV-A subtype. The second most frequent integrations contain an endogenous recombinant element (recKoRV) in which most of the KoRV protein-coding region has been replaced with an ancient, endogenous retroelement. A third set of integrations, with very low sequence coverage, may represent somatic cell integrations of KoRV-A, KoRV-B and two recently designated additional subgroups, KoRV-D and KoRV-E. KoRV-D and KoRV-E are missing several genes required for viral processing, suggesting they have been transmitted as defective viruses. Our results represent the first comprehensive analyses of KoRV integration and variation in a single animal and provide further insights into the process of retroviral-host species interactions.
Year: 2017 PMID: 29158564 PMCID: PMC5696478 DOI: 10.1038/s41598-017-16171-1
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
KoRV integration sites found in the assembled koala genome or in unassembled PacBio long reads.
| Type | Subgroup | Completeness | In koala genome contigs | Only in PacBio reads | Total |
|---|---|---|---|---|---|
| KoRV | KoRV-A | full | 48 | 19 | 67 |
| KoRV | KoRV-A | indel(s) | 6 | 3 | 9 |
| KoRV | KoRV-B | full | 0 | 8 | 8 |
| KoRV | KoRV-D | internal deletions | 0 | 5 | 5 |
| KoRV | KoRV-E | internal deletions | 0 | 5 | 5 |
| KoRV | cannot be determined | | 5 | 1 | 6 |
| recKoRV | recKoRV1 | full | 11 | 11 | 22 |
| recKoRV | recKoRV1 | indel(s) | 1 | 3 | 4 |
| recKoRV | recKoRV2 | | 1 | 0 | 1 |
| recKoRV | recKoRV3 | | 2 | 0 | 2 |
| solo LTR | | | 2 | 2 | 4 |
| Total | | | 76 | 57 | 133 |
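As a quick arithmetic check on the table above, the following Python sketch sums the per-row counts and compares them with the stated totals. The counts are transcribed from the table; the dictionary layout and the function name `column_totals` are illustrative choices, not part of the paper.

```python
# Counts transcribed from the table: (in genome contigs, only in PacBio reads).
# Keys are (type, subgroup, completeness); empty strings mark blank cells.
SITES = {
    ("KoRV", "KoRV-A", "full"): (48, 19),
    ("KoRV", "KoRV-A", "indel(s)"): (6, 3),
    ("KoRV", "KoRV-B", "full"): (0, 8),
    ("KoRV", "KoRV-D", "internal deletions"): (0, 5),
    ("KoRV", "KoRV-E", "internal deletions"): (0, 5),
    ("KoRV", "cannot be determined", ""): (5, 1),
    ("recKoRV", "recKoRV1", "full"): (11, 11),
    ("recKoRV", "recKoRV1", "indel(s)"): (1, 3),
    ("recKoRV", "recKoRV2", ""): (1, 0),
    ("recKoRV", "recKoRV3", ""): (2, 0),
    ("solo LTR", "", ""): (2, 2),
}


def column_totals(sites):
    """Return (contig total, PacBio-only total, grand total)."""
    contigs = sum(in_contigs for in_contigs, _ in sites.values())
    reads_only = sum(in_reads for _, in_reads in sites.values())
    return contigs, reads_only, contigs + reads_only


print(column_totals(SITES))  # (76, 57, 133)
```

The grand total matches the 133 integration sites reported in the abstract.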
Figure 1. Sequence conservation at sites of KoRV and recKoRV integration into the koala genome. Conservation is graphically represented as a sequence logo with the height of each stack of letters corresponding to conservation at a position, and the height of each letter within a stack the frequency of that letter at the position (measured as information content; see ref. [30]). The logo was derived from a sequence alignment of 51 sites for which a 4 bp target sequence repeat was clearly identified (Supplementary Figure 1). The numbering is with respect to the centre of the 4 bp repeat (boxed).
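To illustrate how stack and letter heights in a sequence logo can be derived from information content, here is a minimal Python sketch. It assumes a gap-free DNA alignment and omits the small-sample correction that logo software commonly applies; the function name `logo_heights` and the toy sequences are hypothetical and are not the 51 integration sites used for the figure.

```python
import math
from collections import Counter


def logo_heights(aligned_seqs, alphabet="ACGT"):
    """For each alignment column, return {base: letter height in bits}.

    Stack height = log2(alphabet size) - Shannon entropy of the column.
    Letter height = base frequency * stack height.
    Assumes every column contains at least one base from `alphabet`.
    """
    heights = []
    for i in range(len(aligned_seqs[0])):
        counts = Counter(seq[i] for seq in aligned_seqs)
        total = sum(counts[b] for b in alphabet)
        freqs = {b: counts[b] / total for b in alphabet}
        entropy = -sum(f * math.log2(f) for f in freqs.values() if f > 0)
        info = math.log2(len(alphabet)) - entropy
        heights.append({b: f * info for b, f in freqs.items()})
    return heights


# Toy alignment (hypothetical): column 0 is fully conserved,
# column 3 has all four bases at equal frequency.
toy = ["ACGT", "ACGA", "ACCC", "ATTG"]
heights = logo_heights(toy)
print(heights[0]["A"])  # 2.0 bits: a perfectly conserved position
```

A fully conserved column reaches the 2-bit maximum for DNA, while a column with all four bases at equal frequency carries no information and has zero height.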
Figure 2. KoRV proviruses identified in the genome of the koala “Bilbo”, their structure, and their relationship to other KoRV types. (A) KoRV-A and KoRV-B were both found in the genome primarily as full-length provirus. In contrast, all KoRV-D and KoRV-E proviruses shared significant deletions of the gag and pol genes that would prevent processing and assembly of the virus. (B) Protein alignment of the envelope gene (env) Variable Region A (VRA) of Gibbon Ape Leukemia Virus (GALV) and KoRV, showing 3 distinct and unrelated receptor binding domains that characterize the various KoRV subtypes. The GALV and KoRV-A receptor binding domains (RBDs) code for an agonist of the Pit-1 cell surface receptor. The KoRV-B RBD codes for an agonist of the THTR1 receptor. The RBD for the group containing KoRV-C, D, E, F, G, H and I is characterized by a related but variable region with repeat motifs, deletions and substitutions. No cell surface receptor has been identified for any of these KoRV subtypes. (C) Analysis of the protein translation of the KoRV env VRA region from the genome of koala “Bilbo” in the context of previously published KoRV types. GALV, KoRV-A and KoRV-B form supported clades. GALV is most closely related to KoRV-A (they both use the Pit-1 receptor). Unfortunately, phylogenetic analysis of data containing unrelated receptor binding domains is unsuitable for resolving the evolutionary relationship of the published KoRV types. Analysis was performed in Geneious Pro 5.4 using Treebuilder (Jukes-Cantor model, NJ tree, 1000 bootstraps, no defined outgroup). (D) Phylogenetic analysis of the protein translation of the KoRV env VRA region with the receptor binding domain removed. This was done to assess the evolutionary relationship of the KoRV provirus types. This analysis indicates that KoRV-A, C, D, E, F, G, H, and I are closely related and that KoRV-B is a more recently evolved virus that is currently undergoing expansion.
The evolutionary history was inferred by using the Maximum Likelihood method based on the Kimura 2-parameter model [1]. The tree with the highest log likelihood (−1731.9672) is shown. The percentage of trees in which the associated taxa clustered together is shown next to the branches; only branches with support > 60% are shown. Initial tree(s) for the heuristic search were obtained automatically by applying Neighbor-Join and BioNJ algorithms to a matrix of pairwise distances estimated using the Maximum Composite Likelihood (MCL) approach, and then selecting the topology with superior log likelihood value. A discrete Gamma distribution was used to model evolutionary rate differences among sites (5 categories (+G, parameter = 0.5017)). The tree is drawn to scale, with branch lengths measured in the number of substitutions per site. The analysis involved 31 nucleotide sequences. Codon positions included were 1st + 2nd + 3rd + Noncoding. There were a total of 377 positions in the final dataset. Analyses were conducted in MEGA7 [2].
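For readers unfamiliar with the Kimura 2-parameter model named in the caption, the Python sketch below computes its closed-form pairwise distance, d = −½ ln(1 − 2P − Q) − ¼ ln(1 − 2Q), where P and Q are the proportions of transition and transversion differences. This is only an illustration of the substitution model. It is not the maximum-likelihood tree search with Gamma-distributed rates that MEGA7 performed for the figure, and the function name `k2p_distance` and the example sequences are hypothetical.

```python
import math

PURINES = {"A", "G"}


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned DNA sequences.

    Positions where either sequence has a non-ACGT character are skipped.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        # Purine<->purine or pyrimidine<->pyrimidine changes are transitions.
        if (a in PURINES) == (b in PURINES):
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    p = transitions / compared
    q = transversions / compared
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


# One transition (A->G) in ten compared sites: P = 0.1, Q = 0.
d = k2p_distance("AAAAACCCCC", "GAAAACCCCC")
```

Distances of this kind are expressed in substitutions per site, the same unit as the branch lengths described in the caption.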
Figure 3. Depth of coverage of KoRV integration sites by PacBio reads. Sites with low coverage (1–2 reads only) are putative somatic insertions and those with 20× to 50× coverage are consistent with haploid coverage of germline insertions. The two sites with highest read coverage appear to be homozygous as they are the only sites whose pre-integration allelic sequence is not present in the Bilbo genome sequence assembly. KoRV type is shown in colour.
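The coverage bands described in this caption can be expressed as a simple rule. The Python sketch below is illustrative only: the thresholds are taken from the caption, but the function name `classify_site`, the labels, and the handling of depths outside the stated bands are assumptions rather than the authors' procedure.

```python
def classify_site(read_depth):
    """Label an integration site from the number of PacBio reads covering it.

    Thresholds follow the caption: 1-2 reads suggests a somatic insertion,
    20x-50x is consistent with haploid coverage of a germline insertion.
    Depths outside those bands are left unclassified, because the caption
    gives no cut-off for them.
    """
    if read_depth < 1:
        raise ValueError("a detected site needs at least one supporting read")
    if read_depth <= 2:
        return "putative somatic"
    if 20 <= read_depth <= 50:
        return "germline (haploid-level coverage)"
    return "unclassified"


for depth in (1, 2, 10, 35):
    print(depth, classify_site(depth))
```

The caption identifies the two apparently homozygous sites by the absence of a pre-integration allele in the assembly, not by a depth threshold, so the sketch does not attempt to call them.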
Figure 4. Structure of KoRV and recKoRV sequences. The gag, pol and env genes are shown as well as the LTRs that flank these genes. The non-KoRV component of recKoRV is not shown. (A) Forms found within the genome assembly, including both the primary and alternate contigs. (B) Forms found only within unassembled PacBio long sequence reads.
Figure 5. Recombination between KoRV and PhER can generate recKoRV. (A) Parental structures. Dotted lines indicate putative breakpoints. PhER has no protein-coding capacity but has a region of low similarity to part of the KoRV env gene (env*), indicating a partially degraded gene (see Supplementary Figure 3). (B) Recombinant structures, which differ only in the composition of their terminal repeats.
Figure 6. (A) Comparison of the U3 LTR enhancer regions identified in koala “Bilbo” with published sequences. All KoRV-B U3 enhancer regions in koala “Bilbo” were novel LTR variants. The position and number of direct repeats DR-1 and DR-2 are indicated along with the regulatory signals, the CAAT and TATAA boxes. DR-3 is a previously undescribed repeat region. The sequence and position of the direct repeats are given with respect to a KoRV-A reference sequence (AF151794). Minor sequence variation within these repeats was observed.