Shinya Ishihara, Masahiko Kumagai, Aisaku Arakawa, Masaaki Taniguchi, Ngo Thi Kim Cuc, Lan Doan Pham, Satoshi Mikawa, Kazuhiro Kikuchi.
Abstract
The Vietnamese native pig (VnP), a porcine breed with a small body, has proven suitable as a biomedical animal model. Here, we demonstrate that, compared to other breeds, VnPs have fewer copies of porcine endogenous retroviruses (PERVs), which pose a risk for xenotransplantation of pig organs to humans. More specifically, we sought to characterize non-reference PERVs (nrPERVs), which have not previously been identified in the reference genome. To this end, we used whole-genome sequencing data to identify nrPERV loci with long terminal repeat (LTR) sequences in VnPs. RetroSeq was used to estimate nrPERV loci based on the most current porcine reference genome (Sscrofa11.1). LTRs were detected by de novo assembly of sequencing reads near loci containing target site duplication (TSD) sequences within the inferred regions. A total of 21 non-reference LTR loci were identified and separated into two subtypes based on phylogenetic analysis. Moreover, PERVs within the detected LTR loci were identified, and their presence was confirmed using conventional PCR and Sanger sequencing. These novel loci represent previously unknown PERVs, as they have not been identified in the porcine reference genome. Thus, our RetroSeq method accurately detects novel PERV loci and can be applied to the development of a useful biomedical model.
Year: 2022 PMID: 35729348 PMCID: PMC9213404 DOI: 10.1038/s41598-022-14654-4
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.996
Figure 1. Conceptual diagram of sequencing reads mapping to the reference pig genome. White boxes represent the pig reference genome sequence. Blue and red boxes connected by lines denote the 5′ and 3′ ends of a paired-end sequencing read. Most paired-end reads mapped as proper pairs, while a small percentage did not. In some cases, one end of the pair mapped correctly while the other end was only partially identified at the expected locus on the reference genome; the unidentified part of the sequence could map anywhere else on the reference genome. For singletons, one end of the pair mapped correctly while the other end did not map to the reference genome. For unmapped read pairs, neither read mapped to the reference genome.
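The read categories described in the Figure 1 caption can be illustrated with the standard SAM flag bits. The following is a minimal sketch, not the classification used by RetroSeq or by the authors: the function name `classify_pair`, the category labels, and the `has_soft_clip` argument are assumptions introduced here for illustration. Only the flag bit values come from the SAM specification.

```python
# SAM flag bits as defined in the SAM format specification.
PAIRED = 0x1          # read is part of a pair
PROPER_PAIR = 0x2     # both mates aligned as the aligner expects
UNMAPPED = 0x4        # this read is unmapped
MATE_UNMAPPED = 0x8   # the mate is unmapped
SUPPLEMENTARY = 0x800 # part of a chimeric (split) alignment


def classify_pair(flag: int, has_soft_clip: bool = False) -> str:
    """Assign a read to one of the Figure 1 categories (labels are illustrative).

    `has_soft_clip` is a hypothetical input: the caller is assumed to have
    inspected the CIGAR string and found that part of the read is unaligned.
    """
    if not flag & PAIRED:
        return "unpaired"

    read_unmapped = bool(flag & UNMAPPED)
    mate_unmapped = bool(flag & MATE_UNMAPPED)

    if read_unmapped and mate_unmapped:
        return "unmapped pair"      # neither end maps to the reference
    if read_unmapped or mate_unmapped:
        return "singleton"          # exactly one end maps
    if flag & SUPPLEMENTARY or has_soft_clip:
        return "split read"         # one end only partially aligned at the locus
    if flag & PROPER_PAIR:
        return "proper pair"
    return "non-proper pair"        # both ends map, but not as a proper pair


if __name__ == "__main__":
    for flag in (99, 73, 77, 65, 2147):
        print(flag, classify_pair(flag))
```

Checking the split-read condition before the proper-pair bit is a design choice of this sketch: an aligner can mark a pair as proper even when one mate is partially clipped, and such reads are the ones that carry junction information.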
Figure 2. Pipeline for the detection of non-reference porcine endogenous retrovirus long terminal repeats (PERV-LTRs) in whole-genome sequencing (WGS) read data. We confirmed the presence of target site duplications (TSDs) at each locus detected by RetroSeq, extracted the supporting reads from the TSD loci, performed local assembly, and analyzed the contigs for the presence of LTR-genome junctions on both sides. The upper panel (1) is a representative view from the Integrative Genomics Viewer (IGV), which was used to determine potential PERV loci.
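The TSD check in the pipeline above rests on a simple property: the short genomic sequence duplicated at integration appears immediately before the 5′ LTR-genome junction and again immediately after the 3′ junction. The sketch below illustrates that idea only. The function `find_tsd`, its arguments, and the 4 to 5 bp length window (chosen to match the TSD lengths listed in the table) are assumptions for illustration and are not the authors' implementation.

```python
from typing import Optional


def find_tsd(upstream_flank: str, downstream_flank: str,
             min_len: int = 4, max_len: int = 5) -> Optional[str]:
    """Return the longest sequence that ends the upstream genomic flank and
    also starts the downstream genomic flank, or None if there is none.

    upstream_flank:   genomic sequence ending at the 5' LTR-genome junction
    downstream_flank: genomic sequence starting at the 3' LTR-genome junction
    The length window is an assumption, not a value taken from the paper.
    """
    up = upstream_flank.upper()
    down = downstream_flank.upper()
    longest = min(max_len, len(up), len(down))
    # Try the longest candidate first so that a 5-bp TSD is preferred
    # over a 4-bp match contained within it.
    for k in range(longest, min_len - 1, -1):
        if up[-k:] == down[:k]:
            return up[-k:]
    return None


if __name__ == "__main__":
    # Toy flanks, not real pig genome sequence.
    print(find_tsd("GATTACAGAAC", "AGAACTTGCA"))
    print(find_tsd("ACGTCTAT", "CTATGGCA"))
    print(find_tsd("AAAAAAAA", "CCCCCCCC"))
```

In practice the two flanks would come from the locally assembled contigs on either side of the insertion, and a locus with no shared junction sequence would not pass the TSD confirmation step.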
Detected target site duplication (TSD) positions and sequences.
| Chromosome | Position | TSD sequence | VnP1 | VnP2 | VnP3 | Gene in the flanking region |
|---|---|---|---|---|---|---|
| SSC 1 | 38,667,241 | CTAT | LTR | LTR | LTR | |
| SSC 1 | 256,173,876 | CCCC | PERV-B | PERV-B | LTR | ENSSSCG00000046664 |
| SSC 1 | 259,647,577 | AATC | LTR | LTR | LTR | N/A |
| SSC 2 | 3,400,723 | AGAAC | PERV-B | LTR | LTR | N/A |
| SSC 4 | 77,324,504 | CCCC | LTR | LTR | LTR | N/A |
| SSC 4 | 78,524,842 | ATTAC | LTR | LTR | LTR | |
| SSC 4 | 121,221,912 | GGGG | LTR | LTR | non-LTR | N/A |
| SSC 6 | 73,460,691 | GTAT | LTR | LTR | LTR | |
| SSC 8 | 137,488,280 | CTAT | LTR | LTR | LTR | |
| SSC 9 | 61,533,579 | GGTG | LTR | LTR | non-LTR | N/A |
| SSC 9 | 76,895,449 | GAAC | PERV-B | PERV-B | PERV-B | N/A |
| SSC 9 | 135,717,008 | AAGAG | LTR | LTR | LTR | N/A |
| SSC 12 | 60,076,460 | CTGCT | PERV-B | PERV-B | PERV-B | LOC110256117 |
| SSC 13 | 57,502,585 | TAAA | LTR | LTR | LTR | N/A |
| SSC 13 | 60,210,737 | GTAG | LTR | LTR | non-LTR | LOC106505659 (c) |
| SSC 13 | 73,434,304 | TTAT | LTR | LTR | non-LTR | N/A |
| SSC 14 | 4,896,607 | AGGGT | LTR | LTR | non-LTR | N/A |
| SSC 14 | 27,599,572 | ATGC | PERV-B | PERV-B | LTR | N/A |
| SSC X | 70,665,683 | ATAT | PERV-B | PERV-B | PERV-B | LOC102165634 |
| SSC X | 75,151,968 | CCAG | PERV-B | PERV-B | PERV-B | |
| SSC X | 119,479,008 | AATT | LTR | LTR | non-LTR | N/A |
| SSC 8 | 51,601,922 | ATGA | PERV-C | PERV-C | PERV-C | LOC106504658 (b) |
| SSC 8 | 137,628,915 | ATGAC | non-LTR | non-LTR | LTR | |
| SSC 13 | 107,045,657 | ATTC | PERV-A | non-LTR | non-LTR | LOC100153543 |
| SSC 14 | 8,846,347 | GAGG | LTR | LTR | non-LTR | N/A |
| SSC 18 | 4,030,456 | ATGT | non-LTR | non-LTR | non-LTR | N/A |
Chromosome number and position are based on the Sscrofa11.1 reference genome. Gene symbols are as follows: NKAIN2, Na+/K+ transporting ATPase interacting 2; ENSSSCG00000046664, lncRNA; SNTG1, syntrophin gamma 1; KAZN, kazrin, periplakin interacting protein; CFAP299, cilia and flagella associated protein 299; ANTXR2, anthrax toxin receptor 2; LOC110256117, mRNA-multidrug and toxin extrusion protein 1-like, transcript variant; LOC100153543, multiple epidermal growth factor-like domains protein 10-like (predicted); PCDH11X, protocadherin 11 X-linked. LOC102165634, LOC106504658, and LOC106505659 are uncharacterized genes. N/A, not applicable.
(a) TSD position is 2.5 kb downstream of the gene.
(b) TSD position is 25 kb downstream of the gene.
(c) TSD position is 16 kb downstream of the gene.
Figure 3. Phylogenetic tree of non-reference long terminal repeats (LTRs). The tree with the highest log likelihood (−2693.75) is shown. A discrete gamma distribution was used to model the differences in evolutionary rate among sites (five categories; +G, parameter = 0.9509). This analysis involved 21 LTR sequences. There were 796 positions in the final dataset, and two main clusters (LTR-A and LTR-B) were obtained.
Figure 4. Structure of the U3 region used to classify non-reference long terminal repeats (LTRs). (a) Porcine endogenous retrovirus (PERV) LTR structure. The PERV-LTRs were classified into types B and A according to the patterns of their 18-bp and 21-bp repeat sequences. Type B LTRs were divided into the subtypes LTR B1, LTR B2, and LTR B3 based on the number of repeats in their sequences. Type A LTRs were divided into the subtypes LTR A1 and LTR A2. (b) Type B repeat sequences are shown in light and dark gray at the top of the figure. Type A repeat sequences are shown in dark gray and stripes at the bottom of the figure. Nucleotides are denoted in green (A), blue (C), purple (G), and red (T). From top to bottom, the labels at left show the LTR loci chr8_137488280, chr9_61533579, chr14_4896607, chr14_27599572, chr1_38667241, chr9_76895449, chrX_70665683, chr1_256173876, chr2_3400723, chr4_78524842, chrX_75151968, chr1_259647570, chr4_121221912, chr13_73434391, chr13_57502585, chrX_119479008, chr9_135717008, chr6_73460686, chr4_77324504, chr12_60076460, and chr13_60210737.