Literature DB >> 31727090

Codon usage patterns of LT-Ag genes in polyomaviruses from different host species.

Myeongji Cho1, Hayeon Kim2, Hyeon S Son3,4.   

Abstract

BACKGROUND: Polyomaviruses (PyVs) have a wide range of hosts, from humans to fish, and their effects on hosts vary. The differences in the infection characteristics of PyV with respect to the host are assumed to be influenced by the biochemical function of the LT-Ag protein, which is related to the cytopathic effect and tumorigenesis mechanism via interaction with the host protein.
METHODS: We carried out a comparative analysis of codon usage patterns of large T-antigens (LT-Ags) of PyVs isolated from various host species and their functional domains and sequence motifs. Parity rule 2 (PR2) and neutrality analysis were applied to evaluate the effects of mutation and selection pressure on codon usage bias. To investigate evolutionary relationships among PyVs, we carried out a phylogenetic analysis, and a correspondence analysis of relative synonymous codon usage (RSCU) values was performed.
RESULTS: Nucleotide composition analysis using LT-Ag gene sequences showed that the GC and GC3 values of avian PyVs were higher than those of mammalian PyVs. The effective number of codon (ENC) analysis showed host-specific ENC distribution characteristics in both the LT-Ag gene and the coding sequences of its domain regions. In the avian and fish PyVs, the codon diversity was significant, whereas the mammalian PyVs tended to exhibit conservative and host-specific evolution of codon usage bias. The results of our PR2 and neutrality analysis revealed mutation bias or highly variable GC contents by showing a narrow GC12 distribution and wide GC3 distribution in all sequences. Furthermore, the calculated RSCU values revealed differences in the codon usage preference of the LT-AG gene according to the host group. A similar tendency was observed in the two functional domains used in the analysis.
CONCLUSIONS: Our study showed that specific domains or sequence motifs of various PyV LT-Ags have evolved so that each virus protein interacts with host cell targets. They have also adapted to thrive in specific host species and cell types. Functional domains of LT-Ag, which are known to interact with host proteins involved in cell proliferation and gene expression regulation, may provide important information, as they are significantly related to the host specificity of PyVs.

Entities:  

Keywords:  Codon usage pattern; Functional domains; LT-Ag; Polyomavirus; RSCU; Sequence motif

Mesh:

Substances:

Year:  2019        PMID: 31727090      PMCID: PMC6854729          DOI: 10.1186/s12985-019-1245-2

Source DB:  PubMed          Journal:  Virol J        ISSN: 1743-422X            Impact factor:   4.099


Background

Polyomaviruses (PyVs) are non-enveloped double-stranded DNA viruses; a total of 86 PyV species have been classified by the International Committee on Taxonomy of Viruses. The classified member species belong to four genera, i.e., Alphapolyomavirus (36), Betapolyomavirus (32), Deltapolyomavirus (4), and Gammapolyomavirus (9), within the family Polyomaviridae (unassigned), while a genus of five species has not yet been classified. Their hosts are diverse, including humans, non-human primates (chimpanzees, gorillas, orangutans, and monkeys), non-primate mammals (bats, mice, racoon, badgers, cows, horses, elephants, alpacas, sea lions, seals, and dolphins), avian species (penguins, geese, and birds), and fish (sharks, perch, and cod) (https://talk.ictvonline.org/ictv-reports/ictv_online_report/dsdna-viruses/w/polyomaviridae). The first PyV discovered was mouse PyV (MPyV), which was isolated from a murine tumor [1, 2] in the mid-1950s. Since then, simian virus 40 (SV40) was discovered in the renal cells of rhesus monkeys in the 1960s [3]. As mostly animal viruses were studied, the viruses seemed to be irrelevant to human diseases. However, two human PyVs, BKPyV and JCPyV, were found [4, 5], and in 2008, MCPyV was identified in human Merkel cell carcinoma tissue [6]. Thus, the various animal and human PyVs reported so far have drawn renewed attention. Most mammalian PyVs do not directly cause severe acute disease in infected hosts. However, an inconspicuous primary infection can persist for a lifetime, and when the host is in an immunosuppressed or immunocompromised state, such infection can lead to multiple diseases, such as progressive multifocal leukoencephalopathy and hemorrhagic cystitis, due to virus reactivation [7, 8]. PyV has a strong species-specific tendency, similar to papillomavirus [9, 10], and is thought to have co-evolved with amniotes. Various studies have been carried out to determine the infection characteristics of PyV. Therefore, it is necessary to understand their evolutionary history and their interaction with their hosts, as well as to interpret their genetic information. Early and late gene RNAs of PyVs encode two and three proteins, respectively. The early gene is translated into 2 T-antigens (large T-antigen (LT-Ag) and small T-antigen), and the late gene is translated into three capsid proteins (VP1, VP2, and VP3) [11]. Among these, LT-Ag is directly related to tumorigenesis. Notably, the LT-Ag protein is known to bind to the p53 and Rb proteins, which are products of two typical tumor suppressor genes [12]. It has also been found to be a major factor determining the biochemical function of SV40 and MCPyV, which cause tumors in rodents and humans [13, 14]. The LT-Ag of PyV has functionally conserved domains, such as the DnaJ domain, LXCXE motif, NLS domain, Helicase domain, and p53 binding domain, that are present in most virus species [13]. Among these, the DnaJ domain, LXCXE motif, and p53 binding domain bind to proteins belonging to the cellular Hsc70 and Rb family and p53 cellular suppressor proteins, respectively, affecting replication and proliferation of the viral genome through DNA binding, ATP-dependent helicase, and ATPase activity. Specifically, when the early gene LT-Ag is continuously expressed, although PyV cannot to replicate its genome in nonpermissive hosts, cell transformation is induced, resulting in tumorigenesis. Each domain is considered to play an important role in this carcinogenesis. PyVs vary in terms of toxicity to hosts, so their effects on hosts differ (Table 1). Variations in the infection characteristics of these viruses (whether they induce tumors due to binding to host proteins) among various hosts indicate the importance of the biochemical function of the LT-Ag protein in relation to host range and tumorigenesis. Therefore, in this study, we performed codon usage pattern, sequence similarity, and phylogenetic analyses using the genetic information of LT-Ag gene coding sequences (CDS) and major domains, to compare genetic characteristics. Based on the results of these analyses, we investigated the differences in the codon usage patterns depending on the taxon and PyV host and identified the relationships between phylogeny and sequence similarity among viruses. The genetic and evolutionary differences among the viruses identified by the comparative analysis offer a basis for explaining variations in their host range and toxicity. Based on these results, it is possible to infer the causes of the functional differences in LT-Ag among various PyVs.
Table 1

Proven and possible diseases associated with PyVs

HostVirus nameSpeciesAbbr.Clinical correlateRef.
HumanMerkel cell polyomavirusHuman polyomavirus 5MCPyVMerkel cell cancer[6]
HumanTrichodysplasia spinulosa-associated polyomavirusHuman polyomavirus 8TSPyVTrichodysplasia spinulosa[15]
HumanBK polyomavirusHuman polyomavirus 1BKPyV

Polyomavirus-associated nephropathy; haemorrhagic

cystitis

[4]
HumanJC polyomavirusHuman polyomavirus 2JCPyVProgressive multifocal leukoencephalopathy (PML)[5]
HumanHuman polyomavirus 6Human polyomavirus 6HPyV6HPyV6 associated pruritic and dyskeratotic dermatosis (H6PD)[16]
HumanHuman polyomavirus 7Human polyomavirus 7HPyV7HPyV7-related epithelial hyperplasia[16]
MonkeySimian virus 40Macaca mulatta polyomavirus 1SV40PML-like disease in Immunocompromised animals[3]
Hamsterhamster polyomavirus

Mesocricetus auratus

polyomavirus 1

HaPyVSkin tumors[17]
Mousemouse pneumotropic virusMus musculus polyomavirus 2MPtVRespiratory disease in suckling mice[18]
Birdbudgerigar fledgling disease virusAves polyomavirus 1BFDVBudgerigar fledgling disease; polyomavirus disease[1921]
FinchFinch polyomavirusPyrrhula pyrrhula polyomavirus 1FPyVPolyomavirus disease[22]
GooseGoose hemorrhagic polyomavirusAnser anser polyomavirus 1GHPVHemorrhagic nephritis and enteritis[23]

References are specified for first description

Proven and possible diseases associated with PyVs Polyomavirus-associated nephropathy; haemorrhagic cystitis Mesocricetus auratus polyomavirus 1 References are specified for first description

Methods

Data acquisition

The virus name, abbreviation, and classification information of 86 species belonging to the family Polyomaviridae were checked (https://talk.ictvonline.org/ictv-reports/ictv_online_report/dsdna-viruses/w/polyomaviridae), and the reference sequences were downloaded from the National Center for Biotechnology Information GenBank® (https://www.ncbi.nlm.nih.gov) (Table 2). The CDS regions of the LT-Ag genes to be analyzed were extracted and classified into the following five groups, according to the host of each virus species: non-primate mammals (Group M); non-human primates (Group P); humans (Group H); avian (Group A); and fish (Group F). Known ORFs were concatenated for total codon analyses of LT-Ag. Accordingly, we performed the analysis using CDS regions in the form of the complement (join, codon start = 1) of LT-Ag from PyV reference sequences. Accession numbers are given in Table 2. To identify the domain regions contained in each LT-Ag gene CDS and extract the corresponding sequences, the amino acid sequence encoding each gene was scanned through PROSITE (https://prosite.expasy.org/), and the ScanProsite results were obtained in addition to ProRule-based predicted intra-domain features. The sequence information of the corresponding region was extracted and used for analysis. PROSITE provides predicted results and related information regarding protein domains, families, and functional sites through ProRule, a collection of rules based on profiles and patterns. Therefore, in this study, the sequence information of 54 DnaJ domains (PROSITE entry: PS50076) and 86 superfamily 3 helicases of DNA virus domains (PROSITE entry: PS51206), along with 86 complete gene sequences, was used for analysis (Table 3). Java programming was performed for LXCXE motif and sequence extraction and processing.
Table 2

Description of sequence data used in this study

No.ICTV TaxonomyNCBI Reference Sequence
Virus nameAbbr.Accession No.Host speciesIsolation sourceCountryYearbpGroup(host)Ref.
1bat polyomavirus 4aBatPyV4aNC_038556.1Artibeus planirostrisspleenFrench Guiana20115187M[24]
2Ateles paniscus polyomavirus 1ApanPyV1NC_019853.1Ateles paniscusNAGermanyNA5273P[25]
3bat polyomavirus 5b1BatPyV5b-1NC_026767.1Pteropus vampyrusspleenIndonesia20125047M[26]
4bat polyomavirus 5aBatPyV5aNC_026768.1Dobsonia moluccensisspleenIndonesia20125075M[26]
5Bornean orang-utan polyomavirusOraPyV-BorNC_013439.1Pongo pygmaeusbloodNANA5168P[27]
6Cardioderma polyomavirusCardiodermaPyVNC_020067.1Cardioderma correctal swabKenya20065372M[28]
7bat polyomavirus 4bBatPyV4bNC_028120.1Carollia perspicillataspleenFrench Guiana20115352M[24]
8chimpanzee polyomavirusChPyVNC_014743.1Pan troglodytes verusbloodNANA5086P[29]
9vervet monkey polyomavirus 1VmPyV1NC_019844.1Chlorocebus pygerythrusspleenZambia20095157P[30]
10vervet monkey polyomavirus 3VmPyV3NC_025898.1Chlorocebus pygerythrusspleenZambia20095055P[30]
11Eidolon polyomavirus 1EidolonPyVNC_020068.1Eidolon helvumrectal swabKenya20095294M[28]
12Gorilla gorilla gorilla polyomavirus 1GgorgPyV1NC_025380.1Gorilla gorilla gorillaNACongo Republic20085300P[31]
13Human polyomavirus 9HPyV9NC_015150.1Homo sapiensNAGermany20095026H[32]
14Human polyomavirus 12HPyV12NC_020890.1Homo sapiensNAGermany20075033H[33]
15Macaca fascicularis polyomavirus 1MfasPyV1NC_019851.1Macaca fascicularisNAGermanyNA5087P[25]
16Merkel cell polyomavirusMCPyVNC_010277.2Homo sapiensskinUSA20095387H[16]
17hamster polyomavirusHaPyVNC_001663.2Mesocricetus auratus strain Z3NAGermany19675372M[34]
18bat polyomavirus 3bBatPyV3bNC_028123.1Molossus molossusspleenFrench Guiana20114903M[24]
19mouse polyomavirusMPyVNC_001515.2Mus musculusNANANA5307MNA
20New Jersey polyomavirusNJPyVNC_024118.1Homo sapiensbicep muscleUSA20135108H[35]
21Otomops polyomavirus 2OtomopsPyVNC_020066.1Otomops martienssenirectal swabKenya20064914M[28]
22Otomops polyomavirus 1OtomopsPyV1NC_020071.1Otomops martienssenirectal swabKenya20065176M[28]
23Pan troglodytes verus polyomavirus 2aPtrovPyV2aNC_025370.1Pan troglodytes verusNACote d’Ivoire20105309P[31]
24Pan troglodytes verus polyomavirus 3PtrovPyV3NC_019855.1Pan troglodytes verusNACote d’IvoireNA5333P[25]
25Pan troglodytes verus polyomavirus 4PtrovPyV4NC_019856.1Pan troglodytes verusNACote d’IvoireNA5349P[25]
26Pan troglodytes verus polyomavirus 5PtrovPyV5NC_019857.1Pan troglodytes verusNACote d’IvoireNA4994P[25]
27Pan troglodytes schweinfurthii polyomavirus 2PtrosPyV2NC_019858.1Pan troglodytes schweinfurthiiNAUgandaNA4970P[25]
28Pan troglodytes verus polyomavirus 1aPtrovPyV1aNC_025368.1Pan troglodytes verusNACote d’Ivoire20095303P[31]
29Piliocolobus badius polyomavirus 2PbadPyV2NC_039051.1Piliocolobus badiusNACote d’Ivoire20055148P[36]
30Piliocolobus rufomitratus polyomavirus 1PrufPyV1NC_019850.1Piliocolobus rufomitratusNACote d’IvoireNA5140P[25]
31raccoon polyomavirusRacPyVNC_023845.1raccoonNAUSA20115016M[37]
32Rattus norvegicus polyomavirus 1RnorPyV1NC_027531.1Rattus norvegicusspleenGermany20055318M[38]
33bat polyomavirus 3a-B0454BatPyV3a-B0454NC_038557.1Sturnira liliumspleenFrench Guiana20115058M[24]
34Sumatran orang-utan polyomavirusOraPyV-SumNC_028127.1Pongo abeliibloodNANA5358P[27]
35Trichodysplasia spinulosa-associated polyomavirusTSPyVNC_014361.1Homo sapiensskinNetherlands20095232H[15]
36yellow baboon polyomavirus 1YbPyV1NC_025894.1Papio cynocephalusspleenZambia20095064P[30]
37African elephant polyomavirus 1AelPyV1NC_022519.1Loxodonta africanaprotruding ulcerated fibromaDenmark20115722M[39]
38BatPyV4aBatPyV2cNC_038558.1Artibeus planirostrisspleenFrench Guiana20115371M[24]
39Myodes glareolus polyomavirus 1BVPyVNC_028117.1Myodes glareolusblood serum and body fluidsGermany20135032M[40]
40bat polyomavirus 6aBatPyV6aNC_026762.1Acerodon celebensisspleenIndonesia20135019M[26]
41bat polyomavirus 6bBatPyV6bNC_026770.1Dobsonia moluccensisspleenIndonesia20125039M[26]
42bat polyomavirus 6cBatPyV6cNC_026769.1Dobsonia moluccensisspleenIndonesia20125046M[26]
43California sea lion polyomavirus 1SLPyVNC_013796.1Zalophus californianustongueUSA20065112M[41]
44Cebus albifrons polyomavirus 1CalbPyV1NC_019854.2Cebus albifronsNAGermanyNA5013P[25]
45Cercopithecus erythrotis polyomavirus 1CeryPyV1NC_025892.1Cercopithecus erythrotisNACameroonNA5189P[25]
46vervet monkey polyomavirus 2VmPyV2NC_025896.1Chlorocebus pygerythruskidneyZambia20095167P[30]
47Microtus arvalis polyomavirus 1CVPyVNC_028119.1Microtus arvalisblood serum and body fluidsGermany20135024M[40]
48bat polyomavirus 2aBatPyV2aNC_028122.1Desmodus rotundusspleenFrench Guiana20115201M[24]
49equine polyomavirusEPyVNC_017982.1Equus caballuseyeUSA20034987M[42]
50BK polyomavirusBKV; BKPyVNC_001538.1Homo sapiensNANANA5153H[43]
51KI polyomavirusKIPyVNC_009238.1Homo sapiensNANANA5040H[44]
52JC polyomavirusJCV; JCPyVNC_001699.1Homo sapiensNANANA5130H[45]
53Weddell seal polyomavirusWsPyVNC_032120.1Leptonychotes weddelliikidneyAntarctica20145186MNA
54simian virus 40SV40NC_001669.1Macaca mulattaNANANA5243P[46]
55Mastomys polyomavirusMasPyVNC_025895.1Mastomys natalensisspleenZambia20094899M[47]
56Meles meles polyomavirus 1MmelPyV1NC_026473.1Meles melessalivary glandFrance20145187M[48]
57Miniopterus polyomavirusMiniopterusPyVNC_020069.1Miniopterus africanusrectal swabKenya20065213M[28]
58mouse pneumotropic virusMPtVNC_001505.2Mus musculusNANANA4754M[49]
59Myotis polyomavirusMyPyVNC_011310.1Myotis lucifugusNACanada20075081M[50]
60Pan troglodytes verus polyomavirus 8PtrovPyV8NC_028635.1Western chimpanzeecolonNetherlands20145163P[51]
61Pteronotus polyomavirusPteronotusPyVNC_020070.1Pteronotus davyioral swabGuatemala20095136M[28]
62bat polyomavirus 2bBatPyV2bNC_028121.1Pteronotus parnelliispleenFrench Guiana20115041M[24]
63rat polyomavirus 2RatPyV2NC_032005.1Rattus norvegicusNAUSA20165108MNA
64Saimiri sciureus polyomavirus 1SsciPyV1NC_038559.1Saimiri sciureusNAGermanyNA5067PNA
65squirrel monkey polyomavirusSquiPyVNC_009951.1Saimiri boliviensisspleenNANA5075P[52]
66alpaca polyomavirusAlPyVNC_034251.1Vicugna pacosNAUSA20145052M[53]
67WU polyomavirusWUPyVNC_009539.1Homo sapiensNAAustraliaNA5229H[54]
68yellow baboon polyomavirus 2YbPyV2AB767295.2Papio cynocephalusspleen and kidneyZambia20095181P[30]
69Human polyomavirus 6HPyV6NC_014406.1Homo sapiensskinUSA20094926H[16]
70Human polyomavirus 7HPyV7NC_014407.1Homo sapiensskinUSA20094952H[16]
71MW polyomavirusMWPyVNC_018102.1Homo sapiensstoolMalawi20084927H[55]
72STL polyomavirusSTLPyVNC_020106.1Homo sapiensfecal specimenMalawiNA4776H[56]
73Adélie penguin polyomavirusADPyVNC_026141.2Pygoscelis adeliaefecal materialAntarctica20124988A[57]
74budgerigar fledgling disease virusBFDVNC_004764.2Falconiformes and Psittaciformes (wild birds)NANANA4981A[58]
75butcherbird polyomavirusButcherbird PyVNC_023008.1Cracticus torquatusperiocular skinAustralia20095084A[59]
76canary polyomavirusCaPyVNC_017085.1Serinus canarialiverNetherlands20075421A[60]
77crow polyomavirusCpyVNC_007922.1Corvus monedulaNANA20055079A[22]
78Erythrura gouldiae polyomavirus 1EgouPyV1NC_039052.1Erythrura gouldiaeliverPoland20145172A[61]
79finch polyomavirusFPyVNC_007923.1Pyrrhula pyrrhula griseiventrisNANA20055278A[22]
80goose hemorrhagic polyomavirusGHPVNC_004800.1gooseNAGermany20015256A[62]
81Hungarian finch polyomavirusHunFPyVNC_039053.1Lonchura majakidney and liverHungary20115284A[63]
82black sea bass-associated polyomavirus 1BassPyV1NC_025790.1Centropristis striataNAUSA20147369F[64]
83bovine polyomavirusBPyVNC_001442.1Bos tauruskidneyNANA4697M[65]
84dolphin polyomavirus 1DPyVNC_025899.1Delphinus delphistracheaUSA20105159M[66]
85giant guitarfish polyomavirusGfPyV1NC_026244.1Rhynchobatus djiddensisskin lesionUSA20143962F[67]
86sharp-spined notothenia polyomavirusSspPyVNC_026944.1Trematomus pennelliiNAAntarctica20136219FNA

No. 1~36: Alphapolyomaviruses; No. 37~68: Betaphapolyomaviruses; No. 69~72: Deltapolyomaviruses; No. 73~81: Gammapolyomaviruses; No. 82~86: Unassigned polyomaviruses; NA Not available

All 86 viruses were classified into 5 groups according to their host as follows: non-primate mammals (Group M); non-human primate (Group P); human (Group H); avian (Group A); fish (Group F)

Table 3

Domains and motifs of PyVs used in this study

No.Abbr.Accession no.DnaJ domainLXCXE motifHelicase domain
StartEndnt lengthStartEnda.a. sequenceStartEndnt length
1BatPyV4aNC_038556.11267168107111LRCDE405564480
2ApanPyV1NC_019853.11277198122126LFCNE441601483
3BatPyV5b-1NC_026767.11274189376536483
4BatPyV5aNC_026768.11267168382546495
5OraPyV-BorNC_013439.11277198122126LFCDE422602543
6CardiodermaPyVNC_020067.11277198212216LYCDE556716483
7BatPyV4bNC_028120.1152156LLCEE458651582
8ChPyVNC_014743.11296255379580606
9VmPyV1NC_019844.11280207107111LHCNE479640486
10VmPyV3NC_025898.11275192131135LFCSE462622483
11EidolonPyVNC_020068.1236240LRCDE588752495
12GgorgPyV1NC_025380.1200204LFCDE554714483
13HPyV9NC_015150.11286225123127LFCSE446606483
14HPyV12NC_020890.1473635489
15MfasPyV1NC_019851.11286225125129LFCTE465665603
16MCPyVNC_010277.2212216LFCDE567727483
17HaPyVNC_001663.2130134LTCQE522682483
18BatPyV3bNC_028123.1107111LYCDE467630492
19MPyVNC_001515.2142146LFCYE549709483
20NJPyVNC_024118.11280207107111LHCDE476636483
21OtomopsPyVNC_020066.11292243107111LYCDE483643483
22OtomopsPyV1NC_020071.1185189LRCDE520680483
23PtrovPyV2aNC_025370.1200204LFCDE556716483
24PtrovPyV3NC_019855.11275192486646483
25PtrovPyV4NC_019856.11275192489646474
26PtrovPyV5NC_019857.11286225123127LFCSE439599483
27PtrosPyV2NC_019858.11285222108112LYCSE432632603
28PtrovPyV1aNC_025368.1203207LYCDE558718483
29PbadPyV2NC_039051.11292243107111LHCNE476637486
30PrufPyV1NC_019850.11293246107111LHCNE476637486
31RacPyVNC_023845.1167171LFCEE504685546
32RnorPyV1NC_027531.1128132LYCSE535698492
33BatPyV3a-B0454NC_038557.1107111LHCHE477637483
34OraPyV-SumNC_028127.11275192489649483
35TSPyVNC_014361.11277198122126LFCHE445605483
36YbPyV1NC_025894.11275192131135LFCSE463663603
37AelPyV1NC_022519.1400564495
38BatPyV2cNC_038558.1223227LLCEE559719483
39BVPyVNC_028117.11267168146150LTCHE383574576
40BatPyV6aNC_026762.18488LFCHE395557489
41BatPyV6bNC_026770.198102LFCHE407570492
42BatPyV6cNC_026769.1100104LFCRE426587486
43SLPyVNC_013796.11277198113117LHCHE397556480
44CalbPyV1NC_019854.2100104LFCNE410570483
45CeryPyV1NC_025892.11275192105109LFCHE402562483
46VmPyV2NC_025896.11275192105109LFCHE402562483
47CVPyVNC_028119.11267168145149LSCNE382573576
48BatPyV2aNC_028122.11280207406565480
49EPyVNC_017982.11286225105109LRCDE402562483
50BKPyVNC_001538.11275192105109LFCHE402562483
51KIPyVNC_009238.1108112LRCNE410572489
52JCPyVNC_001699.11275192105109LFCHE401561483
53WsPyVNC_032120.11277198113117LHCNE400561486
54SV40NC_001669.11275192103107LFCSE400560483
55MasPyVNC_025895.1101105LFCNE414576489
56MmelPyV1NC_026473.11280207111115LRCDE365559585
57MiniopterusPyVNC_020069.11275192103107LHCHE369560576
58MPtVNC_001505.2103107LFCNE418573468
59MyPyVNC_011310.1441603489
60PtrovPyV8NC_028635.11275192105109LFCHE402562483
61PteronotusPyVNC_020070.11280207108112LRCDE405564480
62BatPyV2bNC_028121.11280207108112LRCDE406617636
63RatPyV2NC_032005.11279204178182LHCDE474634483
64SsciPyV1NC_038559.1101105LFCHE410572489
65SquiPyVNC_009951.1101105LFCHE411570480
66AlPyVNC_034251.11267168107111LYCNE407567483
67WUPyVNC_009539.11289234108112LRCNE417579489
68YbPyV2AB767295.21275192105109LFCHE402562483
69HPyV6NC_014406.1109113LYCDE393571537
70HPyV7NC_014407.1109113LYCTE416576483
71MWPyVNC_018102.1105109LSCNE421580480
72STLPyVNC_020106.11283216105109LTCNE406566483
73ADPyVNC_026141.28611626973LYCEE408582525
74BFDVNC_004764.2682231372532483
75Butcherbird PyVNC_023008.18671807074LFCDE410572489
76CaPyVNC_017085.18611626771LSCNE390550483
77CpyVNC_007922.111802106973LQCEE405569495
78EgouPyV1NC_039052.18752047074LYCEE374572597
79FPyVNC_007923.16701956064LFCDE382543486
80GHPVNC_004800.18812226569LFCDE404599588
81HunFPyVNC_039053.16772166064LFCDE382543486
82BassPyV1NC_025790.1105109LMCGE338495474
83BPyVNC_001442.110731929397LHCDE391586588
84DPyVNC_025899.111772018286LYCDE357536540
85GfPyV1NC_026244.1348517510
86SspPyVNC_026944.1372529474

ScanProsite results together with ProRule-based predicted intra-domain features were used for functional domains retained in LT-Ag of PyVs. LXCXE motifs and their encoding sequences were extracted through the JAVA programming

Description of sequence data used in this study No. 1~36: Alphapolyomaviruses; No. 37~68: Betaphapolyomaviruses; No. 69~72: Deltapolyomaviruses; No. 73~81: Gammapolyomaviruses; No. 82~86: Unassigned polyomaviruses; NA Not available All 86 viruses were classified into 5 groups according to their host as follows: non-primate mammals (Group M); non-human primate (Group P); human (Group H); avian (Group A); fish (Group F) Domains and motifs of PyVs used in this study ScanProsite results together with ProRule-based predicted intra-domain features were used for functional domains retained in LT-Ag of PyVs. LXCXE motifs and their encoding sequences were extracted through the JAVA programming

Phylogenetic analysis

Multiple sequence alignments were performed for each sequence using MUSCLE, and the phylogeny was reconstructed using the maximum likelihood (ML) method based on the Tamura-Nei model [68] using MEGA 7.0.26 [69]. Bootstrap analysis [70] was carried out with 1000 replicates of the dataset to determine the robustness of the individual nodes. The reconstructed trees confirmed the phylogenetic relationships for viral sequences of the LT-Ag gene, DnaJ, and helicase from different host species. Based on these results, the 86 viral species were divided into five groups [non-primate mammals (Group M), non-human primates (Group P), humans (Group H), avian (Group A), and fish (Group F)]. For the purpose of this study, virus group information based on the phylogenetic relationships was considered when conducting various analyses and interpreting and discussing the results.

Compositional analysis

The CodonW (https://sourceforge.net/projects/codonw/) and CALcal (http://genomes.urv.es/CAIcal/) programs were used to perform nucleotide composition analysis. Various nucleotide compositional properties were calculated for the sequences corresponding to the CDS of the PyV LT-Ag gene, DnaJ domain, and helicase domain. The frequency of each nucleotide (%A, %C, %T, and %G), GC and AT contents (%GC and %AT), each nucleotide at the third position of synonymous codons (%A3, %C3, %T3, and %G3), G + C (%GC3) and A + T contents (%AT3) at the third codon, and G + C (%GC12) and A + T mean values (%AT12) at the first and second codons were calculated. Genetic variability was analyzed by calculating the nucleotide variability of the LT-Ag genes and two domains in each virus group. The total number of segregating sites, total number of mutations, average number of nucleotide differences between sequences, and nucleotide diversity were estimated using DnaSP v. 5.10.01 [71].

Effective number of codons (ENC) analysis

Analysis of the effective number of codons (ENC) was used to quantify the absolute codon usage bias in the PyV LT-Ag gene CDS, independent of the gene length. ENC values range from 20 to 61; 20 represents the largest codon usage bias, in which only one of the possible synonymous codons is used for the corresponding amino acid; 61 indicates no bias and means that all possible synonymous codons are used equally for the corresponding amino acid. Generally, genes are considered to have significant codon bias when the ENC value is less than 35 [72, 73].

Parity rule 2 (PR2) analysis

Parity rule 2 (PR2) analysis is commonly used to investigate the effects of mutations and selection pressure on codon usage bias in genes. The PR2 plot positions the AT-bias [A3/(A3 + T3)] and GC-bias [G3/(G3 + C3)] at the third codon of four-codon amino acids [fourfold degenerate codon families: Ala (A), Arg (R), Gly (G), Leu (L), Pro (P), Ser (S), Thr (T), and Val (V)] of the entire genome are shown on the vertical axis (y) and horizontal axis (x), respectively. The location of the plot with both coordinates at 0.5 is A = T, G = C (PR2), indicating no bias between the effects of mutation and natural selection (replacement rate). The distance between the coordinate position (0.5, 0.5) and the plot dot, which is the center of the plot, indicates the degree and direction of the PR2 bias [74, 75].

Neutral evolution analysis

Neutrality plots are used to evaluate the relationship between the third codon positions to reflect the role of directional mutation pressure. Consequently, the gradients of the regression lines in the neutrality plot depict the relationship between GC12s and GC3s, elucidating the evolutionary rates of directional mutation pressure–natural selection equilibrium. When the gradient of the regression line is 0 (all plot dots are located on a line parallel to the abscissa), there are no effects from directional mutation pressure. When the gradient is 1 (all plot dots are located on the diagonal), we have complete neutrality. Therefore, the regression lines of the neutrality plot can be used to determine the main factor controlling evolution by measuring the degree of neutrality [76]. DnaSP v. 5.10.01 [71] was used to calculate Tajima’s D [77], Fu and Li’s D*, and F* [78] as tests of neutrality. Tajima’s D statistic measures the departure from neutrality for all mutations in a genomic region [77] and is based on the differences between the number of segregating sites and the average number of nucleotide differences. Fu and Li’s D* test is based on the differences between the number of singletons (mutations appearing only once in the sequence) and the total number of mutations. Fu and Li’s F* test is based on the differences between the number of singletons and the average number of nucleotide differences between every pair of sequences [78, 79].

Relative synonymous codon usage (RSCU) analysis

Relative synonymous codon usage (RSCU), a measure of the preference for the use of a synonymous codon, is defined as the ratio of the observed number of synonymous codons used to the expected value of the codon occurrence frequency [80]. In general, codons with an RSCU value greater than 1.0 are considered to have a higher preference (abundant codons), and those with an RSCU value lower than 1.0 have a lower preference (less-abundant codons). When the RSCU value is equal to 1.0, either the preference for synonymous codons is the same or the codon usage is random [81]. Specifically, a codon with an RSCU value of 1.6 or more is an over-represented codon, and a codon with an RSCU value of 0.6 or less is considered an under-represented codon (≤0.6) [82]. Using the CodonW and CAIcal programs, the RSCU values of the sequences of the 54 DnaJ domains and 86 helicase domains were calculated, along with 86 LT-Ag gene CDS. Comparative analysis and visualization of each group were performed using XLSTAT.

Calculation of the codon adaptation index (CAI)

The codon adaptation index (CAI) is a quantitative measurement ranging from 0 to 1 that predicts gene expression levels based on CDS. The most frequent codons show the highest relative adaptation to the host, and sequences with a higher CAI are preferred over those with a lower CAI [83]. CAI analysis of the LT-Ag gene CDS was carried out using CAIcal [84], and the synonymous codon usage pattern of Homo sapiens, which was downloaded from the Codon Usage Database (CUD) [85], was used as the reference dataset.

Correspondence analysis (COA)

Each group of RSCU values was analyzed using the correspondence analysis (COA) method, and the results were visualized using XLSTAT. Individual data representing the LT-Ag gene coding region were expressed as a vector with 59 dimensions, and we included 59 codons, excluding methionine (ATG) and tryptophan (TGG), without synonymous codons in the analysis.

Selection pressure analysis

The number of non-synonymous substitutions per non-synonymous site (dN), the number of synonymous substitutions per synonymous site (dS), and the dN/dS ratios for the nucleotide sequences of the LT-Ag genes and two domains were estimated for all isolates in each virus group using MEGA 7.0.26 [69]. A gene is under positive (or diversifying) selection when the dN/dS ratio is > 1, neutral selection when dN/dS ratio = 1, and negative (or purifying) selection when the dN/dS ratio < 1.

Results

Sequence similarity and evolutionary relationships among PyVs

Phylogenetic analyses using the LT-Ag gene, DnaJ domain, and helicase domain revealed that, except for two bat viruses, Alphapolyomavirus and Betapolyomavirus were grouped independently, and Gammapolyomavirus formed a separate cluster. Deltapolyomavirus and the unassigned viruses clustered together or were independent in all of the trees. Thus, except for some exceptional cases [bat PyV 2c (BatPyV2c), bat PyV 4a (BatPyV4a), and DPyV] in the ML-based tree, the viruses were generally grouped by genes. When the clustering pattern per host was examined, Groups M, P, and H formed a large cluster. In other trees, except for the DnaJ domain-based tree for which domain information was lacking (Group F was not included in the analysis), Group A (avian viruses) and Group F (fish viruses) were grouped independently (Fig. 1).
Fig. 1

Phylogenetic trees of PyV LT-Ag genes. PyVs were classified according to the host species (mammal, avian, and fish) in the ML-based trees constructed using nucleotide sequences of LT-Ag coding genes, DnaJ domains, and helicase domains (Alphapolyomaviruses []; Betaapolyomaviruses []; Deltapolyomaviruses []; Gammapolyomaviruses []; unassigned [])

Phylogenetic trees of PyV LT-Ag genes. PyVs were classified according to the host species (mammal, avian, and fish) in the ML-based trees constructed using nucleotide sequences of LT-Ag coding genes, DnaJ domains, and helicase domains (Alphapolyomaviruses []; Betaapolyomaviruses []; Deltapolyomaviruses []; Gammapolyomaviruses []; unassigned [])

Compositional properties of LT-Ag genes

To confirm the effect of differences in composition on the codon usage patterns observed in 86 PyV species isolated from different hosts, we analyzed the nucleotide compositions of the complete sequences of the LT-Ag genes, as well as those of the DnaJ domain and helicase domain regions of the LT-Ag protein, in each virus (Table 4). These domains play particularly important roles in the biochemical function of LT-Ag and are relatively well conserved in various PyV species compared to other domain regions. Thus, it is possible to extract more accurate homologous sequences based on the protein sequence pattern and profile information using these domains. Hence, these became the subjects of this analysis. After analyzing the mean composition of each group (%), nucleotide A was the highest in all groups, and C was lowest in all sequences except for the DnaJ domain CDS of Group A (Fig. 2). In the nucleotides observed at the third position of the synonymous codons (A3, T3, G3, and C3), G3 was higher than C3. T3 was higher than A3 in all groups except Group A, H, and P of the DnaJ domain. In all analyzed sequences, the GC and GC3 values were significantly higher in Groups A and F (> 45), and Groups H, M, and P exhibited high AT and AT3 values (> 60). In particular, group H viruses had significantly higher AT3 values (> 70). According to the nucleotide frequency at the third position of the codon, all sequences except the DnaJ domain CDS of avian PyVs belonging to Group A were AT-rich, but at the individual nucleotide level, G and A were dominant over C and T. In previous studies, the GC values for the entire genomes of JCPyV, BKPyV, SV40, budgerigar fledgling disease virus (BFDV), MPyV, goose hemorrhagic PyV (GHPyV), and bovine PyV (BPyV) were 0.41, 0.41, 0.42, 0.5, 0.48, 0.42, and 0.42, respectively, and the GC3 values were 0.3, 0.28, 0.31, 0.45, 0.42, 0.43, and 0.33, respectively [86]. Based on the LT-Ag CDS results for the above viruses, the %GC values of the corresponding virus were 38.12, 35.82, 37.85, 46.44, 46.57, 44.43, and 38.55, respectively, and the %GC3 values were 33.82, 28.16, 34.27, 47.67, 44.06, 44.11, and 33.06, respectively. As in previous studies using whole genome sequences, the GC and GC3 values of the bird PyV in the LT-Ag gene were higher than those of the mammalian PyV.
Table 4

Nucleotide compositions of the LT-Ag genes of 86 polyomaviruses

CAIH: result of comparison with Homo sapiens as reference set

dashed line: avain polyomaviruses, solid line: fish polyomaviruses

Fig. 2

Compositional features of nucleotide sequences of LT-Ag coding genes, DnaJ domains, and helicase domains. a Nucleotide distribution of A, C, U, and G. b Distribution frequency calculated only for the third codon base. c GC and AT content at all codon positions (GC% and AT%) and at the third position (GC3s and AT3s)

Nucleotide compositions of the LT-Ag genes of 86 polyomaviruses CAIH: result of comparison with Homo sapiens as reference set dashed line: avain polyomaviruses, solid line: fish polyomaviruses Compositional features of nucleotide sequences of LT-Ag coding genes, DnaJ domains, and helicase domains. a Nucleotide distribution of A, C, U, and G. b Distribution frequency calculated only for the third codon base. c GC and AT content at all codon positions (GC% and AT%) and at the third position (GC3s and AT3s)

Codon usage patterns in the LT-Ag genes from different hosts

The ENC values were calculated to estimate the magnitude of the codon usage bias in the LT-Ag sequences of the PyVs. A mean value of 45.4 ± 4.9 was confirmed for all LT-Ag gene sequences analyzed. The lowest ENC value was observed in dolphin PyV 1 (DPyV) (34.8), and the highest value was observed in BFDV (58.4). Groups A and F viruses had ENC ranges of 50.8–58.4 and 52.7–58.1, respectively. The mean ENC values of Groups H, M, and P viruses were 42.254, 45.078, and 43.520, respectively, significantly lower than those of Groups A and F (53.311 and 55.700, respectively). Thus, the sequence compositions in the LT-Ag gene according to host species had higher ENC values (> 50) in avian PyV and fish PyV than in mammalian PyV (Groups M, P, and H), implying that the codon diversity was greater in the LT-Ag CDS region of Groups A and F viruses. A similar ENC range pattern was observed in both domains. In the DnaJ domain, Group A viruses had an ENC range of 47.26–61.0. The mean ENC values of Groups H, M, and P viruses were 39.5, 42.0, and 39.3, respectively, significantly lower than the mean ENC value of Group A (53.0). In the helicase domain, Groups A and F viruses had ENC ranges of 44.94–56.81 and 46.53–61.0, respectively. The mean ENC values of Groups H, M, and P viruses were 40.8, 44.3, and 42.0, respectively, which were significantly lower than those of Groups A and F (51.5 and 53.9, respectively). These results indicate that host-specific ENC value distribution characteristics were present in the LT-Ag gene and the CDS of the domain regions contained in the LT-Ag gene. Whereas avian PyV and fish PyV included significant codon diversity, mammalian viruses belonging to Groups M, P, and H exhibited conservative and host-specific evolution of codon usage bias (Table 4, Fig. 3). Genetic variability, which was estimated by measuring the average number of pairwise nucleotide differences (k) and nucleotide diversity (π), was highest for the LT-Ag gene (k = 910.333, π = 0.54939) and helicase domain (k = 210, π = 0.46358) in Group F (Table 5).
Fig. 3

The range of ENC values of the LT-Ag genes and two functional domains. The cross (×) indicates the mean ENC value, and the dot (•) indicates the minimum/maximum ENC value of the LT-Ag genes and two domains within LT-Ag. Each group, which we classified by host, was composed of 9 (Group A), 3 (Group F), 13 (Group H), 36 (Group M), and 25 (Group P) nucleotide sequence data of LT-Ag genes and helicase domains. DnaJ domains were not identified in 32 protein sequences, including 3 fish PyVs; thus, a total of 54 sequence data were used for the analysis

Table 5

Nucleotide diversity, selection pressure, and neutrality tests of the LT-Ag genes and two domains of the PyV groups

Genetic variabilityNeutrality testsSelection pressure
RegionGroupmnSηkπTajima’s DFu and Li’s DFu and Li’s FdN/dS
LT-AgAll869448372129418.2450.44306−0.04390ns1.45113ns0.96702ns2.163
Group A9172512832383737.8890.42776−0.82814ns0.0858ns−0.15345ns0.282
Group F3165712091522910.3330.54939NANANA0.684
Group H13164813362725725.1920.44004−0.80590ns0.16114ns−0.11521ns1.673
Group M36140412052813615.9890.43874−0.35097ns0.89680ns0.54139ns0.523
Group P25160212682653666.1470.41582− 0.20916ns0.88010ns0.62234ns0.318
DnaJ domainAll5416014435271.2040.44503−0.28170ns1.14715ns0.71186ns0.261
Group A916211921468.0830.42027−0.70347ns0.14282ns−0.07065ns0.298
Group H719214623782.1430.42783−0.88626ns−0.18339ns− 0.37879ns0.417
Group M1916213627763.4740.39181−0.83778ns0.31536ns−0.03513ns0.289
Group P1919215329178.5850.4093−0.23632ns0.71490ns0.50101ns0.262
Helicase domainAll86424348827165.8670.39120.02756ns1.22733ns0.85387ns0.316
Group A9471288499159.3610.33835−0.68870ns0.11803ns−0.08782ns0.150
Group F34532853452100.46358NANANA0.379
Group H13477326632174.6670.36618−0.65740ns0.21440ns−0.02451ns0.260
Group M36447346738170.8760.38227−0.15171ns0.86494ns0.60171ns0.503
Group P25471317619161.560.34301−0.05815ns0.97206ns0.75361ns0.142

m, number of sequences used; n, total number of sites (excluding sites with gaps/missing data); S, number of segregating sites; η, total number of mutations; k, average number of pairwise nucleotide differences; π, nucleotide diversity; dS, average number of synonymous substitutions per site; dN, average number of non-synonymous substitutions per site; NA, not available due to limited sequences for analysis of the gene-specific sequence dataset; ns, not significant

The range of ENC values of the LT-Ag genes and two functional domains. The cross (×) indicates the mean ENC value, and the dot (•) indicates the minimum/maximum ENC value of the LT-Ag genes and two domains within LT-Ag. Each group, which we classified by host, was composed of 9 (Group A), 3 (Group F), 13 (Group H), 36 (Group M), and 25 (Group P) nucleotide sequence data of LT-Ag genes and helicase domains. DnaJ domains were not identified in 32 protein sequences, including 3 fish PyVs; thus, a total of 54 sequence data were used for the analysis Nucleotide diversity, selection pressure, and neutrality tests of the LT-Ag genes and two domains of the PyV groups m, number of sequences used; n, total number of sites (excluding sites with gaps/missing data); S, number of segregating sites; η, total number of mutations; k, average number of pairwise nucleotide differences; π, nucleotide diversity; dS, average number of synonymous substitutions per site; dN, average number of non-synonymous substitutions per site; NA, not available due to limited sequences for analysis of the gene-specific sequence dataset; ns, not significant The NC plot showing the relationship between ENC and GC3 revealed that the results from excluding eight DnaJ domains and three helicase domain CDS, while including the entire LT-Ag gene CDS were plotted under the expected ENC curve, suggesting that the codon usage was biased. This pattern was observed overall, regardless of group. However, in the LT-Ag gene sequence analysis, Groups A and F viruses exhibited more diverse codon usage, as they were located closer to the expected ENC curve. However, Groups M, P, and H had relatively more biased codon usage (Fig. 4). This codon usage pattern was consistent with the characteristics of the avian virus, which is known to have a broad host range, as opposed to the mammalian virus, with a narrow host range [7].
Fig. 4

The relationship between ENC and GC3 (NC plot). ENC were plotted against GC content at the third codon position. The expected ENC from GC3 are shown as a solid line

The relationship between ENC and GC3 (NC plot). ENC were plotted against GC content at the third codon position. The expected ENC from GC3 are shown as a solid line PR2 and neutrality analyses were performed to investigate the effects of mutation pressure and natural selection on codon usage patterns of LT-Ag CDS of PyVs. After analyzing the relationship between AT and GC contents, A was used at the third codon position of 65 fourfold degenerate codon families of 86 gene sequences at a frequency higher than or equal to T; in the fourfold degenerate codon families of 45 gene sequences, G was used at a frequency equal to or greater than C. In the DnaJ domain, A was used at the third codon position of 43 fourfold degenerate codon families of 54 gene sequences at a frequency higher than or equal to T, and in the fourfold degenerate codon families of 31 gene sequences, G was used at a frequency greater than or equal to C. In the helicase domain, A was used at the third codon position of 64 fourfold degenerate codon families of 86 sequences at a frequency higher than or equal to T, and in the fourfold degenerate codon families of 63 gene sequences, G was used at a frequency equal to or greater than C. When the distances and directions of all plot dots from the plot coordinate (0.5, 0.5) were examined, there were no significant differences between groups, and various distance distributions and similar directionality (T → A) were detected. Therefore, the bias shown in the PR2 plot results from the difference in the usage frequencies of T and A, which is generally shown in the fourfold degenerate codon families of the sequences encoding the LT-Ag genes of the PyVs and the domains contained therein, rather than differences between the groups. Unequal use of these nucleotides may imply the overlapping effect of natural selection and mutation pressure on codon selection in the corresponding gene sequences (Fig. 5). Negative values of Tajima’s D, Fu and Li’s D*, and Fu and Li’s F* were obtained for the DnaJ domain in Group H, indicating an excess of low-frequency polymorphisms caused by background selection, genetic hitchhiking, or population expansions [79, 87, 88]. The values of Tajima’s D, Fu and Li’s D*, and Fu and Li’s F* for the helicase domain in the overall population were positive, which arose from an excess of intermediate-frequency alleles and can result from population bottlenecks, structure, or balancing selection [87]. However, the P-values for Tajima’s D, Fu and Li’s D*, and Fu and Li’s F* tests were not significant (P > 0.10) in all cases (Table 5), indicating that the results were less convincing; it is also plausible that purifying selection is acting on each of the viral groups. It was impossible to do these statistical tests for the DnaJ domain in Group F, as the analysis using DnaSP software requires at least four sequences [71].
Fig. 5

PR2-bias plot analysis. A3/(A3 + T3) were plotted against G3/(G3 + C3). The A3 content is greater than T3, and the G3 content is greater than C3 in CDS of LT-Ag genes, DnaJ domains, and helicase domains from different host species. These LT-Ag genes and their retained domains prefer to use the T-end and G-end codons

PR2-bias plot analysis. A3/(A3 + T3) were plotted against G3/(G3 + C3). The A3 content is greater than T3, and the G3 content is greater than C3 in CDS of LT-Ag genes, DnaJ domains, and helicase domains from different host species. These LT-Ag genes and their retained domains prefer to use the T-end and G-end codons In terms of the evolution of synonymous codon usage, mutation pressure either increases or decreases the GC content, and the GC content (GC3) at the third codon position expresses the most neutral nucleotides that make an important contribution to directional mutation pressure [76]. Thus, the effect of directional mutation and natural selection on the codon usage pattern of the PyV’s LT-Ag gene CDS isolated from different host species and two functional domains contained in the gene was estimated based on the neutrality plot. Neutrality analysis also confirmed that mutation pressure and natural selection both affected the codon usage bias of the LT-Ag gene CDS. The analyzed genes showed a narrow GC12 distribution and a wide GC3 distribution, indicating a significant correlation (r = 0.715, p < 0.0001). This may indicate high mutation bias or highly variable GC contents in the corresponding genes. When comparing the gradients of the regression lines for each group, Group F had the largest regression slope of 0.5957, followed by Groups P (0.2476), H (0.2298), M (0.2135), and A (0.1654). This indicates that the relative neutrality (directional mutation pressure) of the viruses belonging to each group was 59.57, 24.76, 22.98, 21.35, and 16.54%, respectively. Therefore, the contribution of natural selection to the codon usage pattern of each group was higher in the order of Groups A (83.46%), M (78.65%), H (77.02%), and P (75.24%). Group F was less affected by natural selection than the other groups (40.43%). A comparison of the gradients of the regression lines of all groups based on our neutrality analysis of the helicase domain revealed that the contribution of natural selection to the codon usage pattern of each group was, in descending order, Groups H (89.51%), P (86.92%), M (83.51%), and A (81.87%). Group F was less affected by natural selection than the other groups were (74.58%). In the case of the DnaJ domain, natural selection had a relatively low effect on Group A (58.24%), whereas its effect on other groups (Groups H, M, and P) was 80% or higher. Thus, the effect of the relative neutrality (directional mutation pressure) was found to be large (Fig. 6).
Fig. 6

Neutrality plot of GC12 vs. GC3. GC12 were plotted against GC3. GC12 is the ordinate, and GC3 is the abscissa, so each point in the figure represents one LT-Ag gene from a different host organism. The neutrality plotting results for LT-Ag genes show that the distribution of GC12 is relatively concentrated, GC3 is during 0.171 (Delphinus delphis [short-beaked common dolphin]) to 0.596 (Pygoscelis adeliae [Adélie penguin]). Neutrality plotting results for two functional domains also show that the distribution of GC12 is relatively concentrated, while GC3 is incompactly dispersed in the range of 0.175 (Pongo pygmaeus [Bornean orangutan]) to 0.646 (Pygoscelis adeliae [Adélie penguin]) for DnaJ domains and 0.128 (Delphinus delphis [short-beaked common dolphin]) to 0.606 (Pygoscelis adeliae [Adélie penguin]) for helicase domains

Neutrality plot of GC12 vs. GC3. GC12 were plotted against GC3. GC12 is the ordinate, and GC3 is the abscissa, so each point in the figure represents one LT-Ag gene from a different host organism. The neutrality plotting results for LT-Ag genes show that the distribution of GC12 is relatively concentrated, GC3 is during 0.171 (Delphinus delphis [short-beaked common dolphin]) to 0.596 (Pygoscelis adeliae [Adélie penguin]). Neutrality plotting results for two functional domains also show that the distribution of GC12 is relatively concentrated, while GC3 is incompactly dispersed in the range of 0.175 (Pongo pygmaeus [Bornean orangutan]) to 0.646 (Pygoscelis adeliae [Adélie penguin]) for DnaJ domains and 0.128 (Delphinus delphis [short-beaked common dolphin]) to 0.606 (Pygoscelis adeliae [Adélie penguin]) for helicase domains

Variation in RSCU value and codon usage preference

We calculated the RSCU values reflecting the codon preference in the LT-Ag genes of PyVs and analyzed their distribution pattern by group (Fig. 7) to compare them in terms of their host species (Fig. 8). First, the total mean RSCU values of the LT-Ag gene CDS in 86 species were calculated. The mean RSCU values for TTA (leu), ATT (ile), CCT (pro), GCT (ala), and AGA (arg) were 1.88, 1.62, 1.76, 1.74, and 3.78, respectively. Thus, they were over-represented codons. When the distribution pattern for each group was examined, the differences in codon usage preference among the mammalian viruses belonging to Groups H, M, and P were not significant. The difference between Groups A and F and the three groups of avian and fish viruses was relatively large. When the mean RSCU values of each group were compared, Groups H, M, and P had mean RSCU values of 1.6 or higher in codon TTT (phe), TTA (leu), ATT (ile), and GCT (ala), differing from Groups A and F. Codon AGA (arg) exhibited the largest difference in codon usage preference among the groups, and the mean RSCU value for each group was 1.55 (Froup A), 2.07 (F), 4.40 (H), 3.90 (M), and 4.28 (P). The color distribution according to the group or host species in Fig. 8 confirms such differences. Based on the analysis of each domain, the mean RSCU values of CCT (pro), ACT (thr), AGA (arg), and GGA (gly) were 2.13, 1.64, 3.88, and 1.64, respectively, in terms of the 54 DnaJ domain CDS. Thus, they were over-represented codons. When we compared the mean RSCU values of each group, Groups H, M, and P exhibited values of 1.6 or higher in codon TCT (ser), CCT (pro), and ACT (thr), showing differences from Group A. The total mean RSCU values for 86 helicase domain CDS were 1.66, 2.00, 2.09, 1.95, 1.70, and 4.12 for TTT (phe), TTA (leu), AGT (ser), CCT (pro), GCT (ala), and AGA (arg), respectively, indicating over-represented codons. When the mean RSCU values of the groups were compared, Groups H, M, and P had values greater than 1.6 in codon TTT (phe), TTA (leu), and ACT (thr), differing from Groups A and F. The codons AGT (ser) and CCT (pro) had values greater than 1.6 in all groups except Group F. Similar to the LT-Ag gene CDS, the greatest difference in codon usage preference between the groups was detected in the case of codon AGA (arg) in the two functional domains. The mean RSCU values for each group were 1.6 (Group A), 4.76 (H), 4.47 (M), and 4.05 (P) in the DnaJ domain and 2.13 (Group A), 2.26 (F), 4.76 (H), 4.18 (M), and 4.63 (P) in the helicase domain.
Fig. 7

RSCU analysis of PyVs. There is variation in the differences between the codon preferences of the five groups in terms of the LT-Ag genes. We can see that there are relatively large differences among groups in the RSCU values of specific codons, such as codon AGA(arg) and TTA(leu)

Fig. 8

Difference in RSCU values of 86 PyVs. Respective RSCU of the 86 LT-Ag coding genes, 54 DnaJ domain coding sequences, and 86 helicase coding sequences. All RSCU values are shown in the chromaticity diagram via chromaticity co-ordinates. The chrominance difference enables visual comparison of large data sets with various host species

RSCU analysis of PyVs. There is variation in the differences between the codon preferences of the five groups in terms of the LT-Ag genes. We can see that there are relatively large differences among groups in the RSCU values of specific codons, such as codon AGA(arg) and TTA(leu) Difference in RSCU values of 86 PyVs. Respective RSCU of the 86 LT-Ag coding genes, 54 DnaJ domain coding sequences, and 86 helicase coding sequences. All RSCU values are shown in the chromaticity diagram via chromaticity co-ordinates. The chrominance difference enables visual comparison of large data sets with various host species A preference for a particular codon is a common evolutionary phenomenon, reflecting the evolution of the biological group and carrying important meaning as a tool for explaining basic biological phenomena at the molecular level. RSCU analysis is one of the most important methods for analyzing synonymous codons in various organisms, including viruses. As shown in Fig. 7 and Fig. 8, the RSCU values of 86 LT-Ag genes differed by group and host, and there were differences in preference for codon usage. In Table 6 and Fig. 9, the results of comparing the mean RSCU and codon frequencies between different viral groups with their respective host species are seen more clearly. Notably, the greatest difference in codon usage preference between genes and groups was detected in codon AGA (arg) of all datasets. The CAI was calculated to compare the adaptability of synonymous codon usage. In this study, the CAI value of H. sapiens was used as the reference dataset. The range of the total value was 0.690–0.790, and the mean ± standard deviation was 0.74 ± 0.02. The CAI values did not vary significantly between groups, and PyVs derived from various host species generally had high similarity to the reference data in terms of both codon usage pattern and expression level. Thus, regardless of the host species, they showed relatively high adaptability in human hosts.
Table 6

RSCU distances of the host pairs calculated from the RSCU values for the abundant codons (RSCU ≥1.6) in the LT-Ag genes and two domains of PyVs

RegionHost pairsRSCU distances witin host pairs for abundant codons (RSCU≥1.6)
TTTTTAATTTCTCCTACTGCTAGAAGGAvg.
LT-AgA–F0.0820.165b0.4060.6760.1340.2440.1110.5210.7800.346
A–H0.558a1.593a0.5220.680a0.3010.764a0.5072.855a0.7490.948
A–M0.4351.1690.5120.3340.2570.4710.2042.3550.6450.709
A–P0.4601.3350.572a0.6010.385a0.7070.2792.7310.785a0.873
F–H0.4771.4280.1170.004b0.1670.5200.618a2.3350.0320.633
F–M0.3531.0040.1070.3420.1230.2270.3151.8340.1350.493
F–P0.3781.1700.1660.0740.2510.4630.3892.2100.005b0.567
H–M0.1230.4240.010b0.3460.044b0.2930.3030.5010.1040.239
H–P0.0980.2590.0500.0790.0830.057b0.2280.125b0.0360.113
M–P0.025b0.1660.0600.2670.1270.2360.075b0.3760.1400.164
DnaJA–H1.682a1.0640.224b1.896a0.5421.5800.3953.157a0.1511.188
A–M0.5110.6620.5611.5660.6691.668a0.6512.8740.012b1.019
A–P1.4761.230a0.788a1.6641.304a1.2120.786a2.4470.4491.262
H–M1.1710.4020.3380.3300.127b0.088b0.2560.283b0.1390.348
H–P0.207b0.166b0.5640.2320.7620.3680.3910.7110.600a0.444
M–P0.9650.5680.2260.098b0.6350.4560.135b0.4270.4610.441
HelicaseA–F0.1420.4940.4890.6330.6530.003b0.1680.1280.7160.381
A–H0.4161.739a0.4920.4350.106b0.9460.735a2.628a1.044a0.949
A–M0.3861.0740.647a0.0590.1840.8370.053b2.0520.7370.670
A–P0.3631.3490.5940.0490.4451.0130.5992.5020.9490.873
F–H0.558a1.2450.002b1.069a0.5460.9490.5682.5000.3280.863
F–M0.5280.5800.1580.5740.8360.8400.1151.9240.021b0.620
F–P0.5050.8540.1040.5851.098a1.016a0.4312.3740.2320.800
H–M0.0290.6650.1550.4950.2900.1100.6820.5760.3070.368
H–P0.0530.3900.1020.4840.5520.0660.1360.126b0.0960.223
M–P0.024b0.274b0.0530.011b0.2620.1760.5460.4500.2120.223

A–F avian–fish, A–H avian–human, A–M avian–non-primate mammals, A–P avian–non-human primate, F–H fish–human, F–M fish–non-primate mammals, F–P fish–non-human primate, H–M human–non-primate mammals, H–P human–non-human primate, M–P non-primate mammals–non-human primate; alargest RSCU distances among the host pairs for the corresponding codon; bsmallest RSCU distances among the host pairs for the corresponding codon

Fig. 9

Mean RSCU distances of the host pairs calculated from the RSCU values for the abundant codons (RSCU ≥1.6) in the LT-Ag genes and two domains of PyVs

RSCU distances of the host pairs calculated from the RSCU values for the abundant codons (RSCU ≥1.6) in the LT-Ag genes and two domains of PyVs A–F avian–fish, A–H avian–human, A–M avian–non-primate mammals, A–P avian–non-human primate, F–H fish–human, F–M fish–non-primate mammals, F–P fish–non-human primate, H–M human–non-primate mammals, H–P human–non-human primate, M–P non-primate mammals–non-human primate; alargest RSCU distances among the host pairs for the corresponding codon; bsmallest RSCU distances among the host pairs for the corresponding codon Mean RSCU distances of the host pairs calculated from the RSCU values for the abundant codons (RSCU ≥1.6) in the LT-Ag genes and two domains of PyVs

COA results for RSCU values

We carried out COA using the RSCU value to identify trends associated with differences in codon preference among the gene sequences used in this study. In the COA-RSCUs generated in this study, axis 1 (y) and axis 2 (x) accounted for 74.01 and 14.96% of the total mutations, respectively. Figure 10 shows the COA results for over-represented codons, with RSCU values greater than or equal to 1.6, calculated from 86 LT-Ag gene CDS. Scatter plots B–F show high similarity in terms of the distribution patterns of the plot dots in the range (− 0.2 to + 0.3, − 0.4 to ~ + 0.4) in all groups. Specifically, two dots plotted outside the corresponding range were identified as LT-Ag genes of BFDV and Adélie penguin PyV (ADPyV). Thus, they were presumed to indicate mutations in codon usage patterns. These are all avian PyVs belonging to Group A, and host species are wild birds and Pygoscelis adeliae (Adélie penguin), respectively (Fig. 10). The distances between the genes in the plots shown in Fig. 10 reflects the dissimilarity in the RSCU with respect to axis 1 and axis 2. These results explain a significant portion (74.01%) of the variation in codon usage in 86 LT-Ag genes, so natural selection may have played a very important role.
Fig. 10

Correspondence analysis results for the RSCU values of strongly preferred codons in 86 PyVs (COA-RSCU). The COA results for over-represented codons (RSCU > 1.6) for five groups are shown in scatter plots b-f for groups A, F, H, M, and P, respectively. The plot dot distribution patterns of groups A and F vs. groups H, M, and P were compared (a). Overall, the plotted dots show high similarity in terms of distribution patterns in all groups, with a scattered range (− 0.2 to + 0.3, − 0.4 to + 0.4). Specifically, two dots plotted over the range were identified as LT-Ag genes for BFDV and ADPyV, and thus they can be seen to vary in terms of codon usage patterns. They are all avian polyomaviruses belonging to group A, and host organisms are wild birds and Pygoscelis adeliae (Adélie penguin) (a)

Correspondence analysis results for the RSCU values of strongly preferred codons in 86 PyVs (COA-RSCU). The COA results for over-represented codons (RSCU > 1.6) for five groups are shown in scatter plots b-f for groups A, F, H, M, and P, respectively. The plot dot distribution patterns of groups A and F vs. groups H, M, and P were compared (a). Overall, the plotted dots show high similarity in terms of distribution patterns in all groups, with a scattered range (− 0.2 to + 0.3, − 0.4 to + 0.4). Specifically, two dots plotted over the range were identified as LT-Ag genes for BFDV and ADPyV, and thus they can be seen to vary in terms of codon usage patterns. They are all avian polyomaviruses belonging to group A, and host organisms are wild birds and Pygoscelis adeliae (Adélie penguin) (a)

Selection pressure

The dN/dS ratio was used to estimate the natural selection pressure acting on the LT-Ag gene. The average dN/dS values for the DnaJ and helicase domains in the overall population and in each Group (Groups A, H, M, and P for DnaJ; Groups A, F, H, M, and P for helicase) were less than 1, showing that these two functional regions experience negative selection pressure (Table 5). Similarly, negative selection pressure was estimated for LT-Ag sequence pairs within Groups A, F, M, and P, ranging from 0.282 to 0.684, while the values within the overall population and Group H exceeded 1, which suggests that human PyVs have evolved by positive selection.

Discussion

In this study, we compared the nucleotide sequences encoding all PyV-encoded LT-Ag that have been classified so far and their major domains. Of the various virus species used for analysis, avian PyVs differed significantly from mammalian PyVs in terms of nucleotide composition, ENC value, and codon usage patterns. Avian PyVs are known to cause acute and chronic diseases in various bird species (Table 3). In particular, PyV disease [19-22], which is caused by BFDV and FPyV (finch PyV) infection, and hemorrhagic nephritis and enteritis [23], which is caused by GHPV infection, are inflammatory diseases that cause high mortality in young avians. The high virulence of these avian PyVs contrasts with mammalian PyVs, which generally cause harmless, persistent infection in natural hosts with healthy immune systems. Mammalian PyVs, such as SV40, are known to induce tumors in nonpermissive host rodents after inoculation [89], which is rarely seen in avian PyV-infected birds. In general, the avian PyV’s infectious nature, destroying numerous cells in the infected organism, is considered to cause serious diseases. The cause of significant cell damage by these viruses has not yet been elucidated. However, while avian PyV infection in chicken embryonic fibroblasts causes remarkable cell damage by induction of apoptosis, SV40 infection of Vero cells mainly causes necrosis. Thus, the induction of necrosis by avian PyVs is thought to contribute to virulence through the efficient release of virus progeny and spread across the entire organism [58]. The differences in the virulences of viruses may reflect differences in the biochemical functions of LT-Ag, which were also confirmed by the genetic and evolutionary differences observed in the LT-Ag gene and domains of PyVs isolated from various hosts, based on the sequence analysis performed in this study.

Conclusions

One possible explanation for the presence or absence of specific domains or sequence motifs in the LT-Ag of various PyV species, and thus the mutations and evolutionary differences observed in these functional and structural regions, is that PyVs have evolved so that each viral protein interacts with host cell targets, and they have adapted to thrive in particular host species and cell types. They are known to interact specifically with host proteins involved in cell proliferation and gene expression regulation, have a significant association with the functional domains of LT-Ag, and vary with respect to size and composition in various virus species. Thus, even though various PyV species adopt a common survival strategy, some viral LT-Ags can target new host systems or cell types. Furthermore, the domains of LT-Ag may appear to be widely conserved, but, as indicated by the genetic and evolutionary differences observed in this study, the host function regulation mechanism of LT-Ag varies with the host species. These differences can be used to study virus–host interactions, cellular pathways, mechanisms of tumorigenesis by viral infection, and treatments for new infectious diseases. As new PyVs continue to be found in various organisms, it is necessary to conduct further studies on the mechanisms involved in host-specific toxic manifestations of PyVs, host system regulation, and cell transformation.
  88 in total

1.  DnaSP v5: a software for comprehensive analysis of DNA polymorphism data.

Authors:  P Librado; J Rozas
Journal:  Bioinformatics       Date:  2009-04-03       Impact factor: 6.937

Review 2.  Evolution of the papillomaviridae.

Authors:  Koenraad Van Doorslaer
Journal:  Virology       Date:  2013-06-14       Impact factor: 3.616

3.  Virus-associated skin tumors of the Syrian hamster: preliminary note.

Authors:  A Graffi; T Schramm; I Graffi; D Bierwolf; E Bender
Journal:  J Natl Cancer Inst       Date:  1968-04       Impact factor: 13.506

4.  Identification of a novel polyomavirus in a pancreatic transplant recipient with retinal blindness and vasculitic myopathy.

Authors:  Nischay Mishra; Marcus Pereira; Roy H Rhodes; Ping An; James M Pipas; Komal Jain; Amit Kapoor; Thomas Briese; Phyllis L Faust; W Ian Lipkin
Journal:  J Infect Dis       Date:  2014-05-01       Impact factor: 5.226

5.  Avian polymavirus in wild birds: genome analysis of isolates from Falconiformes and Psittaciformes.

Authors:  R Johne; H Müller
Journal:  Arch Virol       Date:  1998       Impact factor: 2.574

6.  Nucleotide sequence and genome organization of the murine polyomavirus, Kilham strain.

Authors:  M Mayer; K Dörries
Journal:  Virology       Date:  1991-04       Impact factor: 3.616

7.  Complete genome sequence of a polyomavirus isolated from horses.

Authors:  Randall W Renshaw; Annabel G Wise; Roger K Maes; Edward J Dubovi
Journal:  J Virol       Date:  2012-08       Impact factor: 5.103

8.  Identification of MW polyomavirus, a novel polyomavirus in human stool.

Authors:  Erica A Siebrasse; Alejandro Reyes; Efrem S Lim; Guoyan Zhao; Rajhab S Mkakosya; Mark J Manary; Jeffrey I Gordon; David Wang
Journal:  J Virol       Date:  2012-06-27       Impact factor: 5.103

9.  Complete Genome Sequence of a Variant Pyrrhula pyrrhula polyomavirus 1 Strain Isolated from White-Headed Munia (Lonchura maja).

Authors:  Szilvia Marton; Károly Erdélyi; Ádám Dán; Krisztián Bányai; Enikő Fehér
Journal:  Genome Announc       Date:  2016-12-01

10.  Complete Sequence of the Smallest Polyomavirus Genome, Giant Guitarfish (Rhynchobatus djiddensis) Polyomavirus 1.

Authors:  Jennifer A Dill; Terry F F Ng; Alvin C Camus
Journal:  Genome Announc       Date:  2016-05-19
View more
  4 in total

1.  Analysis of protein determinants of host-specific infection properties of polyomaviruses using machine learning.

Authors:  Myeongji Cho; Hayeon Kim; Hyeon S Son
Journal:  Genes Genomics       Date:  2021-03-01       Impact factor: 1.839

2.  Comparative Plastome Analysis of Three Amaryllidaceae Subfamilies: Insights into Variation of Genome Characteristics, Phylogeny, and Adaptive Evolution.

Authors:  Rui-Yu Cheng; Deng-Feng Xie; Xiang-Yi Zhang; Xiao Fu; Xing-Jin He; Song-Dong Zhou
Journal:  Biomed Res Int       Date:  2022-03-24       Impact factor: 3.411

3.  Human genes with relative synonymous codon usage analogous to that of polyomaviruses are involved in the mechanism of polyomavirus nephropathy.

Authors:  Yu Fan; Duan Guo; Shangping Zhao; Qiang Wei; Yi Li; Tao Lin
Journal:  Front Cell Infect Microbiol       Date:  2022-09-08       Impact factor: 6.073

Review 4.  Pathogenicity of Avian Polyomaviruses and Prospect of Vaccine Development.

Authors:  Chen-Wei Wang; Yung-Liang Chen; Simon J T Mao; Tzu-Chieh Lin; Ching-Wen Wu; Duangsuda Thongchan; Chi-Young Wang; Hung-Yi Wu
Journal:  Viruses       Date:  2022-09-19       Impact factor: 5.818

  4 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.