Kazuo Nakamichi, Toshio Shimokawa.
Abstract
JC virus (JCV), in its archetype form, establishes a lifelong latent or persistent infection in many healthy individuals. In immunocompromised patients, prototype JCV, which carries variable mutations in the non-coding control region (NCCR), causes progressive multifocal leukoencephalopathy (PML), a severe demyelinating disease. This study aimed to create a database of NCCR sequences annotated with transcription factor binding sites (TFBSs) and to statistically analyze the mutational pattern of the JCV NCCR. JCV NCCRs were extracted from >1000 sequences registered in GenBank, TFBSs within each NCCR were identified by computer simulation, and their prevalence, multiplicity, and location were examined by statistical analyses. In the NCCRs of prototype JCV, a limited set of TFBSs, mainly located in regions D through F of archetype JCV, was significantly reduced. By contrast, modeling of count data revealed that several TFBSs located in regions C and E tended to overlap (that is, to occur in multiple copies) in prototype NCCRs. According to data from the BioGPS database, the genes encoding the transcription factors that bind these TFBSs are expressed not only in the brain but also at peripheral sites. The database and NCCR patterns obtained in this study could serve as a suitable platform for analyzing JCV mutations and pathogenicity.
Keywords: JC virus; database; mutational pattern; non-coding control region; statistical analysis; transcription factor binding sites
Year: 2021 PMID: 34835120 PMCID: PMC8620444 DOI: 10.3390/v13112314
Source DB: PubMed Journal: Viruses ISSN: 1999-4915 Impact factor: 5.048
Figure 1. Overall workflow of the data processing and analysis of transcription factor binding sites (TFBSs) in the non-coding control region (NCCR) of JC virus (JCV). The nucleotide sequences (seq) of JCV NCCRs in GenBank were extracted and aligned, and their origins were confirmed. TFBSs in the NCCRs of JCV isolates from the urine of healthy individuals and the cerebrospinal fluid (CSF) of PML patients were identified by computer simulation. A database was created from the NCCR sequences and TFBS annotations, and the patterns and metadata of the TFBSs were examined using statistical analysis and public databases.
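The database pairs each NCCR sequence with its TFBS annotations, and the later tables rest on two per-isolate quantities: whether a matrix is present at all (possession) and how many copies it has (multiplicity). A minimal Python sketch of such a record, assuming a hypothetical schema (the class, field, and function names are illustrative and are not the authors' database design), shows how both can be derived from the same hit list:

```python
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical schema for illustration; not the authors' database design.


@dataclass
class TFBSHit:
    matrix: str  # matrix name, e.g. "V$NF1.03"
    strand: str  # "FWD" or "REV", as in the Figure 2 legend
    start: int   # 1-based position on the forward strand of the NCCR


@dataclass
class NCCRRecord:
    accession: str  # hypothetical identifier
    source: str     # "urine" (healthy individual) or "CSF" (PML patient)
    sequence: str
    hits: list[TFBSHit] = field(default_factory=list)

    def matrix_counts(self) -> Counter:
        """Multiplicity: number of hits per (matrix, strand) in this NCCR."""
        return Counter((h.matrix, h.strand) for h in self.hits)


def possession_rate(records, matrix, strand):
    """Percentage of records with at least one hit of `matrix` on `strand`."""
    carriers = sum(1 for r in records if r.matrix_counts()[(matrix, strand)] > 0)
    return 100.0 * carriers / len(records)


def multiplicities(records, matrix, strand):
    """Per-isolate hit counts, the input to a Poisson model of multiplicity."""
    return [r.matrix_counts()[(matrix, strand)] for r in records]


# Toy records with placeholder sequences (invented for illustration).
records = [
    NCCRRecord("iso1", "CSF", "ACGT",
               [TFBSHit("V$NF1.03", "FWD", 40), TFBSHit("V$NF1.03", "FWD", 138)]),
    NCCRRecord("iso2", "CSF", "ACGT", []),
]
print(possession_rate(records, "V$NF1.03", "FWD"))  # 50.0
print(multiplicities(records, "V$NF1.03", "FWD"))   # [2, 0]
```

Keeping raw hits and deriving both summaries from them means a record with two copies of a site counts once toward possession but twice toward multiplicity.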
Figure 2. Representative example showing the types and locations of TFBSs within the JCV NCCR. TFBS matrices in the NCCR sequences of 140 JCV isolates were identified using MatInspector. The data were imported into the CLC Genomics Workbench as FASTA files, and the types and locations of the TFBS matrices in each JCV isolate were annotated on the NCCR sequences. Only the TFBS matrices identified for the CY strain, a representative archetype JCV, are shown. The TFBS matrices on the right (labeled “Forward”) were identified on the strand read 5′ to 3′ across nucleotide positions 1–267, whereas those on the left (labeled “Reverse”) were located on the complementary strand. The horizontal boxes (bottom) indicate the locations and base numbers of regions A through F within the NCCR of the CY strain of JCV.
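To make the Forward/Reverse convention concrete, the following sketch scans an NCCR on both strands and reports reverse-strand hits in forward-strand coordinates. It uses a plain IUPAC consensus match as a stand-in for MatInspector's weight-matrix scoring, which it does not reproduce; the sequence and motif in the usage line are hypothetical.

```python
import re

# Exact IUPAC consensus matching: a stand-in (assumption) for MatInspector's
# matrix similarity scoring, which is not reproduced here.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]",
         "S": "[CG]", "W": "[AT]", "K": "[GT]", "M": "[AC]", "B": "[CGT]",
         "D": "[AGT]", "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def scan_both_strands(seq: str, consensus: str):
    """Return (strand, start, end) hits, 1-based inclusive, in forward coordinates."""
    # The lookahead lets overlapping occurrences be reported.
    pattern = re.compile("(?=(" + "".join(IUPAC[b] for b in consensus.upper()) + "))")
    seq = seq.upper()
    k, n = len(consensus), len(seq)
    hits = [("FWD", m.start() + 1, m.start() + k) for m in pattern.finditer(seq)]
    for m in pattern.finditer(revcomp(seq)):
        # Index i on the reverse complement covers forward positions n-i-k+1 .. n-i.
        start = n - m.start() - k + 1
        hits.append(("REV", start, start + k - 1))
    return sorted(hits, key=lambda h: (h[1], h[0]))


print(scan_both_strands("AAGGCCAA", "GGC"))  # [('FWD', 3, 5), ('REV', 4, 6)]
```

Reporting both strands in one coordinate system is what allows forward and reverse hits to be drawn against the same 1–267 ruler.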
Possession rates of TFBSs within the NCCRs of JCVs in the urine of healthy individuals and CSF of PML patients.
| Matrix Name | DNA Strand a | Urine of Healthy Individuals (n = 49): JCV Isolates with Matrix b | Possession Rate (%) c | CSF of PML Patients (n = 91): JCV Isolates with Matrix b | Possession Rate (%) c | Adjusted P-Value d |
|---|---|---|---|---|---|---|
| V$MIF1.01 | FWD | 47 | 95.9 | 19 | 20.9 | <0.001 |
| V$IRF1.01 | FWD | 47 | 95.9 | 20 | 22.0 | <0.001 |
| V$HOXC10.01 | FWD | 48 | 98.0 | 27 | 29.7 | <0.001 |
| V$PLAGL1.01 | FWD | 48 | 98.0 | 30 | 33.0 | <0.001 |
| V$PDX1.01 | FWD | 48 | 98.0 | 33 | 36.3 | <0.001 |
| V$CRX.01 | FWD | 48 | 98.0 | 34 | 37.4 | <0.001 |
| V$NKX61.01 | FWD | 48 | 98.0 | 34 | 37.4 | <0.001 |
| V$BRN5.03 | FWD | 48 | 98.0 | 36 | 39.6 | <0.001 |
| V$ZBTB3.01 | FWD | 47 | 95.9 | 59 | 64.8 | <0.001 |
| V$FOXJ3.01 | FWD | 48 | 98.0 | 62 | 68.1 | <0.001 |
| V$SPI1.02 | FWD | 49 | 100 | 80 | 87.9 | 0.008 |
| V$IRF7.01 | FWD | 49 | 100 | 81 | 89.0 | 0.015 |
| V$MZF1.02 | FWD | 49 | 100 | 81 | 89.0 | 0.015 |
| V$CMYB.02 | REV | 47 | 95.9 | 23 | 25.3 | <0.001 |
| V$ROAZ.01 | REV | 48 | 98.0 | 28 | 30.8 | <0.001 |
| V$PLAGL1.01 | REV | 48 | 98.0 | 30 | 33.0 | <0.001 |
| V$ZNF232.01 | REV | 48 | 98.0 | 32 | 35.2 | <0.001 |
| V$ZBTB3.01 | REV | 47 | 95.9 | 32 | 35.2 | <0.001 |
| V$HOXC9.01 | REV | 48 | 98.0 | 33 | 36.3 | <0.001 |
| V$MEIS1.03 | REV | 48 | 98.0 | 33 | 36.3 | <0.001 |
| V$SMARCA3.02 | REV | 48 | 98.0 | 55 | 60.4 | <0.001 |
| V$PRDM4.01 | REV | 49 | 100 | 80 | 87.9 | 0.008 |
Abbreviations: CSF, cerebrospinal fluid; FWD, forward; JCV, JC virus; NCCR, non-coding control region; PML, progressive multifocal leukoencephalopathy; REV, reverse; TFBS, transcription factor binding site. a The direction of the DNA strand is defined in the Figure 2 legend. b The number of JCV isolates with the respective TFBS matrix in the NCCR. c The proportion of JCV isolates in each group that possessed the respective matrix. d The possession rates of 52 matrices were compared between the two groups using Fisher’s exact test, and P-values were adjusted for multiple testing using the Benjamini–Hochberg method. Only matrices with statistically significant differences are shown.
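The comparison in footnote d can be sketched in dependency-free Python: a two-sided Fisher's exact test on each 2 × 2 table (carriers and non-carriers in urine versus CSF), followed by Benjamini–Hochberg adjustment across matrices. This is an illustrative sketch, not the authors' code; the adjusted values depend on all 52 matrices tested, so adjusting only the two rows used below would not reproduce the table's P-values.

```python
from math import comb

# Sketch only; not the authors' code.


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x):  # P(top-left cell == x) given the fixed margins
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted P-values, returned in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


n_urine, n_csf = 49, 91  # group sizes implied by the possession rates in the table
rows = {"V$MIF1.01 FWD": (47, 19), "V$SPI1.02 FWD": (49, 80)}  # (urine, CSF) carriers
raw = {name: fisher_exact_two_sided(u, n_urine - u, c, n_csf - c)
       for name, (u, c) in rows.items()}
for name, (u, c) in rows.items():
    print(name, f"{100 * u / n_urine:.1f}%", f"{100 * c / n_csf:.1f}%", raw[name])
```

In practice the raw P-values for all 52 matrices would be passed to `benjamini_hochberg` together before applying a significance threshold.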
Figure 3. Types and locations of TFBSs lost in the rearranged NCCR sequences of prototype JCV. TFBS matrices whose possession rates were significantly decreased in the NCCRs of prototype JCV isolates are shown on the NCCR sequence of the archetype virus (CY strain). The TFBS matrices are illustrated in the same way as in Figure 2.
Profiles of TFBSs that are reduced in the NCCRs of JCVs in the CSF of PML patients.
| Matrix Name | DNA Strand b | Transcription Factor HGNC ID a | Symbol | Full Name | Gene Expression c |
|---|---|---|---|---|---|
| V$MIF1.01 d | FWD | 4921/9982 | HIVEP2/RFX1 | HIVEP zinc finger 2/Regulatory factor X1 | Brain (cerebrum), Blood (T cells)/ |
| V$IRF1.01 | FWD | 6116 | IRF1 | Interferon regulatory factor 1 | Blood (Leukocytes), Bone marrow (CD34+ cells), Colon, Heart, Lung, Lymph node, Placenta, Small intestine, Thymus |
| V$HOXC10.01 | FWD | 5122 | HOXC10 | Homeobox C10 | Kidney |
| V$PLAGL1.01 | FWD | 9046 | PLAGL1 | PLAG1-like zinc finger 1 | Adrenal gland, Bone marrow (CD34+ cells), Colon, Pituitary gland, Placenta, Prostate, Retina, Small intestine, Smooth muscle, Uterus |
| V$PDX1.01 | FWD | 6107 | PDX1 | Pancreatic and duodenal homeobox 1 | NA (Ubiquitous) |
| V$CRX.01 | FWD | 2383 | CRX | Cone-rod homeobox | Pineal gland, Retina |
| V$NKX61.01 | FWD | 7839 | NKX6-1 | NK6 homeobox 1 | NA (Ubiquitous) |
| V$BRN5.03 | FWD | 9224 | POU6F1 | POU class 6 homeobox 1 | NA (Ubiquitous) |
| V$ZBTB3.01 | FWD | 22918 | ZBTB3 | Zinc finger and BTB domain containing 3 | NA (Ubiquitous) |
| V$FOXJ3.01 | FWD | 29178 | FOXJ3 | Forkhead box J3 | NA (Ubiquitous) |
| V$SPI1.02 | FWD | 11241 | SPI1 | Spi-1 proto-oncogene | Blood (Monocytes), Lung |
| V$IRF7.01 | FWD | 6122 | IRF7 | Interferon regulatory factor 7 | Blood (Leukocytes), Bone marrow (CD34+ cells), Heart, Lung, Lymph node, Thymus, Tonsil |
| V$MZF1.02 | FWD | 13108 | MZF1 | Myeloid zinc finger 1 | Blood (leukocytes), Blood vessel (endothelial cells), Bone marrow (CD34+ cells), Pineal gland, Prostate, Thyroid gland |
| V$CMYB.02 | REV | 7545 | MYB | MYB proto-oncogene, transcription factor | Blood vessel (endothelial cells), Bone marrow (CD34+ cells), Thymus |
| V$ROAZ.01 | REV | 16762 | ZNF423 | Zinc finger protein 423 | Brain (whole), Pineal gland, Retina, Small intestine, Uterus |
| V$ZNF232.01 | REV | 13026 | ZNF232 | Zinc finger protein 232 | NA (Ubiquitous) |
| V$HOXC9.01 | REV | 5130 | HOXC9 | Homeobox C9 | NA (Ubiquitous) |
| V$MEIS1.03 | REV | 7000 | MEIS1 | Meis homeobox 1 | Adrenal gland, Bone marrow (CD34+ cells), Brain (cerebellum), Colon, Ovary, Salivary gland, Small intestine, Smooth muscle, Trachea, Uterus |
| V$SMARCA3.02 | REV | 11099 | HLTF | Helicase like transcription factor | Blood (T cells and NK cells), Blood vessel (endothelial cells), Bone marrow (CD34+ cells), Pineal gland, Pituitary gland, Thyroid gland |
| V$PRDM4.01 | REV | 9348 | PRDM4 | PR/SET domain 4 | Blood (B cells), Bone marrow (CD34+ cells), Pineal gland |
Abbreviations: CD, cluster of differentiation; CSF, cerebrospinal fluid; FWD, forward; HGNC, Human Genome Organization (HUGO) Gene Nomenclature Committee; ID, identification; JCV, JC virus; NA, not applicable; NCCR, non-coding control region; PML, progressive multifocal leukoencephalopathy; REV, reverse; TFBS, transcription factor binding site. a Gene information for the transcription factors predicted to bind each sequence was confirmed in the HGNC database according to the metadata of the matrices. b The direction of the DNA strand is defined in the Figure 2 legend. c Gene-expression profiles of transcription factors in human tissues and blood cells were obtained from BioGPS microarray data; sites with expression levels 3-fold higher than the median are listed. d This matrix is defined as the sequence targeted by the HIVEP2–RFX1 complex.
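Footnote c describes a fold-over-median filter on expression profiles. A minimal sketch, assuming that a site is listed when its signal is at least three times the median across all sites (the exact BioGPS processing is not described here, and the expression values below are invented for illustration):

```python
from statistics import median


def enriched_sites(expression, fold=3.0):
    """Sites whose signal is at least `fold` times the median over all sites.

    `expression` maps a tissue or cell type to a microarray signal; using >=
    rather than > at the threshold is an assumption.
    """
    threshold = fold * median(expression.values())
    return sorted(site for site, value in expression.items() if value >= threshold)


def expression_label(expression, fold=3.0):
    """Table-style label; reporting "NA (Ubiquitous)" when nothing passes is an assumption."""
    sites = enriched_sites(expression, fold)
    return ", ".join(sites) if sites else "NA (Ubiquitous)"


# Invented signals for illustration only (not BioGPS data).
toy = {"Brain (cerebrum)": 90.0, "Colon": 10.0, "Lung": 12.0, "Kidney": 11.0, "Thymus": 9.0}
print(expression_label(toy))  # Brain (cerebrum)
```

Because the threshold is relative to each gene's own median, a gene expressed at similar levels everywhere yields no listed sites, which is one plausible reading of the "NA (Ubiquitous)" entries.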
Multiplicity of TFBSs within the NCCRs of JCVs in the urine of healthy individuals and CSF of PML patients.
| Matrix Name | DNA Strand b | Urine of Healthy Individuals: Poisson Mean a | 95% CI | CSF of PML Patients: Poisson Mean a | 95% CI | Adjusted P-Value c |
|---|---|---|---|---|---|---|
| V$HIC1.01 | FWD | 1.04 | [0.75, 1.33] | 1.89 | [1.59, 2.18] | <0.001 |
| V$NF1.03 | FWD | 1.06 | [0.77, 1.35] | 1.93 | [1.63, 2.23] | <0.001 |
| V$NFY.03 | FWD | 2.00 | [1.60, 2.40] | 3.19 | [2.81, 3.56] | <0.001 |
| V$SOX6.01 | FWD | 1.02 | [0.73, 1.31] | 1.66 | [1.39, 1.94] | 0.001 |
| V$LEF1.01 | FWD | 1.02 | [0.73, 1.31] | 1.63 | [1.36, 1.90] | 0.002 |
| V$PAX9.02 | REV | 1.04 | [0.75, 1.34] | 1.89 | [1.59, 2.18] | <0.001 |
| V$PAX6.01 | REV | 1.04 | [0.75, 1.34] | 1.88 | [1.58, 2.17] | <0.001 |
Abbreviations: CI, confidence interval; CSF, cerebrospinal fluid; FWD, forward; JCV, JC virus; NCCR, non-coding control region; PML, progressive multifocal leukoencephalopathy; REV, reverse; TFBS, transcription factor binding site. a The number of copies of each matrix within the NCCR of each JCV isolate (multiplicity) was modeled as count data using the Poisson distribution; the estimated mean per isolate is shown with its 95% CI. b The direction of the DNA strand is defined in the Figure 2 legend. c P-values were adjusted for multiple testing using the Benjamini–Hochberg method; only matrices with statistically significant differences are shown.
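Footnote a states only that multiplicity was modeled with a Poisson distribution. The sketch below assumes a Wald interval for the mean count per isolate and a likelihood-ratio test for a difference between two groups, which is equivalent to testing the group term in a log-link Poisson GLM. The authors' exact model and interval method are not given here, so these intervals need not match the table to the last digit, and the counts in the usage lines are invented.

```python
from math import erfc, log, sqrt

# Assumed methods (Wald CI, likelihood-ratio test); not confirmed as the authors' model.


def poisson_mean_ci(counts, z=1.96):
    """Mean count per isolate with a Wald CI: mean +/- z * sqrt(mean / n)."""
    n = len(counts)
    mean = sum(counts) / n
    half_width = z * sqrt(mean / n)  # under a Poisson model, variance equals the mean
    return mean, (mean - half_width, mean + half_width)


def poisson_lrt(counts_a, counts_b):
    """Likelihood-ratio test of a common Poisson mean for two groups (1 df).

    G = 2 * sum(y_g * log(y_g / E_g)), where y_g is the total count in group g
    and E_g = n_g * pooled mean; the -(y_g - E_g) terms sum to zero here.
    """
    pooled = (sum(counts_a) + sum(counts_b)) / (len(counts_a) + len(counts_b))
    g = 0.0
    for counts in (counts_a, counts_b):
        y, expected = sum(counts), len(counts) * pooled
        if y > 0:  # y * log(y / E) tends to 0 as y tends to 0
            g += 2.0 * y * log(y / expected)
    g = max(g, 0.0)  # guard against tiny negative values from rounding
    return g, erfc(sqrt(g / 2.0))  # upper tail of chi-square with 1 df


urine = [1] * 49  # invented: one copy per isolate
csf = [2] * 91    # invented: two copies per isolate
print(poisson_mean_ci(urine))
print(poisson_lrt(urine, csf))
```

If the counts are overdispersed (variance well above the mean), a quasi-Poisson or negative binomial model would give wider intervals than this sketch.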
Figure 4. Types and locations of TFBSs that tend to overlap in the rearranged NCCR sequences of prototype JCV. TFBS matrices with statistically higher multiplicity in the NCCRs of prototype JCV than in those of archetype JCV were identified by Poisson modeling of count data. The locations of these matrices are depicted on the NCCR sequence of archetype JCV (CY strain).
Profiles of TFBSs likely to multiply in the NCCRs of JCVs in the CSF of PML patients.
| Matrix Name | DNA Strand b | Transcription Factor HGNC ID a | Symbol | Full Name | Gene Expression c |
|---|---|---|---|---|---|
| V$HIC1.01 | FWD | 4909 | HIC1 | HIC ZBTB transcriptional repressor 1 | NA (Ubiquitous) |
| V$NF1.03 | FWD | 7784 | NFIA | Nuclear factor I A | NA (Ubiquitous) |
| V$NF1.03 | FWD | 7785 | NFIB | Nuclear factor I B | Brain (cerebrum, cerebellum, olfactory bulb), Colon, Ovary, Pancreatic islet, Prostate, Retina, Salivary gland, Skin, Small intestine, Smooth muscle, Tongue, Trachea, Uterus |
| V$NF1.03 | FWD | 7786 | NFIC | Nuclear factor I C | Skeletal muscle |
| V$NF1.03 | FWD | 7788 | NFIX | Nuclear factor I X | NA (Ubiquitous) |
| V$NFY.03 | FWD | 7804 | NFYA | Nuclear transcription factor Y subunit alpha | NA (Ubiquitous) |
| V$SOX6.01 | FWD | 16421 | SOX6 | SRY-box transcription factor 6 | NA d |
| V$LEF1.01 | FWD | 6551 | LEF1 | Lymphoid enhancer binding factor 1 | Blood (T cells), Thymus |
| V$PAX9.02 | REV | 8623 | PAX9 | Paired box 9 | NA (Ubiquitous) |
| V$PAX6.01 | REV | 8620 | PAX6 | Paired box 6 | Brain (cerebrum, cerebellum), Pancreatic islet, Pineal gland, Retina, Skeletal muscle |
Abbreviations: CSF, cerebrospinal fluid; FWD, forward; HGNC, Human Genome Organization (HUGO) Gene Nomenclature Committee; ID, identification; JCV, JC virus; NA, not applicable; NCCR, non-coding control region; PML, progressive multifocal leukoencephalopathy; REV, reverse; TFBS, transcription factor binding site. a Gene information for the transcription factors predicted to bind each sequence was confirmed in the HGNC database according to the metadata of the matrices. b The direction of the DNA strand is defined in the Figure 2 legend. c Gene-expression profiles of transcription factors in human tissues and blood cells were obtained from BioGPS microarray data; sites with expression levels 3-fold higher than the median are listed. d The gene-expression profile for SOX6 was not included in the BioGPS dataset.