Miao Xu, Youyuan Yao, Hui Chen, Shanshan Zhang, Su-Mei Cao, Zhe Zhang, Bing Luo, Zhiwei Liu, Zilin Li, Tong Xiang, Guiping He, Qi-Sheng Feng, Li-Zhen Chen, Xiang Guo, Wei-Hua Jia, Ming-Yuan Chen, Xiao Zhang, Shang-Hang Xie, Roujun Peng, Ellen T Chang, Vincent Pedergnana, Lin Feng, Jin-Xin Bei, Rui-Hua Xu, Mu-Sheng Zeng, Weimin Ye, Hans-Olov Adami, Xihong Lin, Weiwei Zhai, Yi-Xin Zeng, Jianjun Liu.
Abstract
Epstein-Barr virus (EBV) infection is ubiquitous worldwide and is associated with multiple cancers, including nasopharyngeal carcinoma (NPC). The importance of EBV viral genomic variation in NPC development and its striking epidemic in southern China has been poorly explored. Through large-scale genome sequencing of 270 EBV isolates and two-stage association study of EBV isolates from China, we identify two non-synonymous EBV variants within BALF2 that are strongly associated with the risk of NPC (odds ratio (OR) = 8.69, P = 9.69 × 10⁻²⁵ for SNP 162476_C; OR = 6.14, P = 2.40 × 10⁻³² for SNP 163364_T). The cumulative effects of these variants contribute to 83% of the overall risk of NPC in southern China. Phylogenetic analysis of the risk variants reveals a unique origin in Asia, followed by clonal expansion in NPC-endemic regions. Our results provide novel insights into the NPC endemic in southern China and also enable the identification of high-risk individuals for NPC prevention.
Year: 2019 PMID: 31209392 PMCID: PMC6610787 DOI: 10.1038/s41588-019-0436-5
Source DB: PubMed Journal: Nat Genet ISSN: 1061-4036 Impact factor: 38.330
Figure 1. Principal component and phylogenetic analyses of EBV genomes.
(a) Principal component analysis of 270 EBV isolates sequenced in the current study and 97 published isolates. The first two principal-component scores (PC1 and PC2) are plotted. Explaining 26.9% of the total genomic variance, PC1 discriminates between East Asian and Western/African strains; PC2 explains 15.3% of the total variance. (b) Phylogeny of 230 EBV single strains sequenced in the current study and 97 published strains. Macacine herpesvirus 4 genome sequence (NC_006146) was used as the outgroup to root the tree. Type 1 and Type 2 EBV lineages are indicated. The red dot on the phylogeny indicates the lineage of the NPC-dominant EBV strains, where 22 of 37 strains from healthy controls in NPC-endemic regions in southern China were located. Dashed lines in (a) and (b) indicate the separation between East Asian and Western/African strains. (c) Geographical origins and phenotypes of samples from which EBV strains were sequenced are shown with colors as indicated. (d) The normalized values of PC1 and PC2 scores are shown from blue to red. (e) The genotypes of SNPs 162215C>A, 162476T>C, and 163364C>T in each isolate.
Figure 2. Genome-wide association analysis of EBV variants in 156 NPC cases and 47 controls.
(a) Manhattan plot of genome-wide P values from the association analysis using a generalized-linear mixed model. The −log₁₀-transformed P values (y axis) of 1545 variants in 156 NPC cases and 47 controls are presented according to their positions in the EBV genome. The minimum P value (SNP 162507C>T) is 9.17×10⁻⁵. The red line marks the suggestive genome-wide significance threshold (P = 4.07×10⁻⁴). The three SNPs reaching genome-wide significance, 162507C>T, 162852G>T and 162215C>A, are labeled in green. (b) Regional plot of the posterior probabilities of association. The EBV genome was partitioned into overlapping 20-variant bins with a 10-variant overlap between adjacent bins. The sum of the posterior probabilities of the variants in each bin was assigned to that region. The single region with strong evidence (> 0.85) for association with NPC risk, from position 160971 to 163629, is shown in green. (c) Schematic of EBV genes. Repetitive regions in EBV genomes are masked in light blue.
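The binning described for panel (b) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the handling of the final shorter bin, and the toy posterior values are all assumptions made for the example.

```python
from typing import List, Sequence, Tuple


def overlapping_bins(n_variants: int, size: int = 20, step: int = 10) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs, end exclusive, for overlapping bins.

    With size=20 and step=10, adjacent bins share 10 variants. The last bin
    is truncated at the end of the variant list (an assumption; the caption
    does not say how the final partial bin was handled).
    """
    bins = []
    start = 0
    while start < n_variants:
        end = min(start + size, n_variants)
        bins.append((start, end))
        if end == n_variants:
            break
        start += step
    return bins


def regional_posteriors(posteriors: Sequence[float], size: int = 20, step: int = 10):
    """Sum per-variant posterior probabilities within each overlapping bin."""
    return [
        (start, end, sum(posteriors[start:end]))
        for start, end in overlapping_bins(len(posteriors), size, step)
    ]


# Hypothetical toy input: 40 variants, one of which carries most of the posterior mass.
toy = [0.0] * 40
toy[25] = 0.9
regions = regional_posteriors(toy)
# Bins whose summed posterior exceeds the 0.85 threshold quoted in the caption.
flagged = [(s, e) for s, e, total in regions if total > 0.85]
```

Because the bins overlap, a single strongly associated variant contributes to two adjacent bins, so `flagged` holds both bins that contain index 25 in the toy input.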
The association of three non-synonymous SNPs in BALF2 gene and the risk for NPC
| SNP | High-risk genotype | Discovery cases (n = 156) | Discovery controls (n = 47) | Discovery P | Validation cases (n = 483) | Validation controls (n = 605) | Validation P | Combined cases (n = 639) | Combined controls (n = 652) | Combined P | Odds ratio | 95% CI | Conditional P (on 163364) | Conditional P (on 162476) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 162215C>A | C | 96.15% | 65.96% | 3.22×10⁻⁴ | 95.03% | 74.71% | 9.92×10⁻¹⁶ | 95.31% | 74.08% | 1.42×10⁻¹⁸ | 7.60 | 4.97–11.62 | 7.78×10⁻⁵ | 1.94×10⁻¹ |
| 162476T>C | C | 93.59% | 61.70% | 5.09×10⁻³ | 94.00% | 65.12% | 1.94×10⁻²³ | 93.90% | 64.88% | 9.69×10⁻²⁵ | 8.69 | 5.79–13.03 | 1.10×10⁻⁶ | - |
| 163364C>T | T | 88.46% | 48.94% | 7.95×10⁻³ | 83.85% | 45.45% | 6.92×10⁻³² | 84.98% | 45.71% | 2.40×10⁻³² | 6.14 | 4.59–8.22 | - | 4.84×10⁻¹¹ |
The association of three EBV SNPs with NPC risk was tested in discovery and validation samples and with a meta-analysis of the combined discovery and validation samples. Frequencies of high-risk genotypes in discovery, validation and combined analyses are indicated. Odds ratios conferred by high-risk genotypes and the 95% confidence intervals (CI) were estimated from the meta-analysis of the combined discovery and validation phases. Conditional regression analyses were performed in combined samples, and P values of SNP associations in conditional analyses are listed.
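As a rough cross-check of the combined odds ratio for SNP 162476T>C, the sketch below pools crude (unadjusted) log odds ratios from the two phases with fixed-effect inverse-variance weights. It rests on several assumptions: the carrier counts are back-calculated from the reported frequencies by rounding, the fixed-effect model is a stand-in because the meta-analysis model is not specified here, and no covariates are included. The result therefore need not equal the reported 8.69.

```python
import math


def log_or_and_var(case_hi: int, case_n: int, ctrl_hi: int, ctrl_n: int):
    """Crude log odds ratio and its Woolf variance from a 2x2 table."""
    a, b = case_hi, case_n - case_hi   # cases with / without the high-risk genotype
    c, d = ctrl_hi, ctrl_n - ctrl_hi   # controls with / without the high-risk genotype
    log_or = math.log((a * d) / (b * c))
    var = 1 / a + 1 / b + 1 / c + 1 / d
    return log_or, var


def fixed_effect_pool(estimates):
    """Inverse-variance weighted mean of log odds ratios and its standard error."""
    weights = [1 / var for _, var in estimates]
    pooled = sum(w * log_or for w, (log_or, _) in zip(weights, estimates)) / sum(weights)
    se = math.sqrt(1 / sum(weights))
    return pooled, se


# Counts back-calculated from the table (assumption):
# 93.59% of 156 -> 146, 61.70% of 47 -> 29, 94.00% of 483 -> 454, 65.12% of 605 -> 394.
discovery = log_or_and_var(146, 156, 29, 47)
validation = log_or_and_var(454, 483, 394, 605)

pooled_log_or, se = fixed_effect_pool([discovery, validation])
pooled_or = math.exp(pooled_log_or)
ci_low = math.exp(pooled_log_or - 1.96 * se)
ci_high = math.exp(pooled_log_or + 1.96 * se)
print(f"crude pooled OR = {pooled_or:.2f} (95% CI {ci_low:.2f}-{ci_high:.2f})")
```

The crude pooled estimate comes out at roughly 8.5, close to the reported value, but the two figures are not directly comparable because this sketch does not reproduce the original analysis.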
EBV haplotypes composed of SNPs 162215C>A, 162476T>C and 163364C>T and the risk for NPC
| EBV subtype (162215–162476–163364) | Cases, no. (n = 639) | Cases, % | Controls, no. (n = 652) | Controls, % | Odds ratio | 95% CI | P |
|---|---|---|---|---|---|---|---|
| L-L-L (A-T-C) | 25 | 3.91% | 171 | 26.23% | reference | - | - |
| H-H-H (C-C-T) | 539 | 84.35% | 293 | 44.94% | 11.71 | 7.44–19.26 | 2.39×10⁻²⁴ |
| H-H-L (C-C-C) | 57 | 8.92% | 118 | 18.10% | 3.50 | 2.02–6.24 | 1.22×10⁻⁵ |
| H-L-L (C-T-C) | 13 | 2.03% | 65 | 9.97% | 1.12 | 0.47–2.50 | 7.83×10⁻¹ |
| other subtypes | 5 | 0.78% | 5 | 0.77% | 4.26 | 0.80–19.63 | 6.71×10⁻² |
Odds ratios of individual EBV subtypes and 95% confidence intervals (CI) were estimated with a logistic model by categorizing each subtype as a single variable and adjusting for age, sex, the status of single- or multiple-infection and human GWAS SNPs (rs2860580 and rs2894207) in the combined discovery and validation data sets. Subjects with EBV subtype A-T-C, a common low-risk subtype, were used as the reference category. H represents the high-risk genotype; L represents the low-risk genotype.
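To show how each subtype is compared against the A-T-C reference category, the following sketch computes crude odds ratios with Woolf-type 95% confidence intervals directly from the case and control counts in the table. It is unadjusted (no age, sex, infection status or host SNP covariates), so the values are expected to differ from the adjusted estimates reported above; the function and variable names are illustrative.

```python
import math

# (cases, controls) for the reference subtype L-L-L (A-T-C), taken from the table.
REFERENCE = (25, 171)

# (cases, controls) for each non-reference subtype, taken from the table.
SUBTYPES = {
    "H-H-H (C-C-T)": (539, 293),
    "H-H-L (C-C-C)": (57, 118),
    "H-L-L (C-T-C)": (13, 65),
    "other subtypes": (5, 5),
}


def crude_or_ci(cases: int, controls: int, ref_cases: int, ref_controls: int, z: float = 1.96):
    """Crude odds ratio versus the reference category, with a Woolf (log-scale) CI."""
    odds_ratio = (cases * ref_controls) / (controls * ref_cases)
    se = math.sqrt(1 / cases + 1 / controls + 1 / ref_cases + 1 / ref_controls)
    low = math.exp(math.log(odds_ratio) - z * se)
    high = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, low, high


crude = {name: crude_or_ci(n_case, n_ctrl, *REFERENCE) for name, (n_case, n_ctrl) in SUBTYPES.items()}
for name, (odds_ratio, low, high) in crude.items():
    print(f"{name}: crude OR = {odds_ratio:.2f} (95% CI {low:.2f}-{high:.2f})")
```

The crude ordering of the subtypes with all three or two high-risk genotypes (H-H-H above H-H-L above H-L-L) matches the adjusted table, which is the pattern the haplotype analysis is meant to convey.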