Jiaying Deng1,2, Hu Chen3,4, Daizhan Zhou5,6, Junhua Zhang1,2, Yun Chen1,2, Qi Liu1,2, Dashan Ai1,2, Hanting Zhu1,2, Li Chu1,2, Wenjia Ren1,2, Xiaofei Zhang1,2, Yi Xia1,2, Menghong Sun2,7, Huiwen Zhang8, Jun Li4, Xinxin Peng4, Liang Li9, Leng Han8, Hui Lin5,6, Xiujun Cai5,6, Jiaqing Xiang10, Shufeng Chen10, Yihua Sun10, Yawei Zhang10, Jie Zhang10, Haiquan Chen10, Shijian Zhang11, Yi Zhao11, Yun Liu12, Han Liang13,14,15, Kuaile Zhao16,17.
Abstract
Esophageal squamous cell carcinoma (ESCC) is a major histological type of esophageal cancer, with incidence and survival patterns that differ among races. Although previous studies have characterized somatic mutations in this disease, a rigorous comparison between different patient populations has not been conducted. Here we sequence samples from 316 Chinese patients, combine them with data from The Cancer Genome Atlas (TCGA), and perform a comparative analysis between Asian and Caucasian patients. We find that CSMD3 mutation is associated with better prognosis in Asian patients. Applying a robust computational strategy that adjusts for both technical and biological confounding factors, we find that TP53, EP300, and NFE2L2 show higher mutational frequencies in Asian patients. Moreover, NFE2L2 mutations correlate with the allele status of a nearby high-Fst SNP, suggesting a potential interaction between the two. Our study provides insights into the molecular basis underlying the striking racial disparities in this disease and provides a general computational framework for such cross-population comparisons.
Year: 2017 PMID: 29142225 PMCID: PMC5688099 DOI: 10.1038/s41467-017-01730-x
Source DB: PubMed Journal: Nat Commun ISSN: 2041-1723 Impact factor: 14.919
Fig. 1 Schematic representation of the analytic strategy. a ESCC whole-exome sequencing (WES) data were obtained for three patient cohorts: Caucasian and Vietnamese cohorts from TCGA, and a Chinese cohort from this study. Our strategy includes two major steps to remove confounders. First, to remove technical confounders, we processed the sequencing reads generated on the HiSeq platform with the same procedure, downsampled the data to balance depth of coverage among the three cohorts, and then called somatic single-nucleotide mutations with a stringent method that combines multiple mutation callers. Second, to remove biological confounders, we calculated propensity scores, reweighted the samples in each cohort, and compared gene mutation frequencies between the balanced Asian and Caucasian cohorts. We considered five biological factors (age at diagnosis, gender, tumor stage, smoking history, and alcohol consumption history) in the propensity score adjustment. b Hierarchical clustering of patient samples by common SNP status in exonic regions. Asian and Caucasian patients form two distinct clusters.
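The propensity score reweighting in panel a can be illustrated with a minimal sketch. It assumes the five factors have already been discretized into strata (any hashable key, for example a tuple of age bin, gender, stage, smoking and alcohol status) and estimates each patient's propensity score as the share of Asian patients in that stratum, which is what a saturated logistic model gives for discrete covariates. The weights are standard inverse probability weights. The function names, tuple layout and toy data are hypothetical, and the caption does not specify the exact propensity model or weighting scheme the authors used.

```python
from collections import defaultdict

def stratum_propensity(patients):
    """P(Asian | stratum), estimated as the share of Asian patients in each stratum.

    `patients` is a list of (is_asian, stratum, is_mutated) tuples (hypothetical layout).
    """
    counts = defaultdict(lambda: [0, 0])  # stratum -> [n_asian, n_total]
    for is_asian, stratum, _ in patients:
        counts[stratum][0] += int(is_asian)
        counts[stratum][1] += 1
    return {s: n_asian / n for s, (n_asian, n) in counts.items()}

def weighted_mutation_frequencies(patients):
    """Inverse-probability-weighted mutation frequency of one gene in each group."""
    e = stratum_propensity(patients)
    totals = {"Asian": [0.0, 0.0], "Caucasian": [0.0, 0.0]}  # group -> [mutated weight, total weight]
    for is_asian, stratum, is_mutated in patients:
        ps = e[stratum]
        if ps in (0.0, 1.0):
            continue  # stratum holds only one group: no overlap, so it cannot be balanced
        w = 1.0 / ps if is_asian else 1.0 / (1.0 - ps)
        group = "Asian" if is_asian else "Caucasian"
        totals[group][0] += w * int(is_mutated)
        totals[group][1] += w
    return {g: (m / t if t else float("nan")) for g, (m, t) in totals.items()}

# Toy data: mutation depends only on smoking, but smokers are over-represented among Asian patients.
def cohort(is_asian, stratum, n_mut, n_wt):
    return [(is_asian, stratum, True)] * n_mut + [(is_asian, stratum, False)] * n_wt

patients = (cohort(True, "smoker", 4, 4) + cohort(True, "never", 0, 2)
            + cohort(False, "smoker", 1, 1) + cohort(False, "never", 0, 8))
print(weighted_mutation_frequencies(patients))
```

With this toy data the crude frequencies are 0.4 (Asian) and 0.1 (Caucasian) only because smokers are over-represented among the Asian patients; after weighting, both groups come out at 0.25 up to floating-point rounding.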
Characteristics of patient cohorts surveyed in this study
| Clinical factor | Caucasian (TCGA, WES) | Vietnamese (TCGA, WES) | Chinese (this study, WES) | Chinese (this study, additional targeted sequencing) |
|---|---|---|---|---|
| **Gender** | | | | |
| Male | 29 | 39 | 65 | 215 |
| Female | 10 | 2 | 13 | 23 |
| **Age at diagnosis (years)** | | | | |
| 30–40 | 0 | 2 | 0 | 1 |
| 40–50 | 6 | 11 | 11 | 27 |
| 50–60 | 19 | 15 | 35 | 106 |
| 60–70 | 9 | 9 | 27 | 100 |
| 70–80 | 4 | 4 | 5 | 4 |
| 80–90 | 4 | 0 | 0 | 0 |
| **Tumor stage** | | | | |
| I | 6 | 0 | 0 | 12 |
| II | 17 | 31 | 63 | 118 |
| III | 14 | 9 | 15 | 108 |
| IV | 2 | 1 | 0 | 0 |
| **Smoking history** | | | | |
| Smoked/smoking | 25 | 22 | 46 | 160 |
| Never | 11 | 19 | 32 | 78 |
| Unknown | 3 | 0 | 0 | 0 |
| **Alcohol consumption history** | | | | |
| Yes | 25 | 30 | 36 | 105 |
| No | 12 | 11 | 42 | 133 |
| Unknown | 2 | 0 | 0 | 0 |
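The chi-squared test mentioned for Fig. 4a can be illustrated on the gender rows of the table above for the three WES cohorts. This is a minimal sketch of Pearson's test of independence using only the standard library. It relies on the fact that for 2 degrees of freedom (a 3 × 2 table) the chi-squared upper-tail probability is exactly exp(-x/2), so it is not a general-purpose implementation, and it is not claimed to reproduce the P-values shown in the figure.

```python
import math

# Male/female counts for the three WES cohorts, copied from the table.
GENDER_COUNTS = [
    (29, 10),  # Caucasian (TCGA)
    (39, 2),   # Vietnamese (TCGA)
    (65, 13),  # Chinese (this study, WES)
]

def pearson_chi_squared(table):
    """Pearson chi-squared statistic and degrees of freedom for an r x c contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return stat, dof

def p_value_dof2(stat):
    """Upper-tail chi-squared probability; this closed form holds only for 2 degrees of freedom."""
    return math.exp(-stat / 2)

stat, dof = pearson_chi_squared(GENDER_COUNTS)
if dof != 2:
    raise ValueError("the closed-form P-value only holds for 2 degrees of freedom")
print(f"chi2 = {stat:.2f}, dof = {dof}, P = {p_value_dof2(stat):.3f}")
```

For these counts the statistic is about 6.55 with 2 degrees of freedom (P ≈ 0.04); all expected counts are above 5, so the chi-squared approximation is reasonable here.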
Fig. 2 Significantly mutated genes in ESCC. a Significantly mutated genes (SMGs) identified by MutSigCV in a combined cohort of Caucasian, Vietnamese, and Chinese WES samples. Each column denotes an ESCC patient, and each row is a gene. The top panel shows the number of somatic mutations per sample, the left panel shows the mutation frequency of each SMG, and the bar plot on the right shows the composition of mutation types in each gene. Genes are ordered by mutation frequency. b Overlap of SMGs reported by five studies. c, d Kaplan–Meier curves according to CSMD3 mutational status in c 78 Chinese WES cases and d 354 other Asian cases (41 Vietnamese WES cases and 313 Chinese targeted sequencing cases). Mutated groups show significantly better overall survival (log-rank test).
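The log-rank comparison in panels c and d can be sketched as the standard two-group (Mantel) log-rank test, written here with the standard library only. The function name, input layout (follow-up times plus 1 for an observed death and 0 for censoring) and the toy data are hypothetical, and the caption does not say which software the authors used. With one degree of freedom the chi-squared tail probability equals erfc(√(x/2)), which avoids a statistics dependency.

```python
import math

def log_rank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test.

    times_*: follow-up times; events_*: 1 if death was observed, 0 if censored.
    Returns (chi-squared statistic with 1 degree of freedom, two-sided P-value).
    """
    data = [(t, e, "a") for t, e in zip(times_a, events_a)]
    data += [(t, e, "b") for t, e in zip(times_b, events_b)]
    observed_minus_expected = 0.0
    variance = 0.0
    for t in sorted({t for t, e, _ in data if e}):  # distinct times with at least one death
        at_risk = [g for tt, _, g in data if tt >= t]
        n, n_a = len(at_risk), at_risk.count("a")
        deaths = [g for tt, e, g in data if tt == t and e]
        d, d_a = len(deaths), deaths.count("a")
        observed_minus_expected += d_a - d * n_a / n  # observed minus expected deaths in group a
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    stat = observed_minus_expected ** 2 / variance
    return stat, math.erfc(math.sqrt(stat / 2))

# Hypothetical follow-up (months) for mutated vs wild-type patients.
stat, p = log_rank_test([30, 42, 50, 60, 60], [1, 0, 0, 1, 0],
                        [5, 8, 12, 20, 25, 33], [1, 1, 1, 0, 1, 1])
print(f"chi2 = {stat:.2f}, P = {p:.3g}")
```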
Fig. 3 Comparison of mutational patterns of different ESCC patient populations. a Distribution of depth of coverage of exons captured in WES. Before downsampling, Chinese tumors and matched normal tissues had higher depth of coverage than the TCGA ESCC cases; after downsampling, the three patient cohorts had similar depth distributions. b Number of somatic mutations in each patient cohort before and after downsampling. Error bars and the numbers above the bars were calculated from mutation calls in 10 downsampling iterations. c Box plots of the number of somatic mutations per sample in the three cohorts before and after downsampling. Center lines are the medians for each patient cohort, and the lower and upper hinges are the 25th and 75th percentiles. Whiskers extend up to 1.5 times the interquartile range beyond the hinges; points outside this range are plotted individually. d Mutational signature of each cohort after downsampling.
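The downsampling in panels a and b can be illustrated with binomial thinning, in which every read is kept independently with probability (target depth / observed depth), so the expected depth matches the target. This sketch works on an in-memory list of reads at a single site and repeats the draw 10 times, mirroring the 10 iterations in the caption; the read representation, target depth and function names are hypothetical. Real pipelines subsample aligned BAM files instead (for example with `samtools view -s`), and the caption does not say which tool or target depth the authors used.

```python
import random
from statistics import mean

def thinning_fraction(observed_depth, target_depth):
    """Probability of keeping each read; capped at 1, so low-depth samples are left as-is."""
    return min(1.0, target_depth / observed_depth)

def downsample(reads, fraction, rng):
    """Binomial thinning: keep each read independently with probability `fraction`."""
    return [read for read in reads if rng.random() < fraction]

def repeated_downsampling(reads, target_depth, iterations=10, seed=0):
    """Run several independent downsampling draws with a reproducible random generator."""
    rng = random.Random(seed)
    fraction = thinning_fraction(len(reads), target_depth)
    return [downsample(reads, fraction, rng) for _ in range(iterations)]

# Hypothetical site covered by 200 reads, 20 of which support the variant allele.
reads = ["ALT"] * 20 + ["REF"] * 180
draws = repeated_downsampling(reads, target_depth=60)
print("mean depth:", mean(len(d) for d in draws))
print("mean ALT reads:", mean(d.count("ALT") for d in draws))
```

Because the generator is seeded, the draws are reproducible; summarizing mutation calls over several draws reduces the variance that any single random draw would introduce.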
Fig. 4 Race-biased genes identified by the propensity score algorithm. a Distribution of five biological factors (age at diagnosis, gender, tumor stage, smoking history, and alcohol consumption history) in the three patient groups, showing that these confounders need to be balanced. P-values were calculated with analysis of variance (ANOVA) for age at diagnosis and with the chi-squared test for the other factors. b Mutational landscape of six race-biased genes. Genes are ordered by mutation frequency, and samples are grouped by race. c Mutation frequencies of EP300, NFE2L2, and TP53 in three groups of Chinese patients: those with WES, those with targeted sequencing and matched WES, and those with additional targeted sequencing data. d Mutual exclusivity of TP53, NFE2L2, and EP300. The top panel is based on WES data from the combined Chinese and Vietnamese patients; the bottom panel is based on targeted sequencing data from 313 Chinese patients. P-values were calculated with the CoMEt algorithm.
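Panel d uses CoMEt, which is not reproduced here. As a simpler pairwise stand-in, a one-sided Fisher's exact test asks how likely it is to observe this few co-mutated samples given each gene's mutation count; the sketch below computes it from the hypergeometric distribution with the standard library. The function name and the mutation vectors are hypothetical and do not correspond to the study's data.

```python
from math import comb

def exclusivity_p_value(mut_a, mut_b):
    """One-sided Fisher's exact test for mutual exclusivity of two genes.

    mut_a, mut_b: 0/1 mutation status per sample (same sample order).
    Returns P(co-mutated samples <= observed) with both genes' mutation counts held fixed.
    """
    n = len(mut_a)
    n_a, n_b = sum(mut_a), sum(mut_b)
    both = sum(1 for a, b in zip(mut_a, mut_b) if a and b)
    # Hypergeometric: of the n_b samples mutated in gene b, k are also mutated in gene a.
    tail = sum(comb(n_a, k) * comb(n - n_a, n_b - k) for k in range(both + 1))
    return tail / comb(n, n_b)

# Hypothetical cohort of 12 samples in which the two genes are never mutated together.
gene_x = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
gene_y = [0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0]
print(exclusivity_p_value(gene_x, gene_y))
```

Here the P-value is 35/220 ≈ 0.16: even complete exclusivity is weak evidence in 12 samples, which is one reason larger cohorts and methods that evaluate several genes jointly, such as CoMEt, are used.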
Fig. 5 Correlation of NFE2L2 mutations with a nearby high-Fst SNP in Asian patients. a Distribution of mutations along NFE2L2 across all WES samples, showing two major mutational hotspots. b Fst values of SNPs in the exonic regions of NFE2L2; rs113671272, located in the 5′UTR, shows the highest Fst in the comparison between Southern Han Chinese and Europeans. c Gene regulation tracks from the UCSC Genome Browser show that rs113671272 lies within a region with a high density of regulatory binding sites and high conservation scores. d Effect of SNP rs113671272 on NFE2L2 mRNA expression across 12 TCGA cancer types. After tumor samples with somatic NFE2L2 mutations were excluded, cancer types with at least three samples in the SNP-containing group were included in the analysis. Expression levels of the two sample groups (with or without the SNP) across cancer types were compared with the paired Wilcoxon signed-rank test. e Mutual exclusivity of the rs113671272 SNP and NFE2L2 somatic mutation status in Asian WES samples with sufficient coverage. The P-value was calculated with the CoMEt algorithm.
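The Fst index in panel b measures how much allele frequencies differ between populations. A minimal sketch for a single biallelic SNP uses the Wright/Nei definition F_ST = (H_T - H_S) / H_T, with equal weights for the two populations and no sample-size correction. The caption does not state which estimator produced the published values (Weir and Cockerham's and Hudson's are common alternatives), and the allele frequencies in the example are hypothetical, not those of rs113671272.

```python
def fst_two_populations(p1, p2):
    """Wright/Nei F_ST for one biallelic SNP from two populations' allele frequencies.

    H_S is the mean within-population heterozygosity and H_T the heterozygosity of the
    pooled population (equal weights); there is no correction for finite sample size.
    """
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t

# Hypothetical frequencies of the same allele in two populations.
print(round(fst_two_populations(0.5, 0.1), 3))
```

F_ST is 0 when the two populations share the same allele frequency and 1 when they are fixed for different alleles, so a SNP with a high value, like the one highlighted in panel b, is strongly differentiated between the populations compared.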