Xiaoshun Shi, Ruidong Li, Jianxue Zhai, Allen Menglin Chen, Kailing Huang, Zhouxia Zheng, Zhuona Chen, Xiaoyin Dong, Xiguang Liu, Di Lu, Siyang Feng, Dingwei Diao, Pengfei Ren, Zhaoguo Liu, Grant Morahan, Kaican Cai.
Abstract
Pathogenic germline variants in cancer-associated genes are risk factors for cancer predisposition. However, systematic mining and summarizing of cancer pathogenic or likely pathogenic variants has not been performed for people of East Asian descent. This study aimed to investigate publicly available data to identify germline variants in East Asian cancer cohorts and compare them to variants in Caucasian cancer cohorts. Based on the data we retrieved, we built a comprehensive database, named COGVIC (Catalog of Germline Variants in Cancer). A total of 233 variants in the East Asian population were identified. The majority (87%) of genes with cancer-associated variants were not shared between the East Asian and Caucasian cohorts. This included pathogenic variants in BRCA2. Our study summarized the prevalence of germline variants in East Asian cancer cohorts and provides an easy-to-use online tool to explore germline mutations related to cancer susceptibility. Database URL: http://www.cogvic.vip/
Year: 2021 PMID: 34964846 PMCID: PMC8730286 DOI: 10.1093/database/baab075
Source DB: PubMed Journal: Database (Oxford) ISSN: 1758-0463 Impact factor: 3.451
Figure 1. The COGVIC workflow of data selection and germline variant identification. (A) A total of 1 677 337 SRA entries with 17 545 study accessions were filtered out based on the inclusion criteria. (B) The pipeline of the COGVIC germline variant identifier.
The frequency of cancer pathogenic variants in the East Asian population
| Cancer | Mutation frequency | Mutation cases/number of samples |
|---|---|---|
| Esophageal squamous cell carcinoma | 9.1% | 42/464 |
| Gastric carcinoma | 11.1% | 26/234 |
| Nasopharyngeal carcinoma | 12.6% | 26/206 |
| Breast carcinoma | 16.8% | 32/190 |
| Colorectal carcinoma | 12.3% | 19/154 |
| Hepatocellular carcinoma | 14.9% | 22/148 |
| Bladder cancer | 3.8% | 5/131 |
| Cholangiocarcinoma | 10.3% | 13/126 |
| Clear cell renal cell carcinoma | 3.7% | 4/108 |
| Cervical cancer | 9.8% | 10/102 |
| Pancreatic cancer | 5.0% | 5/101 |
| Breast fibroepithelial tumors | 4.4% | 3/68 |
| Lung cancer | 9.4% | 5/53 |
| Oral squamous cell carcinoma | 14% | 7/50 |
| Lymphoma | 8% | 2/25 |
| Prostate cancer | 10% | 2/20 |
| Follicular thyroid carcinoma | 5.6% | 1/18 |
| Hepatoblastoma | 16.7% | 1/6 |
| Acute myeloid leukemia | 25% | 2/5 |
| Ovarian serous carcinoma | 25% | 1/4 |
| Cervical intraepithelial neoplasia | 2.0% | 1/51 |
| Esophageal precancerous lesion | 8.3% | 1/12 |
| Other | 2.4% | 3/125 |
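
The frequency column above is the number of mutation cases divided by the number of samples, expressed as a percentage. A minimal sketch of that arithmetic follows; the function name `mutation_frequency` is illustrative and not part of COGVIC, and the three rows are copied from the table.

```python
def mutation_frequency(cases: int, samples: int) -> float:
    """Percentage of samples with a pathogenic variant, to one decimal place."""
    if samples <= 0:
        raise ValueError("samples must be a positive integer")
    if not 0 <= cases <= samples:
        raise ValueError("cases must lie between 0 and samples")
    return round(100.0 * cases / samples, 1)


# (mutation cases, number of samples) taken from three rows of the table
rows = {
    "Esophageal squamous cell carcinoma": (42, 464),
    "Gastric carcinoma": (26, 234),
    "Breast carcinoma": (32, 190),
}

for cancer, (cases, samples) in rows.items():
    print(f"{cancer}: {mutation_frequency(cases, samples)}%")
```

Rows with very few samples (for example 1/4 or 1/6) give frequencies that are highly sensitive to a single case, so they are best read alongside the raw counts.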
Germline variants that might strongly increase the risk of cancer susceptibility in East Asian populations
| Gene | East Asian OR | 95% CI (East Asian) | World OR | 95% CI (World) |
|---|---|---|---|---|
| MCM2 | 22.7 | [2.7, 188.2] | 309.6 | [31.5, 3048.8] |
| ERBB3 | 11.3 | [1.2, 108.8] | 154.7 | [11.3, 2111.6] |
| CUL7 | 11.3 | [1.2, 108.8] | 77.3 | [15.1, 395.8] |
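
This record does not say how the odds ratios and confidence intervals above were computed. As an illustration only, the sketch below uses the standard log (Woolf) method for a 2x2 table; the function name and the carrier counts are hypothetical and are not taken from the study.

```python
import math


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Odds ratio with a Woolf (log-scale) confidence interval.

    a: carriers among cases        b: non-carriers among cases
    c: carriers among controls     d: non-carriers among controls
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for the log method")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR); small cells dominate this sum.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts chosen only to show the shape of the calculation.
or_, lo, hi = odds_ratio_ci(a=5, b=995, c=1, d=4999)
print(f"OR = {or_:.1f}, 95% CI = [{lo:.1f}, {hi:.1f}]")
```

With only a handful of carriers, the `1 / a` and `1 / c` terms make the standard error large, which is one reason intervals like those in the table can span two orders of magnitude.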
Figure 2. Genes with pathogenic variants identified in the COGVIC and TCGA cohorts. Distribution of the variants among 153 genes in both cohorts. The number of cases with pathogenic variants by gene in the TCGA database (green upper bars) is compared with those in the East Asian population (blue lower bars). Differences between the databases are indicated by asterisks above the gene name (when the number of TCGA cases is greater) or below the gene name (when the number of cases for the gene in the East Asian population is greater). The numbers in brackets beside the gene name are consistent with this system, i.e. asterisks above the number in the bracket indicate the number of TCGA cases, while asterisks below it indicate the number of cases in the East Asian population. The Venn diagram shows 64 genes with mutations unique to TCGA, 65 genes with mutations unique to the East Asian population and 24 genes with mutations detected in the two populations.
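
The three Venn regions in Figure 2 correspond to two set differences and one intersection of the per-cohort gene lists. A small sketch of that bookkeeping is shown below; `GENE_A`, `GENE_B` and `GENE_C` are placeholder names rather than genes from the study, and only the final check uses numbers from the caption.

```python
def venn_counts(cohort_a, cohort_b):
    """Return (only in A, shared, only in B) for two collections of gene symbols."""
    a, b = set(cohort_a), set(cohort_b)
    return len(a - b), len(a & b), len(b - a)


# Toy gene lists; BRCA2 is reported in both cohorts, the rest are placeholders.
tcga_genes = {"BRCA2", "GENE_A", "GENE_B"}
east_asian_genes = {"BRCA2", "GENE_C"}

only_tcga, shared, only_east_asian = venn_counts(tcga_genes, east_asian_genes)
print(only_tcga, shared, only_east_asian)

# The region sizes reported in the caption add up to the 153 genes plotted.
caption_regions = {"only_tcga": 64, "only_east_asian": 65, "shared": 24}
print(sum(caption_regions.values()))
```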
The function and distribution of pathogenic BRCA2 variants in the COGVIC and TCGA cohorts
| | | COGVIC and TCGA Asian cohort | TCGA Caucasian cohort |
|---|---|---|---|
| Exonic functional change | Number of BRCA2 variants | 23 | 63 |
| | Synonymous SNV | 2 | |
| | Frameshift insertion | 1 | 5 |
| | Nonsynonymous SNV | 10 | 2 |
| | Unknown | 3 | 3 |
| | Stop gain | 11 | 12 |
| | Nonframeshift deletion | 1 | 1 |
| | Nonframeshift insertion | | |
| | Frameshift deletion | 1 | 40 |
| | Frameshift substitution | 3 | |
| Pathogenicity classification | Benign | | |
| | Benign/likely benign | | |
| | Likely benign | | |
| | Conflicting interpretations of pathogenicity | 11 | 1 |
| | Uncertain significance | 6 | |
| | Pathogenic/likely pathogenic | 2 | |
| | Pathogenic | 9 | 61 |
| | Unknown | 4 | 1 |
| | Number of variants | | |
| Variants in BRCA2 exon | Exon 1 | | |
| | Exon 2 | 2 | 1 |
| | Exon 3 | 1 | |
| | Exon 4 | | |
| | Exon 5 | | |
| | Exon 6 | | |
| | Exon 7 | 1 | |
| | Exon 8 | 2 | |
| | Exon 9 | 1 | |
| | Exon 10 | 1 | 4 |
| | Exon 11 | 12 | 37 |
| | Exon 12 | 1 | |
| | Exon 13 | 3 | |
| | Exon 14 | 2 | 2 |
| | Exon 15 | 4 | 2 |
| | Exon 16 | 1 | 1 |
| | Exon 17 | 1 | |
| | Exon 18 | 1 | |
| | Exon 19 | | |
| | Exon 20 | 1 | 1 |
| | Exon 21 | | |
| | Exon 22 | | |
| | Exon 23 | 2 | 2 |
| | Exon 24 | 1 | 1 |
| | Exon 25 | | |
| | Exon 26 | | |
| | Exon 27 | 4 | 1 |
| Variants in BRCA2 splicing site | Exon 1-2 | 1 | |
| | Exon 7-8 | 1 | |
Figure 3. Distribution of identified pathogenic variants of BRCA2 around the world. (A) The proportion of BRCA2 variants in different exons in Asian cases from the COGVIC and TCGA cohorts. The colors represent different exons, e.g. exon 1 and exon 2. (B) The proportion of BRCA2 variants in different exons in the TCGA Caucasian cohort. The colors are consistent with the exons shown in A. (C) All variants placed on the BRCA2 gene map. The numbers indicate the exon numbers. The detected variants are indicated by arrows. Red arrows represent the East Asian population, whereas green arrows represent the Caucasian population. There are two splicing mutations, one between exons 1 and 2 and one between exons 7 and 8.
Figure 4. The main functions of the COGVIC database. This figure shows example outputs for specific search queries. Search by gene: search by gene symbol. Search by chr: search by chromosome ID; users can choose a chromosome by clicking the chromosome ideogram graphic. Search by rsID: search using an SNP rs number from dbSNP. Search by population: retrieves population-specific mutation frequency information (0: not Asian; other: Asian). Search by disease: search by disease name to find associated mutations.
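
The five search modes in Figure 4 can be pictured as optional filters over a list of variant records. The sketch below is purely illustrative and is not the COGVIC implementation or its API: the `VariantRecord` fields, the `search` function and the rsID strings are assumptions made for the example. Only the population coding (0: not Asian; other: Asian) comes from the caption.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class VariantRecord:
    gene: str
    chrom: str
    rsid: str        # placeholder IDs below, not real dbSNP entries
    population: int  # 0 = not Asian, any other value = Asian
    disease: str


def search(
    records: Iterable[VariantRecord],
    *,
    gene: Optional[str] = None,
    chrom: Optional[str] = None,
    rsid: Optional[str] = None,
    asian: Optional[bool] = None,
    disease: Optional[str] = None,
) -> List[VariantRecord]:
    """Return records matching every filter that was supplied."""
    hits = []
    for r in records:
        if gene is not None and r.gene.upper() != gene.upper():
            continue
        if chrom is not None and r.chrom != chrom:
            continue
        if rsid is not None and r.rsid != rsid:
            continue
        if asian is not None and (r.population != 0) != asian:
            continue
        if disease is not None and disease.lower() not in r.disease.lower():
            continue
        hits.append(r)
    return hits


# Hypothetical records for demonstration only.
RECORDS = [
    VariantRecord("BRCA2", "13", "rs_placeholder_1", 1, "Breast carcinoma"),
    VariantRecord("BRCA2", "13", "rs_placeholder_2", 0, "Breast carcinoma"),
    VariantRecord("MCM2", "3", "rs_placeholder_3", 1, "Gastric carcinoma"),
]

print([r.rsid for r in search(RECORDS, gene="brca2", asian=True)])
```

Keeping each criterion as an optional keyword argument lets the same function serve a single-field query (gene only) or a combined one (gene plus population).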