Sanket Desai, Rohit Mishra, Suhail Ahmad, Supriya Hait, Asim Joshi, Amit Dutt.
Abstract
Cancer is a somatic disease. The lack of Indian-specific reference germline variation resources limits the ability to identify true cancer-associated somatic variants among Indian cancer patients. We integrate two recent studies, the GenomeAsia 100K and the Genomics for Public Health in India (IndiGen) program, describing genome sequence variations across 598 and 1029 healthy individuals of Indian origin, respectively, along with the unique variants identified in 173 in-house normal germline samples derived from cancer patients, to generate the Tata Memorial Centre-SNP database (TMC-SNPdb) 2.0. To show its utility, GATK/Mutect2-based somatic variant calling was performed on 224 in-house tumor samples to demonstrate a reduction in false-positive somatic variants. In addition to the ethnic-specific variants from the GenomeAsia 100K and IndiGenomes databases, 305 132 unique variants identified in 173 in-house normal germline samples derived from cancer patients of Indian origin constitute the Indian-specific TMC-SNPdb 2.0. Of the 305 132 unique variants, 11.13% were found in the coding region, with missense variants (31.3%) as the predominant category. Among the non-coding variations, intronic variants (49%) were the highest contributors. The non-synonymous to synonymous SNP ratio was observed to be 1.9, consistent with the previous version of TMC-SNPdb and the literature. Using TMC-SNPdb 2.0, we analyzed whole-exome sequences from 224 in-house tumor samples (180 paired and 44 orphans). We show an average depletion of 3.44% of variants per paired tumor and a significantly higher depletion (P-value < 0.001) for orphan tumors (4.21%), demonstrating the utility of the rare, unique variants found in the ethnic-specific variant datasets in reducing false-positive somatic mutations. TMC-SNPdb 2.0 is the most exhaustive open-source reference database of germline variants occurring across 1800 Indian individuals to analyze cancer genomes and other genetic disorders.
The database and toolkit package are available for download. Database URL: http://www.actrec.gov.in/pi-webpages/AmitDutt/TMCSNPdb2/TMCSNPdb2.html
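The depletion step described in the abstract can be illustrated with a minimal sketch. It is not the authors' toolkit or the GATK/Mutect2 pipeline: the `Variant` type, the `deplete` and `percent_depleted` helpers, and the toy variants are all hypothetical, and matching is assumed to be exact on chromosome, position, reference allele and alternate allele, with no variant normalization.

```python
from typing import Iterable, List, NamedTuple, Set


class Variant(NamedTuple):
    """Hypothetical minimal variant key: exact match on all four fields."""
    chrom: str
    pos: int
    ref: str
    alt: str


def deplete(calls: Iterable[Variant], germline_db: Set[Variant]) -> List[Variant]:
    """Keep only the calls that are absent from the germline variant set."""
    return [v for v in calls if v not in germline_db]


def percent_depleted(before: int, after: int) -> float:
    """Percentage of variants removed by a depletion step."""
    return 100.0 * (before - after) / before if before else 0.0


# Toy data, invented purely for illustration.
calls = [
    Variant("chr1", 100, "A", "G"),
    Variant("chr1", 250, "C", "T"),
    Variant("chr2", 300, "G", "A"),
    Variant("chr7", 410, "T", "C"),
    Variant("chr17", 520, "G", "T"),
]
global_db = {Variant("chr1", 100, "A", "G")}   # stands in for gnomAD/dbSNP
ethnic_db = {Variant("chr2", 300, "G", "A")}   # stands in for an ethnic-specific set

# Step 1: remove variants known from global population databases.
after_global = deplete(calls, global_db)
# Step 2: remove variants known from the ethnic-specific germline set.
after_ethnic = deplete(after_global, ethnic_db)

# Additional depletion contributed by the ethnic-specific set, over and above step 1.
extra_pct = percent_depleted(len(after_global), len(after_ethnic))
print(f"{len(calls)} -> {len(after_global)} -> {len(after_ethnic)} ({extra_pct:.1f}% extra)")
```

In this sketch the percentage is computed relative to the variants that survive the global-database step, mirroring the "over and above gnomAD/dbSNP" framing used for the reported depletion.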
Year: 2022 PMID: 35551364 PMCID: PMC9216475 DOI: 10.1093/database/baac029
Source DB: PubMed Journal: Database (Oxford) ISSN: 1758-0463 Impact factor: 4.462
Figure 1. Development of TMC-SNPdb 2.0 and characteristic features of the variants in the database; A) schematic workflow of the steps in the development of the database; the raw variants obtained from the analysis of 173 normal samples were subject to quality/recurrence filter, followed by the depletion of variants found in germline and somatic databases to obtain novel germline variants; B) distribution of coding and non-coding variants; C) distribution of different types of synonymous variants, non-synonymous variants and INDELs; D) proportion of types of non-coding variants, identified in the TMC-SNPdb 2.0. IGR and RNA (in panel D) correspond to the intergenic and non-coding RNA variants in the database, respectively.
Figure 2. Somatic variant comparison across paired and orphan tumors; A) log-scaled raw mutation count across the paired and orphan exome sequence samples used in the study; B) percent variant depletion by the ethnic-specific germline variant set (GenomeAsia, IndiGenomes, TMC-SNPdb 2.0 and in-house PON created using 173 normal exome samples), over and above gnomAD/dbSNP depletion. Comparison between the two groups was performed using the Mann–Whitney test.
Statistics of variants obtained from analysis of exome sequencing samples from paired and orphan tumors following depletion of germline variants with the global population variation databases (gnomAD, dbSNP) and Asian/Indian (GenomeAsia, IndiGenomes, TMC-SNPdb 2.0) population germline variant databases, along with variants from the PON derived from 173 in-house normal exome samples
| Sample group | Unique variants | Unique variants retained upon depletion with gnomAD + dbSNP | Unique variants retained upon depletion with Indian/Asian ethnic-specific databases | Per tumor median % reduction by Indian/Asian databases post dbSNP + gnomAD depletion | Total variants depleted by Indian/Asian databases post dbSNP + gnomAD depletion |
|---|---|---|---|---|---|
| Paired Tumor-Normal Samples ( | 360 352 | 119 608 | 117 029 | 2.27 | 2579 |
| Orphan samples ( | 378 995 | 88 089 | 86 729 | 3.18 | 1360 |
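
The table's last column follows directly from the two "retained" columns, which the short check below reproduces. The dictionary keys and variable names are illustrative only. The pooled percentage it prints is computed over all unique variants in a group, so it is a different statistic from the per-tumor median in the table and need not match it.

```python
# Counts copied from the table above; key names are illustrative.
rows = {
    "paired": {"after_global": 119_608, "after_ethnic": 117_029},
    "orphan": {"after_global": 88_089, "after_ethnic": 86_729},
}

summary = {}
for group, r in rows.items():
    # Variants removed by the Indian/Asian databases after gnomAD + dbSNP depletion.
    depleted = r["after_global"] - r["after_ethnic"]
    # Pooled reduction across the whole group (not the per-tumor median).
    pooled_pct = round(100 * depleted / r["after_global"], 2)
    summary[group] = (depleted, pooled_pct)
    print(f"{group}: {depleted} variants depleted, {pooled_pct}% pooled reduction")
```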