K R Velmurugan, R T Varghese, N C Fonville, H R Garner.
Abstract
There remains a large discrepancy between the known genetic contribution to cancer and the portion that can be explained by genomic variants, both inherited and somatic. Recently, understudied repetitive DNA regions called microsatellites have been identified as genetic risk markers for a number of diseases, including various cancers (breast, ovarian and brain). In this study, we demonstrate an integrated process for identifying and further evaluating microsatellite-based risk markers for lung cancer using data from The Cancer Genome Atlas (TCGA) and the 1000 Genomes Project. By comparing whole-exome germline sequencing data from 488 TCGA lung cancer samples with germline exome data from 390 control samples from the 1000 Genomes Project, we identified 119 potentially informative microsatellite loci. These loci distinguished cancer from control samples with sensitivity and specificity ratios above 0.8. The loci, supplemented with additional loci from other cancers and controls, were then evaluated using a target enrichment kit and sample-multiplexed next-generation sequencing. Thirteen of the 119 risk markers were found to be informative in a well-powered study (statistical power >0.99 for a 0.95 confidence interval) using high-depth (579x ± 315) next-generation sequencing of 30 lung cancer and 89 control samples, yielding sensitivity and specificity ratios of 0.90 and 0.94, respectively. When 8 loci harvested from the bioinformatic analysis of other cancers are added to the classifier, the sensitivity and specificity rise to 0.93 and 0.97, respectively. Analysis of the genes harboring these loci revealed two genes with oncogenic potential (ARID1B and REL) and two significantly enriched pathways (chromatin organization and cellular stress response), suggesting that lung carcinogenesis is linked to chromatin remodeling, inflammation and tumor microenvironment restructuring.
We illustrate that high-depth sequencing enables a high-precision, microsatellite-based risk classifier. This microsatellite-based platform confirms the potential to create clinically actionable diagnostics for lung cancer.
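The validation ratios can be checked against the cohort sizes. The sketch below lists every true-positive or true-negative count consistent with a reported value. It rests on two assumptions: each ratio was rounded to two decimals, and each was computed over all 30 lung cancer and 89 control samples. The abstract does not give the underlying counts. The helper name `consistent_counts` is hypothetical.

```python
from fractions import Fraction


def consistent_counts(ratio: float, total: int, decimals: int = 2) -> list[int]:
    """List every count k in 0..total whose ratio k/total rounds to `ratio`.

    Assumes the published ratio was rounded half-up to `decimals` places.
    """
    target = round(ratio * 10**decimals)  # e.g. 0.94 -> 94
    matches = []
    for k in range(total + 1):
        scaled = Fraction(k, total) * 10**decimals  # exact, no float error
        rounded = (2 * scaled + 1) // 2  # floor(scaled + 1/2): round half up
        if rounded == target:
            matches.append(k)
    return matches


# Validation cohort: 30 lung cancer (sensitivity) and 89 control (specificity) samples.
for label, ratio, total in [
    ("sensitivity, 13 loci", 0.90, 30),
    ("specificity, 13 loci", 0.94, 89),
    ("sensitivity, 21 loci", 0.93, 30),
    ("specificity, 21 loci", 0.97, 89),
]:
    print(f"{label}: {consistent_counts(ratio, total)} of {total}")
```

Under these assumptions, each reported ratio admits exactly one count: 27/30, 84/89, 28/30 and 86/89. These are inferences from rounded numbers, not values stated by the authors.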
Year: 2017 PMID: 28759038 PMCID: PMC5701090 DOI: 10.1038/onc.2017.256
Source DB: PubMed Journal: Oncogene ISSN: 0950-9232 Impact factor: 9.867
Figure 1. Flow chart of informative-locus identification and marker validation. This work can be divided into two phases: computational identification of informative microsatellite (MST) loci, and marker validation. Phase 1: 488 germline samples from TCGA non-small cell lung cancer cases and 390 germline non-cancer control samples from the 1000 Genomes Project were analyzed. This analysis yielded 119 MST loci whose genotypes differed significantly between the cancer and control samples. Phase 2: these 119 markers, along with 144 MST markers that were computationally found to be significant for other cancers, were pooled into a target enrichment kit. The kit was used to sequence a total of 30 lung cancer samples and 89 non-cancer control samples at high depth. Of these 263 (119 + 144) MST markers, 21 consistently differentiated lung cancer from control samples.
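The caption does not name the statistical test behind the "significant genotype difference". Purely as an illustration, the sketch below scores one candidate locus with a one-sided Fisher's exact test. The test is applied to a 2x2 table of carriers versus non-carriers of a putative cancer genotype. The choice of test, the function name `fisher_one_sided` and the carrier counts in the example are all hypothetical. They are not the authors' method or data.

```python
from math import comb


def fisher_one_sided(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test for enrichment in the first row.

    2x2 layout (hypothetical):
                  carrier  non-carrier
        cancer       a          b
        control      c          d

    Returns P(X >= a), where X is the number of carriers in the cancer
    row under the hypergeometric null with all margins fixed.
    """
    n_cancer = a + b
    n_carriers = a + c
    n_total = a + b + c + d
    n_control = n_total - n_cancer
    # math.comb returns 0 when asked for more carriers than controls exist.
    tail = sum(
        comb(n_cancer, x) * comb(n_control, n_carriers - x)
        for x in range(a, min(n_cancer, n_carriers) + 1)
    )
    # Exact integer ratio; Python converts it to a correctly rounded float.
    return tail / comb(n_total, n_carriers)


# Hypothetical locus: 300 of 488 cancer and 150 of 390 control genomes
# carry the putative cancer genotype (made-up counts for illustration).
p = fisher_one_sided(300, 488 - 300, 150, 390 - 150)
print(f"one-sided p = {p:.3g}")
```

In a genome-wide scan, such a per-locus p-value would normally also need a multiple-testing correction. The abstract does not say which, if any, was applied.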
Figure 2. The computationally harvested LUAD and LUSC MST loci differentiate their corresponding cancer type from 1000 Genomes non-cancer control samples with high sensitivity (LUAD: 0.87, LUSC: 0.88). (a) A sample with 39% (vertical black line; identified via ROC analysis) or more of the 96 LUAD-specific MST loci carrying the cancer genotype is called ‘at-risk’ for adenocarcinoma of the lung. (b) A sample with 37% or more of the 67 LUSC-specific MST loci carrying the cancer genotype is called ‘at-risk’ for squamous cell carcinoma. Blue bars represent control samples and orange bars represent lung cancer germline samples.
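The ‘at-risk’ rule in this caption is a fraction threshold. It counts the panel loci at which a sample carries the cancer genotype and compares that fraction with the ROC-derived cutoff. The sketch below is minimal and assumes each sample has already been reduced to one boolean per locus (True = cancer genotype). The function names are hypothetical. Because the printed cutoffs are rounded percentages, the minimum locus counts it derives are approximate.

```python
def fraction_cancer_genotype(flags: list[bool]) -> float:
    """Fraction of panel loci at which the sample carries the cancer genotype."""
    if not flags:
        raise ValueError("locus panel is empty")
    return sum(flags) / len(flags)


def is_at_risk(flags: list[bool], cutoff: float) -> bool:
    """Call a sample 'at-risk' when its fraction meets or exceeds the cutoff."""
    return fraction_cancer_genotype(flags) >= cutoff


def min_loci_needed(n_loci: int, cutoff: float) -> int:
    """Smallest count k of cancer-genotype loci with k / n_loci >= cutoff.

    Uses the same comparison as is_at_risk so the two never disagree.
    Assumes 0 <= cutoff <= 1.
    """
    return next(k for k in range(n_loci + 1) if k / n_loci >= cutoff)


# Figure 2 panels: 96 LUAD loci at a 39% cutoff, 67 LUSC loci at 37%.
print(min_loci_needed(96, 0.39))  # 38
print(min_loci_needed(67, 0.37))  # 25
```

The minimum count is computed by scanning k rather than by `ceil(cutoff * n)`. A float product such as `0.07 * 100` can land just above an integer and overshoot by one.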
MST loci that can precisely differentiate between the lung cancer samples and non-tumor samples

| MST locus | Repeat motif | Genomic region | Gene ID | Cancer types | |
|---|---|---|---|---|---|
| chr2:60918364-60918376 | T | Intron | 5966 | LUAD, LUSC | 39.92 |
| chr6:157174818-157174831 | T | Intron | 57492 | LUAD, LUSC, MB, SKCM | 13.57 |
| chr6:76018867-76018880 | A | Intron | 3617 | LUSC, OV | 12.28 |
| chr3:94035443-94035458 | T | Intron | 200894 | GBM, LUAD, SKCM | 11.20 |
| chr3:112534347-112534360 | A | Intron | 64422 | GBM, LGG, LUAD, LUSC | 10.29 |
| chr8:129862369-129862381 | A | Intron | 51571 | LUAD, LUSC | 7.01 |
| chr9:130622843-130622857 | A | Intron | 8939 | GBM, LGG, LUAD, LUSC, MB, OV | 6.93 |
| chr7:135414296-135414309 | A | Intron | 4850 | LUAD | 5.07 |
| chr2:48461120-48461133 | T | Intron | 129285 | LGG, LUAD, LUSC, MB, SKCM | 4.43 |
| chr2:55332516-55332530 | A | Intron | 55704 | LUAD, LUSC, SKCM | 3.90 |
| chr13:31148484-31148500 | A | Intron | 3315 | LUAD, SKCM | 3.70 |
| chr15:20458509-20458521 | A | Intron | 283755 | LUAD | 3.00 |
| chr10:13591929-13591943 | T | Intron | 8559 | LUSC | 2.25 |
| chr2:202815832-202815844 | A | Intron | 130026 | BC, GBM, OV | 7.61 |
| chr13:114236623-114236635 | T | Intron | 8881 | LGG, SKCM | 6.19 |
| chr12:106106383-106106396 | A | Intron | 9891 | OV | 5.74 |
| chr3:98580864-98580876 | A | Intron | 1371 | BC, OV | 5.02 |
| chr16:70839964-70839978 | T | Intron | 54768 | GBM | 4.58 |
| chr2:233460070-233460083 | A | Intron | 8527 | OV | 3.82 |
| chr5:87383860-87383873 | T | Intron | 5921 | OV | 2.93 |
| chr8:23852057-23852082 | TG | Intron | 6781 | BC | 1.87 |
Abbreviations: BC, breast cancer; GBM, glioblastoma; LGG, lower grade glioma; LUAD, lung adenocarcinoma; LUSC, lung squamous cell carcinoma; MB, medulloblastoma; MST, microsatellite; OV, ovarian cancer; SKCM, skin cutaneous melanoma.
Figure 3. The 13 lung cancer-specific MST loci, together with 8 MST loci specific to other diseases, differentiate between the lung cancer and non-cancer control sample groups. The blue and red bars represent the non-cancer control and lung cancer samples, respectively. (a) A sample with 61% or more of the 13 MST loci carrying the cancer genotype is termed ‘at-risk’ for lung cancer, with sensitivity and specificity of 0.90 and 0.94, respectively. (b) A sample with 57% or more of the 21 MST loci carrying the cancer genotype is termed ‘at-risk’ for lung cancer, with sensitivity and specificity of 0.93 and 0.97, respectively. The vertical black line marks the optimum cutoff found by ROC analysis.
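The caption does not define how the "optimum" cutoff was chosen. A common convention, assumed here, is to pick the threshold that maximises Youden's J = sensitivity + specificity - 1 over all candidate cutoffs. The sketch scans each observed score as a candidate and calls a sample positive when its score is at or above the cutoff. The function name and the toy scores (fractions of a 13-locus panel) are hypothetical.

```python
def youden_cutoff(
    case_scores: list[float], control_scores: list[float]
) -> tuple[float, float, float]:
    """Return (cutoff, sensitivity, specificity) maximising Youden's J.

    A sample is called positive ('at-risk') when score >= cutoff.
    Ties in J are resolved in favour of the lowest cutoff.
    """
    best = None
    for cutoff in sorted(set(case_scores) | set(control_scores)):
        sens = sum(s >= cutoff for s in case_scores) / len(case_scores)
        spec = sum(s < cutoff for s in control_scores) / len(control_scores)
        j = sens + spec - 1
        if best is None or j > best[0]:
            best = (j, cutoff, sens, spec)
    _, cutoff, sens, spec = best
    return cutoff, sens, spec


# Toy data: fraction of 13 loci with the cancer genotype per sample.
cases = [k / 13 for k in (6, 8, 9, 10)]
controls = [k / 13 for k in (3, 4, 5, 7, 9)]
cutoff, sens, spec = youden_cutoff(cases, controls)
print(f"cutoff={cutoff:.3f} sensitivity={sens:.2f} specificity={spec:.2f}")
# cutoff=0.462 sensitivity=1.00 specificity=0.60
```

Youden's J weights sensitivity and specificity equally. A screening application that penalises false negatives more heavily might prefer a different criterion.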
Figure 4. Schematic describing a potential mechanism underlying lung carcinogenesis. Two of the 13 genes have significant oncogenic potential.