Jing Su1,2,3, Lynn S Huang3, Ryan Barnard3, Graham Parks4, James Cappellari4, Christina Bellinger5, Travis Dotson5, Lou Craddock1, Bharat Prakash5, Jonathan Hovda6, Hollins Clark7, William Jeffrey Petty8, Boris Pasche1, Michael D Chan6, Lance D Miller1, Jimmy Ruiz8,9.
Abstract
The Comprehensive, Computable NanoString Diagnostic gene panel (C2Dx) is a promising solution to the need for a molecular pathological research and diagnostic tool for precision oncology that works with small-volume tumor specimens. We translate subtyping-related gene expression patterns of non-small cell lung cancer (NSCLC), derived from public transcriptomic data, into a highly robust and accurate subtyping system. The C2Dx demonstrates strong performance on the NanoString platform using microgram-level fine needle aspiration (FNA) samples and has excellent portability to frozen tissues and RNA-Seq transcriptomic data. This workflow shows great potential for research and the clinical practice of cancer molecular diagnosis.
Keywords: cancer subtyping; elastic net regularization; fine needle aspiration; logistic regression; molecular signature; non-small cell lung cancer
Year: 2021 PMID: 33937015 PMCID: PMC8085404 DOI: 10.3389/fonc.2021.584896
Source DB: PubMed Journal: Front Oncol ISSN: 2234-943X Impact factor: 6.244
Figure 1. Overall workflow of the C2Dx development. The development of the C2Dx NSCLC subtyping device comprised the following four steps: 1. Knowledge translation. Four microarray-based transcriptomics datasets of NSCLC cases were extracted from Gene Expression Omnibus (GEO) and used to identify 67 subtyping-related genes, including 23 LUAD genes, 40 LUSC genes, and 4 housekeeping genes. A corresponding NanoString gene panel was established. 2. Data generation. Targeted transcriptomics data were generated using the NanoString 67-gene panel for 83 FNA biopsy samples collected from WFBH NSCLC patients, including 47 LUAD, 25 LUSC, and 11 NOS cases. 3. Model development. An intensive resampling strategy was used with elastic-net regularized logistic regression to develop the final 19-gene molecular subtyping model. 4. Model validation. The developed subtyping model was validated using transcriptomics data of the 19 genes collected from frozen tissue from WFBH's tumor bank on the NanoString platform and from TCGA RNA-Seq data generated from bulk tissues.
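Step 3 of the workflow can be illustrated with a minimal sketch. The source does not name the software, the penalty settings, or the resampling scheme, so everything below is an assumption for illustration: a plain proximal-gradient fit of elastic-net penalized logistic regression, wrapped in bootstrap resampling that records how often each feature keeps a nonzero weight. The function names and default parameters are hypothetical.

```python
import math
import random


def soft_threshold(value, amount):
    """Proximal operator of the L1 penalty: shrink toward zero by `amount`."""
    if value > amount:
        return value - amount
    if value < -amount:
        return value + amount
    return 0.0


def fit_elastic_net_logistic(X, y, lam=0.1, l1_ratio=0.5, lr=0.1, n_iter=300):
    """Proximal gradient descent for elastic-net penalized logistic regression.

    Penalty: lam * (l1_ratio * |w|_1 + 0.5 * (1 - l1_ratio) * |w|_2^2).
    The intercept b is not penalized. All defaults are illustrative.
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * p, 0.0
        for row, label in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, row))
            err = 1.0 / (1.0 + math.exp(-z)) - label  # predicted minus observed
            grad_b += err / n
            for j in range(p):
                grad_w[j] += err * row[j] / n
        b -= lr * grad_b
        for j in range(p):
            # Gradient step on the smooth part (log-loss + ridge), then L1 shrinkage.
            step = w[j] - lr * (grad_w[j] + lam * (1 - l1_ratio) * w[j])
            w[j] = soft_threshold(step, lr * lam * l1_ratio)
    return w, b


def selection_frequency(X, y, n_resamples=20, seed=0, **fit_kwargs):
    """Fraction of bootstrap resamples in which each feature has a nonzero weight."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        w, _b = fit_elastic_net_logistic(
            [X[i] for i in idx], [y[i] for i in idx], **fit_kwargs
        )
        for j in range(p):
            counts[j] += w[j] != 0.0
    return [c / n_resamples for c in counts]
```

Genes that survive the L1 shrinkage in most resamples would be natural candidates for a compact signature; how the authors actually reduced 67 genes to 19 is not described in this excerpt.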
Patients’ characteristics.
| Characteristics | Exploring | Training (WFBH-FNA) | Validation (WFBH-TB) | Validation (TCGA) |
|---|---|---|---|---|
| Overall Cohort Size | 490 | 83 | 42 | 1,016 |
| Age, mean (sd) | 65.1 (10.2) | 65.9 (9.5) | 65.7 (7.7) | 66.7 (9.4) |
| Gender, n (%) | ||||
| Female | 218 (44.5%) | 50 (60.2%) | 18 (40.9%) | 406 (40.0%) |
| Male | 272 (55.5%) | 33 (39.8%) | 26 (59.1%) | 610 (60.0%) |
| Race, n (%) | | | | |
| Caucasian | — | 72 (86.8%) | 38 (86.4%) | 738 (72.6%) |
| African American | — | 9 (10.8%) | 5 (11.4%) | 82 (8.1%) |
| Others | — | 2 (2.4%) | 1 (2.2%) | 196 (19.3%) |
| Adenocarcinoma | 384 (78.4%) | 47 (56.6%) | 20 (47.7%) | 515 (53.7%) |
| Age, mean (sd) | 64.1 (10.3) | 65.2 (9.5) | 65.3 (9.0) | 65.7 (10.0) |
| Gender, n (%) | ||||
| Female | 180 (46.9%) | 22 (46.8%) | 10 (47.6%) | 276 (53.6%) |
| Male | 204 (53.1%) | 25 (53.2%) | 11 (52.4%) | 239 (46.4%) |
| Race, n (%) | | | | |
| Caucasian | — | 40 (85.1%) | 19 (90.5%) | 389 (75.5%) |
| African American | — | 6 (12.8%) | 2 (9.5%) | 52 (10.1%) |
| Others | — | 1 (2.1%) | 0 | 74 (14.4%) |
| Squamous Cell Carcinoma | 106 (21.6%) | 25 (30.1%) | 22 (52.3%) | 501 (46.3%) |
| Age, mean (sd) | 68.6 (9.1) | 66.2 (8.6) | 66.0 (6.5) | 67.7 (8.6) |
| Gender, n (%) | ||||
| Female | 38 (35.8%) | 6 (24%) | 8 (34.8%) | 130 (25.9%) |
| Male | 68 (64.2%) | 19 (76%) | 15 (65.2%) | 371 (74.1%) |
| Race, n (%) | | | | |
| Caucasian | — | 21 (84%) | 19 (82.6%) | 349 (69.7%) |
| African American | — | 3 (12%) | 3 (13.0%) | 30 (6.0%) |
| Others | — | 1 (4%) | 1 (4.4%) | 122 (24.4%) |
| NOS | — | 11 (13.3%) | — | — |
The demographic and clinical characteristics of the 3 cohorts in this study are summarized. WFBH-FNA: the training cohort, with tumor tissues collected by FNA biopsy of WFBH patients. WFBH-TB and TCGA: the validation cohorts, with frozen tumor tissues collected from surgeries of WFBH patients and TCGA patients, respectively.
Figure 2. The expression pattern of the 67 diagnostic genes in the Exploration Cohort. The Exploration Cohort was composed of 490 samples collected from LUAD (n = 384) and LUSC (n = 106) tumors. The 67-gene diagnostic panel was composed of 27 LUAD-specific and 40 LUSC-specific genes. The gene expression pattern was derived from the normalized microarray-based transcriptomics data and scaled for visualization.
Genes and coefficients of the molecular subtyping model.
| Category | Gene | coefficient |
|---|---|---|
| LUSC-related | TP63 | -0.58872 |
| LUSC-related | KRT14 | -0.39198 |
| LUSC-related | ANXA8L2 | -0.27091 |
| LUSC-related | KRT5 | -0.25936 |
| LUSC-related | SERPINB13 | -0.10086 |
| LUSC-related | SNAI2 | -0.048 |
| LUSC-related | KRT6A | -0.0221 |
| LUSC-related | PKP1 | -0.00488 |
| | (Intercept) | 0.27297 |
| LUAD-related | SPINK1 | 0.00302 |
| LUAD-related | CD55 | 0.00562 |
| LUAD-related | NKX2-1 | 0.02709 |
| LUAD-related | MUC1 | 0.13468 |
| LUAD-related | GPR116 | 0.13792 |
| LUAD-related | PNMA2 | 0.15816 |
| LUAD-related | TMC5 | 0.31305 |
Signature genes used in the molecular subtyping model are listed with their logistic regression coefficients and ordered by coefficient value. Positive coefficients favor the LUAD subtype and negative coefficients favor the LUSC subtype; the absolute value of a coefficient indicates the significance of the gene in the model. The gene categories were color coded by associated subtype (vermilion for LUAD and bluish green for LUSC).
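To make the role of these coefficients concrete, the following sketch applies them as a standard logistic regression score, P(LUAD) = 1 / (1 + exp(-(intercept + Σ coefficient × expression))). It uses only the 15 genes and the intercept listed in the table; the workflow describes a 19-gene model, and this excerpt does not say how the remaining genes enter. The function names are hypothetical, and the expected scale of the expression values (the normalization applied before scoring) is an assumption that the excerpt does not specify.

```python
import math

# Coefficients copied from the table above.
# Positive values favor LUAD, negative values favor LUSC.
INTERCEPT = 0.27297
COEFFICIENTS = {
    "TP63": -0.58872,
    "KRT14": -0.39198,
    "ANXA8L2": -0.27091,
    "KRT5": -0.25936,
    "SERPINB13": -0.10086,
    "SNAI2": -0.048,
    "KRT6A": -0.0221,
    "PKP1": -0.00488,
    "SPINK1": 0.00302,
    "CD55": 0.00562,
    "NKX2-1": 0.02709,
    "MUC1": 0.13468,
    "GPR116": 0.13792,
    "PNMA2": 0.15816,
    "TMC5": 0.31305,
}


def luad_probability(expression):
    """Return P(LUAD) for one sample.

    `expression` maps gene symbol -> expression value. ASSUMPTION: values are
    on the same normalized scale used to train the model, which this excerpt
    does not describe.
    """
    missing = set(COEFFICIENTS) - set(expression)
    if missing:
        raise KeyError(f"missing genes: {sorted(missing)}")
    score = INTERCEPT + sum(c * expression[g] for g, c in COEFFICIENTS.items())
    return 1.0 / (1.0 + math.exp(-score))


def classify(expression, theta=0.5):
    """Call LUAD when P(LUAD) >= theta, otherwise LUSC."""
    return "LUAD" if luad_probability(expression) >= theta else "LUSC"
```

With every gene at zero the score reduces to the intercept, so the model leans slightly toward LUAD; raising a LUSC-related gene such as TP63 pushes the probability down.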
Figure 3. Model performance. The predicted probabilities of the LUAD subtype for (A) each FNA sample and (B) each TCGA sample are visualized in bar plots. Samples are classified as LUAD (p ≥ 0.5) or LUSC (p < 0.5). (C) The receiver operating characteristic (ROC) curves of the model on the WFBH FNA, the WFBH tissue bank, and the TCGA cohorts. The corresponding c-statistics (AUC, area under the ROC curve) were 0.986, 0.911, and 0.982, respectively. With the probability threshold set at θ = 0.5, model accuracies are 0.931, 0.881, and 0.945 for the WFBH FNA, the WFBH tissue bank, and the TCGA cohorts, respectively (marked as open circles). The optimal accuracies, reached at the optimal θ for each cohort, are 0.958 at θ = 0.60 for the WFBH FNA cohort, 0.905 at θ = 0.59 for the WFBH tissue bank cohort, and 0.946 at θ = 0.68 for the TCGA cohort (solid gray circles).
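The quantities reported in this caption (accuracy at a threshold θ, the accuracy-maximizing θ, and the c-statistic) can be computed as in the sketch below. It is illustrative only: the helper names are hypothetical, labels are assumed to be coded 1 for LUAD and 0 for LUSC, and the toy probabilities in the usage note are invented rather than taken from the study.

```python
def accuracy_at(probs, labels, theta):
    """Fraction of samples called correctly when LUAD means p >= theta."""
    correct = sum((p >= theta) == (y == 1) for p, y in zip(probs, labels))
    return correct / len(labels)


def best_threshold(probs, labels):
    """Scan the observed probabilities as candidate thresholds.

    Returns the lowest threshold that attains the maximum accuracy.
    """
    candidates = sorted(set(probs))
    return max(candidates, key=lambda t: accuracy_at(probs, labels, t))


def auc(probs, labels):
    """c-statistic: chance that a random LUAD sample scores above a random
    LUSC sample, counting ties as one half."""
    pos = [p for p, y in zip(probs, labels) if y == 1]
    neg = [p for p, y in zip(probs, labels) if y == 0]
    wins = sum((a > b) + 0.5 * (a == b) for a in pos for b in neg)
    return wins / (len(pos) * len(neg))
```

For example, with toy inputs `probs = [0.9, 0.8, 0.55, 0.4, 0.6, 0.3, 0.2, 0.1]` and `labels = [1, 1, 1, 1, 0, 0, 0, 0]`, the accuracy at θ = 0.5 is 0.75, the best threshold is 0.4 (accuracy 0.875), and the AUC is 0.875. A threshold tuned on the same cohort it is evaluated on gives an optimistic accuracy estimate, which is worth keeping in mind when comparing the θ = 0.5 and optimal-θ figures.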
Figure 4. Profiling LUAD and LUSC subtypes and NOS cases. The cumulative probability distribution curves (left) and the estimated probability density distribution curves (right) for the prediction results of the LUAD, LUSC, and NOS cases in WFBH (A) and TCGA (B) cohorts, respectively. Each circle in the left panels represents a sample.