Min Young Lee1, Taek-Kyun Kim1, Kathie-Anne Walters1, Kai Wang2.
Abstract
Multi-gene biomarker panels identified from high-throughput data, such as microarray or next-generation sequencing, need to be adapted to a platform suitable for a clinical setting, such as quantitative polymerase chain reaction. However, technical challenges that arise when transitioning from one measurement platform to another, such as inconsistent measurement results, can affect panel development. We describe a process to overcome these challenges by replacing poorly performing genes during platform transition and reducing the number of features without impacting classification performance. This approach assumes that a diagnostic panel reflects the effect of dysregulated biological processes associated with a disease, and that genes involved in the same biological processes and coordinately affected by a disease share similar discriminatory power. The utility of this optimization process was assessed using a published sepsis diagnostic panel. Substituting more than half of the genes and/or reducing the number of genes based on biological processes did not negatively affect the performance of the sepsis diagnostic panel. Our results suggest that a systematic gene substitution and reduction process based on biological function can be used to alleviate the challenges associated with clinical development of biomarker panels.
Year: 2019 PMID: 31089177 PMCID: PMC6517383 DOI: 10.1038/s41598-019-43779-2
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Identification of substitutes for Stanford11. (A) The overall procedure for identifying substitutes for Stanford11. (B) The six key biological processes represented by the Stanford11 panel.
The list of substitutable genes for features in the Stanford11.
| Chemotaxis, adhesion, migration | Antigen processing, immune response | Transcription by RNA pol II | Platelet activation | Apoptosis | Metabolism | |
|---|---|---|---|---|---|---|
| CEACAM1↑ | ADAMTS3↑, | |||||
| C3AR1↑ | ANXA3↑, | GPR84↑ | ||||
| TGFBI↓ | CCR1↑, | |||||
| GNA15↑ | CCR1↑, | FCER1G↑, | ||||
| HLA-DPB1↓ | BPI↑, CCR1↑, | |||||
| BATF↑ | IL10↑, | |||||
| MTCH1↓ | LCN2↑, | |||||
| C9orf95↑ | C9orf103↑, |
Substitution candidates (n = 28) for eight genes of Stanford11 were selected based on six key biological processes. ↑ and ↓ indicate increased and decreased expression levels in sepsis, respectively. No substitutable gene was retained for TGFBI and HLA-DPB1 after considering consistency in directional changes.
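The directional-consistency rule described in this note can be sketched as a simple filter. This is a minimal illustration, not the authors' code: the function name `keep_consistent` and the data layout are assumptions, and the candidate lists contain only the entries visible in the (truncated) table above.

```python
# Direction of change in sepsis: +1 = increased (↑), -1 = decreased (↓).
# Only the entries visible in the truncated table are used here.
panel_direction = {"CEACAM1": +1, "C3AR1": +1, "TGFBI": -1, "HLA-DPB1": -1}
candidates = {
    "CEACAM1": {"ADAMTS3": +1},
    "C3AR1": {"ANXA3": +1, "GPR84": +1},
    "TGFBI": {"CCR1": +1},
    "HLA-DPB1": {"BPI": +1, "CCR1": +1},
}


def keep_consistent(panel_direction, candidates):
    """Keep only candidates that change in the same direction as the panel gene."""
    retained = {}
    for gene, direction in panel_direction.items():
        retained[gene] = sorted(
            name
            for name, cand_direction in candidates.get(gene, {}).items()
            if cand_direction == direction
        )
    return retained


retained = keep_consistent(panel_direction, candidates)
for gene, substitutes in retained.items():
    print(gene, substitutes)
```

With these visible entries, TGFBI and HLA-DPB1 end up with empty lists because every listed candidate changes in the opposite direction.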
Figure 2. Importance of biological processes for classification performance. (A) The distribution of classification performances of the 100,000 random gene sets. (B) The number of genes of the top and bottom 250 gene sets in the 97 GOBPs represented by Stanford11. (C) Clusters of GOBPs in the top 250 and bottom 250 gene sets. Count and percent indicate the average number and percentage of genes in each GOBP cluster.
Figure 3. The performances of one-gene substitution. The distribution of AUCs in the 12 microarray datasets when (A) BATF; (B) C3AR1; (C) C9orf95; (D) CEACAM1; (E) GNA15; (F) MTCH1 was replaced with a substitute gene. * indicates a DeLong test P-value below 0.05 for the comparison between the substituted panel with the median AUC (blue bars) and the original Stanford11 (gray bars).
Figure 4. The performances of five-gene substitution. The AUCs in the 12 microarray datasets when genes representing all five functional categories were replaced with substitute genes. * and ** indicate DeLong test P-values below 0.05 and 0.01, respectively, for the comparison between the substituted panel with the median AUC and the original Stanford11.
Figure 5. The performances of gene reduction. The AUCs in the 12 microarray datasets when only one gene was retained in (A) the platelet activation function; (B) the chemotaxis, adhesion, migration function; (C) both processes. * and ** indicate DeLong test P-values below 0.05 and 0.01, respectively, for the comparison between the reduced panel with the median AUC and the original Stanford11.
Figure 6. The performances of 1,482 six-gene combinations. * and ** indicate DeLong test P-values below 0.05 and 0.01, respectively, for the comparison between the substituted panel with the median AUC and the original Stanford11.
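One way to enumerate six-gene panels of this kind is to draw one gene per biological process and discard combinations that reuse a gene. The sketch below is an assumption about the enumeration, not the authors' procedure: the pools contain only the genes that appear in the table of 22 optimized panels, so the count it produces is not the 1,482 reported in the figure.

```python
from itertools import product

# Gene pools per biological process, taken only from the genes that appear
# in the table of 22 optimized panels (the study's full pools are not shown).
pools = {
    "Chemotaxis, adhesion, migration": ["CCR1", "CD177", "CD63", "EMR1", "FCER1G", "FES", "C3AR1"],
    "Antigen processing, immune response": ["HLA-DPB1"],
    "Transcription by RNA pol II": ["BATF", "PLAC8"],
    "Platelet activation": ["C3AR1", "FCER1G", "GNA15"],
    "Apoptosis": ["ARHGEF18", "MTCH1"],
    "Metabolism": ["C9orf95", "C9orf103", "SEPHS2"],
}


def six_gene_panels(pools):
    """Yield one-gene-per-process panels, skipping panels with a repeated gene."""
    for combo in product(*pools.values()):
        if len(set(combo)) == len(combo):
            yield combo


panels = list(six_gene_panels(pools))
print(len(panels))  # 228 with these illustrative pools
```

The duplicate check matters because some genes (here C3AR1 and FCER1G) are listed under more than one process.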
Optimized six-gene panels (n = 22) with higher performance in the validation sets.
| 6-gene Panels | Chemotaxis, adhesion, migration | Antigen processing, immune response | Transcription by RNA pol II | Platelet activation | Apoptosis | Metabolism | GSE65682 | GSE74224 | E-MEXP-3589 |
|---|---|---|---|---|---|---|---|---|---|
| 1 | CCR1 | HLA-DPB1 | BATF | C3AR1 | ARHGEF18 | C9orf95 | 0.7807 | 0.8854 | 0.6633 |
| 2 | CCR1 | HLA-DPB1 | BATF | C3AR1 | MTCH1 | C9orf103 | 0.8170 | 0.9058 | 0.6990 |
| 3 | CCR1 | HLA-DPB1 | BATF | C3AR1 | MTCH1 | C9orf95 | 0.8095 | 0.8967 | 0.6786 |
| 4 | CCR1 | HLA-DPB1 | BATF | GNA15 | MTCH1 | C9orf95 | 0.8080 | 0.8893 | 0.6378 |
| 5 | CD177 | HLA-DPB1 | BATF | C3AR1 | MTCH1 | C9orf95 | 0.8014 | 0.8827 | 0.6684 |
| 6 | CD63 | HLA-DPB1 | PLAC8 | FCER1G | MTCH1 | C9orf95 | 0.8158 | 0.9128 | 0.6122 |
| 7 | CD63 | HLA-DPB1 | PLAC8 | GNA15 | ARHGEF18 | C9orf95 | 0.7852 | 0.9220 | 0.5663 |
| 8 | CD63 | HLA-DPB1 | PLAC8 | GNA15 | MTCH1 | C9orf95 | 0.8107 | 0.9333 | 0.5918 |
| 9 | CD63 | HLA-DPB1 | BATF | C3AR1 | MTCH1 | C9orf95 | 0.8080 | 0.9067 | 0.6633 |
| 10 | CD63 | HLA-DPB1 | BATF | GNA15 | MTCH1 | C9orf95 | 0.8029 | 0.8915 | 0.6378 |
| 11 | EMR1 | HLA-DPB1 | BATF | C3AR1 | MTCH1 | C9orf95 | 0.8149 | 0.8963 | 0.6582 |
| 12 | EMR1 | HLA-DPB1 | BATF | GNA15 | MTCH1 | C9orf95 | 0.8044 | 0.8836 | 0.6071 |
| 13 | FCER1G | HLA-DPB1 | PLAC8 | C3AR1 | MTCH1 | C9orf95 | 0.8080 | 0.9241 | 0.6531 |
| 14 | FCER1G | HLA-DPB1 | PLAC8 | GNA15 | ARHGEF18 | C9orf95 | 0.7861 | 0.9098 | 0.5408 |
| 15 | FCER1G | HLA-DPB1 | PLAC8 | GNA15 | MTCH1 | C9orf95 | 0.8092 | 0.9185 | 0.5816 |
| 16 | FCER1G | HLA-DPB1 | BATF | C3AR1 | MTCH1 | SEPHS2 | 0.8086 | 0.8867 | 0.6582 |
| 17 | FCER1G | HLA-DPB1 | BATF | C3AR1 | MTCH1 | C9orf95 | 0.8071 | 0.8963 | 0.6480 |
| 18 | FES | HLA-DPB1 | PLAC8 | FCER1G | MTCH1 | C9orf95 | 0.8089 | 0.8945 | 0.6173 |
| 19 | FES | HLA-DPB1 | PLAC8 | GNA15 | ARHGEF18 | C9orf103 | 0.7870 | 0.9098 | 0.6122 |
| 20 | FES | HLA-DPB1 | PLAC8 | GNA15 | MTCH1 | C9orf95 | 0.8005 | 0.9150 | 0.6071 |
| 21 | FES | HLA-DPB1 | BATF | C3AR1 | MTCH1 | C9orf95 | 0.8026 | 0.8806 | 0.6786 |
| 22 | C3AR1 | HLA-DPB1 | BATF | GNA15 | MTCH1 | C9orf95 | 0.7999 | 0.9133 | 0.6480 |
Among the six-gene panels (n = 73) that perform above the lower bound of the 95% confidence interval of the original Stanford11 panel in all discovery datasets, 22 panels have even higher performance in two independent datasets.
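The selection rule in this note (a panel must exceed the lower bound of the reference panel's 95% confidence interval in every discovery dataset) can be sketched as below. The dataset names, panel names, and AUC values are hypothetical placeholders, not results from the study.

```python
# Hypothetical values for illustration only; these are not study results.
# Lower bound of the reference panel's 95% CI for the AUC, per dataset.
reference_ci_lower = {"discovery_A": 0.80, "discovery_B": 0.75}

# AUC of each candidate panel in the same datasets.
panel_aucs = {
    "panel_1": {"discovery_A": 0.85, "discovery_B": 0.78},
    "panel_2": {"discovery_A": 0.82, "discovery_B": 0.70},
}


def passes_all(aucs, lower_bounds):
    """True if the panel's AUC exceeds the reference lower bound in every dataset."""
    return all(aucs[name] > bound for name, bound in lower_bounds.items())


selected = [
    panel for panel, aucs in panel_aucs.items() if passes_all(aucs, reference_ci_lower)
]
print(selected)  # ['panel_1']
```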
Genes used in evaluation of the impact of biological function information.
| Stanford11 | Panel-HC (the 11 highest correlated genes) | CC | Stanford82 | Chemotaxis, adhesion, migration processes (14 genes) | SVM-RFE (10 genes) | LR-LASSO (6 genes) | k-Top Scoring Pairs (6 genes) |
|---|---|---|---|---|---|---|---|
| BATF | DDAH2 | 0.7642 | | C3AR1, CD177, FCER1G, CEACAM1, ADGRE1, CCR1, TGFBI, SIGLEC9, CD63, PSTPIP2, FES, ANXA3, IL10, RETN | BATF, TGFBI, GNA15, C9orf95, MTCH1, C3AR1, ZDHHC19, KIAA1370, RPGRIP1, CEACAM1 | BATF, C3AR1, C9orf95, GNA15, MTCH1, TGFBI | TGFBI - C3AR1 |
| C3AR1 | SQRDL | 0.5393 | √ | ||||
| C9orf95 | PTPN22 | 0.4870 | |||||
| CEACAM1 | GPR84 | 0.8169 | √ | ||||
| GNA15 | FERMT3 | 0.5874 | |||||
| ZDHHC19 | GPR84 | 0.7408 | √ | ||||
| HLA-DPB1 | HLA-DMB | 0.7329 | |||||
| KIAA1370 | KIAA1468 | 0.6517 | |||||
| MTCH1 | CDK5RAP3 | 0.6607 | |||||
| RPGRIP1 | NOV | 0.5266 | |||||
| TGFBI | CPVL | 0.8078 | |||||
A √ in the Stanford82 column indicates that the gene belongs to Stanford82. CC indicates the Pearson correlation coefficient between the genes.
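For reference, the Pearson correlation coefficient used in the CC column, and the idea of pairing a gene with its highest correlated partner, can be sketched as follows. The expression vectors and the names `gene_a` and `gene_b` are toy placeholders, and `highest_correlated` is a hypothetical helper, not a function from the study.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def highest_correlated(target, others):
    """Return (name, coefficient) of the profile most correlated with target."""
    scored = ((name, pearson(target, values)) for name, values in others.items())
    return max(scored, key=lambda item: item[1])


# Toy expression profiles across five samples (placeholders, not study data).
target_profile = [1.0, 2.0, 3.0, 4.0, 5.0]
other_profiles = {
    "gene_a": [2.0, 4.0, 6.0, 8.0, 10.5],
    "gene_b": [5.0, 3.0, 4.0, 1.0, 2.0],
}

best_name, best_cc = highest_correlated(target_profile, other_profiles)
print(best_name, round(best_cc, 4))
```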
Figure 7. Evaluation of the impact of biological function information. The importance of biological function information was evaluated using five different approaches. (A) The AUCs of the Panel-HC. (B) The AUCs of the Panel-AM. (C) The AUCs of the Panel-SVM. (D) The AUCs of the Panel-LR. (E) The AUCs of the Panel-kTSP. * and ** indicate DeLong test P-values below 0.05 and 0.01, respectively.