Sirajul Salekin¹, Mehrab Ghanat Bari², Itay Raphael³, Thomas G Forsthuber³, Jianqiu Michelle Zhang⁴
Abstract
BACKGROUND: Biologists developing biomarkers for early disease diagnosis benefit greatly from identifying disease-correlated features early. Ideally this happens before disease progression has caused significant abundance changes in a large number of molecules. At early stages, however, disease-correlated features show only relatively small abundance changes. Finding them in high-throughput data with existing bioinformatic tools is therefore challenging, because the technology suffers from limited dynamic range and significant noise. Most existing biomarker discovery algorithms can detect only molecules with large abundance changes, so they frequently miss early disease diagnostic markers.
Keywords: Biomarker discovery; Disease correlated features; Early stage of disease; Feature selection; Gene/protein expression change; Multiple Sclerosis
Year: 2017 PMID: 28645323 PMCID: PMC5481992 DOI: 10.1186/s12859-017-1712-y
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Protein expression time profiles across days for (a) proteins identified by ERI at day 5; (b) proteins identified as significant by SAM and Localfdr at day 10 but not significant according to ERI at day 5; and (c) proteins identified as significant by SAM and Localfdr at day 5
Summary of clinical datasets used in this study
| Dataset | Genes | Samples (class sizes) | No. of features (ERI) | No. of features (SAM) | No. of features (Localfdr) | Source |
|---|---|---|---|---|---|---|
| GSE14333 | 54675 | 138/91 | 0 | 2 | 0 | [ |
| GSE27854 | 54675 | 57/58 | 0 | 0 | 0 | [ |
| CNS | 7129 | 21/39 | 4 | 0 | 2 | [ |
| Colon Cancer | 2000 | 40/22 | 3 | 46 | 8 | [ |
| GLI-85 | 22283 | 26/59 | 51 | 1458 | 1198 | [ |
| Lung Cancer | 7129 | 24/62 | 0 | 0 | 0 | [ |
| Prostate Cancer | 10509 | 50/52 | 11 | 946 | 769 | [ |
| SMK-CAN-187 | 19993 | 90/97 | 8 | 289 | 271 | [ |
| Breast Cancer | 22283 | 138/71 | 5 | 14 | 2 | [ |
Fig. 2 Feature selection flow diagram of the ERI algorithm
Fig. 3 Overlap of discovered significant proteins when selecting (a) 200 vs. 300 features and (b) 300 vs. 400 features in the pre-filtering step
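The overlap comparisons in the figure captions come down to counting set intersections and differences, as in a two-set Venn diagram. The sketch below shows this under stated assumptions. The scoring used by ERI's pre-filter is not described here, so `top_k` ranks toy scores as a stand-in. In the paper, the compared sets are the significant proteins found downstream of each pre-filter size. The function names and protein IDs are hypothetical.

```python
from typing import Dict, Set

def top_k(scores: Dict[str, float], k: int) -> Set[str]:
    """Hypothetical pre-filter: keep the k features with the highest score.

    Ties are broken by feature name so the result is deterministic.
    """
    ranked = sorted(scores, key=lambda f: (-scores[f], f))
    return set(ranked[:k])

def overlap_summary(a: Set[str], b: Set[str]) -> Dict[str, int]:
    """Region counts of a two-set Venn diagram."""
    return {
        "only_a": len(a - b),  # in a but not in b
        "both": len(a & b),    # shared by both sets
        "only_b": len(b - a),  # in b but not in a
    }

# Toy scores for six hypothetical proteins (not data from the paper).
scores = {"P1": 0.9, "P2": 0.8, "P3": 0.7, "P4": 0.4, "P5": 0.3, "P6": 0.1}
small = top_k(scores, 3)  # smaller pre-filter selection
large = top_k(scores, 4)  # larger pre-filter selection
print(overlap_summary(small, large))
# {'only_a': 0, 'both': 3, 'only_b': 1}
```

A larger overlap between the two settings would suggest that the result is not very sensitive to the pre-filter size.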
Fig. 4 Overlap of discovered significant proteins (a) between ERI and SAM at day 5; (b) between ERI and Localfdr at day 5; (c) between ERI at day 5 and SAM and Localfdr at day 10; (d) between SAM at day 5 and SAM at day 10; (e) between Localfdr at day 5 and Localfdr at day 10; and (f) between day 5 and day 10 for ERI, SAM and Localfdr combined
Fig. 5 Relative protein abundance change between day 0 and day 5, for proteins with a significant ERI score and for proteins without a significant ERI score that SAM and Localfdr picked up on day 10
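The exact definition of "relative abundance change" is not given in this record. The sketch below therefore assumes the common form (day-5 abundance minus day-0 abundance) divided by day-0 abundance. The protein names and abundance values are hypothetical and serve only to show the calculation.

```python
from typing import Dict, Tuple

def relative_change(day0: float, day5: float) -> float:
    """Assumed definition: (day5 - day0) / day0, not taken from the paper."""
    if day0 == 0:
        raise ValueError("day-0 abundance must be non-zero")
    return (day5 - day0) / day0

# Hypothetical (day 0, day 5) abundances for two toy proteins.
profiles: Dict[str, Tuple[float, float]] = {
    "protein_A": (100.0, 120.0),  # small early change
    "protein_B": (80.0, 80.0),    # no early change
}
changes = {name: relative_change(d0, d5) for name, (d0, d5) in profiles.items()}
print(changes)
# {'protein_A': 0.2, 'protein_B': 0.0}
```

A log2 ratio, log2(day5 / day0), is a common alternative when changes span several orders of magnitude.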
Number of significant proteins identified by the three methods on different days of the EAE dataset
| Method | Day 5 | Day 7 | Day 10 | Day 15 | Day 20 | Day 25 |
|---|---|---|---|---|---|---|
| ERI | 73 | 5 | 2 | 13 | 23 | 38 |
| SAM | 35 | 106 | 191 | 17 | 27 | 18 |
| Localfdr | 18 | 219 | 152 | 5 | 21 | 7 |
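As a quick reading aid, the counts in the table above can be scanned for the day on which each method reports the most significant proteins. This is a minimal sketch. `DAYS`, `COUNTS` and `peak_day` are illustrative names, and the numbers are copied from the table.

```python
from typing import Dict, List

# Counts copied from the table above: significant proteins per method and day.
DAYS: List[int] = [5, 7, 10, 15, 20, 25]
COUNTS: Dict[str, List[int]] = {
    "ERI": [73, 5, 2, 13, 23, 38],
    "SAM": [35, 106, 191, 17, 27, 18],
    "Localfdr": [18, 219, 152, 5, 21, 7],
}

def peak_day(counts: List[int]) -> int:
    """Return the day on which a method reports the most significant proteins."""
    best = max(range(len(counts)), key=lambda i: counts[i])
    return DAYS[best]

for method, counts in COUNTS.items():
    print(f"{method}: peak at day {peak_day(counts)} ({max(counts)} proteins)")
# ERI: peak at day 5 (73 proteins)
# SAM: peak at day 10 (191 proteins)
# Localfdr: peak at day 7 (219 proteins)
```

ERI reports its largest set at day 5, whereas SAM and Localfdr peak later, at day 10 and day 7 respectively.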