Jing Wang, Bobbie-Jo M Webb-Robertson, Melissa M Matzke, Susan M Varnum, Joseph N Brown, Roderick M Riensche, Joshua N Adkins, Jon M Jacobs, John R Hoidal, Mary Beth Scholand, Joel G Pounds, Michael R Blackburn, Karin D Rodland, Jason E McDermott.
Abstract
BACKGROUND: The availability of large, complex data sets generated by high-throughput technologies has enabled the recent proliferation of disease biomarker studies. However, a recurring problem in deriving biological information from large data sets is how best to incorporate expert knowledge into the biomarker selection process.
Year: 2013; PMID: 24223463; PMCID: PMC3809975; DOI: 10.1155/2013/613529
Source DB: PubMed; Journal: Dis Markers; ISSN: 0278-0240; Impact factor: 3.434
Figure 1: Flowchart of the ISIC framework in a biosignature discovery process.
Table 1. Optimal integrated CAs derived from (A) the distance-based hierarchical clustering and (B) the disease-model-related functional selection approaches.

(A) Clustering based on different distance matrices

| Sample | Data set | 1 cluster | Expression profiles, 6 clusters | Expression profiles, 12 clusters | Functional relationships, 6 clusters | Functional relationships, 12 clusters | Combination of the two, 6 clusters | Combination of the two, 12 clusters |
|---|---|---|---|---|---|---|---|---|
| BALF | Full | 0.83 | 0.86 | 0.79 | 0.93 | 0.80 | 0.81 | 0.81 |
| BALF | Partial | 0.93 | 0.96 | 0.96 | | | | |
| Plasma | Full | 0.66 | 0.68 | 0.54 | 0.62 | | | |
| Plasma | Partial | 0.77 | 0.79 | | | | | |

(B) Disease-model-related functional selection

| Sample | Data set | All proteins¹, 1 cluster | All proteins¹, 12 clusters | Top 3 proteins¹, 1 cluster | Top 3 proteins¹, 12 clusters |
|---|---|---|---|---|---|
| BALF | Full | 0.81 | 0.90 | 0.88 | 0.82 |
| BALF | Partial | 0.93 | | | |
| Plasma | Full | 0.57 | 0.56 | 0.59 | 0.65 |
| Plasma | Partial | 0.73 | | | |

¹ The number of significantly changed proteins used in each cluster: all proteins or the top 3 proteins.
Table 2. Enriched general functional groups from the Ada-deficient model of COPD, extracted by the expert knowledge-driven functional analysis of the BALF data. The list is based on (A) a GO-based biological process enrichment and (B) the disease-model-related expert selection.

| No. | Enriched general functional group | (A) GO-based biological process | (B) Disease-model-related GO cluster |
|---|---|---|---|
| 1 | Immune system process | (1) Immune system process | (13-1)¹ Immune system process |
| 2 | Stress/stimulus response | (2) Response to stimulus | (1) Response to stimulus; |
| 3 | Cellular response to stimulus | (3) Cellular process | (3) Cellular response to stimulus |
| 4 | Metabolic process | (4) Metabolic process | (4) Small molecule metabolic process; |
| 5 | Biological regulation | (5) Biological regulation | (9) Regulation of immune system process; |
| 6 | Death | (6) Death | (12) Death |
| 7 | Localization | (7) Localization; | (13-2)¹ Localization; |
| 8 | Cellular organization | (9) Cellular component organization or biogenesis | (13-4)¹ Cellular component organization or biogenesis |
| 9 | Proliferation | (10) Cell proliferation; | |
| 10 | Others | (13) Reproduction and reproductive process; | (13-1)¹ Immune system process; |

¹ This term belongs to the 13th cluster (others) of approach B.
Table 3. Validation results (CA) for the cluster-based biomarker candidates using a human plasma data set.

| Functional group no. | Mouse plasma-defined clusters: CA in mouse plasma | Mouse plasma-defined clusters: CA in human plasma | Mouse BALF-defined clusters: CA in mouse BALF | Mouse BALF-defined clusters: CA in human plasma |
|---|---|---|---|---|
| 1 | 0.54 | 0.93 | 0.79 | 0.93 |
| 2 | 0.58 | 0.86 | 0.93 | 0.79 |
| 3 | 0.56 | 0.71 | 0.72 | 0.71 |
| 4¹ | 0.83 | 0.71 | 0.79 | 0.79 |
| 5 | 0.63 | 0.79 | 0.83 | 0.64 |
| 6 | 0.53 | 0.79 | 0.62 | 0.64 |
| 1, 4 | 0.80 | 0.86 | | |
| 2, 5¹ | 1.00 | 0.71 | | |
| 1–6 | 0.83 | 0.81 | | |

¹ The best-performing clusters from the indicated mouse data set.
Table 4. Validation results (CA) for the individual COPD biomarker candidates in the human data, together with the CAs from the mouse data.

| Individual protein or protein panel | Optimal CA in mouse BALF | Optimal CA in mouse plasma | Optimal CA in human plasma | General functional groups¹ |
|---|---|---|---|---|
| Prothrombin, THRB | 0.86 | 0.50 | 0.93 | 1–10 |
| Vitamin D-binding protein, VTDB | 0.69 | 0.63 | 0.86 | 2–10 |
| Complement C3, CO3 | 0.69 | 0.67 | 0.79 | 1–10 |
| Adiponectin, ADIPO | 0.66 | 0.53 | 0.64 | 1–9 |
| THRB; VTDB | 0.97 | 0.57 | 0.93 | |
| THRB; CO3 | 0.86 | 0.70 | 0.93 | |
| VTDB; CO3 | 0.83 | 0.67 | 0.79 | |
| THRB; CO3; ADIPO | 0.86 | 0.70 | 1.00 | |
| VTDB; CO3; ADIPO | 0.83 | 0.70 | 0.93 | |
| THRB; VTDB; CO3; ADIPO | | | | |

¹ The numbers refer to the general functional groups listed in Table 2: 1, immune system process; 2, stress/stimulus response; 3, cellular response to stimulus; 4, metabolic process; 5, biological regulation; 6, death; 7, localization; 8, cellular organization; 9, proliferation; 10, others.
Figure 2: Bar graph of the average fold changes in protein abundance in the diseased group relative to controls for the four potential biomarkers identified in mouse BALF. Positive fold changes indicate upregulation in the diseased group (the Ada −/− mice), and negative fold changes indicate downregulation. Significance is marked with two asterisks (0.001 < P < 0.01), one asterisk (0.01 < P < 0.05), or no asterisk (P > 0.05). The dashed arrow for ADIPO indicates that this protein was present in the BALF of the Ada −/− mice but absent in the Ada +/− group. Mouse data from the last two time points (days 38 and 42) were used for this analysis.
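The caption does not state how the fold changes or P values were computed. The sketch below assumes a log2 ratio of group means (a common convention that the source does not confirm) and applies the asterisk thresholds given in the caption; the function names `log2_fold_change` and `significance_stars` are hypothetical. Because the caption defines no separate level for P < 0.001, the sketch assigns two asterisks to any P ≤ 0.01, which is also an assumption.

```python
import math
from statistics import mean


def log2_fold_change(diseased, control):
    """Hypothetical: log2(mean of diseased group / mean of control group).

    Positive values mean upregulation in the diseased (Ada -/-) group and
    negative values mean downregulation, matching the sign convention of
    Figure 2. The log2 scale itself is an assumption.
    """
    m_d, m_c = mean(diseased), mean(control)
    if m_c == 0:
        # Detected only in the diseased group (cf. ADIPO): no finite ratio exists.
        return math.inf if m_d > 0 else math.nan
    if m_d == 0:
        # Detected only in the control group.
        return -math.inf
    return math.log2(m_d / m_c)


def significance_stars(p):
    """Map a P value to the asterisk code described in the Figure 2 caption.

    Assumption: P <= 0.01 (including P < 0.001) gets two asterisks,
    because the caption defines only two significance levels.
    """
    if p > 0.05:
        return ""
    if p > 0.01:
        return "*"
    return "**"


if __name__ == "__main__":
    # Toy abundances, not data from the study.
    print(log2_fold_change([4.0, 4.0], [1.0, 1.0]), significance_stars(0.03))
```

Returning infinity when the control mean is zero mirrors the dashed-arrow case for ADIPO, where no finite fold change can be drawn.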
Figure 3: Comparison of the optimal CAs between BALF (solid lines) and plasma (dashed lines) from the Ada-deficient mice over the course of disease development. The CAs were derived from the cumulatively (blue) and individually (green) significantly changed proteins at different time points, and were obtained from the clusters generated with the joint distance matrix of protein expression patterns and their functional relationships (XOA).
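The caption does not specify how the expression and functional distances were combined into the joint (XOA) matrix. Purely for illustration, the sketch below assumes a weighted average of a correlation distance between expression profiles and a Jaccard distance between GO annotation sets, followed by naive average-linkage hierarchical clustering stopped at k clusters. The weight, both distance measures, and all function names are hypothetical and are not presented as the authors' method.

```python
import math
from itertools import combinations


def correlation_distance(x, y):
    """1 - Pearson correlation between two expression profiles (range 0..2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 1.0  # constant profile: treat as uncorrelated
    return 1.0 - sxy / math.sqrt(sxx * syy)


def jaccard_distance(a, b):
    """1 - Jaccard similarity between two sets of functional (e.g. GO) annotations."""
    union = a | b
    if not union:
        return 1.0
    return 1.0 - len(a & b) / len(union)


def joint_distance(profiles, annotations, weight=0.5):
    """Hypothetical joint matrix: weighted mean of expression and functional distances.

    The correlation distance is halved so that both terms lie in [0, 1].
    """
    names = sorted(profiles)
    dist = {}
    for p, q in combinations(names, 2):
        d_expr = correlation_distance(profiles[p], profiles[q]) / 2.0
        d_func = jaccard_distance(annotations[p], annotations[q])
        dist[frozenset((p, q))] = weight * d_expr + (1.0 - weight) * d_func
    return names, dist


def average_linkage(names, dist, k):
    """Naive agglomerative clustering (average linkage) stopped at k clusters."""
    clusters = [frozenset([n]) for n in names]

    def linkage(c1, c2):
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        # Merge the pair of clusters with the smallest average pairwise distance.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        merged = clusters[i] | clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return sorted(sorted(c) for c in clusters)


if __name__ == "__main__":
    # Toy proteins with made-up time-course profiles and annotations.
    profiles = {"P1": [1, 2, 3, 4], "P2": [2, 4, 6, 8],
                "P3": [4, 3, 2, 1], "P4": [8, 6, 4, 2]}
    annotations = {"P1": {"GO:a"}, "P2": {"GO:a"}, "P3": {"GO:b"}, "P4": {"GO:b"}}
    names, dist = joint_distance(profiles, annotations)
    print(average_linkage(names, dist, k=2))
```

In this sketch, `weight=1.0` reduces the joint matrix to expression profiles alone and `weight=0.0` to functional relationships alone, loosely paralleling the three distance matrices compared in Table 1 (A).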