Dong Wang¹, Jie Li², Rui Liu¹, Yadong Wang¹.
Abstract
BACKGROUND: With the rapid accumulation of genomic data, annotating and interpreting these data has become a challenging issue. As a representative approach, gene set enrichment analysis has been widely used to interpret large molecular datasets generated by biological experiments. The results of gene set enrichment analysis rely heavily on the quality and integrity of gene set annotations. Although several methods have been developed to annotate gene sets, high-quality annotation methods are still lacking. Here, we propose a novel method that improves annotation accuracy by combining the GO structure with gene expression data.
Keywords: Clustering algorithm; Gene ontology; Gene set annotation; Gene set enrichment
Year: 2018 PMID: 30598093 PMCID: PMC6311910 DOI: 10.1186/s12918-018-0659-6
Source DB: PubMed Journal: BMC Syst Biol ISSN: 1752-0509
Fig. 1 Overview of the proposed method. Cell line and cancer gene expression data are used in the experiments. All gene expression data are normalized by Z-score. Differentially expressed (DE) genes are selected as the genes whose annotations are to be optimized. The dRW algorithm is applied to the gene annotation data to incorporate GO structure information. The normalized gene expression data and the gene annotations containing GO structure information are then fed into the gene set annotation optimization algorithm. Gene set annotation optimization: using the gene expression data, the clustering algorithm divides genes into sets whose members are assigned similar annotations. Finally, the proposed method and state-of-the-art methods are evaluated using three metrics.
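The Z-score normalization step can be sketched as follows. This is a minimal pure-Python sketch, assuming a genes × samples matrix that is normalized per gene (row) with the population standard deviation; the caption does not say along which axis or with which estimator the authors normalize, and the function name `zscore_rows` is hypothetical.

```python
from statistics import mean, pstdev


def zscore_rows(matrix):
    """Z-score each gene (row) across samples (columns).

    Assumption: rows are genes, columns are samples, and the population
    standard deviation is used. This is an illustration, not the authors' code.
    """
    normalized = []
    for row in matrix:
        mu = mean(row)
        sd = pstdev(row)
        # A constant row has no variation; map it to zeros instead of dividing by 0.
        normalized.append([0.0 if sd == 0 else (x - mu) / sd for x in row])
    return normalized


# Hypothetical expression values for two genes measured in three samples.
expression = [
    [2.0, 4.0, 6.0],  # gene A
    [5.0, 5.0, 5.0],  # gene B (constant across samples)
]
print(zscore_rows(expression))
```

After this transform each non-constant gene has mean 0 and unit variance across samples, so genes measured on different scales become comparable before clustering.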
Fig. 2 Enrichment FDR values calculated using unoptimized annotations, EMVC-optimized annotations, and annotations optimized by the proposed method on p53 gene expression data. The proportion of each color indicates how many minimum FDR values each annotation achieves when the two methods are compared.
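The comparison in the figure amounts to computing an FDR value per gene set under each annotation version and counting which version gives the lowest one. The caption does not say how the FDR values are obtained (GSEA, for instance, typically uses permutation-based FDR), so this sketch swaps in Benjamini-Hochberg adjustment of per-gene-set p-values as an assumption. The labels, p-values, and function names are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (FDR q-values), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so q-values stay monotone in p.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def count_minimum_fdr(fdr_by_annotation):
    """For each annotation version, count the gene sets where it has the lowest FDR.

    fdr_by_annotation maps a label to FDR values aligned by gene-set position.
    Ties credit every tied version.
    """
    labels = list(fdr_by_annotation)
    n_sets = len(fdr_by_annotation[labels[0]])
    counts = {label: 0 for label in labels}
    for j in range(n_sets):
        best = min(fdr_by_annotation[label][j] for label in labels)
        for label in labels:
            if fdr_by_annotation[label][j] == best:
                counts[label] += 1
    return counts


# Hypothetical enrichment p-values for four gene sets under each annotation version.
pvals = {
    "unoptimized": [0.010, 0.040, 0.030, 0.200],
    "EMVC": [0.030, 0.001, 0.060, 0.100],
    "proposed": [0.002, 0.020, 0.010, 0.150],
}
fdr = {label: benjamini_hochberg(p) for label, p in pvals.items()}
print(count_minimum_fdr(fdr))
```

The counts per label are what the color proportions in such a figure would summarize; with real data the p-values would come from the enrichment test itself.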
Kendall’s W and AUC on MSigDB C2 v1.0 and p53 gene expression data
| Metric | The proposed method | The EMVC algorithm |
|---|---|---|
| Kendall’s W | 0.325 | 0.322 |
| AUC | 0.515 | 0.489 |
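Kendall’s W (the coefficient of concordance) measures agreement among m rankings of the same n items, from 0 (no agreement) to 1 (complete agreement): W = 12S / (m²(n³ − n)), where S is the sum of squared deviations of the items’ rank sums from their mean m(n + 1)/2. The excerpt does not say what the authors rank, so this sketch only shows the statistic itself, without the tie correction term; the helper names are hypothetical.

```python
def rank_with_ties(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kendalls_w(score_rows):
    """Kendall's coefficient of concordance W (no tie correction).

    score_rows: m lists ("raters"), each scoring the same n items.
    """
    m = len(score_rows)
    n = len(score_rows[0])
    ranked = [rank_with_ties(row) for row in score_rows]
    rank_sums = [sum(r[i] for r in ranked) for i in range(n)]
    mean_sum = m * (n + 1) / 2
    s = sum((r - mean_sum) ** 2 for r in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))


# Hypothetical scores from three raters for four items.
scores = [
    [0.9, 0.7, 0.2, 0.1],
    [0.8, 0.6, 0.3, 0.2],
    [0.7, 0.9, 0.1, 0.3],
]
print(kendalls_w(scores))
```

Because W depends only on ranks, ranking in ascending or descending order gives the same value as long as every rater is ranked the same way.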
Fig. 3 Enrichment FDR values calculated using unoptimized annotations, EMVC-optimized annotations, and annotations optimized by the proposed method on colon cancer gene expression data. The proportion of each color indicates how many minimum FDR values each annotation achieves when the two methods are compared.
Kendall’s W and AUC on MSigDB C4 v6.0 and colon cancer gene expression data
| Metric | The proposed method | The EMVC algorithm |
|---|---|---|
| Kendall’s W | 0.975 | 0.975 |
| AUC | 0.642 | 0.622 |
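AUC is the area under the ROC curve; equivalently, it is the probability that a randomly chosen positive item is scored above a randomly chosen negative one, with ties counted as one half, so an AUC of 0.5 corresponds to a random ordering. The excerpt does not state which gene sets the authors treat as positives, so the labels and scores below are hypothetical. The pairwise loop is O(P·N) and written for clarity rather than speed.

```python
def roc_auc(labels, scores):
    """ROC AUC via the Mann-Whitney formulation.

    labels: 1 for positive, 0 for negative; scores: higher means more likely positive.
    Each positive/negative pair contributes 1 if the positive scores higher,
    0.5 on a tie, and 0 otherwise.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical labels and scores for four gene sets.
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```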
Fig. 4 Enrichment FDR values calculated using unoptimized annotations, EMVC-optimized annotations, and annotations optimized by the proposed method on breast cancer subtype gene expression data. The proportion of each color indicates how many minimum FDR values each annotation achieves when the two methods are compared.
AUC on MSigDB C4 v6.0 and breast cancer subtype gene expression data
| AUC | HER2-enriched subtype | Luminal A subtype | Luminal B subtype | Normal-like subtype |
|---|---|---|---|---|
| The proposed method | 0.570 | 0.617 | 0.646 | 0.563 |
| The EMVC algorithm | 0.540 | 0.614 | 0.635 | 0.562 |
Kendall’s W on MSigDB C4 v6.0 and breast cancer subtype gene expression data
| Kendall’s W | HER2-enriched subtype | Luminal A subtype | Luminal B subtype | Normal-like subtype |
|---|---|---|---|---|
| The proposed method | 0.978 | 0.980 | 0.984 | 0.963 |
| The EMVC algorithm | 0.979 | 0.980 | 0.985 | 0.961 |
| Unoptimized annotations | 0.969 | 0.969 | 0.978 | 0.935 |