Jérôme Jourquin, Dexter Duncan, Zhiao Shi, Bing Zhang.
Abstract
BACKGROUND: Answering questions such as "Which genes are related to breast cancer?" usually requires retrieving relevant publications through the PubMed search engine, reading these publications, and creating gene lists. This process is not only time-consuming, but also prone to errors.
Year: 2012 PMID: 23282288 PMCID: PMC3535723 DOI: 10.1186/1471-2164-13-S8-S20
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Overall quality of the retrieved gene lists
| Query | GO/MIM gene count | GLAD4U gene count | EBIMed gene count | Metric | GLAD4U | EBIMed |
|---|---|---|---|---|---|---|
| | 1039 | 6037 (958) | 1469 (387) | Precision | 0.1587 | 0.2634 |
| | | [195715] | [10000] | Recall | 0.9220 | 0.3725 |
| | | | | F-measure | 0.2708 | 0.3086 |
| | 785 | 4195 (691) | 1725 (305) | Precision | 0.1647 | 0.1769 |
| | | [125144] | [10000] | Recall | 0.8802 | 0.3885 |
| | | | | F-measure | 0.2775 | 0.2431 |
| | 282 | 2476 (263) | 1100 (180) | Precision | 0.1062 | 0.1636 |
| | | [60952] | [10000] | Recall | 0.9326 | 0.6383 |
| | | | | F-measure | 0.1907 | 0.2605 |
| | 87 | 2046 (77) | 135 (27) | Precision | 0.0376 | 0.2000 |
| | | [323818] | [10000] | Recall | 0.8851 | 0.3103 |
| | | | | F-measure | 0.0721 | 0.2432 |
| | 111 | 1778 (110) | 350 (59) | Precision | 0.0619 | 0.1686 |
| | | [141615] | [10000] | Recall | 0.9910 | 0.5315 |
| | | | | F-measure | 0.1165 | 0.2560 |
| | 94 | 1725 (90) | 382 (44) | Precision | 0.0522 | 0.1152 |
| | | [91194] | [10000] | Recall | 0.9574 | 0.4681 |
| | | | | F-measure | 0.0990 | 0.1849 |
Numbers in parentheses indicate the number of genes shared between the GLAD4U or EBIMed list and the corresponding gold standard; numbers in square brackets indicate the number of publications retrieved by the query.
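The precision, recall, and F-measure values in the table can be reproduced from the gene counts alone. The sketch below assumes the standard definitions (precision = overlap / retrieved, recall = overlap / gold standard, F-measure = harmonic mean of the two); the function name `precision_recall_f` is illustrative and not part of GLAD4U.

```python
def precision_recall_f(overlap, retrieved, gold):
    """Return (precision, recall, F-measure) from raw gene counts.

    overlap   -- genes shared by the retrieved list and the gold standard
    retrieved -- total genes returned by the tool
    gold      -- total genes in the gold-standard list
    """
    precision = overlap / retrieved if retrieved else 0.0
    recall = overlap / gold if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # Harmonic mean of precision and recall (assumed F1 definition).
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# First row of the table: 1039 gold-standard genes,
# 6037 genes retrieved by GLAD4U, 958 of them in the gold standard.
p, r, f = precision_recall_f(overlap=958, retrieved=6037, gold=1039)
print(f"{p:.4f} {r:.4f} {f:.4f}")  # 0.1587 0.9220 0.2708
```

Applying the same function to the EBIMed counts of that row (387 shared genes out of 1469 retrieved) gives the EBIMed column.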
Figure 1. Precision/recall curves for different prioritization methods. Precision/recall curves for GLAD4U Counts, GLAD4U Hypergeometric and EBIMed are colored in black, red, and green, respectively. Dashed lines correspond to the precision levels of 0.8 and 0.5.
Comparison of different prioritization methods
| | Apoptosis | Cell Adhesion | DNA Repair | Hypertension | Obesity | Schizophrenia |
|---|---|---|---|---|---|---|
| AP | 0.4939 | 0.4611 | 0.6670 | 0.3947 | 0.5698 | 0.4601 |
| Precision at | 0.8600 | 0.8000 | 0.8800 | 0.4800 | 0.7800 | 0.5400 |
| Precision at | 0.8300 | 0.7300 | 0.8100 | 0.3800 | 0.5500 | 0.4200 |
| AP | 0.4942 | 0.5723 | 0.8139 | 0.4564 | 0.4782 | 0.4280 |
| Precision at | 0.9400 | 0.9000 | 1.0000 | 0.5800 | 0.6200 | 0.4800 |
| Precision at | 0.9000 | 0.8500 | 0.9700 | 0.3900 | 0.5200 | 0.4400 |
| AP | 0.1567 | 0.1256 | 0.3517 | 0.1336 | 0.2673 | 0.2318 |
| Precision at | 0.6200 | 0.4800 | 0.8400 | 0.3137 | 0.5652 | 0.4423 |
| Precision at | 0.5980 | 0.4848 | 0.6700 | 0.2821 | 0.1586 | 0.3200 |
AP: Average Precision
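As a rough illustration of the two ranking metrics reported above, the sketch below computes precision at a cutoff and average precision for a ranked gene list. It rests on assumptions: AP is normalized by the number of gold-standard genes actually retrieved (one common convention; the exact definition used for the table is not given here), the cutoff `k` is a placeholder because the cutoffs of the "Precision at" rows are not shown, and the gene identifiers are toy values.

```python
def precision_at_k(ranked_genes, gold, k):
    """Fraction of the top-k ranked genes that are in the gold standard."""
    top = ranked_genes[:k]
    if not top:
        return 0.0
    # Divide by the number of genes actually available, which may be < k.
    return sum(gene in gold for gene in top) / len(top)


def average_precision(ranked_genes, gold):
    """Mean of the precision values measured at each gold-standard hit."""
    hits = 0
    total = 0.0
    for rank, gene in enumerate(ranked_genes, start=1):
        if gene in gold:
            hits += 1
            total += hits / rank  # precision at this hit's rank
    return total / hits if hits else 0.0


# Toy ranking (hypothetical identifiers): hits occur at ranks 2 and 4.
ranked = ["g1", "g2", "g3", "g4"]
gold_standard = {"g2", "g4"}
print(precision_at_k(ranked, gold_standard, k=2))  # 0.5
print(average_precision(ranked, gold_standard))    # 0.5
```

A ranking that places gold-standard genes earlier raises both values, which is why these metrics are used to compare prioritization methods rather than unranked gene lists.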
First 10 genes retrieved by GLAD4U and not listed in the gold standard lists
| Rank | Entrez-Gene ID | Score | PMIDs* |
|---|---|---|---|
| 41 | 4193 (MDM2) | 53.5212 | 21051655, 21051533, 20849854, 20849851, 20832750, 20822933, 20708156, 20659896, 20657550, 20644561 |
| 48 | 1432 (MAPK14) | 40.8288 | 20736797, 20573801, 20558744, 20473571, 20463961, 20430109, 20393480, 20345980, 20307495, 20299663 |
| 49 | 4609 (MYC) | 37.2798 | 20714214, 20598117, 20596624, 20573831, 20564213, 20515470, 20232342, 20071475, 19996270, 19966300 |
| 54 | 6774 (STAT3) | 35.2695 | 20562100, 20514402, 20507639, 20490331, 20459702, 20447714, 20213502, 20197401, 20164027, 20154216 |
| 77 | 5580 (PRKCD) | 23.3017 | 20548952, 20547768, 20471435, 20093486, 19932628, 19917613, 19875824, 19833733, 19808702, 19747914 |
| 78 | 29126 (CD274) | 23.1218 | 20636820, 20617899, 20587542, 20506224, 20445553, 20363965, 19916867, 19826049, 19811426, 19794071 |
| 79 | 142 (PARP1) | 22.9308 | 20940411, 20665026, 20644561, 20629644, 20564216, 20453000, 20388712, 20181890, 20177052, 20072652 |
| 86 | 406991 (MIR21) | 18.9856 | 20813833, 20515755, 20514462, 20447717, 20404348, 20372781, 20371612, 20346171, 20153722, 20148895 |
| 96 | 7295 (TXN) | 16.1886 | 20619274, 20430109, 20298786, 20103619, 19671194, 19566940, 19328186, 19120277, 18983687, 18848838 |
| 100 | 3621 (ING1) | 15.3784 | 19085961, 18836436, 18801192, 18691180, 18655775, 18533182, 18388957, 17585055, 17379210, 16607280 |
| 10 | 5972 (REN) | 61.9237 | 20925572, 20662730, 20577119, 20537141, 20429690, 20223792, 20160196, 19891555, 19673942, 19536175 |
| 12 | 3291 (HSD11B2) | 45.7032 | 20597806, 19811365, 19150652, 18837962, 18573267, 18178212, 17551100, 16872738, 16778331, 16109323 |
| 14 | 4879 (NPPB) | 36.9570 | 20713912, 20368210, 20350538, 20346360, 20234137, 20142024, 20113292, 20102554, 20087954, 20083731 |
| 17 | 4524 (MTHFR) | 32.2080 | 21072525, 21060006, 20960113, 20852445, 20812180, 20717043, 20669348, 20637366, 20592457, 20479155 |
| 19 | 1401 (CRP) | 31.9446 | 21044781, 20805569, 20733302, 20683147, 20676960, 20346360, 20339115, 20184533, 20074254, 20068351 |
| 20 | 4878 (NPPA) | 31.6082 | 20577119, 20543198, 20368210, 20346360, 20137368, 19635983, 19479237, 19430483, 19346663, 19330901 |
| 21 | 155 (ADRB3) | 28.4824 | 20831043, 20144152, 20044737, 19842096, 19779464, 19479237, 19131662, 18724972, 18510051, 18088254 |
| 24 | 1584 (CYP11B1) | 24.8304 | 20708777, 20339375, 19820005, 19567537, 19082699, 18663314, 18294861, 17980006, 17296872, 17121536 |
| 27 | 59272 (ACE2) | 21.4649 | 20831027, 20813695, 20679547, 20349406, 20160196, 20117991, 19926873, 19684612, 19289653, 19286756 |
| 29 | 9370 (ADIPOQ) | 19.6898 | 21044781, 20593932, 20552610, 20528971, 20516205, 20443850, 20385503, 20376890, 20166815, 20150538 |
* Only the 10 most recent supporting publications are shown here. See additional files 1 and 2 for the complete list of false-positive genes and their corresponding supporting publications.
Figure 2. F-measure evaluations of GLAD4U before and after thresholding. F-measure evaluations of GLAD4U before and after thresholding, for each disease-associated gene list. A higher F-measure indicates better GLAD4U performance.
Figure 3. GLAD4U output page. A typical result page generated by a query with GLAD4U. The summary section presents the main statistics for the query, along with hyperlinked icons to download the results as an archive of all result pages ("compressed" icon), a CSV file ("Excel" icon), or a text file ("text" icon). Directly below the summary, a link is available to send the results for functional enrichment analysis. The main result section presents the prioritized genes. The user can click the "+" to show or hide the supporting publications, which are all hidden by default to make the gene information easier to read. The ADIPOQ gene is shown with its supporting publications as an example.