Rick Jordan, Shyam Visweswaran, Vanathi Gopalakrishnan.
Abstract
BACKGROUND: Computational methods for mining biomedical literature can augment manual keyword searches of the literature for disease-specific biomarker discovery in biofluids. In this work, we develop and apply a semi-automated literature mining method to abstracts obtained from PubMed in order to discover putative biomarkers of breast and lung cancers in specific biofluids.
Keywords: Biofluid; Biomarker; Breast cancer; Literature mining; Lung cancer; Text mining
Year: 2014 PMID: 25379168 PMCID: PMC4215335 DOI: 10.1186/2043-9113-4-13
Source DB: PubMed Journal: J Clin Bioinforma ISSN: 2043-9113
Figure 1. Semi-automated flowchart of the information retrieval process. Python scripts were written to process text files. ABNER was used for tagging biological entities, and the z-score calculation was performed using Microsoft Excel.
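Size of the abstract sets returned from queries of breast and lung cancer
| Biofluid | Breast cancer | Not breast cancer | Biofluid | Lung cancer | Not lung cancer |
| --- | --- | --- | --- | --- | --- |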
| Bile | 360 | 40,250 | Bile | 328 | 40,290 |
| Blood | 18,939 | 1,540,721 | Blood | 15,710 | 1,522,046 |
| Breastmilk | 1,047 | 17,874 | Breastmilk | 99 | 18,834 |
| CSF | 252 | 42,711 | CSF | 298 | 42,676 |
| Mucus | 116 | 25,122 | Mucus | 1,445 | 23,801 |
| Plasma | 4,327 | 342,415 | Plasma | 3,227 | 343,678 |
| Saliva | 149 | 22,694 | Saliva | 86 | 22,770 |
| Semen | 40 | 12,956 | Semen | 9 | 12,989 |
| Serum | 7,410 | 415,218 | Serum | 6,029 | 412,897 |
| SF | 18 | 7,699 | SF | 18 | 7,671 |
| Stool | 123 | 37,574 | Stool | 90 | 37,619 |
| Sweat | 321 | 11,079 | Sweat | 88 | 11,673 |
| Tears | 40 | 11,651 | Tears | 10 | 11,673 |
| Urine | 1,154 | 125,462 | Urine | 918 | 86,776 |
| Total | 34,296 | 2,653,396 | Total | 28,355 | 2,595,034 |
CSF = cerebrospinal fluid; SF = synovial fluid.
Figure 2. Number of markers identified across the range of possible z-scores. Decreasing the z-score threshold allows more markers to be identified as significant.
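The relationship between threshold and marker count described in the Figure 2 caption can be illustrated with a minimal sketch. It assumes a standard z-score (count minus mean, divided by the population standard deviation) computed over per-marker mention counts; the source does not state the exact formula used in Excel, so this is an assumption. The marker names and counts are hypothetical placeholders, not data from the study.

```python
from statistics import mean, pstdev


def z_scores(counts):
    """Return a z-score per marker from raw mention counts.

    Assumption: z = (count - mean) / population standard deviation.
    The source does not specify which variant was used.
    """
    values = list(counts.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All counts are identical, so no marker stands out.
        return {marker: 0.0 for marker in counts}
    return {marker: (n - mu) / sigma for marker, n in counts.items()}


def count_above(scores, threshold):
    """Count markers whose z-score is strictly greater than the threshold."""
    return sum(1 for z in scores.values() if z > threshold)


# Hypothetical mention counts for five placeholder markers.
example_counts = {
    "MARKER_A": 40,
    "MARKER_B": 25,
    "MARKER_C": 12,
    "MARKER_D": 5,
    "MARKER_E": 3,
}

scores = z_scores(example_counts)

# Sweep thresholds from strict to lenient; the count can only grow or stay equal.
sweep = {t: count_above(scores, t) for t in (2.0, 1.0, 0.0, -1.0)}
```

Because the comparison is a strict "greater than", lowering the threshold never removes a marker from the selected set, which is why the curve in a plot like Figure 2 is monotonic.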
Number of markers identified for each disease-biofluid combination
| Biofluid | Total markers | Known markers | Significant markers | Known significant markers | Novel significant markers | % novel |
| --- | --- | --- | --- | --- | --- | --- |
| Breast cancer | | | | | | |
| Bile | 200 | 26 | 58 | 7 | 51 | 87.93 |
| Blood | 2084 | 150 | 196 | 9 | 187 | 95.41 |
| Breastmilk | | | | | | |
| CSF | 116 | 8 | 18 | 0 | 18 | 100.00 |
| Mucus | 63 | 13 | 8 | 3 | 5 | 62.50 |
| Plasma | 1002 | 88 | 100 | 5 | 95 | 95.00 |
| Saliva | 73 | 9 | 10 | 2 | 8 | 80.00 |
| Semen | 35 | 3 | 6 | 0 | 6 | 100.00 |
| Serum | 1327 | 106 | 145 | 6 | 139 | 95.86 |
| SF | 21 | 0 | 4 | 0 | 4 | 100.00 |
| Stool | 68 | 8 | 7 | 3 | 4 | 57.14 |
| Sweat | 123 | 15 | 28 | 3 | 25 | 89.29 |
| Tears | 26 | 2 | 3 | 0 | 3 | 100.00 |
| Urine | 310 | 32 | 38 | 3 | 35 | 92.11 |
| Lung cancer | | | | | | |
| Bile | 167 | 17 | 25 | 1 | 24 | 96.00 |
| Blood | 1863 | 141 | 152 | 7 | 145 | 95.39 |
| Breastmilk | 77 | 15 | 11 | 2 | 9 | 81.82 |
| CSF | 106 | 7 | 11 | 1 | 10 | 90.91 |
| Mucus | 276 | 27 | 73 | 10 | 63 | 86.30 |
| Plasma | 843 | 75 | 65 | 4 | 61 | 93.85 |
| Saliva | 53 | 3 | 7 | 1 | 6 | 85.71 |
| Semen | 11 | 2 | 0 | 0 | 0 | 0 |
| Serum | 1109 | 100 | 103 | 3 | 100 | 97.09 |
| SF | 13 | 2 | 3 | 0 | 3 | 100.00 |
| Stool | 45 | 2 | 5 | 0 | 5 | 100.00 |
| Sweat | 44 | 5 | 4 | 0 | 4 | 100.00 |
| Tears | 12 | 0 | 1 | 0 | 1 | 100.00 |
| Urine | 256 | 30 | 56 | 6 | 50 | 89.29 |
Known markers were determined by identification of the given gene symbol in our known biomarker lists (Additional file 6: Table S1 or Additional file 7: Table S2). Significant markers had a z-score >1.0.
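The two rules in this footnote (a marker is significant when its z-score exceeds 1.0, and known when its gene symbol appears in a known biomarker list) can be sketched as follows. The function name, its return structure, and the placeholder symbols are hypothetical; the percent-novel calculation (novel significant markers divided by all significant markers) is inferred from the table values rather than stated in the source.

```python
def summarize_markers(z_by_symbol, known_symbols, threshold=1.0):
    """Split significant markers into known and novel.

    z_by_symbol: mapping of gene symbol -> z-score.
    known_symbols: set of gene symbols from a known biomarker list.
    A marker is significant when its z-score is strictly above the threshold.
    """
    significant = {s for s, z in z_by_symbol.items() if z > threshold}
    known = significant & set(known_symbols)
    novel = significant - known
    # Guard against division by zero when nothing is significant.
    pct_novel = 100.0 * len(novel) / len(significant) if significant else 0.0
    return {
        "significant": len(significant),
        "known_significant": len(known),
        "novel_significant": len(novel),
        "pct_novel": round(pct_novel, 2),
    }


# Synthetic input sized to mirror one table row: 58 significant markers,
# 7 of which are on the known list. The symbols are placeholders.
z_by_symbol = {f"SYM{i}": 1.5 for i in range(58)}
z_by_symbol["SYM_LOW"] = 0.4  # below threshold, so not significant
known_symbols = {f"SYM{i}" for i in range(7)}

summary = summarize_markers(z_by_symbol, known_symbols)
```

With this synthetic input the summary has 51 novel significant markers and a percent-novel value of 87.93, the same arithmetic as a row with 58 significant and 7 known significant markers.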