Thomas Karopka, Juliane Fluck, Heinz-Theodor Mevissen, Anne Glass.
Abstract
BACKGROUND: Autoimmune diseases are disorders caused by an immune response directed against the body's own organs, tissues and cells. In practice, more than 80 clinically distinct diseases, among them systemic lupus erythematosus and rheumatoid arthritis, are classified as autoimmune diseases. Although their etiology is unclear, these diseases share certain similarities at the molecular level, e.g. susceptibility regions on the chromosomes or the involvement of common genes. A manual literature review cannot provide an overview of these related diseases; this requires automated analysis of the more than 500,000 Medline documents related to autoimmune disorders.
Year: 2006 PMID: 16803617 PMCID: PMC1525205 DOI: 10.1186/1471-2105-7-325
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Statistics summarising the content of the Autoimmune Disease Database.
| Symbol | Value | Description |
|---|---|---|
| N | 2,661,938 | # abstracts in all of Medline containing proteins/genes recognised by ProMiner |
| NDisease | 401,128 | # documents that mention at least one autoimmune disease in the title or abstract |
| NDiseaseGene | 85,425 | # documents in NDisease containing a gene recognised by ProMiner |
| NMeSH | 416,742 | # documents that have an autoimmune disease as MeSH term |
| NMeSHGene | 74,610 | # documents in NMeSH containing a gene recognised by ProMiner |
| NAIDB | 541,690 | # documents in the union of NDisease and NMeSH |
| NAIDBGene | 117,021 | # documents in NAIDB (the union of NDisease and NMeSH) containing a gene recognised by ProMiner |
| Ng | 132,577 | # protein/gene names recognised by ProMiner, including synonyms and orthographic variants |
| Ngdiff | 13,272 | # different genes recognised by ProMiner in all of Medline |
| Ngaid | 5,471 | # genes in NAIDBGene, the subset related to autoimmune diseases |
| MeSH terms | 79 | # MeSH terms in the context of autoimmune diseases |
| Concepts | 103 | # concepts for autoimmune diseases |
Statistics of the AIDB content. The values shown in the table are from 6 April 2006; current values may differ because the database is updated.
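Since NAIDB is defined as the union of NDisease and NMeSH, the number of documents reached by both the text-based and the MeSH-based route follows from inclusion-exclusion. The Python sketch below recomputes these overlaps from the table. It assumes that NAIDBGene is likewise the union of NDiseaseGene and NMeSHGene, which its size (larger than either subset) implies; the overlaps it prints are derived arithmetic, not figures reported in the table.

```python
# Document counts from the AIDB statistics table (6 April 2006).
N_DISEASE = 401_128      # NDisease: disease named in title or abstract
N_MESH = 416_742         # NMeSH: disease assigned as MeSH term
N_AIDB = 541_690         # NAIDB: union of NDisease and NMeSH

N_DISEASE_GENE = 85_425  # NDiseaseGene
N_MESH_GENE = 74_610     # NMeSHGene
N_AIDB_GENE = 117_021    # NAIDBGene, assumed to be the union of the two gene subsets


def intersection_size(a: int, b: int, union: int) -> int:
    """Inclusion-exclusion: |A ∩ B| = |A| + |B| - |A ∪ B|."""
    overlap = a + b - union
    # A valid overlap is non-negative and cannot exceed the smaller set.
    if not 0 <= overlap <= min(a, b):
        raise ValueError("counts are inconsistent with a set union")
    return overlap


both = intersection_size(N_DISEASE, N_MESH, N_AIDB)
both_gene = intersection_size(N_DISEASE_GENE, N_MESH_GENE, N_AIDB_GENE)
print(f"Found by both text and MeSH: {both:,}")                # 276,180
print(f"...and containing a recognised gene: {both_gene:,}")  # 43,014
```

By this arithmetic, about 51% of the NAIDB documents (276,180 of 541,690) are reached by both routes.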
UMLS concept Takayasu's arteritis.
| Concept | Synonym | Occurrences in Medline |
|---|---|---|
| Takayasu's Arteritis (CUI: C0039263) | Takayasu's Arteritis MeSH | 850 |
| | Takayasu's Disease | 452 |
| | Takayasu Arteritis | 387 |
| | PULSELESS DISEASE MeSH | 291 |
| | Aortic arch syndrome | 246 |
| | TAKAYASU DISEASE | 126 |
| | Nonspecific aortoarteritis | 97 |
| | Atypical coarctation | 73 |
| | Takayasu Syndrome | 53 |
| | Middle aortic syndrome | 40 |
| | Takayasu's syndrome MeSH | 33 |
| | Primary arteritis | 22 |
| | Nonspecific arteritis | 22 |
| | Takayasu's arteriopathy | 12 |
| | Idiopathic aortitis | 8 |
| | Martorell syndrome | 7 |
| | Aortic arch syndrome | 5 |
| | ARTERITIS TAKAYASU | 4 |
| | Aortic arch arteritis | 3 |
| | Young female arteritis | 2 |
| | BRACHIOCEPHALIC ISCHEMIA | 2 |
| | TAKAYASUS ARTERITIS | 1 |
| | Reverse coarctation | 1 |
| | Idiopathic medial aortopathy and arteriopathy | 1 |
| | TAKAYASU ARTERIOPATHY | 0 |
| | Sclerosing aortitis and arteritis | 0 |
| | Occlusive thromboarteriopathy | 0 |
| | Raeder-Harbitz syndrome | 0 |
| | Reversed coarctation syndrome | 0 |
The concept "Takayasu's arteritis", its synonyms and their occurrences in Medline. This concept has 29 synonyms and orthographic variants. Terms that are also listed in the MeSH vocabulary are marked with "MeSH" in column 2. At the other extreme, some concepts, such as "Psoriasis", have no synonyms.
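The table shows why counting by concept rather than by literal string matters: the same disease is written in many ways. The following minimal Python sketch, assuming a toy synonym list drawn from the table and a simple normalisation (case folding, apostrophe removal, whitespace collapsing), maps each synonym to its concept identifier and counts the documents that mention each concept. It is not ProMiner's matching algorithm; `CONCEPT_SYNONYMS`, `normalise` and `count_concepts` are hypothetical names used for illustration.

```python
import re
from collections import Counter

# Hypothetical mini-dictionary: a few synonyms of the UMLS concept
# "Takayasu's arteritis" (CUI C0039263), taken from the table above.
CONCEPT_SYNONYMS = {
    "C0039263": [
        "Takayasu's Arteritis",
        "Takayasu Arteritis",
        "TAKAYASUS ARTERITIS",
        "Pulseless disease",
        "Aortic arch syndrome",
    ],
}


def normalise(term: str) -> str:
    """Fold case, drop apostrophes and collapse whitespace."""
    term = term.lower().replace("'", "")
    return re.sub(r"\s+", " ", term).strip()


def build_index(concepts: dict[str, list[str]]) -> dict[str, str]:
    """Map each normalised synonym to its concept identifier."""
    return {normalise(s): cui for cui, syns in concepts.items() for s in syns}


def count_concepts(texts: list[str], index: dict[str, str]) -> Counter:
    """Count, per concept, how many texts mention at least one synonym."""
    counts: Counter = Counter()
    for text in texts:
        norm = normalise(text)
        # A set, so a text naming several synonyms is counted once per concept.
        hits = {cui for syn, cui in index.items()
                if re.search(rf"\b{re.escape(syn)}\b", norm)}
        counts.update(hits)
    return counts


index = build_index(CONCEPT_SYNONYMS)
abstracts = [
    "A case of Takayasu's arteritis in a young woman.",
    "Pulseless disease and aortic arch syndrome were compared.",
    "Psoriasis is unrelated.",
]
counts = count_concepts(abstracts, index)
print(counts)  # Counter({'C0039263': 2})
```

Normalising before lookup lets orthographic variants such as "TAKAYASUS ARTERITIS" and "Takayasu's Arteritis" collapse onto one dictionary key.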
Performance of the ProMiner system.
| Organism | Precision | Recall | F-score |
|---|---|---|---|
| Mouse | 0.77 | 0.81 | 0.79 |
| Yeast | 0.97 | 0.84 | 0.90 |
| Fly | 0.83 | 0.80 | 0.82 |
| Fly (up to three entries per name) | 0.74 | 0.83 | 0.79 |
| Human | 0.86 | 0.81 | 0.84 |
The performance of the ProMiner system for fly, mouse and yeast was evaluated in the BioCreAtIvE assessment. For the human dictionary, we annotated a corpus of 250 abstracts that served as the reference corpus for determining recall, precision and F-score. A recognised name is matched to a gene entry only if the recognised synonym is associated with exactly one gene entry in the corresponding dictionary (unique matches). For fly only, BioCreAtIvE results obtained with a different parameter setting are also shown (cf. 4th row): here a recognised name could be matched to up to three different gene entries if the recognised synonym is associated with these entries.
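The unique-match rule and the F-score in the table can be illustrated with a small Python sketch. The gene dictionary is a hypothetical toy example, and `build_synonym_index`, `resolve` and the `max_entries` parameter are illustrative names, not part of ProMiner; `max_entries=3` merely mimics the alternative fly setting. Because the F-score is the harmonic mean of precision and recall, recomputing it from the rounded two-decimal table values can differ from the listed value in the last digit.

```python
from collections import defaultdict

# Hypothetical toy dictionary: gene entry -> synonyms. "FAS" is shared by two
# entries (the FAS receptor and fatty acid synthase), so it is ambiguous.
GENE_DICTIONARY = {
    "FAS": ["FAS", "APO-1", "CD95"],
    "FASN": ["FASN", "FAS"],
}


def build_synonym_index(dictionary: dict[str, list[str]]) -> dict[str, set[str]]:
    """Invert gene entry -> synonyms into synonym -> set of gene entries."""
    index: dict[str, set[str]] = defaultdict(set)
    for gene_id, synonyms in dictionary.items():
        for synonym in synonyms:
            index[synonym.lower()].add(gene_id)
    return dict(index)


def resolve(name: str, index: dict[str, set[str]], max_entries: int = 1) -> set[str]:
    """Return the gene entries for a recognised name, or an empty set.

    max_entries=1 keeps only unique matches; a larger value tolerates
    names that are shared by up to that many entries.
    """
    candidates = index.get(name.lower(), set())
    return set(candidates) if len(candidates) <= max_entries else set()


def f_score(precision: float, recall: float) -> float:
    """Balanced F-score: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


index = build_synonym_index(GENE_DICTIONARY)
print(resolve("CD95", index))                        # {'FAS'}
print(resolve("FAS", index))                         # set(): ambiguous, dropped
print(sorted(resolve("FAS", index, max_entries=3)))  # ['FAS', 'FASN']
print(round(f_score(0.97, 0.84), 2))                 # 0.9 (yeast row)
```

Dropping ambiguous names trades recall for precision, which is consistent with the two fly rows: allowing up to three entries lowers precision and raises recall.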
Figure 1 Disease query for the MeSH term Graves' disease. Screen shot for the MeSH term "Graves' disease". The first column (Gene) contains the gene name; hovering over the gene name with the mouse shows its synonyms in a tooltip, and clicking on it initiates a search for this gene in the concept space. The second column contains the RANK coefficient, the third column the number of corresponding PMIDs and the fourth column a pull-down menu containing the PMIDs. As in the Concept view, the "Go" and "LIST" buttons link to PubMed, showing the abstract for the currently selected PMID or all listed abstracts, respectively. The last column contains links to Entrez Gene, Ensembl and Swiss-Prot for further information about the gene or protein.
Figure 2 Result for the search of the string "*neuropathy*". Screen shot for the search of the string "*neuropathy*". The first section lists all genes that contain the search string, with the search term highlighted in green. The following two sections show the matching MeSH terms and disease concepts, respectively.
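A search such as "*neuropathy*" behaves like a shell-style wildcard match applied separately to genes, MeSH terms and disease concepts. The sketch below reproduces that behaviour with Python's standard `fnmatch` module, assuming case-insensitive matching. The term lists are illustrative rather than AIDB content, and this is not necessarily how the AIDB implements its search.

```python
import fnmatch


def wildcard_search(pattern: str, sections: dict[str, list[str]]) -> dict[str, list[str]]:
    """Case-insensitive shell-style wildcard search ('*', '?') over term lists."""
    pat = pattern.lower()
    return {
        name: [term for term in terms if fnmatch.fnmatchcase(term.lower(), pat)]
        for name, terms in sections.items()
    }


# Illustrative term lists, not actual AIDB content.
SECTIONS = {
    "genes": ["Neuropathy target esterase", "Tumor necrosis factor"],
    "mesh_terms": ["Graves Disease", "Optic Neuropathy, Ischemic"],
    "concepts": ["Autoimmune autonomic neuropathy", "Psoriasis"],
}

results = wildcard_search("*neuropathy*", SECTIONS)
for name, hits in results.items():
    print(name, hits)
```

Lower-casing both sides and then calling `fnmatchcase` gives the same case-insensitive behaviour on every platform, whereas `fnmatch.fnmatch` follows the operating system's case rules.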
Figure 3 Gene query for the gene TNF. Screen shot for a gene query for the gene TNF. All autoimmune concepts and disease names that co-occur with TNF or one of its synonyms are listed on the page, one concept or disease per line. The second column shows the number of corresponding PMIDs, indicating how frequently the terms are co-mentioned. The last column contains all PMIDs in a drop-down list. Pressing the "Go" button opens the PubMed abstract for the PMID currently shown in the drop-down list; pressing the "List" button opens a new window containing all abstracts in the drop-down list.
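The gene query amounts to expanding a gene to its synonyms and collecting the PMIDs of abstracts in which any of those names co-occurs with a disease term. A minimal Python sketch, assuming per-abstract sets of recognised gene names and disease terms are already available (the data structures, PMIDs and synonym list are hypothetical), sorted by the number of co-mentioning PMIDs:

```python
from collections import defaultdict


def gene_query(gene: str,
               synonyms: dict[str, set[str]],
               doc_genes: dict[int, set[str]],
               doc_terms: dict[int, set[str]]) -> list[tuple[str, list[int]]]:
    """List disease terms co-mentioned with a gene or any of its synonyms.

    doc_genes and doc_terms map a PMID to the gene names and disease terms
    recognised in that abstract. Results are sorted by PMID count, descending.
    """
    names = {gene} | synonyms.get(gene, set())
    hits: dict[str, list[int]] = defaultdict(list)
    for pmid in sorted(doc_genes):
        if doc_genes[pmid] & names:  # any synonym recognised in this abstract
            for term in doc_terms.get(pmid, set()):
                hits[term].append(pmid)
    return sorted(hits.items(), key=lambda kv: (-len(kv[1]), kv[0]))


# Hypothetical data: PMIDs 1-3 are placeholders, not real references.
SYNONYMS = {"TNF": {"TNF-alpha", "cachectin"}}
DOC_GENES = {1: {"TNF-alpha"}, 2: {"TNF"}, 3: {"IL6"}}
DOC_TERMS = {
    1: {"rheumatoid arthritis"},
    2: {"rheumatoid arthritis", "Crohn's disease"},
    3: {"psoriasis"},
}

rows = gene_query("TNF", SYNONYMS, DOC_GENES, DOC_TERMS)
for term, pmids in rows:
    print(term, len(pmids), pmids)
```

The same co-occurrence index, queried from the disease side, would give the disease view of Figure 1, apart from the RANK coefficient, which this sketch does not attempt to reproduce.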
Assessment of the "Top 50 genes" page.
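| Method | Genes in GAD | Correct genes not in GAD | Falsely recognised genes | Precision | Erroneously recognised genes |
|---|---|---|---|---|---|
| RANK | 35 | 9 | 6 | 88% (44/50) | |
| Frequency | 34 | 11 | 5 | 90% (45/50) | |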
The top 50 genes from the AIDB were evaluated against GAD (number of genes found in GAD, correct genes not found in GAD, and falsely recognised genes). The resulting precision of gene recognition is 88% for the RANK-based method and 90% for the frequency-based method. Erroneously recognised genes are listed in the last column.
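The frequency-based ranking and the precision figures can be sketched in Python as follows. `top_genes_by_frequency` orders genes by the number of distinct documents in which they co-occur with the disease, which is an assumed reading of the frequency-based method; the RANK coefficient is not defined in this section and is not reproduced. The precision counts the genes found in GAD plus the manually confirmed genes, divided by the 50 genes examined.

```python
from collections import Counter


def top_genes_by_frequency(gene_pmids: dict[str, set[int]], n: int = 50) -> list[str]:
    """Rank genes by the number of distinct co-mentioning documents."""
    counts = Counter({gene: len(pmids) for gene, pmids in gene_pmids.items()})
    return [gene for gene, _ in counts.most_common(n)]


def top_n_precision(in_gad: int, correct_not_in_gad: int, n: int = 50) -> float:
    """Share of the top-n genes judged correct (in GAD or manually confirmed)."""
    return (in_gad + correct_not_in_gad) / n


# Hypothetical gene -> PMID sets for one disease.
example = {"TNF": {1, 2, 3}, "IL6": {2, 3}, "CTLA4": {4}}
print(top_genes_by_frequency(example, n=2))  # ['TNF', 'IL6']

# Counts from the "Top 50 genes" table.
print(top_n_precision(35, 9))   # 0.88 (RANK-based)
print(top_n_precision(34, 11))  # 0.9 (frequency-based)
```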
Evaluation of the database content in comparison to the Genetic Association Database (GAD).
| Disease | GAD associations | Associations found in AIDB | GAD PubMed references | References correctly retrieved in AIDB |
|---|---|---|---|---|
| Multiple sclerosis | 57 | 84% (48/57) | 82 | 73% (60/82) |
| Graves' disease | 14 | 100% (14/14) | 27 | 66% (18/27) |
| Addison's disease | 2 | 100% (2/2) | 3 | 100% (3/3) |
| Sarcoidosis | 10 | 100% (10/10) | 13 | 69% (9/13) |
| Myasthenia gravis | 6 | 66% (4/6) | 8 | 38% (3/8) |
| Alopecia areata | 4 | 75% (3/4) | 4 | 75% (3/4) |
| Crohn's disease | 14 | 93% (13/14) | 29 | 93% (27/29) |
| Psoriasis | 16 | 94% (15/16) | 30 | 73% (22/30) |
| Ulcerative colitis | 11 | 90% (10/11) | 16 | 81% (13/16) |
| Behcet's disease | 8 | 100% (8/8) | 20 | 85% (17/20) |
| Narcolepsy | 4 | 100% (4/4) | 7 | 86% (6/7) |
| Inflammatory bowel disease | 17 | 94% (16/17) | 24 | 92% (22/24) |
Evaluation results of the AIDB content using the gene-disease associations referenced in GAD for the diseases 'multiple sclerosis' and 'Graves' disease' as well as 10 randomly selected diseases. The numeric columns list, in order: the number of gene-disease associations extracted from the GAD database, the number of these associations found in the AIDB, the number of PubMed references in GAD, and the number of PubMed references correctly retrieved in the AIDB.
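The caption reports per-disease results only. Pooling the counts across the 12 diseases (a micro-average) gives an overall recall for associations and for references; the Python sketch below recomputes these pooled values from the table. The pooled figures are derived arithmetic, not values stated in the caption.

```python
# Values from the GAD comparison table:
# (GAD associations, found in AIDB, GAD references, references retrieved in AIDB)
GAD_EVALUATION = {
    "Multiple sclerosis": (57, 48, 82, 60),
    "Graves' disease": (14, 14, 27, 18),
    "Addison's disease": (2, 2, 3, 3),
    "Sarcoidosis": (10, 10, 13, 9),
    "Myasthenia gravis": (6, 4, 8, 3),
    "Alopecia areata": (4, 3, 4, 3),
    "Crohn's disease": (14, 13, 29, 27),
    "Psoriasis": (16, 15, 30, 22),
    "Ulcerative colitis": (11, 10, 16, 13),
    "Behcet's disease": (8, 8, 20, 17),
    "Narcolepsy": (4, 4, 7, 6),
    "Inflammatory bowel disease": (17, 16, 24, 22),
}


def pooled_recall(rows: list[tuple[int, ...]], total_idx: int,
                  found_idx: int) -> tuple[int, int, float]:
    """Micro-average: sum hits and totals over all diseases, then divide."""
    total = sum(row[total_idx] for row in rows)
    found = sum(row[found_idx] for row in rows)
    return found, total, found / total


rows = list(GAD_EVALUATION.values())
assoc_found, assoc_total, assoc_recall = pooled_recall(rows, 0, 1)
refs_found, refs_total, refs_recall = pooled_recall(rows, 2, 3)
print(f"associations: {assoc_found}/{assoc_total} = {assoc_recall:.0%}")  # 147/163 = 90%
print(f"references:   {refs_found}/{refs_total} = {refs_recall:.0%}")    # 203/263 = 77%
```

A micro-average weights each association equally, so multiple sclerosis, with the most GAD entries, dominates the pooled value; averaging the per-disease percentages instead would weight each disease equally.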
Error analysis for false negative recognition.
| Gene: not in dictionary | Gene: ambiguous gene name | Gene: recognition error | Disease: not in dictionary | |
|---|---|---|---|---|
| 5 (33%) | 4 (27%) | 1 (6.7%) | 4 (27%) | 1 (6.7%) |
| C6 | FAS | MS | | |
Error analysis of the associations that were not recognised, grouped into error classes for missed gene name recognition (columns 1, 2 and 3) and missed disease term recognition (column 4). The first row gives the number of missed references and the corresponding percentage; the second row gives examples of the error classes.
Evaluation of the database content for the top 50 genes in multiple sclerosis.
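| Genes also in GAD | Correctly retrieved references | Genes only in AIDB | Correct | Examples of true recognition | Examples of false recognition |
|---|---|---|---|---|---|
| 3/50 | 67% (2/3) | 47/50 | 81% (38/47) | MMP27, IFNA16, ADAMTS14 | IBD2, Bw3, ABHs, THADA |
| 24/50 | 79% (19/24) | 26/50 | 85% (22/26) | CCL5, MMP9, GFAP, CCL2, VCAM1, NOS2A | CD4, CD8A, CD86, CD25, CD28, CDR3 |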
Evaluation of the quality of the 50 highest-ranked genes in multiple sclerosis for the RANK-based (first row) and the frequency-based (second row) methods. The left part of the table lists the number of genes that are also in GAD (1st column) and the number of correctly retrieved PubMed references (2nd column). The right part (columns 3–6) contains the results for the genes found exclusively in AIDB, with examples of true recognition in column 5 and examples of false recognition in column 6.
Evaluation of the database content for autoimmune hypophysitis.
| Gene | Documents | Correct | PMID | Comment |
|---|---|---|---|---|
| PRL | 4 | Y | 11683401 | |
| | | Y | 6325687 | |
| | | Y | 3923349 | |
| | | Y | 16392184 | |
| POMC | 3 | Y | 11683401 | |
| | | Y | 2840382 | |
| | | Y | 1310997 | |
| GNRH1 | 2 | Y | 3923349 | |
| | | Y | 2840382 | |
| CD4 | 1 | N | 15493593 | CD4 T cell recognised |
| CYP19A1 | 1 | Y | 15493593 | |
| COL14A1 | 1 | N | 16425001 | "und" recognised as synonym |
| INA | 1 | Y | 15234547 | |
| LY75 | 1 | Y | 15493593 | |
| CTLA4 | 1 | Y | 16224277 | |
| PTPRC | 1 | Y | 15493593 | |
| TG | 1 | Y | 1310997 | |
| AGMX2 | 1 | N | 15963060 | "with growth hormone deficiency" recognised as synonym |
| TNFRSF11B | 1 | Y | 15493593 | |
| TNFRSF25 | 1 | N | 7800142 | DR3 recognised as synonym |
| TPO | 1 | Y | 1310997 | |
Results for autoimmune hypophysitis. The term "autoimmune hypophysitis" is not in the MeSH vocabulary. Of the 21 gene-document entries in the table, 17 (81%) were correctly retrieved.
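Recounting the table shows what the 81% refers to: 17 of the 21 gene-document entries are marked correct, and these entries cover only 12 distinct PMIDs. The Python sketch below performs this tally; the `(gene, PMID, correct)` tuple layout of `ENTRIES` is an assumption made for illustration.

```python
# (gene, PMID, judged correct) entries from the autoimmune hypophysitis table.
ENTRIES = [
    ("PRL", 11683401, True), ("PRL", 6325687, True),
    ("PRL", 3923349, True), ("PRL", 16392184, True),
    ("POMC", 11683401, True), ("POMC", 2840382, True), ("POMC", 1310997, True),
    ("GNRH1", 3923349, True), ("GNRH1", 2840382, True),
    ("CD4", 15493593, False), ("CYP19A1", 15493593, True),
    ("COL14A1", 16425001, False), ("INA", 15234547, True),
    ("LY75", 15493593, True), ("CTLA4", 16224277, True),
    ("PTPRC", 15493593, True), ("TG", 1310997, True),
    ("AGMX2", 15963060, False), ("TNFRSF11B", 15493593, True),
    ("TNFRSF25", 7800142, False), ("TPO", 1310997, True),
]

correct = sum(1 for _, _, ok in ENTRIES if ok)
share = correct / len(ENTRIES)
distinct_pmids = {pmid for _, pmid, _ in ENTRIES}
print(f"{correct}/{len(ENTRIES)} entries correct ({share:.0%})")  # 17/21 entries correct (81%)
print(f"{len(distinct_pmids)} distinct PMIDs")                     # 12 distinct PMIDs
```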