Steven Van Vooren, Bernard Thienpont, Björn Menten, Frank Speleman, Bart De Moor, Joris Vermeesch, Yves Moreau.
Abstract
Biomedical literature provides a rich but unstructured source of associations between chromosomal regions and biomedical concepts. By mining MEDLINE abstracts, we annotate the human genome at the level of cytogenetic bands. Our method creates a set of chromosomal aberration maps that associate cytogenetic bands with biomedical concepts from a variety of controlled vocabularies, including disease, dysmorphology, anatomy, development and Gene Ontology branches. The association between a band (e.g. 4p16.3) and a concept (e.g. microcephaly) is assessed by the statistical overrepresentation of this concept in the abstracts relating to this band. Our method is validated using existing genome annotation resources and known chromosomal aberration maps and is further illustrated through a case study on heart disease. Our chromosomal aberration maps provide diagnostic support to clinical geneticists, help cytogeneticists interpret and report cytogenetic findings and support researchers interested in human gene function. The method is available as a web application, aBandApart, at http://www.esat.kuleuven.be/abandapart/.
Year: 2007 PMID: 17403693 PMCID: PMC1885641 DOI: 10.1093/nar/gkm054
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
The most frequently occurring species in a set of 36 082 cytogenetic MEDLINE abstracts mentioning cytogenetic bands
| Count | Phrase | Count | Phrase |
|---|---|---|---|
| 14 865 | Human | 126 | Pig |
| 3664 | Mouse | 107 | Primates |
| 1252 | Rat | 98 | Papillomavirus |
| 590 | Rodent | 70 | Cat |
| 474 | Hamster | 70 | Bacteria |
| 240 | Bovine | 68 | Zebrafish |
| 214 | Melanogaster | 67 | Sheep |
| 183 | Chicken | 63 | Canine |
| 178 | Porcine | 63 | Troglodytes |
| 135 | Rabbit | 61 | Monkey |
Different controlled vocabularies in aBandApart
| Name | Function | Example | Size |
|---|---|---|---|
| MeSH | Medical subject headings | Chemicals, medical concepts | 16 998 |
| GO.B | Biological processes | ‘Cell growth’, ‘signal transduction’ | 1120 |
| GO.C | Cellular components | ‘Proteasome’, ‘nucleus’ | 402 |
| GO.M | Molecular functions | ‘ATPase activity’ | 701 |
| GO.E | Gene ontology | All of the above | 2170 |
| LDDB | London dysmorphology database | ‘Microcephaly’ or ‘small head’ | 808 |
| OMIM | Genetic disorders | ‘Attention deficit hyperactivity disorder’ | 1716 |
| CBIL | Human anatomy | ‘Heart muscle’ | 303 |
| OHDA | Embryo development | ‘Early stage, fetus’ | 380 |
| TDMS.s | Systems, tissues and sites | ‘Cardiovascular system’ | 392 |
| TDMS.l | Microscopic lesions | ‘Disseminated intravascular coagulation’ | 204 |
A total of 11 vocabularies are present, shown above with an example concept and the number of concepts in each vocabulary.
Five most relevant hits for query heart on vocabulary CBIL
| Band name | BC | B | P-value |
|---|---|---|---|
| 22q11 | 164 | 1092 | 0 |
| 22q11.2 | 83 | 755 | 1.28e−26 |
| 20p12 | 19 | 113 | 3.03e−10 |
| 21q22.2 | 16 | 171 | 5.88e−06 |
| 7q11.23 | 20 | 301 | 1.12e−04 |
The concept heart has a total of 1324 documents associated with it. The four columns show the hit (band), the number of documents linked to both band and concept (BC), the number of documents linked to the band (B) and the P-value.
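The counts in this table are enough to illustrate what "statistical overrepresentation" can mean in practice. The sketch below uses an upper-tail hypergeometric test, which is one common choice for this kind of enrichment question; the record does not state which test aBandApart actually uses, so the function name `overrepresentation_p`, the choice of test and the use of the 36 082 cytogenetic abstracts as the background set are all assumptions made for illustration. The result is therefore not expected to reproduce the P-values listed above.

```python
from math import comb


def overrepresentation_p(n_total: int, n_concept: int, n_band: int, n_both: int) -> float:
    """Upper-tail hypergeometric probability P(X >= n_both).

    X counts concept-linked abstracts among n_band abstracts drawn without
    replacement from n_total abstracts, n_concept of which mention the concept.
    (Assumed test; not confirmed as the method used by aBandApart.)
    """
    upper = min(n_band, n_concept)
    # Sum the exact integer numerators first, then divide once at the end.
    numerator = sum(
        comb(n_concept, i) * comb(n_total - n_concept, n_band - i)
        for i in range(max(n_both, 0), upper + 1)
    )
    return numerator / comb(n_total, n_band)


# Counts for band 7q11.23 and concept "heart" taken from the table:
# BC = 20, B = 301, and 1324 documents for the concept.
# The background size of 36 082 is an assumption.
p = overrepresentation_p(n_total=36082, n_concept=1324, n_band=301, n_both=20)
print(f"hypergeometric tail probability: {p:.3g}")
```

Keeping the sum in exact integer arithmetic avoids the rounding problems that arise when very small binomial terms are added as floats; a log-space implementation would be the usual alternative for much larger corpora.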
Highly significant hits (P-value <0.01) for query 7q11.23 on vocabulary CBIL
| Concept | BC | C | P-value |
|---|---|---|---|
| Valve | 5 | 51 | 8.23e−7 |
| Connective tissue | 6 | 96 | 2.64e−6 |
| Aorta | 5 | 70 | 5.43e−6 |
| Metencephalon | 1 | 2 | 3.92e−5 |
| Heart | 20 | 1324 | 1.12e−4 |
| Hepatocyte | 3 | 79 | 1.58e−3 |
| Carotid artery | 1 | 10 | 1.71e−3 |
| Pons | 1 | 13 | 2.92e−3 |
| Tonsil | 1 | 14 | 3.40e−3 |
| Artery | 3 | 120 | 7.06e−3 |
| Penis | 1 | 22 | 8.34e−3 |
| Cardiovascular system | 1 | 22 | 8.34e−3 |
| Brain | 23 | 2267 | 9.16e−3 |
| Skeletal muscle | 9 | 664 | 9.78e−3 |
| Midbrain | 1 | 24 | 9.88e−3 |
The band 7q11.23 has a total of 301 documents associated with it. The four columns show the hit (concept), the number of documents linked to both band and concept, the number of documents linked to the concept and the P-value.
NIH book validation for chromosome 1
| Gene | Disease/concept | H | S | P | T | NIH | Top | P-value |
|---|---|---|---|---|---|---|---|---|
| UROD | Porphyria cutanea tarda | 1 | 1 | 0 | 1 | 1p34.1 | 1p34 | 0.70E−4 |
| GBA | Gaucher disease | 1 | 1 | 1 | 1 | 1q21 | 1q21 | 2.41E−22 |
| GLC1A | Glaucoma | 1 | 1 | 1 | 0 | 1q24.3 | 1q24 | 2.21E−26 |
| HPC1 | Prostate cancer | 1 | 1 | 1 | 0 | 1q25.3 | 8p22 | 0.00E−0 |
| PS2 | Alzheimer disease | 0 | 1 | 1 | 0 | 1q42.13 | 1q42.1 | 0.24E−2 |
On this chromosome, five disease genes are annotated. Further columns indicate (H) whether the method assigned a highly significant P-value (<0.01) to the band to which the disease is actually associated, (S) whether it assigned a significant P-value (<0.05), (P) whether it delineated the band at the maximum level of karyotype resolution and (T) whether it rated the band as the most significant candidate for this disease, ranking higher than or as high as all other bands.
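Reading the NIH and Top columns side by side means comparing bands named at different resolutions, for example 1p34.1 against 1p34. The sketch below shows one simple way to test whether one band name denotes the same band as, or a sub-band of, another. The helper names `parse_band` and `contains` are hypothetical, the check assumes plain single-band names (ranges such as 11q23-24 are not handled), and it is not the rule used to fill column P, which the caption does not specify in detail.

```python
import re

# Chromosome (1-22, X or Y), arm (p or q), band digits, optional sub-band digits.
_BAND = re.compile(r"^(\d{1,2}|X|Y)([pq])(\d+)(?:\.(\d+))?$")


def parse_band(name: str) -> tuple[str, str, str]:
    """Split a band such as '22q11.2' into (chromosome, arm, digit string)."""
    match = _BAND.match(name)
    if match is None:
        raise ValueError(f"not a recognised band name: {name!r}")
    chrom, arm, major, minor = match.groups()
    # Drop the dot so that nesting becomes a prefix test on the digits.
    return chrom, arm, major + (minor or "")


def contains(coarse: str, fine: str) -> bool:
    """True if `fine` is the same band as `coarse` or one of its sub-bands."""
    c_chrom, c_arm, c_digits = parse_band(coarse)
    f_chrom, f_arm, f_digits = parse_band(fine)
    return (c_chrom, c_arm) == (f_chrom, f_arm) and f_digits.startswith(c_digits)


# Pairs taken from the Top and NIH columns of the table.
for top, nih in [("1p34", "1p34.1"), ("1q42.1", "1q42.13"), ("8p22", "1q25.3")]:
    print(top, nih, contains(top, nih))
```

Parsing the chromosome separately from the band digits matters here: a plain string-prefix test would wrongly treat 1q21 as an ancestor of 11q21.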
Congenital malformation validation
| Malformation | Band | Type | ||
|---|---|---|---|---|
| Aortic stenosis | 11q23-24 | del | ||
| Hypoplastic left heart | 11q23-25 | del | ✓ | ✓ |
| Hypoplastic left heart | 16q11-12 | dup | ||
| Patent ductus arteriosus | 16q22 | dup | ✓ | ✓ |
| Pulmonary stenosis | 20p13-11 | del | ✓ | ✓ |
| Pulmonary stenosis | 22q11 | del | ✓ | ✓ |
| Pulmonary stenosis | 8q22-24 | dup | ||
| Tetralogy of Fallot | 8q22-24 | dup | ✓ | ✓ |
| Truncus arteriosus | 22q11 | del | ✓ | ✓ |
| Truncus arteriosus | 2q22 | del | ||
| Ventricular septal defect | 22q11 | del | ✓ | ✓ |
| Ventricular septal defect | 4q31 | del | ✓ | |
| Ventricular septal defect | 8q24 | dup | ✓ | |
All 13 cardiac anomalies discussed by Brewer et al. are shown. Check marks indicate the significance with which our method associated band and concept.