Vida Abedi, Ramin Zand, Mohammed Yeasin, Fazle Elahi Faisal.
Abstract
BACKGROUND: In biomedicine, exploratory studies and hypothesis generation often begin with a review of the existing literature to identify a set of factors and their associations with diseases, phenotypes, or biological processes. Many scientists are overwhelmed by the sheer volume of literature on a disease when they plan to generate a new hypothesis or study a biological phenomenon. The situation is even worse for junior investigators, who often find it difficult to formulate new hypotheses or, more importantly, to check whether their hypothesis is consistent with the existing literature. It is a daunting task to keep abreast of everything being published while also remembering all combinations of direct and indirect associations. Fortunately, there is a growing trend of using literature mining and knowledge discovery tools in biomedical research. However, there is still a large gap between the huge amount of effort and resources invested in disease research and the little effort spent on harvesting the published knowledge. The proposed hypothesis generation framework (HGF) finds "crisp semantic associations" among entities of interest, which is a step towards bridging such gaps.
Year: 2012 PMID: 22931688 PMCID: PMC3497588 DOI: 10.1186/1756-0381-5-13
Source DB: PubMed Journal: BioData Min ISSN: 1756-0381 Impact factor: 2.522
Figure 1. Flow diagram of the hypothesis generation framework (HGF). A) In a medical and biological setting, Ontology Mapping could use the Medical Subject Headings (MeSH) to generate a context-specific dictionary, which is one of the parameters of the POLSA model. Associated factors are ranked based on a User Query, which can be any word(s) in the dictionary. These factors are subsequently grouped into three bins (unknown factors, potential factors, or established factors) based on our Disease Model. B) Ontology Mapping to create a domain-specific dictionary. C) Parameter Optimized Latent Semantic Analysis (POLSA) Module. D) Disease Model Module.
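The ranking step in panel A can be illustrated with a minimal sketch. It uses plain latent semantic analysis (a truncated SVD of a term-by-document matrix) followed by cosine similarity against a query term. This is not the POLSA implementation: the toy matrix, the fixed rank `k`, and the function names `lsa_term_vectors` and `rank_by_cosine` are all assumptions made for illustration, and no parameter optimization is performed.

```python
import numpy as np

# Hypothetical dictionary terms (rows) and abstracts (columns).
terms = ["stroke", "hypertension", "smoking", "caffeine"]
# Made-up term-by-document counts, for illustration only.
X = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 2],
], dtype=float)


def lsa_term_vectors(matrix, k):
    """Project each term into a k-dimensional latent space via truncated SVD."""
    U, s, _ = np.linalg.svd(matrix, full_matrices=False)
    return U[:, :k] * s[:k]


def rank_by_cosine(query, terms, vectors):
    """Rank every other term by cosine similarity to the query term."""
    q = vectors[terms.index(query)]
    scores = {}
    for term, v in zip(terms, vectors):
        if term == query:
            continue
        denom = np.linalg.norm(q) * np.linalg.norm(v)
        scores[term] = float(q @ v / denom) if denom > 0 else 0.0
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


vectors = lsa_term_vectors(X, k=2)
ranking = rank_by_cosine("stroke", terms, vectors)
for term, score in ranking:
    print(f"{term}: {score:.2f}")
```

In this toy matrix "stroke" and "hypertension" co-occur in the same abstracts, so "hypertension" ranks first for the query "stroke". A real dictionary would come from the ontology mapping step rather than a hand-written list.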
Potential risk factors and/or contributing factors selected by medical expert
| Factors | Category |
| --- | --- |
| Asthma, autism, schizophrenia, HIV, immunological disorder, bipolar, hypertension, osteoporosis, coronary heart disease (CHD), diabetes, allergy, herpes, leukemia, breast cancer, lymphoma, hypothyroidism, hyperthyroidism, insomnia, depression, viral infection, bacterial infection, hepatitis B virus, retrovirus, enterovirus | Disease / medical condition |
| morning cortisol level, cholesterol level, head trauma, abdominal adiposity, fracture, bone mineral density (BMD), body mass index (BMI), pregnancy outcome, maternal influenza, postmenopause, mood, volume of cerebrum, volume of hippocampus, volume of lateral ventricle, family history, motor activity assessment | Sign / symptom |
| caffeine, hormone, aflatoxin, calcium deficiency or calcium overdose, phosphorus deficiency or phosphorus overdose, magnesium deficiency or magnesium overdose, sodium deficiency or sodium overdose, potassium deficiency or potassium overdose, sulphur deficiency or sulphur overdose, chloride deficiency or chloride overdose, chromium deficiency or chromium overdose, copper deficiency or copper overdose, fluoride deficiency or fluoride overdose, iodine deficiency or iodine overdose, iron deficiency or iron overdose, manganese deficiency or manganese overdose, molybdenum deficiency or molybdenum overdose, selenium deficiency or selenium overdose, zinc deficiency or zinc overdose, vitamin A or Retinol, vitamin B1 or Thiamine, vitamin B2 or Riboflavin, vitamin B3 or Niacin, vitamin B5 or Pantothenic acid, vitamin B6 or Pyridoxine, vitamin B7 or Biotin, vitamin B9 or Folic acid, vitamin B12 or Cyanocobalamin, vitamin C or Ascorbic acid, vitamin D or Calciferol, vitamin E or Tocopherol, vitamin K or Phylloquinone, Cannabis, cocaine, bisphenol-A (BPA), diethylstilbestrol (DES), estradiol (E2), oral contraceptive (OC) | Chemical compound |
| air pollutants, volatile organic compounds, Pesticide, chemical agents, wood dust (exposure), silica dust (exposure), night shift work, outdoor workers, indoor workers, exposure to polycyclic aromatic hydrocarbons, heterosexual, homosexual, Tobacco smoking, alcohol consumption, health education and health promotion, addiction, lifestyle intervention, diet, nutrition, stress, age, gender, breast-feeding | Environmental / life style and behavioral factors |
Figure 2. Model for the distribution of associated factors of a given disease. If the associated factors of a disease, such as risk factors, are well known, as is the case for Disease Y, then the two dominating distributions are those of the factors that are associated and those that are not associated with the disease; if, on the other hand, the associated factors of a disease are not well documented (Disease X), then the dominating distribution is that of the potential factors.
Figure 3. Number of factors identified by MedLink Neurology and by HGF for IS and PD. Association levels for IS measured by HGF are high (0.3 < cosine score) and possible (0.1 < cosine score < 0.3); association levels for PD measured by HGF are high (0.2 < cosine score), possible (0.1 < cosine score < 0.2) or low (0.05 < cosine score < 0.1).
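The association levels in the Figure 3 caption amount to a threshold lookup on the cosine score, which can be sketched as follows. The thresholds are taken from the caption; the names `LEVELS` and `association_level` and the example scores are hypothetical. The caption uses strict inequalities and does not say how a score exactly on a boundary is handled, so assigning such a score to the next lower level is an assumption of this sketch.

```python
# Lower bounds per association level, copied from the Figure 3 caption.
# Levels are listed from strongest to weakest for each disease.
LEVELS = {
    "IS": [("high", 0.3), ("possible", 0.1)],
    "PD": [("high", 0.2), ("possible", 0.1), ("low", 0.05)],
}


def association_level(disease, cosine_score):
    """Return the first level whose lower bound the score exceeds, else None."""
    for label, lower in LEVELS[disease]:
        if cosine_score > lower:
            return label
    return None  # below every threshold named in the caption


# Hypothetical scores, for illustration only.
for disease, score in [("IS", 0.48), ("IS", 0.13), ("PD", 0.07), ("PD", 0.01)]:
    print(disease, score, association_level(disease, score))
```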
Figure 4. Distribution of similarity score (dashed line) for risk factors associated with IS and PD. The frequency represents the number of factors at each cosine similarity level (−1 to +1). Tri-modal distribution models are represented by solid lines.
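One way to obtain a tri-modal model of a score distribution like the one in Figure 4 is to fit a three-component mixture. The sketch below assumes Gaussian components fitted by expectation-maximization (EM); the source does not state which distribution family or fitting procedure was used, so both are assumptions. The scores are synthetic, and `fit_three_gaussians` is a hypothetical helper.

```python
import math
import random


def normal_pdf(x, mean, std):
    """Density of a normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def fit_three_gaussians(scores, iterations=200, min_std=1e-3):
    """Fit a 3-component 1-D Gaussian mixture with plain EM.

    Returns a list of (mean, std, weight) tuples sorted by mean.
    """
    data = sorted(scores)
    n = len(data)
    # Start the means at spread-out order statistics of the data.
    means = [data[n // 6], data[n // 2], data[(5 * n) // 6]]
    spread = (data[-1] - data[0]) / 6.0 or 1.0
    stds = [spread] * 3
    weights = [1.0 / 3.0] * 3
    for _ in range(iterations):
        # E-step: responsibility of each component for each score.
        resp = []
        for x in data:
            p = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, stds)]
            total = sum(p) or 1e-300
            resp.append([pk / total for pk in p])
        # M-step: re-estimate weight, mean and spread of each component.
        for k in range(3):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue  # component received no mass; leave it unchanged
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            stds[k] = max(math.sqrt(var), min_std)
    return sorted(zip(means, stds, weights))


# Synthetic cosine scores drawn from three made-up clusters.
random.seed(0)
scores = ([random.gauss(0.0, 0.02) for _ in range(200)]
          + [random.gauss(0.15, 0.02) for _ in range(200)]
          + [random.gauss(0.40, 0.02) for _ in range(200)])

fitted = fit_three_gaussians(scores)
for mean, std, weight in fitted:
    print(f"mean={mean:.3f} std={std:.3f} weight={weight:.2f}")
```

Under the disease model, the component with the lowest mean could be read as the factors not associated with the disease, the middle one as potential factors, and the highest one as established factors. That mapping is an interpretation of the figure captions rather than a documented step.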
A subset of factors identified only by the hypothesis generation framework
| Factor | Score |
| --- | --- |
| Calcium/Minerals | 0.13 |
| Depression (morning cortisol level, mood, stress) | 0.48, 0.18, and 0.12 |
| Vitamin E | 0.12 |
| Immunological disorders | 0.29 |
| Hyperthyroidism | 0.1 |