Winston A Haynes, Aurelie Tomczak, Purvesh Khatri.
Abstract
We found tremendous inequality across gene and protein annotation resources. We observed that this bias leads biomedical researchers to focus on richly annotated genes instead of those with the strongest molecular data. We advocate that researchers reduce these biases by pursuing data-driven hypotheses.
Year: 2018 PMID: 29358745 PMCID: PMC5778030 DOI: 10.1038/s41598-018-19333-x
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Inequality in gene annotations. (A) We measured the Gini coefficient across a variety of gene annotation resources. (B) We compared the growth in the Gini coefficient of the Gene Ontology to different models of increasing and decreasing inequality. See also Figure S1.
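For readers unfamiliar with the Gini coefficient, the following is a minimal sketch of how it could be computed from per-gene annotation counts. It uses the standard sorted-rank formula; the function name `gini` and the example counts are hypothetical and are not taken from the paper, which may have used a different implementation.

```python
def gini(counts):
    """Gini coefficient of non-negative counts (0 = perfect equality)."""
    values = sorted(counts)              # ascending order is required by the formula
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0                       # convention assumed here for empty or all-zero input
    # G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n, with ranks i = 1..n
    weighted = sum(rank * v for rank, v in enumerate(values, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# Hypothetical annotation counts for four genes in two scenarios.
evenly_annotated = [5, 5, 5, 5]
one_gene_dominates = [0, 0, 0, 1]
print(gini(evenly_annotated), gini(one_gene_dominates))
```

With this formula, a resource in which every gene has the same number of annotations scores 0, and the score approaches 1 as annotations concentrate on a single gene.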
Figure 2. Published Disease-Gene Associations Not Reflected in Molecular Data. (A) The number of publications for every disease-gene pair was not significantly correlated with the gene expression multicohort analysis effect size FDR rank [Spearman’s correlation = −0.003, p = 0.836]. (B) The number of publications for every disease-gene pair correlated with the number of non-inferred from electronic annotation (non-IEA) Gene Ontology annotations [Spearman’s correlation = 0.110, p = 2.1e–16]. Orange points represent disease-gene associations published in our prior meta-analyses[27,30,37]. Purple points have at least 1000 publications. See also Figure S2.
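The Spearman correlations reported in this caption are rank correlations: each variable is converted to ranks, with ties sharing their average rank, and the Pearson correlation is taken on those ranks. A minimal sketch follows, assuming plain Python lists as input; the helper names `average_ranks` and `spearman` and the example values are hypothetical, and the p-value computation is omitted.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)   # undefined (division by zero) if either input is constant


# Hypothetical data: publication counts vs. annotation counts per disease-gene pair.
publications = [1, 3, 10, 250]
annotations = [2, 5, 9, 40]
print(spearman(publications, annotations))
```

Because only ranks enter the calculation, a few pairs with very large publication counts do not dominate the result the way they would in a Pearson correlation on raw counts.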