Aurelie Tomczak, Jonathan M Mortensen, Rainer Winnenburg, Charles Liu, Dominique T Alessi, Varsha Swamy, Francesco Vallania, Shane Lofgren, Winston Haynes, Nigam H Shah, Mark A Musen, Purvesh Khatri.
Abstract
Gene Ontology (GO) enrichment analysis is ubiquitously used for interpreting high-throughput molecular data and generating hypotheses about the biological phenomena underlying experiments. However, the two building blocks of this analysis, the ontology and the annotations, evolve rapidly. We used gene signatures derived from 104 disease analyses to systematically evaluate how enrichment analysis results were affected by the evolution of the GO over a decade. We found low consistency between enrichment analysis results obtained with early and more recent GO versions. Furthermore, there continues to be a strong annotation bias in the GO annotations: 58% of the annotations are for 16% of the human genes. Our analysis suggests that GO evolution may have affected the interpretation and possibly the reproducibility of experiments over time. Hence, researchers must exercise caution when interpreting GO enrichment analyses and should reexamine previous analyses with the most recent GO version.
Year: 2018 PMID: 29572502 PMCID: PMC5865181 DOI: 10.1038/s41598-018-23395-2
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Overview of methods. We analyzed (1) changes in the input variables of GO enrichment analyses and (2) how those changes affected enrichment analysis results over time.
Figure 2. Gene Ontology annotation developments for the human genome, 2004 to 2015. (A) Number of GO annotations and their distribution across poorly characterized (blue) and well-characterized (gold) human genes over time. (B) GO annotation status of the human genome (2004 vs. 2015). Genes are classified by annotation status as uncharacterized (black), poorly characterized (blue), or well characterized (gold). Only terms relevant for enrichment analysis results were counted (IEA, ND, and cellular component annotations were excluded). (C) Comparison of the average information content (IC) of poorly characterized vs. well-characterized human genes in 2015 shows that the mean IC for genes with more annotations was higher (p = 4e-229). The same difference was observed in 2004 (p = 2e-19, Supplementary Figure 4).
Figure 3. Significance of biological process GO terms over time with annual GO version updates (year of GO version = year of GO annotation version). The development of p-value significance across GO versions is shown for subsets of significantly enriched biological process GO terms (p-value < 0.05 in at least one GO version) in three representative diseases: (A) influenza, (B) non-small cell lung cancer, and (C) pancreatic cancer. Terms belonging to selected top-level branches of the biological process ontology are indicated in color (e.g., cellular process in violet).
Figure 4. Effect of ontology and annotation version on consistency and significance of GO enrichment analysis results. (A) Effect in influenza for the GO term response to interferon-gamma. (B) Number of human genes annotated with the GO term response to interferon-gamma (including all child terms) in the influenza gene set vs. background. (C) Comparison of enrichment p-value and information content (IC) developments with annual updates of GO and GO annotations (year of GO version = year of GO annotation version) for response to interferon-gamma in influenza. (D) GO term enrichment significance for cell cycle in non-small cell lung cancer (see Supplementary Figure 8 for pancreatic cancer). (E) Number of human genes annotated with the GO term cell cycle (including child terms) in pancreatic and non-small cell lung cancer gene sets vs. background (human genome). (F) Comparison of enrichment p-value and IC developments with annual updates of GO and GO annotations for cell cycle in pancreatic and non-small cell lung cancer.
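The quantities compared in Figure 4 (genes annotated with a term in the gene set vs. the background, the enrichment p-value, and the IC) can be illustrated with a small sketch. It assumes a one-sided hypergeometric test for enrichment and the common definition IC = -log2(fraction of background genes annotated with the term). Neither is confirmed by this record as the authors' exact method. The function names and all counts are hypothetical and chosen only for illustration.

```python
from math import comb, log2


def enrichment_p_value(k: int, n: int, K: int, N: int) -> float:
    """One-sided hypergeometric p-value P(X >= k).

    N: genes in the background, K: background genes annotated with the term,
    n: genes in the signature, k: signature genes annotated with the term.
    Assumption: this is a generic enrichment test, not necessarily the
    procedure used in the paper.
    """
    upper = min(n, K)
    # Sum the probability of every overlap at least as large as the observed one.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


def information_content(K: int, N: int) -> float:
    """IC of a term as -log2 of the fraction of background genes it annotates.

    Assumption: a common IC definition; the paper may compute IC differently.
    """
    return -log2(K / N)


# Hypothetical counts: the same signature overlap evaluated against two
# annotation versions in which the term annotates 50 vs. 400 background genes.
N, n, k = 20000, 100, 10
for label, K in (("early version", 50), ("recent version", 400)):
    p = enrichment_p_value(k, n, K, N)
    ic = information_content(K, N)
    print(f"{label}: K={K}, p={p:.3g}, IC={ic:.2f}")
```

With the signature and overlap held fixed, only the background annotation count K changes between the two calls. This mirrors how an annotation update alone can shift both the p-value and the IC of a term.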
GO annotation and ontology versions used in this analysis.
| Year (GO version) | ‘04 | ‘05 | ‘06 | ‘07 | ‘08 | ‘09 | ‘10 | ‘11 | ‘12 | ‘13 | ‘14 | ‘15 | ‘15 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Annotation version | 2004/03/29 | 2005/02/11 | 2006/01/19 | 2007/01/13 | 2008/01/20 | 2009/01/18 | 2010/01/24 | 2011/01/01 | 2012/01/11 | 2013/01/08 | 2014/01/18 | 2015/01/08 | 2015/03/05 |
| Ontology version | 2004/02/14 | 2005/01/03 | 2006/01/03 | 2007/01/03 | 2008/01/05 | 2009/01/09 | 2010/01/05 | 2011/01/05 | 2012/01/04 | 2013/01/05 | 2014/01/07 | 2015/01/07 | 2015/03/13 |