Chintan J Joshi, Wenfan Ke, Anna Drangowska-Way, Eyleen J O'Rourke, Nathan E Lewis.
Abstract
The concept of "housekeeping gene" has been used for four decades but remains loosely defined. Housekeeping genes are commonly described as "essential for cellular existence regardless of their specific function in the tissue or organism", and "stably expressed irrespective of tissue type, developmental stage, cell cycle state, or external signal". However, experimental support for the tenet that gene essentiality is linked to stable expression across cell types, conditions, and organisms has been limited. Here we use genome-scale functional genomic screens together with bulk and single-cell sequencing technologies to test this link and optimize a quantitative and experimentally validated definition of housekeeping gene. Using the optimized definition, we identify, characterize, and provide as resources, housekeeping gene lists extracted from several human datasets, and 10 other animal species that include primates, chicken, and C. elegans. We find that stably expressed genes are not necessarily essential, and that the individual genes that are essential and stably expressed can considerably differ across organisms; yet the pathways enriched among these genes are conserved. Further, the level of conservation of housekeeping genes across the analyzed organisms captures their taxonomic groups, showing evolutionary relevance for our definition. Therefore, we present a quantitative and experimentally supported definition of housekeeping genes that can contribute to better understanding of their unique biological and evolutionary characteristics.
Year: 2022 PMID: 35830477 PMCID: PMC9312424 DOI: 10.1371/journal.pcbi.1010295
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.779
Data sources used for this study.
| Organism (sample type) | Data source | Modifications |
|---|---|---|
| Humans | Eisenberg and Levanon [ | Gene nomenclature within the original dataset was changed to NCBI Entrez gene identifiers. Due to issues related to mapping gene nomenclature, the original set of 3804 was reduced to 3688 genes. |
| Human (tissues) | HPA [ | - |
| | Brawand et al., 2011 [ | Converted to TPM from read per base |
| | GTEx [ | |
| Human (NCI-60 cancer cell lines) | CellMiner [ | - |
| | Klijn et al. 2015 [ | - |
| | Cao et al., 2017 [ | - |
| Chicken, Platypus, Orangutan, Bonobo, Gorilla, Chimpanzee, Macaque, Mouse, Opossum (tissues) | Brawand et al., 2011 [ | Converted to TPM from read per base using the following formula: |
| Chinese hamster ovaries (cell lines) | See | |
| Human (NCI-60 cancer cell lines)–CRISPR-Cas9 | DepMap [ | - |
| CHO cell lines–CRISPR-Cas9 | Xiong et al. 2020 [ | - |
| | New data provided here by Eyleen J. O'Rourke; method described in Ke et al. 2018 [ | - |
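The table states that the Brawand et al. data were converted to TPM from reads per base, but the formula itself does not appear in this text. As a minimal sketch, assuming the conventional TPM definition (each gene's per-base read rate divided by the sum of all rates, scaled to one million), the conversion might look like the following. The function name and the example values are hypothetical and are not taken from the study.

```python
def reads_per_base_to_tpm(reads_per_base):
    """Scale per-gene reads-per-base rates so that they sum to one million.

    Assumption: this is the conventional TPM definition, not necessarily
    the exact formula the authors applied.
    """
    total = sum(reads_per_base.values())
    if total <= 0:
        raise ValueError("total reads per base must be positive")
    return {gene: rate / total * 1e6 for gene, rate in reads_per_base.items()}


# Hypothetical per-gene rates (reads per base) for a single sample.
example_rates = {"geneA": 2.0, "geneB": 1.0, "geneC": 1.0}
tpm = reads_per_base_to_tpm(example_rates)
```

Because reads per base is already normalized by gene length, only the per-sample scaling step is needed in this sketch.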
C. elegans essential genes have significantly lower GC than non-essential genes.
| Geneset | Geneset definition | Number of genes in the class | Number of genes with GC > 0 (a) | Sign test (median GC_class < median GC_all) |
|---|---|---|---|---|
| High-confidence essential | | 48 | 38 | 5.808 × 10⁻⁵ |
| Medium-confidence essential | | 64 | 49 | 0.0427 |
| | | 112 | 87 | 4.5304 × 10⁻⁵ |
| Wild-type | | 1095 | 532 | 0.9814 |
| Unknown (b) | | 174 | 97 | 0.0335 |
| Untested (c) | | 94 | 58 | 0.347 |
(a) Numbers in this column are smaller than in column C because genes with GC equal to zero were excluded from the analysis.
(b) Unknown includes genes for which we were not able to generate large enough populations of worms for quantitative analyses; we hypothesize that this is due to strong effects on health and/or development. This hypothesis, and the observations that led us to propose it, agree with the low GC-essentiality correlation p value observed for this class.
(c) Untested corresponds to core metabolic genes that were not tested due to a lack of RNAi clones or other technical limitations.
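The sign test reported in the table asks whether the median GC of a gene class is lower than the median GC of all genes. A minimal sketch of one way to set up such a one-sided test is shown here, assuming that each gene in the class is compared against the reference median, that ties are dropped, and that the count of genes below the median is evaluated against a Binomial(n, 0.5) null. The function names and the toy values are hypothetical; the authors' exact procedure may differ.

```python
from math import comb
from statistics import median


def drop_zeros(values):
    """Exclude genes with GC equal to zero, as footnote (a) describes."""
    return [v for v in values if v > 0]


def one_sided_sign_test(class_values, reference_median):
    """P value for the alternative that the class median lies below the reference.

    Counts values below and above the reference median (ties are dropped)
    and returns P(X >= below) for X ~ Binomial(n, 0.5).
    """
    below = sum(1 for v in class_values if v < reference_median)
    above = sum(1 for v in class_values if v > reference_median)
    n = below + above
    if n == 0:
        return 1.0
    upper_tail = sum(comb(n, k) for k in range(below, n + 1))
    return upper_tail / 2 ** n


# Toy GC values, not data from the study.
all_gc = drop_zeros([0.0, 0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9])
class_gc = drop_zeros([0.0, 0.1, 0.2, 0.3, 0.4])
p_value = one_sided_sign_test(class_gc, median(all_gc))
```

The exact binomial tail is computed with the standard library only, so the sketch needs no third-party statistics package.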