
OGEE v2: an update of the online gene essentiality database with special focus on differentially essential genes in human cancer cell lines.

Wei-Hua Chen¹, Guanting Lu², Xiao Chen³, Xing-Ming Zhao³, Peer Bork⁴⁻⁷.

Abstract

OGEE is an Online GEne Essentiality database. To enhance our understanding of the essentiality of genes, in OGEE we collected experimentally tested essential and non-essential genes, as well as associated gene properties known to contribute to gene essentiality. We focus on large-scale experiments, and complement our data with text-mining results. We organized tested genes into data sets according to their sources, and tagged those with variable essentiality statuses across data sets as conditionally essential genes, intending to highlight the complex interplay between gene functions and environments/experimental perturbations. Developments since the last public release include increased numbers of species and gene essentiality data sets, inclusion of non-coding essential sequences and genes with intermediate essentiality statuses. In addition, we included 16 essentiality data sets from cancer cell lines, corresponding to 9 human cancers; with OGEE, users can easily explore the shared and differentially essential genes within and between cancer types. These genes, especially those derived from cell lines that are similar to tumor samples, could reveal the oncogenic drivers, paralogous gene expression pattern and chromosomal structure of the corresponding cancer types, and can be further screened to identify targets for cancer therapy and/or new drug development. OGEE is freely available at http://ogee.medgenius.info.


Year: 2016; PMID: 27799467; PMCID: PMC5210522; DOI: 10.1093/nar/gkw1013

Source DB: PubMed; Journal: Nucleic Acids Res; ISSN: 0305-1048; Impact factor: 16.971

INTRODUCTION

Essential genes are those genes of an organism that are critical for its survival. They are of particular importance because of their theoretical and practical applications, such as studying the robustness of a biological system (1), defining a minimal genome/organism (2,3) and identifying effective therapeutic targets in pathogens (4–6) and human cancers (7–11). In recent years, the technologies used for gene essentiality studies have evolved rapidly, from low-throughput single-gene knockout experiments (12,13) to high-throughput mutagenesis (3), RNAi (7,8) and, more recently, CRISPR-based genome editing (14–18); recent studies showed that CRISPR technology outperformed other methods (14,19), featuring low noise and minimal off-target effects (19). Being essential is not an intrinsic property of a gene; rather, it depends strongly on a variety of factors, including the function and expression pattern of the gene, the genetic background of the host, the environment and other settings. For example, genes coding for proteins involved in the biosynthesis of amino acids, nucleic acids and vitamins are essential for cell survival in minimal media, but not in rich media where the corresponding metabolites are supplied (20). In addition, different experimental methods may generate different results. For example, CRISPR-based methods could identify more essential genes than siRNA-based methods (21), while screens in cell lines yield a lower proportion of essential genes than in vivo experiments in the same multicellular organism (22). Genes with variable essentiality statuses under different circumstances are referred to as ‘conditionally essential genes (CEGs)’ or ‘differentially essential genes (DEGs)’ (14,22). CEG is a biologically meaningful and important concept; e.g. genes that are essential in a cancer cell line but non-essential in human tissues can reveal the oncogenic drivers, paralogous gene expression patterns and chromosomal structure of the corresponding cancer type (14).

In 2012, we introduced OGEE v1 (22) to promote the concept of ‘conditional essentiality’, which had not been widely adopted by existing essential gene databases at the time, and to advance our understanding of gene essentiality. We did so by including not only essential and non-essential genes, but also associated gene properties that are known to affect gene essentiality; we also provided tools that allow users to compare gene essentiality among different gene groups, or to compare properties of essential genes with those of non-essential genes. In addition, we organized experimentally tested genes into data sets according to their sources and tagged those with variable essentiality statuses across data sets as CEGs. In this study we introduce an updated version of OGEE, to which we added new species and new data sets, as well as genes with intermediate essentiality statuses (fitness genes) and non-coding essential genes. In addition, we re-organized the 16 gene essentiality data sets from human cancer cell lines, corresponding to nine cancer types, to help users explore the shared and differentially essential genes within and between cancer types; these genes, especially those derived from cell lines that are similar to tumor samples, could be further screened to identify targets for cancer therapy and/or new drug development.

DATA GENERATION

Collection and organization of genes tested for essentiality

We collected 99 large-scale gene essentiality experiments (data sets) for 48 species, including 34 data sets for 9 eukaryotes and 65 data sets for 39 prokaryotes. We added 1609 non-coding genes and 122 non-transcribed genomic regions from 10 species. In addition to essential and non-essential genes, we also included 1911 fitness genes from 10 species and 37 growth-advanced genes from two species. Fitness genes are defined as those whose removal is not lethal but could result in significantly decreased fitness, while growth-advanced genes are defined as those whose removal leads to significantly increased fitness. In the statistics below, fitness genes are counted as non-essential genes. In sum, our database contains 167 799 genes tested for essentiality from 48 species, a substantial increase from the 91 436 genes and 24 species in the last version (22). In total, 43 961 genes are covered by multiple data sets (in each species, the text-mining results are considered as one data set), representing ∼26.2% of all collected genes; among these, 13 397 genes are CEGs, accounting for ∼30.5% of those covered by multiple data sets. As shown in Table 1, the proportion of conditionally essential genes (PCEG) in species having at least 200 genes covered by two or more data sets ranges from 9.2% in Staphylococcus aureus subsp. aureus NCTC 8325 to 41.6% in Salmonella enterica subsp. enterica serovar Typhimurium str. SL1344. The number of data sets does not seem to contribute significantly to PCEG (Pearson's correlation coefficient = −0.24, P-value = 0.36).
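The tagging rule described above can be illustrated with a minimal Python sketch. It is not OGEE's actual pipeline: the input layout (each gene mapped to per-data-set status labels) and the labels `essential`, `nonessential` and `fitness` are hypothetical, fitness genes are collapsed to non-essential as in the statistics above, and text-mining results are assumed to arrive as one more data set.

```python
from typing import Dict, List

# Hypothetical status labels; as in the statistics above, fitness genes
# are counted as non-essential ("NE").
COLLAPSE = {"essential": "E", "nonessential": "NE", "fitness": "NE"}


def summarize(statuses: Dict[str, Dict[str, str]]) -> dict:
    """Tag conditionally essential genes (CEGs) within one species.

    `statuses` maps gene -> {data set name -> status label}. Text-mining
    results, if any, are assumed to appear as one extra data set.
    """
    multi: List[str] = []  # genes covered by two or more data sets
    cegs: List[str] = []   # multi-covered genes whose status varies
    for gene, by_dataset in statuses.items():
        if len(by_dataset) < 2:
            continue  # a single data set cannot reveal conditional essentiality
        multi.append(gene)
        collapsed = {COLLAPSE[label] for label in by_dataset.values()}
        if len(collapsed) > 1:  # essential in some data sets, not in others
            cegs.append(gene)
    pceg = len(cegs) / len(multi) if multi else 0.0
    return {"multi": sorted(multi), "cegs": sorted(cegs), "pceg": pceg}


if __name__ == "__main__":
    toy = {
        "geneA": {"set1": "essential", "set2": "essential"},
        "geneB": {"set1": "essential", "set2": "fitness"},
        "geneC": {"set1": "nonessential", "text-mining": "essential"},
        "geneD": {"set1": "nonessential"},
    }
    result = summarize(toy)
    print(result["cegs"], round(result["pceg"], 3))
```

On the toy input, geneB (essential in one data set, a fitness gene in another) and geneC are tagged as CEGs, while geneD is left out because only one data set covers it, giving a PCEG of 2/3.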

Table 1.

Statistics on conditionally essential genes in selected species with at least 200 genes covered by multiple data sets in OGEE

Species	Data sets	Tested genes	Essential genes	Genes covered by multiple data sets	Conditionally essential genes (PCEG)
Homo sapiens	18	21 556	7168	18 855	6985 (37.0%)
Schizosaccharomyces pombe	7	5509	1571	2522	279 (11.1%)
Drosophila melanogaster	2	13 781	408	437	141 (32.3%)
Pseudomonas aeruginosa UCBPP-PA14	4	5966	1842	5300	1455 (27.5%)
Escherichia coli K12	4	4322	740	4066	509 (12.5%)
Mycobacterium tuberculosis H37Rv	5	4008	1028	4002	1388 (34.7%)
Salmonella enterica subsp. enterica serovar Typhimurium str. SL1344	4	3774	1514	2715	1130 (41.6%)
Staphylococcus aureus subsp. aureus NCTC 8325	2	2899	557	2713	250 (9.2%)
Haemophilus influenzae Rd KW20	4	1750	847	1634	617 (37.8%)
Mycoplasma pneumoniae M129	2	1203	508	1203	196 (16.3%)

Species having at least 200 genes covered by multiple data sets are listed here; the species are ordered first by kingdom (eukaryotes, then prokaryotes) and then by the number of genes covered by multiple data sets (the 5th column). Essential genes are those that are essential in any collected data set, i.e. genes that are essential in one data set but non-essential in others are also counted. The proportion of conditionally essential genes (PCEG, percentage in parentheses in the last column) is calculated as the ratio between the number of ‘conditionally essential genes’ (the last column) and the number of ‘genes covered by multiple data sets’ (the 5th column). Note that text-mining results, if available, are counted as one data set in a species; please consult the ‘Browse’ page of the database for a complete and interactive version of the table.
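As a quick consistency check of the PCEG definition, the sketch below recomputes the percentages in Table 1 from its two count columns. The counts are copied from the table; the function name `pceg_percent` is illustrative only.

```python
# Counts copied from Table 1: (genes covered by multiple data sets,
# conditionally essential genes, PCEG % as printed in the table).
TABLE1 = {
    "Homo sapiens": (18855, 6985, 37.0),
    "Schizosaccharomyces pombe": (2522, 279, 11.1),
    "Drosophila melanogaster": (437, 141, 32.3),
    "Pseudomonas aeruginosa UCBPP-PA14": (5300, 1455, 27.5),
    "Escherichia coli K12": (4066, 509, 12.5),
    "Mycobacterium tuberculosis H37Rv": (4002, 1388, 34.7),
    "Salmonella enterica subsp. enterica serovar Typhimurium str. SL1344": (2715, 1130, 41.6),
    "Staphylococcus aureus subsp. aureus NCTC 8325": (2713, 250, 9.2),
    "Haemophilus influenzae Rd KW20": (1634, 617, 37.8),
    "Mycoplasma pneumoniae M129": (1203, 196, 16.3),
}


def pceg_percent(covered: int, ceg: int) -> float:
    """PCEG = CEGs / genes covered by multiple data sets, in percent (1 d.p.)."""
    return round(100.0 * ceg / covered, 1)


if __name__ == "__main__":
    for species, (covered, ceg, printed) in TABLE1.items():
        print(f"{species}: {pceg_percent(covered, ceg)}% (table: {printed}%)")
```

Each recomputed value, rounded to one decimal place, matches the percentage printed in the last column of Table 1.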

Collection of gene properties influencing gene essentiality

We also collected several gene properties that are known to influence gene essentiality, including duplication status (23), the number of homologous genes in the same genome (family size), connectivity in protein–protein interaction (PPI) networks (defined as the number of direct neighbors) (24), the functional category of a gene (25) and the earliest expression stage during embryonic development. We used BLAST (26) to search for duplicated genes within each genome using parameters and cutoffs described previously (27), and calculated the family size for each duplicated gene accordingly. We also calculated evolutionary measures for each duplicated gene and its best BLAST hit, including Ka, Ks and the Ka/Ks ratio, using KaKs_Calculator (v2.0) (28). We obtained the PPI data from STRING v10.0 (29), the functional category data from Gene Ontology (30) and the expression data (for multicellular organisms only) from the NCBI UniGene database (31). For more information, please consult the ‘Help’ page of the database.
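Connectivity and family size can be sketched as follows, under stated assumptions: PPI interactions are given as undirected gene pairs (for example, already parsed and score-filtered from STRING; that filtering is not shown), and duplicated gene pairs are assumed to have already passed the BLAST cutoffs of (27), which are not reproduced here. Grouping duplicates into families by single linkage (union-find) is an illustrative choice and not necessarily the procedure used for OGEE.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Pair = Tuple[str, str]


def connectivity(edges: Iterable[Pair]) -> Dict[str, int]:
    """Number of distinct direct neighbors of each gene in an undirected PPI network."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a == b:  # ignore self-interactions
            continue
        neighbors[a].add(b)
        neighbors[b].add(a)
    return {gene: len(nbrs) for gene, nbrs in neighbors.items()}


def family_sizes(duplicate_pairs: Iterable[Pair]) -> Dict[str, int]:
    """Family size of each duplicated gene, linking pairs by union-find."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in duplicate_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b  # merge the two families

    sizes: Dict[str, int] = defaultdict(int)
    for gene in parent:
        sizes[find(gene)] += 1
    return {gene: sizes[find(gene)] for gene in parent}
```

For example, `connectivity([("A", "B"), ("A", "C"), ("B", "A")])` gives A two neighbors and B and C one each. Genes that never appear in a duplicate pair are singletons (family size 1) and are simply absent from the mapping returned by `family_sizes`.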

BUILT-IN TOOLS FOR ANALYZING COLLECTED GENE PROPERTIES

We also provide integrated tools on the ‘Analyze’ page for users to analyze the impact of the collected gene properties on gene essentiality: users can divide genes into distinct groups according to one of the available properties, calculate the proportion of essential genes (PE) in each group, and then plot the results as a bar chart. To illustrate this feature, Figure 1 shows the PE values of different groups of mouse genes as a function of their involvement in development (Figure 1A) and their duplication status (Figure 1B). These results show that developmental genes are more often essential than non-developmental genes, and singletons more often than duplicated genes, consistent with previous results (23,25); these trends generally hold in other species as well.
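The PE calculation described above amounts to a per-group ratio of essential genes to tested genes. A minimal sketch follows; the group labels and toy data are invented for illustration and do not reproduce the mouse values in Figure 1.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def proportion_essential(genes: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """PE per group = essential genes in the group / tested genes in the group.

    `genes` yields one (group label, is_essential) pair per tested gene.
    """
    tested: Dict[str, int] = defaultdict(int)
    essential: Dict[str, int] = defaultdict(int)
    for group, is_essential in genes:
        tested[group] += 1
        essential[group] += int(is_essential)
    return {group: essential[group] / tested[group] for group in tested}


if __name__ == "__main__":
    # Invented toy data, not OGEE's mouse values.
    toy = [("singleton", True), ("singleton", True), ("singleton", False),
           ("duplicate", True), ("duplicate", False),
           ("duplicate", False), ("duplicate", False)]
    # Crude text "bar chart": one '#' per 5 percentage points.
    for group, pe in sorted(proportion_essential(toy).items()):
        print(f"{group:<10} {'#' * round(pe * 20)} {pe:.2f}")
```

With the toy data, singletons have a PE of 2/3 and duplicates a PE of 1/4, mirroring the direction (not the magnitude) of the trend in Figure 1B.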

Figure 1.

Screenshots taken from the ‘Analyze’ page. With the integrated tools, users can easily analyze the collected data and visualize the results. Shown here is the proportion of essential genes (PE) in mouse as a function of involvement in development (developmental versus non-developmental genes; panel A) and of duplication status (duplicates versus singletons; panel B).

RE-ORGANIZATION OF ESSENTIAL GENES FROM HUMAN CANCER CELL LINES

In recent years, ‘conditional essentiality’ or ‘differential essentiality’ has been increasingly used by researchers as a tool to interrogate genes that are essential under specific conditions and to search for genes required for the survival of human cancer cell lines (7,9–11,14,16). In OGEE we collected 16 such data sets in total and re-organized them into nine groups according to their cancers of origin: breast cancer, Burkitt's lymphoma, chronic myelogenous leukemia (CML), colon cancer, esophageal squamous carcinoma, glioblastoma (GBM), non-small cell lung cancer (NSCLC), ovarian cancer and pancreatic cancer. Shared and differentially essential genes within and between cancer types were pre-calculated. Table 2 briefly summarizes the nine cancers, including the number of data sets, the total number of essential genes and the number of uniquely essential genes for each cancer. Here, ‘uniquely essential genes’ are defined as those that are not essential in any other human data set available in OGEE. An up-to-date version of this table and additional results can be found on the ‘Cancer’ page of our website.
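The pre-calculated comparisons can be expressed with ordinary set operations. The sketch below is an illustration under assumptions, not OGEE's code: each data set is mapped to its cancer type (or `None` for non-cancer human data) and its set of essential genes; "shared within a cancer" is taken to mean essential in every data set of that cancer; and a gene not tested in another data set is treated as not essential there. All data set ids and gene names are hypothetical.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical layout: data set id -> (cancer type, or None for non-cancer
# human data, set of genes called essential in that data set).
DataSets = Dict[str, Tuple[Optional[str], Set[str]]]


def per_cancer_summary(datasets: DataSets) -> Dict[str, Dict[str, Set[str]]]:
    """Essential, shared and uniquely essential genes for each cancer type."""
    cancers = sorted({c for c, _ in datasets.values() if c is not None})
    summary: Dict[str, Dict[str, Set[str]]] = {}
    for cancer in cancers:
        own = [genes for c, genes in datasets.values() if c == cancer]
        others = [genes for c, genes in datasets.values() if c != cancer]
        essential = set().union(*own)     # essential in any data set of this cancer
        shared = set.intersection(*own)   # essential in every data set of this cancer
        elsewhere = set().union(*others)  # essential in any other human data set
        summary[cancer] = {
            "essential": essential,
            "shared": shared,
            "unique": essential - elsewhere,
        }
    return summary


if __name__ == "__main__":
    toy = {
        "cml_1": ("CML", {"g1", "g2", "g3"}),
        "cml_2": ("CML", {"g2", "g3", "g4"}),
        "colon_1": ("colon", {"g3", "g5"}),
        "tissue_1": (None, {"g3"}),
    }
    s = per_cancer_summary(toy)
    print(sorted(s["CML"]["unique"]), sorted(s["CML"]["essential"] & s["colon"]["essential"]))
```

Genes shared between two cancer types are then a plain intersection, as in the last line of the usage example; genes essential in one cancer but not another are the corresponding set difference.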

Table 2.

Summary of the 16 gene essentiality data sets from 9 human cancers collected in OGEE

Cancer	Data sets	Essential genes	Uniquely essential genes
breast	1	146	67
Burkitt's lymphoma	2	1897	198
CML	4	3210	1324
colon	2	1394	899
esophageal squamous	1	41	34
GBM	1	21	14
NSCLC	1	28	20
ovarian	2	130	87
pancreatic	2	199	126

‘Essential genes’ (the 3rd column) are genes that are essential in any of the data set(s) of a particular cancer type; ‘Uniquely essential genes’ (the last column) are the subset of ‘Essential genes’ that are not essential in any other collected human data set. An up-to-date version of this table can be found at http://ogee.medgenius.info/cancer/.

Lineage-specific essential genes, i.e. those that are essential only in a particular cancer type, are important targets for cancer therapies: because they likely result from the unique mutational profile of the cell line and its functional consequences, targeting these genes could achieve high efficiency and specificity. Cancer cell lines are often used as models for cancer research. However, recent studies suggest that although some cell lines are indeed good models, others differ markedly from tumor samples of the same origin in terms of copy-number changes, key mutations and mRNA expression profiles, due in part to ambiguity in classification and annotation (32,33). Thus, in the future, we will exclude cell lines that are remarkably different from cancers of the same origin and keep only those that are good models, should such information become reliable and easily accessible.

DATA ACCESS

All data are freely accessible to all academic users. This work is licensed under a Creative Commons Attribution 3.0 Unported License (CC BY 3.0). Users can download combined data from the ‘Downloads’ page. Users can also download individual data sets or combined data sets for individual species in the ‘Browse’ page.

CONCLUSIONS

In this article, we introduced OGEE v2, an updated version of our online gene essentiality database. Updates since the previous version include increased numbers of species and gene essentiality data sets, and the inclusion of non-coding essential sequences and fitness genes. We also re-organized the essentiality data sets from nine human cancers so that users can easily explore the shared and differentially essential genes within and between cancer types. Compared with existing gene essentiality databases such as DEG (34), OGEE provides several unique features: (i) OGEE provides both essential and non-essential genes from large-scale as well as small-scale studies; (ii) OGEE introduces ‘conditional essentiality’ to reflect the complexity of biological systems and the interplay between gene functions, genetic backgrounds and environments; (iii) OGEE lists a variety of gene properties known to influence gene essentiality; (iv) OGEE provides a set of online tools to explore and analyze the data and to visualize the results. We thus believe that OGEE should be highly useful to biologists and bioinformaticians studying gene essentiality, whether they focus on individual genes or on genome-wide analyses. In the future, we aim to update OGEE regularly to provide up-to-date content to our users.
