
DEG 5.0, a database of essential genes in both prokaryotes and eukaryotes.

Ren Zhang, Yan Lin.

Abstract

Essential genes are those indispensable for the survival of an organism, and their functions are therefore considered a foundation of life. Determining the minimal gene set needed to sustain a life form, a fundamental question in biology, plays a key role in the emerging field of synthetic biology. Five years after we constructed DEG, a database of essential genes, DEG 5.0 offers significant advances over the 2004 version in both the number of essential genes and the number of organisms in which these genes have been determined. The number of prokaryotic essential genes in DEG has increased about 10-fold, mainly owing to genome-wide gene essentiality screens performed in a wide range of bacteria. The number of eukaryotic essential genes has increased more than 5-fold, because DEG 1.0 contained only yeast genes, whereas DEG 5.0 also contains those of humans, mice, worms, fruit flies, zebrafish and the plant Arabidopsis thaliana. These updates reflect not only significant advances in DEG but also the rapid progress of the essential-gene field. DEG is freely available at http://tubic.tju.edu.cn/deg or http://www.essentialgene.org.

Year:  2008        PMID: 18974178      PMCID: PMC2686491          DOI: 10.1093/nar/gkn858

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

Essential genes are those indispensable for the survival of an organism under certain conditions, and the functions they encode are therefore considered a foundation of life. The essential genes of an organism constitute its minimal gene set, which is the smallest possible group of genes that would be sufficient to sustain a functioning cellular life form under the most favorable conditions (1–3). Determining the minimal gene set for an organism addresses a conceptually important question: what are the basic functions needed to sustain a life form? The minimal-gene-set concept therefore plays a key role in the emerging field of synthetic biology (4). Essential-gene studies are of practical interest as well. For instance, because their disruption is lethal, essential genes are attractive targets for antibiotics (5). Essential genes that are conserved across species are candidates for broad-spectrum drug targets, whereas those specific to one bacterium are candidates for species-specific targets. In 2004, we constructed DEG 1.0, a database of essential genes (6). In the past five years, fueled by the accumulation of sequenced genomes, sophisticated genome-wide mutagenesis techniques (7), and the burgeoning field of synthetic biology (8–10), significant advances have been made in determining essential genes in a wide range of organisms. This paper presents an update, DEG 5.0, a database of essential genes in both prokaryotes and eukaryotes.

SUMMARY OF DATABASE UPDATES

In parallel with the rapid progress of the essential-gene field, DEG 5.0 has advanced significantly over DEG 1.0. The number of prokaryotic essential genes has increased about 10-fold, from 543 to 5260 (Table 1), for three reasons: (i) in DEG 1.0, some essential genes, e.g. those in Escherichia coli, were collected from literature searches, but in DEG 5.0 these records were replaced by those determined in genome-wide studies using the genetic footprinting technique (11) and systematic gene knockout experiments (12); (ii) in DEG 1.0, some essential genes, e.g. those in Haemophilus influenzae, were determined by theoretical prediction from comparative genomics studies (13), but in DEG 5.0 these records were replaced by those determined in genome-wide studies using global transposon mutagenesis (14); and (iii) in 2004, only two genome-wide studies identifying bacterial essential genes had been completed, whereas 12 have now been finished.
Table 1. Contents of DEG version 5.0

| No. | Kingdom | Organism | Essential gene no.(a) | Method | Saturated or near saturated | References |
|---|---|---|---|---|---|---|
| 1 | Prokaryote | Acinetobacter baylyi | 499 | Single-gene deletions | Y | (23) |
| 2 | Prokaryote | Bacillus subtilis | 271 | Single-gene deletions | Y | (24) |
| 3 | Prokaryote | Escherichia coli | 712 | Genetic footprinting and single-gene deletions | Y | (11,12) |
| 4 | Prokaryote | Francisella novicida | 392 | Transposon mutagenesis | Y | (25) |
| 5 | Prokaryote | Haemophilus influenzae | 642 | Transposon mutagenesis | Y | (14) |
| 6 | Prokaryote | Helicobacter pylori | 323 | Transposon mutagenesis | Y | (26) |
| 7 | Prokaryote | Mycobacterium tuberculosis | 614 | Transposon mutagenesis | Y | (27) |
| 8 | Prokaryote | Mycoplasma genitalium | 381 | Transposon mutagenesis | Y | (18) |
| 9 | Prokaryote | Mycoplasma pulmonis | 310 | Transposon mutagenesis | Y | (28) |
| 10 | Prokaryote | Pseudomonas aeruginosa | 335 | Transposon mutagenesis | Y | (29) |
| 11 | Prokaryote | Salmonella typhimurium | 230 | Insertional-duplication mutagenesis | Y | (31) |
| 12 | Prokaryote | Staphylococcus aureus | 302 | Antisense RNA | N | (21,22) |
| 13 | Prokaryote | Streptococcus pneumoniae | 244 | Single-gene deletions and allelic replacement mutagenesis | N | (19,20) |
| 14 | Prokaryote | Vibrio cholerae | 5 | Transposon mutagenesis | N | (40) |
| 15 | Eukaryote | Arabidopsis thaliana | 777 | T-DNA insertion | N | (36) |
| 16 | Eukaryote | Caenorhabditis elegans | 294 | RNA interference | N | (34) |
| 17 | Eukaryote | Danio rerio | 288 | Insertion mutagenesis | N | (35) |
| 18 | Eukaryote | Drosophila melanogaster | 339 | P-element insertion mutagenesis | N | (33) |
| 19 | Eukaryote | Homo sapiens | 118 | Literature search | N | (38) |
| 20 | Eukaryote | Mus musculus | 2114 | Literature search | N | (37) |
| 21 | Eukaryote | Saccharomyces cerevisiae | 878 | Single-gene deletions | Y | (32) |

(a) In some cases, there is a slight difference between the essential-gene number reported and that in DEG. This is mainly due to annotation changes in the latest versions of genomes such that some records could not be found, or because reported results contain identical records or are only partially published.

The number of essential genes in eukaryotes has increased more than 5-fold, from 878 to 4808, because DEG 1.0 contained only yeast essential genes, whereas DEG 5.0 also contains those of humans, mice, worms, fruit flies, zebrafish and the plant Arabidopsis thaliana.
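The fold increases quoted above follow directly from the per-organism counts in Table 1 and the DEG 1.0 totals given in the text (543 and 878). The following is a minimal Python sketch of that arithmetic; the dictionary layout and the function name are illustrative assumptions and are not part of DEG.

```python
# Essential-gene counts per organism, copied from Table 1 (DEG 5.0).
PROKARYOTES = {
    "Acinetobacter baylyi": 499, "Bacillus subtilis": 271,
    "Escherichia coli": 712, "Francisella novicida": 392,
    "Haemophilus influenzae": 642, "Helicobacter pylori": 323,
    "Mycobacterium tuberculosis": 614, "Mycoplasma genitalium": 381,
    "Mycoplasma pulmonis": 310, "Pseudomonas aeruginosa": 335,
    "Salmonella typhimurium": 230, "Staphylococcus aureus": 302,
    "Streptococcus pneumoniae": 244, "Vibrio cholerae": 5,
}
EUKARYOTES = {
    "Arabidopsis thaliana": 777, "Caenorhabditis elegans": 294,
    "Danio rerio": 288, "Drosophila melanogaster": 339,
    "Homo sapiens": 118, "Mus musculus": 2114,
    "Saccharomyces cerevisiae": 878,
}
# DEG 1.0 totals as stated in the text.
DEG1_TOTALS = {"prokaryote": 543, "eukaryote": 878}


def fold_increase(old_total, new_total):
    """Ratio of the new total to the old total (hypothetical helper)."""
    return new_total / old_total


prok_total = sum(PROKARYOTES.values())
euk_total = sum(EUKARYOTES.values())

print("prokaryotes:", prok_total,
      round(fold_increase(DEG1_TOTALS["prokaryote"], prok_total), 1))
print("eukaryotes:", euk_total,
      round(fold_increase(DEG1_TOTALS["eukaryote"], euk_total), 1))
```

The column sums reproduce the totals of 5260 and 4808 given in the text, and the ratios are consistent with "about 10-fold" and "more than 5-fold".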

DATABASE DESCRIPTION

Essential genes in prokaryotes

Determination of a minimal gene set for cellular life was made possible by the availability of the first two completely sequenced bacterial genomes, those of Mycoplasma genitalium (15) and H. influenzae (16). Koonin and coworkers pioneered the attempt to determine a minimal gene set by comparing these two genomes, which belong to two ancient bacterial lineages, on the notion that genes conserved between them are likely to be essential for cellular function (13). In 1999, Venter's group performed the first global transposon mutagenesis in M. genitalium to address experimentally the question of what constitutes the minimal gene set for a living organism (17); about 300 genes were estimated to be essential and were included in DEG 1.0. However, global transposon mutagenesis in fact identifies non-essential genes: genes disrupted by transposons are identified, and those not disrupted are considered essential. Therefore, to obtain proof of gene dispensability in M. genitalium, Venter's group isolated and characterized every Tn4001 insertion mutant present in individual colonies picked from agar plates (18). Consequently, 382 genes were demonstrated to be essential, and these genes replaced the version 1.0 records in DEG 5.0. A high-density transposon mutagenesis strategy was also applied to H. influenzae (14), and the essential genes so obtained replaced the corresponding records in DEG 1.0, which had been determined by comparative genomics (13).

In DEG 1.0, essential genes of E. coli were collected from http://magpie.genome.wisc.edu/~chris/essential.html, where essential genes had been compiled by searching the related literature. Using a genetic footprinting technique, Gerdes et al. (11) conducted a genome-wide, comprehensive experimental assessment of the E. coli genes necessary for robust aerobic growth and identified 620 essential genes. In addition, the Keio collection contains 303 essential genes that were determined by systematic single-gene knockout experiments (12). Therefore, in DEG 5.0, the essential-gene records obtained by literature search were replaced by those obtained from both the genome-wide mutagenesis study (11) and the systematic single-gene knockout experiments (12), with only one copy retained for each of the 205 genes that overlap between the two studies.

About 100 Streptococcus pneumoniae essential genes were determined by a high-throughput gene disruption system (19). Later, 133 essential genes were determined by allelic replacement mutagenesis (20). In DEG 5.0, the two results were combined by removing redundant records, resulting in 244 essential genes in S. pneumoniae. DEG 1.0 contained 65 Staphylococcus aureus essential genes determined by the antisense RNA technique (21), and DEG 5.0 now contains 302 S. aureus essential genes after combining these with results from studies using the rapid shotgun antisense RNA method (22).

In the past several years, many genome-wide mutagenesis studies have been performed in a wide range of bacteria. In addition to those mentioned above, DEG 5.0 contains essential genes determined by large-scale single-gene deletion studies in Acinetobacter baylyi (23) and Bacillus subtilis (24), those determined by global transposon mutagenesis in Francisella novicida (25), Helicobacter pylori (26), Mycobacterium tuberculosis (27), Mycoplasma pulmonis (28) and Pseudomonas aeruginosa (29,30), and those determined by trapping lethal insertions in Salmonella typhimurium (31).
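Two operations recur in this section: inferring candidate essential genes as those with no recovered transposon insertion, and merging the results of two studies while keeping one copy of each overlapping gene. The following Python sketch illustrates both with sets. The gene identifiers and function names are hypothetical, and the code is not the procedure actually used to build DEG.

```python
def essential_from_insertions(all_genes, disrupted_genes):
    """Genes with no recovered viable insertion are treated as
    candidate essential genes (the transposon-mutagenesis logic)."""
    return set(all_genes) - set(disrupted_genes)


def merge_studies(*studies):
    """Union of several essential-gene sets; a gene reported by more
    than one study is kept only once."""
    merged = set()
    for study in studies:
        merged |= set(study)
    return merged


# Toy data with made-up gene identifiers.
genome = {"geneA", "geneB", "geneC", "geneD", "geneE"}
disrupted = {"geneB", "geneE"}          # viable insertion mutants recovered

study_1 = essential_from_insertions(genome, disrupted)
study_2 = {"geneC", "geneD", "geneF"}   # e.g. a single-gene knockout screen

combined = merge_studies(study_1, study_2)
overlap = study_1 & study_2

# Inclusion-exclusion: |A or B| = |A| + |B| - |A and B|
assert len(combined) == len(study_1) + len(study_2) - len(overlap)

# The same arithmetic with the E. coli figures quoted in the text.
ecoli_expected = 620 + 303 - 205
print(sorted(combined), ecoli_expected)
```

With the E. coli figures from the text, the arithmetic gives 718 records, whereas Table 1 lists 712; the table footnote attributes such slight differences to annotation changes, identical records, or partially published results.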

Essential genes in eukaryotes

Another major improvement in DEG 5.0 is the inclusion of essential genes of many eukaryotes, including animals and the plant A. thaliana, whereas the only eukaryotic species in DEG 1.0 was Saccharomyces cerevisiae (32). The goal of determining a bacterial minimal gene set also applies to eukaryotes, i.e. to define the minimal gene set needed to produce a living multicellular organism or a viable plant. Although this goal is obviously too ambitious at the current stage, much effort has already been devoted to the identification of essential genes in eukaryotes.

In the Drosophila genome, about 25% of genes were disrupted by P-element insertions by the Berkeley Drosophila Genome Project (33), and those genes whose disruption produced lethal phenotypes were collected in DEG 5.0. In the Caenorhabditis elegans genome, Kamath et al. (34) used RNA interference to inhibit the activity of about 86% of all genes and characterized the resulting phenotypes; genes whose inhibition was lethal were included in DEG 5.0. Hopkins and coworkers conducted large-scale insertional mutagenesis in zebrafish to identify genes essential for embryonic and early larval development (35), and the identified essential genes were collected in DEG 5.0. The first large-scale identification of essential genes in a flowering plant was performed by Meinke and coworkers in A. thaliana by characterizing a large number of T-DNA insertion lines (36), and the identified essential genes were collected in DEG 5.0.

Large-scale gene inactivation studies have not been performed in mice, likely because of technical difficulties and the labor involved. However, because mice are probably the most important model organism, a large number of genes have already been inactivated by individual laboratories. In a study comparing essentiality between duplicate genes and singleton genes, Liao and Zhang (37) analyzed nearly 3900 individually inactivated mouse genes and found that about 55% were essential among both singletons and duplicates. The essential genes analyzed in this study were collected in DEG 5.0. In another study comparing human and mouse essential genes, Liao and Zhang (38) extensively reviewed the literature to find genes whose null mutations in humans are lethal, and these human essential genes were also collected in DEG 5.0.

User interface and data access

The database is divided into two subdatabases, one for prokaryotic and one for eukaryotic essential genes. Each entry has a unique DEG identification number, gene name, gene reference number, gene function, and DNA and protein sequences. For prokaryotic essential genes, a link to the COG information (39) is also provided. All information is stored in and managed by the open-source database management system MySQL, which allows rapid data retrieval. Users can access the data in several ways. They can browse the essential-gene records, and they can search for essential genes by name, function, accession number and organism. Users can also perform BLAST searches against DEG with query DNA or protein sequences. Because the database is composed of two subdatabases, the Browse, Search and BLAST functions must be performed in each subdatabase separately. The whole database can also be downloaded upon request.
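To make the record layout and the per-subdatabase search concrete, here is a minimal Python sketch. It makes several assumptions: the table and column names are invented for illustration and are not the actual DEG schema, the identifiers and rows are placeholders rather than real DEG entries, and the standard-library sqlite3 module stands in for MySQL so that the example is self-contained.

```python
import sqlite3

# In-memory stand-in for the MySQL back end (assumption, not DEG's schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE essential_gene (
           deg_id        TEXT PRIMARY KEY,  -- unique identification number
           subdatabase   TEXT,              -- 'prokaryote' or 'eukaryote'
           organism      TEXT,
           gene_name     TEXT,
           gene_ref      TEXT,              -- gene reference number
           gene_function TEXT
       )"""
)

# Placeholder rows; the IDs and reference numbers are made up.
conn.executemany(
    "INSERT INTO essential_gene VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("DEMO0001", "prokaryote", "Escherichia coli", "dnaA",
         "REF0001", "chromosomal replication initiator protein"),
        ("DEMO0002", "prokaryote", "Bacillus subtilis", "ftsZ",
         "REF0002", "cell division protein"),
        ("DEMO0003", "eukaryote", "Saccharomyces cerevisiae", "CDC28",
         "REF0003", "cyclin-dependent kinase"),
    ],
)

# Column names cannot be bound as parameters, so they are whitelisted.
SEARCHABLE = {"gene_name", "gene_function", "gene_ref", "organism"}


def search(conn, subdatabase, field, term):
    """Substring search on one field within a single subdatabase."""
    if field not in SEARCHABLE:
        raise ValueError(f"unsupported search field: {field}")
    sql = (
        "SELECT deg_id, gene_name, organism FROM essential_gene "
        f"WHERE subdatabase = ? AND {field} LIKE ? ORDER BY deg_id"
    )
    return conn.execute(sql, (subdatabase, f"%{term}%")).fetchall()


print(search(conn, "prokaryote", "organism", "Escherichia"))
print(search(conn, "eukaryote", "gene_name", "dnaA"))
```

Restricting each query to one subdatabase mirrors the statement that Browse, Search and BLAST are run separately for prokaryotes and eukaryotes; a search for a prokaryotic gene name in the eukaryotic subdatabase returns no records.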

CONCLUSION AND FUTURE DEVELOPMENT

DEG 5.0 offers significant advances over DEG 1.0 in both the number of essential genes and the number of organisms in which these genes have been determined. These updates reflect not only significant advances over the 2004 version of DEG but also the rapid progress of the essential-gene field. In the future, the growth in the number of prokaryotic essential genes is expected to accelerate, fueled by the availability of more and more complete genomes and by the emerging field of synthetic biology. In eukaryotic model organisms, the number of essential genes is also expected to grow, because most gene essentiality screens are far from saturated. These advances will be reflected promptly in future DEG updates. We welcome users' comments, corrections and new information, which will be used for updating. DEG is freely available at http://tubic.tju.edu.cn/deg or http://www.essentialgene.org.

FUNDING

The present work was supported in part by the National Natural Science Foundation of China (NNSF 90408028). Funding for open access charge: NNSF 90408028. Conflict of interest statement. None declared.
  40 in total

Review 1.  Transposon-based approaches to identify essential bacterial genes.

Authors:  N Judson; J J Mekalanos
Journal:  Trends Microbiol       Date:  2000-11       Impact factor: 17.079

Review 2.  Searching for drug targets in microbial genomes.

Authors:  M Y Galperin; E V Koonin
Journal:  Curr Opin Biotechnol       Date:  1999-12       Impact factor: 9.740

3.  Identification of critical staphylococcal genes using conditional phenotypes generated by antisense RNA.

Authors:  Y Ji; B Zhang; S F Van; P Warren; G Woodnutt; M K Burnham; M Rosenberg
Journal:  Science       Date:  2001-09-21       Impact factor: 47.728

4.  Global transposon mutagenesis and a minimal Mycoplasma genome.

Authors:  C A Hutchison; S N Peterson; S R Gill; R T Cline; O White; C M Fraser; H O Smith; J C Venter
Journal:  Science       Date:  1999-12-10       Impact factor: 47.728

5.  A genome-scale analysis for identification of genes required for growth or survival of Haemophilus influenzae.

Authors:  Brian J Akerley; Eric J Rubin; Veronica L Novick; Kensey Amaya; Nicholas Judson; John J Mekalanos
Journal:  Proc Natl Acad Sci U S A       Date:  2002-01-22       Impact factor: 11.205

6.  MIPS: a database for genomes and protein sequences.

Authors:  H W Mewes; D Frishman; U Güldener; G Mannhaupt; K Mayer; M Mokrejs; B Morgenstern; M Münsterkötter; S Rudd; B Weil
Journal:  Nucleic Acids Res       Date:  2002-01-01       Impact factor: 16.971

7.  Large-scale transposon mutagenesis of Mycoplasma pulmonis.

Authors:  Christopher T French; Ping Lao; Ann E Loraine; Brian T Matthews; Huilan Yu; Kevin Dybvig
Journal:  Mol Microbiol       Date:  2008-04-28       Impact factor: 3.501

8.  TnAraOut, a transposon-based approach to identify and characterize essential bacterial genes.

Authors:  N Judson; J J Mekalanos
Journal:  Nat Biotechnol       Date:  2000-07       Impact factor: 54.908

Review 9.  How many genes can make a cell: the minimal-gene-set concept.

Authors:  E V Koonin
Journal:  Annu Rev Genomics Hum Genet       Date:  2000       Impact factor: 8.929

10.  Null mutations in human and mouse orthologs frequently result in different phenotypes.

Authors:  Ben-Yang Liao; Jianzhi Zhang
Journal:  Proc Natl Acad Sci U S A       Date:  2008-05-05       Impact factor: 11.205

