Unlocking the secrets of the genome.

Susan E Celniker, Laura A L Dillon, Mark B Gerstein, Kristin C Gunsalus, Steven Henikoff, Gary H Karpen, Manolis Kellis, Eric C Lai, Jason D Lieb, David M MacAlpine, Gos Micklem, Fabio Piano, Michael Snyder, Lincoln Stein, Kevin P White, Robert H Waterston.

Year:  2009        PMID: 19536255      PMCID: PMC2843545          DOI: 10.1038/459927a

Source DB:  PubMed          Journal:  Nature        ISSN: 0028-0836            Impact factor:   49.962


The primary objective of the Human Genome Project was to produce high-quality sequences not just for the human genome but also for those of the chief model organisms: Escherichia coli, yeast (Saccharomyces cerevisiae), worm (Caenorhabditis elegans), fly (Drosophila melanogaster) and mouse (Mus musculus). Free access to the resulting data has spurred a great deal of biological research. This includes development of a map of common human genetic variants (the International HapMap Project)[1], expression profiling of healthy and diseased cells[2] and in-depth studies of many individual genes. These genome sequences have enabled researchers to carry out genetic and functional genomic studies that were not previously possible, revealing new biological insights with broad relevance across the animal kingdom[3,4].
Nevertheless, our understanding of how the information encoded in a genome can produce a complex multicellular organism remains far from complete. Interpreting the genome accurately requires a complete list of functionally important elements and a description of their dynamic activities over time and across different cell types. As well as genes for proteins and non-coding RNAs, functionally important elements include regulatory sequences that direct essential functions such as gene expression, DNA replication and chromosome inheritance. Geneticists have been quick to decode the functional elements in the yeast S. cerevisiae, with its small, compact genome and powerful experimental tools[5,6]. By contrast, our understanding of the more complex genomes of human, mouse, fly and worm is still rudimentary. Current algorithms can only partly recognize the intrinsic signals that define the boundaries of protein-coding genes, and signals for other functional elements are even harder to find and interpret. Experimental approaches, notably the sequencing of complementary DNA and expressed sequence tags, have been invaluable, but these data sets remain incomplete[7].
Non-coding RNA genes present an even greater challenge[8,9,10], and many remain to be discovered, particularly those that have not been strongly conserved during evolution. Flies and worms have roughly the same number of known transcription factors as humans[11], but comprehensive molecular studies of gene regulatory networks have yet to be tackled in any of these species.
In an attempt to remedy this situation, the National Human Genome Research Institute (NHGRI) launched the ENCODE (Encyclopedia of DNA Elements) project in 2003, with the goal of defining the functional elements in the human genome. The pilot phase of the project focused on 1% of the human genome, alongside a parallel effort to foster technology development[12]. The initial ENCODE analysis produced new findings, but it also made clear how complex the biology is and how far our grasp of it remains from complete[13]. On the basis of this experience, the NHGRI launched two complementary programmes in 2007. The first expands the human ENCODE project to the whole genome (http://www.genome.gov/ENCODE). The second, the model organism ENCODE (modENCODE) project, aims to generate a comprehensive annotation of the functional elements in the C. elegans and D. melanogaster genomes (http://www.modencode.org; http://www.genome.gov/modENCODE).
These two model organisms, with their ease of husbandry and genetic manipulation, are pillars of modern biological research. A systematic catalogue of their functional genomic elements promises to pave the way to a more complete understanding of the human genome. Studies of these animals have provided key insights into many basic metazoan processes, including developmental patterning, cellular signalling, DNA replication and inheritance, programmed cell death and RNA interference (RNAi). Their genomes are small enough to be investigated comprehensively with current technologies, and findings can be validated in vivo.
The research communities that study these two organisms will rapidly make use of the modENCODE results. They will deploy powerful experimental approaches that are often impossible or impractical in mammals, including genetic, genomic, transgenic, biochemical and RNAi assays. With its potential for biological validation, modENCODE will add value to the human ENCODE effort by illuminating the relationship between molecular and biological events. The modENCODE project (Table 1) complements other systematic investigations into these intensively studied organisms. In both organisms, RNAi collections have been developed and used to uncover novel gene functions[14,15,16,17,18]. Mutants are being recovered through insertional mutagenesis[19] and targeted deletions (http://celeganskoconsortium.omrf.org; http://www.shigen.nig.ac.jp/c.elegans), with the eventual goal of obtaining a mutant for every known gene. Genome sequences of related species are now also available for both fly[20,21] and worm[22], and multiple independent wild isolates are being characterized (T. Mackay, personal communication, http://www.dpgp.org[23]; R.H.W.). First-generation catalogues of gene expression patterns during development and in different tissues have been assembled[24,25,26,27,28,29,30,31,32,33,34].
Table 1

modENCODE CONSORTIUM

Research and analysis

The modENCODE project will operate as an open consortium, and participants may join provided they agree to abide by the established criteria (http://www.genome.gov/26524644). An important aim of the project is to respond to the needs of the broader Drosophila and C. elegans scientific communities, and several avenues will be open for suggestions on which experiments to prioritize. For example, researchers can visit http://www.modencode.org/Vote.shtml now to help prioritize transcription factors for studies using chromatin immunoprecipitation followed by DNA microarray or DNA sequencing (ChIP-chip and ChIP-seq). They can also indicate whether they have useful antibodies. We will seek community input on other issues as opportunities arise.
The core of the modENCODE project consists of ten groups that use high-throughput methods to identify functional elements (see Table 1). A Data Coordinating Center (DCC) will collect, integrate and display the data. Together, the groups expect to identify the principal classes of functional element for D. melanogaster and C. elegans. Working closely together, they will:
- complete the precise annotation of protein-coding genes;
- identify small RNAs and non-coding RNA transcripts;
- map transcription start sites and identify promoter motif elements;
- elucidate functional elements within 3′ untranslated regions;
- identify alternatively spliced transcripts as well as the signals required for splicing.
Genomic sites bound by sequence-specific transcription factors will also be comprehensively identified. Charting the chromatin 'landscapes' will include characterizing key histone modifications and variants, nucleosome phasing and RNA polymerase II isoforms. It will also cover proteins involved in dosage compensation, centromere function, replication, homologue pairing, recombination and the association of chromosomes with the nuclear envelope.
Integrative analysis of these data across the different types of functional element will be used to reveal fundamental principles of fly and worm genome biology and to begin to uncover the emergent properties of these complex genomes. Some of the topics that the modENCODE groups, along with interested members of the wider community, intend to explore are outlined below, but these are only a beginning. Our intention is to create a resource that will provide the foundation for ongoing analysis by scientists for years to come.
Our two model organisms share many similarities with other metazoans, including humans. They also differ from other organisms in some striking ways, particularly in details of the establishment and maintenance of cellular identity, centromere biology and heterochromatin function. We want to understand how the similarities and differences in worm and fly biology are reflected in their genome sequences, and how they are specified by genome function at the molecular level. To that end, we will carry out comparative analyses of transcription, splicing, cis-regulatory and post-transcriptional elements and chromatin function. We will subsequently investigate how our findings apply to the control of gene expression in the human genome.
We also plan to use genome-wide data on pre- and post-transcriptional functional elements to expand our understanding of gene-regulatory networks. We will study how these two layers of control complement or reinforce each other during development. For example, full-length transcripts and promoter structures for microRNA (miRNA) genes will enable us to model regulatory circuits. Such models would integrate the upstream regulation of miRNA genes with that of other regulatory factors (such as transcription factors), and with the effects of miRNAs on their downstream targets.
We will search the global patterns identified in the regulatory programs for emerging principles of gene regulation within and across species; as part of this endeavour, we will evaluate evidence for the modular structure of regulatory networks. Several developmental stages and diverse tissues will be sampled in both animals. This will allow us to investigate the global and dynamic activities of functional elements across the entire genome in multiple cell types and stages of differentiation. We aim to define the characteristics and rules that distinguish regulatory programs in different cell types and developmental stages at the DNA, chromatin and post-transcriptional levels. This will enable us to identify the types of element that function together in various spatio-temporal environments. It will also help us find new types of functional element, perhaps including some that are used only in restricted developmental contexts. An important objective is to generate specific biological hypotheses that the broader scientific community can refine and test experimentally. For example, these analyses might identify transcribed regions with novel regulatory roles and structural regions that help establish chromatin structure or three-dimensional conformation. They might also reveal enhancers located far from the genes they control, and alternative promoter regions. In addition, we will use comparative analyses of the sequenced genomes from different species for two purposes: to clarify the extent of conservation and the functional constraints associated with potential new classes of element, and to characterize their evolutionary signatures[21]. Another objective of the modENCODE project is the creation of reference data sets of maximum utility. We have agreed that, whenever possible, a common set of reagents will be used to facilitate comparison of data sets generated by different groups.
For example, the fly and worm groups using ChIP-chip and related methods to map the genome-wide distributions of histone modifications will use a common set of validated antibodies. In addition, we will use common fly and worm strains and, in the case of Drosophila, the common cell lines Kc167, S2-DRSC, CME W1 Cl.8+ and ML-DmBG3-c2. The fly and worm genomes are about a thirtieth of the size of their mammalian counterparts, which makes current methods for high-throughput genomic analysis cost-effective. We will use high-density tiling DNA microarrays to interrogate each genome on a single microarray (C. elegans, 26 base pair (bp) median spacing; D. melanogaster, 38 bp median spacing) at a resolution sufficient for ChIP-chip experiments. Denser arrays (D. melanogaster, 7 bp median spacing), which promise higher resolution, will also be used. In addition, we will move to high-throughput sequencing platforms such as the Illumina Genome Analyzer to generate sufficient sequence coverage for transcript mapping and for miRNA and ChIP experiments. The biological significance of the genomic features identified will be tested in experiments designed to evaluate the accuracy and functionality of subsets of the structural and regulatory annotations. For example, we will carry out ChIP experiments on extracts from whole animals or cells that lack selected regulators (using mutants or RNAi). The tissue-specific DNA-binding patterns of selected regulators will be validated in transgenic animals. Figure 1 summarizes the DNA elements to be interrogated and the methods to be used.
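To put the single-array claim in perspective: a tiling array needs roughly one probe per median-spacing interval, so the probe count is about the genome length divided by the spacing. The Python sketch below works this through using the spacings quoted above. Some inputs are supplied here and do not come from this article: the genome sizes (about 100 Mb for C. elegans, about 120 Mb of euchromatic sequence for D. melanogaster, about 3 Gb for human), which are rounded approximations treated as assumptions, and the function name `probes_needed`, which is hypothetical. The estimate ignores repeat masking, strand and probe length, so treat it as a back-of-envelope figure rather than an array design.

```python
# Back-of-envelope probe counts for genome tiling arrays.
# ASSUMPTION: genome sizes below are rounded approximations supplied for
# illustration; they are not figures taken from the modENCODE article.
GENOME_BP = {
    "C. elegans": 100_000_000,       # ~100 Mb
    "D. melanogaster": 120_000_000,  # ~120 Mb of euchromatic sequence
}
HUMAN_BP = 3_000_000_000  # ~3 Gb haploid human genome (rounded)

# Median probe spacings quoted in the text, in base pairs.
SPACING_BP = {
    ("C. elegans", "standard"): 26,
    ("D. melanogaster", "standard"): 38,
    ("D. melanogaster", "dense"): 7,
}


def probes_needed(genome_bp: int, spacing_bp: int) -> int:
    """Hypothetical helper: one probe every `spacing_bp` bases along one strand.

    Uses ceiling division so a partial final interval still gets a probe.
    Ignores repeat masking, strand and probe length.
    """
    return -(-genome_bp // spacing_bp)


for (species, design), spacing in SPACING_BP.items():
    n = probes_needed(GENOME_BP[species], spacing)
    print(f"{species} ({design}, {spacing} bp): ~{n / 1e6:.1f} million probes")

ratio = HUMAN_BP / GENOME_BP["C. elegans"]
print(f"Human-to-worm genome size ratio: ~{ratio:.0f}")
```

With these assumed sizes, the loop reports roughly 3.8, 3.2 and 17.1 million probes for the three designs. The human-to-worm ratio comes out at about 30, consistent with the "about a thirtieth" comparison in the text. The count grows inversely with spacing, which is why the 7 bp design needs several times more probes than the 38 bp one.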
Figure 1

DNA element functions and identification process.

Data management and accessibility

Data generated by the modENCODE Consortium, including those from validation experiments, will be collected, quality checked, integrated and distributed through the modENCODE DCC (http://www.modencode.org). The DCC will collate detailed metadata for each submitted data set to ensure broad and long-term usability. Where appropriate, the data will also be submitted to public databases:
- GenBank (http://www.ncbi.nlm.nih.gov);
- the Gene Expression Omnibus (http://www.ncbi.nlm.nih.gov/geo) or ArrayExpress (http://www.ebi.ac.uk/microarray-as/aer/entry);
- the University of California, Santa Cruz Genome Bioinformatics Site (http://genome.ucsc.edu).
The DCC will also work closely with WormBase (http://www.wormbase.org) and FlyBase (http://www.flybase.org) to integrate the modENCODE data with selected data from these databases and with other information about these organisms. All data will be available for bulk download through an FTP site and through a number of Generic Model Organism Database tools (http://www.gmod.org). BioMart (http://www.biomart.org) will provide powerful data-mining capabilities. InterMine (http://www.intermine.org) will provide a flexible interface for complex querying of the data, a library of canned queries, and powerful list-based tools and operations (http://intermine.modencode.org). As with the ENCODE pilot project data (http://www.genome.gov/10005107), new data can be examined alongside existing data using interactive genome browsers[35] for both the fly (http://www.modencode.org/cgi-bin/gbrowse/fly) and the worm (http://www.modencode.org/cgi-bin/gbrowse/worm).
The Drosophila and C. elegans communities have thrived because of their open culture. modENCODE keeps with this tradition, and with those of the genome sequencing projects, HapMap and the ENCODE pilot project: it is a 'community resource project' subject to the NHGRI's data-sharing policy.
The success of this policy rests on mutual and independent responsibilities for the production and use of the resource. We will release data rapidly (Table 2), before publication, once they have been shown to be reproducible (verification; see http://www.modencode.org/'PublicationPolicylink' for the criteria). Release does not wait for the data to be tested for biological meaning (validation). In turn, users are asked to acknowledge the source of the data and to respect the legitimate interest of the resource producers in publishing an initial report of their work (see http://www.genome.gov/modencode for more details). Finally, the funding agencies recognize the need to support the analysis and dissemination of the data.
Table 2

GLOBAL ANALYSIS GOALS

In addition, a variety of physical resources (for example, DNA constructs and transgenic strains) will be produced that are likely to be useful to the broader community, and to which that community will have unrestricted access. We expect to cooperate with data users in the worm and fly communities to set the gold standard for data release and openness.

Conclusion

The Human Genome Project benefited enormously from the technology developed and the experience acquired in sequencing the significantly smaller genomes of model organisms, particularly C. elegans and D. melanogaster. The modENCODE project is dedicated to the next phase of decoding the information stored in these genomes: the comprehensive identification of sequence-based functional elements. Drosophila and Caenorhabditis laid the foundation for the discovery of many of the genetic programs underlying metazoan development and behaviour. They will now serve as ideal model systems for identifying DNA-based functional elements on a genome-wide basis. In the future, these data will provide a powerful platform for characterizing the functional networks that direct multicellular biology, thereby linking genomic data with the biological programs of higher organisms, including humans.

Supplementary information

A full list of names and addresses of current consortium participants is linked to the online version of this feature at http://tinyurl.com/modENCODE
References (35 in total; 10 shown)

1. Genome-wide RNAi analysis of growth and viability in Drosophila cells.

Authors: Michael Boutros; Amy A Kiger; Susan Armknecht; Kim Kerr; Marc Hild; Britta Koch; Stefan A Haas; Renato Paro; Norbert Perrimon
Journal: Science  Date: 2004-02-06

2. Tissue-specific gene expression and ecdysone-regulated genomic networks in Drosophila.

Authors: Tong-Ruei Li; Kevin P White
Journal: Dev Cell  Date: 2003-07

3. A gene-coexpression network for global discovery of conserved genetic modules.

Authors: Joshua M Stuart; Eran Segal; Daphne Koller; Stuart K Kim
Journal: Science  Date: 2003-08-21

4. A gene expression map for the euchromatic genome of Drosophila melanogaster.

Authors: Viktor Stolc; Zareen Gauhar; Christopher Mason; Gabor Halasz; Marinus F van Batenburg; Scott A Rifkin; Sujun Hua; Tine Herreman; Waraporn Tongprasit; Paolo Emilio Barbano; Harmen J Bussemaker; Kevin P White
Journal: Science  Date: 2004-10-22

5. The ENCODE (ENCyclopedia Of DNA Elements) Project.

Authors: ENCODE Project Consortium
Journal: Science  Date: 2004-10-22

6. Toward improving Caenorhabditis elegans phenome mapping with an ORFeome-based RNAi library.

Authors: Jean-François Rual; Julian Ceron; John Koreth; Tong Hao; Anne-Sophie Nicot; Tomoko Hirozane-Kishikawa; Jean Vandenhaute; Stuart H Orkin; David E Hill; Sander van den Heuvel; Marc Vidal
Journal: Genome Res  Date: 2004-10

7. Composition and dynamics of the Caenorhabditis elegans early embryonic transcriptome.

Authors: L Ryan Baugh; Andrew A Hill; Donna K Slonim; Eugene L Brown; Craig P Hunter
Journal: Development  Date: 2003-03

8. Genome-wide germline-enriched and sex-biased expression profiles in Caenorhabditis elegans.

Authors: Valerie Reinke; Inigo San Gil; Samuel Ward; Keith Kazmer
Journal: Development  Date: 2003-12-10

9. The genome sequence of Caenorhabditis briggsae: a platform for comparative genomics.

Authors: Lincoln D Stein; Zhirong Bao; Darin Blasiar; Thomas Blumenthal; Michael R Brent; Nansheng Chen; Asif Chinwalla; Laura Clarke; Chris Clee; Avril Coghlan; Alan Coulson; Peter D'Eustachio; David H A Fitch; Lucinda A Fulton; Robert E Fulton; Sam Griffiths-Jones; Todd W Harris; LaDeana W Hillier; Ravi Kamath; Patricia E Kuwabara; Elaine R Mardis; Marco A Marra; Tracie L Miner; Patrick Minx; James C Mullikin; Robert W Plumb; Jane Rogers; Jacqueline E Schein; Marc Sohrmann; John Spieth; Jason E Stajich; C Wei; David Willey; Richard K Wilson; Richard Durbin; Robert H Waterston
Journal: PLoS Biol  Date: 2003-11-17

10. The BDGP gene disruption project: single transposon insertions associated with 40% of Drosophila genes.

Authors: Hugo J Bellen; Robert W Levis; Guochun Liao; Yuchun He; Joseph W Carlson; Garson Tsang; Martha Evans-Holm; P Robin Hiesinger; Karen L Schulze; Gerald M Rubin; Roger A Hoskins; Allan C Spradling
Journal: Genetics  Date: 2004-06