LCGbase: A Comprehensive Database for Lineage-Based Co-regulated Genes.

Dapeng Wang, Yubin Zhang, Zhonghua Fan, Guiming Liu, Jun Yu.

Abstract

Animal genes of different lineages, such as vertebrates and arthropods, are well-organized and blended into dynamic chromosomal structures that represent a primary regulatory mechanism for body development and cellular differentiation. The majority of genes in a genome are organized in clusters, which are evolutionarily stable to different extents and biologically meaningful when evaluated among genomes within and across lineages. Until now, many questions concerning gene organization, such as what the minimal number of genes in a cluster is and what driving force leads to gene co-regulation, remain to be addressed. Here, we provide a user-friendly database, LCGbase (a comprehensive database for lineage-based co-regulated genes), hosting information on the evolutionary dynamics of gene clustering and ordering within the animal kingdom in two different lineages: vertebrates and arthropods. The database is built on a web-based Linux-Apache-MySQL-PHP framework and offers an effective interactive user-inquiry service. Compared to other gene annotation databases with similar purposes, our database has three clear advantages. First, our database is inclusive, covering all high-quality genome assemblies of vertebrates and representative arthropod species. Second, it is human-centric: we map all gene clusters from other genomes onto the human genome in order of lineage rank (such as primates, mammals, warm-blooded animals, and reptiles), and we build the database up from well-defined gene pairs (a minimal cluster in which the two adjacent genes are oriented as a co-directional, convergent, or divergent pair) to large gene clusters. Furthermore, users can search for any adjacent genes and their detailed annotations. Third, the database provides flexible parameter definitions, such as the distance between the transcription start sites of two adjacent genes, which can be extended to genes flanking the cluster across species. We also provide useful tools for sequence alignment, gene ontology (GO) annotation, promoter identification, gene expression (co-expression), and evolutionary analysis. This database not only provides a way to define lineage-specific and species-specific gene clusters but also facilitates future studies on gene co-regulation, epigenetic control of gene expression (DNA methylation and histone marks), and chromosomal structures in the context of gene clusters and species evolution. LCGbase is freely available at http://lcgbase.big.ac.cn/LCGbase.

Keywords: co-regulated genes; database; evolution; vertebrate

Year: 2011 PMID: 22267903 PMCID: PMC3256993 DOI: 10.4137/EBO.S8540

Source DB: PubMed Journal: Evol Bioinform Online ISSN: 1176-9343 Impact factor: 1.625

Introduction

Animal genomes harbor several tens of thousands of protein-coding and RNA-coding genes, and the rest are regulatory elements adjacent to genes.1 Although there are intergenic sequences that have been called “gene deserts”, it is believed that a majority of them may also be parts of genes that have not yet been discovered.2,3 It is important for the entire genome to be regulated in a timely and accurate manner through a battery of processes with distinct mechanisms. In prokaryotes (such as Escherichia coli) and lower eukaryotes (such as Caenorhabditis elegans), operons or clustered genes are major regulatory mechanisms, so that genes in consecutive order share a suite of transcription machinery and its accessories.4 However, genes in higher eukaryotes are not only greater in number but also more complex in structure than those of prokaryotes.5 For instance, there are large genes in a size range of one million basepairs or more (such as dystrophin) and numerous regulatory elements for transcriptional regulation, including enhancers, insulators, silencers, and repressors.6 In addition, epigenetic regulation, including DNA methylation, hydroxymethylation, various histone marks, and chromatin structure states, may play an essential role in the construction of a multi-layer regulatory network for gene expression.7,8

In such a complex gene regulation context, co-regulated genes are the first to be scrutinized since they are readily defined based on transcriptomic data and are either adjacent to each other or co-regulated (co-activated, co-suppressed, or antagonistic); the minimal set of co-regulated genes is a pair of genes adjacent to each other, and the maximal set is several genes that have stayed clustered together over evolutionary time scales, which may even extend to large chromosomal regions.4 For instance, some of the clustered genes may perform long-range interaction-based functions or be involved in the same regulatory or metabolic pathways.9 The precise identification of compositional and organizational features of these gene clusters may improve our knowledge of transcriptional control and RNA processing mechanisms.

Previous studies on minimal gene clustering have largely focused on genes in three basic categories of paired orientation, according to the relative transcription direction of two neighboring genes: divergently-paired genes (DPGs, positioned head-to-head and transcribed in opposite directions), co-directionally-paired genes (CDPGs, positioned head-to-tail and transcribed in the same direction), and convergently-paired genes (CPGs, positioned tail-to-tail and transcribed toward each other).10,11 It has been suggested that tandem duplication may be the major cause of these paired genes (especially CDPGs), and promoter sharing is a plausible explanation for the occurrence of DPGs.4,12 It has been reported that the proportion of DPGs is positively correlated with gene density, as DPGs tend to keep their transcription directions over relatively long evolutionary time scales (eg, in a human-to-fugu comparison).10 DPGs tend to perform similar biological functions and to be involved in housekeeping functions, as compared to CDPGs and CPGs, and the expression of DPGs is often positively correlated (with minor exceptions) at different developmental stages and under pathologic conditions.10,11 Furthermore, when comparing dynamic structural features of DPGs between vertebrates and insects, we found that all three categories of paired genes in insects are less conserved than their vertebrate counterparts, although DPGs in insects also tend to form functional clusters and to share promoters.13 As to the intergenic distance (longer in metazoa and shorter in fungi), although the distance between the transcription start sites of two co-regulated DPGs is between a few hundred and around one thousand basepairs,12 we recognize the possible function of sequences, often tens of kilo-basepairs in length, between two neighboring DPGs with respect to co-expression and shared regulatory elements.14 Furthermore, the bimodality of intergenic distances observed among mammalian gene pairs (but not in other vertebrates) suggests that mammals share certain common features in transcription regulation.11 Until now, how the length of intergenic regions affects the contiguity in regulating multiple genes remains to be illuminated.

As next-generation sequencing technology matures, both cost and throughput favor the acquisition of more basic data. In future studies, lineage-based data organization will take over from the “one-covers-all” fashion, and more tools will be developed for handling larger and more numerous genomes in addition to those for smaller and single genomes, such as those of mitochondria,15 plastids,16 and yeast.17,18 A recent study has expanded a gene order browser to 74 species but covers only four mammals.19 In this study, we curated 38 mammalian and 14 other animal genomes (using only one fungus as an outgroup) to discover and display conserved gene clusters across mammals and their sub-groups, such as primates, large mammals, and rodents. In particular, we combine two concepts: stringently defined, lineage-specific, conserved core paired genes (based on both orthology and transcriptional direction) and the gene order of the ten consecutive genes flanking the core paired genes. We also offer a series of toolkits covering GO functional annotation, promoter identification, gene expression, and evolutionary analysis to help characterize the features of gene clusters (Fig. 1).
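The three pair orientations defined in this section (DPG, CDPG, and CPG) follow directly from the strands of two neighboring genes. The following is a minimal sketch, not the LCGbase implementation; the function name `classify_pair` and the input convention (the two genes are given in ascending chromosomal order, with strands written as "+" or "-") are assumptions made for illustration.

```python
def classify_pair(left_strand: str, right_strand: str) -> str:
    """Classify two adjacent genes by relative transcription direction.

    Hypothetical convention: `left_strand` belongs to the gene with the
    smaller chromosomal coordinate, `right_strand` to the one with the larger.
    """
    for strand in (left_strand, right_strand):
        if strand not in ("+", "-"):
            raise ValueError(f"strand must be '+' or '-', got {strand!r}")

    if left_strand == right_strand:
        # ->-> or <-<- : head-to-tail, transcribed in the same direction
        return "CDPG"
    if left_strand == "-":
        # <--> : head-to-head, transcribed away from each other
        return "DPG"
    # -><- : tail-to-tail, transcribed toward each other
    return "CPG"


if __name__ == "__main__":
    for pair in [("+", "+"), ("-", "-"), ("-", "+"), ("+", "-")]:
        print(pair, classify_pair(*pair))
```

Only strand information is used here; whether a pair counts as conserved additionally depends on orthology, which this sketch does not model.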

Figure 1

A flowchart to illustrate the content and organization of LCGbase.

Using LCGbase, we would like to address several pressing questions: (1) Although mammalian gene order and genome organization are thought to be non-random among the chromosomes, what is the precise number of genes that tend to move around or to form clusters? (2) How are clustered genes conserved across various definable lineages? Are the forming-and-breaking events evolutionarily selected and functionally meaningful? What are the mechanisms, including rearrangement, translocation, inversion, recombination, duplication, and transposon-mediated episodes, that alter clustered genes? (3) Are we able to define a “core clustered set” for different lineages or subgroups? Are there identifiable chromosomal regions whose gene clusters are evolutionarily stable? (4) How are gene clusters related to nucleosome positioning and chromosome folding in the nucleus?20,21 The questions continue, but the answers will tell us what we need to know about every single gene and its position on the chromosome, not only physically but also functionally.

Functionality

There are several ways to reach the data available in LCGbase. First, one can use the browse option to navigate all annotated genes in the 53 species, and each gene can be reached through the link on its gene ID. Second, one can take advantage of gene positioning or clustering information by using a gene ID to retrieve its neighbouring genes within and across lineages. In particular, the search is strand-sensitive when used to detect strand-specific organizational features of gene clusters and their variations. The database also sorts TSS (transcription start site) distances between two adjacent genes into five roughly defined categories: 0–1 kbp, 1–10 kbp, 10–50 kbp, 50–100 kbp, and >100 kbp. It displays ten genes to the left or right of the core gene cluster and highlights all the genes on screen in different colours to indicate their orthologous groups. Furthermore, it assigns random group numbers to order all groups (Fig. 2). Genes that are not assigned to groups are labelled with “X”. Users can click on the hyperlink for each gene to check its detailed annotations (eg, location, structure, ontology, and family). Third, the result page also displays gene orders from different species according to taxonomic and lineage definitions, such as mammals (primates, rodents, afrosoricida, carnivora, chiroptera, lagomorpha), birds (galliformes and passeriformes), reptiles (squamata), amphibians (anura), fishes (beloniformes, tetraodontiformes, cypriniformes, gasterosteiformes), insects (diptera), chordata (enterogona), nematoda (rhabditida), and fungi (saccharomycetales). The information helps to reveal lineage-specific dynamic patterns or rules of gene clusters in lineage groups and sub-groups. In particular, the database provides three kinds of downloadable files (xls, csv, and html) containing the information that appears on the search result page, including species, gene ID, strand category, and group number. Fourth, we also provide counts by species, strand specificity, and orthologous gene.
Fifth, the database also provides BLAST tools22 (ie, to match cDNA sequences with blastn and protein sequences with blastp or blastx) to help users study their query sequences and associate them with data in LCGbase as well as other databases.
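The five TSS-distance categories listed above can be illustrated with a short binning routine. This is a sketch under stated assumptions rather than the database's own code: the function name `tss_distance_category` is hypothetical, and since the text calls the categories "roughly defined", the choice to treat each upper bound as inclusive is also an assumption.

```python
from bisect import bisect_left

# Upper bounds (in bp) of the first four categories; anything larger
# falls into the open-ended last category.
TSS_BOUNDS_BP = [1_000, 10_000, 50_000, 100_000]
TSS_LABELS = ["0–1 kbp", "1–10 kbp", "10–50 kbp", "50–100 kbp", ">100 kbp"]


def tss_distance_category(tss_a: int, tss_b: int) -> str:
    """Return the distance category for two transcription start sites.

    Both positions are coordinates (in bp) on the same chromosome.
    Assumption: upper bounds are inclusive, eg, exactly 1,000 bp is "0–1 kbp".
    """
    distance_bp = abs(tss_a - tss_b)
    return TSS_LABELS[bisect_left(TSS_BOUNDS_BP, distance_bp)]


if __name__ == "__main__":
    # Made-up coordinates, one pair per category.
    for a, b in [(5_000, 5_400), (5_000, 12_000), (5_000, 40_000),
                 (5_000, 80_000), (5_000, 250_000)]:
        print(a, b, tss_distance_category(a, b))
```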

Figure 2

An example of the LCGbase browser (A) and a search result (B). The inquired gene is ENSG00000171612.

Due to co-regulation, genes in a cluster may have related functions, share promoters, evolve at a similar rate or in a distinct pattern, and show significantly correlated expression. LCGbase provides several easy-to-use tools to facilitate the analysis of these features. Because the gene IDs used in this database are Ensembl gene IDs, an ID Convertor tool converts gene IDs from other systems (eg, Entrez Gene ID, Gene Symbol, RefSeq mRNA ID, and RefSeq protein ID) into Ensembl gene IDs. The GO Function Classification tool compares a query gene list with all genes in both species and GO terms (with at least 10 genes)23 and performs gene function enrichment analysis to determine whether gene clusters tend to be functionally related. This tool adopts the Fisher Exact Test from the Perl Text-NSP module (http://search.cpan.org/dist/Text-NSP/) combined with four multiple testing correction options (ie, Bonferroni correction, Bonferroni Step-down [Holm] correction, Benjamini & Hochberg False Discovery Rate, and Not adjusted).24 Four cut-off values can be chosen: 0.1, 0.05, 0.01, and 0.001. The Promoter Analysis tool compares a query nucleotide sequence with the regions upstream and downstream (from −499 bp to 100 bp, or from −9999 bp to 6000 bp) of the experimentally identified transcription start sites (TSSs) in the Eukaryotic Promoter Database (EPD), a promoter sequence collection for model organisms.25 To illustrate the co-expressed genes in a cluster, we introduced co-expression data for seven animals (human, mouse, rat, chicken, zebrafish, fly, and nematode) from COXPRESdb (Gene Coexpression Database).26 We adopted the R package “BioNet” to draw a network27 when a query gene has correlated expression with other query genes. The Evolution Analysis tool includes the KaKs_Calculator 2.0 toolkit,28 which adopts multiple algorithms and alternative codon tables to compute nonsynonymous (Ka) and synonymous (Ks) substitution rates.
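The enrichment step of the GO Function Classification tool described above can be sketched in a few lines. This is a pure-Python stand-in, not the Perl Text-NSP code the database uses; the function names are hypothetical, and the choice of a one-sided (over-representation) tail for the Fisher exact test is an assumption, since the text does not say which tail is reported.

```python
from math import comb
from typing import List


def fisher_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided Fisher exact p-value for over-representation.

    k: query genes annotated with the GO term
    n: query genes in total
    K: background genes annotated with the GO term
    N: background genes in total
    """
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


def bonferroni(pvals: List[float]) -> List[float]:
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def holm(pvals: List[float]) -> List[float]:
    """Bonferroni step-down: scale the i-th smallest p by (m - i), keep monotone."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        running_max = max(running_max, (m - rank) * pvals[idx])
        adjusted[idx] = min(1.0, running_max)
    return adjusted


def benjamini_hochberg(pvals: List[float]) -> List[float]:
    """FDR adjustment: scale the i-th smallest p by m / i, keep monotone."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m - 1, -1, -1):
        idx = order[rank]
        running_min = min(running_min, pvals[idx] * m / (rank + 1))
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    # Made-up counts for three GO terms: (k, n, K, N)
    terms = [(4, 10, 20, 1000), (1, 10, 200, 1000), (2, 10, 50, 1000)]
    raw = [fisher_enrichment_p(*t) for t in terms]
    print("raw       ", raw)
    print("bonferroni", bonferroni(raw))
    print("holm      ", holm(raw))
    print("BH        ", benjamini_hochberg(raw))
```

A term would then be reported as enriched when its adjusted p-value falls below the chosen cut-off (0.1, 0.05, 0.01, or 0.001).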
The ratio of Ka to Ks is a popular statistical measure of selection on one or multiple pairs of protein-coding genes, and one may want to know whether several genes in a cluster evolve simultaneously. In the statistics section, we draw two types of figures to describe the TSS distance and the minimal distance for three cluster classes: CDPGs, CPGs, and DPGs. Minimal distance is defined as (1) the difference between the 5′-end of the downstream transcript and the 3′-end of the upstream transcript for CDPGs, (2) the difference between the 3′-end of the downstream transcript and the 3′-end of the upstream transcript for CPGs, and (3) the difference between the 5′-end of the downstream transcript and the 5′-end of the upstream transcript for DPGs. On the download page, we also provide the characterized features of gene pairs (“->->”, “-><-”, and “<-->” represent CDPGs, CPGs, and DPGs, respectively), including gene pair ID, order class, TSS distance, minimal distance, chromosome, gene ID, transcript ID, protein ID, and strand, as well as the transcription start site and transcription end site of both genes.

LCA5L (ENSG00000157578, Leber congenital amaurosis 5-like) and SH3BGR (ENSG00000185437, SH3 domain binding glutamic acid-rich protein) form a DPG on human chromosome 21. Although most of the distances between the transcription start sites of the paired genes range from 1 kb to 100 kb, this gene pair is conserved in all vertebrate lineages across mammals, birds, reptiles, amphibians, and fishes (Fig. S1). When comparing the genes among fish and bird lineages, we found that PSMG1, BRWD1, HMGN1, WRB, LCA5L, SH3BGR, and B3GALT5 cluster tightly in three bird species (chicken, turkey, and zebra finch) and that MTMR9, XKR6, CCM2, FAM167A, LCA5, and SH3BGRL2 cluster in medaka and stickleback. The different clustering suggests potential differences in regulatory mechanisms between the two vertebrate lineages.
TUB (ENSG00000166402, tubby) and RIC3 (ENSG00000166405, resistance to inhibitors of cholinesterase 3) are transcribed in opposite directions and form a CPG on human chromosome 11 (Fig. S2). All distances between the transcription start sites of the two genes are larger than 10 kbp. When trying to expand the pair into more distant gene clusters, we found three clearly distinct patterns in three taxonomic groups: TUB, RIC3, LMO1 and STK33 in mammals (such as human, chimpanzee, mouse, lemur, macaque, marmoset, galago, gibbon, guinea pig, mouse, cow, dog, giant panda, bottle-nose dolphin, rabbit, horse, elephant, and opossum), CYP2R1, PDE3B, COPB1, RRAS2, TUB, and RIC3 in fishes (such as medaka, zebrafish, and stickleback), and INSC, CALCA, CYP2R1, PDE3B, PSMA1, COPB1, RRAS2, TUB, RIC3 and LMO1 in birds (such as chicken, turkey, and zebra finch). PAPD5 (ENSG00000121274, PAP associated domain containing 5) and ADCY7 (ENSG00000121281, adenylate cyclase 7) form a CDPG on human chromosome 16 (Fig. S3). We found some interesting duplication events that happened during the long-term evolution of paired genes throughout the vertebrate lineages. There are two patterns in the gene clusters: one contains NSUN2, SRD5A1, PAPD7, and ADCY2, and the other contains PAPD5, ADCY7, BRD7, and NKD1. We found that the two clusters appear on different chromosomes in several species (eg, chimpanzee, orangutan, macaque, gibbon, turkey, fugu, and zebrafish). This phenomenon suggests that the duplication and rearrangement events forming these clusters happened very early in vertebrate evolution (perhaps at the origin of vertebrates). Moreover, we have observed several species-specific gene insertion or deletion events. For instance, SRD5A1 was lost between NSUN2 and PAPD7 on anole chromosome 4, and A530095I07Rik was gained between SRD5A1 and PAPD7 on mouse chromosome 13.
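The minimal-distance definitions given earlier in this section for CDPGs, CPGs, and DPGs can be expressed compactly once each transcript's 5′ and 3′ ends are derived from its strand. The sketch below is illustrative only: the `Transcript` class and `minimal_distance` function are hypothetical names, "upstream" and "downstream" for a CDPG are taken relative to the shared transcription direction, and the absolute value is an assumption made so that the result does not depend on which strand the pair lies on (it also means overlapping transcripts are not reported as negative distances).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transcript:
    start: int   # smaller genomic coordinate
    end: int     # larger genomic coordinate
    strand: str  # "+" or "-"

    @property
    def five_prime(self) -> int:
        # Transcription starts at `start` on "+" and at `end` on "-".
        return self.start if self.strand == "+" else self.end

    @property
    def three_prime(self) -> int:
        return self.end if self.strand == "+" else self.start


def minimal_distance(upstream: Transcript, downstream: Transcript,
                     pair_class: str) -> int:
    """Distance between the facing ends of two paired transcripts."""
    if pair_class == "CDPG":   # 5' end of downstream vs 3' end of upstream
        return abs(downstream.five_prime - upstream.three_prime)
    if pair_class == "CPG":    # 3' end vs 3' end
        return abs(downstream.three_prime - upstream.three_prime)
    if pair_class == "DPG":    # 5' end vs 5' end
        return abs(downstream.five_prime - upstream.five_prime)
    raise ValueError(f"unknown pair class: {pair_class!r}")


if __name__ == "__main__":
    # Made-up coordinates for a divergent pair (<-->).
    left = Transcript(100, 200, "-")
    right = Transcript(500, 900, "+")
    print(minimal_distance(left, right, "DPG"))
```

For a DPG this value coincides with the TSS distance, because both facing ends are transcription start sites.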

Data Collection

We collected positions of genes, transcripts, and proteins as well as other annotation information (eg, Gene Ontology and gene family classification) for 53 species across broad lineages (including vertebrates, insects, nematode, and fungi) from Ensembl/BioMart version 62 (www.ensembl.org).29 We selected only the transcript with the longest coding sequence to represent each gene or gene locus. Gene orthology relationships were also retrieved from this database, and we defined orthologs between human and the other 52 species as well as paralogs within human. In detail, we assumed that there is a transitive relationship among homologs, so we combined paired homologs into groups until the number of groups became stable (converged). Based on this evolutionary principle and the phylogenetic relationships, we classified all genes into homologous groups.
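The transitive grouping of homologs described above amounts to finding connected components in a graph whose edges are homolog pairs. The sketch below uses a union-find (disjoint-set) structure, which is a substitute technique chosen for brevity; the text describes iterative merging until the group count stabilizes, and both approaches yield the same groups. The function name `group_homologs` and the gene identifiers in the example are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def group_homologs(pairs: Iterable[Tuple[str, str]]) -> List[List[str]]:
    """Merge pairwise homologs into groups, assuming homology is transitive."""
    parent: Dict[str, str] = {}

    def find(gene: str) -> str:
        parent.setdefault(gene, gene)
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]  # path halving
            gene = parent[gene]
        return gene

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # union the two groups

    groups: Dict[str, List[str]] = {}
    for gene in list(parent):
        groups.setdefault(find(gene), []).append(gene)
    return sorted(sorted(members) for members in groups.values())


if __name__ == "__main__":
    # Hypothetical IDs: a human gene, its mouse and rat orthologs, a human
    # paralog, and an unrelated human/mouse ortholog pair.
    pairs = [("human_A", "mouse_A"), ("mouse_A", "rat_A"),
             ("human_A", "human_A2"), ("human_B", "mouse_B")]
    for group in group_homologs(pairs):
        print(group)
```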

Implementation

This database is built on a GNU/Linux web-database LAMP framework (OS: Linux; web server: Apache; database management system: MySQL; server-side scripting language: PHP). On the server side, PHP calls Perl scripts and R functions, and uses the GD module through its API (application programming interface) to generate 2D graphs. On the browser side, we use HTML, JavaScript, and CSS to give users a convenient interface. We also chose SQL scripts and an appropriate storage engine for MySQL to optimize database performance, with three heavily loaded record tables (gene, orthologous group, and gene annotation) built from the information of the 53 species. To speed up searches and time-consuming tasks, we created full-text indexes for key fields in the database and optimized both query matching in MySQL and the SQL statements themselves.

Future Work

First, we plan to update the database whenever new species are sequenced and new assemblies are released. We will focus on insect or arthropod genomes for comparative analysis with vertebrate genomes. Furthermore, with the I5K initiative (to sequence 5,000 insect genomes in the next five years), a large number of insect genomes may soon be available. Our preliminary analysis of the two dozen or so sequenced plant genomes also revealed clustering features, but due to the lack of contiguity within the genome assemblies, we are not able to include the data in our database at present. In the future, however, we will bring plant genomes into the database to study gene clustering/ordering and distinct gene organizational parameters, such as large genes with small intergenic regions in animals and small genes with larger intergenic regions in plants.30 We will also curate new annotations as they are published, including regulatory elements and new genes, such as those from ENCODE (The Encyclopedia of DNA Elements) and similar projects.31,32 Second, we will increase the complexity of our curation. For instance, our current organization of genes and their clusters is basically linear. We should be able to incorporate chromosomal structures and organizational information in a spatiotemporal fashion, such as early- and late-replicated/transcribed genes. We should also be able to map nucleosome positioning and packaging information.33 Third, we can extend the concept of “co-expression” or “co-regulation” beyond genes within a cluster to neighboring clusters and to clusters across chromosomes and chromosome regions (such as subtelomeric and subcentromeric regions). These new additions will lead to a network of genes and their relationships, a path toward systems biology. Finally, we hope to reveal regulatory mechanisms and their related genes that control lineage-specific or species-specific characteristics over evolutionary time scales.
Supplementary figures S1, S2, and S3 are available from 8540 Supplementary Files.zip

30 in total

1. eGOB: eukaryotic Gene Order Browser.

Authors: Marcela Dávila López; Tore Samuelsson
Journal: Bioinformatics Date: 2011-02-10 Impact factor: 6.937

2. Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project.

Authors: Ewan Birney; John A Stamatoyannopoulos; Anindya Dutta; Roderic Guigó; Thomas R Gingeras; Elliott H Margulies; Zhiping Weng; Michael Snyder; Emmanouil T Dermitzakis; Robert E Thurman; Michael S Kuehn; Christopher M Taylor; Shane Neph; Christoph M Koch; Saurabh Asthana; Ankit Malhotra; Ivan Adzhubei; Jason A Greenbaum; Robert M Andrews; Paul Flicek; Patrick J Boyle; Hua Cao; Nigel P Carter; Gayle K Clelland; Sean Davis; Nathan Day; Pawandeep Dhami; Shane C Dillon; Michael O Dorschner; Heike Fiegler; Paul G Giresi; Jeff Goldy; Michael Hawrylycz; Andrew Haydock; Richard Humbert; Keith D James; Brett E Johnson; Ericka M Johnson; Tristan T Frum; Elizabeth R Rosenzweig; Neerja Karnani; Kirsten Lee; Gregory C Lefebvre; Patrick A Navas; Fidencio Neri; Stephen C J Parker; Peter J Sabo; Richard Sandstrom; Anthony Shafer; David Vetrie; Molly Weaver; Sarah Wilcox; Man Yu; Francis S Collins; Job Dekker; Jason D Lieb; Thomas D Tullius; Gregory E Crawford; Shamil Sunyaev; William S Noble; Ian Dunham; France Denoeud; Alexandre Reymond; Philipp Kapranov; Joel Rozowsky; Deyou Zheng; Robert Castelo; Adam Frankish; Jennifer Harrow; Srinka Ghosh; Albin Sandelin; Ivo L Hofacker; Robert Baertsch; Damian Keefe; Sujit Dike; Jill Cheng; Heather A Hirsch; Edward A Sekinger; Julien Lagarde; Josep F Abril; Atif Shahab; Christoph Flamm; Claudia Fried; Jörg Hackermüller; Jana Hertel; Manja Lindemeyer; Kristin Missal; Andrea Tanzer; Stefan Washietl; Jan Korbel; Olof Emanuelsson; Jakob S Pedersen; Nancy Holroyd; Ruth Taylor; David Swarbreck; Nicholas Matthews; Mark C Dickson; Daryl J Thomas; Matthew T Weirauch; James Gilbert; Jorg Drenkow; Ian Bell; XiaoDong Zhao; K G Srinivasan; Wing-Kin Sung; Hong Sain Ooi; Kuo Ping Chiu; Sylvain Foissac; Tyler Alioto; Michael Brent; Lior Pachter; Michael L Tress; Alfonso Valencia; Siew Woh Choo; Chiou Yu Choo; Catherine Ucla; Caroline Manzano; Carine Wyss; Evelyn Cheung; Taane G Clark; James B Brown; Madhavan Ganesh; Sandeep Patel; Hari Tammana; 
Jacqueline Chrast; Charlotte N Henrichsen; Chikatoshi Kai; Jun Kawai; Ugrappa Nagalakshmi; Jiaqian Wu; Zheng Lian; Jin Lian; Peter Newburger; Xueqing Zhang; Peter Bickel; John S Mattick; Piero Carninci; Yoshihide Hayashizaki; Sherman Weissman; Tim Hubbard; Richard M Myers; Jane Rogers; Peter F Stadler; Todd M Lowe; Chia-Lin Wei; Yijun Ruan; Kevin Struhl; Mark Gerstein; Stylianos E Antonarakis; Yutao Fu; Eric D Green; Ulaş Karaöz; Adam Siepel; James Taylor; Laura A Liefer; Kris A Wetterstrand; Peter J Good; Elise A Feingold; Mark S Guyer; Gregory M Cooper; George Asimenos; Colin N Dewey; Minmei Hou; Sergey Nikolaev; Juan I Montoya-Burgos; Ari Löytynoja; Simon Whelan; Fabio Pardi; Tim Massingham; Haiyan Huang; Nancy R Zhang; Ian Holmes; James C Mullikin; Abel Ureta-Vidal; Benedict Paten; Michael Seringhaus; Deanna Church; Kate Rosenbloom; W James Kent; Eric A Stone; Serafim Batzoglou; Nick Goldman; Ross C Hardison; David Haussler; Webb Miller; Arend Sidow; Nathan D Trinklein; Zhengdong D Zhang; Leah Barrera; Rhona Stuart; David C King; Adam Ameur; Stefan Enroth; Mark C Bieda; Jonghwan Kim; Akshay A Bhinge; Nan Jiang; Jun Liu; Fei Yao; Vinsensius B Vega; Charlie W H Lee; Patrick Ng; Atif Shahab; Annie Yang; Zarmik Moqtaderi; Zhou Zhu; Xiaoqin Xu; Sharon Squazzo; Matthew J Oberley; David Inman; Michael A Singer; Todd A Richmond; Kyle J Munn; Alvaro Rada-Iglesias; Ola Wallerman; Jan Komorowski; Joanna C Fowler; Phillippe Couttet; Alexander W Bruce; Oliver M Dovey; Peter D Ellis; Cordelia F Langford; David A Nix; Ghia Euskirchen; Stephen Hartman; Alexander E Urban; Peter Kraus; Sara Van Calcar; Nate Heintzman; Tae Hoon Kim; Kun Wang; Chunxu Qu; Gary Hon; Rosa Luna; Christopher K Glass; M Geoff Rosenfeld; Shelley Force Aldred; Sara J Cooper; Anason Halees; Jane M Lin; Hennady P Shulha; Xiaoling Zhang; Mousheng Xu; Jaafar N S Haidar; Yong Yu; Yijun Ruan; Vishwanath R Iyer; Roland D Green; Claes Wadelius; Peggy J Farnham; Bing Ren; Rachel A Harte; Angie S Hinrichs; Heather 
Trumbower; Hiram Clawson; Jennifer Hillman-Jackson; Ann S Zweig; Kayla Smith; Archana Thakkapallayil; Galt Barber; Robert M Kuhn; Donna Karolchik; Lluis Armengol; Christine P Bird; Paul I W de Bakker; Andrew D Kern; Nuria Lopez-Bigas; Joel D Martin; Barbara E Stranger; Abigail Woodroffe; Eugene Davydov; Antigone Dimas; Eduardo Eyras; Ingileif B Hallgrímsdóttir; Julian Huppert; Michael C Zody; Gonçalo R Abecasis; Xavier Estivill; Gerard G Bouffard; Xiaobin Guan; Nancy F Hansen; Jacquelyn R Idol; Valerie V B Maduro; Baishali Maskeri; Jennifer C McDowell; Morgan Park; Pamela J Thomas; Alice C Young; Robert W Blakesley; Donna M Muzny; Erica Sodergren; David A Wheeler; Kim C Worley; Huaiyang Jiang; George M Weinstock; Richard A Gibbs; Tina Graves; Robert Fulton; Elaine R Mardis; Richard K Wilson; Michele Clamp; James Cuff; Sante Gnerre; David B Jaffe; Jean L Chang; Kerstin Lindblad-Toh; Eric S Lander; Maxim Koriabine; Mikhail Nefedov; Kazutoyo Osoegawa; Yuko Yoshinaga; Baoli Zhu; Pieter J de Jong
Journal: Nature Date: 2007-06-14 Impact factor: 49.962

Review 3. Nucleosome positioning and gene regulation: advances through genomics.

Authors: Cizhong Jiang; B Franklin Pugh
Journal: Nat Rev Genet Date: 2009-03 Impact factor: 53.242

4. Epigenetic regulation of the neural transcriptome: the meaning of the marks.

Authors: Michael J Meaney; Anne C Ferguson-Smith
Journal: Nat Neurosci Date: 2010-11 Impact factor: 24.884

Review 5. Epigenetic reprogramming in plant and animal development.

Authors: Suhua Feng; Steven E Jacobsen; Wolf Reik
Journal: Science Date: 2010-10-29 Impact factor: 47.728

6. Ensembl 2011.

Authors: Paul Flicek; M Ridwan Amode; Daniel Barrell; Kathryn Beal; Simon Brent; Yuan Chen; Peter Clapham; Guy Coates; Susan Fairley; Stephen Fitzgerald; Leo Gordon; Maurice Hendrix; Thibaut Hourlier; Nathan Johnson; Andreas Kähäri; Damian Keefe; Stephen Keenan; Rhoda Kinsella; Felix Kokocinski; Eugene Kulesha; Pontus Larsson; Ian Longden; William McLaren; Bert Overduin; Bethan Pritchard; Harpreet Singh Riat; Daniel Rios; Graham R S Ritchie; Magali Ruffier; Michael Schuster; Daniel Sobral; Giulietta Spudich; Y Amy Tang; Stephen Trevanion; Jana Vandrovcova; Albert J Vilella; Simon White; Steven P Wilder; Amonida Zadissa; Jorge Zamora; Bronwen L Aken; Ewan Birney; Fiona Cunningham; Ian Dunham; Richard Durbin; Xosé M Fernández-Suarez; Javier Herrero; Tim J P Hubbard; Anne Parker; Glenn Proctor; Jan Vogel; Stephen M J Searle
Journal: Nucleic Acids Res Date: 2010-11-02 Impact factor: 16.971

6. The Rice Genome Knowledgebase (RGKbase): an annotation database for rice comparative genomics and evolutionary biology.

Authors: Dapeng Wang; Yan Xia; Xinna Li; Lixia Hou; Jun Yu
Journal: Nucleic Acids Res Date: 2012-11-28 Impact factor: 16.971

6 in total

7. COXPRESdb: a database to compare gene coexpression in seven model animals.

8. EPD in its twentieth year: towards complete promoter coverage of selected model organisms.

9. KaKs_Calculator 2.0: a toolkit incorporating gamma-series methods and sliding window strategies.

10. A comparative analysis of divergently-paired genes (DPGs) among Drosophila and vertebrate genomes.
