Feng Yue1, Yong Cheng2, Alessandra Breschi3, Jeff Vierstra4, Weisheng Wu5, Tyrone Ryba6, Richard Sandstrom4, Zhihai Ma2, Carrie Davis7, Benjamin D Pope6, Yin Shen8, Dmitri D Pervouchine3, Sarah Djebali3, Robert E Thurman4, Rajinder Kaul4, Eric Rynes4, Anthony Kirilusha9, Georgi K Marinov9, Brian A Williams9, Diane Trout9, Henry Amrhein9, Katherine Fisher-Aylor9, Igor Antoshechkin9, Gilberto DeSalvo9, Lei-Hoon See7, Meagan Fastuca7, Jorg Drenkow7, Chris Zaleski7, Alex Dobin7, Pablo Prieto3, Julien Lagarde3, Giovanni Bussotti3, Andrea Tanzer10, Olgert Denas11, Kanwei Li11, M A Bender12, Miaohua Zhang13, Rachel Byron13, Mark T Groudine14, David McCleary8, Long Pham8, Zhen Ye8, Samantha Kuan8, Lee Edsall8, Yi-Chieh Wu15, Matthew D Rasmussen15, Mukul S Bansal15, Manolis Kellis16, Cheryl A Keller5, Christapher S Morrissey5, Tejaswini Mishra5, Deepti Jain5, Nergiz Dogan5, Robert S Harris5, Philip Cayting2, Trupti Kawli2, Alan P Boyle2, Ghia Euskirchen2, Anshul Kundaje2, Shin Lin2, Yiing Lin2, Camden Jansen17, Venkat S Malladi2, Melissa S Cline18, Drew T Erickson2, Vanessa M Kirkup18, Katrina Learned18, Cricket A Sloan2, Kate R Rosenbloom18, Beatriz Lacerda de Sousa19, Kathryn Beal20, Miguel Pignatelli20, Paul Flicek20, Jin Lian21, Tamer Kahveci22, Dongwon Lee23, W James Kent18, Miguel Ramalho Santos19, Javier Herrero24, Cedric Notredame3, Audra Johnson4, Shinny Vong4, Kristen Lee4, Daniel Bates4, Fidencio Neri4, Morgan Diegel4, Theresa Canfield4, Peter J Sabo4, Matthew S Wilken25, Thomas A Reh25, Erika Giste4, Anthony Shafer4, Tanya Kutyavin4, Eric Haugen4, Douglas Dunn4, Alex P Reynolds4, Shane Neph4, Richard Humbert4, R Scott Hansen4, Marella De Bruijn26, Licia Selleri27, Alexander Rudensky28, Steven Josefowicz28, Robert Samstein28, Evan E Eichler4, Stuart H Orkin29, Dana Levasseur30, Thalia Papayannopoulou31, Kai-Hsin Chang30, Arthur Skoultchi32, Srikanta Gosh32, Christine Disteche33, Piper Treuting34, Yanli Wang35, Mitchell J Weiss36, Gerd A Blobel37, Xiaoyi Cao38, 
Sheng Zhong38, Ting Wang39, Peter J Good40, Rebecca F Lowdon40, Leslie B Adams40, Xiao-Qiao Zhou40, Michael J Pazin40, Elise A Feingold40, Barbara Wold9, James Taylor11, Ali Mortazavi17, Sherman M Weissman21, John A Stamatoyannopoulos4, Michael P Snyder2, Roderic Guigo3, Thomas R Gingeras7, David M Gilbert6, Ross C Hardison5, Michael A Beer23, Bing Ren8.
Abstract
The laboratory mouse shares the majority of its protein-coding genes with humans, making it the premier model organism in biomedical research, yet the two mammals differ in significant ways. To gain greater insights into both shared and species-specific transcriptional and cellular regulatory programs in the mouse, the Mouse ENCODE Consortium has mapped transcription, DNase I hypersensitivity, transcription factor binding, chromatin modifications and replication domains throughout the mouse genome in diverse cell and tissue types. By comparing with the human genome, we not only confirm substantial conservation in the newly annotated potential functional sequences, but also find a large degree of divergence of sequences involved in transcriptional regulation, chromatin state and higher order chromatin organization. Our results illuminate the wide range of evolutionary forces acting on genes and their regulatory regions, and provide a general resource for research into mammalian biology and mechanisms of human diseases.
Year: 2014 PMID: 25409824 PMCID: PMC4266106 DOI: 10.1038/nature13992
Source DB: PubMed Journal: Nature ISSN: 0028-0836 Impact factor: 49.962
Figure 1. Overview of the mouse ENCODE data sets.
a, A genome browser snapshot shows the primary data and annotated sequence features in the mouse CH12 cells (Methods). b, Chart shows that much of the human and mouse genomes is transcribed in one or more cell and tissue samples. c, A bar chart shows the percentages of the mouse genome annotated as various types of cis-regulatory elements (Methods). DHS, DNase hypersensitive sites; TF, transcription factor. d, Pie charts show the fraction of the entire genome that is covered by each of the seven states in the mouse embryonic stem cells (mESC) and adult heart. e, Charts showing the number of replication timing (RT) boundaries in specific mouse and human cell types, and the total number of boundaries from all cell types combined. ESC, embryonic stem cell; endomeso, endomesoderm; NPC, neural precursor; GM06990, B lymphocyte; HeLa-S3, cervical carcinoma; IMR90, fetal lung fibroblast; EPL, early primitive ectoderm-like cell; EBM6/EpiSC, epiblast stem cell; piPSC, partially induced pluripotent stem cell; MEF, mouse embryonic fibroblast; MEL, murine erythroleukemia; CH12, B-cell lymphoma.
A seven-state chromHMM model learned from four histone modifications in 15 mouse cell types or lines and six human cell lines is shown
The numbers represent the emission probabilities of each histone modification (column) in each chromatin state (row). The enriched histone modifications in each state are summarized in the first column. The fraction of genome assigned in each state was calculated (Supplementary Fig. 2). The average and variation of these fraction values across all included cell types/tissues are listed in the last two columns.
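The last two columns described above can be illustrated with a minimal sketch. It assumes each cell type's segmentation is available as half-open `(start, end, state)` intervals and that "variation" means the population standard deviation; both are assumptions for illustration, as are the function names `state_fractions` and `summarize` and the toy state labels, none of which come from the paper.

```python
from collections import defaultdict
from statistics import mean, pstdev


def state_fractions(segments, genome_size):
    """Fraction of the genome assigned to each chromatin state.

    segments: iterable of (start, end, state) half-open intervals
    (a hypothetical input layout, not the consortium's file format).
    """
    covered = defaultdict(int)
    for start, end, state in segments:
        covered[state] += end - start
    return {state: bp / genome_size for state, bp in covered.items()}


def summarize(per_cell_type, states):
    """Average and spread of each state's genome fraction across cell types.

    The spread is the population standard deviation, which is an assumption;
    the caption only says "variation".
    """
    summary = {}
    for state in states:
        values = [fractions.get(state, 0.0) for fractions in per_cell_type.values()]
        summary[state] = (mean(values), pstdev(values))
    return summary


# Toy data: two cell types on a 1,000-bp "genome".
toy = {
    "cellA": state_fractions([(0, 100, "promoter"), (100, 1000, "unmarked")], 1000),
    "cellB": state_fractions([(0, 300, "promoter"), (300, 1000, "unmarked")], 1000),
}
summary = summarize(toy, ["promoter", "unmarked"])
```

In the toy data the hypothetical "promoter" state covers 10% and 30% of the genome in the two cell types, so its average fraction is 0.2.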
Figure 2. Comparative analysis of the gene expression programs in human and mouse samples.
a, Principal component analysis (PCA) was performed for RNA-seq data for 10 human and mouse matching tissues. The expression values are normalized across the entire data set. Solid squares denote human tissues. Open squares denote mouse tissues. Each category of tissue is represented by a different colour. b, Gene expression variance decomposition (see Methods) estimates the relative contribution of tissue and species to the observed variance in gene expression for each orthologous human–mouse gene pair. Green dots indicate genes with higher between-tissue contribution and red dots genes with higher between-species contributions. c, Neighbourhood analysis of conserved co-expression (NACC) in human and mouse samples. The distribution of NACC scores for each gene is shown. d, A scatter plot shows the average of NACC score over the set of genes in each functional gene ontology category. Highlighted are those biological processes that tend to be more conserved between human and mouse and those processes that have been less conserved (see Supplementary Table 21 for list of genes).
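The idea behind panel b can be sketched with a simple two-way sum-of-squares split for a single orthologous gene pair. This is only an illustration under stated assumptions: the paper's Methods define the actual decomposition, and the function name `variance_fractions`, the nested-dictionary input and the toy expression values are all hypothetical.

```python
def variance_fractions(expr):
    """Split the expression variance of one orthologous gene pair.

    expr[species][tissue] -> expression value (hypothetical layout with one
    value per species-tissue combination).
    Returns (tissue_fraction, species_fraction) of the total sum of squares.
    """
    species = sorted(expr)
    tissues = sorted(expr[species[0]])
    values = [expr[s][t] for s in species for t in tissues]
    grand_mean = sum(values) / len(values)
    total_ss = sum((v - grand_mean) ** 2 for v in values)
    if total_ss == 0:
        return 0.0, 0.0

    # Mean expression per species (over tissues) and per tissue (over species).
    species_means = [sum(expr[s][t] for t in tissues) / len(tissues) for s in species]
    tissue_means = [sum(expr[s][t] for s in species) / len(species) for t in tissues]

    species_ss = len(tissues) * sum((m - grand_mean) ** 2 for m in species_means)
    tissue_ss = len(species) * sum((m - grand_mean) ** 2 for m in tissue_means)
    return tissue_ss / total_ss, species_ss / total_ss


# Toy gene whose expression differs far more between tissues than between species.
gene = {
    "human": {"brain": 10.0, "liver": 2.0},
    "mouse": {"brain": 11.0, "liver": 3.0},
}
tissue_fraction, species_fraction = variance_fractions(gene)
```

A gene like the toy one, with a much larger tissue fraction than species fraction, would correspond to a green dot in the panel; the reverse pattern would correspond to a red dot.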
Extended Data Figure 1. Clustering analysis of human and mouse tissue samples.
a, RNA-seq data from Illumina Body Map (adipose, adrenal, brain, colon, heart, kidney, liver, lung, ovary and testis) were analysed together with those from the matched mouse samples using clustering analysis. Genes with high variance across tissues were used, resulting in samples clustering by tissue, not by species. b, Clustering with genes that have high variance between species groups samples by species instead of by tissue. c, Principal component analysis (PCA) was performed for RNA-seq data for 10 human and mouse matching tissues. The expression values were normalized within each species, and samples clustered by tissue type.
Figure 3. Comparative analysis of the cis-elements predicted in the human and mouse genome.
a, Chart shows the fractions of the predicted mouse cis-regulatory elements with homologous sequences in the human genome (Methods). TFBS, transcription factor binding site. b, A bar chart shows the fraction of the DNA fragments tested positive in the reporter assays performed either using mouse embryonic stem cells (mESCs) or mouse embryonic fibroblasts (MEF). c, A chart shows the gene ontology (GO) categories enriched near the predicted mouse-specific enhancers. d, A bar chart shows the percentage of the predicted mouse-specific enhancers containing various subclasses of LTR and SINE elements. As control, the predicted mouse cis elements with homologous sequences in the human genome or random genomic regions are included.
Extended Data Figure 2. Comparative analysis of sequence conservation in the cis elements predicted in the human and mouse genome.
a, The predicted mouse-specific promoters and enhancers can function in human embryonic stem cells (hESCs). Percentages of predicted enhancers or promoters that test positive are shown in a bar chart. b, A bar chart shows the percentage of the predicted mouse-specific promoters containing various subclasses of LTR and SINE elements. As control, the predicted mouse cis elements with homologous sequences in the human genome or random genomic regions are included.
Figure 4. Analysis of conservation in biochemical activities at the predicted mouse cis-regulatory sequences with human orthologues.
a, b, Histograms show the distribution of the NACC score for the chromatin modification H3K27ac signal at the predicted mouse promoters (a) or enhancers (b). c, d, Histograms show the distributions of NACC scores for DNase I signal at the promoter proximal (c) and distal (d) DNase I hypersensitive sites (DHS).
Figure 5. Chromatin landscape is stable within individual cell lineages.
a, Map displaying the distribution of chromatin states over the neighbourhoods of human–mouse one-to-one orthologue genes in CH12 cells. The gene neighbourhood intervals were sorted by the transcription level of each gene, shown by white dots. TSS, transcription start site. b, c, Distribution of chromatin states in human–mouse one-to-one orthologues that are differentially expressed between erythroid progenitor and erythroblast models (b) and between erythroblast and megakaryocyte models (c).
Figure 6. Human GWAS hits when mapped onto mouse genome are associated with specific chromatin states.
a, A self-organizing map of histone modification H3K4me1 shows association between kidney H3K4me1 state and specific GWAS hits associated with urate levels (Methods). b, Liver-specific H3K36me3 unit shows enrichment in GWAS hits related to cholesterol, alcohol dependence and triglyceride levels. c, Brain-specific H3K27me3 high unit shows enrichment in GWAS SNPs associated with neurological disorders. d, Characterization of every unit with statistically significant GWAS enrichments in terms of highest histone modification signal in at least one sample. Units with no signal in top 100 map units for every histone modification are listed as none. RPKM, reads per kilobase per million reads mapped.
Self-organizing map of histone modifications shows enrichment of human GWAS SNPs when mapped onto mouse
a, Kidney-specific H3K4me1 shows enrichment of specific GWAS hits associated with urate levels and metabolites. b, Liver-specific H3K36me3 unit shows enrichment in GWAS hits related to cholesterol, alcohol dependence and triglyceride levels. c, Brain-specific H3K27me3 signals show enrichment in GWAS SNPs associated with neurological disorders.
Figure 7. Replication timing boundaries preserved among tissues are conserved in mice and humans.
a, Depiction of a timing transition region (TTR) between the early and late replication domains. Early and late boundaries are defined as slope changes at either end of TTRs. b, Boundaries conserved between species for matched mouse and human cell types as a function of preservation among mouse cell types. c, Percentage of boundaries conserved between species (bar graph) and overall conservation of boundaries between comparable mouse and human cell types (CH12 versus GM06990, mESC versus hESC, mouse epiblast stem cells (mEpiSC) versus hESC) as a function of preservation among mouse cell types. d, A Venn diagram compares the replication timing boundaries identified in the mouse and human genome.
Extended Data Figure 3. Replication timing boundaries preserved among tissues are conserved during evolution.
a, Heat map of TTR overlap with positive (yellow) or negative (blue) slope. Replication timing (RT) boundaries were identified as clustered TTR endpoints (grey) above the 95th percentile (dashed line) of randomly resampled positions (black). b, Examples of constitutive boundaries (blue regions) and regulated boundaries (grey regions) highlighted. c, Spearman correlations between differences in chromatin feature enrichment and differences in RT in non-overlapping 200-kb windows. d, Percentage of boundaries preserved between the indicated number of human cell types. e, f, Distribution of boundary replication timing in mouse (e) and human (f) as a function of preservation level between cell types. g, Comparison of changes in replication timing versus various histone marks across a segment of mouse chromosome 6.
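The Spearman correlation used in panel c is the Pearson correlation of the ranked values. The sketch below shows the calculation on per-window differences; the input lists are invented toy numbers, and the helper names `average_ranks`, `pearson` and `spearman` are illustrative rather than taken from the paper's pipeline.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences with non-zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy per-window differences between two cell types (one value per 200-kb window).
delta_mark = [0.1, 0.5, -0.2, 0.9]  # hypothetical change in a chromatin feature
delta_rt = [0.0, 0.3, -0.4, 1.2]    # hypothetical change in replication timing
rho = spearman(delta_mark, delta_rt)
```

Because the two toy lists rise and fall in the same order, their ranks are identical and the correlation is at its maximum. Working on ranks makes the statistic insensitive to the scale of either signal.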