DBTMEE: a database of transcriptome in mouse early embryos.

Sung-Joon Park, Katsuhiko Shirahige, Miho Ohsugi, Kenta Nakai.

Abstract

DBTMEE (http://dbtmee.hgc.jp/) is a searchable and browsable database designed to manage gene expression information from our ultralarge-scale whole-transcriptome analysis of mouse early embryos. Technical challenges such as biological sample collection have made integrative approaches that combine multiple public analytical data indispensable for studying embryogenesis, so we intend DBTMEE to be an integrated gateway for the research community. To this end, we combined our gene expression profile with various public resources. Users can thereby extensively investigate molecular characteristics among totipotent, pluripotent and differentiated cells while taking genetic and epigenetic characteristics into consideration. We have also designed user-friendly web interfaces that enable users to access the data quickly and easily. DBTMEE will help to advance our understanding of the enigmatic fertilization dynamics.
© The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

Year:  2014        PMID: 25336621      PMCID: PMC4383872          DOI: 10.1093/nar/gku1001

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

Mammalian fertilization is a dynamically and precisely regulated developmental step in which the maternal and paternal genomes coordinate with each other to confer developmental competence on the cell formed by sperm-oocyte fusion. During fertilization, specialized gene regulatory programs that drive wave-like transitions of gene expression must be tightly controlled (1,2). Deciphering the mechanisms underlying such programs, that is, how embryos avoid abortive development and acquire totipotency, is a fundamental challenge in human infertility and stem cell biology. Recent advances in high-throughput DNA sequencing and the resulting influx of data have rapidly advanced our understanding of the enigmatic fertilization dynamics. Examples include a mechanism of epigenetic maintenance in germ cells analyzed through genome-wide DNA methylation (3) and histone modifications (4), temporal gene expression patterns detected by whole-transcriptome profiling of single cells (5) and of large numbers of cells (6), and a mechanism of zygotic reprogramming uncovered by identifying parental factors that also enhance the generation of induced pluripotent stem cells (iPSCs) (7). Integrating these multiple resources makes it possible to investigate the molecular basis of early embryogenesis extensively. Here, we present a novel database, DBTMEE (DataBase of Transcriptome in Mouse Early Embryos), which centralizes gene expression profiles at early developmental stages (http://dbtmee.hgc.jp/). The aim of this database is to provide the gene catalog established by our ultralarge-scale RNA-seq analysis of 1.5 × 10⁵ high-quality oocytes (6). These oocytes were used either for in vitro fertilization or for parthenogenetic activation, to help characterize the functions of the parental genomes (6).
To provide an integrated gateway for the research community, DBTMEE combines our gene expression profile with various public resources, including RNA-seq data of embryonic stem cells (ESCs) and iPSCs (8). Because factors enriched in oocytes and zygotes have been reported to be good candidates for enhancing the reprogramming of somatic cells to iPSCs (7), including pluripotent cells should also make the database useful for stem cell biology. Users can explore similarly and dissimilarly expressed genes across totipotent, pluripotent and differentiated cells while taking genetic and epigenetic characteristics into consideration.

DATA COLLECTION

Ultralarge-scale whole-transcriptome profile

DBTMEE provides a searchable gene expression profile of normal and parthenogenetic early embryo development that we established previously (6). In brief, we collected high-quality metaphase II oocytes (Oo) and, by in vitro fertilization, one-cell (1C), two-cell (2C) and four-cell (4C) stage embryos. In addition, we collected mouse embryonic fibroblasts (MEFs) and parthenogenetic 1C (p1C) and 4C (p4C) embryos. Total RNA from ≥10⁴ × 2 cells per stage was sequenced on the SOLiD system and analyzed with bioinformatics tools, detecting 17 486 genes that exhibited intriguing expression patterns (Figures 1 and 2). The gene expression patterns categorized by hierarchical clustering can be browsed from the front page of DBTMEE (Figure 3A-4).
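The text does not state which distance measure or linkage was used for the hierarchical clustering. As a minimal sketch only, assuming log2(FPKM + 1) profiles, Pearson-correlation distance and average linkage (all illustrative choices, not the authors' documented settings), agglomerative clustering of gene profiles across stages could look like this:

```python
import math

def log_profile(fpkms):
    """log2(FPKM + 1) transform so that highly expressed genes do not dominate."""
    return [math.log2(v + 1.0) for v in fpkms]

def pearson_distance(a, b):
    """1 - Pearson correlation; flat profiles get distance 1.0 (no correlation)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 1.0
    return 1.0 - cov / math.sqrt(va * vb)

def average_linkage(profiles, k):
    """Repeatedly merge the two closest clusters (mean pairwise distance) until k remain."""
    clusters = [[i] for i in range(len(profiles))]

    def cluster_distance(c1, c2):
        total = sum(pearson_distance(profiles[i], profiles[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = cluster_distance(clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])
        del clusters[b]  # b > a, so index a stays valid
    return clusters

# Hypothetical FPKMs across Oo, 1C, 2C and 4C for four genes.
fpkm_rows = [[0, 0, 5, 10], [0, 1, 6, 11], [10, 5, 0, 0], [11, 6, 1, 0]]
groups = average_linkage([log_profile(r) for r in fpkm_rows], k=2)
print(sorted(sorted(g) for g in groups))  # [[0, 1], [2, 3]]
```

Correlation distance groups genes by the shape of their profile across stages rather than by absolute level, which fits the notion of expression patterns such as ZGA; the naive cubic-time loop is only meant for small illustrative inputs.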
Figure 1.

Gene expression patterns observed from RNA-seq data in DBTMEE. Oo, oocytes; 1C, one-cell; 2C, two-cell; 4C, four-cell stage embryos; ZGA, zygotic gene activation; MGA, mid-preimplantation gene activation; FPKM, fragments per kilobase of exon per million mapped reads.

Figure 2.

Gene expression profile in DBTMEE ver. 2.0 that contains 21 206 coding and non-coding RefSeq genes. MEF, mouse embryonic fibroblast; p1C, parthenogenetic 1C; p4C, parthenogenetic 4C; iPSC, induced pluripotent stem cell.

Figure 3.

Screenshots of the web interfaces in DBTMEE. The front page (A) displays search boxes that accept regular expressions (A-1, 2) and the gene catalog established by clustering of gene expression patterns (A-4). Users can view information on a gene or a set of genes from multiple tables on a single web page (B), which contains hyperlinks (B-1) to each gene information page (C). Through the basic and advanced search options of the browser interfaces (E), listed in the left-side menu (A-3), users can obtain more specific search results from a particular table (D).

Multiple public resources

Gene expression profiles established on heterogeneous platforms are valuable resources that give users complementary and extensive information. We downloaded three microarray profiles (9–11) and analyzed RNA-seq data sets prepared with different sequencing protocols at various embryonic stages and in various cell types, such as Oo and 2C embryos (12), spermatozoa (3), single-cell oocytes (5), ESCs (8) and iPSCs (8). We also downloaded mass spectrometry proteomic data on proteins detected in oocytes and zygotes (13), which we use to link mRNAs to their protein products. In addition, we prepared DNA methylation data (3) and histone ChIP-seq data (4) from spermatozoa, and profiled these epigenetic features for each gene promoter using two window sizes: ±2 kbp and ±5 kbp around the transcription start site (TSS).
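Computing a promoter window first requires knowing which transcript end is the TSS, and that depends on strand. The sketch below assumes UCSC refGene-style coordinates (0-based, half-open txStart/txEnd) and a hypothetical list of (chromosome, position, methylation level) CpG calls; it illustrates the ±2 kbp and ±5 kbp windows described above and is not DBTMEE's own processing code.

```python
from dataclasses import dataclass

@dataclass
class Transcript:
    name: str
    chrom: str
    strand: str    # '+' or '-'
    tx_start: int  # 0-based, inclusive (UCSC refGene txStart)
    tx_end: int    # 0-based, exclusive (UCSC refGene txEnd)

def tss(t: Transcript) -> int:
    """0-based position of the TSS, i.e. the 5' end on the transcript's own strand."""
    return t.tx_start if t.strand == "+" else t.tx_end - 1

def promoter_window(t: Transcript, flank: int) -> tuple:
    """Half-open [start, end) interval covering TSS +/- flank bp, clipped at position 0."""
    center = tss(t)
    return max(0, center - flank), center + flank + 1

def mean_methylation(t: Transcript, flank: int, cpg_calls):
    """Mean methylation level of CpG calls (chrom, pos, level) inside the promoter window."""
    start, end = promoter_window(t, flank)
    levels = [lvl for chrom, pos, lvl in cpg_calls
              if chrom == t.chrom and start <= pos < end]
    return sum(levels) / len(levels) if levels else None

# Hypothetical minus-strand transcript and CpG calls.
t = Transcript("NM_hypothetical", "chr1", "-", 10_000, 15_000)
calls = [("chr1", 13_000, 0.25), ("chr1", 16_999, 0.75),
         ("chr1", 17_000, 1.0), ("chr2", 14_000, 0.9)]
print(tss(t), promoter_window(t, 2000), mean_methylation(t, 2000, calls))
# 14999 (12999, 17000) 0.5
```

Because the window is symmetric, strand only matters for locating the TSS; the same function yields the ±5 kbp profile by passing `flank=5000`.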

MANIPULATION OF DATABASE CONTENTS

We compiled tables from the above-mentioned collections and built a database from these tables using MongoDB (ver. 2.4.3, http://www.mongodb.com/) with PHP (http://php.net/). Because MongoDB offers high scalability, we can efficiently handle a wide range of collections as database tables. We assigned unique identifiers (DBTMEE IDs) to RefSeq genes to manage the database efficiently. Genomic information was obtained from NCBI (http://www.ncbi.nlm.nih.gov/gene/) and MGI (http://www.informatics.jax.org/). The mouse reference genome (mm9 assembly) was downloaded from the UCSC Genome Browser (http://genome.ucsc.edu/). In our previously published paper (6), we analyzed RNA-seq data using the TopHat (ver. 1.4.1)-Cufflinks (ver. 2.0.2) pipeline (14) and RefSeq annotation (release 46, http://www.ncbi.nlm.nih.gov/refseq/). Quantification of RNA abundance as FPKM (fragments per kilobase of exon per million mapped fragments) depends on the RNA-seq data sets that are processed together. For this reason, we generated many FPKM tables for different combinations of developmental stages and annotations; more information can be found on the help web page of DBTMEE. To ensure the reproducibility of our original study, we have deposited these results in DBTMEE (ver. 1.0). Meanwhile, we have updated the database contents using the newly released Cufflinks package (ver. 2.2.1) and RefSeq annotation (release 65). Notably, as shown in Figure 2, we calculated FPKMs from all sequenced reads of the 10 embryonic stages and cell types at once, and generated an all-in-one table that contains 21 206 genes expressed in at least one stage (DBTMEE ver. 2.0). This table presents not only gene expression transitions during early embryo development but also similar and dissimilar gene expression levels among totipotent, parthenogenetic, pluripotent and differentiated cells.
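In its basic form, FPKM = 10⁹ × C / (L × N), where C is the number of fragments assigned to a gene, L its exon length in bp and N the total number of mapped fragments in the library. Because N is a property of the whole library, the same count yields a different FPKM in a different library, which is one reason FPKM tables are tied to the data sets they were computed from. The sketch below uses this simplified formula (Cufflinks itself estimates abundances with a statistical model and effective transcript lengths) and, as an assumption borrowed from the working example later in this paper, treats FPKM > 1.0 as "expressed"; gene names, stages and counts are hypothetical.

```python
def fpkm(fragments: int, exon_length_bp: int, total_mapped: int) -> float:
    """Simplified FPKM = fragments * 1e9 / (exon length in bp * total mapped fragments)."""
    return fragments * 1e9 / (exon_length_bp * total_mapped)

def all_in_one_table(counts, exon_lengths, library_sizes, cutoff=1.0):
    """Build {gene: {stage: FPKM}} and keep genes with FPKM > cutoff in at least one stage."""
    table = {}
    for gene, per_stage in counts.items():
        row = {stage: fpkm(n, exon_lengths[gene], library_sizes[stage])
               for stage, n in per_stage.items()}
        if any(value > cutoff for value in row.values()):
            table[gene] = row
    return table

# Hypothetical fragment counts for two genes in two libraries.
counts = {"GeneA": {"Oo": 500, "2C": 0}, "GeneB": {"Oo": 1, "2C": 2}}
exon_lengths = {"GeneA": 2000, "GeneB": 2000}
library_sizes = {"Oo": 10_000_000, "2C": 20_000_000}
print(all_in_one_table(counts, exon_lengths, library_sizes))
# {'GeneA': {'Oo': 25.0, '2C': 0.0}}
```

Doubling the library size halves the FPKM for the same count (for example, 500 fragments on a 2 kbp gene give 25.0 at 10 million mapped fragments but 12.5 at 20 million).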

WEB INTERFACES

Three ways to search and access gene information

To offer high accessibility and usability, we provide three ways to search and filter records for a particular gene or a set of genes across all the tables. First, users can enter case-insensitive keywords (i.e. gene names, RefSeq IDs or DBTMEE IDs) into the search field that always appears in the top-right corner (Figure 3A-1); matching records from all the tables are returned on a single web page (Figure 3B), with the keywords highlighted. A query can consist of multiple space-delimited words and can contain the wildcard ‘*’, which matches any string (Table 1). This may be helpful when users want to find members of a certain gene family.
Table 1.

Examples of keywords used in the search options.

Input             Implication
Tet1              Exact match to gene names or aliases
DBTMEE:2021159    Exact match to DBTMEE IDs
Tet1 Tet2         Exact match for ‘Tet1’ or ‘Tet2’ (a)
*Sox*             All gene names or aliases containing ‘Sox’
NM_008242*        All RefSeq IDs beginning with ‘NM_008242’
*                 All records

(a) Multiple words are combined with the OR search operator.
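The matching rules in Table 1 can be approximated by translating each query into a single case-insensitive regular expression. This Python sketch is an assumption about the semantics, not DBTMEE's PHP/MongoDB implementation: space-delimited words are OR-ed, ‘*’ matches any (possibly empty) string, and every other character must match literally against the whole name.

```python
import re

def keyword_to_regex(query: str) -> re.Pattern:
    """Translate a Table 1-style query into one case-insensitive regular expression.

    Space-delimited words are OR-ed; '*' matches any (possibly empty) string;
    all other characters are escaped so they match literally.
    """
    alternatives = [".*".join(re.escape(part) for part in word.split("*"))
                    for word in query.split()]
    return re.compile("(?:" + "|".join(alternatives) + ")", re.IGNORECASE)

def search(query: str, names):
    """Return the names matched in full by the query (no partial-substring hits)."""
    pattern = keyword_to_regex(query)
    return [name for name in names if pattern.fullmatch(name)]

genes = ["Tet1", "Tet2", "Tet3", "Sox2", "sox17", "Nanog"]
print(search("Tet1 Tet2", genes))  # ['Tet1', 'Tet2']
print(search("*Sox*", genes))      # ['Sox2', 'sox17']
print(search("NM_008242*", ["NM_008242.2", "NM_008243.1"]))  # ['NM_008242.2']
```

Using `fullmatch` rather than `search` keeps ‘Tet1’ an exact match (it will not return ‘Tet10’), so substring behavior only appears where the user writes ‘*’.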

Second, users can directly access gene information by entering an exact keyword into a search box (Figure 3A-2) or through ‘Gene Search’ in the left-side menu (Figure 3A-3). The result includes genomic features, GO terms and hyperlinks to external databases, as well as the records found in the tables (Figure 3C). In either case, each record found contains the unique DBTMEE ID, which links to the corresponding gene information page (Figure 3B-1). Third, users can access DBTMEE data through the ‘Table Browser’, ‘Regulation’ and ‘Epigenome’ interfaces in the left-side menu (Figure 3A-3); the tutorial and help web pages describe in detail how to use them. With these interfaces, users can browse a particular table by setting basic search fields and advanced search options, and then download the results as a TSV (tab-separated values) file (Figure 3E). In the regulation browser, we installed Cytoscape Web (http://cytoscapeweb.cytoscape.org/) to visualize the transcription factor (TF) binding to gene promoters that we inferred (Figure 3D-1).

Basic and advanced search options

In the browser interfaces (Figure 3D), we provide basic and advanced search options that enable users to obtain more specific search results. For example, in the ‘Table Browser’ (Figure 3E), users enter a simple regular expression (Table 1) into the ‘3. Gene’ field after selecting a table in the ‘2. Table selection’ field; the ‘1. Category’ field helps to focus the search on tables in a particular category. To download the search result, users select ‘TSV text file’ in the ‘5. Output format’ field. In the epigenome browser, this field contains an extra option, ‘TSV+Gene expression (V2.0)’, that provides both epigenetic features and gene expression levels in one text file. The advanced options, which are combined with the AND search operator, change dynamically according to the selected table. For example, for tables of gene expression profiles, users can set FPKM thresholds to restrict the results to genes that exhibit a specific expression pattern.
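Combining advanced options with AND means a row is kept only if it satisfies every threshold. A minimal Python sketch of that logic follows; the field names, operator set and rows are hypothetical and do not reproduce DBTMEE's actual form fields or MongoDB queries.

```python
import operator

# Comparison operators a threshold field might offer (illustrative set).
OPERATORS = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le}

def matches_all(row, conditions):
    """True only if the row satisfies every (field, op, threshold) condition (AND)."""
    return all(OPERATORS[op](row[field], threshold) for field, op, threshold in conditions)

def advanced_search(rows, conditions):
    """Return the gene names of rows that pass all conditions."""
    return [row["gene"] for row in rows if matches_all(row, conditions)]

# Hypothetical expression rows: keep genes silent in oocytes but expressed at 2C.
rows = [
    {"gene": "GeneA", "Oo": 0.5, "2C": 12.0},
    {"gene": "GeneB", "Oo": 8.0, "2C": 9.0},
    {"gene": "GeneC", "Oo": 0.2, "2C": 0.9},
]
conditions = [("Oo", "<=", 1.0), ("2C", ">", 1.0)]
print(advanced_search(rows, conditions))  # ['GeneA']
```

Under AND semantics each added condition can only narrow the result set, which is why a pair of thresholds like this is enough to isolate genes following one expression pattern.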

WORKING EXAMPLE

As an example of DBTMEE usage, we analyzed zygotic gene expression changes that might be associated with paternal bivalent histone marks of H3 lysine 4 trimethylation (H3K4me3) and H3 lysine 27 trimethylation (H3K27me3). In the epigenome browser, we chose ‘SPR_Histone_Erkek_+-2KfromTSS’, which denotes histone enrichment within ±2 kbp of the TSS in spermatozoa as established by Erkek et al. (4), together with the ‘TSV+Gene expression (V2.0)’ output option. We then entered ‘*’ into the ‘2. Gene’ field and set the advanced search options to >1.0 for four fields: H3K4me_1, H3K4me_2, H3K27me_1 and H3K27me_2. Because the advanced search uses the AND operator, this setting selects gene promoters marked in all four ChIP replicates of the two histone marks in sperm (i.e. more than 2-fold enrichment in ChIP over input). This returned 3543 genes as a TSV file containing their FPKMs. In addition, by setting the options to ≤1.0 for the four fields, we downloaded 3686 genes marked by neither histone mark. Although DBTMEE does not provide built-in analytical tools, users can analyze the downloaded files further as desired. For example, after selecting genes that were not expressed in either p1C or p4C (FPKM ≤ 1.0) but were expressed in at least one stage, we obtained 658 genes marked by both histone marks and 901 genes marked by neither (Supplementary Tables S1 and S2). Each of these genes belongs to one of the zygotic expression patterns shown in Figure 3A-4. To investigate the influence of paternal bivalent histone marks on early development, we plotted FPKM distributions across the stages for each zygotic expression pattern (Figure 4). Interestingly, after fertilization, the bivalently marked genes in the 4-cell transient, maternal RNA and minor ZGA patterns appear to be actively transcribed.
Although experimental validation is required, this result might help to identify genes affected by sperm transmission and to determine the roles of epigenetic inheritance from sperm.
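The post-processing step of the working example can be sketched with Python's `csv` module. The column names used here (‘Gene’, ‘Oo’, ‘1C’, ‘2C’, ‘p1C’, ‘p4C’) are hypothetical placeholders for the headers of the exported ‘TSV+Gene expression (V2.0)’ file, which this paper does not list; the 1.0 FPKM cutoff follows the text.

```python
import csv
import io

def zygotic_candidates(tsv_text, stage_columns, parthenogenetic=("p1C", "p4C"), cutoff=1.0):
    """Genes with FPKM <= cutoff in every parthenogenetic column and > cutoff in some stage."""
    selected = []
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        silent_in_parthenotes = all(float(row[c]) <= cutoff for c in parthenogenetic)
        expressed_somewhere = any(float(row[c]) > cutoff for c in stage_columns)
        if silent_in_parthenotes and expressed_somewhere:
            selected.append(row["Gene"])
    return selected

# Hypothetical excerpt of a downloaded file (real headers may differ).
tsv = (
    "Gene\tOo\t1C\t2C\tp1C\tp4C\n"
    "GeneA\t0.0\t3.5\t10.2\t0.4\t0.0\n"
    "GeneB\t5.0\t4.0\t2.0\t6.1\t0.3\n"
    "GeneC\t0.2\t0.1\t0.9\t0.0\t0.5\n"
)
print(zygotic_candidates(tsv, ["Oo", "1C", "2C", "p1C", "p4C"]))  # ['GeneA']
```

For a file on disk, the same loop works with `open(path, newline="")` passed to `csv.DictReader` instead of the in-memory string.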
Figure 4.

Working example of the analysis of paternal bivalent histone marks with DBTMEE. The 658 genes marked by both H3K4me3 and H3K27me3 in sperm and not expressed in parthenogenesis each belong to one of 10 zygotic expression patterns. Genes showing the 4-cell transient, maternal RNA and minor ZGA patterns in particular are preferentially up-regulated after fertilization. Asterisks represent the level of statistical significance (P-value of Wilcoxon test): *P < 0.01, **P < 0.001, ***P < 0.0001. Ne, genes marked by neither histone mark; Bi, genes marked by both histone marks.
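The caption does not specify which form of the Wilcoxon test or which software was used. Assuming the two-sided, two-sample rank-sum form comparing Ne and Bi FPKMs at one stage within one pattern, a self-contained sketch with average ranks for ties and the tie-corrected normal approximation is shown below. In practice a statistics library such as `scipy.stats.mannwhitneyu` would typically be used; its default applies a continuity correction, so its p-values can differ slightly from this sketch.

```python
import math

def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) test.

    Uses average ranks for ties and the tie-corrected normal approximation,
    without continuity correction. Returns (U for sample x, p-value).
    """
    if not x or not y:
        raise ValueError("both samples must be non-empty")
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n1, n2, n = len(x), len(y), len(x) + len(y)
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1  # average of 1-based ranks i+1 .. j+1
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    r1 = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance <= 0:
        return u1, 1.0  # every value tied: no evidence of a difference
    z = (u1 - n1 * n2 / 2) / math.sqrt(variance)
    return u1, math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value

# Hypothetical FPKMs of Ne and Bi genes at one stage.
ne = [0.5, 1.2, 0.8, 2.0, 0.3]
bi = [3.1, 5.4, 2.2, 8.9, 4.0]
u, p = rank_sum_test(ne, bi)
```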

Users can find more introductory working examples on the tutorial web page (http://dbtmee.hgc.jp/tutorial/tutorial.php).

CONCLUSIONS

DBTMEE provides the transcriptome profile that we established from mouse early embryos in an experiment of unprecedented scale. To make this ultralarge-scale transcriptome profile more useful, we built the database to include not only the results of downstream analyses of the profile but also related public resources. We have also designed user-friendly web interfaces that enable users to access the data quickly. Because the database runs on a highly scalable system, we can incorporate a wide range of collections that will help shed light on the enigmatic fertilization dynamics, e.g. high-resolution single-cell transcriptome data of embryos, allele-specific gene expression, and asymmetric genetic and epigenetic features in blastomeres. We aim to improve and update the data in future releases and to implement additional web utilities.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.
REFERENCES

1.  Dynamics of global gene expression changes during mouse preimplantation development.

Authors:  Toshio Hamatani; Mark G Carter; Alexei A Sharov; Minoru S H Ko
Journal:  Dev Cell       Date:  2004-01       Impact factor: 12.270

2.  The molecular foundations of the maternal to zygotic transition in the preimplantation embryo.

Authors:  Richard M Schultz
Journal:  Hum Reprod Update       Date:  2002 Jul-Aug       Impact factor: 15.610

3.  Transcript profiling during preimplantation mouse development.

Authors:  Fanyi Zeng; Don A Baldwin; Richard M Schultz
Journal:  Dev Biol       Date:  2004-08-15       Impact factor: 3.582

4.  Proteome of mouse oocytes at different developmental stages.

Authors:  Shufang Wang; Zhaohui Kou; Zhiyi Jing; Yu Zhang; Xinzheng Guo; Mengqiu Dong; Ian Wilmut; Shaorong Gao
Journal:  Proc Natl Acad Sci U S A       Date:  2010-09-27       Impact factor: 11.205

5.  Roadmap to embryo implantation: clues from mouse models.

Authors:  Haibin Wang; Sudhansu K Dey
Journal:  Nat Rev Genet       Date:  2006-03       Impact factor: 53.242

6.  Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks.

Authors:  Cole Trapnell; Adam Roberts; Loyal Goff; Geo Pertea; Daehwan Kim; David R Kelley; Harold Pimentel; Steven L Salzberg; John L Rinn; Lior Pachter
Journal:  Nat Protoc       Date:  2012-03-01       Impact factor: 13.491

7.  mRNA-Seq whole-transcriptome analysis of a single cell.

Authors:  Fuchou Tang; Catalin Barbacioru; Yangzhou Wang; Ellen Nordman; Clarence Lee; Nanlan Xu; Xiaohui Wang; John Bodeau; Brian B Tuch; Asim Siddiqui; Kaiqin Lao; M Azim Surani
Journal:  Nat Methods       Date:  2009-04-06       Impact factor: 28.547

8.  Embryonic stem cell potency fluctuates with endogenous retrovirus activity.

Authors:  Todd S Macfarlan; Wesley D Gifford; Shawn Driscoll; Karen Lettieri; Helen M Rowe; Dario Bonanomi; Amy Firth; Oded Singer; Didier Trono; Samuel L Pfaff
Journal:  Nature       Date:  2012-07-05       Impact factor: 49.962

9.  Contribution of intragenic DNA methylation in mouse gametic DNA methylomes to establish oocyte-specific heritable marks.

Authors:  Hisato Kobayashi; Takayuki Sakurai; Misaki Imai; Nozomi Takahashi; Atsushi Fukuda; Obata Yayoi; Shun Sato; Kazuhiko Nakabayashi; Kenichiro Hata; Yusuke Sotomaru; Yutaka Suzuki; Tomohiro Kono
Journal:  PLoS Genet       Date:  2012-01-05       Impact factor: 5.917

10.  Nanog-independent reprogramming to iPSCs with canonical factors.

Authors:  Ava C Carter; Brandi N Davis-Dusenbery; Kathryn Koszka; Justin K Ichida; Kevin Eggan
Journal:  Stem Cell Reports       Date:  2014-01-31       Impact factor: 7.765
