Jiangwen Sun, Zongliang Jiang, Xiuchun Tian, Jinbo Bi.
Abstract
MOTIVATION: A growing number of studies have explored pre-implantation embryonic development in multiple mammalian species. However, the conservation and variation in developmental programming among species remain poorly defined because effective computational methods for detecting co-regulated genes that are conserved across species are lacking. The most sophisticated method to date for identifying conserved co-regulated genes is a two-step approach: it first identifies gene clusters for each species by cluster analysis of gene expression data, and then computes the overlaps between clusters from different species to reveal common subgroups. This approach copes poorly with the noise that the complicated procedures for quantifying gene expression introduce into the expression data. Furthermore, because the approach is sequential, the gene clusters identified in the first step may overlap little across species in the second step, which makes conserved co-regulated genes difficult to detect.
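The two-step baseline described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the choice of k-means for step 1, the function names `kmeans` and `cluster_overlaps`, and the toy expression matrices are all assumptions made for the example. It assumes the rows of both matrices refer to the same orthologous genes in the same order.

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per row (gene) of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Squared Euclidean distance from every gene to every center.
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # leave an empty cluster's center unchanged
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def cluster_overlaps(labels_a, labels_b):
    """Step 2: count the genes shared by each pair of clusters from two species."""
    counts = np.zeros((labels_a.max() + 1, labels_b.max() + 1), dtype=int)
    np.add.at(counts, (labels_a, labels_b), 1)
    return counts

# Toy data (hypothetical): 40 orthologous genes in two groups, each group
# following one binary stage pattern per species, plus Gaussian noise.
rng = np.random.default_rng(1)
n_genes = 40
group = np.repeat([0, 1], n_genes // 2)[:, None]
mouse = np.where(group == 0, [1, 1, 0, 0, 0, 0], [0, 0, 0, 1, 1, 1])
mouse = mouse + 0.1 * rng.standard_normal((n_genes, 6))
human = np.where(group == 0, [0, 0, 0, 0, 0, 1, 1], [1, 1, 1, 1, 1, 0, 0])
human = human + 0.1 * rng.standard_normal((n_genes, 7))

# Step 1: cluster each species independently.
mouse_labels = kmeans(mouse, k=2)
human_labels = kmeans(human, k=2, seed=2)

# Step 2: overlap table; large entries suggest conserved co-regulated groups.
overlap = cluster_overlaps(mouse_labels, human_labels)
print(overlap)
```

Because step 1 never sees the other species, nothing in this pipeline encourages the two clusterings to agree, which is the weakness the abstract points out.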
Year: 2016 PMID: 27307610 PMCID: PMC4908362 DOI: 10.1093/bioinformatics/btw278
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
The 22 gene expression patterns included in our analysis for characterizing gene regulation in human pre-implantation embryonic development
Dark (white) color indicates high (low) expression level.
The 18 gene expression patterns included in our analysis for characterizing gene regulation in mouse pre-implantation embryonic development
Dark (white) color indicates high (low) expression level.
Fig. 1. Sparse rank-one matrix factorization of X. All values in X are assumed to be positive. Heavier color represents a larger value at the corresponding position in X
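As a rough illustration of what a sparse rank-one factorization does, the sketch below approximates a positive matrix X by d·u·vᵀ with sparse u and v, using alternating soft-thresholded updates. This is a generic scheme chosen for the example and is not necessarily the formulation used in the paper; the function names, the penalty values `lam_u` and `lam_v`, and the toy matrix are assumptions.

```python
import numpy as np

def soft_threshold(a, lam):
    """Shrink every entry of a toward zero by lam; small entries become exactly 0."""
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)

def sparse_rank_one(X, lam_u, lam_v, n_iter=100):
    """Approximate X by d * outer(u, v) with sparse unit-norm u and v."""
    # Start from the leading right singular vector, oriented to be non-negative.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    v = Vt[0] if Vt[0].sum() >= 0 else -Vt[0]
    u = np.zeros(X.shape[0])
    for _ in range(n_iter):
        u = soft_threshold(X @ v, lam_u)      # update row loadings, v fixed
        if not u.any():
            break                             # penalty too strong: all rows dropped
        u /= np.linalg.norm(u)
        v = soft_threshold(X.T @ u, lam_v)    # update column loadings, u fixed
        if not v.any():
            break
        v /= np.linalg.norm(v)
    d = float(u @ X @ v)                      # scale of the rank-one component
    return d, u, v

# Toy positive matrix (hypothetical): a strong block in rows 0-4, columns 0-2.
rng = np.random.default_rng(0)
X = 0.05 * rng.random((12, 8)) + 1e-6
X[:5, :3] += 1.0

d, u, v = sparse_rank_one(X, lam_u=0.5, lam_v=0.5)
print("rows in block:   ", np.flatnonzero(u))
print("columns in block:", np.flatnonzero(v))
```

The nonzero entries of u and v jointly select a block of X, i.e. a set of rows that behave alike on a subset of columns, which is the reading suggested by the figure.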
Variables and number of subjects in the three true consistent blocks between the two views of the synthetic datasets
| | | Block 1 | Block 2 | Block 3 |
|---|---|---|---|---|
| Variables | View 1 | 1–3 | 4–6 | 1–3 |
| | View 2 | 1–3 | 7–9 | 4–6 |
| Number of subjects | | 240 | 200 | 160 |
The variable set is represented by i–j, which includes variables indexed from i through j (with both i and j included).
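To make the block layout concrete, the sketch below builds two binary data matrices whose consistent blocks follow the table. It takes the first "Variables" row as the first view and the second row as the second view. The number of variables per view (`n_vars=9`), the noise model (independent bit flips with probability `flip_prob`) and the function name `make_views` are assumptions for illustration; the source does not state how the synthetic datasets were actually generated.

```python
import numpy as np

# (view-1 variables, view-2 variables, number of subjects); ranges are 1-based
# and inclusive, matching the i-j notation of the table.
BLOCKS = [
    ((1, 3), (1, 3), 240),
    ((4, 6), (7, 9), 200),
    ((1, 3), (4, 6), 160),
]

def make_views(n_vars=9, flip_prob=0.05, seed=0):
    """Return two binary subject-by-variable matrices and the true block labels."""
    rng = np.random.default_rng(seed)
    n_subjects = sum(size for _, _, size in BLOCKS)
    view1 = np.zeros((n_subjects, n_vars), dtype=int)
    view2 = np.zeros((n_subjects, n_vars), dtype=int)
    labels = np.empty(n_subjects, dtype=int)
    start = 0
    for k, ((a1, b1), (a2, b2), size) in enumerate(BLOCKS):
        rows = slice(start, start + size)
        view1[rows, a1 - 1:b1] = 1   # convert 1-based inclusive range to a slice
        view2[rows, a2 - 1:b2] = 1
        labels[rows] = k
        start += size
    # Hypothetical noise: flip each entry independently with probability flip_prob.
    flips1 = rng.random(view1.shape) < flip_prob
    flips2 = rng.random(view2.shape) < flip_prob
    view1 = np.where(flips1, 1 - view1, view1)
    view2 = np.where(flips2, 1 - view2, view2)
    return view1, view2, labels

view1, view2, labels = make_views()
print(view1.shape, view2.shape, np.bincount(labels))
```

Blocks 1 and 3 share variables 1–3 in the first view and differ only in the second view, so a method that looks at one view alone cannot separate them.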
Fig. 2. Plot of the mean and standard deviation of the NMIs obtained by each compared method on the six synthetic datasets. The proposed method is labeled MVBC
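For readers unfamiliar with the metric, normalized mutual information (NMI) between a true and a predicted grouping can be computed as in the sketch below. Several normalizations of NMI are in use; this example divides by the geometric mean of the two entropies, which is an assumption, since the source does not say which variant was used. The function name `nmi` is likewise chosen for the example.

```python
import numpy as np

def nmi(labels_true, labels_pred):
    """Normalized mutual information between two labelings of the same subjects."""
    _, a = np.unique(np.asarray(labels_true).ravel(), return_inverse=True)
    _, b = np.unique(np.asarray(labels_pred).ravel(), return_inverse=True)
    # Joint distribution over (true cluster, predicted cluster) pairs.
    joint = np.zeros((a.max() + 1, b.max() + 1))
    np.add.at(joint, (a, b), 1.0)
    joint /= a.size
    pa, pb = joint.sum(axis=1), joint.sum(axis=0)
    nz = joint > 0                                   # skip empty cells: 0 * log 0 = 0
    mi = (joint[nz] * np.log(joint[nz] / np.outer(pa, pb)[nz])).sum()
    ha = -(pa * np.log(pa)).sum()                    # entropy of the true labeling
    hb = -(pb * np.log(pb)).sum()                    # entropy of the predicted labeling
    if ha == 0 or hb == 0:
        # A single-cluster labeling carries no information.
        return 1.0 if ha == hb else 0.0
    return float(mi / np.sqrt(ha * hb))

same = nmi([0, 0, 1, 1], [1, 1, 0, 0])      # same partition, labels renamed
unrelated = nmi([0, 0, 1, 1], [0, 1, 0, 1])  # statistically independent partitions
print(same, unrelated)
```

NMI is invariant to renaming clusters, so a method is not penalized for numbering its blocks differently from the ground truth; values near 1 indicate close agreement and values near 0 indicate none.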
Fig. 3. Consistent blocks identified by all compared methods on one of the six synthetic datasets. The proposed method is labeled MVBC. Data matrices are plotted with black spots indicating 0 and white spots indicating 1. Subjects in each plot are arranged according to the consistent blocks identified by the corresponding method. Two matrices are plotted for each method, one per view. The leftmost pair of matrix plots shows the true consistent blocks in the data. See Table 1 for details of these three blocks
Conserved co-regulated gene clusters identified by our proposed method during human and mouse pre-implantation embryonic development
| Co-regulated gene cluster | No. of genes | Mouse (Ooc,Pr,2c,4c,8c,M) | Human (Ooc,Pr,Zy,2c,4c,8c,M) | Gene Ontology |
|---|---|---|---|---|
| C1 | 1042 | M12 (1,1,0,0,0,0) | H12 (0,0,0,0,0,1,1) | Cell death and survival, cancer |
| C2 | 1510 | M9 (0,0,0,1,1,1) | H12 (0,0,0,0,0,1,1) | RNA post-transcriptional modification, protein synthesis, cellular growth and proliferation genes |
| C4 | 765 | M12 (1,1,0,0,0,0) | H11 (1,1,1,1,1,0,0) | Cell cycle, gene expression, cellular assembly and organization |
| C9 | 207 | M9 (0,0,0,1,1,1) | H11 (1,1,1,1,1,0,0) | Cancer, cell cycle, carbohydrate metabolism, lipid metabolism, small molecule biochemistry |
| C7 | 179 | M9 (0,0,0,1,1,1) | H1 (1,0,0,0,0,0,0) | DNA replication, recombination and repair, cell cycle |
| C10 | 158 | M9 (0,0,0,1,1,1) | H5 (0,0,0,0,1,0,0) | Cellular function and maintenance, cell cycle, reproductive system development and function |
| C5 | 143 | M9 (0,0,0,1,1,1) | H7 (0,0,0,0,0,0,1) | Embryonic development, |
| C6 | 54 | M9 (0,0,0,1,1,1) | H7 (0,0,0,0,0,0,1) | Cellular growth and proliferation |
| C8 | 53 | M12 (1,1,0,0,0,0) | H7 (0,0,0,0,0,0,1) | Amino acid metabolism, small molecule biochemistry, carbohydrate metabolism |
| C3 | 51 | M12 (1,1,0,0,0,0) | H6 (0,0,0,0,0,1,0) | Hereditary disorder, neurological disease, cell-to-cell signaling and interaction, cell morphology |
| C13 | 38 | M3 (0,0,1,0,0,0) | H12 (0,0,0,0,0,1,1) | RNA processing |
| C14 | 34 | M3 (0,0,1,0,0,0) | H2 (0,1,0,0,0,0,0) | Organic alcohol transport |
| C11 | 33 | M16 (0,0,1,1,1,0) | H5 (0,0,0,0,1,0,0) | Sex differentiation, stem cell maintenance |
| C12 | 20 | M6 (0,0,0,0,0,1) | H11 (1,1,1,1,1,0,0) | Regulation of muscle cell differentiation, cell motion |
| C20 | 18 | M4 (0,0,0,1,0,0) | H22 (1,0,0,0,0,0,1) | Mitochondrial |
| C16 | 17 | M3 (0,0,1,0,0,0) | H9 (1,1,1,0,0,0,0) | Gene silencing by RNA, DNA metabolic process |
| C15 | 12 | M2 (0,1,0,0,0,0) | H12 (0,0,0,0,0,1,1) | Cellular amino acid derivative metabolic process |
| C21 | 12 | M5 (0,0,0,0,1,0) | H17 (0,1,1,1,0,0,0) | mRNA metabolic process |
| C17 | 11 | M1 (1,0,0,0,0,0) | H18 (0,0,0,0,1,1,0) | Transcription |
| C18 | 11 | M4 (0,0,0,1,0,0) | H2 (0,1,0,0,0,0,0) | Translation, protein transport |
| C19 | 10 | M5 (0,0,0,0,1,0) | H2 (0,1,0,0,0,0,0) | Reproduction |
| C22 | 8 | M8 (1,1,1,0,0,0) | H17 (0,1,1,1,0,0,0) | Mitosis II |
For each cluster, the table gives its size, the human and mouse patterns most strongly associated with its genes (as indicated by the component with the largest value in vector of Problem (2)), and the top GO terms that are significantly associated with its genes (with P value ). Expression patterns are represented as sequences of 0s and 1s, with 0 denoting low or no expression and 1 denoting high expression.
Note: C5 and C6 are two distinct clusters. Besides the pattern with the strongest support from the genes in each cluster (data shown), each has other associated patterns that differ between the two clusters (data not shown).
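The binary pattern notation in the table can be read mechanically by pairing each bit with the stage abbreviation from the column header. The short sketch below does this for cluster C1; the helper name `high_stages` is hypothetical, and the stage abbreviations are copied verbatim from the table header.

```python
# Stage abbreviations, in the order given in the table's column headers.
MOUSE_STAGES = ("Ooc", "Pr", "2c", "4c", "8c", "M")
HUMAN_STAGES = ("Ooc", "Pr", "Zy", "2c", "4c", "8c", "M")

def high_stages(pattern, stages):
    """Return the stages at which a binary expression pattern is high (bit = 1)."""
    if len(pattern) != len(stages):
        raise ValueError("pattern length must match the number of stages")
    return [stage for stage, bit in zip(stages, pattern) if bit == 1]

# Cluster C1: mouse pattern M12 and human pattern H12.
c1_mouse = high_stages((1, 1, 0, 0, 0, 0), MOUSE_STAGES)
c1_human = high_stages((0, 0, 0, 0, 0, 1, 1), HUMAN_STAGES)
print(c1_mouse)  # ['Ooc', 'Pr']
print(c1_human)  # ['8c', 'M']
```

Read this way, the genes in C1 are highly expressed at the two earliest listed stages in mouse but at the two latest listed stages in human.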