Stavroula Skylaki, Simon R Tomlinson.
Abstract
A number of studies have shown that transcriptome analysis in terms of chromosomal location can reveal regions of non-random transcriptional activity within the genome. Genomic clusters of differentially expressed genes can identify genomic patterns of structural organization, underlying copy number variations or long-range epigenetic regulation such as X-chromosome inactivation. Here we apply an integrative bioinformatics analysis to a collection of 315 freely available mouse pluripotent stem cell samples to discover transcriptional clusters in the genome. We show that over half of the analysed samples (56.83%) carry whole- or partial-chromosome-spanning clusters that recur in genomic regions previously implicated in chromosomal imbalances. Strikingly, we found that the presence of such large clusters is linked to the differential expression of a limited number of genes, common to all samples carrying clusters irrespective of the chromosome where the cluster is found. We have used these genes to train and test classification models that can predict samples that carry large-scale clusters on any chromosome with over 90% accuracy. Our findings suggest that there is a common downstream activation in these cells that affects a limited number of nodes. We propose that this effect is linked to selective advantage and identify potential driver genes.
Year: 2012 PMID: 22798478 PMCID: PMC3479167 DOI: 10.1093/nar/gks663
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. The integrative analysis workflow. (A) Collection and global normalization of 481 publicly available samples. (B) Pearson's correlation-derived distance matrix and agglomerative hierarchical clustering with average linkage of the normalized data. (C) PGE analysis with MultiLevel Otsu thresholding for identification of recurrent aberrant localized expression across the dendrogram. (D) Catalogue of recurrent DE clusters. Filtering of samples according to the expression of pluripotency and lineage-specific markers, resulting in the Nanog-high subgroup of 315 pluripotent samples. Identification of DE genes between the Nanog-high Normal versus Variant groups, the Normal-Chr8 versus Variant-Chr8 groups and the Normal-Chr11 versus Variant-Chr11 groups. (E) Training and testing of classification models using PAM and SVMs for the prediction of Variant samples.
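Step (B) of the workflow can be illustrated with a minimal sketch in pure Python. It assumes the distance between two samples is defined as 1 - r, where r is Pearson's correlation (the caption does not state the exact conversion), and it uses a naive average-linkage loop rather than an optimized library routine. The function names and the four-sample toy matrix are hypothetical and are not taken from the study.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def distance_matrix(samples):
    """Pairwise distances, assumed here to be 1 - r (0 on the diagonal)."""
    n = len(samples)
    return [
        [0.0 if i == j else 1.0 - pearson(samples[i], samples[j])
         for j in range(n)]
        for i in range(n)
    ]

def average_linkage(dist):
    """Naive agglomerative clustering with average linkage.

    Returns the merges in order as (cluster_a, cluster_b, distance),
    where each cluster is a tuple of sample indices.
    """
    clusters = [(i,) for i in range(len(dist))]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                a, b = clusters[i], clusters[j]
                # Average linkage: mean of all between-cluster sample distances.
                d = sum(dist[p][q] for p in a for q in b) / (len(a) * len(b))
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        a, b = clusters[i], clusters[j]
        merges.append((a, b, d))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [a + b]
    return merges

# Toy expression profiles (rows = samples, columns = genes); values are made up.
toy = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.1, 6.0, 8.2],
    [4.0, 3.0, 2.0, 1.0],
    [8.0, 6.1, 4.0, 2.2],
]
merges = average_linkage(distance_matrix(toy))
for a, b, d in merges:
    print(a, b, round(d, 4))
```

In this toy input, samples 0 and 1 rise together and samples 2 and 3 fall together, so each pair merges at a small distance and the two pairs join last at a distance close to 2. For hundreds of samples, a library implementation would be the practical choice; the loop above is only meant to make the linkage rule explicit.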
Figure 2. The circular karyotype of all predicted significantly over-expressed (red) and under-expressed (blue) DE clusters in the matrix and the genes that are DE between predicted Normal and Variant samples (red for up-regulated genes and blue for down-regulated genes). Larger effects are observed in chromosomes 8, 11, 14 and X. For an example of the enhanced detection power of the approach, see also Supplementary Figure S1. For a detailed description of the samples included in the analysis, see Supplementary Table S1.
Figure 3. Description of the large-scale chromosome-spanning DE clusters in the Nanog-high subgroup. (A) Percentages of Variant and Normal samples in the differentiating or partially reprogrammed Nanog-low group of samples (n = 166). (B) Percentages of Variant and Normal samples in the Nanog-high pluripotent group of samples (n = 315). The downstream analysis focused on this subgroup of 315 samples. (C) Comparison of the frequencies of predicted abnormalities per chromosome in the present study and two independent cytogenetic studies of mouse ESCs (15,17). (D) Venn diagram representing the co-occurrence of large DE clusters between chromosomes 6, 8, 11 and 14. Figure constructed in Venny (46). (E) Breakdown of percentages for the aberrant chromosomes and the associated aberrant chromosome pairs. For a detailed comparison between mouse ESCs and iPSCs, see also Supplementary Figure S2.
Figure 4. Heatmap representation of the top 50 genes generated from SAM analysis. The panel of the three core pluripotency genes (Nanog, Pou5f1 (Oct4) and Sox2) at the bottom of each heatmap demonstrates that the large DE clusters are independent of the core pluripotency program in the stem cell populations. Figure constructed in GenePattern (47). (A) Heatmap of the global set, where the Variant group consists of samples with any type of large-scale DE cluster. (B) Heatmap of the chromosome 8-specific set, where the Variant-Chr8 group consists of any sample with a chromosome 8-specific DE cluster. (C) Heatmap of the chromosome 11-specific set, where the Variant-Chr11 group consists of any sample with a chromosome 11-specific DE cluster. For the SAM-derived lists of DE genes for each comparison, refer to Supplementary Tables S2–S4.
Functional categories of the top 50 over- and under-expressed genes in the Variant feature set
| Functional category | Up-regulated genes | Down-regulated genes |
|---|---|---|
| Cell cycle/growth | ||
| Survival | ||
| Protein metabolic process | ||
| Genomic integrity | ||
| Cell death | ||
| Stem cells | ||
| Cancer | ||
| ECM | ||
| Other/unknown function | ||
The top 50 up- and down-regulated genes, ranked by fold change (FC), in the Global feature set (which in total includes 128 over-expressed and 543 under-expressed genes). In bold: candidates with literature evidence that supports functional significance in ESC self-renewal.
Performance of classifiers
| Classifier | Set | Feature selection | Accuracy | F1 score |
|---|---|---|---|---|
| PAM | Global | None | 0.82 | 0.88 |
| SVM | Variant | None | 0.86 | 0.89 |
| SVM | Global | SAM All | 0.92 | 0.94 |
| SVM | Global | RFE_SVM Top 100 | 0.89 | 0.92 |
| SVM | Global | RFE_SVM Top 10 | 0.55 | 0.59 |
| SVM | Chr8 | None | 0.73 | 0.68 |
| SVM | Chr8 | SAM All | 0.80 | 0.78 |
| SVM | Chr8 | RFE_SVM Top 10 | 0.80 | 0.79 |
| SVM | Chr8 | RFE_SVM No Chr8 | 0.71 | 0.63 |
| SVM | Chr11 | None | 0.73 | 0.29 |
| SVM | Chr11 | SAM All | 0.93 | 0.79 |
| SVM | Chr11 | RFE_SVM Top 10 | 0.90 | 0.61 |
Best-performing classifiers (bold highlights the classifier trained with the top 50 features in each set). Feature selection was performed from the SAM output list by recursive feature elimination (RFE). In the RFE_SVM No Chr8 feature set, genes mapped to chromosome 8 were excluded from the up-regulated list. Sets: Global, Normal versus Variant; Chr8, Normal-Chr8 versus Variant-Chr8; Chr11, Normal-Chr11 versus Variant-Chr11.
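The two metrics reported in the table can diverge sharply when one class is rare, which may help in reading rows where accuracy is high but the F1 score is low. The sketch below computes both from confusion-matrix counts. The counts are hypothetical and chosen only for illustration; they are not taken from the study, and it is an assumption here that F1 is computed with Variant as the positive class.

```python
def accuracy(tp, fp, fn, tn):
    """Fraction of all samples that were classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)

def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall for the positive class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts: 12 positive (Variant) and 28 negative (Normal) samples.
# Most positives are missed, but the many correct negatives keep accuracy high.
tp, fp, fn, tn = 3, 2, 9, 26

acc = accuracy(tp, fp, fn, tn)
f1 = f1_score(tp, fp, fn)
print(f"accuracy={acc:.3f}  F1={f1:.3f}")
```

Because F1 ignores true negatives, a classifier that mostly predicts the majority class scores well on accuracy while its F1 stays low, so the two columns are best read together.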