Edith Le Floch, Christophe Battail, Solène Brohard-Julien, Vincent Frouin, Vincent Meyer, Smahane Chalabi, Jean-François Deleuze.
Abstract
BACKGROUND: Gene duplication is one of the main genetic mechanisms that led to the gain in complexity of biological tissue. Although the involvement of duplicated gene expression in brain evolution has been extensively studied through comparisons between organs, the role of duplicated genes in the regional specialization of the adult human central nervous system has not yet been well described.
Keywords: Brain region-specific expression; Gene co-expression network; Human central nervous system; Paralog; Small scale duplication
Year: 2021 PMID: 33882820 PMCID: PMC8059171 DOI: 10.1186/s12862-021-01794-w
Source DB: PubMed Journal: BMC Ecol Evol ISSN: 2730-7182
Fig. 1 Specific expression of protein coding genes across human CNS regions. a Density plot of original Tau scores (blue line) calculated from the expression values of 16,427 protein coding genes, and permuted Tau scores (purple line) calculated from 1000 × 16,427 permutations. The region-specificity threshold of 0.525 (red dotted line) is defined from the permuted scores using a Benjamini–Hochberg corrected p-value of 0.01. b Unsupervised hierarchical clustering of region-specific genes expressed across CNS territories. The heatmap illustrates the mean gene expression calculated over the samples of the cohort for each CNS region
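The scoring and thresholding described in Fig. 1a can be illustrated with a minimal sketch. It assumes the commonly used Tau specificity index (the caption does not spell out the formula) and takes the permuted scores as an input, because the permutation scheme itself is not described here. All function names and the empirical p-value estimator are illustrative assumptions, not the authors' implementation.

```python
from bisect import bisect_left


def tau_score(expression):
    """Tau specificity index for one gene (assumed standard definition).

    `expression` holds one non-negative value per CNS region.
    Returns 0.0 for uniform expression and 1.0 for single-region expression.
    """
    n = len(expression)
    peak = max(expression)
    if n < 2 or peak <= 0:
        return 0.0
    return sum(1.0 - x / peak for x in expression) / (n - 1)


def empirical_p_values(observed, null_scores):
    """Fraction of null (permuted) scores >= each observed score, with +1 smoothing."""
    null_sorted = sorted(null_scores)
    m = len(null_sorted)
    return [(m - bisect_left(null_sorted, t) + 1) / (m + 1) for t in observed]


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def specificity_threshold(observed, null_scores, alpha=0.01):
    """Smallest observed Tau whose adjusted p-value is <= alpha (None if none pass)."""
    adjusted = benjamini_hochberg(empirical_p_values(observed, null_scores))
    passing = [t for t, q in zip(observed, adjusted) if q <= alpha]
    return min(passing) if passing else None
```

Under these assumptions, a gene expressed at the same level in every region scores 0, and a gene expressed in a single region scores 1; genes at or above the returned threshold would be called region-specific.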
Enrichments in CNS region-specific genes for the tested and reference gene groups
| Reference group (a) | Tested group for CNS region-specificity (a) | Percentage of CNS region-specific genes in the tested group (%) | Chi-squared test P-value (b) | Odds ratio (c) |
|---|---|---|---|---|
| Protein coding genes | Paralogous genes | 19.2 | 2.045E-18* | 1.48 |
| Paralogous genes (d) | WGD genes | 15.7 | 1.061E-18* | 0.64 |
| Paralogous genes (d) | SSD genes | 22.6 | 9.022E-11* | 1.39 |
| Paralogous genes (d) | ySSD genes | 28.6 | 6.341E-18* | 1.82 |
| SSD genes | ySSD genes | 28.6 | 3.483E-09* | 1.62 |
| SSD genes | oSSD genes | 15.6 | 2.729E-13* | 0.52 |
| WGD + wSSD genes | wSSD genes | 24.0 | 5.185E-12* | 1.69 |
(a) Abbreviations for gene duplication categories: WGD (Whole-Genome Duplication), SSD (Small-Scale Duplication), ySSD (younger SSD occurring after WGD events), oSSD (older SSD occurring before WGD events) and wSSD (WGD-old SSD occurring around WGD events)
(b) Chi-squared tests (or Fisher’s exact test when the Chi-squared test could not be applied) with a corrected p-value threshold = 7.14E-03 (Bonferroni correction for 7 statistical tests)
(c) The odds ratio (> 1 or < 1) indicates the group (tested or non-tested, respectively) in which there is an enrichment
(d) The paralog reference group includes the genes belonging to the WGD, SSD and WGD-SSD categories and the paralogs without annotation
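The enrichment statistics reported in the table (Chi-squared p-value, odds ratio, Bonferroni threshold) can be sketched for a single 2 × 2 comparison as follows. The cell layout, the function names and the absence of a continuity correction are assumptions made for illustration. The family-wise level of 0.05 is inferred from the stated threshold of 7.14E-03 for 7 tests, and the counts in the usage note are invented, not taken from the study.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-squared test (no continuity correction) on a 2x2 table.

    Assumed layout:
        a = tested group, region-specific      b = tested group, not specific
        c = non-tested group, region-specific  d = non-tested group, not specific
    Returns (statistic, p_value) for 1 degree of freedom.
    """
    n = a + b + c + d
    margins = (a + b, c + d, a + c, b + d)
    if 0 in margins:
        raise ValueError("every row and column must contain at least one gene")
    stat = n * (a * d - b * c) ** 2 / math.prod(margins)
    # Survival function of a chi-squared variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


def odds_ratio(a, b, c, d):
    """Odds of region-specificity in the tested group over the non-tested group."""
    return (a * d) / (b * c)


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests
```

For example, with hypothetical counts `chi2_2x2(30, 70, 20, 80)`, the odds ratio is above 1, which under footnote (c) would point to an enrichment in the tested group; the p-value would then be compared with `bonferroni_threshold(0.05, 7)`.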
Fig. 2 Distribution of CNS region-specific genes across ranges of expression values. Barplots show a the number of expressed genes and b the percentage of region-specific genes for different expression bins. For each gene, we first calculated its expression value per CNS region by averaging over all the samples associated with that region. We then selected, as the reference value for each gene, the maximum of these regional averages. Gene expression values are given in RPKM (on a log2 scale) and each bin corresponds to 1 unit of the log2(RPKM + 1) values. The square bracket opening each bin means that the start value is included, and the round bracket means that the end value is excluded from the bin. The last bin groups all gene expression values greater than or equal to 127 RPKM
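The binning rule in this caption can be sketched as below. The sketch assumes that regional averages are taken on raw RPKM values before the log2 transform (the caption does not say which scale is averaged), and the function names and region labels are hypothetical.

```python
import math


def reference_expression(samples_by_region):
    """Maximum over CNS regions of the per-region mean expression (RPKM).

    `samples_by_region` maps a region label to a list of per-sample RPKM values.
    """
    return max(sum(values) / len(values) for values in samples_by_region.values())


def expression_bin(rpkm, last_bin=7):
    """Label of the log2(RPKM + 1) bin, one unit wide, start included, end excluded.

    Values of 127 RPKM or more (log2(128) = 7) fall into the open-ended last bin.
    """
    if rpkm < 0:
        raise ValueError("RPKM values cannot be negative")
    index = min(math.floor(math.log2(rpkm + 1)), last_bin)
    if index == last_bin:
        return f"[{index}, inf)"
    return f"[{index}, {index + 1})"
```

A gene whose highest regional average is exactly 127 RPKM lands in the last bin, while a gene at 126.9 RPKM lands in `[6, 7)`, matching the inclusive-start, exclusive-end convention of the caption.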
Fig. 3 Association between the phyletic age of the duplication and the region-specificity. Boxplots show the distribution of Tau scores for paralogs grouped according to their phyletic age obtained from Chen et al. 2013. The range of phyletic ages corresponding to WGDs is indicated by a blue horizontal bar. The red horizontal line represents the threshold of region-specificity (Tau score = 0.525)
Enrichments in genes from homogeneously expressed families for the tested and reference gene groups
| Reference group | Tested group for homogeneous family expression (a) | Percentage of homogeneous family genes in the tested group (%) | Chi-squared test P-value (b) | Odds ratio (c) |
|---|---|---|---|---|
| Paralogous genes (d) | SSD genes | 3.3 | 2.777E-04* | 1.59 |
| Paralogous genes (d) | ySSD genes | 5.2 | 5.758E-10* | 2.49 |
| Paralogous genes (d) | Region-specific families (e) | 45 | 1.691E-69* | 42.94 |
(a) Abbreviations for gene duplication categories: WGD (Whole-Genome Duplication), SSD (Small-Scale Duplication) and ySSD (younger SSD occurring after WGD events)
(b) Chi-squared tests (or Fisher’s exact test when the Chi-squared test could not be applied) with a corrected p-value threshold = 1.67E-02 (Bonferroni correction for 3 statistical tests)
(c) The odds ratio (> 1 or < 1) indicates the group (tested or non-tested, respectively) in which there is an enrichment
(d) The paralog reference group includes the genes belonging to the WGD, SSD and WGD-SSD categories and the paralogs without annotation
(e) Genes included in region-specific families. Only genes specific to the major region are considered