Zhihua Ni, Xiao-Yu Zhou, Sidra Aslam, Deng-Ke Niu.
Abstract
Copy number changes in protein-coding genes are detrimental if the consequent changes in protein concentrations disrupt essential cellular functions. The dosage sensitivity of transcription factor (TF) genes is particularly interesting because their products are essential in regulating the expression of genetic information. From four recently curated data sets of dosage-sensitive genes (genes with conserved copy numbers across mammals, ohnologs, and two data sets of haploinsufficient genes), we compiled a data set of the most reliable dosage-sensitive (MRDS) genes and a data set of the most reliable dosage-insensitive (MRDIS) genes. The MRDS genes were those present in all four data sets, while the MRDIS genes were those absent from all four data sets and with a probability of being loss-of-function intolerant (pLI) < 0.5 in both of the haploinsufficient gene data sets. Enrichment analysis of TF genes among the MRDS and MRDIS gene data sets showed that TF genes are more likely to be dosage-sensitive than other genes in the human genome. The nuclear receptor family was the most enriched TF family among the dosage-sensitive genes. TF families with very few members were also more likely to be dosage-sensitive than TF families with more members. In addition, we found a number of dosage-insensitive TFs, the most typical being the Krüppel-associated box domain-containing zinc-finger proteins (KZFPs). Gene ontology (GO) enrichment analysis showed that the MRDS TFs were enriched for many more terms than the MRDIS TFs; however, the proteins interacting with these two groups of TFs did not show such sharp differences. Furthermore, we found that the MRDIS KZFPs were not significantly enriched for any GO terms, whereas their interacting proteins were significantly enriched for thousands of GO terms. Further characterizations revealed significant differences between MRDS TFs and MRDIS TFs in the lengths and nucleotide compositions of DNA-binding sites as well as in expression level, protein size, and selective force.
Keywords: C2H2-ZF; conserved copy number; disease; dosage-sensitive; haploinsufficiency; nuclear receptor; ohnolog; transcription factor
Year: 2019 PMID: 31867040 PMCID: PMC6904359 DOI: 10.3389/fgene.2019.01208
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
Figure 1. Venn diagram displaying the variations among dosage-sensitive gene data sets obtained through different methods. The 853 dosage-sensitive genes shared by the four data sets obtained by Makino et al. (2013), Lek et al. (2016), Shihab et al. (2017), and Rice and McLysaght (2017b) were regarded as the most reliable dosage-sensitive (MRDS) genes. To obtain a data set of the most reliable dosage-insensitive (MRDIS) genes, the 9,459 genes that were absent from all four data sets were further filtered by discarding the genes with a pLI value > 0.5 in either the data set of Shihab et al. (2017) or the data set of Lek et al. (2016). In total, 5,579 MRDIS genes were obtained.
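The selection logic in the Figure 1 caption can be sketched with plain set operations. This is a minimal illustration on toy data, not the authors' pipeline: the function name `build_reference_sets`, the gene labels, and the pLI values are hypothetical, and the choice to keep a gene that is missing from a pLI table is an assumption the source does not settle.

```python
def build_reference_sets(all_genes, sensitive_sets, pli_tables, pli_cutoff=0.5):
    """Return (mrds, mrdis) from dosage-sensitive gene sets and pLI tables."""
    # MRDS: genes called dosage-sensitive in every data set.
    mrds = set.intersection(*sensitive_sets)
    # Candidates: genes called dosage-sensitive in none of the data sets.
    candidates = set(all_genes) - set.union(*sensitive_sets)
    # Discard a candidate whose pLI exceeds the cutoff in either table.
    # Assumption: a gene missing from a table is treated as passing.
    mrdis = {
        gene for gene in candidates
        if all(table.get(gene, 0.0) <= pli_cutoff for table in pli_tables)
    }
    return mrds, mrdis


# Toy input: four dosage-sensitive sets and two pLI tables (all hypothetical).
all_genes = {"A", "B", "C", "D", "E"}
sensitive_sets = [{"A", "B"}, {"A", "C"}, {"A"}, {"A", "B"}]
pli_tables = [{"D": 0.1, "E": 0.9}, {"D": 0.2, "E": 0.3}]

mrds, mrdis = build_reference_sets(all_genes, sensitive_sets, pli_tables)
```

In the toy input, only gene `A` appears in all four sets, and gene `E` is dropped from the insensitive set because one of its pLI values exceeds the cutoff.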
Figure 2. Clear differences in the Gene Ontology enrichment between most reliable dosage-sensitive (MRDS) transcription factors (TFs) and most reliable dosage-insensitive (MRDIS) TFs. Due to space limitations, only the most significant terms of the MRDS TFs are displayed. The color of each circle represents the significance of the enrichment adjusted using the Benjamini-Hochberg (BH) procedure, and the size of each circle represents the percentage of genes associated with that term.
Differences in the DNA-binding sites of the most reliable dosage-sensitive (MRDS) transcription factors (TFs) and most reliable dosage-insensitive (MRDIS) TFs.
| | MRDS TFs (mean ± SD) | MRDIS TFs (mean ± SD) | p value |
|---|---|---|---|
| Length (nucleotides) | 12.0 ± 3.84 | 12.7 ± 4.60 | 7.4 × 10⁻⁵ |
| A (%) | 28.2 ± 10.5 | 30.1 ± 10.7 | <10⁻⁶ |
| C (%) | 24.1 ± 13.3 | 23.5 ± 12.0 | 0.899 |
| G (%) | 24.3 ± 10.9 | 23.3 ± 10.1 | 0.047 |
| T (%) | 23.4 ± 10.3 | 23.1 ± 9.58 | 0.671 |
The p values were calculated using Mann-Whitney U tests.
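For readers unfamiliar with the test named in the note, here is a minimal pure-Python sketch of a two-sided Mann-Whitney U test. It assumes midranks for ties and a normal approximation without continuity correction; the source does not say which software or variant (exact or asymptotic) was used, so the function `mann_whitney_u` and the sample values are illustrative only.

```python
import math


def mann_whitney_u(x, y):
    """Return (U for sample x, two-sided p) using a normal approximation."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    # Pool both samples, remembering which group each value came from.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        # Tied values share the average of ranks i+1 .. j.
        midrank = (i + 1 + j) / 2
        for k in range(i, j):
            ranks[k] = midrank
        t = j - i
        tie_term += t**3 - t
        i = j
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    # Variance of U with the standard correction for ties.
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u_x - mean_u) / math.sqrt(var_u)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return u_x, p_two_sided


# Toy samples (hypothetical values, not taken from the table).
u_stat, p_value = mann_whitney_u([1, 2, 3], [4, 5, 6])
```

The approximation is only reasonable for moderately large samples, and the function fails if every pooled value is identical because the variance is then zero.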
Differences in the protein expression patterns of the most reliable dosage-sensitive (MRDS) transcription factors (TFs) and most reliable dosage-insensitive (MRDIS) TFs.
| | MRDS genes | MRDIS genes | p value | MRDS TFs | MRDIS TFs | p value |
|---|---|---|---|---|---|---|
| Number of genes | 684 | 2736 | | 92 | 150 | |
| Number of cell samples in which the proteins are | | | | | | |
| at high levels | 4.5 ± 6.6 | 3.1 ± 4.8 | <10⁻⁶ | 4.6 ± 6.6 | 3.8 ± 5.4 | 0.520 |
| at medium levels | 9.2 ± 6.6 | 7.6 ± 6.7 | <10⁻⁶ | 9.3 ± 7.2 | 9.8 ± 6.2 | 0.416 |
| at low levels | 5.9 ± 4.3 | 4.8 ± 4.2 | <10⁻⁶ | 5.4 ± 4.3 | 6.2 ± 4.4 | 0.116 |
| not detected | 11.3 ± 9.9 | 15.5 ± 11.1 | <10⁻⁶ | 11.8 ± 10.8 | 11.1 ± 9.5 | 0.960 |
In total, 31 cell samples were studied for each gene. The MRDS genes and the MRDIS genes are defined in the notes to the MRDS and MRDIS TF gene tables as well as in the main text. The p values were calculated using Mann-Whitney U tests.
Transcription factor (TF) genes in the most reliable dosage-sensitive (MRDS) gene data set.
| TF family | Total number in human genome | Observed number of MRDS TFs | Expected number in this data set | BH-adjusted p value |
|---|---|---|---|---|
| Nuclear receptor | 46 | 20 | 1.72 | 1.5 × 10⁻¹⁴ |
| Grainyhead | 6 | 4 | 0.22 | 6.0 × 10⁻⁴ |
| C2H2-ZF; Homeodomain | 4 | 3 | 0.15 | 0.002 |
| T-box | 17 | 5 | 0.64 | 0.003 |
| AP-2 | 5 | 3 | 0.19 | 0.004 |
| bHLH | 108 | 12 | 4.04 | 0.005 |
| RFX | 8 | 3 | 0.30 | 0.016 |
| Rel | 10 | 3 | 0.37 | 0.030 |
| Paired box | 4 | 2 | 0.15 | 0.040 |
| All TFs | 1,639 | 122 | 61.29 | 2.6 × 10⁻⁴ |
| Small-family TFs | | | | |
| ≤5 members | 87 | 15 | 3.25 | 0.032 |
| ≤7 members | 126 | 23 | 4.71 | 0.004 |
| ≤9 members | 186 | 31 | 6.96 | 0.001 |
| ≤11 members | 238 | 37 | 8.90 | 5.9 × 10⁻⁴ |
The chi-square test (expected value > 5) and Fisher’s exact test (expected value ≤ 5) were used to test the overrepresentation or underrepresentation of the TF genes in this data set. The Benjamini-Hochberg (BH) procedure was used to compute the false discovery rate-adjusted p values. The MRDS genes were defined as the common dosage-sensitive genes among the four data sets obtained by Makino et al. (2013), Lek et al. (2016), Shihab et al. (2017), and Rice and McLysaght (2017b).
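The bookkeeping described in this note can be sketched as follows: an expected count under random sampling, the test-selection rule based on that expected count, and the Benjamini-Hochberg adjustment. This is an illustrative sketch, not the authors' code. The helper names are hypothetical, the background gene count is not stated in this section and must be supplied by the reader, and the formula for the expected count (family size times the data set's share of the background) is an assumption about how the table's values were derived.

```python
def expected_count(family_size, dataset_size, background_size):
    """Expected family members in the data set under random sampling (assumed formula)."""
    return family_size * dataset_size / background_size


def choose_test(expected):
    """Test-selection rule from the table note."""
    return "chi-square" if expected > 5 else "fisher-exact"


def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p values, returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy numbers (hypothetical): a 50-member family, a 1,000-gene data set,
# and a 20,000-gene background.
toy_expected = expected_count(50, 1000, 20000)
toy_test = choose_test(toy_expected)
toy_adjusted = bh_adjust([0.01, 0.04, 0.03, 0.005])
```

The adjustment multiplies each sorted p value by the number of tests divided by its rank, then takes a running minimum from the largest p value downward so that the adjusted values never decrease with rank.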
Transcription factor (TF) genes in the most reliable dosage-insensitive (MRDIS) gene data set.
| TF family | Total number in human genome | Observed number of MRDIS TFs | Expected number in this data set | BH-adjusted p value |
|---|---|---|---|---|
| C2H2-ZF | 747 | 281 | 183 | 4 × 10⁻⁴ |
| Homeodomain | 196 | 15 | 48 | 0.001 |
| Nuclear receptor | 46 | 0 | 11 | 0.009 |
| CENPB | 11 | 8 | 2.7 | 0.010 |
| All TFs | 1,639 | 368 | 401 | 0.569 |
| Small-family TFs | | | | |
| ≤5 members | 87 | 6 | 21 | 0.030 |
| ≤7 members | 126 | 8 | 31 | 0.004 |
| ≤9 members | 186 | 15 | 45 | 0.002 |
| ≤11 members | 238 | 27 | 58 | 0.009 |
The chi-square test (expected value > 5) and Fisher’s exact test (expected value ≤ 5) were used to test the overrepresentation or underrepresentation of the TF genes in this data set. The Benjamini-Hochberg (BH) procedure was used to compute the false discovery rate-adjusted p values. The MRDIS genes are the genes with pLI values < 0.5 in both the data set of Shihab et al. (2017) and the data set of Lek et al. (2016) that were not considered dosage-sensitive in any of the four data sets obtained by Makino et al. (2013), Lek et al. (2016), Shihab et al. (2017), and Rice and McLysaght (2017b).
Comparison of coding sequence lengths.
| | Number of genes | Mean ± SD (bp) | p value |
|---|---|---|---|
| MRDS genes | 853 | 2640 ± 1783 | <10⁻⁶ |
| MRDIS genes | 5569 | 1310 ± 1956 | |
| MRDS TFs | 122 | 2189 ± 1444 | 10⁻⁶ |
| MRDIS TFs | 368 | 1577 ± 698 | |
| TFs | 1608 | 1777 ± 1205 | <10⁻⁶ |
| Other genes | 19634 | 1710 ± 1826 | |
The most reliable dosage-sensitive (MRDS) genes and the most reliable dosage-insensitive (MRDIS) genes are defined in the notes to the MRDS and MRDIS TF gene tables as well as in the main text. The p values were calculated using Mann-Whitney U tests.
Comparison of selective pressures.
| | Number of genes | Mean ± SD | p value |
|---|---|---|---|
| MRDS genes | 853 | 0.062 ± 0.057 | <10⁻⁶ |
| MRDIS genes | 4188 | 0.257 ± 1.536 | |
| MRDS TFs | 122 | 0.066 ± 0.054 | <10⁻⁶ |
| MRDIS TFs | 226 | 0.256 ± 0.217 | |
| TFs | 1374 | 0.134 ± 0.141 | <10⁻⁶ |
| Other genes | 16804 | 0.161 ± 0.776 | |
The most reliable dosage-sensitive (MRDS) genes and the most reliable dosage-insensitive (MRDIS) genes are defined in the notes to the MRDS and MRDIS TF gene tables as well as in the main text. The p values were calculated using Mann-Whitney U tests.