Tizian Schulz, Jens Stoye, Daniel Doerr.
Abstract
BACKGROUND: Hi-C sequencing offers novel, cost-effective means to study the spatial conformation of chromosomes. We use data obtained from Hi-C experiments to provide new evidence for the existence of spatial gene clusters. These are sets of genes with associated functionality that exhibit close proximity to each other in the spatial conformation of chromosomes across several related species.
Keywords: Gene teams; Graph teams; Hi-C data; Single-linkage clustering; Spatial gene cluster
Year: 2018 PMID: 29745835 PMCID: PMC5998887 DOI: 10.1186/s12864-018-4622-0
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1. Illustrations of (a) sequential and (b) spatial gene clusters. Genes with the same color belong to the same gene family.
Fig. 2. Examples of δ-teams and δ-clusters in graphs (a) without families and (b) with families. δ-Teams and δ-clusters are highlighted by areas of shared color. Edge labels indicate weights. Vertices in (b) are represented by their family identifier.
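As a rough illustration of the thresholded grouping that Fig. 2 and the keyword "single-linkage clustering" suggest, the following minimal sketch groups the vertices of a weighted graph so that two vertices share a group whenever they are linked by a chain of edges, each of weight at most δ. This is an assumption made for illustration only: it is not the paper's formal definition of δ-teams or δ-clusters, it ignores gene families, and the function name `threshold_clusters` and the toy graph are hypothetical.

```python
from collections import defaultdict


def threshold_clusters(vertices, weighted_edges, delta):
    """Group vertices connected by chains of edges with weight <= delta.

    Illustrative single-linkage grouping at one threshold (an assumption,
    not the paper's algorithm). `weighted_edges` holds (u, v, weight) triples.
    """
    # Union-find forest: every vertex starts as its own root.
    parent = {v: v for v in vertices}

    def find(v):
        # Follow parent links to the root, halving the path on the way.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    # Merge the endpoints of every edge that passes the threshold.
    for u, v, w in weighted_edges:
        if w <= delta:
            root_u, root_v = find(u), find(v)
            if root_u != root_v:
                parent[root_u] = root_v

    # Collect the members of each component.
    groups = defaultdict(set)
    for v in vertices:
        groups[find(v)].add(v)

    # Return sorted lists so the output order is deterministic.
    return sorted(sorted(g) for g in groups.values())


# Hypothetical toy graph: a path a-b-c-d with edge weights 1, 3, 1.
toy_vertices = ["a", "b", "c", "d"]
toy_edges = [("a", "b", 1), ("b", "c", 3), ("c", "d", 1)]
print(threshold_clusters(toy_vertices, toy_edges, delta=2))
print(threshold_clusters(toy_vertices, toy_edges, delta=3))
```

With δ=2 the edge of weight 3 is ignored and the path splits into two groups. Raising δ to 3 merges all four vertices, which mirrors how larger thresholds yield fewer, larger clusters.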
Fig. 3. Results of Algorithm 3 on intrachromosomal Hi-C datasets of human and mouse for different values of δ. For each threshold value δ, the plots show the number of discovered 3D and 1D gene clusters (upper left) and their average sizes (upper right) in the spatial and sequential graphs, respectively, the average number of genes gained in the 3D gene clusters versus the 1D gene clusters (lower left), and the computation time for the 3D gene clusters (lower right).
Top 20 3D gene clusters with smallest p-value using intrachromosomal Hi-C data
| Name | Genes | Penalty | p-value |
|---|---|---|---|
| HOXC * | HOTAIR_2, HOTAIR_3, HOXC10, HOXC11, HOXC12, HOXC13, HOXC4, HOXC5, HOXC6, HOXC8, HOXC9 | 0.006 | 1·10⁻⁷ |
| OR | OR5AP2, OR5AR1, OR5M1, OR5M10, OR5M11, OR5M3, OR5M8, OR5M9, OR5R1, OR8K1, OR8U1, OR9G1, OR9G | 0 | 1·10⁻⁷ |
| IGHV * | IGHV3-11, IGHV3-13, IGHV3-20, IGHV3-21, IGHV3-23, IGHV3-30, IGHV3-33, IGHV3-35, IGHV3-64D, IGHV3-7 | 0 | 1·10⁻⁷ |
| KRTAP * | KRTAP13-1, KRTAP13-2, KRTAP13-3, KRTAP13-4, KRTAP15-1, KRTAP24-1, KRTAP26-1, KRTAP27-1 | 0 | 1·10⁻⁷ |
| TAS2R | TAS2R14, TAS2R19, TAS2R20, TAS2R31, TAS2R46, TAS2R50 | 0 | 3.70·10⁻⁶ |
| OR | OR2A12, OR2A14, OR2A25, OR2A5 | 0 | 9.09·10⁻⁵ |
| ZSCAN4 | NKAPL, ZKSCAN3, ZKSCAN4, ZSCAN26 | 0.006 | 0.00015 |
| TRAV | TRAV12-1, TRAV12-2, TRAV12-3, TRAV13-1, TRAV13-2, TRAV17, TRAV18, TRAV19, TRAV22, TRAV23DV6, TRAV5, TRAV8-1, TRAV8-3, TRAV9-2 | 0 | 0.00037 |
| OR | OR5AC1, OR5H1, OR5H14 | 0 | 0.00037 |
| IGHV + | IGHV1-18, IGHV1-24, IGHV1-3 | 0 | 0.00037 |
| BTN3 + | BTN3A1, BTN3A2, BTN3A3 | 0 | 0.00037 |
| | GTF2A1L, STON1, STON1-GTF2A1L | 0 | 0.00037 |
| CYP3A | CYP3A4, CYP3A43, CYP3A5, CYP3A7, CYP3A7-CYP3A51P | 0.028 | 0.00037 |
| | ADGRE1, C3, CD70, GPR108, TNFSF14, TRIP10, VAV1 | 0.057 | 0.00047 |
| ZNF | CCDC106, FIZ1, U2AF2, ZNF524, ZNF580, ZNF784, ZNF865 | 0.097 | 0.00110 |
| OR | OR8B12, OR8B4, OR8B8 | 0.012 | 0.00376 |
| KIR | KIR2DL1, KIR2DL3, KIR2DL4, KIR2DS4, KIR3DL1, KIR3DL2, KIR3DL3 | 0.179 | 0.00243 |
| MMP | MMP12, MMP13, MMP3 | 0.035 | 0.00486 |
| TSPY + | TSPYL1, TSPYL4 | 0 | 0.00504 |
| SIGLEC + | SIGLEC12, SIGLEC8 | 0 | 0.00504 |
Clusters that can be found as split sub-clusters in the 1D results are marked by an asterisk. Those completely absent in the 1D results are marked by a plus
Fig. 4. Results of Algorithm 3 on interchromosomal Hi-C datasets of human and mouse for different values of δ. For each threshold value δ, the plots show the number of discovered 3D and 1D gene clusters in the spatial and sequential graphs, respectively (left), and the computation time for the 3D gene clusters (right).
Interchromosomal gene cluster candidates identified by Algorithm 3 with δ=700 and δ=750
| Name | Genes | p-value |
|---|---|---|
| USP17L | | 1·10⁻⁷ |
| OR4F | | 1.40·10⁻⁵ |
| GGT | | 9.09·10⁻⁵ |
| | | 9.09·10⁻⁵ |
| OR4F | | 0.0050417 |
| | | – |
| | | – |
| | | – |
| | | – |
| | | – |
| | | – |
| | | – |
| | | – |
The upper part shows all clusters that received a significant p-value in the GO analysis. The lower part lists clusters that contain no GO-annotated genes but were identified by manual inspection as corresponding to associated gene families.