Chao-Hsin Chen¹, Chao-Yu Pan¹,², Wen-Chang Lin³,⁴.
Abstract
The completion of human genome sequences and the advancement of next-generation sequencing technologies have engendered a clear understanding of all human genes. Overlapping genes are usually observed in compact genomes, such as those of bacteria and viruses. Notably, overlapping protein-coding genes do exist in human genome sequences. Accordingly, we used the current Ensembl gene annotations to identify overlapping human protein-coding genes. We analysed 19,200 well-annotated protein-coding genes and determined that 4,951 protein-coding genes overlapped with their adjacent genes. Approximately a quarter of all human protein-coding genes were overlapping genes. We observed different clusters of overlapping protein-coding genes, ranging from two genes (paired overlapping genes) to 22 genes. We also divided the paired overlapping protein-coding gene groups into four subtypes. We found that the divergent overlapping gene subtype had a stronger expression association than did the subtypes of 5'-tandem overlapping and 3'-tandem overlapping genes. The majority of paired overlapping genes exhibited comparable coincidental tissue expression profiles; however, a few overlapping gene pairs displayed distinctive tissue expression association patterns. In summary, we have carefully examined the genomic features and distributions about human overlapping protein-coding genes and found coincidental expression in tissues for most overlapping protein-coding genes.Entities:
Mesh:
Year: 2019 PMID: 31527706 PMCID: PMC6746723 DOI: 10.1038/s41598-019-49802-w
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Numbers of overlapping genes by chromosomal position. Five types of overlapping gene groups were noted: paired, triple, quadruple, quintuple, and sextuple or larger. The chromosomal distribution of each of the five groups is displayed.
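The grouping in Figure 1 (and the clusters of 2 to 22 genes mentioned in the abstract) can be sketched as a sweep over genes sorted by position on each chromosome. This is a minimal Python sketch, assuming 1-based inclusive coordinates (the convention Ensembl uses) and assuming that a cluster is a chain of transitively overlapping genes; the source does not state its exact clustering rule, and the names `Gene`, `overlap_clusters` and `group_label` are hypothetical.

```python
from collections import Counter
from typing import NamedTuple, Optional


class Gene(NamedTuple):
    name: str
    chrom: str
    start: int  # 1-based, inclusive (assumed)
    end: int    # 1-based, inclusive (assumed)


def overlap_clusters(genes: list[Gene]) -> list[list[Gene]]:
    """Group genes into chains of overlapping intervals (hypothetical rule).

    Genes that overlap no other gene are left out of the result.
    """
    clusters: list[list[Gene]] = []
    current: list[Gene] = []
    current_chrom: Optional[str] = None
    current_end = 0
    for g in sorted(genes, key=lambda x: (x.chrom, x.start, x.end)):
        if current and g.chrom == current_chrom and g.start <= current_end:
            # Overlaps the running cluster: extend it.
            current.append(g)
            current_end = max(current_end, g.end)
        else:
            # Close the previous cluster, keeping it only if it has 2+ genes.
            if len(current) >= 2:
                clusters.append(current)
            current, current_chrom, current_end = [g], g.chrom, g.end
    if len(current) >= 2:
        clusters.append(current)
    return clusters


def group_label(size: int) -> str:
    """Map a cluster size onto the five groups used in Figure 1."""
    labels = {2: "paired", 3: "triple", 4: "quadruple", 5: "quintuple"}
    return labels.get(size, "sextuple or larger")


# Toy coordinates for illustration only.
demo = [
    Gene("A", "1", 100, 500), Gene("B", "1", 400, 900),
    Gene("C", "1", 2000, 2500), Gene("D", "1", 2400, 3000),
    Gene("E", "1", 2900, 3100), Gene("F", "1", 5000, 6000),
]
print(Counter(group_label(len(c)) for c in overlap_clusters(demo)))
```

In the toy input, A and B form one paired cluster, C, D and E chain into one triple cluster (C and E do not overlap directly), and F overlaps nothing and is dropped. Tallying `group_label` over all clusters per chromosome gives counts of the kind plotted in Figure 1.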
Basic statistics of paired overlapping genes (lengths and distances in base pairs).
| Feature | Min. | Max. | Mean ± SD |
|---|---|---|---|
| Gene length | 176 | 1987245 | 68002 ± 129004.7 |
| Gene_F | 392 | 1825171 | 84594 ± 151411.4 |
| Gene_L | 176 | 1987245 | 51411 ± 99176.9 |
| Overlapping interval | 0 | 284372 | 9343 ± 19234.9 |
| Block length | 579 | 2071405 | 109128 ± 159514 |
| Distance_F | 1 | 4088861 | 69091 ± 227137.9 |
| Distance_L | 3 | 22512734 | 94364 ± 591428.2 |
| Average overlapping interval (per chromosome) | 4497 | 37722 | 9344 ± 2877.2 |
Note:
Gene_F: Frontal gene of paired overlapping genes.
Gene_L: Lateral gene of paired overlapping genes.
Overlapping interval: Length of the region shared by Gene_F and Gene_L.
Block length: Length from the start position of the frontal gene to the end position of the lateral gene.
Distance_F: Distance between the adjacent upstream gene and Gene_F.
Distance_L: Distance between Gene_L and the adjacent downstream gene.
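The table notes above define each feature informally. Below is a minimal sketch of how they could be computed for one pair, assuming 1-based inclusive coordinates, that Gene_F starts at or before Gene_L on the same chromosome, and that distances are measured as "next start minus previous end". These conventions are assumptions, so exact values (for example, the minimum overlapping interval of 0 in the table) may differ under the source's own arithmetic; `pair_features` is a hypothetical helper.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    name: str
    start: int  # 1-based, inclusive (assumed)
    end: int    # 1-based, inclusive (assumed)


def length(g: Gene) -> int:
    return g.end - g.start + 1


def pair_features(upstream: Gene, gene_f: Gene, gene_l: Gene,
                  downstream: Gene) -> dict[str, int]:
    """Table features for one overlapping pair (assumes gene_f.start <= gene_l.start)."""
    return {
        "Gene_F": length(gene_f),
        "Gene_L": length(gene_l),
        # Bases shared by the two genes.
        "Overlapping interval": max(0, min(gene_f.end, gene_l.end) - gene_l.start + 1),
        # From the start of Gene_F to the furthest end of the pair.
        "Block length": max(gene_f.end, gene_l.end) - gene_f.start + 1,
        # Gaps to the flanking genes, measured as next start minus previous end.
        "Distance_F": gene_f.start - upstream.end,
        "Distance_L": downstream.start - gene_l.end,
    }


# Toy coordinates for illustration only.
features = pair_features(Gene("U", 1, 50), Gene("F", 101, 600),
                         Gene("L", 501, 1000), Gene("D", 1200, 1500))
print(features)
```

The block is taken to end at the larger of the two end coordinates, which is the lateral gene's end unless the lateral gene is nested entirely inside the frontal one.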
Chromosome distribution of paired overlapping genes (counts are numbers of genes).
| Chromosome | 5ʹ-tandem overlapping | Convergent overlapping | Divergent overlapping | 3ʹ-tandem overlapping | Subtotal | Total genes on chromosome | Overlapping genes (%) |
|---|---|---|---|---|---|---|---|
| 1 | 26 | 176 | 110 | 28 | 340 | 1995 | 17.04% |
| 2 | 26 | 104 | 102 | 12 | 244 | 1209 | 20.18% |
| 3 | 18 | 102 | 92 | 22 | 234 | 1039 | 22.52% |
| 4 | 10 | 54 | 36 | 16 | 116 | 742 | 15.63% |
| 5 | 14 | 70 | 70 | 10 | 164 | 852 | 19.25% |
| 6 | 10 | 102 | 66 | 14 | 192 | 1007 | 19.07% |
| 7 | 16 | 70 | 50 | 28 | 164 | 874 | 18.76% |
| 8 | 4 | 50 | 52 | 6 | 112 | 656 | 17.07% |
| 9 | 6 | 64 | 50 | 10 | 130 | 753 | 17.26% |
| 10 | 6 | 62 | 42 | 6 | 116 | 712 | 16.29% |
| 11 | 12 | 100 | 80 | 22 | 214 | 1267 | 16.89% |
| 12 | 20 | 120 | 78 | 20 | 238 | 999 | 23.82% |
| 13 | 8 | 24 | 12 | 0 | 44 | 313 | 14.06% |
| 14 | 10 | 60 | 48 | 8 | 126 | 590 | 21.36% |
| 15 | 8 | 68 | 34 | 6 | 116 | 572 | 20.28% |
| 16 | 16 | 104 | 62 | 2 | 184 | 812 | 22.66% |
| 17 | 8 | 94 | 94 | 28 | 224 | 1138 | 19.68% |
| 18 | 6 | 18 | 10 | 0 | 34 | 263 | 12.93% |
| 19 | 24 | 98 | 108 | 28 | 258 | 1383 | 18.66% |
| 20 | 4 | 48 | 24 | 8 | 84 | 523 | 16.06% |
| 21 | 4 | 14 | 8 | 2 | 28 | 219 | 12.79% |
| 22 | 8 | 32 | 28 | 10 | 78 | 425 | 18.35% |
| X | 14 | 38 | 52 | 12 | 116 | 832 | 13.94% |
| Y | 2 | 0 | 0 | 0 | 2 | 45 | 4.44% |
| Total | 280 | 1672 | 1308 | 298 | 3558 | 19220 | 18.51% |
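The four subtypes in the table are defined by how the two genes of a pair are oriented on their strands. The sketch below covers only the part that follows from standard usage: opposite-strand pairs are convergent (3ʹ ends overlap, tail-to-tail) or divergent (5ʹ ends overlap, head-to-head). The rule separating 5ʹ-tandem from 3ʹ-tandem same-strand pairs is not given in this excerpt, so those pairs get a single "tandem" label, and nested pairs (one gene entirely inside the other) are not handled specially. `orientation` and `overlap_percent` are hypothetical names; `overlap_percent` reproduces the last column of the table from its Subtotal and Total columns.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    name: str
    start: int
    end: int
    strand: str  # "+" or "-"


def orientation(gene_f: Gene, gene_l: Gene) -> str:
    """Strand arrangement of an overlapping pair.

    gene_f is the member with the smaller start coordinate. Same-strand
    pairs are labelled "tandem" because the 5'-/3'-tandem split is not
    defined here; nested pairs are not treated specially.
    """
    if gene_f.strand == gene_l.strand:
        return "tandem"
    # Left gene on "+": its 3' end (right side) meets the right gene's
    # 3' end (left side, since that gene is on "-"): tail-to-tail.
    if gene_f.strand == "+":
        return "convergent"
    # Left gene on "-": the two 5' ends meet: head-to-head.
    return "divergent"


def overlap_percent(subtotal: int, total_genes: int) -> float:
    """Share of a chromosome's genes that are paired overlapping genes."""
    return round(100 * subtotal / total_genes, 2)


print(orientation(Gene("X", 100, 900, "+"), Gene("Y", 800, 1500, "-")))
print(overlap_percent(340, 1995))
```

For chromosome 1, `overlap_percent(340, 1995)` gives 17.04, matching the table, and the Total row gives 18.51 from 3558 of 19220 genes.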
Figure 2. RPKM distribution of the control group and the four subtypes of paired overlapping genes. These scatter plots show the RPKM expression levels of a randomly selected non-overlapping gene group (control) and of the four subtypes of paired overlapping protein-coding genes: 5ʹ-tandem overlap, convergent overlap, divergent overlap, and 3ʹ-tandem overlap.
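RPKM (reads per kilobase of transcript per million mapped reads) normalises a gene's read count by both its length and the sequencing depth, so that expression levels of genes of different lengths and from different libraries can be compared. The source does not say how its RPKM values were produced, so this is only the standard definition written as a small Python function; the parameter names are hypothetical.

```python
def rpkm(gene_reads: int, transcript_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of transcript per million mapped reads.

    RPKM = reads / (length_kb * mapped_reads_millions)
         = reads * 1e9 / (length_bp * mapped_reads)
    """
    length_kb = transcript_length_bp / 1_000
    mapped_millions = total_mapped_reads / 1_000_000
    return gene_reads / (length_kb * mapped_millions)


print(rpkm(500, 2_000, 10_000_000))
```

For example, 500 reads on a 2,000 bp transcript in a library of 10 million mapped reads gives an RPKM of 25.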
Comparison of expression levels among the four overlapping gene subtypes.
| Comparison of expression values | p-value | FDR |
|---|---|---|
| 5ʹ-tandem – Convergent | <0.001 | <0.001 |
| 5ʹ-tandem – Divergent | <0.001 | <0.001 |
| 5ʹ-tandem – 3ʹ-tandem | 0.502 | 0.502 |
| Convergent – Divergent | <0.001 | <0.001 |
| Convergent – 3ʹ-tandem | 0.003 | 0.004 |
| Divergent – 3ʹ-tandem | <0.001 | <0.001 |
p-values were calculated with the Kruskal–Wallis test.
FDR: false discovery rate.
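The p-values in the table come from Kruskal–Wallis tests applied to pairs of subtypes, and the FDR column is consistent with, although the source does not name it, the Benjamini–Hochberg adjustment over the six comparisons (the only non-trivial entry, 0.003, becomes 0.003 × 6 / 5 ≈ 0.004). Below is a dependency-free sketch of both steps, assuming tie-averaged ranks and the usual tie correction for H. With two groups the statistic has one degree of freedom, so its chi-square tail probability reduces to `math.erfc(sqrt(H / 2))`. All function names are hypothetical; in practice `scipy.stats.kruskal` computes the same tie-corrected statistic.

```python
import math
from itertools import combinations


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_two_groups(a, b):
    """Tie-corrected Kruskal-Wallis H and its p-value for two groups (df = 1).

    Raises ZeroDivisionError if every pooled value is identical.
    """
    pooled = list(a) + list(b)
    n = len(pooled)
    ranks = average_ranks(pooled)
    r_a, r_b = sum(ranks[:len(a)]), sum(ranks[len(a):])
    h = 12 / (n * (n + 1)) * (r_a ** 2 / len(a) + r_b ** 2 / len(b)) - 3 * (n + 1)
    # Tie correction: 1 - sum(t^3 - t) / (n^3 - n) over runs of tied values.
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    h /= 1 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    h = max(h, 0.0)  # guard against tiny negative rounding error
    # Chi-square tail with 1 df: P(X > h) = erfc(sqrt(h / 2)).
    return h, math.erfc(math.sqrt(h / 2))


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def pairwise_subtype_tests(groups):
    """All pairwise two-group tests, mapped to (p-value, BH-adjusted FDR)."""
    pairs = list(combinations(groups, 2))
    pvals = [kruskal_two_groups(groups[x], groups[y])[1] for x, y in pairs]
    return dict(zip(pairs, zip(pvals, benjamini_hochberg(pvals))))
```

`pairwise_subtype_tests` takes a mapping from subtype name to a list of expression values and returns one (p-value, FDR) tuple per pair of subtypes, mirroring the six rows of the table for four subtypes.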
Figure 3. Correlation coefficients of the control group and the four subtypes of paired overlapping genes. Boxplots show the correlation coefficients of gene expression associations for randomly selected non-overlapping genes (the control group) and for the four subtypes of paired overlapping genes: 5ʹ-tandem overlap, convergent overlap, divergent overlap, and 3ʹ-tandem overlap. Fisher's z test was used to evaluate the significance of differences between two correlation coefficients in the subtypes of paired overlapping protein-coding genes. Compared with the control group, the convergent and divergent overlapping protein-coding gene groups showed significant differences; among the four subtypes, the convergent and divergent groups also showed significant variations.
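Fisher's z test compares two correlation coefficients by transforming each with z = atanh(r), whose sampling distribution is approximately normal with variance 1/(n − 3). The source does not state how correlations were pooled within each group or which sample sizes entered the test, so the sketch below shows only the standard form for two independent correlations; `fisher_z_test` is a hypothetical name.

```python
import math


def fisher_z_test(r1: float, n1: int, r2: float, n2: int) -> tuple[float, float]:
    """Two-sided test of H0: rho1 == rho2 for two independent samples.

    Requires |r| < 1 (atanh is infinite at +/-1) and n > 3 in each sample.
    """
    z1, z2 = math.atanh(r1), math.atanh(r2)  # Fisher transform
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    z = (z1 - z2) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided standard-normal tail
    return z, p


print(fisher_z_test(0.6, 103, 0.3, 103))
```

With r = 0.6 and r = 0.3 from samples of 103, the difference gives z ≈ 2.71 and a two-sided p-value below 0.01.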
Figure 4. Distribution of R² values for the control group and the four subtypes of paired overlapping genes. Boxplots show the R² values of gene expression associations for randomly selected non-overlapping genes (the control group) and for the four subtypes of paired overlapping genes: 5ʹ-tandem overlap, convergent overlap, divergent overlap, and 3ʹ-tandem overlap. Fisher's z test was used to evaluate the significance of differences between R² values in the subtypes of paired overlapping protein-coding genes. Among the four subtypes, the convergent and divergent overlapping protein-coding gene groups showed significant variations.
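The source does not say whether its correlation coefficients are Pearson or rank-based, nor how R² was obtained. Assuming Pearson correlation of the two genes' expression across tissues and an ordinary least-squares line fitted between them, R² is simply r², which the sketch below checks by computing it both ways. The function names are hypothetical.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def r_squared(x: list[float], y: list[float]) -> float:
    """For a simple OLS line y = a + b*x, R^2 equals r^2."""
    return pearson_r(x, y) ** 2


def r_squared_ols(x: list[float], y: list[float]) -> float:
    """R^2 = 1 - SS_res / SS_tot from an explicit least-squares fit."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    intercept = my - slope * mx
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return 1 - ss_res / ss_tot


print(r_squared([1, 2, 3], [1, 3, 2]), r_squared_ols([1, 2, 3], [1, 3, 2]))
```

Both functions return 0.25 for the toy vectors, which is why an R² distribution can carry essentially the same information as the squared correlation coefficients in a single-predictor setting.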
Figure 5. Expression patterns of selected paired overlapping genes in five tissues. These gene pairs were selected because of their high calculated coefficient of variation values. The TUBA1A/TUBA1C and GALNT6/SLC4A8 pairs belong to the divergent overlap subtype, and the CISD3/PCGF2 and ENAM/JCHAIN pairs belong to the convergent overlap subtype. Panels: (a) TUBA1A and TUBA1C; (b) GALNT6 and SLC4A8; (c) CISD3 and PCGF2; (d) ENAM and JCHAIN.
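The coefficient of variation (CV) is the standard deviation divided by the mean, a scale-free measure of how much a gene's expression varies across tissues. The caption does not say exactly which CV was used to pick the four pairs, so the ranking rule below (score a pair by the larger CV of its two genes) is a hypothetical illustration, as are the function names; `statistics.stdev` gives the sample standard deviation.

```python
import statistics


def coefficient_of_variation(values: list[float]) -> float:
    """Sample standard deviation divided by the mean (mean must be non-zero)."""
    return statistics.stdev(values) / statistics.mean(values)


def rank_pairs_by_cv(pairs: list[tuple[str, str]],
                     expression: dict[str, list[float]],
                     top: int = 4) -> list[tuple[str, str]]:
    """Hypothetical rule: score each pair by the larger CV of its two genes."""
    scored = [
        (max(coefficient_of_variation(expression[a]),
             coefficient_of_variation(expression[b])), (a, b))
        for a, b in pairs
    ]
    scored.sort(reverse=True)  # highest score first
    return [pair for _, pair in scored[:top]]


# Toy per-tissue values for illustration only.
expression = {"A": [1, 2, 3], "B": [10, 10, 10], "C": [1, 1, 4], "D": [2, 2, 2]}
print(rank_pairs_by_cv([("A", "B"), ("C", "D")], expression))
```

Gene C varies more across the toy tissues than gene A (CV ≈ 0.87 versus 0.5), so the pair C/D is ranked first.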