Yizhen Jia, Thomas K F Wong, You-Qiang Song, Siu-Ming Yiu, David K Smith.
Abstract
BACKGROUND: Orthologues are genes in different species that are related through divergent evolution from a common ancestor and are expected to have similar functions. Many databases have been created to describe orthologous genes based on existing sequence data. However, alternative splicing (in eukaryotes) is usually disregarded when orthologue groups are determined, and the functional consequences of alternative splicing have not been considered. Most multi-exon genes can encode multiple protein isoforms, which often have different functions and can be disease-related. Extending the definition of orthologue groups to take account of alternative splicing and the functional differences it causes requires further examination.
Year: 2010 PMID: 21143794 PMCID: PMC3005912 DOI: 10.1186/1471-2164-11-S4-S11
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. Illustration of the alternative transcript similarity scoring scheme. For an orthologue group with two alternative transcripts in each species, a global multiple sequence alignment aligns homologous exons. When the similarity score is calculated from an alignment of isoforms that differ in their number of exons (e.g. P1-P2'), the score is below 1 (0.67 in this case, assuming the exons are of equal length) because the extra exon is aligned to a gap. The pairs P1-P1' and P2-P2' both score 1.
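The caption gives the behaviour of the score but not its exact formula. A minimal sketch, assuming the score is the number of alignment columns in which both isoforms have a residue divided by the number of columns in which at least one does (columns that are gaps in both rows are skipped). Under that assumption a toy alignment of three equal-length exons reproduces the 0.67 quoted for P1-P2'. The function name `pair_similarity` and the toy exon strings are hypothetical, not taken from the paper.

```python
def pair_similarity(row_a: str, row_b: str, gap: str = "-") -> float:
    """Hypothetical isoform similarity from two rows of a multiple alignment.

    Assumed definition: columns where both rows carry a residue, divided by
    columns where at least one row does. Columns that are gaps in both rows
    (possible when other sequences are in the alignment) are ignored.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    shared = covered = 0
    for a, b in zip(row_a, row_b):
        a_res, b_res = a != gap, b != gap
        if a_res or b_res:
            covered += 1
            if a_res and b_res:
                shared += 1
    return shared / covered if covered else 0.0


# Toy alignment: three exons of 4 residues each; P2' lacks the middle exon,
# which is therefore matched by a gap.
p1 = "AAAA" + "BBBB" + "CCCC"
p1_prime = "AAAA" + "BBBB" + "CCCC"
p2_prime = "AAAA" + "----" + "CCCC"

print(round(pair_similarity(p1, p2_prime), 2))  # 0.67
print(pair_similarity(p1, p1_prime))            # 1.0
```

Because unequal exon lengths change the column counts, the score falls below 0.67 when the missing exon is longer than the shared ones and above it when it is shorter.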
The distribution of numbers of proteins in an orthologue group
| Protein Group Size | Number |
|---|---|
| 2 | 6,085 |
| 3 | 2,703 |
| 4 | 1,413 |
| 5 | 748 |
| 6 | 347 |
| 7 | 230 |
| ≥ 8 | 328 |
| Total | 11,854 |
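The table is a histogram of orthologue-group sizes. A minimal sketch of how such a tally could be produced from a list of group sizes, assuming the open-ended last row collects every group of 8 or more proteins (the published rows add up to the stated total only if size-8 groups fall in that bin). The function name `size_distribution` and the six-group input are hypothetical.

```python
from collections import Counter


def size_distribution(group_sizes: list[int], top: int = 8) -> dict[str, int]:
    """Tally orthologue groups by protein count; every size >= top shares one bin."""
    if any(size < 2 for size in group_sizes):
        raise ValueError("an orthologue group needs at least two proteins")
    counts = Counter(min(size, top) for size in group_sizes)
    table = {str(size): counts[size] for size in range(2, top)}
    table[f">={top}"] = counts[top]
    table["Total"] = len(group_sizes)
    return table


# Hypothetical input: sizes of six orthologue groups
print(size_distribution([2, 2, 3, 5, 8, 12]))
# {'2': 2, '3': 1, '4': 0, '5': 1, '6': 0, '7': 0, '>=8': 2, 'Total': 6}

# Consistency check on the published rows: they sum to the stated total,
# so size-8 groups can only be counted in the open-ended last row.
reported = [6085, 2703, 1413, 748, 347, 230, 328]
assert sum(reported) == 11854
```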
Numbers of sub-clusters by orthologue group
| Number of sub-clusters | Orthologue groups with 3 proteins | Orthologue groups with > 3 proteins | Total |
|---|---|---|---|
| Only 1 cluster | 1,391 (51.5%) | 887 (28.9%) | 2,278 (39.5%) |
| ≥ 2 sub-clusters | 1,312 (48.5%) | 2,179 (71.1%) | 3,491 (60.5%) |
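Each percentage in this table is a share of its own column, i.e. of all orthologue groups in that size class. A small sketch of the arithmetic, applied to the "3 proteins in group" column and to the Total column; the helper name `column_shares` is hypothetical.

```python
def column_shares(counts: dict[str, int]) -> dict[str, float]:
    """Express each count as a percentage of the column total (one decimal)."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


three_protein_groups = {"Only 1 cluster": 1391, ">= 2 sub-clusters": 1312}
all_groups = {"Only 1 cluster": 2278, ">= 2 sub-clusters": 3491}

# The 3-protein column sums to 2703, the size-3 count in the group-size table.
print(sum(three_protein_groups.values()))   # 2703
print(column_shares(three_protein_groups))  # {'Only 1 cluster': 51.5, '>= 2 sub-clusters': 48.5}
print(column_shares(all_groups))            # {'Only 1 cluster': 39.5, '>= 2 sub-clusters': 60.5}
```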
Figure 2. The distribution of the average diff_func() values for each orthologous gene set, for intra-group (left) and inter-group (right) comparisons.
Figure 3. The distribution of the average diff_dis() values for each orthologous gene set, for intra-group (left) and inter-group (right) comparisons.
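Figures 2 and 3 contrast comparisons between isoforms in the same sub-cluster (intra-group) with comparisons between isoforms in different sub-clusters (inter-group). A sketch of that averaging for one orthologous gene set, assuming the per-pair difference measure plotted in each figure is supplied as a function; its real definition is not reproduced here. The name `intra_inter_means` and the toy integer scores are hypothetical.

```python
from itertools import combinations
from statistics import mean
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def intra_inter_means(
    isoforms: Sequence[T],
    clusters: Sequence[int],
    diff: Callable[[T, T], float],
) -> tuple[Optional[float], Optional[float]]:
    """Average a pairwise difference over same-cluster and cross-cluster pairs.

    Returns (intra_mean, inter_mean); either is None if no such pair exists.
    """
    intra: list[float] = []
    inter: list[float] = []
    for i, j in combinations(range(len(isoforms)), 2):
        value = diff(isoforms[i], isoforms[j])
        (intra if clusters[i] == clusters[j] else inter).append(value)
    return (mean(intra) if intra else None, mean(inter) if inter else None)


# Toy gene set: four isoforms in two sub-clusters, scored by absolute difference
scores = [49, 50, 43, 41]
labels = [1, 1, 2, 2]
print(*intra_inter_means(scores, labels, lambda a, b: abs(a - b)))  # 1.5 7.5
```

A gene set whose sub-clusters are functionally distinct should show a larger inter-group mean than intra-group mean, which is the contrast the left and right panels display.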
Figure 4. The gene structures of SYNE1, ESR2 and AHNAK.
InterPro keyword labels of the protein isoforms of genes SYNE1, ESR2 and AHNAK
| Gene | Cluster | Protein | GO3779 | GO16021 | IPR001589 | IPR001715 | IPR002017 | IPR012315 | IPR016146 | IPR018159 | PF00307 | PF00435 | PF10541 | PS00019 | PS00020 | PS50021 | PS51049 | SM00033 | SM00150 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SYNE1 | 1 | ENSMUSP00000051825 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| SYNE1 | 1 | ENSP00000265368 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| SYNE1 | 1 | ENSP00000308157 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| SYNE1 | 1 | ENSP00000356216 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| SYNE1 | 1 | ENSP00000356220 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| SYNE1 | 1 | ENSP00000356224 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| SYNE1 | 1 | ENSP00000390975 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| SYNE1 | 1 | ENSP00000396024 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| SYNE1 | 2 | ENSMUSP00000093587 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 1 |
| SYNE1 | 2 | ENSP00000318783 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 1 |
| SYNE1 | 3 | ENSP00000356225 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 1 |
| SYNE1 | 4 | ENSMUSP00000039440 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 0 |

| Gene | Cluster | Protein | PR00350 | PF00104 | SM00430 | PR00047 | PF00105 | SM00399 | PS00031 | PS51030 | PR00398 | G3DSA:1.10.565.10 | SSF48508 | G3DSA:3.30.50.10 | PF12497 | PTHR11865 | PTHR11865:SF216 | SSF57716 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ESR2 | 1 | ENSP00000343925 (ERβw) | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| ESR2 | 1 | ENSMUSP00000075932 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| ESR2 | 1 | ENSMUSP00000106051 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| ESR2 | 2 | ENSP00000351412 (ERβcx) | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| ESR2 | 2 | ENSP00000335551 (ERβcx) | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| ESR2 | 3 | ENSMUSP00000098849 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |

| Gene | Cluster | Protein | PF00595 | SM00228 | PS50106 | SSF50156 | PS01031 | G3DSA:2.30.42.10 | PTHR23348 | PTHR23348:SF7 | PTHR23348:SF8 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| AHNAK | 1 | ENSP00000257247 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 0 |
| AHNAK | 1 | ENSMUSP00000090632 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 0 |
| AHNAK | 2 | ENSP00000367263 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 1 |
| AHNAK | 2 | ENSMUSP00000090633 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 |

In each table, 1 indicates that the isoform carries the keyword label in that column and 0 that it does not.
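The binary rows make it straightforward to quantify how far two isoforms differ in annotated function. As an illustration only, and not necessarily the measure used by the authors, the fraction of keyword labels on which two rows disagree (a normalised Hamming distance) can be computed as below. The row strings are copied from the SYNE1 rows in the table; the function name `label_mismatch` is hypothetical.

```python
from fractions import Fraction


def label_mismatch(row_a: str, row_b: str) -> Fraction:
    """Fraction of keyword labels present in exactly one of two isoforms.

    Rows are strings of '0'/'1' flags in the table's column order
    (a normalised Hamming distance, returned as an exact fraction).
    """
    if len(row_a) != len(row_b):
        raise ValueError("rows must cover the same labels")
    differing = sum(a != b for a, b in zip(row_a, row_b))
    return Fraction(differing, len(row_a))


# SYNE1 rows (17 labels each)
cluster1 = "1" * 17                # e.g. ENSP00000265368
cluster2 = "11001101011000101"     # ENSP00000318783
cluster4 = "10110010100111010"     # ENSMUSP00000039440

print(label_mismatch(cluster1, cluster2))  # 8/17
print(label_mismatch(cluster2, cluster4))  # 16/17
```

By contrast, every ESR2 row consists entirely of 1s, so this measure returns 0 for any pair of ESR2 isoforms even though they fall into three clusters.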
Predicted disordered regions in the protein isoforms of genes SYNE1, ESR2 and AHNAK
| Gene | Cluster | Protein | Number of disordered regions | Sum of region lengths (residues) | Protein length (residues) | Proportion of disordered residues |
|---|---|---|---|---|---|---|
| SYNE1 | 1 | ENSMUSP00000051825 | 94 | 4390 | 8800 | 49.9% |
| SYNE1 | 1 | ENSP00000265368 | 89 | 4326 | 8798 | 49.2% |
| SYNE1 | 1 | ENSP00000308157 | 88 | 4275 | 8750 | 48.9% |
| SYNE1 | 1 | ENSP00000356216 | 89 | 4327 | 8798 | 49.2% |
| SYNE1 | 1 | ENSP00000356220 | 88 | 4276 | 8750 | 48.9% |
| SYNE1 | 1 | ENSP00000356224 | 89 | 4372 | 8798 | 49.7% |
| SYNE1 | 1 | ENSP00000390975 | 88 | 4321 | 8750 | 49.4% |
| SYNE1 | 1 | ENSP00000396024 | 88 | 4275 | 8750 | 48.9% |
| SYNE1 | 2 | ENSMUSP00000093587 | 9 | 410 | 950 | 43.2% |
| SYNE1 | 2 | ENSP00000318783 | 9 | 403 | 983 | 41.0% |
| SYNE1 | 3 | ENSP00000356225 | 29 | 1727 | 3322 | 52.0% |
| SYNE1 | 4 | ENSMUSP00000039440 | 16 | 616 | 1432 | 43.0% |
| ESR2 | 1 | ENSP00000343925 | 5 | 243 | 531 | 45.8% |
| ESR2 | 1 | ENSMUSP00000075932 | 5 | 295 | 550 | 53.6% |
| ESR2 | 1 | ENSMUSP00000106051 | 5 | 295 | 550 | 53.6% |
| ESR2 | 2 | ENSP00000351412 | 5 | 244 | 496 | 49.2% |
| ESR2 | 2 | ENSP00000335551 | 5 | 244 | 496 | 49.2% |
| ESR2 | 3 | ENSMUSP00000098849 | 5 | 295 | 568 | 51.9% |
| AHNAK | 1 | ENSP00000257247 | 1 | 67 | 150 | 44.7% |
| AHNAK | 1 | ENSMUSP00000090632 | 2 | 120 | 185 | 64.9% |
| AHNAK | 2 | ENSP00000367263 | 6 | 5716 | 5891 | 97.0% |
| AHNAK | 2 | ENSMUSP00000090633 | 10 | 5431 | 5657 | 96.0% |
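The last column is the summed length of the predicted disordered regions divided by the protein length. The sketch below reproduces that arithmetic for the AHNAK rows and, purely as an illustration, compares two isoforms by the absolute difference of their disorder proportions; this comparison is an assumption, not a claim about the measure used in the paper. The function name `disorder_proportion` is hypothetical.

```python
def disorder_proportion(disordered_residues: int, protein_length: int) -> float:
    """Fraction of a protein's residues that fall in predicted disordered regions."""
    if protein_length <= 0:
        raise ValueError("protein length must be positive")
    return disordered_residues / protein_length


# AHNAK rows: (sum of region lengths, protein length)
ahnak = {
    "ENSP00000257247": (67, 150),        # cluster 1
    "ENSMUSP00000090632": (120, 185),    # cluster 1
    "ENSP00000367263": (5716, 5891),     # cluster 2
    "ENSMUSP00000090633": (5431, 5657),  # cluster 2
}

props = {pid: disorder_proportion(*vals) for pid, vals in ahnak.items()}
for pid, p in props.items():
    print(f"{pid}: {p:.1%}")  # 44.7%, 64.9%, 97.0%, 96.0%

# Illustrative comparison of the two human isoforms from different clusters
difference = abs(props["ENSP00000257247"] - props["ENSP00000367263"])
print(f"difference: {difference:.3f}")  # 0.524
```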