Ben Smithers, Matt E Oates, Peter Tompa, Julian Gough.
Abstract
We have identified that the collagen helix has the potential to be disruptive to analyses of intrinsically disordered proteins. The collagen helix is an extended fibrous structure that is both promiscuous and repetitive. Whilst its sequence is predicted to be disordered, this type of protein structure is not typically considered as intrinsic disorder. Here, we show that collagen-encoding proteins skew the distribution of exon lengths in genes. We find that previous results, demonstrating that exons encoding disordered regions are more likely to be symmetric, are due to the abundance of the collagen helix. Other related results, showing increased levels of alternative splicing in disorder-encoding exons, still hold after considering collagen-containing proteins. Aside from analyses of exons, we find that the set of proteins that contain collagen significantly alters the amino acid composition of regions predicted as disordered. We conclude that research in this area should be conducted in the light of the collagen helix.
Keywords: collagen helix; exons; intrinsically disordered proteins; phase symmetry; splicing
Year: 2016 PMID: 26941008 PMCID: PMC4838654 DOI: 10.1002/pro.2913
Source DB: PubMed Journal: Protein Sci ISSN: 0961-8368 Impact factor: 6.725
Figure 3. Comparison of mean percentage disorder for different types of exons. Results are shown for all exons in the human genome (green) and for all exons excluding those that encode proteins containing at least one Pfam collagen domain (blue). Symmetric exons have the same start and end phase; asymmetric exons do not. Successive symmetric exons are symmetric exons adjacent to exons of the same phase (at least 3 in a row).
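The exon classes named in the Figure 3 caption can be made concrete with a minimal sketch. It assumes each exon is given as a hypothetical `(start_phase, end_phase)` tuple with phases in {0, 1, 2}, and it reads "at least 3 in a row" as a run of three or more consecutive symmetric exons sharing one phase. The function names and the data layout are illustrative assumptions, not taken from the paper.

```python
from typing import List, Tuple

# Hypothetical representation: (start_phase, end_phase), each in {0, 1, 2}.
Exon = Tuple[int, int]


def is_symmetric(exon: Exon) -> bool:
    """An exon is symmetric when its start and end phase are equal."""
    start_phase, end_phase = exon
    return start_phase == end_phase


def successive_symmetric_flags(exons: List[Exon], min_run: int = 3) -> List[bool]:
    """Flag exons lying in a run of >= min_run adjacent symmetric exons
    that all share the same phase (assumed reading of the caption)."""
    flags = [False] * len(exons)
    i = 0
    while i < len(exons):
        if not is_symmetric(exons[i]):
            i += 1
            continue
        # Extend the run while the next exon is symmetric with the same phase.
        j = i
        while (
            j + 1 < len(exons)
            and is_symmetric(exons[j + 1])
            and exons[j + 1][0] == exons[i][0]
        ):
            j += 1
        if j - i + 1 >= min_run:
            for k in range(i, j + 1):
                flags[k] = True
        i = j + 1
    return flags
```

Exons are expected in transcript order, so that list adjacency corresponds to adjacency in the gene.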
Figure 2. The length distribution of disorder-encoding exons. Exons that are part of a collagen-containing protein are shown in green; exons not in collagen-containing proteins are shown in blue. The distributions are stacked, so the combined height of both gives the length distribution of all disorder-encoding exons.
Figure 1. Distribution of the percentage disorder of proteins encoding at least one collagen helix (Pfam domain PF01391) in the human genome. Panel A (left) shows the disorder percentage of the exons encoding these proteins; panel B (right) shows the disorder percentage of each protein.
The Percentage of Exons that are Symmetric, for Mostly Disordered and Mostly Ordered Exons and Proteins
| | Exon disorder | | | Protein disorder | | |
|---|---|---|---|---|---|---|
| | <30% | >70% | P value | <30% | >70% | P value |
| All proteins | 41.84% | 46.27% | 2.26E-146 | 41.97% | 53.03% | ≈0 |
| Excluding collagen proteins | 41.81% | 42.08% | 0.134 | 41.92% | 44.41% | 7.12E-16 |
The percentage of exons that are symmetric (i.e., have the same start and end phase). Results are shown for all proteins in the dataset and with collagen proteins removed. Data is shown for mostly ordered and disordered exons on the left, as well as mostly ordered and disordered proteins on the right. P values calculated using a Chi‐square test.
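As a rough illustration of the kind of test named in the table note, the sketch below computes a Pearson chi-square statistic and P value for a 2x2 table of symmetric versus asymmetric exon counts in two disorder classes. The record reports only percentages, so the function takes raw counts that the reader must supply; omitting a continuity correction is an assumption, since the exact variant of the test used is not stated here.

```python
import math
from typing import Tuple


def chi_square_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-square test (no continuity correction; an assumption)
    for the 2x2 table [[a, b], [c, d]], e.g.
        rows:    mostly ordered / mostly disordered exons
        columns: symmetric / asymmetric
    Returns (chi2, p_value) with 1 degree of freedom."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom the survival function is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

With very large exon counts, even a small difference in the symmetric fraction produces a tiny P value, which is one way to read the contrast between 0.134 and 7.12E-16 in the "Excluding collagen proteins" row.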
Figure 4. Comparison of mean percentage disorder for different types of exons. Results are shown for all exons in the human genome (green) and for all exons excluding those that encode proteins containing at least one Pfam collagen domain (blue). Constitutive exons are exons contributing the same coding sequence from the same genetic locus to all transcripts for a given gene; alternative exons do not.
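The constitutive/alternative split in the Figure 4 caption can be sketched as a set operation. This is a simplification under stated assumptions: each transcript of a gene is represented by a hypothetical set of `(start, end)` coding coordinates, and identical coordinates stand in for "the same coding sequence from the same genetic locus". The function name and data layout are illustrative, not the authors' pipeline.

```python
from typing import Dict, Set, Tuple

# Hypothetical representation: (start, end) coding coordinates of an exon.
ExonCoords = Tuple[int, int]


def split_constitutive(
    transcripts: Dict[str, Set[ExonCoords]],
) -> Tuple[Set[ExonCoords], Set[ExonCoords]]:
    """Return (constitutive, alternative) exons for one gene.

    Constitutive exons occur in every transcript; all others are alternative.
    """
    exon_sets = list(transcripts.values())
    if not exon_sets:
        return set(), set()
    all_exons = set().union(*exon_sets)
    constitutive = set.intersection(*exon_sets)
    return constitutive, all_exons - constitutive
```

For a gene with a single annotated transcript, every exon is constitutive under this definition, so a real analysis may want to handle such genes separately.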
Figure 5. The amino acid composition of all proteins predicted as mostly disordered (green) compared with the composition of mostly disordered proteins excluding those that encode the collagen helix (blue).
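A composition comparison like the one in Figure 5 amounts to pooling residues over a set of sequences and normalising the counts. The sketch below shows that calculation; the function name is hypothetical, and how the mostly disordered proteins are selected is left to the caller.

```python
from collections import Counter
from typing import Dict, Iterable


def amino_acid_composition(sequences: Iterable[str]) -> Dict[str, float]:
    """Fraction of each residue over all sequences pooled together."""
    counts: Counter = Counter()
    for seq in sequences:
        counts.update(seq.upper())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {residue: n / total for residue, n in counts.items()}
```

Calling it once on the full set and once on the set with collagen-containing proteins removed gives the two profiles to compare. Because the collagen helix is built from Gly-X-Y repeats, the glycine fraction is a natural place to look for a shift.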