Alain Coletta, John W Pinney, David Y Weiss Solís, James Marsh, Steve R Pettifer, Teresa K Attwood.
Abstract
BACKGROUND: Regions of protein sequences with biased amino acid composition, so-called Low-Complexity Regions (LCRs), are abundant in the protein universe. A number of studies have revealed that (i) these regions show significant divergence across protein families; and (ii) the genetic mechanisms from which they arise lend them remarkable degrees of compositional plasticity. They have therefore proved difficult to compare using conventional sequence analysis techniques, and functions remain to be elucidated for most of them. Here we undertake a systematic investigation of LCRs in order to explore their possible functional significance, placed in the particular context of Protein-Protein Interaction (PPI) networks and Gene Ontology (GO)-term analysis.
Year: 2010 PMID: 20385029 PMCID: PMC2873317 DOI: 10.1186/1752-0509-4-43
Source DB: PubMed Journal: BMC Syst Biol ISSN: 1752-0509
Table 1. Nodes and edges in each PPI dataset.
| | BioGrid | HC | FYI | DIPv |
|---|---|---|---|---|
| Number of nodes | 4884 | 2977 | 2545 | 2278 |
| Number of edges | 37989 | 9203 | 5953 | 5373 |
Figure 1. Degree distributions comparison between proteins with and without LCRs. Degree distributions of proteins with and without LCRs in the BioGrid dataset show that proteins with LCRs have more connections than proteins without LCRs. See Table 2 for Wilcoxon-Mann-Whitney p-values for this and the other datasets.
Table 2. Degree distributions comparison between proteins with and without LCRs.
| Dataset | BioGrid | HC | FYI | DIPv |
|---|---|---|---|---|
| p-value | 1.58 × 10⁻¹³ | 3.63 × 10⁻⁴ | 0.002 | 0.021 |
Wilcoxon-Mann-Whitney test p-values obtained from comparing degree distributions from proteins with and without LCRs across the four different PPI datasets.
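The Wilcoxon-Mann-Whitney comparison of two degree distributions can be sketched as follows. This is a minimal pure-Python illustration using the tie-corrected normal approximation with a continuity correction; the record does not say which implementation or variant the authors used, and the function name `mann_whitney_u` and the degree lists are hypothetical.

```python
import math

def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U statistic for sample x, approximate two-sided p-value).
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    n = n1 + n2
    # Pool the samples, remembering which group each value came from.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        t = j - i                      # size of this group of tied values
        avg_rank = (i + 1 + j) / 2.0   # mean of the ranks i+1 .. j
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        tie_term += t ** 3 - t
        i = j
    u_x = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u_x, 1.0                # every value is tied: no evidence of a shift
    z = max(abs(u_x - mean_u) - 0.5, 0.0) / math.sqrt(var_u)
    p_value = math.erfc(z / math.sqrt(2.0))  # two-sided normal tail probability
    return u_x, min(p_value, 1.0)

# Hypothetical degree lists, invented for illustration only.
degrees_with_lcr = [12, 30, 7, 45, 19, 22]
degrees_without_lcr = [3, 8, 5, 11, 2, 9]
u_stat, p_value = mann_whitney_u(degrees_with_lcr, degrees_without_lcr)
print(u_stat, p_value)
```

For samples as small as these an exact test would be preferable; the normal approximation is better suited to networks with thousands of nodes.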
Figure 2. Distribution of folded LCR centre positions. Comparison of normalised and randomly re-arranged LCR centre positions in S. cerevisiae. The Kolmogorov-Smirnov test confirms that these two distributions are significantly different (p-value = 7.6 × 10⁻⁶).
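A two-sample Kolmogorov-Smirnov comparison of this kind can be sketched as below. The code computes the maximum distance between the two empirical distribution functions and an asymptotic p-value from the Kolmogorov series; the function name `ks_two_sample` and the position values are hypothetical, and the record does not describe how the authors computed their statistic.

```python
import math

def ks_two_sample(a, b):
    """Two-sample KS statistic D and an asymptotic two-sided p-value."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both samples must be non-empty")
    i = j = 0
    d = 0.0
    # Walk through the pooled values, tracking both empirical CDFs.
    while i < n and j < m:
        v = min(a[i], b[j])
        while i < n and a[i] == v:
            i += 1
        while j < m and b[j] == v:
            j += 1
        d = max(d, abs(i / n - j / m))
    lam = d * math.sqrt(n * m / (n + m))
    if lam < 0.2:
        # The survival function is effectively 1 here and the series converges slowly.
        return d, 1.0
    series = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                 for k in range(1, 101))
    return d, min(max(2.0 * series, 0.0), 1.0)

# Hypothetical normalised centre positions, invented for illustration only.
observed_positions = [0.02, 0.05, 0.07, 0.10, 0.45, 0.48]
shuffled_positions = [0.15, 0.22, 0.30, 0.33, 0.38, 0.41]
d_stat, p_value = ks_two_sample(observed_positions, shuffled_positions)
print(d_stat, p_value)
```

The asymptotic p-value is only a rough guide for small samples such as the placeholder lists above.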
Table 3. Number of t-LCRs and c-LCRs found across the four PPI datasets.
| | BioGrid | HC | FYI | DIPv |
|---|---|---|---|---|
| t-LCRs | 183 | 135 | 123 | 109 |
| c-LCRs | 493 | 349 | 299 | 263 |
Figure 3. Degree distribution comparisons. Boxplot representations comparing degree distributions of t-LCRs, c-LCRs, and proteins without LCRs. Table 4 shows Wilcoxon-Mann-Whitney p-values resulting from comparing their degree distributions.
Table 4. Degree distributions comparison between proteins with c-LCRs, t-LCRs, and proteins without LCRs.
| | t-LCRs/c-LCRs | c-LCRs/without LCRs | t-LCRs/without LCRs |
|---|---|---|---|
| BioGrid | 0.001 | 1.94 × 10⁻⁷ | 1.54 × 10⁻¹⁰ |
| HC | 0.005 | 0.031 | 6.88 × 10⁻⁴ |
| DIPv | 0.01 | 0.471 | 0.001 |
| FYI | 0.587 | 0.044 | 0.051 |
Wilcoxon-Mann-Whitney test p-values were calculated to compare the degree distributions of proteins with t-LCRs, c-LCRs, and without LCRs across the four different PPI datasets.
Figure 4. LCR length versus protein degree. Scatterplots show the relationship between length and protein degree for t-LCRs (in black) and c-LCRs (in gray) in four different PPI networks. The associated p-values and r²-values for linear regression are shown in Table 5.
Table 5. Correlation results (LCR length versus protein degree).
| BioGrid | 0.672 | 3.66 × 10⁻⁴ | 0.004 | 0.043 |
| HC | 0.837 | 1.22 × 10⁻⁴ | 0.004 | 0.06 |
| DIPv | 0.792 | 2.68 × 10⁻⁴ | 0.006 | 0.069 |
| FYI | 0.263 | 0.004 | 0.019 | 0.045 |
The table shows statistics for the regression lines plotted in Figure 4. The p-values, calculated by an F-test, test the null hypothesis that LCR length is uncorrelated with protein degree.
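The regression statistics behind such a table can be illustrated with a simple least-squares fit. The sketch below computes the slope, intercept, r² and the F statistic for a one-predictor regression of degree on LCR length; the function name `linear_fit_r2_f` and the data are hypothetical, and turning F into a p-value additionally requires the F distribution with 1 and n − 2 degrees of freedom (for example `scipy.stats.f.sf`), which is left out to keep the example dependency-free.

```python
def linear_fit_r2_f(x, y):
    """Ordinary least-squares fit of y on x.

    Returns (slope, intercept, r_squared, f_statistic), where the F statistic
    has 1 and n - 2 degrees of freedom.
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two equal-length samples with at least 3 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    if r_squared >= 1.0:
        f_statistic = float("inf")     # perfect fit: residual variance is zero
    else:
        f_statistic = r_squared * (n - 2) / (1.0 - r_squared)
    return slope, intercept, r_squared, f_statistic

# Hypothetical LCR lengths and protein degrees, invented for illustration only.
lcr_lengths = [10, 14, 21, 25, 33, 40]
protein_degrees = [3, 9, 6, 14, 11, 20]
print(linear_fit_r2_f(lcr_lengths, protein_degrees))
```

With a single predictor the F-test is equivalent to a two-sided t-test on the slope, so either route gives the same p-value.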
Table 6. GO term enrichments for all LCRs.
| Genes (frequency) | Background (frequency) | p-value | q-value | GO term ID | Definition |
|---|---|---|---|---|---|
| 49 | 147 | 3.89 × 10⁻⁶ | 0.003 | (P)GO:0006950 | |
| 117 | 518 | 4.40 × 10⁻⁵ | 0.017 | (P)GO:0006350 | |
| 41 | 133 | 1.03 × 10⁻⁴ | 0.026 | (P)GO:0006468 | |
| 11 | 15 | 2.22 × 10⁻⁴ | 0.042 | (P)GO:0006414 | |
| 105 | 490 | 6.08 × 10⁻⁴ | 0.092 | (P)GO:0006355 | |
| 73 | 294 | 1.25 × 10⁻⁴ | 0.054 | (F)GO:0003676 | |
| 51 | 189 | 2.59 × 10⁻⁴ | 0.066 | (C)GO:0005730 | |
| 30 | 93 | 4.58 × 10⁻⁴ | 0.066 | (C)GO:0009277 | |
| 344 | 1946 | 6.27 × 10⁻⁴ | 0.066 | (C)GO:0005634 | |
| 22 | 63 | 0.001 | 0.088 | (C)GO:0005934 | |
GO term enrichments from proteins with LCRs compared to the entire S. cerevisiae proteome. Frequencies represent the number of proteins annotated by a given term; p-values are calculated using Fisher's exact test, and q-values using Benjamini and Hochberg's FDR method.
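The enrichment calculation named in this caption can be sketched as a one-sided Fisher's exact test (the upper tail of the hypergeometric distribution) followed by the Benjamini-Hochberg adjustment. This is an illustration only: the function names are hypothetical, the choice of a one-sided test is an assumption, and the proteome size and LCR-set size used in the example are placeholders because the record does not state them.

```python
from math import comb

def enrichment_p(k, n, K, N):
    """One-sided Fisher's exact test for over-representation.

    k: proteins in the study set annotated with the term
    n: size of the study set
    K: proteins in the background annotated with the term
    N: size of the background
    Returns P(X >= k) under the hypergeometric distribution.
    """
    upper = min(K, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg q-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda idx: p_values[idx])
    q_values = [0.0] * m
    running_min = 1.0
    # Step down from the largest p-value, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values

# Counts 49 and 147 come from the first table row; the study-set size (700)
# and background size (6000) are placeholders, NOT values from the paper.
p = enrichment_p(k=49, n=700, K=147, N=6000)
print(p)
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

In a real analysis the adjustment would be applied to the p-values of every GO term tested, not only to those that end up in the reported table.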
Table 7. GO term enrichments for central and terminal LCRs.
| Genes (frequency) | Background (frequency) | p-value | q-value | GO term ID | Definition |
|---|---|---|---|---|---|
| Terminal LCRs | | | | | |
| 22 | 147 | 1.09 × 10⁻¹⁰ | 2.76 × 10⁻⁸ | (P)GO:0006950 | |
| 28 | 418 | 3.64 × 10⁻⁶ | 4.62 × 10⁻⁴ | (P)GO:0006412 | |
| 6 | 15 | 8.55 × 10⁻⁶ | 7.24 × 10⁻⁴ | (P)GO:0006414 | |
| 5 | 10 | 2.19 × 10⁻⁵ | 0.001 | (P)GO:0006616 | |
| 5 | 26 | 8.99 × 10⁻⁴ | 0.046 | (P)GO:0006893 | |
| 13 | 114 | 1.37 × 10⁻⁵ | 0.002 | (F)GO:0016887 | |
| 16 | 202 | 9.10 × 10⁻⁵ | 0.005 | (F)GO:0003735 | |
| 5 | 33 | 0.002 | 0.087 | (F)GO:0004175 | |
| 30 | 703 | 0.004 | 0.087 | (F)GO:0000166 | |
| 4 | 24 | 0.005 | 0.087 | (F)GO:0005484 | |
| 5 | 40 | 0.005 | 0.087 | (F)GO:0003743 | |
| 3 | 12 | 0.006 | 0.087 | (F)GO:0003746 | |
| 2 | 3 | 0.006 | 0.087 | (F)GO:0019904 | |
| 7 | 85 | 0.008 | 0.092 | (F)GO:0051082 | |
| 4 | 28 | 0.008 | 0.092 | (F)GO:0003688 | |
| 2 | 4 | 0.009 | 0.093 | (F)GO:0008353 | |
| 21 | 290 | 2.40 × 10⁻⁵ | 0.003 | (C)GO:0005840 | |
| 5 | 14 | 7.83 × 10⁻⁵ | 0.006 | (C)GO:0015935 | |
| 19 | 284 | 1.63 × 10⁻⁴ | 0.008 | (C)GO:0030529 | |
| 6 | 43 | 0.001 | 0.038 | (C)GO:0043234 | |
| 4 | 16 | 0.001 | 0.038 | (C)GO:0000502 | |
| 3 | 9 | 0.003 | 0.051 | (C)GO:0000786 | |
| 3 | 9 | 0.003 | 0.051 | (C)GO:0000788 | |
| 3 | 9 | 0.003 | 0.051 | (C)GO:0005852 | |
| 6 | 53 | 0.003 | 0.052 | (C)GO:0022627 | |
| 3 | 10 | 0.004 | 0.052 | (C)GO:0043614 | |
| 2 | 3 | 0.006 | 0.065 | (C)GO:0034099 | |
| 2 | 3 | 0.006 | 0.065 | (C)GO:0030133 | |
| 2 | 3 | 0.006 | 0.065 | (C)GO:0031201 | |
| 3 | 14 | 0.008 | 0.082 | (C)GO:0005667 | |
| 6 | 68 | 0.010 | 0.096 | (C)GO:0030686 | |
| 11 | 189 | 0.011 | 0.098 | (C)GO:0005730 | |
| Central LCRs | | | | | |
| 27 | 133 | 3.03 × 10⁻⁹ | 1.40 × 10⁻⁶ | (P)GO:0006468 | |
| 50 | 518 | 4.38 × 10⁻⁶ | 0.001 | (P)GO:0006350 | |
| 45 | 490 | 4.52 × 10⁻⁵ | 0.007 | (P)GO:0006355 | |
| 7 | 18 | 9.81 × 10⁻⁵ | 0.011 | (P)GO:0006378 | |
| 24 | 123 | 4.64 × 10⁻⁸ | 1.03 × 10⁻⁵ | (F)GO:0004674 | |
| 66 | 703 | 2.18 × 10⁻⁷ | 1.68 × 10⁻⁵ | (F)GO:0000166 | |
| 23 | 125 | 2.28 × 10⁻⁷ | 1.68 × 10⁻⁵ | (F)GO:0004672 | |
| 55 | 577 | 1.88 × 10⁻⁶ | 1.04 × 10⁻⁴ | (F)GO:0005524 | |
| 15 | 90 | 8.39 × 10⁻⁵ | 0.004 | (F)GO:0004386 | |
| 23 | 204 | 2.94 × 10⁻⁴ | 0.011 | (F)GO:0016301 | |
| 28 | 294 | 8.31 × 10⁻⁴ | 0.026 | (F)GO:0003676 | |
| 10 | 61 | 0.001 | 0.036 | (F)GO:0008026 | |
| 6 | 22 | 0.001 | 0.036 | (F)GO:0004407 | |
| 3 | 4 | 0.003 | 0.066 | (F)GO:0004708 | |
| 4 | 11 | 0.004 | 0.077 | (F)GO:0005543 | |
| 5 | 19 | 0.004 | 0.077 | (F)GO:0016566 | |
| 15 | 63 | 2.04 × 10⁻⁶ | 3.39 × 10⁻⁴ | (C)GO:0005934 | |
| 132 | 1946 | 4.07 × 10⁻⁶ | 3.39 × 10⁻⁴ | (C)GO:0005634 | |
| 26 | 189 | 5.24 × 10⁻⁶ | 3.39 × 10⁻⁴ | (C)GO:0005730 | |
| 5 | 9 | 2.89 × 10⁻⁴ | 0.014 | (C)GO:0005849 | |
| 5 | 12 | 7.97 × 10⁻⁴ | 0.031 | (C)GO:0000508 | |
| 16 | 129 | 9.96 × 10⁻⁴ | 0.032 | (C)GO:0005935 | |
GO term enrichments from proteins with c-LCRs and t-LCRs compared to the complete set of proteins in S. cerevisiae.