Fabia U Battistuzzi, Kristan A Schneider, Matthew K Spencer, David Fisher, Sophia Chaudhry, Ananias A Escalante.
Abstract
BACKGROUND: Low complexity regions (LCRs) are a ubiquitous feature of genomes, yet their evolutionary history and functional roles are unclear. Previous studies have found contrasting evidence in favor of both neutral and selective mechanisms of evolution for different sets of LCRs, suggesting that the way these regions are identified may affect our ability to discern their evolutionary history. To investigate this issue further, we used a multiple-threshold approach to identify species-specific profiles of proteome complexity and, by comparing the properties of these sets, determined the influence that starting parameters have on evolutionary inferences.
Year: 2016 PMID: 26923229 PMCID: PMC4770516 DOI: 10.1186/s12862-016-0625-0
Source DB: PubMed Journal: BMC Evol Biol ISSN: 1471-2148 Impact factor: 3.260
Table 1 Statistics of 11 Apicomplexa genomes
| Tax. group | Genome | # of chr | Protein coding genes | AT (%) | S proteome | % LCR frequency (K = 1.9) | S% (K = 1.9) | % LCR frequency (K = 1) | S% (K = 1) |
|---|---|---|---|---|---|---|---|---|---|
| Hsp | Pv | 14 | 5435 | 55 | 16.62 | 34.3 | 19.9 | 5.08 | 26.96 |
| | Pcy | 14 | 4988 | 58 | 16.45 | 31.1 | 56.64 | 4.5 | 27.5 |
| | Pk | 14 | 5122 | 60 | 16.36 | 27.72 | 55.96 | 3.67 | 28.56 |
| | Pyo | 14 | 7724 | 75 | 13.71 | 26.5 | 40.6 | 4.16 | 16.06 |
| | Pch | 14 | 5042 | 78 | 14.05 | 25.2 | 53.15 | 2.27 | 25.86 |
| | Pf | 14 | 5410 | 75 | 13.35 | 49 | 31.88 | 2.16 | 6.2 |
| Ppl | Bb | 4 | 3706 | 57 | 17.23 | 7 | 59.32 | 0.16 | 29.92 |
| | Tp | 4 | 4082 | 59 | 16.25 | 14.2 | 71.59 | 0.34 | 39.51 |
| Ccd | Cp | 8 | 3805 | 67 | 15.68 | 19.5 | 62.33 | 5.39 | 33.61 |
| | Nc | 14 | 7080 | 47 | 14.51 | 39 | 37.74 | 8.15 | 9.83 |
| | Tg | 14 | 8102 | 44 | 14.75 | 36.7 | 36.49 | 9.34 | 8.83 |
LCRs are identified using a window size of 15 and, as examples, complexity thresholds (K) of 1.9 and 1. LCR frequency: percentage of proteins with at least one LCR. S%: Simpson’s Reciprocal Index relative to the diversity of the proteome. The AT content is calculated from the proteome of each species.
Abbreviations: Tax Taxonomic, Hsp Haemosporidia, Ppl Piroplasmida, Ccd Coccidia, Pv Plasmodium vivax, Pcy P. cynomolgi, Pk P. knowlesi, Pyo P. yoelii, Pch P. chabaudi, Pf P. falciparum, Bb Babesia bovis, Tp Theileria parva, Cp Cryptosporidium parvum, Nc Neospora caninum, Tg Toxoplasma gondii, chr chromosomes, LCRs low complexity regions
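The table note describes LCR detection with a window size of 15 and a complexity threshold K, and defines LCR frequency as the percentage of proteins with at least one LCR. A minimal sketch of that idea follows. It assumes that complexity is measured as the Shannon entropy (in bits) of each window and that a window counts as low complexity when its entropy is at or below K; neither detail is stated in the text above, and the function names are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(segment: str) -> float:
    """Shannon entropy (bits) of the residue composition of a segment.

    ASSUMPTION: the complexity measure is not specified in the text above;
    entropy is used here purely for illustration.
    """
    counts = Counter(segment)
    n = len(segment)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def has_lcr(protein: str, window: int = 15, k: float = 1.9) -> bool:
    """True if at least one window of the given size has entropy <= k."""
    if len(protein) < window:
        return False
    return any(
        shannon_entropy(protein[i:i + window]) <= k
        for i in range(len(protein) - window + 1)
    )


def lcr_frequency(proteins, window: int = 15, k: float = 1.9) -> float:
    """Percentage of proteins that contain at least one LCR."""
    if not proteins:
        return 0.0
    hits = sum(has_lcr(p, window, k) for p in proteins)
    return 100.0 * hits / len(proteins)
```

Calling `lcr_frequency` on the same set of sequences with several values of `k` gives one frequency per threshold, which is the kind of per-species profile the multiple-threshold approach compares.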
Fig. 1 Relation of proteome AT content with frequency of LCRs calculated at K = 1.9. a Overall AT content is calculated based on the proteome of each species. b Trends in AT-rich/poor (biased) and AT-balanced (unbiased) proteins. Boundaries for nucleotide enrichment were < 45 % and > 55 %. Trends were comparable when < 40 % and > 70 % boundaries were used. Species abbreviations are as in Table 1
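The biased/unbiased split in the Fig. 1 caption can be illustrated with a short sketch. It assumes that AT content is computed per protein from its coding nucleotide sequence and that the 45 % and 55 % boundaries are exclusive; the function names and the handling of ambiguous bases are hypothetical choices, not taken from the source.

```python
def at_percent(cds: str) -> float:
    """AT content (%) of a coding sequence, ignoring non-ACGT symbols."""
    seq = cds.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return 100.0 * (seq.count("A") + seq.count("T")) / acgt


def at_class(cds: str, low: float = 45.0, high: float = 55.0) -> str:
    """Label a sequence 'biased' (AT-poor or AT-rich) or 'unbiased'.

    ASSUMPTION: values exactly at a boundary are treated as unbiased.
    """
    at = at_percent(cds)
    return "biased" if at < low or at > high else "unbiased"
```

Passing `low=40.0, high=70.0` reproduces the alternative boundaries mentioned in the caption.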
Fig. 2 Species-specific frequency of LCRs for multiple complexity thresholds in Apicomplexa. a Maximum likelihood phylogenetic tree of the Apicomplexa species used for the phylogenetic contrast analysis. Pyo was excluded from this analysis because it lacked chromosome assignments for its proteins, which were necessary for the phylogenetic contrast analysis. The phylogeny was obtained using 30 orthologous genes randomly selected from Kuo et al. (2008). Bootstrap values are shown at each node. Species belonging to the same taxonomic group are shown (Hsp: Haemosporidia, Ppl: Piroplasmida, Ccd: Coccidia). b LCR profiles with complexity thresholds = 0–3 (color coding refers to individual species and taxonomic groups: blue/purple: Hsp; green: Ppl; orange/brown: Ccd). c Linear regression on the logits of LCR frequencies. Values at K = 3 were excluded because they were virtually indistinguishable from the background composition
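Panel c of Fig. 2 refers to a linear regression on the logits of LCR frequencies. As a rough sketch of what such a fit involves, the code below converts percentage frequencies to logits and fits a straight line against K by ordinary least squares. The choice of ordinary least squares, the function names, and the back-transformation helper are assumptions for illustration; the source does not describe the fitting procedure in this section.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a proportion p, defined only for 0 < p < 1."""
    return math.log(p / (1.0 - p))


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_logit_profile(ks, freqs_percent):
    """Fit logit(frequency) against the complexity threshold K.

    Frequencies of exactly 0 % or 100 % have no finite logit and must be
    excluded or adjusted by the caller before fitting.
    """
    return fit_line(ks, [logit(f / 100.0) for f in freqs_percent])


def predict_frequency(k: float, slope: float, intercept: float) -> float:
    """Back-transform a fitted logit value to a percentage frequency."""
    return 100.0 / (1.0 + math.exp(-(slope * k + intercept)))
```

Working on the logit scale keeps back-transformed predictions between 0 % and 100 %, which a straight-line fit on raw percentages would not guarantee.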
Table 2 Distribution of single low complexity regions in proteins
| Tax. group | Sp. | | | | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Hsp | Pv | R (23) | 0a | R (47) | 10.38 | R (240) | 26.96 | R (601) | 43.53 | R (995) | 56.79 | | |
| | Pcy | R (20) | 0a | R (33) | 11.30 | R (196) | 27.49 | R (497) | 43.69 | R (893) | 56.64 | | |
| | Pk | | | R (31) | 13.33 | R (154) | 28.56 | R (447) | 43.78 | | | | |
| | Pyo | R (21) | 2.30 | R (57) | 7.40 | R (286) | 16.06 | R (647) | 28.27 | | | | |
| | Pch | R (7) | 0a | R (18) | 10.70 | R (102) | 25.86 | R (308) | 41.02 | | | | |
| | Pf | R (77) | 1.50 | | | | | | | R (978) | 31.88 | | |
| Ppl | Bb | NC | NC | NC | NC | R (6) | 29.92 | R (51) | 46.26 | | | | |
| | Tp | NC | NC | NC | NC | | | R (124) | 58.74 | R (434) | 71.58 | | |
| Ccd | Cp | R (56) | 0a | R (78) | 15.62 | R (152) | 33.61 | | | | | | |
| | Nc | R (111) | 0a | | | | | | | | | | |
| | Tg | | | | | | | | | | | | |
R random, NR non random (bold), K complexity threshold, S diversity at K relative to the proteome diversity. The total number of proteins considered in each case is shown in parentheses. Genome abbreviations are as in Table 1. NC not computable
a These values have been forced to zero where S% is negative due to lower accuracy of the fitted equations at low complexity
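The S values in both tables are described as Simpson's Reciprocal Index expressed relative to the diversity of the proteome. A minimal sketch of that quantity is given below, assuming the index is computed as 1 / Σ p_i² over amino acid proportions p_i and that the relative value is a simple percentage ratio. The footnote mentions fitted equations, which this sketch does not attempt to reproduce, and the function names are hypothetical.

```python
from collections import Counter


def simpson_reciprocal(residues: str) -> float:
    """Simpson's Reciprocal Index, 1 / sum(p_i ** 2), of residue usage.

    ASSUMPTION: the infinite-population form is used; a finite-sample
    variant based on n * (n - 1) also exists.
    """
    n = len(residues)
    if n == 0:
        raise ValueError("empty sequence")
    counts = Counter(residues)
    return 1.0 / sum((c / n) ** 2 for c in counts.values())


def relative_diversity(lcr_residues: str, proteome_residues: str) -> float:
    """Diversity of LCR residues as a percentage of proteome diversity."""
    return (
        100.0
        * simpson_reciprocal(lcr_residues)
        / simpson_reciprocal(proteome_residues)
    )
```

With 20 amino acids the index ranges from 1 (a single residue type) to 20 (all residues equally frequent), which is consistent in scale with the proteome-wide S values of roughly 13 to 17 reported in Table 1.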
Fig. 3 Usage of the most common amino acid (asparagine) in two species with similar AT content (Pf and Pyo). A significant (p-value < 0.001) preference for asparagine usage in LCRs is indicated by an asterisk