Chrysa Ntountoumi, Panayotis Vlastaridis, Dimitris Mossialos, Constantinos Stathopoulos, Ioannis Iliopoulos, Vasilios Promponas, Stephen G Oliver, Grigoris D Amoutzias.
Abstract
We provide the first high-throughput analysis of the properties and functional role of Low Complexity Regions (LCRs) in more than 1500 prokaryotic and phage proteomes. We observe that, contrary to a widespread belief based on older and sparse data, LCRs have a significant, persistent and highly conserved presence and role in many diverse prokaryotes. Their specific amino acid content is linked to proteins with certain molecular functions, such as the binding of RNA, DNA, metal ions and polysaccharides. In addition, LCRs have been repeatedly identified in very ancient and usually highly expressed proteins of the translation machinery. Finally, based on the amino acid content enriched in certain categories, we have developed a neural network web server to identify LCRs and accurately predict whether they bind nucleic acids or metal ions, or are involved in chaperone functions. An evaluation of the tool showed that it is highly accurate for eukaryotic proteins as well.
Year: 2019 PMID: 31504783 PMCID: PMC6821194 DOI: 10.1093/nar/gkz730
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. (A) Frequency and (B) enrichment of amino acids in LCRs. Enrichment was based on the background frequency obtained from the complete set of analyzed proteomes. The order of amino acids in the graphs is based on their biosynthetic energetic cost, as calculated in (44).
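The enrichment described in the caption can be illustrated with a minimal sketch. It assumes that enrichment fold is the ratio of an amino acid's relative frequency in LCRs to its relative frequency in the background set; the caption does not spell out the exact formula, so this ratio is an assumption. The function names (`aa_frequencies`, `enrichment_folds`) and the toy sequences are hypothetical and are not taken from the paper.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def aa_frequencies(sequences):
    """Return the relative frequency of each standard amino acid."""
    counts = Counter()
    for seq in sequences:
        counts.update(ch for ch in seq.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no standard amino acids found")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def enrichment_folds(lcr_seqs, background_seqs, threshold=0.0):
    """Assumed definition: fold = frequency in LCRs / background frequency.

    Only folds at or above `threshold` are returned; residues absent
    from the background are skipped to avoid division by zero.
    """
    lcr = aa_frequencies(lcr_seqs)
    bg = aa_frequencies(background_seqs)
    folds = {aa: lcr[aa] / bg[aa] for aa in AMINO_ACIDS if bg[aa] > 0}
    return {aa: round(f, 2) for aa, f in folds.items() if f >= threshold}


# Toy data (hypothetical): serine is 100% of the LCR residues
# but only 25% of the background residues.
toy_lcrs = ["SSSS"]
toy_background = ["SSSSAAAAAAAAAAAA"]
print(enrichment_folds(toy_lcrs, toy_background, threshold=2.5))
# {'S': 4.0}
```

In practice the background would be the residue composition of the full set of analyzed proteomes, and the LCR sequences would come from whichever LCR detection step is used upstream.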
Bacterial and archaeal protein annotations (UniProt) with the most LCRs
| No. of LCRs | UniProt annotation | Kingdom |
|---|---|---|
| 321 | Translation initiation factor IF-2 | Bacteria |
| 281 | DNA topoisomerase 1 | Bacteria |
| 220 | 60 kDa chaperonin | Bacteria |
| 208 | Acetyltransferase component of pyruvate dehydrogenase complex | Bacteria |
| 186 | 30S ribosomal protein S16 | Bacteria |
| 167 | Dihydrolipoamide acetyltransferase component of pyruvate dehydrogenase complex | Bacteria |
| 166 | Protein TonB | Bacteria |
| 152 | Single-stranded DNA-binding protein | Bacteria |
| 146 | RNA-binding protein | Bacteria |
| 135 | Serine/threonine protein kinase | Bacteria |
| 60 | Thermosome | Archaea |
| 40 | 50S ribosomal protein L12 | Archaea |
| 23 | Extracellular solute-binding protein family 5 | Archaea |
| 18 | Chaperone protein DnaK | Archaea |
| 13 | 50S ribosomal protein L10 | Archaea |
| 13 | 30S ribosomal protein S24e | Archaea |
| 11 | Prefoldin subunit alpha | Archaea |
| 11 | Carbohydrate binding family 6 | Archaea |
| 11 | 30S ribosomal protein S3 | Archaea |
| 7 | Signal recognition particle receptor FtsY | Archaea |
The first column shows the total number of LCRs found across all proteins (from many different organisms) that share that particular UniProt annotation.
Enrichment fold for five amino acids within LCRs of bacterial proteins, related to polysaccharide binding and processing
| GO description | GO ID | C | P | Q | S | T |
|---|---|---|---|---|---|---|
| chitin binding | GO:0008061 | 2.5 | 3.9 | |||
| carbohydrate binding | GO:0030246 | 2.9 | 3.1 | |||
| carbohydrate metabolic process | GO:0005975 | 3.2 | 2.8 | |||
| hydrolase activity, hydrolyzing O-glycosyl compounds | GO:0004553 | 3.6 | 2.5 | |||
| cellulose catabolic process | GO:0030245 | 4.2 | 2.5 | |||
| cellulase activity | GO:0008810 | 4.2 | ||||
| peptidoglycan binding | GO:0042834 | 2.7 | ||||
| chitinase activity | GO:0004568 | 2.9 | 4.2 | |||
| xylan catabolic process | GO:0045493 | 2.8 | 3.8 | |||
| cellulose binding | GO:0030248 | 3.8 | 6.8 | |||
| endo-1,4-beta-xylanase activity | GO:0031176 | 3.9 | 4.6 | |||
Only enrichment folds ≥2.5 are displayed, for clarity.
Enrichment fold for nine amino acids within LCRs of bacterial proteins, related to RNA binding and processing
| GO description | GO ID | D | E | F | I | L | M | N | R | V |
|---|---|---|---|---|---|---|---|---|---|---|
| 7S RNA binding | GO:0008312 | 2.8 | 3.9 | 22 | ||||||
| DNA-directed 5′-3′ RNA polymerase activity | GO:0003899 | 5.9 | 4.5 | 3.3 | 4.3 | 3.9 | ||||
| polyribonucleotide nucleotidyltransferase activity | GO:0004654 | 2.9 | 10.4 | |||||||
| 3′-5′-exoribonuclease activity | GO:0000175 | 2.97 | 10.4 | |||||||
| RNA processing | GO:0006396 | 7.5 | ||||||||
| helicase activity | GO:0004386 | 4.3 | 6.8 | |||||||
| mRNA catabolic process | GO:0006402 | 5.9 | ||||||||
| RNA binding | GO:0003723 | 2.5 | 4.9 | |||||||
| small ribosomal subunit | GO:0015935 | 3.2 | ||||||||
| translation initiation factor activity | GO:0003743 | 2.9 | ||||||||
| rRNA binding | GO:0019843 | 2.8 | ||||||||
| endoribonuclease activity | GO:0004521 | 2.7 | 3.2 | |||||||
| rRNA processing | GO:0006364 | 2.6 | 3 | |||||||
| tRNA processing | GO:0008033 | 2.6 | 3.2 | |||||||
| ribonuclease E activity | GO:0008995 | 2.5 | 3.6 | |||||||
| translation | GO:0006412 | 2.8 | ||||||||
| structural constituent of ribosome | GO:0003735 | 2.8 | ||||||||
| ribosome | GO:0005840 | 3 | ||||||||
| 5S rRNA binding | GO:0008097 | 3.2 | ||||||||
| transcription, DNA-templated | GO:0006351 | 3.5 | 2.5 | 2.5 |
Only enrichment folds ≥2.5 are displayed, for clarity.
Enrichment fold for nine amino acids within LCRs of bacterial proteins, related to DNA binding and processing
| GO description | GO ID | F | G | H | K | L | N | P | Q | Y |
|---|---|---|---|---|---|---|---|---|---|---|
| DNA recombination | GO:0006310 | 5.1 | ||||||||
| DNA polymerase III complex | GO:0009360 | 2.6 | 2.5 | |||||||
| regulation of transcription, DNA-templated | GO:0006355 | 2.9 | ||||||||
| DNA-templated transcription, initiation | GO:0006352 | 3.6 | ||||||||
| DNA binding | GO:0003677 | 4.2 | ||||||||
| chromosome condensation | GO:0030261 | 5.2 | ||||||||
| chromosome | GO:0005694 | 5.8 | ||||||||
| DNA topological change | GO:0006265 | 6.1 | ||||||||
| DNA topoisomerase type I activity | GO:0003917 | 6.3 | ||||||||
| nucleosome | GO:0000786 | 6.5 | ||||||||
| nucleosome assembly | GO:0006334 | 6.5 | ||||||||
| nucleotide binding | GO:0000166 | 3.1 | ||||||||
| DNA repair | GO:0006281 | 2.6 | 4.8 | |||||||
| single-stranded DNA binding | GO:0003697 | 2.6 | 3.2 | 3 | 5.4 | |||||
| DNA replication | GO:0006260 | 5.4 | 2.5 | 2.6 | 3.2 | |||||
| DNA-templated transcription, termination | GO:0006353 | 7.8 | 2.5 | 2.9 |
Only enrichment folds ≥ 2.5 are displayed, for clarity.
Enrichment fold for amino acids within LCRs of bacterial proteins, related to protein folding and metal-ion binding
| GO description | GO ID | D | F | G | H | I | K | M |
|---|---|---|---|---|---|---|---|---|
| Unfolded protein binding | GO:0051082 | 5.7 | 2.6 | 4 | 21.6 | |||
| Protein refolding | GO:0042026 | 2.9 | 37.2 | |||||
| Protein folding | GO:0006457 | 7.4 | 6.3 | |||||
| Heat shock protein binding | GO:0031072 | 21.5 | 3 | 5.2 | ||||
| Metal ion binding | GO:0046872 | 4.9 | 3.1 | |||||
| Zinc ion binding | GO:0008270 | 4.5 | ||||||
| Nickel cation binding | GO:0016151 | 2.7 | 33 | |||||
| Metal ion transport | GO:0030001 | 5.3 | 21.6 | |||||
| Cobalamin biosynthetic process | GO:0009236 | 2.7 | 29 |
Only enrichment folds ≥ 2.5 are displayed, for clarity.
Confusion Matrix of the neural network
| | Actual Chaperones | Actual DNA or RNA binding | Actual Metal-ion binding | Actual Other | Precision |
|---|---|---|---|---|---|
| Predicted Chaperones | 34 | 1 | 0 | 4 | 87.18% |
| Predicted DNA or RNA binding | 0 | 120 | 0 | 39 | 75.47% |
| Predicted Metal-ion binding | 0 | 0 | 20 | 1 | 95.24% |
| Predicted Other | 0 | 57 | 1 | 1082 | 94.91% |
| Recall | 100% | 67.42% | 95.24% | 96.09% | |
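The precision and recall values in the confusion matrix can be reproduced from the raw counts. The sketch below uses the standard definitions (precision = correct predictions for a class divided by all predictions of that class; recall = correct predictions divided by all actual members of the class). The variable and function names are illustrative and are not part of the published tool.

```python
LABELS = ["Chaperones", "DNA or RNA binding", "Metal-ion binding", "Other"]

# Rows are predicted classes, columns are actual classes,
# in the same order as the table above.
CONFUSION = [
    [34, 1, 0, 4],
    [0, 120, 0, 39],
    [0, 0, 20, 1],
    [0, 57, 1, 1082],
]


def precision_recall(matrix):
    """Return per-class precision and recall for a square confusion matrix."""
    n = len(matrix)
    precision, recall = [], []
    for i in range(n):
        true_pos = matrix[i][i]
        predicted_total = sum(matrix[i])                    # row sum
        actual_total = sum(matrix[r][i] for r in range(n))  # column sum
        precision.append(true_pos / predicted_total if predicted_total else float("nan"))
        recall.append(true_pos / actual_total if actual_total else float("nan"))
    return precision, recall


precision, recall = precision_recall(CONFUSION)
for label, p, r in zip(LABELS, precision, recall):
    print(f"{label:20s} precision={p:.2%} recall={r:.2%}")
```

Read this way, the lowest figures belong to the nucleic-acid class: 57 actual DNA or RNA binders were predicted as Other (recall 67.42%), and 39 Other proteins were predicted as DNA or RNA binding (precision 75.47%).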