Hiroto Anbo, Masaya Sato, Atsushi Okoshi, Satoshi Fukuchi.
Abstract
One of the unique characteristics of intrinsically disordered proteins (IDPs) is the existence of functional segments in intrinsically disordered regions (IDRs). A typical function of these segments is binding to partner molecules, such as proteins and DNA. These segments play important roles in signaling pathways and transcriptional regulation. We conducted a bioinformatics analysis to search for these functional segments based on IDR predictions and database annotations. We found more than a thousand potential functional IDR segments in disease-related proteins. Large fractions of the proteins related to cancers, congenital disorders, digestive system diseases, and reproductive system diseases have these functional IDRs. Some proteins in nervous system diseases have long functional segments in IDRs. Detailed analysis of some of these regions showed that the functional segments are located in experimentally verified IDRs. Proteins with functional IDR segments tend to shuttle between the cytoplasm and the nucleus. Proteins involved in multiple diseases tend to have more protein-protein interactors, suggesting that hub proteins in protein-protein interaction networks can have multiple impacts on human diseases.
Keywords: disease-related proteins; functional segments; intrinsically disordered regions; protein-protein interaction; subcellular location
Year: 2019 PMID: 30841624 PMCID: PMC6468909 DOI: 10.3390/biom9030088
Source DB: PubMed Journal: Biomolecules ISSN: 2218-273X
Table 1. Statistics of the UniProt annotations.
| | All Proteins | Disease-Related | pProS | pProS (%) |
|---|---|---|---|---|
| No. proteins | 20,410 | 3378 | 402 | 11.9 |
| No. annotations shorter than 30 residues | 29,145 | 18,450 | 1124 | 6.1 |
| “Region of interest” | 4646 | 2656 | 220 | 8.3 |
| “Mutagenesis site” | 21,269 | 14,056 | 479 | 3.4 |
| “Short sequence motif” | 3230 | 1740 | 425 | 24.4 |
pProS: Possible protean segment.
Figure 1. An example of the possible protean segment (pProS) definition, illustrated by p53. The black line in the middle represents the amino acid chain, and the intrinsically disordered region (IDR) predictions are presented below it. Pink, orange, and red represent the results of MobiDB-lite, DISOPRED3, and DICHOT, respectively. Regions that any two of the three methods predict as disordered are defined as IDRs. The green bars represent pProSs, and the annotations defining each pProS are shown above it with the residue numbers of the annotations. DBD: DNA-binding domain; Tet: Tetramerization domain.
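The definition in the Figure 1 caption can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' pipeline: the function names, the 1-based inclusive span format, and the toy inputs are hypothetical. Two rules are assumptions: an annotation qualifies only if it is shorter than 30 residues (following the "annotations shorter than 30 residues" row in the annotation statistics), and only if it lies entirely inside the consensus IDR.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # 1-based, inclusive residue range (assumed format)


def spans_to_mask(spans: List[Span], length: int) -> List[bool]:
    """Convert residue ranges into a per-residue boolean mask."""
    mask = [False] * length
    for start, end in spans:
        for i in range(start - 1, end):
            mask[i] = True
    return mask


def consensus_idr(predictions: List[List[Span]], length: int,
                  min_votes: int = 2) -> List[bool]:
    """Mark a residue as disordered when at least `min_votes` predictors agree."""
    masks = [spans_to_mask(p, length) for p in predictions]
    return [sum(m[i] for m in masks) >= min_votes for i in range(length)]


def possible_pros(annotations: List[Span], idr_mask: List[bool],
                  max_len: int = 30) -> List[Span]:
    """Keep annotations shorter than `max_len` that lie fully inside the IDR.

    Full containment is an assumption; an overlap rule would also be plausible.
    """
    selected = []
    for start, end in annotations:
        if end - start + 1 >= max_len:
            continue  # too long to count as a pProS under the assumed cutoff
        if all(idr_mask[start - 1:end]):
            selected.append((start, end))
    return selected


# Toy 20-residue protein with three hypothetical predictor outputs.
toy_predictions = [[(1, 10)], [(5, 15)], [(12, 20)]]
toy_mask = consensus_idr(toy_predictions, length=20)
toy_pros = possible_pros([(6, 9), (9, 13), (1, 3)], toy_mask)
print(toy_pros)
```

In the toy case, residues 5-10 and 12-15 receive two votes each, so only the annotation at 6-9 is retained; the one at 9-13 is rejected because residue 11 is supported by a single predictor.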
Table 2. Statistics of pProS by disease category.
| Category | No. Unique pProSs | No. Proteins with pProS | No. Proteins | Protein Coverage (%) | Average Annotations |
|---|---|---|---|---|---|
| Cancers | 147 | 57 | 204 | 27.9 | 2.6 |
| Cardiovascular diseases | 93 | 41 | 335 | 12.2 | 2.3 |
| Congenital disorders of metabolism | 53 | 40 | 687 | 5.8 | 1.3 |
| Congenital malformations | 242 | 111 | 832 | 13.3 | 2.2 |
| Digestive system diseases | 32 | 15 | 79 | 19.0 | 2.1 |
| Endocrine and metabolic diseases | 63 | 30 | 213 | 14.1 | 2.1 |
| Immune system diseases | 56 | 31 | 256 | 12.1 | 1.8 |
| Musculoskeletal diseases | 69 | 26 | 149 | 17.4 | 2.7 |
| Nervous system diseases | 199 | 95 | 795 | 11.9 | 2.1 |
| Other congenital disorders | 39 | 20 | 91 | 22.0 | 2.0 |
| Reproductive system diseases | 21 | 12 | 63 | 19.0 | 1.8 |
| Respiratory diseases | 1 | 1 | 55 | 1.8 | 1.0 |
| Skin diseases | 22 | 14 | 104 | 13.5 | 1.6 |
| Urinary system diseases | 33 | 9 | 66 | 13.6 | 3.7 |
| Other diseases | 68 | 30 | 194 | 15.5 | 2.3 |
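The two derived columns in the table above appear to follow from the count columns: protein coverage matches the proteins with pProS divided by all proteins in the category, and the average annotations value matches the unique pProSs divided by the proteins with pProS. The short Python check below reproduces both for two rows; the function names are illustrative, and the formulas are inferred from the numbers rather than stated in the source.

```python
def protein_coverage(n_with_ppros: int, n_proteins: int) -> float:
    """Percentage of proteins in a category that carry at least one pProS."""
    return round(100.0 * n_with_ppros / n_proteins, 1)


def average_annotations(n_unique_ppros: int, n_with_ppros: int) -> float:
    """Mean number of unique pProSs per pProS-containing protein (inferred)."""
    return round(n_unique_ppros / n_with_ppros, 1)


# Rows taken from the table: (unique pProSs, proteins with pProS, proteins).
rows = {
    "Cancers": (147, 57, 204),
    "Urinary system diseases": (33, 9, 66),
}

derived = {
    name: (protein_coverage(with_p, total), average_annotations(unique, with_p))
    for name, (unique, with_p, total) in rows.items()
}
print(derived)
```

For Cancers this gives 27.9% coverage and 2.6 annotations per protein, and for urinary system diseases 13.6% and 3.7, in agreement with the tabulated values.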
Figure 2. The IDR ratios by disease category. The green, yellow, and blue bars represent the IDR fractions of the pProS-containing proteins, the non-pProS proteins, and the total proteins in each of the disease categories, respectively. The left axis shows the IDR fraction. The black line with dots represents the protein coverage found in Table 2, which is measured on the right axis. The dashed line represents the IDR ratio of the human proteome. Can: Cancers; Car: Cardiovascular diseases; Dme: Congenital disorders of metabolism; Mal: Congenital malformations; Dig: Digestive system diseases; End: Endocrine and metabolic diseases; Imm: Immune system diseases; Mus: Musculoskeletal diseases; Ner: Nervous system diseases; Oco: Other congenital disorders; Rep: Reproductive system diseases; Res: Respiratory diseases; Ski: Skin diseases; Uri: Urinary system diseases; Oth: Other diseases.
Table 3. The list of proteins with long pProSs.
| Protein Name | UniProt Accession | pProS Residues | No. Disease | Disease Category | Disease |
|---|---|---|---|---|---|
| DNA excision repair protein ERCC-6 | Q03468 | 63 | 4 | Ner | Age-related macular degeneration |
| | | | | Ner | Cockayne syndrome |
| | | | | Mal | Disorders of nucleotide excision repair |
| | | | | Ski | Ultraviolet-sensitive syndrome |
| Cellular tumor antigen p53 | P04637 | 61 | 46 | * | |
| E3 ubiquitin-protein ligase RNF168 | Q8IYW5 | 55 | 1 | Imm | RIDDLE syndrome |
| CD2-associated protein | Q9Y5K6 | 50 | 1 | Uri | Focal segmental glomerulosclerosis |
| Synaptic functional regulator FMR1 | Q06787 | 47 | 3 | Rep | Premature ovarian failure |
| Low-density lipoprotein receptor-related protein 2 | P98164 | 44 | 1 | Mal | Donnai–Barrow syndrome |
| Eukaryotic translation initiation factor 4 gamma 1 | Q04637 | 44 | 1 | Ner | Parkinson disease |
| DNA (cytosine-5)-methyltransferase 1 | P26358 | 41 | 1 | Ner | Hereditary sensory and autonomic neuropathy |
| Period circadian protein homolog 2 | O15055 | 40 | 1 | Ner | Familial advanced sleep phase syndrome |
| Latent-transforming growth factor β-binding protein 2 | Q14767 | 40 | 1 | Ner | Primary congenital glaucoma |
| Low-density lipoprotein receptor-related protein 6 | O75581 | 39 | 3 | Can | Breast cancer |
| | | | | Car | Coronary artery disease |
| | | | | Dig | Tooth agenesis |
| DNA damage-inducible transcript 3 protein | P35638 | 39 | 1 | Can | Myxoid liposarcoma |
| KN motif and ankyrin repeat domain-containing protein 1 | Q14678 | 39 | 1 | Ner | Spastic quadriplegic cerebral palsy |
| Retinoic acid-induced protein 1 | Q7Z5J4 | 36 | 1 | Oco | Smith–Magenis syndrome |
| Histone-lysine | O14686 | 35 | 2 | Can | Follicular lymphoma |
| | | | | Mal | Kabuki syndrome |
| FYN-binding protein 1 | O15117 | 35 | 1 | Car | Thrombocytopenia |
| Low-density lipoprotein receptor adapter protein 1 | Q5SW96 | 33 | 1 | Dme | Familial autosomal recessive hypercholesterolemia |
| Catenin β-1 | P35222 | 32 | 8 | Can | Thyroid cancer |
| | | | | Can | Medulloblastoma |
| | | | | Can | Endometrial cancer |
| | | | | Can | Colorectal cancer |
| | | | | Can | Gastric cancer |
| | | | | Can | Hepatocellular carcinoma |
| | | | | Oth | Autosomal dominant mental retardation |
| | | | | Ski | Pilomatricoma |
| Sp110 nuclear body protein | Q9HB58 | 31 | 1 | Dig | Hepatic veno-occlusive disease with immunodeficiency |
| Low-density lipoprotein receptor-related protein 5 | O75197 | 30 | 6 | Mal | Osteopetrosis |
| | | | | Mal | Worth type autosomal dominant osteosclerosis |
| | | | | Mal | Osteoporosis-pseudoglioma syndrome |
| | | | | Mus | Hyperostosis corticalis generalisata |
| | | | | Mus | Osteoporosis |
| | | | | Ner | Familial exudative vitreoretinopathy |
| LEM domain-containing protein 2 | Q8NC56 | 30 | 1 | Ner | Cataract |
| Single-stranded DNA cytosine deaminase | Q9GZX7 | 30 | 1 | Imm | Hyper IgM syndromes, autosomal recessive type |
* The list of diseases involving p53 is found in Supplementary Table S2. Can: Cancers; Car: Cardiovascular diseases; Dme: Congenital disorders of metabolism; Mal: Congenital malformations; Dig: Digestive system diseases; End: Endocrine and metabolic diseases; Imm: Immune system diseases; Mus: Musculoskeletal diseases; Ner: Nervous system diseases; Oco: Other congenital disorders; Rep: Reproductive system diseases; Res: Respiratory diseases; Ski: Skin diseases; Uri: Urinary system diseases; Oth: Other diseases.
Figure 3. Examples of proteins with pProS. The black line in the middle represents the amino acid chain, and the IDR predictions are presented below it. Pink, orange, and red represent the results of MobiDB-lite, DISOPRED3, and DICHOT, respectively. The green bars represent pProSs, and the annotations defining each pProS are shown above it with the residue numbers of the annotations. The gray bar in the example of survival of motor neuron (SMN) represents the region of a pseudo-pProS, which was not taken as a pProS because the annotated region is longer than 30 residues. In the case of low-density lipoprotein receptor adaptor protein 1 (ARH) and desmin, MobiDB-lite does not predict any IDRs. The scale of eIF4G1 (eukaryotic translation initiation factor 4 gamma 1) differs from that of the other three.
Figure 4. Subcellular localizations by disease category. The bars represent the degree of over-representation in each of the location categories, where green represents pProS-containing proteins and yellow represents non-pProS proteins (see also Materials and Methods). N: Nucleus; C: Cytoplasm; M: Membrane; CN: Cytoplasm and nucleus; Can: Cancers; Car: Cardiovascular diseases; Dme: Congenital disorders of metabolism; Mal: Congenital malformations; Dig: Digestive system diseases; End: Endocrine and metabolic diseases; Imm: Immune system diseases; Mus: Musculoskeletal diseases; Ner: Nervous system diseases; Oco: Other congenital disorders; Rep: Reproductive system diseases; Ski: Skin diseases; Uri: Urinary system diseases; Oth: Other diseases; All: All disease-related proteins.
Figure 5. The correlation between the number of protein–protein interactions and the number of diseases involved. The horizontal axis represents the number of diseases, and the vertical axis represents the number of interactors. Each box and its pair of whiskers represent the quartiles, and the line in the middle of the box represents the median. The dots represent outliers.
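The box-plot summary described for Figure 5 can be reproduced, in outline, by grouping proteins by their number of associated diseases and computing the quartiles of the interactor counts in each group. The sketch below uses only the Python standard library. The record format, the function name, and the toy numbers are hypothetical, and the "inclusive" quantile method is an assumption, since the source does not say how the quartiles were computed.

```python
from collections import defaultdict
from statistics import quantiles


def quartiles_by_disease_count(records):
    """Group (n_diseases, n_interactors) pairs and return Q1, median, Q3 per group."""
    groups = defaultdict(list)
    for n_diseases, n_interactors in records:
        groups[n_diseases].append(n_interactors)

    summary = {}
    for n_diseases, values in sorted(groups.items()):
        if len(values) < 2:
            continue  # statistics.quantiles needs at least two data points
        q1, median, q3 = quantiles(values, n=4, method="inclusive")
        summary[n_diseases] = (q1, median, q3)
    return summary


# Toy data only: one (number of diseases, number of interactors) pair per protein.
toy_records = [(1, 1), (1, 2), (1, 3), (1, 4), (1, 5),
               (2, 10), (2, 20), (2, 30), (3, 50)]
toy_summary = quartiles_by_disease_count(toy_records)
print(toy_summary)
```

Groups with a single protein are skipped because quartiles are undefined for them; a plotting library would typically draw such a group as a single line.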