Uros Midic, Christopher J Oldfield, A Keith Dunker, Zoran Obradovic, Vladimir N Uversky.
Abstract
BACKGROUND: Intrinsically disordered proteins lack stable structure under physiological conditions, yet carry out many crucial biological functions, especially functions associated with regulation, recognition, signaling and control. Recently, human genetic diseases and related genes were organized into a bipartite graph (Goh KI, Cusick ME, Valle D, Childs B, Vidal M, et al. (2007) The human disease network. Proc Natl Acad Sci U S A 104: 8685-8690). This diseasome network revealed several significant features such as the common genetic origin of many diseases.
Year: 2009 PMID: 19594871 PMCID: PMC2709255 DOI: 10.1186/1471-2164-10-S1-S12
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Disease class names and acronyms, number of diseases and number of genes related to disease classes.
| Class name | Acronym | Number of diseases | % | Number of genes | % |
| Skeletal | SKEL | 64 | 4.98% | 56 | 3.20% |
| Bone | BONE | 30 | 2.34% | 44 | 2.51% |
| Dermatological | DERM | 48 | 3.74% | 80 | 4.57% |
| Cancer | CANC | 113 | 8.80% | 207 | 11.82% |
| Developmental | DEVE | 37 | 2.88% | 53 | 3.03% |
| Multi-class disease | MCD | 155 | 12.07% | 209 | 11.94% |
| Cardiovascular | CARD | 41 | 3.19% | 96 | 5.48% |
| Muscular | MUSC | 31 | 2.41% | 68 | 3.88% |
| Immunological | IMMU | 69 | 5.37% | 115 | 6.57% |
| Ophthalmological | OPHT | 62 | 4.83% | 120 | 6.85% |
| Connective tissue disorder | CTD | 28 | 2.18% | 51 | 2.91% |
| Endocrine | ENDO | 56 | 4.36% | 96 | 5.48% |
| Neurological | NEUR | 117 | 9.11% | 254 | 14.51% |
| Psychiatric | PSYC | 17 | 1.32% | 30 | 1.71% |
| Ear, Nose, Throat | ENT | 6 | 0.47% | 44 | 2.51% |
| Respiratory | RESP | 13 | 1.01% | 34 | 1.94% |
| Renal | RENA | 36 | 2.80% | 58 | 3.31% |
| Hematological | HEMA | 88 | 6.85% | 146 | 8.34% |
| Nutritional | NUTR | 4 | 0.31% | 22 | 1.26% |
| Gastrointestinal | GI | 23 | 1.79% | 34 | 1.94% |
| Unclassified | UNCL | 31 | 2.41% | 29 | 1.66% |
| Metabolic | META | 215 | 16.74% | 289 | 16.50% |
| Multiple class genes | MULT | | | 295 | 16.85% |
| Disease genes | DIS | | | 1751 | 100.00% |
| Human genes | HUM | | | 18109 | |
The same order of classes is used in graphs in the Results section; the first 22 classes are sorted in descending order with respect to the median of disorder content (defined in Experimental procedures). The difference between "multi-class diseases" and "multiple class genes" is that the "multi-class diseases" set includes genes that are related only to diseases classified as "multiple" in [47], whereas the "multiple class genes" set includes genes that are related to several diseases belonging to different classes.
Figure 1. Comparison of disorder content distributions in disease classes and human gene class using boxplots. The 22 disease classes are sorted according to their disorder content medians. The boxes in the boxplot represent the first quartile (left edge), median (line in the middle), and third quartile (right edge); the whiskers extend to the lowest/highest values within the 1.5 IQR interval from the box (IQR is the range between the first and the third quartile), while the + signs represent the outliers. Medians for two classes can be compared by looking at the notches at their median lines; if the notches do not overlap, the medians are different at the significance level α = 0.05.
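The quantities named in this caption can be computed directly. The sketch below is only an illustration of the boxplot summary described above: the quartile interpolation method and the notch half-width of 1.57 × IQR / √n are common plotting conventions assumed here, not details given in the source, and the sample values are made up.

```python
import math
import statistics


def boxplot_stats(values):
    """Summarise one class as a notched boxplot (quartiles, whiskers, outliers)."""
    data = sorted(values)
    # Quartile interpolation method is an assumption; plotting tools differ slightly.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    # Assumed notch convention: median +/- 1.57 * IQR / sqrt(n).
    half_notch = 1.57 * iqr / math.sqrt(len(data))
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whiskers": (inside[0], inside[-1]),  # extreme values inside the fences
        "outliers": [v for v in data if v < low_fence or v > high_fence],
        "notch": (median - half_notch, median + half_notch),
    }


def notches_overlap(a, b):
    """True when the notch intervals of two classes intersect."""
    return a["notch"][0] <= b["notch"][1] and b["notch"][0] <= a["notch"][1]


# Hypothetical disorder contents (in percent) for one small class.
example = boxplot_stats([5, 10, 15, 20, 25, 30, 35, 40, 95])
```

With these made-up values the single point at 95 falls outside the upper fence and would be drawn as a + sign.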
Figure 2. Comparison of disorder content distributions in disease classes and human gene class using stacked histograms. The histograms are stacked horizontally to save space. They show what fraction of genes in each class has disorder content within various ranges. Each of the five major ranges, that cover 20% each, is further split into two smaller 10% ranges (they use the same color, but are divided with a line). Distributions can be visually compared by observing the balance between darker and lighter shades of gray; the class with a darker histogram has on average more disorder content.
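The binning behind such a stacked histogram can be sketched as follows. The function name, the left-closed bin edges, and the sample values are assumptions made for illustration; the caption only states that there are ten 10% ranges grouped into five 20% ranges.

```python
def disorder_histogram(contents_percent):
    """Return (fractions per 10% range, fractions per 20% range) for one class."""
    counts = [0] * 10
    for c in contents_percent:
        # Assumed edges: [0,10), [10,20), ..., [90,100]; 100% goes into the last bin.
        counts[min(int(c // 10), 9)] += 1
    total = len(contents_percent)
    minor = [n / total for n in counts]
    # Each major 20% range is the sum of two adjacent 10% ranges.
    major = [minor[i] + minor[i + 1] for i in range(0, 10, 2)]
    return minor, major


# Hypothetical disorder contents (in percent) for five genes.
minor, major = disorder_histogram([0, 5, 10, 55, 100])
```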
Figure 3. Pairwise comparison of disorder content medians for disease classes and human gene class. Filled squares represent pairs for which adjusted Wilcoxon rank sum test p-values are smaller than α = 0.05 (p-values are adjusted for false discovery rate control with Benjamini-Yekutieli method). Squares are filled black if the median for the row class is greater than the median for the column class, or gray if the median for the row class is smaller than the median for the column class.
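A minimal sketch of one cell of this comparison matrix is given below. It uses the large-sample normal approximation to the two-sided Wilcoxon rank sum test without tie or continuity corrections, which is a simplification chosen here; the source does not say which variant or software was used. The helper names `rank_sum_test` and `cell_fill` are hypothetical, and the adjusted p-value is passed in because the adjustment depends on the full set of comparisons.

```python
import math
import statistics


def rank_sum_test(x, y):
    """Two-sided rank sum test via normal approximation; returns (z, p)."""
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        average_rank = (i + 1 + j) / 2  # mean of ranks i+1 .. j for tied values
        rank_sum_x += average_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    n1, n2 = len(x), len(y)
    expected = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # no tie correction (assumption)
    z = (rank_sum_x - expected) / sd
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


def cell_fill(row_values, col_values, adjusted_p, alpha=0.05):
    """Fill rule from the caption: black or gray only when the pair is significant."""
    if adjusted_p >= alpha:
        return "empty"
    if statistics.median(row_values) > statistics.median(col_values):
        return "black"
    return "gray"


z, p = rank_sum_test([1, 2, 3, 4, 5], [6, 7, 8, 9, 10])
```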
Comparison of disorder content medians of disease classes with disease gene set (DIS) and with human gene set (HUM).
| Comparison with DIS | | | Comparison with HUM | | |
| Class | p-value | BY-adjusted p-value | Class | p-value | BY-adjusted p-value |
| META | 9.10·10⁻³¹ | 7.81·10⁻²⁹ | META | 1.38·10⁻⁵⁰ | 1.25·10⁻⁴⁸ |
| CANC | 9.76·10⁻⁹ | 4.19·10⁻⁷ | DIS | 6.16·10⁻¹⁵ | 2.79·10⁻¹³ |
| SKEL | 3.92·10⁻⁵ | 0.001123 | HEMA | 7.13·10⁻⁸ | 2.15·10⁻⁶ |
| MCD | 0.000548 | 0.011771 | UNCL | 0.000192 | 0.004349 |
| DERM | 0.001852 | 0.031810 | CANC | 0.002397 | 0.043445 |
| HEMA | 0.003684 | 0.052740 | NUTR | 0.007141 | 0.107855 |
| UNCL | 0.004152 | 0.050941 | SKEL | 0.011080 | 0.143441 |
| DEVE | 0.008386 | 0.090036 | GI | 0.015816 | 0.179167 |
| NEUR | 0.033455 | 0.319267 | IMMU | 0.016768 | 0.168843 |
| BONE | 0.042742 | 0.367102 | RENA | 0.026136 | 0.236856 |
| NUTR | 0.063282 | 0.494113 | RESP | 0.093824 | 0.772967 |
| MULT | 0.090375 | 0.646849 | MULT | 0.105644 | 0.797813 |
| MUSC | 0.122463 | 0.809091 | DERM | 0.178919 | 1.247247 |
| GI | 0.130811 | 0.802516 | ENT | 0.195823 | 1.267578 |
| ENDO | 0.164391 | 0.941288 | DEVE | 0.208293 | 1.258409 |
Both the p-values and the adjusted p-values (for Benjamini-Yekutieli FDR control method) are listed in the table.
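The relationship between the two p-value columns can be illustrated with a short sketch. The adjusted values listed for the DIS comparison appear consistent with multiplying the k-th smallest p-value by m·c(m)/k, where c(m) = 1 + 1/2 + ... + 1/m and m = 23 (the 22 classes plus MULT). Both the value of m and the absence of a monotonicity step or a cap at 1 are inferences from the listed numbers, not statements from the source. Because only the 15 smallest p-values are listed, the remaining eight are padded with the placeholder 1.0, which does not affect the ranks of the listed ones.

```python
def by_scaled_pvalues(pvalues):
    """Benjamini-Yekutieli scaling p_(k) * m * c(m) / k, without monotonicity or capping."""
    m = len(pvalues)
    c_m = sum(1.0 / i for i in range(1, m + 1))  # harmonic-sum correction factor
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    for rank, idx in enumerate(order, start=1):
        adjusted[idx] = pvalues[idx] * m * c_m / rank
    return adjusted


# p-values from the "Comparison with DIS" columns, in table order (META ... ENDO).
listed = [9.10e-31, 9.76e-09, 3.92e-05, 0.000548, 0.001852, 0.003684, 0.004152,
          0.008386, 0.033455, 0.042742, 0.063282, 0.090375, 0.122463, 0.130811,
          0.164391]
padding = [1.0] * 8  # placeholders for the unlisted comparisons (assumption: m = 23)
adjusted = by_scaled_pvalues(listed + padding)
```

Under these assumptions the first fifteen entries of `adjusted` agree with the tabulated values to within the rounding of the listed p-values. Library implementations of this adjustment usually also enforce monotonicity and cap results at 1, so their output would differ for the later rows.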
Figure 4. Linear regression of disorder content with respect to number of related diseases (for genes). The genes with number of related diseases up to 4 are represented as a boxplot, while the remaining genes are represented as points. Note that the disorder content means (inverted triangles) for subsets are greater than the respective medians, because the disorder content distributions in these subsets are positively skewed.
Figure 5. Linear regression of disorder content with respect to number of related disease classes (for genes). The genes with number of related disease classes between 1 and 3 are represented as a boxplot, while the remaining genes are represented as points. Note that the disorder content means (inverted triangles) for subsets are greater than the respective medians, because the disorder content distributions in these subsets are positively skewed.
Figure 6. Linear regression of disorder content with respect to gene degree in DGN.
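The regression used in Figures 4 to 6 can be sketched as an ordinary least-squares line fit. The source does not state the fitting procedure, so ordinary least squares is an assumption, and all data values below are invented. The second part illustrates why the mean exceeds the median in a positively skewed subset.

```python
import statistics


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical genes: number of related diseases vs disorder content (percent).
slope, intercept = fit_line([1, 2, 3, 4], [10, 20, 30, 40])

# A positively skewed subset: one large value pulls the mean above the median.
skewed = [0, 5, 10, 15, 70]
mean_value = statistics.fmean(skewed)
median_value = statistics.median(skewed)
```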
Figure 7. Comparison of fractions of disease genes in the large component and the small components of the DGN. The classes with the + signs after their acronyms are significantly overrepresented in the big component; the classes with the – signs after their acronyms are significantly underrepresented in the big component. The error bars represent one standard deviation or 68.2% confidence interval.
Figure 8. Comparison of distributions of disorder content in LARGECOMP and SMALLCOMP for genes related to metabolic diseases, and for the whole disease gene set. Distribution of disorder content for human gene set is included for comparison. The error bars represent one standard deviation or 68.2% confidence interval.
Figure 9. Comparison of fractions of disease genes with multiple isoforms (i.e. with alternative splicing) and with single isoform. The classes with the + signs after their acronyms have significantly higher fraction of genes with multiple isoforms than the human gene set. The error bars represent one standard deviation or 68.2% confidence interval.
Figure 10. Comparison of disorder content distributions for whole proteins and for alternative splicing (AS) regions. Series #1–4 represent disorder content distributions in 1) disease genes with multiple isoforms, 2) non-disease genes with multiple isoforms, 3) human genes with multiple isoforms, 4) all human genes. Series #5 and #6 represent disorder content distributions for AS regions in disease genes, and AS regions in non-disease genes. The error bars represent one standard deviation or 68.2% confidence interval.
Figure 11. Comparison of disorder content distributions for AS regions in various classes of human genes. Series #1 and #2 represent disorder content distribution for whole human gene sequences and AS regions in human genes. Series #3–6 represent disorder content distributions for AS regions in: 3) developmental, 4) neurological, 5) hematological, and 6) metabolic disease classes. The error bars represent one standard deviation or 68.2% confidence interval.
Figure 12. Comparison of fractions of genes with predicted α-MoRFs and densities of α-MoRFs with fractions of disordered residues. The plot on the left compares fractions of genes with predicted α-MoRFs (top/first series) with fractions of disordered residues (bottom/second series). The plot on the right compares densities of α-MoRFs (top/first series) with fractions of disordered residues (bottom/second series). In both plots the series are shown with different scales, such that the values for HUM set are aligned. The error bars represent one standard deviation or 68.2% confidence interval.
Figure 13. Comparison of overall density of predicted MoRFs vs density of predicted MoRFs in AS regions for 25 classes/sets. The error bars represent one standard deviation or 68.2% confidence interval.
Comparison of densities of predicted α-MoRFs in AS regions and complete genes.
| Class acronym | Density of MoRFs in AS/Overall density of MoRFs | p-value for comparison of densities of MoRFs |
| GI | 4.68 | 3.84·10⁻²¹ |
| META | 3.71 | 1.16·10⁻⁶¹ |
| RESP | 2.65 | 1.19·10⁻¹⁰ |
| NEUR | 2.45 | 2.54·10⁻⁷⁶ |
| DEVE | 2.37 | 4.76·10⁻¹⁷ |
| BONE | 2.10 | 1.28·10⁻¹⁰ |
| NUTR | 2.06 | 0.017694 |
| DERM | 1.75 | 5.11·10⁻¹⁰ |
| ENDO | 1.71 | 2.93·10⁻¹¹ |
| IMMU | 1.69 | 1.92·10⁻¹³ |
| CANC | 1.68 | 1.07·10⁻³¹ |
| ENT | 1.50 | 0.00079422 |
| HEMA | 1.48 | 4.65·10⁻⁷ |
| HUM | 1.30 | 6.12·10⁻²³³ |
| DIS | 1.27 | 1.10·10⁻³⁰ |
| RENA | 1.21 | 0.55174 |
| MCD | 1.08 | 0.55638 |
| SKEL | 1.06 | 2.8735 |
| MULT | 0.99 | 3.0621 |
| OPHT | 0.78 | 0.049566 |
| CARD | 0.72 | 3.30·10⁻⁷ |
| MUSC | 0.66 | 8.89·10⁻⁹ |
| CTD | 0.32 | 3.78·10⁻⁶ |
| PSYC | 0 | 2.07·10⁻⁵ |
| UNCL | 0 | 0.56375 |
The quotients of density of predicted α-MoRFs in AS regions over overall density of predicted α-MoRFs and the p-values for comparison of corresponding densities.
Figure 14. Fractions of genes predicted to be disordered by CDF and CH predictors. The error bars represent one standard deviation or 68.2% confidence interval.
Figure 15. Comparison of CDF and CH predictions in various disease gene classes and gene sets. Each spot represents a gene whose coordinates were calculated as the distance of the corresponding point in the CH-plot from the boundary (x-coordinate) and the averaged distance of the corresponding CDF-curve from the CDF boundary (y-coordinate). Four quadrants in each plot correspond to the following predictions: (-,-) proteins predicted to be disordered by CDF, but compact by CH, (-,+) proteins predicted to be disordered by both methods, (+,-) ordered proteins, (+,+) proteins predicted to be disordered by CH, but ordered by CDF. This is further illustrated by an explanatory plot at the bottom right corner. Percentages represent the fractions of genes in the corresponding quadrants.
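The quadrant assignment can be written out as a small classifier. One observation drives the sketch: the four descriptions in the caption are mutually consistent only if the first sign in each pair refers to the CDF distance and the second to the CH distance (negative CDF distance meaning disordered, positive CH distance meaning disordered). That reading is an inference made here, as are the function names and the treatment of points lying exactly on a boundary.

```python
from collections import Counter


def ch_cdf_call(cdf_distance, ch_distance):
    """Label one gene from its signed distances to the CDF and CH boundaries."""
    cdf_disordered = cdf_distance < 0   # assumed: negative CDF distance = disordered
    ch_disordered = ch_distance > 0     # assumed: positive CH distance = disordered
    if cdf_disordered and ch_disordered:
        return "disordered by both"                  # (-,+)
    if cdf_disordered:
        return "disordered by CDF, compact by CH"    # (-,-)
    if ch_disordered:
        return "disordered by CH, ordered by CDF"    # (+,+)
    return "ordered"                                 # (+,-)


def quadrant_fractions(points):
    """Fraction of genes per label for a list of (cdf_distance, ch_distance) pairs."""
    counts = Counter(ch_cdf_call(cdf, ch) for cdf, ch in points)
    return {label: n / len(points) for label, n in counts.items()}


# Hypothetical genes, one per quadrant.
fractions = quadrant_fractions([(-0.2, -0.1), (-0.3, 0.2), (0.1, -0.4), (0.2, 0.3)])
```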
Disease-related genes with multiple predicted α-MoRFs.
| Class | Genes with multiple predicted α-MoRFs |
| SKEL | FGFR1 (2, 901, 38.5%) |
| BONE | GNAS (3, 1323, 74.9%), COL9A1 (2, 945, 76.5%), AMELX (2, 205, 66.8%) |
| DERM | EDA (4, 460, 63.9%), PLEC1 (3, 4904, 56.6%), ADAR (2, 1226, 53.6%), SLC39A4 (2, 686, 36.3%), PVRL1 (2, 658, 37.2%) |
| CANC | BRCA1 (3, 1864, 80.6%), GNAS (3, 1323, 74.9%), FAS (3, 376, 59.3%), MXI1 (3, 320, 88.8%), ATM (2, 3056, 21.1%), DLC1 (2, 1591, 62.7%), PML (2, 1225, 63.9%), ABL1 (2, 1175, 59.2%), AR (2, 927, 60.0%), CHEK2 (2, 586, 37.2%), CASP8 (2, 548, 38.5%), PARK2 (2, 465, 37.0%), SMARCB1 (2, 385, 37.1%), SSX2 (2, 255, 77.3%) |
| DEVE | NSD1 (2, 2706, 77.3%), UBE3A (2, 882, 27.8%), PVRL1 (2, 658, 37.2%), TGIF1 (2, 424, 77.4%) |
| MCD | MITF (5, 598, 75.6%), GNAS (3, 1323, 74.9%), PITX2 (3, 385, 81.0%), NSD1 (2, 2706, 77.3%), ATRX (2, 2492, 72.2%), COL11A1 (2, 1857, 81.3%), COL18A1 (2, 1551, 74.4%), L1CAM (2, 1257, 27.0%), USH1C (2, 926, 58.4%), FGFR1 (2, 901, 38.5%), HPS4 (2, 783, 42.9%), KCNQ1 (2, 718, 42.3%), PVRL1 (2, 658, 37.2%), DTNBP1 (2, 383, 81.7%) |
| CARD | DMD (7, 3771, 54.4%), DTNA (4, 767, 61.0%), KCNH2 (2, 1283, 46.0%), KCNQ1 (2, 718, 42.3%), EYA4 (2, 665, 61.7%), TPM1 (2, 443, 100.0%) |
| MUSC | DMD (7, 3771, 54.4%), PLEC1 (3, 4904, 56.6%), COL6A3 (2, 3177, 28.6%), AR (2, 927, 60.0%), CHAT (2, 748, 34.1%), TPM3 (2, 378, 97.6%) |
| IMMU | FAS (3, 376, 59.3%), ATM (2, 3056, 21.1%), PTPRC (2, 1307, 42.2%), CASP8 (2, 548, 38.5%), PARK2 (2, 465, 37.0%), UNG (2, 348, 44.5%) |
| OPHT | OPA1 (3, 1015, 36.1%), PITX2 (3, 385, 81.0%), PIP5K3 (2, 2108, 52.0%), EYA1 (2, 600, 60.5%) |
| CTD | -- |
| ENDO | GNAS (3, 1323, 74.9%), AR (2, 927, 60.0%), HNF4A (2, 531, 51.6%), GCK (2, 495, 35.6%) |
| NEUR | COLQ (3, 622, 76.0%), PTPRC (2, 1307, 42.2%), L1CAM (2, 1257, 27.0%), KCNQ2 (2, 892, 55.5%), FOXP2 (2, 740, 78.9%), MTMR2 (2, 643, 31.6%), SPAST (2, 616, 51.1%), EYA1 (2, 600, 60.5%), NR4A2 (2, 599, 55.8%), EIF2B4 (2, 555, 47.2%), CACNB4 (2, 538, 60.0%), OPRM1 (2, 492, 34.6%), CCM2 (2, 475, 54.1%), PARK2 (2, 465, 37.0%), DCX (2, 446, 57.8%), DRD2 (2, 443, 39.1%), PNKD (2, 440, 39.5%), ATXN3 (2, 370, 61.6%), FGF14 (2, 316, 46.2%) |
| PSYC | -- |
| ENT | OTOF (2, 2100, 33.9%), USH1C (2, 926, 58.4%), KCNQ4 (2, 695, 41.3%), EYA4 (2, 665, 61.7%) |
| RESP | GDNF (2, 230, 51.3%) |
| RENA | -- |
| HEMA | CD44 (3, 807, 76.0%), ATRX (2, 2492, 72.2%), ANK1 (2, 2001, 40.9%), ADAMTS13 (2, 1497, 48.6%), EPB41 (2, 850, 66.2%), AMPD3 (2, 781, 36.9%), IGLL1 (2, 228, 86.4%) |
| NUTR | -- |
| GI | GDNF (2, 230, 51.3%) |
| UNCL | -- |
| META | GCK (2, 495, 35.6%), HFE2 (2, 426, 42.0%) |
The class identifier is followed by the number of genes with predicted α-MoRFs, the total number of genes, and the fraction of genes with predicted α-MoRFs in that class. This is followed by the list of genes with multiple (>= 2) predicted α-MoRFs. The numbers following each gene symbol are the number of predicted α-MoRFs, the number of amino acids and the disorder content (note that in the case of alternative splicing the number of amino acids includes all exons in a gene and may be larger than the lengths of individual isoforms).