Jiabin Wang1, Jian Yang1, Song Mao1, Xiaoqiang Chai1, Yuling Hu1, Xugang Hou1, Yiheng Tang1, Cheng Bi1, Xiao Li1.
Abstract
The mitochondrion plays a central role in diverse biological processes in most eukaryotes, and its dysfunction is critically involved in a large number of diseases and in the aging process. Systematic identification of mitochondrial proteomes and characterization of the functional linkages among mitochondrial proteins are fundamental to understanding the mechanisms underlying biological functions and human diseases associated with mitochondria. Here we present MitProNet, a database that provides a comprehensive knowledgebase for the mitochondrial proteome, interactome and human diseases. First, an inventory of mammalian mitochondrial proteins was compiled by widely collecting proteomic datasets, and the proteins were classified by machine learning to obtain a high-confidence list of mitochondrial proteins. The current version of MitProNet covers 1124 high-confidence proteins, and the remaining proteins were further classified as middle- or low-confidence. An organelle-specific network of functional linkages among mitochondrial proteins was then generated by integrating genomic features encoded by a wide range of datasets, including genomic context, gene expression profiles, protein-protein interactions, functional similarity and metabolic pathways. The functional-linkage network should be a valuable resource for the study of the biological functions of mitochondrial proteins and of human mitochondrial diseases. Furthermore, we used the network to predict candidate genes for mitochondrial diseases with prioritization algorithms. All proteins, functional linkages and disease candidate genes in MitProNet were annotated according to the information collected from their original sources, including GO, GEO, OMIM, KEGG, MIPS and HPRD. MitProNet features a user-friendly graphical visualization interface for presenting functional analyses of linkage networks.
As an up-to-date database and analysis platform, MitProNet should be particularly helpful in comprehensive studies of complicated biological mechanisms underlying mitochondrial functions and human mitochondrial diseases. MitProNet is freely accessible at http://bio.scu.edu.cn:8085/MitProNet.
Year: 2014 PMID: 25347823 PMCID: PMC4210245 DOI: 10.1371/journal.pone.0111187
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. A flowchart depicting the work.
(A) Step 1: obtaining an inventory of mitochondrial proteins using machine-learning classification. (B) Step 2: constructing the functional linkage network (FLN) by integrating 11 genomic features (protein-protein interaction, domain-domain interaction, shared domains, genomic context, genetic interaction, phenotypic semantic similarity, co-expression, GO semantic similarity, protein expression profiles, disease involvement and operon) with a naïve Bayes model. (C) Step 3: ranking the disease candidate genes using the FLN and a network-based algorithm. The table on the right shows the ranking scores of the top 5 candidate genes for mitochondrial complex I deficiency.
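Step 2 relies on a naïve Bayes model to merge evidence from several features. The following is a minimal sketch of that idea, assuming each feature supplies a likelihood ratio (LR) for a protein pair and that the features are conditionally independent, so the LRs multiply. The feature keys, LR values and prior odds are hypothetical illustrations and are not taken from MitProNet.

```python
import math

def combined_log_lr(feature_lrs):
    """Sum the log likelihood ratios of all features for one protein pair.

    Under the naive Bayes independence assumption the per-feature LRs
    multiply, which is a sum in log space.
    """
    return sum(math.log(lr) for lr in feature_lrs.values())

def posterior_odds(prior_odds, feature_lrs):
    """Posterior odds of a functional linkage = prior odds * product of LRs."""
    return prior_odds * math.exp(combined_log_lr(feature_lrs))

# Hypothetical evidence for a single protein pair (values are made up).
pair_evidence = {"ppi": 8.0, "co_expression": 2.5, "shared_domains": 0.5}
combined_lr = math.exp(combined_log_lr(pair_evidence))
odds = posterior_odds(0.01, pair_evidence)
```

Working in log space avoids numerical underflow when many features with small LRs are combined.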
Integrated mitochondrial proteomic datasets for an inventory of mammalian mitochondrial proteins.
| Datasets | Species | Number of Proteins | Tissue/organ/cell | Method |
| Calvo S et al. | H. sapiens | 1048 | | Prediction |
| Taylor SW et al. | H. sapiens | 600 | Heart | MS |
| Rezaul K et al. | H. sapiens | 656 | T leukemia cells | MS |
| Xie J et al. | H. sapiens | 180 | Immortalized lymphoblastoid cell lines | 2-GE |
| Ozawa T et al. | M. musculus | 48 | Cell line BNL1ME (liver) | GFP |
| Mootha VK et al. | M. musculus | 462 | Brain, heart, kidney, and liver | MS |
| Jin J et al. | M. musculus | 781 | Dopaminergic cells | MS |
| Kislinger T et al. | M. musculus | 1872 | Brain, heart, kidney, liver, lung, and placenta | MS |
| Da Cruz S et al. | M. musculus | 97 | Liver | MS |
| Johnson DT et al. | R. norvegicus | 292 | Brain, liver, heart, and kidney | MS |
| Forner F et al. | R. norvegicus | 503 | Muscle, heart, and liver | MS |
| Reifschneider NH et al. | R. norvegicus | 110 | Kidney, Liver, Heart, Skeletal Muscle and Brain | BN |
| Palmfeldt J et al. | H. sapiens | 2591 | Skin fibroblast | MS |
| Lefort N et al. | H. sapiens | 892 | Skeletal muscle | MS |
| Bousette N et al. | M. musculus | 2087 | Heart | MS |
| Fang X et al. | M. musculus | 2165 | Brain | MS |
| Zhang J et al. | M. musculus | 916 | Heart | MS |
| Deng WJ et al. | R. norvegicus | 624 | Liver | MS |
| Wu L et al. | H. sapiens | 1149 | T leukemia cells | MS |
| Catherman AD et al. | H. sapiens | 1326 | H1299 cells | MS |
| Hansen J et al. | H. sapiens | 2138 | Human lymphoblastoid cells | MS |
| Chappell NP et al. | H. sapiens | 1523 | Epithelial ovarian cancer cell | MS |
| Chen X et al. | R. norvegicus | 1215 | rat INS-1 cells | MS |
MS, mass spectrometry. 2-GE, two-dimensional gel electrophoresis. GFP, green fluorescent protein. BN, blue-native.
Quality comparison of MitoCom with other mitochondrial databases.
| Database | Number | Sensitivity | False discovery rate |
| MitoCom | 1109 | 97.34% | 11.30% |
| MitoCarta | 1013 | 86.10% | 13.70% |
| MitoPred | 910 | 50.10% | 14.80% |
*Just the high-confidence proteins.
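The two quality measures in this table can be illustrated with a short sketch. It assumes that sensitivity is the fraction of a gold-standard positive set recovered by a database, and that the false discovery rate is the fraction of false positives among the predictions that fall in either reference set. The record does not state how the reference sets were built, so the sets and protein names below are hypothetical.

```python
def sensitivity(predicted, gold_positives):
    """Fraction of known mitochondrial proteins that the inventory recovers."""
    if not gold_positives:
        raise ValueError("gold_positives must not be empty")
    true_pos = len(predicted & gold_positives)
    return true_pos / len(gold_positives)

def false_discovery_rate(predicted, gold_positives, gold_negatives):
    """Fraction of evaluated predictions that are known non-mitochondrial."""
    true_pos = len(predicted & gold_positives)
    false_pos = len(predicted & gold_negatives)
    if true_pos + false_pos == 0:
        raise ValueError("no prediction overlaps the reference sets")
    return false_pos / (true_pos + false_pos)

# Hypothetical toy sets (protein names are placeholders).
predicted = {"p1", "p2", "p3", "p4", "n1"}
gold_positives = {"p1", "p2", "p3", "p4", "p5"}
gold_negatives = {"n1", "n2", "n3"}

sens = sensitivity(predicted, gold_positives)
fdr = false_discovery_rate(predicted, gold_positives, gold_negatives)
```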
Figure 2. Venn diagram of the four datasets: MitoCom (high-confidence), MitoCom (middle-confidence), MitoCarta and MitoPred.
Functional features for mammalian mitochondrial FLN construction.
| Functional features | Data sets | Description | Scale | Data source |
| Protein-protein interaction | | Protein-protein interaction. | Genome-scale | HPRD |
| Domain-domain interaction | | Protein pairs have interacting protein domains. | Genome-scale | 3did |
| Shared domains | | Protein pairs sharing the same protein domains. | Genome-scale | InterPro |
| Genomic context | Rosetta Stone | Gene fusion events. | Genome-scale | Prolinks |
| | Phylogenetic profiles | Phylogenetic profiles. | Genome-scale | NCBI, KEGG |
| Genetic interaction | | Mutations in two genes produce a phenotype that is greatly different from each mutation's individual effects. | Genome-scale | Saccharomyces Genome Database |
| Phenotypic semantic similarity | | Semantic similarity of mouse phenotypic terms. | Genome-scale | Mammalian Phenotype Browser |
| Co-expression | GSE1133 | Gene expression profile of the vast majority of protein-encoding human and mouse genes in 79 human and 61 mouse tissues. | Genome-scale | GEO |
| | GSE4726 | A quantitative and comprehensive atlas of gene expression in mouse development. | Genome-scale | GEO |
| | GSE4330 | Microarray time-course of mouse myotubes transduced with the transcriptional co-activator PGC-1α, which is known to induce mitochondrial biogenesis in muscle cells. | Mitochondria-specific | GEO |
| | GSE6210 | Gene expression profile in liver tissue and quadriceps muscle in mice between control and the mutant of PGC-1β, a transcriptional coactivator that potently stimulates mitochondrial biogenesis and respiration of cells. | Mitochondria-specific | GEO |
| GO semantic similarity | | GO semantic similarity of genes sharing the same biological process terms. | Genome-scale | The Gene Ontology |
| Protein expression profiles | | Mitochondrial protein profiles of protein-coding genes in heart, brain, liver, kidney and lung. | Mitochondria-specific | Results of Thomas Kislinger et al. |
| Disease involvement | | A pair of genes annotated in the same disease. | Mitochondria-specific | OMIM |
| Operon | | Operon data of | Mitochondria-specific | Database of prOkaryotic OpeRons |
Figure 3. ROC curves for evaluating the performance of various data sources using cross-validation.
(A) ROC curves and AUC of the individual datasets and the integrated dataset. The data sources are highlighted in different colors. (B) ROC curves and AUC of mitochondria-specific (green) and genome-scale (blue) datasets. ID: Integrated datasets; ProP: Protein expression profiles; DDI: Domain-Domain Interaction; GI: Genetic Interaction; DI: Disease Involvement; PSS: Phenotypic Semantic Similarity; PheP: Phylogenetic Profiles; RS: Rosetta Stone; PPI: Protein-Protein Interaction; SD: Shared Domains; GOSS: GO Semantic Similarity; IGD: Integrated Genomic-scale Datasets; IMG: Integrated Mitochondrial-specific Datasets; ROC: receiver operating characteristic; AUC: area under ROC curves.
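For readers unfamiliar with the AUC values reported in such plots, here is a minimal sketch of one standard way to compute the area under a ROC curve: the probability that a randomly chosen positive example scores higher than a randomly chosen negative one, with ties counted as one half. The scores and labels are hypothetical, and this is not claimed to be the evaluation code behind the figure.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0      # positive correctly ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(positives) * len(negatives))

# Hypothetical linkage scores and gold-standard labels (1 = true linkage).
example_scores = [0.9, 0.8, 0.7, 0.3]
example_labels = [1, 0, 1, 0]
auc = roc_auc(example_scores, example_labels)
```

The pairwise form is quadratic in the number of examples; it is chosen here for clarity rather than speed.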
Figure 4. TP/FP ratios vs. LR cutoff, and corresponding sensitivity.
LR: likelihood ratio; TP: True Positive; FP: False Positive; FN: False Negative. Sensitivity = TP/(TP+FN).
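The trade-off plotted in this figure can be reproduced on toy data with the sketch below. It assumes each protein pair carries a combined LR and a gold-standard label, and that a pair is predicted as linked when its LR is at or above the cutoff. The pairs and cutoffs are hypothetical.

```python
def cutoff_metrics(scored_pairs, cutoff):
    """Return (TP/FP ratio, sensitivity) for pairs predicted at LR >= cutoff.

    scored_pairs is a list of (lr, is_true_linkage) tuples.
    """
    tp = sum(1 for lr, truth in scored_pairs if lr >= cutoff and truth)
    fp = sum(1 for lr, truth in scored_pairs if lr >= cutoff and not truth)
    fn = sum(1 for lr, truth in scored_pairs if lr < cutoff and truth)
    ratio = tp / fp if fp else float("inf")          # no false positives at all
    sens = tp / (tp + fn) if (tp + fn) else 0.0      # Sensitivity = TP/(TP+FN)
    return ratio, sens

# Hypothetical scored pairs: (combined LR, gold-standard label).
toy_pairs = [(12.0, True), (9.0, True), (7.0, False),
             (3.0, True), (1.5, False), (0.4, False)]

strict = cutoff_metrics(toy_pairs, 5.0)
lenient = cutoff_metrics(toy_pairs, 1.0)
```

On this toy data, raising the cutoff improves the TP/FP ratio at the cost of sensitivity, which is the trade-off a cutoff choice has to balance.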
Descriptions and parameters of four networks.
| Network | Description | Number of nodes | Number of edges | Average number of neighbors | Density |
| FLN | FLN among the proteins with high confidence | 1072 | 32951 | 61.476 | 0.057 |
| FLNhm | FLN among the proteins with high or middle confidence | 1992 | 1983036 | 1991.000 | 1 |
| PPI network | Protein-protein interaction network derived from HPRD and I2D | 1322 | 9049 | 12.850 | 0.01 |
| Co-expression network | Co-expression network derived from microarray experiment GSE1133 | 1684 | 1417186 | 1683.000 | 1 |
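The last two columns of this table can be derived from the node and edge counts. The sketch below assumes a simple undirected graph (no self-loops, no duplicate edges), for which the average number of neighbors is 2E/N and the density is 2E/(N(N-1)). Under that assumption the FLN and FLNhm rows are reproduced from their node and edge counts.

```python
def average_neighbors(num_nodes, num_edges):
    """Mean degree of a simple undirected graph: each edge adds two endpoints."""
    return 2 * num_edges / num_nodes

def density(num_nodes, num_edges):
    """Fraction of all possible node pairs that are connected by an edge."""
    return 2 * num_edges / (num_nodes * (num_nodes - 1))

# Node and edge counts taken from the FLN and FLNhm rows of the table.
fln_avg = average_neighbors(1072, 32951)
fln_density = density(1072, 32951)
flnhm_density = density(1992, 1983036)
```

A density of 1 means every pair of nodes is linked, so the FLNhm row describes a complete graph on 1992 nodes.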
Figure 5. ROC curves for evaluating the performance of four networks on disease-gene prioritization.
(A) The ROC curve for FLN. (B) The ROC curve for FLNhm. (C) The ROC curve for PPI network. (D) The ROC curve for co-expression network. AAR: Average Adjacency Ranking; PRP: PageRank with Priors; KSM: K-Step Markov; HKDR: Heat Kernel Diffusion Ranking; FLN: Functional Linkage Network among high-confidence mitochondrial proteins; FLNhm: Functional Linkage Network among high-confidence and middle-confidence mitochondrial proteins; PPIN: Protein-Protein Interaction Network; CEN: Co-Expression Network.
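Of the four ranking methods named in this caption, PageRank with Priors is easy to illustrate. The sketch below is a generic personalized-PageRank power iteration, assuming an unweighted undirected network, known disease genes as uniformly weighted seeds, and a restart probability `beta`. The graph, node names, `beta` value and iteration count are hypothetical, and this is not claimed to be the implementation used for MitProNet.

```python
def pagerank_with_priors(adjacency, seeds, beta=0.3, iterations=100):
    """Rank nodes by a random walk that restarts at the seed nodes.

    adjacency maps each node to a list of its neighbors.
    beta is the probability of jumping back to a seed at every step.
    """
    nodes = sorted(adjacency)
    prior = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(prior)
    for _ in range(iterations):
        nxt = {n: beta * prior[n] for n in nodes}
        for u in nodes:
            neighbors = adjacency[u]
            if not neighbors:
                # Dangling node: hand its mass back to the seeds.
                for n in nodes:
                    nxt[n] += (1 - beta) * rank[u] * prior[n]
                continue
            share = (1 - beta) * rank[u] / len(neighbors)
            for v in neighbors:
                nxt[v] += share
        rank = nxt
    return rank

# Hypothetical network: s1 and s2 are known disease genes, c1..c3 are candidates.
toy_network = {
    "s1": ["s2", "c1"],
    "s2": ["s1", "c1"],
    "c1": ["s1", "s2", "c2"],
    "c2": ["c1", "c3"],
    "c3": ["c2"],
}
scores = pagerank_with_priors(toy_network, seeds={"s1", "s2"})
candidates = sorted(["c1", "c2", "c3"], key=scores.get, reverse=True)
```

Candidates closer to the seeds in the network receive more of the restarting walk's probability mass, which is why `c1` outranks `c2` and `c3` here.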
The 30 top-ranking genes for mitochondrial complex I deficiency.
| Ranking | Score | GeneID | Symbol | Description |
| 1 | 0.802272 | 4723 | NDUFV1 | NADH dehydrogenase flavoprotein 1, 51 kDa |
| 2 | 0.697647 | 51103 | NDUFAF1 | NADH dehydrogenase 1 alpha subcomplex, assembly factor 1 |
| 3 | 0.691345 | 4694 | NDUFA1 | NADH dehydrogenase 1 alpha subcomplex, 1, 7.5 kDa |
| 4 | 0.688717 | 4726 | NDUFS6 | NADH dehydrogenase Fe-S protein 6, 13 kDa |
| 5 | 0.686216 | 4719 | NDUFS1 | NADH dehydrogenase Fe-S protein 1, 75 kDa |
| 6 | 0.685317 | 4720 | NDUFS2 | NADH dehydrogenase Fe-S protein 2, 49 kDa |
| 7 | 0.68423 | 4709 | NDUFB3 | NADH dehydrogenase 1 beta subcomplex, 3, 12 kDa |
| 8 | 0.681527 | 4729 | NDUFV2 | NADH dehydrogenase flavoprotein 2, 24 kDa |
| 9 | 0.676788 | 4724 | NDUFS4 | NADH dehydrogenase Fe-S protein 4, 18 kDa |
| 10 | 0.65894 | 79133 | C20orf7 | chromosome 20 open reading frame 7 |
| 11 | 0.656693 | 126328 | NDUFA11 | NADH dehydrogenase 1 alpha subcomplex, 11, 14.7 kDa |
| 12 | 0.656337 | 91942 | NDUFAF2 | NADH dehydrogenase 1 alpha subcomplex, assembly factor 2 |
| 13 | 0.656292 | 55572 | FOXRED1 | FAD-dependent oxidoreductase domain containing 1 |
| 14 | 0.656166 | 25915 | NDUFAF3 | NADH dehydrogenase 1 alpha subcomplex, assembly factor 3 |
| 15 | 0.656105 | 80224 | NUBPL | nucleotide binding protein-like |
| 16 | 0.115148 | 4714 | NDUFB8 | NADH dehydrogenase 1 beta subcomplex, 8, 19 kDa |
| 17 | 0.109928 | 1329 | COX5B | cytochrome c oxidase subunit Vb |
| 18 | 0.090152 | 2108 | ETFA | electron-transfer-flavoprotein, alpha polypeptide |
| 19 | 0.087915 | 4722 | NDUFS3 | NADH dehydrogenase Fe-S protein 3, 30 kDa |
| 20 | 0.083753 | 6390 | SDHB | succinate dehydrogenase complex, subunit B, iron sulfur (Ip) |
| 21 | 0.078834 | 1743 | DLST | dihydrolipoamide S-succinyltransferase (E2 component of 2-oxo-glutarate complex) |
| 22 | 0.070645 | 54205 | CYCS | cytochrome c, somatic |
| 23 | 0.068273 | 509 | ATP5C1 | ATP synthase, H+ transporting, mitochondrial F1 complex, gamma polypeptide 1 |
| 24 | 0.067436 | 506 | ATP5B | ATP synthase, H+ transporting, mitochondrial F1 complex, beta polypeptide |
| 25 | 0.06552 | 1345 | COX6C | cytochrome c oxidase subunit VIc |
| 26 | 0.061017 | 25828 | TXN2 | thioredoxin 2 |
| 27 | 0.060686 | 6391 | SDHC | succinate dehydrogenase complex, subunit C, integral membrane protein, 15 kDa |
| 28 | 0.060526 | 50 | ACO2 | aconitase 2, mitochondrial |
| 29 | 0.060351 | 4713 | NDUFB7 | NADH dehydrogenase 1 beta subcomplex, 7, 18 kDa |
| 30 | 0.058394 | 740 | MRPL49 | mitochondrial ribosomal protein L49 |
Figure 6. Prioritization results for mitochondrial complex I deficiency.
(A) A hypothetical FLN of mitochondrial complex I deficiency. The FLN comprises known disease genes annotated in OMIM (highlighted in red) and predicted disease genes (highlighted in green). The candidates are classified into three levels (high-confidence, middle-confidence and low-confidence) according to their ranking scores. (B) The functional linkage sub-network around the candidate NDUFS3, which has a top score in the ranking algorithm for mitochondrial complex I deficiency.
Figure 7. System architecture and main contents of MitProNet.
MitProNet is composed of three sections: mitochondrial protein part lists, annotations of mitochondrial proteins and disease information.
Figure 8. Web pages in MitProNet.
(A) A list page of mitochondrial proteins. The mitochondrial proteins can be listed by proteomic dataset, confidence level or organism. (B) The result page for the query protein NDUFS7, an annotated disease gene for Leigh syndrome. The page provides a brief summary of the query protein, evidence for its subcellular localization and an FLN around the query protein. Moreover, the query protein is annotated according to the information collected from the original sources, including GO, KEGG, MIPS and OMIM. (C) The prioritization results for Leigh syndrome. The result page includes a brief description of this phenotype, the disease genes and an FLN among these genes. The disease genes are listed separately as known genes and candidates, with the candidates ordered by their ranking scores.