Mi Yang, Francesca Petralia, Zhi Li, Hongyang Li, Weiping Ma, Xiaoyu Song, Sunkyu Kim, Heewon Lee, Han Yu, Bora Lee, Seohui Bae, Eunji Heo, Jan Kaczmarczyk, Piotr Stępniak, Michał Warchoł, Thomas Yu, Anna P Calinawan, Paul C Boutros, Samuel H Payne, Boris Reva, Emily Boja, Henry Rodriguez, Gustavo Stolovitzky, Yuanfang Guan, Jaewoo Kang, Pei Wang, David Fenyö, Julio Saez-Rodriguez.
Abstract
Cancer is driven by genomic alterations, but the processes causing this disease are largely performed by proteins. However, proteins are harder and more expensive to measure than genes and transcripts. To catalyze developments of methods to infer protein levels from other omics measurements, we leveraged crowdsourcing via the NCI-CPTAC DREAM proteogenomic challenge. We asked for methods to predict protein and phosphorylation levels from genomic and transcriptomic data in cancer patients. The best performance was achieved by an ensemble of models, including as predictors transcript level of the corresponding genes, interaction between genes, conservation across tumor types, and phosphosite proximity for phosphorylation prediction. Proteins from metabolic pathways and complexes were the best and worst predicted, respectively. The performance of even the best-performing model was modest, suggesting that many proteins are strongly regulated through translational control and degradation. Our results set a reference for the limitations of computational inference in proteogenomics. A record of this paper's transparent peer review process is included in the Supplemental Information.
Keywords: cancer; crowdsourcing; genomics; machine learning; protein regulation; proteogenomics; proteomics
Year: 2020 PMID: 32710834 DOI: 10.1016/j.cels.2020.06.013
Source DB: PubMed Journal: Cell Syst ISSN: 2405-4712 Impact factor: 10.304