Right data for right patient: a precisionFDA NCI-CPTAC Multi-omics Mislabeling Challenge.

Emily Boja¹, Živana Težak², Bing Zhang³, Pei Wang⁴, Elaine Johanson⁵, Denise Hinton⁶, Henry Rodriguez⁷.

Year: 2018; PMID: 30194412; PMCID: PMC6892367; DOI: 10.1038/s41591-018-0180-x

Source DB: PubMed; Journal: Nat Med; ISSN: 1078-8956; Impact factor: 53.440

Although genomics has shaped the current scope of precision medicine, it is becoming increasingly clear that molecular phenotypes, such as DNA and RNA profiles and, in particular, protein abundance profiles, are essential to our understanding of biology and for enhancing our ability to achieve the promise of precision medicine for patients. Hence, simultaneous generation and integration of multidimensional multi-omics datasets from a large set of tumor samples, such as those used in the National Cancer Institute’s (NCI) The Cancer Genome Atlas (TCGA; https://cancergenome.nih.gov) and the Clinical Proteomic Tumor Analysis Consortium (CPTAC; https://proteomics.cancer.gov) projects[1-4], is becoming a powerful approach to understanding the molecular basis of diseases and speeding the translation of new discoveries to patient care. This development has been largely enabled by the rapid technological advancement, standardization and harmonization in tumor molecular profiling in recent years. Consequently, several initiatives have been launched to leverage this development for application to clinical practice, including the International Cancer Proteogenome Consortium[5] and the Applied Proteogenomics Organizational Learning and Outcomes[6] programs. These efforts promise to revolutionize our understanding of cancer biology and change the way cancer is treated. The value of multi-omics technologies and datasets lies in the possibility of accurately extracting rich information to help understand the molecular complexities specific to individual patients through use of sophisticated integrative computational algorithms. Such information can be used to reach a deeper understanding of a disease, which then can be applied clinically, for example, to elucidate the relationship between the genome and proteome of a patient’s tumor or to deconvolute tumor heterogeneity associated with clinical outcome. 
Ideally, individual and population data would ultimately serve to inform a physician and a patient and to help determine the most appropriate treatment options. Furthermore, the comprehensive information obtained on the same sample in multiple dimensions can add value in pinpointing and correcting problems such as sample mislabeling (accidental swapping of patient samples) or data mislabeling (accidental swapping of patient omics data), which could lead to multiple patients receiving the wrong medical treatment, resulting in severe, irreversible consequences. Sample mislabeling, which contributes to irreproducible results and invalid conclusions, is a known obstacle in basic and translational research[7]. It is also prevalent in data-rich large-scale omics studies[8,9], in which human errors could arise anywhere in the data production and analysis pipeline, as either sample mislabeling (early in the pipeline) or data mislabeling (later in the pipeline).

The Food and Drug Administration (FDA) and NCI-CPTAC, with a history of collaboration[10], also have experience in building challenges to solve complex problems, such as the precisionFDA Challenges (https://precision.fda.gov/challenges) and the NCI–CPTAC DREAM Proteogenomics Challenge (https://www.synapse.org/#!Synapse:syn8228304/wiki/413428). Now they are joining forces to launch a Multi-omics Enabled Sample Mislabeling and Correction Challenge (https://precision.fda.gov/mislabeling) in September 2018. The objective of this challenge is to encourage development and evaluation of computational algorithms that can accurately detect and correct mislabeled samples using rich multi-omics datasets, enhancing the assurance that the right data is attributed to the right patient.

Challenge design

The challenge comprises two subchallenges to be conducted sequentially. In Subchallenge 1, participants will be asked to detect mislabeled samples. Participants will be presented with a training dataset and a test dataset, comprising real-world clinical and proteomics data. Mislabeled samples will be known in the training dataset and not known in the test dataset. Using the training dataset, participants will develop computational models to distinguish samples of matched and nonmatched clinical and proteomics data. The computational models will then be used to identify mislabeled samples in the test dataset. In Subchallenge 2, participants will be asked to correct mislabeled samples in richer data. Participants will be presented with real-world RNA profiling data for all samples in both the training and test datasets. Similar to the clinical and proteomics data, newly introduced RNA profiling data will also include mislabeled samples. As with Subchallenge 1, this information will be known in the training dataset, but not in the test dataset. Participants will develop computational algorithms to model the relationships among the three data types in the training dataset and then will apply the computational model to identify and correct instances of single data type sample mislabeling among the trio of data types in the test dataset. Subchallenge results will be independently evaluated (Fig. 1).
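The detection task in Subchallenge 1 can be illustrated with a minimal sketch. It is not the challenge's method or any participant's algorithm: it assumes a single categorical clinical attribute per sample, a small vector of protein abundances, and a nearest-centroid model fitted only on training samples known to be correctly matched. All function names and the toy data are hypothetical.

```python
from statistics import mean

def fit_centroids(profiles, labels):
    """Average protein-abundance vector per clinical label (nearest-centroid model)."""
    groups = {}
    for profile, label in zip(profiles, labels):
        groups.setdefault(label, []).append(profile)
    return {label: [mean(col) for col in zip(*rows)] for label, rows in groups.items()}

def predict(centroids, profile):
    """Return the clinical label whose centroid is closest (squared Euclidean distance)."""
    def dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(profile, centroid))
    return min(centroids, key=lambda label: dist(centroids[label]))

def flag_mismatches(centroids, profiles, recorded_labels):
    """Indices of samples whose recorded clinical label disagrees with the prediction."""
    return [i for i, (profile, label) in enumerate(zip(profiles, recorded_labels))
            if predict(centroids, profile) != label]

# Toy training data (assumed correctly matched): two proteins, clinical classes "A" and "B".
train_profiles = [[5.0, 1.0], [4.8, 1.2], [5.2, 0.9], [1.0, 5.1], [0.9, 4.9], [1.1, 5.0]]
train_labels = ["A", "A", "A", "B", "B", "B"]

# Toy test data in which the clinical records of samples 1 and 2 were swapped.
test_profiles = [[5.1, 1.1], [1.0, 5.0], [4.9, 1.0]]
test_labels = ["A", "A", "B"]

centroids = fit_centroids(train_profiles, train_labels)
flagged = flag_mismatches(centroids, test_profiles, test_labels)
print(flagged)
```

A mismatch between the recorded and predicted attribute only suggests mislabeling; with real data, prediction error would also produce false flags, so a practical model would need some measure of confidence.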

Fig. 1

Challenge design and timelines.
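The correction task in Subchallenge 2 rests on the idea that, with three data types per sample, the one that disagrees with the other two can be singled out and reassigned. The following sketch shows that logic under strong assumptions: pairwise consistency flags are taken as given, RNA and protein abundances are measured for the same genes, and Pearson correlation is used as the matching score. The function names, the scoring choice, and the toy data are hypothetical and are not taken from the challenge.

```python
from math import sqrt

def odd_one_out(clin_prot, clin_rna, prot_rna):
    """Name the single data type that disagrees with the other two for one sample label.

    Each argument is True when that pair of data types looks consistent.
    Returns "none" when all pairs agree and "ambiguous" when no single type can be blamed.
    """
    flags = (clin_prot, clin_rna, prot_rna)
    if all(flags):
        return "none"
    if flags == (False, False, True):
        return "clinical"
    if flags == (False, True, False):
        return "proteomics"
    if flags == (True, False, False):
        return "rna"
    return "ambiguous"

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def propose_relabeling(rna, prot):
    """Map each proteomics label to the RNA label it correlates with best, if different.

    Assumes the RNA labels are trusted and the proteomics labels are suspect.
    """
    fixes = {}
    for rna_label, profile in rna.items():
        best = max(prot, key=lambda p: pearson(profile, prot[p]))
        if best != rna_label:
            fixes[best] = rna_label
    return fixes

# Toy data: four genes per sample; the proteomics files for S1 and S2 were swapped.
rna = {"S1": [1, 2, 3, 4], "S2": [4, 3, 2, 1], "S3": [1, 3, 1, 3]}
prot = {"S1": [8, 6, 4, 2], "S2": [2, 4, 6, 8], "S3": [2, 6, 2, 6]}

print(odd_one_out(clin_prot=False, clin_rna=True, prot_rna=False))
corrections = propose_relabeling(rna, prot)
print(corrections)
```

Here the proteomics file currently labeled S1 is proposed for relabeling as S2 and vice versa, while S3 is left unchanged. Real RNA and protein levels correlate far less cleanly, so this greedy matching is only a starting point.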

Anticipated outcome and impact

An immediate outcome envisioned is a flagship challenge manuscript that gives an overview of the challenge data, questions, design, and outcomes[11]. Additionally, the algorithms that the participants propose will be aggregated with the aim of refining a final open-source product to be incorporated into an analysis pipeline and ultimately as part of a quality-management system to reduce errors. This could help speed the translation of multidimensional omics technologies and datasets to the clinic. Meanwhile, NCI and FDA hope to build and expand a community of scientists that will collaborate to solve important problems that prevent the translation of multi-omics data to the clinical labs.

References (9 in total)

1. Perspective on Oncogenic Processes at the End of the Beginning of Cancer Genomics.

Authors: Li Ding; Matthew H Bailey; Eduard Porta-Pardo; Vesteinn Thorsson; Antonio Colaprico; Denis Bertrand; David L Gibbs; Amila Weerasinghe; Kuan-Lin Huang; Collin Tokheim; Isidro Cortés-Ciriano; Reyka Jayasinghe; Feng Chen; Lihua Yu; Sam Sun; Catharina Olsen; Jaegil Kim; Alison M Taylor; Andrew D Cherniack; Rehan Akbani; Chayaporn Suphavilai; Niranjan Nagarajan; Joshua M Stuart; Gordon B Mills; Matthew A Wyczalkowski; Benjamin G Vincent; Carolyn M Hutter; Jean Claude Zenklusen; Katherine A Hoadley; Michael C Wendl; Llya Shmulevich; Alexander J Lazar; David A Wheeler; Gad Getz
Journal: Cell Date: 2018-04-05 Impact factor: 41.582

2. Protein-based multiplex assays: mock presubmissions to the US Food and Drug Administration.

Authors: Fred E Regnier; Steven J Skates; Mehdi Mesri; Henry Rodriguez; Zivana Tezak; Marina V Kondratovich; Michail A Alterman; Joshua D Levin; Donna Roscoe; Eugene Reilly; James Callaghan; Kellie Kelm; David Brown; Reena Philip; Steven A Carr; Daniel C Liebler; Susan J Fisher; Paul Tempst; Tara Hiltke; Larry G Kessler; Christopher R Kinsinger; David F Ransohoff; Elizabeth Mansfield; N Leigh Anderson
Journal: Clin Chem Date: 2009-12-10 Impact factor: 8.327

3. Collaboration to Accelerate Proteogenomics Cancer Care: The Department of Veterans Affairs, Department of Defense, and the National Cancer Institute's Applied Proteogenomics OrganizationaL Learning and Outcomes (APOLLO) Network.

Authors: L D Fiore; H Rodriguez; C D Shriver
Journal: Clin Pharmacol Ther Date: 2017-05 Impact factor: 6.875

4. Proteogenomic characterization of human colon and rectal cancer.

Authors: Bing Zhang; Jing Wang; Xiaojing Wang; Jing Zhu; Qi Liu; Zhiao Shi; Matthew C Chambers; Lisa J Zimmerman; Kent F Shaddox; Sangtae Kim; Sherri R Davies; Sean Wang; Pei Wang; Christopher R Kinsinger; Robert C Rivers; Henry Rodriguez; R Reid Townsend; Matthew J C Ellis; Steven A Carr; David L Tabb; Robert J Coffey; Robbert J C Slebos; Daniel C Liebler
Journal: Nature Date: 2014-07-20 Impact factor: 49.962

5. Revolutionizing Precision Oncology through Collaborative Proteogenomics and Data Sharing.

Authors: Henry Rodriguez; Stephen R Pennington
Journal: Cell Date: 2018-04-19 Impact factor: 41.582

6. Integrated Proteogenomic Characterization of Human High-Grade Serous Ovarian Cancer.

Authors: Hui Zhang; Tao Liu; Zhen Zhang; Samuel H Payne; Bai Zhang; Jason E McDermott; Jian-Ying Zhou; Vladislav A Petyuk; Li Chen; Debjit Ray; Shisheng Sun; Feng Yang; Lijun Chen; Jing Wang; Punit Shah; Seong Won Cha; Paul Aiyetan; Sunghee Woo; Yuan Tian; Marina A Gritsenko; Therese R Clauss; Caitlin Choi; Matthew E Monroe; Stefani Thomas; Song Nie; Chaochao Wu; Ronald J Moore; Kun-Hsing Yu; David L Tabb; David Fenyö; Vineet Bafna; Yue Wang; Henry Rodriguez; Emily S Boja; Tara Hiltke; Robert C Rivers; Lori Sokoll; Heng Zhu; Ie-Ming Shih; Leslie Cope; Akhilesh Pandey; Bing Zhang; Michael P Snyder; Douglas A Levine; Richard D Smith; Daniel W Chan; Karin D Rodland
Journal: Cell Date: 2016-06-29 Impact factor: 41.582

7. A community effort to assess and improve drug sensitivity prediction algorithms.

Authors: James C Costello; Laura M Heiser; Elisabeth Georgii; Mehmet Gönen; Michael P Menden; Nicholas J Wang; Mukesh Bansal; Muhammad Ammad-ud-din; Petteri Hintsanen; Suleiman A Khan; John-Patrick Mpindi; Olli Kallioniemi; Antti Honkela; Tero Aittokallio; Krister Wennerberg; James J Collins; Dan Gallahan; Dinah Singer; Julio Saez-Rodriguez; Samuel Kaski; Joe W Gray; Gustavo Stolovitzky
Journal: Nat Biotechnol Date: 2014-06-01 Impact factor: 54.908

8. reGenotyper: Detecting mislabeled samples in genetic data.

Authors: Konrad Zych; Basten L Snoek; Mark Elvin; Miriam Rodriguez; K Joeri Van der Velde; Danny Arends; Harm-Jan Westra; Morris A Swertz; Gino Poulin; Jan E Kammenga; Rainer Breitling; Ritsert C Jansen; Yang Li
Journal: PLoS One Date: 2017-02-13 Impact factor: 3.240

9. Proteogenomics connects somatic mutations to signalling in breast cancer.

Authors: Philipp Mertins; D R Mani; Kelly V Ruggles; Michael A Gillette; Karl R Clauser; Pei Wang; Xianlong Wang; Jana W Qiao; Song Cao; Francesca Petralia; Emily Kawaler; Filip Mundt; Karsten Krug; Zhidong Tu; Jonathan T Lei; Michael L Gatza; Matthew Wilkerson; Charles M Perou; Venkata Yellapantula; Kuan-lin Huang; Chenwei Lin; Michael D McLellan; Ping Yan; Sherri R Davies; R Reid Townsend; Steven J Skates; Jing Wang; Bing Zhang; Christopher R Kinsinger; Mehdi Mesri; Henry Rodriguez; Li Ding; Amanda G Paulovich; David Fenyö; Matthew J Ellis; Steven A Carr
Journal: Nature Date: 2016-05-25 Impact factor: 49.962
