Caralyn Reisle, Laura M Williamson, Erin Pleasance, Anna Davies, Brayden Pellegrini, Dustin W Bleile, Karen L Mungall, Eric Chuah, Martin R Jones, Yussanne Ma, Eleanor Lewis, Isaac Beckie, David Pham, Raphael Matiello Pletz, Amir Muhammadzadeh, Brandon M Pierce, Jacky Li, Ross Stevenson, Hansen Wong, Lance Bailey, Abbey Reisle, Matthew Douglas, Melika Bonakdar, Jessica M T Nelson, Cameron J Grisdale, Martin Krzywinski, Ana Fisic, Teresa Mitchell, Daniel J Renouf, Stephen Yip, Janessa Laskin, Marco A Marra, Steven J M Jones.
Abstract
Manual interpretation of variants remains rate-limiting in precision oncology. The increasing scale and complexity of molecular data generated from comprehensive sequencing of cancer samples require advanced interpretative platforms as precision oncology expands beyond individual patients to entire populations. To address this unmet need, we introduce a Platform for Oncogenomic Reporting and Interpretation (PORI), comprising an analytic framework that facilitates the interpretation and reporting of somatic variants in cancer. PORI integrates reporting and graph knowledge base tools combined with support for manual curation at the reporting stage. PORI is an open-source alternative to commercial reporting solutions and is suitable for comprehensive genomic data sets in precision oncology. We demonstrate the utility of PORI by matching 9,961 pan-cancer genome atlas tumours to the graph knowledge base, calculating therapeutically informative alterations, and making available reports describing select individual samples.
Year: 2022 PMID: 35140225 PMCID: PMC8828759 DOI: 10.1038/s41467-022-28348-y
Source DB: PubMed Journal: Nat Commun ISSN: 2041-1723 Impact factor: 14.919
Fig. 1: Platform for Oncogenomic Reporting and Interpretation (PORI) overview.
PORI design, showing both the placement of PORI within a precision oncology workflow (a) and the process of generating a report (b). PORI is used for the interpretation and reporting of genomic findings from tumour sequencing. Sequencing data are taken as input to a number of user-defined bioinformatic pipelines and analyses. The results are loaded by the IPR report Python adapter (ipr-python) and annotated with information from GraphKB. After annotation, the results are collated and prioritized based on matches for output into a report using the IPR interactive web platform. The report is optionally reviewed manually by the case analyst, who may add content to GraphKB as part of their literature review for the case and regenerate the report to include the newly added content. The report is then shared with the molecular tumour board (MTB) to inform clinical decisions.
Fig. 2: Integrated Pipeline Reports (IPR) web interface.
(a) The report front page of an example report displaying patient metadata and a summary of tumour characteristics and findings. (b) The annotations collected from GraphKB are listed in the knowledge base matches section of the generated report with links from each match back to its corresponding statement in GraphKB.
Fig. 3: GraphKB Statement Schema.
Statements are composed of four main elements: conditions, subject, relevance, and evidence. A statement may be linked to any number of conditions but only one subject and relevance. The conclusion of a statement is considered to be composed only of the relevance and subject.
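The statement structure described in this caption can be sketched as a small data structure. The sketch below is an illustration only and is not GraphKB's actual schema or API: the class name `Statement`, its field names, and the example values are all hypothetical, and the caption does not say how many evidence records a statement may have, so a set is assumed.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class Statement:
    """Hypothetical stand-in for a GraphKB statement (not the real schema)."""

    conditions: FrozenSet[str]  # any number of conditions
    subject: str                # exactly one subject
    relevance: str              # exactly one relevance
    evidence: FrozenSet[str]    # assumption: a set of supporting records

    def conclusion(self) -> Tuple[str, str]:
        # Per the caption, the conclusion is composed only of the
        # relevance and the subject.
        return (self.relevance, self.subject)


# Illustrative values only; they are not taken from GraphKB.
a = Statement(
    conditions=frozenset({"variant X", "disease Y"}),
    subject="drug Z",
    relevance="sensitivity",
    evidence=frozenset({"source 1"}),
)
b = Statement(
    conditions=frozenset({"variant W"}),
    subject="drug Z",
    relevance="sensitivity",
    evidence=frozenset({"source 2"}),
)

# Two statements with different conditions and evidence can still share
# a conclusion, because the conclusion ignores both of those elements.
same_conclusion = a.conclusion() == b.conclusion()
```

Under this reading, comparing conclusions rather than whole statements is what would allow content from different sources to be counted as agreeing.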
Fig. 4: Clinically informative conclusion agreement across knowledge bases.
(a) The individual contribution of each source, shown as its number of unique conclusions, both raw and normalized. The raw values are the numbers of conclusions prior to normalization. (b) The content shared between sources, shown as a fraction of the total number of unique conclusions. Source data are provided as a Source Data file.
Fig. 5: Graph view of content in the GraphKB web application.
A subset of the links between disease terms related to colorectal adenocarcinoma is shown, drawn from the NCI Thesaurus (NCIt) and OncoTree. For brevity, only a small number of links to clinical trials (ClinicalTrials.gov) are shown.
Fig. 6: Coverage of clinical trial terminology in popular ontologies.
Proportion of drug (a) and disease (b) terms from clinical trials that are matched by ontology terms across multiple resources. A distinction was made between the primary/preferred terms for a given resource and the set of all terms (indicated with a +), which included synonyms, aliases, and commercial product names. Coverage is the proportion of all clinical trial terms for which an exact match was found in a given ontology. Coverage was calculated for trial terms at three frequency thresholds (1+, 10+, or 100+), where the frequency of a term is the number of clinical trials in which it was used. Source data are provided as a Source Data file.
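The coverage calculation described in this caption can be sketched as follows. This is a minimal illustration and not the authors' code: the function name `coverage`, the term names, and the counts are made up, and exact matching is assumed to mean plain string equality, since the caption does not describe any normalization.

```python
from typing import Dict, Set


def coverage(trial_term_counts: Dict[str, int],
             ontology_terms: Set[str],
             min_trials: int) -> float:
    """Fraction of trial terms used in at least `min_trials` trials
    that have an exact match among `ontology_terms`."""
    eligible = [t for t, n in trial_term_counts.items() if n >= min_trials]
    if not eligible:
        return 0.0
    matched = sum(1 for t in eligible if t in ontology_terms)
    return matched / len(eligible)


# Made-up data: term -> number of clinical trials that use the term.
trial_terms = {"term a": 150, "term b": 12, "term c": 3, "term d": 1}

# Hypothetical ontology: preferred terms only, and the larger "+" set
# that also includes synonyms, aliases, and product names.
preferred = {"term a", "term c"}
all_terms = preferred | {"term b"}

for threshold in (1, 10, 100):
    print(threshold,
          coverage(trial_terms, preferred, threshold),
          coverage(trial_terms, all_terms, threshold))
```

Because the "+" set is a superset of the preferred terms, its coverage at any threshold can never be lower than that of the preferred terms alone.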
Fig. 7: Division of the therapeutic matches to the TCGA samples (n = 9,961).
The total set of matches is further subdivided by a number of filters. Samples were considered disease matched when the diagnosis of the patient matched the disease listed by the annotation (Diagnosis Match). Samples were considered position matched (Position-Specific) when matches to non-specific gene-level small mutations were excluded. Other filters included: matches with AMP Tier I compatible evidence (AMP Tier I); matches excluding those obtained by second-pass or inferred matching (Direct Match); and finally, only mutations that are non-synonymous at the protein level (Non-Synon). The union of all matches is given by the shaded portion and vertical dashed line. Source data are provided as a Source Data file.
Fig. 8: Proportion of samples (n = 9,961) with therapeutic matches derived from each combination of variant types.
UpSet plot of the number of samples with therapeutic matches from annotation of a given variant type. Sample variants are divided into four types: copy number variants (cnv); single nucleotide variants and indels (mut); gene fusions (fus); and gene expression (exp) variants. The left-hand bar plots show the total number of samples that have one or more therapeutic conclusions matched to the listed variant type. The union of all matches is given by the shaded portion and vertical dashed line. The upper bar plots show the number of samples in each of the intersection groups. These groups are mutually exclusive. Source data are provided as a Source Data file.
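The two kinds of counts in this caption (overlapping per-type totals and mutually exclusive intersection groups) can be sketched as below. This is an illustrative sketch rather than the authors' analysis code: the function name `upset_counts`, the sample identifiers, and the input layout are hypothetical, and only the four variant type labels come from the caption.

```python
from collections import Counter
from typing import Dict, FrozenSet, Set, Tuple


def upset_counts(
    sample_types: Dict[str, Set[str]],
) -> Tuple[Counter, Counter]:
    """Return (per-type totals, exclusive intersection-group counts).

    `sample_types` maps a sample id to the set of variant types
    (cnv, mut, fus, exp) that produced a therapeutic match for it.
    """
    totals: Counter = Counter()   # left-hand bars: groups may overlap
    groups: Counter = Counter()   # upper bars: one group per sample
    for types in sample_types.values():
        if not types:
            continue              # sample has no therapeutic match
        totals.update(types)
        groups[frozenset(types)] += 1
    return totals, groups


# Made-up samples for illustration.
samples = {
    "s1": {"mut"},
    "s2": {"mut", "cnv"},
    "s3": {"cnv", "mut"},
    "s4": {"exp"},
    "s5": set(),
}
totals, groups = upset_counts(samples)

# Each matched sample falls in exactly one intersection group, so the
# group counts sum to the union of all matched samples.
matched_samples = sum(groups.values())
```

In this toy input the per-type totals add up to more than the number of matched samples, which is why the union cannot be read off the left-hand bars alone.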