Jessica A Lavery, Eva M Lepisto, Samantha Brown, Hira Rizvi, Caroline McCarthy, Michele LeNoue-Newton, Celeste Yu, Jasme Lee, Xindi Guo, Thomas Yu, Julia Rudolph, Shawn Sweeney, Ben Ho Park, Jeremy L Warner, Philippe L Bedard, Gregory Riely, Deborah Schrag, Katherine S Panageas.
Abstract
PURPOSE: The American Association for Cancer Research Project Genomics Evidence Neoplasia Information Exchange Biopharma Collaborative is a multi-institution effort to build a pan-cancer repository of genomic and clinical data curated from the electronic health record. For the research community to be confident that data extracted from electronic health record text are reliable, transparency of the approach used to ensure data quality is essential.
Year: 2022 PMID: 35192403 PMCID: PMC8863125 DOI: 10.1200/CCI.21.00105
Source DB: PubMed Journal: JCO Clin Cancer Inform ISSN: 2473-4276
Summary of QA Processes
FIG 1. Flow diagram for QA processes and data release. QA, quality assurance.
Example Data Quality Rules and Data Quality Queries
FIG A1. Example of (A) incorrectly curated data and (B) corresponding REDCap data quality rule alert. Example REDCap data quality rule for incorrectly curated data on the basis of simulated data for an example patient. REDCap, Research Electronic Data Capture.
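As a rough illustration of what a data quality rule does, the sketch below flags a simulated record whose dates are logically inconsistent. The rule itself, the field names (`diagnosis_date`, `death_date`), and the function `check_dates` are hypothetical and are not taken from the REDCap project described here; REDCap rules are configured in REDCap rather than written in Python.

```python
from datetime import date

def check_dates(record):
    """Return alert messages for one curated record (hypothetical rule)."""
    alerts = []
    dx = record.get("diagnosis_date")
    death = record.get("death_date")
    # Hypothetical rule: a death date cannot precede the diagnosis date.
    if dx is not None and death is not None and death < dx:
        alerts.append("death_date precedes diagnosis_date")
    return alerts

# Simulated record with an incorrectly curated death date.
simulated = {
    "record_id": "P001",
    "diagnosis_date": date(2015, 3, 1),
    "death_date": date(2014, 12, 31),
}
alerts = check_dates(simulated)
```

A record that passes the rule returns an empty list, so the same function can be applied to every record and only the non-empty results sent back to curators as queries.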
Source Data Verification: Major and Minor Violations
FIG A2. Snapshot of quality assurance application summarizing source data verification findings. The screenshot indicates a dropdown on the left-hand side where the user can specify a particular institution, cancer diagnosis, and summary level for the source data verification findings. The first table on the left shows the number of forms that were reviewed per patient. The table to the right has tabs for each PRISSMM module and shows the number of forms that were compared with the electronic health record and the extent and type of major and minor issues.
Summary of Source Data Verification
Assessment of Reproducibility: Percent Agreement for Select Variables
FIG A3. Kaplan-Meier estimates of OS stratified by curator for (A) NSCLC, (B) CRC, and (C) BrCa. Kaplan-Meier estimates of OS stratified by the primary curator and secondary curator among records that underwent double curation for the purposes of assessing reproducibility. Note that curves are not intended to describe estimates of time-to-event end points, but to demonstrate the assessment of reproducibility of curation of time-to-event end points across curators. BrCa, breast cancer; CRC, colorectal cancer; NSCLC, non–small-cell lung cancer; OS, overall survival.
FIG A4. Kaplan-Meier estimates of PFS-I and PFS-M stratified by curator for NSCLC and CRC: (A) PFS-I NSCLC, (B) PFS-I CRC, (C) PFS-M NSCLC, and (D) PFS-M CRC. Survival curves are stratified by the primary curator and secondary curator among records that underwent double curation for the purposes of assessing reproducibility. Note that curves are not intended to describe estimates of time-to-event end points, but to demonstrate the assessment of reproducibility of curation of time-to-event end points across curators. CRC, colorectal cancer; NSCLC, non–small-cell lung cancer; PFS-I, progression-free survival according to imaging; PFS-M, progression-free survival according to medical oncologist.
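For categorical variables, reproducibility across double-curated records is summarized as percent agreement. The sketch below shows one plain way to compute it: the share of records on which the primary and secondary curator recorded the same value. The function name `percent_agreement`, the example variable, and the values are illustrative assumptions, not data or code from the study.

```python
def percent_agreement(primary, secondary):
    """Percentage of double-curated records where both curators agree."""
    if len(primary) != len(secondary):
        raise ValueError("curator value lists must have the same length")
    if not primary:
        raise ValueError("at least one double-curated record is required")
    # Count records where the two curators recorded identical values.
    matches = sum(p == s for p, s in zip(primary, secondary))
    return 100.0 * matches / len(primary)

# Illustrative values only: stage as recorded by each curator
# for four double-curated records.
primary_stage = ["I", "II", "IV", "III"]
secondary_stage = ["I", "II", "III", "III"]
agreement = percent_agreement(primary_stage, secondary_stage)
```

Percent agreement does not correct for agreement expected by chance, which is one reason it is typically reported per variable rather than pooled across variables.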
APPENDIX 1. AACR Project GENIE Consortium