Klaus Gottlieb, Marco Daperno, Keith Usiskin, Bruce E Sands, Harris Ahmad, Colin W Howden, William Karnes, Young S Oh, Irene Modesto, Colleen Marano, Ryan William Stidham, Walter Reinisch.
Abstract
Central reading, that is, independent, off-site, blinded review or reading of imaging endpoints, has been identified as a crucial component in the conduct and analysis of inflammatory bowel disease clinical trials. Central reading is the final step in a workflow that has many parts, all of which can be improved. Furthermore, the best reading algorithm and the most intensive central reader training cannot make up for deficiencies in the acquisition stage (clinical trial endoscopy) or improve on the limitations of the underlying score (outcome instrument). In this review, academic and industry experts review scoring systems, and propose a theoretical framework for central reading that predicts when improvements in statistical power, affecting trial size and chances of success, can be expected: Multireader models can be conceptualised as statistical or non-statistical (social). Important organisational and operational factors, such as training and retraining of readers, optimal bowel preparation for colonoscopy, video quality, optimal or at least acceptable read duration times and other quality control matters, are addressed as well. The theory and practice of central reading and the conduct of endoscopy in clinical trials are interdisciplinary topics that should be of interest to many: regulators, clinical trial experts, gastroenterology societies and those in the academic community who endeavour to develop new scoring systems using traditional and machine learning approaches. © Author(s) (or their employer(s)) 2021. Re-use permitted under CC BY-NC. No commercial re-use. See rights and permissions. Published by BMJ.
Keywords: CLINICAL TRIALS; ENDOSCOPY; INFLAMMATORY BOWEL DISEASE
Year: 2020 PMID: 32699100 PMCID: PMC7815632 DOI: 10.1136/gutjnl-2020-320690
Source DB: PubMed Journal: Gut ISSN: 0017-5749 Impact factor: 23.059
Comparison of strengths and limitations of commonly used endoscopic scores
| Score | | Endoscopic activity reporting | Responsiveness to treatments | Prognostic value | Central reading |
| eMS | Strengths | Gross classification of the gestalt of inflammation. Present standard for drug agencies (FDA, EMA). | Development focused on responsiveness. Extensively used over past 20 years in trials. | Limited data for a prognostic role in the literature. | Algorithms for central reading. Categorical score leads to easier algorithms for adjudication. Widely used over past 5 years. |
| eMS | Limitations | Final score defined by worst lesion. Lacks precision for global burden of severity and extent of lesions. Lack of face validity. Endoscopic features only defined post hoc. Limited range at the lower and upper ends of the activity spectrum. | Lack of ability to highlight segmental healing. Lack of responsiveness due to limited range. | Not developed with prognostic intent. | Limited interobserver agreement. Inconsistencies between readers if the mucosa is insufficiently washed. Data on the impact of reader paradigms on eMS-based endpoints are missing. |
| UCEIS | Strengths | Extensive characterisation and validation of elemental endoscopic lesions focused on agreement. | Better range than eMS. | | Already used in some trials. |
| UCEIS | Limitations | Lacks precision for global burden of severity and extent of lesions. | Lack of ability to highlight segmental healing. Limited use in clinical trials. Development not focused on responsiveness. | Not developed with prognostic intent. | Agreement and adjudication more complex for more granular scores as compared with categorical scores. Modest agreement on some lesions (eg, bleeding). |
| Rutgeerts’ score | Strengths | Clear-cut description of elemental lesions. | | Development focused on prognosis. Prognostic value has been reproduced. | Central reading easy to implement. Algorithms for eMS easily exportable to Rutgeerts’ score. |
| Rutgeerts’ score | Limitations | Not an activity measure. Does not evaluate endoscopic activity outside of the anastomotic site. | No responsiveness evaluation. | Developed for end-to-end anastomoses, never validated for side-to-side anastomoses. Limited interobserver agreement. | Limited interobserver agreement. No data on impact of read paradigm on outcome. |
| CDEIS | Strengths | Developed and validated in order to precisely report disease activity. | Shown in few trials, even if not explicitly developed for responsiveness. | | Used in clinical trials. Excellent inter-rater reliability. |
| CDEIS | Limitations | Complexity. Exact weight of each variable to be better clarified. Unvalidated thresholds for remission and response. The definition of remission does not exclude the presence of ulcers. | Not developed with focus on responsiveness. | Limited prognostic value of the sum score. | Agreement and adjudication more complex for continuous scores as compared with categorical scores. Not developed for postoperative anatomy. |
| SES-CD | Strengths | Developed and validated in order to precisely report disease activity. Possibility to easily exclude a given variable. Segmental and ulcer subscores can be calculated. | Shown in several trials, even if not explicitly developed for responsiveness. | | Widely used in trials. Excellent inter-rater reliability. Different reader algorithms available (fixed or sliding scale for adjudication, paired reading …). |
| SES-CD | Limitations | Relatively complex. Exact weight of each variable to be better clarified. Unvalidated thresholds for remission and response. | Not developed with focus on responsiveness. | Limited prognostic value of sum score. | Agreement and adjudication more complex for more granular scores as compared with categorical scores. No adjustment for missing segments due to sum score. Not developed for postoperative anatomy. |
CDEIS, Crohn’s Disease Endoscopic Index of Severity; EMA, European Medicines Agency; eMS, endoscopic Mayo Score; FDA, Food and Drug Administration; SES-CD, Simple Endoscopic Score for Crohn’s Disease; UCEIS, Ulcerative Colitis Endoscopic Index of Severity.
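The table notes that a sum score such as the SES-CD allows segmental subscores but makes no adjustment for missing segments. The following minimal Python sketch illustrates that limitation. It assumes the commonly described SES-CD layout of five bowel segments, each graded on four items from 0 to 3; the function names, dictionary keys and example values are hypothetical and are not taken from the source.

```python
from typing import Dict, Optional

# Assumed SES-CD layout: five segments, four items, each item graded 0-3.
SEGMENTS = ("ileum", "right_colon", "transverse_colon", "left_colon", "rectum")
ITEMS = ("ulcer_size", "ulcerated_surface", "affected_surface", "narrowing")

SegmentScores = Dict[str, int]


def segment_subscore(items: SegmentScores) -> int:
    """Sum the four item grades of one segment after a range check."""
    for name in ITEMS:
        if not 0 <= items[name] <= 3:
            raise ValueError(f"{name} must be between 0 and 3")
    return sum(items[name] for name in ITEMS)


def sum_score(exam: Dict[str, Optional[SegmentScores]]) -> int:
    """Total over observed segments.

    A segment recorded as None (not reached or not assessable) adds
    nothing, so the total drops without any correction.
    """
    return sum(segment_subscore(s) for s in exam.values() if s is not None)


# Hypothetical exam: identical mild findings in every segment.
mild = {"ulcer_size": 1, "ulcerated_surface": 1, "affected_surface": 1, "narrowing": 0}
full_exam = {segment: dict(mild) for segment in SEGMENTS}

# Same patient, but the ileum could not be intubated.
partial_exam = dict(full_exam)
partial_exam["ileum"] = None

full_total = sum_score(full_exam)        # 15
partial_total = sum_score(partial_exam)  # 12
```

In this sketch the same underlying disease yields 15 with all segments seen and 12 when the ileum is missing, which is one way to read "no adjustment for missing segments due to sum score".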
Summary of suggested changes and improvements in the conduct of clinical trial endoscopy
| Suggested improvement or change (in order of presentation in the paper) | Importance | Ease of implementation |
| Colonoscopy only for UC trials. | ++ | ++ |
| Require split dosing for colonoscopy preps. | +++ | +++ |
| Avoid early morning colonoscopy for trial participants. | ++ | ++ |
| Standardise bowel prep to polyethylene glycol 3350. | +++ | +++ |
| Require vendors to present videos to central readers at the same resolution as recorded (no downsampling). | +++ | +++ |
| Capture metrics for colonoscopy acquisition times (site reader) and viewing times (central reader) and set minimum standards. | ++ | ++ |
| Involve site endoscopists as readers. | ++ | + |
| Central reading training programmes by GI societies. | +++ | + |
| Better training and collaborative use of ancillary personnel. | +++ | + |
| Design new scoring systems (endoscopic outcome instruments), especially for UC, that better reflect inflammatory burden and are validated for their context of use, possibly using machine learning. | ++++ | + |
| Harmonise central reader qualification processes with clinical credentialing requirements. | ++ | ++ |
| Insist on more transparency regarding vendor central reader training programmes and harmonisation (see also above ‘Central reading training programmes by GI societies’). | +++ | ++ |
| Embrace ML to inform development of new scoring systems. | +++ | + |
| Read algorithms (aggregation of the input of more than one reader per video into the final score): choose statistical over non-statistical data aggregation methods. | ++++ | +++ |
| Create prespecified thresholds for acceptable versus unacceptable bowel preps, possible implementation with ML algorithms prior to presentation to central readers. | +++ | ++ |
GI, gastrointestinal; ML, machine learning; UC, ulcerative colitis.
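The read algorithm recommendation in the table contrasts statistical with non-statistical aggregation of several readers' input into one final score. The sketch below shows one possible reading of that contrast. It assumes that "statistical" means a summary such as the mean or median of independent reads, and that "non-statistical" means a rule in which two readers are accepted when they agree and a third reader adjudicates otherwise. Both interpretations, and all function names, are assumptions made for illustration and are not specified in this excerpt.

```python
from statistics import mean, median
from typing import Callable, Sequence


def aggregate_statistical(reads: Sequence[float], use_median: bool = False) -> float:
    """Combine independent reads of one video with a summary statistic."""
    if not reads:
        raise ValueError("at least one read is required")
    return median(reads) if use_median else mean(reads)


def aggregate_by_adjudication(
    reader_1: int, reader_2: int, adjudicate: Callable[[], int]
) -> int:
    """Hypothetical non-statistical rule.

    If the two primary readers agree, their score is final. Otherwise the
    adjudicator's score replaces both, so the primary reads are discarded.
    """
    if reader_1 == reader_2:
        return reader_1
    return adjudicate()


# Two readers score the same video 1 and 2 on a categorical scale.
averaged = aggregate_statistical([1, 2])                         # 1.5
adjudicated = aggregate_by_adjudication(1, 2, lambda: 2)         # 2
middle = aggregate_statistical([1, 2, 3], use_median=True)       # 2
```

The averaged result keeps information from every read, whereas the adjudicated result depends on a single reader once a disagreement occurs.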