Literature DB >> 33134830

Systematic Review and STARD Scoring of Renal Cell Carcinoma Circulating Diagnostic Biomarker Manuscripts.

Marco A J Iafolla1,2,3, Sarah Picardo1,2, Kyaw Aung1,2,4, Aaron R Hansen1,2.   

Abstract

BACKGROUND: No validated molecular biomarkers exist to help guide diagnosis of renal cell carcinoma (RCC) patients. We seek to evaluate the quality of published RCC circulating diagnostic biomarker manuscripts using the Standards for Reporting of Diagnostic Accuracy Studies (STARD) guidelines.
METHODS: The phrase "(renal cell carcinoma OR renal cancer OR kidney cancer OR kidney carcinoma) AND circulating AND (biomarkers OR cell free DNA OR tumor DNA OR methylated cell free DNA OR methylated tumor DNA)" was searched in Embase, MEDLINE, and PubMed in March 2018. Relevant manuscripts were scored using 41 STARD subcriteria for a maximal score of 26 points. All tests of statistical significance were 2 sided.
RESULTS: The search identified 535 publications: 27 manuscripts of primary research were analyzed. The median STARD score was 11.5 (range = 7-16.75). All manuscripts had appropriate abstracts, introductions, and distribution of alternative diagnoses. None of the manuscripts stated how indeterminant data were handled or if adverse events occurred from performing the index test or reference standard. Statistically significantly higher STARD scores were present in manuscripts reporting receiver operator characteristic curves (P < .001), larger sample sizes (P = .007), and after release of the original STARD statement (P = .005).
CONCLUSIONS: Most RCC circulating diagnostic biomarker manuscripts poorly adhere to the STARD guidelines. Future studies adhering to STARD guidelines may address this unmet need.
© The Author(s) 2020. Published by Oxford University Press.

Year:  2020        PMID: 33134830      PMCID: PMC7583155          DOI: 10.1093/jncics/pkaa050

Source DB:  PubMed          Journal:  JNCI Cancer Spectr        ISSN: 2515-5091


Renal cell carcinoma (RCC) presents with a diverse range of symptoms that may become clinically overt only in late-stage disease. Consequently, nearly 30% of new RCC diagnoses present with de novo metastatic or locally advanced disease (1). Of patients treated with curative resection, 30% subsequently develop metastatic disease (2). The large range in 5-year survival, from 80.9% for stage 1 disease to 8.2% for stage 4, is based mostly on prognostic features identified from clinicopathological and radiographic investigations (3). Despite improved response rates from targeted and immunotherapies in the metastatic setting, survival remains inadequate (4,5). Hence, there is an increasing need to identify highly sensitive and specific biomarkers suitable for appropriate screening and diagnosis. Yet despite ongoing research and translational efforts, there remains a paucity of validated diagnostic biomarkers that could influence screening guidelines, clinical trial design, and patient stratification (6). In contrast, cross-sectional abdominal imaging incidentally identifies greater than 50% of renal masses (7,8), of which 16%-23% are benign (9,10) and 60% are organ confined (≤cT2bN0M0) (11). As lesions increase in size, the likelihood of malignancy also increases, and such masses can be managed with active surveillance, focal ablation, or surgical resection. For some patients with small kidney cancers of 4 cm or less not extending beyond the kidney parenchyma, the mortality risk from nononcologic causes exceeds that from their kidney cancer (12). Diagnostic biomarkers would therefore have clinically significant utility in the detection of small tumors and in a risk-stratified approach to the management of small renal masses. 
The feasibility of procuring blood-based biomarkers has garnered interest from researchers and patient advocacy groups alike: such biomarkers avoid invasive tissue biopsy and the sampling problems posed by tumor heterogeneity while providing real-time information on tumor dynamics (13). For more than 20 years, the scientific community has investigated various molecular and cellular characteristics in the blood of patients with RCC, yet despite these efforts there remains an absence of validated circulating diagnostic biomarkers. This low rate of biomarker discovery is a known problem within oncology research, because fewer than 1% of promising oncologic biomarkers become clinically useful (14). Unfortunately, poor methodological design, inadequate participant enrollment, deficiencies in specimen and data collection, overinterpretation of results, and unclear reporting all lead to biased conclusions that erroneously support or refute the diagnostic test, threaten estimates of sensitivity and specificity, and affect future research endeavors (15–17). Further, diagnostic studies are prone to overestimating accuracy (18) and often report overly optimistic results (19,20). Consequently, the Standards for Reporting of Diagnostic Accuracy Studies (STARD) statement was created by a comprehensive group of researchers and editors and published in 2003 in several major scientific journals (21–23) to address the substandard study design and reporting of diagnostic studies. STARD built on the Consolidated Standards of Reporting Trials statement for randomized controlled trials (24,25), creating an itemized checklist to ensure high-quality data procurement, processing, and interpretation in diagnostic studies. The chosen items have their own evidence showing variations in measurement of diagnostic accuracy (23). 
These recommendations intend to assist investigators in designing and reporting diagnostic studies and give readers the ability to appraise appropriately the research presented. We hypothesized that deficiencies in methodological design as evidenced by the quality of study reporting could contribute to the failure to develop and validate RCC circulating diagnostic biomarkers, and this could be formally assessed by the STARD guidelines. Our primary objective was to determine the number of primary research manuscripts investigating RCC circulating diagnostic biomarkers by performing a systematic review of the literature and then score each valid manuscript using the STARD guidelines.

Methods

Literature Search and Publication Organization

We used the Preferred Reporting Items for Systematic Review and Meta-analysis (PRISMA) statement to guide our literature search (26). See Supplementary Table (available online) for the PRISMA checklist. The following terms were searched on PubMed (March 23, 2018), MEDLINE (March 29, 2018), and Embase (March 29, 2018): “(renal cell carcinoma OR renal cancer OR kidney cancer OR kidney carcinoma) AND circulating AND (biomarkers OR cell free DNA OR tumor DNA OR methylated cell free DNA OR methylated tumor DNA).” There were no restrictions imposed on past dates of publications, language, or publication status. The search strategy was designed by M.I., K.A., and A.H. The Supplementary Methods (available online) details the full rationale and Medical Subject Headings terms included. Inclusion and exclusion criteria were defined a priori: inclusion criteria consisted of primary research manuscripts examining RCC circulating diagnostic blood-based biomarkers; exclusion criteria were RCC studies limited to review articles, abstract publications, case reports, duplicates, and non-RCC articles. Duplicate publications were identified by matching PubMed reference numbers, Unique Identifier (UI) numbers, and/or publication titles. Because part of the STARD guideline is to appraise reporting quality of the study, the authors of identified studies were not contacted for additional information. Additionally, inherent to STARD is the ability to assess the risk of bias of a diagnostic study. There was no formal review protocol created before the search. 
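The duplicate-filtering step described above (matching on PubMed reference number, UI number, and/or title) can be sketched as follows. This is an illustrative reconstruction, not the authors' actual code: the record fields (`pmid`, `ui`, `title`) and the function name are assumptions, as the export format is not specified.

```python
def dedupe(records):
    """Keep each publication once, keyed on PMID, UI, or normalized title.

    `records` is a list of dicts; a record is considered a duplicate if it
    shares any identifier with a record already kept.
    """
    seen, unique = set(), []
    for record in records:
        # Collect every non-empty identifier for this record
        keys = {k for k in (record.get("pmid"),
                            record.get("ui"),
                            record.get("title", "").strip().lower()) if k}
        if keys & seen:      # any overlap means we already have this study
            continue
        seen |= keys
        unique.append(record)
    return unique
```

A record repeated under a different UI is still caught here via its PMID or title, mirroring the manuscript's "and/or" matching rule.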
Results were exported into CSV file format for review. Authors M.I. and S.P. independently appraised all titles and abstracts, and organized the publications into 1 of 3 possible categories: publications examining RCC circulating diagnostic biomarkers, publications not examining RCC circulating diagnostic biomarkers, and publications that were unclear about examining RCC circulating diagnostic biomarkers. Author A.H. adjudicated any discrepancies in organization between M.I. and S.P. Manuscripts within the publications examining RCC circulating diagnostic biomarkers were then selected for STARD scoring; abstract-only publications and review articles cannot be subjected to STARD appraisal. All valid manuscripts were then categorized as per their investigated biomarker: categories were established if 2 or more publications examined the same biomarker, and manuscripts examining a biomarker not examined in other publications were classified as “other.” Publications either not examining or unclear if examining RCC circulating diagnostic biomarkers were also subclassified. Unclear abstracts were further assessed by searching for subsequent publications to determine if the abstract pertained to RCC circulating diagnostic biomarkers.

STARD Scoring and Diagnostic Parameters

The STARD criteria consist of a 30-item checklist (27,28), and each item can be further divided into multiple subcategories. There is the potential to evaluate each manuscript according to 46 separate subcriteria; however, authors M.I., S.P., and A.H. examined the STARD subcriteria in tandem and removed 5 of the 46 subcriteria from the analysis due to lack of relevance in evaluating valid RCC circulating diagnostic biomarkers (Supplementary Table 2, available online). This yielded a maximal STARD score of 26 points. All subcriteria and their weighted points are listed in Table 1. M.I. and S.P. independently scored all relevant manuscripts, and any discordant scores were reviewed by A.H. The following additional variables were also collected: sample size of RCC patients in the study (not including patient controls); histology of RCC investigated; the type of statistical test used; whether the study reported statistically significant results; the sensitivity and specificity metrics reported; country and continent of the corresponding author; name and impact factor of the publishing journal; whether the publishing journal required adherence to STARD guidelines; whether the manuscript stated adherence to STARD guidelines; and the year the study was published (based on the following hierarchy, depending on the information available: the year the manuscript was accepted for publication, the year published online, and then the year of periodical publication). InCites Journal Citation Reports (29) or other sources (30,31) (in the event the journal and/or year published was not available on InCites) were searched to determine the impact factors for the year the relevant manuscripts were published.
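The weighted scoring scheme can be expressed compactly. The weights below are transcribed from Table 1 (each of the 41 subcriteria carries 0.25, 0.5, or 1 point, summing to the 26-point maximum); the function name and data structure are illustrative, not the authors' implementation.

```python
# Point weights for the 41 STARD subcriteria, keyed by the
# "criteria No. used in score" column of Table 1.
WEIGHTS = {
    1: 0.5, 2: 0.5, 3: 1, 4: 1, 5: 1,
    6: 0.5, 7: 0.5, 8: 0.5, 9: 0.5, 10: 1,
    11: 0.5, 12: 0.5, 13: 1, 14: 0.5, 15: 0.5,
    16: 0.25, 17: 0.25, 18: 0.25, 19: 0.25,
    20: 0.25, 21: 0.25, 22: 0.25, 23: 0.25,
    24: 1, 25: 0.5, 26: 0.5, 27: 0.5, 28: 0.5,
    29: 1, 30: 0.5, 31: 0.5, 32: 1, 33: 1,
    34: 0.5, 35: 0.5, 36: 1, 37: 1, 38: 1,
    39: 1, 40: 1, 41: 1,
}

def stard_score(subcriteria_met):
    """Sum the weights of the subcriteria a manuscript satisfies."""
    return sum(WEIGHTS[i] for i in subcriteria_met)

# Sanity check: satisfying all 41 subcriteria yields the 26-point maximum.
assert stard_score(range(1, 42)) == 26.0
```

A manuscript's score is then simply `stard_score({...})` over the set of subcriterion numbers it satisfies.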
Table 1.

The 41 STARD subcriteria used to score valid RCC circulating diagnostic biomarker manuscripts and the number of manuscripts satisfying each subcriterion

STARD criteria No. (28) | Criteria No. used in score | Criteria description (a) | Potential points awarded | Manuscripts meeting, No. (%)

Title or abstract
1 | 1 | States it is a diagnostic study | 0.5 | 16 (59.3)
— | 2 | States its measure of accuracy (eg, sensitivity, specificity, predictive value, or AUC) | 0.5 | 13 (48.1)

Abstract
2 | 3 | States summary of design, methods, results, and conclusions | 1 | 27 (100.0)

Introduction
3 | 4 | States background and clinical use of test | 1 | 27 (100.0)
4 | 5 | States objective(s) or hypothesis(es) | 1 | 20 (74.1)

Methods
5 | — | Study design | — | —
— | 6 | States if data were collected before (prospective) or after (retrospective) index test was performed | 0.5 | 23 (85.2)
— | 7 | States if data were collected before (prospective) or after (retrospective) reference standard was performed | 0.5 | 11 (40.7)
6 | — | Participants | — | —
— | 8 | States inclusion criteria | 0.5 | 26 (96.3)
— | 9 | States exclusion criteria | 0.5 | 2 (7.4)
7 | 10 | States where patients were identified (eg, from registry, database, symptoms, etc) | 1 | 4 (14.8)
8 | 11 | States where identified patients were located | 0.5 | 14 (51.9)
— | 12 | States when identified patients were found | 0.5 | 11 (40.7)
9 | 13 | States if patients were a random, consecutive, or convenience series | 1 | 2 (7.4)
10a | — | Test methods | — | —
— | 14 | States index test method and handling of sample | 0.5 | 20 (74.1)
10b | 15 | States reference standard method and handling of sample | 0.5 | 8 (29.6)
11 | — | States rationale for choosing reference standard (b) | — | —
12a | 16 | States definition of positive cut-off for index test and defined a priori vs posteriori | 0.25 | 22 (81.5)
— | 17 | States rationale of positive cut-off for index test and defined a priori vs posteriori | 0.25 | 22 (81.5)
12b | 18 | States definition of positive cut-off for reference standard and defined a priori vs posteriori | 0.25 | 22 (81.5)
— | 19 | States rationale of positive cut-off for reference standard and defined a priori vs posteriori | 0.25 | 22 (81.5)
13a | 20 | States clinical info was blinded to those performing or reading the index test | 0.25 | 1 (3.7)
— | 21 | States reference standard result was blinded to those performing or reading the index test | 0.25 | 1 (3.7)
13b | 22 | States clinical info was blinded to those assessing the reference standard | 0.25 | 1 (3.7)
— | 23 | States index test result was blinded to those assessing the reference standard | 0.25 | 1 (3.7)
14 | — | Analysis | — | —
— | 24 | States the statistical method used to determine diagnostic accuracy (ie, used regression to determine 95% CI) | 1 | 4 (14.8)
15 | 25 | States how indeterminant data from index test were handled | 0.5 | 0 (0.0)
— | 26 | States how indeterminant data from reference standard were handled | 0.5 | 0 (0.0)
16 | 27 | States how missing data from index test were handled | 0.5 | 1 (3.7)
— | 28 | States how missing data from reference standard were handled | 0.5 | 1 (3.7)
17 | 29 | States analyses of variability in diagnostic accuracy by comparing differences in accuracy across subgroups of participants, readers, or centers; and defined a priori vs posteriori | 1 | 5 (18.5)
18 | 30 | States intended sample size | 0.5 | 2 (7.4)
— | 31 | States how sample size was calculated | 0.5 | 2 (7.4)

Results
19 | — | Participants | — | —
— | 32 | States flow of patients or shows flow diagram | 1 | 1 (3.7)
20 | 33 | States baseline demographic and clinical characteristics of participants | 1 | 21 (77.8)
21a | 34 | States distribution of severity of disease in those with the target condition | 0.5 | 26 (96.3)
21b | 35 | States distribution of alternative diagnoses in those without the target condition | 0.5 | 27 (100.0)
22 | — | States time interval and clinical interventions between index test and reference standard (b) | — | —
23 | — | Test results | — | —
— | 36 | States cross tabulation (ie, 2 × 2 table) of the index test results by the results of the reference standard | 1 | 5 (18.5)
24 | 37 | States estimates of diagnostic accuracy and their precision (such as 95% CIs) | 1 | 14 (51.9)
25 | 38 | States any adverse events from performing the index test or the reference standard | 1 | 0 (0.0)

Discussion
26 | 39 | States limitations | 1 | 12 (44.4)
27 | 40 | States implications for practice | 1 | 22 (81.5)

Other information
28 | — | States registration number and registry name (b) | — | —
29 | — | States where study protocol can be accessed (b) | — | —
30 | 41 | States sources of funding, support, and role of funders | 1 | 23 (85.2)

(a) Adapted from the 2015 STARD Explanation and Elaboration publication (28). AUC = area under curve; CI = confidence interval; RCC = renal cell carcinoma; STARD = Standards for Reporting of Diagnostic Accuracy Studies.

(b) Removed from analysis (see Supplementary Table 2, available online).


Statistical Analysis

Descriptive statistics such as median and range were used to summarize the STARD scores. The following categorical variables were correlated with STARD score using the Mann-Whitney test (for 2 categories) or Kruskal-Wallis test (for >2 categories): manuscripts published before the year 2003 (when STARD was released) (21–23), after 2005 (to allow 2 years for investigators to adopt STARD guidelines), and after 2015 (the year the elaborated STARD edition was published) (27,28); continent of corresponding author; histology of RCC (clear cell RCC vs other); if a receiver operating characteristic curve was used in calculating sensitivity and specificity; report of statistically significant results; and if the publishing journal required adherence to STARD (publications before and including 2003 were excluded). The following continuous variables were correlated with STARD scores using Spearman’s correlation: journal impact factor, sample size of RCC patients (not including control patients), sensitivity reported, and specificity reported. No additional sensitivity, subgroup, or meta-regression analyses were performed. All tests of statistical significance were 2-sided, and P less than .05 was considered statistically significant.
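The correlative tests described above can be sketched with SciPy; this is an illustration of the statistical machinery only, assuming SciPy is available, and the score values below are invented for demonstration rather than taken from the study's data.

```python
# Sketch of the correlative analyses: Mann-Whitney U for 2 categories,
# Kruskal-Wallis for >2 categories, Spearman for continuous variables.
from scipy.stats import kruskal, mannwhitneyu, spearmanr

# Two categories (eg, ROC curve reported vs not): Mann-Whitney U, 2-sided
scores_no_roc = [8, 7, 9, 8.5]           # hypothetical STARD scores
scores_roc = [11.5, 12, 10.5, 14, 16.75]
u_stat, p_mw = mannwhitneyu(scores_no_roc, scores_roc,
                            alternative="two-sided")

# More than 2 categories (eg, continent of corresponding author)
h_stat, p_kw = kruskal([13, 11.5, 15.5], [11.75, 8, 16.75], [9, 8, 10.5])

# Continuous variable (eg, RCC sample size) vs STARD score
rho, p_sp = spearmanr([10, 25, 43, 120, 229], [8, 10, 11.5, 13, 16])
```

Each call returns a test statistic and a 2-sided P value, matching the significance threshold of P < .05 used in the manuscript.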

Results

Literature Search Results

Embase identified 252 results, MEDLINE 146 results, and PubMed 394 results. The PubMed search had 98 and 146 overlapping results with Embase and MEDLINE, respectively. The Embase search had 13 repeated publications: 5 abstracts had their UI numbers repeated twice, 5 abstracts were repeated twice under a different UI, and 1 abstract was repeated 3 times under different UIs. Note that 1 abstract had its UI repeated twice and was also repeated under a different UI. The search resulted in 535 unique publications. Using title, abstract, and keywords, the authors identified 51 publications that examined RCC circulating diagnostic biomarkers, 18 publications of unclear relevance, and 466 rejected publications. One abstract examining circulating tumor DNA as an RCC diagnostic biomarker was presented at 3 different conferences using slightly different titles and abstract content; this study was tallied only once. In total, 27 valid manuscripts were subjected to STARD scoring and were organized into 6 different categories; the major categories were studies examining various forms of circulating RNA (n = 13) and those classified as “other” because the biomarker was not examined in other publications (n = 6). The inaccessible manuscripts of valid abstracts (n = 6) were due to foreign language (n = 2) or the manuscript not being available (n = 4; attempts at contacting the corresponding authors were unsuccessful). The 7 review articles and 11 abstracts examining RCC circulating diagnostic biomarkers were excluded from our analysis. Publications of unclear relevance (n = 18) were due to unclear abstracts: the inaccessible manuscripts (n = 12) and abstract-only publications (n = 3) were composed of review studies, and hence no additional attempts were made to clarify whether they examined RCC circulating diagnostic biomarkers. Two studies were non-English publications. 
One manuscript examined circulating tumor DNA, but it is unclear if the cancer patients were compared with healthy controls because its methods were inaccessible. Rejected publications were organized into 54 different categories, of which the largest groups were studies without any diagnostic, predictive, or prognostic biomarkers (n = 129) and studies examining RCC circulating prognostic biomarkers (n = 69) (see Supplementary Box 1, available online). Figure 1 summarizes these results in a PRISMA diagram (26).
Figure 1.

Preferred Reporting Items for Systematic Review and Meta-analysis (PRISMA) flow diagram depicting the results from the literature search and subclassification into 1 of 3 categories: publications examining renal cell carcinoma (RCC) circulating diagnostic biomarkers, publications unclear if examining RCC circulating diagnostic biomarkers, and publications not examining RCC circulating diagnostic biomarkers. Only manuscripts that examined RCC circulating diagnostic biomarkers were subjected to Standards for Reporting of Diagnostic Accuracy Studies (STARD) appraisal.


STARD Scores and Descriptive Statistics

Figure 2 is a histogram of the STARD scores from the 27 manuscripts included in this review. Based on the 41 STARD subcriteria used in this study (Table 1), the maximum achievable STARD score was 26. The median STARD score was 11.5 (range = 7-16.75). The following STARD criteria were present in all examined manuscripts: abstracts stating a summary of design, methods, results, and conclusions; introductions stating the background and clinical use of the test; and distribution of alternative diagnoses in those without the target condition. Unfortunately, none of the examined manuscripts stated how indeterminant data were handled or whether there were adverse events from performing the index test or the reference standard. Twenty-three (56%) STARD subcriteria were met in less than 50% of the appraised manuscripts. The methods division of STARD had the lowest scores, with a median of only 4 (15%) manuscripts satisfying its subcriteria. In contrast, the introduction/abstract, results, and discussion STARD divisions had a median of 20 (74%), 14 (52%), and 17 (63%) manuscripts, respectively, addressing their corresponding subcriteria. The STARD subdivisions analysis (n = 1.5; 5.6%) and participants (n = 7.5; 28%) had the lowest median number of manuscripts satisfying their corresponding subcriteria. Table 1 summarizes the number of publications that met criteria for the 41 subcriteria, and Supplementary Table 3 (available online) lists all publications examining RCC circulating diagnostic biomarkers and the STARD scores for valid manuscripts.
Figure 2.

Histogram depiction of the Standards for Reporting of Diagnostic Accuracy Studies (STARD) scores from relevant renal cell carcinoma (RCC) circulating diagnostic biomarker manuscripts identified in this study. The maximal STARD score was 26.

The majority of the studies were conducted in Europe (n = 18; 67%) and examined clear cell RCC histology (n = 11; 41%), although 6 studies (22%) did not state the type of RCC histology studied. In total, 23 manuscripts (78%) reported a statistically significant association between their circulating biomarker and diagnostic ability. Receiver operating characteristic curves were constructed in 18 (67%) manuscripts; the remaining manuscripts compared cancer patients with controls using only the Mann-Whitney test (n = 3; 11%), χ2 test (n = 2; 7%), Fisher’s exact test (n = 1; 4%), Wilcoxon rank-sum test (n = 1; 4%), or the 2^(−ΔΔCt) method (n = 1; 4%), or the test was not stated (n = 1; 4%). Sensitivity and specificity were reported in 17 studies (63%): 1 study examining 6 different circulating methylated genes reported 10 different sensitivity and specificity results; 1 study mixed its RCC results with renal pelvis transitional cell carcinoma, angiomyolipoma, metanephric nephroma, and oncocytoma; and 2 studies constructed a receiver operating characteristic curve yet did not report sensitivity or specificity. The median sensitivity and specificity were 70% (range = 0%-100%) and 88.5% (range = 33.3%-100%), respectively. The median impact factor was 2.348 (range = 1.604-12.945). 
The impact factors at the year of publication could not be identified for 2 manuscripts published in European Urology Focus, 1 from Journal of Molecular Biomarkers & Diagnosis, and 1 from the British Journal of Cancer; for these, the impact factor with the closest calendar date to the date published was used. The median year of publication was 2012 (range = 1993-2016), and there was large variation in the sample size of RCC patients (median = 43, range = 1-229; the study with only 1 RCC patient pooled that patient with 80 other metastatic cancer patients in an examination of circulating endothelial cells and hence is not considered a case report). Seven (26%) manuscripts were published in journals requiring adherence to STARD guidelines, although 4 manuscripts were removed from the correlative analyses because they were published before the year 2003.

Associations With STARD Score

As shown in Figure 3, there were statistically significant associations between several categorical variables and STARD score: continent of the corresponding author (P = .03); manuscripts with a receiver operator characteristic curve (P < .001); and year of publication, whether before the year 2003 (P = .005), after 2005 (P = .001), or after 2015 (P = .05). Interestingly, journals mandating adherence to STARD guidelines did not produce better STARD scores (P = .29), although this interpretation is limited by the small number of manuscripts published in journals requiring adherence (n = 4). Table 2 lists the remaining categorical variables that failed to reach statistical significance. Figure 4 graphically displays the statistically significant association between RCC sample size as a continuous variable and STARD score using Spearman’s correlation (r = 0.51, 95% confidence interval [CI] = 0.15 to 0.75; P = .007). As summarized in Table 3, the remaining continuous variables analyzed were not statistically significant: impact factor (r = −0.15, 95% CI = −0.51 to 0.26; P = .47), sensitivity (r = 0.30, 95% CI = −0.23 to 0.69; P = .25), and specificity (r = −0.31, 95% CI = −0.70 to 0.21; P = .22). Unfortunately, only 1 manuscript mentioned adherence to STARD, preventing any analysis of a relationship between this variable and STARD scores.
Figure 3.

Beeswarm plot representation of categorical statistically significant associations with Standards for Reporting of Diagnostic Accuracy Studies (STARD) scores: (A) continent of corresponding author, (B) publishing a receiver operator characteristic curve, (C) manuscripts published before the year 2003, (D) manuscripts published after the year 2005, and (E) manuscripts published after the year 2015.

Table 2.

RCC circulating diagnostic biomarker manuscript independent variables subject to categorical variable analysis with STARD scores

Description | Category | Median STARD score (range) | No. (%) | P (a)
Histology (b) | Clear-cell only | 11.5 (8-15.5) | 11 (52.4) | .67
— | Mixed histology | 12 (9-16.75) | 10 (47.6) | —
Journal stating requirement to STARD criteria (c) | Yes | 10.5 (9.5-12.25) | 4 (17.4) | .29
— | No | 12 (8-16.75) | 19 (82.6) | —
Location of corresponding author | Asia | 13 (11.5-15.5) | 3 (11.1) | .03
— | Australia | 7 (N/A) | 1 (3.7) | —
— | Europe | 11.75 (8-16.75) | 18 (66.7) | —
— | North America | 9 (8-10.5) | 5 (18.5) | —
Manuscript published before the year 2003 | Yes | 8 (7-9) | 4 (14.8) | .005
— | No | 11.5 (8-16.75) | 23 (85.2) | —
Manuscript published after the year 2005 | Yes | 11.75 (8-16.75) | 22 (81.5) | .001
— | No | 8 (7-9) | 5 (18.5) | —
Manuscript published after the year 2015 | Yes | 13 (8-16.75) | 9 (33.3) | .05
— | No | 11.25 (7-15.5) | 18 (66.7) | —
Receiver operator characteristic curve | Yes | 12 (9.5-16.75) | 18 (66.7) | <.001
— | No | 8 (7-12.25) | 9 (33.3) | —
Statistically significant results (d) | Yes | 11.5 (7-16.75) | 23 (92.0) | .08

(a) Calculated using the Mann-Whitney test (for 2 categories) or Kruskal-Wallis test (for >2 categories). N/A = not applicable; RCC = renal cell carcinoma; STARD = Standards for Reporting of Diagnostic Accuracy Studies.

(b) Six studies were removed from the analysis due to not stating the RCC histology investigated.

(c) Four studies were removed from the analysis due to being published before the initial STARD publication in 2003.

(d) Two studies were removed from the analysis due to combining RCC patients with other malignancies in the final analysis. Because only 2 articles reported non-statistically significant results, the Mann-Whitney test could not be used, and the P value was determined using an unpaired t test with Welch’s correction.

Figure 4.

Scatterplot representation of renal cell carcinoma (RCC) sample size as a continuous variable with Standards for Reporting of Diagnostic Accuracy Studies (STARD) scores. Sample size of controls was not incorporated into this analysis.

Table 3.

RCC circulating diagnostic biomarker manuscript independent variables subject to continuous variable analysis with STARD scores

Description | Median (range) | P (a)
Impact factor | 2.348 (1.604-12.945) | .47
Sample size | 43 (1-229) | .007
Sensitivity | 70.0 (0.0-100.0) | .25
Specificity | 88.5 (33.3-100.0) | .22

(a) Calculated using Spearman’s correlation. RCC = renal cell carcinoma; STARD = Standards for Reporting of Diagnostic Accuracy Studies.


Discussion

Herein we present what is, to our knowledge, the first appraisal of RCC circulating diagnostic biomarker manuscripts using the STARD criteria, and we identified the need to improve the quality of methodology and reporting. Our analysis revealed poor compliance with the STARD checklist, which raises questions about the methodology of these studies. This is in keeping with a 2014 meta-analysis of various diagnostic studies examining adherence to STARD guidelines: the mean number of items reported ranged from 9.1 to 14.3 against a maximal STARD score of 25 (32). This meta-analysis also found mean STARD scores less than 50% in 6 of 16 (37.5%) studies, similar to another study that found nearly 25% of diagnostic manuscripts in high–impact factor journals addressed less than one-half of the STARD items (33). Unfortunately, our study showed that 20 manuscripts (74.1%) addressed less than 50% of STARD items. Our most common STARD deficiencies lay within the manuscripts’ methodologies, with the STARD subdivision “analysis,” nested under the methods section, having the lowest number of manuscripts satisfying its criteria. Further, all manuscripts examined failed to comment on the handling of indeterminant data or the occurrence of adverse events from performing the index test or the reference standard. Several items were commented on in 3 (11.1%) or fewer manuscripts: exclusion criteria; where patients were identified; blinding of those performing or reading the index test or reference standard to clinical information and reference standard results; handling of missing data; intended sample size and sample size calculations; and reporting a flow chart. These deficiencies are consistent with the inadequate reporting of eligibility criteria, recruitment process, and sampling methods in high–impact factor diagnostic publications of various medical disciplines in the post-STARD era (33). 
Further, this same study determined that confidence intervals around estimates of diagnostic accuracy were reported in one-third of their manuscripts, which is comparable with our RCC diagnostic biomarker studies (n = 14; 51.9%). Our analysis of correlations with STARD scores determined that the quality of RCC circulating diagnostic biomarker manuscripts improved after the release of the 2003 STARD guidelines (21–23) (median increase = 3.5), when dichotomizing by the year 2005 to allow investigators time to adopt the STARD criteria (34) (median increase = 3.75), and after the elaborated STARD edition was published in 2015 (median increase = 1.75) (27,28). We were unable to perform an analysis dichotomizing manuscripts by the year 2017 (to allow investigators time to adopt the elaborated STARD criteria) because the most recent manuscript from our search was published in 2016. Several analyses have since been conducted on the impact STARD has made in the diagnostic investigator community: whereas an improvement of 1.4 to 3.4 items was observed when comparing manuscripts published in the pre- versus post-STARD era (32,33,35), 1 study found no meaningful difference in methodology or reporting (36). Whereas 1 of these studies (36) used only 13 STARD items, the others (32,33,35) evaluated their manuscripts based on all of STARD's original 25 items. Our analysis used all relevant criteria in the updated 30-item STARD checklist (27,28) with further subcriteria analysis and showed that the release of STARD and its subsequent elaborated edition was likely associated with better STARD scores (although this was inconsistent with our continuous variable analysis of publication year). Even though the best STARD scores occurred after the release of the elaborated STARD edition, the median score in this era was only 13 (50%).
Although the STARD guidelines were constructed to address bias by encouraging transparent reporting of methods and results, ensuring robust, reproducible conclusions that readers can ascertain solely from the study report (37–42), our analysis unfortunately suggests this framework has not been widely or routinely used by RCC circulating diagnostic biomarker investigators. We also demonstrated that continent of the corresponding author, publication of a receiver operator characteristic curve, and larger RCC sample size were associated with better STARD scores. The latter is consistent with an obstetrics and gynecology study that found sample size was associated with better STARD adherence (43). We hypothesized that journals with greater impact factors would publish higher quality manuscripts and hence have better adherence to STARD items; our results did not show this to be the case. Two studies evaluating manuscripts in journals with impact factors of at least 4 showed only modest adherence of 13.6 to 15.3 STARD items in the post-STARD era (maximal score was 25), with only a 1.81 to 3.4 item improvement compared with the pre-STARD era (33,35). Although 1 study demonstrated improved reporting of other medical guidelines in journals with large impact factors (44), a recent study showed overinterpretation and distorted presentation of results in diagnostic accuracy studies published in journals with impact factors of at least 4 (20). Although STARD assists with appropriate diagnostic methodology and reporting, in our opinion not all items are created equal: we believe items 19-21 and 23-24, which pertain to validity of results, are most important. The information in these items allows the reader to determine the patient population studied and reasons for drop-out, to recalculate measures of diagnostic accuracy (such as sensitivity and specificity), and to use the data in meta-analyses (28).
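To illustrate why these items matter, the diagnostic accuracy measures that items 19-21 and 23-24 allow a reader to recalculate follow directly from the underlying 2 × 2 contingency table. The sketch below uses hypothetical counts (not data from any of the reviewed manuscripts) and Wilson score intervals as one common choice for the 95% confidence bounds:

```python
import math

def wilson_ci(successes, n, z=1.96):
    """95% Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

# Hypothetical 2x2 table: index test vs reference standard (illustration only)
tp, fn = 45, 5   # diseased patients: test positive / test negative
tn, fp = 40, 10  # nondiseased patients: test negative / test positive

sensitivity = tp / (tp + fn)  # 45/50 = 0.90
specificity = tn / (tn + fp)  # 40/50 = 0.80
sens_lo, sens_hi = wilson_ci(tp, tp + fn)
spec_lo, spec_hi = wilson_ci(tn, tn + fp)

print(f"Sensitivity {sensitivity:.2f} (95% CI {sens_lo:.2f}-{sens_hi:.2f})")
print(f"Specificity {specificity:.2f} (95% CI {spec_lo:.2f}-{spec_hi:.2f})")
```

Reporting the full table, rather than only the derived percentages, is what allows these figures to be reproduced and pooled in meta-analyses.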
Our study showed these items were present in a median of 17.5 manuscripts (64.8%) but with a large range (1-27 manuscripts). This is relatively consistent with a STARD study analyzing diagnostic endoscopy manuscripts (45). However, bias is extremely important for readership and is best assessed with items 10 (details on the index test and reference standard), 12 (definition and rationale behind positive cut-offs), 13 (blinding of the performers/readers of the tests), 19 (patients included and reasons for drop-out), and 20 (demographic and clinical characteristics of participants) (8,16). Items pertaining to bias were present in fewer manuscripts than validity items: a median of 14 manuscripts (range = 1-22). Ultimately, deficiencies in reporting do not necessarily invalidate the evidence but leave the reader unable to fully trust or apply the presented information. However, not all STARD items may be applicable to an RCC circulating diagnostic biomarker study (see Supplementary Table 2, available online), and we believe omission of these items will yield more accurate STARD scores for our purpose. Unfortunately, the lack of adherence to STARD is not unique to this guideline: a similar study examining the quality of RCC circulating prognostic biomarkers using the Reporting Recommendations for Tumor Marker Prognostic Studies (REMARK) criteria (46) also showed poor adherence, suggesting the paucity of these biomarkers may also be secondary to inadequate methodology and reporting. Although that study showed statistically significantly higher REMARK scores in publications stating adherence to REMARK guidelines and reporting statistically significant results, our STARD analysis did not generate similar correlations. The adoption of guidelines remains an ongoing issue in biomedical research: a systematic scoping review assessing use of health-care reporting guidelines found 86% of studies had suboptimal adherence (44).
Further, implementation of guidelines, including STARD (33,34) and the Consolidated Standards of Reporting Trials (47–49), is generally slow. Although the reasons are not clear, the limited requirement of reporting guidelines in journal policies, author instructions, and guideline enforcement may play a role (34,50,51). Journals use varying language when requesting adherence to STARD guidelines, ranging from "required" to "encouraged" and "consult." Although we were unable to demonstrate that journals requiring adherence to STARD criteria had better STARD scores, this may be a consequence of our analysis identifying only a limited number of journals stating this requirement (n = 4). Theoretically, strict enforcement of this guideline by journals should help improve these metrics.

Our review showed that the vast majority of RCC diagnostic biomarker studies investigated microRNA, although other major themes identified included cell-free DNA, endothelial cells, methylated cell-free DNA, and tumor cells. microRNAs are small noncoding RNAs that posttranscriptionally regulate genes, mediate cell-to-cell communication (52), and may play a role in tumorigenesis (53–55). Their stability and detectability in circulating blood (56), in addition to unique signatures documented in several cancer types (57–59), including RCC (60–63), have led to increased interest in microRNAs as potential biomarkers. However, the diagnostic biomarker potential of specific microRNAs has been inconsistent between studies, likely secondary to differences in patient demographics and methodology and, most importantly, the lack of standardized microRNA controls to normalize the data (64). To date, the 3 endogenous miRNAs let-7d, let-7g, and let-7i are presumably the best reference gene normalizers due to their consistency between healthy controls and RCC patients (64).
microRNA is commonly detected using a reverse transcription–polymerase chain reaction (PCR)–based TaqMan Low Density Array with subsequent quantitative reverse transcription–PCR assay validation (64), with or without the use of magnetic bead–bound tumor-associated epithelial cell adhesion molecule antibodies to avoid interference from nonepithelial tissues (65).

Cell-free DNA is shed from apoptotic and necrotic cells into the blood stream (66). Since its first identification in cancer patients (67), it has been investigated as both a diagnostic (68,69) and prognostic (70,71) cancer biomarker in RCC and other cancers (72–75). However, cell-free DNA is not yet clinically validated in RCC. Although this paucity may be secondary to deficiencies in methodology or reporting, as shown in this study and our prior work (46), we hypothesize that cell-free DNA alone lacks diagnostic biomarker potential in RCC due to its unclear relationship with tumor biology, sampling noise that limits analytic sensitivity, and the recognition that cancer-associated mutations are also present in healthy individuals without cancer (76–79). In contrast, methylated cell-free DNA is dynamic and has the potential to capture information on tumor heterogeneity, its microenvironment, and interactions with the immune system. Presently, most RCC methylated DNA investigations focus on silencing of specific tumor suppressor genes, promoter hypermethylation, or other changes in cytosine and guanine (CpG) island methylation patterns (80–84). The 2 methylated DNA RCC diagnostic biomarker investigations found in our review (85,86) analyzed several genes involved in the development and progression of RCC using either methylation-specific PCR or methylation-sensitive restriction enzymes with subsequent PCR; thus, there is a clearer link with the tumor biology.
Future studies analyzing changes in cell-free methylated DNA patterns over longitudinal time points may yield a better understanding of its dynamic state and its diagnostic, predictive, and prognostic biomarker potential.

This project has several limitations. We included studies that evaluated the discriminatory ability of their index test against a reference standard, even if diagnostic accuracy was not formally stated as a primary objective. This is justifiable because primary and secondary objectives are not always delineated (87), and any study attempting to differentiate a diseased from a nondiseased state should have its methods and reporting appraised. Although journals may formally state adherence to STARD in their diagnostic studies at the time of this publication, it is unknown whether this was applied during the year a given manuscript was published. We attempted to mitigate this by excluding manuscripts published before STARD's release in 2003 (21–23). In addition, we applied the criteria from the elaborated STARD edition (27,28) to manuscripts published before its release, creating possible underestimates in our scores. Further, STARD was originally created to assist authors, editors, and peer reviewers in appraising diagnostic studies, not as a tool to assess quality of reporting; hence, there is subjectivity in the scoring of some STARD items. Although reproducibility of STARD scoring is generally concordant, substantial discordance for specific items does exist (88). We searched the supplementary sections of relevant manuscripts to assess for STARD items. However, one may argue that supplementary information should be considered nonessential for appropriate appraisal of a diagnostic study, and including it may overestimate our STARD scores. Finally, some STARD items are more relevant than others; by giving equal weight to each item, the summative STARD score may not always be an accurate reflection of a manuscript's trustworthiness and applicability.
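The equal-weighting limitation can be made concrete: two manuscripts can earn identical summative scores while differing substantially on the bias and validity items discussed above. The sketch below is a hypothetical illustration only; the weighting scheme and the two example manuscripts are our invention, not part of STARD:

```python
# Hypothetical illustration of equal vs weighted STARD-style scoring.
# The weighting scheme below is invented for demonstration only.
CRITICAL_ITEMS = {10, 12, 13, 19, 20, 21, 23, 24}  # bias/validity items discussed above

def equal_weight_score(items_met):
    """One point per item met, as in an unweighted summative score."""
    return len(items_met)

def weighted_score(items_met, critical_weight=2):
    """Upweight the bias/validity items (an assumption, not a STARD rule)."""
    return sum(critical_weight if i in CRITICAL_ITEMS else 1 for i in items_met)

# Two hypothetical manuscripts, each satisfying 10 items
ms_a = {1, 2, 3, 4, 5, 6, 7, 8, 9, 11}         # meets no critical items
ms_b = {1, 2, 10, 12, 13, 19, 20, 21, 23, 24}  # meets 8 critical items

print(equal_weight_score(ms_a), equal_weight_score(ms_b))  # identical summative scores
print(weighted_score(ms_a), weighted_score(ms_b))          # weighted scores diverge
```

Under equal weighting both manuscripts score 10, yet only the second reports most of the items a reader needs to assess bias; any weighted variant would need its own validation before use.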

Conclusions

Despite the need for improved RCC diagnostic biomarkers, ideally using minimally invasive techniques, our review showed that poor study design and reporting are possibly limiting their clinical validity. Although manuscripts appear to be of better quality if they report receiver operator characteristic curves, have larger sample sizes, were published after the release of the original STARD statement and its subsequent elaborated edition, or differ by continent of the corresponding author, they still fail to follow many of the recommendations. The act of publication does not validate results: all readers should examine the methods and data, not just the interpretation, and come to their own conclusions. We suggest that future investigations into RCC circulating diagnostic biomarkers consult the STARD criteria: by taking this essential checklist into account at the inception of study design, investigators will be able to identify possible sources of bias, make appropriate changes in methodological design, and ultimately produce a transparent study that is robust and reliable.

Funding

No funding was obtained for the purpose of this systematic review.

Notes

Disclosures: Marco Iafolla: Honoraria from Merck and CompassMed. Sarah Picardo: This author declares no conflict of interest. Kyaw Aung: This author declares no conflict of interest. Aaron Hansen: Advisory/Consulting/Research for Genentech/Roche, Merck, GSK, Bristol-Myers Squibb, Novartis, Boston Biomedical, Boehringer-Ingelheim, AstraZeneca, Medimmune, Pfizer. Grants from Karyopharm and Novartis. Role of the authors: Study design by MI, KA and AH. Data collection by MI and SP. Statistical analysis by SP. Interpretation of results by MI, SP, KA and AH. Manuscript drafting by MI. Manuscript review by MI, SP, KA, and AH. All authors have approved the submitted version. Acknowledgments: Marco Iafolla was supported in part by a fellowship through the BMO Chair in Precision Cancer Genomics.
References (84 in total; first 10 shown below)

1.  Expression profile of microRNAs in serum: a fingerprint for esophageal squamous cell carcinoma.

Authors:  Chunni Zhang; Cheng Wang; Xi Chen; Cuihua Yang; Ke Li; Junjun Wang; Juncheng Dai; Zhibin Hu; Xiaojun Zhou; Longbang Chen; Yanni Zhang; Yanfang Li; Hong Qiu; Jicheng Xing; Zhichao Liang; Binhui Ren; Chen Yang; Ke Zen; Chen-Yu Zhang
Journal:  Clin Chem       Date:  2010-10-13       Impact factor: 8.327

2.  Increasing value and reducing waste in biomedical research: who's listening?

Authors:  David Moher; Paul Glasziou; Iain Chalmers; Mona Nasser; Patrick M M Bossuyt; Daniël A Korevaar; Ian D Graham; Philippe Ravaud; Isabelle Boutron
Journal:  Lancet       Date:  2015-09-27       Impact factor: 79.321

3.  Does the CONSORT checklist improve the quality of reports of randomised controlled trials? A systematic review.

Authors:  Amy C Plint; David Moher; Andra Morrison; Kenneth Schulz; Douglas G Altman; Catherine Hill; Isabelle Gaboury
Journal:  Med J Aust       Date:  2006-09-04       Impact factor: 7.738

4.  The quality of diagnostic accuracy studies since the STARD statement: has it improved?

Authors:  N Smidt; A W S Rutjes; D A W M van der Windt; R W J G Ostelo; P M Bossuyt; J B Reitsma; L M Bouter; H C W de Vet
Journal:  Neurology       Date:  2006-09-12       Impact factor: 9.910

5.  Publication and reporting of test accuracy studies registered in ClinicalTrials.gov.

Authors:  Daniël A Korevaar; Eleanor A Ochodo; Patrick M M Bossuyt; Lotty Hooft
Journal:  Clin Chem       Date:  2013-12-31       Impact factor: 8.327

6.  Monitoring of plasma cell-free DNA in predicting postoperative recurrence of clear cell renal cell carcinoma.

Authors:  JingJing Wan; LiJun Zhu; ZhiQiang Jiang; Ke Cheng
Journal:  Urol Int       Date:  2013-07-11       Impact factor: 2.089

7.  Serum DNA hypermethylation in patients with kidney cancer: results of a prospective study.

Authors:  Stefan Hauser; Tobias Zahalka; Guido Fechner; Stefan C Müller; Jörg Ellinger
Journal:  Anticancer Res       Date:  2013-10       Impact factor: 2.480

8.  Renal cell cancer stage migration: analysis of the National Cancer Data Base.

Authors:  Christopher J Kane; Katherine Mallin; Jamie Ritchey; Matthew R Cooperberg; Peter R Carroll
Journal:  Cancer       Date:  2008-07-01       Impact factor: 6.860

9.  Promoter hypermethylation of tumor suppressor genes in urine from kidney cancer patients.

Authors:  Cristina Battagli; Robert G Uzzo; Essel Dulaimi; Inmaculada Ibanez de Caceres; Rachel Krassenstein; Tahseen Al-Saleem; Richard E Greenberg; Paul Cairns
Journal:  Cancer Res       Date:  2003-12-15       Impact factor: 12.701

10.  Endorsement of the CONSORT Statement by high impact factor medical journals: a survey of journal editors and journal 'Instructions to Authors'.

Authors:  Sally Hopewell; Douglas G Altman; David Moher; Kenneth F Schulz
Journal:  Trials       Date:  2008-04-18       Impact factor: 2.279

