
ROC Estimation from Clustered Data with an Application to Liver Cancer Data.

Joungyoun Kim¹, Sung-Cheol Yun², Johan Lim³, Moo-Song Lee², Won Son³, DoHwan Park⁴.

Abstract

In this article, we propose a regression model to compare the performances of different diagnostic methods having clustered ordinal test outcomes. The proposed model treats ordinal test outcomes (an ordinal categorical variable) as grouped-survival time data and uses random effects to explain correlation among outcomes from the same cluster. To compare different diagnostic methods, we introduce a set of covariates indicating diagnostic methods and compare their coefficients. We find that the proposed model defines a Lehmann family and can also introduce a location-scale family of a receiver operating characteristic (ROC) curve. The proposed model can easily be estimated using standard statistical software such as SAS and SPSS. We illustrate its practical usefulness by applying it to testing different magnetic resonance imaging (MRI) methods to detect abnormal lesions in a liver.


Keywords: Lehmann family; clustered data; grouped-time survival; ordinal outcomes; random effects; receiver operating characteristic (ROC) curve

Year: 2016 PMID: 28050126 PMCID: PMC5181834 DOI: 10.4137/CIN.S40299

Source DB: PubMed Journal: Cancer Inform ISSN: 1176-9351

Introduction

The receiver operating characteristic (ROC) curve plots two accuracy measures of a test against each other (the true-positive rate against the false-positive rate) and is frequently used to measure and compare the accuracy of different diagnostic methods. The curve often depends on covariates such as gender and age, as in the hearing impairment example in Dodd and Pepe.1 To explain the effects of covariates, covariate-dependent ROC models have been studied extensively in the literature. The two most common approaches are (i) introducing covariate-dependent error distributions2 and (ii) directly quantifying the covariate effect on the ROC curve.1,3–8
Clustered outcomes are multiple measurements of a diagnostic test from a single subject (or cluster). They typically occur when subjects are tested repeatedly over time, and they are naturally correlated with each other. Random-effects models are introduced to explain the correlation among observations within a cluster. In particular, the location-scale model, in which the location and scale parameters are modeled with random effects, is widely used.9,10
In another direction, there are efforts to model the area under the ROC curve (AUC) directly. Obuchowski11 estimated the AUC (without covariates) with Mann–Whitney (MW)-type statistics and made pairwise comparisons among several diagnostic methods. Dodd and Pepe1 introduced the generalized estimating equation (GEE) framework to model the AUC with covariates. However, neither considered clustered outcomes. Recently, Lim et al.12 extended the study of Dodd and Pepe1 to incorporate clustered outcomes by considering a wider class of GEE weights and proposed a procedure for choosing the optimal weights that minimize the variance of the regression estimator of interest.
Despite extensive discussion of ROC regression models in the literature, only a limited number of models are available for clustered ordinal outcomes, and practitioners still have difficulty using them. In this article, we propose a simple regression model (a nonlinear mixed-effects model) for clustered ordinal test results with covariates. The proposed model originates from the proportional hazards model for grouped-survival (GS) time data by Hedeker et al.13 We find that the proposed model defines a new type of location-scale model and also a Lehmann family of ROC curves, where the Lehmann family was proposed and extensively studied by Gönen and Heller.15 Finally, owing to the formulation of the GS time (or GS for short) model, our model can be estimated by fitting a nonlinear mixed-effects model, which is supported by many common statistical software packages, including SAS and SPSS.
The remainder of the article is composed of four sections. In the Model section, we introduce the proposed model and discuss its connection to existing models. In the Numerical Study section, we present a numerical study to assess the performance of the proposed method. In the Data Example section, we apply the model to comparing different magnetic resonance imaging (MRI) methods for detecting the presence of hepatic metastases. The Conclusion section provides a brief summary of the article. Finally, sample codes for the example are presented in the Appendix section.

Model

This section introduces the nonlinear random-effects model proposed in this article. Let $Y_{ij}$ be an ordinal marker value with K categories from the jth measurement (diagnostic result) of the ith cluster (or subject), for i = 1, 2, …, n and j = 1, 2, …, $n_i$. Let $x_{ij}$ be a p-dimensional covariate vector and $V_i$ be a random effect that explains the correlation among observations in the ith cluster. Let $d_{ij}$ be an indicator variable of the true diagnostic class, where $d_{ij} = 0$ and $d_{ij} = 1$ imply that the true class of the (i, j)th observation is the normal (negative) class and the tumor (positive) class, respectively. The regression model we propose is, for k = 1, 2, …, K,
$$ g\{P_{ijk}(k \mid v_i; x_{ij}, d_{ij})\} = \alpha_k + x_{ij}^\top \vartheta + d_{ij}\,(\gamma + x_{ij}^\top \beta) + v_i, \qquad (1) $$
where $P_{ijk}(k \mid v_i; x_{ij}, d_{ij}) = P(Y_{ij} \le k \mid v_i; x_{ij}, d_{ij})$, $\alpha_k$ is the intercept of the kth category, and the link function g(·) is a monotone increasing function from (0,1) to R. Model (1) is known as the grouped-time survival model of Hedeker et al.13 The model assumes that the true survival time T is observed only as a categorical response Y, defined by Y = k if $T \in I_k$, where $I_k$, k = 1, 2, …, K, are predetermined disjoint and exhaustive intervals on [0, ∞). This is the situation we encounter in a medical diagnostic procedure. For example, the ordinal outcomes in the Data Example section are medical judgments by experts based on the size or number of tumors. In applying the GS model to the data, continuous severity measures of a disease (for example, the size of tumors) are regarded as the true survival times, and the ordinal diagnostic outcomes are read as the observed GS times. Reading the ordinal results as GS times gives a handy and interesting class of nonlinear random-effects models for the ROC curves of correlated categorical diagnostic results.
The model we propose is also closely related to several models in the previous literature. First, model (1) with the complementary log-log link function $g(p) = \log\{-\log(1 - p)\}$ defines an extension of the Lehmann family of ROC curves to a model with random effects. Suppose x is a covariate vector attached to the test results Y. The Lehmann family of ROC curves is defined as the collection of ROC curves of the form
$$ R(u \mid x) = u^{\exp(\gamma + x^\top \beta)}, \quad 0 \le u \le 1, \qquad (2) $$
where
$$ S(t \mid x, d) = S_0(t)^{\exp\{x^\top \vartheta + d\,(\gamma + x^\top \beta)\}}. $$
Here, $S_0(t)$ is the survival function (= 1 − cumulative distribution function) of the outcome of a normal subject with x = 0. The family assumes that the survival functions of normal and diseased subjects have the proportional hazards specification on the covariates. The hazard rate at (the outcome value) t is the instantaneous rate at which a diagnostic outcome takes the value t given that its value is known to be not smaller than t (Pepe, page 151 in Chapter 6.3).16 The model in (4) tells that, given $V_i = v_i$, the survival function of $Y_{ij}$ is
$$ S(t \mid v_i; x_{ij}, d_{ij}) = S_0(t)^{\exp\{x_{ij}^\top \vartheta + d_{ij}\,(\gamma + x_{ij}^\top \beta) + v_i\}}, $$
and it defines the location-scale family mentioned earlier if $-\log S_0(t) = \exp(t)$. In addition, our model (4) assumes that only the intercept is random in $S_0(t \mid v; x)$, and all other coefficient vectors are fixed rather than cluster-specific random effects. This simplification makes the cluster-specific ROC curve the same as (2).
Our primary goal in the ROC analysis is the comparison of different diagnostic methods. To this end, we introduce dichotomous covariates x indicating the choice of method and test their coefficients in model (1). In our motivating example (analyzed in the Data Example section), we have three imaging methods and two readers (two medical doctors who read the images), and we use a three-dimensional covariate vector of binary (0 or 1) indicators, $x = (x_r, x_2, x_3)$, where $x_r = 1$ implies that the outcome is recorded by the second reader, $x_2 = 1$ implies that the outcome is from the second imaging method, and $x_3 = 1$ implies that the outcome is from the third imaging method.
Finally, the proposed model (1) is a nonlinear mixed-effects model, which is well studied in the literature. Many refined approximations to the likelihood function of the model have been proposed and implemented in common statistical packages, including SAS and SPSS.
The comparison of different diagnostic methods can be done by testing the regression coefficients of the covariates that indicate the choice of diagnostic method. This feature is illustrated in detail by analyzing real data in the Data Example section. We refer readers to Davidian and Giltinan17 for details of the nonlinear mixed-effects model.
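
The link between the regression model, the Lehmann family, and the AUC can be sketched in a few lines. The derivation below is an illustration rather than the authors' own presentation: it assumes the complementary log-log link, writes $S = 1 - P$ for the survival function of the rating, and uses $\alpha_k$ as a hypothetical name for the category-specific intercept. The resulting AUC expression agrees with the form $1/\{1 + \exp(\gamma + x^\top\beta)\}$ quoted in the Data Example section.

```latex
% Sketch under the stated assumptions (cloglog link, S = 1 - P, alpha_k hypothetical).
\begin{align*}
\log\{-\log S(k \mid v; x, d)\}
  &= \alpha_k + x^\top\vartheta + d\,(\gamma + x^\top\beta) + v, \\
S(k \mid v; x, d = 1)
  &= S(k \mid v; x, d = 0)^{\eta(x)},
  \qquad \eta(x) = \exp(\gamma + x^\top\beta), \\
R(u \mid x)
  &= u^{\eta(x)}, \qquad u = S(k \mid v; x, d = 0) \ \text{(false-positive rate)}, \\
A(x)
  &= \int_0^1 u^{\eta(x)}\,du = \frac{1}{1 + \eta(x)}
   = \frac{1}{1 + \exp(\gamma + x^\top\beta)}.
\end{align*}
```

The random effect $v$ and the main effects $\vartheta$ cancel in the second line because they enter the normal and diseased survival functions identically, which is why the cluster-specific ROC curve does not depend on them.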

Numerical Study

In this section, we conduct a small numerical study to assess the performance of the proposed GS model. The study considers a one-sample problem: estimating and testing the effectiveness of a single diagnostic method with the AUC, rather than comparing the ROC curves of two or more diagnostic methods. Thus, we have the diagnostic outcomes of two populations, say the normal and cancer populations, from a given diagnostic method.
The data sets for the study are generated as follows. The number of subjects from each population is set to 25 (n = 50 (= 25 + 25)) or 50 (n = 100 (= 50 + 50)). The number of repeated observations per subject is set to m = 2 or m = 4. The ordinal data are generated by categorizing exponentially distributed random variables. For the ith subject of the normal population, we generate m continuous repeated measurements from the exponential distribution with rate $\lambda_i = 0.1 v_i$, where $\log v_i$ follows the normal distribution with mean 0 and variance $\sigma^2$. For the ith subject of the cancer population, the repeated measurements are from the exponential distribution with the rate $\lambda_i$ specified in (7), which involves the parameter γ, where $\log v_i$ again follows the normal distribution with mean 0 and variance $\sigma^2$. The variance $\sigma^2$ is set to 1 or 3, where $\sigma^2 = 3$ introduces higher correlation among observations within a subject than $\sigma^2 = 1$. We transform the measurements to ordinal data (with five levels indexed by k = 0, 1, 2, 3, 4) using predetermined grid points, which are the five quantiles of the exponential distribution with λ = 0.1. In (7), γ = 0 implies that there is no difference in the distributions of diagnostic outcomes of the normal and cancer populations. In the study, we vary γ = 0, 0.2, 0.4, 0.6, 0.8, 1.0 and consider the power in testing H0: γ = 0 (versus γ > 0) as a measure of the effectiveness of the procedure.
To test the hypothesis, we apply the proposed GS model with the complementary log-log link function and the single covariate $d_i$, where $d_i$ is the indicator variable for the cancer population (it equals 1 if the ith subject is from the cancer population and 0 otherwise) and $P_{ijk}(k \mid v_i; d_i) = P(Y_{ij} \le k \mid v_i; d_i)$ for k = 0, 1, 2, 3, 4. The hypothesis is tested by the Z-statistic $Z = \hat{\gamma} / \widehat{\mathrm{se}}(\hat{\gamma})$, and both $\hat{\gamma}$ and $\widehat{\mathrm{se}}(\hat{\gamma})$ are obtained using the NLMIXED procedure in SAS version 9.3.
For comparison, we consider the AUC estimate based on the MW statistic, which is widely used in practice. Here, the MW statistic does not take into account the within-cluster correlation and treats all observations as independent samples. The sample AUC for ordinal data is computed as the MW statistic with ties,
$$ \widehat{\mathrm{AUC}} = \frac{1}{n_D n_N} \sum_{i=1}^{n_D} \sum_{j=1}^{n_N} U_{ij}, $$
where $U_{ij} = 1$ if $Y_i^D > Y_j^N$; $U_{ij} = 1/2$ if $Y_i^D = Y_j^N$; and $U_{ij} = 0$ if $Y_i^D < Y_j^N$. Here, $n_D$ is the total number of observations from the disease population and $n_N$ is that from the normal population. In our case, $n_D = n_N$, each equal to the number of subjects per population times m. The asymptotic variance of the AUC under the assumption of independence of all outcomes is expressed in terms of $S_k$, the number of observations whose scores are k, for k = 1, …, 5. The test for non-effectiveness of the diagnostic method uses the standardized statistic $(\widehat{\mathrm{AUC}} - 1/2) / \sqrt{\widehat{\mathrm{Var}}(\widehat{\mathrm{AUC}})}$, which approximately follows the standard normal distribution under the null.
We generate 1000 data sets for each combination of $\sigma^2 = 1, 3$ (the variance of the random effects) and m = 2, 4 (the number of repetitions within a subject) and evaluate the empirical power by counting the number of rejections among the 1000 data sets. The empirical powers of the GS method and the MW statistic are plotted in Figure 1. In the figure, “MW (size corrected)” is the MW test with an empirical size correction, where the 100(1 − α)th empirical percentile of the MW statistic for γ = 0 is used as the critical value, instead of $z_{\alpha/2}$, the 100(1 − α/2)th percentile of the standard normal distribution. “MW (size corrected)” is added only as a reference and is not applicable in practice because the null distribution of the statistic is not available.
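
The data-generating scheme and the MW estimate of the AUC described above can be sketched as follows. This is a minimal illustration, not the authors' code. Two details are assumptions because the text does not pin them down: the cut points are taken as the 20/40/60/80% quantiles of the exponential distribution with rate 0.1, and the cancer population is given the rate 0.1 · v · exp(−γ), chosen only so that γ = 0 yields no difference. All function names are hypothetical.

```python
import math
import random

BASE_RATE = 0.1  # rate of the exponential distribution for the normal population

# ASSUMPTION: four interior cut points at the 20/40/60/80% quantiles of Exp(0.1),
# giving five ordinal levels 0..4; the text does not list the grid values.
CUTS = [-math.log(1.0 - q) / BASE_RATE for q in (0.2, 0.4, 0.6, 0.8)]


def to_ordinal(t):
    """Map a continuous measurement to an ordinal level in {0, 1, 2, 3, 4}."""
    return sum(t > c for c in CUTS)


def simulate_subject(rng, m, sigma2, rate_multiplier=1.0):
    """Draw m ordinal outcomes for one subject sharing one random effect v."""
    v = math.exp(rng.gauss(0.0, math.sqrt(sigma2)))  # log v ~ N(0, sigma2)
    rate = BASE_RATE * v * rate_multiplier
    return [to_ordinal(rng.expovariate(rate)) for _ in range(m)]


def simulate_dataset(rng, n_per_group, m, sigma2, gamma):
    """Return (normal, cancer) lists of pooled ordinal outcomes."""
    normal, cancer = [], []
    for _ in range(n_per_group):
        normal.extend(simulate_subject(rng, m, sigma2))
        # ASSUMPTION: cancer rate = 0.1 * v * exp(-gamma); gamma = 0 means no difference.
        cancer.extend(simulate_subject(rng, m, sigma2, math.exp(-gamma)))
    return normal, cancer


def mw_auc(disease, normal):
    """Mann-Whitney AUC with ties: 1 for a win, 1/2 for a tie, 0 otherwise."""
    total = 0.0
    for y_d in disease:
        for y_n in normal:
            if y_d > y_n:
                total += 1.0
            elif y_d == y_n:
                total += 0.5
    return total / (len(disease) * len(normal))


rng = random.Random(2016)
normal, cancer = simulate_dataset(rng, n_per_group=25, m=2, sigma2=1.0, gamma=0.8)
print("MW estimate of the AUC:", round(mw_auc(cancer, normal), 3))
```

Because `mw_auc` pools all observations, it ignores the shared random effect within a subject, which is exactly the feature the study identifies as the source of the size distortion of the MW-based test.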

Figure 1

Power comparison between the proposed grouped-survival model-based and MW-based tests (not considering correlation among outcomes of a single subject). The “size corrected MW” implies that the MW test is implemented with an empirically decided critical value. The empirical critical value is decided with the percentile of the (evaluated) MW test statistics for the case γ = 0. (A) $\sigma^2 = 1$ (low within-cluster correlation) and the number of repetitions m = 2. (B) $\sigma^2 = 1$ (low within-cluster correlation) and the number of repetitions m = 4. (C) $\sigma^2 = 3$ (high within-cluster correlation) and the number of repetitions m = 2. (D) $\sigma^2 = 3$ (high within-cluster correlation) and the number of repetitions m = 4.

Figure 1 shows that the size of the MW-based test (the power when γ = 0) is substantially biased and that the bias grows as either $\sigma^2$ or the number of repetitions m increases. Increases in both $\sigma^2$ and m imply an increase in the correlation among repeated observations within a subject (or cluster). On the other hand, the size of the proposed GS model-based test is approximately at the nominal level 0.05, regardless of $\sigma^2$ and m. The powers of the GS method are comparable to those of the size-corrected MW test in all cases considered.

Data Example

In this section, we apply our model to the detection of hepatic lesions. The liver is the most frequent site of metastases from various extrahepatic malignancies, and determining the presence of hepatic metastases is important both for providing the optimal plan for patients who are candidates for surgery and for assessing prognosis after initial treatment. The data analyzed in this article are the records of patients who underwent liver MRI with separate acquisition of double contrast enhancement between November 2005 and June 2006. The data were collected from the database of the radiology department at Asan Medical Center in Seoul, Republic of Korea. The data record the test results of 106 focal liver lesions from 36 patients ($\sum_{i=1}^{36} f_i = 106$), where $f_i$ denotes the number of focal hepatic lesions from the ith subject. The 106 focal liver lesions are composed of 51 metastases and 55 benign lesions. Note that the minimum and the maximum of {$f_1$, …, $f_{36}$} are 1 and 8, respectively. Each lesion is imaged with three different methods: MRI with MultiHance (Set A), MRI with Resovist (Set B), and the combination of the original MRI with Resovist and dynamic MRI (Set C). Two readers determine the possibility of malignancy of the detected lesions using a 5-point confidence rating scale (definitely not = 0, probably not = 1, possibly = 2, probably = 3, and definitely = 4). Each lesion then has six ratings, one for each combination of the two readers (two medical doctors) and the three imaging methods, and the ith subject has $6 \times f_i$ (= $n_i$) ratings in total. The goal of this experiment is to compare the performance of the three imaging methods and the two readers. More details on the data, including how the data were collected, are available from the study of Hong et al.14 A summary of the data is presented in Table 1.

Table 1

Summary of the data for hepatic metastases.

IMAGING SET	READER	DISEASE	Y = 0	Y = 1	Y = 2	Y = 3	Y = 4
A	1	0	46	4	0	5	0
A	1	1	13	1	3	8	26
A	2	0	43	5	2	2	3
A	2	1	6	0	0	3	42
B	1	0	43	3	2	1	6
B	1	1	8	1	1	4	37
B	2	0	44	5	2	2	2
B	2	1	7	2	3	1	38
C	1	0	49	0	1	2	3
C	1	1	2	0	0	8	41
C	2	0	47	3	1	1	3
C	2	1	5	0	1	5	40

Notes: Set A is the MRI with MultiHance, Set B is the MRI with Resovist, and Set C is the combination of the original MRI with Resovist and dynamic MRI. Reader 1 and Reader 2 are the IDs of the radiologists who read the images. Disease is 1 for metastases and 0 for benign lesions. Y is the diagnostic rating, and the entries are the numbers of lesions receiving each rating.

Let $Y_{ij}$ be the jth rating of the ith subject, an ordinal integer value between 0 and 4, for i = 1, 2, …, 36 and j = 1, 2, …, $n_i$; let $d_{ij}$ be the binary variable that equals 1 if the (i, j)th rating is truly from the disease (positive) group and 0 otherwise. For the (i, j)th rating, let $x_{ij} = (x_{ij,r}, x_{ij,2}, x_{ij,3})^\top$ be the three-dimensional covariate vector of indicator variables, where $x_{ij,r} = 1$ if the rating is done by the second reader, $x_{ij,2} = 1$ if the rating is on the MRI by Set B, and $x_{ij,3} = 1$ if it is by Set C. We apply model (1) with the complementary log−log link, so that we have
$$ \log[-\log\{1 - P_{ijk}(k \mid v_i; x_{ij}, d_{ij})\}] = \alpha_k + x_{ij}^\top \vartheta + d_{ij}\,(\gamma + x_{ij}^\top \beta) + v_i, \qquad (11) $$
where $P_{ijk}(k \mid v_i; x_{ij}, d_{ij}) = P(Y_{ij} \le k \mid v_i; x_{ij}, d_{ij})$, $\alpha_k$ is the intercept of the kth rating category, $\vartheta = (\vartheta_r, \vartheta_2, \vartheta_3)^\top$, and $\beta = (\beta_r, \beta_2, \beta_3)^\top$. Let $V_i$ be the random effect of the ith subject, assumed to be normally distributed with mean 0. Here, γ measures the difference in outcomes between benign and metastasis lesions; $\vartheta_r$, $\vartheta_2$, and $\vartheta_3$ measure the difference between the two readers, the difference in imaging methods between Set A and Set B, and the difference between Set A and Set C, respectively; $\beta_r$ is the interaction effect between the reader and the existence of disease; and $\beta_2$ and $\beta_3$ are the interaction effects between the imaging methods and the existence of disease. A nonzero element of β implies that the readers or the methods perform differently between the normal and disease groups. In particular, $\beta_2$ and $\beta_3$ measure the efficacy of the imaging methods.
Table 2 displays the parameter estimates calculated from the data, and Figure 2 plots the estimated ROC curves for the six combinations of readers and MRI methods. The results indicate that the MRI methods and the readers do not perform differently for normal liver lesions. However, for tumor lesions, MRI method Set C performs better than Set A (P-value = 0.0044). In addition, the P-value for jointly testing H0: $\beta_2 = \beta_3 = 0$ is 0.0160, and that for testing H0: $\beta_2 - \beta_3 = 0$ is 0.0490. This implies that MRI method Set C performs better than both Set A and Set B for tumor lesions.
For tumor lesions, there is no statistically significant difference between the readers.

Table 2

Parameter estimates.

PARAMETER	ESTIMATE	S.E.	P-VALUE
γ	−2.3688	0.2969	<10^–4
ϑ_r	−0.2545	0.1633	0.1280
ϑ₂	−0.04253	0.1878	0.8222
ϑ₃	0.2926	0.2034	0.1591
β_r	−0.2230	0.2803	0.4316
β₂	−0.3414	0.3247	0.3003
β₃	−1.0639	0.3498	0.0044

Notes: S.E. means the standard error, and the P-value is that for the two-sided test. The parameter γ measures the overall difference in outcomes between benign and metastasis lesions; ϑ_r is for the difference between the two readers; ϑ₂ and ϑ₃ are for the differences between Imaging Set A and Set B and between Set A and Set C, respectively; β_r is the interaction effect between the reader and the existence of disease; and β₂ and β₃ are the interaction effects between the imaging methods and the existence of disease.

The covariate-specific ROC curve $R(u \mid x)$ from (11) has the form
$$ R(u \mid x) = u^{\exp(\gamma + x^\top \beta)}, \quad 0 \le u \le 1, $$
and its AUC is
$$ A(x) = \frac{1}{1 + \exp(\gamma + x^\top \beta)}. \qquad (13) $$
Figure 2 plots the estimated ROC curves along with their empirical ROC curves. Here, the empirical ROC curves do not take into account the correlation among outcomes within a cluster. For both readers, the curves for Set C are higher than those for Set A and Set B, indicating that imaging lesions with the combination of the original MRI with Resovist and dynamic MRI is superior to using only a single MRI method (Fig. 2).
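
As a quick check of how the AUC expression is used, the sketch below plugs the point estimates of Table 2 into A(x) = 1/{1 + exp(γ + xᵀβ)}. It is an illustration, not the authors' code: the function names are hypothetical, and coding Reader 2 (rather than Reader 1) as the indicator value 1 is an assumption, made because it reproduces the model-based AUCs reported in Table 3.

```python
import math

# Point estimates copied from Table 2; only gamma and the interaction terms enter the AUC.
GAMMA = -2.3688
BETA_READER = -0.2230  # beta_r, applied when the rating comes from Reader 2 (assumed coding)
BETA_SET = {"A": 0.0, "B": -0.3414, "C": -1.0639}  # Set A is the reference level


def lehmann_roc(u, eta):
    """True-positive rate at false-positive rate u for the ROC curve u ** eta."""
    return u ** eta


def model_auc(reader, imaging_set):
    """AUC = 1 / (1 + exp(gamma + x'beta)) for reader in {1, 2} and set in {A, B, C}."""
    linear = GAMMA + BETA_SET[imaging_set]
    if reader == 2:
        linear += BETA_READER
    return 1.0 / (1.0 + math.exp(linear))


for reader in (1, 2):
    aucs = [round(model_auc(reader, s), 3) for s in "ABC"]
    print(f"Reader {reader}: Set A, B, C ->", aucs)
```

The smaller (more negative) the linear term γ + xᵀβ, the smaller the exponent of the ROC curve and the closer the curve lies to the top-left corner, which is why the negative estimate of β₃ translates into the largest AUC for Set C.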

Figure 2

Estimated ROC curves of three methods by two readers. The “Empirical” is the empirical ROC curve based on empirical (cumulative) distribution functions of (diagnostic outcomes of) normal and diseased populations. The Empirical disregards the correlations among repeated measurements of a subject and treats them as independent samples. The “Model” is the ROC curve from the model with the estimated parameters.

Table 3 summarizes the estimates of the AUCs for the combinations of a reader and an imaging method. The standard errors (SEs) of the model-based estimates of the AUC are obtained by the delta method. We also report the empirical estimates of Obuchowski,11 computed without taking into account the within-cluster correlation of outcomes. The formulas for the empirical estimates can also be found in Pepe (Chapter 6.3).16

Table 3

The estimates of the AUC and their SEs for the combinations of a reader and an imaging method.

FACTOR	SET A	SET B	SET C
Model
Reader 1	0.914 (0.0196)	0.938 (0.1226)	0.969 (0.0900)
Reader 2	0.930 (0.0671)	0.949 (0.0967)	0.975 (0.1096)
Empirical
Reader 1	0.837 (0.0393)	0.849 (0.0393)	0.945 (0.0393)
Reader 2	0.902 (0.0393)	0.892 (0.0393)	0.915 (0.0393)

Notes: The “Empirical” is estimated using the MW statistic, which disregards the correlation among measurements from a single subject. The SEs of the Empirical are evaluated under the independence assumption of the repeated measurements from a subject, which is rarely true. Thus, they would not be the right numbers. The Model is the estimated AUCs using the formula (13), and its SEs are evaluated using the delta method.
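
The empirical AUCs can be recomputed directly from the rating counts in Table 1. The sketch below is an illustration with hypothetical names: it applies the MW statistic with ties to the count vectors, pooling all lesions as if they were independent, and the counts are copied from Table 1.

```python
# Rating counts (Y = 0, 1, 2, 3, 4) copied from Table 1:
# key = (imaging set, reader), value = (benign counts, metastasis counts).
TABLE1 = {
    ("A", 1): ([46, 4, 0, 5, 0], [13, 1, 3, 8, 26]),
    ("A", 2): ([43, 5, 2, 2, 3], [6, 0, 0, 3, 42]),
    ("B", 1): ([43, 3, 2, 1, 6], [8, 1, 1, 4, 37]),
    ("B", 2): ([44, 5, 2, 2, 2], [7, 2, 3, 1, 38]),
    ("C", 1): ([49, 0, 1, 2, 3], [2, 0, 0, 8, 41]),
    ("C", 2): ([47, 3, 1, 1, 3], [5, 0, 1, 5, 40]),
}


def empirical_auc(normal_counts, disease_counts):
    """MW estimate of P(Y_disease > Y_normal) + 0.5 * P(tie) from rating counts."""
    n_normal = sum(normal_counts)
    n_disease = sum(disease_counts)
    below = 0      # number of normal ratings strictly below the current level
    total = 0.0
    for c_normal, c_disease in zip(normal_counts, disease_counts):
        # each disease rating at this level beats `below` normals and ties with c_normal
        total += c_disease * (below + 0.5 * c_normal)
        below += c_normal
    return total / (n_normal * n_disease)


for (imaging_set, reader), (normal, disease) in sorted(TABLE1.items()):
    print(f"Set {imaging_set}, Reader {reader}:",
          round(empirical_auc(normal, disease), 3))
```

Working from counts rather than individual ratings gives the same value as the pairwise definition of the MW statistic, because every pair of one metastasis rating and one benign rating is still counted once.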

The AUCs may not always be sensitive enough to detect differences for specific covariates, regardless of whether they are model based or empirical, since each AUC is a function of several components at once; in formula (13), for example, the AUC combines γ with every element of β. On the other hand, the proposed regression model can test the contribution of each covariate separately. To be specific, suppose that in our example we want to assess the performance difference between imaging methods Set A and Set C for Reader 1; the (model-based) AUC estimates are 0.914 and 0.969, respectively. The 95% confidence interval for the AUC of Set C is (0.793, 1), which overlaps the confidence interval for the AUC of Set A, (0.876, 0.952). Judged by the AUCs alone, there is therefore no significant difference between the two sets. However, the proposed regression model makes it possible to test the significance of a particular parameter. For example, the P-value of the test of H0: β3 = 0 is 0.0044, which indicates a significant interaction between sets (A and C) and disease groups (disease and non-disease) at α = 0.05; this implies that imaging methods A and C perform differently in detecting cancer.
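
The two confidence intervals quoted above can be reproduced from the estimates and SEs in Table 3. The sketch below assumes a normal-approximation (Wald) interval with multiplier 1.96, truncated to [0, 1]; the text does not state how the intervals were built, so this construction is an assumption that happens to match the reported numbers. The function names are hypothetical.

```python
def wald_ci(estimate, se, z=1.96):
    """Normal-approximation interval for an AUC, truncated to the range [0, 1]."""
    lower = max(0.0, estimate - z * se)
    upper = min(1.0, estimate + z * se)
    return lower, upper


def intervals_overlap(a, b):
    """True when two closed intervals (lower, upper) share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Model-based AUC estimates and SEs for Reader 1, copied from Table 3.
ci_set_a = wald_ci(0.914, 0.0196)
ci_set_c = wald_ci(0.969, 0.0900)

print("Set A:", tuple(round(v, 3) for v in ci_set_a))
print("Set C:", tuple(round(v, 3) for v in ci_set_c))
print("Overlap:", intervals_overlap(ci_set_a, ci_set_c))
```

The wide interval for Set C comes from its comparatively large SE, which is what makes the AUC-based comparison inconclusive while the test of a single coefficient is not.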

Conclusion

In this article, we propose a new ROC regression model for clustered ordinal outcomes. The new model views the ordinal outcomes as GS times and uses the grouped-time survival model to define the regression model of the ROC curve. We show that the proposed model is closely related to many existing models, including the Lehmann family and the location-scale family of ROC curves, and that it extends them to random-effects models. The proposed model has the additional advantage of being easily programmed in many standard statistical packages, which makes it easy to use and interpret. In summary, the model proposed in this article provides a flexible exploratory tool for identifying covariate effects on the ROC curve with clustered ordinal outcomes.

1. Random-effects regression analysis of correlated grouped-time survival data.

2. An interpretation for the ROC curve and inference using GLM procedures.

3. Distribution-free ROC analysis using binary regression techniques.

4. Adjusting the generalized ROC curve for covariates.

5. Lehmann family of ROC curves.

6. Random-effects models for diagnostic accuracy data.

7. Three approaches to regression analysis of receiver operating characteristic curves for continuous test results.

8. Nonparametric analysis of clustered ROC curve data.

9. A general regression methodology for ROC curve estimation.

10. Characterization of liver metastases: the efficacy of biphasic magnetic resonance imaging with ferucarbotran-enhancement. H S Hong, J H Byun, H J Won, K W Kim, S S Lee, M G Lee, S C Yun. Clin Radiol, 2010.