Understanding diagnostic tests - Part 3: Receiver operating characteristic curves.

Abstract

In the previous two articles in this series on biostatistics, we examined the properties of diagnostic tests and various measures of their performance in clinical practice. These performance measures vary according to the cutoff used to distinguish the diseased from the healthy. We conclude the series on diagnostic tests by looking at receiver operating characteristic curves, a technique for assessing the performance of a test across a range of cutoffs, and discuss how to determine an optimum cutoff.

Keywords: Biostatistics; receiver operating characteristic curve; sensitivity; specificity

Year: 2018 PMID: 30090714 PMCID: PMC6058507 DOI: 10.4103/picr.PICR_87_18

Source DB: PubMed Journal: Perspect Clin Res ISSN: 2229-3485

INTRODUCTION

In two previous articles in this series,[1,2] we discussed some of the properties of diagnostic tests. The sensitivity and specificity of a test inform us about the likelihood of a positive or a negative result, given that the disease of interest is present or absent, whereas positive and negative predictive values tell us about the probability of presence or absence of the disease, given that a test's result is positive or negative.[1] The latter values are heavily influenced by the prevalence of disease in the population being tested and are more relevant to clinicians.[1] The positive and negative likelihood ratios, another way of looking at diagnostic tests, express how much more (or less) likely a particular test result is in someone with the disease than in someone without the disease.[2] A test with a higher positive likelihood ratio and a lower negative likelihood ratio is better at discriminating between those with and without disease.

All the attributes of diagnostic tests discussed in the previous articles depend on the cutoff value used to define the presence or absence of disease. However, the cutoffs are not cast in stone, and different cutoffs are not infrequently used to define disease or health. Changing the cutoff can markedly affect the performance characteristics of the test. In this third and final article on diagnostic tests, we look at another way of assessing a diagnostic test, namely the receiver operating characteristic (ROC) curve, which looks at the performance of the test over a range of cutoffs.
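The measures recapped above can be illustrated with a short Python sketch. The function names and the input values (sensitivity and specificity of 0.90, three prevalence levels) are assumptions chosen for illustration and do not come from the articles in this series; the formulas are the standard definitions of predictive values and likelihood ratios.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test used in a population with the given prevalence."""
    # Expected fractions of the tested population in each cell of the 2 x 2 table.
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


def likelihood_ratios(sensitivity, specificity):
    """Return (positive likelihood ratio, negative likelihood ratio)."""
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    return lr_pos, lr_neg


# Hypothetical test: sensitivity and specificity are held fixed,
# only the prevalence of disease changes.
for prevalence in (0.01, 0.10, 0.50):
    ppv, npv = predictive_values(0.90, 0.90, prevalence)
    print(f"prevalence {prevalence:.2f}: PPV {ppv:.3f}, NPV {npv:.3f}")

lr_pos, lr_neg = likelihood_ratios(0.90, 0.90)
print(f"LR+ {lr_pos:.2f}, LR- {lr_neg:.2f}")
```

With sensitivity and specificity unchanged, the positive predictive value in this sketch rises as prevalence rises, whereas the likelihood ratios do not depend on prevalence at all.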

HOW ARE RECEIVER OPERATING CHARACTERISTIC CURVES PLOTTED?

An ROC curve is constructed by plotting sensitivity (the proportion of cases having a positive test, i.e., the proportion of cases correctly identified as having disease, or “true positives”/“all cases”) against “1 − specificity” (the proportion of controls having a positive test, i.e., the proportion of controls incorrectly classified as having disease, or “false-positives”/“all controls”), for each possible cutoff score. By convention, sensitivity (or the true-positive rate) is plotted along the “y” axis, whereas “1 − specificity” (or the false-positive rate) is plotted along the “x” axis. The ROC curve thus provides a graphical representation of the proportion of patients with the disease of interest correctly identified as positive against the proportion of healthy subjects incorrectly identified as positive for each cutoff score.

Let us, as an example, think of a test which can have values of 0–14, with higher values more likely to indicate disease and lower values indicating health. This test is administered to 40 persons with and 40 persons without the disease of interest, whose test results are shown in Figure 1. One could now use different cutoffs (e.g., 0.5, 1.5, 2.5, ..., 12.5, 13.5) to define the test result as positive or negative. The number of persons with or without disease who test positive or negative would vary according to the cutoff used [Table 1]. A lower cutoff would lead to more patients with disease being picked up correctly but a higher proportion of false-positives among healthy persons. On the other hand, a higher cutoff would miss some persons with disease but would lead to fewer false-positives. Using these numbers, one can easily calculate sensitivity and “1 − specificity” for each cutoff [Table 1]. If one plots these values, one obtains a curved line which is referred to as the ROC curve [Figure 2].

Figure 1

A hypothetical test with possible test result values of 0–14 is offered to forty persons known to have disease and forty healthy persons. The number of persons in each group with each possible test result is shown. In general, higher values are more likely in diseased persons than in healthy persons

Table 1

Number of persons who are correctly classified as having disease (true positives; among 40 diseased persons) or not having disease (true negatives; among 40 healthy persons) using different cutoffs

Figure 2

Receiver operating characteristic (ROC) curve for hypothetical data shown in Figure 1. From the data in Figure 1, sensitivity and false-positivity (=1 − specificity) rates were calculated for various possible cutoffs [Table 1]. A plot of these values yielded this ROC curve. The values in parentheses represent the cut-off value(s) that each point on the curve corresponds to. The dotted diagonal line represents a test that does not discriminate at all between those with and without disease (see text for details)
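The construction of ROC points described in this section can be sketched in Python as follows. The scores below are made up for illustration and are not the data of Figure 1 or Table 1; the function name `roc_points` and the rule that a result above the cutoff counts as positive are likewise assumptions of this sketch.

```python
def roc_points(diseased_scores, healthy_scores, cutoffs):
    """Return (cutoff, sensitivity, false_positive_rate) for each cutoff.

    A test result is treated as positive when the score exceeds the cutoff.
    """
    points = []
    for cutoff in cutoffs:
        true_pos = sum(1 for score in diseased_scores if score > cutoff)
        false_pos = sum(1 for score in healthy_scores if score > cutoff)
        sensitivity = true_pos / len(diseased_scores)
        false_positive_rate = false_pos / len(healthy_scores)  # 1 - specificity
        points.append((cutoff, sensitivity, false_positive_rate))
    return points


# Made-up scores on a 0-14 scale (NOT the article's data).
diseased = [3, 5, 6, 7, 7, 8, 9, 10, 12, 13]
healthy = [0, 1, 2, 2, 3, 4, 4, 5, 6, 8]
cutoffs = [c + 0.5 for c in range(14)]  # 0.5, 1.5, ..., 13.5

for cutoff, sens, fpr in roc_points(diseased, healthy, cutoffs):
    print(f"cutoff {cutoff:4.1f}: sensitivity {sens:.2f}, 1 - specificity {fpr:.2f}")
```

Plotting the second value of each tuple (y axis) against the third (x axis) would give the ROC curve for these made-up scores; as the cutoff rises, both sensitivity and the false-positive rate fall.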

INTERPRETING RECEIVER OPERATING CHARACTERISTIC CURVES

A test with good performance would be expected to correctly diagnose nearly all the cases, i.e., to have a high sensitivity. Further, it would be expected to correctly classify nearly all the controls, i.e., to have a very low false-positive rate (or a low “1 − specificity”). For such a test, the points on the ROC curve for cutoffs that provide good discrimination between persons with and without the disease would be expected to lie close to the top-left corner of the plot [Figure 3] (curve A). In fact, for a perfect test which accurately diagnoses all the cases and controls, sensitivity and specificity would both be 1.0 and “1 − specificity” would be zero. The ROC curve for such a test would rise vertically from the origin to the top-left corner of the box and then run horizontally across to the right. By comparison, a test with a larger number of false-positive or false-negative results would not reach as close to the top-left corner [Figure 3] (curve B).

It is customary to draw a diagonal line on the ROC plot extending from the lower-left corner (sensitivity = 0 and false-positivity rate = 0) to the upper-right corner (sensitivity = 1.0 and false-positivity rate = 1.0) of the box in which the ROC curve is drawn. For all points on such a line [Figure 3] (line C), the values of sensitivity and false-positivity rate are identical. This line represents a hypothetical test for which, using any cutoff, positive results are as frequent in cases as in controls, i.e., the test does not discriminate at all between persons with and without the disease. Such a test would have no clinical use.

Figure 3

Comparison of performance of tests using receiver operating characteristic (ROC) curves. A test with ROC curve which is located closer to the left upper corner (e.g., curve “A”) has a better discrimination ability than a test with a curve that is located farther from this corner (e.g., curve “B”). The former would also have a higher value of area under curve, which is a quantitative measure of a test's performance. The diagonal line (line “C”; with area under curve = 0.50) represents a test with no discriminating ability. An ideal test would be expected to have an area under ROC curve value of 1.0

AREA UNDER THE RECEIVER OPERATING CHARACTERISTIC CURVE

ROC curves also permit a numeric assessment of the overall performance of diagnostic tests. This is done by estimating the area under (i.e., to the right of and beneath) the curve, expressed as a proportion of the total area of the square in which the curve is drawn. A test with higher sensitivity and specificity would reach closer to the top-left corner and hence would have a larger area under the curve. This measure can also be used to compare the performance of two different tests for the diagnosis of a particular disease. Thus, a test with a larger area under the ROC curve is preferred over another test with a smaller area under the curve [e.g., in Figure 3, the test with ROC curve A would be preferred over that with ROC curve B]. A test with an area under the curve of 0.5 (e.g., line C in Figure 3) has no diagnostic value, as discussed above. For an ideal test, the area under the ROC curve would be expected to be 1.0.
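As a rough illustration of how such an area might be estimated, the Python sketch below joins the ROC points with straight lines and sums the resulting trapezoids. The trapezoidal rule is an assumption of this sketch (the text above does not specify an estimation method), and the function name `area_under_roc` and the example points are hypothetical.

```python
def area_under_roc(points):
    """Estimate the area under an ROC curve by the trapezoidal rule.

    `points` holds (false_positive_rate, sensitivity) pairs; the corners
    (0, 0) and (1, 1) are added so that the curve spans the whole box.
    """
    ordered = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    area = 0.0
    for (x0, y0), (x1, y1) in zip(ordered, ordered[1:]):
        area += (x1 - x0) * (y0 + y1) / 2  # area of one trapezoid
    return area


perfect_test = [(0.0, 1.0)]    # reaches the top-left corner
useless_test = [(0.5, 0.5)]    # lies on the diagonal
in_between = [(0.2, 0.8)]      # hypothetical single-cutoff test

print(area_under_roc(perfect_test))
print(area_under_roc(useless_test))
print(area_under_roc(in_between))
```

Under these assumptions, the perfect test gives an area of 1.0 and the test on the diagonal gives 0.5, in line with the values stated above; the hypothetical intermediate test falls between the two.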

CHOOSING THE CUTOFF VALUE FOR A TEST

The ROC curve is also helpful in deciding the optimum cutoff for a test. One possible cutoff could be the one which is least likely to lead to misclassification, i.e., is likely to have the least number of false-positives and false-negatives taken together. This is represented by the point on the ROC curve that has the least distance from the top-left corner of the box. For instance, in Figure 2, the point nearest to the top-left corner is the one for the cutoff of 5.5, suggesting that this may be the optimal cutoff to differentiate persons with disease from those without disease. This point, as compared to other possible cutoffs, has the minimum value for (1 − sensitivity)² + (1 − specificity)². A simpler and more commonly used alternative is the cutoff with the maximum sum of sensitivity and specificity. It is calculated as the cutoff with the maximum value of Youden's index, which is defined as (sensitivity + specificity − 1). Its values can vary between −1.0 and 1.0, and higher values indicate a test cutoff with higher discriminative ability.

However, these approaches apply only if misclassification in either direction is given equal weight. In clinical situations, the importance of a false-negative test is often different from that of a false-positive test. If one wishes the test to have a high sensitivity at the cost of some loss of specificity, one can choose as the cutoff a point where the curve becomes horizontal (e.g., in Figure 2, one could decide to use 1.5 or 2.5 as the cutoff). Alternatively, if one prefers a test with higher specificity with some loss of sensitivity, one could choose a point where the curve stops being vertical (e.g., in Figure 2, using 11.5 or 12.5 as the cutoff). For instance, for an assay for hepatitis B surface antigen (HBsAg) in serum, one could use a lower cutoff value when the test is done for screening of donated blood in blood banks than when it is used to test blood from patients attending a clinic.
In the former situation, we wish to detect blood units with the minutest amounts of HBsAg so that these can be excluded from the blood supply system (i.e., we wish to minimize the risk of transfusion-related infection even at the cost of discarding some blood units that contain no or so little virus that these cannot transmit infection). Therefore, we prefer a lower cutoff, with greater sensitivity at the cost of some loss of specificity. On the other hand, in the clinic situation, we wish to be certain that anyone with a positive test actually has the infection; any false-positive test in this situation would cause unwarranted psychological stress to the person and further costly testing and treatment. Thus, in this situation, we use a higher cutoff, preferring specificity over sensitivity.
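The cutoff-selection rules discussed in this section can be sketched in Python as below. The sensitivity and specificity values are hypothetical and are not taken from Table 1; the function names are illustrative, and the third function (highest specificity subject to a minimum sensitivity) is only one assumed way of expressing a preference for sensitivity, as in the screening scenario.

```python
# Each entry is (cutoff, sensitivity, specificity); values are hypothetical.
points = [
    (2.5, 0.98, 0.40),
    (4.5, 0.90, 0.70),
    (5.5, 0.85, 0.85),
    (7.5, 0.60, 0.95),
    (11.5, 0.20, 1.00),
]


def closest_to_corner(points):
    """Cutoff minimising (1 - sensitivity)^2 + (1 - specificity)^2."""
    return min(points, key=lambda p: (1 - p[1]) ** 2 + (1 - p[2]) ** 2)


def max_youden(points):
    """Cutoff maximising Youden's index (sensitivity + specificity - 1)."""
    return max(points, key=lambda p: p[1] + p[2] - 1)


def most_specific_with_sensitivity(points, min_sensitivity):
    """Most specific cutoff among those meeting a minimum sensitivity."""
    eligible = [p for p in points if p[1] >= min_sensitivity]
    if not eligible:
        raise ValueError("no cutoff reaches the requested sensitivity")
    return max(eligible, key=lambda p: p[2])


print("closest to top-left corner:", closest_to_corner(points)[0])
print("maximum Youden's index:", max_youden(points)[0])
print("screening (sensitivity >= 0.95):",
      most_specific_with_sensitivity(points, 0.95)[0])
```

For these hypothetical values the first two rules select the same cutoff, although in general they need not agree, and a requirement of high sensitivity moves the chosen cutoff lower.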

SUGGESTED READING

Readers may wish to read a study by Oh and Bae, who assessed how the use of different cutoff levels of a serum antigen affected the test's sensitivity and specificity for detecting recurrent disease in women undergoing posttreatment surveillance after treatment for cervical cancer.[3] Further, they used these data to create an ROC curve, calculated the area under this curve, and determined the optimal cutoff using Youden's index. It may be pertinent to point out here that a lower cutoff may be preferred when this blood test is used for surveillance, as in this study; in this situation, one would prefer a higher sensitivity (fewer false-negatives) even at the cost of some loss of specificity (more false-positives). By comparison, for the use of this blood test as a confirmatory test, a higher cutoff with higher specificity (fewer false-positives) may be preferred, even though that would be associated with a loss of sensitivity (i.e., a larger number of false-negatives).

Financial support and sponsorship

Nil.

Conflicts of interest

There are no conflicts of interest.

REFERENCES

1. Common pitfalls in statistical analysis: Understanding the properties of diagnostic tests - Part 1.

Authors: Priya Ranganathan; Rakesh Aggarwal
Journal: Perspect Clin Res Date: 2018 Jan-Mar

2. Understanding the properties of diagnostic tests - Part 2: Likelihood ratios.

Authors: Priya Ranganathan; Rakesh Aggarwal
Journal: Perspect Clin Res Date: 2018 Apr-Jun

3. Optimal cutoff level of serum squamous cell carcinoma antigen to detect recurrent cervical squamous cell carcinoma during post-treatment surveillance.

Authors: Jinju Oh; Jin Young Bae
Journal: Obstet Gynecol Sci Date: 2018-04-23
