
A quantitative method for measuring the relationship between an objective endpoint and patient reported outcome measures.

Chul Ahn¹, Xin Fang¹, Phyllis Silverman¹, Zhiwei Zhang².

Abstract

Patient reported outcome measures (PROMs) have become increasingly important for assessing the effectiveness of a drug or medical device. For a PROM to be claimed in labeling, the PROM has to be valid, reliable and able to detect a change if the targeted disease status changes. One approach to assessing the quality of a PROM is to investigate the association between the PROM and an objective clinical endpoint measuring the status of a disease/condition. However, methods for assessing the association between continuous and discrete variables are limited, especially for correlated measurements. In this paper, we propose a method to assess such association for any type of sample, with or without correlation. The method involves estimating the probability that a subject's reported outcomes reveal the status of the subject's disease/condition (hereafter called truth). This probability is a conditional probability of revealing truth given the location of the subject's objective outcome relative to a subject-specific latent threshold in the objective endpoint. A consistent estimator of the probability is derived, and its operating characteristics are illustrated using simulation. As an illustration, the method is applied to hypothetical clinical trial data generated for an ophthalmic device.



Year: 2018 | PMID: 30359417 | PMCID: PMC6201893 | DOI: 10.1371/journal.pone.0205845

Source DB: PubMed | Journal: PLoS One | ISSN: 1932-6203 | Impact factor: 3.240

1. Introduction

Patient reported outcome measures (PROMs) have become increasingly important in measuring the effectiveness of a drug or medical device. Between 1997 and 2002, about 30% of new drug labels included patient reported outcomes (PROs) [1]. Later, between 2006 and 2010, about 24% of new molecular entities and biologic license applications were granted patient reported outcome (PRO) claims [2]. The authors of this paper also noticed that PROM claims for approved medical devices had been increasing steadily since 2012. Meanwhile, many efforts have been made to advance the use of PROMs in drug and medical device development and in regulatory decision making. Recent major challenges were reported from the perspective of the Food and Drug Administration [3]. The National Institutes of Health (NIH) also funded the establishment of a PROM Information System (PROMIS) [4, 5, 6]. Some recent literature focuses on the interpretation of PRO analysis results [7, 8]. For a PROM to be claimed in the labeling of a drug or medical device, the PROM has to be valid, reliable and able to detect a change if the status of the targeted disease or condition changes [9]. The most frequently used statistics in PROM validation, such as the Pearson and intraclass correlation coefficients (ICC) [10], assess the association among PROM items or between a PROM and other established measurement(s). These correlation coefficients have been used to examine various validities (e.g. construct, convergent/divergent, criterion) of PROMs [10-30] and to investigate a PROM's ability to detect a change [31]. Some authors also used them to explore the relationship of a PROM with other measurements [32-35]. However, these correlation coefficients (1) may not be appropriate for correlated samples such as repeated measures, (2) may not be reliable for endpoints with different scales (e.g. categorical vs. continuous), and (3) lack an intuitive clinical interpretation, because neither the coefficients nor their changes directly carry clinical meaning. This lack of clinical meaning is most likely why it is difficult to draw a line for an acceptable association based on these popular coefficients. The challenge here is to develop a meaningful and reliable methodology to measure the relationship between an objective continuous endpoint (X) and the dichotomized endpoint (G) of an ordinal PROM and, if the association is strong enough, to use the PROM alone to make inference about the effectiveness of the therapy or to use it to support the primary inference in a clinical trial. This paper provides such a quantitative statistic measuring the conditional association (denoted Q hereafter) between paired endpoints (X, G), and a method to translate the ordinal PROM scales to the continuous objective measurement. The association is conditional because the outcome of G depends on the outcome of X: the PROM is always administered after the treatment takes effect. The dichotomized endpoint (G) may represent mixed Bernoulli random variables with the same parameter but opposite meanings, as explained in the Methods section. Section 2 describes the definition of the conditional association parameter Q, the data structure used in this paper, and how to estimate Q, including the derivation of the estimator of Q. Section 3 shows simulation results for the estimator (Q̂) of Q and an application of the methodology to hypothetical clinical trial data. The discussion and conclusion are presented in Section 4.

2. Methods

This section shows how the parameter Q works in assessing the quality of a PROM using repeated measures from a single subject. It starts with minimal notation and the theoretical construct of Q, followed by the characteristics and estimation procedure of Q, and the derivation of the consistent estimator of Q. The derivation is deliberately placed after the estimation procedure so that it is more accessible to readers. The section ends with how to make inference about the PROM across multiple subjects. In general, a single italic lower-case letter represents a nonrandom variable and a single italic upper-case letter represents a random variable, unless stated otherwise (as for the parameter Q). The non-italic PROMz (not a random variable) represents scale z of the unidimensional PROM. Q_iz is the probability that PROMz reveals the disease status of Subject i according to his/her latent minimum objective threshold a_iz, given that the subject's objective outcome at visit k satisfies x_ik ≥ a_iz or x_ik < a_iz. Note that Q_iz is not a random variable; it is a parameter to be estimated. The italic PROM_i is the random variable for the subjective PRO measurement, the italic PRO_i is its realization, and PRO_iz denotes a patient reported outcome equal to scale z of the PROM. Other notations are defined in Appendix A.

2.1 Theoretical construct of parameter Q

As illustrated in Fig 1 below, the theoretical construct of Q is that Subject i has a latent minimum threshold a_iz of disease status, in terms of the objective disease measurement, which triggers PRO_iz (z = 1, …, 7 in Fig 1) in response to the PROM question, according to the association parameter Q_iz, when the subject's objective outcome satisfies x_ik ≥ a_iz. Although the PROM question and scales do not change with subject, the sub-index i indicates that PROM_i is the PROM random variable for Subject i; it is used hereafter for clarity and without loss of generality. Subject i will give PROM_i ≥ z with probability Q_iz when x_ik ≥ a_iz, and will give PROM_i < z with probability Q_iz when x_ik < a_iz. Note that the PRO always depends on where X_ik is realized relative to the minimum latent threshold a_iz.

Fig 1

Conditional associations between a PROM and a continuous objective efficacy endpoint X for subject i.

Fig 1 illustrates the relationship between the continuous objective endpoint X (such as the increase in hemoglobin count (HC)) and a unidimensional 7-scale PROM (such as fatigue improvement). The upper divided rectangular block illustrates the 7-scale unidimensional PROM, and the lower line X illustrates the continuous objective measurement, with the letter O indicating the baseline location of a subject. For Subject i, each scale of the PROM (such as 5 = improved) has its own minimum latent objective threshold (such as a_i5), indicated by an arrow connecting the two measurements. Upon the PROM question, Subject i responds at or above scale z with probability Q_iz if x_ik ≥ a_iz, which determines the conditional association of the PROM with the continuous objective endpoint X for Subject i at PROMz. Note that the event of "PROMz" revealing the disease status of Subject i includes two true events: (1) PROM_i ≥ z if x_ik ≥ a_iz (a true positive), and (2) PROM_i < z if x_ik < a_iz (a true negative).

2.2 Characteristics of parameter Q

By definition, the parameter Q_iz varies with PROMz and with subject. Therefore, Q does not imply a linear relationship between the PROM and the objective endpoint X for any subject. For example, Pr(PROM_i < 5 | X_ik < a_i5) may differ from Pr(PROM_i < 6 | X_ik < a_i6); and Pr(PROM_i < 5 | X_ik < a_i5) for Subject i may differ from Pr(PROM_h < 5 | X_hk < a_h5) for Subject h. The clinical meaning of Q follows directly from its definition: the rate of revealing the truth, conditional on disease status (the actual disease status of Subject i relative to his/her minimum latent objective threshold for PROMz). A 50% rate of revealing truth is equivalent to the subject flipping a fair coin to determine his/her PRO upon the PROM question; thus, a rate of 50% indicates that PROMz is not able to reveal the subject's disease status. In general, the higher the rate of revealing truth, the better the quality of PROMz, because a higher rate means a higher probability that PROMz reveals a subject's disease status upon the PROM question. The use of Q to reveal the actual status of a subject's disease has not been discussed in the literature. Rasch proposed a probability model for a true positive response [36]. However, because negative agreement was not considered, the Rasch positive probability does not measure the probability of revealing truth from a PROM. Our approach is related to latent variable models for similar problems [37, 38] in the sense that a_iz can be regarded as a latent variable. On the other hand, we do not assume a particular distribution for a_iz, which distinguishes our approach from most latent variable models. It is also noteworthy that Q measures an indirect agreement between a continuous endpoint and a dichotomized version of an ordinal endpoint.
Most traditional methods for measuring agreement, as described in [39], were developed for two measures of the same type: both categorical or both continuous. For endpoints of different types, ranks within each endpoint replace the original values so that the two endpoints are of the same type (as in the Spearman correlation coefficient). In addition, the estimation of Q (1) can be applied to correlated data, and (2) takes into consideration the uncertainty of the "gold standard" by building a series of 2-by-2 tables and selecting one for the estimate (see the toy example below). Therefore, Q can also be viewed as a new agreement statistic between a continuous endpoint and a binary endpoint, with or without correlation among samples.

2.3 Data and corresponding random variables

The data considered in this paper consist of pairs of observations (x_ik, g_ik) for Subject i at clinical visit k, where k = 1, …, t. Here x_ik is a continuous outcome representing disease status; it could be the value at visit k or the change from baseline to visit k, such as the change in hemoglobin count from baseline. The outcome g_ik is the dichotomized version of the PRO collected at visit k, such as g_ik = 1 if PROM_i ≥ 5 and g_ik = 0 otherwise. The change from baseline in the PROM is not considered here, because (1) each latent threshold a_iz corresponds to PROMz itself rather than to its change, and (2) a change in PROs from baseline does not carry the same clinical meaning at every baseline level. For example, in the 7-point scale PROM shown in Fig 1, a one-unit change from "much worse" to "worse" may not be meaningful to a subject, while a one-unit change from "neither" to "improved" carries clinical meaning to the subject. The corresponding random variables are denoted (X_ik, G^+_ik or G^-_ik). G^+_ik is a Bernoulli random variable, B(1, Q_iz), equal to 1 with probability Q_iz when x_ik ≥ a_iz, and G^-_ik is a Bernoulli random variable with parameter Q_iz that equals 0 with probability Q_iz when x_ik < a_iz. In other words, upon the PROM question, Subject i will give g_ik = 1 (positive) with probability Q_iz when x_ik ≥ a_iz, and will give g_ik = 0 (negative) with probability Q_iz when x_ik < a_iz, as illustrated in Fig 2 below.
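The response model just described can be sketched as a small generator. This is a minimal illustration, not the authors' code: the function names `draw_g` and `reveals_truth` are hypothetical, and the sketch assumes that one uniform draw decides each visit's dichotomized response.

```python
import random


def draw_g(x: float, a: float, q: float, rng: random.Random) -> int:
    """Draw the dichotomized PRO g_ik for one visit.

    x >= a: g comes from G+ ~ Bernoulli(q), so g = 1 with probability q.
    x <  a: g comes from G-, which equals 0 with probability q
            (equivalently, g = 1 with probability 1 - q).
    """
    p_one = q if x >= a else 1.0 - q
    return 1 if rng.random() < p_one else 0


def reveals_truth(x: float, a: float, g: int) -> bool:
    """True positive (x >= a and g = 1) or true negative (x < a and g = 0)."""
    return (x >= a) == (g == 1)
```

Whatever the realized x values are, the long-run fraction of responses that reveal the truth is q. This is the sense in which Q_iz keeps the same magnitude on both sides of a_iz.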

Fig 2

g_ik comes from one of two Bernoulli random variables with the same parameter but opposite meanings, depending on where X_ik is realized: x_ik < a_iz or x_ik ≥ a_iz.

2.4 Estimation of Q

This subsection shows how to estimate Q_iz using a toy example; the derivation of the estimator is given in the next subsection. To estimate Q_iz, it is necessary first to search for a_iz. Because a_iz is the minimum latent threshold in the objective measurement for PROMz, the search can be done over a pre-selected set of values {a_j: j = 1, …, m} between the possible minimum and maximum objective measurements, based on current medical knowledge for the entire target population (such as the normal range of human hemoglobin count). The pre-selected values a_j are not random; they are fixed and ideally pre-determined before X is realized. For example, the range of human blood hemoglobin concentration can be taken as 5 g/dL to 20 g/dL so that a_iz is believed to lie within the range for any subject; if the step between a_1 and a_m is 1 g/dL, then the number of search points is m = 16. The step size is determined by how precisely a_iz needs to be located. Again, this search set is not random, because it does not change with study or subject and may not change for decades, as with the normal range of human blood pressure. Table 1 shows a toy example of how to estimate Q_iz. Note that the number of search points m need not equal the number t of clinical visits, although we make them equal for illustration. At each a_j, the outcomes x_ik (k = 1, …, t) are compared to a_j one at a time. Then the numbers of potential true positive (TP) and potential true negative (TN) responses can be summarized for each a_j in a 2-by-2 table. For example, in the 1st data row of Table 1, all 9 values satisfy x_ik ≥ 5.0 (positive) but only 6 g_ik equal one (PRO positive), so TP = 6 (see the next paragraph for details). The total number of such 2-by-2 tables is m, the number of distinct a_j.

The derivation in the next subsection shows that the maximum of R_ij = (TP+TN)/t is a consistent estimator of Q_iz. Table 1 shows how to use the pre-determined set of a_j (j = 1, …, m) to calculate R_ij at each a_j based on two sets of 9 pairs of observations (x_i1, g_i1), …, (x_i9, g_i9) from Subject i. The only difference between the two sets is the value of the 2nd binary outcome g_i2 (0 vs. 1). If the PRO is positive, g_ik = 1; otherwise g_ik = 0. The pre-determined set of a_j (j = 1, …, 9) is listed in the 2nd column of Table 1. At each a_j, one compares the 9 objective outcomes (x_i1, …, x_i9) to a_j one at a time and obtains the numbers of potential TP, FP, TN and FN for that a_j. Thus, each data row of Table 1 displays the four statistics TP, TN, FP and FN corresponding to a_j. The estimate of Q_iz for Subject i at PROMz is the maximum of R_ij over j. In this paper, if there are multiple tied maxima of R_ij, the median of the corresponding a_j is used as the estimate of a_iz, because each a_j attaining the maximum could serve as an estimate of a_iz.
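The construction of the fixed search set and of the 2-by-2 table at a single search point can be sketched as follows. The helper names `search_set` and `two_by_two` are hypothetical, and the sketch assumes equally spaced search points, as in the hemoglobin example.

```python
def search_set(lo: float, hi: float, step: float) -> list[float]:
    """Fixed, pre-specified search points a_1 < ... < a_m from lo to hi inclusive."""
    m = int(round((hi - lo) / step)) + 1
    return [lo + j * step for j in range(m)]


def two_by_two(xs, gs, a_j):
    """Return (TP, TN, FP, FN); x_ik counts as positive when x_ik >= a_j."""
    tp = tn = fp = fn = 0
    for x, g in zip(xs, gs):
        if x >= a_j:
            if g == 1:
                tp += 1
            else:
                fp += 1
        elif g == 0:
            tn += 1
        else:
            fn += 1
    return tp, tn, fp, fn
```

With lo = 5, hi = 20 and step = 1 this gives the m = 16 search points mentioned above. Applied to the first sample of Table 1 at a_j = 5.0, `two_by_two` returns TP = 6, TN = 0, FP = 3, FN = 0, matching the first data row.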

Table 1

Estimate of Q_iz based on 9 pairs of repeated outcomes (x_ik, g_ik) from Subject i.

Samples	a_j	TP	TN	FP	FN	TP+TN	R_ij
(5, 0), (7, 0), (9, 0), (11, 1), (12, 1), (13, 1), (14, 1), (15, 1), (16, 1)	5.0	6	0	3	0	6	0.67
	6.0	6	1	2	0	7	0.78
	8.0	6	2	1	0	8	0.89
	10.0	6	3	0	0	9	1.00
	11.0	6	3	0	0	9	1.00
	12.0	5	3	0	1	8	0.89
	13.5	3	3	0	3	6	0.67
	15.0	2	3	0	4	5	0.56
	16.0	1	3	0	5	4	0.44
(5, 0), (7, 1), (9, 0), (11, 1), (12, 1), (13, 1), (14, 1), (15, 1), (16, 1)	5.0	7	0	2	0	7	0.78
	6.0	7	1	1	0	8	0.89
	8.0	6	1	1	1	7	0.78
	10.0	6	2	0	1	8	0.89
	11.0	6	2	0	1	8	0.89
	12.0	5	2	0	2	7	0.78
	13.5	3	2	0	4	5	0.56
	15.0	2	2	0	5	4	0.44
	16.0	1	2	0	6	3	0.33

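Note: First sample: Q̂_iz = max R_ij = 1.00 (attained at a_j = 10.0 and 11.0), and the corresponding estimate of a_iz = 10.5.

Second sample: Q̂_iz = max R_ij = 0.89 (attained at a_j = 6.0, 10.0 and 11.0), and the corresponding estimate of a_iz = 10.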
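The whole toy-example procedure, taking the maximum of R_ij and then the median of the a_j that tie at that maximum, fits in a few lines. This is a sketch of the rules stated in Section 2.4, not the authors' implementation; `estimate_q` is a hypothetical name.

```python
from statistics import median


def estimate_q(xs, gs, grid):
    """Return (Q_hat, a_hat) for one subject at one dichotomization z.

    Q_hat = max_j R_ij with R_ij = (TP + TN) / t; a_hat is the median of the
    a_j attaining that maximum (the tie rule of Section 2.4).
    """
    t = len(xs)
    counts = [
        sum((x >= a_j) == (g == 1) for x, g in zip(xs, gs))  # TP + TN at a_j
        for a_j in grid
    ]
    best = max(counts)
    tied = [a_j for a_j, c in zip(grid, counts) if c == best]
    return best / t, median(tied)
```

Applied to the two samples of Table 1 with the grid (5.0, 6.0, 8.0, 10.0, 11.0, 12.0, 13.5, 15.0, 16.0), it gives maxima of 9/9 and 8/9 and threshold estimates of 10.5 and 10. Counts are compared as integers, so ties are detected exactly.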

2.5 Derivation of the estimator of Q

As illustrated in Fig 1 above, Q_iz keeps the same magnitude whether x_ik ≥ a_iz or x_ik < a_iz, although its meaning changes from a conditional true positive rate (when x_ik ≥ a_iz) to a conditional true negative rate (when x_ik < a_iz). This is a reasonable setup because the event PROM_i ≥ z is a composite event including PRO_i = z, PRO_i = z + 1, etc. For example, the event PROM_i ≥ 5 includes PRO_i = 5, 6, or 7. When x_ik is far above a_iz, Subject i may simply give a higher PRO (say 7), and this still counts as one event of PROM_i ≥ 5. This illustrates that Q_iz can be independent of the distance between x_ik and a_iz. We search for the a_iz such that Pr(PROM_i ≥ z | X_ik ≥ a_iz) = Pr(PROM_i < z | X_ik < a_iz) = Q_iz. Also, the derivation of the estimator of Q_iz does not assume independence among X_i1, …, X_it. The cumulative distribution function of X_i1 is denoted F_i1. Because x_i1 is obtained at the 1st clinical visit before the realization of X_i2, …, X_it, the cumulative distribution function of X_ik (denoted F_ik, k > 1) is the marginal cumulative distribution function, which can be obtained by integrating out the other X_ih (h ≠ k) from the joint distribution F_i for Subject i. Using the general form of F_ik in the derivation accommodates correlated samples, because the joint distribution F_i applies to random variables with or without correlation. Therefore, the X_ik (k = 1, …, t) are not assumed to be independent of each other, and each X_ik may have a different marginal distribution. The derivation starts with the probabilities of obtaining a true positive (TP_k) and a true negative (TN_k) at visit k for a given search point a_j, presented in Expressions (1)–(4), where F_ik(a) = Pr(X_ik ≤ a), which equals Pr(X_ik < a) for continuous X_ik.

When a_j < a_iz:

$$\Pr(\mathrm{TP}_k) = [F_{ik}(a_{iz}) - F_{ik}(a_j)]\,(1 - Q_{iz}) + [1 - F_{ik}(a_{iz})]\,Q_{iz} \tag{1}$$

By the same argument,

$$\Pr(\mathrm{TN}_k) = F_{ik}(a_j)\,Q_{iz} \tag{2}$$

When a_j ≥ a_iz:

$$\Pr(\mathrm{TP}_k) = [1 - F_{ik}(a_j)]\,Q_{iz} \tag{3}$$

$$\Pr(\mathrm{TN}_k) = F_{ik}(a_{iz})\,Q_{iz} + [F_{ik}(a_j) - F_{ik}(a_{iz})]\,(1 - Q_{iz}) \tag{4}$$

Consequently, summing over visits k = 1, …, t, the expectation of TP+TN is given by Expressions (5) and (6), where E is the expectation operator.

When a_j ≤ a_iz:

$$E(\mathrm{TP}+\mathrm{TN}) = tQ_{iz} + (1 - 2Q_{iz}) \sum_{k=1}^{t} [F_{ik}(a_{iz}) - F_{ik}(a_j)] \tag{5}$$

When a_j > a_iz:

$$E(\mathrm{TP}+\mathrm{TN}) = tQ_{iz} + (1 - 2Q_{iz}) \sum_{k=1}^{t} [F_{ik}(a_j) - F_{ik}(a_{iz})] \tag{6}$$

If a_j is equal to a_iz, both Expressions (5) and (6) reduce to tQ_iz.
Therefore, for Q_iz ≠ 0.5, R_ij = (TP+TN)/t is an unbiased estimator of Q_iz only if a_j = a_iz, and TP+TN follows a binomial distribution when a_j = a_iz, because each visit then yields a TP or a TN with the same probability Q_iz and the expectation equals that of a binomial random variable (tQ_iz). We further note that E(TP+TN) attains its maximum at a_j = a_iz when Q_iz > 0.5 (i.e. 1 - 2Q_iz < 0), based on the sign of the derivative of E(TP+TN) with respect to a_j. When Q_iz > 0.5, the derivative of E(TP+TN) is positive to the left of a_iz (see Expression 5) and negative to the right of a_iz (see Expression 6). Therefore, E(TP+TN) not only reaches its maximum at a_iz, but this maximum equals tQ_iz. This is why Q_iz is estimated by the maximum of R_ij. Similarly, E(TP+TN) attains its minimum at a_iz when Q_iz < 0.5. In practice, it is reasonable to assume that a PROM has a non-negative association with the objective endpoint, because the intended direction of a PROM is usually obvious. If a negative association is expected, one can transform the objective outcome to obtain a non-negative association. As an example of a negative association, if the PROM is a price satisfaction survey and the continuous objective endpoint is the cost of medical expenses, one can multiply the cost by -1 so that a higher negative cost (smaller cost) lies in the positive direction. Therefore, Q_iz can be assumed to be ≥ 0.5. If Q_iz = 0.5 (pure chance), PROMz may not be able to reveal the truth; consequently, there is no conditional association between the PROM and the objective measurement X at PROMz. This is because Q_iz is defined as the probability of revealing truth at PROMz, and Q_iz = 0.5 is equivalent to Subject i flipping a fair coin to obtain the PRO. As discussed above, based on Expressions (5) and (6), R_ij at a_j = a_iz is an unbiased estimator of Q_iz if a_iz is in the search set {a_j: j = 1, …, m}. In practice, many tied maxima of R_ij may occur, especially when t is small and m is large; in this case, the median of the a_j attaining the tied maxima is taken as the estimate of a_iz.
Because of this, Q̂_iz = max_j R_ij becomes a consistent estimator of Q_iz. The variance of Q̂_iz is a nuisance, because the validation of a PROM is usually based on multiple subjects rather than on Subject i alone. Nonetheless, the variance of Q̂_iz for Subject i can be estimated by Q̂_iz(1 - Q̂_iz)/t, because TP+TN follows a binomial distribution with parameters t and Q_iz when a_j = a_iz. Further, because t is usually small, the exact binomial confidence interval for Q_iz is used in the simulation study. It should be pointed out that if the two probabilities (say Q^-_iz for negative truth and Q^+_iz for positive truth) are not equal, many more assumptions are needed to estimate Q^-_iz and Q^+_iz. Using our method, when Q^-_iz and Q^+_iz are both greater than 0.5, at a_j = a_iz we have E(TP+TN) = Q^+_iz Σ_{k=1}^{t} [1 - F_ik(a_iz) + r F_ik(a_iz)], where r = Q^-_iz/Q^+_iz. We can estimate a_iz using the a_j at which the maximum of TP+TN is reached, but r and the F_ik (k = 1, …, t) remain unknown. Even if r is assumed known, Q^+_iz still cannot be estimated, because the F_ik are unknown. Only if we further assume the distribution function of X_ik at each clinical visit k can we obtain consistent estimates of Q^-_iz and Q^+_iz. We feel that these further assumptions about r and F_ik (k = 1, …, t) are not practical, especially in medical device clinical trials. Therefore, in this paper we only search for the threshold at which the two probabilities are equal.
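Expressions (5) and (6) can be checked numerically. The sketch below assumes, purely for illustration, normal marginal distributions F_ik (via the standard library's `statistics.NormalDist`); because only the marginal CDFs enter E(TP+TN), correlation between visits plays no role in this calculation. It also includes the binomial variance estimate Q̂_iz(1 - Q̂_iz)/t. The function names are hypothetical.

```python
from statistics import NormalDist


def expected_tp_tn(a_j, a_iz, q, marginals):
    """E(TP+TN) at search point a_j, following Expressions (5) and (6).

    `marginals` holds one distribution per visit with a .cdf method; the
    normal choice is an illustrative assumption, not part of the method.
    """
    t = len(marginals)
    # |F_ik(a_iz) - F_ik(a_j)| covers both cases: a_j <= a_iz (5), a_j > a_iz (6).
    gap = sum(abs(F.cdf(a_iz) - F.cdf(a_j)) for F in marginals)
    return t * q + (1 - 2 * q) * gap


def binomial_variance(q_hat, t):
    """Variance estimate of Q-hat, treating TP+TN as Binomial(t, Q) at a_j = a_iz."""
    return q_hat * (1 - q_hat) / t
```

For Q_iz = 0.8 the expectation over the grid from -2 to 5 peaks at the grid point closest to a_iz and equals tQ_iz there; for Q_iz = 0.5 it is flat at t/2, which is why every R_ij is then unbiased and the maximum carries no information about a_iz.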

2.6 Inference of Q in multiple subjects

So far, Q_iz has been estimated from t repeated pairs of measurements from Subject i at PROMz. To assess the population Q for the PROM and the objective measurement X at PROMz in a target patient population, one can use the mean (Q̄_z) of the Q̂_iz (i = 1, …, n) with its 95% CI. For example, the lower bound of the 95% confidence interval of Q must exceed a desired probability of revealing truth for one to believe that PROMz is able to reveal disease status for the majority of subjects in the patient population. The ability of the PROM to detect a change in the objective endpoint X could be confirmed by a statistically significant change from a_iz to a_iz′ obtained by different dichotomizations z and z′ of the PRO. Note that the magnitude of the threshold changes when the PRO is dichotomized differently. For example, the PRO can be dichotomized at scale 7 ("at least very much improved" vs. otherwise) or at scale 6 ("at least much improved" vs. otherwise). This change of dichotomization represents a one-unit change of the PRO from scale 6 to 7, and thus the change from a_i6 to a_i7 measures the ability of the PROM to detect a change in the objective endpoint X. a_i7 is expected to be larger than a_i6, because "at least very much improved" is more difficult to reach and thus its minimum threshold is expected to be higher than that for "at least much improved". One can estimate the change from a_i6 to a_i7 for each of n subjects and test whether the mean change is > 0.
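A minimal sketch of these population-level summaries follows. It assumes n is large enough for a normal approximation; the case study in Section 3.2 instead uses a nonparametric signed rank test because the threshold changes there are skewed. The function names and any input values are hypothetical, not taken from the paper.

```python
from statistics import NormalDist, mean, stdev


def mean_with_ci(estimates, level=0.95):
    """Mean of per-subject Q-hat values with a normal-approximation CI."""
    n = len(estimates)
    m = mean(estimates)
    half = NormalDist().inv_cdf(0.5 + level / 2) * stdev(estimates) / n**0.5
    return m, m - half, m + half


def one_sided_p_positive_mean(changes):
    """Approximate p-value for H1: mean threshold change (e.g. a_i7 - a_i6) > 0."""
    n = len(changes)
    z = mean(changes) / (stdev(changes) / n**0.5)
    return 1 - NormalDist().cdf(z)
```

In use, one would compare the lower limit returned by `mean_with_ci` with the desired probability of revealing truth (at minimum 0.5), and pass the per-subject threshold changes to `one_sided_p_positive_mean`.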

3 Simulation and illustration

3.1 Simulation

Simulated data from Subject i are used to illustrate the characteristics of Q̂_iz, in particular that Q̂_iz is a consistent estimator of Q_iz. The simulation is not meant to mirror a real clinical trial; the use of Q̂_iz in a clinical trial is presented afterwards using hypothetical clinical data. Because Q is defined at the subject level, the simulation considers one treatment for a disease in one subject only. The primary endpoint is an objective endpoint measuring the change in disease status from baseline to 3 months. The PROM is a 7-scale disease-related satisfaction PROM such as the one illustrated in Fig 1. To include different means and standard deviations, the simulation uses 5 means [μ = (0, 0.5, 1, 1.5, 2)] and 5 associated standard deviations [σ = (1, 1.3, 1.6, 1.9, 2.2)] as two building blocks to construct various multivariate normal distributions for the objective endpoint. For example, if t = 10, then X_ik (k = 1, …, 10) follows the multivariate normal distribution with stacked mean vector (μ, μ) and a variance-covariance matrix whose diagonal elements are the variances σ_k², with the five variances repeated similarly along the diagonal, and whose off-diagonal elements are ρσ_kσ_h. The other settings are as follows. The correlation coefficient ρ between X_ik and X_ih (k ≠ h) takes the values 0.3, 0.5, and 0.8. a_i7 (the minimum objective threshold for "at least very much improved") is equal to 1.2, a_i3 is equal to -0.3 and a_i5 is equal to 0.4. The underlying probability of revealing the truth, Q_iz (z = 3, 5, or 7), takes the values 0.5, 0.6, 0.7, 0.8, and 0.9. The number of repeated measurements for the subject is t = 5, 10, 20, or 40. The pre-selected a_j range from -2 to 5.0 in steps of 0.1, so m = 71. Because the lowest value two standard deviations below the five means is -2 and the highest value two standard deviations above the five means is 5, this range is wide enough to include all underlying true values of a_i3, a_i5, and a_i7. The number of simulation replicates is 10,000.
For each combination of ρ (0.3, 0.5, 0.8), a_iz (-0.3, 0.4, 1.2), and Q_iz (0.5, 0.6, 0.7, 0.8, 0.9), the t (5, 10, 20, 40) pairs of outcomes (x_ik, g_ik) (k = 1, …, t) are sampled as follows. First, x_ik (k = 1, …, t) are drawn from the corresponding multivariate normal distribution. If x_ik ≥ a_iz, g_ik is drawn from Bernoulli(Q_iz); otherwise g_ik is drawn from Bernoulli(1 - Q_iz). Then Q_iz is estimated by the method described above from the t pairs of outcomes, and its 95% CI is calculated using the exact binomial confidence interval because the samples are small. These steps are repeated 10,000 times for each underlying value of Q_iz and t, and then the mean of the 10,000 estimates Q̂_iz and the coverage probability of the 95% CIs for Q_iz are obtained. Figs 3–5 show three examples in which the mean of the 10,000 Q̂_iz converges to Q_iz regardless of the values of ρ and a_iz. As the number of clinical visits for Subject i increases, the mean of Q̂_iz approaches the underlying true value Q_iz. This convergence pattern holds for every value of Q_iz (0.6, 0.7, 0.8, 0.9) except Q_iz = 0.5. This is not surprising, because when Q_iz = 0.5 there is no association between the PROM and X at PROMz: as shown in Expressions (5) and (6), when Q_iz = 0.5 every R_ij (j = 1, …, m) is an unbiased estimator of Q_iz, so taking their maximum inflates the estimate. A separate simulation using the median of the R_ij as Q̂_iz is performed when Q_iz = 0.5; the mean Q̂_iz then ranges from 0.50 to 0.52 (converging to 0.5) across combinations of ρ, a_iz, and t. In practice, the simulation results for Q_iz = 0.5 in Figs 3–5 can be used as a reference for setting a minimum acceptable Q value. Table 3 shows that the mean Q̂_iz is close to Q_iz for large Q_iz (0.90 to 0.94 when Q_iz = 0.9) and approaches Q_iz as t increases, with an upward bias that is largest when t is small and Q_iz is near 0.5. The probability that the 95% CI includes the true value of Q_iz (coverage probability) is close to or above 0.95 for Q_iz ≥ 0.6 (lowest value 0.949, at t = 20 and Q_iz = 0.6), reflecting the conservative exact binomial interval, but drops to 0.841 to 0.915 at Q_iz = 0.5 for t ≥ 10.
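A scaled-down version of one simulation cell can be sketched as below. Several choices are assumptions rather than details taken from the paper: the covariance with off-diagonal elements ρσ_kσ_h is generated through a shared standard normal factor; the "exact binomial" interval is taken to be the Clopper–Pearson interval, computed here by bisection on the binomial CDF; and the number of replicates is a parameter, much smaller than 10,000 for speed. All function names are hypothetical.

```python
import math
import random


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(Y <= k) for Y ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def clopper_pearson(x: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Exact (Clopper-Pearson) interval for a binomial proportion, by bisection."""

    def solve(f, target):
        # f must be increasing in p on [0, 1]; returns p with f(p) close to target.
        lo, hi = 0.0, 1.0
        for _ in range(60):
            mid = (lo + hi) / 2
            if f(mid) < target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower limit solves P(Y >= x; p) = alpha/2, which increases with p.
    lower = 0.0 if x == 0 else solve(lambda p: 1 - binom_cdf(x - 1, n, p), alpha / 2)
    # Upper limit solves P(Y <= x; p) = alpha/2; negated to make it increasing.
    upper = 1.0 if x == n else solve(lambda p: -binom_cdf(x, n, p), -alpha / 2)
    return lower, upper


def simulate_cell(q, a, t, rho, reps, seed=0):
    """Mean of Q-hat and 95% CI coverage for one (Q, a, t, rho) setting."""
    rng = random.Random(seed)
    mus = [0.0, 0.5, 1.0, 1.5, 2.0]
    sds = [1.0, 1.3, 1.6, 1.9, 2.2]
    mu = [mus[k % 5] for k in range(t)]  # stacked mean vector
    sd = [sds[k % 5] for k in range(t)]
    grid = [-2 + 0.1 * j for j in range(71)]  # m = 71 search points
    cis = [clopper_pearson(x, t) for x in range(t + 1)]  # one CI per possible count
    total, covered = 0.0, 0
    for _ in range(reps):
        z0 = rng.gauss(0.0, 1.0)  # shared factor gives Corr(X_ik, X_ih) = rho
        xs = [mu[k] + sd[k] * (math.sqrt(rho) * z0
                               + math.sqrt(1 - rho) * rng.gauss(0.0, 1.0))
              for k in range(t)]
        gs = [1 if rng.random() < (q if x >= a else 1 - q) else 0 for x in xs]
        best = max(sum((x >= a_j) == (g == 1) for x, g in zip(xs, gs)) for a_j in grid)
        total += best / t  # Q-hat = max_j R_ij
        lo, hi = cis[best]
        covered += lo <= q <= hi
    return total / reps, covered / reps
```

Because TP+TN can take only t + 1 values, the intervals are computed once per possible count and reused across replicates.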

Fig 3

The mean Q̂_iz converges to its underlying value of Q_iz as sample size increases (ρ = 0.3, a_iz = 1.2).

Fig 5

The mean Q̂_iz converges to its underlying value of Q_iz as sample size increases (ρ = 0.8, a_iz = 0.4).

Table 3

Mean estimate Q̂_iz and coverage probability of the 95% CI for Q_iz (ρ = 0.8, a_iz = 1.2).

t	Statistic	Q_iz = 0.5	Q_iz = 0.6	Q_iz = 0.7	Q_iz = 0.8	Q_iz = 0.9
40	Mean Q̂_iz	0.61	0.65	0.72	0.81	0.90
40	Coverage of 95% CI	0.851	0.958	0.961	0.981	0.971
20	Mean Q̂_iz	0.65	0.69	0.75	0.82	0.91
20	Coverage of 95% CI	0.841	0.949	0.986	0.982	0.994
10	Mean Q̂_iz	0.70	0.73	0.78	0.84	0.92
10	Coverage of 95% CI	0.915	0.980	1.000	1.000	0.996
5	Mean Q̂_iz	0.77	0.79	0.83	0.88	0.94
5	Coverage of 95% CI	1.000	1.000	1.000	1.000	1.000

3.2 Case study: Hypothetical clinical trial data

The probability Q of revealing truth for Subject i at PROMz, has been applied to hypothetical clinical trial data in order to assess the conditional association parameter in multiple subjects. The purpose of the trial is to improve near vision by a medical device. Each subject had a test device implanted and was followed up at Months 3, 6, 12, 18, 24, 30 post procedure. At each follow-up visit, a subject had his/her uncorrected near visual acuity (UCNVA) measured using ETDRS Chart at 40 cm/16 in, and answered a unidimensional PROM question with 7 possible outcomes as shown in Fig 1. The question in the PROM was “How satisfied are you with your near vision without reading glasses after the treatment?” The change from baseline in UCNVA is considered as the continuous objective clinical endpoint with a larger change indicating better near vision. The outcome of the satisfaction question is the PRO which can be dichotomized in 3 ways for every subject: ≥5 or otherwise, ≥6 or otherwise, ≥7 or otherwise. The mean (z = 5, 6, or 7) is used to assess the probability of the PROMz to reveal the status of the visual acuity in the targeted population. The pre-determined threshold searching set {a j = 1, …, m} ranges from -20 to 60 letters with an increasing step of 1. This set contains m = 81 searching points for the minimum threshold a (z = 5, 6, or 7). It is believed that the threshold-searching set is large enough to contain the true value of a for PROMz for every subject in the target population. below shows that the mean of the (probability of revealing truth) and the mean in the change of UCNVA. As expected, one can see that the highest satisfaction has the lowest mean probability of revealing truth uncorrected visual acuity and the largest threshold in the change of UCNVA: 21 more letters correctly identified from baseline. 
The associated 95% CIs for Q well exclude 0.5 indicating Q from the majority of subjects are greater than 0.5 and consequently the probability of the PROMz revealing subjects’ uncorrected visual acuity is established. Since the PROMz has > 83% probability (based on the lower limits) of revealing the status of UCNVA, it may be used as a binary endpoint for the primary inference for uncorrected near visual acuity. * includes subjects whose PROs contain the dichotomized value and have at least two different objective outcomes shows the median of when the satisfaction level changes. The is found to have a highly skewed distribution; therefore p-values are reported here from a non-parametric signed rank test, and the reference statistic is referred to median instead of mean. One can observe that When the PRO increases from ≥5 to ≥6, the majority of subjects have no change (median = 0) in their uncorrected near vision acuity; this means that the PRO change from scale 5 to scale 6 may not represent a change in majority subjects’ uncorrected near vision acuity. When the PRO increases from ≥6 to ≥7 or ≥5 to ≥7, the majority of subjects have a positive change (median = 9 or 21, respectively) in their uncorrected near vision acuity; this means that the PRO changes from a lower score to 7 represent a change in majority subjects’ uncorrected near vision acuity. *includes all subjects who are in Table 3 and have a change objective value when the associated PRO changes These indicate that a change of one PROM unit in this case might not be adequate for a translation to a change in uncorrected near visual acuity. An increase of at least two (2) PROM units represents that the majority subjects have a positive increase in their uncorrected near visual acuity. 
Consequently, this PROM’s ability to detect a change in uncorrected near visual function is supported by a difference of two (2) PROM units in this clinical trial rather than one (1) PROM unit, or by a change of the PRO score to 7 for the majority of subjects. Note that each subject contributes at most 6 samples in this trial, which limits the capability of this method to search for a_iz.
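The p-values in Table 5 come from a signed rank test on the per-subject differences â_iz − â_iz′. Below is a minimal pure-Python sketch of a one-sided Wilcoxon signed-rank test. It assumes the large-sample normal approximation with a tie correction and no continuity correction (the excerpt does not say which variant the authors used), and the function name `signed_rank_test_greater` and the differences in the usage line are hypothetical. Zero differences are dropped, which is how a median of 0, as in the ≥5 to ≥6 row, can coexist with a small p-value when the non-zero differences are mostly positive.

```python
import math
from statistics import NormalDist


def signed_rank_test_greater(diffs):
    """One-sided Wilcoxon signed-rank test of H1: median(diffs) > 0.

    Large-sample normal approximation with a tie correction in the variance
    and no continuity correction. Zero differences are dropped and tied |d|
    values receive average ranks. Returns (W_plus, z, p_value).
    """
    d = [v for v in diffs if v != 0]
    n = len(d)
    if n == 0:
        raise ValueError("all differences are zero")
    order = sorted(range(n), key=lambda k: abs(d[k]))
    ranks = [0.0] * n
    tie_term = 0.0  # sum of (t^3 - t) over groups of tied |d|
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        avg_rank = (i + j + 2) / 2.0  # positions i..j hold 1-based ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    w_plus = sum(r for r, v in zip(ranks, d) if v > 0)
    mean_w = n * (n + 1) / 4.0
    var_w = n * (n + 1) * (2 * n + 1) / 24.0 - tie_term / 48.0
    z = (w_plus - mean_w) / math.sqrt(var_w)
    p_value = 1.0 - NormalDist().cdf(z)
    return w_plus, z, p_value


# Hypothetical per-subject differences a-hat_iz - a-hat_iz' (letters).
diffs = [0, 0, 3, 5, 9, 9, 12, -2, 21, 0]
w_plus, z_stat, p_value = signed_rank_test_greater(diffs)
```

For small samples an exact null distribution would be preferable; the normal approximation is used here only to keep the sketch short.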

4. Concluding remarks

The conditional probability Q_iz that PROM_z reveals the true status of Subject i’s disease is a new quantitative statistic for assessing the conditional association between a unidimensional PROM and a continuous objective endpoint X measuring the disease status. Q_iz is estimated for each subject using paired observations (x, g) of the objective outcome and the dichotomized PRO, measured repeatedly at different clinical visits (such as Months 3, 6, 12, etc.). Q_iz reveals truth with respect to the latent minimum objective threshold a_iz (i.e., whether x ≥ a_iz or x < a_iz). When the PROM is not associated with the objective endpoint X, Q_iz equals the pure chance of 0.5; because Q_iz is a probability, this situation is like flipping a fair coin to obtain the PRO regardless of the status of the disease. When a PROM is used as a measure of a disease/condition in a clinical trial, its probability of revealing truth must be statistically significantly greater than the pure chance of 0.5. The threshold-searching set {a_j : j = 1, …, m} can be pre-determined using the current clinical standard for the possible minimum and maximum objective measurements in the target population; for example, human hemoglobin concentration ranges from 5 g/dL to 20 g/dL. The number m can be chosen based on how precisely a_iz is expected to be estimated. In practice, a clinical trial has n subjects and thus n estimates Q̂_iz (i = 1, …, n). For PROM_z to be used in a target population, the majority of Q̂_iz (i = 1, …, n) have to be greater than the pure chance of 0.5, which can be assessed by testing whether the median (or, for a roughly symmetric distribution, the mean) of Q̂_iz is greater than 0.5. Although a mean/median of Q̂_iz greater than 0.5 indicates an association between the PROM and the objective endpoint X beyond chance in the target population, a higher-quality PROM should have a larger mean/median of Q̂_iz (i = 1, …, n).
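Because “the majority of Q̂_iz greater than 0.5” is a statement about the median, one simple way to check it is an exact one-sided sign test. The sketch below illustrates that choice under stated assumptions (the excerpt does not give the authors’ own procedure for this step); the function name `sign_test_greater` and the Q̂ values are hypothetical. Passing a threshold larger than 0.5 gives the analogous check against a minimum acceptable probability.

```python
from math import comb


def sign_test_greater(values, threshold=0.5):
    """Exact one-sided sign test of H1: median(values) > threshold.

    Values equal to the threshold are dropped. Under H0 the count of values
    above the threshold is Binomial(n, 1/2), so p = P(X >= n_above).
    Returns (n_above, n_used, p_value).
    """
    n_above = sum(1 for v in values if v > threshold)
    n_below = sum(1 for v in values if v < threshold)
    n = n_above + n_below
    if n == 0:
        raise ValueError("every value equals the threshold")
    p_value = sum(comb(n, k) for k in range(n_above, n + 1)) / 2 ** n
    return n_above, n, p_value


# Hypothetical per-subject estimates Q-hat_iz for ten subjects.
q_hat = [0.95, 0.90, 0.83, 1.0, 0.67, 0.5, 0.88, 0.40, 0.92, 0.80]
n_above, n_used, p_value = sign_test_greater(q_hat)
```

Here 8 of the 9 informative subjects lie above 0.5, giving p = 10/512 ≈ 0.0195.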
Let δ denote the minimum acceptable value of the mean/median of Q̂_iz (i = 1, …, n), i.e., the lowest probability with which PROM_z is accepted as revealing the disease status of the majority of subjects. To confirm that the majority of subjects have Q_iz greater than δ, one can test whether the mean/median of Q̂_iz among the n subjects is greater than δ. When the PRO is dichotomized at successively higher scales, one PROM unit at a time, one obtains for each subject an estimate of the change in the minimum threshold of the objective measurement, â_iz − â_iz′ (i = 1, …, n), where z′ < z (e.g., z′ = z − 1). If the mean of these estimates across subjects is statistically significantly greater than 0, the PROM has the ability to detect a change in the objective endpoint. If â_iz − â_iz′ (i = 1, …, n) has a skewed distribution, one should test the median instead, so that a significant result implies that the majority of â_iz − â_iz′ (i = 1, …, n) are greater than 0. The limitations of using Q_iz are that (1) it is applicable only to a unidimensional PROM, or to a PROM item of interest in a multidimensional PROM instrument, when a valid continuous objective measure of the disease status exists, and (2) the estimator of Q_iz is more biased when the number of repeated measurements is small. In the latter case, one can adjust the minimum acceptable probability of revealing truth in order to have confidence that PROM_z reveals truth. Further research may focus on a quantitative method for measuring the conditional association between a multidimensional PROM and a pertinent objective measurement.

Sub-indexes i and j represent Subject i and threshold-searching point j within a clinical visit k (i = 1, …, n, j = 1, …, m, and k = 1, …, t). The letter z denotes the zth scale of the PROM (PROM_z). a_iz is a fixed parameter defined as the minimum latent threshold, in terms of the objective measurement, for Subject i at PROM_z; it is specific to the zth scale and Subject i.
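For the mean-based version of these tests, a large-sample one-sided z-test is one option. In the sketch below (an assumption, not the authors’ stated procedure; the function name `mean_greater_test` and the values are hypothetical), alpha = 0.025 yields a one-sided 97.5% lower confidence bound, which coincides with the lower limit of a two-sided 95% interval. The same function applies to Q̂_iz with mu0 = δ, or to the differences â_iz − â_iz′ with mu0 = 0.

```python
import math
from statistics import NormalDist, mean, stdev


def mean_greater_test(values, mu0, alpha=0.025):
    """Large-sample one-sided test of H1: E[value] > mu0.

    Uses a normal approximation to the sampling distribution of the mean,
    which is reasonable only when n is large (e.g., hundreds of subjects).
    Returns (sample mean, one-sided (1 - alpha) lower confidence bound, p-value).
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    m = mean(values)
    se = stdev(values) / math.sqrt(n)
    if se == 0:
        # No spread: the decision is determined by the sample mean alone.
        return m, m, (0.0 if m > mu0 else 1.0)
    z = (m - mu0) / se
    p_value = 1.0 - NormalDist().cdf(z)
    lower = m - NormalDist().inv_cdf(1.0 - alpha) * se
    return m, lower, p_value


# Hypothetical per-subject estimates Q-hat_iz tested against delta = 0.75.
q_hat = [0.9, 0.8, 1.0, 0.7, 0.9, 0.8]
m, lower, p_value = mean_greater_test(q_hat, mu0=0.75)
```

With only six values the normal approximation is rough; the example merely shows the mechanics.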
For example, if the PROM has 5 different scales, there are five different values of a_iz for each subject. a_j is the jth searching point for a_iz and belongs to a fixed, pre-selected threshold-searching set {a_j : j = 1, …, m} (such as the normal range of hemoglobin concentration with an increment of 0.5 g/dL). a_j is nonrandom and does not vary across subjects; the set is selected based on the current clinical standard for the normal range. X is the random variable for the continuous objective measurement of the status of a subject’s disease/condition, and lowercase x is an outcome (realization) of X. When x ≥ a_iz, the dichotomized PRO is a Bernoulli random variable that equals 1 with probability Q_iz; when x < a_iz, it is a Bernoulli random variable that equals 0 with the same probability Q_iz. G represents the mixture of these two Bernoulli random variables, which share the parameter Q_iz but have opposite meanings: the first applies if x ≥ a_iz and the second if x < a_iz.
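The two Bernoulli components just described can be restated compactly as conditional probabilities. This is a rewrite of the notation above, taking G = 1 to mean a positive dichotomized PRO (answer ≥ z); the second line shows why a PROM unrelated to X gives the chance value 0.5 under this model.

```latex
\begin{aligned}
Q_{iz} &= \Pr\left(G = 1 \mid x \ge a_{iz}\right) = \Pr\left(G = 0 \mid x < a_{iz}\right),\\
\text{$G$ independent of $X$:}\quad Q_{iz} &= \Pr(G = 1) \ \text{and}\ Q_{iz} = \Pr(G = 0) = 1 - Q_{iz}
\;\Longrightarrow\; Q_{iz} = 0.5 .
\end{aligned}
```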

Table 2

Number of cell counts at a_j (j = 1, …, m) for Subject i and PROM_z.

Objective efficacy outcome	PRO Positive (dichotomized at PROM_z)	PRO Negative (dichotomized at PROM_z)
≥ a_j	# of Potential True Positive (TP)	# of Potential False Positive (FP)
< a_j	# of Potential False Negative (FN)	# of Potential True Negative (TN)
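The counts in Table 2 can be tallied per subject for each searching point a_j. The sketch below does that with Table 2’s cell labels and then, purely as a hypothetical illustration, picks the smallest a_j that maximizes the agreement proportion (TP + TN)/N. That selection rule is an assumed stand-in and is NOT the paper’s consistent estimator of Q_iz and a_iz; the function names and subject data are also hypothetical.

```python
def cell_counts(x, g, a):
    """Table 2 counts for one subject at threshold a.

    x: objective outcomes across visits; g: dichotomized PRO (1 = PRO positive).
    Labels follow Table 2: TP = (x >= a, PRO+), FP = (x >= a, PRO-),
    FN = (x < a, PRO+), TN = (x < a, PRO-).
    """
    tp = sum(1 for xi, gi in zip(x, g) if xi >= a and gi == 1)
    fp = sum(1 for xi, gi in zip(x, g) if xi >= a and gi == 0)
    fn = sum(1 for xi, gi in zip(x, g) if xi < a and gi == 1)
    tn = sum(1 for xi, gi in zip(x, g) if xi < a and gi == 0)
    return tp, fp, fn, tn


def naive_threshold_search(x, g, grid):
    """Hypothetical rule: smallest a_j maximizing (TP + TN) / N over the grid."""
    if not x:
        raise ValueError("need at least one visit")
    best_a, best_q = None, -1.0
    for a in grid:
        tp, fp, fn, tn = cell_counts(x, g, a)
        q = (tp + tn) / (tp + fp + fn + tn)
        if q > best_q:  # strict '>' keeps the smallest a_j among ties
            best_a, best_q = a, q
    return best_a, best_q


# Hypothetical subject: change in UCNVA (letters) and PRO dichotomized at z = 6.
x = [2, 5, 9, 12, 20, 22]
g = [0, 0, 1, 1, 1, 1]
a_hat, q_hat = naive_threshold_search(x, g, range(-20, 61))
```

With at most six visits per subject, many grid points tie (here every a_j from 6 to 9 agrees perfectly), which echoes the limitation on searching for a_iz noted for this trial.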

Table 4

Mean Q̂_iz and mean â_iz in the change of UCNVA.

Satisfaction dichotomized value	# of subjects *	Mean Q̂_iz (97.5% CI)	Mean â_iz (# letters correctly identified)
≥5	414	0.91 (0.893, 0.917)	8
≥6	324	0.88 (0.864, 0.893)	14
≥7	190	0.85 (0.831, 0.868)	21

* Includes subjects whose PROs contain the dichotomized value and who have at least two different objective outcomes.

Table 5

Ability to detect a change: median of â_iz − â_iz′.

Satisfaction change	# of subjects *	Mean of â_iz − â_iz′	Median of â_iz − â_iz′	p-value by signed rank test
From ≥5 to ≥6	324	11	0	<0.001
From ≥6 to = 7	190	15	9	<0.001
From ≥5 to = 7	190	20	21	<0.001

* Includes all subjects who are in Table 3 and have a change in the objective value when the associated PRO changes.
