Abstract
This guidance describes how the FDA evaluates patient-reported outcome (PRO) instruments used as effectiveness endpoints in clinical trials. It also describes our current thinking on how sponsors can develop and use study results measured by PRO instruments to support claims in approved product labeling (see appendix point 1). It does not address the use of PRO instruments for purposes beyond evaluation of claims made about a drug or medical product in its labeling. By explicitly addressing the review issues identified in this guidance, sponsors can increase the efficiency of their endpoint discussions with the FDA during the product development process, streamline the FDA's review of PRO endpoint adequacy, and provide optimal information about the patient's perspective of treatment benefit at the time of product approval.

A PRO is a measurement of any aspect of a patient's health status that comes directly from the patient (i.e., without the interpretation of the patient's responses by a physician or anyone else). In clinical trials, a PRO instrument can be used to measure the impact of an intervention on one or more aspects of patients' health status, hereafter referred to as PRO concepts, ranging from the purely symptomatic (e.g., the response of a headache to treatment), to more complex concepts (e.g., the ability to carry out activities of daily living), to extremely complex concepts such as quality of life, which is widely understood to be a multidomain concept with physical, psychological, and social components. Data generated by a PRO instrument can provide evidence of a treatment benefit from the patient perspective. For these data to be meaningful, however, there should be evidence that the PRO instrument effectively measures the particular concept being studied.
Generally, findings measured by PRO instruments may be used to support claims in approved product labeling if the claims are derived from adequate and well-controlled investigations that use PRO instruments that reliably and validly measure the specific concepts at issue.

The glossary defines many of the terms used in this guidance. In particular, the term instrument refers to the actual questions or items contained in a questionnaire or interview schedule, along with all the additional information and documentation that supports the use of these items in producing a PRO measure (e.g., interviewer training and instructions, scoring and interpretation manual). The term conceptual framework refers to how items are grouped according to subconcepts or domains (e.g., the item walking without help may be grouped with another item, walking with difficulty, within the domain of ambulation, and ambulation may be further grouped into the concept of physical ability).

FDA's guidance documents, including this guidance, do not establish legally enforceable responsibilities. Instead, guidance documents describe the Agency's current thinking on a topic and should be viewed only as recommendations, unless specific regulatory or statutory requirements are cited. The use of the word should in Agency guidance documents means that something is suggested or recommended but not required.

The Draft Guidance was first published by the Food and Drug Administration in February 2006.
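The item–domain–concept grouping that defines a conceptual framework can be sketched as a nested data structure. A minimal illustration in Python, using the ambulation example from the text; the structure and the helper function are assumptions for illustration, not part of any actual instrument:

```python
# Sketch of a PRO conceptual framework: items roll up into domains,
# and domains roll up into a concept. Names mirror the example in the
# text; this is illustrative, not a real instrument.
conceptual_framework = {
    "physical ability": {                # concept
        "ambulation": [                  # domain
            "walking without help",      # item
            "walking with difficulty",   # item
        ],
    },
}

def items_for_concept(framework: dict, concept: str) -> list:
    """Flatten every item that contributes to the given concept."""
    return [item
            for domain_items in framework[concept].values()
            for item in domain_items]
```

Here `items_for_concept(conceptual_framework, "physical ability")` returns both walking items, reflecting how item-level responses would be aggregated upward through the framework.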
Year: 2006 PMID: 17034633 PMCID: PMC1629006 DOI: 10.1186/1477-7525-4-79
Source DB: PubMed Journal: Health Qual Life Outcomes ISSN: 1477-7525 Impact factor: 3.186
Taxonomy of PROs Used in Clinical Trials
| Classification dimension | Example |
| Intended use of the measure | • To define entry criteria for study populations |
| Concepts measured | • Overall health status |
| Number of items | • Single item for single concept |
| Intended measurement population or condition | • Generic |
| Mode of data collection | • Interviewer-administered |
| Timing and frequency of administration | • As events occur |
| Types of scores | • Single rating on a single concept (e.g., pain severity) |
| Weighting of items or concepts | • All items and domains are equally weighted |
| Response options | • See Table 2 for examples of response options (types of PRO scales) |
Figure 1. The PRO instrument development and modification process.
Figure 2. Diagram of a conceptual framework.
Types of Response Options
| Visual analog scale (VAS) | A line of fixed length (usually 100 mm) with words that anchor the scale at the extreme ends and no words describing intermediate positions. Patients are instructed to place a mark on the line corresponding to their perceived state. These scales often produce a false sense of precision. |
| Anchored or categorized VAS | A VAS that has the addition of one or more intermediate marks positioned along the line with reference terms assigned to each mark to help patients identify the locations (e.g., half-way) between the ends of the scale. |
| Likert scale | An ordered set of discrete terms or statements from which patients are asked to choose the response that best describes their state or experience. |
| Rating scale | A set of numerical categories from which patients are asked to choose the category that best describes their state or experience. The ends of rating scales are anchored with words but the categories do not have labels. |
| Event log | Specific events are recorded as they occur using a patient diary or other reporting system (e.g., interactive voice response system) |
| Pictorial scale | A set of pictures applied to any of the other types of response options. Pictorial scales are often used in pediatric questionnaires but also have been used for patients with cognitive impairments and for patients who are otherwise unable to speak or write. |
| Checklist | Checklists provide a simple choice between a limited set of options (e.g., yes or no). |
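Each response-option type in the table above ultimately maps a patient's mark or choice to a number. A hedged sketch of two common conversions, assuming a standard 100 mm VAS line and an illustrative five-level ordered scale (both are assumptions for illustration, not values prescribed by the guidance):

```python
def score_vas(mark_mm: float, line_length_mm: float = 100.0) -> float:
    """Convert a mark on a visual analog scale to a 0-100 score.
    Assumes the usual 100 mm line; marks off the line are invalid."""
    if not 0 <= mark_mm <= line_length_mm:
        raise ValueError("mark must fall on the line")
    return 100.0 * mark_mm / line_length_mm

# Illustrative ordered categories for a Likert-type item (labels are
# examples only; real instruments define their own terms).
LIKERT_LEVELS = ("none", "mild", "moderate", "severe", "very severe")

def score_likert(response: str) -> int:
    """Map an ordered categorical response to its ordinal position."""
    return LIKERT_LEVELS.index(response)
```

A rating scale would be scored like the Likert example but with numeric categories; an event log is counted rather than scaled.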
Common Reasons for Changing PRO Instruments During Initial Development
| Property | Example reason for change |
| Clarity or relevance | • Reported as not relevant by a large segment of the population of interest |
| Response range | • A high percentage of patients respond at the floor (worst end of the response scale) or ceiling (optimal end of the response scale) |
| Variability | • All patients give the same answer (i.e., no variance) |
| Reproducibility | • Unstable scores over time when there is no logical reason for variation from one assessment to the next |
| Inter-item correlation | • Item uncorrelated with other items in the same concept of interest |
| Ability to detect change | • Item is nonresponsive (i.e., does not change when there is a known change in the concepts of interest) |
| Item discrimination | • Item is highly correlated with measures of concepts other than the one it is intended to measure |
| Redundancy | • Item duplicates information collected with other items that have equal or better measurement properties |
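Several of the checks above (response range, variability, inter-item correlation) are routinely computed from pilot data when deciding whether to change an item. A minimal sketch in plain Python; the input format (one list of numeric scores per item) and any thresholds applied to these statistics are assumptions for illustration:

```python
from math import sqrt
from statistics import pstdev

def floor_ceiling_rates(scores, floor, ceiling):
    """Fractions of patients answering at the worst (floor) or
    optimal (ceiling) end of the response scale."""
    n = len(scores)
    return (sum(s == floor for s in scores) / n,
            sum(s == ceiling for s in scores) / n)

def has_variance(scores):
    """False when all patients give the same answer (no variance)."""
    return pstdev(scores) > 0

def pearson(x, y):
    """Pearson correlation, e.g. between two items intended to measure
    the same concept (a low value flags an uncorrelated item)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```

For example, a high value from `floor_ceiling_rates` would suggest the response range is too narrow for the population being studied.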
Measurement Properties Reviewed for PRO Instruments Used in Clinical Trials
| Measurement property | Type | Definition | Review questions |
| Reliability | Test-retest | Stability of scores over time when no change has occurred in the concept of interest | Does the PRO instrument reliably measure the concepts it was designed to measure? |
| | Internal consistency | Whether the items in a domain are intercorrelated, as evidenced by an internal consistency statistic (e.g., coefficient alpha) | Were appropriate reliability tests conducted? |
| | Inter-interviewer reproducibility (for interviewer-administered PROs only) | Agreement between responses when the PRO is administered by two or more different interviewers | What was the quality of the evidence of reliability? |
| Validity | Content-related | Whether items and response options are relevant and are comprehensive measures of the domain or concept | Do items in the verbatim copy of the PRO instrument appear to measure the concepts they are intended to measure in a useful way? Have patients similar to those participating in the clinical trial confirmed the completeness and relevance of all items? |
| | Ability to measure the concept (also known as construct-related validity; can include tests for discriminant, convergent, and known-groups validity) | Whether relationships among items, domains, and concepts conform to what is predicted by the conceptual framework for the PRO instrument itself and its validation hypotheses | Do observed relationships between the items and domains confirm the hypotheses in the conceptual framework? Do results compare favorably with results from a similar but independent measure? Do results distinguish one group from another based on a prespecified variable that is relevant to the concept of interest? |
| | Ability to predict future outcomes (also known as predictive validity) | Whether future events or status can be predicted by changes in the PRO scores | Do PRO scores predict subsequent events or outcomes accurately? |
| Ability to detect change | Includes calculations of effect size and standard error of measurement, among others | Whether PRO scores are stable when there is no change in the patient and change in the predicted direction when there has been a notable change in the patient, as evidenced by some effect size statistic; ability to detect change is always specific to a time interval | Has ability to detect change been demonstrated in a comparative trial setting, comparing mean group scores or the proportion of patients who experienced a response to the treatment? Has ability to detect change been assessed for the time interval appropriate to the study? |
| Interpretability | Minimum important difference (MID) | Smallest difference that is considered clinically important; this can be a specified difference (the MID) or, in some cases, any detectable difference. The MID is used as a benchmark to interpret mean score differences between treatment arms in a clinical trial, i.e., the difference in mean score between treatment groups that provides convincing evidence of a treatment benefit. Can be based on experience with the measure using a distribution-based approach, a clinical or nonclinical anchor, an empirical rule, or a combination of approaches. The definition of an MID using a clinical anchor is sometimes called an MCID. | The FDA is specifically requesting comment on appropriate review of the derivation and application of an MID in the clinical trial setting. |
| | Responder definition, used to identify responders in clinical trials for analyzing differences in the proportion of responders between treatment arms | Change in score that would be clear evidence that an individual patient experienced a treatment benefit. Can be based on experience with the measure using a distribution-based approach, a clinical or nonclinical anchor, an empirical rule, or a combination of approaches. | The FDA is specifically requesting comment on appropriate review of the derivation and application of responder definitions when used in clinical trials. |
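The internal consistency statistic named in the reliability rows, coefficient (Cronbach's) alpha, has a standard closed form: alpha = k/(k-1) × (1 − Σ item variances / variance of the total score), for k items in a domain. A sketch in plain Python; the example values in the usage note are invented:

```python
from statistics import pvariance

def cronbach_alpha(item_scores):
    """Coefficient alpha for a domain.

    item_scores: one list per item, each holding the scores of the
    same patients in the same order. Population variances are used,
    matching the usual closed form of the statistic."""
    k = len(item_scores)
    sum_item_var = sum(pvariance(item) for item in item_scores)
    totals = [sum(patient) for patient in zip(*item_scores)]
    return (k / (k - 1)) * (1 - sum_item_var / pvariance(totals))
```

Three perfectly parallel items (identical score vectors) give alpha = 1.0; weaker inter-item correlation pushes alpha toward zero, which is one signal that items in a domain may not measure the same concept.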