Table 4

Measurement Properties Reviewed for PRO Instruments Used in Clinical Trials

| Measurement Property | Test | What is Assessed | FDA Review Considerations |
| --- | --- | --- | --- |
| Reliability | Test-retest | Stability of scores over time when no change has occurred in the concept of interest | Does the PRO instrument reliably measure the concepts it was designed to measure? |
| | Internal consistency | Whether the items in a domain are intercorrelated, as evidenced by an internal consistency statistic (e.g., coefficient alpha) | Were appropriate reliability tests conducted? |
| | Inter-interviewer reproducibility (for interviewer-administered PROs only) | Agreement between responses when the PRO is administered by two or more different interviewers | What was the quality of the evidence of reliability? |
| Validity | Content-related | Whether items and response options are relevant and are comprehensive measures of the domain or concept | Do items in the verbatim copy of the PRO instrument appear to measure the concepts they are intended to measure in a useful way? Have patients similar to those participating in the clinical trial confirmed the completeness and relevance of all items? |
| | Ability to measure the concept (also known as construct-related validity; can include tests for discriminant, convergent, and known-groups validity) | Whether relationships among items, domains, and concepts conform to what is predicted by the conceptual framework for the PRO instrument itself and its validation hypotheses | Do observed relationships between the items and domains confirm the hypotheses in the conceptual framework? Do results compare favorably with results from a similar but independent measure? Do results distinguish one group from another based on a prespecified variable that is relevant to the concept of interest? |
| | Ability to predict future outcomes (also known as predictive validity) | Whether future events or status can be predicted by changes in the PRO scores | Do PRO scores predict subsequent events or outcomes accurately? |
| Ability to detect change | Includes calculations of effect size and standard error of measurement, among others | Whether PRO scores are stable when there is no change in the patient, and whether the scores change in the predicted direction when there has been a notable change in the patient, as evidenced by some effect size statistic. Ability to detect change is always specific to a time interval. | Has ability to detect change been demonstrated in a comparative trial setting, comparing mean group scores or proportion of patients who experienced a response to the treatment? Has ability to detect change been assessed for the time interval appropriate to the study? |
| Interpretability | Smallest difference that is considered clinically important; this can be a specified difference (the minimum important difference, MID) or, in some cases, any detectable difference. The MID is used as a benchmark to interpret mean score differences between treatment arms in a clinical trial. | Difference in mean score between treatment groups that provides convincing evidence of a treatment benefit. Can be based on experience with the measure using a distribution-based approach, a clinical or nonclinical anchor, an empirical rule, or a combination of approaches. The definition of an MID using a clinical anchor is sometimes called a minimum clinically important difference (MCID). | The FDA is specifically requesting comment on appropriate review of derivation and application of an MID in the clinical trial setting. |
| | Responder definition: used to identify responders in clinical trials for analyzing differences in the proportion of responders between treatment arms | Change in score that would be clear evidence that an individual patient experienced a treatment benefit. Can be based on experience with the measure using a distribution-based approach, a clinical or nonclinical anchor, an empirical rule, or a combination of approaches. | The FDA is specifically requesting comment on appropriate review of derivation and application of responder definitions when used in clinical trials. |
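
As an illustration of two statistics named in the table, the sketch below computes coefficient alpha (internal consistency) and a standard error of measurement. It is a minimal example and not a method prescribed by the FDA: the function names, the sample responses, and the choice of the common formula SEM = SD × sqrt(1 − reliability) are assumptions made here for illustration.

```python
from math import sqrt
from statistics import stdev, variance


def cronbach_alpha(item_scores):
    """Coefficient alpha for one domain.

    item_scores: list of respondents, each a list of k item scores
    (hypothetical layout chosen for this example).
    """
    k = len(item_scores[0])
    if k < 2 or len(item_scores) < 2:
        raise ValueError("need at least 2 items and 2 respondents")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in item_scores]) for i in range(k)]
    # Sample variance of the summed domain score.
    total_var = variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def standard_error_of_measurement(total_scores, reliability):
    """SEM = SD * sqrt(1 - reliability), with reliability in [0, 1]."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return stdev(total_scores) * sqrt(1 - reliability)


if __name__ == "__main__":
    # Made-up responses: 5 respondents, 3 items in one domain.
    responses = [[2, 3, 3], [4, 4, 5], [1, 2, 1], [3, 3, 4], [5, 4, 5]]
    alpha = cronbach_alpha(responses)
    totals = [sum(row) for row in responses]
    print("coefficient alpha:", round(alpha, 3))
    print("SEM:", round(standard_error_of_measurement(totals, alpha), 3))
```

Here alpha is used as the reliability estimate when computing the SEM; a test-retest coefficient could be substituted where that is the more appropriate evidence of reliability.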

U.S. Department of Health and Human Services FDA Center for Drug Evaluation and Research et al. Health and Quality of Life Outcomes 2006, 4:79. doi:10.1186/1477-7525-4-79
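
The table's rows on ability to detect change and on responder definitions can be illustrated with a short sketch. It computes one possible effect size statistic (mean change divided by the baseline standard deviation) and the proportion of patients who meet a responder threshold. The function names, the sample scores, the threshold value, and the assumption that higher scores mean improvement are all hypothetical; the table does not prescribe a particular statistic or threshold.

```python
from statistics import mean, stdev


def _changes(baseline, follow_up):
    """Per-patient change scores (follow-up minus baseline)."""
    if len(baseline) != len(follow_up):
        raise ValueError("baseline and follow_up must have the same length")
    return [after - before for before, after in zip(baseline, follow_up)]


def standardized_effect_size(baseline, follow_up):
    """Mean change divided by the SD of baseline scores."""
    return mean(_changes(baseline, follow_up)) / stdev(baseline)


def responder_proportion(baseline, follow_up, threshold):
    """Share of patients whose score improved by at least `threshold`.

    Assumes higher scores indicate improvement; `threshold` stands in
    for a prespecified responder definition.
    """
    changes = _changes(baseline, follow_up)
    return sum(1 for c in changes if c >= threshold) / len(changes)


if __name__ == "__main__":
    # Made-up scores for one treatment arm.
    baseline = [10, 12, 14, 16]
    follow_up = [14, 13, 20, 16]
    print("effect size:", round(standardized_effect_size(baseline, follow_up), 3))
    print("responders:", responder_proportion(baseline, follow_up, threshold=4))
```

In a comparative trial, the same two quantities would be computed per treatment arm over the same time interval and then compared between arms.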