Urinary incontinence can cause embarrassment and can impact on daily activities and quality of life. Generic health related quality of life instruments, such as the EQ-5D, are designed to be applicable across a variety of disease areas. However, it is sometimes claimed that they are not applicable to a certain disease area because they are missing a domain which directly captures the impact of that particular disease. For example, none of the domains of the EQ-5D relate directly to incontinence, although the impact of incontinence on quality of life may be expected to be picked up indirectly through changes in domains such as usual activities or anxiety/depression. The objective of this review was to examine the appropriateness of the EQ-5D in people with urinary incontinence by reviewing published evidence relating to the psychometric performance of the EQ-5D. A systematic search was conducted to identify studies reporting data that permitted assessment of the construct validity, responsiveness or reliability of the EQ-5D in people with urinary incontinence. Included papers were those that reported EQ-5D alongside other measures of health related quality of life or clinical measures in patients with urinary incontinence or in a broader population where results were reported for a subgroup of patients with urinary incontinence. Data were extracted and a narrative synthesis was undertaken. Seventeen papers were included in the review. In most of the tests performed, EQ-5D was consistent with clinical or disease specific outcome measures. The EQ-5D demonstrated validity in the majority of ‘known group’ comparisons, although statistical significance was not always reported. Correlations between the EQ-5D and disease specific outcomes were statistically significant and in the expected direction for most but not all of the disease specific instruments and clinical measures. 
For responsiveness, there was general agreement between changes in EQ-5D and changes in clinical or disease specific measures. Evidence on reliability was limited to one study. The EQ-5D was generally found to perform well on tests of construct validity, responsiveness and reliability in people with urinary incontinence, although no definitive conclusion can be made on its appropriateness based on these measures alone.
Keywords: Urinary incontinence; EQ-5D; Quality of life; Utility; Quality adjusted life years; Psychometrics
Urinary incontinence (UI) has been defined by the International Continence Society as “the complaint of any involuntary urinary leakage” . UI can cause embarrassment and can impact on daily activities and quality of life [2,3]. It can lead to depression and anxiety, and can carry considerable health care costs . UI is often categorised as stress, urge or mixed. Stress incontinence is leakage associated with effort or exertion, or with sneezing or coughing, whilst urge incontinence is leakage accompanied or immediately preceded by urgency. The term mixed incontinence is used when features of both stress and urge incontinence are present.
Treatments which improve continence may have a beneficial impact on the individual’s health related quality of life (HRQoL). Reimbursement agencies want to know the impact of a treatment on HRQoL when deciding whether it should be made available within their health care system. Often these decisions are informed by cost-utility analyses in which treatment benefits are expressed as a change in quality adjusted life years (QALYs). QALYs are useful because they facilitate comparisons of health benefits across different interventions, patients and disease areas. To calculate treatment benefit in terms of QALY gains, an estimate of health utility is required. Health utility is a single metric for HRQoL on a scale where one represents full health and zero represents a state equivalent to death. Negative values are possible and represent states considered to be worse than death. Whilst a variety of generic and disease specific instruments are available to measure HRQoL, only a few of these provide the preference based measurement of health utility required for cost-utility analyses.
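The QALY arithmetic described above can be sketched as follows: QALYs are the area under a utility-over-time profile, and the QALY gain from a treatment is the difference in that area between two profiles. This is a minimal sketch that assumes linear interpolation between assessment times (the trapezoidal rule) and ignores discounting; the utilities are invented for illustration and are not data from any included study.

```python
from typing import Sequence


def qalys(times_years: Sequence[float], utilities: Sequence[float]) -> float:
    """Area under a utility profile using the trapezoidal rule.

    times_years: assessment times in years, in ascending order.
    utilities: health utility at each time (1 = full health, 0 = dead).
    Assumes utility changes linearly between assessments; no discounting.
    """
    if len(times_years) != len(utilities) or len(times_years) < 2:
        raise ValueError("need matching times and utilities (at least two points)")
    total = 0.0
    for i in range(1, len(times_years)):
        interval = times_years[i] - times_years[i - 1]
        total += interval * (utilities[i] + utilities[i - 1]) / 2.0
    return total


# Hypothetical one-year profiles: baseline, 6 months, 12 months.
times = [0.0, 0.5, 1.0]
treated = [0.70, 0.80, 0.80]
untreated = [0.70, 0.72, 0.72]
gain = qalys(times, treated) - qalys(times, untreated)
print(round(gain, 3))  # 0.06
```

The trapezoidal rule is only one convention; analyses that assume a step change in utility at each assessment would give a slightly different area from the same data.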
One of the most widely used generic preference based instruments is the EQ-5D, which is intended to measure and value health outcomes across a wide range of diseases and treatments rather than within a single condition. It consists of two main components. The first is a classification or descriptive system that covers five health domains: mobility, self-care, usual activities, pain/discomfort and anxiety/depression. The standard and most widespread version of the EQ-5D has three levels in each domain: no problems, some problems and severe problems. Three levels in each of five domains give 3^5 = 243 possible health states, in what is generally accepted as a simple approach to describing health. The second component is a single valuation (the EQ-5D index or tariff) for each health state in the descriptive system. The EQ-5D is the preferred instrument for measuring health utilities in adults within the Technology Appraisals Programme at the National Institute for Health and Clinical Excellence (NICE) .
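The figure of 243 health states follows directly from three levels in each of five domains. The sketch below enumerates the descriptive system using the conventional five-digit coding (for example, 11111 for no problems in any domain). It covers only the classification component: attaching an index value to each state requires a published value set, such as the UK set, which is not reproduced here.

```python
from itertools import product

DIMENSIONS = (
    "mobility",
    "self-care",
    "usual activities",
    "pain/discomfort",
    "anxiety/depression",
)
LEVELS = (1, 2, 3)  # 1 = no problems, 2 = some problems, 3 = severe problems


def all_states() -> list[str]:
    """Return every three-level EQ-5D health state as a five-digit code."""
    return [
        "".join(str(level) for level in combo)
        for combo in product(LEVELS, repeat=len(DIMENSIONS))
    ]


states = all_states()
print(len(states))            # 243
print(states[0], states[-1])  # 11111 33333
```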
Whilst generic HRQoL instruments are designed to be applicable across a variety of disease areas, it is sometimes claimed that they are not applicable to a certain disease area because they are missing a domain which directly captures the impact of that particular disease. In the case of UI, the EQ-5D lacks any domain that directly relates to continence, although the impact of incontinence on HRQoL may be expected to be picked up indirectly through changes in domains such as usual activities or anxiety/depression. Evidence is therefore needed on the appropriateness of the EQ-5D in this setting. Psychometric methods are often employed to inform assessment of the appropriateness of an instrument for use within a particular population. The aim of this review was to examine the appropriateness of the EQ-5D for measuring health utility in people with UI by reviewing all published evidence relating to its psychometric performance.
Search strategy and data extraction
The search strategy combined free text terms aimed at identifying papers reporting EQ-5D with free text and controlled terms (MeSH and MeSH-like terms) for UI. The following databases were searched in May 2010: BIOSIS, CINAHL, Cochrane Library (comprising CDSR, CENTRAL and NHS EED), EMBASE, EuroQol website, MEDLINE, PsycINFO and Web of Science. The search strategy for MEDLINE is provided in Additional file 1.
Included papers were those that reported EQ-5D alongside other measures of HRQoL or clinical measures in patients with UI, or in a broader population where results were reported for a subgroup of patients with UI. Papers reporting valuations of clinical vignettes were excluded. There were no restrictions relating to study design or interventions. Relevant systematic reviews and economic evaluations were ordered and their reference lists checked for additional papers reporting primary data. Only English language studies were reviewed. Titles and abstracts were sifted independently by two reviewers, with discussion used to resolve any discrepancies over inclusion or exclusion. Full text papers were sifted by a single reviewer.
Data were extracted using a standardised set of forms. Data extracted included study characteristics (country, study design, type of incontinence and severity measures, treatment where relevant), participant characteristics (number, age, gender, ethnicity), outcome measures and results of psychometric tests.
When establishing the appropriateness of a HRQoL instrument within a particular disease area, relevant psychometric properties include acceptability, feasibility, reliability, validity and responsiveness . Validity refers to the extent to which an instrument measures what it is intended to measure; here, however, all assessments of validity are limited by the absence of a gold standard measure of health utility against which to judge performance. Brazier and Deverill (1999) identify several criteria that psychometricians use to assess validity in the absence of a gold standard measure . ‘Known group validity’ examines differences between groups which are known to differ in the concept of interest, e.g. health utility. Given the lack of a gold standard measure of health utility, in practice the groups are often defined in terms of clinical measures such as disease severity. ‘Convergent validity’ refers to the situation where an instrument is highly correlated with other instruments which measure the same underlying construct. ‘Discriminant validity’ is where measures that theoretically should not be related to each other are observed not to be correlated with each other. Known-group, convergent and discriminant validity are all forms of construct validity. Other forms of validity, such as face validity and content validity, are concerned with whether the items of the instrument are appropriate for the health dimension being measured, in this case the conceptual model of health that is accepted to define the “quality of life” element of QALY calculations. These forms of validity would need to be assessed in a broader population than that considered here. Responsiveness refers to the ability of an instrument to reflect changes that occur in patients over time, and therefore requires the comparison of longitudinal data in groups that are known to have changed in the concept of interest.
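Convergent validity is usually examined by correlating EQ-5D scores with a disease specific score and checking that the correlation has the expected sign and a reasonable magnitude. A rank-based (Spearman) correlation is one common choice. The sketch below computes Spearman's rho as the Pearson correlation of average ranks, so tied values are handled. The data are invented, and the sketch assumes a symptom score on which higher values mean worse symptoms, so a negative correlation is the expected direction. It does not compute a p-value.

```python
from statistics import mean


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values starting at sorted position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples of size >= 2")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical paired scores: EQ-5D index and a symptom score (higher = worse).
eq5d = [0.85, 0.73, 0.69, 0.52, 0.80, 0.62]
symptoms = [2, 5, 4, 9, 3, 7]
rho = spearman(eq5d, symptoms)
print(round(rho, 3))  # -0.943
```

A strong negative rho here would count as evidence in the expected direction; as the text notes, it cannot show which of the two measures is "right".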
Reliability can be thought of as the stability of results when an instrument is used repeatedly in situations where the results are not expected to change, such as over time in the same unchanged population (test-retest reliability), or between raters or interviewers (inter-rater reliability). The acceptability and feasibility of the EQ-5D are well established and are not expected to differ substantially in this population, so the review was limited to measures of construct validity, reliability and responsiveness.
A total of 67 citations were identified from the bibliographic searches (Figure 1). Of these, 38 were ordered as full-text articles, although nine papers (four reviews and five economic evaluations) were ordered purely to check their reference lists for further primary studies. From these, one further paper was identified.
Figure 1. Identification of included articles.
A total of 17 papers were included in the review; their key features are reported in Table 1. Four of the studies identified were randomised controlled trials (RCTs), four were cohort studies and nine were cross-sectional studies. None of the studies were specifically designed to assess the psychometric properties of the EQ-5D, although one paper reported that its objective was to evaluate the measurement properties of the EQ-5D using data collected as part of an RCT . Two further studies aimed to validate another HRQoL instrument [2,8].
Table 1. Characteristics of included studies
The majority of the studies were conducted in a population with incontinence. In two studies, a sample of the general population was asked whether they had a range of clinical conditions including incontinence [2,9]. These studies were included because they reported utilities for the subgroup of patients with incontinence. One study identified patients from an academic urology unit inpatient database and examined overactive bladder symptoms including incontinence . One study was in men with uncomplicated urinary tract symptoms associated with benign prostatic enlargement . A second study was conducted in outpatients attending a urology department with urinary symptoms (not specifically incontinence) and possible benign prostatic obstruction . This study also recruited a general practice sample which was not selected for incontinence . These studies were included because UI can be experienced by patients with benign prostatic hyperplasia. Two papers reported different analyses from the Prospective Urinary Incontinence Research (PURE) study [12,13]. One paper reporting EQ-5D values from a study  had a second associated paper  which was excluded because it did not report EQ-5D values; however, the EQ-VAS values reported in this secondary paper are included in the results table under the primary paper.
One study enrolled fewer than 100 patients . The total number of patients ranged from 48 to 9487. The mean age across the cohorts with UI ranged from 50 to 67 years. One study reported a higher mean age in the patients reporting UI than in the general population sample as a whole (mean age 64 versus 53 years) , whilst another reported only the mean age for the general population sample . Two papers looked exclusively at males [8,11], four had a mixed population of males and females [2,9,10,14], and the remainder looked exclusively at females. Ethnicity was reported in a single study, in which 4% of participants were non-white .
The measures reported in each of the included studies are shown in Table 2 (all abbreviations used to describe HRQoL instruments are defined below Table 2). In addition to the EQ-5D, five studies administered the SF-36 or some variant of it [8,10,14,17,18]. One study included the SF-6D, AQoL, AQoL-8 and HUI-3  and one reported the 15-D . Several papers reported using the UK valuation set for the EQ-5D and none reported using an alternative valuation set, although many papers did not state which valuation set was used. Only two studies reported the EQ-VAS [12,14].
Table 2. Measures reported in the included studies
The main clinical measures reported were severity or grade of incontinence, type of incontinence (stress / urge / mixed), frequency of leakage episodes, and pad usage or pad tests to determine volume of leakage. Some studies reported on cough stress tests or cystometry results. In the benign prostatic hyperplasia populations, maximum flow rate and post void residual volume were used as measures of treatment effectiveness.
Various symptom scoring and incontinence specific quality of life tools were also used (KHQ, UISS, I-QOL, IIQ-7, SSI). Some studies included tools which were designed for use in patients with overactive bladder rather than incontinence (UDI-6, BFLUTS). Some studies included scales designed to measure the impact of lower urinary tract symptoms in men (ICSQoL, I-PSS). One study reported a questionnaire, based on patient history, that assesses the likelihood of detrusor instability (DIS), which may be associated with stress incontinence. One study reported quality of life using a patient generated index (PGI), which is an individualised HRQoL measure.
‘Known group’ validity
A summary of those studies that compared the mean EQ-5D between groups defined in terms of incontinence severity, frequency or type of incontinence is provided in Table 3.
Table 3. Results of ‘known group’ comparisons
Two studies defined groups by the frequency of incontinence episodes [7,19]. In one study, three groups were defined; the mean EQ-5D consistently reflected the differences between groups, and these differences were statistically significant . In the second study, five groups were defined . The mean EQ-5D was equal for two of the groups, and the differences across the five groups were not statistically significant. In the same study, the condition specific measures SSI and I-QoL discriminated well between the groups.
Two studies reported ‘known group’ validity by severity group. In one study the definition of severity was not well described , but in the other  a validated severity index was used, based on combined scores for frequency and leakage amount. EQ-5D varied between severity groups as expected in both studies; the differences between severity groups were statistically significant in one study , whilst the other did not report whether the differences were statistically significant . Other preference based measures (SF-6D, AQoL and AQoL-8), a generic measure (EQ-VAS) and a disease specific measure (I-QoL) were found to perform equally well.
Three studies compared groups defined by incontinence type: two distinguished between stress, urge and mixed incontinence [13,19] and the third grouped patients as having general incontinence, stress incontinence or none . It was unclear what differences were clinically expected between the stress, urge and mixed groups. However, both of the first two studies reported higher EQ-5D scores for stress incontinence than for urge incontinence, and higher scores for urge than for mixed incontinence [13,19]. These differences were statistically significant in one study; the other did not report statistical significance. Differences in EQ-VAS across the groups were consistent with those for EQ-5D, except when severity was reported as slight. The mean I-QoL score performed similarly to EQ-5D, although the differences between the groups were not consistent for individual I-QoL domains.
In the third study, EQ-5D scores were lower for general incontinence than for no incontinence, as clinically expected, but statistical significance was not reported . SF-36 performed equally well in distinguishing between the UI types, which were categorised as general / stress / none.
Five studies provided information on the correlation between EQ-5D and disease specific instruments (KHQ, PGI, I-QoL, ICS-QoL, SSI) or clinical measures (incontinence grade and number of micturitions / leakages). Significant correlations in the expected direction were seen for several but not all of the disease specific instruments. One study reported a statistically significant correlation (p<0.01) in the expected direction for both the I-QoL index and the three I-QoL scale scores . In the same study, SSI was found not to have a statistically significant correlation with EQ-5D (p>0.05) . The correlations between EQ-5D and the individual ICS-QoL items were all in the expected direction but were not all statistically significant . One study reported significant correlations in the expected direction for PGI and KHQ, but p-values were not specified . Significant correlations were found with incontinence grade (p<0.05)  and the number of micturitions and leakages (p<0.001) .
Two studies used regression techniques to assess the impact of clinical measures on EQ-5D scores. Severity, subtype of incontinence (e.g. stress / urge) and number of episodes were found to be significant predictors [12,19]. Two studies used multivariate regression to examine whether presence of incontinence was a significant predictor of utility. The first found that presence of incontinence was a significant predictor of EQ-5D in urology patients and was also a significant predictor of SF-36 scores . The second study found that incontinence was a significant predictor of both EQ-5D and 15D in a general population sample, and that the size of the utility loss was similar between these two instruments .
Results from studies that provide details on the responsiveness of EQ-5D in incontinence are reported in Table 4. Five studies reported changes in EQ-5D from baseline and compared these with changes in disease specific or clinical measures [11,16,18,21,22]. Generally, there was agreement between changes in EQ-5D and changes in clinical or disease specific measures, with four studies reporting improvements in both [11,18,21,22], although two studies did not report whether the EQ-5D changes were statistically significant [11,18]. In one study there was no significant change in either EQ-5D or clinical outcomes .
Table 4. EQ-5D responsiveness results
One study reported changes from baseline for patients whose continence-specific health improved . In this subgroup significant changes from baseline were seen in SSI and I-QoL, but not EQ-5D at six weeks. However, by five months when greater changes from baseline were seen for SSI and I-QoL, the EQ-5D changes were also found to be larger and statistically significant. This study also reported mean scores for responders and non-responders with response being based on patient perceived benefit. There were significant differences between responders and non-responders in two of the I-QoL domains at six weeks, but differences in SSI, I-QoL index and EQ-5D were non-significant. However, by five months EQ-5D differences were found to be significant although only one I-QoL domain remained significantly different between responders and non-responders.
Five studies reported whether the difference between treatment groups was significant for both EQ-5D and other measures (clinical, disease specific and generic HRQoL) [11,17,18,22,23]. In three studies there were no statistically significant differences in EQ-5D between treatment groups, which agreed with the other trial outcomes [17,18,22]. In one of these studies, significant differences were found in some SF-36 domains but not in the clinical outcomes (objective and subjective cure rates) . One study found differences in EQ-5D scores between the treatment arms that were consistent with the clinical outcomes, but the statistical significance of the EQ-5D differences was not reported . In another study, six pairwise comparisons were made between the four treatment options (three active and one no treatment) . For the three comparisons of active treatment against no treatment, all three active treatments were more clinically effective than no treatment, but only two had significantly better EQ-5D scores. For the three comparisons between the active treatment arms, no significant differences were seen in clinical effectiveness, but there were significant differences in EQ-5D scores for two comparisons.
One study reported standardised response means for different instruments . The standardised response means were lower for EQ-5D than for disease specific measures (SSI and I-QoL).
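The standardised response mean (SRM) is conventionally defined as the mean within-patient change from baseline divided by the standard deviation of that change, so it expresses responsiveness in units of the variability of individual change. The sketch below assumes that definition and paired data; the included study's exact computation is not described here, and the scores are invented for illustration.

```python
from statistics import mean, stdev


def standardised_response_mean(baseline, follow_up):
    """Mean within-patient change divided by the sample SD of that change."""
    if len(baseline) != len(follow_up) or len(baseline) < 2:
        raise ValueError("need paired scores for at least two patients")
    changes = [after - before for before, after in zip(baseline, follow_up)]
    return mean(changes) / stdev(changes)


# Hypothetical paired EQ-5D index scores for five patients.
baseline = [0.62, 0.70, 0.59, 0.80, 0.66]
follow_up = [0.72, 0.73, 0.69, 0.80, 0.76]
print(round(standardised_response_mean(baseline, follow_up), 2))  # 1.38
```

Because the SRM scales the average change by the spread of individual changes, an instrument can agree with disease specific measures on the direction of change and still yield a smaller SRM, as reported above for EQ-5D relative to SSI and I-QoL.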
Key findings on test-retest reliability
One study reported the intraclass correlation coefficient (ICC) for patients reporting no benefit from treatment during a clinical trial (data from both trial arms were combined) . The test-retest ICC for EQ-5D was 0.83 (n=50).
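The source does not say which form of ICC was calculated. Purely as an illustration, the sketch below computes one common form, ICC(2,1) in the Shrout and Fleiss notation (two-way random effects, absolute agreement, single measurement), from two-way ANOVA mean squares. The choice of ICC form is an assumption, and the test and retest scores are hypothetical.

```python
from statistics import mean


def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    scores: one row per patient, each row holding that patient's scores on
    the k administrations (for test-retest, k = 2: [test, retest]).
    """
    n = len(scores)
    k = len(scores[0])
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need at least 2 patients, each with k >= 2 scores")
    grand = mean(x for row in scores for x in row)
    row_means = [mean(row) for row in scores]
    col_means = [mean(row[j] for row in scores) for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between patients
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between occasions
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols                  # residual
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical EQ-5D index scores for three unchanged patients: [test, retest].
data = [[0.55, 0.60], [0.65, 0.65], [0.75, 0.70]]
print(round(icc_2_1(data), 3))  # 0.857
```

Other ICC forms (for example, one-way random effects or consistency rather than absolute agreement) can give different values from the same data, which is one reason a reported ICC is easier to interpret when its form is stated.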
The EQ-5D appears to be a reasonable instrument to use in this population when considering the psychometric measures of construct validity, responsiveness and reliability. In most situations EQ-5D performs well when assessed by ‘known group’ validity or responsiveness. In most of the responsiveness tests performed, EQ-5D was consistent with clinical or disease specific outcome measures, including in achieving statistical significance. However, there were situations where statistical significance was not achieved.
Psychometric measures such as validity, reliability and responsiveness are often used to support claims that a HRQoL instrument is adequate or inadequate in a particular population. These measures rely on making comparisons between the scores achieved by the HRQoL instrument and other instruments or clinical measures which are expected to be related. However, when the instrument in question is intended to measure health utility, as the EQ-5D is, these comparisons cannot be definitive tests. They can highlight differences between EQ-5D and other instruments, such as other generic instruments, disease specific outcomes or clinical measures, but since there is no gold standard it cannot be established conclusively which measure is “right”. Intuition and judgement are required to draw any stronger conclusions. Another issue to consider when interpreting the results is that the populations of the included studies are somewhat diverse: some studies recruited patients specifically with symptoms of UI, whereas others recruited patients with conditions which may be associated with UI, such as overactive bladder and benign prostatic enlargement.
Limitations of the included studies further weaken the conclusions that can be drawn. In particular, none of the studies reported here were specifically designed to test the appropriateness of the EQ-5D; they simply provided data that were potentially relevant. Where studies are not explicitly powered to detect a difference in EQ-5D scores, a lack of statistical significance in a particular comparison may reflect the size of the sample rather than the appropriateness of the EQ-5D. Furthermore, not all of the data relevant to assessing a particular psychometric property were always provided. For example, three of the studies providing data on responsiveness were RCTs reporting changes from baseline for the EQ-5D and other clinical measures, but two of these did not report whether the EQ-5D changes were statistically significant.
Where known groups are defined in terms of a clinical measure, the distinctions between groups may reasonably not translate into differences in health utility. For example, Haywood et al. found that EQ-5D was not able to discriminate fully between five groups . The groups were defined in terms of the frequency of episodes as “not at all”, “a few days”, “half the week”, “most days” and “every day”. The differences between the groups are therefore relatively small and not necessarily mutually exclusive, and it is questionable whether there would be significant differences in the preferences of patients in some of the groups.
Furthermore, when reporting the extent to which an instrument is consistent with groups defined in another way, it is necessary to consider how many groups are involved. Often multiple groups are compared, and the instrument may provide consistent results across many of them. P-values typically relate to the null hypothesis that the mean value is equal in all the subgroups under consideration. This is itself ambiguous because it does not indicate how many of the individual pairwise comparisons are statistically significant. Nor does it distinguish between situations where the observations are all consistent (where statistical significance supports the validity of the instrument) and those where one or more observations appear inconsistent (where statistical significance may or may not support the validity of the instrument). Given the multiple issues identified regarding tests of statistical significance in this context, we recommend caution when interpreting any measure of a psychometric property that relies on tests of statistical significance.
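To illustrate why a single omnibus p-value can be ambiguous, the sketch below looks at the individual pairs instead. With k groups there are k(k - 1)/2 pairwise comparisons (10 for five groups); the function counts how many pairs of group means follow the expected ordering, and a Bonferroni correction is shown as one simple way to set a per-comparison significance level. The group means are hypothetical, loosely shaped like a five-group frequency comparison with two equal means; they are not data from Haywood et al. or any other included study.

```python
from itertools import combinations


def pairwise_consistency(group_means):
    """Count pairs of groups whose means follow the expected ordering.

    group_means: mean EQ-5D per group, listed from least to most severe,
    so each later group is expected to have a lower mean utility.
    Returns (consistent, ties, inconsistent, total_pairs).
    """
    consistent = ties = inconsistent = 0
    # combinations() keeps input order, so `later` is the more severe group.
    for earlier, later in combinations(group_means, 2):
        if later < earlier:
            consistent += 1
        elif later == earlier:
            ties += 1
        else:
            inconsistent += 1
    return consistent, ties, inconsistent, consistent + ties + inconsistent


def bonferroni_threshold(n_groups, alpha=0.05):
    """Per-comparison significance level across all k(k-1)/2 pairwise tests."""
    n_pairs = n_groups * (n_groups - 1) // 2
    return alpha / n_pairs


means = [0.86, 0.79, 0.79, 0.71, 0.64]  # hypothetical, least to most frequent
print(pairwise_consistency(means))       # (9, 1, 0, 10)
print(bonferroni_threshold(len(means)))  # 0.005
```

Here nine of ten pairs run in the expected direction and one is tied, a pattern that an omnibus test alone would not reveal.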
The EuroQol Group have approved the development of “bolt-ons/dimension extensions” . These would permit extra dimensions to be added to the standard EQ-5D instrument in order to capture directly other issues of importance to patients. Exactly how these bolt-ons will be developed remains to be seen, but they may offer a route to addressing symptoms such as incontinence that are not captured directly by any of the current dimensions. This review has not identified any strong evidence to suggest that the impact of incontinence is not adequately captured indirectly through the existing dimensions, although it did not examine content validity directly. A review by Lin et al. identified several candidate areas for bolt-ons by comparing the content of disease specific preference based measures with that of the EQ-5D across a wide variety of disease areas . Although that review included one paper in patients with urinary incontinence and another in patients with overactive bladder, Lin et al. did not identify incontinence as a potential candidate for a bolt-on to the EQ-5D. One of the key advantages of the EQ-5D, which may be threatened by the addition of bolt-on dimensions, is that it provides a generic measure of HRQoL that allows decision makers to apply a consistent approach to economic evaluation across multiple disease areas.
This review provides a narrative summary of the evidence available on the appropriateness of the EQ-5D instrument in assessing the health impact of UI. The EQ-5D was generally found to perform well on tests of construct validity, responsiveness and reliability, although no definitive conclusion can be made on its appropriateness based on these measures alone.
AQoL: Assessment of quality of life; BFLUTS: Bristol female lower urinary tract symptoms questionnaire; DIS: Detrusor instability scores; EQ-VAS: Visual analogue scale which accompanies the EQ-5D descriptive system; GP: General practice; HRQoL: Health related quality of life; HUI3: Health utilities index mark 3; ICSQoL: International Continence Society – Benign prostatic hyperplasia study quality of life instrument; IIQ-7: Incontinence impact questionnaire-short form; I-PSS: International prostate symptom score; I-QOL: Incontinence specific quality of life questionnaire; KHQ: King’s health questionnaire; NASHA/Dx: Non-animal-stabilized hyaluronic acid/dextranomer; PGI: Patient generated index; QALY: Quality adjusted life year; RCT: Randomised controlled trial; SF-36: Medical outcomes study 36-item short-form health survey; SF-6D: Classification for describing health derived from a selection of SF-36 items; SSI: Symptom severity index; S/UIQ: Stress and urge incontinence questionnaire; TTO: Time trade off; TVT: Tension-free vaginal tape; TVT-O: Tension-free vaginal tape obturator; UDI-6: Urogenital distress inventory-short form; UI: Urinary incontinence; UISS: Urinary incontinence severity score; UK: United Kingdom; VAS: Visual analogue scale; 15-D: Fifteen dimension generic instrument.
The authors declare that they have no competing interests.
SD and AW contributed to the overall design of the review, interpretation of the results and drafting of the manuscript. SD was also responsible for identifying included studies and extracting and summarising study data. Both authors read and approved the final manuscript.
SD is a Senior Lecturer in Health Economics and Deputy Director of the NICE Decision Support Unit. AW is a Professor in Health Economics and Director of the NICE Decision Support Unit.
We thank Jonathan Tosh for his assistance in sifting papers for inclusion. This article is based on a report which was funded by the National Institute for Health and Clinical Excellence (“NICE”) through its Decision Support Unit. The views, and any errors or omissions, expressed in this article are those of the authors only.
Abrams P, Cardozo L, Fall M, Griffiths D, Rosier P, Ulmsten U, Van Kerrebroeck P, Victor A, Wein A: The standardisation of terminology of lower urinary tract function: report from the Standardisation Sub-committee of the International Continence Society.
Br Med J 1988, 297:1187-1189.
Health Technol Assess 2006, 10:1-132, iii-iv.
Donovan JL, Kay HE, Peters TJ, Abrams P, Coast J, Matos-Ferreira A, Rentzhog L, Bosch JL, Nordling J, Gajewski JB, et al.: Using the ICSQoL to measure the impact of lower urinary tract symptoms on quality of life: evidence from the ICS-'BPH' study. International continence society--benign prostatic hyperplasia.
Saarni SI, Härkänen T, Sintonen H, Suvisaari J, Koskinen S, Aromaa A, Lönnqvist J: The impact of 29 chronic conditions on health-related quality of life: a general population survey in Finland using 15D and EQ-5D.
Noble SM, Coast J, Brookes S, Neal DE, Abrams P, Peters TJ, Donovan JL: Transurethral prostate resection, noncontact laser therapy or conservative management in men with symptoms of benign prostatic enlargement: an economic evaluation.
Monz B, Chartier-Kastler E, Hampel C, Samsioe G, Hunskaar S, Espuna-Pons M, Wagg A, Quail D, Castro R, Chinn C, et al.: Patient characteristics associated with quality of life in European women seeking treatment for urinary incontinence: results from PURE.
Monz B, Pons ME, Hampel C, Hunskaar S, Quail D, Samsioe G, Sykes D, Wagg A, Papanicolaou S: Patient-reported impact of urinary incontinence–results from treatment seeking women in 14 European countries.
Urology 1997, 50:100-107.
Dumville JC, Manca A, Kitchener HC, Smith AR, Nelson L, Torgerson DJ, COLPO Study Group: Cost-effectiveness analysis of open colposuspension versus laparoscopic colposuspension in the treatment of urodynamic stress incontinence.
Tincello D, Sculpher M, Tunn R, Quail D, van der Vaart H, Falconer C, Manning M, Timlin L: Patient characteristics impacting health state index scores, measured by the EQ-5D of females with stress urinary incontinence symptoms.
International Urogynecology Journal 2008, 19:1049-1054.
EuroQol group Website.