Ankylosing spondylitis (AS) is an autoimmune disorder characterized by inflammation of the spine and large joints. Fatigue is a common symptom that many AS patients find significantly impacts their health-related quality of life. The Worst Fatigue – Numeric Rating Scale (WF-NRS) assesses the severity of this symptom during the previous 24-hour period. The objective of this study was to perform qualitative research to support the development and content validity of the WF-NRS.
Patients with AS were recruited from clinical sites in the U.S. for a qualitative study. Concept elicitation interviews were conducted first to gain an understanding of the patients’ experience with AS and fatigue. Cognitive debriefing interviews were then undertaken to assess, from the patient’s perspective, the understandability, clarity, and appropriateness of the content of a measure of fatigue severity.
Thirteen patients with AS participated in concept elicitation interviews and cognitive debriefing of the Brief Fatigue Inventory (BFI) fatigue severity subscale. The WF-NRS was developed from the worst fatigue item of the BFI because patients generally reported that the item was understandable and covered an important concept. The completion instructions were modified, but the response scale was retained because it was familiar and readily completed, and the recall period was judged appropriate.
Patient responses resulted in the development of and supported the content validity of the WF-NRS. Further quantitative evaluation of the WF-NRS is warranted in order to assess its psychometric properties and confirm its usefulness as a clinical trial tool.
Keywords: Ankylosing spondylitis; Concept elicitation; Content validity; Fatigue; Patient reported outcomes (PRO); Worst Fatigue Numeric Rating Scale (WF-NRS)
Ankylosing spondylitis (AS) is an autoimmune disorder characterized by inflammation of the spine and large joints. Prevalence rates for AS have been estimated to be 0.2–1.2%, and historically males have been found to be affected more often than females [2,3]. However, this historical pattern may have contributed to gender-biased research and under-diagnosis in women. It is a chronic condition of young adults that commonly develops in the third decade of life. Common symptoms include joint pain, fatigue, low-grade fever, loss of appetite, and eye inflammation. The burden of AS varies widely between individuals: some patients experience minor disabilities, while others may have severe deformities of the spine. Reduced physical function, withdrawal from activities (including employment), and impaired quality of life make up a significant portion of the disease burden.
Although there is variability in the burden of AS, many patients experience fatigue as part of their disease, with severe fatigue reported in 50% of patients with AS. Furthermore, fatigue has been found to be one of the key symptoms that can significantly impact health-related quality of life, and patients with AS have reported that fatigue has affected their social life, relationships, and work. For this reason, researchers have called for both a comprehensive assessment of fatigue as part of routine clinical practice in patients with AS and the development of treatment programs directed at alleviating the fatigue that often accompanies AS.
The primary objectives of the qualitative research presented here are to describe patients’ experience of fatigue in relation to their AS and to evaluate the content validity of a measure of fatigue severity for these patients. Concept elicitation interviews were undertaken to gain insights into how patients with AS perceive the symptoms of their condition, especially fatigue, the importance of those symptoms, and the impacts of AS on their lives. Key insights gleaned from these interviews demonstrated the importance of fatigue in AS and the need for clinical research to capture symptom severity as a means of demonstrating treatment benefit. Subsequently this research led to the development of the Worst Fatigue Numeric Rating Scale (WF-NRS), a single-item patient reported outcome (PRO) measure that assesses the level of worst fatigue severity experienced by the patient during the previous 24 hours. This manuscript presents results from qualitative analyses of the interview data to support conclusions regarding the development and usefulness of the WF-NRS for the assessment of fatigue severity among patients with AS in a clinical trial setting.
This was a cross-sectional, qualitative study involving one-on-one, in-person interviews. Participants were identified and recruited through three clinical sites in the U.S., with the sample size determined by saturation. Specifically, information “saturation” refers to the point in the interviewing process when interviews are no longer yielding new information and the researcher can feel confident that all important concepts related to the research question have been identified.
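The notion of saturation described above can be illustrated with a minimal sketch. It assumes that each coded transcript is reduced to a set of code labels and that saturation is judged by the last interview that contributed a previously unseen code; the function names and the example codes are hypothetical and are not study data.

```python
from typing import List, Optional, Sequence, Set


def new_codes_per_interview(coded: Sequence[Set[str]]) -> List[Set[str]]:
    """For each interview, in order, return the codes not seen in any earlier interview."""
    seen: Set[str] = set()
    new_by_interview: List[Set[str]] = []
    for codes in coded:
        fresh = set(codes) - seen
        new_by_interview.append(fresh)
        seen |= fresh
    return new_by_interview


def last_interview_with_new_codes(coded: Sequence[Set[str]]) -> Optional[int]:
    """Return the 1-based index of the last interview that added a new code, or None."""
    last = None
    for index, fresh in enumerate(new_codes_per_interview(coded), start=1):
        if fresh:
            last = index
    return last


# Hypothetical codes, for illustration only (not study data).
interviews = [
    {"pain", "stiffness"},
    {"pain", "fatigue"},
    {"fatigue", "sleep difficulty"},
    {"pain", "fatigue"},
    {"stiffness", "sleep difficulty"},
]
print(last_interview_with_new_codes(interviews))  # 3
```

In this illustration no new code appears after the third interview, so the later interviews would be read as confirming saturation and not as extending the codebook.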
Eligibility criteria and recruitment
Ambulatory patients with AS aged eighteen years or older were eligible for this study. Original inclusion criteria required an established AS diagnosis (modified New York Criteria) for at least twelve weeks and a Bath Ankylosing Spondylitis Disease Activity Index (BASDAI) score ≥ 4 and back pain score ≥ 4 on a 10-cm visual analogue scale (VAS). The back pain score was based upon the average of overall spinal pain and nocturnal spinal pain VAS scores for the past week. The inclusion criteria were modified after recruiting sites reported that some of the information necessary to determine eligibility under the original criteria was not readily available as it was not routinely collected or assessed in clinical practice. Specifically, the revised criteria no longer required a diagnosis based on the modified New York Criteria or a BASDAI score of ≥ 4; however, clinical confirmation of diagnosis via patient medical records was retained as a requirement. The revised criteria were adopted after recruitment of 3 participants (1 male) in order to enhance recruitment and pragmatically reflect clinical routines. Patients with complete ankylosis of the spine were not eligible.
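As an illustration only, the original inclusion criteria described above could be expressed as a screening check. The sketch below assumes that the BASDAI and both spinal pain VAS scores are recorded on a 0 to 10 scale; the `Candidate` record, its field names, and the example values are hypothetical and do not come from the study.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical screening record; field names are assumptions."""
    age_years: int
    weeks_since_diagnosis: float
    meets_modified_ny_criteria: bool
    basdai: float                 # assumed 0-10
    overall_spinal_pain: float    # 10-cm VAS, past week
    nocturnal_spinal_pain: float  # 10-cm VAS, past week
    complete_spinal_ankylosis: bool
    ambulatory: bool = True


def back_pain_score(c: Candidate) -> float:
    """Average of overall and nocturnal spinal pain VAS scores for the past week."""
    return (c.overall_spinal_pain + c.nocturnal_spinal_pain) / 2


def meets_original_criteria(c: Candidate) -> bool:
    """Check the original (pre-revision) inclusion and exclusion criteria."""
    return (
        c.ambulatory
        and c.age_years >= 18
        and c.meets_modified_ny_criteria
        and c.weeks_since_diagnosis >= 12
        and c.basdai >= 4
        and back_pain_score(c) >= 4
        and not c.complete_spinal_ankylosis
    )


# Hypothetical candidates, for illustration only.
eligible = Candidate(45, 52, True, 5.0, 5.0, 7.0, False)
low_pain = Candidate(45, 52, True, 5.0, 3.0, 4.0, False)
print(back_pain_score(eligible), meets_original_criteria(eligible))
print(back_pain_score(low_pain), meets_original_criteria(low_pain))
```

Under the revised criteria, the modified New York Criteria and BASDAI conditions would be dropped and replaced by confirmation of the diagnosis from the medical record.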
Site staff reviewed medical charts and/or electronic patient databases in order to identify eligible participants and subsequently contacted these AS patients for further screening and possible enrollment. Institutional Review Board (IRB) approval was obtained from the Independent Investigational Review Board Inc. (Plantation, Florida, USA) prior to recruitment of participants.
Before beginning the interview, the participant reviewed and signed an IRB approved consent form. Approximately one-hour long interviews were then conducted by a trained researcher who followed a semi-structured interview guide (Table 1). The first part of the interview was designed to elicit information from patients about the symptoms and impacts of AS. Participants were asked open-ended questions about their experiences with AS (e.g., “What is it like to have AS?”) and the interviewer followed up with probing questions as necessary. Then the Brief Fatigue Inventory (BFI) was administered to patients, after which cognitive debriefing interviews were conducted to assess the clarity, relevance, and comprehensiveness of that instrument, specifically the fatigue severity subscale. The BFI is a 9-item measure assessing fatigue severity and interference in daily life. One item from the severity subscale asks the respondent to “Please rate your fatigue (weariness, tiredness) by circling the one number that best describes your WORST level of fatigue during the past 24 hours”. Responses are on an 11-point numeric rating scale with anchors at 0 (No fatigue) and 10 (As bad as you can imagine).
Table 1. Example of questions in the semi-structured discussion guide
All interview sessions were digitally audio-recorded with the participant’s permission and subsequently transcribed for analysis purposes.
A thematic analytic approach was used to summarize and evaluate the data from the interviews. Coding was performed using MaxQDA 10, a text analysis software program designed to help organize qualitative data and allow for a thorough exploration of themes and concepts emerging from the data. After a process of independent coding of two transcripts by two team members, the team discussed and reached consensus on a codebook, which was then used by one researcher to code all transcripts.
Analysis of the cognitive debriefing portion of the interviews was on a question-by-question basis. The analysis focused on identifying any issues with the instrument with respect to clarity, interpretation, relevance, and comprehensiveness of the items, response options, instructions, and recall period. Descriptive statistics (e.g., mean, frequency) were used to characterize the demographics of the sample of participants.
A total of thirteen participants completed interviews. After the first three interviews, revisions to the inclusion and exclusion criteria (noted above) were made to enhance recruitment and the next ten participants were recruited under the revised criteria. Table 2 shows the sociodemographic characteristics of the sample. There were eight females and five males, and the mean (SD) age was 47 (13.4) years. The majority (N = 10) of participants rated their current level of AS as “moderate” and the remaining 3 participants rated their current AS level as “severe”. For the last ten participants (i.e., those recruited after the change in eligibility criteria and for whom data are available), the mean spinal pain NRS scores were: 6.0 (SD 1.2) for overall pain and 6.2 (SD 1.8) for nocturnal pain. For the entire sample, the average time since AS diagnosis was 13 years, with a minimum of 1 year and a maximum of 35 years. There were no new concepts reported or codes applied after the 7th interview transcript, which suggests that saturation was attained by the eighth interview. As a means to gather a more in-depth understanding of the concept(s) being studied, we continued to interview 5 participants beyond the saturation point.
Table 2. Sociodemographic and Clinical Characteristics
During concept elicitation interviews, participants reported being significantly affected by their AS with symptoms such as pain, stiffness, and fatigue/tiredness/less energy. Common impacts reported by participants associated with their condition included sleep difficulties, physical deformity, decreased mobility and activities, change in personality, frustration, realization of mortality, and uncertainty about the future. Exemplary quotes from the concept elicitation interviews are presented in Table 3.
Table 3. Descriptions of fatigue
Description of fatigue
All thirteen participants had experienced fatigue associated with AS, although one participant was less affected at the time of the interview and another reported not currently experiencing fatigue. Three of the participants spontaneously used the term ‘fatigue’. Other related terms used spontaneously, or words that were supplied in initial responses to the interviewer’s use of the term fatigue, included: tired, exhausted, feeling worn out or wiped out, (lack of) energy, run down, hard to concentrate or focus, slow motion, needing to rest, and falling asleep. Some participants reported feeling that although they did things at the same pace as before the onset of AS, they had less energy and consequently ran out of energy sooner. Others described this concept in terms of doing things more slowly, taking longer than usual to do things, or of being in slow motion.
Clarity and interpretation of the construct of fatigue
All participants understood the term ‘fatigue’. Participants reported that ‘fatigue’ and ‘tired’ meant the same thing to them. One participant noted that fatigue meant being ‘really tired’ while others said they felt ‘exhausted’. Participants interpreted fatigue in terms of a lack of energy, running out of energy or completing tasks but without ‘zest’. Some participants’ comments suggested a mental component to the concept of fatigue and, in particular, noted being ‘too tired to think’.
Frequency and duration
Participants reported that fatigue varied over the course of the day, with several individuals reporting extreme fatigue at the end of the day. Mornings were reported to be somewhat better with respect to fatigue, with this improvement sometimes attributed to having rested during the night. Some patients, however, woke up feeling tired. It might take them considerable time to ‘get moving’ in the morning, but once moving, their fatigue and pain levels seemed to decrease. For some participants, fatigue severity increased over the course of the week, and they used the weekend to rest and ‘charge their battery’.
Fatigue and pain
Pain was another key symptom of AS, and all thirteen participants reported experiencing pain. There appeared to be a complex relationship between fatigue and pain. For instance, when asked about fatigue, participants would often respond by talking about their pain. However, it was not possible to identify either a consensus on the participants’ perceived association between fatigue and pain or a causal relationship between the two. Some participants reported being ‘tired from the pain’, suggesting that the experience of pain was physically and/or emotionally tiring. Conversely, others reported that when they were tired, more pain was experienced. Eight participants reported experiencing disturbed sleep because of pain during the night, and the tiredness in the morning was seen as a result of this pain.
Other factors associated with fatigue
Some participants reported feeling depressed. The experiences of both pain and fatigue could be related to, or exacerbated by, feelings of depression. Other sleep disturbances (i.e., those not related to pain) were also reported as contributing to waking up tired. Participants reported that resting and taking naps could help to reduce fatigue.
Overall, the cognitive debriefing interviews revealed that participants with AS interpreted the items of the BFI in a consistent manner and, based on the judgment of the study team, as intended. Table 4 presents exemplary quotes from the study participants during the cognitive debriefing interviews.
Table 4. Content Validity of the WF-NRS
In general, participants found the worst fatigue item of the BFI to be clear and easy to understand. They understood that the item was asking them to rate the severity of their fatigue at its worst level over the previous 24-hour period. However, two issues related to comprehension were reported during the interviews. First, one participant noted some confusion with the wording of the item since both ‘best’ and ‘worst’ appeared together in the item (i.e., ‘choose the one number that best describes your WORST level of fatigue’). Second, one participant was unfamiliar with the term “weariness”, which appears in parentheses next to the term “fatigue” to further define it. Furthermore, no participant spontaneously used the term ‘weariness’ or feeling ‘weary’ during the concept elicitation part of the interviews. Three participants used the term only after seeing it in the BFI. The remaining participants in the sample either did not use the term at all or referred to a symptom that ‘wears on you’, or ‘wears me out’, or ‘wears me down’.
The response scale used in the worst fatigue item was generally well understood. Some participants found the eleven-point numeric rating scale to be similar to other scales they had used previously. There was one participant who initially answered with a ‘2’, but when the interviewer asked how she had decided upon her answer, the respondent realized she had interpreted the direction of the scale incorrectly and corrected her response to an ‘8’. Aside from that oversight, no other difficulties with the response scale were reported, and the other participants found the scale to be clear and appropriate.
The recall period of the worst fatigue item was considered appropriate, and participants appeared to have no difficulty recalling their experiences with fatigue over the past 24 hours. While most participants used the recall period accurately in answering the item, there was evidence that one patient may have used a longer recall period in responding: when asked to define the item, the participant described fatigue experienced a few days prior. In addition, some participants suggested increasing the recall period to three days or a week because they were concerned that a 24-hour recall period might not provide an accurate picture of their fatigue, especially given the variability of this symptom.
Possible mediating factors
During the cognitive debriefing interviews, the replies of some of the participants indicated that they had taken other factors into account when selecting the response that reflected their worst fatigue severity. Some participants made an initial rating, as described above, and then adjusted the reported score if they could identify a specific reason for fatigue severity. For instance, participants reported adjusting a rating downward if there were circumstances that might explain their fatigue, such as if they had been particularly active. Other reasons included the effort required to do tasks, rest and/or sleep, lack of energy, irritability, cognitive impairment, and pain.
Based on the participants’ comments during the interviews, several recommendations were proposed for clarifying the wording of the worst fatigue item of the BFI and for its administration in clinical trials. These recommendations led to the development of the WF-NRS: “Please rate your fatigue (feeling tired or worn out) by circling the one number that best describes your WORST level of fatigue during the past 24 hours”. Responses are on an 11-point numeric rating scale with anchors at 0 (No fatigue) and 10 (As bad as you can imagine).
First, in order to enhance understandability, the word ‘best’ was removed from the instruction component of the item. Second, the term ‘weariness’, which was used to elaborate on the term ‘fatigue’, was replaced by the term ‘worn out’, which seemed to resonate with a larger number of participants.
Third, in order to better reflect the variability of fatigue severity found in the AS population, it was recommended that the fatigue severity measure be administered as a daily diary in the clinical trial setting. Finally, a last recommendation was to include in the AS clinical trials items or PRO tools that would capture potential covariates of fatigue, such as sleep disturbance, physical activity level, and pain.
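To make the daily diary recommendation concrete, the following sketch shows one way daily WF-NRS responses might be validated and summarized. The source does not specify a scoring or aggregation rule, so the weekly summary, the `min_days` completeness threshold, and all names and example values here are assumptions made purely for illustration.

```python
from datetime import date, timedelta
from statistics import mean
from typing import Dict, Optional


def validate_wf_nrs(score: int) -> int:
    """Return the score if it is a whole number from 0 to 10, otherwise raise ValueError."""
    if isinstance(score, bool) or not isinstance(score, int):
        raise ValueError(f"WF-NRS score must be an integer, got {score!r}")
    if not 0 <= score <= 10:
        raise ValueError(f"WF-NRS score must be between 0 and 10, got {score}")
    return score


def weekly_summary(
    diary: Dict[date, int], week_start: date, min_days: int = 4
) -> Optional[dict]:
    """Summarize one week of daily scores; return None if too few days were completed.

    The min_days threshold is an assumed completeness rule, not one from the study.
    """
    days = [week_start + timedelta(days=offset) for offset in range(7)]
    scores = [validate_wf_nrs(diary[day]) for day in days if day in diary]
    if len(scores) < min_days:
        return None
    return {
        "n_days": len(scores),
        "mean": mean(scores),
        "min": min(scores),
        "max": max(scores),
    }


# Hypothetical diary entries, for illustration only (two days missed).
week_start = date(2013, 1, 7)
diary = {
    week_start: 6,
    week_start + timedelta(days=1): 8,
    week_start + timedelta(days=2): 5,
    week_start + timedelta(days=4): 7,
    week_start + timedelta(days=6): 9,
}
summary = weekly_summary(diary, week_start)
print(summary)
```

Reporting the minimum and maximum alongside the mean keeps the day-to-day variability visible, which is the motivation given for daily administration.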
In this study, we intended to capture participants’ experiences of having AS. Many were young adults (the youngest was 26 years old) or had lived very active lives (the oldest was a 76-year-old competitive triathlete) prior to disease onset. Fatigue and pain were often experienced at extreme levels, and the symptoms affected all areas of participants’ lives: mobility, work, career aspirations, social activities, relationships, and emotional well-being. Although participants typically mentioned chronic, and sometimes excruciating, pain when first asked about their experience of AS, when probed about fatigue, they responded with ‘Oh yes, very much so’ and ‘Oh, I’m always exhausted’. One participant suggested that fatigue may not be reported spontaneously because it becomes ‘normalized’, either by patients themselves (i.e., feeling fatigued becomes customary) or by others (i.e., ‘everybody gets tired’). Fatigue severity was reported to be highly variable from day-to-day as well as throughout the day, with the evening often cited as the time of worst fatigue severity. Participants mentioned a number of factors influencing fatigue severity, including pain, sleep disturbances (that may be related to pain), exercise or activity, and feeling depressed.
Results from the concept elicitation interviews also indicated that all participants clearly understood the term ‘fatigue’ and some of them used it spontaneously when describing their condition. It is possible that individuals with AS quickly learn the term because it is used by doctors and other patients when discussing the condition. While participants also used other terms and phrases to describe fatigue, such as tiredness, feeling worn out, exhausted, needing to rest, slowed down, etc., the term ‘fatigue’ resonated with participants and was considered clear and appropriate for describing how AS makes them feel.
In summary, the concept elicitation interviews confirmed that fatigue was a key symptom of AS. All participants reported experiencing fatigue at some point during their illness, with all but one participant currently experiencing fatigue. Along with pain, fatigue was considered a bothersome symptom of AS, with one participant reporting that fatigue was the most bothersome symptom.
Although there have been a large number of scales developed with the intent of measuring fatigue, the WF-NRS has a number of characteristics that distinguish it from the alternatives. First, the WF-NRS was developed based on qualitative input (concept elicitation interviews and cognitive debriefing of the BFI) from an AS population, ensuring the assessment of an important and relevant aspect of fatigue severity. This type of patient input is a key aspect for establishing content validity as the fatigue experience may differ across patient populations and scales developed for one condition may not be appropriate for use in another. Many of the existing fatigue scales were developed in patient populations other than AS, such as those with cancer [10,14,15], chronic fatigue syndrome [16-18], multiple sclerosis [19-21], rheumatoid arthritis, or general medical patients [23,24]. To the best of our knowledge, content validity in AS has not been established for most of these instruments. Second, the WF-NRS is a single-item, unidimensional instrument inquiring about the worst level of fatigue in a 24-hour period as a measure of fatigue severity, which the interviews revealed to be relevant and understandable in patients with AS. Having respondents answer queries based on their worst experience is consistent with recommendations from PRO development guidelines for using appropriate methods and techniques to enhance the validity and reliability of self-reported data. In contrast, the other measures listed above are all multiple-item tools that assess fatigue as either a unidimensional [10,14,16,19] or a multidimensional [15,17,18,20-24] construct. The brevity of the WF-NRS may make it particularly suitable for use in clinical trials with a patient population that suffers from fatigue, especially if the trial subjects will be expected to complete a battery of other assessment instruments.
Finally, the 24-hour recall period appears to be short enough that patients will not have difficulties in accurately completing the instrument. The study participants suggested that fatigue severity was periodic and therefore other recall periods, such as “at the present time”, may not capture clinical peaks. Additionally, capturing the worst level of fatigue in a short timeframe does not require the patient to average his or her symptom severity over time. Patient responses that rely on memory over long periods of time, or that require the responder to average their experience, may introduce recall bias. Moreover, incorporating the WF-NRS in a daily diary may help ensure that day-to-day variability in worst fatigue is readily captured and may address participants’ concerns that just one assessment may not accurately characterize their fatigue severity. The addition of a clinical outcome assessment tool to capture cognitive impairment should also be considered, as participant comments suggest a mental component of fatigue presenting as a consequence of AS.
A limitation of the study may be the small sample size, with participants’ input obtained from only three clinical sites, although these were located in separate regions of the country. However, sampling in qualitative research for the development of a patient-reported outcome measure is not intended to obtain a sample representative of the epidemiologic profile of the patient population, but rather to ensure diversity in patient and disease characteristics; the distribution of variables indicated that a sample diverse in sociodemographic characteristics was obtained. Also, the recruited participants did not include AS patients with a mild level of severity. While the results of this initial qualitative study of fatigue in AS patients were promising, further quantitative research is indicated. In particular, additional research with the WF-NRS is required in order to assess its test-retest reliability; to examine its concurrent, discriminative, and construct validity; and to determine its sensitivity to change. Additionally, the translatability of the WF-NRS into other languages and cultures would also need to be examined before using the instrument in global clinical trials. Nonetheless, the interviews completed with patients with AS in the present study led to the development of the WF-NRS and ensured that the content was relevant to the AS population and that the clarity, interpretation (as intended per judgement of the study team), response scale, and recall period were all appropriate.
The reported prevalence and bothersomeness of fatigue support the importance of assessing the concept of worst fatigue severity in patients with AS to capture treatment benefit(s) in a trial setting. Results of concept elicitation interviews and cognitive debriefing of the BFI fatigue severity subscale support the development and content validity of the Worst Fatigue – Numeric Rating Scale to assess that aspect of fatigue. Additional research is warranted to further evaluate the psychometric measurement properties of the instrument, including construct validity, reliability, and sensitivity to change, and, thereby, support its use in clinical trials.
AS: Ankylosing spondylitis; BASDAI: Bath ankylosing spondylitis disease activity index; BFI: Brief fatigue inventory; NRS: Numeric rating scale; PRO: Patient reported outcomes; VAS: Visual analogue scale; WF-NRS: Worst fatigue - numeric rating scale.
ANN and EEH are full-time employees of, and hold stock shares and/or stock options with, Eli Lilly and Company. Eli Lilly and Company is investigating new pharmaceutical therapies for the treatment of ankylosing spondylitis.
All authors were responsible for study design, data analysis, data interpretation, and manuscript writing. All authors read and approved the final manuscript.
HealthMetrics Outcomes Research, LLC provided medical writing services on this project.
Revicki DA, Luo MP, Wordsworth P, Wong RL, Chen N, Davis JC Jr: Adalimumab reduces pain, fatigue, and stiffness in patients with ankylosing spondylitis: results from the adalimumab trial evaluating long-term safety and efficacy for ankylosing spondylitis (ATLAS).
Mult Scler 1999, 5:10-16.
Nurs Res 1993, 42:93-99.
Guidance for Industry Patient-Reported Outcome Measures: Use in Medical Product Development to Support Labeling Claims. 2009.
Available from: http://www.fda.gov/downloads/Drugs/Guidances/UCM193282.pdf [Accessed on 01 November 2013]