A growing body of literature indicates that patients using CAM experience shifts in well-being that extend beyond resolution of the symptoms for which they sought relief. These shifts include improvements in overall well-being, energy, clarity of thought, emotional, social, and physical functioning, control/empowerment, connection, and increased focus on one's inner life and spirituality [44–46]. However, the lack of appropriate tools to measure these emergent outcomes in a valid, reliable, comprehensive, and patient-centric way has limited their assessment. The two fully patient-centered instruments, the MyMOP and MyCAW [47, 48], have patients identify their most important problems through open-ended questions and rate their severity at multiple time points. These instruments, however, do not permit change scores to be interpreted across studies, nor do they capture unanticipated changes. Our team undertook a research program to develop and evaluate a patient-centered outcome measure to assess the impacts of treatments within CAM systems of medicine. This outcome measure was developed with a methodological approach that began not from existing constructs but by listening to the experiences of individuals who had undergone CAM treatment. Doing so required developing a novel methodology to capture these sensitive shifts.
Our evocative interview and card sort process was innovative in several ways. Once participants selected items, they were asked how each item could be changed to better fit their experiences. In this way the card sort was flexible and responsive to participants' suggestions, as they were encouraged to discuss, change, and generate new items. The process of evocative interviewing appeared to be therapeutic, in that it gave participants an opportunity to talk about their personal experiences and understand them more deeply. Given time, a supportive environment, and the presence of research staff trained to be empathetic witnesses, many participants took the opportunity to tell their stories and to bring previously buried experiences to the surface. Many expressed gratitude after the interview for the opportunity to tell their stories and reflect on changes that had occurred in their lives. Some indicated that this was the first time they had told these stories and that in doing so they gained insight into their own lives. Interviewers were commonly moved by the experience of witnessing the evocative interview process.
In the process of developing the measure, we struggled to find an appropriate way of presenting the word choices that informants had selected or adapted. In developing these word pairs, it was striking that domains that were highly endorsed as relevant in the positive state were not as highly endorsed when framed in the negative, and vice versa (see Table 4). For example, in the spiritual domains, 21 participants endorsed the positive item "I am hopeful", whereas only 7 participants endorsed "I had no hope" as a negative item. This may be due to linguistic features of our wordings or to experiential shifts of the participants, such that they recognized the issue of hope only as it reappeared, rather than as something that had been absent. Others have reported that negative items are predictive of different types of outcomes than positive items. This is an area that we explore in our quantitative validation, and that would be appropriate for future qualitative and quantitative research with the instrument.
With regard to diversity in participant responses, it is noteworthy that in Phases Ia and Ib, where metaphorical language and full narratives were analyzed, the researchers identified some minor gender, race/ethnicity, and CAM therapy differences in the types of events and situations that participants reported as the source of their difficulties. However, during our crosscheck of the card sort responses by participant category, few differences were identified in item endorsement frequencies. Thus, more general descriptors of the shifts in well-being appear to be more broadly understood and potentially generalizable, regardless of the source of an individual's difficulties.
Our limited sample size restricted our evaluation of diversity associated with different types of CAM therapies to two broad classes: therapies provided by practitioners (e.g. massage, TCM) and self-practices (e.g. yoga, meditation). We were careful to include in our final list outcomes that were rated as relevant for both classes. In the psychometric evaluation, however, we hope to begin to explore whether different patterns of outcomes are associated with different CAM therapies. It seems probable that there will be differences between whole system interventions such as TCM and Ayurveda (which target many symptoms and conditions simultaneously) and interventions that target only specific symptoms (e.g. massage for low back pain), such as those reported by Hsu et al.
The content of our list of items to undergo further testing compares favorably with that presented in recent papers summarizing the qualitative research in CAM [45, 46], and responds to the recent call for the development of such an instrument. As we listened to the voices of our participants, and then developed the more streamlined language of the items, it became increasingly clear that the item set, or a subset of the items, may also be appropriate for use in other settings of complex interventions, such as cardiac rehabilitation, wellness and other lifestyle interventions, mental health interventions, or life coaching settings. It is our hope that this instrument might, as a whole or in part, move into the mainstream of patient-reported outcomes.
Our identification of the need for a retrospective assessment approach is consistent with the results in other fields [52–55]. This measurement problem has been shown to occur in some areas of education and program evaluation, where participants may indicate greater confidence in their knowledge of a topic before an educational session than after. This may be because their notion of how much there is to know has changed, or because their assessment of what they do know has changed, as a result of the session. In these settings, it appears likely that the estimates of change are more accurate if the respondent rates both time points after the session, instead of having one rating before and the other after the intervention.
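The arithmetic behind this measurement problem can be illustrated with a minimal sketch. The ratings below are invented purely for illustration and are not data from this study or from the cited work; the function name `change_scores` and the "then" label for the retrospective pretest rating are likewise assumptions made for the example. The sketch contrasts the conventional change score (post minus pre) with the retrospective change score (post minus "then", where both ratings are made after the session).

```python
from statistics import mean

def change_scores(pre, then, post):
    """Return (conventional, retrospective) mean change for paired ratings.

    pre  : ratings collected before the session
    then : retrospective ratings of the pre-session state, collected afterwards
    post : ratings of the current state, collected after the session
    """
    conventional = mean([b - a for a, b in zip(pre, post)])
    retrospective = mean([b - a for a, b in zip(then, post)])
    return conventional, retrospective

# Hypothetical self-rated knowledge (0-10) for five learners; illustrative only.
pre = [7, 8, 6, 7, 8]    # confident before the session
then = [3, 4, 2, 4, 3]   # looking back, they judge they knew much less
post = [6, 7, 6, 7, 7]   # current rating after the session

conventional, retrospective = change_scores(pre, then, post)
print(f"conventional change:  {conventional:+.1f}")
print(f"retrospective change: {retrospective:+.1f}")
```

With these invented ratings, the conventional score suggests a slight decline even though every respondent, judging both time points with the same post-session frame of reference, reports a gain. Which estimate is closer to the truth in any real setting remains an empirical question.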
In relation to CAM research, meditation researchers have expressed concern that scales designed to evaluate participants' changing experiences of meditative states may not provide accurate change scores when administered pre and post. As individuals with no experience become novices and begin a meditative practice, the meanings of the words in the scales may change for them. And as novices become experts, their abilities to discern more subtle states are enhanced, leading to shifting response frames.
The biases usually associated with standard pre and post measurement of change differ from those associated with retrospective pretest measurement, and it is rare to find settings where the two approaches to assessing patients' subjective states can be compared against a biomarker that serves as a gold standard. However, in 2007, Nieuwkerk et al. identified such an opportunity in their study of fatigue among patients with HIV infection. In a longitudinal assessment of changes in fatigue levels and quality of life, they found that the retrospective pretest approach to measuring change in fatigue and well-being was more highly correlated with changing viral loads than were contemporaneous assessments. The authors attribute this to a changing internal baseline, such that patients who are worsening may not have a good idea of the full range of possibilities at the initial time points. This has been seen in relation to worsening in other conditions as well. CAM interventions appear from our data to be associated with changing internal baselines in relation to improvement. Thus, for CAM interventions, we view the retrospective pretest as a viable option in the assessment of subjective shifts in well-being, and this approach is further evaluated [manuscript in development].
Study limitations to this point
Although our base sample of 119 individuals providing qualitative interviews for secondary analysis was substantial, our study sample for subsequent item development and testing has been relatively small. Phase II, including 28 participants in cognitive interviews and more than 600 participants completing the draft instrument [manuscript in development], provides greater diversity in gender, race/ethnicity, and education. Phase II also provides greater diversity in the types of conditions being addressed, as well as the types of CAM therapies utilized, and will permit us to evaluate the range of responses per item, full use of the scale, and other features of response. Items at this point in the development process were chosen to cover the breadth of experience reported by our informants. The psychometric assessment will provide guidance as to the level of inter-correlation among the items and any scaling embedded within the instrument. Further, the psychometric assessment will allow construct validity to be assessed for items such as depression and sleep against validated scales.