Reid, J. (Winter 2011) The sixth and final article in our series to take the mystery out of critical appraisal looks at...
JOURNAL CLUB 6
Journal club 6: single subject designs
Jennifer Reid’s series aims to help you access the speech and language therapy literature, assess its credibility and decide how to act on your findings. Each instalment takes the mystery out of critically appraising a different type of journal article. Here, she looks at single subject designs.
Have you ever felt uneasy about speech and language therapy clients being lumped together for group intervention studies? Aren’t our client groups simply too heterogeneous to expect that one intervention will be effective for them all? How do you know if a complex intervention will be right for your clients if it has been tested with participants and clinicians whose characteristics are described only in very broad terms? After all, speech and language therapy interventions need to be moulded to meet a client’s individual needs and circumstances to be successful, don’t they? It feels against the grain to allocate clients randomly to different intervention groups so that all their individual differences are washed out!

Perhaps such misgivings about group intervention methods are one of the reasons that single case designs remain popular in speech and language therapy research, despite their low ranking in the ‘evidence hierarchy’. We are not alone in this – for example, psychologists working in the field of acquired brain injury also continue to employ single subject designs, valuing them in particular for their flexibility and sensitivity to individual differences. Thanks to this, there is a practical and validated appraisal tool we can use for these sorts of studies – the Single-Case Experimental Design (SCED) Scale (Tate et al., 2008).

Note the word ‘experimental’ in the name. We are not talking here about case studies in which clinicians simply describe clients and their care pathway. In order to contribute to the evidence base, single subject studies need to be of good quality and that means robust methods, pre-planned interventions and accurate, reliable measurement. Not unlike group studies then… Do not be misled by the name, though, as ‘single subject design’ does NOT mean that the studies necessarily involve only one participant. You still need to work out which method has been used – group or single subject design – to choose the right appraisal framework for an intervention study.
SPEECH & LANGUAGE THERAPY IN PRACTICE
So when a study presents results from a number of participants, how do I know if this is a group study or a single subject one?

If participants are allocated to groups receiving different intervention regimes, and group results, rather than individual, are reported, then you are most likely to be dealing with a group intervention study. Your appraisal tool of choice will be one for randomised controlled trials (RCTs) and other group intervention studies such as the one I presented in Journal club 4 (Speech & Language Therapy in Practice, Summer 2011).

Single case design, single subject design, n-of-1 trial – if any of these terms are used in the title or abstract, then you are probably dealing with a single subject design with several participants. Single subject experimental studies involve repeated measures from individual participant(s). The wording you will often find is that ‘participants served as their own control’, which means that the study used repeated measures over time of the individual’s performance in an area not being treated as the comparison for measures in the treated area.

The quality essentials of single subject experimental designs are:
• Performance is measured repeatedly to ensure that any intervention effects are sustained over time.
• Repeated measures designs show how performance varies over time in a way that is usually not possible with group designs reporting group averages.

Sometimes it is hard to tell from the title or abstract whether a single subject or group design has been used; indeed, sometimes researchers report both group results and repeated measures. In any case, the SCED Scale will allow you to appraise how well a study using repeated measures was conducted. Even if you are reading a simple (anecdotal) case report, such as those often seen in Speech & Language Therapy in Practice or the Bulletin of the Royal College of Speech & Language Therapists, the domains will help you think about the reasons why the author may be barking up the wrong tree in their conclusions, especially if causal relationships are being implied.

READ THIS SERIES IF YOU WANT TO
• BE MORE EVIDENCE-BASED IN YOUR PRACTICE
• FEEL MOTIVATED TO READ JOURNAL ARTICLES
• INFLUENCE DEVELOPMENT OF YOUR SERVICE
Appraisal

The SCED Scale is available as a single sheet pdf file to download from the PsycBITE website (http://www.psycbite.com/docs/The_SCED_Scale.pdf). It has 11 domains for appraisal, which I have converted into questions and explained. You may, however, wish to start your appraisal with the general questions we usually ask about a study, including whether the question being asked is one that is important for your practice or for your service, and whether a single subject design was a sensible choice of method to answer that question. You should also consider whether the intervention is described in enough detail for you to implement it yourself.
Question 1: Was the participant’s clinical history adequately described?
One of the main advantages of a single subject design study is its flexibility; it can provide a lot of scope for individualisation of the intervention. Consequently, you may see a more direct application to your own context – if there is enough information on the participant(s) to make a reasoned judgement on how similar they are to one or more of your own clients. The SCED Scale suggests age, sex, aetiology and severity must be reported but you may want to know about other issues, for example, response to any previous speech and language therapy intervention. Here is an extract from a helpful description of a participant in a study of an intensive group intervention for young adults with a stammer:

“TM was a male, mono-lingual English speaker of African ethnic background, aged 18;0 at the beginning of the study. He had no history of identified speech, language, communication or other difficulties. There was a family history of persistent stuttering, with both TM’s father and one brother stuttering into adulthood. TM was reported to have started stuttering at 11 years of age … Limited referral information identified that TM had been known to his local speech and language therapy service for ‘‘several years’’ and had periodically received both individual and group therapy since the age of 13. He had not attended therapy in the 12 months prior to the start of the study” (Fry et al., 2009, p.13).
Question 2: Does the study identify measures that can be used to evaluate intervention success?

The intervention goals need to be precise and properly defined so that they can be measured accurately and reliably. The first thing to do is to work out which aspect of the client’s functioning is being addressed in the intervention. Then check whether the intervention goals have been defined for the purposes of the intervention (an ‘operational definition’) in such a way that allows change to be observed. It might help here to apply your knowledge of so-called SMART targets (specific, measurable, attainable, relevant, time-framed). Are the measures likely to be reliable across different raters or contexts?

The intervention programme in the Fry et al. (2009) study included fluency management and cognitive behaviour therapy techniques to target both overt and covert stammering symptoms. Appropriate measures were selected to measure change in both these targeted areas, and therefore to assess intervention success. Measures of overt stammering included relatively objective, quantitative measures, which are precisely described and replicable (percentage stammered syllables and mean of the three longest stammered syllables from the first 500 syllables of 5-minute video recordings made by the participant at home while talking to a family member or friend). Covert symptoms are assessed via three externally validated self-report measures. So the study gets a tick for this question too.

Question 3: Is the design good enough to provide evidence of an intervention effect?
The SCED Scale specifies as the minimum for acceptability a 3-phase design, which should be either:
• A reversal or withdrawal design (A-B-A) in which baseline performance is established before treatment is given, performance is measured during treatment and then again after treatment has been withdrawn (or switched to another goal), or
• A multiple baselines method across different behaviours where only one behaviour is being treated at a time.

These designs introduce essential controls which allow you to see whether or not any changes in performance appear to be associated with the intervention. The association should show that change is specific to the intervention goals, and also linked in time with the phases of the study. Here is the description of the 4-phase design adopted by Millard et al. (2009, p.63) to enable any evidence of a treatment effect to be provided. The authors also display the phases and timeline in a helpful figure.

“This was a single subject design replicated across participants. There were four phases, each lasting 6 weeks. The length of the phases and the data collection points were arranged to coincide with the current delivery of the [therapy for children who stammer] program. The duration of the study (from the first week of phase A1 to the last week of phase A2) was matched to the time that families were on the waiting list for an assessment appointment, so that taking part in the study did not disadvantage those who did not receive therapy. This allowed us to establish a no treatment group. During each phase parents video recorded parent-child play sessions at home, once a week. Children who were allocated to the therapy condition completed all phases, while those who were allocated to the waiting list condition completed only the assessment phases (A1 and A2).”
Question 4: Was an adequate baseline established before intervention commenced?
We are in the realms of causality here, and, as discussed in Journal Club 5 on observational designs (Speech & Language Therapy in Practice, Autumn 2011), water-tight evidence of cause-and-effect can be elusive even when we are using reasonably robust research methods. A single subject design study never provides definitive evidence of intervention efficacy but if participants show large amounts of specific changes, the results from such a study can be pretty compelling, providing useful preliminary evidence of an intervention effect and therefore of approaches that look ‘promising’.
Baseline assessment provides information on a participant’s performance in the period before intervention begins. It is good practice to establish performance trends during the ‘baseline’ period, such as whether performance is stable, fluctuating, deteriorating or improving. If this trend reverses or changes dramatically during the intervention, this is evidence to support an intervention effect. Trends can only be established if the baseline phase is long enough to allow sampling of performance over time. Here is Christina Samuelsson’s (2011, pp.59-60) description of her baseline assessment from a multiple baseline study of prosodic intervention:

“The participating child was a boy of 4;6 years. Before the intervention was introduced, the child’s prosody was assessed repeatedly (3 times over a period of 9 weeks) using the previously described assessment tool … [which] covers production of prosody at word, phrase and discourse level. The baseline assessment was carried out every third week over the 9-week period. In addition, assessment was also made of other linguistic skills… the boy had problems with prosodic production … [which] were shown to be stable across baseline observations [presented with a bar chart display of these data].”

Question 5: Can a treatment response be distinguished from fluctuations resulting from other factors?

There are two issues at stake here. First, an adequate baseline will have captured information on the range of fluctuation present prior to intervention. Second, there needs to be sufficient sampling of performance during intervention to be able to differentiate changes that appear to go beyond the range of ‘normal’ fluctuations seen in the baseline. In the Millard et al. (2009) study, percentage words stuttered for each participant is calculated from a weekly video-recording throughout the 6-week baseline phase. This allowed calculation of a mean percentage words stuttered for the baseline phase and then of a range for percentage words stuttered beyond which a change in frequency of stammering cannot be attributed to chance alone. Graphs of the results from individual participants provide compelling visual evidence of a treatment effect in some. However, at least one of the ‘no treatment’ participants showed significant improvement during the second assessment phase so, as the authors point out, other factors must therefore have been operating for this child.

Question 6: Is data displayed to show variability?

Remember that one of the strengths of single subject designs is preservation of individual variation, so studies should employ good visual displays of variability data. Graphs or tables of raw, rather than converted, scores or data from pre-, during and post-intervention phases are usually recommended. So, in the data displays from the study being appraised, can you see at a glance how things vary over time? Millard et al.’s (2009) charts are a good example of appropriate visual displays of individual variation, of both within-phase fluctuations in individuals and differences in trends across individuals.

Question 7: Are measures used reliable?

Remember that reliability is about getting consistency of results. You will want to be reassured that there is good agreement between different assessors in how they measure or rate the performance in question, otherwise systematic differences between assessors could skew the results (observer bias). If assessment was done by a single individual, is there evidence of intra-rater reliability? Inter- or intra-rater reliability will be particularly important for any aspects of functioning that are ‘difficult’ to measure objectively, such as perceptual measures of voice quality. With regard to inter-rater reliability, Fry et al. (2009, p. 643) report that,

“… the transcriptions from one point in each phase of the study were randomly selected for blind analysis by a second rater. Percentage interrater agreement was based on point-by-point agreement for the presence of stuttering in each syllable (Hubbard & Yairi, 1988). Interrater agreement was calculated using the percentage agreement index (Suen & Ary, 1989): the number of agreements divided by the sum of the number of agreements and the number of disagreements, multiplied by 100. Interrater agreement was 96.9%.”

Okay, no argument there then – not only careful consideration of the issue of interrater reliability, but also measurement using approaches supported by previous research. Tick!

Question 8: Were independent assessors used?

The design should control for undue influence on assessment from over-familiarity with the participants and the phase of the study (more observer bias…). It is good practice for assessment data to be analysed blind to the participant and / or their study phase. Here is an example from a study of constraint-induced therapy for aphasia (Faroqi-Shah & Virion, 2009):

“All tests were independently scored for accuracy by both authors and a third research assistant who was blind to the treatment conditions. All discourse samples were transcribed by one of the authors, and 20% of randomly selected samples were transcribed by an independent research assistant who was blind to the treatment condition and time of testing for reliability purposes. Morphosyntactic codes were independently assigned by both authors. Of these samples, 20% were also coded by a research assistant for reliability purposes. Coding reliability exceeded 90%.”
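If you want to check a reliability figure for yourself, the percentage agreement index quoted from Fry et al. (2009) under Question 7 is simple arithmetic: agreements divided by agreements plus disagreements, multiplied by 100. Here is a minimal sketch in Python. The function name and the ten syllable judgements are invented for illustration and are not taken from any of the studies discussed.

```python
def percentage_agreement(rater_a, rater_b):
    """Point-by-point percentage agreement between two raters.

    Each argument is a sequence of per-syllable judgements
    (True = stuttered, False = fluent), listed in the same order.
    """
    if len(rater_a) != len(rater_b):
        raise ValueError("both raters must judge the same syllables")
    if not rater_a:
        raise ValueError("no judgements to compare")
    agreements = sum(a == b for a, b in zip(rater_a, rater_b))
    disagreements = len(rater_a) - agreements
    # Agreements / (agreements + disagreements), multiplied by 100.
    return 100 * agreements / (agreements + disagreements)


# Hypothetical judgements for ten syllables (invented data).
first_rater = [True, False, False, True, False, False, False, True, False, False]
second_rater = [True, False, False, False, False, False, False, True, False, False]
agreement = percentage_agreement(first_rater, second_rater)
```

In this made-up example the raters disagree on one syllable out of ten, so the index comes out at 90%. Bear in mind that simple percentage agreement takes no account of agreement that would occur by chance.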
Question 9: Have the data been analysed statistically?
To evaluate a study on this SCED Scale domain, you simply have to find out whether any statistical analysis was used to demonstrate an intervention effect by comparing the results over the phases of the study. You don’t have to know whether it was an appropriate statistical technique that was used. Phew! It appears that authors get Brownie points simply for trying to use inferential stats. Millard et al. (2009) show changes in stammering over the phases of their study using a statistical technique called cusum analysis which they report has been applied to naturally fluctuating data. That sounds appropriate, doesn’t it? Moreover, the cusum charts of repeated measures from each participant have the added advantage of displaying the raw data on percentage words stuttered, the upper and lower limits supporting the statistical (cusum) analysis and the changes over the timescales of the phases of the study. Neat!
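For readers curious about what sits behind a cusum chart, the basic idea is a running total of how far each score lies from a reference value. The sketch below is a generic illustration in Python, assuming the baseline mean is used as the reference. It is not the procedure Millard et al. (2009) report, it does not calculate the upper and lower limits they display, and the weekly scores are invented.

```python
from itertools import accumulate
from statistics import mean


def cusum(scores, reference):
    """Running total of each score's deviation from a reference value."""
    return list(accumulate(score - reference for score in scores))


# Hypothetical weekly percentage-words-stuttered scores (invented data).
baseline_phase = [10.0, 12.0, 8.0, 11.0, 9.0, 10.0]
therapy_phase = [9.0, 7.0, 6.0, 5.0, 4.0, 4.0]

# Summarise the baseline: its mean and the range of fluctuation seen.
baseline_mean = mean(baseline_phase)
baseline_range = (min(baseline_phase), max(baseline_phase))

# Track cumulative deviation from the baseline mean in each phase.
baseline_cusum = cusum(baseline_phase, baseline_mean)
therapy_cusum = cusum(therapy_phase, baseline_mean)
```

With these invented scores the baseline running total hovers around zero, while the therapy-phase total drifts steadily downwards because every score falls below the baseline mean. Whether a drift like that goes beyond chance depends on decision limits, which this sketch leaves out.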
Critical appraisal for speech and language therapists (CASLT)
Download the SCED Scale from www.psycbite.com/docs/The_SCED_Scale.pdf, or get Jennifer’s version with cartoons from www.speechmag.com/Members/CASLT. Use it yourself or with colleagues in a journal club and let us know how you get on.
Question 10: Is there evidence that any intervention effect can be replicated?

To make use of this intervention in your own context, you need to be confident that the apparent response to intervention is not a ‘one-off’. It’s not much use to know that something worked if it is limited to that particular individual in that particular context. Has the effect been demonstrated with other clients, different therapists or in other settings?

Question 11: Is there evidence for generalisation and carryover?

For much of what speech and language therapists do, the emphasis in the long term is on the client’s self-management. It is therefore important to know whether the changes ‘kick-started’ by the intervention are shown to impact on functioning in other areas. For example, if the intervention was impairment-based, has there been any carryover to functional communication? Ebert & Kohnert (2009) suggest that treatment of non-linguistic cognitive processing skills may facilitate change in some areas of language processing for children with primary language impairment, but they go no further than demonstrating changes in performance on standardised language tests – we are given no information on impact, if any, on everyday functioning. If the intervention targeted expressive communication using AAC, has there been any impact on the person’s social participation? Beck et al. (2009) used group language stimulation to teach use of AAC techniques to seven participants with complex communication needs but they provide only anecdotal information on carryover to other settings of the participants’ increased responsiveness and use of AAC.
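If your journal club likes to keep a record of its appraisals, the eleven questions in this article can be turned into a simple tally. The Python sketch below is one possible way of doing that. The function name and the example answers are invented, and the tally is just a count of ticks, not the official scoring procedure for the SCED Scale.

```python
# The eleven appraisal questions as worded in this article.
QUESTIONS = [
    "Was the participant's clinical history adequately described?",
    "Does the study identify measures that can be used to evaluate intervention success?",
    "Is the design good enough to provide evidence of an intervention effect?",
    "Was an adequate baseline established before intervention commenced?",
    "Can a treatment response be distinguished from fluctuations resulting from other factors?",
    "Is data displayed to show variability?",
    "Are measures used reliable?",
    "Were independent assessors used?",
    "Have the data been analysed statistically?",
    "Is there evidence that any intervention effect can be replicated?",
    "Is there evidence for generalisation and carryover?",
]


def summarise(answers):
    """Return (number of ticks, list of questions that did not get a tick)."""
    if len(answers) != len(QUESTIONS):
        raise ValueError("one answer is needed per question")
    unmet = [q for q, ticked in zip(QUESTIONS, answers) if not ticked]
    return len(QUESTIONS) - len(unmet), unmet


# Hypothetical appraisal of an imaginary study (invented answers).
answers = [True, True, True, True, False, True, True, False, True, False, False]
ticks, unmet = summarise(answers)
```

The list of questions without a tick is probably more useful for discussion than the total, because it points you straight to the weaknesses to weigh up before acting on a study’s findings.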
Unwanted bias

In conclusion, when considering the merits of single-subject design studies, please remember that, however methodologically good they are, lack of randomisation does allow unwanted bias to creep in. It may be a lot easier to judge from the results of a single subject study how the intervention might impact on one of your own clients, but the bottom line is that the study cannot provide the answer to questions about the overall efficacy of this intervention for all potential recipients.

Jennifer Reid is a consultant speech and language therapist with NHS Fife, email [email protected]. Cartoons are by Fran, www.francartoons.co.uk.
References
Beck, A.R., Stoner, J.B. & Dennis, M.L. (2009) ‘An investigation of aided language stimulation: Does it increase AAC use with adults with developmental disabilities and complex communication needs?’, Augmentative and Alternative Communication 25(1), pp.42–54.
Ebert, K.D. & Kohnert, K. (2009) ‘Non-linguistic cognitive treatment for primary language impairment’, Clinical Linguistics & Phonetics 23(9), pp.647–664.
Faroqi-Shah, Y. & Virion, C. (2009) ‘Constraint-induced language therapy for agrammatism: role of grammaticality constraints’, Aphasiology 23(7-8), pp.977–988.
Fry, J., Botterill, W. & Pring, T. (2009) ‘The effect of an intensive group therapy program for young adults who stutter: a single subject study’, International Journal of Speech-Language Pathology 11(1), pp.12–19.
Millard, S.A., Edwards, S. & Cook, F.M. (2009) ‘Parent-child interaction therapy: Adding to the evidence’, International Journal of Speech-Language Pathology 11(1), pp.61–76.
Samuelsson, C. (2011) ‘Prosody intervention: A single subject study of a Swedish boy with prosodic problems’, Child Language Teaching and Therapy 27(1), pp.56–67.
Tate, R.L., McDonald, S., Perdices, M., Togher, L., Schultz, R. & Savage, S. (2008) ‘Rating the methodological quality of single-subject designs and n-of-1 trials: Introducing the Single-Case Experimental Design (SCED) Scale’, Neuropsychological Rehabilitation 18(4), pp.385–401.