Reid, J. (2011) The third in our series to take the mystery out of critical appraisal looks at articles based on system...
JOURNAL CLUB
Journal club 3: Systematic reviews
Jennifer Reid's series aims to help you access the speech and language therapy literature, assess its credibility and decide how to act on your findings. Each instalment takes the mystery out of critically appraising a different type of journal article. Here, she looks at systematic reviews.
READ THIS SERIES IF YOU WANT TO
• BE MORE EVIDENCE-BASED IN YOUR PRACTICE
• FEEL MOTIVATED TO READ JOURNAL ARTICLES
• INFLUENCE THE DEVELOPMENT OF YOUR SERVICE
There has been an explosion of literature relevant to speech and language therapy over the course of my working life. An article which reviews the current state of play in a relevant area may appeal to time-pressed clinicians. Can we expect a review to be more comprehensive than an article on a single piece of research? Should it avoid the need to comb the literature for articles on original research? And may we assume that the scope of the review will be better than we could do ourselves, the reviewers being more knowledgeable than us? Well, the answer is probably both yes and no. Reviews may indeed provide a ready-made synthesis of the available research but they too are open to that enemy of science, bias. This fact, along with the huge expansion, particularly in the medical literature, has led to the development of a new type of review method. The sorts of reviews many of us grew up with, like those presented in textbooks, have been reclassified variously as 'overviews', 'narrative reviews', or simply 'non-systematic reviews'. The systematic review has become one of the core tools of evidence-based practice. If you can get your head round its principles and methods, you will find you are much better equipped to deal with the current literature. The landscape of a systematic review may feel
very foreign to the uninitiated – my advice is to persevere, because well-conducted systematic reviews on areas of current concern for speech and language therapy services are invaluable. In Fife, we have found appraising systematic reviews in our journal clubs really helpful (even if they do nip your head to begin with!)
The critical appraisal tool for speech and language therapists presented here has been developed primarily from CASP (PHRU, 2006). It provides a structured framework for reading and appraising reports which summarise the
results of primary research studies. It can be used for systematic reviews with or without meta-analysis (when the reviewers attempt to combine the numerical results from various studies). These methods are at the top level of the 'evidence hierarchy', so authors often use the actual terms, 'systematic review' or 'meta-analysis', in the title of the article. The results of a systematic review rely not only on the quantity and quality of the primary studies included but also on how well the review and synthesis was conducted. However, a well-conducted systematic review should provide more definitive evidence than any other type of study, even if the results relate only to a circumscribed area. The tool can also be used for other types of review but, for a non-systematic or narrative review, you may wish to use selectively this and the 'Expert Opinion' tool presented in the first article in this series (Reid, 2010). As with other critical appraisal tools, the main themes to be addressed revolve around the study results, their validity (how 'true' they are) and to what extent, if any, they might apply to the appraiser's own context. As previously, magazine subscribers may download a formatted version of the appraisal tool at www.speechmag.com/Members/CASLT to use on their own or with colleagues in a journal club.
Question 1: What question was being asked, and is this an important clinical question?
Try formulating the reviewers' stated aims into a research question if they have not done so explicitly in the article. Is the question clearly focused in terms of the PICO framework, which we discussed in the first article of the series (Reid, 2010):
• Population studied
• Intervention given (if it is an intervention study)
• Control / comparison (if applicable)
• Outcomes considered?
Is this question important for your clinical practice? If the reviewers' question does not quite fit the bill, what question(s) do you wish they had asked instead?

Question 2: Did the review include the right type of study?
The article should present clear inclusion and/or exclusion criteria, so home in on this section of the article to consider whether the included studies address the review's question. Sometimes the primary studies have been designed to answer a different question, so it is important to check that a study's inclusion in the review is justified. Do the included studies have an appropriate study design? In my experience, those who are new to critical appraisal or research design may be inclined to feel that the views of the reviewers are more valid than their own. Deal with any feelings of inadequacy by reading carefully the reviewers' rationale for inclusion or exclusion of particular designs. As you read more systematic reviews, you will begin to get a better feel for this. The prestigious Cochrane Collaboration (www.cochrane.org), along with other authorities on evidence-based medicine, will try to convince you that a respectable review of an intervention should include only randomised controlled trials (RCTs). This may well be an attainable goal for medical treatments. However, in many areas of speech and language therapy practice, the only available evidence comes from small-scale, exploratory studies. Moreover, the UK Medical Research Council's Framework for Development and Evaluation of RCTs for Complex Interventions to Improve Health (2000) advocates the use of small-scale and exploratory designs in the early phases of development of evidence-informed interventions. Those engaged in systematic reviews may need to take into account the level of maturity of the field of research before deciding where to draw the line. Inclusion criteria set too high up the evidence hierarchy increase the danger of arriving at the 'nil' result of, for example, a Cochrane review of treatment for acquired dysarthria (Sellars et al., 2005) – it found no studies met its inclusion criteria. This result may be a trigger for future research in this area, but it is distinctly unhelpful for clinicians looking for clues to potentially promising treatments, and by default promotes the 'expert opinion' route with all its potential biases.

Question 3: Did the reviewers try to identify all relevant studies?

Which bibliographic databases were used? If you are not yet familiar with the nomenclature, consider whether more than one database was searched. Beware reviews that use only one database source – the field of speech and language therapy is so cross-disciplinary that it is impossible to predict which journals contain potentially useful articles. For example, my default setting for rapid literature searching is to search simultaneously MEDLINE®, PsycINFO® and possibly ERIC (www.eric.ed.gov) if the question involves school-aged children. This produces some duplicates but also many unique references from only one database.
Like other aspects of evidence-based practice, appraisal 'points' are scored by playing the game by the rules: systematic reviews should provide an exhaustive summary of the literature relevant to the question in hand, so reviewers are expected to have tried to identify all sources of evidence including those that are in the 'grey' literature (such as dissertations, unpublished studies or articles in obscure publications). For appraising exhaustiveness, the relevant questions to ask are:
• Did they follow up reference lists?
• Did they make personal contact with experts?
• Did they search for unpublished studies?
• Did they search for non-English-language studies?
If they did, they will mention it, because they know this earns them credit towards publication in respected, peer-reviewed journals! If they failed to do so, is there a danger their review has been seriously compromised? Hmm, I leave you to form your own opinion…

Question 4: Did the reviewers assess the quality of the included studies?

The main consideration is whether a clear, pre-determined strategy was used to decide which studies were included. Look for a set of defined categories that together form the definition of quality the reviewers have adopted, plus a scoring system – there may be a table of the included studies showing the points awarded against each quality criterion. These sorts of tables often interfere with the readability of an article but you should try not to skip them. They really are crucial to understanding the results of the review – and you may find that one or more primary studies are worth following up. It is also important that more than one assessor has been involved in rating and scoring the studies. This provides evidence that the quality system is objective and reliable enough to support the credibility of the results.
SPEECH & LANGUAGE THERAPY IN PRACTICE SPRING 2011
Question 5: How are the results presented and what is the main result?
Figure 1: A reminder about confidence intervals
Confidence intervals allow you to estimate the strength of the evidence and whether it is definitive (in other words, you don't need further studies to check the result). A single study gives you only one example of the difference between two measures, two groups etc. If you repeated the same study several times, you would not get exactly the same result each time. You can't know what the 'real' difference is, especially from one study. Calculating a 90 per cent confidence interval around your result allows you to say that there is a 90 per cent chance that the 'true' result lies within this range. If an author is interpreting the confidence interval appropriately, you should see comments about both the extent to which their results support their original hypothesis as well as whether any further studies need to be done. Confidence intervals which straddle zero suggest that there may be no real difference or that your study used too few participants for you to detect the effect definitively.
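The straddle-zero check described in figure 1 can be sketched in a few lines of code. This is a minimal illustration only: the scores are invented, and it uses a normal-approximation critical value rather than the exact t-distribution a real analysis would use.

```python
import statistics

def ci_90(group_a, group_b):
    """90 per cent confidence interval (normal approximation) for the
    difference in means between two independent groups of scores."""
    diff = statistics.mean(group_b) - statistics.mean(group_a)
    # Standard error of the difference in means
    se = (statistics.variance(group_a) / len(group_a)
          + statistics.variance(group_b) / len(group_b)) ** 0.5
    z = 1.645  # two-sided 90 per cent critical value (normal approximation)
    return diff - z * se, diff + z * se

# Hypothetical pre- and post-intervention scores, invented for illustration
pre = [10, 12, 11, 13, 9, 12]
post = [14, 15, 13, 16, 12, 15]
low, high = ci_90(pre, post)
print(f"90% CI for the difference: {low:.2f} to {high:.2f}")
# An interval straddling zero would suggest no real difference, or too
# few participants to detect the effect definitively.
```

Here the whole interval sits above zero, so even this small invented sample would count as evidence of a real difference at the 90 per cent level.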
The two components of this question are stated in this order for a reason: how results are expressed can have an important influence on what you perceive as the main result. You need to consider:
• whether the reviewers' interpretation of numbers was sensible
• how the results are expressed (for example, odds ratios; means and confidence intervals (figure 1)) and
• how large and how meaningful this size of result is.
Some systematic reviews provide an assessment of quality followed by a verbal synthesis in the form of one or more conclusions, with an indication of the strength of the current evidence for each. In terms of the main result, it can be instructive trying to sum up the 'bottom-line' result of the review in one sentence – it does help when trying to communicate the gist of your appraisal to others. And your clinical bottom-line will certainly be needed if your appraisal is to be combined with the appraisal of other evidence in order to produce a clinical guideline or a 'best practice' standard, whether for your local context or for a wider audience.

Question 6: If the results of the studies have been combined, was it reasonable to do so?
Some systematic reviews go beyond qualitative synthesis and present a meta-analysis of the quantitative data from included studies. One
crucial concept here is the notion of effect size. (Don't panic! I'm going to talk about numbers now but stay with me…) Calculating an effect size is a method for quantifying the effectiveness of an intervention, allowing you to compare or combine the results of different studies. Numerical calculations are used to produce a number (a statistic!) so you can then compare like-with-like across different studies. You can think of it as similar to converting raw scores to standard scores in formal assessments – it allows you to compare a client's performance in different areas of functioning, for example receptive vocabulary vs. comprehension of sentences. The value of a 'Cohen's d' or other statistic tells you how big a change has been found in the outcome measure for the intervention. Whether changes can be attributed wholly to the intervention is a moot point, but in general the bigger the average change – the effect size – the more likely we are to believe it was caused by the intervention. A weak effect does not equate to no effect, but it may not show up conclusively in some study designs. To detect weak effects, you usually need lots of study participants. This may be where a meta-analysis comes into its own, as combining the data from lots of smaller studies of a relatively weak effect can provide much more definitive evidence that the intervention really does have an effect. You will come across different methods for analysing effect sizes. Percentage of non-overlapping data (PND) provides a means of translating the results of individual studies into a common currency so you can evaluate them side-by-side. It can be applied to research designs that are lower down the evidence hierarchy, such as single-subject designs (also known as n=1 studies). In figure 2, can you work out which intervention had the stronger effect?
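Both measures can be sketched numerically. The scores below are invented purely for illustration, and the PND function uses only the basic formula (published meta-analyses add refinements, for example for ties or for target behaviours that should decrease).

```python
import statistics

def cohens_d(control, treated):
    """Cohen's d: difference in group means divided by the pooled
    standard deviation (basic formula, without small-sample
    corrections such as Hedges' g)."""
    n1, n2 = len(control), len(treated)
    pooled_var = ((n1 - 1) * statistics.variance(control) +
                  (n2 - 1) * statistics.variance(treated)) / (n1 + n2 - 2)
    return (statistics.mean(treated) - statistics.mean(control)) / pooled_var ** 0.5

def pnd(pre, post):
    """Percentage of non-overlapping data for a single-subject (n=1)
    study: the share of post-intervention points that exceed the
    highest pre-intervention (baseline) point."""
    return 100.0 * sum(1 for s in post if s > max(pre)) / len(post)

# Invented score series for two hypothetical interventions
print(cohens_d([3, 5, 4, 6], [5, 7, 8, 6]))  # standardised change for one study
print(pnd([3, 5, 4, 6], [5, 7, 8, 6]))       # some overlap -> lower PND
print(pnd([3, 5, 4, 6], [8, 9, 10, 9]))      # no overlap -> PND of 100
```

The second intervention, with no overlap between baseline and post-intervention scores, scores the maximum PND of 100 – the same reasoning you can apply by eye to figure 2.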
For both studies, there is an area of overlap (see arrows) where the relatively high pre-intervention scores of some participants are the same as those of the people with the lowest post-intervention scores. However, this area of overlap is much smaller for intervention B, which translates into a stronger effect size and, numerically, to a larger percentage of non-overlapping data. For study B, we can be more confident that the changes in participants' performance were indeed treatment effects and/or that the design
ensured that other influences on performance were controlled. Calculation of percentage of non-overlapping data may be used to combine the results of small-scale studies for a systematic review with meta-analysis. Meta-analysis of more robust studies, such as RCTs, is more likely to be reported using a forest plot (figure 3) or 'blobbogram' (see the logo of the Cochrane Collaboration). These provide a visual display of the effect sizes associated with the included studies, the confidence intervals of their results, a summary effect size and confidence interval. The convention is for them to include an identifier for each study on the left (in order of year of publication), and some scary statistics on the right – though if you can deal with them, you will find they answer the question about how results are expressed. Weighting is about how much each study contributed to the overall summary measure – the bigger the blob, the more influential the study. One of our adult acquired journal clubs appraised a review of treatments for dysphagia in neurological disorders (Ashford et al., 2009). We found the heavy weighting of a couple of large Logemann studies a concern, because the participants in the Logemann studies were skewed towards people with Parkinson's Disease and dementia, with very small numbers of people with stroke – very different, we thought, from the Fife caseload profile.

Question 7: Can the results be applied to the local population?
Clinical recommendations may be offered by reviewers, but without an accumulation of robust, scientific evidence, these are often
Figure 2: Effect size. (Illustrative pre- and post-intervention score distributions for Interventions A and B; scores in the overlapping area could be from either pre- or post-testing.)

Figure 3: Forest plot example (fictional), listing Black et al., 1999; Connor, 2002; Drake et al., 2005; Elder et al., 2005; Foukes, 2009. Key: blob size shows the measure of effect, and therefore weighting, in the meta-analysis; horizontal bars show confidence intervals; the vertical line is the line of no effect.
fairly circumspect. You need to address the usual considerations about potential differences between the population covered by the review and your own, and whether your local setting is different from that of the review to the extent that its results cannot reasonably be applied. You also need to consider whether the intervention is practical and acceptable to clients in your own setting.

Question 8: Were all important outcomes considered?
You should think about whether the reviewers have considered the outcomes of the review from all angles, that is from the point of view of clients, families and carers, and the wider community, as well as speech and language therapists and other professionals, service managers and policy makers.

Question 9: Should policy or practice change as a result of the evidence contained in this review?
Consider whether any reported benefit outweighs any risk and/or additional cost. If this information is not reported, can it be filled in from elsewhere?

And finally...
A good example of a clinically helpful review, in my opinion, was one we reviewed last year in an adult learning disability journal club. The study (van Oorsouw et al., 2009) posed a question the group felt was extremely important for them, regarding which aspects of staff training are related to improvements in staff behaviour. The authors included single-subject and small sample studies and found 55 studies that met their criteria, which provided relevant data from over 500 participants. Meta-analysis (using percentage of non-overlapping data) was applied to the data from all the participants. The results suggested that a combination of in-service (using multiple techniques) with coaching-on-the-job (featuring verbal feedback) is the most powerful format. Even though these results did not really add to what the group already believed, it is important for us to have evidence to support what we are currently doing as well as information to help us break new ground. The journal club session helped the staff feel more confident in their practice and gave them ammunition for resisting pressure to undertake staff training that was unlikely to be effective. The study results were also of great interest to paediatric and adult acquired staff. These days pretty much every speech and language therapist has to do staff training, whether this is with health, education or social care staff, so this review also spoke to their concerns about
how to design and deliver training effectively. Of course, what we really need to know is how to bring about long-term, sustained change in staff behaviour, but unfortunately this study did not speak to that question. SLTP

Jennifer Reid is a consultant speech and language therapist with NHS Fife, email
[email protected].
References
Ashford, J., McCabe, D.M.A., Wheeler-Hegland, K., Frymark, T., Mullen, R., Musson, N., Schooling, T. & Smith Hammond, C. (2009) 'Evidence-based systematic review: Oropharyngeal dysphagia behavioral treatments. Part III - Impact of dysphagia treatments on populations with neurological disorders', Journal of Rehabilitation Research & Development 46(2), pp.195-204.
Medical Research Council (2000) A Framework for development and evaluation of RCTs for Complex Interventions to Improve Health. Available at: http://www.mrc.ac.uk/Utilities/Documentrecord/index.htm?d=MRC003372 (Accessed: 18 February 2011.)
Public Health Research Unit (2006) Critical Appraisal Skills Programme. Available at: www.phru.nhs.uk/Pages/PHD/CASP.htm (Accessed: 18 February 2011.)
Reid, J. (2010) 'Journal club: expert opinion', Speech & Language Therapy in Practice Autumn, pp.17-21.
Sellars, C., Hughes, T. & Langthorne, P. (2005) 'Speech and language therapy for dysarthria due to nonprogressive brain damage', Cochrane Database of Systematic Reviews Issue 3. Art. No: CD002088. DOI: 10.1002/14651858.CD002088.pub2.
van Oorsouw, W.M.W.J., Embregts, P.J.C.M., Bosman, A.M.T. & Jahoda, A. (2009) 'Training staff serving clients with intellectual disabilities: a meta-analysis of aspects determining effectiveness', Research in Developmental Disabilities 30(3), pp.503-511.
Critical appraisal for speech and language therapists (CASLT)
Download the 'systematic review' framework document from www.speechmag.com/Members/CASLT. Use it yourself or with colleagues in a journal club, and let us know how you get on (email
[email protected]).