Draft for Public Comment
Form 36 DPC: 14 / 30267079 DC
BSI Group Headquarters 389 Chiswick High Road London W4 4AL
Date: 12 February 2014 Origin: European
Tel: +44 (0)20 8996 9000 Fax: +44 (0)20 8996 7400 www.bsigroup.com
Latest date for receipt of comments: 9 April 2014
Project No. 2012/01932
Responsible committee: DS/1 Dependability Interested committees:
Title:
Draft BS EN 62740 Root Cause Analysis (RCA)
Please notify the secretary if you are aware of any keywords that might assist in classifying or identifying the standard, or if the content of this standard: i) has any issues related to 3rd party IPR, patent or copyright; ii) affects other national standard(s); iii) requires additional national guidance or information.
WARNING: THIS IS A DRAFT AND MUST NOT BE REGARDED OR USED AS A BRITISH STANDARD. THIS DRAFT IS NOT CURRENT BEYOND 9 April 2014.

This draft is issued to allow comments from interested parties; all comments will be given consideration prior to publication. No acknowledgement will normally be sent. See overleaf for information on the submission of comments.

No copying is allowed, in any form, without prior written permission from BSI except as permitted under the Copyright, Designs and Patents Act 1988 or for circulation within a nominating organization for briefing purposes. Electronic circulation is limited to dissemination by e-mail within such an organization by committee members.

Further copies of this draft may be purchased from BSI Shop http://shop.bsigroup.com or from BSI Customer Services, Tel: +44(0) 20 8996 9001 or email
[email protected]. British, International and foreign standards are also available from BSI Customer Services. Information on the co-operating organizations represented on the committees referenced above may be obtained from http://standardsdevelopment.bsigroup.com
Responsible Committee Secretary: Mrs Phillippa Younas (BSI) Direct tel: 020 8996 7239 E-mail:
[email protected]
Introduction

This draft standard is based on European discussions in which the UK took an active part. Your comments on this draft are welcome and will assist in the preparation of the consequent British Standard. Comment is particularly welcome on national legislative or similar deviations that may be necessary. Even if this draft standard is not approved by the UK, if it receives the necessary support in Europe, the UK will be obliged to publish the official English Language text unchanged as a British Standard and to withdraw any conflicting standard.

UK Vote

Please indicate whether you consider the UK should submit a negative (with reasons) or positive vote on this draft.
Submission of Comments

- The guidance given below is intended to ensure that all comments receive efficient and appropriate attention by the responsible BSI committee. Annotated drafts are not acceptable and will be rejected.
- All comments must be submitted, preferably electronically, to the Responsible Committee Secretary at the address given on the front cover. Comments should be compatible with version 6.0 or version 97 of Microsoft Word for Windows, if possible; otherwise comments in ASCII text format are acceptable. Any comments not submitted electronically should still adhere to these format requirements.
- All comments submitted should be presented as given in the example below. Further information on submitting comments and how to obtain a blank electronic version of a comment form are available from the BSI website at: http://drafts.bsigroup.com/
Template for comments and secretariat observations        Date: xx/xx/20xx        Document: ISO/DIS xxxx

(1) MB | (2) Clause No./Subclause No./Annex (e.g. 3.1) | (3) Paragraph/Figure/Table/Note | (4) Type of comment | (5) Comment (justification for change) by the MB | (6) Proposed change by the MB | (7) Secretariat observations on each comment submitted

EXAMPLE ONLY
   | 3.1 | Definition 1 | ed | Definition is ambiguous and needs clarifying. | Amend to read '...so that the mains connector to which no connection...' |
   | 6.4 | Paragraph 2  | te | The use of the UV photometer as an alternative cannot be supported as serious problems have been encountered in its use in the UK. | Delete reference to UV photometer. |
Microsoft and MS-DOS are registered trademarks, and Windows is a trademark of Microsoft Corporation.
56/1542/CDV

COMMITTEE DRAFT FOR VOTE (CDV)
PROJET DE COMITÉ POUR VOTE (CDV)

Project number / Numéro de projet: IEC 62740/Ed1
IEC/TC or SC / CEI/CE ou SC: TC 56
Secretariat / Secrétariat: UK
Date of circulation / Date de diffusion: 2014-02-07
Closing date for voting (Voting mandatory for P-members) / Date de clôture du vote (Vote obligatoire pour les membres (P)): 2014-05-09
Supersedes document / Remplace le document: 56/1527/CD and 56/1540/CC
Submitted for parallel voting in CENELEC / Soumis au vote parallèle au CENELEC
Also of interest to the following committees / Intéresse également les comités suivants:
Proposed horizontal standard / Norme horizontale suggérée:
Other TC/SCs are requested to indicate their interest, if any, in this CDV to the TC/SC secretary / Les autres CE/SC sont requis d'indiquer leur intérêt, si nécessaire, dans ce CDV à l'intention du secrétaire du CE/SC
Functions concerned / Fonctions concernées: Safety / Sécurité, EMC / CEM, Environment / Environnement, Quality assurance / Assurance qualité
CE DOCUMENT EST TOUJOURS À L'ÉTUDE ET SUSCEPTIBLE DE MODIFICATION. IL NE PEUT SERVIR DE RÉFÉRENCE.
THIS DOCUMENT IS STILL UNDER STUDY AND SUBJECT TO CHANGE. IT SHOULD NOT BE USED FOR REFERENCE PURPOSES.
LES RÉCIPIENDAIRES DU PRÉSENT DOCUMENT SONT INVITÉS À PRÉSENTER, AVEC LEURS OBSERVATIONS, LA NOTIFICATION DES DROITS DE PROPRIÉTÉ DONT ILS AURAIENT ÉVENTUELLEMENT CONNAISSANCE ET À FOURNIR UNE DOCUMENTATION EXPLICATIVE.
RECIPIENTS OF THIS DOCUMENT ARE INVITED TO SUBMIT, WITH THEIR COMMENTS, NOTIFICATION OF ANY RELEVANT PATENT RIGHTS OF WHICH THEY ARE AWARE AND TO PROVIDE SUPPORTING DOCUMENTATION.
Title:
IEC 62740/Ed1: Root Cause Analysis (RCA)
Introductory note
ATTENTION IEC – CENELEC PARALLEL VOTING

The attention of IEC National Committees, members of CENELEC, is drawn to the fact that this Committee Draft for Vote (CDV) for an International Standard is submitted for parallel voting. The CENELEC members are invited to vote through the CENELEC online voting system.
Copyright © 2013 International Electrotechnical Commission, IEC. All rights reserved. It is permitted to download this electronic file, to make a copy and to print out the content for the sole purpose of preparing National Committee positions. You may not copy or "mirror" the file or printed version of the document, or any part of it, for any other purpose without permission in writing from IEC.
CONTENTS

FOREWORD ..... 4
1 Scope ..... 7
2 Normative references ..... 7
3 Terms and definitions ..... 7
4 Introduction to RCA ..... 9
5 The RCA process ..... 10
  5.1 Overview ..... 10
  5.2 Initiation ..... 10
  5.3 Establishing facts ..... 12
  5.4 Analysis ..... 13
    5.4.1 Description ..... 13
    5.4.2 The analysis team ..... 14
  5.5 Validation ..... 15
  5.6 Presentation of results ..... 16
6 Selection of techniques for analysing causes ..... 16
  6.1 General ..... 16
  6.2 Selection of analysis techniques ..... 17
  6.3 Useful tools to assist RCA ..... 17
Annex A (informative) Summary and criteria of commonly used RCA techniques ..... 19
  A.1 General ..... 19
  A.2 RCA techniques ..... 19
  A.3 Criteria ..... 20
Annex B (informative) RCA models ..... 23
  B.1 General ..... 23
  B.2 Barrier Analysis ..... 23
  B.3 Reason's Model (Swiss Cheese Model) ..... 24
  B.4 Systems models ..... 25
  B.5 Systems Theoretic Accident Model and Processes (STAMP) ..... 25
Annex C (informative) Detailed description of RCA techniques ..... 27
  C.1 General ..... 27
  C.2 Events and Causal Factors (ECF) Charting ..... 27
  C.3 Multilinear Events Sequencing (MES) and Sequentially Timed Events Plotting (STEP) ..... 28
  C.4 The 'Why' method ..... 31
  C.5 Causes Tree Method (CTM) ..... 32
  C.6 Why-Because Analysis (WBA) ..... 35
  C.7 Fault Tree and Success Tree Method ..... 38
  C.8 Fishbone or Ishikawa Diagram ..... 40
  C.9 Safety through Organisational Learning (SOL) ..... 42
  C.10 Management Oversight and Risk Tree (MORT) ..... 43
  C.11 AcciMaps ..... 45
  C.12 Tripod Beta ..... 47
  C.13 Causal Analysis using STAMP (CAST) ..... 49
Annex D (informative) Useful tools to assist RCA ..... 53
  D.1 General ..... 53
  D.2 Data mining and clustering techniques ..... 53
Annex E (informative) Analysis of human performance ..... 55
  E.1 General ..... 55
  E.2 Technique for Retrospective and Predictive Analysis of Cognitive Errors (TRACEr) ..... 56
  E.3 Human Factors Analysis and Classification Scheme (HFACS) ..... 58

Figure 1 – RCA process ..... 10
Figure B.1 – Broken, ineffective and missing barriers causing the focus event ..... 23
Figure C.1 – Example of an ECF chart ..... 27
Figure C.2 – Data in an event building block ..... 29
Figure C.3 – Example of a Time-Actor Matrix ..... 30
Figure C.4 – Example of a Why Tree ..... 31
Figure C.5 – Symbols and links used in CTM ..... 33
Figure C.6 – Example of a Cause Tree ..... 34
Figure C.7 – Example of a WBG ..... 37
Figure C.8 – Example of a Fault Tree during the analysis ..... 39
Figure C.9 – Example of a Fishbone diagram ..... 41
Figure C.10 – Example of a MORT diagram ..... 44
Figure C.11 – Example of an AcciMap ..... 46
Figure C.12 – Example of a Tripod Beta tree diagram ..... 48
Figure C.13 – Control structure for the water supply in Walkerton, Canada ..... 50
Figure C.14 – Example CAST causal analysis for the local Department of Health ..... 50
Figure C.15 – Example CAST causal analysis for the local public utility operations management ..... 51
Figure E.1 – Example of a TRACEr Model [23] ..... 56
Figure E.2 – Generation of Internal Error Modes ..... 57
Figure E.3 – Level 1 Unsafe Acts ..... 59
Figure E.4 – Level 2 Preconditions ..... 59
Figure E.5 – Level 3 Supervision Issues ..... 60
Figure E.6 – Level 4 Organisational Issues ..... 60

Table 1 – Steps to RCA ..... 10
Table A.1 – Brief description of RCA techniques ..... 19
Table A.2 – Summary of RCA technique criteria ..... 20
Table A.3 – Attributes of the generic RCA techniques ..... 22
Table B.1 – Examples of barriers ..... 24
Table B.2 – Example Barrier Analysis Worksheet ..... 24
Table C.1 – Direct and indirect causal factors ..... 43
Table E.1 – External Error Modes ..... 58
Table E.2 – Psychological Error Mechanisms ..... 58
INTERNATIONAL ELECTROTECHNICAL COMMISSION

____________

ROOT CAUSE ANALYSIS (RCA)

FOREWORD

1) The International Electrotechnical Commission (IEC) is a worldwide organization for standardization comprising all national electrotechnical committees (IEC National Committees). The object of IEC is to promote international co-operation on all questions concerning standardization in the electrical and electronic fields. To this end and in addition to other activities, IEC publishes International Standards, Technical Specifications, Technical Reports, Publicly Available Specifications (PAS) and Guides (hereafter referred to as "IEC Publication(s)"). Their preparation is entrusted to technical committees; any IEC National Committee interested in the subject dealt with may participate in this preparatory work. International, governmental and non-governmental organizations liaising with the IEC also participate in this preparation. IEC collaborates closely with the International Organization for Standardization (ISO) in accordance with conditions determined by agreement between the two organizations.

2) The formal decisions or agreements of IEC on technical matters express, as nearly as possible, an international consensus of opinion on the relevant subjects since each technical committee has representation from all interested IEC National Committees.

3) IEC Publications have the form of recommendations for international use and are accepted by IEC National Committees in that sense. While all reasonable efforts are made to ensure that the technical content of IEC Publications is accurate, IEC cannot be held responsible for the way in which they are used or for any misinterpretation by any end user.

4) In order to promote international uniformity, IEC National Committees undertake to apply IEC Publications transparently to the maximum extent possible in their national and regional publications. Any divergence between any IEC Publication and the corresponding national or regional publication shall be clearly indicated in the latter.

5) IEC itself does not provide any attestation of conformity. Independent certification bodies provide conformity assessment services and, in some areas, access to IEC marks of conformity. IEC is not responsible for any services carried out by independent certification bodies.

6) All users should ensure that they have the latest edition of this publication.

7) No liability shall attach to IEC or its directors, employees, servants or agents including individual experts and members of its technical committees and IEC National Committees for any personal injury, property damage or other damage of any nature whatsoever, whether direct or indirect, or for costs (including legal fees) and expenses arising out of the publication, use of, or reliance upon, this IEC Publication or any other IEC Publications.

8) Attention is drawn to the Normative references cited in this publication. Use of the referenced publications is indispensable for the correct application of this publication.

9) Attention is drawn to the possibility that some of the elements of this IEC Publication may be the subject of patent rights. IEC shall not be held responsible for identifying any or all such patent rights.

International Standard IEC 62740 has been prepared by IEC technical committee 56: Dependability.

The text of this standard is based on the following documents:

FDIS         | Report on voting
XX/XX/FDIS   | XX/XX/RVD

Full information on the voting for the approval of this standard can be found in the report on voting indicated in the above table.

This publication has been drafted in accordance with the ISO/IEC Directives, Part 2.
The committee has decided that the contents of this publication will remain unchanged until the stability date indicated on the IEC web site under "http://webstore.iec.ch" in the data related to the specific publication. At this date, the publication will be

• reconfirmed,
• withdrawn,
• replaced by a revised edition, or
• amended.

The National Committees are requested to note that for this publication the stability date is 2018.

THIS TEXT IS INCLUDED FOR THE INFORMATION OF THE NATIONAL COMMITTEES AND WILL BE DELETED AT THE PUBLICATION STAGE.
INTRODUCTION
Root Cause Analysis (RCA) refers to any systematic process that identifies factors that contributed to a particular event of interest (focus event). RCA aims to reveal root causes so that either the likelihood of them occurring, or their impact if they do occur, can be changed. RCA is performed with the understanding that events are addressed by understanding the root causes, rather than the immediately obvious symptoms.

RCA is used to analyse the root causes of focus events with both positive and negative outcomes, but it is most commonly used for the analysis of failures and incidents. Causes for such events can be varied in nature, including design, processes and techniques, organisational characteristics, human aspects and external events. RCA can be used for investigating the causes of non-conformances in quality (and other) management systems as well as for failure analysis.

An important distinction to make is that RCA is used to analyse a focus event that has occurred and therefore analyses the past (a posteriori). However, knowledge of the root causes of past events can lead to actions that generate improvements in the future.

This standard is intended to reflect current good practices in the conduct of RCA. This standard is general in nature, so that it may give guidance across many industries and situations. There may be industry-specific standards in existence that establish preferred methodologies for particular applications. If these standards are in harmony with this publication, the industry standards will generally be sufficient.

This international standard is a generic standard and does not explicitly address safety or accident investigation, although the methods described in this standard may be used for this purpose.
ROOT CAUSE ANALYSIS
1 Scope
This International Standard describes the basic principles of Root Cause Analysis (RCA) and specifies the steps that a process for RCA should include.

This International Standard identifies a number of attributes for RCA techniques which assist with the selection of an appropriate technique. It describes each RCA technique and its relative strengths and weaknesses.

RCA is used to analyse focus events that have occurred; therefore this International Standard only covers a posteriori analyses. It is recognised that some of the RCA techniques could, with adaptation, be used proactively in the design and development of items and for causal analysis during risk assessment; however, this International Standard focuses on the analysis of events which have occurred.

The intent of this International Standard is to explain the techniques used to identify root causes. These techniques are not designed to assign responsibility or liability, which is outside the scope of this International Standard.
2 Normative references

The following documents, in whole or in part, are normatively referenced in this document and are indispensable for its application. For dated references, only the edition cited applies. For undated references, the latest edition of the referenced document (including any amendments) applies.

IEC 60050-191, International Electrotechnical Vocabulary – Part 191: Dependability, Terminology

3 Terms and definitions

For the purposes of this document, the definitions given in IEC 60050-191 and the following apply.

3.1
cause
set of circumstances that leads to failure or success

Note 1 to entry: A cause may originate during specification, design, manufacture, installation, operation or maintenance.

[SOURCE: IEC 60050-191, definition 191-43-11 modified – to relate to failure or success]

3.2
causal factor
condition, action, event or state that produced or contributed to the occurrence of the focus event

3.3
contributory factor
a causal factor regarded as secondary, according to an explicitly-given prioritisation of causal factors

3.4
direct cause
immediate cause where there is no other cause between the direct cause and the focus event

Note 1 to entry: There may be more than one direct causal factor.
3.5
event
occurrence or change of a particular set of circumstances

Note 1 to entry: An event can be one or more occurrences, and can have several causes.
Note 2 to entry: An event can consist of something not happening.
Note 3 to entry: An event can sometimes be referred to as an "incident" or "accident".

[SOURCE: ISO Guide 73:2009 [1]]

3.6
failure (of an item)
loss of ability to perform as required

Note 1 to entry: When the loss of ability is caused by a pre-existing latent fault, the failure occurs when a particular set of circumstances is encountered.
Note 2 to entry: A failure of an item is an event that results in a fault state of that item.
Note 3 to entry: Qualifiers, such as catastrophic, critical, major, minor, marginal and insignificant, may be used to categorize failures according to the severity of consequences, the choice and definitions of severity criteria depending upon the field of application.
Note 4 to entry: Qualifiers, such as misuse, mishandling and weakness, may be used to categorize failures according to the cause of failure.
Note 5 to entry: This is failure of an item, not more generally of behaviour.

[SOURCE: IEC 60050-191, definition 191-43-01]

3.7
failure mechanism
process that leads to failure

Note 1 to entry: The process may be physical, chemical, logical, psychological or a combination thereof.

[SOURCE: IEC 60050-191, definition 191-43-12 modified – psychological added]

3.8
focus event
event which the RCA is intended to explain causally

3.9
necessary causal factor (of an event or state)
condition, action, event or state that resulted in the given event or state, without which the given event or state would not have occurred

3.10
human error
discrepancy between the human action taken or omitted, and that intended or required

Note 1 to entry: Ed. 1 of IEC 60050-191 identified "mistake" as a synonym for human error, but a mistake is a type of human error.

[SOURCE: IEC 60050-191, definition 191-43-14]

3.11
item
subject being considered

Note 1 to entry: The item may be an individual part, component, device, functional unit, equipment, subsystem or system.
Note 2 to entry: The item may consist of hardware, software, people or any combination thereof.
Note 3 to entry: The item is often comprised of elements that may each be individually considered.

[SOURCE: IEC 60050-191, definition 191-41-01]

3.12
root cause
type of cause with no predecessor that is relevant for the purpose of the analysis

Note 1 to entry: A focus event normally has more than one root cause.
Note 2 to entry: In some languages the term root cause refers to the combination of causes which have no causal predecessor (a cut set of causes).

3.13
Root Cause Analysis
RCA
systematic process to identify the causes of a focus event

Note 1 to entry: IEC 60050-191, definition 191-52-05, provides the following more restrictive definition: "systematic process to identify the cause of a fault, failure or undesired event, so it can be removed by design, process or procedure changes". This standard uses an extended definition to allow a wider applicability of the process.

3.14
stakeholder
person or organisation that can affect, be affected by, or perceive themselves to be affected by a decision or activity

[SOURCE: IEC 60300-1, definition 3.1.17 [2]]

3.15
stopping rule
a reasoned and explicit means of determining when a necessary causal factor is defined as being a root cause
4 Introduction to RCA
RCA refers to any systematic process that identifies the cause or causes that contribute to a focus event. The immediate or obvious cause of a focus event is often a symptom of underlying causes and may not truly identify the root cause or causes that should be identified and addressed. RCA provides a greater understanding of why events have occurred. RCA may identify the following:
• a single root cause;
• multiple root causes, where the elimination of any one cause will prevent the focus event from occurring;
• root causes which are causal factors whose elimination will change the likelihood of the focus event occurring but may not directly prevent it;
• root causes of successes.

By addressing the root cause or causes it is possible to make decisions regarding appropriate actions that will generate better outcomes in the future; implementing appropriate actions based on RCA is more effective at preventing the same or similar events with negative outcomes from occurring, or at increasing the probability of repeating events with positive outcomes, than just addressing the direct cause.

RCA can be applied to any focus event, whether success or failure, for example:
• investigation of technological, medical and occupational focus events;
• failure analysis of technological systems, to determine why an item failed to perform as and when required;
• analysis of quality control and business processes;
• analysis of successful outcomes.

RCA can be carried out at various levels of decomposition, for example from system to component level, or by selecting different events or outcomes as a starting point. The level at which it is appropriate to conduct the analysis will be dependent on the focus event.

RCA is used to analyse focus events which have actually occurred and is therefore applicable during the testing and operational phases of a project or product life cycle.
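As an illustration only (not a requirement of this standard), the following Python sketch shows one way RCA support tooling could record a focus event and its causal factors so that root causes (causes with no relevant predecessor, see 3.12) can be listed mechanically. The class and field names, and the example factors, are assumptions made for this sketch.

```python
# Illustrative sketch only; not part of IEC 62740. All names are assumptions.
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class CausalFactor:
    """Condition, action, event or state that contributed to the focus event (cf. 3.2)."""
    description: str
    predecessors: list[CausalFactor] = field(default_factory=list)  # factors explaining this one


@dataclass
class FocusEvent:
    """Event which the RCA is intended to explain causally (cf. 3.8)."""
    description: str
    causal_factors: list[CausalFactor] = field(default_factory=list)


def root_causes(event: FocusEvent) -> list[CausalFactor]:
    """List causal factors with no recorded predecessor (cf. 3.12).

    A focus event normally has more than one root cause, so a list is returned.
    """
    found: list[CausalFactor] = []
    seen: set[int] = set()

    def walk(factor: CausalFactor) -> None:
        if id(factor) in seen:
            return
        seen.add(id(factor))
        if not factor.predecessors:      # nothing explains this factor: candidate root cause
            found.append(factor)
        for predecessor in factor.predecessors:
            walk(predecessor)

    for factor in event.causal_factors:
        walk(factor)
    return found


if __name__ == "__main__":
    missed_inspection = CausalFactor("Scheduled inspection not performed")
    worn_seal = CausalFactor("Pump seal worn beyond service limit",
                             predecessors=[missed_inspection])
    leak = FocusEvent("Coolant leak (focus event)", causal_factors=[worn_seal])
    print([cause.description for cause in root_causes(leak)])
    # -> ['Scheduled inspection not performed']
```

In practice, the stopping rule agreed during initiation (see 5.2) determines which factors are treated as having no relevant predecessor for the purpose of the analysis.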
The benefits of performing RCA include:
• obtaining a greater understanding of what has happened;
• finding the source of problems so corrective action can prevent future events;
• identifying the causes of events with beneficial outcomes so they can be repeated;
• identifying more effective actions to address the causes of focus events;
• achieving the objectives of focus event investigations more effectively;
• supporting traceability between focus event investigation evidence and conclusions;
• increasing consistency between investigations of similar focus events;
• increasing objectivity of focus event analysis.
5 The RCA process

5.1 Overview

To be effective, RCA should be performed systematically as an investigation, with the root causes and conclusions backed up by documented evidence. To achieve this, RCA should include the five steps shown in Table 1 and illustrated in Figure 1.

Table 1 – Steps to RCA

Step                      | Concepts and tasks to be performed
Initiation                | Determine the need to carry out RCA and define the purpose and scope
Establishing Facts        | Collect data and establish the facts of what happened, where, when and who
Analysis                  | Use RCA tools and techniques to ascertain how and why the focus event occurred
Validation                | Distinguish and resolve the different possibilities as to how and why the focus event was caused
Presentation of Results   | Present the results of the focus event analysis

RCA is iterative in nature, especially the data collection and analysis: data is collected on 'what' happened, which is then analysed in order to determine what other data needs to be collected. Once gathered, further analysis is conducted and any gaps identified, for which further data is collected. This process is repeated until the purpose of the analysis is fulfilled and the root causes identified. The outputs of the RCA will be dependent on the purpose and scope.
[Figure: flow of the five RCA steps. Step 1: Initiation (establish the need, purpose and scope); Step 2: Establishing Facts (what, where, when and who; immediate facts; data collection); Step 3: Analysis (how and why; application of specific tools; identify potential causes); Step 4: Validation (potential causes distinguished and resolved; further data and testing); Step 5: Presentation of Results (outputs).]

Figure 1 – RCA process
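As an illustration only (not part of this draft), the iterative relationship between Step 2 (establishing facts) and Step 3 (analysis) described above can be sketched as a simple loop; the helper functions and the example facts below are invented placeholders, not prescribed behaviour.

```python
# Illustrative sketch of the iterative facts/analysis loop (Steps 2 and 3); not part of IEC 62740.
# Plain strings stand in for real evidence, causes and open questions.

def analyse(facts):
    """Toy analysis step: returns candidate causes and any gaps needing more data."""
    causes = [f"cause inferred from: {fact}" for fact in facts]
    gaps = [] if len(facts) >= 3 else ["need maintenance records"]
    return causes, gaps


def collect_further_data(gaps):
    """Toy data-collection step: each open question yields one further fact."""
    return [f"fact answering: {gap}" for gap in gaps]


facts = ["alarm log entry", "operator statement"]   # Step 2: initial facts
while True:
    causes, gaps = analyse(facts)                   # Step 3: analysis of the facts so far
    if not gaps:                                    # purpose fulfilled and no open gaps
        break
    facts += collect_further_data(gaps)             # iterate: gather further data, re-analyse
print(causes)                                       # Steps 4 and 5 (validation, presentation) follow
```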
5.2 Initiation
The first step initiates the RCA process by evaluating the need to carry out RCA. It defines the purpose and scope of the analysis, in response to the focus event, and establishes a team and resources to carry out the RCA.
There is usually some criterion which is used to determine when an RCA is required, which may include:
• any single event with a large effect;
• multiple similar undesirable events;
• a parameter moving out of a defined tolerance level;
• failures or successes (whatever the level of effect) that involve critical items of equipment or activities.

When defining the type of events that require the conduct of RCA, it is important to consider that an event with a large effect may have common root causes with events with minor effects. Analysing and addressing root causes for events with minor effects may prevent a large effect event occurring. Examples of events where RCA may be required include: completion of a project (successes and failures), failures that result in unacceptable costs, injury or fatality, unacceptable performance or delays, large contractual consequences, and equipment breakdown.

If an RCA is required, the focus event(s) to be analysed is/are described and an appropriate team appointed for the analysis. The description should include the background and context in which the focus event(s) occurred. A good description of a focus event is short, simple and easy to understand, and should not be biased towards a specific solution. This description is used to identify appropriate members of the analysis team and to ascertain where to start collecting data.

The purpose and scope of the RCA should be determined taking into account knowledge of the system, its functions, interfaces, etc. In some cases the purpose of the analysis is to identify the causes of the focus event; in others the purpose may be broader, for example to also identify matters of concern outside those that led to the focus event.
There are in general two different types of RCA that have different objectives and should not be mixed up. These two types can be described as follows:
• analysis of an observed focus event from other observed events;
• analysis of an observed focus event from potential events, not actually observed.

The first type focuses on observed facts only. It may be an analysis "per se" according to the purpose of the study, and no hypothesis about event occurrence is acceptable for this analysis. The second type focuses on potential (non-observed) events which may have led to the actually observed focus event. It can be implemented when more facts are needed and the use of hypotheses on the occurrence of events is acceptable for the analysis.

The outputs required of the RCA should also be identified; examples are as follows:
• provide a description of each root cause along with sufficient background information to allow the identification of suitable actions;
• recommend actions that, taken together, prevent further occurrences of similar events with adverse consequences and improve the likelihood of successes;
• identify, implement and review actions to address root causes.

RCA can include the analysis of open systems (systems which continuously interact with their environment); this interaction can take the form of information, energy or material transfer. Therefore, the scope should specify the boundary of the analysis (what is included and what is excluded).

The scope of the analysis should, where possible, include a definition of the 'stopping rule', which is the point at which action can be defined or additional proof of cause is no longer necessary for the purpose of the analysis, for example the last point where corrective action can be identified, before factors that cannot be influenced (e.g. the weather). It may, however, be more appropriate to ascertain the stopping rule at review points that determine whether further analysis is required.
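As an illustration only (not part of this draft), the following sketch shows one possible stopping rule applied while tracing back along a chain of necessary causal factors: stop at the last factor for which a corrective action can still be defined, before factors that cannot be influenced (such as the weather). The attribute names and the example chain are assumptions.

```python
# Illustrative stopping-rule sketch; not part of IEC 62740. Attribute names are assumptions.
from __future__ import annotations
from dataclasses import dataclass
from typing import Optional


@dataclass
class Factor:
    description: str
    action_possible: bool      # can a corrective action still be defined for this factor?
    within_influence: bool     # can the organisation influence this factor?
    predecessor: Optional[Factor] = None


def stop_here(factor: Factor) -> bool:
    """Example stopping rule: stop where no action can be defined or the factor
    cannot be influenced (e.g. the weather)."""
    return (not factor.action_possible) or (not factor.within_influence)


def trace_to_root(direct_cause: Factor) -> Factor:
    """Walk back along necessary causal factors until the stopping rule is invoked."""
    current = direct_cause
    while current.predecessor is not None and not stop_here(current.predecessor):
        current = current.predecessor
    return current             # treated as a root cause for the purpose of this analysis


weather = Factor("Severe storm", action_possible=False, within_influence=False)
no_procedure = Factor("No severe-weather shutdown procedure", True, True, predecessor=weather)
overload = Factor("Line overload caused by storm damage", True, True, predecessor=no_procedure)
print(trace_to_root(overload).description)   # -> No severe-weather shutdown procedure
```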
The team members for the analysis should be selected based on the specific expertise needed to analyse the focus event. The team should include:
• a person or persons who between them can provide a complete systems overview and knowledge of the programme or project and the focus event;
• a facilitator skilled in the causal analysis technique, desirably trained or experienced in the facilitation of the causal analysis technique;
• at least one member experienced in the causal analysis technique being applied.

Team members can change depending on the activity being conducted, e.g. team members responsible for data collection will not necessarily be the same as those conducting the analysis. Team members can include engineers, technicians, operators, sales representatives, managers, etc. The use of external parties should be considered to provide an independent perspective and avoid blind spots that may exist in the organisation. Additional team members should be included for specific activities during the investigation, either to bring expertise into the team or to increase the influence of the team. The role of these additional team members is to support the investigation so that it is not halted for technical or organisational boundary reasons. It is not appropriate for persons who may have had a role in causation of the focus event to be part of the team; their input should be collected during the first two steps.

RCA can be effectively carried out by one person provided that person is experienced with the causal analysis technique, is a domain expert (or has immediate access to domain experts) and has access to all the data required. However, it is more common to conduct RCA as a team.
5.3 Establishing facts
This step includes all the activities necessary to prepare for the analysis. Establishing the facts is usually the largest part of the RCA. Facts should be determined on ‘what’ happened, ‘where’, ‘when’ and ‘by whom’.
Data should be collected before it is lost (e.g. before evidence is disturbed or removed, or memories fade). Data should aim to identify:
• the context in which the focus event occurred;
• the conditions before, during and after the focus event;
• personnel involvement, including actions taken (or not taken) and decisions made;
• context data about the surroundings, including environmental data;
• how the organisation operates, including organisation charts, processes and procedures, and training and skills data;
• historical data relating to similar events or precursors;
• deviations from the expected;
• interactions with other items and personnel;
• equipment involved and its operating state.

The following lists examples of data that may be relevant.
• Inspection of physical evidence such as failed components and failure reports. Generally, it is experience that will determine what physical evidence is required. If there is doubt, the evidence should be retained; it is also important to preserve the evidence.
• Photographs and videos taken by monitoring cameras. Photographing the area of the occurrence from several views will also be useful in the analysis phase.
• Operational data, recorded by monitoring systems, control systems, alarm and event loggers, etc. Operator logs can be critical to understanding the operating conditions at the time of failure and, since they are typically dated, they are ideal for generating a timeline of events (see the illustrative sketch after this list).
• Personal testimony gathered by conducting interviews. Interviews should concentrate on data collection, e.g. building a consistent timeline; any premature discussion of the causes of the failure may adversely impact the interview process. Questions should be prepared before the interview to ensure that all necessary information is obtained. Interviews should be conducted with the people who are most familiar with the focus event; however, consider interviewing other personnel, e.g. people who have performed the job in the past. All interviews should be documented.
• Documentary evidence of relevant procedures, operating environment and regulatory environment.
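As an illustration only (not part of this draft), time-stamped records from different sources can be merged into a first timeline of events with a simple sort; the record formats and example entries below are invented, and real logging systems will differ.

```python
# Illustrative timeline sketch; not part of IEC 62740. Record contents are invented examples.
from datetime import datetime

alarm_log = [("2014-01-12 03:41", "High-temperature alarm, pump P-101")]
operator_log = [("2014-01-12 03:39", "Operator notes unusual vibration"),
                ("2014-01-12 03:47", "Pump P-101 tripped")]


def build_timeline(*sources):
    """Merge time-stamped records from several sources into one ordered timeline."""
    records = [rec for source in sources for rec in source]
    return sorted(records, key=lambda rec: datetime.strptime(rec[0], "%Y-%m-%d %H:%M"))


for timestamp, entry in build_timeline(alarm_log, operator_log):
    print(timestamp, "-", entry)
```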
Once all the data associated with the focus event has been collected, the data should be reviewed for correctness and suitability; missing data should be obtained and any inconsistencies should be resolved to ensure a clear and consistent picture of the focus event is determined.

This step can include failure analysis, which examines failed components using a wide array of methods including microscopy, spectroscopy and non-destructive testing, or models of the development of failure such as fire modelling or crash modelling.
The outcome of this step is information and understanding, supported by physical evidence and witness statements, concerning:
• what happened, including the circumstances that led to the focus event;
• the time sequence of events which led to the focus event;
• the location of the focus event;
• actions of people associated with the focus event;
• any necessary conditions for the focus event;
• the consequences of the focus event.
5.4 Analysis

5.4.1 Description
Having determined ‘what’ happened, ‘where’, ‘when’ and ‘by whom’, this step establishes ‘how’ and ‘why’ the focus event occurred. The objective of this step is to understand the focus event and its causes by structuring the data that has been collected into a form that allows root causes to be systematically derived.
RCA normally analyses facts to identify the causal factors that were necessary for the focus event to occur. However, in some cases, for example where sufficient facts are not available, the analysis may propose one or more hypotheses for cause and may also identify contributing factors and prevailing conditions which were possibly associated with the focus event, but cannot be proven to be necessary causes.

Analysis involves the following.
• Organising the physical evidence and witness statements concerning actions, events, conditions and outcomes.
• Seeking how and why the focus event occurred, using the data collected to justify conclusions. Models of causation, laboratory testing, check lists and taxonomies or statistical analysis of data may be used to assist this process.
• Looking beyond direct causes to why those causes occurred. The aim is to seek all factors (and conditions) that contributed, not only the immediate or obvious causes.
• Continuing this process until the stopping rule is invoked and the root causes identified. There may be multiple root causes, which can be independent or interrelated.

In general, causes may involve technical issues, human aspects and factors relating to the physical or psycho-social environment in which the focus event occurred. All of these should be considered in seeking root causes. People with expertise in these areas should therefore be involved in the analysis.
Causes should be described clearly and unambiguously. Where a human action, omission or decision is identified as a cause, the nature of the act or decision should be specified (e.g. 'the operator switched off the wrong power switch') rather than simply recording 'human error'.
The analysis of cause (depending on the purpose and scope of the analysis) can consider:
• how the focus event occurred, i.e. the physical, chemical, psychological or logical process that was involved;
• preceding events or conditions that were necessary to the focus event occurring;
• relationships between causal factors, including how they combined to cause the focus event and how a root cause leads to the focus event;
• organisational and management influences and human factors that were involved in the focus event or in the events and conditions leading up to it;
• prevailing conditions that may have contributed to the event occurring but were not necessary causes;
• matters of concern that could lead to other focus events (these are not strictly causes but may be an outcome of the analysis).
A structured analysis technique should be used to perform the analysis. Several formal techniques exist, ranging from those that are based on a checklist of causes to techniques that guide the analyst through consideration of causes and graphically present the results. The techniques range from simple to complex and require suitably skilled practitioners or facilitators to conduct the analysis. Some techniques are based on particular models of how a focus event occurs and hence give a particular emphasis to the results. The different models are based on different hypotheses with regard to causation and therefore tend to lead the investigator to identify different contributory factors.

In some cases it is appropriate to use more than one technique, or to take into account considerations of more than one model, to identify all root causes.

Models of causation are described in Annex B and analysis techniques are described in Annex C. The most appropriate technique will be dependent on the focus event and the purpose and scope of the analysis (see Clause 6).

The analysis may indicate that further data is required. Requests for such data should be expected to occur throughout the analysis to resolve inconsistencies or complete gaps in the analysis. The analysis should continue until a 'stopping rule' is invoked.
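As an illustration only (not part of this draft), the following sketch shows one simple way of keeping traceability between each candidate cause and the data collected to justify it, and of flagging hypotheses that still lack supporting evidence and therefore call for further data collection or rejection; the example evidence and causes are invented.

```python
# Illustrative traceability sketch; not part of IEC 62740. Example content is invented.
# Each candidate cause is linked to the evidence items that justify it; causes with no
# supporting evidence are flagged as needing further data collection (see 5.4.1).

evidence = {
    "E1": "Alarm log shows high temperature at 03:41",
    "E2": "Maintenance record: inspection overdue by 6 months",
}

candidate_causes = {
    "Bearing overheated": ["E1"],
    "Inspection interval not enforced": ["E2"],
    "Lubricant out of specification": [],          # hypothesis with no evidence yet
}

for cause, refs in candidate_causes.items():
    if refs:
        cited = ", ".join(evidence[ref] for ref in refs)
        print(f"{cause}: supported by {cited}")
    else:
        print(f"{cause}: NOT yet supported - collect further data or discard")
```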
5.4.2 The analysis team
A leader should be appointed for the analysis step, who is responsible for the following preparatory work.
• Obtaining copies of the agreed role and responsibilities of the team, and the purpose and scope of the analysis.
• Obtaining copies of the focus event description and the facts established.
• Deciding the analysis technique(s) to be used.
• Converting the focus event description and the facts established into a suitable format for use in the analysis technique selected.
• Developing an analysis plan.
• Forming the analysis team.
• Facilitating or arranging for training of team members in the analysis technique selected.
• Selecting software tools or other templates for use during the analysis.
• Arranging for a search to be made of databases, media, legal proceedings, etc. to identify focus events of a similar nature, or which may have occurred with the same or similar technologies.
The leader should review the information available to determine what analysis technique(s) should be applied and what skills are required. Expert advice in the field of RCA may need to be obtained regarding the selection of the analysis technique. The leader may also require an expert RCA facilitator for all or part of the analysis, depending on the complexity of the focus event, the complexity or volume of evidence and data, or the analysis technique selected.

The analysis is usually carried out by a team, with each team member being chosen for their experience and skills. The analysis team should be as small as possible, consistent with the relevant technical and operating skills and experience being available. Where input will be required from multiple parties, stakeholders or entities, the analysis team should contain representatives of each. It is the leader's responsibility to ensure the relevant stakeholders are informed, so that adequate stakeholder representation is available during the analysis.

The roles and responsibilities of the analysis team members should be determined and milestones established at the outset of the analysis. A programme of meetings should be developed which reflects the objectives and milestones provided to the analysis team. This will ultimately enable any recommendations to be carried out in a timely fashion.
The leader should develop an analysis plan, which should contain the following (an illustrative record structure is sketched after this list):
• focus event description;
• agreed roles and responsibilities of the team, and the purpose and scope of the analysis;
• a list of team members and the stakeholders to be represented;
• time, date and location of the analysis meetings;
• a summary of the data available;
• analysis technique(s) to be used;
• arrangements for training the analysis team in the selected analysis technique (if required);
• the form of recording of the analysis and the analysis results, including reference to any templates or software tools to be used.
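As an illustration only (not part of this draft), an analysis plan with the contents listed above can be captured as a simple structured record; the field names and example values are assumptions, not requirements.

```python
# Illustrative sketch of an analysis plan record; not part of IEC 62740.
# Field names mirror the bulleted contents listed above but are otherwise assumptions.
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class AnalysisPlan:
    focus_event_description: str
    roles_and_responsibilities: str
    purpose_and_scope: str
    team_members: list[str] = field(default_factory=list)
    stakeholders_represented: list[str] = field(default_factory=list)
    meetings: list[str] = field(default_factory=list)       # time, date and location
    data_available: str = ""
    techniques: list[str] = field(default_factory=list)
    training_arrangements: str = "none required"
    recording_format: str = "cause tree template"           # templates / software tools


plan = AnalysisPlan(
    focus_event_description="Coolant leak on pump P-101",
    roles_and_responsibilities="Leader: J. Smith; recorder: A. Jones",
    purpose_and_scope="Identify root causes; plant systems only",
    team_members=["process engineer", "maintenance technician", "facilitator"],
    techniques=["Causes Tree Method (CTM)"],
)
print(plan.techniques)
```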
Adequate room facilities with visual and recording aids should be arranged by the leader for the efficient conduct of the analysis meetings. A briefing package consisting of the analysis plan and any essential pre-meeting reading or references should be sent to the analysis team members in advance of the first meeting, to allow them to familiarise themselves with the information available and the selected analysis technique. At the first analysis meeting a physical review of the focus event via a re-enactment, mock-up, computer simulation or scale model is desirable.

The leader should ensure that an appropriate communication system is in place for informing and transferring the results of the analysis to those responsible for the next step of the RCA process (see 5.5).
5.5 Validation

A number of review activities are conducted throughout the RCA process to determine whether data collected is relevant and the analysis is representative of the data collected. This step tests whether the causes identified in the analysis can be substantiated and may be interleaved with the analysis or conducted as a separate activity.
An independent review can be carried out to assess whether the analysis is complete and correct and to determine whether the purpose of the analysis has been fulfilled. The review method will be dependent on the analysis technique used and on the focus event. In some cases experiments can be performed to repeat the occurrence of the focus event; where appropriate, statistical methods should be used to assess the degree of confirmation of the hypothesis of cause. If the causes are validated with the help of simulation, care should be taken to ensure that the simulation is true in the causally-relevant respects.

During the analysis there may be several alternative hypotheses of how the event could have happened; at the completion of the analysis the causes should be validated and only a single hypothesis should remain.

This step may require further data collection to seek direct evidence to support or refute the causes identified. Evidence may not always be available to fully validate all potential causes.
5.6 Presentation of results
The results of the analysis will depend on the purpose of the analysis. For example, if the purpose of the analysis is to identify actions that, taken together, prevent further occurrences of similar events, the analysis results should identify corrective actions which break the causal network and prevent the focus event occurring again. If the purpose of the analysis is to repeat successes, then actions that enhance the likelihood or the consequences of the focus event should be proposed. The effectiveness of the analysis results is dependent on the quality of the analysis.
An agreed format for presenting the results of the RCA should be developed that summarises the analysis and captures the required outcomes from the analysis, e.g. recommended actions. If recommended actions are the required outcome from the RCA, the summary should include the following as a minimum:
• a general description of each cause and contributory environmental conditions requiring action, along with sufficient background information and detail to allow those who will make recommendations and implement them to understand the need to address each cause and to justify the actions to be taken;
• a set of options for treatment actions and, where practicable and within scope, a summary of the benefits and costs of each;
• recommended actions to address each of the causes identified.
Recommended corrective actions should be evaluated for effectiveness and realism. In general, corrective actions aim to achieve the following:
• change the likelihood of the focus event and/or its consequences (i.e. reduce the likelihood or consequence of undesirable events, or increase the likelihood or consequence of successful events);
• not introduce new unacceptable risks, e.g. the safety of other systems must not be degraded by the proposed corrective action.

Where actions are identified, they should be reviewed prior to implementation to determine that they have not only addressed the root causes but also have not introduced new unexpected consequences and will therefore function as intended. Recurrence of the same or similar events should also be monitored; if such an event reoccurs, the original focus event should be re-evaluated to determine why the actions were not effective.
6 Selection of techniques for analysing causes

6.1 General

Formal techniques have been devised to help analysts identify causes and eventually root causes. Analysis techniques may perform one or more of the following functions:
• organise data and provide structure to the analysis, and identify where further evidence is needed;
• provide a visual display of the evidence relating to the focus event, for example the time sequence of events or causal chains;
• conduct statistical analysis of the data, particularly from multiple similar events, to identify common causes;
• provide guidance to identify possible causes for further investigation and comparison with data (such methods include check lists and methods based on models of causation);
• guide the analysts through multiple levels of cause to a set of root causes.
6.2 Selection of analysis techniques
RCA is undertaken in varying degrees of depth and may use one or several analysis techniques, ranging from simple to complex. The depth of the analysis and the technique(s) used should exhibit the following characteristics:
• be justifiable and appropriate to the focus event under analysis and the scope and purpose of the analysis;
• provide results that enhance the understanding of the root causes of the focus event;
• be capable of use in a manner that is traceable, repeatable and verifiable.
The analysis techniques to be used are selected based on applicable factors such as:
• characteristics of the analysis technique;
• characteristics of the focus event, e.g. severity, potential severity or complexity;
• characteristics of the organisation, e.g. industry/sector approved techniques or cost-benefit evaluation;
• purpose of the analysis, e.g. outputs required or stakeholder expectations;
• the causation model or models most appropriate to the objectives of the analysis.
652 653
The attributes of the most commonly used analysis techniques are described at Annex A. The criteria used to characterise the techniques, described in Annex A, are as follows:
- expertise required;
- tool support;
- scalability;
- graphical representation;
- modularity;
- reproducibility;
- plausibility checks;
- intellectual rigour;
- time sequence;
- specificity.

The analysis techniques are described in Annex C, which includes the methods and process used for each technique along with their strengths and weaknesses.
6.3 Useful tools to assist RCA
Modern data mining techniques enable a search for specific properties and conditions. Cluster analysis groups data points that are closely related in one, two or more dimensions, so that products or processes which behave similarly can be analysed together and deviating data points (outliers) can be identified. An overview of these techniques is provided in Annex D.
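As a minimal illustration of the kind of tool referred to here (not part of this standard), the sketch below uses scikit-learn's DBSCAN, which labels points that do not belong to any dense cluster as outliers; the data and parameter values are invented for the example.

```python
# Minimal clustering/outlier sketch using scikit-learn's DBSCAN.
# Points labelled -1 do not belong to any dense cluster and are
# candidate outliers worth investigating in an RCA data set.
import numpy as np
from sklearn.cluster import DBSCAN

# Invented example data: (cycle time in s, reject rate in %) per production run
runs = np.array([
    [10.1, 0.5], [10.3, 0.6], [9.9, 0.4], [10.0, 0.5],
    [10.2, 0.5], [14.8, 3.2],   # the last run deviates in both dimensions
])

labels = DBSCAN(eps=0.8, min_samples=3).fit_predict(runs)
outliers = runs[labels == -1]
print("Outlying runs:", outliers)   # expected to flag the deviating run
```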
Many focus events and analysis techniques involve human factors and several taxonomies have been developed to assist in finding root causes where human behaviour is involved. Two examples are given in Annex E.
Annex A (informative)

Summary and criteria of commonly used RCA techniques

A.1 General
This Annex lists the most commonly used RCA techniques, with a brief description, and provides a reference list of criteria which can be used to compare different RCA techniques. The list is not comprehensive but covers examples of the different types of techniques used.
A.2 RCA techniques

Table A.1 provides a list and brief description of the most commonly used RCA techniques.

Table A.1 – Brief description of RCA techniques
Events and Causal Factors (ECF) Charting: ECF analysis identifies the time sequence of a series of tasks and/or actions and the surrounding conditions leading to a focus event. These are displayed in a cause-effects diagram.

Multilinear Events Sequencing (MES) and Sequentially Timed Events Plotting (STEP): MES and STEP are methods of data-gathering and tracking for the analysis of complex focus events. The results are displayed as a Time-Actor Matrix of events.

The ‘Why’ Method: The ‘Why’ Method is a simple series of questions that guides the analysis to the root causes.

Causes Tree Method (CTM): CTM is a systematic technique for analysing and graphically depicting the events and conditions that contributed to a focus event. CTM is similar to the why method in concept but builds a more complex tree and explicitly considers technical, organisational, human and environmental causes.

Why-Because Analysis (WBA): WBA establishes the network of causal factors responsible for a focus event using a two-factor comparison, the Counterfactual Test. The network of factors is displayed in a Why-Because Graph.

Fault Tree and Success Tree Method: A Fault or Success Tree is a graphic display of information to aid the user in conducting a deductive analysis to determine critical paths to success or failure, which are displayed graphically in a logic tree diagram.

Fishbone or Ishikawa Diagram: The Fishbone or Ishikawa Diagram is a technique that helps identify, analyse and present the possible causes of a focus event. The technique illustrates the relationship between the focus event and all the factors that may influence it.

Safety through Organisational Learning (SOL): SOL is a checklist-driven analysis tool, oriented towards focus events in nuclear power stations. Results are presented in the visual form of a Time-Actor Diagram, derived from the MES/STEP method.

Management Oversight and Risk Tree (MORT): The MORT chart is a pre-populated fault tree with events, usually faults or oversights, expressed in generic terms. The MORT tree contains two main branches and many sub-branches giving a high level of detail. One main branch identifies about 130 specific control factors while the other identifies over 100 management system factors. The chart also contains a further 30 information system factors common to both main branches of the tree.

AcciMaps: AcciMaps is primarily a technique for displaying the results of a causal analysis. It requires an organisational model to separate factors into layers and to elicit factors in the layers, and it applies a version of the Counterfactual Test (see WBA) to determine the causal relations amongst the factors.

Tripod: Tripod is a tree diagram representation of the causal network, focusing on human factors and looking for failures in the organisation that can cause human errors.

Causal Analysis for Systems Theoretic Accident Model and Process (STAMP) (CAST): CAST is a technique that examines the entire socio-technical process involved in a focus event. CAST documents the dynamic process leading to the focus event, including the socio-technical control structure as well as the constraints that were violated at each level of the control structure.
A.3 Criteria
Table A.2 lists and describes the criteria used to characterise the RCA techniques listed in Table A.1. Each criterion has three levels, indicated by (+), (o) or (-), which together indicate the range for that criterion.
The attributes for each RCA technique using the criteria in Table A.2 are shown in Table A.3.
Table A.2 – Summary of RCA technique criteria

Expertise required
Description: Is the method targeted towards the "sophisticated user" (does it require the use of techniques such as theorem proving, which require specific expertise)? Is it suitable for use by domain experts only?
Levels:
- Intuitive, little training necessary (+).
- Limited training required, e.g. one day (o).
- Considerable training effort necessary, e.g. one week (-).

Tool support
Description: Is tool support necessary?
Levels:
- Can be well applied without dedicated tool support, e.g. Office type tools (+).
- Tool support not required but usually needed for effective application (o).
- Tool support necessary; can be applied only with dedicated tool support (-).

Scalability
Description: Is the method scalable? Can the method be used cost-effectively for simple as well as complex focus events? Can a subset of the method be applied to small or less-significant focus events and the full capability applied to large or significant focus events? The question of scalability asks whether the complexity of analysis using the method scales with the complexity of the focus event.
Levels:
- Scales well with complexity (+).
- Limited scalability, considerable overhead with every application (o).
- Not scalable, the full method has to be applied (-).

Graphical representation
Description: What is the nature of the method's graphical representation? The motivating principle is that a picture is better than a thousand words: it is often more comprehensible to display the results of an analysis as an image, a graph or another form of illustration than as purely written text. The desirable properties of a graphical representation are: to display clearly the semantics of causality (including denotation of causal factors and taxonomic classification of factors); to be cognitively (relatively) easily evaluated by a single person; and, ideally, to display the history of the analysis.
Levels:
- Graphical representation with clearly defined semantics and cognitively easy to understand (+).
- Graphical representation, but without semantics (o).
- No graphical representation defined (-).

Reproducibility
Description: Are the results of the method reproducible? Would different analysts obtain similar results for the same focus event?
Levels:
- The results can be reproduced; differences are only observed in the representation of the results, wording etc. (+).
- A significant amount of the results can be reproduced, but some differences will be observed (o).
- The results will depend on the analyst and their expertise (-).

Plausibility checks
Description: Are there reasonable, quick plausibility checks on the results obtained which are independent of the tool? What ways are there of checking the "correctness" of the results? One example would be checklists.
Levels:
- There are plausibility checks for almost all aspects (+).
- There are plausibility checks, e.g. checklists, but they do not necessarily cover all aspects (o).
- There exist only limited means supporting plausibility checks (-).

Intellectual rigour
Description: How rigorous is the method? Rigour has two relevant aspects: does the method have a rigorous meaning (formal semantics) for the key notions of cause, causal factor and root cause, and is that semantics easy to apply? Are the results of the method amenable to formal (mathematical) verification, and to what extent is an application of the method so amenable?
Levels:
- Formally defined and can be formally verified (+).
- Semi-formal definition (o).
- Informal definition (-).

Time sequence
Description: Does the method contain a representation of the time sequence of events?
Levels:
- Yes (+).
- Only indirectly (o).
- No (-).

Specificity
Description: The extent to which the method limits analysis to necessary causal factors of the focus event rather than exploring a range of general problems with the system that existed at the time of the focus event and may have contributed.
Levels:
- Method analyses specific necessary causes of the focus event (+).
- Method can be used to analyse generic problems as well as the necessary causes of the focus event (o).
- Method seeks problems in general, whether or not they were necessary causes of the focus event (-).
Table A.3 – Attributes of the generic RCA techniques

Technique | Expertise Required | Tool Support | Scalability | Graphical Representation | Reproducibility | Plausibility Checks | Intellectual Rigour | Time Sequence | Specificity
ECF | o | o | o | + | o | o | o | - | +
MES and STEP | - | o | o | + | + | o | o | + | +
The ‘Why’ Method | + | + | - | o | - | - | - | - | +
CTM | o | o | o | + | o | o | o | - | +
WBA | o | + | o | + | + | + | + | o | +
Fault Tree and Success Tree Method | o | o | o | + | o | o | o | - | o
Fishbone or Ishikawa Diagram | + | + | - | o | - | o | - | - | o
SOL | o | - | + | o | + | + | o | + | o
MORT | + | - | - | o | + | o | o | - | -
AcciMaps | o | o | o | + | - | o | - | - | o
Tripod Beta | - | + | o | + | o | o | o | o | o
CAST | + | + | + | o | o | o | o | + | +

The criteria for each attribute are described in Table A.2.
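Clause 6.2 selects techniques against criteria such as these. As an informal illustration only (not part of this standard), the sketch below encodes a few rows of Table A.3 and filters the techniques that meet a set of desired attribute levels; the encoding and the selection rule are assumptions made for the example.

```python
# Illustrative filter over a few Table A.3 rows (values copied from the table).
# Levels are encoded as +1 (+), 0 (o) and -1 (-) so "at least" comparisons work.
LEVEL = {"+": 1, "o": 0, "-": -1}

TECHNIQUES = {
    "WBA":              {"expertise": "o", "tool_support": "+", "scalability": "o",
                         "graphical": "+", "reproducibility": "+", "time_sequence": "o"},
    "MES and STEP":     {"expertise": "-", "tool_support": "o", "scalability": "o",
                         "graphical": "+", "reproducibility": "+", "time_sequence": "+"},
    "The 'Why' Method": {"expertise": "+", "tool_support": "+", "scalability": "-",
                         "graphical": "o", "reproducibility": "-", "time_sequence": "-"},
}

def candidates(requirements: dict) -> list:
    """Return techniques whose attributes are at least the required level."""
    return [name for name, attrs in TECHNIQUES.items()
            if all(LEVEL[attrs[k]] >= LEVEL[v] for k, v in requirements.items())]

# Example: an analysis needing good reproducibility and some time-sequence support
print(candidates({"reproducibility": "+", "time_sequence": "o"}))
# ['WBA', 'MES and STEP']
```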
Annex B (informative)

RCA models

B.1 General
This Annex describes the most commonly used RCA models, which provide different ways of thinking about focus events. The different models are based on certain hypotheses about the focus event, e.g. Barrier Analysis assumes the focus event has occurred as a result of missing, defective or ineffective barriers. Therefore, the different models tend to lead the investigator to identify different causal factors. Models are used to direct thinking in conjunction with the techniques of Annex C, or simply to identify a set of causes.
B.2 Barrier Analysis

B.2.1 Overview
Barrier analysis is based on the hypothesis that a focus event occurs as a result of a source of harm acting on a target and that this can be prevented by the use of barriers [3]. An undesirable event occurs when the barriers are missing, defective or ineffective (see Figure B.1).
Figure B.1 – Broken, ineffective and missing barriers causing the focus event
Haddon [3] considered focus events where the source of harm is physical energy and barriers relate to how the energy can be modified or prevented from impinging on the target. The model has been extended in various ways [4]; for example, barriers are often divided into physical barriers and administrative barriers (see Table B.1 for some examples). Barriers may also be considered in terms of prevention, protection and detection (for example, where the focus event is a fire these would be using non-flammable materials, providing fire extinguishers and installing smoke alarms).
The output of the analysis generally includes a Barrier Analysis Worksheet (see Table B.2), which identifies those barriers that either were available but ineffective or were not in place during the occurrence of the focus event.
Table B.1 – Examples of barriers

Physical or energy barriers: engineered safety features; safety and relief devices; conservative design allowances; redundant equipment; locked doors and valves; ground fault protection devices; shielding and guards; alarms; automatic fire containment systems.

Administrative barriers: plant operating and maintenance procedures; regulations, policies and practices; training and education; work protection; work permits; skilled people; methods of communication (3-way communication); supervisory practices.

Table B.2 – Example Barrier Analysis Worksheet

Undesired outcome (what happened?): Maintenance worker loosened nuts on the flange of a pipe line that was pressurised.
Source of harm (list one at a time; need not be in sequential order): Pressurised liquid.
Barrier(s) that should have precluded the undesired event (list all physical and administrative barriers for each undesired outcome): Procedure to switch off the pump and release pressure before commencing work.
Barrier failure mechanism (how the barrier failed): Pressure released on the wrong system.
Barrier assessment (why the barrier(s) failed; identify whether the barrier was missing, weak or ineffective, and why): Unclear labelling.
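As a purely illustrative aid (not part of this standard), the worksheet rows of Table B.2 can be held in a simple record so that recurring barrier weaknesses can be tallied across events; the class and field names below are assumptions for the example.

```python
# Illustrative representation of one Barrier Analysis Worksheet row (Table B.2).
from dataclasses import dataclass
from collections import Counter

@dataclass
class BarrierRecord:
    undesired_outcome: str
    source_of_harm: str
    barrier: str                 # barrier that should have precluded the event
    failure_mechanism: str       # how the barrier failed
    assessment: str              # missing, weak or ineffective, and why

rows = [
    BarrierRecord(
        undesired_outcome="Maintenance worker loosened nuts on a pressurised flange",
        source_of_harm="Pressurised liquid",
        barrier="Procedure to switch off pump and release pressure before work",
        failure_mechanism="Pressure released on wrong system",
        assessment="Ineffective: unclear labelling",
    ),
]

# Tally failed barriers by assessment category to see recurring weaknesses
print(Counter(r.assessment.split(":")[0] for r in rows))   # Counter({'Ineffective': 1})
```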
B.2.2 Strengths and limitations

The strengths of Barrier Analysis are as follows:
- identifies what corrective actions are required to ensure adequate barriers (number and effectiveness) are in place.
The limitations of Barrier Analysis are as follows:
- may not recognise all failed or missing barriers, or the effect of the frequency with which the barriers are challenged;
- addresses direct causes rather than root causes, i.e. it seeks what barrier failed and how, but does not explore why in any depth.
B.3 Reason's Model (Swiss Cheese Model)

B.3.1 Overview
Reason’s model [5] is based on the premise that the basic required elements of any productive system are:
- appropriate decisions from plant and corporate management;
- line management activities (operations, maintenance, training etc.);
- reliable and fit-for-use equipment;
- a motivated workforce;
- integration of human and mechanical elements;
- safeguards against foreseeable risks.

There are inevitably weaknesses in these elements that can be considered to be latent failures. If these come together, a triggering event, which may be unimportant in other circumstances, results in failure.
The weaknesses in the elements of the productive system are pictured as holes in slices of Swiss cheese. An event will result when all individual weaknesses align. Reason’s model is not strictly a barrier model as the layers are normal operating systems with weaknesses rather than failed barriers or controls.
Human error taxonomies based on Reason's model have been developed for a number of different industries.
B.3.2 Strengths and limitations

The strengths of Reason's Model are as follows:
- encourages the analyst to explore causes of operator error and hence possible means of reducing it.
The limitations of Reason’s Model are as follows:
- superficial analysis of technical or environmental causes, which considers technical aspects only in terms of failed barriers;
- assumes the core problem is operator error (errors at other levels and organisational failures are explored primarily in terms of how they influence operator error);
- does not supply a taxonomy to assist with the identification of motivations and psychological precursors of human error, or with the identification of latent failures, and hence requires expertise in individual and organisational psychology to use properly.
B.4 Systems models
Systems theory [6] was developed in the 1940s and 1950s to handle the increase in complexity of systems after WWII and to consider the social and technical aspects of systems together as a whole.

In system models it is assumed that human interaction with technology in complex social structures is influenced by the organisation's goals, policy and culture and by internal and external economic, legal, political and environmental elements. This system is stressed by the fast pace of technological change, by an increasingly aggressive, competitive environment, and by factors such as changing regulatory practices and public pressure. In this context focus events are due to multiple factors and are typically 'waiting for release' rather than due to any one act or event.
Failures arise due to the complex interactions between system components that may lead to degradation of system performance. Two or more discrete events within system elements can interact in unexpected ways which designers could not predict and operators cannot comprehend or control without exhaustive modelling or test. Factors contributing to the focus event may include effects of decisions which are normal in the circumstances in which they were made but produce an unwanted outcome.
Methods based on a systems model do not seek a causal chain or look for individual error or technical failures but consider the system as a whole, its interactions and its weaknesses. Individual human or hardware failures may be recognised but the focus is on interactions and systemic issues.
B.5 Systems Theoretic Accident Model and Processes (STAMP)

B.5.1 Overview
STAMP [7] is a causality model based on systems theory [6] that extends the traditional model (chains of directly related failure events) to include both the technical and social contributors to focus events and their relationships. It also captures focus events involving interactions among non-failing system components and processes, indirect and systemic causal mechanisms, complex operator and managerial decision making, advanced technology such as digital systems and software, and system design flaws.
STAMP assumes incidents arise in interactions among humans, machines and the environment, and treats systems as dynamic control problems in which the controls aim to manage the interactions among the system components and their environment. The goal of the control is to enforce constraints on the behaviour of the system components; for example, aircraft in an air traffic control system must always maintain a minimum separation distance. Focus events result from inadequate control or enforcement of constraints on the development, design and operation of the system. In the Challenger loss, for example, the O-rings did not control propellant gas release through the field joint of the Space Shuttle. In STAMP, the cause of a focus event is a flawed control structure.
STAMP also incorporates the concept that incidents often arise from a slow migration of the whole system toward a state of high risk [8] so that financial and other pressures that lead to changing behaviour over time can be accounted for in the causal analysis process.
B.5.2 Strengths and limitations

The strengths of STAMP are as follows:
- considers the role of the entire socio-technical system in causation;
- includes indirect and systemic factors in the causal explanation;
- provides a model to explain accidents in very complex systems;
- identifies causes back to the process with which a system was developed.

The limitations of STAMP are as follows:
- requires focus events to be analysed in a way that is often unfamiliar to engineers, and it may therefore take more time to learn how to analyse focus events using causal analysis processes based on STAMP.
Annex C (informative)

Detailed description of RCA techniques

C.1 General
This Annex describes a range of techniques used during an RCA. The list is not comprehensive but covers examples of the different types of techniques used. Many of these techniques are supported by software tools. Some of the methodologies and software tools have elements that are proprietary, which may affect the cost of implementing the technique.

Some techniques aim to identify causes that can be shown to be necessary if the focus event is to occur. Other methods seek general weaknesses of the system as a whole that probably contributed to the focus event but where the focus event could have occurred in their absence. In some terminologies a "cause" cannot be so described unless it is necessary to the focus event. In this Annex such causes are referred to as "necessary causal factors". Identified weaknesses that may have played a part in the focus event but may not be necessary to it are referred to as "contributory factors".

In general, identification of necessary causal factors will be repeatable and based on evidence. There may be a higher level of subjectivity in identifying contributory factors, and different analysis techniques with a different focus may identify different factors.
C.2 Events and Causal Factors (ECF) Charting

C.2.1 Overview
The ECF chart [9] records events in chronological order from left to right in rectangles, with events characterised by single subjects and active verbs. Each event is derived strictly from the one before. Conditions necessary for the events are displayed in ovals above and below the sequence of events. (Conditions are states or circumstances rather than happenings). Events are connected by solid lines and conditions by dashed lines. Events and conditions based on evidence have a solid outline, whereas those that are presumptive have a dashed outline. There may be multiple or branching sequences of events each with their own conditions.
Figure C.1 illustrates an example of an ECF chart in which a maintenance activity was carried out incorrectly because the maintainer turned up late, resulting in an aircraft having to carry out an emergency landing.
Figure C.1 – Example of an ECF chart
C.2.2 Process
The following describes the process for developing an ECF chart.

1) Identify the focus event and record it in a box on the right hand side.

2) Record the primary chain of events that led to the focus event, where each event in the chain is both an immediate and a necessary cause of the event to its right; the consequence of each event (cause) is therefore recorded to its right, and the consequence of one event may at the same time be a cause of the following event. The events are displayed in rectangles linked by arrows leading towards the focus event.

3) Determine what conditions led to these events. State each of them in an oval above the relevant event.

4) Add any secondary chains of events that may be relevant to the focus event, together with their conditions.

5) Check the validity of the causes by obtaining evidence that determines whether the conditions and events are true.

6) Develop the ECF chart until the event at the start of the sequence is identified and all conditions which can be verified by evidence have been added.

Generally the exact chronology of events is not known at the beginning of the investigation and becomes clearer as the investigation proceeds. A method should therefore be used that allows investigators to easily change the sequence of events and conditions as more information is gained.
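The following Python sketch is one possible way (not prescribed by this standard) to hold events and conditions so that the sequence can be reordered as new evidence arrives; all class and field names are assumptions made for the illustration, and the sample events are taken loosely from the example above.

```python
# Minimal ECF-chart structure: events in (re-orderable) chronological order,
# each with the conditions that enabled it and a flag for presumptive items.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Condition:
    text: str
    presumptive: bool = False    # dashed outline in the chart: not yet evidenced

@dataclass
class Event:
    text: str                    # single subject + active verb
    time: str                    # kept as text; may be refined later
    conditions: List[Condition] = field(default_factory=list)
    presumptive: bool = False

chart: List[Event] = [
    Event("Maintainer turns up late for work", "09:15"),
    Event("Maintainer carries out inspection task on engine", "10:30",
          [Condition("Inspection equipment has not reached ambient temperature")]),
    Event("Aircraft carries out an emergency landing", "13:20"),
]

# Re-sequencing as the chronology becomes clearer is just a sort on time
chart.sort(key=lambda e: e.time)
print(" -> ".join(e.text for e in chart))
```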
C.2.3 Strengths and limitations
The strengths of ECF are as follows:
- assists the verification of causal chains and event sequences;
- provides a structure for collecting, organising and integrating evidence;
- identifies information gaps;
- assists communication by providing an effective visual aid that summarises key information regarding the focus event and its causes.

The limitations of ECF are as follows:
- identifies some causes but may not necessarily determine the root causes;
- can be overcomplicated for simple problems.
C.3 Multilinear Events Sequencing (MES) and Sequentially Timed Events Plotting (STEP)

C.3.1 Overview
MES [10] and STEP [11] are methods developed to analyse focus events in complex systems, where STEP is a successor to MES.
As with ECF charting, MES/STEP conceives a focus event as arising from an interlinked succession of events, each characterised by a single subject and an active verb. In MES and STEP the subject is called an actor (which may be human or machine, or even a property).

Events are represented as event building blocks (BBs), which consist of (partial or full) data records as described in Figure C.2. These are arranged during the analysis in a Time-Actor Matrix, where the vertical axis of the matrix represents the different actors and the horizontal axis represents time.
The Time-Actor Matrix also contains:
- conditions necessary for enabling an event, along with precursor events;
- annotations for further tasks in an investigation, such as a note indicating a deficit of information or an incomplete explanation of an event.

An example showing part of the representation of a tank maintenance event is given in Figure C.3.
The standardized MES building block elements are: ACTOR (the person, object or energy that did something); ACTION (what the actor did); OBJECT/DESCRIPTOR (additional data defining the action); LOCATION (where the action began); REMARKS (about the BB or the investigation); SOURCE (the source of the data used in creating the BB); BEGIN DATE/TIME; DURATION; END DATE/TIME; LINKS FROM/TO (links defining interactions among BBs); and N/S STATUS (necessary/sufficient input validation status). Data sources include witness reports, attribute changes, instrument records, tests, residues, movements, observations, statements, decisions, instructions, injuries, debris and others.

Figure C.2 – Data in an event building block
C.3.2 Process
MES/STEP has the following steps.
1) Gather information for the initial series of building blocks, and identify and track missing information.

2) Arrange the initial building blocks in an initial Time-Actor Matrix.

3) Use specific techniques to identify and generate hypotheses that 'fill' the gaps with events (in the form of further building blocks).

4) Terminate the process when the analyst considers that sufficient information is available in the Time-Actor Matrix.
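To make the building-block idea concrete, the sketch below (illustrative only; the names and fields are assumptions based on Figure C.2) stores a few building blocks and groups them by actor to form a minimal Time-Actor Matrix while tracking missing evidence.

```python
# Minimal event building block and Time-Actor Matrix grouping.
from dataclasses import dataclass
from collections import defaultdict
from typing import Optional

@dataclass
class BuildingBlock:
    actor: str                    # person, object or energy that did something
    action: str                   # what the actor did
    begin: str                    # begin date/time kept as text
    source: Optional[str] = None  # evidence source, None while still missing
    status: str = "reported"      # e.g. reported, hypothesized, inferred

blocks = [
    BuildingBlock("Maintenance Technician", "signed equipment isolation record",
                  "05:30:00", source="Isolation Record"),
    BuildingBlock("Maintenance Technician", "removed bolts from tank cover",
                  "05:37:00", source="Report A"),
    BuildingBlock("Operator A", "positioned at location B", "05:37:00",
                  status="inferred"),   # missing evidence: a gap to be tracked
]

matrix = defaultdict(list)              # actor -> time-ordered events
for bb in sorted(blocks, key=lambda b: b.begin):
    matrix[bb.actor].append((bb.begin, bb.action))

missing = [bb for bb in blocks if bb.source is None]
print(dict(matrix))
print("Blocks still lacking evidence:", len(missing))
```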
C.3.3 Strengths and limitations

MES/STEP has the same strengths and limitations as ECF. Data formatting is relatively more elaborate, and there are explicit mechanisms for determining and tracking missing data and the attempts to obtain those data. Such "bookkeeping" mechanisms are necessary for managing complex investigations with multiple investigators. The Time-Actor Matrix also has explicit notation for recording the state of an ongoing inquiry along with data-acquisition and explanatory tasks yet to be performed. This means that a comprehensible visual representation of the state of an investigation is available at all points in the investigation.
Figure C.3 – Example of a Time-Actor Matrix
C.4 The ‘Why’ Method

C.4.1 Overview
The ‘Why’ Method uses a straightforward questioning process to arrive at the root causes.
Questioning starts with a statement of the situation and asks why it occurred. The answer to this question is turned into a second why question, and the answer to this into a third question. Questioning ceases when the stopping rule is reached. Generally this requires approximately five levels of questions, hence the method is sometimes known as 5 Whys.

Where a why question yields several causes, each is explored and the method produces a Why Tree.

The why method is used alone for simple situations but is also inherent in more complex tree methods such as the Causes Tree Method (CTM) (see C.5). It can be useful for eliciting information from witnesses on how and why an event occurred, because the simple question 'why' does not make assumptions about cause or lead the witness.

Figure C.4 illustrates an example of a compressor that has experienced an unscheduled shutdown. In the example the fourth why suggested a number of potential causes for lack of lubrication, and evidence was sought to establish which in fact occurred. Although a human error was involved, in that a person did not follow the specified start-up procedure, the recommendation is to improve the design so that the compressor and pump motors are linked. Further analysis of why the error occurred is, in this case, not useful.
Figure C.4 – Example of a Why Tree
C.4.2 Process
The ‘Why’ Method has the following steps.
1) Identify and record the focus event as the start of a Why diagram.

2) Ask why the focus event occurred, seeking only the direct causes.

3) Ask "why" successively with respect to the previous answer. In each case the answer to the question why should be an immediate cause of the previous answer.

The question 'why' is asked as many times as is needed to lead to a root cause; five questions is a common guideline. Each time the question is asked there may be multiple answers, and some analysis will be needed to eliminate those possible answers that are not applicable. It may be more effective to ask 'why did the process fail?' instead of just asking 'why'.

It can be useful to consider a set of categories of cause, such as those from the Ishikawa method, and to involve a team of people. This will help ensure that all relevant areas are considered by the investigators.
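A Why Tree can be held as a simple nested mapping from each statement to its direct causes. The sketch below is illustrative only and not part of this standard; the cause chain is invented, loosely inspired by the compressor example, and the helper name is an assumption.

```python
# A Why Tree as a nested dict: each node maps a statement to its direct causes.
why_tree = {
    "Compressor shut down unexpectedly": {
        "Bearing seized": {
            "Lack of lubrication": {
                "Lubrication pump not running at start-up": {
                    "Start-up procedure not followed": {}   # candidate root cause
                }
            }
        }
    }
}

def print_whys(tree, depth=0):
    """Walk the tree, printing each 'why' level with indentation."""
    for statement, causes in tree.items():
        print("  " * depth + ("Why? " if depth else "") + statement)
        print_whys(causes, depth + 1)

print_whys(why_tree)
```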
C.4.3 Strengths and limitations
The strengths of the ‘Why’ Method are as follows:
- simple to apply by those involved in the problem;
- easy to understand by others;
- quick process to achieve results for simple problems;
- does not require a lot of training or extensive knowledge from the person asking the questions.

The limitations of the ‘Why’ Method are as follows:
- only suitable for simple situations;
- heavily dependent on the knowledge and expertise of the people answering the questions, with expertise in both technical failure modes and human error often required to reach the root causes;
- root causes are likely to be missed if they are outside the knowledge base of those involved;
- possible uncertainty about when the appropriate root causes have been identified;
- can be developed to the level where reasons for people's actions are being considered, where evidence is often not available and results are therefore not always repeatable.
C.5 Causes Tree Method (CTM)

C.5.1 Overview
CTM [12] is a systematic technique for analysing and graphically depicting the events and conditions that contributed to a focus event.
The method examines all the system components associated with the focus event. The investigation starts by establishing the tangible facts, taking care, in this phase, not to interpret them or to express an opinion about them.

CTM is similar to the why method in concept but builds a more complex tree and explicitly considers technical, organisational, human and environmental causes. Each antecedent (identified cause) is tested to check that it is a direct and necessary cause of the previous one, whereas the why method is less rigorous. Therefore, CTM is suitable for more complex situations.

CTM is also similar to a fault tree, but whereas a fault tree is used prior to an event to explore all possible causes and strict logic relationships between faults are specified, the cause tree includes only those causes which apply to a specific event that has already occurred, and does not develop the logical relationships in detail.

A cause tree may be used to explore successes as well as failures.
The Cause Tree forms a network of the causes which have directly or indirectly caused the focus event, using the three logical relations shown in Figure C.5:
- Sequence: a cause (Y) has a single direct origin (X), i.e. (X) was necessary and sufficient for (Y) to occur.
- Conjunction: a cause (Y) has several direct origins (X1) and (X2), i.e. each of the direct origins (X1) and (X2) was necessary for (Y) to occur.
- Disjunction (or separation): two or more causes (Y1, Y2, ...) have a single and identical direct origin (X), i.e. (X) was necessary for (Y1) and (Y2) to be produced.

Figure C.5 – Symbols and links used in CTM
Figure C.6 shows an example tree, in which Mr L (the victim) and Mr A are working nights, as an exception, to store a surplus of cooked matter stock. In accordance with the handbook, Mr L and Mr A were required to load the cooked matter crusher with "flour", which is then bagged and stored. Normally this activity is under the responsibility of a head of team, whose presence had not been considered essential by the management for this night. On his own initiative, to save time, Mr L took a forklift truck (the ignition key had remained on the dashboard as usual) to store the bags. At the end of the task, Mr L set about returning the forklift truck. Mr L made a sharp turn in reverse with the forks raised and, while seeking to avoid a hole in the ground, the forklift tipped over, crushing Mr L between the ground and the right safety upright of the truck.
Figure C.6 – Example of a Cause Tree
C.5.2 Process
CTM has the following steps.
1) Identify the focus event to be analysed and record it as the starting point for the tree.

2) Collect and record all relevant data including people, their activities and actions, materials and equipment, and factors relating to the physical and psychosocial environment.

3) Make a list of the causal factors of the focus event. These should be supported by evidence and be expressed as precisely as possible. Subjective opinions and judgements are not included. Causal factors include those which are unusual or a change from the normal course of events, and those which are normal but played an active part in the occurrence of the event.

4) Work backwards towards the root causes by asking the following questions systematically for each antecedent that has been gathered:
   a) What antecedent X directly caused the antecedent Y?
   b) Was X in itself sufficient to give rise to Y?
   c) If not, what other antecedents (X1, X2, ...) were equally necessary in order to give rise directly to Y?

5) Display these immediate necessary causal factors in a box linked by an arrow to the focus event. (The tree may be drawn horizontally or vertically but is normally drawn horizontally starting from the right, so that left to right corresponds to the chronology of events.)

6) Continue asking the same questions with respect to each cause found until the team agrees that there is no value in continuing further.

7) Check the validity of the tree by obtaining further evidence that determines whether it is true.
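A cause tree can be represented as nodes whose antecedents reflect the sequence, conjunction and disjunction relations of Figure C.5. The sketch below is illustrative only; the node contents are taken loosely from the forklift example and the class and function names are assumptions.

```python
# Cause-tree node: each node lists the antecedents that were necessary for it.
# One antecedent = sequence; several = conjunction. A disjunction arises when
# the same antecedent node appears under two or more different consequences.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    text: str
    antecedents: List["Node"] = field(default_factory=list)

hole      = Node("A hole is present in the path of the truck")
reversing = Node("L turns while reversing with the forks raised")
tips_over = Node("The truck tips over", [hole, reversing])          # conjunction
crushed   = Node("L is crushed between the floor and the safety upright",
                 [tips_over])                                        # sequence

def leaves(node: Node) -> List[str]:
    """Leaf antecedents are the candidate root causes still to be questioned."""
    if not node.antecedents:
        return [node.text]
    out: List[str] = []
    for a in node.antecedents:
        out.extend(leaves(a))
    return out

print(leaves(crushed))
```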
C.5.3 Strengths and limitations
The strengths of CTM are as follows:
- provides a method for structuring the investigation of complex events;
- uses an orderly, easy-to-read format;
- can be used to encourage group participation;
- identifies areas for collecting data as the investigation proceeds;
- can be used to analyse success or failure events;
- can be used for technical and non-technical events.

The limitations of CTM are as follows:
- many human and organisational factors may contribute to the occurrence of the focus event, and it is often difficult to establish which of them, in a particular instance, were the necessary causal factors;
- there is no guidance on how to seek causes, so expertise in human error and organisational systems is needed when the tree involves human and organisational failures, where evidence is often difficult to obtain;
- it is difficult to apply when an event occurs as a result of a change of quality in several areas, where no single cause is a necessary cause.
C.6 Why-Because Analysis (WBA)

C.6.1 Overview
WBA [13] is a causal-analytical technique for establishing which of a given collection of events and situations are necessary causal factors of which others. Given two events or situations, A and B say, a condition called the Counterfactual Test (CT) is used to establ ish
Strengths and limitations
Why-Because Analysis (WBA) Overview
62740/Ed1/CDV © IEC(E) 36 1056 1057 1058 1059 1060 1061 1062
whether A is a necessary causal factor of B. Suppose two events or situations A and B have been observed. The CT asks whether, had A not occurred, B would also not have occurred. (Since A did occur, a supposition that A had not occurred is contrary to fact, hence the word “counterfactual”.) In asking this question all other conditions are assumed to have remained the same. If the answer is yes: B would not have occurred then A is a necessary causal factor of B. If the answer is no: B could have happened anyway even if A had not happened (the CT fails) then A is not a necessary causal factor of B.
The network of causal factors is displayed as a Why-Because Graph (WBG): a collection of "nodes" (boxes, diamonds and other shapes containing a brief description of the fact) joined by "edges" (arrows), where the node at the tail of an arrow is a necessary causal factor of the node at its head, as determined by the CT.

A WBG is acyclic (contains no loops), so it is usually drawn with arrows pointing generally upwards, or horizontally with arrows pointing generally left-to-right or right-to-left.

In order to determine whether sufficient causal factors are present in the collection of events and situations, the Causal Completeness Test (CCT) is used. The CCT is applied to a given event or situation and its collection of necessary causal factors as determined by the CT. If the CCT is not passed, then the collection of events and situations must be extended with further factors until it is passed. Suppose A1, A2, ..., An have been determined to be necessary causal factors of B by the CT. Then the CCT is deemed to be passed if, had B not occurred, one of A1, A2, ..., An would not have occurred either (formally, NOT-B is a necessary causal factor of NOT(A1 AND A2 AND ... AND An), as determined by the CT).

When a WBG has been constructed and the CCT is passed for all the events and situations in it, the WBG is complete and is deemed to represent a sufficient causal explanation of the focus event.
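The WBG structure itself lends itself to a small executable illustration. The sketch below is not WBA tooling and is not part of this standard; it simply represents a toy WBG (node names echo the runway-overrun example of Figure C.7) as a mapping from each fact to the factors already judged necessary by the CT, and checks that the graph is acyclic, which the text requires.

```python
# Toy Why-Because Graph: each key is a fact, each value lists the facts judged
# (by the Counterfactual Test) to be its necessary causal factors.
WBG = {
    "aircraft hits earth bank": ["aircraft overruns runway", "earth bank in overrun path"],
    "aircraft overruns runway": ["braking delayed", "excessive speed on landing"],
    "braking delayed": ["aquaplaning"],
    "aquaplaning": [],
    "excessive speed on landing": [],
    "earth bank in overrun path": [],
}

def is_acyclic(graph: dict) -> bool:
    """A WBG must contain no loops; check by depth-first search."""
    visiting, done = set(), set()

    def visit(node: str) -> bool:
        if node in done:
            return True
        if node in visiting:
            return False          # a cycle: not a valid WBG
        visiting.add(node)
        ok = all(visit(f) for f in graph.get(node, []))
        visiting.discard(node)
        done.add(node)
        return ok

    return all(visit(n) for n in graph)

leaf_factors = [n for n, factors in WBG.items() if not factors]
print("Acyclic:", is_acyclic(WBG))
print("Factors with no further explanation yet:", leaf_factors)
```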
Figure C.7 illustrates an example of a WBG for a commercial-aviation runway-overrun accident.
Figure C.7 – Example of a WBG
C.6.2 Process
WBA has the following steps.
1) Determine a collection of facts deemed to be relevant, under guidance of a stopping rule. This gives an initial collection C of facts, divided into events, states and situations.

2) Select the focus event (called in WBA the Accident Event), F.

3) Determine intuitively the immediate necessary causal factors of F from amongst the collection C; check using the CT. Display the results visually as a partial WBG.

4) Determine intuitively the necessary causal factors of those immediate factors; check using the CT. Extend the WBG with these factors.

5) Proceed to fill out the analysis (to extend the WBG) by testing each fact in C against the factors already in the WBG.

6) Apply the CCT to determine whether the WBG is complete, or whether factors are missing from the collection C.

7) Extend C if necessary, and incorporate the new facts into the WBG using the CT. If insufficient information is available, assumptions may be included, provided they are clearly labelled as such.

8) Finish when the CCT shows sufficient causal factors for each factor, under cognisance of the stopping rule. If insufficient facts are available, assumptions must be included in order to allow the CCT to succeed, but they must be clearly labelled as such.
C.6.3 Strengths and limitations
The strengths of WBA are as follows:
- may be performed with a minimum of training (with the use of suitable tools that provide help on extracting facts from narrative descriptions, an inexperienced analyst can typically perform a first-pass WBA inside two hours);
- the analysis results are easily understandable by third parties;
- the conceptual background required to perform a WBA is limited (an analyst must be able to apply the CT, and then the CCT);
- any network of causally related phenomena may be analysed with a WBA;
- the reasoning behind a WBA may be checked using a formal logic;
- can be used together with other methods, for example those providing more structure to the collection of facts.

The limitations of WBA are as follows:
- the method provides no guidance on the collection of facts to which the tests are applied, e.g. there is no structuring of facts into categories such as technical, procedural, human-factors, organisational and legal;
- because facts are not structured, WBA provides limited guidance on corrective action in the case where recurrence is to be prevented.
C.7 Fault Tree and Success Tree Method

C.7.1 Overview
A Fault Tree [14] displays the immediate necessary causal factors of a focus event, their causal predecessors and the logic relationships between them. Fault Tree Analysis (FTA) [15] is normally used as an a priori method of identifying and analysing potential failure modes, particularly of equipment. The fault tree diagram can be used in RCA by building a tree following the same logic but including in the tree only those events which actually occurred. OR gates may be used during the analysis to describe alternative causes that need to be evaluated, but when all facts are clearly established only AND gates should remain, unless the purpose of the investigation is to prevent other related events. Therefore, as the investigation proceeds, potential causes that do not fit the evidence are gradually ruled out and removed from the tree. By closing out each branch of the tree, the causal factors of the focus event become apparent.
Strictly, a fault tree represents binary events where a statement is true or false, e.g. a component failed or not. In RCA the fault tree structure is often applied to a tree of causes where the logic rules are not strictly obeyed and changes in quality are included as well as binary events.

A similar logic can be applied where the focus event is a success. In this case the tree is referred to as a success tree.
Figure C.8 shows an example fault tree.
Figure C.8 – Example of a Fault Tree during the analysis
C.7.2 Process
The process for developing a fault/success tree is as follows.

1) Define the focus event to be analysed and record it as the starting point for the tree.

2) Establish the immediate necessary causal factors of the focus event and display them in boxes linked by arrows to the focus event. (The tree may be drawn horizontally or vertically.) These are the first-level causes of the focus event.

3) Establish the logic relationships between the immediate causes using AND and OR gates. The events at an AND gate must be both necessary and sufficient to cause the event above. OR gates may be used during the analysis to describe potential causes that require investigation.

4) Examine each cause to decide whether it is a basic cause or the result of underlying causes.

5) Validate potential causes and update the tree accordingly.

6) Continue down the tree until the stopping rule is reached.

As the tree is developed, the possibility of causal factors relating to people, equipment and the environment is considered for each cause at each level. These should not be separated out at the top of the tree.
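As an informal illustration (not part of this standard) of how an AND/OR structure can be evaluated once the facts are established, the sketch below encodes a small tree loosely based on the node names in Figure C.8; the gate logic and the representation are assumptions invented for the example.

```python
# A fault-tree fragment as nested gates. A gate is ("AND"|"OR", [children]);
# a leaf is a basic cause name. evaluate() returns True if the confirmed causes
# are enough to produce the event at that node.
TREE = ("OR", [
    "smoke detector fails",
    "bell fails",
    ("AND", ["mains failure", "battery failure"]),   # no power source
])

def evaluate(node, confirmed: set) -> bool:
    if isinstance(node, str):
        return node in confirmed
    kind, children = node
    results = [evaluate(c, confirmed) for c in children]
    return all(results) if kind == "AND" else any(results)

confirmed_causes = {"mains failure", "battery failure"}
print("Fire alarm failure explained:", evaluate(TREE, confirmed_causes))  # True
```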
C.7.2.1 Strengths and limitations
The strengths of the fault/success tree method are as follows:
- provides a method for dividing up the analysis for large, complex focus events;
- supported by many commercial software packages which assist in the development of the fault tree structure;
- encourages group participation;
- uses an orderly, easy-to-read format;
- identifies areas for collecting data.

The limitations of the fault/success tree method are as follows:
- requires an experienced practitioner;
- has no underlying model of causation and provides no guidance on how to seek causal factors;
- does not easily represent situations where an event occurs as a result of a general change of quality that affects, for example, adherence to procedures or the tolerances of physical components.
C.8 Fishbone or Ishikawa Diagram

C.8.1 Overview
The Fishbone or Ishikawa Diagram [16] is a technique that helps identify, analyse and present the possible causes of a focus event. It may be used to structure a brainstorming session and to suggest areas where further evidence may be sought. The technique was invented by Kaoru Ishikawa and graphically illustrates the relationship between an event and all the factors that influence it. It is also referred to as a "fishbone diagram" because of its appearance.
Figure C.9 shows an example Fishbone or Ishikawa Diagram.
Figure C.9 – Example of a Fishbone diagram
C.8.2 Process
The process for developing a Fishbone or Ishikawa Diagram is as follows.
1) Identify the focus event, record it on the right hand side and draw a line horizontally from it. This forms the head and spine of a fish.

2) Establish the main categories of causes to be considered and draw lines off the spine to represent each category. Categories commonly used include:
   a) 5M's: Methods, Machinery, Management, Materials, Manpower;
   b) 4P's: Place, Procedures, People, Policies;
   c) 4S's: Surroundings, Suppliers, Systems, Skills.

3) For each category, identify the possible causes of the focus event. These are presented as smaller lines coming off the 'bones' of the fish. Increasingly detailed levels of causes can be shown as sub-branches coming off each cause line. It may be necessary to break the diagram into smaller diagrams if one branch has too many sub-branches.

4) Analyse the diagram. The diagram now shows all the possible causes of the focus event; the final step is to investigate the most likely causes, which tests whether the analysis is correct. Analysis includes:
   a) reviewing the "balance" of the diagram, checking for comparable levels of detail to identify the need for further identification of causes;
   b) identifying causes that appear repeatedly, as these may represent root causes;
   c) assessing what can be measured in each cause in order to quantify the effects of any changes made;
   d) highlighting the causes on which action can be taken.
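A fishbone structure is easy to capture as a mapping from category to causes. The sketch below is illustrative only and not part of this standard; it uses the 5M categories with invented causes and a simple balance check corresponding to step 4 a).

```python
# Fishbone structure: focus event plus causes grouped by category (5M's here).
fishbone = {
    "focus_event": "Batch rejected at final inspection",
    "Methods":    ["Work instruction out of date"],
    "Machinery":  ["Worn cutting tool", "Calibration overdue"],
    "Management": [],
    "Materials":  ["Supplier changed alloy grade"],
    "Manpower":   ["New operator not yet trained on this product"],
}

# A quick balance check (step 4a): categories with no causes may need more work
underpopulated = [cat for cat, causes in fishbone.items()
                  if cat != "focus_event" and not causes]
print("Categories needing further brainstorming:", underpopulated)
```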
C.8.3 Strengths and limitations
The strengths of the Fishbone or Ishikawa Diagram are as follows:
- encourages group participation to identify people's perceptions of causes;
- seeks causes under a set of categories, so will identify a range of causes relating to human and organisational factors as well as hardware and procedural factors;
- uses an orderly, easy-to-read format;
- indicates possible causes of variation;
- can be used for simple investigations or as part of a more complex investigation.

The limitations of the Fishbone or Ishikawa Diagram are as follows:
- there is no underlying model or theory of causation, so the causes identified are based on the team's perceptions.
C.9 Safety through Organisational Learning (SOL)

C.9.1 Overview
SOL [17] is an event analysis technique, which seeks weaknesses in the complex sociotechnical system in which the event occurred. The purpose of SOL is to provide a model of the system and identify its weaknesses so it can be improved and recurrence of the focus event prevented. The emphasis is on organisational learning.
C.9.2 Process
SOL has the following steps:
1) Start with a description of the situation, using a Time-Actor Matrix produced by MES/STEP (see C.3).

2) Identify causal factors (which may be direct or indirect, see Table C.1) for each event in the Time-Actor Matrix, guided by checklists of questions derived from the experience and research of the SOL authors. Direct causal factors are those that immediately resulted in the focus event; indirect causal factors appear further down the causal chain but may involve the same issues.

3) Classify causal factors into technology, individuals, working group, organisation, and organisational environment.
Table C.1 – Direct and indirect causal factors

Direct causal factors: information; communication; working conditions; personal performance; violations; technical components.

Indirect causal factors: information; communication; working conditions; personal performance; violations; scheduling; responsibility; control and supervision; group influence; rules; procedures and documents; qualifications; training; organisation and management; safety principles; quality management; maintenance; regulatory and consulting bodies; environmental influences.
C.9.3 Strengths and limitations
The strengths of SOL are as follows:
- the checklist format allows users who are not specialists in organisational systems or organisational psychology to produce useful analyses;
- the emphasis on causal factors other than necessary causal factors allows more factors to be brought into consideration than a narrowly causal analysis might do, and thereby offers more chance of identifying possible improvements;
- the format of the event building blocks gives less scope to the judgement of individual analysts and helps to give uniformity to SOL analyses;
- the stopping rule is implicitly defined by the checklist questions: when these have been answered, the information is deemed to be adequate.

The limitations of SOL are as follows:
- there is no specific notion of what is a causal factor other than what is implicit in the checklist questions;
- the level of detail is driven by the predetermined checklist of questions rather than the perceived need;
- the checklist of questions was derived from research in the nuclear power industry and may be less suitable for other industries.
C.10 Management Oversight and Risk Tree (MORT)
C.10.1 Overview
MORT [18] was first developed for analysing the root causes and causal factors of incidents in the nuclear power and aviation industries in the USA, but has now been applied in many industries.

MORT is a pre-populated tree based on a model of an organisation's management system, which effectively provides a detailed checklist for reviewing which parts of the management and control systems were less than adequate when the focus event occurred. The basic structure of the tree is shown in Figure C.10.
Figure C.10 – Example of a MORT diagram
MORT assumes that a failure occurs as a result of oversights or omissions in either the management systems or the specific control factors which should have prevented the focus event from occurring.

Ultimately, failures in either branch of the tree occur because something within the general management systems (information systems, design and planning, operational readiness, maintenance or supervision) was less than adequate. Each box in Figure C.10 is developed into a detailed tree structure showing factors that might have been less than adequate.
1277
C.10.2
1278 1279 1280
Analysis starts with the focus event and then works down the MORT tree in a lo gical manner asking and responding to pre-questions in the MORT manual. Symbols on the MORT chart are colour coded to indicate:
1281
there is no problem with an element (adequate);
1282
the element is giving rise to a problem (less than adequate);
1283
there is need for further enquiries.
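As a purely illustrative aid (the MORT chart and manual themselves are not reproduced here), the following Python sketch shows a minimal way to record the colour-coded judgements made while working down the tree; all names and example labels are invented.

# Purely illustrative recording structure; chart content comes from the MORT manual.
from dataclasses import dataclass, field
from enum import Enum

class Judgement(Enum):
    ADEQUATE = "adequate"                # no problem with the element
    LESS_THAN_ADEQUATE = "LTA"           # the element is giving rise to a problem
    FURTHER_ENQUIRY = "further enquiry"  # further enquiries are needed

@dataclass
class MortNode:
    label: str
    judgement: Judgement = Judgement.FURTHER_ENQUIRY
    children: list = field(default_factory=list)

    def less_than_adequate(self):
        """Return this node and all descendants judged less than adequate."""
        found = [self] if self.judgement is Judgement.LESS_THAN_ADEQUATE else []
        for child in self.children:
            found.extend(child.less_than_adequate())
        return found

root = MortNode("Management system factors", children=[
    MortNode("Maintenance", Judgement.LESS_THAN_ADEQUATE),
    MortNode("Operational readiness", Judgement.ADEQUATE),
])
print([node.label for node in root.less_than_adequate()])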
C.10.3 Strengths and limitations
The strengths of MORT are as follows:
provides comprehensive guidance for seeking all possible aspects of the system that were not adequate at the time of the focus event;
less specialist expertise is needed than in some techniques because detailed guidance is provided on possible causes;
identifies weaknesses in the system which might apply across a wide range of failure scenarios.
The limitations of MORT are as follows:
explores weaknesses in the system in general that might have played a role in the focus event, rather than seeking direct or necessary causes;
a very large number of questions (around 1500) are asked, so the method is time consuming and hence most appropriate for serious events;
unless the organisation to which it is applied is a high reliability organisation, a very large number of weaknesses are found, which makes defining an implementation programme difficult;
the method is tedious when first learning or applying it.
C.11 AcciMaps
C.11.1 Overview
AcciMaps [19] is based on concepts of causation published by Rasmussen and Svedung [20] and the organisational systems model (see B.4).
AcciMaps is a graphical representation used to structure the analysis of a focus event and to identify the interactions in the socio-technical system in which the focus event occurred. It is a method designed to reveal the system-wide failures, decisions and actions involved in a focus event. These are arranged in layers representing the different levels in a socio-technical system, from government down to the equipment and surroundings involved. It also looks at the individual actors at each level and their decision-making routines and competence.
An example AcciMap for a gas explosion, showing typical system levels, is given in Figure C.11. The bottom level represents the physical arrangement of the scene of the focus event (buildings, equipment, surroundings, etc.). The next level up is the sequence of events leading to the focus event, including the failures, actions and decisions (including normal actions and decisions) that played a part. The higher levels show decisions and actions at each level that influenced or could have influenced the sequence of events at the lower levels.
C.11.2 Process
An AcciMap is developed as follows (an illustrative data structure is sketched after the steps).
1) Define a model of the system with different organisational levels.
2) Populate the levels (using boxes (nodes)) with the decisions and actions relevant to the focus event, the conditions that led to them and their consequences.
3) Draw arrows that show all linkages and influences.
4) A process such as WBA may be added to ascertain which of the identified issues were necessary causal factors of the focus event.
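The following Python sketch is a purely illustrative data structure for the result of steps 1) to 3): nodes placed on named system levels with influence links between them. The level names and example nodes are invented and are not prescribed by the AcciMaps method.

# Purely illustrative structure for an AcciMap; levels and nodes are invented.
from dataclasses import dataclass, field

LEVELS = ["Government", "Regulators and associations", "Company",
          "Management", "Staff", "Work and equipment"]

@dataclass
class AcciMap:
    nodes: dict = field(default_factory=dict)   # node id -> (level, description)
    links: list = field(default_factory=list)   # (from node id, to node id)

    def add_node(self, node_id, level, description):
        assert level in LEVELS, "unknown system level"
        self.nodes[node_id] = (level, description)

    def add_link(self, source, target):
        self.links.append((source, target))

amap = AcciMap()
amap.add_node("N1", "Company", "Cost pressure reduced the maintenance budget")
amap.add_node("N2", "Work and equipment", "Corroded pipe ruptured")
amap.add_link("N1", "N2")   # influence from the company level downwards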
Figure C.11 – Example of an AcciMap
C.11.3 Strengths and limitations
The strengths of AcciMaps are as follows:
as there is no taxonomy or guidance, AcciMaps has the potential to be highly comprehensive, identifying causal factors across all levels of the system;
the linkages within and between levels help ensure that failures are considered in the context of the things that influenced them;
human error has equal focus with equipment and higher level organisational factors.
The limitations of AcciMaps are as follows:
personal factors which influence decisions, particularly at the lower levels, are not included;
the lack of a taxonomy means that the factors identified are based on the team’s perception;
the organisational model comes from outside the analysis and there is no criterion to ensure it is adequate;
the result of the AcciMap analysis is only lightly constrained, therefore it is possible to derive different AcciMaps for the same focus event;
with no specific taxonomy it is difficult to aggregate multiple analyses to find common factors;
the factors in the nodes are often very general and can be very abstract, which makes it difficult to derive precise actions;
it has a weak analytical approach to physical and equipment failures;
it does not represent the results of a causal analysis by itself.
C.12 Tripod Beta
C.12.1 Overview
Tripod Beta [21] is an incident investigation and analysis methodology which combines ideas from Reason's model (see B.3) and Barrier Analysis (see B.2), along with the Generic Error Modelling System (GEMS) and Wagenaar's Tripod causation path. It describes incidents in terms of 'objects' (e.g. people, equipment) being changed by 'agents of change', i.e. anything with the potential to change an object. It also models 'barriers', showing them as, for example, effective, failed or inadequate.
Tripod Beta provides a format and rules for modelling the events (the focus event and the events leading up to, and after, the focus event) and linking each element together, working back ultimately to the underlying causes. A number of software packages have been developed based on these rules, but the method can be used with or without software. The software-based tools contain checklists derived from the models and from analysis of past events, mostly in the offshore oil industry.
The core of a Tripod analysis is a 'tree' diagram representation of the causal network (see Figure C.12), which describes the focus event as a network of events and their relationships.
Figure C.12 – Example of a Tripod Beta tree diagram
C.12.2
Process
The process for developing the Tripod Beta tree diagram is to identify the following (an illustrative data structure is sketched after the list).
1) The agent (hazard or hazards) that led to the focus event and the target that was harmed.
2) Controls or barriers, that were missing or failed, which could have prevented the event or protected the target.
3) Immediate causes – the human acts which resulted in the failed barriers. These are failures or errors that have an immediate effect and occur at the point of contact between a human and a system (e.g. pushing an incorrect button, ignoring a warning light).
4) Preconditions – psychological and situational precursors, e.g. the type of human failure (slip, lapse, violation, etc.).
5) Underlying causes (latent failures) in the organisation, i.e. inadequacies in the management system, culture, etc. These can be categorised into pre-defined 'basic risk factors', derived from brainstorming and research into the results of audits and accident investigations in the offshore oil industry.
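The following Python sketch is a purely illustrative record of one agent of change acting on a target, together with the barriers and the causes identified in steps 1) to 5); the class names, barrier states and example content are invented simplifications.

# Purely illustrative structure; class names, states and content are invented.
from dataclasses import dataclass, field
from enum import Enum

class BarrierState(Enum):
    EFFECTIVE = "effective"
    FAILED = "failed"
    INADEQUATE = "inadequate"
    MISSING = "missing"

@dataclass
class Barrier:
    name: str
    state: BarrierState
    immediate_cause: str = ""    # human act behind a failed barrier (step 3)
    precondition: str = ""       # psychological/situational precursor (step 4)
    underlying_cause: str = ""   # latent failure / basic risk factor (step 5)

@dataclass
class EventRecord:
    agent: str                   # hazard (step 1)
    target: str                  # what was harmed (step 1)
    event: str
    barriers: list = field(default_factory=list)   # step 2

record = EventRecord(agent="Pressurised gas", target="Operator",
                     event="Gas release during maintenance")
record.barriers.append(
    Barrier(name="Isolation before maintenance", state=BarrierState.FAILED,
            immediate_cause="Isolation valve not closed",
            precondition="Time pressure",
            underlying_cause="Inadequate procedures"))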
C.12.3 Strengths and limitations
The strengths of the Tripod methodology are as follows:
provides a map of the focus event and its causes;
can help direct the investigation and define its scope;
defines the barriers in the system;
based on scientific research, including a model of human behaviour, to uncover what is behind the observed behaviour;
leads the investigator to consider the reasons behind the immediate causes and human error;
menu-driven software is available.
The limitations of the Tripod methodology are as follows:
1392
can be resource intensive;
1393 1394
leads to system level underlying causes which an organisation might not be able to accept;
1395 1396
use of the Basic Risk Factors to categorise underlying causes can be too generic and simplistic;
1397
conclusions do not lead to simple remedial actions;
1398
extensive training is generally required.
C.13 Causal Analysis using STAMP (CAST)
C.13.1 Overview
CAST [7] is a technique that examines the entire socio-technical process involved in a focus event. CAST is based on STAMP (see B.5), which is used to guide the causal analysis. CAST documents the dynamic process leading to the focus event, including the socio-technical control structure as well as the constraints that were violated at each level of the control structure and why. The analysis results in multiple views of the focus event, depending on the perspective and level from which the focus event is being viewed.
To illustrate CAST, consider a focus event involving the contamination of a public water supply with E. coli in a small town in Ontario, Canada. Figure C.13 shows the safety control structure for the water supply of the town. There are three physical systems being controlled: the well system, the water supply, and public health. Each component in the structure controlling these processes has specific safety-related responsibilities. For example, the Ministry of the Environment provides oversight and control of the local water systems. Each component of the control structure gets feedback about the state of the process it is controlling. One common cause is that the controller gets incorrect feedback and believes the state of the controlled process to be different from what it actually is. For example, budgets were cut and the Ministry of the Environment reduced the number of inspections and inspectors.
Figure C.14 shows the analysis of the role of the local health department in the focus event, including the roles and responsibilities, the unsafe control actions, the context in which the unsafe control actions were provided, and the flaws in the process (mental) model that contributed to the behaviour. Figure C.15 shows the same information for another component of the control structure, the water system operations management.
In a full analysis, each component of the control structure would be considered with respect to its contribution to the focus event. In most focus events, contributions can be found from every component of the control structure.
Other features of the analysis (not shown) include examining the dynamic changes over time in the system that contributed to the focus event and the role of flawed communication and coordination.
© Copyright Nancy Leveson, Aug. 2006
Figure C.13 – Control structure for the water supply in Walkerton, Canada
Figure C.14 – Example CAST causal analysis for the local Department of Health (BGOS Medical Department of Health). The figure records:
safety requirements and constraints: provide oversight of drinking water quality; follow up on adverse drinking water quality reports; issue boil water and other advisories if public health is at risk;
context in which decisions were made: most recent water quality reports over 2 years old; illness surfacing in communities outside Walkerton; E. coli most commonly spread through meat;
mental model flaws: thought adverse water quality reports were being received; unaware of reports of E. coli linked to treated water; thought Mr K was relaying the truth; unaware of poor state of local water operations;
coordination: assumed the Ministry of the Environment was ensuring inspection report problems were resolved;
inadequate control actions: advisory delayed; advisory should have been more widely disseminated; public health inspector did not follow up on 1998 inspection report.
Figure C.15 – Example CAST causal analysis for the local public utility operations management (Walkerton PUC operations management). The figure records:
safety requirements and constraints: monitor operations to ensure that sample taking and reporting is carried out; keep accurate records; update knowledge as required;
context in which decisions were made: complaints by citizens about chlorine taste in drinking water; improper activities were an established practice for 20 years; lacked adequate training and expertise;
mental model flaws: believed sources for the water system were generally safe; thought untreated water was safe to drink; did not understand the health risks posed by under-chlorinated water; did not understand the risks of bacterial contaminants like E. coli; did not believe guidelines were a high priority;
inadequate control actions: inadequate monitoring and supervision of operations; adverse test results not reported when asked; problems discovered during inspections not rectified; inadequate response after first symptoms in the community; did not maintain proper training or operations records.
C.13.2
Process
CAST has the following steps.
1) Identify the system(s) involved in the focus event.
2) Identify the system constraints associated with the focus event.
3) Document the control structure in place. This structure includes the roles and responsibilities of each component in the structure, as well as the controls provided or created to execute their responsibilities and the relevant feedback (if any) provided to help them do this.
4) Determine the proximate events leading to the focus event.
5) Analyse the focus event at the physical system level. Identify the contribution of each of the following to the events: physical and operational controls, physical failures, dysfunctional interactions, communication and coordination flaws, and unhandled disturbances. Determine why the physical controls in place were ineffective.
6) Moving up the levels of the control structure, determine, as follows, how and why each successive higher level allowed or contributed to the inadequate control at the current level.
a) For each system constraint, either the responsibility for enforcing it was never assigned to a component in the control structure, or a component(s) did not exercise adequate control to ensure their assigned responsibilities were enforced in the components below them.
b) Identify unsafe decisions or control actions, including actions provided by software, operators, managers, regulators, etc.
c) Any human decisions or flawed control actions need to be understood in terms of the information available to the decision maker as well as any information that was not available, the behaviour shaping mechanisms (the context and influences on the decision making process), the value structures underlying the decision, and any flaws in the process models (mental models) of those making the decisions and why those flaws existed.
7) Examine the overall coordination and communication (including missing feedback) that contributed to the focus event.
Although the process is described in terms of steps, the process need not be linear, nor must one step be completed before the next one is started. An illustrative record structure for a component of the control structure is sketched below.
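The following Python sketch is a purely illustrative record for one component of the control structure, capturing the items documented in steps 3) and 6); all field names and the example content are invented and simplified from the Walkerton example above.

# Purely illustrative record; field names and content are invented and simplified.
from dataclasses import dataclass, field

@dataclass
class Component:
    name: str
    responsibilities: list = field(default_factory=list)
    controls_over: list = field(default_factory=list)      # lower-level components
    feedback_received: list = field(default_factory=list)
    unsafe_control_actions: list = field(default_factory=list)
    context: list = field(default_factory=list)            # behaviour-shaping factors
    process_model_flaws: list = field(default_factory=list)

ministry = Component(
    name="Ministry of the Environment",
    responsibilities=["Oversight and control of local water systems"],
    controls_over=["Water system operations management"],
    feedback_received=["Inspection reports"],
    unsafe_control_actions=["Reduced inspections after budget cuts"],
    context=["Budget cuts"],
    process_model_flaws=["Assumed local operations complied with guidelines"])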
C.13.3 Strengths and limitations
The strengths of CAST are as follows:
looks back through time to determine how the system evolved to a state of high risk;
identifies the social and managerial factors, and not just the human operations or technical system failures;
does not impose any particular social theory on the analysis; any model of social behaviour could be used to generate the analysis results.
The limitations of CAST are as follows:
it is not possible to present the analysis graphically, as the inclusion of indirect relationships between causal factors means that circles and arrows (which depict direct relationships) are not adequate to describe all the causal factors;
may require more resources and time to fully understand the focus event than other methods with a more limited focus.
Annex D (Informative)
Useful tools to assist RCA
D.1 General
This Annex describes tools and techniques that can support the conduct of RCA.
D.2 Data mining and clustering techniques
D.2.1 Overview
Modern data mining techniques enable a search for specific properties and conditions. Cluster analysis detects data that are closely related in one, two or more dimensions, and can thereby be used to analyse products or processes that are closely related and to identify deviating data points (outliers).
In RCA, data mining and cluster analysis can give valuable clues and help to confirm or reject potential root causes. In some cases, e.g. aerospace and medical equipment, it is required to store batch numbers for the finished products together with the associated component batch numbers and raw material batch numbers. This information can provide a useful structure for identifying correlations which hint at possible causal relations.
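The following Python sketch illustrates the general idea on invented data: failure records are grouped by component batch and batches whose failure rate stands out from the overall rate are flagged for further investigation. It is a sketch of the approach, not a prescribed tool; all field names, thresholds and data are illustrative.

# Purely illustrative sketch; field names, data and thresholds are invented.
from collections import defaultdict

records = [
    # (product batch, component batch, failed?)
    ("P100", "C-55", False), ("P100", "C-55", False), ("P100", "C-55", True),
    ("P101", "C-56", True),  ("P101", "C-56", True),  ("P101", "C-56", True),
    ("P102", "C-56", True),  ("P102", "C-55", False), ("P102", "C-57", False),
]

counts = defaultdict(lambda: [0, 0])   # component batch -> [failures, total]
for _, batch, failed in records:
    counts[batch][0] += int(failed)
    counts[batch][1] += 1

overall = sum(f for f, _ in counts.values()) / sum(n for _, n in counts.values())
rates = {batch: f / n for batch, (f, n) in counts.items()}

# Batches whose failure rate is well above the overall rate are candidates for
# further investigation (a correlation, not yet proof of a causal relation).
suspects = [b for b, r in rates.items() if counts[b][1] >= 3 and r >= 1.5 * overall]
print(rates, "overall:", round(overall, 2), "suspects:", suspects)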
D.2.2 Example 1
A company observes 12% failures of stocked items. Analysis shows that a plastic part is broken. The start of the failure pattern is identified by batch number and manufacturing date. This date is correlated with the delivered batches of the plastic parts: there is no correlation, nor is there any correlation with the batches of plastic raw material. However, there is a correlation with the batches of a spring that loads the plastic part; the problem started 3 days after a new batch of springs was received. The changes made between this batch of springs and the previous batches are investigated. The difference is a new surface treatment against corrosion. The description of this surface treatment process contains a note that the treatment may interfere with certain plastic materials, and further analysis shows that the corrosion protection accelerates crack propagation in this plastic. The data sheet for the plastic material contains a warning that local overload may cause cracks. A causal hypothesis can therefore be formulated: the plastic part is continuously overloaded and fractures through local overload, and these cracks propagate at a rate accelerated by the new anti-corrosion treatment of the springs. A failure analysis had previously shown a pattern on the fracture surface consisting of crack propagation lines originating at the points of contact with the spring and a brittle surface from the final fracture. The causal hypothesis may be confirmed to a given level of confidence by experiment, setting up a number of plastic parts with springs with and without the new treatment. If the plastic parts loaded by treated springs predominantly fail, the causal hypothesis is confirmed to the appropriate degree of confidence using standard methods of statistical inference.
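The final confirmation step can be illustrated with a short Python sketch using a Fisher exact test on invented counts from such an experiment; the numbers, and the use of SciPy here, are purely illustrative and not part of the example above.

# Purely illustrative counts for the confirmation experiment: plastic parts
# loaded by treated vs. untreated springs, cracked vs. intact after the test.
from scipy.stats import fisher_exact

table = [[18, 2],   # treated springs:   [cracked, intact]
         [3, 17]]   # untreated springs: [cracked, intact]
odds_ratio, p_value = fisher_exact(table, alternative="greater")
print(f"odds ratio = {odds_ratio:.1f}, p = {p_value:.2e}")
# A small p-value supports, at the corresponding level of confidence, the
# hypothesis that the new anti-corrosion treatment accelerates cracking.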
D.2.3 Example 2
A number of soldering failures are observed in the field. The manufacturing weeks of the failed products are plotted in calendar time, and it is observed that the manufacturing dates of the products with soldering failures cluster in certain weeks. A causal hypothesis may be formulated on the basis of this initial observation, and then confirmed to a given degree of confidence using standard statistical inference on the process control data from manufacturing, which indicate that the soldering process in these weeks was likely not performed under appropriate control. The conclusion, to a high level of confidence, is that a root cause of the soldering failures is insufficient process control of the solder process.
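The following Python sketch illustrates, on invented data, how the clustering of failures by manufacturing week could be compared with the process-control status of those weeks using a standard contingency-table test; the week labels, production figures and counts are all hypothetical.

# Purely illustrative data: weekly failure counts compared with the
# process-control status of the solder process in those weeks.
from collections import Counter
from scipy.stats import chi2_contingency

failure_weeks = ["2013-W14", "2013-W14", "2013-W14", "2013-W14",
                 "2013-W15", "2013-W15", "2013-W15", "2013-W22"]
production_per_week = {"2013-W14": 500, "2013-W15": 480,
                       "2013-W22": 510, "2013-W23": 495}
out_of_control = {"2013-W14", "2013-W15"}   # from process-control records

failures = Counter(failure_weeks)

def failed_and_passed(weeks):
    f = sum(failures[w] for w in weeks)
    n = sum(production_per_week[w] for w in weeks)
    return [f, n - f]

in_control = set(production_per_week) - out_of_control
table = [failed_and_passed(out_of_control), failed_and_passed(in_control)]
chi2, p, dof, expected = chi2_contingency(table)
print(table, f"p = {p:.3g}")
# A small p-value indicates a significantly higher failure rate in the
# out-of-control weeks, supporting poor solder process control as a root cause.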
D.2.4 Example 3
A component is tested on a test board by twisting the board. The number of twists to failure is plotted on a Weibull plot (see IEC 61649). The analysis identifies a "weak" and a "strong" population (see IEC 61163-1). One component from the weak population and one from the strong population are analysed by cross-sectioning of the micro Ball Grid Array (BGA) solder balls. It is noted that the component from the weak population has a large number of large voids in the solder balls, while the solder balls of the component from the strong population have no voids or only a few small ones. A root-causal hypothesis is therefore formulated: that voids in the solder balls of the micro BGA are a root cause of the incident events. The hypothesis is confirmed by collecting data on operational use and observing, through analysis of the data, that a reduction in voids correlates with successful use of the component.
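The following Python sketch illustrates, on invented twist-to-failure data, how separate Weibull distributions could be fitted to the weak and strong sub-populations once they have been separated on the Weibull plot; it is a sketch only and not a substitute for the analysis described in IEC 61649.

# Purely illustrative data: twists to failure for the two sub-populations.
import numpy as np
from scipy.stats import weibull_min

weak   = np.array([3, 4, 4, 5, 6, 6, 7, 8])
strong = np.array([35, 40, 44, 50, 52, 60, 66, 70])

for name, data in (("weak", weak), ("strong", strong)):
    # floc=0 fixes the location parameter at zero, the usual choice for life data
    shape, loc, scale = weibull_min.fit(data, floc=0)
    print(f"{name}: beta = {shape:.2f}, eta = {scale:.1f}")
# Widely separated characteristic lives (eta) are consistent with two distinct
# populations, as suggested by the voids found in the weak parts.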
Annex E (Informative)
Analysis of human performance
E.1 General
People at any level in an organisation make decisions or perform or omit actions which may play a part in the events leading to a focus event. Human performance may be above or below expectation and the impact may be positive or negative. Decisions can be correct in the circumstances in which they were made but turn out to have unintended results.
People may make errors, be misguided or misinformed, be inappropriately motivated, may be trying to perform correctly or may knowingly violate rules. Analysis of the human aspects of causation is complex and generally requires specialist expertise if it is to go beyond identifying what occurred to seeking why, and hence to making recommendations.
E.1.1 Analysis of human failure
Analysis of human failures starts by identifying the error mode. This is the external manifestation of the error, i.e. what is observed to be done (or not done). Examples of error modes are:
omitted;
too early;
too late;
too much;
too little;
wrong direction;
wrong object;
wrong action;
wrong sequence.
There are then a number of different taxonomies for categorising and analysing the causes of these errors. They differ in the number and types of classifications they consider, in the models of human behaviour on which the taxonomies are based, and in where the most emphasis is placed. The following are generally considered.
1) The internal error mode and error mechanism. These are the reasons behind the error in psychologically meaningful terms, e.g. for an error mode of "took wrong turn in car", the internal error mode and mechanism might be an incorrect decision due to habit intrusion.
2) Inherent problems of the task, e.g. conflicting goals, planning problems, constraints, cognitive demands, etc.
3) Performance Shaping Factors (PSF). These are the conditions of the technical or organisational environment, or internal to a person, which affect how well a task will be performed (see IEC 62508 Human aspects of dependability [22]).
Some models also include an analysis of the flow of information and feedback, without which correct judgements are unlikely to be made. The importance of these methods is that they first identify the psychological error mechanism before identifying why the error was made. For example, if the error mechanism is not due to a lack of knowledge or skill, then further training is unlikely to be useful. If a decision is made to violate a procedure, then the reasons why this occurred should be investigated rather than assuming increased supervision is the solution.
Two examples of methods which can be used to analyse the causes of human failure, and which illustrate these principles, are:
Technique for Retrospective and Predictive Analysis of Cognitive Errors (TRACEr);
Human Factors Analysis and Classification Scheme (HFACS).
E.2 Technique for Retrospective and Predictive Analysis of Cognitive Errors (TRACEr)
E.2.1 Overview
TRACEr [23] was developed for use in air traffic control. TRACEr has eight modules, as shown in Figure E.1, which can be divided into the following three categories:
the context in which the error occurred, i.e. the nature of the task, the environment and the PSFs;
the production of the error, i.e. the External Error Modes (EEM), Internal Error Modes (IEM), the Psychological Error Mechanisms (PEM) and the information on which the individual based their actions;
the detection and correction of the error.
The error production modules are based on the cognitive processes involved when a person perceives that something needs to be done and takes action, e.g. perception, memory, decision making and action (see Figure E.2).
Figure E.1 – Example of a TRACEr model [23]
The figure shows the eight modules with example questions: Task error (What task failed? e.g. radar monitoring error); External Error Mode, EEM (What happened? e.g. action omitted); Information (What did the controller detect late? e.g. FL); Internal Error Mode, IEM (What perceptual function failed and in what way did it fail? e.g. late visual detection); Psychological Error Mechanism, PEM (How did the error occur? e.g. perceptual tunnelling); Performance Shaping Factors, PSF (What other factors contributed to the errors or recovery? e.g. traffic complexity); Detection (How was the error detected? e.g. outcome feedback); Correction (How was the error recovered? e.g. plan modification).
Figure E.2 – Generation of Internal Error Modes
For each cognitive domain, an IEM combines a cognitive function with a relevant keyword; an example IEM is given for each function.
Perception (vision, hearing): detection (none, late, incorrect), e.g. late detection; identification (none, late, incorrect), e.g. misidentification; recognition/comparison (none, late, incorrect), e.g. hearback error.
Memory: recall perceptual information (none, incorrect), e.g. forget temporary information; previous action (none, incorrect), e.g. forget previous actions; immediate/current action (none, incorrect), e.g. forget to perform action; prospective memory (none, incorrect), e.g. prospective memory failure; stored information (procedural and declarative knowledge) (none, incorrect), e.g. mis-recall stored information.
Judgement, planning and decision making: judgement (incorrect), e.g. misprojection; planning (none, too little, incorrect), e.g. underplan; decision making (none, late, incorrect), e.g. incorrect decision.
Action execution: timing (early, late, long, short), e.g. action too early; positioning (too much, too little, incorrect, wrong direction), e.g. positioning error, overshoot; selection (incorrect), e.g. typing error; communication (none, unclear, incorrect), e.g. unclear information transmitted.
E.2.2 Process
A TRACEr model is created using the following steps.
1) Analyse the task being carried out and identify any environmental or situational factors that might affect human performance (PSFs); these include task complexity, the knowledge and experience of the person, the ambient environment, etc.
2) Identify the EEMs, which are classified in terms of selection and quality, timing and sequence, and communication (see Table E.1).
3) Identify the IEMs, which describe what cognitive function failed and in what way; the taxonomy is shown in Figure E.2.
4) Identify the information issues associated with the IEM, i.e. what information was misperceived, forgotten, misjudged or mis-communicated.
5) Identify the PEMs, which are the cognitive biases known to affect performance within each cognitive domain (see Table E.2).
6) Review the error detection process, i.e. how the person became aware of the error, what medium informed them of the error and what external factors improved or degraded detection.
7) Consider correction, i.e. what was done to correct the error, and whether other factors, internal or external, improved or degraded the error correction.
An illustrative classification record covering these steps is sketched below.
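The following Python sketch is a purely illustrative record of the outcome of steps 1) to 7) for a single error; the field names follow the TRACEr modules and the example content is taken loosely from Figure E.1, but the structure itself is not part of TRACEr.

# Purely illustrative record; field names follow the TRACEr modules.
from dataclasses import dataclass, field

@dataclass
class TracerClassification:
    task: str
    psf: list = field(default_factory=list)   # performance shaping factors (step 1)
    eem: str = ""                              # external error mode, Table E.1 (step 2)
    iem: str = ""                              # internal error mode, Figure E.2 (step 3)
    information: str = ""                      # information issue (step 4)
    pem: str = ""                              # psychological error mechanism, Table E.2 (step 5)
    detection: str = ""                        # how the error was detected (step 6)
    correction: str = ""                       # how the error was corrected (step 7)

example = TracerClassification(
    task="Radar monitoring",
    psf=["Traffic complexity"],
    eem="Action omitted",
    iem="Late visual detection",
    information="Aircraft flight level",
    pem="Perceptual tunnelling",
    detection="Outcome feedback",
    correction="Plan modification")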
Table E.1 – External Error Modes
Selection and quality: omission; action too little; action too much; action in wrong direction; right action, wrong object; wrong action on right object; wrong action, wrong object; extraneous act.
Timing and sequence: action too long; action too short; action too early; action too late; action repeated; mis-ordering.
Communication: unclear information transmitted; unclear information received; information not sought/obtained; information not transmitted; information not recorded; incomplete information transmitted; incomplete information received; incomplete information recorded; incorrect information recorded.
Table E.2 – Psychological Error Mechanisms
Perception: expectation bias; spatial confusion; perceptual confusion; perceptual tunnelling; perceptual discrimination failure; stimulus overload; vigilance failure; distraction/preoccupation.
Memory: similarity interference; memory capacity overload; negative transfer; mis-learning; insufficient learning; infrequency bias; memory block; distraction/preoccupation.
Judgement, planning and decision making: incorrect knowledge; lack of knowledge; failure to consider side effects; integration failure; misunderstanding; cognitive fixation; false assumption; prioritisation failure; risk negation or tolerance; risk recognition failure; decision freeze.
Action: manual variability; spatial confusion; habit intrusion; perceptual confusion; mis-articulation; environmental intrusion; other slip; distraction/preoccupation.
E.3 Human Factors Analysis and Classification Scheme (HFACS)
E.3.1 Overview
HFACS [24] was developed by behavioural scientists in the United States Navy and analyses the causes of human error based on Reason's model (see B.3). There are four levels of consideration, based on Reason's slices of Swiss cheese:
organisational influences;
supervision;
preconditions for unsafe acts;
unsafe acts.
Some applications add a fifth level, above organisational influences, relating to legislation and government.
E.3.2 Process
Each level is subdivided into categories, and examples are given of possible causes within each category. Different applications use the same categories (shown below) but may have different examples depending on the industry, and may provide a few examples or a more detailed checklist. Examples of the four levels are shown in Figures E.3 to E.6 (an illustrative tagging sketch is given after Figure E.6).
Consideration of cause starts with Level 1, so that the precursors for the act in question take account of the type of error involved, and then continues up through the levels seeking weaknesses that contributed to the focus event.
Figure E.3 – Level 1 Unsafe Acts
Errors: decision errors (actions of operators proceed as intended but prove inadequate to achieve the required result), e.g. followed inappropriate procedure, exceeded ability; skill-based errors, e.g. omitted step, inadvertent operation of controls, failure to prioritise attention, poor technique; perceptual errors, e.g. decision made based on faulty information, misjudged distance/angles/speed, spatial disorientation, visual illusion.
Violations: routine (habitual, tolerated by management); exceptional (isolated departure from authority, not tolerated).
Figure E.4 – Level 2 Preconditions
Environmental factors: physical environment, e.g. weather, heat, lighting, noise; technological environment, e.g. design of equipment, displays and controls, human-computer interface, design of task.
Condition of operators: physiological state, e.g. illness, fatigue, intoxication; psychological state, e.g. stress, motivation, haste, channelised attention, overconfidence, loss of situational awareness; physical and mental limitations, e.g. insufficient reaction time, visual limitation, short-term memory limitation, beyond skill level.
Personal factors: personal readiness (off-duty activities that affect performance); staff resource management, e.g. communication, planning, team work.
Figure E.5 – Level 3 Supervision Issues
Inadequate supervision: e.g. failed to provide leadership, oversight, incentives, guidance or training.
Planned inappropriate operations: e.g. insufficient time allowed, insufficient opportunity for rest, inappropriate procedures, inadequate risk control.
Failed to correct a problem: e.g. failed to correct a documented error, failed to correct a violation.
Supervisory violation: e.g. inadequate documentation, failed to enforce rules and standards, authorised unnecessary risk taking.
Figure E.6 – Level 4 Organisational Issues
Resource management: e.g. HR selection, staffing levels, training, budget, cost cutting, equipment fit for purpose.
Organisational climate: e.g. structure (chain of command, communication, formal accountability), policies, culture (norms and rules, values and beliefs).
Organisational processes: e.g. operations (incentives, pressures, appraisal processes, schedules), procedures (standards, documentation, instructions), oversight.
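As a purely illustrative aid, the following Python sketch shows one way identified causes could be tagged with an HFACS level and category; the category strings and the example findings are invented simplifications of Figures E.3 to E.6.

# Purely illustrative sketch; categories and findings are invented.
from dataclasses import dataclass
from enum import Enum

class Level(Enum):
    UNSAFE_ACTS = 1
    PRECONDITIONS = 2
    SUPERVISION = 3
    ORGANISATIONAL_INFLUENCES = 4

@dataclass
class Finding:
    description: str
    level: Level
    category: str        # e.g. "Skill-based error", "Inadequate supervision"

findings = [
    Finding("Operator omitted a checklist step", Level.UNSAFE_ACTS,
            "Skill-based error"),
    Finding("Fatigue from an extended shift", Level.PRECONDITIONS,
            "Physiological state"),
    Finding("No oversight of shift scheduling", Level.SUPERVISION,
            "Inadequate supervision"),
]
# Consideration starts at Level 1 and continues up through the levels.
for finding in sorted(findings, key=lambda f: f.level.value):
    print(finding.level.name, "-", finding.category, ":", finding.description)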
Bibliography
[1] ISO Guide 73:2009, Risk management – Vocabulary
[2] IEC 60300-1 Ed. 3, Dependability management – Part 1: Guidance for management and application
[3] Haddon, W. Jr.: Energy Damage and the Ten Counter-Measure Strategies, The Journal of the Human Factors and Ergonomics Society, 1973
[4] Hollnagel, E.: Barriers and Accident Prevention, Ashgate Publishing Limited, 2004
[5] Reason, J.: Human Error, Cambridge University Press, 1990
[6] Checkland, P.: Systems Thinking, Systems Practice: Includes a 30-Year Retrospective, Wiley, 416 pages, 1999
[7] Leveson, N.: Engineering a Safer World, MIT Press, 2012
[8] Rasmussen, J.: Risk Management in a Dynamic Society: A Modelling Problem, Safety Science, Volume 27, Issues 2-3, Pages 183-213, 1997
[9] Technical Research and Analysis Center: Events and Causal Factors Analysis, SCIE-DOE-01-TRAC-14-95, 1995
[10] Benner, L. Jr.: Accident Investigations: Multilinear Event Sequencing Methods, Journal of Safety Research, Volume 7, Pages 67-73, 1975
[11] Hendrick, K. and Benner, L. Jr.: Investigating Accidents with STEP, Marcel Dekker Inc., 1986
[12] Monteau, M.: Analysis and reporting: accident investigation, Encyclopaedia of Occupational Health and Safety, 57-22:26, ISBN 1:92-2-1-103290-6, 1982
[13] Sanders, J.: Introduction to Why-Because Analysis, 2012
[14] US Nuclear Regulatory Commission: NUREG-0492, Fault Tree Handbook, January 1981
[15] IEC 61025, Fault tree analysis
[16] Ishikawa, K.: Guide to Quality Control, Asian Productivity Organization, 1986
[17] Fahlbruch, B. and Schöbel, M.: SOL – Safety through organizational learning: A method for event analysis, Safety Science, Volume 49, Pages 27-31, 2011
[18] Johnson, W.: MORT Safety Assurance Systems, Marcel Dekker, 1980
[19] Svedung, J. and Rasmussen, J.: Graphic Representation of Accident Scenarios: Mapping System Structure and the Causation of Accidents, Safety Science, Volume 40, Pages 397-417, 2002
[20] Svedung, J. and Rasmussen, J.: Risk Management in a Dynamic Society: A Modelling Problem, Safety Science, Volume 27, Pages 183-213, 1997
[21] Energy Institute: Tripod Beta: Guidance on the use of Tripod Beta in the investigation and analysis of incidents, accidents and business losses, 2013, http://www.tripodfoundation.com
[22] IEC 62508 Ed. 1.0 B:2010, Guidance on human aspects of dependability
[23] Shorrock, S. and Kirwan, B.: Development and application of a human error identification tool for air traffic control, Applied Ergonomics, Volume 33, Pages 319-336, 2002
[24] Shappell, S. and Wiegmann, D.: Applying Reason: The Human Factors Analysis and Classification System (HFACS), Human Factors and Aerospace Safety, Volume 1, Pages 59-86, 2001
[25] ISO/IEC 31010 Ed. 1.0, Risk Management – Risk Assessment Techniques
[26] ISO 31000:2009, Risk management – Principles and guidelines