(2018-Chemweno) - Risk Assessment Methodologies in Maintenance Decision Making A Review


Reliability Engineering and System Safety 173 (2018) 64–77


Reliability Engineering and System Safety journal homepage: www.elsevier.com/locate/ress

Risk assessment methodologies in maintenance decision making: A review of dependability modelling approaches

Peter Chemweno a,∗, Liliane Pintelon a, Peter Nganga Muchiri b, Adriaan Van Horenbeek a

a Center for Industrial Management, KU Leuven, Celestijnenlaan 300A, BE-3001 Heverlee, Belgium
b School of Engineering, Dedan Kimathi University of Technology, P.O. Box 657-10100, Nyeri, Kenya

Keywords: Asset failure; Risk assessment; Dependability modelling; Uncertainty; Maintenance decision making

Abstract

The risk assessment process plays an important role in maintenance decision making, through structuring the process of identifying, prioritizing, and thereafter formulating effective maintenance strategies. However, the effectiveness of the implemented strategies is influenced by the extent to which asset failure dependencies are taken into account during the risk assessment process. In the literature, several risk assessment methods are discussed that vary widely depending on factors such as modelling of failure dependencies in dynamic assets, and treating uncertainties associated with sparse reliability data. These factors invariably influence the extent to which different risk assessment methods are applicable for maintenance decision making. This article reviews the state-of-the-art knowledge on risk assessment in the context of maintenance decision making, with a particular focus on dependability modelling methods. The review structures knowledge on dependability modelling approaches and treatment of uncertainty, and highlights important challenges researchers and practitioners are likely to experience when performing risk assessment in the context of maintenance decision making. The challenges highlighted include the resolution complexity of methods such as Bayesian networks, especially while assessing risks of assets with complex failure dependencies. © 2018 Elsevier Ltd. All rights reserved.

1. Introduction

In recent years, a wide range of methods has been developed and applied for assessing risks and safety hazards in diverse sectors such as process industries and power plant facilities [1]. In the maintenance decision making domain, risk assessment is performed with a view to assisting practitioners to systematically identify, analyse, evaluate, and mitigate failure risks in assets [2,3]. The most commonly applied methods in this context include the Failure Mode and Effect Analysis (FMEA), Fault Tree Analysis (FTA) and Bayesian network (BN). Of these, the FMEA is widely used for prioritizing equipment failures and selecting appropriate maintenance strategies [4]. However, the FMEA is associated with important deficiencies, in particular the conventional form of the risk priority number (RPN), an important metric for quantifying asset failure risk [5,6]. In addition, the FMEA ignores failure dependencies in assets, which in turn negatively influences the risk assessment process [5].

In the literature, several state-of-the-art reviews of risk assessment methods are presented. Examples include Li [7], where methods such as Markov models and Monte Carlo simulation are discussed in the context of assessing risks of failure of power utility systems. The reviewed methods, however, insufficiently addressed dependability modelling aspects. In the context of maintenance decision making, Fraser et al. [8] reviewed methods for assessing equipment failure risks that are useful for deriving maintenance decisions. Notably, the methods are evaluated considering two maintenance concepts: Risk Based Maintenance (RBM) and Reliability Centered Maintenance (RCM). The RCM embeds the FMEA which, as mentioned, ignores failure dependency modelling aspects. On the other hand, the RBM approach embeds fault trees which, although they model asset failure dependencies, ignore temporal aspects that are crucial for effective risk assessment and optimal maintenance planning. More recently, Aven [9] reviews trends and advances of risk assessment methods, where he evaluates foundational challenges associated with the applicability of different methods for decision making. This includes aspects such as treatment of uncertainty; however, failure dependability modelling aspects are not explicitly addressed in the review. Smith [10] also reviews methods applicable for quantifying risks of operable assets characterized by sub-optimal reliability and availability. Examples of methods reviewed include Hazard and Operability Analysis (HAZOP) and Fault Tree Analysis (FTA). However, the suitability of these methods for failure dependability modelling and maintenance decision support is not sufficiently addressed. Modarres, Zhou et al. [11] evaluate advances in probabilistic risk assessment of safety-critical installations, where the importance of methods such as fault trees and Bayesian belief networks is highlighted for modelling failure dependencies. Similarly, the suitability of the reviewed approaches for maintenance decision making is not clearly addressed. A review of fault tree analysis and its application for modelling failure dependencies in complex assets is presented in Kabir [12]; likewise, applicability for maintenance decision making is not clearly discussed.

Evaluating the above reviews highlights several limitations or gaps which motivate this review article. Firstly, the reviews tend to focus on specific application contexts such as safety or risk assessment in process industries. However, since risks are domain specific, the application of specific risk assessment methods varies depending on the application context [13]. For instance, risks in civil engineering structures, such as bridge collapse, are rare and periodic, unlike technical failures of mechanical systems, which occur more frequently over the operational lifetime of the equipment, e.g. bearing wear. Secondly, the reviews insufficiently evaluate the suitability of the reviewed risk assessment methods for failure dependability modelling, especially in the context of maintenance decision making. The decision making aspects may include aiding root cause analysis, or selecting appropriate maintenance strategies.

Hence, this article attempts to bridge the aforementioned gaps by reviewing risk assessment methods discussed in the literature, while focusing on their applicability for maintenance decision support in view of modelling failure dependencies in assets. The review also evaluates how the methods address aspects such as treatment of uncertainty which, in maintenance decision making, is associated with the availability and sufficiency of maintenance data.

Fig. 1 illustrates the organization of this review. Section 2 reviews dependability modelling concepts, where methods such as Fault trees, Bayesian networks, and Stochastic Petri-nets are evaluated. Section 3 reviews concepts for treating aleatory and epistemic uncertainty, while Section 4 reviews different Bayesian inferencing methods associated with Bayesian networks. Examples here include methods such as analytic approximation, data augmentation, and Markov chain Monte Carlo simulation. Section 5 reviews methods for quantifying epistemic uncertainties in the context of dependability modelling, where methods such as Fuzzy theory, Interval analysis, and the Dempster-Shafer Theory of Evidence (DSTE) are discussed. Section 6 discusses the implications of the review for theory and practice, and further points out directions for future research. Section 7 draws important conclusions.

∗ Corresponding author. E-mail addresses: [email protected] (P. Chemweno), [email protected] (L. Pintelon), [email protected] (P.N. Muchiri), [email protected] (A. Van Horenbeek).
https://doi.org/10.1016/j.ress.2018.01.011
Received 30 June 2016; Received in revised form 6 January 2018; Accepted 20 January 2018; Available online 6 February 2018
0951-8320/© 2018 Elsevier Ltd. All rights reserved.

Abbreviations

AHP  Analytic Hierarchy Process
ANP  Analytic Network Process
AND  AND gate for the static fault tree
BE  Basic Event
BN  Bayesian Networks
BUGS  Bayesian Inference Using Gibbs Sampling
CBM  Condition Based Maintenance
CMMS  Computerized Maintenance Management System
DAG  Directed Acyclic Graph
Dynamic BN  Dynamic Bayesian Network
DIC  Deviance Information Criterion
DSTE  Dempster-Shafer Theory of Evidence
E-M  Expectation-Maximization Algorithm
FMEA  Failure Mode and Effect Analysis
FTA  Fault Tree Analysis
HAZOP  Hazard and Operability Analysis
IVP  Interval-Valued Probability
McMC  Markov Chain Monte Carlo
MCDM  Multi-Criteria Decision Making
M-H  Metropolis-Hastings Algorithm
OR  OR gate for the static fault tree
PAND  Priority AND Gate
RBD  Reliability Block Diagrams
RBIM  Risk-Based Inspection and Maintenance
RCA  Root Cause Analysis
RCM  Reliability Centered Maintenance
RPN  Risk Priority Number
SPARE  SPARE gate for the dynamic fault tree
SPN  Stochastic Petri-net
TE  Top Event
VOTING  VOTING gate for the dynamic fault tree

2. Dependability modelling in risk assessment

Technical assets are usually characterized by complex dependencies between system components, which in turn influence the extent to which asset failure risks are assessed and maintenance decisions reached [14]. In the absence of system dependencies, the risk assessment problem reduces to a single-component analysis where failure events are assumed to be independent. For complex system dependencies, Weber et al. [15] suggest that dependability modelling should consider the following aspects:

• Complexity and system size,
• Inclusion of temporal aspects and failure propagation in specific time instances,
• Inclusion of empirical and/or qualitative knowledge on failure events at different abstraction levels,
• Inclusion of failure dependencies and treating uncertainties related to data availability, and estimation of model parameters.

Weber et al. [15] further describe several examples of dependability modelling methods, which include among others:

• Fault trees, further classified into Static and Dynamic fault trees;
• Bayesian networks, classified into Static and Dynamic Bayesian networks;
• Combined Fault trees and Bayesian network models, and
• Stochastic Petri-nets.

The following sections review the suitability of the above-mentioned methods for assessing asset failure risks in the context of dependability modelling and maintenance decision support.

2.1. Fault trees

Primarily, the fault tree models failure dependencies in a hierarchical form, with a top failure event (TE) at the system level, intermediate failure events (IE) at the sub-system levels, and basic failure events (BE) at the component level. The dependencies are modelled through logical AND and OR gates. Assuming the failure events are statistically independent, the probability of occurrence of the TE modelled through the AND gate is expressed as follows:

$$P(TE) = \prod_{i=1}^{n} p_i \qquad (1)$$

where $p_i$ denotes the probability of occurrence of the $i$-th input event and $n$ the number of input events.

The OR gate, on the other hand, presumes that the TE is observed once at least one of the input failure events occurs. The probability of occurrence of the TE is hence expressed as the sum of the input probabilities of the independent BE, which is the rare-event approximation of the exact expression $1 - \prod_{i=1}^{n} (1 - p_i)$ and holds when the $p_i$ are small:

$$P(TE) = \sum_{i=1}^{n} p_i \qquad (2)$$
[Fig. 1. Framework for the review. Dependability approaches: fault trees (static fault trees, dynamic fault trees), hybrid fault tree and Bayesian network, Bayesian networks (static Bayesian network, dynamic Bayesian network), and stochastic Petri-nets. Treatment of uncertainty: probability theory, fuzzy theory, interval analysis, Dempster-Shafer theory of belief, analytic approximation approach, data augmentation approach, Markov chain Monte Carlo simulation approach, and simulation approach.]

Depending on the inclusion of temporal aspects, the gates may be static or dynamic. In maintenance decision making, the static fault tree is embedded in the risk based maintenance concept, where several examples are discussed in the literature, for instance, see [16–18]. Some authors, for instance Wu [19], propose a formalism which integrates well-known methods such as the FMEA for modelling failure events. Such integrated formalisms are rather intuitive to users, since resolving equipment failure probabilities is computationally feasible as compared to the dependability modelling methods discussed in later sections of this review. Bhangu et al. [20] propose a static fault tree formalism for assessing the reliability and failure risks of a thermal power plant installation, where their approach relies on fault data and associated outage hours. Their study suggests alternative maintenance policies for optimizing power plant availability. Choi and Chang [21] also apply the fault tree formalism for assessing the reliability of seabed storage tanks, where their approach relies on reliability data for modelling basic fault events. They also suggest alternative repair strategies for optimizing system availability. Taheriyoun and Moradinejad [22] integrate a Monte Carlo simulation approach into a fault tree formalism and apply the approach for modelling failure dependencies of water treatment equipment. Their approach considers human factor aspects as contributors to top event failures. McNelles et al. [23] compare static fault tree formalisms with the dynamic flow graph formalism, the latter for modelling temporal dependencies. They highlight the challenge of resolving cut-sets for static fault trees, especially for systems characterized by dynamic time steps.

Furthermore, to cope with sparse reliability data, which is often an important pre-requisite for modelling static dependencies in technical assets, static fuzzy fault trees are suggested and described in several application cases; these are discussed in more detail in Section 5. Nonetheless, although considered intuitive for modelling failure dependencies in technical assets, fault trees in the static form are associated with important deficiencies that are primarily linked to the inclusion of temporal aspects inherent in dynamic systems. For this reason, dynamic fault trees are proposed, where dynamic gates are incorporated. In the literature, different dynamic logical gates are proposed [24]:

• Priority AND (PAND) gate, which models the sequence in which dependent failures occur once a failure event is initiated;
• Functional dependency (FD) gate, which models instances where the trigger failure event simultaneously leads to failure of dependent systems;
• SPARE gate, which models the failure events of redundant components;
• VOTING gate, which models a failure instance where at least k out of n dependent components/events occur.

The use of dynamic fault trees for maintenance decision support is discussed in the literature. Notably, Ge and Yang [25] propose a modelling formalism based on dynamic binary decision trees, where their methodology adapts Shannon's decomposition theorem, which scales down the number of disjoint calculable cut sets, efficiently resolving dynamic gates. Wang et al. [26] propose a dynamic fault tree formalism for assessing the reliability of non-repairable systems. Their formalism considers the impact of probabilistic failure dependencies on critical system components. Manno et al. [27] introduce a novel formalism, which they define as the Adaptive Transitions Systems. Their proposed formalism embeds efficient semantics for modelling failure dependencies of repairable systems. More recently, Chiacchio et al. [28] propose a dynamic fault tree formalism which incorporates deterministic and stochastic dependencies influencing complex non-repairable systems. Their formalism incorporates hybrid basic failure events whose failure distributions evolve with time. Salehpour-Oskouei and Pourgol-Mohammad [29] propose a formalism exploiting the Priority AND gate for assessing the reliability of sensor components attached to equipment for collecting health data. Their formalism exploits a Monte Carlo simulation approach for quantifying the probability of the top event failure of a steam turbine system. For sparse reliability data, Tu et al. [30] propose a novel fuzzy dynamic tree formalism for modelling the reliability of safety-critical avionic components. Their formalism models uncertainties associated with sparse failure events, which are assigned fuzzy valued estimates. Volk et al. [31] propose a novel formalism which exploits integrated state-space reduction methods for efficiently resolving dynamic gates. The methods integrated in their formalism include Markov chains, which are applied for resolving the mean time to failure of complex dynamic systems. Additional formalisms, which apply sequential binary decision diagrams and timed dynamic fault tree analysis (the latter a variation of conventional dynamic fault tree analysis), are discussed in the literature; for instance, see Peng et al. [32], Ge et al. [33] and Ge et al. [34].

However, it is important to note that in the aforementioned studies, dynamic gates are resolved largely analytically, i.e. through sequence algebra or Markov models. Often, these resolution approaches are computationally intensive, especially for systems with complex dynamic dependencies. Moreover, Markov models are further associated with deficiencies such as (i) the state space explosion problem, and (ii) a limitation to modelling dynamic dependencies defined through exponential distribution functions. Hence, to overcome challenges such as the state explosion problem, approximate or simulation resolution approaches are proposed, for instance, Monte Carlo simulation and Stochastic Petri-nets. Simeu-Abazi et al. [35] propose an approach where a modularized fault tree scheme is translated into equivalent Petri-nets, hence enhancing the modelling flexibility of systems with complex dependencies, of which dynamic gates are resolved via Markov models. Codetta Raiteri [36] further extends the versatility of modelling complex systems by proposing a framework integrating three formalisms: parametric fault tree, dynamic fault tree, and repairable fault tree. The parametric fault tree here models dependencies of repairable systems. Flammini et al. [37] also propose a multi-formalism modular approach, which incorporates generalized Stochastic Petri-nets, fault trees, and repairable fault trees. Their formalism is applied for assessing the reliability of railway signalling systems. Turan et al. [38] propose a dynamic fault tree formalism for assessing the reliability of a maritime diving support vessel. Their formalism incorporates time-dependent dynamic gates for modelling failure dependencies, through which appropriate maintenance and/or repair sequences are proposed. More recently, Rauzy and Blériot-Fabre [39] propose a formalism through which dynamic fault trees are translated into equivalent guarded transition systems, the latter a form of generalized stochastic Petri-nets. Their formalism models dependencies of repairable systems, a challenge noted for systems modelled through dynamic fault trees.

Several studies also propose efficient approaches for resolving dynamic gates modelled through Markov models. Notably, Chiacchio et al. [40] propose a Markov-based stochastic approach which is applied for assessing the reliability of complex multi-state dynamic systems. Their formalism considers the influence of operation and environmental conditions on system failure. Yevkin [41] proposes an efficient Markov modelling approach which is applied for resolving dynamic dependencies of repairable and non-repairable systems. The approach translates dynamic gates into equivalent Markov models such that the number of transition states is minimized. Merle et al. [42] propose a Monte Carlo simulation approach, which enhances the resolution efficiency of complex dynamic fault trees otherwise modelled through Markov models. Chiacchio et al. [43] proposed a novel Monte Carlo simulation-based tool, the MatCarloRe, for resolving the reliability of systems modelled through hierarchical dynamic fault trees and characterized by non-repairable basic failure events. More recently, Zhu et al. [44] propose an alternative stochastic approach for modelling dependencies in dynamic fault trees while considering system redundancies and probabilistic common cause failures. Their approach applies a non-Bernoulli sequencing approach for generating input values to the stochastic model.

Apart from approximate resolution approaches, several studies incorporate both exact and approximate (or simulation) approaches within the same modelling formalism. Examples include Chiacchio et al. [45], who compare Markov models and Monte Carlo simulation approaches for resolving dynamic gates. They conclude that the choice between the two resolution approaches is a trade-off between system complexity and the computational efficiency of the specific resolution approach. Lindhe et al. [46] also apply both exact and approximate resolution approaches within the same dynamic fault tree formalism, for assessing maintenance-related risks of water supply systems. More recently, Nguyen et al. [47] apply a combined approach which embeds a stochastic Petri-net approximate resolution method. They apply their formalism for modelling repairable systems characterized by multi-state failure mechanisms.

From the above, approximate (simulation) resolution approaches seemingly reduce the computational effort necessary for resolving dynamic gates for systems with complex failure dependencies. However, the reliance on empirical data for fault tree dependability modelling formalisms is seemingly a challenge, especially where such data is unavailable. In addition, fault tree formalisms are limited to systems with fairly simple and straightforward dependencies. This is because of the combinatorial explosion problem for systems with more complex dependencies. Lastly, risk metrics remain static despite the emergence of new evidence; hence, more versatile modelling formalisms incorporating Bayesian updating are suggested.

2.2. Bayesian networks

The Bayesian network models system failure dependencies by incorporating an efficient probabilistic inferencing framework which allows inclusion of uncertainty associated with sparse reliability information [48]. Typically, the network consists of a directed acyclic graph (DAG) which contains a set of nodes and directed arcs, as depicted in Fig. 2. Each node in the graph represents random (and independent) failure events $Y = (y_1, y_2, y_3, \ldots, y_n)$, while the directed arcs represent probabilistic dependencies, e.g. between random failure events [49]. In the Bayesian network, the conditional probabilities between random failure events are represented through a joint probability distribution parameterized as follows:

$$p(y_1, y_2, y_3, \ldots, y_n) = \prod_{i=1}^{n} p\left(y_i \mid \mathrm{parent}(y_i)\right) \qquad (3)$$

where $p(y_i \mid \mathrm{parent}(y_i))$ represents the conditional relationship between nodes and their parents (e.g. nodes $y_1$ and $y_2$ have a parent relationship to node $y_3$). Applying Eq. (3) to the DAG in Fig. 2, the joint probability distribution is expressed as follows:

$$p(y_1, y_2, y_3) = p(y_1)\, p(y_2)\, p(y_3 \mid y_1, y_2) \qquad (4)$$

[Fig. 2. Simplified DAG with two parent nodes ($y_1$ and $y_2$) and dependent node $y_3$.]

The dynamic Bayesian network (DBN) extends the functionality of the static Bayesian network through the inclusion of temporal dependencies using sequences of time slices. The temporal transition from one time phase to the next may be represented as follows [50]:

$$p(Y_t \mid Y_{t-1}) = \prod_{i=1}^{n} p\left(y_{i,t} \mid \pi(y_{i,t})\right) \qquad (5)$$

where $y_{i,t}$ expresses the $i$-th node at time instance $t$, $i = 1, 2, \ldots, n$, and $\pi(y_{i,t})$ expresses the temporal dependencies of the parent nodes of $y_{i,t}$ within the DBN. Extending the DBN to $T$ time slices, the following joint probability distribution is derived [50]:

$$p(Y_{1 \to T}) = \prod_{t=1}^{T} \prod_{i=1}^{n} p\left(y_{i,t} \mid \pi(y_{i,t})\right) \qquad (6)$$

Through the joint probability distribution, the Bayesian network embeds a flexible formalism which allows modelling of complex dependencies and updating of risk metrics with the emergence of new failure information.
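To make the factorization for the three-node DAG of Fig. 2 concrete, the sketch below enumerates the joint distribution $p(y_1, y_2, y_3) = p(y_1)\, p(y_2)\, p(y_3 \mid y_1, y_2)$ for binary failure events and then updates the belief in a parent failure once the dependent node is observed to fail. All probability values are hypothetical and chosen only for illustration; the encoding (1 = failed, 0 = working) is likewise an assumption.

```python
from itertools import product

# Hypothetical prior failure probabilities of the parent nodes y1 and y2
# (1 = failed, 0 = working).
p_y1 = {1: 0.05, 0: 0.95}
p_y2 = {1: 0.10, 0: 0.90}

# Hypothetical conditional probability table p(y3 = 1 | y1, y2).
p_y3_fail = {(0, 0): 0.01, (0, 1): 0.30, (1, 0): 0.40, (1, 1): 0.90}


def joint(y1, y2, y3):
    """Joint probability p(y1) * p(y2) * p(y3 | y1, y2) for the three-node DAG."""
    p3 = p_y3_fail[(y1, y2)]
    return p_y1[y1] * p_y2[y2] * (p3 if y3 == 1 else 1.0 - p3)


# Sanity check: the joint distribution sums to one over all eight states.
total = sum(joint(*state) for state in product((0, 1), repeat=3))

# Marginal probability that the dependent node y3 fails.
p_y3 = sum(joint(y1, y2, 1) for y1, y2 in product((0, 1), repeat=2))

# Bayesian updating: probability that y1 has failed given that y3 is observed failed.
p_y1_given_y3 = sum(joint(1, y2, 1) for y2 in (0, 1)) / p_y3

print(f"p(y3 = 1) = {p_y3:.5f}")
print(f"p(y1 = 1 | y3 = 1) = {p_y1_given_y3:.5f}")
```

Exhaustive enumeration is used here only because the network has eight states; for larger networks it grows exponentially with the number of nodes, which is one way to see why efficient resolution of the joint probability matters for complex failure dependencies.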

P. Chemweno et  al.

Reliability   Engineering  and  System  Safety  173 (2018) 64–77

failure and maintenance time distributions, which include, exponential, normal and lognormal distributions. Nonetheless, despite the modelling 󿬂exibility of  dynamic Bayesian networks, often the formalism requires high computational effort, especially for resolving the  joint probability of  complex system failure dependencies. This resolution complexity arises where, for instance, the marginal probabilities representing independent failure events are modelled via probability distributions belonging to distinct families, e.g. Weibull or Lognormal. This aspect remains a challenge, where exponential failure transition rates are often assumed for modelling dynamic failure transition, for instance as discussed in Codetta-Raiteri and Portinale [66] .

2.2.1.  Static  Bayesian networks

Several studies demonstrate the potential use of  static Bayesian networks for modelling failure dependencies of  technical systems. In the context of  dependability modelling and maintenance decision support, Ferreiro et al. [51] propose a formalism where failure dependencies of   aircraft systems are modelled while incorporating prognostic information. They evaluate the risk reduction potential of  two maintenance strategies, i.e. preventive and corrective maintenance. Gran et al. [52] incorporate organizational, human and technical risk factors into their Bayesian modelling formalism and consequently evaluating appropriate maintenance interventions, which best mitigate oil leakages of   offshore facilities. Tian et al. [53] also model the failure dependencies of  a subma- rine casing cutting tool using a static Bayesian network formalism, and consequently, apply their formalism for diagnosing faults of  the robot- operated cutting tool. More recently, Liu et al. [54] propose an approach for translating the GO-FLOW methodology into an equivalent static Bayesian network. The GO-FLOW methodology is commonly applied for modelling system reliability such as, in their study, a pressur- ized water reactor. Askarian et al. [55] also apply a static Bayesian network formalism for diagnosing technical faults in a chemical plant. Abbassi et al. [56] integrate Bayesian networks into a quantitative risk assessment methodology, where the methodology is applied for estimating the failure probabilities of  accident/failure scenarios, and associated consequences. How- ever, their approach fails to consider dependencies between system failure events, and moreover, ignores temporal aspects, a limitation which is addressed using dynamic Bayesian networks reviewed discussed next.

2.3. Combined  fault  trees and  Bayesian network modelling  approaches

Combined formalisms present a plausible framework for translating systems modelled via fault trees to equivalent Bayesian network models. Khakzad et al. [67] propose such a formalism where systems modelled via dynamic fault tree gates are translated into equivalent dynamic network nodes, while avoiding generation of  multi-dimensional conditional probability tables representing marginal probabilities of  basic failure events. In Khakzad et al. [48] , they extend their work and propose a modular Object-Oriented Bayesian network (OOBN) formalism for modelling complex failure dependencies represented using fault trees. Their formalism decomposes complex dynamic Bayesian networks into multiple modules, each of  which is resolved independently. Kabir et al. [68] propose a translation approach through which, stochastic failure dependencies of  complex systems modelled via dynamic fault trees are also translated to equivalent dynamic Bayesian networks, and their reliability assessed. More recently, Mi et al. [69] propose an approach which translates complex dependencies of  electromechanical systems modelled through dynamic fault tree. Their approach considers epistemic uncertainty which is expressed through bounded closed intervals, and which incorporates multiple sources of   evidences, e.g. field failure data, test and design data. A similar approach integrating multiple information sources via a Bayesian inference framework is discussed in Wang et al. [70].  However, their approach does not extend to applying a Bayesian network formalism for modelling system failure dependencies. Barua et al. [71] model the sequential dependencies between, on the one hand, operation-related parameters of  chemical processes, and on the other hand, aging components vulnerable to failure. The sequential dependencies are first modelled via dynamic fault tree and translated to equivalent dynamic Bayesian network. Darwish et al. 
[72] incorporate the Bayesian approach to fault trees, which allows experts assign importance ranking to basic failure events. Hence, by prioritizing basic events, they consider a more optimal allocation of  maintenance resources. Chen et al. [73] propose a translation approach where reliability block diagrams commonly used for modelling and assessing system reliability of   complex dependable systems, are translated to Bayesian networks (BN). In recent years, software applications supporting this translation are discussed in the literature, for instance, the Reliability Analysis with Dy- namic Bayesian networks (RADYBAN) [74].

2.2.2.  Dynamic  Bayesian networks

The versatility of dynamic Bayesian networks is demonstrated in several studies. For instance, Cai et al. [57] modelled the failure dependencies of a sub-sea blowout preventer system, exploring causal relationships between imperfect repair processes and common cause system failures. Hu et al. [58] evaluate the influence of an opportunistic predictive maintenance strategy on system failure using a modelling formalism which integrates dynamic Bayesian networks and Hazard and Operability Analysis (HAZOP). More recently, Cózar and Gámez [59] demonstrate a modelling formalism which predicts anomalies of complex dynamic systems, where the prediction forms the basis for triggering predictive maintenance decisions. Zhu and Collette [60] propose a Bayesian modelling formalism which they demonstrate for modelling time-dependent failure mechanisms, such as fatigue crack growth. They also consider maintenance actions, integrating a reliability index for triggering inspection and maintenance actions. Although applied to structural systems, the applicability of their approach for modelling low-probability (rare) failure events is also evident for mechanical systems. A similar approach for assessing the reliability of deteriorating structural systems via a dynamic Bayesian modelling formalism is discussed in Luque and Straub [61]. More recently, Li et al. [62] integrate a dynamic Bayesian formalism into the GO-FLOW methodology for modelling feedback signal flows. The inclusion of the Bayesian network model enhances the reliability assessment potential of the GO-FLOW methodology, where traditionally, dependencies between system components are one-directional. Ramírez and Utne [63] also propose a formalism for assessing the reliability of ageing systems while optimizing maintenance policies, which include corrective, condition-based, and time-based maintenance strategies. Salazar et al. [64] also propose a modelling formalism which integrates both reliability and system control performance aspects. In their study, failure dependencies are modelled through a dynamic Bayesian network model, which allows assessment of system reliability. They suggest a strategy through which the control effort (for system performance) is redistributed until maintenance is undertaken, hence improving system availability. Liang et al. [65] proposed a formalism for assessing the reliability of warship systems where they consider varying

2.4. Stochastic Petri-nets

The Stochastic Petri-net (SPN) also provides a formalism for modelling system dependencies. It embeds a Petri-net structure which graphically depicts dependent systems through a tuple whose elements include [75]:

• $P$ = a finite set of places, with marked places containing some tokens,
• $T$ = a finite set of transitions,
• $I_t$ = a finite set of input places,
• $O_t$ = a finite set of output places,
• $H$ = a set of inhibitors,
• $M_0$ = the initial system marking vector, whose places contain a non-negative number of tokens.
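The tuple elements listed above can be made concrete with a small simulation. The following is a minimal sketch, not the formalism of [75]: the two-place net ("up"/"down"), the transition names, and the exponential firing rates are all hypothetical values chosen for illustration.

```python
import random

# Hypothetical net: one component alternating between "up" and "down".
# M0 is the initial marking; each transition lists its input places,
# output places, inhibitor places and an assumed exponential firing rate.
M0 = {"up": 1, "down": 0}
TRANSITIONS = {
    "fail":   {"inputs": ["up"],   "outputs": ["down"], "inhibitors": [], "rate": 0.01},
    "repair": {"inputs": ["down"], "outputs": ["up"],   "inhibitors": [], "rate": 0.5},
}

def enabled(marking, t):
    """Enabling rule: all input places marked, no inhibitor place marked."""
    return (all(marking[p] > 0 for p in t["inputs"])
            and all(marking[p] == 0 for p in t["inhibitors"]))

def fire(marking, t):
    """Remove one token from each input place, add one to each output place."""
    new = dict(marking)
    for p in t["inputs"]:
        new[p] -= 1
    for p in t["outputs"]:
        new[p] += 1
    return new

def simulate(horizon, seed=0):
    """Return the fraction of the horizon spent with a token in 'up'."""
    rng = random.Random(seed)
    marking, clock, uptime = dict(M0), 0.0, 0.0
    while clock < horizon:
        # Resampling delays at every step is valid because exponential
        # delays are memoryless; the shortest sampled delay fires first.
        candidates = [(rng.expovariate(t["rate"]), name)
                      for name, t in TRANSITIONS.items() if enabled(marking, t)]
        if not candidates:
            break
        delay, name = min(candidates)
        if clock + delay >= horizon:
            if marking["up"] > 0:
                uptime += horizon - clock
            break
        if marking["up"] > 0:
            uptime += delay
        clock += delay
        marking = fire(marking, TRANSITIONS[name])
    return uptime / horizon
```

Calling `simulate(10000.0)` estimates the availability of the hypothetical component. Because the estimate is built from sampled firings, events that fire rarely contribute few samples, which is the cost of simulation-based resolution.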


The Petri-net simulates dynamic system behaviour by firing tokens continuously from a set of input places ($P_1$), through transitions ($T$), to the output places ($P$). The success of firing the tokens is based on a set of enabling rules representing the modelled dependencies. For maintenance decision making, Signoret et al. [76] propose a methodology which modularizes large Petri-net structures through a formalism which embeds Reliability Block Diagrams (RBD), a well-known reliability assessment tool. Song et al. [77] propose a formalism which combines stochastic fault trees and Petri-net models, and is useful for diagnosing faults of pantograph systems. Flammini et al. [37] also propose a combined formalism synthesizing generalized Stochastic Petri-nets, fault trees, and repairable fault trees. Their formalism is applied for modelling failure dependencies of train control systems, and evaluating alternative preventive maintenance policies which mitigate component degradation. Additional studies discussing Stochastic Petri-net formalisms may be found in, e.g., [47,78–82]. Stochastic Petri-net modelling formalisms, however, have one notable limitation: they rely on a simulation approach, which is computationally intensive when modelling rare failure events [83]. In such cases, the Petri-net models often underestimate occurrence probabilities of modelled failure events, hence yielding sub-optimal maintenance strategies. Fig. 3 presents an overview of the reviewed dependability methods as per percentage distribution.

Fig. 3. Percentage distribution of articles per dependability modelling method: dynamic FTA, 25 (23%); dynamic BN, 23 (21%); static FTA, 17 (16%); static BN, 15 (14%); hybrid FTA-BN, 14 (13%); stochastic PN, 13 (12%).

3. Quantifying uncertainty in the risk assessment methods

Depending on the approach for modelling failure dependencies, uncertainties associated with the risk assessment process may be treated as either aleatory or epistemic [84]. Aleatory uncertainty results from the inherent randomness of input model parameters derived from reliability data, while epistemic uncertainty may result from insufficient reliability data. Quantifying epistemic uncertainty relies on expert domain knowledge. For treating aleatory uncertainty, statistical failure models are often used, while for quantifying epistemic uncertainty, models such as Interval Analysis, Fuzzy functions and Belief functions are applied [84]. For Bayesian networks, uncertainty associated with sparse reliability data is treated through a Bayesian inferencing framework, discussed next.

4. Bayesian inferencing approaches

The Bayesian inferencing framework models quantitative reliability information via likelihood functions, while epistemic uncertainty is inferred from prior distribution functions, the latter elicited from domain experts [85]. The likelihood and prior functions are combined in the Bayesian inference framework, and the probability of asset failure is inferred from the resulting posterior distribution. Hence, the posterior distribution provides a means of updating risk metrics as new evidence of failure events becomes available. Bayes' theorem is shown in Eq. (7):

$$\pi(\theta \mid x) = \frac{\pi(\theta)\, l(x \mid \theta)}{\int_{\theta=0}^{\infty} l(x \mid \theta)\, \pi(\theta)\, d\theta} \qquad (7)$$

where $\pi(\theta)$ represents the prior distribution function, $l(x \mid \theta)$ the likelihood function, and $\pi(\theta \mid x)$ the posterior distribution function. However, the posterior distribution is often computationally intensive to resolve. Hence, several methods are proposed for resolving such posterior distribution functions, which are also embedded in Bayesian network modelling formalisms [86]: (i) analytical approximation methods, which include numerical integration and the Laplace approximation; (ii) data augmentation methods, which include the Expectation-Maximization (E-M) algorithm; (iii) Monte Carlo direct sampling; and (iv) Markov chain Monte Carlo, including the Metropolis-Hastings (M-H) algorithm and the Gibbs sampling approaches.

4.1. Analytical approximation approach

The analytical approximation approach resolves posterior distribution functions via a data sampling approach based on a simulation framework, e.g. Monte Carlo simulation. This sampling approach draws samples from probability density functions of the modelled failure events. Thereafter, uncertainties associated with the sampled data are propagated through an appropriate mathematical model, e.g. the Bayes equation, from which the posterior distribution is resolved [87]. Within Bayesian network modelling formalisms, the analytical approximation approach is reported in studies, e.g. [88,89]. More recently, Wang et al. [90] apply a Monte Carlo simulation approach within a Bayesian network modelling formalism for assessing the reliability of railway turnout systems exposed to weather-related elements, from which optimal maintenance intervention strategies are formulated. However, the simulation sampling approach has one important drawback: it assumes the existence of a closed-form posterior distribution from which samples are drawn. This is often not the case, especially where the prior and likelihood functions belong to different families of distributions, which makes the posterior distribution function computationally intensive to resolve [91]. Moreover, the analytic approximation approach often yields poor risk estimates, especially where reliability data is sparse. Hence, alternative resolution approaches such as data augmentation are suggested.
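To make the Bayes update of Eq. (7) concrete, the sketch below resolves a posterior by numerical integration on a grid. It is an illustration under stated assumptions rather than a method taken from the reviewed studies: the gamma prior on the failure rate, the exponential-lifetime likelihood, and all numerical values are hypothetical. This prior and likelihood happen to be conjugate, so the grid result can be checked against the closed-form posterior mean.

```python
import math

def gamma_pdf(theta, shape, rate):
    """Gamma density, used here as an assumed prior on the failure rate."""
    return (rate ** shape) * theta ** (shape - 1) * math.exp(-rate * theta) / math.gamma(shape)

def likelihood(theta, n_failures, total_time):
    """Exponential-lifetime likelihood for n failures observed over total_time."""
    return theta ** n_failures * math.exp(-theta * total_time)

def posterior_on_grid(shape, rate, n_failures, total_time, upper=0.05, steps=20000):
    """Evaluate the posterior of Eq. (7) on a grid using a midpoint sum."""
    width = upper / steps
    grid = [(i + 0.5) * width for i in range(steps)]
    unnorm = [gamma_pdf(t, shape, rate) * likelihood(t, n_failures, total_time)
              for t in grid]
    evidence = sum(unnorm) * width            # denominator of Eq. (7)
    density = [u / evidence for u in unnorm]  # posterior density on the grid
    mean = sum(t * d for t, d in zip(grid, density)) * width
    return grid, density, mean

# Hypothetical case: prior Gamma(2, 1000), then 3 failures in 2000 hours.
grid, density, mean = posterior_on_grid(2.0, 1000.0, 3, 2000.0)
closed_form_mean = (2.0 + 3) / (1000.0 + 2000.0)  # conjugate Gamma(5, 3000)
```

When the prior and likelihood are not conjugate, no closed-form check is available and the grid becomes impractical as the number of parameters grows, which is the resolution difficulty noted in the text.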

4.2. Data augmentation approach

The data augmentation approach works by augmenting observed data with missing data, which yields an augmented posterior density function that is computationally tractable and more efficiently resolved. The Expectation-Maximization (E-M) algorithm is widely applied for augmenting missing reliability data, and hence estimating the lifetime distribution of repairable systems/assets. For Bayesian network formalisms, Mahmoud and Khalid [92] apply the approach for augmenting censored fault data of electro-hydraulic rotational drive systems. Zhang et al. [93] also apply the method within a dynamic Bayesian network formalism for estimating the remaining useful life (RUL) of systems characterized by complex failure dependencies, where the influence of a condition-based maintenance strategy is considered for degrading components. Zhang and Dong [94] also apply the approach within a dynamic Bayesian network formalism where they incorporate a Gaussian model for augmenting missing failure data. More recently, Ratnapinda and Druzdzel [95] incorporate the E-M augmentation approach within Bayesian networks, and consider an application scenario where continuous data streams are used to augment sparse reliability data. Other studies where the E-M method is embedded in Bayesian network formalisms include Bacha et al. [96]. Nonetheless, despite its usefulness for augmenting sparse reliability data, the E-M algorithm is constrained when modelling dependencies where the prior and likelihood functions belong to different families of distributions [97]. Part of this constraint is addressed by the Markov chain Monte Carlo method.

4.3. Markov chain Monte Carlo

The Markov chain Monte Carlo (McMC) approach works by simulating Markov chains within a given parameter space, where the chains are constructed in such a way that they converge asymptotically to the posterior distribution. From this convergence, posterior statistical parameters (e.g. mean, standard deviation) are approximated from ergodic averages of the Markov chains [98–100]. A primary advantage of McMC compared to the conventional Monte Carlo sampling approach is its ability to estimate posterior distribution parameters for complex mathematical models having a large number of parameters, and belonging to different distribution families [98]. This is in addition to enhancing the suitability of the method for dynamically updating risk metrics as new evidence of failure events emerges. Commonly applied McMC inferencing methods include the Metropolis-Hastings (M-H) algorithm and the Gibbs sampler [101]. The former relies on an accept-reject step to generate a sequence of samples from a complicated probability density function, while the latter draws each parameter in turn from its conditional distribution given the others.

In the context of risk and reliability analysis, the Gibbs sampler method is embedded in Bayesian network formalisms. For instance, Lin et al. [102] propose a Gibbs sampler-based approach for estimating the service lifetime distributions of locomotive wheels. Their approach considers factors such as wheel installation positioning, a factor influencing wheel wear, and maintenance. Liu et al. [103] applied the method for assessing the reliability of components characterized by multi-state, Markov degradation processes. In their study, the Gibbs sampler is applied for resolving the posterior distributions generated from the degradation processes. Other studies incorporating the Gibbs sampler in Bayesian network formalisms are discussed in, e.g. [104–106].

Some studies attempt to integrate the Gibbs sampler and M-H algorithms within the same modelling formalism. Examples include Soliman et al. [98], where a combined formalism is proposed for estimating the reliability of multi-component systems characterized by dependencies modelled via a modified Weibull posterior distribution. More recently, the sampler is also discussed for modelling the influence of dependencies such as stress and component strength on system reliability [107]. Zaidan et al. [108] also apply the approach for estimating the remaining useful life of aerospace gas turbine engines. Other authors have extended the hybrid McMC resolution approach by allowing inclusion of parametric sensitivity analysis, for instance, see [109–111]. Of particular interest, the resolution efficiency of McMC is extended to analyzing rare failure events. In recent years, the McMC resolution has evolved to software applications such as BUGS (Bayesian inference Using Gibbs Sampling), where applicability of the approach is demonstrated for assessing asset failure risks, e.g. see [112–114].

5. Methods for quantifying epistemic uncertainty

Although the Bayesian inferencing framework is useful for combining evidence, both quantitative and qualitative, a lack of, or insufficient, reliability data may necessitate alternative methods for quantifying epistemic uncertainty. Such methods would allow expert elicitation to be considered in dependability modelling formalisms. Examples of methods for quantifying epistemic uncertainty include: (1) the Theory of Fuzzy sets; (2) Interval Analysis; and (3) the Dempster–Shafer Theory of Evidence [115].

5.1. Fuzzy approach for quantifying uncertainty

The fuzzy set concept was first suggested for modelling vague and imprecise information through membership functions, where the function specifies a degree of belonging in the continuous interval [0, 1] [116]. Ideally, a membership value of '0' implies no membership, while conversely, a value of '1' implies full membership. The fuzzy concept is applied within fault tree modelling formalisms, e.g. in Purba et al. [117], for assessing the probability of failure of basic events of a nuclear power plant facility. In the study, modelling the basic events relied on fuzzy functions elicited from domain experts. The embeddedness of the fuzzy concept within static fault tree formalisms is also discussed in studies, e.g. [118,119]. For dynamic fault trees, the fuzzy concept is discussed in Tu et al. [30], where the concept is applied for quantifying uncertainties associated with sparse failure information of critical avionic systems. Kabir et al. [120] also incorporate the concept while assessing the reliability of the fuel distribution system of marine ships. More recently, a fuzzy fault tree analysis modelling formalism is discussed in Yazdi et al. [121], where, importantly, the formalism is applied for analysing failure risks associated with common cause failures. Assessing such risks is often challenging owing to sparse fault information. The concept is embedded in dynamic fault tree formalisms as discussed in studies, e.g. [122,123]. Recent attempts are also seen in the literature where some authors integrate the fuzzy concept into Bayesian network modelling formalisms, for instance, in He et al. [124], where fuzzy functions are assigned to failure probability estimates of complex systems characterized by multi-state failures.
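The membership functions described in Section 5.1 can be illustrated with a triangular fuzzy number, one common shape for expert-elicited estimates. This is a minimal sketch under stated assumptions: the triangular shape, the function names, and the numerical values are hypothetical and are not taken from the studies cited above.

```python
def triangular_membership(x, low, mode, high):
    """Degree of membership in [0, 1] for a triangular fuzzy number."""
    if x <= low or x >= high:
        return 0.0                       # no membership outside the support
    if x <= mode:
        return (x - low) / (mode - low)  # rising edge, reaches 1 at the mode
    return (high - x) / (high - mode)    # falling edge

def alpha_cut(alpha, low, mode, high):
    """Interval of values whose membership is at least alpha."""
    return (low + alpha * (mode - low), high - alpha * (high - mode))

# Hypothetical expert estimate of a basic-event failure probability:
# "about 0.002, certainly between 0.001 and 0.005".
LOW, MODE, HIGH = 0.001, 0.002, 0.005
full = triangular_membership(0.002, LOW, MODE, HIGH)    # membership at the mode
half = triangular_membership(0.0035, LOW, MODE, HIGH)   # halfway down the falling edge
bounds = alpha_cut(0.5, LOW, MODE, HIGH)
```

An alpha-cut turns the fuzzy estimate into a crisp interval, which is one way such estimates can be passed to interval-based propagation.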

5.2. Interval analysis

In interval analysis, the uncertain and imprecise parameters of interest are assumed to lie within lower and upper interval bounds [115]. Compared to the fuzzy approach, where fuzzy membership functions are specified, in interval analysis domain experts assign crisp lower and upper bound values to the uncertainty range within which the parameters are judged to lie [115]. For example, the failure probability of a wind turbine gearbox may be specified as lying within lower and upper bounds, the upper being $1 \times 10^{-2}$. Interval analysis allows estimates from several experts to be combined within a probabilistic framework described by the interval functions [125]. Although interval analysis is demonstrated to work well within Bayesian network formalisms, especially where reliability data is sparse, it lacks a concise mathematical structure or density function through which uncertainty can be propagated [115,126]. To overcome this flaw, two classes of algorithms are suggested in the literature, based on (1) simulation methods and (2) surrogate models [115].

For reliability analysis and probabilistic safety assessment, the interval analysis method is gaining attention in the field of uncertainty quantification (UQ). In UQ, aleatory and epistemic uncertainties are analysed through separate second-order distribution functions. This separation approach is suggested as useful for assessing the reliability of complex, high-reliability safety-critical systems, e.g. aerospace systems [127–129]. Within dependability modelling formalisms, the UQ separation approach is discussed recently in Novack et al. [130] for quantifying the epistemic uncertainty of basic failure events of space launch vehicles. Fig. 4 depicts the distribution of reviewed dependability approaches as per the application domain.

Fig. 4. Distribution of articles as per application domain: petrochemical facilities, 26%; industrial/manufacturing systems, 21%; nuclear power generation/research, 19%; railway systems/marine applications, 15%; food/paper/process industries, 11%; electronics/telecommunications, 8%.
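The interval bounds described in Section 5.2 can be illustrated with a short sketch. It is not one of the simulation or surrogate-model algorithms of [115]: it uses plain endpoint propagation, which is valid here only because the failure probability of a series system increases monotonically with each component probability. The envelope rule for combining experts and all numerical values are assumptions made for illustration.

```python
from math import prod

def envelope(intervals):
    """Combine expert intervals conservatively: widest lower and upper bounds."""
    return (min(lo for lo, _ in intervals), max(hi for _, hi in intervals))

def series_failure_bounds(components):
    """Bounds on P(system fails) for independent components in series.

    The system fails if any component fails, so
    P_sys = 1 - prod(1 - p_i), which is increasing in every p_i.
    Lower bounds therefore map to the lower bound and upper to the upper.
    """
    lower = 1.0 - prod(1.0 - lo for lo, _ in components)
    upper = 1.0 - prod(1.0 - hi for _, hi in components)
    return (lower, upper)

# Hypothetical elicitation: two experts bound the same gearbox failure probability.
gearbox = envelope([(1e-3, 5e-3), (2e-3, 1e-2)])
# Hypothetical second component with its own interval.
system = series_failure_bounds([gearbox, (2e-3, 2e-2)])
```

For models that are not monotone in their inputs, endpoint propagation no longer gives valid bounds, which is where sampling or surrogate-based propagation becomes necessary.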

5.3. Dempster–Shafer evidence theory

The Dempster–Shafer Theory of Evidence (DSTE) is founded on two ideas: (i) obtaining degrees of belief for subjective probability estimates; and (ii) combining the degrees of belief within a probabilistic framework [131]. The DSTE provides an efficient framework for aggregating information from multiple sources, both qualitative and quantitative, where this aggregation is achieved through Dempster's combination rule [132]. In DSTE, estimates of the risk metrics of interest are bounded within the belief (lower bound) and plausibility (upper bound) functions, expressed for an event $A$ by the equation [131]:

$$\mathrm{Bel}(A) \le P(A) \le \mathrm{Pl}(A) \qquad (8)$$

The exact position where the metric (e.g. probability of failure) lies depends on the degree of evidence or information available at the time of analysis [131]. Hence, strong evidence would suggest a tendency towards the plausibility function (upper bound), while weak evidence would suggest the contrary, i.e. a tendency towards the belief function (lower bound). In the literature, the DSTE is discussed in Eldred et al. [115], where the authors use computational experiments to compare the DSTE and the Interval Valued Probability (IVP) methods. The IVP segregates aleatory and epistemic uncertainties, and allows nested operations to be performed [115]. Based on the experiments, the authors conclude that although the DSTE and IVP approaches produce comparable results, the DSTE is sensitive to the number of input variables. As such, the computational effort increases in tandem with the number of input variables. Helton and Johnson [126] also compare the DSTE, Interval Analysis and the Fuzzy methodology and conclude that the DSTE is attractive in that (i) it allows inclusion of more information compared to Interval Analysis; and (ii) it requires fewer assumptions for specifying input uncertainties as compared to both Interval Analysis and the Fuzzy set concept.

Although not applied within a dependability modelling formalism, Ding et al. [133] demonstrate how the DSTE may be applied for assessing the reliability of early fire detection systems by aggregating multi-sensor information, e.g. smoke and light sensor information. For technical systems, Agaram [134] reviews recent applications of DSTE approaches which embed concepts of information fusion for reliability analysis and fault diagnosis in the automotive industry. Notably, the review highlights the usefulness of the DSTE approach for early fault detection through combining multiple sources of evidence, including sensor data, e.g. vibration or ultrasound, and expert information on potential failure events. For dynamic fault tree analysis, Duan et al. [135] integrate an evidential information network in which component failure rates are expressed through interval number estimates, with epistemic uncertainties associated with the sparse failure data modelled via the DSTE concept. Inclusion of interval valued probabilities in fault tree modelling formalisms is also discussed in Toppila and Salo [136], where the authors caution of challenges in upscaling such formalisms for assessing the reliability of complex dependable systems. Zhang et al. [137] demonstrated how linguistic information may be incorporated into an evidential network which is based on the DSTE method and a Bayesian network formalism. Flage et al. [138] also apply an approach which synthesizes the DSTE and the fuzzy concept within a fault tree modelling formalism, for quantifying epistemic uncertainty of basic failure events of general systems. More recently, Giuseppe et al. [139] apply an approach which also synthesizes the DSTE and Interval-Valued Probability estimates elicited from domain experts, where similarly, their combined formalism is embedded in a fault tree modelling formalism and applied for assessing the reliability of systems with different configurations, i.e. parallel or series. In Bayesian network formalisms, authors such as Kabir et al. [140] apply the DSTE for fusing censored failure data with expert estimates, where their approach is applied for assessing the reliability of technical components of a water distribution system. Within Bayesian network modelling formalisms, DSTE is also discussed in [129,141–146]. Table 1 summarizes the main methods for treating uncertainty in dependability modelling approaches discussed in this review.

Table 1. Overview of methods for quantifying uncertainty in dependability modelling.

Methods                                             Literature                                No. of articles
Bayesian inference approaches
  Analytic approximation                            [87–91]                                   5 (8%)
  Data augmentation                                 [92–96]                                   6 (9%)
  Markov chain Monte Carlo                          [111,89,114,112,110,98–103,171–175]       16 (24%)
Approaches for quantifying epistemic uncertainty
  Theory of fuzzy sets                              [54,117,120,30,123,124,176–179]           12 (18%)
  Interval analysis                                 [125,127–130,139,180]                     9 (14%)
  Dempster-Shafer theory of belief                  [133–135,129,140–146,155,139,137,181]     18 (27%)
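Dempster's combination rule and the belief and plausibility bounds discussed in Section 5.3 can be sketched in a few lines. This is an illustrative sketch only: the two-element frame (failed, healthy), the function names, and the mass assignments for the two evidence sources are hypothetical.

```python
from itertools import product

FAILED, HEALTHY = "failed", "healthy"
FRAME = frozenset({FAILED, HEALTHY})  # mass on the whole frame expresses ignorance

def combine(m1, m2):
    """Dempster's rule: multiply masses, intersect focal sets, renormalise."""
    combined, conflict = {}, 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        common = a & b
        if common:
            combined[common] = combined.get(common, 0.0) + wa * wb
        else:
            conflict += wa * wb  # mass assigned to contradictory evidence
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    return {s: w / (1.0 - conflict) for s, w in combined.items()}

def belief(m, event):
    """Lower bound: total mass of focal sets contained in the event."""
    return sum(w for s, w in m.items() if s <= event)

def plausibility(m, event):
    """Upper bound: total mass of focal sets that intersect the event."""
    return sum(w for s, w in m.items() if s & event)

# Hypothetical evidence: a vibration sensor and an expert judgement.
sensor = {frozenset({FAILED}): 0.6, FRAME: 0.4}
expert = {frozenset({FAILED}): 0.3, frozenset({HEALTHY}): 0.2, FRAME: 0.5}
fused = combine(sensor, expert)
event = frozenset({FAILED})
bel, pl = belief(fused, event), plausibility(fused, event)
```

For these hypothetical masses, the failure probability is bounded by `bel` and `pl`; the gap between them is the mass that the fused evidence leaves unassigned between the two states.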

6. Discussion

6.1. General insights, and implications of the review for research and practice

This review offers important insights for decision support in risk assessment, and more specifically, dependability analysis in maintenance decision making. In particular, such insights could assist risk analysts and maintenance practitioners to assess equipment failure risks more robustly, and consequently, formulate effective maintenance strategies that mitigate the effects of equipment failures. As depicted in Fig. 4, performing risk assessment is an especially important consideration for formulating maintenance strategies for safety-critical systems such as nuclear power generation facilities, railway systems, and chemical process facilities. For such facilities, sub-optimal risk assessment may result in failure events leading to catastrophic accidents, for instance, the Bhopal disaster, or more recently, the Deepwater Horizon spill event in the Gulf of Mexico [147,148]. By structuring knowledge on dependability modelling, risk assessment, and maintenance decision making, it is expected that risk analysts and maintenance practitioners will better assess the relevance and applicability of different dependability modelling methods.

From the review, significant research is seemingly directed towards more versatile dependability modelling methods such as dynamic fault trees, dynamic Bayesian networks, hybrid fault trees/Bayesian networks, and stochastic Petri-nets, which overall account for 69% of the reviewed methods (see Fig. 3). Nonetheless, static dependability approaches such as fault trees and Bayesian networks constitute a noticeable proportion of the reviewed approaches (31% of reviewed methods), which may be attributed to the intuitiveness of the methods to analysts and practitioners. This contrasts with dynamic dependability modelling methods, where equipment failure probabilities are primarily resolved through Markov models and Monte Carlo simulation approaches. However, apart from ignoring temporal aspects, the static fault tree is still limited in the extent to which basic failure events are modelled through varying empirically derived distribution functions, for instance, Weibull or Lognormal functions. Largely, in the reviewed methods, basic events are assumed to be exponentially distributed, an assumption adopted for modelling simplicity. Although empirically derived distributions would ideally mimic failure models expected in real life, incorporating such distributions within the reviewed formalisms is not straightforward, and adds to the resolution complexity of the methods. This challenge is particularly apparent for static and dynamic fault trees, as discussed in Gharahasanlou et al. [16].

As regards user intuitiveness, integrated formalisms are seemingly attractive owing to the trade-off between intuitiveness and modelling complexity, especially when temporal aspects are considered. This is where fault tree formalisms are translated to equivalent Bayesian networks, which seem to cope better with uncertainties associated with sparse reliability data, or qualitative aspects such as operational risks or human-related factors, as seen in studies, e.g. Dongiovanni and Iesmantas [18]. However, it should be mentioned that despite the modelling versatility introduced by the integrated formalisms, incorporating maintenance policies within the formalisms is seemingly a challenge. Effort in this direction is discussed for dynamic fault trees, and in particular, repairable dynamic fault trees suggested by authors e.g. Manno et al. [27]. However, the repairable fault tree as discussed excludes alternative maintenance policies such as optimized maintenance planning or condition-based maintenance. This omission also extends to incorporating prognostic information, such as inclusion of the remaining useful life in dependability modelling formalisms. Although recent studies consider this aspect, it is nonetheless noted as an important gap which could be further explored.

For static and dynamic Bayesian networks, which constitute 35% of the reviewed approaches, an important trend towards more flexible modelling formalisms is seen. Importantly, apart from incorporating temporal aspects, Bayesian networks offer the advantage of updating risk metrics with the emergence of new failure information. The formalism also seems robust for incorporating qualitative information, such as human-related maintenance errors. Such human aspects are often difficult to quantify, yet are important contributors to equipment failures and accident events in safety-critical assets. Important human-related performance shaping factors contributing to maintenance-related errors include fatigue, skill level, or incorrect repair procedures. Although inclusion of human factor aspects in Bayesian network modelling formalisms is discussed, this is seemingly limited to safety and accident analysis, for instance, as discussed in Akhtar and Utne [149] and Calviño, Grande [150]. However, one important challenge of incorporating human factors within dependability modelling formalisms is the difficulty of quantifying the probability of errors linked to performance shaping factors associated with human errors. Quantifying such errors requires use of scenario analysis, where propagation of human errors to potential maintenance errors is evaluated. Bayesian network formalisms are limited in this regard. Noroozi et al. [151] propose an alternative approach where Event trees are applied for scenario analysis, and for quantifying the impact of human errors on equipment maintenance.

From the review, the important role of Bayesian network formalisms for rare event analysis is also discussed. In particular, the data augmentation approach seems attractive for decision support in maintenance since, often, availability of sufficient data for failure modelling is an important challenge. This is especially the case for high-reliability and safety-critical systems depicted in Fig. 4. A trend in this direction is discussed in studies, for instance, in [152,153]. An important concern for rare event analysis, however, is the validation of formalisms integrating such analysis. This is an important challenge necessitating future work in this direction.


To address some of the validation concerns for rare failure events, the Markov chain Monte Carlo (McMC) simulation approach is discussed where, apart from efficiently resolving complex posterior distributions, the approach addresses validity concerns for sparse data sets. This is achieved partly through computing the Deviance Information Criterion (DIC), which is embedded in software applications such as BUGS (Bayesian inference Using Gibbs Sampling). Although McMC is a useful resolution approach for posterior distributions and for addressing model validity concerns, its usage is limited to fairly simple systems with straightforward dependencies. Extending McMC for modelling more complex dependencies such as maintenance policies and human-related maintenance errors is an interesting direction for future work.

Although demonstrated as applicable for rare event analysis, the Stochastic Petri-net applies enabling rules within a simulation modelling framework, which also introduces model validity concerns, for instance, as discussed in Paolieri et al. [154]. Moreover, the formalism may not be as intuitive to maintenance practitioners as methods such as fault trees or Bayesian network formalisms, hence its seemingly low proportion compared to other reviewed dependability modelling approaches.

For quantifying epistemic uncertainty, the integration of fuzzy and DSTE concepts within dependability modelling formalisms, such as static and dynamic fault trees, is an interesting observation (45% of uncertainty quantification methods, see Table 1). This is because, in the absence of sufficient data for modelling basic failure events, eliciting fuzzy estimates from domain experts is an intuitive approach for addressing data availability challenges. However, the fuzzy concept raises model validity concerns, which are partly addressed by Bayesian updating. The DSTE method also provides a useful platform for augmenting sparse reliability information with expert estimates, as discussed in studies, e.g. Khalaj et al. [155] and Flage et al. [138]. In particular, the DSTE integrates a useful data fusion framework which allows synthesis of maintenance-related information from multiple sources, e.g. condition monitoring sensor data such as vibration and ultrasound. The fusion further extends to integrating information elicited from domain experts within the modelling formalism.

Other plausible approaches may include alternative formalisms, such as the use of dynamic event trees (DET) for instances where information on fault incidences modelled via fault trees is limited. This approach is discussed in Ibánez et al. [158], where they argue that the DET formalism avoids the need for exploring all potential system failure configurations or dependencies. A similar trend towards using the DET modelling formalism is also seen in Karanki et al. [159], where uncertainties associated with stochastic failure probabilities and modelling parameters are incorporated within DETs. For integrating sparse information into dependability formalisms, information fusion architectures are suggested. For instance, Guo et al. [160] propose an approach where information from both expert and data sources is integrated via a Bayesian inferencing framework. Their approach importantly uses linear and geometric pooling methods, hence allowing importance weights to be assigned to the prior failure information. This diversifies the characteristics of possible priors that may be integrated in the Bayesian inferencing framework. A Naive Bayes approach for handling missing or unsynchronized data is also proposed recently in Dabrowski et al. [161], and integrated in a dynamic Bayesian network modelling formalism. Hence, such recent formalisms indicate an interesting trend towards more data-driven dependability modelling approaches. For rare failure analysis, a notable constraint is the reliance on both numerical reliability data and expert analysis, which necessitates numerous modelling assumptions for augmenting subjective estimates. To mitigate the impact of such assumptions, authors such as Khorsandi and Aven [162] propose inclusion of the 'assumption deviation risk' for mitigating modelling uncertainties. Inclusion of such aspects in dependability modelling may further enhance treatment of uncertainty, and is hence an interesting area of future work.
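The linear and geometric pooling mentioned above in connection with Guo et al. [160] can be illustrated on point estimates. This is a simplified sketch and not the method of [160], which pools prior information within a Bayesian framework: here each expert supplies a single failure probability, and the weights and values are hypothetical.

```python
import math

def _check(probs, weights):
    """Validate inputs: matching lengths and importance weights summing to 1."""
    if len(probs) != len(weights):
        raise ValueError("one weight per expert estimate is required")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("importance weights must sum to 1")

def linear_pool(probs, weights):
    """Weighted arithmetic mean of the expert failure probabilities."""
    _check(probs, weights)
    return sum(w * p for p, w in zip(probs, weights))

def geometric_pool(probs, weights):
    """Weighted geometric mean, renormalised over {failure, no failure}."""
    _check(probs, weights)
    fail = math.prod(p ** w for p, w in zip(probs, weights))
    ok = math.prod((1.0 - p) ** w for p, w in zip(probs, weights))
    return fail / (fail + ok)

# Hypothetical elicitation: two experts, equally weighted.
experts = [0.01, 0.05]
weights = [0.5, 0.5]
p_linear = linear_pool(experts, weights)
p_geometric = geometric_pool(experts, weights)
```

The two rules agree when the experts agree and diverge otherwise, with the geometric pool pulled more strongly towards the lower estimate in this example. Changing the weights shows how importance assigned to each source shifts the pooled value.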
Combined formalisms, such as generalized stochastic Petri-nets integrated with fault trees, have also been demonstrated for rare failure/accident analysis. Talebberrouane, Khan [80] demonstrate that such formalisms provide more information on fault occurrences at different operational states and dependability sequences, and may consider alternative maintenance and repair strategies. A similar Petri-net/fault tree formalism is also discussed recently in Yan et al. [163] for assessing the reliability of complex automated guided vehicle systems while considering optimal inspection and maintenance timings. Data-driven machine learning approaches and the DSTE method also seem to provide a plausible data fusion platform. For instance, integrating methods such as the least squares support vector machine (SVM) in dependability modelling is a widely discussed approach for diagnosing faults of technical assets, e.g. see [164,165].
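The linear and geometric pooling of priors mentioned above can be sketched as follows. This is an illustrative implementation of the two generic pooling operators only, not the procedure of Guo et al. [160]. The three failure-rate levels, the two prior distributions, and the importance weights are hypothetical, and the weights are assumed to sum to 1.

```python
import math


def linear_pool(dists, weights):
    """Weighted arithmetic mean of discrete distributions (linear pooling)."""
    n = len(dists[0])
    return [sum(w * d[i] for d, w in zip(dists, weights)) for i in range(n)]


def geometric_pool(dists, weights):
    """Weighted geometric mean of discrete distributions, renormalised to sum to 1."""
    n = len(dists[0])
    raw = [math.prod(d[i] ** w for d, w in zip(dists, weights)) for i in range(n)]
    total = sum(raw)
    return [r / total for r in raw]


# Hypothetical priors over three failure-rate levels: low, medium, high.
expert_prior = [0.7, 0.2, 0.1]  # elicited from a domain expert
data_prior = [0.4, 0.4, 0.2]    # estimated from sparse failure records
weights = [0.3, 0.7]            # importance weights, assumed to sum to 1

pooled_linear = linear_pool([expert_prior, data_prior], weights)
pooled_geometric = geometric_pool([expert_prior, data_prior], weights)
print([round(p, 4) for p in pooled_linear])
print([round(p, 4) for p in pooled_geometric])
```

One practical difference between the two operators is that geometric pooling assigns zero probability to any level that a single source rules out, whereas linear pooling keeps that level possible as long as one weighted source supports it.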

6.2. General directions for future work

From the above discussion, dependability modelling formalisms present interesting prospects for future research within the maintenance decision making domain. Firstly, there is a need to extend the modelling flexibility of fault tree and Bayesian network formalisms such that empirical failure models are integrated in the formalisms. This deviates from the traditional assumption in dynamic fault trees, where basic failure events are assumed to be exponentially distributed. Incorporating such empirically derived failure models may pave the way for more flexible formalisms in which the reliability of complex electromechanical systems, such as collaborative robots, is more practically assessed. Often, such robot systems comprise components exhibiting varying failure mechanisms, such as random failures (electronic components) or Weibull or Gamma distributed failures (mechanical systems) [156]. Secondly, mapping failure dependencies objectively in the formalisms discussed in this review is challenging. Often, the failure dependencies are mapped qualitatively, either based on expert knowledge of associations between failure mechanisms, or based on the system configuration. The latter considers how components are interconnected, and presumes that failure dependencies are aligned to the system configuration. Data exploration methods combined with data fusion approaches may provide a plausible platform for objectively mapping dependencies between failure events, as discussed, for instance, in Chemweno et al. [3]. In particular, information fusion may allow synthesizing data from systems of similar configuration or design. This approach is discussed in Raz et al. [157], where an Information Fusion System architecture is suggested.

Thirdly, the combinatorial explosion problem remains an important challenge for upscaling the graphical-oriented dependability methods discussed in this review, i.e. fault trees, stochastic Petri-nets, and Bayesian networks. This is especially a challenge for modelling systems with complex dependencies owing to multiple interconnected components exhibiting varying failure mechanisms. Although object-oriented modelling approaches try to address this concern by modularizing complex dependability formalisms, the decomposition limits the extent to which reliability and maintenance-related aspects are integrated into such formalisms. Invariably, this limits the robustness of the risk assessment process and of maintenance decision making, the latter linked to selecting optimal maintenance strategies. Hence, exploring more efficient decomposition schemes forms an interesting prospect for future work. In addition to decomposition schemes, applying more efficient algorithms for reducing the storage necessary for constructing modular schemes such as Bayesian networks may assist in upscaling dependability models. Recent work in this direction is discussed in Tien and Der Kiureghian [166].
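The first direction above, relaxing the exponential assumption for basic failure events, can be illustrated with a small sketch that contrasts a constant-hazard (exponential) component with a Weibull-distributed one. The component names, parameter values, and the series arrangement with independent failures are assumptions made purely for illustration; they are not drawn from [156].

```python
import math


def weibull_reliability(t, shape, scale):
    """Survival probability R(t) = exp(-(t / scale) ** shape)."""
    return math.exp(-((t / scale) ** shape))


def weibull_hazard(t, shape, scale):
    """Hazard rate h(t) = (shape / scale) * (t / scale) ** (shape - 1), for t > 0."""
    return (shape / scale) * (t / scale) ** (shape - 1)


def exponential_reliability(t, rate):
    """The exponential model is the Weibull special case with shape = 1."""
    return weibull_reliability(t, shape=1.0, scale=1.0 / rate)


# Hypothetical parameters (time in operating hours).
CONTROLLER_RATE = 1e-4                       # electronic part: random failures
GEARBOX_SHAPE, GEARBOX_SCALE = 2.5, 8000.0   # mechanical part: wear-out failures

for t in (500.0, 2000.0, 6000.0):
    r_controller = exponential_reliability(t, CONTROLLER_RATE)
    r_gearbox = weibull_reliability(t, GEARBOX_SHAPE, GEARBOX_SCALE)
    # Series system: both components must survive; independence is assumed.
    r_system = r_controller * r_gearbox
    h_gearbox = weibull_hazard(t, GEARBOX_SHAPE, GEARBOX_SCALE)
    print(f"t={t:6.0f} h  R_system={r_system:.4f}  gearbox hazard={h_gearbox:.2e}")
```

With a shape parameter above 1 the gearbox hazard grows with operating time, a behaviour that a purely exponential basic event cannot represent.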

Alternative integrated formalisms may also allow upscaling of dependability models and overcome the limitation of traditional approaches, which so far focus on simple systems with limited dependencies. Recent application of continuous-time Markov chains seems promising in this regard, as proposed, for instance, in Liang et al. [167]. Func-


[172] Zhou D. The Application of Bayesian Networks in System Reliability. Arizona State University; 2014.
[173] Roy A, Srivastava P, Sinha S. Risk and reliability assessment in chemical process industries using Bayesian methods. Rev Chem Eng 2014;30:479–99.
[174] Vergé C, Morio J, Del Moral P. An island particle algorithm for rare event analysis. Reliab Eng Syst Saf 2016;149:63–75.
[175] Pan Z, Balakrishnan N. Reliability modeling of degradation of products with multiple performance characteristics based on gamma processes. Reliab Eng Syst Saf 2011;96:949–57.
[176] Jee TL, Tay KM, Lim CP. A new two-stage fuzzy inference system-based approach to prioritize failures in failure mode and effect analysis. IEEE Trans Reliab 2015;64:869–77.
[177] Liu H-C, You J-X, Duan C-Y. An integrated approach for failure mode and effect analysis under interval-valued intuitionistic fuzzy environment. Int J Prod Econ 2017 In Press.
[178] Duan R, Fan J. Dynamic diagnosis strategy for redundant systems based on reliability analysis and sensors under epistemic uncertainty. J Sens 2015;1–14.
[179] Abdo H, Flaus J. Monte Carlo simulation to solve fuzzy dynamic fault tree. IFAC-PapersOnLine 2016;49:1886–91.
[180] Eldred MS, Swiler LP, Tang G. Mixed aleatory-epistemic uncertainty quantification with stochastic expansions and optimization-based interval estimation. Reliab Eng Syst Saf 2011;96:1092–113.
[181] Helton JC, Johnson JD. Quantification of margins and uncertainties: alternative representations of epistemic uncertainty. Reliab Eng Syst Saf 2011;96:1034–52.
