Reliability Engineering and System Safety 173 (2018) 64–77
Reliability Engineering and System Safety journal homepage: www.elsevier.com/locate/ress
Risk assessment methodologies in maintenance decision making: A review of dependability modelling approaches
Peter Chemweno a,∗, Liliane Pintelon a, Peter Nganga Muchiri b, Adriaan Van Horenbeek a
a Center for Industrial Management, KU Leuven, Celestijnenlaan 300A, BE-3001 Heverlee, Belgium
b School of Engineering, Dedan Kimathi University of Technology, P.O. Box 657-10100, Nyeri, Kenya
Keywords: Asset failure; Risk assessment; Dependability modelling; Uncertainty; Maintenance decision making
The risk assessment process performs an important role in maintenance decision making, through structuring the process of identifying, prioritizing, and thereafter formulating effective maintenance strategies. However, the effectiveness of the implemented strategies is influenced by the extent to which asset failure dependencies are taken into account during the risk assessment process. In the literature, several risk assessment methods are discussed that vary widely depending on factors such as modelling of failure dependencies in dynamic assets, and treating uncertainties associated with sparse reliability data. These factors invariably influence the extent to which different risk assessment methods are applicable for maintenance decision making. This article reviews the state-of-the-art knowledge on risk assessment in the context of maintenance decision making, with a particular focus on dependability modelling methods. The review structures knowledge on dependability modelling approaches, treatment of uncertainty, and highlights important challenges researchers and practitioners are likely to experience when performing risk assessment in the context of maintenance decision making. The challenges highlighted include the resolution complexity of methods such as Bayesian networks, especially while assessing risks of assets with complex failure dependencies. © 2018 Elsevier Ltd. All rights reserved.
1. Introduction
In recent years, a wide range of methods have been developed and applied for assessing risks and safety hazards in diverse sectors such as process industries, or power plant facilities [1]. In the maintenance decision making domain, risk assessment is performed with a view of assisting practitioners to systematically identify, analyse, evaluate, and mitigate failure risks in assets [2,3]. The most commonly applied methods in this context include the Failure Mode and Effect Analysis (FMEA), the Fault Tree Analysis (FTA) and Bayesian network (BN). Of these, the FMEA is widely used for prioritizing equipment failures and selecting appropriate maintenance strategies [4]. However, the FMEA is associated with important deficiencies, in particular regarding the conventional form of the risk priority number (RPN), an important metric for quantifying asset failure risk [5,6]. In addition, the FMEA ignores failure dependencies in assets, which in turn negatively influences the risk assessment process [5].
In the literature, several state-of-the-art reviews of risk assessment methods are presented. Examples include Li [7], where methods such as Markov models and Monte Carlo simulation are discussed in the context of assessing risks of failure of power utility systems. The reviewed methods, however, insufficiently addressed dependability modelling aspects. In the context of maintenance decision making, Fraser et al. [8] reviewed methods for assessing equipment failure risks that are useful for deriving maintenance decisions. Notably, the methods are evaluated considering two maintenance concepts: Risk Based Maintenance (RBM) and Reliability Centered Maintenance (RCM). The RCM embeds the FMEA which, as mentioned, ignores failure dependency modelling aspects. On the other hand, the RBM approach embeds fault trees which, although they model asset failure dependencies, ignore temporal aspects that are crucial for effective risk assessment and optimal maintenance planning. More recently, Aven [9] reviews trends and advances of risk assessment methods, where he evaluates foundational challenges associated with applicability of different methods for decision making. This includes aspects such as treatment of uncertainty; however, failure dependability modelling aspects are not explicitly addressed in the review. Smith [10] also reviews methods applicable for quantifying risks of operable assets characterized with sub-optimal reliability and availability. Examples of methods reviewed include Hazard and Operability Analysis (HAZOP), and the Fault Tree Analysis (FTA). However, suitability of these methods for failure dependability modelling and maintenance decision support is not sufficiently addressed. Modarres, Zhou et al. [11] evaluate advances in probabilistic risk assessment of safety-critical installations, where the importance of methods such as fault trees and Bayesian belief networks is highlighted for modelling failure dependencies. Similarly, suitability of the reviewed approaches for maintenance decision making is not clearly addressed. A review of fault tree analysis and its application for modelling failure dependencies in complex assets is presented in Kabir [12]; likewise, applicability for maintenance decision making is not clearly discussed.
Evaluating the above reviews highlights several limitations or gaps which motivate this review article. Firstly, the reviews tend to focus on specific application contexts such as safety or risk assessment in process industries. However, since risks are domain specific, application of specific risk assessment methods varies depending on the application context [13]. For instance, risks in civil engineering structures such as bridge collapse are rare and periodic, unlike technical failures of mechanical systems, which occur more frequently over the operational lifetime of the equipment, e.g. bearing wear. Secondly, the reviews insufficiently evaluate the suitability of the reviewed risk assessment methods for failure dependability modelling, especially in the context of maintenance decision making. The decision making aspects may include aiding root cause analysis, or selecting appropriate maintenance strategies.
Hence, this article attempts to bridge the aforementioned gaps by reviewing risk assessment methods discussed in the literature, while focusing on their applicability for maintenance decision support in view of modelling failure dependencies in assets. The review also evaluates how the methods address aspects such as treatment of uncertainty, which in maintenance decision making, is associated with availability and sufficiency of maintenance data.
Fig. 1 illustrates the organization of this review. Section 2 reviews dependability modelling concepts where methods such as Fault trees, Bayesian networks, and Stochastic Petri-nets are evaluated. Section 3 reviews concepts for treating aleatory and epistemic uncertainty while Section 4 reviews different Bayesian inferencing methods associated with Bayesian networks. Examples here include methods such as analytic approximation, data augmentation, and Markov chain Monte Carlo simulation. Section 5 reviews methods for quantifying epistemic uncertainties in the context of dependability modelling where methods such as Fuzzy theory, Interval analysis, and the Dempster-Shafer Theory of Belief (DSTE) are discussed. Section 6 discusses the implications of the review for theory and practice, and further points out directions for future research. Section 7 draws important conclusions.

∗ Corresponding author.
E-mail addresses: [email protected] (P. Chemweno), [email protected] (L. Pintelon), [email protected] (P.N. Muchiri), [email protected] (A. Van Horenbeek).
https://doi.org/10.1016/j.ress.2018.01.011
Received 30 June 2016; Received in revised form 6 January 2018; Accepted 20 January 2018; Available online 6 February 2018
0951-8320/© 2018 Elsevier Ltd. All rights reserved.

Abbreviations
AHP         Analytic Hierarchy Process
ANP         Analytic Network Process
AND         AND gate for the static fault tree
BE          Basic Event
BN          Bayesian Networks
BUGS        Bayesian Inference Using Gibbs Sampling
CBM         Condition Based Maintenance
CMMS        Computerized Maintenance Management System
DAG         Directed Acyclic Graph
Dynamic BN  Dynamic Bayesian Network
DIC         Deviance Information Criterion
DSTE        Dempster-Shafer Theory of Evidence
E-M         Expectation-Maximization Algorithm
FMEA        Failure Mode and Effect Analysis
FTA         Fault Tree Analysis
HAZOP       Hazard and Operability Analysis
IVP         Interval-Valued Probability
McMC        Markov Chain Monte Carlo
MCDM        Multi-Criteria Decision Making
M-H         Metropolis-Hastings Algorithm
OR          OR gate for the static fault tree
PAND        Priority AND Gate
RBD         Reliability Block Diagrams
RBIM        Risk-Based Inspection and Maintenance
RCA         Root Cause Analysis
RCM         Reliability Centered Maintenance
RPN         Risk Priority Number
SPARE       SPARE gate for the dynamic fault tree
SPN         Stochastic Petri-net
TE          Top Event
VOTING      VOTING gate for the dynamic fault tree

2. Dependability modelling in risk assessment
Technical assets are usually characterized by complex dependencies between system components, which in turn influence the extent to which asset failure risks are assessed, and maintenance decisions reached [14]. In the absence of system dependencies, the risk assessment problem reduces to a single component analysis where failure events are assumed to be independent. For complex system dependencies, Weber et al. [15] suggest that dependability modelling should consider the following aspects:
• Complexity and system size,
• Inclusion of temporal aspects and failure propagation in specific time instances,
• Inclusion of empirical and/or qualitative knowledge on failure events at different abstraction levels,
• Inclusion of failure dependencies and treating uncertainties related to data availability, and estimation of model parameters.
Weber et al. [15] further describe several examples of dependability modelling methods, which include among others:
• Fault trees, further classified into Static and Dynamic fault trees;
• Bayesian networks, classified into Static and Dynamic Bayesian networks;
• Combined Fault trees and Bayesian network models, and
• Stochastic Petri-nets.
The following sections review the suitability of the above-mentioned methods for assessing asset failure risks in the context of dependability modelling and maintenance decision support.
2.1. Fault trees
Primarily, the fault tree models failure dependencies in a hierarchical form, with a top failure event (TE) at the system level, intermediate failure events (IE) at the sub-system levels, and basic failure events (BE) at the component level. The dependencies are modelled through logical AND and OR gates. Assuming failure events are statistically independent, the probability of occurrence of the TE modelled through the AND gate is expressed as follows:
$$P(TE) = \prod_{i=1}^{n} P(BE_i) \tag{1}$$
The OR gate, on the other hand, presumes occurrence of at least one of the input failure events prior to observing the TE. The probability of occurrence of the TE is hence expressed as the sum of input probabilities of independent BE (an approximation that holds when the individual probabilities are small), denoted as:
$$P(TE) = \sum_{i=1}^{n} P(BE_i) \tag{2}$$
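As an illustration of Eqs. (1) and (2), the following minimal Python sketch evaluates a small static fault tree under the independence assumption stated above. The component names and probability values are hypothetical and chosen only for illustration. The function `or_gate_exact` is added for comparison and is not part of the reviewed formulation: it computes 1 − ∏(1 − P(BE_i)), which the sum in Eq. (2) approximates when the input probabilities are small.

```python
import math
from typing import Sequence


def and_gate(probs: Sequence[float]) -> float:
    """Eq. (1): all independent input events must occur."""
    return math.prod(probs)


def or_gate_sum(probs: Sequence[float]) -> float:
    """Eq. (2): sum of the input probabilities (small-probability approximation)."""
    return sum(probs)


def or_gate_exact(probs: Sequence[float]) -> float:
    """Exact OR gate for independent inputs: 1 - prod(1 - p_i)."""
    return 1.0 - math.prod(1.0 - p for p in probs)


# Hypothetical basic-event probabilities (illustrative values only).
pump_a, pump_b, valve = 0.02, 0.03, 0.001

# Intermediate event: both redundant pumps fail (AND gate).
pumps_fail = and_gate([pump_a, pump_b])

# Top event: the pump pair fails OR the valve fails.
top_sum = or_gate_sum([pumps_fail, valve])
top_exact = or_gate_exact([pumps_fail, valve])
```

For small input probabilities the two OR-gate results nearly coincide, with the sum acting as a slightly conservative upper bound.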
[Fig. 1: block diagram. Dependability approaches: fault trees (static and dynamic), hybrid fault tree and Bayesian network, Bayesian networks (static and dynamic), and Stochastic Petri-nets. Treatment of uncertainty: probability theory, fuzzy theory, interval analysis, Dempster-Shafer theory of belief, analytic approximation approach, data augmentation approach, Markov chain Monte Carlo simulation approach, and simulation approach.]
Fig. 1. Framework for the review.
Depending on the inclusion of temporal aspects, the gates may be static or dynamic. In maintenance decision making, the static fault tree is embedded in the risk based maintenance concept, for which several examples are discussed in the literature; see, for instance, [16–18]. Some authors, for instance Wu [19], propose a formalism which integrates well-known methods such as the FMEA for modelling failure events. Such integrated formalisms are rather intuitive to users, since resolving equipment failure probabilities is computationally feasible as compared to the dependability modelling methods discussed in later sections of this review. Bhangu et al. [20] propose a static fault tree formalism for assessing the reliability and failure risks of a thermal power plant installation, where their approach relies on fault data and associated outage hours. Their study suggests alternative maintenance policies for optimizing power plant availability. Choi and Chang [21] also apply the fault tree formalism for assessing the reliability of seabed storage tanks, where their approach relies on reliability data for modelling basic fault events. They also suggest alternative repair strategies for optimizing system availability. Taheriyoun and Moradinejad [22] integrate a Monte Carlo simulation approach into a fault tree formalism and apply the approach for modelling failure dependencies of water treatment equipment. Their approach considers human factor aspects as contributors to top event failures. McNelles et al. [23] compare static fault tree formalisms with the dynamic flow graph formalism, the latter for modelling temporal dependencies. They highlight the challenge of resolving cut-sets for static fault trees, especially for systems characterized with dynamic time steps.
Furthermore, reliability data is often an important pre-requisite for modelling static dependencies in technical assets. To cope with sparse reliability data, static fuzzy fault trees are suggested and described in several application cases; these are discussed in more detail in Section 5.
Nonetheless, although considered intuitive for modelling failure dependencies in technical assets, static fault trees are associated with important deficiencies that are primarily linked to the inclusion of temporal aspects inherent in dynamic systems. For this reason, dynamic fault trees are proposed where dynamic gates are incorporated. In the literature, different dynamic logical gates are proposed [24]:
• Priority AND (PAND) gate, which models the sequence in which dependent failures occur once a failure event is initiated;
• Functional dependency (FD) gate, which models instances where the trigger failure event simultaneously leads to failure of dependent systems;
• SPARE gate, which models the failure events of redundant components;
• VOTING gate, which models a failure instance where at least k out of n dependent components/events occur.
The use of dynamic fault trees for maintenance decision support is discussed in the literature. Notably, Ge and Yang [25] propose a modelling formalism based on dynamic binary decision trees, where their methodology adapts Shannon's decomposition theorem, which scales down the number of disjoint calculable cut sets, efficiently resolving dynamic gates. Wang et al. [26] propose a dynamic fault tree formalism for assessing the reliability of non-repairable systems. Their formalism considers the impact of probabilistic failure dependencies on critical system components. Manno et al. [27] introduce a novel formalism, which they define as the Adaptive Transitions Systems. Their proposed formalism embeds efficient semantics for modelling failure dependencies of repairable systems.
More recently, Chiacchio et al. [28] propose a dynamic fault tree formalism which incorporates deterministic and stochastic dependencies influencing complex non-repairable systems. Their formalism incorporates hybrid basic failure events, whose failure distribution evolves with time. Salehpour-Oskouei and Pourgol-Mohammad [29] propose a formalism exploiting the Priority AND gate for assessing the reliability of sensor components attached to equipment for collecting health data. Their formalism exploits a Monte Carlo simulation approach for quantifying the probability of the top event failure of a steam turbine system. For sparse reliability data, Tu et al. [30] propose a novel fuzzy dynamic tree formalism for modelling the reliability of safety-critical avionic components. Their formalism models uncertainties associated with sparse failure events, which are assigned fuzzy valued estimates. Volk et al. [31] propose a novel formalism which exploits integrated state-space reduction methods for efficiently resolving dynamic gates. The methods integrated in their formalism include Markov chains, which are applied for resolving the mean time to failures of complex dynamic systems. Additional formalisms applying sequential binary decision diagrams and timed dynamic fault tree analysis, the latter a variation of the conventional dynamic fault tree analysis, are discussed in the literature; see, for instance, Peng et al. [32], Ge et al. [33] and Ge et al. [34].
However, it is important to note that in the aforementioned studies, dynamic gates are resolved largely analytically, i.e. through sequence algebra or Markov models. Often, these resolution approaches are computationally intensive, especially for systems with complex dynamic dependencies. Moreover, Markov models are further associated with deficiencies such as (i) the state space explosion problem, and (ii) being limited to modelling dynamic dependencies defined through exponential distribution functions.
Hence, to overcome challenges such as the state explosion problem, approximate or simulation resolution approaches are proposed, for instance, Monte Carlo simulation and Stochastic Petri-nets. Simeu-Abazi et al. [35] propose an approach where a modularized fault tree scheme is translated into equivalent Petri-nets, hence enhancing the modelling flexibility of systems with complex dependencies, of which dynamic gates are resolved via Markov models. Codetta Raiteri [36] further extends this versatility for complex systems by proposing a framework integrating three formalisms: parametric fault tree, dynamic fault tree, and repairable fault tree. The parametric fault tree here models dependencies of repairable systems. Flammini et al. [37] also propose a multi-formalism modular approach, which incorporates generalized Stochastic Petri-nets, fault trees, and repairable fault trees. Their formalism is applied for assessing the reliability of railway signalling systems. Turan et al. [38] propose a dynamic fault tree formalism for assessing the reliability of a maritime diving support vessel.
Their formalism incorporates time-dependent dynamic gates for modelling failure dependencies, through which appropriate maintenance and/or repair sequences are proposed. More recently, Rauzy and Blériot-Fabre [39] propose a formalism through which dynamic fault trees are translated into equivalent guarded transition systems, the latter a form of generalized stochastic Petri-nets. Their formalism models dependencies of repairable systems, a challenge noted for systems modelled through dynamic fault trees.
Several studies also propose efficient approaches for resolving dynamic gates modelled through Markov models. Notably, Chiacchio et al. [40] propose a Markov-based stochastic approach which is applied for assessing the reliability of complex multi-state dynamic systems. Their formalism considers the influence of operation and environmental conditions on system failure. Yevkin [41] proposes an efficient Markov modelling approach which is applied for resolving dynamic dependencies of repairable and non-repairable systems. Their approach translates dynamic gates into equivalent Markov models such that the number of transition states is minimized. Merle et al. [42] propose a Monte Carlo simulation approach, which enhances the resolution efficiency of complex dynamic fault trees otherwise modelled through Markov models. Chiacchio et al. [43] proposed a novel Monte Carlo simulation-based tool, the MatCarloRe, for resolving the reliability of systems modelled through hierarchical dynamic fault trees, and characterized with non-repairable basic failure events. More recently, Zhu et al. [44] propose an alternative stochastic approach for modelling dependencies in dynamic fault trees while considering system redundancies and probabilistic common cause failures. Their approach applies a non-Bernoulli sequencing approach for generating input values to the stochastic model.
Apart from approximate resolution approaches, several studies incorporate both exact and approximate (or simulation) approaches within the same modelling formalism. Examples include Chiacchio et al. [45], who compare Markov models and Monte Carlo simulation approaches for resolving dynamic gates. They conclude that the choice between the two resolution approaches is a trade-off between system complexity and computational efficiency of the specific resolution approach. Lindhe et al. [46] also apply both exact and approximate resolution approaches within the same dynamic fault tree formalism, for assessing maintenance-related risks of water supply systems. More recently, Nguyen et al. [47] apply a combined approach which embeds a stochastic Petri-net approximate resolution method. They apply their formalism for modelling repairable systems characterized with multi-state failure mechanisms.
From the above, approximate (simulation) resolution approaches seemingly reduce the computational effort necessary for resolving dynamic gates for systems with complex failure dependencies. However, the reliance on empirical data for fault tree dependability modelling formalisms is seemingly a challenge, especially where such data is unavailable. In addition, fault tree formalisms are limited to systems with fairly simple and straightforward dependencies. This is because of the combinatorial explosion problem for systems with more complex dependencies. Lastly, risk metrics remain static despite the emergence of new evidence; hence, more versatile modelling formalisms incorporating Bayesian updating are suggested.
2.2. Bayesian networks
Bayesian networks model system failure dependencies by incorporating an efficient probabilistic inferencing framework which allows inclusion of uncertainty associated with sparse reliability information [48]. Typically, the network consists of a directed acyclic graph (DAG) which contains a set of nodes and directed arcs as depicted in Fig. 2. Each node in the graph represents random (and independent) failure events $Y = (y_1, y_2, y_3, \ldots, y_n)$, while the directed arcs represent probabilistic dependencies, e.g. between random failure events [49]. In the Bayesian network, the conditional probabilities between random failure events are represented through a joint probability distribution parameterized as follows:
$$p(y_1, y_2, y_3, \ldots, y_n) = \prod_{i=1}^{n} p(y_i \mid parent(y_i)) \tag{3}$$
where $p(y_i \mid parent(y_i))$ represents the conditional relationship between nodes and their parents (e.g. nodes $y_1$ and $y_2$ have a parent relationship to node $y_3$). Applying Eq. (3) to the DAG in Fig. 2, the joint probability distribution is expressed as follows:
$$p(y_1, y_2, y_3) = p(y_1)\, p(y_2)\, p(y_3 \mid y_1, y_2) \tag{4}$$
[Fig. 2: directed acyclic graph with arcs from nodes y1 and y2 to node y3.]
Fig. 2. Simplified DAG with two parent nodes (y1 and y2) and dependent node y3.
The dynamic Bayesian network (DBN) extends the functionality of the static Bayesian network through the inclusion of temporal dependencies using sequences of time slices. The temporal transition from one time phase to the next may be represented as follows [50]:
$$p(y_t \mid y_{t-1}) = \prod_{i=1}^{n} p(y_t^i \mid parent(y_t^i)) \tag{5}$$
where $y_t^i$ expresses the $i$-th node at time instance $t$, $i = 1, 2, \ldots, n$, and $parent(y_t^i)$ expresses the temporal dependencies of the parent nodes within the DBN. Extending the DBN to $T$ time slices, the following joint probability distribution is derived [50]:
$$p(y_{1:T}) = \prod_{t=1}^{T} \prod_{i=1}^{n} p(y_t^i \mid parent(y_t^i)) \tag{6}$$
Through the joint probability distribution, the Bayesian network embeds a flexible formalism which allows modelling of complex dependencies and updating of risk metrics with the emergence of new failure information.
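To make the factorization in Eq. (4) and the notion of updating concrete, the sketch below encodes the three-node DAG of Fig. 2 in plain Python. All probability values and the conditional probability table are hypothetical, chosen purely for illustration; the enumeration approach shown is the simplest exact inference scheme and is only practical for very small networks.

```python
from itertools import product

# Hypothetical failure probabilities for the parent nodes y1 and y2
# (illustrative values, not taken from the article).
p_y1 = {True: 0.05, False: 0.95}
p_y2 = {True: 0.10, False: 0.90}

# Hypothetical conditional probability table p(y3 = True | y1, y2).
p_y3_true = {
    (True, True): 0.99,
    (True, False): 0.60,
    (False, True): 0.40,
    (False, False): 0.01,
}


def joint(y1: bool, y2: bool, y3: bool) -> float:
    """Eq. (4): p(y1, y2, y3) = p(y1) p(y2) p(y3 | y1, y2)."""
    p3 = p_y3_true[(y1, y2)]
    return p_y1[y1] * p_y2[y2] * (p3 if y3 else 1.0 - p3)


# Marginal probability of the dependent failure event y3,
# obtained by summing the joint over both parents.
p_y3 = sum(joint(a, b, True) for a, b in product([True, False], repeat=2))

# Bayesian updating: probability that y1 has failed once y3 is observed.
p_y1_given_y3 = sum(joint(True, b, True) for b in [True, False]) / p_y3
```

With these assumed values, observing the dependent failure y3 raises the belief that y1 has failed well above its prior of 0.05, which is the kind of risk-metric update that static fault trees do not provide.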
failure and maintenance time distributions, which include, exponential, normal and lognormal distributions. Nonetheless, despite the modelling exibility of dynamic Bayesian networks, often the formalism requires high computational effort, especially for resolving the joint probabil- ity of complex system failure dependencies. This resolution complexity arises where, for instance, the marginal probabilities representing inde- pendent failure events are modelled via probability distributions belong- ing to distinct families, e.g. Weibull or Lognormal. This aspect remains a challenge, where exponential failure transition rates are often assumed for modelling dynamic failure transition, for instance as discussed in Codetta-Raiteri and Portinale [66] .
2.2.1. Static Bayesian networks
Several studies demonstrate the potential use of static Bayesian net- works for modelling failure dependencies of technical systems. In the context of dependability modelling and maintenance decision support, Ferreiro et al. [51] propose a formalism where failure dependencies of aircraft systems are modelled while incorporating prognostic in- formation. They evaluate the risk reduction potential of two mainte- nance strategies, i.e. preventive and corrective maintenance. Gran et al. [52] incorporate organizational, human and technical risk factors into their Bayesian modelling formalism and consequently evaluating appro- priate maintenance interventions, which best mitigate oil leakages of offshore facilities. Tian et al. [53] also model the failure dependencies of a subma- rine casing cutting tool using a static Bayesian network formalism, and consequently, apply their formalism for diagnosing faults of the robot- operated cutting tool. More recently, Liu et al. [54] propose an ap- proach for translating the GO-FLOW methodology into an equivalent static Bayesian network. The GO-FLOW methodology is commonly ap- plied for modelling system reliability such as, in their study, a pressur- ized water reactor. Askarian et al. [55] also apply a static Bayesian network formalism for diagnosing technical faults in a chemical plant. Abbassi et al. [56] in- tegrate Bayesian networks into a quantitative risk assessment methodol- ogy, where the methodology is applied for estimating the failure proba- bilities of accident/failure scenarios, and associated consequences. How- ever, their approach fails to consider dependencies between system fail- ure events, and moreover, ignores temporal aspects, a limitation which is addressed using dynamic Bayesian networks reviewed discussed next.
2.3. Combined fault trees and Bayesian network modelling approaches
Combined formalisms present a plausible framework for translat- ing systems modelled via fault trees to equivalent Bayesian network models. Khakzad et al. [67] propose such a formalism where systems modelled via dynamic fault tree gates are translated into equivalent dy- namic network nodes, while avoiding generation of multi-dimensional conditional probability tables representing marginal probabilities of ba- sic failure events. In Khakzad et al. [48] , they extend their work and propose a modular Object-Oriented Bayesian network (OOBN) formal- ism for modelling complex failure dependencies represented using fault trees. Their formalism decomposes complex dynamic Bayesian networks into multiple modules, each of which is resolved independently. Kabir et al. [68] propose a translation approach through which, stochastic fail- ure dependencies of complex systems modelled via dynamic fault trees are also translated to equivalent dynamic Bayesian networks, and their reliability assessed. More recently, Mi et al. [69] propose an approach which translates complex dependencies of electromechanical systems modelled through dynamic fault tree. Their approach considers epistemic uncertainty which is expressed through bounded closed intervals, and which in- corporates multiple sources of evidences, e.g. field failure data, test and design data. A similar approach integrating multiple information sources via a Bayesian inference framework is discussed in Wang et al. [70]. However, their approach does not extend to applying a Bayesian network formalism for modelling system failure dependencies. Barua et al. [71] model the sequential dependencies between, on the one hand, operation-related parameters of chemical processes, and on the other hand, aging components vulnerable to failure. The sequential de- pendencies are first modelled via dynamic fault tree and translated to equivalent dynamic Bayesian network. Darwish et al. 
[72] incorporate the Bayesian approach to fault trees, which allows experts assign impor- tance ranking to basic failure events. Hence, by prioritizing basic events, they consider a more optimal allocation of maintenance resources. Chen et al. [73] propose a translation approach where reliability block dia- grams commonly used for modelling and assessing system reliability of complex dependable systems, are translated to Bayesian networks (BN). In recent years, software applications supporting this translation are dis- cussed in the literature, for instance, the Reliability Analysis with Dy- namic Bayesian networks (RADYBAN) [74].
2.2.2. Dynamic Bayesian networks
The versatility of dynamic Bayesian networks is demonstrated in sev- eral studies. For instance, Cai et al. [57] modelled the failure dependen- cies of a sub-sea blowout preventer system, where they explore causal relationships between imperfect repair processes, and common cause system failures. Hu et al. [58] evaluate the inuence of an opportunis- tic predictive maintenance strategy on system failure using a modelling formalism, which integrates dynamic Bayesian networks and the Haz- ard and Operability Analysis (HAZOP). More recently, Cózar and Gámez [59] demonstrate a modelling formalism which predicts anomalies of complex dynamic systems, where the prediction forms the basis for trig- gering predictive maintenance decisions. Zhu and Collette [60] propose a Bayesian modelling formalism which they demonstrate for modelling time-dependent failure mechanisms, such as fatigue crack growth. They also consider maintenance actions, where they integrate a reliability in-
dex ( ) for triggering inspection and maintenance actions. Although applied for structural systems, the applicability of their approach for mod- elling low probability (rare failure events) is also evident for mechanical systems. A similar approach for assessing the reliability of deteriorating structural systems via a dynamic Bayesian modelling formalism is dis- cussed in Luque and Straub [61] . More recently, Li et al. [62] integrate a dynamic Bayesian formalism into the GO ow methodology for modelling feedback signals ows. The inclusion of the Bayesian network model enhances the reliability assess- ment potential of the GO ow methodology, where traditionally, depen- dencies between system components are one-directional. Ramírez and Utne [63] also propose a formalism for assessing the reliability of ageing systems while optimizing maintenance policies which include, correc- tive, condition based maintenance, and time-based maintenance strate- gies. Salazar et al. [64] also propose a modelling formalism which inte- grates both reliability and system control performance aspects. In their study, failure dependencies are modelled through a dynamic Bayesian
network model, which allows assessment of system reliability. They suggest a strategy through which the control effort (for system performance) is redistributed until maintenance is undertaken, hence improving system availability. Liang et al. [65] proposed a formalism for assessing the reliability of warship systems where they consider varying

2.4. Stochastic Petri-nets

The stochastic Petri-net (SPN) also provides a formalism for modelling system dependencies. It embeds a Petri-net structure which graphically depicts dependent systems through a tuple whose elements include [75]:

• $P$ = a finite set of places, where marked places contain tokens,
• $T$ = a finite set of transitions,
• $I_t$ = a finite set of input places,
• $O_t$ = a finite set of output places,
• $H$ = a set of inhibitors,
• $M_0$ = the initial system marking vector, whose places contain a non-negative number of tokens.
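The tuple elements listed above can be illustrated with a small sketch. The Python fragment below is not taken from [75]; it is a minimal illustration, assuming a hypothetical two-state component (places `up` and `down`) and a hypothetical inhibitor place `crew_busy`, of how a marking, input/output arcs and inhibitors determine whether a transition is enabled and how firing moves tokens. Stochastic firing delays are deliberately left out, so the sketch covers only the structural part of the formalism.

```python
from dataclasses import dataclass, field


@dataclass
class PetriNet:
    """Minimal place/transition net with inhibitor arcs (illustrative only)."""
    marking: dict                      # current marking, initialised with M_0
    inputs: dict                       # I_t: transition -> {place: tokens consumed}
    outputs: dict                      # O_t: transition -> {place: tokens produced}
    inhibitors: dict = field(default_factory=dict)  # H: transition -> set of places

    def enabled(self, t):
        # Enabling rule: every input place holds enough tokens and
        # every inhibitor place of the transition is empty.
        has_tokens = all(self.marking.get(p, 0) >= n
                         for p, n in self.inputs[t].items())
        not_inhibited = all(self.marking.get(p, 0) == 0
                            for p in self.inhibitors.get(t, ()))
        return has_tokens and not_inhibited

    def fire(self, t):
        if not self.enabled(t):
            raise ValueError(f"transition {t!r} is not enabled")
        for p, n in self.inputs[t].items():
            self.marking[p] -= n
        for p, n in self.outputs[t].items():
            self.marking[p] = self.marking.get(p, 0) + n


# Hypothetical component: it fails and is then repaired, but repair is
# inhibited while the (single) maintenance crew is busy elsewhere.
net = PetriNet(
    marking={"up": 1, "down": 0, "crew_busy": 0},
    inputs={"fail": {"up": 1}, "repair": {"down": 1}},
    outputs={"fail": {"down": 1}, "repair": {"up": 1}},
    inhibitors={"repair": {"crew_busy"}},
)
net.fire("fail")
print(net.marking)
```

A stochastic variant would attach a random delay (for example, exponentially distributed) to each enabled transition and fire the one with the shortest delay; that extension is omitted here to keep the enabling rule itself visible.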
Fig. 3. Percentage distribution of articles per dependability modelling method: dynamic FTA, 25 (23%); dynamic BN, 23 (21%); static FTA, 17 (16%); static BN, 15 (14%); hybrid FTA-BN, 14 (13%); stochastic PN, 13 (12%).
The Petri-net simulates dynamic system behaviour by firing tokens continuously from a set of input places, through transitions (T), to the output places (P). The success of firing the tokens is based on a set of enabling rules representing the modelled dependencies. For maintenance decision making, Signoret et al. [76] propose a methodology which modularizes large Petri-net structures through a formalism which embeds Reliability Block Diagrams (RBD), a well-known reliability assessment tool. Song et al. [77] propose a formalism which combines stochastic fault trees and Petri-net models, and is useful for diagnosing faults of pantograph systems. Flammini et al. [37] also propose a combined formalism synthesizing generalized stochastic Petri-nets, fault trees, and repairable fault trees. Their formalism is applied for modelling failure dependencies of train control systems, and for evaluating alternative preventive maintenance policies which mitigate component degradation. Additional studies discussing stochastic Petri-net formalisms include [47,78–82].

Stochastic Petri-net modelling formalisms, however, have one notable limitation: they rely on a simulation approach, which is computationally intensive when modelling rare failure events [83]. In such cases, Petri-net models often underestimate the occurrence probabilities of modelled failure events, hence yielding sub-optimal maintenance strategies. Fig. 3 presents an overview of the reviewed dependability methods as per percentage distribution.

4. Bayesian inferencing approaches
The Bayesian inferencing framework models quantitative reliability information via likelihood functions, while epistemic uncertainty is inferred from prior distribution functions, the latter elicited from domain experts [85]. The likelihood and prior functions are combined in the Bayesian inference framework, and the probability of asset failure is inferred from the resulting posterior distribution. Hence, the posterior distribution provides a means of updating risk metrics as new evidence of failure events becomes available. Bayes' theorem is shown in Eq. (7):

$$\pi(\theta \mid x) = \frac{l(x \mid \theta)\,\pi(\theta)}{\int_{\theta=0}^{\infty} l(x \mid \theta)\,\pi(\theta)\,d\theta} \qquad (7)$$

where $\pi(\theta)$ represents the prior distribution function, $l(x \mid \theta)$ the likelihood function, and $\pi(\theta \mid x)$ the posterior distribution function. However, the posterior distribution is often computationally intensive to resolve. Hence, several methods are proposed for resolving such posterior distribution functions, and these are also embedded in Bayesian network modelling formalisms [86]:

(i) analytical approximation methods, which include numerical integration and Laplace approximation,
(ii) data augmentation methods, which include the Expectation-Maximization (E-M) algorithm,
(iii) Monte Carlo direct sampling,
(iv) Markov chain Monte Carlo, including the Metropolis-Hastings (M-H) algorithm and the Gibbs sampling approaches.
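To make Eq. (7) concrete, the sketch below resolves a posterior by numerical integration, the first of the resolution routes listed above. It is an illustration under stated assumptions rather than a method from the reviewed studies: the exponential failure model, the Gamma prior, the three failure times and the integration limit `upper` are all hypothetical choices. The Gamma/exponential pair was picked only because it also has a closed-form posterior, which gives an independent check on the numerical result.

```python
import math


def gamma_pdf(theta, shape, rate):
    """Gamma prior density pi(theta) on a failure rate theta > 0."""
    return (rate ** shape) * theta ** (shape - 1) * math.exp(-rate * theta) / math.gamma(shape)


def exp_likelihood(times, theta):
    """Likelihood l(x | theta) of observed failure times under an exponential model."""
    return theta ** len(times) * math.exp(-theta * sum(times))


def posterior_mean_grid(times, shape, rate, upper=5.0, steps=50_000):
    """Approximate E[theta | x] by midpoint-rule integration of Eq. (7)."""
    h = upper / steps
    num = den = 0.0
    for i in range(steps):
        theta = (i + 0.5) * h
        w = exp_likelihood(times, theta) * gamma_pdf(theta, shape, rate)
        num += theta * w
        den += w          # denominator of Eq. (7), up to the common factor h
    return num / den


# Hypothetical data: three failure times (in years) and an expert prior
# with mean shape / rate = 0.5 failures per year.
times = [1.2, 0.7, 2.1]
shape, rate = 2.0, 4.0

numeric = posterior_mean_grid(times, shape, rate)
# Conjugate check: Gamma prior + exponential likelihood gives a
# Gamma(shape + n, rate + sum(t)) posterior, whose mean is known exactly.
analytic = (shape + len(times)) / (rate + sum(times))
print(round(numeric, 4), round(analytic, 4))
```

When the prior and likelihood do not form a conjugate pair, the analytic check is no longer available and only the numerical (or sampling-based) route remains, which is the situation the resolution methods above are meant to handle.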
3. Quantifying uncertainty in the risk assessment methods

Depending on the approach for modelling failure dependencies, uncertainties associated with the risk assessment process may be treated as either aleatory or epistemic [84]. Aleatory uncertainty results from the inherent randomness of input model parameters derived from reliability data, while epistemic uncertainty may result from insufficient reliability data. Quantifying epistemic uncertainty relies on expert domain knowledge. For treating aleatory uncertainty, statistical failure models are often used, while for quantifying epistemic uncertainty, models such as Interval Analysis, Fuzzy functions and Belief functions are applied [84]. For Bayesian networks, uncertainty associated with sparse reliability data is treated through a Bayesian inferencing framework discussed next.

4.1. Analytical approximation approach

The analytical approximation approach resolves posterior distribution functions via a data sampling approach based on a simulation framework, e.g. Monte Carlo simulation. This sampling approach draws samples from probability density functions of the modelled failure events. Thereafter, uncertainties associated with the sampled data are propagated through an appropriate mathematical model, e.g. the Bayes equation, from which the posterior distribution is resolved [87]. Within Bayesian network modelling formalisms, the analytical approximation approach is reported in studies, e.g. [88,89]. More recently,
Wang et al. [90] apply a Monte Carlo simulation approach within a Bayesian network modelling formalism for assessing the reliability of railway turnout systems exposed to weather-related elements, from which optimal maintenance intervention strategies are formulated.

However, the simulation sampling approach has one important drawback: it assumes the existence of a closed-form posterior distribution from which samples are drawn. This is often not the case, especially where the prior and likelihood functions belong to different families of distributions, which makes the posterior distribution function computationally intensive to resolve [91]. Moreover, the analytic approximation approach often yields poor risk estimates, especially where reliability data is sparse. Hence, alternative resolution approaches such as data augmentation are suggested.
for assessing the reliability of components characterized by multi-state Markov degradation processes. In their study, the Gibbs sampler is applied for resolving the posterior distributions generated from the degradation processes. Other studies incorporating the Gibbs sampler in Bayesian network formalisms are discussed in, e.g. [104–106].

Some studies attempt to integrate the Gibbs sampler and M-H algorithms within the same modelling formalism. Examples include Soliman et al. [98], where a combined formalism is proposed for estimating the reliability of multi-component systems characterized by dependencies modelled via a modified Weibull posterior distribution. More recently, the sampler is also discussed for modelling the influence of dependencies such as stress and component strength on system reliability [107]. Zaidan et al. [108] also apply the approach for estimating the remaining useful life of aerospace gas turbine engines.

Other authors have extended the hybrid McMC resolution approach by including parametric sensitivity analysis, for instance, see [109–111]. Of particular interest, the resolution efficiency of McMC is extended to analysing rare failure events. In recent years, the McMC resolution has evolved into software applications such as BUGS (Bayesian inference Using Gibbs Sampling), where applicability of the approach is demonstrated for assessing asset failure risks, e.g. see [112–114].
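As a minimal illustration of the McMC idea referred to in this discussion, the sketch below runs a random-walk Metropolis-Hastings chain for a failure rate whose prior and likelihood belong to different families, so that no closed-form posterior exists. Everything specific in it is an assumption made for illustration: the lognormal prior parameters `mu` and `sigma`, the exponential failure model, the three failure times, the proposal step, the burn-in length and the seed. It is not the algorithm of any cited study and is much simpler than a BUGS model.

```python
import math
import random


def log_posterior(theta, times, mu=-1.0, sigma=1.0):
    """Unnormalised log posterior: lognormal prior x exponential likelihood."""
    if theta <= 0:
        return -math.inf
    log_prior = -math.log(theta) - (math.log(theta) - mu) ** 2 / (2 * sigma ** 2)
    log_lik = len(times) * math.log(theta) - theta * sum(times)
    return log_prior + log_lik


def metropolis_hastings(times, n_samples=20_000, step=0.3, start=0.5, seed=0):
    """Random-walk Metropolis-Hastings chain targeting the posterior of theta."""
    rng = random.Random(seed)
    theta, logp = start, log_posterior(start, times)
    chain = []
    for _ in range(n_samples):
        proposal = theta + rng.gauss(0.0, step)      # symmetric proposal
        logp_new = log_posterior(proposal, times)
        # Accept with probability min(1, posterior ratio); proposals with
        # theta <= 0 have zero posterior density and are always rejected.
        if rng.random() < math.exp(min(0.0, logp_new - logp)):
            theta, logp = proposal, logp_new
        chain.append(theta)
    return chain


times = [1.2, 0.7, 2.1]                  # hypothetical failure times (years)
chain = metropolis_hastings(times)
burn_in = 2_000                          # discard the start-up phase of the chain
kept = chain[burn_in:]
posterior_mean = sum(kept) / len(kept)   # ergodic average of the chain
print(round(posterior_mean, 2))
```

The posterior mean is obtained as an ergodic average of the retained chain, which is the quantity the text describes. A Gibbs sampler would instead draw each parameter in turn from its full conditional distribution; with a single parameter, as here, the Metropolis-Hastings form is the more natural illustration.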
4.2. Data augmentation approach
The data augmentation approach works by augmenting observed data with missing data, which yields an augmented posterior density function that is computationally tractable and more efficiently resolved. The Expectation-Maximization (E-M) algorithm is widely applied for augmenting missing reliability data, and hence estimating the lifetime distribution of repairable systems/assets. For Bayesian network formalisms, Mahmoud and Khalid [92] apply the approach for augmenting censored fault data of electro-hydraulic rotational drive systems. Zhang et al. [93] also apply the method within a dynamic Bayesian network formalism for estimating the remaining useful life (RUL) of systems characterized by complex failure dependencies, where the influence of a condition-based maintenance strategy is considered for degrading components. Zhang and Dong [94] also apply the approach within a dynamic Bayesian network formalism where they incorporate a Gaussian model for augmenting missing failure data.

More recently, Ratnapinda and Druzdzel [95] incorporate the E-M augmentation approach within Bayesian networks, and consider an application scenario where continuous data streams are used to augment sparse reliability data. Other studies where the E-M method is embedded in Bayesian network formalisms include, for instance, Bacha et al. [96]. Nonetheless, despite its usefulness for augmenting sparse reliability data, the E-M algorithm is constrained for modelling dependencies where the prior and likelihood functions belong to different families of distributions [97]. Part of this constraint is addressed by the Markov chain Monte Carlo method.
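The augmentation step described above can be sketched for the simplest censored-data case. The example below is an assumption-laden illustration, not a reproduction of any cited formalism: it assumes exponentially distributed lifetimes, right-censored units, and hypothetical failure and censoring times. The "missing" data are the unobserved failure times of the censored units, which the E-step replaces by their conditional expectation.

```python
def em_censored_exponential(failures, censored, n_iter=200, rate=1.0):
    """E-M estimate of an exponential failure rate with right-censored units.

    failures: observed failure times; censored: times at which units were
    still working (true failure time unknown, i.e. the 'missing' data).
    """
    n = len(failures) + len(censored)
    for _ in range(n_iter):
        # E-step: expected lifetime of a censored unit, E[T | T > c] = c + 1/rate
        # (memoryless property of the exponential distribution).
        expected_total = sum(failures) + sum(c + 1.0 / rate for c in censored)
        # M-step: complete-data maximum-likelihood estimate of the rate.
        rate = n / expected_total
    return rate


# Hypothetical data (hours): three observed failures, two units censored at 1000 h.
failures = [400.0, 650.0, 900.0]
censored = [1000.0, 1000.0]
rate_em = em_censored_exponential(failures, censored)

# For this simple model the E-M fixed point equals the closed-form MLE,
# (number of failures) / (total time on test), which serves as a check.
rate_mle = len(failures) / (sum(failures) + sum(censored))
print(rate_em, rate_mle)
```

In this toy case the closed-form estimate makes E-M unnecessary; the iteration is shown only because the same E-step/M-step pattern carries over to models where no closed form is available.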
5. Methods for quantifying epistemic uncertainty

Although the Bayesian inferencing framework is useful for combining evidence, both quantitative and qualitative, a lack of, or insufficient, reliability data may necessitate alternative methods for quantifying epistemic uncertainty. Such methods would allow expert elicitation to be considered in dependability modelling formalisms. Examples of methods for quantifying epistemic uncertainty include: (1) the Theory of Fuzzy sets; (2) Interval Analysis; and (3) the Dempster–Shafer Theory of Evidence [115].

5.1. Fuzzy approach for quantifying uncertainty
The fuzzy set concept was first suggested for modelling vague and imprecise information through membership functions, where the function specifies a degree of belonging in the continuous interval [0, 1] [116]. Ideally, a function value of ‘0’ implies no membership, while conversely, a value of ‘1’ implies full membership in the continuous interval. The fuzzy concept is applied within fault tree modelling formalisms, e.g. in Purba et al. [117], for assessing the probability of failure of basic events of a nuclear power plant facility. In the study, modelling the basic events relied on fuzzy functions elicited from domain experts. The embeddedness of the fuzzy concept within static fault tree formalisms is also discussed in studies, e.g. [118,119].

For dynamic fault trees, the fuzzy concept is discussed in Tu et al. [30], where the concept is applied for quantifying uncertainties associated with sparse failure information of critical avionic systems. Kabir et al. [120] also incorporate the concept while assessing the reliability of the fuel distribution system of marine ships. More recently, a fuzzy fault tree analysis modelling formalism is discussed in Yazdi et al. [121] where, importantly, the formalism is applied for analysing failure risks associated with common cause failures. Assessing such risks is often challenging owing to sparse fault information. The concept is embedded in dynamic fault tree formalisms as discussed in studies, e.g. [122,123]. Recent attempts are also seen in the literature where some authors integrate the fuzzy concept into Bayesian network modelling formalisms, for instance, in He et al. [124], where fuzzy functions are assigned to failure probability estimates of complex systems characterized by multi-state failures.

4.3. Markov chain Monte Carlo

The Markov chain Monte Carlo (McMC) approach works by simulating Markov chains within a given parameter space, where the chains are constructed in such a way that their asymptotic (stationary) distribution is the posterior distribution function. From this convergence, posterior statistical parameters (e.g. mean, standard deviation) are approximated from ergodic averages of the Markov chains [98–100]. A primary advantage of the McMC compared to the conventional Monte Carlo sampling approach is its ability to estimate posterior distribution parameters for complex mathematical models having a large number of parameters, and with functions belonging to different distribution families [98]. This is in addition to enhancing the suitability of the method for dynamically updating risk metrics as new evidence of failure events emerges. Commonly applied McMC inferencing methods include the Metropolis-Hastings (M-H) algorithm and the Gibbs sampler [101]. The M-H algorithm generates a sequence of samples from a complicated probability density function by accepting or rejecting proposed moves, while the Gibbs sampler draws each parameter in turn from its full conditional distribution.

In the context of risk and reliability analysis, the Gibbs sampler
method is embedded in Bayesian network formalisms. For instance, Lin et al. [102] propose a Gibbs sampler-based approach for estimating the service lifetime distributions of locomotive wheels. Their approach considers factors such as wheel installation position, a factor influencing wheel wear, and maintenance. Liu et al. [103] applied the method
5.2. Interval analysis
Fig. 4. Distribution of articles as per application domain: petrochemical facilities, 26%; industrial/manufacturing systems, 21%; nuclear power generation/research, 19%; railway systems/marine applications, 15%; food/paper/process industries, 11%; electronics/telecommunications, 8%.

In interval analysis, the uncertain and imprecise parameters of interest are assumed to lie within lower and upper interval bounds [115]. Compared to the fuzzy approach where fuzzy membership
functions are specified, in interval analysis, domain experts assign crisp lower and upper bound values to the uncertainty range within which the parameters are judged to lie [115]. For example, the failure probability of a wind turbine gearbox may be specified as lying between a lower bound and an upper bound of $1 \times 10^{-2}$. Interval analysis allows estimates from several experts to be combined within a probabilistic framework described by the interval functions [125]. Although interval analysis is demonstrated to work well within Bayesian network formalisms, especially where reliability data is sparse, the analysis lacks a concise mathematical structure or density function through which uncertainty can be propagated [115,126]. To overcome this flaw, two algorithms are suggested in the literature, based on (1) simulation methods; and (2) surrogate models [115].

For reliability analysis and probabilistic safety assessment, the interval analysis method is gaining attention in the field of uncertainty quantification (UQ). In UQ, aleatory and epistemic uncertainties are analysed through separate second-order distribution functions. This separation approach is suggested as useful for assessing the reliability of complex, high-reliability safety-critical systems, e.g. aerospace systems [127–129]. Within dependability modelling formalisms, the UQ separation approach is discussed recently in Novack et al. [130] for quantifying the epistemic uncertainty of basic failure events of space launch vehicles. Fig. 4 depicts the distribution of reviewed dependability approaches as per the application domain.
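How crisp bounds elicited from experts carry through to a system-level metric can be sketched for a simple case. The example below is illustrative only: it assumes two independent components in series, and all interval values are hypothetical (only the $1 \times 10^{-2}$ upper bound echoes the gearbox example above; the lower bound and the second component are invented). It is not one of the simulation or surrogate-model algorithms of [115]; it relies on the monotonicity of the series-system formula, which makes endpoint evaluation sufficient.

```python
def series_failure(p):
    """Failure probability of a series system of independent components."""
    survive = 1.0
    for pi in p:
        survive *= 1.0 - pi
    return 1.0 - survive


def propagate_interval(bounds):
    """Propagate [lower, upper] component bounds to system-level bounds.

    The series failure probability increases with every component
    probability, so evaluating it at the all-lower and all-upper endpoints
    yields exact bounds (no sampling is needed for this monotone case).
    """
    lower = series_failure([lo for lo, _ in bounds])
    upper = series_failure([hi for _, hi in bounds])
    return lower, upper


# Hypothetical expert-elicited intervals for two components.
gearbox = (1e-3, 1e-2)
generator = (5e-4, 2e-3)
lo, hi = propagate_interval([gearbox, generator])
print(f"system failure probability in [{lo:.6f}, {hi:.6f}]")
```

For models that are not monotone in their inputs, endpoint evaluation is no longer sufficient, which is where the simulation-based and surrogate-based approaches mentioned in the text become relevant.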
5.3. Dempster–Shafer evidence theory

The Dempster–Shafer Theory of Evidence (DSTE) is founded on two ideas: (i) obtaining degrees of belief for subjective probability estimates; and (ii) combining the degrees of belief within a probabilistic framework [131]. The DSTE provides an efficient framework for aggregating information from multiple sources, both qualitative and quantitative, where this aggregation is achieved through Dempster’s combination rules [132]. In DSTE, estimates of the risk metrics of interest are bounded within the belief (lower bound) and plausibility (upper bound) functions, expressed by Eq. (8) [131]:

$$\mathrm{Bel}(A) \le P(A) \le \mathrm{Pl}(A) \qquad (8)$$

The exact position where the metric (e.g. probability of failure) lies depends on the degree of evidence or information available at the time of analysis [131]. Hence, strong evidence would suggest a tendency towards the plausibility function (upper bound), while weak evidence would suggest the contrary, i.e. a tendency towards the belief function (lower bound).

In the literature, the DSTE is discussed in Eldred et al. [115], where the authors use computational experiments to compare the DSTE and the Interval Valued Probability (IVP) methods. The IVP segregates aleatory and epistemic uncertainties, and allows nested operations to be performed [115]. Based on the experiments, the authors conclude that although the DSTE and IVP approaches produce comparable results, the DSTE is sensitive to the number of input variables. As such, the computational effort increases in tandem with the number of input variables. Helton and Johnson [126] also compare the DSTE, Interval Analysis and the Fuzzy methodology, and conclude that the DSTE is rather attractive in that (i) it allows inclusion of more information compared to Interval Analysis; and (ii) it requires fewer assumptions for specifying input uncertainties compared to both Interval Analysis and the Fuzzy set concept.

Although not applied within a dependability modelling formalism, Ding et al. [133] demonstrate how the DSTE may be applied for assessing the reliability of early fire detection systems by aggregating multi-sensor information, e.g. smoke and light sensor information. For technical systems, Agaram [134] reviews recent applications of DSTE approaches which embed concepts of information fusion for reliability analysis and fault diagnosis in the automotive industry. Notably, the review highlights the usefulness of the DSTE approach for early fault detection through combining multiple sources of evidence, including sensor data, e.g. vibration or ultrasound, and expert information on potential failure events.

For dynamic fault tree analysis, Duan et al. [135] integrate an evidential information network in which component failure rates are expressed through interval number estimates, with epistemic uncertainties associated with the sparse failure data modelled via the DSTE concept. Inclusion of interval-valued probabilities in fault tree modelling formalisms is also discussed in Toppila and Salo [136], where the authors caution of challenges in upscaling such formalisms for assessing the reliability of complex dependable systems. Zhang et al. [137] demonstrated how linguistic information may be incorporated into an evidential network which is based on the DSTE method and a Bayesian network formalism. Flage et al. [138] also apply an approach which synthesizes the DSTE and the fuzzy concept within a fault tree modelling formalism, for quantifying epistemic uncertainty of basic failure events of general systems. More recently, Giuseppe et al. [139] apply an approach which also synthesizes the DSTE and Interval-Valued Probability estimates elicited from domain experts; their combined formalism is similarly embedded in a fault tree modelling formalism and applied for assessing the reliability of systems with different configurations, i.e. parallel or series.

In Bayesian network formalisms, authors such as Kabir et al. [140] apply the DSTE for fusing censored failure data with expert estimates, where their approach is applied for assessing the reliability of technical components of a water distribution system. Within Bayesian network modelling formalisms, DSTE is also discussed in [129,141–146]. Table 1 summarizes the main methods for treating uncertainty in dependability modelling approaches discussed in this review.

Table 1
Overview of methods for quantifying uncertainty in dependability modelling.

| Methods | Literature | No. of articles |
|---|---|---|
| Bayesian inference approaches | | |
| Analytic approximation | [87–91] | 5 (8%) |
| Data augmentation | [92–96] | 6 (9%) |
| Markov chain Monte Carlo | [111,89,114,112,110,98–103,171–175] | 16 (24%) |
| Approaches for quantifying epistemic uncertainty | | |
| Theory of fuzzy sets | [54,117,120,30,123,124,176–179] | 12 (18%) |
| Interval analysis | [125,127–130,139,180] | 9 (14%) |
| Dempster-Shafer theory of belief | [133–135,129,140–146,155,139,137,181] | 18 (27%) |
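The way Dempster's combination rule fuses two sources, and how the belief and plausibility bounds of Section 5.3 follow from the fused result, can be sketched briefly. The example is a minimal illustration and not the formalism of any cited study: the two hypotheses (`fail`, `ok`), the two evidence sources and all mass values are hypothetical.

```python
from itertools import product


def combine(m1, m2):
    """Dempster's rule of combination for two mass functions.

    Each mass function maps a frozenset of hypotheses (a focal element)
    to its basic probability mass; masses in each function sum to 1.
    """
    combined, conflict = {}, 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb          # mass assigned to contradictory evidence
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    return {k: v / (1.0 - conflict) for k, v in combined.items()}


def belief(m, event):
    """Bel(A): total mass of focal elements contained in A (lower bound)."""
    return sum(w for s, w in m.items() if s <= event)


def plausibility(m, event):
    """Pl(A): total mass of focal elements intersecting A (upper bound)."""
    return sum(w for s, w in m.items() if s & event)


FAIL, OK = frozenset({"fail"}), frozenset({"ok"})
EITHER = FAIL | OK                       # mass left on "don't know"

# Hypothetical evidence: a vibration sensor and an expert judgement.
sensor = {FAIL: 0.6, EITHER: 0.4}
expert = {FAIL: 0.3, OK: 0.2, EITHER: 0.5}

fused = combine(sensor, expert)
print(belief(fused, FAIL), plausibility(fused, FAIL))
```

The gap between the two printed values is the mass that remains on the undecided set `EITHER`; it narrows as the sources commit more mass to specific hypotheses.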
6. Discussion

6.1. General insights, and implications of the review for research and practice

This review offers important insights for decision support in risk assessment, and more specifically, dependability analysis in maintenance decision making. In particular, such insights could assist risk analysts and maintenance practitioners to assess equipment failure risks more robustly, and consequently, formulate effective maintenance strategies that mitigate the effects of equipment failures. As depicted in Fig. 4, performing risk assessment is an especially important consideration for formulating maintenance strategies for safety-critical systems such as nuclear power generation facilities, railway systems, and chemical process facilities. For such facilities, sub-optimal risk assessment may result in failure events leading to catastrophic accidents, for instance, the Bhopal disaster, or more recently, the Deepwater Horizon spill event in the Gulf of Mexico [147,148]. By structuring knowledge on dependability modelling, risk assessment, and maintenance decision making, it is expected that risk analysts and maintenance practitioners will better assess the relevance and applicability of different dependability modelling methods.

From the review, significant research is seemingly directed towards more versatile dependability modelling methods such as dynamic fault trees, dynamic Bayesian networks, hybrid fault trees/Bayesian networks, and stochastic Petri-nets, which overall account for 69% of the reviewed methods (see Fig. 3). Nonetheless, static dependability approaches such as fault trees and Bayesian networks constitute a noticeable proportion of the reviewed approaches (31% of reviewed methods), which may be attributed to the intuitiveness of the methods for analysts and practitioners. This contrasts with dynamic dependability modelling methods, where equipment failure probabilities are primarily resolved through Markov models and Monte Carlo simulation approaches.

However, apart from ignoring temporal aspects, the static fault tree is still limited in the extent to which basic failure events are modelled through varying empirically derived distribution functions, for instance, Weibull or Lognormal functions. Largely, in the reviewed methods, basic events are assumed to be exponentially distributed, an assumption made for modelling simplicity. Although empirically derived distributions would ideally mimic failure models expected in real life, incorporating such distributions within the reviewed formalisms is not straightforward, and adds to the resolution complexity of the methods. This challenge is particularly apparent for static and dynamic fault trees, as discussed in Gharahasanlou et al. [16].

As regards user intuitiveness, integrated formalisms are seemingly attractive owing to the trade-off between intuitiveness and modelling complexity, especially when temporal aspects are considered. This is where fault tree formalisms are translated to equivalent Bayesian networks, which seem to cope better with uncertainties associated with sparse reliability data, or qualitative aspects such as operations risks or human-related factors, as seen in studies, e.g. Dongiovanni and Iesmantas [18]. However, it should be mentioned that despite the modelling versatility introduced by the integrated formalisms, incorporating maintenance policies within the formalisms is seemingly a challenge. Effort in this direction is discussed for dynamic fault trees, and in particular, repairable dynamic fault trees suggested by authors e.g. Manno et al. [27]. However, the repairable fault tree as discussed excludes alternative maintenance policies such as optimized maintenance planning or condition-based maintenance. This omission also extends to incorporating prognostic information, such as inclusion of the remaining useful life in dependability modelling formalisms. Although recent studies consider this aspect, it is nonetheless noted as an important gap which could be further explored.

For static and dynamic Bayesian networks, which constitute 35% of the reviewed approaches, an important trend towards more flexible modelling formalisms is seen. Importantly, apart from incorporating temporal aspects, Bayesian networks offer the advantage of updating risk metrics with the emergence of new failure information. The formalism also seems robust for incorporating qualitative information, such as human-related maintenance errors. Such human aspects are often difficult to quantify, yet are important contributors to equipment failures and accident events in safety-critical assets. Important human-related performance shaping factors contributing to maintenance-related errors include fatigue, skill level, or incorrect repair procedures. Although inclusion of human factor aspects in Bayesian network modelling formalisms is discussed, this is seemingly limited to safety and accident analysis, for instance, as discussed in Akhtar and Utne [149] and Calviño, Grande [150].

However, one important challenge of incorporating human factors within dependability modelling formalisms is the difficulty of quantifying the probability of errors linked to performance shaping factors associated with human errors. Quantifying such errors requires the use of scenario analysis, where propagation of human errors to potential maintenance errors is evaluated. Bayesian network formalisms are limited in this regard. Noroozi et al. [151] propose an alternative approach where event trees are applied for scenario analysis, and for quantifying the impact of human errors on equipment maintenance.

From the review, the important role of Bayesian network formalisms for rare event analysis is also discussed. In particular, the data augmentation approach seems attractive for decision support in maintenance since, often, the availability of sufficient data for failure modelling is an important challenge. This is especially the case for high-reliability and safety-critical systems depicted in Fig. 4. A trend in this direction is discussed in studies, for instance, in [152,153]. An important concern for rare event analysis, however, relates to the validation of formalisms integrating such analysis. This is an important challenge necessitating future work in this direction.
To address some of the validation concerns for rare failure events, the Markov chain Monte Carlo (McMC) simulation approach is discussed where, apart from efficiently resolving complex posterior distributions, the approach addresses validity concerns for sparse data sets. This is achieved partly through computing the Deviance Information Criterion (DIC), which is embedded in software applications such as BUGS (Bayesian inference Using Gibbs Sampling). Although the McMC is a useful resolution approach for posterior distributions and for addressing model validity concerns, its usage is limited to fairly simple systems with straightforward dependencies. Extending the McMC for modelling more complex dependencies such as maintenance policies and human-related maintenance errors is an interesting direction for future work.

Although demonstrated as applicable for rare event analysis, the stochastic Petri-net applies enabling rules within a simulation modelling framework, which also introduces model validity concerns, for instance, as discussed in Paolieri et al. [154]. Moreover, the formalism may not be as intuitive to maintenance practitioners as methods such as fault trees or Bayesian network formalisms, hence its seemingly low proportion compared to other reviewed dependability modelling approaches.

For quantifying epistemic uncertainty, integrating fuzzy and DSTE concepts within dependability modelling formalisms, such as static and dynamic fault trees, is an interesting observation (45% of uncertainty quantification methods, see Table 1). This is because, in the absence of sufficient data for modelling basic failure events, eliciting fuzzy estimates from domain experts is an intuitive approach for addressing data availability challenges. However, the fuzzy concept raises model validity concerns, which are partly addressed by Bayesian updating. The DSTE method also provides a useful platform for augmenting sparse reliability information with expert estimates, as discussed in studies, e.g. Khalaj et al. [155] and Flage et al. [138]. In particular, the DSTE integrates a useful data fusion framework which allows synthesis of maintenance-related information from multiple sources, e.g. condition monitoring sensor data such as vibration and ultrasound. The fusion further extends to integrating information elicited from domain experts within the modelling formalism.
Other plausible approaches may include alternative formalisms, such as use of dynamic event trees for instances where information on fault incidences modelled via fault trees is limited. This approach is discussed in Ibánez et al. [158] where they argue that the DET formalism avoids the need for exploring all potential system failure configurations or de- pendencies. A similar trend towards using the DET modelling formalism is also seen in Karanki et al. [159] where uncertainties associated with stochastic failure probabilities and modelling parameters are incorpo- rated within DET’s. For integrating sparse information to dependability formalisms, in- formation fusion architectures are suggested. For instance, Guo et al. [160] propose an approach where information from both expert and data sources are integrated via a Bayesian inferencing framework. Their approach importantly uses linear and geometric pooling methods, hence allowing importance weights to be assigned to the prior failure infor- mation. This diversifies the characteristics of possible prior that may be integrated in the Bayesian inferencing framework. A Naives Bayes approach for handling missing or unsynchronized is also proposed re- cently in Dabrowski et al. [161], and integrated in a dynamic Bayesian network modelling formalism. Hence such recent formalisms indicate an interesting trend towards more data driven dependability modelling approaches. For rare failure analysis, a notable constraint is the reliance on both numerical reliability data and expert analysis, which necessitates nu- merous modelling assumptions for augmenting subjective estimates. To mitigate the impact of such assumptions, authors such as Khorsandi and Aven [162] propose inclusion of the ‘assumption deviation risk’ for mitigating modelling uncertainties. Inclusion of such aspects to de- pendability modelling may further enhance treatment of uncertainty, hence, an interesting area of future work. 
Combined formalisms such as, such as the generalized stochastic Pertinets integrated with fault trees is demonstrated for rare failure/accident analysis. Talebberrouane, Khan [80] demonstrates that such formalisms provides more information on fault occurrences at different operational states and dependability se- quences, and may consider alternative maintenance and repair strate- gies. A similar Petrinet/fault tree formalism is also discussed recently in Yan et al. [163] for assessing the reliability of complex automated guided vehicle systems while considering optimal inspection and main- tenance timings. Data-driven machine learning approaches, and the DSTE method also seems to provide a plausible data fusion platform. For instance, integrating methods such as the Least square Support Vector Machine (SVM) in dependability modelling is widely discussed method for diag- nosing faults of technical assets, e.g. see [164,165].
6.2. General directions for future work
From the above discussion, dependability modelling formalisms present interesting prospects for future research within the maintenance decision making domain. Firstly, there is a need to extend the modelling flexibility of fault tree and Bayesian network formalisms such that empirical failure models are integrated in the formalisms. This deviates from the traditional assumption in dynamic fault trees, where basic failure events are assumed to be exponentially distributed. Incorporating such empirically derived failure models may pave the way for more flexible formalisms in which the reliability of complex electromechanical systems, such as collaborative robots, is more practically assessed. Often such robot systems consist of components exhibiting varying failure mechanisms, such as random failures (electronic components) or Weibull or Gamma distributed failures (mechanical systems) [156].
Secondly, mapping failure dependencies objectively in the formalisms discussed in this review is challenging. Often, the failure dependencies are mapped qualitatively, either based on expert knowledge of associations between failure mechanisms, or based on the system configuration. The latter considers how components are interconnected, and presumes that failure dependencies are aligned with the system configuration. Data exploration methods combined with data fusion approaches may provide a plausible platform for objectively mapping dependencies between failure events, as discussed for instance in Chemweno et al. [3]. In particular, information fusion may allow synthesizing data from systems of similar configuration or design. This approach is discussed in Raz et al. [157], where an Information Fusion System architecture is suggested.
Thirdly, the combinatorial explosion problem remains an important challenge for upscaling the graphical dependability methods discussed in this review, i.e. fault trees, stochastic Petri nets, and Bayesian networks. This is especially a challenge for modelling systems with complex dependencies owing to multiple interconnected components exhibiting varying failure mechanisms. Although object-oriented modelling approaches try to address this concern by modularizing complex dependability formalisms, the decomposition limits the extent to which reliability and maintenance-related aspects are integrated into such formalisms. Invariably, this limits the robustness of the risk assessment process and of maintenance decision making, the latter linked to selecting optimal maintenance strategies. Hence, exploring more efficient decomposition schemes forms an interesting prospect for future work. In addition to decomposition schemes, applying more efficient algorithms for reducing the storage necessary for constructing modular schemes such as Bayesian networks may help upscale dependability models. Recent work in this direction is discussed in Tien and Der Kiureghian [166].
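As a rough illustration of why storage becomes a bottleneck when upscaling Bayesian network models, the sketch below counts the entries of a full conditional probability table (CPT) for a single discrete node. It assumes every parent has the same number of states and that the CPT is stored without any compression; the function name and the parent counts are hypothetical and do not reproduce the algorithm of Tien and Der Kiureghian [166].

```python
def cpt_entries(num_parents, parent_states=2, child_states=2):
    """Number of entries in the full CPT of one discrete node.

    Each combination of parent states needs one probability per child state,
    so the table holds child_states * parent_states ** num_parents entries.
    """
    if num_parents < 0 or parent_states < 1 or child_states < 1:
        raise ValueError("counts must be non-negative and state counts positive")
    return child_states * parent_states ** num_parents


# Hypothetical system node (e.g. "system failed / working") that depends
# directly on n binary component nodes.
growth = {n: cpt_entries(n) for n in (2, 5, 10, 20)}
for n, size in growth.items():
    print(f"{n:>2} binary parents -> {size} CPT entries")
```

The exponential growth in the number of parents is what modular decomposition and compression schemes aim to contain.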
Alternative integrated formalisms may also allow upscaling of dependability models, and overcome the limitation of traditional approaches, which so far focus on simple systems with limited dependencies. Recent applications of continuous-time Markov chains seem promising in this regard, for instance as proposed in Liang et al. [167]. Func-