Journal of Petroleum Science Research (JPSR) Volume 3 Issue 2, April 2014 www.jpsr.org doi: 10.14355/jpsr.2014.0302.04
Automatic Identification of Formation Lithology from Well Log Data: A Machine Learning Approach

Seyyed Mohsen Salehi*1, Bizhan Honarvar2

*1 Department of Petroleum Engineering, Omidiyeh Branch, Islamic Azad University, Omidiyeh, Iran
2 Islamic Azad University, Fars Science and Research Branch, Shiraz, Iran

E-mails: *1 [email protected]; 2 [email protected]
Received 22 December 2013; Accepted 10 February 2014; Published 14 April 2014 © 2014 Science and Engineering Publishing Company
Abstract

Determination of the hydrocarbon content and the successful drilling of petroleum wells are highly contingent upon the lithology of the underground formation. Conventional lithology identification methods are either uneconomical or of high uncertainty. The main aim of this study is to develop an intelligent model based on the Least Squares Support Vector Machine (LSSVM) and the Coupled Simulated Annealing (CSA) algorithm, simply called CSA-LSSVM, for predicting the lithology in one of the Iranian oilfields. To this end, photoelectric index (PEF) values were simulated by the CSA-LSSVM algorithm based on valid well logging data generally known as lithology indicators. Model predictions were compared with the real data obtained from the well logging operation, and an overall correlation coefficient (R²) of 0.993 and an Average Absolute Relative Deviation (AARD) of 1.6% were obtained for the total dataset (3243 data points), which shows the robustness of the CSA-LSSVM algorithm in predicting accurate PEF values. In order to check the validity of the employed well log data, the Leverage value statistical method was implemented in this study for detecting possible outliers. Diagnosing only one single data point as suspected data or a probable outlier confirms the validity of the recorded data points and shows the wide applicability domain of the proposed model.

Keywords

Lithology; Least Squares Support Vector Machine (LSSVM); Coupled Simulated Annealing (CSA); Outlier
Introduction

Efficient drilling of hydrocarbon wells in an oilfield certainly entails identification of the lithologies crossed by the well. Knowledge of the lithology of a hydrocarbon well can be employed in determining a variety of other parameters, the most important of which is its fluid content. One way of determining lithologies and lithofacies is to infer them from the cuttings obtained during drilling operations. However, the depth from which the retrieved cuttings originate is always uncertain, and the samples are not usually large enough for accurate and reliable determination of petrophysical parameters (Serra and Abbott, 1982). Another method of obtaining such parameters is through observation and analysis of core samples taken from the underground formation. Nevertheless, this approach is highly expensive and may require a huge amount of time and effort to obtain reliable information about the underground lithofacies. Moreover, different geophysicists and geologists may obtain non-unique results based on their own observations and analyses (Akinyokun et al., 2009; Serra and Abbott, 1982). Considering the constraints of these methodologies, there has been a growing interest in identification of lithologies through interpretation of well log data, which is cheaper, more reliable, and more economical than core analysis. Wire-line logging provides the advantage of covering the entire geological formation of interest while providing extensive and exceptional details of the underground formation (Serra and Abbott, 1982). Unfortunately, ambiguities in measurements, mineralogical complexities of geological formations, and many other factors may, in some cases, bring unexpected difficulties to lithology identification from well log interpretations. In this perspective, a number of studies have been undertaken for accurate and reliable determination of lithologies by employing the data obtained from well logging operations (Akinyokun et al., 2009; Hsieh et al., 2005; Serra and Abbott, 1982).
In recent years, engineers and geoscientists have applied computational algorithms and statistical approaches to define lithologies and petrophysical parameters and, furthermore, to reduce the errors and difficulties associated with conventional well log interpretations (Akinyokun et al., 2009). Conventional computational algorithms or statistical methods may be defective in providing adequate information for lithology identification, especially in carbonate oil reservoirs. Broad families of algorithmic approaches are subsumed under the category of machine learning techniques. These algorithms are based on a coherent statistical foundation and aim to produce reliable predictions by inferring from a set of measurements. Some researchers have recently employed Artificial Neural Networks (ANNs) to improve on past performance in solving problems concerned with lithology determination (Chang et al., 2002). However, ANN-based models possess some deficiencies in reproducing the obtained results, partly due to random initialization of the network parameters and variations of stopping criteria during optimization (Cristianini and Shawe-Taylor, 2000; Suykens and Vandewalle, 1999). Recently, the support vector machine (SVM) has proved to be an established and powerful tool for solving complex problems encountered in many disciplines (Baylar et al., 2009; Byvatov et al., 2003; Scholkopf and Smola, 2002; Vapnik, 1995).

This research employs a least-squares modification of the SVM approach, called the Least Squares Support Vector Machine (LSSVM), in an effort to alleviate the shortcomings of conventional well log interpretation methods and previously applied algorithmic approaches. Our main focus is the determination of lithology from the data recorded during a wire-line logging operation in one of the Iranian oil wells in the Ahwaz oilfield. In this study, the caliper log (CALI), sonic log (DT), deep induction resistivity log (ILD), neutron log (NPHI), density log (RHOB), and gamma ray log (CGR) were identified as lithology indicators. All raw data obtained from wire-line logging are initially corrected for environmental effects owing to borehole size, mud salinity, etc.; these corrections are rendered indispensable prior to any interpretation of the well log data.

The caliper log is a tool for measuring the diameter and shape of the wellbore. Caliper logs can be used as crude lithological indicators: shale, bentonite, and coals tend to cave into the wellbore, producing an increased wellbore diameter, whereas no borehole deviations are observed in sandstones and carbonates since they do not tend to cave into the wellbore (Evenick, 2008).

In sonic logs, the speed of sound transmitted through the formation is recorded in microseconds per foot (μs/ft). These logs are good indicators of lithology and density, since the transmission rate depends strongly on the medium the sound is passing through. Deep induction resistivity logs record the resistance of a formation, in ohm meters (Ωm), to the flow of electricity far away from the invasion zone produced by the drilling mud. Most rocks are insulators, while most formation fluids are electrical conductors; high resistivity is recorded when the formation contains hydrocarbons (Akinyokun et al., 2009; Evenick, 2008). A neutron log normally measures a formation's porosity based upon the quantity of hydrogen present in the formation. It is mainly used in lithology identification, porosity evaluation, and differentiation between liquids and gases due to their dissimilar hydrogen contents (Akinyokun et al., 2009; Evenick, 2008). The density log measures the porosity of a formation, based on the assumed density of the formation and drilling fluid, in grams per cubic centimeter (g/cm³). It can also be employed in differentiating between gases and liquids through cross-plotting the overestimated porosity values (from density logs) against the underestimated porosity values (from neutron logs) (Akinyokun et al., 2009).

Gamma ray logs are indicators of the radioactivity of the formation: shale-free sandstones and carbonates yield low gamma ray values, whereas shales usually exhibit high gamma ray readings if they contain adequate amounts of accessory minerals bearing isotopes of potassium, uranium, and/or thorium (Hsieh et al., 2005).

This article is organized in the following sections. In Section 2, the acquisition of data and the assembled database are explained in detail. In Section 3, the details and equations behind the intelligent model are provided, along with some discussion of the advantages and disadvantages of methods based on machine learning theory. In Section 4, the results obtained from the LSSVM model are compared with real well log data and the accuracy of the model is fully described. Finally, a statistical method is applied for determination of possible outliers and for investigating the validity of the employed dataset.
Data Acquisition

Borehole geophysical data were obtained from an oil well in the Iranian Ahwaz oilfield. Some of the well log data were selected as indicators of lithology: for each data point, these are the caliper log (CALI), sonic log (DT), deep induction resistivity log (ILD), neutron log (NPHI), density log (RHOB), and gamma ray log (CGR). These readings were then connected to the photoelectric index (PEF), a supplementary measurement that records the absorption of low-energy gamma rays by the formation in units of barns per electron. The logged values are directly proportional to the aggregate atomic number of the elements in the formation; PEF is therefore a sensitive indicator of mineralogy and has to be predicted with high accuracy. Figure 1 indicates the different values of PEF in different formation lithologies. A total of 3243 log readings were assembled into a dataset comprising 7 inputs (depth and the lithology indicator logs) and 1 output (PEF values). The overall range of the recorded data, along with averages and standard deviations, is summarized in Table I.
TABLE I. RANGES OF INPUT/OUTPUT VARIABLES USED FOR DEVELOPING AND TESTING THE MODEL

Parameter              Minimum    Maximum     Average     Standard deviation
Depth (m)              2575.712   3075.889    2827.878    124.5312
CALI (in)              8.1504     22.2763     9.345049    0.659798
DT (μs/ft)             53.1954    113.1356    77.09043    9.722123
ILD (Ωm)               0.1975     1705.562    12.79944    15.99413
NPHI (p.u.)            0.041645   0.494965    0.199554    0.047319
RHOB (g/cm³)           1.4736     2.8639      2.420654    0.158964
CGR (API)              0.0139     111.2971    30.33772    19.87745
PEF (barn/electron)    1.8121     6.635       3.096314    0.845851
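Readers reproducing this table can regenerate the summary statistics directly from the assembled dataset. The following is a minimal pandas sketch; the file name and column labels are our assumptions, not artifacts provided with the paper:

```python
import pandas as pd

# Hypothetical export of the well log dataset described above.
logs = pd.read_csv("ahwaz_well_logs.csv")  # assumed columns: DEPTH, CALI, DT, ILD, NPHI, RHOB, CGR, PEF

features = ["DEPTH", "CALI", "DT", "ILD", "NPHI", "RHOB", "CGR"]  # 7 inputs
target = "PEF"                                                    # 1 output

# Reproduce the Table I summary (min, max, mean, std) for all 8 variables.
summary = logs[features + [target]].agg(["min", "max", "mean", "std"]).T
print(summary)
```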
Details of the Intelligent Model

Support Vector Machine (SVM)

The concept of SVM was initially introduced by Vapnik (1995) as a supervised learning algorithm for solving several classification and function approximation problems (Moser and Serpico, 2009; Suykens, 2001). SVM has a number of distinct advantages compared with traditional learning methods based on ANNs (Byvatov et al., 2003; Cristianini and Shawe-Taylor, 2000; Suykens and Vandewalle, 1999): 1) in contrast to ANNs, the need for determining the topology of the network is eliminated in SVM, as it is established automatically during the learning process; 2) the possibility of over-fitting or under-fitting is minimized in the SVM paradigm by incorporating a structural risk minimization (SRM) strategy; 3) in SVM, only a limited number of parameters need to be adjusted during the learning process, compared with the large number of adjustable weight factors in ANN models.

Assume a training set S = {(x_1, y_1), ..., (x_n, y_n)}, where x_i represents the input patterns (Depth, CALI, DT, ILD, NPHI, RHOB, and CGR), y_i denotes the output data (PEF in this study), and n is the total number of recorded data points. SVM employs a nonlinear mapping procedure to map the input parameters into a higher dimensional, or even infinite dimensional, feasible space (Cristianini and Shawe-Taylor, 2000; Suykens and Vandewalle, 1999). Thus, the main aim of SVM is to locate an optimum hyperplane from which all experimental data have a minimum distance. Assuming that the data samples are linearly separable, the form of the decision function employed by SVM is represented as follows (Cristianini and Shawe-Taylor, 2000; Suykens and Vandewalle, 1999):
FIGURE 1 MEASUREMENTS OF PHOTOELECTRIC INDEX (PEF) FOR DIFFERENT UNDERGROUND LITHOLOGIES
\[ f(x) = w^{t} g(x) + b \tag{1} \]

where g(x) is the mapping function, w and b are the weight vector and bias term, respectively, and superscript t denotes the transpose of the weight matrix. Under the assumption that the data from the two classes are separable, the decision function is subjected to the following conditions:

\[ f(x_i) \geq +1 \ \text{if}\ y_i = +1; \qquad f(x_i) \leq -1 \ \text{if}\ y_i = -1 \tag{2} \]

Support vectors (SVs) are selected from the pool of training data which satisfy these constraints (Cristianini and Shawe-Taylor, 2000; Suykens and Vandewalle, 1999). If the problem is linearly separable in the feature space, there will be an unlimited number of decision functions satisfying Equation (2). Hence, the optimal separating plane can be determined by maximizing the margin and minimizing the noise through the slack formulation introduced below (Cristianini and Shawe-Taylor, 2000; Suykens and Vandewalle, 1999):

\[ \min \left( \frac{1}{2} \left\| w \right\|^{2} + C \sum_{i=1}^{n} \zeta_{i} \right) \tag{3} \]

where C is a positive constant governing the trade-off between maximum margin and minimum classification error, and \zeta_i is the slack variable representing the distance between data points in the false class and the margin of their virtual class.

Taking into consideration the equations presented earlier, we have a typical convex optimization problem that can be solved using the Lagrange multipliers method given below (Baylar et al., 2009; Cristianini and Shawe-Taylor, 2000; Suykens and Vandewalle, 1999):

\[ g(w, b, \zeta, \alpha, \beta) = \frac{1}{2} w^{t} w + \frac{C}{2} \sum_{i=1}^{n} \zeta_{i} - \sum_{i=1}^{n} \alpha_{i} \left[ y_{i} \left( w^{t} x_{i} + b \right) - 1 + \zeta_{i} \right] - \sum_{i=1}^{n} \beta_{i} \zeta_{i} \tag{4} \]

where \alpha and \beta are the Lagrange multipliers. The solution is defined through the saddle point of the Lagrangian when the value of \alpha_i is greater than zero (Cristianini and Shawe-Taylor, 2000; Suykens and Vandewalle, 1999). Owing to the specific formalism of the SVM algorithm, sparse solutions can be found for both linear and non-linear regression problems (Cristianini and Shawe-Taylor, 2000; Suykens and Vandewalle, 1999).

Least Squares Support Vector Machine (LSSVM)

Regardless of the outstanding performance of SVM in solving static function approximation problems, it carries a high computational burden owing to the required constrained optimization programming (Haifeng and Dejin, 2005). Thus, the application of SVM to large scale function approximation problems with a wide range of experimental data is limited by the time and memory consumed during optimization (Haifeng and Dejin, 2005). In an effort to reduce the complexity of SVM and to enhance its speed of convergence, Suykens and Vandewalle (1999) proposed a modified version of SVM, called the Least Squares Support Vector Machine (LSSVM), in which equality constraints are used instead of the inequality constraints employed in traditional SVM (Haifeng and Dejin, 2005; Suykens and Vandewalle, 1999). Although LSSVM benefits from the same advantages as SVM, its optimum solution can be obtained by solving a set of linear equations (linear programming) rather than a quadratic program (Gharagheizi et al., 2011; Suykens and Vandewalle, 1999). In general, the following objective function is implemented to train the LSSVM algorithm (Suykens and Vandewalle, 1999):

\[ Q = \frac{1}{2} w^{t} w + \gamma \sum_{i=1}^{n} e_{i}^{2} \tag{5} \]

subject to the following linear constraints:

\[ y_{i} = w^{t} \varphi(x_{i}) + b + e_{i}, \qquad i = 1, 2, \ldots, n \tag{6} \]

In Equations (5) and (6), e_i represents the regression error for the n-point data set, and \gamma denotes the relative weight of the summation of regression errors compared with the regression weight. The regression weight coefficient w can be written in terms of the Lagrange multipliers \alpha_i and input vectors x_i as represented below (Farasat et al., 2013; Fazavi et al., 2013; Rafiee-Taghanaki et al., 2013; Shokrollahi et al., 2013):

\[ w = \sum_{i=1}^{n} \alpha_{i} x_{i}, \qquad \alpha_{i} = 2 \gamma e_{i} \tag{7} \]

Considering the assumption that a linear regression exists between the dependent and independent parameters of the LSSVM algorithm, Equation (1) can be reformulated as (Farasat et al., 2013; Fazavi et al., 2013; Rafiee-Taghanaki et al., 2013; Shokrollahi et al., 2013):
\[ y = \sum_{i=1}^{n} \alpha_{i} x_{i}^{t} x + b \tag{8} \]

Thus, after some mathematical manipulation, the Lagrange multipliers in Equation (8) can be determined from the following relationship (Farasat et al., 2013; Fazavi et al., 2013; Rafiee-Taghanaki et al., 2013; Shokrollahi et al., 2013):

\[ \alpha_{i} = \frac{y_{i} - b}{x_{i}^{t} x + (2\gamma)^{-1}} \tag{9} \]

The linear regression equation developed earlier can be converted to nonlinear form by employing a kernel function as follows (Farasat et al., 2013; Fazavi et al., 2013; Rafiee-Taghanaki et al., 2013; Shokrollahi et al., 2013):

\[ f(x) = \sum_{i=1}^{n} \alpha_{i} K(x, x_{i}) + b \tag{10} \]

where K(x, x_i) is the kernel function, obtained from the inner product of the vectors \varphi(x) and \varphi(x_i) in the feasible space, as represented below (Farasat et al., 2013; Fazavi et al., 2013; Rafiee-Taghanaki et al., 2013; Shokrollahi et al., 2013):

\[ K(x, x_{i}) = \varphi(x_{i})^{t} \cdot \varphi(x) \tag{11} \]

The kernel function implemented in this study is the radial basis function (RBF), which is one of the most powerful kernel functions commonly employed in this field (Farasat et al., 2013; Rafiee-Taghanaki et al., 2013; Shokrollahi et al., 2013):

\[ K(x, x_{i}) = \exp\left( - \left\| x_{i} - x \right\|^{2} / \sigma^{2} \right) \tag{12} \]

where \sigma^2 is the squared bandwidth, which is optimized by an external optimization technique during the training process. The mean squared error (MSE) between the real PEF values and those predicted by the LSSVM algorithm is defined as (Farasat et al., 2013; Rafiee-Taghanaki et al., 2013; Shokrollahi et al., 2013):

\[ \mathrm{MSE} = \frac{\sum_{i=1}^{N} \left( \mathrm{PEF}_{\mathrm{pred},i} - \mathrm{PEF}_{\mathrm{real},i} \right)^{2}}{N} \tag{13} \]

where PEF represents the photoelectric index values, N is the number of training objects, and the subscripts pred and real denote the predicted and real PEF values, respectively. The LSSVM algorithm employed in this study to train the well log data has been developed by Pelckmans et al. (2002) and Suykens and Vandewalle (1999).
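Because the LSSVM training problem reduces to a single linear system, the method above is straightforward to prototype. The following numpy sketch illustrates Equations (5)-(12), assuming the inputs have already been scaled; it is a minimal illustration, not the LS-SVMlab code of Pelckmans et al. (2002) used by the authors, and all function names are ours:

```python
import numpy as np

def rbf_kernel(A, B, sigma2):
    # RBF kernel of Eq. (12): K(x, x_i) = exp(-||x_i - x||^2 / sigma^2)
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / sigma2)

def lssvm_fit(X, y, gamma, sigma2):
    """Solve the LSSVM KKT system implied by Eqs. (5)-(7):
        [ 0   1^T             ] [ b     ]   [ 0 ]
        [ 1   K + I/(2*gamma) ] [ alpha ] = [ y ]
    (alpha_i = 2*gamma*e_i places the I/(2*gamma) ridge on the kernel matrix)."""
    n = len(y)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = rbf_kernel(X, X, sigma2) + np.eye(n) / (2.0 * gamma)
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[1:], sol[0]  # alpha, b

def lssvm_predict(X_new, X_train, alpha, b, sigma2):
    # Eq. (10): f(x) = sum_i alpha_i * K(x, x_i) + b
    return rbf_kernel(X_new, X_train, sigma2) @ alpha + b
```

For the 3243-point dataset used here, the system is only 3244 by 3244, so a dense solve is entirely practical.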
In order to enhance model performance during the learning process, the Coupled Simulated Annealing (CSA) algorithm was employed to optimize the two model parameters controlling its accuracy and convergence, namely \gamma and \sigma^2.

Coupled Simulated Annealing

Simulated Annealing (SA) is a population based search method usually applied to combinatorial optimization problems. The method was initially proposed by Metropolis et al. (1953) and was popularized by Kirkpatrick et al. (1983). The motivation behind this method lies in the physical process of annealing, during which a metal is heated to a liquid state and then cooled slowly enough that all crystal grains eventually reach the lowest minimum inner energy. Like the metal cooling process, SA gradually converges to the optimum solution, which promotes attainment of the global optimum and evades local optimality (Fabian, 1997).

This study employs the Coupled Simulated Annealing (CSA) proposed by Xavier-de-Souza et al. (2010) in an effort to enhance the quality of the optimization process. The concept of CSA was inspired by Coupled Local Minimizers (CLM), in which multiple gradient descent optimizers are used instead of multi-start gradient descent. CSA describes a set of individual SA processes coupled by a term in the acceptance probability function, the aim being faster and more robust convergence. The coupling is a function of the current costs of all the individual SA processes (Xavier-de-Souza et al., 2010). Information between the individual SA processes is shared through both the coupling term and the acceptance probability function, allowing a general optimization indicator to be controlled using the optimization control parameters (Xavier-de-Souza et al., 2010). While the acceptance probability of an uphill move in traditional SA is often given by the Metropolis rule (Metropolis et al., 1953), which depends merely on the current and probing solutions, CSA considers the other current solutions as well: the acceptance probability also depends on the costs of the solutions in the current state set \Theta, where \Theta is drawn from the set of all possible solutions, and the coupling term \gamma is generally a function of the costs of all solutions in \Theta. The acceptance probability function in CSA, A_\Theta, is represented as follows:

\[ A_{\Theta}(\gamma, x_{i} \rightarrow y_{i}) = \frac{\exp\left( \left( E(x_{i}) - \max_{x_{i} \in \Theta} E(x_{i}) \right) / T_{k}^{a} \right)}{\gamma} \tag{14} \]

where T_k^a is the acceptance temperature, and x_i and y_i represent an individual solution in \Theta and its corresponding probing solution, respectively. The coupling term, \gamma, is given as:

\[ \gamma = \sum_{x_{i} \in \Theta} \exp\left( \frac{E(x_{i}) - \max_{x_{i} \in \Theta} E(x_{i})}{T_{k}^{a}} \right) \tag{15} \]

This study proposes a CSA-based approach for parameter optimization and feature selection in LSSVM, termed CSA-LSSVM. A typical flowchart of the CSA-LSSVM algorithm is shown in Figure 2: the well log dataset is read and divided into train, validation, and test subsets; candidate model features (\gamma and \sigma^2) are selected and used to construct a PEF prediction model; model accuracy is evaluated against the stopping criteria; and, once the optimum features are obtained, the LSSVM is re-trained with them to give the final CSA-LSSVM model. The objective function of CSA-LSSVM when searching for the optimum model parameters is to minimize the Mean Squared Error (MSE) given in Equation (13).

FIGURE 2 A TYPICAL FLOWCHART REPRESENTING THE CSA-LSSVM ALGORITHM

In the next step, the assembled well log data were divided into three subsets, namely train, validation, and test. The "Train" set is employed to generate the model structure; the "Validation" set is applied for adjusting the model parameters and for checking the validity of the patterns learned by CSA-LSSVM over the whole range of the dataset; and the "Test" set is used to investigate the final performance and validity of the proposed model on unseen data. To increase model applicability and robustness, the whole database was divided randomly into fractions of 70%, 15%, and 15% for the "Train" set (2270 data points), the "Validation" set (486 data points), and the "Test" set (487 data points), respectively.

The RBF kernel function was implemented in this study owing to its superior performance compared with other kernel types such as linear or polynomial kernels. The CSA algorithm was then applied for tuning the LSSVM parameters during the learning process. The optimum values found for these parameters at the end of the optimization process were \gamma = 284.8173 and \sigma^2 = 0.9916.
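To make Equations (14) and (15) concrete, the sketch below couples a handful of SA processes searching over (log10 \gamma, log10 \sigma^2) and scores every candidate by the validation-set MSE of Equation (13), reusing the lssvm_fit/lssvm_predict sketch above. The cooling schedule, step size, search bounds, and simplified acceptance rule are our assumptions, not the exact settings of Xavier-de-Souza et al. (2010) or of the paper:

```python
import numpy as np

rng = np.random.default_rng(42)

def csa_tune(cost, n_probes=5, n_iter=200, T0=1.0, bounds=(-2.0, 4.0)):
    """Coupled Simulated Annealing over x = (log10(gamma), log10(sigma^2))."""
    lo, hi = bounds
    states = rng.uniform(lo, hi, size=(n_probes, 2))    # one state per coupled SA process
    energies = np.array([cost(s) for s in states])
    best_x, best_e = states[energies.argmin()].copy(), energies.min()
    for k in range(n_iter):
        T = T0 / (1.0 + k)                               # simple cooling schedule (assumption)
        probes = np.clip(states + rng.normal(scale=0.3, size=states.shape), lo, hi)
        probe_e = np.array([cost(p) for p in probes])
        e_max = energies.max()
        coupling = np.exp((energies - e_max) / T).sum()  # coupling term gamma, Eq. (15)
        for i in range(n_probes):
            accept = np.exp((energies[i] - e_max) / T) / coupling  # Eq. (14)
            if probe_e[i] < energies[i] or rng.random() < accept:
                states[i], energies[i] = probes[i], probe_e[i]
        if energies.min() < best_e:
            best_x, best_e = states[energies.argmin()].copy(), energies.min()
    return best_x, best_e

def validation_mse(params, X_tr, y_tr, X_val, y_val):
    # Eq. (13) on the validation split drives the search.
    gamma, sigma2 = 10.0 ** params
    alpha, b = lssvm_fit(X_tr, y_tr, gamma, sigma2)
    resid = lssvm_predict(X_val, X_tr, alpha, b, sigma2) - y_val
    return float((resid ** 2).mean())

# Usage: best, _ = csa_tune(lambda p: validation_mse(p, X_tr, y_tr, X_val, y_val))
```

Sharing the coupling term across all probes means a process stuck at a high cost accepts uphill moves readily while the best processes become conservative, which is the behavior CSA exploits to escape local optima.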
Results and Discussion

Model Accuracy and Validation

In this research, the CSA-LSSVM algorithm was implemented to obtain PEF as a function of several other measurements recorded during the well logging operation. PEF can be used as a general indicator of the lithologies and mineralogical complexities of the different formation layers. In this study, PEF was linked to parameters generally known as lithology indicators:

\[ \mathrm{PEF} = f(\mathrm{Depth}, \mathrm{CALI}, \mathrm{DT}, \mathrm{ILD}, \mathrm{NPHI}, \mathrm{RHOB}, \mathrm{CGR}) \tag{16} \]

TABLE II. STATISTICAL PARAMETERS OF THE PROPOSED CSA-LSSVM MODEL

Statistical parameter                      Train set   Validation set   Test set   Total
R²                                         0.995       0.987            0.985      0.993
Average absolute relative deviation (%)    1.3         2.2              2.2        1.6
Standard deviation error                   0.84        0.82             0.86       0.84
Root mean square error                     0.07        0.11             0.12       0.08
N                                          2270        486              487        3243
FIGURE 3 GRAPHICAL REPRESENTATION OF PEF VALUES PREDICTED BY CSA-LSSVM ALGORITHM VERSUS REAL PEF VALUES (TRAIN, VALIDATION, AND TEST SETS AGAINST THE 45° LINE; R² = 0.993)

FIGURE 4 COMPARISON BETWEEN CSA-LSSVM MODEL PREDICTIONS AND REAL DATA FOR TRAIN DATASET

FIGURE 5 COMPARISON BETWEEN CSA-LSSVM MODEL PREDICTIONS AND REAL DATA FOR VALIDATION DATASET

FIGURE 6 COMPARISON BETWEEN CSA-LSSVM MODEL PREDICTIONS AND REAL DATA FOR TEST DATASET

FIGURE 7 HISTOGRAM OF ERROR FREQUENCY SKETCHED FOR ALL DATA INCLUDING TRAIN, VALIDATION, AND TEST SETS
Some statistical parameters indicating the accuracy and validity of the proposed model are outlined in Table II. A total correlation coefficient (R²) of 0.993, an Average Absolute Relative Deviation (AARD) of 1.6%, a Standard Deviation Error (STD) of 0.84, and a Root Mean Squared Error (RMSE) of 0.08 strongly confirm the accuracy and validity of the CSA-LSSVM model in predicting PEF values from well log data. The regression plot of the real PEF values against those predicted by the CSA-LSSVM model is shown in Figure 3 for the Train, Validation, and Test data sets. The high concentration of data around the 45° line indicates good agreement between the model predictions and the real PEF values. Deviations of the real PEF values from those predicted by the CSA-LSSVM model are also shown in Figures 4-6 for the Train, Validation, and Test sets, respectively; the model predictions and the real values approximately overlap, suggesting small deviations and high accordance. The frequency of errors between the model predictions and the real PEF data is plotted in Figure 7. This figure indicates a normal error distribution, which is a measure of the robustness and accuracy of the developed LSSVM model. The reported statistical parameters are defined as:

\[ R^{2} = 1 - \frac{\sum_{i}^{N} \left( \mathrm{pred}(i) - \mathrm{exp}(i) \right)^{2}}{\sum_{i}^{N} \left( \mathrm{pred}(i) - \mathrm{average}(\mathrm{exp}(i)) \right)^{2}} \tag{a} \]

\[ \%\mathrm{AARD} = \frac{100}{N} \sum_{i}^{N} \frac{\left| \mathrm{pred}(i) - \mathrm{exp}(i) \right|}{\mathrm{exp}(i)} \tag{b} \]

\[ \mathrm{STD} = \frac{\sum_{i}^{N} \left( \mathrm{error}(i) - \mathrm{average}(\mathrm{error}(i)) \right)^{2}}{N} \tag{c} \]
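The Table II quantities follow directly from definitions (a)-(c) together with the usual RMSE. The short numpy sketch below keeps the denominator of (a) exactly as reconstructed above; the function names are ours:

```python
import numpy as np

def r_squared(pred, real):
    # Eq. (a); denominator centered on average(exp), as written above
    return 1.0 - ((pred - real) ** 2).sum() / ((pred - real.mean()) ** 2).sum()

def aard_percent(pred, real):
    # Eq. (b): average absolute relative deviation, in percent
    return 100.0 / len(real) * (np.abs(pred - real) / real).sum()

def std_error(pred, real):
    # Eq. (c): spread of the prediction errors around their mean
    err = pred - real
    return ((err - err.mean()) ** 2).sum() / len(err)

def rmse(pred, real):
    return float(np.sqrt(((pred - real) ** 2).mean()))
```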
Outlier Detection in PEF Measurements

To develop a valid and widely applicable model for predicting PEF values from well log measurements, the recorded data must be reliable and accurate. However, perfectly accurate measurement of well log data is rarely feasible, and environmental interferences may in some cases introduce flawed measurements into the recorded database. Such observations usually differ from the bulk of the data and are considered a menace to successful lithology prediction. Thus, constructing an accurate and reliable model is highly dependent upon detecting these values in the well logging data.

To this end, the Leverage value statistical method was applied by sketching the Williams plot (Eslamimanesh et al., 2013; Goodall, 1993; Gramatica, 2007; Mohammadi et al., 2012). This plot represents the correlation existing between the Hat indices and the standardized cross-validated residuals. A warning leverage (H*) is typically defined as 3(n+1)/m, where m denotes the total number of data points and n represents the number of input parameters. A residual value of 3 is generally considered the cut-off for accepting measurements within ±3 standard deviations from the mean (represented as two green lines) (Eslamimanesh et al., 2013; Goodall, 1993; Gramatica, 2007; Mohammadi et al., 2012). The existence of the majority of data points in the range 0 ≤ H ≤ H* and −3 ≤ R ≤ 3 reveals the high applicability and reliability of the developed model. Based on these values, suspected data may be categorized into two types, namely leverage points and regression outliers. Leverage points are further subdivided into good leverage points and bad leverage points. Good leverage points are those data points located in the range H* ≤ H and −3 ≤ R ≤ 3; although these measurements possess high leverage values, they do not necessarily affect the correlation coefficient, and they remain close to the line around which most of the data are centered. Bad leverage points are those measurements in the range R > 3 or R < −3.
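The diagnostics above reduce to the diagonal of the Hat matrix of the input data. A minimal sketch of the Williams-plot quantities follows, assuming X is the m-by-n matrix of model inputs; the standardized residuals here are a simplified stand-in for the cross-validated residuals used in the cited works:

```python
import numpy as np

def williams_stats(X, pred, real):
    """Leverages and standardized residuals for a Williams plot."""
    m, n = X.shape
    # Leverages: diagonal of H = X (X^T X)^{-1} X^T, without forming H explicitly.
    h = np.einsum("ij,ji->i", X, np.linalg.solve(X.T @ X, X.T))
    h_star = 3.0 * (n + 1) / m                    # warning leverage H*
    err = pred - real
    R = (err - err.mean()) / err.std()            # standardized residuals (simplified)
    inside = (h <= h_star) & (np.abs(R) <= 3.0)   # applicability domain of the model
    good_leverage = (h > h_star) & (np.abs(R) <= 3.0)
    outlier = np.abs(R) > 3.0
    return h, h_star, R, inside, good_leverage, outlier
```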