Stochastic Hydrology


GEO4-4420

Stochastic Hydrology

Prof. dr. Marc F.P. Bierkens Department of Physical Geography Utrecht University

 


 

Contents

1. Introduction
2. Descriptive statistics
3. Probability and random variables
4. Hydrological statistics and extremes
5. Random functions
6. Time series analysis (by Martin Knotters, Alterra)
7. Geostatistics
8. Forward stochastic modelling
9. State estimation and data-assimilation (handouts)
References


 

Chapter 1: Introduction

1.1 Why stochastic hydrology?

The term "stochastic" derives from the Greek word στοχαστικός, which translates as "a person who forecasts a future event in the sense of aiming at the truth", that is, a seer or soothsayer. So, "stochastic" refers to predicting the future. In the modern sense, "stochastic" in stochastic methods refers to the random element incorporated in these methods. Stochastic methods thus aim at predicting the value of some variable at non-observed times or at non-observed locations, while also stating how uncertain we are when making these predictions.

But why should we care so much about the uncertainty associated with our predictions? The following example (Figure 1.1) shows a time series of observed water table elevations in a piezometer and the outcome of a groundwater model at this location. Also plotted are the differences (residuals) between the data and the model results. We can observe two features. First, the model time series seems to vary more smoothly than the observations. Secondly, there are noisy differences between model results and observations. These differences, which are called residuals, have among others the following causes:

• observation errors. It is rarely possible to observe a hydrological variable without error. Often, external factors influence an observation, such as temperature and air pressure variations during observation of water levels;
• errors in boundary conditions, initial conditions and input. Hydrological models only describe part of reality, for example groundwater flow in a limited region. At the boundaries of the model, values of the hydrological variables (such as groundwater heads or fluxes) have to be prescribed. These boundary values cannot be observed everywhere, so there is likely to be some error involved. Also, if a model describes the variation of a hydrological system in time, then the hydrological variables at time step zero must be known, as they determine how the system will evolve in later time steps. Again, the initial values of all the hydrological variables at all locations are not exactly known and are estimated with error. Finally, hydrological models are driven by inputs such as rainfall and evaporation. Observing rainfall and evaporation for larger areas is very cumbersome and will usually be done with considerable error;
• unknown heterogeneity and parameters. Properties of the land surface and subsurface are highly heterogeneous. Parameters of hydrological systems such as surface roughness, hydraulic conductivity and vegetation properties are therefore highly variable in space and often also in time. Even if we were able to observe these parameters without error, we cannot possibly measure them everywhere. In many hydrological models parameters are assumed homogeneous, i.e. represented by a single value for the entire (or part of the) model region. Even if models take account of the heterogeneity of parameters, this heterogeneity is usually represented by some interpolated map from a few locations where the parameters have been observed. Obviously, these imperfect representations of parameters lead to errors in model results;


• scale discrepancy. Many hydrological models consist of numerical approximations of solutions to partial differential equations using either finite element or finite difference methods. Output of these models can at best be interpreted as average values for elements or model blocks. The outputs thus ignore the within-element or within-block variation of hydrological variables. So, when compared to observations that represent averages for much smaller volumes (virtually points), there is a discrepancy in scale that will yield differences between observations and model outcomes;
• model or system errors. All models are simplified versions of reality. They cannot contain all the intricate mechanisms and interactions that operate in natural systems. For instance, saturated groundwater flow is described by Darcy's Law, while in reality it is not valid in case of strongly varying velocities, in areas of partly non-laminar flow (e.g. faults) or in areas of very low permeability and high concentrations of solvents. Another example is when a surface water model uses a kinematic wave approximation of surface water flow, while in reality subtle slope gradients in surface water levels dominate the flow. In such cases, the physics of reality differ from that of the model. This will cause an additional error in model results.

In conclusion, apart from the observation errors, the discrepancies between observations and model outcomes are caused by various error sources in our modelling process.

Figure 1.1 Observed water table depths and water table depths predicted with a groundwater model at the same location. Also shown are the residuals: the differences between model outcome and observations. (Horizontal axis: day number, day 1 is January 1, 1985; vertical axis: water table depth below surface and residuals, both in cm.)

There are two distinct ways of dealing with errors in hydrological model outcomes:


Deterministic hydrology. In deterministic hydrology one is usually aware of these errors. They are taken into account, often in a primitive way, during calibration of models. During this phase of the modelling process one tries to find the parameter values of the model (e.g. surface roughness or hydraulic conductivity) such that the magnitude of the residuals is minimized. After calibration of the model, the errors are not explicitly taken into account while performing further calculations with the model. Errors in model outcomes are thus ignored.

Stochastic hydrology. Stochastic hydrology not only tries to use models for predicting hydrological variables, but also tries to quantify the errors in model outcomes. Of course, in practice we do not know the exact values of the errors of our model predictions; if we knew them, we could correct our model outcomes for them and be totally accurate. What we often do know, usually from the few measurements that we did take, is some probability distribution of the errors. We will define the probability distribution more precisely in the next chapters. Here it suffices to know that a probability distribution tells one how likely it is that an error has a certain value.

To make this difference more clear, Figure 1.2 is shown. Consider some hydrological variable z, say soil moisture content, whose value is calculated (at some location and at some time) by an unsaturated zone model. The model output is denoted as z̆. We then consider the error e = z̆ − z. Because we do not know it exactly, we consider it as a so-called random variable (chapter 3) E (notice the use of capitals for random variables) whose exact value we do not know but of which we do know the probability distribution. So in case of deterministic hydrology, modelling efforts would only yield z̆ (upper figure of Figure 1.2a), while stochastic hydrology would yield both z̆ and the probability distribution of the (random) error E (lower figure of Figure 1.2a).

Figure 1.2 Stochastic hydrology is about combining deterministic model outcomes with a probability distribution of the errors (Figure 1.2a), or alternatively, considering the hydrological variable as random and determining its probability distribution and some "best prediction" (Figure 1.2b).


Most of the methods used in stochastic hydrology do not consider errors in model outcomes explicitly. Instead it is assumed that the hydrological variable z itself is a random variable Z. This means that we consider the hydrological variable (e.g. soil moisture) as one for which we cannot know the exact value, but for which we can calculate the probability distribution (see Figure 1.2b). The probability distribution of Figure 1.2b thus tells us that although we do not know the soil moisture content exactly, we do know that it is more likely to be around 0.3 than around 0.2 or 0.5. Models that provide probability distributions of target variables instead of single values are called stochastic models. Based on the probability distribution it is usually possible to obtain a so-called best prediction ẑ, which is the one for which the errors are smallest on average. Incidentally, the value of the best prediction does not have to be the same as the deterministic model outcome z̆.

Box 1. Stochastic models and physics
A widespread misconception about deterministic and stochastic models is that the former use physical laws (such as mass and momentum conservation), while the latter are largely empirical and based entirely on data-analysis. This of course is not true. Deterministic models can be either physically based (e.g. a model based on Saint-Venant equations for

flood routing) or empirical (e.g. a rating curve used as a deterministic model for predicting sediment loads from water levels). Conversely, any physically based model becomes a stochastic model once its inputs, parameters or outputs are treated as random.

There are a number of clear advantages in taking the uncertainty in model results into account, i.e. using stochastic instead of deterministic models.

• The example of Figure 1.1 shows that model outcomes often give a much smoother picture of reality. This is because models are often based on an idealized representation of reality with simple processes and homogeneous parameters. However, reality is usually messy and rugged. This may be a problem when interest is focussed on extreme values: deterministic models typically underestimate the probability of occurrence of extremes, which is rather unfortunate when predicting for instance river stages for dam building. Stochastic models can be used with a technique called "stochastic simulation" (see chapters hereafter) which is able to produce images of reality that are rugged enough to get the extreme statistics right.
• As stated above, the value of the best prediction ẑ does not have to be the same as the deterministic model outcome z̆. This is particularly the case when the relation between model input (e.g. rainfall, evaporation) or model parameters (e.g. hydraulic conductivity, Manning coefficient) and model output is non-linear (this is the case in almost all hydrological models) and our deterministic assessment of model inputs and parameters is not error free (also almost always the case). In this case, stochastic models are able to provide the best prediction using the probability distribution of model outcomes, while deterministic models cannot and are therefore less accurate.
• If we look closely at the residuals in Figure 1.1 it can be seen that they are correlated in time: a positive residual is more likely to be followed by another positive residual and vice versa. This correlation, if significant, means that there is still some information


present in the residual time series. This information can be used to improve model predictions between observation times, for instance by using time series modelling (chapter 6) or geostatistics (chapter 7). This will yield better predictions than the deterministic model alone. Also, it turns out that if the residuals are correlated, calibration of deterministic models (which assume no correlation between residuals) yields less accurate or even biased (with systematic errors) calibration results when compared with stochastic models that do take account of the correlation of residuals (te Stroet, 1995).
• By explicitly accounting for the uncertainty in our prediction we may in fact be able to make better decisions. A classical example is remediation of polluted soil, where stochastic methods can be used to estimate the probability distribution of pollutant concentration at some non-visited location. Given a critical threshold above which regulation states that remediation is necessary, it is possible to calculate the probability of a false positive decision (we decide to remediate, while in reality the concentration is below the threshold) and that of a false negative (we decide not to remediate while in reality the concentration is above the threshold). Given these probabilities and the associated costs (of remediation and health risk) it is then possible for each location to decide whether to remediate such that the total costs and health risk are minimised.
• There are abundant stochastic methods where a relation is established between the uncertainty in model outcomes and the number of observations in time and space used to either parameterize or calibrate the model. If such a relation exists, it can be used for monitoring network design. For example, in groundwater exploration wells are drilled to perform pumping tests for the estimation of transmissivities and to observe hydraulic heads. The transmissivity observations can be used to make an initial map of transmissivity used in the groundwater model. This initial map can subsequently be updated by calibrating the groundwater model to head observations in the wells. Certain stochastic methods are able to quantify the uncertainty in groundwater head predicted by the model in relation to the number of wells drilled, their location and how often they have been observed. These stochastic methods can therefore be used to perform monitoring network optimization: finding the optimal well locations and observation times to minimise uncertainty in model predictions.
• The last reason why stochastic methods are advantageous over deterministic methods is related to the previous one. Stochastic methods enable us to relate the uncertainty in model outcomes to different sources of uncertainty (errors) in input variables, parameters and boundary conditions. Therefore, using stochastic analysis we also know which (error) source contributes the most to the uncertainty in model outcomes, which source comes second, etc. If our resources are limited, stochastic hydrology thus can guide us where to spend our money (how many observations for which variable or parameter) to achieve maximum uncertainty reduction at minimum cost. An excellent book on this view on uncertainty is written by Heuvelink (1998).

1.2 Scope and content of these lecture notes

These notes aim at presenting an overview of the field of stochastic hydrology at an introductory level. This means that a wide range of topics and methods will be treated,


while each topic and method is only treated at a basic level. So, the book is meant as an introduction to the field while showing its breadth, rather than providing an in-depth treatise. References are given to more advanced texts and papers for each subject. The book thus aims at teaching the basics to hydrologists who are seeking to apply stochastic methods. It can be used for a one-semester course at third-year undergraduate or first-year graduate level. The lecture notes treat basic topics that should be the core of any course on stochastic hydrology. These topics are: descriptive statistics; probability and random variables; hydrological statistics and extremes; random functions; time series analysis; geostatistics; forward stochastic modelling; state prediction and data-assimilation. A number of more advanced topics that could constitute enough material for a second course are not treated. These are, among others: sampling and monitoring; inverse estimation; ordinary stochastic differential equations; point processes; upscaling and downscaling methods; uncertainty and decision making. During the course these advanced topics will be briefly introduced during the lectures. Students are required to study one of these topics from exemplary papers and write a research proposal about it.

1.3 Some useful definitions for the following chapters

1.3.1 Description of a model according to system’s theory

Many methods in stochastic hydrology are best understood by looking at a hydrological model from the viewpoint of system’s theory. What follows here is how a model is defined in system’s theory, as well as definitions for state variables, input variables,  parameters and constants.

Figure 1.3 Model and model properties according to system’s theory: input variables, state variables, output variables, parameters and constants within the model boundary.

Figure 1.3 shows a schematic representation of a model as used in system’s theory. A model is a simplified representation of part of reality. The model boundary separates the part of reality described by the model from the rest of reality. Everything there is to know about the part of reality described by the model at a certain time is contained in the state variables. These are variables because their values can change both in space and time. The variation of the state variables is caused by the variation of one or more input variables. Input variables are always observed and originate from outside the model


boundary. Consequently, input variables also include boundary conditions and initial conditions such as used when solving differential equations. If the state variables are known, one or more output variables can be calculated. An output variable traverses the model boundary and thus influences the part of reality not described by the model. Both input variables and output variables can change in space and time. The state variables are related to the input variables and output variables through parameters. Parameters may change in space but are invariant in time. Because they are constant in time, parameters represent the intrinsic properties of the model. Finally, a model may have one or more constants. Constants are properties of a model that do not change in both space and time (within the confines of the model). Examples of such constants are the gravity constant and the viscosity of water in density-independent groundwater flow at a constant temperature.

Figure 1.4 Illustration of model properties following system’s theory with a model of a catchment; v(t): state variable, storage of surface water in the catchment [L³]; q(t): output variable, surface runoff from the catchment [L³T⁻¹]; p(t): input variable, precipitation [LT⁻¹]; k: parameter, reservoir constant [T⁻¹]; r: parameter, infiltration capacity [LT⁻¹]; A: constant, area of the catchment [L²].

Because the description above is rather abstract, we will try to illustrate it with the example shown in Figure 1.4. We consider a model describing the discharge from surface runoff q [L³T⁻¹] from a catchment caused by the average precipitation p [LT⁻¹], observed as averages over discrete time steps ∆t, i.e. q(t) and p(t) represent the average discharge and precipitation between t−∆t and t. The model boundary is formed by geographical boundaries such as the catchment boundary (i.e. the divide) on the sides, the catchment’s surface below and a few meters above the catchment’s surface above, and also by the virtual boundary with everything that is not described by the model, such as groundwater flow, soil moisture, chemical transport etc. Obviously, precipitation is the input variable and surface runoff the output variable. The state variable of this model is the amount of water stored on the catchment’s surface: v [L³]. The state variable is modelled with the following water balance equation:

v(t) = v(t-1) + \{A [p(t) - r]^{+} - q(t)\}\Delta t    (1.1)

where r [LT⁻¹] is the infiltration capacity. The superscript + is added to [p(t) − r] to denote that if p(t) < r we have [p(t) − r]⁺ = 0. The output variable q is related to the state variable v at the previous time step with the following equation:

q(t) = k v(t-1)    (1.2)

Through substitution of (1.2) into (1.1) we can calculate the development in time of the state variable directly from the input variable as:

v(t) = [1 - k\Delta t]\, v(t-1) + A [p(t) - r]^{+} \Delta t    (1.3)

Two model parameters can be distinguished: the infiltration capacity of the soil r [LT⁻¹], which relates the input variable to the state variable, and the catchment parameter k [T⁻¹], relating the output variable to the state variable. The constant A [L²] is the area of the catchment.
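To make the example concrete, here is a minimal runnable sketch of Equations (1.1)-(1.3). The time step, parameter values and the synthetic rainfall generator are hypothetical illustration choices, not values from these notes.

```python
import numpy as np

# Minimal sketch of the linear reservoir model of Eqs. (1.1)-(1.3).
# All parameter values below are hypothetical illustration values.
dt = 1.0     # time step [d]
k = 0.2      # catchment (reservoir) parameter [1/d]; note k*dt < 1 in Eq. (1.3)
r = 2.0e-3   # infiltration capacity [m/d]
A = 1.0e6    # catchment area [m^2]

rng = np.random.default_rng(42)
p = rng.exponential(scale=3.0e-3, size=365)  # synthetic daily precipitation [m/d]

v = np.zeros(p.size + 1)  # storage v(t) [m^3]
q = np.zeros(p.size + 1)  # discharge q(t) [m^3/d]
for t in range(1, p.size + 1):
    q[t] = k * v[t - 1]                                          # Eq. (1.2)
    v[t] = v[t - 1] + (A * max(p[t - 1] - r, 0.0) - q[t]) * dt   # Eq. (1.1)

print(f"mean discharge: {q.mean():.0f} m^3/d")
```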

1.3.2 Notation

The concepts of random variables and random functions will be explained in detail in the following chapters. However, it is useful to briefly define the notation conventions at the beginning. Readers can thus refer back to this subsection while studying the rest of this book.

Constants are denoted in roman, e.g. the constant g for gravity acceleration, or A for the area. Variables and parameters are denoted in italics: e.g. h for hydraulic head and k for hydraulic conductivity. The distinction between deterministic and random (stochastic) variables is made by denoting the latter as capital italics. So, h stands for the deterministic groundwater head (assumed completely known) and H for groundwater head as a random variable. Vectors and matrices are given in bold face notation. Vectors are denoted as lower case, e.g. h, a vector of groundwater heads at the nodes of a finite difference model, while matrices are denoted as capitals, such as K for a tensor with conductivities in various directions. Unfortunately, it is difficult to make a distinction between stochastic and deterministic vectors and matrices. Therefore, if not clear from the context, it will be indicated explicitly in the text whether a vector or matrix is stochastic or not.

Spatial co-ordinates (x, y, z) are denoted with the space vector x, while t is reserved for time. Discrete points in space and time are denoted as x_i and t_k respectively. Random functions of space, time and space-time are thus denoted as (example with H): H(x), H(t), H(x,t).


Outcomes from a deterministic model are denoted as (example with h): h̆. Optimal estimates of deterministic parameters, constants or variables are denoted with a hat (example with k): k̂, while optimal predictions of realisations of random variables are also denoted with a hat (example with K): K̂. Note that the term estimate is reserved for deterministic variables and prediction for random (stochastic) variables.

To denote a spatial or temporal or spatio-temporal average of a function an overbar is used, e.g. h̄ if hydraulic head is deterministic and H̄ if it is stochastic. So, the combination of hat and overbar, H̄̂(x), stands for the prediction of the spatial average of the random function H(x).


 

Chapter 2: Descriptive statistics

In this chapter and further on in this book we make use of a synthetic but extremely illustrative data set (the Walker lake data set) that has been constructed by Journel and Deutsch (1998).¹ The data set is used to show how some simple statistics can be calculated.

2.1 Univariate statistics

Let us assume that we have made 140 observations of some hydrological variable z (e.g. hydraulic conductivity in m/d). Figure 2.1 shows a plot of the sample locations with the grey scale of the dots according to the value of the observation.

Figure 2.1 Samples of hydraulic conductivity z

To obtain insight into our dataset it is good practice to make a histogram. To this end we divide the range of values found into a number (say m) of classes z_1-z_2, z_2-z_3, z_3-z_4, …, z_{m-1}-z_m and count the number of data values falling into each class. The number of observations falling into a class divided by the total number of observations is called the (relative) frequency. Figure 2.2 shows the histogram or frequency distribution of the z data.

¹ All of the larger numerical examples shown in this chapter are based on the Walker lake data set. The geostatistical analyses and the plots are performed using the GSLIB geostatistical software of Deutsch and Journel (1998).


From the histogram we can see how the observations are distributed over the range of values. For instance, we can see that approximately 33% of our data have a value of hydraulic conductivity between 0 and 1 m/d.

Figure 2.2 Histogram or frequency distribution of hydraulic conductivity z

Another way of representing the distribution of data values is by using the cumulative frequency distribution. Here we first sort the data in ascending order. Next, data are given a rank number i, i = 1,..,n, with n the total number of observations (in our case 140). After that, the data values are plotted against the rank number divided by the total number of observations plus one: i/(n+1). Figure 2.3 shows the cumulative frequency distribution of the hydraulic conductivity data.


Figure 2.3 Cumulative frequency distribution of hydraulic conductivity

The cumulative frequency distribution shows us the percentage of data with values smaller than a given threshold. For instance, from Figure 2.3 we see that 64% of the observations have a value smaller than 5 m/d. Note that if the 140 samples were taken in such a way that they are representative of the area (e.g. by random sampling), then the cumulative frequency distribution provides an estimate of the fraction of the research area with values smaller than or equal to a certain value. This may for instance be relevant when mapping pollution. The cumulative frequency distribution then immediately provides an estimate of the fraction of a terrain with concentrations above critical thresholds, i.e. the fraction that should be remediated. To make a continuous curve, the values between the data points have been linearly interpolated. Figure 2.4 shows the relation between the histogram and the cumulative frequency distribution. It shows that once the cumulative frequency distribution function is constructed from the data (5 data values for this simple example) it can be used to construct a histogram by "differentiation".
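Both representations are easy to compute. The following sketch applies them to a synthetic stand-in data vector (not the Walker lake data):

```python
import numpy as np

# Sketch of the histogram and cumulative frequency distribution of Section 2.1.
rng = np.random.default_rng(1)
z = rng.lognormal(mean=1.0, sigma=0.8, size=140)   # 140 synthetic "observations"

# histogram: counts per class divided by the total number of observations
counts, edges = np.histogram(z, bins=10)
rel_freq = counts / z.size

# cumulative frequency: rank the data in ascending order, plot against i/(n+1)
z_sorted = np.sort(z)
cum_freq = np.arange(1, z.size + 1) / (z.size + 1)

print(rel_freq)
print(z_sorted[:3], cum_freq[:3])
```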


Values: 10 7 9 8 15; rank i: 4 1 3 2 5 (n = 5)

Figure 2.4 The relation between the cumulative frequency distribution (left) and the derived histogram (right)

To describe the form of a frequency distribution a number of measures are usually calculated.

Mean
The mean m is the average value of the data and is a measure of locality, i.e. the centre of mass of the histogram. With n the number of data and z_i the value of the ith observation we have:

m_z = \frac{1}{n} \sum_{i=1}^{n} z_i    (2.1)

Variance
The variance s_z² is a measure of the spread of the data and is calculated as:

s_z^2 = \frac{1}{n} \sum_{i=1}^{n} (z_i - m_z)^2 = \frac{1}{n} \sum_{i=1}^{n} z_i^2 - m_z^2    (2.2)

The larger the variance, the wider the frequency distribution. For instance, in Figure 2.5 two histograms are shown with the same mean value but with different variances.


Figure 2.5 Two histograms of datasets with the same mean value but with different variances (left: large variance; right: small variance)

Standard deviation
The standard deviation is also a measure of spread and has the advantage that it has the same units as the original variable. It is calculated as the square root of the variance:

s_z = \sqrt{s_z^2} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (z_i - m_z)^2}    (2.3)

Coefficient of variation
To obtain a measure of spread that is relative to the magnitude of the variable considered, the coefficient of variation is often used:

CV_z = \frac{s_z}{m_z}    (2.4)

Note that this measure only makes sense for variables with strictly positive values (e.g. hydraulic conductivity, soil moisture content, discharge).

Skewness
The skewness of the frequency distribution tells us whether it is symmetrical around its central value or whether it is asymmetrical, with a longer tail to the left (CS_z < 0) or to the right (CS_z > 0):

CS_z = \frac{\frac{1}{n} \sum_{i=1}^{n} (z_i - m_z)^3}{s_z^3}    (2.5)

Figure 2.6 shows two histograms with the same variance, where one is negatively and one is positively skewed.


Figure 2.6 Two frequency distributions with the same variances but with different coefficients of skewness (left: skewness < 0; right: skewness > 0)

Curtosis
The curtosis measures the "peakedness" of the frequency distribution (see Figure 2.7) and is calculated from the data as:

CC_z = \frac{\frac{1}{n} \sum_{i=1}^{n} (z_i - m_z)^4}{s_z^4} - 3    (2.6)
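The measures of Equations (2.1)-(2.6) are easily computed in a few lines. The sketch below uses the 1/n convention of the text and, as a worked example, the five data values of Figure 2.4:

```python
import numpy as np

# Sketch of Eqs. (2.1)-(2.6); note the 1/n convention used in the text.
def descriptive_stats(z):
    z = np.asarray(z, dtype=float)
    n = z.size
    m = z.sum() / n                                 # mean, Eq. (2.1)
    s2 = ((z - m) ** 2).sum() / n                   # variance, Eq. (2.2)
    s = np.sqrt(s2)                                 # standard deviation, Eq. (2.3)
    cv = s / m                                      # coefficient of variation, Eq. (2.4)
    cs = ((z - m) ** 3).sum() / (n * s ** 3)        # skewness, Eq. (2.5)
    cc = ((z - m) ** 4).sum() / (n * s ** 4) - 3.0  # curtosis, Eq. (2.6)
    return m, s2, s, cv, cs, cc

# the five values used in Figure 2.4
print(descriptive_stats([10.0, 7.0, 9.0, 8.0, 15.0]))
```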

Figure 2.7 Frequency distributions with positive and negative curtosis

Figure 2.9 shows some additional measures of locality and spread for the cumulative frequency distribution function.


Figure 2.9 Some additional measures of locality and spread based on the cumulative distribution function: percentiles, the quartiles Q1 (25-percentile), Q2 (50-percentile, the median) and Q3 (75-percentile), and the interquartile range Q3-Q1.

The f-percentile (or f/100-quantile) of a frequency distribution is the value that is larger than or equal to f percent of the data values. The 50-percentile (or 0.5-quantile) is also called the median. It is often used as an alternative measure of locality to the mean in case the frequency distribution is positively skewed. The mean is not a very robust measure in that case, as it is very sensitive to the largest (or smallest) values in the dataset. The 25-percentile, 50-percentile and 75-percentile are denoted as the first, second and third quartiles of the frequency distribution: Q1, Q2, Q3 respectively. The interquartile range Q3-Q1 is an alternative measure of spread to the variance that is preferably used in case of skewed distributions. The reason is that the variance, like the mean, is very sensitive to the largest (or smallest) values in the dataset. An efficient way of displaying locality and spread statistics of a frequency distribution is making a box-and-whisker plot. Figure 2.10 shows an example. The width of the box provides the interquartile range, its sides the first and third quartile. The line in the middle represents the median and the cross the mean. The whiskers' lengths are equal to the minimum and the maximum value (circles) as long as these extremes are within 1.5 times the interquartile range (e.g. lower whisker in Figure 2.10); otherwise the whisker is set equal to 1.5 times the interquartile range (e.g. upper whisker in Figure 2.10). Observations lying outside 1.5 times the interquartile range are often identified as outliers. Box-and-whisker plots are a convenient way of viewing statistical properties,


especially when comparing multiple groups or classes (see Figure 2.11 for an example of observations of hydraulic conductivity for various texture classes).

Figure 2.10 Components of a box-and-whisker plot: the box spans the first quartile Q1 to the third quartile Q3, the line in the box marks the median, the cross marks the mean, and the whiskers extend to the minimum and maximum values (circles), or at most to 1.5 times the interquartile range.

Figure 2.11 Box-and-whisker plots are a convenient way to compare the statistical properties of multiple groups or classes (from Bierkens, 1996)
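As a small worked example, the percentile-based measures described above can be computed as follows, again for the five values of Figure 2.4 (np.percentile's default interpolation is one of several possible conventions):

```python
import numpy as np

# Quartiles, median and interquartile range for the Figure 2.4 example data.
z = np.array([10.0, 7.0, 9.0, 8.0, 15.0])
q1, q2, q3 = np.percentile(z, [25, 50, 75])
iqr = q3 - q1
print(f"Q1={q1}, median={q2}, Q3={q3}, IQR={iqr}")
```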

2.2 Bivariate statistics

Up to now we have considered statistical properties of a single variable: univariate statistical properties. In this section statistics of two variables are considered, i.e. bivariate statistics. In case we are dealing with two variables measured simultaneously at a single location or at a single time, additional statistics can be obtained that measure the degree of co-variation of the two data sets, i.e. the degree to which high values of one variable are related with high (or low) values of the other variable.

Covariance
The covariance measures linear co-variation of two datasets of variables z and y. It is calculated from the data as:


C_{zy} = \frac{1}{n} \sum_{i=1}^{n} (z_i - m_z)(y_i - m_y) = \frac{1}{n} \sum_{i=1}^{n} z_i y_i - m_z m_y    (2.7)

Correlation coefficient
The covariance depends on the actual values of the variables. The correlation coefficient provides a measure of linear co-variation that is normalized with respect to the magnitudes of the variables z and y:

r_{zy} = \frac{C_{zy}}{s_z s_y} = \frac{\frac{1}{n}\sum_{i=1}^{n}(z_i - m_z)(y_i - m_y)}{\sqrt{\frac{1}{n}\sum_{i=1}^{n}(z_i - m_z)^2}\,\sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - m_y)^2}}    (2.8)

A convenient way of calculating the correlation coefficient is as follows:

r_{zy} = \frac{n\sum_{i=1}^{n} z_i y_i - \sum_{i=1}^{n} z_i \sum_{i=1}^{n} y_i}{\sqrt{n\sum_{i=1}^{n} z_i^2 - \left(\sum_{i=1}^{n} z_i\right)^2}\,\sqrt{n\sum_{i=1}^{n} y_i^2 - \left(\sum_{i=1}^{n} y_i\right)^2}}    (2.9)

So, one calculates Σz_i, Σy_i, Σz_i², Σy_i² and Σz_i y_i and evaluates (2.9). Figure 2.12 shows a so-called scatterplot between the z-values observed at the 140 locations of Figure 2.1 and the y-values also observed there (e.g. z could for instance be hydraulic conductivity and y sand fraction in %). The correlation coefficient between the z- and y-values equals 0.57. Figure 2.13 shows examples of various degrees of correlation between two variables, including negative correlation (large values of one exist together with small values of the other). Beware that the correlation coefficient only measures the degree of linear covariation (i.e. linear dependence) between two variables. This can also be seen in Figure 2.13 (lower right figure), where obviously there is strong dependence between z and y, although the correlation coefficient is zero.
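A minimal sketch of Equations (2.7) and (2.9); the two synthetic input vectors are hypothetical stand-ins for the z- and y-data:

```python
import numpy as np

# Sketch of Eqs. (2.7) and (2.9) for two data vectors z and y.
def covariance_correlation(z, y):
    z, y = np.asarray(z, float), np.asarray(y, float)
    n = z.size
    c_zy = (z * y).sum() / n - z.mean() * y.mean()    # covariance, Eq. (2.7)
    # correlation via the sums-of-products form, Eq. (2.9)
    num = n * (z * y).sum() - z.sum() * y.sum()
    den = (np.sqrt(n * (z ** 2).sum() - z.sum() ** 2)
           * np.sqrt(n * (y ** 2).sum() - y.sum() ** 2))
    return c_zy, num / den

rng = np.random.default_rng(2)
z = rng.normal(size=140)
y = 0.6 * z + 0.8 * rng.normal(size=140)   # synthetic, positively correlated data
print(covariance_correlation(z, y))
```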


Figure 2.12 Scatter plot of z- and y-data showing covariation. The correlation coefficient equals 0.57

Figure 2.13 Scatter plots illustrating various degrees of correlation between two variables; panel labels give the correlation coefficient (e.g. ρ_{YZ} = 1)

Table 5.1 A number of possible covariance models for wide sense stationary random functions (h = |x₂ - x₁|)

Gaussian covariance:
C_z(h) = \sigma_Z^2 e^{-(h^2/a^2)},  h \ge 0, a > 0

Spherical covariance:
C_z(h) = \sigma_Z^2 \left[1 - \frac{3}{2}\frac{h}{a} + \frac{1}{2}\left(\frac{h}{a}\right)^3\right] if h < a; 0 if h \ge a,  h \ge 0, a > 0

Hole effect (wave) model:
C_z(h) = \sigma_Z^2 \, b\,\frac{\sin(h/a)}{h},  h \ge 0, a > 0

White noise model*:
\rho_z(h) = 1 if h = 0; 0 if h > 0

* The white noise process has infinite variance, so strictly speaking it is not wide sense stationary. Here, we thus only provide the correlation function, which does exist.

5.3.4 Relations between various forms of stationarity

A strict sense stationary random function is also second order stationary and also wide sense stationary, but not necessarily the other way around. However, if a random function is wide sense stationary and its multivariate pdf is a Gaussian distribution (Equation 3.87), it is also second order stationary³ and also a strict sense stationary random function. More importantly, a wide sense stationary random function that is multivariate Gaussian (and thus also strict sense stationary) is completely characterised by only a few statistics: a constant mean µ_Z(t) = µ_Z and a covariance function C_Z(t₂ - t₁) that is only dependent on the separation distance. So, to recapitulate (an arrow means "implies"):

In general:
Type of stationarity: strict sense ⇒ second order ⇒ wide sense
Property: multivariate pdf translation invariant ⇒ bivariate pdf translation invariant ⇒ mean and variance translation invariant

³ Often in the literature the term "second order stationary" is used when in fact one means "wide sense stationary".


If the multivariate pdf is Gaussian:
Type of stationarity: wide sense ⇒ second order ⇒ strict sense
Property: mean and variance translation invariant ⇒ bivariate pdf translation invariant ⇒ multivariate pdf translation invariant

5.3.5 Intrinsic random functions

An even milder form of stationary random functions are intrinsic random functions. For an intrinsic random function we require (we show the spatial form here):

E[Z(\mathbf{x}_2) - Z(\mathbf{x}_1)] = 0 \quad \forall\, \mathbf{x}_1, \mathbf{x}_2    (5.15)

E[Z(\mathbf{x}_2) - Z(\mathbf{x}_1)]^2 = 2\gamma_Z(\mathbf{x}_2 - \mathbf{x}_1) \quad \forall\, \mathbf{x}_1, \mathbf{x}_2    (5.16)

So the mean is constant and the expected quadratic difference is only a function of the lag vector h = x₂ - x₁. The function γ_Z(x₂ - x₁) is called the semivariogram and is defined as:

\gamma_Z(\mathbf{x}_1, \mathbf{x}_2) = \gamma_Z(\mathbf{x}_2 - \mathbf{x}_1) = \tfrac{1}{2} E[Z(\mathbf{x}_2) - Z(\mathbf{x}_1)]^2    (5.17)

The semivariogram can be estimated from observations as (similarly in the temporal domain):

\hat{\gamma}_Z(\mathbf{h}) = \frac{1}{2n(\mathbf{h})} \sum_{i=1}^{n(\mathbf{h})} \{ z(\mathbf{x}_i) - z(\mathbf{x}_i + \mathbf{h} \pm \Delta\mathbf{h}) \}^2    (5.18)
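A minimal sketch of the estimator (5.18) for data along a one-dimensional transect, grouping pairs into lag classes of width dh (the h ± ∆h tolerance of the text); the transect data are synthetic stand-ins:

```python
import numpy as np

# Experimental semivariogram, Eq. (5.18), for data z at 1D positions x.
def semivariogram(x, z, dh, n_lags):
    x, z = np.asarray(x, float), np.asarray(z, float)
    gamma = np.zeros(n_lags)
    count = np.zeros(n_lags, dtype=int)
    for i in range(x.size):
        for j in range(i + 1, x.size):
            k = int(abs(x[j] - x[i]) // dh)      # lag class of this pair
            if k < n_lags:
                gamma[k] += 0.5 * (z[i] - z[j]) ** 2
                count[k] += 1
    return gamma / np.maximum(count, 1), count   # average per lag class

rng = np.random.default_rng(3)
x = np.sort(rng.uniform(0.0, 100.0, size=60))
z = np.sin(x / 10.0) + rng.normal(0.0, 0.2, size=60)   # correlated synthetic field
print(semivariogram(x, z, dh=5.0, n_lags=10)[0])
```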



Table 5.2 shows examples of continuous semivariogram models that can be fitted to estimated semivariograms.

Table 5.2 A number of possible semivariance models for intrinsic random functions

Exponential model:
\gamma_z(h) = c[1 - e^{-(h/a)}],  h \ge 0; a, c > 0

Gaussian model:
\gamma_z(h) = c[1 - e^{-(h^2/a^2)}],  h \ge 0; a, c > 0

Spherical model:
\gamma_z(h) = c\left[\frac{3}{2}\frac{h}{a} - \frac{1}{2}\left(\frac{h}{a}\right)^3\right] if h < a; c if h \ge a;  h \ge 0; a, c > 0

Hole effect (wave) model:
\gamma_z(h) = c\left[1 - b\,\frac{\sin(h/a)}{h}\right],  h \ge 0; a, c > 0

Pure nugget model:
\gamma_z(h) = 0 if h = 0; c if h > 0;  c > 0

Power model:
\gamma_Z(h) = a h^b,  h \ge 0; a > 0; 0 \le b \le 2
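For fitting purposes, the models of Table 5.2 can be coded directly. A sketch of two of them, with c the sill and a the range parameter:

```python
import numpy as np

# Two semivariogram models from Table 5.2, usable when fitting to an
# experimental semivariogram (c: sill, a: range parameter).
def gamma_exponential(h, c, a):
    return c * (1.0 - np.exp(-np.asarray(h, float) / a))

def gamma_spherical(h, c, a):
    h = np.asarray(h, float)
    g = c * (1.5 * h / a - 0.5 * (h / a) ** 3)
    return np.where(h < a, g, c)   # constant sill c beyond the range

h = np.linspace(0.0, 50.0, 6)
print(gamma_exponential(h, c=1.0, a=10.0))
print(gamma_spherical(h, c=1.0, a=30.0))
```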

 

The semivariogram and the covariance function of a wide sense stationary random function are related as follows:

\gamma_Z(\mathbf{x}_2 - \mathbf{x}_1) = \sigma_Z^2 - C_Z(\mathbf{x}_2 - \mathbf{x}_1)    (5.19)

This means that the semivariogram and the covariance function are mirror images with c = σ_Z². This can be seen in Figure 5.7. This also means that where the covariance function becomes zero for large enough separation distances, the semivariogram will reach a plateau (called the sill of the semivariogram) that is equal to the variance. The distance at which this occurs (called the range of the semivariogram) is the distance beyond which values of the random function are no longer correlated. The first five models of Table 5.2 are semivariogram models that imply wide sense stationary functions with c = σ_Z². For the sixth model, the power model, this is not the case. Here, the variance does not have to be finite, while the semivariance keeps on growing with increasing lag. This shows that if a random function is wide sense stationary, it is also intrinsic. However, an intrinsic random function does not have to be wide sense stationary, i.e. if the semivariogram does not reach a sill.

Figure 5.7 Covariance function and semivariogram for a wide sense stationary random function: the semivariogram γ_Z(h) reaches its sill (equal to σ_Z²) at the range, where the covariance function C_Z(h) drops to zero (h = |x₂ - x₁|).

5.3.6 Integral scale and scale of fluctuation

The integral scale or correlation scale is a measure of the degree of correlation for stationary random processes and is defined as the area under the correlation function:

I_{Z(t)} = \int_0^{\infty} \rho(\tau)\, d\tau    (5.20)

For the exponential, Gaussian and spherical correlation functions the integral scales are equal to a, a√π/2 and (3/8)a respectively. Given that the correlation functions of wide sense stationary processes are even functions, i.e. ρ(τ) = ρ(-τ), another measure of correlation is the scale of fluctuation, defined as:

 



\theta = \int_{-\infty}^{\infty} \rho(\tau)\, d\tau = 2 I_{Z(t)}    (5.21)

For a 2D and 3D random space function the integral scale is defined as:

I_{Z(\mathbf{x})} = \left[\frac{4}{\pi} \int_0^{\infty}\!\!\int_0^{\infty} \rho(h_1, h_2)\, dh_1 dh_2\right]^{1/2} ; \quad I_{Z(\mathbf{x})} = \left[\frac{6}{\pi} \int_0^{\infty}\!\!\int_0^{\infty}\!\!\int_0^{\infty} \rho(h_1, h_2, h_3)\, dh_1 dh_2 dh_3\right]^{1/3}    (5.22)

5.4 Conditional random functions

In this section we investigate what happens if observations are made of a random function. Suppose that we have a stationary random function in time that is observed at a number of locations. Suppose for the moment that these observations are without error. Figure 5.8 shows a number of realisations of a continuous-time random function that is observed at four locations without error. It can be seen that the realisations are free to vary and differ between the observation points, but are constrained, i.e. conditioned, by these points. This can be seen when comparing the pdfs at two locations t₁ and t₂. It can be seen that uncertainty is larger further from an observation (t₁) than close to an observation (t₂). This is intuitively correct because an observation is able to reduce uncertainty for a limited interval proportional to the integral scale of the random function. At a distance larger than the range, the random values are no longer correlated with the random variable at the observation location, and the uncertainty is as large (the variance of the pdf is as large) as that of the random function without observations.

Figure 5.8 Realisations of a random function that is conditional to a number of observations; the dashed line is the conditional mean.


The random function that is observed at a number of locations and/or times is called a conditional random function, and the probability distributions at locations t₁ and t₂ conditional probability density functions (cpdfs): f_Z(z; t₁ | y₁,..,y_m), f_Z(z; t₂ | y₁,..,y_m), where y₁,…,y_m are the observations. The complete conditional random function is defined by the conditional multivariate pdf: f(z₁,z₂,….z_N; t₁,t₂,….t_N | y₁,..,y_m). The conditional multivariate pdf can in theory be derived from the (unconditional) multivariate pdf using Bayes' rule. However, this is usually very cumbersome. An alternative way of obtaining all the required statistics of the conditional random function is called stochastic simulation. In chapters 7 and 8 some methods are presented for simulating realisations of both unconditional and conditional random functions. The conditional distribution of Z(s₁,t), or its mean value (see the dashed line in Figure 5.8) and variance, can also be obtained directly through geostatistical prediction or kriging (chapter 7) and state-space prediction methods (chapter 8). These methods use the observations and the statistics (e.g. semivariogram or covariance function) of the random function (statistics estimated from the observations) to directly estimate the conditional distribution or its mean and variance.

5.5 Spectral representation of random functions

The correlation function of the time series of water table depths at De Bilt in Figure 5.6 has only been analysed for two years. Had we analysed a longer period, we would have seen a correlation function with periodic behaviour, such as the hole effect model of Tables 5.1 and 5.2. Figure 5.9 shows the correlation function of the daily observations of discharge of the Rhine river at Lobith. A clear periodic behaviour is observed as well. The periodic behaviour, which is also apparent in the time series (see Figure 4.1), is caused by the fact that evaporation, which is driven by radiation and temperature, has a clear seasonal character in higher and lower latitudes and temperate climates. In arctic climates the temperature cycle and associated snow accumulation and melt cause seasonality, while in the sub-tropics and semi-arid climates the occurrence of rainfall is strongly seasonal. In conclusion, most hydrological time series show a seasonal variation. This means that analysing these series with stationary random functions requires that this seasonality is removed (see for instance chapter 6). The occurrence of seasonality has also inspired the use of spectral methods in stochastic modelling, although it must be stressed that spectral methods are also very suitable for analysing stationary random functions.


Figure 5.9 Correlation function of daily averaged discharge of the river Rhine at Lobith (correlation versus lag in days).

5.5.1 Spectral density function

We will therefore start with a spectral representation of a stationary random function Z(t). Such a representation means that the random function is expressed as the sum of its mean µ_Z and 2K sinusoids with increasing frequencies, where each frequency has a random amplitude C_k and random phase angle Φ_k:

Z(t) = \mu_Z + \sum_{k=-K}^{K} C_k \cos(\omega_k t + \Phi_k)    (5.23)

with C_k = C_{-k}, \Phi_k = \Phi_{-k} and \omega_k = \pm\tfrac{1}{2}(2k-1)\Delta\omega.
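A sketch of how a realisation can be generated from representation (5.23). One common variant fixes the amplitudes at 2√(S(ω_k)∆ω) and randomises only the phases; the exponential spectral density used here is the pair listed in Table 5.3 below, and all parameter values are illustrative assumptions:

```python
import numpy as np

# Realisation of a stationary series via a sum of random-phase harmonics,
# in the spirit of Eq. (5.23); amplitudes are fixed at 2*sqrt(S(w_k)*dw).
rng = np.random.default_rng(5)
a, var = 5.0, 1.0   # parameters of the exponential covariance (Table 5.3)

def S(w):           # exponential spectral density, Table 5.3
    return a * var / (np.pi * (1.0 + (a * w) ** 2))

K, dw = 2000, 0.005
w = (np.arange(1, K + 1) - 0.5) * dw           # w_k = (2k-1)dw/2, positive half
phi = rng.uniform(0.0, 2.0 * np.pi, size=K)    # random phase angles
t = np.arange(0.0, 100.0, 0.5)

# factor 2 accounts for the mirrored negative frequencies -w_k
Z = (2.0 * np.sqrt(S(w) * dw) * np.cos(np.outer(t, w) + phi)).sum(axis=1)
print(Z.var())   # should be close to var = 1
```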

Based on this representation it can be shown (see Vanmarcke, pp. 84-86) that the following relations hold:

C_Z(\tau) = \int_{-\infty}^{\infty} S_Z(\omega) \cos(\omega\tau)\, d\omega    (5.24)

S_Z(\omega) = \frac{1}{2\pi} \int_{-\infty}^{\infty} C_Z(\tau) \cos(\omega\tau)\, d\tau    (5.25)

 

These relations are known as the Wiener-Khinchine relations. The function S_Z(ω) is known as the spectral density function of the random process, and Equations (5.24) and (5.25) thus show that the covariance function is a Fourier transform of the spectrum and vice versa: they form a Fourier pair. The physical meaning of the spectral density function can best be understood by setting the lag τ equal to zero in (5.24). We then obtain:

C_Z(0) = \sigma_Z^2 = \int_{-\infty}^{\infty} S_Z(\omega)\, d\omega    (5.26)

It can be seen that the variance of the random function is equal to the integral over the spectral density. This means that the variance is a weighted sum of variance components, where each component consists of a random harmonic function of a given frequency. The spectral density then represents the weight of each of the contributing random harmonics, i.e. the relative importance of each random harmonic in explaining the total variance of the random function. It is easy to see the analogy with the electromagnetic spectrum, where the total energy of electromagnetic radiation can be attributed to relative contributions from different wavelengths.

In Table 5.3 expressions are given for the spectral density functions belonging to some of the covariance functions given in Table 5.1. Figure 5.10 (from Gelhar, 1993) shows typical realisations of the random functions involved, their correlation functions and the associated spectra. What can be seen from this is that the spectrum of white noise is a horizontal line, implying an infinite variance according to Equation (5.26). This shows that white noise is a mathematical construct, and not a feasible physical process: the area under the spectrum is a measure for the total energy of a process. This area is infinitely large, such that all the energy in the universe would not be sufficient to generate such a process. In practice one often talks about wide band processes, where the spectrum has a wide band of frequencies, but encloses a finite area.

Table 5.3 A number of possible covariance functions and associated spectral density functions (τ = |t₂ - t₁|)

Exponential model:
C_z(\tau) = \sigma_Z^2 e^{-(|\tau|/a)},  a > 0
S_Z(\omega) = \frac{a\sigma_Z^2}{\pi(1 + a^2\omega^2)}

Random harmonic:
Z(t) = a\cos(\omega_0 t + \phi), where a, \omega_0 are deterministic constants and \phi is random
C_Z(\tau) = \frac{a^2}{2}\cos(\omega_0\tau),  \tau \ge 0, a, \omega_0 > 0
S_Z(\omega) = \frac{a^2}{4}\,\delta(\omega - \omega_0)

Hole effect (wave) model:
C_z(\tau) = \sigma_Z^2 (1 - |\tau|/a)\, e^{-(|\tau|/a)},  a > 0
S_Z(\omega) = \frac{a^3\sigma_Z^2\omega^2}{\pi(1 + a^2\omega^2)^2}

White noise model:
\rho_z(\tau) = 1 if \tau = 0; 0 if \tau > 0
S_Z(\omega) = c
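As a numerical check of the first pair in Table 5.3, transforming the exponential covariance function with Equation (5.25) by simple quadrature should reproduce S_Z(ω) = aσ_Z²/(π(1 + a²ω²)); a sketch:

```python
import numpy as np

# Quadrature check of Eq. (5.25) for the exponential model of Table 5.3.
a, var = 5.0, 1.0
tau = np.linspace(-200.0, 200.0, 40001)   # wide lag range, |tau| >> a
dtau = tau[1] - tau[0]
C = var * np.exp(-np.abs(tau) / a)        # exponential covariance function

for w in (0.0, 0.1, 0.5):
    S_num = (C * np.cos(w * tau)).sum() * dtau / (2.0 * np.pi)   # Eq. (5.25)
    S_exact = a * var / (np.pi * (1.0 + (a * w) ** 2))           # Table 5.3
    print(w, round(S_num, 4), round(S_exact, 4))
```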

 

Figure 5.10 Schematic examples of covariance function-spectral density pairs (adapted from Gelhar, 1993).

The spectral density function is an even function (as is the covariance function): S_Z(ω) = S_Z(-ω). This motivates the introduction of the one-sided spectral density function G_Z(ω) = 2S_Z(ω), ω ≥ 0. The Wiener-Khinchine relations then become:

C_Z(\tau) = \int_0^{\infty} G_Z(\omega) \cos(\omega\tau)\, d\omega    (5.27)

G_Z(\omega) = \frac{2}{\pi} \int_0^{\infty} C_Z(\tau) \cos(\omega\tau)\, d\tau    (5.28)

Sometimes it is convenient to work with normalised spectral density functions, obtained by dividing the spectra by the variance: s_Z(ω) = S_Z(ω)/σ_Z² and g_Z(ω) = G_Z(ω)/σ_Z². For instance, from (5.25) we can see that there is a relation between the normalised spectral density function s_Z(ω) and the scale of fluctuation. Setting ω = 0 in Equation (5.25) we obtain:

s_Z(0) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \rho_Z(\tau)\, d\tau = \frac{\theta}{2\pi}    (5.29)

5.5.2 Formal (complex) spectral representation

Often a more formal definition of the spectral density is used in the literature, based on a formulation in terms of complex calculus. Here the random function Z(t) is defined as the real part of a complex random function Z*(t):

Z(t) = \mathrm{Re}\{Z^*(t)\} = \mathrm{Re}\left\{\mu_Z + \sum_{k=-K}^{K} X_k e^{i\omega_k t}\right\} \to \mathrm{Re}\left\{\mu_Z + \int_{-\infty}^{\infty} e^{i\omega t}\, dX(\omega)\right\} \text{ as } K \to \infty    (5.30)

with \omega_k = \tfrac{1}{2}(2k-1)\Delta\omega the frequency and X_k a complex random number representing the amplitude. This equation entails that the complex random process is decomposed into a large number of complex harmonic functions e^{i\omega t} = \cos(\omega t) + i\sin(\omega t) with random complex amplitudes. Given this representation it is possible to derive the Wiener-Khinchine equations as (Vanmarcke, 1983, p. 88):

C_Z(\tau) = \int_{-\infty}^{\infty} S_Z(\omega)\, e^{i\omega\tau}\, d\omega    (5.31)

S_Z(\omega) = \frac{1}{2\pi} \int_{-\infty}^{\infty} C_Z(\tau)\, e^{-i\omega\tau}\, d\tau    (5.32)

It can be shown (Vanmarcke, 1983, p. 94) that Equations (5.31) and (5.32) are mathematically equivalent to Equations (5.24) and (5.25) respectively.

5.5.3  Estimating the spectral density function

For wide sense stationary random functions the spectral density can be estimated from the estimated covariance function as:

\hat{S}_Z(\omega_i) = \frac{1}{2\pi}\left[\lambda_0 \hat{C}_Z(0) + 2\sum_{k=1}^{M} \lambda_k \hat{C}_Z(k\Delta\tau) \cos(\omega_i k)\right]    (5.33)

with \omega_i = i\pi/M, |i| = 1,..,M. The weights \lambda_k are necessary to smooth the covariances before performing the transformation. This way a smoothed spectral density function is obtained, displaying only the relevant features. There are numerous types of smoothing weights. Two frequently used expressions are the Tukey window

\lambda_k = \frac{1}{2}\left[1 + \cos\left(\frac{k\pi}{M}\right)\right], \quad k = 0,1,....,M    (5.34)

and the Parzen window

\lambda_k = \begin{cases} 1 - 6\left(\frac{k}{M}\right)^2 + 6\left(\frac{k}{M}\right)^3 & 0 \le k \le M/2 \\ 2\left(1 - \frac{k}{M}\right)^3 & M/2 \le k \le M \end{cases}    (5.35)

The highest frequency that is analysed is equal to f_max = ω_max/2π = 0.5. This is the highest frequency that can be estimated from a time series, i.e. half of the frequency of the observations. This frequency is called the Nyquist frequency. So if hydraulic head is observed once per day, then the highest frequency that can be detected is one cycle per two days. The smallest frequency (largest wavelength) that can be analysed depends on the discretisation M: f_min = π/(2πM) = 1/(2M), where M is also the cutoff level (maximum lag considered) of the covariance function. The width of the smoothing windows is adjusted accordingly. As an example, the spectrum of the daily discharge data of the Rhine river at Lobith (Figure 4.1; see Figure 5.9 for the correlation function) was estimated using a Parzen window with M = 9000. Figure 5.11 shows the normalised spectral density function so obtained. Clearly, small frequencies dominate, with a small peak between 4 and 5 years. Most prominent of course, as expected, is a peak at a frequency of once a year, which exemplifies the strong seasonality in the time series.
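A compact sketch of the estimator (5.33) with the Tukey window (5.34), applied to a synthetic first-order autoregressive series (a stand-in for real discharge data):

```python
import numpy as np

# Smoothed spectral estimate, Eqs. (5.33)-(5.34): estimate the covariance up
# to lag M, apply a Tukey window, and transform.
def spectral_estimate(z, M):
    z = np.asarray(z, float) - np.mean(z)
    n = z.size
    C_hat = np.array([(z[:n - k] * z[k:]).sum() / n for k in range(M + 1)])
    lam = 0.5 * (1.0 + np.cos(np.pi * np.arange(M + 1) / M))   # Tukey, Eq. (5.34)
    k = np.arange(1, M + 1)
    w = k * np.pi / M                                           # omega_i = i*pi/M
    S = np.array([(lam[0] * C_hat[0]
                   + 2.0 * (lam[1:] * C_hat[1:] * np.cos(wi * k)).sum())
                  / (2.0 * np.pi) for wi in w])                 # Eq. (5.33)
    return w, S

rng = np.random.default_rng(6)
z = np.zeros(5000)
for t in range(1, z.size):          # synthetic first-order autoregressive series
    z[t] = 0.9 * z[t - 1] + rng.normal()

w, S = spectral_estimate(z, M=100)
print(S[:3])   # the spectrum is largest at the lowest frequencies
```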

5.5.4 Spectral representations of random space functions

If we extend the previous to two dimensions, the stationary random function Z(x₁, x₂) can be expressed in terms of random harmonics as:

Z(x_1, x_2) = \mu_Z + \sum_{k_1=-K_1}^{K_1} \sum_{k_2=-K_2}^{K_2} C_{k_1 k_2} \cos(\omega_{k_1} x_1 + \omega_{k_2} x_2 + \Phi_{k_1 k_2})    (5.36)

with C_{k_1 k_2} and \Phi_{k_1 k_2} the random amplitude and phase angle belonging to the frequencies

\omega_{k_i} = \pm\frac{(2k_i - 1)\Delta\omega_i}{2}, \quad i = 1,2    (5.37)

 

Figure 5.11 Normalised spectral density function of daily averaged discharge of the river Rhine at Lobith (frequency f in cycles/day).

The Wiener-Khinchine equations then become:

C_Z(h_1, h_2) = \int_{-\infty}^{\infty}\!\!\int_{-\infty}^{\infty} S_Z(\omega_1, \omega_2) \cos(\omega_1 h_1 + \omega_2 h_2)\, d\omega_1 d\omega_2    (5.38)

S_Z(\omega_1, \omega_2) = \frac{1}{(2\pi)^2} \int_{-\infty}^{\infty}\!\!\int_{-\infty}^{\infty} C_Z(h_1, h_2) \cos(\omega_1 h_1 + \omega_2 h_2)\, dh_1 dh_2    (5.39)

The variance is given by the volume under the 2D spectral density function:

\sigma_Z^2 = \int_{-\infty}^{\infty}\!\!\int_{-\infty}^{\infty} S_Z(\omega_1, \omega_2)\, d\omega_1 d\omega_2    (5.40)

If we use a vector notation we have: \boldsymbol{\omega} = (\omega_1, \omega_2)^T, \mathbf{h} = (h_1, h_2)^T and \omega_1 h_1 + \omega_2 h_2 = \boldsymbol{\omega}\cdot\mathbf{h}. A shorthand way of writing (5.38) and (5.39) results:

C_Z(\mathbf{h}) = \int_{-\infty}^{\infty} S_Z(\boldsymbol{\omega}) \cos(\boldsymbol{\omega}\cdot\mathbf{h})\, d\boldsymbol{\omega}    (5.41)

S_Z(\boldsymbol{\omega}) = \frac{1}{(2\pi)^2} \int_{-\infty}^{\infty} C_Z(\mathbf{h}) \cos(\boldsymbol{\omega}\cdot\mathbf{h})\, d\mathbf{h}    (5.42)

 

These equations are also valid for higher-dimensional random functions, where 1/(2\pi)^D (D the dimension of the process) replaces 1/(2\pi)^2 in (5.42). The more formal definition using complex calculus then gives:

Z(\mathbf{x}) = \mathrm{Re}\left\{\mu_Z + \int_{-\infty}^{\infty} e^{i\boldsymbol{\omega}\cdot\mathbf{x}}\, dX(\boldsymbol{\omega})\right\}    (5.43)

The Wiener-Khinchine equations become:

C_Z(\mathbf{h}) = \int_{-\infty}^{\infty} S_Z(\boldsymbol{\omega})\, e^{i\boldsymbol{\omega}\cdot\mathbf{h}}\, d\boldsymbol{\omega}    (5.44)

S_Z(\boldsymbol{\omega}) = \frac{1}{(2\pi)^D} \int_{-\infty}^{\infty} C_Z(\mathbf{h})\, e^{-i\boldsymbol{\omega}\cdot\mathbf{h}}\, d\mathbf{h}    (5.45)

5.6 Local averaging of stationary random functions

Consider a stationary random function Z(t) and consider the random function Z_T(t) that is obtained by local moving averaging (see Figure 5.12):

Z_T(t) = \frac{1}{T} \int_{t-T/2}^{t+T/2} Z(\tau)\, d\tau    (5.46)

Figure 5.12 Local (moving) averaging of a stationary random function: the averaged process Z_T(t) has the same mean µ_Z but a reduced variance Var[Z_T(t)] = V_Z(T)σ_Z².


Local averaging of a stationary random process will not affect the mean, but it does reduce the variance. The variance of the averaged process can be calculated as (without loss of generality we can set the mean to zero here):

\mathrm{Var}[Z_T(t)] = E\left[\frac{1}{T}\int_{t-T/2}^{t+T/2} Z(\tau_1)\, d\tau_1 \cdot \frac{1}{T}\int_{t-T/2}^{t+T/2} Z(\tau_2)\, d\tau_2\right]

(because Z is stationary)

= E\left[\frac{1}{T}\int_0^T Z(\tau_1)\, d\tau_1 \cdot \frac{1}{T}\int_0^T Z(\tau_2)\, d\tau_2\right] = \frac{1}{T^2}\int_0^T\!\!\int_0^T E[Z(\tau_1) Z(\tau_2)]\, d\tau_1 d\tau_2 = \frac{1}{T^2}\int_0^T\!\!\int_0^T C_Z(\tau_2 - \tau_1)\, d\tau_1 d\tau_2    (5.47)

A new function is introduced that is called the variance function:

V_Z(T) = Var[Z_T(t)] / σ_Z²   (5.48)

The variance function thus describes the reduction in variance when averaging a random function, as a function of the averaging interval T. From (5.47) we can see that the variance function is related to the correlation function as:

V_Z(T) = (1/T²) ∫_0^T ∫_0^T ρ_Z(τ_2 − τ_1) dτ_1 dτ_2   (5.49)

Vanmarcke (1983, p. 117) shows that Equation (5.49) can be simplified to:

V_Z(T) = (2/T) ∫_0^T (1 − τ/T) ρ_Z(τ) dτ   (5.50)

In Table 5.4 a number of correlation functions and their variance functions are given. If we examine the behaviour of the variance function for large T we get (see Vanmarcke, 1983):

lim_{T→∞} V_Z(T) → θ/T   (5.51)

where θ is the scale of fluctuation. In Table 5.4 the scale of fluctuation is also given for the three correlation models. The scale of fluctuation was already introduced earlier as a measure of correlation of the random function and can also be calculated using the correlation function (Equation 5.21) or the spectral density (Equation 5.29). Equation (5.51) thus states that for large averaging intervals the remaining variance is proportional to the scale of fluctuation and inversely proportional to the averaging interval: the larger the scale of fluctuation, the larger T should be to achieve a given variance reduction. In practice, relation (5.51) is already valid for T > 2θ (Vanmarcke, 1983).

Table 5.4 Variance functions and scale of fluctuation for three different correlation functions (τ = |t_2 − t_1|)

Exponential (first order autoregressive process):
ρ_z(τ) = e^{−τ/a},  τ ≥ 0, a > 0
V(T) = 2 (a/T)² (T/a − 1 + e^{−T/a})
θ = 2a

Second order autoregressive process:
ρ_z(τ) = (1 + τ/a) e^{−τ/a},  τ ≥ 0, a > 0
V(T) = (2a/T) [2 + e^{−T/a} − (3a/T)(1 − e^{−T/a})]
θ = 4a

Gaussian correlation function:
ρ_z(τ) = e^{−(τ/a)²},  τ ≥ 0, a > 0
V(T) = (a/T)² [√π (T/a) Erf(T/a) − 1 + e^{−(T/a)²}]
θ = a√π
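To illustrate (5.50) and (5.51) numerically, here is a small Python sketch (added here, not part of the original text) that integrates the variance function for the exponential correlation model; NumPy and SciPy are assumed and the parameter value a = 10 is arbitrary:

import numpy as np
from scipy.integrate import quad

def variance_function(rho, T):
    # V(T) = (2/T) * integral_0^T (1 - tau/T) rho(tau) dtau, Eq. (5.50)
    val, _ = quad(lambda tau: (1.0 - tau/T) * rho(tau), 0.0, T)
    return 2.0 * val / T

a = 10.0                               # exponential model parameter
rho = lambda tau: np.exp(-tau / a)
theta = 2.0 * a                        # scale of fluctuation, Table 5.4
for T in (1.0, 10.0, 50.0, 100.0, 500.0):
    print(T, variance_function(rho, T), theta / T)
# as T grows, V(T) approaches theta/T, in line with Eq. (5.51)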

The covariance of the averaged process is given by:

C_{Z_T}(T, τ) = (1/T²) ∫_τ^{τ+T} ∫_0^T C_Z(t_1, t_2) dt_1 dt_2   (5.52)

Generally it is not easy to obtain closed-form expressions for (5.52). However, as shown in chapter 7, it is relatively easy to obtain values for this function through numerical integration. We end this section by giving the equations for the 2D spatial case; it is straightforward to generalise these results to higher dimensions. The local average process for an area A = L_1 L_2 is defined as:

Z_T(x_1, x_2) = (1/(L_1 L_2)) ∫_{x_1−L_1/2}^{x_1+L_1/2} ∫_{x_2−L_2/2}^{x_2+L_2/2} Z(u_1, u_2) du_1 du_2   (5.53)

The variance function is given by:

V_Z(L_1, L_2) = (4/(L_1 L_2)) ∫_0^{L_1} ∫_0^{L_2} (1 − h_1/L_1)(1 − h_2/L_2) ρ_Z(h_1, h_2) dh_1 dh_2   (5.54)

The limit of the variance function defines the spatial "scale of fluctuation" or characteristic area α:

lim_{L_1, L_2 → ∞} V_Z(L_1, L_2) → α/(L_1 L_2)   (5.55)

where α can be calculated from the correlation function as follows:

α = ∫_{−∞}^{∞} ∫_{−∞}^{∞} ρ_Z(u_1, u_2) du_1 du_2   (5.56)

The characteristic area α can also be obtained through the spectral representation by setting ω_1 = ω_2 = 0 in the Wiener–Khinchine relation (5.39):

s_Z(0, 0) = S_Z(0, 0)/σ_Z² = (1/(2π)²) ∫_{−∞}^{∞} ∫_{−∞}^{∞} ρ_Z(h_1, h_2) dh_1 dh_2   (5.57)

Combining equations (5.56) and (5.57) then leads to:

α = 4π² s_Z(0, 0)   (5.58)

In Table 5.5 the various ways of obtaining the scale of fluctuation and the characteristic area are summarized.

Table 5.5 Three ways of obtaining the scale of fluctuation (time) and the characteristic area (2D space) (after Vanmarcke, 1983)

Scale of fluctuation θ:
- lim_{T→∞} T V_Z(T)
- ∫_{−∞}^{∞} ρ_Z(τ) dτ
- 2π s_Z(0)

Characteristic area α:
- lim_{L_1, L_2 → ∞} L_1 L_2 V_Z(L_1, L_2)
- ∫_{−∞}^{∞} ∫_{−∞}^{∞} ρ_Z(u_1, u_2) du_1 du_2
- 4π² s_Z(0, 0)

Finally, the covariance of the averaged random function in two dimensions is given by:

C_{Z_T}(L_1, L_2; h_1, h_2) = (1/(L_1 L_2)²) ∫_0^{L_1} ∫_0^{L_2} ∫_{h_1}^{L_1+h_1} ∫_{h_2}^{L_2+h_2} C_Z(x_1, y_1, x_2, y_2) dx_1 dy_1 dx_2 dy_2   (5.59)

The covariance of the spatially averaged random function is frequently used in geostatistical mapping, as explained in chapter 7, where its values are approximated with numerical integration. To limit the notational burden, Equation (5.59) is usually written in vector notation with x_1 = (x_1, y_1)^T, x_2 = (x_2, y_2)^T, h = (h_1, h_2)^T and A = L_1 L_2:

C_Z(A; h) = (1/A²) ∫_A ∫_{A+h} C_Z(x_1, x_2) dx_1 dx_2   (5.60)
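As noted, values of (5.60) are in practice obtained by numerical integration. The sketch below is an illustration added here (not the syllabus' own code): it approximates the block-averaged covariance by discretising each block into a grid of points, assuming an isotropic exponential point covariance; the function name and parameter values are made up for the example:

import numpy as np

def block_covariance(cov, L1, L2, h, ndisc=10):
    # Approximate C_Z(A; h) of Eq. (5.60): the average point covariance
    # between an L1 x L2 block and the same block shifted by h
    x = (np.arange(ndisc) + 0.5) * L1 / ndisc
    y = (np.arange(ndisc) + 0.5) * L2 / ndisc
    X, Y = np.meshgrid(x, y)
    pts = np.column_stack([X.ravel(), Y.ravel()])
    pts2 = pts + np.asarray(h, dtype=float)
    d = np.linalg.norm(pts[:, None, :] - pts2[None, :, :], axis=2)
    return cov(d).mean()

cov = lambda d: 20.0 * np.exp(-d / 50.0)     # isotropic, sill 20, a = 50
print(block_covariance(cov, 100.0, 100.0, (0.0, 0.0)))    # within-block average
print(block_covariance(cov, 100.0, 100.0, (200.0, 0.0)))  # two lagged blocks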


 

5.7 Exercises

1. Give examples of hydrological variables that can be modelled with a continuous-valued and a discrete-valued 1) random series, 2) lattice process, 3) continuous-time process, 4) continuous-space process, 5) continuous space-time process, 6) time-compound point process, 7) space-compound point process. Note that 14 combinations are asked for.

2. The semivariance of a random function is described by γ(h) = 10 h^{1.5}, while the mean is constant. Is this process: a) intrinsic; b) wide sense stationary; c) second order stationary; d) strict stationary?

3. The semivariance of a random function is described by γ(h) = 10 exp(−h/30), and the mean is constant. The pdf at a given location is the Gamma distribution. Is this process: a) intrinsic; b) wide sense stationary; c) second order stationary; d) strict stationary?

4. The semivariance of a random function is described by γ(h) = 10 exp(−h/30), and the mean is constant. The pdf at a given location is the Gaussian distribution. Is this process: a) intrinsic; b) wide sense stationary; c) second order stationary; d) strict stationary?

5. The semivariance of a random function is described by γ(h) = 10 exp(−h/30), and the mean is constant. The multivariate pdf of any set of locations is the Gaussian distribution. Is this process: a) intrinsic; b) wide sense stationary; c) second order stationary; d) strict stationary?

6. Show that the integral scale of the exponential correlation function of Table 5.1 is equal to the parameter a, and that of the spherical correlation function is equal to (3/8)a.

7. Derive Equations (5.24) and (5.26).

8. The following relation holds for the expectation of the complex amplitudes (see footnote 4):

E[dX(ω_1) dX*(ω_2)] = S(ω) dω  if ω_1 = ω_2 = ω;  0  if ω_1 ≠ ω_2

where dX*(ω) is the complex conjugate of the random complex amplitude dX(ω) in Equation (5.30). Given this relation, derive Equation (5.31).

9. Consider a random function Z(t) with a scale of fluctuation θ = 50 days, an exponential covariance function and σ_Z² = 20. Plot the relation between the variance of the averaged process Z_T(t) and T, with T increasing from 1 to 100 days.

Footnote 4: A complex number z = a + bi has a complex conjugate z* = a − bi. The product of a complex number and its complex conjugate is real valued: (a + bi)(a − bi) = a² − abi + abi − i²b² = a² + b².

= a 2 + bia − bia − i 2 b = a 2 + b 2 . 95

 

96

 

6.

Time series analysis

97

 

98

 

Chapter 1

Introduction to Time Series

Modelling The state state of many phenome phenomena na in nature nature changes changes with time. time. This This dynamic dynamic ‘behaviour’ can be described by time series models, which can be used to estimate target parameters. These may include expected values at certain times such as the start of the growing season, or probabilities that critical levels are exceeded at certain times or during certain periods. These target target parameters parameters are estimated with the purpose of obtaining characteristics of the development of a certain universe in time. Suchthe characteristics can, fortoinstance, extrapolated Inherently, universe is assumed develop be in time, followingtoafuture processsituations. about which information is obtained by an observed time series. Because of restricted knowledge, there is no certainty about the ‘true’ process, if any, along which a universe develops in time. Therefore, the ‘assumed’ process is referred to as a  stochastic   process. Processes which are fully known are called  deterministic  processes.   processes. The future state of a deterministic process can be calculated exactly, resulting in one series, whereas the future state of a stochastic stochastic process can only be forecasted forecasted or predicted, predicted, resulting resulting in numerous series, called  realizations  of the process, which can be regarded as the outcomess of a probabilit outcome probability y experiment experiment.. Typically, a process is described by a   model . One genera generall class of mode models ls is that of time series models as described by  Box and Jenkins Jenkins   (1976). 1976).   Hipel and McLeod (1994 1994)) describe its applications in hydrology. It is emphasized here that in series modelling, the observed time series itself is regarded as the realization oftime a process. This process, which depends on the sampling interval, should not be confused with the underlying physical processes which cause the variation in the observed time series, serie s, as well well as observ observation error. error. It is stressed stressed that the assumption assumption of a stochastic stochastic process does not imply that nature is stochastic. Basically, stochastic processes are data-based, and they are imaginary, enabling statistical inference. This syllabus is restricted to   discrete   processes, processes, i.e. observ observations made at discrete discrete time steps, separated by intervals. intervals. Note that the underlying underlying physical physical process may be contin continuous uous in time. time. Further urthermor more, e, only   equally space spaced  d   discrete time series are discussed discu ssed in this syllabus, syllabus, i.e., daily, daily, weekly weekly,, monthly monthly or yearly yearly data. The syllabus focuses on the  time  domain.   domain. For analyses of series in the  frequency  domain, Priestley   domain,  Priestley


 

(1981) is referred to. Finally, this syllabus only deals with linear processes. For nonlinear processes, Tong (1990) is referred to. Chatfield (1989) is referred to for a brief introduction into the spectrum of methods for time series analysis.


 


Chapter 2

Stationary Processes

A process is said to be stationary if its statistical properties do not change with time. It is important to note that stationarity is not found in nature, whether in geological, evolutionary or any other processes. Stationarity can only be assumed, given the length of the period and the length of the time intervals. Strong or strict stationarity means that all statistical properties are time-independent, so they do not change after time shifts. It is often sufficient to assume weak stationarity of order k, which means that the statistical moments up to order k only depend on differences in time and not on time as such. Second order stationarity means that the stochastic process can be described by the mean, the variance and the autocorrelation function. This is also called covariance stationarity.

We now consider second order stationary stochastic processes. Suppose that we have an equidistant time series of n observations, z_1, z_2, z_3, ..., z_n. The process cannot be exactly described, so {z_t} is considered to be a realization of a stochastic process {Z_t}. The mean is defined as the expected value of Z_t:

μ = E[Z_t]   (2.1)

which can be estimated from an observed time series by the simple estimator

μ̂ = z̄ = (1/n) Σ_{t=1}^{n} z_t   (2.2)

The variance of the stochastic process {Z_t} is defined as the expected value of the squared deviations from the mean:

S_Z² = E[(Z_t − μ)²]   (2.3)

S_Z² can be estimated by

Ŝ_Z² = (1/(n−1)) Σ_{t=1}^{n} (z_t − z̄)²   (2.4)

The autocovariance for lag k is defined by

γ_k = E[(Z_t − μ)(Z_{t+k} − μ)],   γ_0 = S_Z²   (2.5)

For lag 0 the autocovariance equals the variance. The autocorrelation function (ACF) for lag k is a scaled form of the autocovariance:

ρ_k = γ_k / γ_0   (2.6)

The sample autocovariance function for lag k can be calculated from a time series by

c_k = (1/n_k) Σ_{t=1}^{n−k} (z_t − z̄)(z_{t+k} − z̄)   (2.7)

where n_k is the number of summed terms, with a maximum of n − k; terms for which a value of z_t or z_{t+k} is missing are excluded. The sample ACF is estimated by

r_k = (1 − k/n) c_k / c_0   (2.8)
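A direct implementation of the estimators (2.2), (2.7) and (2.8) in Python (added here for illustration; NumPy is assumed, the function name is made up, and no missing values are handled):

import numpy as np

def sample_acf(z, max_lag):
    # sample ACF r_k of Eq. (2.8), with c_k from Eq. (2.7)
    # and n_k = n - k summed terms
    z = np.asarray(z, dtype=float)
    n = len(z)
    d = z - z.mean()                       # deviations from z-bar, Eq. (2.2)
    c0 = np.dot(d, d) / n                  # c_0 (lag 0, n_k = n)
    r = np.empty(max_lag + 1)
    for k in range(max_lag + 1):
        ck = np.dot(d[:n - k], d[k:]) / (n - k)
        r[k] = (1.0 - k / n) * ck / c0
    return r

rng = np.random.default_rng(0)
print(sample_acf(rng.normal(size=500), 5))   # white noise: r_k near 0 for k >= 1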

2.1 Autoregressive (AR) processes

For many environmental processes it is likely that the state at a particular time is correlated with the state at previous times. These processes are referred to as autoregressive (AR) processes. An autoregressive process of order 1, an AR(1) process or Markov process, is given by

Z_t − μ = φ_1 (Z_{t−1} − μ) + a_t   (2.9)

where μ is the mean level, φ_1 is the AR parameter, and a_t is the error term with zero mean and variance S_a². a_t is assumed to be identically and independently distributed (IID), so

E[a_t a_{t−k}] = S_a²  if k = 0;  0  if k ≠ 0   (2.10)

for all t. Using the backward shift operator B, where B^k Z_t = Z_{t−k}, (2.9) can be written as

Z_t − μ = φ_1 B(Z_t − μ) + a_t   (2.11)

(2.11) can also be written as

φ(B)(Z_t − μ) = a_t   (2.12)

with φ(B) = 1 − φ_1 B. An autoregressive process of order p, an AR(p) process, is given by

Z_t − μ = φ_1 (Z_{t−1} − μ) + φ_2 (Z_{t−2} − μ) + ··· + φ_p (Z_{t−p} − μ) + a_t   (2.13)

or, using the backward shift operator:

φ(B)(Z_t − μ) = a_t   (2.14)

Figure 2.1. Theoretical ACF for an AR(1) process with φ_1 = 0.8 (correlation plotted against time lag).

where φ(B) = 1 − φ_1 B − φ_2 B² − ··· − φ_p B^p is the autoregressive operator of order p.

To obey the assumption of stationarity, the values of the AR parameters are restricted. For an AR(1) process, this restriction is |φ_1| < 1. This can be analyzed as follows. Equation (2.9) can be written as

Z_t − μ = a_t + φ_1 a_{t−1} + φ_1² a_{t−2} + φ_1³ a_{t−3} + ...   (2.15)

Now, analyze the relation between Z_t − μ and a_{t−k}, k = 1, 2, 3, ..., both if |φ_1| < 1 and if |φ_1| ≥ 1.
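The effect of the restriction |φ_1| < 1 is easy to see in simulation. The following sketch (added for illustration; NumPy assumed, function name made up) generates AR(1) series according to (2.9):

import numpy as np

def simulate_ar1(phi1, mu=0.0, sa=1.0, n=300, seed=0):
    # simulate Z_t - mu = phi1 (Z_{t-1} - mu) + a_t, Eq. (2.9)
    rng = np.random.default_rng(seed)
    a = rng.normal(0.0, sa, n)
    z = np.empty(n)
    z[0] = mu + a[0]
    for t in range(1, n):
        z[t] = mu + phi1 * (z[t - 1] - mu) + a[t]
    return z

for phi1 in (0.8, 1.02):
    z = simulate_ar1(phi1)
    print(phi1, np.abs(z - z.mean()).max())
# |phi1| < 1: fluctuations stay bounded (stationary);
# |phi1| >= 1: the series drifts away from the mean (nonstationary)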

Important tools to identify an AR(p) process from an observed time series are the ACF and the partial autocorrelation function (PACF). The theoretical ACF and PACF for an AR(p) process are derived as follows. First, the terms of the AR(p) process in (2.13) are multiplied by (Z_{t−k} − μ):

(Z_{t−k} − μ)(Z_t − μ) = φ_1 (Z_{t−k} − μ)(Z_{t−1} − μ) + φ_2 (Z_{t−k} − μ)(Z_{t−2} − μ) + ··· + φ_p (Z_{t−k} − μ)(Z_{t−p} − μ) + (Z_{t−k} − μ) a_t   (2.16)

By taking expectations of the terms in (2.16) we obtain

γ_k = φ_1 γ_{k−1} + φ_2 γ_{k−2} + ··· + φ_p γ_{k−p}   (2.17)

with k > 0. E[(Z_{t−k} − μ) a_t] equals zero for k > 0, because Z_{t−k} only depends on the error process up to and including t − k and is uncorrelated with a_t. The theoretical ACF is obtained by dividing (2.17) by γ_0:

ρ_k = φ_1 ρ_{k−1} + φ_2 ρ_{k−2} + ··· + φ_p ρ_{k−p}   (2.18)

with k > 0. Figures 2.1 and 2.2 give the theoretical ACFs for AR(1) processes with φ_1 = 0.8 and φ_1 = −0.8, respectively (see (2.9)).

 

Figure 2.2. Theoretical ACF for an AR(1) process with φ_1 = −0.8.

Extending (2.18) for k = 1, 2, ..., p results in a set of Yule–Walker equations:

ρ_1 = φ_1 + φ_2 ρ_1 + ··· + φ_p ρ_{p−1}
ρ_2 = φ_1 ρ_1 + φ_2 + ··· + φ_p ρ_{p−2}
···
ρ_p = φ_1 ρ_{p−1} + φ_2 ρ_{p−2} + ··· + φ_p   (2.19)

which in matrix notation is equal to

| 1        ρ_1      ρ_2      ···  ρ_{p−1} |   | φ_1 |   | ρ_1 |
| ρ_1      1        ρ_1      ···  ρ_{p−2} |   | φ_2 |   | ρ_2 |
| ···      ···      ···      ···  ···     | · | ··· | = | ··· |
| ρ_{p−1}  ρ_{p−2}  ρ_{p−3}  ···  1       |   | φ_p |   | ρ_p |   (2.20)

Now if φ_kj is the j-th coefficient of an AR process of order k (j = 1 ... k), then (2.20) can be written as

| 1        ρ_1      ρ_2      ···  ρ_{k−1} |   | φ_k1 |   | ρ_1 |
| ρ_1      1        ρ_1      ···  ρ_{k−2} |   | φ_k2 |   | ρ_2 |
| ···      ···      ···      ···  ···     | · | ···  | = | ··· |
| ρ_{k−1}  ρ_{k−2}  ρ_{k−3}  ···  1       |   | φ_kk |   | ρ_k |   (2.21)

The coefficient φ_kk in (2.21) is a function of lag k, which is called the theoretical partial autocorrelation function (PACF). The sample PACF is used in model identification; φ̂_kk is estimated and plotted against k for k = 1, 2, .... The sample ACF and PACF of a deseasonalized series of water table depths are given in Fig. 2.3.
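The sample PACF can be computed by solving (2.21) for increasing k. A minimal sketch (added for illustration; NumPy assumed, function name made up):

import numpy as np

def sample_pacf(r, max_lag):
    # PACF phi_kk obtained by solving the Yule-Walker system (2.21)
    # for k = 1..max_lag; r is the (sample) ACF with r[0] = 1
    pacf = np.empty(max_lag + 1)
    pacf[0] = 1.0
    for k in range(1, max_lag + 1):
        R = np.array([[r[abs(i - j)] for j in range(k)] for i in range(k)])
        pacf[k] = np.linalg.solve(R, r[1:k + 1])[-1]   # last coefficient is phi_kk
    return pacf

phi1 = 0.8
r = phi1 ** np.arange(11)          # theoretical ACF of an AR(1), Eq. (2.18)
print(sample_pacf(r, 5).round(3))  # -> [1. 0.8 0. 0. 0. 0.]: cuts off after lag 1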


 

Figure 2.3. Sample ACF and PACF for a deseasonalized time series of water table depths.

2.2 Moving average (MA) processes

In moving average processes the state at a certain time depends on a random shock at that time and a random shock which occurred at one or more previous times. A first order moving average process, MA(1), is given by

Z_t − μ = a_t − θ_1 a_{t−1}   (2.22)

Here a_t and a_{t−1} are random shocks which form part of a white noise process with zero mean and finite and constant variance. Using the backward shift operator, (2.22) can be written as

Z_t − μ = θ(B) a_t   (2.23)

where θ(B) = 1 − θ_1 B is the MA operator of order one. The assumption of invertibility is obeyed for an MA(1) process if |θ_1| < 1. This can be analyzed by writing (2.22) as

Z_t − μ = a_t − θ_1 (Z_{t−1} − μ) − θ_1² (Z_{t−2} − μ) − θ_1³ (Z_{t−3} − μ) − ...   (2.24)

Next, analyze the effect of the value of θ_1 on the relation between Z_t and Z_{t−k}. The theoretical ACF for an MA(1) process with θ_1 = 0.8 is given in Fig. 2.4. An MA(q) process is given by

Z_t − μ = a_t − θ_1 a_{t−1} − θ_2 a_{t−2} − ··· − θ_q a_{t−q}   (2.25)

or

Z_t − μ = θ(B) a_t   (2.26)

where θ(B) is the MA operator of order q.

 

Figure 2.4. Theoretical ACF for an MA(1) process with θ_1 = 0.8.

2.3 Autoregressive moving average (ARMA) processes

A time series may contain properties of an autoregressive process as well as a moving average process. An autoregressive moving average ARMA(1,1) process is given by

Z_t − μ = φ_1 (Z_{t−1} − μ) + a_t − θ_1 a_{t−1}   (2.27)

The ARMA(p,q) process is given by

φ(B)(Z_t − μ) = θ(B) a_t   (2.28)

where φ(B) and θ(B) are the AR(p) and the MA(q) operator, respectively.
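A minimal simulation sketch for (2.27), added here for illustration (NumPy assumed, function name made up):

import numpy as np

def simulate_arma11(phi1, theta1, mu=0.0, sa=1.0, n=500, seed=0):
    # simulate Z_t - mu = phi1 (Z_{t-1} - mu) + a_t - theta1 a_{t-1}, Eq. (2.27)
    rng = np.random.default_rng(seed)
    a = rng.normal(0.0, sa, n)
    z = np.empty(n)
    z[0] = mu + a[0]
    for t in range(1, n):
        z[t] = mu + phi1 * (z[t - 1] - mu) + a[t] - theta1 * a[t - 1]
    return z

z = simulate_arma11(phi1=0.8, theta1=0.4, mu=10.0)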

 

Chapter 3

Nonstationary Processes

3.1 Differencing

Calculating differences allows a trend to be removed from a series:

∇Z_t = (Z_t − μ) − (Z_{t−1} − μ)   (3.1)

∇²Z_t = ∇Z_t − ∇Z_{t−1}   (3.2)

and so on, until a series of differences is obtained with a constant mean in time.
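For example (illustration only, NumPy assumed), first differences remove a linear trend:

import numpy as np

z = np.array([2.0, 2.6, 3.3, 4.1, 4.9, 5.6])  # series with a linear trend
print(np.diff(z))        # first differences, Eq. (3.1): roughly constant mean
print(np.diff(z, n=2))   # second differences, Eq. (3.2): roughly zero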

3.2 Autoregressive Integrated Moving Average (ARIMA) processes

Basically, an ARIMA model is an ARMA model for stationary differences:

(∇^d Z_t − μ) = (θ(B)/φ(B)) a_t   (3.3)

3.3 Seasonal nonstationary processes

A form of nonstationarity often encountered in environmental processes is seasonality. Figure 3.1 shows the sample ACF for the time series of daily water table depths. Note that the sample ACF clearly reflects the annual seasonal pattern. Besides seasonal variation of the mean, the variance itself may vary seasonally. For example, shallow water tables in the wet season may vary more than deep water tables in the dry season, due to reduced storage capacity of the unsaturated zone in the wet season. If the variance is nonconstant in time, i.e., there is heteroscedasticity, the variance should be made constant by an appropriate deseasonalization procedure or by a Box–Cox transformation of the time series (Hipel and McLeod, 1994).

 

Figure 3.1. Sample ACF of a daily observed series of water table depths; r_k (−) plotted against lag k (days), 0 to 800.

 

3.4 Seasonal integrated autoregressive moving average (SARIMA) processes

In the case of a seasonal autoregressive moving average process, differences are calculated for the so-called seasonal distance, with the aim of removing a seasonal trend. For example, the seasonal distance for monthly values is twelve. The general notation of a SARIMA(p,d,q)×(P,D,Q) model is

(∇^d ∇_s^D Z_t − μ) = (θ(B) Θ(B^s) / (φ(B) Φ(B^s))) a_t   (3.4)

 

Chapter 4

Causality

4.1 Cross-covariance, cross-correlation

Two time series are observed at n equally spaced time steps: x_t, t = 1, ..., n and z_t, t = 1, ..., n. We want to assess if there is a linear relationship between {x_t} and {z_t}. This could be done by regression analysis, where x_t is plotted against z_t, t = 1, ..., n, or the other way round. However, this will only give insight into the relationship between x and z at time t, whereas x_t may be related to z_{t+k}, k ≠ 0. The cross-covariance function and the cross-correlation function give the linear relationship between x_t and z_{t+k}. The cross-covariance function is often estimated by

c_xz(k) = (1/n) Σ_{t=1}^{n−k} x_t z_{t+k}   for k ≥ 0
c_xz(k) = (1/n) Σ_{t=1−k}^{n} x_t z_{t+k}   for k < 0   (4.1)

(for instance, in Hipel and McLeod (1994)). The cross-correlation function (CCF) is a scaled form of the cross-covariance function:

r_xz(k) = c_xz(k) / √(c_x(0) c_z(0))   (4.2)

In other textbooks or statistical packages (e.g., Genstat, Payne (2000)) the following formula is used instead of (4.1) for estimating the cross-covariance function:

c_xz(k) = (1/n_k) Σ_{t=1}^{n−k} (x_t − x̄)(z_{t+k} − z̄)   (4.3)

for positive lags. Here, n_k is the number of summed terms; products containing missing values are excluded. The cross-correlation function is then estimated by

r_xz(k) = (1 − k/n) c_xz(k) / √(c_x(0) c_z(0))   (4.4)
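A sketch of the estimators (4.3)–(4.4) for positive lags (added for illustration; NumPy assumed, names made up, no missing values handled):

import numpy as np

def sample_ccf(x, z, max_lag):
    # sample cross-correlation r_xz(k) for k = 0..max_lag,
    # following Eqs. (4.3)-(4.4)
    x = np.asarray(x, dtype=float)
    z = np.asarray(z, dtype=float)
    n = len(x)
    dx, dz = x - x.mean(), z - z.mean()
    cx0, cz0 = np.dot(dx, dx) / n, np.dot(dz, dz) / n
    r = np.empty(max_lag + 1)
    for k in range(max_lag + 1):
        cxz = np.dot(dx[:n - k], dz[k:]) / (n - k)   # n_k = n - k terms
        r[k] = (1.0 - k / n) * cxz / np.sqrt(cx0 * cz0)
    return r

rng = np.random.default_rng(2)
x = rng.normal(size=400)
z = np.concatenate([np.zeros(3), x[:-3]]) + 0.3 * rng.normal(size=400)
print(np.argmax(sample_ccf(x, z, 10)))   # -> 3: z responds to x with delay 3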

4.2 Transfer Function-Noise Processes

A class of time series models which describes the linear dynamic relationship between one or more input series and an output series is that of the transfer function model with added noise (TFN), developed by Box and Jenkins (1976).

Figure 4.1. Schematic representation of a transfer function model with added noise. X_{1,t}, X_{2,t}, X_{3,t}, ... are input variables. Z*_{1,t}, Z*_{2,t}, Z*_{3,t}, ... are transfer components. ε_t is an error term which forms a series of independent and identically distributed disturbances, with finite and constant variance S_ε². N_t is the noise component. Z_t is the output variable.

For applications to environmental series, we refer to Hipel and McLeod (1994). The general TFN model is given schematically in Fig. 4.1. If one input series {X_t} is considered, the TFN model is defined as

Z_t = Z*_t + N_t   (4.5)

where

Z*_t = Σ_{i=1}^{r} δ_i Z*_{t−i} + ω_0 X_{t−b} − Σ_{j=1}^{s} ω_j X_{t−j−b}   (4.6)

is the transfer component, and

N_t − μ = Σ_{i=1}^{p} φ_i (N_{t−i} − μ) + ε_t − Σ_{j=1}^{q} θ_j ε_{t−j}   (4.7)

is the noise component. The subscript b is a pure delay, which is the number of time steps after which a reaction to an input change is observed in the output. The extension to more input series is straightforward. The transfer component in (4.5) can be written as

Z*_t = ν_0 X_t + ν_1 X_{t−1} + ν_2 X_{t−2} + ··· = ν(B) X_t   (4.8)

The weights ν_0, ν_1, ν_2, ... form the impulse–response function ν(B):

ν(B) = ω(B)/δ(B) = (ω_0 − ω_1 B − ω_2 B² − ··· − ω_s B^s) / (1 − δ_1 B − δ_2 B² − ··· − δ_r B^r)   (4.9)

The theoretical impulse–response function reflects the same autoregressive and moving average characteristics as the theoretical autocorrelation function given in Chapter 2. Box and Jenkins (1976) present a procedure for identifying the order of TFN models. This procedure is summarized by the following steps:

1. An appropriate univariate time series model is fitted to the input series {x_t}. The resulting white noise sequence of residuals is called the prewhitened input series {α_t}.

2. The output series {z_t} is filtered by the univariate time series model for the input series obtained in the previous step. This results in a series {β_t}.

3. The residual cross-correlation function r_αβ(k) (residual CCF) is calculated for the {α_t} and {β_t} series:

r_αβ(k) = (1 − k/n) c_αβ(k) / √(c_α(0) c_β(0))   (4.10)

where

c_αβ(k) = (1/n_k) Σ_{t=1}^{n−k} α_t β_{t+k}   (4.11)

for positive lags, and n_k is the number of summed terms. Terms with missing values are excluded. c_α(0) and c_β(0) are the sample variances of the α series and the β series, respectively, calculated by (2.4).

4. Based on the residual CCF given by (4.10), the parameters required in the transfer function ν(B) in (4.9) are identified. Box and Jenkins (1976) show that the theoretical CCF between α_t and β_t is directly proportional to ν(B).

5. Next, a noise model is identified for the series

n̂_t = (z_t − z̄) − ν̂(B)(x_t − x̄)   (4.12)

by using the sample ACF and sample PACF for n̂_t, given by (2.8) and (2.21).

4.3 Intervention Analysis

A special form of TFN models are the intervention models, described by Hipel et al. (1975) and Hipel and McLeod (1994). The intervention model for a step trend is given by

Z_t = I_t + N_t   (4.13)

where t = 1, ..., n indicates the t-th element of a series of length n, Z_t is the process of interest, I_t is the trend component and N_t is a noise component describing the part of Z_t that cannot be explained from the trend. The noise component is usually an ARMA model, see (2.28). The trend component I_t is a transfer function with the following general form:

I_t = δ_1 I_{t−1} + δ_2 I_{t−2} + ··· + δ_r I_{t−r} + ω_0 S^{(T)}_{t−b} − ω_1 S^{(T)}_{t−1−b} − ··· − ω_m S^{(T)}_{t−m−b}   (4.14)

where δ_1 ... δ_r are autoregressive parameters up to order r, ω_0 ... ω_m are moving average parameters up to order m, and b is a pure delay parameter. Using the backward shift operator B, (4.14) can be written as

I_t = (ω(B)/δ(B)) B^b S^{(T)}_t   (4.15)

with B^k z_t = z_{t−k} and k a positive integer. S^{(T)}_t is an input series indicating the step intervention:

S^{(T)}_t = 0  if t < T
S^{(T)}_t = 1  if t ≥ T   (4.16)

Step interventions influence processes in different ways, which can be expressed by different forms of the transfer function, see Fig. 4.2. The model in (4.13) can be extended with other transfer components besides the intervention:

Z_t = I_t + Σ_i X_{i,t} + N_t,   i = 1 ... m   (4.17)

where X_{i,t}, i = 1 ... m, are m transfer components of m independent inputs.

Figure 4.2. Responses to a step intervention. T = 5. S^{(T)}_t = 0 for t < T, S^{(T)}_t = 1 for t ≥ T. ω_0 = 3, δ_1 = 0.7, b = 1. a. step model, b. delayed step model, c. step decay model, d. linear model, e. impulse decay model.

 

Chapter 5

Time Series Modelling

5.1 Identification, Estimation and Diagnostic Checking

Box and Jenkins (1976) distinguish three steps in model construction:

1. Identification;
2. Estimation (calibration);
3. Diagnostic checking (verification).

In the identification stage it is analyzed which stochastic model is most representative for the observed time series. Identification starts with a visual analysis of a time series plot. A graph may indicate the presence of a seasonal component or some other form of trend. It may be useful to filter the series in order to obtain a smoother picture, which reflects the mean level more clearly. So-called box-and-whisker plots can indicate the time-dependence of the variance. Furthermore, it may be necessary to apply a transformation of the data, in order to approximate the normality assumption. Other tools in the identification are the sample ACF and the sample PACF for univariate time series models. Plots of the sample ACF and the sample PACF indicate the type of stochastic process that can be assumed (AR, MA or ARMA) and the order of this process. A transformation applied to obtain approximately normally distributed residuals with constant variance is the so-called Box–Cox transformation, which is defined as follows:

z_t^{(λ)} = (1/λ)[(z_t + c)^λ − 1]  if λ ≠ 0
z_t^{(λ)} = ln(z_t + c)  if λ = 0   (5.1)

Of course, the transformation parameter λ can be calibrated from observed series. However, it is recommended to use knowledge of the underlying physical process in choosing a transformation. The identification procedure given by Box and Jenkins (1976) is summarized in section 4.2.

In the estimation stage the parameter values are estimated by using an optimization algorithm, based on a least squares criterion or a maximum likelihood criterion. This is not discussed in further detail in this syllabus.

In the stage of diagnostic checking (verification) it is checked whether the model assumptions are satisfied. This is mainly based on analysis of the residuals: the plot of residuals against time, the residual ACF and the residual PACF. In the case of a transfer function–noise model the CCF of the input variable and the residuals is also inspected. Besides visual inspections, several statistical tests on the presence of autocorrelation or cross-correlation can be applied.
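The Box–Cox transformation (5.1) in Python (illustration only; NumPy assumed, function name made up):

import numpy as np

def boxcox(z, lam, c=0.0):
    # Box-Cox transformation of Eq. (5.1); c shifts the data so that z + c > 0
    z = np.asarray(z, dtype=float)
    if lam == 0.0:
        return np.log(z + c)
    return ((z + c)**lam - 1.0) / lam

# boxcox(z, 0.5) behaves like a square-root transform, boxcox(z, 0.0) like a log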

5.2 Automatic Model Selection

Automatic model selection is an objective and reproducible alternative to the procedure of identification and estimation described in the previous section. A large set of candidate models with varying complexity is compiled. Next, these models are calibrated to the observed data and a selection criterion is calculated, which is composed as follows:

automatic selection criterion = lack of fit + complexity   (5.2)

An appropriate model goes with a small criterion. Examples of automatic selection criteria are Akaike's Information Criterion (AIC) and Schwartz' Bayesian Information Criterion (BIC). BIC generally tends to select less complex models than AIC. After selection of a model, it must be verified that the model assumptions are satisfied. It is important that the set of candidate models is compiled carefully. If the set is too large, the risk of overfitting is large too. Physical knowledge can be helpful in compiling the set of candidate models. However, if the set of candidate models is restricted too much on the basis of existing knowledge, the opportunity to gain new physical insights from the data will be limited.
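In the spirit of (5.2), AIC and BIC penalize lack of fit plus complexity. The syllabus gives no explicit formulas, so the following Gaussian-likelihood variant is an illustrative assumption, not the text's own definition:

import numpy as np

def aic_bic(rss, n, k):
    # AIC and BIC for a model with k parameters fitted to n observations,
    # from the residual sum of squares rss, assuming Gaussian errors
    loglik = -0.5 * n * (np.log(2.0 * np.pi * rss / n) + 1.0)
    return -2.0 * loglik + 2.0 * k, -2.0 * loglik + k * np.log(n)

aic, bic = aic_bic(rss=120.0, n=200, k=3)   # hypothetical numbers
# BIC's penalty k*log(n) exceeds AIC's 2k for n > 7, favouring simpler models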

5.3 Validation

In validation, the results of prediction or simulation are compared with independent observations (i.e., observations that were not used in model identification, calibration or selection). Typically, validation focuses on the appropriateness of a model given its purpose, whereas verification concerns the theoretical soundness of the model. Validation may be an alternative to verification (or diagnostic checking). If the validation errors are less than a predefined value, the model can be accepted for further application. Validation enables the model selection to be fitted to the practical purpose of the model (e.g., simulation, prediction, forecasting, estimation of characteristics). For instance, if the purpose of the model is to simulate extreme situations, validation on the reproduction of minimum and maximum values would be appropriate. An independent validation set will not always be available. In cross-validation two subsets of a series are distinguished: the calibration set and the validation set. Next, calibration and validation are repeated until all data have been used both for calibration and for validation.


 

7. Geostatistics

7.1 Introduction

Geostatistics is the part of statistics that is concerned with geo-referenced data, i.e. data that are linked to spatial coordinates. To describe the spatial variation of the property observed at data locations, the property is modelled with a spatial random function (or random field) Z(x), with x = (x, y) or x = (x, y, z) the spatial coordinates. The focus of geostatistics can be further explained by Figure 7.1. Suppose that the values of some property (for instance hydraulic conductivity) have been observed at the four locations x_1, x_2, x_3 and x_4 and that, respectively, the values z_1, z_2, z_3 and z_4 have been found at these locations. Geostatistics is concerned with the unknown value z_0 at the non-observed location x_0. In particular, geostatistics deals with:

1. spatial interpolation and mapping: predicting the value of Z_0 at x_0 as accurately as possible, using the values found at the surrounding locations (note that Z_0 is written here in capitals to denote that it is considered to be a random variable);

2. local uncertainty assessment: estimating the probability distribution of Z_0 at x_0 given the values found at the surrounding locations, i.e. estimating the probability density function f(z_0; x_0 | z_1(x_1), z_2(x_2), z_3(x_3), z_4(x_4)). This probability distribution expresses the uncertainty about the actual but unknown value z_0 at x_0;

3. simulation: generating realisations of the conditional RF Z(x) | z(x_i), i = 1, ..., 4 at many non-observed locations simultaneously (usually on a lattice or grid) given the values found at the observed locations; e.g. hydraulic conductivity is observed at a limited number of locations but must be input to a groundwater model on a grid.

Figure 7.1 Focus of geostatistics

 

Geostatistics was first used as a practical solution to estimating ore grades of mining blocks using observations of ore grades that were sampled preferentially, i.e. along outcrops (Krige, 1993). Later it was extended to a comprehensive statistical theory for geo-referenced data (Matheron, 1970). Presently, geostatistics is applied in a great number of fields such as petroleum engineering, hydrology, soil science, environmental pollution and fisheries. Standard text books have been written by David (1977), Journel and Huijbregts (1998), Isaaks and Srivastava (1989) and Goovaerts (1997). Some important hydrological problems that have been tackled using geostatistics are among others:

- spatial interpolation and mapping of rainfall depths and hydraulic heads;
- estimation and simulation of representative conductivities of model blocks used in groundwater models;
- simulation of subsoil properties such as rock types, texture classes and geological facies;
- uncertainty analysis of groundwater flow and transport through heterogeneous formations (if hydraulic conductivity, dispersivity or chemical properties are spatially varying and largely unknown) (see chapter 8).

The remainder of this chapter is divided into four parts. The first part briefly revisits descriptive statistics, but now in a spatial context. The second part is concerned with spatial interpolation using a technique called kriging. The third part uses kriging for the estimation of the local conditional probability distribution. The last part deals with the simulation of realisations of spatial random functions.

7.2 Descriptive spatial statistics

Declustering

In this section we will briefly revisit the subject of descriptive statistics, but now focussed on spatial (i.e. geo-referenced) data. Looking at Figure 7.1 it can be seen that not all observation locations are evenly spread in space. Certain locations appear to be clustered. This can for instance be the case because it is convenient to take a number of samples close together. Another reason could be that certain data clusters are taken purposively, e.g. to estimate the short-distance variance. If the histogram or cumulative frequency distribution of the data is calculated with the purpose of estimating the true but unknown spatial frequency distribution of an area, it would not be fair to give the same weight to clustered observations as to observations that are far from the others. The latter represent a much larger area and thus deserve to be given more weight. To correct for the clustering effect, declustering methods can be used. Here, one particular declustering method, called polygon declustering, is illustrated. Figure 7.2 shows schematically a spatial array of measurement locations. The objective is to estimate the spatial statistics (mean, variance, histogram) of the property (e.g. hydraulic conductivity) of the field. The idea is to draw Thiessen polygons around the observation locations first: by this procedure each location of the field is assigned to the closest observation. The relative sizes of the Thiessen polygons are used as declustering weights: w_i = A_i / Σ_j A_j. Using these weights the declustered histogram and cumulative frequency distribution can be calculated as shown in Figure 7.3, as well as the declustered moments such as the mean and variance:

m_z = Σ_{i=1}^{n} w_i z_i   (7.1)

s_z² = Σ_{i=1}^{n} w_i (z_i − m_z)²   (7.2)

Figure 7.2 Schematic example of polygon declustering. The declustering weights are the relative Thiessen-polygon areas, w_i = A_i / (A_1 + A_2 + A_3 + A_4 + A_5). With relative areas A_1 = 0.30, A_2 = 0.25, A_3 = 0.10, A_4 = 0.20, A_5 = 0.15 and observed values z_1 = 7, z_2 = 8, z_3 = 9, z_4 = 10, z_5 = 15, Equations (7.1) and (7.2) give:

m_z = 0.3·7 + 0.25·8 + 0.10·9 + 0.20·10 + 0.15·15 = 9.25
s_z² = 0.3·(−2.25)² + 0.25·(−1.25)² + 0.10·(−0.25)² + 0.20·(0.75)² + 0.15·(5.75)² = 6.99

Figure 7.3 Schematic example of declustered frequency distributions (declustered histogram and declustered cumulative frequency distribution).
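Polygon declustering is easy to approximate numerically by assigning each cell of a fine grid to its nearest observation, a discrete version of the Thiessen polygons. A sketch (added for illustration; NumPy assumed, names and coordinates made up):

import numpy as np

def declustered_stats(xy, z, bounds, n=400):
    # declustered mean and variance, Eqs. (7.1)-(7.2), with weights w_i
    # approximated as the fraction of grid cells nearest to observation i
    x0, x1, y0, y1 = bounds
    gx, gy = np.meshgrid(np.linspace(x0, x1, n), np.linspace(y0, y1, n))
    grid = np.column_stack([gx.ravel(), gy.ravel()])
    d = np.linalg.norm(grid[:, None, :] - xy[None, :, :], axis=2)
    w = np.bincount(np.argmin(d, axis=1), minlength=len(z)) / len(grid)
    m = np.sum(w * z)                     # declustered mean, Eq. (7.1)
    s2 = np.sum(w * (z - m)**2)           # declustered variance, Eq. (7.2)
    return w, m, s2

xy = np.array([[0.2, 0.8], [0.25, 0.82], [0.7, 0.3], [0.9, 0.9], [0.5, 0.1]])
z = np.array([7.0, 8.0, 9.0, 10.0, 15.0])
w, m, s2 = declustered_stats(xy, z, bounds=(0.0, 1.0, 0.0, 1.0))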

The effect of declustering can be demonstrated using the synthetic Walker lake data set shown in Figure 2.1 (all of the larger numerical examples shown in this chapter are based on the Walker lake data set; the geostatistical analyses and the plots are performed using the GSLIB geostatistical software of Deutsch and Journel (1998)). Figure 2.1 only shows the 140 values at the sample locations. The associated histogram is shown in Figure 2.2. Because this is a synthetic data set we also have the exhaustive data of the entire area (2500 values). Figure 7.4 shows the declustered histogram based on the 140 data and the "true" histogram based on 2500 values. Clearly, they are very much alike, while the histogram based on the non-weighted data (Figure 2.2) is much different. The estimated mean without weighting equals 4.35, which is much too large. The reason is the existence of clusters with very high data values in the observations (see Figure 2.1). Declustering can correct for this, as can be seen from the declustered mean in Figure 7.4, which is 2.53 and very close to the true mean of 2.58.

Figure 7.4 Declustered histogram of the 140 data values (left) and the true histogram of the Walker lake data set (right)

Semivariance and correlation

Using the Walker lake data set of 140 observations we will further illustrate the concept of the semivariogram and the correlation function. Figure 7.5 shows scatter plots of z(x) and z(x + h) for |h| = 1, 5, 10, 20 units (pixels) apart. For each pair of points the distance d_i to the one-to-one line can be calculated. The semivariance for a given distance is given by (with n_h the number of pairs of points that are a distance h = |h| apart):

γ̂(h) = (1/n_h) Σ_{i=1}^{n_h} d_i² = (1/(2n_h)) Σ_{i=1}^{n_h} [z(x_i + h) − z(x_i)]²   (7.3)

and the correlation coefficient:

ρ̂(h) = (1/n_h) Σ_{i=1}^{n_h} [z(x_i + h) − m_{z(x+h)}][z(x_i) − m_{z(x)}] / (s_{z(x+h)} s_{z(x)})   (7.4)

where m_{z(x+h)}, m_{z(x)} and s_{z(x+h)}, s_{z(x)} are the means and standard deviations of the z(x) and z(x + h) data values respectively. These estimators were already introduced in chapter 5 for data that are not on a grid. Figure 7.6 shows plots of the semivariance and the correlation as a function of distance. These plots are called the semivariogram and the correlogram respectively. If we imagine the data z to be observations from a realisation of a random function Z(x), and this random function is assumed to be intrinsic or wide sense stationary (see chapter 5), then (7.3) and (7.4) are estimators for the semivariance function and the correlation function.
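For scattered data, (7.3) is evaluated per distance class. A minimal sketch (added for illustration; NumPy assumed, names made up):

import numpy as np

def experimental_semivariogram(xy, z, lags, tol):
    # estimate gamma(h), Eq. (7.3), grouping point pairs into
    # distance classes h +/- tol
    xy = np.asarray(xy, dtype=float)
    z = np.asarray(z, dtype=float)
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=2)
    i, j = np.triu_indices(len(z), k=1)          # each pair counted once
    d, dz2 = d[i, j], (z[i] - z[j])**2
    gamma = []
    for h in lags:
        sel = np.abs(d - h) <= tol
        gamma.append(dz2[sel].mean() / 2.0 if sel.any() else np.nan)
    return np.array(gamma)

rng = np.random.default_rng(3)
xy = rng.uniform(0.0, 50.0, size=(140, 2))
z = rng.normal(size=140)
print(experimental_semivariogram(xy, z, lags=np.arange(2.0, 22.0, 2.0), tol=1.0))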

Figure 7.5 Scatter plots of z(x) and z(x + h) for |h| = 1, 5, 10, 20 units (pixels) apart, from the Walker lake data set. Panel statistics: h = 1: γ(h) = 0.43, ρ(h) = 0.54; h = 5: γ(h) = 1.25, ρ(h) = 0.36; h = 10: γ(h) = 2.17, ρ(h) = 0.014; h = 20: γ(h) = 2.42, ρ(h) = −0.17.

 

Figure 7.6 Semivariogram and correlogram based on the scatter plots of Figure 7.5.

7.3 Spatial interpolation by kriging

Kriging is a collection of methods that can be used for spatial interpolation. Kriging provides optimal linear predictions at non-observed locations by assuming that the unknown spatial variation of the property is a realisation of a random function that has been observed at the data points only. Apart from the prediction, kriging also provides the variance of the prediction error. Here, two kriging variants are discussed: simple kriging and ordinary kriging, which are based on slightly different random function models.

7.3.1 Simple kriging

Theory

The most elementary of kriging methods is called simple kriging and is treated here. Simple kriging is based on a RF that is wide sense stationary, i.e. with the following properties (see also chapter 5):

E[Z(x)] = μ_Z = constant
Var[Z(x)] = E[(Z(x) − μ_Z)²] = σ_Z² = constant (and finite)
Cov[Z(x_1), Z(x_2)] = E[(Z(x_1) − μ_Z)(Z(x_2) − μ_Z)] = C_Z(x_2 − x_1) = C_Z(h)

Simple kriging is the appropriate kriging method if the RF is second order stationary and the mean of the RF, E[Z(x)] = μ_Z, is known without error. With simple kriging a predictor Z*(x_0) is sought that:

1. is a linear function of the surrounding data,
2. is unbiased: E[Z*(x_0) − Z(x_0)] = 0,
3. and has the smallest possible error, i.e. the variance of Z*(x_0) − Z(x_0) is minimal.

 

A linear and unbiased predictor is obtained when considering the following weighted average of deviations from the mean:

Z*(x_0) = μ_Z + Σ_{i=1}^{n} λ_i [Z(x_i) − μ_Z]   (7.5)

with Z(x_i) the values of Z(x) at the surrounding observation locations. Usually, not all observed locations are included in the predictor, but only a limited number of locations within a given search neighbourhood. Equation (7.5) is unbiased by definition:

E[Z*(x_0) − Z(x_0)] = μ_Z + Σ_{i=1}^{n} λ_i E[Z(x_i) − μ_Z] − E[Z(x_0)] = μ_Z + Σ_{i=1}^{n} λ_i (μ_Z − μ_Z) − μ_Z = 0   (7.6)

The weights λ_i should be chosen such that the prediction error is minimal. However, as the real value z(x_0) is unknown, we cannot calculate the prediction error. Therefore, instead of minimizing the prediction error we must be satisfied with minimizing the variance of the prediction error, Var[Z*(x_0) − Z(x_0)]. Because the predictor is unbiased, the variance of the prediction error can be written as:

Var[Z*(x_0) − Z(x_0)] = E[(Z*(x_0) − Z(x_0))²]
= Σ_{i=1}^{n} Σ_{j=1}^{n} λ_i λ_j E[(Z(x_i) − μ_Z)(Z(x_j) − μ_Z)] − 2 Σ_{i=1}^{n} λ_i E[(Z(x_i) − μ_Z)(Z(x_0) − μ_Z)] + E[(Z(x_0) − μ_Z)²]   (7.7)

Using the definition of the covariance of a second order stationary RF, E[(Z(x_i) − μ_Z)(Z(x_j) − μ_Z)] = C_Z(x_i − x_j) and C_Z(0) = σ_Z², we obtain for the variance of the prediction error:

Var[Z*(x_0) − Z(x_0)] = Σ_{i=1}^{n} Σ_{j=1}^{n} λ_i λ_j C_Z(x_i − x_j) − 2 Σ_{i=1}^{n} λ_i C_Z(x_i − x_0) + σ_Z²   (7.8)

To obtain the minimum value of Equation (7.8) we have to equate all its partial derivatives with respect to the λ_i to zero:

(∂/∂λ_i) Var[Z*(x_0) − Z(x_0)] = 2 Σ_{j=1}^{n} λ_j C_Z(x_i − x_j) − 2 C_Z(x_i − x_0) = 0,   i = 1, ..., n   (7.9)

This results in the following system of n equations referred to as the simple kriging system:

Σ_{j=1}^{n} λ_j C_Z(x_i − x_j) = C_Z(x_i − x_0),   i = 1, ..., n   (7.10)

The n unknown values λ_i can be uniquely solved from these n equations if all the x_i are different. The predictor (7.5) with the λ_i found from solving (7.10) is the one with the minimum prediction error variance. This variance can be calculated using equation (7.8). However, it can be shown (e.g. de Marsily, 1986, p. 290) that the variance of the prediction error can be written in a simpler form as:

Var[Z*(x_0) − Z(x_0)] = σ_Z² − Σ_{i=1}^{n} λ_i C_Z(x_i − x_0)   (7.11)

The error variance very nicely shows how kriging takes advantage of the spatial dependence of Z(x). If only the marginal probability distribution had been estimated from the data and the spatial coordinates had not been taken into account, the best prediction for every non-observed location would have been the mean μ_Z. Consequently, the variance of the prediction error would have been equal to σ_Z². As the larger kriging weights are positive, it can be seen from (7.11) that the prediction error variance of the kriging predictor is always smaller than the variance of the RF.

To obtain a positive error variance using Equation (7.11), the function C_Z(h) must be positive definite. This means that for all possible x_1, ..., x_n ∈ ℝ^N (N = 1, 2 or 3) and for all λ_1, ..., λ_n ∈ ℝ the following inequality must hold:

Σ_{i=1}^{n} Σ_{j=1}^{n} λ_i λ_j C_Z(x_i − x_j) ≥ 0   (7.12)

It is difficult to ensure that this is the case for any covariance function. Therefore, we cannot just estimate a covariance function directly from the data for a limited number of separation distances and then obtain a continuous function by linear interpolation between the experimental points (such as in Figure 7.6). If such a covariance function were used in (7.10), Equation (7.11) would not necessarily lead to a positive estimate of the prediction error variance. In fact, there is only a limited number of functions for which it is proven that inequality (7.12) will always hold. So the practical solution used in kriging is to take one of these 'permissible' functions and fit it through the points of the experimental covariance function. Next, the values of the fitted function are used to build the kriging system (7.10) and

124

 

to estimate the kriging variance using (7.11). Table 7.1 gives a number of covariance functions that can be used for simple kriging (i.e. using a wide sense stationary RF). Such a table was already in chapter 5 but for estimated convenience here. Figure 7.6 shows an exampleintroduced of an exponential model thatisis repeated fitted to the covariances. Of course, in case 2 of second order stationarity the parameter  c should  c  should be equal to the variance of the RF: c RF: c       Z  . Table7.1 Table 7.1 Permissible  Permissible covariance functions for simple kriging ;  his the length of the lag vector  3

(a) spherical model   C h    

c   1     32   ha       12   ha   

if   if   h     a

0

if   if   h     a

(b) exponential model   C h    c exph/a (c) Gaussian model   C h    c exph/a 2  c (d) nugget model

 

if   if   h     0

 

C h     0

if   if   h     0

30

25

           )

20

     h

           (

     C   e   c   n 15   a    i   r   a   v   o    C

10

5

0

0

100

200

300

400

500

600

Separation distance h  (m)

700

800

900

1000

 Figure 7.6  Figure  7.6 Example  Example of an exponential covariance model fitted to estimated covariances

Some remarks should be made about the nugget model. The nugget stems from the mining practice. Imagine that we find occasional gold nuggets in surrounding rock that doesn't contain any gold itself. If we were to estimate the covariance function of gold content from our observations, we would get the nugget model with c = σ_Z² = p(1 − p) (with p the probability of finding a gold nugget).

Any linear combination of permissible covariance models is a permissible covariance model itself. Often a combination of a nugget model and another model is observed, e.g.:

C(h) = c_0 + c_1  if h = 0
C(h) = c_1 exp(−h/a)  if h > 0   (7.13)

where c_0 + c_1 = σ_Z². In this case c_0 is often used to model the part of the variance that is attributable to observation errors or to spatial variation that occurs at distances smaller than the minimal distance between observations. The box below shows a simple numerical example of simple kriging.

Box 3: Simple kriging example

Spatial lay-out: three observations z(x_1) = 3, z(x_2) = 7, z(x_3) = 14 surround the prediction location x_0.

Distance table (units):
        x_0   x_1   x_2   x_3
x_0     0     3     1     1
x_1     3     0     3     4
x_2     1     3     0     1
x_3     1     4     1     0

The simple kriging system (7.10) reads:

λ_1 C(x_1 − x_1) + λ_2 C(x_1 − x_2) + λ_3 C(x_1 − x_3) = C(x_1 − x_0)
λ_1 C(x_2 − x_1) + λ_2 C(x_2 − x_2) + λ_3 C(x_2 − x_3) = C(x_2 − x_0)
λ_1 C(x_3 − x_1) + λ_2 C(x_3 − x_2) + λ_3 C(x_3 − x_3) = C(x_3 − x_0)

With C(x_i − x_j) = 22.69 exp(−|x_i − x_j|/2) and μ = 7.1 this becomes:

22.69 λ_1 + 5.063 λ_2 + 3.071 λ_3 = 5.063
5.063 λ_1 + 22.69 λ_2 + 13.76 λ_3 = 13.76
3.071 λ_1 + 13.76 λ_2 + 22.69 λ_3 = 13.76

with solution λ_1 = 0.0924, λ_2 = 0.357, λ_3 = 0.378. The kriging predictor and kriging variance then follow as:

Z*(x_0) = 7.1 + 0.0924·(−4.1) + 0.357·(−0.1) + 0.378·(6.9) = 9.29

Var[Z*(x_0) − Z(x_0)] = 22.69 − 0.0924·5.063 − 0.357·13.76 − 0.378·13.76 = 12.11

A simple example of simple kriging. The spatial lay-out of the data points and the target location yields the table of distances between these locations. The kriging system has the x_i − x_j covariances on the left and the x_i − x_0 covariances on the right. Using the assumed mean and covariance function, the kriging equations are solved numerically and the kriging predictor and the kriging variance are evaluated.
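The numbers in Box 3 can be checked with a few lines of Python (added for illustration; NumPy assumed). The distance table is entered directly, so no coordinates are needed:

import numpy as np

# Box 3: distances between x1, x2, x3 (matrix D) and to x0 (vector d0)
D = np.array([[0.0, 3.0, 4.0],
              [3.0, 0.0, 1.0],
              [4.0, 1.0, 0.0]])
d0 = np.array([3.0, 1.0, 1.0])
z = np.array([3.0, 7.0, 14.0])
mu, sill, a = 7.1, 22.69, 2.0

cov = lambda h: sill * np.exp(-np.asarray(h, dtype=float) / a)
lam = np.linalg.solve(cov(D), cov(d0))   # simple kriging system, Eq. (7.10)
zhat = mu + lam @ (z - mu)               # predictor, Eq. (7.5)
var = sill - lam @ cov(d0)               # kriging variance, Eq. (7.11)
print(lam.round(3), round(float(zhat), 2), round(float(var), 2))
# -> [0.092 0.357 0.378] 9.29 12.11, as in Box 3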

Practice
The practical application of simple kriging would involve the mapping of some variable observed at a limited number of locations. In practice, the kriging routine consists of the following steps, which are illustrated with the Walker-lake dataset:
1. Estimate the mean and the covariance function from the data
The mean value of the Walker-lake data based on the observations and declustering is 2.53.
2. Fit a permissible covariance model to the experimental semivariogram
Usually one does not estimate the covariance function but the semivariogram when kriging. The semivariogram is somewhat better suited for estimation from data that are irregularly distributed in space. After fitting a semivariogram function that is suited for wide sense stationary processes (see the first four models in Table 7.2), the covariance function can be obtained through Equation (5.19): C_Z(h) = σ_Z² − γ_Z(h). Figure 7.7 shows the semivariogram of the Walker-lake data set based on 140 data points and the fitted model:

γ(h) = 8.73 Sph(h/13.31) + 3.42 Sph(h/30.17)

If kriging is used for making maps, the locations where the predictions are made are usually located on a grid. So, when in the following steps we refer to a prediction location x0, we refer to a location on this grid. Thus, the following steps are repeated for every grid node:
3. Solve the simple kriging system
Using Equation (7.11) and the covariance function C_Z(h), the λi are obtained for location x0.
4. Predict the value Z(x0)
With the λi, the observed values z(xi) and the estimated value of µ_Z in Equation (7.5), the unknown value of Z(x0) is predicted.
5. Calculate the variance of the prediction error
Using λi(x0), C_Z(h) and σ_Z², the variance of the prediction error is calculated with (7.11).
The result is a map of predicted properties on a grid and a map of associated error variances. Figure 7.8 shows the map of kriging predictions and the associated prediction variance or kriging variance.

Figure 7.7 Semivariogram and fitted semivariance function of the 140 locations of the Walker lake data set (Figure 2.1); Sph() spherical model

Figure 7.8 Interpolation with simple kriging: predictions and the associated kriging variance of the Walker lake data

7.3.2 Ordinary kriging

Theory
Ordinary kriging can be used if:
1. Z(x) is a wide sense stationary RF but the mean of Z(x) is unknown, or
2. Z(x) is an intrinsic RF.
An intrinsic RF has the following properties (see also chapter 5):

E[Z(x2) − Z(x1)] = 0
E[(Z(x2) − Z(x1))²] = 2γ(x2 − x1) = 2γ(h)

The mean difference between the RVs at any two locations is zero (i.e. constant mean) and the

variance of this difference is a function that only depends on the separation vector h. The function γ(h) = (1/2) E[(Z(x) − Z(x + h))²] is the semivariogram. The ordinary kriging predictor is a weighted average of the surrounding observations:

Ẑ(x0) = Σ_{i=1..n} λi Z(xi)                                    (7.14)

with Z(xi) the values of Z(x) at the observation locations (usually within a limited search neighbourhood). As with the simple kriging predictor, we want (7.14) to be unbiased:

E[Ẑ(x0) − Z(x0)] = E[Σ_{i=1..n} λi Z(xi) − Z(x0)]
                 = Σ_{i=1..n} λi E[Z(xi)] − E[Z(x0)] = 0       (7.15)

As the unknown mean is constant, i.e. E[Z(xi)] = E[Z(x0)] for all xi, x0, we find the following "unbiasedness constraint" for the λi:

Σ_{i=1..n} λi = 1                                              (7.16)

Apart from being unbiased, we also want a predictor with minimum variance of the prediction error. The error variance of predictor (7.14) can be written in terms of the semivariance as (see for instance de Marsily (1986) for a complete derivation):

Var[Ẑ(x0) − Z(x0)] = E[(Ẑ(x0) − Z(x0))²]
= 2 Σ_{i=1..n} λi γ_Z(xi − x0) − Σ_{i=1..n} Σ_{j=1..n} λi λj γ_Z(xi − xj)    (7.17)

We want to minimize the error variance subject to constraint (7.16). In other words, we want to find the set of values λi, i = 1,..,n for which (7.17) is minimal without violating constraint (7.16). To find these, a mathematical trick is used. First the expression of the error variance is extended as follows:

E[(Ẑ(x0) − Z(x0))²] = 2 Σ_{i=1..n} λi γ_Z(xi − x0) − Σ_{i=1..n} Σ_{j=1..n} λi λj γ_Z(xi − xj) − 2ν (Σ_{i=1..n} λi − 1)    (7.18)

If the estimator is really unbiased, nothing has happened to the error variance, as the added term is zero by definition. The dummy variable ν is called the Lagrange multiplier. It can be shown that if we find the set of λi, i = 1,..,n and the value of ν for which (7.18) has its minimum value, we also have the set of λi(x0), i = 1,..,n for which the error variance of the ordinary kriging predictor is minimal, while at the same time Σ λi = 1. As with simple kriging, the minimum is found by partial differentiation of (7.18) with respect to λi, i = 1,..,n and ν, and equating the partial derivatives to zero. This results in the following system of (n + 1) linear equations:

Σ_{j=1..n} λj γ_Z(xi − xj) + ν = γ_Z(xi − x0)     i = 1,..,n
Σ_{i=1..n} λi = 1                                              (7.19)

Using the Lagrange multiplier, the value of the (minimum) variance of the prediction error can be conveniently written as:

Var[Ẑ(x0) − Z(x0)] = Σ_{i=1..n} λi γ_Z(xi − x0) + ν            (7.20)

A unique solution of the system (7.19) and a positive kriging variance are only ensured if the semivariogram function is "conditionally non-negative definite". This means that for all possible x1,...,xn in R^N, N = 1, 2 or 3, and for all λ1,...,λn in R such that Σi λi = 1, the following inequality must hold:

− Σ_{i=1..n} Σ_{j=1..n} λi λj γ_Z(xi − xj) ≥ 0                 (7.21)

This is ensured if one of the permissible semivariogram models (Table 7.2, see also chapter 5) is fitted to the experimental semivariogram data.

Table 7.2 Permissible semivariogram models for ordinary kriging; here h denotes the length of the lag vector h.

(a) spherical model:    γ(h) = c [(3/2)(h/a) − (1/2)(h/a)³]   if h ≤ a
                        γ(h) = c                              if h > a
(b) exponential model:  γ(h) = c [1 − exp(−h/a)]
(c) Gaussian model:     γ(h) = c [1 − exp(−(h/a)²)]
(d) nugget model:       γ(h) = 0                              if h = 0
                        γ(h) = c                              if h > 0
(e) power model:        γ(h) = c h^ω,   0 < ω < 2

Models (a) to (d) are also permissible in case the RF is wide sense stationary. The power model, which does not reach a sill, can be used in case of an intrinsic RF but not in case of a wide sense stationary RF.

The unknown mean µ_Z and the Lagrange multiplier ν require some further explanation. If all the data are used to obtain predictions at every location, at all locations the same unknown mean µ_Z is implicitly estimated by the ordinary kriging predictor. The Lagrange multiplier represents the additional uncertainty that is added to the kriging prediction by the fact that the mean is unknown and must be estimated. Therefore, if the RF is wide sense stationary, the variance of the prediction error for ordinary kriging is larger than that for simple kriging, the difference being the Lagrange multiplier. This can be deduced by substituting γ_Z(h) = σ_Z² − C_Z(h) in Equation (7.20) and taking into account that Σ λi = 1. This means that, whenever the mean is not exactly known and has to be estimated from the data, it is better to use ordinary kriging, so that the added uncertainty about the mean is taken into account.

Even in simple kriging one rarely uses all data to obtain kriging predictions. Usually only a limited number of data close to the prediction location are used. This is to avoid that the kriging systems become too large and the mapping too slow. The most common way of selecting data is to center an area or volume at the prediction location x0. Usually the radius is taken about the size of the variogram range. A limited number of data points that fall within the search area are retained for the kriging prediction. This means that the number of data locations becomes a function of the prediction location: n = n(x0). Also, if ordinary kriging is used, a local mean is implicitly estimated that changes with x0, so we have µ_Z = µ_Z(x0) and ν = ν(x0). This shows that, apart from correcting for the uncertainty in the mean and being able to cope with a weaker form of stationarity, ordinary kriging has a third advantage over simple kriging: even though the intrinsic hypothesis assumes that the mean is constant, using ordinary kriging with a search neighbourhood enables one to correct for local deviations in the mean. This makes the ordinary kriging predictor more robust to trends in the data than the simple kriging predictor.

Note: for briefness of notation we will use n and ν in the kriging equations, instead of n(x0) and ν(x0). The reader should be aware that in most equations that follow, both the number of observations and the Lagrange multiplier depend on the prediction location x0, except for those rare occasions where a global search neighbourhood is used.

In Box 4 the ordinary kriging prediction is illustrated using the same example as Box 3. When compared to simple kriging, it can be seen that the prediction is slightly different and that the prediction variance is larger.

Practice
In practice ordinary kriging consists of the following steps (illustrated again with the Walker lake data set):
1. Estimate the semivariogram
2. Fit a permissible semivariogram model
For every node on the grid repeat:
3. Solve the kriging equations
Using the fitted semivariogram model γ_Z(h) in the (n + 1) linear equations (7.19) yields, after solving them, the kriging weights λi, i = 1,..,n and the Lagrange multiplier ν.
4. Predict the value Z(x0)
With the λi and the observed values z(xi) (usually within the search neighbourhood) in Equation (7.14), the unknown value of Z(x0) is predicted.
5. Calculate the variance of the prediction error
Using λi, γ_Z(h) and ν, the variance of the prediction error is calculated with (7.20).
The semivariogram was already shown in Figure 7.7. Figure 7.9 shows the ordinary kriging prediction and the ordinary kriging variance. Due to the large number of observations (140), there are no visual differences between Figures 7.9 and 7.8.

 

Box 4: Ordinary kriging example

The ordinary kriging system for the data configuration of Box 3 reads:

λ1 γ(x1 − x1) + λ2 γ(x1 − x2) + λ3 γ(x1 − x3) + ν = γ(x1 − x0)
λ1 γ(x2 − x1) + λ2 γ(x2 − x2) + λ3 γ(x2 − x3) + ν = γ(x2 − x0)
λ1 γ(x3 − x1) + λ2 γ(x3 − x2) + λ3 γ(x3 − x3) + ν = γ(x3 − x0)
λ1 + λ2 + λ3 = 1

with semivariogram γ(xi − xj) = 22.69 [1 − exp(−|xi − xj|/2)]. Numerically:

17.627 λ2 + 19.619 λ3 + ν = 17.627
17.627 λ1 + 8.930 λ3 + ν = 8.930
19.619 λ1 + 8.930 λ2 + ν = 8.930
λ1 + λ2 + λ3 = 1

with solution λ1 = 0.172, λ2 = 0.381, λ3 = 0.447 and ν = 2.147. The ordinary kriging prediction is:

Z*(x0) = 0.172·3 + 0.381·7 + 0.447·14 = 9.44

and the kriging variance:

var[Z*(x0) − Z(x0)] = 0.172·17.627 + 0.381·8.930 + 0.447·8.930 + 2.147 = 12.57

A simple example of ordinary kriging. For the spatial lay-out of the data points and the table of distances between locations we refer to Box 3. The kriging system is shown, with the xi − xj semivariances on the left and the xi − x0 semivariances on the right. Using the semivariance function shown, the remaining lines give the numerical solution of the kriging equations and the evaluation of the kriging predictor and the kriging variance.
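As a numerical sketch of Box 4 (Python with numpy assumed; array names are ours), the semivariance matrix is bordered with a row and column of ones for the unbiasedness constraint, and the Lagrange multiplier ν is the last unknown. Because of rounding in the box, the solution below matches the quoted values only approximately:

import numpy as np

z = np.array([3.0, 7.0, 14.0])

def gamma(h):
    # exponential semivariogram of Box 4: 22.69 * (1 - exp(-h/2))
    return 22.69 * (1.0 - np.exp(-h / 2.0))

D = np.array([[0., 3., 4.],
              [3., 0., 1.],
              [4., 1., 0.]])
d0 = np.array([3., 1., 1.])

n = len(z)
A = np.ones((n + 1, n + 1))
A[:n, :n] = gamma(D)
A[n, n] = 0.0                       # no Lagrange term in the constraint row
b = np.append(gamma(d0), 1.0)

sol = np.linalg.solve(A, b)
lam, nu = sol[:n], sol[n]
z_pred = lam @ z                    # ordinary kriging predictor, Eq. (7.14)
var_pred = lam @ gamma(d0) + nu     # kriging variance, Eq. (7.20)
print(lam, nu, z_pred, var_pred)    # close to the Box 4 values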

Figure 7.9 Interpolation with ordinary kriging: predictions and the associated kriging variance of the Walker lake data


 

7.3.3 Block kriging

Up to now we have been concerned with predicting the values of attributes at the same support (averaging volume) as the observations, usually point support. However, in many cases one may be interested in the mean value of the attribute for some area or volume much larger than the support of the observations. For instance, one may be interested in the average porosity of a model block that is used in a numerical groundwater model, or the average precipitation of a catchment. These average quantities can be predicted using block kriging. The term "block kriging" is used as opposed to "point kriging" or "punctual kriging", where attributes are predicted at the same support as the observations. Any form of kriging has a point form and a block form. So, there is simple point kriging and simple block kriging, ordinary point kriging and ordinary block kriging, etc. Usually, the term "point" is omitted and the term "block" is added only if the block kriging form is used. Consider the problem of predicting the mean Z̄ of the attribute z that varies with spatial co-ordinate x for some area or volume D with size |D| (length, area or volume):

Z̄ = (1/|D|) ∫_{x∈D} Z(x) dx                                    (7.22)

In case D is a block in three dimensions with lower and upper boundaries xl, yl, zl and xu, yu, zu, the spatial integral (7.22) stands for:

(1/|D|) ∫_{x∈D} Z(x) dx = [1/(|xu − xl||yu − yl||zu − zl|)] ∫_{zl}^{zu} ∫_{yl}^{yu} ∫_{xl}^{xu} Z(s1, s2, s3) ds1 ds2 ds3    (7.23)

Of course, the block D can be of any form, in which case a more complicated spatial integral is used (e.g. Figure 7.10 in two dimensions).

Figure 7.10 Example of block kriging in two dimensions to predict the mean value Z̄ of some irregular area D

 

Similar to point kriging, the unknown value of Z̄ can be predicted as a linear combination of the observations by assuming that the predictand and the observations are partial realizations of a RF. The ordinary block kriging predictor thus becomes:

Ẑ̄ = Σ_{i=1..n} λi Z(xi)                                       (7.24)

where the block kriging weights λi are determined such that Ẑ̄ is unbiased and the prediction variance Var[Ẑ̄ − Z̄] is minimal. This is achieved by solving the λi from the ordinary block kriging system:

Σ_{j=1..n} λj γ_Z(xi − xj) + ν = γ̄_Z(xi, D)     i = 1,..,n
Σ_{i=1..n} λi = 1                                              (7.25)

It can be seen that the ordinary block kriging system looks almost the same as the ordinary (point) kriging system, except for the term on the right-hand side, which is the average semivariance between a location xi and all the locations inside the area of interest D:

γ̄_Z(xi, D) = (1/|D|) ∫_{x∈D} γ_Z(xi − x) dx                    (7.26)

When building the block kriging system, the integral in Equation (7.26) is usually not solved analytically. Instead, it is approximated by first discretizing the area of interest with a limited number of points. Second, the semivariances are calculated between the observation location and the N points xj discretizing D (see Figure 7.11, left). Third, the average semivariance is approximated by averaging these semivariances:

γ̄_Z(xi, D) ≈ (1/N) Σ_{j=1..N} γ_Z(xi − xj)                     (7.27)

Figure 7.11 Numerical approximation of the spatial integrals (7.26) (left) and (7.29) (right)

 

The variance of the prediction error is given by:

Var[Ẑ̄ − Z̄] = E[(Ẑ̄ − Z̄)²] = Σ_{i=1..n} λi γ̄_Z(xi, D) + ν − γ̄_Z(D, D)    (7.28)

where γ̄_Z(D, D) is the average semivariance within the area D, i.e. the average semivariance between all locations within D:

γ̄_Z(D, D) = (1/|D|²) ∫_{x1∈D} ∫_{x2∈D} γ_Z(x1 − x2) dx1 dx2    (7.29)

which in practice is approximated with N points xi discretizing D as (see also Figure 7.11, right):

γ̄_Z(D, D) ≈ (1/N²) Σ_{i=1..N} Σ_{j=1..N} γ_Z(xi − xj)          (7.30)

Figure 7.12 shows the result of block kriging applied to the Walker lake data set with block sizes of 5 × 5 units.
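The discrete approximations (7.27) and (7.30) are easily coded. Below is a small sketch (Python with numpy assumed); the semivariogram, the four-point block discretisation and the observation location are illustrative choices of ours, not prescribed by the text:

import numpy as np

def gamma(h):
    # any permissible semivariogram model; exponential used as illustration
    return 20.0 * (1.0 - np.exp(-h / 2.0))

block_pts = np.array([[2., 2.], [2., 8.], [8., 2.], [8., 8.]])  # points discretizing D
xi = np.array([2., 3.])                                          # one observation location

# point-block average semivariance, Eq. (7.27)
g_iD = gamma(np.linalg.norm(block_pts - xi, axis=1)).mean()

# within-block average semivariance, Eq. (7.30)
dists = np.linalg.norm(block_pts[:, None, :] - block_pts[None, :, :], axis=2)
g_DD = gamma(dists).mean()
print(g_iD, g_DD)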

Here we have given the equations for ordinary block kriging. The simple block kriging equations can be deduced in a similar manner from the simple kriging equations (7.10) by replacing the covariance on the right-hand side by the point-block covariance C̄_Z(xi, D). The prediction error variance is given by (7.11) with σ_Z² replaced by the within-block variance C̄_Z(D, D) (the average covariance of points within D) and C_Z(xi − x0) by C̄_Z(xi, D). The point-block covariance and the within-block covariance are defined as in Equations (7.26) and (7.29) with γ_Z(x1 − x2) replaced by C_Z(x1 − x2).

Figure 7.12 Block kriging applied to the Walker lake data set with block sizes of 5 × 5 units


 

7.4 Estimating the local conditional distribution

Kriging can also be used to estimate, for each non-observed location, the probability distribution f(z; x | z(xi), i = 1,..,n), i.e. the probability distribution given the observed values at the observation locations. Let us return to Figure 5.8. This figure shows conditional random functions. Each realisation is conditioned by the observations, i.e. it passes through the observed values, but is free to vary between observations. The farther away from an observation, the more the realisations differ. This is reflected by the conditional pdf f(z; x | z(xi), i = 1,..,n) at a given location (two of which are shown in Figure 5.8). The farther away from an observation, the larger the variance of the conditional pdf, which means the more uncertain we are about the actual but unknown value z(x). In the following sections methods are shown that can be used to estimate the conditional pdf f(z; x | z(xi), i = 1,..,n) through kriging.

7.4.1 Multivariate Gaussian random functions

If, apart from being wide sense stationary, the RSF is also multivariate Gaussian distributed, then we have:
•  The kriging error is Gaussian distributed with mean zero and variance equal to the simple kriging variance σ²_SK(x0) = Var[Ẑ_SK(x0) − Z(x0)]. A 95%-prediction interval would then be given by [ẑ_SK(x0) − 2σ_SK(x0), ẑ_SK(x0) + 2σ_SK(x0)], where ẑ_SK(x0) is the simple kriging prediction.
•  The conditional cumulative probability distribution function (ccpdf) is Gaussian with mean equal to the simple kriging prediction ẑ_SK(x0) (the dashed line in Figure 5.8) and variance equal to the variance of the simple kriging prediction error σ²_SK(x0) (the variance over the realisations shown in Figure 5.8):

F(z; x0 | z(xi), i = 1,..,n) = [1/√(2π σ²_SK(x0))] ∫_{−∞}^{z} exp[−(z' − ẑ_SK(x0))² / (2σ²_SK(x0))] dz'    (7.31)

where z(x1),...,z(xn) are the observed values at locations x1,...,xn respectively.

The second property is very convenient and the reason why the multivariate Gaussian and stationary RSF is a very popular model in geostatistics. After performing simple kriging predictions, one is able to estimate, for every location in the domain of interest, the probability that Z(x) exceeds a given threshold. For instance, if Z(x) is the concentration of some pollutant in the groundwater and z_c is a critical threshold above which the pollutant becomes a health hazard, simple kriging and Equation (7.31) can be used to map the probability of exceeding this threshold, given the concentrations found at the observation locations. Instead of delineating a single plume based upon some predicted value, several alternative plumes can be delineated, depending on which probability contour is taken as its boundary. This way, both the observed concentration values and the local uncertainty are taken into account when mapping the plume. Also, the risk of not treating hazardous groundwater can be weighed against the costs of remediation. For instance, if the risk of not treating hazardous groundwater should be smaller than 5%, all the water within the 0.05 contour should be treated. Obviously this results in much higher costs than if, for instance, a 10% risk is deemed acceptable. For a more elaborate discussion about probability distributions and the trade-off between risk and costs, we refer to Goovaerts (1997, section 7.4).
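As a minimal sketch of such an exceedance-probability map (Python with scipy assumed; the function name and the illustrative numbers, taken here from Box 3, are ours):

from scipy.stats import norm

def prob_exceed(z_sk, var_sk, zc):
    # Pr[Z(x0) > zc | data] for a multiGaussian RSF, from Eq. (7.31)
    return 1.0 - norm.cdf(zc, loc=z_sk, scale=var_sk ** 0.5)

# e.g. with the Box 3 prediction and variance and a critical level of 10:
print(prob_exceed(z_sk=9.29, var_sk=12.11, zc=10.0))

Applied at every grid node, this turns a pair of kriging maps (prediction and variance) into a probability map.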

7.4.2 Log-normal kriging

Many geophysical variables, such as hydraulic conductivity and pollutant concentration, are approximately log-normally distributed. A frequently used RSF model to describe these variables is the multivariate logGaussian distribution. If Z(x) is multivariate lognormal distributed, the natural logarithm Y(x) = ln Z(x) is multivariate Gaussian distributed. Log-normal kriging then consists of the following steps:
1. Transform the observations z(xi) by taking their logarithms: y(xi) = ln z(xi).
2. Estimate the semivariogram γ_Y(h) from the logtransformed data y(xi) and fit a permissible model (note that the mean m_Y must be determined and assumed known if simple kriging is used).
3. Using the semivariogram γ_Y(h), the data y(xi) (and the mean m_Y in case of simple kriging), the kriging equations are solved to obtain at every non-observed location x0 the prediction Ŷ_SK(x0) and prediction error variance σ²_YSK(x0) in case of simple kriging, or Ŷ_OK(x0) and σ²_YOK(x0) in case of ordinary kriging.
4. An unbiased prediction of Z(x0) is obtained by the following backtransforms:

for simple kriging:

Ẑ(x0) = exp[Ŷ_SK(x0) + (1/2) σ²_YSK(x0)]                       (7.32)

and for ordinary kriging:

Ẑ(x0) = exp[Ŷ_OK(x0) + (1/2) σ²_YOK(x0) − ν_Y]                 (7.33)

where ν_Y is the Lagrange multiplier used in the ordinary kriging system.
5. If Y(x) is multivariate Gaussian distributed and stationary, the ccpdf can be calculated from the simple kriging prediction ŷ_SK(x0) and prediction error variance as:

F(z; x0 | z(xi), i = 1,..,n) = [1/√(2π σ²_YSK(x0))] ∫_{−∞}^{ln z} exp[−(y − ŷ_SK(x0))² / (2σ²_YSK(x0))] dy    (7.34)

An additional reason why in many geostatistical studies the observations are logtransformed before kriging is that the semivariogram of the logtransforms can be estimated more reliably (it shows less noise) because of the variance reduction imposed by the transformation.
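The backtransforms (7.32) and (7.33) are one-liners in practice. A minimal sketch (Python with numpy assumed; the argument names are ours):

import numpy as np

def backtransform_sk(y_sk, var_ysk):
    # simple kriging backtransform, Eq. (7.32)
    return np.exp(y_sk + 0.5 * var_ysk)

def backtransform_ok(y_ok, var_yok, nu_y):
    # ordinary kriging backtransform with Lagrange multiplier, Eq. (7.33)
    return np.exp(y_ok + 0.5 * var_yok - nu_y)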

7.4.3 Kriging normal-score transforms

An even more general transformation is the normal-score transform using the histogram. Through this transform, it is possible to transform any set of observations to univariate Gaussian distributed variables, regardless of the distribution of these observations. A normal-score transform proceeds as follows:
1. The n observations are ranked in ascending order:

z(xi)^(1) ≤ z(xj)^(2) ≤ ... ≤ z(xk)^(r) ≤ ... ≤ z(xl)^(n)

where r = 1,..,n are the ranks of the observations.
2. The cumulative probability associated with observation z(xk) = zk with rank r is estimated as:

F̂(zk) = r(zk) / (n + 1)                                        (7.35)

or, in case of declustering:

F̂(zk) = Σ_{i=1..r(zk)} wi                                      (7.36)

with wi the declustering weights.

3. The associated normal-score transform is given by the p_r-quantile of the standard normal distribution:

y_ns(xk) = N^{−1}[F̂(zk(xk))]                                   (7.37)

where N(y) is the standard Gaussian cumulative distribution function and N^{−1}(p) its inverse. Figure 7.13 shows graphically how the normal-score transform works. The left figure shows the estimated cumulative distribution F_Z(z) of the original (non-transformed) data, declustered if necessary, and the right figure the standard Gaussian cumulative distribution F_Y(y). The dashed lines show how the observed values zk are transformed into the normal-score transforms y_ns(zk).

Figure 7.13 Normal score transformation

If we assume that the normal-score transforms are stationary and multivariate Gaussian distributed (see Goovaerts, 1997 for suggestions how to check this), the local ccpdfs can be obtained through simple kriging as follows:
1. Perform a normal score transform of the observations as described above.
2. Estimate the semivariogram of the normal-score transformed data y_ns(xk) = y_ns(z(xk)) and fit a permissible semivariogram model γ_Y(h). By definition, the mean value of the transformed RSF Y_ns(x) is zero.
3. Use the fitted semivariogram model and the normal-score transforms y_ns(xk) in the simple kriging equations (with m_Y = 0) to obtain at every non-observed location x0 the prediction Ŷ_SK(x0) and the associated prediction error variance σ²_YSK(x0).
4. The local ccpdf is then given by:

F(z; x0 | z(xi), i = 1,..,n) = Pr[G(ŷ_SK(x0), σ_YSK(x0)) ≤ y_ns(z)]
= [1/√(2π σ²_YSK(x0))] ∫_{−∞}^{y_ns(z)} exp[−(y − ŷ_SK(x0))² / (2σ²_YSK(x0))] dy    (7.38)

where y_ns(z) is the normal-score transform of the value z and G(µ, σ) a Gaussian variate with mean µ and standard deviation σ. This is also shown graphically in Figure 7.13. Suppose we want to know for a non-observed location the probability that Z(x0) ≤ z. We first obtain through the transformation the value of y_ns(z). From the simple kriging of the transformed data we have at x0: ŷ_SK(x0) and σ²_YSK(x0). Finally, we evaluate Pr[G(ŷ_SK(x0), σ_YSK(x0)) ≤ y_ns(z)] (Equation 7.38) to obtain the probability.

To calculate the normal-score transform of any given value z (which is not necessarily equal to the value of one of the observations), the resolution of the estimated cpdf F̂(z) must be increased. Usually, a linear interpolation is used to estimate the values of F̂(z) between two observations (see Figure 7.13). Of course, most critical is the extrapolation that must be performed to obtain the lower and upper tails of F̂(z). For instance, if the upper tail of F̂(z) rises too quickly to 1, the probability of high values of z (e.g. a pollutant in groundwater) may be underestimated. Usually a power model is used to extrapolate the lower tail and a hyperbolic model to extrapolate the upper tail. Several models for interpolating between quantiles, as well as rules of thumb about which model to use, are given in Deutsch and Journel (1998) and Goovaerts (1997).

This section is concluded by applying the normal score transform to the Walker lake data set. Figure 7.14 shows the histogram and the semivariogram of the normal score transforms. It can be seen that the semivariogram is less noisy than that of the non-transformed data (Figure 7.7), because the transformation decreases the effect of the very large values. The simple kriging predictions and associated variances are shown in Figure 7.15. Figure 7.16 shows the probability that z exceeds the values 5 and 10. If these were critical values and the Walker lake data groundwater concentrations, Figure 7.16 shows the effect of the critical concentration on the probability of exceeding it and, through this, on the area that must be cleaned up.

Figure 7.14 Histogram and semivariogram of normal score transforms of the Walker lake data set; fitted semivariogram model: γ(h) = 0.2 Nug(h) + 0.8 Sph(h/19.9)

Figure 7.15 Simple kriging results of normal score transforms of the Walker lake data set

Figure 7.16 Probability of exceeding 5 and 10 based on normal score simple kriging of the Walker lake data set

7.5 Geostatistical simulation

The third field of application of geostatistics is simulating realisations of the conditional random function Z(x) | z(xi), i = 1,..,n. Returning to Figure 5.8: in case of a wide sense stationary and multiGaussian RSF Z(x), simple kriging provides the dashed line, which is the mean of all possible conditional realisations. The aim of geostatistical simulation is to generate the individual conditional realisations. There are two important reasons why individual realisations of the conditional RSF are sometimes preferred over the interpolated map that is provided by kriging:
1. Kriging provides a so-called best linear prediction (it produces values that minimize the variance of the prediction error Var[Ẑ(x0) − Z(x0)]), but the resulting maps are much

 

smoother than reality. This can again be seen from Figure 5.8. The individual realisations are noisy and rugged, while the kriging prediction produces a smoothly varying surface. The realisations have a semivariogram that resembles that of the data, so one can say that the real variation of the property considered looks much more like the realisations than like the kriging map. This has repercussions if the kriging map is not the end point of the analysis (such as mapping concentrations). For instance, suppose that the goal is to produce a map of hydraulic conductivities that is to be used in a groundwater flow model. Using the kriged map as input in the groundwater flow model would produce flow lines that are probably too smooth as well. Especially if the goal is to model groundwater transport, a smooth map of hydraulic conductivity will yield an underestimation of solute spreading. In that case it is better to use realisations of the random function as input. Of course, as each realisation has equal probability of being drawn and is therefore an equally viable picture of reality, the question remains: which realisation should then be used? The answer is: not a single realisation should be analysed, but a great number of realisations. This conclusion brings us to the second reason why realisations are often preferred over kriging maps.
2. Multiple realisations as input for a model can be used for uncertainty analysis and ensemble prediction. Figure 5.8 shows that usually we only have limited information about reality, and we therefore represent our uncertainty about reality with a random function (see also chapters 1 and 5). Returning to the example of hydraulic conductivity: if we are uncertain about the parameters of a groundwater model, we also want to know the uncertainty about the model output (heads, fluxes). So instead of analysing a single input of hydraulic conductivity, a large number of conditional realisations of hydraulic conductivity (say 1000) are used as model input. If we use 1000 conditional realisations of hydraulic conductivity as input, we also have 1000 model runs with the groundwater model, producing (in case of a steady state groundwater model) 1000 head fields and 1000 flow fields. From these, it is possible to estimate the probability distribution of hydraulic head at each location, or the probability that a contaminant plume reaches a certain sensitive area. This way of modelling is truly stochastic modelling, and because we do not produce one prediction but an ensemble of predictions, it is often referred to as ensemble prediction. The variance of the output realisations is a measure of our uncertainty about the output (e.g. hydraulic heads) that is caused by our uncertainty (lack of perfect knowledge) about the model parameters (e.g. hydraulic conductivity). So, through this way of stochastic modelling one performs an uncertainty analysis: estimating the uncertainty about model output that is caused by uncertainty about model input or model parameters. There are several ways of performing such an analysis, as will be shown extensively in chapter 8. The method described here, i.e. generating realisations of parameters or input variables and analysing them with a numerical model, is called Monte Carlo simulation. In Figure 7.17 the method of Monte Carlo simulation for uncertainty analysis is shown schematically.


 

Figure 7.17 Schematic representation of Monte Carlo simulation applied for uncertainty analysis of hydraulic conductivity in groundwater modelling. Hydraulic conductivity is spatially varying and sampled at a limited number of locations. It is modelled as a random space function. Using the observations, statistics are estimated that characterise this function (histogram, semivariogram). Next, M realisations of this random function are simulated and used in the groundwater model. This yields M realisations of groundwater model output (e.g. head fields). From these realisations it is possible to obtain, for a given location (e.g. x0), the probability density function of the output variables (e.g. head, concentration).

The technique of Monte Carlo simulation is further explained in the next chapter. Here, we focus only on the generation of multiple realisations of the conditional random space function,

commonly referred to as (conditional) geostatistical simulation. There are quite a few methods for simulating realisations of multiGaussian random space functions. The most commonly used are LU-decomposition (Alabert, 1987), the turning band method (Mantoglou and Wilson, 1982) and sequential Gaussian simulation (Gómez-Hernández and Journel, 1993), while there are even more methods for simulating non-Gaussian random functions (e.g. Armstrong and Dowd, 1994). The most flexible and nowadays most widely used simulation algorithm is sequential simulation. Sequential Gaussian simulation (sGs) is treated here briefly. For a more elaborate description of the method one is referred to Goovaerts (1997) and Deutsch and Journel (1998). Conditional simulation with sGs needs the mean µ_Z and the semivariogram γ_Z(h) of the random space function and proceeds as follows:
1. The area is divided into a finite number of grid points N (location indices x1, x2,.., xN) at which values of the conditional realisations are to be simulated. The grid points are visited in a random order.
2. For the first grid point x1 a simple kriging is performed from the given data z(s1),..,z(sn), yielding the prediction Ẑ_SK(x1) and the prediction variance σ²_SK(x1). Under the assumption


 

that Z(x) is stationary and multiGaussian, the conditional cumulative distribution is Gaussian:

F_Z(z; x1 | z(s1),..,z(sn)) = N(z; Ẑ_SK(x1), σ_SK(x1))         (7.39)

3. A random value P between zero and one is drawn from a uniform distribution U(0,1). Using the inverse of the conditional distribution (7.39), the random quantile P is used to draw a random value Z:

Z(x1) = N^{−1}(P; Ẑ_SK(x1), σ_SK(x1))                          (7.40)

4. For the second grid point x2 a simple kriging is performed using the data z(s1),..,z(sn) and the previously simulated value z(x1) in the kriging equations (so the previously simulated value is now treated as a data point). This yields the prediction Ẑ_SK(x2) and the prediction variance σ²_SK(x2), from which the conditional cumulative distribution F_Z(z; x2 | z(x1), z(s1),..,z(sn)) = N(z; Ẑ_SK(x2), σ_SK(x2)) is built.
5. A random value P between zero and one is drawn from a uniform distribution U(0,1) and, using the inverse of the conditional distribution N^{−1}(P; Ẑ_SK(x2), σ_SK(x2)), the random quantile P is used to draw a random value Z(x2).
6. For the third grid point x3 a simple kriging is performed using the data z(s1),..,z(sn) and the previously simulated values z(x1), z(x2) in the kriging equations, yielding F_Z(z; x3 | z(x1), z(x2), z(s1),..,z(sn)) = N(z; Ẑ_SK(x3), σ_SK(x3)).
7. Using a random value P drawn from a uniform distribution U(0,1), the random variable Z(x3) is drawn and added to the data set.
8. Steps 6 and 7 are repeated, adding more and more simulated values to the conditioning data set, until all values on the grid have been simulated; the last simple kriging exercise thus yields the conditional probability F_Z(z; xN | z(x1),...,z(xN−1), z(s1),..,z(sn)).

It can be shown heuristically that by construction this procedure produces a draw (realisation) from the multivariate conditional distribution F_Z(z(x1),...,z(xN) | z(s1),..,z(sn)) (Gómez-Hernández and Journel, 1993; Goovaerts, 1997), i.e. a realisation from the conditional random function Z(x) | z(s1),..,z(sn). To simulate another realisation, the above procedure is repeated using a different random path over the grid nodes and drawing different random numbers for the quantiles P ~ U(0,1). Unconditional realisations of the random function Z(x) can also be simulated by starting at the first grid point with a draw from the Gaussian distribution N(z; µ_Z, σ_Z) and conditioning at every step on previously simulated points only. Obviously, the number of conditioning points, and thus the size of the kriging system to be solved, increases as the simulation proceeds. This would lead to unacceptably large computer storage requirements and computation times. To avoid this, a search area is used, usually with a radius equal to the semivariogram range, while only a limited number of observations and previously simulated points within the search radius are used in the kriging system (Deutsch and Journel, 1998).


 

Obviously, the assumption underlying the simulation algorithm is that the RSF Z(x) is stationary and multiGaussian. For a RSF to be multiGaussian, it should at least have a univariate Gaussian distribution f_Z(z) = N(z; µ_Z, σ_Z). So, if this method is applied to, for instance, the Walker-lake data set, a normal score transformation is required. The simulation procedure for a realisation of Z(x) | z(s1),..,z(sn) would then involve the following steps:
1. Perform a normal score transform of the observations: y_ns(si) = N^{−1}[F̂(z(si))] (see Figure 7.13).
2. Estimate the semivariogram of the normal-score transformed data y_ns(xi) and fit a permissible semivariogram model γ_Y(h). By definition, the mean value of the transformed RSF Y_ns(x) is zero.
3. Assuming Y_ns(x) to be stationary and multiGaussian, simulate a realisation of the conditional random function Y_ns(x) | y_ns(x1),..,y_ns(xN) using sequential Gaussian simulation.
4. Back-transform the simulated values: z(x) = F̂^{−1}[N(y_ns(x))], i.e. reversing the arrows in Figure 7.13, to obtain a realisation of the conditional random function Z(x) | z(s1),..,z(sn).

In the geostatistical toolbox of Deutsch and Journel (1998) the simulation program sgsim performs the normal score transform, the sequential simulation and the back transform all together. The parameters of the semivariogram of the transforms γ_Y(h) have to be provided separately. Figure 7.18 shows two realisations of the conditional random function based on the Walker lake data.

Figure 7.18 Two simulated realisations of a conditional random function based on the Walker lake data set


 

7.6 More geostatistics

In this chapter the basic geostatistical methods have been presented. Naturally, the area of geostatistics is much more extensive. More advanced geostatistical methods are presented in various textbooks, such as those of Cressie (1993), Rivoirard (1994), Goovaerts (1997), Chilès and Delfiner (1999), and Christakos (2000). More advanced geostatistical methods are concerned with:
•  kriging in case of non-stationary random functions;
•  kriging using auxiliary information;
•  estimating conditional probabilities of non-Gaussian random functions;
•  simulating realisations of non-Gaussian random functions (e.g. positively skewed variables such as rainfall; categorical data such as texture classes);
•  geostatistical methods applied to space-time random functions;
•  geostatistics applied to random functions defined on other metric spaces, such as a sphere or river networks;
•  Bayesian geostatistics, i.e. using various forms of a priori information about the random function and formally updating this prior information with observations.
One is referred to the above references for elaborate descriptions of these methods.

7.7 Exercises

Consider a square area of size 10×10 units. Data points are located at locations (2,3), (3,8) and (7,9), with values of a property z of 3, 8 and 5 respectively. The property is modelled with a stationary and isotropic multivariate Gaussian random space function Z(x) with mean µ_Z = 6 and exponential semivariogram γ(h) = 20[1 − exp(−h/2)].

1. Predict the value of Z(x) at x0 = (5,5) using simple kriging.
2. Predict the value of Z(x) at x0 = (5,5) using ordinary kriging.
3. Calculate the probability that Z(5,5) > 10.
4. Predict the average value Z̄ of the 10×10 area using block kriging. For calculating the necessary point-block semivariances γ̄(x, D) and the average block semivariance γ̄(D, D), discretise the block with four points at locations (2,2), (2,8), (8,2) and (8,8).


 

8. Forward stochastic modelling

8.1 Introduction

In previous chapters methods were introduced for stochastic modelling of single variables, time series and spatial fields. A hydrological property that is represented by a random variable or a random function can be the target itself, e.g. flood events (chapter 4), groundwater head series (chapter 6) and areas of high concentration in groundwater (chapter 7). Often, however, we have imperfect knowledge about some hydrological property that is used as parameter, input series, boundary condition or initial condition in a hydrological model. In that case, interest is focussed on the probability distribution or some uncertainty measure of the model output, given the uncertainty about the model input. This chapter is focussed on deriving these probability distributions or uncertainty measures.⁵

More formally, consider a random variable Z that is used as input for some hydrological model g to produce an output variable Y, which is also stochastic:

Y = g(Z)                                                       (8.1)

The problem to solve is then: given that we know the probability distribution f_Z(z) of Z or some of its moments (e.g. mean and variance), what is the probability distribution f_Y(y) of Y, or what are its moments? This problem is called forward stochastic modelling, as opposed to backward or inverse (stochastic) modelling. In the latter case we have observations of Y, and the unknown value of some deterministic parameter z is estimated from these observations or, if Z is stochastic, its conditional probability distribution f_Z(z | y).

Obviously, the problem of forward stochastic modelling can be put in more general terms, i.e. in case the input or the output are random functions of time, space or space-time, vectors of more random variables or even vector random functions. Also, the function g() can have various forms, such as an explicit scalar or vector function, a differential equation or the outcome of a numerical model. Based on the form of g() and the form of Z, the following types of relations are considered in the framework of forward stochastic modelling (see Heuvelink (1998) for a good monograph on the subject):
•  explicit functions of one random variable;
•  explicit functions of multiple random variables;
•  explicit vector functions;
•  explicit functions of random functions of time, space or space-time;
•  differential equations with a random parameter;
•  stochastic differential equations.

⁵ We use "input" here, but we mean in fact (see chapter 1 for system theory definitions) "input variables", "parameters", "boundary conditions" or "initial conditions".


 

In the following sections each of these problems is treated. For each problem type, a number of solution techniques are presented, and for each solution technique the conditions are given that should be met for its application.

8.2 Explicit functions of one random variable

Consider the relation between two random variables as shown in Equation (8.1).

a) Derived distributions

Goal:
•  the probability density function f_Y(y).
Requirements:
•  the probability density f_Z(z) of Z is known;
•  the function g(Z) is monotonous (only increasing or only decreasing), differentiable and can be inverted.

(8.2)

while the probability density function (pdf) of Y  is   is related to the pdf of  Z  as   as (Papoulis, 1991):  f Y  ( y ) =

d [ g −1 ( y )] dy

 f  Z  ( g −1 ( y ))

(8.3)

where  g −1 ( y ) is the inverse function and the term | . | the absolute value of its derivative. The term | . | ensures that the area under  f Y  ( y ) is equal to 1. Example Take the relation between water height above a weir crest h and the discharge q that is used to measure discharge with a weir (this could also be a rating curve for some river):   q = ah b (8.4)  Now suppose that the water height is observed with some error making it stochastic with  pdf The inverse of this relation and its derivative are given as: 1

g^{−1}(q) = (q/a)^{1/b}                                        (8.5)

d[g^{−1}(q)]/dq = [1/(ab)] (q/a)^{(1−b)/b}                     (8.6)

The probability density function of discharge f_Q(q) then is given by:

f_Q(q) = [1/(ab)] (q/a)^{(1−b)/b} f_H((q/a)^{1/b})             (8.7)
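Equation (8.7) is straightforward to evaluate numerically. A minimal sketch (Python with numpy/scipy assumed; the Gaussian pdf for H and the parameter values are illustrative choices, anticipating the Monte Carlo example further on):

import numpy as np
from scipy.stats import norm

a, b = 5.0, 1.5                        # rating curve parameters (illustrative)

def f_q(q, mu_h=0.3, sd_h=0.02):
    h = (q / a) ** (1.0 / b)                              # inverse relation, Eq. (8.5)
    dh_dq = (1.0 / (a * b)) * (q / a) ** ((1.0 - b) / b)  # derivative, Eq. (8.6)
    return dh_dq * norm.pdf(h, mu_h, sd_h)                # Eq. (8.7)

q = np.linspace(0.4, 1.2, 5)
print(f_q(q))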

b) Derived moments

Goal:
•  the moments of Y, e.g. µ_Y and σ_Y².
Requirements:
•  the probability density f_Z(z) of Z is known.

The first two moments are then obtained through (see also 3.24):

µ_Y = ∫_{−∞}^{∞} g(z) f_Z(z) dz                                (8.8)

σ_Y² = ∫_{−∞}^{∞} g²(z) f_Z(z) dz − µ_Y²                       (8.9)

Example
Consider the same rating curve (Equation 8.4) with H following a uniform distribution between lower and upper values hl and hu:

f_H(h) = 1/(hu − hl)    for hl ≤ h ≤ hu                        (8.10)

The mean then becomes:

µ_Q = ∫_{hl}^{hu} [a h^b / (hu − hl)] dh = [a / ((b + 1)(hu − hl))] [hu^{b+1} − hl^{b+1}]    (8.11)

and the variance is given by:

σ_Q² = ∫_{hl}^{hu} [a² h^{2b} / (hu − hl)] dh − µ_Q² = [a² / ((2b + 1)(hu − hl))] [hu^{2b+1} − hl^{2b+1}] − µ_Q²    (8.12)

In case (8.8) and (8.9) cannot be evaluated analytically, the integrals can of course be solved numerically, using for instance Euler-type integration methods.
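As a small sketch of such a numerical evaluation (Python with numpy assumed; the rating curve parameters and the bounds hl, hu are illustrative values of ours), a simple Riemann sum over (8.8) and (8.9) for the uniform example reads:

import numpy as np

a, b, hl, hu = 5.0, 1.5, 0.2, 0.4       # illustrative values
h = np.linspace(hl, hu, 10001)
dh = h[1] - h[0]
f_h = np.full_like(h, 1.0 / (hu - hl))  # uniform pdf, Eq. (8.10)
g = a * h ** b

mu_q = np.sum(g * f_h) * dh             # Eq. (8.8)
var_q = np.sum(g ** 2 * f_h) * dh - mu_q ** 2   # Eq. (8.9)
print(mu_q, var_q)                      # compare with (8.11) and (8.12)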

c) Monte Carlo simulation

Goal:
•  the probability density function f_Y(y) or its moments.
Requirements:
•  the probability density f_Z(z) of Z is known.

The principle of Monte Carlo simulation has been explained before in chapter 7, but is repeated here. Monte Carlo simulation is the advised method if the probability distribution of the input is known, if the complete probability density of the model output is required, and if the derived-distribution approach (a) cannot be applied or (8.2) cannot be evaluated analytically. Monte Carlo simulation proceeds as follows:
1. Draw a realisation zi of Z using the pdf f_Z(z). This is achieved by calculating the distribution function from the pdf,

F_Z(z) = Pr[Z ≤ z] = ∫_{−∞}^{z} f_Z(z') dz'                    (8.13)

drawing a uniform deviate ui between 0 and 1 using a pseudo random number generator (e.g. Press et al., 1986), and converting ui using the inverse: zi = F_Z^{−1}(ui) (see Figure 8.1).

Figure 8.1 Drawing a random number from a given distribution function

2. Calculate the realisation yi of Y by inserting zi: yi = g(zi).
3. Repeat steps 1 and 2 a large number of times (typically of the order of 1000 to 10000 draws are necessary).
4. From the M simulated output realisations yi, i = 1,..,M, the probability density function or cumulative distribution function of Y can be estimated.

Example
Consider again the rating curve (8.4) with parameter values a = 5 and b = 1.5, with Q in m³/s and with H in m following a Gaussian distribution with mean 0.3 m and standard deviation 0.02 m. Figure 8.2 shows the cumulative distribution function estimated from 1000 realisations of Q, calculated from 1000 simulated realisations of H. Also shown is the exact cumulative distribution function calculated using (8.2). It can be seen that both distributions are very close.

Figure 8.2 Cumulative distribution functions Pr[Q < q]: exact (Equation 8.2) and estimated from Monte Carlo simulation
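The Monte Carlo experiment behind Figure 8.2 can be sketched as follows (Python with numpy/scipy assumed; the empirical-cdf estimator at the end is one common choice among several):

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng()
a, b = 5.0, 1.5
u = rng.uniform(size=1000)              # step 1: uniform deviates u_i
h = norm.ppf(u, loc=0.3, scale=0.02)    # z_i = F^{-1}(u_i): Gaussian H
q = a * h ** b                          # step 2: y_i = g(z_i)
q.sort()                                # step 4: empirical cdf of Q
cdf = np.arange(1, len(q) + 1) / (len(q) + 1.0)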

The Monte Carlo simulation presented here uses simple random sampling: values of U are drawn from the entire range 0-1. To limit the number of realisations needed to accurately estimate the pdf of model output, a technique called stratified random sampling can be used. In that case, the interval 0-1 is divided into a finite number of intervals, preferably of equal width (e.g. 0-0.1, 0.1-0.2, .., 0.9-1 in case of 10 intervals). In each interval a number of values of U and the associated Z are drawn. The result of this procedure is that the drawn realisations of Z are more evenly spread over the value range, and that fewer realisations are necessary to obtain accurate estimates of f_Y(y).
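A stratified draw differs from the simple draw only in how the uniform deviates are produced. A minimal sketch (Python with numpy assumed; one draw per stratum, the stratum count is an illustrative choice):

import numpy as np

rng = np.random.default_rng()
n_strata = 10
edges = np.linspace(0.0, 1.0, n_strata + 1)
u = rng.uniform(edges[:-1], edges[1:])   # one uniform deviate in each stratum

The resulting u can then be pushed through F_Z^{-1} exactly as in the simple random sampling sketch above.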

d) Taylor expansion approximation

Goal:
•  the moments of Y, e.g. µ_Y and σ_Y².
Requirements:
•  the moments of Z, e.g. µ_Z and σ_Z², are known;
•  the variance σ_Z² should not be too large.

 

Consider the Taylor expansion of the function g  function g ( Z   Z ) around the value  g (

 Z 

):

 dg ( z )  ( Z  − µ  ) +  Z   dz   z = µ       2     3   1   ( Z  − µ  ) 3 + ... ( Z  − µ  ) 2 + 1  d   g ( z )  d   g ( z )  Z   Z  2 3 2  dz   z = µ   6  dz   z = µ          

Y  =  g ( Z ) =  g ( µ  Z  ) + 

 Z 

 Z 

(8.14)

 Z 

The first order Taylor approximation only considers the first two terms. The expected value is then approximated as:

$$\mu_Y = E[Y] \approx g(\mu_Z) + \left[\frac{dg(z)}{dz}\right]_{z=\mu_Z} E[(Z-\mu_Z)] = g(\mu_Z) \qquad (8.15)$$

and the variance as:

$$\sigma_Y^2 = E[(Y-\mu_Y)^2] \approx \left[\frac{dg(z)}{dz}\right]_{z=\mu_Z}^2 E[(Z-\mu_Z)^2] = \left[\frac{dg(z)}{dz}\right]_{z=\mu_Z}^2 \sigma_Z^2 \qquad (8.16)$$

Keeping the first three terms of Equation (8.14) and taking expectations yields the second order Taylor approximation. The mean becomes:

$$\mu_Y \approx g(\mu_Z) + \frac{1}{2}\left[\frac{d^2 g(z)}{dz^2}\right]_{z=\mu_Z} \sigma_Z^2 \qquad (8.17)$$

The general expression for the variance is very large, but can be simplified in case $Z$ is Gaussian (see Heuvelink, 1998). Here only the expression for Gaussian $Z$ is shown; for the full expression one is referred to Heuvelink (1998):

$$\sigma_Y^2 \approx \left[\frac{dg(z)}{dz}\right]_{z=\mu_Z}^2 \sigma_Z^2 + \frac{1}{2}\left[\frac{d^2 g(z)}{dz^2}\right]_{z=\mu_Z}^2 \sigma_Z^4 \qquad (8.18)$$

Example One of the requirements for the Taylor approximation to work is that the variance of $Z$ is not too large. To test this, the first and second order Taylor approximations are applied to the rating curve $Q = aH^b$ for increasing variance of $H$. The derivatives that are necessary for this analysis are:


$$\alpha = \left[\frac{dg(z)}{dz}\right]_{z=\mu_Z} = \left. abh^{b-1}\right|_{h=\mu_H} = ab\mu_H^{b-1} \qquad (8.19)$$

$$\beta = \left[\frac{d^2 g(z)}{dz^2}\right]_{z=\mu_Z} = \left. ab(b-1)h^{b-2}\right|_{h=\mu_H} = ab(b-1)\mu_H^{b-2} \qquad (8.20)$$

with the first order Taylor approximation:

$$\mu_Q \approx a\mu_H^b \qquad (8.21)$$

$$\sigma_Q^2 \approx \alpha^2 \sigma_H^2 \qquad (8.22)$$

and the second order Taylor approximation:

$$\mu_Q \approx a\mu_H^b + \frac{\beta}{2}\sigma_H^2 \qquad (8.23)$$

$$\sigma_Q^2 \approx \alpha^2 \sigma_H^2 + \frac{\beta^2}{2}\sigma_H^4 \qquad (8.24)$$

To be able to analyse a large range of variances, the mean $\mu_H$ is set to 0.8 m (was 0.3 m). With $a = 5$ and $b = 1.5$ we have $\alpha = 6.708$ and $\beta = 4.193$. Figure 8.3 shows a plot of $\mu_Q$ and $\sigma_Q$ as a function of the standard deviation $\sigma_H$ as obtained from Monte Carlo simulation (1000 realisations) and with first and second order Taylor analysis. Clearly the Taylor approximation fails in estimating the mean if the variance becomes too large, although the second order method performs much better than the first. In this example the variance is approximated accurately with both methods.

At this time it is convenient to remark that the methods presented in this chapter can also be viewed from the point of view of prediction errors. So, instead of having a mean $\mu_Z$ and variance $\sigma_Z^2$ of a stochastic input variable $Z$, we have a predictor $\hat{Z}$ of $Z$ and the prediction error variance $\sigma_{\hat{Z}}^2 = \mathrm{Var}[\hat{Z} - Z]$. If the prediction error is unbiased, i.e. $E[\hat{Z} - Z] = 0$, then the same equations can be used as above, but with the mean $\mu_Z$ replaced by the prediction $\hat{z}$ and the variance $\sigma_Z^2$ by the error variance $\sigma_{\hat{Z}}^2$. From the point of view of error analysis the mean value of $Q$ then becomes:

$$\mu_Q \approx a\hat{h}^b + \frac{\beta}{2}\sigma_{\hat{H}}^2 \qquad (8.25)$$


Equation (8.25) and Figure 8.3 show that in the case of non-linear models, unbiased (and even optimal) predictions of the model input do not yield unbiased (and certainly not optimal) predictions of the model output (see the remarks in Chapter 1). Adding higher order correction terms such as in (8.25) produces better results.

[Figure: two panels comparing Monte Carlo (MC), first order Taylor (Taylor 1) and second order Taylor (Taylor 2) results; the left panel shows $\mu_Q$ and the right panel $\sigma_Q$, both plotted against $\sigma_H$ ranging from 0.00 to 0.30.]
Figure 8.3 $\mu_Q$ (left) and $\sigma_Q$ (right) as a function of the standard deviation $\sigma_H$ as obtained from Monte Carlo simulation (1000 realisations) and the first and second order Taylor approximation.
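A compact sketch that reproduces this comparison numerically, using the Taylor expressions (8.19)-(8.24) as reconstructed above and a Monte Carlo reference (parameter values per the example; the clipping of rare negative stage draws is a pragmatic assumption):

```python
import numpy as np

a, b = 5.0, 1.5
mu_H = 0.8
rng = np.random.default_rng(0)

for sd_H in (0.05, 0.15, 0.30):
    # First and second order Taylor approximations (Eqs 8.19-8.24)
    alpha = a * b * mu_H**(b - 1)
    beta = a * b * (b - 1) * mu_H**(b - 2)
    mu1 = a * mu_H**b                       # Eq (8.21)
    mu2 = mu1 + 0.5 * beta * sd_H**2        # Eq (8.23)
    var1 = alpha**2 * sd_H**2               # Eq (8.22)
    var2 = var1 + 0.5 * beta**2 * sd_H**4   # Eq (8.24)

    # Monte Carlo reference (1000 realisations, as in Figure 8.3)
    h = rng.normal(mu_H, sd_H, size=1000)
    h = np.clip(h, 1e-6, None)              # guard: H^b undefined for H < 0
    q = a * h**b
    print(sd_H, mu1, mu2, q.mean(), np.sqrt(var1), np.sqrt(var2), q.std())
```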

As a final remark: if the moments of $Y$ are required, but $g()$ is not differentiable or the variance of $Z$ is large, then Monte Carlo simulation could be used to derive the moments of $Y$. However, this means that some distribution type of $Z$ has to be assumed.

8.3 Explicit functions of multiple random variables

The following explicit function of multiple random variables is considered:

$$Y = g(Z_1,..,Z_m) \qquad (8.26)$$

Depending on what is asked about $Y$, what is known about $Z_1,..,Z_m$ and the form of $g()$, a number of different methods can be distinguished:

a) Derived distribution in the linear and Gaussian case

Goal:
•  the probability density function $f_Y(y)$.
Requirements:
•  the joint probability density $f(z_1,..,z_m)$ of $Z_1,..,Z_m$ is known and multivariate Gaussian;
•  the function $g()$ is linear:

$$Y = a + \sum_{i=1}^{m} b_i Z_i \qquad (8.27)$$

In the linear and multiGaussian case, the random variable $Y$ is also Gaussian distributed. The multivariate Gaussian distribution of $Z_1,..,Z_m$ is completely described by the mean values $\mu_1,..,\mu_m$, the variances $\sigma_1^2,..,\sigma_m^2$ and the correlation coefficients $\rho_{ij},\ i,j = 1,..,m$, with $\rho_{ij} = 1$ if $i = j$. The mean and variance of $Y$ can then be obtained by:

$$\mu_Y = a + \sum_{i=1}^{m} b_i \mu_i \qquad (8.28)$$

$$\sigma_Y^2 = \sum_{i=1}^{m}\sum_{j=1}^{m} b_i b_j \rho_{ij} \sigma_i \sigma_j \qquad (8.29)$$

Note that in case the $Z_i$ are not multiGaussian, (8.28) and (8.29) are still valid expressions for the mean and the variance. However, in this case the mean $\mu_Y$ and the variance $\sigma_Y^2$ are not sufficient to characterise the complete pdf of $Y$. A small numerical illustration of (8.28) and (8.29) is given below.
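A minimal sketch, with illustrative numbers:

```python
import numpy as np

a = 1.0
b = np.array([2.0, -1.0, 0.5])     # coefficients b_i
mu = np.array([10.0, 5.0, 2.0])    # means of Z_1..Z_m
sd = np.array([1.0, 0.5, 0.2])     # standard deviations
rho = np.array([[1.0, 0.3, 0.0],   # correlation matrix rho_ij
                [0.3, 1.0, 0.2],
                [0.0, 0.2, 1.0]])

mu_Y = a + b @ mu                  # Eq (8.28)
C = rho * np.outer(sd, sd)         # covariance matrix rho_ij * s_i * s_j
var_Y = b @ C @ b                  # Eq (8.29)
print(mu_Y, var_Y)
```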

b) Derived distribution in the non-linear and Gaussian case

Goal:
•  the probability density function $f_Y(y)$.
Requirements:
•  the joint probability density $f(z_1,..,z_m)$ of $Z_1,..,Z_m$ is known and multivariate Gaussian.

In case $Y = g(z_1,..,z_m)$ is non-linear, we have to derive the distribution of $Y$ through Monte Carlo simulation. To achieve this we have to draw realisations from the joint distribution $f(z_1,..,z_m)$. If this joint distribution is multivariate Gaussian, this is possible through a technique called Cholesky decomposition (see Box 5). The method then consists of:
1. Draw $M$ realisations of the set of random variables $z_1^{(k)},..,z_m^{(k)},\ k = 1,..,M$ from $f(z_1,..,z_m)$ using simulation by Cholesky decomposition.
2. Use the $M$ sets $z_1^{(k)},..,z_m^{(k)},\ k = 1,..,M$ as input for the function $g()$ to get $M$ values of $y$: $y^{(k)},\ k = 1,..,M$.
3. Estimate the distribution function or probability density function of $Y$ from $y^{(k)},\ k = 1,..,M$.

In case the joint distribution $f(z_1,..,z_m)$ is not multivariate Gaussian, a solution is to apply a transformation to each of the variables $Z_1,..,Z_m$: $X_1 = \mathrm{Tr}(Z_1),..,X_m = \mathrm{Tr}(Z_m)$, such that we can assume $X_1,..,X_m$ multivariate Gaussian with $\mu_1 = \mu_2 = .. = \mu_m = 0$ and $\sigma_1^2 = \sigma_2^2 = .. = \sigma_m^2 = 1$. If we assume additionally that the correlation coefficients are unaltered by the transformation (note that this is generally not the case!), then realisations of $X_1,..,X_m$ can be simulated by Cholesky decomposition. The simulated realisations of $X_1,..,X_m$ are subsequently back-transformed to realisations of $Z_1,..,Z_m$, which can then be used in the Monte Carlo analysis described above.

c) Derived moments

Goal:
•  the moments of $Y$, e.g. $\mu_Y$ and $\sigma_Y^2$.
Requirements:
•  the joint probability density $f(z_1,..,z_m)$ of $Z_1,..,Z_m$ is known.

The first two moments of $Y$ are then obtained through:

$$\mu_Y = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} g(z_1,..,z_m)\, f(z_1,..,z_m)\, dz_1 \cdots dz_m \qquad (8.30)$$

$$\sigma_Y^2 = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} g^2(z_1,..,z_m)\, f(z_1,..,z_m)\, dz_1 \cdots dz_m - \mu_Y^2 \qquad (8.31)$$

In practice it is highly unlikely that $f(z_1,..,z_m)$ is known, other than under the assumption that it is multivariate Gaussian. Also, evaluating the integrals, even numerically, is likely to be very cumbersome. So, in practice this problem will be solved by assuming $f(z_1,..,z_m)$ to be multiGaussian (at least after transformation) and using Monte Carlo simulation as explained under (b).

Box 5 Simulation by Cholesky decomposition

The goal is to simulate realisations of the set of random variables $Z_1,..,Z_m$ with multivariate Gaussian joint distribution $f(z_1,..,z_m)$, with parameters $\mu_1,..,\mu_m$, $\sigma_1^2,..,\sigma_m^2$ and $\rho_{ij},\ i,j = 1,..,m$, with $\rho_{ij} = 1$ if $i = j$. The following steps are taken:
1. a vector of mean values is defined: $\boldsymbol{\mu} = (\mu_1, \mu_2,.., \mu_m)^T$;
2. the covariance matrix $\mathbf{C}$ is constructed with element $[C_{ij}]$ given by:

$$[C_{ij}] = \rho_{ij}\sigma_i\sigma_j \qquad (8.32)$$

3. the covariance matrix is decomposed into a lower and an upper triangular matrix that are each other's transpose:

$$\mathbf{C} = \mathbf{L}\mathbf{U} \quad \text{with } \mathbf{L} = \mathbf{U}^T \qquad (8.33)$$

This operation is called Cholesky decomposition (a special form of LU-decomposition, so that the technique is also referred to as simulation by LU-decomposition). A routine to perform this operation can for instance be found in Press et al. (1986).
4. A realisation of the random vector $\mathbf{z} = (Z_1, Z_2,.., Z_m)^T$ can now be simulated by simulating a vector $\mathbf{x} = (X_1, X_2,.., X_m)^T$ of independent standard Gaussian random variables $X_1,..,X_m$ using a random number generator (see 8.2) and performing the transformation:

$$\mathbf{z} = \boldsymbol{\mu} + \mathbf{L}\mathbf{x} \qquad (8.34)$$

That (8.34) yields the right variables can be seen as follows. First, (8.34) is a linear transformation of Gaussian random variables, so the simulated variables are Gaussian. Second, they have the correct mean value, as:

$$E[\mathbf{z}] = E[\boldsymbol{\mu}] + E[\mathbf{L}\mathbf{x}] = \boldsymbol{\mu} + \mathbf{L}E[\mathbf{x}] = \boldsymbol{\mu} \qquad (8.35)$$

And the correct covariance structure:

$$E[(\mathbf{z}-\boldsymbol{\mu})(\mathbf{z}-\boldsymbol{\mu})^T] = E[\mathbf{L}\mathbf{x}(\mathbf{L}\mathbf{x})^T] = E[\mathbf{L}\mathbf{x}\mathbf{x}^T\mathbf{L}^T] = \mathbf{L}E[\mathbf{x}\mathbf{x}^T]\mathbf{L}^T = \mathbf{L}\mathbf{I}\mathbf{L}^T = \mathbf{L}\mathbf{L}^T = \mathbf{L}\mathbf{U} = \mathbf{C} \qquad (8.36)$$

So the simulated variables are indeed drawn from a multivariate Gaussian distribution with the prescribed statistics.

Note that this method can also be used to simulate realisations of multiGaussian random space functions on a grid, i.e. as an alternative to sequential simulation. In that case the random vector contains the values of the random space function at the grid nodes, $\mathbf{z} = (Z(\mathbf{x}_1), Z(\mathbf{x}_2),.., Z(\mathbf{x}_m))^T$, the mean is constant and the covariance matrix is constructed as:

$$[C_{ij}] = C_Z(\mathbf{x}_i, \mathbf{x}_j) \qquad (8.37)$$
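A minimal numpy sketch of the procedure in Box 5 (all parameter values illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)

mu = np.array([10.0, 5.0])          # step 1: vector of means
sd = np.array([1.0, 0.5])
rho = np.array([[1.0, 0.6],
                [0.6, 1.0]])
C = rho * np.outer(sd, sd)          # step 2: covariance matrix (Eq 8.32)

L = np.linalg.cholesky(C)           # step 3: C = L L^T (Eq 8.33)

M = 5000
x = rng.standard_normal((2, M))     # independent standard Gaussians
z = mu[:, None] + L @ x             # step 4: z = mu + Lx (Eq 8.34)

# check: sample mean and covariance approach the prescribed statistics
print(z.mean(axis=1), np.cov(z))
```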

d) Taylor expansion approximation

Goal:
•  the moments of $Y$, e.g. $\mu_Y$ and $\sigma_Y^2$.
Requirements:
•  the joint moments of $Z_1,..,Z_m$ are known up to a certain order, e.g.: $\mu_1,..,\mu_m$, $\sigma_1^2,..,\sigma_m^2$ and $\rho_{ij},\ i,j = 1,..,m$;
•  the variances $\sigma_1^2,..,\sigma_m^2$ should not be too large.

We first define a vector $\boldsymbol{\mu} = (\mu_1, \mu_2,.., \mu_m)^T$ that contains the means of the $m$ random input variables. Next, we consider the Taylor expansion of the function $g(\mathbf{Z})$ around the value $g(\boldsymbol{\mu}) = g(\mu_1, \mu_2,.., \mu_m)$:

$$Y = g(\mathbf{Z}) = g(\boldsymbol{\mu}) + \sum_{i=1}^{m}\left[\frac{\partial g(\mathbf{z})}{\partial z_i}(\boldsymbol{\mu})\right](Z_i - \mu_i) + \frac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{m}\left[\frac{\partial^2 g(\mathbf{z})}{\partial z_i \partial z_j}(\boldsymbol{\mu})\right](Z_i - \mu_i)(Z_j - \mu_j) + \frac{1}{6}\sum_{i=1}^{m}\sum_{j=1}^{m}\sum_{k=1}^{m}\left[\frac{\partial^3 g(\mathbf{z})}{\partial z_i \partial z_j \partial z_k}(\boldsymbol{\mu})\right](Z_i - \mu_i)(Z_j - \mu_j)(Z_k - \mu_k) + \ldots \qquad (8.38)$$

The first order Taylor approximation only considers the first two terms. The expected value is then approximated as:

$$\mu_Y = E[Y] \approx g(\boldsymbol{\mu}) + \sum_{i=1}^{m}\left[\frac{\partial g(\mathbf{z})}{\partial z_i}(\boldsymbol{\mu})\right]E[(Z_i - \mu_i)] = g(\boldsymbol{\mu}) \qquad (8.39)$$

and the variance:

$$\sigma_Y^2 = E[(Y - E[Y])^2] \approx E\left[\left(\sum_{i=1}^{m}\left[\frac{\partial g(\mathbf{z})}{\partial z_i}(\boldsymbol{\mu})\right](Z_i - \mu_i)\right)^2\right] = \sum_{i=1}^{m}\sum_{j=1}^{m}\left[\frac{\partial g(\mathbf{z})}{\partial z_i}(\boldsymbol{\mu})\right]\left[\frac{\partial g(\mathbf{z})}{\partial z_j}(\boldsymbol{\mu})\right]E[(Z_i - \mu_i)(Z_j - \mu_j)] = \sum_{i=1}^{m}\sum_{j=1}^{m}\left[\frac{\partial g(\mathbf{z})}{\partial z_i}(\boldsymbol{\mu})\right]\left[\frac{\partial g(\mathbf{z})}{\partial z_j}(\boldsymbol{\mu})\right]\rho_{ij}\sigma_i\sigma_j \qquad (8.40)$$

Keeping the first three terms of Equation (8.38) and taking expectations yields the second order Taylor approximation. We will only show the expression for the mean here; for the variance one is referred to Heuvelink (1998):

$$\mu_Y = E[Y] \approx g(\boldsymbol{\mu}) + \sum_{i=1}^{m}\left[\frac{\partial g(\mathbf{z})}{\partial z_i}(\boldsymbol{\mu})\right]E[(Z_i - \mu_i)] + \frac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{m}\left[\frac{\partial^2 g(\mathbf{z})}{\partial z_i \partial z_j}(\boldsymbol{\mu})\right]E[(Z_i - \mu_i)(Z_j - \mu_j)] = g(\boldsymbol{\mu}) + \frac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{m}\left[\frac{\partial^2 g(\mathbf{z})}{\partial z_i \partial z_j}(\boldsymbol{\mu})\right]\rho_{ij}\sigma_i\sigma_j \qquad (8.41)$$

Example Consider the weir equation or rating curve $Q = Ah^B$, where $A$ and $B$ are random variables with statistics $\mu_A, \sigma_A^2, \mu_B, \sigma_B^2$ and $\rho_{AB}$. The first order Taylor approximation of the mean becomes:

$$E[Q] \approx \mu_A h^{\mu_B} \qquad (8.42)$$

and the second order approximation:

$$E[Q] \approx \mu_A h^{\mu_B} + \frac{1}{2}\left(\mu_A h^{\mu_B}(\ln h)^2\right)\sigma_B^2 + \left(h^{\mu_B}\ln h\right)\rho_{AB}\sigma_A\sigma_B \qquad (8.43)$$

The variance from the first order Taylor analysis is given by:

$$\sigma_Q^2 \approx \left(h^{2\mu_B}\right)\sigma_A^2 + 2\left(\mu_A h^{2\mu_B}\ln h\right)\rho_{AB}\sigma_A\sigma_B + \left(\mu_A^2 h^{2\mu_B}(\ln h)^2\right)\sigma_B^2 \qquad (8.44)$$

As can be seen, these expressions quickly become quite extensive, especially if, due to larger variances $\sigma_A^2, \sigma_B^2$, higher order terms have to be included. The alternative then is to use Monte Carlo simulation by jointly simulating realisations of the variables $A$ and $B$ using Cholesky decomposition, as sketched below. Of course, this means that some joint probability distribution for these random variables has to be assumed.
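A minimal sketch of such a joint Monte Carlo analysis, assuming for illustration that $A$ and $B$ are bivariate Gaussian (all statistics hypothetical):

```python
import numpy as np

rng = np.random.default_rng(3)

h = 0.8                            # stage (m)
mu = np.array([5.0, 1.5])          # means of A and B
sd = np.array([0.5, 0.1])          # standard deviations
rho_AB = 0.4
C = np.array([[sd[0]**2, rho_AB * sd[0] * sd[1]],
              [rho_AB * sd[0] * sd[1], sd[1]**2]])

# jointly simulate (A, B) by Cholesky decomposition (Box 5)
L = np.linalg.cholesky(C)
M = 10000
ab = mu[:, None] + L @ rng.standard_normal((2, M))
A, B = ab

q = A * h**B                       # weir equation Q = A h^B
print(q.mean(), q.var())           # compare with Eqs (8.42)-(8.44)
```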

8.4 Spatial, temporal or spatio-temporal integrals of random functions

We consider the following relationship (here we consider space, but it could also be time or space-time):

$$Y_D = \int_{\mathbf{x}\in D} g[Z(\mathbf{x})]\, d\mathbf{x} \qquad (8.45)$$

a) Simple averaging of moments

Goal:
•  the moments of $Y_D$, e.g. $\mu_{Y_D}$ and $\sigma_{Y_D}^2$, and $COV(Y_{D_1}, Y_{D_2})$.
Conditions:
•  the function $g[\,]$ is linear;
•  the random function $Z(\mathbf{x})$ is wide sense stationary (see chapter 5);
•  the mean $\mu_Z$ and covariance function $C_Z(\mathbf{x}_1, \mathbf{x}_2) = C_Z(\mathbf{x}_2 - \mathbf{x}_1)$ are known.

If the function $g[\,]$ is linear, e.g. $g[Z(\mathbf{x})] = a + bZ(\mathbf{x})$, the moments of $Y_D$ can be evaluated by spatial integration of the moments of $Z(\mathbf{x})$ (see also sections 5.6 and 7.3.3).

The mean of $Y_D$ is given by (writing $|D|$ for the area of the domain $D$):

$$E[Y_D] = E\left[\int_{\mathbf{x}\in D}(a + bZ(\mathbf{x}))\, d\mathbf{x}\right] = a|D| + b\int_{\mathbf{x}\in D} E[Z(\mathbf{x})]\, d\mathbf{x} = a|D| + b\int_{\mathbf{x}\in D} \mu_Z\, d\mathbf{x} = a|D| + b|D|\mu_Z \qquad (8.46)$$

and the variance by:

$$\sigma_{Y_D}^2 = E[(Y_D - E[Y_D])^2] = E\left[\left(\int_{\mathbf{x}\in D}(a + bZ(\mathbf{x}))\, d\mathbf{x} - a|D| - b|D|\mu_Z\right)^2\right]$$
$$= b^2 E\left[\left(\int_{\mathbf{x}\in D}(Z(\mathbf{x}) - \mu_Z)\, d\mathbf{x}\right)^2\right] = b^2 E\left[\int_{\mathbf{x}_1\in D}(Z(\mathbf{x}_1) - \mu_Z)\, d\mathbf{x}_1 \int_{\mathbf{x}_2\in D}(Z(\mathbf{x}_2) - \mu_Z)\, d\mathbf{x}_2\right]$$
$$= b^2 \int_{\mathbf{x}_2\in D}\int_{\mathbf{x}_1\in D} E[(Z(\mathbf{x}_1) - \mu_Z)(Z(\mathbf{x}_2) - \mu_Z)]\, d\mathbf{x}_1\, d\mathbf{x}_2 = b^2 \int_{\mathbf{x}_2\in D}\int_{\mathbf{x}_1\in D} C_Z(\mathbf{x}_1, \mathbf{x}_2)\, d\mathbf{x}_1\, d\mathbf{x}_2 \qquad (8.47)$$

By the same type of derivation the covariance between the spatial averages over two domains can be derived (see also sections 5.6 and 7.3.3):

$$COV(Y_{D_1}, Y_{D_2}) = b^2 \int_{\mathbf{x}_2\in D_2}\int_{\mathbf{x}_1\in D_1} C_Z(\mathbf{x}_1, \mathbf{x}_2)\, d\mathbf{x}_1\, d\mathbf{x}_2 \qquad (8.48)$$

The spatial integrals can be solved analytically in certain cases (e.g. Vanmarcke, 1983), but are usually approximated numerically, as is explained in section 7.3.3; a discrete sketch of such an approximation is given below. Note that if the random function $Z(\mathbf{x})$ is wide sense stationary and multiGaussian, $Y_D$ will be Gaussian distributed also, and its probability density function is given through the mean (8.46) and the variance (8.47).
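A minimal sketch of the numerical approximation of (8.47), discretising a square domain into cells and assuming an exponential covariance model (all values illustrative):

```python
import numpy as np

b = 1.0
sill, a_range = 1.0, 50.0        # variance and range parameter of C_Z
L_dom, n = 100.0, 20             # square domain of side L_dom, n x n cells
dx = L_dom / n                   # cell size

# cell-centre coordinates
xc = (np.arange(n) + 0.5) * dx
X, Y = np.meshgrid(xc, xc)
pts = np.column_stack([X.ravel(), Y.ravel()])

# pairwise distances and covariances C_Z(x1, x2)
d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
C = sill * np.exp(-d / a_range)  # exponential covariance model

# Eq (8.47): double integral approximated by a double sum over cells;
# each of the two area elements dx1, dx2 contributes a factor dx**2
var_YD = b**2 * C.sum() * dx**4
print(var_YD)
```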

b) Monte Carlo simulation

Goal:
•  the moments of $Y_D$, e.g. $\mu_{Y_D}$ and $\sigma_{Y_D}^2$, $COV(Y_{D_1}, Y_{D_2})$, or its probability density $f_{Y_D}(y)$.
Conditions:
•  the multivariate probability density function of $Z(\mathbf{x})$ is known.

If $g()$ is non-linear, or if we are interested in the complete probability density function, geostatistical simulation in a Monte Carlo analysis is the appropriate method. The following steps are taken:
1. generate $M$ realisations $z^{(k)}(\mathbf{x}),\ k = 1,..,M$ of the random function $Z(\mathbf{x})$ using geostatistical simulation on a fine grid discretising the domain $D$. If $Z(\mathbf{x})$ is non-Gaussian, a transformation to a Gaussian distribution is in order, after which sequential Gaussian simulation can be applied (see sections 7.4.3 and 7.5 for elaborate descriptions of normal transforms and geostatistical simulation respectively);
2. the $M$ realisations are used as input for the spatial integral (8.45), yielding $M$ results $y_D^{(k)},\ k = 1,..,M$;
3. from the simulated values $y_D^{(k)},\ k = 1,..,M$ the moments and the probability density function of $Y_D$ can be estimated.

If the random function is observed at a number of locations, conditional realisations should be drawn (see also section 7.5). This allows one to investigate the effect of additional sampling of $Z$ on the uncertainty about $Y_D$.
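A sketch of this procedure for a small grid, using the Cholesky approach of Box 5 (Eq 8.37) instead of sequential simulation to generate the multiGaussian fields; the non-linear $g$ (an exponential, suggestive of a lognormal parameter field) and all values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(11)

mu_Z, sill, a_range = 2.0, 0.5, 30.0
L_dom, n = 100.0, 15
dx = L_dom / n

xc = (np.arange(n) + 0.5) * dx
X, Y = np.meshgrid(xc, xc)
pts = np.column_stack([X.ravel(), Y.ravel()])
d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
C = sill * np.exp(-d / a_range) + 1e-10 * np.eye(n * n)  # Eq (8.37), jittered

Lc = np.linalg.cholesky(C)
M = 500
y_D = np.empty(M)
for k in range(M):
    z = mu_Z + Lc @ rng.standard_normal(n * n)   # one field realisation
    gz = np.exp(z)                               # a non-linear g[], e.g. exp
    y_D[k] = gz.sum() * dx**2                    # spatial integral (8.45)

print(y_D.mean(), y_D.std())
```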


8.5 Vector functions

Consider a vector of model inputs (parameters, input variables, boundary conditions etc.) that are random variables: $\mathbf{z} = (Z_1,..,Z_m)^T$. The vector of model inputs is linked to a vector of model outputs $\mathbf{y} = (Y_1,..,Y_n)^T$ through a functional relationship $g()$:

$$\mathbf{y} = g(\mathbf{z}) \qquad (8.49)$$

The goal is to get the joint pdf or the moments of $\mathbf{y}$.

a) First order analysis

Required:
•  the statistical moments of $\mathbf{y}$: $\boldsymbol{\mu}_y = E[\mathbf{y}] = (E[Y_1],..,E[Y_n])^T$ and the covariance matrix $\mathbf{C}_{yy} = E[(\mathbf{y}-\boldsymbol{\mu}_y)(\mathbf{y}-\boldsymbol{\mu}_y)^T]$.
Conditions:
•  the statistical moments of $\mathbf{z}$ should be known: $\boldsymbol{\mu}_z = E[\mathbf{z}] = (\mu_1,..,\mu_m)^T$ and $\mathbf{C}_{zz} = E[(\mathbf{z}-\boldsymbol{\mu}_z)(\mathbf{z}-\boldsymbol{\mu}_z)^T]$;
•  the variances $\sigma_1^2,..,\sigma_m^2$ of the elements of $\mathbf{z}$ should not be too large.

The variances of the elements of z  σ 12 ,.., σ m2 should not be too large.

The first order analysis is in fact the first order Taylor approximation (see 8.3) applied to each of the elements of $\mathbf{y}$. The first order approximation of the mean is given by:

$$E[\mathbf{y}] \approx g(\boldsymbol{\mu}_z) \qquad (8.50)$$

To obtain the covariance matrix of $\mathbf{y}$, the covariance matrix of $\mathbf{z}$ is constructed. This is an $m \times m$ matrix with the following elements (with $\rho_{ij}$ the correlation between $Z_i$ and $Z_j$):

$$[C_{zz}(i,j)] = \rho_{ij}\sigma_i\sigma_j \qquad (8.51)$$

Also, the sensitivity matrix or Jacobian is required. This $n \times m$ matrix gives the derivatives of element $Y_i$ with respect to input $Z_j$ and has the following form:

$$\mathbf{J} = \begin{pmatrix} \dfrac{\partial y_1}{\partial z_1} & \cdots & \dfrac{\partial y_1}{\partial z_m} \\ \vdots & & \vdots \\ \dfrac{\partial y_n}{\partial z_1} & \cdots & \dfrac{\partial y_n}{\partial z_m} \end{pmatrix} \qquad (8.52)$$

Sometimes it is possible to construct this matrix analytically, i.e. if the vector function $g(\mathbf{z})$ consists, for each element $y_i$ of $\mathbf{y}$, of a separate explicit and differentiable function $g_i(z_1,..,z_m)$. However, usually this is not the case and $g(\mathbf{z})$ may represent some numerical model, e.g. a groundwater model, where the elements $y_i$ of $\mathbf{y}$ are state elements, perhaps defined at some grid. In this case, a sensitivity analysis must be performed by running the model $g()$ $m+1$ times: one baseline run where the model inputs are set at their mean values (i.e. Equation 8.50) and one run for each model input $z_j$ where $z_j$ is slightly changed, e.g. $z_j = \mu_j + \Delta z_j$. From these runs the changes in the values of the elements $y_i$, e.g. $\Delta y_i$, are calculated and the derivatives are subsequently estimated as:

$$\frac{\partial y_i}{\partial z_j} \approx \frac{\Delta y_i}{\Delta z_j} \qquad (8.53)$$

With the sensitivity matrix and the covariance matrix, a first order approximation of the covariance matrix of $\mathbf{y}$ can be provided:

$$\mathbf{C}_{yy} = \mathbf{J}\mathbf{C}_{zz}\mathbf{J}^T \qquad (8.54)$$
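A minimal sketch of this procedure for a generic, hypothetical model function $g$, estimating the Jacobian by finite differences (Eq 8.53) and propagating the covariance (Eq 8.54):

```python
import numpy as np

def g(z):
    # a hypothetical non-linear model with m = 2 inputs and n = 2 outputs
    return np.array([z[0] * z[1], np.exp(0.1 * z[0]) + z[1]**2])

mu_z = np.array([4.0, 2.0])
C_zz = np.array([[0.25, 0.05],
                 [0.05, 0.04]])

# baseline run plus one perturbed run per input (m + 1 runs in total)
y0 = g(mu_z)
n, m = y0.size, mu_z.size
J = np.empty((n, m))
for j in range(m):
    dz = 1e-4 * max(abs(mu_z[j]), 1.0)   # perturbation Delta z_j
    zp = mu_z.copy()
    zp[j] += dz
    J[:, j] = (g(zp) - y0) / dz          # Eq (8.53)

C_yy = J @ C_zz @ J.T                    # Eq (8.54)
print(y0, C_yy)
```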

Some additional remarks about this method:
•  The above equations have been developed for stochastic input variables with prescribed means and variances. Of course, as is the case with the Taylor approximations in sections 8.2 and 8.3, this method can also be used as a first order approximation of a prediction error covariance. In that case the prediction equation becomes:

$$\hat{\mathbf{y}} \approx g(\hat{\mathbf{z}}) \qquad (8.55)$$

with $\hat{\mathbf{z}} = (\hat{Z}_1,..,\hat{Z}_m)^T$ the predicted values of the model inputs and $\mathbf{C}_{\hat{z}\hat{z}} = E[(\mathbf{z}-\hat{\mathbf{z}})(\mathbf{z}-\hat{\mathbf{z}})^T]$ the covariance matrix of the prediction errors; and similarly for $\mathbf{y}$. Equation (8.54) then becomes:

$$\mathbf{C}_{\hat{y}\hat{y}} = \mathbf{J}\mathbf{C}_{\hat{z}\hat{z}}\mathbf{J}^T \qquad (8.56)$$


•  If the function $g()$ is a matrix multiplication

$$\mathbf{y} = \mathbf{A}\mathbf{z} \qquad (8.57)$$

the system is linear, the elements of the sensitivity matrix are exactly the elements of the matrix $\mathbf{A}$, i.e. $[j_{ij}] = [a_{ij}]$, and an exact equation for the covariance of $\mathbf{y}$ is given by:

$$\mathbf{C}_{yy} = \mathbf{A}\mathbf{C}_{zz}\mathbf{A}^T \qquad (8.58)$$

If on top of that $\mathbf{z}$ has a multivariate Gaussian distribution, then $\mathbf{y}$ is also multivariate Gaussian and the derived mean and covariance of $\mathbf{y}$ completely determine its probability distribution (using Equation 3.87 with $\boldsymbol{\mu}_y$ and $\mathbf{C}_{yy}$).

•  This method can also be used for transient models. Suppose that the following model applies:

$$\mathbf{y}(t) = g(\mathbf{z}, t) \qquad (8.59)$$

where $\mathbf{y}$ is the outcome of some dynamic model, e.g. a transient groundwater model, with stochastic inputs or parameters $\mathbf{z}$. Then $m+1$ transient runs can be performed, i.e. the baseline and one for each perturbed parameter, and for each time at which it is required the sensitivity can be determined:

$$\frac{\partial y_i}{\partial z_j}(t) \approx \frac{\Delta y_i(t)}{\Delta z_j} \qquad (8.60)$$

and the covariance of $\mathbf{y}(t)$ can be approximated at each time as:

$$\mathbf{C}_{yy}(t) = \mathbf{J}(t)\mathbf{C}_{zz}\mathbf{J}^T(t) \qquad (8.61)$$

b) Monte Carlo simulation

In case of strong non-linearity of $g()$ or large variances of the elements of $\mathbf{z}$, the linear approximation no longer works. In that case Monte Carlo simulation is to be applied, as described before: 1) $M$ realisations of $\mathbf{z}$ are simulated (e.g. using Cholesky decomposition as shown in Box 5); 2) the $M$ simulated realisations of $\mathbf{z}$ are used as input for $g()$, yielding $M$ realisations of $\mathbf{y}$; 3) the statistics of $\mathbf{y}$ can be estimated from the $M$ realisations of $\mathbf{y}$.

8.6 Differential equations with a random variable

Consider a partial differential equation with two random parameters, e.g. the groundwater equation in two spatial dimensions with a homogeneous and isotropic but random transmissivity $T$ and a homogeneous storage coefficient $S$:

$$S\frac{\partial H}{\partial t} = \frac{\partial}{\partial x}\left(T\frac{\partial H}{\partial x}\right) + \frac{\partial}{\partial y}\left(T\frac{\partial H}{\partial y}\right) = T\left(\frac{\partial^2 H}{\partial x^2} + \frac{\partial^2 H}{\partial y^2}\right) \qquad (8.62)$$

What can immediately be seen from (8.62) is that, even though the transmissivity $T$ is random, it can be placed outside the derivatives because it does not change with space. If the transmissivity is a single random variable, there are effectively two ways of obtaining the statistics of the random head $H$, depending on whether an analytical solution is available.
1. If an analytical solution is available, Equation (8.62) can be solved for given $T$ and $S$ (as if they were deterministic) to produce an explicit relation between $H(\mathbf{x},t)$ and $S$ and $T$. This relation would generally be non-linear, and a Taylor approximation could be used to derive the statistics of $H(\mathbf{x},t)$ from the joint statistics of $S$ and $T$: $\mu_S, \sigma_S^2, \mu_T, \sigma_T^2$ and $\rho_{ST}$. If a Taylor approximation is not appropriate, e.g. because the variances of $S$ and $T$ are large, then realisations of $S$ and $T$ can be simulated assuming some joint distribution $f(s,T)$ of $S$ and $T$ (if multiGaussian, Cholesky decomposition can be used). These realisations can then be plugged into the analytical solution of (8.62) to produce realisations of $H(\mathbf{x},t)$, from which its statistics can be obtained.
2. If no analytical solution can be obtained, then Monte Carlo simulation in combination with a numerical method, e.g. finite elements or finite differences, is the only option. A large number $M$ of realisations of $S$ and $T$ are simulated assuming a joint distribution $f(s,T)$ of $S$ and $T$ (if multiGaussian, Cholesky decomposition can be used). The $M$ simulated realisations are used as parameters in the equations solved by the finite difference or finite element scheme. The numerical solution is obtained for each simulated parameter set to yield $M$ realisations $H^{(k)}(\mathbf{x}_i, t_j),\ k = 1,..,M$ at a finite number of points in space and time. From these, the statistics (mean, variance, spatial and temporal covariance) of $H(\mathbf{x},t)$ can be obtained.

So, in short: if random parameters are involved in differential equations but are not random functions of space or time, they can be treated as deterministic while solving the differential equation and analysed as stochastic variables afterwards; that is, if an analytical solution can be found.

Example Consider the following differential equation describing the concentration of some pollutant in a lake as a function of time:

$$v\frac{dC}{dt} = -vKC + q_{in} \qquad (8.63)$$

where $v$ is the volume of the lake (assumed constant and known), $q_{in}$ is the constant and known input load, and $K$ is a random decay coefficient. The solution to this equation is (with known initial concentration $C(0) = 0$):

$$C(t) = \frac{q_{in}}{vK}\left(1 - e^{-Kt}\right) \qquad (8.64)$$

Now the statistics of $C(t)$ can be derived from those of $K$. For instance, using a first order Taylor approximation (see section 8.2), the following relation can be derived for the variance:

$$\sigma_C^2(t) \approx \frac{q_{in}^2}{v^2 \mu_K^4}\left(e^{-\mu_K t}(1 + \mu_K t) - 1\right)^2 \sigma_K^2 \qquad (8.65)$$

Figure 8.4 shows the development of the mean concentration with time, as well as the confidence band of one standard deviation, based on a first order Taylor analysis and the assumption that $C$ is Gaussian distributed. The following parameters are used: $q_{in}/v = 100$ (mg m$^{-3}$ year$^{-1}$), $\mu_K = 0.5$ (year$^{-1}$), $\sigma_K^2 = 0.01$ (year$^{-2}$).

[Figure: concentration (mg/m³) against time (0 to 14 years), with the mean concentration rising from 0 towards the equilibrium value $q_{in}/(v\mu_K) = 200$ mg/m³, flanked by one-standard-deviation bounds.]
Figure 8.4 Evolution of the concentration of a pollutant in a lake described by Equation (8.64) with random decay rate $K$. The mean concentration (central line) and one standard deviation prediction intervals (outer lines) are approximated by a first order Taylor analysis; parameters: $q_{in}/v = 100$ (mg m$^{-3}$ year$^{-1}$), $\mu_K = 0.5$ (year$^{-1}$), $\sigma_K^2 = 0.01$ (year$^{-2}$).
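A minimal sketch reproducing this Taylor band and checking it against Monte Carlo, under the stated parameters (the Gaussian assumption for $K$ is illustrative):

```python
import numpy as np

q_over_v = 100.0   # q_in / v  (mg m^-3 yr^-1)
mu_K, var_K = 0.5, 0.01
t = np.linspace(0.0, 14.0, 141)

# first order Taylor approximation (Eqs 8.64 and 8.65)
mean_C = q_over_v / mu_K * (1.0 - np.exp(-mu_K * t))
sd_C = np.sqrt(q_over_v**2 / mu_K**4
               * (np.exp(-mu_K * t) * (1.0 + mu_K * t) - 1.0)**2
               * var_K)

# Monte Carlo check: draw K and evaluate the analytical solution (8.64)
rng = np.random.default_rng(5)
K = rng.normal(mu_K, np.sqrt(var_K), size=5000)
C = q_over_v / K[:, None] * (1.0 - np.exp(-np.outer(K, t)))
print(mean_C[-1], sd_C[-1], C.mean(axis=0)[-1], C.std(axis=0)[-1])
```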


8.7 Stochastic differential equations

As a last case, consider the following differential equations:
1. The transient groundwater equation for two-dimensional flow with heterogeneous storage coefficient and transmissivity, described as random space functions:

$$S(x,y)\frac{\partial H}{\partial t} = \frac{\partial}{\partial x}\left(T(x,y)\frac{\partial H}{\partial x}\right) + \frac{\partial}{\partial y}\left(T(x,y)\frac{\partial H}{\partial y}\right) \qquad (8.66)$$

2. The evolution of the lake concentration with the decay rate as a random function of time:

$$v\frac{dC}{dt} = -vK(t)C + q_{in} \qquad (8.67)$$

In both of these cases we cannot hope to find an analytical solution given a particular realisation of the random functions $S(\mathbf{x})$, $T(\mathbf{x})$ or $K(t)$, as the behaviour of random functions is generally wild and not described by a simple analytical expression. There are two alternatives for solving these stochastic differential equations. The first alternative is to assume some form of stationarity of the random functions and then develop differential equations in terms of the moments of the dependent variables and the moments of the random inputs. As the latter are assumed (wide sense) stationary, their moments are constant, such that analytical solutions may be feasible. The second alternative is to use Monte Carlo simulation, i.e. simulate realisations of the random functions and use these as input for the differential equations. The differential equations are subsequently solved for each realisation with a numerical scheme such as finite differences or finite elements; a sketch of this approach applied to (8.67) is given below.
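As an illustration of the second alternative, a minimal Monte Carlo sketch for (8.67); the choice of a stationary AR(1)-type process for $K(t)$ and all parameter values are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(13)

q_over_v = 100.0
mu_K, sd_K, corr_len = 0.5, 0.1, 1.0   # mean, sd, correlation time of K(t)
dt, n_steps, M = 0.01, 1400, 1000      # Euler steps of 0.01 yr over 14 yr

phi = np.exp(-dt / corr_len)           # AR(1) coefficient for K(t)
C_end = np.empty(M)
for k in range(M):
    K, C = mu_K, 0.0
    for _ in range(n_steps):
        # advance the random decay rate (stationary AR(1) process)
        K = mu_K + phi * (K - mu_K) \
            + sd_K * np.sqrt(1.0 - phi**2) * rng.standard_normal()
        # explicit Euler step of dC/dt = -K C + q_in/v
        C += dt * (-K * C + q_over_v)
    C_end[k] = C

print(C_end.mean(), C_end.std())       # statistics of C at t = 14 yr
```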

In the following, examples of both approaches are given. The field of stochastic subsurface hydrology is the most advanced in terms of analytical stochastic analysis of parameter heterogeneity, with many papers in various hydrological journals, in particular in Water Resources Research. Although not very recent, extensive overviews of the advances in this field can be found in Dagan (1989) and Gelhar (1993). The applications in these books mostly pertain to flow and transport in porous media described with wide sense stationary stochastic functions and assuming infinite domains with uniform flow. Since the appearance of these books, advances have been made on finding (quasi-)analytical solutions for finite domains, non-uniform flow, random boundary conditions, unsaturated flow, two-phase flow, non-stationary (even fractal) random media, and fractal porous media. A more recent book with some advanced topics is written by Zhang (2002).


Box 6 Mean square differentiable random functions and white noise

In both approaches, we would like to use the rules of standard differential and integral calculus. For these standard rules to apply, the random functions involved have to be mean square differentiable, i.e. the following limit expression should exist (a similar expression can be formulated for random functions of space):

$$\lim_{\tau\to 0} E\left[\left(\frac{Z(t+\tau) - Z(t)}{\tau} - \frac{dZ}{dt}\right)^2\right] = 0 \qquad (8.68)$$

So, averaged over all possible realisations, the quadratic difference between the derivative of the random function and its finite difference approximation should approach zero if we look at increasingly smaller intervals $\tau$. If (8.68) exists, and we interpret the differential $dZ/dt$ in the mean square sense, normal calculus rules can thus be applied. A necessary and sufficient condition for this to apply is that the following integral is finite (Gelhar, 1993, p. 53):

$$\int_{-\infty}^{\infty} \omega^2 S_Z(\omega)\, d\omega < \infty \qquad (8.69)$$

where $S_Z(\omega)$ is the spectral density function of the random function $Z(t)$. An alternative necessary and sufficient condition is that the second derivative of the covariance function is finite at lags approaching zero (Gelhar, 1993):

$$\lim_{\tau\to 0} \frac{d^2 C_Z(\tau)}{d\tau^2} < \infty$$
