Bayesian Statistics
Introduction to Bayesian Statistics (And some computational methods)
Theo Kypraios, http://www.maths.nott.ac.uk/~tk — MSc in Applied Bioinformatics @ Cranfield University.
Statistics and Probability Research Group
My Background
Bayesian Statistics;
Computational methods, such as Markov chain Monte Carlo and Sequential Monte Carlo (MCMC & SMC);
Large and complex (real) data analysis, mainly Infectious Disease Modelling, Neuroimaging, Time Series, ...
Recent interest in Bioinformatics (Gene Expression Data).
Outline of the Talk
1. Why is (statistical) modelling useful?
2. The Frequentist/Classical approach to inference.
3. The Bayesian paradigm to inference: theory and examples.
4. More advanced concepts (e.g. model choice/comparison).
5. Conclusions.
Use of Statistics

Examples include:
Sample size determination;
Comparison between two (or more) groups: t-tests, Z-tests, analysis of variance (ANOVA), tests for proportions, etc.;
Classification/Clustering;
...
Statistical Modelling
Why Statistical Modelling?

[Figure: scatterplot of response (y) against explanatory (x).]
Suppose that we are interested in investigating the association between x and y. Isn't it enough just to calculate the correlation (ρ) between x and y?
Why Statistical Modelling?

Perhaps, for this dataset, it is enough ... ρ̂ = 0.83 indicates a strong correlation between x and y.
Why Statistical Modelling?

What about this dataset? The correlation coefficient turns out to be ≈ 0.5.

[Figure: scatterplot of response (y) against explanatory (x).]
Why Statistical Modelling?

What about this dataset? The correlation coefficient turns out to be ≈ −0.78.

[Figure: scatterplot of response (y) against explanatory (x).]
Statistical Modelling

One of the best ways to describe some data is by fitting a (statistical) model. Examples:

1. y = α + β·x + error
2. y = α + β·x + γ·x² + error
3. y = 1 − 1/(1 + α·x) − β·x + error

The model parameters (α, β, γ) tell us much more about the relationship between x and y than just the correlation coefficient does ... What about a more general model?

4. y = f(θ, x) + error
Aims of Statistical Modelling

In statistical modelling we are interested in estimating the unknown parameters from data → Statistical Inference. Parameter estimation needs to be done in a formal way. In other words, we ask ourselves: what are the "best" values, say, for α and β such that the proposed model "best" describes the observed data? And what do we mean by "best"? Should we only look for a single estimate for (α, β)? No!
Least Squares Estimation

[Figure: scatterplot of the data with a fitted line.]

Find the values of α and β which minimise the squared difference (distance) between what the model predicts and the data (a.k.a. Least Squares Estimation (LSE)). What about other pairs (α, β) (perhaps very different from each other) which describe the observed data equally well → uncertainty in parameter estimation ...
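For the simple linear model y = α + β·x, the least-squares minimisers have a well-known closed form, which the sketch below implements in plain Python (the data here are made up purely for illustration):

```python
def least_squares(xs, ys):
    """Closed-form least-squares estimates for y = alpha + beta * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # beta_hat = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    beta = sxy / sxx
    alpha = y_bar - beta * x_bar
    return alpha, beta

# Noise-free points on the line y = 1 + 2x should recover (alpha, beta) = (1, 2)
xs = [-3, -2, -1, 0, 1, 2]
ys = [1 + 2 * x for x in xs]
print(least_squares(xs, ys))  # -> (1.0, 2.0)
```

With noisy data the recovered pair is only an estimate, which is exactly the uncertainty issue raised above.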
Classical (or Frequentist) Inference
Statistical Approach: The Likelihood Function
The likelihood function plays a fundamental role in statistical inference. In non-technical terms, the likelihood function is a function which, when evaluated at a particular point, say (α0, β0), gives the probability of observing the (observed) data given that the parameters (α, β) take the values α0 and β0.
The Likelihood Function - A Toy Example

Let us think of a very simple example. Consider a Binomial experiment:
n trials (e.g. toss a coin n times);
model the number x of successes (e.g. got heads x times).

Suppose we are interested in estimating the probability of success (denoted by θ) for one particular experiment. Data: out of the 100 times we repeated the experiment, we observed 80 successes. What about L(0.1), L(0.7), L(0.99)?
The Likelihood Function - A Toy Example

What L(0.7) really means is: "how likely is it to observe heads 80 times out of tossing a coin 100 times if the true (but unknown) probability of success is 0.7?" In other words,

L(0.7) = P(X = 80 | θ = 0.7) = C(100, 80) · 0.7^80 · (1 − 0.7)^(100−80) ≈ 0.0076

But if we can evaluate L(0.7), then we can evaluate L(θ) for all possible values of θ.
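This calculation is easy to reproduce; a minimal Python sketch evaluating the Binomial likelihood on a grid of candidate values for θ:

```python
from math import comb

def likelihood(theta, x=80, n=100):
    """Binomial likelihood L(theta) = P(X = x | theta)."""
    return comb(n, x) * theta ** x * (1 - theta) ** (n - x)

print(likelihood(0.7))  # L(0.7)

# Evaluate L on a grid of candidate values for theta
grid = [i / 100 for i in range(1, 100)]
L = [likelihood(t) for t in grid]
best = grid[L.index(max(L))]
print(best)  # grid maximiser; coincides with the MLE x/n = 0.8
```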
The Likelihood Function - A Toy Example

[Figure: plot of L(theta) against theta for theta in (0, 1); the likelihood is sharply peaked around theta = 0.8.]
Classical (Frequentist) Inference

Frequentist inference tells us that we should:
look for parameter values that maximise the likelihood function → the maximum likelihood estimator (MLE);
associate the parameters' uncertainty with the calculation of standard errors (SE) ...
... which in turn enable us to construct confidence intervals for the parameters, e.g. a 95% CI:

θ̂ ± 1.96 · sqrt(var(θ̂))   or   θ̂ ± 1.96 · SE(θ̂)

For this example, this turns out to be (0.8 − 1.96 · 0.04, 0.8 + 1.96 · 0.04) = (0.72, 0.88).
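For the Binomial example the standard error has a simple plug-in form, SE(θ̂) = sqrt(θ̂(1 − θ̂)/n); a quick check of the interval above:

```python
from math import sqrt

n, x = 100, 80
theta_hat = x / n                             # MLE = 0.8
se = sqrt(theta_hat * (1 - theta_hat) / n)    # sqrt(0.8 * 0.2 / 100) = 0.04
ci = (theta_hat - 1.96 * se, theta_hat + 1.96 * se)
print(tuple(round(c, 2) for c in ci))  # -> (0.72, 0.88)
```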
Interval Estimation
Having obtained the variance-covariance matrix, we can then construct confidence intervals for the parameters based on sampling theory. Such an approach is based on the notion that:
1. if the experiment were repeated many times,
2. and a maximum likelihood estimate as well as a confidence interval were derived each time,
3. then, on average, the interval estimates would contain the true parameter 100(1 − α)% of the time.
Interval Estimation
What's wrong with that? Nothing, but ... it is approximate, counter-intuitive (the data are assumed to be random, the parameter is assumed to be fixed) and mathematically intractable for complex scenarios.
Some (Other) Issues with this Approach

For instance, recall the previous experiment: we cannot ask (or even answer!) questions such as
1. "what is the chance that the unknown parameter (i.e. the probability of success) is greater than 0.6?", i.e. compute the quantity P(θ > 0.6 | data) ...
2. or something like P(0.3 < θ < 0.9 | data).

Sometimes we are interested in (not necessarily linear) functions of the parameters, e.g.

θ1 + θ2,   or   [θ1/(1 − θ1)] / [θ2/(1 − θ2)]

Whilst in some cases the frequentist approach offers a solution which is not exact but approximate, there are others for which it cannot, or it is very hard to do so.
Bayesian Inference
Bayesian Inference
When drawing inference within a Bayesian framework, the data are treated as a fixed quantity and the parameters are treated as random variables. That allows us to assign probabilities to parameters (and models), making the inferential framework far more intuitive and more straightforward (at least in principle!).
Bayesian Inference (2)

Denote by θ the parameters and by y the observed data. Bayes' theorem allows us to write:

π(θ|y) = π(y|θ)π(θ) / π(y) = π(y|θ)π(θ) / ∫ π(y|θ)π(θ) dθ

where
π(θ|y) denotes the posterior distribution of the parameters given the data;
π(y|θ) = L(θ) is the likelihood function;
π(θ) is the prior distribution of θ, which expresses our beliefs about the parameters before we see the data;
π(y) is often called the marginal likelihood and plays the role of the normalising constant of the density of the posterior distribution.
Bayesian Inference (2)
In a nutshell, the Bayesian paradigm provides us with a distribution for what we have learned about the parameter from the data. In contrast to the frequentist approach, with which we get a point estimate (MLE) and a standard error (SE), in the Bayesian world we get a whole distribution (i.e. we get much more for our money!).
The Posterior Distribution
Bayesian Inference (3)

We can write the posterior distribution as follows:

π(θ|y) = π(y|θ)π(θ) / ∫ π(y|θ)π(θ) dθ

The density of the posterior distribution is proportional to the likelihood times the prior density;
The posterior distribution tells us everything we need to know about the parameter;
Statements such as P(θ > k) or P(θ/(1 + θ) > k), where k is a constant, make sense, since θ is a random variable; in addition, they are very useful in modelling.
Why Having the Distribution is Very Useful?

[Figures: three examples of posterior densities of θ, plotted over the ranges (0, 2), (0, 8) and (0.2, 1.0) respectively.]
The Prior

Recall that:

π(θ|y) = π(y|θ)π(θ) / π(y) = π(y|θ)π(θ) / ∫ π(y|θ)π(θ) dθ
Bayesian Inference: The Prior

One of the biggest criticisms of the Bayesian paradigm is the use of the prior distribution.
"Couldn't I choose a very informative prior and come up with a favourable result?" Yes, but this is bad science!
"I know nothing about the parameter; what prior should I choose?" Choose an uninformative (or vague) prior → more details shortly.
If there is a lot of data available, then the posterior distribution will not be influenced so much by the prior, and vice versa.
Some Examples on the Effect of the Prior

[Figures: prior, likelihood and posterior densities for the probability of success θ, plotted together over (0, 1); shown for data with 83/100 successes (under several different priors) and for data with 8/10 successes.]
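The slides do not state which priors were used, but for Binomial data a Beta prior is conjugate: a Beta(a, b) prior combined with x successes in n trials gives a Beta(a + x, b + n − x) posterior. The sketch below uses two illustrative priors (flat Beta(1,1) and a deliberately informative Beta(50,50) — assumptions, not the slides' actual choices) to show how the prior's influence depends on the amount of data:

```python
def beta_posterior(a, b, x, n):
    """Beta(a, b) prior + Binomial data (x successes in n trials)
    gives a Beta(a + x, b + n - x) posterior (conjugacy)."""
    return a + x, b + n - x

def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)

for x, n in [(83, 100), (8, 10)]:
    for a, b in [(1, 1), (50, 50)]:  # flat vs informative prior (illustrative)
        pa, pb = beta_posterior(a, b, x, n)
        print(f"{x}/{n}, Beta({a},{b}) prior -> posterior mean {beta_mean(pa, pb):.3f}")
```

The stronger the prior relative to the sample size, the further the posterior mean is pulled away from the observed proportion — precisely the take-home message of these slides.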
The Prior Distribution
Take-home message: be rather careful with the choice of prior, which of course is (or can be) subjective!
(Back to) The Posterior Distribution
Taking a Closer Look at the Formulas

We can write the posterior distribution as follows:

π(θ|y) = π(y|θ)π(θ) / π(y) = π(y|θ)π(θ) / ∫ π(y|θ)π(θ) dθ ∝ π(y|θ)π(θ)    (1)

where π(y) is often called the marginal likelihood and plays the role of the normalising constant of the density of the posterior distribution, i.e. it makes the area under the curve π(y|θ)π(θ) integrate to one, i.e. gives a proper probability density function.
How to Deal with the Normalising Constant?

If we were only interested in finding the value θ_MAP at which π(θ|y) is maximised, then there would be no need to compute the normalising constant π(y). Nevertheless, suppose that we want to get a summary statistic from our posterior and compute, for instance, a posterior expectation, e.g.

E[θ|y] = ∫ θ · π(θ|y) dθ

or the posterior variance, etc. That, of course, requires knowledge of the full expression of π(θ|y), i.e. not just up to a normalising constant. How to compute this integral then? Numerical integration techniques? Can we "guess"? Or ...
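For a one-dimensional θ, numerical integration is indeed feasible. Sticking with the coin example (80/100 successes, flat prior on θ), the marginal likelihood π(y) = ∫ L(θ) dθ has the known closed-form value 1/(n + 1) = 1/101, which a simple trapezoidal rule recovers; a Python sketch:

```python
from math import comb

def unnorm_posterior(theta, x=80, n=100):
    """Likelihood times a flat prior: the unnormalised posterior density."""
    return comb(n, x) * theta ** x * (1 - theta) ** (n - x)

# Trapezoidal rule for the normalising constant pi(y) over [0, 1]
m = 10_000
h = 1 / m
grid = [i * h for i in range(m + 1)]
vals = [unnorm_posterior(t) for t in grid]
marginal = h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))
print(marginal)  # close to 1/101 ≈ 0.009901
```

This works in one dimension; the point of the next slides is that for high-dimensional θ such grids become infeasible, motivating sampling-based methods.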
Do we Really Need to Compute the Normalising Constant?

Instead of deriving the full expression of the probability density function of θ|y explicitly, we could draw samples from π(θ|y). If we have samples from π(θ|y), then we can approximate the posterior expectation as follows:

E[θ|y] ≈ (1/M) · Σ_{j=1}^{M} θ_j,   θ_j ∼ π(θ|y)

Therefore, the only thing we need to come up with is a method which will allow us to draw samples from π(θ|y) without the need to evaluate the normalising constant.
Derive the Posterior Distribution via Sampling-Based Inference
Sampling-Based Inference

If we can draw samples from the posterior distribution π(θ|y), then we can do everything we want/need:
Estimate the density (kernel density estimation, histogram);
Estimate moments (e.g. means, variances), probabilities, etc.;
Derive the distribution of (not necessarily linear) functions of the parameters g(θ) in a very straightforward manner;
Visualise the relationship of two or more model parameters.
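These capabilities are direct to sketch in code. Assuming (as an illustration, not something stated on the slides) that the coin example's posterior under a flat prior is Beta(81, 21), posterior samples immediately answer the questions the frequentist approach struggled with, such as P(θ > 0.6 | data) and the distribution of the odds g(θ) = θ/(1 − θ):

```python
import random

random.seed(1)

# Illustrative posterior: Beta(81, 21), i.e. 80/100 successes with a flat prior
M = 100_000
samples = [random.betavariate(81, 21) for _ in range(M)]

# P(theta > 0.6 | data): the proportion of posterior samples above 0.6
p_gt = sum(t > 0.6 for t in samples) / M

# Distribution of a nonlinear function g(theta) = theta / (1 - theta), the odds
odds = [t / (1 - t) for t in samples]
post_mean_odds = sum(odds) / M

print(p_gt, post_mean_odds)
```

No normalising constant was needed anywhere: every quantity is just a sample average.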
A Toy Example on Sampling-Based Inference

Suppose that the random variable X comes from a Gamma distribution with the following probability density function (pdf):

f_X(x | α, β) = (β^α / Γ(α)) · x^(α−1) · exp{−βx},   α, β > 0

For any given α and β, the expectation, i.e. the mean of X, is derived by doing this integral:

E[X] = ∫ x · f_X(x) dx

and we also know that the probability

P(X < 0.5) = ∫_0^{0.5} f_X(x) dx
A Toy Example on Sampling-Based Inference

1. Suppose that, somehow, we have a way of simulating realizations from the Gamma distribution ...
2. ... and draw N samples, where N is a large number, e.g. 100,000.
3. We then plot the histogram of these N values by doing something like this in R:

hist(rgamma(10^5, 5, 3), prob=TRUE, main="Samples from Gamma(5,3)",
     xlab=expression(x), col=2)
A Toy Example on Sampling-Based Inference

We get something like this:

[Figure: histogram "Samples from Gamma(5,3)", density on the vertical axis, x ranging from 0 to 7.]
A Toy Example on Sampling-Based Inference

We get something like this, and draw f_X(x) on top:

[Figure: the same histogram with the Gamma(5,3) density curve f_X(x) overlaid.]
A Toy Example on Sampling-Based Inference

That means that we can "approximate" (or "estimate") the mean E[X] by the sample mean, i.e.

Ê[X] = (1/N) · Σ_{i=1}^{N} x_i

and the probability P(X < 0.5) by the proportion of the values in the sample which are less than 0.5, i.e.

P̂(X < 0.5) = (1/N) · Σ_{i=1}^{N} 1(x_i < 0.5)

Of course, these "approximations" get better and better as N → ∞.
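The same toy example can be run with Python's standard library (note that random.gammavariate is parameterised by shape and scale, so the rate β = 3 becomes scale 1/3):

```python
import random

random.seed(42)

# Draw N samples from Gamma(alpha=5, rate beta=3); gammavariate takes (shape, scale)
N = 100_000
samples = [random.gammavariate(5, 1 / 3) for _ in range(N)]

# Sample-mean estimate of E[X]; the true value is alpha/beta = 5/3
mean_est = sum(samples) / N

# Proportion of samples below 0.5 estimates P(X < 0.5)
p_est = sum(x < 0.5 for x in samples) / N

print(mean_est, p_est)
```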
An Example of a Bivariate Distribution

[Figure: scatterplot of samples drawn from a bivariate distribution, visualising the joint relationship between two model parameters.]
● ● ● ●● ●● ● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ●● ● ● ● ● ● ● ● ● ●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
2
y
4
●
●
0
●
● ● ●
●
● ●
● ●
●
● ● ●
●
● ● ●
● ●
● ●● ● ●●
● ●
●
●●
●
● ●
● ●● ● ●
●
●
● ●
● ●
● ● ●
●
●
●
●
●
● ● ●
●
● ● ●
●
●
−2
● ●
−2
0
2
4
x
50 / 1
Derive the Posterior Distribution via Analytic Integration
51 / 1
Obtaining the Posterior Distribution Analytically The basic idea relies on the following simple observation. Consider, for example, a random variable X which follows a Gamma distribution (as in the example before) with pdf:

f_X(x|α, β) = (β^α / Γ(α)) · x^(α−1) · exp{−βx}.

Since this is a proper pdf, it holds that

∫_X (β^α / Γ(α)) · x^(α−1) · exp{−βx} dx = 1

and rearrangement gives us that

∫_X x^(α−1) · exp{−βx} dx = Γ(α) / β^α.

Of course, the same idea applies to other well-known distributions and their densities. 52 / 1
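The rearranged Gamma identity can be checked numerically. The sketch below (α and β are illustrative values, not taken from the slides) compares a midpoint-rule approximation of the integral against Γ(α)/β^α:

```python
import math

# Check numerically that ∫₀^∞ x^(α−1) exp{−βx} dx = Γ(α)/β^α
# (α = 3, β = 2 are illustrative values).
alpha, beta = 3.0, 2.0
analytic = math.gamma(alpha) / beta ** alpha  # Γ(3)/2³ = 2/8 = 0.25

# Midpoint rule on [0, 40]; the integrand's tail beyond 40 is negligible here.
h = 0.001
numeric = sum(
    ((i + 0.5) * h) ** (alpha - 1) * math.exp(-beta * (i + 0.5) * h)
    for i in range(int(40 / h))
) * h

print(round(analytic, 4), round(numeric, 4))  # both ≈ 0.25
```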
Obtaining the Posterior Distribution Analytically By making use of this idea, it is often the case that with a suitable choice of prior distribution we can avoid calculating the normalising constant. Recall the Binomial(n, θ) example: our parameter of interest is a probability, and therefore can take values only between 0 and 1. A suitable prior distribution which takes this into account is the Beta distribution with some parameters, say λα and λβ:

π(θ) = (1 / B(λα, λβ)) · θ^(λα−1) · (1−θ)^(λβ−1),   λα, λβ > 0,  0 < θ < 1

53 / 1
Obtaining the Posterior Distribution Analytically The prior density is

π(θ) = (1 / B(λα, λβ)) · θ^(λα−1) · (1−θ)^(λβ−1).

The likelihood function is:

π(x|θ) = L(θ) = (n choose x) · θ^x · (1−θ)^(n−x).

To derive the posterior density we just multiply the two terms:

π(θ|x) ∝ θ^x · (1−θ)^(n−x) · θ^(λα−1) · (1−θ)^(λβ−1)
π(θ|x) ∝ θ^(x+λα−1) · (1−θ)^(n+λβ−x−1)

Note that we have derived π(θ|x) up to proportionality! 54 / 1
Obtaining the Posterior Distribution Analytically In this case the normalising constant

∫_θ θ^(x+λα−1) · (1−θ)^(n+λβ−x−1) dθ

is equal to B(x + λα, n + λβ − x), because we can make use of the basic idea described earlier: since

∫_u (1 / B(A, C)) · u^(A−1) · (1−u)^(C−1) du = 1,

it follows that

∫_u u^(A−1) · (1−u)^(C−1) du = B(A, C),

which holds for any A, C > 0. Here A = x + λα and C = n + λβ − x. 55 / 1
Obtaining the Posterior Distribution Analytically
This allows us to say that

θ|x ∼ Beta(x + λα, n + λβ − x),

which is very convenient because we know a lot about the Beta distribution: its mean, its variance, etc. Summary: using a Beta distribution as a prior for θ led to a Beta posterior distribution for θ. This is a special case of what are called conjugate priors.
56 / 1
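The conjugate Beta-Binomial update above is easy to verify numerically. The sketch below (the values of n, x and the prior parameters are illustrative assumptions, not taken from the slides) compares the analytic posterior mean (x + λα)/(n + λα + λβ) against a brute-force normalisation of the unnormalised posterior on a grid:

```python
# Beta-Binomial conjugacy check (illustrative values, not from the slides):
# x successes in n trials, Beta(a, b) prior on θ, where a = λα, b = λβ.
n, x = 20, 14
a, b = 2.0, 2.0

# Conjugate result from the slides: θ|x ~ Beta(x + a, n - x + b).
post_a, post_b = x + a, n - x + b
analytic_mean = post_a / (post_a + post_b)

# Brute force: normalise θ^(x+a-1) (1-θ)^(n-x+b-1) on a midpoint grid.
def unnorm(theta):
    return theta ** (x + a - 1) * (1 - theta) ** (n - x + b - 1)

grid = [(i + 0.5) / 10000 for i in range(10000)]
weights = [unnorm(t) for t in grid]
z = sum(weights)
numeric_mean = sum(t * w for t, w in zip(grid, weights)) / z

print(round(analytic_mean, 4), round(numeric_mean, 4))  # both ≈ 0.6667
```

The grid normalisation plays the role of the integral we computed analytically; with conjugacy we never actually need it.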
A Recap
57 / 1
Bayesian vs Frequentist Inference
Everything is assigned a distribution (prior, posterior); we are allowed to incorporate prior information about the parameter . . . which is then updated using the likelihood function . . . leading to the posterior distribution, which tells us everything we need to know about the parameter.
58 / 1
Other Aspects
59 / 1
Bayesian Inference via MCMC
There exists a wealth of algorithms, techniques and approaches which enable us to draw samples from distributions which, in a Bayesian framework, are posterior distributions. Markov Chain Monte Carlo (MCMC) is one of the tools available which enables us to do just that. Although MCMC has been around in the Physics community for more than 60 years, it was only in the 1990s that it was discovered by statisticians! Other approaches, such as Particle Filters and Approximate Bayesian Computation, are alternatives to MCMC.
60 / 1
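To make the idea concrete, here is a minimal random-walk Metropolis sampler targeting the Beta posterior from the conjugate example earlier in the talk. Everything here (data, prior values, step size) is an illustrative assumption; since the exact posterior is Beta(x + a, n − x + b), we can check the sampler against the known answer:

```python
import math
import random

random.seed(1)

# Target: π(θ|x) ∝ θ^(x+a-1) (1-θ)^(n-x+b-1), the Beta-Binomial posterior
# (n, x, a, b are illustrative values, not from the slides).
n, x, a, b = 20, 14, 2.0, 2.0

def log_post(theta):
    if not 0.0 < theta < 1.0:
        return float("-inf")  # zero density outside (0, 1)
    return (x + a - 1) * math.log(theta) + (n - x + b - 1) * math.log(1 - theta)

# Random-walk Metropolis: propose θ' = θ + ε with ε ~ N(0, 0.1²),
# accept with probability min(1, π(θ'|x) / π(θ|x)).
theta, samples = 0.5, []
for it in range(20000):
    prop = theta + random.gauss(0.0, 0.1)
    if math.log(random.random()) < log_post(prop) - log_post(theta):
        theta = prop
    if it >= 5000:  # discard burn-in
        samples.append(theta)

print(round(sum(samples) / len(samples), 2))  # ≈ (x + a)/(n + a + b) ≈ 0.67
```

Note that the acceptance ratio only uses the unnormalised posterior, which is exactly why MCMC sidesteps the normalising-constant problem.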
Bayesian Inference via MCMC Although Bayesian inference has been around for a long time, it is only in the last two or three decades that it has really revolutionized the way we do statistical modelling. Although, in principle, Bayesian inference is straightforward and intuitive, the computations involved can be very hard to implement. Thanks to computational developments such as Markov Chain Monte Carlo (MCMC), doing Bayesian inference is now a lot easier. There is growing research interest in developing robust and fast methods, even if they are approximate.
61 / 1
Data Augmentation So far, we have assumed that we can write down the likelihood function, i.e. how likely it is to observe what we have observed given our parameters. However, in many real-life problems this is not possible due to missing data. In classical/frequentist statistics one can use the Expectation-Maximization (EM) algorithm to find the MLE. The Bayesian paradigm can handle such problems very naturally, since one can treat the missing data as extra parameters and draw inference, for example, via MCMC. There are many cases where statisticians who are not really “Bayesians” in principle employ a Bayesian approach simply because they can get an answer to the problem at hand much more easily than in a frequentist framework. 62 / 1
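As a toy illustration of treating missing data as extra parameters (all numbers below are assumptions for the sketch, not from the slides), suppose 5 of 20 Bernoulli(θ) outcomes are missing. A two-step Gibbs sampler alternates between imputing the missing outcomes given θ and drawing θ from its Beta posterior given the completed data; the marginal posterior for θ then matches what the 15 observed outcomes alone imply:

```python
import random

random.seed(2)

# Toy data augmentation (illustrative, not from the slides): 20 Bernoulli(θ)
# trials, but 5 outcomes are missing. Observed: 10 successes, 5 failures.
observed = [1] * 10 + [0] * 5
n_missing = 5
a, b = 1.0, 1.0  # flat Beta(1, 1) prior on θ

def rbeta(p, q):
    # Beta(p, q) draw via two Gamma variates
    u = random.gammavariate(p, 1.0)
    return u / (u + random.gammavariate(q, 1.0))

theta, draws = 0.5, []
for it in range(10000):
    # Step 1: impute each missing outcome from Bernoulli(θ)
    z = [1 if random.random() < theta else 0 for _ in range(n_missing)]
    # Step 2: draw θ from its Beta posterior given the completed data
    s = sum(observed) + sum(z)
    theta = rbeta(a + s, b + len(observed) + n_missing - s)
    if it >= 2000:  # discard burn-in
        draws.append(theta)

print(round(sum(draws) / len(draws), 2))  # ≈ 11/17, the Beta(11, 6) mean
```

The imputed outcomes average out over the chain, so the missing data contribute no information about θ, exactly as they should not.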
Bayesian Model Choice Suppose that we are interested in testing two competing model hypotheses, M1 and M2. Within a Bayesian framework, the model index M can be treated as a random variable. So it is natural to ask “what is the posterior model probability given the observed data?”, i.e. P(M1|y) or P(M2|y). By Bayes' theorem,

P(M1|y) = π(y|M1) π(M1) / π(y)

where π(y|M1) is the marginal likelihood (also called the evidence) and π(M1) is the prior model probability.
63 / 1
Bayesian Model Choice (2) Given a model selection problem in which we have to choose between two models on the basis of observed data y. . . . . .the plausibility of the two different models M1 and M2, parametrised by model parameter vectors θ1 and θ2, is assessed by the Bayes factor:

P(y|M1) / P(y|M2) = ∫_θ1 π(y|θ1, M1) π(θ1|M1) dθ1 / ∫_θ2 π(y|θ2, M2) π(θ2|M2) dθ2

Bayesian model comparison does not depend on any single set of parameter values used by each model. Instead, it considers the probability of the model averaged over all possible parameter values. This is similar to a likelihood-ratio test, but instead of maximizing the likelihood, we average over all the parameters. 64 / 1
Bayesian Model Choice (3) Why bother? An advantage of using Bayes factors is that they automatically, and quite naturally, include a penalty for including too much model structure, and thus guard against overfitting. Ways to compute the Bayes factor: Reversible Jump MCMC (a more advanced form of MCMC); approximating the marginal likelihoods, e.g. using thermodynamic integration/path sampling.
65 / 1
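For the Beta-Binomial model the marginal likelihood is actually available in closed form, P(y|M) = C(n, x) · B(x + λα, n − x + λβ) / B(λα, λβ), so the Bayes factor can be computed directly. The sketch below compares two hypothetical "models" that differ only in their prior on θ (all numerical values are assumptions for illustration):

```python
import math

# log of the Beta function B(p, q) = Γ(p)Γ(q)/Γ(p + q)
def log_beta(p, q):
    return math.lgamma(p) + math.lgamma(q) - math.lgamma(p + q)

# log marginal likelihood of x successes in n trials under a Beta(a, b)
# prior: P(y|M) = C(n, x) · B(x + a, n − x + b) / B(a, b)
def log_marginal(n, x, a, b):
    return (math.log(math.comb(n, x))
            + log_beta(x + a, n - x + b)
            - log_beta(a, b))

# Hypothetical comparison: M1 has a Beta(10, 10) prior (θ concentrated near
# 0.5), M2 a Beta(9, 1) prior (θ near 0.9); observed data: 14 successes in 20.
n, x = 20, 14
bf = math.exp(log_marginal(n, x, 10.0, 10.0) - log_marginal(n, x, 9.0, 1.0))
print(round(bf, 2))  # Bayes factor P(y|M1) / P(y|M2)
```

Working on the log scale avoids overflow in the Gamma functions; for models without a closed-form evidence this is exactly the quantity that Reversible Jump MCMC or path sampling approximates.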
Conclusions
66 / 1
Conclusions Quantification of the uncertainty in both parameter estimation and model choice is essential in any modelling exercise. A Bayesian approach offers a natural framework to deal with parameter and model uncertainty. It offers much more than a single “best fit” or any sort of “sensitivity analysis”. Markov Chain Monte Carlo methods are only one of the available tools that enable us to draw samples from the posterior distribution. There exist others, such as Approximate Bayesian Computation (ABC), Sequential Monte Carlo, and (Particle) Filtering.
67 / 1
Conclusions (2)
Although we have focused on some very simple models, the same techniques apply to more complicated situations. Nevertheless, it is often the case (depending on the model, the data, etc.) that alternative methods should be used or developed to improve the efficiency of the standard methods.
68 / 1