How to calculate brackets to contain the correct answer with a given probability of success...
Confidence Intervals for Validating Simulation Models
Summary. This paper summarizes the process for building and using confidence intervals [2] to evaluate the validity of simulation models. We can use statistical methods to evaluate the goodness of fit between simulation engines and real life. This requires the operator to select a probability of compliance that feels good, e.g., 98%. With that value of the probability of inclusion (P) we can calculate the probability of exclusion (α), and with that and the sample size n calculate the interval (in terms of the experimental average t̄ and variance S²) within which the simulation data must fall to be representative of real life.

The basic principle is to use live data sampling to establish a statistically valid estimate of the population mean and variance for a particular parameter, as well as a range of values about each (called confidence intervals) that depends on the operator’s sense of the percent confidence required. Given these intervals, the task is then to decide whether the corresponding mean and variance statistics of the simulation data fall within the acceptable confidence interval about the sample statistics.

Parameters other than the mean and variance can be used for comparison. For example, the Universal Naval Task List (UNTL) contains a host of measures of effectiveness (MOEs) that can be used for making comparisons between systems and processes. We have restricted ourselves to the mean and variance in this paper only for illustrative purposes.

Confidence Interval. A confidence interval is the region in the vicinity of a specified value of a phenomenon’s parameter within which another value may lie with a given probability [100(1 − α)%], based on the results of observing n samples of the phenomenon.
Figure 1 - Generic Confidence Interval Graph

In general, experimental sample data are assumed to be only representative of some larger (and generally infinite) set of data elements. Suppose we are measuring the time t to detect a target after it arrives within a specified range of the sensor. We can observe the detection phenomenon a number of times and calculate the average time to detect. But we might be more interested in knowing what the average detection time would be if we observed an infinite number of trials. This infinite set is said to be the global “population” of data.

Sample and Population Statistics. A “statistic” is some number calculated from data that is used to characterize the data set. There are many different statistics; the more commonly used are the arithmetic mean (“average”) [3] and the variance.
The population arithmetic mean is generally indicated by the Greek letter µ (“mu”), while the sample mean is indicated by a letter with a bar over the top, for example t̄ (called “t-bar”). [4] The sample mean is calculated as the sum of the parameters divided by their count:
[1] C. Andrews La Varre, Booz Allen & Hamilton Inc., Newport, RI, October 2000
[2] Douglas C. Montgomery, Statistical Quality Control, John Wiley & Sons, 1985, ISBN 0-471-80870-9, Section 2-3, Chapter 3.
[3] Others include the standard deviation, the t statistic, and many, many others. We will highlight those of immediate interest for this problem.
[4] For example, the mean of 2, 4, and 6 = (2+4+6)/3 = 4.
t̄ = (1/n) ∑ tᵢ,  summed over i = 1, …, n
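As a trivial sketch, the sample mean can be computed directly in code; the values below are the ones used in the footnote’s example:

```python
import statistics

# Sample detection times (illustrative values from the footnote example)
times = [2, 4, 6]

# Sample mean: sum of the observations divided by their count
t_bar = sum(times) / len(times)
print(t_bar)  # 4.0

# The standard library's statistics.mean gives the same result
assert t_bar == statistics.mean(times)
```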
The population variance is generally indicated by the squared Greek letter σ² (“sigma squared”), while the sample variance is indicated by the squared capital letter S² (“S-squared”). The sample variance is the average squared distance that the parameter varies from its average [5]:

S² = [∑ (tᵢ − t̄)²] / (n − 1),  summed over i = 1, …, n
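The same calculation in code, again with the footnote’s example values; note the division by n − 1 rather than n:

```python
import statistics

times = [2, 4, 6]                   # illustrative sample
t_bar = sum(times) / len(times)     # sample mean = 4.0

# Sample variance: sum of squared deviations divided by n - 1
s2 = sum((t - t_bar) ** 2 for t in times) / (len(times) - 1)
print(s2)  # 4.0, i.e. (4 + 0 + 4) / 2

# statistics.variance uses the same n - 1 (sample) convention
assert s2 == statistics.variance(times)
```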
The square root of the variance is called the “standard deviation”, for both the population (σ) and the sample (S). [6]
It turns out that the sample and population statistics are related mathematically by expressions that depend on the sample size. For the Normal distribution discussed below, these relationships are as follows:
The sample mean t̄ is distributed Normally about the true population mean µ, but with a variance equal to the population variance σ² divided by the sample size n: σ_t̄² = σ² / n.
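This relationship can be checked numerically. The sketch below (a demonstration with hypothetical values for µ, σ, and n, not part of the paper’s procedure) draws many samples and confirms that the variance of the sample means is close to σ²/n:

```python
import random
import statistics

random.seed(42)
mu, sigma, n = 30.0, 4.0, 25   # hypothetical population parameters and sample size

# Draw many samples of size n and record each sample mean; the variance
# of those means should approach sigma**2 / n = 16 / 25 = 0.64.
means = []
for _ in range(20000):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    means.append(statistics.fmean(sample))

var_of_means = statistics.pvariance(means)
print(var_of_means)  # close to 0.64
```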
Similarly, the variance of the sample set is not the same as that of the population; here there is a function of the sample variance that fits a particular curve called the Chi-squared distribution. The formula is rather arcane, so it is bypassed here for simplicity.

Confidence intervals are used to evaluate how close sample statistics are to the population statistics. Effectively, they let us scale the observed data to a common (“standard”) interval in order to reach consistent conclusions about experimental data. The formulas are different for different kinds of events. We will describe the formulas for the most common type of event, that whose data conform to the familiar bell-shaped curve, the “Normal” distribution.

Normal Distribution. A “Normal” distribution curve is a symmetric curve centered on an average, the “arithmetic mean” value. It gives a graph (a continuous histogram) of the expected number of times a Normally-distributed parameter is likely to occur. The edges (or “skirts”) of the graph fall off in a manner defined by the “variance”, such that about 68% of the values occur within plus or minus one sigma (±σ) on either side of the mean. The mean can be any value, as can the variance.
Figure 2 - "Normal" Distribution

Standard Normal Distribution. The “Standard” Normal distribution is a specially scaled version of the ordinary “Normal” distribution. This curve is centered on zero and shaped so that its variance is 1.0, causing about 68% of the observations to fall between the values +1 and −1. [5]
[5] For the same example, 2, 4, and 6 vary from their average (4) by −2, 0, and 2 respectively, the squared amounts being 4, 0, and 4. The term is squared to nullify the effect of negative numbers, since we are interested in just the size of the distance. The sum is divided by n − 1 rather than n to ensure the result more closely approaches the population variance; here the sample variance is 8/2 = 4.
[6] Ibid., Section 3-1.
Figure 3 - "Standard" Normal Distribution
The process of converting a Normal to a Standard Normal distribution is called standardization and involves using a conversion parameter z:

z = (t − µ) / σ        (1)
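As a quick illustration of standardization (with made-up numbers for µ, σ, and t):

```python
# Standardizing a Normally distributed value t with mean mu and
# standard deviation sigma (hypothetical detection-time numbers):
mu, sigma = 50.0, 10.0
t = 65.0

z = (t - mu) / sigma
print(z)  # 1.5 -- t lies 1.5 standard deviations above the mean
```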
These characteristics are very useful as a “standard”, since any “Normal” curve can be scaled to the “Standard Normal” [7] to allow making comparisons and drawing conclusions.

Normal and Standard Normal Probabilities. It turns out that the probability of a value being less than or equal to some value t₀ in the Normal distribution is calculated as the integral of the Normal distribution [8] over the range from the far left edge up to the value t₀. The value of this integral is mathematically equal to the probability of the variable z being less than or equal to the corresponding standardized value, or:

P(t ≤ t₀) ≡ P(z ≤ z₀),  where z₀ = (t₀ − µ) / σ        (2)
These “cumulative probability” values are tabulated in a number of different texts for various values of z. It also turns out that the probability of a value being less than or equal to some other value is equal to one minus the probability of it being bigger than that value:

P(u ≤ q) = 1 − P(u > q)        (3)
Equations (2) and (3) can be used to calculate the probability that some value is between two other values. For example, the probability that the mean is between a and b is equal to the probability that it is less than or equal to b minus the probability that it is less than or equal to a:

P(a ≤ t̄ ≤ b) = P(t̄ ≤ b) − P(t̄ ≤ a)        (4)
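The same calculation can be done in code, using Python’s statistics.NormalDist as a stand-in for the printed tables; the mean and sigma below are hypothetical:

```python
from statistics import NormalDist

# Hypothetical live-data parameters: mean 50 minutes, sigma 10 minutes
dist = NormalDist(mu=50.0, sigma=10.0)

# P(a <= t <= b) as the difference of two cumulative probabilities
a, b = 40.0, 60.0
p = dist.cdf(b) - dist.cdf(a)
print(round(p, 4))  # 0.6827 -- about 68% of values within one sigma
```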
Confidence Intervals. So with (2) and (4) we can use the tabulated values for

z_a = (a − µ) / σ  and  z_b = (b − µ) / σ

to calculate the probability of a ≤ t̄ ≤ b:

P(a ≤ t̄ ≤ b) = P(z ≤ (b − µ)/σ) − P(z ≤ (a − µ)/σ) = P_zb − P_za        (5)
[7] The Standard Normal and cumulative Standard Normal distributions are widely published in tables. Additionally, they are readily available in Microsoft Excel with the functions NORMDIST() and NORMSDIST(). The first computes the values for either a Normal or Standard Normal distribution; the latter computes only the cumulative Standard Normal distribution.
[8] The “integral” is simply a very precise way of adding things up. It “integrates” a range of incremental probabilities into a cumulative probability value. In a discrete problem it is represented by the summation symbol, the Greek capital S: Σ. The difference between a summation and an integral is essentially only the size of the samples being added up. So if we add up the incremental probabilities of a number being just so, we get the total probability of a number being less than or equal to the last “just so” value.
where P_za and P_zb are obtained from the Cumulative Standard Normal Distribution tables. What this means is that for any Normally distributed set of data we can calculate the likelihood of its mean value being between two arbitrarily chosen values a and b.

Application to Evaluating Simulated Data. We can collect data on a simulation engine for a range of parameters, collect data on live events for the same parameters, and use the process above to compare the closeness of the simulation data to the live data. For example, if we have a mean time to detect of 30 minutes in the simulation data and a mean time of 50 minutes from the live data, we can use the confidence interval to calculate the probability that the simulation mean is within range of the population mean as established by the sampling of live data; that is, is 30 minutes within the 100(1 − α)% interval about the sample mean of 50 minutes? If it is not in that interval, then we need to ask questions about why it is not, and what it would take to get it into that range.

This does not, however, immediately answer the question about the sample size needed to reach these conclusions. At this point the literature gets really arcane. However, we can simplify it. Recall that the sample and population statistics are related by a Normal distribution for the mean and a Chi-squared distribution for the variance. These relationships allow us, for a given probability of containment, to calculate a confidence interval for the mean under different conditions. Specifically (and this is turn-the-crank stuff for an Operations Research / Systems Analysis [ORSA] person):

Unknown population distribution, known population variance.
a. Select a desired probability of containment P.
b. Calculate the resulting probability of exclusion: α = 1 − P, so that each tail excludes α/2 = (1 − P)/2.
c. Calculate the sample mean t̄.
d. Calculate the sample size n.
e. Find the value of z for that exclusion probability, zα/2, i.e., the value for which P(z ≤ zα/2) = 1 − α/2.
f. Calculate the interval within which the population mean is contained with a probability of P = (1 − α) from:

t̄ − zα/2 σ / √n  ≤  µ  ≤  t̄ + zα/2 σ / √n
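The steps above can be sketched in Python, using NormalDist.inv_cdf in place of the z tables. The numbers are hypothetical except for the 50- and 30-minute means from the earlier detection-time example (σ = 12 and n = 36 are assumptions for illustration):

```python
from math import sqrt
from statistics import NormalDist

P = 0.98                        # step a: desired probability of containment
alpha = 1 - P                   # step b: probability of exclusion
t_bar, n = 50.0, 36             # steps c, d: live-data sample mean and size (hypothetical)
sigma = 12.0                    # assumed-known population standard deviation

# Step e: z value leaving alpha/2 in the upper tail
z = NormalDist().inv_cdf(1 - alpha / 2)

# Step f: confidence interval for the population mean
half_width = z * sigma / sqrt(n)
lo, hi = t_bar - half_width, t_bar + half_width
print(lo, hi)  # roughly 45.35 and 54.65

# A simulation mean of 30 minutes falls well outside this interval,
# so the simulation would need calibration at the 98% level.
assert not (lo <= 30.0 <= hi)
```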
Normal population distribution, unknown population mean and variance.
a. Select a desired probability of containment P.
b. Calculate the resulting probability of exclusion: α = 1 − P, so that each tail excludes α/2 = (1 − P)/2.
c. Calculate the sample mean t̄ and variance S².
d. Calculate the degrees of freedom, n − 1, from the sample size n.
e. Use t-distribution tables to get the percentage point ω of the t-distribution with n − 1 degrees of freedom and exclusion probability α/2.
f. Use the Chi-squared tables to get the percentage points ξα/2 and ξ(1−α/2) of the Chi-squared distribution with n − 1 degrees of freedom for both exclusion probabilities α/2 and (1 − α/2).
g. Calculate the interval within which the population mean is contained with a probability of P = (1 − α) from:

t̄ − ω S / √n  ≤  µ  ≤  t̄ + ω S / √n

h. Calculate the interval within which the population variance is contained with a probability of P = (1 − α) from:

(n − 1) S² / ξα/2  ≤  σ²  ≤  (n − 1) S² / ξ(1−α/2)
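The same turn-the-crank steps can be sketched in code, assuming SciPy is available for the t and Chi-squared percentage points (the Python standard library has no inverse CDFs for these); the sample statistics below are hypothetical:

```python
from math import sqrt
from scipy import stats  # assumed dependency for t and Chi-squared tables

# Hypothetical sample statistics from n live trials (steps c, d)
t_bar, s2, n = 50.0, 16.0, 10
P = 0.95                                     # step a
alpha = 1 - P                                # step b
df = n - 1                                   # degrees of freedom

# Step e: percentage point omega of the t-distribution
omega = stats.t.ppf(1 - alpha / 2, df)       # about 2.262 for df = 9

# Step g: confidence interval for the population mean
half = omega * sqrt(s2) / sqrt(n)
mean_lo, mean_hi = t_bar - half, t_bar + half

# Step f: Chi-squared percentage points (upper and lower tails)
chi_upper = stats.chi2.ppf(1 - alpha / 2, df)   # about 19.02
chi_lower = stats.chi2.ppf(alpha / 2, df)       # about 2.700

# Step h: confidence interval for the population variance
var_lo = df * s2 / chi_upper
var_hi = df * s2 / chi_lower
print(mean_lo, mean_hi, var_lo, var_hi)
```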
Evaluating the Validity of Simulation Data. We can use these methods to evaluate the validity of the simulation data. Specifically:

a. Calculate the simulation data statistics t̄ and S².
b. Use standard methods to extrapolate to the population µ and σ².
c. Postulate live sample sizes n and sample mean t̄ and variance S².
d. Calculate the population mean and variance confidence intervals as described.
e. If the simulation data mean and variance do not fall in those intervals, then take steps to calibrate the program to have the results reflect the real data.

Alternative Approaches. There are, of course, many other methods of comparing data. “Curve Fitting” (regression theory) can be used to devise a closed-form equation that describes the data. The equations can then be compared to evaluate their similarities. Another interesting approach is the Kolmogorov-Smirnov test. The Kolmogorov-Smirnov D is a particularly simple measure: it is defined as the maximum value of the absolute difference between two cumulative distribution functions. This allows direct comparison between two sets of data after constructing a synthetic cumulative distribution function for each set. The simplicity of this approach is that no assumptions need be made about the actual distribution of the sample sets; you simply perform an absolute distance test between the curves that represent each set.

Conclusion. We can use statistical methods to evaluate the goodness of fit between simulation engines and real life. This requires the operator to select a probability of compliance that feels good, e.g., 98%. With that value of the probability of inclusion (P) we can calculate the probability of exclusion (α), and with that and the sample size n calculate the interval (in terms of the experimental average t̄ and variance S²) within which the simulation data must fall to be representative of real life. We can also use a variety of other methods to establish a confidence of similarity between separate data sets.
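The Kolmogorov-Smirnov D described under Alternative Approaches can be sketched in a few lines; the detection-time samples below are hypothetical, chosen only to illustrate the distance measure:

```python
def ks_distance(xs, ys):
    """Kolmogorov-Smirnov D: the maximum absolute difference between the
    empirical cumulative distribution functions of two samples."""
    points = sorted(set(xs) | set(ys))
    d = 0.0
    for p in points:
        fx = sum(1 for x in xs if x <= p) / len(xs)   # empirical CDF of xs at p
        fy = sum(1 for y in ys if y <= p) / len(ys)   # empirical CDF of ys at p
        d = max(d, abs(fx - fy))
    return d

# Hypothetical simulation vs. live detection times (minutes)
sim = [28, 30, 31, 29, 32]
live = [48, 50, 52, 49, 51]
print(ks_distance(sim, live))  # 1.0 -- the two samples do not overlap at all
```

A small D indicates similar distributions; identical samples give D = 0.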