User:Grw66/sandbox3

Use this page to edit the interval estimation page for a statistics final project.

In statistics, interval estimation is the use of sample data to estimate an interval of possible values of a parameter of interest. This is in contrast to point estimation, which gives a single value.

Most forms of interval estimation are statistics-based. The most common are confidence intervals (a frequentist method) and credible intervals (a Bayesian method). Less common statistics-based forms include likelihood intervals, fiducial intervals, tolerance intervals, and prediction intervals. Interval estimates can also be derived from a non-statistical method, fuzzy logic.

Definition
Here, I should define what statistics-based interval estimation means.

Confidence Intervals
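Confidence intervals are used to estimate a parameter of interest, commonly the mean or standard deviation, from a sampled data set. A 100γ% confidence interval comes from a procedure that, over repeated sampling, produces intervals containing the true parameter value 100γ% of the time. A common misconception is that 100γ% of the data set falls within the bounds (or above or below a one-sided bound); an interval with that property is a tolerance interval, which is discussed below.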
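The repeated-sampling meaning of a confidence interval can be checked by simulation. The following is a minimal sketch, assuming normally distributed data with a known standard deviation (so the simple z-interval applies; with an unknown standard deviation a t-interval would be used instead). The values mu = 10, sigma = 2 and n = 25 are arbitrary illustrative choices. It draws many samples, builds a 95% interval for the mean from each, and reports the fraction of intervals that contain the true mean, which should be close to 0.95.

```python
import random
from statistics import NormalDist, fmean

def z_interval(sample, sigma, gamma=0.95):
    """Two-sided 100*gamma% confidence interval for the mean, sigma assumed known."""
    z = NormalDist().inv_cdf((1 + gamma) / 2)  # about 1.96 for gamma = 0.95
    half_width = z * sigma / len(sample) ** 0.5
    m = fmean(sample)
    return m - half_width, m + half_width

def coverage(mu=10.0, sigma=2.0, n=25, gamma=0.95, trials=10_000, seed=1):
    """Fraction of simulated intervals that contain the true mean mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        lo, hi = z_interval(sample, sigma, gamma)
        hits += lo < mu < hi
    return hits / trials
```

Calling `coverage()` returns a value near 0.95. The 95% describes the long-run behaviour of the procedure, not the share of data points inside any single interval.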

Credible Intervals
Show a different figure. Possibly make a figure that distinguishes a confidence interval from a credible interval.

Tolerance
A tolerance interval uses a sample from a population to construct an interval, bounded by tolerance limits, that contains at least a specified proportion p (that is, 100p%) of the population values with a stated level of confidence. Manufacturing is a typical example: a sample of an existing product set is measured to check, with a given confidence, that a specified percentage of the whole population lies within the tolerance limits.

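For normally distributed data, the two-sided tolerance interval can be written in terms of a lower and an upper tolerance limit, using the sample mean, $$\bar{x}$$, and the sample standard deviation, s.

$$(l_b, u_b) = \bar{x} \pm k_2 s$$

where the tolerance factor $$k_2$$ is commonly approximated as

$$k_2 = z_{(1+p)/2}\sqrt{\frac{\nu\left(1+\frac{1}{N}\right)}{\chi_{1-\alpha,\nu}^2}}$$

Here N is the sample size, $$\nu = N - 1$$ is the number of degrees of freedom, p is the proportion of the population the interval should contain, $$z_{(1+p)/2}$$ is the (1 + p)/2 quantile of the standard normal distribution, and $$\chi_{1-\alpha,\nu}^2$$ is the critical value of the chi-square distribution with $$\nu$$ degrees of freedom that is exceeded with probability $$1-\alpha$$, where $$1-\alpha$$ is the confidence level.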
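The formula for $$k_2$$ can be evaluated with a short script. This is a minimal sketch using only the Python standard library; because the standard library has no chi-square quantile function, it relies on hypothetical helpers `chi2_cdf` and `chi2_ppf` written here from the incomplete-gamma power series and bisection (a statistics library would normally supply this quantile). The function names and the default values of p and the confidence level are illustrative choices.

```python
import math
from statistics import NormalDist, fmean, stdev

def chi2_cdf(x, nu):
    """Chi-square CDF via the power series of the regularized lower incomplete gamma."""
    if x <= 0:
        return 0.0
    a, y = nu / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 0
    while term > 1e-16 * total:
        n += 1
        term *= y / (a + n)
        total += term
    return total * math.exp(a * math.log(y) - y - math.lgamma(a))

def chi2_ppf(q, nu):
    """Chi-square quantile (inverse CDF), found by bisection on chi2_cdf."""
    lo, hi = 0.0, max(1.0, float(nu))
    while chi2_cdf(hi, nu) < q:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, nu) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def tolerance_factor(N, p, confidence):
    """Approximate two-sided normal tolerance factor k_2 from the formula above."""
    nu = N - 1
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf((1.0 + p) / 2.0)
    chi2_crit = chi2_ppf(alpha, nu)  # value exceeded with probability 1 - alpha
    return z * math.sqrt(nu * (1.0 + 1.0 / N) / chi2_crit)

def tolerance_interval(sample, p=0.90, confidence=0.95):
    """(lower, upper) tolerance limits xbar -/+ k_2 * s, assuming normal data."""
    k2 = tolerance_factor(len(sample), p, confidence)
    xbar, s = fmean(sample), stdev(sample)
    return xbar - k2 * s, xbar + k2 * s
```

As N grows, $$k_2$$ shrinks toward $$z_{(1+p)/2}$$, because the sample mean and standard deviation become more precise.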

Prediction
A prediction interval uses an existing sample to construct an interval that will contain one or more future observations with a specified probability, γ. Prediction intervals are commonly used in regression analysis.
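A simple case shows how a prediction interval differs from a confidence interval. This is a minimal sketch, assuming normally distributed data with a known standard deviation σ (an illustrative simplification; with σ estimated from the data, a t quantile and s would replace z and σ). An interval for a single future observation is then $$\bar{x} \pm z_{(1+\gamma)/2}\,\sigma\sqrt{1 + 1/n}$$; the extra 1 under the square root accounts for the variability of the new observation itself, so the interval is wider than the confidence interval for the mean. The parameter values in the simulation are arbitrary.

```python
import random
from statistics import NormalDist, fmean

def prediction_interval(sample, sigma, gamma=0.95):
    """Interval for ONE future observation, assuming normal data with known sigma."""
    n = len(sample)
    z = NormalDist().inv_cdf((1 + gamma) / 2)
    # Var(X_new - xbar) = sigma^2 * (1 + 1/n)
    half_width = z * sigma * (1 + 1 / n) ** 0.5
    m = fmean(sample)
    return m - half_width, m + half_width

def future_coverage(mu=0.0, sigma=1.0, n=10, gamma=0.90, trials=20_000, seed=7):
    """Fraction of simulated intervals that contain a fresh draw from the population."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        lo, hi = prediction_interval(sample, sigma, gamma)
        x_new = rng.gauss(mu, sigma)
        hits += lo < x_new < hi
    return hits / trials
```

`future_coverage()` should return a value close to the chosen γ of 0.90.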

Non-Statistics Based Interval Estimation
Insert short discussion of fuzzy logic, possibly include a figure.

One-Sided vs. Two-Sided
Two-sided intervals estimate a parameter of interest, Θ, with a level of confidence, γ, using a lower bound ($$l_b$$) and an upper bound ($$u_b$$). Examples include estimating the average height of males in a geographic region or the length of a particular desk model made by a manufacturer. Such cases concern the central value of a parameter, so both bounds matter. This is typically written in a form similar to the equation below.

$$P(l_b < \Theta < u_b) = \gamma$$

In contrast to the two-sided interval, a one-sided interval uses the level of confidence, γ, to construct only a minimum or only a maximum bound for the parameter of interest, holding with confidence 100γ%. A one-sided interval is appropriate when only one of the two bounds is of interest. When only the minimum plausible value of Θ matters, no upper bound needs to be found, and the two-sided form reduces to:

$$P(l_b < \Theta) = \gamma$$

Because the whole non-coverage probability, 1 - γ, is now placed in a single tail instead of being split between two, the one-sided lower bound ($$l_b$$) is higher than the lower bound of a two-sided interval with the same confidence. Likewise, a one-sided upper bound is lower than the corresponding two-sided upper bound. One-sided intervals are common in quality assurance for material production, where the expected strength of a material, Θ, must exceed a certain minimum value ($$l_b$$) with some confidence (100γ%). Because the manufacturer is not concerned with producing a product that is too strong, there is no upper bound ($$u_b$$).
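The effect of moving all of 1 - γ into one tail can be seen numerically. This minimal sketch assumes a known standard deviation and uses made-up strength measurements purely for illustration. With γ = 0.95, the two-sided bound uses the 0.975 normal quantile (about 1.96) while the one-sided bound uses the 0.95 quantile (about 1.645), so the one-sided lower bound sits closer to the sample mean.

```python
from statistics import NormalDist, fmean

def lower_bounds(sample, sigma, gamma=0.95):
    """Return (two-sided lower bound, one-sided lower bound) for the mean, sigma known."""
    se = sigma / len(sample) ** 0.5
    m = fmean(sample)
    z_two = NormalDist().inv_cdf((1 + gamma) / 2)  # (1 - gamma)/2 in each tail
    z_one = NormalDist().inv_cdf(gamma)            # all of 1 - gamma in the lower tail
    return m - z_two * se, m - z_one * se

# Hypothetical strength measurements (arbitrary units); sigma is assumed known.
strengths = [52.1, 49.8, 51.3, 50.6, 50.2]
two_sided_lb, one_sided_lb = lower_bounds(strengths, sigma=1.0)
```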

Caution Using and Building Estimates
When determining the significance of a parameter, it is best to understand the data and how it was collected. Before collecting data, an experiment should be planned so that the uncertainty in the data comes from sampling variability rather than from statistical bias. After the experiment, a typical first step in constructing interval estimates is to plot the data using graphical methods such as histograms or normal probability plots. These plots help to assess the distribution of the data. Interval bounds computed under incorrect distributional assumptions can be misleading.

When interval estimates are reported, they should have a commonly held interpretation within and beyond the scientific community. In this regard, credible intervals are held to be most readily understood by the general public[citation needed]. Interval estimates derived from fuzzy logic have much more application-specific meanings.

In commonly occurring situations there should be sets of standard procedures that can be used, subject to checking the validity of any required assumptions. This applies to both confidence intervals and credible intervals. However, in more novel situations there should be guidance on how interval estimates can be formulated. In this regard, confidence intervals and credible intervals have a similar standing, but there are two differences. First, credible intervals can readily deal with prior information, while confidence intervals cannot. Second, confidence intervals are more flexible and can be used practically in more situations than credible intervals: one area where credible intervals suffer in comparison is in dealing with non-parametric models.
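As an illustration of how prior information enters a credible interval, consider the conjugate normal model: normally distributed data with known σ and a normal prior on the mean. The posterior for the mean is then also normal, with precision equal to the prior precision plus $$n/\sigma^2$$, and an equal-tailed credible interval is read off its quantiles. This is a minimal sketch under those assumptions; the data and prior values are hypothetical. An informative prior narrows the interval, while a very diffuse prior essentially reproduces the confidence interval for the mean.

```python
from statistics import NormalDist, fmean

def normal_credible_interval(sample, sigma, prior_mean, prior_sd, gamma=0.95):
    """Equal-tailed 100*gamma% credible interval for a normal mean.

    Assumes normal data with known sigma and a N(prior_mean, prior_sd**2) prior,
    so the posterior for the mean is also normal (conjugate update).
    """
    prior_prec = 1 / prior_sd ** 2
    data_prec = len(sample) / sigma ** 2
    post_prec = prior_prec + data_prec
    post_mean = (prior_prec * prior_mean + data_prec * fmean(sample)) / post_prec
    posterior = NormalDist(post_mean, post_prec ** -0.5)
    return posterior.inv_cdf((1 - gamma) / 2), posterior.inv_cdf((1 + gamma) / 2)

data = [9.1, 10.4, 11.2, 9.8, 10.6, 10.9]  # hypothetical measurements
informative = normal_credible_interval(data, sigma=1.0, prior_mean=10.0, prior_sd=0.5)
diffuse = normal_credible_interval(data, sigma=1.0, prior_mean=0.0, prior_sd=1e6)
```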

There should be ways of testing the performance of interval estimation procedures. This arises because many such procedures involve approximations of various kinds, and there is a need to check that the actual performance of a procedure is close to what is claimed. The use of stochastic simulations makes this straightforward in the case of confidence intervals, but it is somewhat more problematic for credible intervals, where prior information needs to be taken properly into account. Credible intervals can be checked in situations representing no prior information, but such a check involves examining the long-run frequency properties of the procedures.
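For a discrete model, the actual coverage of an approximate procedure can even be computed exactly instead of simulated. This minimal sketch uses the familiar large-sample (Wald) interval for a binomial proportion, $$\hat{p} \pm z\sqrt{\hat{p}(1-\hat{p})/n}$$, chosen here only as an example of an approximate procedure. It sums the binomial probabilities of every outcome whose interval contains the true proportion. With small n and p near 0 the actual coverage falls well below the nominal 95%, while for large n and p = 0.5 it is close to nominal.

```python
import math
from statistics import NormalDist

def binom_pmf(k, n, p):
    """Binomial probability, evaluated in log space to avoid overflow for large n."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log1p(-p))
    return math.exp(log_pmf)

def wald_interval(k, n, gamma=0.95):
    """Large-sample normal-approximation interval for a proportion."""
    z = NormalDist().inv_cdf((1 + gamma) / 2)
    p_hat = k / n
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

def wald_coverage(n, p, gamma=0.95):
    """Exact coverage: total probability of the outcomes k whose interval contains p."""
    total = 0.0
    for k in range(n + 1):
        lo, hi = wald_interval(k, n, gamma)
        if lo <= p <= hi:
            total += binom_pmf(k, n, p)
    return total
```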

Severini (1991) discusses conditions under which credible intervals and confidence intervals will produce similar results, and also discusses both the coverage probabilities of credible intervals and the posterior probabilities associated with confidence intervals.

In decision theory, which is a common approach to and justification for Bayesian statistics, interval estimation is not of direct interest. The outcome is a decision, not an interval estimate, and thus Bayesian decision theorists use a Bayes action: they choose the action that minimizes the expected value of a loss function with respect to the entire posterior distribution, not a specific interval.
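A small discrete example shows what a Bayes action is. This minimal sketch assumes a hypothetical posterior that places equal probability on five values of θ, and searches a grid of candidate actions for the one with the smallest posterior expected loss. Under squared-error loss the minimizer is the posterior mean, and under absolute-error loss it is the posterior median; in neither case is the output an interval.

```python
def bayes_action(thetas, probs, loss, candidates):
    """Candidate action with the smallest posterior expected loss."""
    def expected_loss(a):
        return sum(p * loss(theta, a) for theta, p in zip(thetas, probs))
    return min(candidates, key=expected_loss)

# Hypothetical discrete posterior: equal mass on five parameter values.
thetas = [0.0, 1.0, 2.0, 3.0, 10.0]
probs = [0.2] * 5
candidates = [i / 10 for i in range(101)]  # actions 0.0, 0.1, ..., 10.0

squared = bayes_action(thetas, probs, lambda t, a: (t - a) ** 2, candidates)
absolute = bayes_action(thetas, probs, lambda t, a: abs(t - a), candidates)
```

Here `squared` is the posterior mean, 3.2, and `absolute` is the posterior median, 2.0.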

Applications
Confidence intervals are used to address a variety of problems involving uncertainty. Katz (1975) discusses various challenges and benefits of using interval estimates in legal proceedings. In medical research, Altmen (1990) discusses the use of confidence intervals and guidelines for using them. Interval estimates are also commonly used to estimate product life. Meeker and Escobar (1998) present methods to analyze reliability data under parametric and nonparametric estimation, including the prediction of future random variables (prediction intervals).