User:Thepigdog/Prediction from data

The basis for prediction is inductive probabilities. Inductive probabilities are built on Occam's razor: the simplest explanation that accounts for all the facts is the most likely to be correct. "The simplest" means the shortest description of the facts using the language and knowledge that we have.

"Simplest" can only be defined in terms of the language used to describe events. This language encodes descriptions using prior probabilities. These prior probabilities are the same as the priors in Bayes' law.

Inductive reasoning only detects patterns in the world. There is no assignment of meaning, and not even cause and effect. There are only the patterns, which may be used to predict the future.

Relative probability
Most probabilities are relative probabilities between a limited number of alternatives. An alternative that explains the facts with one bit less data than another alternative is twice as likely to be correct.
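This bit-to-odds relationship can be sketched numerically (a minimal illustration; the function name and the description lengths are hypothetical):

```python
def relative_odds(bits_a, bits_b):
    """Odds in favour of alternative A over alternative B, given the
    description length of each in bits (shorter = more probable)."""
    return 2.0 ** (bits_b - bits_a)

# An explanation that is one bit shorter is twice as likely to be correct.
print(relative_odds(10, 11))  # -> 2.0
print(relative_odds(10, 13))  # -> 8.0
```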

Absolute probabilities
An absolute or true probability may only be determined by considering every possibility, at least to the point of counting the number of possibilities. For this reason, measuring absolute probabilities is very difficult.

Absolute probabilities are formed from relative probabilities by summing the relative probabilities of all alternatives, in order to normalize them.
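The normalization step can be sketched as follows (a minimal example; the three alternatives and their description lengths are made up):

```python
def normalize(relative):
    """Turn the relative probabilities of an exhaustive set of
    alternatives into absolute probabilities summing to 1."""
    total = sum(relative.values())
    return {name: weight / total for name, weight in relative.items()}

# Relative probabilities 2**-bits for three alternative explanations.
relative = {"A": 2.0 ** -3, "B": 2.0 ** -4, "C": 2.0 ** -4}
absolute = normalize(relative)
print(absolute)  # -> {'A': 0.5, 'B': 0.25, 'C': 0.25}
```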

Classical probabilities
Most classical probability is not inductive. There are inherent assumptions which often have no basis. Consider, for example, a "fair coin". A toss of the coin yields heads or tails with equal probability. But why do we regard the coin as fair? How do we know that the coin is not weighted? Without understanding the mechanism of the coin, how do we know that a complex message is not hidden in the data?

A theoretical basis for classical probability demands, first, a series of heads and tails that cannot be compressed. From this basis Bayesian probability may be deduced, which expresses the same principle as Occam's razor.
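The incompressibility requirement can be illustrated with an off-the-shelf compressor (a rough sketch only; zlib is a crude stand-in for true incompressibility, and the two toss sequences are invented):

```python
import random
import zlib

random.seed(0)
# A (pseudo)random sequence of 4096 tosses, one 'H' or 'T' per toss,
# versus a perfectly patterned sequence of the same length.
fair = bytes(random.choice(b"HT") for _ in range(4096))
patterned = b"HT" * 2048

def compressed_size(data: bytes) -> int:
    return len(zlib.compress(data, 9))

# The patterned tosses compress drastically; the random ones do not,
# so only the random sequence meets the incompressibility requirement.
print(compressed_size(fair), compressed_size(patterned))
```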

Probability as information
Suppose a theory A can be represented with I(A) bits of information, and using this theory, a theory B can be represented with I(B/A) bits of information. Then to represent both theories,
 * A and B can be represented with $$I(A \land B) = I(A) + I(B/A) $$ bits of information.

but A and B may be exchanged to give
 * A and B can be represented with $$I(A \land B) = I(B) + I(A/B) $$ bits of information.

So Bayes law is,
 * $$I(A) + I(B/A) = I(B) + I(A/B) $$

using $$P(A) = 2^{-I(A)} $$
 * $$P(A) * P(B/A) = P(B) * P(A/B) $$
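Both forms of the identity can be checked numerically, assuming $$I(X) = -\log_2 P(X) $$ (the figures below are a made-up joint distribution):

```python
import math

def info(p):
    """Information content in bits: I(X) = -log2 P(X)."""
    return -math.log2(p)

# A hypothetical joint distribution over events A and B.
p_ab = 0.12          # P(A and B)
p_a, p_b = 0.3, 0.4  # P(A), P(B)
p_b_given_a = p_ab / p_a
p_a_given_b = p_ab / p_b

# Information form: I(A) + I(B/A) = I(B) + I(A/B)
assert math.isclose(info(p_a) + info(p_b_given_a),
                    info(p_b) + info(p_a_given_b))

# Probability form: P(A) * P(B/A) = P(B) * P(A/B) = P(A and B)
assert math.isclose(p_a * p_b_given_a, p_b * p_a_given_b)
```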

Inherent instability and potential for bias
As current probabilities affect future probabilities, there is always the possibility of a feedback of bias in the probabilities. Actions are also based on current probabilities. Actions and probabilities may reinforce each other until they reach a point that would not have been reached from a slightly different starting point.

Extreme Probabilities
Relative probabilities may become so extreme that they may be approximated as certainties. From this approximation, facts are built. But the basis, or starting point, is inductive probabilities.

Knowledge Coverage
???? Not covered properly ????

How do you measure the amount of knowledge you have? I may know the probability of a prediction, but how do I measure the uncertainty of that probability? The probability may be based on little data.