
At the heart of the universal prior is an abstract model of a computer, such as a universal Turing machine. Any abstract computer will do, as long as it is Turing-complete, i.e. every finite binary string has at least one program that will compute it on the abstract computer.

The abstract computer is used to give precise meaning to the phrase 'simple explanation'. In the formalism used, explanations, or theories of phenomena, are computer programs that generate observation strings when run on the abstract computer. A simple explanation is a short computer program; a complex explanation is a long computer program. Simple explanations are more likely, so a high-probability observation string is one generated by a short computer program, or perhaps by any of a large number of slightly longer computer programs. A low-probability observation string is one that can only be generated by a long computer program.

These ideas can be made specific, and the resulting probabilities can be used to construct a prior probability distribution for the given observation. Solomonoff's main reason for inventing this prior was so that it could be used in Bayes' rule when the actual prior is unknown, enabling prediction under uncertainty: the method predicts the most likely continuation of the observation and provides a measure of how likely that continuation will be.

Although the universal probability of an observation (and its extension) is incomputable, there is a computer algorithm, Levin search, which, when run for longer and longer periods of time, generates a sequence of approximations that converges to the universal probability distribution.
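Although the toy machine below is invented purely for illustration (a real construction would use a genuine universal, prefix-free machine), it shows the shape of such an anytime approximation: enumerate programs up to a length bound, run each for a bounded number of steps, and sum 2^-|p| over the programs found to produce the target; raising the bounds only adds terms, so the estimates grow toward the limit.

```python
from fractions import Fraction
from itertools import product

def run(program, max_steps):
    """Toy machine, invented purely for illustration (not a real universal
    machine, and the program set is not prefix-free). Programs are
    bitstrings read two bits at a time:
      00 -> append '0'   01 -> append '1'
      10 -> double the output   11 -> halt
    Returns the output if the program halts within max_steps instructions,
    else None."""
    out, steps = "", 0
    for i in range(0, len(program) - 1, 2):
        if steps >= max_steps:
            return None
        op = program[i:i + 2]
        steps += 1
        if op == "00":
            out += "0"
        elif op == "01":
            out += "1"
        elif op == "10":
            out += out
        else:            # "11": halt and print
            return out
    return None          # ran off the end without halting

def approx_probability(target, max_len, max_steps):
    """Lower bound on the toy machine's algorithmic probability of `target`:
    sum 2^-|p| over programs p of length <= max_len that halt within
    max_steps steps with output `target`. Raising the bounds only adds
    terms, so the estimates increase toward the limiting value."""
    total = Fraction(0)
    for n in range(2, max_len + 1, 2):
        for bits in product("01", repeat=n):
            if run("".join(bits), max_steps) == target:
                total += Fraction(1, 2 ** n)
    return total

e1 = approx_probability("0000", max_len=8, max_steps=10)
e2 = approx_probability("0000", max_len=12, max_steps=20)
assert e1 <= e2   # the approximations are monotone in the resource bounds
```

With the bounds at 8 bits and 10 steps, exactly two programs print "0000" (append, double, double, halt; and append, append, double, halt), so the first estimate is 2 x 2^-8 = 2^-7.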

Solomonoff proved this distribution to be machine-invariant within a constant factor (called the invariance theorem).


A special mathematical object called a universal Turing machine is used to compute, quantify and assign codes to all quantities of interest. The universal prior is taken over the class of all computable measures; no hypothesis will have a zero probability.

Algorithmic probability combines Occam's razor and the principle of multiple explanations by giving a probability value to each hypothesis (algorithm or program) that explains a given observation, with the simplest hypothesis (the shortest program) having the highest probability and the increasingly complex hypotheses (longer programs) receiving increasingly small probabilities. These probabilities form a prior probability distribution for the observation, which Ray Solomonoff proved to be machine-invariant within a constant factor (called the invariance theorem) and can be used with Bayes' theorem to predict the most likely continuation of that observation. A universal Turing machine is used for the computer operations.
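A minimal numeric sketch of that combination, with invented hypothesis names and program lengths: each hypothesis consistent with the observation is kept (Epicurus) but weighted by two to the minus its length (Occam), and the weights are normalized into a prior.

```python
from fractions import Fraction

# Hypothetical competing explanations of one observation, with invented
# program lengths in bits (shorter = simpler).
hypotheses = {"constant": 10, "periodic": 14, "lookup-table": 60}

# Occam's razor: weight each hypothesis by 2^-length.
weights = {h: Fraction(1, 2 ** n) for h, n in hypotheses.items()}

# Epicurus: keep every consistent hypothesis, normalizing the weights
# into a prior probability distribution over all of them.
total = sum(weights.values())
prior_dist = {h: w / total for h, w in weights.items()}

# The shortest program dominates, but nothing gets probability zero.
assert max(prior_dist, key=prior_dist.get) == "constant"
assert all(p > 0 for p in prior_dist.values())
```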

In algorithmic information theory, algorithmic (Solomonoff) probability is a mathematical method of assigning a prior probability to a given observation. It was invented by Ray Solomonoff in the 1960s. It is used in inductive inference theory and analyses of algorithms. In his general theory of inductive inference, Solomonoff uses the prior obtained by this formula in Bayes' rule for prediction.

In the mathematical formalism used, the observations have the form of finite binary strings, and the universal prior is a probability distribution over the set of finite binary strings. The prior is universal in the Turing-computability sense, i.e. no string has zero probability. It is not computable, but it can be approximated.


Overview
Algorithmic probability deals with the questions: Given a body of data about some phenomenon that one wants to understand, how can one select the most probable hypothesis of how it was caused from among all possible hypotheses, how can one evaluate the different hypotheses, and how can one predict future data?

Among Solomonoff's inspirations for algorithmic probability were Occam's razor and Epicurus' principle of multiple explanations. These are essentially two different non-mathematical approximations of the universal prior.

Occam's razor means 'among the theories that are consistent with the observed phenomena, one should select the simplest theory'.

Epicurus' Principle of Multiple Explanations proposes that 'if more than one theory is consistent with the observations, keep all such theories'.


Algorithmic probability combines several ideas: Occam's razor; Epicurus' principle of multiple explanations; and special coding methods from modern computing theory. The prior obtained from the formula is used in Bayes' rule for prediction.

In contrast, Epicurus had proposed the Principle of Multiple Explanations: if more than one theory is consistent with the observations, keep all such theories.


Solomonoff invented the concept of algorithmic probability with its associated invariance theorem around 1960, publishing a report on it: "A Preliminary Report on a General Theory of Inductive Inference." He clarified these ideas more fully in 1964 with "A Formal Theory of Inductive Inference," Part I and Part II.

He described a universal computer with a randomly generated input program. The program computes some possibly infinite output. The universal probability distribution is the probability distribution on all possible output strings with random input.

The algorithmic probability of any given finite output prefix q is the sum of the probabilities of the programs that compute something starting with q. Certain long objects with short programs have high probability.
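That last point can be loosely illustrated by using compressed length as a crude stand-in for shortest-program length; zlib is nothing like a universal machine, but the qualitative ordering carries: a long, highly patterned string admits a short description and hence a much higher probability than equally long noise.

```python
import random
import zlib

def approx_log2_probability(s: bytes) -> float:
    """Crude illustration only: take zlib-compressed length (in bits) as a
    stand-in for the length of the shortest program printing `s`, so that
    probability ~ 2^-(program length). zlib is not a universal machine,
    but the qualitative ordering is the same."""
    return -8.0 * len(zlib.compress(s))

random.seed(0)
regular = b"01" * 5000                                      # long but highly patterned
noise = bytes(random.getrandbits(8) for _ in range(10000))  # long and incompressible

# The patterned string has a short "program", hence a far higher
# (less negative) log-probability than the noise of the same length.
assert approx_log2_probability(regular) > approx_log2_probability(noise)
```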

Algorithmic probability is the main ingredient of Solomonoff's theory of inductive inference, the theory of prediction based on observations. It was invented with the goal of using it for machine learning: given a sequence of symbols, which one will come next? Solomonoff's theory provides an answer that is optimal in a certain sense, although it is incomputable. Unlike, for example, Karl Popper's informal inductive inference theory, Solomonoff's is mathematically rigorous.

Algorithmic probability is closely related to the concept of Kolmogorov complexity. Kolmogorov's introduction of complexity was motivated by information theory and problems in randomness, while Solomonoff introduced algorithmic complexity for a different reason: inductive reasoning. In inventing a single universal prior probability that can be substituted for each actual prior probability in Bayes' rule, Solomonoff obtained Kolmogorov complexity as a side product.

Solomonoff's enumerable measure is universal in a certain powerful sense, but the computation time can be infinite. One way of dealing with this issue is a variant of Leonid Levin's search algorithm, which limits the time spent computing the success of possible programs, with shorter programs given more time. Other methods of limiting the search space include training sequences.
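A sketch of Levin's time-allocation idea, with the machine and the goal test left as placeholder callbacks (the demo interpreter at the end is invented solely to exercise the scheduler): in phase k every program of length at most k is run with a budget of 2^(k-|p|) steps, so shorter programs get exponentially more time.

```python
from itertools import product

def levin_search(interpret, is_success, max_phase):
    """Sketch of Levin-style search. `interpret(program, steps)` runs a
    candidate bitstring program for at most `steps` steps and returns its
    output (or None); `is_success(output)` tests the output. In phase k,
    each program p with len(p) <= k gets a budget of 2^(k - len(p)) steps,
    so shorter programs receive exponentially more time."""
    for k in range(1, max_phase + 1):
        for n in range(1, k + 1):
            budget = 2 ** (k - n)
            for bits in product("01", repeat=n):
                program = "".join(bits)
                out = interpret(program, budget)
                if out is not None and is_success(out):
                    return program, k   # first success and the phase it took
    return None

# Stand-in interpreter, invented for the demo: a program of length n
# "computes" n, but only if given at least 2**n steps.
demo_interpret = lambda p, steps: len(p) if steps >= 2 ** len(p) else None
found = levin_search(demo_interpret, lambda out: out == 3, max_phase=10)
```

In the demo, a length-3 program needs 8 steps, which the schedule first grants in phase 6 (budget 2^(6-3) = 8).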

Key people

 * Ray Solomonoff
 * Andrey Kolmogorov


Planning the Summer Research Project: The Proposal
In the early 1950s, there were various names for the field of "thinking machines", such as cybernetics, automata theory, and complex information processing. These names indicate how different the ideas were of what such machines would be like.

In 1955 John McCarthy, then a young Assistant Professor of Mathematics at Dartmouth College, decided to organize a group to clarify and develop ideas about thinking machines. He picked the name 'artificial intelligence' for the new field. He chose the name partly for its neutrality: it avoided a focus on narrow automata theory, and it avoided cybernetics, which was heavily focused on analog feedback and might have obliged him to accept the assertive Norbert Wiener as guru or to argue with him.

In early 1955, McCarthy approached the Rockefeller Foundation to request funding for a summer seminar at Dartmouth for about 10 participants. In June, he and Claude Shannon, a founder of information theory then at Bell Labs, met with Robert Morison, Director of Biological and Medical Research, to discuss the idea and possible funding, though Morison was unsure whether money would be made available for such a visionary project.

On September 2, 1955, the project was formally proposed by McCarthy, Marvin Minsky, Nathaniel Rochester and Claude Shannon. The proposal is credited with introducing the term 'artificial intelligence'.

The proposal states: "We propose that a 2 month, 10 man study of artificial intelligence be carried out during the summer of 1956 at Dartmouth College in Hanover, New Hampshire. The study is to proceed on the basis of the conjecture that every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it. An attempt will be made to find how to make machines use language, form abstractions and concepts, solve kinds of problems now reserved for humans, and improve themselves. We think that a significant advance can be made in one or more of these problems if a carefully selected group of scientists work on it together for a summer." The proposal goes on to discuss computers, natural language processing, neural networks, theory of computation, abstraction and creativity (these areas within the field of artificial intelligence are considered still relevant to the work of the field).

On May 26, 1956, McCarthy notified Robert Morison of the planned 11 attendees:

For the full period: 1) Dr. Marvin Minsky 2) Dr. Julian Bigelow 3) Professor D.M. MacKay 4) Mr. Ray Solomonoff 5) Mr. John Holland 6) Mr. John McCarthy.

For four weeks: 7) Dr. Claude Shannon 8) Mr. Nathaniel Rochester 9) Mr. Oliver Selfridge.

For the first two weeks: 10) Mr. Allen Newell 11) Professor Herbert Simon.

He noted, "We will concentrate on a problem of devising a way of programming a calculator to form concepts and to form generalizations. This of course is subject to change when the group gets together."

The actual participants came at different times, mostly for much shorter times. Trenchard More replaced Rochester for three weeks, and MacKay and Holland did not attend, but the project was set to begin. Around June 18, 1956, the earliest participants (perhaps only Ray Solomonoff, maybe with Tom Etter) arrived at the Dartmouth campus in Hanover, N.H., to join John McCarthy, who already had an apartment there. Ray and Marvin stayed at professors' apartments, but most would stay at the Hanover Inn.

When Did It Happen?
The Dartmouth Workshop is said to have run for six weeks in the summer of 1956. Ray Solomonoff's notes, written during the Workshop in 1956, say however that it ran for "roughly eight weeks," from about June 18 to August 17. Solomonoff's Dartmouth notes start on June 22; June 28 mentions Minsky, June 30 mentions Hanover, N.H., and July 1 mentions Tom Etter. On August 17, Ray gave a final talk.

Who Was There?
Unfortunately McCarthy lost his list of attendees! Instead, after the Dartmouth Project McCarthy sent Ray a preliminary list of participants and visitors plus those interested in the subject. There are 47 people listed.

Solomonoff, however, made a complete list in his notes of the summer project: 1) Ray Solomonoff 2) Marvin Minsky 3) John McCarthy 4) Claude Shannon 5) Trenchard More 6) Nat Rochester 7) Oliver Selfridge 8) Julian Bigelow 9) W. Ross Ashby 10) W.S. McCulloch 11) Abraham Robinson 12) Tom Etter 13) John Nash 14) David Sayre 15) Arthur Samuel 16) Shoulders 17) Shoulders' friend 18) Alex Bernstein 19) Herbert Simon 20) Allen Newell

Shannon attended Ray's talk on July 10 and Bigelow gave a talk on August 15. Ray doesn't mention Bernard Widrow, but apparently he visited, along with W.A. Clark and B.G. Farley. Trenchard mentions R. Culver and Ray mentions Bill Shutz. Herb Gelernter didn't attend, but was influenced later by what Rochester learned. Gloria Minsky also commuted there (with their part-beagle dog, Senje, who would start out in the car back seat and end up curled around her like a scarf), and attended some sessions (without Senje).

Ray Solomonoff, Marvin Minsky, and John McCarthy were the only three who stayed for the full time. Trenchard took attendance during two weeks of his three-week visit. From three to about eight people would attend the daily sessions.

The Meetings and Some Results
They had the entire top floor of the Dartmouth Math Department to themselves, and most weekdays they would meet at the main math classroom where someone might lead a discussion focusing on his ideas, or more frequently, a general discussion would be held.

It was not a directed group research project; discussions covered many topics, but several directions are considered to have been initiated or encouraged by the Workshop: the rise of symbolic methods, systems focused on limited domains (early expert systems), and deductive systems versus inductive systems. One participant, Arthur Samuel, said, "It was very interesting, very stimulating, very exciting".

Ray Solomonoff kept notes during the summer giving his impression of the talks and the ideas from various discussions. These are available, along with other notes concerning the Dartmouth Summer Research Project on AI, at: http://raysolomonoff.com/dartmouth/


Foundations of Mathematics

More paradoxes
1920: Thoralf Skolem corrected Löwenheim's proof of what is now called the downward Löwenheim-Skolem theorem, leading to Skolem's paradox discussed in 1922 (the existence of countable models of ZF, making infinite cardinalities a relative property).

1922: Proof by Abraham Fraenkel that the axiom of choice cannot be proved from the axioms of Zermelo's set theory with urelements.

1931: Publication of Gödel's incompleteness theorems, showing that essential aspects of Hilbert's program could not be attained. They showed how to construct, for any sufficiently powerful and consistent recursively axiomatizable system – such as one able to axiomatize the elementary theory of arithmetic on the (infinite) set of natural numbers – a statement that formally expresses its own unprovability; Gödel then proved this statement equivalent to the claim that the theory itself is consistent. Hence (assuming consistency) the system is not powerful enough to prove its own consistency, let alone to have a simpler system do the job. It thus became clear that the notion of mathematical truth cannot be completely determined and reduced to a purely formal system as envisaged in Hilbert's program. This dealt a final blow to the heart of Hilbert's program, the hope that consistency could be established by finitistic means (it was never made clear exactly which axioms were the "finitistic" ones, but whatever axiomatic system was being referred to, it was a weaker system than the system whose consistency it was supposed to prove).

1936: Alfred Tarski proved his truth undefinability theorem.

1936: Alan Turing proved that a general algorithm to solve the halting problem for all possible program-input pairs cannot exist.

1938: Gödel proved the consistency, relative to ZF, of the axiom of choice and of the generalized continuum hypothesis.

1936 - 1937: Alonzo Church and Alan Turing, respectively, published independent papers showing that a general solution to the Entscheidungsproblem is impossible: the universal validity of statements in first-order logic is not decidable (it is only semi-decidable as given by the completeness theorem).

1955: Pyotr Novikov showed that there exists a finitely presented group G such that the word problem for G is undecidable.

1963: Paul Cohen showed that the Continuum Hypothesis is unprovable from ZFC. Cohen's proof developed the method of forcing, which is now an important tool for establishing independence results in set theory.

1960-1967: The beginning of algorithmic information theory. In 1960 and 1964, Ray Solomonoff publishes algorithmic probability and Solomonoff prediction (the theory of inductive inference), which connect probability to program length, drawing on Occam's razor and Epicurus' principle of multiple explanations. He establishes a universal prior that can be used in Bayes' rule for prediction. In 1965 Andrey Kolmogorov publishes his version of descriptive complexity, which becomes known as Kolmogorov complexity. In 1968 Gregory Chaitin, inspired in part by fundamental randomness in physics, publishes his similar version of complexity. Solomonoff, Kolmogorov and Chaitin are all founders of algorithmic information theory.


1966: Paul Cohen showed that the axiom of choice is unprovable in ZF even without urelements.

1970: Hilbert's tenth problem is proven unsolvable: there is no recursive solution to decide whether a Diophantine equation (multivariable polynomial equation) has a solution in integers.

1971: Suslin's problem is proven to be independent from ZFC.


Solomonoff's Theory of Inductive Inference

Solomonoff's theory of universal inductive inference is a theory of prediction based on observations, such as predicting the next symbol based upon a given series of symbols. The only assumption the theory makes is that the environment follows some unknown but computable probability distribution. It is a mathematical formalization of Occam's razor and the principle of multiple explanations.

Prediction is done using a completely Bayesian framework. The universal prior is taken over the class of all computable sequences; this is the universal a priori probability distribution, under which no computable hypothesis has zero probability. Bayes' rule can therefore be used to predict the continuation of any particular computable sequence.

Philosophical
The theory has philosophical foundations and was founded by Ray Solomonoff around 1960. It is a mathematically formalized combination of Occam's razor and the principle of multiple explanations. All computable theories which perfectly describe previous observations are used to calculate the probability of the next observation, with more weight put on the shorter computable theories. Marcus Hutter's universal artificial intelligence builds upon this to calculate the expected value of an action.

Mathematical
The proof of the "razor" is based on the known mathematical properties of a probability distribution over a denumerable set. These properties are relevant because the infinite set of all programs is a denumerable set. The sum S of the probabilities of all programs must be exactly equal to one (as per the definition of probability), so the probabilities must roughly decrease as we enumerate the infinite set of all programs; otherwise S would be strictly greater than one. To be more precise, for every ε > 0, there is some length l such that the probability of all programs longer than l is at most ε. This does not, however, preclude very long programs from having very high probability.
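A quick numerical check of this tail bound under an assumed toy prior (not Solomonoff's construction): give each of the 2^n programs of length n the weight 4^-n, so length n as a whole carries mass 2^-n and the grand total is exactly 1.

```python
import math
from fractions import Fraction

def mass_of_length(n):
    """Total prior mass on the 2**n programs of length n, each carrying the
    assumed weight 4**-n: 2**n * 4**-n = 2**-n."""
    return Fraction(2 ** n, 4 ** n)

def tail_mass(l, horizon=200):
    """Prior mass on programs strictly longer than l (sum truncated at
    `horizon`; the exact tail is 2**-l)."""
    return sum(mass_of_length(n) for n in range(l + 1, horizon + 1))

# For any epsilon > 0, programs longer than l = ceil(log2(1/epsilon))
# carry mass at most epsilon: the probabilities must thin out with length.
epsilon = Fraction(1, 1000)
l = math.ceil(math.log2(1000))   # l = 10
assert tail_mass(l) <= epsilon
```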

Fundamental ingredients of the theory are the concepts of algorithmic probability and Kolmogorov complexity. The universal prior probability of any prefix p of a computable sequence x is the sum of the probabilities of all programs (for a universal computer) that compute something starting with p. Given some p and any computable but unknown probability distribution from which x is sampled, the universal prior and Bayes' theorem can be used to predict the yet unseen parts of x in optimal fashion.
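The true mixture over all computable distributions is incomputable, but the Bayesian mechanics can be shown with a finite stand-in class; here three Bernoulli models with an assumed uniform prior play the role of the hypothesis class.

```python
from fractions import Fraction

# Finite stand-in for the class of computable measures (an assumption for
# illustration; the real universal prior mixes over all of them): three
# Bernoulli models of the probability that the next bit is '1'.
models = [Fraction(1, 4), Fraction(1, 2), Fraction(3, 4)]
prior = {m: Fraction(1, 3) for m in models}   # assumed uniform prior

def predict_next_one(observed, prior):
    """P(next bit = '1' | observed) under the mixture: reweight each model
    by its likelihood of the observed prefix (Bayes' theorem), normalize,
    then average the models' one-step predictions."""
    posterior = {}
    for m, w in prior.items():
        likelihood = Fraction(1)
        for bit in observed:
            likelihood *= m if bit == "1" else 1 - m
        posterior[m] = w * likelihood
    total = sum(posterior.values())
    return sum(m * w / total for m, w in posterior.items())

# After seeing mostly ones, the mixture leans toward the 1-heavy model.
p = predict_next_one("111101", prior)
assert p > Fraction(1, 2)
```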


beginning of alp

In algorithmic information theory, algorithmic (Solomonoff) probability is a mathematical method of assigning a prior probability to a given observation. In a theoretic sense, the prior is universal. It is used in inductive inference theory, and analyses of algorithms. Since it is not computable, it must be approximated.

It deals with the questions: Given a body of data about some phenomenon that one wants to understand, how can one select the most probable hypothesis of how it was caused from among all possible hypotheses, how can one evaluate the different hypotheses, and how can one predict future data?

Algorithmic probability combines several ideas: Occam's razor; Epicurus' principle of multiple explanations; and the concept of a Universal Prior, special coding methods from modern computing theory which Solomonoff uses to establish a Universal Prior for all possible .... The prior obtained from the formula is used in Bayes rule for prediction.

Occam's razor means 'among the theories that are consistent with the observed phenomena, one should select the simplest theory'.

In contrast, Epicurus had proposed the Principle of Multiple Explanations: if more than one theory is consistent with the observations, keep all such theories.

A special mathematical object called a universal Turing machine is used to compute, quantify and assign codes to all quantities of interest. The universal prior is taken over the class of all computable measures; no hypothesis will have a zero probability.

Algorithmic probability combines Occam's razor and the principle of multiple explanations by giving a probability value to each hypothesis (algorithm or program) that explains a given observation, with the simplest hypothesis (the shortest program) having the highest probability and the increasingly complex hypotheses (longer programs) receiving increasingly small probabilities. These probabilities form a prior probability distribution for the observation, which Ray Solomonoff proved to be machine-invariant within a constant factor (called the invariance theorem) and can be used with Bayes' theorem to predict the most likely continuation of that observation. A universal Turing machine is used for the computer operations.

Solomonoff invented the concept of algorithmic probability with its associated invariance theorem around 1960, publishing a report on it: "A Preliminary Report on a General Theory of Inductive Inference." He clarified these ideas more fully in 1964 with "A Formal Theory of Inductive Inference," Part I and Part II.

He described a universal computer with a randomly generated input program. The program computes some possibly infinite output. The universal probability distribution is the probability distribution on all possible output strings with random input.

The algorithmic probability of any given finite output prefix q is the sum of the probabilities of the programs that compute something starting with q. Certain long objects with short programs have high probability.
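A heavily simplified sketch of this sum follows. Everything here is invented for illustration: the "machine" is a toy that merely repeats its program's bits, not a universal Turing machine, and it is not prefix-free, so the sum is only meaningful under a fixed bound on program length (in Solomonoff's construction the programs are self-delimiting, which makes the sum converge).

```python
from itertools import product

def toy_machine(program: str) -> str:
    """Toy stand-in for the abstract computer: a program is a bit string
    whose output is its bits repeated out to 8 symbols. (A real
    construction would use a universal prefix Turing machine.)"""
    return (program * 8)[:8] if program else ""

def algorithmic_probability(prefix: str, max_len: int = 10) -> float:
    """Approximate P(q): sum 2**(-|p|) over all programs p (up to
    max_len bits) whose output starts with the prefix q."""
    total = 0.0
    for n in range(1, max_len + 1):
        for bits in product("01", repeat=n):
            p = "".join(bits)
            if toy_machine(p).startswith(prefix):
                total += 2.0 ** (-len(p))
    return total

# The regular prefix "0101" is produced by the short program "01", so it
# scores higher than "0111", whose shortest program here is "0111" itself:
print(algorithmic_probability("0101"))
print(algorithmic_probability("0111"))
```

Even on this toy machine the ordering comes out as the theory says: prefixes reachable by short programs receive more weight.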

Algorithmic probability is the main ingredient of Solomonoff's theory of inductive inference, the theory of prediction based on observations; it was invented with the goal of using it for machine learning: given a sequence of symbols, which one will come next? Solomonoff's theory provides an answer that is optimal in a certain sense, although it is incomputable. Unlike, for example, Karl Popper's informal inductive inference theory, Solomonoff's is mathematically rigorous.

Algorithmic probability is closely related to the concept of Kolmogorov complexity. Kolmogorov's introduction of complexity was motivated by information theory and problems in randomness, while Solomonoff introduced algorithmic complexity for a different reason: inductive reasoning. Solomonoff invented a single universal prior probability that can be substituted for each actual prior probability in Bayes' rule, with Kolmogorov complexity as a side product.

Solomonoff's enumerable measure is universal in a certain powerful sense, but the computation time can be infinite. One way of dealing with this is a variant of Leonid Levin's Search Algorithm, which limits the time spent computing the success of possible programs, with shorter programs given more time. Other methods of limiting the search space include training sequences.
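A minimal sketch of this time-allotment idea (not Levin's actual algorithm; the `run` interface and the cost model below are invented for illustration): in phase k each candidate program p is given a step budget of 2^k · 2^(-|p|), so shorter programs get more time, and the overall budget doubles each phase until some program produces the target.

```python
def levin_search(programs, run, target, max_phase=20):
    """Sketch of Levin-style search. In phase k, each program p gets
    2**k * 2**(-len(p)) steps; the first program that halts within its
    budget and outputs `target` wins. `run(p, steps)` returns the
    program's output if it halts within `steps` steps, else None."""
    for k in range(max_phase):
        for p in programs:
            steps = (2 ** k) >> len(p)   # 2**k * 2**(-|p|), rounded down
            if steps and run(p, steps) == target:
                return p, k              # winning program and its phase
    return None

# Toy cost model (hypothetical): program p needs 2**len(p) steps to halt,
# and its output is its bits written twice.
def run(p, steps):
    return p * 2 if steps >= 2 ** len(p) else None

print(levin_search(["111", "01", "0011"], run, "0101"))  # → ('01', 4)
```

The short program "01" produces the target and receives enough steps by phase 4, before any longer candidate is run to completion.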

Key people

 * Ray Solomonoff
 * Andrey Kolmogorov

Life history through 1964
Ray Solomonoff was born on July 25, 1926, in Cleveland, Ohio, son of the Russian immigrants Phillip Julius and Sarah Mashman Solomonoff. He attended Glenville High School, graduating in 1944. In 1944 he joined the United States Navy as an instructor in electronics. From 1947 to 1951 he attended the University of Chicago, studying under professors such as Rudolf Carnap and Enrico Fermi, and graduated with an M.S. in Physics in 1951.

From his earliest years he was motivated by the pure joy of mathematical discovery and by the desire to explore where no one had gone before. At the age of 16, in 1942, he began to search for a general method to solve mathematical problems.

In 1952 he met Marvin Minsky, John McCarthy and others interested in machine intelligence. In 1956 Minsky, McCarthy and others organized the Dartmouth Summer Research Conference on Artificial Intelligence, where Ray was one of the original 10 invitees --- he, McCarthy, and Minsky were the only ones to stay all summer. It was for this group that Artificial Intelligence was first named as a science. Computers at the time could solve very specific mathematical problems, but not much else. Ray wanted to pursue a bigger question: how to make machines more generally intelligent, and how computers could use probability for this purpose.

Work history through 1964
He wrote three papers, two with Anatol Rapoport, in 1950-52, that are regarded as the earliest statistical analysis of networks.

He was one of the 10 attendees at the 1956 Dartmouth Summer Research Project on Artificial Intelligence. He wrote and circulated a report among the attendees: "An Inductive Inference Machine". It viewed machine learning as probabilistic, with an emphasis on the importance of training sequences, and on the use of parts of previous solutions to problems in constructing trial solutions for new problems. He published a version of his findings in 1957. These were the first papers to be written on probabilistic Machine Learning.

In the late 1950s, he invented probabilistic languages and their associated grammars. A probabilistic language assigns a probability value to every possible string. Generalizing the concept of probabilistic grammars led him to his discovery in 1960 of Algorithmic Probability and General Theory of Inductive Inference. As part of this work he also established the philosophical foundation that enables the use of Bayes rule for induction.

Prior to the 1960s, the usual method of calculating probability was based on frequency: taking the ratio of favorable results to the total number of trials. In his 1960 publication, and, more completely, in his 1964 publications, Solomonoff seriously revised this definition of probability. He called this new form of probability "Algorithmic Probability" and showed how to use it in a Bayesian framework for prediction in his theory of inductive inference.

The basic theorem of what was later called Kolmogorov Complexity was part of his General Theory. Writing in 1960, he begins: "Consider a very long sequence of symbols ... We shall consider such a sequence of symbols to be 'simple' and have a high a priori probability, if there exists a very brief description of this sequence - using, of course, some sort of stipulated description method. More exactly, if we use only the symbols 0 and 1 to express our description, we will assign the probability 2^(-N) to a sequence of symbols if its shortest possible binary description contains N digits."

The probability is with reference to a particular universal Turing machine. Solomonoff showed, and in 1964 proved, that the choice of machine, while it could add a constant factor, would not change the probability ratios very much. These probabilities are therefore essentially machine-independent.

In 1965, the Russian mathematician Kolmogorov independently published similar ideas. When he became aware of Solomonoff's work, he acknowledged Solomonoff, and for several years, Solomonoff's work was better known in the Soviet Union than in the Western World. The general consensus in the scientific community, however, was to associate this type of complexity with Kolmogorov, who was more concerned with randomness of a sequence. Algorithmic Probability and Universal (Solomonoff) Induction became associated with Solomonoff, who was focused on prediction - the extrapolation of a sequence.

Later in the same 1960 publication Solomonoff describes his extension of the single-shortest-code theory. This is Algorithmic Probability. He states: "It would seem that if there are several different methods of describing a sequence, each of these methods should be given some weight in determining the probability of that sequence."

Closely related is his idea of how this can be used in a Bayesian framework. The universal prior is taken over the class of all computable sequences; this is the universal a priori probability distribution; no hypothesis will have a zero probability. This means that Bayes rule of causation can be used in predicting the continuation of any particular sequence.

Algorithmic Probability uses a weighting based on the program length of each program that could produce a particular sequence, x: the shorter the program the more weight it is given.

In inductive inference, the universal probability distribution plays two roles: its sum defines the probability of a sequence, and the weights of the individual programs give a figure of merit to each program that could produce the sequence.

He then shows how this idea can be used to generate the universal a priori probability distribution and how it enables the use of Bayes' rule in inductive inference. Inductive inference, by adding up the weighted predictions of all models describing a particular sequence, using weights based on the lengths of those models, obtains the probability distribution for the extension of that sequence.

This distribution is used with Bayes' rule to get the most accurate probability of what is most likely to come next as the sequence, x, is extrapolated.

This theory of prediction has since become known as Solomonoff induction. It is also called Universal Induction, or the General Theory of Inductive Inference.
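In later formulations this prediction step is often written compactly: if $$M(x)$$ denotes the universal probability that a sequence begins with $$x$$ (the sum of the weights $$2^{-|p|}$$ of all programs $$p$$ whose output begins with $$x$$), then the probability that $$x$$ continues with the symbol $$a$$ is the ratio

$$P(a \mid x) = \frac{M(xa)}{M(x)}$$

so that continuations reachable by many short programs receive high probability.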

He enlarged his theory, publishing a number of reports leading up to the publications in 1964. The 1964 papers give a more detailed description of Algorithmic Probability and Solomonoff Induction, presenting 5 different models, including the model popularly called the Universal Distribution.


Work history from 1964 to 1984
Other scientists who had been at the 1956 Dartmouth Summer Conference (such as Newell and Simon) were developing the branch of Artificial Intelligence which used machines governed by fact-based if-then rules. Solomonoff was developing the branch of Artificial Intelligence that focused on probability and prediction; his specific view of A.I. described machines that were governed by the Algorithmic Probability distribution. The machine generates theories together with their associated probabilities, to solve problems, and as new problems and theories develop, updates the probability distribution on the theories.

In 1968 he found a proof for the efficacy of Algorithmic Probability, but mainly because of lack of general interest at that time, did not publish it until 10 years later. In his report, he published the proof for the convergence theorem.

In the years following his discovery of Algorithmic Probability he focused on how to use this probability and Solomonoff Induction in actual prediction and problem solving for A.I. He also wanted to understand the deeper implications of this probability system.

One important aspect of Algorithmic Probability is that it is complete and incomputable.

In the 1968 report he shows that Algorithmic Probability is complete; that is, if there is any describable regularity in a body of data, Algorithmic Probability will eventually discover that regularity, requiring a relatively small sample of that data. Algorithmic Probability is the only probability system known to be complete in this way. As a necessary consequence of its completeness it is incomputable. The incomputability is because some algorithms - a subset of those that are partially recursive - can never be evaluated fully because it would take too long. But these programs will at least be recognized as possible solutions. On the other hand, any computable system is incomplete. There will always be descriptions outside that system's search space which will never be acknowledged or considered, even in an infinite amount of time. Computable prediction models hide this fact by ignoring such algorithms.

In many of his papers he described how to search for solutions to problems and in the 1970s and early 1980s developed what he felt was the best way to update the machine.

The use of probability in A.I., however, did not have a completely smooth path. In the early years of A.I., the relevance of probability was problematic. Many in the A.I. community felt probability was not usable in their work. The area of pattern recognition did use a form of probability, but because there was no broadly based theory of how to incorporate probability in any A.I. field, most fields did not use it at all.

There were, however, researchers such as Judea Pearl and Peter Cheeseman who argued that probability could be used in artificial intelligence.

About 1984, at an annual meeting of the American Association for Artificial Intelligence (AAAI), it was decided that probability was in no way relevant to A.I.

A protest group formed, and the next year there was a workshop at the AAAI meeting devoted to "Probability and Uncertainty in AI." This yearly workshop has continued to the present day.

As part of the protest at the first workshop, Solomonoff gave a paper on how to apply the universal distribution to problems in A.I. This was an early version of the system he has been developing since that time.

In that report, he described the search technique he had developed. In search problems, the best order of search is by increasing $$T_i/P_i$$, where $$T_i$$ is the time needed to test the trial and $$P_i$$ is the probability of success of that trial. He called this ratio the "Conceptual Jump Size" of the problem. Levin's search technique approximates this order, and so Solomonoff, who had studied Levin's work, called this search technique Lsearch.
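As a tiny illustration (trial names, times and probabilities are all made up), ordering trials by the ratio $$T_i/P_i$$ looks like:

```python
# Hypothetical trials: (name, T_i = time to test, P_i = probability of success)
trials = [("A", 100.0, 0.05), ("B", 10.0, 0.2), ("C", 50.0, 0.5)]

# Order by conceptual jump size T_i / P_i: the smallest expected cost
# per unit of success probability is tried first.
ordered = sorted(trials, key=lambda t: t[1] / t[2])
print([name for name, _, _ in ordered])  # → ['B', 'C', 'A']
```

Trial B is cheap relative to its chance of success (10/0.2 = 50), so it is tested before C (100) and A (2000).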

Work history — the later years
In other papers he explored how to limit the time needed to search for solutions, writing on resource bounded search. The search space is limited by available time or computation cost rather than by cutting out search space as is done in some other prediction methods, such as Minimum Description Length.

Throughout his career Solomonoff was concerned with the potential benefits and dangers of A.I., discussing it in many of his published reports. In 1985 he analyzed a likely evolution of A.I., giving a formula predicting when it would reach the "Infinity Point". This Infinity Point is an early version of the "Singularity" later made popular by Ray Kurzweil.

Originally algorithmic induction methods extrapolated ordered sequences of strings. Methods were needed for dealing with other kinds of data.

A 1999 report generalizes the Universal Distribution and associated convergence theorems to unordered sets of strings, and a 2008 report extends them to unordered pairs of strings.

In 1997, 2003 and 2006 he showed that incomputability and subjectivity are both necessary and desirable characteristics of any high performance induction system.

In 1970 he formed his own one-man company, Oxbridge Research, and continued his research there except for periods at other institutions such as MIT, the University of Saarland in Germany, and the Dalle Molle Institute for Artificial Intelligence in Lugano, Switzerland. In 2003 he was the first recipient of the Kolmogorov Award by the Computer Learning Research Centre at Royal Holloway, University of London, where he gave the inaugural Kolmogorov Lecture. Solomonoff was most recently a visiting professor at the CLRC.

In 2006 he spoke at AI@50, "Dartmouth Artificial Intelligence Conference: the Next Fifty Years" commemorating the fiftieth anniversary of the original Dartmouth summer study group. Solomonoff was one of five original participants to attend.

In Feb. 2008, he gave the keynote address at the Conference "Current Trends in the Theory and Application of Computer Science" (CTTACS), held at Notre Dame University in Lebanon. He followed this with a short series of lectures, and began research on new applications of Algorithmic Probability.

Algorithmic Probability and Solomonoff Induction have many advantages for Artificial Intelligence. Algorithmic Probability gives extremely accurate probability estimates. These estimates can be revised by a reliable method so that they continue to be acceptable. It utilizes search time in a very efficient way. In addition to probability estimates, Algorithmic Probability "has for AI another important value: its multiplicity of models gives us many different ways to understand our data;

A very conventional scientist understands his science using a single 'current paradigm' --- the way of understanding that is most in vogue at the present time. A more creative scientist understands his science in very many ways, and can more easily create new theories, new ways of understanding, when the 'current paradigm' no longer fits the current data".

A description of Solomonoff's life and work prior to 1997 is in "The Discovery of Algorithmic Probability", Journal of Computer and System Sciences, Vol 55, No. 1, pp 73–88, August 1997. The paper, as well as most of the others mentioned here, are available on his website at the publications page.

Suggested further reading
Rathmanner, S and Hutter, M., "A Philosophical Treatise of Universal Induction" in Entropy 2011, 13, 1076-1136: A very clear philosophical and mathematical analysis of Solomonoff's Theory of Inductive Inference

Deduction, reasoning, problem solving
Early AI researchers developed algorithms that imitated the step-by-step reasoning that humans were often assumed to use when they solve puzzles, play board games or make logical deductions. During the early years many researchers felt probability could not be used in AI, but in 1960 probability was redefined using program lengths rather than frequency for prediction. By the late 1980s and '90s, AI research had developed highly successful methods for dealing with uncertain or incomplete information, employing concepts from probability and economics.

See also Kolmogorov, A.N. (1965). "Three Approaches to the Quantitative Definition of Information". Problems Inform. Transmission 1 (1): 1–7.

In 1956, at the original Dartmouth summer conference, Ray Solomonoff wrote a report on unsupervised probabilistic machine learning: "An Inductive Inference Machine".

It viewed machine learning as probabilistic, with an emphasis on the importance of training sequences, and on the use of parts of previous solutions to problems in constructing trial solutions for new problems. He published a version of his findings in 1957. He was the founder of Algorithmic Information Theory and of the branch of artificial intelligence based on machine learning, prediction and probability, and he circulated the first report on non-semantic machine learning in 1956.

He was the inventor of algorithmic probability, publishing the crucial theorem that launched Kolmogorov complexity and Algorithmic Information Theory. He first described these results at a conference at Caltech in 1960, and in a report of February 1960, "A Preliminary Report on a General Theory of Inductive Inference." He clarified these ideas more fully in his 1964 publications, "A Formal Theory of Inductive Inference," Part I and Part II. Algorithmic probability is a method of assigning a probability value to each hypothesis (algorithm/program) that explains a given observation, with the simplest hypothesis (the shortest program) having the highest probability and the increasingly complex hypotheses receiving increasingly small probabilities. Although he is best known for algorithmic probability and his general theory of inductive inference, he made many other important discoveries throughout his life, most of them directed toward his goal in artificial intelligence: to develop a machine that could solve hard problems using probabilistic methods.

Introduction
Algorithmic (Solomonoff) probability is a concept in theoretical computer science; it is a method of assigning a probability value to each hypothesis (algorithm/program) that explains a given observation, with the simplest hypothesis (the shortest program) having the highest probability and the increasingly complex hypotheses receiving increasingly small probabilities. These probabilities form an a priori probability distribution for the observation that can then be used with Bayes' theorem to predict the most likely continuation of that observation.

Around 1960, Ray Solomonoff invented the concept of algorithmic probability. He first described his results at a Conference at Caltech in 1960, and in a report, Feb. 1960, "A Preliminary Report on a General Theory of Inductive Inference." He clarified these ideas more fully in his 1964 publications, "A Formal Theory of Inductive Inference," Part I and Part II.

Algorithmic Probability is a unique melding of ideas about computing, about a priori and conditional probability using Bayes' theorem, and about the philosophical concepts of simplicity (Occam's Razor) and of retaining multiple hypotheses (Epicurus). From these, a universal a priori probability distribution, governed by algorithmic probability, is generated. The practical goal of algorithmic probability is Solomonoff's General Theory of Inductive Inference: a universal theory of prediction.

Algorithmic probability is a mathematically formalized combination of Occam's razor and the Principle of Multiple Explanations.

Probability and Bayes theorem
Suppose there is a set of observations of some data, and a set of hypotheses that are candidates for generating the data. What is the probability that a particular hypothesis is the one that actually generated the data? If there is enough prior data, frequency theory can be used: the relative probabilities of the hypotheses are found by taking the ratio of the number of favorable outcomes to the total number of possible outcomes in the past. How, then, to adjust this likelihood when a new set of observations occurs and you want to combine them all? The mathematician Thomas Bayes (1702-1761) developed an elegant rule generalizing how to change an existing hypothesis in the light of new evidence -- the updated probability is a function of the new evidence and the previous knowledge (the prior probability). His formula says the probability of two events happening is equal to the conditional probability of one event occurring, given that the other has already occurred, multiplied by the probability of the other event happening. Bayes' rule is probabilistic, but exact, and with more data it converges toward certainty. Many times, however, there is little prior data, so the probabilities can't be used reliably. If there is no prior data at all, then there is no way to assign probability and no way to use Bayes' rule. This is the problem that algorithmic probability treats: it provides a mathematically rigorous way of getting an a priori probability under all circumstances, even when there is no data at all.
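A small numeric sketch of the rule (the hypotheses, priors and likelihoods are invented for illustration): given prior probabilities for a set of hypotheses and the likelihood of the new evidence under each, the posterior is the normalized product.

```python
def bayes_posterior(priors, likelihoods):
    """Posterior over hypotheses H_i given evidence E:
    P(H_i | E) = P(E | H_i) * P(H_i) / sum_j P(E | H_j) * P(H_j)."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)
    return [j / total for j in joint]

# Two hypothetical hypotheses with equal priors; the evidence is four
# times as likely under the first, so the posterior shifts toward it.
print(bayes_posterior([0.5, 0.5], [0.8, 0.2]))  # → [0.8, 0.2]
```

Algorithmic probability supplies the `priors` argument when no actual prior is known.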

Occam's Razor and Epicurus' Theory of Multiple Explanations
Solomonoff combined several ideas to find a solution to this problem. There are two main philosophical ideas at work. The first is the principle of Occam's Razor, which is usually understood to mean that among all hypotheses that can explain the event one should choose the simplest. The second is Epicurus' principle of multiple explanations, which advocates keeping all hypotheses that can explain the event. Algorithmic probability combines these two ideas by keeping as many hypotheses as possible, while ordering their likelihood according to how simple each one is.

To do this Solomonoff used a new definition of simplicity based on computers and binary coding.

Turing Machines and Binary Coding
Solomonoff's definition of simplicity derives from the binary coding used by computers. He uses a Universal Turing machine, a computing device which takes a tape with a string of symbols on it as an input, and can respond to a new given symbol by changing its internal state, writing a new symbol on the tape, shifting the tape right or left to the next symbol, or halting. He provides a randomly generated input program. The program computes some possibly infinite output.

The algorithmic probability of any given finite output prefix q is the sum of the probabilities of the programs that compute something starting with q. Certain long objects with short programs have high probability.

Inductive Inference: Solomonoff Theory of Prediction
Algorithmic probability is the main ingredient of Ray Solomonoff's theory of inductive inference, the theory of prediction based on observations. Given a sequence of symbols, which will come next? Solomonoff's theory provides an answer that is optimal in a certain sense, although it is incomputable. Unlike, for example, Karl Popper's informal inductive inference theory, Solomonoff's is mathematically rigorous.

Algorithmic probability is closely related to the concept of Kolmogorov complexity. The Kolmogorov complexity of any computable object is the length of the shortest program that computes it and then halts. The invariance theorem shows that it is not really important which computer we use.

Solomonoff's enumerable measure is universal in a certain powerful sense, but it ignores computation time. To deal with this problem Solomonoff developed a way to search for solutions by restricting the time allowed for the search and, within that time frame, allowing the shorter programs more time to search than the longer ones. This concept is called Levin's search, since it is similar to, and partly based on, the method Levin used for other computer problems.

A description of algorithmic probability and how it was discovered is Solomonoff's "The Discovery of Algorithmic Probability", Journal of Computer and System Sciences, Vol 55, No. 1, pp 73–88, August 1997. The paper, as well as most of the others mentioned here, are available on his website at the publications page.

Introduction
Algorithmic information theory is the area of computer science that studies Kolmogorov complexity and other complexity measures on strings (or other data structures).

The concept and theory of Kolmogorov Complexity is based on a crucial theorem first discovered by Ray Solomonoff who published it in 1960, describing it in "A Preliminary Report on a General Theory of Inductive Inference" (see ref) as a side product to his invention of Algorithmic Probability. He gave a more complete description in his 1964 publications, "A Formal Theory of Inductive Inference," Part 1 and Part 2 in Information and Control (see ref).

Andrey Kolmogorov later independently invented this theorem as a measure of information content, first describing it in 1965 in ''Problems Inform. Transmission'', 1, (1965), 1-7. Gregory Chaitin also invented it independently, submitting two reports on it in 1965: a preliminary investigation published in 1966 (J. ACM, 13 (1966)) and a more complete discussion in 1969 (J. ACM, 16 (1969)).

The theorem says that, among algorithms that decode strings from their descriptions (codes), there exists an optimal one. This algorithm, for all strings, allows codes as short as allowed by any other algorithm, up to an additive constant that depends on the algorithms but not on the strings themselves. Solomonoff used this algorithm, and the code lengths it allows, to define a string's 'universal probability', on which inductive inference of the string's subsequent digits can be based. Kolmogorov used this theorem to define several functions of strings: complexity, randomness, and information.
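The additive constant in this theorem is usually written as follows: if $$K_U(x)$$ and $$K_A(x)$$ denote the shortest code lengths for a string $$x$$ under the optimal algorithm $$U$$ and under any other algorithm $$A$$, then

$$K_U(x) \le K_A(x) + c_A$$

where the constant $$c_A$$ depends only on $$A$$ (essentially, the length of a program for $$U$$ that simulates $$A$$) and not on $$x$$.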

When Kolmogorov became aware of Solomonoff's work, he acknowledged Solomonoff's priority (IEEE Trans. Inform Theory, 14:5(1968), 662-664). For several years, Solomonoff's work was better known in the Soviet Union than in the Western World. The general consensus in the scientific community, however, was to associate this type of complexity with Kolmogorov, who was concerned with randomness of a sequence while Algorithmic Probability became associated with Solomonoff, who focused on prediction using his invention of the universal a priori probability distribution.

There are several other variants of Kolmogorov complexity or algorithmic information. The most widely used one is based on self-delimiting programs and is mainly due to Leonid Levin (1974).

Andrey Kolmogorov later independently published this theorem in Problems Inform. Transmission, 1, (1965), 1-7. Gregory Chaitin also presents this theorem in J. ACM, 16 (1969); Chaitin's paper was submitted in October 1966, revised in December 1968, and cites both Solomonoff's and Kolmogorov's papers.