Ray Solomonoff (July 25, 1926 – December 7, 2009) was the inventor of algorithmic probability, and founder of algorithmic information theory. He was an originator of the branch of artificial intelligence based on machine learning, prediction and probability. He circulated the first report on non-semantic machine learning in 1956.

Solomonoff first described algorithmic probability in 1960, publishing the theorem that launched Kolmogorov complexity and algorithmic information theory. He first presented these results at a conference at Caltech in 1960, and in a report of February 1960, "A Preliminary Report on a General Theory of Inductive Inference." He clarified these ideas more fully in his 1964 publications, "A Formal Theory of Inductive Inference," Part I and Part II.

Algorithmic probability is a mathematically formalized combination of Occam's razor and the Principle of Multiple Explanations. It is a machine-independent method of assigning a probability value to each hypothesis (algorithm or program) that explains a given observation, with the simplest hypothesis (the shortest program) having the highest probability and increasingly complex hypotheses receiving increasingly small probabilities.
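For instance, a hypothesis whose shortest binary description is 3 bits long would receive weight 2^-3 = 1/8, while one requiring 12 bits would receive 2^-12 = 1/4096. A minimal Python sketch of this weighting follows; the hypotheses and description lengths are invented for illustration, since in the actual theory N is the length of a hypothesis's shortest program on a fixed universal machine:

```python
# Toy sketch of algorithmic probability's 2^-N weighting.
# The hypotheses and description lengths are hypothetical; in the
# theory, N is the length of the shortest program for the hypothesis
# on a fixed universal machine.
descriptions = {
    "constant zeros": 3,        # assumed shortest description: 3 bits
    "alternating bits": 5,      # assumed shortest description: 5 bits
    "ad hoc lookup table": 12,  # assumed shortest description: 12 bits
}
weights = {h: 2.0 ** (-n) for h, n in descriptions.items()}
total = sum(weights.values())
for h, w in sorted(weights.items(), key=lambda kv: -kv[1]):
    # After normalizing, the shortest description dominates the prior.
    print(f"{h:20s} N = {descriptions[h]:2d}  prior ~ {w / total:.4f}")
```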

Solomonoff founded the theory of universal inductive inference, which is based on solid philosophical foundations and has its roots in Kolmogorov complexity and algorithmic information theory.

Although he is best known for algorithmic probability and his general theory of inductive inference, he made many other important discoveries throughout his life, most of them directed toward his goal in artificial intelligence: to develop a machine that could solve hard problems using probabilistic methods.

Life history through 1964
Ray Solomonoff was born on July 25, 1926, in Cleveland, Ohio, the son of the Russian immigrants Phillip Julius and Sarah Mashman Solomonoff. He attended Glenville High School, graduating in 1944. That same year he joined the United States Navy as an instructor in electronics. From 1947 to 1951 he attended the University of Chicago, studying under professors such as Rudolf Carnap and Enrico Fermi, and graduated with an M.S. in physics in 1951.

From his earliest years he was motivated by the pure joy of mathematical discovery and by the desire to explore where no one had gone before. At the age of 16, in 1942, he began to search for a general method to solve mathematical problems.

In 1952 he met Marvin Minsky, John McCarthy and others interested in machine intelligence. In 1956 Minsky, McCarthy and others organized the Dartmouth Summer Research Conference on Artificial Intelligence, where Ray was one of the original ten participants; he and Minsky were the only ones to stay all summer. It was for this group that artificial intelligence was first named as a science. Computers at the time could solve very specific mathematical problems, but not much else. Ray wanted to pursue a bigger question: how to make machines more generally intelligent, and how computers could use probability for this purpose.

The 1964 paper appeared in two parts in the journal Information and Control.[12][13] In a letter in 2011, Marcus Hutter wrote: "Ray Solomonoff's universal probability distribution M(x) is defined as the probability that the output of a universal monotone Turing machine U starts with string x when provided with fair coin flips on the input tape. Despite this simple definition, it has truly remarkable properties, and constitutes a universal solution of the induction problem." (See also [7].)

Algorithmic probability combines several major ideas; of these, two might be considered more philosophical and two more mathematical.

The first is related to the idea of Occam's razor: the simplest theory is the best. Ray's 1960 paper states: "We shall consider a sequence of symbols to be 'simple' and have high a priori probability if there exists a very brief description of this sequence — using of course some stipulated description method. More exactly, if we use only the symbols 0 or 1 to express our description, we will assign the probability of 2^{-N} to a sequence of symbols, if its shortest possible binary description contains N digits."[11][10]

The second idea is similar to that of Epicurus: it is an expansion of the shortest-code idea; if more than one theory explains the data, keep all of the theories. Ray writes: "Equation 1 uses only the 'minimal binary description' of the sequence it analyzes. It would seem that if there are several different methods of describing a sequence, each of these methods should be given some weight in determining the probability of that sequence."[11][10] The formula he developed to give each possible explanation the right weight is

P_M(x) = \sum_{i=1}^{\infty} 2^{-|s_i(x)|}

(the probability of sequence x with respect to Turing machine M is the sum of 2^{-|s_i(x)|} over each string s_i that produces an output beginning with x).

Closely related is the third idea: the use of this distribution in a Bayesian framework. The universal prior is taken over the class of all computable measures; no hypothesis will have a zero probability. Using the lengths of all programs that could produce a particular start of the string x, Ray obtains the prior distribution for x, which is used in Bayes' rule to give accurate probabilities for what is most likely to come next as the start is extrapolated. The universal probability distribution thus functions in two ways: its sum defines the probability of a sequence, and the weights of the individual programs give a figure of merit to each program that could produce the sequence.[11][12]

The fourth idea shows that the choice of machine, while it can add a constant factor, does not change the probability ratios very much. These probabilities are machine independent; this is the invariance theorem, which is considered a foundation of algorithmic information theory.[11][13]
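The definition of M(x) can be made concrete with a small sketch. The two-instruction "machine" below is a hypothetical stand-in for a universal monotone Turing machine (its programs are not prefix-free, so the sum is not a true semimeasure; it only conveys the weighting idea). The sketch approximates M(x) by enumerating every short program, summing 2^-|p| over those whose output starts with x, and then predicting the next bit by comparing M(x0) with M(x1), as in Bayes' rule:

```python
# Toy approximation of Solomonoff's universal distribution M(x).
# "run" is a hypothetical two-instruction machine invented for this
# example; a real construction runs all programs on a universal
# monotone (prefix-free) Turing machine.
from itertools import product

def run(program, n):
    """Interpret a binary program string; return at most n output bits.
    Opcode '0': emit the remaining bits literally.
    Opcode '1': repeat the remaining bits cyclically forever."""
    if not program:
        return ""
    op, body = program[0], program[1:]
    if op == "0" or not body:
        return body[:n]
    reps = -(-n // len(body))  # ceiling division
    return (body * reps)[:n]

def M(x, max_len=12):
    """Sum 2^-|p| over every program p of length <= max_len whose
    output starts with x. With a prefix-free universal machine this
    sum is a semimeasure; here it only illustrates the weighting."""
    total = 0.0
    for L in range(1, max_len + 1):
        for bits in product("01", repeat=L):
            p = "".join(bits)
            if run(p, len(x)).startswith(x):
                total += 2.0 ** (-L)
    return total

x = "010101"
m0, m1 = M(x + "0"), M(x + "1")
print(f"M({x}0) = {m0:.6f}, M({x}1) = {m1:.6f}")
print(f"P(next = 0 | {x}) ~ {m0 / (m0 + m1):.3f}")
```

On x = 010101 the periodic continuation 0 receives most of the probability mass, because the repeating pattern has a very short description under the toy machine; this is the Occam-like behavior described above.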