Wikipedia:Reference desk/Archives/Mathematics/2011 December 19

= December 19 =

== Platonism vs formalism ==
As no doubt most of you know, in the philosophy of mathematics there is an ongoing debate between formalists and Platonists. I've often wondered whether the following has any relevance. The success of formal systems explains in part the popularity of formalism, but the axioms in any system consist of information. If the system is powerful enough to generate information theory, it must therefore relate back to the information in the axioms, insofar as it provides a mathematical foundation for the amount of information they contain. The ability to define information theory is fairly basic, I think, and at any rate is found in ZFC. It must be impossible to create two incompatible versions of information theory, for otherwise there would be two different ways of reading the information in the axioms. As a result, different systems must still produce the one theory of information. This at least appears to suggest that mathematics is already present before we define axiom systems, because information theory provides a necessary foundation for interpreting axioms in the first place. Can this put a dent in formalism, or is it going too far? IBE (talk) 17:25, 19 December 2011 (UTC)
 * I don't see why it would be impossible to create two incompatible versions of information theory. Really I don't even understand what that statement means, in any precise way.  As Wolfgang Pauli once put it (talking about something else), not only is this not right, it is not even wrong. Looie496 (talk) 17:41, 19 December 2011 (UTC)
 * A second version of information theory might state, for example, that the wordcount until now for this thread, about 270 words, could be condensed into 5 characters. Of course no such theory is possible, but that's what I'm saying. If you could come up with any other information theory that has theorems that are false as we know them, I would be interested. IBE (talk) 17:55, 19 December 2011 (UTC)
 * If you can come up with formal criteria for calling a body of statements an "information theory", and a schema for translating those statements into ordinary English (so that we can decide whether they are true), then your problem will be well-specified. Currently it is not. Looie496 (talk) 17:59, 19 December 2011 (UTC)
 * I confess I should have linked the article information theory, but I did not anticipate the objection. I thought it had clear statements mathematically speaking, and as understood in normal English. I don't know what more is required other than to note that the information contained in a random variable is defined as the expected value of the informativeness of each outcome, which itself equals -log(p); p being its probability. This gives more. Claude Shannon's 1948 paper established the discipline, and I thought until now it was sufficiently rigorous, precise and clear. IBE (talk) 18:22, 19 December 2011 (UTC)
 * If I had to wager a guess, I'd say that you are confusing what it means to say that axioms contain information (which I imagine to be an informal notion about how well they capture some system, mental or physical) with information in the sense of information theory, which has nothing to do with having meaning. Moreover, taken in the most sensible fashion possible, you would at best be talking about applying information theory to a system of axioms, which wouldn't say anything either way about philosophy (if it could say anything about anything). Phoenixia1177 (talk) 22:23, 19 December 2011 (UTC)
 * I had considered this problem, and I confess it is a genuine one. I believe the best way to show that the claim is nevertheless relevant is as follows. Basically, imagine we have one axiom for a system, that is 100 letters long. Information theory tells us how many bits it contains. Irrespective of its exact meaning, that places limits on what can be expressed in that much space. A bogus information theory, produced by a different 100 letter axiom, might say that all 100 letter axioms can be compressed into one bit. Then it would be saying that either a 1 or a 0 can express each system - for any three axioms, two would be the same. The point is that if it gave any different theory of the amount of information that can be represented in a string of characters, regardless of what those characters were, it would allow different forms of data compression, and hence, using one theory, you could condense the axioms and they would say the same thing, and using another theory, you couldn't. We can read the axioms, and do data compression on computers, so it is a fact of the real world, and presumably can only have one theory of information behind it. IBE (talk) 18:28, 20 December 2011 (UTC)
 * You're crossing the very blurred edge between science and mathematics. The idea of a "correct" information theory you are using relates to the idea of what captures information as we humans see information; I'm sure some creative author could think up a world based on some other weird type of information (I can consistently write down rules that allow for languages with 1.38 symbols, I just can't tell you what such a language could be. However, I can half imagine a world where there is some sort of property that, magically, allows such concepts to have application.) Another, better, example of this is with computers: a lot of people think of computers as abstract things that actually obey the rules of Boolean logic in their circuits, with computation theory governing how they run; however, they are really physical systems, and the correspondence is only an empirical one. Just as we could find out tomorrow that gravity isn't modeled by Riemannian geometry, so too could we find some goofy circuit that doesn't do as expected by the Boolean stuff; and we would, then, revise our ideas. Now, of course, you might argue that the difference here is that you can talk about compression, and whatnot, being applied to the axioms themselves. However, we are still talking empirically: how do you know what will happen when you compress axioms? We know, inductively, that it should behave like our abstract theory tells us; but, again, nothing forbids there being some physical representation of the axioms that just doesn't do what it should, or some set of axioms and compression that don't actually agree with the theory (this sounds far-fetched, but that doesn't mean it is mathematically impossible.)
 * You could also mean that information theory is correct not based on empirical considerations, but because it works out just right even if done abstractly. However, the problem is that any abstract theory of information using the term compression would have to satisfy basic things if we want it to, again an empirical idea, relate to what we mean by compression. I could just say that any theory of information has a set of maps called compressions that act on strings, nothing forbids this. Indeed, we could do this for every term involved, we might say that entropy is a function such that some properties hold and so forth. So, in this case, saying that our information theory is correct because it works is like saying the integers under addition is the correct group because it adds things up right; or, better, like saying the category of Sets (ZFC) is the correct topos because we can define category theory using set theory. But, that doesn't really work, at the end of the day we just have a bunch of functions with fancy names and meaning we have given to them.
 * Finally, one could also say that when you apply information theory to a system of axioms, you are really just working with corresponding strings. I'm pretty sure you could redefine language and the meaning of symbols, etc. in a way that would get you different results. Phoenixia1177 (talk) 09:41, 21 December 2011 (UTC)
 * When you say compression, are you talking about compressing the axioms down to the smallest string that generates the exact same theory? If you mean it in this sense, then this is not information theory, but Logic, and that's a different ball game altogether:-) Phoenixia1177 (talk) 09:44, 21 December 2011 (UTC)
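The two quantitative claims in the thread so far can be checked with a short sketch. It is offered only as an illustration: the function names are hypothetical and it assumes a binary alphabet for the counting part. It computes the entropy defined above as the expected value of -log(p), and it counts strings to show why no lossless scheme can shorten every string of a given length (the "for any three axioms, two would be the same" point about one-bit codes).

```python
import math
from collections import Counter


def shannon_entropy_bits(text: str) -> float:
    """Empirical entropy in bits per symbol: the expected value of
    log2(1/p), where p is the observed frequency of each symbol."""
    counts = Counter(text)
    total = len(text)
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def strings_of_length(n: int, alphabet_size: int = 2) -> int:
    """Number of distinct strings of exactly n symbols."""
    return alphabet_size ** n


def strings_shorter_than(n: int, alphabet_size: int = 2) -> int:
    """Number of distinct strings of length 0, 1, ..., n - 1."""
    return sum(alphabet_size ** k for k in range(n))


if __name__ == "__main__":
    # Two equally frequent symbols carry one bit per symbol.
    print(shannon_entropy_bits("0101"))
    # A single repeated symbol carries no information per symbol.
    print(shannon_entropy_bits("aaaa"))
    # Pigeonhole: there are fewer shorter strings than strings of length n,
    # so no lossless (injective) code can shorten all of them.
    n = 8
    print(strings_of_length(n), strings_shorter_than(n))
```

Since there are only two one-bit strings, any code that maps three different axioms to a single bit must send two of them to the same output, so it cannot be decoded losslessly. Note that this is a statement about the strings, not about what the axioms mean.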


 * Thanks very much for the help, since it gives me a chance to clarify my thoughts, and see what I'm missing. Basically the last sentence is correct, so I don't totally understand the rest of what you wrote - I think we were talking at cross-purposes there. Yes, information theory and logic are totally different areas, but I am trying to draw a connection between them. The axioms are, no matter what, expressed using information. Consequently, there are rules about how much can be expressed using a particular alphabet and a certain amount of space, so let's say it's just 1s and 0s. Then compression involves maximising entropy for a particular string, or making the bits all independent, and 50% likely to be either a 1 or a 0. If a different set of axioms resulted in a different calculation for the entropy for an expression, there would be a different amount of data that could be represented on my hard drive. Using a set corpus of text (including all the axioms we are interested in), and also using a single algorithm for compression, imagine what happens if we have two different axiom sets in the corpus, producing two incompatible theories of information. They would arrive at different descriptions of how random each bit is, or different descriptions of how much information can be stored in the compressed format. I regard this as a contradiction, although I do not know how to express it as a theorem. I know that means it is not in itself revolutionary, but I have read that mathematicians don't attempt to prove any theorems unless they are already convinced of them on other grounds (intuition or experimentation). I'm saying that such a set of axioms creating a new information theory couldn't exist, because it would affect the description of the axioms themselves. I assume compressed data to still be perfectly valid, even if a human cannot read it. I'm interested in your alternative system, with non-integer numbers of symbols. Is it your own? If not, where can I read more? If so, let me read more here, or on my talk page. IBE (talk) 18:53, 21 December 2011 (UTC)
 * NB: of course I mean lossless data compression IBE (talk) 18:55, 21 December 2011 (UTC)
 * The problem is that information theory is "really" just a collection of functions with fancy names that operate on some objects that we call "bit strings". That there are physical systems in our world that happen to correspond to these is empirical; that information theory says anything about your hard drive is a matter of physics, not of mathematics; if electromagnetism worked differently, then that correspondence, probably, wouldn't hold. When we are talking about applying information theory to axioms, we are talking about applying it to strings that can be used to represent the axioms, not the axioms themselves; information theory, if it can be said to actually talk about something conceptual at all, talks about how many alternative things could be expressed, not the content of those things. In a certain sense, what you are saying sounds almost like saying that you couldn't have a theory of geometry in which the shape "R" didn't exist if the axioms used the letter "R", but that wouldn't really be a problem. (What I just said is stupider than what you are talking about, but I think some of the essential ideas are the same; perhaps not all, so please don't take that as an insulting strawman.) Phoenixia1177 (talk) 20:19, 21 December 2011 (UTC)
 * Now, what I was saying in the last sentence of my last post does not sound like what you are talking about. I was asking if you were talking about "compressing" the axioms into the smallest collection of formulas that say the same thing. However, this would be a matter of logic for two reasons: first, the idea that two sets of axioms are the same means they entail the same things, and this is purely logical; information theory says nothing about the content of the strings. Second, there is definitely no algorithm that could do this, since there is no algorithm that can tell us if some other axiom set is equivalent to ZFC (suppose there were; then let P be some wff we want to prove. If ZFC proves P, then ZFC+P = ZFC, and conversely. Hence, given any P: check ZFC+P = ZFC; if yes, declare P a theorem; if not, check ZFC+notP = ZFC; if yes, declare notP a theorem; if not, declare P undecidable. Obviously, no such process exists.) Phoenixia1177 (talk) 20:19, 21 December 2011 (UTC)
 * If the above doesn't answer your question: are you talking about information theory in the same sense that someone might point to my above rant about algorithms and say that any axiom system that gave a different theory of computation would have to be wrong because of that? If yes, then that is interesting, but not quite right. The content of the axioms or, at least, something with formal strings that acts a lot like the content, can be discussed with computation theory; this is not true of information theory, which would be discussing the names of the axioms. But, in a larger view, the thing about computation theory wouldn't really be right either; we would either be talking about formal objects called "proofs" and "algorithms" or talking philosophically about computation and truth (when I make this distinction I am referring to the exact analogous statement I made; I think there is much, much more to discuss in the general case of such statements.) Phoenixia1177 (talk) 20:19, 21 December 2011 (UTC)
 * I would also like to point out a few random things (I'm not 100% sure I have your intended meaning pinned down, so I want to cover all of the bases.) First, there are no mathematical objects that are "information theories", hence you cannot produce axioms that give a different theory of information; just axioms that give the theory of information; so, on a superficial level, there is no way to define a wrong information theory. Second, relating to the analogy with computation theory in the last paragraph: there are different theories of computation corresponding to other ordinals; if we chose to use another one, I don't think that mathematics would fail; we just wouldn't be able to do it anymore. Third, you might want to look up Skolem's Paradox: "If set theory is consistent, it has a countable model that satisfies 'There are uncountable sets.'" Interestingly, this is not an actual problem; it is only seemingly a problem when you start to blur the line between statements about set theory and statements of set theory (I think you might be making a mistake along these lines.) Finally, my point about 1.63 symbol alphabets was just that if you muck around a bit with probability and go around changing integers to 1.63's in information theory, you could do so and avoid contradictions; the problem would be that you couldn't exhibit a 1.63 symbol alphabet, but that doesn't mean you couldn't talk about one. The point being that information theory need not refer to any "real" thing; it just happens to because we limit the expressions in a way that makes it do so (we don't allow 1.63's for the number of symbols because that seems kind of stupid to us.) Phoenixia1177 (talk) 20:19, 21 December 2011 (UTC)
 * I signed each paragraph and indented differently to try and break everything up a little better. If anything sounds terse, or rushed, I find it hard to type well in the edit box and was trying not to take up too much space to say it in. Phoenixia1177 (talk) 20:19, 21 December 2011 (UTC)
 * By the way (I promise I'll quit yammering) I just wanted to point out, since it seems related, that I am actually a Platonist; I'm just very accepting in my ontology. I don't think that there is a correct "Set Theory", but that every "set theory" we come up with, or really any axiom system, just describes some section of the objects that exist; so ZFC+CH and ZFC+~CH both refer to different parts of the universe. On that topic, while some of my above sounds formalist or like a rejection of Platonism, I am only saying that I don't think that what we physically call computation need be an instantiation of "The Theory of Computation", nor do I accept that there is only one such theory. Finally, when I mention "Formal mathematical objects", I don't mean to say that they are just meaningless strings, but that the content we ascribe to the mathematical objects (which I take as real) is not a part of the objects themselves; in other words, I mean that they are formal relative to the meaningful nonmathematical content that we ascribe to the objects. In a sense, we are talking about names of objects, objects, and an intuitive conceptual framework built by mathematicians; somewhere in all of this, I think I come off as sounding like I'm in the opposite camp. At any rate, I think some of this relates back to the original idea of there being a "right" "Information Theory". Phoenixia1177 (talk) 21:09, 21 December 2011 (UTC)
 * Thanks, that does help overall. Basically, your par 1. is very clever, with the stuff about the letter R - I'll have to have a think about it. Also I always assume good faith, especially when someone spares their time. Par 2: Now I understand you. Par 3: I think this is basically my drift. Par 4: Thanks for suggesting Skolem's paradox. Is there a good book I could read on this? I've done most of Schaum's Set Theory, which covers uncountable sets and so forth, but I haven't done mathematical logic except for its elements. I'm not too dumb (I was usually near the top of university maths units, and have won a couple of prizes in maths competitions) but I can't stand purely cerebral textbooks without questions (I almost certainly need answers as well). This seems to be very much your area, although I'll leave it to you to declare your expertise. As far as the discussion goes, I think I am just extremely resistant to detaching strings from their meaning, more so than you. I agree with most of what you say about Platonism. In fact, I can see no reason why a new claim like the axiom of choice should be proven one way or the other in ZFC, after all, a statement like "Oranges are blue" is neither true nor false in ZFC (though that might be overdoing the point), so I would think without doubt there are two "universes" for each unprovable assertion. All in all, a very interesting and helpful discussion. IBE (talk) 21:23, 21 December 2011 (UTC)
 * Thank you, I'm happy that I could be helpful:-) As for books, I am, unfortunately, not with my book collection at the moment; I won't be until after Christmas, so I only have a limited off-the-top-of-my-head list of things you might want to check out; none specifically about the Skolem Paradox, though all helpful to thinking on matters like the above. First, I would recommend Computability and Logic by George Boolos; I think it's very readable and it covers many good topics. A great book to work through for set theory stuff in general (I think, at least) is Set Theory by Thomas Jech, specifically the first part; there is a book called The Joy of Sets by Devlin that is not that bad; I'm not the biggest fan of it, but it might be worth looking at. A good book that covers a range of topics is A First Course in Logic by Hedman. I'm sure that there are numerous articles and books that I've left out (there are a few I can almost think of). When I get back to my books, I'll look through them and leave a few really good ones on your talk page if you'd like; if you give me an email address, I can send you some resources after Christmas. All of the above, by the way, are textbooks (good ones though, in my opinion); there are also a couple more informal books that would be worth the read on such topics, as well as a few books on philosophy; I'll give you their names too. Have a great holiday:-) Phoenixia1177 (talk) 05:12, 22 December 2011 (UTC)
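For reference, the reduction used earlier in the thread (that no algorithm can decide whether an axiom set is equivalent to ZFC) can be written out as below. This is only a sketch: it assumes ZFC is consistent, and the symbol \equiv (read "proves exactly the same theorems as") and the names E and D are notation introduced here, not taken from the discussion.

```latex
% Step 1: adding P changes nothing exactly when P is already a theorem.
\[
  \mathrm{ZFC} \vdash P
  \quad\Longleftrightarrow\quad
  \mathrm{ZFC} + P \;\equiv\; \mathrm{ZFC}
\]

% Step 2: a hypothetical decision procedure E for equivalence with ZFC
% would yield a decision procedure D for provability in ZFC.
\[
  D(P) =
  \begin{cases}
    \text{theorem}      & \text{if } E(\mathrm{ZFC} + P) = \text{yes},\\
    \text{refutable}    & \text{if } E(\mathrm{ZFC} + \lnot P) = \text{yes},\\
    \text{undecidable}  & \text{otherwise.}
  \end{cases}
\]

% Step 3: the set of theorems of a consistent ZFC is not decidable,
% so D cannot exist, and therefore neither can E.
```

The argument is purely about provability, which is why it belongs to logic rather than to information theory: nothing in it depends on how long the strings expressing the axioms are.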