Wikipedia:Reference desk/Archives/Mathematics/2010 March 13

= March 13 =

Maclaurin expansion
Question 6 on the homework is in two parts

(a) Obtain the first four terms of the Maclaurin expansion for ln(1+x)

(b) Use the expansion from part (a) to show that:

$$ \ln(c + x) = \ln(c) + {x \over c} - {x^2 \over 2c^2} + {x^3 \over 3c^3} - {x^4 \over 4c^4} + ...$$

Part (a) is a piece of cake: $$ \ln(1 + x) = x - {x^2 \over 2} + {x^3 \over 3} - {x^4 \over 4} + ...$$

Part (b) would also be a piece of cake if I worked it out the typical way (i.e. finding all the derivatives, powers and factorials etc.), and indeed I have already solved it that way. However, the question specifically says to use the expansion from part (a). It's possibly a mistake in the question. This is what I have so far:

$$ \ln(c + x) = \ln(1 + (x + c - 1)) = x + c - 1 - {(x + c - 1)^2 \over 2} + {(x + c - 1)^3 \over 3} - {(x + c - 1)^4 \over 4} + ...$$

I could probably expand the brackets to continue but you just know I'm never going to rearrange it to the correct form (after all, where is that ln(c) going to come from?!)--220.253.247.165 (talk) 00:45, 13 March 2010 (UTC)


 * $$c+x=c(1+x/c)$$ —Bkell (talk) 01:12, 13 March 2010 (UTC)
 * That's only helpful if the OP is allowed to use the rule for logs of products. That's always the problem with this type of question - guessing that you are and aren't allowed to take as given. --Tango (talk) 01:24, 13 March 2010 (UTC)
 * Thanks Bkell - That doesn't help the slightest bit :P Seriously, how is that useful?--220.253.247.165 (talk) 02:52, 13 March 2010 (UTC)
 * log(c+x)=log(c(1+x/c))=log(c)+log(1+x/c), you can then use your expression for log(1+x) and change all the x's to x/c's. Job done. --Tango (talk) 02:55, 13 March 2010 (UTC)
 * THANK YOU!!! Problem solved :)--220.253.247.165 (talk) 04:08, 13 March 2010 (UTC)
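For anyone checking this at home, Tango's identity ln(c + x) = ln(c) + ln(1 + x/c) can be verified numerically using only the part-(a) series (a quick sketch; the values of c and x and the truncation depth are arbitrary choices):

```python
import math

def ln_series(t, terms=50):
    # Maclaurin series for ln(1 + t), valid for |t| < 1
    return sum((-1) ** (n + 1) * t ** n / n for n in range(1, terms + 1))

c, x = 3.0, 0.5
# ln(c + x) = ln(c) + ln(1 + x/c); expand only the second term
approx = math.log(c) + ln_series(x / c)
exact = math.log(c + x)
```

Since x/c = 1/6 here, the series converges fast and `approx` agrees with `exact` to machine precision.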

Statistics - Conditioning
My stats course recently covered conditioning for finding an expectation, probability, etc., but I really don't understand it. When do we use it and why and how is it useful? Please recommend any good resources. I don't understand my prof or textbook. —Preceding unsigned comment added by 142.58.129.94 (talk) 02:36, 13 March 2010 (UTC)


 * Have you ever watched a televised poker tournament? They commonly have little cameras mounted around the edge of the table or somewhere so that the television audience can peek at the players' hands. Often there is a chart on the side of the screen listing the players and the percentage probabilities that each one will hold the best hand at the end. For example, the chart might say Jones 42%, Smith 9%, Brown 31%, Fernandez 18%, indicating that Jones has a 42% probability of holding the best hand at the end, etc. These probabilities are calculated based on the knowledge of the cards the players hold and (in a variant like Texas hold-em) the community cards that have been revealed. (The probabilities ignore such aspects of the game as bluffing; a player may hold the best hand but still lose if he decides to fold.)
 * In the hypothetical situation above with four players, before any cards are dealt or revealed, each player has a 25% probability of holding the best hand at the end. But as more knowledge is obtained about the cards, these probabilities change—if a player is dealt three aces, her probability of holding the best hand at the end will probably rise, whereas a player who is dealt a two, a six, and a nine of different suits will probably see his probability fall.
 * This is the idea of conditional probability. The probability that a person in a four-player poker game will be dealt the best hand, given no other information, is 25%. But the probability that she will be dealt the best hand, given that she has been dealt three aces, is quite a bit higher than 25%.
 * Of course, an example like poker is overly complicated. It's simpler to think about, say, the roll of two fair six-sided dice. Let's say one of the dice is white and the other is red, so that we can tell them apart. There are 36 possible rolls of the two dice; each of these rolls is equally likely. Let's shake the dice in an opaque plastic cup and then slam the cup upside-down on the table—we've rolled the dice, but we can't see what we rolled yet. What is the probability we rolled at least an 11? Well, there are three possible ways that could happen: white 5, red 6; white 6, red 5; or white 6, red 6. So the probability we rolled at least an 11 is 3/36, or 1/12. Now let's say we reveal only the white die, and we see it's a 5. Given that information, what is the probability that we rolled at least an 11? Well, there are only six possibilities for what we rolled (we know the white die was a 5, but the red die could be anything from 1 to 6). Out of those possibilities, only one (white 5, red 6) gives us at least 11. So, given the fact that the white die was a 5, the probability that we rolled at least 11 is 1/6, which is twice 1/12. Our partial knowledge about the situation changed the probability—that's conditional probability. Now suppose instead that when we revealed the white die we saw that it was a 3. Then the conditional probability that we rolled at least 11, given the fact that the white die is a 3, is 0; do you see why? —Bkell (talk) 04:16, 13 March 2010 (UTC)
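The dice example is small enough to check by brute-force enumeration (a quick sketch; `Fraction` is used just to keep the probabilities exact):

```python
from fractions import Fraction
from itertools import product

rolls = list(product(range(1, 7), repeat=2))  # (white, red): 36 equally likely rolls

def prob_at_least_11(pairs):
    # Fraction of the given equally likely rolls whose sum is at least 11
    return Fraction(sum(w + r >= 11 for w, r in pairs), len(pairs))

p = prob_at_least_11(rolls)                                 # unconditional
p_given5 = prob_at_least_11([(5, r) for r in range(1, 7)])  # white die shows 5
p_given3 = prob_at_least_11([(3, r) for r in range(1, 7)])  # white die shows 3: impossible
```

Conditioning on white = 5 restricts the sample space to six rolls, of which one works; conditioning on white = 3 leaves no roll summing to 11 or more.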

Sorry, but I already understand conditional probability. I was asking specifically about using conditioning to find expectation or probability. —Preceding unsigned comment added by 142.58.129.94 (talk) 02:36, 14 March 2010 (UTC)  —Preceding unsigned comment added by 70.68.120.162 (talk)


 * I don't understand. What do you mean by "conditioning to find probability" if you don't mean "conditional probability"?
 * If you understand the idea of conditional probability, then conditional expectation should be an easy step to take: conditional expectation is to conditional probability as expectation is to probability. For example, if I roll two fair six-sided dice, what is the expected value of their sum? It's 7. Do you understand this? Then suppose I add more information: If I roll two fair six-sided dice, one red and one white, what is the expected value of their sum, given that the white die shows 5? The conditional expectation in this case is 8.5: There are six possible rolls (the white die is 5, but the red die could be anything from 1 to 6), each equally likely; the results of these six possible rolls are 6, 7, 8, 9, 10, and 11; so the expected value is (6 + 7 + 8 + 9 + 10 + 11)/6 = 8.5.
 * If something is still unclear, please explain in more detail what you are having trouble understanding. —Bkell (talk) 02:14, 14 March 2010 (UTC)
 * I think the OP is asking about Law of total probability and Law of total expectation. In that case their usefulness is, naturally, that they allow you to calculate the probability\expectation when you know the conditional probabilities\expectations. -- Meni Rosenfeld (talk) 10:55, 14 March 2010 (UTC)
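The dice example extends directly to conditional expectation and to the law of total expectation that Meni mentions (a brute-force sketch; the variable names are ad hoc):

```python
from fractions import Fraction
from itertools import product

rolls = list(product(range(1, 7), repeat=2))  # (white, red)

def expectation(pairs):
    # Average of the sums over equally likely rolls, exact via Fraction
    return Fraction(sum(w + r for w, r in pairs), len(pairs))

e_sum = expectation(rolls)  # unconditional expected sum

# Conditional expectation of the sum given each value of the white die
e_given = {w: expectation([(w, r) for r in range(1, 7)]) for w in range(1, 7)}

# Law of total expectation: E[S] = sum over w of E[S | white = w] * P(white = w)
total = sum(e_given[w] * Fraction(1, 6) for w in range(1, 7))
```

Here `e_sum` is 7, `e_given[5]` is 8.5 as in Bkell's reply, and averaging the conditional expectations recovers the unconditional 7.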

Stats -- algebraic manipulation of V(X) and E(X), etc.
I can do simple things but can't solve them when it's really complicated. Could anyone please recommend any resources that have tips or practice problems for doing manipulations with V(X), E(X), etc.? —Preceding unsigned comment added by 142.58.129.94 (talk) 02:38, 13 March 2010 (UTC)


 * Are you talking about functions, or something else? 70.171.224.249 (talk) 16:27, 13 March 2010 (UTC)


 * Do Variance and Expected value help? --Tardis (talk) 02:30, 14 March 2010 (UTC)

Symmetric linear solve
Given $$AE+EA=B$$ with all three matrices symmetric, is there an explicit form for E? (If it and A aren't known to commute, of course.) --Tardis (talk) 03:55, 13 March 2010 (UTC)


 * You want to invert the linear endomorphism $$\scriptstyle T_AX:=AX+XA$$ defined on the linear space of all symmetric square matrices of order $$n$$ . One case where $$\scriptstyle T_A$$ is certainly invertible (both from matrices to matrices and from symmetric matrices to symmetric matrices), and where you have an inversion formula, is when A is a perturbation of the identity. Note that $$\scriptstyle T_A^n X=\sum_{j=0}^n {n \choose j}A^j XA^{n-j}$$ (because $$T_A$$ is the sum of a left multiplication operator and a right multiplication operator, and these commute, so you can apply the binomial expansion formula). From $$\scriptstyle \|T_A^n\|$$ you can compute the spectral radius of $$T_A$$ as $$r(T_A)=2r(A)$$. Therefore if $$A=I-H$$ with $$r(H)<1/2$$, you have $$T_A=2I-T_H$$ with $$r(T_H)<1$$, so $$(T_A)^{-1}=\tfrac{1}{2}\left(I-\tfrac{1}{2}T_H\right)^{-1}$$ has a geometric series expansion. This gives E as a double series, $$\scriptstyle E=\tfrac{1}{2}\sum 2^{-n}{n \choose j}H^j BH^{n-j},$$ summing over all pairs (j,n) (with 0≤j≤n).
 * PS: Also note that the spectral theory for the above $$T_A$$ is linked to that of A. For instance, if λ and μ are eigenvalues of A with eigenvectors resp. u and v, then the rank-one endomorphism u⊗v s.t. (u⊗v)x=u(v·x) is an eigenvector of $$T_A$$ with eigenvalue λ+μ. In particular, since your A is symmetric, hence diagonalizable, $$T_A$$ is also diagonalizable (because you get that way $$n^2$$ linearly independent eigenvectors), and if A has no pairs of opposite eigenvalues then $$T_A$$ is invertible.
 * In other words, if $$A=\mathrm{diag}(\lambda_i)$$ in the basis $$(e_i)$$, then $$T_A=\mathrm{diag}(\lambda_i+\lambda_j)$$ in the basis $$e_i\otimes e_j$$ (the endomorphism with matrix $$E_{ij}$$, which has an entry 1 at the place (i,j) and 0 otherwise)--pm a  10:21, 13 March 2010 (UTC)
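pm's diagonalization remark translates directly into a solver: diagonalize A = QΛQᵀ, transform B into the eigenbasis, divide entrywise by λ_i + λ_j, and transform back. A sketch with NumPy (the matrices here are randomly generated, and A is shifted to be positive definite so that no λ_i + λ_j vanishes):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4

# Random symmetric A, shifted so all eigenvalues are positive
# (this guarantees lambda_i + lambda_j != 0, i.e. T_A is invertible)
M = rng.standard_normal((n, n))
A = M + M.T + 4 * n * np.eye(n)

# Build B from a known symmetric E_true so we can verify the recovery
M = rng.standard_normal((n, n))
E_true = M + M.T
B = A @ E_true + E_true @ A

# Diagonalize A = Q diag(lam) Q^T; in the eigenbasis, AE + EA = B
# becomes (lam_i + lam_j) * E~_ij = B~_ij, solved by entrywise division
lam, Q = np.linalg.eigh(A)
E = Q @ ((Q.T @ B @ Q) / (lam[:, None] + lam[None, :])) @ Q.T
```

The recovered E is automatically symmetric, since B̃ is symmetric and the divisor matrix λ_i + λ_j is too.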

simple question.
Is there a way to find out how many prime numbers there are before a given one? Like, there are 4 prime numbers before 11.Husseinshimaljasimdini (talk) 06:45, 13 March 2010 (UTC)
 * Just counting them is one way. Prime-counting function mentions some more sophisticated methods. Algebraist 06:59, 13 March 2010 (UTC)
 * the function Algebraist cited, π(x), is called the prime-counting function. So if you want to find out how many primes there are up to a certain number, all you need to do is put it into the parentheses.  For example, if you are interested in knowing how many primes there are up to a thousand billion, you just need to put it in like this: π(1000000000000).  82.113.121.167 (talk) 11:17, 13 March 2010 (UTC)
 * Just because the function has a name doesn't mean it's easy to compute. "π(x)" is just a concise way to represent the number that we are trying to find. Rckrone (talk) 21:59, 13 March 2010 (UTC)


 * http://primes.utm.edu/nthprime/ has an online tool to compute π(x) for x < 3×1013. PrimeHunter (talk) 14:44, 13 March 2010 (UTC)
 * If you're content with a simple approximation, $$\pi(n)\sim\frac{n}{\log n}$$ will do. -- Meni Rosenfeld (talk) 20:26, 14 March 2010 (UTC)
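For small x, the "just counting them" route is perfectly practical with a sieve, and the approximation above is easy to compare against it (a sketch; the cutoffs are arbitrary choices):

```python
import math

def primes_below(n):
    # Sieve of Eratosthenes: all primes strictly less than n (n >= 2)
    sieve = [True] * n
    sieve[:2] = [False, False]
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p::p] = [False] * len(sieve[p * p::p])
    return [i for i, is_p in enumerate(sieve) if is_p]

def pi(x):
    return len(primes_below(x + 1))  # pi(x) counts primes <= x

count_before_11 = len(primes_below(11))  # the OP's example: 2, 3, 5, 7
approx = 1000 / math.log(1000)           # pi(n) ~ n / log n
```

As expected, π(1000) = 168, while n/log n gives about 145 at n = 1000; the relative error shrinks slowly as n grows.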

is there any proof that current compression couldn't be much better?
Is there anything to show that in the future, people won't have computing power coupled with algorithms that will allow them to collapse something that is a 9 megabyte compressed file today, and actually just have it as a 50 kilobyte compressed file?

I don't mean compression of white noise, of random data - obviously impossible (pigeonhole principle). I mean the kinds of data people actually use and compress.

Thank you. 82.113.121.167 (talk) 11:06, 13 March 2010 (UTC)


 * Can you be more specific about what is being compressed ? Text files ?  Digital photographs ?  Movies ?  And are we talking about lossless compression only ? StuRat (talk) 14:08, 13 March 2010 (UTC)


 * Yes. We are talking about lossless compression.  I'm talking about the most common contents of zip files, these can be programs, documents, digital photographs, whatever.  Is there anything to say that they are being compressed anywhere near the optimal level? 82.113.121.167 (talk) 14:23, 13 March 2010 (UTC)


 * There is a theoretical result that if you view the messages to be encoded as a random variables, under the correct hypotheses, then the minimum possible average compression ratio is determined by the entropy of the random variables. See Shannon's source coding theorem.


 * For specific types of messages, this entropy can be measured experimentally. For example, Shannon measured that the entropy of English is about 1 bit per letter (plus or minus). Assuming that experimental result is correct, it means that no lossless compression algorithm for English texts will be able to compress them smaller than 1 bit of output per letter of input, on average.


 * However, if the language is restricted, this could change the entropy. For example, if I only need to compress sentences that consist of the word "buffalo" repeated over and over, the entropy will be much lower than 1 bit per letter. So the best possible algorithm is only defined relative to a particular set of input messages. — Carl (CBM · talk) 14:35, 13 March 2010 (UTC)


 * Also, in general, if we consider binary files with some fixed number of k bits, there are 2^k such files, and there are
 * $$1 + 2 + 2^2 + \cdots + 2^{k-1}$$
 * smaller files (adding up the number of files of every possible size starting with 0). But
 * $$1 + 2 + 2^2 + \cdots + 2^{k-1} = 2^k - 1.$$
 * Therefore there are more files of length k than the total number of files of all lengths shorter than k. This means there is no lossless compression algorithm whatsoever that can compress every file of length k to a shorter file. This is a different sort of theoretical limitation on lossless compression algorithms for arbitrary binary data. — Carl (CBM · talk) 14:44, 13 March 2010 (UTC)

uh, thanks, but: JESUS CHRIST! I specifically said the data wasn't random (not "random variables" at all!) but instead the examples of data that we typically zip up. Moreover, when you ARE talking about "random data", it can't be compressed - on average - at ALL. It's very easy to prove. If you take data of 1 bit, then it is either 0 or 1. Now if you compress that down to less than 1 bit (i.e. 0 bits) then you are left with no alternatives - you can't differentiate a.zipped, which is supposed to uncompress to a 1, from b.zipped, which is supposed to uncompress to a 0, if they are each 0 bits long.

Moving on, for 2-bit source data, you can have the following files: 00, 01, 10 or 11. Now the only way to "compress" is to go down to 1-bit or 0-bit compressed files. There is only one possible 0-bit compressed file and two possible 1-bit compressed files (0 and 1), leaving you three compressed files, A, B, or C. But if there are only three possible results and four starting states, then obviously you can't represent all of them; one of them must compress to something LARGER than its original. And this is true all over the board: you can't compress white noise, as its sample space covers every possible source input (all bit combinations), so they must all be represented. Trying to "compress" random data just can't be done.

Note: despite the above I do have a methodology that is able to compress random data; however it is at least 75 years ahead of current science, and so I am not eager to take on the abuse that I would get on this reference desk if i were to share it. 82.113.121.167 (talk) 14:55, 13 March 2010 (UTC)


 * The first post I made was about compressing English; the second is about binary data. The theoretical limitation is that you cannot compress English documents tighter than about 1 bit per letter, on average. — Carl (CBM · talk) 15:04, 13 March 2010 (UTC)


 * The analysis in terms of random variables and information content is just that: an analysis. It addresses the question of "how many 90 kiB English text files are there"?  Certainly the upper limit is $$2^{90(1024)(8)}$$, because that's the total number of 90 kiB files.  But most of those won't look anything like English; Shannon's experimental result is that the "effective number of bits" is 1 per letter, so there are "just" $$2^{90(1024)}$$ 90 kiB English text files.  This isn't really a count, but a mathematical estimate based on concepts from random variable theory.  Given that many possible files to compress (even with a program that knew that it would never have to compress anything else), you obviously can't do much better than compressing them to about 90(128)=11520 bytes on average — on average because of course you could choose one of them to compress to 0 bytes.  --Tardis (talk) 16:23, 13 March 2010 (UTC)
 * There is a difference between "uniformly random", which as you say cannot be compressed, and the kind of randomness that Carl talked about. Anything - be it English text, source code, images - can be modeled as a stochastic process. English text for example can be seen as a sequence of words from a dictionary, where the probability for each word to occur depends on those that came before it. If you have a sufficiently accurate model for those probabilities, you can 1) Design an optimal compression scheme for documents following the model (which may be computationally intractable) and 2) Estimate the entropy of the model, giving you the expected size of the compressed file. -- Meni Rosenfeld (talk) 17:02, 13 March 2010 (UTC)
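Meni's distinction is easy to see empirically with any general-purpose compressor; zlib stands in here (the inputs and their sizes are arbitrary choices). A highly redundant "message" shrinks dramatically, while uniformly random bytes do not shrink at all:

```python
import os
import zlib

structured = b'buffalo ' * 1000   # highly redundant, low-entropy input
random_data = os.urandom(8000)    # uniformly random bytes, same length

# Compressed-to-original size ratios at maximum compression level (9)
ratio_structured = len(zlib.compress(structured, 9)) / len(structured)
ratio_random = len(zlib.compress(random_data, 9)) / len(random_data)
```

The repetitive input compresses to well under 5% of its size; the random input actually gets slightly larger, since zlib must store it uncompressed plus a small header.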

Doesn't the unsolvability of the busy beaver problem imply there will always be specific cases in which compression is possible but where you can't tell that it's possible until you figure out how to do it, and that there's no algorithm for that? (This of course relies on the Church–Turing thesis and might cease to apply if radically new concepts of algorithm were discovered.) Michael Hardy (talk) 19:36, 13 March 2010 (UTC)


 * I don't think so. Usually the way people formalize compression is by talking about an injective function from the set of strings on some finite alphabet back to the set of strings on that alphabet. Each such function can be viewed as a "decompression algorithm" in a general sense. Now for any particular string there is a decompression algorithm that returns that string when given any string of length 1 as input, but this is not particularly interesting. However, it does show that compression of a fixed string to a string of length 1 is always "possible" if you allow free choice of the compression algorithm. So if you are thinking about particular instances of the busy beaver problem as corresponding to incompressible strings, I don't think that will work. But maybe I am misreading what you said. — Carl (CBM · talk) 19:44, 13 March 2010 (UTC)


 * On the other hand, the busy beaver problem/halting problem is exactly the reason why the Kolmogorov complexity is not a computable function. Let f be a partial computable decompression function as in my previous paragraph, and let s be a string. Then K_f(s) is the smallest n such that there is a string d of length n with f(d) = s. (Usually, a "universality" assumption is made on f, which ensures among other things that K_f is a total function). For many choices of f the function K_f is not computable. Is that what you meant? (If f is taken to be total computable then we can enumerate the values of f on longer and longer strings, and so K_f would be computable). — Carl (CBM · talk) 19:54, 13 March 2010 (UTC)


 * You could think of the decompressor as a universal interpreter of fixed size, and the compressed text as a program for the interpreter. The question then becomes, what is the minimal sized program that generates the uncompressed text when run by the decompressor?  The minimal size is the Kolmogorov complexity, which is uncomputable.  So you might find a compressed text of size N, but in general you can't tell whether an even smaller one exists. You could go even further and say we don't know how much entropy there really is in the whole Universe.  Maybe the Big Bang started with a subatomic particle that could be completely described by a bunch of quantum numbers that would fit on a 3-by-5 card, and we now live in a giant Mandelbrot-like fractal that evolved deterministically from that small initial configuration, and all the weather patterns, biological organisms, galaxies, novels, sequences of coin flips or radioactive decay events, etc., all could in principle be predicted from that tiny set of parameters; and therefore, any plaintext that can occur in the real world can be compressed to a small constant size.  That, too, is a statement about Kolmogorov complexity.  There is just no way to know. 66.127.52.47 (talk) 00:02, 14 March 2010 (UTC)

L'Hopital's rule
In l'Hopital's rule, can x tend to 0, i.e., is $$ \lim_{x \to 0} \frac{f(x)}{g(x)} = \lim_{x \to 0} \frac{f'(x)}{g'(x)} $$? —Preceding unsigned comment added by 76.230.230.42 (talk) 21:36, 13 March 2010 (UTC)
 * Yes, x can tend to anything. The rule applies if the limit gives 0/0 or infinity/infinity and the derivatives all exist (and are continuous?). --Tango (talk) 21:38, 13 March 2010 (UTC)
 * Oh, and the latter limit has to exist, too. --Tango (talk) 21:40, 13 March 2010 (UTC)
 * Question - According to the article, you only need an indeterminate former limit and for the latter limit to exist. If the latter limit exists, does that mean that continuous derivatives exist? Zain Ebrahim (talk) 22:40, 13 March 2010 (UTC)
 * No, the derivatives need not exist at the point you're taking the limit to (in this case at x=0). Rckrone (talk) 22:47, 13 March 2010 (UTC)
 * They do, however, need to exist around that point (otherwise the whole thing makes no sense). --Tango (talk) 22:53, 13 March 2010 (UTC)
 * Aren't we interested in the limit of the ratio of the derivatives only? Why do we need f' and g' to exist around 0 if f'/g' exists around 0? Zain Ebrahim (talk) 23:04, 13 March 2010 (UTC)
 * If at some point either f' or g' doesn't exist, what could f'/g' possibly mean there? Rckrone (talk) 23:20, 13 March 2010 (UTC)
 * f' and g' have to exist in arbitrarily small neighborhoods of x, but not necessarily at x itself. That's the whole concept of a limit. 66.127.52.47 (talk) 23:30, 13 March 2010 (UTC)
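A numerical illustration of the 0/0 case under discussion (the choices of f, g, and the step size are arbitrary): near x = 0 the ratios f/g and f'/g' agree, both tending to the limit 1.

```python
import math

def f(x):
    return math.exp(x) - 1   # f(0) = 0

def g(x):
    return math.sin(x)       # g(0) = 0, so f/g is a 0/0 form at x = 0

def fp(x):
    return math.exp(x)       # f'

def gp(x):
    return math.cos(x)       # g'

h = 1e-6                     # evaluate just to the right of 0
ratio = f(h) / g(h)
lhopital = fp(h) / gp(h)     # l'Hopital: same limit, which here is 1
```

Note the derivatives are only ever evaluated at points near 0, never at 0 itself, matching the point made above about neighborhoods.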


 * Ah, I see it now. Thanks. Zain Ebrahim (talk) 07:48, 14 March 2010 (UTC)


 * Btw, my suggestion, as an analyst, is: forget about de l'Hopital's rule. It's an unsuccessful attempt at a theorem, one that still survives for some unclear, unmathematical reason. Learn the Landau notation instead. --pm a 22:37, 13 March 2010 (UTC)
 * I agree. I don't think I have ever used l'Hopital's rule outside of questions on l'Hopital's rule. There are almost always better ways to calculate limits. I think l'Hopital's rule is still taught because a lot of people prefer to just memorise algorithms than actually learn about a subject. If you actually understand what limits are and how they work, then you don't need l'Hopital's rule. --Tango (talk) 22:53, 13 March 2010 (UTC)
 * Yeah, it enjoyed some popularity in the old days, when it was successfully employed as a quick tool to kick students out of examinations. But today we have the opposite problem, and we try everything to let them survive. So de l'Hopital's rule is going to go into the attic together with moustache nets.--pm a 23:22, 13 March 2010 (UTC)

Approximations to roots
For large n, x^(1/n) ~= 1 + log(x)/n. What is the logical way of continuing the expression on the right-hand side to higher-order terms (and correspondingly greater accuracy)? 86.136.194.148 (talk) 22:42, 13 March 2010 (UTC).
 * If x>0 is fixed and you let n diverge, since $$x^{1/n}=\exp(\log(x)/n)$$, you have all the asymptotics you want from the exponential series, together with bounds on the remainder. For instance,
 * $$x^{1/n}=1+\frac{\log x}{n}+\frac{(\log x)^2}{2n^2}+o(n^{-2}).$$--pm a 23:02, 13 March 2010 (UTC)
 * Thanks pma. Do you mean "... $$+\,o(n^{-2})$$" or should that be "... $$+\,o(n^{-3})$$"? Do you know what the next few terms should be? Is there any pattern? 86.136.194.148 (talk) 00:14, 14 March 2010 (UTC). Don't worry, I see how to do it now!
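pma's expansion is easy to check numerically (the values of x and n below are arbitrary choices); adding the second-order term visibly tightens the approximation:

```python
import math

x, n = 5.0, 100
exact = x ** (1.0 / n)

# Truncations of exp(log(x)/n) = 1 + log(x)/n + log(x)^2/(2 n^2) + ...
first = 1 + math.log(x) / n
second = first + math.log(x) ** 2 / (2 * n ** 2)
```

Each extra term of the exponential series gains roughly a factor of log(x)/n in accuracy, which is the pattern the questioner was after.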