Wikipedia:Reference desk/Archives/Mathematics/2009 August 2

= August 2 =

Image of a set of measure 0
Royden has a problem, number 5.18, which says:

Let g be an absolutely continuous monotone function on [0, 1] and E a set of measure zero. Then g[E] has measure zero.

The article Luzin N property says that every absolutely continuous function f on [a, b] has the Luzin N property on [a, b]. This means that for $$N \subset [a, b]$$, m(N) = 0 implies m(f(N)) = 0.

So, it's true for any absolutely continuous function, not just monotone ones? And, is the more general version much harder to prove?

Thanks StatisticsMan (talk) 01:33, 2 August 2009 (UTC)


 * Unless I'm doing something wrong, I think it follows pretty quickly from absolute continuity alone. To show that f(E) has measure less than ε, choose a δ that satisfies the absolute continuity property for ε.  Since E has measure zero, there exists an open cover A of E consisting of disjoint intervals where the sum of the lengths is less than δ.  So then f(A) has measure less than ε and contains f(E). Rckrone (talk) 16:27, 2 August 2009 (UTC)


 * Oh I guess where monotone comes in is that for an interval I = [x,y], m(f(I)) could be larger than |f(y)-f(x)| for f that's not monotone. I don't think it's much of a problem, since you can probably find a way to break up intervals like that so that that doesn't happen.  Alternatively, f can be decomposed into the sum of two monotone absolutely continuous functions. Rckrone (talk) 18:02, 2 August 2009 (UTC)


 * Yes; or also you may rephrase, clearly equivalently, the definition of absolute continuity this way:
 * A function f: I → R is absolutely continuous on I if for every positive number $$\varepsilon$$, there is a positive number $$\delta$$ such that whenever a (finite or infinite) sequence of pairwise disjoint sub-intervals [xk, yk] of I satisfies
 * $$\scriptstyle \sum_{k} \left| y_k - x_k \right| \leq \delta$$
 * then
 * $$\scriptstyle\sum_{k} \mathrm{osc}(f, [x_k, y_k] ) \leq \varepsilon$$,
 * (where $$\scriptstyle\mathrm{osc}(f, S )$$ denotes the oscillation of f on the set S, that is $$\scriptstyle\sup_S f -\inf_S f$$).--pma (talk) 19:22, 2 August 2009 (UTC)
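The oscillation form of the definition can be explored numerically. This is a minimal sketch that assumes f is continuous, so sampling a fine uniform grid (endpoints included) approximates sup - inf on each interval. The approximation is exact for monotone f and only a lower bound in general. The helper names `oscillation` and `osc_sum` and the test functions are illustrative choices, not part of the discussion. The x² example illustrates the point raised above about non-monotone f: on [-1, 1] the endpoint difference is 0, but the oscillation is 1.

```python
import math

def oscillation(f, a, b, samples=1000):
    """Approximate osc(f, [a, b]) = sup f - inf f on [a, b] by sampling a
    uniform grid that includes both endpoints (exact for monotone f,
    a lower bound for a general continuous f)."""
    values = [f(a + (b - a) * i / samples) for i in range(samples + 1)]
    return max(values) - min(values)

def osc_sum(f, intervals, samples=1000):
    """Sum of the (approximate) oscillations of f over finitely many intervals."""
    return sum(oscillation(f, a, b, samples) for a, b in intervals)

def square(x):
    return x * x

# Non-monotone f: |f(1) - f(-1)| = 0, but the oscillation on [-1, 1] is 1.
print(abs(square(1.0) - square(-1.0)), oscillation(square, -1.0, 1.0))

# sqrt is absolutely continuous on [0, 1]: shrinking the total length of the
# intervals drives the oscillation sum to 0 (on [0, h] the oscillation is sqrt(h)).
for h in (1e-2, 1e-4, 1e-6):
    print(h, osc_sum(math.sqrt, [(0.0, h)]))
```

For sqrt, getting an oscillation sum below ε requires intervals of total length of order ε², so δ depends on ε but still exists.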


 * I thought about the fact that an absolutely continuous function is the sum of two monotone functions. I did not think about the fact that they would be absolutely continuous.  But here it is in one of my books.  So, assume f is not monotone. Then f = f_1 - f_2, where f_1 and f_2 are monotone and absolutely continuous.  The result on monotone functions shows that f_1[E] and f_2[E] have measure 0.  Then f[E] is certainly a subset of $$f_1[E] - f_2[E] = \{x - y : x \in f_1[E],\ y \in f_2[E]\}$$.  But is f_1[E] - f_2[E] also of measure 0?  If so, that proves it.  If not, ??? StatisticsMan (talk) 18:36, 2 August 2009 (UTC)
 * Note that for null subsets N and N' of [0,1], N+N' may have positive measure. Take N := all reals in [0,1] that admit a ternary expansion using only the digits 0 and 1, and N' := all reals in [0,1] that admit a ternary expansion using only the digits 0 and 2. Each x in [0,1] is expressible as y+y' with y in N and y' in N', because each ternary digit of x splits as 0 = 0+0, 1 = 1+0, 2 = 0+2. At the moment I do not see any simplification in decomposing a function as f=f1-f2.--pma (talk) 18:53, 2 August 2009 (UTC)
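pma's decomposition can be checked mechanically. The sketch below assumes a finite ternary expansion; an infinite expansion splits digit by digit in exactly the same way. Each digit of x is separated into a digit of y (only 0 and 1) and a digit of y' (only 0 and 2). Since each pair of digits sums to the original digit, no carries occur, and x = y + y' exactly. The names `split_ternary` and `value` are illustrative.

```python
from fractions import Fraction

def split_ternary(digits):
    """Split the ternary digits of x = 0.d1 d2 d3 ... (base 3) into digits of
    y (using only 0 and 1) and y' (using only 0 and 2) with x = y + y'."""
    y_digits, y2_digits = [], []
    for d in digits:
        if d not in (0, 1, 2):
            raise ValueError("ternary digits must be 0, 1 or 2")
        # 0 = 0 + 0, 1 = 1 + 0, 2 = 0 + 2: no carries are ever produced.
        y_digits.append(1 if d == 1 else 0)
        y2_digits.append(2 if d == 2 else 0)
    return y_digits, y2_digits

def value(digits):
    """Exact value of the finite ternary expansion 0.d1 d2 ... dn."""
    return sum(Fraction(d, 3 ** (k + 1)) for k, d in enumerate(digits))

x_digits = [2, 1, 0, 2, 1, 1]
y, y2 = split_ternary(x_digits)
assert value(y) + value(y2) == value(x_digits)
print(y, y2)
```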


 * (ec) Your proof is correct; you do not have to bother about monotonicity, because it never enters. Notice also that for continuous functions this property actually characterizes absolute continuity. Indeed, for a continuous BV function g:[a,b]→R the following are equivalent:
 * i. g absolutely continuous;
 * ii. for any $$\scriptstyle N\subset [a,b]$$ with measure zero, g(N) has measure zero;
 * iii. for any ε>0 there exists δ>0 such that for any (Lebesgue) measurable $$\scriptstyle M\subset [a,b]$$, if M has measure less than δ, then g(M) has (exterior) measure less than ε;
 * iv. for any measurable $$\scriptstyle M\subset [a,b]$$ g(M) is measurable.
 * By (iv), the exterior measure in (iii) is actually the measure. --pma (talk) 18:40, 2 August 2009 (UTC)


 * (ec) @StatisticsMan: For any interval I = [a,b], for all x in I, f(x) - f(a) ≤ f_1(x) - f_1(a) ≤ f_1(b) - f_1(a) and similarly f(a) - f(x) ≤ f_2(x) - f_2(a) ≤ f_2(b) - f_2(a), so m(f(I)) ≤ |f_1(b) - f_1(a)| + |f_2(b) - f_2(a)|. In other words f can't increase in the interval more than f_1 does or decrease more than f_2 increases, which puts bounds on the measure of f(I). Rckrone (talk) 18:48, 2 August 2009 (UTC)
 * Good; so if g := f1+f2, you have |f(B)| ≤ |g(B)| for every interval B, which reduces the proof to the case of a monotone g. Still, I do not see why monotonicity should make the proof any simpler (your first argument is correct and monotonicity does not enter...)--pma (talk) 19:04, 2 August 2009 (UTC)
 * The reason I brought it up is that my book defines absolute continuity as follows: for any ε there exists δ such that $$\sum{|f(b_k)-f(a_k)|} < \epsilon$$ for any disjoint intervals (a_k, b_k) with $$\sum{(b_k-a_k)} < \delta$$. This is also how the article Absolute continuity defines it. With other definitions, monotonicity might not be necessary. Rckrone (talk) 19:14, 2 August 2009 (UTC)
 * Perfect (I put above a variant of the definition, which it is not necessary to bother about monotonicity in the proof (is this sentence grammatically correct?). Note that the dependence of δ on ε in the modified definition is exactly the same as in the standard one). --pma (talk) 19:24, 2 August 2009 (UTC)
 * Yeah that makes sense. I probably should've mentioned before what definition I was going by. (it would work if you said "with which" instead of "which") Rckrone (talk) 20:06, 2 August 2009 (UTC)
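Rckrone's bound can be sanity-checked numerically. This minimal sketch rests on stated assumptions. First, f is sampled on a grid. Second, f_1 and f_2 are taken to be the discrete positive and negative variations, with f_1 starting at f(0). This is a discrete analogue of the Jordan decomposition and only one possible choice of monotone pair: f = f_1 - f_2, with both nondecreasing. The function f(x) = sin(6x), the grid size, and the sampled intervals are arbitrary. On every sampled interval, the check confirms that the spread of f is at most the combined increase of f_1 and f_2, which is the increase of g = f_1 + f_2.

```python
import math

def jordan_decomposition(values):
    """Split samples f(x_0), ..., f(x_n) into nondecreasing sequences f1, f2
    with f1[i] - f2[i] == values[i] (up to rounding): f1 starts at f(x_0) and
    adds every upward step, f2 starts at 0 and adds every downward step."""
    f1, f2 = [values[0]], [0.0]
    for prev, cur in zip(values, values[1:]):
        step = cur - prev
        f1.append(f1[-1] + max(step, 0.0))
        f2.append(f2[-1] + max(-step, 0.0))
    return f1, f2

n = 600
xs = [i / n for i in range(n + 1)]
fs = [math.sin(6 * x) for x in xs]
f1, f2 = jordan_decomposition(fs)

# On each sampled grid interval [x_a, x_b], the spread of f is bounded by the
# increase of f1 plus the increase of f2, i.e. by the increase of g = f1 + f2.
for a in range(0, n + 1, 37):
    for b in range(a, n + 1, 23):
        window = fs[a:b + 1]
        spread = max(window) - min(window)
        assert spread <= (f1[b] - f1[a]) + (f2[b] - f2[a]) + 1e-12
print("bound holds on all sampled intervals")
```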


 * I read this all and will read it again in a bit. Quickly though, I see you say that for a continuous function, absolute continuity is equivalent to the Lusin N property.  However, the article on absolute continuity says it is equivalent to the Lusin N property and bounded variation.  "f is absolutely continuous if and only if it is continuous, is of bounded variation and has the Luzin N property."  Right?  Are you saying it would be fine to say f is absolutely continuous if and only if it is continuous and has the Luzin N property? StatisticsMan (talk) 22:33, 2 August 2009 (UTC)


 * Okay, I think I understand all of this. For the definition Rckrone is using, which is the same one I am using, monotonicity does matter. The solution is to consider g = f_1 + f_2 and note that $$m(f(B)) \leq m(g(B))$$ for intervals B.  Then g is monotone, and the result holds for g, so it holds for f as well.  On the other hand, if we use the alternate definition that you gave, pma (the one involving oscillation), then the same proof works and we don't need monotonicity.  Is that all right?  And you say this definition is clearly equivalent :)  Let's see if it is to me.  First, your definition immediately implies "my" definition, since my definition involves a sum that is even smaller than the one involving oscillation.  I don't immediately see the other direction.  But it does seem like it would be true.


 * Also, is exterior measure also called outer measure? I have heard of the latter but not the former. StatisticsMan (talk) 01:29, 3 August 2009 (UTC)


 * Yes: "exterior measure" is less used, but the same as "outer measure". However, for a continuous function f:[a,b]→R the LNP implies that f(M) is measurable for all measurable M, so its outer measure is just the measure.
 * "BV and LNP implies AC" is false: if f is a characteristic function of a sub-interval of [a,b] it's BV, discontinuous and maps everything into the null set {0,1}. But you are right, I forgot "BV" above (fixed).
 * The definition with "osc" follows at once from the standard definition as Rckrone wrote it: in that situation not only
 * $$\scriptstyle \sum{|f(b_k)-f(a_k)|} < \epsilon$$,
 * but in fact for all $$\scriptstyle a_k\leq a'_k\leq b'_k\leq b_k$$ also
 * $$\scriptstyle \sum{|f(b'_k)-f(a'_k)|} < \epsilon$$
 * so that
 * $$\scriptstyle \sum_k{\mathrm{osc}(f, [a_k,b_k])}=\sup\big\{\sum_k{|f(b'_k)-f(a'_k)|}: a'_k\; \mathrm{ and }\; b'_k \;\mathrm{such\;that}\; a_k\leq a'_k\leq b'_k\leq b_k\big\}    \leq \epsilon$$
 * --pma (talk) 05:56, 3 August 2009 (UTC)

Normal distribution and chi-square test project - Please help
Hello people. As part of my QT (Quantitative Techniques) course, I was required to do a practical exercise of my choice. Before doing it, I had to get the idea approved by the teacher. The teacher has now approved my idea, but I have no clue how to go about the exercise. It involves the normal probability distribution and the chi-square test, but I hadn't read up on either of them when I submitted the idea. Please read my idea below (which I submitted to the teacher and got approved) and tell me how I can go about it. In particular, I don't know how to convert the discrete values of the leaf lengths to the continuous distribution of the normal curve.

"The project involves making measurements on the length of the leaves of Ashoka trees planted in JLT. I shall randomly pluck leaves from Ashoka trees and measure the length of the leaves from the base to the tip. I shall make such measurements for about 200 leaves. After collecting this data I shall plot a graph of the distribution of leaves. According to my expectations, the frequency distribution of the lengths of the leaves should be a normal distribution. I shall use chi square test to gauge the deviation of the distribution from a normal distribution and to determine whether the deviation can be attributed to chance or not. In case it is not, I shall try to find a hypothesis for the deviation from a normal distribution."

Many thanks -- ReluctantPhilosopher (talk) 13:20, 2 August 2009 (UTC)


 * The lengths of leaves aren't discrete values -- they are continuous. Depending on the scale of your ruler and the lengths of the leaves, your measurements of the lengths will be more or less discrete. For example, if the average leaf is 5 inches and your scale is in quarters-of-an-inch, then your measurements will look more discrete than continuous. If your scale is in millimeters, your measurements will look more continuous than discrete. Wikiant (talk) 14:51, 2 August 2009 (UTC)


 * Er, my question is: how do I calculate the deviation of the observations from the expected normal distribution? Thanks. ReluctantPhilosopher (talk) 15:22, 2 August 2009 (UTC)


 * Try the Jarque-Bera test. Wikiant (talk) 15:37, 2 August 2009 (UTC)


 * Aw thank you so much! That's the kind of thing I was looking for. The article says that it can be further used in a chi-square test also so that means I can stay within the ambit of my "proposal". Thanks again ReluctantPhilosopher (talk) 15:46, 2 August 2009 (UTC)
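For reference, here is a minimal sketch of the Jarque-Bera statistic, JB = n/6 · (S² + (K - 3)²/4). S and K are the sample skewness and kurtosis, computed from the biased central moments. Under normality, JB is asymptotically chi-squared with 2 degrees of freedom, which is the link to the chi-square test mentioned above. For 2 degrees of freedom the upper tail probability is exactly exp(-x/2), so no statistics library is needed. The p-value is only asymptotic, so treat it as rough for small samples. The function name `jarque_bera` and the variable `lengths` below are hypothetical.

```python
import math

def jarque_bera(data):
    """Return (JB statistic, approximate p-value) for a list of observations.

    JB = n/6 * (S**2 + (K - 3)**2 / 4), with S the sample skewness and K the
    sample kurtosis from the (biased) central moments. Under normality JB is
    asymptotically chi-squared with 2 degrees of freedom, whose upper tail
    probability is exp(-x / 2)."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    if m2 == 0:
        raise ValueError("data must not be constant")
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)
    return jb, math.exp(-jb / 2)
```

If the measured leaf lengths are in a list `lengths`, then `jb, p = jarque_bera(lengths)` gives the statistic and an approximate p-value. A small p-value (say, below 0.05) suggests a departure from normality.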


 * A follow up question - Is it possible to directly apply the chi-square test? ReluctantPhilosopher (talk) 17:30, 2 August 2009 (UTC)


 * I'm not sure what you mean by "chi-square test." Every statistical test requires (1) that you construct a test statistic, and (2) that you compare the test statistic to a known distribution. In that sense, any test that uses the chi-squared distribution in step (2) is a "chi-square test." Wikiant (talk) 18:36, 2 August 2009 (UTC)
 * I apologize for my lack of knowledge about the subject. I have yet to study what the chi-squared test is, and I only have a vague idea what it is used for. Thanks for helping --ReluctantPhilosopher (talk) 16:31, 3 August 2009 (UTC)

Note that the domain of a normal distribution is the entire real axis, while the length of a leaf is always a positive number. So take the logarithm of the lengths before testing the data for normality (the logarithm maps the positive semiaxis onto the real axis). Note also the difficulty in defining what it means to randomly pluck a leaf (when is a leaf too small to be called a leaf?). See multiset and cumulant for defining the deviation of the distribution from a normal distribution. Bo Jacoby (talk) 21:14, 2 August 2009 (UTC).
 * Those are very good insights. I'll keep those in mind, thanks! --ReluctantPhilosopher (talk) 16:31, 3 August 2009 (UTC)

I imagine you are referring to Pearson's chi square goodness-of-fit test. Whenever I hear "chi-square test", this is the first thing that comes to mind anyhow. I can't give any input as to whether or not it is better (or in what circumstances it might be) than the above-mentioned Jarque-Bera test, though it is presumably what you meant (?) when you told your teacher this was your plan. With Pearson's test there is the problem of choosing the number of "bins" when testing for a continuous distribution, and the Wiki page doesn't seem to give any indication of how to do so; you might search elsewhere for guidelines. You might also check out the Kolmogorov-Smirnov test or the Anderson-Darling test. Nm420 (talk) 20:34, 4 August 2009 (UTC)
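Here is a minimal sketch of Pearson's goodness-of-fit test for normality. It makes several choices that are not requirements. The mean and standard deviation are estimated from the raw data. The classes are `bins` equiprobable intervals under the fitted normal, which sidesteps the question of where to put the bin edges. The number of bins still has to be chosen; a common rule of thumb is to keep every expected count at least 5. Because two parameters are estimated, the statistic is compared with a chi-squared table with bins - 3 degrees of freedom. Estimating them from the ungrouped data makes this only approximate. The optional `log_scale` flag applies Bo Jacoby's suggestion of testing the logarithms of the lengths. The function name is hypothetical. `statistics.NormalDist` and `statistics.fmean` need Python 3.8 or later.

```python
import bisect
import math
from statistics import NormalDist, fmean, stdev

def chi_square_normality(data, bins=10, log_scale=False):
    """Pearson chi-square goodness-of-fit statistic against a fitted normal.

    Uses `bins` classes that are equiprobable under the fitted normal, so each
    class has expected count len(data) / bins. Returns (statistic, degrees of
    freedom), subtracting one degree for the total and two for the estimated
    mean and standard deviation."""
    if log_scale:
        data = [math.log(x) for x in data]  # requires every value to be > 0
    n = len(data)
    fitted = NormalDist(fmean(data), stdev(data))
    # Interior class boundaries: the 1/bins, 2/bins, ... quantiles of the fit.
    edges = [fitted.inv_cdf(i / bins) for i in range(1, bins)]
    observed = [0] * bins
    for x in data:
        observed[bisect.bisect_right(edges, x)] += 1
    expected = n / bins
    statistic = sum((o - expected) ** 2 / expected for o in observed)
    return statistic, bins - 3
```

For 200 leaves, `chi_square_normality(lengths, bins=10)` gives each class an expected count of 20. At the 5% level, a statistic above the tabulated critical value for 7 degrees of freedom (about 14.07) would count as a significant deviation.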

0 expected value of a bet = impossible to make money (guaranteed)? why?
If bets are independent and have 0 expected return, does that mean it is IMPOSSIBLE using any strategy to be guaranteed to make money through a series of such bets? What is the proof of this impossibility? (Note: I'm thinking of betting strategies like "martingale"). I'm trying to understand why not just martingale, but any betting strategy is doomed mathematically, so that if it works anywhere, then this is only in places not subject to math. —Preceding unsigned comment added by 82.234.207.120 (talk) 14:18, 2 August 2009 (UTC)


 * The expected value of a sum of random variables is the sum of the expected values of the variables (that's a basic property of expected values), so if each bet has expected value 0 the expected value of lots of such bets will also be 0. The best you can do is guarantee to get exactly 0 (by hedging your bets precisely), if there is any chance of a positive result then there needs to be a chance of a negative result to balance it. (Note, in real life a bet will generally have an expected value less than zero, so you cannot even guarantee to break even.) --Tango (talk) 14:31, 2 August 2009 (UTC)
 * That's not actually quite true: a sum of randomly many 0-expectation variables can have positive expectation. For example, if you have infinite funds and unlimited time, and a house is offering fair bets at even odds, then the strategy "continue betting until you've won a million dollars" has an expected value of a million dollars, even though each bet has expected value zero. The real result is the optional stopping theorem, which requires some additional assumptions. Algebraist 17:05, 2 August 2009 (UTC)
 * How about if I don't want a million dollars under those conditions, but infinite dollars? Would the strategy "starting at 1 and moving on to the next integer each time you net an additional $1, continue betting until you have netted a win for every positive integer" have infinite expected value? 82.234.207.120 (talk) 23:53, 2 August 2009 (UTC)
 * No. It takes an infinite time to carry out, and your wealth after an infinite sequence of plays is not well-defined. Algebraist 00:03, 3 August 2009 (UTC)
 * Surely the expected value is indeterminate. For the strategy "continue betting until you've won $a or lost $b", the expected gain after any number of steps is zero (even taking into account that you stop upon reaching one of the bounds). Your strategy is the case a = 10^6 and b → ∞, but every strategy with finite b has an expected gain of zero. There's some kind of ∞/∞ thing going on. -- BenRG (talk) 00:43, 3 August 2009 (UTC)
 * No, it's perfectly determinate. The probability that I will be a million dollars up at some finite time is one, so the expected gain is a million dollars. It doesn't work if you have a finite amount of money to lose, or a limited amount of time to play, but I said that already. Algebraist 01:08, 3 August 2009 (UTC)
 * I see what you mean, but my argument seems valid too. Your expected winnings at any given time are zero. I think there are two incompatible definitions of "expected gain". It reminds me of the balls and vase problem for some reason. -- BenRG (talk) 10:57, 3 August 2009 (UTC)
 * Yes, your expected winnings at any fixed time are zero, but at a random time they can be nonzero, as I said to start with. Since typical betting strategies involve playing for a random time, that's the relevant case. Algebraist 12:15, 3 August 2009 (UTC)
 * Typical betting strategies usually have a fixed amount of money to lose. Taemyr (talk) 12:33, 3 August 2009 (UTC)
 * The only surefire way of making money by betting is to be better (oops, pardon the pun) at determining the odds than the house. I know of at least one person who made a living this way. --ReluctantPhilosopher (talk) 15:31, 2 August 2009 (UTC)
 * Or just better than the other people betting. For something like horse racing, where there is no certain way to determine the odds (unlike roulette, for example), the house just sets the odds based on how people are betting (the initial odds will be based on the actual race, I suppose, but they don't last long). If you know better than everyone else you can, in theory, make money. (You have to know quite a lot better in order to compensate for the profit margin the house adds on to the odds.) --Tango (talk) 15:36, 2 August 2009 (UTC)
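BenRG's and Algebraist's points can both be seen in an exact computation. The sketch below assumes fair $1 bets and the strategy "stop once the net gain reaches +a or -b". It propagates the exact distribution of the net gain with `fractions.Fraction`. At every fixed number of steps the expected gain is exactly 0, which is BenRG's observation. Meanwhile, the probability of having stopped at +a approaches b/(a + b). That limit tends to 1 as b grows, which is the unlimited-funds case Algebraist describes. The function names and the values a = 3, b = 5 are illustrative.

```python
from fractions import Fraction

def fair_walk_until_bounds(a, b, steps):
    """Exact distribution of the net gain after `steps` fair $1 bets, where
    betting stops as soon as the net gain reaches +a or -b.

    Returns a dict mapping net gain to probability."""
    dist = {0: Fraction(1)}
    for _ in range(steps):
        new_dist = {}
        for gain, p in dist.items():
            if gain in (a, -b):
                # Already stopped: the gain stays where it is.
                new_dist[gain] = new_dist.get(gain, 0) + p
                continue
            for move in (1, -1):
                # Win or lose $1, each with probability 1/2.
                new_dist[gain + move] = new_dist.get(gain + move, 0) + p / 2
        dist = new_dist
    return dist

def expected_gain(dist):
    return sum(gain * p for gain, p in dist.items())

a, b = 3, 5
for steps in (1, 10, 100):
    # Zero at every fixed time, even though play stops at the bounds.
    print(steps, expected_gain(fair_walk_until_bounds(a, b, steps)))

# Probability of having stopped at +a; it approaches b / (a + b) = 0.625.
print(float(fair_walk_until_bounds(a, b, 200)[a]))
```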


 * There are cases where adding up infinitely many events, each with probability 0, gives a non-zero probability. For example, suppose I am choosing a truly random number from 0 to 1.  For any specific value x, the chance that I'll choose x is 0. Yet the chance of my choosing some number between 0 and 1 is equal to 1, even though one might expect it to be the sum of these individual probabilities.  If you bet that I would pick the number x, your expected return would be zero, but if you bet on all the numbers from 0 to 1/2, you would have 50/50 odds.  See continuous probability distribution. Rckrone (talk) 17:03, 2 August 2009 (UTC)


 * You are adding up an uncountably infinite number of events, which I'm not sure is even meaningful to say (although I guess you could add them up by enumerating them with the ordinal numbers). A uniform probability distribution between 0 and 1 is actually a measure on the real numbers between 0 and 1. Measures are, by definition, countably additive, which means that a countably infinite number of 0-probability events cannot possibly add up to a positive value. --COVIZAPIBETEFOKY (talk) 20:34, 2 August 2009 (UTC)


 * The set of outcomes need not be uncountable. Suppose I choose a random rational number between 0 and 1.  The result is the same. Rckrone (talk) 21:08, 2 August 2009 (UTC)
 * There is no uniform probability distribution on the rationals between 0 and 1, unless you take the unusual step of not requiring countable additivity. Algebraist 21:24, 2 August 2009 (UTC)
 * Ok point taken, but what if I choose a real number between 0 and 1 at random and you bet on a specific Vitali set? What is your probability of winning? Rckrone (talk) 16:23, 4 August 2009 (UTC)
 * It must be undefined. I imagine any attempt to determine it empirically would just be unstable - you can only do finitely many trials (countably infinitely many in the limit) and we're working with uncountable sets, so that's perfectly plausible. --Tango (talk) 16:41, 4 August 2009 (UTC)
 * What do you mean when you say it would be unstable if you test it empirically? If you did N trials, would you ever win?  I would think the answer is no. Rckrone (talk) 17:37, 4 August 2009 (UTC)
 * Vitali sets are unmeasurable, as I'm sure you're well aware. If you're talking about what would actually happen in practice, then (a) it's impossible to specify a specific Vitali set and (b) determining whether a randomly chosen number lies in a specific Vitali set requires knowing that number precisely, i.e. knowing infinite amounts of information. Algebraist 17:46, 4 August 2009 (UTC)
 * I mean that if you did more and more trials, the experimental probability wouldn't stabilise. When you calculate a probability empirically, you do a load of trials and calculate successes/total trials. You then do more trials, calculate it again, and repeat until the answer stops changing at the level of precision you want. For something without a well-defined frequentist probability, the answer would never stop changing at any level of precision. Of course, Algebraist's point makes mine moot: you couldn't even carry out a trial. --Tango (talk) 18:25, 4 August 2009 (UTC)
 * Doesn't a "martingale" strategy actually work here? I'm thinking of something like tossing a coin and either losing or winning the amount bet, which has expectation 0 with constant equal bets. But doesn't the simple 'double the bet' method, applied each time, allow the player to win?
 * Is that what you were thinking of? In this case it looks like it is possible to win with enough funds. (I may have made a mistake.) 83.100.250.79 (talk) 22:08, 2 August 2009 (UTC) So there is no proof of impossibility, but there is a proof that a doubling bet method can win, if you want to see it. 83.100.250.79 (talk) 22:10, 2 August 2009 (UTC)
 * You are right, but "enough funds" turns out to mean infinite funds. With any finite amount, the chance of losing everything is just large enough to balance the large probability of winning a tiny amount of money. You end up with an expectation of breaking even, just as if you had bet everything you have on a single toss of the coin. --Tango (talk) 22:59, 2 August 2009 (UTC)
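Tango's point can be checked exactly. The sketch below assumes a fair coin, even-money payoffs, and a first bet of 1 that doubles after each loss. It also assumes a bankroll that covers exactly k consecutive losses (1 + 2 + ... + 2^(k-1) = 2^k - 1). Under these assumptions, it enumerates the outcomes of one run of the strategy. Every early win nets exactly 1, but the rare run of k losses costs 2^k - 1. The expected gain comes out as exactly 0 for every k. The function name is illustrative.

```python
from fractions import Fraction

def doubling_strategy_outcomes(max_losses):
    """Enumerate the outcomes of one run of the doubling ('martingale') strategy
    on a fair coin: bet 1, double after each loss, and stop at the first win or
    after `max_losses` straight losses (when the bankroll is exhausted).

    Returns a list of (net gain, probability) pairs."""
    outcomes = []
    lost_so_far = 0
    for round_index in range(max_losses):
        bet = 2 ** round_index
        # First win on this round: all earlier bets lost, this bet won.
        outcomes.append((bet - lost_so_far, Fraction(1, 2 ** (round_index + 1))))
        lost_so_far += bet
    # Every bet lost: the whole bankroll 2**max_losses - 1 is gone.
    outcomes.append((-lost_so_far, Fraction(1, 2 ** max_losses)))
    return outcomes

for k in (1, 5, 20):
    outcomes = doubling_strategy_outcomes(k)
    expected = sum(gain * p for gain, p in outcomes)
    p_win = 1 - outcomes[-1][1]
    print(f"k={k}: P(net +1) = {float(p_win):.7f}, expected gain = {expected}")
```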