Wikipedia:Reference desk/Archives/Mathematics/2017 September 19

= September 19 =

Measure Theory
I am taking a course in statistical theory and am having some issues with measure theory. Admittedly these are homework questions (they're from the book Theoretical Statistics by Keener), but I'm not looking for answers, really, but perhaps some clarification on notation and whether my thought processes are correct so far. Thanks in advance!

Question 1: $$\mu$$ is a measure on subsets of the natural numbers where $$\mu(\{n,n+1,\ldots\}) = \dfrac{n}{2^n}$$, for $$n=1,2,\ldots$$. I'm asked to compute $$\int{x\,d\mu(x)}$$. Since this is a counting measure, would this simply be $$\sum_{n=1}^\infty\dfrac{n}{2^n}$$? I think having the $$x$$ in the integrand is throwing me off.

Question 2: I'm given that, for a set $$B \subset \mathbb{N}$$, $$\mu(B) = \lim_{n \to \infty} \dfrac{\#[B \cap \{1,\ldots,n\}]}{n}$$. I'm assuming the "pound sign" notation refers to the size of the set, correct? So it's asking me to find various measures, one of which is $$\mu(E)$$, where $$E$$ is the set of all even numbers. I can see that, for $$n$$ odd, the ratio is $$\dfrac{n-1}{2n}$$, and for $$n$$ even, it exactly equals $$\dfrac{1}{2}$$. I also see that $$\lim_{n \to \infty} \dfrac{n-1}{2n} = \dfrac{1}{2}$$. Since the limit exists, would this be $$\mu(E)$$, and can I apply the same sort of process to other such sets (e.g., the primes, the perfect squares)?


 * A few hints and such. For #1, you need to be a bit more careful.  $$\mu$$ isn't given directly – but only by its value on certain subsets of $$\mathbb{N}.$$  But you can use properties of measures to recover it without much hassle.  And it's not counting measure; that's the measure m with m({n}) = 1 for all singletons.  But you're right that you can evaluate it as a sum; exactly how to do that might depend on what you're supposed to know at this point though, so I'm not 100% sure how you should proceed exactly.
 * For #2, I think you've basically got it, although it might be kind of a pain to try to come up with exact expressions for every value of n. Still, you should be able to reason out a limit in either of those other 2 cases.  --Deacon Vorbis (talk) 02:13, 19 September 2017 (UTC)
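
The limiting ratio in Question 2 can be explored numerically before proving anything. The following is only a rough sketch: the helper names (`count_in_set`, `fraction_in_set`, `is_even`, `is_square`, `is_prime`) are made up for illustration, and a finite cutoff can suggest a limit but never establish one.

```python
from math import isqrt

def count_in_set(indicator, n):
    """Return #[B ∩ {1, ..., n}] for the set B described by `indicator`."""
    return sum(1 for k in range(1, n + 1) if indicator(k))

def fraction_in_set(indicator, n):
    """Return the ratio #[B ∩ {1, ..., n}] / n for a finite cutoff n."""
    return count_in_set(indicator, n) / n

def is_even(k):
    return k % 2 == 0

def is_square(k):
    r = isqrt(k)
    return r * r == k

def is_prime(k):
    # Plain trial division; fine for the small cutoffs used here.
    if k < 2:
        return False
    return all(k % d != 0 for d in range(2, isqrt(k) + 1))

if __name__ == "__main__":
    for n in (10, 1_000, 100_000):
        print(n,
              fraction_in_set(is_even, n),
              fraction_in_set(is_square, n),
              fraction_in_set(is_prime, n))
```

For the evens the ratio settles near 1/2, while for the squares and the primes it keeps shrinking as the cutoff grows, the primes much more slowly.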


 * For Question 1, I'm pretty sure the sum given is the correct value even if the reason given isn't. On question 2, this is known as the Natural density. It should be easy to show that the natural density of the squares is 0, but the article quotes the Prime Number Theorem (a fairly deep result) to compute the natural density of the primes. --RDBury (talk) 02:22, 19 September 2017 (UTC)
 * The sum isn't right though; you really need to figure out μ's value on singletons and then use that along with the fact that there's a factor of x in the integrand. Also, I would assume that something like quoting the Prime Number Theorem is fair game.  But that's stronger than you need anyway.  You can show that the number of primes up to n is no more than about n(1 - 1/2)(1 - 1/3)(1 - 1/5)...(1 - 1/p_k) for p_k < n, and that this (divided by n) tends to 0 as n goes to ∞.  See Euler product and Riemann zeta function for a bit more detail.  --Deacon Vorbis (talk) 02:39, 19 September 2017 (UTC)
 * On 1, my thinking is that for any measure on N,
 * $$\int{x\,d\mu(x)}=\mu(\{1\})+2\mu(\{2\})+\dots=\mu(\{1, 2, 3, \dots\})+\mu(\{2, 3, \dots\})+\dots$$
 * which turns out to be the given sum in this case. The second equation follows by expanding out the left-hand side, rearranging, and applying countable additivity. On 2, it did seem like using the PNT was like using a sledgehammer to swat a fly, and I like your argument better. The OP did mention, though, that it was a course on statistical theory, which I assume would not have number theory as a prerequisite. So while the squares are straightforward, the primes aren't unless you use a strong result like that, and even if it is fair game to use, it wouldn't be fair to assume someone taking that course would be familiar with it. --RDBury (talk) 03:23, 19 September 2017 (UTC)
 * Oh yeah, that's clever. I guess the boring way I had would have given the same answer had I bothered to crank out the sum, but that's definitely nicer.  --Deacon Vorbis (talk) 03:30, 19 September 2017 (UTC)
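
Both routes to Question 1 discussed above can be compared with exact rational arithmetic. This is a sketch only, with made-up function names: the singleton values are recovered as differences of consecutive tails, $$\mu(\{n\}) = \mu(\{n, n+1, \ldots\}) - \mu(\{n+1, n+2, \ldots\})$$, and both sums are truncated at a finite `N`, so they approximate the integral rather than compute it.

```python
from fractions import Fraction

def tail(n):
    """mu({n, n+1, ...}) = n / 2^n, as given in the question."""
    return Fraction(n, 2 ** n)

def singleton(n):
    """mu({n}), recovered by additivity as a difference of two tails."""
    return tail(n) - tail(n + 1)

def integral_via_singletons(N):
    """Partial sum of n * mu({n}) for n = 1..N (the 'boring' route)."""
    return sum(n * singleton(n) for n in range(1, N + 1))

def integral_via_tails(N):
    """Partial sum of mu({n, n+1, ...}) for n = 1..N (the tail-sum route)."""
    return sum(tail(n) for n in range(1, N + 1))

if __name__ == "__main__":
    for N in (5, 20, 60):
        print(N, float(integral_via_singletons(N)), float(integral_via_tails(N)))
```

The two partial sums differ by exactly `N * tail(N + 1)`, which tends to 0, so they approach the same limit.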

OP here... the insight is appreciated! Thanks for clarifying my confusion on Question 1; it makes quite a bit of sense now, and RDBury, that is definitely an elegant approach. (I'm shocked I got the right answer for the wrong reason, but am glad to know the proper angle now.) I feel much better about Question 2 as well. Thank you again. 2600:387:A:9:0:0:0:61 (talk) 04:24, 19 September 2017 (UTC)

Tripling odds
If three things each have a 17% chance of happening, how likely is it all three happen? InedibleHulk (talk) 01:17, 19 September 2017 (UTC)

If their occurrence is independent of each other then 0.17 * 0.17 * 0.17 110.22.20.252 (talk) 01:59, 19 September 2017 (UTC)

Otherwise it is Pr(A) * Pr(B|A) * Pr(C|A,B) = 0.17 * Pr(B|A) * Pr(C|A,B) 110.22.20.252 (talk) 02:01, 19 September 2017 (UTC)
 * Fortunately, it's not otherwise. Simpler than I'd hoped. Thanks! InedibleHulk (talk) 02:09, 19 September 2017 (UTC)
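
The two formulas above amount to a few lines of arithmetic. In this sketch the conditional probabilities passed to the chain-rule version are purely hypothetical values chosen to show the other extreme, not numbers taken from the question.

```python
def all_three_independent(p):
    """P(A and B and C) when the three events are independent, each with probability p."""
    return p ** 3

def all_three_chain(p_a, p_b_given_a, p_c_given_ab):
    """General case: Pr(A) * Pr(B|A) * Pr(C|A,B)."""
    return p_a * p_b_given_a * p_c_given_ab

independent = all_three_independent(0.17)

# Hypothetical extreme: if A happening guarantees B and C,
# the joint probability is just Pr(A).
fully_dependent = all_three_chain(0.17, 1.0, 1.0)

if __name__ == "__main__":
    print(independent, fully_dependent)
```

Under independence the answer is a little under half a percent; under full dependence it stays at 17%. Anything in between is possible depending on the conditional probabilities.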


 * Or wait, no. It depends. Maybe. If a boy is 10 at some point in 1954, 15 at some other in 1959 and 17 at another in 1961, what are the odds (without knowing anything else) that all of these points occur in November or December? InedibleHulk (talk) 02:20, 19 September 2017 (UTC)


 * He would have to have reached age 10 between November 1 and December 22 (resulting in his reaching 17 by December 31). Thus he would have to have been born in 1944 between October 20 and December 10. Thus there are 52 admissible birth dates. Since 1944 (the year when he must have been born) had 366 possible birth dates, the probability is 52/366 = 26/183. Loraof (talk) 02:46, 19 September 2017 (UTC)


 * Why couldn't he turn 10 between December 22 and 31? InedibleHulk (talk) 03:41, 19 September 2017 (UTC)


 * Sorry. Striking my post. Too convoluted to explain the nonsense that I had in mind. I'll think about it some more. Loraof (talk) 17:32, 19 September 2017 (UTC)
 * Let's see if I understand the question correctly: Given that a boy was 10 at some point in 1954, 15 at some point in 1959 and 17 at some point in 1961, what is the probability that he was 10 on some day in November or December of 1954, 15 on some day in November or December of 1959 and 17 on some day in November or December of 1961?
 * If that's correct, then most of the question is redundant. If the boy was 10 at some day in 1954, he's guaranteed to be 15 at some day in 1959.  Similarly, if he's 10 at some day in November or December of 1954, he's guaranteed to be 15 at some day in November or December of 1959.
 * So the question really comes down to: given that the boy was 10 on some day in 1954, what is the probability that he was 10 on some day in November or December of 1954? To be 10 on some day in 1954, he would have to be born on some day from Jan 2, 1943 to Dec 31, 1944.  Since 1944 was a leap year, that gives us 730 possible days.  To be 10 on some day in November or December of 1954 requires being born on some day from Nov 2, 1943 to Dec 31, 1944.  Again, because of the leap year, that's 426 days (60 in 1943 plus 366 in 1944).  Assuming an a priori uniform distribution on possible birthdays (not true, but eh), that gives us 426/730.--2601:245:C500:A5D6:418F:C405:A22F:8942 (talk) 22:40, 19 September 2017 (UTC)
 * I interpret the question differently. Pick a random day in each of the years 1954, 1959, 1961. Ask how old a boy was on those dates. If the answer is 10, 15, 17, then what is the chance that all three days were in November or December? 61/365 is near 17% so that matches the original question better. PrimeHunter (talk) 14:17, 20 September 2017 (UTC)
 * Unless there's some chance of the boy dying, this really reduces to the probability that 3 randomly chosen days are all in November or December: the boy's age may be the motivation for the question, but the only random element is the choice of date for each year. Ignoring leap years, that's 61/365 for each of the three dates, so approx. .17 * .17 * .17. OldTimeNESter (talk) 21:48, 20 September 2017 (UTC)
 * No, the three events are not independent in my scenario. The ages tell us that all three days are on the same side of the birthday. That increases the chance they are close together. PrimeHunter (talk) 23:17, 20 September 2017 (UTC)
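
Inclusive date ranges like the ones in the birth-date count above are easy to get off by one, so it may be worth letting the standard library do the counting. This sketch takes the endpoints from that post (Jan 2, 1943, Nov 2, 1943 and Dec 31, 1944) and assumes, as the post does, that every birth date is equally likely; the names are illustrative.

```python
from datetime import date
from fractions import Fraction

def days_inclusive(first, last):
    """Number of calendar days from `first` to `last`, counting both endpoints."""
    return (last - first).days + 1

# Birth dates for which the boy is 10 on at least one day of 1954.
all_births = days_inclusive(date(1943, 1, 2), date(1944, 12, 31))

# Birth dates for which he is 10 on at least one day of Nov/Dec 1954.
late_births = days_inclusive(date(1943, 11, 2), date(1944, 12, 31))

# Assumption: uniform prior over the admissible birth dates.
probability = Fraction(late_births, all_births)

if __name__ == "__main__":
    print(all_births, late_births, float(probability))
```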


 * I think that's close enough to my situation. The points aren't quite random, but they're unspecified. The boy (now a man) is telling me he did three things in three years at three ages, but doesn't mention months or seasons. Through comparing his story to known facts, I'm almost certain two of those three things must've happened after October 14 and 18, so the actual likelihood of all occurring after November 1 is more like 82%. But without knowing that, I think now that each thing had a 17% (16.66...) chance of happening after November 1, and if one thing does, they all must.
 * Since there's no practical use in me knowing the hypotheticals for sure, I'm OK with believing this, especially if the absolute truth is complicated (and it seems it might be). Thanks for everyone's help. InedibleHulk (talk) 22:02, 20 September 2017 (UTC)
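
PrimeHunter's scenario can be worked out exactly under some simplifying assumptions, all of which are mine rather than from the thread: leap years are ignored, the birthday is uniform over 365 days, birth years 1943 and 1944 are equally likely a priori, and the three days are picked independently and uniformly. The ages 10, 15 and 17 then force all three days onto the same side of the birthday, which is where the dependence comes from.

```python
from fractions import Fraction

DAYS = 365   # assumption: ignore leap years
NOV1 = 305   # day-of-year of November 1 in a non-leap year

def prob_all_late_given_ages():
    """P(all three days fall in Nov/Dec | ages come out as 10, 15, 17)."""
    match = Fraction(0)       # weight of "ages are 10, 15, 17"
    match_late = Fraction(0)  # ... and all three days are in Nov/Dec
    for b in range(1, DAYS + 1):          # b = birthday as day-of-year
        # Born 1944: every chosen day must be on or after the birthday.
        after = DAYS - b + 1
        late_after = DAYS - max(b, NOV1) + 1
        # Born 1943: every chosen day must be before the birthday.
        before = b - 1
        late_before = max(0, b - NOV1)
        match += Fraction(after ** 3 + before ** 3, DAYS ** 3)
        match_late += Fraction(late_after ** 3 + late_before ** 3, DAYS ** 3)
    # The uniform priors on birthday and birth year cancel in the ratio.
    return match_late / match

independent_guess = Fraction(61, DAYS) ** 3
conditional = prob_all_late_given_ages()

if __name__ == "__main__":
    print(float(independent_guess), float(conditional))
```

Under these assumptions the conditional probability comes out larger than the independent product (61/365)^3, consistent with the remark that the age information makes the three days more likely to be close together, though it is nowhere near 17%.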

Type-I and Type-II Errors in The Presence of Autocorrelation
Scheffé (1959) shows that type-I errors increase in the presence of autocorrelated (serially dependent) data sets. In one simulation I regress time series on a set of ordinal numbers, which should be meaningless, yet the regression coefficient is statistically significant about 20% of the time, which I interpret as a type-I error. In another simulation, I split time series by pre-intervention and post-intervention, resulting in 7 times as many statistically significant changes in an interrupted time series analysis versus those detected by paired samples t-tests, which I interpret to be type-II errors. The two simulations seem to disagree with each other, while the first agrees with Scheffé (1959). Am I missing something? Can someone shed some light on the matter for me? Schyler ( exquirere bonum ipsum ) 14:05, 19 September 2017 (UTC)


 * Fixing the redlink in the OP's post. Loraof (talk) 17:38, 19 September 2017 (UTC)


 * In general, I'm not even clear on what is confusing you: why should any of this different (and only abstractly described) stuff be the same? Is there some reason you'd expect these two different simulations and different methods to "agree"? And if so, what specifically do you expect them to agree on? Maybe it's obvious to you because you're close to it, but I feel like you're leaving out a lot of potentially important context. For starters, time-series analysis is something I turn to for understanding observational data about the real (non-simulated) world. It can help us conjecture underlying causes, etc. But, if I want to understand the results of a (possibly very complicated) simulation scheme, I start by analyzing how it works, not by throwing that out and doing stats on the output. All the underlying causes are right there in the code, right? I say this not to criticize what you're doing, but to point out how much we are missing in this description.
 * I think we might be able to offer a bit more insight and appropriate references if you can give more detail on what this is all about. What is the nature of simulations A and B? What exactly are the two different methods used on B? I know what you mean by the paired sample t-test, but not what you mean by interrupted time series analysis, as that is a whole field of related methods, not a specific method. SemanticMantis (talk) 21:14, 19 September 2017 (UTC)

What type of regression are you doing? A standard ARIMA model will handle autocorrelation (but check for stationarity), while ordinary least squares may indeed give spurious results, because it assumes the error terms are uncorrelated. OldTimeNESter (talk) 22:05, 20 September 2017 (UTC)
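
The point about ordinary least squares can be illustrated with a small simulation. This is not the OP's setup, which isn't described in enough detail to reproduce. It is a sketch under my own assumptions: AR(1) series with coefficient `rho` started at zero, a regression on the time index 1..n, and the usual OLS t-statistic compared against 1.96. All function names are made up.

```python
import math
import random

def ar1_series(n, rho, rng):
    """Generate x_t = rho * x_{t-1} + e_t with standard normal e_t, starting from 0."""
    x, out = 0.0, []
    for _ in range(n):
        x = rho * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out

def slope_t_stat(y):
    """OLS t-statistic for the slope when regressing y on the time index 1..n."""
    n = len(y)
    t = range(1, n + 1)
    t_mean = (n + 1) / 2
    y_mean = sum(y) / n
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    rss = sum((yi - intercept - slope * ti) ** 2 for ti, yi in zip(t, y))
    std_err = math.sqrt(rss / (n - 2) / sxx)  # assumes uncorrelated errors
    return slope / std_err

def rejection_rate(rho, n=100, trials=400, seed=0):
    """Share of simulated series whose (truly zero) trend looks 'significant'."""
    rng = random.Random(seed)
    hits = sum(abs(slope_t_stat(ar1_series(n, rho, rng))) > 1.96
               for _ in range(trials))
    return hits / trials

if __name__ == "__main__":
    for rho in (0.0, 0.5, 0.8):
        print(rho, rejection_rate(rho))
```

With `rho = 0` the rejection rate should sit near the nominal 5%. With positive `rho` it climbs well above that, because the OLS standard error understates the true variability of the slope when the errors are serially correlated.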