Wikipedia:Reference desk/Archives/Mathematics/2013 October 20

= October 20 =

How to test the coefficients of an equation (multiple regression analysis)
I used multiple (linear) regression analysis to find the coefficients of an equation (y = a + bx + cx'), and now I want to check whether the results are correct. I have a data set for y, x, and x', and I now have values for a, b, c, but these are only estimates, so there are always some differences when I plug a, b, c back into the equation. How do I check if my estimates of a,b,c are correct? Thank you 109.129.182.135 (talk) 12:39, 20 October 2013 (UTC)


 * Well, you could calculate the sum of the squares of the deviations, or use some other method, to tell how good it is. However, this doesn't tell you how good it is relative to other possible values for a, b, and c.  This requires so much analysis, that it really needs a computer program.  So, here are my suggestions:


 * 1) Plot the equation and the data points, and visually decide if it looks good.


 * 2) Use some different multiple regression programs, and see how close those answers are.


 * 3) Vary the params, like increasing the number of iterations, to see how close those answers are. If everything yields approximately the same results, then chances are it's a good one.


 * However, annoyingly for mathematicians, there really is no "correct" and "incorrect" here, only "better" and "worse". StuRat (talk) 12:59, 20 October 2013 (UTC)


 * Thank you very much. Is there always a best estimation of a,b,c or there can be several equally acceptable results? Is it possible that when I use different statistic programs I get different results for a,b,c (given the same data set for x,x',y)? 109.129.182.135 (talk) 13:10, 20 October 2013 (UTC)


 * There can be multiple equivalent estimates given the same data and method, although this isn't likely. But, if you change the method slightly, you can get different results, as well.  Even different programs which supposedly use the same method might have some slight difference in implementations which causes them to give slightly different results.  So, you have to settle for answers which are "good enough".  StuRat (talk) 13:39, 20 October 2013 (UTC)


 * I should also back off a bit and list an assumption of mine: that you have more data points than the degree of your equation is guaranteed to fit exactly. This is usually the case.  A line is guaranteed to fit only 2 points exactly, for example, and each time you add a degree to the equation, you can exactly fit one more point.  However, with too high of a degree you get a "lumpy" curve, since each additional degree beyond 2 also introduces the possibility of an additional inflection point. StuRat (talk) 14:21, 20 October 2013 (UTC)


 * The OP asks: "How do I check if my estimates of a,b,c are correct?" There are two things that this question might be intended to mean. (1) How do I check if the estimates really minimize the sum of squared residuals? -- If you entered the data correctly and used the correct commands in the software package, the estimates will be calculated correctly according to the formulas, unless you have a severe multicollinearity problem. (2) How do I check if my parameter estimates equal the true values of the parameters? -- There's absolutely no way to check this -- that's why we have to estimate the parameters.


 * Regarding the OP's follow-up question "Is there always a best estimation of a,b,c or there can be several equally acceptable results?" -- there are various different ways of obtaining coefficient estimates, and each has its advantages and disadvantages. See Robust regression. Duoduoduo (talk) 15:14, 20 October 2013 (UTC)
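
 * A minimal sketch of check (1) above, in plain Python. At a least-squares solution the residuals sum to zero and are orthogonal to each regressor (the normal equations), so plugging the estimates back in and computing those three sums is a direct test. The data set and the function names below are made up for illustration; they are not the OP's data.

```python
def residuals(a, b, c, xs, x2s, ys):
    """Residuals y - (a + b*x + c*x') for every data point."""
    return [y - (a + b * x + c * x2) for x, x2, y in zip(xs, x2s, ys)]


def normal_equation_sums(a, b, c, xs, x2s, ys):
    """The three sums that must all vanish at the least-squares estimates."""
    res = residuals(a, b, c, xs, x2s, ys)
    return (
        sum(res),
        sum(r * x for r, x in zip(res, xs)),
        sum(r * x2 for r, x2 in zip(res, x2s)),
    )


def sum_of_squares(a, b, c, xs, x2s, ys):
    """Sum of squared residuals for the candidate coefficients."""
    return sum(r * r for r in residuals(a, b, c, xs, x2s, ys))


# Hypothetical data, constructed so that the least-squares fit is a=2, b=3, c=0.5.
xs = [-1.0, 1.0, -1.0, 1.0]
x2s = [-1.0, -1.0, 1.0, 1.0]
ys = [-0.5, 3.5, -1.5, 6.5]

fitted = (2.0, 3.0, 0.5)
print(normal_equation_sums(*fitted, xs, x2s, ys))   # all three should be ~0
print(sum_of_squares(*fitted, xs, x2s, ys))         # minimum sum of squares
print(sum_of_squares(2.1, 3.0, 0.5, xs, x2s, ys))   # a nudged estimate fits worse
```

 * If the three sums are far from zero for the coefficients a program reports, the data were probably entered differently from what was intended. This says nothing about meaning (2), the closeness to the true parameters.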

Optimal allocation strategy
A thought experiment that I was having raised some questions which I do not know how to resolve, and for which I would appreciate help.

Part 1:

Consider the following game: You start off with C (indivisible) coins, and there is a silver cup in front of you. Before each round, you can put as many of your coins in the silver cup as you want (you are allowed to leave the cup empty if you so desire). Then, in each round, there is a 60% probability that the coins in the silver cup double, and a 40% probability that the coins in the silver cup all disappear. If you are going to play N rounds of this game, what is your optimal allocation strategy over the course of the entire game to maximize the number of coins you have at the end of the game?

My problem with this question is that it seems to raise a contradiction which I'm not sure how to resolve. The expected value for putting coins in the cup is positive, and clearly maximized for each round by putting as many coins in the cup as possible, which suggests that you should put ALL your coins in the cup. But also clearly, if you do that, you will likely go bust before the N rounds are over. So, what then should your strategy be?

Part 2:

Now, consider the following additional complication to the game: Instead of just a silver cup, there is now also a second, golden cup, which behaves exactly like the silver cup except that when the coins in it increase, they triple instead of double (the probabilities of gain/loss are still 60%/40%, and the two cup payouts are separate, independent events). You are allowed to split your coins between the two cups as you want (you are allowed to leave either or both cups empty if you so desire). Again, if you are going to play N rounds of this game, what is your optimal allocation strategy over the course of the entire game to maximize the number of coins you have at the end of the game?

This additional cup raises a second contradiction for me which I'm also not sure how to resolve. The golden cup has a higher expected value, which would suggest that to maximize your expected payout, if you are going to be putting any coins at all in cups, you should put them all in the golden cup and leave no coins in the silver cup. But, intuitively, my instinct is that it is instead better to spread the coins across the two cups to diversify the risk -- but if so, what then should the relative coin allocation ratios between the two cups be? And what would be the optimal allocation strategy over the course of the entire game?

—SeekingAnswers (reply) 15:01, 20 October 2013 (UTC)


 * I think that it depends on your precise criterion. If you want to maximise the EV of the number of coins at the end, yes, put all in the silver cup at each stage in the first case and all in the gold one in the second. If you want to guarantee that the number at the end is as high as possible, don't put any in the cup(s) at any stage. Anything else will require some assessment of the risk you will accept, which will depend on the number of rounds. 31.54.112.70 (talk) 15:21, 20 October 2013 (UTC)


 * Yes, if you put all the coins in the cup and keep trying to double or triple it, you end up with a progressively smaller chance of having anything, but what you would have if you do have anything gets progressively larger, and at a greater rate. So, the average amount would go up with each iteration, while the chance of having zero also goes up.  In the two cup game, I'd say use the golden cup only.  The interesting question is, if you have some realistic goal, like "I need 10x as much money to pay my rent", how would you be most likely to achieve that?  StuRat (talk) 16:26, 20 October 2013 (UTC)


 * See Kelly criterion for this general question. This is standard for dealing with volatility in shares. Dmcq (talk) 16:57, 20 October 2013 (UTC)
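
 * The tension described above (rising average, rising chance of ruin) can be made concrete with a small sketch for the one-cup game. It assumes divisible coins and a strategy that stakes the same fraction of current wealth every round; the function name, the starting wealth of 10, and the 20% comparison fraction are illustrative choices, not part of the original question.

```python
from math import comb


def final_wealth_summary(start, fraction, rounds, p_win=0.6, payout=2.0):
    """Return (mean, median) of final wealth for a fixed-fraction strategy.

    Final wealth depends only on the number of winning rounds, which is
    binomially distributed, so the distribution can be listed exactly.
    """
    up = 1 + fraction * (payout - 1)   # growth factor after a win
    down = 1 - fraction                # growth factor after a loss
    outcomes = []                      # (final wealth, probability)
    for wins in range(rounds + 1):
        prob = comb(rounds, wins) * p_win**wins * (1 - p_win) ** (rounds - wins)
        wealth = start * up**wins * down ** (rounds - wins)
        outcomes.append((wealth, prob))

    mean = sum(wealth * prob for wealth, prob in outcomes)

    # Median: smallest wealth whose cumulative probability reaches 1/2.
    outcomes.sort()
    cumulative = 0.0
    median = outcomes[-1][0]
    for wealth, prob in outcomes:
        cumulative += prob
        if cumulative >= 0.5:
            median = wealth
            break
    return mean, median


print(final_wealth_summary(10, 1.0, 10))  # stake everything each round
print(final_wealth_summary(10, 0.2, 10))  # stake 20% each round
```

 * Staking everything gives the larger mean but a median of zero, while the partial stake gives a smaller mean and a median above the starting wealth, which is the trade-off the Kelly criterion addresses.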


 * This problem is extremely complicated by the indivisibility of the coins. It is also very complicated if the problem means to prohibit removing coins that have accumulated in a previous stage. I will assume away both of these prohibitions. I'll discuss the Part 1 problem, but the Part 2 problem works analogously.


 * You care about final wealth $$W_N$$, you start with initial wealth $$W_0$$, and your stochastic return (the amount the portfolio grows or shrinks to in a given period i) is $$R_i$$. $$R_i$$ depends on the portfolio allocation (the fraction $$w_i$$ of current wealth in the cup, rather than out of the cup, in a given period) and the stochastic return $$r_i$$ on the cup (the amount that a dollar in the cup grows or shrinks to in a given period). So:


 * $$W_N=W_0R_1R_2 \cdots R_N$$


 * where
 * $$R_i=w_ir_i+(1-w_i)\cdot 1$$


 * Assume risk aversion. Then we maximize the expected value of a concave utility function. Paul Samuelson showed long ago that exactly one utility function in intertemporal problems like this causes it to be optimal to make one period's decision independently of the outcomes of the previous periods' investment: namely the log utility function of Bernoulli: Maximize E ln($$W_N$$). This independence-of-decisions result holds even if the probability of success in one period is dependent on the actual stochastic outcomes in previous periods. For any other utility function you have to nontrivially use dynamic programming.


 * So take the log of $$W_N$$ above, substitute in for $$R_i$$ for each i, and take the expected value of $$\ln W_N$$. You can see that the terms containing the choice shares $$w_i$$ are additively separate, giving rise to Samuelson's result of intertemporal independence of optimal decisions. Take the expectations using Prob($$r_i$$=2) (a doubling) = $$p_i$$ and Prob($$r_i$$=0) (a wiping out of the contents of the cup) = $$1-p_i$$. Maximize with respect to $$w_i$$ and you get $$w_i=2p_i-1.$$


 * So there's your answer under these assumptions: $$w_i=2p_i-1.$$ Note that it obeys a no-borrowing constraint $$w_i \le 1$$, with equality when $$p_i=1$$, and if $$p_i \ge 1/2$$ it obeys a constraint that the share of current wealth put into the cup be at least 0. Duoduoduo (talk) 18:18, 20 October 2013 (UTC)
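
 * The maximization step can be spelled out as follows; this is a sketch under the same assumptions (divisible coins, log utility, a success doubling the cup's contents). Since $$R_i=1+w_i$$ after a doubling and $$R_i=1-w_i$$ after a wipe-out:

 * $$E \ln W_N=\ln W_0+\sum_{i=1}^{N}\left[p_i\ln(1+w_i)+(1-p_i)\ln(1-w_i)\right]$$

 * Setting the derivative with respect to $$w_i$$ equal to zero:

 * $$\frac{p_i}{1+w_i}-\frac{1-p_i}{1-w_i}=0 \;\Longrightarrow\; p_i(1-w_i)=(1-p_i)(1+w_i) \;\Longrightarrow\; w_i=2p_i-1$$

 * With the 60% probability from the original question this gives $$w_i=0.2$$, i.e. one fifth of current wealth in the cup each round.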


 * Clarification: The indivisibility of the coins is a deliberate/intentional part of the problem. However, there is no requirement that you maintain coins earlier placed in the cups between rounds (you are freely allowed to remove coins from cups). —SeekingAnswers (reply) 22:25, 20 October 2013 (UTC)


 * More generally, if we redo the above letting the success-value of $$r_i$$ be the potentially time-varying parameter $$f_i$$ instead of specifically 2 (e.g. if a success triples whatever is in the cup, then $$f_i=3$$, etc.) then the optimal share (fraction) of current wealth to place in the cup at any time is $$w_i=\frac{p_if_i-1}{f_i-1}$$. Note that (1) this is increasing in the probability $$p_i$$ and increasing in the payout factor $$f_i;$$ (2) the limit of $$w_i$$ as the cup's return $$f_i$$ goes to infinity is $$p_i;$$ (3) as $$p_i$$ goes to 1 then $$w_i$$ goes to 1, and (4) as $$p_i$$ goes to $$1/f_i$$ then the optimal share $$w_i$$ goes to 0. Duoduoduo (talk) 20:21, 20 October 2013 (UTC)
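
 * As a numerical sanity check of the general formula, the sketch below compares it with a brute-force search over the one-period objective $$p\ln(1+w(f-1))+(1-p)\ln(1-w)$$. The function names and the grid resolution are arbitrary choices made for this illustration.

```python
from math import log


def expected_log_growth(w, p, f):
    """One-period E[ln R] when a share w of wealth is put in the cup."""
    return p * log(1 + w * (f - 1)) + (1 - p) * log(1 - w)


def closed_form_share(p, f):
    """The share given by the formula w = (p*f - 1) / (f - 1)."""
    return (p * f - 1) / (f - 1)


def grid_search_share(p, f, steps=1000):
    """Best share on a grid 0, 1/steps, ..., (steps-1)/steps (w < 1 keeps ln finite)."""
    candidates = [k / steps for k in range(steps)]
    return max(candidates, key=lambda w: expected_log_growth(w, p, f))


for payout in (2, 3):
    print(payout, closed_form_share(0.6, payout), grid_search_share(0.6, payout))
```

 * For the silver cup (f = 2) and the gold cup (f = 3), each considered on its own, the two columns should agree to within the grid spacing.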


 * Does that apply to only the most favorable cup, or does it apply to all cups? That is, using the numbers in the original question:


 * (0.6 x 3 - 1) / (3 - 1) = 40% of coins in the golden cup and no coins in the silver cup
 * 40% of coins in the golden cup and (0.6 x 2 - 1) / (2 - 1) = 20% of coins in the silver cup


 * ...which of the above two are you saying is optimal for each round?


 * —SeekingAnswers (reply) 23:17, 20 October 2013 (UTC)


 * Neither. I only did the math for the one-cup case. You would have to redo it with more terms in the expression for $$R_i$$ in the case of two cups, and you would have to have two choice variables being chosen simultaneously -- the share put into the silver cup, and the share put into the gold cup.
 * Incidentally, in the presence of an indivisibility constraint, if the w implied by the above formula does not lead to a feasible number of coins (e.g. if $$W_0=10$$ and $$w=2/3$$ so the dollar amount to put into the cup is $$W_0 \cdot w = 20/3,$$ and if each coin is worth one dollar so fractions of a dollar are impermissible), then you want to pick the feasible number of coins that gives the highest value of the objective function $$E \ln W_N.$$ Since the objective function is concave in the choice variable, the integer optimum will be an integer adjacent to 20/3 -- either 6 or 7. Duoduoduo (talk) 01:38, 21 October 2013 (UTC)
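
 * For the two-cup case with indivisible coins, a single round is small enough to search exhaustively. The sketch below assumes the log-utility objective used above, a hypothetical starting wealth of 10 coins, and independent 60% success chances for both cups; it looks at one round only, and all names are illustrative.

```python
from math import log


def expected_log_wealth(silver, gold, coins, p=0.6):
    """E[ln(wealth after one round)] for whole-coin stakes in each cup."""
    kept = coins - silver - gold
    total = 0.0
    for silver_wins, p_s in ((True, p), (False, 1 - p)):
        for gold_wins, p_g in ((True, p), (False, 1 - p)):
            wealth = kept
            wealth += 2 * silver if silver_wins else 0   # silver cup doubles
            wealth += 3 * gold if gold_wins else 0       # gold cup triples
            if wealth == 0:
                return float("-inf")  # ruin has positive probability
            total += p_s * p_g * log(wealth)
    return total


def best_integer_split(coins, p=0.6):
    """Try every feasible (silver, gold) pair and keep the best one."""
    options = [(s, g) for s in range(coins + 1) for g in range(coins + 1 - s)]
    return max(options, key=lambda sg: expected_log_wealth(sg[0], sg[1], coins, p))


coins = 10  # hypothetical starting wealth
silver, gold = best_integer_split(coins)
print(silver, gold, expected_log_wealth(silver, gold, coins))
print(expected_log_wealth(0, 4, coins))  # gold cup only, at its one-cup share
```

 * Under these assumptions the search puts coins in both cups and scores higher than staking 4 coins on the gold cup alone, so neither of the two candidate allocations listed in the question is the joint optimum.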

I am trying to prove that: For any interval I, m*(I)=|I|.
The proof is in here (Oxtoby - Measure and Category),

![Oxtoby - Thm 3.3][1]

But I don't follow the details, so I am trying to write my own proof. (this is a new version after comments of other viewers)

Proof: One direction is obvious, so we need to show that $$|I| \le m^*(I)$$.

Suppose that $$M=|I| > m^*(I)$$. Then $$\epsilon = |I| - m^*(I) > 0$$. Take the cover of $$I$$ by open balls $$B_n$$ where each $$B_n$$ has radius $$\frac{\epsilon}{8}$$. Take $$U=\bigcup^{\infty}_{n=1}B_n$$. $$\overline{U}$$ is closed. Take a cover of $$\overline{U}$$, $$\{C_i\}$$ by open balls of radius $$\frac{\epsilon}{8}$$. $$\overline{U}$$ is closed and $$R^n$$ is Hausdorff, so there is a finite subcover $$\{ C_{i_j} \}^{k}_{i_j=1}$$, and therefore a finite cover of $$I$$. We also get that $$\Sigma^{k}_{i_j=1}|C_i|<|I|+\frac{\epsilon}{2}$$ contradicting the fact that $$\epsilon = |I|- m^*(I)$$.

What do you think?

Thank you, Shir

[1]: http://i.stack.imgur.com/kmhz1.jpg

— Preceding unsigned comment added by Topologia clalit (talk • contribs) 16:51, 20 October 2013 (UTC)
 * First, I replaced your $'s with 'math' tags so WP will convert the TeX. Second, not everyone has a copy of Oxtoby and the link you gave doesn't give a lot of context. I'm not sure what you mean by an "interval" in r-space for example. --RDBury (talk) 22:17, 20 October 2013 (UTC)


 * The TeX in the original question above was showing some "Failed to parse(unknown function '\gt')" errors, so I replaced the "\gt" markup with ">" symbols. —SeekingAnswers (reply) 23:23, 20 October 2013 (UTC)