Wikipedia:Reference desk/Archives/Mathematics/2014 January 21

= January 21 =

Lottery and math question.
First lets imagine http://en.wikipedia.org/wiki/Mega-Sena.

Now, lets imagine a guy buy one ticked per month. This is 12 tickets per year.

Anyway, imagine he, instead of doing that, brought all 12 tickets in the first month of the day (and didnt broughtr tickets for the rest of the year), would this guy have a better chance of winning than with the way he used to bet?201.78.152.70 (talk) 11:44, 21 January 2014 (UTC)
 * The short answer - no. Only buying more tickets per draw would improve his chances. Start with Probability_theory and the related articles to see why. 196.214.78.114 (talk) 12:49, 21 January 2014 (UTC)

I think it does. a guy should buy 12 tickets in January only. Then when the results for January's draw is out the guy must hide in a mine shaft. Then he communicate with his trusted friend and ask if ticket #1 has won? If his friend said "no it has not" then the probability of ticket #2 to ticket #12 winning has increased!!! Next repeat by asking his trusted friend if ticket #2 has won. When his friend said "no ticket #2 has not won" then the probability of ticket #3 to ticket #12 winning has increased!!! Keep doing so until only ticket #12 is left. You will find now that the probability of ticket #12 winning is higher than it was before the lottery result has been announced.

I know you will not believe me. So try this thought experiment. Imagine a lottery which consists of just 13 different numbers. Calculate your chances of wining in two different ways.

(First way) Buy 12 tickets in January

(Second way) Buy 1 ticket a month from January to December

Next, Imagine a lottery which consists of just 14 different numbers. And keep increases the number until you hit the X different numbers where X is the total number of ways you can choose a lottery ticket.

Ohanian (talk) 15:22, 21 January 2014 (UTC)


 * Doing it the second way, you could (theoretically) win every time. You can only win once the first way... AndyTheGrump (talk) 16:03, 21 January 2014 (UTC)


 * I'm not going to write out the formulas, but the probability is actually slightly lower using the second method. However, the difference is on the order of p2, where p is the probability of winning, so it is totally meaningless.  Also, as Andy said, it is compensated by the possibility of winning more than once. Looie496 (talk) 16:17, 21 January 2014 (UTC)
 * Thanks for the info.


 * "it is compensated by the possibility of winning more than once." Op here, well when someone win the lottery, they usually dont think "hey, winning again would be really cool", most would never bet on lottery again. So this compensation is not really one.201.78.152.70 (talk) 17:16, 21 January 2014 (UTC)


 * Looie and Ohanian are correct, if you only consider the chances of winning, and the lotteries are fairly "normal" -- but to make any precise statements or formalize the chances, you'd have to make assumptions about the lotteries. E.g. if lotteries occur once a month, with the same number of tickets sold, and the same jackpot, the analysis becomes much easier. Without assumptions, we can't say much of anything. What if e.g. the December lottery only sells one ticket? SemanticMantis (talk) 18:00, 21 January 2014 (UTC)
 * I was assuming that any realistic wining prize would be considered good enought and the guy would not care about betting again. — Preceding unsigned comment added by 201.78.155.202 (talk) 12:01, 22 January 2014 (UTC)


 * Assuming that there is no significant variation in procedure each drawing, then yes, buying all your tickets for one draw is the best strategy to achieve the goal of "win once", though for most lotteries, the increase in success is quite minimal. If you are considering this problem purely out of interest, though, then the case of a monthly raffle may be of interest. Specifically, where the winner is decided by drawing the winning ticket from the pool of purchased tickets; it is interesting since, then, the "buy all tickets for one draw" strategy can end up only being better in the event that not everyone follows it. For example, with two drawings and two players, each able to buy two tickets, and able to use strategies A = "buy 2 for first drawing" and B = "Buy 1 ticket for each"; if both use A, they each have a 1/2 chance of winning; if both use B, they each have a 3/4 chance of winning at least one; if player 1 uses A and player 2 uses B, then player 1 has a 2/3 chance of success and B is guaranteed to win one. Again, if you are interested in the mathematics of it, it might be fun to consider the general outcome of m players, n drawings, k tickets total per player, and allowing any distribution of tickets to drawings; specifically, the problem of what generates the highest expected number of players getting "at least one win". Sorry for the long rant, just food for thought:-)Phoenixia1177 (talk) 07:45, 23 January 2014 (UTC)

probability
Please help me to find the answer for the following question. In a particular town the probability that an adult owns a car is 0.7. while the probability that an adult owns a car and is employed is 0.6.If a randomly selected adult is found to own a car, what is the probability that he is also employed? Thank you.175.157.92.171 (talk) 16:10, 21 January 2014 (UTC)
 * Oh. I love mental arrhythmic homework question's. I have never driven 7 tenths  of an automobile so can't answer you.--Aspro (talk) 20:20, 21 January 2014 (UTC)
 * Would you believe 6/7?→31.54.113.130 (talk) 20:24, 21 January 2014 (UTC)
 * I think you are wrong. My calculations shows it is "0.6"/"0.7" 220.239.51.150 (talk) 04:11, 22 January 2014 (UTC)
 * Please see our article Conditional probability. The relevant formula here is:
 * $$P(A|B) = \frac{P(A \cap B)}{P(B)}$$
 * but math should never be about memorizing formulae.  Read the article and study the illustrations until you fully understand the concept.  Then, assuming that you understand the mathematical symbols, the formula will simply be the most concise way of expressing the concept.  In the future you will be able to recreate the formula on your own after drawing your own diagram.  If you have to look up the formula (or rely on memorizing it), then you don't understand the concept. -- ToE 05:28, 22 January 2014 (UTC)

Thank you very very much. Its not for me. For my son.175.157.57.250 (talk) 11:04, 22 January 2014 (UTC)
 * Draw a Venn diagram like so: (1) draw a big rectangle, representing all adults in your particular town. (2) draw an oval within the rectangle, occupying about 70% of the area. This represents the car-owning adults in the town. (3) draw a new oval within the rectangle, which overlaps most of the area of the first oval, but not quite, such that both ovals have a part that doesn't overlap with the other. Annotate your diagram as follows: the first oval occupies 0.7 (70%) of the rectangle. It represents all car-owning adults. The second oval represents all adults who are employed. You don't know exactly how big it is, but you know it is greater than 0.6 (60%). You have been told that the area of the intersection between the ovals is 0.6 (60%) (the probability that an adult owns a car and is employed is 0.6). By definition, the proportion (probability) of car-owning adults who also are employed, is found by dividing the area of the intersection (the car-owning and employed adults) by the area of the first oval you drew, the adults who own a car, whether they are employed or not. You end up with the results others have given, 0.6/0.7 = 6/7.
 * If you want to use Bayes theorem, this is the question you've been asked (and its solution:)
 * $$P(employed|carowner) = \frac{P(employed \cap carowner)}{P(carowner)}$$
 * You know that $$P(carowner) = 0.7$$ and that $$P(employed \cap carowner) = 0.6$$
 * Inserting the values into the formula givers the same result as above: 0.6/0.7 = 6/7. ---NorwegianBluetalk 23:10, 23 January 2014 (UTC)
 * Hello sinebot, I no longer use cookies and therefore cannot log in. But I figured that a copy-pasted signature is better than no signature (as longs as it's mine). I would appreciate if you stopped appending "unsigned" tags to my posts. Thanks! N.Blue.
 * Sinebot is a bot rather than a person, so it's not going to understand your message. If you turned off cookies on your browser for privacy reasons, try setting your browser to enable Wikipedia but not other sites to set cookies. —SeekingAnswers (reply) 13:19, 24 January 2014 (UTC)
 * A much more pedestrian way to approach the problem (sorry, I couldn't resist) is to assume that there are only 10 people in the town, 7 of whom own a car. Of the 7 that own a car, there are 6 who are employed.  The question is that if you pick one of these 7 people at random, what is the probability that he or she is employed?  That is obviously equal to the number of people who own a car and are employed (6) divided by the total number of people who own a car (7).   Sławomir Biały  (talk) 13:58, 24 January 2014 (UTC)

Best divisor approximation
Working on a recreational mathematical problem and came to think of this problem:


 * Given an integer n with a known prime factorization, is there an efficient algorithm for finding the divisor d of n, which best approximates some positive real number $$1 < x < n$$ such that $$|x-d|$$ is minimized?

Another variant could be to find the divisor $$d < x $$ such that $$\epsilon = x - d$$ is minimized?

An obvious method is to iterate through an unsorted list of all divisors from the prime factorization (easy to implement) and pick the one that comes closest. However, I would like this to work for highly composite numbers, such as primorials, where the number of divisors are overwhelmingly large.

An alternative idea I came to think of was a binary search, but this requires that I can generate the kth ordered divisor $$d_{k-1} < d_k < d_{k+1}$$ relatively efficient, and I am not aware of such a method. I did notice though an interesting post from Numerical Recipes in 2009, divising a method for generating an ordered set of divisors of n without having to precalculate all of them and then sort them (although this appears to be the most efficient method unless yu are bounded by memory). At least then you know when to stop the iteration, but the method is less efficient than simply finding all unordered divisors. The method uses some rather complicated data structures, and it appears to me that it can be adapted in a way such that a more directed search for a divisor close to or bound by some number can be done thus cutting down the number of computations significantly. I probably just need to read it 5-10 more times, and maybe I will understand :-).

A third idea I have had is to consider $$\log n$$, thus converting the bound variant of the problem into the knapsack problem, in general, an NP-hard problem. The floating point values can be converted into integers by finding a minimum difference between the log of the prime factors, consider the possible buildup of rounding errors from the known number of terms and multiply by a suitable multiple of the smallest difference and floor it. Given that the problems seems to be equivalent to an NP-hard problem, I guess this means there are no simple ways to do this? Or can some properties of the prime factors be used to make it an 'easier problem'?

--Slaunger (talk) 20:17, 21 January 2014 (UTC)
 * This problem isn't equivalent to the knapsack problem but NP-completeness can be used to eliminate possible appraoches which are ultimately dead ends. For example, from my reading of the description of the algorithm in the blog, it could be adapted to produce an ordered list of all possible sums for an instance of the knapsack problem. If this could be combined with a binary search to produce an efficient algorithm then it would prove P=NP, which is unlikely. Having an algorithm that produces a sorted list of values is different than having an algorithm that predicts the next value from the current value. In the first case the algorithm may be storing results from the computation of all the previous values. So to use the algorithm to predict a next value you may have to, in effect, compute all the previous values again, making this approach very inefficient for what you're trying to do. You would run into the same kind of issue trying to do a binary search.


 * One quality of the divisor problem that you don't get with the general knapsack problem is nondegeneracy. For example you may have knapsack items of weights 3, 5, 11. These can be added in different ways to get the same weight, e.g. 11+11=3+3+3+3+5+5. But every combination of log p, for p prime, is unique by the fundamental theorem of arithmetic. I doubt the nondegenerate knapsack problem is any easier than the general knapsack problem, but there are several problems where a nondegeneracy assumption simplifies the solution.


 * A variation of the kind of problem you're talking about involves smooth numbers, i.e. numbers whose prime factors are all less than a given bound B. You might ask, given n and B, what is the smallest B-smooth number greater than n? --RDBury (talk) 19:23, 22 January 2014 (UTC)


 * It is a good point you have about the nondegeneracy. Actually the sets of $$\log p_i$$ are just about as perfect for getting a unique solution as can be. I do not quite understand though, why you state in the beginning that the upper bounded version of my question it is not equivalent with the knapsack problem? In what way is it not equivalent? One thing I came to think about, where the NP character of the problem maybe can be circumvented is the special case where $$x = \sqrt{n}$$. There we know that if n has an odd number of divisors, the best approximant will be the middle one, e.g., $$d_{(n+1)/2}$$ in the ordered set of divisors, and if n has an even number of divisors, it will be one of $$d_{\frac{n}{2}\pm 1}$$. Could there be an efficient algorithm for just finding that single divisor among the ordered divisors? --Slaunger (talk) 21:45, 22 January 2014 (UTC)
 * It's not equivalent because in the divisor problem you're restricted to log p for weights while in the general knapsack problem there is no such restriction. So if you can solve the knapsack problem efficiently you can solve the divisor problem but the reverse does not necessarily hold. The binary search depends on finding the nth largest divisor dk for a given k. Suppose n is the product of the first 1000 primes, then the number of divisors is 21000. In order to find the middle divisor, d2999, with the algorithm given in the blog you would need start from 1 and let it run until you got 2999 values, which would take a very long time. So being able to generate the divisors in order is not the same as having the entire list at once where you can easily jump to the middle; there may be too many divisors to even store them all at once, much less do a binary search on them. There may be a way to solve the divisor problem that uses some special properties of primes rather than approaching it as an instance of the knapsack problem, but I highly doubt it; primes don't have much in the way of such properties to work with. --RDBury (talk) 22:49, 22 January 2014 (UTC)
 * Ah, OK, I get it. You are right! Thanks. --Slaunger (talk) 23:23, 22 January 2014 (UTC)
 * Another simplifying aspect of this problem when converted into a knapsack problem is the fact that we can assume weight equals value. --Slaunger (talk) 09:26, 23 January 2014 (UTC)
 * (ec)Regarding the ordering I just played around a bit with brute-forced ordered lists of the divisors of the first few primorials. I came to think about representing the primorials in a prime code binary representation inspired by the Fibonacci code, so the least significant digit is the multiplicity of the lowest prime, 2 in the prime factorization, the next digit is the multiplicity of the next prime, 3, etc. Thus,
 * $$p_{0}\# = 1 = 2^0 \leftrightarrow 0$$,
 * $$p_{1}\# = 2 = 2^1 \leftrightarrow 1$$,
 * $$p_2\# = 2\cdot 3 = 2^1 \cdot 3^1 \leftrightarrow 11$$,
 * and
 * $$p_n\# = 2^1 \cdot 3^1 \cdot 5^1 \cdots p_n^1 \leftrightarrow \underbrace{1\cdots1}_{n}$$
 * Now, any divisor in $$p_n\#$$ if found by either including a given prime in the primorial in the product or not. Thus, we can represent by any 'n'-digit binary prime code. For instance, $$p_3\#=30$$ represented by the three digit binary prime code 111 has the divisor 15 represented by the binary prime code 101. Now it is interesting to list the divisors for the primorials in ascending order written in their binary prime code form

n p_n#    Ordered divisors in binary prime code form 0    1    0 1     2    0 1 2     3    00 01 10 11 3     4    000 001 010 100 011 101 110 111 4     5    0000 0001 0010 0100 0011 1000 0101 1001 0110 1010 0111 1100 1011 1101 1110 1111 5     6    00000 00001 00010 00100 00011 01000 00101 10000 01001 00110 01010 10001 00111 10010 01100 01011 10100 10011 01101 11000 01110 10101 11001 10110 01111 11010 10111 11100 11011 11101 11110 11111
 * As can be seen here is a certain kind of order in this progression. For the first three primorials the divisors are ordered in accordance with their binary prime codes. For the fourth we begin to see some disorder as the divisor 5 (100) comes before 6 (011), but the underlying ordering for the lower bits are retained. Since it is easy to calculate the divisor with a given binary prime code, I was wondering if the this first rough sorting can be of any help to reduce the computational complexity or at least build up the solution recursively using dynamic programming (or maybe this is just the 0-1 knapsack problem restated in another form)? --Slaunger (talk) 23:03, 22 January 2014 (UTC)

Lenstra–Lenstra–Lovász lattice basis reduction algorithm. Count Iblis (talk) 22:12, 22 January 2014 (UTC)
 * Interesting (and complicated). Thanks, I'll look into that. Lattice reduction problems is a new concept for me. A whole new world opens... :D --Slaunger (talk) 23:22, 22 January 2014 (UTC)
 * OK, after trying to understand, I do not get it? How can this problem be mapped into a lattice reductions problem? How to choose the initial vectors and the bases? --Slaunger (talk) 23:38, 22 January 2014 (UTC)


 * I don't really see how LLL helps, but yes it's an important algorithm, along with PSLQ and other integer relation algorithms. For generating divisors in order given the factorization, I think you can do it straightforwardly, similarly to these schemes for enumerating (say) 5-smooth numbers (a standard programming exercise).  Instead of having an "unlimited" supply of 2's, 3's, and 5's, you'd have some multisets giving the prime factorization of your number, and during the enumeration you'd keep track of which factors you'd already used.  I'm getting sleepy but can explain it further tomorrow if this is too confusing. 50.0.121.102 (talk) 07:50, 23 January 2014 (UTC)
 * After reading about integer relation algorithms on Wolfram, it appears to me that the PSLQ algorithm ought to be the best suited algorithm, as it is specifically made for these kinds of problems (LLL is more general-purpose). Morover, the PSLQ algorithm allows to set uppler limits to the size of individual integer coefficients. I could not find a good explanation of the algorithm (that I could understand), but I found it was included in the sympy package for Python, and I did try it out for $$p_{40}\#$$ using the log of the first 40 prime factors as input, constraining the maximum coefficients to 1 and different tolerances, trying to find the best divisor approximation to $$\sqrt{p_{40}\#}$$. However, I couldn't get the algorithm to converge for nearly low enough tolerances to anything near the right result. Maybe the problem is just too hard for PSLQ, or maybe my lack of insight into the algorithm is the problem... --Slaunger (talk) 21:06, 30 January 2014 (UTC)
 * I'm realizing you're talking about primorials with a LOT of factors, so the above method won't help that much. In fact taking approximations of logarithms of both sides of the equation (log base b rounded to n places), you've got a subset-sum problem which is NP-complete, though the particular instance may not be hard (for one thing all the numbers are positive).  Hmm.  The subset-sum article mentions some heuristic or approximate algorithms that might help. 50.0.121.102 (talk) 18:41, 23 January 2014 (UTC)
 * I did see the mentioning in the knapsack article that for this case where weight equals value you have a subset sum problem for the case of primorials. But actually, it is not the true (NP-hard) subset sum problem. Rather it is the approximate subset sum problem, where you allow a slack in how well the subset sum matches zero. And this can be solved in polynomial time as explained in the article. That makes sense as the LLL algortihm is also an approximate algortihm running in polynomial time. The generalization to arbitrary n is the (approximate) multiple subset sum problem for which Wikipedia has no article curently. --Slaunger (talk) 21:18, 23 January 2014 (UTC)


 * The LLL method can be used in a heuristic way to generate possible solutions. The number n is given as:

$$n = \prod_{j=1}^{m} p_j^{k_j}$$

Let's define m+1 vectors $$v_{r}$$ as follows. for $$1\leq r\leq m$$ put:

$$v_{r} = \left[\underbrace{0,0,\cdots,0}_{r-1\text{ zeros}},1,0,\cdots,0,Z \log(p_{r}),\frac{W}{k_{r}}\right]$$

where Z and W are large normalization factors that can be determined by trial and error. The vector $$v_{m+1}$$ is then taken to be:

$$v_{m+1} = \left[\underbrace{0,0,\cdots,0}_{m \text{ zeros}},1,Z \log(x),W s m\right]$$

The number s will be of order 1 and is also determined by trial and error. The last components of the vectors will steer the reduced vectors to be within the constraints of the problem, the component before that favors optimal solutions to the problem. Of course, the outputs will not all be "legal solutions", but you can tune the parameters (or add extra components with more parameters) so that the outputs are steered more in the right direction. Count Iblis (talk) 12:43, 23 January 2014 (UTC)
 * Thanks, Count Iblis for your worked out example!! It appears to be a really powerful algorithm. And, now I am beginning to actually understand it. At least now I can envision some m+1 dimensional space for the case of primorials, where you have the $$\log x$$ along the last axis in one basis fighting with all the $$\log p_j$$ components in the other basis vectors. I also see that $$Z \sim m/\min(\log p_{j+1} - \log p_j)$$ appears to be an adequate choise corresponding to casting the equivalent knapsack problem into a form with integer weights. I think. I still do not quite understand the extension to arbitrary integers in the last dimension, and I am especially confused about the role of W and s. But after rereading the LLL article, it now seems more tractable, and I think what I have to do to really undersstand it, is to do a by hand calculation of the LLL algorithm for some very simple example such as $$n=360 = 2^3 \cdot 3^2 \cdot 5$$, and the closest divisor approximation to $$x = \sqrt{360} \simeq 18.97$$ using the basis vectors you specify. Hopefully, I will get the result $$d=20$$:-) --Slaunger (talk) 20:58, 23 January 2014 (UTC)
 * I am still struggling to get this to work. I have tried implementing the algorithm by following the pseudocode and example in the Lenstra–Lenstra–Lovász lattice basis reduction algorithm, but I think there are inconsistencies between the stated algorithm and the example as I and another user has noted on the talk page of the article.
 * However, I now better understand the way the vectors are constructed, but I still cannot get the last dimension to work. The second last dimension is clearly constructed as a trick to find an approximate integer relation, such that
 * $$\sum_{j=1}^m a_j \log(p_j) = b \log x + \epsilon$$,
 * where the coefficients are integers and $$\epsilon$$ is some small error to minimize. Assuming that the divisors of n are approximately evenly spread on a log scale of n, one can get a reasonable estimate of the order of the expected error by dividing by the divisor function
 * $$\sigma_0(n)=\prod_{j=1}^m(1 + k_j)$$
 * such that
 * $$\epsilon \sim \frac{\sum_{j=1}^m k_j\log(p_j)}{\prod_{j=1}^m(1 + k_j)}$$.
 * To give this potentially very small error a reasonable weight of relevance for a lattice reduction problem we therefore scale up all values in this dimension with a factor of Z, such that
 * $$Z\epsilon \sim 1$$
 * or
 * $$Z \sim \frac{\prod_{j=1}^m(1 + k_j)}{\sum_{j=1}^m k_j\log(p_j)}$$
 * Is that a reasonable analysis of that dimension?
 * Now, there are some problems with this constraint alone. First of all, we would really like to fix $$b=1$$, and moreover it is a requirement that $$0 \le a_j<k_j$$ for the set of found integers to correspond to a divisor in n. Thus, we need more constraint(s). I believe the last proposed dimension is for that purpose, as that gives the constraint.
 * $$\sum_{j=1}^m\frac{a_j}{k_j} = b \cdot s \cdot m $$
 * If we assume b = 1 (can we?), it is seen that $$s \le 1$$ denotes an average value of the fraction counting the actual multiplicity of a prime factor in the approximating divisor with the multiplicity of the same prime factor in n. For instance, if $$x = \sqrt{n}$$, we would be about half-way through the divisors, and thus expect $$s \simeq 0.5$$. The last factor W would then steer how aggressively, the algorithm should try and find a divisor, which obeys the average multiplicity count, and this would be found by experimentation.
 * While I can see this is a helpful constraint, it does not really put any particularly tough limits on the multiplicity in each divisor, does it? Would it not be better to add m additional dimensions, each giving a constraint for the multiplicity of each prime factor? And how to avoid negative multiplicities?--Slaunger (talk) 21:42, 29 January 2014 (UTC)
 * While I can see this is a helpful constraint, it does not really put any particularly tough limits on the multiplicity in each divisor, does it? Would it not be better to add m additional dimensions, each giving a constraint for the multiplicity of each prime factor? And how to avoid negative multiplicities?--Slaunger (talk) 21:42, 29 January 2014 (UTC)