Wikipedia:Reference desk/Archives/Mathematics/2010 March 22

= March 22 =

== Statistics question: Estimating a probability distribution given sample trials ==
Is there a technique for estimating the probability distribution of a continuous random variable given the results of many trials?

For instance, a continuous random variable was rolled 10 times, and the results were 4.03, 1.99, 3.2, 3.119, 4.21, 0.87, 4.14, 2.02, 3.324, 4.39. Find the probability distribution, or an approximation of the probability distribution.

The article on recursive Bayesian estimation originally looked promising, but it may not be relevant.--220.253.247.165 (talk) 09:50, 22 March 2010 (UTC)
 * There are many techniques for this - choosing one depends on the particular application. If you know absolutely nothing about the distribution, you can't do much better than the empirical cumulative distribution function. If you assume the density function is smooth, then a good way to estimate it is kernel density estimation. If you assume the distribution is from some parametric family, you can use the method of moments or preferably, the MLE. -- Meni Rosenfeld (talk) 11:17, 22 March 2010 (UTC)


 * Wouldn't it make sense to plot it first, then try to determine the type of distribution by inspection ? If I plot the ranges 0-1, 1-2, 2-3, 3-4, and 4-5, I get this:

   4^                  *
   3|              *
 n 2|
   1|  *   *   *
   0+--------------------->
      0-1 1-2 2-3 3-4 4-5
             range


 * I also notice data clusters around 2, 3.2, and 4.2, although more data points are needed to confirm this. StuRat (talk) 12:58, 22 March 2010 (UTC)
 * Compute the cumulant generating function of the sample. Bo Jacoby (talk) 18:13, 22 March 2010 (UTC).
 * Wouldn't that just give the empirical distribution? -- Meni Rosenfeld (talk) 18:25, 22 March 2010 (UTC)
 * Yes, but the cumulants of the sample approximate the cumulants of the population. See also multiset. Bo Jacoby (talk) 18:44, 22 March 2010 (UTC).
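The approaches named in this thread can be tried directly on the ten values from the question. The following is a minimal sketch using only the Python standard library. It assumes a Gaussian kernel with a hand-picked bandwidth of 0.5 for the kernel density estimate, and a normal family for the maximum-likelihood fit; neither choice comes from the discussion, and the function names are illustrative.

```python
import math

# The ten observations quoted in the question.
data = [4.03, 1.99, 3.2, 3.119, 4.21, 0.87, 4.14, 2.02, 3.324, 4.39]

def histogram(sample, edges):
    """Count the values falling in each half-open bin [lo, hi)."""
    return [sum(1 for v in sample if lo <= v < hi)
            for lo, hi in zip(edges, edges[1:])]

def ecdf(sample, x):
    """Empirical cumulative distribution function: fraction of values <= x."""
    return sum(1 for v in sample if v <= x) / len(sample)

def gaussian_kde(sample, x, bandwidth):
    """Kernel density estimate at x: an average of Gaussian bumps,
    one centred on each observation. The bandwidth is an assumption."""
    norm = len(sample) * bandwidth * math.sqrt(2 * math.pi)
    return sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2)
               for v in sample) / norm

def normal_mle(sample):
    """Maximum-likelihood (mean, standard deviation), ASSUMING the data
    come from a normal distribution (divides by n, not n - 1)."""
    n = len(sample)
    mu = sum(sample) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in sample) / n)
    return mu, sigma

print(histogram(data, [0, 1, 2, 3, 4, 5]))   # bin counts for 0-1, ..., 4-5
print(ecdf(data, 3.0))                        # share of observations <= 3
print(gaussian_kde(data, 4.0, bandwidth=0.5)) # estimated density near 4
print(normal_mle(data))                       # fitted mean and spread
```

With only ten points, the kernel estimate depends heavily on the bandwidth, so it is worth trying a few values before reading anything into apparent clusters.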

== Iterations of a multiplicative function ==
For those who are interested in such things, here is an oddment from recreational number theory that has been puzzling me (not homework, and not a competition problem either !).

If the prime factorisation of a positive integer n is


 * $$n=\prod_{p_i \text{ prime}}p_i^{q_i}$$

define the function f(n) as


 * $$f(n)=\prod q_i^{p_i} \text{ for } n>1 \text{ ; } f(1)=1$$

Then f is a multiplicative function, although not completely multiplicative. We can iterate f; for example:


 * $$\begin{align}
f(24500) &= f((2^2)(5^3)(7^2)) &= (2^2)(3^5)(2^7) &= 124416\\
f(124416) &= f((2^9)(3^5)) &= (9^2)(5^3) &= 10125\\
f(10125) &= f((3^4)(5^3)) &= (4^3)(3^5) &= 15552\\
f(15552) &= f((2^6)(3^5)) &= (6^2)(5^3) &= 4500\\
f(4500) &= f((2^2)(3^2)(5^3)) &= (2^2)(2^3)(3^5) &= 7776\\
f(7776) &= f((2^5)(3^5)) &= (5^2)(5^3) &= 3125\\
f(3125) &= f(5^5) &= 5^5 &= 3125
\end{align}$$

As the above sequence shows, f(n) may be less than, greater than or equal to n. However, the second iterate f(f(n)) always seems to be less than or equal to n.

Can you either prove that f(f(n)) ≤ n for all n or find a counterexample such that f(f(n)) > n? Gandalf61 (talk) 11:10, 22 March 2010 (UTC)


 * If p is an arbitrary prime number, then $$f(p^k)=k^p$$. Moreover, if we let $$k=\prod_{i=1}^m q_i^{n_i}$$ ($$q_i$$ prime for all i), we can compute $$f(f(p^k))=f(k^p)=f(\prod_{i=1}^m q_i^{pn_i})=\prod_{i=1}^m (pn_i)^{q_i}=p^{\sum_{i=1}^m q_i}\prod_{i=1}^m n_i^{q_i}=p^{\sum_{i=1}^m q_i}f(k)$$. Now use the formula $$f(f(p^k))=p^{\sum_{i=1}^m q_i}f(k)$$ to test your conjecture. PS  T  15:03, 22 March 2010 (UTC)


 * Okay, I follow that - but I am not sure where it goes next. And what if n does not have the form $$p^k$$? Gandalf61 (talk) 15:48, 22 March 2010 (UTC)


 * You want to show that $$\prod_{i=1}^m q_i^{n_i} \geq \sum_{i=1}^m q_i (1 + \log_p n_i)$$ for any p.
 * $$2^{n - 1} \geq 1 + \log_2 n$$ for all integer n > 0, so then $$q^n \geq q (1 + \log_2 n)$$ for any prime q. The product of numbers ≥2 is at least their sum so $$\prod_{i=1}^m q_i^{n_i} \geq \sum_{i=1}^m q_i^{n_i} \geq \sum_{i=1}^m q_i (1 + \log_2 n_i) \geq \sum_{i=1}^m q_i (1 + \log_p n_i)$$ which is what we want.
 * For terms that aren't of the form $$p^k$$, I don't think the argument should be too bad. The main idea is that for $$n = p^k q^k$$ with k prime, $$f(f(n)) = (p + q)^k$$, which is less than n, but there might be some difficult cases you have to deal with. Rckrone (talk) 22:36, 22 March 2010 (UTC)


 * Problem is that although f is multiplicative, f(f) is not - for example
 * $$f(f(2^5\, 3^5)) \ne f(f(2^5))\, f(f(3^5))$$
 * So a proof for prime powers does not easily extend to a proof for all n. Gandalf61 (talk) 14:47, 23 March 2010 (UTC)
 * What you can do though is show that for any a and b, f(f(a)f(b)) ≤ f(f(a))f(f(b)). Let $$f(a) = \prod q_i^{n_i}$$ and $$f(b) = \prod q_i^{m_i}.$$  The problematic terms are only the primes they have in common, so just consider those.  $$f(\prod q_i^{m_i+n_i}) = \prod (m_i+n_i)^{q_i}.$$  Numbers such as f(a) and f(b) in the image of f must have all exponents $$m_i, n_i \geq 2$$, and m + n ≤ mn whenever m, n ≥ 2, so $$\prod_i (m_i+n_i)^{q_i} \leq \prod_i (m_i n_i)^{q_i} = f(\prod_i q_i^{m_i})f(\prod_i q_i^{n_i}).$$
 * With that result, for any number $$n = \prod_{i=1}^r p_i^{k_i},$$
 * $$f(f(\prod_{i=1}^r p_i^{k_i})) = f(\prod_{i=1}^r f(p_i^{k_i})) \leq \prod_{i=1}^r f(f(p_i^{k_i})) \leq \prod_{i=1}^r p_i^{k_i}$$ Rckrone (talk) 17:10, 23 March 2010 (UTC)
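The conjecture and the examples in this thread can also be checked numerically. The following is a small sketch, not a proof: it assumes trial-division factorisation is fast enough for the range tested, and the helper names `factorize`, `f` and `orbit` are illustrative.

```python
def factorize(n):
    """Return {prime: exponent} for n >= 1, by trial division."""
    factors = {}
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors[d] = factors.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors

def f(n):
    """Swap each prime with its exponent: prod p^q becomes prod q^p; f(1) = 1."""
    result = 1
    for p, q in factorize(n).items():
        result *= q ** p
    return result

def orbit(n):
    """Iterate f from n until a value repeats; return the values visited."""
    seen = []
    while n not in seen:
        seen.append(n)
        n = f(n)
    return seen

# The chain from the question, ending at the fixed point 5^5.
print(orbit(24500))

# Search a small range for counterexamples to f(f(n)) <= n.
print([n for n in range(1, 2001) if f(f(n)) > n])

# f composed with f is not multiplicative: compare both sides for 2^5 * 3^5.
print(f(f(2**5 * 3**5)), f(f(2**5)) * f(f(3**5)))
```

An empty list from the search is consistent with the argument above but only covers the range tested.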

== Inverse functions $$e^x - e^{-x}$$ and $$\ln \left ( \frac{ x + \sqrt{ x + 4 } }{2} \right ) $$ ==
I worked out that the second in my question's title was the inverse of the first by using the first and then substituting $$a = e^x$$, flipping my x- and y-values so I have $$x = \frac{a^2 - 1}{a}$$, rearranging to get $$a^2 - xa - 1 = 0$$, applying the quadratic formula, discarding the solution with the minus, and finally taking the log to get back to y. But when I graphed these two functions on the online grapher at walterzorn.com, they only looked like mirror images over the line y=x for positive x-values. Are these two not inverses, meaning I did something wrong, or is there some restriction here I'm not remembering? I know you can't take the log of a negative value, but $$x + \sqrt{x + 4} $$ doesn't go below zero until x equals approximately -1.56, where the asymptote of my second function is. 20.137.18.50 (talk) 13:23, 22 March 2010 (UTC)


 * Should be $$\ln \left ( \frac{ x + \sqrt{ x^2 + 4 } }{2} \right ) $$. Gandalf61 (talk) 13:36, 22 March 2010 (UTC)
 * Thanks, I see how I forgot to square my middle term. 20.137.18.50 (talk) 14:05, 22 March 2010 (UTC)
 * Should be $$\ln \left ( \frac{ x + \sqrt{ x^2 - 4 } }{2} \right ) $$. (minus not plus in the square root) Staecker (talk) 13:46, 22 March 2010 (UTC)
 * Sorry- you're right. I screwed it up twice so it still looked right when I double-checked. Staecker (talk) 14:39, 22 March 2010 (UTC)
 * Since no one gave the WP link yet, I will: hyperbolic function. The inverse of $$e^x - e^{-x}$$ is then $$\operatorname{arsinh}(x/2)=\log\left(\frac x2+\sqrt{\left(\frac x2\right)^2+1}\right)=\log\left(\frac{x+\sqrt{x^2+4}}2\right)$$.—Emil J. 14:54, 22 March 2010 (UTC)
 * You have some missing logs, of course. -- Meni Rosenfeld (talk) 17:13, 23 March 2010 (UTC)
 * Drat. Thanks for pointing it out.—Emil J. 11:37, 24 March 2010 (UTC)
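A quick numerical round trip makes the difference between the two candidate formulas visible. This is a sketch with illustrative function names; `g_inv_typo` reproduces the version from the section title with $$\sqrt{x+4}$$, and `g_inv` is the corrected version with $$\sqrt{x^2+4}$$.

```python
import math

def g(x):
    """The original function e^x - e^(-x)."""
    return math.exp(x) - math.exp(-x)

def g_inv(y):
    """Corrected inverse: ln((y + sqrt(y^2 + 4)) / 2)."""
    return math.log((y + math.sqrt(y * y + 4)) / 2)

def g_inv_typo(y):
    """Version from the title, with sqrt(y + 4); only defined for y >= -4."""
    return math.log((y + math.sqrt(y + 4)) / 2)

for x in [-2.0, -0.5, 0.0, 1.0, 3.0]:
    y = g(x)
    # The corrected formula should return x; arsinh(y/2) should agree with it.
    print(x, g_inv(y), math.asinh(y / 2))

# The mistyped formula happens to agree with the corrected one where y = y^2,
# that is at y = 0 and y = 1, and drifts away elsewhere.
print(g_inv_typo(0.0), g_inv(0.0))
print(g_inv_typo(g(1.0)))  # noticeably different from 1
```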

== If and only if symbol to use in thesis ==
I am wanting to use the symbol for iff in my theoretical framework but I am not sure which one would be the right one to use in this instance. Should it be ≡ ?--160.36.39.157 (talk) 14:27, 22 March 2010 (UTC)


 * If you must use a symbol, I've most often seen $$\Leftrightarrow$$. But English words are usually better in my opinion. Staecker (talk) 14:34, 22 March 2010 (UTC)


 * I agree, words are often better than symbols (I don't know about "usually better"). The abbreviation "iff" is very commonly used, as well. --Tango (talk) 15:35, 22 March 2010 (UTC)


 * See iff. -- SGBailey (talk) 16:28, 22 March 2010 (UTC)


 * I'd say the symbol is OK, but you need to define it first: "Throughout this thesis I will use 'iff' to mean 'if, and only if,' ". StuRat (talk) 17:08, 22 March 2010 (UTC)


 * I think "iff" is sufficiently widely used not to need to be defined. --Tango (talk) 17:11, 22 March 2010 (UTC)


 * I agree with that -- definitely do not define the word iff; that would be somewhere on the continuum from silly to offensive.
 * But I don't actually agree with using it. Iff is a blackboard abbreviation; it's not the right linguistic register for a thesis.  That said, some good people do use it in print, but I still don't like it.
 * As for symbols versus words, it depends on whether you're writing prose or logical formulae. It's true that $$\Leftrightarrow$$ would be odd in prose, but it's just the right thing if the rest of the formula is in symbols.  (Well, that or $$\leftrightarrow$$.)
 * If you find that the repetition of if and only if gets tedious, a couple of synonyms that you could throw in for elegant variation are just in case or exactly when, or you could reword to use necessary and sufficient. --Trovatore (talk) 17:20, 22 March 2010 (UTC)


 * You're both assuming that the audience has a math or science background. This could be a thesis in another area, with just a bit of math or science thrown in. StuRat (talk) 17:32, 22 March 2010 (UTC)
 * That strikes me as unlikely. --Trovatore (talk) 17:58, 22 March 2010 (UTC)

It is a thesis for an M.S. in Agricultural Economics. I have my logic explained in words and I want to also show it in equation form.--160.36.39.157 (talk) 18:58, 22 March 2010 (UTC)
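For the equation form, the symbols mentioned in this thread correspond to standard LaTeX commands. The lines below are only a typesetting sketch; P and Q are placeholder statements and are not taken from the thesis.

```latex
% P and Q are placeholders for the two statements being related.
P \Leftrightarrow Q   % double arrow with ordinary relation spacing
P \iff Q              % longer double arrow with extra space on both sides
P \leftrightarrow Q   % single-line arrow, often used for the logical connective
```

Whichever symbol is chosen, using the same one throughout the document keeps the formulas easy to read.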

== Optimal location of a bridge ==
Can anybody suggest an algorithm for finding an optimal location of a bridge joining two sets of populated places on either side of a river? —Preceding unsigned comment added by Amrahs (talk • contribs) 15:23, 22 March 2010 (UTC)


 * For an optimisation problem you need a cost function (some way of comparing different places) and a set of constraints (things that must be satisfied by the place). What algorithm is best will depend on what form those things take. --Tango (talk) 15:38, 22 March 2010 (UTC)


 * Obvious criteria to include are: minimising the sum of (journey length over bridge) * (relative journey frequency) and minimising the construction cost of the bridge (e.g. build it over the narrowest part of the river) -- 16:26, 22 March 2010 (UTC)


 * This would get so complex, with so many variable weighting factors, that human judgment might be better than an algorithm. Other factors might be the depth of the river in various locations, ground quality (bedrock or swamp ?) at the adjacent land, what would need to be demolished to build the bridge (new skyscraper or old slum ?), environmental impact, attractiveness of the bridge in various locations, effect on river traffic, etc.  Something which they don't always consider, but should, is the ability to connect to existing highways.  In many places you must exit the highway, drive on local roads to the bridge, cross the bridge, then drive on local roads again to get to the highway on the other side.  And, if political reasons are included, like getting government money for your area that will otherwise go elsewhere, you can even justify a bridge to nowhere.  StuRat (talk) 17:15, 22 March 2010 (UTC)


 * The quest for the optimum is not worthwhile to pursue. Comparing two options, if one of them is clearly inferior, discard it. If the difference is not clear, pick either one. Using a lot of effort in making an unimportant decision is not rational behaviour. Bo Jacoby (talk) 08:55, 23 March 2010 (UTC).


 * Seems to me that such a decision can be very consequential to a lot of people. -- Meni Rosenfeld (talk) 10:25, 23 March 2010 (UTC)


 * I do agree. You could even build several bridges crossing the same river. Bo Jacoby (talk) 14:02, 23 March 2010 (UTC).


 * If the river was a straight line, and you were solely interested in the shortest distance between town A and town B, then the obvious place to put it would be where the line from A to B crossed the river. Too obvious? 92.29.120.231 (talk) 15:30, 23 March 2010 (UTC)


 * That neglects all the other factors mentioned so far. I'd say many of those other factors are far more important than the bridge being equidistant from both cities.  For example, if you have swamps there ("precious wetlands", to environmentalists), and no roads, that may be a poor choice. StuRat (talk) 15:41, 23 March 2010 (UTC)


 * The bridge would only be equidistant if the towns or cities were the same perpendicular distance from the straight-line river. The OP did not give any criteria for assessing the merits of the bridge position. 92.29.120.231 (talk) 16:31, 23 March 2010 (UTC)


 * The way I understood the question, there are more than 2 towns. -- Meni Rosenfeld (talk) 17:12, 23 March 2010 (UTC)


 * If there were several cities on each side of the river, then the OP needs to define the road system interconnecting the cities. If roads cost nothing to build and you can have as many of them as you like, and there are no other considerations, then you just travel from one end of the river to another and at each point, sum the distances to each city. Choose the point that has the lowest sum of distances. 78.149.133.100 (talk) 21:52, 23 March 2010 (UTC)


 * Probably in reality you would connect the two road networks on either side of the river by joining the two closest points of them, since that would require building the least amount of new road. If this is for a game, then the roads could be constructed using Central place theory with some randomness added. 78.149.167.173 (talk) 21:39, 24 March 2010 (UTC)
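The simplest procedure suggested in this thread (walk along a straight river, sum the distances to every town at each point, keep the point with the lowest sum) can be sketched as follows. The sketch assumes the river is the line y = 0, that straight-line travel is free of obstacles, and that all towns count equally; the town coordinates and function names are made up for illustration and ignore every practical factor raised above.

```python
import math

def total_distance(bridge_x, towns):
    """Sum of straight-line distances from the river point (bridge_x, 0)
    to every town, given as (x, y) pairs on either bank."""
    return sum(math.hypot(x - bridge_x, y) for x, y in towns)

def best_bridge(towns, x_min, x_max, steps=1000):
    """Scan evenly spaced points along the river between x_min and x_max
    and return the x-coordinate with the smallest total distance."""
    candidates = [x_min + (x_max - x_min) * i / steps for i in range(steps + 1)]
    return min(candidates, key=lambda bx: total_distance(bx, towns))

# Hypothetical layout: two towns north of the river (y > 0), two south.
towns = [(1.0, 2.0), (6.0, 3.0), (3.0, -1.5), (8.0, -2.5)]
x = best_bridge(towns, 0.0, 10.0)
print(x, total_distance(x, towns))

# With one town on each bank, the scan recovers the point where the
# straight line between them crosses the river.
print(best_bridge([(0.0, 3.0), (4.0, -1.0)], 0.0, 10.0))
```

Weighting each town by population or journey frequency would only require multiplying each term in `total_distance` by that weight.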