Wikipedia:Reference desk/Archives/Mathematics/2006 November 10

= November 10 =

== An algorithm for a problem in statistics ==
This is a mathematical problem with application to microeconomics. It assumes that the purchase response (the number of people who buy an item) or "Demand" will decrease as the price increases. The logit price-response function, a very popular way of modeling this decrease, is as follows:

d(p) = C * e^−(a+b*p) / ( 1 + e^−(a+b*p) )

Here a, b, and C are parameters with C > 0 and b > 0. a can be either greater than or less than 0. Broadly speaking, C indicates the size of the overall market and b specifies price sensitivity. Larger values of b correspond to greater price sensitivity. The price-response curve is steepest in slope at the point −(a/b). The curve is a reverse S-shaped, or sigmoid, price-response curve.

There is a related curve called the willingness-to-pay distribution. Its formula is:

w(p) = K * e ^ −(a+b*p) / ( 1 + e ^ −(a+b*p) )^2

Logit willingness-to-pay follows a curve known as the logistic distribution. The logistic distribution is similar to the normal distribution, except it has somewhat "fatter tails"--that is, it approaches zero more slowly at very high and very low values. The highest point (mode) of the logistic willingness-to-pay distribution occurs at −(a/b), which is also the point at which the slope of the price-response function is steepest.
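A minimal Python sketch of the two formulas above. The function names and the parameter values (a = −3, b = 0.5, C = 1000, K = 1) are illustrative assumptions, not values from the question. At p = −(a/b) the exponent a + b*p is zero, so d(p) = C/2 and w(p) = K/4, which is the peak of w.

```python
import math

def demand(p, a, b, C):
    """Logit price-response: d(p) = C * e^-(a+b*p) / (1 + e^-(a+b*p))."""
    z = math.exp(-(a + b * p))
    return C * z / (1.0 + z)

def willingness_to_pay(p, a, b, K):
    """Willingness-to-pay: w(p) = K * e^-(a+b*p) / (1 + e^-(a+b*p))^2."""
    z = math.exp(-(a + b * p))
    return K * z / (1.0 + z) ** 2

# Hypothetical parameters, chosen only to illustrate the shape of the curves.
a, b, C, K = -3.0, 0.5, 1000.0, 1.0
p_mode = -a / b  # price where w peaks and d is steepest

print(p_mode, demand(p_mode, a, b, C), willingness_to_pay(p_mode, a, b, K))
```

Up to sign and a constant factor, w(p) is the slope of d(p), which is why both have their extreme at the same price.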

The Problem: You are provided with the response data for samples of the population at N price points, where N >= 2 and N <= 50. This response data tells you how many people purchased a product at that price point out of how many people considered purchasing the product at that price point. We will not assume that the only reason these people did not purchase is that the price was above their maximum willingness-to-pay; however, we may assume that those who purchased did so at or below their maximum willingness-to-pay. Thus, part of the problem may be (or this might be irrelevant) to determine what percentage of those who consider a product are actually "in the market" and what percentage are just browsers and, for whatever reason, wouldn't buy.

Example data would be (with N as low as 3): Price point $2: 120 of 840 purchased. Price point $5: 110 of 1000 purchased. Price point $10: 30 of 600 purchased.

Your answer should contain an algorithm for taking data in the format of the example data (with N price points where N >= 2 and N <= 50) and providing a best-approximation of the price-response function and willingness-to-pay distribution. You may write with pseudocode or with a specific programming language (Java or Haskell for example) and comments. Comments for clarity are important. Your input is an array with data like the example (price point $, resultant observed purchase response relative to those with an opportunity to purchase [x out of y people purchased]). Your output is at a minimum the values a and b. You may also provide C or K. For an outstanding answer, please also return a price P where the value of P*d(P) is at its maximum. —The preceding unsigned comment was added by Peter Kirby (talk • contribs) 04:33, 10 November 2006 (UTC).


 * Do your own homework. If you need help with a specific part or concept of your homework, feel free to ask, but please do not post entire homework questions and expect us to give you the answers. —David Eppstein 04:43, 10 November 2006 (UTC)


 * Peter, don't delete it when people tell you to do your own homework. Also, if you've done work on this problem, you need to show us what you've done so far. StuRat 05:09, 10 November 2006 (UTC)


 * It's not homework. I started from scratch on this. I found these formulas in microeconomics, and I've framed the problem. I will post again when I've found out more.
 * Does anyone have any comments on using a nonlinear least-squares fitting algorithm for this? (See the one in the GNU Scientific Library for example.) --Peter Kirby 05:22, 10 November 2006 (UTC)


 * If it's not homework, why did you dress it up in the form of a homework question? It seems like a curve fitting problem, but which is the curve that should be fitted to the data? As a mathematician, I don't understand the meaning of people being "in the market" in connection to the problem. When "120 of 840 purchased", could it be that 120 were "in the market" and 720 just browsing? As phrased, a major impediment is the difficulty of extracting the mathematical content. If you want us to think about this, please reframe it as a maths problem. --Lambiam Talk  07:01, 10 November 2006 (UTC)


 * Thank you for your feedback. It is so-dressed because it is cut-and-paste from my post of it to Google's Answers. I could have saved myself some trouble if I did more work up-front on dressing it up for Wikipedia's Reference Desk, which I now understand to be not quite the same forum in its format as Google Answers.
 * Okay, so, I did some pencil and paper work with the example data that I fabricated. I have often found that doing pencil and paper work on some particular example of the problem helps in arriving at a general solution. More to follow. --Peter Kirby 07:14, 10 November 2006 (UTC)


 * The curve to be fitted to the data is the price-response function d(p) = C * e^−(a+b*p) / ( 1 + e^−(a+b*p) ). d(p) is typically the total number of units that will sell at a given price p. However, I think it may be useful to use this formula with d(p) being constrained between 0 and 1, where d(p) is the percentage of potential consumers who actually buy at the price point p.
 * In the data that I fabricated, d(2) = 0.143, d(5) = 0.11, and d(10) = 0.05. The general formula is again d(p) = C * e^−(a+b*p) / ( 1 + e^−(a+b*p) ).
 * I have only some calculus and not a lot of statistics, so I decided to tackle this by choosing an arbitrary value of C and fitting a and b so that the curve goes through the first two points, d(2) and d(5). I chose C=0.16. As I understand it, C is basically the value of those who are not "just browsing" but who would be prompted to convert into buyers as the price is lowered. In this case it is 16% of the consumers being presented with the offer, on my arbitrary choice of C. Using some simple algebra, I find that a = −3.03 and b = 0.45. Plugging this in for the third point, d(10), the curve gives 0.03. This is not the same number as the data point, 0.05.
 * So I guess my problem is finding a best-fit curve when there are more than three points to consider. Please help me to clarify the problem any further by posing any other questions you may have about what I am asking. And thank you again for your feedback. --Peter Kirby 07:26, 10 November 2006 (UTC)
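The pencil-and-paper step described above (fix C, then force the curve through two points) can be sketched in Python. The function names are hypothetical. The sketch uses the equivalent form d = C / (1 + e^(a+b*p)), which gives a + b*p = ln((C − d)/d), a linear equation in a and b for each data point.

```python
import math

def fit_two_points(p1, d1, p2, d2, C):
    """Solve a + b*p = ln((C - d) / d) exactly at two points, for a fixed C."""
    y1 = math.log((C - d1) / d1)
    y2 = math.log((C - d2) / d2)
    b = (y2 - y1) / (p2 - p1)
    a = y1 - b * p1
    return a, b

def demand(p, a, b, C):
    """Same curve as d(p) above, written as C / (1 + e^(a+b*p))."""
    return C / (1.0 + math.exp(a + b * p))

# C = 0.16 is the arbitrary choice made in the post above.
a, b = fit_two_points(2, 0.143, 5, 0.11, C=0.16)
d10 = demand(10, a, b, 0.16)
print(round(a, 2), round(b, 2), round(d10, 2))
```

This should reproduce a ≈ −3.0, b ≈ 0.45 and a prediction of about 0.03 at p = 10, against the observed 0.05.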


 * So, for a pure maths problem, we have: Find the values of a, b, C of the function d(p) = C * e^−(a+b*p) / ( 1 + e^−(a+b*p) ) that best fit the two or more points of d(p) provided. And I have to find a general method for this type of problem. What type of general method would you suggest? A friend said to look at "a nonlinear least-squares fitting algorithm," but I don't know if this is the best, and I don't completely understand it anyway from what I've read online. However, if it is the best approach, I'd put more effort into understanding it. What do you think is the best approach for curve fitting in this kind of problem? --Peter Kirby 07:49, 10 November 2006 (UTC)


 * I don't have much time now, but just a few remarks. Fitting data to the logistic curve is notoriously hard; the parameters are very sensitive to small changes in the data in some ranges (which is a backwards way of saying that the function values may change very little for big changes in the parameters, at least in some regions). Practically, I'd try to first use heuristic methods to find reasonable estimates for the parameters, and use a generalized Newton-Raphson method from there (where you use the Jacobian, but with safety fences against the method derailing). Depending on the shape of the data plot, the heuristic might be to discern a ceiling if possible (curve flattening out to the left) giving an estimate for C, or an inflection point (essentially giving estimates for all parameters), or a combination. Alternatively, use three data points spread out as much as possible and solve for C, a and b to obtain an estimate. As to the latter, putting A = e^a, the equation d = C * e^−(a+b*p) / (1 + e^−(a+b*p)) can be transformed to d * A = e^−(b*p) * (C−d). From three such equations you can eliminate A and C, leaving one equation in one unknown b. Solve that numerically by any of several methods, and solve two out of the three transformed equations for A and C using the value of b found. --Lambiam Talk  09:41, 10 November 2006 (UTC)
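The three-point elimination described above can be sketched in Python. The function name, the bisection bracket [0.01, 2] and the tolerance are assumptions made for this example. Writing E_i = e^(b*p_i), each transformed equation gives C = d_i * (1 + A*E_i); equating these pairwise and eliminating A leaves one equation g(b) = 0. Note that b = 0 always solves g trivially, so the bracket must exclude it.

```python
import math

def fit_three_points(points, b_lo=0.01, b_hi=2.0, tol=1e-12):
    """Fit a, b, C exactly through three (price, fraction) points.

    Uses d*A = e^(-b*p) * (C - d) with A = e^a, i.e. C = d * (1 + A*e^(b*p)).
    The bracket [b_lo, b_hi] is an assumption and must exclude the root b = 0.
    """
    (p1, d1), (p2, d2), (p3, d3) = points

    def g(b):
        e1, e2, e3 = (math.exp(b * p) for p in (p1, p2, p3))
        # A from points 1,2 must equal A from points 2,3 (cross-multiplied).
        return (d2 - d1) * (d2 * e2 - d3 * e3) - (d3 - d2) * (d1 * e1 - d2 * e2)

    lo, hi = b_lo, b_hi
    if g(lo) * g(hi) > 0:
        raise ValueError("bracket does not contain a sign change")
    while hi - lo > tol:  # plain bisection on g
        mid = 0.5 * (lo + hi)
        if g(lo) * g(mid) <= 0:
            hi = mid
        else:
            lo = mid
    b = 0.5 * (lo + hi)
    e1, e2 = math.exp(b * p1), math.exp(b * p2)
    A = (d2 - d1) / (d1 * e1 - d2 * e2)
    C = d1 * (1.0 + A * e1)
    return math.log(A), b, C

a, b, C = fit_three_points([(2, 0.143), (5, 0.11), (10, 0.05)])
print(a, b, C)
```

Bisection stands in for "any of several methods"; with more than three data points this only gives a starting estimate for a proper fit.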


 * The first question to be addressed is "what do you mean by best fit?" Is summing the squared distances from the fitted model to the data points going to give you a good fit? If so, then a least-squares approach is fine; if not, you may need to consider weighted least squares or other forms of robust statistics. You may find you need a more sophisticated model which takes into account the variance of the data for different values of p.
 * Once you have your measure of fit, E(a,b,C) say, it's a matter of finding an algorithm to minimise this function. A few plots of the curve with different values of the parameters will help in gauging how difficult a problem this is going to be. If things behave nicely then a gradient descent approach will suffice; if not, other curve-fitting approaches may be necessary. --Salix alba (talk) 10:24, 10 November 2006 (UTC)

Would an article on logistic regression help? (Igny 18:39, 10 November 2006 (UTC))


 * Applying the method from that article results in the equation logit(1 − d/C) = a+b*p. So if a good estimate for C is known, this can be used to estimate a and b, with linear regression (and proper weighting). Letting a(C) and b(C) denote the values of a and b minimizing the measure of fit E(a,b,C) for this model, f(C) = E(a(C),b(C),C) depends only on C. My expectation is that f(C) is nicely continuous and assumes a unique minimum for C between 0 and infinity, which should make it possible to find the best value for C by walking in one direction with large strides (say deltaC = 1) until downwards turns into upwards, then reversing the direction and proceeding with a stride that is one fourth of the previous stride (deltaC := − deltaC/4) until downwards turns into upwards, and so on, until the stride size is small enough, say 10^−6. --Lambiam Talk  19:44, 10 November 2006 (UTC)
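A minimal Python sketch of the inner step described above: for a given C, estimate a and b by linear regression of logit(1 − d/C) = ln((C − d)/d) on p, then evaluate a sum-of-squares measure of fit. The function names are hypothetical, the regression is unweighted (the "proper weighting" is left out for brevity), and the outer stride search over C is not shown.

```python
import math

def fit_ab_given_C(prices, fractions, C):
    """Ordinary least squares of y = ln((C - d) / d) on p; requires C > max(d)."""
    ys = [math.log((C - d) / d) for d in fractions]
    n = len(prices)
    mean_p = sum(prices) / n
    mean_y = sum(ys) / n
    sxx = sum((p - mean_p) ** 2 for p in prices)
    sxy = sum((p - mean_p) * (y - mean_y) for p, y in zip(prices, ys))
    b = sxy / sxx
    a = mean_y - b * mean_p
    return a, b

def fit_error(prices, fractions, a, b, C):
    """Sum of squared differences between the model curve and the data."""
    return sum((C / (1.0 + math.exp(a + b * p)) - d) ** 2
               for p, d in zip(prices, fractions))

def f(prices, fractions, C):
    """Measure of fit as a function of C alone, with a(C) and b(C) plugged in."""
    a, b = fit_ab_given_C(prices, fractions, C)
    return fit_error(prices, fractions, a, b, C)

prices = [2, 5, 10]
fractions = [0.143, 0.11, 0.05]
for C in (0.16, 0.187511, 0.25):
    print(C, f(prices, fractions, C))
```

A one-dimensional search then only has to minimise f over C, subject to C being larger than every observed fraction.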


 * Lambiam, there is no need to reinvent the wheel. That is a modification of a classic homework exercise on logistic regression. There are just two parameters in the 1-dimensional logistic distribution, $$p(x)=0.5b\,\operatorname{sech}^2(a+bx)$$ (remember, p(x) is a density). a and b are estimated by using the data $$p(x_1),\ldots,p(x_N)$$ (p = ratio of people who bought the bricks at price x) by the regression $$\ln(p/(1-p))=a+bx$$. (Igny 21:35, 10 November 2006 (UTC))


 * Well, as far as I can see the problem posed by Peter Kirby does not immediately fit this classic exercise, if only because it has three parameters and this "density" we should remember is not given as such. So what I did is (attempt to) show how the given problem may be massaged into a form where logistic regression can be applied. For the given example with 3 data points, I find a = −1.711793, b = 0.272348, and C = 0.187511. I don't see how you would go about to find these values in a substantially easier way than the one I described. --Lambiam Talk  23:32, 10 November 2006 (UTC)
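The original question also asked for the price P that maximises P*d(P). A minimal Python sketch, plugging in the fitted values quoted above: the function names, the search interval [0, 100] and the iteration count are assumptions for illustration. Ternary search is used because p*d(p) rises to a single peak and then falls for p > 0 when b > 0.

```python
import math

def revenue(p, a, b, C):
    """p * d(p), with d(p) written as C / (1 + e^(a+b*p))."""
    return p * C / (1.0 + math.exp(a + b * p))

def best_price(a, b, C, lo=0.0, hi=100.0, iters=200):
    """Ternary search for the maximiser of revenue on [lo, hi].

    Assumes the peak lies inside the interval; [0, 100] is an arbitrary choice.
    """
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if revenue(m1, a, b, C) < revenue(m2, a, b, C):
            lo = m1  # the peak is to the right of m1
        else:
            hi = m2  # the peak is to the left of m2
    return 0.5 * (lo + hi)

P = best_price(-1.711793, 0.272348, 0.187511)
print(P)
```

For these parameters the maximiser comes out at roughly 6.83. Since d is a fraction of those who considered the product, P*d(P) is here revenue per person considering it.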

== Distribution of minimum of two random variables ==
A statistics question: say X and Y are independently distributed with an exponential distribution, with parameters λ1 and λ2 respectively (as in, means of 1/λ1 and 1/λ2 respectively). Define Z = Min (X, Y). Then Z is distributed exponentially with parameter (λ1 + λ2). So the expected value of Z, E[Z] = 1/(λ1 + λ2). What I can't get my head around is that E[Z|X<Y] = E[Z|X>Y] = E[Z]. I know this is true, having shown it by integrating and all that. But I can't understand the logic.

What I am thinking is that E[Z|X<Y] is effectively E[X|X<Y], and E[Z|X>Y] is effectively E[Y|X>Y]. So why are they the same? Clearly there's something wrong in my logic. Can anyone tell me where I've gone wrong? Thanks! --Sumple (Talk) 11:56, 10 November 2006 (UTC)


 * What makes you think you went wrong? More precisely, out of the expressions E[Z], E[Z|X<Y], E[Z|X>Y], E[X|X<Y], E[Y|X>Y], for which pair(s) is it strongly counterintuitive to you that they are equal?  --Lambiam Talk  13:15, 10 November 2006 (UTC)


 * E[Z|X<Y] and E[Z|X>Y]. In particular, why does conditioning on X>Y or X<Y make no difference? --Sumple (Talk) 22:46, 10 November 2006 (UTC)


 * As you yourself explained, these are obviously the same as E[X|X<Y] and E[Y|Y<X]. Suppose λ1 >> λ2. Then almost always X<Y, so E[X|X<Y] will be close to E[X], and, although smaller than E[X], not by much. On the other hand, it will rarely be the case that Y<X, and when that happens, Y is really much smaller than on average; it is even smaller than X! (X could be exceptionally large, but that is much, much less likely than Y being exceptionally small.) So the conditional expected value of Y will be somewhat less than E[X]. --Lambiam Talk  00:02, 11 November 2006 (UTC)
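The equality can also be checked by a short direct computation, using only the independence of X and Y and their exponential distributions. The event {Z > t, X < Y} means that X exceeds t and Y exceeds X:

```latex
\begin{align*}
P(Z > t,\ X < Y) &= \int_t^\infty \lambda_1 e^{-\lambda_1 x}\, e^{-\lambda_2 x}\, dx
                  = \frac{\lambda_1}{\lambda_1 + \lambda_2}\, e^{-(\lambda_1 + \lambda_2) t}, \\
P(X < Y) &= \frac{\lambda_1}{\lambda_1 + \lambda_2}, \\
P(Z > t \mid X < Y) &= e^{-(\lambda_1 + \lambda_2) t}.
\end{align*}
```

So, conditional on X < Y, Z is still exponential with parameter λ1 + λ2. Swapping the roles of X and Y gives the same result conditional on Y < X, and hence both conditional expectations equal 1/(λ1 + λ2).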


 * Yep that makes sense. Thanks Lambiam! I can sleep easy tonight. =D --Sumple (Talk) 10:44, 11 November 2006 (UTC)


 * This seems a very counterintuitive property indeed. Thanks for mentioning it.  – b_jonas 14:14, 11 November 2006 (UTC)


 * Lambiam has already given a perfectly good explanation, but let me try a different informal argument. I've always found it easiest to think of exponential distributions in terms of the waiting times of Poisson processes.  To refresh your (and other readers') memory, in a Poisson process with frequency λ, the waiting time from any given point in time until the next event is exponentially distributed, Δt ~ Exp(λ), with the expected waiting time being E[Δt] = 1/λ.  Now consider two independent Poisson processes with frequencies λA and λB.  Being independent, their combination is also a Poisson process, with events occurring at the combined frequency λA + λB and thus with waiting times Δt ~ Exp(λA + λB).  Since Poisson processes are memoryless, all you can tell about the combined process at any given time is that you expect to see an event after E[Δt] = 1 / (λA + λB), with the event being A with probability λA / (λA + λB) and B with probability λB / (λA + λB).  But because of the memorylessness, the same is true regardless of how long you've already been waiting.  Thus, you can't make any predictions about the arrival time or the type of the next event based on how long you've been waiting.  But this implies that the conditional waiting time (Δt | next event will be A) must equal (Δt | next event will be B), otherwise you could use the time you've been waiting so far to make additional predictions about the type of the next event beyond what you could when you started waiting.


 * For those who prefer a more concrete analogy, let me offer the "irregular bus service" — a bus line that is so badly out of schedule that the buses approximate a Poisson process, arriving at the bus stop completely at random, heedless of any timetables and independently of one another. Now assume there are two such bus lines, with line A buses passing the bus stop on average 5 times in an hour, and line B buses on average only once.  Now, since the buses arrive at random, you'll never know for sure when the next bus will come or which kind it'll be — but you can expect that you'll have to wait about 10 minutes on average, and that the first bus to arrive will be from line A about 5 times out of 6.  Now, since the buses arrive independently of one another, even when you spot a bus in the distance you still won't be able to say anything more about it than that it will be from line A about 5 times out of 6.  But this implies that, when the bus does arrive, the average time you had to wait for it must be the same regardless of which kind it turns out to be — since otherwise you could've predicted its type based on how long you had to wait for it.  —Ilmari Karonen (talk) 23:16, 11 November 2006 (UTC)
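The bus example can be checked with a small Monte Carlo simulation. This is a sketch only: the function name, sample size and seed are arbitrary choices, and the rates are 5 and 1 buses per hour as in the example. It draws independent exponential waiting times for each line, keeps the smaller one, and averages separately by which line came first.

```python
import random

def simulate(rate_a, rate_b, n=200_000, seed=0):
    """Return (mean wait when A comes first, mean wait when B comes first, share of A)."""
    rng = random.Random(seed)
    waits_a, waits_b = [], []
    for _ in range(n):
        x = rng.expovariate(rate_a)  # time until the next line A bus
        y = rng.expovariate(rate_b)  # time until the next line B bus
        if x < y:
            waits_a.append(x)
        else:
            waits_b.append(y)
    mean_a = sum(waits_a) / len(waits_a)
    mean_b = sum(waits_b) / len(waits_b)
    return mean_a, mean_b, len(waits_a) / n

mean_a, mean_b, share_a = simulate(5.0, 1.0)
print(mean_a * 60, mean_b * 60, share_a)  # waits converted from hours to minutes
```

Both averages should land near 10 minutes, and line A should come first in roughly 5 cases out of 6, up to sampling noise.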


 * Thanks for this explanation. – b_jonas 00:36, 12 November 2006 (UTC)


 * Thanks for that explanation. And wow, that bus line sounds just like my train line :D --Sumple (Talk) 06:56, 12 November 2006 (UTC)

== Hyperbolic functions - construction ==
With the (circular) trigonometric functions, one starts with the unit circle, draws a line passing through the origin at an angle θ with the x axis, then the intersection of that line with the circle gives the cosine of θ in x and the sine of θ in y, and the tangent of θ can also be found easily. But with hyperbolic functions, one starts with a rectangular hyperbola. I plotted that hyperbola and also plotted the points (cosh(θ), sinh(θ)) and (1,tanh(θ)), and the line passing through the origin and (1,tanh(θ)). Obviously the equation of that line is y = tanh(θ)·x, so it makes an angle arctan(tanh(θ)) with the x axis. But is there a curve that passes through the origin and the point (cosh(θ),sinh(θ)) which has a tangent at the origin that forms an angle θ with the x axis (not really any function, more like a rectangular hyperbola, an arc of circle or an exponential, that is increasing on that interval)? And, also, how would one find the point (cosh(θ),sinh(θ)) with only the hyperbola (and the axes) drawn, on a graph (without any calculations, though I don't mind tough drawings (for example having to draw a certain hyperbola or exponential to get the intersection))? --Xedi 19:06, 10 November 2006 (UTC)


 * I'm not quite sure why you'd want to do that. If you take the curve parametrically represented by x(t) = ((1−t)cos(θ) + t^2 cosh(θ)) / (1−t+t^2), y(t) = ((1−t)sin(θ) + t^2 sinh(θ)) / (1−t+t^2), at t = 0 it passes through the origin (0, 0), and at t = 1 through the point (cosh(θ),sinh(θ)), when it meets the hyperbola x^2 − y^2 = 1. Near the origin y(t)/x(t) = tan(θ) + O(t^2), so indeed the tangent exists and forms an angle θ with the x axis (also when cos(θ) = 0, although formally not covered by this quick analysis). So it meets your criteria, but I can't imagine that this is what you were looking for. As the magnitude of θ increases, the curve becomes a spiral, winding more and more often around the origin before reaching the point (cosh(θ),sinh(θ)). A curiosity is that as t tends to infinity, the curve returns to that same point in the limit. --Lambiam Talk  20:18, 10 November 2006 (UTC)


 * Well actually what I really wanted was how to get the point (cosh(θ),sinh(θ)) by drawing a curve that had a tangent at x=0 y=0 that forms an angle of θ with the x-axis; a sort of way to construct the point (cosh(θ),sinh(θ)) with the angle θ clearly visible. But then I don't even know what the real way to construct (cosh(θ),sinh(θ)) is with the hyperbola, graphically, and maybe θ does appear clearly ; anyway I would've liked a construction method.
 * And I don't see why the curve you gave the equation of passes through (0,0) at t=0: I get x=cos(θ) and y=sin(θ) at t=0. Anyway, never mind that curve, I suppose you just made a little mistake, but how did you come up with that curve?
 * Thanks. --Xedi 20:52, 10 November 2006 (UTC)
 * You're right, I made a mistake. Unless I made a mistake again, this is what I should have written: x(t) = t(1−t^2)cos(θ) + t^2 cosh(θ), y(t) = t(1−t^2)sin(θ) + t^2 sinh(θ). --Lambiam Talk  22:19, 10 November 2006 (UTC)
 * I like it. Thanks. What method did you use to get this result ? --Xedi 23:17, 10 November 2006 (UTC)
 * Using vectors, a parametric representation of the line through points A and B is (1–t)A + tB. A simple line from the origin O = (0,0) to Z = (cosh(θ),sinh(θ)) is therefore (1–t)O + tZ = tZ. However, this goes straight from O to Z, instead of departing under an angle θ. Taking t as a time parameter, and thinking of a bug crawling from O at t = 0 to Z at t = 1, for small t we want it to move like B(t) = (t cos(θ),t sin(θ)). The next refinement is then (1–t)B(t) + tZ. This does not work; the tug towards Z is too abrupt. We can generalize the formula to (1–λ)B(t) + λZ, where λ is some function of t with λ(0) = 0 and λ(1) = 1. To get the angle right, we need that, for small t, λ << t. Putting λ = t^2 does the trick. --Lambiam Talk  15:19, 12 November 2006 (UTC)
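The construction just described, (1 − λ)B(t) + λZ with λ = t^2, can be written as a short Python sketch. The function name and the test value θ = 0.7 are arbitrary choices for illustration. It checks the three properties asked for: the curve starts at the origin, ends at (cosh(θ), sinh(θ)), and leaves the origin at angle θ.

```python
import math

def curve(t, theta):
    """Point (1 - lam) * B(t) + lam * Z with lam = t^2.

    B(t) = (t*cos(theta), t*sin(theta)) gives the initial direction,
    Z = (cosh(theta), sinh(theta)) is the target point on the hyperbola.
    """
    lam = t * t
    bx, by = t * math.cos(theta), t * math.sin(theta)
    zx, zy = math.cosh(theta), math.sinh(theta)
    return (1 - lam) * bx + lam * zx, (1 - lam) * by + lam * zy

theta = 0.7
start = curve(0.0, theta)
end = curve(1.0, theta)
x_small, y_small = curve(1e-6, theta)
departure_angle = math.atan2(y_small, x_small)
print(start, end, departure_angle)
```

Expanding the expression gives x(t) = t(1 − t^2)cos(θ) + t^2 cosh(θ) and the matching y(t), which is the corrected formula from the earlier reply.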


 * (after edit conflict)
 * Hi, Xedi, here is the explanation. First, pay special attention to the names of the inverse functions.
 * Arc-functions are inverse to trigonometric functions; this reveals that the basis of the definition of the trigonometric functions is not an angle, but rather an arc length. See my image in the Polish wiki pl:image:Miara lukowa.png (used in the pl:Miara kąta  'Angle measure'  article). Additional pictures pl:image:Funtryggeomcossec.png, pl:image:Funtryggeomsincsec.png and pl:image:Funtryggeomtgctg.png in pl:Funkcje trygonometryczne  'Trigonometric functions'.
 * On the other hand, the inverse hyperbolic functions have the Area prefix. That tells us the hyperbolic functions are defined by means of an area, not an arc (or angle). I made some pictures on this subject some time ago, but did not find time to finish them and put them on the wiki. Here is one of them; I hope it explains everything you need.
 * [[image:Funhipgeom.bad.png|center]]
 * --CiaPan 20:57, 10 November 2006 (UTC)


 * Well, thank you very much. Just one question remains: what's the standard construction method? Obviously, drawing a line to enclose a particular area between that line and: 0 (between 0 and 1); the hyperbola (between 1 and whatever) isn't as easy as arc length (especially as, in the case of a circle, arc length is so easy to get from the angle).
 * --Xedi 21:41, 10 November 2006 (UTC)


 * (A small correction to the explanation by CiaPan: the magenta cyan area should be a/2. For the trigonometric functions, the circle segment also has area a/2. Although not an angle for the hyperbolic case, this parameter is sometimes called the hyperbolic angle.) I'm afraid there is no "standard construction method". If you can draw the curve y = e^x – you said that you don't mind having to draw a certain exponential – then by intersecting with the line x = θ you have e^θ and thereby cosh(θ) and sinh(θ). But if you can draw y = e^x, why wouldn't you be able to draw y = cosh(x) directly? --Lambiam Talk  22:19, 10 November 2006 (UTC)
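The a/2 area can be checked numerically. In this Python sketch (the function name and the step count are assumptions), the sector is taken to be the region bounded by the x axis, the ray from the origin to (cosh(a), sinh(a)), and the hyperbola x^2 − y^2 = 1, for a > 0. Its area is the triangle under the ray minus the area under the hyperbola between x = 1 and x = cosh(a), integrated here with the midpoint rule.

```python
import math

def hyperbolic_sector_area(a, n=200_000):
    """Area between the x axis, the ray to (cosh a, sinh a) and x^2 - y^2 = 1, for a > 0."""
    x_end = math.cosh(a)
    h = (x_end - 1.0) / n
    # Midpoint rule for the integral of sqrt(x^2 - 1) from 1 to cosh(a).
    under_hyperbola = h * sum(
        math.sqrt((1.0 + (i + 0.5) * h) ** 2 - 1.0) for i in range(n)
    )
    triangle = 0.5 * math.cosh(a) * math.sinh(a)
    return triangle - under_hyperbola

for a in (0.5, 1.0, 2.0):
    print(a, hyperbolic_sector_area(a), a / 2)
```

The two printed columns should agree to several decimal places, matching the closed form: the integral equals (cosh(a)·sinh(a) − a)/2, so the difference is exactly a/2.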


 * Yes, I had a look at the hyperbolic angle article, but it didn't really prove useful. For the "exponential" part, I admit it would be silly to want to find the point (cosh(θ),sinh(θ)) by drawing an exponential as it would be far quicker to plot cosh(x) directly; I was more wanting to know if there was a particular curve that would "take the place" of the line passing through the origin at an angle θ with the x axis in the case of the trigonometric functions (as well as wanting to know a relatively "simple" way of constructing the point (cosh(θ),sinh(θ)) given the hyperbola). Anyway, thank you very much.
 * (By the way, I just did an animation about this, I hope it can be useful (maybe it lacks some information on the image itself like what the equations of the curves are, obviously you are free to add that information if necessary): http://en.wikipedia.org/wiki/Image:HyperbolicAnimation.gif (1.1 MB))
 * --Xedi 22:38, 10 November 2006 (UTC)


 * Oh and it's cyan and not magenta ! :) --Xedi 22:50, 10 November 2006 (UTC)
 * Yes, apparently I'm CMY colour blind :) --Lambiam Talk  23:36, 10 November 2006 (UTC)


 * Of course, Lambiam is right. Here is the corrected image:
 * [[image:Hyperbolic functions.svg|center]]
 * CiaPan 20:03, 13 November 2006 (UTC)


 * It's also possible to construct a hyperbola without using any hyperbolic functions... the locus of points where the difference in the distance to two fixed points is constant is a hyperbola. Once you've constructed your hyperbola, you can then read off θ via the methods above. Tompw 12:03, 14 November 2006 (UTC)