Wikipedia:Reference desk/Archives/Mathematics/2008 June 27

= June 27 =

Optimization
Hello. A factory wants to make the cheapest 250 cm³ right cone. Plastic at 0.03¢/cm² must cover the base. Waffle at 0.05¢/cm² must cover the lateral area. I read the article titled optimization (mathematics). I recently learned linear programming. A system of equations would have several points of intersection forming a polygon. These vertices are the candidates for the optimal value, and one of them gives the minimum. Solving the volume formula for the height gives $$h = \frac{750}{\pi r^2}$$ where r is the radius in cm (r > 0) and h is the height in cm perpendicular to the base (h > 0). The limits are that r and h approach but never equal 0. The graph of the volume equation on a Cartesian coordinate plane lies within the first quadrant, and the r- and h-axes never intersect it. Do I need at least three equations and/or inequalities, other than the cost equation and the two inequalities mentioned above, intersecting at three distinct points, so that I can substitute the r- and h-values of those vertices into the cost equation $$C = \pi r(0.03s + 0.05r)$$, where C is the cost of the cone in cents and s is the slant height in cm? By trial and error, the radius is about 3.93 cm and the cost 8.333 cents. Thanks in advance. --Mayfare (talk) 01:08, 27 June 2008 (UTC)


 * This is not a linear programming problem because the volume of a cone is a nonlinear function of its radius, so the constraint that the cone have a volume of 250 cm3 is nonlinear. Here's how I would approach it: The family of all cones is parameterized by two real numbers (for example, radius and height). But the constraint that the volume be 250 cm3 removes one degree of freedom, so you can parameterize the family of "all cones with volume 250 cm3" by a single real number. Now simply express the cost in terms of this one parameter, and optimize it with standard single-variable calculus. Good luck! —Keenan Pepper 01:59, 27 June 2008 (UTC)


 * Note, however, that you can take the base area as a variable in place of the base radius. Then the base cost becomes linear in that variable. --CiaPan (talk) 05:20, 27 June 2008 (UTC)


 * I was thinking the same thing (volume as bh/3) until I realized that cost is also a function of the lateral area, which is still a function of the radius. --Prestidigitator (talk) 18:49, 27 June 2008 (UTC)


 * Oops. You're right. --CiaPan (talk) 05:24, 30 June 2008 (UTC)
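A minimal sketch of the single-variable approach suggested above, assuming the prices as stated in the question (0.03¢/cm² for the base, 0.05¢/cm² for the lateral surface). With those prices the cost is $$C = 0.03\pi r^2 + 0.05\pi r s = \pi r(0.03r + 0.05s)$$, which pairs the coefficients the other way round from the cost equation in the question. The 3.93 cm / 8.333¢ trial-and-error figures match the swapped pairing, so the sketch evaluates both. Golden-section search stands in for the calculus step, and the names `cost`, `height` and `golden_section_min` are illustrative only. By my calculation, the stated prices give roughly r ≈ 4.76 cm and C ≈ 10.78¢.

```python
import math

VOLUME = 250.0        # cm^3, required volume of the cone
BASE_PRICE = 0.03     # cents per cm^2, plastic covering the base
LATERAL_PRICE = 0.05  # cents per cm^2, waffle covering the lateral surface

def height(r):
    """Height forced by the volume constraint V = (1/3) * pi * r^2 * h."""
    return 3.0 * VOLUME / (math.pi * r * r)

def cost(r, base_price=BASE_PRICE, lateral_price=LATERAL_PRICE):
    """Cost in cents of the 250 cm^3 cone with base radius r (cm)."""
    s = math.hypot(r, height(r))          # slant height
    base_area = math.pi * r * r
    lateral_area = math.pi * r * s
    return base_price * base_area + lateral_price * lateral_area

def golden_section_min(f, lo, hi, tol=1e-9):
    """Minimise a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:                        # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:                              # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    x = (a + b) / 2.0
    return x, f(x)

if __name__ == "__main__":
    r, c = golden_section_min(cost, 0.5, 20.0)
    print(f"stated prices:  r = {r:.3f} cm, h = {height(r):.3f} cm, cost = {c:.3f} cents")
    r, c = golden_section_min(lambda x: cost(x, LATERAL_PRICE, BASE_PRICE), 0.5, 20.0)
    print(f"swapped prices: r = {r:.3f} cm, cost = {c:.3f} cents")
```

The cost blows up as r approaches 0 (the cone gets very tall) and as r grows (the base dominates), so a bracket such as [0.5, 20] cm contains the single minimum.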

Hektar (ha) conversions?
The newly created Tiergarten Nürnberg article uses what I'm guessing is a German method of measurement, Hektar (ha). I have no idea how to convert this to square miles/meters, me being an American and an English major (i.e. horrible at math) to boot, so could someone enlighten me? María ( habla con migo ) 13:36, 27 June 2008 (UTC)
 * I'm guessing it's a German spelling of Hectare. shoy 13:56, 27 June 2008 (UTC)
 * Great, good to know. Is there a wiki conversion template that automatically converts ha to mi/km within the text?  María ( habla  con migo ) 14:11, 27 June 2008 (UTC)
 * You can use Google to help with the conversion. AndrewWTaylor (talk) 14:49, 27 June 2008 (UTC)
 * There is a wiki conversion template, called Template:Convert. For hectares to square miles, you could use {{convert|x|ha|sqmi}}, where x is the number of hectares. The template is very versatile; you can convert between many different units with it.  J kasd  16:29, 27 June 2008 (UTC)
 * Thank you so much, that's just what I needed. :) María ( habla con migo ) 18:48, 27 June 2008 (UTC)
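For doing the arithmetic by hand, the conversions follow from exact definitions: 1 ha = 10,000 m², the international mile is exactly 1609.344 m, and the international acre is exactly 4046.8564224 m². A small sketch (the function names are illustrative):

```python
SQ_M_PER_HECTARE = 10_000.0        # 1 ha = 100 m x 100 m
SQ_M_PER_SQ_MILE = 1609.344 ** 2   # international mile = 1609.344 m exactly
SQ_M_PER_ACRE = 4046.8564224       # international acre, exact

def hectares_to_km2(ha):
    return ha * SQ_M_PER_HECTARE / 1_000_000.0

def hectares_to_sq_miles(ha):
    return ha * SQ_M_PER_HECTARE / SQ_M_PER_SQ_MILE

def hectares_to_acres(ha):
    return ha * SQ_M_PER_HECTARE / SQ_M_PER_ACRE
```

As a rule of thumb, 100 ha is 1 km², and about 259 ha make one square mile.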

Estimating Median in finite space
Is there a method for producing a good estimate of the median in finite space?

For example I have 10,000 numbers, would it be possible to iteratively take these numbers in 1,000 chunks and estimate the median?

These chunks can be sorted and computed, but on the next chunk the previous values are not available, but any amount of running state values can be kept.

For background: these are restrictions of a software platform, which has a storage capacity of 8000 bytes (about 2000 integers) and needs to calculate the median of large data sets of more than 10,000 integers. An additional restriction is that there is only a single pass over the data.

Oh, and distribution characteristics of the data are unknown. Thanks 62.24.129.68 (talk) 15:13, 27 June 2008 (UTC)


 * Some information and ideas can be found at Selection algorithm; it doesn't address your question, though.  Suppose that you want to find the median of n integers, and you can store m integers in memory.  Here are some suggestions:


 * You could keep track of the m-middlemost elements seen so far. However, if a lot of larger numbers appeared all at once, for example, then you begin to lose information: some of the m-middlemost elements seen so far become "unknown". After all n numbers have been read, you either know the value of the median for sure, or you have no idea what the median is. If you know that the input integers are statistically independent (or if you can choose the order in which you read the integers, in which case you would want to read them in random order), then one could explicitly calculate the probability that you successfully find the median. Although I don't know how to calculate that, I would guess that at n = 10000 and m = 2000, the probability of failure would be negligible.


 * Fix any real number k > 1; you could keep track of the km-middlemost elements seen so far, except space them out by k at a time.  For example, if m = 4 and k = 3, then the km-middlemost elements might at some point in time be:
 * 4, unknown, unknown, 4, unknown, unknown, 8, unknown, unknown, 9, unknown, unknown.
 * Then proceed as with the previous method. This allows you to "fake" having a larger memory of km integers, even though you only have a true memory of m elements, at the cost of some inaccuracy in the result: you would only know that the median lies in some range, e.g., above we know that the median lies in the range 4 to 8. Thus, compared to the previous technique, you get a much higher probability of success (since you have a much larger "memory") but with possibly less accuracy in the answer. Note that if your distribution has a nice peak near the median (e.g., is Gaussian), then the range in which you know the median lies would likely be very narrow.


 * If you know the value of n ahead of time, you could set k = n / (2m) and follow the second method suggested above. Then your fake memory of km elements is large enough to guarantee that you will successfully find the (range containing the) median regardless of the order the elements come in.


 * Hopefully some of these ideas prove useful. Eric.  144.32.89.104 (talk) 16:28, 27 June 2008 (UTC)


 * Thanks,
 * Just to restate a bit more clearly: the number of elements is unknown to the function (it could be anywhere between 2 and a million). I was really hoping for a function $$M$$ such that
 * $$\operatorname{median}(S) = M(m_0(S_1), m_0(S_2), m_0(S_3), \dots, m_0(S_n))$$
 * where $$S_1, \dots, S_n$$ are chunks of the data that can arrive in any order.
 * Here is my interpretation, inspired (I think) by your suggestions, that works without knowing $$n$$.
 * We have an array $$S$$ with a maximum size of 1000, and we add numbers to it until it holds 1000 of them.
 * Once we have reached 1000, we shrink the array to 500 by sorting it, then taking 2 elements at a time and calculating their median (for two values, their mean), i.e. $$P(j) = (S(2j) + S(2j+1))/2$$ with 0-based indices. The median of $$P$$ should be approximately equal to that of $$S$$.
 * We can then start adding another 500 numbers.
 * The problem with this “adaptive” version is that the overall shrinking ratio has to be kept uniform. After the first shrink, positions 1…500 really stand for 1,000 elements (1:2), while positions 501…1000 stand for 500 (1:1). To make sure the array always ends up smaller, we will have to shrink 1…500 to 250 (1:4) and 501…1000 to 125 (1:4). The final array then has a uniform ratio of 1:4.
 * I shall attempt this and see what the results are like. Trying to find some test data with a variety of statistical distributions will be the hard part though :) … 78.148.177.85 (talk) 18:25, 27 June 2008 (UTC)
 * One quick question, though: suppose I had the array shrunk to 1:2, and there are 3 elements in the final batch (8, 9, 10). What two values would I put in? (8+9)/2 and ((8+9)+10)/2, maybe? 78.148.177.85 (talk) 18:36, 27 June 2008 (UTC)
 * I don’t know if this addresses what you want to actually do, but if you allow yourself to access the numbers multiple times, you could find the median in I think O(n^2 log n) time, only ever storing m at a time: by finding upper and lower bounds on the median m/2 numbers at a time, a total of n/m times. GromXXVII (talk) 20:11, 27 June 2008 (UTC)
 * Sounds similar to this which I found while searching.
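A rough sketch of the shrinking-array idea from this exchange, under my own assumptions rather than as the poster's exact scheme: instead of forcing a uniform 1:4 ratio, each stored value carries a weight (how many original numbers it stands for), adjacent pairs are merged with a weighted average (which reduces to the plain (a + b)/2 when the weights are equal), and an odd leftover is carried over unmerged, which is one possible answer to the (8, 9, 10) question. `ShrinkingArray` is a hypothetical name, and the result is an approximation with no error guarantee.

```python
class ShrinkingArray:
    """Approximate median via repeated pairwise shrinking (weighted variant)."""

    def __init__(self, capacity=1000):
        self.capacity = capacity
        self.items = []  # (value, weight); weight = how many inputs the value stands for

    def add(self, x, weight=1):
        self.items.append((float(x), weight))
        if len(self.items) >= self.capacity:
            self._shrink()

    def _shrink(self):
        self.items.sort()
        merged = []
        for i in range(0, len(self.items) - 1, 2):
            (a, wa), (b, wb) = self.items[i], self.items[i + 1]
            # With equal weights this is the plain (a + b) / 2 from the thread
            merged.append(((a * wa + b * wb) / (wa + wb), wa + wb))
        if len(self.items) % 2:
            merged.append(self.items[-1])  # odd leftover is kept unmerged
        self.items = merged

    def median(self):
        """Lower weighted median of the stored (value, weight) pairs."""
        if not self.items:
            return None
        ordered = sorted(self.items)
        total = sum(w for _, w in ordered)
        running = 0
        for value, w in ordered:
            running += w
            if 2 * running >= total:
                return value
```

With `capacity=4`, feeding 1 to 6 leaves `[(2.5, 4), (5.5, 2)]` and `median()` returns 2.5 although the true median is 3.5. Mixing heavy and light entries in one shrink can produce exactly this kind of drift.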

These are the full constraints, by the way (this is a real-world problem, not just made up for a quiz or something):
 * 1) You are handed the numbers one at a time
 * 2) You may store as much state information as you like while processing rows. BUT…
 * 3) At any time (in reality about every 4000 rows) you can and will be asked to save all the state information in at most 8000 bytes.
 * 4) You will then be asked to restore from the 8000 bytes, and continue processing rows.
 * 5) Finally you will be asked to provide the result using the 8000 bytes of state.
 * 6) (Additionally, you may be asked to merge your state with another saved state and then provide a result as if you had seen all of the numbers behind both.)
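One option that fits all six constraints is a swapped-in technique, not one proposed in this thread: a multi-level compactor, the building block of streaming quantile sketches such as KLL. Level l holds values that each stand for 2^l inputs. When a level fills up, it is sorted and every other value is promoted to the next level, which is the "shrink by pairs" idea with values kept instead of averaged. Each such compaction shifts any estimated rank by at most 2^l, so the total rank error stays within roughly n/k per level that ever compacts. Assumptions: inputs fit in 32-bit signed integers, and the capacity of 128 per level, the byte layout and the class name `CompactorSketch` are my own choices. By my count, a million inputs need at most about 14 levels of fewer than 128 values each, which comes to roughly 7.1 KB when serialized.

```python
import random
import struct

class CompactorSketch:
    """Mergeable quantile summary: level l stores values that each stand for 2**l inputs."""

    def __init__(self, level_capacity=128, seed=None):
        assert level_capacity % 2 == 0, "capacity must be even"
        self.k = level_capacity
        self.levels = [[]]
        self.rng = random.Random(seed)

    def add(self, x):
        self.levels[0].append(int(x))  # assumed to fit in a signed 32-bit integer
        self._compress()

    def merge(self, other):
        """Fold another sketch in, as if this sketch had seen both streams."""
        while len(self.levels) < len(other.levels):
            self.levels.append([])
        for level, buf in enumerate(other.levels):
            self.levels[level].extend(buf)
        self._compress()

    def _compress(self):
        level = 0
        while level < len(self.levels):
            buf = self.levels[level]
            if len(buf) >= self.k:
                buf.sort()
                leftover = [buf.pop()] if len(buf) % 2 else []  # hold back an odd item
                promoted = buf[self.rng.randrange(2)::2]  # every other value, random offset
                self.levels[level] = leftover
                if level + 1 == len(self.levels):
                    self.levels.append([])
                self.levels[level + 1].extend(promoted)
            level += 1

    def quantile(self, q):
        """Smallest stored value whose cumulative weight reaches q * total."""
        items = sorted((v, 1 << level) for level, buf in enumerate(self.levels) for v in buf)
        if not items:
            return None
        target = q * sum(w for _, w in items)
        running = 0
        for value, weight in items:
            running += weight
            if running >= target:
                return value
        return items[-1][0]

    def to_bytes(self):
        """Little-endian layout: level count, then per level a count and int32 values."""
        parts = [struct.pack("<H", len(self.levels))]
        for buf in self.levels:
            parts.append(struct.pack(f"<H{len(buf)}i", len(buf), *buf))
        return b"".join(parts)

    @classmethod
    def from_bytes(cls, blob, level_capacity=128, seed=None):
        sketch = cls(level_capacity, seed)
        (n_levels,) = struct.unpack_from("<H", blob, 0)
        offset, sketch.levels = 2, []
        for _ in range(n_levels):
            (count,) = struct.unpack_from("<H", blob, offset)
            offset += 2
            sketch.levels.append(list(struct.unpack_from(f"<{count}i", blob, offset)))
            offset += 4 * count
        return sketch
```

Saving is `to_bytes()`, restoring is `from_bytes(...)`, and constraint 6 is `merge(...)`. Calling `quantile(0.5)` gives the approximate median. The total weight stored always equals the number of inputs seen, so no separate count is needed.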


 * I have performed some tests on the "shrinking array" version that was inspired by the previous poster's response, and it gives reasonable approximations, except for the case of an even-numbered set, as I mentioned. 78.148.177.85 (talk) 20:56, 27 June 2008 (UTC)


 * Here is an example of what I mean by the algorithm I loosely described in the first bullet. Suppose m = 3 and n = 9, and the numbers are:
 * 1, 7, 6, 3, 4, 9, 8, 2, 5
 * At each point in time, we keep track of 3 numbers, the middle-most numbers that we have seen so far. These are the numbers held in memory after each successive number is read:


 * 1) Read the number 1:  1, none, none
 * 2) Read the number 7:  1, 7, none
 * 3) Read the number 6:  1, 6, 7
 * 4) Read the number 3:  3, 6, 7
 * 5) Read the number 4:  3, 4, 6
 * 6) Read the number 9:  4, 6, unknown
 * 7) Read the number 8:  4, 6, unknown
 * 8) Read the number 2:  4, 6, unknown
 * 9) Read the number 5:  4, 5, 6
 * This is an example of a successful run: the median is known to be exactly 5, with no error.


 * I'll explain where the "unknown" above comes in. When the number 9 is read, the 3 middle-most numbers are 4, 6, 7. However, the program cannot know this: the number 7 had been read in earlier, and was "pushed" off the end of the array, and thus forgotten, since the program does not have enough memory to remember all of the numbers seen so far. The best it can do is mark that spot as "unknown".


 * If the numbers tend to come in increasing or decreasing order, or if they come in long runs of high numbers and long runs of low numbers, then this increases the chance of the algorithm failing. Here is an example:
 * 1, 4, 3, 6, 2, 9, 8, 7, 5


 * 1) Read the number 1:  1, none, none
 * 2) Read the number 4:  1, 4, none
 * 3) Read the number 3:  1, 3, 4
 * 4) Read the number 6:  3, 4, 6
 * 5) Read the number 2:  unknown, 3, 4
 * 6) Read the number 9:  3, 4, unknown
 * 7) Read the number 8:  3, 4, unknown
 * 8) Read the number 7:  4, unknown, unknown
 * 9) Read the number 5:  4, unknown, unknown
 * This is an example of a failed run: the median is not known (although we know that it is at least 4, a small consolation).
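A minimal Python sketch of the middlemost-window algorithm traced above, under my reading of it. The known values always occupy one contiguous block of ranks, and the sketch also counts forgotten elements below and above that block. A new value below the block (when some low values have already been forgotten) only increments the "below" count, and similarly for values above the block. After each read, the window of m slots is recomputed, and known values that fall outside it are dropped and counted. When the count is even, the window leans to the upper middle, which is what reproduces both traces. `MiddlemostTracker` and its methods are hypothetical names.

```python
import bisect

class MiddlemostTracker:
    """Track the m middlemost values of a stream, marking forgotten slots as unknown."""

    def __init__(self, m):
        self.m = m
        self.known = []    # sorted values occupying one contiguous block of ranks
        self.n_below = 0   # forgotten values ranked below the known block
        self.n_above = 0   # forgotten values ranked above the known block
        self.count = 0     # total values read so far
        self.lost = False  # set once every tracked value has been forgotten

    def _window_start(self):
        # First rank (0-based) of the m middlemost slots, i.e. ceil((count - m) / 2)
        return max(0, (self.count - self.m + 1) // 2)

    def add(self, x):
        self.count += 1
        if self.lost:
            return  # no reference point left to rank x against
        if self.known and x < self.known[0] and self.n_below > 0:
            self.n_below += 1   # x is somewhere among the forgotten low values
        elif self.known and x > self.known[-1] and self.n_above > 0:
            self.n_above += 1   # x is somewhere among the forgotten high values
        else:
            bisect.insort(self.known, x)
        start = self._window_start()
        while self.known and self.n_below < start:  # ranks left of the window
            self.known.pop(0)
            self.n_below += 1
        while self.known and self.n_below + len(self.known) > start + self.m:  # right of it
            self.known.pop()
            self.n_above += 1
        if not self.known:
            self.lost = True

    def _value_at(self, rank):
        i = rank - self.n_below
        return self.known[i] if 0 <= i < len(self.known) else None

    def window(self):
        """The m slots as printed in the traces: values, 'unknown' or 'none'."""
        start = self._window_start()
        slots = []
        for r in range(start, start + self.m):
            if r >= self.count:
                slots.append("none")
            else:
                v = self._value_at(r)
                slots.append("unknown" if v is None else v)
        return slots

    def median(self):
        """Exact median if both middle ranks are still known, otherwise None."""
        if self.count == 0:
            return None
        lo = self._value_at((self.count - 1) // 2)
        hi = self._value_at(self.count // 2)
        if lo is None or hi is None:
            return None
        return (lo + hi) / 2
```

Feeding 1, 7, 6, 3, 4, 9, 8, 2, 5 into `MiddlemostTracker(3)` reproduces the first table line by line and ends with `median() == 5`. The second sequence ends with `median()` returning `None`, while the smallest known value, 4, is the "at least 4" consolation.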


 * I don't know how critical your application is, or whether occasional but unlikely failures are acceptable. This algorithm will always give the exact answer when it succeeds, but it will not always succeed. By using a large value for the k I described, you can increase the chance of success at the cost of approximation in the answer. If failure is unacceptable, then some further tweaking can produce something that will never fail, but will simply give a rather large error in bad corner cases.


 * I'm glad now that my original explanation was so vague because I like the algorithm that you suggested in response. Eric.  86.153.207.223 (talk) 01:11, 28 June 2008 (UTC)


 * So basically, if you can read the numbers in sequence multiple times, it's possible to do an exact computation in a few passes. If you can only read them once (one pass), I think you could take a random sample of as many numbers as fit into your memory, and compute the median of those numbers. – b_jonas 13:24, 30 June 2008 (UTC)
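The one-pass random-sample suggestion can be carried out with reservoir sampling (Algorithm R). This keeps a uniform random sample of fixed size without knowing n in advance. The capacity of 2000 below simply mirrors the "about 2000 integers" in the question, and the function names are illustrative.

```python
import random
import statistics

def reservoir_sample(stream, capacity, rng=None):
    """Uniform random sample of up to `capacity` items from a one-pass stream (Algorithm R)."""
    rng = rng or random.Random()
    sample = []
    for i, x in enumerate(stream):
        if i < capacity:
            sample.append(x)
        else:
            j = rng.randrange(i + 1)  # uniform index in [0, i]
            if j < capacity:
                sample[j] = x         # happens with probability capacity / (i + 1)
    return sample

def approximate_median(stream, capacity=2000, rng=None):
    """Median of a reservoir sample; exact when the stream fits in the reservoir."""
    sample = reservoir_sample(stream, capacity, rng)
    return statistics.median(sample) if sample else None
```

For 10,000 inputs and a 2000-value sample, the sample median typically lands within a couple of percent of the true median's rank. There is no hard guarantee, though, and unlike the earlier tracker it never reports whether it is exact.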

Compact closed intervals in the order topology
There was a recent dispute about compactness in order topologies, and the following question was brought up: Is there a totally ordered space with a least and a greatest element, and with the least upper bound property, but that is not compact?

For finite unions of intervals in the reals it seems that such a space must be compact, since you can only omit the lower endpoint of an interval, but I think [0,1] union (2,3] is homeomorphic to [0,2] in the order topology, though not in the subspace topology, right? JackSchmidt (talk) 18:16, 27 June 2008 (UTC)
 * Yes, there's an obvious order-preserving bijection. For the original question, doesn't the usual least-upper-bound proof of (one-dimensional) Heine-Borel show that any complete bounded total order is compact in the order topology? Algebraist 18:23, 27 June 2008 (UTC)
 * Cool, thanks. I've actually had virtually no use for topology for years, and haven't looked at Heine–Borel for over 10 years, so I wanted to make sure I had everything in place. I mean, earlier today I didn't think {0} ∪ (1,2] was compact (which it isn't in the subspace topology, dangit!), so I don't want to depend on my recollection. If you want to state the compactness result clearly/citably (again), there is a fact tag at compact space that could use fixing. JackSchmidt (talk) 18:34, 27 June 2008 (UTC)
 * I've changed the wording a bit and cited it to Counterexamples. (my wording is equivalent since the order topology on an interval in an ordered space is the same as the subspace topology) Algebraist 18:51, 27 June 2008 (UTC)
 * Looks good, thanks again! I think leaving out the "closed interval" wording makes it much easier to understand. I managed to remove most of it by the time I posted here, but it was still lingering. JackSchmidt (talk) 19:21, 27 June 2008 (UTC)

Cartesian from Distance/Angle/Dihedral
Given 3 points in 3D-space (A, B & C, all with known Cartesian coordinates) is there a general formula for finding the Cartesian coordinates for a fourth point (D) given the distance CD, the angle BCD, and the dihedral angle for A->B->C->D (valued -π to π, as in dihedral angle)? I tried to use a computer algebra system to back-calculate it from the distance formula, the dot product formula, and the formula found under dihedral angle, but that results in two points. Unfortunately, for my purposes I'm looking at symbolic forms (e.g. I don't know if the dihedral is positive or negative), so procedural-type comparisons to narrow down which point is correct do not work too well. -- 128.104.112.147 (talk) 19:53, 27 June 2008 (UTC)
 * I follow up to "given the distance CD, the angle BCD", which gives a circle of points; next, I expect the dihedral angle is what pins down D further.
 * I think two points is right; they are diametrically opposed, right?
 * To pin it down to a single point, I imagine you would need to specify the distance CD, the angle BCD, and a direction of rotation about BC (e.g. with the vector BA at angle 0); that way you should get only one point. 87.102.86.73 (talk) 20:06, 27 June 2008 (UTC)


 * In other words, if you write the vector CD as a·BC + b·w, where w is a vector at right angles to BC and a, b are scalars, then the two points you are getting come from +b and −b (I think). This is because the dot-product formula gives the same value of cos(∠BCD) in both cases.


 * If you want to go through the maths involved in getting an equation for a single point, I can certainly do that for you. 87.102.86.73 (talk) 21:16, 27 June 2008 (UTC)


 * (One alternative is to let the dihedral angle range over only π in value, i.e. 0 to π.) --Preceding unsigned comment added by 87.102.86.73 (talk) 21:36, 27 June 2008 (UTC)


 * My understanding of the set-up is that the sign of the dihedral is what is specifying the direction of the dihedral rotation. The intersection of the sphere of distance and the cone of angle specify a circle of points in a plane perpendicular to the line BC. The dihedral angle then specifies where on that circle the point is - the full 2π range is needed to uniquely specify the complete circle. I think you are correct that I'm losing the distinction between the two halves because the tangent of the dihedral is not uniquely valued over the full -π to π range. I'm not sure how to fix that without resorting to an if-then-else construct, though. -- 128.104.112.147 (talk) 23:55, 27 June 2008 (UTC)
 * If you try an alternative method, e.g.:
 * 1. Find a vector orthogonal to AB and BC (using the cross product); call this v(y).
 * 2. Find a vector orthogonal to v(y) and BC; call this v(z).
 * v(y), v(z), and BC form a set of mutually perpendicular axes.
 * 3. Now make a vector Q1 along CB of length |CD|.
 * 4. Rotate this vector Q1 about C, using v(y) as the axis (i.e. in the plane given by v(z) and BC), by the angle BCD; call the result Q2.
 * 5. Now rotate Q2 about the axis BC (in the plane given by v(z) and v(y)) by the dihedral angle (here is where you must specify the correct direction of rotation).
 * 6. The resultant vector is Q3.
 * 7. The point C + Q3 = D.
 * That avoids any ifs. Would that be OK/make sense? (Note: it might be possible to simplify this procedure in terms of length, but I've split it into all the steps for simplicity.) 87.102.86.73 (talk) 09:48, 28 June 2008 (UTC)
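The steps above can be collapsed into one closed-form placement. This sketch builds the same orthonormal frame (unit BC, v(y) normal to plane ABC, v(z) in that plane and perpendicular to BC) and writes D directly in it instead of applying two rotations. A second helper measures the dihedral with the atan2 formula from the dihedral angle article, so the sign round-trips over the full −π to π range without any if-then-else. The convention that 0 means D is on the same side as A (cis) is an assumption, and all helper names are illustrative.

```python
import math

def sub(u, v): return tuple(a - b for a, b in zip(u, v))
def add(u, v): return tuple(a + b for a, b in zip(u, v))
def scale(u, s): return tuple(a * s for a in u)
def dot(u, v): return sum(a * b for a, b in zip(u, v))
def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1], u[2] * v[0] - u[0] * v[2], u[0] * v[1] - u[1] * v[0])
def unit(u): return scale(u, 1.0 / math.sqrt(dot(u, u)))

def place_fourth_point(a, b, c, dist_cd, angle_bcd, dihedral):
    """Cartesian D from |CD|, angle BCD and dihedral A-B-C-D (radians, -pi..pi)."""
    bc = unit(sub(c, b))                # axis along B -> C
    v_y = unit(cross(sub(b, a), bc))    # normal to plane ABC (the thread's v(y))
    v_z = cross(v_y, bc)                # in plane ABC, perpendicular to BC, towards A
    # -cos(angle) * bc makes the angle with CB equal angle_bcd; the dihedral picks
    # the point on the circle, measured from v_z towards v_y
    local = add(add(scale(bc, -math.cos(angle_bcd)),
                    scale(v_z, math.sin(angle_bcd) * math.cos(dihedral))),
                scale(v_y, math.sin(angle_bcd) * math.sin(dihedral)))
    return add(c, scale(local, dist_cd))

def dihedral_angle(a, b, c, d):
    """Signed dihedral A-B-C-D via atan2, in (-pi, pi]."""
    b1, b2, b3 = sub(b, a), sub(c, b), sub(d, c)
    n1, n2 = cross(b1, b2), cross(b2, b3)
    y = math.sqrt(dot(b2, b2)) * dot(b1, n2)
    x = dot(n1, n2)
    return math.atan2(y, x)
```

Because the dihedral enters only through its cosine and sine, a positive and a negative angle of the same size land on opposite sides of plane ABC. This is exactly the distinction that a tangent-based back-calculation loses.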


 * Thanks. I think this will work for me. -- 128.104.112.147 (talk) 18:01, 1 July 2008 (UTC)
 * No problem, you're welcome.87.102.86.73 (talk) 13:13, 2 July 2008 (UTC)

Question on percentiles
According to the percentile rank article, being in the 85th percentile means a score above 85% of the other statistics in the data range. But, what does that mean on the other end? I'll give an example:

I am 6'6" tall, and a website claims that I am in the 99.8th percentile. Now, I know that means I'm taller than 99.8% of the sample group (I think it was American males), but how many am I shorter than? 0.2%? 0.1%? If I'm in a room with 1,000 people who reflect the results of the data, how many of those people am I shorter than? Thanks. 70.105.164.43 (talk) 19:57, 27 June 2008 (UTC)


 * By convention, the percentile is always rounded down at the last digit shown, even if the next decimal place (which is not shown) is greater than 5. Thus, being in the 99.8th percentile means that you are taller than 99.8% to 99.9% of the population and shorter than 0.1% to 0.2% of the population. If you went to full precision on percentiles, adding up the percent above you and the percent below you would not equal 100% since, technically, you yourself are not counted as being above or below. 76.224.121.13 (talk) 08:36, 28 June 2008 (UTC)
 * Interesting; I was not aware of this. --Proficient (talk) 10:02, 30 June 2008 (UTC)


 * I wasn't aware of the convention that percentages are rounded down. It makes sense so as to avoid (for example) rounding 99.8% to 100% and saying that the person is taller than 100% of the sample (when clearly this is false). By extension, it seems reasonable that this rounding down would only apply to percentages greater than 50%, and that percentages less than 50% should be rounded *up* so as to avoid (for example) rounding 0.1% to 0%. Is this the case? Wikiant (talk) 11:32, 30 June 2008 (UTC)


 * Perhaps some statistics programs use a slightly different definition for percentile, but as far as I know the percentile rank is always rounded down so that being in the kth percentile always entails being above at least k percent of the sample group. 76.217.110.126 (talk) 06:00, 1 July 2008 (UTC)
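A small illustration of the round-down convention described above. It assumes "percentile rank" means the share of the group strictly below a score; some definitions also count half of the ties. `percentile_rank` and `percent_above` are hypothetical helpers. Integer floor division avoids floating-point surprises when truncating to one decimal place.

```python
def percentile_rank(scores, x):
    """Percent of `scores` strictly below x, truncated (rounded down) to one decimal."""
    if not scores:
        raise ValueError("need at least one score")
    below = sum(1 for s in scores if s < x)
    tenths = (1000 * below) // len(scores)  # integer floor, so 66.66... becomes 66.6
    return tenths / 10

def percent_above(scores, x):
    """Percent of `scores` strictly above x (not rounded)."""
    return 100 * sum(1 for s in scores if s > x) / len(scores)
```

In a room of 1,000 where 998 people are shorter than you and 1 is taller, `percentile_rank` gives 99.8 and `percent_above` gives 0.1. The two figures do not add up to 100 because you are counted in neither. For a group of 3 with 2 below, the result is 66.6 rather than the rounded 66.7.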