Wikipedia:Reference desk/Archives/Mathematics/2009 September 14

= September 14 =

chess variables and states
Is it more proper to think of the game of chess as having 32 variables with 65 states (64 in-play states and 1 out-of-play state) or as having 64 variables with 33 states (32 occupied and 1 unoccupied state)? -- Taxa  (talk) 02:19, 14 September 2009 (UTC)
 * There isn't one objectively proper way to think about it. It's just a question of what the most useful scheme is for what you happen to be doing with the information.  Is there a particular problem you're trying to solve? Rckrone (talk) 02:31, 14 September 2009 (UTC)
 * I'm just trying to figure the best way to think of it. I mean if you do the same for tic-tac-toe then it would seem the positions (squares) always remain while the players vary so positions as variables seems better. -- Taxa  (talk) 03:04, 14 September 2009 (UTC)
 * Okay, if you are not going to buy that then how about the idea that there are programs that can reduce many-valued logic equations to minimum form. If you had a computer big and fast enough to assign it the task of reducing all known winning games to minimum form then you might come up with a way to win at chess all the time. -- Taxa  (talk) 03:33, 14 September 2009 (UTC)
 * Assuming one can win all the time. A "best" strategy, when used by both players, may always result in a draw.  I don't know.  But, what constitutes a minimum form?  One uses different parameterizations to solve different problems.  And sometimes the most concise form isn't the easiest to solve.  As for chess, well, I don't know, and I'm pretty sure nobody does.--Leon (talk) 08:49, 14 September 2009 (UTC)


 * Most serious chess programs these days use bitboards. 67.122.211.205 (talk) 04:30, 14 September 2009 (UTC)utnot
 * The reason seems to be speed and memory due to the compact nature of the data format rather than the accomodation of a superior method of play. -- Taxa  (talk) 05:28, 14 September 2009 (UTC)
 * Neither of your methods are proper, because neither actually work. You seem to be ignoring promotions, the details of the rules of castling, draw by repetition, and the fifty move rule. Not to mention whose turn it is. Algebraist 13:56, 14 September 2009 (UTC)
 * To expand on Algebraist's point, you can use:
 * 65 variables with 13 possible states each (since there are only 12 distinct chesspieces)
 * 48 variables, corresponding to the 32 starting pieces and upto 16 (?) promoted queens, each with upto 65 possible states.
 * Maintain a variable length list of live pieces and their position on the board.
 * Use bitboards
 * Of course, there are many more possibilities. All of them carry exactly the same information, so none are inately "more proper" than the other. However depending upon the specific application, hardware architecture, performance metric of interest etc one may be preferred over the other. Abecedare (talk) 15:57, 14 September 2009 (UTC)


 * This can be analysed from a computer science perspective: 32 variables with 65 states is ugly since one needs to check for the out-of-play state everytime the value is used. Also, this representation provides too many ways to represent a single position, since the n pawns, etc can be interchanges in factorial(n) ways. So, I prefer the 64 variables with 33 states representation, but I would change it to 7 states: (king, queen, rook, bishop, knight, pawn, empty), to minimize redundancy. Thought this representation also has redundancy (in the sense that there cannot be more than one king), this is more elegant from a programming perspective. Also, you need to store ability to en-passant, castle king-side, castle queen-side, and maybe even time controls, etc. --Masatran (talk) 07:55, 17 September 2009 (UTC)


 * Also note that some pieces can occupy any square (the kings, queens, rooks, and knights), but others can only occupy a subset of the squares (32 possible squares for each bishop and 48 for each pawn). StuRat (talk) 03:08, 20 September 2009 (UTC)

winning games
Is there a list of winning games for tic-tac-toe? -- Taxa  (talk) 05:29, 14 September 2009 (UTC)
 * If you mean winning strategies then no, the best strategy on both sides leads to a draw (see tic-tac-toe). If you mean configurations of X's and O's where there are three in a row, why would someone make or need such a list?--RDBury (talk) 06:48, 14 September 2009 (UTC)


 * Maybe the question is about winning sequences, e.g. O in 5, X in 2, O in 7, X in 3, O in 1, X in 4, O in 9 to win. It would be easy enough to list these, particularly if reflections etc are removed→86.148.186.231 (talk) 09:09, 14 September 2009 (UTC)
 * Easy, but pretty pointless. --Tango (talk) 13:47, 14 September 2009 (UTC)
 * I remember making such a list during a boring history high-school lesson once. The game mentioned above is the only "interesting" one: all others either result in a draw or require the loosing party to make an obvious blunder.195.128.251.10 (talk) 22:46, 14 September 2009 (UTC)
 * Yes, I've done similar things during boring lessons! As you say, most of the games are not interesting, that is why making the list is pointless. --Tango (talk) 23:22, 14 September 2009 (UTC)

Why so many 20ps
I always have a mixture of loose change in my purse, but at any time, about half of them are 20p pieces, even though there is a mixture of all eight common UK coins. Is there a mathematical reason why I always end up with so many 20ps?


 * I doubt it. I can think of two explanations which are quite likely - you might regularly buy things that result in you getting 20p coins as change, or you might have a tendency not to spend 20p coins. You might well choose to make 60p by 50p+10p rather than 3x20p, for example. There might be something in the most efficient way to make up various common sums not involving 20ps, but that would be different to work out since we would need to know the distribution of prices. --Tango (talk) 13:55, 14 September 2009 (UTC)


 * Let's assume prices are uniformly distributed between whole pound amounts, (do people typically use a bill for 1 pound or a coin?) and that people always make change by starting with the highest denomination they can (that's how it's usually done in the US at least). Then on average you'll get 0.4 1ps, 0.8 2ps, 0.5 5ps, 0.4 10ps, 0.8 20ps and 0.5 50ps each time you get change.  By that measure 2p and 20p coins would be a bit more frequent than the rest.  If the UK has sales tax that gets added on top of the listed price, and you tend to be making purchases of over a few pounds, then these might be reasonable assumptions.  If not, then there might be some weird patterns in the distribution. Rckrone (talk) 16:01, 14 September 2009 (UTC)


 * In addition, Gresham's law may also be at play. For example, if you are a typical consumer you probably don't like to be weighted down by too many 1-2 penny coins. Thus if you purchase something worth (say) 32 pence, you probably hand a pound note and 2 pennies, and accept back 70 pence ( 2 coins) in change, rather than take 68 pence (minimum of 6 coins) back. Such behaviour will similarly skew the distribution of coins in your purse. Abecedare (talk) 16:11, 14 September 2009 (UTC)


 * Just to correct you both on a cultural, rather than mathematical note, there are no pound notes in England and Wales, only pound coins (and we wouldn't call them "bills" if there were ;)). I say England and Wales, because the Royal Bank of Scotland still issues pound notes, although I believe they are increasingly rare.
 * Also, the UK has value added tax, rather than straight sales tax, and retailers are generally required to display the "with VAT" price as the main price, so reverse engineer that to be a round number - £x.00, £x.50, £x.99, £x.95 etc. On the other hand, adding up those 99s and 95s tends to produce all sorts of prices for a whole basket of goods anyway. - IMSoP (talk) 23:50, 18 September 2009 (UTC)

Geometry question
I have a triangle ABC and a smaller one DEF inside it. ADF, CFE and BED are straight lines. If it is known that CF = xFE, BE = yED and AD = zDF and the area of triangle is a m2, is there a quick way to find the area of triangle ABC in general? Professor M. Fiendish, Esq. 13:20, 14 September 2009 (UTC)
 * Yes. Here's a hint: To find the area of triangle ADB in terms of the area of DEF, utilize the formula Δ = (1/2)ab*sinC and remember that sinθ = sin(180° - θ). Rckrone (talk) 15:32, 14 September 2009 (UTC)

integral approximation name
The name of this method escapes me and searching Wikipedia and Googling are not helping. What is the name of the process of using little rectangles under a function to estimate the integral? -- k a i n a w &trade; 14:42, 14 September 2009 (UTC)


 * Wikipedia calls it the rectangle method. Gandalf61 (talk) 14:49, 14 September 2009 (UTC)


 * Thanks. -- k a i n a w &trade; 15:54, 14 September 2009 (UTC)


 * Specifically, it's that version of the rectangle method where you calculate a lower Darboux sum. Algebraist 17:57, 14 September 2009 (UTC)

Probabilities of unobserved observations
I'm writing a hidden Markov model to calculate the probability of utterances (the probability of the given fragment out of all fragments with the same number of words). To do this, I need to assign each word in the utterance a probability (the probability that a word taken at random from English text is this word). The naive way is to say that if the word appears k times in my n-word corpus, then the probability is k/n, but this ignores the fact that my corpus is just a sample of English text (and thus rare words may not appear in it at all, etc.). Is there a more accurate way to work out this probability? --196.210.182.23 (talk) 15:46, 14 September 2009 (UTC)
 * Your sample is the only window you have into what English text looks like, so there's not anyway for your program to tell that "normalizable" is a word and "asljdasjksgs" isn't if neither appear in the sample. In stuff I've done with Markov models, the probability is taken to be (k+1)/n to reflect the fact that things you haven't seen are still possible.  I don't know how standard that is though. Rckrone (talk) 16:22, 14 September 2009 (UTC)
 * Shouldn't honest-to-god probabilities sum to 1? Also, do you really assign probability 1/n to "asljdasjksgs" on the grounds of its having k = 0? — Emil J. 16:33, 14 September 2009 (UTC)
 * On the first point, you'd normalize them, so I guess it would really be (k+1)/(n+a) where a is the number of possible outcomes. On the second point, you're right this method doesn't work at all for what the OP is doing.  The case where I used it modeled the probability of the next character, so a was only about 35 and a<<n.  But there are too many words.  Maybe someone else has a better method...  Rckrone (talk) 17:19, 14 September 2009 (UTC)
 * That is additive smoothing and I think it's very common in practice, even if not necessarily optimal. 70.90.174.101 (talk) 00:07, 15 September 2009 (UTC)
 * I think this should be viewed in a Bayesian context. Assign a prior distribution to the frequency of each "word" (beta with parameters which depend on the complexity of the word works nicely) and update it based on your empirical data (corpus). The dependency between the frequencies can probably be neglected. -- Meni Rosenfeld (talk) 19:59, 14 September 2009 (UTC)
 * Of course, the same result can be achieved without considering this mechanism. Just replace "+1" with "+ a term which depends on the complexity". -- Meni Rosenfeld (talk) 20:16, 14 September 2009 (UTC)
 * That is additive smoothing and I think it's very common in practice, even if not necessarily optimal. 70.90.174.101 (talk) 00:07, 15 September 2009 (UTC)
 * Assuming Zipf's law you can do a maximum likelihood estimate which would adjust the very low probability ones a little, though you'll still have to assume a maximum number of words, and I'm not sure what it would come out as. I'd be a bit surprised if this really is the sort of thing you want. Dmcq (talk) 22:20, 14 September 2009 (UTC)
 * The Zipf's law article shows the frequencies dropping after 10000 words to something near 1/x^2 so you do get a proper density function for a prior distribution even with an infinite number of words including things like Zipfian. I guess the Zipf-Mandelbrot law it refers to is about getting that extra accuracy. The whole phrases should probably also follow a Zipf law, it would be interesting to see how one could ensure that and keep the Zipf law for the individual words. Dmcq (talk) 09:37, 15 September 2009 (UTC)
 * If you do not observe something then your unbiased estimator of its frequency is 0, which sometimes isn't very helpful, as you're realizing. Can you incorporate 95% confidence intervals into your model? There is a law we refer to in medicine which is that if you observe n patients and event X doesn't happen, the 95% confidence interval for the frequency of X is 0 to 3/n. Perhaps you can extend the statistical approach to calculate 95% confidence intervals through your Markov model. AFAIK the 3/n rule was first explained to medical audiences in, although it's pedestrian non parametric statistics. An article that is free to access is here: I hope this helps. RupertMillard (Talk) 10:21, 15 September 2009 (UTC)


 * The 3/n rule arises from the apparent fact that, as n increases, so (0.05)^(1/n) approaches 1 - 3/n. Can someone explain how this limit arises?→81.131.163.227 (talk) 20:57, 15 September 2009 (UTC)
 * $$0.05^{1/n} = e^{\ln 0.05 / n} \approx e^{-3/n} \approx 1-\frac3n$$
 * Where the last estimate follows from the Taylor series of $$e^x$$. -- Meni Rosenfeld (talk) 05:01, 16 September 2009 (UTC)

The computation of the mean value ± standard deviation of a sample, based on data of the population, is called deduction. The computation of the mean value ± standard deviation of the population, based on data of a sample, is called induction. Using the J (programming language), the deduction formula is this: deduc =. (* (% +/)) ([ ,: %:@*) (*/@}: % {:)@(-.@(([, ] ,: 1:) % +/@])) Using this formula, the estimated mean values ± standard deviations of a sample of 1 ticket taken from a population of 1 red and 1 green ticket is   1 deduc 1 1 0.5 0.5 0.5 0.5 showing that the sample contains 0.5±0.5 red and 0.5±0.5 green tickets.

A sample of 6 tickets taken from a population of 6 red and 12 green and 18 blue tickets contains 1±0.845 red and 2±1.07 green and 3±1.13 blue tickets: 6 deduc 6 12 18 1   2    3 0.845 1.07 1.13

The induction problem is more useful, because usually an uninteresting sample is known, but the interesting population is unknown.

The induction formula, using the deduction formula above, is: induc =. *&_1 1@(+&1 0)@(-@>:@[ deduc~ -@(+ #)~) The simplest example is taking no sample from a population containing one ticket which is either red or green 0 0 induc 1 0.5 0.5 0.5 0.5 showing that the population contains 0.5±0.5 red and 0.5±0.5 green tickets.

Taking no sample from a population containing one ticket which is either red or green or blue: 0 0 0 induc 1 0.333 0.333 0.333 0.471 0.471 0.471 showing that the population contains 0.333±0.471 red and 0.333±0.471 green and 0.333±0.471 blue tickets.

The formula gives what you would expect in the simplest cases. A more complicated example which completely answers the question: 'Is there a more accurate way to work out this probability? '   corpus =. 55 40 29 18 1 0 0   # corpus NB. number of different words in the corpus 7   +/corpus NB. number of words in the corpus 143   (%~ corpus&induc)100000 NB. probabilities of unobserved observations, ± standard deviation 0.37 0.27   0.2  0.13  0.013 0.0067 0.0067 0.039 0.036 0.033 0.027 0.0093 0.0066 0.0066 See also [].

Bo Jacoby (talk) 20:38, 16 September 2009 (UTC).

Parabola in projective space
Hi. I'm not formally studying mathematics, but am trying to get my head round projective geometry. As it's a raster image, I was trying to redraw File:Parabola in projective space.png as an SVG using gnuplot. My best effort so far is File:Parabola in projective space.svg. Please can you experts let me know what you think? Is it mathematically correct, and could someone alter the image description page so it precisely says what I have done? Does anyone have a tip for making the image reflect that the plane is at a tangent to the sphere? Should the cubic line on the ball go all the way round the sphere as it does in the first image? I didn't understand how that was a projection.Many thanks. RupertMillard (Talk) 22:07, 14 September 2009 (UTC)


 * You don't really have a description of the mapping you're trying to apply so it's hard to tell if what you're doing is correct or not. It looks like you're trying to map the parabolas onto the Riemann sphere which isn't really projective space at all. You can map project space onto a sphere but its a 1-2 mapping. Also keep in mind that all conics are basically the same curve in projective since the equations can all be transformed under a projective transformation to x2+y2=z2, which just happens to be the equation of the cone in space. Does that help?--RDBury (talk) 10:24, 15 September 2009 (UTC)


 * This is very helpful, thank you. I've never read about Riemann spheres before. (Isn't this illustration stunning? ) What I was really trying to do was just redraw the figure File:Parabola in projective space.png in the context of Algebraic_geometry. I thought it looked like the curves are projected onto a unit sphere through its centre, where the sphere touches the origin of the graph, but I can see this is all wrong now. As you say, this would yield a 1-2 mapping. I guess I shouldn't even be trying to draw figures that I don't understand. (I don't understand exactly what you mean about the conics being basically the same curve - in the original diagram, the parabola goes from an unclosed figure, to a closed one. Is this related to one of the foci of a parabola being at infinity, which is about where my conics left off.) RupertMillard (Talk) 10:52, 15 September 2009 (UTC)


 * Try the conic sections paragraph in Projective geometry. Really the most natural setting for algebraic curves is complex projective space. But any attempt to visualize that make my head hurt.--RDBury (talk) 12:55, 15 September 2009 (UTC)


 * Just to be clear, the Riemann sphere represents one-dimensional complex projective space CP1 which is topologically a sphere. We can think of wrapping the complex plane up and then completing the sphere by adding a single point at the top, which is called the "point at infinity".  This is different from what you were dealing with when making your diagram, which is two-dimensional real projective space RP2.  RP2 isn't topologically a sphere, but there is a 1-2 mapping from RP2 to the sphere.  Each point on the sphere is identified with the point on the direct opposite side of the sphere.  In this case we're mapping the x-y plain onto just the top half of the sphere and then there's a whole "line at infinity" which is the equator, and then the bottom half is just an identical copy of the top half.
 * I didn't check the math, but your diagram certainly looks correct. It would probably be more complete to show both sets of points on the sphere, although to be fair the original diagram you were reproducing failed to do so for the y=x2 graph. Rckrone (talk) 21:06, 15 September 2009 (UTC)
 * Ah thanks. This all makes a lot more sense now. I've added the second halves, and described it as "y=x2 & y=x3 projected on RP2 mapped onto a sphere". But I think I might animate this or redraw it in POV-Ray because it's not looking very clear to me. (Ironically, this would bring it back to a raster format of some description.) RupertMillard (Talk) 23:07, 15 September 2009 (UTC)

The key idea of projective space is that the infinite can become finite (and visa versa). Take a piece of paper and cut yourself a rectangle. Put the thin end against the thin end to form a loop. (The join formed by the thin end of the rectange meeting thr other thin end will be called the join.) We will draw some circles in relation to the join and these will give us conics. If the circle misses the join, then when we open the strip of paper we will have an elipse. If the circle is tangent to the join then when we open then strip we will have a parabola. If the circle crosses the join then we get a hyperbola. Dr Dec ( Talk )    21:29, 15 September 2009 (UTC)

visa versa? --131.114.73.84 (talk) 09:56, 16 September 2009 (UTC)