Wikipedia:Reference desk/Archives/Mathematics/2008 June 23

= June 23 =

Probability
I have a question concerning probability:

Given: 57% Breakfast eating {P(A)}, 80% Teeth flossing {P(B)}, and 46% doing both. I figured that (Teeth + Breakfast) - Both = Probability of doing either activity. However I am not sure how to find the probability of those who eat breakfast, but do NOT floss. Vivio Testa rossa  Talk Who 03:27, 23 June 2008 (UTC)


 * Try drawing a Venn diagram, and use that to see what areas you could add or subtract to get the one you want. Confusing Manifestation (Say hi!) 04:00, 23 June 2008 (UTC)


 * Ultimately, you want to find the percentage of people who only eat breakfast. This is given by (Percent breakfast eating) - (Percent doing both). 76.238.91.253 (talk) 04:34, 23 June 2008 (UTC)


 * Or, alternatively, since they do seem to be assuming these are independent events, you can multiply the portion who eat breakfast (0.57) by the portion NOT flossing (0.20). StuRat (talk) 11:51, 23 June 2008 (UTC)


 * Is "but do not floss" equivalent to "and do not floss"? I think you've assumed so, StuRat. Zain Ebrahim (talk) 12:07, 23 June 2008 (UTC)
 * Yes, "but" and "and" mean effectively the same thing; "but" is used when there is some kind of contradiction between the two statements (that is, given the first you would expect the opposite of the second). --Tango (talk) 14:54, 23 June 2008 (UTC)


 * I thought about bringing this up earlier, but anyway the question needs to be clarified. Is it that, out of a group of people, 57% of them are eating breakfast etc. and you want to find out what is the probability that a randomly picked person will be eating breakfast and not flossing? Or is it that there is a 57% chance that a person is eating breakfast and independent of that there is an 80% chance that he will be flossing, and you want to find the probability that this one person will be eating breakfast and not flossing? I originally thought you meant the former, but now I'm thinking you mean the latter. —Preceding unsigned comment added by 76.224.118.12 (talk) 01:18, 24 June 2008 (UTC)


 * I assumed they're independent events, since 80% of 57% is approximately 46%, just what one would expect from two independent events. StuRat (talk) 10:30, 24 June 2008 (UTC)
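The arithmetic discussed in this thread can be checked directly. The following is a minimal Python sketch; the variable names are illustrative and not part of the original question. It computes the "either" and "breakfast but no flossing" figures by subtraction, and separately shows what the independence assumption would predict.

```python
# Figures from the question, as fractions of the population.
p_breakfast = 0.57   # P(A)
p_floss = 0.80       # P(B)
p_both = 0.46        # P(A and B)

# Inclusion-exclusion: P(A or B) = P(A) + P(B) - P(A and B).
p_either = p_breakfast + p_floss - p_both

# Breakfast but no flossing: remove the overlap from P(A).
# This step does not require independence.
p_breakfast_only = p_breakfast - p_both

# If A and B were exactly independent, P(A and B) would equal P(A) * P(B).
p_both_if_independent = p_breakfast * p_floss
p_breakfast_only_if_independent = p_breakfast * (1 - p_floss)

print(round(p_either, 4))                          # 0.91
print(round(p_breakfast_only, 4))                  # 0.11
print(round(p_both_if_independent, 4))             # 0.456
print(round(p_breakfast_only_if_independent, 4))   # 0.114
```

The subtraction gives 0.11 whether or not the events are independent. The product gives 0.114, which is close only because the stated 46% nearly matches the 45.6% that independence would predict.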

Polynomials
Who uses polynomials in the real world for real life situations? 67.101.98.121 (talk) 04:45, 23 June 2008 (UTC) --Preceding unsigned comment added by 67.101.98.121 (talk) 04:30, 23 June 2008 (UTC)


 * Just about everyone? Calculators normally use polynomial approximations to calculate many functions like cosine or logarithms. The path of a thrown ball is a polynomial - a quadratic - to a reasonable approximation. Computer programmers think in terms of polynomial versus non-polynomial complexity, and some encryption algorithms use an extension of polynomials (or at least polynomials over fields other than the real numbers). Complicated functions are also sometimes approximated using polynomial interpolation, although care needs to be taken sometimes (using a high order polynomial to approximate the path your Mars lander needs to take may result in one that dips below ground level). Confusing Manifestation (Say hi!) 07:00, 23 June 2008 (UTC)
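As an illustration of the first point, here is a minimal Python sketch that approximates cosine with a truncated Taylor polynomial, evaluated by Horner's rule. The function name `cos_poly` and the choice of ten terms are assumptions made for this example; real calculator firmware may well use other schemes (for instance CORDIC or specially tuned polynomials), so this only shows the general idea.

```python
import math

def cos_poly(x, terms=10):
    """Approximate cos(x) by a Taylor polynomial in x**2 (Horner's rule)."""
    # Reduce the argument to [-pi, pi] so a short polynomial stays accurate.
    x = math.remainder(x, 2 * math.pi)
    x2 = x * x
    # cos(x) = sum over k >= 0 of (-1)^k * x^(2k) / (2k)!
    coeffs = [(-1) ** k / math.factorial(2 * k) for k in range(terms)]
    result = 0.0
    for c in reversed(coeffs):
        result = result * x2 + c
    return result

for angle in (0.0, 1.0, 2.5, 10.0):
    print(angle, cos_poly(angle), math.cos(angle))
```

Only additions and multiplications are needed once the coefficients are known, which is why polynomials are convenient for this kind of job.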

Chi-squared question
Given a set of measurements with uncertainty which are presumably linearly distributed, I'm comfortable using the chi-squared parameter to find a line of best fit. However, what if I want to fit a line to data that has uncertainties in both the independent and dependent variables? How would I define a $$\chi^2$$ in that case? Thanks! --gogobera (talk) 05:35, 23 June 2008 (UTC)


 * See total least squares. --Tardis (talk) 18:20, 24 June 2008 (UTC)


 * I'm not following your question (specifically, what do you mean by "uncertainty which (is) linearly distributed"?). If you're asking about a linear regression, the case in which the independent variable has a random component is called the "errors in variables" or "stochastic regressors" problem. The slope coefficient is biased toward zero. The procedure for eliminating the bias is called "two stage least squares." Wikiant (talk) 19:06, 24 June 2008 (UTC)
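For reference, one commonly used way to write such a statistic is sketched below. It assumes a straight-line model $$y = a + bx$$ and independent Gaussian errors with known standard deviations $$\sigma_{x,i}$$ and $$\sigma_{y,i}$$ on each point; these symbols are introduced here for illustration and do not appear in the question.

$$\chi^2(a,b) = \sum_{i=1}^{N} \frac{(y_i - a - b x_i)^2}{\sigma_{y,i}^2 + b^2 \sigma_{x,i}^2}$$

Setting every $$\sigma_{x,i} = 0$$ recovers the usual chi-squared for errors in the dependent variable only. Because the slope $$b$$ now appears in the denominator, the minimization is no longer linear in $$b$$ and generally has to be done numerically.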

Regarding the Wiki article on Markov chains
I just edited and added some content to a section (reproduced below) of the Markov chain article, and I have some questions about the math and terminology. All of my questions and such pertain only to Markov processes with invariant transition matrices that are stochastic.


 * Markov chains with a finite state space


 * If the state space is finite, the transition probability distribution can be represented by a matrix, called the transition matrix, with the (i, j)'th element of P equal to


 * $$p_{ij} = \Pr(X_{n+1}=j\mid X_n=i). \,$$


 * P is a stochastic matrix, which is an important fact to keep in mind for the rest of this discussion. When the Markov chain is a time-homogeneous Markov chain, so that the transition matrix P always remains the same at each step, then the k-step transition probability can be computed as the k'th power of the transition matrix, $$\mathbf{P}^k$$.


 * The stationary distribution π is a (row) vector that satisfies the equation


 * $$ \pi = \pi\mathbf{P}.\,$$


 * In other words, the stationary distribution π is a normalized left eigenvector of the transition matrix associated with the eigenvalue 1.


 * Alternatively, π can be viewed as a fixed point of the linear (hence continuous) transformation on the unit simplex associated to the matrix P. As any continuous transformation in the unit simplex has a fixed point, a stationary distribution always exists, but is not guaranteed to be unique, in general. However, if the Markov chain is irreducible and aperiodic, then there is a unique stationary distribution π. Additionally, in this case $$\mathbf{P}^k$$ converges to a rank-one matrix in which each row is the stationary distribution π, that is,


 * $$\lim_{k\rightarrow\infty}\mathbf{P}^k=\mathbf{1}\pi$$


 * where 1 is the column vector with all entries equal to 1. This is stated by the Perron-Frobenius theorem. If, by whatever means, $$\scriptstyle \lim_{k\to\infty}\mathbf{P}^k$$ is found, then the stationary distribution of the Markov chain in question can be easily determined for any starting distribution, as will be explained below.


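 * Since P is a stochastic matrix, $$\scriptstyle \lim_{k\to\infty}\mathbf{P}^k$$ exists whenever every recurrent class of the chain is aperiodic (for a periodic chain, such as one that deterministically alternates between two states, the powers oscillate and the limit does not exist). Because there are a number of different special cases to consider, the process of finding this limit can be a lengthy task. All the same, there are several general rules and guidelines to keep in mind. Let P be an n×n matrix for which the limit exists, and define $$\scriptstyle \mathbf{Q} = \lim_{k\to\infty}\mathbf{P}^k.$$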


 * It is always true that


 * $$\mathbf{QP} = \mathbf{Q}.$$


 * Subtracting Q from both sides and factoring then yields


 * $$\mathbf{Q}(\mathbf{P} - \mathbf{I}_{n}) = \mathbf{0}_{n,n}$$


 * where $$\mathbf{I}_n$$ is the identity matrix of size n, and $$\mathbf{0}_{n,n}$$ is the zero matrix of size n×n. Multiplying together stochastic matrices always yields another stochastic matrix, so Q must be a stochastic matrix. It is sometimes sufficient to use the matrix equation above and the fact that Q is a stochastic matrix to solve for Q.


 * Here is one method for doing so: first, define the function f(A) to return the matrix A with its right-most column replaced with all 1's. Then evaluate the following equation:


 * $$\lim_{k\rightarrow\infty}\mathbf{P}^k=f(\mathbf{0}_{n,n})[f(\mathbf{P}-\mathbf{I}_n)]^{-1}.$$


 * This equation does not work when $$[f(\mathbf{P}-\mathbf{I}_n)]^{-1}$$ does not exist. If this is the case, then it is necessary to take into account more information in order to find Q. One thing to notice is that if P has an element $$p_{ii}$$ on its main diagonal that is equal to 1 and the ith row or column is otherwise filled with 0's, then that row or column will remain unchanged in all of the subsequent powers $$\mathbf{P}^k$$. Hence, the ith row or column of Q will have the 1 and the 0's in the same positions as in P.


 * In most cases, $$\mathbf{P}^k$$ approaches but never actually equals its limit. There are numerous exceptions to this, however, such as the case in which


 * $$\mathbf{P} = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 0 & 1 \\ 0 & 0 & 1 \end{bmatrix}.$$


 * If $$A_0$$ (which is a row vector) represents the starting distribution, then the stationary distribution is equal to $$A_0\mathbf{Q}$$. Note that any distribution, regardless of the number of steps it is away from the starting distribution, can be used in place of $$A_0$$ without affecting the result for the stationary distribution.
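The formula $$\mathbf{Q} = f(\mathbf{0}_{n,n})[f(\mathbf{P}-\mathbf{I}_n)]^{-1}$$ from the quoted section can be tried out with a short Python sketch. Everything named here (`replace_last_column_with_ones`, `solve`, `limit_matrix`) is a hypothetical helper written for this illustration, and exact fractions are used to avoid rounding. Since every row of $$f(\mathbf{0}_{n,n})$$ is (0, ..., 0, 1), every row of Q is the last row of the inverse, so the sketch solves one linear system instead of forming the full inverse.

```python
from fractions import Fraction

def replace_last_column_with_ones(matrix):
    """The function f from the text: copy `matrix` with its last column set to 1."""
    return [row[:-1] + [Fraction(1)] for row in matrix]

def solve(a, b):
    """Solve a x = b exactly by Gauss-Jordan elimination; raise if a is singular."""
    n = len(a)
    aug = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("f(P - I) is singular; the formula does not apply")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [aug[i][n] for i in range(n)]

def limit_matrix(p):
    """Return Q = f(0) [f(P - I)]^{-1} for a stochastic matrix p given as rows."""
    n = len(p)
    p = [[Fraction(v) for v in row] for row in p]
    m = replace_last_column_with_ones(
        [[p[i][j] - (1 if i == j else 0) for j in range(n)] for i in range(n)]
    )
    # The last row pi of m^{-1} satisfies pi m = e_n, i.e. m^T pi^T = e_n.
    m_transposed = [[m[i][j] for i in range(n)] for j in range(n)]
    e_n = [Fraction(0)] * (n - 1) + [Fraction(1)]
    pi = solve(m_transposed, e_n)
    return [list(pi) for _ in range(n)]

# Hypothetical two-state example, not taken from the article.
q = limit_matrix([["9/10", "1/10"], ["1/2", "1/2"]])
print([str(v) for v in q[0]])
```

For this two-state example every row of Q comes out as (5/6, 1/6). Passing the identity matrix raises the `ValueError`, which matches the remark in the text that the formula fails when the inverse does not exist.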

I added all the information in the lower half of this section about how to find $$\scriptstyle \lim_{k\to\infty}\mathbf{P}^k$$. First of all, I was wondering whether what I added would be more appropriately placed in the stochastic matrix article. While I was editing, I just figured that some people might want to know how to find $$\scriptstyle \lim_{k\to\infty}\mathbf{P}^k$$ while they are looking up information about topics like these.

Anyway, what exactly is a "distribution"? Specifically, am I correct in believing that a distribution is represented by a row vector and shows the status of a Markov process after some number of transition steps? Are distributions linked to specific starting conditions, or are they only dependent on the transition matrix P? Do all the elements in a distribution vector have to sum to 1? As a related matter, I know that a Markov chain of the type described above always converges to a particular distribution after a large number of steps, and I'd like to know the proper term for this distribution that it converges to.

In the part of the section that I did not write, it was stated that the stationary distribution π is a row vector that satisfies the equation πP = π. Does that constitute the entire definition for π? After all, any scalar multiple of some original π would also satisfy the equation.

It was also stated that π is a normalized vector, which means to me that π is appropriately scaled to have a vector length of 1. However, earlier in the article I believe it stated that all the elements of π sum to 1. These two statements are contradictory. I noticed that the latter statement coincides with the equation (as quoted above) $$\lim_{k\rightarrow\infty}\mathbf{P}^k=\mathbf{1}\pi$$.

I have a number of other questions about the proper terminology to use for describing Markov chains, but I'll ask them later. 76.238.91.253 (talk) 05:44, 23 June 2008 (UTC)


 * In this context a distribution is a probability measure. Yes, in the case of a finite space (finite state space) the distribution is sometimes described by a vector. Yes, the elements are non-negative and sum to 1. A stationary distribution is a distribution that is P-invariant. For some P, there might be more than one stationary distribution. But there is a large class of Markov chains for which there is a unique stationary distribution. Don't you think it would be more appropriate to have this discussion in the talk page of the article? (Not that I particularly mind.) Oded (talk) 05:50, 23 June 2008 (UTC)


 * I asked one question on the talk page (the title of the question starts "Just added a formula..."), but then I realized that the reference desk gets more user traffic. I would continue the discussion at the talk page if I could get at least one person knowledgeable in the subject to notice what I'm doing/asking there. I haven't taken a class about Markov chains proper (I will soon, though), but I have worked with them informally for a long time. Thus, for the most part I just need someone to help me with the terminology so that I can appropriately describe what I know about this subject. Do you think you, or anyone else, could help me with that (I don't have that many questions left)? By the way, what do you make of the statement that π is a "normalized" vector? 76.224.118.12 (talk) 01:06, 24 June 2008 (UTC)
 * The fact that the elements sum to one is the relevant notion of normalization in this context. If you want, you can see this as having length one in the $$\ell^1$$ norm. Algebraist 21:38, 24 June 2008 (UTC)

square roots and cube roots.
Is the number of integers that are both squares and cubes (e.g. 64) infinite? And how does one prove it? I imagine that it would be some sort of proof by contradiction, e.g. "Assume there is some number x such that the square root is an integer and the cube root is an integer and that there is no larger such number." Now, if I think about it, (this is the part I'm not 100% clear on) x^6 should also satisfy the two initial conditions, because if the square root of x is an integer then the square root of x^6 is also an integer, ditto for cube roots, therefore there are infinitely many numbers that are both squares and cubes.

Am I way off base???

Duomillia (talk) 19:50, 23 June 2008 (UTC)


 * After thought.

Can the same proof be used to demonstrate the same thing for higher degrees? Example, a number that is both x^10 and y^6 just to pick random numbers. 65.110.174.74 Duomillia (talk) 20:09, 23 June 2008 (UTC)


 * Your proof is valid (assuming you have shown at least one x exists to start a chain), but it can be simplified. Regardless of x (just assuming it's an integer), what can you say about the square root of x^6 and the cube root of x^6? Remember x^6 = x*x*x*x*x*x. PrimeHunter (talk) 20:18, 23 June 2008 (UTC)
 * Indeed, and the same approach works for any other powers you may choose. Just remember that there are an infinite number of integers. --Tango (talk) 20:20, 23 June 2008 (UTC)


 * And, indeed, every such number is the sixth power of an integer, which is easily shown by using the fundamental theorem of arithmetic. (Every exponent in the prime factorization of a square number must be even.  Every exponent in the prime factorization of a cube must be divisible by three.  Hence every exponent in the prime factorization of a number which is both a square and a cube must be divisible by six.)  —Ilmari Karonen (talk) 00:22, 25 June 2008 (UTC)
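The claim that the positive integers which are both squares and cubes are exactly the sixth powers can be spot-checked by brute force. The following Python sketch is only a numerical check up to an arbitrarily chosen bound, not a proof; the helper names and the bound of 100,000 are assumptions made for this example.

```python
from math import isqrt

def is_square(n):
    """True if n is a perfect square."""
    r = isqrt(n)
    return r * r == n

def is_cube(n):
    """True if n is a perfect cube."""
    r = round(n ** (1 / 3))
    # Check the neighbours too, to guard against floating-point rounding.
    return any((r + d) ** 3 == n for d in (-1, 0, 1))

limit = 100_000

# Positive integers up to the bound that are both squares and cubes.
both = [n for n in range(1, limit + 1) if is_square(n) and is_cube(n)]

# Sixth powers up to the same bound.
sixth_powers = []
k = 1
while k ** 6 <= limit:
    sixth_powers.append(k ** 6)
    k += 1

print(both)
print(both == sixth_powers)
```

Up to this bound the two lists agree, as the prime-factorization argument says they must. Since k^6 is a square and a cube for every integer k, there is no largest such number.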