Wikipedia:Reference desk/Archives/Mathematics/2015 December 22

= December 22 =

Can someone verify this counterintuitive result.
We have a system requirement that a computer system processing records should stop processing if the percentage of erroneous records exceeds 1%. A colleague of mine said that we had better start applying the percentage calculation only after processing 1000 or so records, as otherwise any error in the first 100 records, or 2 within the first 200, would mean that we would stop. I said that it would not matter because the chance of getting an error in the first 100 records with an error rate of 1% would only be 1%, figuring that this means an average of 1 error in 100. He said "I think you'll find it's much more likely", so I calculated it as $$1-(0.99^{100})$$, which is about 0.634, or 63.4%. Is this calculation right?

Also, how would I calculate the chances of 2 errors in up to 200, 3 errors in up to 300, so I can see how fast it converges with the <s>0.1%</s> 1% error rate? As you can probably tell my mathematics is a bit rusty, having done largely non-mathematical programming for the last 30 years! -- Q Chris (talk) 17:40, 22 December 2015 (UTC)


 * Your 63.4% chance of having an error in the first hundred cases, given an average 1% error rate, is correct. But later you mentioned a 0.1% error rate, which would give you a 9.5% chance of an error in the first 100 cases. Note that gathering stats with a small number of trials is known to be inherently unreliable, so many programs wait until they have more test cases before drawing conclusions. Now, whether 1000 cases is enough of a wait, I cannot say. You could also take another approach and still check lower numbers of trials, but use a larger threshold. For example, if 10% of the first 100 trials have errors, that does indicate the average error rate is likely over 1%.


 * The "0.1%" rate was an error, which I have corrected. -- Q Chris (talk) 19:36, 22 December 2015 (UTC)


 * You might also want to take a look at the law of large numbers, which states that after a large number of trials the results will tend towards the mean. In your case, you are dealing with the reverse problem, that with a small number of trials the results will vary far from the mean.  StuRat (talk) 18:06, 22 December 2015 (UTC)


 * See Sequential probability ratio test which might be what you are looking for. Dmcq (talk) 18:38, 22 December 2015 (UTC)
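
A minimal sketch of how a sequential probability ratio test could be applied to a stream of records. The function name `sprt`, the two hypothesised error rates `p0` and `p1`, and the error tolerances `alpha` and `beta` are all illustrative assumptions, not values from this thread. The thresholds are Wald's usual approximations.

```python
from math import log

def sprt(records, p0=0.005, p1=0.02, alpha=0.05, beta=0.05):
    """Wald's sequential probability ratio test for a Bernoulli error rate.

    records: iterable of booleans, True meaning the record was erroneous.
    p0, p1, alpha, beta are illustrative values, not taken from the thread.
    Returns (decision, number_of_records_examined).
    """
    upper = log((1 - beta) / alpha)   # crossing this favours the high rate p1
    lower = log(beta / (1 - alpha))   # crossing this favours the low rate p0
    llr = 0.0                         # running log-likelihood ratio
    count = 0
    for count, is_error in enumerate(records, start=1):
        # Each record moves the ratio up (error) or down (no error).
        llr += log(p1 / p0) if is_error else log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "stop", count
        if llr <= lower:
            return "keep going", count
    return "undecided", count

print(sprt([True, True, True, True]))
print(sprt([False] * 1000))
```

With these assumed settings a single early error does not trigger a stop on its own, which is the behaviour the original question was after. Choosing `p0` and `p1` on either side of the 1% requirement is a design decision that would have to come from the system's owners.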


 * For a Normal approximation to a Binomial distribution, the usual rule-of-thumb is that both np and np(1-p) need to be greater than 10, but then you need to consider type I and type II errors and decide what false positives and false negatives are acceptable. Dbfirs 19:14, 22 December 2015 (UTC)
 * I don't understand why you find this result counterintuitive.
 * If in each record there is a chance of 1% for an error, then with 100 records the chance for an error in at least one of them should be much higher than 1%. I don't know why you thought that an error in 100 records should have the same probability as an error in one record.
 * The naive calculation is just multiplying the number of records by the error rate. In 10 records, there is about 10*1% = 10% chance of an error. But this breaks down with a larger number of records. In general, with error rate p and n records, the chance for no error at all is $$(1-p)^n \approx \exp(-pn)$$, and the chance for an error is 1 minus that. The number you found is essentially $$1-1/e$$.
 * More specifically, the number of errors will roughly follow the Poisson distribution with mean pn - the value of $$\exp(-pn)$$ is what this distribution gives for a value of 0. -- Meni Rosenfeld (talk) 21:58, 22 December 2015 (UTC)
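
The formulas above can be sketched in a few lines of Python, which also addresses the "2 errors in 200, 3 errors in 300" part of the question. The function names `prob_at_least` and `poisson_at_least` are made up for illustration, and the records are assumed to fail independently with the same rate p.

```python
from math import comb, exp, factorial

def prob_at_least(k, n, p):
    """Exact binomial probability of k or more errors among n independent records."""
    return 1.0 - sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k))

def poisson_at_least(k, mean):
    """Poisson approximation to the same tail, with mean = p * n."""
    return 1.0 - sum(exp(-mean) * mean**i / factorial(i) for i in range(k))

p = 0.01
for k in (1, 2, 3, 5, 10):
    n = 100 * k
    exact = prob_at_least(k, n, p)
    approx = poisson_at_least(k, p * n)
    print(f"{k} or more errors in {n} records: exact {exact:.4f}, Poisson {approx:.4f}")
```

Note that this gives the chance of seeing k or more errors in a batch of exactly 100k records. A rule that is re-checked after every single record can trip at any point along the way, so its overall chance of stopping would be at least this large.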

I think you are getting this all wrong. Let's take a step back. Your computer system has an unknown error rate. If you detect an error rate of 1% or greater then you must stop the computer system from processing further. To figure out the estimated error rate, you allow the computer system to process 100 records. Based on the number of errors in the 100 records, you must figure out what the estimated error rate is. Your argument seems to be that if the unknown error rate is 1% there is a 64% chance that the first 100 records contain at least 1 record in error. But that does not solve your problem of determining what the actual error rate is. I assume finding out what the actual error rate is, is what you actually want. 175.45.116.66 (talk) 00:46, 23 December 2015 (UTC)

Here is a question you can ask. If the computer system processes 100 records and has 0 errors, what is the probability that the actual error rate is equal to or greater than 1%? 175.45.116.66 (talk) 00:54, 23 December 2015 (UTC)
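
The question above has no answer until a prior belief about the error rate is chosen. As one possible sketch, assume a uniform prior on the rate between 0 and 1 (an assumption made here purely for illustration, not something stated in the thread). After 0 errors in n records the posterior density is then $$(n+1)(1-p)^n$$, and the chance that the rate is at least q is $$(1-q)^{n+1}$$. The function names are hypothetical.

```python
def prob_rate_at_least(q, n):
    """P(rate >= q | 0 errors in n records), ASSUMING a uniform prior on the rate.

    The posterior density is (n + 1) * (1 - p)**n, whose upper tail is (1 - q)**(n + 1).
    """
    return (1.0 - q) ** (n + 1)

def prob_rate_at_least_numeric(q, n, steps=200_000):
    """Cross-check: midpoint-rule integral of the same posterior density over [q, 1]."""
    width = (1.0 - q) / steps
    total = 0.0
    for i in range(steps):
        p = q + (i + 0.5) * width
        total += (n + 1) * (1.0 - p) ** n
    return total * width

print(prob_rate_at_least(0.01, 100))
print(prob_rate_at_least_numeric(0.01, 100))
```

A different prior, for example one concentrated on small error rates, would give a different number, so the result should be read as a statement about that assumption as much as about the data.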

Higher-dimensional problem
Hello, in n-dimensional space, I have k "planes", k <= n, each defined by an equation of the type a1*x1 + a2*x2 + ... + an*xn + c = 0. I want to find the point within the intersection of these that is closest to the origin. I know one way to do this: I can solve the equations for free variables and dependent variables, set up an expression for the distance from the origin to an arbitrary point in the intersection, differentiate w.r.t. each free variable, set the differentials to zero, then solve the resulting linear simultaneous equations. At least, I think that will work (correct me if I am wrong). However, based on very simple low-dimensional cases, I'm wondering whether there is a direct general solution in terms of the original coefficients. I don't much fancy slogging through the algebra of a series of high-dimensional cases to see if anything "simple" falls out at the end, and then trying to generalise. Does anyone know whether there is any useful result better than my multi-stage procedure? 109.151.59.86 (talk) 20:16, 22 December 2015 (UTC)
 * Note that solving an invertible linear system is a special case of your problem (the intersection is a single point, so it is the closest and you have to find it), so you shouldn't be able to do better than what it takes to solve such a system. The closest thing I know to expressing a solution to a linear system using the coefficients is Cramer's rule, which is horribly inefficient if evaluated directly (and indirect evaluation isn't much different from standard elimination).
 * However, I'm not sure using differentiation is the way to go. The closest point will be orthogonal to each of the vectors spanning the intersection - you should use that. I'm not sure what is the best way, but it might be this:
 * Reduce the system to a point where you can figure out the spanning vectors;
 * Add linear equations representing the orthogonality to each of the spanning vectors;
 * You should now have a system with a unique solution. Solve it.
 * -- Meni Rosenfeld (talk) 21:49, 22 December 2015 (UTC)
 * Thank you for your reply. Could you explain what "vectors spanning the intersection" means? How do I go about obtaining such vectors? 109.151.59.86 (talk), preceding undated comment added 02:51, 23 December 2015 (UTC)
 * I think a simpler way to put it is that (x1, x2, ... xn) must be parallel to the span of the vectors (a1, a2, ... an). In other words $$X^t$$ must be in the row space of $$A$$, where the subspace is defined by $$AX=b$$. I got the same thing using Lagrange multipliers just to confirm. With that, $$X^t = U^tA$$ for some vector $$U$$, giving $$X=A^tU$$, $$AA^tU=b$$, $$U = (AA^t)^{-1}b$$, $$X=A^t(AA^t)^{-1}b$$. Presumably the case $$AA^t$$ singular is when the equations are either redundant or inconsistent. --RDBury (talk) 09:39, 23 December 2015 (UTC)
 * What RDBury said is probably true (I didn't check), but to clarify what I meant - suppose you do elimination and end up with the following equation:
 * $$\begin{bmatrix}1&0&2&-3\\0&1&-4&5\end{bmatrix}\begin{bmatrix}x\\y\\z\\w\end{bmatrix}=\begin{bmatrix}3\\-7\end{bmatrix}$$
 * The free variables are z and w, the dependent are x and y. The general solution is $$x=3-2z+3w,\ y=-7+4z-5w$$, or in other words $$(x,y,z,w) = (3,-7,0,0) + z (-2,4,1,0) + w (3,-5,0,1)$$. The spanning vectors are $$(-2,4,1,0)$$ and $$(3,-5,0,1)$$. The optimal point must be orthogonal to them, so you can add the equations $$-2x+4y+z=0,\ 3x-5y+w=0$$. This gives you
 * $$\begin{bmatrix}1&0&2&-3\\0&1&-4&5\\-2&4&1&0\\3&-5&0&1\end{bmatrix}\begin{bmatrix}x\\y\\z\\w\end{bmatrix}=\begin{bmatrix}3\\-7\\0\\0\end{bmatrix}$$
 * The unique solution to this system will give you the desired point. -- Meni Rosenfeld (talk) 11:17, 23 December 2015 (UTC)
 * Thank you for the replies. 109.151.59.86 (talk) 12:08, 23 December 2015 (UTC)
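
Both approaches in this thread can be tried on the worked example with exact rational arithmetic. The sketch below is not from either poster. The helper names `solve` and `min_norm_point` are made up, and the code assumes the systems involved are non-singular (otherwise the pivot search fails).

```python
from fractions import Fraction

def solve(matrix, rhs):
    """Solve a square linear system exactly by Gauss-Jordan elimination."""
    n = len(matrix)
    # Augmented matrix with exact rational entries.
    aug = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(matrix, rhs)]
    for col in range(n):
        # Find a row with a nonzero pivot and swap it into place.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        # Clear this column in every other row.
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n] for row in aug]

def min_norm_point(A, b):
    """X = A^t (A A^t)^{-1} b, assuming the rows of A are linearly independent."""
    gram = [[sum(x * y for x, y in zip(r1, r2)) for r2 in A] for r1 in A]
    U = solve(gram, b)
    return [sum(U[i] * A[i][j] for i in range(len(A))) for j in range(len(A[0]))]

A = [[1, 0, 2, -3], [0, 1, -4, 5]]
b = [3, -7]

# Original equations plus the two orthogonality equations from the example.
augmented = A + [[-2, 4, 1, 0], [3, -5, 0, 1]]
via_orthogonality = solve(augmented, b + [0, 0])
via_formula = min_norm_point(A, b)

print(via_orthogonality)
print(via_formula)
```

For this example both routes give the point (-35/59, -29/59, 46/59, -40/59).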


 * The two methods do agree. MR reduces the original system to
 * $$\begin{bmatrix}I&C\end{bmatrix}X=b$$
 * then adds the orthogonality condition to get
 * $$\begin{bmatrix}I&C\\-C^t&I\end{bmatrix}X=\begin{bmatrix}b\\0\end{bmatrix}.$$
 * This yields the solution
 * $$X = \begin{bmatrix}I&C\\-C^t&I\end{bmatrix}^{-1}\begin{bmatrix}b\\0\end{bmatrix}.$$
 * You can write
 * $$\begin{bmatrix}I&C\\-C^t&I\end{bmatrix}^{-1} = \begin{bmatrix}(I+CC^t)^{-1}&-(I+CC^t)^{-1}C\\C^t(I+CC^t)^{-1}&-C^t(I+CC^t)^{-1}C+I\end{bmatrix}$$
 * which gives
 * $$X = \begin{bmatrix}(I+CC^t)^{-1}b\\C^t(I+CC^t)^{-1}b\end{bmatrix}.$$
 * On the other hand, when you plug A=[I C] into the formula I gave above you get
 * $$X = \begin{bmatrix}I\\C^t\end{bmatrix} (I+CC^t)^{-1}b$$
 * which works out to the same thing.


 * A followup question is whether you can guarantee that the matrix
 * $$\begin{bmatrix}I&C\\-C^t&I\end{bmatrix}$$
 * is invertible, or equivalently that $$(I+CC^t)$$ is invertible. This is false in characteristic 2 but we're presumably working in the reals here so maybe there's some bound you can put on the eigenvalues to ensure none of them are 0. --RDBury (talk) 22:52, 23 December 2015 (UTC)
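
For real matrices, one way to get such a bound is to note that $$I+CC^t$$ is positive definite. A sketch, assuming $$C$$ has real entries and $$v$$ is any nonzero real column vector of matching size (the symbol $$v$$ is introduced here only for illustration):

$$v^t(I+CC^t)v = v^tv + (C^tv)^t(C^tv) = \|v\|^2 + \|C^tv\|^2 \ge \|v\|^2 > 0.$$

So $$(I+CC^t)v \ne 0$$ for every nonzero $$v$$, every eigenvalue of $$I+CC^t$$ is at least 1, and the matrix is invertible. The argument relies on $$v^tv > 0$$ for nonzero $$v$$, which holds over the reals but can fail over other fields, such as those of characteristic 2.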