Wikipedia:Reference desk/Archives/Mathematics/2011 January 7

= January 7 =

Differentiation w/ respect to complex conjugate
My prof defined partial differentiation of a (single-variable) complex function with respect to the complex conjugate as follows:

If z = x + iy, and f(z) = u(x,y) + iv(x,y), then $$\frac {\partial f}{\partial \overline{z}} = \left(\frac {\partial u}{\partial x} - \frac {\partial v}{\partial y} \right) + i \left(\frac {\partial u}{\partial y} + \frac {\partial v}{\partial x} \right)$$

Is there an intuitive way of seeing the origin of this definition, other than an after-the-fact observation that it behaves as a partial derivative w/ respect to $$\overline {z}$$? 74.15.138.87 (talk) 01:56, 7 January 2011 (UTC)
 * I suppose the observation you refer to is something like
 * $$f(z) = f(z_0) + (z-z_0)\frac{\partial f}{\partial z}(z_0) + \overline{(z-z_0)}\frac{\partial f}{\partial \overline{z}}(z_0) + o(|z-z_0|)$$
 * given a suitable companion definition of $$\partial f/\partial z$$ (beware: I haven't checked whether this is in fact true; some factors of -1 or 2 or 1/2 may be needed to make it true). There's nothing wrong with "after-the-fact observations"; they are only "after-the-fact" by virtue of the more or less arbitrary order in which your text presents things. The $$\partial f/\partial\overline{z}$$ symbol could equally well be defined as the complex number that makes the equation above hold, except in that case you would still need to prove that such a number is unique if it exists, and so on. Most authors seem to feel that, all other things being equal, it is easiest to understand the formal development if definitions are chosen such that there is minimal doubt about the definition actually defining something.
 * As an alternative to either of these two characterizations, you could interpret $$\partial f/\partial\overline{z}$$ as a way to quantify how far f is from satisfying the Cauchy-Riemann equations. –Henning Makholm (talk) 02:54, 7 January 2011 (UTC)
 * There might not be anything wrong with an after-the-fact definition, but it's nice to have different perspectives on things, and certainly being able to see the logic behind the definition (before seeing its consequences) has some advantages.
 * I was looking for a way to go from
 * $$\frac {\partial}{\partial \overline{z}}f(z, \overline{z}) = \lim_{h\to 0} \frac{f(z,\overline{z} + h) - f(z,\overline{z})}{h}$$
 * to the above equation (h is, obviously, a complex number). Would you know how? 74.15.138.87 (talk) 03:43, 7 January 2011 (UTC)
 * I'm not sure your limit even makes sense; previously you said that f is a single-variable function, but here you give it two arguments. In any case, a limit of this kind would probably not exist unless the function f happened to be an ordinary holomorphic function composed with an explicit conjugation, which is not a very interesting situation.
 * I would suggest that the property of ordinary differentiation that your definition generalizes is not the high-school limit definitions, but more the property of being the coefficient in a linear approximation. Does the characterization I gave above make sense to you? –Henning Makholm (talk) 03:56, 7 January 2011 (UTC)
 * (Also, I think this is one of the not uncommon cases where "the logic behind the definition" is that it happens to be what gives the desired consequences) –Henning Makholm (talk) 04:00, 7 January 2011 (UTC)
 * The equation you wrote above is familiar to me from real-valued differentiation, but my problem for $$\frac{\partial f}{\partial \overline{z}}$$ is the same as for $$\frac{\partial f}{\partial z}$$ (which, just to make sure I understand, is different from $$\frac{df}{dz}$$, right?). At any rate, I haven't seen that formalism for complex numbers, so I can't say I understand it entirely.
 * As for my differentiation thing, what I meant is something like this: suppose $$f(z) = |z|^2 = z\overline{z}$$. Evidently, f is a function of z alone, but you could pretend that z and $$\overline{z}$$ are separate variables. Then, $$\frac{\partial f}{\partial z} = \overline{z}$$ and $$\frac{\partial f}{\partial \overline{z}} = z$$. Does that make sense? Probably not ... I'm a physics major, so there's a good chance I just broke like ten rules of math. But I like to have an intuitive understanding of the math I'm using, and when I see a symbol like $$\frac{\partial f}{\partial \overline{z}}$$, this is what I think of. So, for me at least, it's nice to see how this perhaps non-rigorous understanding of the math fits into the overall picture. 74.15.138.87 (talk) 04:40, 7 January 2011 (UTC)
 * I'm not sure how well your idea of pretending that z and z* are different works. What if f is given by some arbitrary expressions for u(x,y) and v(x,y)? Then we couldn't say which x's and y's came from z and which came from z*. It might work OK in those cases where you can express f as a complex differentiable function of z and z*, if you add a factor of 1/2 to your definition as I suggest below (or at least it seemed to work in the few examples I worked out), but it still seems a bit shifty to me. –Henning Makholm (talk) 07:26, 7 January 2011 (UTC)
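For what it's worth, the "independent variables" intuition can be made precise by a change of variables (a standard computation, not something stated in the thread above). Writing $$x = \tfrac{1}{2}(z+\overline{z})$$ and $$y = \tfrac{1}{2i}(z-\overline{z})$$ and formally applying the chain rule with z and $$\overline{z}$$ treated as independent gives

$$\frac{\partial}{\partial \overline{z}} = \frac{\partial x}{\partial \overline{z}}\frac{\partial}{\partial x} + \frac{\partial y}{\partial \overline{z}}\frac{\partial}{\partial y} = \frac{1}{2}\frac{\partial}{\partial x} + \frac{i}{2}\frac{\partial}{\partial y}$$

and applying this operator to $$f = u + iv$$ yields $$\tfrac{1}{2}\left[\left(\tfrac{\partial u}{\partial x} - \tfrac{\partial v}{\partial y}\right) + i\left(\tfrac{\partial u}{\partial y} + \tfrac{\partial v}{\partial x}\right)\right]$$, i.e. exactly half the expression in the original definition.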


 * Okay, first beware that I've actually never seen the notation $$\partial f/\partial\overline{z}$$ before; I'm making this up as I go along! If you're doing this for the purpose of physics, it may be that it's all meant to be used for some kind of Hermitian-ish form and my suggestions are completely off. But:
 * Usually, $$df/dz$$ is only defined when $$f$$ satisfies the Cauchy-Riemann equations, and in that case your $$\partial f/\partial\overline{z}$$ would be identically zero. So I'm assuming that there is a $$\partial f/\partial z$$ to go along with it, such that both are somehow meaningful for a non-differentiable $$f$$.
 * My idea now is to go back to multivariate real analysis and look at the real functions u and v. Let's keep $$z_0$$ fixed and look at the differential
 * $$df = du + i\,dv = \frac{\partial u}{\partial x}dx + \frac{\partial u}{\partial y}dy + i\frac{\partial v}{\partial x}dx + i\frac{\partial v}{\partial y}dy$$
 * (from the definition of f, and the chain rule). Now, if we can write the left-hand side of this in the form
 * $$df = A(dx+i\,dy) + B(dx-i\,dy)$$
 * for some appropriate complex numbers A and B (which depend on $$z$$ but not on $$dx$$ and $$dy$$), then it would make some sense to call A and B $$\partial f/\partial z$$ and $$\partial f/\partial\overline{z}$$, respectively, because then the whole thing would look sort of like the chain rule. Expressing A and B in terms of the partial derivatives of u and v is a matter of simple (real) linear algebra. Calculate, calculate ... it turns out that B becomes half of your definition for $$\partial f/\partial\overline{z}$$. No matter; this just means that the pseudo-chain rule that works for your definitions will be
 * $$df = \frac{1}{2}\left[\frac{\partial f}{\partial z}dz + \frac{\partial f}{\partial\overline{z}}d\overline{z}\right]$$
 * which is not quite an unreasonable convention either, though it does have the strange consequence that $$\partial f/\partial z$$ is two times $$df/dz$$ when the latter is defined. Alternatively, perhaps there is a 1/2 in your notes that you forgot to copy?
 * Clearer now? –Henning Makholm (talk) 06:45, 7 January 2011 (UTC)
 * Yes, there was a missing 1/2 factor, and yes it is clear now. Thanks! 74.15.138.87 (talk) 15:29, 7 January 2011 (UTC)
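To tie the thread together: with the 1/2 factor restored, the definition agrees with a direct numerical computation of the two directional derivatives. A small sketch (my own check, not part of the original discussion; the helper name `wirtinger_dbar` is made up):

```python
def wirtinger_dbar(f, z, h=1e-6):
    """Numerically compute df/dzbar = (1/2)(df/dx + i*df/dy)
    via central differences in the real and imaginary directions."""
    dfdx = (f(z + h) - f(z - h)) / (2 * h)
    dfdy = (f(z + 1j * h) - f(z - 1j * h)) / (2 * h)
    return 0.5 * (dfdx + 1j * dfdy)

z0 = 1.3 - 0.7j

# f(z) = z * conj(z) = |z|^2: treating z and zbar as independent
# variables predicts df/dzbar = z, and the numerics agree.
d1 = wirtinger_dbar(lambda z: z * z.conjugate(), z0)
assert abs(d1 - z0) < 1e-6

# A holomorphic f satisfies the Cauchy-Riemann equations,
# so its dbar-derivative vanishes.
d2 = wirtinger_dbar(lambda z: z ** 3 + 2 * z, z0)
assert abs(d2) < 1e-6
```

The same two finite differences with a minus sign in place of the plus give the companion $$\partial f/\partial z$$.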

limit
how would I prove that $$\lim_{x\to\infty}\frac{x!}{n^x}=\infty$$ for any n, thereby proving that the factorial function grows faster than any exponential? Is this even true? 24.92.70.160 (talk) 02:43, 7 January 2011 (UTC)
 * See Factorial. Staecker (talk) 02:45, 7 January 2011 (UTC)


 * Basic idea: When you increase x by 1, the numerator increases by a factor of about x, whereas the denominator increases by a constant factor of n.  Eventually x is greater than n.  Work it out from there. --Trovatore (talk) 02:48, 7 January 2011 (UTC)


 * If part of your quandary is how to deal with the factorial, you might try converting Stirling's approximation into a bound on x!, and use that to derive the limit. Alternatively, you can substitute the Gamma function for the factorial, since $$\Gamma(x+1)=x!$$ extends the factorial to real arguments. -- 174.21.250.227 (talk) 03:06, 7 January 2011 (UTC)


 * I don't think that is any easier than keeping it as a discrete sequence and working directly from the definition. One easily sees that from a certain $$x$$ onwards, the sequence is strictly increasing, and it is then also easy for any $$N$$ to find an $$x$$ such that $$x!/n^x> N$$ (note that it is not necessary to be able to pinpoint the first such $$x$$). –Henning Makholm (talk) 03:22, 7 January 2011 (UTC)
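A quick numerical sketch of the step-ratio argument above (my own illustration, not from the thread), using exact rational arithmetic so nothing overflows or rounds:

```python
from math import factorial
from fractions import Fraction

n = 10  # base of the exponential; any fixed n behaves the same way

def ratio(x):
    """x! / n^x as an exact rational."""
    return Fraction(factorial(x), n ** x)

# Step recurrence: increasing x by 1 multiplies the ratio by (x+1)/n,
# which is > 1 as soon as x + 1 > n -- so the sequence eventually
# increases by a factor bounded away from 1 and grows without bound.
assert ratio(11) == ratio(10) * Fraction(11, n)

# The ratio dips while x < n, then blows up:
assert ratio(5) < 1          # 120 / 10^5
assert ratio(50) > 10 ** 14  # already astronomically large
```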

Question: what is the meaning of R superscript n, subscript ++?
Description of the question: In general in mathematics, R with superscript n and subscript + means the Cartesian space of real numbers of n dimensions (n coordinates). The subscript + indicates that all the values are ≥ 0. However, in the book Jehle, G. A. & Reny, P. J. (2009), Advanced Microeconomic Theory, 2nd ed., low price ed., Pearson Education, on page 36 the notation R superscript n, subscript ++ has been used. The meaning of this new notation is not clear. Please help. —Preceding unsigned comment added by 218.248.80.62 (talk) 11:48, 7 January 2011 (UTC)


 * $$\mathbf{R}^n_{++}$$ is like $$\mathbf{R}^n_{+}$$, but the coordinates are required to be strictly positive, i.e. all of them greater than zero. Pallida  Mors  14:01, 7 January 2011 (UTC)
 * The notation may sound strange for other areas, but it is more or less widespread in Mathematical Economics. $$\mathbf{R}^n_{++}$$ is sometimes called the strictly positive orthant, see for instance this source, page 2. Pallida  Mors  14:11, 7 January 2011 (UTC)
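In code terms the distinction is just zeros-allowed versus strictly positive coordinates; a tiny sketch (the predicate names are my own, not standard):

```python
def in_nonneg_orthant(v):
    """Membership in R^n_+: every coordinate >= 0."""
    return all(t >= 0 for t in v)

def in_strict_orthant(v):
    """Membership in R^n_++: every coordinate strictly > 0."""
    return all(t > 0 for t in v)

v_boundary = (0.0, 2.0)   # on the boundary: in R^2_+ but not in R^2_++
v_interior = (0.5, 2.0)   # in both

assert in_nonneg_orthant(v_boundary) and not in_strict_orthant(v_boundary)
assert in_strict_orthant(v_interior)
```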

distance measure
Hi. I have two vectors, $$(x_1,\ldots,x_n)$$ and $$(y_1,\ldots,y_n)$$, that sum to one: $$\sum x_i=\sum y_i=1$$. All elements are non-negative. I need to define a "distance" between these and I am sure that there is a better way than just $$\sum(x_i-y_i)^2$$. The correct term is eluding me. Anyone? Robinh (talk) 15:50, 7 January 2011 (UTC)
 * There are many ways of defining distances between vectors. Which is best depends on the situation. What are you trying to do with these vectors and this distance? Algebraist 15:55, 7 January 2011 (UTC)
 * (edit conflict) The term you want is probably metric. I can't say specifically what metric would be "better" than the standard Euclidean metric—it depends on what you're going to use it for. —Bkell (talk) 15:56, 7 January 2011 (UTC)
 * I think you want vector cosine - a common and efficient method of defining the distance between two vectors with the same number of elements, all between -1 and 1. -- k a i n a w ™ 15:57, 7 January 2011 (UTC)
 * (e/c) Obvious choices are the Lp-norms ($$(\sum_i|x_i-y_i|^p)^{1/p}\,$$ for 1 ≤ p < ∞, $$\max_i|x_i-y_i|\,$$ for p = ∞). There is no telling what is "better" unless you specify a bit more what kind of application you have in mind.—Emil J. 15:59, 7 January 2011 (UTC)
 * thanks guys. The context is Dirichlet distribution, but vector cosine takes me to Hamming distance, which is more-or-less what I want (most of the elements of the vector are zero).  Cheers, Robinh (talk) 16:04, 7 January 2011 (UTC)


 * I suggested cosine instead of Hamming because Hamming will give you a headache if you have values that are not 0 or 1. Cosine will give the same results as Hamming for binary (0/1) values, but a more accurate result for a collection of values between 0 and 1. -- k a i n a w ™ 16:08, 7 January 2011 (UTC)
 * Thanks for this. I'll use both and report back. Best wishes, Robinh (talk) 16:11, 7 January 2011 (UTC)
 * Kainaw, I don't understand what you mean when you say, "Cosine will give the same results as Hamming for binary (0/1) values, but a more accurate result for a collection of values between 0 and 1." The vector cosine will always be a real number between −1 and 1, whereas the Hamming distance will always be a nonnegative integer. —Bkell (talk) 16:18, 7 January 2011 (UTC)


 * Indeed, I'm not sure the concept of Hamming distance has any meaning at all in the context of real-valued vectors, other than being generalized into one of the Lp norms. -- The Anome (talk) 16:45, 7 January 2011 (UTC)


 * I didn't mean to imply it will give the same "value". I meant the same "result" as in a general idea of distance. To be more specific, if the vectors are binary 0/1 values, vector cosine produces a Jaccard index (or a Tanimoto coefficient, depending on exactly how you implement it). Jaccard index is, in general, a measure of how many elements between the vectors are the same. Hamming distance is also a measure of how many elements between the vectors are the same. So, the result is the same in concept even though the exact value will be different. -- k a i n a w ™ 17:19, 7 January 2011 (UTC)

$$\sqrt{\sum(x_i-y_i)^2}$$ is the most common definition of distance. Why shouldn't it be good enough? Bo Jacoby (talk) 21:56, 7 January 2011 (UTC).
 * (OP). Well, none of the suggestions "use" the fact that the total of the elements is unity, nor the fact that each element is non-negative. I have a tip-of-the-tongue feeling that there is a distance measure out there that "uses" these features of my vectors, and that has found applications in statistics. I'm sure that the distance measure I'm thinking of has some nice properties in the context of the Dirichlet distribution... but I just can't remember what it's called. I have a vague sense that it's someone's name. Smith's distance? The Jones distance? thanks everyone, Robinh (talk) 22:45, 7 January 2011 (UTC)

"In the context of the Dirichlet distribution". The natural measuring stick for a random variable is the standard deviation, which is proportional to $$\sqrt{x(1-x)}$$, so you may like to use $$\sqrt{\sum(\phi(x_i)-\phi(y_i))^2}$$ as a measure of distance between x and y, where $$\phi(x)=\frac{\arccos(1-2x)}{\pi}$$. Bo Jacoby (talk) 11:15, 8 January 2011 (UTC).

What does it mean to raise big-O to a power?
I apologise if this question is foolish, but I am rather confused. In our article Time complexity, the table at the start claims that polynomial time is $$2^{O(\log n)}$$. In the text of the article, it says that polynomial time is $$O(n^k)$$. Now, if someone asked me (as someone just did, which led me to the article to check I was right) I would have defined polynomial time as $$O(n^k)$$. But what on earth does $$2^{O(\log n)}$$ mean? O(...) is a measure of complexity, not a number; how do you raise it to a power? And why does the article say both P = $$2^{O(\log n)}$$ and P = $$O(n^k)$$, without any explanation as to the difference? Marnanel (talk) 18:08, 7 January 2011 (UTC)
 * "f(n)=2O(log n)" means there exists a function g such that g(n)=O(log n) and f(n)=2g(n). This is indeed equivalent to being O(nk) for some k. I don't know why the table uses one form rather than the other. Algebraist 18:15, 7 January 2011 (UTC)
 * I'd say $$2^{O(\log n)}$$ is unnecessarily confusing. If the idea is to have a single expression instead of a union like $$\bigcup_kO(n^k)$$, then the fairly common notation $$n^{O(1)}$$ is simpler and easier to understand.—Emil J. 18:26, 7 January 2011 (UTC)


 * If I may (not an expert) offer a counterexample, $$g(n)=a\log n + b\log(\log n)$$, which is still $$O(\log n)$$. Then $$f(n)=2^{g(n)}=n^a(\log n)^b$$, which is definitely not $$O(n^k)$$ (unless we redefine O in this problem). Therefore the two are not equivalent. SamuelRiv (talk) 07:59, 10 January 2011 (UTC)
 * How is $$n^a(\log n)^b$$ "definitely not" $$O(n^k)$$? By the definitions I'm familiar with, it is $$O(n^{a+\epsilon})$$ for any $$\epsilon>0$$. –Henning Makholm (talk) 12:00, 10 January 2011 (UTC)


 * Okay, now that I actually looked at the article, the confusion is clear. The reason why the odd expression for polynomial time is there is to include things like $$n^{\log n}$$, as illustrated in the table. The definition $$O(n^k)$$ describes the upper bound of a P algorithm, whereas the former is an exact order that gives the actual efficiency (not a bound). For those purposes, $$O(n^{\log n})$$ or its variants cannot be adequately described by $$O(n^{1+\epsilon})$$ for some $$\epsilon>0$$. SamuelRiv (talk) 00:30, 13 January 2011 (UTC)
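One way to sanity-check the claim that a polylog factor is absorbed by any extra $$n^\epsilon$$ is to compare logarithms, since the raw values overflow floating point (a sketch of my own, not from the thread):

```python
from math import log

a, b, eps = 2.0, 3.0, 0.1

# Compare the logs of n^a * (log n)^b and n^(a + eps), writing t = log n:
# a*t + b*log(t) versus (a + eps)*t, so the question is whether
# b*log(t) <= eps*t, which holds for all sufficiently large t.
def log_polylog(t):
    return a * t + b * log(t)

def log_poly(t):
    return (a + eps) * t

assert log_polylog(10.0) > log_poly(10.0)    # small n: the polylog factor wins
assert log_polylog(500.0) < log_poly(500.0)  # large n: n^eps dominates for good
```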

Inverse Function Theorem
I have been reading Spivak's Calculus on Manifolds, and just got through the proof of the inverse function theorem. The statement is as follows:


 * Let a ∈ U ⊆ $$\mathbf{R}^n$$, where U is open, and let f: U → $$\mathbf{R}^n$$ be continuously differentiable. Assume that f ′(a) is invertible. Then f defines a bijection of some open neighbourhood V of a onto an open neighbourhood W of f(a), and V and W can be chosen so that $$f^{-1}$$ is differentiable on W.

I have been over this several times, and it appears to me that it is only necessary in the proof to assume that f is differentiable on U, and f ′ is continuous at a. My question is, am I correct?

If not, please give a counterexample or a reference. If so, please give a reference that states the result in at least that generality. 86.205.29.53 (talk) 20:02, 7 January 2011 (UTC)
 * Your version is true. Here is a reference. Algebraist 21:53, 7 January 2011 (UTC)
 * Hello. Unfortunately, I can't get the Preview to work on Google Books. However, now I'll know where to look! Thank you very much. 86.205.29.53 (talk) 23:24, 7 January 2011 (UTC)
 * It's working now. 86.205.29.53 (talk) 23:33, 7 January 2011 (UTC)
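As a concrete companion to the statement (my own toy example, not anything from Spivak): at a point where f ′(a) is invertible, the local inverse the theorem guarantees can actually be computed by Newton iteration.

```python
# Toy map f(x, y) = (x + y^2, x*y + y); at a = (1, 0.5) its Jacobian
# [[1, 1], [0.5, 2]] has determinant 1.5, so f'(a) is invertible.
def f(p):
    x, y = p
    return (x + y * y, x * y + y)

def jacobian(p):
    x, y = p
    return ((1.0, 2 * y), (y, x + 1))

def solve2(J, r):
    """Solve the 2x2 system J * d = r by Cramer's rule."""
    (a, b), (c, d) = J
    det = a * d - b * c
    return ((d * r[0] - b * r[1]) / det, (a * r[1] - c * r[0]) / det)

def local_inverse(w, guess, steps=20):
    """Newton iteration for p with f(p) = w, started near the target."""
    p = guess
    for _ in range(steps):
        fp = f(p)
        dx, dy = solve2(jacobian(p), (fp[0] - w[0], fp[1] - w[1]))
        p = (p[0] - dx, p[1] - dy)
    return p

a0 = (1.0, 0.5)
w = f(a0)                              # the point f(a) in W
p = local_inverse(w, (0.9, 0.6))       # recover a from f(a)
assert abs(p[0] - a0[0]) < 1e-8 and abs(p[1] - a0[1]) < 1e-8
```

Starting the iteration far from a may converge to a different preimage (or not at all), which is exactly why the theorem is local: the bijection is only between suitable neighbourhoods V and W.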