Wikipedia:Reference desk/Archives/Mathematics/2008 February 29

= February 29 =

Pretty
What are some of the most visually appealing bits of mathematics you can think of? Geometric, algebraic, anything. So far, I've been able to find a number of well-known fractals (Mandelbrot, Julia, Lorenz, Apollonius, Sierpinski, Menger, Koch, Weierstrass, Newton, Peano, Hilbert, Klein) and general methods for generating others, irregular tilings of the plane with colored shapes, the Poincaré disk and variations on it, colored graphs of algebraic varieties and renderings of 2-manifolds, impossible shapes in the style of Escher, a few things evocative of the infinite (the Hilbert Hotel, power series), a number of pretty symbols and formulas (always looking to expand), the Platonic and Archimedean solids, and a few things that would only work with animation and several more dimensions. If anyone could add detail to one of those things, or even better give me something totally new to include, I'd be forever grateful. Black Carrot (talk) 04:59, 29 February 2008 (UTC)


 * This may not be the right thing, but I've found some beautiful YouTube videos, such as:


 * http://www.youtube.com/watch?v=JX3VmDgiFnY -- Mobius transformations on the plane and the Riemann sphere


 * http://www.youtube.com/watch?v=J-fcRzvRBqk -- 3D Fractal - Menger sponge


 * http://www.youtube.com/watch?v=SBxBQYmPIM4 -- The E8 Lie Group


 * http://www.youtube.com/watch?v=d1Vjsm9pQlc -- The Alexander Sphere


 * Visualising mathematics is so important, and I seriously believe it could get a lot more people interested in it. I'm by no means a mathematician, but my understanding of most concepts I have learnt so far has come from seeing them visualised in some way. Damien Karras (talk) 07:01, 29 February 2008 (UTC)


 * Fractal stellated dodecahedra: http://www.clowder.net/hop/Keplrfrct/Keplrfrct.html (scroll down; there is some more of this kind of thing on this person's (David Hop's) page). 87.102.38.45 (talk) 12:12, 29 February 2008 (UTC)
 * There are also procedural formulas that give a very good appearance of wood grain in a plane; they look nice (provided you choose a nice palette). 87.102.38.45 (talk) 12:12, 29 February 2008 (UTC)
 * I remember someone left their doctoral thesis on the web, about filling hemispheres with smaller spheres; that, to me, was 'pretty'. I can't find it as yet, but in the meantime look at Apollonian sphere packing. 87.102.38.45 (talk) 12:19, 29 February 2008 (UTC)
 * Cellular automata, e.g. Conway's Game of Life, can be visually entertaining. 87.102.38.45 (talk) 12:32, 29 February 2008 (UTC)
 * How about the Koch sphereflake (a 3D version of the Koch snowflake)? A math-wiki (talk) 12:37, 1 March 2008 (UTC)
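Conway's Game of Life, mentioned above, is easy to animate. Below is a minimal sketch of one update step; the wrap-around (toroidal) boundary, the 8×8 grid size, and the glider seed are arbitrary choices for this illustration, not anything specified in the thread.

```python
import numpy as np

def life_step(grid):
    """One Game of Life step on a toroidal (wrap-around) grid."""
    # Count the eight neighbours of every cell via shifted copies of the grid.
    nbrs = sum(np.roll(np.roll(grid, dy, axis=0), dx, axis=1)
               for dy in (-1, 0, 1) for dx in (-1, 0, 1)
               if (dy, dx) != (0, 0))
    # Birth on exactly 3 neighbours; survival on 2 or 3.
    return ((nbrs == 3) | ((grid == 1) & (nbrs == 2))).astype(int)

grid = np.zeros((8, 8), dtype=int)
for r, c in [(1, 2), (2, 3), (3, 1), (3, 2), (3, 3)]:  # a glider
    grid[r, c] = 1
for _ in range(4):  # a glider moves one cell diagonally every 4 steps
    grid = life_step(grid)
print(grid.sum())  # a glider always consists of exactly 5 live cells
```

Printing the grid each step (rather than just the live-cell count) shows the glider crawling across the torus.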

Learning maths from scratch
I always want to learn maths all over again, but I don’t know how to start. Now I’m an undergraduate; I “parted” with everything science-related after high school, and I have done almost nothing more than very simple statistics (like the Poisson distribution, though I don’t really remember it) and calculus (I don’t remember how to differentiate). So, if I want to know “advanced maths”, like what is being taught at university, what sorts of books (preferably in English) should I use (as an adult learner)?

My “junior high school” maths covered factorizing, polynomials, simple geometry (calculating angles) and so on. Which areas do high-school maths (I mean, for learning science subjects) and university maths cover? I’m looking for books for learning things more advanced than my present level, and I don’t want to concentrate only on calculus or statistics (or the like). Any suggestions? --Fitzwilliam (talk) 14:10, 29 February 2008 (UTC)


 * From my experience, university maths tends to recap all you learnt at school in the first year - try just going along to the first year maths lectures. -mattbuck (Talk) 14:39, 29 February 2008 (UTC)
 * Especially maths lectures targeted at science students, rather than maths students. My Maths department (at a UK uni; I know it's a little different elsewhere) offers a module called "Mathematics for Scientists and Engineers" which covers a wide range of mathematical topics at a fairly basic level (by uni standards) without assuming much (if any) prior knowledge. If there are similar modules at your uni, those would be the ones to go to. --Tango (talk) 16:11, 29 February 2008 (UTC)


 * I'd go to a Maths Department. They love to find new students. Imagine Reason (talk) 23:15, 29 February 2008 (UTC)


 * What is your motivation? Do you want to learn advanced maths primarily because you feel it will be useful, or is it more for fun? If you have forgotten how to differentiate, maybe you should take a course in calculus (actually analysis) anyway, using a text that does not just give the rules and formulas but also precise definitions of concepts like the real numbers, limit, and continuity, and rigorous proofs. Other topics you may study that don't immediately require much prior knowledge are linear algebra and projective geometry. Also consider elementary number theory and combinatorics. A nice book is Concrete Mathematics; although aimed at hopeful computer scientists, it is also quite valuable for mathematicians. I'd also advise you not to go immediately very deep into one field of maths, but to first build up a fairly broad basic knowledge of various fields. Much of the more advanced stuff in maths requires some knowledge of other fields. --Lambiam 00:26, 1 March 2008 (UTC)

Regression Question
I have a dataset and I've just run a regression on it using R. I have a model which is predicting well, but I don't know how to tell from the results whether the relationship between the two variables ("part" is the DV, "conf" is the IV) is inverse or not. I've been doing a lot of research, and everything I've come across has only shown me regression equations where it's quite easy to see the direction of the gradient, but it's not so clear from the output of R. Here's the model:

Generalized linear mixed model fit using Laplace
Formula: part ~ 1 + conf + (1 | id) + (1 | word)
   Data: align
 Family: binomial(logit link)
   AIC   BIC logLik deviance
 490.1 507.7 -241.1    482.1
Random effects:
 Groups Name        Variance Std.Dev.
 word   (Intercept) 2.977549 1.72556
 id     (Intercept) 0.097771 0.31268
number of obs: 601, groups: word, 32; id, 20

Estimated scale (compare to 1 )  0.9142627

Fixed effects:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.1001     0.4014   -7.724 1.13e-14 ***
conf         1.7941     0.2734    6.563 5.29e-11 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Correlation of Fixed Effects:
     (Intr)
conf -0.505

So can anybody tell me where I should be looking to find out the direction of the correlation (the data is too cloudy to do it simply by studying a graph)? --Kiltman67 (talk) 14:52, 29 February 2008 (UTC)

Edit: The format of R's outputs hasn't fitted in very nicely, but to my eye it's still readable. If anyone has any problems I'm willing to alter it. --Kiltman67 (talk) 14:54, 29 February 2008 (UTC)


 * I don't understand the question. By "inverse relationship", do you mean a negative correlation (as one goes up, the other goes down)? Does the program not output correlation coefficients? What do you mean by "Laplace Formula"? Apparently not Laplace's formula. The use of the term "gradient" suggests that you may be referring to Laplace's method, but it is not clear how that relates to regression analysis (which I assume you are attempting to do). I cannot interpret the formula "part ~ 1 + conf + (1 | id) + (1 | word)"; what are the meanings of "~" and "|" here, and what are "id" and "word"?  --Lambiam 00:40, 1 March 2008 (UTC)


 * If I understand your idea and the output correctly, the program fits
 * $$\operatorname{logit}(part)=-3.1001 + 1.7941\,conf.$$
 * In line with Lambiam's comment, that would mean a positive relationship between part and conf. Have we interpreted your question correctly? Pallida  Mors  18:28, 1 March 2008 (UTC)


 * That is indeed what I mean by an inverse relationship.


 * It's a mixed-effects logistic regression. The "Laplace" part is just part of the output and seems to vary depending on what data you enter. When I ran the regression, the only argument I specified was that the family should be binomial; I assume everything else is just the default.


 * In the formula, ~ is the symbol that R uses in place of = in model formulas. As for the rest, it's probably easiest explained if I tell you what the regression is about. I'm trying to find factors that lead to people using unusual synonyms (part is their response); conf is one of the factors I'm exploring. The type of regression I'm using allows you to include baselines: (1|id) + (1|word) says that different people (id) will be more or less likely to use unusual synonyms, and that different words (word) vary in the degree to which people will use an unusual synonym. --Kiltman67 (talk) 03:15, 2 March 2008 (UTC)


 * Can you show your actual R code? What package are you using? Last time I tried to do this in R, it was just lme that would fit these models, and it was awful in terms of interpreting the output. Also, I would bet the error model does not change the result much; you could just run lm to get a general idea with more familiar output. Pdbailey (talk) 04:28, 6 March 2008 (UTC)
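The direction can be read straight off the sign of the conf estimate in the fixed-effects table: the coefficients live on the logit (log-odds) scale, so a positive slope means the predicted probability of part rises with conf. A small sketch using the two quoted estimates (the conf values 0 and 2 below are arbitrary illustrations, not values from the dataset):

```python
import math

# Fixed effects quoted from the R output: intercept -3.1001, slope 1.7941.
# These are on the log-odds scale of the binomial(logit) model.
def predicted_prob(conf, intercept=-3.1001, slope=1.7941):
    eta = intercept + slope * conf       # linear predictor (log-odds)
    return 1.0 / (1.0 + math.exp(-eta))  # inverse logit

print(predicted_prob(0.0))  # ≈ 0.043
print(predicted_prob(2.0))  # ≈ 0.62: probability rises because the slope is positive
```

The random effects shift the intercept per person and per word but do not change the sign of the conf slope, so this reading of the direction carries over to the mixed model.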

Teaching the adjoint method of calculating the gradient
I'm lecturing next week, and one of the topics is the adjoint method of gradient calculation. The audience is graduate students from a wide array of backgrounds, the most common of which is engineering. The level of mathematical knowledge varies greatly. They've been introduced to the Lagrangian but haven't explored it in any depth. I would like some help in figuring out how to present the adjoint method. I would like to be mathematically accurate without being too complicated, and if possible, would like to give the students some kind of intuition. I can't do anything horribly mathematically incorrect or the more mathematically-educated will be confused, and I can't do too much math without leaving the less mathematically-educated behind. This class is more aimed at people using the methods, rather than writing them, so they need to understand advantages/disadvantages and a basic idea of how it works. I also only have a few minutes for this topic.

Here's one way to present it:

The ultimate goal is to solve an optimization problem:

$$ \min_{\mathbf{x}} J(\mathbf{x,u}) $$

subject to $$ \mathbf{R}(\mathbf{x,u})=\mathbf{0}$$

where J is a scalar objective function (for example, drag), $$\mathbf{x} \in \R^{n} $$ is a vector of design variables (for example, defining the shape of a wing), $$\mathbf{u} \in \R^{k} $$ is a vector of state variables (for example, flow variables), and $$\mathbf{R} \in \R^{m} $$ is a vector of constraints, often defining a system of PDEs (for example, the Navier-Stokes equations). The method we're using to solve this problem is a gradient-based optimization algorithm, so we need the gradient of the objective function with respect to the design variables $$\frac{dJ}{d\mathbf{x}}$$.

Define a vector $$ \mathbf{\psi} \in \R^m $$ that we will call the adjoint vector. Form the quantity

$$I(\mathbf{x,u})=J(\mathbf{x,u})-\mathbf{\psi}^T \mathbf{R}(\mathbf{x,u})$$.

Now take a differential of each side of this equation:

$$\delta I= \frac{\partial J}{\partial \mathbf{x}}^T\delta \mathbf{x} + \frac{\partial J}{\partial \mathbf{u}}^T\delta \mathbf{u}- \mathbf{\psi}^T \left( \frac{\partial \mathbf{R}}{\partial \mathbf{x}}^T\delta \mathbf{x} + \frac{\partial \mathbf{R}}{\partial \mathbf{u}}^T\delta \mathbf{u} \right) $$.

This must vanish (um, why?). Rearranging,

$$\delta I= \left( \frac{\partial J}{\partial \mathbf{x}}^T - \psi^T\frac{\partial \mathbf{R}}{\partial \mathbf{x}}^T\right) \delta \mathbf{x} + \left(\frac{\partial J}{\partial \mathbf{u}}^T- \psi^T \frac{\partial \mathbf{R}}{\partial \mathbf{u}}^T \right) \delta \mathbf{u}=0$$.

We have specified nothing about $$\psi$$ so far, and are free to choose it as we like. We choose it such that:

$$ \frac{\partial J}{\partial \mathbf{u}}^T- \psi^T \frac{\partial \mathbf{R}}{\partial \mathbf{u}}^T = 0 $$

or

$$\frac{\partial \mathbf{R}}{\partial \mathbf{u}} \psi = \frac{\partial J}{\partial \mathbf{u}}$$

which we are going to call the adjoint equation.

We can then get the gradient of J with respect to x by first solving the adjoint equation and then using

$$ \frac{\partial J}{\partial \mathbf{x}} = \frac{\partial \mathbf{R}}{\partial \mathbf{x}} \psi $$

You can see I'm confused here because I ended up with a partial derivative when I thought we were looking for the total derivative.

The upshot of this method is that if the original set of constraints is a PDE, so is the adjoint equation, and it takes a similar amount of time to solve. So the time for the gradient calculation is independent of n, as opposed to a finite difference calculation. —Preceding unsigned comment added by Moink (talk • contribs) 15:47, 29 February 2008 (UTC)
 * I hope this answer doesn't come too late for you to use it! It took a while for me to see the problem.  Your mistake is assuming that $$\delta I=0$$: that is true at the actual solution (call it $$(\mathbf x^*,\mathbf u^*)$$), but not elsewhere.  What is true is that $$\frac{dI}{d\mathbf x}=\frac{dJ}{d\mathbf x}$$ whenever the constraint is satisfied, because along the constraint's zero surface $$R(\mathbf x,\mathbf u)\equiv 0$$ by definition and $$\frac{dR}{d\mathbf x}\equiv 0$$ because the definition of $$\frac d{d\mathbf x}$$ here is "movement along the constraint with a speed of 1 in each x direction, with whatever resulting speed in u directions".  (You may have been perfectly clear on that last point; your exposition didn't mention it explicitly.)
 * Of course, $$\left.\frac{dJ}{d\mathbf x}\right|_{\mathbf x=\mathbf x^*}=0$$, so the same applies for $$\delta I=dI$$ (I see no reason to invoke the variational derivative here). However, we're not solving that equation!  This is optimization, not algebra, and what we want is an expression for $$\frac{dI}{d\mathbf x}$$ valid everywhere on the constraint surface (because it is equal to J's derivative there, which is what we really want).  Of course, we choose I to contain a Lagrange multiplier which we can choose to make $$\frac{\partial I}{\partial\mathbf u}\equiv 0\rightarrow\frac{dI}{d\mathbf x}=\frac{\partial I}{\partial\mathbf x}$$, which is easier to evaluate (typically because $$n\ll m$$, I believe).  You solve the adjoint equation as you gave it, and then have $$\frac{dJ}{d\mathbf x}=\frac{\partial J}{\partial\mathbf x}-\frac{\partial R}{\partial\mathbf x}\psi$$; again, valid only on the constraint surface, but you're always on it.  --Tardis (talk) 18:55, 6 March 2008 (UTC)


 * Thanks! My supervisor, who taught it last year, explicitly called the adjoint a vector of Lagrange multipliers, and used the KKT conditions to get the adjoint equation and another equation to be used by the optimizer.  This didn't make sense to me, because adjoints are used to get the gradient everywhere, not just at the optimum, and are used for things other than optimization.  So while an adjoint has something in common with Lagrange multipliers (and, I think, they should be equal at the optimum solution), it is not the same thing.  I eventually did figure out that it's because R is stationary that we can do that manipulation, and that's how I taught it.  I'm not sure that the students got it, but that's ok, if they need to use it one day, they've heard of its existence and can look it up.  moink (talk) 20:08, 6 March 2008 (UTC)


 * Oh, and I was wrong about the equation for the total derivative. It's actually that:


 * $$\frac{dJ}{d\mathbf{x}} =\frac{\partial J}{\partial \mathbf{x}} + \frac{\partial \mathbf{R}}{\partial \mathbf{x}} \psi $$


 * The real way to present this is not even to define I, but instead just to say that since delta-R (too lazy for Latex!) is always zero, you can add any multiple of it to an expression without changing the value. So expand delta-J the way I did, add psi times delta-R (which is zero), define the adjoint, cross out the terms corresponding to delta-0, and tada.  moink (talk) 20:13, 6 March 2008 (UTC)
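The adjoint recipe can be sanity-checked numerically on a toy problem. Everything below is invented for illustration (scalar state equation R(x, u) = u - x², objective J = x + u²); it uses Tardis's form $$\frac{dJ}{d\mathbf x}=\frac{\partial J}{\partial\mathbf x}-\frac{\partial R}{\partial\mathbf x}\psi$$, which follows from I = J − ψᵀR on the constraint surface:

```python
# Toy adjoint-gradient check (hypothetical problem, not from the thread).
# State equation: R(x, u) = u - x**2 = 0, so u(x) = x**2 on the constraint.
# Objective: J(x, u) = x + u**2, hence J(x) = x + x**4 and dJ/dx = 1 + 4 x**3.

def solve_state(x):
    return x ** 2  # solve R(x, u) = 0 for the state u

def adjoint_gradient(x):
    u = solve_state(x)
    dJ_dx, dJ_du = 1.0, 2.0 * u   # partials of the objective at (x, u)
    dR_dx, dR_du = -2.0 * x, 1.0  # partials of the constraint at (x, u)
    psi = dJ_du / dR_du           # adjoint equation: dR_du * psi = dJ_du
    # Total derivative on the constraint surface (I = J - psi*R convention):
    return dJ_dx - psi * dR_dx

x = 1.5
print(adjoint_gradient(x))  # 14.5, matching the analytic 1 + 4 * 1.5**3
```

In a real PDE-constrained problem, solve_state and the adjoint solve are each a PDE solve, which is exactly why the cost of the gradient is independent of the number of design variables n.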

The logistic map
How would I go about showing that the logistic map is chaotic when r = 4? I've tried to show sensitivity to initial conditions, using the solution:

$$ x_n = \frac{1}{2} \lbrace 1 - \cos[2^n \arccos(1-2x_0) ] \rbrace $$

(from http://mathworld.wolfram.com/LogisticMap.html)

...but I'm not having much luck. I'm looking for something a bit more analytic than a cobweb diagram. I considered expressing $$x_0$$ in binary, so that the multiplication by $$2^n$$ would just shift the binary point, but I'm not sure how helpful an avenue that would be to go down, or even how to go about showing it; apart from that, I'm at a loss. Any help would be great, thanks, 81.102.34.92 (talk) 17:36, 29 February 2008 (UTC)


 * Binary expansion is a good idea, but not applied to x0. Define
 * $$z_n = \frac{1}{2\pi}\arccos(1-2x_n)\,,$$
 * and consider how the binary expansion of $$z_n$$ relates to that of $$z_0$$.
 * --Lambiam 23:59, 29 February 2008 (UTC)


 * Ahah, thanks - I take it I was aiming for $$z_n = 2^n z_0$$? I can see how that would help show the sensitivity to initial conditions, since two numbers differing only after, say, the millionth place would eventually (after enough iterations) lead to completely different $$z_n$$ values - is that how to think of it? What domain is $$z_n$$ in, though? Given the image of arccos is [0,π], the expression for $$z_n$$ suggests that the domain would be [0,0.5] - but the continuous doubling of $$z_0$$ to get each $$z_n$$ would take it outside this interval at some point, wouldn't it? Does that not really matter?
 * Do I have to say something about the continuity of the cosine function, to make it clear that the very small difference in the two $$z_n$$ values produces a very small difference in the corresponding $$x_n$$ values, given how I've rearranged your expression for $$z_n$$? Thanks a lot, that's really helped, 81.102.34.92 (talk) 16:48, 1 March 2008 (UTC)


 * To show sensitivity to initial conditions, formally all you need to do is consider $$\frac{dx_n}{dx_0}$$; the transformation to angles, in which you can see you double the angle each iteration step, gives insight. The domain you choose for arccos is not terribly relevant, since to get to $$x_n$$ values you apply the cos function anyway, but conceptually the easiest is to take [0,1) – and accept that the representation is not unique. $$x_n$$ is non-negative if the two bits of $$z_n$$ following the "binary point" are the same, and non-positive otherwise. (If $$z_0$$ is irrational, you can replace "non-negative" and "non-positive" by the stronger "positive" and "negative", respectively, since then $$x_n$$ can't be 0.) This means that, considering the sequence $$s_n = \operatorname{sign}\,x_n$$, you can get any desired sequence $$(s_n)_n$$, like e.g. ++−+−++−−−+−−−−−+++−..., by choosing $$z_0$$ right, for this case .000110001011010100001... if I did this correctly. So knowing the signs of $$x_0$$ through $$x_{1000}$$ doesn't tell you anything about $$x_{1001}$$.  --Lambiam 22:25, 1 March 2008 (UTC)
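Both points in this thread are easy to see numerically: the MathWorld closed form matches direct iteration of the map, and a tiny perturbation of the seed is roughly doubled (in the angle variable) every step until the two orbits bear no resemblance to each other. The seed 0.3 and the perturbation size below are arbitrary choices of this sketch:

```python
import math

def iterate(x, n, r=4.0):
    """Iterate the logistic map x -> r x (1 - x) n times."""
    for _ in range(n):
        x = r * x * (1.0 - x)
    return x

def closed_form(x0, n):
    """x_n = (1 - cos(2^n arccos(1 - 2 x_0))) / 2, valid for r = 4."""
    return 0.5 * (1.0 - math.cos(2.0 ** n * math.acos(1.0 - 2.0 * x0)))

x0 = 0.3
for n in (1, 5, 10):
    print(n, iterate(x0, n), closed_form(x0, n))  # the two agree closely

# Sensitivity: a 1e-12 difference in the seed is amplified roughly 2x per
# step, so after a few dozen iterations the orbits have decorrelated.
print(abs(iterate(x0, 50) - iterate(x0 + 1e-12, 50)))
```

For large n the closed form itself succumbs to the same sensitivity: the floating-point rounding of $$2^n \arccos(1-2x_0)$$ grows just as the iteration error does, which is the chaos showing up in the arithmetic.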