Wikipedia:Reference desk/Archives/Mathematics/2007 May 24

= May 24 =

Anybody from Research?
Hi! I hope this is the right place to post this question. I have time series data for daily stock returns. I want to find out whether their volatility (as measured by the standard deviation) increased during a particular event. My initial idea was to calculate the S.D. of returns over a few weeks surrounding that event and compare it to the overall S.D. to see whether there is a difference. Is that workable? If not, what would be a better way? Please suggest. Thanks.--202.52.234.140 07:21, 24 May 2007 (UTC)
 * You could replace the data from during the event with "dummy" data and compare the SD of that time series with the original one. The "dummy" data could be based on some sort of moving average that wouldn't significantly alter the SD, so that any change in SD can be attributed entirely and directly to the event. Zain Ebrahim 08:35, 24 May 2007 (UTC)


 * One word of caution: if the hypothesis of increased volatility was prompted by looking at the same data that will be used for the test, then no test can establish significance, because of irreparable observer bias.
 * If you use the overall s.d. for comparison, you may be diluting the effect (if real) too much. It is probably better to take a smaller segment, for about as long as a notable effect of the event would be expected to persist (but not too short because of the customary √(n/(n−1)) correction counteracting the s.d. bias in small samples). --Lambiam Talk  11:24, 24 May 2007 (UTC)
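A minimal sketch of the comparison suggested above, assuming the returns are in a plain Python list and that both the event window and a pre-event baseline window are fixed in advance (the helper name `volatility_ratio` and its parameters are hypothetical). It reports the two sample standard deviations and their squared ratio. That ratio would follow an F distribution with (n_event − 1, n_baseline − 1) degrees of freedom only if returns were independent and normally distributed, an assumption daily stock returns frequently violate.

```python
import statistics

def volatility_ratio(returns, event_start, event_end, baseline_len):
    """Compare the sample s.d. inside an event window with a pre-event baseline.

    Hypothetical helper: the event window is returns[event_start:event_end]
    (half-open), and the baseline is the baseline_len observations before it.
    """
    event = returns[event_start:event_end]
    baseline = returns[max(0, event_start - baseline_len):event_start]
    if len(event) < 2 or len(baseline) < 2:
        raise ValueError("each window needs at least two observations")
    # statistics.stdev divides by n - 1, so it already includes the
    # sqrt(n/(n-1)) small-sample correction mentioned above.
    sd_event = statistics.stdev(event)
    sd_base = statistics.stdev(baseline)
    # Variance ratio (F statistic); its reference distribution assumes
    # independent, normally distributed returns.
    f_stat = (sd_event / sd_base) ** 2
    return sd_event, sd_base, f_stat
```

Choosing a baseline of comparable length, rather than the whole series, follows the point about not diluting the effect.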


 * Trying to detect an occasional change in some statistic of time-series data is intrinsically difficult, because you want to know both the magnitude (and genuineness) of the change and when it occurs. That's why statistical control charts were devised, and I suggest you consider something of the sort. Assuming that the "current" SD can be estimated (from the most recent n readings, where n is a compromise between over- and under-sensitivity), an ongoing plot against time could show a change. The obvious plot would be SD versus time, but a small change in the slope of a straight line is easier to see than a small change of level, so a cusum (no Wiki article, but just Google the term) plot would be better. In essence, choose a reasonable reference value of the statistic in question, then plot the accumulated deviations from it. --86.132.239.228 13:28, 24 May 2007 (UTC)
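A minimal sketch of the cusum idea, assuming the data are a plain list of returns; the window length n and the reference value are choices the analyst has to make, and the function names are hypothetical. `rolling_sd` gives the "current" SD from the most recent n readings, and `cusum` accumulates deviations from the reference, so a sustained shift in SD shows up as a change of slope.

```python
import statistics

def rolling_sd(values, n):
    """Sample s.d. of the most recent n readings, evaluated at each time step."""
    if n < 2:
        raise ValueError("n must be at least 2")
    return [statistics.stdev(values[i - n:i]) for i in range(n, len(values) + 1)]

def cusum(series, reference):
    """Running total of the deviations of each point from a reference value."""
    total = 0.0
    path = []
    for x in series:
        total += x - reference
        path.append(total)
    return path

# Illustrative data only: the volatility triples halfway through.
returns = [0.01, -0.01] * 20 + [0.03, -0.03] * 20
sds = rolling_sd(returns, 10)
# Reference taken from an early, presumably "quiet" stretch (an assumption).
path = cusum(sds, reference=statistics.mean(sds[:10]))
```

Plotting `path` against time, the curve stays flat while the SD matches the reference and then climbs steadily once it rises.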


 * Just as there is no such thing as "instantaneous frequency" per the uncertainty principle, I would seriously doubt there is such a thing as "instantaneous volatility". You can measure time accurately, or volatility accurately, but not both simultaneously. Oh, you can calculate something you might consider calling an "instantaneous frequency", but really that frequency is taken over some specific window, not at that particular time point. You need to make sure the window size you choose for the volatility measurement suits your application. How to choose it? I'm not sure. If others have been using a standard window for some particular application, it is good to use the same size so you can compare your results with theirs. Also, the wider the window, the fewer times you need to actually calculate the volatility. If you calculate the volatility around two successive points $$x_1$$, $$x_2$$, the new "data" you get from the volatility around $$x_2$$ is almost nil, as it is highly correlated with the volatility around $$x_1$$. If you are calculating volatility over a window of 50 points, you only need to calculate it every 50 points, although if you want smoother-looking results you could calculate it more often (however, the more redundant points you calculate, the greater the chance that you "see" something that isn't actually there).


 * Note: This posting is the result of my original research. I AM NOT A PROFESSIONAL STATISTICIAN! Which means I may be a dumbass for even posting. My main point is that it sometimes makes sense to analyze data sequences with something analogous to the methods used for analyzing time/frequency distributions, even if you are not at all interested in sinusoidal frequencies. Take with a huge grain of salt.


 * (Mutters to himself, Did he really help the poster?) Root4(one) 16:13, 24 May 2007 (UTC)
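A sketch of the non-overlapping windows described above, assuming the returns are a plain list; the window length (50 by default, echoing the post) and the function name are hypothetical. Each block's sample s.d. is computed once, so successive values share no data.

```python
import statistics

def block_volatility(returns, window=50):
    """Sample s.d. of consecutive, non-overlapping blocks of `window` returns.

    A trailing block shorter than `window` is dropped.
    """
    if window < 2:
        raise ValueError("window must be at least 2")
    return [
        statistics.stdev(returns[start:start + window])
        for start in range(0, len(returns) - window + 1, window)
    ]
```

Replacing the step `window` in `range` with a smaller value gives the smoother but overlapping (and therefore correlated) series the post warns about.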

Mann-Whitney U-test
In the MW U-test, how do you interpret p values? The article does not explain this. Thanks very much for your help in advance. Aaadddaaammm 08:37, 24 May 2007 (UTC)


 * Our Mann-Whitney U test article doesn't even mention P-values, but our article on them has a section on their interpretation. Or did you perhaps mean the ρ (rho) statistic described at the end of Mann-Whitney_U? -- Avenue 10:08, 24 May 2007 (UTC)


 * Yeah, I know on a basic level what p-values mean, but I'm not sure what exactly they mean in this case. Is it the probability that the two groups of observations have arisen by chance? Aaadddaaammm 10:46, 24 May 2007 (UTC)


 * Do you understand the principles of statistical hypothesis testing in general? The Mann-Whitney U test gives you a statistic that you would use just like any other statistic to test your hypothesis. For that test, you need to know the p-value of the statistic (or, alternatively and more conveniently, the value of the statistic for which the p-value equals a predetermined significance level, giving you the critical region). --Lambiam Talk  11:04, 24 May 2007 (UTC)
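To connect this to the question above: the p-value is the probability, computed under the null hypothesis that both samples come from the same distribution, of getting a U at least as extreme as the one observed. It is not the probability that the groups "arose by chance". A minimal sketch, assuming two lists of numeric observations and using the large-sample normal approximation without tie or continuity corrections (the function name is hypothetical; exact tables or a statistics package are preferable for small samples).

```python
import math

def mann_whitney_u(x, y):
    """Return (U, two-sided p-value) for samples x and y.

    U counts the pairs (a, b), a from x and b from y, with a > b; tied pairs
    count 1/2. The p-value uses the large-sample normal approximation with no
    tie correction and no continuity correction.
    """
    n1, n2 = len(x), len(y)
    u = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    mean_u = n1 * n2 / 2                            # E[U] under the null hypothesis
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # s.d. of U when there are no ties
    z = (u - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))            # P(|Z| >= |z|), Z standard normal
    return u, p

u, p = mann_whitney_u([1, 2, 3], [4, 5, 6])
```

Here U = 0 and the approximation gives p of about 0.05, whereas the exact two-sided value for samples of size 3 is 2/20 = 0.1, which shows why the normal approximation should be reserved for larger samples.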

Kurtosis
My professor says the effect of excess kurtosis is to increase the probability of very large values and very small values. As I understand it, a distribution has excess kurtosis if too many data points cluster around the mean (the frequency of observations near the mean is high), making the curve more peaked than the normal distribution. For example, suppose there are 100 observations in a sample: ninety-six of the observations are 0, two are +1 and the remaining two are −1. This is a distribution with excess kurtosis. Here we can clearly see that the probability of the value 0 (the intermediate value, the mean) is much higher than the probability of −1 and +1 (the extreme values). Then how can the probability of extreme values be high in a leptokurtic distribution? Shouldn't the probability of intermediate values be high instead?--202.52.234.140 09:41, 24 May 2007 (UTC)


 * As an illustrative example generalizing yours, take the discrete distribution $$D_V$$, having a parameter $$V$$, where with 96% probability you observe 0, 1% gets you +1, 1% is for −1, 1% for +V, and the last 1% is for −V. Then the excess kurtosis is $$\operatorname{Kurt}(D_V) = \frac{50(V^4+1)}{(V^2+1)^2} - 3$$. The first few values:
 * $$\operatorname{Kurt}(D_0) = 47$$
 * $$\operatorname{Kurt}(D_1) = 22$$
 * $$\operatorname{Kurt}(D_2) = 31$$
 * $$\operatorname{Kurt}(D_3) = 38$$
 * Observe how the value first goes down, and then increases with |V|. The decrease can be "explained", if that is the right term, by the peak going down from 98% to 96%. The further increase can be "explained" by the likelihood of large deviations increasing. For example, $$D_3$$ has a dramatically larger probability than $$D_2$$ of observations whose magnitude is at least 3. Also, when viewed as a continuous function of real-valued V, $$\operatorname{Kurt}(D_V)$$ attains a minimum at |V| = 1. It's all in the formula; kurtosis is not an easy thing to get a "feel" for. --Lambiam Talk  10:38, 24 May 2007 (UTC)
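The values listed above can be checked exactly with rational arithmetic. A short sketch (the helper names are hypothetical) that computes the excess kurtosis of any finite discrete distribution given as (value, probability) pairs and applies it to $$D_V$$:

```python
from fractions import Fraction

def excess_kurtosis(points):
    """Excess kurtosis of a finite distribution given as (value, probability) pairs."""
    mean = sum(p * x for x, p in points)
    m2 = sum(p * (x - mean) ** 2 for x, p in points)  # variance
    m4 = sum(p * (x - mean) ** 4 for x, p in points)  # fourth central moment
    return m4 / m2 ** 2 - 3

def d(v):
    """D_V: 96% at 0 and 1% each at +1, -1, +V and -V (points may coincide)."""
    pct = Fraction(1, 100)
    return [(0, 96 * pct), (1, pct), (-1, pct), (v, pct), (-v, pct)]

for v in range(4):
    print(v, excess_kurtosis(d(v)))  # prints 0 47, 1 22, 2 31, 3 38, one pair per line
```

Working with `Fraction` avoids any rounding, so the printed values are exact.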


 * Kurtosis is measured with respect to standard deviation; if we "peak" a distribution but simultaneously keep its s.d. constant, we must be extending its tails to make up for the concentration of probability at the mean. The result will, of course, have a very high probability of giving values near its mean, but will also have a higher probability than the original distribution of giving values very far (in terms of s.d.) from the mean.  --Tardis 14:50, 24 May 2007 (UTC)
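This point can be checked on the 100-observation sample from the question: measured in units of its own standard deviation, the ±1 values lie far out in the tails. The sketch below (the threshold k = 4 is an arbitrary illustrative choice) compares the share of the sample beyond k s.d. with the corresponding probability for a normal distribution.

```python
import math

# The sample from the question: 96 zeros, two +1s and two -1s.
sample = [0] * 96 + [1] * 2 + [-1] * 2
n = len(sample)
mean = sum(sample) / n
sd = math.sqrt(sum((x - mean) ** 2 for x in sample) / n)  # population s.d., 0.2 here

k = 4  # threshold for "very far from the mean", in units of s.d.
share_beyond = sum(abs(x - mean) > k * sd for x in sample) / n
normal_beyond = math.erfc(k / math.sqrt(2))  # P(|Z| > k) for a standard normal

print(f"sample: {share_beyond:.2%} of values lie more than {k} s.d. from the mean")
print(f"normal: probability {normal_beyond:.2e} of the same")
```

Because the s.d. is only 0.2, the ±1 values are 5 s.d. from the mean, and 4% of the sample lies beyond 4 s.d. compared with well under 0.01% for a normal distribution.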

Here's a link to kurtosis, for those who are confused. StuRat 19:41, 25 May 2007 (UTC)

Quantities without symbols
Is there, in mathematical notation, a preferred way to write quantities that are represented by words rather than by an ordinary mathematical symbol? E.g., I might want to say that antenna efficiency is defined as radiated power divided by input power without introducing symbols for each of them. Rather than just writing
 * $$\mathrm{antenna\;efficiency} = \frac{\mathrm{radiated\;power}}{\mathrm{input\;power}}$$,

how should I distinguish them from ordinary mathematical symbols? —Bromskloss 09:46, 24 May 2007 (UTC)


 * I don't know about "preferred", but the way you used, with the words in roman rather than the italics conventionally used for mathematical variables, is fairly common. For obvious reasons, this is not recommended for any but the simplest formulas; just imagine writing out the symbols in
 * $$V_a={\sqrt{R_aG_a}\,\lambda\cos\psi\over\sqrt{\pi Z_\circ}}E_b$$.
 * --Lambiam Talk 10:47, 24 May 2007 (UTC)


 * If phrases are used in a formula, some authors parenthesize them.
 * $$ (\text{antenna efficiency}) = \frac{(\text{radiated power})}{(\text{input power})} $$
 * More commonly, symbols are defined, either temporarily or persistently.
 * $$ \text{Let }E = \text{antenna efficiency}, P_{\text{out}} = \text{radiated power}, P_{\text{in}} = \text{input power}; \text{then} \,\!$$
 * $$ E = \frac{P_{\text{out}}}{P_{\text{in}}} . $$
 * There is often little benefit in defining symbols to be used exactly once, so authors tend to choose accordingly. --KSmrqT 15:56, 24 May 2007 (UTC)

Area of a quadrilateral
A surveyor's map of a plot of land shows it to be a convex quadrilateral. The length of each side is given along with its compass bearing, from which I can calculate each internal angle. Wanting to find the area, I searched for an online (area) calculator without success. I do recall (from long ago) being able to find a solution knowing the four sides and two opposite internal angles, but I can't even find that! So, I'm asking for a link to a calculator or the formula. Thanks, hydnjo talk 19:27, 24 May 2007 (UTC)

My wikimarkup blows but

$$K = \tfrac{1}{2}pq\sin\theta$$, where p and q are the diagonals and θ is the angle between them. Further derivation at the provided source. Source:

Hipocrite - « Talk » 19:34, 24 May 2007 (UTC)


 * Hmm, I don't know the diagonals or how to calculate them but, from your source link, this:
 * $$K = \sqrt{(s-a)(s-b)(s-c)(s-d) - abcd\cos^2\left(\tfrac{A+C}{2}\right)}$$ (Bretschneider's formula, where $$s = \tfrac{a+b+c+d}{2}$$ is the semiperimeter)
 * seems to do it. Thanks, hydnjo talk 20:15, 24 May 2007 (UTC)
 * My oops, this should have been asked at /Math (not /M), so I moved it here. Sorry, hydnjo talk 20:15, 24 May 2007 (UTC)
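A direct transcription of Bretschneider's formula as quoted above, assuming side lengths in consistent units and the two opposite interior angles in degrees (the function name is hypothetical). Since the four interior angles sum to 360°, cos²((A+C)/2) = cos²((B+D)/2), so either pair of opposite angles can be used.

```python
import math

def bretschneider_area(a, b, c, d, angle_1, angle_2):
    """Area of a quadrilateral with consecutive sides a, b, c, d.

    angle_1 and angle_2 are two opposite interior angles, in degrees.
    """
    s = (a + b + c + d) / 2                      # semiperimeter
    half = math.radians(angle_1 + angle_2) / 2   # (A + C) / 2 in radians
    # math.sqrt raises ValueError if the inputs cannot form a quadrilateral.
    return math.sqrt((s - a) * (s - b) * (s - c) * (s - d)
                     - a * b * c * d * math.cos(half) ** 2)
```

For a rectangle the cosine term vanishes and the formula reduces to Brahmagupta's formula for cyclic quadrilaterals.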


 * I usually chop quads into two triangles, use something like the old (base × height)/2 trick for the area of each triangle, then add the two together. There are a bazillion ways to find the area of a triangle, and they are a lot easier to remember and find online than the quadrilateral case. In general, memorize the triangle equations and you can use them for any polygon, provided you have the patience to chop it up! I like that you can chop a circle up into a gazillion tiny triangles: the height of each one is (in the limit) the radius of the circle and their bases add up to the circumference, hence the area of a circle is half the circumference times the radius. SteveBaker 20:31, 24 May 2007 (UTC)
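A quick numerical check of the circle argument, as a sketch: instead of base × height it uses the equivalent side-angle-side form ½·r·r·sin(2π/n) for each of n isosceles triangles meeting at the centre (the function name is hypothetical). As n grows, the total approaches πr², i.e. half the circumference times the radius.

```python
import math

def polygon_area_from_triangles(r, n):
    """Area of a regular n-gon inscribed in a circle of radius r, as n
    isosceles triangles with apex angle 2*pi/n at the centre."""
    return n * 0.5 * r * r * math.sin(2 * math.pi / n)

for n in (6, 60, 600):
    print(n, polygon_area_from_triangles(1.0, n), math.pi)
```

With n = 6 this gives the regular hexagon, 3√3/2 ≈ 2.598 for r = 1, and the printed values creep up towards π as n increases.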


 * I agree with SteveBaker, but since finding the height of a triangle is slightly harder, I would prefer Heron's formula. For this you will need the diagonals, which can of course be found using the Law of cosines. -- Meni Rosenfeld (talk) 21:03, 24 May 2007 (UTC)
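A sketch of the Heron's-formula route, assuming a convex quadrilateral with consecutive sides a, b, c, d and a known interior angle (in degrees) between sides a and b (function names are hypothetical). The law of cosines gives the diagonal closing the triangle on a and b; the other triangle has sides c, d and the same diagonal.

```python
import math

def heron(x, y, z):
    """Triangle area from its three side lengths (Heron's formula)."""
    s = (x + y + z) / 2
    return math.sqrt(s * (s - x) * (s - y) * (s - z))

def quad_area_heron(a, b, c, d, angle_ab):
    """Convex quadrilateral with consecutive sides a, b, c, d; angle_ab is the
    interior angle in degrees between sides a and b."""
    # Law of cosines: the diagonal opposite the known angle.
    diag = math.sqrt(a * a + b * b - 2 * a * b * math.cos(math.radians(angle_ab)))
    return heron(a, b, diag) + heron(c, d, diag)
```

This needs only one angle, but it does more arithmetic than the side-angle-side approach, which is the round-off concern raised in the next reply.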


 * I think Heron's formula is good if you only know all the sides, but damned if that's a lot of calculating! (More calculating can imply more round-off error.) Besides, assuming you know the interior angle θ, $$\frac{1}{2} s_{1}s_{2}\ sin(\theta)$$ is good (you don't need to think in terms of bases and heights). For triangles in the two-dimensional Euclidean plane, I think the best one, since it requires the least technical math (i.e., no trig or square roots), is the determinant trick listed on the triangle page. As his data is from a map plot, that may be the simplest (computational) answer. Root4(one) 21:55, 24 May 2007 (UTC)
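A sketch of the coordinate route for the surveyor's data, assuming each side is given as (length, bearing) with bearings in degrees measured clockwise from north, listed in order around the plot (the function names are hypothetical). The legs are turned into vertex coordinates, and the determinant ("shoelace") formula then gives the area of the whole polygon at once.

```python
import math

def traverse_to_points(legs):
    """Convert (length, bearing) legs into vertex coordinates.

    Assumes compass bearings in degrees, clockwise from north,
    with x pointing east and y pointing north.
    """
    x, y = 0.0, 0.0
    points = [(x, y)]
    for length, bearing in legs:
        theta = math.radians(bearing)
        x += length * math.sin(theta)
        y += length * math.cos(theta)
        points.append((x, y))
    return points[:-1]  # the last leg should (ideally) return to the start

def shoelace_area(points):
    """Area of a simple polygon from its vertices (determinant/shoelace formula)."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2
```

Real survey traverses rarely close exactly; this sketch silently uses the closing segment back to the start instead of the last measured leg, so a large closure error is worth checking separately.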


 * Thanks to all. I think that Bretschneider's formula best suits this problem (for me) because of its plug-inability. hydnjo talk 22:16, 24 May 2007 (UTC)


 * $$\frac{1}{2} s_{1}s_{2}\ sin(\theta_1) + \frac{1}{2} s_{3}s_{4}\ sin(\theta_2)$$ is not plug-in able? Root4(one) 22:22, 24 May 2007 (UTC)
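The two-term formula above as a sketch, assuming consecutive sides s1, s2, s3, s4 of a convex quadrilateral, with θ1 the interior angle between s1 and s2 and θ2 the opposite interior angle between s3 and s4, both in degrees (the function name is hypothetical). The diagonal joining the far ends of s1 and s2 splits the quadrilateral into the two side-angle-side triangles.

```python
import math

def quad_area_two_angles(s1, s2, s3, s4, theta1, theta2):
    """Sum of two side-angle-side triangle areas (angles in degrees)."""
    return (0.5 * s1 * s2 * math.sin(math.radians(theta1))
            + 0.5 * s3 * s4 * math.sin(math.radians(theta2)))
```

It takes the same inputs as Bretschneider's formula (four sides and two opposite angles) but needs no semiperimeter or square root.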


 * Oops - I wrote out my previous  response off-line before actually reading your post :-( and your approach certainly is much easier to "plug-in" than Bretschneider's formula.  Thank you,  hydnjo talk 22:48, 24 May 2007 (UTC)

I'm wondering if I'm truly the one who should have said "OOPS". Geez. One thinks one knows some simple trig. Reply coming shortly. Root4(one)

Area of a triangle = 1/2 bh. Any triangle has three heights and three bases (sides). Right-angled triangles are a special case: two of the heights ARE sides. In any case, assume you have side-angle-side measurements, that is, sides AB and BC and the angle ABC formed by them. Let AB be your base. Then the height of the triangle for this particular base is the shortest distance between C and the line through AB. We can find it from the right triangle that has BC as its hypotenuse, the height as one leg, and angle ABC as the angle opposite that leg. sin(ABC) is the ratio of the length of the side opposite the angle to the length of the hypotenuse, so the height is h = length(BC)*sin(ABC). Since the area of a triangle is 1/2 * base length * height, we get 1/2 * length(AB) * length(BC) * sin(ABC). Nothing in the proof prevents us from taking BC as the base instead, which gives sin(ABC)*length(AB) as the height for that base.
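The side-angle-side argument above, written as equations (if ∠ABC is obtuse, the right triangle uses the supplementary angle 180° − ∠ABC, which has the same sine, so the result is unchanged):

$$h = |BC|\,\sin\angle ABC,$$
$$K = \tfrac{1}{2}\,|AB|\,h = \tfrac{1}{2}\,|AB|\,|BC|\,\sin\angle ABC.$$

Applying this to the two triangles cut off by a diagonal of the quadrilateral gives the two-term sum $$\tfrac12 s_1s_2\sin\theta_1 + \tfrac12 s_3s_4\sin\theta_2$$ discussed in the thread.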

I suppose there was no reason to doubt myself... I just rushed to judgment after a later second reading before I was absolutely convinced. That and my mind was a bit flustered after my last martial arts practice, leading me to have some significant doubts about things and my reasoning abilities in general. That's actually not a good excuse. I was right the first time and then I had to go correct myself :P Root4(one) 02:48, 25 May 2007 (UTC)
 * Your reasoning abilities are excellent. You don't need to worry. nadav (talk) 03:24, 25 May 2007 (UTC)
 * Your TeX could use a little refinement, however. Instead of multiplying "s", "i", and "n":
 * $$\frac{1}{2} s_{1}s_{2}\ sin(\theta_1) + \frac{1}{2} s_{3}s_{4}\ sin(\theta_2)$$
 * you should use the operator name "\sin".
 * $$\tfrac12 s_{1}s_{2}\ \sin(\theta_1) + \tfrac12 s_{3}s_{4}\ \sin(\theta_2)$$
 * It's a common mistake, so I'm saying this as much for the lurking masses as for you personally. --KSmrqT 04:01, 25 May 2007 (UTC)
 * No offense taken. I appreciate constructive criticism, as well as the occasional compliment. Thank you both. Root4(one) 12:04, 25 May 2007 (UTC)
 * Interestingly, the proof of Bretschneider's formula in our article begins with Root4(one)'s simply stated and intuitive area formula and goes on to prove an area formula which appears to be more complex. No? hydnjo talk 19:00, 25 May 2007 (UTC)