Talk:Maximum spacing estimation

Definition terminology
In order to prevent plagiarism, I did not use the exact terminology of any of the papers, but some amalgam of a few. I'm pretty certain it is consistent, but it may be possible that I messed up, so a critical eye would be appreciated. -- Avi (talk) 20:50, 30 December 2008 (UTC)

Definition
This article says:
 * Pyke defines the first-order spacings as:
 * $$ D_i = F(i;\hat{\theta}) - F(i-1;\hat{\theta}) \, $$

Could it be that what was meant was the following?
 * $$ D_i = F(X_i;\hat{\theta}) - F(X_{i-1};\hat{\theta}) \, $$

If not, then where do the observed random variables X1, ..., Xn enter the definition of Di? Michael Hardy (talk) 21:23, 30 December 2008 (UTC)


 * No, you are correct, that is what comes from trying to prevent plagiarism by using consistent terminology but not based on any one paper. The $$X_i$$ is more formal; various papers leave out the $$X$$ for convenience. Thanks again! -- Avi (talk) 23:04, 30 December 2008 (UTC)


 * Also, I was careful to note that X1, ..., Xn are the ORDERED observed RV's. Is that clear enough in the article? -- Avi (talk) 23:05, 30 December 2008 (UTC)

It was clear to me; I'm not sure what anyone else will think. Michael Hardy (talk) 00:05, 31 December 2008 (UTC)
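For readers checking the definition being settled above, here is a minimal sketch in Python. The helper names `spacings` and `s_n` are my own, and the standard normal model and the sample values are arbitrary illustrations; the conventions F(X(0)) = 0 and F(X(n+1)) = 1 are the ones used in the article.

```python
import math

def spacings(x, cdf):
    """First-order spacings D_i = F(X_(i)) - F(X_(i-1)) over ORDERED data,
    with the conventions F(X_(0)) = 0 and F(X_(n+1)) = 1."""
    xs = sorted(x)
    f = [0.0] + [cdf(v) for v in xs] + [1.0]
    return [f[i] - f[i - 1] for i in range(1, len(f))]

def s_n(x, cdf):
    """Mean log-spacing: the log of the geometric mean of the n+1 spacings,
    which maximum spacing estimation maximizes over the parameter."""
    d = spacings(x, cdf)
    return sum(math.log(di) for di in d) / len(d)

# Standard normal cdf via the error function, purely as an example model.
phi = lambda v: 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))
d = spacings([2.0, -1.0, 0.0], phi)   # unordered input is sorted internally
print(len(d))                          # 4: n observations give n + 1 spacings
print(abs(sum(d) - 1.0) < 1e-12)       # True: the spacings partition [0, 1]
```

Note that the X values, not the indices, are fed through the cdf, which is exactly the point of the correction above.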

Potential problems that need to be covered
This method has some problems in terms of general applicability, that should potentially be discussed in the article.
 * Applicability to discrete distributions or distributions with discrete components. There may be some way of interpreting the terms presently included to deal with instances which formally give Di=0.
 * Application to continuous distributions where ties occur because of rounding, or because the estimation method is being studied via a bootstrapping experiment. The first of the following papers employs a solution and the second follow-up paper might say something useful but I haven't had time to read it in detail: [], []
 * Potential to extend the approach to multivariate distributions? There don't seem to be any practically implementable schemes that do not depend on the coordinate system chosen for the original data. But someone may know better.

Melcombe (talk) 16:31, 31 December 2008 (UTC)


 * Good points. Regarding discrete distributions, I'll have to look over what I have. Regarding ties, Cheng & Amin as well as some other paper(s) discuss what to do (substitute loglikelihood at that point), I'll put that in the article eventually. Regarding multivariate, there are papers by Ranneby et al and others extending this method, but they're way over my head. I'll update the article with some reference to them soon too. Thanks! -- Avi (talk) 18:31, 31 December 2008 (UTC)


 * I've addressed the sensitivities to ties and two methods given as to how to deal with them, and I have added the initial forays into the multivariate area, although someone who understands this better than I do has plenty of room to flesh it out. Regarding discrete distributions, the problems that MSE was derived to fix do not seem to apply to ML, so I do not think any research was done down that path, nor may it be necessary. Thoughts? -- Avi (talk) 00:22, 2 January 2009 (UTC)


 * I think that the article does a reasonably good job of presenting what is known and what can be used presently. Perhaps a little more context could be given early on. I think MPS is mainly being used to overcome problems with MLE in specific cases, but not immediately as a generally applicable estimation methodology that one would automatically think of trying in all cases. While I could be wrong, this sort of context could be added in the lede.


 * The main other point to think about is what is actually being applied ... The uses I have come across have not been full MPS estimation schemes but just a modification of MLE where only the largest or smallest observations are treated using a "spacing" formulation. This mixed approach could be mentioned. In any case it would be good to have some references where the estimation scheme is actually applied, rather than just being the subject of research.


 * One minor point, and probably not one for inclusion in the article, is to look at the similarity/differences of the formulae being used for MPS/MSE and for a scheme based on minimising the Anderson-Darling test statistic (as the method of estimation).


 * Melcombe (talk) 10:51, 2 January 2009 (UTC)

Rating
According to WP:WPM, a Bplus article is one that is "Useful to nearly all readers. A good treatment of the subject which attempts to be as accessible as possible, with a minimum of jargon. No obvious problems, gaps, excessive information." I think this qualifies. Please feel free to revert if you disagree. I know it is weak on multivariate, but WPM also says that a Bplus "May be improved by input from experts to assess where coverage is still missing, and also by illustrations, historical background and further references." Any experts are welcome to make this even better! (or correct errors :} ) -- Avi (talk) 00:29, 2 January 2009 (UTC)

Reference Style
I may play around with the ref harvard template and see if I can get something like the standard [#] referencing in mathematical papers but with the backlinking feature. -- Avi (talk) 02:08, 6 January 2009 (UTC)
 * I think I'll hold off on that for now. -- Avi (talk) 07:46, 6 January 2009 (UTC)

GA
I am satisfied with the article now, and am passing it for GA. Thanks for the conscientious editing. Looie496 (talk) 18:22, 24 January 2009 (UTC)
 * Thank you very much for your advice, suggestions, and patience. -- Avi (talk) 07:31, 25 January 2009 (UTC)
 * It was a pleasure. Looie496 (talk) 18:34, 25 January 2009 (UTC)

Addressing A-class concerns
After a long hiatus, I am trying to address the concerns raised at WikiProject Mathematics/A-class rating/Maximum spacing estimation.

First, I added a very simplistic example. The estimate comes out rather worse than the maximum likelihood estimate, but with only two samples, one can't expect anything reasonable anyway. Next, I redid the lede and the first section to fit better with the math MoS, give a more informal intro to the topic, and move some history out of the lede. Then I added a graph, similar to the one on maximum likelihood estimation, showing the optimal point and getting one picture into the article. -- Avi (talk) 07:04, 27 July 2009 (UTC)

I've just added a bit more explanation to some of the other disciplines using the method. At this point, I think it's pretty close to being ready for another discussion. -- Avi (talk) 07:20, 27 July 2009 (UTC)


 * I looked through the article and copyedited it some. It doesn't seem to have any glaring style issues, but the MOS people will probably find a couple minor things to touch up.
 * The image in the example section would benefit from a good explanatory caption. It is also slightly too large to fit comfortably with the text in my screen.
 * I assume the "Editor" in footnotes 2 and 4 is the voice of Wikipedia. Some people would complain about us pointing out errors in papers; I don't really mind, assuming the description there is correct, but it would likely raise eyebrows at an FA discussion.
 * In terms of accessibility, I think the article will be readable by its intended audience, and I don't see any easy way to make the article more accessible in a reasonable amount of space.
 * &mdash; Carl (CBM · talk) 21:17, 28 July 2009 (UTC)

Thank you very much for your work. In response to your specific points: -- Avi (talk) 02:11, 29 July 2009 (UTC)
 * 1) Caption added and size reduced
 * 2) Yes, and I have e-mails from Dr. Cheng confirming the typos. I added them as I spent a while trying to figure out what I was missing when I read the papers, only to realize that I did understand them, the issue was typographical errors, and I wanted to prevent anyone else from having the same problem.
 * 3) Thank you; I think so too.


 * I’d like to add some points regarding how the article can be improved.
 * 1. The sentence “In practice, optimization is usually performed by minimizing −Sn(θ), similar to maximum likelihood procedures which usually minimizes the negative loglikelihood.” seems dubious to me. I don’t recall minimization problems being any different from maximization, it probably depends on the software you’re using… Is it possible to back up such a statement with a quotation?
 * I believe you removed that, so it is moot now. -- Avi (talk) 02:35, 2 August 2009 (UTC)
 * 2. Sections should not have only 1-2 sentences in them — they look like stubs. Either try to expand corresponding sections, or merge them.
 * Merged, although I will try and expand it as well. -- Avi (talk) 20:05, 3 August 2009 (UTC)
 * 3. The MSE method is suggested as an alternative to MLE. There should probably be a section discussing pros and cons of each method. In particular, how do the methods compare in “regular” cases (that is, when MLE is applicable)? MLE relies on the use of pdf function, while MSE uses the cdf. It means that in most theoretical cases MLE would be easier, simply because when you have a cdf you can compute pdf by differentiation; whereas finding antiderivative analytically is not always possible. However, are there cases when MSE is easier to compute than MLE? Both methods have the same asymptotic distribution in “regular” cases — but what about second-order bias?
 * 4. It would be useful to have a “real” example of the application of MSE. The one with two points from the exponential distribution is certainly quite simple, but not at all persuasive with respect to the usefulness of the method. It is probably better to include the example with estimation of the uniform distribution with unknown endpoints (I believe it was 2.1 in Cheng’s article). Such an example would clearly show that MSE gives good results where MLE fails.
 * And thank you very much for doing so [[file:face-smile.svg|25px]] -- Avi (talk) 18:55, 2 August 2009 (UTC)
 * Incidentally, I noticed that neither Uniform distribution (continuous), nor German tank problem, nor even UMVU articles mention that the estimators derived in example 2 are in fact the best ones for estimating the continuous uniform distribution (at least in the sense of UMVUE). Well I guess we could wait for wikielves to fix the issue :) ...  st pasha  » talk  » 22:49, 2 August 2009 (UTC)
 * Now that I think of it, maybe we should have a third example: the example where MLE fails altogether (such as mixtures of distributions, or distributions with unbounded pdf) but the MSE gives good results. ...  st pasha  » talk » 08:11, 4 August 2009 (UTC)
 * 5. Picture in the lead section would be nice. Articles with pictures generally look more attractive.
 * I'm not sure how to do that. Any suggestions? -- Avi (talk) 02:35, 2 August 2009 (UTC)
 * I was thinking about something like this: a typical cdf function in XY plane; on the X axis six points: X1, ellipsis “…”, Xi−1, Xi, ellipsis “…”, and Xn. On the Y axis 3 intervals are shown using a “{” curly bracket: D1, Di, Dn+1, and probably 2 more vertical ellipses “⁞”. The caption under the picture would read: MSE method tries to find such a distribution function that the spacings Di are all approximately the same length. This is done by minimizing the geometrical mean of all n+1 spacings. ...  st pasha  » talk » 22:49, 2 August 2009 (UTC)
 * Oy Vey; I may have to pull out my Murrell for that one [[file:face-surprise.svg|25px]]. -- Avi (talk) 23:13, 2 August 2009 (UTC)
 * Between R and inkscape, I got a rough image, but it could use some major improvement :( -- Avi (talk) 18:22, 3 August 2009 (UTC)
 * One of the Commons experts fixed the image for us :) -- Avi (talk) 21:05, 3 August 2009 (UTC)
 * I bow to your 1337 1|\|k5c4p3 sk1llz. -- Avi (talk) 22:48, 3 August 2009 (UTC)
 * 6. (Last but not the least) Sections with consistency, efficiency and asymptotic normality have to be expanded. It is not enough to know that “there exist certain conditions under which MSE is consistent” — we need to know what those conditions are. Are they really less restrictive than MLE conditions (that is to say, is it true MSE converges in all cases when MLE does)? The article states that estimates are asymptotically normal (again under certain conditions). I can infer from the “efficiency” section that asymptotic variance of the estimator is equal to the inverse of the Fisher’s information matrix. However it doesn’t mean that everybody else can infer this, and even then it would be better to have an expression in terms of “native” function S instead of having to compute the derivatives of log-likelihood. I tried to look up those conditions myself, but then it gets fishy as Cheng refers to his older article which I couldn’t locate.
 * ...  st pasha  » talk » 23:55, 31 July 2009 (UTC)
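To make point 4 above concrete: per the UMVUE remark in this thread, the uniform-endpoints example (Cheng's example 2) is said to have closed-form MSE estimates that coincide with the UMVU estimators. A sketch under that assumption (the function names and the sample are mine, not from the paper), contrasted with the MLE, which can never reach outside the sample range:

```python
def uniform_mle(x):
    """MLE for Uniform(a, b): the sample extremes. These are biased inward,
    since the true endpoints always lie outside the observed range."""
    return min(x), max(x)

def uniform_mse(x):
    """Closed-form endpoint estimates as described in this discussion
    (coinciding with the UMVU estimators): stretch the sample range
    outward by one average gap."""
    n = len(x)
    lo, hi = min(x), max(x)
    gap = (hi - lo) / (n - 1)
    return lo - gap, hi + gap

x = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical sample
print(uniform_mle(x))            # (1.0, 5.0)
print(uniform_mse(x))            # (0.0, 6.0)
```

The comparison shows why this example is persuasive: the MLE endpoints sit exactly on the extreme observations, while the spacing-based estimates push outward to where the true endpoints plausibly lie.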

Further suggestions
I still have several suggestions for the article, listed approximately in the decreasing order of significance.
 * Missing section on the asymptotic distribution of the MSE estimator. Is $$\hat\theta$$ asymptotically normal? Under which conditions? What is its variance?
 * The section with consistency and efficiency can be expanded to list the conditions under which the estimator is consistent and efficient. Oh and while we’re at it — usually the notion of efficiency is defined for regular estimators in regular parametric models, that is in those cases when maximum likelihood estimates exist. In irregular cases (such as uniform distribution with unknown endpoints) I don’t know how efficiency is defined, and therefore what would be the meaning of the phrase “MSE is at least as efficient as MLE”…
 * I think there should be a section which discusses potential drawbacks of the MSE method, something like: (1) for many models the cdf F is more difficult to compute than the pdf f required for MLE, (2) sensitivity to round-off errors and “clumped” observations, (3) difficulties with extending the method to the multivariate case.
 * “Generalized MSE” section requires expansion
 * “Goodness of fit” section must use the same definition of statistic S_n(θ) as the rest of the article, and thus the values for the mean and the variance have to be correspondingly adjusted. Moreover, it is a very sloppy style to say that something has asymptotic normal distribution with mean and variance as functions of n (even though that’s what Cheng&Stephens wrote in their article). For example for a standard OLS estimator we never say that β^ → N(β, σ²/n), but we write √n(β^−β)→N(0,σ²). In case of Cheng&Stephens, it is not clear to me how fast the convergence of distributions is, and how many terms in the approximation for μM and σ²M is it sensible to keep. The later claim that test statistic T(θ) has to be adjusted by ½k in case when θ is being estimated seems to contradict the statement that S(θ) has the same asymptotic distribution regardless of the fact if θ is known or estimated.
 * Regarding the use of terms “Moran statistic” and “Moran−Darling” — I’m not sure they quite fit into the context. If you look at Darling’s paper (I sent it to you, just in case), he considers the statistic Σh(Z(j)−Z(j−1)), where Z′s are any random variables distributed U[0,1], and h(·) any nice function. I don’t think Darling ever considered to apply his findings to Z′s given by the inverse probability transform. However he did consider the case when h(z)=ln(z), and thus S_n can be viewed as Darling’s statistic. As for Moran, he only ever considered the case of h(x)=x², which means his name shouldn’t be credited in this article.
 * Pictures can be converted into vector format (SVG), with text labels in a larger font size and anti-aliasing. The picture of the cdf of a J-shaped distribution has some weird jump at x=10, which cannot happen in an actual cdf.
 * I believe that harvard-style inline citations should be used more sparingly; because otherwise the article creates the impression that Cheng&Amin (and also sometimes Ranneby) are the only creators and contributors to the method. I’m not saying that such quotations should be banned altogether: for example the phrase “Cheng & Stephens (1989) use an alternative definition of function S_n …” is probably ok; whereas “Maximum spacing estimator may exist in cases where MLE does not (Cheng & Amin 1983)” is probably not.
 * I think it’ll be better to remove the mention of the institutions where Cheng, Amin and Ranneby come from in the History section (or at least turn them into abbreviations) — they make the first sentence of the paragraph nearly impossible to digest.
 * [Dubious] The list of references with small numbers in front of them looks jagged (actually a common problem on Wikipedia). However the harvard quotes are capable of linking directly to the item being cited — take a look for example at Errors-in-variables models.

...  st pasha  » talk » 19:55, 15 September 2009 (UTC)
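On the normalization point in the list above: a sketch of the conventional way to state the regular-case result, assuming the efficiency claim discussed in this thread (asymptotic variance equal to the inverse Fisher information) is correct:

```latex
% State the limit of the scaled estimation error, rather than assigning
% an n-dependent "asymptotic distribution" to the estimator itself:
\sqrt{n}\,\bigl(\hat\theta_{\mathrm{MSE}} - \theta_0\bigr)
  \ \xrightarrow{d}\ \mathcal{N}\!\bigl(0,\; I(\theta_0)^{-1}\bigr)
\qquad\text{rather than}\qquad
\hat\theta_{\mathrm{MSE}} \;\sim\; \mathcal{N}\!\bigl(\theta_0,\; I(\theta_0)^{-1}/n\bigr).
```

Stated this way the √n rate is explicit and the limiting distribution does not depend on n, which is the form the “Goodness of fit” section would need.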

Comments
In the first example: Are you sure that's right? Maybe I'm confused, but ... In the first row of the table, F(x(i−1)) is given as 0. From the previous section, x(0) = −∞ and when you substitute −∞ as x into the equation F(x;λ) = 1 − e^{−xλ}, you have two minus signs in the exponent which cancel out, so you have positive infinity in the exponent, which didn't used to work out to 1 when I was in university. Also, why is the difference between x(1) and x(0) considered, but the difference between x(n + 1) and x(n) is not included in the table nor in the calculation?

"The density will tend to infinity as x approaches the location parameter rendering estimates of the other parameters inconsistent." Not sure what this means. It seems ambiguous. Depending on the meaning, I suggest either adding a comma after "parameter", or changing "rendering" to "which renders". ☺ Coppertwig (talk) 22:03, 9 October 2010 (UTC)


 * The cdf of the exponential distribution is F(x;λ) = (1 − e^{−xλ})·1{x≥0}, meaning that it is set to 0 for all negative x’s. This indicator is written simply as x ≥ 0 in the article. So F(x(0)) is indeed 0. Also in the example we have n = 2, x(n+1) = +∞, and the difference between F(x(n+1)) and F(x(n)) is given in the third row of the table; it is equal to e^{−4λ}.
 * As for your second question, I don't know what that means either.  // st pasha  » 23:44, 9 October 2010 (UTC)


 * It means the scale and shape parameter won't be estimated properly by MLE even when you have lots of values. It probably needs something showing that somehow though - might be an idea to call up someone on the reference desk for help. Dmcq (talk) 10:17, 10 October 2010 (UTC)
 * Re the first point: Oops, my mistake, I didn't notice the x ≥ 0. Thanks, Stpasha.
 * Re the second point: Dmcq, do you mean that the parsing of the sentence would be correct with a comma added after "parameter"? The question is: is it talking about a specific value of the location parameter such that that particular value renders estimates of the other parameters inconsistent, or is it the tendency of the density towards infinity which is being said to render the other parameters inconsistent?  In other words, which noun phrase in the sentence is the subject of the participle "rendering"?
 * If that sentence is talking about MLE, it should probably say so, e.g. appending "when using MLE" to the end of the sentence; I don't think it's quite clear from the context. ☺ Coppertwig (talk) 20:19, 10 October 2010 (UTC)
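The cdf point resolved above is easy to confirm numerically. A small sketch (λ = 0.5 and the helper name `F` are arbitrary choices of mine, matching the n = 2, x(n) = 4 setup discussed in this thread):

```python
import math

# Exponential cdf with the indicator x >= 0 written out explicitly.
lam = 0.5
F = lambda v: 1.0 - math.exp(-lam * v) if v >= 0 else 0.0

# F(x_(0)) = F(-inf) is 0 because of the indicator,
# not because anything cancels in the exponent.
print(F(-1.0))                                   # 0.0

# With the largest observation x_(n) = 4 and F(x_(n+1)) = F(+inf) = 1,
# the top spacing is 1 - F(4) = e^{-4λ}, as stated above.
top_spacing = 1.0 - F(4.0)
print(abs(top_spacing - math.exp(-4.0 * lam)) < 1e-12)   # True
```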

Maths A-class rating
I've closed the very old WikiProject Mathematics/A-class rating/Maximum spacing estimation as no consensus due to the age and lack of input in the discussion.--Salix (talk): 17:58, 6 November 2010 (UTC)
 * I think the entire A-class math rating is defunct, sadly. Thanks for closing up the loose ends. -- Avi (talk) 21:47, 2 January 2011 (UTC)

Geometric mean is maximal for equal spacings
In the reasoning you say that the geometric mean of the spacings is maximal when the spacings are equal. But that's also true for any symmetric function or symmetric polynomial.

Is there any rationale for choosing the product $$\sigma_n = D_{1} \cdot D_{2} \cdots D_{n}$$ of the $$D_i$$ rather than the average (up to the constant factor $$n$$) $$\sigma_{1} = \sum D_{i}$$ or any other elementary symmetric polynomial $$\sigma_{k} = \sum D_{i_1} \cdot D_{i_2} \cdots D_{i_k}$$ ? — Preceding unsigned comment added by AlainD (talk • contribs) 18:00, 29 January 2019 (UTC)
 * Almost certainly to take the sum of the logged values for optimization purposes. Anything else would add complexity. -- Avi (talk) 04:25, 1 December 2021 (UTC)
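One observation that partly answers the question, sketched numerically (the spacing values below are hypothetical, not from any dataset): since the spacings partition [0, 1] under the fitted cdf, they always sum to 1, so the first symmetric polynomial σ_1 = ΣD_i is constant in the parameter and cannot serve as an objective at all; the product σ_n, by the AM-GM inequality, is uniquely maximized when the spacings are equal.

```python
# Spacings always satisfy ΣD_i = 1 (they partition [0, 1] under the cdf),
# so σ_1 carries no information about the parameter; the product σ_n does.
def product_of(d):
    p = 1.0
    for di in d:
        p *= di
    return p

equal   = [0.25, 0.25, 0.25, 0.25]     # hypothetical spacings, summing to 1
unequal = [0.5, 0.25, 0.125, 0.125]    # also summing to 1

print(sum(equal) == sum(unequal) == 1.0)         # True: σ_1 can't tell them apart
print(product_of(equal) > product_of(unequal))   # True: σ_n prefers equal spacings
```

This also fits the log-sum reply above: maximizing the product is the same as maximizing the sum of logs, which is the numerically convenient form.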