Talk:Closest pair of points problem

Untitled
Guys, this page is lovely but I'd really need some info on the complexity of this problem for Hamming spaces (coding theory). You can't use Voronoi here, we have a discrete finite space. Any idea?

Vance
 * Piotr Indyk has done quite a bit of work on approximate closest pairs in spaces that are not low-dimensional Euclidean. You might try looking up some of his research. —David Eppstein (talk) 15:36, 2 April 2009 (UTC)

Thanks. But this page is very misleading. It says "A naive algorithm of finding distances between all pairs of points and selecting the minimum requires O(dn2) time. It turns out that the problem may be solved in O(n log n) time (assuming the dimension d of the space to be constant)" However, this is not true, this holds only for Euclidean spaces, or better, I've found no reference for other spaces. —Preceding unsigned comment added by 193.205.213.166 (talk) 14:58, 24 April 2009 (UTC)


 * Michiel Smid's survey Closest-Point Problems in Computational Geometry discusses the problem in the context of any Minkowski metric. I think the part earlier in the article about d-dimensional spaces makes clear that this is referring to Rd. But obviously there are spaces other than Minkowski distances on Rd for which finding closest pairs is useful. —David Eppstein (talk) 15:08, 24 April 2009 (UTC)

Sorry to intrude in your discussion, but the article says "given n points in the d-dimensional metric space,", which means any metric space of dimension d. Why don't change that sentence into ""given n points in the d-dimensional Euclidean space R^d"? —Preceding unsigned comment added by 193.205.213.166 (talk) 07:32, 4 May 2009 (UTC)
 * Because the problem is also of interest for non-Euclidean metrics. —David Eppstein (talk) 14:09, 4 May 2009 (UTC)

Yes, the problem is of interest for non-Euclidean metric, but then the complexity isn't the same as for the Euclidean metric. I see the page has been changed, thanks for it. I have once comment: it should be said somewhere that the complexity is actually O(n log n) f(d), with f(d) exponential in d.. This may imply that for some finite spaces (such as the Hamming space) no algorithm can beat the naive approach. —Preceding unsigned comment added by VanceIII (talk • contribs) 11:50, 8 May 2009 (UTC)

Generalisations to d >= 3
Where it says "... while the divide and conquer algorithm can be generalized to take O(n log n) time for any constant value of d", can anyone give a hint on how to formulate the merge procedure for d >= 3 such that it is still linear in n? For the d = 2 case it's easy because the points can be ordered by y-value and then each point compared to the following 7 or 8 points in this list. But for d = 3, we need to do something similar in the 2D interface between the divided half-spaces. I can't figure out a linear-time algorithm for that. --Evgeni Sergeev (talk) 06:40, 28 December 2009 (UTC)

The 2d algorithm generalizes as O(n (logn) ^(d-1)). That seems to be the only algorithm not taking any assumptions on the distances. Otherwise there's a (cd)^(d/2) https://people.csail.mit.edu/indyk/6.838-old/handouts/lec17.pdf and for bounded distances, and a subqudratic algorithm for the approximate version in "Finding Correlations in Subquadratic Time, with Applications to Learning Parities and the Closest Pair Problem" Thomasda (talk) 10:28, 5 November 2016 (UTC)
 * The randomized linear time generalizes to any constant dimension (with an exponential dependence on dimension). It's also possible to use quadtrees to get O(n log n) in any dimension, deterministically, again with an exponential dependence (Clarkson, FOCS '83). I agree that naive generalizations of the divide and conquer method are worse. —David Eppstein (talk) 18:36, 5 November 2016 (UTC)

Planar Case constant split-pair limit
If there were at most 6 points in the rectangle then only 5n distances must be computed in the worst case, because one of the points is assumed to be in the rectangle, so there can be at most 5 others. The logic is faulty anyway, because some point pairs with distances less than d_min may exist in the rectangle if they are the desired split pair. I don't disbelieve that there are at most 6 points that need to be considered, but this argument doesn't prove it. I think the proof would rely on an assumption that no two points can have the same x-value. Mikebolt (talk) 02:09, 11 May 2014 (UTC)
 * Color the points red on one side of the line and blue on the other side of the line. What the argument says is that for each red point there is a rectangle, and that rectangle can contain at most six blue points. It could perhaps be reduced to five but I think it is perfectly valid as is. —David Eppstein (talk) 03:24, 11 May 2014 (UTC)
 * Thank you. I understand this argument now, and I was wrong to say that the proof is faulty. I was confusing it with the argument presented in the reference that uses a horizontally oriented bounding box. Mikebolt (talk) 23:08, 19 June 2014 (UTC)

Alternative algorithm
The recursive search algorithm is quite complicated and has high overhead for the complexity level. I think it would be helpful to also describe filtering the pre-sorted list of points in one dimension, so that only the pairs whose separation in the sorted dimension is beneath the current minimum distance are tested. This is simpler to explain and understand, and in my testing is consistently faster, unless I've got the recursive descent code wrong. See sample code here: http://pastebin.com/28dXviFg — Preceding unsigned comment added by MattVogt (talk • contribs) 06:00, 7 December 2014 (UTC)
 * We can only include material supported by reliable publications — pastebin isn't usable as a reference. But what I think we should be doing is describing the linear-time algorithm from the "Rabin flips a coin" reference, not the O(n log n) algorithm that we currently describe. —David Eppstein (talk) 06:37, 7 December 2014 (UTC)

Planar case algorithm question
2. Split the set of points into two equal-sized subsets by a vertical line...

3. Solve the problem recursively in the left and right subsets...

Do I understand it correctly that in case of at least two points having equal x-coordinate the algorithm is not defined?

Better Algortims
So I notice that the two best algorithms (by complexity) known are hardly even mentioned here (other than their result), despite their discovery being 40 or so years ago. Is there any chance of a synopsis of these being added (similar to the divide and conquer and brute force sections)? I found at least the randomized one in a textbook [citation 2, I believe], so it can't be that difficult obscure. — Preceding unsigned comment added by 2602:306:8317:3650:748D:A3C6:EA56:1A51 (talk) 00:19, 26 November 2017 (UTC)