Talk:Hierarchical clustering

This is the "main article" for Hierarchical clustering according to the Cluster_analysis page, yet that page actually has more information than this one about Hierarchical clustering algorithms. Surely this should not be. . . Electron100 (talk) 03:05, 21 September 2009 (UTC)

While I agree that this article is seriously lacking information I believe this page needs to be enhanced rather than merged. Using K-means clustering as an example the cluster analysis page gives an overview, but the main article provides more detailed information. (Humanpowered (talk) 15:30, 23 March 2011 (UTC))

Added WikiLink to User:Mathstat/Ward's_method — Preceding unsigned comment added by Jmajf (talk • contribs) 12:49, 28 November 2011 (UTC)

give example
Dear Sir, please describe the several kinds of hierarchical clustering fluently and understandably, and please give an example. — Preceding unsigned comment added by 83.172.123.165 (talk) 19:04, 16 December 2011 (UTC)

In the section "Metric" am I right that the "i" across which some of the distance metrics are summed is an index of data dimension? i.e. bivariate data will be i = {1, 2}.

If so it might make it clearer to put this definition of i in the text to make it clear to simpletons like me! Also two of the measures (Mahalanobis and cosine) do not sum across i. Does this mean they can only be used for single variate data? If not, is there another formula? — Preceding unsigned comment added by Periololon (talk • contribs) 14:44, 19 March 2012 (UTC)
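On the question above, a minimal sketch may help, assuming the usual reading in which i indexes the data dimensions. The Euclidean distance sums over i explicitly, while the cosine and Mahalanobis distances hide the same sum inside a dot product or a matrix product, so both apply to multivariate data. The function names, the sample points, and the choice of "1 minus cosine similarity" as the cosine distance are illustrative assumptions, not taken from the article.

```python
import math

def euclidean(x, y):
    # Explicit sum over the dimension index i.
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def cosine_distance(x, y):
    # Assumed form: 1 - cosine similarity.
    # The dot product and both norms each hide a sum over i.
    dot = sum(xi * yi for xi, yi in zip(x, y))
    norm_x = math.sqrt(sum(xi * xi for xi in x))
    norm_y = math.sqrt(sum(yi * yi for yi in y))
    return 1.0 - dot / (norm_x * norm_y)

def mahalanobis(x, y, s_inv):
    # sqrt((x - y)^T S^{-1} (x - y)), where s_inv is the inverse covariance
    # matrix given as a list of rows. The matrix product is a double sum
    # over the dimension indices i and j.
    diff = [xi - yi for xi, yi in zip(x, y)]
    n = len(diff)
    q = sum(diff[i] * s_inv[i][j] * diff[j] for i in range(n) for j in range(n))
    return math.sqrt(q)

# Bivariate example points (hypothetical data): i runs over {1, 2}.
a = [1.0, 2.0]
b = [4.0, 6.0]
identity = [[1.0, 0.0], [0.0, 1.0]]

d_euclidean = euclidean(a, b)
d_mahalanobis = mahalanobis(a, b, identity)
d_cosine = cosine_distance(a, b)
```

With the identity matrix as the inverse covariance, the Mahalanobis distance reduces to the Euclidean distance, which is one way to see that it is not limited to univariate data.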

V-linkage V-means
I was interested in this technique, but I haven't found any reference when searching Google and Google Scholar. We need a source/reference. Moo (talk) 20:25, 11 May 2012 (UTC)


 * I found the following on what appears to be an old copy of the article cluster analysis at http://biocomp.bioen.uiuc.edu/oscar/tools/Hierarchical_Clustering.html


 * V-means clustering


 * V-means clustering utilizes cluster analysis and nonparametric statistical tests to key researchers into segments of data that may contain distinct homogenous sub-sets. The methodology embraced by V-means clustering circumvents many of the problems that traditionally beleaguer standard techniques for categorizing data. First, instead of relying on analyst predictions for the number of distinct sub-sets (k-means clustering), V-means clustering generates a pareto optimal number of sub-sets. V-means clustering is calibrated to a user-defined confidence level p, whereby the algorithm divides the data and then recombines the resulting groups until the probability that any given group belongs to the same distribution as either of its neighbors is less than p.


 * Second, V-means clustering makes use of repeated iterations of the nonparametric Kolmogorov-Smirnov test. Standard methods of dividing data into its constituent parts are often entangled in definitions of distances (distance measure clustering) or in assumptions about the normality of the data (expectation maximization clustering), but nonparametric analysis draws inference from the distribution functions of sets.


 * Third, the method is conceptually simple. Some methods combine multiple techniques in sequence in order to produce more robust results. From a practical standpoint this muddles the meaning of the results and frequently leads to conclusions typical of “data dredging.”


 * Unfortunately there was no citation. Melcombe (talk) 22:27, 11 May 2012 (UTC)

Hierarchical Clustering References
There is a 1967 paper, published in Psychometrika, titled "Hierarchical Clustering Schemes", by S. C. Johnson (yes, that's me...). It was extensively cited in the 70's and 80's, in part because Bell Labs gave away a FORTRAN program for free that did a couple of the methods described in the paper. The paper pointed out that there is a correspondence between hierarchical clusterings and a kind of data metric called an ultrametric: whenever you have a hierarchical clustering, it implies an ultrametric, and conversely. 76.244.36.165 (talk) 19:14, 18 October 2012 (UTC) Stephen C Johnson
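The correspondence mentioned above can be illustrated with a small sketch. It is not the FORTRAN program or the paper's own code; it is a plain single-linkage loop written for this example, with hypothetical function names and sample data. It records, for each pair of points, the merge height at which they first share a cluster, and then checks the ultrametric inequality u(x, z) <= max(u(x, y), u(y, z)).

```python
from itertools import combinations

def single_linkage_cophenetic(points, dist):
    """Run single-linkage agglomerative clustering and return the matrix u,
    where u[i][j] is the merge height at which points i and j first share
    a cluster."""
    n = len(points)
    clusters = [[i] for i in range(n)]
    u = [[0.0] * n for _ in range(n)]
    while len(clusters) > 1:
        # Find the closest pair of clusters under single linkage.
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            h = min(dist(points[i], points[j])
                    for i in clusters[a] for j in clusters[b])
            if best is None or h < best[0]:
                best = (h, a, b)
        h, a, b = best
        # Every cross pair first meets at this merge height.
        for i in clusters[a]:
            for j in clusters[b]:
                u[i][j] = u[j][i] = h
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return u

def is_ultrametric(m, tol=1e-12):
    """Check m[i][k] <= max(m[i][j], m[j][k]) for all triples."""
    n = len(m)
    return all(m[i][k] <= max(m[i][j], m[j][k]) + tol
               for i in range(n) for j in range(n) for k in range(n))

# Hypothetical one-dimensional data with absolute difference as the distance.
points = [0.0, 1.0, 3.0, 7.0, 8.5]
d = [[abs(x - y) for y in points] for x in points]
u = single_linkage_cophenetic(points, lambda x, y: abs(x - y))
```

Here the original distances `d` are an ordinary metric but not an ultrametric, while the merge heights `u` induced by the hierarchy satisfy the stronger inequality.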

US Patent application 14/718,804 achieves sub-quadratic complexity for dissimilarity measures based on distances in a Euclidean vector space.

http://arxiv.org/abs/1109.2378 is a good survey of the algorithms. — Preceding unsigned comment added by 2001:4898:80E8:B:5A:FC6F:C36B:3C4C (talk) 00:44, 4 October 2018 (UTC)

Example for Agglomerative Clustering
I changed 'The "increase" in variance for the cluster being merged (Ward's method[7])' to 'The "decrease" in variance for the cluster being merged (Ward's method[7])'. That is how it is stated above, under Cluster dissimilarity, and how it appears from Ward's method, https://en.wikipedia.org/wiki/Ward%27s_method — Preceding unsigned comment added by 2A02:5D8:200:600:82:150:200:4 (talk) 11:44, 20 August 2015 (UTC)
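For anyone checking the wording, a small numeric sketch may be useful. It assumes the common formulation of Ward's criterion in which the cost of merging clusters A and B is the change in total within-cluster sum of squared errors, which has the closed form |A||B| / (|A| + |B|) times the squared distance between the two cluster means. The function names and the sample clusters are hypothetical.

```python
def sse(cluster):
    """Sum of squared Euclidean distances from each point to the cluster mean."""
    n = len(cluster)
    dim = len(cluster[0])
    mean = [sum(p[d] for p in cluster) / n for d in range(dim)]
    return sum((p[d] - mean[d]) ** 2 for p in cluster for d in range(dim))

def ward_cost(a, b):
    """Change in total within-cluster SSE caused by merging clusters a and b."""
    return sse(a + b) - sse(a) - sse(b)

def ward_cost_closed_form(a, b):
    """Closed form: |A||B| / (|A| + |B|) * ||mean_A - mean_B||^2."""
    dim = len(a[0])
    mean_a = [sum(p[d] for p in a) / len(a) for d in range(dim)]
    mean_b = [sum(p[d] for p in b) / len(b) for d in range(dim)]
    gap = sum((mean_a[d] - mean_b[d]) ** 2 for d in range(dim))
    return len(a) * len(b) / (len(a) + len(b)) * gap

# Hypothetical clusters of 2-D points.
cluster_a = [(0.0, 0.0), (2.0, 0.0)]
cluster_b = [(5.0, 0.0), (7.0, 0.0)]

cost = ward_cost(cluster_a, cluster_b)
cost_closed = ward_cost_closed_form(cluster_a, cluster_b)
```

Under this formulation the closed form is a product of non-negative terms, so a merge cannot lower the total within-cluster sum of squares; the method picks the merge with the smallest such change.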

Divisive algorithms, hierarchical k-means
I think that hierarchical k-means deserves a mention or description, maybe even its own page. As a starting point I'm mentioning it here. Perhaps the way to go is Hierarchical clustering#(agglomerative methods#(...),divisive#(hierarchical-kmeans,..others..)). Someone already tried to delete my reference to hkmeans in means saying it was spam - I think that's a little unfair, so I'm trying to explain it better. Fmadd (talk) 06:45, 14 May 2016 (UTC)
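To make the idea concrete, here is a minimal sketch of one way a divisive "hierarchical k-means" could work: split the data with k-means (k = 2), then recurse on each half. This is an assumption about what is meant, not a description of any particular published variant; the function names, the deterministic initialisation, and the `min_size` stopping rule are all choices made for this example.

```python
def dist2(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def two_means(points, iters=20):
    """Split points into two groups with a basic k-means loop (k = 2)."""
    # Deterministic initialisation (an assumption of this sketch):
    # the first point and the point farthest from it.
    c0 = points[0]
    c1 = max(points, key=lambda p: dist2(p, c0))
    left, right = list(points), []
    for _ in range(iters):
        left = [p for p in points if dist2(p, c0) <= dist2(p, c1)]
        right = [p for p in points if dist2(p, c0) > dist2(p, c1)]
        if not left or not right:
            break
        new_c0, new_c1 = mean(left), mean(right)
        if (new_c0, new_c1) == (c0, c1):
            break  # centroids stopped moving
        c0, c1 = new_c0, new_c1
    return left, right

def hierarchical_kmeans(points, min_size=2):
    """Return a nested tuple tree whose leaves are lists of points."""
    if len(points) <= min_size:
        return points
    left, right = two_means(points)
    if not left or not right:
        return points  # could not split further (e.g. identical points)
    return (hierarchical_kmeans(left, min_size),
            hierarchical_kmeans(right, min_size))

# Hypothetical one-dimensional data with two obvious groups.
data = [(0.0,), (1.0,), (10.0,), (11.0,)]
tree = hierarchical_kmeans(data)
```

The top-down recursion is what would place such a method under the divisive branch of the outline proposed above.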

External links modified
Hello fellow Wikipedians,

I have just modified one external link on Hierarchical clustering. Please take a moment to review my edit. If you have any questions, or need the bot to ignore the links, or the page altogether, please visit this simple FaQ for additional information. I made the following changes:
 * Added archive https://web.archive.org/web/20091110212529/http://www-stat.stanford.edu/~tibs/ElemStatLearn/ to http://www-stat.stanford.edu/~tibs/ElemStatLearn/

When you have finished reviewing my changes, you may follow the instructions on the template below to fix any issues with the URLs.

Cheers.— InternetArchiveBot  (Report bug) 13:17, 3 November 2017 (UTC)

Based on a rough reading of the change logs, the summary table of 17 linkage criteria was added around 2014. It has now been reduced to 16 linkage criteria in a way that raises a potential Wiki logic problem…

Based on a rough reading of the change logs, the summary table of 17 linkage criteria was added around 2014. As of April 26-27, 2024, only 6 of the linkage criteria had their own Wiki pages. An editor very new to the Wiki editing requirements made an effort to add a Wiki page for one of the linkage criteria. Instead of deleting only the link to this new Wiki page, the Wiki editor deleted the existing linkage criterion as well. Thus the summary table of 17 linkage criteria constructed in 2014 now has 16.

The main problem with over-deleting the record is that it can be abused by others intentionally to eliminate their competitors, and it has the potential to cause serious loss of information for Wiki. Assume that there are three commercial products, A, B, and C, for a biological test experiment listed in Wiki. Suppose that product A wants to eliminate products B and C from the Wiki list. Product A can simply do cite-spamming for products B and C. The Wiki Editor will easily notice the cite-spamming for products B and C. Thus, in accordance with the Wiki practice for this page, the Wiki Editor will delete products B and C from Wiki. As a result, product A will become the sole product listed in Wiki for that biological test experiment.

The imagined case above is meant to highlight that records existing before any possible cite-spamming should not be affected and should be kept intact. Otherwise, records as old as ten years can be easily deleted, and people with bad intentions may abuse the Wiki system in another way, by using the Wiki Editor to delete their possible competitors from the Wiki.

Looking forward to the attention of the Wiki people who really care about best practice for the Wiki. It is understood that each organization has its own practices, and someone may not fully understand all these rules and thus overlook some of them. Yet the consequences of over-deleting may be much more serious than those caused by people who are just starting to learn the Wiki edit functions and have yet to understand all its many rules. 223.16.242.216 (talk) 06:03, 30 April 2024 (UTC)
 * Hi, the same formula is in place for "Hausdorff linkage", from what I can tell from a quick search these terms refer to the same method.

I also noticed that whilst you have submitted a conflict of interest edit request, you have also edited the page. If you do have a conflict of interest with this article please see WP:COI which recommends against editing the page directly. Thanks, Encoded   Talk 💬 07:08, 1 May 2024 (UTC)


 * Thanks for your reply and your quick look at this issue. Because it was only a quick look, it missed the issue that I raised. I am new to Wiki editing and tried to do some editing on April 26 and 27, 2024. Yet from around 2014 up to April 26, 2024, there had been 17 linkage criteria in the linkage summary table. This table was made through the efforts of some very good-hearted volunteer, maybe a Wiki editor. There was an item called Mini-max linkage between the Minimum Variance (MNVAR)[9] and the Hausdorff linkage[10]. One can easily check the existence of this item and see that it was not a product of the incident on April 26-27, 2024. On April 26-27, 2024, just one additional link was added to this item, giving it a total of two links. As someone new to Wiki editing, I did not know that adding a new link may be labeled as cite-spamming. Instead of just deleting the additional link, the whole Mini-max linkage record, which should have been there since the summary table was set up, was deleted. This is what I described as over-deleting.
 * It seems that a quick look at the issue was not enough to grasp my point. 223.16.242.216 (talk) 09:22, 1 May 2024 (UTC)
 * Your dates are incorrect, it was added more recently than what you suggest. But that's not really relevant - that content has been in an article for a while does not mean it cannot be altered or removed - Wikipedia's policies grant no special protections to text just because it has been around for a while. MrOllie (talk) 18:09, 3 May 2024 (UTC)
 * Dear MrOllie,
 * Thanks for your reply to my message. Your reply is highly appreciated, as a Wiki Editor will need to take care of many pages voluntarily.
 * From your last message, I am happy to see that finally a Wiki Editor understands that all 17 items in the summary table were there before April 26-27, 2024, when I tried to learn the Wiki Edit function and added an additional link to the existing Mini-max linkage record on top of its one existing link. On April 26-27, 2024, before I started to practice the Wiki Edit function, I watched a Wiki Edit introductory video, which said that one is welcome to try this function. Yes, it is for the Wiki Editor to decide whether the changes are appropriate or not. Nevertheless, the usual practice may be just to restore the original version if the editor does not think the changes appropriate.
 * Yet, in this case, the Mini-max linkage record had existed there for quite some time before April 26-27, 2024, and its existence is independent of the editing of that additional second link. Now, I have a question: why was this record chosen to be deleted by our Wiki Editor on the same dates, April 26-27, 2024? It is very easy to think that the deletion of this existing record may be an over-reaction / over-deletion in response to the attempt to add one more link on April 26-27, 2024. The mathematical structure of Mini-max linkage (discovered in 2004) is somewhat similar to that of Hausdorff linkage (discovered in 2007). Now the earlier-discovered Mini-max linkage has been chosen for deletion by the Wiki Editor, while the later-discovered Hausdorff linkage has been chosen to remain in the table. May I know why there is such a difference?
 * Anyway, really thanks for your previous reply out of your voluntary time.
 * It will be highly appreciated if any Wiki Editor can look deeper at this issue to see if there is any logical thinking that can be further improved in this case here. 223.16.242.216 (talk) 09:13, 10 May 2024 (UTC)