Wikipedia:Reference desk/Archives/Mathematics/2007 November 24

= November 24 =

Need Guidance In Cluster Analysis
Can anyone point me to resources for learning how to program the statistical technique known as cluster analysis? It has been many years since I've been in school, so I'm looking for something easy to understand.

Here are further (gruesome) details, needed because there are many kinds of cluster analysis.

I have experimental data consisting of hundreds of vectors (lists) of about 10 real numbers in each vector (each number is a measure of an acoustical response at the Nth harmonic of an applied acoustical signal). Each vector, then, defines a point in 10-dimensional space.

Each vector has a "category", which means that it is associated with an experimental outcome.

The points in 10-space are expected to cluster together (be close to each other) by category.

What I'm looking for is a written description of how to do two stages of processing: first, how to analyze these data vectors to form clusters that have small diameter and that are far from each other (reducing the dimensionality?), and second, how to take a new vector and quickly determine the probability that it belongs to each cluster (I don't need to find the cluster to which it belongs, mostly because I expect the clusters to overlap).

Any help would be appreciated, including questioning my assumptions.

David (talk) 00:35, 24 November 2007 (UTC)


 * There are many approaches for cluster generation, mainly because the desired result is not really defined. The typical approach is to compute the Delaunay triangulation and then collapse that graph along the shortest edges until the resulting clusters have some minimal distance or the number of clusters is below some threshold. Can you elaborate on what you mean by "the results are expected to cluster by category"? If you know the category for each point, why do you need to do a cluster analysis? —Preceding unsigned comment added by 84.187.113.101 (talk) 00:56, 24 November 2007 (UTC)


 * I've used k-means clustering a few times in the past. It's a fairly straightforward idea and easy to implement. Typically you would run it several times with different random starting conditions and different numbers of clusters, and then pick the result you like best (or come up with some sort of automated selection criteria). - Rainwarrior (talk) 01:14, 24 November 2007 (UTC)


 * K-means is good if you don't have much experience with neural net programming. If you are looking for something more advanced, however, I would suggest a support vector machine, which specializes in finding separations between clusters that give the most leg room, that is, are maximally far from each cluster. SamuelRiv (talk) 01:51, 24 November 2007 (UTC)


 * What does the quality of k-means have to do with neural networks? They're very different approaches with different advantages and drawbacks. - Rainwarrior (talk) 03:16, 24 November 2007 (UTC)


 * K-means is a fairly standard introduction to neural networks, as it is effectively a single-layer perceptron with a linear learning rule. SamuelRiv (talk) 04:27, 24 November 2007 (UTC)


 * Ahh, that's an interesting way to think about it. - Rainwarrior (talk) 20:40, 24 November 2007 (UTC)
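
The k-means procedure described above (several runs from different random starting points, keeping the best result) can be sketched roughly as follows. This is a minimal illustration in plain Python, not anyone's posted code: the function names `kmeans_once` and `kmeans`, the choice of "lowest total squared distance" as the selection criterion, and the default of 10 restarts are all assumptions made for the example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]


def kmeans_once(points, k, rng, max_iter=100):
    """One k-means run (Lloyd's iteration) from a random start.

    Assumes len(points) >= k and max_iter >= 1.
    """
    # Start from k distinct data points chosen at random.
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an emptied cluster keeps its old center
                centers[j] = mean_vector(members)
    cost = sum(squared_distance(p, centers[lab]) for p, lab in zip(points, labels))
    return cost, centers, labels


def kmeans(points, k, restarts=10, seed=0):
    """Run k-means several times and keep the lowest-cost result."""
    rng = random.Random(seed)
    runs = (kmeans_once(points, k, rng) for _ in range(restarts))
    return min(runs, key=lambda run: run[0])
```

Calling `kmeans(vectors, k)` returns a `(cost, centers, labels)` tuple. Trying several values of `k` and comparing the costs is one way to do the "different numbers of clusters" part, bearing in mind that the cost tends to shrink as `k` grows, so it cannot be the only criterion.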


 * I think your main problem is one of terminology. What you're trying to do is not clustering, it's classification. Clustering would be if the vectors didn't have some category associated with them, and you wanted to find natural clusters (based on Euclidean distance for instance, which is what k-means can do, although it also does classification). There are hundreds of machine learning algorithms that you can use for your task, and millions of pages of theory have been written on the subject. The good news is that the basics are fairly easy to understand. I suggest you have a look at Weka. It's a reasonably friendly tool that allows you to look at data and test various classification algorithms. Some parts of it may look a little intimidating, but you can figure most of it out by just playing with it.
 * I strongly advise against just picking a simple algorithm and implementing it. It will take you a lot of time and effort, and there's little guarantee that your choice of algorithm will work for your dataset. Read up on some theory before you make your choice. You'll usually be able to find some library that implements the algorithm for you. Finally, be sure to consider two other things: do you want to be able to 'read' the model that the algorithm creates of your data (if so, don't go with neural networks), and, most importantly, how will you test the performance of your model? Having a predictive model of your data is all fine and lovely, but if you don't know how well it will perform on new data, it's of very little use. risk (talk) 15:14, 24 November 2007 (UTC)
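
To make the two points above concrete (per-category probabilities for a new vector, and testing on data the model has not seen), here is a small sketch in plain Python. It uses Gaussian naive Bayes, which nobody in this thread recommended; it is picked here only because it is short and yields probabilities directly. The names `fit`, `predict_proba` and `holdout_accuracy`, the 30% test split and the variance floor are all assumptions of the example. In practice a library or a tool such as Weka would likely be a better starting point.

```python
import math
import random
from collections import defaultdict


def fit(vectors, labels, var_floor=1e-6):
    """Estimate a prior plus per-dimension mean and variance for each category."""
    grouped = defaultdict(list)
    for v, lab in zip(vectors, labels):
        grouped[lab].append(v)
    model = {}
    for lab, rows in grouped.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # Floor the variance so a constant dimension cannot cause division by zero.
        variances = [
            max(sum((x - m) ** 2 for x in col) / n, var_floor)
            for col, m in zip(columns, means)
        ]
        model[lab] = (n / len(vectors), means, variances)
    return model


def predict_proba(model, v):
    """Posterior probability of each category for one new vector."""
    log_scores = {}
    for lab, (prior, means, variances) in model.items():
        log_p = math.log(prior)
        for x, m, s2 in zip(v, means, variances):
            log_p += -0.5 * math.log(2 * math.pi * s2) - (x - m) ** 2 / (2 * s2)
        log_scores[lab] = log_p
    # Subtract the largest log score before exponentiating, for numerical stability.
    top = max(log_scores.values())
    weights = {lab: math.exp(s - top) for lab, s in log_scores.items()}
    total = sum(weights.values())
    return {lab: w / total for lab, w in weights.items()}


def holdout_accuracy(vectors, labels, test_fraction=0.3, seed=0):
    """Train on part of the data and report accuracy on the held-out rest."""
    idx = list(range(len(vectors)))
    random.Random(seed).shuffle(idx)
    n_test = max(1, int(len(idx) * test_fraction))
    test, train = idx[:n_test], idx[n_test:]
    model = fit([vectors[i] for i in train], [labels[i] for i in train])
    hits = 0
    for i in test:
        probs = predict_proba(model, vectors[i])
        if max(probs, key=probs.get) == labels[i]:
            hits += 1
    return hits / n_test
```

`predict_proba` returns a dictionary mapping each category to a probability, and the probabilities sum to 1, so overlapping categories show up as split probabilities instead of a forced single answer. The model treats the 10 measurements as independent within each category, which may well be wrong for harmonics of the same signal; the hold-out accuracy is what tells you whether that matters for your data.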

converting money
How do you convert British pounds into US dollars? How many US dollars does one million British pounds equal? —Preceding unsigned comment added by 75.111.88.149 (talk) 01:36, 24 November 2007 (UTC)
 * http://www.google.com/search?hl=en&q=one+million+British+pounds+in+US+dollars+ PrimeHunter (talk) 02:02, 24 November 2007 (UTC)
 * At the moment, 2.0589 million U.S. dollars. Thomprod (talk) 02:26, 24 November 2007 (UTC)
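
The conversion itself is a single multiplication: the amount in pounds times the dollars-per-pound rate. A tiny sketch, assuming the rate of 2.0589 quoted above (which was only valid at the time of the reply) and a hypothetical helper named `convert`:

```python
from decimal import Decimal


def convert(amount, rate):
    """Multiply an amount by an exchange rate using exact decimal arithmetic."""
    return Decimal(amount) * Decimal(rate)


# Rate taken from the reply above; exchange rates change constantly.
usd = convert("1000000", "2.0589")
print(usd)
```

With that rate, one million pounds comes to 2,058,900 dollars. Passing the numbers as strings avoids binary floating-point rounding in the result.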