John H. Wolfe

John H. Wolfe is the inventor of model-based clustering for continuous data. Wolfe graduated with a B.A. in mathematics from Caltech and then went to graduate school in psychology at the University of California, Berkeley to work with Robert Tryon.

Around 1959, Paul Lazarsfeld visited Berkeley and gave a lecture on his latent class analysis. The lecture fascinated Wolfe and led him to start thinking about how the same thing could be done for continuous data. Wolfe's 1963 M.A. thesis was a first, but ultimately unsuccessful, attempt to do this. After graduating from Berkeley, Wolfe took a job with the US Navy in San Diego, first as a computer programmer and then as an operations research analyst.

He continued his research on clustering, and in 1965 he published the paper that introduced model-based clustering. In it he used a mixture of multivariate normal distributions as the model, estimated it by maximum likelihood using a Newton-Raphson algorithm, and gave the expression for the posterior probabilities of membership in each cluster. The paper also describes NORMIX, the first publicly available software for estimating the model. This work was extended and published in a journal by Wolfe (1970).
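
For reference, the model described above can be written in standard modern notation. The symbols here are assumptions of this sketch and are not necessarily those used in Wolfe's papers: G is the number of clusters, \tau_k are the mixing proportions, \mu_k and \Sigma_k are the mean vector and covariance matrix of the k-th component, and \phi is the multivariate normal density.

```latex
% Mixture density for an observation x
f(x) = \sum_{k=1}^{G} \tau_k \, \phi(x \mid \mu_k, \Sigma_k),
\qquad \tau_k \ge 0, \quad \sum_{k=1}^{G} \tau_k = 1

% Posterior probability that x belongs to cluster k
\Pr(z = k \mid x) =
  \frac{\tau_k \, \phi(x \mid \mu_k, \Sigma_k)}
       {\sum_{j=1}^{G} \tau_j \, \phi(x \mid \mu_j, \Sigma_j)}
```

Maximum likelihood estimation chooses the parameters that maximize \sum_i \log f(x_i) over the observed data. Each observation is then typically assigned to the cluster with the highest posterior probability.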

After 1970, Wolfe worked on other topics, but research on model-based clustering grew rapidly. Articles on model-based clustering have garnered over 20,000 citations in scientific publications, and two of the most widely used software packages that implement it (the mclust and flexmix R packages) have been downloaded over 14 million times.