Jean-Paul Benzécri

Jean-Paul Benzécri was a French mathematician and statistician. He studied at the École Normale Supérieure and was a professor at the Université de Rennes and later, for most of his career, at the Paris Institute of Statistics (Institut de Statistique de l'Université de Paris), Université Pierre-et-Marie-Curie, in Paris. He is best known for his inductive approach to data analysis, which led to the creation of correspondence analysis, a statistical technique for analyzing contingency tables, and for the invention of the nearest-neighbor chain algorithm for agglomerative hierarchical clustering.

Early life
Jean-Paul Benzécri was born in Oran, Algeria, in 1932, where his father was a doctor. He attended high school at the Lycée Lamoricière in Oran and the Lycée Bugeaud in Algiers. In 1950, he placed first in the entrance examination to the ENS (École Normale Supérieure) in Paris, and in 1953 he again placed first in the "Agrégation de Mathématiques", a national teachers' diploma examination. He then did research in mathematics. He left for the United States in 1955 to attend Princeton University, where after four months of study he submitted a Ph.D. thesis in differential geometry entitled Variétés localement plates under the supervision of Henri Cartan.

From 1959 to 1960 he did his compulsory military service in the Operational Research Group of the French Navy, where he practiced multidimensional data modeling by traditional analytical methods, without the use of a computer. In 1960 he defended a doctorate ("Doctorat") at the Sorbonne in Paris entitled Sur les variétés localement affines et localement projectives, again under the supervision of Henri Cartan.

Career
Benzécri's teaching career began in 1963 as an assistant professor at the Faculty of Sciences in Rennes, where he created a course in mathematical linguistics. One of his first students was Brigitte Escofier-Cordier, who in 1965 published a dissertation entitled Analyse Factorielle des Correspondances (Correspondence analysis) with an application to textual data. In 1965, Benzécri became a professor at the Sorbonne and founded the Laboratoire de Statistique inside the Paris Institute of Statistics. His initial course in "Analyse des Données" evolved into a full-scale MS-PhD program, which was the basis of his research activity.

Research
Beginning with his early work in 1963 on natural language processing (NLP), Benzécri had the intuition that electronic computing was going to be the Novum Organum (i.e., the new tool) that would make it possible to solve problems through cooperation between mathematics, logic and linguistics. Inspired by the pioneering work of Louis Guttman and Chikio Hayashi as well as by the distributional methodology of Zellig Harris, he devised a geometric equivalent of these approaches by searching for the principal axes of inertia of a weighted cloud of points. These algorithms were the primary building blocks of a method which he later called "correspondence analysis". Developing correspondence analysis with the systematic support of clustering techniques, he turned his interest to analysing large contingency tables and binary tables, as well as other kinds of data arrays after suitable transformation, including lexical tables derived from raw texts.
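The search for principal axes of inertia can be illustrated with a minimal sketch. It is not taken from Benzécri's texts: the function name `first_principal_axis`, the use of power iteration, and the example table are all assumptions made for illustration. The sketch computes the total inertia of a contingency table (its chi-square statistic divided by the grand total) and the first principal axis of the weighted cloud of row profiles.

```python
from math import sqrt


def first_principal_axis(table, iterations=100):
    """Return (total_inertia, first_eigenvalue, row_coordinates).

    Illustrative sketch only; assumes every row and column total is > 0.
    """
    n = float(sum(map(sum, table)))
    P = [[x / n for x in row] for row in table]   # relative frequencies
    r = [sum(row) for row in P]                   # row masses (weights)
    c = [sum(col) for col in zip(*P)]             # column masses
    # Standardized residuals: departure of P from the independence model r_i * c_j.
    S = [[(P[i][j] - r[i] * c[j]) / sqrt(r[i] * c[j])
          for j in range(len(c))] for i in range(len(r))]
    total_inertia = sum(s * s for row in S for s in row)

    def times_S(v):       # S v
        return [sum(s * x for s, x in zip(row, v)) for row in S]

    def times_St(u):      # S^T u
        return [sum(S[i][j] * u[i] for i in range(len(r)))
                for j in range(len(c))]

    # Power iteration on S^T S approximates the leading principal axis.
    v = [1.0 + j for j in range(len(c))]          # arbitrary starting vector
    for _ in range(iterations):
        w = times_St(times_S(v))
        norm = sqrt(sum(x * x for x in w))
        if norm == 0.0:                           # table exactly independent
            return total_inertia, 0.0, [0.0] * len(r)
        v = [x / norm for x in w]

    Sv = times_S(v)
    eigenvalue = sum(x * x for x in Sv)           # inertia carried by the axis
    row_coords = [Sv[i] / sqrt(r[i]) for i in range(len(r))]
    return total_inertia, eigenvalue, row_coords


# Hypothetical 3 x 3 table (rows and columns are made-up categories).
example = [[20, 5, 2], [6, 18, 7], [3, 8, 25]]
total, lam, coords = first_principal_axis(example)
print(total, lam, coords)
```

The row coordinates are centred at the weighted origin and their weighted variance equals the eigenvalue, so the ratio of the eigenvalue to the total inertia gives the share of the departure from independence explained by the first axis. The sign of the axis is arbitrary.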

Benzécri favoured induction over hypothesis testing: much of his approach lies in describing and understanding how a multidimensional dataset diverges from the hypothesis of independence of its rows and columns, through the interpretation of patterns often revealed by graphical displays of point clouds. He was also open to reintroducing a statistical framework into this purely exploratory process by deriving an a posteriori projection of supplementary variables (i.e. columns) and individuals (i.e. rows). His early familiarity with computers and their programming languages led him to adopt tensor notation and quasi-ALGOL algorithmic formulas in his course texts as early as 1967. This made it easier for his colleagues and students to transcribe his concepts into computer programs in a wide range of languages, most recently in a variety of implementations in the R language such as FactoMineR. Benzécri's tensor notation was a precursor to the latest developments of tensor calculus for machine learning (for example, TensorFlow). In the field of clustering methods, Benzécri (1982) also proposed a new algorithm (the nearest-neighbor chain algorithm) for agglomerative hierarchical clustering.
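The idea behind the nearest-neighbor chain algorithm can be sketched as follows. This is an illustrative reconstruction, not Benzécri's 1982 formulation: the function name `nn_chain`, the choice of complete linkage, and the naive recomputation of cluster distances are assumptions made to keep the example short. The algorithm follows a chain of successive nearest neighbors until it reaches two clusters that are each other's nearest neighbor, merges them, and continues from what is left of the chain.

```python
from math import inf


def nn_chain(points, dist):
    """Agglomerative clustering by nearest-neighbor chains.

    Illustrative sketch using complete linkage (an assumption).
    Returns a list of merges (cluster_a, cluster_b, height); new clusters
    are numbered len(points), len(points) + 1, ... in order of creation.
    """
    clusters = {i: [i] for i in range(len(points))}   # id -> member indices

    def linkage(a, b):
        # Complete linkage: largest pairwise distance between the clusters.
        return max(dist(points[i], points[j])
                   for i in clusters[a] for j in clusters[b])

    merges, chain, next_id = [], [], len(points)
    while len(clusters) > 1:
        if not chain:
            chain.append(min(clusters))               # arbitrary starting cluster
        while True:
            top = chain[-1]
            prev = chain[-2] if len(chain) > 1 else None
            best, best_d = None, inf
            for other in clusters:
                if other == top:
                    continue
                d = linkage(top, other)
                # On ties, prefer the previous chain element so the chain ends.
                if d < best_d or (d == best_d and other == prev):
                    best, best_d = other, d
            if best == prev:                          # mutual nearest neighbors
                break
            chain.append(best)
        a, b = chain.pop(), chain.pop()
        merges.append((a, b, best_d))
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        next_id += 1
    return merges


# Hypothetical one-dimensional data with absolute difference as the distance.
data = [0.0, 1.0, 5.0, 6.5, 20.0]
for merge in nn_chain(data, lambda x, y: abs(x - y)):
    print(merge)
```

Merges are not necessarily produced in increasing order of height; sorting them by height yields the usual dendrogram. Keeping the rest of the chain after a merge is only valid for linkages such as complete, average, or Ward linkage, for which merging two clusters never brings the result closer to a third cluster than both of its parts were.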

Selected publications

 * L'Analyse des données. Tome 1 : La Taxinomie, Dunod, 1973, 615 p. ISBN 2-04-007034-6
 * L'Analyse des données. Tome 2 : L'Analyse des correspondances, Dunod, 1973, 619 p. ISBN 2-04-007225-X
 * Histoire et préhistoire de l'analyse des données, Dunod, 1982, 159 p. ISBN 2-04-015467-1
 * L'Analyse des données / leçons sur l'analyse factorielle et la reconnaissance des formes et travaux, ISBN 2-04-015515-5
   * Vol. 1 : L'Analyse des correspondances, Dunod, 1982, 635 p.
   * Vol. 2 : La Taxinomie, Dunod, 1982, 632 p.
 * Pratique de l'analyse des données,
   * Tome I : Analyse des correspondances, exposé élémentaire, Dunod, 1980, ISBN 2040157328
   * Tome II : Abrégé théorique, études de cas modèles, Dunod, 1980, 466 p. ISBN 2040111816
   * Tome III : Linguistique et lexicologie, Dunod, 1981, 565 p. ISBN 978-2-04-010776-5
   * Tome IV : En médecine, pharmacologie, physiologie clinique, Statmatic, Paris, 199, 532 p. ISBN 978-2-909047-00-3
   * Tome V : Pratique de l'analyse des données en économie, Dunod, 1987, 533 p. ISBN 2-04-016509-6
 * Les cahiers de l'analyse des données, Gauthier-Villars, Dunod, 1976–1997
 * Linguistique et lexicologie, Dunod, 2007 [reissue], ISBN 2-04-010776-2

Only one manual was published in English under Benzécri's direct supervision, near the end of his university career:
 * Correspondence analysis handbook, Marcel Dekker (1992), 665 p. ISBN 0824784375