Judgment of Princeton

The Judgment of Princeton was a blind wine tasting held on 8 June 2012 during a conference of the American Association of Wine Economists at Princeton University in Princeton, New Jersey. Its purpose was to compare several French wines against wines produced in New Jersey in order to gauge the quality and development of the New Jersey wine industry. Because New Jersey's wine industry is relatively young and small, it has received little attention in the world wine market. The state's wine production has grown in recent years, largely because state legislators have created new opportunities for winery licensing and repealed Prohibition-era laws that had constrained the industry's development. The event was modeled after a 1976 blind tasting dubbed the "Judgment of Paris", in which French wines were compared to wines produced in California when that state's wine industry was similarly young and developing. The New Jersey wine industry heralded the results and asserted that the judges' ratings of New Jersey wines were a victory for the state's wine industry.

Details
The Judgment of Princeton, held at Princeton University on Friday, June 8, 2012, was a structured blind tasting of top New Jersey wines against top French wines from Bordeaux and Burgundy. The event was based on the 1976 Judgment of Paris, in which California wines famously beat French wines in a blind tasting. The Judgment of Princeton was spearheaded by George M. Taber, who had been in Paris for the original Judgment of Paris and later wrote a book on the subject. Along with Taber, the tasting was organized and carried out by economists Orley Ashenfelter, Richard E. Quandt, and Karl Storchmann, and by Mark Censits, owner of CoolVines, a local wine and spirits shop. Censits acted in the role of merchant Steven Spurrier, gathering the competition wines from the New Jersey winemakers and selecting and sourcing the French wines against which they were to be pitted. The French wines were sourced from the same estates as the original wines of the Paris tasting. The event also included other members of the American Association of Wine Economists, who then posted the data set from the tastings online as an open invitation to further analysis.

The judges
Of the nine judges in Princeton, six were American, two French, and one Belgian. They are listed here in alphabetical order.

Controversy
As in the Judgment of Paris, the judges were told in advance that six wines in each flight of ten were from New Jersey. Afterwards, several of the judges complained about the disclosure of their judgments, as had also occurred after the Judgment of Paris.

Interpretation of results
In 1999, Quandt and Ashenfelter published a paper in the journal "Chance" that questioned the statistical interpretation of the results of the 1976 Judgment of Paris. The authors noted that a "side-by-side chart of best-to-worst rankings of 18 wines by a roster of experienced tasters showed about as much consistency as a table of random numbers," and reinterpreted the data, altering the results slightly, using a formula that they argued was more statistically valid (and less conclusive). Quandt’s later paper "On Wine Bullshit" poked fun at the seemingly random strings of adjectives that often accompany experts' published wine ratings. More recent work by Robin Goldstein, Hilke Plassmann, Robert Hodgson, and other economists and behavioral scientists has shown high variability and inconsistency both within and between blind tasters. Little correlation has been found between price and preference, even among wine experts, in tasting settings in which labels and prices have been concealed.

Methodology
The blind tasting panel was made up of nine expert judges, with each wine graded out of 20 points. The tasting was performed behind closed doors at Princeton University, and results were kept secret from the judges until they were analyzed by Quandt and announced later that day. According to an algorithm devised by Quandt, each judge's set of ratings was converted to a set of personal rankings, which were in turn summed across judges as “votes against.” A lower total is better (the wine was ranked higher overall) and a higher total is worse (the wine was ranked lower overall). The data were then tested by Quandt for statistically significant differences between tasters and wines using the same software he had previously employed to re-analyze the Judgment of Paris results.
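The conversion from ratings to rankings to “votes against” can be illustrated with a short Python sketch. This is not Quandt's software: the function names `ratings_to_ranks` and `votes_against` are hypothetical, the scores are invented for illustration, and giving tied scores the average of the ranks they occupy is an assumption, since the article does not say how ties were handled.

```python
from typing import List


def ratings_to_ranks(scores: List[float]) -> List[float]:
    """Convert one judge's scores (higher is better) to ranks (1 = best).

    Assumption: tied scores share the average of the rank positions
    they occupy; the article does not describe tie handling.
    """
    # Wine indices ordered from highest score to lowest.
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        # Extend 'end' over the run of wines tied with the one at 'pos'.
        end = pos
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        avg_rank = ((pos + 1) + (end + 1)) / 2
        for k in range(pos, end + 1):
            ranks[order[k]] = avg_rank
        pos = end + 1
    return ranks


def votes_against(ratings_by_judge: List[List[float]]) -> List[float]:
    """Sum each wine's rank over all judges; a lower total is better."""
    totals = [0.0] * len(ratings_by_judge[0])
    for scores in ratings_by_judge:
        for wine, rank in enumerate(ratings_to_ranks(scores)):
            totals[wine] += rank
    return totals


# Invented scores out of 20: three judges (rows) rating four wines (columns).
example = [
    [17, 15, 12, 14],
    [16, 16, 10, 13],
    [14, 18, 11, 12],
]
print(votes_against(example))  # [4.5, 4.5, 12.0, 9.0]
```

With nine judges and ten wines per flight, as in Princeton, a wine ranked first by every judge would total 9 and a wine ranked last by every judge would total 90. The sketch covers only the tabulation step; the significance testing described above is not reproduced here.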

The reveal
Shortly after the tasting was completed and the results tabulated, Taber, Quandt, and Ashenfelter announced the results to an audience of media, New Jersey winemakers, wine economists, and the judges themselves. The event took place in an auditorium at Princeton’s Woodrow Wilson School of Public and International Affairs as part of the American Association of Wine Economists’ annual meeting. Due to the technical limitations of Quandt's custom-built, floppy-disk-powered FORTRAN system, it was necessary for Goldstein to scrawl the results onto a giant chalkboard, eliciting murmurs of disapproval from the audience over his poor handwriting.

White wines
“Votes against” in the Ashenfelter-Quandt methodology are indicated here. (With nine judges each ranking ten wines, the best possible score in this tasting would have been 9, and the worst 90.) Only one wine was significantly better, statistically, than the other wines: the Beaune 1er Cru Clos des Mouches 2010, the cheapest of the four white Burgundies in the lot. The rest of the wines were statistically indistinguishable from each other based on the data, meaning that no conclusions can be drawn from the rankings of wines #2 to #10.

Significantly better than the other wines:

Not statistically distinguishable from each other:

Red wines
“Votes against” in the Ashenfelter-Quandt methodology are indicated here. (With nine judges each ranking ten wines, the best possible score in this tasting would have been 9, and the worst 90.) The only wine that was significantly worse, statistically, than the other wines was #10, the Four JG’s Cabernet Franc 2008, from New Jersey. The rest of the wines were statistically indistinguishable from each other based on the data, meaning that no conclusions can be drawn from the rankings of wines #1 to #9.

Not statistically distinguishable from each other:

Significantly worse than the other wines: