Copy testing

Copy testing is a specialized field of marketing research that determines an advertisement's effectiveness based on consumer responses, feedback, and behavior. Also known as pre-testing, it can cover all media channels, including television, print, radio, outdoor signage, internet, and social media.

Automated copy testing is a specialized form of digital marketing that applies to digital advertising. It uses software to deploy copy variations of digital advertisements to a live environment and collect data from real users. These automated copy tests generally use a Z-test to determine the statistical significance of the results. If a specific ad variation outperforms the baseline in the copy test at the desired level of statistical significance, the marketer should adopt the new copy variation.
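As an illustration of the Z-test mentioned above, the following is a minimal sketch of a pooled two-proportion Z-test that compares a copy variation against the baseline. It is not taken from any particular copy testing product. The function names, the use of click-through counts as the response metric, the one-sided alternative, and the default significance level of 0.05 are all assumptions made for the example.

```python
import math


def two_proportion_z_test(base_clicks, base_views, var_clicks, var_views):
    """Return (z, p_value) for a one-sided pooled two-proportion Z-test.

    Null hypothesis: the variation's click rate is no higher than the baseline's.
    All names and the choice of clicks/views as the metric are illustrative.
    """
    rate_base = base_clicks / base_views
    rate_var = var_clicks / var_views

    # Pooled rate under the null hypothesis that both ads perform the same.
    pooled = (base_clicks + var_clicks) / (base_views + var_views)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / base_views + 1 / var_views))
    if std_err == 0:
        raise ValueError("standard error is zero; no variation in the data")

    z = (rate_var - rate_base) / std_err
    # Upper-tail probability of the standard normal distribution, P(Z > z).
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value


def variation_wins(base_clicks, base_views, var_clicks, var_views, alpha=0.05):
    """True if the variation beats the baseline at significance level alpha."""
    _, p_value = two_proportion_z_test(base_clicks, base_views, var_clicks, var_views)
    return p_value < alpha


if __name__ == "__main__":
    # Hypothetical counts: 200 clicks in 10,000 views vs. 260 clicks in 10,000 views.
    z, p = two_proportion_z_test(200, 10_000, 260, 10_000)
    print(f"z = {z:.2f}, one-sided p = {p:.4f}")
    print("adopt variation:", variation_wins(200, 10_000, 260, 10_000))
```

A one-sided test is used here because the question is only whether the variation outperforms the baseline. A two-sided test would double the p-value and is the more conservative choice when a variation that performs worse is also of interest.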

Features
In 1982, a consortium of 21 leading advertising agencies — including N. W. Ayer, D’Arcy, Grey, McCann Erickson, Needham Harper & Steers, Ogilvy & Mather, J. Walter Thompson, and Young & Rubicam — released a public document laying out the PACT (Positioning Advertising Copy Testing) Principles that constitute a good copy testing system. PACT states a good copy testing system must meet the following criteria:
 * 1) Provides measurements which are relevant to the objectives of the advertising.
 * 2) Requires agreement about how the results will be used in advance of each specific test.
 * 3) Provides multiple measurements, because single measurements are generally inadequate to assess the performance of an advertisement.
 * 4) Is based on a model of human response to communications – the reception of a stimulus, the comprehension of the stimulus, and the response to the stimulus.
 * 5) Allows for consideration of whether the advertising stimulus should be exposed more than once.
 * 6) Recognizes that the more finished a piece of copy is, the more soundly it can be evaluated and requires, as a minimum, that alternative executions be tested in the same degree of finish.
 * 7) Provides controls to avoid the biasing effects of the exposure context.
 * 8) Takes into account basic considerations of sample definition.
 * 9) Demonstrates reliability and validity.

Recall
The predominant copy testing measure of the 1950s and 1960s, Burke's Day-After Recall (DAR) was interpreted to measure an ad's ability to “break through” into the mind of the consumer and register a message from the brand in long-term memory (Honomichl). Once this measure was adopted by Procter and Gamble, it became a research staple (Honomichl).

In the 1970s, 1980s, and 1990s, validation efforts found no link between recall scores and actual sales (Adams & Blair; Blair; Blair & Kuse; Blair & Rabuck; Jones; Jones & Blair; MASB; Mondello; Stewart). For example, Procter and Gamble reviewed 10 years' worth of split-cable tests (100 total) and found no significant relationship between recall scores and sales (Young, pp. 3–30). In addition, Leonard Lodish of the Wharton School conducted an even more extensive review of test market results and also failed to find a relationship between recall and sales (Lodish pp. 125–139).

The 1970s also saw a re-examination of the “breakthrough” measure. As a result, an important distinction was made between the attention-getting power of the creative execution and how well “branded” the ad was. Thus, the separate measures of attention and branding were born (Young, p. 12).

Persuasion
In the 1970s and 1980s, after DAR was determined to be a poor predictor of sales, the research industry began to rely on a measure of persuasion as a more accurate predictor of sales. This shift was led, in part, by researcher Horace Schwerin, who pointed out, “the obvious truth is that a claim can be well remembered but completely unimportant to the prospective buyer of the product – the solution the marketer offers is addressed to the wrong need” (Honomichl). As with DAR, it was Procter and Gamble's acceptance of the ARS Persuasion measure (also known as brand preference) that made it an industry standard. Recall scores were still provided in copy testing reports, with the understanding that persuasion was the measure that mattered (Honomichl).

Harold Ross of Mapes & Ross found that persuasion was a better predictor of sales than recall (Ross), and the predictive validity of ARS Persuasion to sales has been reported in several refereed publications (Adams & Blair; Jones & Blair; MASB; Mondello).

Diagnostic
The main purpose of diagnostic measures is optimization. Understanding diagnostic measures can help advertisers identify creative opportunities to improve executions (Young, p. 7).

Non-Verbal
Non-verbal measures were developed in response to the belief that much of a commercial's effects – e.g. the emotional impact – may be difficult for respondents to put into words or scale on verbal rating statements. In fact, many believe the commercial's effects may be operating below the level of consciousness (Young, p. 7). According to researcher Chuck Young, “There is something in the lovely sounds of our favorite music that we cannot verbalize – and it moves us in ways we cannot express” (Young, p. 22).

In the 1970s, researchers sought to capture these non-verbal responses biologically by tracking brain wave activity as respondents watched commercials (Krugman). Others experimented with galvanic skin response, voice pitch analysis, and eye-tracking (Young, p. 22). These efforts were not widely adopted, in part because of the limitations of the technology and in part because of the poor cost-effectiveness of what was widely perceived as academic, not actionable, research.

In the early 1980s, a shift in analytical perspective, from treating a commercial as the fundamental unit of measurement to be rated in its entirety to treating it as a structured flow of experience, gave rise to experimentation with moment-by-moment systems. The most popular of these was the dial-a-meter response, which required respondents to turn a meter, in degrees, toward one end of a scale or the other to reflect their opinion of what was on screen at that moment.

More recently, research companies have started to use psychological tests, such as those based on the Stroop effect, to measure the emotional impact of copy. These techniques exploit the notion that viewers do not know why they react to a product, image, or ad in a certain way (or that they reacted at all) because such reactions occur outside of awareness, through changes in networks of thoughts, ideas, and images.

Moderated and Unmoderated
Researcher-moderated empirical testing and unmoderated testing platforms evaluate implicit and unconscious bias in survey question design for market research.

Copy testing in political elections
Copy testing is used in fields ranging from commercial development to presidential elections. In 2007, CNN employed this form of market testing throughout the primary and general election. Rita Kirk and Dan Schill from Southern Methodist University worked with CNN to gauge voters' reactions to debates between presidential hopefuls (http://www.cnn.com/2011/POLITICS/06/14/dial.testing/index.html).

Relevant Terms

 * Advertising
 * Advertising research
 * Awareness
 * Brand
 * Brand preference
 * Engagement (marketing)