Wikipedia:Benchmarking

This essay describes techniques, and their limits, for measuring performance issues on Wikipedia. The term "benchmarking" has been used for decades in testing computer performance. On Wikipedia pages, a common technique is to invoke a template repeatedly, perhaps with 400 copies down a page, during an edit-preview, and time the formatting of those 400 instances. The time for each instance is then the total time, minus 0.3 seconds as the minimum page-load overhead, divided by 400. Timings of repeated text are limited to a total span of about 1 minute; beyond that, the page could trigger a "WP:Wikimedia Foundation error", which cancels the reformat when the page-timeout limit is exceeded.
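The arithmetic above can be sketched as a small helper; the function name, the 400-copy count, and the 0.3-second baseline are taken from the technique described, with the example preview time chosen for illustration only:

```python
def per_instance_time(total_seconds, copies=400, baseline=0.3):
    """Estimate seconds per template instance from one edit-preview timing.

    Subtracts the ~0.3 s minimum page-load overhead from the measured
    preview time, then divides by the number of template copies.
    """
    return (total_seconds - baseline) / copies

# For example, a preview of 400 copies taking 8.3 s of server time
# would indicate roughly 0.02 s per template instance:
print(per_instance_time(8.3))
```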

Large swings in page-load times
Because articles are reformatted by whichever of the (400?) file servers is available, the time needed to load an article page can vary widely from minute to minute, not just at "very busy" times of the day. For example, during July 2012, one slow article using dozens of large templates took 12 seconds to reformat during an edit-preview; within a minute, a repeated edit-preview (with no changes) took 20 seconds of server time, and a third preview within the next minute took 13 seconds. The second preview, 67% slower, was an unusually long delay, beyond the more typical 10%-40% slowdowns on busy servers. The example shows how a very slow response can occur between two rapid responses, producing a large swing in page-load times. For that reason, timings should be compared over numerous runs, taking the minimum to represent the underlying page-load time; this is the typical technique when benchmarking any article performance issue.
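The minimum-of-several-runs technique can be sketched as follows; the function name is illustrative, and the sample timings are the three July 2012 previews described above:

```python
def best_load_time(timings):
    """Return the minimum of repeated page-load timings, in seconds.

    Server-load spikes only ever add time, so the minimum over many
    runs is the best estimate of the underlying page-load time.
    """
    return min(timings)

# The July 2012 example: three edit-previews of the same slow article.
print(best_load_time([12.0, 20.0, 13.0]))  # 12.0
```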