Wikipedia:Wikipedia Signpost/2005-06-13/Server status

The Wikimedia Foundation continued the expansion of its server configuration last week, adding several machines and preparing to accommodate even more. To upgrade the facility for its main server cluster, however, the project's websites had to be taken offline for about half a day.

In a planned service outage (see archived story), the cluster of servers located in Tampa, Florida was moved to a new facility last week. The move gives the cluster more rack space, which will help in handling future growth. The Wikimedia projects were unavailable for approximately 11 hours in total while the move was carried out.

Downtime message problems
The most visible glitch associated with the move was in the downtime message displayed during the server relocation. While a message explaining the reasons for the interruption was prepared and translated into a handful of languages, providing this message to visitors was only sporadically successful.

The downtime message was meant to be hosted on the new servers in Amsterdam, but a miscommunication in switching over the DNS server initially prevented this. Even after that problem was solved, the servers in Paris were apparently still redirecting traffic to the disconnected Florida servers. As a result, users accessing Wikipedia via the Paris servers received a more generic message, which suggested that the problem was a server crash rather than a planned outage. In the aftermath, there was some discussion about developing translations of the regular downtime messages as well.

Returning to normal
After the downtime, the recently added Amsterdam server cluster, hosted by Kennisnet, was put into live service. This brings the total number of operating servers to 81 (besides the eleven servers in Amsterdam, three are in Paris, where two more servers are awaiting upgrades before being put to use). Also, two new database servers ordered in May were delivered to the Florida data center on Wednesday.

Some performance problems continued to be reported throughout the week, although it was not clear whether they were a result of the server relocation. Certain actions, especially page deletions, reportedly failed repeatedly, with each attempt generating only an error message. With some tweaks, developer Brion Vibber was largely able to fix this problem. However, operations such as loading watchlists remained persistently slow.