Wikipedia:Wikipedia Signpost/2017-02-06/Technology report



Better PDFs are coming
A new way to export pages to PDF files has been developed. The current method of creating PDFs uses the Offline content generator (OCG) service. However, it is problematic for many articles because tables, including infoboxes, are omitted completely.

There have been multiple requests for table support since the OCG was introduced in 2014. The issue was also raised in 2015 as part of that year's Community Wishlist Survey and German community technical wishlist. Since then, the German Wikimedia chapter (WMDE) has been leading the initiative on enhancing tables in PDF. It was discussed at the 2016 Wikimania Hackathon, where a solution was proposed: offer an alternative PDF download that replicates the look of the website, using browser-based rendering instead of the OCG's LaTeX-based rendering.

The new PDF creator uses the Electron Service to render pages (using the Chromium web browser as a back end). When enabled on a wiki that already uses the OCG service, clicking "Download as PDF" in the side menu displays a choice of which service to use. The Electron Service was enabled by default on Meta and German Wikipedia last week, and is planned to be deployed to more wikis later.
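To give a rough feel for what browser-based rendering means in practice, the sketch below builds a command line that asks a headless Chromium to print a page to PDF. It is not the Electron Service and does not show how that service is invoked; the function name `build_pdf_command`, the binary name `chromium`, and the output filename are assumptions made for illustration. The `--headless` and `--print-to-pdf` switches are Chromium command-line options.

```python
import shlex
from typing import List


def build_pdf_command(page_url: str, output_path: str,
                      browser: str = "chromium") -> List[str]:
    """Build (but do not run) a headless-Chromium print-to-PDF command.

    `browser` is an assumed binary name; it differs between systems.
    """
    return [
        browser,
        "--headless",                      # run without opening a window
        f"--print-to-pdf={output_path}",   # write the rendered page as a PDF
        page_url,
    ]


command = build_pdf_command(
    "https://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost",
    "Signpost.pdf",  # hypothetical output filename
)
# Print the command in shell-quoted form; pass `command` to
# subprocess.run() to actually render, if a Chromium binary is installed.
print(shlex.join(command))
```

Because the browser lays out the page exactly as it does on screen, tables and infoboxes appear in the output without any special handling, which is the gap the LaTeX-based OCG left open.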

A community consultation is open on MediaWiki.org regarding the future of PDF rendering. It is proposed to retire the OCG by August this year, once "core" OCG features are available with the Electron Service. One such feature is the book creator, which collates multiple articles into a single PDF via the Collection extension. However, there are no plans to provide a two-column option, nor any plans to support conversion to plain text or other file formats. E

Backing up Wikimedia
Concerns were raised earlier this week on the wikimedia-l mailing list about the "back-up plan" for Wikimedia.

The best-known backups are the data dumps of MediaWiki content. Operations Engineer Ariel Glenn, who focuses on the dumps, does not consider them a form of backup, though: the dumps contain only public data that is viewable by all, and are run just twice a month.

Glenn further explained that the dumps are currently stored on two servers in the Virginia datacenter, and the most recent ones are also on a third server. They are also mirrored by other organizations, placing copies in California, Illinois, Sweden, and Brazil.

Glenn noted that there are no dumps of images currently. Operations Engineer Filippo Giunchedi said, "We're looking at 120 terabytes of original [files] today." Giunchedi added that files are stored in both the Virginia datacenter and one of the Texas datacenters, so there is some redundancy.

The databases themselves have a high level of redundancy, according to Database Administrator Jaime Crespo. The servers use RAID 10, and there are about 20 active database replicas with the same content across the Virginia and Texas datacenters, any of which can be cloned if one server goes down. For cases of accidental data loss, each datacenter has one server holding a replica that is delayed by 24 hours.

As for actual backups, Wikimedia uses Bacula as its backup software.

"As far as content goes, we do perform weekly database dumps and store them in an encrypted format in order to provide a pretty good guarantee we will avoid data leak issues via the backups," Operations Engineer Alexandros Kosiaris said. "We've had no such issues yet, but better safe than sorry."

The backups are stored in the Virginia and Texas datacenters, and are deleted after about 45–50 days for privacy policy compliance, Kosiaris explained.
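The retention rule Kosiaris describes can be pictured with a small sketch. This is an illustration only, not how Bacula or Wikimedia's configuration implements retention; the function name `expired_backups`, the 50-day constant (taken as the upper end of "about 45–50 days"), and the sample dates are all assumptions.

```python
from datetime import date, timedelta

RETENTION_DAYS = 50  # assumption: upper end of "about 45-50 days"


def expired_backups(backup_dates, today, retention_days=RETENTION_DAYS):
    """Return the backup dates that fall outside the retention window."""
    cutoff = today - timedelta(days=retention_days)
    return [d for d in backup_dates if d < cutoff]


# Hypothetical backup dates, checked on this issue's publication date.
backups = [date(2016, 12, 5), date(2016, 12, 19),
           date(2017, 1, 2), date(2017, 1, 30)]
for old in expired_backups(backups, today=date(2017, 2, 6)):
    print("would delete backup from", old.isoformat())
```

With these sample dates, only the 5 December backup is older than the 50-day window.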

As for improvements, Glenn has been looking for new mirrors for the dumps. Crespo noted that work on selecting a location for a new datacenter in Asia is in progress, including discussions with the legal team. L

Ten years of Twinkling
The popular Twinkle tool (available as a gadget in Special:Preferences) celebrated its tenth birthday on January 21. Originally started as the rollback script "Twinklefluff" by AzaToth, it now automates or simplifies a plethora of common maintenance tasks, including responding to vandalism, tagging articles, welcoming new users, and admin duties. It is likely that millions of edits have been made using Twinkle over the past decade. Thank you to everyone who has made Twinkle possible; your efforts are very much appreciated! E



In brief
New user scripts to customise your Wikipedia experience
 * Megawatch (source) by User:NKohli (WMF) – Watch or unwatch all pages in a category (for large categories, restricted to the top 50 pages only).
 * Watchlist-openUnread (source) by User:Evad37 – Open multiple unread watchlist pages with a single button. Various options can be set; see the documentation.
 * References Consolidator (source) by User:Cumbril – Convert all references in an article to list-defined format.

Newly approved bot tasks
 * MusikBot (task 10) – Move BLPs created by Sander.v.Ginkel to the draftspace.
 * Ramaksoud2000Bot (task 2) – Tag Wikipedia files that shadow a Commons file with ShadowsCommons.
 * Dexbot (task 11) – Clean up incorrect section names.
 * PrimeBOT (task 9) – Replace template being deleted with a "See also" link.
 * BU RoBOT (task 31) – Replace hyphens with en dashes within the relevant years parameters in Infobox football biography as per MOS:DASH.
 * TheMagikBOT (task 2) – Add the pp template to protected pages that do not have it.
 * JJMC89 bot (task 8) – Replace  with   in taxonomy templates.
 * DatBot (task 5) – Replace deprecated WikiProject Chinese-language entertainment template.
 * EnterpriseyBot (task 10) – Comment out the class parameter in WikiProject banners on the talk pages of redirects.
 * JJMC89 bot II (approval) – Deploy Wikipedia information pages talk page editnotice as an editnotice for talk pages of Wikipedia and Help pages in Category:Wikipedia information pages.
 * Bender the Bot (task 7) – Replace  with   for the New York Times domain.

Latest tech news from the Wikimedia technical community: 2017 #3, #4 & #5. Please tell other users about these changes. Not all changes will affect you. Translations are available on Meta.
 * Recent changes
 * You can now upload WebP files to Commons. (Phabricator task T27397)
 * There is a new magic word called . It returns the language of the page you are at. This can be used on wikis with more than one language to make it easier for translators. (Phabricator task T59603)
 * When an admin blocks a user or deletes or protects a page they give a reason why. They can now get suggestions when they write. The suggestions are based on the messages in the dropdown menu. (Phabricator task T34950)
 * You are now able to use  to write chemical formulas. Before you could use  .   should be replaced by  . (Phabricator task T153606)
 * You now can add exceptions for categories which shouldn't be shown on Special:UncategorizedCategories. The list is at MediaWiki:Uncategorized-categories-exceptionlist. (Phabricator task T126117)
 * The "Columns" and "Rows" settings have been removed from the Editing tab in Preferences. If you wish to keep what the "Rows" setting did you can add this code to your personal CSS:  You can change the number   to make it look like you want to. (Phabricator task T26430)
 * Edits in MediaWiki are sometimes mistakenly shown as coming from private IP addresses such as 127.0.0.1. Edits and other contributions logged to these IP addresses will be blocked, and the user will be shown the reason from MediaWiki:Softblockrangesreason. This should not affect most users. Bots and other tools running on Wikimedia Labs, including Tool Labs, will receive a "blocked" error if they try to edit without being logged in. (Phabricator task T154698)
 * When you edit with the visual editor, categories will be at the top of the page options menu. (Phabricator task T74399)
 * You can see a list of the templates on a page you edit with the visual editor. (Phabricator task T149009)
 * The OAuth management interfaces now look slightly different. (Phabricator task T96154)
 * Problems
 * video2commons was down for two weeks. This was because of a problem with Commons video transcoders. It is now back up. (Phabricator task T153488)
 * Future changes
 * The Community Tech team will develop more tools to handle harassment of Wikimedia editors. The goal is to give the communities better tools to find, report and evaluate harassment. They will also work on more effective blocking tools. (Meta page, Wikimedia email list)
 * The Wikimedia technical community is doing a Developer Wishlist survey. The call for wishes closed on 31 January, but discussions are continuing, and the voting phase will be open between 6 and 14 February.
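
The item under Recent changes about edits attributed to private IP addresses such as 127.0.0.1 can be made concrete with a short sketch. It shows one way to recognise loopback and private-range addresses using Python's standard `ipaddress` module; it is not MediaWiki's implementation, and the function name `is_internal_address` and the sample addresses are assumptions chosen for illustration.

```python
import ipaddress


def is_internal_address(ip_text: str) -> bool:
    """Return True for loopback or private-range IP addresses."""
    address = ipaddress.ip_address(ip_text)
    return address.is_loopback or address.is_private


# Hypothetical sample addresses: loopback, a private range, and a public one.
for candidate in ["127.0.0.1", "10.64.0.12", "198.35.26.96"]:
    status = "internal" if is_internal_address(candidate) else "public"
    print(candidate, status)
```

A tool that edits without logging in from such an address is the case the new block is meant to catch; logging the tool in under its own account avoids the "blocked" error.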