Wikipedia:Wikipedia Signpost/2024-03-29/Technology report

A lack of technical support for interactive content on Wikimedia projects was lamented in a wide-ranging discussion on Wikimedia-l and elsewhere over the last two months. In particular, several community members expressed deep frustration about the state of the Graph MediaWiki extension, which was disabled in April 2023 due to security vulnerabilities in the underlying third-party Vega framework (Signpost coverage). At the time, a Wikimedia Foundation representative stated that "My hope is we can maybe restore some functionality in the next week or so." But eleven months later, graphs and charts remain deactivated, replaced by a prominent error message in many Wikipedia articles, despite extensive discussions about possible solutions.

Basque Wikimedian Galder Gonzalez Larrañaga (User:Theklan) opened the Wikimedia-l discussion by decrying this state of affairs:

He contrasted Wikimedia projects with e.g. "a place like Our World in Data [which] has been publishing data and interactive content with a compatible license for years". Several other Wikimedians likewise voiced their frustration about the lack of progress in getting graphs re-enabled.

Marshall Miller, Senior Director of Product at WMF, acknowledged these concerns on the mailing list, stating that:

Proposed solutions fail
How did things get to this point? Several proposals and plans had been pursued since the discovery of the XSS vulnerability in April 2023:
 * One proposed solution (T336595) was to re-enable the feature but treat the Vega code for graphs "like other dangerous content (Javascript, CSS) and restrict editing it to a small, trusted set of users" (similar to Interface administrators). However, this option was eventually abandoned in favor of a different approach:
 * To sandbox the graphs in an iframe (a separately loaded part of a web page). This option (T222807) had in fact already been proposed back in 2019 out of general security concerns, long before the discovery of the current vulnerabilities in Vega. By November 2023, some progress appears to have been made on its implementation, but serious performance problems remained due to complexities around caching. What's more, the iframe approach met principled objections from WMF engineer Timo Tijhof (User:Krinkle), who argued that it "severely limits our audience, technical choices, and assumptions going forward, and I think negatively impacts our mission and reach in a way that is in direct contradiction and violation of principles that thus far have been accepted without question" (citing the Wikimedia Foundation's Guiding Principles and other WMF engineering conventions). In particular, Tijhof worried about the redistribution of content in other forms (citing examples including "Kiwix, IPFS and Apple Dictionary"). Noting that e.g. the Wayback Machine "consistently fails to archive [...] non-trivial JavaScript pipelines", he argued "there can be no JavaScript requirement for fundamental access to content". Lastly, as noted by WMF security engineer Scott Bassett, while the iframe sandbox "would reduce the risk of running potentially dangerous javascript within a user's browser, [it would] not eliminate the risk entirely." The iframe task was eventually closed as "Declined" earlier this month.
 * Other options that had been proposed early on included sanitizing the input. But this approach was described as "extremely tricky to get right (after all,vega failing at it is how we landed in this situation)."
 * Updating the underlying library: As noted in a WMF FAQ, the Graph extension is based on a very outdated version of the Vega library: "The last upstream release (bugfix or security) of Vega 2.x was in January 2017. Vega 5 was released in March 2019 and is still under active maintenance and development." However, by May 2023, initial hopes that an upgrade to version 5 would suffice to resolve security concerns had been dashed. The WMF's eventual plan to fix the Graph extension (apparently now abandoned too) envisaged the version update and eventual deprecation of version 2 only in combination with the sandbox solution. What's more, as detailed in the FAQ, migrating to Vega 5 would necessitate the conversion of existing graph templates and modules, much of which would need to be done by hand (although aided by a translation tool).
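
To make the iframe idea above more concrete, here is a minimal sketch of what sandboxing means at the HTML level. It is an illustration only, not the code from T222807: the helper name `sandboxed_graph_iframe` and its parameters are hypothetical. It relies on the standard HTML `sandbox` and `srcdoc` attributes. Granting `allow-scripts` without `allow-same-origin` lets the graph's JavaScript run, but in an isolated origin that cannot read the embedding page's DOM or cookies.

```python
from html import escape


def sandboxed_graph_iframe(graph_page_html: str, width: int = 400, height: int = 300) -> str:
    """Wrap an untrusted graph page in a sandboxed iframe (hypothetical helper)."""
    # Escape the inner document so it can be embedded safely as an attribute value.
    srcdoc = escape(graph_page_html, quote=True)
    # "allow-scripts" alone: scripts may run, but the frame gets an opaque origin,
    # so it cannot touch the parent page. "allow-same-origin" is deliberately omitted.
    return (
        f'<iframe sandbox="allow-scripts" srcdoc="{srcdoc}" '
        f'width="{width}" height="{height}" title="Graph"></iframe>'
    )


if __name__ == "__main__":
    print(sandboxed_graph_iframe('<script>draw("x")</script>'))
```

As the objections quoted above point out, a frame like this still requires JavaScript in the reader's browser to display anything, and it reduces the risk without eliminating it.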

Quantifying the impact
Rarely, if ever, has a software issue affected Wikipedia content so visibly for such a prolonged time. By January 2024, User:Sj estimated that it had "already conservatively affected 100M pageviews." According to a September 2023 analysis, over 1.3 million pages are impacted across all Wikimedia projects, the vast majority of them (1.16 million) on the Arabic Wikipedia.

On English Wikipedia, by contrast, only 19,160 pages were affected. (Those numbers likely already reflect the manual removal of broken graphs from many pages.) In a more detailed 2020 analysis, volunteer developer User:Bawolff had found that "the graph extension is used on 26,238 pages [on English Wikipedia]. However, most of these are in non-content namespaces, from a template that generates a graph of page views for a specific page (w:Template:PageViews graph). There are 4,140 pages on en.wikipedia.org in the main namespace that use graphs. [...] As a percentage, that's 0.07% overall, 0.2% of "Good Articles", 0.3% of Featured Articles." Another Wikipedian reported that "In ruwiki, interactive Lua-based graphs are used in more than 26000 articles about settlements and administrative units through https://ru.wikipedia.org/wiki/Module:Statistical (also, more than 8000 on ukwiki, etc.)."

How much interactivity is needed, anyway?
Several users questioned whether the full interactive functionality of the Vega library was really needed, arguing e.g. that "Most graphs on wiki are simple bar/pie/line charts. These could be produced quite easily using even a language like Lua."
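
As a rough illustration of that argument, the sketch below renders a simple bar chart as static SVG markup on the server side, with no JavaScript involved. It is written in Python rather than Lua, and it is not based on any existing MediaWiki module: the function name `bar_chart_svg`, its parameters, and the sample figures are all hypothetical, and it assumes non-negative values.

```python
from html import escape


def bar_chart_svg(data: dict[str, float], width: int = 300, height: int = 150) -> str:
    """Render non-negative values as a static SVG bar chart (hypothetical helper)."""
    if not data:
        raise ValueError("data must not be empty")
    if any(value < 0 for value in data.values()):
        raise ValueError("values must be non-negative")
    peak = max(data.values())
    if peak == 0:
        raise ValueError("at least one value must be positive")

    label_space = 20           # pixels reserved below the bars for labels
    slot = width / len(data)   # horizontal space available per bar
    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}" role="img">'
    ]
    for i, (label, value) in enumerate(data.items()):
        bar_h = (height - label_space) * value / peak  # scale to the tallest bar
        x = i * slot + slot * 0.1                      # leave a small gap between bars
        y = height - label_space - bar_h               # SVG's y axis points downwards
        parts.append(
            f'<rect x="{x:.1f}" y="{y:.1f}" width="{slot * 0.8:.1f}" '
            f'height="{bar_h:.1f}" fill="#36c"/>'
        )
        # Labels are escaped because they would come from user-editable wikitext.
        parts.append(
            f'<text x="{i * slot + slot / 2:.1f}" y="{height - 5}" '
            f'text-anchor="middle" font-size="10">{escape(label)}</text>'
        )
    parts.append("</svg>")
    return "\n".join(parts)


if __name__ == "__main__":
    print(bar_chart_svg({"2021": 120, "2022": 150, "2023": 90}))
```

Because the output consists only of `rect` and `text` elements with escaped labels, it contains no executable code, which is the property that makes this kind of static chart attractive from a security point of view.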

WMF engineer Gergő Tisza (who appears to have done much of the technical work on the aforementioned iframe solution) observed that

Concretely, Bawolff had observed in his 2020 analysis of the usage of graphs on English Wikipedia that:

Volunteer developer TheDJ even argued "let's be honest... the interactivity-part has been an 8 year long nightmare. Maybe its time to put that to bed and accept defeat."

On the other hand, Galder titled his post that opened the Wikimedia-l discussion "We need more interactive content: we are doing it wrong". He took a much wider view, arguing that the WMF's failure to get graphs working again was just one example of wider stagnation and lack of progress towards the goal that "By 2030, Wikimedia will become the essential infrastructure of the ecosystem of free knowledge" (quoting from a 2017 strategy document). Besides Our World in Data, Galder named several other educational websites that have surpassed Wikimedia projects on interactive content, e.g.:

(Other parts of the mailing list discussion focused on MediaWiki's shortcomings with regard to video.)

In her op-ed in this Signpost issue, Maryana Pinchuk, the Wikimedia Foundation's Principal Product Manager, pushes back against such proposals, reporting that at an event last fall, she "heard many Wikipedians express concern about where pursuing this strategy could lead us. There was fear of making Wikipedia into something it isn’t. There was also fear about the cost and risks of building big new software features and trying to compete with massive for-profit technology companies for users. I think all of these concerns are very valid."

In the Wikimedia-l discussion, Wikipedian and former WMF engineer Ori Livneh argued that direct comparisons with sites that do not contain user-generated content may severely underestimate the additional engineering work to implement such interactive features on Wikipedia. He pointed out security engineering as a bottleneck at the Foundation holding up such work:

On March 26, the WMF invited feedback on "the Product & Technology draft key results for next fiscal year. They aim to explain what outcomes we are working towards" as part of the 2024/25 annual plan. In reaction, Galder noted that "there's no single mention to this [Graphs outage problem], nor to improving the multimedia experience". In a discussion on the talk page, Miller said that "we are working on a possible plan for graphs, but I'm not sure yet what its scope will be or when we would resource it if we proceed with that plan".