User:Cscott/Ideas/Integrating MediaWiki

Installing a fully functional MediaWiki instance has become difficult. Users installing their distribution packages or following our installation guide end up with only a bare-bones fragment of our software stack, missing most of the useful services and extensions which WMF has developed over the past decade. This is largely a feature, not a bug: our MediaWiki extension system has been very successful, and has allowed both WMF and third parties to develop a large number of useful features loosely coupled to our core, which has been able to remain relatively small. But the core should not be confused with a full MediaWiki install.

The first step to remedy the situation is to acknowledge third-party users of MediaWiki as a first-class Audience, so that we can devote the proper resources to their support. Our new "External Wikis" team could begin by sifting through Special:Version and identifying an expansive set of "standard" extensions and features, omitting only code which is highly WMF-specific (such as fundraising, messages, or internal metrics), appropriate only to very-large-scale wikis (PoolCounter?), or deprecated/abandoned (EasyTimeline?). But the large majority of the extensions running on WMF wikis should be included. The External Wikis team would be evaluated on the number of external contributions to our stack and the number of external users running our "standard" extension set.

We should then work to allow these extensions to be downloaded and installed with little effort. The standard installation guide should include the installation of these extensions; they should be distributed with default configurations which "just work", and any "special setup" required should be addressed. This work may include packaging container-based solutions for installing a "standard" wiki, such as those based on Vagrant, Docker, or Kubernetes. But it should also include refactoring "service" components for easy installation: by default services should also unzip into the core directory and ship with "works by default" configurations. Any required platform packages should be listed in a standard format. A special service runner built into core will take care of forking any required long-lived processes (in the same way that our basic Scribunto install uses a forked Lua interpreter). Advanced configurations could use embedded or distributed services, but the default install will painlessly support single-server installs. In addition to our current service implementation languages (Java, Node.js), decoupled services could even be written in "older" or "newer" versions of PHP as future needs warrant.
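
As a rough illustration of the service runner idea, the sketch below reads a list of service descriptions and starts one child process per service. It is only a sketch under stated assumptions: `ServiceSpec`, `ServiceRunner`, and the manifest fields are hypothetical names invented for this example, not an existing MediaWiki interface, and it is written in Python purely for brevity.

```python
import subprocess
from dataclasses import dataclass, field
from typing import List


@dataclass
class ServiceSpec:
    """Hypothetical manifest entry for one bundled service."""
    name: str
    command: List[str]  # argv used to start the long-lived process
    platform_packages: List[str] = field(default_factory=list)


class ServiceRunner:
    """Starts each declared service as a child process and keeps track of it."""

    def __init__(self, specs, spawn=subprocess.Popen):
        self.specs = list(specs)
        self.spawn = spawn      # injectable so the runner can be exercised without real processes
        self.children = {}      # service name -> process handle

    def required_packages(self):
        # Union of the platform packages every service declares, in a stable order.
        return sorted({pkg for spec in self.specs for pkg in spec.platform_packages})

    def start_all(self):
        # Start only services that do not already have a child process.
        for spec in self.specs:
            if spec.name not in self.children:
                self.children[spec.name] = self.spawn(spec.command)

    def running(self):
        # poll() returns None while a child is still alive.
        return sorted(name for name, proc in self.children.items() if proc.poll() is None)

    def stop_all(self):
        for proc in self.children.values():
            proc.terminate()
        for proc in self.children.values():
            proc.wait()
        self.children.clear()
```

A single-server install might construct the runner from every manifest found on disk and call `start_all()` once at startup; a distributed configuration would simply not use it.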

The integration of services with the standard MediaWiki installation process will extend to localization and internationalization. The same mechanism used for the rest of MediaWiki should allow localization of messages used by services as well, integrated with translatewiki and our other language infrastructure.

As part of this service integration work, the URL routing mechanism of the https://xx.wikipedia.org/api/rest_v1 API should be brought into core, and integrated such that new REST modules may be installed the same way any other MediaWiki extension is installed: by unzipping into the extensions directory (even if they are implemented in JavaScript). Some modules may eventually be rewritten in PHP for tighter integration, but no RESTBase modules will require a rewrite in PHP in order to be packaged as an extension. The implementation language choice will be independent. A small lightweight "service runner" can be provided to allow running these extension-packaged REST modules outside the MediaWiki framework; ops may even use this in production to allow bypassing PHP request routing overhead for certain request paths.
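
To make the routing idea concrete, here is a minimal sketch of a router that lets independently installed modules claim path prefixes under the REST root. `RestRouter` and its methods are hypothetical names for illustration only; the actual routing mechanism would be whatever core adopts, and the handlers here are plain callables standing in for modules written in any language.

```python
from typing import Callable, Dict, Optional, Tuple

Handler = Callable[[str], str]


class RestRouter:
    """Maps path prefixes under a REST root to handlers registered by modules."""

    def __init__(self, root: str = "/api/rest_v1"):
        self.root = root.rstrip("/")
        self.routes: Dict[str, Handler] = {}

    def register(self, prefix: str, handler: Handler) -> None:
        # Normalize so "page/html", "/page/html" and "/page/html/" are the same route.
        self.routes["/" + prefix.strip("/")] = handler

    def resolve(self, path: str) -> Optional[Tuple[Handler, str]]:
        if not path.startswith(self.root + "/"):
            return None  # not a REST request; leave it to the normal request path
        rest = path[len(self.root):]
        # Try longer prefixes first so the most specific module wins.
        for prefix in sorted(self.routes, key=len, reverse=True):
            if rest == prefix or rest.startswith(prefix + "/"):
                return self.routes[prefix], rest[len(prefix):].lstrip("/")
        return None

    def dispatch(self, path: str) -> Tuple[int, str]:
        match = self.resolve(path)
        if match is None:
            return 404, "not found"
        handler, remainder = match
        return 200, handler(remainder)
```

Because the router only knows about prefixes and callables, the same table could be loaded by a standalone runner outside the MediaWiki framework, which is the bypass described above.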

In addition to ensuring that "standard" extensions and services are installed and configured by default, we should renew our focus on making our content reusable by third parties. Specifically, templates, modules, and gadgets on WMF projects should allow easy reuse across wikis (for example, using Shadow Namespaces). We should allow third parties to reference property and item definitions from Wikidata in their own wiki installations, using the Wikibase client. This goes further toward allowing external wikis to "work like Wikipedia does" out of the box.
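
One way to picture this kind of cross-wiki reuse is a lookup that prefers a local page and falls back to a shared source wiki when the local wiki has no page of that title. The sketch below assumes that simple fallback model; the `Wiki` class and its `shadow_source` parameter are hypothetical and are not the Shadow Namespaces design itself.

```python
from typing import Dict, Optional, Tuple


class Wiki:
    """A wiki whose missing pages can be resolved from a shared source wiki."""

    def __init__(self, name: str, pages: Dict[str, str], shadow_source: Optional["Wiki"] = None):
        self.name = name
        self.pages = dict(pages)
        self.shadow_source = shadow_source  # hypothetical: the wiki to fall back to

    def get_page(self, title: str) -> Optional[Tuple[str, str]]:
        # A local page always overrides the shared one.
        if title in self.pages:
            return self.name, self.pages[title]
        # Otherwise defer to the shared wiki, if one is configured.
        if self.shadow_source is not None:
            return self.shadow_source.get_page(title)
        return None
```

Returning the name of the wiki that supplied the page keeps attribution visible, so a local editor can tell whether a template is their own copy or the shared one.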

The wikitext parser will be factored out of MediaWiki core, as described in the "zero parsers in core" proposal. The existing legacy PHP wikitext parser will be moved into an extension. Parsoid will be repackaged as an extension using the new Parser interface; initially without a rewrite in PHP, so the Parser API will communicate with Parsoid running in Node.js as before. Other trivial implementations of the Parser API may be created, such as a Markdown parser or an HTML-only wiki module, to demonstrate the full decoupling of core from wikitext. As a follow-up, an implementation of Parsoid may eventually be done in PHP using the new Parser API, but this rewrite could fail and is not on the critical path.
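
The decoupling can be sketched as a small interface that core calls without knowing which markup sits behind it. Everything named here (`Parser`, `ParserRegistry`, the `content_model` strings) is an assumption made for illustration rather than the proposed Parser API, and the plain-text parser is a stand-in for a trivial non-wikitext implementation.

```python
import html
from abc import ABC, abstractmethod
from typing import Dict


class Parser(ABC):
    """Hypothetical parser interface: source markup in, HTML out."""
    content_model: str

    @abstractmethod
    def to_html(self, source: str) -> str:
        ...


class HtmlOnlyParser(Parser):
    """Trivial implementation: the stored source already is HTML."""
    content_model = "html"

    def to_html(self, source: str) -> str:
        return source


class PlainTextParser(Parser):
    """Escapes the text and wraps blank-line-separated paragraphs in <p> tags."""
    content_model = "text"

    def to_html(self, source: str) -> str:
        paragraphs = [p.strip() for p in source.split("\n\n") if p.strip()]
        return "\n".join("<p>" + html.escape(p) + "</p>" for p in paragraphs)


class ParserRegistry:
    """Core holds only this registry; each parser arrives as an extension."""

    def __init__(self):
        self._parsers: Dict[str, Parser] = {}

    def register(self, parser: Parser) -> None:
        self._parsers[parser.content_model] = parser

    def render(self, content_model: str, source: str) -> str:
        if content_model not in self._parsers:
            raise LookupError("no parser installed for " + content_model)
        return self._parsers[content_model].to_html(source)
```

Under this shape, a wikitext implementation would be one more registered class whose `to_html` forwards to an external process, so core itself ships with zero parsers.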

The existing "storage engine" functionality of RESTBase (only one of the many modules currently underneath the REST API) will be reimplemented on top of Multi-Content Revisions. The multiple databases corresponding to our multiple projects will also be merged, facilitating cross-project features like simultaneous display of parallel texts in multiple languages. In-progress edits (editor switching, conflict resolution, content translation) will be stored in the main database, for example in the user namespace. This will unify all of our storage in a single database layer and eliminate the need for Cassandra. This should simplify ops and (hopefully) reduce storage costs by eliminating some redundancy.
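
A toy model may help show why Multi-Content Revisions could absorb this storage role: each revision carries several named slots, so rendered output can live beside the source in the same revision record. The `PageStore` class and the `html` slot role below are hypothetical, and an in-memory dictionary stands in for the database.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Revision:
    rev_id: int
    slots: Dict[str, str]  # slot role -> content


class PageStore:
    """In-memory stand-in for a single revision store shared by all content."""

    def __init__(self):
        self._history: Dict[str, List[Revision]] = {}
        self._next_id = 1

    def save(self, title: str, slots: Dict[str, str]) -> Revision:
        if "main" not in slots:
            raise ValueError("every revision needs a 'main' slot")
        # Copy the slots so later changes to the caller's dict do not alter history.
        rev = Revision(self._next_id, dict(slots))
        self._next_id += 1
        self._history.setdefault(title, []).append(rev)
        return rev

    def latest(self, title: str, role: str = "main") -> Optional[str]:
        revisions = self._history.get(title)
        if not revisions:
            return None
        # A missing slot in the newest revision yields None rather than older content.
        return revisions[-1].slots.get(role)
```

For example, saving `{"main": source, "html": rendered}` as one revision keeps the rendered HTML tied to exactly the source it was produced from, with no second storage system to keep in sync.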

We will lower the barrier to entry for third-party developers and erase some of the hard boundaries between template, Scribunto module, gadget, extension, skin, and core code. For web, the Marvin prototype will be continued, along with the development of a special "null skin" for core which would allow the existing PHP code in core to serve special pages and other bespoke UX as unwrapped HTML, which Marvin can clothe with an appropriate UX and skin. On mobile, we will continue to move Android and iOS app code from native languages (Java, Swift) into PHP and JavaScript to enhance code reuse. In core, we'll continue to research the potential of projects such as php-embed and v8js to further blur the lines between server-side PHP and JavaScript. For editors, Scribunto/JavaScript will also be completed, allowing the creation of template code in JavaScript. Insofar as possible, the same APIs will be available in all four contexts. The ultimate goal should be to allow the creation of a full skin in JavaScript, templates in JavaScript, and the implementation of extensions and special pages in JavaScript.

This proposal would commit WMF resources to supporting a more complete "standard" distribution of MediaWiki on both single-server and containerized platforms. By standardizing the configuration and installation mechanisms for services we would retain the benefits of a decoupled architecture without falling into configuration/dependency hell; it would also expand the number of third parties able to run our services and contribute to their development. Decoupling wikitext from core would allow greater markup independence and clear the way for future innovation in the wikitext representation, leveraging the successful Parsoid innovation of round-trip conversions to allow editors to use their choice of visual or text-oriented markup editors. Moving to HTML-native storage for articles will also benefit performance and clear the path for future improvements such as incremental rendering and subtree editing. Finally, the embrace of JavaScript as an official "second" language for the project beside PHP will expand our developer base; embracing JavaScript for templates would allow expanding our editor base. Decoupling the UX from the PHP core would unleash further innovation in our presentation layer and allow us to create modern reactive user experiences.