Wikipedia:Wikipedia Signpost/2012-04-09/Wikidata

Wikidata, an initiative led by Wikimedia Deutschland and aimed at providing a central Wikimedia data repository, has prompted a raft of comments in the week after its first major press release (Signpost coverage). To recap, development will proceed in three stages. The first, expected to end by August of this year, will overhaul the language versions system by providing a central interwiki repository. The second, to finish by December, will use a similar method to standardise the content of infoboxes, allowing editors to add and use the data within the framework and allowing smaller wikis to share in localised versions of this data for their own infoboxes. Finally, the third stage of development will enable the automation of list and chart creation based on Wikidata data, at which point (hopefully by March 2013) Wikimedia Deutschland plans to hand over operation and maintenance to the Wikimedia Foundation.

To this framework, several requirements were added to the project pages this week, seemingly to reassure Wikimedians by establishing a narrow, achievable focus. They include a stipulation that "the success of Wikidata is not measured by the amount of data it stores, but by the creation of a healthy community and its usefulness for Wikipedia and other applications" and another affirming that "Wikidata will not be about the truth, but about statements and their references". Nevertheless, the Wikidata mailing list has been abuzz with discussion of possible applications and extensions of the project. In light of the level of attention being given to the formative project, the Signpost decided to catch up with Wikidata's community communications manager Lydia Pintscher and developer Daniel Kinzler.

The Signpost: When you express this in simple terms, what would you say is the "take home" message of Wikidata?


 * We are creating a central place where data can be stored. This could for example be something like the name of a famous person together with the birthdate of that person (as well as a source for that statement). Each Wikipedia (and others) will then be able to access this information and integrate it in infoboxes for example. If needed this data can then be updated in one place instead of several. There is more to it but this is the really simple and short version.

It certainly seems like an interesting project, and one that has captured imaginations for many years. Why do you think no-one has been able to act on the same idea before? For example, Daniel, you worked on the not dissimilar OmegaWiki project back in 2005 – have lessons been learned from projects like that?


 * Wikidata is a large and non-trivial project. There are three factors that make or break a project like this in my opinion: people, resources and timing. We are incredibly fortunate that it seems all of this is in place for Wikidata now to finally make it a reality.
 * People: We have brilliant and dedicated people on the team who know their way around the community and codebase and who are passionate about the possibilities Wikidata will bring to the world. Many of them have worked or are still working on related projects like Semantic MediaWiki. At the same time many people in the community can’t wait for Wikidata to finally be available to them to start building on the base it will provide.
 * Resources: We have the Allen Institute for Artificial Intelligence, the Gordon and Betty Moore Foundation, and Google as generous donors, not to mention Wikimedia Deutschland running the project. Previous projects have not been so lucky.
 * Timing: We're at a place in time where more and more people and organisations are pushing for and using open data. Wikidata is in a unique position to become a key player there. Much more importantly, Wikidata can be a significant part of the answer to Wikimedia's current challenges of editor recruitment and retention as well as expansion to new demographics.


 * We have certainly moved on since the original experiments in 2005: the data model has become a bit more flexible, to accommodate the complexity of the data we find in infoboxes; for a single property, it will be possible to supply several values from different sources, as well as qualifiers like the level of accuracy. For instance, the length of the river Rhine could be given as 1232 km (with an accuracy of 1 km) citing the Dutch Rijkswaterstaat as of 2011, and as 1320 km according to Knaurs Lexikon of 1932. The latter value could be marked as deprecated and annotated with the explanation that this number was likely a typographical error, misrepresenting earlier measurements of 1230 km. This depth of information is not easily possible with the old OmegaWiki approach or [that of] classic Semantic MediaWiki. It is however required in order to reach the level of quality and transparency Wikipedia aims for. This is one of the reasons the Wikidata project decided to implement the data model and representation from scratch.
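
The Rhine example above can be sketched as a small data structure. This is only an illustration of the idea described in the answer (several sourced values per property, an accuracy qualifier, and a deprecation flag with a note); the class and field names `Claim`, `Property`, `accuracy`, `deprecated` and `note` are hypothetical and are not taken from Wikidata's actual data model or code.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Claim:
    """One sourced value for a property (hypothetical structure)."""
    value: float
    unit: str
    source: str
    year: int
    accuracy: Optional[float] = None  # qualifier: plus/minus, in the same unit
    deprecated: bool = False          # kept for transparency, but not preferred
    note: str = ""                    # free-text explanation, e.g. why deprecated


@dataclass
class Property:
    """A named property that may hold several competing claims."""
    name: str
    claims: List[Claim] = field(default_factory=list)

    def current(self) -> List[Claim]:
        """Return only the claims that are not marked as deprecated."""
        return [c for c in self.claims if not c.deprecated]


# The two values for the length of the Rhine mentioned in the interview.
rhine_length = Property(
    name="length",
    claims=[
        Claim(1232, "km", "Rijkswaterstaat", 2011, accuracy=1),
        Claim(1320, "km", "Knaurs Lexikon", 1932, deprecated=True,
              note="likely a typographical error for 1230 km"),
    ],
)

for claim in rhine_length.current():
    print(claim.value, claim.unit, "according to", claim.source, claim.year)
```

Keeping the deprecated claim in the list, rather than deleting it, reflects the stated principle that Wikidata records statements and their references rather than a single "true" value.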

How do you envisage convincing Wikimedians who instinctively want to keep local control over articles' infoboxes that centralisation is a good thing?
 * Editors are completely free to keep their infoboxes in their local control. Of course we would love it if everyone used Wikidata but first and foremost Wikidata is an offer. It is an offer to the community to make use of it in the ways the team envisions Wikidata to be used and in ways we couldn’t even dream of. We see huge potential for everyone involved and I hope with time this is seen by everyone. One of the biggest potentials for the Wikipedia community is probably the help Wikidata can provide to smaller Wikipedias who do not (yet) have the manpower necessary to curate all the data that larger Wikipedias have.

The project description encourages volunteer developers who want to contribute code to do so. How do you see this working out with Wikidata?
 * We are still figuring this out, to be honest. We will have a public SCRUM log where people can follow the work and get involved when they see something they would like to contribute to. I hope we have this up and running in the next few days. Please be a bit patient with us here though as we are getting started. The other part is tasks we are definitely not going to do, like writing bots. If there is consensus in the community that this is something desirable then people are free to make this happen. I am always happy to help people figure out a way to contribute. Just let me know.

In the past, big projects such as LiquidThreads have proved difficult to bring to fruition. Are you confident that Wikidata will make it past phases 1 and 2 and into phase 3? Are you confident you will be able to keep the project sufficiently tightly focused to allow this to happen?
 * It will not be easy and there will be technical and social hurdles to overcome but I am confident we will be able to do this with everyone's support. All in all our goals for each step are rather modest and by using SCRUM we will take one step at a time. Even if we don't reach every single one of our goals, what we will have achieved by the end will still be significant.


 * As such, keeping focused is definitely one of the worries we have. A lot of people have really great plans for Wikidata but we absolutely need to focus on getting the project to a usable and useful state over the next year. We have written down some of the important assumptions and requirements we have for the initial development [the Requirements referenced earlier]. Those will be our guiding principles. Once the initial development is done a lot more will be possible of course but it is important that we focus on getting an initial release out that people can build on.

Lydia, Daniel, thank you.