User:Robevans123/sandbox/IB Notes

Infobox research
Randomly chosen, but not living, and with 5 to 10 data items

person/wikidata sample
Random selection of articles using the Infobox person/Wikidata:


 * Samuel Cooper Thacher, 6 data items (+photo) from WD, no references added by IB adder. 2 references added later by another editor, with one supporting 2 data items. References from WD: all just ref URLs to Google Books (2 to pages, 2 to the book only, with no page visible).
 * 4 refs from 3 new sources added
 * TBD update Wikidata


 * Caroline Hewins, 7 data items (+photo) from WD, no references added by IB adder. 4 data items referenced, 2 not referenced, and an honorary degree, turned into a place of education in WD, and so appears as alma mater in IB.
 * 2 refs added using existing sources
 * TBD update Wikidata


 * Patrick Magruder, 6 data items (+photo) from WD, 1 ref added by IB adder. 2 data items referenced.
 * 4 refs added from a previous external link and a new source
 * TBD update Wikidata


 * Sarah Stickney Ellis, 5 data items (+photo) from WD, no references added by IB adder. 1 data item (date of birth) appears incorrect: it is as stated in the given WD source, but is contradicted by more detailed sources.
 * refs added using existing source
 * TBD update Wikidata


 * Rachel Webster, 5 data items. WD/WP match at time infobox added. 1 data item not in article, 1 data item not referenced, 3 data items referenced.
 * article updated and refs added
 * TBD update Wikidata

person sample
Random selection of articles using the Infobox person:


 * Harry Seidler, 11 data items (+photo). 9 items in body, none referenced. Wikidata would just show dates of birth and death.
 * TBD add citations and check for new sources if necessary
 * TBD update Wikidata

!Vote

 * 1A, 2A, 3A, 4A. Infoboxes are, in the main, intended to be a summary of (and not an addition to) key features in an article. Almost all of these key features should, ideally, be in the text of the article, and be supported by inline citations in the article, with the citations including at least some basic information (primarily title, author, date, publisher, page number/location, and a URL if available). Certain infoboxes are subject to further restrictions, for example, WP:BLP, which requires high quality sources. The current implementation of Wikidata-enabled infoboxes allows the inclusion of data from Wikidata, with the only requirement being that the data item has some sort of reference (a bare URL or a reference to an existing Wikidata item suffices). There are no checks to see if the data is mentioned in the article, is the same as shown in the article, or is referenced in the article.


 * Furthermore, any changes to the data in Wikidata are displayed automatically, with no hint to anyone watching the Wikipedia page that something has changed. And yes, I'm aware that I can watch the Wikidata item, but this seems to work with a severe time lag, and also floods me with information such as label changes in other languages, additions of unreferenced data, etc. that have no bearing on the Wikipedia article. Also, if I do see a Wikidata edit that has an impact and needs changing, I have to go and edit the item in Wikidata. I've done a fair amount of work on Wikidata, so I'm capable of this, but I totally understand the reluctance of some editors to get involved in this process, and it should not be a required skill for editors who wish to change something on a Wikipedia page.


 * And also, Wikidata (like Wikipedia) cannot be regarded as a reliable source. Although these, and other crowd-sourced projects, may often be a source of useful information (and even include appropriate references), we should not take them at face value. Instead, we check the given source and, if appropriate, use that to verify, and provide citations for, information in the article. It should be noted that Wikidata has no policies on verifiability (there is a proposal which has had only a handful of edits since January 2015), copyright violation, or sensitive data on living persons.


 * In summary, the current implementation of Wikidata in infoboxes stretches, sometimes to breaking point, Wikipedia content policies on Verifiability and Biographies of living persons, and severely restricts the conduct policy of Consensus, and provides new and interesting ways of breaking the policy on Vandalism. It also drives a horse and cart through MOS guidelines on Infoboxes and the content guideline on identifying reliable sources which specifically states that a wikilink is not a reliable source, yet the current implementation implies that a link (the pencil icon) to an external (sister) project is somehow a sufficient reference.


 * And yes, I'm well aware that the standard of infobox use within Wikipedia is not as good as it should be, but providing more ways of getting them wrong is not the way to improve them.


 * BTW I think Wikidata is an interesting, useful resource that could be used extensively in Wikipedia. See my addition (A third way?) to the discussions below.

A third way?
Wikidata is a useful and expanding resource which has the potential to be a useful tool to find information to create and improve articles on Wikipedia. I just don't believe that including information automatically in infoboxes is the best way to make use of this resource. However, in Wikipedia, we have various policies and guidelines regarding the use of bots and semi-automated tools which we can use to make edits. I would like to propose a mechanism using similar tools and processes that could be used to utilise more information from Wikidata, while meeting Wikipedia's policies and guidelines.

Page stalkers of RexxS might have seen a suggestion I posted on his talk page a while ago. At the time I thought it might be a way of monitoring what has changed on Wikidata. I now believe that it should be re-purposed to make suggestions to Wikipedia editors (using article talk pages) about useful information that could be used in the article. I've amended the proposal slightly.

So, this suggestion uses two bots (one working on Wikidata, and the other on Wikipedia) although the two might possibly be combined into one:
 * 1) The Wikidata bot (WD BOT) regularly monitors Wikidata items used in a Wikipedia infobox template, for example, the Infobox person template.
 * A list of template parameters and their matching Wikidata properties needs to be generated when setting up the bot. Also, a list of Wikidata items that need to be monitored needs to be generated. This could be generated from the What Links Here page, or possibly by adding a property to the Wikidata item indicating which infobox is used in an article.


 * 2) If any of the watched properties of the item have changed since the time of the previous check, WD BOT generates a report detailing the changes in a log file somewhere on Wikidata.
 * 3) The Wikipedia bot (WP BOT) regularly checks the log file on Wikidata.
 * 4) If the log file has been updated with changes since the last check, WP BOT writes a report of the changes on the talk page of the appropriate Wikipedia page. The report will also include details of the reference information on Wikidata, preferably in a cite template such as Cite Book.
 * 5) Any Wikipedia editor who has the page on their watchlist will see that the talk page has been updated (provided they haven't hidden bot edits).
 * 6) The Wikipedia editor can then check whether the changes to the infobox are appropriate and referenced and if the information is from a reliable source. They can then make any appropriate changes to the article: adding to the infobox, adding text to the article to show the new/changed data in the article, adding a citation to the text using an existing source, or creating a new reference using the information in the WP BOT report.
 * Any changes are recorded in the page history of the article, so appear on the watchlists of editors watching it, and can be easily reverted/challenged/discussed.
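
To make the division of labour between WD BOT and WP BOT more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: it works on in-memory snapshots rather than calling any real Wikidata or Wikipedia API, the names WATCHED, Change, diff_item and talk_page_report are hypothetical, the parameter-to-property mapping is a small illustrative subset, and the item ID and values are invented placeholders.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative subset of a mapping from Infobox person parameters to
# Wikidata property IDs; a real bot would need the full, agreed list.
WATCHED = {
    "birth_date": "P569",
    "death_date": "P570",
    "birth_place": "P19",
    "alma_mater": "P69",
}


@dataclass(frozen=True)
class Change:
    """One changed property, as it might be recorded in the log (hypothetical)."""
    item: str
    parameter: str
    prop: str
    old: Optional[str]
    new: Optional[str]
    reference: Optional[str]


def diff_item(item, previous, current, references):
    """WD BOT side: compare two snapshots of one item's watched properties."""
    changes = []
    for parameter, prop in WATCHED.items():
        old, new = previous.get(prop), current.get(prop)
        if old != new:
            changes.append(
                Change(item, parameter, prop, old, new, references.get(prop))
            )
    return changes


def talk_page_report(changes):
    """WP BOT side: format logged changes as bullet lines for a talk page."""
    lines = []
    for c in changes:
        ref = c.reference or "no reference given on Wikidata"
        lines.append(
            f"* {c.parameter} ({c.prop}): {c.old!r} -> {c.new!r}; reference: {ref}"
        )
    return "\n".join(lines)


# Placeholder data: "Q_EXAMPLE", "P_UNWATCHED" and every value are invented.
previous = {"P569": "1900-01-01"}
current = {"P569": "1900-01-01", "P570": "1980-12-31", "P_UNWATCHED": "x"}
references = {"P570": "{{Cite book |title=Example title |page=1}}"}

log = diff_item("Q_EXAMPLE", previous, current, references)
print(talk_page_report(log))
```

Only properties listed in WATCHED produce log entries, so label changes and other edits that have no bearing on the infobox never reach the talk page.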

I've not said anything on the frequency of the bot operations. RexxS pointed out that the Infobox person has roughly 270,000 transclusions, and with 20+ parameters would require roughly 6,000,000 properties to be checked. Bearing in mind that most information in infoboxes is static and that Wikipedia is a work in progress, the bot work could be split into manageable chunks, for example, 40,000 checked each day completing a full check each week, or 10,000 checked each day completing a full cycle each month.
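
The arithmetic behind those figures can be checked with a few lines of Python. This is a rough sketch: the transclusion count is the figure quoted above, and PARAMETERS is set to 20 as a lower bound for "20+", so the product comes out a little under the rounded 6,000,000.

```python
import math

TRANSCLUSIONS = 270_000  # figure quoted from RexxS above
PARAMETERS = 20          # "20+" in the text, so treat this as a lower bound


def days_for_full_cycle(pages_per_day):
    """Days needed to check every transclusion once at a given daily rate."""
    return math.ceil(TRANSCLUSIONS / pages_per_day)


print(TRANSCLUSIONS * PARAMETERS)    # 5400000 property checks per full cycle
print(days_for_full_cycle(40_000))   # 7 days, roughly a week
print(days_for_full_cycle(10_000))   # 27 days, roughly a month
```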

If such a process can be made to work then it may be possible to increase the level of automation, and also to feed reference information back to Wikidata from the sources used in Wikipedia.

Break here
As an aside, I noticed that a bot (User:ProteinBoxBot) appears to have been updating infoboxes since 2011, which may allay the fears of those who believe the world will collapse if anyone votes for 1A.

For example, if the article gets improved, and some facts are added that could go in the infobox, then the editor could just add a local value, but it would be much better if they used a yet to be created template, something like:

which would just display "1972", but would provide a bot with the ability to find the details of the ref and pass at least some of it to Wikidata.

Looking at some examples from Wikipedia where the article uses Infobox person/Wikidata exclusively with data from Wikidata, there are a number of possible outcomes (and I've seen most of them):
 * The infobox displays information that matches the article and is referenced in the article. The infobox shows all the information that a thoughtful editor might have chosen to put in the infobox.
 * The infobox displays information that matches the article and is referenced in the article. The infobox shows more information than a thoughtful editor might have chosen to put in the infobox to summarise the article.

For a particular data item that is shown in an infobox (whether manually input or called from Wikidata): In an ideal world, for most data items, the value shown is the same in the article text and in the infobox, and is supported by an inline citation to a reliable source (with the same value...). Of course, this is not always the case, and can be down to vandalism, unfinished editing, or ignorance of the usual relationship between the body of the article and the infobox.
 * the item can be in the body of the article or not
 * the item can be referenced in the article or not
 * the values in the article and the infobox can be the same or different
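
Treating the three conditions above as independent yes/no states gives at most eight combinations, of which only one is the ideal case. A small Python sketch enumerates them; the names STATES and is_ideal are hypothetical, and some combinations may not be meaningful in practice (for example, comparing values when the item is not in the body at all).

```python
from itertools import product

# The three yes/no conditions listed above, in order.
STATES = ("in_body", "referenced", "values_match")


def is_ideal(in_body, referenced, values_match):
    """The ideal case: in the body, referenced, and matching the infobox."""
    return in_body and referenced and values_match


combinations = list(product([True, False], repeat=len(STATES)))
for combo in combinations:
    label = "ideal" if is_ideal(*combo) else "needs attention"
    print(dict(zip(STATES, combo)), label)
```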

For an article with a manually entered infobox an editor can get the article to the ideal standard, with their choice of what appears in the infobox. Then, anyone who watches the article can clearly see any subsequent changes to the article, and can accept or revert or discuss or enter dispute/cleanup/verification templates etc, until consensus is reached. This process can keep the article within policy and guidelines [list here].