Wikipedia talk:VisualEditor/A/B test

This is not a good idea.
In my opinion, this is not a good idea. There are too many bugs, too much uneditable stuff, too much inconsistency between the preview and the actual page, and just too soon. CSS and HTML elements are a disaster, and by disaster, I really mean disaster. See the article Chloroplast as normal, and then under the visual editor. Even without HTML, tables are problematic and do not display right, any image format other than a thumbnail utterly fails, reflist is buggy and ineditable, z-index bugs make a lot of buttons and content physically inaccessible, and the disappearing table of contents can be misleading when trying to align images to their corresponding text. Fix the bugs, then release it to the newcomers who will probably be more scared off by a broken, misaligned, malrendered, misleading, frustrating visual editor than the trusty wikimarkup. By all means, keep improving Visual editor—it's a step in the right direction. Just don't unleash it until it's ready, and for Pete's sake, put a bright red button at the top of the toolbar that new users can use to turn the thing off, without having to go dig around in the preferences.—Kelvinsong (talk) 19:20, 17 June 2013 (UTC)

Okay, there was an edit conflict, and I wanted to add:

Look at this diff that happened from a routine one-letter spelling correction. Somehow VisualEditor managed to add in 1,207 bytes of useless quotation marks and nowiki tags, while removing tons of random spaces that the Citation tool added in. More distressingly, it added gibberish to book reference titles and wiped out hidden HTML comments.

 

Became

  —Kelvinsong (talk) 19:43, 17 June 2013 (UTC)


 * As the page says, we're fixing a lot of the bugs pre-release. As you say, references are buggy and uneditable, and this is one of the examples of things that'll be stuck in pre-release - you can go test it here if you want. Like you, I have some concerns here, but again: we can turn it off in an instant if something goes irrevocably wrong. We're not talking about a large pool of users, and we're not talking about an infinite release; it's for 1, maybe 2 weeks, just to get some data on what impact we can expect the VE to have.
 * On the issues there; well, frankly that article is not representative of any other pages I've really seen on enwiki. I see a lot of weird markup hacks (this template for example) to get around the fact that MediaWiki is not built to do what you're trying to do with it...and we tend not to prioritise fixes for things MediaWiki isn't meant to be doing. I've not seen this setup anywhere else, and I expect that the vast majority of articles will lack that amount of complexity and that amount of non-standard formatting. Okeyes (WMF) (talk) 19:31, 17 June 2013 (UTC)


 * You, my friend, have not looked deep enough. Want an example of a nonfunctional, popular template? Try annotated image, which relies on the same CSS trick that the Chloroplast DNA template you linked to does. How about taxobox? That is an extremely common template that the Visual editor doesn't even come close to rendering right. For examples, see Chordate and Bryozoa, the first two pages that link to annotated image.—Kelvinsong (talk) 19:43, 17 June 2013 (UTC)
 * I'm not seeing any issues in Chordate, other than the elements being uneditable. The bug you mention above is a known; again, a lot of bugs are going to be kicked out before we're comfortable with any release. Okeyes (WMF) (talk) 19:47, 17 June 2013 (UTC)
 * Actually, I see a taxobox bug, but not an annotated image bug. Unlike the original issue you raised, it doesn't break the page - people can edit around it. Okeyes (WMF) (talk) 19:48, 17 June 2013 (UTC)


 * Excuse me, but are we seeing the same thing??? At Chordate, the labels are rearranged into a single column and compressed, and the Taxobox is a mess. At Bryozoa, the annotated image is far worse—many labels are just thrown into a list at the bottom, or entirely absent. Again the table is all messed up, the cells are too big and the borders are strange. And remember, those were the first two articles in an alphabetical list—give me time, and I could certainly find a much more horrendous example.


 * While we're talking about tables, if you don't think science articles belong on Wikipedia, you can check out 113th United States Congress or United States Senate elections, 2012 for some interesting table bugs. The Senate elections one would be very hard to "edit around", as the table is dissolved into a vertical chain of cells that requires a great deal of scrolling to get around.—Kelvinsong (talk) 19:57, 17 June 2013 (UTC)
 * Okay, so on the images problem; let's break it down here. There are two potential problems.
 * Problem 1 is annotated images breaking to the point where they ruin page structure, and make the page as a whole uneditable. I'm certainly seeing that in your first example, and it is a problem - but it's a problem limited to very few articles, unless you have other examples of something breaking that dramatically. It comes from, well, using MediaWiki in a way it's not expected to be used and not built to be used, with a lot of CSS hacks. Now, is this a problem? Most definitely! But it's not a VisualEditor problem - supporting every possible permutation of everything that can be hacked together is not a realistic goal, particularly for any beta release - it's a MediaWiki wide problem; MediaWiki is not built to produce that kind of beautiful imagery (and it is beautiful). I'll be honest, you're not likely to get support for that in the first VisualEditor release, and I can't promise it'll ever come, because again, it's not something the VE is expected to support, but I strongly encourage you to speak to Fabrice, who I understand is going to do some multimedia work next fiscal year (read: from July). Gorgeous annotateable and link-including images sounds right up his alley, and I hope it works well.
 * Problem 2 is annotated images breaking, not to the point where they ruin page structure, but to the point where they just look off. This is visible in your other examples. This is also a problem, but not a problem that affects editing; the images are templates, meaning they can be edited with the template inspector, and the disruption doesn't persist outside the template structure, meaning the rest of the page is editable with or without the template inspector. It's certainly disappointing to have visual elements the VE doesn't interpret properly, but again, this is a CSS hack; it's not something MW was built for so it's not something the VisualEditor is likely to support, and relying on supported multimedia formats is your best bet. Okeyes (WMF) (talk) 20:26, 17 June 2013 (UTC)

Problems with training class
Yesterday Wikimedia Australia was running an all-day Wikipedia edit training class at Southport, Queensland and, as the presenter, I was at a loss to explain to the students what was happening (as I did not know of this A/B testing). It was a real problem. These were mostly new users. I was delivering a presentation with screenshots that I had made only the previous day and what a number of the trainees were experiencing was quite different. Apart from all manner of strange error messages from time to time, the citation templates did not appear on screen for the students from the drop-down menus, which made it hard for them to add their citations (having just told them how important citations are to Wikipedia). Did anyone bother for one minute to consider the impact on training courses before embarking on A/B testing in this way? Surely an email to chapters and other groups who run training could have allowed the testing to be organised in a way that didn't impact any training courses. Kerry (talk) 05:50, 26 June 2013 (UTC)
 * Can you give an example of a way to run a randomised study that allows people to opt-out based on their presence in a training class? We didn't send a particular note out to trainers or chapters, no; we did, however, have a general watchlist notice up for a week - which should have been visible to all. Okeyes (WMF) (talk) 12:40, 26 June 2013 (UTC)
 * What watchlist notice? I do not even know what you are talking about. This training class was listed on the front page of Wikimedia Australia's web page. Given that the people who come to training classes tend to be older, mostly female and generally less IT-savvy (that is, under-represented demographics among Wikipedia editors), it is very regrettable that their training class experience was sabotaged in this way by a randomised trial of new users; it is difficult to imagine that they went away with a positive experience of Wikipedia in the circumstances. In particular, I understand that the Visual Editor being tested has known bugs/deficiencies in regard to citations. Why was a randomised trial being conducted with known bugs in something as important as citations? Given the importance of citations to Wikipedia's credibility, naturally we cover citations in a training course, and this did not work at all for a number of people in the class. Also, are the trainees affected still being exposed to the new Visual Editor software? That is, is this trial still running? Because the trainees have been given printed manuals based on the current editor, any attempt by them to edit after the training sessions using the manuals will continue to fail if they are still in the trial. As a volunteer, I put days into the preparation and delivery of that workshop day, plus I had other volunteers and staff from some of our GLAM partners present. I am bitterly disappointed that this project undermined our efforts through a failure to assess the risks of this A/B testing strategy and put appropriate risk management strategies in place. Kerry (talk) 23:59, 26 June 2013 (UTC)
 * So, the watchlist notice is fairly self-explanatory; it's a notice at the top of every user's watchlist that has been in place since 17 June. This follows a watchlist notice from 7 June, which follows repeated posts on the Wikimedia blog, regular announcements to the mailing lists, and consistent coverage in the Signpost almost since the project was conceived - nearly two years ago. I agree that it is regrettable that their experience was not the best, although I would be interested in knowing which bugs in relation to citations you are highlighting in particular. The randomised trial was put off precisely because of high-profile bugs; if there was something prominent that prevented citations from working, for instance, we would not have held it. If you are aware of something we've missed, please let us know.
 * The trainees will still be exposed to the VisualEditor; you can have them disable it in their Preferences (under 'editing', right at the bottom), which will make the training manuals more useful. As you'll see us announce in the morning, however, the VisualEditor is soon to become the default editor for Wikipedia. I am sorry to hear of your disappointment, but I'd like to put it in scope. We assessed the risks of the A/B testing strategy. We put appropriate strategies to mitigate that risk in place. With that, I will admit that we did not consider the impact on your training class. Why? Because it's a single training class of a small number of individuals, at one time. While I appreciate that the situation is unpleasant for them, you have to understand the scale we're dealing with, here. The VisualEditor is going to become the default editor for all new contributors on all Wikipedias; some 200 projects, where around 3,000 accounts are created a day on the English-language Wikipedia alone. What we were attempting to do, as you'll know from the A/B test page, is get some data on the impact that new users with a far more powerful and intuitive editor have - what they do, what bugs they face, what it does to their likelihood of staying around. This is not something we can put off or deprecate for a single training session run by a single chapter; we have (by my count) 40 of them, not to mention the outreach work done by planned chapters (9), nascent organisations (24) or groups of willing individuals doing their bit off the radar. To let the testing timetable be set by all possible confounds would mean releasing this software without that testing, or not at all.
 * That's not something we want, any more than we wanted to harm your training session - for which I, again, apologise sincerely - because we're both working for the same goal, here. We're both trying to get newcomers from all backgrounds engaged in Wikipedia's work; to have the sum of all human knowledge built by the sum of all humans. I am deeply sorry that our efforts clashed. Okeyes (WMF) (talk) 02:50, 27 June 2013 (UTC)

From an old version of the FAQ:
 * How was the community notified of this upgrade to the MediaWiki software?
 * Multiple announcements in the Signpost throughout 2013, such as here and there.
 * Multiple announcements at the Village Pump (technical) in 2012 and 2013, such as here, here, here, here, etc.
 * At the Teahouse, Editor assistance, New contributors' help page, and Help desk (April 2013).
 * At Help:Editing (April 2013).
 * At three dozen of the biggest WikiProjects (May 2013).
 * At the Community Bulletin Board (May 2013).
 * In person at the Amsterdam Hackathon 2013, where features related to categories, templates, references and images were worked on (May 2013).
 * At WikiProject Accessibility and WT:ACCESS (June 2013).
 * Watchlist notice for all registered users about this general announcement (June 2013).

If you somehow managed to miss all of these announcements, plus the watchlist notice specifically about this test and the other announcements made since we quit maintaining the list, then perhaps you could suggest some pages that you do watch, so that they could be added to the list of places to make announcements next time. We can't realistically reach every user, because some people just don't care to keep up with what's going on outside of their specific articles, but we'd like to reach as many as possible. Whatamidoing (WMF) (talk) 11:45, 1 July 2013 (UTC)

Analysis of results
I'm interested to know how the results of this test are going to be evaluated. What, specifically, are we looking for here? The only thing the project page seems to talk about is an increase in conversion rate, which it asserts "is undoubtedly a good thing". I would like more details on how exactly the conversion rate is measured - is this just the number of edits generated, or are we looking at how many new editors actually stick around? Are we going to get any metrics on the ratio of "good" and "bad" edits made with both systems? Are we going to consider the amount of extra work which has to be done by the community (e.g. fixing formatting errors in visual editor edits) compared with the number of formatting errors made by users using the markup system? In regard to this last one, the project page states "This test will help give us an idea [...] whether any increase [in new users] causes problems, and if there's anything we can do to help mitigate those problems" - how exactly are you evaluating what a "problem" is? - Kingpin13 (talk) 11:05, 26 June 2013 (UTC)
 * So, at the moment we're looking at a lot of things, from two different vectors (User:EpochFail can give more details, but I built some of the dashboards and tracking). We have two sources of data; first, the User Metrics API, which allows for the tracking of tagged cohorts over the long-term (such as the test group versus the control group). This will let us look at short-term metrics on editor engagement, such as how many edits they make, how likely they are to edit at all, so on and so forth - it will also allow for long-term tracking of how likely users are to keep editing. The second is a set of dashboards that runs off MediaWiki data - we're using this to check things like whether blocks increase or reverts increase.
 * Fine-grained analysis such as error comparisons would be nice, but would require hand-coding and probably a lot of manual effort. What I'm doing (today, actually!) is pulling out a long list of VisualEditor diffs and going through them to spot bugs and errors sourced from the VE. Hopefully that might turn up "things that are wrong, but not bugs", as it were. Well, hopefully it won't turn up any, but you know what I mean. Okeyes (WMF) (talk) 16:26, 26 June 2013 (UTC)
 * Thanks for the response. I've had good experiences in the past with the Foundation's analysis of A/B testing and your message here (as well as the one on my talk page) lead me to expect a similar standard here. However, I was half expecting (perhaps foolishly) that this test would be assessed and responded to before further deployment, but your message at VPT suggests otherwise.
 * In any case, while we are on the subject of testing, I wanted to make a comment about the recent fairly extensive use of usertesting.com I have seen from the Foundation. I have to admit, I find it very interesting to get an insight into how editing the wiki feels for new users, and I won't deny that this method of testing seems to have proven very effective. However, I do find it quite demeaning when a lot more weight is given to the opinions of those testers about the direction the site should be taken (who are, at the end of the day, just going through the paces to get a pay cheque), than is given to those of the community who volunteer their time here freely. Perhaps it's just me, but when I read through the Visual Editor "Why are we building this?" pages and see them plastered with (what I assume are) quotes from people who don't even want to edit Wikipedia, I do feel a little like the Foundation is incorrectly prioritising who they listen to. To reiterate, I do think the usertesting data is useful and I would encourage it to continue, I would just like to see a change in the amount of weight given to those tests. - Kingpin13 (talk) 20:45, 1 July 2013 (UTC)
 * Well, I want to make clear that one thing you won't find me doing is going "5 people and a dog say it's hard, ergo it's hard". I think that it's important we weigh "what the community is good at" and "what the community is bad at", and do the same for, well, any group of people. So: the community is fantastic in its understanding of the necessities of wiki-editing, and of the workflows. We'd be foolish not to ask the community's opinion when we're making changes to, well, any workflow, and the editing workflow is the biggest of them all. So we have to involve the community in this conversation - we'd be dumb not to do so. The community is bad at having empathy for newcomers, which is not, I think, anything that is anyone's fault. Me, I've been around since 2005 - I'm sure the same is true of you (if not earlier!). We joined a very long time ago, and the memories of how hard or easy different things on Wikipedia are have become slightly woozy and contextless in our heads, if we can remember them at all. We signed up as part of a different generation of internet users, when writing in markup is hard. We're great at defining a complete list of "these are the things the wiki demands, and that the community demands". We're kinda going to suck at defining a complete list of "this is what 2013-era internet users demand".
 * User testers are great at giving us that impression - we can throw things at a group of total neophytes and see what they think of the interface, or the workflow, or whatever we're testing. But this feedback isn't the full length and breadth of the data we need, because they suck at thinking things through. Again, not their fault: they've not had long experience with Wikipedia, of why things are done a certain way, or of what certain changes or processes are meant to prevent. We have.
 * So ultimately the answer is to involve both people. I really hope I'm not giving the impression that we're ignoring wikipedians, because I'd like to think we're not. If someone comes up to me and goes "what if it increases vandalism", we're testing that. "What if we get so many newcomers our workflows are overwhelmed" - hey, I'm worried about that too, and thinking hard on it. What I abhor, and what I probably do sound judgmental around, is users who come in going "anyone who can't get markup is a moron we don't want", because it's not within their skillset to make judgment calls like that. Again, empathy, distance. Ideally a lot of the legitimate points people are raising (and there are many legitimate points!) will be addressed by quantitative data, because I'm a quant kinda guy. Whenever I see anyone go "the community demands X!" or, for that matter, "user testers demand X!" my brain hears it as "a non-blind observational study with a tiny number of participants in which the researchers played an active role demands..." :P. It's useful data, but we shouldn't be basing our decisions solely on what editors think or solely on what user testers think. We should appreciate the strengths and weaknesses of each group, and look to utilise their feedback accordingly. Okeyes (WMF) (talk) 21:25, 1 July 2013 (UTC)
 * Did the users in the B wing, who started up with Wikimarkup, have the VisualEditor switched in on them at the launch? If so, you killed any validity of long-term surveying. Adam Cuerden (talk) 00:16, 13 July 2013 (UTC)
 * VisualEditor was made available to everyone on 01 July (assuming browser support, etc.), including all users in this test group. Whatamidoing (WMF) (talk) 00:02, 14 July 2013 (UTC)

Discussion at WP:VE/F
See WP:VE/F -- John Broughton (♫♫) 15:48, 16 July 2013 (UTC)
 * *headdesk* --j⚛e deckertalk 17:22, 16 July 2013 (UTC)