Wikipedia talk:Web statistics tool/Archive 1

A Wikipedia article traffic statistics tool at http://stats.grok.se has been recently quoted in a number of WP:RM discussions.

I've created this talk page as a central point for discussion as to its merits, and how it should be used. I'll put some basic information in the project page as well, but this talk page is the more important one for now. Andrewa (talk) 01:09, 15 March 2008 (UTC)

I've turned off sinebot for this page... Please sign comments, but I don't think you need to sign entries in the list of requested moves below. Andrewa (talk) 14:40, 15 March 2008 (UTC)

Requested moves
In order to evaluate the usefulness of the tool for the purposes of WP:RM, I've started a list of discussions in which the tool has featured. It's not exhaustive; feel free to add to it. We might keep it alphabetical for now. Andrewa (talk) 01:21, 15 March 2008 (UTC)

Results are in the format: closed, support-oppose tally, result, then the ratio of page hits (as measured by the web statistics tool) to those of the next most viewed page. Results of open discussions are not shown.
 * Battery (electricity) closed 3-3 no consensus, 5:1 ratio
 * Breaking closed 1-6 keep, 12:1 ratio (note: Breakin' has 2:1 ratio over Breaking)
 * Burnham Park closed 1-3 no consensus, 3:1 ratio
 * Car wash closed 1-7 keep, 1.37:1 ratio
 * Clarence Williams closed 2-5 keep, 15:1 ratio
 * David Robertson closed 3-1 move, 4.5:1 ratio
 * Europa (moon)
 * Georgia closed 28-25 no consensus, 1.2:1 ratio
 * Go (board game) closed 8-7 no consensus, 9:1 ratio
 * Gordon Bell closed 2-2 no consensus, 4:1 ratio
 * Hillary Clinton archived
 * John Henry admin closed, 4:1 ratio
 * Jumbo Elliott closed 1-3 keep, 7:1 ratio
 * Madonna (entertainer) closed 1-7 not moved, 20:1 ratio
 * Marquette Building closed 1-3 no consensus, 5:1 ratio
 * Pat Ryan admin moved, 1.34:1 ratio
 * Sting closed 7-2 move, 1.38:1 ratio
 * Washington Park closed 3-3 move, 0.82:1 ratio
 * Willie Johnson closed 3-3 no consensus, 25:1 ratio
 * Worcester closed 6-11 no consensus or oppose (not moved), 18:1 ratio (of articles which could have the same name), 0.68:1 ratio (of cities with the same name)
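
As a rough illustration of how the ratios in this list can be derived, the sketch below divides the views of the page under discussion by the views of its most viewed rival. The function name `hit_ratio` and the titles and counts are hypothetical, invented for the example; they are not taken from the tool.

```python
def hit_ratio(candidate, views):
    """Ratio of the candidate's views to those of its most viewed rival."""
    rivals = [count for title, count in views.items() if title != candidate]
    if not rivals or max(rivals) == 0:
        raise ValueError("need at least one rival with a nonzero view count")
    return views[candidate] / max(rivals)

# Hypothetical monthly counts, invented for illustration only.
monthly_views = {"Topic A": 4500, "Topic B": 1000, "Topic C": 200}
print(f"{hit_ratio('Topic A', monthly_views):.2f}:1")  # prints 4.50:1
```

A ratio below 1, as in the Washington Park entry, simply means the page under discussion is viewed less often than its rival.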

Article importance

 * Regression analysis

Known anomalies
Situations where the counts have been obviously in error, possibly because a program accessed an article a very large number of times, so that the counts do not reflect actual users.


 * The stats for the 3rd and 4th of March missed some hours entirely, so those two days are omitted.
 * Canine reproduction went from a steady 450 views a day to a steady 170,000 views per day for about two weeks, then dropped to 8,800 views per day.


 * Newton went from a steady 1.5 to 3 k views per day to over 300,000 views for one day only.


 * Resolution normally gets viewed about 500 to 1,000 times a day, was hit 300,000 times a day for three days (same program that hit Newton?).


 * Lambda calculus normally gets viewed about 400 to 800 times a day, was hit 130,000 times on 31 March/1st April.

There are also examples where an article actually receives a large number of views on one day because of media attention. Examples of that are Superdelegate and Julie Dubela, and most weekly TV shows such as American Idol. Heath Ledger went from 4,000 views to 2 million the day after his death and has been steadily tapering off since. Articles that are featured on the Main page clearly get a lot of views because of that exposure such as Xenon which was a featured article on February 10th. Valentine's day got 1.1 million views on, well, Valentine's day.
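
A minimal sketch of how such jumps might be flagged, assuming a plain list of daily view counts for one article. The function name `flag_spike_days` and the factor of 20 are illustrative choices, not part of the tool, and the check cannot tell a crawler from genuine media-driven interest; that still needs human judgement.

```python
from statistics import median

def flag_spike_days(daily_views, factor=20):
    """Return indexes of days whose count exceeds factor times the median."""
    baseline = median(daily_views)
    if baseline <= 0:
        return []
    return [i for i, v in enumerate(daily_views) if v > factor * baseline]

# Invented series shaped like the Newton case: a steady few thousand, one huge day.
series = [2000, 1500, 3000, 300000, 2500, 1800, 2200]
print(flag_spike_days(series))  # prints [3]
```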

Other uses
Is anybody clicking through and reading any of these sub-articles?
 * John McCain

How helpful is the tool for WP:RM decisions?
From Talk:Willie Johnson:

''The stat tool should be the only evidence that matters. It tells you who is viewing what pages based on what words they enter to get to them. — Preceding unsigned comment added by TonyTheTiger (talk • contribs) ''

From http://stats.grok.se/about :

I wouldn't base any important decisions on these stats.

More discussion is required before we adopt Tony's recommendation, IMO! Andrewa (talk) 02:42, 15 March 2008 (UTC)


 * Hi Andrew. One instance where the tool might be useful: if someone was arguing that something was "primary use", whereas I thought there was no primary, then the results of the stat tool might help me make my case. On the other hand, the tool should probably not be used as the sole argument that something is primary use. Sam Staton (talk) 11:35, 15 March 2008 (UTC)


 * Agree. Andrewa (talk) 14:41, 15 March 2008 (UTC)
 * Page views can be subject to deliberate and unintended false readings. One example is documented at Louis Pasteur, which got 27,000 views on one day and fewer than a thousand a day for the rest of the month; another is Canine reproduction, which got 500 views a day except for a two-week period in which it got 150,000 views a day. There are considerations other than raw page views when establishing primary usage. For example, George W. Bush gets twice as many hits as George H. W. Bush, but since they are both presidents there is no way that anyone would propose to choose one as the primary usage over the other, although if they did, it would be senior, and not junior, that was used, as was done in the case of John Adams and John Q. Adams. --Gonezales (talk) 17:48, 15 March 2008 (UTC)


 * I think that relying on a tool that attempts to measure Wikipedia page views as the sole basis for determining primary topic is inherently problematic for a variety of reasons. The tool itself is still new and not exactly proven to be accurate with a high degree of reliability. Second, it is rather solipsistic. It might sometimes give an indication for when page titles could be better arranged, but by itself the sample is far too limited. IMO, unless the case is so lopsidedly obvious, determination of primary topic should be based on external sources and not simply on internal page views. older ≠ wiser 19:15, 15 March 2008 (UTC)


 * It seems to me that, at the very least, we have a rough consensus rejecting the claim that the stat tool is the only evidence that matters. A proposal along these lines seems to have no chance of becoming Wikipedia policy.


 * There also seems to be a rough(er) consensus that the tool should never be accepted as evidence supporting a particular primary usage, however unbalanced the statistics may be in one direction. And this is consistent with the results in WP:RM recently, where several proposals claiming that the stat tool supported a particular primary usage have been rejected.


 * But there's also the interesting suggestion that the tool may be useful to support a claim that there is no primary usage. Andrewa (talk) 05:53, 17 March 2008 (UTC)

One editor appears to think that the tool is the only criterion that should be used for establishing primary usage. --Gonezales (talk) 03:36, 19 March 2008 (UTC)


 * Yes, exactly. And here this editor has the opportunity to give reasons for this belief, as have any others of this opinion. So far the silence (-> is deafening. Andrewa (talk) 17:27, 11 April 2008 (UTC)
 * I noticed that. Well I think it should be the primary tool for answering questions about primary usage, but that it should be backed up by the number of links. I'm trying to get a number on how dominant a topic should be for it to be called primary, but so far the old school "it's dominant if I think it's dominant" have resisted using it at all, but I think that in time it will become more accepted. Bear in mind though that there actually is no requirement that anyone defend their position - arguments stand or fall on their own merits. 199.125.109.76 (talk) 01:34, 12 April 2008 (UTC)

Suggestions?
Hi, seeing that a page seems to have been created to discuss this I thought that I'd use it to gather suggestions. If you have any suggestions or thoughts on new features you'd like to see in the statistics tool, I'd be interested to hear about it.

Some of the thoughts I've had so far are to be able to view a ranked list of all articles in a category, and to compare several articles directly. Some of the other suggestions have been an API, being able to view most accessed non-existent pages, fixing case insensitivity, and top lists for all languages. I'd be willing to be open to letting people here guide the development if we can come up with a ranked list of things you'd like me to start with. What would be most useful? henrik • talk  22:01, 17 March 2008 (UTC)
 * I'll add my voice to the chorus of thanks. Whilst day-by-day charts are fun, what I really want is to get stats on lots of articles (a Project can have anywhere from 1000-50,000 articles), with short download times - but the stats don't need to be timely, and they don't need to be at higher resolution than 1 month. So for instance, if there's a source of data that updates in April reporting a single number for hits-in-January, that's absolutely fine, just so long as I'm not having to wait 5 seconds to download the stat for just one article - multiply that by 50,000 articles and that becomes a lot of seconds! I just need to get a rough idea of what articles are hot and what not at some time in the vaguely recent past. Then armed with that knowledge we can set priorities in a Project - for instance which Top importance articles are really the priorities to get to GA first, or which Low importance articles are actually of more interest than the Project thinks. I've just had an assessment bot approved which among other things assesses town articles on the basis of their population - obviously small but world-famous places like St Tropez have to be picked out manually, and easy access to page hits would be a useful way to highlight potential misassessments. With my very Project-centric view of things, ideally the stats should be integrated into the WP 1.0 Bot assessment pages - just run it once a month to update with last month's stats, or give WP 1.0 Bot access to the data so that any articles added to the Project get the stats when WP1B adds them to that list. I guess when I've got some time http://stats.grok.se/~henrik/wikistats/pagecounts_en_20080201_to_20080223_full_sorted.gz would probably serve me quite well, just pick out the articles I want from that, but it would be nice (for both of us) if I didn't have to download 450MB.... Just a web interface where I can send a text list and out comes page hits in January would be fine. Thanks again.
FlagSteward (talk) 02:14, 22 March 2008 (UTC)
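
The "pick out the articles I want" step could be sketched as a single streaming pass over a locally downloaded dump, so the whole file never has to sit in memory. The line layout assumed here (a title followed by whitespace and an integer count) is a guess made for illustration; the actual format of the pagecounts file is not described on this page, and `pick_counts` is a hypothetical helper name.

```python
import gzip

def pick_counts(dump_path, wanted_titles):
    """Stream a gzipped count dump and keep only the wanted titles.

    Assumes each line is '<title> <count>'; that layout is an assumption.
    """
    wanted = set(wanted_titles)
    found = {}
    with gzip.open(dump_path, "rt", encoding="utf-8", errors="replace") as fh:
        for line in fh:
            parts = line.rsplit(None, 1)  # split off the last whitespace-separated field
            if len(parts) != 2:
                continue
            title, count = parts
            if title in wanted and count.isdigit():
                found[title] = int(count)
    return found
```

Called as `pick_counts("pagecounts.gz", project_titles)`, where both arguments are placeholders, it returns a dictionary of counts for whichever listed titles appear in the file.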

I think the tool itself is great. What needs development is the methodology for its use... some guidelines, based on existing guidelines and of course policy.

At the very least, the tool is useful to suggest paths of enquiry, for example suggesting what might be a primary usage. Evidence as to whether or not this is true can then be gathered from other sources. At the very most, IMO its usefulness falls short of the proposal that it is (or should be) the only evidence that matters in deciding primary usage.

If so, these are boundaries. They're very conservative IMO, and deliberately so, as a starting point. How can we improve on them? Or can we at least agree on them, as the first step to refining them? Andrewa (talk) 17:52, 11 April 2008 (UTC)


 * I think I am pretty much in agreement with Andrewa on this. I think the tool is potentially very useful -- but only as an indicator that requires interpretation -- not as results to be mechanistically implemented according to arbitrary thresholds. IMO, the most useful potential lies in providing evidence that a title should be a disambiguation page rather than a specific article. It seems to me that is easier to demonstrate by comparing page views than it is to conclusively demonstrate that one particular topic merits primary topic status. older ≠ wiser 18:04, 11 April 2008 (UTC)
 * No one would disagree that it has to be interpreted. There is no reason for example for working on a page just because it was mentioned on the Main page a few months ago and got a spuriously high count because of that. Or for jumping on a page of someone who had an untimely death a couple of months ago. Or deciding in April that Valentine's day was the most important article to work on because it got a million page views two months ago. Does the horse leaving the barn ring a bell? 199.125.109.76 (talk) 01:41, 12 April 2008 (UTC)


 * Thanks for the tool, Henrik. Do you still have time to maintain it? One thing that would be useful would be automating the rankings, or at least explaining that they are based on a particular month, at present December 2010. See Talk:WikiLeaks for an example of an editor who noticed apparent discrepancies. --Cedderstk 08:21, 26 July 2012 (UTC)

Move to

 * The following discussion is an archived discussion of a requested move. Please do not modify it. Subsequent comments should be made in a new section on the talk page. No further edits should be made to this section. 

The result of the move request was: page not moved: no consensus in over 5 weeks. I put in a redirect from the suggested new name. Anthony Appleyard (talk) 08:51, 1 May 2011 (UTC)

Web statistics tool → View count tool — There are many web statistics tools for Wikipedia, but this page apparently deals exclusively with a specific view count tool, and the name of the page should reflect that. Mikael Häggström (talk) 05:21, 20 March 2011 (UTC)


 * As the creator of this trivial but useful project page I can see no point in objecting to the move if that's what anyone else wants, but no point in moving it either, frankly! Just please leave a redirect, as the page does serve a purpose  and the title was chosen simply because that's what people were calling it on talk pages and particularly of course in RM discussions. Move it and move on to something useful is my advice. Andrewa (talk) 02:56, 28 March 2011 (UTC)
 * The above discussion is preserved as an archive of a requested move. Please do not modify it. Subsequent comments should be made in a new section on this talk page. No further edits should be made to this section.

Removed note on main content being on talk page.
I removed the note ''This page is probably less important for the moment than the talk page so far. Feel free to expand this page, and remove this notice when it's no longer needed.'' to encourage filling the actual article instead of the talk page. This talk page is already too long to conveniently read entirely, so it's better to sort out the important parts to the actual article. Feel free to complement with content from this talk page. Mikael Häggström (talk) 05:28, 20 March 2011 (UTC)

Seems to be nonfunctional
Apparently, this tool calculated its last view counts on March 15. Does anyone know if this state is permanent, and if so, are there any other view count tools out there? Mikael Häggström (talk) 05:37, 20 March 2011 (UTC)
 * It's up and running again now. Mikael Häggström (talk) 08:40, 21 March 2011 (UTC)

"Views"
If it counts only "views", how does it deal with redirects that just redirect and don't get viewed? Kauffner (talk) 10:07, 26 March 2011 (UTC)
 * They get counted as views, as you technically 'view' the redirect for a fraction of a second. The Cavalry (Message me) 21:11, 4 September 2011 (UTC)
 * Yeah, but the problem is the hit only counts for the redirect. See Wikipedia article traffic. B137 (talk) 06:00, 19 November 2015 (UTC)
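
To illustrate the point about redirects: if hits are recorded against the redirect title only, a fuller figure for an article means adding the counts of its redirects to its own. The sketch below assumes two hypothetical inputs, a mapping of title to views and a mapping of redirect title to target; neither is something the tool is stated to provide.

```python
def views_including_redirects(target, views, redirects_to):
    """Sum the views of target and of every title that redirects to it."""
    total = views.get(target, 0)
    for source, destination in redirects_to.items():
        if destination == target:
            total += views.get(source, 0)
    return total

# Hypothetical titles and counts, for illustration only.
views = {"Main title": 900, "Old name": 100, "Alt spelling": 50, "Unrelated": 400}
redirects_to = {"Old name": "Main title", "Alt spelling": "Main title"}
print(views_including_redirects("Main title", views, redirects_to))  # prints 1050
```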

Merge discussion for Pageview statistics
An article that you have been involved in editing, Pageview statistics, has been proposed for a merge with another article. If you are interested in the merge discussion, please participate by going here, and adding your comments on the discussion page. Thank you. Rezonansowy (talk • contribs) 15:03, 15 January 2014 (UTC)
This merge has not been done yet, meanwhile there are 2 more pages generally dealing with the same subject: what do you think? --.js[democracy needed] 02:07, 6 February 2016 (UTC)
 * maybe you? Thanks in advance! --Rezonansowy (talk • contribs) 15:33, 19 January 2014 (UTC)
 * Which page do people want Pageview statistics to be merged with? In "please participate by going here,", "here" is a redlink pointing to "Generally, these articles are about the same thing." Anthony Appleyard (talk) 18:23, 19 January 2014 (UTC)
 * I mean merge this page (Web statistics tool) with Pageview statistics because generally, these articles are about the same thing. --Rezonansowy (talk • contribs) 23:10, 20 January 2014 (UTC)
 * Web statistics tool (here)
 * Pageview statistics
 * About page view statistics
 * Statistics


 * To me, it would make sense to merge About page view statistics into Pageview statistics, and to merge Web statistics tool into, so we end up with two separate pages. Merging everything into  wouldn't make things better, IMHO. &mdash; Dsimic (talk &#124; contribs) 06:28, 6 February 2016 (UTC)


 * @Dsimic: Thx for your thoughts, I reconsidered and replaced the merge messages by a better hatnote. Anyway this will basically be evolving in the near future, and perhaps it will even make sense to keep several pages. One for documentation of the old stats.grok.se page that will remain very heavily linked and mentioned – and might also be recovered, who knows? [WP:Stats.grok, WP:Stats.grok.se, stats.grok.se, Stats grok, Template:Stats.grok.se ...] On the other hand we don't know yet which new tools will become established, perhaps more than one for different requirements and different uses? So let's see how the related pages will reflect those changes. But I do support the merge of About page view statistics into Pageview statistics as you suggest. --.js[democracy needed] 13:23, 6 February 2016 (UTC)


 * You're welcome. The  looks fine to me, and I'd strongly suggest that About page view statistics is merged into Pageview statistics.  It's simply redundant to have About page view statistics lying around, if you agree.  By the way, let's hope that stats.grok.se will become regularly updated soon, it's a truly great utility. &mdash; Dsimic (talk &#124; contribs) 16:16, 7 February 2016 (UTC)


 * Fine, so we agree on that, do you suggest we put a merge template in those 2 again and wait, or should we boldly redirect "About page view statistics" to "Pageview statistics" and copy the relevant parts to there? --.js[democracy needed] 23:02, 8 February 2016 (UTC)


 * IMHO, it would be best to place merge to/from templates, pointing to this very discussion, and leave them there for a couple of weeks. That way, other editors will also be notified and can express their opinions. Hope you agree. -- Dsimic (talk | contribs) 03:18, 9 February 2016 (UTC)


 * Concur with Dsimic, great idea. I appreciate the nice new utility and the restoration of the stats!  My only suggestion is that the blue color be made darker, as it is so faint I can barely make it out on my screen without having to look from a different angle.  Also, the orange and brown tend to blend together.-JGabbard (talk) 04:12, 9 February 2016 (UTC)

Stats missing for 7–10 February 2015?
It doesn't seem to matter what article I check, the stats are missing for 7–10 February 2015. Does anyone know what happened? Curly Turkey ¡gobble! 05:47, 14 February 2015 (UTC)

Ninety-day limit
[moved from Wikipedia:Web statistics tool] I don't know whether this is the right place to ask. I just looked up stats for John Yudkin, and the tool offered now under tools --> page information --> page view statistics offers only the last 90 days. Is that a temporary thing or is that all that's on offer now? Pinging, whose names are on the page. SarahSV (talk) 23:11, 9 April 2016 (UTC)
 * Should probably use the talk page rather than writing here, but anyway you can query for any date range. Click on the date range field and select your dates, or simply type them in. Best &mdash; MusikAnimal  talk  00:03, 10 April 2016 (UTC)
 * sorry about posting in the wrong place. I got confused by the signature on the other page and didn't look properly.


 * Thanks for the reply. I can't seem to get a date range to work before 2015. I'm trying to find stats from 2008 to the present. SarahSV (talk) 00:18, 10 April 2016 (UTC)
 * As it says at the bottom of the date picker, that tool has no statistics prior to August 2015. The reason for this is that the data prior to that date is not comparable to the data after that date. I believe Analytics changed the definition of "page view" at some point in order to make the data more accurate. See Research:Metrics standardization for more information. If you want to view the old stats anyway, you can do so at https://stats.grok.se. Sorry I don't have a better solution for you. Kaldari (talk) 00:42, 10 April 2016 (UTC)


 * Thanks. I don't understand what it means that the data prior to a certain date isn't comparable with the data after it. Can the Foundation host http://stats.grok.se/? It seems not to be working, but it was a wonderful tool. Or are you saying it perhaps wasn't accurate? SarahSV (talk) 01:20, 10 April 2016 (UTC)
 * I basically mean that the old data wasn't accurate, although that might be an over-simplification. I'm afraid it isn't likely that the Foundation will host http://stats.grok.se/. The current priority is improving the new tool at http://tools.wmflabs.org/pageviews/, for example, adding the ability to include stats for all redirects to a page. Sorry that I can't be of more help! Kaldari (talk) 02:07, 10 April 2016 (UTC)


 * Okay, thanks, Kaldari. It's a real shame that http://stats.grok.se/ isn't available. We were able to use it to observe trends; the media did too. SarahSV (talk) 22:05, 12 April 2016 (UTC)


 * There's a similar problem with the article revision stats. I just looked up someone who has made thousands of edits to a talk page (to ask them to step back), but the stats show a much smaller figure, because they don't go back before 2015. But the page mentions that only at the top, easily missed, so that it's leaving a false impression. Who do we need to approach to get these tools working again? SarahSV (talk) 21:55, 15 April 2016 (UTC)
 * Which tool are you talking about? Kaldari (talk) 23:27, 15 April 2016 (UTC)


 * Sorry, I can't show you what I mean now, because I'm getting a "no web service" message. For example, try Talk:Female genital mutilation --> page information --> revision history statistics. Sometimes we're taken to WikiHistory, sometimes a different tool, and sometimes no service. SarahSV (talk) 23:50, 15 April 2016 (UTC)
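
Regarding the date-range question earlier in this thread: since the counts before and after the change in the "page view" definition are said not to be comparable, one cautious way to handle a request such as "2008 to the present" is to split it at the cutoff and treat the two parts as separate series. The sketch below assumes the cutoff is 1 August 2015; the thread only says "August 2015", so the exact day is a guess, and `split_range` is a hypothetical helper.

```python
from datetime import date, timedelta

CUTOFF = date(2015, 8, 1)  # assumed day; the thread only gives "August 2015"

def split_range(start, end, cutoff=CUTOFF):
    """Split [start, end] into (legacy, current) sub-ranges around the cutoff.

    Either part is None when the requested range does not reach it.
    """
    if start > end:
        raise ValueError("start must not be after end")
    legacy = (start, min(end, cutoff - timedelta(days=1))) if start < cutoff else None
    current = (max(start, cutoff), end) if end >= cutoff else None
    return legacy, current
```

For a request running from 1 January 2008 to 1 April 2016, this yields one range ending on 31 July 2015 and another starting on 1 August 2015, which can then be looked up in the old and new tools respectively and charted separately.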

Update sidebar link
It looks really rubbish that there's effectively a dead link on the main sidebar of every page - is it not time that the "Traffic stats" link was updated to the Labs stats tool? Or is it not ready for that yet? Le Deluge (talk) 12:32, 12 April 2016 (UTC)


 * Do you have a user script that adds it? There's no such link for me (on either account) or for anons… Jdforrester (WMF) (talk) 16:18, 12 April 2016 (UTC)
 * Heh, it's funny how little customisations become so embedded in your user experience that you forget that they're not actually part of Wikimedia. I installed User:Smith609/toolbox.js some 5 years ago, which gives you Traffic stats, Edit history stats and Page watchers in the sidebar. I've dropped him a line but since he's not on Wiki much these days, I can sort myself out with a local version if need be. Cheers. Le Deluge (talk) 19:40, 12 April 2016 (UTC)
 * No worries at all. If you need some help just shout. :-) Jdforrester (WMF) (talk) 20:00, 12 April 2016 (UTC)
 * The new tool is actually linked for all users on all pages at the German Wikipedia, but before that was implemented, I created a JS script that does just that for the new tool. It is by now obsolete for German Wikipedia, but you can add it to your global.js and get the information everywhere on Wikimedia sites: w:de:Benutzer:°/mwArticleStatistics --° ( Gradzeichen ) 09:59, 13 April 2016 (UTC)

Filtering transient spike anomalies
The WP:POPULARLOWQUALITY list suffers from the "known anomalies" problem of transient spikes. There is a general solution to a closely related problem in the R code on pp. 191-2 of Xu et al. (2014), but I think it's much easier to use some measure of whether a spike less than four days long has a standard deviation score above, say, +6, or one of the algorithms in. Can we get a separate API resource access for a filtered top-1000, please? EllenCT (talk) 17:09, 23 May 2016 (UTC)


 * I think that sounds like a reasonable request to make of the Analytics team; would you like me to copy your request into Phabricator for you? Jdforrester (WMF) (talk) 19:57, 23 May 2016 (UTC)
 * Yes, please. I should also mention that the link from this page to the "PageviewAPI" is broken so I could not figure out the right person to ask. Thank you so much! EllenCT (talk) 21:37, 23 May 2016 (UTC)


 * Done – T136049; you may wish to subscribe if you can. The link is now fixed to be to https://wikitech.wikimedia.org/wiki/Analytics/PageviewAPI (note the different wiki, because it's a Wikimedia service rather than a generic MediaWiki piece of code). Jdforrester (WMF) (talk) 23:00, 23 May 2016 (UTC)
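
The filter requested at the top of this thread could look roughly like the sketch below. It is not the method from Xu et al. and not anything the Analytics team is known to have implemented. It swaps the plain standard deviation for a robust score based on the median and the median absolute deviation (MAD), because a single enormous day inflates an ordinary standard deviation so much that the spike may never reach +6. The names and the defaults (`z_threshold=6.0`, `max_run=3` for "less than four days") are illustrative assumptions.

```python
from statistics import median

def filter_short_spikes(views, z_threshold=6.0, max_run=3):
    """Replace short runs of extreme days with the series median.

    Runs longer than max_run days are left untouched, on the assumption
    that sustained interest is more likely to be genuine.
    """
    med = median(views)
    mad = median([abs(v - med) for v in views])
    scale = 1.4826 * mad if mad > 0 else 1.0  # fall back to 1 for a flat series
    high = [(v - med) / scale > z_threshold for v in views]

    cleaned = list(views)
    i = 0
    while i < len(views):
        if not high[i]:
            i += 1
            continue
        j = i
        while j < len(views) and high[j]:
            j += 1  # extend to the end of this run of extreme days
        if j - i <= max_run:
            for k in range(i, j):
                cleaned[k] = med
        i = j
    return cleaned
```

With this rule a three-day burst like the one described for Resolution would be flattened, while a two-week plateau like Canine reproduction would pass through unchanged and need a different test.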

Top 100,000?
Before I make a formal request for it, is there any way you could please ask around the Analytics team to find out whether a top 100,000 list would be feasible to produce? WP:POPULARLOWQUALITY would be far better if it could show just Stub- and Start-class predictions from those, instead of C-class predictions from the top 1,000, many if not most of which would probably not be considered particularly urgently in need of improvement. EllenCT (talk) 12:22, 25 May 2016 (UTC)


 * I'm almost certain that would be vetoed on reader privacy grounds, sorry. The "top 1000" threshold was chosen to avoid a lot of such issues. Jdforrester (WMF) (talk) 21:37, 3 June 2016 (UTC)
 * Thank you so much for your kind help. https://dumps.wikimedia.org/other/analytics/ cannot leak anything substantial about the hundreds of millions of individual readers, as far as I can tell. Understanding readers is a top priority for ranking potential improvements. The top 20,000 should include the 100 most popular articles that ORES predicts are stub-class at least 95% of the time. If making a temporary daily top 200,000 tally can be done in a reasonable amount of RAM, I would like to expand that to the most popular 1,000 predicted stubs. The good news is that there is plenty of data to do that from what is available at present.
 * Here is an example of the top 594 stub predictions sorted by both pageviews and ORES' Start class confidence. Sorry I couldn't filter redirects and disambiguation pages. EllenCT (talk) 05:35, 4 June 2016 (UTC)
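
On the RAM question raised above, a bounded tally is one way to keep memory small: `heapq.nlargest` holds only the current best `n` candidates while streaming through the counts. The sketch assumes two hypothetical inputs, an iterable of (title, views) pairs and a dictionary of stub-class probabilities already obtained from a quality model; it does not call ORES, and the 0.95 threshold mirrors the "at least 95%" figure mentioned in this thread.

```python
import heapq

def top_predicted_stubs(page_views, stub_probability, n=100, min_confidence=0.95):
    """Return the n most viewed titles whose predicted stub probability
    is at least min_confidence, as (title, views) pairs, most viewed first."""
    candidates = (
        (views, title)
        for title, views in page_views
        if stub_probability.get(title, 0.0) >= min_confidence
    )
    # nlargest keeps at most n candidates in memory at any time
    return [(title, views) for views, title in heapq.nlargest(n, candidates)]
```

Titles missing from the probability dictionary are treated as non-stubs, which errs on the side of leaving them out of the list.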

Link not working
When I click on this tool (http://stats.grok.se/), these words pop up: internal server error. Red Rose 13 (talk) 13:15, 1 March 2017 (UTC)