Talk:Usage share of operating systems/Archive 1


I put in a list of reasons why the use of browser statistics is unreliable in estimating the proportions of PCs using different operating systems and followed with an example comparing results from surveys, and hits from a particular web site, w3schools.com. I thought the single reason, unreliable reporting, was incomplete. Having viewed server logs, I know it is often impossible to get it right. It would be better to have survey data. Pogson 00:46, 14 August 2007 (UTC)

If we are going to use web stats from unknown pages out there, I wonder whether it would be feasible/desirable to include the stats from Wikipedia. Wikipedia is widely used by everyone, so it would be a neutral, high-volume site to collect statistics from. That might be a better sample than the unknown sites in the three collections in the article. Pogson 01:32, 16 August 2007 (UTC)

I think methodology is a really big problem. When I look at the visitors of two sites that I help host (both small, but not too small: 1500 unique visits monthly) I see big differences that are a result of the different audiences, but also a real problem with more and more spambots and harvester bots that are indistinguishable from Windows/MSIE users. On one site (which has more forms and no IP blocking yet) bots are at least 40%, perhaps even more.
If I visit, for example, the XiTi Monitor and look for its methodology, it simply states:

Methodology:
This survey of operating systems was conducted on French-speaking websites.

That's not very insightful. --Kornelis (talk) 07:56, 13 June 2008 (UTC)

Other methodologies

Please include some information from sources beside web browsers for comparison purposes. — Omegatron 21:02, 13 October 2007 (UTC)

What other sources are there? --Harumphy 12:09, 14 October 2007 (UTC)
Surveys? Sales records? The numbers will vary, but it would be good to have for comparison purposes. — Omegatron 16:48, 14 October 2007 (UTC)
If you think any such info has been published, please point us to it. --Harumphy 16:26, 16 October 2007 (UTC)
Are these statistics worldwide or U.S.? It appears the stats from W3 Counter are worldwide, but I do not know about the rest. DrRisk13 15:20, 22 October 2007 (UTC)
Net Applications and W3 Counter both say they're global, XiTi says it monitors "francophone websites", W3 Schools stats are just for its own site. --Harumphy 14:15, 29 October 2007 (UTC)

Usage share of server operating systems

Shouldn't an article like this also exist for server operating systems, to complement the 'desktop operating systems' comparison? Altonbr (talk) 23:01, 7 January 2008 (UTC)

The problem is that apart from web servers, most servers aren't connected to the internet; so how do you collect reliable data? You can't go by licenses sold, since FOSS software typically isn't sold by license. The only reason this article exists is that web browser user agent strings give us a way of estimating desktop share; no such equivalent exists for (non-web) servers. -- simxp (talk) 00:26, 22 January 2008 (UTC)

The OS Share Data

The article divides Windows into its major revisions, but groups both Mac OS and OS X into one heading: OSX. This is inaccurate. Either the column should reflect only actual OS X data, or it should be labelled something other than "OSX", such as "Apple" or some such.

Wageslave (talk) 20:59, 11 April 2008 (UTC)

  • The sources are themselves a bit vague about which versions of Mac OS they're talking about. I suggest that in the individual tables we just cite what the sources themselves say, and just conflate everything into "Mac OS" in the summary table. --Harumphy (talk) 07:56, 13 April 2008 (UTC)
That may work better.
Wageslave (talk) 14:23, 13 April 2008 (UTC)

Some sources, e.g. Net Applications, separate Android from Linux while others don't. Technically it's still Linux, and while it's primarily used on smartphones at the moment, it will be used in netbooks too. Should Android and Linux be summarized together in the main table but separated in individual tables where possible? —Preceding unsigned comment added by 91.9.216.22 (talk) 02:37, 6 June 2009 (UTC)

Mean

Someone calculated the median and I replaced it with a mean to give a much better average, and someone removed (or never added) W3 Schools from the summary table for some reason, so I added that in. My question is: does anyone know if there's a way to have spreadsheet-like calculations in tables? That would be neat, so that each time the figures change, you don't need to recalculate the mean value. Just a thought. :) Yfrwlf (talk) 17:24, 23 July 2008 (UTC)

Had it occurred to you that the table was that way for a reason?
W3 Schools only collects stats for its own site, and is thus not anything resembling a valid sample. That's why it was removed from the summary table.
A mean figure is subject to undue influence from 'wild' figures that aren't supported by other sources. The whole point of a median is that it has a bias toward figures that are in the centre of the range and discounts idiosyncratic highs and lows.
Please don't change the table again without consensus here first.--Harumphy (talk) 18:51, 22 July 2008 (UTC)
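A quick editorial illustration of the outlier point, as a minimal Python sketch (the first three Linux figures appear in the tables below; the fourth is a made-up 'wild' value):

    # One 'wild' figure drags the mean but barely moves the median.
    from statistics import mean, median

    linux_share = [0.42, 0.91, 1.16, 6.0]  # 6.0 is a hypothetical outlier
    print(mean(linux_share))    # 2.1225 -- pulled up by the outlier
    print(median(linux_share))  # 1.035  -- mean of the two middle values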
Obviously, I couldn't see the reason for it. A summary table is supposed to be just that: a summary of the details which follow. If W3 Schools isn't considered to be a valid source for this Wikipedia entry, then why is it included in the sections below? Clearly there seems to be some disagreement as to what qualifies as a valid source here. If the qualification is that a site has to keep tabs on multiple websites, then W3 Schools should be removed, or it should be clarified that the summary only includes the sites which meet this criterion. Understandable that a median is a bit less biased than a mean, though, as it's influenced a little less by extremes; agreed. :) Yfrwlf (talk) 17:24, 23 July 2008 (UTC)
There are relatively few sources of published stats, and this article makes the most of those few. The W3 Schools table is included because, despite the caveat about it recording visits to its own site only, it may be of interest even though it is not statistically significant. Because it is not statistically significant, it isn't included in the summary table. Maybe we should call the summary table something else.--Harumphy (talk) 09:14, 23 July 2008 (UTC)
I still think of it as indecision as to whether or not W3 Schools is a valid source. Any single website could monitor its own web traffic, and most do; they just don't display it publicly. But if this is a place for any and all statistics except for the summary table, so be it. I was under the assumption that W3 Schools was a big site with a random audience, and that's why anyone gave a care about it. The site is for web development, so you could make claims like "more web developers use Linux, so the site is biased", but that's still a bit extreme to say, since it's a general web development learning site and is of interest to anyone; it's just industry-specific. It raises the question of what types of sites the other monitors track, though: are they specific to certain industries or operating systems, or do they take a more random slice of web traffic? Of course, as the number of sites goes up, the probability of bias goes down, even though bias is still entirely possible if they cater to specific audiences. You can nitpick anything to death, so perhaps any such criticism can go under each section if any is found, like the note under W3 Schools that it's only monitoring one site. For now, perhaps we can call the summary table something like "Summary of multi-site traffic monitors" or "Summary of multi-domain traffic monitors"? If no one objects or can think of a better name, I'll change it to the latter, which I assume is the better of the two. Yfrwlf (talk) 17:24, 23 July 2008 (UTC)
It's not indecision. It may be valid to cite W3 Schools for one purpose (e.g. passing interest) but not for another (e.g. something from which the median is derived). Your proposed title is clumsy and it makes an arcane point at the expense of simplicity, clarity and brevity. IMHO. --Harumphy (talk) 21:17, 23 July 2008 (UTC)
Then under "Summary table", it should be stated that it only includes those websites that monitor multiple domains, so that someone else doesn't edit it and make the same changes I did. Yfrwlf (talk) 00:50, 24 July 2008 (UTC)
I've tried a slightly different approach - by explaining the inconsistency under the W3 Schools entry rather than under the summary table. --Harumphy (talk) 08:02, 24 July 2008 (UTC)
Currently the page has your explanation as to why the W3 Schools entry isn't in the summary table, and yet W3 Schools is in the summary table. One should be removed for consistency. Though after reading this, I'm unsure which to do. SiCoe (talk) 21:16, 9 December 2008 (UTC)

Graph Data

Representing the data as graphs would make it much easier to visualize, as opposed to raw numbers. I guess this is one of those things someone has to care enough about to do. —Preceding unsigned comment added by Devon Fyson (talkcontribs) 03:23, 13 September 2008 (UTC)

It would be difficult to do well and it would need keeping up-to-date at least once a month.--Harumphy (talk) 09:03, 13 September 2008 (UTC)

Summary table

Jdm64's edits are not good ones, because (a) W3 Counter and OneStat only publish data for some versions of Windows, and not for Windows as a whole, and (b) the medians are incorrect. (The correct way to calculate the median of four values is to discard the highest and lowest and take the mean of the middle two.) The version of the table before his edits accurately summarised what the sources say and had correct medians. Therefore I intend to revert unless there is a consensus here in favour of Jdm64's edits.--Harumphy (talk) 11:09, 5 October 2008 (UTC)
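As an editorial aside, the stated rule is exactly what the standard median gives for an even-sized sample; a minimal Python sketch, using the four Windows XP figures from the table proposals further down:

    # Median of four values = mean of the two middle ones.
    from statistics import median

    xp = [68.67, 73.04, 71.22, 78.93]  # Net Applications, W3 Counter, XiTi, OneStat
    print(median(xp))  # 72.13 -- matches the summary table's median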

Please do so, as you point out the current table simply isn't correct. Having a summary of the separate Windows 98/2000/XP/Vista percentages is actually quite useful. Thanks. Hexene (talk) 19:44, 5 October 2008 (UTC)
24 hours have passed (approx), so I've reverted the page to the 'status quo' version in the absence of support for Jdm64's version.--Harumphy (talk) 10:44, 6 October 2008 (UTC)
Sorry for not seeing this reply sooner. (1) I'll agree with how you calculate the average. (2) Someone looking for a summary of the different operating systems' market share would want the summary to include the larger categories, i.e. Windows, Mac, Linux. The detailed statistics can then be seen in the other sections of the page. With your format one would have to calculate the total market share for Windows every time the numbers changed, instead of the more elegant format of the summary actually being a summary. Also, all four sites give stats for Windows Vista, XP, 2000, Me and 98, hence the total can easily be summed (which is what I did). If the broken-out stats for Windows are really needed, then at the very least another column can be added for the total Windows market share (both sides pleased). Also, some of the sites list Windows 2003, NT and 95, which are not in the summary; why are those excluded? Because it's supposed to be a summary. Jdm64 (talk) 01:47, 7 October 2008 (UTC)

I edited the table, but this time giving more stats, not removing any. I added a total windows and also a median. This new table should now give a better summary with all information that was requested. Jdm64 (talk) 02:29, 7 October 2008 (UTC)

Also added an "other" field by 100% - sum(windows, mac, linux). I feel this is necessary if we are going to exclude windows nt, 95, me, 2003, mac intel and mac ppc from the summary but have them in the detailed stats for the different sources. Jdm64 (talk) 02:48, 7 October 2008 (UTC)
This is getting ridiculous. First you 'improve' the table by removing half the data. Now you 'improve' it by putting in twice as much. In both cases you've added data about the 'total Windows' and 'Other' share which are not logically deducible from the cited sources. (W3 Counter only lists the top ten OSs, OneStat even fewer. In neither case is it possible to deduce how to allocate the unlisted share between Windows and Other.) Please stop messing with the table and please don't make any changes until it is agreed here first.--Harumphy (talk) 13:27, 7 October 2008 (UTC)
Sorry, but you said "don't revert" and I didn't; I only added info (that wasn't clear). My concerns are as follows:
  1. All the sites list at least Windows Vista/XP/2000/Me/98, Mac and Linux, but the table doesn't show Windows Me. So why not include it? On what grounds do we select which OS is listed? Because Windows Me is on all the sites.
  2. The percentages don't add up to 100%, so why not add an other/error column to show some form of accuracy rating, or even the total market share of the others, including Windows 2003/95/NT/CE, *BSD and Unix (which are on at least some of the sites)? This could be calculated as just 100% − (all other fields). There shouldn't be worry about error, because all the sites list all the top OSs, the largest OS not listed was <0.20%, and the stats on the sites are already known to be inaccurate because of how they're calculated (browsers). This would improve the summary because one could easily see how accurate or precise a site is and where the bias lies in the pool the stats come from.
  3. From the view of a reader (not the view of an editor), one would want the summary to include summed-up totals for Windows/Mac/Linux. Windows 98 is in the table but has little relevance because of its small market share, its being an outdated version of Windows, and the fact that a summary shouldn't be limited to specific OS versions. The reader could then scroll down and look at the detailed tables for the different sites.
  4. All versions of Windows are still Windows. Dividing it up and not showing a total breaks the flow of the summary. Why not break up Mac into PPC/Intel? Or Linux into Ubuntu/Fedora/SUSE? The table shows a total for Linux, but different versions of Linux can be as different as the versions of Windows. So for consistency Windows should have a total just like Mac/Linux. There should be little uncertainty in the total for Windows because all the sites list the top versions of Windows; the largest non-listed version is ~<0.20%, and guessing about "other" would stand on similar reasoning.
  5. Mean is used to show an "average", but given how the stats are created in the first place (browsers), the smaller the market share the greater the error in the data. Maybe median wouldn't work either, and a more statistically sound formula would give more accuracy (i.e. using standard deviation and such). This is also important because some of the sites are updated less frequently, and the "age" of the data would have an effect when calculating the "average", especially for small market shares.
  6. The order of the OSs should go from largest to smallest, so the rankings of the different OSs can be comprehended at a glance. Starting with Win98 is illogical because its relevance is less than that of others like XP or Vista. Reading from left to right with high-to-low priority is more logical and consistent with how lists and rankings are created. Another ordering could be by date of release. The current format is inconsistent.
  7. Maybe two tables would better convey the summary: one for the top 3 OSs readers would be interested in, then another including versions of Windows along with Mac and Linux, the first table compiled from the second.
I'm not trying to vandalize the page, but I reason the current version is inefficient/inconsistent. It seems like your format is the only "correct" way. I'm not saying my way is better, but the current format should be improved. Jdm64 (talk) 01:10, 8 October 2008 (UTC)

Jdm64 - many thanks for your detailed post here, which deserves an equally detailed response. Taking each of your points in order:

  1. The lack of Me is a historical accident. At one time we included W3 Schools in the summary table, and they don't have stats for Me. We dropped W3 Schools because it samples only one site - its own - and it seemed wrong to treat this as being on a par with the sources that sample many sites. I agree that you have identified an anomaly which needs to be sorted out.
  2. It is obvious from the weak correlation between the figures from the different sources that there must be a great deal of error in at least some of them. The table makes this quite clear - there's no need to labour the point. I wouldn't object strongly to an 'other' column (100% minus the other columns) but I think it's unnecessary.
  3. You speak 'from the point of view of a reader' but in reality are merely expressing an opinion. I agree that 98 is outdated and small in share. The same is true of Me. We need to consider how to treat 98 and Me consistently.
  4. We split up Windows by the major versions, but not MacOS/Linux, because (a) the source data is available like that, (b) Windows has around 90% share and has two versions which each have a bigger share than MacOS and Linux put together, and (c) people want it. We haven't got an 'All Windows' column because that data is only available from two of the four sources cited.
  5. Median has an advantage over mean in that it prevents a single high or low source figure skewing the derived figure. It's not worth doing standard deviations because (a) it would mean little to most readers and (b) the raw data isn't up to it.
  6. The sort order of the data is first from highest to lowest (Windows/Mac/Linux) and then from oldest to newest (Windows versions). If we don't do this then the Mac and Linux columns will end up interspersed among the various Windows columns, which will look awful.
  7. I disagree. I think the 'inconsistency' between multiple columns for Windows and single columns for MacOS/Linux is reasonable in view of Windows' 90% share and the fact that the sources we're citing tend to do it that way. Using two tables would solve a problem that doesn't really exist.--Harumphy (talk) 11:32, 8 October 2008 (UTC)

So, I currently see the following alterations that you might approve:

  1. The inclusion/exclusion of Windows 98/Me needs to be sorted out. (I would lean towards excluding both, to make the table more concise and because both are outdated and have a low market share.)
  2. An "other" column could be included.
  3. The sort order could be as follows: (Windows XP | Windows Vista | Windows 2000 | Windows 98 | Mac OS | Linux), which would not break the "Windows group" and would flow from high to low, new to old, maintaining a consistent high-to-low priority.

Please give a yes/no reply for each alteration you approve of me making (#1 might need further consideration). Jdm64 (talk) 22:00, 8 October 2008 (UTC)

It isn't for me to "approve" anything. Please see WP:OWN. WP works by consensus. It's a pity that there's only two of us in this discussion. Is anyone else listening? In response to your three numbered points:
  1. The preamble to the article says that OSs are included if their share ever exceeded 0.1%. On this yardstick both 98 and Me should be included. If we want to exclude them then logically we ought to reconsider that yardstick first. (It currently applies to all the tables including the summary.)
  2. An 'other' column is in my view unnecessary clutter, so I would prefer that it is not included.
  3. The beauty of the present column order is that Windows versions are grouped on the left, and the 'current' OSs (XP, Vista, MacOS, Linux) are grouped together on the right. With your proposed sort order the obsolete versions of Windows would be in the middle, which would detract from the overall utility of the table IMHO.--Harumphy (talk) 08:22, 9 October 2008 (UTC)
I know that nobody owns/controls an article, but since you've been the only one to object to my edits and you have a significant portion of the edits, it seems like anything you object to would be reverted by you. And yeah, I wish there were other people in this discussion to make it more balanced. But to reply:
  1. If 0.01% is the marker then there are a few inconsistencies. Net Applications might soon need a SunOS column as it's almost over 0.01% (not an inconsistency, but a warning of potential future clutter). XiTi definitely needs Windows 2003, and maybe "Other Windows versions" as it's both listed and well over 0.01%, and maybe soon Windows 95. My point is, if 0.01% is the lower bound then we might soon be adding way too many columns to the page, making it cluttered (e.g. given an increase in market share for *BSDs/Solaris). But even if that doesn't happen, is the summary supposed to list all OSs > 0.01%? Or is that to be relegated to the detailed tables? In which case Me should be added, and maybe 95/2003, with a --- or N/A in the locations where a site doesn't list that OS.
  2. As you stated, the inclusion/exclusion of "other" is an opinion. So I suggest that it be added, and then afterwards, if there are any objections, we can discuss the issue further at that time. Because as it stands it's my opinion against yours, as nobody else seems to care/notice. The change might help conjure up more opinionated people to further the discussion.
  3. I guess I mostly agree with you. But still, the first column being 98 seems backwards to me. We could have your sort order, but in reverse: (Linux | Mac OS | Windows Vista | Windows XP | Windows 2000 | Windows 98 | Windows ME | Other), which would put the most "interesting" columns first and not break the flow, since it's just the current order reversed; it might even improve it by moving 98/Me more out of sight and bringing focus to Linux/Mac/Vista. My reasoning is that one reads from left to right.
So, since nobody "owns" a page, I'll (1) include Windows Me, (2) add an 'other' column, (3) reverse the current sort, if there are no objections in 24 hrs. Jdm64 (talk) 02:59, 10 October 2008 (UTC)
OK, I will chime in with my 2 cents into this discussion:
  1. I find it valuable to have a column for the combined market share of all versions of Windows, if only to be able to track Windows market share over time. With market share moving from, say, Windows XP to Windows Vista, it is difficult to tell today, looking at the tables, whether Windows is gaining or losing overall market share.
  2. I don't think an 'Others' column is valuable at all - it doesn't tell much. If there is a new up-and-coming OS, it will get its own column.
  3. I prefer not to include obscure OSs such as 98 or Me. The market share of web browsers page solved this by having separate tables for separate time periods, when some browsers were more/less prominent.
  4. No opinion on the other issues. (I didn't fully understand the median vs. mean discussion, but I don't care either way - I don't think either of them is important.)
Wikiolap (talk) 05:26, 10 October 2008 (UTC)

In response to Jdm64's points:

  1. The threshold is currently 0.1% not 0.01%.
  2. Since you posted, Wikiolap has agreed with my view that an 'other' column is not useful.
  3. If we get rid of 98 and Me from the summary table, that will solve the problem.

Reasoning-wise, one reads from left to right, and time flows from past to future. It seems to me logical to correlate the two by having the oldest stuff on the left. Whether we reverse the order or not, we need to ensure that the column order in all the tables is consistent.

In response to Wikiolap's first point:

  1. We only have two sources of 'All Windows' data: Net Applications and XiTi. If we introduce an All Windows column, I suggest there should be data from just these two sources with the other two rows containing empty cells. With just two sources the median would equal the mean.

Putting all this together, I suggest we:

  1. Drop 98 from the summary table.
  2. Add an All Windows column to the table, between the existing Windows columns and MacOS/Linux, using data from Net Applications and XiTi only.--Harumphy (talk) 08:49, 10 October 2008 (UTC)

My mistake about the percent, but that still means XiTi needs both Windows 2003 and "Other Windows versions" for consistency. And it seems my original request for an "all Windows" column is gaining support. Just a side note: if we're going from left to right, old to new, that would put the order as follows by initial release: Linux (1991), Windows 2000 (2000), Mac OS X (2001), Windows XP (2001), Windows Vista (2007). I'm OK with your suggested ordering, but it would not really correlate to time ordering. I'm also fine with ordering by time, which, keeping Windows together, would be: (Linux, Mac OS X, Total Windows, 2000, XP, Vista). Other than that I'm fine with dropping 98 and adding total Windows. Jdm64 (talk) 09:22, 10 October 2008 (UTC)

I think we're in agreement now. By 'oldest to newest' I was thinking 'time since last supported' rather than 'time since first released', but it doesn't matter. This discussion has forced us all to think this through carefully, and we'll have a much better article as a result of it. So much better than edit warring.--Harumphy (talk) 09:45, 10 October 2008 (UTC)
My preference for the ordering would be not 'from oldest to newest', but 'from Windows to everything else', which is essentially order by popularity/market share - I think readers are more interested in seeing first the most widely used OS, and only then more obscure ones. Wikiolap (talk) 17:32, 10 October 2008 (UTC)

So from the current discussion I see the following possible versions. Note: (%) indicates that the figure is not stated explicitly by the source but calculated implicitly, or however else we want to note that fact. I would vote for #1, unless someone comes up with another version: Jdm64 (talk) 20:11, 10 October 2008 (UTC)

1

Source | Date | Windows XP | Windows Vista | Windows 2000 | Total Windows | Mac OS | Linux | Sources
Net Applications | September 2008 | 68.67% | 18.33% | 1.89% | (90.23%) | 8.23% | 0.91% | [1]
W3 Counter | September 2008 | 73.04% | 12.30% | 2.24% | (89.10%) | 5.62% | 1.99% | [2]
XiTi Monitor | August 2008 | 71.22% | 18.99% | 1.56% | 93.61% | 4.10% | 1.16% | [3]
OneStat | April 2008 | 78.93% | 13.24% | 2.82% | 95.94% | 3.36% | 0.42% | [4]
Median | September 2008 | 72.13% | 15.79% | 2.07% | 91.92% | 4.86% | 1.04% | ---

2

Source | Date | Total Windows | Windows XP | Windows Vista | Windows 2000 | Mac OS | Linux | Sources
Net Applications | September 2008 | (90.23%) | 68.67% | 18.33% | 1.89% | 8.23% | 0.91% | [5]
W3 Counter | September 2008 | (89.10%) | 73.04% | 12.30% | 2.24% | 5.62% | 1.99% | [6]
XiTi Monitor | August 2008 | 93.61% | 71.22% | 18.99% | 1.56% | 4.10% | 1.16% | [7]
OneStat | April 2008 | 95.94% | 78.93% | 13.24% | 2.82% | 3.36% | 0.42% | [8]
Median | September 2008 | 91.92% | 72.13% | 15.79% | 2.07% | 4.86% | 1.04% | ---

I prefer 1, modified as follows to improve the headings and the Total Windows figures to produce 3 below:--Harumphy (talk) 07:34, 11 October 2008 (UTC)

3

Source | Date | Windows XP | Windows Vista | Windows 2000 | Windows (all versions) | Mac OS | Linux | Sources
Net Applications | September 2008 | 68.67% | 18.33% | 1.89% | 90.29% | 8.23% | 0.91% | [9][10]
W3 Counter | September 2008 | 73.04% | 12.30% | 2.24% | --- | 5.62% | 1.99% | [11]
XiTi Monitor | August 2008 | 71.22% | 18.99% | 1.56% | 93.61% | 4.10% | 1.16% | [12]
OneStat | April 2008 | 78.93% | 13.24% | 2.82% | 95.94% | 3.36% | 0.42% | [13]
Median | September 2008 | 72.13% | 15.79% | 2.07% | 93.61% | 4.86% | 1.04% | ---
I like #3. Wikiolap (talk) 16:03, 11 October 2008 (UTC)
Agreed #3; I updated the page. Jdm64 (talk) 17:51, 11 October 2008 (UTC)

There is another statistics source -- http://www.statowl.com/operating_system_market_share.php -- which we may consider adding to the list of stats sources.

Worldwide view? English-biased for sure.

I was following the supplied links and it hit me hard. Of course computer use is not universal in every part of the world, but this is too much. [14] US: 27.45%, other countries aren't representative either. [15] Needs no comment. [16] Would somebody care to find some more surveys? bkil (talk) 18:55, 19 October 2008 (UTC)

Furthermore, XiTi has an obvious French bias. OneStat claims to debias to a certain extent based on countries, but details are not given. Their research, however, only includes people "that are using one of OneStat.com's services". Do we know who they are, or what language they need to speak? Could anybody give more information on this topic? bkil (talk) 19:09, 19 October 2008 (UTC)
As far as I know this article already uses the best freely available sources. If you know of better ones please let us know.--Harumphy (talk) 21:37, 19 October 2008 (UTC)
Just because there may be no better sources doesn't mean that it is OK to present the available sources as accurate. I would vote for putting a large disclaimer around the Net Applications data. The ISP data that Bkil provided says it all IMO. —Preceding unsigned comment added by 85.225.219.164 (talk) 10:57, 14 December 2008 (UTC)

Thank you for your feedback. It's good faith that drives me too, as I would like to improve the encyclopedic quality of this article. Harumphy, I'm not sure if I'm enough for such a difficult task. It would be best if someone would publish real scientific research for us to link. Maybe we could start with conducting statistics based on representative samples from the major countries (or as many as possible). Are there any interested sociologists, mathematicians or computer scientists in the house...? :) bkil (talk) 16:52, 27 December 2008 (UTC)

Automated methods can rarely provide a representative sample and are not generally reliable anyway, because they are easily deceived by bots, simple user settings and other effects. And another thing that has not been mentioned before: if on the one hand user "A" visits 1000 different sites a day, while on the other hand user "B" visits only 100 sites a day, then user "A" could get a tenfold increase in representation when analysed with the conventional automated methods, because the IP address logs may not be synchronized. A last one that has gone unnoticed: what if some people simply use the Internet less than others? Any online pageload- or presence-based detection scheme will fail to handle these cases. These kinds of biases must also be explicitly coped with by doing further statistics to uncover (part of) the hidden relationships. I have listed only a few grave biases, so I would still strongly recommend doing real statistics like they do in most any field of interest. Check out: Audience measurement, Reach (advertising), People meter. bkil (talk) 16:52, 27 December 2008 (UTC)
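To make the over-representation point concrete, here is a minimal editorial sketch with made-up numbers (not from the discussion):

    # Two users on different OSes: A visits 1000 pages a day, B visits 100.
    # Pageload-based counting weights A ten times as heavily as B.
    pageloads = {"A (OS X)": 1000, "B (Linux)": 100}
    total = sum(pageloads.values())
    for user, hits in pageloads.items():
        # A: 90.9%, B: 9.1% -- although by headcount each is 50%
        print(user, f"{hits / total:.1%}")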

Methodology in general: what is a fair usage share definition?

We would need an article that considers real-world personal operating system usage. Why exclude mobile phones, PDAs, netbooks and the like? Here's another question for you to chew on: what is the purpose of this article - what do we chart? Number of licenses sold, installations made, total hours run, or hours interactively used daily? Anyway, I think it would be okay to ask the following in a questionnaire:

Please fill in the table below. (in each row of the table: OS information / average daily usage in hours).

bkil (talk) 17:50, 27 December 2008 (UTC)

How would you count the following examples?

  1. Two OSes as a multi-boot option.
  2. Having loaded two at the same time: one OS being virtualized in another or paravirtualization.
  3. Having loaded one in a virtual machine but letting it sit idle with a background all day.
  4. Having loaded one in a virtual machine but only using that to play some music in the background all day.
  5. Running two at the same time, but one OS is using 20% of the CPU, and the other is using the remainder (e.g.: tiling the screen so that one is playing a lower resolution video and the other is playing a higher resolution video). If percentage does not count, I can be running dozens of them at the same time :)
  6. Allocating one CPU core of the machine to one OS and the other three to another.
  7. ...

What is usage anyway? Is being loaded into (virtualized) memory considered usage, or is interacting with it considered a requirement? (Note that that would exclude music and video playing boxes and the like.) bkil (talk) 17:50, 27 December 2008 (UTC)

The problem is in obtaining the statistics. Even if any one of your examples were the official way to count usage, how would such statistics be collected? Unless every computer user submitted statistics on the OS usage of their computers, there's no 100% accurate way to determine the correct percentages. --- Jdm64 (talk) 22:09, 27 December 2008 (UTC)
Hey, we do seem to mostly agree, but see the section above. An estimation is by design never intended to be 100% accurate, but you can get away with pretty good results from even small samples if using mathematically correct statistical methods. It's not easy, but some people do make a living out of it: Opinion poll, Audience measurement, Reach (advertising), People meter. bkil (talk) 00:43, 5 January 2009 (UTC)
Currently, there are two ways to count OSes.
  1. Create a site where users voluntarily submit what OS they run, and extrapolate the percentages to the entire general public. The problem with this is that the sample will not be random or uniform, and hence of little accuracy.
  2. Use the aggregated server logs of as many servers as possible and, again, extrapolate the percentages from the logs to the entire general public. The problems with this method are: (1) some computers are not on the Internet; (2) some computers will never access the sites used in generating the statistics; (3) many computers are behind a NAT and could be indistinguishable from one another; (4) multi-booting and VM OSes; (5) the same computer accessing a site at a later time and being counted as another install. --- Jdm64 (talk) 22:09, 27 December 2008 (UTC)
You could refine method #1 by including a CAPTCHA and asking respondents to fill in some details anonymously that would help in taking a random sample from their votes. The problem with this method is that people don't always tell the truth, especially when taking tests behind screens anonymously (cf. pseudonymity). I don't have numbers on this, so it may still be more reliable than #2. A third choice would be applying paper-based statistics (opinion polls). It's not as fancy as the ones you list, but it has been (ab-)used for many decades now. :) bkil (talk) 00:43, 5 January 2009 (UTC)
The main problem with any form of voluntary site poll is the type of people who will visit a site like that. The vast majority of people who will visit such a site are the people most interested in technology; your average computer user wouldn't really care to visit it. Hence, the OS with the largest (or at least disproportionately large) percentage would be Linux. Out of all the OSes, Linux has the most outspoken user base, even though the current stats show its market share at about 1.5%. To gather unbiased data, collection has to be involuntary and widespread across sites of all fields of interest. Jdm64 (talk) 21:43, 5 January 2009 (UTC)
And don't forget the "read-only" mob out there! You're right, but it looks like I didn't make myself clear. I propose a _paper-based_ opinion poll done in-person! By the way, you've got some nice recommended technology over there, I'm a Debian user myself! :) bkil (talk) 00:36, 14 January 2009 (UTC)
I still think my point stands for a paper poll (I was trying to generalise). But also, the sample size and frequency of the web traffic stats would still give more accurate results. The sites that gather the stats do so continuously and with a sample of anyone who hits the sites. What I'm trying to say is that, currently, the most accurate way of gathering such usage patterns is via the net, a very widely used medium. Other forms of polls couldn't reach the sample size and frequency of polling. If accuracy is really important (and I think this is as good as we're currently going to get), increase the number of sites that gather the stats and also use better algorithms for extracting the data from the logs (i.e. eliminating repeat hits from the same user, etc.). Jdm64 (talk) 01:43, 14 January 2009 (UTC)
The second method is the most accurate, but still faulty. As far as Wikipedia is concerned, its main goal is to gather information from other sources in encyclopedia form. Hence, this page serves as a conglomeration of the statistics gathered by the few sites that track usage, putting them in a concise and easy-to-read form so readers can draw their own conclusions. --- Jdm64 (talk) 22:09, 27 December 2008 (UTC)
I must agree with the role of Wikipedia in this question. bkil (talk) 00:43, 5 January 2009 (UTC)
To answer your original question: I think the official way to count OSes would be just the number of installations, regardless of their use or installation method. Multi-booting would count one for each OS installed, as would the number of virtualized OSes. For example, in my house there are 9 computers with 5 Windows and 13 Linux installs. As long as the installed OS works (can be booted) it should be counted. --- Jdm64 (talk) 22:09, 27 December 2008 (UTC)
Thanks, that sounds like a fair and straight-forward definition. But what about other ubiquitous devices like smartphones and netbooks...? bkil (talk) 00:43, 5 January 2009 (UTC)
Netbooks could be counted, no problem, because they usually run OSes like Linux and Windows. Smartphones would be different, as the OS they run is usually specialised. The issue is twofold. (1) The user agent string is really the only way to know what browser/OS a device is running, from the server's perspective, without asking the user. Because of this, a netbook might return Linux/Firefox, which would be indistinguishable from a desktop computer; so netbooks should be counted. Smartphones usually show something like iPhone/WebKit or <something>/Opera Mobile, and are hence recognisable as smartphones. (2) Currently this page is titled "Usage share of desktop operating systems", which by my definition would include desktops, laptops, netbooks, and any device that would be indistinguishable from one of these by its user agent string. Smartphones can usually be identified because they're not running Linux, MacOS, Windows or another OS usually categorised as a desktop OS (e.g. *BSD, Solaris). Jdm64 (talk) 21:43, 5 January 2009 (UTC)
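As an editorial aside, here is a minimal Python sketch of the kind of user-agent bucketing described above (the marker substrings are illustrative assumptions, not a tested rule set):

    # Bucket a user-agent string as smartphone or desktop-class.
    def classify(user_agent: str) -> str:
        # Check mobile markers first: iPhone UAs also mention "Mac OS X".
        mobile_markers = ("iPhone", "Opera Mobi", "Symbian", "BlackBerry")
        if any(m in user_agent for m in mobile_markers):
            return "smartphone"
        desktop_markers = ("Windows", "Macintosh", "Linux", "FreeBSD", "SunOS")
        if any(m in user_agent for m in desktop_markers):
            return "desktop-class"  # includes netbooks, which look identical
        return "unknown"

    print(classify("Mozilla/5.0 (X11; Linux i686) Gecko/2009 Firefox/3.5"))           # desktop-class
    print(classify("Mozilla/5.0 (iPhone; CPU iPhone OS like Mac OS X) AppleWebKit"))  # smartphone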
After all that said, aren't we only considering desktop usage because someone found some pageload data on the 'net about only that? In my opinion, it would make more sense to look at 'the big picture' (TM). Of course we could make some more charts. bkil (talk) 00:36, 14 January 2009 (UTC)

Microsoft Says Linux Bigger Competitor than Apple

Microsoft does not agree with the numbers presented on this Wikipedia page.

http://www.osnews.com/story/21035/Ballmer_Linux_Bigger_Competitor_than_Apple

Might be worth a mention, dunno. :) —Preceding unsigned comment added by 207.114.255.2 (talk) 05:42, 31 March 2009 (UTC)

Pie chart does not have "other"

The numbers listed for the pie chart currently add up to 95.98%.--NapoliRoma (talk) 20:16, 19 June 2009 (UTC)

Yes, that "doesn't add up to 100%" bugs me also. But it is showing exactly what it claims to show -- the "median" numbers from the first table in this article. Can we add a "other OS" column to that first table in this article? Like so:

Source | Date | Windows Vista | Windows XP | Windows 2000 | Windows (all versions) | Mac OS | Linux | Other OS | Sources
Net Applications | July 2009 | 17.90% | 72.93% | 0.97% | 93.04% | 4.86% | 1.05% | 2.62% | [17][18]
W3 Counter | July 2009 | 21.95% | 61.52% | 0.86% | 86.56% | 6.92% | 2.02% | N/A | [19]
AT Internet Institute | February 2009 | 28.90% | 62.18% | 1.20% | 93.82% | 4.59% | 1.24% | 0.35% | [20]
OneStat | December 2008 | 21.16% | 72.02% | 0.54% | 93.72% | 3.66% | 0.47% | N/A | [21]
StatCounter | July 2009 | 22.52% | 70.03% | 0.63% | 93.18% | 4.21% | 0.68% | 2.0% | [22]
Median | July 2009 | 21.95% | 70.03% | 0.86% | 93.18% | 4.59% | 1.05% | 2.0% | ---

This table is a copy-and-paste from the table in the article, with an added "other" column. The numbers in the "other" column are copied directly from the reference given at the end of each row (except for the "median" row).

Since this "other" column gives data that does, in fact, "appear in the sources cited" and "achieve a minimum share of at least 0.1%", can we go ahead and add it already? --68.0.124.33 (talk) 04:18, 11 August 2009 (UTC)

Ok, I changed the table using the formula: Other = 100 - (windows + mac + linux). Jdm64 (talk) 05:05, 11 August 2009 (UTC)
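For what it's worth, a minimal editorial sketch of that formula, using the Net Applications row from the table above as input:

    # Other = 100 - (windows + mac + linux), per the change described above.
    windows, mac, linux = 93.04, 4.86, 1.05  # Net Applications, July 2009
    other = round(100 - (windows + mac + linux), 2)
    print(other)  # 1.05

Note that the computed residual (1.05%) differs from the 2.62% the source itself reports as "other": a residual guarantees each row sums to 100%, but lumps any unallocated share in with genuinely "other" OSes.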

My compliments on the page

This is a great page. I like the usage of median and the fact that using net stats keeps it simple. It's a useful resource to refer back to. Cheers guys, --Dilaudid (talk) 08:13, 22 July 2009 (UTC)

Change in NetApps Methodology and Removal of Old Numbers. . .

With the announcement of Net Applications' change in methodology, and its reevaluation of its old numbers, I revised the Net Applications charts with the newly reported numbers. However, Net Applications does not provide the numbers from before Q4 2007 for free, so I removed those numbers from this article. — SterlingNorth (talk) 14:02, 5 August 2009 (UTC)

AT Internet Institute new data

There's new data from ATII, but it seems to be for a couple of European countries only. Not sure how you guys would like to incorporate this. http://www.atinternet-institute.com/en-us/internet-users-equipment/operating-systems-august-2009/index-1-2-7-176.html —Preceding unsigned comment added by Rasmasyean (talkcontribs) 07:05, 5 October 2009 (UTC)

OneStat Data December 2008

seems too outdated to be useful? --95.117.216.233 (talk) 12:29, 30 October 2009 (UTC)

Windows 7

Some of these sources are reporting Windows 7 data. Is it time to get that added in the chart? —Preceding unsigned comment added by 173.16.124.126 (talk) 19:44, 9 November 2009 (UTC)

Added - thanks ! Wikiolap (talk) 00:22, 23 November 2009 (UTC)

Added a mean

I added a mean. Windows 7 has reportedly more initial sales than Vista did, and by a large factor too, so someone ought to dig up the statistics on that. Hope that helps. Unflavoured (talk) 05:15, 19 November 2009 (UTC)

  • I've removed the mean. I guess you either didn't see or didn't understand the comment saying, "The format of this table has been carefully considered and discussed by editors over a period of time. Please do not change it until a consensus to do so has been reached on the talk page first". --Harumphy (talk) 14:08, 19 November 2009 (UTC)
Right, I did not see it. Any reason why there is a median but no mean? Unflavoured (talk) 06:55, 20 November 2009 (UTC)
With such a small number of sources the median better eliminates outliers, but really the two values are fairly close. That's at least one reason. Jdm64 (talk) 09:55, 20 November 2009 (UTC)
The mean just adds clutter and isn't useful enough to earn its keep.--Harumphy (talk) 00:28, 21 November 2009 (UTC)

Complete rewrite

The more I look at this article, the more unsatisfactory it seems. It isn't really an article at all. It's just a collection of tables of stats cribbed from various web sites. It isn't really about desktop operating systems, but rather web client operating systems. And nowadays these include mobile devices, games consoles etc., as well as traditional desktop and laptop computers. And there's no article about operating system share in other fields - e.g. supercomputers, servers, mobile phones, embedded devices and so on.

I suggest a complete rewrite and a rename: Usage share of operating systems. We could then have sections on the Top 500 supercomputers, servers, desktop/laptops, web clients, mobile devices, embedded etc., and bring in a number of sources other than just the web client stats as at present. --Harumphy (talk) 14:18, 19 November 2009 (UTC)

Hmm, interesting idea. I know for me, I'm only interested in the summary table and pie chart. The other detailed charts are basically useless because I already have the information I want (the most recent data) in the summary table. If I wanted more info then I could go to the actual source. So, I would see this new page "Usage share of operating systems" as containing mainly summary tables of supercomputer, server, desktop and mobile usage. But, also description of accuracy and methodology of data gathering. And maybe some easy to conceptualize graphs. Thoughts..... Jdm64 (talk) 19:01, 19 November 2009 (UTC)
I wholeheartedly agree with the rewrite.
The web browsing stats are not representative of desktop use (as we see from the MS chart and other sources) and should not be reported as such. None of those sources gives a good (that is, reliable) estimate even of web surfing, so having that as the basis for the article seems very bad.
For the fields where the sources are as bad as the web stats, it is necessary to spend some space on how our summaries are calculated and on the reliability of the sources; but as Jdm64 writes, that is hardly interesting for most readers, so it should perhaps be separated somehow.
--LPfi (talk) 10:15, 20 November 2009 (UTC)
If you are updating this, I would like to see a chart or a graph showing how usage shares have developed over time. --Zache (talk) 12:05, 20 November 2009 (UTC)

There don't seem to be any objections to the rewrite, so I'll make a start by moving the page and pruning all the tables except the summary. For sections, I suggest: supercomputers, mainframes, servers, desktops and laptops, netbooks, web clients, mobile devices, games consoles, embedded devices.--Harumphy (talk) 10:24, 21 November 2009 (UTC)

Share of desktop systems

I reinserted the quote from Aaron Seigo and tried to estimate the share from Ballmer's diagram. I think it is important to show how wildly the estimates vary. Showing only the 1% figure, or similar estimates based on the same kind of counting, gives the impression that the estimate is somehow reliable, no matter how much is said about possible problems (I added some of them explicitly). --LPfi (talk) 15:00, 23 November 2009 (UTC)

Neither Aaron Seigo's nor Ballmer's estimate looks like a good source for an encyclopedic article. Ballmer's quote is backed by some internal Microsoft research which hasn't been made public. Aaron Seigo's is much worse - if you read the source, he actually did not even want to give any number, because he did not have data; only when pressed hard did he make up a number. So both these sources (especially Seigo) are not suitable for the encyclopedia and should be removed. Sources from analysts and research organizations are more credible. Wikiolap (talk) 17:52, 23 November 2009 (UTC)
Aaron Seigo was clearly reluctant to put a figure on it, and there's no evidence that it was anything more than an inspired guess. It should go. The provenance of Ballmer's chart is equally unverifiable, but I think it was sufficiently interesting and unexpected that we should include it. The chart shows Linux and Apple about equal, but doesn't put a figure on either, and we should not assume that if it had it would be the same as our own 5% median.--Harumphy (talk) 19:05, 23 November 2009 (UTC)

Order of sections

I think the order of systems by Harumphy was better than this. Although I understand that desktop systems are interesting for many readers, the order of other systems becomes very strange. I think ordering the sections from supercomputers down to embedded devices is logical and easy to grasp.

If we base the order on what is "most widespread", then sections will wander up and down depending on anybody's thoughts on the matter (aren't cell phones more widespread than desktop computers?), and on the classification: any time a category is split the parts will move down, and similar uses can be found far away from each other.

--LPfi (talk) 15:09, 23 November 2009 (UTC)

It is fine to further refine the ordering of sections, but there is no good reason to start the article from relatively obscure statistics on supercomputers and mainframes. Wikiolap (talk) 17:48, 23 November 2009 (UTC)
I think the new ordering is an improvement on my version.--Harumphy (talk) 19:11, 23 November 2009 (UTC)

Game consoles

Should we include section about operating systems on games consoles ? The market shares of current generation consoles (Wii/Xbox/PS3) are tracked closely by number of sources (including NPD). Wikiolap (talk) 02:07, 24 November 2009 (UTC)

What OS? Each one runs a specific specialized OS that is only found on that device! Whereas the other categories make sense because the OS isn't made to run on only one device. There's no page describing the OS found on the Xbox 360 because the device and its OS are nearly inseparable. This page is about OSs, not devices. So, NO. Jdm64 (talk) —Preceding undated comment added 06:06, 24 November 2009 (UTC).
I am fine either way, but in the original article rewrite proposal user Harumphy mentioned both game consoles and embedded devices. Just trying to get consensus. Wikiolap (talk) 07:57, 24 November 2009 (UTC)
I threw games consoles into my rewrite proposal without thinking about it much. I know next to nothing about them. If the consoles all come with a dedicated OS, and if therefore it's reasonable to infer the OS share from the console share, then we could just say that and point to a page about games console share if there is one. It might be complicated slightly by (e.g.) people putting Linux on PS3's - however I suspect that the effect of this on OS share would be so small we could overlook it.--Harumphy (talk) 10:34, 24 November 2009 (UTC)

statowl.com

This appears to be another source of stats: [23]. What does anyone make of it? The corporate/residential splits are interesting.--Harumphy (talk) 16:52, 24 November 2009 (UTC)

Looks like interesting and credible site. We should add its stats. Wikiolap (talk) 17:18, 24 November 2009 (UTC)
OK. I suggest that when the Nov stats come out we drop onestat and put statowl in its place. --Harumphy (talk) 19:38, 27 November 2009 (UTC)

NOK. StatOwl's data for the Mac (10.97%) deviates by a factor of 2.15 from the median of the other statistics providers (5.12%). StatOwl's data for Linux (0.40%) deviates by a factor of 2.5 from the median of the other statistics providers (1.00%) and by a factor of 3.88 from Wikimedia's own data (1.55%). On StatOwl's shiny website I can find no reference to the scope of the data nor how they acquired it. Please remove. --95.117.217.56 (talk) 13:47, 18 December 2009 (UTC)
One month later: in addition to the deviations mentioned above, StatOwl's data for "other" (0.12%) deviates by a factor of more than 10 from the median of the other statistics providers (1.24%). While of course a study may show significantly different numbers, it should at least give some reasoning. And while it shows by far the highest numbers for Mac (11.26%), it does not even care to list the iPhone (so iPhone would either be included in "other" (and thus be below 0.12%) or be (wrongly) attributed to Mac). In short, I do not see how StatOwl meets the WP:RS criteria. Please remove StatOwl from the statistics table. --95.117.212.91 (talk) 14:06, 18 January 2010 (UTC)
AFAICS StatOwl looks different because it's looking only at US residential use. I don't think that makes it an unreliable source - just a source that's measuring a different thing. Most of the sources have limited scope and quirks in their methodology. However, these six are the best we've got. The article doesn't endorse them - indeed it draws attention to the inherent flaws with user agent stats. It just summarises what the sources say, draws a line through the middle of the figures (the median) and, apart from a few caveats, leaves it to the reader to decide what if anything to make of them. It will be obvious to any reader that StatOwl differs from the others, and that is sufficient IMHO. --Harumphy (talk) 19:13, 18 January 2010 (UTC)

web clients summary table

Looking at this further, I suggest we make a number of changes as follows:

  • Put statowl in place of onestat;
  • Change AT Internet Institute stats from France only to average of France/Germany/Spain/UK;
  • Add Wikimedia stats [24];
  • Add a column for iPhone, between Mac OS and Linux;
  • Sort rows into alphabetical order;
  • Various minor cosmetic tweaks.
Source | Date | Windows 7 | Windows Vista | Windows XP | Windows 2000 | Windows (all versions) | Apple Mac | Apple iPhone | Linux | Other[1] | Sources
AT Internet Institute | Aug. 2009 | 0.80% | 29.13% | 59.83% | --- | 92.80% | 4.93% | 0.75% | 1.00% | 0.75% | [25]
Net Applications | Oct. 2009 | 2.15% | 18.83% | 70.48% | 0.78% | 92.52% | 5.27% | 0.37% | 0.96% | --- | [26][27]
StatCounter | Oct. 2009 | 2.21% | 23.60% | 67.55% | 0.57% | 93.36% | 4.71% | --- | 0.68% | --- | [28]
StatOwl | Oct. 2009 | --- | --- | --- | --- | 88.95% | 10.52% | --- | 0.43% | 0.10% | [29]
W3 Counter | Oct. 2009 | 2.76% | 22.23% | 59.10% | 0.60% | 85.68% | 7.38% | 0.44% | 2.14% | 4.36% | [30]
Wikimedia | Sep. 2009 | 1.41% | 26.00%[2] | 59.80%[2] | 0.99% | 88.68% | 6.45% | 0.92% | 1.51% | 2.44% | [31]
Median | Oct. 2009 | 2.18% | 23.60% | 59.83% | 0.69% | 90.74% | 5.86% | 0.60% | 0.98% | --- | ---

Having done this, I'm not so sure about statowl. Its figures are hard to believe. It also appears from the Network Traffic/ISP share to be very US-centric.--Harumphy (talk) 12:44, 28 November 2009 (UTC)

My comments:
* Computing an average of France/Germany/Spain/UK for AT Internet Institute would be original research IMO, because their weights are not the same. And it would still not represent the whole of Europe. So either we stick to the published figure without deriving new numbers, or we drop this source completely.
* I am against putting Wikimedia stats into the table. They do 1:1000 sampling.
Everything else is fine, and I think StatOwl deserves its place. And it does have detailed stats too, so after these modifications the table would look like the following:
Source | Date | Windows 7 | Windows Vista | Windows XP | Windows 2000 | Windows (all versions) | Apple Mac | Apple iPhone | Linux | Other[3] | Sources
Net Applications | Oct. 2009 | 2.15% | 18.83% | 70.48% | 0.78% | 92.52% | 5.27% | 0.37% | 0.96% | --- | [32]
StatCounter | Oct. 2009 | 2.21% | 23.60% | 67.55% | 0.57% | 93.36% | 4.71% | --- | 0.68% | --- | [33]
StatOwl | Oct. 2009 | 1.20% | 27.43% | 58.57% | 0.80% | 88.95% | 10.52% | --- | 0.43% | 0.10% | [34]
W3 Counter | Oct. 2009 | 2.76% | 22.23% | 59.10% | 0.60% | 85.68% | 7.38% | 0.44% | 2.14% | 4.36% | [35]
Median | Oct. 2009 | ? | ? | ? | ? | ? | ? | ? | ? | ? | ---
and BTW, I also think it is time to drop Windows 2000 from the table...
Wikiolap (talk)

Ok, I'll summarize what I like and what I don't.

  • Against putting iPhone in the summary because (1) it's listed in the mobile section and shouldn't be listed twice -- at least not in this way; (2) if we add iPhone, what about every other OS that can connect to a website (SymbianOS, BlackBerry, Android, etc.)? The table should include only non-mobile OSs. (3) Not all our sources list iPhone in the same category of statistics, so we shouldn't either.
  • For Removing windows2k, but only once the November stats are out.
  • For Removing OneStat if they don't come out with new information by the December stats. I'd say that as long as the source is less than one year old it's ok.
  • For Adding Wikimedia stats to the table even though they state it's a 1:1000 sample. None of the sources are 100% accurate to the real value -- it's all some form of statistical sampling. I know that NetApp adjusts their stats to take relative country sizes into consideration. And the values from Wikimedia are very close to the other sources', so apparently their sampling is returning values just like our other sources. Either way it wouldn't really affect our median.
  • Either Keeping or removing AT Internet Institute, or averaging their stats. But if we remove it we should replace it with another source.
  • For Adding StatOwl, especially if we're removing all these other sources.
  • Either sorting order I don't care. Jdm64 (talk) 21:19, 28 November 2009 (UTC)

My response to Wikiolap's and Jdm64's comments:

  • The taboo against original research applies where it makes verification impossible for other editors. That is not the case with the proposed 4-country average, because anyone can check our figures against the data in the cited source. I don't understand how picking figures for one country of the four is better. Our method could be explained by a footnote, which I've added to the draft table. Also, the Euro-centricity of AT as a source balances the (alleged) US-centricity (or English-language-centricity) of some of the others.
  • Wikimedia's 1:1000 sampling reduces 4 billion hits to 4 million, which is ample to reduce sampling error to a very small order indeed (see the rough error estimate after this list). It's a huge sample with a good geographical and linguistic spread. ISTM to be a really good source.
  • OK, include StatOwl.
  • Keep OneStat and review it on 9 Dec. (They last published on 8 Dec 2008.) If not updated by then, delete it.
  • This table is now about web clients, regardless of whether they are fixed or mobile. Many OSes, not just iPhone, are mentioned in other sections. The criterion for inclusion should be share. I suggest we include an OS if it is showing 0.5% or more on at least two recent sources. That will keep the number of columns about where it is now. On this measure both W2K and iPhone merit inclusion.
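As a rough editorial check on the sampling-error point above (an illustration, not part of the original discussion), the standard error of a share p estimated from n sampled hits is sqrt(p(1-p)/n):

    import math

    n = 4_000_000  # hits remaining after Wikimedia's 1:1000 sampling
    p = 0.05       # e.g. a ~5% Mac share
    stderr = math.sqrt(p * (1 - p) / n)
    print(f"{stderr * 100:.4f}%")  # ~0.0109% -- tiny next to the spread between sources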

So now the table becomes:

Source Date Microsoft Windows (7 / Vista / XP / 2000 / all versions) Apple (Mac / iPhone) Linux Other[4] Sources
AT Internet Institute Aug. 2009 0.80% 29.13% 59.83% 92.80% 4.93% 0.75% 1.00% 0.75% [36][5]
Net Applications Oct. 2009 2.15% 18.83% 70.48% 0.78% 92.52% 5.27% 0.37% 0.96% [37][38]
OneStat Dec. 2008 21.16% 72.02% 0.54% 93.72% 3.66% 0.47% 2.15% [39]
StatCounter Oct. 2009 2.21% 23.60% 67.55% 0.57% 93.36% 4.71% 0.68% [40]
StatOwl Oct. 2009 1.20% 27.43% 58.57% 0.80% 88.95% 10.52% 0.43% 0.10% [41]
W3 Counter Oct. 2009 2.76% 22.23% 59.10% 0.60% 85.68% 7.38% 0.44% 2.14% 4.36% [42]
Wikimedia Sep. 2009 1.41% 26.00%[2] 59.80%[2] 0.99% 88.68% 6.45% 0.92% 1.51% 2.44% [43]
Median Oct. 2009 2.18% 23.60% 59.83% 0.76% 92.52% 5.27% 0.60% 0.96% ---

Hopefully this table reflects consensus, or at least approaches it.--Harumphy (talk) 11:18, 29 November 2009 (UTC)

I agree with all your points except about iPhone and Win2K; I think your 0.5% threshold is arbitrary. I think the list of OSs that are included should meet both of the following rules:
  1. 50% + 1 of the sources should include the OS in their statistics. Think of it in relation to "original research": if most of our sources don't include an OS, why should we? (We currently have 7 sources, so at least 4 must list the OS; iPhone currently fails.)
  2. The list should only include OSs that are recent and/or frequently used. Now, recent/frequent are arbitrary, so I would clarify them. Recent means a valid licensed copy or support contract can still be procured from the original developers or licensed distributors. Frequently used would be the same as rule #1. (This might remove Win2K.)
Also, think of the pie chart: do we really need/want 8 slices, with the majority of them under 5%? Jdm64 (talk) 23:04, 29 November 2009 (UTC)
Four of the seven sources do mention iPhone, and extended W2K Pro support is available until July 2010, so both pass by your yardsticks! On that basis I'll stick the table into the article. I'd be happy to lose the pie chart. It isn't very useful, it gets out of date every time the table changes, and unless someone keeps it up to date it should go.--Harumphy (talk) 00:11, 30 November 2009 (UTC)
Oops, I think I was mixing up the two tables when I wrote that (and I said "might" for Win2K, depending on your definition of "support": general support ended in 2005, but extended support runs until 2010). OK, I'm fine then. I guess we're going with my formal rules for which OSs are included? I still think the chart is good for seeing visually what the numbers equate to. How frequently should the chart be updated? I created it, and update it monthly. Jdm64 (talk) 01:07, 30 November 2009 (UTC)
The pie chart is definitely useful, as long as it is reasonably up to date - it doesn't even have to be the very latest month.
As for the Win2K/iPhone criterion, I think the best one is how many of our sources decide to report each OS. If the majority have iPhone, we will report it too. If the majority drop Win2K, we will drop it too.
I am still very much against deriving an average for AT Institute. It is not a number that they chose to report, and taking the average assumes the same level of usage in all 4 countries, which is both wrong and OR. Wikiolap (talk) 02:23, 30 November 2009 (UTC)
I'm with Harumphy that the average is not original research. "Carefully summarizing or rephrasing source material without changing its meaning is not synthesis—it is good editing."[44] The average is a summary mechanism, just like a prose summary of a source, except here the source has numbers rather than words. Another section also states that routine calculations are allowed[45]. As I see it, we're still following Wikipedia policy. Jdm64 (talk) 02:49, 30 November 2009 (UTC)
Assuming it's the same in all four countries is an approximation, but it's a better approximation than arbitrarily picking France and ignoring the other three. We could weight the figures by population or something, but I doubt it would make much difference.--Harumphy (talk) 08:48, 30 November 2009 (UTC)
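For concreteness, the disputed 4-country figure is just an unweighted mean. A minimal sketch in Python; the shares below are hypothetical placeholders, not AT Internet Institute's published numbers:

    # Unweighted mean across the four countries, as in the table footnote.
    # The shares are hypothetical placeholders, NOT AT's real data.
    shares = {"France": 1.00, "Germany": 0.90, "Spain": 0.60, "UK": 0.50}

    mean = sum(shares.values()) / len(shares)
    print(f"4-country mean: {mean:.2f}%")  # prints 0.75%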
Wikiolap: I see you've added a footnote about Wikimedia's 1:1000 sampling. Why do you think this matters? From Sample_size#Estimating_proportions we can see that Wikimedia's sample size of over 4,000,000 means there's a 95% probability that the sampling error is less than 0.05%.--Harumphy (talk) 09:17, 30 November 2009 (UTC)
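For anyone who wants to check that claim, the standard margin-of-error formula for an estimated proportion gives about the same figure. A quick sketch, assuming simple random sampling and the worst case p = 0.5:

    import math

    n = 4_000_000   # approximate sample size after 1:1000 sampling
    p = 0.5         # worst-case proportion (maximizes the error)
    z = 1.96        # z-score for a 95% confidence level

    margin = z * math.sqrt(p * (1 - p) / n)
    print(f"95% margin of error: {100 * margin:.3f} percentage points")  # ~0.049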
The 1:1000 sampling is a fact, and we are letting readers know about it. They can draw their conclusions whether or not it matters to them. We can certainly add info about the initial size and about sample size too. (BTW, I don't think there is enough information to estimate sampling error, since we don't even know what sampling method was used). Wikiolap (talk) 16:13, 30 November 2009 (UTC)
The sampling method is stated on the cited page. It comes from Squid log entries; they're taking 1 in every 1000 of those log entries. From that, working out the error is straightforward; it's basic statistics. Maybe you never did a statistics course and don't understand how it works. I did. It isn't enough for something to be a fact: it has to be a relevant fact. If it isn't, it's a waste of space. So, I'll ask the question again: why do you think it matters?--Harumphy (talk) 16:52, 30 November 2009 (UTC)
To answer your direct question: I think it is important to inform the reader that the numbers for Wikimedia are derived from a sample of the data, not the full data. It is full disclosure. Wikimedia felt compelled to state this at the very top of their page, so the very least we can do is to add this note as well. Hopefully this convinces you that this is indeed a relevant fact. Wikiolap (talk) 19:18, 30 November 2009 (UTC)
Not really. Why go into such an unimportant detail about Wikimedia, which uses a very rational and scientific sampling policy to produce a quantifiably small error, when the sampling policies of some of the sources we cite are much more dubious? For example, StatOwl's ISP share shows that its sampled hits come almost entirely from US residential accounts. W3 Counter uses only the last 15,000 hits of the month for each site it samples, thus biasing the sample toward the end of the month and against larger sites. Several of the sources are from companies that only seem to cater for English-speaking webmasters. None of them defines a 'hit' precisely. There are countless sources of error, and I think you are drawing undue attention to one of the smallest of the lot.--Harumphy (talk) 23:09, 30 November 2009 (UTC)
This is not an unimportant detail - Wikimedia itself considered it important enough to highlight at the top of its statistics page. If there are other important biases (like the one you cite about W3 Counter), they should be noted as well, so readers will know how to read the numbers. Wikiolap (talk) 03:33, 1 December 2009 (UTC)

Honestly, the reader doesn't care, and the ones that do will read the original source. Statistically, all the sampling methods used by the sources are valid, even Wikimedia's. Hence, it's not really necessary to add this information. It's redundant; statistics is sampling! Moreover, the other sources all use other statistical "tricks" in their calculations. At least Wikimedia is honest in stating theirs. Many of the others... who knows what they do! What we should add is a disclaimer for all the other sources as not being completely open about their statistical analysis -- if we add any disclaimer at all. But with all that said, I think we should add it for one reason only -- to stop Wikiolap from whining. If it'll stop this useless discussion then fine, let Wikiolap have his disclaimer. Can we just move on? Jdm64 (talk) 08:31, 1 December 2009 (UTC)

Sounds good to me - thanks. Wikiolap (talk) 16:03, 1 December 2009 (UTC)

Further tweaks to web client table

1. StatCounter has separate tables for fixed and mobile OS stats, and an OS v. mobile OS table which shows the fixed and mobile shares of web clients at 98.72% and 1.28% respectively. I've weighted the stats we use accordingly; it was wrong before. Also, as we don't (yet) have a separate column for Android, I've bundled that in with Linux.

2. Looking at Android further, it appears that the way we report it is inconsistent. I suggest we just include Android with Linux for those sources that list it and add a note to say what we've done. I'd prefer not to list Android separately, because we would then logically have to list all the other mobile OSs, each with a tiny, albeit bigger, share!

3. We could improve the averaged AT Internet Institute stats by weighting each country's contribution to the mean in proportion to its number of internet users, e.g. from here: [46] (see the sketch below).--Harumphy (talk) 16:23, 5 January 2010 (UTC)
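Proposal #3 would amount to replacing the unweighted mean with a weighted one. A sketch of the idea; the user counts here are placeholders to be filled in from the source linked above:

    # Weight each country's share by its number of internet users (millions).
    # Both dictionaries hold hypothetical placeholder values.
    shares = {"France": 1.00, "Germany": 0.90, "Spain": 0.60, "UK": 0.50}
    users  = {"France": 43.0, "Germany": 62.0, "Spain": 29.0, "UK": 49.0}

    weighted = sum(shares[c] * users[c] for c in shares) / sum(users.values())
    print(f"user-weighted mean: {weighted:.2f}%")  # ~0.77% with these placeholders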

#1 and #2 look good to me. For #3, I say we should not derive new numbers, but simply reproduce whatever numbers they reported, or just drop the source altogether. Wikiolap (talk) 17:28, 5 January 2010 (UTC)
It seems that Android has now made the top ten reported by W3 Counter. It's now reported by 4 of the 6 sources. According to some trade sites, it's expected to become the second most popular mobile web client OS imminently. Contrary to what I said in #2 above, I think it deserves its own column. No other mobile OS is reported by 4 sources, so the precedent I feared setting doesn't apply. The table becomes:
Source Date Microsoft Windows (7 / Vista / XP / 2000 / all versions) Apple (Mac / iPhone) Linux (other / Android) Other[6] Sources
AT Internet Institute Aug. 2009 0.80% 29.13% 59.83% 92.80% 4.93% 0.75% 1.00% 0.52% [47][7]
Net Applications Dec. 2009 5.71% 17.87% 67.77% 0.62% 92.21% 5.11% 0.44% 1.02% 0.05% 1.17% [48][49]
StatCounter Dec. 2009 6.13% 21.74% 64.29% 0.39% 92.97% 4.66% 0.41% 0.67% 0.05% 1.24% [50]
StatOwl Nov. 2009 3.53% 30.37% 63.89% 0.78% 88.52% 10.97% 0.40% 0.11% [51]
W3 Counter Dec. 2009 6.80% 21.29% 56.61% 0.50% 86.43% 7.44% 0.64% 2.14% 0.08% 3.37% [52][8]
Wikimedia Nov. 2009 3.54% 25.90%[2] 57.45%[2] 0.86% 88.18% 6.75% 1.00% 1.50% 0.05% 2.52% [53][9]
Median Dec. 2009 4.63% 23.96% 61.86% 0.62% 90.37% 5.93% 0.64% 1.01% 0.05% --- ---

--Harumphy (talk) 16:17, 6 January 2010 (UTC)

Your classification of "normal" Linux as "other" makes it really confusing. And do we really need to show Android? It is Linux, not a separate version like the progression of Windows. I still think it should be removed and listed only in the mobile section. Similarly with iPhone: it's also Mac OS X, just a highly modified version. Jdm64 (talk) —Preceding undated comment added 20:14, 6 January 2010 (UTC).
Fair enough. Let's stick to #2 above.--Harumphy (talk) 22:36, 6 January 2010 (UTC)
Where are you getting the numbers for StatCounter? I only see Linux and not Android. There is Android in "mobile OS", but you can't mix those two statistics, because the source didn't mix them (unless you're reading from somewhere else). Also, NetApp has Android, but JavaME, Symbian, and WinMobile all have larger shares! Then there are Wikimedia's stats, which distinctly classify Android as mobile. And other mobile OSs have higher shares there too (BlackBerry, Symbian, iPhone, WinMobile, JavaME). But still, as in my previous point, Android is Linux, just a different distro. Should we list the popular distros of Linux? Ubuntu is about 50% of the 1% that Linux has. That's larger than Android. Jdm64 (talk) 01:50, 7 January 2010 (UTC)
The relevant StatCounter info is spread over three tables: (1) fixed OS, (2) mobile OS, and (3) fixed OS v. mobile OS. To get the share of all web clients, fixed and mobile, which is what our summary table is about, it's necessary to multiply the fixed OS table figures by 0.9872 and the mobile OS table figures by 0.0128 (these are the shares from the fixed OS v. mobile OS table). I think it's perfectly OK to combine data from different tables in this way: it's easily verifiable by other editors from the cited sources and thus isn't OR. The reason for adding a column for Android is that a majority of the sources now list Android. (This was the threshold for inclusion which IIRR you suggested in November and which IMHO makes sense.) I agree that Ubuntu is a more deserving case but unfortunately it's identified by only two of the sources. It's important to remember what the summary table is summarising: web client stats, fixed and mobile, reported by the six cited sources.--Harumphy (talk) 11:34, 7 January 2010 (UTC)
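In other words, the combination is a weighted sum over StatCounter's two tables. A sketch with illustrative numbers; the OS shares below are round placeholders rather than StatCounter's published values, and only the 98.72%/1.28% split comes from the source:

    # Fold StatCounter's mobile-OS table into its fixed-OS table using the
    # fixed-vs-mobile split (98.72% / 1.28%) from the third table.
    FIXED, MOBILE = 0.9872, 0.0128

    fixed_os  = {"Windows": 93.0, "Mac": 4.7, "Linux": 0.7}   # % of fixed clients (placeholders)
    mobile_os = {"iPhone": 32.0, "Android": 4.0}              # % of mobile clients (placeholders)

    combined = {os: share * FIXED for os, share in fixed_os.items()}
    for os, share in mobile_os.items():
        combined[os] = combined.get(os, 0.0) + share * MOBILE

    for os, share in sorted(combined.items(), key=lambda kv: -kv[1]):
        print(f"{os}: {share:.2f}%")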
OK, I see how you did that; seems reasonable for that source. But I still think there's overlap. Android reports itself as Linux in the user agent string, so sources that don't list Android are still counting it, just under Linux. So really, Android's 0.05% is very nearly a subset of Linux's ~1%. But when looking at the table, this would not be readily apparent. So Android could be listed, but it needs to be more apparent that it's at least partially a subset -- especially for sources that don't explicitly list it. This is also why listing regular Linux as the "other" makes no sense -- Android is the other. Jdm64 (talk) 19:16, 7 January 2010 (UTC)
Agreed. It's a pity the sources are dumb enough to separate Android rather than Ubuntu. In the circumstances I suppose the best we can do is to include Android in the single Linux column, adding back the Android figure where the source has separated it out.--Harumphy (talk) 09:44, 8 January 2010 (UTC)
OK, I guess that should work, but a footnote definitely needs to be added stating why this was done. It looks like the only sources that need Android added back into Linux are NetApp and W3. The others seem to know that Android is Linux. Jdm64 (talk) 04:02, 9 January 2010 (UTC)
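For example, taking the Net Applications row in the table above (and assuming the column mapping is as labelled), the combined figure would read 1.02% (Linux) + 0.05% (Android) = 1.07%.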

Reintroduce W3Schools

I think that W3Schools stats should be added. Some felt they should be removed because they weren't accurate, but I think what's more likely is that the scope is different: the stats are more U.S.-centric as opposed to global. And the current table already includes a known European source (AT Inst). So I say we should add W3Schools for the following reasons:

  1. W3Schools isn't inaccurate, just more U.S.-centered, and we have a European source. Either remove AT Inst or add W3Schools to keep the balance.
  2. The summary table is just a summary of the sources out there. Having more sources gives the reader more information to draw an informed conclusion of their own. (Jdm64 (talk) 19:32, 19 January 2010 (UTC))
The six sources we use now measure a wide sample of web sites. W3Schools shows stats for just its own site. Loads of sites do that. Why pick this one? The European bias of AT Inst is currently balanced by StatOwl's US bias.--Harumphy (talk) 19:56, 19 January 2010 (UTC)
OK, fair enough. I guess StatOwl is US-biased. This was mainly a "preemptive" statement for anybody thinking of adding W3Schools (there was a recent movement to have it added to the browser page -- not by me). It's too bad we can't find more good sources, though. Jdm64 (talk) 20:42, 19 January 2010 (UTC)

Tablets

Tablets need to be moved to their own section. They have nothing in common with netbooks, though they may replace them in some sales. Netbooks should be combined with notebooks; the difference between them is artificial.

UrbanTerrorist (talk) 07:55, 27 January 2011 (UTC)

Yes. A netbook is just a small laptop/notebook. I've changed the sub-heading "Netbooks and Tablets" to just "Netbooks". There's very little info on tablets as a category so far. The main one is the iPad, which runs iOS, and this gets covered anyway. Do all tablets have mobile (cellular) connectivity, or are some of them Wi-Fi only?--Harumphy (talk) 09:20, 27 January 2011 (UTC)
Sorry Harumphy, I've been busy. Please see the note at the bottom under Long Term Suggestions. UrbanTerrorist (talk) 03:35, 11 August 2011 (UTC)
  1. ^ Where not provided by source, other is calculated by subtracting Windows, Apple and Linux shares from 100%.
  2. ^ a b c d e f Wikimedia stat for Vista includes Server 2008; XP includes Server 2003.
  3. ^ Where not provided by source, other is calculated by subtracting Windows, Apple and Linux shares from 100%.
  4. ^ Where not provided by source, other is calculated by subtracting Windows, Apple and Linux shares from 100%.
  5. ^ Average of AT Internet Institute figures for France, Germany, Spain and UK.
  6. ^ Always calculated by subtracting the sum of Windows, Apple and Linux shares from 100%.
  7. ^ Mean value of AT Internet Institute's figures for France, Germany, Spain and UK.
  8. ^ The W3Counter report is based on the last 15,000 page views to each of the 30,961 websites tracked.
  9. ^ Wikimedia uses 1:1000 sampling of its logs when deriving the usage numbers.