Wikipedia:Reference desk/Archives/Computing/2015 September 21

= September 21 =

Screen Reader shares…
Hello,

Some features in HTML, like ARIA, are not intended to be supported by web browsers but by screen readers for blind people.

When I search for whether a feature is supported by screen readers, I see various lists. However, this doesn't help because I have no idea about their traffic coverage, even for the English language.

Would it be possible to find such shares? 2A02:8420:508D:CC00:56E6:FCFF:FEDB:2BBA (talk) 00:05, 21 September 2015 (UTC)
 * When it comes to Internet applications for the blind, you will likely not find anyone who knows more than Bryan Smart. I'd ask him directly. 209.149.113.66 (talk) 11:43, 21 September 2015 (UTC)
 * I know the stats for my language. But I need them for the English language in order to prove to a site that a situation I discovered is harmful. 2A02:8420:508D:CC00:56E6:FCFF:FEDB:2BBA (talk) 18:19, 21 September 2015 (UTC)
 * Are you asking what percentage of viewers of a certain page are likely to be using screen readers? If so, this page has some links and discussion. You can also just look at the distribution of blind people. In the USA, I think it's safe to assume that most blind adults use the web, but that might be less true in other places. There also might be some useful info at WebAIM. SemanticMantis (talk) 18:58, 21 September 2015 (UTC)
 * Not exactly. I found an English-only site which doesn't filter protocols on the cite="" attribute but does so for other ones like href="", data="", and src="". They refuse to change the current behavior, claiming that the percentage of blind users who would be affected by the javascript: protocol on the cite="" attribute is probably very small.
 * I found various screen readers vulnerable to that case, but I couldn't find out whether they're used by a lot of users. (And to be honest, there's a bounty to win if I prove such a situation is harmful for blind users.)
 * That's why I need screen reader shares, the same way browser shares exist. 2A02:8420:508D:CC00:56E6:FCFF:FEDB:2BBA (talk) 23:12, 21 September 2015 (UTC)
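The gap described above can be sketched in a few lines. This is a hypothetical illustration, not the site's actual code: all function names and the attribute list are invented. The point is that a scheme allowlist applied only to some URL-bearing attributes leaves any attribute off the list (here, cite) able to carry a javascript: URL through untouched.

```python
# Hypothetical sketch of per-attribute protocol filtering.
# The site described above reportedly checks href/src/data but not cite;
# all names here are invented for illustration.
from urllib.parse import urlparse

SAFE_SCHEMES = {"http", "https", "ftp", ""}  # "" covers relative URLs

def is_safe_url(value: str) -> bool:
    """Return True if the URL's scheme is on the allowlist."""
    scheme = urlparse(value.strip()).scheme.lower()
    return scheme in SAFE_SCHEMES

def sanitize_attr(attr: str, value: str,
                  filtered=("href", "src", "data")) -> str:
    """Drop unsafe values, but only for attributes on the filtered list.
    Leaving cite off that list is exactly the gap described above."""
    if attr in filtered and not is_safe_url(value):
        return ""
    return value
```

A screen reader (or any client) that actually dereferences the cite URL would then receive the unfiltered javascript: value.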
 * I'm not visually impaired, or that familiar with accessibility issues, but my impression is that JAWS (screen reader) is the most popular screen reader. This is supported by our article, with a source [//webaim.org/projects/screenreadersurvey6], although its percentage is now well below 50%. It seems from that source that JAWS, ZoomText, Window-Eyes, NonVisual Desktop Access and VoiceOver are what would need to be considered, at a minimum. This doesn't tell you much about versions of this software, of course. Also, the source mentions some further complications, like the fact that some of the software also functions as a magnifier, and that may be what some respondents were using rather than the screen reader portion. (Also, it wasn't just looking at people affected by blindness, although most did have some sort of visual impairment. Then again, I don't quite understand why the website you're referring to would only care about people using screen readers who are blind.) The source, a survey by WebAIM, doesn't seem to be exclusively referring to English usage statistics, but considering they are based in the US with an English website, the survey was I think in English, and a big percentage of their respondents came from the US, followed by Europe (including the UK), then Australia and Oceania, it seems likely English users were a big percentage of those surveyed. Although interestingly, 25.6% did use more than one language. Being a survey, and from the sound of it not one that even attempts to be random, it does run the risks such surveys normally do; still, it's probably better than nothing. Also, coming up with a good alternative is probably difficult, since screen readers are generally not exposed in the user agent or elsewhere, from the sound of it. However, are you sure the website in question doesn't have their own data? If it's a competently run major website, it may have.
For various reasons, website audiences will often vary, and while it's possible the website design itself is one of the causes, I think in the end many website designers will care much more about their actual audience than the general audience. (Browser shares, which are normally derived from user agents, are I think often handled that way.) If they do have their own data, and it suggests for example that VoiceOver is not particularly significant to them, then considering the somewhat low usage share in the WebAIM survey, it unfortunately may be difficult to convince them it's something to worry about. I'll also ask User:Graham87, who uses a screen reader and is often a helpful source of information for accessibility-related issues such as this, if they can offer any more help. Nil Einne (talk) 14:38, 22 September 2015 (UTC)
 * I can't add very much to what Nil Einne and others said ... I was going to mention WebAIM, but you guys beat me to it! Regarding Windows, I get the impression that NVDA is more popular among more technically inclined users and users from non-English-speaking countries ... but its popularity is increasing across the board as more features are added to it, simply because of its status as free software compared to JAWS which costs about 1,100 US dollars. Graham 87 14:51, 22 September 2015 (UTC)
 * OK, looks like most of them use Internet Explorer. But for blind people who use other web browsers, what are the shares, since those first ones require Trident?
 * I'm OK with revealing the site name and more details. However, as this is an unpatched flaw, this deserves private discussion. My guess is they added that attribute considering that web browsers don't use it, thus forgetting about the visually impaired.
 * I also wonder how to load the cite attribute with JAWS (screen reader), but I couldn't find out. 2A02:8420:508D:CC00:56E6:FCFF:FEDB:2BBA (talk) 19:35, 22 September 2015 (UTC)
 * I'm not sure what I can say beyond the survey results, except that they corroborate my experience in this area. I had never heard of the protocol attribute before reading this thread. Graham 87 02:01, 23 September 2015 (UTC)
 * No, it's the cite attribute, which is often used with the blockquote HTML element. 2A02:8420:508D:CC00:56E6:FCFF:FEDB:2BBA (talk) 09:30, 23 September 2015 (UTC)
 * Aha, I understand you now. I'd never heard of that one either, but I can imagine how it could be problematic. Graham 87 13:56, 24 September 2015 (UTC)
 * Yes, and I told them in the same e-mail that the longdesc attribute is vulnerable to that threat (I told them both attributes concern screen readers), but it seems they only care about longdesc because web browsers (meaning it impacts non-disabled users) support it.
 * I currently keep insisting. I'll reply here to say whether the fact that the visually impaired should avoid GitHub needs to be advertised. But keeping that fact relatively private is necessary to stay eligible for the bounty. 2A02:8420:508D:CC00:56E6:FCFF:FEDB:2BBA (talk) 17:24, 25 September 2015 (UTC)

How could VW and Audi software cheat the US regulations?
According to news reports, "Diesel cars from Volkswagen and Audi cheated on clean air rules by including software that made the cars' emissions look cleaner than they actually were."

How could the VW and Audi software detect that the car was being tested? As a first thought, I imagined that the testers analyzed the emissions coming out of the exhaust pipe and did not need to connect to the car's onboard computer or look at the dashboard. At least, that's how I would expect a technical inspection to work. Relying on the cars' own sensors or software is tricky, even without fraud: you would not discover malfunctioning software or sensors that way. --Scicurious (talk) 15:38, 21 September 2015 (UTC)
 * A comment by 1995droptop claims "EPA testing rules do not allow for anything to be plugged into the OBD port during testing." Another comment by BahamaTodd says "The car was able to detect humidity, steering angle, vehicle speed, and duration of operation, to determine that it was undergoing an EPA emissions test." (Also mentioned is how the EPA generally uses fairly well controlled environments, I presume for consistency and fairness.) A search based on that finds other sources of varying quality which say something similar, and which also go into detail on what the software did in testing mode and why the settings may have been changed in normal driving mode. One of those links to the EPA, which calls it a "defeat device" but doesn't go into details on precisely how it's believed to have detected testing; nor does the Notice of Violation. However, although they are not the best sources, I'm inclined to trust that the earlier ones are probably right about how it worked. Nil Einne (talk) 18:18, 21 September 2015 (UTC)


 * (Full disclosure: I designed some of the equipment used to conduct the EPA tests in question.) Emission tests are normally done on a dynamometer using the FTP-75 test schedule.


 * All VW had to do was program the car to go into EPA test mode when the computer saw that the vehicle was going through the FTP-75 driving cycle combined with none of the normal steering corrections that happen as a human driver stays in his lane on the highway.
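The kind of check being described can be sketched roughly as follows. This is purely illustrative and not based on any actual ECU code: the thresholds, signal names, and the simple point-by-point comparison are all invented. It just shows the idea of matching a logged speed trace against a known drive cycle while the steering wheel stays essentially still.

```python
# Illustrative heuristic only: guess that the car is on a dynamometer when
# the observed speed trace tracks a known drive cycle closely and the
# steering wheel barely moves. All thresholds and units are invented.

def looks_like_dyno_test(speeds, steering_angles, reference_cycle,
                         speed_tol=2.0, steering_tol=0.5):
    """speeds and reference_cycle: km/h sampled once per second;
    steering_angles: degrees from center, same sampling."""
    if len(speeds) != len(reference_cycle):
        return False
    speed_matches = all(abs(s - r) <= speed_tol
                        for s, r in zip(speeds, reference_cycle))
    wheel_still = all(abs(a) <= steering_tol for a in steering_angles)
    return speed_matches and wheel_still
```

A real controller would presumably also use duration, humidity, ambient conditions and so on, as the comments quoted above suggest.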


 * It's a well-known problem in the industry, and we have detection measures which I am not at liberty to reveal. The interesting question is how VW managed to not trip those detection measures (which no doubt have been updated several times since I was working with them). --Guy Macon (talk) 04:37, 22 September 2015 (UTC)


 * From our article Volkswagen emissions violations, including its talk page Talk:Volkswagen emissions violations, other sources, and stuff I've heard on the news, it sounds like the cheating was uncovered because the International Council on Clean Transportation funded real-world testing and found strong discrepancies between the lab tests and real-world tests for some manufacturers, including Volkswagen. Although, as Guy Macon said, the issue of possible cheating was I think fairly well known, and possibly observed by others conducting real-world testing (like Emissions Analytics), the study caused enough concern for the EPA and possibly other US agencies (and I think EU ones too) to investigate further, including eventual threats to withhold certification, which led to the eventual admission. If some of what I've read is right, the ICCT wasn't actually so much trying to prove manufacturers were cheating on the US standards; rather, they believed those standards were significantly better than the EU ones and wanted to show this, but then uncovered the discrepancy. The greater difficulty of fair real-world tests is I presume one reason why they weren't yet implemented, but it seems the EU and possibly South Korea and China were already strongly considering making them compulsory before the cheating had been confirmed (although possibly after the ICCT results). Nil Einne (talk) 07:31, 23 September 2015 (UTC)

How are Arabic glyphs actually rendered?
Let's consider, for example, two letters: the Arabic Jeem and the Pashto letter Dze. In Unicode the first has its principal codepoint U+062C, as well as four additional codepoints for its shapes (isolated, final, initial, medial): U+FE9D, U+FE9E, U+FE9F, U+FEA0. When you type the word jīm itself (جيم), the first glyph is supposedly not 062C but FE9F (the initial shape); this is why it has such a shape (ﺟ) and not (ﺝ).

The Pashto letter Dze (or Dzeem), in turn, has only one Unicode codepoint, U+0681, and no codepoints for the shapes. But nevertheless the word dzīm is rendered ځيم, not ځ يم (I added a space between the letters to illustrate). How does that happen?

Below is the table. The first row is Jeem; it has four independent glyphs for its shapes. The second row is Dzeem; its shapes are obtained with tatweel.

--Lüboslóv Yęzýkin (talk) 18:50, 21 September 2015 (UTC)
 * You'll be interested in computer font kerning. It also allows adjusting letters so they overlap. 2A02:8420:508D:CC00:56E6:FCFF:FEDB:2BBA (talk) 23:31, 21 September 2015 (UTC)


 * In Serbian, the italic shapes of certain letters differ from those in Russian. This is not encoded; the info has to come from somewhere else (such as the document's metadata - for example, in wiki markup, the {{lang}} tag). Similarly, in Greek, an algorithm to automatically put text into lowercase knows which form of the sigma to substitute (σ or ς) even though all it has is Σ. In other words, the relationship is one-to-many (non-bijective), and the algorithm has to guess based on context (such as whether the next character is a letter or punctuation). The point is, Unicode text rendering engines are that smart. Asmrulz (talk) 08:48, 22 September 2015 (UTC)
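The sigma rule mentioned above is easy to observe directly: Python's str.lower() implements Unicode's context-sensitive Final_Sigma special-casing rule (in recent Python 3 versions), so one uppercase Σ maps to either of two lowercase characters depending on position in the word.

```python
# One uppercase letter, two lowercase forms, chosen by context:
# capital sigma at the end of a word lowercases to U+03C2 (ς),
# elsewhere to U+03C3 (σ). Python's str.lower() applies this rule.
word = "ΟΔΟΣ"        # ends in capital sigma
lowered = word.lower()
MEDIAL_SIGMA = "\u03c3"  # σ
FINAL_SIGMA = "\u03c2"   # ς
```

Note that the algorithm only has Σ to work with in both positions; the choice is made from the surrounding characters, exactly the kind of context-dependent substitution described above.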
 * I appreciate your effort, but it does not answer the question. The two Greek sigmas have two different Unicode points, so it is not too difficult to create an algorithm. But a great number of additional Arabic letters do not. You cannot simply say to a program "if a condition is such, then take this code point/glyph and substitute it with another one", as there is no "another one". I wanted to know how it miraculously happened that the letter Dzeem (remember, it's just one of several dozen such letters) lost its tail in the middle and beginning and grew a little connector (conjunction?) at the end.--Lüboslóv Yęzýkin (talk) 15:03, 22 September 2015 (UTC)
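The asymmetry in question can be confirmed from the Unicode character database itself: Jeem's four positional forms are encoded as compatibility characters whose decompositions point back to U+062C, while nothing in the Arabic Presentation Forms blocks decomposes to U+0681. The helper below is just a sketch for checking this.

```python
# Check which encoded presentation forms exist for a given Arabic letter,
# by scanning the two Arabic Presentation Forms blocks for compatibility
# characters whose decomposition is exactly that single base letter.
import unicodedata

def presentation_forms(base_cp):
    target = "%04X" % base_cp
    forms = []
    for cp in list(range(0xFB50, 0xFE00)) + list(range(0xFE70, 0xFF00)):
        parts = unicodedata.decomposition(chr(cp)).split()
        # compatibility decompositions look like '<initial> 062C';
        # require exactly one base codepoint to skip ligatures
        if len(parts) == 2 and parts[0].startswith("<") and parts[1] == target:
            forms.append(cp)
    return forms
```

For U+062C this finds the four codepoints U+FE9D..U+FEA0; for U+0681 it finds nothing, so the shaping knowledge for Dzeem has to live somewhere other than the codepoint repertoire.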
 * A kerning table in the font file? 2A02:8420:508D:CC00:56E6:FCFF:FEDB:2BBA (talk) 19:53, 22 September 2015 (UTC)


 * Kerning just adjusts a glyph's position. Ligatures are replacement glyphs. I think the font will have ligature rules embedded into it to show replacement glyphs in certain situations.
 * The Unicode website has a FAQ page about Ligatures, Digraphs, Presentation Forms vs. Plain Text. This page has many answers along the lines of "The existing ligatures exist basically for compatibility and round-tripping with non-Unicode character sets. Their use is discouraged. No more will be encoded in any circumstances." In particular, it looks like the term "presentation form" is a more precise term for Arabic. Here are some excerpts from that section:
 * "Presentation forms are ligatures or glyph variants that are normally not encoded but are forms that show up during presentation of text, normally selected automatically by the layout software. A typical example are the positional forms for Arabic letters. These don't need to be encoded, because the layout software determines the correct form from context. For historical reasons, a substantial number of presentation forms was encoded in Unicode as compatibility characters, because legacy software or data included them."
 * "Can one use the presentation forms in a data file? A: It is not recommended because it does not guarantee data integrity and interoperability. In the particular case of Arabic, data files should include only the characters in the Arabic block, U+0600 to U+06FF."
 * --Bavi H (talk) 03:37, 23 September 2015 (UTC)
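The "compatibility" status of the presentation forms quoted above is visible through Unicode normalization: NFKC folds the encoded forms back into the recommended base Arabic block (U+0600..U+06FF), which is one way round-tripping with legacy data works.

```python
# NFKC normalization maps the compatibility presentation forms back to the
# base Arabic block, which is what data files are supposed to contain.
import unicodedata

initial_jeem = "\uFE9F"  # ARABIC LETTER JEEM INITIAL FORM
base_jeem = unicodedata.normalize("NFKC", initial_jeem)
# all four positional forms of Jeem normalize to the same base letter
jeem_forms = [chr(cp) for cp in (0xFE9D, 0xFE9E, 0xFE9F, 0xFEA0)]
```

This also shows why storing presentation forms in data is lossy in the other direction: normalization discards the positional information, which the layout engine is expected to recompute from context.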


 * Here is a page that has detailed information: Understanding characters, keystrokes, codepoints and glyphs. In particular, section 3.3 From codepoints to glyphs seems relevant. This section describes a hypothetical English "handwriting" font, that could display letters in different ways depending on the context.
 * "As I work through all the details, I might actually decide that the only way to really get the word "picture" to display the way I want is to create it as a single, complex glyph for the entire word. (This may seem unusual, but such things are possible in fonts, and some fonts have even done things like this.) So, I have a single glyph for "picture", but this is stored in data as a sequence of seven characters. What I need, then, is for some process to intervene between the stored data and the glyphs that will recognise this particular sequence of characters and select the single, combined glyph, rather than seven separate glyphs. This is precisely the kind of processing that happens in modern systems that are designed to support complex scripts."
 * --Bavi H (talk) 04:00, 23 September 2015 (UTC)

After some research I found more specific answers. Arabic letters are represented with the help of two principal things:
 * The Glyph Substitution table (GSUB) in the font itself. It can be seen (and edited) in programs like FontForge. If the table is absent from the font (and I have such old fonts, without the GSUB but with the Presentation Forms-B block), the presentation forms will not appear.
 * Uniscribe or similar services. And I was somewhat wrong: the presentation form blocks A and B were introduced into Unicode only for compatibility, and their usage is strongly discouraged now; this is probably why many fonts lack the blocks altogether. The scenario I described above (the code point substitution) was used only in some older software (though I do not know which exactly).
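For a feel of what a shaping engine does, here is a deliberately naive toy shaper. It is not how Uniscribe or HarfBuzz actually work (real shaping uses the font's GSUB rules and handles non-joining letters and ligatures); it only covers the simple case where every letter joins on both sides, and it reuses the encoded Presentation Forms-B characters as stand-ins for the glyphs a real engine would select.

```python
# Toy contextual shaper: choose isolated/initial/medial/final forms purely
# by position in the word, assuming every letter is dual-joining. The form
# table is built from the Unicode compatibility decompositions.
import unicodedata

def build_form_table():
    table = {}
    for cp in range(0xFE70, 0xFF00):  # Arabic Presentation Forms-B
        parts = unicodedata.decomposition(chr(cp)).split()
        if len(parts) == 2 and parts[0] in ("<isolated>", "<final>",
                                            "<initial>", "<medial>"):
            table[(chr(int(parts[1], 16)), parts[0])] = chr(cp)
    return table

def shape(word, table=build_form_table()):
    out = []
    for i, ch in enumerate(word):
        if len(word) == 1:
            tag = "<isolated>"
        elif i == 0:
            tag = "<initial>"
        elif i == len(word) - 1:
            tag = "<final>"
        else:
            tag = "<medial>"
        # letters with no encoded form (like U+0681) fall through unchanged;
        # a real engine would instead pick a glyph via the font's GSUB table
        out.append(table.get((ch, tag), ch))
    return "".join(out)
```

Shaping جيم (U+062C U+064A U+0645) this way yields the initial Jeem, medial Yeh and final Meem forms, while U+0681 simply falls through, which is exactly why its shaping must come from font-level GSUB data rather than from the codepoint repertoire.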

Thanks to Bavi H for a hint in what direction I had to research. --Lüboslóv Yęzýkin (talk) 20:27, 24 September 2015 (UTC)