Wikipedia talk:AutoWikiBrowser/Typos

Tie scores being treated as false positives
1-1, 2-2, etc. scores are being overlooked for ndash replacement, and with all the sports articles I work on, this is getting a little too frustrating. I found out you had made a change to make this a false positive in March 2020. Is there any way this can be made to bypass fewer of these (i.e. look for additional text to match)? A *lot* of tie scores aren't getting a correction (unless I catch it and perform the correction manually). Stefen Towers among the rest!  Gab • Gruntwerk 23:02, 3 February 2024 (UTC)
 * the relevant part of rule "0–0" was moved the next day into rule "2–1", precisely so that "0–0" can find draws and ties. If "0–0" isn't finding ties now, it's not because of that lookbehind; it's because "0–0" needs to be expanded with more relevant keywords.   ~ Tom.Reding (talk ⋅dgaf)  13:33, 4 February 2024 (UTC)
 * Thanks for the reply. The lookbehind is still in "2–1", and it is stopping corrections of draws/ties that don't meet "0–0". "0–0" currently doesn't cover a lot of scenarios I'm seeing in my typo correction work, but I don't know why the draws/ties need to be avoided in the general case of "2–1" in the first place. Is there any harm from removing that code? That's what I'm driving at.  Stefen Towers among the rest!   Gab • Gruntwerk 17:37, 4 February 2024 (UTC)
 * the reason that lookbehind exists is because the rule was incorrectly catching journal volume numbers, e.g. "Some Journal 5-5", preceded by "5-4" and succeeded by "5-6" etc., which should not be changed/en-dashed.  ~ Tom.Reding (talk ⋅dgaf)  17:46, 4 February 2024 (UTC)
 * I can see why we don't want a journal volume number en-dashed, but I don't quite understand why volume numbers that look like draws/ties are more problematic than those where the numbers aren't equal (e.g. "5-4"). If the "5-4" is in the middle of the series instead of "5-5", wouldn't it be falsely corrected? Stefen Towers among the rest!   Gab • Gruntwerk 17:58, 4 February 2024 (UTC)
 * Is it that a draw/tie is guaranteed to not be a number range if referring to a volume? If that's the case, I understand it better now after thinking more about it. At any rate, I am working on more cases to convert the ties to use ndash outside of this false positive check. But I wish this check wasn't needed - we really miss a lot of genuine draws/ties as it stands. Stefen Towers among the rest!   Gab • Gruntwerk 00:01, 5 February 2024 (UTC)
 * do you have a list of pages where ties are known to have been missed?  ~ Tom.Reding (talk ⋅dgaf)  11:59, 5 February 2024 (UTC)
 * @Tom.Reding I don't keep a list of these. I just keep seeing these missed, and I have to correct them manually. It's any draw/tie that the "2–1" rule would ordinarily catch but doesn't due to the false positive catch. If you run AWB typo checks in sports articles especially, it can get rather frustrating. Stefen Towers among the rest!   Gab • Gruntwerk 16:26, 5 February 2024 (UTC)
 * Here is an example. While it shows "17-17" and "7-7" being corrected, that happened only because I saw in my AWB viewer they weren't corrected, and I manually placed the en dash there. Stefen Towers among the rest!   Gab • Gruntwerk 16:56, 5 February 2024 (UTC)
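
The two-rule arrangement discussed in this thread can be sketched roughly as follows. This is written in Python rather than AWB's .NET regex syntax, and the patterns, the keyword list (`draw`, `tie`), and the function name `fix_scores` are assumptions for illustration only; they are not the actual "0–0" or "2–1" rules.

```python
import re

EN_DASH = "\u2013"

# Hypothetical tie rule: equal numbers are treated as a score only when a
# keyword follows. The keyword list is an assumption for illustration.
TIE = re.compile(
    r"(?<![\d-])(\d{1,3})-\1(?![\d-])(?=\s+(?:draw|tie)\b)",
    re.IGNORECASE,
)

# Hypothetical general rule: any hyphenated pair of short numbers, except
# pairs whose two sides are equal (the false-positive guard for ties).
GENERAL = re.compile(r"(?<![\d-])(\d{1,3})-(?!\1(?!\d))(\d{1,3})(?![\d-])")


def fix_scores(text: str) -> str:
    """Apply the keyword-based tie rule, then the general rule."""
    text = TIE.sub(r"\g<1>" + EN_DASH + r"\g<1>", text)
    return GENERAL.sub(r"\g<1>" + EN_DASH + r"\g<2>", text)


if __name__ == "__main__":
    for sample in (
        "won 3-1",
        "ended in a 1-1 draw",
        "tied 17-17 at halftime",
        "Some Journal 5-5",
        "Some Journal 5-4",
    ):
        print(repr(sample), "->", repr(fix_scores(sample)))
```

Under these assumed patterns, "Some Journal 5-5" is left alone, but so is "tied 17-17 at halftime", because its keyword is not in the list; "Some Journal 5-4" is still converted, which is the asymmetry raised above.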

rule amended to find "17-17" and "7-7" in that example, but with enough specificity, I think, to avoid most/all freeform journal citations, which are unlikely to end the line at a journal volume (if so, they can/should have the page # appended). ~ Tom.Reding (talk ⋅dgaf) 21:32, 19 June 2024 (UTC)
 * Thanks for addressing this. While I usually have no issue reading RegEx, it has been many months since I looked at this. So, I guess what you did is add code to catch more cases for the "0–0" rule, so any tie ranges within those cases won't be skipped in the "2–1" rule. Do I have it right? Stefen Towers among the rest!   Gab • Gruntwerk 05:25, 20 June 2024 (UTC)
 * correct for the "0–0" rule, which I did isolate & test. I did not look at the "2–1" rule, since it wasn't relevant to, nor triggered for, the "17-17"/"7-7" example. Also, rules aren't necessarily run in the order they appear on the typo page, and so should be coded in a way that is independent of the firing sequence of other rules.  ~ Tom.Reding (talk ⋅dgaf)  18:39, 20 June 2024 (UTC)
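
A minimal sketch of the two points made here: a tie rule that only fires when the tie ends the line, and a check that the result does not depend on rule order. It is in Python rather than AWB's .NET regex syntax; both patterns and the helper names `apply_rules` and `is_order_independent` are hypothetical and are not the amended "0–0" rule itself.

```python
import re
from itertools import permutations

EN_DASH = "\u2013"

RULES = [
    # Hypothetical tie rule: equal numbers that end a line (optionally
    # followed by trailing spaces or tabs).
    (
        re.compile(r"(?<![\d-])(\d{1,3})-\1(?=[ \t]*$)", re.MULTILINE),
        r"\g<1>" + EN_DASH + r"\g<1>",
    ),
    # Hypothetical general rule: unequal pairs only.
    (
        re.compile(r"(?<![\d-])(\d{1,3})-(?!\1(?!\d))(\d{1,3})(?![\d-])"),
        r"\g<1>" + EN_DASH + r"\g<2>",
    ),
]


def apply_rules(text, rules):
    """Apply each (pattern, replacement) pair in the given order."""
    for pattern, replacement in rules:
        text = pattern.sub(replacement, text)
    return text


def is_order_independent(text, rules):
    """Return True if every ordering of the rules gives the same output."""
    outcomes = {apply_rules(text, order) for order in permutations(rules)}
    return len(outcomes) == 1


SAMPLE = "Halftime: 7-7\nFinal: 17-17\nSome Journal 5-5, pp. 10-12."

if __name__ == "__main__":
    print(apply_rules(SAMPLE, RULES))
    print(is_order_independent(SAMPLE, RULES))
```

Because the journal volume in the sample is followed by a page range instead of ending the line, the assumed tie rule leaves it untouched.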

False Positive
Vice-President gets corrected to Vice-president, should be Vice President or vice president (I think) DarmaniLink (talk) 18:13, 11 February 2024 (UTC)


 * With respect to companies/organizations, I defer to them using a hyphen in the title per their choice. At the same time, titles like "Vice President of the United States" definitely don't have a hyphen. At any rate, created rules related to this, so maybe he has some thoughts here. In the meantime, feel free to skip any typo corrections you don't feel comfortable with, or do a manual edit if you so choose.  Stefen Towers among the rest!   Gab • Gruntwerk 17:38, 12 February 2024 (UTC)
 * Stefen is correct. Note that in Europe the hyphen is generally used, while in the US it is less often used, but this has to be handled case by case. Chris the speller   yack  02:06, 13 February 2024 (UTC)
 * Is it "Vice-president" or "Vice-President" for British English/Europe? DarmaniLink (talk) 02:09, 13 February 2024 (UTC)
 * Most resources seem to suggest that for 'Vice President' both are capitalized when used as a title. Neils51 (talk) 23:54, 13 February 2024 (UTC)

Kuty
"ua" was flagged as a typo of "uk" (sorry about the previous mistakes, im sleep deprived). I'm not sure if this was the typos or something else that flagged this though. I'm not seeing a regex that would have done that, just saw the attempted diff. DarmaniLink (talk) 02:10, 13 February 2024 (UTC)
 * That was a general fix, not a typo fix. There is an entry in AutoWikiBrowser/Template redirects that tells the software to replace instances of the redirect lang-ua with lang-uk. -- John of Reading (talk) 07:59, 13 February 2024 (UTC)
 * Ah, I assumed lang-uk was British English; I really don't know why
 * should have checked that, sorry DarmaniLink (talk) 15:11, 13 February 2024 (UTC)
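
For illustration, the kind of redirect bypass described above could look something like the sketch below ("uk" is the ISO 639-1 code for Ukrainian, which is why lang-uk is the target). The one-entry table, the regex, and the function name `bypass_template_redirects` are assumptions; this is not how AWB implements its general fixes, and it only matches template names exactly as written.

```python
import re

# Hypothetical one-entry excerpt of a redirect table: redirect -> target.
TEMPLATE_REDIRECTS = {"lang-ua": "lang-uk"}

# Captures the template name between "{{" and the first "|" or "}}".
TEMPLATE_NAME = re.compile(r"\{\{\s*([^|{}]+?)\s*(?=\||\}\})")


def bypass_template_redirects(wikitext, redirects=TEMPLATE_REDIRECTS):
    """Replace redirected template names with their targets."""

    def swap(match):
        target = redirects.get(match.group(1))
        # Leave the template alone when its name is not a known redirect.
        return match.group(0) if target is None else "{{" + target

    return TEMPLATE_NAME.sub(swap, wikitext)


if __name__ == "__main__":
    print(bypass_template_redirects("{{lang-ua|Кути}}"))
```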

Typos restructuring
As this is a complicated tool rather than an article, any major restructuring needs to be discussed. Please use this topic to explain what you think ought to be done here. Also, if changes are to be done, they need to be done more piecemeal, so editors can readily see what is moving where. A lot of difficult work has gone into building the list over time, and we need to be extra careful. Stefen Towers among the rest!  Gab • Gruntwerk 04:35, 12 April 2024 (UTC)
 * Apologies, and noted. As it stands, the current structure of the list is highly disorganized, with many fixes lasting more than a year in a general section at the top of the page. This not only makes it difficult to find which issues have or haven't been addressed, but also results in situations where the same issue may be covered twice via different means, such as km² via its own Unicode-to-sup-tag listing and one that also addresses m² and cm². I don't mean to remove recent additions from the top of the list, as I understand the importance of testing them extensively before they are integrated into the main lists, but I also think that it's important to sort articles into sections after the year time period has passed in order to maintain an effective organization scheme. In terms of organization, while the Capitalisation, Grammar, and sections at the top do a pretty good job of dividing rules into groups (improving navigability, facilitating standardization, and making it easy to see what's not represented), ones with too many listings inevitably become bloated and should be divided into meaningful subsections. To illustrate my point, here's a restructured collection of the current sections, to be amended when the listings at the top are sorted:

4 Typo list
  4.1 Recent additions
    4.1.1 Unsorted
    4.1.x Common subsections (TBD)
  4.2 Academia
    4.3.1 Academic titles
    4.3.2 Academic fields
    4.3.3 College degrees
  4.3 Capitalisation
    4.4.1 Brand names
      4.4.1.1 Colleges and universities
      4.4.1.2 Companies and organizations
      4.4.1.3 Products
      4.4.1.4 Technology
      4.4.1.5 Websites
      4.4.1.6 Unsorted
    4.4.2 Placenames (high-level)
      4.4.2.1 Continents and subcontinents
      4.4.2.2 Oceans
      4.4.2.3 Geographical proper names
    4.4.3 Placenames (low-level)
      4.4.3.1 Canada
      4.4.3.2 France
      4.4.3.3 United Kingdom
      4.4.3.4 United States (states)
      4.4.3.5 United States (cities)
    4.4.4 Time
      4.4.4.1 Calendrical proper nouns
      4.4.4.2 Holidays
      4.4.4.3 Epochs, ages and dynasties
    4.4.5 Society
      4.4.5.1 Cultures, languages, and ethnic groups
      4.4.5.2 Ethnicity & language
      4.4.5.3 Religious
    4.4.6 Unsorted
  4.4 Decapitalisation
    4.5.1 Medals
    4.5.2 Miscellaneous
  4.5 Misspellings
    4.5.1 A, 4.5.2 B, 4.5.3 C, 4.5.4 D, 4.5.5 E, 4.5.6 F, 4.5.7 G, 4.5.8 H, 4.5.9 I, 4.5.10 J, 4.5.11 K, 4.5.12 L, 4.5.13 M, 4.5.14 N, 4.5.15 O, 4.5.16 P, 4.5.17 Q, 4.5.18 R, 4.5.19 S, 4.5.20 T, 4.5.21 U, 4.5.22 V, 4.5.23 W, 4.5.24 X, 4.5.25 Y, 4.5.26 Z
  4.6 Accents and diacritics
    4.7.1 Proper nouns
  4.8 Formatting
    4.8.1 Calendar dates
    4.8.2 SI unit symbols
    4.8.3 Symbols and HTML entities
  4.9 Grammar
    4.9.1 Articles
    4.9.2 Contractions
    4.9.3 Replace space by hyphen
    4.9.4 Joined words
    4.9.5 Split words
    4.9.6 Duplicated words
    4.9.7 Redundant words
    4.9.8 Euphemisms
    4.9.9 Preposition usage
    4.9.10 Punctuation
    4.9.11 Remove hyphens after adverbs ending in -ly
    4.9.12 Remove other hyphens (replace with space)
  4.10 General rules
    4.10.1 Unsorted
    4.10.2 Beginnings
    4.10.3 Middles
    4.10.4 Endings
      4.10.4.1 A, 4.10.4.2 B, 4.10.4.3 C, 4.10.4.4 D, 4.10.4.5 E, 4.10.4.6 F, 4.10.4.7 G, 4.10.4.8 H, 4.10.4.9 I, 4.10.4.10 J–K, 4.10.4.11 L, 4.10.4.12 M, 4.10.4.13 N, 4.10.4.14 O, 4.10.4.15 P, 4.10.4.16 Q, 4.10.4.17 R, 4.10.4.18 S, 4.10.4.19 T, 4.10.4.20 U–V, 4.10.4.21 W
  4.11 Incorrect phrases


 * While it would make the TOC longer, I think it would make it much easier for people to find issues they'd like to address (using the RegEx replacement rules to search for articles that fulfill those criteria), make it easier for people to make new rules to fill in the gaps of existing ones (such as the 'cubed' rule I previously added on the basis of the 'squared' rule), and facilitate the adding of new rules (by categorizing similar rules together, it's easier to see what's missing). For instance, by grouping together university capitalization rules into their own subsection, people may think of additional entries to add to that specific section that they might not have otherwise in looking at a more generalized list.


 * Apologies again for making such a drastic edit without consultation; it just seems like the potential of RegEx typo fixing would be doubled if there were a greater deal of clarity and structure in the rule categorization scheme. CoolieCoolster (talk) 05:16, 12 April 2024 (UTC)
 * I'm looking through this and it's quite overwhelming. It would really help if you made a list of "change x to y" proposals, and possibly separating those out so they can be discussed individually. It's really difficult to grasp the value of this restructuring as a whole. What we had wasn't not working, and I can't yet see that the potential would be doubled with the proposed changes. Stefen Towers among the rest!   Gab • Gruntwerk 21:26, 12 April 2024 (UTC)
 * It's just my proposed solution; the overall problem is that while the page mentions sorting listings into sections after a year, it doesn't appear that that is occurring in an organized manner on a consistent basis, making it difficult to interpret what is or isn't present without using the browser's search function for individual words. I don't mean to be blunt, but given that an effective reorganization would involve moving hundreds of lines to new or existing sections, discussing the moving of any individual line would be missing the forest for its trees. While moving any one line has no inherent value on its own, and should only be done if it functions identically when sorted as it did initially, the value of the sorted list as a whole is the ability to see what functions are currently unaccounted for, particularly for common typos that one might not have considered otherwise.
 * My intention is to help, not harm the project, so until consensus on the matter is reached I'll stick to just organizing any replacement rules I add myself to subcategories of the existing New additions section. However, given that organization is already listed as being part of the list-making process, as long as new organization keeps rules above the General rules section to avoid rule interference, it seems a shame to forgo the benefits of a sorted list for the sake of avoiding change for change's sake. CoolieCoolster (talk) 21:56, 12 April 2024 (UTC)
 * There is no "avoiding change for change's sake" going on here. I asked for a detailed explanation of the specific x-to-y changes. That's all. I already have read your overall contention about the structure. I'm not inclined to agree to such massive changes unless and until I (and hopefully others) can understand that they make sense. I am not against change or improvement. Stefen Towers among the rest!   Gab • Gruntwerk 22:01, 12 April 2024 (UTC)
 * Need a project approach here. Firstly, is the proposal a good idea?  Namely, create a redefined list and clean up entries.  In the interests of consensus, yes. Sure, what's there works; however, it can be tricky to navigate and there is duplication.  Next, how to approach it?  It makes sense to me that a parallel (new) list is built on a subpage and material copied to it, removing the potential for harm to the active page.  A list of editors willing to be involved should be obtained on that page and a work list created where editors can put their name against specific components and thus spread the load.  A lot of the work may be sheer copy/paste, which if divvied up won’t be so onerous.  I would suggest that any entries that in the interim are added or revised in the active list have a date stamp against them (that seems to be happening) and perhaps the editor’s guesstimate as to classification against the new list. There are other aspects that will need addressing (testing, etc.); however, at this stage there is not much point in writing a book. Neils51 (talk) 01:33, 13 April 2024 (UTC)
 * To make sure that there are enough people who both think list restructuring would be worthwhile and are willing to help with the restructure on a parallel page (enabling everyone involved to review it so that a consensus on a functional list can be established), I'll make a signup list below. Per my statements above, it won't involve making any modifications to the current list until consensus can be reached that the structure of the parallel list meets the needs of all users involved. I think having at least three people willing to work on the list would help in splitting the workload and ensuring that the structure is a product of consensus and not unilateral editing. CoolieCoolster (talk) 22:57, 15 April 2024 (UTC)
 * 1. CoolieCoolster (talk) 22:57, 15 April 2024 (UTC)
 * 2. Neils51 (talk) 09:03, 16 April 2024 (UTC)
 * 3.


 * I have doubts most folks who use the typo list are looking at the structure. They're just pulling the whole lot into AWB and using them for typo hunting. At any rate, I'd be happy to review what's done and offer feedback. My main concern is the typo rules themselves staying intact in the result and that the list is no less readable than it is now. I didn't put my name on the list because I don't entirely agree with the premises as stated and my time is eaten up with too many other things. Stefen Towers among the rest!   Gab • Gruntwerk 06:35, 18 April 2024 (UTC)
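
The duplicate-coverage concern raised earlier in this thread (km² handled by its own listing and again by one that also covers m² and cm²) is the sort of thing a small script could surface on a parallel page. The sketch below is in Python; the two patterns, the rule labels, and the helper name `overlapping_rules` are assumptions for illustration and are not entries from the typo list.

```python
import re

# Hypothetical stand-ins for two rules that cover the same text.
RULES = {
    "km² only": re.compile(r"\bkm²"),
    "m², cm², km²": re.compile(r"\b[ck]?m²"),
}

SAMPLES = ["5 km²", "3 m²", "2 cm²"]


def overlapping_rules(rules, samples):
    """Map each sample to the rule names matching it, if more than one does."""
    overlaps = {}
    for sample in samples:
        hits = [name for name, pattern in rules.items() if pattern.search(sample)]
        if len(hits) > 1:
            overlaps[sample] = hits
    return overlaps


if __name__ == "__main__":
    for sample, names in overlapping_rules(RULES, SAMPLES).items():
        print(sample, "is matched by:", ", ".join(names))
```

A check like this only reports overlap on the samples it is given, so it would complement a manual review of the list, not replace it.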

Misspelled CNN reporter name
I don't know if this is the right place for this, but there are 47 articles that misspell a CNN reporter's name in references as Arinne de Vogue instead of Ariane de Vogue. Annoyedhumanoid (talk) 02:35, 10 June 2024 (UTC)