User:Art LaPella/AWB explanation

This page explains my AWB editing. It does not specifically explain the edit whose edit summary you probably just clicked, nor does it explain AWB itself. It explains all of my similar edits in a general way.

Why don't I let AWB list all of my changes in its edit summary, as most AWB users do? Because most of them usually wouldn't fit in the summary. And also because most of my changes would be meaningless without understanding how I programmed my AWB Find and Replaces. AWB doesn't provide an easy way to program exceptions to a Find and Replace. As a workaround, I change the exceptions by adding YwBsF and other meaningless code words, so the main Find and Replace won't Find them. After that main Find and Replace, I change YwBsF back again. Then the process repeats for another Find and Replace and its exceptions. It works for me, but it doesn't work in an edit summary to see all the text changed to YwBsF and back again.
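
The workaround described above can be sketched outside AWB. The Python below is only an illustration, not the actual AWB configuration (which is at User:Art LaPella/AWB list): the function name, the numbered token format, and the "USA Today" exception are all assumptions made for the example.

```python
import re

# "YwBsF" is the code word mentioned above; numbering the tokens is an assumption.
CODE_WORD = "YwBsF"


def replace_with_exceptions(text, pattern, replacement, exceptions):
    """Run one Find and Replace while leaving the listed exceptions untouched."""
    hidden = {}
    # Step 1: swap each exception for a meaningless token the pattern is unlikely to match.
    for index, exception in enumerate(exceptions):
        token = f"{CODE_WORD}{index}{CODE_WORD}"
        hidden[token] = exception
        text = text.replace(exception, token)
    # Step 2: the main Find and Replace.
    text = re.sub(pattern, replacement, text)
    # Step 3: change the tokens back into the original exceptions.
    for token, exception in hidden.items():
        text = text.replace(token, exception)
    return text


sample = "He moved to the USA and read USA Today."
print(replace_with_exceptions(sample, r"\bUSA\b", "US", ["USA Today"]))
# prints: He moved to the US and read USA Today.
```

A diff taken between steps 1 and 3 would show every exception turning into a token and back again, which is why such intermediate changes would be meaningless in an edit summary.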

Most of the changes in my edits are based on Manual of Style and its subpages. I don't try to enforce all of the Manual of Style, because most of the manual is too subjective to be automated with AWB. Even the parts I do automate have frequent exceptions, so this project could not be accomplished with a bot. I also do AWB's "general fixes" and "RegexTypoFixes", correct User:Art LaPella/Citation template double period bug problems, and do other proofreading such as changing "pages 14" to "page 14". My AWB "Find and Replace" selections are listed at User:Art LaPella/AWB list, but they aren't as easy to read as they are within AWB.

Not counting the general fixes and RegexTypoFixes linked in the previous paragraph, below I have listed my most frequent (as of July 2010) changes, along with a link to the Manual of Style guideline or other rule that I am trying to enforce. If I have applied the Manual of Style too mechanically to a special situation that the guideline wasn't intended to cover, you could be right. But if you believe the Manual of Style guideline I'm using is always wrong, or if you wonder why I care at all about things like the difference between hyphens and dashes, then please see User:Art LaPella/Because the guideline says so.

Nowrap and nbsp
The Nowrap template and &nbsp; are used when you don't want a line to end in the middle of an expression like 17 kg. Most of my Nowraps and nbsps are specified by the WP:NBSP guideline, including this elaboration at MOS:NUM. The guideline also repeatedly calls for nbsps or Nowraps in any "places where breaking across lines might be disruptive to the reader". But since I learned to read from one line to the next at a very young age, I have trouble imagining any such places where the reader would have trouble following from one line to the next. So rather than guess what other places might be intended, I use nbsps or Nowraps only in places that closely resemble the examples given in the MOS:NUM elaboration of the WP:NBSP guideline, with two exceptions covered by other guidelines. First, £11 billion is listed as an example, but not 11 billion without any currency symbol; however, that situation is covered near the end of MOS:NUM, where what is now the third bullet point from the end begins, "When both a figure ...". The other exception is the nbsp before three dots; see WP:ELLIPSIS where it says, "To keep the ellipsis from wrapping to the next line ...".

Most of the Nowraps I add are to dates in citations, such as the date= and accessdate= parameters. The guideline specifies "12 November" as an example of a good place for an nbsp or Nowrap, but they probably had normal text in mind when they said "disruptive to the reader", not citations. However, 1) a Nowrap doesn't hurt a citation; 2) unlike human editors, AWB software doesn't get tired of adding Nowraps throughout an article; 3) it would actually take more programming to tell AWB to only change dates that aren't in a citation; and 4) the drawback of Nowrap or nbsp, which is that it interferes with editing by making the edit page less readable, is also less important in a citation because the editor is likely to skip over the entire citation when trying to comprehend the edit text. Removed after objections.

Why do I sometimes use Nowrap instead of nbsp? It has been argued that Nowrap is easier for newbies to understand (not everyone knows or remembers what nbsp stands for). Nowrap is recommended at the elaborated version of WP:NBSP. So if you don't like Nowrap, please discuss or change MOS:NUM so the rest of us can share in your wisdom. Once again, see User:Art LaPella/Because the guideline says so.

You might think that adding &nbsp; into a wikilink like World War II would make it a redlink. But that link is blue for me, despite the nbsp you can see in edit mode. Nobody using different browsers or operating systems has ever complained on this issue. Previous discussion here.

I don't use nbsp or Nowrap near the beginning of a paragraph. Such an nbsp would not matter unless the reader were using a very narrow window or a very large font size. A couple of editors have opposed an nbsp in places where it would normally have no effect.

Hyphens, dashes, and minus signs
What's the difference? Look closely: - – — − They each look different. The first is a hyphen, the second is an en dash, the third is an em dash, and the fourth is a minus sign. If I changed one of those four lines to another, or if I added or removed spaces before and after, it's because of WP:DASH. The end of WP:HYPHEN also helps explain it. Who cares about obscure guidelines? Again, see User:Art LaPella/Because the guideline says so.

Here are the most common situations in this category. A hyphen with a space before and after is often used to mean something like a comma. According to the guideline, any such hyphen should be changed to an en dash (the guideline also allows em dashes if the spaces are removed). A hyphen without spaces often occurs in phrases like "The Civil War, 1861-1865" or "pages 14-16". Those hyphens should also be en dashes. More often than not, these hyphens occur in the titles of references. So if a cited book has a title like The Civil War, 1861-1865, do we change the hyphen even though that may change the title? This was discussed here, and someone reinstated my dashes again afterwards. Removed after an objection. I still change hyphens and similar punctuation in web page titles, but not books. If you're looking for a web page, you'll probably use the URL, or you'll resort to the Wayback Machine which asks for the URL, or you might even look up the web page's title using a search engine like Google – but any search engine we've checked ignores punctuation. But if you're looking for a book (that is, anything on paper rather than the Internet), you're likely to use WorldCat, which doesn't work if the punctuation doesn't match, as of August 2010.

For better or worse, I don't replace hyphens with dashes in dates of the form mm-dd-yyyy, which are usually the date that a reference was retrieved. My thought was that such dates are more likely to be updated, and an editor updating that date is unlikely to take the trouble to use dashes instead of hyphens. That would result in a mixture of dashes and hyphens in the retrieved dates.
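
As a rough illustration of the cases in this section, the Python sketch below turns a spaced hyphen and a simple numeric range into en dashes while leaving anything with more than one hyphen, such as an mm-dd-yyyy date, alone. The patterns and the function name are assumptions for the example. They are not the real AWB rules, and they ignore the many exceptions (book titles, for instance) that are handled by hand.

```python
import re

EN_DASH = "\u2013"

# Two numbers joined by one hyphen, with no further digit or hyphen on either
# side. The lookarounds skip dates like 08-15-2010 and other multi-hyphen strings.
SIMPLE_RANGE = re.compile(r"(?<![\d-])(\d+)-(\d+)(?![\d-])")


def fix_dashes(text):
    """Replace hyphens with en dashes in simple ranges and between spaces."""
    # "1861-1865" or "14-16" becomes "1861–1865" or "14–16".
    text = SIMPLE_RANGE.sub(
        lambda match: f"{match.group(1)}{EN_DASH}{match.group(2)}", text
    )
    # A hyphen with a space before and after becomes a spaced en dash.
    text = text.replace(" - ", f" {EN_DASH} ")
    return text


print(fix_dashes("The Civil War, 1861-1865 - see pages 14-16"))
# prints: The Civil War, 1861–1865 – see pages 14–16
print(fix_dashes("Retrieved 08-15-2010"))
# prints: Retrieved 08-15-2010
```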

Removing wikilinks
Wikilinks like United States should usually be unlinked according to this guideline. The exact list of links that my software removes can be found by searching User:Art LaPella/AWB list for a word like "Asia". The list was chosen conservatively; that is, more links should probably be unlinked, but hopefully we can agree that at least this list of links is covered by WP:OVERLINK. A phrase like "India and Pakistan fought a war" might be considered unfair to either India or Pakistan if I left one linked but not the other, so in such a case I would probably unlink both.

Commas
See WP:COPYEDIT. Sometimes I rearrange a sentence rather than insert another comma, for example when the comma would fall almost at the end of the sentence. I also make sure each comma has a space after it and no space before it.
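
The spacing part of this rule is mechanical enough to sketch in Python. This is an illustration only: the function name is hypothetical, and skipping commas that are followed by a digit (so that numbers like 1,000 survive) is an assumption, not a documented part of the AWB settings.

```python
import re


def fix_comma_spacing(text):
    """Remove spaces before commas and add a missing space after them."""
    # "red ,green" becomes "red,green".
    text = re.sub(r" +,", ",", text)
    # Add a space only when a letter follows directly, so "1,000" is untouched.
    text = re.sub(r",(?=[A-Za-z])", ", ", text)
    return text


print(fix_comma_spacing("red ,green,blue and 1,000"))
# prints: red, green, blue and 1,000
```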

Curly quotes
See the Quotation characters paragraph of WP:MOS.
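
For illustration, replacing curly quotation marks and apostrophes with straight ones can be sketched as a character translation table. The function name is hypothetical, and the sketch makes no attempt to skip places where a curly character might be intentional.

```python
# Map each curly quotation character to its straight equivalent.
CURLY_TO_STRAIGHT = str.maketrans({
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark or curly apostrophe
})


def straighten_quotes(text):
    """Return text with curly quotes replaced by straight quotes."""
    return text.translate(CURLY_TO_STRAIGHT)


print(straighten_quotes("\u201cDon\u2019t\u201d"))
# prints: "Don't"
```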

Removing periods from captions
See WP:MOS.

Contractions
See WP:CONTRACTION.

USA
With frequent exceptions, it should usually be US or U.S. not USA, according to WP:MOS: "Do not use U.S.A. or USA, except ..."

Uncapitalization in headers
Only the first word and proper nouns should be capitalized, whether the header is an article header (see WP:Manual of Style (capital letters)) or a table header (see WP:Manual of Style (tables)).

24th, not 24th with a superscripted suffix
See MOS:NUM.

7 kg with a (non-breaking) space, not 7kg
Unit symbols paragraph in MOS:NUM: "use 10 m or 29 kg, not 10m or 29kg."
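
A minimal Python sketch of this rule follows. The short list of unit symbols and the function name are assumptions for the example; the real Find and Replace entries are at User:Art LaPella/AWB list.

```python
import re

NBSP = "&nbsp;"

# Illustrative unit symbols only; longer symbols come first so "mm" wins over "m".
UNITS = ("kg", "km", "cm", "mm", "m", "g")

# A digit, an optional ordinary space, then a unit symbol that ends at a word boundary.
UNIT_PATTERN = re.compile(r"(\d) ?(" + "|".join(UNITS) + r")\b")


def space_units(text):
    """Put a non-breaking space between a number and its unit symbol."""
    return UNIT_PATTERN.sub(
        lambda match: f"{match.group(1)}{NBSP}{match.group(2)}", text
    )


print(space_units("use 10 m or 29kg"))
# prints: use 10&nbsp;m or 29&nbsp;kg
print(space_units("5 million"))
# prints: 5 million
```

The word boundary after the unit keeps ordinary words such as "million" or "grams" from being mistaken for a unit symbol.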

ALL CAPITAL LETTERS
See WP:ALLCAPS.

Remove phrases like "Note that ..." or "Clearly ..."
See MOS:NOTED.

&
If I changed "&" to "and", see WP:&. This happens most often in a citation in a list of authors. That is a debatable interpretation of WP:&, because citations aren't exactly "running prose", but neither are they a place where "space is limited" because the reader has no reason to look down at the references unless he clicks one of them.

If I changed "and" to "&", it is because my program looks for a list of companies like Barnes & Noble. That company's article, website and logo consistently say "Barnes & Noble", not "Barnes and Noble".

Typos I noticed
AWB allows manual editing at the same time, if I see something obviously wrong.

Page, pages, pp., p., etc.
No guideline requires multiple pages to be "pages 14–17" not "page 14–17", but that is common sense if English is your native language. Similarly, I didn't find a guideline requiring "p." only for single pages and "pp." only for multiple pages, but dictionaries agree that "p." means "page" and "pp." means "pages", and I didn't find other publications confusing the two (with or without the periods or spaces afterwards).
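
The singular and plural checks can be sketched as four regular expression replacements in Python. The patterns and names below are assumptions for illustration and are deliberately narrow: a number followed by a comma, a dash, or " and" is treated as possibly one of several pages and left alone.

```python
import re

EN_DASH = "\u2013"

# A page range such as " 14–17" or "14-17".
RANGE = rf" ?\d+[-{EN_DASH}]\d+"
# A single page number, not followed by anything that suggests more pages.
SINGLE = rf" ?\d+(?![\d,{EN_DASH}-]| and\b)"


def fix_page_words(text):
    """Match "p."/"page" to single pages and "pp."/"pages" to ranges."""
    text = re.sub(rf"\bp\.(?={RANGE})", "pp.", text)
    text = re.sub(rf"\bpp\.(?={SINGLE})", "p.", text)
    text = re.sub(rf"\bpage(?={RANGE})", "pages", text)
    text = re.sub(rf"\bpages(?={SINGLE})", "page", text)
    return text


print(fix_page_words("pages 14"))
# prints: page 14
print(fix_page_words("p. 14-17"))
# prints: pp. 14-17
```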

From x to y
"From x–y" should be "from x to y", and "between x–y" should be "between x and y", according to WP:ENDASH.

Even less frequent changes
My software looks for a list of problems much longer than this explanation page. If a change isn't listed above, and it isn't listed with AWB's general fixes and RegexTypoFixes, it's probably based on some obscure Manual of Style guideline, similar to the sections above. Or if you can figure it out, you might look for it in my AWB Find and Replaces listed at User:Art LaPella/AWB list.

Changes I don't make
Other AWB users make AWB edits that I could easily add to my software if there were a consensus. But where the Manual of Style is ambivalent on an issue, I don't see how you can argue that the consensus really exists. If it does, why don't you change the guideline? Specifically:


 * Removing double spaces between words, or double spaces after a period. MOS:FULLSTOP
 * A space after the equal signs in a heading. WP:HEAD
 * Changing &ndash; and &mdash; to – or — despite no mention in the Manual of Style. Some want to reverse that process.
 * "Fixing" redirects that aren't broken. Several automated processes promote this despite WP:R2D, and there is frustratingly little interest in harmonizing the guideline with actual practice.
 * WP:UNLINKDATES. That is in the Manual of Style, and I might be persuaded to include it in my editing. But in the past, that issue has generated more wikidrama than it is worth, so I leave that to the wikiwarriors who enjoy that sort of thing.

More automation
So why don't I fix some of this stuff by changing templates or using a bot? Probably for the same reasons you aren't volunteering.

If you mean do it myself, the expected level of vanity wildly underestimates the difficulty of learning new software, and I'd rather clean up Wikipedia than study software. Could anybody even direct me to a manual for programming a bot, and examples of how to interface it with Wikipedia?

If you mean organize others to do it, I've had no success herding cats on Wikipedia. Just getting agreement on specifications is usually impossible. Getting anyone to fix User:Art LaPella/Citation template double period bug, among the most widespread typos on Wikipedia, has also been impossible. I meet plenty of people who make the perfect the enemy of the good, but they don't want to solve the problem. If I want something done, I do it myself, and occasionally my example inspires some imitation.