User:Brighterorange/punctuationtest

Spelling (puSPELL)
We test for a few common spelling errors. If you write "seperately" or "seperate" then we'll fix that. How embarassing!
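Here is a minimal Python sketch of this kind of check, assuming a plain regex pass over the wikitext. The name `fix_spelling` and its three-word table are hypothetical, not the script's real list:

```python
import re

# Hypothetical, deliberately tiny table of misspelling -> correction.
COMMON_MISSPELLINGS = {
    "seperate": "separate",
    "seperately": "separately",
    "embarassing": "embarrassing",
}

# Whole words only, so longer words that merely contain a key are left alone.
_PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, COMMON_MISSPELLINGS)) + r")\b",
    re.IGNORECASE,
)


def fix_spelling(text):
    def repl(m):
        word = m.group(0)
        fixed = COMMON_MISSPELLINGS[word.lower()]
        # Keep an initial capital, e.g. at the start of a sentence.
        return fixed[0].upper() + fixed[1:] if word[0].isupper() else fixed
    return _PATTERN.sub(repl, text)
```

A word list like this only ever catches the errors someone thought to put in it, which is why the section says "a few common" ones.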

Commas (puCOMMA)
If you have commas normally, then nothing happens.

If you have commas with no spaces,then we fix.

If you have too many spaces, then we also fix.

Also this strange combination ,is fixed too.

But inside a link like http://news.agency/article/2007,a100,b1,,c3.stm or or named link we shouldn't add spaces,although after such links we still should. There are other false positives like 2,3,3-trimethylpentane,but these are pretty rare.

We don't bother with a comma,    followed by extra space,   because that doesn't affect the page rendering. However,   if the comma is already being fixed, we do normalize the space.
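A minimal sketch of the comma rules above, assuming regexes over the wikitext with bare `http(s)://` tokens copied through untouched. `fix_commas` and the pattern names are hypothetical. A space is only added before a letter, which keeps "1,000" and "2,3,3-..." intact:

```python
import re

# External URLs; commas inside them (".../2007,a100,b1,,c3.stm") are kept.
URL = re.compile(r"https?://[^\s\]]+")
# Whitespace before a comma ("spaces , then", "combination ,is"): drop it and
# normalize whatever follows to exactly one space.
SPACE_BEFORE = re.compile(r"[ \t]+,[ \t]*(?=\S)")
# A comma glued to the next letter ("spaces,then"); digits are not letters,
# so "1,000" and "2,3,3-trimethylpentane" are not split.
NO_SPACE_AFTER = re.compile(r",(?=[^\W\d_])")


def _fix_plain(text):
    text = SPACE_BEFORE.sub(", ", text)
    return NO_SPACE_AFTER.sub(", ", text)


def fix_commas(text):
    """Fix comma spacing everywhere except inside URLs."""
    out, pos = [], 0
    for m in URL.finditer(text):
        out.append(_fix_plain(text[pos:m.start()]))
        out.append(m.group(0))  # copy the URL through unchanged
        pos = m.end()
    out.append(_fix_plain(text[pos:]))
    return "".join(out)
```

A correct comma followed by extra spaces matches neither pattern, so it is left alone as described. Chemical names with a letter after the comma (say "N,N-dimethyl...") would still be false positives.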

Semicolons (puSEMICOLON)
We treat semicolons just like commas;there must be a space after and not before.

Semicolon is also used for HTML entities ; this means that we should ignore it in certain scenarios&mdash;like when used to create dashes.
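One way to ignore entities is to match them as a separate alternative and copy them through. This is a sketch under that assumption. `fix_semicolons` is a hypothetical name, and unlike a full rule it does not protect URLs:

```python
import re

# The scan reaches the "&" of an entity such as "&mdash;" before its ";",
# so the entity alternative consumes it whole and it is never "fixed".
SEMICOLON = re.compile(
    r"(?P<entity>&#?[A-Za-z0-9]+;)"  # HTML entity: copy through unchanged
    r"|[ \t]+;[ \t]*(?=\S)"           # space before the semicolon
    r"|;(?=[^\W\d_])"                 # semicolon glued to the next word
)


def fix_semicolons(text):
    def repl(m):
        if m.group("entity"):
            return m.group("entity")
        return "; "
    return SEMICOLON.sub(repl, text)
```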

En dashes (puENDASH)

 * (missing test cases for en dashes)

Don't miss the case where there's a reference with something like:

We should also avoid en dashes within links like http://news.agency/article/2007-2008-1-2-3.stm or or named link we shouldn't convert to en dashes but 10-20 doctors agree that we should still do it after such links. Links to pages like 1956-57 in English football shouldn't get en dashes, even if they are piped, but if the pipe part contains a range like 1956-1957, then they should. Same goes for templates, e.g. or ; however, the template arguments themselves should be en dashed!
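A minimal sketch of the protection rules, assuming only simple number-number ranges are converted. URLs, wikilink targets (everything before the `|`), and template names are matched first and copied unchanged. Piped link text and template arguments are still processed. `fix_en_dashes` is hypothetical, and the reference case mentioned above is not handled:

```python
import re

EN_DASH = "\u2013"

TOKEN = re.compile(
    r"(?P<keep>https?://[^\s\]|}]+"   # external URL
    r"|\[\[[^|\]]*"                    # wikilink target, up to | or ]]
    r"|\{\{[^|}]*)"                    # template name, up to | or }}
    # A digits-hyphen-digits range not embedded in a longer chain, so ISO
    # dates such as 2007-08-15 are left alone.
    r"|(?<![\w-])(?P<a>\d+)-(?P<b>\d+)(?![\w-])"
)


def fix_en_dashes(text):
    def repl(m):
        if m.group("keep") is not None:
            return m.group("keep")
        return m.group("a") + EN_DASH + m.group("b")
    return TOKEN.sub(repl, text)
```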

Link syntax (puLINKSPACE)
The only thing we do with links is remove trailing spaces. Those should never be there. We don't touch regular links.

Also the syntax is a common idiom for using the space character as a sort key in a category, which makes the page show up first in the category list. We exclude this case from the fix.
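A sketch of the trailing-space rule, assuming the sort-key idiom has the form `[[Category:Foo| ]]`. That form is our assumption, because the syntax itself is missing from the text above. The spaces are only removed when the character before them is neither whitespace nor `|`. `fix_link_spaces` is a hypothetical name:

```python
import re

# Spaces right before "]]", but only after a character that is neither
# whitespace nor "|", so a lone-space sort key "| ]]" survives.
TRAILING_LINK_SPACE = re.compile(r"(?<=[^\s|]) +\]\]")


def fix_link_spaces(text):
    return TRAILING_LINK_SPACE.sub("]]", text)
```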

Born (puBORN)
Some other encyclopedias use the abbreviation "b." for "born", which means that we sometimes see (b. 1979) in biographical articles. The manual of style endorses (born 1979). But we shouldn't touch, say, "A.b.c.'s" or "slab."
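A minimal sketch of the "b." rule, assuming we only rewrite "b." when a year follows. The lookbehind rejects a "b" that is part of a word ("slab.") or a dotted abbreviation ("A.b.c."). `fix_born` is hypothetical:

```python
import re

# "b." not preceded by a letter/digit or a dot, followed by a year.
BORN = re.compile(r"(?<![\w.])b\. ?(?=\d)")


def fix_born(text):
    return BORN.sub("born ", text)
```

Forms like "(b. March 3, 1979)" are deliberately not covered by this sketch.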

Decades
Sometimes people write 1980's instead of 1980s. They shouldn't do that. But we shouldn't touch something like 20's (or should we?) or.
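A sketch that only touches four-digit decades, which is one possible answer to the "20's" question. `fix_decades` is a hypothetical name. A genuine possessive such as "1980's election" would still be a false positive:

```python
import re

# Four-digit years ending in 0, followed by "'s".
DECADE = re.compile(r"\b(\d{3}0)'s\b")


def fix_decades(text):
    return DECADE.sub(r"\1s", text)
```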

Parentheses
A common error is to forget space (of whatever sort)after a closing parenthesis. Sometimes people put stray spaces (who knows why ) before closing parentheses too. But there needn't be space unless the next thing is a word (so we should detect this).
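A minimal sketch of both parenthesis rules, assuming "the next thing is a word" means "the next character is a letter". `fix_parens` is hypothetical. Note that "(s)he" would be a false positive:

```python
import re

# Stray spaces or tabs just before a closing parenthesis.
SPACE_BEFORE_CLOSE = re.compile(r"[ \t]+\)")
# A closing parenthesis glued to a following letter.
MISSING_SPACE_AFTER = re.compile(r"\)(?=[^\W\d_])")


def fix_parens(text):
    text = SPACE_BEFORE_CLOSE.sub(")", text)
    return MISSING_SPACE_AFTER.sub(") ", text)
```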

XHTML
Empty tags in XHTML should have a slash before the closing &gt;. Probably the most common tag in Wikipedia articles is &lt;br/&gt;. So we should turn linebreaks without the slash into proper XHTML. This should happen even if the TAGSARECAPITALIZED.
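A sketch of the line-break rule, assuming every bare or already-closed `br` tag is rewritten to the lowercase `<br/>` spelling used above, whatever its case. The output spelling and the name `fix_br` are our assumptions:

```python
import re

# <br>, <BR>, <br/>, <br /> ... in any case.
BR = re.compile(r"<br\s*/?\s*>", re.IGNORECASE)


def fix_br(text):
    return BR.sub("<br/>", text)
```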

City-State (puCITYSTATE)
It's common to see New York, New York or Pittsburgh, Pennsylvania when Pittsburgh, Pennsylvania looks nicer and is more usable. Even with the city-state template it's a pain to do this, though. We can transform this automatically for US states. Since it currently uses the pipe trick, it does not work properly in references—so be careful! Also, false positives for images and category links: Additionally, when linking to cities in Georgia, like Athens, Georgia, we need to link to Georgia (U.S. state) since Georgia is a disambiguation page.
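A hedged sketch of the transformation. It assumes the input is two adjacent links of the form `[[City]], [[State]]`, since the page's link markup is not shown here. Unlike the pipe trick described above, this version writes the piped link out explicitly, which avoids the problem inside references. Links containing a colon (images, categories) never match. The state table is a partial, hypothetical sample, and `fix_city_state` is not the script's real code:

```python
import re

# Partial sample; a real table would list every US state.
STATE_ARTICLES = {
    "Pennsylvania": "Pennsylvania",
    "Ohio": "Ohio",
    "Georgia": "Georgia (U.S. state)",  # "Georgia" is a disambiguation page
}

# No "[", "]", "|" or ":" inside either link, so File:/Category: links
# and already-piped links are skipped.
CITY_STATE = re.compile(r"\[\[([^\[\]|:]+)\]\], \[\[([^\[\]|:]+)\]\]")


def fix_city_state(text):
    def repl(m):
        city, state = m.group(1), m.group(2)
        if state not in STATE_ARTICLES:
            return m.group(0)
        target = STATE_ARTICLES[state]
        state_link = f"[[{state}]]" if target == state else f"[[{target}|{state}]]"
        return f"[[{city}, {state}|{city}]], {state_link}"
    return CITY_STATE.sub(repl, text)
```

It would still misfire on a city in the country of Georgia written the same way, so the output needs review.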

Reference tags (puREF)
According to the manual of style for references, references should follow punctuation (other than dashes) unless a smaller particle (for example, an individual term) is what the reference binds to. A reference can take many forms ; like it can have parameters or be XHTML empty. There shouldn't be any space before references because that may cause the reference to wrap to the next line, and the same is true if there are multiple references in a row. Also because references are long in code, sometimes people accidentally put punctuation both before and after the reference! ! Sometimes people don't put space after a reference, which looks weird.

Finally, it's surprisingly common to leave off punctuation entirely at the end of a line after a reference
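A sketch of the reference rules, assuming three regex passes. The first pass drops spaces before a `<ref`. The second moves sentence punctuation (not dashes) from after a run of references to before it, keeping only one mark when the same mark appears on both sides. The third adds a space when a letter follows a reference. All names are hypothetical. The end-of-line case is not handled, because guessing the missing punctuation is riskier:

```python
import re

# One reference: self-closing (<ref name="a"/>) or with a body. The body
# may not contain "</ref>", so a single match never spans two references.
REF = r"<ref(?:\s[^>]*?)?/>|<ref(?:\s[^>]*)?>(?:(?!</ref>).)*</ref>"
PUNCT = r"[.,;:!?]"  # dashes are deliberately excluded

SPACE_BEFORE_REF = re.compile(r"[ \t]+(?=<ref[\s/>])")  # not <references/>
MOVE_PUNCT = re.compile(
    rf"(?P<before>{PUNCT}?)(?P<refs>(?:{REF})+)(?P<after>{PUNCT})", re.DOTALL
)
MISSING_SPACE_AFTER = re.compile(rf"((?:{REF}))(?=[^\W\d_])", re.DOTALL)


def _move(m):
    before, refs, after = m.group("before"), m.group("refs"), m.group("after")
    if not before:
        return after + refs   # "fact<ref/>."  -> "fact.<ref/>"
    if before == after:
        return before + refs  # "fact!<ref/>!" -> "fact!<ref/>"
    return m.group(0)         # mixed marks: leave for a human to decide


def fix_refs(text):
    text = SPACE_BEFORE_REF.sub("", text)
    text = MOVE_PUNCT.sub(_move, text)
    return MISSING_SPACE_AFTER.sub(r"\1 ", text)
```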