User:Proteins/Prosesize script acid test

A pathological article for testing prose-size scripts. Note extra line below, which should not be counted.

Bad heading level H4
Stetsonville is located at 45.07639°N, 90.31389°W (45.076413, -90.313952).

According to the United States Census Bureau, the village has a total area of 0.4 square miles (1.0 km²).

Oak Park Mall is open from 10 a.m. to 9 p.m. Monday through Saturday and 11 a.m. to 6 p.m. on Sundays. The mall contains nearly 200 stores and the mall area is 1,500,000 square feet, making it the largest mall in the Kansas City Metro Area.

Section 1: Blank lines:
Make this section harder.

Really, really hard.

Section 2: Lists
Sometimes an unordered list is useful...


 * in giving various cases
 * and some subcases
 * such as these
 * which may have subcases
 * in discussing cases with complicated logic

An ordered list


 1) Here's the first item.
 2) Here's the second.
 3) And here's the third.

A discursive list

 First term: Describes the first item.
 Second item: Describes the second item.
 Third term: Describes the third item.

Section 3: Punctuation and special characters
Em-dashes&mdash;which are very common&mdash;are used in many hyphen–hungry texts, but – sparingly. Two words separated by an unspaced em-dash should count as two words, not one. However, two things separated by an unspaced en-dash should generally be counted as a single compound word, as in blood–brain barrier; see WP:MOS.
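The word-counting rule described above can be sketched as follows. This is only an illustration under stated assumptions, not code taken from any prose-size script: the function name `countWords` is hypothetical, the input is assumed to be plain text in which the only entity still encoded is `&mdash;`, and a token counts as a word only if it contains at least one letter or digit.

```javascript
// Hypothetical helper illustrating the dash rule; not taken from articlestructure.js.
function countWords(text) {
  // An em-dash separates words whether or not it is spaced.
  const normalized = text.replace(/&mdash;|\u2014/g, " ");
  // JavaScript's \s also matches the non-breaking space (U+00A0).
  const tokens = normalized.split(/\s+/);
  // Keep tokens that contain a letter or digit, so a spaced dash standing
  // alone is not counted, while an unspaced en-dash or hyphen stays inside
  // its compound word.
  return tokens.filter((token) => /[\p{L}\p{N}]/u.test(token)).length;
}

console.log(countWords("very common\u2014are used"));   // 4
console.log(countWords("blood\u2013brain barrier"));    // 2
console.log(countWords("but \u2013 sparingly"));        // 2
```

Treating the em-dash as whitespace before splitting keeps the rule in one place, and the en-dash case needs no special handling at all.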

&amp; &amp; &amp; These ampersands should count as one character each, not five.
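One way to get this character count is to decode entities before measuring length. The sketch below is a hedged illustration: `decodeEntities`, `countCharacters` and the `NAMED_ENTITIES` table are hypothetical names, and the table covers only a handful of the named entities used in this article. A real script running in a browser would more likely let the DOM do the decoding.

```javascript
// Hypothetical, deliberately incomplete table of named entities.
const NAMED_ENTITIES = {
  amp: "&", lt: "<", gt: ">", quot: '"',
  mdash: "\u2014", copy: "\u00A9", euro: "\u20AC", pound: "\u00A3",
};

function decodeEntities(text) {
  return text.replace(/&(#\d+|#[xX][0-9a-fA-F]+|[A-Za-z]+);/g, (match, body) => {
    if (body[0] === "#") {
      const isHex = body[1].toLowerCase() === "x";
      const code = isHex ? parseInt(body.slice(2), 16) : parseInt(body.slice(1), 10);
      // Leave out-of-range references untouched instead of throwing.
      return code <= 0x10ffff ? String.fromCodePoint(code) : match;
    }
    // Unknown names are left as written rather than guessed.
    return Object.prototype.hasOwnProperty.call(NAMED_ENTITIES, body)
      ? NAMED_ENTITIES[body]
      : match;
  });
}

function countCharacters(text) {
  // Spread by code point so each decoded character counts once.
  return [...decodeEntities(text)].length;
}

console.log(countCharacters("&amp; &amp; &amp;")); // 5
```

The three ampersands and two spaces give five characters, where a count on the raw markup would give seventeen.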

One Two Three Four Five These are five words separated by non-breaking spaces, not one word.

Other special characters should count as one character each:

&#247; &#60; &#62; &#63; &#38; &amp; &#47; &quot; &rdquo; &euro; &#32; &pound; &#167; &uarr; &#46; &#40; &#41; &iquest; &spades; &#61; &#58; &#59; &copy;

&amp; &amp; &cent; &copy; &Agrave; &ntilde; &#174; &oslash; &mdash; – &micro; &dagger; &Dagger; &#64;

The last six characters of this line are divided into two words of three characters each by a &#32; space character:

&#91; &#92; &#93; &#94; &#95; &quot;&rdquo;&euro;&#32;&pound;&#167;&uarr;

Section 4: Blockquotes, cquotes and tables
"whether 'tis nobler in the mind"

The Germans have four violin concertos. The greatest, most uncompromising is Beethoven's. The one by Brahms vies with it in seriousness. The richest, the most seductive, was written by Max Bruch. But the most inward, the heart's jewel, is Mendelssohn's.

"The Al/air battery system can generate enough energy and power for driving ranges and acceleration similar to gasoline powered cars...the cost of aluminum as an anode can be as low as US$ 1.1/kg as long as the reaction product is recycled...Only the Al/air EVs can be projected to have a travel range comparable to ICEs. From this analysis, Al/air EVs are the most promising candidates compared to ICEs in terms of travel range, purchase price, fuel cost, and life-cycle cost."

However, tables should not be counted, such as this one from the Wikipedia article on acute myeloid leukemia.

Section 5: Poems and indented text
I've never seen a purple cow and hope I never see one


 * But I will tell you this right now
 * I'd rather see than be one!
 * Gelett Burgess
 * That poem is called "The Purple Cow".

The previous indented lines had no line spaces between them.


 * This line is indented once and has one line space ahead of itself


 * This line is indented twice and has one line space ahead of itself


 * This line is again indented twice and has one line space ahead of itself


 * This line is indented three times and has one line space ahead of itself


 * This line is yet again indented twice and has one line space ahead of itself


 * This line is indented four times and has two line spaces ahead of itself

Section 6: Refmark text
This is a citation, but this is a superscript x2.

This is a template test.

H5 heading
=H1 heading=

Section 8: Images




Section 9: Unusual formatting of text
We begin with a space at the beginning of a line; in Wiki-markup, this corresponds to the <PRE> tag in HTML.

For I am the very model of a modern major general...

'Twas blighted affection that made him exclaim...

Correct results
The correct results of a prose-size script on this article should be:

This article has three bad jumps in heading level, one in the lead and two in section 7; there's also an illegal H1 heading in section 7. The MediaWiki software appears to have a bug in formatting articles with such H1 sections and lead sections with a jump; please note the table of contents.

Known bugs and conventions in articlestructure.js
The reference script articlestructure.js has a few known bugs and conventions, listed below.


 * Text conventions/bugs
 * Each <PRE>-tagged text is counted as a new paragraph.
 * Sections that come after a "closing" section such as "See also" are not counted, such as this one. Another example is 2003 German Grand Prix.
 * Closing sections may have main-article text, as in the "See also" section of Operations Specialist (US Navy) or in the "Notes" section of the Legend of the White Cowl. This text is not counted.
 * The heading "Source" or "Sources" is treated as equivalent to "References", making it a closing section. For example, see Yellowtail flounder, Nothropus or John Bell. However, "Source" could also be an ordinary section heading of an article, as in the "source of the Nile".
 * The heading "Literature" is used in Heroin to mean "Further reading", but the script does not recognize it as a closing heading. That doesn't affect the prose-size counting for Heroin, but could for other articles.
 * The "Memoir" section of Martin Clemens may mean "Further reading", but is counted by this script.
 * The boxed author-abbreviation sentence in Louis van Houtte is not counted.
 * Messages such as "Disambiguation", "See also", "Further" and "Main article" are counted as text if they are written out in italics. As templates, however, they are not counted. For an example, see Lieutenant-General (United Kingdom), Vital Signs (pop band), Hak Ja Han, Nevada State Route 427, or 2005 LPGA Tour. An anomalous example is the "Source" messages found in List of Sabre and Fury units in US military.
 * Wikipedians may disagree on whether two words joined by an unspaced en-dash or hyphen should count as one word or two. This script treats them as a single compound word. Therefore, an unspaced en-dash used (improperly) as a parenthetical em-dash gives the wrong word count, but the correct character (byte) count.
 * The text floating at right in W. Eugene McCombs within a DIV tag is not counted.
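The closing-section convention in the list above can be sketched as follows. This is an assumption-laden illustration, not the code of articlestructure.js: the names `CLOSING_HEADINGS`, `isClosingHeading` and `countedSections` are hypothetical, the heading list is reconstructed only from the headings named in the bullets, and the `{ heading, text }` section shape is invented for the example.

```javascript
// Assumed list, reconstructed from the bullets above; the real script's list may differ.
const CLOSING_HEADINGS = new Set([
  "see also", "notes", "references", "source", "sources", "further reading",
]);

function isClosingHeading(heading) {
  return CLOSING_HEADINGS.has(heading.trim().toLowerCase());
}

// sections: array of { heading, text } objects in document order (hypothetical shape).
// Everything from the first closing heading onward is dropped, which is why
// main-article text placed after "See also" is not counted.
function countedSections(sections) {
  const cut = sections.findIndex((section) => isClosingHeading(section.heading));
  return cut === -1 ? sections : sections.slice(0, cut);
}

const example = [
  { heading: "History", text: "..." },
  { heading: "See also", text: "..." },
  { heading: "Known bugs", text: "..." },
];
console.log(countedSections(example).map((s) => s.heading)); // [ 'History' ]
```

Under this sketch "Literature" and "Memoir" are not closing headings, which matches the behaviour described for Heroin and Martin Clemens.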


 * Image conventions/bugs


 * Images are counted only if they are at least 80 pixels in both width and height. This eliminates flag icons and other such images.
 * The script doesn't count very narrow but otherwise informative images, such as those in Universal College Application or 1-Tetradecanol.
 * It misses images composed of many smaller images, such as the uniforms in Anagennisi Dherynia.
 * It misses the small mugshots in Louis Buchalter. A possible fix is to check for a caption or an "Enlarge" button when the size test fails.
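The size test in the list above, together with the suggested caption fallback, might look like the sketch below. It is illustrative only: the `{ width, height, hasCaption }` image shape and the function names are hypothetical, and the caption fallback is the fix proposed in the last bullet, not something the reference script is known to do.

```javascript
const MIN_IMAGE_SIDE = 80; // pixels, from the convention stated above

// Size test as described: both width and height must reach the minimum.
function passesSizeTest(image) {
  return image.width >= MIN_IMAGE_SIDE && image.height >= MIN_IMAGE_SIDE;
}

// Proposed variant (an assumption): a captioned image counts even if it is small.
function passesWithCaptionFallback(image) {
  return passesSizeTest(image) || image.hasCaption === true;
}

function countImages(images, test = passesSizeTest) {
  return images.filter(test).length;
}

const images = [
  { width: 220, height: 150, hasCaption: true },  // ordinary thumbnail
  { width: 20, height: 13, hasCaption: false },   // flag icon
  { width: 60, height: 75, hasCaption: true },    // small captioned mugshot
];
console.log(countImages(images));                            // 1
console.log(countImages(images, passesWithCaptionFallback)); // 2
```

The fallback recovers the captioned mugshot while still excluding the flag icon, though it would not help with uncaptioned composites of many small images.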