Module:DecodeEncode/doc

Implements the Lua functions mw.text.decode and mw.text.encode in a module.

See List of XML and HTML character entity references.

Decode (&copy; → ©)

 * Decodes named entities from their entity name into the regular (Unicode) character, for example &copy; → ©.

All well-defined named entities are decoded (HTML named character references; formally, those defined in the PHP entity table).


 * A regular, rendered sentence:
 * "At 100 °F, & with a "burning" sun above, we ⁄walked⁄."


 * In code:
 * " " -- wikitext


 * Processing with decode:
 * The output contains plain characters only, no named entities.


 * Renders, again:
 * "At 100 °F, & with a "burning" sun above, we ⁄walked⁄."
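The round trip above can be approximated outside MediaWiki. The following is a minimal Python sketch, not the module's Lua code: it assumes Python's html.unescape (which uses the HTML5 entity table) as a stand-in for the PHP entity table, so the two may disagree on a few rarely used names. The helper name decode_all and the input string are hypothetical; the input is one plausible wikitext spelling of the example sentence.

```python
import html


def decode_all(wikitext: str) -> str:
    """Hypothetical helper approximating full decoding of named entities.

    html.unescape uses the HTML5 table rather than PHP's table, and it
    also decodes numeric references such as &#176;.
    """
    return html.unescape(wikitext)


# Hypothetical wikitext for the example sentence above.
source = ("At 100 &deg;F, &amp; with a &quot;burning&quot; sun above, "
          "we &frasl;walked&frasl;.")
print(decode_all(source))
```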

Decode a reduced set only
By setting subset_only=true, only these five entity names are decoded: '&lt;', '&gt;', '&amp;', '&quot;', '&nbsp;' (that is, into '<', '>', '&', '"', and the non-breaking space).


 * Note: This parameter differs from the corresponding Lua parameter. (This matters only if you also call the Lua function mw.text.decode directly.) The Lua documentation defines the parameter decodeNamedEntities: when it is omitted or false, only the reduced set of entities is recognized and decoded. subset_only inverts this logic: subset_only=true has the effect of decodeNamedEntities=false.


 * Also, this module does not apply the Lua "omitted" logic: subset_only must be set explicitly to 'true' to take effect; if it is omitted, all named entities are decoded.
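The inverted parameter logic can be sketched in Python as follows. This is an illustrative emulation, not the module's Lua code: the function name decode and its string argument subset_only only mirror the module parameter, treating exactly 'true' (after trimming) as the trigger is an assumption, html.unescape stands in for the full PHP entity table, and numeric references (&#…;) are left out of the reduced-set branch for brevity.

```python
import html
import re

# The five names decoded in reduced mode, per the Lua documentation of
# mw.text.decode (decodeNamedEntities omitted or false).
REDUCED = {"lt": "<", "gt": ">", "amp": "&", "quot": '"', "nbsp": "\u00a0"}
REDUCED_RE = re.compile(r"&(lt|gt|amp|quot|nbsp);")


def decode(text: str, subset_only: str = "") -> str:
    """Hypothetical emulation of the module's decode with subset_only."""
    # Only an explicit 'true' counts; omission or any other value means
    # full decoding. This is the inverse of the Lua parameter.
    decode_named_entities = subset_only.strip() != "true"
    if decode_named_entities:
        return html.unescape(text)
    # Reduced set: replace only the five names and leave all others as-is.
    return REDUCED_RE.sub(lambda m: REDUCED[m.group(1)], text)


print(decode("&lt;b&gt; &copy;", subset_only="true"))  # prints: <b> &copy;
print(decode("&lt;b&gt; &copy;"))                      # prints: <b> ©
```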

Encode (© → &#169;)

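 * The function encode replaces characters with HTML entities: <, >, &, " and the non-breaking space become named entities (for example, < → &lt;); other characters in the character set become decimal numeric entities (for example, © → &#169;).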

Regular sentence:
 * "At >100 °F, & with a "burning" sun above, we walked. ©"

Encoded, with ° and © added to the character set (see the charset section below):

 * Renders as:


 * "At &gt;100 &#176;F, &amp; with a &quot;burning&quot; sun above, we walked. &#169;"

Character set to encode
Per the Lua documentation, only a small set of characters is processed by default. The character set can be changed (for example, expanded) with the charset parameter.


 * Example: the default set is <>&"' plus the non-breaking space. With <>°"'&© as charset, the characters that are not in the default set are replaced by their decimal numeric entity: © → &#169; (a decimal number, not hexadecimal, and not the named &copy;).
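Per the Lua documentation, charset is the contents of a Ustring pattern set (what goes inside [ ]), so it can also hold ranges. The Python sketch below approximates that by building a regular-expression character class. This is an assumption that holds for simple sets only: Ustring patterns escape with % while Python uses a backslash, and a leading ^ or a ] would need escaping. The default set is the one the Lua documentation gives (<, >, &, ", ' and the non-breaking space); the function name encode is hypothetical.

```python
import re

NAMED = {"<": "&lt;", ">": "&gt;", "&": "&amp;", '"': "&quot;", "\u00a0": "&nbsp;"}
# Default set per the Lua documentation: < > & " ' and U+00A0.
DEFAULT_CHARSET = "<>&\"'\u00a0"


def encode(text: str, charset: str = DEFAULT_CHARSET) -> str:
    """Hypothetical emulation; charset is used as a regex character class."""
    pattern = re.compile("[" + charset + "]")
    return pattern.sub(
        lambda m: NAMED.get(m.group(0), f"&#{ord(m.group(0))};"), text
    )


print(encode("© <b>"))              # default set: © is left as-is
print(encode("© <b>", "<>°\"'&©"))  # expanded set: © becomes &#169;
print(encode("abc", "a-b"))         # a range inside the set
```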

Known issues

 * 13 Sep 2021: NOTE: The encode function with a user-supplied charset is now used in production in R/superscript and R/ref. Before implementing breaking changes here, these templates need to be adjusted accordingly!


 * 26 Sep 2021:
 * Note: Possible bug: Decoding  works, but   doesn't.
 * Resolved in code.


 * 4 Feb 2023:


 * See
 * Resolved in code.