User talk:Mathglot/Regex

All regular expressions are standard PCRE unless otherwise stated. A few might be Cirrus regexes used by Wikipedia's regex editor; see also Cirrus regex syntax.

Piped links

 * 1) Match piped link in any namespace (e.g., could be 'File:' in first part):
 * 2) Match piped link in current namespace only (not containing colon in first part):
 * 3) Piped or unpiped link in current namespace:
 * 4) Piped or unpiped link in current namespace:
 * 5) Piped or unpiped link in current namespace:
 * 6) Piped or unpiped link in current namespace:
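The patterns for the list above are not reproduced here. As a rough illustration only, the following Python sketch shows one way the first three cases might be written. The pattern names and the sample text are hypothetical, and the simplified link grammar assumes no nested brackets or templates inside a link.

```python
import re

# Assumed, simplified patterns; not the page's own regexes.
# 1) Piped link in any namespace: [[target|label]]
PIPED_ANY = re.compile(r"\[\[([^\[\]|]+)\|([^\[\]|]+)\]\]")
# 2) Piped link whose target contains no colon (current namespace only)
PIPED_MAIN = re.compile(r"\[\[([^\[\]|:]+)\|([^\[\]|]+)\]\]")
# 3) Piped or unpiped link whose target contains no colon
ANY_MAIN = re.compile(r"\[\[([^\[\]|:]+)(?:\|([^\[\]|]+))?\]\]")

sample = "See [[File:X.png|thumb]], [[Paris|the capital]] and [[France]]."

for name, pattern in [("any", PIPED_ANY), ("main", PIPED_MAIN), ("either", ANY_MAIN)]:
    # Each result is a list of (target, label) tuples; label is "" when unpiped.
    print(name, pattern.findall(sample))
```

Excluding the colon from the target is what keeps 'File:' and other namespaced links out of cases 2 and 3; it also skips main-namespace titles that happen to contain a colon.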

Fix MOS:REFPUNCT problems

 * Search:
 * Replace:

Citations with 'author=' to 'last=... first=...'
Assumes a regular CS1 or CS2 citation, with space before vertical bar, and '|author=' present:
 * Search:
 * Replace:
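A minimal Python sketch of this kind of replacement, under the extra assumption that the author value has the form "Last, First" and contains no wikilink or nested template. The pattern and the function name `split_author` are illustrative, not the page's own search and replace strings.

```python
import re

# Assumption: "|author=Last, First" with a comma separating the two parts.
AUTHOR = re.compile(
    r"\|\s*author\s*=\s*([^,|{}]+?)\s*,\s*([^|{}]+?)(?=\s*(?:\||\}\}))"
)

def split_author(citation: str) -> str:
    """Rewrite |author=Last, First as |last=Last |first=First."""
    return AUTHOR.sub(r"|last=\1 |first=\2", citation)

print(split_author("{{cite book |author=Smith, John |title=X}}"))
```

Values without a comma (for example a corporate author) are left untouched, since there is no safe way to split them.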

Alt (author or author1; name possibly wikilinked):
 * Search:
 * Replace:

Move url to the back

 * Search:
 * Replace:
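One possible shape for this move, sketched in Python. It assumes a flat citation template with no nested templates after the url parameter; the pattern and the name `url_to_back` are hypothetical.

```python
import re

# group 1: the |url=... parameter
# group 2: every parameter that follows it
# group 3: the closing braces
URL_PARAM = re.compile(r"(\s*\|\s*url\s*=[^|{}]*?)(\s*\|[^{}]*?)(\s*\}\})")

def url_to_back(citation: str) -> str:
    """Move |url=... so that it becomes the last parameter."""
    return URL_PARAM.sub(r"\2\1\3", citation)

print(url_to_back("{{cite web |url=http://x.org |title=T |date=2020}}"))
```

Because groups 1 and 2 exclude braces, a citation with a nested template after the url (such as a date template) is simply not matched, so it is skipped rather than mangled.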

Possible failure case:

Swap last with first

 * Search:
 * Replace:

Swap editor-last with editor-first

 * Search:
 * Replace:

Swap editorN-last with editorN-first

 * Search:
 * Replace:

Swap lastn with firstn

 * Search:
 * Replace:

Move last-first before title

 * Search:
 * Replace:

Move year after first

 * Search:
 * Replace:

Punctuation after citation, to before
Sfn:
 * Search:
 * Replace:
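For the sfn case, a sketch in Python of moving trailing punctuation in front of one or more consecutive sfn templates. The pattern is an assumption (it handles only flat templates with no nested braces), and `punct_before_sfn` is a made-up name.

```python
import re

# One or more adjacent {{sfn|...}} templates followed by a punctuation mark.
SFN_THEN_PUNCT = re.compile(r"((?:\{\{[Ss]fn\|[^{}]*\}\})+)([.,;:])")

def punct_before_sfn(text: str) -> str:
    """Turn 'text{{sfn|...}}.' into 'text.{{sfn|...}}'."""
    return SFN_THEN_PUNCT.sub(r"\2\1", text)

print(punct_before_sfn("Claim{{sfn|Smith|2001|p=4}}. Next{{sfn|A|1999}}{{sfn|B|2000}}, more"))
```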

Swap |first=X |last=y around so last is first in citation

 * Search:
 * Replace:
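A hedged Python sketch of the swap, assuming |first= is immediately followed by |last= and neither value contains a nested template. The pattern and the name `last_before_first` are illustrative only.

```python
import re

# group 1: |first=...   group 2: whitespace between   group 3: |last=...
FIRST_LAST = re.compile(
    r"(\|\s*first\s*=[^|{}]*?)(\s*)(\|\s*last\s*=[^|{}]*?)(?=\s*(?:\||\}\}))"
)

def last_before_first(citation: str) -> str:
    """Reorder '|first=X |last=Y' to '|last=Y |first=X'."""
    return FIRST_LAST.sub(r"\3\2\1", citation)

print(last_before_first("{{cite web |first=John |last=Smith |title=T}}"))
```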

Plain refs to cite web
Text sources which don't use cite web may be transformed by a series of regex replaces, if the format is reasonably standard. For example, by this series:

See also User:Mathglot/sandbox/Templates/Cite MLA (in progress...)

Updating named refs to template:R
Example: Holocaust denial, revision 843383121. Three steps:

1. change quoted named refs:

2. change unquoted named refs (with or without trailing blanks before the slash)

3. combine consecutive R's

(repeat until done)
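The three steps can be sketched in Python as follows. The patterns are assumptions rather than the ones used on the example article: they only touch self-closing named refs and leave defining refs (those with content) alone. The function name `refs_to_r` is hypothetical.

```python
import re

def refs_to_r(text: str) -> str:
    # Step 1: quoted named refs, e.g. <ref name="foo" />
    text = re.sub(r'<ref\s+name\s*=\s*"([^"<>]+)"\s*/>', r"{{R|\1}}", text)
    # Step 2: unquoted named refs, with or without blanks before the slash
    text = re.sub(r'<ref\s+name\s*=\s*([^\s"/<>]+)\s*/>', r"{{R|\1}}", text)
    # Step 3: combine consecutive R's; repeat until a pass changes nothing
    while True:
        merged = re.sub(
            r"\{\{R\|([^{}]+)\}\}\{\{R\|([^{}]+)\}\}", r"{{R|\1|\2}}", text
        )
        if merged == text:
            return text
        text = merged

print(refs_to_r('A<ref name="x" /><ref name=y/><ref name=z />.'))
```

The loop in step 3 is needed because one pass merges adjacent templates pairwise, so a run of three or more takes several passes.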

Edit summary:

Add leading hidden token to ref-named citations as prep for sorting the Bibliography

 * Search:
 * Replace:

Alphabetize citations in Bibliography
The technique is 1) add a leading token consisting of the (first) last name, 2) sort, 3) strip out the token. Only step 1 is shown:
 * Search:
 * Replace:
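For orientation, all three steps of the technique can be sketched in Python. This is an illustration under assumptions: one citation per line, the last name held in |last=, |last1=, |author= or |author1=, and an HTML comment used as the hidden token. The names `LAST` and `sort_citations` are made up.

```python
import re

# Assumed parameter names that hold the (first) last name.
LAST = re.compile(
    r"\|\s*(?:last|last1|author|author1)\s*=\s*([^|{}]+?)(?=\s*(?:\||\}\}))"
)

def sort_citations(lines):
    tokened = []
    for line in lines:
        m = LAST.search(line)
        token = m.group(1) if m else ""
        # Step 1: add a leading hidden token with the last name
        tokened.append(f"<!--{token}-->{line}")
    # Step 2: sort (case-insensitively) on the tokened lines
    tokened.sort(key=str.casefold)
    # Step 3: strip the token again
    return [re.sub(r"^<!--.*?-->", "", line, count=1) for line in tokened]

bibliography = [
    "* {{cite book |last=Smith |first=J. |title=B}}",
    "* {{cite book |first=A. |last=Adams |title=A}}",
    "* {{cite web |last1=Jones |title=C}}",
]
for entry in sort_citations(bibliography):
    print(entry)
```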

Article page history to parsed data
Turn article page history into a series of parsed lines:


 * 1=ARTICLE_TITLE 2=REVISION 3=HH:MM 4=Month DD, YYYY 5=TOTAL_BYTES 6=BYTE_CHANGE


 * 1) Go to article page history page
 * 2) Rt-click, Page source
 * 3) Select-all, copy, paste
 * 4) Apply Search/Replace Regex below, with "dot matches newline"
 * 5) Optional step to convert underscore to blank in article titles

SEARCH:

To generate the following output, use this replacement: 1=ARTICLE_TITLE 2=REVISION 3=HH:MM 4=Month DD, YYYY 5=TOTAL_BYTES 6=BYTE_CHANGE

REPLACE:

To generate the following sample output, use this replace instead:
 * 916661155 Risk_aversion diff 00:29 September 20, 2019; (change:-1b to 31,671 bytes)

REPLACE:

To generate a six-column table row with this data, including one extra column for remarks, use this:

REPLACE:

Followed by optional underscore replacement (s/_/ /gi).

To generate the following table row examples (table header/footer code added for context):

User contribution history to parsed data
Turn user contribution history into a series of parsed lines:


 * 1=REVISION 2=TITLE 3=TIMESTAMP 4=BYTE_CHANGE 5=EDIT_SUMMARY


 * 1) Go to user contrib history page
 * 2) Rt-click, Page source
 * 3) Select-all, copy, paste
 * 4) Find '<h4 class="mw-index-pager-list-header-first' and cut everything above it.
 * 5) Find ' ' and cut everything below it
 * 6) Apply Search/Replace Regex below, with "dot matches newline"
 * 7) Optional step to convert underscore to blank in article titles

SEARCH: (options: dot matches newline)

To generate the following output, use this replacement: 1=REVISION 2=TITLE 3=TIMESTAMP 4=BYTE_CHANGE 5=EDIT_SUMMARY

REPLACE:

To generate: rev=REVISION title=TITLE timestamp=TIMESTAMP bytes=BYTE_CHANGE summary=EDIT_SUMMARY

REPLACE:

To generate: rev=REVISION title=TITLE

SEARCH: (options: dot matches newline)

REPLACE:

Convert glossary anchor to vanchor
SEARCH:

REPLACE:

Convert glossary <term> to be in-linkable
SEARCH:

REPLACE:

Edit summary: Convert glossary <term>s to be in-linkable via global regex replace s!^{{term\s*\|(term\s*=\s*)?([^|{}]+)!{{term|\1|2={{Vanchor|\2}}!g

Parse wikilinks
Parse wikilinks, exclude colons to exclude namespaces (this will exclude wikilinks that have colons in the anchor):

$1 = Target article $2 = Anchor (#-fragments untested):



This saves the pipe (if there is one) in \2, so can use replace to generate lang-prefixed links, for example, if translating a nav template from en to fr, one could start like this:


 * Search:
 * Replace:
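A sketch in Python of the lang-prefix idea, assuming the simplified link grammar described above (no colon in the target, optional pipe kept together with its label in the second group). The pattern and the name `prefix_links` are hypothetical, not the page's own search and replace strings.

```python
import re

# group 1: target without colon; group 2: "|label" including the pipe, if any
LINK = re.compile(r"\[\[([^\[\]|:]+)(\|[^\[\]]*)?\]\]")

def prefix_links(text: str, lang: str) -> str:
    """Rewrite [[Target|Label]] as [[:lang:Target|Label]]."""
    return LINK.sub(
        lambda m: f"[[:{lang}:{m.group(1)}{m.group(2) or ''}]]", text
    )

nav = "* [[Paris]]\n* [[Lyon|the city]]\n* [[File:X.png]]"
print(prefix_links(nav, "fr"))
```

Keeping the pipe inside the second group means unpiped links need no special case: the group is simply empty.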

This adds superscript wikidata links to all wikilinks on a page so they can be easily translated:
 * Search:
 * Replace:

New contribs Translated pages to bullet list
From Special:contribs with 'new' pages box ticked; extracting pages with ContentTranslation tool summary:


 * 1) Search:
 * 2) Copy matches
 * 3) Replace:

Interlanguage template transformation

 * 1) Turn ca:Template:GEC into sfn:
 * Search:
 * Replace:

FR - EN article translation preprocessing
1 -> 2  -> \1 3 -> "\1" 4 -> 5 ->

Substify and unsubstify

 * Substify
 * Search:
 * Replace:


 * Unsubstify
 * Search:
 * Replace:
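A minimal Python sketch, assuming "substify" means turning a transclusion such as {{Name}} into {{subst:Name}} and "unsubstify" the reverse. The function names and the per-template `name` parameter are illustrative choices, not taken from the page.

```python
import re

def substify(text: str, name: str) -> str:
    """Add 'subst:' to transclusions of the given template name."""
    # (?<!\{) skips triple-brace parameters; the lookahead stops 'Name'
    # from matching the start of a longer template name.
    pattern = re.compile(r"(?<!\{)\{\{\s*" + re.escape(name) + r"(?=\s*[|}])")
    return pattern.sub(lambda m: "{{subst:" + name, text)

def unsubstify(text: str) -> str:
    """Remove a leading 'subst:' from every template call."""
    return re.sub(r"\{\{\s*subst:\s*", "{{", text)

print(substify("{{Welcome}} and {{Welcome|x}} but {{Welcome2}}", "Welcome"))
print(unsubstify("{{subst:Welcome}} {{ subst:Foo|a}}"))
```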

Wikilink to undefined
Aimed at Nav template translation, so handles bulleted links, optional pops or bolding, and specific lang prefix:

Unpiped links (e.g., ):


 * Search:
 * Replace:

Piped links (e.g., ):
 * Search:
 * Replace:

For bios or proper names, duplicate the Foreign name in the English article field:
 * Search:
 * Replace:

Examples:

Section demote

 * Search:
 * Replace:

Subsection promote

 * Search:
 * Replace:
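The demote and promote operations can be sketched together in Python. The patterns are assumptions: headings are balanced (same number of equals signs on both sides), demotion stops at level 6, and promotion leaves level-2 headings alone. The function names are hypothetical.

```python
import re

# ==X== through =====X=====; level 6 is left alone so it never becomes level 7
DEMOTE = re.compile(r"^(={2,5})([^=\n].*?)\1[ \t]*$", re.MULTILINE)
# ===X=== through ======X======; level 2 does not match and stays as it is
PROMOTE = re.compile(r"^=(={2,5})([^=\n].*?)\1=[ \t]*$", re.MULTILINE)

def demote(text: str) -> str:
    """Add one '=' to each side of every matching heading."""
    return DEMOTE.sub(r"=\1\2\1=", text)

def promote(text: str) -> str:
    """Remove one '=' from each side of every level-3+ heading."""
    return PROMOTE.sub(r"\1\2\1", text)

page = "== History ==\ntext\n===Early years===\n"
print(demote(page))
print(promote(page))
```

The backreference \1 on the closing side is what enforces balanced headings; a line such as "==X===" is not handled cleanly by this sketch.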

Reflib section from last-first-year

 * Search:
 * Replace:

Zed to ess (recognize ⟶ recognise)

 * Search: /((?:[a-z-[aeiuo]]{0,3}[aeiouy]{1,2}){1,}[a-z-[aeiuo]]{0,3}[iy])z((?:e|ed|es|er|ers|ing)\b)/g
 * Replace: $1s$2
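The search pattern above relies on character-class subtraction (`[a-z-[aeiuo]]`), which is .NET regex syntax; Python's `re` module does not support it, and it is not part of classic PCRE syntax either. As a sketch only, the same idea can be written with an explicit consonant class. The names `CONS`, `ZED` and `zed_to_ess` are made up for this example.

```python
import re

# a-z minus a, e, i, o, u; y stays in the class, as in the original subtraction
CONS = "[b-df-hj-np-tv-z]"

ZED = re.compile(
    rf"((?:{CONS}{{0,3}}[aeiouy]{{1,2}})+{CONS}{{0,3}}[iy])z"
    r"((?:e|ed|es|er|ers|ing)\b)"
)

def zed_to_ess(text: str) -> str:
    """Replace -iz-/-yz- with -is-/-ys- before the listed suffixes."""
    return ZED.sub(r"\1s\2", text)

print(zed_to_ess("recognized organizing size"))
```

Requiring at least one full syllable before the final i or y is what leaves short words such as "size" alone. Other false positives remain possible (for example "seize"), so results still need a manual check.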