Wikipedia:AutoWikiBrowser/Regular expression

A regular expression or regex is a sequence of characters that defines a pattern to be searched for in a text. Each occurrence of the pattern may then be automatically replaced with another string, which may include parts of the identified pattern. AutoWikiBrowser uses the .NET flavor of regex.

Anchors
Used to anchor the search pattern to certain points in the searched text.
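As an illustration, the following minimal sketch uses Python's re module rather than AWB itself. It assumes the common anchors ^, $ and \b, which Python and .NET write the same way; the sample text is made up.

```python
import re

text = "cat concat\ncat"

# ^ anchors the pattern to the start of the text, $ to the end.
starts = re.findall(r"^cat", text)
ends = re.findall(r"cat$", text)

# \b anchors the pattern to a word boundary, so "concat" is skipped.
whole_words = re.findall(r"\bcat\b", text)

# With the multiline option, ^ also matches right after each newline.
line_starts = re.findall(r"(?m)^cat", text)

print(len(starts), len(ends), len(whole_words), len(line_starts))  # 1 1 2 2
```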

Character classes
Expressions which match any character in a pre-defined set. This list is not exhaustive.
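A short sketch of a few widely used character classes, written with Python's re module as a stand-in for AWB. The classes shown (\d, \w, \s and the dot) are assumed to be among those meant here, and the sample text is hypothetical.

```python
import re

text = "Room 42, floor 7"

digits = re.findall(r"\d+", text)     # runs of digits
words = re.findall(r"\w+", text)      # runs of letters, digits, underscores
spaces = re.findall(r"\s", text)      # individual whitespace characters
any_char = re.findall(r"4.", text)    # "4" followed by any one character

print(digits)    # ['42', '7']
print(words)     # ['Room', '42', 'floor', '7']
print(any_char)  # ['42']
```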

Tokens
Tokens match a single character from a specified set or range of characters.
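The sketch below, in Python's re module, shows a set, a range and a negated set. The bracket syntax is shared with .NET; the sample words are invented for the illustration.

```python
import re

text = "gray grey grxy"

# A set: exactly one of the listed characters.
from_set = re.findall(r"gr[ae]y", text)

# A range: exactly one character between "a" and "z".
from_range = re.findall(r"gr[a-z]y", text)

# A negated set: exactly one character that is NOT listed.
negated = re.findall(r"gr[^ae]y", text)

print(from_set)    # ['gray', 'grey']
print(from_range)  # ['gray', 'grey', 'grxy']
print(negated)     # ['grxy']
```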

Groups
Groups match a string of characters (including tokens) in sequence. By default, matches to groups are captured for later reference. Groups may be nested within other groups.
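To make nesting and capture numbering concrete, here is a small sketch in Python's re module. Groups are numbered by the position of their opening parenthesis, which is also how .NET numbers unnamed groups. The date string is a made-up example, and the non-capturing form (?:...) is included as an assumption about what the group table covers.

```python
import re

# Group 1 encloses groups 2 and 3; group 4 follows them.
m = re.search(r"((\d{4})-(\d{2}))-(\d{2})", "Date: 2024-03-15")

whole = m.group(0)        # the full match
year_month = m.group(1)   # outer group
year = m.group(2)         # first nested group
month = m.group(3)        # second nested group
day = m.group(4)

# (?:...) groups without capturing, so only "c" is captured here.
n = re.search(r"(?:ab)+(c)", "ababc")

print(whole, year_month, year, month, day)  # 2024-03-15 2024-03 2024 03 15
print(n.groups())                           # ('c',)
```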

Quantifiers
Quantifiers specify how many of the preceding token or group may be matched.
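A brief sketch of the usual quantifiers (?, *, + and the brace forms) in Python's re module, which writes them the same way as .NET. The word list is invented for the example.

```python
import re

text = "color colour colouur"

optional = re.findall(r"colou?r", text)      # zero or one "u"
any_number = re.findall(r"colou*r", text)    # zero or more "u"
at_least_one = re.findall(r"colou+r", text)  # one or more "u"
exactly_two = re.findall(r"colou{2}r", text) # exactly two "u"
zero_to_one = re.findall(r"colou{0,1}r", text)

print(optional)      # ['color', 'colour']
print(any_number)    # ['color', 'colour', 'colouur']
print(at_least_one)  # ['colour', 'colouur']
print(exactly_two)   # ['colouur']
```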

By default, quantifiers are "greedy", meaning they will match as many characters as possible while still allowing the full expression to find a match. Adding a question mark ("?") after a quantifier will make it non-greedy, meaning it will match as few characters as possible while still allowing the full expression to find a match. See the section "Greed and quantifiers" for examples.

Metacharacters and the escape character
Metacharacters are characters with special meaning in regex; to match these characters literally, they must be "escaped" by being preceded with the escape character \.
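The effect of escaping can be sketched as follows, again with Python's re module standing in for AWB. The sample text is hypothetical, and re.escape is a Python helper shown only for convenience; it is not part of AWB.

```python
import re

text = "Price: 3.50 (approx.) 3x50"

# Unescaped, "." is a metacharacter that matches any character.
unescaped = re.findall(r"3.50", text)

# Escaped, "\." matches only a literal full stop.
escaped = re.findall(r"3\.50", text)

# Parentheses are metacharacters too and need escaping to match literally.
parens = re.findall(r"\(approx\.\)", text)

# Python can build the escaped form of a literal string automatically.
auto = re.escape("(approx.)")

print(unescaped)  # ['3.50', '3x50']
print(escaped)    # ['3.50']
print(parens)     # ['(approx.)']
```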

Back references
Used to match a previously captured group again.
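A typical use is finding accidentally doubled words. The sketch below assumes the numbered form \1, which Python's re module and .NET both accept inside a search pattern; the sentence is made up.

```python
import re

text = "This is is a test of the the parser"

# (\w+) captures a word; \1 then requires the same word to appear again.
doubled = re.findall(r"\b(\w+) \1\b", text)

print(doubled)  # ['is', 'the']
```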

Look-around
Used to check what comes before or after, without consuming or capturing. ("Without consuming" means that matches for look-around assertions do not become part of the string to be replaced. In the following examples, only "abc" is consumed.) In .NET regex, all regex syntax can be used within a look-around assertion.
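The four look-around forms can be sketched with replacements in which only "abc" is consumed. This uses Python's re module, with one caveat: Python requires look-behind patterns to have a fixed width, whereas the statement above about .NET places no such restriction. The sample strings and the replacement "X" are invented.

```python
import re

# Look-ahead: "abc" only when followed by "def"; "def" is not replaced.
ahead = re.sub(r"abc(?=def)", "X", "abcdef abcxyz")

# Negative look-ahead: "abc" only when NOT followed by "def".
not_ahead = re.sub(r"abc(?!def)", "X", "abcdef abcxyz")

# Look-behind: "abc" only when preceded by "def".
behind = re.sub(r"(?<=def)abc", "X", "defabc xyzabc")

# Negative look-behind: "abc" only when NOT preceded by "def".
not_behind = re.sub(r"(?<!def)abc", "X", "defabc xyzabc")

print(ahead)       # Xdef abcxyz
print(not_ahead)   # abcdef Xxyz
print(behind)      # defX xyzabc
print(not_behind)  # defabc xyzX
```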

Commenting
Comments in the search string do not affect the resulting matches.
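Two comment styles are sketched below in Python's re module: the inline form (?#...) and end-of-line comments under the "ignore pattern whitespace" option (?x). Both are assumed to be the forms meant here, and both are also accepted by .NET. The sample text is hypothetical.

```python
import re

text = "Published 2024-03"

plain = r"\d{4}-\d{2}"

# Inline comments are skipped by the regex engine.
inline = r"\d{4}(?#year)-\d{2}(?#month)"

# With (?x), whitespace in the pattern is ignored and "#" starts a comment.
verbose = r"""(?x)
    \d{4}   # year
    -
    \d{2}   # month
"""

results = [re.findall(p, text) for p in (plain, inline, verbose)]
print(results)  # [['2024-03'], ['2024-03'], ['2024-03']]
```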

Using captured groups in the replacement string
Captured groups can be output as part of the replacement string.
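As a rough illustration, the sketch below swaps two captured names. It is written for Python's re module, where the replacement string refers to groups as \1 and \2; in .NET replacement strings, and therefore in AWB, the same groups are written $1 and $2. The name is invented.

```python
import re

text = "Smith, John"

# Group 1 is the surname, group 2 the given name.
# Python replacement syntax: \2 \1  (the .NET equivalent is: $2 $1)
swapped = re.sub(r"(\w+), (\w+)", r"\2 \1", text)

print(swapped)  # John Smith
```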

Tokens and groups
Tokens and groups are the portions of a regular expression that can be followed by a quantifier to change the number of consecutive matches. A token is a character, special character, character class, or range. A group is formed by enclosing tokens or other groups within parentheses. Any of these can be made to match a given number of times by a quantifier.
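The following sketch applies a quantifier to each kind of token and to groups, using Python's re module. The patterns are illustrative choices, not the examples originally given on this page.

```python
import re

# A quantifier applies to the single token or group immediately before it.
char_token = re.fullmatch(r"a{3}", "aaa")         # a character
class_token = re.fullmatch(r"\d{3}", "123")       # a character class
range_token = re.fullmatch(r"[x-z]{3}", "xyz")    # a range
group = re.fullmatch(r"(ab){2}", "abab")          # a group
nested = re.fullmatch(r"((ab)+c){2}", "abcababc") # a group containing a group

# Without parentheses the quantifier covers only "b", not "ab".
no_group = re.fullmatch(r"ab{2}", "abab")

print(all([char_token, class_token, range_token, group, nested]))  # True
print(no_group)  # None
```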

Greed and quantifiers
Greed, in regular expression context, describes the number of characters which will be matched (often also stated as "consumed") by a variable length portion of a regular expression – a token or group followed by a quantifier, which specifies a number (or range of numbers) of tokens. If the portion of the regular expression is "greedy", it will match as many characters as possible. If it is not greedy, it will match as few characters as possible.

By default, quantifiers in AWB are greedy. To make a quantifier non-greedy, it must be followed by a question mark. For example:

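In a string that contains the two phrases Lorem ipsum and consectetur adipisicing with other text between them, a greedy expression produces a single match that runs from the first phrase through to the second, intervening text included.

The non-greedy form of the same expression will match Lorem ipsum and consectetur adipisicing separately.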
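The difference can be reproduced with the sketch below in Python's re module. The sample string, including the double square brackets used as delimiters, is an assumption made for this illustration and is not necessarily the string originally shown here.

```python
import re

# Hypothetical sample text with two bracketed phrases.
text = "[[Lorem ipsum]] dolor sit amet, [[consectetur adipisicing]] elit"

# Greedy: .* runs on to the LAST "]]" in the text.
greedy = re.findall(r"\[\[.*\]\]", text)

# Non-greedy: .*? stops at the FIRST "]]" after each "[[".
lazy = re.findall(r"\[\[.*?\]\]", text)

print(greedy)  # ['[[Lorem ipsum]] dolor sit amet, [[consectetur adipisicing]]']
print(lazy)    # ['[[Lorem ipsum]]', '[[consectetur adipisicing]]']
```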

Be careful with expressions like (\w)(<ref[^>]*>.*?</ref>)([,.:;]), whose center capture group will span more than one ref group if the outer conditions are met.
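This pitfall can be demonstrated with the sketch below. It assumes the expression is meant to capture a <ref>...</ref> element between a word character and a punctuation mark; the wikitext is invented. Because the first closing tag is not followed by punctuation, the non-greedy .*? keeps expanding until it reaches a closing tag that is.

```python
import re

pattern = r"(\w)(<ref[^>]*>.*?</ref>)([,.:;])"

# Hypothetical wikitext: only the SECOND ref is followed by punctuation.
text = "word<ref>A</ref> more<ref>B</ref>."

m = re.search(pattern, text)
center = m.group(2)

# The center group has swallowed both ref elements and the text between them.
print(center)  # <ref>A</ref> more<ref>B</ref>
```

One possible way to keep the center group within a single ref element is to replace .*? with a sub-pattern that cannot cross a closing tag, such as (?:(?!</ref>).)*, at some cost in readability.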

Regex behavior options
Regex offers several options to change the default behavior. Five of these options can be controlled with inline expressions, as described below. Four of these options can also be applied to the entire search pattern with check boxes in the AWB "Find-and-replace" tools. By default, all options are off.

Inline syntax
The options statement (?flags-flags) turns the options given by "flags" on (or off, for any flags preceded by a minus sign) from the point where the statement appears to the end of the pattern, or to the point where a given option is cancelled by another options statement. For example:

Alternatively, the syntax (?flags-flags:pattern) applies the specified options only to the part of the pattern appearing inside the parentheses:
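Both forms are sketched below using the case-insensitivity flag i in Python's re module. Note one difference from the .NET behavior described above: recent Python versions only accept an unscoped options statement at the very start of the pattern, so the mid-pattern on/off switching that .NET allows is shown here with the scoped form instead. The sample text is made up.

```python
import re

text = "Wiki WIKI wiki"

# Options statement at the start: case is ignored for the whole pattern.
whole = re.findall(r"(?i)wiki", text)

# Scoped form: case is ignored only for the "w" inside the parentheses.
scoped_on = re.findall(r"(?i:w)iki", text)

# Scoped form with a minus sign: the option is switched off for "iki" only.
scoped_off = re.findall(r"(?i)w(?-i:iki)", text)

print(whole)       # ['Wiki', 'WIKI', 'wiki']
print(scoped_on)   # ['Wiki', 'wiki']
print(scoped_off)  # ['Wiki', 'wiki']
```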

User-made shortcut editing macros
You can make your own shortcut editing macros. When you edit a page, you can enter your short-cut macro keys into the page anywhere you want AWB to act upon them.

For example, suppose you are examining a page in the AWB edit box and see numerous places that need the same kinds of edit: inserting line breaks, commenting out entire lines, inserting state names, inserting level 2, 3, or even 4 headings, and so on. All of this can be done by creating your own shortcut macro keys.
The process:
 * Create a rule. See Find and replace, Advanced settings.
 * Edit your page in the edit box. Insert your short-cut editing macro key(s) anywhere in the page you want AWB to make the change(s) for you.
 * Re-parse the page. Right click on the edit box and select Re-parse from the context pop up menu. AWB will then re-examine your page with your macro short-cut key(s), find your short-cut key(s) and perform the action you specified in the rule.

A shortcut macro key can have any name, but it is best to make it unique so that it does not interfere with any other process that AWB may find and suggest. For that reason, /// followed by a set of lowercase characters that you can easily remember works best (lowercase is used so that you do not have to use the shift key). You can then enter the shortcut macro keys you create into the page manually or by using the "paste more" function of the edit box context menu. Three '/' characters are used so that AWB will not confuse a macro key with the web addresses (URLs) in a page when re-parsing.

Examples:

Create a rule as a regular expression.
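The idea can be simulated outside AWB with the sketch below, written in Python. The macro keys ///br and ///hd, the rule list and the reparse function are all hypothetical names invented for this illustration; in AWB the rules would be entered in the find-and-replace settings, with $1 in place of Python's \1.

```python
import re

# Hypothetical rules: (search pattern, replacement string).
rules = [
    (r"///br", "<br />"),           # macro key for a line break
    (r"///hd (.+)", r"== \1 =="),   # macro key for a level 2 heading
]

def reparse(page_text):
    """Apply every rule in order, roughly as a re-parse would."""
    for pattern, replacement in rules:
        page_text = re.sub(pattern, replacement, page_text)
    return page_text

# Page text with macro keys typed in by the editor.
page = "///hd History\nFirst line///br\nSecond line"
result = reparse(page)

print(result)
# == History ==
# First line<br />
# Second line
```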

Efficiency
Efficiency is how long the regex engine takes to find matches, which is a function of how many characters the engine has to read, including backtracking. Complex regular expressions can often be constructed in several different ways, all with the same outputs but with greatly varying efficiency. If AWB is taking a long time to generate results because of a regex rule:


 * Try constructing the expression a different way. There are several online resources with guidance to creating efficient regex patterns.
 * Using the "advanced settings" find-and-replace tool, enter expressions on the "If" tab to filter the pages that an expensive find-and-replace rule is applied to.
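The second suggestion amounts to running a cheap test before an expensive one. The sketch below shows the principle in Python; the rule, the function name apply_rule and the substring check are assumptions chosen for illustration and do not reproduce how AWB evaluates its "If" tab.

```python
import re

# A comparatively expensive rule: move punctuation in front of a ref element.
expensive = re.compile(r"(\w)(<ref[^>]*>.*?</ref>)([,.:;])", re.DOTALL)

def apply_rule(page_text):
    # Cheap filter first: pages without "<ref" cannot match, so skip them.
    if "<ref" not in page_text:
        return page_text
    return expensive.sub(r"\1\3\2", page_text)

untouched = apply_rule("No references here.")
moved = apply_rule("word<ref>A</ref>.")

print(untouched)  # No references here.
print(moved)      # word.<ref>A</ref>
```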

Online regular expressions testing tools

 * RegEx Storm (supporting .NET regex flavour)
 * RegEx101 (supporting .NET regex flavour)
 * RegEx Pal
 * RegExr
 * Rubular

Desktop regular expression testing tool

 * RegEx Hero

Documentation about regular expressions

 * Regular Expressions in .NET, Well House Consultants
 * Regular-Expressions.info
 * Regular Expressions, perldoc.perl.org
 * Regular Expression Syntax, docs.python.org
 * Regular Expression Language – Quick Reference, MSDN
 * .NET regular expressions, MSDN
 * Regular Expressions – User Guide, zytrax.com