Regular expressions
A regular expression, also known as a regex or regexp, is a string whose pattern (template) describes a set of strings. The pattern determines which strings belong to the set. A pattern consists of literal characters and metacharacters, which are characters that have special meaning instead of a literal meaning.
Matching Symbols
Regular Expression |
Description |
. |
Matches any character |
^regex |
Finds regex that must match at the beginning of the line. |
regex$ |
Finds regex that must match at the end of the line. |
[abc] |
Set definition, can match the letter a or b or c |
[abc][vz] |
Set definition, can match the letter a or b or c followed by either v or z |
[^abc] |
When a caret appears as the first character inside square brackets, it negates the pattern. This pattern matches any character except a or b or c. |
[a-d1-7] |
Ranges: matches a letter between a and d and figures from 1 to 7, but not d1. |
X|Z |
Finds X or Z |
XZ |
Finds X directly followed by Z. |
$ |
Checks if a line end follows. |
Meta Characters
The following meta characters have a pre-defined meaning and make certain common patterns easier to use, e.g., \d instead of [0..9].
Regular Expression |
Description |
\d |
Any digit, short for [0-9] |
\D |
Any non-digit, short for [^0-9] |
\s |
A whitespace character, short for [\t\n\x0b\r\f] |
\S |
A non-whitespace character |
\w |
A word character, short for [a-zA-Z_0-9] |
\W |
A non-word character [^\w] |
\S+ |
Several non-whitespace characters |
\b |
Matches a word boundary where a word character is [a-zA-Z0-9_] |
These meta characters have the same first letter as their representation, e.g., digit, space, word, and boundary. Uppercase symbols define the opposite.
Quantifier
A quantifier defines how often an element can occur. The symbols ?, *, + and {} define the quantity of the regular expressions
Regular Expression |
Description |
Example |
* |
Occurs zero or more times, is short for {0,} | X* finds no or several letter X, <sbr /> .* finds any character sequence |
+ |
Occurs one or more times, is short for {1,} | X+ Finds one or several letter X |
? |
Occurs no or one times, is short for {0,1} | X? finds no or exactly one letter X |
{X} |
Occurs X number of times, {} describes the order of the preceding liberal | \d{3} searches for three digits, .{10} for any character sequence of length 10. |
{X,Y} |
Occurs between X and Y times, | \d{1,4} means \d must occur at least once and at a maximum of four. |
*? |
? after a quantifier makes it a reluctant quantifier. It tries to find the smallest match. This makes the regular expression stop at the first match. |
Grouping and back reference
You can group parts of your regular expression. In your pattern you group elements with round brackets, e.g., (). This allows you to assign a repetition operator to a complete group.
In addition these groups also create a back reference to the part of the regular expression. This captures the group. A back reference stores the part of the string which matched the group. This allows you to use this part in the replacement.
Via the ${} you can refer to a group. ${1} is the first group, ${2} the second, etc.
Negative look ahead
Negative look ahead provides the possibility to exclude a pattern. With this you can say that a string should not be followed by another string.
Negative look ahead are defined via (?!pattern). For example, the following will match "a" if "a" is not followed by "b".
a(?!b)