Character Escapes
Character escapes act as shorthands for some common character classes.
Digit character — \d
The character escape \d matches digit characters, from 0 to 9. It is equivalent to the character class [0-9].
While 59 is also a pair of digits, it is not reported as a match, because most engines look for non-overlapping matches from left to right by default.
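The left-to-right, non-overlapping scan can be sketched in Python with the standard re module. The input "1596" is a hypothetical string chosen for illustration, not the original example. Note that Python's \d also matches digits from other scripts unless the re.ASCII flag is set, so the comparison with [0-9] uses that flag.

```python
import re

# Hypothetical input: it contains the pairs 15, 59 and 96,
# but 59 overlaps with both of the others.
text = "1596"

# findall scans left to right and resumes after the end of each match,
# so the overlapping pair "59" is never reported.
pairs = re.findall(r"\d\d", text)
print(pairs)  # ['15', '96']

# With re.ASCII, \d matches the same characters as the class [0-9].
ascii_digits = re.findall(r"\d", "a1b2", re.ASCII)
class_digits = re.findall(r"[0-9]", "a1b2")
print(ascii_digits == class_digits)  # True
```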
\D is the negation of \d and is equivalent to [^0-9].
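One common use of \D is stripping everything that is not a digit. The following is a minimal Python sketch; the phone-number string is a made-up input, and re.ASCII is passed so that \D behaves like [^0-9].

```python
import re

# Hypothetical input, used only for illustration.
phone = "+1 (555) 010-9999"

# Remove every non-digit character.
digits_only = re.sub(r"\D", "", phone, flags=re.ASCII)

# The explicit negated class gives the same result.
same_result = re.sub(r"[^0-9]", "", phone)

print(digits_only)  # 15550109999
```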
Word character — \w
The escape \w matches characters deemed “word characters”. These include:
- lowercase alphabet: a–z
- uppercase alphabet: A–Z
- digits: 0–9
- underscore: _
It is thus equivalent to the character class [a-zA-Z0-9_].
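As a rough illustration, the Python sketch below extracts runs of word characters from a made-up string. In Python, \w on text strings also matches non-ASCII letters by default, so re.ASCII is assumed here to get the [a-zA-Z0-9_] behaviour described above.

```python
import re

# Hypothetical input, used only for illustration.
text = "snake_case, CamelCase & x2!"

# Runs of word characters; punctuation and spaces act as separators.
words = re.findall(r"\w+", text, re.ASCII)

# The explicit character class finds the same runs.
same_words = re.findall(r"[a-zA-Z0-9_]+", text)

print(words)  # ['snake_case', 'CamelCase', 'x2']
```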
Whitespace character — \s
The escape \s matches whitespace characters. The exact set of characters matched depends on the regex engine, but most include at least:
- space
- tab: \t
- carriage return: \r
- new line: \n
- form feed: \f
Many also include vertical tabs (\v). Unicode-aware engines usually match all characters in the separator category.
The technicalities, however, will usually not be important.
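A typical use of \s is splitting text on any run of whitespace. The sketch below uses Python's re module, which is one engine among many; the input string is hypothetical, and other engines may treat the vertical tab differently.

```python
import re

# Hypothetical input mixing a space, a tab, CR+LF and a form feed.
messy = "a b\tc\r\nd\fe"

# \s+ treats any run of whitespace characters as a single separator.
tokens = re.split(r"\s+", messy)
print(tokens)  # ['a', 'b', 'c', 'd', 'e']

# Python's \s also matches the vertical tab.
has_vertical_tab = re.fullmatch(r"\s", "\v") is not None
print(has_vertical_tab)  # True
```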
Any character — .
While not a typical character escape, . matches any character except the newline character \n. This can be changed using the “dotAll” flag, if supported by the regex engine in question.
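The effect of the flag can be sketched in Python, where it is spelled re.DOTALL (other engines use different names, such as the s flag). The input string is hypothetical.

```python
import re

# Hypothetical input: three candidates, the middle one spans a newline.
text = "abc a\nc a-c"

# By default, . does not match \n, so "a\nc" is skipped.
without_flag = re.findall(r"a.c", text)
print(without_flag)  # ['abc', 'a-c']

# With DOTALL, . matches every character, including \n.
with_flag = re.findall(r"a.c", text, re.DOTALL)
print(with_flag)  # ['abc', 'a\nc', 'a-c']
```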