Flags
Flags (or “modifiers”) allow us to put regexes into different “modes”.
Flags are the part after the final /
in /pattern/
.
Different engines support different flags. We’ll explore some of the most common flags here.
Global (g
)
All examples thus far have had the global flag. When the global flag isn’t enabled, the regex doesn’t match anything beyond the first match.
(Case) Insensitive (i
)
As the name suggests, enabling this flag makes the regex case-insensitive in its matching.
Multiline (m
)
In Ruby, the m
flag performs other functions.
The multiline flag has to do with the regex’s handling of anchors when dealing with “multiline” strings—strings that include newlines (\n
). By default, the regex /^foo$/
would match only "foo"
.
We might want it to match foo
when it is in a line by itself in a multiline string.
Let’s take the string "bar\nfoo\nbaz"
as an example:
bar
foo
baz
Without the multiline flag, the string above would be considered as a single line bar\nfoo\nbaz
for matching purposes. The regex ^foo$
would thus not match anything.
With the multiline flag, the input would be considered as three “lines”: bar
, foo
, and baz
. The regex ^foo$
would match the line in the middle—foo
.
Dot-all (s
)
JavaScript, prior to ES2018, did not support this flag. Ruby does not support the flag, instead using m
for the same.
The .
typically matches any character except newlines. With the dot-all flag, it matches newlines too.
Unicode (u
)
In the presence of the u
flag, the regex and the input string will be interpreted in a unicode-aware way. The details of this are implementation-dependent, but here are some things to expect:
- Character classes may match astral symbols.
- Character escapes may match astral symbols and may be unicode-aware.
- The
i
flag may use Unicode’s case-folding logic. - The use of some features like unicode codepoint escapes and unicode property escapes may be enabled.
Whitespace extended (x
)
When this flag is set, whitespace in the pattern is ignored (unless escaped or in a character class). Additionally, characters following #
on any line are ignored. This allows for comments and is useful when writing complex patterns.
Here’s an example from Advanced Examples, formatted to take advantage of the whitespace extended flag:
^ # start of line
(
[+-]? # sign
(?=\.\d|\d) # don't match `.`
(?:\d+)? # integer part
(?:\.?\d*) # fraction part
)
(?: # optional exponent part
[eE]
(
[+-]? # optional sign
\d+ # power
)
)?
$ # end of line