Repetition
Repetition is a powerful and ubiquitous regex feature. There are several ways to represent repetition in regex.
Making things optional
We can make parts of a regex optional using the ? operator.
Here’s another example:
Here the s following http is optional.
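As a minimal sketch of the optional s, here is the idea in Python's re module. The choice of Python and the sample strings are assumptions made for illustration; the pattern https? behaves the same way in most regex flavours.

```python
import re

# "s?" makes the preceding "s" optional, so both "http" and "https" match.
pattern = re.compile(r"https?")

samples = ["http", "https", "ftp"]
matches = [s for s in samples if pattern.fullmatch(s)]
print(matches)  # ['http', 'https']
```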
We can also make capturing and non-capturing groups optional.
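A short sketch of optional groups, again in Python. The host name example.com and both patterns are hypothetical and chosen only to show the behaviour.

```python
import re

# The non-capturing group (?:www\.)? may appear once or not at all.
non_capturing = re.compile(r"(?:www\.)?example\.com")

# A capturing group can be optional too; when it is absent, group(1) is None.
capturing = re.compile(r"http(s)?")

hosts = [bool(non_capturing.fullmatch(h)) for h in ("example.com", "www.example.com")]
groups = [capturing.fullmatch(u).group(1) for u in ("http", "https")]
print(hosts)   # [True, True]
print(groups)  # [None, 's']
```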
Zero or more
If we wish to match zero or more of a token, we can suffix it with *.
Our regex matches even an empty string "".
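To see the empty-string case concretely, here is a small Python sketch. The pattern a* and the sample strings are assumptions for illustration.

```python
import re

# "a*" accepts any run of "a" characters, including none at all.
pattern = re.compile(r"a*")

samples = ["", "a", "aaaa", "b"]
accepted = [s for s in samples if pattern.fullmatch(s)]
print(accepted)  # ['', 'a', 'aaaa']
```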
One or more
If we wish to match one or more of a token, we can suffix it with a +.
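The difference from * shows up on the empty string. A minimal Python sketch, with an assumed pattern a+ and assumed sample strings:

```python
import re

# "a+" needs at least one "a", so the empty string is rejected.
pattern = re.compile(r"a+")

samples = ["", "a", "aaaa", "b"]
accepted = [s for s in samples if pattern.fullmatch(s)]
print(accepted)  # ['a', 'aaaa']
```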
Exactly x times
If we wish to match a particular token exactly x times, we can suffix it with {x}. This is functionally identical to repeatedly copy-pasting the token x times.
Here’s an example that matches an uppercase six-character hex colour code.
Here, the token {6} applies to the character class [0-9A-F].
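One way such a pattern might look, sketched in Python. The leading # and the sample colours are assumptions; the second pattern spells out the "copy-pasted" form to show that both behave alike.

```python
import re

# {6} repeats the character class exactly six times.
hex_colour = re.compile(r"#[0-9A-F]{6}")

# The same pattern with the character class written out six times.
expanded = re.compile(r"#" + r"[0-9A-F]" * 6)

samples = ["#AE25AE", "#663399", "#FFF", "#ae25ae"]
short_form = [bool(hex_colour.fullmatch(s)) for s in samples]
long_form = [bool(expanded.fullmatch(s)) for s in samples]
print(short_form)               # [True, True, False, False]
print(short_form == long_form)  # True
```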
Between min and max times
If we wish to match a particular token between min and max (inclusive) times, we can suffix it with {min,max}.
There must be no space after the comma in {min,max}.
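The sketch below checks the inclusive bounds and shows what a stray space does in Python's re module, where the braces then lose their special meaning and are read as literal text. Other engines may react differently; the pattern a{2,4} is an assumed example.

```python
import re

# Between two and four "a" characters, both bounds included.
between = re.compile(r"a{2,4}")
counts = [n for n in range(7) if between.fullmatch("a" * n)]
print(counts)  # [2, 3, 4]

# With a space after the comma, Python's re reads the braces literally.
spaced = re.compile(r"a{2, 4}")
repeats = spaced.fullmatch("aaa") is not None
literal = spaced.fullmatch("a{2, 4}") is not None
print(repeats, literal)  # False True
```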
At least x times
If we wish to match a particular token at least x times, we can suffix it with {x,}. Think of it as {min,max}, but without an upper bound.
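A brief Python sketch of the open-ended form, using the assumed pattern a{2,}:

```python
import re

# Two or more "a" characters, with no upper limit.
at_least = re.compile(r"a{2,}")

counts = [n for n in range(6) if at_least.fullmatch("a" * n)]
print(counts)  # [2, 3, 4, 5]
```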
A note on greediness
Repetition operators, by default, are greedy. They attempt to match as much as possible.
* can match zero or more instances of a. In each of the example strings, it could just as well match, say, zero a's. Yet, it matches as many as it can.
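Greediness can be observed directly. In this Python sketch, with an assumed pattern a* and assumed sample strings, an empty match would be valid at the start of every string, yet the engine takes the longest run available.

```python
import re

pattern = re.compile(r"a*")

samples = ["a", "aa", "aaab"]
# match() anchors at the start of the string and reports what was consumed.
matched = [pattern.match(s).group() for s in samples]
print(matched)  # ['a', 'aa', 'aaa']
```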
Suffixing a repetition operator (*, +, ?, …) with a ?, one can make it “lazy”.
The suffix ? here is different from the repetition operator ?.
Now, .* matches no more than the minimum necessary to allow a match.[1]
[…] Lazy will stop as soon as the condition is satisfied, but greedy means it will stop only once the condition is not satisfied any more.
—Andrew S on Stack Overflow
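The contrast between the two modes can be sketched in Python on quoted strings. The sample sentence is an assumption; the third pattern is the negated character class mentioned in the footnote.

```python
import re

text = 'say "hello" and "bye"'

# Greedy: .* runs to the last quote it can reach.
greedy = re.findall(r'".*"', text)

# Lazy: .*? stops at the first quote that completes a match.
lazy = re.findall(r'".*?"', text)

# Negated class: [^"]* cannot cross a quote, so no laziness is needed.
negated = re.findall(r'"[^"]*"', text)

print(greedy)   # ['"hello" and "bye"']
print(lazy)     # ['"hello"', '"bye"']
print(negated)  # ['"hello"', '"bye"']
```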
Examples
Bitcoin address
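A rough sketch in Python of how such a pattern might look, assuming legacy Base58 addresses that start with 1 or 3 and continue with 25 to 34 Base58 characters. The length bounds and the sample strings are assumptions, and the pattern checks shape only, not the address checksum.

```python
import re

# Assumed shape: "1" or "3", then 25 to 34 Base58 characters.
# Base58 leaves out 0, O, I and l to avoid look-alike characters.
address = re.compile(r"[13][1-9A-HJ-NP-Za-km-z]{25,34}")

samples = [
    "1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa",  # well-formed
    "1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfN0",  # contains "0", which is not Base58
    "1abc",                                # far too short
]
valid = [bool(address.fullmatch(s)) for s in samples]
print(valid)  # [True, False, False]
```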
YouTube Video
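A sketch in Python of one way to pull a video ID out of a watch URL. The pattern, the helper video_id and the ID abc123XYZ_- are all hypothetical. It combines optional groups, a lazy .*? and a + in a single expression.

```python
import re

# Optional scheme, optional "www.", then the v= parameter anywhere in the query.
video = re.compile(r"(?:https?://)?(?:www\.)?youtube\.com/watch\?.*?v=([^&\s]+)")

samples = [
    "https://www.youtube.com/watch?v=abc123XYZ_-",
    "youtube.com/watch?t=10&v=abc123XYZ_-",
    "xhttps://www.youtube.com/watch?v=abc123XYZ_-",  # broken link
]

def video_id(text):
    """Return the captured video ID, or None when nothing matches."""
    found = video.search(text)
    return found.group(1) if found else None

ids = [video_id(s) for s in samples]
print(ids)  # ['abc123XYZ_-', 'abc123XYZ_-', 'abc123XYZ_-']
```

Because search() looks for the pattern anywhere in the text, the broken third sample still yields an ID: the match simply begins at https.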
We can adjust this to not match the last broken link using anchors, which we shall encounter soon.
[1] In this example, we could also achieve this using /"[^"]*"/g (as is best practice).