Repetition
Repetition is a powerful and ubiquitous regex feature. There are several ways to represent repetition in regex.
Making things optional
We can make parts of a regex optional using the ? operator.
/a?/g
- (empty string) → 1 match
- a → 2 matches
- aa → 3 matches
- aaa → 4 matches
- aaaa → 5 matches
- aaaaa → 6 matches
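If the match counts look surprising, a quick JavaScript sketch makes them concrete. The helper `listMatches` is our own hypothetical name. It collects every match that a global regex finds, and `matchAll` steps past empty matches so the search always terminates. Each string gets one match per a, plus a final empty match at the end, where a? matches zero characters.

```javascript
// Hypothetical helper: collect every substring a global (g) regex matches in `input`.
// matchAll throws a TypeError if the regex lacks the g flag.
const listMatches = (pattern, input) => [...input.matchAll(pattern)].map((m) => m[0]);

for (const input of ["", "a", "aa", "aaa"]) {
  const matches = listMatches(/a?/g, input);
  console.log(JSON.stringify(input), matches.length, JSON.stringify(matches));
}
// "" 1 [""]
// "a" 2 ["a",""]
// "aa" 3 ["a","a",""]
// "aaa" 4 ["a","a","a",""]
```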
Here’s another example:
/https?/g
- http → 1 match
- https → 1 match
- http/2 → 1 match
- shttp → 1 match
- ftp → 0 matches
Here, the s following http is optional, so the regex matches both http and https.
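In JavaScript, `RegExp.prototype.test` reports whether a pattern matches anywhere in a string. Here is a short sketch with the same pattern. We leave off the g flag on purpose: `test` on a global regex remembers `lastIndex` between calls, which can make repeated checks give surprising results.

```javascript
// s? makes the s optional, so one pattern covers both http and https.
// No g flag: a global regex would carry lastIndex over between test() calls.
const scheme = /https?/;

const results = ["http", "https", "http/2", "shttp", "ftp"].map((input) => [input, scheme.test(input)]);
for (const [input, ok] of results) {
  console.log(input, ok);
}
// http true
// https true
// http/2 true
// shttp true
// ftp false
```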
We can also make capturing and non-capturing groups optional.
/url: (www\.)?example\.com/g
- url: example.com → 1 match
- url: www.example.com/foo → 1 match
- Here's the url: example.com. → 1 match
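When an optional capturing group is skipped, JavaScript still reports it in the match result, with the value `undefined`. Here is a minimal sketch using the pattern above. `urlMatches` is a hypothetical helper name.

```javascript
const urlPattern = /url: (www\.)?example\.com/g;

// Return [whole match, group 1] for every match in `input`.
// Group 1 is undefined when the optional (www\.)? did not take part in the match.
const urlMatches = (input) => [...input.matchAll(urlPattern)].map((m) => [m[0], m[1]]);

console.log(urlMatches("url: example.com"));
// [ [ 'url: example.com', undefined ] ]
console.log(urlMatches("url: www.example.com/foo"));
// [ [ 'url: www.example.com', 'www.' ] ]
```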
Zero or more
If we wish to match zero or more of a token, we can suffix it with *.
/a*/g
- (empty string) → 1 match
- a → 2 matches
- aa → 2 matches
- aaa → 2 matches
- aaaa → 2 matches
- aaaaa → 2 matches
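Our regex matches even the empty string "". This is also why each non-empty string above has two matches: after consuming the run of a's, the regex matches once more, with zero characters, at the end of the string.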
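Because a* can succeed without consuming anything, it also produces empty matches in front of characters that are not a. Here is a sketch using a hypothetical `listMatches` helper:

```javascript
// Hypothetical helper: every substring a global regex matches in `input`.
const listMatches = (pattern, input) => [...input.matchAll(pattern)].map((m) => m[0]);

console.log(listMatches(/a*/g, ""));     // [ '' ]
console.log(listMatches(/a*/g, "aaa"));  // [ 'aaa', '' ]
// Empty match before the first b, the run of a's, empty match before the
// second b, and a final empty match at the end of the string.
console.log(listMatches(/a*/g, "baab")); // [ '', 'aa', '', '' ]
```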
One or more
If we wish to match one or more of a token, we can suffix it with a +.
/a+/g
- (empty string) → 0 matches
- a → 1 match
- aa → 1 match
- aaa → 1 match
- aaaa → 1 match
- aaaaa → 1 match
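If we run the same input through a+, the difference from a* becomes visible: there are no empty matches. Here is a sketch with the same hypothetical `listMatches` helper:

```javascript
// Hypothetical helper: every substring a global regex matches in `input`.
const listMatches = (pattern, input) => [...input.matchAll(pattern)].map((m) => m[0]);

// + needs at least one a, so it never produces an empty match.
console.log(listMatches(/a+/g, ""));      // []
console.log(listMatches(/a+/g, "baab"));  // [ 'aa' ]
// a+ means the same as aa*: one required a, then zero or more.
console.log(listMatches(/aa*/g, "baab")); // [ 'aa' ]
```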
Exactly x times
If we wish to match a particular token exactly x times, we can suffix it with {x}. This is functionally identical to writing the token out x times in a row: a{3} means the same as aaa.
/a{3}/g
- (empty string) → 0 matches
- a → 0 matches
- aa → 0 matches
- aaa → 1 match
- aaaa → 1 match
- aaaaa → 1 match
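To check that {x} really is shorthand for repetition, we can compare a{3} with aaa on a few inputs. Six a's, which the examples above don't include, give two non-overlapping matches. `listMatches` is again a hypothetical helper.

```javascript
// Hypothetical helper: every substring a global regex matches in `input`.
const listMatches = (pattern, input) => [...input.matchAll(pattern)].map((m) => m[0]);

// a{3} is shorthand for writing a three times, so both results agree.
for (const input of ["aa", "aaaa", "aaaaaa"]) {
  console.log(input, listMatches(/a{3}/g, input), listMatches(/aaa/g, input));
}
// aa [] []
// aaaa [ 'aaa' ] [ 'aaa' ]
// aaaaaa [ 'aaa', 'aaa' ] [ 'aaa', 'aaa' ]
```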
Here’s an example that matches an uppercase six-character hex colour code.
/#[0-9A-F]{6}/g
- #AE25AE → 1 match
- #663399 → 1 match
- How about #73FA79? → 1 match
- Part of #73FA79BAC too → 1 match
- #FFF → 0 matches
- #a2ca2c → 0 matches
Here, the quantifier {6} applies to the whole character class [0-9A-F], so any six characters from that class will do; they need not all be the same character.
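Here is a sketch of using this pattern to pull colour codes out of text with `String.prototype.match`. Given a global regex, `match` returns an array of all matches, or `null` when there are none. The function name `findColours` is hypothetical.

```javascript
const hexColour = /#[0-9A-F]{6}/g;

// Hypothetical helper: every uppercase six-digit hex colour code in `text`.
function findColours(text) {
  return text.match(hexColour) ?? []; // match() with the g flag returns null on no match
}

console.log(findColours("How about #73FA79?"));     // [ '#73FA79' ]
console.log(findColours("Part of #73FA79BAC too")); // [ '#73FA79' ]
console.log(findColours("#FFF or #a2ca2c"));        // []
```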
Between min and max times
If we wish to match a particular token between min and max (inclusive) times, we can suffix it with {min,max}.
/a{2,4}/g
- (empty string) → 0 matches
- a → 0 matches
- aa → 1 match
- aaa → 1 match
- aaaa → 1 match
- aaaaa → 1 match
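Don't put a space after the comma in {min,max}: in many engines, including JavaScript's, {2, 4} is not read as a quantifier, and the braces are treated as literal text instead.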
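Here is a sketch of both points in JavaScript. a{2,4} takes as many a's as it is allowed to, up to four. A space inside the braces quietly turns the quantifier into literal text. This is JavaScript's behaviour without the u flag; with the u flag, the same pattern is a syntax error. `listMatches` is a hypothetical helper.

```javascript
// Hypothetical helper: every substring a global regex matches in `input`.
const listMatches = (pattern, input) => [...input.matchAll(pattern)].map((m) => m[0]);

// a{2,4} takes up to four a's at a time; a lone leftover a is not enough.
console.log(listMatches(/a{2,4}/g, "aaaaa"));  // [ 'aaaa' ]
console.log(listMatches(/a{2,4}/g, "aaaaaa")); // [ 'aaaa', 'aa' ]

// With a space after the comma (and no u flag), the braces are literal,
// so this pattern looks for the exact characters "a{2, 4}".
console.log(/a{2, 4}/.test("aaa"));     // false
console.log(/a{2, 4}/.test("a{2, 4}")); // true
```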
At least x times
If we wish to match a particular token at least x times, we can suffix it with {x,}. Think of it as {min,max}, but without an upper bound.
/a{2,}/g
- (empty string) → 0 matches
- a → 0 matches
- aa → 1 match
- aaa → 1 match
- aaaa → 1 match
- aaaaa → 1 match
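The familiar operators are special cases of the brace syntax: * behaves like {0,}, + like {1,} and ? like {0,1}. Likewise, a{2,} can be written aa+. Here is a sketch comparing the two, with the hypothetical `listMatches` helper:

```javascript
// Hypothetical helper: every substring a global regex matches in `input`.
const listMatches = (pattern, input) => [...input.matchAll(pattern)].map((m) => m[0]);

// With no upper bound, {2,} takes a whole run of two or more a's in one match.
// It behaves like aa+: one a, followed by one or more further a's.
for (const input of ["a", "aa", "aaaaaaaaaa", "aa b aaa"]) {
  console.log(JSON.stringify(input), listMatches(/a{2,}/g, input), listMatches(/aa+/g, input));
}
```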
A note on greediness
Repetition operators, by default, are greedy. They attempt to match as much as possible.
/a*/g
- aaa → 2 matches
- aaaa → 2 matches
- aaaaa → 2 matches
The * operator can match zero or more instances of a. In each of the example strings, it could just as well match, say, zero a's. Yet it matches as many as it can. Similarly, a{2,4} below takes four a's whenever four are available.
/a{2,4}/g
- aaaaa → 1 match
- aaa → 1 match
- aaaa → 1 match
By suffixing a repetition operator (*, +, ?, …) with a ?, we can make it “lazy”: it then matches as little as possible instead.
/a{2,4}?/g
- aaaaa → 2 matches
- aaa → 1 match
- aaaa → 2 matches
The suffix ? here is different from the repetition operator ?: it does not make anything optional, but changes how the quantifier before it behaves.
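Here is a sketch contrasting the greedy and lazy versions of a{2,4} on the same inputs, again using a hypothetical `listMatches` helper. The lazy quantifier stops at the minimum of two a's, so longer runs split into several matches.

```javascript
// Hypothetical helper: every substring a global regex matches in `input`.
const listMatches = (pattern, input) => [...input.matchAll(pattern)].map((m) => m[0]);

for (const input of ["aaa", "aaaa", "aaaaa"]) {
  const greedy = listMatches(/a{2,4}/g, input); // takes as many a's as allowed (up to 4)
  const lazy = listMatches(/a{2,4}?/g, input);  // takes as few as allowed (2)
  console.log(input, greedy, lazy);
}
// aaa [ 'aaa' ] [ 'aa' ]
// aaaa [ 'aaaa' ] [ 'aa', 'aa' ]
// aaaaa [ 'aaaa' ] [ 'aa', 'aa' ]
```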
/".*"/g
- "quote" → 1 match
- "quote", "quote" → 1 match
- "quote"quote" → 1 match
/".*?"/g
- "quote" → 1 match
- "quote", "quote" → 2 matches
- "quote"quote" → 1 match
Now, .*? matches no more than the minimum necessary to allow a match.[1]
[…] Lazy will stop as soon as the condition is satisfied, but greedy means it will stop only once the condition is not satisfied any more.
—Andrew S on Stack Overflow
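Here is the same comparison as a JavaScript sketch, with the negated-class alternative /"[^"]*"/g added. Because [^"] can never consume a quotation mark, that pattern stops at the closing quote without needing a lazy quantifier. `listMatches` is a hypothetical helper.

```javascript
// Hypothetical helper: every substring a global regex matches in `input`.
const listMatches = (pattern, input) => [...input.matchAll(pattern)].map((m) => m[0]);

const text = '"quote", "quote"';
console.log(listMatches(/".*"/g, text));    // [ '"quote", "quote"' ]
console.log(listMatches(/".*?"/g, text));   // [ '"quote"', '"quote"' ]
// The negated class cannot run past a closing quote, so no laziness is needed.
console.log(listMatches(/"[^"]*"/g, text)); // [ '"quote"', '"quote"' ]
```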
/<.+>/g
- <em>g r e e d y</em> → 1 match
/<.+?>/g
- <em>lazy</em> → 2 matches
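Capturing what sits between the angle brackets makes the difference easy to see. Here is a sketch; the helper name `captured` is hypothetical.

```javascript
// Hypothetical helper: capture group 1 of every match of a global regex.
const captured = (pattern, input) => [...input.matchAll(pattern)].map((m) => m[1]);

const html = "<em>lazy</em>";
// Greedy: .+ runs to the end of the string, then backs off only as far as the last >.
console.log(captured(/<(.+)>/g, html));  // [ 'em>lazy</em' ]
// Lazy: .+? stops at the first > it reaches, so each tag matches on its own.
console.log(captured(/<(.+?)>/g, html)); // [ 'em', '/em' ]
```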
Examples
Bitcoin address
/([13][a-km-zA-HJ-NP-Z0-9]{26,33})/g
- 3Nxwenay9Z8Lc9JBiywExpnEFiLp6Afp8v → 1 match
- 1HQ3Go3ggs8pFnXuHVHRytPCq5fGG8Hbhx → 1 match
- 2016-03-09,18f1yugoAJuXcHAbsuRVLQC9TezJ → 1 match
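Here is a sketch that pulls candidate addresses out of text. The pattern checks only the shape of an address: a leading 1 or 3, then 26 to 33 characters from a class that leaves out l, I and O. It does not verify that an address is genuine, and, as the third line shows, it will also find a candidate inside a longer string. `findAddresses` is a hypothetical name.

```javascript
const btcAddress = /([13][a-km-zA-HJ-NP-Z0-9]{26,33})/g;

// Hypothetical helper: capture group 1 (the candidate address) for every match.
const findAddresses = (text) => [...text.matchAll(btcAddress)].map((m) => m[1]);

const lines = [
  "3Nxwenay9Z8Lc9JBiywExpnEFiLp6Afp8v",
  "1HQ3Go3ggs8pFnXuHVHRytPCq5fGG8Hbhx",
  "2016-03-09,18f1yugoAJuXcHAbsuRVLQC9TezJ",
];
for (const line of lines) {
  console.log(findAddresses(line));
}
// [ '3Nxwenay9Z8Lc9JBiywExpnEFiLp6Afp8v' ]
// [ '1HQ3Go3ggs8pFnXuHVHRytPCq5fGG8Hbhx' ]
// [ '18f1yugoAJuXcHAbsuRVLQC9TezJ' ]
```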
YouTube Video
/(?:https?:\/\/)?(?:www\.)?youtube\.com\/watch\?.*?v=([^&\s]+).*/gm
- youtube.com/watch?feature=sth&v=dQw4w9WgXcQ → 1 match
- https://www.youtube.com/watch?v=dQw4w9WgXcQ → 1 match
- www.youtube.com/watch?v=dQw4w9WgXcQ → 1 match
- youtube.com/watch?v=dQw4w9WgXcQ → 1 match
- fakeyoutube.com/watch?v=dQw4w9WgXcQ → 1 match
We can adjust this so that it does not match the last, bogus link (fakeyoutube.com) by using anchors, which we shall encounter soon.
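Here is a sketch that extracts video IDs (capture group 1) from several lines of text. The m flag only changes how anchors behave, so it has no effect on this pattern yet. Each line yields its own match because . does not match line breaks. The last line shows the problem mentioned above: the match simply starts at youtube.com inside fakeyoutube.com. `videoIds` is a hypothetical helper name.

```javascript
const youtubeUrl = /(?:https?:\/\/)?(?:www\.)?youtube\.com\/watch\?.*?v=([^&\s]+).*/gm;

// Hypothetical helper: the video ID (capture group 1) of every URL found in `text`.
const videoIds = (text) => [...text.matchAll(youtubeUrl)].map((m) => m[1]);

const text = [
  "youtube.com/watch?feature=sth&v=dQw4w9WgXcQ",
  "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
  "fakeyoutube.com/watch?v=dQw4w9WgXcQ",
].join("\n");

console.log(videoIds(text)); // [ 'dQw4w9WgXcQ', 'dQw4w9WgXcQ', 'dQw4w9WgXcQ' ]
```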
[1] In this example, we could also achieve this using /"[^"]*"/g (as is best practice).