Repetition
Repetition is a powerful and ubiquitous regex feature. There are several ways to represent repetition in regex.
Making things optional
We can make parts of a regex optional using the ? operator.
/a?/g
(empty string) - 1 match
a - 2 matches
aa - 3 matches
aaa - 4 matches
aaaa - 5 matches
aaaaa - 6 matches
Here’s another example:
/https?/g
http - 1 match
https - 1 match
http/2 - 1 match
shttp - 1 match
ftp - 0 matches
Here the s following http is optional.
We can also make capturing and non-capturing groups optional.
/url: (www\.)?example\.com/g
url: example.com - 1 match
url: www.example.com/foo - 1 match
Here's the url: example.com. - 1 match
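To check counts like these outside an interactive tester, a short JavaScript sketch can help. The helper name countMatches is hypothetical (it is not part of any library); it relies on the standard String.prototype.matchAll, which requires the g flag.

```javascript
// Hypothetical helper: count every match of a global (g-flagged) regex in text.
function countMatches(regex, text) {
  return [...text.matchAll(regex)].length;
}

// a? matches an "a" if one is present, otherwise the empty string.
console.log(countMatches(/a?/g, ""));   // 1 (one empty match)
console.log(countMatches(/a?/g, "aa")); // 3 ("a", "a", and a trailing empty match)

// The s is optional, so the "http" inside "shttp" still matches.
console.log(countMatches(/https?/g, "shttp")); // 1
console.log(countMatches(/https?/g, "ftp"));   // 0

// An optional capturing group that took no part in the match is undefined.
const urlPattern = /url: (www\.)?example\.com/;
console.log("url: example.com".match(urlPattern)[1]);         // undefined
console.log("url: www.example.com/foo".match(urlPattern)[1]); // www.
```

The trailing empty match explains why each string of n a's above reports n + 1 matches.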
Zero or more
If we wish to match zero or more of a token, we can suffix it with *.
/a*/g
(empty string) - 1 match
a - 2 matches
aa - 2 matches
aaa - 2 matches
aaaa - 2 matches
aaaaa - 2 matches
Our regex matches even an empty string "".
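Printing the matches themselves shows where the second match comes from. This is a minimal JavaScript sketch; the variable names and the "baab" sample are ours, not part of the examples above.

```javascript
// With the g flag, String.prototype.match returns every match as an array.
const wholeRun = "aaa".match(/a*/g);
const emptyInput = "".match(/a*/g);
const mixed = "baab".match(/a*/g);

console.log(wholeRun);   // [ 'aaa', '' ]  the run of a's, then an empty match at the end
console.log(emptyInput); // [ '' ]         zero a's still counts as a match
console.log(mixed);      // [ '', 'aa', '', '' ]
```

Every position where no a is found still yields an empty match, which is why a* on its own rarely makes a useful search pattern.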
One or more
If we wish to match one or more of a token, we can suffix it with a +.
/a+/g
(empty string) - 0 matches
a - 1 match
aa - 1 match
aaa - 1 match
aaaa - 1 match
aaaaa - 1 match
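One way to think about a+ is as "one a, followed by zero or more a", that is, aa*. The JavaScript sketch below compares the two spellings on a few sample strings of our own choosing.

```javascript
const samples = ["", "a", "aa", "baaab", "a aa aaa"];

const results = samples.map((text) => ({
  text,
  plus: text.match(/a+/g),      // null when there is no match at all
  longhand: text.match(/aa*/g), // one a, then zero or more a
}));

// Each row shows the same matches in both columns.
for (const row of results) {
  console.log(JSON.stringify(row));
}
```

Unlike a*, the a+ pattern never produces empty matches, so the empty string yields null here.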
Exactly x times
If we wish to match a particular token exactly x times, we can suffix it with {x}. This is functionally identical to writing the token x times in a row.
/a{3}/g
(empty string) - 0 matches
a - 0 matches
aa - 0 matches
aaa - 1 match
aaaa - 1 match
aaaaa - 1 match
Here’s an example that matches an uppercase six-character hex colour code.
/#[0-9A-F]{6}/g
#AE25AE - 1 match
#663399 - 1 match
How about #73FA79? - 1 match
Part of #73FA79BAC too - 1 match
#FFF - 0 matches
#a2ca2c - 0 matches
Here, the quantifier {6} applies to the character class [0-9A-F].
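As a rough JavaScript sketch of the same idea, the snippet below first compares a{3} with aaa, then pulls colour codes out of text. The helper name findHexColours is hypothetical.

```javascript
// a{3} behaves like writing the token three times.
console.log("aaaaa".match(/a{3}/g)); // [ 'aaa' ]
console.log("aaaaa".match(/aaa/g));  // [ 'aaa' ]

// Hypothetical helper: collect uppercase six-character hex colour codes.
function findHexColours(text) {
  // match() returns null when nothing matches, so fall back to an empty array.
  return text.match(/#[0-9A-F]{6}/g) || [];
}

console.log(findHexColours("How about #73FA79?"));     // [ '#73FA79' ]
console.log(findHexColours("Part of #73FA79BAC too")); // [ '#73FA79' ]
console.log(findHexColours("#FFF or #a2ca2c"));        // []
```

Note that the pattern happily matches the first six characters of a longer run such as #73FA79BAC; nothing in it says the colour code must end there.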
Between min and max times
If we wish to match a particular token between min and max (inclusive) times, we can suffix it with {min,max}.
/a{2,4}/g
(empty string) - 0 matches
a - 0 matches
aa - 1 match
aaa - 1 match
aaaa - 1 match
aaaaa - 1 match
There must be no space after the comma in {min,max}.
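The sketch below shows what goes wrong with a stray space. It assumes a JavaScript regex without the u or v flag; other engines may treat the malformed quantifier differently.

```javascript
console.log("aaaaa".match(/a{2,4}/g)); // [ 'aaaa' ]
console.log("a".match(/a{2,4}/g));     // null

// Assumption: no u or v flag. The space stops {2, 4} from being read as a
// quantifier, so the braces, digits, comma and space are matched literally.
const withSpace = /a{2, 4}/;
console.log(withSpace.test("aaa"));     // false
console.log(withSpace.test("a{2, 4}")); // true
```

Because the mistake does not raise an error in this mode, the pattern simply stops matching what we expect.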
At least x times
If we wish to match a particular token at least x times, we can suffix it with {x,}. Think of it as {min,max}, but without an upper bound.
/a{2,}/g
(empty string) - 0 matches
a - 0 matches
aa - 1 match
aaa - 1 match
aaaa - 1 match
aaaaa - 1 match
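The brace forms can also express the earlier operators: ? behaves like {0,1}, * like {0,}, and + like {1,}. The JavaScript sketch below checks this on a few sample strings; the helper sameMatches and the samples are ours.

```javascript
// Pairs of [shorthand, longhand] that should behave identically.
const pairs = [
  [/a?/g, /a{0,1}/g],
  [/a*/g, /a{0,}/g],
  [/a+/g, /a{1,}/g],
];
const samples = ["", "a", "aaa", "baab"];

// Hypothetical helper: do both regexes produce the same list of matches?
function sameMatches(left, right, text) {
  return JSON.stringify(text.match(left)) === JSON.stringify(text.match(right));
}

for (const [shorthand, longhand] of pairs) {
  const agree = samples.every((text) => sameMatches(shorthand, longhand, text));
  console.log(`${shorthand} vs ${longhand}: ${agree}`);
}
```

Each printed line ends in true for these samples.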
A note on greediness
Repetition operators, by default, are greedy. They attempt to match as much as possible.
/a*/g
aaa - 2 matches
aaaa - 2 matches
aaaaa - 2 matches
* can match zero or more instances of a. In each of the example strings, it could just as well match, say, zero instances of a. Yet, it matches as many as it can.
/a{2,4}/g
aaaaa - 1 match
aaa - 1 match
aaaa - 1 match
We can make a repetition operator (*, +, ?, …) “lazy” by suffixing it with a ?.
/a{2,4}?/g
aaaaa - 2 matches
aaa - 1 match
aaaa - 2 matches
The suffix ? here is different from the repetition operator ?.
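Printing the matched text makes the difference visible. This is a small JavaScript sketch; the variable names are ours.

```javascript
// Greedy: take as many a's as the upper bound allows.
const greedy = "aaaaa".match(/a{2,4}/g);
// Lazy: stop as soon as the lower bound is satisfied.
const lazy = "aaaaa".match(/a{2,4}?/g);

console.log(greedy); // [ 'aaaa' ]
console.log(lazy);   // [ 'aa', 'aa' ]

// The same suffix works on the other operators.
console.log("aaa".match(/a+?/g)); // [ 'a', 'a', 'a' ]
console.log("aaa".match(/a*?/g)); // [ '', '', '', '' ]
```

A lazy a*? on its own always settles for zero characters, so laziness is mostly useful when something else follows the repeated token.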
/".*"/g
"quote" - 1 match
"quote", "quote" - 1 match
"quote"quote" - 1 match
/".*?"/g
"quote" - 1 match
"quote", "quote" - 2 matches
"quote"quote" - 1 match
Now, .*? matches no more than the minimum necessary to allow a match.[1]
[…] Lazy will stop as soon as the condition is satisfied, but greedy means it will stop only once the condition is not satisfied any more.
—Andrew S on Stack Overflow
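The quoted-string patterns can be compared side by side in JavaScript. This sketch also includes the negated character class "[^"]*", which avoids the problem without relying on laziness; the variable names are ours.

```javascript
const text = '"quote", "quote"';

const greedyQuotes = text.match(/".*"/g);     // .* runs to the last quote
const lazyQuotes = text.match(/".*?"/g);      // .*? stops at the first closing quote
const negatedQuotes = text.match(/"[^"]*"/g); // [^"]* cannot cross a quote at all

console.log(greedyQuotes);  // [ '"quote", "quote"' ]
console.log(lazyQuotes);    // [ '"quote"', '"quote"' ]
console.log(negatedQuotes); // [ '"quote"', '"quote"' ]
```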
/<.+>/g
<em>g r e e d y</em> - 1 match
/<.+?>/g
<em>lazy</em> - 2 matches
Examples
Bitcoin address
/([13][a-km-zA-HJ-NP-Z0-9]{26,33})/g
3Nxwenay9Z8Lc9JBiywExpnEFiLp6Afp8v - 1 match
1HQ3Go3ggs8pFnXuHVHRytPCq5fGG8Hbhx - 1 match
2016-03-09,18f1yugoAJuXcHAbsuRVLQC9TezJ - 1 match
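A minimal JavaScript sketch of using this pattern to pull address-shaped text out of a line. The helper name findAddressCandidates is hypothetical, and a match only means the text has the right shape; the regex does not verify that the address is actually valid.

```javascript
const addressPattern = /([13][a-km-zA-HJ-NP-Z0-9]{26,33})/g;

// Hypothetical helper: return every address-shaped substring, or an empty array.
function findAddressCandidates(text) {
  return text.match(addressPattern) || [];
}

console.log(findAddressCandidates("2016-03-09,18f1yugoAJuXcHAbsuRVLQC9TezJ"));
// [ '18f1yugoAJuXcHAbsuRVLQC9TezJ' ]
console.log(findAddressCandidates("1short")); // []
```

The date at the start of the line is skipped because no 1 or 3 in it is followed by at least 26 allowed characters.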
YouTube Video
/(?:https?:\/\/)?(?:www\.)?youtube\.com\/watch\?.*?v=([^&\s]+).*/gm
youtube.com/watch?feature=sth&v=dQw4w9WgXcQ - 1 match
https://www.youtube.com/watch?v=dQw4w9WgXcQ - 1 match
www.youtube.com/watch?v=dQw4w9WgXcQ - 1 match
youtube.com/watch?v=dQw4w9WgXcQ - 1 match
fakeyoutube.com/watch?v=dQw4w9WgXcQ - 1 match
We can adjust this to not match the last broken link using anchors, which we shall encounter soon.
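The capturing group in this pattern holds the video ID. A rough JavaScript sketch of reading it follows; the helper name extractVideoId is hypothetical, and the g and m flags are dropped so that match() returns the capture groups for a single URL.

```javascript
// Same pattern as above, minus the g and m flags.
const youtubePattern = /(?:https?:\/\/)?(?:www\.)?youtube\.com\/watch\?.*?v=([^&\s]+).*/;

// Hypothetical helper: return the captured video ID, or null when there is no match.
function extractVideoId(url) {
  const match = url.match(youtubePattern);
  return match ? match[1] : null;
}

console.log(extractVideoId("youtube.com/watch?feature=sth&v=dQw4w9WgXcQ")); // dQw4w9WgXcQ
console.log(extractVideoId("https://www.youtube.com/watch?v=dQw4w9WgXcQ")); // dQw4w9WgXcQ
console.log(extractVideoId("example.com/watch?v=dQw4w9WgXcQ"));             // null

// Without anchors, the pattern still matches inside the longer host name.
console.log(extractVideoId("fakeyoutube.com/watch?v=dQw4w9WgXcQ"));         // dQw4w9WgXcQ
```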
[1] In this example, we could also achieve this using /"[^"]*"/g (as is best practice).