Regular Expressions For Regular Folk

Repetition

Repetition is a powerful and ubiquitous regex feature. There are several ways to represent repetition in regex.

Making things optional

We can make parts of a regex optional using the ? operator.

/a?/g
  • 1 match: (empty string)
  • 2 matches: a
  • 3 matches: aa
  • 4 matches: aaa
  • 5 matches: aaaa
  • 6 matches: aaaaa

Here’s another example:

/https?/g
  • 1 match: http
  • 1 match: https
  • 1 match: http/2
  • 1 match: shttp
  • 0 matches: ftp

Here the s following http is optional.

We can also make capturing and non-capturing groups optional.

/url: (www\.)?example\.com/g
  • 1 match: url: example.com
  • 1 match: url: www.example.com/foo
  • 1 match: Here's the url: example.com.
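
To try the optional operator outside the interactive examples, here is a minimal JavaScript sketch. The helper name countMatches is our own invention for illustration, not a built-in; it simply counts what String.prototype.match returns for a global regex.

```javascript
// Count how many times a global regex matches a string.
// match() returns null when nothing matches, hence the fallback to [].
const countMatches = (regex, text) => (text.match(regex) || []).length;

console.log(countMatches(/https?/g, "http"));  // 1
console.log(countMatches(/https?/g, "https")); // 1
console.log(countMatches(/https?/g, "ftp"));   // 0

// An optional group: "www." may or may not be present.
const url = /url: (www\.)?example\.com/;
console.log(url.test("url: example.com"));         // true
console.log(url.test("url: www.example.com/foo")); // true

// The capture group is undefined when the optional part is absent.
console.log("url: example.com".match(url)[1]);     // undefined
console.log("url: www.example.com".match(url)[1]); // "www."
```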

Zero or more

If we wish to match zero or more of a token, we can suffix it with *.

/a*/g
  • 1 match: (empty string)
  • 2 matches: a
  • 2 matches: aa
  • 2 matches: aaa
  • 2 matches: aaaa
  • 2 matches: aaaaa

Our regex matches even an empty string "".
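
The second match in each non-empty example is the empty string left at the end of the input. A short JavaScript sketch makes this visible (the variable name zeroOrMore is just for illustration):

```javascript
// With the g flag, match() returns an array of every match, empty ones included.
const zeroOrMore = /a*/g;

console.log("".match(zeroOrMore));    // [ '' ]
console.log("a".match(zeroOrMore));   // [ 'a', '' ]
console.log("aaa".match(zeroOrMore)); // [ 'aaa', '' ]
```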

One or more

If we wish to match one or more of a token, we can suffix it with a +.

/a+/g
  • 0 matches: (empty string)
  • 1 match: a
  • 1 match: aa
  • 1 match: aaa
  • 1 match: aaaa
  • 1 match: aaaaa
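
Unlike *, the + operator never produces an empty match. As a rough JavaScript illustration (the variable name oneOrMore is our own):

```javascript
const oneOrMore = /a+/g;

console.log("".match(oneOrMore));       // null (no match at all)
console.log("aaa".match(oneOrMore));    // [ 'aaa' ]
console.log("aabaaa".match(oneOrMore)); // [ 'aa', 'aaa' ]
```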

Exactly x times

If we wish to match a particular token exactly x times, we can suffix it with {x}. This is functionally identical to repeatedly copy-pasting the token x times.

/a{3}/g
  • 0 matches: (empty string)
  • 0 matches: a
  • 0 matches: aa
  • 1 match: aaa
  • 1 match: aaaa
  • 1 match: aaaaa

Here’s an example that matches an uppercase six-character hex colour code.

/#[0-9A-F]{6}/g
  • 1 match: #AE25AE
  • 1 match: #663399
  • 1 match: How about #73FA79?
  • 1 match: Part of #73FA79BAC too
  • 0 matches: #FFF
  • 0 matches: #a2ca2c

Here, the token {6} applies to the character class [0-9A-F].
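
The same checks can be sketched in JavaScript. The names hexColour and spelledOut are ours; the second regex writes the character class out six times to show that it behaves the same way as {6}.

```javascript
const hexColour = /#[0-9A-F]{6}/g;

console.log("How about #73FA79?".match(hexColour));     // [ '#73FA79' ]
console.log("Part of #73FA79BAC too".match(hexColour)); // [ '#73FA79' ]
console.log("#FFF".match(hexColour));                   // null (too short)
console.log("#a2ca2c".match(hexColour));                // null (lowercase)

// {6} behaves like writing the token out six times.
const spelledOut = /#[0-9A-F][0-9A-F][0-9A-F][0-9A-F][0-9A-F][0-9A-F]/g;
console.log("#663399".match(spelledOut));               // [ '#663399' ]
```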

Between min and max times

If we wish to match a particular token between min and max (inclusive) times, we can suffix it with {min,max}.

/a{2,4}/g
  • 0 matches: (empty string)
  • 0 matches: a
  • 1 match: aa
  • 1 match: aaa
  • 1 match: aaaa
  • 1 match: aaaaa

Warning

There must be no space after the comma in {min,max}.
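
To see why the space matters, here is a small sketch of what happens in JavaScript for a regex without the u flag. Other flavours or modes may behave differently, for instance by reporting an error, so treat this as an illustration rather than a rule. The names twoToFour and withSpace are our own.

```javascript
const twoToFour = /a{2,4}/g;

console.log("a".match(twoToFour));     // null
console.log("aaa".match(twoToFour));   // [ 'aaa' ]
console.log("aaaaa".match(twoToFour)); // [ 'aaaa' ]

// With a space after the comma, the braces stop being a repetition operator
// and are matched as literal characters instead.
const withSpace = /a{2, 4}/;
console.log(withSpace.test("aaa"));     // false
console.log(withSpace.test("a{2, 4}")); // true
```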

At least x times

If we wish to match a particular token at least x times, we can suffix it with {x,}. Think of it as {min,max}, but without an upper bound.

/a{2,}/g
  • 0 matches: (empty string)
  • 0 matches: a
  • 1 match: aa
  • 1 match: aaa
  • 1 match: aaaa
  • 1 match: aaaaa
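
The missing upper bound shows up once the input is longer than the old maximum. A quick JavaScript comparison, offered as a sketch:

```javascript
// {2,4} stops after four, {2,} keeps going.
console.log("aaaaa".match(/a{2,4}/g)); // [ 'aaaa' ]
console.log("aaaaa".match(/a{2,}/g));  // [ 'aaaaa' ]

// Both still need at least two.
console.log("a".match(/a{2,}/g));      // null
```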

A note on greediness

Repetition operators, by default, are greedy. They attempt to match as much as possible.

/a*/g
  • 2 matches: aaa
  • 2 matches: aaaa
  • 2 matches: aaaaa

The * operator can match zero or more instances of a. In each of the example strings, it could just as well have matched zero occurrences of a, yet it matches as many as it can.


/a{2,4}/g
  • 1 match: aaaaa
  • 1 match: aaa
  • 1 match: aaaa

We can make a repetition operator (*, +, ?, …) “lazy” by suffixing it with a ?. A lazy operator matches as little as possible.

/a{2,4}?/g
  • 2 matches: aaaaa
  • 1 match: aaa
  • 2 matches: aaaa

Warning

The suffix ? here is different from the repetition operator ?.
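
The match counts above hide what each match actually contains. This JavaScript sketch prints the matched substrings side by side (the names greedy and lazy are ours):

```javascript
const greedy = /a{2,4}/g;
const lazy = /a{2,4}?/g;

for (const text of ["aaa", "aaaa", "aaaaa"]) {
  console.log(text, "greedy:", text.match(greedy), "lazy:", text.match(lazy));
}
// aaa:   greedy takes "aaa",  lazy takes "aa" (the leftover "a" is too short)
// aaaa:  greedy takes "aaaa", lazy takes "aa" twice
// aaaaa: greedy takes "aaaa", lazy takes "aa" twice
```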


/".*"/g
  • 1 match: "quote"
  • 1 match: "quote", "quote"
  • 1 match: "quote"quote"

/".*?"/g
  • 1 match: "quote"
  • 2 matches: "quote", "quote"
  • 1 match: "quote"quote"

Now, with the lazy suffix, .* matches no more than the minimum necessary to allow a match. [1]

[…] Lazy will stop as soon as the condition is satisfied, but greedy means it will stop only once the condition is not satisfied any more.

Andrew S on Stack Overflow

/<.+>/g
  • 1 match: <em>g r e e d y</em>

/<.+?>/g
  • 2 matches: <em>lazy</em>
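
Here is a JavaScript sketch of the quote and tag examples, including the negated character class mentioned in the footnote. The variable name quoted is just for illustration.

```javascript
const quoted = '"quote", "quote"';

// Greedy: runs from the first quote mark to the last one.
console.log(quoted.match(/".*"/g));    // one match spanning the whole string
// Lazy: stops at the nearest closing quote mark.
console.log(quoted.match(/".*?"/g));   // [ '"quote"', '"quote"' ]
// Negated class: cannot cross a quote mark, so no laziness is needed.
console.log(quoted.match(/"[^"]*"/g)); // [ '"quote"', '"quote"' ]

// The same idea with tags.
console.log("<em>g r e e d y</em>".match(/<.+>/g)); // one match: the whole string
console.log("<em>lazy</em>".match(/<.+?>/g));       // [ '<em>', '</em>' ]
```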

Examples

Bitcoin address

/([13][a-km-zA-HJ-NP-Z0-9]{26,33})/g
  • 1 match: 3Nxwenay9Z8Lc9JBiywExpnEFiLp6Afp8v
  • 1 match: 1HQ3Go3ggs8pFnXuHVHRytPCq5fGG8Hbhx
  • 1 match: 2016-03-09,18f1yugoAJuXcHAbsuRVLQC9TezJ
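
As a sketch of how this regex could pull an address out of a longer line in JavaScript (the variable names are ours): the pattern only checks the shape of the text, a leading 1 or 3 followed by 26 to 33 characters from the class, so a match says nothing about whether the address is valid.

```javascript
const bitcoinAddress = /([13][a-km-zA-HJ-NP-Z0-9]{26,33})/g;
const line = "2016-03-09,18f1yugoAJuXcHAbsuRVLQC9TezJ";

// The 1 and 3 inside the date are skipped: neither is followed
// by at least 26 characters from the class.
console.log(line.match(bitcoinAddress)); // [ '18f1yugoAJuXcHAbsuRVLQC9TezJ' ]
```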

YouTube Video

/(?:https?:\/\/)?(?:www\.)?youtube\.com\/watch\?.*?v=([^&\s]+).*/gm
  • 1 match: youtube.com/watch?feature=sth&v=dQw4w9WgXcQ
  • 1 match: https://www.youtube.com/watch?v=dQw4w9WgXcQ
  • 1 match: www.youtube.com/watch?v=dQw4w9WgXcQ
  • 1 match: youtube.com/watch?v=dQw4w9WgXcQ
  • 1 match: fakeyoutube.com/watch?v=dQw4w9WgXcQ

We can adjust this to not match the last broken link using anchors, which we shall encounter soon.
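
The capturing group in this regex holds the video id. A minimal JavaScript sketch of reading it with String.prototype.matchAll follows; the variable names are our own, and the input is three of the example links joined into one multi-line string.

```javascript
const youtube = /(?:https?:\/\/)?(?:www\.)?youtube\.com\/watch\?.*?v=([^&\s]+).*/gm;

const links = [
  "youtube.com/watch?feature=sth&v=dQw4w9WgXcQ",
  "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
  "fakeyoutube.com/watch?v=dQw4w9WgXcQ",
].join("\n");

// matchAll yields one result per match; index 1 is the first capturing group.
const ids = [...links.matchAll(youtube)].map((match) => match[1]);
console.log(ids); // three ids, all "dQw4w9WgXcQ" (the fake link matches too)
```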


  1. In this example, we could also achieve this using /"[^"]*"/g (as is best practice).