Groups
Groups, as the name suggests, are meant to be used to “group” components of regular expressions. These groups can be used to:
- Extract subsets of matches
- Repeat groups an arbitrary number of times
- Refer to previously matched substrings
- Enhance readability
- Allow complex alternations
We’ll see how to do much of this in later chapters. Learning how groups work now will let us study some great examples there.
Capturing groups
Capturing groups are denoted by ( … ). Here’s an expository example:
Capturing groups allow extracting parts of matches.
Using your language’s regex functions, you would be able to extract the text between the matched braces for each of these strings.
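As a minimal sketch of what such extraction might look like, the snippet below uses Python’s built-in re module. The date-like pattern and the sample string are illustrative assumptions, not the chapter’s own example.

```python
import re

# Hypothetical pattern: three capturing groups for year, month and day.
pattern = re.compile(r"(\d{4})-(\d{2})-(\d{2})")

match = pattern.search("Released on 2020-07-15.")
if match:
    print(match.group(0))  # the whole match: 2020-07-15
    print(match.group(1))  # the first group: 2020
    print(match.groups())  # all groups: ('2020', '07', '15')
```

Group 0 is always the entire match; the numbered groups follow the order of their opening parentheses.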
Capturing groups can also be used to group regex parts for ease of repetition of said group. While we will cover repetition in detail in chapters that follow, here’s an example that demonstrates the utility of groups.
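One way to see the effect of grouping on repetition is the following sketch in Python. The patterns are assumptions chosen for illustration: with the group, the quantifier applies to the whole of "ha"; without it, only to the preceding character.

```python
import re

# Hypothetical pattern: the group (ha) repeated one or more times.
laugh = re.compile(r"^(ha)+$")
print(bool(laugh.match("hahaha")))  # True
print(bool(laugh.match("haa")))     # False

# Without the group, + binds only to the "a" before it.
no_group = re.compile(r"^ha+$")
print(bool(no_group.match("haaa")))    # True
print(bool(no_group.match("hahaha")))  # False
```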
Other times, they are used to group logically similar parts of the regex for readability.
Backreferences
Backreferences allow referring to previously captured substrings.
The match from the first group would be \1, that from the second would be \2, and so on…
Backreferences cannot be used to reduce duplication in regexes. They refer to the text that a group matched, not to the group’s pattern.
Here’s an example that demonstrates a common use-case:
This cannot be achieved with repeated character classes.
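The difference can be sketched in Python as follows. Both patterns are hypothetical and only meant to contrast a backreference with a repeated character class: \1 must repeat exactly what group 1 captured, whereas a second character class matches independently.

```python
import re

# \1 must repeat the exact text captured by group 1.
with_backref = re.compile(r"^([abc])=\1$")
# Two independent character classes: each may match a different letter.
with_classes = re.compile(r"^[abc]=[abc]$")

for s in ["a=a", "b=b", "a=b"]:
    print(s, bool(with_backref.match(s)), bool(with_classes.match(s)))
# a=a True True
# b=b True True
# a=b False True
```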
Non-capturing groups
Non-capturing groups are very similar to capturing groups, except that they don’t create “captures”. They take the form (?: … ).
Non-capturing groups are usually used in conjunction with capturing groups. Perhaps you are attempting to extract some parts of the matches using capturing groups. You may wish to use a group without messing up the order of the captures. This is where non-capturing groups come in handy.
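A small Python sketch of how a non-capturing group leaves capture numbering untouched; the URL-like patterns and the sample text are assumptions for illustration.

```python
import re

text = "https://example.com"

# With a capturing group around the scheme, the host becomes group 2.
capturing = re.match(r"(https|http)://(\S+)", text)
# (?:...) still groups the alternation but creates no capture,
# so the host is group 1.
non_capturing = re.match(r"(?:https|http)://(\S+)", text)

print(capturing.groups())      # ('https', 'example.com')
print(non_capturing.groups())  # ('example.com',)
```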
Examples
Query String Parameters
We match the first key-value pair separately because that allows us to use &, the separator, as part of the repeating group.
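The idea can be sketched in Python as below. The pattern is an assumed reconstruction of the approach described, not necessarily the exact regex this example was built on: the first pair is matched on its own, and every later pair carries its leading & inside a repeated non-capturing group.

```python
import re

# Assumed pattern: first pair, then zero or more "&key=value" pairs.
query = re.compile(r"^\?(\w+)=(\w+)(?:&(\w+)=(\w+))*$")

print(bool(query.match("?id=5")))          # True
print(bool(query.match("?id=5&lang=en")))  # True
print(bool(query.match("?id=5&")))         # False: dangling separator
print(bool(query.match("?&id=5")))         # False: separator before first pair
```

Because the separator lives inside the repeated group, a trailing or leading & is rejected without any special casing.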
(Basic) HTML tags
As a rule of thumb, do not use regex to match XML/HTML. [1][2][3][4]
However, it’s a relevant example:
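A deliberately basic sketch in Python, assuming a simplified pattern of my own choosing: the backreference \1 requires the closing tag name to equal the opening one. It does not handle nesting, comments, or other real-world HTML.

```python
import re

# Assumed pattern: \1 forces the closing tag to repeat the opening tag name.
tag = re.compile(r"<(\w+)[^>]*>(.*?)</\1>")

m = tag.search('<a href="/home">Home</a>')
print(m.group(1), m.group(2))  # a Home

# Mismatched opening and closing tags do not match.
print(tag.search("<b>bold</i>"))  # None
```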
Names
Find: \b(\w+) (\w+)\b
Replace: $2, $1 [5]
Before
John Doe
Jane Doe
Sven Svensson
Janez Novak
Janez Kranjski
Tim Joe
After
Doe, John
Doe, Jane
Svensson, Sven
Novak, Janez
Kranjski, Janez
Joe, Tim
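The same find-and-replace can be sketched with Python’s re.sub. Note that Python writes group references in the replacement as \1 and \2 (or \g<1>) rather than $1 and $2; the names list is a subset of the sample data above.

```python
import re

names = ["John Doe", "Sven Svensson", "Janez Novak"]

# Swap the two captured words and insert a comma between them.
swapped = [re.sub(r"\b(\w+) (\w+)\b", r"\2, \1", n) for n in names]
print(swapped)  # ['Doe, John', 'Svensson, Sven', 'Novak, Janez']
```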
Backreferences and plurals
Find: \bword(s?)\b
Replace: phrase$1 [5]
Before
This is a paragraph with some words.
Some instances of the word "word" are in their plural form: "words".
Yet, some are in their singular form: "word".
After
This is a paragraph with some phrases.
Some instances of the phrase "phrase" are in their plural form: "phrases".
Yet, some are in their singular form: "phrase".
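A minimal Python sketch of this replacement, using one line of the sample text. The optional group (s?) captures either "s" or an empty string, and \1 in the replacement carries that over.

```python
import re

text = 'Some instances of the word "word" are in their plural form: "words".'

# \1 re-inserts the captured "s" (or nothing) after "phrase".
result = re.sub(r"\bword(s?)\b", r"phrase\1", text)
print(result)
# Some instances of the phrase "phrase" are in their plural form: "phrases".
```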
1. https://stackoverflow.com/a/590789
2. https://stackoverflow.com/a/6751339
3. https://blog.codinghorror.com/parsing-html-the-cthulhu-way/
4. https://web.archive.org/web/20071018202901/http://oubliette.alpha-geek.com/2004/01/12/bring_me_your_regexs_i_will_create_html_to_break_them
5. In replacement contexts, $1, $2, … are usually used in place of \1, \2, … to refer to captured strings.