Regular Expressions

Overview

Regular expressions were developed originally to define members of one particular class of "formal languages". They are most frequently used to form "patterns" that describe specific strings or categories of strings. Such patterns, which are themselves strings, can be used to test or manipulate other strings in various ways. The three basic operations in which regular expressions are used are:

matching (Does this (entire) string match this pattern?)
searching (Is this pattern found within this string?)
transforming (such as replacing one or all occurrences of a pattern with another string)

Though there are some differences from one programming language to another, particularly in the built-in functions that are used to perform the various regular expression operations, the basic ideas underlying regular expressions are mostly the same across all programming languages.

Some situations for which regular expressions might be found useful include:

Data validation: Is this data "well-formed"?
Example: Is this sequence of characters a valid postal code?
Decision making: To what set does this string belong?
Example: Does this file name represent a pdf document?
Parsing input: What is in this data?
Example: This is a date, but what is the year?
Transforming input: Can we format, or re-format, these strings in some useful way?
Example: Can we replace the periods in these telephone numbers with dashes?
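
As a rough illustration of these four situations, here is a minimal C++11 sketch that uses the standard <regex> library (introduced near the end of this page). The function names, the "letter digit letter, space, digit letter digit" postal code format and the YYYY-MM-DD date format are assumptions made for the example only.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Data validation: does the whole string have the shape "A1A 1A1"?
// (An assumed postal code format, chosen only for illustration.)
bool looks_like_postal_code(const std::string& s) {
    static const std::regex pattern(R"([A-Z]\d[A-Z] \d[A-Z]\d)");
    return std::regex_match(s, pattern);
}

// Decision making: does the file name end in ".pdf" (in any letter case)?
bool is_pdf_name(const std::string& name) {
    static const std::regex pattern(R"(\.pdf$)",
                                    std::regex::ECMAScript | std::regex::icase);
    return std::regex_search(name, pattern);
}

// Parsing input: pull the year out of a date written as YYYY-MM-DD.
// Returns an empty string if the input does not have that shape.
std::string year_of(const std::string& date) {
    static const std::regex pattern(R"((\d{4})-\d{2}-\d{2})");
    std::smatch m;
    return std::regex_match(date, m, pattern) ? m[1].str() : std::string();
}

// Transforming input: replace the periods in a telephone number with dashes.
std::string dots_to_dashes(const std::string& phone) {
    static const std::regex dot(R"(\.)");
    return std::regex_replace(phone, dot, "-");
}

// Checks for the four helpers above; call this from main() to run them.
void check_use_cases() {
    assert(looks_like_postal_code("B3H 3C3"));
    assert(!looks_like_postal_code("B3H-3C3"));
    assert(is_pdf_name("Report.PDF"));
    assert(!is_pdf_name("report.pdf.zip"));
    assert(year_of("2014-07-23") == "2014");
    assert(dots_to_dashes("555.302.4321") == "555-302-4321");
}
```

Note that validation uses regex_match() (the whole string must fit), while the file-name test uses regex_search() with a $ anchor (only the ending matters).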

This page is not a regular expression tutorial, but it does give a summary of most of what you need to know, and suggests a number of exercises that you might like to try with an online regex tool like regexpal. Note, however, that regexpal is a JavaScript tool, so you cannot use it to test any regular expression syntax that is not supported by that language (POSIX bracket expressions, for example).

Advice and Notes

Keep referring back to these items as you read through the rest of this page. Some may not apply or be relevant until you have read and absorbed something that comes later.

Always prefer the use of regular expressions wherever possible over hand-coding a solution from scratch.
Try to ensure that your regular expressions find not only what you want, but only what you want. In this context, be aware that there are several "flavors" of regular expression engine, and there may be subtle differences between them.
Regular expressions are case-sensitive by default.
Regular expressions are eager (the earliest match is preferred). For example, the regex (get|getValue|set|setValue) will match the set in setValue and not setValue itself when tested against setValue.
Regular expressions are also greedy by default, which essentially means that a "quantified repetition" part of a regex will try to match as much as possible before turning it over to the next part of the expression. However, it still "defers" to the need to get an overall match. For example, if our regex is .+\.jpg, the first part (.+) will match all of filename.jpg because .+ is greedy, but then it will "give back" .jpg so that we get an overall match. Note, however, that as little as possible is "given back". For example, if our regex is .*[0-9]+ and our string is Page 266, then the .* matches Page 266, but the final 6 is "given back" to get an overall match. So the end result is that .* matches Page 26 and the [0-9]+ matches only the final 6.
On the other hand, if a quantified expression is made "lazy" by appending a ? to it, then it tries to match as little as possible before turning things over to the next part of the expression. For example, if our regex is .*?[0-9]+, then .*? matches as little as possible before turning it over to [0-9]+. The result is that .*? matches "Page " and [0-9]+ matches the 266.
But be careful with "laziness", because if everything is optional, then "nothing" is a match. For example, if .*?[0-9]*? is our regex and Page 266 our string, then both parts of the regex succeed by matching nothing, so the overall match turns out to be nothing.
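
The eager, greedy and lazy behaviors just described can be checked with a short C++11 sketch like the one below. It assumes the default ECMAScript grammar of std::regex; the helper name first_match and the extra parentheses added around each part of the patterns (so that each part can be inspected separately) are choices made for this example.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Returns the text of capture group `group` from the first match of `pattern`
// in `text` (group 0 is the whole match), or "<no match>" if nothing matches.
std::string first_match(const std::string& pattern, const std::string& text,
                        std::size_t group = 0) {
    std::smatch m;
    if (!std::regex_search(text, m, std::regex(pattern))) return "<no match>";
    return m[group].str();
}

// Checks mirroring the examples in the text; call from main() to run them.
void check_eager_greedy_lazy() {
    // Eager: the first alternative that works wins, so "set" beats "setValue".
    assert(first_match("(get|getValue|set|setValue)", "setValue") == "set");

    // Greedy: .* takes everything, then gives back just one digit.
    assert(first_match("(.*)([0-9]+)", "Page 266", 1) == "Page 26");
    assert(first_match("(.*)([0-9]+)", "Page 266", 2) == "6");

    // Lazy: .*? takes as little as it can, leaving all the digits.
    assert(first_match("(.*?)([0-9]+)", "Page 266", 1) == "Page ");
    assert(first_match("(.*?)([0-9]+)", "Page 266", 2) == "266");

    // Everything optional and lazy: the overall match is the empty string.
    assert(first_match(".*?[0-9]*?", "Page 266") == "");
}
```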
Do not escape ordinary characters, and remember that in regular expressions the double quote is just an ordinary character.
The order of characters in a character class does not matter.
Metacharacters inside character classes are already "escaped", except for these: ] - ^ \. However, it also doesn't hurt if you do escape, inside a character set, a metacharacter that doesn't need escaping. Note that you can also use predefined character classes like \w inside a square-bracketed character class.
The underscore (_) is a "word character", but the hyphen (-) is not.
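POSIX bracket expressions are not supported in JavaScript, .NET or Python; Java supports the same classes only through its own \p{...} syntax (for example, \p{Alpha}).
A POSIX bracket expression must go inside a character class, as in [[:alpha:]] or [[:digit:]_].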
Grouping with parentheses can be used for the following:
- Clarification
- Applying a repetition operator
- Capturing part of an expression for matching or replacing
When constructing a complex regular expression, it is often a good technique to put conceptually distinct parts on separate lines and then, when you are happy with the overall construct, join the lines.
In an alternation, either left or right can match (of course), but left gets precedence.
All anchors refer to a position, not an actual character, and they have zero width. The ^ and $ symbols are virtually universal start and end anchors (respectively), but \A and \Z are also recognized in Java, .NET, Perl, Python and Ruby.
In single-line mode note that:
- ^ and $ do not match at line breaks.
- Also \A and \Z (if available) do not match at line breaks.
- Many (older?) Unix tools support only single-line mode.
In multiline mode note that:
- ^ and $ match at the start and end (respectively) of lines.
- But \A and \Z (if available) still do not match at line breaks.
- Many programming languages support multiline mode.
Note that a "word boundary" is not an actual character (and in particular it is not a space), but a position that occurs in one of these places:
- Before the first word character in a string
- After the last word character in a string
- Between a word character and a non-word character
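
To see that a word boundary is a position rather than a character, a small C++11 sketch such as the following may help. The helper name count_matches and the sample strings are made up for the illustration; the patterns use the default ECMAScript grammar of std::regex.

```cpp
#include <cassert>
#include <iterator>
#include <regex>
#include <string>

// Counts the non-overlapping matches of `pattern` in `text`.
int count_matches(const std::string& pattern, const std::string& text) {
    const std::regex re(pattern);
    return static_cast<int>(
        std::distance(std::sregex_iterator(text.begin(), text.end(), re),
                      std::sregex_iterator()));
}

// Checks for the word-boundary rules above; call from main() to run them.
void check_word_boundaries() {
    const std::string text = "cat concat cat, category";
    assert(count_matches("cat", text) == 4);            // every occurrence
    assert(count_matches(R"(\bcat\b)", text) == 2);     // whole words only

    // The underscore is a word character, but the hyphen is not.
    assert(count_matches(R"(\b\w+\b)", "abc_123") == 1);
    assert(count_matches(R"(\b\w+\b)", "top-notch") == 2);

    // \b consumes nothing, so it cannot stand in for the space between words.
    assert(count_matches(R"(apples\band\boranges)", "apples and oranges") == 0);
    assert(count_matches(R"(apples\b \band\b \boranges)", "apples and oranges") == 1);
}
```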
Backreferences to optional expressions can be very subtle, and how they behave differs from one regex engine to another:
- Element A is optional, but the group/capture is not optional:
  (A?)B matches B and "captures nothing"
- Element A is not optional, but the group/capture is optional:
  (A)?B matches B but "does not capture anything"
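
The difference between "captures nothing" and "does not capture anything" can be observed directly in C++11, where each sub-match carries a matched flag. The sketch below assumes std::regex with its default ECMAScript grammar; the helper group1_state and its return strings are invented for this example, and other engines may report the two cases differently.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Describes what capture group 1 holds after matching `pattern` against the
// whole of `text`: "no match", "unset" (the group took no part in the match),
// or the captured text in square brackets (possibly empty).
std::string group1_state(const std::string& pattern, const std::string& text) {
    std::smatch m;
    if (!std::regex_match(text, m, std::regex(pattern))) return "no match";
    if (!m[1].matched) return "unset";
    return "[" + m[1].str() + "]";
}

// Checks for the two optional-group cases; call from main() to run them.
void check_optional_groups() {
    assert(group1_state("(A?)B", "AB") == "[A]");
    assert(group1_state("(A?)B", "B") == "[]");      // group matched the empty string
    assert(group1_state("(A)?B", "AB") == "[A]");
    assert(group1_state("(A)?B", "B") == "unset");   // group was skipped entirely
}
```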
Like anchors and word boundaries, lookaround assertions are also zero-width.
Metacharacters inside character sets do not need to be escaped, except for these four: ] - ^ \
Negative lookahead expressions give us a way to match something that should be rejected.
Some principles for constructing better regexes:
- Define as precisely as you can the quantity of any repeated expression. For example, .+ is faster than .* and .{5} or .{3,7} are even faster.
- Narrow the scope of any repeated expression as much as possible. For example, [A-Za-z]+ is better than .+.
- Provide clearer starting and ending points. For example, <[^>]+> is better than <.+>.
- Put the simplest expression first in any alternation. For example, \w+_\d{2,4}|\d{4}_\d{2}_\w+|export\d{2} is not as good as export\d{2}|\d{4}_\d{2}_\w+|\w+_\d{2,4}.
- If you need to nest regular expressions, be especially careful. Remember that with regular expressions we are always struggling with the tradeoffs between precision, readability and efficiency.
- Word boundaries can be used to improve regex efficiency.
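
The "clearer starting and ending points" principle affects what is matched, not just how fast. The following C++11 sketch (default ECMAScript grammar; the helper leftmost_match and the sample text are assumptions for the example) compares <.+> with <[^>]+> on the same input.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Returns the first (leftmost) match of `pattern` in `text`, or "" if none.
std::string leftmost_match(const std::string& pattern, const std::string& text) {
    std::smatch m;
    return std::regex_search(text, m, std::regex(pattern)) ? m.str() : std::string();
}

// Checks comparing a broad and a narrow repetition; call from main() to run.
void check_scope_of_repetition() {
    const std::string html = "<b>bold</b> and <i>italic</i>";

    // Greedy .+ runs to the end of the string and backs up to the last '>'.
    assert(leftmost_match("<.+>", html) == html);

    // [^>]+ cannot cross a '>', so the match stops at the end of the first tag.
    assert(leftmost_match("<[^>]+>", html) == "<b>");

    // A lazy quantifier gives the same result here, but does more backtracking
    // in general than the negated character class.
    assert(leftmost_match("<.+?>", html) == "<b>");
}
```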

Regular Expression Details

Metacharacters: \ | ( ) [ ] { } ^ $ * + ? . - : ! =
These are characters that have a special meaning within the context of regular expressions, and which do not "match" themselves in a "pattern matching" operation unless they are "escaped". A character (including the backslash character itself) is "escaped" by placing a backslash (\) in front of it. Here's a brief indication of what each metacharacter, or each metacharacter pair, is used for ...
\ For escaping other characters or itself
| Alternation (the "or" character)
( ) For enclosure, just to achieve clarity, but also for "capturing" subexpressions
(?: ) For enclosing a non-capturing group. The (?: turns off capturing and backreferences for the group, for efficiency and to preserve space for other captures, for example. Think of it this way: The ? says "give this group a different meaning", while the : says that the meaning is that "the group is non-capturing".
(?= ) For enclosing a positive lookahead assertion
(?! ) For enclosing a negative lookahead assertion
(?<= ) For enclosing a positive lookbehind assertion (less widely supported than lookahead: JavaScript gained it only with ECMAScript 2018, and engines that do support it often accept only simple expressions, such as those of fixed length)
(?<! ) For enclosing a negative lookbehind assertion (less widely supported than lookahead: JavaScript gained it only with ECMAScript 2018, and engines that do support it often accept only simple expressions, such as those of fixed length)
[ ] For delimiting a character class
{ } For delimiting a numerical range
^ and $ For marking the beginning (^) or end ($) of a string/line
\A and \Z Also for marking the beginning (\A) and end (\Z) of a string, but never of a line (and much less widely supported than ^ and $)
^ For negating a character class
* + ? For repetition: 0 or more (*), one or more (+), and 0 or 1 (?)
? Appended to a quantifier, makes it "lazy" instead of "greedy": *?, +?, ?? and {min,max}? match as little as possible, whereas the same quantifiers without the (second) ? are greedy by default
. For any character except the newline character
- For indicating a range in a character class
Ordinary Characters (which match themselves): The letters (both uppercase and lowercase), and the digits:
A B C ... Z a b c ... z 0 1 2 ... 9
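The non-metacharacter punctuation characters (! - : and = also appear in the metacharacter list above, but they are special only inside particular constructs, such as a range in a character class or a (?: ), (?! ) or (?= ) group, and otherwise match themselves):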
! " # % & ' , - / : ; < = > @ _ ` ~
The blank space character
Character Classes (and their "negations"):
[abcd] Any one of the lowercase letters a, b, c or d
[^abcd] Any character except one of the lowercase letters a, b, c or d
[246] Any one of the digits 2, 4 or 6
[^246] Any character except one of the digits 2, 4 or 6
[a-z] Any lowercase letter
[^a-z] Any character except a lowercase letter
[3-7] Any digit from 3 to 7 inclusive
[^3-7] Any character that is not a digit from the range 3 to 7 inclusive
Note that in character classes the dash (-) is effectively a metacharacter unless it appears as the first or last character in the class (where it cannot be indicating a range), and the caret (^) is a metacharacter only when it appears as the first character. And be reminded (again) that predefined character classes like those shown in the following section may be placed within a square-bracketed character class.
Predefined Character Classes:
\d A digit (same as [0-9])
\D Not a digit (same as [^0-9])
\w A letter, digit or underscore (same as [a-zA-Z0-9_])
\W Not a letter, digit or underscore (same as [^a-zA-Z0-9_])
\s A whitespace character (at least [ \t\r\n]; most engines also include form feed \f and vertical tab \v)
\S Not a whitespace character (the negation of \s)
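
The predefined classes can be tried out with a few whole-string tests. This is a minimal C++11 sketch using std::regex_match() with the default ECMAScript grammar; the helper name whole_string_matches and the sample strings are made up for the example.

```cpp
#include <cassert>
#include <regex>
#include <string>

// True if `pattern` matches the entire `text`.
bool whole_string_matches(const std::string& pattern, const std::string& text) {
    return std::regex_match(text, std::regex(pattern));
}

// Checks for \d, \D, \w, \s and \S; call from main() to run them.
void check_predefined_classes() {
    assert(whole_string_matches(R"(\d\d\d\d)", "1984"));
    assert(!whole_string_matches(R"(\d\d\d\d)", "text"));

    assert(whole_string_matches(R"(\D+)", "abc"));
    assert(!whole_string_matches(R"(\D+)", "a1c"));

    // The underscore counts as a word character; the hyphen does not.
    assert(whole_string_matches(R"(\w+)", "snake_case"));
    assert(!whole_string_matches(R"(\w+)", "top-notch"));

    // \s matches the single space between two runs of non-whitespace.
    assert(whole_string_matches(R"(\S+\s\S+)", "two words"));
    assert(!whole_string_matches(R"(\S+)", "two words"));
}
```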
POSIX Bracket Expressions: Note: POSIX = Portable Operating System Interface for Unix
[:alpha:] Same as [a-zA-Z]
[:digit:] Same as [0-9]
[:alnum:] Same as [a-zA-Z0-9]
[:lower:] Same as [a-z]
[:upper:] Same as [A-Z]
[:xdigit:] A hexadecimal character (same as [a-fA-F0-9])
[:punct:] A printable character that is not a space, digit or letter
[:space:] Same as \s
[:blank:] Same as a space or a tab
[:print:] A printable character, including the space character
[:graph:] A printable non-whitespace character
[:cntrl:] A (non-printable) control character
Alternations: a|b|c
Quantifiers (appended to a pattern):
* (zero or more)
+ (one or more)
? (zero or one)
{n} (exactly n)
{m,n} (any number from m to n, inclusive, assuming m <= n)
{n,} (at least n) (Examples: \d{1,} is same as \d+, and \d{0,} is same as \d*)
{0,n} (at most n) (Example: \d{0,1} is same as \d?)
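
A quick way to get a feel for the counted quantifiers is to test them against whole strings of different lengths. The C++11 sketch below does this with std::regex_match(); the helper name matches_fully is an assumption for the example.

```cpp
#include <cassert>
#include <regex>
#include <string>

// True if `pattern` matches the entire `text`.
bool matches_fully(const std::string& pattern, const std::string& text) {
    return std::regex_match(text, std::regex(pattern));
}

// Checks for ?, {n}, {m,n}, {n,} and {0,n}; call from main() to run them.
void check_quantifiers() {
    // ? : zero or one
    assert(matches_fully("colou?r", "color"));
    assert(matches_fully("colou?r", "colour"));

    // {n} : exactly n
    assert(matches_fully(R"(\d{3})", "123"));
    assert(!matches_fully(R"(\d{3})", "1234"));

    // {m,n} : from m to n, inclusive
    assert(!matches_fully(R"(\d{2,4})", "1"));
    assert(matches_fully(R"(\d{2,4})", "12"));
    assert(matches_fully(R"(\d{2,4})", "1234"));
    assert(!matches_fully(R"(\d{2,4})", "12345"));

    // {n,} : at least n
    assert(!matches_fully(R"(\d{2,})", "1"));
    assert(matches_fully(R"(\d{2,})", "1234567"));

    // {0,n} : at most n (the empty string qualifies)
    assert(matches_fully(R"(\d{0,2})", ""));
    assert(!matches_fully(R"(\d{0,2})", "123"));
}
```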
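Anchors:
^pattern matches only at the beginning of the string
pattern$ matches only at the end of the string
Note that in most current flavors (including ECMAScript, Perl, Python, Java and .NET) ^ and $ act as anchors wherever they appear outside a character class, so a literal caret or dollar sign must be escaped as \^ or \$. Only some older flavors, such as POSIX basic regular expressions, treat a ^ that is not at the beginning of the pattern, or a $ that is not at the end, as an "ordinary" character that matches itself.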
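
Anchors are easiest to understand in a search, where the pattern would otherwise be free to match anywhere. The following C++11 sketch uses std::regex_search() on single-line strings only; the helper name found_in and the sample strings are invented for the example.

```cpp
#include <cassert>
#include <regex>
#include <string>

// True if `pattern` matches somewhere in `text`.
bool found_in(const std::string& pattern, const std::string& text) {
    return std::regex_search(text, std::regex(pattern));
}

// Checks for the ^ and $ anchors; call from main() to run them.
void check_anchors() {
    assert(found_in("cat", "a cat nap"));      // unanchored: anywhere will do
    assert(found_in("^cat", "cat nap"));       // must start the string
    assert(!found_in("^cat", "a cat nap"));
    assert(found_in("nap$", "cat nap"));       // must end the string
    assert(!found_in("cat$", "cat nap"));
    assert(found_in("^cat nap$", "cat nap"));  // both: the whole string

    // An escaped caret or dollar sign always matches the literal character.
    assert(found_in(R"(a\^b)", "a^b"));
    assert(found_in(R"(\$5)", "costs $5"));
}
```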
The named "boundary" patterns (\b and \B): \b matches the boundary (i.e., the position, not a character) between a word character (\w) and a non-word character (\W), while \B matches a "non-boundary".
Pattern Modifiers: In general, any regular expression engine will provide some way to modify patterns in various ways, such as finding all matches or only the first match, or ignoring case during a match or search. However, this is one of the things that may differ quite radically as you move from one programming language to another.
Parentheses: Placing parentheses around part of a regular expression, as in (pattern), does not change whether pattern is matched or not, but it causes the match (if it occurs) to be "remembered". Then, later on, \1, \2, \3, ... can be used for access to the first, second, third, ... such "remembered" matches within the regular expression itself. Some languages use $1, $2, $3, ... to give access to these remembered matches outside the regular expression: in Perl they are variables, while in C++11 they appear in the replacement format string passed to regex_replace(). However, these names may come to hold other values (in Perl, for example, they are overwritten by each later successful match), so if you want to use them it is important to be aware of what language you're using and the context in which you are using them, so that you know exactly what they contain on any particular occasion.
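
Both uses of "remembered" matches can be sketched in C++11: \1 inside the pattern itself, and $1, $2 in the format string given to regex_replace(). The function names and the "Last, First" name format below are assumptions made for this example.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Returns the first word that appears twice in a row, or "" if there is none.
// Inside the pattern, \1 must match the same text that group 1 captured.
std::string first_repeated_word(const std::string& text) {
    static const std::regex re(R"(\b(\w+)\s+\1\b)");
    std::smatch m;
    return std::regex_search(text, m, re) ? m[1].str() : std::string();
}

// Rewrites "Last, First" as "First Last". Outside the pattern, the captured
// groups are referred to as $1 and $2 in the replacement format string.
std::string first_name_first(const std::string& name) {
    static const std::regex re(R"((\w+), (\w+))");
    return std::regex_replace(name, re, "$2 $1");
}

// Checks for the two helpers above; call from main() to run them.
void check_captures() {
    assert(first_repeated_word("Paris in the the spring.") == "the");
    assert(first_repeated_word("Paris in the spring.") == "");
    assert(first_name_first("Smith, Steve") == "Steve Smith");
}
```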

Exercises for Regular Expression Familiarization via The JavaScript Tool regexpal

Familiarity with regular expressions can only come with practice, and getting really comfortable can take a great deal of practice. Fortunately there are some tools to help us. The easiest way to try the following exercises is to use a program like regexpal, where you can try examples like those given in the table below. When trying examples like these, or others you may have made up, it's usually a good idea to enter the data first, then the regular expression. This gives you a chance (often) to watch as various things are matched "along the way" until you have entered the full regular expression that you wish to test. This in itself can sometimes reveal subtleties or be a good learning experience in other ways.

Ordinary Characters, the Period Metacharacter, and Escape Characters
regex	data string	Notes
`car`	`car carnival Carnival`	Try with Global on and off.
`zz`	`pizzazz`	Try with Global on and off.
`cat`	`The cow, camel and cat communicated.`	Try with Global on and off. The data string is all on one line.
`h.t`	`hot hat hit heat hate hzt h t h#t h:t h.t`	The data string is all on one line. Note regexpal highlighting of the period (`.`) in the regex.
`.a.a.a`	`banana papaya #a$a@a abacab`	Note the last match in particular.
`a.a.a.`	`banana papaya #a$a@a abacab`	Compare last match with preceding ones.
`9.00`	`9.00 9500 9-00`
`9\.00`	`9.00 9500 9-00`
`h.._export.txt`	`his_export.txt her_export.txt`
`h.._export\.txt`	`his_export.txt her_export.txt`
`resume..txt`	`resume1.txt resume2.txt resume3_txt.zip`	The data string is all on one line.
`resume.\.txt`	`resume1.txt resume2.txt resume3_txt.zip`	The data string is all on one line.
`a\tb`	`a b`	There's a TAB between a and b.
`a\nb`	`a b`	The data string is on two lines.
`c\nd`	`abc def`	The data string is on two lines.
Character Classes, Negative Character Classes, and Predefined Character Classes
`[aeiou]`	`Bananas Peaches Apples`
`gr[ea]y`	`gray grey`
`gr[ea]t`	`great`
`gr[ea][ea]t`	`great graet greet graat`
`[abcdefghijklmnopqrstuvwxyz]`	`Now we know how to make negative character sets.`	Type the regex in one character at a time, then put a `^` at the start. The data string is all on one line.
`[^aeiou]`	`It seems I see the sea I seek.`	Try with Global on and off.
`see[^mn]`	`It seems I see the sea I seek.`
`h[abc.xyz]t`	`hat hot h.t`	The period is not a metacharacter in this regex.
`var[[(][0-9][\])]`	`var(3) var[4]`	Also try `[])]`, `[)]]` and `[)]` as last part of regex.
`file[0-\_]1`	`file01 file-1 file\1 file_1`	Thinks `-` indicates a range.
`file[0\-\_]1`	`file01 file-1 file\1 file_1`	Now thinks `\` is escaping the underscore (`_`).
`file[0\-\\_]1`	`file01 file-1 file\1 file_1`	Now gets all four.
`\d\d\d\d`	`1984 text`
`\w\w\w\w`	`1984 text 1_5W`
`[\w\-]`	`blue-green paint`
`[\d\s]`	`123 456 789 abc`
`[^\d\s]`	`123 456 789 abc`
`[\D\S]`	`123 456 789 abc`
Repetition Expressions, Greediness and Laziness
`apples*`	`apple apples applessssss`	Also try `apples+` and `apples?` for the regex.
`\d\d\d\d*`	`1234567890 1234 123 12`	Also try `\d\d\d+` for the regex.
`colou?r`	`color colour`
`[a-z]+\d[a-z]*`	`abc9xyz`	Also try `9xyz` and `abc9` for the string.
`\w+s`	`We picked apples.`	Also try `We picked applessssss.` for the string.
`\w+_\d{2,4}-\d{2}`	`report_1997-04 budget_03-04 memo_712539-100`
`\d+\w+\d+`	`01_FY_07_report_99.xls`	Illustrates regex "greediness".
`\d+\w+?\d+`	`01_FY_07_report_99.xls`	Illustrates regex "laziness".
`".+", ".+"`	`"IBM", "Samsung", "Apple, Inc."`	Illustrates regex "greediness".
`".+?", ".+?"`	`"IBM", "Samsung", "Apple, Inc."`	Illustrates regex "laziness".
Grouping and Alternation Metacharacters
`abc+`	`abcccc`
`(abc)+`	`abcabcabc`
`(in)?dependent`	`independent dependent`
`run(s)?`	`I run fast. He runs faster.`	Same as `runs?` but clearer.
`[A-Z][0-9]`	`A1B2C3D4E5F6G7H8I9J0`	Also try `([A-Z][0-9])`, `([A-Z][0-9])+` and `([A-Z][0-9]){3}`.
`apple|orange`	`apple orange appleorange apple|orange`	Also try `apple\|orange`.
`abc|def|ghi|jkl`	`abcdefghijklmnopqrstuvwxyz`	Try with Global on and off.
`applejuice|sauce`	`applejuice applesauce`	Try with Global on and off.
`apple(juice|sauce)`	`applejuice applesauce`
`(peanut|peanutbutter)`	`peanutbutter`	Illustrates regex "eagerness".
`peanut(butter)?`	`peanutbutter`	Illustrates regex "greediness".
`(\w+|FY\d{4}_report\.xls)`	`FY2003_report.xls`
`xyz|abc|def|ghi|jkl`	`abcdefghijklmnopqrstuvwxyz`	Shows eagerness; turn off Global.
`(AA|BB|CC){6}`	`AABBAACCBB`
`(\d\d|[A-Z][A-Z]){3}`	`112233`	Also try `AABBCC`, `AA66ZZ`, `11AA44`.
Anchored Expressions: Start and End Anchors
`[A-Z]`	`Mr. Smith went to Washington.`	Also try `^[A-Z]`, `\.`, `\.$` as the regex.
`^[A-Z][A-Za-z\-. ]+\.$`	`Mr. Smith went to Washington.`	Also take out the period in the regex.
`^\w+@\w+\.[a-z]{3}$`	`me@here.com, you@there.com`	Try with and without either or both anchors.
Anchored Expressions: Single-Line and Multiline Modes
`[a-z]+`	`milk apple juice sweet peas yogurt sweet corn apple sauce milkshake sweet potatoes`	Try `^` at the beginning, and `$` at the end with and without a newline following the last item, and also with the "Multiline anchors" option (in `regexpal`) on and off.
Anchored Expressions: Word Boundaries
`\b\w+\b`	`This is a test.`
`\b\w+\b`	`abc_123`
`\b\w+\b`	`top-notch`
`\BThis`	`This is a test.`
`\B\w+\B`	`This is a test.`
`apples\band\boranges`	`apples and oranges`
`apples\b \band\b \boranges`	`apples and oranges`
`\b[\w']+\b`	`Shall I compare thee to a summer's day?`	Another example of greediness (look carefully at the matches in `regexpal`).
`\b[\w']+?\b`	`Shall I compare thee to a summer's day?`	Another example of laziness (look carefully at the matches in `regexpal`).
Capture Groups and Backreferences
`(apples) to \1`	`apples to apples`
`(ab)(cd)(ef)\3\2\1`	`abcdefefcdab`
`<(i|em)>.+?</\1>`	`<i>Hello</i>`
`<(i|em)>.+?</\1>`	`<em>Hello</em>`
`<(i|em)>.+?</\1>`	`<i>italics</i> <em>emphasis</em> <i>bad</em> <em>bad</i>`	Data should all be on one line. Also make the regex greedy by removing the `?`.
`\b([A-Z][a-z]+)\b\s\b\1son\b`	`Steve Smith, John Johnson, Eric Erikson, Evan Evanson`	Finding names of people whose first name is repeated in their last name.
`\b(\w+)\s+\1\b`	`Paris in the the spring.`	Finding repeated words.
Backreferences with Optional Groups
`(A?)B`	`AB`	Matches `AB` and captures `A`.
`(A?)B`	`B`	Matches `B` but captures nothing, so captures occur on zero-width matches.
`(A?)B\1`	`ABA`
`(A?)B\1`	`B`	Matches `B`, so backreferences become zero-width as well.
`(A?)B\1C`	`ABAC`
`(A?)B\1C`	`BC`	Matches `BC`, so backreferences become zero-width as well.
Non-Capturing Group Expressions
`(oranges) and (apples) to oranges`	`oranges and apples to oranges`	These do capture.
`(oranges) and (apples) to \1`	`oranges and apples to oranges`	The backreference does match.
`(oranges) and (apples) to \2`	`oranges and apples to oranges`	The backreference does not match.
`(oranges) and (apples) to \2`	`oranges and apples to oranges oranges and apples to apples`	Now the backreference matches the second line. (The data string is on two lines.)
`(?:oranges) and (apples) to \1`	`oranges and apples to oranges oranges and apples to apples`	Here the backreference `\1` also matches the second line, because the first group is non-capturing and `\1` therefore refers to `(apples)`.
`(?:oranges) and (apples) to \2`	`oranges and apples to oranges oranges and apples to apples`	Now nothing matches because nothing has been saved in `\2`.
Lookaround Assertions (Lookahead and Lookbehind)
`(?=seashore)sea`	`seashore seaside`	A positive lookahead assertion.
`sea(?=shore)`	`seashore seaside`	An equivalent positive lookahead assertion.
`sea(?:shore)`	`seashore seaside`	A non-capturing expression ... be sure not to confuse what gets matched, what gets captured, and what is simply "asserted".
`\b[A-Za-z']+\b,`	`Verily, verily, I say onto you, give, take, and then, if you like, give back`	Match all words that are followed immediately by a comma and also include the comma in the match.
`\b[A-Za-z']+\b(?=,)`	`Verily, verily, I say onto you, give, take, and then, if you like, give back`	Now the lookahead assertion says the comma has to be there, but this time it is not matched.
`\b[A-Za-z']+\b(?:,)`	`Verily, verily, I say onto you, give, take, and then, if you like, give back`	Once again, the comma is matched, this time in a non-capturing group.
`\d{3}-\d{3}-\d{4}`	`555-302-4321 555-781-6978`	Both are matched.
`^[0-5\-]+$`	`555-302-4321`	This matches.
`^[0-5\-]+$`	`23140-5`	This also matches.
`(?=^[0-5\-]+$)\d{3}-\d{3}-\d{4}`	`555-302-4321`	This matches. (The regex is all on one line.)
`(?=^[0-5\-]+$)\d{3}-\d{3}-\d{4}`	`555-781-6978`	This does not match. (The regex is all on one line.)
`(?=^[0-5\-]+$)\d{3}-\d{3}-\d{4}`	`555-302-4321 555-781-6978 555-245-1321`	Put `Multiline anchors` on. (The regex is all on one line.)
`(?=^[0-5\-]+$)(?=.*4321)\d{3}-\d{3}-\d{4}`	`555-302-4321 555-781-6978 555-245-1321`	Put `Multiline anchors` on. (The regex is all on one line.)
`\b(?=\w*ou)[A-Za-z']+\b(?=,)`	`Verily, verily, I say onto you, give, take, and then, if you like, give back`
`(?=.*\d).{8,15}`	`base355ball`	Match a password that contains 8 to 15 characters and at least one digit.
`(?=.*\d)(?=.*[A-Z]).{8,15}`	`base355Ball`	Match a password that contains 8 to 15 characters and at least one digit and one capital letter.
`(?!seashore)sea`	`seashore seaside`	A negative lookahead assertion.
`sea(?!shore)`	`seashore seaside`	An equivalent negative lookahead assertion.
`online(?! training)`	`online training and online courses`	The data string is all on one line.
`online(?!.*training)`	`online video training and online courses and online videos`	Finds `online` as long as it's not followed by `training`, even if there are other words in between the two. (The data string is all on one line.)
`\bblack\b(?! dog)`	`The black dog followed the black car into the black night.`	The data string is all on one line.
`\bblack\b(?= dog)`	`The black dog followed the black car into the black night.`	Compare this one with the previous one. (The data string is all on one line.)
`(?=^[0-5\-]+$)(?!.*4321)\d{3}-\d{3}-\d{4}`	`555-302-4321 555-781-6978 555-245-1321`	Put `Multiline anchors` on. (The regex is all on one line.) Here we're using a combination of a positive lookahead and a negative lookahead.
`\b[A-Za-z']+\b(?![.,])`	`Verily, verily, I say onto you, give, take. And then, if you like, give back.`	Find all words not followed by a period or a comma.
`\bblack\b(?!.*\bblack\b)`	`The black dog followed the black car into the black night.`	Finds the last occurrence of the word `black`. (The data string is all on one line.)
`(\bblack\b)(?!.*\1)`	`The black dog followed the black car into the black night.`	Also finds the last occurrence of the word `black`. (The data string is all on one line.)
`(?<=base)ball`	`baseball football`	A positive lookbehind assertion. (Supported in JavaScript only since ECMAScript 2018.)
`ball(?<=baseball)`	`baseball football`	An equivalent positive lookbehind assertion. (Supported in JavaScript only since ECMAScript 2018.)
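
The lookahead exercises can also be tried outside regexpal. The C++11 sketch below, using the default ECMAScript grammar of std::regex (which supports lookahead), covers the password exercise and the two "black dog" exercises from the table; the function names are assumptions made for the example.

```cpp
#include <cassert>
#include <iterator>
#include <regex>
#include <string>

// Password check from the exercises: 8 to 15 characters, with at least one
// digit and at least one capital letter. Each lookahead inspects the whole
// candidate without consuming any of it; .{8,15} then checks the length.
bool acceptable_password(const std::string& candidate) {
    static const std::regex re(R"((?=.*\d)(?=.*[A-Z]).{8,15})");
    return std::regex_match(candidate, re);
}

// Counts the non-overlapping matches of `pattern` in `text`.
int count_hits(const std::string& pattern, const std::string& text) {
    const std::regex re(pattern);
    return static_cast<int>(
        std::distance(std::sregex_iterator(text.begin(), text.end(), re),
                      std::sregex_iterator()));
}

// Checks for positive and negative lookahead; call from main() to run them.
void check_lookahead() {
    assert(acceptable_password("base355Ball"));
    assert(!acceptable_password("base355ball"));    // no capital letter
    assert(!acceptable_password("baseballBall"));   // no digit
    assert(!acceptable_password("Ba5e"));           // too short

    const std::string s =
        "The black dog followed the black car into the black night.";
    assert(count_hits(R"(\bblack\b(?= dog))", s) == 1);  // followed by " dog"
    assert(count_hits(R"(\bblack\b(?! dog))", s) == 2);  // not followed by " dog"
}
```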

Regular Expressions in C++

As of C++11, regular expressions are part of the C++ standard library (header <regex>). The default regular expression grammar is that of ECMAScript, which is the most powerful, but there are five alternative grammars, should the need arise to use one of them: basic, extended, awk, grep and egrep. For the most part, you can simply use the default, which requires no extra effort on your part.

As in any programming language, you will typically want to test whether a string "matches" a regular expression in its entirety, to search for substrings that match a regular expression, and/or to replace all or part of a string with some other string. C++11 provides functions that perform these actions:

regex_match() tries to match the entire string and returns true or false
regex_search() tries to find a matching substring and returns true or false
regex_replace() replaces every matching substring (or only the first, if the format_first_only flag is passed) and returns the revised string
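
As a minimal sketch of these three functions working together (it is not one of the sample programs referred to on this page, and the file names and the name check_three_operations are made up for the example):

```cpp
#include <cassert>
#include <regex>
#include <string>

// Exercises regex_match(), regex_search() and regex_replace() on one string;
// call this from main() to run the checks.
void check_three_operations() {
    const std::string text = "resume1.txt resume2.txt";
    const std::regex file_name(R"(resume(\d)\.txt)");

    // regex_match() succeeds only if the pattern accounts for the whole string.
    assert(!std::regex_match(text, file_name));
    assert(std::regex_match("resume1.txt", file_name));

    // regex_search() succeeds if any substring matches; the smatch object
    // then reports what was found, where, and what each group captured.
    std::smatch m;
    assert(std::regex_search(text, m, file_name));
    assert(m.str() == "resume1.txt");
    assert(m.position() == 0);
    assert(m[1].str() == "1");

    // regex_replace() rewrites every match by default; $1 in the format
    // string stands for whatever group 1 captured in that match.
    assert(std::regex_replace(text, file_name, "cv$1.txt") == "cv1.txt cv2.txt");

    // With format_first_only, only the first match is rewritten.
    assert(std::regex_replace(text, file_name, "cv$1.txt",
                              std::regex_constants::format_first_only)
           == "cv1.txt resume2.txt");
}
```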

Some sample programs that you can find here illustrate some of the ways you can now work with regular expressions in C++11:

The first sample program, learn_regex1.cpp, illustrates matching, searching and replacing in the simplest possible form of each, when both the regex object and the string object on which it acts consist of just simple "ordinary" characters.
The second sample program, learn_regex2.cpp, shows how we can use an smatch object to get access to more details about whatever match or matches occur during a call to regex_match() or regex_search().
The third sample program, learn_regex3.cpp, uses a regex object that contains something more than just ordinary characters, as well as (again) an smatch object to get access to more details about whatever match or matches occur during a call to regex_match() or regex_search().

In the testers subdirectory under the link mentioned above you will find a number of other sample programs that you should study and use for experimentation.