Regular Expressions
Regular expressions were developed originally to define members of one particular class of "formal languages". They can be used to form "patterns" that describe specific strings or categories of strings. Though there are some differences from one language to another, particularly in the built-in functions that are used to perform "pattern matching", the basic ideas underlying regular expressions are the same across all programming languages.
This page assumes you already know something about regular expressions and just want to quickly look up something you can't quite remember. For much more detail, explanation and lots of examples see here.
These metacharacters do not match themselves unless they are escaped:
\ | ( ) [ ] { } ^ $ * + ? .
These "ordinary" characters match themselves:
A B C ... Z a b c ... z 0 1 2 ... 9
! " # % & ' , - / : ; < = > @ _ ` ~
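As a brief illustration of escaping, here is a minimal sketch in Python. Python and its standard re module are a choice made for this example; the page itself is language-neutral. It shows an unescaped period matching any character, an escaped period matching only a literal period, and re.escape() escaping the metacharacters in a string for you.

```python
import re

# Unescaped, the period is a metacharacter and matches any character.
unescaped = re.findall(r"3.14", "3.14 and 3x14")

# Escaped with a backslash, it matches only a literal period.
escaped = re.findall(r"3\.14", "3.14 and 3x14")

# re.escape() backslash-escapes the metacharacters in a literal string,
# so the result matches that string exactly.
literal = re.escape("1+1=2?")
matches_literal = re.fullmatch(literal, "1+1=2?") is not None

print(unescaped)        # ['3.14', '3x14']
print(escaped)          # ['3.14']
print(matches_literal)  # True
```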
The period metacharacter (.) matches any single character except a newline.
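The newline exception can be checked with a short Python sketch (again assuming the standard re module). The re.DOTALL flag shown at the end is Python's own way of lifting the exception; other languages spell this option differently or may not offer it.

```python
import re

text = "a\nb"

# By default the period does not match the newline between "a" and "b".
default_match = re.search(r"a.b", text)

# With re.DOTALL (Python-specific flag) the period matches a newline too.
dotall_match = re.search(r"a.b", text, re.DOTALL)

print(default_match)             # None
print(dotall_match is not None)  # True
```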
[adps] [^adps]
[246] [^246]
[a-z] [^a-z]
[3-7] [^3-7]
Note that in character classes the dash (-) is effectively a metacharacter unless it appears as the first character in the class (when it cannot be indicating a range), and the caret (^) is a metacharacter only when it appears as the first character.
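The character classes above, and the positional rules for the dash and the caret, can be tried out with a small Python sketch (assuming the standard re module; the sample strings are made up for illustration).

```python
import re

# [adps] matches any one of a, d, p or s; [^adps] matches any one
# character that is NOT one of those.
in_class = re.findall(r"[adps]", "pads up")
not_in_class = re.findall(r"[^adps]", "pads up")

# [3-7] is a range: any one digit from 3 to 7.
in_range = re.findall(r"[3-7]", "2 3 5 7 8")

# A dash in first position is literal; a caret NOT in first position is literal.
literal_dash = re.findall(r"[-3]", "3-7")
literal_caret = re.findall(r"[3^]", "3^7")

print(in_class)       # ['p', 'a', 'd', 's', 'p']
print(not_in_class)   # [' ', 'u']
print(in_range)       # ['3', '5', '7']
print(literal_dash)   # ['3', '-']
print(literal_caret)  # ['3', '^']
```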
\d \D \w \W \s \S
a|b|c
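For reference, \d matches a digit, \w a "word" character (a letter, a digit or the underscore) and \s a whitespace character, while the uppercase forms \D, \W and \S match the opposite; a|b|c matches any one of the alternatives. A minimal Python sketch, assuming the standard re module and an ASCII-only sample string:

```python
import re

text = "Room 42, floor_3!"

digits = re.findall(r"\d", text)     # digit characters
words = re.findall(r"\w+", text)     # runs of word characters
non_word = re.findall(r"\W", text)   # everything that is not a word character
spaces = re.findall(r"\s", text)     # whitespace characters

# Alternation: match any one of the listed alternatives.
pets = re.findall(r"cat|dog|bird", "dog, cat, fish, bird")

print(digits)    # ['4', '2', '3']
print(words)     # ['Room', '42', 'floor_3']
print(non_word)  # [' ', ',', ' ', '!']
print(spaces)    # [' ', ' ']
print(pets)      # ['dog', 'cat', 'bird']
```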
The "boundary pattern" \b matches the boundary (that is, the position) between a word character (\w) and a non-word character (\W), while \B matches a "non-boundary".
* (zero or more)
+ (one or more)
? (zero or one)
{n} (exactly n)
{m,n} (any number from m to n, inclusive, assuming m<n)
{n,} (at least n)
{0,n} (at most n)
/pattern/g causes all instances to be matched, not just the first
/pattern/i ignores case in the match
^pattern matches pattern only at the beginning of the string
pattern$ matches pattern only at the end of the string
Note that in some regular expression flavors (POSIX basic regular expressions, for example), if ^ is anywhere but at the beginning of the pattern, or if $ is anywhere but at the end of the pattern, then these two characters are just "ordinary" characters that match themselves. In other flavors, such as Python and JavaScript, they keep their anchoring meaning wherever they appear and must be escaped to match literally.
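The quantifiers, modifiers and anchors above can be sketched in Python as follows. This assumes the standard re module, which has no /g or /i suffix: the choice of function (re.search versus re.findall) and the re.IGNORECASE flag play those roles. The last two lines show how Python in particular treats a caret in the middle of a pattern; other engines may differ.

```python
import re

text = "a ab abbb"

# A quantifier applies to the item just before it (here, the letter b).
zero_or_more = re.findall(r"ab*", text)
one_or_more = re.findall(r"ab+", text)
zero_or_one = re.findall(r"ab?", text)

# {2,3}: two or three digits; \b keeps longer runs from matching in part.
two_to_three = re.findall(r"\b\d{2,3}\b", "1 22 333 4444")

# re.search() stops at the first match (no "g"); re.findall() returns
# all matches (like "g"); re.IGNORECASE ignores case (like "i").
first_only = re.search(r"na", "banana").group(0)
all_matches = re.findall(r"na", "banana")
any_case = re.findall(r"na", "NaNa", re.IGNORECASE)

# ^ anchors to the beginning of the string, $ to the end.
starts_with_ban = re.search(r"^ban", "banana") is not None
starts_with_nan = re.search(r"^nan", "banana") is not None
ends_with_ana = re.search(r"ana$", "banana") is not None

# In Python, a mid-pattern ^ is still an anchor, so it must be escaped
# to match a literal caret.
mid_caret = re.search(r"a^b", "a^b")
escaped_caret = re.search(r"a\^b", "a^b") is not None

print(zero_or_more, one_or_more, zero_or_one)
print(two_to_three)
print(first_only, all_matches, any_case)
print(starts_with_ban, starts_with_nan, ends_with_ana)
print(mid_caret, escaped_caret)
```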
Placing a pattern in parentheses, as in (pattern), does not change whether pattern is matched or not, but causes the match (if it occurs) to be "remembered". Then \1, \2, \3 (and so on, up to \9) can be used for access to the first, second, third (and so on) such "remembered" matches within the regular expression itself. This particular aspect of regular expressions is not consistent across the various programming languages, so it is important to be aware of what language you're using, and the context, so that you know exactly what these "variables" contain on any given occasion.
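As one concrete case, here is how "remembered" matches behave in Python's standard re module. This is a sketch of a single language's conventions, not a statement about all of them; the sample strings are invented for illustration.

```python
import re

# (\w+) remembers a word; \1 must then match that same text again,
# so this pattern finds a word that is immediately repeated.
doubled = re.search(r"\b(\w+) \1\b", "it was was fine")

# Two groups: \1 is the first remembered match, \2 the second.
mirrored = re.search(r"(\d)(\d)\2\1", "id 1221")

# In Python, \1 and \2 are also available in the replacement string of
# re.sub(); other languages may use a different notation here.
swapped = re.sub(r"(\w+)@(\w+)", r"\2 at \1", "user@example")

print(doubled.group(0), "|", doubled.group(1))  # was was | was
print(mirrored.group(0), mirrored.groups())     # 1221 ('1', '2')
print(swapped)                                  # example at user
```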