Regular Expressions

A refresher concerning the syntax and rules of constructing regular expressions.

See also Configuring an incremental index of a staged website.

Syntax of regular expressions

Text

Any single character

[chars]

Character class: One of chars

[^chars]

Character class: None of chars

text1|text2

Alternative: text1 or text2

Quantifiers

?

0 or 1 of the preceding text

*

0 or N of the preceding text (N > 1)

+

1 or N of the preceding text (N > 1)

Grouping

(text)

Grouping of text, either to set the borders of an alternative or to make back references where the Nth group is used on the RHS of a RewriteRule with $N)

Anchors

^

Start of line anchor.

$

End of line anchor.

Escaping

\char

Escape the particular char. For example, to specify the chars ".[]()" and so forth.

Rules about regular expressions

  • An ordinary character—not one of the special characters described below—is a one-character regular expression that matches itself.

  • A backslash () followed by any special character is a one-character regular expression that matches the special character itself. Special characters include the following:

    • . (period), * (asterisk), ? (question mark), + (plus sign), [ (left square bracket), | (vertical pipe), and \ (backslash) are always special characters, except when they appear within square brackets.

    • ^ (caret or circumflex) is special at the beginning of a regular expression, or when it immediately follows the left of a pair of square brackets.

    • $ (dollar sign) is special at the end of a regular expression.

    • . (period) is a one-character regular expression that matches any character, including supplementary code set characters with the exception of new-line.

    • A non-empty string of characters enclosed in [ ] (left and right square brackets) is a one-character regular expression that matches one character, including supplementary code set characters, in that string.

      If, however, the first character of the string is a ^ (circumflex), the one-character regular expression matches any character, including supplementary code set characters, with the exception of new-line and the remaining characters in the string.

      The ^ has this special meaning only if it occurs first in the string. You can use - (minus sign) to indicate a range of consecutive characters, including supplementary code set characters. For example, [0-9] is equivalent to [0123456789].

      Characters specifying the range must be from the same code set. When the characters are from different code sets, one of the characters specifying the range is matched. The - loses this special meaning if it occurs first (after an initial ^, if any) or last in the string. The ] (right square bracket) does not terminate such a string when it is the first character within it, after an initial ^, if any. For example, []a-f] matches either a ] (right square bracket) or one of the ASCII letters a through f inclusive. The four characters listed as special characters above stand for themselves within such a string of characters.

Rules for constructing regular expressions from one-character regular expressions

You can use the following rules to construct regular expressions from one-character regular expressions:

  • A one-character regular expression is a regular expression that matches whatever the one-character regular expression matches.
  • A one-character regular expression followed by a * (asterisk) is a regular expression that matches zero or more occurrences of the one-character regular expression, which may be a supplementary code set character. If there is any choice, the longest leftmost string that permits a match is chosen.
  • A one-character regular expression followed by a ? (question mark) is a regular expression that matches zero or one occurrences of the one-character regular expression, which may be a supplementary code set character. If there is any choice, the longest leftmost string that permits a match is chosen.
  • A one-character regular expression followed by a + (plus sign) is a regular expression that matches one or more occurrences of the one-character regular expression, which may be a supplementary code set character. If there is any choice, the longest leftmost string that permits a match is chosen.
  • A one-character regular expression followed by {m}, {m,}, or {m,n} is a regular expression that matches a range of occurrences of the one-character regular expression. The values of m and n must be non-negative integers less than 256; {m} matches exactly m occurrences; {m,} matches at least m occurrences; {m,n} matches any number of occurrences between m and n inclusive. Whenever a choice exists, the regular expression matches as many occurrences as possible.
  • The concatenation of regular expressions is a regular expression that matches the concatenation of the strings matched by each component of the regular expression.
  • A regular expression enclosed between the character sequences ( and ) is a regular expression that matches whatever the unadorned regular expression matches.
  • A regular expression followed by a | (vertical pipe) followed by a regular expression is a regular expression that matches either the first regular expression (before the vertical pipe) or the second regular expression (after the vertical pipe).

You can also constrain a regular expression to match only an initial segment or final segment of a line, or both.

  • A ^ (circumflex) at the beginning of a regular expression constrains that regular expression to match an initial segment of a line.
  • A $ (dollar sign) at the end of an entire regular expression constrains that regular expression to match a final segment of a line.
  • The construction ^regular expression$ constrains the regular expression to match the entire line.

There are some predefined character class names that you can use in place of complex bracketed regular expressions. For example, a digit can be represented by the one-character regular expression [0-9] or by the character class one-character regular expression [[:digit:]].

The predefined character classes and their meanings are the following:

Character class

Meaning

[[:alnum:]]

An alphabetic character or a digit.

[[:alpha:]]

An alphabetic character.

[[:blank:]]

A space or a tab.

[[:cntrl:]]

A control code; non-printing character.

[[:digit:]]

A digit.

[[:graph:]]

Any printing character except space.

[[:lower:]]

A lower-case alphabetic character.

[[:print:]]

Any printing character including space.

[[:punct:]]

Punctuation.

[[:space:]]

White space such as a space, a tab, or an end-of-line.

[[:upper:]]

An upper-case alphabetic character.

[[:xdigit:]]

A hexadecimal digit, upper- or lower-case.

Two special character class names match the null space at the start and the end of a word. In other words, they do not match an actual character. A word is considered to be any sequence of alphabetic characters, digits, or underscores (_).

Character class

Meaning

[[:<:]]

start of a word

[[:>:]]

end of a word

On this page

Adobe Summit Banner

A virtual event April 27-28.

Expand your skills and get inspired.

Register for free
Adobe Summit Banner

A virtual event April 27-28.

Expand your skills and get inspired.

Register for free
Adobe Maker Awards Banner

Time to shine!

Apply now for the 2021 Adobe Experience Maker Awards.

Apply now
Adobe Maker Awards Banner

Time to shine!

Apply now for the 2021 Adobe Experience Maker Awards.

Apply now