If you want to perform only a single match operation with a regular
expression and don't need any of the flags, you don't have to create a
Pattern object. Simply pass the string representation of the pattern
and the CharSequence to be matched to the static matches() method,
which returns true if the specified pattern matches the complete
specified text and false otherwise.
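As a minimal sketch of this one-shot form, the class below wraps Pattern.matches() in a helper. The class name QuickMatchDemo, the helper isFiveDigits(), and the five-digit pattern are hypothetical choices made only for illustration.

```java
import java.util.regex.Pattern;

class QuickMatchDemo {
    // Hypothetical helper: true only if the whole text is exactly five ASCII digits.
    static boolean isFiveDigits(CharSequence text) {
        // The static matches() compiles the pattern and requires the
        // complete text to match, not just a prefix or substring.
        return Pattern.matches("\\d{5}", text);
    }

    public static void main(String[] args) {
        System.out.println(isFiveDigits("12345"));       // prints true
        System.out.println(isFiveDigits("12345-6789"));  // prints false
    }
}
```

Because the pattern is recompiled on every call, this form suits occasional matches. For repeated matching against the same expression, compiling a Pattern object once is the usual choice.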
Pattern defines flags that control various aspects of how regular
expression matching is performed.
Table 16-3. Java regular expression quick reference

| Syntax | Matches |
|---|---|
| Single characters | |
| x | The character x, as long as x is not a punctuation character with special meaning in the regular expression syntax. |
| \p | The punctuation character p. |
| \\ | The backslash character. |
| \n | Newline character \u000A. |
| \t | Tab character \u0009. |
| \r | Carriage return character \u000D. |
| \f | Form feed character \u000C. |
| \e | Escape character \u001B. |
| \a | Bell (alert) character \u0007. |
| \uxxxx | Unicode character with hexadecimal code xxxx. |
| \xxx | Character with hexadecimal code xx. |
| \0n | Character with octal code n. |
| \0nn | Character with octal code nn. |
| \0nnn | Character with octal code nnn, where nnn <= 377. |
| \cx | The control character ^x. |
| Character classes | |
| [...] | One of the characters between the brackets. Characters may be specified literally, and the syntax also allows the specification of character ranges, with intersection, union, and subtraction operators. See specific examples below. |
| [^...] | Any one character not between the brackets. |
| [a-z0-9] | Character range: a character between (inclusive) a and z or 0 and 9. |
| [0-9[a-fA-F]] | Union of classes: same as [0-9a-fA-F]. |
| [a-z&&[aeiou]] | Intersection of classes: same as [aeiou]. |
| [a-z&&[^aeiou]] | Subtraction: the characters a through z except for the vowels. |
| . | Any character except a line terminator. If the DOTALL flag is set, then it matches any character including line terminators. |
| \d | ASCII digit: [0-9]. |
| \D | Anything but an ASCII digit: [^\d]. |
| \s | ASCII whitespace: [ \t\n\f\r\x0B]. |
| \S | Anything but ASCII whitespace: [^\s]. |
| \w | ASCII word character: [a-zA-Z0-9_]. |
| \W | Anything but ASCII word characters: [^\w]. |
| \p{group} | Any character in the named group. See group names below. Many of the group names are from POSIX, which is why p is used for this character class. |
| \P{group} | Any character not in the named group. |
| \p{Lower} | ASCII lowercase letter: [a-z]. |
| \p{Upper} | ASCII uppercase letter: [A-Z]. |
| \p{ASCII} | Any ASCII character: [\x00-\x7f]. |
| \p{Alpha} | ASCII letter: [a-zA-Z]. |
| \p{Digit} | ASCII digit: [0-9]. |
| \p{XDigit} | Hexadecimal digit: [0-9a-fA-F]. |
| \p{Alnum} | ASCII letter or digit: [\p{Alpha}\p{Digit}]. |
| \p{Punct} | ASCII punctuation: one of !"#$%&'()*+,-./:;<=>?@[\]^_`{\|}~. |
| \p{Graph} | Visible ASCII character: [\p{Alnum}\p{Punct}]. |
| \p{Print} | Visible ASCII character: same as \p{Graph}. |
| \p{Blank} | ASCII space or tab: [ \t]. |
| \p{Space} | ASCII whitespace: [ \t\n\f\r\x0b]. |
| \p{Cntrl} | ASCII control character: [\x00-\x1f\x7f]. |
| \p{category} | Any character in the named Unicode category. Category names are one- or two-letter codes defined by the Unicode standard. One-letter codes include L for letter, N for number, S for symbol, Z for separator, and P for punctuation. Two-letter codes represent subcategories, such as Lu for uppercase letter, Nd for decimal digit, Sc for currency symbol, Sm for math symbol, and Zs for space separator. See java.lang.Character for a set of constants that correspond to these subcategories; however, note that the full set of one- and two-letter codes is not documented in this book. |
| \p{block} | Any character in the named Unicode block. In Java regular expressions, block names begin with "In", followed by mixed-case capitalization of the Unicode block name, without spaces or underscores. For example: \p{InOgham} or \p{InMathematicalOperators}. See java.lang.Character.UnicodeBlock for a list of Unicode block names. |
| Sequences, alternatives, groups, and references | |
| xy | Match x followed by y. |
| x\|y | Match x or y. |
| (...) | Grouping. Group subexpression within parentheses into a single unit that can be used with *, +, ?, \|, and so on. Also "capture" the characters that match this group for use later. |
| (?:...) | Grouping only. Group subexpression as with (...), but do not capture the text that matched. |
| \n | Match the same characters that were matched when capturing group number n was first matched. Be careful when n is followed by another digit: the largest number that is a valid group number will be used. |
| Repetition | |
| x? | Zero or one occurrence of x; i.e., x is optional. |
| x* | Zero or more occurrences of x. |
| x+ | One or more occurrences of x. |
| x{n} | Exactly n occurrences of x. |
| x{n,} | n or more occurrences of x. |
| x{n,m} | At least n, and at most m occurrences of x. |
| Anchors | |
| ^ | The beginning of the input string, or if the MULTILINE flag is specified, the beginning of the string or of any new line. |
| $ | The end of the input string, or if the MULTILINE flag is specified, the end of the string or of any line within the string. |
| \b | A word boundary: a position in the string between a word and a nonword character. |
| \B | A position in the string that is not a word boundary. |
| \A | The beginning of the input string. Like ^, but never matches the beginning of a new line, regardless of what flags are set. |
| \Z | The end of the input string, ignoring any trailing line terminator. |
| \z | The end of the input string, including any line terminator. |
| \G | The end of the previous match. |
| (?=x) | A positive look-ahead assertion. Require that the following characters match x, but do not include those characters in the match. |
| (?!x) | A negative look-ahead assertion. Require that the following characters do not match the pattern x. |
| (?<=x) | A positive look-behind assertion. Require that the characters immediately before the position match x, but do not include those characters in the match. x must be a pattern with a fixed number of characters. |
| (?<!x) | A negative look-behind assertion. Require that the characters immediately before the position do not match x. x must be a pattern with a fixed number of characters. |
| Miscellaneous | |
| (?>x) | Match x independently of the rest of the expression, without considering whether the match causes the rest of the expression to fail to match. Useful to optimize certain complex regular expressions. A group of this form does not capture the matched text. |
| (?onflags-offflags) | Don't match anything, but turn on the flags specified by onflags, and turn off the flags specified by offflags. These two strings are combinations in any order of the following letters and correspond to the following Pattern constants: i (CASE_INSENSITIVE), d (UNIX_LINES), m (MULTILINE), s (DOTALL), u (UNICODE_CASE), and x (COMMENTS). Flag settings specified in this way take effect at the point that they appear in the expression and persist until the end of the expression, or until the end of the parenthesized group of which they are a part, or until overridden by another flag setting expression. |
| (?onflags-offflags:x) | Match x, applying the specified flags to this subexpression only. This is a noncapturing group, like (?:...), with the addition of flags. |
| \Q | Don't match anything, but quote all subsequent pattern text until \E. All characters within such a quoted section are interpreted as literal characters to match, and none (except \E) have special meanings. |
| \E | Don't match anything; terminate a quote started with \Q. |
| #comment | If the COMMENTS flag is set, pattern text between a # and the end of the line is considered a comment and is ignored. |
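
A few of the constructs in Table 16-3 can be exercised with a short sketch like the one below. The class name RegexTableDemo, its helper methods, and the sample strings are hypothetical and chosen only for illustration. Note that each backslash in the regular expression syntax must be doubled inside a Java string literal.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexTableDemo {
    // Subtraction via intersection with a negated class:
    // one or more lowercase ASCII letters, none of them vowels.
    static boolean isConsonants(String s) {
        return Pattern.matches("[a-z&&[^aeiou]]+", s);
    }

    // Backreference: \1 must match the same text captured by group 1,
    // so this finds a word immediately repeated after a single space.
    static boolean hasDoubledWord(String s) {
        return Pattern.compile("\\b(\\w+) \\1\\b").matcher(s).find();
    }

    // Inline flag (?i) turns on CASE_INSENSITIVE; \Q...\E quotes "a.b"
    // so that the dot is matched literally rather than as "any character".
    static boolean containsLiteralIgnoringCase(String s) {
        return Pattern.compile("(?i)\\Qa.b\\E").matcher(s).find();
    }

    // Positive look-ahead: match digits only when "px" follows,
    // without including "px" in the matched text. Returns null if no match.
    static String pixelCount(String s) {
        Matcher m = Pattern.compile("\\d+(?=px)").matcher(s);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(isConsonants("rhythm"));                // prints true
        System.out.println(hasDoubledWord("it is is fine"));       // prints true
        System.out.println(containsLiteralIgnoringCase("xA.By"));  // prints true
        System.out.println(pixelCount("width: 120px"));            // prints 120
    }
}
```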