File | Doc | Category | Size | Date | Package
Pattern.java | API Doc | Android 1.5 API | 15799 | Wed May 06 22:41:04 BST 2009 | java.util.regex

Pattern

public final class Pattern extends Object implements Serializable
Represents a pattern used for matching, searching, or replacing strings. {@code Pattern}s are specified in terms of regular expressions and compiled using an instance of this class. They are then used in conjunction with a {@link Matcher} to perform the actual search.

A typical use case looks like this:

Pattern p = Pattern.compile("Hello, A[a-z]*!");

Matcher m = p.matcher("Hello, Android!");
boolean b1 = m.matches(); // true

m.reset("Hello, Robot!"); // reuse the same Matcher with a new input
boolean b2 = m.matches(); // false

The above code could also be written in a more compact fashion, though this variant is less efficient, since {@code Pattern} and {@code Matcher} objects are created on the fly instead of being reused:

boolean b1 = Pattern.matches("Hello, A[a-z]*!", "Hello, Android!"); // true
boolean b2 = Pattern.matches("Hello, A[a-z]*!", "Hello, Robot!"); // false

Please consult the package documentation for an overview of the regular expression syntax used in this class as well as Android-specific implementation details.

see
Matcher
since
Android 1.0

Fields Summary
private static final long
serialVersionUID
public static final int
UNIX_LINES
This constant specifies that only the Unix line ending ('\n') is recognized as a line terminator by the '.', '^', and '$' meta characters.
public static final int
CASE_INSENSITIVE
This constant specifies that a {@code Pattern} is matched case-insensitively. That is, the patterns "a+" and "A+" would both match the string "aAaAaA".

Note: For Android, the {@code CASE_INSENSITIVE} constant (currently) always includes the meaning of the {@link #UNICODE_CASE} constant. So if case insensitivity is enabled, this automatically extends to all Unicode characters. The {@code UNICODE_CASE} constant itself has no special consequences.

public static final int
COMMENTS
This constant specifies that a {@code Pattern} may contain whitespace or comments. Otherwise comments and whitespace are taken as literal characters.
public static final int
MULTILINE
This constant specifies that the meta characters '^' and '$' match the beginning and end of each input line, respectively. Normally, they match only the beginning and the end of the complete input.
public static final int
LITERAL
This constant specifies that the whole {@code Pattern} is to be taken literally, that is, all meta characters lose their meanings.
public static final int
DOTALL
This constant specifies that the '.' meta character matches arbitrary characters, including line endings, which is normally not the case.
public static final int
UNICODE_CASE
This constant specifies that a {@code Pattern} is matched case-insensitively with regard to all Unicode characters. It is used in conjunction with the {@link #CASE_INSENSITIVE} constant to extend its meaning to all Unicode characters.

Note: For Android, the {@code CASE_INSENSITIVE} constant (currently) always includes the meaning of the {@code UNICODE_CASE} constant. So if case insensitivity is enabled, this automatically extends to all Unicode characters. The {@code UNICODE_CASE} constant then has no special consequences.

public static final int
CANON_EQ
This constant specifies that a character in a {@code Pattern} and a character in the input string only match if they are canonically equivalent. It is (currently) not supported in Android.
private String
pattern
Holds the regular expression.
private int
flags
Holds the flags used when compiling this pattern.
transient int
mNativePattern
Holds a handle (a pointer, actually) for the native ICU pattern.
transient int
mGroupCount
Holds the number of groups in the pattern.
Constructors Summary
private Pattern(String pattern, int flags)
Creates a new {@code Pattern} instance from a given regular expression and flags.

param
pattern the regular expression.
param
flags the flags to set. Any combination of the constants defined in this class is valid.
throws
PatternSyntaxException if the regular expression is syntactically incorrect.

        if ((flags & CANON_EQ) != 0) {
            throw new UnsupportedOperationException("CANON_EQ flag not supported");
        }
        
        this.pattern = pattern;
        this.flags = flags;
        
        compileImpl(pattern, flags);
    
Methods Summary
public static java.util.regex.Pattern compile(java.lang.String pattern)
Compiles a regular expression, creating a new {@code Pattern} instance in the process. This is actually a convenience method that calls {@link #compile(String, int)} with a {@code flags} value of zero.

param
pattern the regular expression.
return
the new {@code Pattern} instance.
throws
PatternSyntaxException if the regular expression is syntactically incorrect.
since
Android 1.0

        return new Pattern(pattern, 0);
    
public static java.util.regex.Pattern compile(java.lang.String pattern, int flags)
Compiles a regular expression, creating a new {@code Pattern} instance in the process. The given flags modify the behavior of the {@code Pattern}.

param
pattern the regular expression.
param
flags the flags to set. Basically, any combination of the constants defined in this class is valid.

Note: Currently, the {@link #CASE_INSENSITIVE} and {@link #UNICODE_CASE} constants have slightly special behavior in Android, and the {@link #CANON_EQ} constant is not supported at all.

return
the new {@code Pattern} instance.
throws
PatternSyntaxException if the regular expression is syntactically incorrect.
see
#CANON_EQ
see
#CASE_INSENSITIVE
see
#COMMENTS
see
#DOTALL
see
#LITERAL
see
#MULTILINE
see
#UNICODE_CASE
see
#UNIX_LINES
since
Android 1.0

        return new Pattern(pattern, flags);
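
The effect of individual flags, and of combining them with bitwise OR, can be illustrated with a minimal sketch. The class {@code PatternFlagsSketch} and its two helper methods are hypothetical and exist only for this illustration. The inputs are restricted to ASCII so that the Android-specific {@code CASE_INSENSITIVE} behavior plays no role, and {@code CANON_EQ} is left out because it is not supported here.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
class PatternFlagsSketch {
    // Compiles regex with the given flags and matches it against the whole input.
    static boolean matches(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    // Compiles regex with the given flags and searches for it anywhere in the input.
    static boolean finds(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).find();
    }

    public static void main(String[] args) {
        // Each of the following calls is expected to print "true".

        // CASE_INSENSITIVE: "a+" matches mixed-case input.
        System.out.println(matches("a+", Pattern.CASE_INSENSITIVE, "aAaAaA"));
        // MULTILINE: '^' and '$' also match at line boundaries.
        System.out.println(finds("^two$", Pattern.MULTILINE, "one\ntwo\nthree"));
        // DOTALL: '.' also matches the line ending.
        System.out.println(matches("a.b", Pattern.DOTALL, "a\nb"));
        // COMMENTS: whitespace and the trailing comment in the pattern are ignored.
        System.out.println(matches("a b c # comment", Pattern.COMMENTS, "abc"));
        // LITERAL: '.' loses its special meaning and only matches a literal dot.
        System.out.println(matches("a.c", Pattern.LITERAL, "a.c"));
        // Flags are combined with bitwise OR.
        System.out.println(matches("hello.world",
                Pattern.CASE_INSENSITIVE | Pattern.DOTALL, "HELLO\nWORLD"));
    }
}
```

Passing 0 as the flags value gives the default behavior, which is what {@code compile(String)} does.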
    
private void compileImpl(java.lang.String pattern, int flags)
Compiles the given regular expression using the given flags. Used internally only.

param
pattern the regular expression.
param
flags the flags.

        if (pattern == null) {
            throw new NullPointerException();
        }
        
        if ((flags & LITERAL) != 0) {
            pattern = quote(pattern);
        }
        
        // These are the flags natively supported by ICU.
        // They even have the same value in native code.
        flags = flags & (CASE_INSENSITIVE | COMMENTS | MULTILINE | DOTALL | UNIX_LINES);
        
        mNativePattern = NativeRegEx.open(pattern, flags);
        mGroupCount = NativeRegEx.groupCount(mNativePattern);
    
protected void finalize()

        try {
            if (mNativePattern != 0) {
                NativeRegEx.close(mNativePattern);
            }
        }
        finally {
            super.finalize();
        }
    
public int flags()
Returns the flags that have been set for this {@code Pattern}.

return
the flags that have been set. A combination of the constants defined in this class.
see
#CANON_EQ
see
#CASE_INSENSITIVE
see
#COMMENTS
see
#DOTALL
see
#LITERAL
see
#MULTILINE
see
#UNICODE_CASE
see
#UNIX_LINES
since
Android 1.0

        return flags;
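
Because the returned value is a combination of flag constants, an individual flag is usually checked with a bit mask. A minimal sketch follows; the class {@code PatternFlagsQuery} and the method {@code hasFlag} are hypothetical names used only here.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
class PatternFlagsQuery {
    // Returns true if the given flag bit is set on the pattern.
    static boolean hasFlag(Pattern p, int flag) {
        return (p.flags() & flag) != 0;
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("a.b", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
        System.out.println(hasFlag(p, Pattern.CASE_INSENSITIVE)); // flag was set
        System.out.println(hasFlag(p, Pattern.DOTALL));           // flag was set
        System.out.println(hasFlag(p, Pattern.MULTILINE));        // flag was not set
    }
}
```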
    
public java.util.regex.Matcher matcher(java.lang.CharSequence input)
Returns a {@link Matcher} for the {@code Pattern} and a given input. The {@code Matcher} can be used to match the {@code Pattern} against the whole input, find occurrences of the {@code Pattern} in the input, or replace parts of the input.

param
input the input to process.
return
the resulting {@code Matcher}.
since
Android 1.0

        return new Matcher(this, input);
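
The three uses named above (matching the whole input, finding occurrences, and replacing parts of the input) might look as follows. This is only a sketch; the class {@code MatcherUsageSketch}, its methods, and the sample pattern are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
class MatcherUsageSketch {
    // Compiled once and shared; matches one or more ASCII digits.
    static final Pattern DIGITS = Pattern.compile("[0-9]+");

    // Matches the pattern against the whole input.
    static boolean isNumber(CharSequence input) {
        return DIGITS.matcher(input).matches();
    }

    // Collects every occurrence of the pattern in the input, in order.
    static List<String> findAll(CharSequence input) {
        List<String> found = new ArrayList<String>();
        Matcher m = DIGITS.matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    // Replaces every occurrence of the pattern with '#'.
    static String mask(CharSequence input) {
        return DIGITS.matcher(input).replaceAll("#");
    }

    public static void main(String[] args) {
        System.out.println(isNumber("12345"));
        System.out.println(findAll("a1b22c333"));
        System.out.println(mask("a1b22c333"));
    }
}
```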
    
public static boolean matches(java.lang.String regex, java.lang.CharSequence input)
Tries to match a given regular expression against a given input. This is actually nothing but a convenience method that compiles the regular expression into a {@code Pattern}, builds a {@link Matcher} for it, and then does the match. If the same regular expression is used for multiple operations, it is recommended to compile it into a {@code Pattern} explicitly and request a reusable {@code Matcher}.

param
regex the regular expression.
param
input the input to process.
return
true if and only if the {@code Pattern} matches the input.
see
Pattern#compile(java.lang.String, int)
see
Matcher#matches()
since
Android 1.0

        return new Matcher(new Pattern(regex, 0), input).matches();
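
The recommendation to compile once and reuse the {@code Matcher} could be followed roughly like this. The sketch assumes a hypothetical class {@code ReusablePatternSketch}; it reuses the greeting pattern from the class description and resets a single {@code Matcher} for each input.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
class ReusablePatternSketch {
    // Compiled once, instead of once per call as Pattern.matches() does.
    private static final Pattern GREETING = Pattern.compile("Hello, A[a-z]*!");

    // Counts the inputs that match, reusing one Matcher for all of them.
    static int countMatches(String[] inputs) {
        Matcher m = GREETING.matcher("");
        int count = 0;
        for (String input : inputs) {
            if (m.reset(input).matches()) {
                count++;
            }
        }
        return count;
    }

    // Same result, but compiles a new Pattern and Matcher for every input.
    static int countMatchesOnTheFly(String[] inputs) {
        int count = 0;
        for (String input : inputs) {
            if (Pattern.matches("Hello, A[a-z]*!", input)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String[] inputs = { "Hello, Android!", "Hello, Robot!", "Hello, Alice!" };
        System.out.println(countMatches(inputs));
        System.out.println(countMatchesOnTheFly(inputs));
    }
}
```

Both methods return the same count; the first avoids recompiling the expression for each input.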
    
public java.lang.String pattern()
Returns the regular expression that was compiled into this {@code Pattern}.

return
the regular expression.
since
Android 1.0

        return pattern;
    
public static java.lang.String quote(java.lang.String s)
Quotes a given string using "\Q" and "\E", so that all other meta-characters lose their special meaning. If the string is used for a {@code Pattern} afterwards, it can only be matched literally.

param
s the string to quote.
return
the quoted string.
since
Android 1.0

        StringBuffer sb = new StringBuffer().append("\\Q");
        int apos = 0;
        int k;
        while ((k = s.indexOf("\\E", apos)) >= 0) {
            sb.append(s.substring(apos, k + 2)).append("\\\\E\\Q");
            apos = k + 2;
        }

        return sb.append(s.substring(apos)).append("\\E").toString();
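
A short sketch of how quoting changes matching, including an input that itself contains "\E". The class {@code QuoteSketch} and the method {@code matchesLiterally} are hypothetical names introduced for this example.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
class QuoteSketch {
    // Quotes the literal and matches it against the whole input.
    static boolean matchesLiterally(String literal, String input) {
        return Pattern.compile(Pattern.quote(literal)).matcher(input).matches();
    }

    public static void main(String[] args) {
        // The quoted form wraps the string in \Q ... \E.
        System.out.println(Pattern.quote("1+1=2")); // prints \Q1+1=2\E

        // Unquoted, '+' is a quantifier, so the text does not match itself.
        System.out.println(Pattern.matches("1+1=2", "1+1=2"));
        // Quoted, the text matches itself and nothing else.
        System.out.println(matchesLiterally("1+1=2", "1+1=2"));
        System.out.println(matchesLiterally("1+1=2", "11=2"));

        // A string containing "\E" is still matched literally.
        System.out.println(matchesLiterally("a\\Eb", "a\\Eb"));
    }
}
```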
    
private void readObject(java.io.ObjectInputStream s)
Provides serialization support.

        s.defaultReadObject();

        compileImpl(pattern, flags);
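
Since the native handle is transient, the expression is recompiled when a {@code Pattern} is deserialized. From the caller's side, a round trip through Java serialization might look like the sketch below. The class {@code PatternSerializationSketch} and the method {@code roundTrip} are hypothetical; checked exceptions are wrapped only to keep the example short.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
class PatternSerializationSketch {
    // Serializes the pattern to a byte array and reads it back.
    static Pattern roundTrip(Pattern original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            ObjectOutputStream out = new ObjectOutputStream(bytes);
            out.writeObject(original);
            out.close();

            ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()));
            try {
                return (Pattern) in.readObject();
            } finally {
                in.close();
            }
        } catch (IOException e) {
            throw new IllegalStateException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Pattern copy = roundTrip(Pattern.compile("a+", Pattern.CASE_INSENSITIVE));
        // The regular expression and the flags survive the round trip.
        System.out.println(copy.pattern());
        System.out.println(copy.matcher("AAA").matches());
    }
}
```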
    
public java.lang.String[] split(java.lang.CharSequence inputSeq, int limit)
Splits the given input sequence around occurrences of the {@code Pattern}. The function first determines all occurrences of the {@code Pattern} inside the input sequence. It then builds an array of the "remaining" strings before, in-between, and after these occurrences. An additional parameter determines the maximal number of entries in the resulting array and the handling of trailing empty strings.

param
inputSeq the input sequence.
param
limit Determines the maximal number of entries in the resulting array.
  • For limit > 0, it is guaranteed that the resulting array contains at most limit entries. The last entry holds the unsplit remainder of the input.
  • For limit < 0, the length of the resulting array is exactly the number of occurrences of the {@code Pattern} +1. All entries are included.
  • For limit == 0, the length of the resulting array is at most the number of occurrences of the {@code Pattern} +1. Empty strings at the end of the array are not included.
return
the resulting array.
since
Android 1.0

        int maxLength = limit <= 0 ? Integer.MAX_VALUE : limit;

        String input = inputSeq.toString();
        ArrayList<String> list = new ArrayList<String>();

        Matcher matcher = new Matcher(this, inputSeq);
        int savedPos = 0;
        
        // Add text preceding each occurrence, if enough space. Only do this for
        // non-empty input sequences, because otherwise we'd add the "trailing
        // empty string" twice.
        if (inputSeq.length() != 0) {
            while(matcher.find() && list.size() + 1 < maxLength) {
                list.add(input.substring(savedPos, matcher.start()));
                savedPos = matcher.end();
            }
        }
        
        // Add trailing text if enough space.
        if (list.size() < maxLength) {
            if (savedPos < input.length()) {
                list.add(input.substring(savedPos));
            } else {
                list.add("");
            }
        }
        
        // Remove trailing empty strings, if limit == 0 is requested.
        if (limit == 0) {
            int i = list.size() - 1;
            // Don't remove 1st element, since array must not be empty.
            while(i > 0 && "".equals(list.get(i))) {
                list.remove(i);
                i--;
            }
        }
        
        return list.toArray(new String[list.size()]);
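
The three cases for {@code limit} can be compared on one input with trailing separators. This is a minimal sketch; the class {@code SplitLimitSketch}, its {@code split} helper, and the sample input are assumptions made for the example.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
class SplitLimitSketch {
    static final Pattern COMMA = Pattern.compile(",");

    // Splits the input around commas using the given limit.
    static String[] split(String input, int limit) {
        return COMMA.split(input, limit);
    }

    public static void main(String[] args) {
        String input = "a,b,c,,"; // four separators, two of them trailing

        // limit > 0: at most 2 entries, the second holds the rest: "a" and "b,c,,".
        System.out.println(Arrays.toString(split(input, 2)));
        // limit < 0: occurrences + 1 = 5 entries, trailing empty strings kept.
        System.out.println(Arrays.toString(split(input, -1)));
        // limit == 0: trailing empty strings dropped, leaving "a", "b", "c".
        System.out.println(Arrays.toString(split(input, 0)));
    }
}
```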
    
public java.lang.String[] split(java.lang.CharSequence input)
Splits a given input around occurrences of a regular expression. This is a convenience method that is equivalent to calling the method {@link #split(java.lang.CharSequence, int)} with a limit of 0.

param
input the input sequence.
return
the resulting array.
since
Android 1.0

        return split(input, 0);
    
public java.lang.String toString()

        return pattern;