jregex
Class Pattern

java.lang.Object
  |
  +--jregex.Pattern
All Implemented Interfaces:
REFlags, java.io.Serializable
Direct Known Subclasses:
PathPattern, WildcardPattern

public class Pattern
extends java.lang.Object
implements java.io.Serializable, REFlags

A handle for a precompiled regular expression.
To match a regular expression myExpr against a text myText, first create a Pattern object:

 Pattern p=new Pattern(myExpr);
 
then obtain a Matcher object:
 Matcher matcher=p.matcher(myText);
 
The latter is an automaton that actually performs a search. It provides the following methods:
  • search for matching substrings: matcher.find() or matcher.findAll();
  • test whether the text matches the whole pattern: matcher.matches();
  • test whether the text matches the beginning of the pattern: matcher.matchesPrefix();
  • search with custom options: matcher.find(int options)
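A minimal sketch of the Pattern/Matcher workflow described above: compile once, obtain a Matcher for the text, and call find() until it returns false. It assumes jregex is on the classpath and uses only the constructor and methods listed on this page plus MatchResult.group(int); the names FindAllDemo and findAll are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

import jregex.Matcher;
import jregex.Pattern;

class FindAllDemo {
    // Returns every substring of text that matches regex, in order of occurrence.
    static List<String> findAll(String regex, String text) {
        Pattern p = new Pattern(regex);      // compile once
        Matcher m = p.matcher(text);         // automaton bound to this text
        List<String> found = new ArrayList<>();
        while (m.find()) {                   // advance to the next match
            found.add(m.group(0));           // group 0 is the whole match
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findAll("\\d+", "order 66, room 101")); // [66, 101]
    }
}
```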

    Flags
    Flags (see the REFlags interface) change the meaning of some regular expression elements at compile time. These flags may be passed either as a string (see Pattern(String,String)) or as a bitwise OR of:

  • REFlags.IGNORE_CASE - enables case insensitivity
  • REFlags.MULTILINE - makes "^" and "$" match at the start and end of every line;
  • REFlags.DOTALL - makes "." also match end-of-line characters ('\r' and '\n' in ASCII);
  • REFlags.IGNORE_SPACES - literal spaces in the expression are ignored, for better readability;
  • REFlags.UNICODE - the predefined classes ('\w', '\d', etc.) are interpreted according to Unicode;
  • REFlags.XML_SCHEMA - permits the XML Schema regular expression syntax extensions.
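The two ways of passing flags should be interchangeable. The sketch below, assuming jregex is on the classpath, compiles the same expression once with REFlags.IGNORE_CASE and once with the string "i"; FlagFormsDemo and its methods are hypothetical helpers.

```java
import jregex.Pattern;
import jregex.REFlags;

class FlagFormsDemo {
    // Case-insensitive match using the bitwise (int) form of the flags.
    static boolean viaInt(String regex, String text) {
        return new Pattern(regex, REFlags.IGNORE_CASE).matches(text);
    }

    // The same match using the Perl-style string form: "i" corresponds to IGNORE_CASE.
    static boolean viaString(String regex, String text) {
        return new Pattern(regex, "i").matches(text);
    }

    public static void main(String[] args) {
        System.out.println(viaInt("hello", "HeLLo"));               // true
        System.out.println(viaString("hello", "HeLLo"));            // true
        System.out.println(new Pattern("hello").matches("HeLLo"));  // false: default flags are case-sensitive
    }
}
```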

    Multithreading
    Pattern instances are thread-safe, i.e. the same Pattern object may be used by any number of threads simultaneously. Matcher objects, on the other hand, are NOT thread-safe, so each thread must obtain and use its own Matcher from a shared Pattern instance.
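One way to follow this rule is to keep a single compiled Pattern in a static field and create a fresh Matcher inside each task. This sketch uses a standard java.util.concurrent thread pool; SharedPatternDemo, countWords and countAll are illustrative names, and the pool size of 4 is arbitrary.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

import jregex.Matcher;
import jregex.Pattern;

class SharedPatternDemo {
    // One compiled Pattern, shared by every thread (Pattern is thread-safe).
    static final Pattern WORD = new Pattern("\\w+");

    // Each call obtains its own Matcher, so no Matcher is ever shared between threads.
    static int countWords(String text) {
        Matcher m = WORD.matcher(text);
        int n = 0;
        while (m.find()) n++;
        return n;
    }

    // Counts words in each input on a small thread pool; results keep the input order.
    static List<Integer> countAll(List<String> inputs) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (String s : inputs) {
                futures.add(pool.submit(() -> countWords(s)));
            }
            List<Integer> counts = new ArrayList<>();
            for (Future<Integer> f : futures) counts.add(f.get());
            return counts;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(countAll(List.of("a b c", "one two", ""))); // [3, 2, 0]
    }
}
```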

    See Also:
    REFlags, Matcher, Matcher.setTarget(java.lang.String), Matcher.setTarget(java.lang.String,int,int), Matcher.setTarget(char[],int,int), Matcher.setTarget(java.io.Reader,int), MatchResult, MatchResult.group(int), MatchResult.start(int), MatchResult.end(int), MatchResult.length(int), MatchResult.charAt(int,int), MatchResult.prefix(), MatchResult.suffix(), Serialized Form

    Fields inherited from interface jregex.REFlags
    DEFAULT, DOTALL, IGNORE_CASE, IGNORE_SPACES, MULTILINE, UNICODE, XML_SCHEMA
     
    Constructor Summary
    protected Pattern()
               
      Pattern(java.lang.String regex)
              Compiles an expression with default flags.
      Pattern(java.lang.String regex, int flags)
              Compiles a regular expression using REFlags.
      Pattern(java.lang.String regex, java.lang.String flags)
              Compiles a regular expression using Perl5-style flags.
     
    Method Summary
    protected  void compile(java.lang.String regex, int flags)
               
     int groupCount()
              Tells how many capturing groups this expression includes.
     java.lang.Integer groupId(java.lang.String name)
              Gets the numeric id for a group name.
     Matcher matcher()
              Returns a targetless matcher.
     Matcher matcher(char[] data, int start, int end)
              Returns a matcher for a specified region.
     Matcher matcher(MatchResult res, int groupId)
              Returns a matcher for a match result (in a performance-friendly way).
     Matcher matcher(MatchResult res, java.lang.String groupName)
              Same as above, but with a symbolic group name.
     Matcher matcher(java.io.Reader text, int length)
              Returns a matcher taking a text stream as target.
     Matcher matcher(java.lang.String s)
              Returns a matcher for a specified string.
     boolean matches(java.lang.String s)
              A shorthand for Pattern.matcher(String).matches().
     Replacer replacer(java.lang.String expr)
              Returns a replacer that replaces the pattern with the specified Perl-like expression.
     Replacer replacer(Substitution model)
              Returns a replacer that substitutes all occurrences of the pattern by applying a user-defined substitution model.
     boolean startsWith(java.lang.String s)
              A shorthand for Pattern.matcher(String).matchesPrefix().
     RETokenizer tokenizer(char[] data, int off, int len)
              Tokenizes a specified region using occurrences of the pattern as separators.
     RETokenizer tokenizer(java.io.Reader in, int length)
              Tokenizes text read from a stream using occurrences of the pattern as separators.
     RETokenizer tokenizer(java.lang.String text)
              Tokenizes a text using occurrences of the pattern as separators.
     java.lang.String toString_d()
              Returns a more or less readable representation of the bytecode for the pattern.
     java.lang.String toString()
               
     
    Methods inherited from class java.lang.Object
    clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
     

    Constructor Detail

    Pattern

    protected Pattern()
               throws PatternSyntaxException

    Pattern

    public Pattern(java.lang.String regex)
            throws PatternSyntaxException
    Compiles an expression with default flags.
    Parameters:
    regex - the Perl5-compatible regular expression string.
    Throws:
    PatternSyntaxException - if the argument is not valid Perl5 regex syntax.
    See Also:
    Pattern(java.lang.String,java.lang.String), Pattern(java.lang.String,int)

    Pattern

    public Pattern(java.lang.String regex,
                   java.lang.String flags)
            throws PatternSyntaxException
    Compiles a regular expression using Perl5-style flags. The flag string may consist of the letters 'i', 'm', 's', 'x', 'u', 'X' (case is significant) and a hyphen. The letters mean:
    • i - case insensitivity; corresponds to REFlags.IGNORE_CASE;
    • m - multiline treatment ('^' and '$' match at the beginning and end of each line); corresponds to REFlags.MULTILINE;
    • s - single-line treatment ('.' matches '\r' and '\n'); corresponds to REFlags.DOTALL;
    • x - extended whitespace comments (spaces and line breaks in the expression are ignored); corresponds to REFlags.IGNORE_SPACES;
    • u - predefined classes are regarded as belonging to Unicode; corresponds to REFlags.UNICODE. This may incur some performance penalty;
    • X - compatibility with XML Schema; corresponds to REFlags.XML_SCHEMA.
    Parameters:
    regex - the Perl5-compatible regular expression string.
    flags - the Perl5-compatible flags.
    Throws:
    PatternSyntaxException - if the argument is not valid Perl5 regex syntax. See REFlags.
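A short illustration of the string form, assuming the letter meanings listed above: 's' lets '.' cross line breaks, 'x' drops literal spaces from the expression, and letters can be combined in one string. StringFlagsDemo is a hypothetical class name, and jregex is assumed to be on the classpath.

```java
import jregex.Pattern;

class StringFlagsDemo {
    public static void main(String[] args) {
        // Default: '.' does not match a line break.
        System.out.println(new Pattern("a.b").matches("a\nb"));      // false
        // 's' (DOTALL): '.' also matches '\r' and '\n'.
        System.out.println(new Pattern("a.b", "s").matches("a\nb")); // true

        // 'x' (IGNORE_SPACES): the literal spaces around '-' are ignored.
        System.out.println(new Pattern("\\d{3} - \\d{4}", "x").matches("555-1234")); // true

        // Letters combine in a single string: 'i' (IGNORE_CASE) plus 's' (DOTALL).
        System.out.println(new Pattern("a.b", "is").matches("A\nB")); // true
    }
}
```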

    Pattern

    public Pattern(java.lang.String regex,
                   int flags)
            throws PatternSyntaxException
    Compiles a regular expression using REFlags. The flags parameter is a bitwise OR of the following values:
    • REFlags.IGNORE_CASE - case insensitivity; corresponds to the 'i' letter;
    • REFlags.MULTILINE - multiline treatment ('^' and '$' match at the beginning and end of each line); corresponds to 'm';
    • REFlags.DOTALL - single-line treatment ('.' matches '\r' and '\n'); corresponds to 's';
    • REFlags.IGNORE_SPACES - extended whitespace comments (spaces and line breaks in the expression are ignored); corresponds to 'x';
    • REFlags.UNICODE - predefined classes are regarded as belonging to Unicode; corresponds to 'u'. This may incur some performance penalty;
    • REFlags.XML_SCHEMA - compatibility with XML Schema; corresponds to 'X'.
    Parameters:
    regex - the Perl5-compatible regular expression string.
    flags - a bitwise OR of REFlags constants.
    Throws:
    PatternSyntaxException - if the argument is not valid Perl5 regex syntax. See REFlags.
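With the int form, flags are combined with the | operator. This sketch (illustrative names, jregex assumed on the classpath) pairs MULTILINE with IGNORE_CASE so that '^' anchors at every line start and 'a' also matches 'A'.

```java
import java.util.ArrayList;
import java.util.List;

import jregex.Matcher;
import jregex.Pattern;
import jregex.REFlags;

class IntFlagsDemo {
    // Finds words that start with 'a' or 'A' at the beginning of any line.
    static List<String> lineStartsWithA(String text) {
        int flags = REFlags.MULTILINE | REFlags.IGNORE_CASE; // same meaning as the string "mi"
        Matcher m = new Pattern("^a\\w*", flags).matcher(text);
        List<String> out = new ArrayList<>();
        while (m.find()) out.add(m.group(0));
        return out;
    }

    public static void main(String[] args) {
        System.out.println(lineStartsWithA("Apple pie\nbanana\navocado")); // [Apple, avocado]
    }
}
```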
    Method Detail

    compile

    protected void compile(java.lang.String regex,
                           int flags)
                    throws PatternSyntaxException

    groupCount

    public int groupCount()
    Tells how many capturing groups this expression includes.

    groupId

    public java.lang.Integer groupId(java.lang.String name)
    Gets the numeric id for a group name.
    Returns:
    the group id, or null if no group with that name exists.
    See Also:
    MatchResult.group(java.lang.String), MatchResult.isCaptured(java.lang.String)
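A sketch of looking up a named group. It assumes jregex's ({name}...) syntax for declaring named groups, which this page does not show, so check the jregex syntax documentation before relying on it. The group names year and month and the class name are illustrative.

```java
import jregex.Matcher;
import jregex.Pattern;

class GroupIdDemo {
    // ({name}...) declares a named group (assumed jregex syntax).
    static final Pattern DATE = new Pattern("({year}\\d{4})-({month}\\d{2})");

    public static void main(String[] args) {
        System.out.println(DATE.groupId("year") != null); // true: the name is declared
        System.out.println(DATE.groupId("day"));           // null: no such group

        Matcher m = DATE.matcher("due 2024-05");
        if (m.find()) {
            // Lookup by name via MatchResult.group(String)
            System.out.println(m.group("year") + "/" + m.group("month")); // 2024/05
            // Lookup by the numeric id that groupId returns
            System.out.println(m.group(DATE.groupId("month").intValue())); // 05
        }
    }
}
```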

    matches

    public boolean matches(java.lang.String s)
    A shorthand for Pattern.matcher(String).matches().
    Parameters:
    s - the target
    Returns:
    true if the entire target matches the pattern
    See Also:
    Matcher.matches(), Matcher.matches(String)
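matches() tests the whole target, unlike a find() search, which accepts a match anywhere in the text. The contrast below uses illustrative names and assumes jregex on the classpath.

```java
import jregex.Pattern;

class MatchesDemo {
    static final Pattern INT = new Pattern("-?\\d+");

    public static void main(String[] args) {
        System.out.println(INT.matches("-42"));              // true: the whole target matches
        System.out.println(INT.matches("42px"));             // false: "px" is left over
        System.out.println(INT.matcher("42px").find());      // true: find() only needs a substring
    }
}
```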

    startsWith

    public boolean startsWith(java.lang.String s)
    A shorthand for Pattern.matcher(String).matchesPrefix().
    Parameters:
    s - the target
    Returns:
    true if the entire target matches the beginning of the pattern
    See Also:
    Matcher.matchesPrefix()
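Because startsWith() asks whether the target matches the beginning of the pattern, one plausible use is rejecting input early, before it is complete. A minimal sketch with an illustrative class name, assuming jregex on the classpath:

```java
import jregex.Pattern;

class StartsWithDemo {
    public static void main(String[] args) {
        Pattern p = new Pattern("abcd");
        System.out.println(p.startsWith("ab")); // true: "ab" could still grow into a full match
        System.out.println(p.startsWith("x"));  // false: no continuation of "x" can match
    }
}
```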

    matcher

    public Matcher matcher()
    Returns a matcher without a target. A target must be supplied (see Matcher.setTarget) before searching.
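A targetless matcher is handy when one Matcher is reused for many strings within a single thread. The sketch below assumes that Matcher.setTarget(String), listed under See Also in the class description, replaces the target and restarts the search; ReuseMatcherDemo and countEach are illustrative names.

```java
import java.util.Arrays;

import jregex.Matcher;
import jregex.Pattern;

class ReuseMatcherDemo {
    // Counts matches in several strings, reusing one Matcher via setTarget(String).
    static int[] countEach(Pattern p, String... texts) {
        Matcher m = p.matcher();          // no target yet
        int[] counts = new int[texts.length];
        for (int i = 0; i < texts.length; i++) {
            m.setTarget(texts[i]);        // supply (or replace) the target before searching
            while (m.find()) counts[i]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] c = countEach(new Pattern("o"), "foo", "bar", "oboe");
        System.out.println(Arrays.toString(c)); // [2, 0, 2]
    }
}
```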

    matcher

    public Matcher matcher(java.lang.String s)
    Returns a matcher for a specified string.

    matcher

    public Matcher matcher(char[] data,
                           int start,
                           int end)
    Returns a matcher for a specified region.
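A sketch of searching only part of a char buffer, treating the region as data[start..end) as the parameter names suggest. RegionDemo and firstNumber are illustrative names, and jregex is assumed on the classpath.

```java
import jregex.Matcher;
import jregex.Pattern;

class RegionDemo {
    static final Pattern NUMBER = new Pattern("\\d+");

    // Returns the first run of digits inside the given region of data, or null if there is none.
    static String firstNumber(char[] data, int start, int end) {
        Matcher m = NUMBER.matcher(data, start, end);
        return m.find() ? m.group(0) : null;
    }

    public static void main(String[] args) {
        char[] data = "[x] 42 is the answer".toCharArray();
        System.out.println(firstNumber(data, 4, 6)); // 42
    }
}
```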

    matcher

    public Matcher matcher(MatchResult res,
                           int groupId)
    Returns a matcher for a match result (in a performance-friendly way). The groupId parameter specifies which group becomes the target.
    Parameters:
    groupId - which group is the target: either a positive integer (group id) or one of MatchResult.MATCH, MatchResult.PREFIX, MatchResult.SUFFIX, MatchResult.TARGET.
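A sketch of a nested search: the outer pattern locates a bracketed section, and a second matcher is created on group #1 of that match, so the inner search should see only the bracket contents. It assumes, as in jregex, that Matcher implements MatchResult; the patterns and names are illustrative.

```java
import jregex.Matcher;
import jregex.Pattern;

class NestedSearchDemo {
    static final Pattern BRACKETED = new Pattern("\\[(.*?)\\]"); // group #1 = text between brackets
    static final Pattern DIGIT = new Pattern("\\d");

    // Counts digits inside the first [...] section only.
    static int digitsInBrackets(String text) {
        Matcher outer = BRACKETED.matcher(text);
        if (!outer.find()) return 0;
        Matcher inner = DIGIT.matcher(outer, 1); // target is group #1 of the outer match
        int n = 0;
        while (inner.find()) n++;
        return n;
    }

    public static void main(String[] args) {
        System.out.println(digitsInBrackets("7 [1 2 3] 9")); // 3
    }
}
```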

    matcher

    public Matcher matcher(MatchResult res,
                           java.lang.String groupName)
    Same as above, but with a symbolic group name.
    Throws:
    NullPointerException - if there is no group with such name

    matcher

    public Matcher matcher(java.io.Reader text,
                           int length)
                    throws java.io.IOException
    Returns a matcher taking a text stream as its target. Note that this is not true POSIX-style stream matching: the whole text is read in advance and stored in a char array.
    Parameters:
    text - a text stream
    length - the number of characters to read from the stream; if length is -1, the whole stream is read in.
    Throws:
    java.io.IOException - indicates an IO problem
    OutOfMemoryError - if the stream is too long
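A sketch of matching against a stream, passing -1 so the whole Reader is read in first, as described above. A StringReader stands in for a real file or socket stream; ReaderDemo and its methods are illustrative, and the address pattern is deliberately simplistic.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

import jregex.Matcher;
import jregex.Pattern;

class ReaderDemo {
    // Simplistic address pattern, for illustration only.
    static final Pattern ADDRESS = new Pattern("\\w+@\\w+\\.\\w+");

    // Passing -1 makes the matcher read the whole stream before searching.
    static String firstAddress(Reader in) throws IOException {
        Matcher m = ADDRESS.matcher(in, -1);
        return m.find() ? m.group(0) : null;
    }

    // Convenience wrapper: reads from a StringReader and rethrows IOException unchecked.
    static String firstAddressIn(String text) {
        try (Reader in = new StringReader(text)) {
            return firstAddress(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(firstAddressIn("contact: admin@example.org today")); // admin@example.org
    }
}
```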

    replacer

    public Replacer replacer(java.lang.String expr)
    Returns a replacer that replaces the pattern with the specified Perl-like expression. Such a replacer substitutes every occurrence of the pattern with the evaluated expression ("$&" and "$0" are replaced by the whole match, "$1" by group #1, etc.). Example:
     String text="The quick brown fox jumped over the lazy dog";
     Pattern word=new Pattern("\\w+");
     System.out.println(word.replacer("[$&]").replace(text));
     //prints "[The] [quick] [brown] [fox] [jumped] [over] [the] [lazy] [dog]"
     Pattern swap=new Pattern("(fox|dog)(.*?)(fox|dog)");
     System.out.println(swap.replacer("$3$2$1").replace(text));
     //prints "The quick brown dog jumped over the lazy fox"
     Pattern scramble=new Pattern("(\\w+)(\\W+)(\\w+)");
     System.out.println(scramble.replacer("$3$2$1").replace(text));
     //prints "quick The fox brown over jumped lazy the dog"
     
    Parameters:
    expr - a Perl-like expression, where "$&" and "${&}" stand for the whole match, "$N" and "${N}" for group #N, and "${Foo}" for the named group Foo.
    See Also:
    Replacer
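The expression may also refer to named groups via "${Foo}". The sketch below assumes jregex's ({name}...) syntax for declaring the groups (not shown on this page) and swaps each key=value pair; the names are illustrative.

```java
import jregex.Pattern;

class NamedReplaceDemo {
    // Swaps the two sides of each key=value pair using named groups in the replacement.
    static String swapPairs(String text) {
        Pattern pair = new Pattern("({key}\\w+)=({val}\\w+)");
        return pair.replacer("${val}=${key}").replace(text);
    }

    public static void main(String[] args) {
        System.out.println(swapPairs("a=1, b=2")); // 1=a, 2=b
    }
}
```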

    replacer

    public Replacer replacer(Substitution model)
    Returns a replacer that substitutes all occurrences of the pattern by applying a user-defined substitution model.
    Parameters:
    model - a Substitution object that is in charge of match substitution
    See Also:
    Replacer
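A sketch of a user-defined model that upper-cases every word of four or more letters. It assumes that the Substitution interface declares a single method appendSubstitution(MatchResult, TextBuffer) and that TextBuffer has append(String); verify both against the Substitution and TextBuffer documentation. The class names are hypothetical.

```java
import jregex.MatchResult;
import jregex.Pattern;
import jregex.Substitution;
import jregex.TextBuffer;

class UpperCaseSubstitution implements Substitution {
    // Called once per match (assumed signature); writes the replacement text into dest.
    public void appendSubstitution(MatchResult match, TextBuffer dest) {
        dest.append(match.group(0).toUpperCase());
    }
}

class SubstitutionDemo {
    static String shout(String text) {
        return new Pattern("\\w{4,}").replacer(new UpperCaseSubstitution()).replace(text);
    }

    public static void main(String[] args) {
        System.out.println(shout("the quick brown fox")); // the QUICK BROWN fox
    }
}
```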

    tokenizer

    public RETokenizer tokenizer(java.lang.String text)
    Tokenizes a text using occurrences of the pattern as separators. Note that a series of adjacent matches is regarded as a single separator. Equivalent to new RETokenizer(Pattern,String).
    See Also:
    RETokenizer, RETokenizer.RETokenizer(jregex.Pattern,java.lang.String)
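A minimal tokenizing sketch. It assumes RETokenizer exposes hasMore() and nextToken() for iteration (see the RETokenizer documentation); per the note above, the doubled comma should act as a single separator. Names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

import jregex.Pattern;
import jregex.RETokenizer;

class TokenizeDemo {
    // Splits text on commas; adjacent commas count as one separator.
    static List<String> tokens(String text) {
        RETokenizer tok = new Pattern(",").tokenizer(text);
        List<String> out = new ArrayList<>();
        while (tok.hasMore()) out.add(tok.nextToken());
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("a,b,,c")); // [a, b, c]
    }
}
```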

    tokenizer

    public RETokenizer tokenizer(char[] data,
                                 int off,
                                 int len)
    Tokenizes a specified region using occurrences of the pattern as separators. Note that a series of adjacent matches is regarded as a single separator. Equivalent to new RETokenizer(Pattern,char[],int,int).
    See Also:
    RETokenizer, RETokenizer.RETokenizer(jregex.Pattern,char[],int,int)
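This overload fits a partially filled read buffer: only the first filled characters are tokenized and the unused tail is ignored. The sketch assumes the same hasMore()/nextToken() iteration on RETokenizer; names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

import jregex.Pattern;
import jregex.RETokenizer;

class BufferTokenizeDemo {
    // Tokenizes only the first `filled` chars of buf on runs of whitespace.
    static List<String> words(char[] buf, int filled) {
        RETokenizer tok = new Pattern("\\s+").tokenizer(buf, 0, filled);
        List<String> out = new ArrayList<>();
        while (tok.hasMore()) out.add(tok.nextToken());
        return out;
    }

    public static void main(String[] args) {
        char[] buf = new char[32];                 // unused tail stays '\0'
        String chunk = "alpha beta gamma";
        chunk.getChars(0, chunk.length(), buf, 0); // copy the data into the buffer
        System.out.println(words(buf, chunk.length())); // [alpha, beta, gamma]
    }
}
```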

    tokenizer

    public RETokenizer tokenizer(java.io.Reader in,
                                 int length)
                          throws java.io.IOException
    Tokenizes text read from a stream using occurrences of the pattern as separators. Note that a series of adjacent matches is regarded as a single separator. Equivalent to new RETokenizer(Pattern,Reader,int).
    See Also:
    RETokenizer, RETokenizer.RETokenizer(jregex.Pattern,java.io.Reader,int)
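A sketch of the Reader overload, passing the exact number of characters to read. A StringReader stands in for a real stream, the checked IOException is wrapped for convenience, and the hasMore()/nextToken() iteration on RETokenizer is assumed. Names are illustrative.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

import jregex.Pattern;
import jregex.RETokenizer;

class ReaderTokenizeDemo {
    // Reads text.length() chars from a Reader and splits them on semicolons.
    static List<String> fields(String text) {
        try {
            RETokenizer tok = new Pattern(";").tokenizer(new StringReader(text), text.length());
            List<String> out = new ArrayList<>();
            while (tok.hasMore()) out.add(tok.nextToken());
            return out;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(fields("x;y;z")); // [x, y, z]
    }
}
```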

    toString

    public java.lang.String toString()
    Overrides:
    toString in class java.lang.Object

    toString_d

    public java.lang.String toString_d()
    Returns a more or less readable representation of the bytecode for the pattern.