Pattern p=new Pattern("(\\d\\d):(\\d\\d):(\\d\\d)");
Replacer r=p.replacer("[hour=$1, minute=$2, second=$3]");
//see also the constructor Replacer(Pattern,String,boolean)
String result=r.replace("the time is 10:30:01");
//gives "the time is [hour=10, minute=30, second=01]"
You can also append the result to a StringBuffer:
StringBuffer sb=...;
Replacer r=...;
r.replace("the input string",sb);
//now sb contains the result of replacement
Note that the Replacer class has many similar methods for
various input types.
If a task requires flexibility that Perl-like substitution expressions cannot provide,
you can use a custom implementation of the Substitution
interface. For example:
Pattern p=new Pattern("(\\d+)\\+(\\d+)");
Substitution add=new Substitution(){
public void appendSubstitution(MatchResult match,TextBuffer dest){
int a=Integer.parseInt(match.group(1));
int b=Integer.parseInt(match.group(2));
dest.append(String.valueOf(a+b));
}
};
Replacer r=p.replacer(add);
String result=r.replace("1+2 3+4");
//"3 7"
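For comparison, the same computed substitution can be sketched with the JDK's own java.util.regex package. This is not jregex code: it swaps the Substitution interface for the standard Matcher.appendReplacement/appendTail loop, and the class and method names (ComputedReplacement, sumAll) are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; uses java.util.regex, not jregex.
class ComputedReplacement {
    // Replaces every "a+b" occurrence in the input with the computed sum.
    static String sumAll(String input) {
        Pattern p = Pattern.compile("(\\d+)\\+(\\d+)");
        Matcher m = p.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int a = Integer.parseInt(m.group(1));
            int b = Integer.parseInt(m.group(2));
            // quoteReplacement keeps '$' and '\' in the computed text literal
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(a + b)));
        }
        m.appendTail(out); // copy the text after the last match
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(sumAll("1+2 3+4")); // prints "3 7"
    }
}
```

The loop plays the role that appendSubstitution plays in jregex: the code, not a replacement template, decides what is written for each match.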
Tokenizing a string with jregex's RETokenizer class is very similar to using the standard StringTokenizer class.
The only difference is that RETokenizer uses a pattern occurrence as the token delimiter:
String theText=" Some --- strings --- separated by \"---\"";
Pattern p=new Pattern("(?<!\")---(?!\")"); //three hyphens not enclosed in quotemarks
RETokenizer tok=new RETokenizer(p,theText);
while(tok.hasMore())System.out.println("Next token: "+tok.nextToken());
//prints the following tokens, one per line, each prefixed with "Next token: ":
// Some
// strings
// separated by "---"
RETokenizer also has a split() method that returns all the tokens as a String array.
There is an important issue regarding how RETokenizer handles several adjacent delimiters,
as it can treat them either as a single delimiter or as separate delimiters with empty tokens between them.
You can control this behaviour using the RETokenizer.setEmptyEnabled(boolean) method.
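The two ways of treating adjacent delimiters can be illustrated with the JDK's java.util.regex.Pattern.split. This is only an analogy under stated assumptions: it does not use RETokenizer or setEmptyEnabled, and the class and method names (AdjacentDelimiters, keepEmpty, collapse) are invented for the sketch.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical illustration using java.util.regex, not jregex.
class AdjacentDelimiters {
    // Each "---" is its own delimiter, so an empty token appears between neighbours.
    static String[] keepEmpty(String text) {
        return Pattern.compile("---").split(text);
    }

    // A run of adjacent "---" delimiters is treated as a single delimiter.
    static String[] collapse(String text) {
        return Pattern.compile("(?:---)+").split(text);
    }

    public static void main(String[] args) {
        String text = "a---b------c";
        System.out.println(Arrays.toString(keepEmpty(text))); // [a, b, , c]
        System.out.println(Arrays.toString(collapse(text)));  // [a, b, c]
    }
}
```

Whether empty tokens are wanted depends on the data: for column-like input an empty token usually means "missing value", while for free text it is usually noise.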
As a descendant of jregex.Pattern, PathPattern inherits all its functionality,
allowing you to search and match path strings.
For example, the pattern */*.java matches the strings foo/Bar.java and bar\Foo.java (on Windows), but does not
match FooBar, FooBar.java, or foo/bar/FooBar.java.
Note that each wildcard occupies a capturing group in the pattern.
Usage:
String myPath=...;
Pattern p=new PathPattern("**/*"); //the "**" is the first group, the "*" is the second
Matcher m=p.matcher(myPath);
if(m.matches()){
System.out.println("directory: "+m.group(1));
System.out.println("file name: "+m.group(2));
}
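The "*/*.java" matching rules described above can be cross-checked against the JDK's glob support in java.nio.file. This is a different mechanism from PathPattern (it offers no capturing groups), and the GlobCheck class and its matches method are hypothetical names used only for this sketch.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

// Hypothetical helper; uses the JDK glob syntax, not jregex.PathPattern.
class GlobCheck {
    // Returns true if the whole path matches the given glob expression.
    static boolean matches(String glob, String path) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        return matcher.matches(Paths.get(path));
    }

    public static void main(String[] args) {
        // A single "*" does not cross a directory boundary.
        System.out.println(matches("*/*.java", "foo/Bar.java"));        // true
        System.out.println(matches("*/*.java", "FooBar.java"));         // false
        System.out.println(matches("*/*.java", "foo/bar/FooBar.java")); // false
    }
}
```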
Forces '^' to match at the beginning of each line and '$' to match at the end of each line
Disabled
REFlags.DOTALL
"s"
Forces '.' (dot) to match line separator characters as well
Disabled
REFlags.IGNORE_SPACES
"x"
Forces the compiler to ignore spaces in an expression, so a pattern can be spaced out for better readability
Disabled
REFlags.UNICODE
"u"
Forces the compiler to treat \w, \d, \s, etc. as Unicode-aware classes
Disabled
REFlags.XML_SCHEMA
"X"
Enables compatibility with XML schema regular expressions
Disabled
Flags passed as a string look like "imsxuX-imsxuX", where the characters before the hyphen enable the corresponding flags and the characters after the hyphen disable them.
You can also embed such a string in a pattern using the "(?imsxuX-imsxuX)" and "(?imsxuX-imsxuX:)" constructs. The first one sets flags for the rest of the pattern,
while the second sets flags only for the enclosed part (between the colon and the closing parenthesis).
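The difference between the two embedded forms can be seen in a small sketch. It uses java.util.regex, which supports the same "(?i)" and "(?i:...)" syntax for its own flag letters; jregex is assumed to behave the same way for the "i" flag, as described above, but the code does not exercise jregex itself. The class and method names (EmbeddedFlags, full) are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical illustration using java.util.regex, not jregex.
class EmbeddedFlags {
    // Returns true if the whole input matches the regular expression.
    static boolean full(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        // "(?i)" applies to the rest of the pattern
        System.out.println(full("(?i)abc", "ABC"));        // true
        // "(?i:...)" applies only to the enclosed part
        System.out.println(full("(?i:a)bc", "Abc"));       // true
        System.out.println(full("(?i:a)bc", "ABC"));       // false
        // "(?-i)" switches the flag off again for the remainder
        System.out.println(full("(?i)a(?-i)bc", "aBC"));   // false
    }
}
```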
int gc=myPattern.groupCount();
System.out.println("Group count: "+gc);
test whether some group is captured;
find out where some group starts and ends, and how long it is;
retrieve its contents:
for(int i=0;i<gc;i++){
System.out.println("Group #"+i+":");
if(!myMatcher.isCaptured(i)){
System.out.println(" Not captured, taking next..");
continue;
}
System.out.println(" starts at "+myMatcher.start(i));
System.out.println(" ends at "+myMatcher.end(i));
System.out.println(" length: "+myMatcher.length(i));
System.out.println(" contents: \""+myMatcher.group(i)+"\"");
}
Note that all methods for retrieving information about a match are grouped in
the MatchResult interface, which is implemented by the Matcher class.
Example
Suppose we have a string myString (which actually is "The time is 15:20:45"),
and we suspect it contains a time in "hh:mm:ss" format. If so, we want to know the minute.
Let's begin:
Pattern hms=new Pattern("\\b(\\d\\d):(\\d\\d):(\\d\\d)\\b"); // "\\b" tags a word boundary
Matcher m=hms.matcher(myString);
if(m.find()){
System.out.println("Found!");
String grp2=m.group(2);
int minute=Integer.parseInt(grp2);
System.out.println("The minute is "+minute);
//prints "The minute is 20"
}
else{
System.out.println("Not found :((");
}
Note 1. The first letter of a category name represents the whole family; for example, \p{L} represents all of Lu, Ll, Lt, Lm, Lo.
Note 2. To print a list of the currently supported names, run the jregex.CharacterClass class as a Java application (the ';' classpath separator shown is for Windows; use ':' on Unix-like systems):
java -cp .;[path to jregex.jar] jregex.CharacterClass >names.txt
REFlags.UNICODE/"u" flag
Turning on this flag in a pattern constructor forces the compiler to treat the corresponding Perl classes
(\d, \D, \w, \W, \s, \S, etc.) as Unicode classes. That is, \d
becomes the same as \p{N}, and so on.
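The effect of such a flag can be sketched with the JDK's java.util.regex, which has a comparable switch, Pattern.UNICODE_CHARACTER_CLASS. This is an analogy rather than jregex code, and the mapping is not identical: the JDK flag makes \d match Unicode decimal digits, whereas the text above says jregex maps \d to \p{N}. The class and method names (UnicodeDigits, isDigit) are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical illustration using java.util.regex, not jregex.
class UnicodeDigits {
    // Tests whether the whole input is a single \d character,
    // with or without Unicode-aware predefined classes.
    static boolean isDigit(String s, boolean unicode) {
        int flags = unicode ? Pattern.UNICODE_CHARACTER_CLASS : 0;
        return Pattern.compile("\\d", flags).matcher(s).matches();
    }

    public static void main(String[] args) {
        String arabicIndicThree = "\u0663"; // ARABIC-INDIC DIGIT THREE
        System.out.println(isDigit("3", false));              // true
        System.out.println(isDigit(arabicIndicThree, false)); // false: \d is ASCII-only by default
        System.out.println(isDigit(arabicIndicThree, true));  // true: \d is Unicode-aware
    }
}
```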