Advanced features
The jregex library offers some interesting experimental features:
  • Named groups
  • Incomplete matching
  • Non-breaking search

    Named groups

    As you know (see Appendix B), a capturing group is a part of a pattern enclosed in parentheses (with no '?' following the opening one). Such a group captures the corresponding part of the input, and its contents can be obtained from one of MatchResult's methods by supplying the group number. Groups are normally numbered automatically, in the order in which their opening parentheses appear in the pattern.
    A problem arises when an application relies heavily on a large number of patterns. Patterns tend to change over time, and some changes alter the group order, which breaks both the backreferences inside the pattern and the hardcoded group numbers in the code. Such inconsistencies are very hard to find and make the code hard to maintain.
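    The renumbering hazard is easy to reproduce. The sketch below uses the standard java.util.regex package rather than jregex (an assumption made so that it runs without extra libraries); the class name GroupShiftDemo and the helper firstGroup are hypothetical. Adding an optional sign group in front of the pattern silently changes what group 1 means.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupShiftDemo {
    // Returns whatever group 1 captured, or null if the input does not match
    // (or if group 1 took no part in the match).
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String v1 = "(\\d\\d):(\\d\\d)";         // group 1 is the hour
        String v2 = "([+-])?(\\d\\d):(\\d\\d)";  // the new group pushes the hour to group 2

        System.out.println(firstGroup(v1, "12:30"));  // prints "12"
        System.out.println(firstGroup(v2, "+12:30")); // prints "+"
    }
}
```

    Code that still asks for group 1 after the change compiles and runs, but it now receives the sign instead of the hour.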
    Named groups address this problem by letting you assign either a symbolic name or an explicit numeric id to a particular group.
    To assign a symbolic name to a group, put the name (a word that conforms to Java identifier syntax) in curly brackets right after the opening parenthesis:
       Pattern hms=new Pattern("({Hour}\\d\\d):({Minute}\\d\\d):({Second}\\d\\d)");
       //A group name is the word in curly brackets just after the opening parenthesis
       Matcher m=hms.matcher(someStr);
       if(m.matches()){
          System.out.println("Hour: "+m.group("Hour"));
          System.out.println("Minute: "+m.group("Minute"));
          System.out.println("Second: "+m.group("Second"));
       }
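
    For comparison, the standard java.util.regex package (Java 7 and later) supports the same idea with a different syntax: (?<Name>...) instead of ({Name}...). The sketch below is not jregex code; the class name NamedGroupsDemo and the helper part are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NamedGroupsDemo {
    // java.util.regex spells a named group as (?<Name>...)
    static final Pattern HMS =
        Pattern.compile("(?<Hour>\\d\\d):(?<Minute>\\d\\d):(?<Second>\\d\\d)");

    // Returns the named part of an hh:mm:ss string, or null if the input does not match.
    static String part(String input, String name) {
        Matcher m = HMS.matcher(input);
        return m.matches() ? m.group(name) : null;
    }

    public static void main(String[] args) {
        System.out.println("Hour: " + part("12:34:56", "Hour"));     // prints "Hour: 12"
        System.out.println("Minute: " + part("12:34:56", "Minute")); // prints "Minute: 34"
        System.out.println("Second: " + part("12:34:56", "Second")); // prints "Second: 56"
    }
}
```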
                      
    As you might guess, every symbolic name is backed by a numeric id. You can obtain it from the Pattern with the groupId(String) method and use it exactly like an ordinal group number:
       //hms and m are defined above
       int hourId=hms.groupId("Hour").intValue();
       System.out.println(hourId); //prints "1"
       System.out.println(m.group(hourId)); //prints an hour
                      
    If the word in curly brackets is a decimal number, the group is forced to have that numeric id, and no symbolic name is assigned:
       Pattern ordinal=new Pattern("(\\w+)-(\\w+)-(\\w+)");
       Pattern forced=new Pattern("({3}\\w+)-({2}\\w+)-({1}\\w+)");
       //A forced id is a number in curly brackets just after the opening parenthesis.
       //Apart from the group numbering, the two patterns above are equivalent
       
       String str="one-two-three";
       Matcher om=ordinal.matcher(str);
       Matcher fm=forced.matcher(str);
       om.find();//surely finds
       fm.find();//surely finds
       System.out.println(om.group(1)+","+om.group(2)+","+om.group(3));
        //prints "one,two,three"
        
       System.out.println(fm.group(1)+","+fm.group(2)+","+fm.group(3));
        //prints "three,two,one"
        
       System.out.println(forced.groupId("1"));
        //prints "null" because no such symbolic name was assigned
        
       System.out.println(fm.group("1"));
        //throws IllegalArgumentException stating that no such symbolic name was assigned
                      
    Try this right now in a demo applet using the following pattern and target:
  • pattern: ({5}\w+)-({3}\w+)-({1}\w+)
  • target: one-two-three

    Incomplete matching

    This feature lets you find out whether a string could still match by examining only its beginning. For example, while a string is being typed into a text field, you may want to reject further input as soon as the first few characters make a match impossible (see the working example).
    This problem can be solved with the matchesPrefix() method of the Matcher class (the Pattern class also has a shorthand method, Pattern.startsWith()).
       Pattern re=new Pattern("\\d\\d:\\d\\d"); //two digits + colon + two digits
       System.out.println("Pattern: "+re);
       
       Matcher m=re.matcher();
       test(m,"");
       test(m,"1");
       test(m,"12");
       test(m,"1:2");
       test(m,"12:");
       test(m,"12:1");
       test(m,"12:12");
       test(m,"12:123");
       test(m,"123");
       ...
       
    static void test(Matcher m,String s) throws Exception{
       m.setTarget(s);
       System.out.println("\""+s+"\" : "+m.matchesPrefix());
    }
    
    Output:
    
    Pattern: \d\d:\d\d
    "" : true
    "1" : true
    "12" : true
    "1:2" : false
    "12:" : true
    "12:1" : true
    "12:12" : true
    "12:123" : false
    "123" : false
                      
    You may see this feature in action here.
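
    A comparable check can be sketched with the standard java.util.regex package: after a failed matches(), Matcher.hitEnd() reports whether the matcher ran out of input, which means that more characters might still produce a match. This is an analogue of matchesPrefix(), not the jregex implementation; the class name PrefixCheck and the method couldMatch are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PrefixCheck {
    // True if s matches the pattern in full or could still grow into a match.
    static boolean couldMatch(Pattern p, String s) {
        Matcher m = p.matcher(s);
        // hitEnd() is consulted only when the full match has failed
        return m.matches() || m.hitEnd();
    }

    public static void main(String[] args) {
        Pattern re = Pattern.compile("\\d\\d:\\d\\d"); // two digits + colon + two digits
        String[] inputs = {"", "1", "12", "1:2", "12:", "12:1", "12:12", "12:123", "123"};
        for (String s : inputs) {
            System.out.println("\"" + s + "\" : " + couldMatch(re, s));
        }
    }
}
```

    For this particular pattern the sketch should give the same answers as the matchesPrefix() listing above; more complex patterns may behave differently in the two libraries.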

    Non-breaking search

    With non-breaking search you can find all possible occurrences of a pattern, including those that overlap or are nested. This is done by calling the Matcher's proceed() method instead of find():
    ...
    Matcher m=new Pattern("\\w+").matcher(" abc,de ");
    while(m.proceed()){
       System.out.println(m.toString());
    }
    ...
    
    Output:
    
    abc
    ab
    a
    bc
    b
    de
    d
    e
                      
    Note that find() would return only "abc" and "de".
    This feature is useful when you need to search by a rule that cannot be expressed completely as a regular expression. In that case, use a non-breaking pattern search as a primary filter and then apply the rule to each result, for example:
    //finding all odd numbers in a string
    ...
    Matcher m=new Pattern("\\d+").matcher("abcd 1234");
    while(m.proceed()){
       if(isOddNum(m)){
          System.out.println(m);
       }
    }
    static boolean isOddNum(MatchResult mr){
       int lastDigit=mr.charAt(mr.length()-1)-'0';
       return (lastDigit%2)==1;
    }
    ...
    Output:
    1
    123
    3
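
    The filter-after-search idea can also be sketched with the standard java.util.regex package, which has no proceed() method. The sketch below is a brute-force substitute, not the jregex algorithm: it tests every substring with region() and matches(), then applies the extra rule. The names OverlapSearch, allMatches and isOdd are hypothetical, and the order and exact set of results may differ from what proceed() reports.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OverlapSearch {
    // Collects every substring that matches p in full and also passes the extra rule.
    // Every (start, end) pair is tried, so this suits short inputs only.
    static List<String> allMatches(Pattern p, String input, Predicate<String> rule) {
        List<String> found = new ArrayList<>();
        Matcher m = p.matcher(input);
        for (int start = 0; start < input.length(); start++) {
            for (int end = input.length(); end > start; end--) {
                m.region(start, end); // restrict the matcher to input[start, end)
                if (m.matches() && rule.test(m.group())) {
                    found.add(m.group());
                }
            }
        }
        return found;
    }

    // The rule that the regular expression alone does not express: the last digit is odd.
    static boolean isOdd(String digits) {
        return (digits.charAt(digits.length() - 1) - '0') % 2 == 1;
    }

    public static void main(String[] args) {
        Pattern number = Pattern.compile("\\d+");
        System.out.println(allMatches(number, "abcd 1234", OverlapSearch::isOdd));
    }
}
```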
                      
    To keep the results from overlapping, call skip() after each successful match:
    Matcher m=new Pattern("\\d+").matcher("abcd 1234");
    while(m.proceed()){
       if(isOddNum(m)){
          System.out.println(m);
          m.skip();
       }
    }
    Output:
    1
    3
                      

    Copyright 2000-2002 S. A. Samokhodkin