Normative
Mn = Mark, Non-Spacing
Mc = Mark, Spacing Combining
Me = Mark, Enclosing
Nd = Number, Decimal Digit
Nl = Number, Letter
No = Number, Other
Zs = Separator, Space
Zl = Separator, Line
Zp = Separator, Paragraph
Cc = Other, Control
Cf = Other, Format
Cs = Other, Surrogate
Co = Other, Private Use
Cn = Other, Not Assigned
Informative
Lu = Letter, Uppercase
Ll = Letter, Lowercase
Lt = Letter, Titlecase
Lm = Letter, Modifier
Lo = Letter, Other
Pc = Punctuation, Connector
Pd = Punctuation, Dash
Ps = Punctuation, Open
Pe = Punctuation, Close
*Pi = Punctuation, Initial quote
*Pf = Punctuation, Final quote
Po = Punctuation, Other
Sm = Symbol, Math
Sc = Symbol, Currency
Sk = Symbol, Modifier
So = Symbol, Other
*Unsupported by Java (and hence unsupported by jregex).
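As an illustration of how the category abbreviations above are typically used, here is a minimal sketch. It uses the JDK's java.util.regex as a stand-in and assumes jregex accepts the same \p{...} tokens for categories. The class and method names (CategoryDemo, allInCategory) are hypothetical and not part of jregex.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; java.util.regex is used here as a stand-in for jregex.
final class CategoryDemo {
    // Returns true when the text is non-empty and every character
    // belongs to the given general category (e.g. "Lu", "Nd").
    static boolean allInCategory(String category, String text) {
        return Pattern.matches("\\p{" + category + "}+", text);
    }

    public static void main(String[] args) {
        System.out.println(allInCategory("Lu", "ABC"));  // uppercase letters
        System.out.println(allInCategory("Nd", "2024")); // decimal digits
        System.out.println(allInCategory("Ll", "abC"));  // contains an uppercase letter
    }
}
```

The example sticks to categories that are not marked with an asterisk in the list above.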
Block: "A grouping of related characters within the Unicode encoding space. A block may contain unassigned positions, which are reserved."
A list of Unicode blocks along with their boundaries is available here (local copy) and
here (at www.unicode.org).
How to use these names in patterns
First, remove all spaces from the block name;
second, prepend "In" to the result, so:
'Basic Latin' is used as \p{InBasicLatin},
'Latin-1 Supplement' is used as \p{InLatin-1Supplement},
'Cyrillic' is used as \p{InCyrillic},
'Armenian' is used as \p{InArmenian},
etc.
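The two steps above can be sketched in code. This is a rough illustration, not jregex's own implementation: it builds the token by hand and checks it against the JDK's java.util.regex, which is assumed here to accept the same \p{In...} block tokens. The names BlockNameDemo, blockToken and allInBlock are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; java.util.regex is used here as a stand-in for jregex.
final class BlockNameDemo {
    // Step 1: remove all spaces from the block name.
    // Step 2: prepend "In" and wrap the result in \p{...}.
    static String blockToken(String blockName) {
        return "\\p{In" + blockName.replace(" ", "") + "}";
    }

    // Returns true when the text is non-empty and every character
    // falls inside the named block.
    static boolean allInBlock(String blockName, String text) {
        return Pattern.matches(blockToken(blockName) + "+", text);
    }

    public static void main(String[] args) {
        System.out.println(blockToken("Basic Latin"));
        System.out.println(blockToken("Latin-1 Supplement"));
        // U+00E9 (e with acute accent) lies in the Latin-1 Supplement block.
        System.out.println(allInBlock("Latin-1 Supplement", "\u00e9"));
        // U+041F, U+0440, U+0438 are Cyrillic letters.
        System.out.println(allInBlock("Cyrillic", "\u041f\u0440\u0438"));
        // Plain ASCII text is not in the Cyrillic block.
        System.out.println(allInBlock("Cyrillic", "abc"));
    }
}
```

Only spaces are removed; hyphens in names such as 'Latin-1 Supplement' are kept, matching the examples listed above.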