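Some text-matching requirements are too complex for simple String methods. We use java.util.regex, and its Pattern class, for these.
With Regex, we can use Pattern.matches for simple syntax, but this method is slower because it compiles the pattern on every call. For speed, we first use Pattern.compile and then reuse the compiled pattern with the Matcher class.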
Pattern.matches example. We call Pattern.matches in a loop. Its first argument is the regular expression's pattern. It also accepts the string we want to test for matches.
And: It returns a boolean. If the entire string matches the pattern, this value equals true. For groups, we need to instead use a Matcher.

Based on: Java 7

Java program that uses Pattern.matches

import java.util.regex.Pattern;

public class Program {
    public static void main(String[] args) {

        // Some strings to test.
        String[] inputs = { "dog", "dance", "cat", "dirt" };

        // Loop over strings and test them.
        for (String input : inputs) {
            boolean b = Pattern.matches("d.+", input);
            System.out.println(b);
        }
    }
}

Output

true
true
false
true

Pattern

d     The lowercase letter d.
.+    One or more characters of any type.
Pattern.compile and Matcher. Next we learn a faster way to match regular expressions. We use Pattern.compile to create a compiled pattern object.
Then: We call the matcher() method on the pattern instance. This returns a Matcher class instance.
Matches: Finally the matches method is used. This returns true if the entire input string matches the compiled pattern.

Java program that uses Pattern.compile, Matcher

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Program {
    public static void main(String[] args) {

        // Compile this pattern.
        Pattern pattern = Pattern.compile("num\\d\\d\\d");

        // See if this String matches.
        Matcher m = pattern.matcher("num123");
        if (m.matches()) {
            System.out.println(true);
        }

        // Check this String.
        m = pattern.matcher("num456");
        if (m.matches()) {
            System.out.println(true);
        }
    }
}

Output

true
true

Pattern

num       The letters "num" must be present.
\d\d\d    Three digit characters.
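The matches() method succeeds only when the whole input matches. Matcher also provides find(), which looks for the pattern anywhere in the input. The following is a minimal sketch contrasting the two; the class name MatchesVsFind, the helper names and the sample strings are illustrative assumptions, not part of the original program.

```java
import java.util.regex.Pattern;

class MatchesVsFind {
    // Same pattern as in the example above, compiled once and reused.
    static final Pattern PATTERN = Pattern.compile("num\\d\\d\\d");

    // True only if the entire input matches the pattern.
    static boolean matchesWhole(String input) {
        return PATTERN.matcher(input).matches();
    }

    // True if the pattern occurs anywhere inside the input.
    static boolean findsAnywhere(String input) {
        return PATTERN.matcher(input).find();
    }

    public static void main(String[] args) {
        String[] inputs = { "num123", "id: num123;", "num12" };
        for (String input : inputs) {
            System.out.println(input + " matches=" + matchesWhole(input)
                    + " find=" + findsAnywhere(input));
        }
    }
}
```

For "id: num123;" the matches() call returns false while find() returns true, because the pattern covers only part of that string.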
Capturing groups. Often regular expression patterns use groups to capture parts of strings. Here we use positional groups. We access them by their position (1, 2 or more).
Tip: We create the compiled Pattern and initialize the Matcher as usual. After calling matches() we access groups.

Java program that uses group method

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Program {
    public static void main(String[] args) {

        Pattern pattern = Pattern.compile("(\\d+)\\-(\\d+)");

        // Get matcher on this String.
        Matcher m = pattern.matcher("1234-5678");

        // If it matches, get and display group values.
        if (m.matches()) {
            String part1 = m.group(1);
            String part2 = m.group(2);
            System.out.println(part1);
            System.out.println(part2);
        }
    }
}

Output

1234
5678

Pattern

(\d+)    One or more digit characters, in a group.
\-       A hyphen.
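Positional groups can also be read from several occurrences in one string by calling find() in a loop instead of matches(). This is a small sketch under that assumption; the class FindGroups, the pairs helper and the "first:second" output format are hypothetical choices for illustration. The hyphen is left unescaped here, which is allowed outside a character class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindGroups {
    // Two groups of digits separated by a hyphen.
    static final Pattern RANGE = Pattern.compile("(\\d+)-(\\d+)");

    // Collect "first:second" for each number pair found in the text.
    static List<String> pairs(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = RANGE.matcher(text);
        // Each successful find() advances to the next occurrence.
        while (m.find()) {
            result.add(m.group(1) + ":" + m.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints [1234:5678, 10:20]
        System.out.println(pairs("1234-5678 and 10-20"));
    }
}
```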
Named groups. With names, we easily access specific groups from a matched pattern. We use angle brackets to name groups in the pattern. Then we call group() with a String name argument.
Java program that uses named groups

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Program {
    public static void main(String[] args) {

        // Specify a pattern with named groups.
        Pattern pattern = Pattern.compile("(?<first>..)x(?<second>..)");
        Matcher m = pattern.matcher("c3xp0");

        // Check for matches.
        // ... Then access named groups by their names.
        if (m.matches()) {
            String part1 = m.group("first");
            String part2 = m.group("second");
            System.out.println(part1);
            System.out.println(part2);
        }
    }
}

Output

c3
p0

Pattern

(?<first>..)     Group named "first" with two characters.
x                Letter x.
(?<second>..)    Group named "second" with two characters.
Pattern.quote. Characters must be escaped ("quoted") to avoid being seen as metacharacters. For example a star must be escaped to mean an asterisk, not a Kleene closure of "zero or more."
Pattern.quote: This method surrounds a String with \Q and \E. Between these markers, everything is treated as literal text.
So: We match the star as a star. Without Pattern.quote, we receive a PatternSyntaxException with the message "Dangling meta character."

Java program that uses Pattern.quote

import java.util.regex.Pattern;

public class Program {
    public static void main(String[] args) {

        // Quote this value.
        String value = "*star";
        String quote = Pattern.quote(value);
        System.out.println(value);
        System.out.println(quote);

        // Try matching with quoted value.
        boolean result1 = Pattern.matches(quote, "*star");
        System.out.println(result1);

        // This fails because it was not quoted.
        boolean result2 = Pattern.matches(value, "*star");
        System.out.println(result2);
    }
}

Output

*star
\Q*star\E
true
Exception in thread "main" java.util.regex.PatternSyntaxException:
    Dangling meta character '*' near index 0
*star
^
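Quoting is often most useful when a literal string (for example, user input) is combined with ordinary regex syntax. The sketch below assumes a hypothetical helper, isPrefixThenDigits, that tests for a literal prefix followed by one or more digits; the class name and sample values are illustrative only.

```java
import java.util.regex.Pattern;

class QuoteExample {
    // Hypothetical helper: true if input is the literal prefix
    // followed by one or more digits.
    static boolean isPrefixThenDigits(String prefix, String input) {
        // The prefix is quoted, so its characters are matched literally.
        // The "\\d+" part is still interpreted as regex syntax.
        Pattern p = Pattern.compile(Pattern.quote(prefix) + "\\d+");
        return p.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isPrefixThenDigits("*star", "*star42")); // true
        System.out.println(isPrefixThenDigits("a.b", "a.b7"));      // true
        System.out.println(isPrefixThenDigits("a.b", "axb7"));      // false
    }
}
```

In the last call the quoted dot matches only a literal dot, so "axb7" is rejected.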
Start, end in pattern. Often in regular expressions we want to match the start or end of strings. Two metacharacters are useful here: "^" matches the start and "$" matches the end. Because Pattern.matches already requires the whole string to match, the anchors here mainly make the intent explicit.
Here: A method called startsWithAEndsWithZ tests a String. It returns true if the first char is "a" and the last is "z".
Caution: Testing chars (with startsWith, endsWith, charAt) is more efficient. But it becomes harder to code when requirements change.
Java program that tests start, end in pattern

import java.util.regex.Pattern;

public class Program {

    public static boolean startsWithAEndsWithZ(String value) {
        // Test start and end characters.
        return Pattern.matches("^a.*z$", value);
    }

    public static void main(String[] args) {

        String[] values = { "a123z", "b123z", "az", "aq", "aza" };

        // Loop over and test these Strings.
        for (String value : values) {
            System.out.print(value);
            System.out.print(' ');
            System.out.println(startsWithAEndsWithZ(value));
        }
    }
}

Output

a123z true
b123z false
az true
aq false
aza false

Pattern

^     Matches start of string.
a     Lowercase a.
.*    Zero or more characters.
z     Lowercase z.
$     Matches end of string.
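For comparison, the same test can be sketched with String methods instead of a pattern. The class name StartEndChars is an assumption for illustration. One caveat: by default "." does not match line terminators, so the two versions can disagree on inputs that contain a newline in the middle.

```java
class StartEndChars {
    // Same check as the regex version, written with String methods.
    static boolean startsWithAEndsWithZ(String value) {
        return value.startsWith("a") && value.endsWith("z");
    }

    public static void main(String[] args) {
        String[] values = { "a123z", "b123z", "az", "aq", "aza" };

        // Prints the same true/false results as the regex version
        // for these five inputs.
        for (String value : values) {
            System.out.println(value + " " + startsWithAEndsWithZ(value));
        }
    }
}
```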
A benchmark. This benchmark compares the performance of using Pattern.compile (and the matches method) with Pattern.matches. With compile() we reuse the same pattern many times.
Result: Using compile() and a Matcher is a clear performance boost. In this run it was about three times as fast as Pattern.matches (31 ms versus 90 ms). Exact timings vary by machine and JVM.

Java that benchmarks Matcher, Pattern.matches

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Program {
    public static void main(String[] args) throws Exception {

        // ... Compile.
        Pattern pattern = Pattern.compile("num\\d\\d\\d");

        long t1 = System.currentTimeMillis();

        // ... Use Matcher with compiled pattern.
        for (int i = 0; i < 100000; i++) {
            Matcher m = pattern.matcher("num123");
            if (!m.matches()) {
                throw new Exception();
            }
        }

        long t2 = System.currentTimeMillis();

        // ... Use Pattern.matches method.
        for (int i = 0; i < 100000; i++) {
            if (!Pattern.matches("num\\d\\d\\d", "num123")) {
                throw new Exception();
            }
        }

        long t3 = System.currentTimeMillis();

        // ... Times.
        System.out.println(t2 - t1);
        System.out.println(t3 - t2);
    }
}

Output

31 ms, Pattern.compile, Matcher
90 ms, Pattern.matches
Performance, named groups. We reference groups with names or indexes using the group method on Matcher. In this test, named accesses are slower. Using indexes, like 1 or 2, is faster.
So: Unless named groups in a Regex make the program much clearer, it is a better choice to use indexes to access groups.
Java that times named groups, matcher

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Program {
    public static void main(String[] args) {

        // ... Compile.
        // ... The letter group uses [a-z]+ so that "cat" matches.
        Pattern pattern1 = Pattern
                .compile("(?<digitpart>\\d\\d),(?<letterpart>[a-z]+)");
        Pattern pattern2 = Pattern.compile("(\\d\\d),([a-z]+)");

        long t1 = System.currentTimeMillis();

        // ... Use pattern with named groups.
        for (int i = 0; i < 200000; i++) {
            Matcher m = pattern1.matcher("34,cat");
            if (m.matches()) {
                String part1 = m.group("digitpart");
                String part2 = m.group("letterpart");
                // ... Compare String contents with equals, not !=.
                if (!part1.equals("34") || !part2.equals("cat")) {
                    System.out.println(false);
                    break;
                }
            }
        }

        long t2 = System.currentTimeMillis();

        // ... Use pattern with indexed (ordinal) groups.
        for (int i = 0; i < 200000; i++) {
            Matcher m = pattern2.matcher("34,cat");
            if (m.matches()) {
                String part1 = m.group(1);
                String part2 = m.group(2);
                if (!part1.equals("34") || !part2.equals("cat")) {
                    System.out.println(false);
                    break;
                }
            }
        }

        long t3 = System.currentTimeMillis();

        // ... Times.
        System.out.println(t2 - t1);
        System.out.println(t3 - t2);
    }
}

Results (timings vary by machine and JVM)

44 ms, group(name)
21 ms, group(index)
Split, Pattern. A split method is available on Pattern instances. This lets us split based on a Regex delimiter. The Pattern can be compiled once and reused many times.
Java that uses split, Pattern

import java.util.regex.Pattern;

public class Program {
    public static void main(String[] args) {

        String line = "cat, dog, rabbit--100";

        // Compile a Pattern that indicates a delimiter.
        Pattern p = Pattern.compile("\\W+");

        // Split a String based on the delimiter pattern.
        String[] elements = p.split(line);
        for (String element : elements) {
            System.out.println(element);
        }
    }
}

Output

cat
dog
rabbit
100

Pattern

\W+    One or more non-word characters.
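To illustrate reuse, the sketch below keeps one compiled delimiter Pattern in a static field and applies it to more than one input. It also shows the two-argument split overload, where a limit caps the number of elements. The class name SplitReuse, the helper names and the second sample line are assumptions for illustration.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class SplitReuse {
    // Compiled once; reused for every call.
    static final Pattern DELIMITER = Pattern.compile("\\W+");

    // Split on every run of non-word characters.
    static String[] splitLine(String line) {
        return DELIMITER.split(line);
    }

    // With a limit of 2, at most two elements are returned;
    // the second element holds the unsplit remainder.
    static String[] splitFirst(String line) {
        return DELIMITER.split(line, 2);
    }

    public static void main(String[] args) {
        // Prints [cat, dog, rabbit, 100]
        System.out.println(Arrays.toString(splitLine("cat, dog, rabbit--100")));
        // Prints [red, green]
        System.out.println(Arrays.toString(splitLine("red; green")));
        // Prints [cat, dog, rabbit--100]
        System.out.println(Arrays.toString(splitFirst("cat, dog, rabbit--100")));
    }
}
```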
Word count. A regular expression can be used to count words. The split() method is helpful here. But a faster option is to use a for-loop.
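Both approaches to counting words can be sketched as follows. This is one possible version, not a definitive implementation: the class WordCount and its helper names are hypothetical, and a "word" is assumed to be a run of ASCII letters, digits or underscores so that the loop agrees with the default meaning of \W. No timings are claimed here.

```java
import java.util.regex.Pattern;

class WordCount {
    static final Pattern NON_WORD = Pattern.compile("\\W+");

    // Count words by splitting on runs of non-word characters.
    static int countWithSplit(String text) {
        int count = 0;
        for (String part : NON_WORD.split(text)) {
            // A leading delimiter (or an empty input) yields an empty element.
            if (!part.isEmpty()) {
                count++;
            }
        }
        return count;
    }

    // Mirrors the default \w class: ASCII letters, digits, underscore.
    static boolean isWordChar(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                || (c >= '0' && c <= '9') || c == '_';
    }

    // Count words with a for-loop: each transition from a
    // non-word character to a word character starts a new word.
    static int countWithLoop(String text) {
        int count = 0;
        boolean inWord = false;
        for (int i = 0; i < text.length(); i++) {
            boolean wordChar = isWordChar(text.charAt(i));
            if (wordChar && !inWord) {
                count++;
            }
            inWord = wordChar;
        }
        return count;
    }

    public static void main(String[] args) {
        String line = "cat, dog, rabbit--100";
        System.out.println(countWithSplit(line)); // 4
        System.out.println(countWithLoop(line));  // 4
    }
}
```

The loop version avoids allocating an array of substrings, which is one reason it can be cheaper than split().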
Often, regular expressions reduce performance. The special text language used has some costs. With String methods and for-loops, we can directly manipulate and test Strings.
Complex tasks. When complexity builds, writing custom loops becomes a challenge. With Regex we simplify programs. We make them easier to write, to understand.