TheDeveloperBlog.com


Java Pattern.matches Method: Regex Examples

Regex. Many String matching requirements can be handled directly with String methods. But some are more complex. For these we use java.util.regex and its Pattern class.


With Regex, we can use Pattern.matches for simple syntax, but this method is slower because it compiles the pattern on every call. For speed, we first call Pattern.compile once and then reuse the compiled Pattern with the Matcher class.


Pattern.matches example. We call Pattern.matches in a loop. Its first argument is the regular expression pattern. Its second argument is the string we want to test.

And: It returns a boolean. The value is true only if the entire string matches the pattern, not just a part of it. For groups, we need to instead use a Matcher.

Based on:

Java 7

Java program that uses Pattern.matches

import java.util.regex.Pattern;

public class Program {
    public static void main(String[] args) {

	// Some strings to test.
	String[] inputs = { "dog", "dance", "cat", "dirt" };

	// Loop over strings and test them.
	for (String input : inputs) {
	    boolean b = Pattern.matches("d.+", input);
	    System.out.println(b);
	}
    }
}

Output

true
true
false
true

Pattern

d     The lowercase letter d.
.+    One or more characters of any type.
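Pattern.matches tests the whole string, while Matcher.find looks for a match anywhere inside it. The following is a minimal sketch of that difference. The class name WholeMatchDemo and its two helper methods are hypothetical and exist only for this illustration.

```java
import java.util.regex.Pattern;

class WholeMatchDemo {

    // Pattern.matches succeeds only when the entire input matches.
    static boolean wholeMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    // Matcher.find succeeds when any part of the input matches.
    static boolean partialMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(wholeMatch("d.+", "dog"));      // true
        System.out.println(wholeMatch("d.+", "a dog"));    // false
        System.out.println(partialMatch("d.+", "a dog"));  // true
    }
}
```

When a test should accept a pattern that appears anywhere in the text, find() is usually the method to reach for.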

Pattern.compile and Matcher. Next we learn a faster way to match regular expressions. We use Pattern.compile to create a compiled pattern object.

Then: We call the matcher() method on the pattern instance. This returns a Matcher class instance.

Matches: Finally the matches method is used. This returns true if the entire input string matches the compiled pattern.

Java program that uses Pattern.compile, Matcher

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Program {
    public static void main(String[] args) {

	// Compile this pattern.
	Pattern pattern = Pattern.compile("num\\d\\d\\d");

	// See if this String matches.
	Matcher m = pattern.matcher("num123");
	if (m.matches()) {
	    System.out.println(true);
	}

	// Check this String.
	m = pattern.matcher("num456");
	if (m.matches()) {
	    System.out.println(true);
	}
    }
}

Output

true
true

Pattern

num     The letters "num" must be present.
\d\d\d  Three digit characters.
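One common way to reuse a compiled pattern is to store it in a static final field, since Pattern instances are immutable. The sketch below assumes that arrangement. The class CompiledPatternReuse, the field NUM_CODE and the method keepMatches are hypothetical names, not part of the original program.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class CompiledPatternReuse {

    // Compiled once and shared by every call below.
    private static final Pattern NUM_CODE = Pattern.compile("num\\d\\d\\d");

    // Returns only the inputs that match the pattern in full.
    static List<String> keepMatches(String[] inputs) {
        List<String> result = new ArrayList<>();
        for (String input : inputs) {
            if (NUM_CODE.matcher(input).matches()) {
                result.add(input);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[] inputs = { "num123", "num45", "num456", "number1" };
        System.out.println(keepMatches(inputs)); // [num123, num456]
    }
}
```

A new Matcher is still created for each input, but the more expensive compile step runs only once.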

Capturing groups. Often regular expression patterns use groups to capture parts of strings. Here we use positional groups. We access them by their position (1, 2 or more).

Tip: We create the compiled Pattern and initialize the Matcher as usual. After matches() returns true, we access groups.

Java program that uses group method

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Program {
    public static void main(String[] args) {

	Pattern pattern = Pattern.compile("(\\d+)\\-(\\d+)");

	// Get matcher on this String.
	Matcher m = pattern.matcher("1234-5678");

	// If it matches, get and display group values.
	if (m.matches()) {
	    String part1 = m.group(1);
	    String part2 = m.group(2);

	    System.out.println(part1);
	    System.out.println(part2);
	}
    }
}

Output

1234
5678

Pattern

(\d+)  One or more digit characters, in a group.
\-     A hyphen.
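Groups also work with find(), which visits each match inside a longer string. Here is a minimal sketch under that assumption. GroupFindDemo, PAIR and pairs are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupFindDemo {

    private static final Pattern PAIR = Pattern.compile("(\\d+)-(\\d+)");

    // Collects "left:right" for each number pair found anywhere in the text.
    static List<String> pairs(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = PAIR.matcher(text);
        while (m.find()) {
            // group(1) and group(2) refer to the current match only.
            found.add(m.group(1) + ":" + m.group(2));
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(pairs("ids 1234-5678 and 90-12")); // [1234:5678, 90:12]
    }
}
```

Group 0 always refers to the entire match, so m.group(0) would return "1234-5678" for the first pair.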

Named groups. With names, we easily access specific groups from a matched pattern. We name a group with the syntax (?<name>...), which is available in Java 7 and later. Then we call group() with a String name argument.

Java program that uses named groups

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Program {
    public static void main(String[] args) {

	// Specify a pattern with named groups.
	Pattern pattern = Pattern.compile("(?<first>..)x(?<second>..)");
	Matcher m = pattern.matcher("c3xp0");

	// Check for matches.
	// ... Then access named groups by their names.
	if (m.matches()) {
	    String part1 = m.group("first");
	    String part2 = m.group("second");

	    System.out.println(part1);
	    System.out.println(part2);
	}
    }
}

Output

c3
p0

Pattern

(?<first>..)   Group named "first" with two characters.
x              Letter x.
(?<second>..)  Group named "second" with two characters.
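Named groups tend to pay off when a pattern has several parts with distinct meanings. The sketch below parses a date-like string this way. The class NamedGroupDemo, the group names year, month and day, and the choice to return null on a failed match are all assumptions made for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NamedGroupDemo {

    private static final Pattern DATE =
        Pattern.compile("(?<year>\\d{4})-(?<month>\\d{2})-(?<day>\\d{2})");

    // Returns {year, month, day}, or null when the text has another form.
    static int[] parse(String text) {
        Matcher m = DATE.matcher(text);
        if (!m.matches()) {
            return null;
        }
        return new int[] {
            Integer.parseInt(m.group("year")),
            Integer.parseInt(m.group("month")),
            Integer.parseInt(m.group("day"))
        };
    }

    public static void main(String[] args) {
        int[] parts = parse("2015-03-09");
        System.out.println(parts[0] + " " + parts[1] + " " + parts[2]); // 2015 3 9
        System.out.println(parse("2015-3-9") == null);                  // true
    }
}
```

Named groups keep their positions too, so m.group(1) returns the same text as m.group("year") here.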

Pattern.quote. Characters must be escaped ("quoted") to avoid being seen as metacharacters. For example a star must be escaped to mean an asterisk, not a Kleene closure of "zero or more."

Pattern.quote: This method surrounds a String with \Q and \E. Between these markers, every character is treated as a literal.

So: We match the star as a star. Without Pattern.quote, the pattern fails to compile and we receive a PatternSyntaxException ("dangling meta character").

Java program that uses Pattern.quote

import java.util.regex.Pattern;

public class Program {
    public static void main(String[] args) {

	// Quote this value.
	String value = "*star";
	String quote = Pattern.quote(value);

	System.out.println(value);
	System.out.println(quote);

	// Try matching with quoted value.
	boolean result1 = Pattern.matches(quote, "*star");
	System.out.println(result1);

	// This throws a PatternSyntaxException because the value was not quoted.
	boolean result2 = Pattern.matches(value, "*star");
	System.out.println(result2);
    }
}

Output

*star
\Q*star\E
true
Exception in thread "main" java.util.regex.PatternSyntaxException:
    Dangling meta character '*' near index 0
    *star
    ^
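Pattern.quote is also useful when a literal piece of text is combined with regular expression syntax. The following sketch assumes a literal prefix followed by digits. QuoteDemo and prefixThenDigits are hypothetical names used only here.

```java
import java.util.regex.Pattern;

class QuoteDemo {

    // Builds a pattern: the prefix taken literally, then one or more digits.
    static Pattern prefixThenDigits(String literalPrefix) {
        return Pattern.compile(Pattern.quote(literalPrefix) + "\\d+");
    }

    public static void main(String[] args) {
        Pattern p = prefixThenDigits("a.b*");

        // The dot and star in the prefix are matched as plain characters.
        System.out.println(p.matcher("a.b*42").matches());  // true

        // Unquoted, "a.b*\\d+" would accept this input; quoted, it does not.
        System.out.println(p.matcher("aXbbb42").matches()); // false
    }
}
```

This matters most when the literal part comes from user input, where metacharacters may appear unexpectedly.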

Start, end in pattern. Often in regular expressions we want to match the start or end of strings. Two metacharacters are useful here: "^" and "$". These match the start and the end. Note that Pattern.matches already requires the whole string to match, so here the anchors only make the intent explicit.

Here: A method called startsWithAEndsWithZ tests a String. It returns true if the first char is "a" and the last is "z".

Caution: Testing chars (with startsWith, endsWith, charAt) is more efficient. But it becomes harder to code when requirements change.

Java program that tests start, end in pattern

import java.util.regex.Pattern;

public class Program {

    public static boolean startsWithAEndsWithZ(String value) {
	// Test start and end characters.
	return Pattern.matches("^a.*z$", value);
    }

    public static void main(String[] args) {
	String[] values = { "a123z", "b123z", "az", "aq", "aza" };
	// Loop over and test these Strings.
	for (String value : values) {
	    System.out.print(value);
	    System.out.print(' ');
	    System.out.println(startsWithAEndsWithZ(value));
	}
    }
}

Output

a123z true
b123z false
az true
aq false
aza false

Pattern

^     Matches start of string.
a     Lowercase a.
.*    Zero or more characters.
z     Lowercase z.
$     Matches end of string.
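The Caution above mentions testing chars directly. A minimal sketch of that alternative, next to an unanchored version of the regex, might look like this. StartEndDemo, regexTest and charTest are hypothetical names, and the two methods are assumed to agree only for input without line terminators, since "." does not match those by default.

```java
import java.util.regex.Pattern;

class StartEndDemo {

    // matches() already tests the whole string, so ^ and $ are left out.
    static boolean regexTest(String value) {
        return Pattern.matches("a.*z", value);
    }

    // Same test with direct char access and no regex.
    static boolean charTest(String value) {
        return value.length() >= 2
            && value.charAt(0) == 'a'
            && value.charAt(value.length() - 1) == 'z';
    }

    public static void main(String[] args) {
        String[] values = { "a123z", "b123z", "az", "aq", "aza", "a" };
        for (String value : values) {
            System.out.println(value + " " + regexTest(value) + " " + charTest(value));
        }
    }
}
```

The length check in charTest rejects the single-character string "a", which the regex rejects as well because it needs both an "a" and a "z".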

A benchmark. This benchmark compares the performance of using Pattern.compile (and the matches method) with Pattern.matches. With compile() we reuse the same pattern many times.

Result: Using compile() and a Matcher is a clear performance boost. In this run it was about three times as fast as Pattern.matches (31 ms versus 90 ms).

Java that benchmarks Matcher, Pattern.matches

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Program {
    public static void main(String[] args) throws Exception {

	// ... Compile.
	Pattern pattern = Pattern.compile("num\\d\\d\\d");

	long t1 = System.currentTimeMillis();

	// ... Use Matcher with compiled pattern.
	for (int i = 0; i < 100000; i++) {
	    Matcher m = pattern.matcher("num123");
	    if (!m.matches()) {
		throw new Exception();
	    }
	}

	long t2 = System.currentTimeMillis();

	// ... Use Pattern.matches method.
	for (int i = 0; i < 100000; i++) {
	    if (!Pattern.matches("num\\d\\d\\d", "num123")) {
		throw new Exception();
	    }
	}

	long t3 = System.currentTimeMillis();

	// ... Times.
	System.out.println(t2 - t1);
	System.out.println(t3 - t2);
    }
}

Output

31 ms, Pattern.compile, Matcher
90 ms, Pattern.matches

Performance, named groups. We reference groups with names or indexes using the group method on Matcher. In the timings shown, named accesses were slower than indexes like 1 or 2. Differences this small can vary with the JVM and the machine.

So: Unless named groups in a Regex make the program much clearer, it is a better choice to use indexes to access groups.

Java that times named groups, matcher

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Program {
    public static void main(String[] args) {

	// ... Compile.
	Pattern pattern1 = Pattern
		.compile("(?<digitpart>\\d\\d),(?<letterpart>[a-z]+)");
	Pattern pattern2 = Pattern.compile("(\\d\\d),([a-z]+)");

	long t1 = System.currentTimeMillis();

	// ... Use pattern with named groups.
	for (int i = 0; i < 200000; i++) {
	    Matcher m = pattern1.matcher("34,cat");
	    if (m.matches()) {
		String part1 = m.group("digitpart");
		String part2 = m.group("letterpart");
		// ... Compare contents with equals, not references.
		if (!part1.equals("34") || !part2.equals("cat")) {
		    System.out.println(false);
		    break;
		}
	    }
	}

	long t2 = System.currentTimeMillis();

	// ... Use pattern with indexed (ordinal) groups.
	for (int i = 0; i < 200000; i++) {
	    Matcher m = pattern2.matcher("34,cat");
	    if (m.matches()) {
		String part1 = m.group(1);
		String part2 = m.group(2);
		if (!part1.equals("34") || !part2.equals("cat")) {
		    System.out.println(false);
		    break;
		}
	    }
	}
	}

	long t3 = System.currentTimeMillis();

	// ... Times.
	System.out.println(t2 - t1);
	System.out.println(t3 - t2);
    }
}

Results

44 ms, group(name)
21 ms, group(index)

Split, Pattern. A split method is available on Pattern instances. This lets us split based on a Regex delimiter. The Pattern can be compiled once and reused many times.

Java that uses split, Pattern

import java.util.regex.Pattern;

public class Program {
    public static void main(String[] args) {

	String line = "cat, dog, rabbit--100";

	// Compile a Pattern that indicates a delimiter.
	Pattern p = Pattern.compile("\\W+");

	// Split a String based on the delimiter pattern.
	String[] elements = p.split(line);
	for (String element : elements) {
	    System.out.println(element);
	}
    }
}

Output

cat
dog
rabbit
100

Pattern

\W+    One or more non-word characters.
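Pattern.split also has an overload that takes a limit on the number of resulting elements. The sketch below shows both forms. SplitDemo, DELIM, splitAll and splitLimit are hypothetical names for this illustration.

```java
import java.util.regex.Pattern;

class SplitDemo {

    private static final Pattern DELIM = Pattern.compile("\\W+");

    // Splits on every run of non-word characters.
    static String[] splitAll(String line) {
        return DELIM.split(line);
    }

    // Produces at most 'limit' elements; the last one keeps the remainder.
    static String[] splitLimit(String line, int limit) {
        return DELIM.split(line, limit);
    }

    public static void main(String[] args) {
        String line = "cat, dog, rabbit--100";
        System.out.println(splitAll(line).length);  // 4
        System.out.println(splitLimit(line, 2)[1]); // dog, rabbit--100
    }
}
```

A limit is handy when only the first few fields matter and the rest of the line should stay intact.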

Word count. A regular expression can be used to count words. The split() method is helpful here. But a faster option is to use a for-loop.
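Both approaches to word counting can be sketched side by side. This is only an illustration, not a measured comparison. The class WordCountDemo and its methods are hypothetical, and a "word" is assumed to be a run of ASCII letters, digits or underscores, which is what \w matches by default.

```java
import java.util.regex.Pattern;

class WordCountDemo {

    private static final Pattern NON_WORD = Pattern.compile("\\W+");

    // Counts words with split, skipping the empty leading element
    // that appears when the text starts with a delimiter.
    static int countWithSplit(String text) {
        int count = 0;
        for (String part : NON_WORD.split(text)) {
            if (!part.isEmpty()) {
                count++;
            }
        }
        return count;
    }

    // Same character class as \w in its default (ASCII) mode.
    static boolean isWordChar(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
            || (c >= '0' && c <= '9') || c == '_';
    }

    // Counts words with a for-loop by detecting the start of each word.
    static int countWithLoop(String text) {
        int count = 0;
        boolean inWord = false;
        for (int i = 0; i < text.length(); i++) {
            boolean word = isWordChar(text.charAt(i));
            if (word && !inWord) {
                count++;
            }
            inWord = word;
        }
        return count;
    }

    public static void main(String[] args) {
        String line = "cat, dog, rabbit--100";
        System.out.println(countWithSplit(line)); // 4
        System.out.println(countWithLoop(line));  // 4
    }
}
```

The loop version allocates no arrays or substrings, which is one reason it can be faster on large inputs.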

Often, regular expressions reduce performance. The pattern must be parsed and then interpreted, and this has some cost. With String methods and for-loops, we can directly manipulate and test Strings.


Complex tasks. When complexity builds, writing custom loops becomes a challenge. With Regex we simplify programs. We make them easier to write, to understand.