TheDeveloperBlog.com



Java Split Method

Split. Often strings are read in from lines of a file. And these lines have many parts, separated by delimiters. We use split() to break them apart.


Regex. Split in Java uses a regular expression (regex) as its delimiter. This can be simple, even a single character like a comma, or more complex, involving character classes and quantifiers. This method is powerful.


A simple example. Let us begin with this example. We introduce a string that has two commas in it, separating three strings (cat, dog, bird). We split on a comma.

For: Split returns a String array. We then loop over that array's elements with a for-each loop. We display them.

Based on:

Java 7

Java program that uses split

public class Program {
    public static void main(String[] args) {

	// This string has three words separated by commas.
	String value = "cat,dog,bird";

	// Split on a comma.
	String[] parts = value.split(",");

	// Display result parts.
	for (String part : parts) {
	    System.out.println(part);
	}
    }
}

Output

cat
dog
bird

Split lines in file. Here we use BufferedReader and FileReader to read in a text file. Then, while looping over it, we split each line. In this way we parse a CSV file with split.


Println: Finally we use the System.out.println method to display each part from each line to the screen.

Contents: file.txt

carrot,squash,turnip
potato,spinach,kale

Java program that reads file, splits lines

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class Program {
    public static void main(String[] args) throws IOException {

	// Open this file. The try-with-resources statement closes the
	// reader even if an exception is thrown while reading.
	try (BufferedReader reader = new BufferedReader(new FileReader(
		"C:\\programs\\file.txt"))) {

	    // Read lines from file until readLine returns null.
	    String line;
	    while ((line = reader.readLine()) != null) {
		// Split line on comma.
		String[] parts = line.split(",");
		for (String part : parts) {
		    System.out.println(part);
		}
		System.out.println();
	    }
	}
    }
}

Output

carrot
squash
turnip

potato
spinach
kale

Either character. Often data is inconsistent. Sometimes we need to split on a range or set of characters. With split, this is possible. Here we split on either a comma or a colon.

Tip: With square brackets, we specify the possible characters to split upon. So we split on all colons and commas, with one call.

Java program that splits on either character

public class Program {
    public static void main(String[] args) {

	String line = "carrot:orange,apple:red";

	// Split on comma or colon.
	String[] parts = line.split("[,:]");
	for (String part : parts) {
	    System.out.println(part);
	}
    }
}

Output

carrot
orange
apple
red

Count, separate words. We can use more advanced character patterns in split. Here we separate a String based on non-word characters. We use the pattern \W+ (written "\\W+" in Java source) to mean this.

Pattern: The pattern means "one or more non-word characters." A plus means "one or more" and \W means any non-word character.

Note: The comma and its following space are treated as a single delimiter. So two characters are matched as one delimiter.

Java program that counts, splits words

public class Program {
    public static void main(String[] args) {

	String line = "hello, how are you?";

	// Split on 1+ non-word characters.
	String[] words = line.split("\\W+");

	// Count words.
	System.out.println(words.length);

	// Display words.
	for (String word : words) {
	    System.out.println(word);
	}
    }
}

Output

4
hello
how
are
you
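
Leading delimiters. One caveat with this count: when a string begins with a delimiter, split() returns an empty first element, so words.length is one too high. The sketch below shows one possible way to handle this. The class WordCountExample and the helper countWords are illustrative names, not part of the program above.

```java
class WordCountExample {

    // Counts words, skipping the empty leading element that split()
    // returns when the input starts with a delimiter.
    static int countWords(String line) {
        int count = 0;
        for (String word : line.split("\\W+")) {
            if (!word.isEmpty()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String line = "...hello, how are you?";

        // The leading "..." yields an empty first element.
        System.out.println(line.split("\\W+").length); // 5

        // The helper ignores empty elements.
        System.out.println(countWords(line)); // 4
    }
}
```

Trailing empty elements are not a concern here, because split() with no limit discards them.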

Numbers. This example splits a string apart and then uses parseInt to convert those parts into ints. It splits on a two-char sequence. Then in a loop, it calls parseInt on each String.

Java program that uses split, parseInt

public class Program {
    public static void main(String[] args) {

	String line = "1, 2, 3";

	// Split on two-char sequence.
	String[] numbers = line.split(", ");

	// Display numbers.
	for (String number : numbers) {
	    int value = Integer.parseInt(number);
	    System.out.println(value + " * 20 = " + value * 20);
	}
    }
}

Output

1 * 20 = 20
2 * 20 = 40
3 * 20 = 60
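
Inconsistent spacing. The two-char delimiter above assumes exactly one space after each comma. If the spacing may vary, one possible approach is to split on a comma with optional whitespace on either side. This is a minimal sketch that assumes a non-empty line holding only integers. ParseNumbersExample and parseNumbers are hypothetical names used for illustration.

```java
import java.util.Arrays;

class ParseNumbersExample {

    // Splits on a comma surrounded by any amount of whitespace,
    // then converts each part with Integer.parseInt.
    // Assumes the line is non-empty and contains only integers.
    static int[] parseNumbers(String line) {
        String[] parts = line.trim().split("\\s*,\\s*");
        int[] values = new int[parts.length];
        for (int i = 0; i < parts.length; i++) {
            values[i] = Integer.parseInt(parts[i]);
        }
        return values;
    }

    public static void main(String[] args) {
        // Spacing differs around each comma.
        int[] values = parseNumbers("1,2 ,  3");
        System.out.println(Arrays.toString(values)); // [1, 2, 3]
    }
}
```

The trim() call removes whitespace at the ends of the line, which the delimiter pattern would otherwise leave attached to the first and last parts.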

Limit. Split accepts an optional second parameter, an int limit. If the limit is positive, the result array has at most that many elements. Any extra parts remain part of the last element.
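
Here is a short sketch of the limit argument. It also shows a negative limit, which keeps trailing empty strings that split() would otherwise discard. The class SplitLimitExample and the helper headAndRest are illustrative names.

```java
import java.util.Arrays;

class SplitLimitExample {

    // Splits off the first comma-separated part and leaves the
    // remainder, commas included, in the second element.
    static String[] headAndRest(String value) {
        return value.split(",", 2);
    }

    public static void main(String[] args) {
        String value = "cat,dog,bird,fish";

        // Limit 2: at most two elements.
        System.out.println(Arrays.toString(headAndRest(value))); // [cat, dog,bird,fish]

        String trailing = "a,b,,";

        // No limit: trailing empty strings are removed.
        System.out.println(trailing.split(",").length); // 2

        // Negative limit: trailing empty strings are kept.
        System.out.println(trailing.split(",", -1).length); // 4
    }
}
```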


Pattern.compile, split. A split method is available on the Pattern class, found in java.util.regex. We can compile a Pattern and reuse it many times. This can enhance performance.

Note: A call to Pattern.compile builds the regex once, so later split() calls on that Pattern skip the compile step. But this only helps if many splits are done.

Java program that uses Pattern.compile, split

import java.util.regex.Pattern;

public class Program {
    public static void main(String[] args) {

	// Separate based on number delimiters.
	Pattern p = Pattern.compile("\\d+");
	String value = "abc100defgh9ij";
	String[] elements = p.split(value);

	// Display our results.
	for (String element : elements) {
	    System.out.println(element);
	}
    }
}

Output

abc
defgh
ij

Performance, Pattern split. We can improve the speed of splitting strings based on regular expressions by using Pattern.compile. We create a delimiter pattern. Then we call split() with it.

Result: When many Strings are split, calling Pattern.compile once and then using the Pattern's split method was faster in this test than calling String split each time.

Java program that times Pattern split

import java.util.regex.Pattern;

public class Program {
    public static void main(String[] args) {

	// ... Create a delimiter pattern.
	Pattern pattern = Pattern.compile("\\W+");
	String line = "cat; dog--ABC";

	long t1 = System.currentTimeMillis();

	// Version 1: use split method on Pattern.
	for (int i = 0; i < 1000000; i++) {

	    String[] values = pattern.split(line);
	    if (values.length != 3) {
		System.out.println(false);
	    }
	}

	long t2 = System.currentTimeMillis();

	// Version 2: use String split method.
	for (int i = 0; i < 1000000; i++) {

	    String[] values = line.split("\\W+");
	    if (values.length != 3) {
		System.out.println(false);
	    }
	}

	long t3 = System.currentTimeMillis();

	// ... Benchmark results.
	System.out.println(t2 - t1);
	System.out.println(t3 - t2);
    }
}

Results

471 ms, Pattern split
549 ms, String split

Join. The String.join method (added in Java 8) combines Strings together. We specify our desired delimiter String. It can handle a String array or individual Strings.
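
A brief sketch of join as the reverse of split follows. It needs Java 8 or later, since String.join is not available in Java 7. JoinExample and replaceDelimiter are hypothetical names for illustration.

```java
class JoinExample {

    // Splits on commas, then joins the parts with a new delimiter.
    static String replaceDelimiter(String value, String newDelimiter) {
        return String.join(newDelimiter, value.split(","));
    }

    public static void main(String[] args) {
        // Join a String array.
        String[] parts = { "cat", "dog", "bird" };
        System.out.println(String.join(",", parts)); // cat,dog,bird

        // Join individual Strings.
        System.out.println(String.join("-", "a", "b", "c")); // a-b-c

        // Split and join together.
        System.out.println(replaceDelimiter("cat,dog,bird", ";")); // cat;dog;bird
    }
}
```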


With split, we use a regular expression-based pattern. But for simple cases, we provide the delimiter itself as the pattern. This too works. Split is elegant and powerful.
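
Literal delimiters. One caveat when passing the delimiter itself: characters such as a period or a vertical bar have special meaning in a regex. The sketch below shows the problem and two possible fixes, escaping by hand or using Pattern.quote. SplitLiteralExample and splitLiteral are illustrative names.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class SplitLiteralExample {

    // Splits on a literal delimiter by quoting it, so regex
    // metacharacters such as "." or "|" lose their special meaning.
    static String[] splitLiteral(String value, String delimiter) {
        return value.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        String value = "a.b.c";

        // An unescaped dot matches every character, so only empty
        // strings remain and split() discards them all.
        System.out.println(value.split(".").length); // 0

        // Escaping the dot by hand treats it as a literal character.
        System.out.println(Arrays.toString(value.split("\\."))); // [a, b, c]

        // Pattern.quote does the escaping for any delimiter.
        System.out.println(Arrays.toString(splitLiteral("x|y|z", "|"))); // [x, y, z]
    }
}
```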