TheDeveloperBlog.com

Java Word Count Methods: Split and For Loop

Use split and a for-loop to count words in a String. Check word boundaries with the Character class.
Count words. A String contains text divided into words. With a method, we can count the number of words in the String. This can be implemented in many ways.
With split, we use a regular expression pattern to separate likely words. Then we access the array's length. With a for-loop, we use the Character class to detect likely word separators.
Split implementation. Let us begin with the split() version. We introduce countWords: this method separates a String into an array of strings. We split on non-word chars.

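Pattern: The regular expression pattern used, \W+ (written "\\W+" in a Java string literal), matches one or more non-word characters.

If: An if-statement is used to detect a zero-word string. This logic works for the cases tested, but it is not always enough: when the text begins with a non-word character, split returns an empty first element and the count is one too high.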

Java program that implements countWords with split

public class Program {
    public static int countWords(String value) {
        // Split on non-word chars.
        String[] words = value.split("\\W+");
        // Handle an empty string.
        if (words.length == 1 && words[0].length() == 0) {
            return 0;
        }
        // Return array length.
        return words.length;
    }

    public static void main(String[] args) {
        String value = "To be or not to be, that is the question.";
        int count = countWords(value);
        System.out.println(count);

        value = "Stately, plump Buck Mulligan came from the stairhead";
        count = countWords(value);
        System.out.println(count);

        System.out.println(countWords(""));
    }
}

Output

10
8
0
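The empty-string check in the split version does not cover text that starts with a space or punctuation, because split then places an empty string at index 0. The following is a minimal sketch of one possible guard, not the article's method. The class name SplitEdgeCases and the method countWordsTrimmed are hypothetical names chosen for illustration.

```java
class SplitEdgeCases {
    // Hypothetical variant of countWords: ignores the empty first
    // element that split() produces when the text starts with a
    // non-word character.
    static int countWordsTrimmed(String value) {
        String[] words = value.split("\\W+");
        int count = words.length;
        // A leading separator (or an empty input) yields "" at index 0.
        // The count > 0 test also guards inputs made only of separators,
        // for which split() returns an empty array.
        if (count > 0 && words[0].isEmpty()) {
            count--;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countWordsTrimmed("  To be or not"));   // 4
        System.out.println(countWordsTrimmed("\"Quoted\" start")); // 2
        System.out.println(countWordsTrimmed(""));                 // 0
        System.out.println(countWordsTrimmed("..."));              // 0
    }
}
```

The single check on the first element replaces the special case for the empty string, since an empty input also produces one empty element.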
Loop version. Let us rewrite our previous countWords method. This version uses a simple loop. We use the Character class to detect certain word boundaries.

Complexity: This version of countWords does less work. It loops through all characters once, and it needs no regular expression and no array of substrings.

IsWhitespace: This method detects whether a char is considered whitespace (this includes spaces, newlines and tabs).

IsLetterOrDigit: This is a convenient method. It returns true if we have a letter (either upper or lowercase) or a digit (like 1, 2 or 3).

Note: The countWords method here detects a whitespace character, and if a word-start character follows it, the variable c is incremented. The first word has no whitespace before it, so it is counted separately after the loop.
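The two Character methods described above can be tried on their own before reading the full program. This is a small illustrative sketch. The class name CharacterChecks and the helper classify are hypothetical and do not appear in the article's code.

```java
class CharacterChecks {
    // Hypothetical helper: labels a char using Character.isWhitespace
    // and Character.isLetterOrDigit, in that order.
    static String classify(char ch) {
        if (Character.isWhitespace(ch)) {
            return "whitespace";
        }
        if (Character.isLetterOrDigit(ch)) {
            return "letter-or-digit";
        }
        return "other";
    }

    public static void main(String[] args) {
        char[] samples = { ' ', '\t', '\n', 'a', 'Z', '7', ',', '(' };
        for (char ch : samples) {
            // Print the numeric code so whitespace chars stay visible.
            System.out.println((int) ch + " -> " + classify(ch));
        }
    }
}
```

Punctuation such as ',' and '(' falls into neither group, which is why the loop version has to name the punctuation characters it accepts as word starts.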

Java program that implements countWords with loop

public class Program {
    public static int countWords(String value) {
        int c = 0;
        for (int i = 1; i < value.length(); i++) {
            // See if previous char is a space.
            if (Character.isWhitespace(value.charAt(i - 1))) {
                // See if this char is a word start character.
                // ... Some punctuation chars can start a word.
                if (Character.isLetterOrDigit(value.charAt(i))
                        || value.charAt(i) == '"'
                        || value.charAt(i) == '(') {
                    c++;
                }
            }
        }
        // Count the first word, which has no whitespace before it.
        // ... Testing the first char (not the length) handles short strings like "Hi".
        if (value.length() > 0 && !Character.isWhitespace(value.charAt(0))) {
            c++;
        }
        return c;
    }

    public static void main(String[] args) {
        String value = "To be or not to be, that is the question.";
        int count = countWords(value);
        System.out.println(count);

        value = "Stately, plump Buck Mulligan came from the stairhead";
        count = countWords(value);
        System.out.println(count);

        System.out.println(countWords(""));
    }
}

Output

10
8
0
Some issues, for-loop. In the for-loop method (the second example) we have some issues. We check for certain punctuation characters, but more checks may need to be added.

Thus: We developed a good approach for a countWords method, but not an ideal implementation.
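One way to avoid listing word-start punctuation is to track whether the loop is currently inside a run of non-whitespace characters. The sketch below is an alternative technique, not the article's implementation. The class name StateLoopCount and the inWord flag are hypothetical. It assumes that any whitespace-delimited token is a word, so a lone "-" would also be counted.

```java
class StateLoopCount {
    static int countWords(String value) {
        int count = 0;
        boolean inWord = false; // true while inside a non-whitespace run
        for (int i = 0; i < value.length(); i++) {
            if (Character.isWhitespace(value.charAt(i))) {
                inWord = false;
            } else if (!inWord) {
                // First non-whitespace char after whitespace (or at index 0).
                inWord = true;
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countWords("To be or not to be, that is the question."));
        System.out.println(countWords("  leading and trailing  "));
        System.out.println(countWords("Hi"));
        System.out.println(countWords(""));
    }
}
```

Because the flag starts as false, the first word needs no special case, and leading or trailing whitespace does not change the count.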

A review. In counting words, we require approximations. Some sequences, like numbers, may or may not be considered words. Hyphenated words too are an issue.
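To make the hyphen question concrete, a hedged sketch follows that matches words with java.util.regex instead of splitting on separators. The definition of a word used here is an assumption made for illustration: a run of letters or digits, optionally joined by single hyphens or apostrophes. The class name MatcherCount and the WORD pattern are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatcherCount {
    // Assumption: a word is a run of letters/digits, optionally joined
    // by single hyphens or apostrophes ("well-known", "don't").
    private static final Pattern WORD =
        Pattern.compile("[\\p{L}\\p{N}]+(?:[-'][\\p{L}\\p{N}]+)*");

    static int countWords(String value) {
        Matcher m = WORD.matcher(value);
        int count = 0;
        // Each successful find() advances past one matched word.
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "a well-known fact";
        System.out.println(countWords(text));            // 3
        System.out.println(text.split("\\W+").length);   // 4
    }
}
```

The two printed counts differ because splitting on \W+ breaks "well-known" into two words, while the pattern above keeps it as one. Neither answer is more correct in general; the choice depends on what the caller wants to treat as a word.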