Microsoft Word has an option to count characters including spaces or not including spaces. We test the solution against Microsoft Word to ensure correctness.
Example. This method counts characters the way Microsoft Word does. I tested a real-world text file with Word 2007 and developed two methods that closely match its results. For the character count, a run of consecutive whitespace characters is counted as one character.
Next: This example iterates over its string parameter. Word 2007 counts ten spaces in a row as one, and we do the same here.
And: The bool flag variable keeps track of whether the previous char was a space.
Method that counts characters: C#

/// <summary>
/// Return the number of characters in a string using the same method
/// as Microsoft Word 2007. A run of sequential whitespace is counted once.
/// </summary>
/// <param name="value">String to count chars.</param>
/// <returns>Number of chars in string.</returns>
static int CountChars(string value)
{
    int result = 0;
    bool lastWasSpace = false;

    foreach (char c in value)
    {
        if (char.IsWhiteSpace(c))
        {
            // A.
            // Only count sequential spaces one time.
            if (!lastWasSpace)
            {
                result++;
            }
            lastWasSpace = true;
        }
        else
        {
            // B.
            // Count other characters every time.
            result++;
            lastWasSpace = false;
        }
    }
    return result;
}
Methods used. We use the char.IsWhiteSpace method. It returns true for spaces, tabs, carriage returns, newlines and other Unicode whitespace, so all of these take part in a run. Every non-whitespace character increments the count by one.
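To see which characters end up in a whitespace run, a small sketch can mark each position in a string. The Flags helper and the WhiteSpaceDemo class are hypothetical names introduced only for this illustration; char.IsWhiteSpace is the only call taken from the article.

```csharp
using System;

static class WhiteSpaceDemo
{
    // Hypothetical helper (not from the article): returns 'W' for each
    // character that char.IsWhiteSpace reports as whitespace, '.' otherwise.
    public static string Flags(string value)
    {
        char[] flags = new char[value.Length];
        for (int i = 0; i < value.Length; i++)
        {
            flags[i] = char.IsWhiteSpace(value[i]) ? 'W' : '.';
        }
        return new string(flags);
    }

    static void Main()
    {
        // Space, tab, carriage return and newline are all whitespace.
        Console.WriteLine(Flags("a \t\r\nb")); // .WWWW.

        // The no-break space (U+00A0) is whitespace as well.
        Console.WriteLine(Flags("a\u00A0b")); // .W.
    }
}
```

Every stretch of adjacent W marks is what CountChars counts as a single character.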
Example 2. Here we see a method that counts non-whitespace characters. This closely parallels Microsoft Word 2007's results as well. It is simpler and just increments the result for each non-whitespace character.
Method that counts word characters: C#

/// <summary>
/// Counts the number of non-whitespace characters.
/// It closely matches Microsoft Word 2007.
/// </summary>
/// <param name="value">String to count non-whitespaces.</param>
/// <returns>Number of non-whitespace chars.</returns>
static int CountNonSpaceChars(string value)
{
    int result = 0;
    foreach (char c in value)
    {
        if (!char.IsWhiteSpace(c))
        {
            result++;
        }
    }
    return result;
}
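For comparison, the same count can probably be written in one line with LINQ. This is an alternative sketch, not the article's method: CountNonSpaceCharsLinq and NonSpaceDemo are hypothetical names, and the delegate call per character likely makes it somewhat slower than the plain loop.

```csharp
using System;
using System.Linq;

static class NonSpaceDemo
{
    // Hypothetical LINQ variant of CountNonSpaceChars.
    // A string is an IEnumerable<char>, so Enumerable.Count applies directly.
    public static int CountNonSpaceCharsLinq(string value)
    {
        return value.Count(c => !char.IsWhiteSpace(c));
    }

    static void Main()
    {
        // "one" (3) + "two" (3) + "three" (5) = 11 non-whitespace characters.
        Console.WriteLine(CountNonSpaceCharsLinq("one two\tthree\r\n")); // 11
    }
}
```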
Tests. I tested the two methods against Microsoft Word on a real-world text file to check their accuracy. The non-whitespace count matched Word exactly. The character count came close but was off by a small amount.

File tested: decision.txt

Microsoft Word char count:  834
Method count:               830 [off by 4]

Word non-whitespace count:  667
Method count:               667 [exact]
Discussion. This method pair is helpful because it lets you judge the logical length of a file more accurately. Text that uses two spaces after a period is treated as equal in length to text that uses one space.
Counting bytes in files. If you need a way to compare the lengths of text files, these methods could be better than counting bytes. This of course only applies if you want the logical length, not the physical length.
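A minimal sketch of the difference between logical and physical length, under stated assumptions: the two sample sentences are made up for illustration, UTF-8 is assumed as the on-disk encoding, and CountChars is restated in compact form from the first example so the block compiles on its own.

```csharp
using System;
using System.Text;

static class LogicalLengthDemo
{
    // Compact restatement of CountChars: a whitespace run counts once.
    public static int CountChars(string value)
    {
        int result = 0;
        bool lastWasSpace = false;
        foreach (char c in value)
        {
            bool isSpace = char.IsWhiteSpace(c);
            if (!isSpace || !lastWasSpace)
            {
                result++;
            }
            lastWasSpace = isSpace;
        }
        return result;
    }

    static void Main()
    {
        // Made-up sample text: one space versus two spaces after the period.
        string oneSpace = "First sentence. Second sentence.";
        string twoSpaces = "First sentence.  Second sentence.";

        // Physical length: bytes when encoded as UTF-8 (an assumed encoding).
        Console.WriteLine(Encoding.UTF8.GetByteCount(oneSpace));  // 32
        Console.WriteLine(Encoding.UTF8.GetByteCount(twoSpaces)); // 33

        // Logical length: both strings count the same.
        Console.WriteLine(CountChars(oneSpace));  // 32
        Console.WriteLine(CountChars(twoSpaces)); // 32
    }
}
```

The byte counts differ by one while the logical counts agree, which is the equivalence described above.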
Note: The methods are fairly fast because no StringBuilder appends or other string copying is done.
And: Scanning through individual characters was fast in my tests. The foreach-loop over a string was not slower than a for-loop.
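One rough way to check the loop comparison on your own machine is a Stopwatch timing like the sketch below. The test string, the iteration count and the method names are assumptions made for illustration. Timings depend on the runtime and hardware, so no expected output is shown.

```csharp
using System;
using System.Diagnostics;

static class LoopTimingDemo
{
    // Counts non-whitespace characters with a foreach-loop.
    public static int CountWithForeach(string value)
    {
        int result = 0;
        foreach (char c in value)
        {
            if (!char.IsWhiteSpace(c)) result++;
        }
        return result;
    }

    // Same count with an indexed for-loop.
    public static int CountWithFor(string value)
    {
        int result = 0;
        for (int i = 0; i < value.Length; i++)
        {
            if (!char.IsWhiteSpace(value[i])) result++;
        }
        return result;
    }

    static void Main()
    {
        // Assumed workload: 1000 letters followed by 1000 spaces.
        string text = new string('a', 1000) + new string(' ', 1000);
        const int iterations = 100000;

        // Warm up both methods so JIT compilation is not timed.
        CountWithForeach(text);
        CountWithFor(text);

        long total = 0;
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++) total += CountWithForeach(text);
        sw.Stop();
        Console.WriteLine("foreach: " + sw.ElapsedMilliseconds + " ms");

        sw.Restart();
        for (int i = 0; i < iterations; i++) total += CountWithFor(text);
        sw.Stop();
        Console.WriteLine("for:     " + sw.ElapsedMilliseconds + " ms");

        // Print the total so the loops cannot be optimized away.
        Console.WriteLine(total);
    }
}
```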
Summary. Here we saw methods that can count characters and non-whitespace characters with results that are similar to Microsoft Word 2007. We can use these two methods to count the number of characters, finding the logical length of text.