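StripHtmlRegex: Uses replaceAll. With replaceAll, the first argument is a regular expression, and the second is the replacement. Here the pattern <.*?> lazily matches each tag, from an opening angle bracket to the nearest closing one, and the replacement is the empty string.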
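StripTagsCharArray: This method implements a simple imperative parser in a for-loop. It tracks whether the current character is inside a tag, changing that state on angle brackets, and copies only the characters that fall outside tags.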
Output: The two methods produce the same, correct output on the example string. In main() we test them.
Java program that removes HTML tags
public class Program {

    public static String stripHtmlRegex(String source) {
        // Replace all tag characters with an empty string.
        return source.replaceAll("<.*?>", "");
    }

    public static String stripTagsCharArray(String source) {
        // Create char array to store our result.
        char[] array = new char[source.length()];
        int arrayIndex = 0;
        boolean inside = false;

        // Loop over characters and append when not inside a tag.
        for (int i = 0; i < source.length(); i++) {
            char let = source.charAt(i);
            if (let == '<') {
                inside = true;
                continue;
            }
            if (let == '>') {
                inside = false;
                continue;
            }
            if (!inside) {
                array[arrayIndex] = let;
                arrayIndex++;
            }
        }
        // ... Return written data.
        return new String(array, 0, arrayIndex);
    }

    public static void main(String[] args) {
        final String html = "<p id=x>Sometimes, <b>simpler</b> is better, "
                + "but <i>not</i> always.</p>";
        System.out.println(html);

        String test = stripHtmlRegex(html);
        System.out.println(test);

        String test2 = stripTagsCharArray(html);
        System.out.println(test2);
    }
}
Output
<p id=x>Sometimes, <b>simpler</b> is better, but <i>not</i> always.</p>
Sometimes, simpler is better, but not always.
Sometimes, simpler is better, but not always.
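And: Due to the complex, organic nature of the web, these HTML methods are reliable only on a limited subset of pages: simple, well-formed markup. They can leave stray text behind when a page contains script or style blocks, or a > character inside an attribute value.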
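Limits: The following sketch illustrates two inputs where tag stripping can go wrong. The class name TagLimits and the methods stripDefault and stripDotAll are hypothetical names introduced here for illustration. stripDefault repeats the pattern from stripHtmlRegex, while stripDotAll is an assumed variant that compiles the same pattern with Pattern.DOTALL so that a tag may span a line break.

```java
import java.util.regex.Pattern;

public class TagLimits {

    // Hypothetical variant: with DOTALL, '.' also matches line terminators.
    private static final Pattern TAG_DOTALL =
            Pattern.compile("<.*?>", Pattern.DOTALL);

    // Same pattern as stripHtmlRegex: by default '.' stops at a line break.
    public static String stripDefault(String source) {
        return source.replaceAll("<.*?>", "");
    }

    // Strips tags even when a tag spans more than one line.
    public static String stripDotAll(String source) {
        return TAG_DOTALL.matcher(source).replaceAll("");
    }

    public static void main(String[] args) {
        // A tag split across two lines: the default pattern skips it.
        String multiLine = "<a\nhref=x>link</a>";
        System.out.println(stripDefault(multiLine));
        System.out.println(stripDotAll(multiLine));

        // A '>' inside an attribute value ends the match too early,
        // so part of the attribute is left in the result.
        String quoted = "<img alt=\"a > b\">text";
        System.out.println(stripDotAll(quoted));
    }
}
```

On the multi-line input, stripDefault leaves the opening tag in place and removes only the closing tag, while stripDotAll returns just the word link. On the quoted input, the remainder of the attribute value survives in front of the text, which is one reason a real HTML parser may be the safer choice for arbitrary pages.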