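Tip: In the pattern, the question mark is important. It makes the star non-greedy, so it matches as few characters as possible.
So: With the question mark, each tag is matched on its own. Without it, everything from the first less-than sign to the last greater-than sign on a line would be treated as one huge HTML tag.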
Python program that removes HTML with re.sub
import re
# This string contains HTML.
v = """<p id=1>Sometimes, <b>simpler</b> is better,
but <i>not</i> always.</p>"""
# Replace HTML tags with an empty string.
result = re.sub(r"<.*?>", "", v)
print(result)
Output
Sometimes, simpler is better,
but not always.
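The effect of the question mark can be seen by running the same substitution with and without it. This is a minimal sketch; the sample string and the variable names greedy and lazy are made up for illustration.

```python
import re

# Hypothetical sample string with two pairs of tags on one line.
html = "<b>bold</b> and <i>italic</i>"

# Greedy: .* runs to the LAST > on the line, so the whole string is removed.
greedy = re.sub(r"<.*>", "", html)

# Non-greedy: .*? stops at the FIRST >, so each tag is removed separately.
lazy = re.sub(r"<.*?>", "", html)

print(repr(greedy))
print(repr(lazy))
```

The greedy form leaves an empty string, while the non-greedy form leaves only the text between the tags.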
Pattern details
<      Less-than sign (matches the opening bracket of a tag).
.*?    Match zero or more characters (any character except a newline),
       as few as possible.
>      Greater-than sign (matches the closing bracket of a tag).
Note: This code is expected to fail when comments are nested, or when a comment spans more than one line (the dot does not match a newline by default).
But: On simple pages, this code can be used to strip out HTML comments, which reduces page size and can make the page faster to load.
Python program that removes HTML comments
import re
# This HTML string contains two comments.
v = """<p>Welcome to my <!-- awesome -->
website<!-- bro --></p>"""
# Remove HTML comments.
result = re.sub(r"<!--.*?-->", "", v)
print(v)
print(result)
Output
<p>Welcome to my <!-- awesome -->
website<!-- bro --></p>
<p>Welcome to my
website</p>
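One case the comment pattern above does not handle is a comment that spans several lines, because the dot does not match a newline by default. The sketch below assumes that passing the re.DOTALL flag is acceptable for your input; the sample string and variable names are made up for illustration.

```python
import re

# Hypothetical sample: one comment that spans two lines.
html = """<p>Start<!-- a comment
that spans two lines --> end</p>"""

# Default flags: the dot stops at the newline, so nothing is removed.
single_line = re.sub(r"<!--.*?-->", "", html)

# With re.DOTALL the dot also matches newlines, so the comment is removed.
multi_line = re.sub(r"<!--.*?-->", "", html, flags=re.DOTALL)

print(single_line == html)
print(multi_line)
```

With re.DOTALL the non-greedy question mark matters even more, since a greedy match could then run from the first comment on the page to the end of the last one.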
Summary: These simple methods are best kept for pages that contain no errors or unexpected markup.
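For markup that is less predictable, a parser is one possible alternative to the regular expression. The sketch below is not from the original article: it swaps in HTMLParser from the Python standard library, and the names TextExtractor and strip_tags are hypothetical helpers invented for this example.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collects the text found between tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        # Called by the parser for each run of text outside of tags.
        self.parts.append(data)


def strip_tags(html):
    """Hypothetical helper: return html with its tags removed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


# A quoted attribute value that contains a greater-than sign.
sample = '<a title="x > y">link</a> text'

regex_result = re.sub(r"<.*?>", "", sample)
parser_result = strip_tags(sample)

print(repr(regex_result))
print(repr(parser_result))
```

The regular expression ends the tag at the greater-than sign inside the attribute value and leaves part of the tag behind, while the parser treats the quoted value as part of the tag.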