Methods: In the class, we override 2 methods: handle_starttag and handle_data. Other handler methods, such as handle_endtag and handle_comment, can be overridden as well.
Here: We just set a field "tag" to the name of the current start tag in handle_starttag.
And: Then when we encounter data, in handle_data, we use the previous tag name to help identify that data.
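Caution: This approach is not ideal. The "tag" field only remembers the most recently opened tag, so text that comes after an end tag is labeled with that tag rather than the element that actually contains it. If you are just searching for simple tags, like title or h1 elements, it works.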
Python program that uses html.parser
from html.parser import HTMLParser

# A class that inherits from HTMLParser.
# ... It overrides two handler methods.
class TagParser(HTMLParser):
    def __init__(self):
        super().__init__()
        # No tag has been seen yet, so data before the first tag is safe.
        self.tag = None

    def handle_starttag(self, tag, attrs):
        # Set "tag" field to the name of the opened tag.
        self.tag = tag

    def handle_data(self, data):
        # Print data along with the most recently opened tag.
        print(f"{self.tag}: {data}")

parser = TagParser()
parser.feed("<h1>Python</h1>" +
            "<p>Is cool.</p>")
parser.close()
Output
h1: Python
p: Is cool.
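The Caution above notes that remembering only the last start tag mislabels text that follows an end tag. One way around this, sketched here as an assumption rather than the article's own method, is to also override handle_endtag and keep a stack of open tags, attributing each text run to the innermost one. The class name StackTagParser and the VOID_TAGS set are hypothetical; void elements such as br never receive a matching end tag, so they are not pushed.

```python
from html.parser import HTMLParser

# Hypothetical, partial set of HTML void elements (they have no end tag).
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}

class StackTagParser(HTMLParser):
    """Sketch: attribute each text run to the innermost open tag."""

    def __init__(self):
        super().__init__()
        self.stack = []    # names of currently open tags
        self.results = []  # (tag, text) pairs collected so far

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching start tag, but only if it is open.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        # Skip whitespace-only runs and text outside any element.
        if data.strip() and self.stack:
            self.results.append((self.stack[-1], data.strip()))

parser = StackTagParser()
parser.feed("<div><h1>Python</h1>tail text<p>Is <b>very</b> cool.</p></div>")
parser.close()
for tag, text in parser.results:
    print(tag + ":", text)
```

With the TagParser version, "tail text" would be printed as "h1: tail text"; the stack version reports it as belonging to div, and " cool." after the b element is credited back to p.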
Tip: A class that derives from HTMLParser is an ordinary Python class, so you can add fields, helper methods and any other Python statements to it.
And: This makes it possible to develop a custom HTML parser. HTMLParser does the tedious work of tokenizing HTML syntax, and your code only reacts to tags and data.
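As an illustration of a custom parser built this way, here is a hypothetical HeadingCollector (the class, its headings field and its outline helper are made up for this sketch) that stores h1 to h3 text in a list instead of printing it, then adds a plain method that formats the result.

```python
from html.parser import HTMLParser

class HeadingCollector(HTMLParser):
    """Hypothetical example: gather the text of h1-h3 headings."""

    HEADINGS = {"h1", "h2", "h3"}

    def __init__(self):
        super().__init__()
        self.current = None   # heading tag we are inside, or None
        self.headings = []    # (tag, text) pairs

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS:
            self.current = tag
            self.headings.append((tag, ""))

    def handle_endtag(self, tag):
        if tag == self.current:
            self.current = None

    def handle_data(self, data):
        # Append text to the open heading, including text in nested tags.
        if self.current is not None:
            tag, text = self.headings[-1]
            self.headings[-1] = (tag, text + data)

    def outline(self):
        # Ordinary helper method: indent each heading by its level.
        return ["  " * (int(tag[1]) - 1) + text.strip()
                for tag, text in self.headings]

collector = HeadingCollector()
collector.feed("<h1>Python</h1><p>Intro</p>"
               "<h2>Parsing <em>HTML</em></h2><h3>Attributes</h3>")
collector.close()
print(collector.outline())
```

Because the heading is only cleared by its own end tag, the nested em element does not cut "Parsing HTML" short.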
Tip: The attrs argument is a list of (name, value) tuples, so you can loop over it like any other list. The for-loop is ideal.
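A minimal sketch of such a loop, using a hypothetical LinkParser that collects the href of every a tag. HTMLParser lowercases tag and attribute names, and an attribute written without a value (for example a bare href) arrives with the value None, so the loop skips those.

```python
from html.parser import HTMLParser

class LinkParser(HTMLParser):
    """Illustrative sketch: collect the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        # attrs is a list of (name, value) tuples.
        for name, value in attrs:
            if name == "href" and value is not None:
                self.links.append(value)

parser = LinkParser()
parser.feed('<p><a href="/python">Python</a> and '
            '<a class="x" HREF="/go">Go</a> <a name="top">Top</a></p>')
parser.close()
print(parser.links)
```

Since the uppercase HREF is reported as href, both links are found, while the anchor without an href is ignored. When only one attribute matters, dict(attrs).get("href") is a shorter alternative to the loop.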