TheDeveloperBlog.com


Python XML: Expat, StartElementHandler

Investigate ways to parse XML. Use Expat, in xml.parsers.expat.
XML. Many programs require XML support. XML is a markup language that is commonly used for configuration and data files. In Python, several modules offer XML support, and each has its own features and advantages. Here we test Expat, one of those libraries.
To begin, we import the xml.parsers.expat module. In this example, we append start tag names and element character data to a list. We assign lambda expressions to the parser's StartElementHandler and CharacterDataHandler attributes.

Tip: The name of a function defined with def could be assigned instead. A lambda expression is sufficient when the handler needs just one expression.


Info: For StartElementHandler, we append the tag name to the list. For CharacterDataHandler, we append the data.

And: This yields a list containing start element names, and the contents of those elements.

Note: Many other handlers, including EndElementHandler and CommentHandler, are available. Please see the Python documentation.

xml.parsers.expat: Python.org
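
Example: The tip and note above can be sketched together. This is a minimal illustration, not the only way to wire handlers: it assigns def-based functions instead of lambdas, and also sets EndElementHandler and CommentHandler. The names events, start_element, end_element and comment are hypothetical, chosen for this sketch.

```python
import xml.parsers.expat

# Hypothetical container: collects one tuple per parser event.
events = []

def start_element(name, attrs):
    # attrs is a dict mapping attribute names to values.
    events.append(("start", name, dict(attrs)))

def end_element(name):
    events.append(("end", name))

def comment(data):
    # data is the text between <!-- and -->.
    events.append(("comment", data))

# Create the parser and assign the named functions as handlers.
parser = xml.parsers.expat.ParserCreate()
parser.StartElementHandler = start_element
parser.EndElementHandler = end_element
parser.CommentHandler = comment

parser.Parse('<item id="1"><!-- note --><name>Sam</name></item>', True)

for event in events:
    print(event)
```

Named functions are easier to extend than lambdas once a handler needs more than one statement, such as a condition or some bookkeeping.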
Python program that uses xml.parsers.expat

    import xml.parsers.expat

    # Will store tag names and char data.
    # (Named items so the built-in list type is not shadowed.)
    items = []

    # Create the parser.
    parser = xml.parsers.expat.ParserCreate()

    # Specify handlers.
    parser.StartElementHandler = lambda name, attrs: items.append(name)
    parser.CharacterDataHandler = lambda data: items.append(data)

    # Parse a string.
    parser.Parse("""<?xml version="1.0"?>
    <item><name>Sam</name>
    <name>Mark</name>
    </item>""", True)

    # Print the items in our list.
    print(items)

Output

    ['item', 'name', 'Sam', '\n', 'name', 'Mark', '\n']
Newlines. Please notice how newlines are treated as character data. This is not the ideal effect for most programs. Newlines could be ignored, or filtered out of the list with helper methods. This would yield a better data model.
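
Example: One way to ignore those newlines is to skip whitespace-only character data inside the handler. The sketch below assumes that whitespace between elements carries no meaning in your data. The helper name parse_without_whitespace is hypothetical.

```python
import xml.parsers.expat

def parse_without_whitespace(xml_text):
    """Hypothetical helper: collect tag names and non-blank character data."""
    items = []

    def start_element(name, attrs):
        items.append(name)

    def char_data(data):
        # Skip chunks that contain only whitespace, such as "\n".
        if data.strip():
            items.append(data)

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start_element
    parser.CharacterDataHandler = char_data
    parser.Parse(xml_text, True)
    return items

result = parse_without_whitespace("""<?xml version="1.0"?>
<item><name>Sam</name>
<name>Mark</name>
</item>""")
print(result)
```

For this input the result is ['item', 'name', 'Sam', 'name', 'Mark']. Expat may deliver one run of text in several calls, so the check is applied to each chunk rather than to the whole element content.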
Discussion. Expat is not specific to Python. It is an older XML library, written in C, that James Clark created in 1998; Python exposes it through xml.parsers.expat. It has excellent performance and is noted as a "fast" parser, since compiled C code generally outperforms equivalent Python code. Expat is non-validating: it does not check a document against a DTD or schema.

So: For performance, Expat is a good choice. It may be harder to use than other solutions. This is a tradeoff you must evaluate.
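
Example: Although Expat does no validation, it still rejects input that is not well-formed by raising ExpatError. The sketch below shows one way to catch that error. The helper name is_well_formed is hypothetical, and the exact error message text depends on the Expat version.

```python
import xml.parsers.expat
from xml.parsers.expat import ExpatError

def is_well_formed(xml_text):
    """Hypothetical helper: return True if Expat parses the text without error."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(xml_text, True)
        return True
    except ExpatError as err:
        # err.code, err.lineno and err.offset describe the failure.
        message = xml.parsers.expat.ErrorString(err.code)
        print("Parse error:", message, "at line", err.lineno, "column", err.offset)
        return False

print(is_well_formed("<item><name>Sam</name></item>"))  # well-formed
print(is_well_formed("<item><name>Sam</item>"))         # name is never closed
```

The first call returns True and the second returns False. Wrapping Parse in try/except like this is part of the extra effort that the tradeoff above refers to.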

The Expat XML Parser: github.io
Summary. Nearly every developer will encounter XML files and need to parse them. No one way is ideal. A custom string-based parser, written in Python, is sometimes a good choice. A regular expression, with the re module, may be effective for simple, predictable input.

But: A C-based, optimized XML parser like Expat is likely one of the fastest options. It also requires less testing than custom code, because it is already developed and widely used.
