Python parse XML with lxml library
In this post, you will learn about the Python lxml library and how it can be used for XML parsing and data scraping.
Python has many libraries for reading and writing data in XML format. lxml is one that has consistently strong performance and is easy to use when parsing large files. This post covers the installation process and some important features of the lxml library.
Install lxml library
First, we need to install the lxml library. The following command, run in a terminal window, installs it using the pip tool.
pip install lxml
On successful installation, pip prints something like this:
Collecting lxml
Downloading lxml-4.5.0-cp37-cp37m-win_amd64.whl (3.7 MB)
|████████████████████████████████| 3.7 MB 384 kB/s
Installing collected packages: lxml
Successfully installed lxml-4.5.0
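To confirm that the installation worked, one quick check is to import the library and print its version. This is a small sketch; the version tuple on your machine will depend on the release that pip installed.

```python
from lxml import etree

# LXML_VERSION is a tuple of four integers, for example (4, 5, 0, 0).
installed_version = etree.LXML_VERSION
print(installed_version)
```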
lxml objectify
In lxml.objectify, element trees provide an API that models the behaviour of normal Python object trees as closely as possible. We import it as follows:
from lxml import objectify
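As a minimal sketch of what this object-like API looks like, the snippet below parses a small XML string with objectify.fromstring() and reads child elements as ordinary attributes. The inline XML is an illustrative, shortened version of the vehicle data used later in this post.

```python
from lxml import objectify

# Illustrative XML (a shortened version of the vehicle data used later).
xml_text = b"<vehicle><car><engine>VVT Petrol</engine><cylinder>4</cylinder></car></vehicle>"

root = objectify.fromstring(xml_text)

# Child elements are reachable as ordinary Python attributes.
car_engine = root.car.engine.text     # text content as a string
cylinders = root.car.cylinder.pyval   # value converted to a Python type

print(car_engine)
print(cylinders, type(cylinders).__name__)
```

Here pyval turns the text "4" into the integer 4, while text always gives the raw string.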
xml parse() method
The parse() method can be used to parse files and file-like objects. It returns an element tree, and calling getroot() on that tree gives the root element.
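The following sketch shows parse() on a file-like object. An in-memory io.BytesIO buffer stands in for a real file here, and its contents are illustrative only.

```python
import io
from lxml import objectify

# A BytesIO object stands in for a file on disk (illustrative data).
xml_file = io.BytesIO(b"<vehicle><car><name>Suzuki Dzire</name></car></vehicle>")

tree = objectify.parse(xml_file)  # returns an element tree
root = tree.getroot()             # the top-level <vehicle> element

print(root.tag)
```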
Get XML element children
We can retrieve a list of all children of an element by calling the getchildren() method.
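A short sketch of getchildren() on an objectified element follows; the XML string is illustrative. Note that in lxml.objectify, iterating directly over an element visits siblings with the same tag rather than its children, which is why getchildren() is used here.

```python
from lxml import objectify

# Illustrative XML with two child elements under <car>.
root = objectify.fromstring(
    b"<vehicle><car><name>Suzuki Dzire</name><seating>5</seating></car></vehicle>"
)

# getchildren() returns the direct children of <car> in document order.
children = root.car.getchildren()
tags = [child.tag for child in children]
values = [child.pyval for child in children]

print(tags)
print(values)
```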
Python Code to parse XML with lxml library
Suppose we have the following XML file.
<vehicle>
    <car>
        <name>Suzuki Dzire</name>
        <engine>VVT Petrol</engine>
        <displacement>1197CC</displacement>
        <cylinder>4</cylinder>
        <drivetrain>FWD</drivetrain>
        <emission>BS VI</emission>
        <seating>5</seating>
        <mileage>22kmpl</mileage>
        <power_steering>Yes</power_steering>
        <tyre_type>tubeless</tyre_type>
    </car>
</vehicle>
Here is the Python code to parse the XML file. It skips the elements listed in skip_fields and displays the remaining elements and their values.
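from lxml import objectify

path = 'vehicle.xml'

# Parse the file; the with block closes it once parsing is done
with open(path, 'rb') as xml_file:
    parsed = objectify.parse(xml_file)
root = parsed.getroot()

data = []
# Tags to leave out of the result
skip_fields = ['tyre_type', 'power_steering', 'emission']

# Iterating over root.car visits every <car> element
for elt in root.car:
    el_data = {}
    for child in elt.getchildren():
        if child.tag in skip_fields:
            continue
        # pyval converts the element text to a Python value
        el_data[child.tag] = child.pyval
    data.append(el_data)

print(data)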
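Running the above code prints a list containing one dictionary per car element. Each dictionary maps a child tag to its value, with the tags listed in skip_fields left out.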
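To try the same logic without a file on disk, here is a self-contained sketch that works on an inline XML string. The helper name cars_to_dicts and the shortened XML are assumptions made for illustration, not part of the original script.

```python
from lxml import objectify

# Shortened, illustrative version of the vehicle XML.
XML_TEXT = b"""<vehicle>
  <car>
    <name>Suzuki Dzire</name>
    <engine>VVT Petrol</engine>
    <cylinder>4</cylinder>
    <emission>BS VI</emission>
    <power_steering>Yes</power_steering>
    <tyre_type>tubeless</tyre_type>
  </car>
</vehicle>"""


def cars_to_dicts(root, skip_fields):
    """Return one dict per <car>, leaving out tags listed in skip_fields."""
    records = []
    for car in root.car:  # visits every <car> element
        record = {
            child.tag: child.pyval
            for child in car.getchildren()
            if child.tag not in skip_fields
        }
        records.append(record)
    return records


root = objectify.fromstring(XML_TEXT)
result = cars_to_dicts(root, {"tyre_type", "power_steering", "emission"})
print(result)
# [{'name': 'Suzuki Dzire', 'engine': 'VVT Petrol', 'cylinder': 4}]
```

Passing skip_fields as a set keeps the membership check cheap if the list of skipped tags grows.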
