I'm trying to learn how to use Python to interact with an XML file. The eventual goal is to plot some attributes of the XML file's nodes as a scatter plot using a Python library called matplotlib. First, I need to learn how to work with XML through Python.
Starting here: https://docs.python.org/3/library/xml.dom.html#module-xml.dom
The page talks about something called SAX, which I know nothing about. Apparently when parsing XML, we choose between SAX and DOM. I know DOM stands for Document Object Model, but I don't really know what it means, in practical terms. So, good time to learn more. I went and found this thread, which explains the difference well:
http://stackoverflow.com/questions/6828703/what-is-the-difference-between-sax-and-dom
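As I understand it, SAX fires events (start of an element, end of an element, text) while the XML is being read and never holds the whole document in memory, while DOM parses the entire XML file first and builds an in-memory tree of nodes that I can then search and walk through.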
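To make the difference concrete for myself, here is a minimal sketch that reads the same tiny document both ways. The XML string, the RowHandler class, and the variable names are made up for illustration; only the xml.sax and xml.dom.minidom calls come from the standard library. It assumes Python 3.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical sample document, just for this comparison.
XML = '<rows><row Id="1" Score="5"/><row Id="2" Score="7"/></rows>'


class RowHandler(xml.sax.ContentHandler):
    """SAX style: the parser calls startElement as it reaches each tag."""

    def __init__(self):
        super().__init__()
        self.scores = []

    def startElement(self, name, attrs):
        # We only see one element at a time; nothing is kept unless we keep it.
        if name == "row":
            self.scores.append(attrs.getValue("Score"))


handler = RowHandler()
xml.sax.parseString(XML.encode("utf-8"), handler)
sax_scores = handler.scores

# DOM style: parse everything into a tree first, then query the tree.
doc = minidom.parseString(XML)
dom_scores = [row.getAttribute("Score") for row in doc.getElementsByTagName("row")]

print(sax_scores)
print(dom_scores)
```

Both approaches end up with the same values here. The practical difference is that the DOM version keeps the whole tree around for later queries, while the SAX version only keeps what the handler chooses to store.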
Now that I understand a little more about XML and the DOM, time to use Python to interact with XML through the DOM. Apparently the stuff you use to interact with XML in Python comes with the base Python download, so I won't need to install any extra libraries just yet.
I'm using this video to see an example:
https://www.youtube.com/watch?v=aCksVW1YUHs
The video linked me to some source code here, which I copy-pasted and adapted into this:
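from xml.dom import minidom
import matplotlib.pyplot as plt  # not used yet; needed later for the scatter plot

# Parse the whole file into a DOM tree
doc = minidom.parse("staff.xml")
# doc.getElementsByTagName returns NodeList
rows = doc.getElementsByTagName("row")
for row in rows:
    # getAttribute always returns a string, so compare against "1", not 1
    if row.getAttribute("Id") == "1":
        rid = 1
        viewCount = row.getAttribute("ViewCount")
        score = row.getAttribute("Score")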
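Since the end goal is a scatter plot, here is a rough sketch of how the attribute values could be collected into two lists of numbers. The sample XML, the extract_points and plot_points helpers, and the choice of ViewCount for the x-axis and Score for the y-axis are my own assumptions, not something from the video's source code. matplotlib is imported inside plot_points so the extraction part runs even before matplotlib is installed.

```python
from xml.dom import minidom

# Hypothetical sample data; the real file would be parsed with minidom.parse(...).
SAMPLE = """<rows>
  <row Id="1" ViewCount="120" Score="4"/>
  <row Id="2" ViewCount="45" Score="1"/>
  <row Id="3" Score="9"/>
</rows>"""


def extract_points(doc):
    """Return (view_counts, scores) as lists of ints, skipping incomplete rows."""
    xs, ys = [], []
    for row in doc.getElementsByTagName("row"):
        # Skip rows that are missing either attribute.
        if not (row.hasAttribute("ViewCount") and row.hasAttribute("Score")):
            continue
        # getAttribute returns strings, so convert before plotting.
        xs.append(int(row.getAttribute("ViewCount")))
        ys.append(int(row.getAttribute("Score")))
    return xs, ys


def plot_points(xs, ys):
    """Draw the scatter plot; requires matplotlib to be installed."""
    import matplotlib.pyplot as plt

    plt.scatter(xs, ys)
    plt.xlabel("ViewCount")
    plt.ylabel("Score")
    plt.show()


view_counts, scores = extract_points(minidom.parseString(SAMPLE))
print(view_counts, scores)
```

Once matplotlib imports cleanly, calling plot_points(view_counts, scores) should open the plot window.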
There is some confusion about whether to use Python 3.4 or 2.7, or if it matters. (I have both installed.) I think the conclusion from what I read is that matplotlib works with Python 2.7, so I'll do everything in 2.7.
Tried to run Grimm's sample code -- got error:
ImportError: No module named six
So, googled the error. Found out: This error means that matplotlib depends on a module named six and that I haven't installed the library which contains the six module.
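A quick way to check for this kind of problem before running a script is to ask Python whether a module can be found at all. This is only a sketch: the missing_modules helper is a name I made up, and importlib.util.find_spec needs Python 3.4 or newer, so it would not work as written under 2.7.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# "six" is the module named in the ImportError above.
missing = missing_modules(["six"])
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All requested modules can be found.")
```

Anything the helper reports as missing can then be installed with pip.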
The six module is pretty cool, being a compatibility library. I downloaded it. Now I have to figure out what to do with the thing I downloaded. Apparently there's a thing called a python wheel. By googling "Install python wheel", I found a video that explains this a little bit.
Basically, I didn't need to download the six module after all -- pip will download and install it for me when I type this into the command line and press Enter:
pip install six
So, six is now installed. I try running Grimm's code again and get this:
ImportError: matplotlib requires dateutil
Oops, looks like I didn't follow the instructions for matplotlib very well. They say that I need to install a number of Python libraries. So, I'll try to use the installer tool I downloaded:
pip install numpy
An installation occurs, but an error happens. It's a complicated one and something I've never dealt with before. I ignore it for now. I'll circle back to it if things aren't working after I install the rest of the libraries.
Well, I got an error message that told me the error was important, so looks like I'll have to fix it. I google the error and find this:
http://stackoverflow.com/questions/2817869/error-unable-to-find-vcvarsall-bat
Which points me to this:
http://stackoverflow.com/questions/6551724/how-do-i-point-easy-install-to-vcvarsall-bat
Which tells me that I need to have Visual Studio installed, which raises the question, "Don't I have to pay for that or something?" But I suspect that's off. So I google "Is visual studio free?" The answer seems to be NO--there's a trial version. But I'd rather just go back and install Linux finally. That'll be its own post.