Tuesday, 27 August 2013

Parsing XML with Python ignoring parts

I'm having difficulty parsing a particular style of XML.
The XML file looks like:
<channels>
  <genre type="blah1">
    <channel name="Channel 1">
      <show>
        <title>hello</title>
      </show>
    </channel>
    <channel name="Channel 2">
      <show>
        <title>hello</title>
      </show>
    </channel>
  </genre>
  <genre type="blah2">
    <channel name="Channel 3">
      <show>
        <title>hello</title>
      </show>
    </channel>
  </genre>
</channels>
So my problem is as follows:
channelList = rootElem.find(".//channel[@name]")
howManyChannels = len(channelList)
for x in range(1, howManyChannels):
    print x
    print rootElem.find(".//channel[@name]["+str(x)+"]").get('name')
    for y in rootElem.find(".//channel[@name]["+str(x)+"]"):
        print y.findtext('title')
This gets to Channel 2 and then errors with:
Traceback (most recent call last):
  File "parse.py", line 17, in <module>
    print rootElem.find(".//channel[@name]["+str(x)+"]").get('name')
AttributeError: 'NoneType' object has no attribute 'get'
Why doesn't the code:
    for y in rootElem.find(".//channel[@name]["+str(x)+"]"):
include the 3rd channel? Is it being isolated because it is in another
genre tag? How do I change the code to accommodate this?
Thanks
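
A likely explanation, offered as a sketch rather than a definitive answer: in ElementTree's limited XPath support, a positional predicate such as [3] counts an element's position among the siblings under its own parent, not its position in the overall list of matches. Channel 3 is the first channel inside its genre, so no channel is ever "third" and find() returns None. Iterating over findall() avoids the index entirely. The example below assumes the standard-library xml.etree.ElementTree module and Python 3 print() syntax (the question uses Python 2), and list_channels is a hypothetical helper name introduced only for illustration.

```python
import xml.etree.ElementTree as ET

# Well-formed sample modeled on the document in the question.
XML = """<channels>
  <genre type="blah1">
    <channel name="Channel 1"><show><title>hello</title></show></channel>
    <channel name="Channel 2"><show><title>hello</title></show></channel>
  </genre>
  <genre type="blah2">
    <channel name="Channel 3"><show><title>hello</title></show></channel>
  </genre>
</channels>"""

rootElem = ET.fromstring(XML)

# [3] means "third <channel> under the same parent". Neither <genre>
# holds three channels, so nothing matches and find() returns None.
third = rootElem.find(".//channel[3]")
print(third)  # None


def list_channels(root):
    """Return (channel name, [show titles]) for every named channel.

    Hypothetical helper: findall() walks all genres, so no positional
    index is needed.
    """
    result = []
    for channel in root.findall(".//channel[@name]"):
        titles = [show.findtext("title") for show in channel.findall("show")]
        result.append((channel.get("name"), titles))
    return result


for name, titles in list_channels(rootElem):
    print(name, titles)
```

With the sample above, the loop visits Channel 1, Channel 2 and Channel 3 in document order, regardless of which genre each one sits in.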
