Sunday, 8 September 2013

Extract a certain part of HTML with XPath and HtmlAgilityPack

I am having an issue with XPath syntax, as I don't understand how to use it
to extract certain HTML elements. I am trying to load a video's
information from a channel page:
http://www.youtube.com/user/CinemaSins/videos
I know there is a line that holds all the details: views, title, ID, etc.
Here is what I am trying to get from within the HTML
(it is line 2836 of the page source):
<div class="yt-lockup clearfix yt-lockup-video yt-lockup-grid
context-data-item" data-context-item-id="ntgNB3Mb08Y"
data-context-item-views="243,456 views" data-context-item-time="9:01"
data-context-item-type="video" data-context-item-user="CinemaSins"
data-context-item-title="Everything Wrong With The Chronicles Of Riddick
In 8 Minutes Or Less">
I'm not sure how to do it, but I have HtmlAgilityPack added as a resource and
have started attempting to get it. Can someone explain how to get all of
those details and the XPath syntax involved?
What I have attempted:
foreach (HtmlNode node in doc.DocumentNode.SelectNodes(
    "//div[@class='yt-lockup clearfix yt-lockup-video " +
    "yt-lockup-grid context-data-item']//a"))
{
    if (node.ChildNodes[0].InnerHtml != String.Empty)
    {
        title.Add(node.ChildNodes[0].InnerHtml);
    }
}
^ The above code only gets the title of each video, but it also produces a
blank entry. The code was executed and the result is below.
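
One possible approach, as a minimal sketch rather than a definitive answer: since every detail sits in a data-context-item-* attribute of the div itself, the XPath can select the div by the presence of one of those attributes (//div[@data-context-item-id]) and the values can be read with GetAttributeValue, without descending into the <a> children. Matching on attribute presence also avoids comparing the whole class string, which breaks as soon as the class list changes. The names VideoInfo and ExtractVideos are hypothetical and only for illustration, and the sample HTML in Main is just the div quoted above.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical container for the values of one video entry.
class VideoInfo
{
    public string Id, Title, Views, Time, User;
}

static class Program
{
    // Hypothetical helper: reads the data-context-item-* attributes
    // from every div that carries a data-context-item-id attribute.
    static List<VideoInfo> ExtractVideos(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var videos = new List<VideoInfo>();

        // [@attr] matches any element that has the attribute, whatever its value.
        HtmlNodeCollection nodes =
            doc.DocumentNode.SelectNodes("//div[@data-context-item-id]");

        // SelectNodes returns null (not an empty collection) when nothing matches.
        if (nodes == null)
            return videos;

        foreach (HtmlNode node in nodes)
        {
            videos.Add(new VideoInfo
            {
                Id = node.GetAttributeValue("data-context-item-id", ""),
                // Decode HTML entities such as &amp; that may appear in titles.
                Title = HtmlEntity.DeEntitize(
                    node.GetAttributeValue("data-context-item-title", "")),
                Views = node.GetAttributeValue("data-context-item-views", ""),
                Time = node.GetAttributeValue("data-context-item-time", ""),
                User = node.GetAttributeValue("data-context-item-user", "")
            });
        }
        return videos;
    }

    static void Main()
    {
        // Sample input: the div quoted in the question, closed for completeness.
        string html =
            "<div class=\"yt-lockup clearfix yt-lockup-video yt-lockup-grid " +
            "context-data-item\" data-context-item-id=\"ntgNB3Mb08Y\" " +
            "data-context-item-views=\"243,456 views\" " +
            "data-context-item-time=\"9:01\" " +
            "data-context-item-type=\"video\" " +
            "data-context-item-user=\"CinemaSins\" " +
            "data-context-item-title=\"Everything Wrong With The Chronicles " +
            "Of Riddick In 8 Minutes Or Less\"></div>";

        foreach (VideoInfo v in ExtractVideos(html))
        {
            Console.WriteLine("{0} | {1} | {2} | {3} | {4}",
                v.Id, v.Title, v.Views, v.Time, v.User);
        }
    }
}
```

For the live page, the HTML would first have to be downloaded, for example with new HtmlWeb().Load(url), and the sketch assumes the page still uses these attribute names. As for the blank entry in the attempt above, a likely cause (an assumption, not verified against the page) is that //a matches more than one link per video, such as the thumbnail link, whose first child node is only whitespace and therefore passes the String.Empty check.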
