
Using DOMDocument and DOMXPath to scrape a site


cags


OK, it would appear that my expectations and reality don't match up. I don't know whether I'm going about this completely wrong or simply making a small mistake, but I'm getting nothing but a headache from it. I have an HTML page that has been fetched from a site using cURL, and I am attempting to extract various bits of information from it. Since the HTML is so large I won't post it (at least for now). Hopefully it will suffice to say that the page contains many divs with class="slot". I am attempting to loop through them and, within each one, I am (currently) trying to fetch the href attribute of an a tag inside a div with class="something". A basic example of the HTML structure...

 

<div id="slots">
   <div class="slot">
      <div class="something">
         <a href="http://www.google.com">Google</a>
      </div>
   </div>
   <div class="slot">
      <div class="something">
         <a href="http://www.yahoo.com">Yahoo</a>
      </div>
   </div>
</div>

 

This is the core of the code I've been trying. I've tried *many* variations, but it seems that somewhere along the line I'm making an assumption I shouldn't be.

 

$dom = new DOMDocument;
libxml_use_internal_errors(true);
@$dom->loadHTML($html);
libxml_use_internal_errors(false);
$xpath = new DOMXPath($dom);

$query = $xpath->query('//div[@class="slot"]');

foreach($query as $node) {
    $q = $xpath->query('//div[@class="titlelogo"]/a', $node);
    echo $q->item(0)->attributes->getNamedItem('href')->value;
}

Before somebody mentions it: yes, I know I could just do something like //div[@class="slot"]/*/a (or whatever the exact syntax is for that), or even build a full relative path. The point is that the contents of each 'slot' div are all related, so I need to work on each 'slot' individually.
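
For reference, here is a minimal sketch of the per-slot approach. It rests on one fact about XPath: a path that begins with // is evaluated from the document root even when a context node is passed to query(), so the inner query has to start with .// to be relative to the current slot. The $html string below is only a stand-in for the real page, and it uses the "something" class from the example markup rather than the class name in the code above.

```php
<?php
// Sample markup mirroring the structure in the example above.
// This is an assumption standing in for the real page fetched with cURL.
$html = <<<HTML
<div id="slots">
   <div class="slot">
      <div class="something">
         <a href="http://www.google.com">Google</a>
      </div>
   </div>
   <div class="slot">
      <div class="something">
         <a href="http://www.yahoo.com">Yahoo</a>
      </div>
   </div>
</div>
HTML;

$dom = new DOMDocument;
libxml_use_internal_errors(true);   // collect parse warnings instead of emitting them
$dom->loadHTML($html);
libxml_clear_errors();              // discard the collected warnings
libxml_use_internal_errors(false);

$xpath = new DOMXPath($dom);
$hrefs = [];

foreach ($xpath->query('//div[@class="slot"]') as $slot) {
    // The leading "." makes the path relative to $slot; without it,
    // "//" searches the whole document on every iteration.
    $links = $xpath->query('.//div[@class="something"]/a', $slot);

    // Guard against slots that contain no matching link.
    if ($links->length > 0) {
        $hrefs[] = $links->item(0)->getAttribute('href');
    }
}

echo implode("\n", $hrefs), "\n";
```

Any further queries for related data inside the same slot can follow the same pattern: pass $slot as the context node and keep the leading dot on the path.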

