cags Posted January 19, 2010

Ok, it would appear that my expectations and reality don't match up. I don't know if I'm going about this completely wrong or simply making a small mistake, but I'm getting nothing but a headache from it.

I have an HTML page that has been fetched from a site using cURL, and I am attempting to extract various bits of information from it. Since the HTML is so large I won't post it (at least for now). Hopefully it will suffice to say that the page contains many divs with class="slot". I am attempting to loop through them, and within each one I am (currently) trying to fetch the href attribute of an a tag that sits inside a div with class="something".

A basic example of the structure...

    <div id="slots">
      <div class="slot">
        <div class="something">
          <a href="http://www.google.com">Google</a>
        </div>
      </div>
      <div class="slot">
        <div class="something">
          <a href="http://www.yahoo.com">Yahoo</a>
        </div>
      </div>
    </div>

This is the core of the code I've been trying. I've tried *many* variations, but it seems that somewhere along the line I'm making an assumption I shouldn't be.

    $dom = new DOMDocument;
    libxml_use_internal_errors(true);
    @$dom->loadHTML($html);
    libxml_use_internal_errors(false);

    $xpath = new DOMXPath($dom);
    $query = $xpath->query('//div[@class="slot"]');

    foreach($query as $node) {
        $q = $xpath->query('//div[@class="something"]/a', $node);
        echo $q->item(0)->attributes->getNamedItem('href')->value;
    }

Before somebody mentions it: yes, I know I could just do something like //div[@class=slot]/*/a (or whatever the exact syntax is for that), or even build a full relative path, but the point is that the contents of each 'slot' div are related, so I need to work on each 'slot' individually.

Link to comment: https://forums.phpfreaks.com/topic/189017-using-domdocument-and-domxpath-to-scrape-a-site/

salathe Posted January 19, 2010

The second XPath query (within the loop) is evaluated from the root of the document even though you're passing a context node, because a path that begins with // is absolute. To make the path relative to the context node, start it with a dot (.//div[).
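
For anyone finding this later, here is a minimal sketch of the loop with that one-character change applied. It assumes the sample markup from the first post; the helper name slotLinks() and the inline $html string are made up for illustration and are not from cags' actual page or code.

```php
<?php
// Hypothetical sample markup mirroring the structure shown in the first post.
$html = <<<HTML
<div id="slots">
  <div class="slot">
    <div class="something"><a href="http://www.google.com">Google</a></div>
  </div>
  <div class="slot">
    <div class="something"><a href="http://www.yahoo.com">Yahoo</a></div>
  </div>
</div>
HTML;

/**
 * Return the href of the first matching link inside each div.slot.
 * (Illustrative helper; the name is an assumption, not part of the thread.)
 */
function slotLinks(string $html): array
{
    // Collect libxml parse warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument;
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $links = [];

    foreach ($xpath->query('//div[@class="slot"]') as $slot) {
        // The leading dot anchors the path at $slot rather than the document root.
        $anchors = $xpath->query('.//div[@class="something"]/a', $slot);
        if ($anchors->length > 0) {
            $links[] = $anchors->item(0)->getAttribute('href');
        }
    }

    return $links;
}

print_r(slotLinks($html));
```

With the path written as //div[@class="something"]/a instead, every iteration would match from the document root, so item(0) would be the Google link each time, regardless of which slot is passed as the context node.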

cags Posted January 19, 2010

Thanks salathe, that works much better. I was so near yet so far.