Omzy Posted January 22, 2010
https://forums.phpfreaks.com/topic/189488-scraping-linkscontent/

I've created a scraping script which fetches all links on a page:

    $dom = new DOMDocument();
    @$dom->loadHTML($html);
    $xpath = new DOMXPath($dom);
    $hrefs = $xpath->evaluate("/html/body//a");
    for ($i = 0; $i < $hrefs->length; $i++) {
        $href = $hrefs->item($i);
        $url = $href->getAttribute('href');
        echo "<br />Link: $url";
    }

I now need to extend this further: it needs to go into each link and get, for example, the H1 tag on that page. Can someone provide a sample solution?

premiso Posted January 22, 2010

You would be better off posting a section (10-15 lines max) of the link and the H1 tag you are trying to fetch. It is hard to tell without knowing what is being traversed.

Omzy Posted January 23, 2010

I'm just after a sample solution. The code I posted above fetches all the links on a given page. Now I want it to go into each of those links and output the H1 tag of that page.

jtgraphic Posted January 23, 2010

Use the same code, but search for H1 tags inside the location of the href.

Omzy Posted January 23, 2010

Quote: "Use the same code, but search for H1 tags inside the location of the href."

How?

jtgraphic Posted January 23, 2010

    $hrefs = $xpath->evaluate("/html/body//a");

should be:

    $hrefs = $xpath->evaluate("/html/body//h1");

Omzy Posted January 23, 2010

Lol. I suggest you re-read my original post.

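For reference, the two suggestions above can be combined: collect the links with the `//a` query, fetch each linked page, then run the `//h1` query on that page's HTML. The following is only a minimal sketch under some assumptions, not a tested solution for any particular site. The function names (`xpathFor`, `extractLinks`, `extractH1s`, `scrapeHeadings`) are hypothetical, the fetcher is passed in as a callable so the logic can be tried without network access, and every href is assumed to be an absolute URL (relative links would need resolving against the base page first).

```php
<?php
// Hypothetical helper: build a DOMXPath for an HTML string, or null if it is empty.
function xpathFor(string $html): ?DOMXPath
{
    if (trim($html) === '') {
        return null; // loadHTML() rejects an empty string
    }
    $dom = new DOMDocument();
    @$dom->loadHTML($html); // @ hides warnings caused by malformed markup
    return new DOMXPath($dom);
}

// Hypothetical helper: return the href of every <a> element that has one.
function extractLinks(string $html): array
{
    $xpath = xpathFor($html);
    if ($xpath === null) {
        return [];
    }
    $links = [];
    foreach ($xpath->query('//a[@href]') as $a) {
        $links[] = $a->getAttribute('href');
    }
    return $links;
}

// Hypothetical helper: return the trimmed text of every <h1> element.
function extractH1s(string $html): array
{
    $xpath = xpathFor($html);
    if ($xpath === null) {
        return [];
    }
    $headings = [];
    foreach ($xpath->query('//h1') as $h1) {
        $headings[] = trim($h1->textContent);
    }
    return $headings;
}

// For each link found in $html, fetch the linked page with $fetch and collect
// its H1 headings. $fetch takes a URL and returns HTML, or false on failure.
function scrapeHeadings(string $html, callable $fetch): array
{
    $result = [];
    foreach (extractLinks($html) as $url) {
        $page = $fetch($url);
        if ($page === false) {
            continue; // skip pages that could not be fetched
        }
        $result[$url] = extractH1s($page);
    }
    return $result;
}
```

One possible way to call it, assuming `allow_url_fopen` is enabled so that `file_get_contents` can read URLs: `scrapeHeadings($html, 'file_get_contents')`. The result is an array keyed by URL, each entry holding that page's H1 texts. Passing the fetcher as a parameter is a design choice made here so it can be swapped for cURL or a stub.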