Omzy Posted January 22, 2010

I've created a scrape script which fetches all links on a page:

$dom = new DOMDocument();
@$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$hrefs = $xpath->evaluate("/html/body//a");
for ($i = 0; $i < $hrefs->length; $i++) {
    $href = $hrefs->item($i);
    $url = $href->getAttribute('href');
    echo "<br />Link: $url";
}

I now need to extend this further: it needs to go into each link and get, for example, the H1 tag on the page. Can someone provide a sample solution?

https://forums.phpfreaks.com/topic/189488-scraping-linkscontent/

premiso Posted January 22, 2010

You would be better off posting a short excerpt (10-15 lines max) of the markup containing the link and the H1 tag you are trying to fetch. It is hard to help without knowing what is being traversed.

Omzy (Author) Posted January 23, 2010

I'm just after a sample solution. The code I posted above fetches all the links on a given page. Now I want it to go into each of those links and output the H1 tag of that page.

jtgraphic Posted January 23, 2010

Use the same code, but search for H1 tags in the page that each href points to.

Omzy (Author) Posted January 23, 2010

Quoting jtgraphic: "Use the same code, but search for H1 tags inside the location of the href."

How?

jtgraphic Posted January 23, 2010

$hrefs = $xpath->evaluate("/html/body//a");

should be:

$hrefs = $xpath->evaluate("/html/body//h1");

Omzy (Author) Posted January 23, 2010

lol. I suggest you re-read my original post.

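
For what the original post asks (follow each link and read the H1 of the linked page), a minimal sketch could look like the following. It is only one possible approach, not something posted in the thread. The function names extractLinks, extractFirstH1 and scrapeH1s are hypothetical, and the sketch assumes that every href is already an absolute URL and that allow_url_fopen is enabled so file_get_contents can fetch remote pages.

```php
<?php
// Sketch only: the helper names below are hypothetical, not from the thread.

/** Return the href value of every <a> element in the body of $html. */
function extractLinks(string $html): array
{
    if ($html === '') {
        return []; // loadHTML rejects empty input
    }
    $dom = new DOMDocument();
    // Suppress warnings from malformed real-world HTML, as in the original post.
    @$dom->loadHTML($html);
    $xpath = new DOMXPath($dom);

    $links = [];
    foreach ($xpath->query('/html/body//a[@href]') as $a) {
        $links[] = $a->getAttribute('href');
    }
    return $links;
}

/** Return the trimmed text of the first <h1> in $html, or null if there is none. */
function extractFirstH1(string $html): ?string
{
    if ($html === '') {
        return null;
    }
    $dom = new DOMDocument();
    @$dom->loadHTML($html);
    $xpath = new DOMXPath($dom);

    $h1 = $xpath->query('//h1')->item(0);
    return $h1 === null ? null : trim($h1->textContent);
}

/**
 * For each link found in $html, fetch the linked page and collect its first H1.
 * Assumption: hrefs are absolute URLs; relative links would first need to be
 * resolved against the URL of the page they came from.
 */
function scrapeH1s(string $html): array
{
    $results = [];
    foreach (extractLinks($html) as $url) {
        $page = @file_get_contents($url); // returns false on failure
        $results[$url] = ($page === false) ? null : extractFirstH1($page);
    }
    return $results;
}
```

With $html holding the page from the original script, something like foreach (scrapeH1s($html) as $url => $h1) { echo "<br />Link: $url H1: $h1"; } would print each link next to its heading. Pages that cannot be fetched, or that have no H1, come back as null.
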