Omzy
Posted January 30, 2010

I've created a scrape script which fetches all the links on a page:

$dom = new DOMDocument();
@$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$links = $xpath->query("//a[@class='listinglink']");
foreach ($links as $item) {
    $url = $item->getAttribute('href');
    echo '<a href="'.$url.'">'.$url.'</a><br/>';
}

I now need to extend this further: the script should go into each link and fetch, for example, the content of all the <p> tags on that page. The output should look like this:

Link 1
P tag 1 content
P tag 2 content
P tag 3 content

Link 2
P tag 1 content
P tag 2 content

...and so on. Can anyone assist me with this?
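Roughly, I picture extending the loop per link like this (a rough, untested sketch on my part; it assumes the hrefs are absolute URLs and that allow_url_fopen is enabled):

foreach ($links as $item) {
    $url = $item->getAttribute('href');
    echo '<a href="'.$url.'">'.$url.'</a><br/>';

    // Fetch the linked page and parse it with a second DOMDocument
    $page = new DOMDocument();
    @$page->loadHTML(file_get_contents($url)); // assumes absolute URLs; @ suppresses malformed-HTML warnings
    foreach ($page->getElementsByTagName('p') as $p) {
        echo $p->textContent.'<br/>'; // plain text of each <p> on the linked page
    }
}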
The Little Guy
Posted January 30, 2010

I would do it a little something like this:

$ch = curl_init();
foreach ($links as $item) {
    curl_setopt($ch, CURLOPT_URL, $item->getAttribute('href'));
    curl_setopt($ch, CURLOPT_HEADER, FALSE);
    curl_setopt($ch, CURLOPT_AUTOREFERER, TRUE);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
    $opt = curl_exec($ch);
    preg_match_all("~<p>(.+?)</p>~", $opt, $matches);
    print_r($matches);
}
curl_close($ch);
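One thing to watch with that pattern: it only matches bare <p> tags, so paragraphs with attributes (e.g. <p class="intro">) slip through, and it won't span newlines. A hedged tweak if you want to stay with regex rather than a DOM parser:

// Also match <p> tags that carry attributes; ~is = case-insensitive, dot matches newlines
preg_match_all("~<p(\s[^>]*)?>(.*?)</p>~is", $opt, $matches);
foreach ($matches[2] as $text) {
    echo strip_tags($text).'<br/>'; // drop any markup nested inside the paragraph
}

Regex on HTML is still fragile in general; running the fetched page through DOMDocument, as in your first script, is the more robust route.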