Omzy Posted January 30, 2010

I've created a scrape script which fetches all links on a page:

```php
$dom = new DOMDocument();
@$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$links = $xpath->query("//a[@class='listinglink']");
foreach ($links as $item) {
    $url = $item->getAttribute('href');
    echo '<a href="'.$url.'">'.$url.'</a><br/>';
}
```

I now need to extend this further: it needs to go into each link and get, for example, the content of all the <p> tags on that page. So the page output should be as follows:

Link 1
P tag 1 content
P tag 2 content
P tag 3 content

Link 2
P tag 1 content
P tag 2 content

...and so on. Can anyone assist me with this?
The Little Guy Posted January 30, 2010

I would do it a little something like this:

```php
$ch = curl_init();
foreach ($links as $item) {
    curl_setopt($ch, CURLOPT_URL, $item->getAttribute('href'));
    curl_setopt($ch, CURLOPT_HEADER, FALSE);
    curl_setopt($ch, CURLOPT_AUTOREFERER, TRUE);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
    $opt = curl_exec($ch);
    preg_match_all("~<p>(.+?)</p>~", $opt, $matches);
    print_r($matches);
}
curl_close($ch);
```
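Putting the two posts together, here is one way the whole thing could look: fetch each linked page with cURL, but parse the <p> tags with DOMDocument rather than a regex, since regexes break on attributes like <p class="..."> or nested markup. This is a sketch, not tested against your target site; the helper names `fetchPage` and `extractParagraphs` are my own, not from the thread.

```php
<?php

// Fetch a URL with cURL and return its HTML (the same options used above).
function fetchPage($url) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_HEADER, FALSE);
    curl_setopt($ch, CURLOPT_AUTOREFERER, TRUE);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
    $html = curl_exec($ch);
    curl_close($ch);
    return $html;
}

// Parse HTML with DOMDocument and return the text of every <p> tag.
function extractParagraphs($html) {
    $dom = new DOMDocument();
    @$dom->loadHTML($html);
    $paragraphs = array();
    foreach ($dom->getElementsByTagName('p') as $p) {
        $paragraphs[] = trim($p->textContent);
    }
    return $paragraphs;
}

// $links is the DOMNodeList from the XPath query in the first post.
foreach ($links as $item) {
    $url = $item->getAttribute('href');
    echo '<a href="'.$url.'">'.$url.'</a><br/>';
    foreach (extractParagraphs(fetchPage($url)) as $text) {
        echo $text.'<br/>';
    }
}
```

Note that if the links are relative (e.g. `/listing/123`), you would need to prepend the site's base URL before passing them to cURL.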