abrilfluke Posted August 4, 2012

I'm building my website with presentations of different products, and I'm facing a few problems using cURL. Basically, I need to get some portions of HTML from different websites and display them on my website, e.g. title, model, description, user reviews, etc.

I managed to get some of the code working, but when I change the source URL it stops working, even though the source site is the same.

My code:

$url = "http://www.tigerdirect.com/applications/SearchTools/item-details.asp?EdpNo=2819129&CatId=4938";
//$url = "http://www.tigerdirect.com/applications/SearchTools/item-details.asp?EdpNo=1808177&csid=_61"; //this one is not working....

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 2); // the value 1 is deprecated; 2 checks the host name
$source = curl_exec($ch);
curl_close($ch);

// Cut out everything between a fixed start marker and a fixed end marker.
$start_description1 = "</tr> </tbody> </table> <p>";
$end_description1 = "</div> </div> <div id=\"Videos\" style=\"display:inline;\">";
$description1_start_pos = strpos($source, $start_description1) + strlen($start_description1);
$description1_end_pos = strpos($source, $end_description1) - $description1_start_pos; // actually a length
$description1 = substr($source, $description1_start_pos, $description1_end_pos);
echo $description1;

It works perfectly, but if I change the URL it won't work. The problem is the start_description HTML code: on other pages the HTML differs. Instead of:

</tr> </tbody> </table> <p>

the new page has:

</tr> </tbody> </table> <p>

or:

</tr> </tbody> </table> <p>

(the same tags, but with different spacing and line breaks between them).

How can I avoid this error? Or what should I do to avoid cURL errors and retrieve the content I want?

Thank you!
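A minimal sketch of a stopgap for the immediate symptom, assuming you keep using start/end markers for now. Two things go wrong in the code above: the markers must match byte for byte, whitespace included, and a strpos() result of false is never checked (PHP treats false as 0 in the arithmetic, so substr() silently returns the wrong slice instead of failing). The hypothetical helpers normalize_tag_whitespace() and extract_between() remove whitespace between tags in both the page and the markers, then return null when a marker is missing. The sample fragments are made up, not real tigerdirect markup.

```php
<?php
// Sketch (assumption): make the start/end-marker approach tolerant of
// whitespace differences between tags, and return null when a marker is
// missing instead of a garbage substring.

// Hypothetical helper: drop whitespace that sits between two tags,
// e.g. "</tr> \n </tbody>" becomes "</tr></tbody>".
function normalize_tag_whitespace(string $html): string
{
    return preg_replace('/>\s+</', '><', $html);
}

// Hypothetical helper: return the text between $start and $end,
// or null if either marker is not found in $source.
function extract_between(string $source, string $start, string $end): ?string
{
    $source = normalize_tag_whitespace($source);
    $start  = normalize_tag_whitespace($start);
    $end    = normalize_tag_whitespace($end);

    $startPos = strpos($source, $start);
    if ($startPos === false) {
        return null; // strpos() returns false (not -1) when nothing matches
    }
    $startPos += strlen($start);

    // Search for the end marker only after the start marker.
    $endPos = strpos($source, $end, $startPos);
    if ($endPos === false) {
        return null;
    }
    return substr($source, $startPos, $endPos - $startPos);
}

// Two made-up fragments: same tags, different spacing and line breaks.
$pageA = "<table><tbody><tr><td>x</td></tr> </tbody> </table> <p>Great laptop</div>";
$pageB = "<table><tbody><tr><td>x</td></tr>\n\t</tbody>\n</table>\n<p>Great laptop</div>";

$start = "</tr> </tbody> </table> <p>";
$end   = "</div>";

var_dump(extract_between($pageA, $start, $end)); // string(12) "Great laptop"
var_dump(extract_between($pageB, $start, $end)); // string(12) "Great laptop"
var_dump(extract_between($pageA, "<h1>", $end)); // NULL
```

This is still plain string matching, so it stays brittle in the way the replies in this thread describe. It also removes whitespace between adjacent inline tags (for example between </b> and <i>), which can change the extracted text.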
gizmola Posted August 4, 2012

It looks to me like your problem doesn't involve cURL at all. It's in trying to parse the portions of the data you want out of the tigerdirect markup. Trying to find variable data inside HTML markup using simple string matching or regular expressions is notoriously painful and error-prone. A much better solution is to load the page into a DOM and use the DOM functions to find and extract the portions you need.
abrilfluke Posted August 4, 2012

Please be kind enough to paste an example of using the DOM. Thank you!
gizmola Posted August 4, 2012

It's not clear to me exactly what you are after. Also, tigerdirect's pages are pretty messy. In this example I just dump out the "ProductReview" portion of the DOM:

// Collect libxml warnings about malformed markup instead of printing them
// (replaces error_reporting(E_ERROR), which hid every non-fatal error).
libxml_use_internal_errors(true);

$urls[] = "http://www.tigerdirect.com/applications/SearchTools/item-details.asp?EdpNo=2819129&CatId=4938";
$urls[] = "http://www.tigerdirect.com/applications/SearchTools/item-details.asp?EdpNo=1808177&csid=_61";

// Fetch a URL and return the response body, or false on failure.
function curlLoad($url)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
    curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 2); // the value 1 is deprecated
    $source = curl_exec($ch);
    curl_close($ch);
    return $source;
}

foreach ($urls as $url) {
    $source = curlLoad($url);
    if ($source === false || $source === '') {
        echo "Could not load $url\n";
        continue;
    }

    // loadHTML() is an instance method; calling DOMDocument::loadHTML()
    // statically is not supported on current PHP versions.
    $dom = new DOMDocument();
    $dom->loadHTML($source);

    $prodReviewElement = $dom->getElementById('ProductReview');

    echo "***********************************************\n\n";
    echo "$url\n";
    echo "***********************************************\n\n";

    // saveXML(null) would dump the whole page, so check the element exists first.
    echo $prodReviewElement ? $dom->saveXML($prodReviewElement) : "(no ProductReview element)\n";
}

Have a look at the DOM manual (DOMDocument, etc.). The only tricky thing I saw was that they often don't use IDs, so if you plan to extract individual elements, many of them would have to be found by class. That requires an XPath search, which is a bit more complicated, but still the best approach.
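Following up on the XPath remark, here is a minimal sketch of selecting elements by class with DOMXPath. It runs on an inline HTML string rather than a live page, and the markup and class names (prodName, review) are invented for illustration, not taken from tigerdirect.com. The hypothetical helper elementsByClass() uses the common contains(concat(' ', normalize-space(@class), ' '), ' name ') idiom, so it matches a whole class word inside a multi-class attribute.

```php
<?php
// Sketch: select elements by class with DOMXPath instead of by id.
// The markup and class names below are hypothetical, not tigerdirect's.

$html = <<<HTML
<html><body>
  <div class="prodName main">Acme Laptop 15</div>
  <div class="review">Fast and light.</div>
  <div class="review featured">Battery could be better.</div>
  <div class="reviewer">not a review</div>
</body></html>
HTML;

// Real-world pages are rarely valid HTML; collect libxml warnings instead of printing them.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

/**
 * Hypothetical helper: return elements whose class attribute contains the
 * whole word $class ("review" matches "review featured" but not "reviewer").
 * Assumes $class is a plain class name without quotes.
 */
function elementsByClass(DOMXPath $xpath, string $class): DOMNodeList
{
    $query = sprintf(
        "//*[contains(concat(' ', normalize-space(@class), ' '), ' %s ')]",
        $class
    );
    return $xpath->query($query);
}

// Collect the text of every element with class "review", in document order.
$reviews = [];
foreach (elementsByClass($xpath, 'review') as $node) {
    $reviews[] = trim($node->textContent);
}

// item(0) returns null when nothing matched.
$title = elementsByClass($xpath, 'prodName')->item(0);

echo ($title ? trim($title->textContent) : '(no title)') . "\n";
print_r($reviews);
```

To use it on a fetched page, pass the string returned by curlLoad() to loadHTML() instead of $html. A class value containing a quote would break the XPath expression, so only pass known class names.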