jmurch (posted March 1, 2009)

I have my PHP cURL script working fine with websites on port 80. I'm now trying to use it on sites with an alternate port, in this case 8006. I've tried setting the option:

    curl_setopt($ch, CURLOPT_PORT, 8006);

and

    curl_setopt($ch, CURLOPT_PORT, '8006');

as well as making the port part of the URL: www.website.com:8006

Any advice or suggestions would be appreciated.

TIA,
Jeff Murch
jmurch (thread author, posted March 1, 2009)

Here is my full code. When I enter 'http://www.google.com' as the URL in the text box, it works fine. When I enter http://205.188.215.228:8006/ it does not work.

    <html>
    <body>
    <form method="post">
    URL: <input type="text" name="url">
    <input type="submit" value="Go!">
    </form>
    <?php
    // Read the submitted URL; stop if the form has not been submitted yet
    $url = isset($_REQUEST['url']) ? $_REQUEST['url'] : '';
    if (!$url) exit;

    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    //curl_setopt($ch, CURLOPT_PORT, '8006');
    curl_setopt($ch, CURLOPT_REFERER, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);

    # Get HTML
    $output = curl_exec($ch);

    # Close handle
    curl_close($ch);

    # Regular expression to get all ANCHOR tags
    preg_match_all("@<a[ ]+href\s*=\s*[\"'](.*)[\"'].*>(.*)</a>@isU", $output, $m);
    $i = count($m[0]);

    # Create an assoc array: link text => full anchor tag
    $anchors = array();
    for ($a = 0; $a < $i; $a++)
        $anchors[$m[2][$a]] = $m[0][$a];

    # Display for testing
    print_r($anchors);
    ?>
    </body>
    </html>
freedbill (posted June 19, 2009)

Having the same problem. I have started a new post, but thought I'd log it here as well in hopes that someone has the secret in their back pocket.

Thanks much,
Bill
thebadbad (posted June 19, 2009)

The page you're trying to scrape apparently checks the user agent string and discards requests when it's empty. Set it via cURL to get around that:

    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 6.0; da; rv:1.9.0.11) Gecko/2009060215 Firefox/3.0.11');

This is always a good idea when scraping, since some pages check it. A user agent can also be set with ini_set('user_agent', 'value'), though that setting applies to PHP's own HTTP stream wrapper (file_get_contents() and friends), not to cURL.

Edit: BTW, the port specified in curl_setopt() should be an integer, as per the manual.
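
Putting the two points from this reply together (send a user agent, and pass the port as an integer), a minimal sketch could look like the following. The helper names build_curl_options() and fetch_page() are hypothetical and not from the thread. The sketch assumes the port is written into the URL, and that copying it into CURLOPT_PORT is wanted at all; cURL normally honours a port in the URL on its own, so that step is shown only to illustrate the integer requirement.

```php
<?php
// Sketch only: build_curl_options() and fetch_page() are hypothetical helpers.

/**
 * Build the cURL options for a URL that may carry an explicit port.
 */
function build_curl_options($url, $userAgent)
{
    $options = array(
        CURLOPT_URL            => $url,
        CURLOPT_USERAGENT      => $userAgent, // some servers drop requests without one
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
    );

    // parse_url() returns the port as an integer when the URL contains one,
    // which is the type CURLOPT_PORT expects.
    $port = parse_url($url, PHP_URL_PORT);
    if (is_int($port)) {
        $options[CURLOPT_PORT] = $port;
    }

    return $options;
}

/**
 * Fetch a page; returns the body as a string, or false on failure.
 */
function fetch_page($url, $userAgent)
{
    $ch = curl_init();
    curl_setopt_array($ch, build_curl_options($url, $userAgent));

    $output = curl_exec($ch);
    if ($output === false) {
        // curl_error() reports why the transfer failed (timeout, refused, ...)
        trigger_error('cURL error: ' . curl_error($ch), E_USER_WARNING);
    }

    curl_close($ch);
    return $output;
}
```

It would be called as fetch_page('http://205.188.215.228:8006/', 'Mozilla/5.0 (Windows; U; Windows NT 6.0; da; rv:1.9.0.11) Gecko/2009060215 Firefox/3.0.11'). Checking curl_error() when curl_exec() returns false helps tell a connection problem on the alternate port apart from a server that simply discards the request.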