phpsycho Posted July 30, 2011

I am trying to build a small script that will scrape links for other links and images. I've already got things like checking robots.txt, and I'm using curl, not file_get_contents. I am using file() to read robots.txt, though, so I can get the info from each line.

The problem is that no links are being added, and when I save an image to my server I want to read its info: is it color, width, height, extension, etc. But I keep getting these errors:

IMG ADDED: http://***.com/images/status-busy.png
PHP Warning: exif_read_data(5525611904.png): File not supported in /var/www/alpha/my/bots/crawl10.php on line 172
IMG ADDED: http://***.com/images/status-busy.png
PHP Warning: exif_read_data(2322371467.png): File not supported in /var/www/alpha/my/bots/crawl10.php on line 172
IMG ADDED: http://***.com/images/status-busy.png
PHP Fatal error: Cannot break/continue 1 level in /var/www/alpha/my/bots/crawl10.php on line 120

I just noticed it's trying to add the images more than once, which is odd, but the last error is what I was really wondering about. Lines 120ish:

<?php
$parse = parse_url($url);
if(isset($parse['path'])){
    $haystack = pathinfo($parse['path'], PATHINFO_EXTENSION);
    if(!preg_match("/(php|html|htm|asp|aspx|shtml|php4|php5|cfm|pl|jsp)/is", $haystack)){
        continue;
    }
}
?>

And then there is this query:

<?php
mysql_query("INSERT INTO `search_images` (`url`,`file`,`name`,`from`,`width`,`height`,`color`,`size`,`type`,`datetime`)
    values ('$img[2]','$file','$name','$link[2]','$width','$height','$color','$size','$extention','$datetime')");
?>

I have it in a foreach loop, foreach($imgs as $img). Can I just add another foreach inside that one, foreach($links as $link), so I can get $link[2], which is where the image came from?
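On the nested foreach question: putting foreach($links as $link) inside foreach($imgs as $img) would pair every image with every link, not each image with the page it came from. A minimal sketch of one alternative, assuming the crawler can group image URLs by the page they were found on (the $imagesByPage array and build_image_rows() are hypothetical names, not taken from the posted script):

```php
<?php
// Hypothetical helper: turn [pageUrl => [imageUrl, ...]] into one row per
// unique image, remembering the first page each image was seen on.
function build_image_rows(array $imagesByPage): array
{
    $rows = [];
    $seen = [];
    foreach ($imagesByPage as $pageUrl => $imageUrls) {
        foreach ($imageUrls as $imageUrl) {
            if (isset($seen[$imageUrl])) {
                continue; // already recorded, skip the duplicate
            }
            $seen[$imageUrl] = true;
            $rows[] = ['url' => $imageUrl, 'from' => $pageUrl];
        }
    }
    return $rows;
}

// Example data, made up for illustration.
$imagesByPage = [
    'http://example.com/a.html' => ['http://example.com/images/status-busy.png'],
    'http://example.com/b.html' => [
        'http://example.com/images/status-busy.png',
        'http://example.com/images/logo.png',
    ],
];

foreach (build_image_rows($imagesByPage) as $row) {
    echo "IMG: {$row['url']} (from {$row['from']})\n";
}
```

Each row then carries its own source page, so the INSERT could use $row['from'] in place of $link[2]. Binding the values through a prepared statement (PDO or mysqli) rather than interpolating them into the SQL string would also be safer.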
QuickOldCar Posted July 30, 2011

http://www.php.net/manual/en/function.exif-read-data.php

PNG is not supported for EXIF data. You can use GD locally on the image after you download it:

http://www.php.net/manual/en/function.gd-info.php
http://www.php.net/manual/en/function.getimagesize.php
http://www.php.net/manual/en/function.image-type-to-mime-type.php

Also, please show the rest of your code so we can see how you associate the MySQL inserts.
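The getimagesize() suggestion can be sketched roughly as follows. This assumes the image has already been saved locally; describe_image() and can_have_exif() are hypothetical helper names, and the sketch does not try to answer the "is it color" part of the question:

```php
<?php
// Hypothetical helper: basic facts about a downloaded image file.
// Returns null when the file is missing or not a recognised image.
function describe_image(string $path): ?array
{
    $info = @getimagesize($path);
    if ($info === false) {
        return null;
    }
    return [
        'width'     => $info[0],
        'height'    => $info[1],
        'mime'      => $info['mime'],
        'extension' => image_type_to_extension($info[2], false),
        'size'      => filesize($path),
    ];
}

// Hypothetical helper: only JPEG and TIFF files are worth passing to
// exif_read_data(); a PNG triggers the "File not supported" warning.
function can_have_exif(string $path): bool
{
    $info = @getimagesize($path);
    return $info !== false
        && in_array($info[2], [IMAGETYPE_JPEG, IMAGETYPE_TIFF_II, IMAGETYPE_TIFF_MM], true);
}

// 'status-busy.png' stands in for wherever the crawler saved the file.
var_dump(describe_image('status-busy.png'));
```

Checking can_have_exif() before calling exif_read_data() is one way to avoid the warning for PNG files while still reading EXIF from formats that carry it.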
phpsycho (Author) Posted July 30, 2011

Ah, okay. Well, those errors are fixed, but I still get:

PHP Fatal error: Cannot break/continue 1 level in /var/www/alpha/my/bots/crawl10.php on line 120
Fatal error: Cannot break/continue 1 level in /var/www/alpha/my/bots/crawl10.php on line 120

It must be something with my if statement, but I think I am doing it right:

<?php
$parse = parse_url($url);
if(isset($parse['path'])){
    $haystack = pathinfo($parse['path'], PATHINFO_EXTENSION);
    if(!preg_match("/(php|html|htm|asp|aspx|shtml|php4|php5|cfm|pl|jsp)/is", $haystack)){
        continue;
    }
}
?>

Line 120 is the second if statement.
QuickOldCar Posted July 30, 2011

I only see that message when the file type didn't match the preg_match pattern:

$parse = parse_url($url);
if(isset($parse['path'])){
    $haystack = pathinfo($parse['path'], PATHINFO_EXTENSION);
    if(!preg_match("/(php|html|htm|asp|aspx|shtml|php4|php5|cfm|pl|jsp)/is", $haystack)){
        echo "Didn't match file type";
        die;
    } else {
        echo $haystack;
        //rest of code
    }
}
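For background on the fatal error: PHP only accepts continue inside a loop (or switch), so the message usually means the snippet runs outside any foreach/for/while, for example inside a function that is merely called from a loop. Since the full script was not posted, that is a guess. A minimal sketch of one way to restructure the check, assuming the URLs sit in an array (should_crawl() and $urls are hypothetical names, and treating an empty extension as crawlable is an assumption, not something from the posted script):

```php
<?php
// Hypothetical helper: decide whether a URL looks like a crawlable page.
function should_crawl(string $url): bool
{
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path) || $path === '') {
        return true; // assumption: no path (http://example.com) is the home page
    }
    $ext = pathinfo($path, PATHINFO_EXTENSION);
    if ($ext === '') {
        return true; // assumption: extension-less paths such as /about/ are pages
    }
    // Same extension list as the posted pattern, but anchored so that
    // something like "xhtml" or "phps" does not match by accident.
    return preg_match('/^(php|php4|php5|html|htm|shtml|asp|aspx|cfm|pl|jsp)$/i', $ext) === 1;
}

// Example data, made up for illustration.
$urls = [
    'http://example.com/index.php',
    'http://example.com/images/status-busy.png',
    'http://example.com/about/',
];

foreach ($urls as $url) {
    if (!should_crawl($url)) {
        continue; // valid: this statement sits directly inside the foreach
    }
    echo "CRAWL: $url\n";
}
```

Keeping the decision in a function that returns a boolean means the continue stays in the loop body itself, where PHP allows it.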
phpsycho (Author) Posted July 31, 2011

Hmm, okay, I think that may work, but now I am just getting a blank page. Would it be alright if I PM you the code? I'd rather not stick it out for the public to see. Thanks
QuickOldCar Posted July 31, 2011

Yeah, you can PM it.