phpsycho Posted July 30, 2011

I am trying to build a small script that follows links and scrapes each page for more links and images. I already have it checking robots.txt, and I'm using cURL, not file_get_contents(). Although... I am using file() to read robots.txt, so that I can get the info line by line.

The problem is that no links are being added. Also, when I save an image to my server I want to read its info, like whether it's color, its width, height, extension, etc. But I keep getting these errors:

IMG ADDED: http://***.com/images/status-busy.png
PHP Warning: exif_read_data(5525611904.png): File not supported in /var/www/alpha/my/bots/crawl10.php on line 172
IMG ADDED: http://***.com/images/status-busy.png
PHP Warning: exif_read_data(2322371467.png): File not supported in /var/www/alpha/my/bots/crawl10.php on line 172
IMG ADDED: http://***.com/images/status-busy.png
PHP Fatal error: Cannot break/continue 1 level in /var/www/alpha/my/bots/crawl10.php on line 120

I just noticed it's trying to add the same image more than once, which is odd. But the last error is what I was really wondering about. Lines 120-ish:

<?php
$parse = parse_url($url);
if (isset($parse['path'])) {
    $haystack = pathinfo($parse['path'], PATHINFO_EXTENSION);
    if (!preg_match("/(php|html|htm|asp|aspx|shtml|php4|php5|cfm|pl|jsp)/is", $haystack)) {
        continue;
    }
}
?>

And this query:

<?php
mysql_query("INSERT INTO `search_images` (`url`,`file`,`name`,`from`,`width`,`height`,`color`,`size`,`type`,`datetime`) values ('$img[2]','$file','$name','$link[2]','$width','$height','$color','$size','$extention','$datetime')");
?>

sits inside a foreach ($imgs as $img) loop. Can I just add another foreach inside that one, foreach ($links as $link), so I can get $link[2], which is where the image came from?

Link to comment https://forums.phpfreaks.com/topic/243326-scraping-with-php/ Share on other sites More sharing options...
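A nested foreach ($links as $link) inside foreach ($imgs as $img) would pair every image with every link, so each image would be inserted once per link, and $link[2] would not necessarily be the page the image came from. A minimal sketch of an alternative, assuming the crawler handles one page's HTML at a time: record the page URL at the moment its images are extracted, keyed by image URL so each image is stored only once (the repeated status-busy.png is probably a site-wide icon that appears on many pages, though the thread doesn't confirm it). collect_images() and the $seen array are hypothetical names, and DOMDocument stands in for however the original script extracts images, which the thread doesn't show.

```php
<?php
// Sketch: record the page each image was found on, and store each image URL once.
// collect_images() is a hypothetical helper; relative src values are NOT resolved
// against the page URL here, which a real crawler would need to do.

function collect_images(string $pageUrl, string $html, array &$seen): void
{
    if (trim($html) === '') {
        return; // nothing to parse
    }
    $doc = new DOMDocument();
    // Scraped HTML is often malformed; collect libxml warnings instead of printing them.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    foreach ($doc->getElementsByTagName('img') as $img) {
        $src = trim($img->getAttribute('src'));
        if ($src === '' || isset($seen[$src])) {
            continue; // no src, or already recorded from an earlier page
        }
        $seen[$src] = $pageUrl; // image URL => page it came from
    }
}

// Usage: call once per crawled page, sharing one $seen array.
$seen = [];
collect_images('http://example.com/a.html',
    '<p><img src="http://example.com/i/busy.png"><img src="http://example.com/i/logo.png"></p>', $seen);
collect_images('http://example.com/b.html',
    '<p><img src="http://example.com/i/busy.png"></p>', $seen);

foreach ($seen as $imageUrl => $fromPage) {
    // One INSERT per unique image would go here, with $fromPage in the `from` column.
    echo $imageUrl, ' <- ', $fromPage, PHP_EOL;
}
```

Keying by image URL keeps the first page an image was seen on; if every page that uses an image matters, store an array of pages per key instead.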
QuickOldCar Posted July 30, 2011

http://www.php.net/manual/en/function.exif-read-data.php

PNG is not supported by exif_read_data(). After you download the image you can inspect it locally with GD, or with getimagesize(), which doesn't need GD:

http://www.php.net/manual/en/function.gd-info.php
http://www.php.net/manual/en/function.getimagesize.php
http://www.php.net/manual/en/function.image-type-to-mime-type.php

And I guess show the rest of your code for how you associate the MySQL inserts.
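Following the getimagesize() suggestion, here is a minimal sketch of reading most of the fields the original post asks for from a downloaded file. image_info() is a hypothetical helper, and the demo writes only a PNG signature plus IHDR chunk (which is all getimagesize() needs for dimensions) instead of a real downloaded image. getimagesize() does not say whether a PNG is color or greyscale, so that field would still need GD or a similar library.

```php
<?php
// Sketch: basic metadata for a downloaded image via getimagesize(),
// which works for PNG, GIF, JPEG and more, and does not require GD.
// image_info() is a hypothetical helper name.

function image_info(string $path): ?array
{
    $info = @getimagesize($path);
    if ($info === false) {
        return null; // unreadable, or not a format getimagesize() recognises
    }
    return [
        'width'     => $info[0],
        'height'    => $info[1],
        'type'      => $info[2],                                // IMAGETYPE_* constant
        'mime'      => $info['mime'],                           // e.g. "image/png"
        'extension' => image_type_to_extension($info[2], false), // without the dot
        'bits'      => $info['bits'] ?? null,                   // bit depth, when reported
        'size'      => filesize($path),                         // bytes on disk
    ];
}

// Demo: write a minimal PNG header (signature + IHDR chunk) for a 2x3 RGB image.
$ihdr = pack('NN', 2, 3) . "\x08\x02\x00\x00\x00"; // width, height, depth 8, colour type 2
$png  = "\x89PNG\r\n\x1a\n" . pack('N', 13) . 'IHDR' . $ihdr . pack('N', crc32('IHDR' . $ihdr));
$tmp  = tempnam(sys_get_temp_dir(), 'img');
file_put_contents($tmp, $png);

$meta = image_info($tmp);
unlink($tmp);
print_r($meta);
```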
phpsycho Posted July 30, 2011 Author

Ah, okay. Well, those errors are fixed, but I'm still getting this one:

PHP Fatal error: Cannot break/continue 1 level in /var/www/alpha/my/bots/crawl10.php on line 120
Fatal error: Cannot break/continue 1 level in /var/www/alpha/my/bots/crawl10.php on line 120

It must be something with my if statement, but I think I'm doing it right:

<?php
$parse = parse_url($url);
if (isset($parse['path'])) {
    $haystack = pathinfo($parse['path'], PATHINFO_EXTENSION);
    if (!preg_match("/(php|html|htm|asp|aspx|shtml|php4|php5|cfm|pl|jsp)/is", $haystack)) {
        continue;
    }
}
?>

It's the second if statement.
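continue (and break) only work when the statement sits inside a loop (or switch) in the same function. If the extension check runs somewhere that isn't itself inside the loop, for example in a function the loop calls, or in top-level code, PHP 5 stops with "Cannot break/continue 1 level"; newer PHP versions report the same mistake as a compile-time error with different wording. The thread doesn't show what surrounds line 120, so that is an assumption. A minimal sketch of the usual fix: the check returns true/false, and continue is called from the loop itself. should_crawl() is a hypothetical helper, in_array() replaces the regex, and treating extensionless paths such as /about/ as pages is an added assumption (with the original regex an empty extension never matches, so those URLs are skipped, which may or may not be related to no links being added).

```php
<?php
// Sketch: a check that returns true/false instead of calling `continue` itself.
// should_crawl() is hypothetical; the extension list mirrors the thread's regex.

function should_crawl(string $url): bool
{
    $path = parse_url($url, PHP_URL_PATH);
    if ($path === null || $path === false || $path === '') {
        return true; // e.g. "http://example.com" with no path: treat as a page
    }
    $ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));
    if ($ext === '') {
        return true; // "/" or "/about/": assumed to be a page
    }
    $pageExtensions = ['php', 'html', 'htm', 'asp', 'aspx', 'shtml',
                       'php4', 'php5', 'cfm', 'pl', 'jsp'];
    return in_array($ext, $pageExtensions, true);
}

$links = [
    'http://example.com/',
    'http://example.com/index.php',
    'http://example.com/images/status-busy.png',
    'http://example.com/about/',
];

$toCrawl = [];
foreach ($links as $url) {
    if (!should_crawl($url)) {
        continue; // legal here: this is directly inside the foreach
    }
    $toCrawl[] = $url;
}
print_r($toCrawl);
```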
QuickOldCar Posted July 30, 2011

I only see that error when the extension doesn't match the preg_match pattern, since that's the only time continue runs. Try this:

$parse = parse_url($url);
if (isset($parse['path'])) {
    $haystack = pathinfo($parse['path'], PATHINFO_EXTENSION);
    if (!preg_match("/(php|html|htm|asp|aspx|shtml|php4|php5|cfm|pl|jsp)/is", $haystack)) {
        echo "Didn't match file type";
        die;
    } else {
        echo $haystack;
        //rest of code
    }
}
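Note that die ends the whole script at the first non-matching URL rather than skipping just that link. Separately, the pattern itself is unanchored, so it matches any extension that merely contains one of the alternatives: tpl and plist both contain pl. The s modifier also has no effect here, because the pattern has no dot. A small comparison, assuming only exact extension matches are wanted, of the thread's pattern against an anchored ^...$ version:

```php
<?php
// Compare the thread's unanchored pattern with an anchored version.
$loose  = '/(php|html|htm|asp|aspx|shtml|php4|php5|cfm|pl|jsp)/is';
$strict = '/^(php|html?|aspx?|shtml|php[45]?|cfm|pl|jsp)$/i';

$extensions = ['php', 'php4', 'HTML', 'tpl', 'plist', 'png', ''];

$result = [];
foreach ($extensions as $ext) {
    $result[$ext] = [
        'loose'  => preg_match($loose, $ext) === 1,
        'strict' => preg_match($strict, $ext) === 1,
    ];
}

foreach ($result as $ext => $r) {
    printf("%-6s loose=%s strict=%s\n", $ext === '' ? '(none)' : $ext,
        $r['loose'] ? 'yes' : 'no', $r['strict'] ? 'yes' : 'no');
}
```

The anchored pattern accepts the same page extensions as before (case-insensitively) but rejects tpl and plist.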
phpsycho Posted July 31, 2011 Author

Hmm, okay, I think that may work, but now I'm just getting a blank page. Would it be alright if I PM you the code? I'd rather not post it publicly. Thanks
QuickOldCar Posted July 31, 2011

Yeah, you can PM it.