dilbertone Posted December 11, 2011

good day dear php-freaks,

this is a posting related to an image-display topic. I've got a list of 5500 websites and need to grab a little screenshot of each of them, to create a thumbnail that is ready to show - as a thumbnail, of course - on a website. How do I do that?

One option is to do it dynamically, by using file_get_contents($url):

$url = 'http://www.example.com';
$output = file_get_contents($url);

Or should I first download all the images, secondly store them in a folder (as thumbnails) on the server, and thirdly retrieve each one with a simple call?

The goal: I want to retrieve an image of a given website - a screenshot. As an example of what I have in mind, look at www.drupal.org and the "Sites Made with Drupal" block there. You can see that the image changes from time to time - on every visit, I guess. Well, how do they do that? What is the solution? (A tiny sketch of that rotation idea is further down in this post.)

With PHP it is easy to get the HTML contents of a web page using file_get_contents($url), as shown above. Some musings about the method: what do you think - can I put a list of URLs into a database and let the above-mentioned image gallery do a call and show the image, or should I fetch all the images with a Perl programme (see below) or httrack, store them locally, and do calls against the locally stored files? I hope you understand my question... or do I have to explain it more? Which method is smarter - less difficult and easier to accomplish?

Note: I only need the screenshots - nothing more. That's pretty easy - no scraping that goes into the depths of the sites. Thank god it is that easy! With the Perl code below I can store the files in a folder under their corresponding names.

To sum it up: this is a question about method - fetching the data on the fly, e.g. with $output = file_get_contents($url), or getting the data (more than 5500 images that are screenshots of given webpages, nothing more, nothing less), storing it locally, and doing calls against it. Which method is smarter?

Here is the Perl solution (with the file-saving part folded into the loop, so that each screenshot actually gets written out):

#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize::Firefox;

my $mech = WWW::Mechanize::Firefox->new();
open(INPUT, "urls.txt") or die "Can't open file: $!";
while (<INPUT>) {
    chomp;
    $mech->get($_);
    my $png = $mech->content_as_png();
    open my $out, '>', "$_.png"
        or die "could not open '$_.png' for output: $!";
    binmode $out;    # PNG is binary data
    print $out $png;
    close $out;
}
close(INPUT);
exit;

From the docs on content_as_png: "Returns the given tab or the current page rendered as PNG image. All parameters are optional. $tab defaults to the current tab. If the coordinates are given, that rectangle will be cut out. The coordinates should be a hash with the four usual entries, left, top, width, height." Well, this is specific to WWW::Mechanize::Firefox. Currently, the data transfer between Firefox and Perl is done Base64-encoded. It would be beneficial to find out what is necessary to make JSON handle binary data more gracefully.

Filename: urls.txt (for example, like shown here):

www.google.com
www.cnn.com
www.msnbc.com
news.bbc.co.uk
www.bing.com
www.yahoo.com

Again, note: I only need the screenshots - nothing more.
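On the drupal.org rotation question: a minimal sketch of how such a rotating thumbnail block could work, assuming the screenshots already exist locally as PNG files in a thumbs/ folder (the folder name is just an assumption for illustration):

<?php
// Sketch: show a different pre-generated screenshot on each visit,
// the way the "Sites Made with Drupal" block appears to.
// Assumes a capture job has already filled thumbs/ with PNG files.
$shots = glob('thumbs/*.png');
if ($shots) {
    $pick = $shots[array_rand($shots)]; // random pick per request
    echo '<img src="' . htmlspecialchars($pick) . '" alt="site screenshot" />';
}
?>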
And the alternative is to work with the dynamic solution, by using file_get_contents($url):

$url = 'http://www.example.com';
$output = file_get_contents($url);

Which is the smarter solution? Love to hear from you!

greetings, dilbertone
dilbertone Posted December 11, 2011 (Author)

hello dear folks, me again. Well, I am musing about the most clever and smart way to do the job... I guess there is a main difference between retrieving HTML (on the one hand) and retrieving an image (on the other hand).

Retrieving an image with the Perl code and Firefox (see the snippet in my first post, which drives Firefox through WWW::Mechanize::Firefox) seems to be much, much smarter than, for example, doing it with httrack (the famous tool). With the little Perl snippet we are able to do proper rendering, interpreting CSS/JS - a regular (automated) browser such as Firefox does a good job here. On a side note: for this fetching job the little Perl snippet is far more powerful than httrack, since this is not something httrack can do easily. HTTrack is only able to grab part of a website; it is not able to do any rendering of any sort, nor interpret CSS/JS.

Well: there is absolutely no need to fetch the HTML contents. Caching the image is done easily with the Perl snippet. And therefore httrack is absolutely not the tool I should take into consideration. What do you think!?
Drongo_III Posted December 11, 2011

Hi mate, I reckon this should put you on the right track: http://www.php.net/manual/en/function.imagegrabwindow.php

Drongo
dilbertone Posted December 11, 2011 (Author)

hi there, good day - great to hear from you, Drongo!

Quote from Drongo_III: "Hi mate, I reckon this should put you on the right track: http://www.php.net/manual/en/function.imagegrabwindow.php"

Well, good catch - great thoughts - I am happy to hear that from you. I will dig deeper and try to solve things with your ideas. I'll come back later in the day. greetings, db1

update: great catch - it all looks interesting! From the manual, capturing a window:

<?php
$browser = new COM("InternetExplorer.Application");
$handle = $browser->HWND;
$browser->Visible = true;
$im = imagegrabwindow($handle);
$browser->Quit();
imagepng($im, "iesnap.png");
imagedestroy($im);
?>

Capture a window (IE for example), but with its content:

<?php
$browser = new COM("InternetExplorer.Application");
$handle = $browser->HWND;
$browser->Visible = true;
$browser->Navigate("http://www.libgd.org");
/* Still working? */
while ($browser->Busy) {
    com_message_pump(4000);
}
$im = imagegrabwindow($handle, 0);
$browser->Quit();
imagepng($im, "iesnap.png");
imagedestroy($im);
?>

well, I guess I should try this out - interesting stuff indeed.
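To run this over the whole url list, here is a minimal, untested sketch extending the manual examples (Windows only, needs the COM and GD extensions plus IE installed; urls.txt and the thumbs/ output folder are my own assumptions):

<?php
// Sketch: loop the imagegrabwindow() example from the PHP manual
// over a list of sites. Windows only (COM + IE + GD required).
// urls.txt and the thumbs/ folder are illustrative assumptions.
$urls = file('urls.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
if ($urls === false) {
    die("Can't open urls.txt");
}

foreach ($urls as $url) {
    $browser = new COM("InternetExplorer.Application");
    $handle = $browser->HWND;
    $browser->Visible = true;
    $browser->Navigate($url);
    // Wait for the page to finish loading before grabbing it.
    while ($browser->Busy) {
        com_message_pump(4000);
    }
    $im = imagegrabwindow($handle, 0);
    $browser->Quit();
    imagepng($im, 'thumbs/' . md5($url) . '.png'); // md5 keeps filenames safe
    imagedestroy($im);
}
?>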
QuickOldCar Posted December 12, 2011

Here's a script I use to take thumbs of websites. I save them under their md5 hash; you could name them something else if you wanted to. I run a different script for display purposes that checks whether the image exists (in multiple ways, because that's how urls work) and then resizes them with GD.

<div align="center">
<form action="" method="GET" align="center">
<input type="text" name="url" size="100" id="url" placeholder="Insert a Url" />
<br />
<input type="submit" value="Snap IT" />
<br />
</form>
<?php
if (isset($_GET['url'])) {

    //parse the url to host
    function getparsedHost($new_parse_url) {
        $parsedUrl = parse_url(trim($new_parse_url));
        return trim($parsedUrl['host'] ? $parsedUrl['host'] : array_shift(explode('/', $parsedUrl['path'], 2)));
    }

    //get website url from browser
    $input_url = mysql_real_escape_string(trim($_GET['url']));

    //clean the url
    $input_url = str_ireplace(array("http://www.","http://","feed://","ftp://","https://","https://www."), "", $input_url);
    $input_url = rtrim($input_url, "/");
    $url = "http://$input_url";

    //use parsed url versus full urls
    $url = "http://".getparsedHost($url);

    //if empty url show message
    if ($url == "" || $url == "http://") {
        echo "Insert a valid url.";
        die;
    }

    //make md5 hash for filename
    $md5_url = md5($url);

    //resize function
    function resize($img, $w, $h, $newfilename) {
        //Check if GD extension is loaded
        if (!extension_loaded('gd') && !extension_loaded('gd2')) {
            trigger_error("GD is not loaded", E_USER_WARNING);
            return false;
        }
        //Get Image size info
        $imgInfo = getimagesize($img);
        switch ($imgInfo[2]) {
            case 1: $im = imagecreatefromgif($img); break;
            case 2: $im = imagecreatefromjpeg($img); break;
            case 3: $im = imagecreatefrompng($img); break;
            default: trigger_error('Unsupported filetype!', E_USER_WARNING); break;
        }
        //If image dimension is smaller, do not resize
        if ($imgInfo[0] <= $w && $imgInfo[1] <= $h) {
            $nHeight = $imgInfo[1];
            $nWidth = $imgInfo[0];
        } else {
            //yeah, resize it, but keep it proportional
            if ($w/$imgInfo[0] > $h/$imgInfo[1]) {
                $nWidth = $w;
                $nHeight = $imgInfo[1]*($w/$imgInfo[0]);
            } else {
                $nWidth = $imgInfo[0]*($h/$imgInfo[1]);
                $nHeight = $h;
            }
        }
        $shrink = 0.40; //shrink by %
        $nWidth = round($nWidth) * $shrink;
        $nHeight = round($nHeight) * $shrink;
        $newImg = imagecreatetruecolor($nWidth, $nHeight);
        /* Check if this image is PNG or GIF, then set if Transparent */
        if (($imgInfo[2] == 1) OR ($imgInfo[2] == 3)) {
            imagealphablending($newImg, false);
            imagesavealpha($newImg, true);
            $transparent = imagecolorallocatealpha($newImg, 255, 255, 255, 127);
            imagefilledrectangle($newImg, 0, 0, $nWidth, $nHeight, $transparent);
        }
        imagecopyresampled($newImg, $im, 0, 0, 0, 0, $nWidth, $nHeight, $imgInfo[0], $imgInfo[1]);
        //Generate the file, and rename it to $newfilename
        switch ($imgInfo[2]) {
            case 1: imagegif($newImg, $newfilename); break;
            case 2: imagejpeg($newImg, $newfilename); break;
            case 3: imagepng($newImg, $newfilename); break;
            default: trigger_error('Failed resize image!', E_USER_WARNING); break;
        }
        return $newfilename;
    }

    //load url fullscreen in IE browser
    $browser = new COM("InternetExplorer.Application") or die("Could not initiate IE object.");
    $handle = $browser->HWND;
    $browser->Visible = true;
    $browser->FullScreen = true;
    $browser->Navigate($input_url);
    $seconds = 7;
    $delay_time = $seconds * 1000;
    if ($browser->Busy) {
        com_message_pump($delay_time);
    }
    $im = imagegrabwindow($handle, 0);
    //$im = imagegrabscreen($handle, 0); //grabs entire primary window
    $browser->Quit();
    $browser = null;
    unset($browser);
    imagepng($im, "./thumb/$md5_url.png");

    //image location
    $image_location = "./thumb/$md5_url.png";

    //browser snap size in fullscreen
    $w = 1024;
    $h = 768;

    //resize the image
    $thumbnail = resize($image_location, $w, $h, $image_location);

    //show the thumbnail and href links
    echo "<a href='$url' TARGET='_blank'><img src='$image_location' alt='$url' /></a><br />";
    echo "<a href='$url' TARGET='_blank'>$url</a><br />";
    echo "<a href='thumb/$md5_url.png'>Thumb Location</a>";

    //always destroy the temp image in GD
    imagedestroy($im);
}
?>

There is also a good plugin for Firefox that works: Pearl Crescent Page Saver. You can install the basic version and have it save files under the md5 name; I set them to 40% of size, which is 401 pixels. I run a command like this to save as png:

exec("Psexec.exe -i -d ./firefox/firefox.exe -savepng $url -savedelay 3000");

You could also check out webshot - it can snap all the images from a list and save them at certain sizes.

I also wanted to add: the only way to render everything correctly on a page is to use a browser. Using Firefox with Adblock is nice to block the ads.
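The display script mentioned above is not posted in the thread; a simplified sketch of that idea (not QuickOldCar's actual script - the paths and the placeholder file are illustrative assumptions) might look like this:

<?php
// Sketch of the display-side lookup: serve the md5-named thumbnail
// if the capture script above has already produced it, otherwise
// fall back to a placeholder. File locations are assumptions.
$url  = 'http://www.example.com'; // in practice, pulled from the url list
$file = './thumb/' . md5($url) . '.png';

if (!file_exists($file)) {
    $file = './thumb/placeholder.png'; // shown until a snapshot exists
}

echo "<a href='" . htmlspecialchars($url) . "' target='_blank'>"
   . "<img src='" . htmlspecialchars($file) . "' alt='site thumbnail' /></a>";
?>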
dilbertone Posted December 12, 2011 (Author)

hello dear QuickOldCar,

well, in one word: many thanks for sharing your code - this looks damned cool! Very, very well done! Thank you so much for all these interesting lines of code - you've created a monster, congratulations! I had a quick look at the code; it looks great and impressive, and it contains all the necessary things. I still have to make up my mind about saving the results under their md5 hash - I have never done that, but it is a very cool method! Dear QuickOldCar, thank you for the service you provide to the community. At the weekend I will give your code and your plan a try, and then I will come back and report everything. Until then, have a great week!

Best regards and warm greetings,
dilbertone