willgar Posted February 27, 2014

I'm very new to PHP coding and am teaching myself the basics, but I'm struggling with this particular issue. I'm trying to scrape all the product images from one of my sites and save them to a folder on the server of another, so I'm trying to create a scraping script to do this.

I've found the script below, which generates a list of all the images on a particular page, but I'm not sure how to take this forward so that it actually saves the images to a folder I define. Also, I don't need all the images on the page, only those matching a URL format / in a particular folder, e.g. http://www.site.com/files/143x143/(.*).jpg

    $page = file_get_contents('http://www.url.com');
    $doc = new DOMDocument();
    $doc->loadHTML($page);
    $images = $doc->getElementsByTagName('img');
    foreach ($images as $image) {
        echo $image->getAttribute('src') . '<br />';
    }

Any advice / suggested code greatly appreciated.

mogosselin Posted February 27, 2014 (edited)

So basically, you want to do this:

1. Get an HTML page
2. Get all the image URLs
3. Loop over all of the images
3.1. If the image has "www.site.com" AND "143x143/" AND ".jpg" in the name
3.1.1. download that image to the server somewhere

Is that it? So you figured out points 1, 2 and 3. You need to figure out 3.1 and 3.1.1.

3.1: Really easy (if you don't want to use a regex. I hate regex.)

    if (strpos($imageUrl, 'www.site.com') !== false
        && strpos($imageUrl, '143x143/') !== false
        && strpos($imageUrl, '.jpg') !== false) {
        // the image is from the site www.site.com, has 143x143/ in it and is a .jpg
        // so download the image
    }

3.1.1: To download the image, I found this code on Stack Overflow.

If you have allow_url_fopen set to true:

    $url = 'http://example.com/image.php';
    $img = '/my/folder/flower.gif';
    file_put_contents($img, file_get_contents($url));

Else use cURL:

    $ch = curl_init('http://example.com/image.php');
    $fp = fopen('/my/folder/flower.gif', 'wb');
    curl_setopt($ch, CURLOPT_FILE, $fp);   // write the response body straight to the file
    curl_setopt($ch, CURLOPT_HEADER, 0);   // do not include the HTTP headers in the output
    curl_exec($ch);
    curl_close($ch);
    fclose($fp);

Right here: http://stackoverflow.com/questions/9801471/download-image-from-url-using-php-code

I found the code by searching for "how to download image from url php". The trick is to break what you want to do into sentences, and then search (or ask here) one piece at a time.

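Putting the steps above together, a minimal sketch might look like the following. The function names (extractImageUrls, isProductImage, saveImage) are hypothetical and not from the thread. It assumes PHP 7 or later, that allow_url_fopen is enabled for the download step, and that the src attributes are absolute URLs. The demo at the bottom runs on an inline HTML string, so it makes no network requests.

```php
<?php
// Sketch only: helper names are made up for illustration.
// The host and folder come from the example URL in the question.

/** Return the src attribute of every <img> found in an HTML string. */
function extractImageUrls(string $html): array
{
    if ($html === '') {
        return [];
    }
    // Collect parser warnings internally; real-world HTML is rarely valid.
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $urls = [];
    foreach ($doc->getElementsByTagName('img') as $image) {
        $src = $image->getAttribute('src');
        if ($src !== '') {
            $urls[] = $src;
        }
    }
    return $urls;
}

/** Step 3.1: true for .jpg files in the 143x143 folder of www.site.com. */
function isProductImage(string $url): bool
{
    return strpos($url, 'www.site.com') !== false
        && strpos($url, '/143x143/') !== false
        && substr($url, -4) === '.jpg';
}

/** Step 3.1.1: save one image into $targetDir under its original file name. */
function saveImage(string $url, string $targetDir): bool
{
    // Assumes allow_url_fopen is on; otherwise use the cURL variant.
    $data = @file_get_contents($url);
    if ($data === false) {
        return false;
    }
    $name = basename((string) parse_url($url, PHP_URL_PATH));
    return file_put_contents(rtrim($targetDir, '/') . '/' . $name, $data) !== false;
}

// Demo on an inline page: only the first image passes the filter.
$html = '<img src="http://www.site.com/files/143x143/shoe.jpg">'
      . '<img src="http://www.site.com/files/logo.png">';
$matches = array_filter(extractImageUrls($html), 'isProductImage');
foreach ($matches as $url) {
    echo $url, PHP_EOL;
}
```

To run it against a live page, $html would be replaced with file_get_contents('http://www.url.com') and the loop body with a call such as saveImage($url, '/my/folder'). Relative src values would first need to be resolved against the page URL, which this sketch does not attempt.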
Psycho Posted February 27, 2014

willgar wrote: "I'm trying to scrape all the product images from one of my sites and save them to a folder on the server on another"

Yeah, right. It absolutely makes sense for you to create a script to scrape the images from one of YOUR sites. So much easier than just copying the images from the server directly.

willgar (Author) Posted February 27, 2014 (edited)

Well, when I said one of my sites, I meant a site owned by a client, and I am creating a new site for them. Because of the website agreement, they pay a monthly fee, don't own the code, and don't get access to the server via FTP / SSH. So it is not straightforward to copy the images directly from the server. To get access to the content I'm doing a lot of copying and pasting, and I just want to speed up the process of getting the 1000 or so product images.

paddy_fields Posted February 27, 2014

If they own the site, then they own the images. If they're paying you to do a new site for them, then why on earth wouldn't they give you the material? If for whatever reason they don't want to give you access to their FTP, then they could just post you the images on a DVD.