Script to scrape images


willgar


I'm very new to PHP coding and am starting to teach myself the basics, but I'm struggling with this particular issue.

 

I'm trying to scrape all the product images from one of my sites and save them to a folder on another site's server.

 

I'm trying to create a scraping script to do this.

I've found the script below, which generates a list of all the images on a particular page, but I'm not sure how to take this forward so that it actually saves the images to a folder I define.

Also, I don't need all the images on the page, only those matching a URL format or sitting in a particular folder, e.g. http://www.site.com/files/143x143/(.*).jpg
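If a regular expression is acceptable, the URL format above can be checked with preg_match. This is a minimal sketch, assuming the product images live directly under http://www.site.com/files/143x143/ (the example host from the question) and end in .jpg; isProductImage is a hypothetical helper name. Unlike (.*), it deliberately rejects sub-folders and query strings:

```php
<?php
// Hypothetical helper: returns the file name (e.g. "red-shoe.jpg") when $url
// matches http://www.site.com/files/143x143/<name>.jpg, or null otherwise.
function isProductImage(string $url): ?string
{
    // preg_quote escapes the dots (and other regex characters) in the fixed
    // prefix; '~' is the pattern delimiter, so it is passed in as well.
    $prefix = preg_quote('http://www.site.com/files/143x143/', '~');

    // [^/?#]+ stops the file name from spanning sub-folders or query strings;
    // the i flag also accepts .JPG and an upper-case host name.
    if (preg_match('~^' . $prefix . '([^/?#]+\.jpg)$~i', $url, $m) === 1) {
        return $m[1];
    }
    return null;
}
```

Returning the captured file name rather than true/false means the same call can also supply the name to save the image under.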

 

 

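// Fetch the page (needs allow_url_fopen) and parse it
$page = file_get_contents('http://www.url.com');
$doc = new DOMDocument();
libxml_use_internal_errors(true); // silence warnings about malformed HTML
$doc->loadHTML($page);
$images = $doc->getElementsByTagName('img');
foreach ($images as $image) {
    echo $image->getAttribute('src') . '<br />';
}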
 
Any advice / suggested code greatly appreciated.

So basically, you want to do this:

  1. Get an HTML page
  2. Get all the image URLs
  3. Loop over all of the images
    1. If the image URL contains "www.site.com" AND "143x143/" AND ".jpg"
      1. Download that image to a folder on the server

Is that it? You've already figured out points 1, 2 and 3. What's left is 3.1 and 3.1.1.
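Points 1 to 3 are easier to build on if the URLs are collected into an array instead of echoed. A minimal sketch, assuming the DOM extension is available; extractImageUrls is a hypothetical helper name, and it takes the HTML as a string so it can be tried without fetching a page:

```php
<?php
// Hypothetical helper: steps 2 and 3 from the list above, collecting the
// src attributes into an array instead of echoing them.
function extractImageUrls(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; keep libxml warnings out of the output.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $urls = [];
    foreach ($doc->getElementsByTagName('img') as $image) {
        $src = trim($image->getAttribute('src'));
        if ($src !== '') { // skip <img> tags without a src
            $urls[] = $src;
        }
    }
    // The same image can appear several times on a page; keep one copy.
    return array_values(array_unique($urls));
}

// Step 1 (needs allow_url_fopen):
// $urls = extractImageUrls(file_get_contents('http://www.url.com'));
```

Note that src values are often relative (e.g. /files/143x143/x.jpg), in which case a check for "www.site.com" will not match until the site's base URL is prepended.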

 

3.1: Really easy if you don't want to use a regex (I hate regex ;) )

if (strpos($imageUrl, 'www.site.com') !== false
    && strpos($imageUrl, '143x143/') !== false
    && substr($imageUrl, -4) === '.jpg') {
    // the image is from www.site.com, sits in the 143x143 folder and is a .jpg,
    // so download the image
}

To download the image, I found this code on Stack Overflow:

If you have allow_url_fopen set to true:
$url = 'http://example.com/image.php';
$img = '/my/folder/flower.gif';
file_put_contents($img, file_get_contents($url));

Otherwise, use cURL:
$ch = curl_init('http://example.com/image.php');
$fp = fopen('/my/folder/flower.gif', 'wb');
curl_setopt($ch, CURLOPT_FILE, $fp);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_exec($ch);
curl_close($ch);
fclose($fp);

Right here: http://stackoverflow.com/questions/9801471/download-image-from-url-using-php-code
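The snippets above save to a fixed file name. To save each matching image into a folder you choose, the file name can be taken from the last part of the image URL. A minimal sketch, assuming allow_url_fopen is enabled; localPathFor and saveImage are hypothetical helper names, and /my/folder is just the example folder from the snippet:

```php
<?php
// Hypothetical helper: keeps only the last path segment of the URL, e.g.
// .../143x143/red-shoe.jpg?v=2 -> <dir>/red-shoe.jpg. Returns null when the
// URL has no file name to use.
function localPathFor(string $url, string $dir): ?string
{
    $name = basename((string) parse_url($url, PHP_URL_PATH));
    if ($name === '') {
        return null;
    }
    return rtrim($dir, '/') . '/' . $name;
}

// Hypothetical helper: downloads one image into $dir (allow_url_fopen variant).
// Returns the saved path, or null if any step failed.
function saveImage(string $url, string $dir): ?string
{
    // Create the target folder on first use (including missing parents).
    if (!is_dir($dir) && !mkdir($dir, 0755, true) && !is_dir($dir)) {
        return null;
    }
    $target = localPathFor($url, $dir);
    if ($target === null) {
        return null;
    }
    $data = @file_get_contents($url); // false on a 404 or network error
    if ($data === false) {
        return null;
    }
    return file_put_contents($target, $data) === false ? null : $target;
}
```

Inside the strpos check, the download then becomes saveImage($imageUrl, '/my/folder');. Because only the last path segment is kept, two images with the same file name in different folders would overwrite each other.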

 

I found the code by searching for "how to download image from url php".

 

The trick is to break what you want to do into sentences, and then search (or ask here) for one piece at a time ;)

Edited by mogosselin

 

"I'm trying to scrape all the product images from one of my sites and save them to a folder on the server on another"

 

 

Yeah, right. It absolutely makes sense for you to create a script to scrape the images from one of YOUR sites. So much easier than just copying the images from the server directly.


Well, when I said one of my sites, I meant a site owned by a client for whom I am creating a new site. Under their website agreement they pay a monthly fee, don't own the code and don't get access to the server via FTP / SSH, so copying the images directly from the server is not straightforward. To get the content I'm doing a lot of copy-pasting, and I just want to speed up the process of getting the 1000 or so product images.

Edited by willgar