script to download whole website


madrazel

Try loading each file into a string, then going through each element like this:

 

<?php
// foreach URL
$file = file_get_contents($url);

// images
$offset = 0;
while (($tag = strpos($file, '<img src="', $offset)) !== false) { // You should probably use some other sort of matching here in case someone put something between "img" and "src"
    $attr = $tag + 10; // 10 is the string length of '<img src="', so $attr points to the beginning of src
    $endAttr = strpos($file, '"', $attr); // $endAttr is the position of the first quote after the start of the attribute
    $img = substr($file, $attr, $endAttr - $attr);
    // Check for absolute vs. relative URL here and prepend the appropriate prefix (e.g., "http://www.example.ex" if the URL is relative to the site root)
    $img = imagecreatefromjpeg($img); // Use GD to check the file type and call the appropriate function (JPEG is obviously just an example) to load the image into memory
    if ($img !== false) {
        imagejpeg($img, 'path/to/newFile'); // save image
        imagedestroy($img);                 // free memory
    }
    $offset = $endAttr; // advance the offset so the search keeps moving forward through the file
}

// anchors
// ... (same strpos() walk as above, but on '<a href="')

// etc.
?>
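The "absolute vs. relative" comment above glosses over a real step, so here's a minimal sketch of that check. It assumes $baseUrl holds the page's own address, only covers the simplest cases, and the helper name is hypothetical:

<?php
// Crude sketch: turn a src/href value into an absolute URL.
// Only handles the easy cases; real-world markup also has things
// like "../foo.jpg", "https://", query strings, and so on.
function resolveUrl($src, $baseUrl) {
    if (strpos($src, 'http://') === 0) {
        return $src; // already absolute
    }
    if (substr($src, 0, 1) == '/') {
        // relative to the site root, e.g. "/images/foo.jpg"
        $parts = parse_url($baseUrl);
        return $parts['scheme'] . '://' . $parts['host'] . $src;
    }
    // relative to the current page's directory; assumes $baseUrl
    // actually contains a path (e.g. "http://www.example.ex/dir/page.html")
    return substr($baseUrl, 0, strrpos($baseUrl, '/') + 1) . $src;
}
?>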

 

You could make that a function, too (albeit a complicated one), and use it recursively to download the entire website by crawling through the links and feeding each page back into the function. That might be kind of messy, though.
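Something like this rough sketch, say. extractLinks() is a hypothetical helper that would do the same strpos() walk as above, just on '<a href="', and resolveUrl() is the sketch from earlier:

<?php
// Rough sketch of the recursive idea: fetch a page, save/process it,
// then follow every link that hasn't been visited yet.
function crawl($url, &$visited) {
    if (isset($visited[$url])) {
        return; // already been here; avoids loops between pages
    }
    $visited[$url] = true;

    $file = file_get_contents($url);
    if ($file === false) {
        return; // fetch failed; skip this page
    }

    // ...save $file to disk and pull out its images, as above...

    foreach (extractLinks($file) as $link) { // hypothetical helper
        crawl(resolveUrl($link, $url), $visited);
    }
}

$visited = array();
crawl('http://www.example.ex/', $visited);
?>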

 

By the way, this is just an example. It's only meant to impart the concept, not to actually be used! For it to work, you'd have to build in a lot more flexibility to account for people's sloppy code.
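If you do end up needing to cope with sloppy markup, one sturdier route (a different technique from the string matching above) is PHP 5's built-in DOMDocument, which is fairly tolerant of malformed HTML. A minimal sketch, reusing $url from above:

<?php
// Parse the page with DOMDocument instead of walking it with strpos().
$html = file_get_contents($url);

$dom = new DOMDocument();
libxml_use_internal_errors(true); // don't spew warnings on bad markup
$dom->loadHTML($html);
libxml_clear_errors();

foreach ($dom->getElementsByTagName('img') as $imgTag) {
    $src = $imgTag->getAttribute('src');
    // ...resolve $src and save the image, as before...
}
?>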

 

Good luck!

 

-kael
