Si14 Posted March 27, 2013

Hi all, I need some code that downloads all the PDF files from a URL (e.g. www.myurl.com). I want to run this code on my localhost (WAMP). Thank you for your time.
exeTrix Posted March 27, 2013

There are a couple of ways to approach this. If you're dealing with multiple PDF files, and it always will be multiple, you can concatenate the documents into one. Otherwise, you can produce a zip file for the user to download; I'd say this is the most common solution utilised in the real world. Because HTTP is stateless and follows a request-response flow, it isn't possible to answer a single HTTP request with multiple responses, so you'll notice that both of the above solutions produce only one file for download. Further reading on the zip approach can be found in the ZipArchive docs here: http://www.php.net/manual/en/class.ziparchive.php

Any problems then let me know and I'll be happy to provide further assistance.
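For example, the zip route boils down to something like this. It's only a rough sketch: the paths in $files are placeholders, and it assumes the PDFs already exist on the server before they're bundled into the archive:

<?php

//a rough sketch of the zip approach; the paths in $files are placeholders
$files   = array( '/path/to/first.pdf', '/path/to/second.pdf' );
$zipPath = sys_get_temp_dir() . '/documents.zip';

$zip = new ZipArchive();
if( $zip->open( $zipPath, ZipArchive::CREATE | ZipArchive::OVERWRITE ) === true ){
    foreach( $files as $file ){
        //add each PDF under its base name so the archive has a flat structure
        $zip->addFile( $file, basename( $file ) );
    }
    $zip->close();
}

//send the single archive as the one response to the HTTP request
header( 'Content-Type: application/zip' );
header( 'Content-Disposition: attachment; filename="documents.zip"' );
header( 'Content-Length: ' . filesize( $zipPath ) );
readfile( $zipPath );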
Si14 (Author) Posted March 27, 2013

Thank you for your reply, exeTrix. I think I did not express the question clearly. I want to download all the PDF files of a website, similar to what download managers do. You may ask why I am not using a download manager, and the response would be that I want to customise the code later. At the moment, the basic thing it needs to do is download all the PDF files from one (or multiple) URLs which I provide, and then store them in separate directories on my hard drive (one directory per URL). In order to run this code, I assume I should use a local server stack, e.g. WAMP? Please let me know if you have any suggestions.
exeTrix Posted March 27, 2013

Ah, sorry, I must have misunderstood. Yes, you will need WAMP set up on your machine, then this should do what you need it to:

<?php

//define our array of files we'd like to get, with the dir names as keys
$pdfs = array(
    'folder1' => 'http://www.bbc.co.uk/bbctrust/assets/files/pdf/about/how_we_govern/charter.pdf',
    'folder2' => 'http://www.bbc.co.uk/radio4/today/reports/pdf/camera_gifford.pdf'
);

try{
    //loop through the files stored in the pdfs array
    foreach( $pdfs as $key => $pdf ){
        //split the URL on /
        $urlParts = explode( '/', $pdf );
        //the last segment is our file name, e.g. charter.pdf
        $fileName = end( $urlParts );

        //get the contents of the remote file
        $fileContents = file_get_contents( $pdf );

        //build a path to our directory
        $directory = $_SERVER['DOCUMENT_ROOT'] . '/' . $key . '/';

        //create the directory if it doesn't exist
        if( !is_dir( $directory ) ){
            mkdir( $directory );
        }

        //create a file object for the contents to be written to
        $fileObject = new SplFileObject( $directory . $fileName, 'a+' );
        //write the contents to the file
        $fileObject->fwrite( $fileContents );

        //clean up by removing the contents
        unset( $fileContents );
    }
}catch( Exception $e ){
    echo $e->getMessage();
}

Any problems then give us a shout.
Si14 (Author) Posted March 28, 2013

Thanks for your reply and your help. Instead of the direct PDF links, is it possible to give it the link of a page and have it detect all the PDF files on that page automatically?
exeTrix Posted March 28, 2013

OK, there are a couple of possibilities here. You could download the contents of the page using file_get_contents and use a regular expression to match all of the URLs, then iterate over the matches; essentially, you'd be scraping the page for PDF links. Or you could load the page into DOMDocument and use that to find all the links, then iterate over them to pick out the PDFs (for example with a RegexIterator). If you were to use DOMDocument you'd need the page to be valid HTML, so I'd suggest using regex, it'll be easier. I'm sure there are loads of articles on the web relating to this or something similar, so I'm not going to reinvent the wheel and code it for you. Have a bash, and if you run into any problems post back here and somebody will certainly give you a helping hand.
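As a starting point, the regex route might look something like this. It's only a sketch: the page URL is a placeholder, it assumes the links end in .pdf, and relative links are resolved very crudely against the page URL:

<?php

//a rough sketch of scraping a page for PDF links; $pageUrl is a placeholder
$pageUrl = 'http://www.example.com/reports/';
$html    = file_get_contents( $pageUrl );

//match href attributes that point to .pdf files
preg_match_all( '/href=["\']([^"\']+\.pdf)["\']/i', $html, $matches );

foreach( array_unique( $matches[1] ) as $pdfUrl ){
    //crudely resolve relative links against the page URL (assumes the same host)
    if( strpos( $pdfUrl, 'http' ) !== 0 ){
        $pdfUrl = rtrim( $pageUrl, '/' ) . '/' . ltrim( $pdfUrl, '/' );
    }

    //each $pdfUrl can now be fed into the download loop from the earlier post
    echo $pdfUrl . PHP_EOL;
}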