
Download all PDF files of a URL



There are a couple of ways to approach this. If you're dealing with multiple PDF files and always will be, you can concatenate the documents into a single document. Otherwise, you can produce a zip file for the user to download; I'd say this is the most common solution utilised in the real world.

 

Unfortunately, because HTTP is stateless and follows a request-response flow, it's not possible to respond to an HTTP request with multiple responses. Therefore, you'll notice that both of the above approaches produce only one file for download.

 

Further reading on the zip file approach can be found in the ZipArchive docs here: http://www.php.net/manual/en/class.ziparchive.php
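
To give you a rough idea, here's a minimal sketch of the zip route, assuming the PDFs are already sitting on the server somewhere (the paths and archive name below are just placeholders):

//minimal sketch of the zip approach - $pdfPaths and the archive name are placeholders
$pdfPaths = array( 'docs/one.pdf', 'docs/two.pdf' );

$zip = new ZipArchive();
$zipName = tempnam( sys_get_temp_dir(), 'pdfs' );

if( $zip->open( $zipName, ZipArchive::CREATE | ZipArchive::OVERWRITE ) === true ){

	foreach( $pdfPaths as $path ){
		//add each PDF to the archive under its base name
		$zip->addFile( $path, basename( $path ) );
	}

	$zip->close();

	//send the archive back as a single download
	header( 'Content-Type: application/zip' );
	header( 'Content-Disposition: attachment; filename="pdfs.zip"' );
	header( 'Content-Length: ' . filesize( $zipName ) );
	readfile( $zipName );
}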

 

Any problems then let me know and I'll be happy to provide further assistance. :)


Thank you for your reply, exeTrix.

I think I did not express the question clearly.

I want to download all the PDF files of a website, similar to what download managers do. You may ask why I am not using a download manager, and the response is that I want to customize the code later.

At the moment, the basic thing it needs to do is download all PDF files from one (or more) URLs (which I provide) and then store them in separate directories on my hard drive (one directory per URL). In order to run this code, I assume I should use a local server stack, e.g. WAMP?

Please let me know if you have any suggestions.


Ah sorry, I must have misunderstood.

 

Yes, you'll need WAMP set up on your machine, and then this should do what you need:

 

//define our array of files we'd like to get, with the directory name as keys
$pdfs = array(
	'folder1' => 'http://www.bbc.co.uk/bbctrust/assets/files/pdf/about/how_we_govern/charter.pdf',
	'folder2' => 'http://www.bbc.co.uk/radio4/today/reports/pdf/camera_gifford.pdf'
);

try{
	//loop through the files stored in the pdfs array
	foreach( $pdfs as $key => $pdf ){

		//split the URL on / and take the last segment as the file name, eg charter.pdf
		$urlParts = explode( '/', $pdf );
		$fileName = end( $urlParts );

		//get the contents of the file
		$fileContents = file_get_contents( $pdf );

		//skip this file if the download failed
		if( $fileContents === false ){
			continue;
		}

		//build a path to our directory
		$directory = $_SERVER['DOCUMENT_ROOT'] . '/' . $key . '/';

		//create the directory if it doesn't already exist
		if( !is_dir( $directory ) ){
			mkdir( $directory );
		}

		//open the file for writing ('w' truncates any existing copy so we don't append to an old one)
		$fileObject = new SplFileObject( $directory . $fileName, 'w' );

		//write the contents to the file
		$fileObject->fwrite( $fileContents );

		//free the contents from memory
		unset( $fileContents );

	}

}catch( Exception $e ){

	echo $e->getMessage();

}
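
One thing to bear in mind: file_get_contents() over HTTP relies on allow_url_fopen being enabled in php.ini (it is by default). If it happens to be turned off on your setup, a rough alternative is to fetch each URL with cURL instead; something along these lines (fetchUrl is just an example name):

//hypothetical helper: fetch a URL with cURL instead of file_get_contents
function fetchUrl( $url ){
	$ch = curl_init( $url );
	curl_setopt( $ch, CURLOPT_RETURNTRANSFER, true );
	curl_setopt( $ch, CURLOPT_FOLLOWLOCATION, true );
	$contents = curl_exec( $ch );
	curl_close( $ch );
	return $contents;
}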

 

 

Any problems then give us a shout :)

Thanks for your reply and your help.

Instead of the direct PDF links, is it possible to give it the link to a page and have it detect all the PDF files on that URL automatically?


OK, there are a couple of possibilities here. You could download the contents of the page using file_get_contents and use a regular expression to match all URLs, then iterate over the matches. Essentially, you'd be scraping the page for PDF file links. Or you could load the page into DOMDocument and use that to find all links, then iterate over them to find PDFs using a RegexIterator.

 

If you were to use DOMDocument, you'd need the page to be valid HTML, so I'd suggest using regex; it'll be easier.
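
To give you a starting point, here's a rough sketch of the regex route; the page URL and the pattern are just examples, so you'll probably need to tweak them for the pages you're scraping:

//rough sketch: scrape a page for links ending in .pdf
$pageUrl = 'http://www.example.com/reports/';
$html = file_get_contents( $pageUrl );

//grab the href value of any link that ends in .pdf
preg_match_all( '/href=["\']([^"\']+\.pdf)["\']/i', $html, $matches );

foreach( $matches[1] as $link ){
	//relative links need resolving against $pageUrl before you download them
	echo $link . PHP_EOL;
}

You could then feed the resulting links into the download loop from earlier in the thread.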

 

I'm sure there are loads of articles on the web relating to this or something similar, so I'm not going to reinvent the wheel and code it for you. Have a bash; any problems, post back here and somebody will certainly give you a helping hand.

