
Finding extension of file that has been retrieved using CURL


stubarny


Hello,

 

I have managed to find, retrieve and save a file using cURL, but I am having to hard-code the file extension. Is there a way to find the file extension automatically? (The file extension doesn't seem to be in the download URL.)

 

(also, is there a way of getting the file name so I can save it as the same filename - that would be great)

 

Thanks for your help,

 

Stu

 

p.s. I've tried the pathinfo($url) function, but that gets information out of the download URL rather than the download file.

	$url="http://webmail.WEBSITE.com/src/redirect.php"; 
	$cookie="cookie.txt"; 
	
	$postdata = "login_username=USERNAME&secretkey=PASSWORD&js_autodetect_results=0&just_logged_in=1"; 
	
	#	get the cookie
	$ch = curl_init(); 
	curl_setopt ($ch, CURLOPT_URL, $url); 
	curl_setopt ($ch, CURLOPT_SSL_VERIFYPEER, FALSE); 
	curl_setopt ($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6"); 
	curl_setopt ($ch, CURLOPT_TIMEOUT, 60); 
	curl_setopt ($ch, CURLOPT_FOLLOWLOCATION, 0); 
	curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1); 
	curl_setopt ($ch, CURLOPT_COOKIEJAR, $cookie); 
	curl_setopt ($ch, CURLOPT_REFERER, $url); 
	
	curl_setopt ($ch, CURLOPT_POSTFIELDS, $postdata); 
	curl_setopt ($ch, CURLOPT_POST, 1); 
	$result = curl_exec ($ch);  

	curl_close($ch);




$ch = curl_init();
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie); //read cookies from here

curl_setopt($ch, CURLOPT_URL, "http://webmail.WEBSITE.com/src/right_main.php");

curl_setopt($ch, CURLOPT_HEADER, 0);
$result = curl_exec($ch);


curl_close($ch);


#	download file
	$source = "http://webmail.WEBSITE.com/src/download.php?mailbox=INBOX&passed_id=6475&startMessage=1&override_type0=text&override_type1=html&ent_id=2&absolute_dl=true";
	$ch = curl_init();

	curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie); //read cookies from here

	curl_setopt($ch, CURLOPT_URL, $source);
	curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
	curl_setopt($ch, CURLOPT_SSLVERSION,3);
	$data = curl_exec ($ch);
	$error = curl_error($ch); 
	curl_close ($ch);

       # !!! The line below needs to be automated !!!
  	$destination = "./files/test.html";

	$file = fopen($destination, "w+");

	fputs($file, $data);
	fclose($file);
Edited by stubarny

Hi,

 

The file extension of the download file isn't in the URL.

 

The URL prompts the download, and the file is then downloaded. I need to know how to capture the file name and the extension from the file (not the url). Does that make sense?

Edited by stubarny

The download link is:

 

http://webmail.WEBSITE.com/src/download.php?mailbox=INBOX&passed_id=6475&startMessage=1&override_type0=text&override_type1=html&ent_id=2&absolute_dl=true
 

and that link successfully downloads a file called test.html

 

I then save that file as "test.html" but I've had to hard code the file name and the extension in the code of my first post. I want to have the code automatically find the file name and extension. Is this information not somewhere in the $data variable?


Well, the name of the downloaded file is "download.php", with a GET query string of "mailbox=INBOX&passed_id=6475&startMessage=1&override_type0=text&override_type1=.........". cURL works like a web browser: the content is retrieved and then saved or displayed as HTML. To be honest, though, I still have no idea what exactly you want to achieve.

Edited by jazzman1

I'm looking to download a file and save it to my server.

 

The link http://webmail.website.com/src/download.php?mailbox=INBOX&passed_id=6475&startMessage=1&override_type0=text&override_type1=html&ent_id=2&absolute_dl=true INITIATES the download (it is not the file being downloaded), and the file being downloaded is called test.html.

 

At the moment I have to hard code in my php script to call the downloaded file "test.html", shouldn't I be able to find the file name from the downloaded file?

 

Secondly I don't know how to download non html files with CURL - I've tried to download .pdf and .docx files and both have failed. Is it possible?

 

Thanks,

 

Stu

Edited by stubarny

When you download a "file" using http, there is no actual file. You get a stream that you are then saving as a file. However, the server that you download from may send a header suggesting a name for the file. You need to retrieve the headers in the curl request, and inspect them for one that contains the suggested filename. You can then use this value as the name of the file when you save it.

 

You will have to watch out for name conflicts, since there is nothing to prevent you from getting the same filename in multiple "downloads".
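
One way to handle the name conflicts mentioned above is to append a counter when the name is already taken. This is only a sketch: `uniquePath()` is a made-up helper name, and the "-1", "-2" suffix scheme is an assumption, not something the server or cURL provides.

```php
<?php
// Hypothetical helper: returns a path inside $dir that does not exist yet.
// If "test.html" is taken it tries "test-1.html", "test-2.html", and so on.
function uniquePath(string $dir, string $name): string
{
    $info = pathinfo($name);
    $base = $info['filename'];
    // The 'extension' key is absent for names without a dot.
    $ext  = isset($info['extension']) ? '.' . $info['extension'] : '';

    $candidate = $dir . '/' . $base . $ext;
    for ($i = 1; file_exists($candidate); $i++) {
        $candidate = $dir . '/' . $base . '-' . $i . $ext;
    }
    return $candidate;
}
```

Something like `$destination = uniquePath('./files', $suggestedName);` would then replace the hard-coded path. The check and the later write are not atomic, so two scripts running at the same moment could still pick the same name.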

 

You can use the CURLOPT_HEADER option to get the headers. But these will be included in the returned data ($data in your script). So you will have to remove the headers, inspect them, and extract the filename.

 

I don't remember the name of the header, in fact, I don't think I have actually done this. So grab the headers and echo them to see what is there and what you need. The headers should end with two consecutive line endings ("\r\n\r\n"). Then you can explode on the line endings to get an array of headers.

 

curl_setopt($ch, CURLOPT_HEADER, true);
$data = curl_exec($ch);
$pos = strpos($data, "\r\n\r\n"); # Find the end of the headers (haystack first, then needle)
$headers = substr($data, 0, $pos); # Save headers
$data = substr($data, $pos+4); # Remove headers from data
$headers = explode("\r\n", $headers); # Explode headers to array
print_r($headers);
Un-tested code - no error checking - UAYOR
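
If the request follows redirects, the response can contain more than one header block, so searching for the first blank line may cut in the wrong place. A possible alternative, sketched here, is to split at the header length that cURL reports through `curl_getinfo($ch, CURLINFO_HEADER_SIZE)`. The function name `splitResponse()` is made up for illustration.

```php
<?php
// Hypothetical helper: splits a raw response (headers + body, as returned
// when CURLOPT_HEADER is on) at a known header length in bytes.
// Returns [array of header lines from the last header block, body string].
function splitResponse(string $raw, int $headerSize): array
{
    $headerText = substr($raw, 0, $headerSize);
    $body       = (string) substr($raw, $headerSize);

    // Earlier blocks belong to redirects; keep only the final one.
    $blocks = explode("\r\n\r\n", trim($headerText));
    $last   = end($blocks);

    return [explode("\r\n", $last), $body];
}
```

With a live handle this would be called as `splitResponse($data, curl_getinfo($ch, CURLINFO_HEADER_SIZE))`, before `curl_close($ch)`.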

The header name you'll be looking for would be the Content-disposition header, which takes the format of

Content-disposition: [attachment|inline]; filename="the_filename.ext"
(quotes optional)
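
A rough sketch of pulling the filename out of that header value, handling both the quoted and the unquoted form. `filenameFromDisposition()` is a hypothetical name, and the sketch ignores the extended `filename*=` form that some servers send for non-ASCII names.

```php
<?php
// Hypothetical helper: extracts the suggested filename from a
// Content-Disposition header value, or returns null if there is none.
function filenameFromDisposition(string $value): ?string
{
    if (preg_match('/\bfilename\s*=\s*"([^"]*)"/i', $value, $m)) {
        // Quoted form: filename="My File.docx"
        $name = $m[1];
    } elseif (preg_match('/\bfilename\s*=\s*([^;\s]+)/i', $value, $m)) {
        // Unquoted form: filename=report.pdf
        $name = $m[1];
    } else {
        return null;
    }

    // Drop any directory part so the server cannot choose where the file lands.
    $name = basename(str_replace('\\', '/', $name));

    return ($name === '' || $name === '.' || $name === '..') ? null : $name;
}
```

The extension can then be read with `pathinfo($name, PATHINFO_EXTENSION)`.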

 

Using the CURLOPT_HEADERFUNCTION option may be better than having the headers included in the output and then stripping them. Creating a class to handle the download would probably make this easier, as your callback can be another method and you can share data using class-level variables. This is something I've never personally tried, and the sample code below is completely untested but may get you started.

 

<?php

class DownloadFile {
	private $ch;
	private $headers = array(); // start empty so getHeader() works even if no header was captured
	private $status;

	public function __construct($url, $cookieFile){
		$ch = curl_init($url);
		curl_setopt($ch, CURLOPT_COOKIEFILE, $cookieFile);
		curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
		curl_setopt($ch, CURLOPT_SSLVERSION,3);
		curl_setopt($ch, CURLOPT_HEADERFUNCTION, array($this, 'captureHeader'));
		$this->ch = $ch;
	}

	public function __destruct(){
		curl_close ($this->ch);
	}

	public function download(){
		return curl_exec($this->ch);
	}

	protected function normalize($header){
		return ucfirst(strtolower(trim($header)));
	}

	protected function captureHeader($ch, $headerData){
		if (substr($headerData, 0, 4) == 'HTTP'){
			$this->status = substr($headerData, 9, 3);
		}
		else if (false !== strpos($headerData, ':')){
			list($header, $content) = explode(':', $headerData, 2);
			//Normalize the header name
			$header = $this->normalize($header);
			$content = trim($content);

			$this->headers[$header] = $content;
		}
		return strlen($headerData);
	}

	public function getHeader($header){
		$header = $this->normalize($header);
		return array_key_exists($header, $this->headers)?$this->headers[$header]:null;
	}
}

//Set $url and $cookie
$dl = new DownloadFile($url, $cookie);
$content = $dl->download();
$saveName = 'default.html';
if ($header=$dl->getHeader('Content-disposition')){
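	// Stop at the closing quote or a semicolon so the quote is not captured
	if (preg_match('/filename="?([^";]+)/', $header, $matches)){
		// basename() drops any directory part the server may have sent
		$saveName = basename($matches[1]);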
	}
}
file_put_contents($saveName, $content);
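
If the server sends no Content-disposition header at all, the Content-type header can still hint at the extension. The sketch below assumes a small hand-written lookup table; the constant `EXTENSIONS_BY_TYPE`, the function name and the `bin` default are all made up for illustration, and the table would need extending for other file types.

```php
<?php
// Assumed, deliberately small map from MIME type to file extension.
const EXTENSIONS_BY_TYPE = [
    'text/html'       => 'html',
    'text/plain'      => 'txt',
    'application/pdf' => 'pdf',
    'application/vnd.openxmlformats-officedocument.wordprocessingml.document' => 'docx',
    'image/jpeg'      => 'jpg',
    'image/png'       => 'png',
];

// Hypothetical helper: picks an extension for a Content-Type header value,
// falling back to $default when the type is missing or not in the map.
function extensionFromContentType(?string $contentType, string $default = 'bin'): string
{
    if ($contentType === null) {
        return $default;
    }
    // Strip parameters such as "; charset=UTF-8" and ignore case.
    $type = strtolower(trim(explode(';', $contentType)[0]));

    return EXTENSIONS_BY_TYPE[$type] ?? $default;
}
```

With the class above, something like `extensionFromContentType($dl->getHeader('Content-type'))` could supply the extension for the default save name.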