stubarny Posted June 17, 2014

Hello,

I have managed to find, retrieve and save a file using CURL. But I am having to hard code the file extension. Is there a way to find the file extension automatically? (It seems the file extension isn't within the download URL.)

(Also, is there a way of getting the file name so I can save it as the same filename? That would be great.)

Thanks for your help,
Stu

p.s. I've tried the pathinfo($url) function, but that gets information out of the download URL rather than the download file.

    $url="http://webmail.WEBSITE.com/src/redirect.php";
    $cookie="cookie.txt";
    $postdata = "login_username=USERNAME&secretkey=PASSWORD&js_autodetect_results=0&just_logged_in=1";

    # get the cookie
    $ch = curl_init();
    curl_setopt ($ch, CURLOPT_URL, $url);
    curl_setopt ($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
    curl_setopt ($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6");
    curl_setopt ($ch, CURLOPT_TIMEOUT, 60);
    curl_setopt ($ch, CURLOPT_FOLLOWLOCATION, 0);
    curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt ($ch, CURLOPT_COOKIEJAR, $cookie);
    curl_setopt ($ch, CURLOPT_REFERER, $url);
    curl_setopt ($ch, CURLOPT_POSTFIELDS, $postdata);
    curl_setopt ($ch, CURLOPT_POST, 1);
    $result = curl_exec ($ch);
    curl_close($ch);

    $ch = curl_init();
    curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie); //read cookies from here
    curl_setopt($ch, CURLOPT_URL, "http://webmail.WEBSITE.com/src/right_main.php");
    curl_setopt($ch, CURLOPT_HEADER, 0);
    $result = curl_exec($ch);
    curl_close($ch);

    # download file
    $source = "http://webmail.WEBSITE.com/src/download.php?mailbox=INBOX&passed_id=6475&startMessage=1&override_type0=text&override_type1=html&ent_id=2&absolute_dl=true";
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie); //read cookies from here
    curl_setopt($ch, CURLOPT_URL, $source);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_SSLVERSION,3);
    $data = curl_exec ($ch);
    $error = curl_error($ch);
    curl_close ($ch);

    # !!! The line below needs to be automated !!!
    $destination = "./files/test.html";

    $file = fopen($destination, "w+");
    fputs($file, $data);
    fclose($file);

Edited June 17, 2014 by stubarny

Link to comment https://forums.phpfreaks.com/topic/289188-finding-extension-of-file-that-has-been-retrieved-using-curl/
jbonnett Posted June 18, 2014

You could always use the PHP function explode() together with another PHP function, array_pop(). In your case (array_pop() takes its argument by reference, so the result of explode() goes into a variable first):

    $parts = explode(".", $url);
    $extension = array_pop($parts);
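To see what the URL-based approach actually yields for the download link in this thread, here is a minimal sketch. The URL is the placeholder one from the first post; the variable names $naive and $fromPath are made up for illustration.

```php
<?php
// Placeholder download URL taken from the first post.
$url = "http://webmail.WEBSITE.com/src/download.php?mailbox=INBOX&passed_id=6475"
     . "&startMessage=1&override_type0=text&override_type1=html&ent_id=2&absolute_dl=true";

// Splitting the whole URL on "." keeps the query string attached.
$parts = explode(".", $url);
$naive = array_pop($parts);

// Looking only at the path component gives a clean extension,
// but it belongs to the script that serves the download.
$path = parse_url($url, PHP_URL_PATH);
$fromPath = pathinfo($path, PATHINFO_EXTENSION);

echo $naive, "\n";
echo $fromPath, "\n";
```

Either way the result describes download.php rather than the file it sends, so the URL alone does not reveal the real name or extension.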
stubarny Posted June 18, 2014 (Author)

Hi,

The file extension of the download file isn't in the URL. The URL prompts the download, and the file is then downloaded. I need to know how to capture the file name and the extension from the file (not the URL). Does that make sense?

Edited June 18, 2014 by stubarny
jazzman1 Posted June 18, 2014

You want to search for existing files on a remote machine/server (which doesn't belong to you)? That's impossible.

Edited June 18, 2014 by jazzman1
stubarny Posted June 18, 2014 (Author)

The download link is:

http://webmail.WEBSITE.com/src/download.php?mailbox=INBOX&passed_id=6475&startMessage=1&override_type0=text&override_type1=html&ent_id=2&absolute_dl=true

and that link successfully downloads a file called test.html.

I then save that file as "test.html", but I've had to hard code the file name and the extension in the code of my first post. I want to have the code automatically find the file name and extension. Is this information not somewhere in the $data variable?
jazzman1 Posted June 18, 2014

Well, the name of the downloaded file is "download.php", followed by a GET query string: "mailbox=INBOX&passed_id=6475&startMessage=1&override_type0=text&override_type1=.........". Curl works like a web browser: the content is retrieved and then saved or displayed as HTML data. However, to be honest with you, I still have no idea what exactly you want to achieve.

Edited June 18, 2014 by jazzman1
stubarny Posted June 18, 2014 (Author)

I'm looking to download a file and save it to my server.

The link http://webmail.website.com/src/download.php?mailbox=INBOX&passed_id=6475&startMessage=1&override_type0=text&override_type1=html&ent_id=2&absolute_dl=true INITIATES the download (it is not the file being downloaded), and the file being downloaded is called test.html. At the moment I have to hard code in my PHP script to call the downloaded file "test.html". Shouldn't I be able to find the file name from the downloaded file?

Secondly, I don't know how to download non-HTML files with CURL. I've tried to download .pdf and .docx files and both have failed. Is it possible?

Thanks,
Stu

Edited June 18, 2014 by stubarny
DavidAM Posted June 20, 2014

When you download a "file" using HTTP, there is no actual file. You get a stream that you are then saving as a file. However, the server that you download from may send a header suggesting a name for the file. You need to retrieve the headers in the curl request, and inspect them for one that contains the suggested filename. You can then use this value as the name of the file when you save it. You will have to watch out for name conflicts, since there is nothing to prevent you from getting the same filename in multiple "downloads".

You can use the CURLOPT_HEADER option to get the headers. But these will be included in the returned data ($data in your script). So you will have to remove the headers, inspect them, and extract the filename. I don't remember the name of the header; in fact, I don't think I have actually done this. So grab the headers and echo them to see what is there and what you need.

The headers should end with two consecutive line endings ("\r\n\r\n"). Then you can explode on the line endings to get an array of headers.

    curl_setopt($ch, CURLOPT_HEADER, true);
    $data = curl_exec($ch);

    $pos = strpos($data, "\r\n\r\n");     # Find the end of the headers (haystack first, then needle)
    $headers = substr($data, 0, $pos);    # Save headers
    $data = substr($data, $pos+4);        # Remove headers from data
    $headers = explode("\r\n", $headers); # Explode headers to array
    print_r($headers);

Un-tested code - no error checking - UAYOR
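The header-splitting step described above can be sketched as a small standalone function. This is an illustration under stated assumptions, not tested against the webmail server in the thread: split_http_response() is a made-up name, the sample response is invented, and the optional second argument assumes you pass in the value of curl_getinfo($ch, CURLINFO_HEADER_SIZE) when you have a live handle.

```php
<?php
/**
 * Split a raw response (as returned by curl_exec() with CURLOPT_HEADER
 * enabled) into an array of headers and the body.
 * Hypothetical helper, for illustration only.
 */
function split_http_response($raw, $headerSize = null)
{
    if ($headerSize === null) {
        // Fall back to the first blank line: haystack first, then needle.
        $pos = strpos($raw, "\r\n\r\n");
        if ($pos === false) {
            return array(array(), $raw);   // no header block found
        }
        $headerSize = $pos + 4;
    }

    $headerBlock = substr($raw, 0, $headerSize);
    $body = (string) substr($raw, $headerSize);

    $headers = array();
    foreach (explode("\r\n", trim($headerBlock)) as $line) {
        if (strpos($line, ':') === false) {
            continue;                      // skip the status line and blanks
        }
        list($name, $value) = explode(':', $line, 2);
        // Header names are case-insensitive, so store them lower-cased.
        $headers[strtolower(trim($name))] = trim($value);
    }

    return array($headers, $body);
}

// Invented response, only to show the shape of the result.
$raw = "HTTP/1.1 200 OK\r\n"
     . "Content-Type: text/html\r\n"
     . "Content-Disposition: attachment; filename=\"test.html\"\r\n"
     . "\r\n"
     . "<html></html>";

list($headers, $body) = split_http_response($raw);
print_r($headers);
```

Passing the header size reported by curl is likely the safer choice when redirects are followed, because the response can then contain more than one header block and the first blank line no longer marks the start of the body.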
kicken Posted June 20, 2014

The header name you'll be looking for would be the Content-disposition header, which takes the format of

    Content-disposition: [attachment|inline]; filename="the_filename.ext"

(quotes optional)

Using the CURLOPT_HEADERFUNCTION option may be better than having the headers included in the output and then stripping them. Creating a class to handle the download would probably make this easier, as your callback can be another method and you can share data using class-level variables.

This is something I've never personally tried, and the below sample code is completely untested but may get you started.

    <?php

    class DownloadFile {
        private $ch;
        private $headers = array();
        private $status;

        public function __construct($url, $cookieFile){
            $ch = curl_init($url);
            curl_setopt($ch, CURLOPT_COOKIEFILE, $cookieFile);
            curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
            curl_setopt($ch, CURLOPT_SSLVERSION,3);
            curl_setopt($ch, CURLOPT_HEADERFUNCTION, array($this, 'captureHeader'));

            $this->ch = $ch;
        }

        public function __destruct(){
            curl_close ($this->ch);
        }

        public function download(){
            return curl_exec($this->ch);
        }

        protected function normalize($header){
            return ucfirst(strtolower(trim($header)));
        }

        //Called by curl once per header line; must return the number of bytes handled
        protected function captureHeader($ch, $headerData){
            if (substr($headerData, 0, 4) == 'HTTP'){
                $this->status = substr($headerData, 9, 3);
            }
            else if (false !== strpos($headerData, ':')){
                list($header, $content) = explode(':', $headerData, 2);

                //Normalize the header name
                $header = $this->normalize($header);
                $content = trim($content);

                $this->headers[$header] = $content;
            }

            return strlen($headerData);
        }

        public function getHeader($header){
            $header = $this->normalize($header);
            return array_key_exists($header, $this->headers)?$this->headers[$header]:null;
        }
    }

    //Set $url and $cookie
    $dl = new DownloadFile($url, $cookie);
    $content = $dl->download();
    $saveName = 'default.html';
    if ($header=$dl->getHeader('Content-disposition')){
        //Stop at the closing quote or a semicolon so the quote is not captured
        if (preg_match('/filename="?([^";]+)"?/', $header, $matches)){
            $saveName = $matches[1];
        }
    }

    file_put_contents($saveName, $content);
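Since the file name in a Content-disposition header comes from the remote server, it may be worth cleaning it before using it as a local path. The following is a minimal sketch of that idea, not part of the code above: filename_from_disposition() and the 'download.bin' default are hypothetical, and the extended filename*= form of the header is deliberately not handled.

```php
<?php
/**
 * Pull a safe local file name out of a Content-Disposition header value.
 * Hypothetical helper, for illustration only.
 */
function filename_from_disposition($header, $default = 'download.bin')
{
    // Try the quoted form first, then the bare (unquoted) form.
    if (preg_match('/filename\s*=\s*"([^"]*)"/i', $header, $m)
        || preg_match('/filename\s*=\s*([^;\s]+)/i', $header, $m)) {
        // Treat backslashes as separators, then keep only the last
        // path component so the server cannot pick the directory.
        $name = basename(str_replace('\\', '/', $m[1]));
        if ($name !== '' && $name !== '.' && $name !== '..') {
            return $name;
        }
    }

    return $default;
}

$name = filename_from_disposition('attachment; filename="test.html"');
$extension = pathinfo($name, PATHINFO_EXTENSION);

echo $name, "\n";
echo $extension, "\n";
```

With the name in hand, pathinfo() gives the extension the original question asked for. Checking file_exists() on the target path before writing would be one way to deal with the name conflicts mentioned earlier in the thread.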