
Curl, Wget, or how to handle this download


bschultz


I have some files that I need to download for work. They are in a password-protected web directory. Once I log in, I'm shown an HTML table of dates and files, with a checkbox next to each file name. When a checkbox is checked, a DOWNLOAD SELECTED button appears.

Using the debugging features of Firefox, I see that the download is handled by these three requests:
 

The first is a POST request to:

https://domain.com/index.php?action=FileTransfer.download_selected_http

with the following params:

selectedFiles: 4941422
transfer_identifier: 48b47cfda429980c8c41b50cb17774c8
useCompression: true

The second is a GET request:

https://domain.com/index.php?action=FileTransfer.get_http_status_messages&id=48b47cfda429980c8c41b50cb17774c8&_=1520995394437

and finally, the third is another GET request, identical apart from the trailing _ value:

https://domain.com/index.php?action=FileTransfer.get_http_status_messages&id=48b47cfda429980c8c41b50cb17774c8&_=1520995395434
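The two GET requests look like status polling for the transfer started by the POST. Here is a minimal Python sketch of building that URL. It assumes the trailing `_` parameter is just a millisecond timestamp (the two captured values are about one second apart, which fits), and `status_url` and `BASE_URL` are names made up for this example, not anything from the site.

```python
import time
from urllib.parse import urlencode

# Placeholder host, as used in the post.
BASE_URL = "https://domain.com/index.php"


def status_url(transfer_identifier, now_ms=None):
    """Build the status-polling URL seen in the two GET requests."""
    if now_ms is None:
        # Assumption: `_` is the current time in milliseconds.
        now_ms = int(time.time() * 1000)
    query = urlencode({
        "action": "FileTransfer.get_http_status_messages",
        "id": transfer_identifier,
        "_": now_ms,
    })
    return f"{BASE_URL}?{query}"


# Rebuild the first captured GET URL from its parts.
print(status_url("48b47cfda429980c8c41b50cb17774c8", now_ms=1520995394437))
```

Whether the server actually checks the `_` value is not something the captured requests can tell us.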

If I need to scrape the HTML, here's the table row for one of the files:
 

<tr id="4941422" class="DownloadFileList odd ">
    <td class="select aligncenter"><input type="checkbox" id="4941422" class="3363384" value="4941422" name="downloadFile" onClick="$G.transferLackey.updateDownloadControls();" /></td>
    <td >LCL-Show01-2018-March14.mp3</td>
    <td class="wrap_anywhere"></td>
    <td class="aligncenter">PMT1DSM</td>
    <td class="aligncenter">2018-03-08 06:00:05 PM</td>
    <td class="alignright">3285</td>
    <td class="UploadFileList-remove" wrap="nowrap">
        <a href="#" onclick="$G.transferLackey.DeleteFileById(4941422, 'LCL-Show01-2018-March14.mp3', 3363384, 1);"><div class="removeIcon"></div></a>
    </td>
</tr>
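If scraping the table is the route taken, a row like the one above can be read with Python's standard `html.parser`. This is only a sketch: `FileRowParser` is a made-up name, and it assumes that every file row carries the `DownloadFileList` class, that the row `id` is the file ID, and that the file name is always the second cell. The sample row is trimmed from the one in the post.

```python
from html.parser import HTMLParser


class FileRowParser(HTMLParser):
    """Collect (file_id, file_name) pairs from DownloadFileList rows."""

    def __init__(self):
        super().__init__()
        self.files = []        # list of (file_id, file_name) tuples
        self._row_id = None    # id of the row currently being parsed
        self._cell_index = -1  # index of the current <td> within the row
        self._in_cell = False
        self._name_parts = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "tr" and "DownloadFileList" in classes:
            self._row_id = attrs.get("id")
            self._cell_index = -1
            self._name_parts = []
        elif tag == "td" and self._row_id is not None:
            self._cell_index += 1
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row_id is not None:
            name = "".join(self._name_parts).strip()
            self.files.append((self._row_id, name))
            self._row_id = None

    def handle_data(self, data):
        # Assumption: the file name lives in the second cell (index 1).
        if self._in_cell and self._cell_index == 1 and self._row_id is not None:
            self._name_parts.append(data)


ROW = """<tr id="4941422" class="DownloadFileList odd ">
  <td class="select aligncenter"><input type="checkbox" id="4941422" class="3363384" value="4941422" name="downloadFile" /></td>
  <td >LCL-Show01-2018-March14.mp3</td>
  <td class="aligncenter">PMT1DSM</td>
</tr>"""

parser = FileRowParser()
parser.feed(ROW)
print(parser.files)
```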


And finally, here's the output of the cliget add-on for Firefox:

curl --header 'Host: domain.com' \
     --user-agent 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:58.0) Gecko/20100101 Firefox/58.0' \
     --header 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8' \
     --header 'Accept-Language: en-US,en;q=0.5' \
     --referer 'https://domain.com/index.php?action=FileTransfer.download_frame' \
     --header 'Content-Type: application/x-www-form-urlencoded' \
     --cookie 'PHPSESSID=abjnhmkq8034jg1f0lpnj90354; cookies_enabled=1; plugvercheck=0' \
     --header 'Upgrade-Insecure-Requests: 1' \
     --request POST \
     --data-urlencode 'useCompression=true' \
     --data-urlencode 'selectedFiles=4941941' \
     --data-urlencode 'transfer_identifier=cee18c977b984073ae4f414e13fb654c' \
     'https://masstransit.meredith.com/index.php?action=FileTransfer.download_selected_http' \
     --output 'mtdownload5539_2018.03.14_03.01.zip'
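For comparison, here is roughly what that curl command boils down to in Python's standard library. This is a sketch under assumptions: `build_download_request` is a made-up helper, it uses the placeholder `domain.com` host, it sends only the session cookie, referer and content type (the server may or may not need the other headers), and it handles a single file ID because the capture doesn't show how several selected files are encoded. The request is built but not sent, so it runs offline.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder URLs copied from the post.
DOWNLOAD_URL = "https://domain.com/index.php?action=FileTransfer.download_selected_http"
REFERER = "https://domain.com/index.php?action=FileTransfer.download_frame"


def build_download_request(session_id, file_id, transfer_identifier):
    """Build (but do not send) the form POST that cliget captured."""
    body = urlencode({
        "useCompression": "true",
        "selectedFiles": str(file_id),
        "transfer_identifier": transfer_identifier,
    }).encode("ascii")
    headers = {
        "Content-Type": "application/x-www-form-urlencoded",
        "Cookie": f"PHPSESSID={session_id}",
        "Referer": REFERER,
    }
    return Request(DOWNLOAD_URL, data=body, headers=headers, method="POST")


req = build_download_request(
    "abjnhmkq8034jg1f0lpnj90354",        # session ID from the post (expired)
    4941941,
    "cee18c977b984073ae4f414e13fb654c",
)
print(req.get_method(), req.full_url)
print(req.data.decode("ascii"))
# Sending it would mean passing req to urllib.request.urlopen() and writing
# the response bytes to a .zip file; that step is omitted here.
```

The session ID and identifiers above are the ones from the capture, so a real run would need fresh values from a logged-in session.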




Some of the params match the HTML code, but many do not. What's the best way to automate downloading these files? Or is there a way to find the absolute download path / URL of a file when the host website uses a download script?

I got to thinking...I hadn't looked at the whole source code of that page. The transfer_identifier is in the JavaScript, as are the file IDs for the whole page:

 

<script type="text/javascript">
var transferData={"metadata":"","job_ticket_id":0,"who":"D_286577002_2640591309","transport":"TCP/IP Secure","host_addr":"192.223.7.72","host_port":"443","job_id":"","who_plus":"3153325650217078101","ssl_params":"Auto%2dGenerated+by+MassTransit%0a2%0a","force_forward_or_service":"","transfer_identifier":"2c9759cb130043816cf53c8621ee9b42","hideDownloadAllButton":"","totalFileSizeToSend":84265681,"totalFileSizeSent":5817819453,"tableStyle":0};
var downloadFileList=[
{"file_path":"","file_name":"Wk 03-12 topics.txt","file_status":"100","file_id":4941409},
{"file_path":"","file_name":"LCL-Show02-2018-March12.mp3","file_status":"100","file_id":4941410},
{"file_path":"","file_name":"LCL-Show02-2018-March15.mp3","file_status":"100","file_id":4941412},
{"file_path":"","file_name":"LCL-Show02-2018-March16.mp3","file_status":"100","file_id":4941413},
{"file_path":"","file_name":"LCL-Show01-2018-March13.mp3","file_status":"100","file_id":4941414},
{"file_path":"","file_name":"LCL-Show01-2018-March16.mp3","file_status":"100","file_id":4941415},
{"file_path":"","file_name":"LCL-Show01-2018-March12.mp3","file_status":"100","file_id":4941416},
{"file_path":"","file_name":"LCL-Show02-2018-March12.mp3","file_status":"100","file_id":4941417},
{"file_path":"","file_name":"LCL-Show02-2018-March13.mp3","file_status":"100","file_id":4941418},
{"file_path":"","file_name":"LCL-Show02-2018-March15.mp3","file_status":"100","file_id":4941419},
{"file_path":"","file_name":"LCL-Show01-2018-March15.mp3","file_status":"100","file_id":4941420},
{"file_path":"","file_name":"LCL-Show02-2018-March16.mp3","file_status":"100","file_id":4941421},
{"file_path":"","file_name":"LCL-Show02-2018-March14.mp3","file_status":"100","file_id":4941423},
{"file_path":"","file_name":"LCL-Show01-2018-March16.mp3","file_status":"100","file_id":4941424},
{"file_path":"","file_name":"LCL-Show01-2018-March13.mp3","file_status":"100","file_id":4941425},
{"file_path":"","file_name":"LCL-Show02-2018-March14.mp3","file_status":"100","file_id":4941427},
{"file_path":"","file_name":"LCL-Show01-2018-March14.mp3","file_status":"100","file_id":4941428},
{"file_path":"","file_name":"LCL-Show02-2018-March13.mp3","file_status":"100","file_id":4941429},
{"file_path":"","file_name":"LCL-Show01-2018-March12.mp3","file_status":"100","file_id":4941942},
{"file_path":"","file_name":"LCL-Show01-2018-March15.mp3","file_status":"100","file_id":4941943},
{"file_path":"","file_name":"LCL-Show01-2018-March09.mp3","file_status":"100","file_id":4941944},
{"file_path":"","file_name":"LCL-Show01-2018-March14.mp3","file_status":"100","file_id":4941947},
{"file_path":"","file_name":"LCL-Show02-2018-March09.mp3","file_status":"100","file_id":4941948}
];
$j(document).ready(function(){$G.transferLackey.handleOnReady(transferData,downloadFileList,5902085134);});
</script>
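Since both variables in that script are plain JSON literals, they can probably be pulled out without parsing the table at all. A minimal Python sketch follows; `extract_js_value` is a made-up helper, and it assumes the page keeps assigning valid JSON with `var name=` exactly as in the pasted script. The sample string is a shortened copy of that script.

```python
import json
import re


def extract_js_value(html, var_name):
    """Return the JSON literal assigned to `var <var_name>=` in the page."""
    match = re.search(r"var\s+" + re.escape(var_name) + r"\s*=\s*", html)
    if match is None:
        raise ValueError(f"{var_name} not found")
    # raw_decode parses one JSON value starting at the given index and
    # ignores whatever follows it (here, the rest of the script).
    value, _end = json.JSONDecoder().raw_decode(html, match.end())
    return value


SAMPLE = (
    '<script type="text/javascript">var transferData={"transfer_identifier":'
    '"2c9759cb130043816cf53c8621ee9b42","tableStyle":0};var downloadFileList=['
    '{"file_path":"","file_name":"Wk 03-12 topics.txt","file_status":"100","file_id":4941409},'
    '{"file_path":"","file_name":"LCL-Show02-2018-March12.mp3","file_status":"100","file_id":4941410}'
    '];</script>'
)

transfer_data = extract_js_value(SAMPLE, "transferData")
file_list = extract_js_value(SAMPLE, "downloadFileList")
print(transfer_data["transfer_identifier"])
print([(f["file_id"], f["file_name"]) for f in file_list])
```

The three captures in this thread show three different transfer_identifier values, which suggests it changes between page loads, so it likely has to be read fresh from the page right before each POST.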

 

So, I'm thinking that I'll have to scrape the page to get this info...then pass the cookie via curl, and download the files.  I think I have a plan.  I'll report back.


I've finally had some time to work on this again. I've successfully fetched the HTML with curl and pulled the correct values into a string. Now I need to turn that string into an array of key => value pairs.

 

How can I turn this string...

 

 

"LCL-Show02-2018-March23.mp3" => "4950301", "LCL-Show02-2018-March22.mp3" => "4950302"

 

into an array where...

 

 

$array_name['LCL-Show02-2018-March23.mp3']

 

will return

 

 

4950301

 

Thanks!
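The `$array_name[...]` syntax in the post looks like PHP, but the idea is the same in any language: match each quoted name and quoted ID around `=>` and collect the pairs into a map. Here it is sketched in Python, with `parse_pairs` as a made-up name, assuming neither names nor IDs contain a double quote.

```python
import re

# One "name" => "id" pair; assumes no double quotes inside either part.
PAIR = re.compile(r'"([^"]*)"\s*=>\s*"([^"]*)"')


def parse_pairs(text):
    """Turn '"a" => "1", "b" => "2"' into {"a": "1", "b": "2"}."""
    return {name: file_id for name, file_id in PAIR.findall(text)}


text = '"LCL-Show02-2018-March23.mp3" => "4950301", "LCL-Show02-2018-March22.mp3" => "4950302"'
ids = parse_pairs(text)
print(ids["LCL-Show02-2018-March23.mp3"])  # prints 4950301
```

One caveat: the file list posted earlier contains the same file name under several IDs, so a map keyed by name keeps only the last ID for each name. If that matters, keying by ID or keeping a list of pairs may be safer. It may also be simpler to build the map straight from the decoded `downloadFileList` instead of going through an intermediate string.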


