bschultz Posted March 14, 2018 (edited)

I have some files that I need to download for work. They are in a password-protected web directory. Once I log in, I'm shown an HTML table with dates and files. Next to each file name is a checkbox. If a checkbox is checked, a DOWNLOAD SELECTED button is shown.

Using the debugging features of Firefox, I see that the download is handled by these three requests.

The first is a POST:

https://domain.com/index.php?action=FileTransfer.download_selected_http

with the following params:

selectedFiles 4941422
transfer_identifier 48b47cfda429980c8c41b50cb17774c8
useCompression true

The second is a GET:

https://domain.com/index.php?action=FileTransfer.get_http_status_messages&id=48b47cfda429980c8c41b50cb17774c8&_=1520995394437

and finally, the third is also a GET:

https://domain.com/index.php?action=FileTransfer.get_http_status_messages&id=48b47cfda429980c8c41b50cb17774c8&_=1520995395434

If I need to scrape the HTML, here's the corresponding markup for one table row:

<tr id="4941422" class="DownloadFileList odd ">
  <td class="select aligncenter"><input type="checkbox" id="4941422" class="3363384" value="4941422" name="downloadFile" onClick="$G.transferLackey.updateDownloadControls();" /></td>
  <td >LCL-Show01-2018-March14.mp3</td>
  <td class="wrap_anywhere"></td>
  <td class="aligncenter">PMT1DSM</td>
  <td class="aligncenter">2018-03-08 06:00:05 PM</td>
  <td class="alignright">3285</td>
  <td class="UploadFileList-remove" wrap="nowrap"> <a href="#" onclick="$G.transferLackey.DeleteFileById(4941422, 'LCL-Show01-2018-March14.mp3', 3363384, 1);"><div class="removeIcon"></div></a> </td>
</tr>

And finally, here's the output of the cliget add-on for Firefox:

curl --header 'Host: domain.com' \
  --user-agent 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:58.0) Gecko/20100101 Firefox/58.0' \
  --header 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8' \
  --header 'Accept-Language: en-US,en;q=0.5' \
  --referer 'https://domain.com/index.php?action=FileTransfer.download_frame' \
  --header 'Content-Type: application/x-www-form-urlencoded' \
  --cookie 'PHPSESSID=abjnhmkq8034jg1f0lpnj90354; cookies_enabled=1; plugvercheck=0' \
  --header 'Upgrade-Insecure-Requests: 1' \
  --request POST \
  --data-urlencode 'useCompression=true' \
  --data-urlencode 'selectedFiles=4941941' \
  --data-urlencode 'transfer_identifier=cee18c977b984073ae4f414e13fb654c' \
  'https://masstransit.meredith.com/index.php?action=FileTransfer.download_selected_http' \
  --output 'mtdownload5539_2018.03.14_03.01.zip'

Some of the params match the HTML code...many do not. What's my best method of trying to automate the downloads of these files? Or, is there a way to find an absolute download path / URL of a file when a download script is used by the host website?

Edited March 14, 2018 by bschultz
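For reference, the POST that cliget captured can be reproduced outside the browser. The following is a minimal sketch in Python (standard library only), assuming that a valid PHPSESSID cookie from a logged-in session and a current transfer_identifier are enough for the server to accept the request. The post does not confirm that, and it does not show whether the two status GETs are required or how several file IDs would be combined in selectedFiles, so this handles a single ID. The function names build_download_request and save_download are hypothetical.

```python
import shutil
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_download_request(base_url, session_id, file_id, transfer_identifier):
    """Build (but do not send) the POST issued by the DOWNLOAD SELECTED button.

    The form fields and headers mirror the cliget output in the post; whether
    the server needs anything beyond them is an assumption.
    """
    form = {
        "useCompression": "true",
        "selectedFiles": str(file_id),
        "transfer_identifier": transfer_identifier,
    }
    return Request(
        base_url + "/index.php?action=FileTransfer.download_selected_http",
        data=urlencode(form).encode("ascii"),
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Cookie": "PHPSESSID=" + session_id,
            "Referer": base_url + "/index.php?action=FileTransfer.download_frame",
        },
        method="POST",
    )


def save_download(request, path):
    """Send the request and stream the response body (a zip, per cliget) to disk."""
    with urlopen(request) as response, open(path, "wb") as out:
        shutil.copyfileobj(response, out)
```

A call would look like save_download(build_download_request("https://domain.com", session_id, 4941422, transfer_identifier), "download.zip"), where session_id and transfer_identifier come from the live session rather than from the values pasted above.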
bschultz (Author) Posted March 14, 2018

I got to thinking...I hadn't looked at the whole source code of that file. The transfer_identifier is in the JavaScript...as are the file IDs for the whole page:

<script type="text/javascript">var transferData={"metadata":"","job_ticket_id":0,"who":"D_286577002_2640591309","transport":"TCP/IP Secure","host_addr":"192.223.7.72","host_port":"443","job_id":"","who_plus":"3153325650217078101","ssl_params":"Auto%2dGenerated+by+MassTransit%0a2%0a","force_forward_or_service":"","transfer_identifier":"2c9759cb130043816cf53c8621ee9b42","hideDownloadAllButton":"","totalFileSizeToSend":84265681,"totalFileSizeSent":5817819453,"tableStyle":0};
var downloadFileList=[
{"file_path":"","file_name":"Wk 03-12 topics.txt","file_status":"100","file_id":4941409},
{"file_path":"","file_name":"LCL-Show02-2018-March12.mp3","file_status":"100","file_id":4941410},
{"file_path":"","file_name":"LCL-Show02-2018-March15.mp3","file_status":"100","file_id":4941412},
{"file_path":"","file_name":"LCL-Show02-2018-March16.mp3","file_status":"100","file_id":4941413},
{"file_path":"","file_name":"LCL-Show01-2018-March13.mp3","file_status":"100","file_id":4941414},
{"file_path":"","file_name":"LCL-Show01-2018-March16.mp3","file_status":"100","file_id":4941415},
{"file_path":"","file_name":"LCL-Show01-2018-March12.mp3","file_status":"100","file_id":4941416},
{"file_path":"","file_name":"LCL-Show02-2018-March12.mp3","file_status":"100","file_id":4941417},
{"file_path":"","file_name":"LCL-Show02-2018-March13.mp3","file_status":"100","file_id":4941418},
{"file_path":"","file_name":"LCL-Show02-2018-March15.mp3","file_status":"100","file_id":4941419},
{"file_path":"","file_name":"LCL-Show01-2018-March15.mp3","file_status":"100","file_id":4941420},
{"file_path":"","file_name":"LCL-Show02-2018-March16.mp3","file_status":"100","file_id":4941421},
{"file_path":"","file_name":"LCL-Show02-2018-March14.mp3","file_status":"100","file_id":4941423},
{"file_path":"","file_name":"LCL-Show01-2018-March16.mp3","file_status":"100","file_id":4941424},
{"file_path":"","file_name":"LCL-Show01-2018-March13.mp3","file_status":"100","file_id":4941425},
{"file_path":"","file_name":"LCL-Show02-2018-March14.mp3","file_status":"100","file_id":4941427},
{"file_path":"","file_name":"LCL-Show01-2018-March14.mp3","file_status":"100","file_id":4941428},
{"file_path":"","file_name":"LCL-Show02-2018-March13.mp3","file_status":"100","file_id":4941429},
{"file_path":"","file_name":"LCL-Show01-2018-March12.mp3","file_status":"100","file_id":4941942},
{"file_path":"","file_name":"LCL-Show01-2018-March15.mp3","file_status":"100","file_id":4941943},
{"file_path":"","file_name":"LCL-Show01-2018-March09.mp3","file_status":"100","file_id":4941944},
{"file_path":"","file_name":"LCL-Show01-2018-March14.mp3","file_status":"100","file_id":4941947},
{"file_path":"","file_name":"LCL-Show02-2018-March09.mp3","file_status":"100","file_id":4941948}
];$j(document).ready(function(){$G.transferLackey.handleOnReady(transferData,downloadFileList,5902085134);});</script>

So, I'm thinking that I'll have to scrape the page to get this info...then pass the cookie via curl, and download the files. I think I have a plan. I'll report back.
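Since transferData and downloadFileList are embedded as JSON literals, the scraping step can read them directly instead of walking the table rows. A minimal sketch in Python, assuming the page always contains the exact text "var transferData=" and "var downloadFileList=" immediately followed by the literal, as in the script above; the function name extract_transfer_info is hypothetical. The result is keyed by file_id rather than by file name, because the list above contains the same file name under more than one ID (for example LCL-Show02-2018-March12.mp3 appears as both 4941410 and 4941417).

```python
import json


def extract_transfer_info(html):
    """Return (transfer_identifier, {file_id: file_name}) from the page source."""
    decoder = json.JSONDecoder()

    def literal_after(marker):
        # Parse the JSON value that starts right after the marker and
        # ignore whatever JavaScript follows it.
        start = html.index(marker) + len(marker)
        value, _end = decoder.raw_decode(html, start)
        return value

    transfer_data = literal_after("var transferData=")
    file_list = literal_after("var downloadFileList=")

    files = {entry["file_id"]: entry["file_name"] for entry in file_list}
    return transfer_data["transfer_identifier"], files
```

html.index raises ValueError if either marker is missing, which is a reasonable signal that the login failed or the page layout changed.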
bschultz (Author) Posted March 22, 2018

I've finally had some time to work on this again. I've successfully fetched the HTML with curl and put the correct values into a string. Now, I need to turn that string into an array of key => value pairs.

How can I turn this string...

"LCL-Show02-2018-March23.mp3" => "4950301", "LCL-Show02-2018-March22.mp3" => "4950302"

into an array where...

$array_name['LCL-Show02-2018-March23.mp3']

will return 4950301?

Thanks!
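One way to do the conversion is to match each quoted "name" => "id" pair with a regular expression and build the lookup from the matches. The sketch below shows the idea in Python, assuming neither the names nor the IDs contain a double quote; the function name parse_pairs is hypothetical. In PHP the same pattern should work with preg_match_all, combining the two capture groups with array_combine.

```python
import re

# One quoted key, "=>", one quoted value; whitespace around "=>" is optional.
PAIR = re.compile(r'"([^"]*)"\s*=>\s*"([^"]*)"')


def parse_pairs(text):
    """Turn '"name" => "id", "name2" => "id2"' into {name: id, name2: id2}."""
    return {name: file_id for name, file_id in PAIR.findall(text)}


pairs = parse_pairs(
    '"LCL-Show02-2018-March23.mp3" => "4950301", '
    '"LCL-Show02-2018-March22.mp3" => "4950302"'
)
print(pairs["LCL-Show02-2018-March23.mp3"])  # prints 4950301
```

The values stay strings here. It is usually simpler to skip the intermediate string and fill the array at the point where the names and IDs are first scraped.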