mrhenniger, posted April 11, 2010

I am trying to write a page that extracts historical data on vintage aircraft from the FAA website. This is my first scraper. I have successfully extracted data from a standard web page, but some of the FAA pages have a JavaScript link that must be followed before you reach the same data format. Here is an example:

http://registry.faa.gov/aircraftinquiry/NNum_Results.aspx?NNumbertxt=N164TB

When I look at the source, I find this link:

<a id="_ctl0__ctl0_MainContent_SideMenuContent_lbtnWarning" class="Results_link" href="javascript:__doPostBack('_ctl0$_ctl0$MainContent$SideMenuContent$lbtnWarning','')">Continue</a>

The source also provides the JavaScript function being called:

    function __doPostBack(eventTarget, eventArgument) {
        if (!theForm.onsubmit || (theForm.onsubmit() != false)) {
            theForm.__EVENTTARGET.value = eventTarget;
            theForm.__EVENTARGUMENT.value = eventArgument;
            theForm.submit();
        }
    }

I did some searching on how to follow a link like this. What I decided to do was open the page and submit the same form that the JavaScript submits. By parsing the page, I defined this parameter set:

    // Single quotes: inside double quotes PHP would interpolate $_ctl0, $MainContent, etc. as variables.
    $paramSet = '__EVENTTARGET=_ctl0$_ctl0$MainContent$SideMenuContent$lbtnWarning&__EVENTARGUMENT=';

I then used the following PHP:

    $c = curl_init();
    curl_setopt($c, CURLOPT_URL, $page);              // $page is the NNum_Results.aspx URL above
    curl_setopt($c, CURLOPT_POST, true);              // submit as a form POST, like theForm.submit()
    curl_setopt($c, CURLOPT_POSTFIELDS, $paramSet);
    curl_setopt($c, CURLOPT_RETURNTRANSFER, true);    // return the response instead of printing it
    $data = curl_exec($c);
    $data = htmlspecialchars($data);                  // escape the HTML so it can be echoed for inspection
    curl_close($c);

...which gave me this value for $data:

    <html><head><title>Object moved</title></head><body>
    <h2>Object moved to <a href="%2faircraftinquiry%2fLastResort.aspx%3faspxerrorpath%3d%2faircraftinquiry%2fNNum_Results.aspx">here</a>.</h2>
    </body></html>

I then assumed I was supposed to go to this URL:

http://registry.faa.gov/aircraftinquiry/LastResort.aspx?aspxerrorpath=/aircraftinquiry/NNum_Results.aspx

I followed that link as well, but all it tells me is "We Can Not Process Your Request At This Time." So I am not seeing any obvious errors, but I am not getting the same result with cURL that I get by clicking the JavaScript link manually.

Does anyone have any suggestions? A critique of my technique would be great.

Thanks in advance.
Mike (the rookie scraper)
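A minimal sketch of the same technique, assuming NNum_Results.aspx is an ordinary ASP.NET WebForms page (the __doPostBack function and the _ctl0$... control names suggest it is, but the FAA site may behave differently or may have changed). When a browser runs __doPostBack(), it submits the whole form, including hidden state fields such as __VIEWSTATE and __EVENTVALIDATION, and it sends back the session cookie it received with the page. The POST above sends only the two event fields and no cookie, which is one plausible reason the server falls through to its error page. The helper names buildPostbackFields() and followPostback() are hypothetical, and posting back to the same URL is an assumption about the form's action.

```php
<?php
// Hypothetical helper (not an FAA or ASP.NET API): copy every hidden <input>
// from the page (ASP.NET normally puts __VIEWSTATE, __EVENTVALIDATION, etc.
// there) and add the two fields that __doPostBack() fills in before submit().
function buildPostbackFields(string $html, string $eventTarget, string $eventArgument = ''): array
{
    $fields = [];
    if (trim($html) !== '') {
        $doc = new DOMDocument();
        // Real-world HTML is rarely well-formed; keep libxml warnings quiet.
        $previous = libxml_use_internal_errors(true);
        $doc->loadHTML($html);
        libxml_clear_errors();
        libxml_use_internal_errors($previous);

        foreach ($doc->getElementsByTagName('input') as $input) {
            $isHidden = strtolower($input->getAttribute('type')) === 'hidden';
            if ($isHidden && $input->hasAttribute('name')) {
                $fields[$input->getAttribute('name')] = $input->getAttribute('value');
            }
        }
    }
    // Overwrite the (normally empty) hidden event fields, as __doPostBack() does.
    $fields['__EVENTTARGET'] = $eventTarget;
    $fields['__EVENTARGUMENT'] = $eventArgument;
    return $fields;
}

// Hypothetical helper: GET the page, then POST the postback fields to the same
// URL on the same handle, so the session cookie from the GET is sent back.
// Assumes the form's action is the page's own URL.
function followPostback(string $url, string $eventTarget): string
{
    $c = curl_init($url);
    curl_setopt($c, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($c, CURLOPT_FOLLOWLOCATION, true);
    // An empty cookie file name enables cURL's in-memory cookie handling.
    curl_setopt($c, CURLOPT_COOKIEFILE, '');

    $html = curl_exec($c);
    if ($html === false) {
        throw new RuntimeException('GET failed: ' . curl_error($c));
    }

    $fields = buildPostbackFields($html, $eventTarget);
    curl_setopt($c, CURLOPT_POST, true);
    // http_build_query() URL-encodes names and values ($ becomes %24), like a browser.
    curl_setopt($c, CURLOPT_POSTFIELDS, http_build_query($fields));

    $result = curl_exec($c);
    if ($result === false) {
        throw new RuntimeException('POST failed: ' . curl_error($c));
    }
    return $result;
}
```

Usage would be something like followPostback('http://registry.faa.gov/aircraftinquiry/NNum_Results.aspx?NNumbertxt=N164TB', '_ctl0$_ctl0$MainContent$SideMenuContent$lbtnWarning'), with the event target in single quotes so PHP does not treat $_ctl0 and the rest as variables. Reusing one cURL handle for both requests is what keeps the cookie; two separate curl_init() calls would need a shared cookie file instead.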
mrhenniger (author), posted April 13, 2010

Can anyone point me to some good cURL tutorials?

Thanks in advance.
Mike