
Using curl to Follow a Javascript Link


mrhenniger


I am trying to write a page that extracts historical data about vintage aircraft from the FAA website. This is my first scraper. I have successfully extracted data from a standard web page, but some of the FAA pages have a JavaScript link that must be followed before reaching the same data format. Here is an example...

 

http://registry.faa.gov/aircraftinquiry/NNum_Results.aspx?NNumbertxt=N164TB

 

When I look at the source I find this link...

 

<a id="_ctl0__ctl0_MainContent_SideMenuContent_lbtnWarning" class="Results_link" href="javascript:__doPostBack('_ctl0$_ctl0$MainContent$SideMenuContent$lbtnWarning','')">Continue</a>
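The two arguments passed to __doPostBack can be pulled out of the href with a regular expression instead of being copied by hand. This is a minimal sketch, assuming the link always has the form shown above; the helper name `extractPostBackArgs` is hypothetical and not part of the FAA page or of PHP.

```php
<?php
// Hypothetical helper: extract the two __doPostBack() arguments from a
// "javascript:__doPostBack('target','argument')" href.
function extractPostBackArgs(string $href): ?array
{
    $pattern = "/__doPostBack\\('([^']*)','([^']*)'\\)/";
    if (preg_match($pattern, $href, $m) !== 1) {
        return null; // not a postback link
    }
    return ['target' => $m[1], 'argument' => $m[2]];
}

// The href from the link above. Single quotes keep PHP from treating
// $_ctl0, $MainContent, etc. as variables.
$href = 'javascript:__doPostBack(\'_ctl0$_ctl0$MainContent$SideMenuContent$lbtnWarning\',\'\')';
$args = extractPostBackArgs($href);
```

Here `$args['target']` holds the control name that would go into __EVENTTARGET, and `$args['argument']` is the empty string.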

 

The source also provides the javascript function being called...

 

function __doPostBack(eventTarget, eventArgument)
{
    if (!theForm.onsubmit || (theForm.onsubmit() != false))
    {
        theForm.__EVENTTARGET.value = eventTarget;
        theForm.__EVENTARGUMENT.value = eventArgument;
        theForm.submit();
    }
}

 

I have been searching for a way to follow this link. I decided to fetch the page and submit the same form the JavaScript does. By parsing the page I was able to define this parameter set...

 

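// Single quotes: inside double quotes PHP would treat $_ctl0, $MainContent, etc. as variables.
$paramSet = '__EVENTTARGET=' . urlencode('_ctl0$_ctl0$MainContent$SideMenuContent$lbtnWarning') . '&__EVENTARGUMENT=';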
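Note that theForm.submit() sends every field in the form, not only the two that __doPostBack sets, and ASP.NET WebForms pages normally carry hidden inputs such as __VIEWSTATE that the server expects back. The sketch below is one possible way to build the POST body from the fetched page. It assumes the hidden inputs are what is missing, which has not been confirmed for this site; the helper name `buildPostBackBody` and the sample HTML are made up for illustration.

```php
<?php
// Hypothetical helper: collect every <input type="hidden"> in the page and
// merge in the two fields that __doPostBack() sets before submitting.
function buildPostBackBody(string $html, string $target, string $argument = ''): string
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid, so suppress parser warnings.
    @$doc->loadHTML($html);

    $fields = [];
    foreach ($doc->getElementsByTagName('input') as $input) {
        $name = $input->getAttribute('name');
        if ($name !== '' && strtolower($input->getAttribute('type')) === 'hidden') {
            $fields[$name] = $input->getAttribute('value');
        }
    }

    $fields['__EVENTTARGET']   = $target;
    $fields['__EVENTARGUMENT'] = $argument;

    // http_build_query() URL-encodes names and values, including the "$" signs.
    return http_build_query($fields);
}

// Made-up sample page; real hidden field values come from the fetched HTML.
$html = '<form><input type="hidden" name="__EVENTTARGET" value="">'
      . '<input type="hidden" name="__EVENTARGUMENT" value="">'
      . '<input type="hidden" name="__VIEWSTATE" value="abc+123="></form>';

$body = buildPostBackBody($html, '_ctl0$_ctl0$MainContent$SideMenuContent$lbtnWarning');
```

The resulting `$body` can be passed to CURLOPT_POSTFIELDS in place of a hand-built string.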

 

I then used the following PHP...

 

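$c = curl_init();
$ret = curl_setopt($c, CURLOPT_URL, $page);             // URL of the results page
$ret = curl_setopt($c, CURLOPT_POST, TRUE);             // send a POST, as the form does
$ret = curl_setopt($c, CURLOPT_POSTFIELDS, $paramSet);  // URL-encoded form body
$ret = curl_setopt($c, CURLOPT_RETURNTRANSFER, TRUE);   // return the response instead of printing it
$data = curl_exec($c);
$data = htmlspecialchars($data);                        // escape the HTML for display
curl_close($c);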

 

...which gave me this value for $data...

 

<html><head><title>Object moved</title></head><body> <h2>Object moved to <a href="%2faircraftinquiry%2fLastResort.aspx%3faspxerrorpath%3d%2faircraftinquiry%2fNNum_Results.aspx">here</a>.</h2> </body></html>

 

I assumed I was meant to go to this URL...

 

http://registry.faa.gov/aircraftinquiry/LastResort.aspx?aspxerrorpath=/aircraftinquiry/NNum_Results.aspx
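The href in the "Object moved" body is percent-encoded, so decoding it and prefixing the site origin reproduces the URL above. A small sketch, assuming the href is a root-relative path as in this response; `resolveMovedHref` is a hypothetical helper name.

```php
<?php
// Hypothetical helper: take the first href from an "Object moved" page,
// percent-decode it, and prefix the site origin.
function resolveMovedHref(string $html, string $origin): ?string
{
    if (preg_match('/<a\s+href="([^"]+)"/i', $html, $m) !== 1) {
        return null; // no link found in the body
    }
    return rtrim($origin, '/') . rawurldecode($m[1]);
}

// Response body as returned by the server (before htmlspecialchars()).
$moved = '<html><head><title>Object moved</title></head><body> <h2>Object moved to '
       . '<a href="%2faircraftinquiry%2fLastResort.aspx%3faspxerrorpath%3d%2faircraftinquiry%2fNNum_Results.aspx">here</a>'
       . '.</h2> </body></html>';

$url = resolveMovedHref($moved, 'http://registry.faa.gov');
```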

 

I followed this link as well, but all it tells me is "We Can Not Process Your Request At This Time."

 

So I am not seeing any obvious errors, but I am not getting the same result with curl that I get by manually clicking the JavaScript link.

 

Does anyone have any suggestions? A critique of my technique would be great.

 

Thanks in advance.

 

Mike (the rookie scraper)


Archived

This topic is now archived and is closed to further replies.
