Using curl to Follow a JavaScript Link


mrhenniger

I am trying to write a page that extracts historical data on vintage aircraft from the FAA website. This is my first scraper. I have successfully extracted data from a standard web page, but some of the FAA pages have a JavaScript link that has to be followed before you reach the same data format. Here is an example:

http://registry.faa.gov/aircraftinquiry/NNum_Results.aspx?NNumbertxt=N164TB

When I look at the source, I find this link:

<a id="_ctl0__ctl0_MainContent_SideMenuContent_lbtnWarning" class="Results_link" href="javascript:__doPostBack('_ctl0$_ctl0$MainContent$SideMenuContent$lbtnWarning','')">Continue</a>
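Rather than hard-coding the event target, it can be pulled out of the link's href. A minimal sketch, assuming the href always has the `javascript:__doPostBack('target','argument')` form shown in the link, with single-quoted arguments that contain no escaped quotes; `parsePostBackHref()` is a hypothetical helper name:

```php
<?php
/**
 * Hypothetical helper: extract the eventTarget/eventArgument pair from an
 * ASP.NET "javascript:__doPostBack('...','...')" href.
 * Returns null if the string does not contain that pattern.
 */
function parsePostBackHref(string $href): ?array
{
    // Assumes both arguments are single-quoted and contain no escaped quotes.
    if (!preg_match("/__doPostBack\\('([^']*)','([^']*)'\\)/", $href, $m)) {
        return null;
    }
    return ['__EVENTTARGET' => $m[1], '__EVENTARGUMENT' => $m[2]];
}
```

The returned keys are the names of the two hidden fields that `__doPostBack()` fills in, so the array can be merged straight into the POST fields.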

The source also contains the JavaScript function being called:

function __doPostBack(eventTarget, eventArgument)
{
    if (!theForm.onsubmit || (theForm.onsubmit() != false))
    {
        theForm.__EVENTTARGET.value = eventTarget;
        theForm.__EVENTARGUMENT.value = eventArgument;
        theForm.submit();
    }
}
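Note that `theForm.submit()` sends every field in the form, not only `__EVENTTARGET` and `__EVENTARGUMENT`. On ASP.NET WebForms pages that typically includes hidden fields such as `__VIEWSTATE`, `__VIEWSTATEGENERATOR` and `__EVENTVALIDATION`, and a postback without them is commonly rejected by the server. A minimal sketch of collecting them with PHP's DOM extension; it gathers every `<input type="hidden">` on the page, which assumes the page has a single server-side form (normal for WebForms), and `extractHiddenFields()` is a hypothetical name:

```php
<?php
/**
 * Hypothetical helper: return name => value for every <input type="hidden">
 * in the given HTML (e.g. __VIEWSTATE, __EVENTVALIDATION on ASP.NET pages).
 */
function extractHiddenFields(string $html): array
{
    if ($html === '') {
        return []; // DOMDocument::loadHTML() rejects an empty string
    }
    // Real-world HTML is rarely valid; collect parser warnings instead of printing them.
    $prev = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($prev);

    $fields = [];
    foreach ($doc->getElementsByTagName('input') as $input) {
        if (strtolower($input->getAttribute('type')) === 'hidden'
            && $input->hasAttribute('name')) {
            $fields[$input->getAttribute('name')] = $input->getAttribute('value');
        }
    }
    return $fields;
}
```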

I searched around for a way to follow this link and decided to load the page and submit the same form that the JavaScript submits. By parsing the page, I was able to define this parameter set:

$paramSet = '__EVENTTARGET=_ctl0$_ctl0$MainContent$SideMenuContent$lbtnWarning&__EVENTARGUMENT='; // single quotes, so PHP does not treat $_ctl0 etc. as variables
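One PHP detail to watch with this parameter string: if it is written in double quotes, PHP interpolates anything that looks like a variable, so `$_ctl0`, `$MainContent`, `$SideMenuContent` and `$lbtnWarning` are replaced by undefined (empty) variables, and the body actually sent is `__EVENTTARGET=_ctl0&__EVENTARGUMENT=` (assuming none of those variables exist). Single quotes avoid that, and `http_build_query()` additionally URL-encodes each value (`$` becomes `%24`). A sketch, where `buildPostBackBody()` is a hypothetical helper that merges any hidden form fields with the two postback fields:

```php
<?php
/**
 * Hypothetical helper: build an application/x-www-form-urlencoded body that
 * mimics __doPostBack(): the page's hidden fields plus the two event fields.
 */
function buildPostBackBody(array $hiddenFields, string $eventTarget, string $eventArgument = ''): string
{
    $fields = $hiddenFields;
    // __doPostBack() overwrites these two hidden fields before submitting the form.
    $fields['__EVENTTARGET'] = $eventTarget;
    $fields['__EVENTARGUMENT'] = $eventArgument;
    // http_build_query() URL-encodes names and values ('$' => '%24').
    return http_build_query($fields);
}
```

For the Continue link, `buildPostBackBody([], '_ctl0$_ctl0$MainContent$SideMenuContent$lbtnWarning')` yields the same two fields as the hand-written string, but correctly encoded.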

I then used the following PHP:

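// POST only the two postback fields back to the results page
$c = curl_init();
curl_setopt($c, CURLOPT_URL, $page);            // $page is the results-page URL
curl_setopt($c, CURLOPT_POST, TRUE);            // send a POST, like theForm.submit()
curl_setopt($c, CURLOPT_POSTFIELDS, $paramSet); // body: __EVENTTARGET/__EVENTARGUMENT only
curl_setopt($c, CURLOPT_RETURNTRANSFER, TRUE);  // return the response instead of printing it
$data = curl_exec($c);
$data = htmlspecialchars($data);                // escape so the raw HTML can be echoed to the browser
curl_close($c);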
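A browser-like version of this request with curl is typically a two-step process: GET the page first (so the server issues its session cookie and the current hidden fields), then POST every hidden field back with the event target set, on the same curl handle so the cookies are resent. This sketch rests on assumptions: the form posts back to the same URL (ASP.NET forms usually do, but check the form's `action` attribute), the site relies on cookies, and `followPostBack()` is a hypothetical helper, not anything the FAA site documents. It also enables `CURLOPT_FOLLOWLOCATION`, since curl does not follow redirects by default.

```php
<?php
/**
 * Hypothetical helper: replay an ASP.NET __doPostBack() with curl.
 * Step 1: GET $url to receive cookies and hidden fields (__VIEWSTATE etc.).
 * Step 2: POST those fields back with __EVENTTARGET/__EVENTARGUMENT set.
 * Returns the HTML reached after the postback, or null on a transport error.
 */
function followPostBack(string $url, string $eventTarget, string $eventArgument = ''): ?string
{
    $c = curl_init();
    curl_setopt($c, CURLOPT_URL, $url);
    curl_setopt($c, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($c, CURLOPT_FOLLOWLOCATION, true); // curl does not follow redirects by default
    curl_setopt($c, CURLOPT_COOKIEFILE, '');       // empty name: cookie engine on, nothing loaded from disk

    // Step 1: load the page the way a browser would.
    $html = curl_exec($c);
    if ($html === false) {
        curl_close($c);
        return null;
    }

    // Collect every hidden input on the page (assumes one server-side form).
    $fields = [];
    if ($html !== '') {
        $prev = libxml_use_internal_errors(true);
        $doc = new DOMDocument();
        $doc->loadHTML($html);
        libxml_clear_errors();
        libxml_use_internal_errors($prev);
        foreach ($doc->getElementsByTagName('input') as $input) {
            if (strtolower($input->getAttribute('type')) === 'hidden' && $input->hasAttribute('name')) {
                $fields[$input->getAttribute('name')] = $input->getAttribute('value');
            }
        }
    }

    // Step 2: do what __doPostBack() does, then submit the whole form.
    $fields['__EVENTTARGET'] = $eventTarget;
    $fields['__EVENTARGUMENT'] = $eventArgument;
    curl_setopt($c, CURLOPT_POST, true);
    curl_setopt($c, CURLOPT_POSTFIELDS, http_build_query($fields)); // URL-encoded, '$' => '%24'
    $result = curl_exec($c);
    curl_close($c);

    return $result === false ? null : $result;
}
```

For example, `followPostBack('http://registry.faa.gov/aircraftinquiry/NNum_Results.aspx?NNumbertxt=N164TB', '_ctl0$_ctl0$MainContent$SideMenuContent$lbtnWarning')` (target in single quotes) should return the page behind the Continue link, provided the site still behaves as described in this post.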

...which gave me this value for $data...

<html><head><title>Object moved</title></head><body> <h2>Object moved to <a href="%2faircraftinquiry%2fLastResort.aspx%3faspxerrorpath%3d%2faircraftinquiry%2fNNum_Results.aspx">here</a>.</h2> </body></html>

I then assumed I was supposed to go to this URL:

http://registry.faa.gov/aircraftinquiry/LastResort.aspx?aspxerrorpath=/aircraftinquiry/NNum_Results.aspx

I followed this link as well, but all it tells me is "We Can Not Process Your Request At This Time."
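The `aspxerrorpath` parameter is a useful clue: ASP.NET appends it when a request fails on the server and the site's custom error settings redirect to an error page. So `LastResort.aspx` is most likely that error page rather than the next step behind the Continue link, and following it will not reach the data; the postback itself was rejected, plausibly because the hidden form fields and session cookie a browser sends were missing. A small sketch for detecting this case from the "Object moved" body; `movedTarget()` and `isAspNetErrorRedirect()` are hypothetical helper names, and the regex assumes a single `href="..."` as in the response quoted in this post:

```php
<?php
/** Hypothetical helper: pull the decoded redirect target out of an "Object moved" body. */
function movedTarget(string $html): ?string
{
    if (!preg_match('/href="([^"]*)"/i', $html, $m)) {
        return null;
    }
    return urldecode($m[1]); // %2f => '/', %3f => '?', %3d => '='
}

/** Hypothetical helper: true if the target looks like an ASP.NET custom-error redirect. */
function isAspNetErrorRedirect(string $target): bool
{
    return strpos($target, 'aspxerrorpath=') !== false;
}
```

When `CURLOPT_FOLLOWLOCATION` is enabled, the same check can be applied to `curl_getinfo($c, CURLINFO_EFFECTIVE_URL)` after `curl_exec()`.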

So I am not seeing any obvious errors, but curl does not give me the same result I get by clicking the JavaScript link manually.

So... does anyone have any suggestions? A critique of my technique would also be welcome.

Thanks in advance.

Mike (the rookie scraper)
