PHP Curl and AJAX login and page scraping


bschultz

I'm trying to log in and scrape a page that is 4 pages deep. I can get to the fourth page, but that page only returns AJAX ERROR:0. I know NOTHING about AJAX calls via cURL. Can someone please help me work out, from the source code of the 4th page (when viewed in a browser), what I'm supposed to pass along via cURL?

If you need the source code or login credentials to see what's happening in the background, I can generate a temp password for you.

Thanks!


Please post the code in question here (and please use the '<>' button to format your code), as well as explaining exactly what '4 pages deep' means. If I'm understanding correctly, there's nothing inherently special about a 'https://tld.com/page1/page2/page3/page4' curl call that would cause an ajax error. And now that I type that, ajax and curl are completely different things so perhaps posting the JavaScript ajax code along with the relevant PHP would help.


4 pages deep means: login page (page 1), simulate a link click to page 2, simulate a link click to page 3, simulate a link click to page 4. Pages 2 and 3 have no JavaScript or AJAX code. Page 4 does! Pages 2 and 3 have some code that is tied to the login to display certain info. As far as I can tell, page 4 uses AJAX to make some database calls, so without the AJAX info passed via cURL, I get the AJAX ERROR:0 message.

I'm assuming the AJAX:0 error is in the code of the page...but when I visit that page via a browser, it works...so no error.

What code would you like me to post?  Page 1, 2, 3, or 4?


When you're trying to re-create a flow, the best thing to do generally is to use either the browser's developer tools or something like Fiddler to monitor exactly which requests are being made, then figure out how to re-create those requests.

Sometimes it's as simple as loading a URL; other times it's more complicated and involves parsing the previous page's source for various details. The process is something you generally have to figure out on a case-by-case basis, so it'll be difficult for anyone to really guide you without a lot of details.
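
As a rough illustration of re-creating a multi-page flow, the sketch below keeps one cURL handle alive across requests so that cookies set at login are sent on every later page. It is not taken from the site in question: the function names `session_options` and `fetch_page`, the cookie file path, and any URLs or form field names used with them are assumptions made for the example.

```php
<?php
// Sketch only: helper names and the cookie file path are hypothetical.

/**
 * Build cURL options that make one handle behave like one browser session.
 */
function session_options(string $cookieFile): array
{
    return [
        CURLOPT_RETURNTRANSFER => true,        // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,        // follow redirects, e.g. after a login POST
        CURLOPT_COOKIEFILE     => $cookieFile, // read cookies from here (also enables the cookie engine)
        CURLOPT_COOKIEJAR      => $cookieFile, // write cookies here when the handle is closed
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Firefox/78.0',
    ];
}

/**
 * Fetch one URL on an existing handle. Pass $postFields to send a form POST.
 */
function fetch_page($ch, string $url, ?array $postFields = null): string
{
    curl_setopt($ch, CURLOPT_URL, $url);
    if ($postFields !== null) {
        curl_setopt($ch, CURLOPT_POST, true);
        curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($postFields));
    } else {
        curl_setopt($ch, CURLOPT_HTTPGET, true); // switch back to GET after an earlier POST
    }
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('cURL error: ' . curl_error($ch));
    }
    return $body;
}
```

A typical use would be to call `curl_init()`, apply `curl_setopt_array($ch, session_options($file))`, call `fetch_page()` once with the login form fields (whatever names the real form uses), and then call it again for each later page on the same handle. Reusing the handle is what carries the session cookies forward.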


Again, AJAX has absolutely nothing to do with cURL. If you're missing information between pages 1 and 4, you need to set the information in either a session or cookie variable, or pass it between the pages in another way. Or use local storage in JavaScript. You also mention 'simulating' a click to pages 2, 3, and 4. I'm not sure what you mean by that. Do you physically click a link or button on a page that sends you to a new page, or does something else happen?

On a side note - @kicken, I don't recall having heard of fiddler before. I'm gonna explore that 'cause it looks pretty cool; thanks for the heads-up!

Edited by maxxd

Maxxd, when using a browser, there is a button to press to go from page 1 to page 2. This button is just a link, so the second cURL request uses this link as the URL.
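
When the link's URL has to be read out of the previous page rather than hard-coded, something like the following could work. This is only a sketch: `find_link_href` and `resolve_href` are hypothetical helpers, the link text is whatever the real button shows, and `resolve_href` deliberately handles only the simplest relative-URL cases.

```php
<?php
// Sketch only: helper names are hypothetical; adapt the link text to the real page.

/**
 * Return the href of the first <a> whose visible text contains $linkText
 * (case-insensitive), or null when no such link exists.
 */
function find_link_href(string $html, string $linkText): ?string
{
    if ($html === '') {
        return null; // DOMDocument::loadHTML rejects an empty string
    }
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('a') as $anchor) {
        if (stripos($anchor->textContent, $linkText) !== false) {
            $href = $anchor->getAttribute('href');
            return $href !== '' ? $href : null;
        }
    }
    return null;
}

/**
 * Resolve an href against the page it came from.
 * Handles absolute URLs, root-relative paths, and same-directory paths only.
 */
function resolve_href(string $pageUrl, string $href): string
{
    if (preg_match('#^https?://#i', $href)) {
        return $href; // already absolute
    }
    $parts  = parse_url($pageUrl);
    $origin = $parts['scheme'] . '://' . $parts['host']
            . (isset($parts['port']) ? ':' . $parts['port'] : '');
    if (strpos($href, '/') === 0) {
        return $origin . $href; // root-relative
    }
    $path = $parts['path'] ?? '/';
    $dir  = substr($path, 0, strrpos($path, '/') + 1); // keep everything up to the last slash
    return $origin . $dir . $href;
}
```

The resolved URL is then what the next cURL request would load. Links that use `../` segments or a `<base>` tag would need more careful handling than this.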

Using Firefox Developer Tools (Network tab), I can see that page #4 makes two GET calls to two other pages (which include the SQL SELECTs), and then the content loads on page #4.

Here are the two external page requests:
 

curl 'http://209.151.229.186/AffWeb_USRN/V2/ASP/GTD.asp?SQLCMD=spGetStationOptions%20%27WBJI-FM%27,%20%27Virtual%20News%20Network%20MF%27,%20%2710/19/2020%27,%20%2710/19/2020%27&DT=1603034713365' \
  -H 'User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Firefox/78.0' \
  -H 'Accept: */*' \
  -H 'Accept-Language: en-US,en;q=0.5' \
  --compressed \
  -H 'Referer: http://209.151.229.186/AffWeb_USRN/V2/log_exact.asp?startDate=10/19/2020&endDate=10/19/2020&SD=10/19/2020&ED=10/19/2020&gsfCode=0' \
  -H 'Content-Type: application/x-www-form-urlencoded' \
  -H 'Connection: keep-alive' \
  -H 'Cookie: SavePW=1; Password=xxxx; ASPSESSIONIDAARTRTBC=EAPNEIJAHEJILFGECNDAKNPE; ASPSESSIONIDQSBQTQDD=HNHHELCCMPNLDFACKGPLCJHF'

curl 'http://209.151.229.186/AffWeb_USRN/V2/ASP/GTD.asp?SQLCMD=spGetAllSpots%20%27WBJI-FM%27,%20%27Virtual%20News%20Network%20MF%27,%20%2710/19/2020%27,%20%2710/19/2020%27&DT=1603034713563' \
  -H 'User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Firefox/78.0' \
  -H 'Accept: */*' \
  -H 'Accept-Language: en-US,en;q=0.5' \
  --compressed \
  -H 'Referer: http://209.151.229.186/AffWeb_USRN/V2/log_exact.asp?startDate=10/19/2020&endDate=10/19/2020&SD=10/19/2020&ED=10/19/2020&gsfCode=0' \
  -H 'Content-Type: application/x-www-form-urlencoded' \
  -H 'Connection: keep-alive' \
  -H 'Cookie: SavePW=1; Password=xxxx; ASPSESSIONIDAARTRTBC=EAPNEIJAHEJILFGECNDAKNPE; ASPSESSIONIDQSBQTQDD=HNHHELCCMPNLDFACKGPLCJHF'
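
PHP's cURL does not run page 4's JavaScript, so these two GET requests never happen unless the script makes them itself. A minimal sketch of doing that in PHP follows. The function names `build_gtd_url` and `fetch_ajax` are hypothetical, the URL layout is copied from the captured requests above, and it is an assumption (not something confirmed in this thread) that `DT` is just a millisecond timestamp and that the server cares about the `Referer` header.

```php
<?php
// Sketch only: function names are hypothetical; values mirror the captured requests.

/**
 * Build a GTD.asp URL shaped like the captured ones: the stored-procedure
 * call goes in SQLCMD, and DT carries a millisecond timestamp.
 */
function build_gtd_url(string $base, string $procedure, array $args, int $dtMillis): string
{
    // Wrap each argument in single quotes, as seen in the captured SQLCMD value.
    $quoted = array_map(function (string $arg): string {
        return "'" . $arg . "'";
    }, $args);
    $sqlCmd = $procedure . ' ' . implode(', ', $quoted);

    // Encode only spaces and single quotes so the result matches the capture exactly.
    $encoded = strtr($sqlCmd, [' ' => '%20', "'" => '%27']);

    return $base . '?SQLCMD=' . $encoded . '&DT=' . $dtMillis;
}

/**
 * Fetch one of the AJAX endpoints on a handle that already holds the login cookies.
 */
function fetch_ajax($ch, string $url, string $referer): string
{
    curl_setopt_array($ch, [
        CURLOPT_URL            => $url,
        CURLOPT_HTTPGET        => true,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_REFERER        => $referer, // page 4's URL, as the browser sent it
        CURLOPT_ENCODING       => '',       // accept any compression this cURL build supports
        CURLOPT_HTTPHEADER     => ['Accept: */*', 'Accept-Language: en-US,en;q=0.5'],
    ]);
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('cURL error: ' . curl_error($ch));
    }
    return $body;
}
```

The idea would be to load pages 1 through 4 first on one handle so the `ASPSESSIONID...` cookies are in place, then call `fetch_ajax()` once for `spGetStationOptions` and once for `spGetAllSpots`, passing the `log_exact.asp` URL as the referer. A fresh `DT` value can be produced with `(int) round(microtime(true) * 1000)` on 64-bit PHP. The data the script wants is in the responses to these two calls, not in the HTML of page 4.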

 

I can generate a temp password if someone wants to use a browser to see what's happening. Just private message me for the login details.

Thanks!

Edited by bschultz
