bschultz Posted October 15, 2020
I'm trying to log in and scrape a page that is 4 pages deep. I can get to the fourth page, but that page only returns AJAX ERROR:0. I know NOTHING about AJAX calls via cURL. Can someone please help me with what to look for in the source code of the 4th page (when using a browser) so I know what I'm supposed to pass along via cURL? If you need the source code or login credentials to see what's happening in the background, I can generate a temp password for you. Thanks!
maxxd Posted October 15, 2020
Please post the code in question here (and please use the '<>' button to format your code), and explain exactly what '4 pages deep' means. If I'm understanding correctly, there's nothing inherently special about a 'https://tld.com/page1/page2/page3/page4' curl call that would cause an ajax error. And now that I type that, ajax and curl are completely different things, so perhaps posting the JavaScript ajax code along with the relevant PHP would help.
bschultz Posted October 15, 2020 (Author)
4 pages deep means: login page (page 1), simulate a link click to page 2, simulate a link click to page 3, simulate a link click to page 4. Pages 2 and 3 have no JavaScript or AJAX code. Page 4 does! Pages 2 and 3 have some code that is tied to the login to display certain info. As far as I can tell, page 4 uses AJAX to make some database calls, so without the AJAX info passed via cURL, I get the AJAX:0 error. I'm assuming the AJAX:0 error is in the code of the page, but when I visit that page via a browser it works, so no error. What code would you like me to post? Page 1, 2, 3, or 4?
kicken Posted October 15, 2020
When you're trying to re-create a flow, the best thing to do generally is to use either the browser's developer tools or something like Fiddler to monitor exactly what requests are being made, then figure out how to re-create those requests. Sometimes it's as simple as loading a URL; other times it's more complicated and involves parsing the previous page's source for various details. The process is something you generally have to figure out on a case-by-case basis, so it'll be difficult for anyone to really guide you without a lot of details.
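As a rough illustration of re-creating a multi-step flow, here is a minimal sketch in Python (not the PHP the thread is about) that keeps cookies between requests so that later pages see the same login session the earlier pages established. The helper names `make_session` and `fetch` are hypothetical, and the assumption that this site tracks the login through cookies comes only from the Cookie headers visible in the browser's requests.

```python
from http.cookiejar import CookieJar
from urllib.request import HTTPCookieProcessor, Request, build_opener


def make_session():
    """Return an opener that stores and resends cookies, plus its cookie jar."""
    jar = CookieJar()
    opener = build_opener(HTTPCookieProcessor(jar))
    return opener, jar


def fetch(opener, url, referer=None):
    """GET a URL through the shared opener, optionally sending a Referer header."""
    headers = {"Referer": referer} if referer else {}
    with opener.open(Request(url, headers=headers), timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

The idea would be to call `fetch` once per request seen in the network monitor, in the same order, always passing the same opener so cookies set by one response are sent with the next request.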
maxxd Posted October 15, 2020 (edited)
Again, ajax has absolutely nothing to do with curl. If you're missing information between pages 1 and 4, you need to set the information in either a session or cookie variable, or pass it between the pages in another way. Or use local storage in JavaScript. You also mention 'simulating' a click to pages 2, 3, and 4 - I'm not sure what you mean by that. Do you physically click a link or button on a page that sends you to a new page, or does something else happen?
On a side note - @kicken, I don't recall having heard of Fiddler before. I'm gonna explore that 'cause it looks pretty cool; thanks for the heads-up!
bschultz Posted October 18, 2020 (Author, edited)
Maxxd, when using a browser, there is a button to press to go from page 1 to page 2. This button is just a link, so the second curl request uses this link as the URL.
Using Firefox Developer Tools - Network, I can see that page #4 makes two GET calls to two other pages (the URLs include the SQL selects), and then the content loads on page #4.
Here are the two external page requests:

curl 'http://209.151.229.186/AffWeb_USRN/V2/ASP/GTD.asp?SQLCMD=spGetStationOptions%20%27WBJI-FM%27,%20%27Virtual%20News%20Network%20MF%27,%20%2710/19/2020%27,%20%2710/19/2020%27&DT=1603034713365' -H 'User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Firefox/78.0' -H 'Accept: */*' -H 'Accept-Language: en-US,en;q=0.5' --compressed -H 'Referer: http://209.151.229.186/AffWeb_USRN/V2/log_exact.asp?startDate=10/19/2020&endDate=10/19/2020&SD=10/19/2020&ED=10/19/2020&gsfCode=0' -H 'Content-Type: application/x-www-form-urlencoded' -H 'Connection: keep-alive' -H 'Cookie: SavePW=1; Password=xxxx; ASPSESSIONIDAARTRTBC=EAPNEIJAHEJILFGECNDAKNPE; ASPSESSIONIDQSBQTQDD=HNHHELCCMPNLDFACKGPLCJHF'

curl 'http://209.151.229.186/AffWeb_USRN/V2/ASP/GTD.asp?SQLCMD=spGetAllSpots%20%27WBJI-FM%27,%20%27Virtual%20News%20Network%20MF%27,%20%2710/19/2020%27,%20%2710/19/2020%27&DT=1603034713563' -H 'User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Firefox/78.0' -H 'Accept: */*' -H 'Accept-Language: en-US,en;q=0.5' --compressed -H 'Referer: http://209.151.229.186/AffWeb_USRN/V2/log_exact.asp?startDate=10/19/2020&endDate=10/19/2020&SD=10/19/2020&ED=10/19/2020&gsfCode=0' -H 'Content-Type: application/x-www-form-urlencoded' -H 'Connection: keep-alive' -H 'Cookie: SavePW=1; Password=xxxx; ASPSESSIONIDAARTRTBC=EAPNEIJAHEJILFGECNDAKNPE; ASPSESSIONIDQSBQTQDD=HNHHELCCMPNLDFACKGPLCJHF'

I can generate a temp password if someone wants to use a browser to see what's happening. Just private message me for the login details. Thanks!
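The two GTD.asp requests quoted in the post differ only in the stored procedure name and the DT value, so the URL can be assembled programmatically. The following is a minimal Python sketch under some assumptions: the function names are hypothetical, the argument layout is inferred purely from the two URLs shown, and DT is assumed to be a millisecond timestamp (possibly a cache-buster), which the thread does not confirm.

```python
from urllib.parse import quote
from urllib.request import Request

# Endpoint copied from the requests captured in Firefox's Network tab.
BASE = "http://209.151.229.186/AffWeb_USRN/V2/ASP/GTD.asp"


def build_gtd_url(proc, args, dt_ms):
    """Build a GTD.asp URL in the same shape as the captured requests."""
    # Each argument is wrapped in single quotes and joined with ", ".
    quoted_args = ", ".join("'{}'".format(a) for a in args)
    sqlcmd = "{} {}".format(proc, quoted_args)
    # Keep "/" and "," literal, as the browser did; spaces become %20, quotes %27.
    return "{}?SQLCMD={}&DT={}".format(BASE, quote(sqlcmd, safe="/,"), dt_ms)


def build_request(url, referer):
    """Prepare (but do not send) a GET request carrying the Referer header."""
    return Request(url, headers={"Referer": referer, "Accept": "*/*"})


if __name__ == "__main__":
    args = ["WBJI-FM", "Virtual News Network MF", "10/19/2020", "10/19/2020"]
    print(build_gtd_url("spGetAllSpots", args, 1603034713563))
```

With the arguments from the post, `build_gtd_url` reproduces the second captured URL character for character; whether the server also needs the Cookie or Referer headers is something to test by removing them one at a time.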
bschultz Posted October 20, 2020 (Author)
This is fixed. It turns out one of the pages that the 4th page was calling via GET does NOT require a password, and it has all the content I need. I exploded the string it returned and got all the info I needed. Thanks.
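For readers unfamiliar with "exploding" a string: it means splitting it on a delimiter, as PHP's `explode` does. A minimal Python sketch of the same idea follows. The thread never shows the actual response or its delimiter, so the `|` delimiter and the sample string below are invented placeholders, and unlike PHP's `explode` this helper also trims whitespace and drops empty pieces.

```python
def explode(text, delimiter):
    """Split text on delimiter, trimming whitespace and dropping empty pieces."""
    return [part.strip() for part in text.split(delimiter) if part.strip()]


if __name__ == "__main__":
    # Hypothetical sample; the real response format is not shown in the thread.
    sample = "first | second || third"
    for field in explode(sample, "|"):
        print(field)
```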