kiss-o-matic Posted June 19, 2011

Okay, starting a new thread as I can't modify the old one.

I had some pretty stable code scraping a Yahoo site for a while, but they've made a change and I can no longer log in. It's not telling me my password is incorrect, but it's forcing a human recognition page. The flow is: load page -> detect login form -> log in via that form -> Bob's your uncle. This all worked, but now, after I log in, I get the human confirmation thing.

After tinkering my entire Father's Day away, I have been able to replicate this in a browser (Firefox/Chrome) by doing the following:
1: Turning off JavaScript
2: Going directly to the 3rd step above (login via that form) and removing some of the GET variables from the URL.

Things of note:
The following line is in the page, which I ignore.
<noscript><input type="hidden" name=".nojs" value="1"></noscript>
However, ignoring it is apparently not enough.

Using HttpFox, I'm able to see what I'm sending. I've tried to perfectly emulate my Firefox browser. I'm sending the same headers AND POST data with only one exception: the content size (Content-Length, which CURL sets automatically). Curl's is always about 20 or so bytes smaller than what the browser says. I'm kind of curious what accounts for the difference, but I'm not sure that it is the issue.

I'm following the form pretty much to a T. There are 3 variable values it sends that it uses for authentication, and I appear to be handling those right. On a successful login, about 5 cookies are written and the page is redirected.

The anti-phishing page itself says to make sure JavaScript is enabled, and also to check your network settings. Is there anything else in CURL I might be missing? An SSL setting, as it does use secure login?

I'm mainly looking for a brainstorm more than anything here, but if someone spots a glaring error, I'm all ears. Perhaps there is another way that it's detecting that JavaScript is disabled.
class CURL
{
    var $callback = false;
    var $cookie;

    // PHP 5+ constructor (replaces the PHP 4 style CURL() method).
    function __construct( $cookie = "" )
    {
        if ( !strlen( $cookie ) ) {
            $this->cookie = "default_cookie.txt";
        } else {
            $this->cookie = $cookie;
        }
    }

    function setCallback($func_name)
    {
        $this->callback = $func_name;
    }

    function doRequest($method, $url, $vars, $referer )
    {
        $ch = curl_init();

        // Headers copied from a Firefox 4 session.
        $header = array();
        $header[] = "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
        $header[] = "Accept-Language: en-us,en;q=0.5";
        $header[] = "Accept-Encoding: gzip, deflate";
        $header[] = "Accept-Charset: EUC-JP,utf-8;q=0.7,*;q=0.7";
        $header[] = "Keep-Alive: 115";
        $header[] = "Connection: keep-alive";
        if ( $method == 'GET' ) {
            $header[] = "Cache-Control: max-age=0";
        }

        curl_setopt($ch, CURLOPT_VERBOSE, 1);
        if ( $referer != "" ) {
            curl_setopt($ch, CURLOPT_REFERER, $referer);
        }
        // Peer verification is switched off (this was set twice in the original).
        curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
        curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 2);
        // Empty string: accept every encoding curl supports and decode the response.
        curl_setopt($ch, CURLOPT_ENCODING, "" );
        curl_setopt($ch, CURLOPT_AUTOREFERER, TRUE );
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($ch, CURLOPT_TIMEOUT, 5);
        curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (X11; Linux x86_64;rv:2.0) Gecko/20110411 Firefox/4.0');
        // Include the response headers in the returned data.
        curl_setopt($ch, CURLOPT_HEADER, 1);
        curl_setopt($ch, CURLOPT_HTTPHEADER, $header );
        // Read cookies from, and write them back to, the same file.
        curl_setopt($ch, CURLOPT_COOKIEJAR, $this->cookie);
        curl_setopt($ch, CURLOPT_COOKIEFILE, $this->cookie);
        if ( $method == 'POST' ) {
            curl_setopt($ch, CURLOPT_POST, 1);
            curl_setopt($ch, CURLOPT_POSTFIELDS, $vars);
        }

        $data = curl_exec($ch);
        $info = curl_getinfo( $ch );
        // Read the error before closing the handle; it is not available afterwards.
        $error = curl_error($ch);
        curl_close($ch);

        if ($data) {
            if ($this->callback) {
                $callback = $this->callback;
                $this->callback = false;
                return call_user_func($callback, $data);
            } else {
                return $data;
            }
        } else {
            return "There was an error in getting page. Try refreshing your browser.<br>";
            // return $error;
        }
    }

    function get($url, $referer)
    {
        // GET requests carry no POST body.
        return $this->doRequest('GET', $url, null, $referer);
    }

    function post($url, $vars, $referer )
    {
        return $this->doRequest('POST', $url, $vars, $referer);
    }
}

Link to comment: https://forums.phpfreaks.com/topic/239793-curl-issue-sucking-the-life-out-of-me/
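On the Content-Length difference mentioned above: one way to narrow it down is to build the urlencoded body by hand and compare its length with the figure HttpFox reports. The sketch below is only that, a sketch; the field names (.tries, login, passwd) and values are hypothetical placeholders, not the real form's fields. It may also be worth checking how $vars is passed: when CURLOPT_POSTFIELDS receives an array, PHP sends multipart/form-data, while a string is sent as application/x-www-form-urlencoded, and the two bodies differ in size.

```php
<?php
// Encode form fields the way a browser encodes an
// application/x-www-form-urlencoded POST body (spaces become "+").
function encodeFormBody(array $fields): string
{
    return http_build_query($fields, '', '&', PHP_QUERY_RFC1738);
}

// Hypothetical field names and values, for illustration only.
$body = encodeFormBody(array(
    '.tries' => '1',
    'login'  => 'user',
    'passwd' => 'p w',
));

echo $body, "\n";          // the exact bytes that would be sent
echo strlen($body), "\n";  // compare with the browser's Content-Length
```

Passing the resulting string (rather than the array) to post() keeps the request urlencoded, which makes a byte-for-byte comparison with the browser capture more meaningful.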
boompa Posted June 19, 2011

What if you add another parameter, .nojs=0, to what you're passing? That line you're ignoring appears to be telling the page receiving the request that JavaScript is turned off, so you'll need to lie to it to prevent that.
kiss-o-matic Posted June 19, 2011 (Author)

Sorry, should have mentioned:
1) I tried adding .nojs=0.
2) In the browser, you're sending nothing for that field (as it's within a <noscript> tag), and that's what I'm trying to emulate at the moment.

I think I've tried just about every sane approach here. I'm just curious how they're detecting it's not a browser if I'm sending the same headers and POST data. Frustrating.
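The <noscript> behaviour described here can be sketched with PHP's DOM extension: a browser with JavaScript enabled does not submit inputs that sit inside <noscript>, while one with JavaScript disabled does. The function name and the sample markup (including the .tries field) are assumptions for illustration, and the real page's scripts may add or change fields in ways this does not capture.

```php
<?php
// Collect the hidden inputs of a login form. With $javascriptEnabled = true,
// inputs inside <noscript> are skipped, as a script-enabled browser would do.
function collectHiddenFields(string $html, bool $javascriptEnabled): array
{
    // Real-world HTML is rarely valid; keep libxml warnings out of the output.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $query = $javascriptEnabled
        ? '//form//input[@type="hidden"][not(ancestor::noscript)]'
        : '//form//input[@type="hidden"]';

    $fields = array();
    foreach ($xpath->query($query) as $input) {
        $fields[$input->getAttribute('name')] = $input->getAttribute('value');
    }
    return $fields;
}

// Hypothetical markup modelled on the line quoted in the first post.
$html = '<html><body><form action="/login" method="post">'
      . '<input type="hidden" name=".tries" value="1">'
      . '<noscript><input type="hidden" name=".nojs" value="1"></noscript>'
      . '</form></body></html>';

print_r(collectHiddenFields($html, true));   // .nojs is left out
print_r(collectHiddenFields($html, false));  // .nojs is included
```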
kenrbnsn Posted June 19, 2011

Please don't double post. You can edit your original post for 10 minutes. After that just add additional information as a reply to the thread.

Ken
kiss-o-matic Posted June 19, 2011 (Author)

Well, I can't modify the old one, and it turns out the topic (arguably the most important part for getting help) is misleading after I did some research. Alas, apologies...
kiss-o-matic Posted June 20, 2011 (Author)

I've done some digging. My CURL session is now sending byte for byte the same information as Firefox (according to HttpFox), with what look like perfectly fine values. I believe the server is detecting whether JavaScript is enabled or disabled from the first time I view the page. Is this possible?

The flow, again, is:
1: Slurp page (cookie set here)
2: Read form
3: Follow form to login page
4: Log in using that form (this requires said cookies, but the server isn't satisfied and forces a Captcha)

In the browser, I've been able to spoof the other side into thinking JavaScript was off by deleting the cookie set in step 1. So my assumption is that the magic all starts here, which seems very tricky to me. The page is loaded via a GET request, and I'm sending the exact same headers. There are some <noscript> tags in the page, but these couldn't have an effect on the cookie, could they? I'm not all up to date, but last I checked, cookies are sent in the headers (before the page contents).
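Since the class sets CURLOPT_HEADER, the response headers come back in front of the body, so the cookies the server sets in step 1 can be listed and compared with what the browser receives. A minimal sketch, assuming the raw string returned by get(); the function name and the sample response are made up for illustration.

```php
<?php
// List the names of cookies set in a raw HTTP response (headers + body),
// including those set on intermediate redirect responses.
function setCookieNames(string $rawResponse): array
{
    $names = array();
    foreach (preg_split('/\r\n|\n/', $rawResponse) as $line) {
        // Header names are case-insensitive; capture the cookie name only.
        if (preg_match('/^Set-Cookie:\s*([^=;\s]+)=/i', $line, $m)) {
            $names[] = $m[1];
        }
    }
    return array_values(array_unique($names));
}

// Made-up response: a redirect followed by the final page.
$raw = "HTTP/1.1 302 Found\r\nSet-Cookie: B=abc; path=/\r\nLocation: /next\r\n\r\n"
     . "HTTP/1.1 200 OK\r\nSet-Cookie: Y=v=1; path=/\r\nset-cookie: B=def\r\n\r\n"
     . "<html></html>";

print_r(setCookieNames($raw));
```

A cookie that shows up in the browser's jar but never appears in this list was not delivered by a Set-Cookie header, which would point to it being created by a script on the page.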
kiss-o-matic Posted June 20, 2011 (Author)

Okay, I'm comparing packets (again) and the magic is happening within <noscript> tags... or within <script> tags. One of the two. There are JavaScript files loaded (very, very convoluted ones) and there are alternatives within the <noscript> tags. Not being a JS pro, I've yet to decipher exactly what they do.

My headers and POST data match exactly, byte for byte. However, they could be storing something server side, denoting that I might not be human. Can something be set server side from JavaScript? Also, is there an alternative to Curl that executes JavaScript? I guess I should go through the script, but it's a few thousand lines long.

Cheers
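On the server-side question: a script cannot write to the server directly, but it can set a cookie through document.cookie or fire a background request (XMLHttpRequest, or an image load), and the server can record either one against the session. Curl does none of this because it never runs the script. Before reading a few thousand lines, a rough scan like the sketch below may show where to look; the pattern list and function name are assumptions, and obfuscated code can easily slip past it.

```php
<?php
// Flag common ways a script can signal "JavaScript ran" back to the server.
// A heuristic only: minified or obfuscated code may not match these patterns.
function findScriptSideEffects(string $javascript): array
{
    $patterns = array(
        'cookie write'   => '/document\.cookie\s*=/',
        'XMLHttpRequest' => '/new\s+XMLHttpRequest\b/',
        'image beacon'   => '/new\s+Image\s*\(/',
    );

    $found = array();
    foreach ($patterns as $label => $pattern) {
        if (preg_match($pattern, $javascript)) {
            $found[] = $label;
        }
    }
    return $found;
}

// Made-up snippet standing in for the downloaded script source.
$js = 'var i = new Image(); i.src = "/b?x=1"; document.cookie = "js=1";';
print_r(findScriptSideEffects($js));
```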