sloth456 Posted October 14, 2010

Hi everyone, thanks for reading.

I have Live HTTP Headers installed on Firefox; hopefully some of you will be familiar with this add-on. I would like to make a PHP script that does what Live HTTP Headers does, that is, I'd like to give the script a URL and have it return all the headers. I know how to use CURLOPT_HEADER and CURLOPT_RETURNTRANSFER, but this doesn't quite do what I'm looking for.

For example, if I go to:

https://adwords.google.com/o/Targeting/Explorer?__c=1000000000&__u=1000000000&ideaRequestType=KEYWORD_STATS#search.none

with Live HTTP Headers switched on, I get the following data under the 'generators' tab:

GET /o/Targeting/Explorer?__c=1000000000&__u=1000000000&ideaRequestType=KEYWORD_STATS
GET /cues/cues.js
GET /ga.js
GET /cues/cb?__u=1000000000&__c=1000000000&l=en_US&v=5E5BE5A3D9AD806BA7FF2C9FE2E15DF9&a=1000000000
GET /cues/metrics/?requestType=external&startTime=1287073462287&browserStartTime=1287073471028&browserEndTime=1287073472156
GET /cues/metrics/?requestType=notabs&startTime=1287073462287&browserStartTime=1287073471028&browserEndTime=1287073472162
GET /o/Targeting/clear.cache.gif
GET /__utm.gif?utmwv=4.8.6&utmn=606563096&utmhn=adwords.google.com&utmcs=UTF-8&utmsr=1280x1024&utmsc=24-bit&utmul=en-gb&utmje=1&utmfl=10.1%20r85&utmdt=Google%20AdWords%3A%20Traffic%20Estimator&utmhid=1700273830&utmr=-&utmp=%2FAnonymous%2FTargetingExplorer%2FKeywordStats%3Fcontext%3DHistoryChange%26__c%3D1000000000%26__u%3D1000000000%26ideaRequestType%3DKEYWORD_STATS&utmac=UA-3418223-1&utmcc=__utma%3D229779660.119716289.1279381615.1286975241.1287071773.41%3B%2B__utmz%3D229779660.1279381615.1.1.utmcsr%3D(direct)%7Cutmccn%3D(direct)%7Cutmcmd%3D(none)%3B&utmu=q
GET /o/Targeting/F098CA184E661C697D29595B67C57BAA.cache.png
GET /favicon.ico
POST /o/Targeting/captcha?__u=1000000000&__c=1000000000&challengedService=/o/Targeting/g
6|1|4|https://adwords.google.com/o/Targeting/|02990CDDCF521142AE1B373ED2008D94|_|getToken|1|2|3|4|0|
GET /o/Targeting/captchaData?token=AJtyWwbzrehZWrT8puwruvcvjxCHnRJWKqWvpWfw54W2AjmF2cAlFpYLcL13KE0hi7retjZclcFHqnd8yMC0vDfP4tRNXISZ1QfbdLBK_WVW1wqlzqrR2ouSBgclxUiU0RiHUXHWy1VrIlo5oq3xNeemD-dZHQr0ed1s6p-dBF9mvyvkct1BifkXPAD___W32qxemYPbi6qvgpec3-rgsesGVUoYPglxzg&type=IMAGE
GET /favicon.ico

However, my current curl script only returns the following:

HTTP/1.1 200 OK
Cache-Control: no-cache, no-store, max-age=0, must-revalidate
Pragma: no-cache
Expires: Fri, 01 Jan 1990 00:00:00 GMT
Date: Thu, 14 Oct 2010 17:05:12 GMT
Content-Type: text/html; charset=UTF-8
X-Invoke-Duration: 8
X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN
X-XSS-Protection: 1; mode=block
Server: GSE
Transfer-Encoding: chunked

This is my script so far:

<?php
$url = "https://adwords.google.com/o/Targeting/Explorer?__c=1000000000&__u=1000000000&ideaRequestType=KEYWORD_STATS#search.none";
$useragent = "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.1) Gecko/20061204 Firefox/2.0.0.1";

//next open a new CURL session
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_USERAGENT, $useragent);
curl_setopt($ch, CURLOPT_HEADER, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_FRESH_CONNECT, 0);
curl_setopt($ch, CURLOPT_CAINFO, dirname(__FILE__)."/cacert.pem");
$source = curl_exec($ch);

//the source code is now stored in $source, let's close the curl session
curl_close($ch);
echo $source;
?>
AbraCadaver Posted October 14, 2010

The Live HTTP Headers that you are seeing are the client request headers sent to the server. To get the curl request headers:

//set this before curl_exec
curl_setopt($ch, CURLINFO_HEADER_OUT, true);

//after curl_exec, and before curl_close
$requests = curl_getinfo($ch, CURLINFO_HEADER_OUT);
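For readers following along, the suggestion above can be sketched as one helper that returns both the request headers curl sent and the response headers it received. This is only an illustrative sketch: the function names `fetch_with_headers` and `split_response` are hypothetical and not from the thread, and it assumes the PHP curl extension is loaded. It uses `CURLINFO_HEADER_SIZE` to cut the header block off the front of the raw result.

```php
<?php
// Hypothetical helper (name not from the thread): cut a raw curl result,
// fetched with CURLOPT_HEADER enabled, into its header block and its body.
function split_response(string $raw, int $headerSize): array
{
    return [
        'response_headers' => substr($raw, 0, $headerSize),
        'body'             => substr($raw, $headerSize),
    ];
}

// Hypothetical helper: perform ONE request and report what was sent and received.
function fetch_with_headers(string $url): array
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the result instead of printing it
    curl_setopt($ch, CURLOPT_HEADER, true);         // prepend response headers to the result
    curl_setopt($ch, CURLINFO_HEADER_OUT, true);    // record the outgoing request headers

    $raw = curl_exec($ch);
    if ($raw === false) {
        throw new RuntimeException('curl failed: ' . curl_error($ch));
    }

    // Read these while the handle is still open.
    $requestHeaders = curl_getinfo($ch, CURLINFO_HEADER_OUT);
    $headerSize     = curl_getinfo($ch, CURLINFO_HEADER_SIZE);

    $result = split_response($raw, $headerSize);
    $result['request_headers'] = $requestHeaders;
    return $result;
}
```

Calling `fetch_with_headers($url)` still issues a single request (plus any redirects curl is told to follow), so it reports the headers for that one exchange only, not for the images and scripts a browser would go on to load.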
sloth456 Posted October 15, 2010 (Author)

Thanks AbraCadaver! That seems to be part of the solution. I get this:

GET /o/Targeting/Explorer?__c=1000000000&__u=1000000000&ideaRequestType=KEYWORD_STATS#search.none HTTP/1.1
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.1) Gecko/20061204 Firefox/2.0.0.1
Host: adwords.google.com
Accept: */*
Cookie: S=adwords-optimization=-DwrCLnPsqsWKQcAK4nJyw; AdsUserLocale=en_US; PREF=ID=b2a448c2548e92a2:TM=1264175150:LM=1264175150:S=xxonXdFebwKBUsGH

How can I get the rest of the data as shown in my first post? I'm particularly interested in obtaining this line:

GET /o/Targeting/captchaData?token=AJtyWwbzrehZWrT8puwruvcvjxCHnRJWKqWvpWfw54W2AjmF2cAlFpYLcL13KE0hi7retjZclcFHqnd8yMC0vDfP4tRNXISZ1QfbdLBK_WVW1wqlzqrR2ouSBgclxUiU0RiHUXHWy1VrIlo5oq3xNeemD-dZHQr0ed1s6p-dBF9mvyvkct1BifkXPAD___W32qxemYPbi6qvgpec3-rgsesGVUoYPglxzg&type=IMAGE

Thanks for all your help so far, really appreciate it.
BlueSkyIS Posted October 15, 2010

To repeat what AbraCadaver said: the Live HTTP Headers that you are seeing are the client request headers sent to the server. Those headers are sent by Firefox; they don't come from the remote server. It seems as though you expect to use curl to get those headers from a server, and you can't, because those headers didn't come from the server.
btherl Posted October 15, 2010

The additional GET lines you see from Live HTTP Headers are additional requests made by the browser, based on the HTML. If you are using curl you won't be requesting those, because curl just does a single request (and follows 3xx redirects if configured to do so). Curl doesn't process the HTML and request additional items like images and JavaScript.

What you can try is parsing the HTML to find where that captcha request is generated, as long as it's not generated by JavaScript. And good luck trying to decode the captcha image.
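As a rough illustration of the "parse the HTML" idea above, the sketch below lists the sub-resource URLs that appear literally in a page's markup, which are the ones a browser would go on to request. The function name `list_subresources` and the choice of tags are assumptions made for this example, and it relies on PHP's DOM extension. It cannot see requests that JavaScript builds at run time, which is the caveat raised above.

```php
<?php
// Hypothetical helper (name not from the thread): collect URLs referenced by
// script, img, link, and iframe tags in an HTML string. Values are returned
// exactly as written in the markup; relative URLs are NOT resolved.
function list_subresources(string $html): array
{
    if ($html === '') {
        return []; // DOMDocument::loadHTML() rejects empty input
    }

    $doc = new DOMDocument();
    // Real-world HTML is often malformed; collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Tag name => attribute that carries the URL.
    $attrByTag = ['script' => 'src', 'img' => 'src', 'link' => 'href', 'iframe' => 'src'];

    $urls = [];
    foreach ($attrByTag as $tag => $attr) {
        foreach ($doc->getElementsByTagName($tag) as $node) {
            $value = $node->getAttribute($attr);
            if ($value !== '') {
                $urls[] = $value;
            }
        }
    }

    // Drop duplicates and reindex from zero.
    return array_values(array_unique($urls));
}
```

Each returned URL could then be fetched with its own curl request. If the captcha URL is not in the list, that suggests it is assembled by script after the page loads, and static parsing alone won't find it.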
sloth456 Posted October 15, 2010 (Author)

Thanks guys, I suspected it might not be possible to get what I want. I've tried looking in the HTML to see where the captcha token is generated, or whether there is a way of requesting one from a file on AdWords. No luck so far, but I'll keep trying.

As for decoding the captcha image, I'll probably just end up using decaptcha or something, or retrieve and enter it manually.