How to extract/download content from HTTPS page?


shoiab

Hello to all the members of this forum. I'm Shoiab, a novice PHP programmer. For my first job I have recently been assigned a project in which I have to extract/download the contents of a webpage (on my client's website) from an HTTPS page using cURL. In other words, I want to copy the exact same webpage to my localhost.

 

Let me tell you what I have done so far. I am able to download the web content from "www.virginholidays.co.uk". Here is the link to book a resort:

"http://www.virginholidays.co.uk/brochures/florida/holidays/orlando/kissimmee/champions_world_resort"

When I click the BOOK THE HOLIDAY button, it takes me to an HTTPS page (https://www.virginholidays.co.uk/book/start), which I am not able to download.

 

I'm using Windows XP, IE 5, PHP 5.2 and Fiddler.

 

Here is my code:

 

// Raw request headers. Note: $req1 is not used by the cURL code further down.
$req1  = "GET /book/start HTTP/1.0\r\n";
$req1 .= "Accept: */*\r\n";
$req1 .= "Accept-Encoding: gzip, deflate\r\n";
$req1 .= "Cookie: _#lc=#; 90225614_clogin=l=1259059733&v=1&e=1259062485781; "
       . "__utmc=262657675; CoreID6=60127103647212586967853; "
       . "__utma=262657675.233062282.1258696796.1259047752.1259059734.14; "
       . "__utmz=262657675.1258696796.1.1.utmccn=(direct)|utmcsr=(direct)|utmcmd=(none); "
       . "_#uid=1258696798931.315033071.3223127.1883.436744734.051; "
       . "_#srchist=11611%3A1%3A20091221055958; "
       . "_#sess=1%7C20091120062958%7C1; _#vdf=11611%7C1%7C20091221055958; "
       . "__utmb=262657675; "
       . "ASP.NET_SessionId=zpn5ftje1xxodv55f1h3yg45; cmTPSet=Y; "
       . "cookie_complete=Region%3DFlorida%26Resort%3D2018.OR; "
       . "_csoot=1259036845125; "
       . "RememberedSearch=GeographyArea=Florida&GeographyResort=329.OR&DepartureAirport=MAN"
       . "&DepartureDate=Fri 11 Dec 2009&Duration=7&AdultPax=2&ChildPax=0&InfantPax=0"
       . "&ChildAge1=&ChildAge2=&ChildAge3=&ChildAge4=&ChildAge5=&ChildAge6=&ChildAge7=&ChildAge8="
       . "&SearchType=complete; _csuid=X47174a9c82f607; "
       . "cmRS=t3=1259060790328&pi=Hotel%20Options%20-%20Atop\r\n";
$req1 .= "User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; "
       . "InfoPath.2; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729)\r\n";
// The Host header carries only the host name, without the scheme.
$req1 .= "Host: www.virginholidays.co.uk\r\n";
$req1 .= "Connection: Keep-Alive\r\n";
// A blank line (\r\n\r\n) terminates the header section.
$req1 .= "Accept-Language: en-us\r\n\r\n";

 

$header = array();
$header[0]  = "Accept: text/xml,application/xml,application/xhtml+xml,application/json,";
$header[0] .= "text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5";
$header[] = "Cache-Control: public";
$header[] = "Connection: keep-alive";
$header[] = "Keep-Alive: 300";
$header[] = "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7";
$header[] = "Accept-Language: en-us,en;q=0.5";
$header[] = "Pragma: "; // browsers keep this blank.

$cookie = "#lc=#; 90225614_clogin=l=1259059733&v=1&e=1259062485781; "
        . "__utmc=262657675; "
        . "CoreID6=60127103647212586967853; "
        . "__utma=262657675.233062282.1258696796.1259047752.1259059734.14; "
        . "__utmz=262657675.1258696796.1.1.utmccn=(direct)|utmcsr=(direct)|utmcmd=(none); "
        . "_#uid=1258696798931.315033071.3223127.1883.436744734.051; "
        . "_#srchist=11611%3A1%3A20091221055958; "
        . "_#sess=1%7C20091120062958%7C1; _#vdf=11611%7C1%7C20091221055958; "
        . "__utmb=262657675; "
        . "ASP.NET_SessionId=zpn5ftje1xxodv55f1h3yg45; cmTPSet=Y; "
        . "cookie_complete=Region%3DFlorida%26Resort%3D2018.OR; "
        . "_csoot=1259036845125; "
        . "RememberedSearch=GeographyArea=Florida&GeographyResort=329.OR&DepartureAirport=MAN"
        . "&DepartureDate=Fri 11 Dec 2009&Duration=7&AdultPax=2&ChildPax=0&InfantPax=0"
        . "&ChildAge1=&ChildAge2=&ChildAge3=&ChildAge4=&ChildAge5=&ChildAge6=&ChildAge7=&ChildAge8="
        . "&SearchType=complete; _csuid=X47174a9c82f607; "
        . "cmRS=t3=1259060790328&pi=Hotel%20Options%20-%20Atop";

 

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "https://www.virginholidays.co.uk/book/start");
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 0);     // do not follow redirects
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);     // skip certificate verification
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, FALSE); // skip the host name check
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);     // return the response as a string
curl_setopt($ch, CURLOPT_HTTPHEADER, $header);
curl_setopt($ch, CURLOPT_COOKIESESSION, TRUE);   // treat this as a new cookie session
curl_setopt($ch, CURLOPT_POST, 0);               // plain GET request
curl_setopt($ch, CURLOPT_HEADER, 1);             // include response headers in the output
curl_setopt($ch, CURLOPT_ENCODING, 'gzip,deflate');
curl_setopt($ch, CURLOPT_COOKIE, $cookie);
$response1 = curl_exec($ch);
curl_close($ch);
echo $response1;

 

// Rewrite root-relative paths so that assets load from the original site.
// The first call reads $response1, which holds the cURL result.
$response = str_replace("/_assets/", "http://www.virginholidays.co.uk/_assets/", $response1);
$response = str_replace("/brochures/", "http://www.virginholidays.co.uk/brochures/", $response);
$response = str_replace("/dynamichtag.aspx", "http://www.virginholidays.co.uk/dynamichtag.aspx", $response);
echo $response;
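A side note on the path rewriting above: a plain str_replace on "/brochures/" also matches inside links that are already absolute, which would produce a doubled host name. The following is only a sketch of one possible alternative, not part of the original code. The helper name absolutize_paths is hypothetical, and it assumes the paths appear in quoted href, src or action attributes and that the origin is a plain URL without "$" or backslash characters.

```php
<?php
// Hypothetical helper (not from the original post): prefix root-relative
// paths in href/src/action attributes with the site origin.
function absolutize_paths($html, $origin, array $prefixes)
{
    // Quote each prefix so it is matched literally inside the pattern.
    $quoted = array();
    foreach ($prefixes as $prefix) {
        $quoted[] = preg_quote($prefix, '~');
    }
    // Match attr="/prefix or attr='/prefix only at the start of the value,
    // so URLs that are already absolute are left untouched.
    $pattern = '~\b(href|src|action)=(["\'])(' . implode('|', $quoted) . ')~i';
    return preg_replace($pattern, '${1}=${2}' . $origin . '${3}', $html);
}

// Sample markup: one relative asset path and one link that is already absolute.
$sample = '<img src="/_assets/logo.png">'
        . '<a href="http://www.virginholidays.co.uk/brochures/florida">Florida</a>';

echo absolutize_paths(
    $sample,
    'http://www.virginholidays.co.uk',
    array('/_assets/', '/brochures/', '/dynamichtag.aspx')
);
```

In this sample only the img path is rewritten; the second link already starts with the host name and is not changed.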

 

Could you please help me download the content of the HTTPS webpage? I'm not sure what the issue is. Has the cookie or session expired? Or do I need to write different code?

 

Please help,

Thanks in advance.
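One way to narrow down the question above (expired session versus a transport problem) is to look at what cURL itself reports instead of only echoing the response. The sketch below is an assumption-laden illustration, not a confirmed fix: the function names diagnose_transfer and fetch_with_diagnosis are hypothetical, the cookie jar file name is a placeholder, and certificate verification is left at its default so that SSL problems show up as errors.

```php
<?php
// Hypothetical helper: turn cURL's error number, error text and info array
// into a one-line, human-readable diagnosis.
function diagnose_transfer($errno, $error, array $info)
{
    if ($errno !== 0) {
        return "cURL failed ($errno): $error";
    }
    $code = isset($info['http_code']) ? (int) $info['http_code'] : 0;
    if ($code >= 300 && $code < 400) {
        // A redirect here may mean the server did not accept the session cookies.
        $target = isset($info['redirect_url']) ? $info['redirect_url'] : '(unknown)';
        return "HTTP $code redirect to $target";
    }
    if ($code >= 400) {
        return "HTTP $code error from server";
    }
    return "HTTP $code OK";
}

// Hypothetical wrapper: fetch a URL, storing cookies in a jar file so that
// fresh session cookies are used instead of values copied by hand.
function fetch_with_diagnosis($url, $cookieJar)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return body as a string
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false); // report redirects instead of following them
    curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieJar);  // write cookies here when the handle closes
    curl_setopt($ch, CURLOPT_COOKIEFILE, $cookieJar); // read cookies from here
    curl_setopt($ch, CURLOPT_ENCODING, '');          // '' = accept all supported encodings
    $body = curl_exec($ch);
    $report = diagnose_transfer(curl_errno($ch), curl_error($ch), curl_getinfo($ch));
    curl_close($ch);
    return array($body, $report);
}
```

A possible usage would be to call fetch_with_diagnosis first on the brochure page and then on https://www.virginholidays.co.uk/book/start with the same jar file (for example 'cookies.txt'), and to print the second element of each result to see whether the HTTPS request fails, redirects or succeeds.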

Hi Shoiab,

I don't think you can pass the values over HTTPS, because these values will be encrypted while being forwarded.

[pre]<form name="aspnetForm" method="post" action="default.aspx" onsubmit="javascript:return WebForm_OnSubmit();" id="aspnetForm">
<div>
<input type="hidden" name="__EVENTTARGET" id="__EVENTTARGET" value="" />
<input type="hidden" name="__EVENTARGUMENT" id="__EVENTARGUMENT" value="" />
<input type="hidden" name="__LASTFOCUS" id="__LASTFOCUS" value="" />
<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="/wEPDwUKMTEyMjI5MDMwOQ9kFgJ....................[/pre]

 

If you check the source code, you will come to know that
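The form quoted above carries its state in hidden inputs such as __VIEWSTATE. As an illustration only, the sketch below reads such hidden fields out of an HTML string with PHP's DOM extension. The helper name extract_hidden_fields and the sample values are hypothetical, and whether posting these fields back is enough for this particular site is an assumption that has not been verified here.

```php
<?php
// Hypothetical helper: collect name => value pairs of all hidden inputs.
function extract_hidden_fields($html)
{
    $doc = new DOMDocument();
    // Suppress parser warnings caused by imperfect real-world markup.
    @$doc->loadHTML($html);

    $fields = array();
    foreach ($doc->getElementsByTagName('input') as $input) {
        $isHidden = strtolower($input->getAttribute('type')) === 'hidden';
        $name = $input->getAttribute('name');
        if ($isHidden && $name !== '') {
            $fields[$name] = $input->getAttribute('value');
        }
    }
    return $fields;
}

// Shortened sample form with made-up values, modelled on the snippet above.
$html = '<form method="post" action="default.aspx">'
      . '<input type="hidden" name="__EVENTTARGET" value="" />'
      . '<input type="hidden" name="__VIEWSTATE" value="abc123" />'
      . '<input type="text" name="q" value="x" />'
      . '</form>';

print_r(extract_hidden_fields($html));
```

The resulting array could then be encoded with http_build_query and sent through CURLOPT_POSTFIELDS if the page expects a POST; that step is left out because the required fields for this site are not known from the thread.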
