How to ignore security certificate when parsing a website?

divadiva · July 28, 2008

All,

I am trying to scrape the Intel website, which runs fine in IE. But whenever I open their website in Firefox, it gives me a security certificate warning. I have to accept it in order to view the website in Firefox.

My question is: how can I ignore this warning so that I can easily parse the data from Intel in my scraper?

Thanks in advance for replying.

Cheers!!

DarkWater · July 28, 2008

The server doesn't run Firefox when accessing the data, so I don't understand the question.

divadiva · July 28, 2008

I want to load a webpage, but whenever I call loadHTML it doesn't do anything. When I open this website in Firefox it gives me a security certificate warning. How can I ignore this security certificate so that I can load the HTML?

DarkWater · July 28, 2008

What is "loadHtml"? >_> How are you accessing the page?

divadiva · July 28, 2008

By cURL. Here's my code:

<?php
include("mainpage.php");
include_once("database.php");		
set_time_limit(0); 


parseData();

function geCurlContent($link)		
{
// initialize curl
$ch = curl_init(); 
// set the url to fetch
curl_setopt($ch, CURLOPT_URL, $link); 
// don't give me the headers just the content
curl_setopt($ch, CURLOPT_HEADER, 0); 
// return the value instead of printing the response to browser
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); 
// write data to variable
$content = curl_exec($ch); 
// close curl
curl_close($ch); 	

return $content;
}

function parseData()
{
//update old data before parsing started

handlOldData('INTEL');

$skip=0;
$link = "https://resale.intel.com/Inventory.aspx?sid=3"; 

$totalData = parseEachPageData($link, 1);

for($skip = 10; $skip <= $totalData * 11; $skip = $skip + 10)
{
	parseEachPageData($link, 0);	// both arguments are required; 0 for $flag on follow-up pages is an assumption
        
}
}

function parseEachPageData($link, $flag)
{				

$maincontent = geCurlContent($link);	//we get the expected page. now need parse to save actual data in db


/*
* substring matching of specific table content
*/
$content = substr($maincontent, strpos($maincontent, "<TABLE CELLSPACING=\"0\" ALIGN=\"center\" BORDER=\"0\" CLASS=\"agres\" WIDTH=\"100%\">"));

$content = substr($content, 0, strpos($content, "</TABLE>")).'</TABLE>';
$content = str_replace("&", "&amp;", $content);
        
//now the table portion will be passed to domdocument to parse each column
$dom = new DOMDocument();	//dom document object
$dom->loadHTML($content);	//load the tbody part
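
A note on the certificate question itself: the cURL handle in the code above leaves peer verification at its default (on), so if the site's certificate is not trusted by the local CA store, curl_exec() most likely returns false and loadHTML() gets nothing to parse. Below is a minimal sketch of how verification could be switched off, assuming that really is the cause. The helper names buildCurlOptions and getCurlContentInsecure are made up for illustration and are not part of the posted code. Turning verification off removes the protection against man-in-the-middle attacks, so pointing CURLOPT_CAINFO at a CA bundle that contains the issuing certificate is the safer route where possible.

```php
<?php
// Hypothetical helper (not from the original post): builds the cURL options
// for an HTTPS fetch. Passing $verify = false disables certificate checks.
function buildCurlOptions($url, $verify = true, $caBundle = null)
{
    $options = array(
        CURLOPT_URL            => $url,
        CURLOPT_HEADER         => false,   // body only, no response headers
        CURLOPT_RETURNTRANSFER => true,    // return the body instead of printing it
        CURLOPT_SSL_VERIFYPEER => $verify, // check the certificate chain
        CURLOPT_SSL_VERIFYHOST => $verify ? 2 : 0, // check the host name in the certificate
    );
    if ($verify && $caBundle !== null) {
        // Safer alternative: keep verification on and supply a CA bundle file.
        $options[CURLOPT_CAINFO] = $caBundle;
    }
    return $options;
}

// Hypothetical variant of the fetch function with verification disabled.
function getCurlContentInsecure($link)
{
    $ch = curl_init();
    curl_setopt_array($ch, buildCurlOptions($link, false));
    $content = curl_exec($ch);
    if ($content === false) {
        // Surface the reason instead of silently passing false to the parser.
        throw new RuntimeException("cURL request failed: " . curl_error($ch));
    }
    // The handle is released when $ch goes out of scope.
    return $content;
}
```

Checking curl_error() after a failed curl_exec() is worth doing before disabling anything, since it reports whether the failure is actually certificate-related.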
