divadiva Posted July 28, 2008

All,

I am trying to scrape the Intel website, which runs fine in IE. But whenever I open their website in Firefox, it gives me a security certificate warning, and I have to accept it in order to view the website. My question is: how can I ignore this warning so that my scraper can easily parse the data from Intel?

Thanks in advance for replying. Cheers!!
DarkWater Posted July 28, 2008

The server doesn't run Firefox when accessing the data, so I don't understand the question.
divadiva (Author) Posted July 28, 2008

I want to load a web page, but whenever I call loadHTML it doesn't do anything. When I open this website in Firefox, it gives me a security certificate warning. How can I ignore this security certificate so that I can load the HTML?
DarkWater Posted July 28, 2008

What is "loadHtml"? >_> How are you accessing the page?
divadiva (Author) Posted July 28, 2008

By cURL. Here is my code:

<?php
include("mainpage.php");
include_once("database.php");
set_time_limit(0);

parseData();

function geCurlContent($link)
{
    // initialize curl
    $ch = curl_init();
    // set the url to fetch
    curl_setopt($ch, CURLOPT_URL, $link);
    // don't give me the headers, just the content
    curl_setopt($ch, CURLOPT_HEADER, 0);
    // return the value instead of printing the response to the browser
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    // write data to variable
    $content = curl_exec($ch);
    // close curl
    curl_close($ch);
    return $content;
}

function parseData()
{
    // update old data before parsing starts
    handlOldData('INTEL');
    $skip = 0;
    $link = "https://resale.intel.com/Inventory.aspx?sid=3";
    $totalData = parseEachPageData($link, 1);
    for ($skip = 10; $skip <= $totalData * 11; $skip = $skip + 10) {
        parseEachPageData("https://resale.intel.com/Inventory.aspx?sid=3");
    }
}

function parseEachPageData($link, $flag)
{
    $maincontent = geCurlContent($link);
    // we get the expected page; now we need to parse it to save the actual data in the db
    /*
     * substring matching of specific table content
     */
    $content = substr($maincontent, strpos($maincontent, "<TABLE CELLSPACING=\"0\" ALIGN=\"center\" BORDER=\"0\" CLASS=\"agres\" WIDTH=\"100%\">"));
    $content = substr($content, 0, strpos($content, "</TABLE>")) . '</TABLE>';
    $content = str_replace("&", "&amp;", $content);
    // now the table portion is passed to DOMDocument to parse each column
    $dom = new DOMDocument(); // DOM document object
    $dom->loadHTML($content); // load the tbody part
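For reference, an empty result from `curl_exec` followed by `loadHTML` doing nothing is what one would expect if cURL rejects the server's certificate. The thread does not confirm that this is the cause, so the following is only a minimal sketch under that assumption. The helper names `buildCurlOptions` and `fetchPage` and the `$caBundle` path are hypothetical; `CURLOPT_SSL_VERIFYPEER`, `CURLOPT_SSL_VERIFYHOST`, and `CURLOPT_CAINFO` are standard PHP cURL options. Turning verification off removes the protection against man-in-the-middle attacks, so pointing cURL at a CA bundle is usually the safer of the two branches.

```php
<?php
/**
 * Build the cURL options for fetching an HTTPS page.
 * Hypothetical helper, not part of the original post.
 *
 * $caBundle is an assumed path to a CA bundle file; pass null to
 * skip certificate checks entirely (insecure).
 */
function buildCurlOptions($url, $caBundle = null)
{
    $options = array(
        CURLOPT_URL            => $url,
        CURLOPT_HEADER         => false, // body only, no response headers
        CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
    );

    if ($caBundle !== null) {
        // Preferred: keep verification on and tell cURL which CAs to trust.
        $options[CURLOPT_SSL_VERIFYPEER] = true;
        $options[CURLOPT_SSL_VERIFYHOST] = 2;
        $options[CURLOPT_CAINFO]         = $caBundle;
    } else {
        // Insecure: accept any certificate, much like clicking through
        // the browser warning.
        $options[CURLOPT_SSL_VERIFYPEER] = false;
        $options[CURLOPT_SSL_VERIFYHOST] = 0;
    }

    return $options;
}

/**
 * Fetch a page and surface cURL's own error message on failure,
 * rather than handing an empty value to DOMDocument::loadHTML().
 */
function fetchPage($url, $caBundle = null)
{
    $ch = curl_init();
    curl_setopt_array($ch, buildCurlOptions($url, $caBundle));

    $content = curl_exec($ch);
    if ($content === false) {
        // curl_error() returns the reason, e.g. a certificate problem.
        throw new RuntimeException("cURL request failed: " . curl_error($ch));
    }

    // The handle is released when $ch goes out of scope.
    return $content;
}
```

Calling `fetchPage($link)` in `parseEachPageData` in place of `geCurlContent($link)` would show, through the exception message, whether the certificate is really what stops the page from loading.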
This topic is now archived and is closed to further replies.