beepingKeyboard Posted February 12, 2013
Hello everyone,
I would like some direction: I want to start a project and I'm not even sure I'm headed the right way.
There is a local news site from which I would like to "scrape" various news items. I already talked with their webmaster, and he said it's good to go.
I believe (please correct me) that a good tool for the job would be PHP and cURL. What about using PHP Simple HTML DOM Parser? I ask because I'm just not sure where to head. I'm a n00b at this, so diving into this project means many hours before I even know whether what I'm doing will work.
So, that's the general direction. Should I use PHP and cURL? (a reference doc I found here)
Also, I don't know how this works, but I would like to "scrape" the page 4-5 times per day (at pre-set times) and then save the info on my server. That way, when a user visits my website, I serve the scraped information from my site (as opposed to re-scraping from the original site?).
Any thoughts on this project?
Thank you very much everyone!
Link: https://forums.phpfreaks.com/topic/274375-scraping-with-php-and-curl/
Jessica Posted February 12, 2013
I have recently used the Simple HTML DOM Parser and found it to be VERY useful.
beepingKeyboard (Author) Posted February 12, 2013
Jessica said: "I have recently used the Simple HTML DOM Parser and found it to be VERY useful."
Thanks for the reply! I'm reading the documentation right now (and trying to make sense of it).
Question: with the Simple HTML DOM Parser, will the data be scraped and saved to my website's webserver? Or does the program do the "scraping" every time a visitor requests a page on my site?
Thank you
Jessica Posted February 12, 2013
The parser just helps you *parse* the HTML. What you do with it after that is up to you.
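To make "parse" concrete, here is a minimal sketch of pulling headlines out of an HTML string. It deliberately uses PHP's built-in DOMDocument and DOMXPath rather than the Simple HTML DOM Parser, so it runs without any third-party file. The function name extractHeadlines, the h2 class "headline", and the sample markup are all assumptions for illustration; the real news site's markup will differ.

```php
<?php
// Hypothetical helper: returns the text of every <h2 class="headline"> element.
function extractHeadlines(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parse warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $headlines = [];
    // Assumed markup: each headline sits in <h2 class="headline">.
    foreach ($xpath->query('//h2[@class="headline"]') as $node) {
        $headlines[] = trim($node->textContent);
    }
    return $headlines;
}

// Made-up sample page, standing in for the HTML fetched from the news site.
$sampleHtml = '<html><body>'
    . '<h2 class="headline">Town hall reopens</h2><p>Body text</p>'
    . '<h2 class="headline">Road works begin</h2>'
    . '</body></html>';

$headlines = extractHeadlines($sampleHtml);
print_r($headlines);
```

The result is just a PHP array held in memory. Nothing is saved anywhere until the script writes it to a file or a database.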
tibberous Posted February 12, 2013
You actually need some pretty advanced knowledge to do this. First, getting a PHP script to run at set intervals takes a cron daemon (typically; I know there are other ways).
You probably don't need cURL. Unless you're doing stuff like spoofing cookies and POST variable requests, you can just do: $file = file_get_contents("page.php?p=1&section=whatever");
The biggest thing is probably going to be regex, though. Btw, you might want to see if the news site has an RSS feed; that would be easier to parse.
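A minimal sketch of the file_get_contents() approach, with error handling added. The helper name fetchPage is an assumption for illustration. Note that a relative path such as "page.php?p=1&section=whatever" is treated as a local file; to fetch a remote page you would pass the full http:// or https:// URL, which only works when allow_url_fopen is enabled in php.ini. The demo below reads a temporary local file so that it runs without network access.

```php
<?php
// Hypothetical helper: fetch a page body as a string, or throw on failure.
function fetchPage(string $url): string
{
    // file_get_contents() reads local paths and, when allow_url_fopen is on, http(s) URLs.
    // The @ suppresses the warning so the failure can be reported as an exception instead.
    $body = @file_get_contents($url);
    if ($body === false) {
        throw new RuntimeException("Could not fetch $url");
    }
    return $body;
}

// Demonstrated with a local temporary file standing in for the remote page.
$tmp = tempnam(sys_get_temp_dir(), 'page');
file_put_contents($tmp, '<html><body><h1>Local news</h1></body></html>');

$html = fetchPage($tmp);
echo strlen($html), " bytes fetched\n";

unlink($tmp);
```

Run from a cron job, a script like this would fetch the page at each scheduled time, and the returned string could then be handed to a parser.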
beepingKeyboard (Author) Posted February 12, 2013
Jessica said: "The parser just helps you *parse* the HTML. What you do with it after that is up to you."
Thanks... I guess my next question is "what" happens to that data. Does it go into a file (.txt?)... can it go into a .txt if I want it to? Or does it stay "in memory"?
tibberous said: "You actually need some pretty advanced knowledge to do this. First, getting a PHP script to run at set intervals takes a cron daemon (typically; I know there are other ways). You probably don't need cURL. Unless you're doing stuff like spoofing cookies and POST variable requests, you can just do: $file = file_get_contents("page.php?p=1&section=whatever"); The biggest thing is probably going to be regex, though. Btw, you might want to see if the news site has an RSS feed; that would be easier to parse."
Thanks for the reply! The cron job I can do fine (thankfully!), both scheduling it and directing it to run a script at a given interval (or hour).
Question: the $file = file_get_contents("page.php?p=1&section=whatever"); example... is that plain PHP, or part of the HTML DOM Parser?
I already checked for RSS on the website, but they don't have any feed running.
Thanks everyone! I will work on this today and keep you posted.
Jessica Posted February 12, 2013
beepingKeyboard said: "Thanks... I guess my next question is "what" happens to that data."
Whatever you do with it.
$a = 'Bob';
What happens to $a? Nothing if you don't store it somewhere.
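As a minimal sketch of "storing it somewhere", the snippet below writes $a to a text file and reads it back. The file name scraped_example.txt and its location in the system temp directory are assumptions for illustration; a real script would use a path of its own choosing.

```php
<?php
$a = 'Bob';

// Hypothetical file location in the system temp directory.
$path = sys_get_temp_dir() . '/scraped_example.txt';

// Write the value to disk; LOCK_EX guards against two runs writing at once.
if (file_put_contents($path, $a, LOCK_EX) === false) {
    throw new RuntimeException("Could not write $path");
}

// Later (for example when a visitor requests a page) read the saved copy
// instead of scraping the original site again.
$saved = file_get_contents($path);
echo $saved, "\n";
```

Until file_put_contents() runs, the value exists only in memory and disappears when the script ends.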
beepingKeyboard (Author) Posted February 12, 2013
Jessica,
Thanks for the help, but I'm still... at a loss. Maybe I haven't expressed how much of a n00b I am at this. Are there any links you would suggest I read?
In your example, how could I store $a in a text file? Or... do I even want to save it to a text file?
I'm still checking out the HTML DOM Parser documentation. Tonight when I get home I will try some examples to see whether I am successful. I will update on how it goes.
Thanks!
Jessica Posted February 12, 2013
You probably want to save the data you get from the other site into a database. There are lots of basic MySQL tutorials you can use. Look into using mysqli or PDO in PHP.
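A minimal sketch of the PDO route, under stated assumptions: an in-memory SQLite database stands in for MySQL so that the example runs without a database server (it needs the pdo_sqlite extension), and the table name news_items, its columns, and the sample row are all made up for illustration. With MySQL, the connection line would use a DSN of the form 'mysql:host=...;dbname=...' plus a user name and password, and the CREATE TABLE syntax would differ slightly.

```php
<?php
// Assumption: in-memory SQLite stands in for MySQL to keep the sketch self-contained.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Hypothetical table for scraped items.
$pdo->exec('CREATE TABLE news_items (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    url TEXT NOT NULL UNIQUE,
    scraped_at TEXT NOT NULL
)');

// Prepared statement: values are bound as parameters, never concatenated into the SQL.
$insert = $pdo->prepare(
    'INSERT INTO news_items (title, url, scraped_at) VALUES (:title, :url, :scraped_at)'
);
$insert->execute([
    ':title'      => 'Town hall reopens',
    ':url'        => 'http://example.com/news/1',
    ':scraped_at' => date('c'),
]);

// What the page shown to visitors would run: read stored rows, no scraping involved.
$rows = $pdo->query('SELECT title, url FROM news_items ORDER BY id DESC')
            ->fetchAll(PDO::FETCH_ASSOC);
print_r($rows);
```

In this arrangement the cron-driven script does the inserts a few times per day, and the visitor-facing page only runs the SELECT.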