coxdabd, posted January 2, 2012

Hi all, I'm new here (first post), so hopefully you'll all go easy on me. I'm creating a price comparison website for cameras and lenses, and I will have around fifteen retailers on the site. Now I need to scrape each of their websites for the prices, which I can do. However, I need a little help and advice on the best way to tackle this: is it best to create one big script, or to have separate ones for each retailer? Any help would be greatly appreciated.
spiderwell, posted January 2, 2012

I guess it really depends on how the retailers offer their information. If they all have varied formats, it might be best to write a scraper for each one specifically, and then have a control script that runs through all of them.
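A minimal sketch of the "one scraper per retailer plus a control script" idea, in Python. The retailer names, the HTML snippets, and the regex patterns are all hypothetical; a real site would need its own pattern, and the pages would be fetched over HTTP rather than passed in as strings.

```python
import re
from typing import Callable, Dict, Optional


def parse_retailer_a(html: str) -> Optional[float]:
    # Hypothetical markup: <span class="price"> £1,299.00</span>
    match = re.search(r'<span class="price">\s*£([\d,]+\.\d{2})', html)
    return float(match.group(1).replace(",", "")) if match else None


def parse_retailer_b(html: str) -> Optional[float]:
    # Hypothetical markup: <div data-price="1249.5"></div>
    match = re.search(r'data-price="(\d+(?:\.\d+)?)"', html)
    return float(match.group(1)) if match else None


# The control script only needs this registry; adding a retailer means
# writing one parser function and adding one entry here.
PARSERS: Dict[str, Callable[[str], Optional[float]]] = {
    "retailer_a": parse_retailer_a,
    "retailer_b": parse_retailer_b,
}


def run_all(pages: Dict[str, str]) -> Dict[str, Optional[float]]:
    """Run every registered parser against its already-fetched page."""
    results: Dict[str, Optional[float]] = {}
    for name, parser in PARSERS.items():
        html = pages.get(name)
        # None means the page was missing or the price was not found.
        results[name] = parser(html) if html is not None else None
    return results
```

Keeping the site-specific logic in small functions means a layout change at one retailer only breaks that retailer's parser, while the control loop stays the same.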
QuickOldCar, posted January 2, 2012

I would probably give each retailer its own table, and do a join on the retailers.

retailer1 id|item_number|title|description|price|url|timestamp
retailer2 id|item_number|title|description|price|url|timestamp
retailer3 id|item_number|title|description|price|url|timestamp
retailer4 id|item_number|title|description|price|url|timestamp
retailer5 id|item_number|title|description|price|url|timestamp
and so on.

The reasons: it is easy to handle a retailer leaving or new ones being added; displaying just one retailer is faster; searching one retailer's table takes fewer resources than searching all of them; and when crawling a site you can check whether an item already exists within just one table rather than all of them, then update it if it does or insert it if it doesn't.

As for one big script or individual ones, I'd do one scraper for all. You can list all 15 retailers in a text file or in an array. For the initial crawls you can scrape entire sites if you want, then do later crawls that only check for new content. Eventually all of their items would get there, while you still pick up new content as they add it.

Ordinarily scrapers just look for URLs, then use curl to grab all the related information on a page and regex to find the specific content. For that approach I would go with the single scraper, because you could expand to more retailers easily by adding a new domain name to the list and creating a new table. But if you are doing something like DOM parsing and have to specify the areas to grab content from, I guess each one would have to be a unique crawler.

To me, whatever works better in accuracy and efficiency is the way to go.
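The "if it exists, update, else insert" check against a single retailer's table could look roughly like this. It is only a sketch: it uses Python's built-in sqlite3 module with an in-memory database as a stand-in for whatever database the site actually runs on, and the helper names `create_retailer_table` and `save_item` are made up for illustration. The columns follow the layout listed in the post.

```python
import sqlite3
import time


def create_retailer_table(conn: sqlite3.Connection, table: str) -> None:
    """Create one table per retailer with the columns from the post."""
    # Table names cannot be bound as SQL parameters, so validate them.
    if not table.isidentifier():
        raise ValueError(f"unsafe table name: {table!r}")
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {table} ("
        "id INTEGER PRIMARY KEY, item_number TEXT UNIQUE, title TEXT, "
        "description TEXT, price REAL, url TEXT, timestamp INTEGER)"
    )


def save_item(conn: sqlite3.Connection, table: str, item: dict) -> str:
    """Update the item if it is already in this retailer's table, else insert it."""
    if not table.isidentifier():
        raise ValueError(f"unsafe table name: {table!r}")
    now = int(time.time())
    row = conn.execute(
        f"SELECT id FROM {table} WHERE item_number = ?", (item["item_number"],)
    ).fetchone()
    if row is not None:
        conn.execute(
            f"UPDATE {table} SET title = ?, description = ?, price = ?, "
            "url = ?, timestamp = ? WHERE id = ?",
            (item["title"], item["description"], item["price"], item["url"], now, row[0]),
        )
        action = "updated"
    else:
        conn.execute(
            f"INSERT INTO {table} (item_number, title, description, price, url, timestamp) "
            "VALUES (?, ?, ?, ?, ?, ?)",
            (item["item_number"], item["title"], item["description"],
             item["price"], item["url"], now),
        )
        action = "inserted"
    conn.commit()
    return action
```

Because the lookup only touches one retailer's table, the existence check stays cheap even as other retailers' catalogues grow, which is the point made above.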
Zane, posted January 2, 2012

If I were doing this, I would make an array containing the URLs for product

What have you done so far?
Sogo7, posted January 2, 2012

From previous experience, I can say it makes more sense to have individual harvesting scripts. The logic is that not all of the websites being scanned will update at the same time, and you don't want to waste server resources re-reading all 15 sites if only one has updated.
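One way to avoid re-reading every site on every run is to give each retailer its own refresh interval and only run the scripts that are due. The sketch below assumes you keep a record of when each retailer was last scraped; the function name `due_retailers`, the retailer names, and the intervals are hypothetical.

```python
from typing import Dict, List


def due_retailers(
    intervals: Dict[str, int],
    last_run: Dict[str, float],
    now: float,
) -> List[str]:
    """Return the retailers whose refresh interval (in seconds) has elapsed."""
    due = []
    for name, interval in intervals.items():
        last = last_run.get(name)
        # A retailer that has never been scraped is always due.
        if last is None or now - last >= interval:
            due.append(name)
    return due
```

A scheduler (cron, for example) could call this every few minutes and launch only the individual harvesting scripts for the names it returns.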
coxdabd, posted January 3, 2012

Massive help, thank you all for your answers. Unfortunately I'll be scraping from the DOM for each site; very few have feeds, which is a bit of a pain in the bum! So I think I'll be getting close to my good old friend regex. Thanks again guys/gals, this is a great forum.