

Hi all, I'm new here (first post) so hopefully you'll all go easy on me :)

 

I'm creating a price comparison website for cameras and lenses. I will have around fifteen retailers on the site. Now I need to scrape each of their websites for the prices - this I can do. However, I need a little help and advice on the best way to tackle this... is it best to create one big script or have separate ones for each retailer, etc.?

 

Any help would be appreciated greatly :)


I would probably give each retailer its own table, and do a join across the retailers.

 

retailer1 id|item_number|title|description|price|url|timestamp

retailer2 id|item_number|title|description|price|url|timestamp

retailer3 id|item_number|title|description|price|url|timestamp

retailer4 id|item_number|title|description|price|url|timestamp

retailer5 id|item_number|title|description|price|url|timestamp

and so on

 

The reasons being: if a retailer left, or you added new ones, you only drop or add one table; displaying a single retailer is faster; it takes fewer resources to search one retailer's table than all of them; and when crawling their sites you can check whether an item already exists in just that one table rather than all of them - if it does, update it, else insert a new row.
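Something like this is roughly what I mean - just a sketch, assuming MySQL via PDO, with placeholder connection details and the same columns as the layout above (the unique key on item_number is my own addition so re-crawls can spot existing items):

<?php
// Sketch: one table per retailer, matching the columns listed above.
// Connection details are placeholders - swap in your own host/db/user/password.
$pdo = new PDO('mysql:host=localhost;dbname=pricecompare;charset=utf8', 'user', 'password');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$retailers = array('retailer1', 'retailer2', 'retailer3'); // ...and so on, up to 15

foreach ($retailers as $retailer) {
    $pdo->exec("
        CREATE TABLE IF NOT EXISTS `$retailer` (
            id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
            item_number VARCHAR(64) NOT NULL,
            title VARCHAR(255) NOT NULL,
            description TEXT,
            price DECIMAL(10,2) NOT NULL,
            url VARCHAR(255) NOT NULL,
            `timestamp` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
            UNIQUE KEY item_number (item_number)
        )
    ");
}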

 

As for the one big script or individual ones, I'd do one scraper for all; you can list all 15 retailers in a text file or in an array.
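For example (the retailer names and domains here are made up, just to show the idea):

<?php
// Sketch: the list of retailers that one scraper loops over.
// Domains below are placeholders, not real retailer URLs.
$retailers = array(
    'retailer1' => 'http://www.example-camera-shop.com',
    'retailer2' => 'http://www.example-lens-store.co.uk',
    // ... up to 15 entries
);

// Or keep them in a plain text file, one URL per line:
// $retailers = file('retailers.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);

foreach ($retailers as $table => $baseUrl) {
    // scrapeRetailer() is a hypothetical function standing in for the curl/regex work.
    // scrapeRetailer($table, $baseUrl);
}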

 

For the initial crawls you can scrape the entire sites if you want, then run one that just checks for new content, and eventually all their items would get in there... while still picking up their new content as they add it.
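The "if it already exists, update, else insert" check I mentioned can be done in a single query if item_number has a unique key - a sketch, assuming MySQL's INSERT ... ON DUPLICATE KEY UPDATE and a $pdo connection like the one above:

<?php
// Sketch: upsert one scraped item into that retailer's table.
// Assumes item_number has a UNIQUE key and $pdo is an existing PDO connection.
function saveItem(PDO $pdo, $table, array $item)
{
    $sql = "INSERT INTO `$table` (item_number, title, description, price, url)
            VALUES (:item_number, :title, :description, :price, :url)
            ON DUPLICATE KEY UPDATE
                title = VALUES(title),
                description = VALUES(description),
                price = VALUES(price),
                url = VALUES(url)";

    $stmt = $pdo->prepare($sql);
    $stmt->execute(array(
        ':item_number' => $item['item_number'],
        ':title'       => $item['title'],
        ':description' => $item['description'],
        ':price'       => $item['price'],
        ':url'         => $item['url'],
    ));
}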

 

Ordinarily scrapers just look for URLs, then use cURL to grab all the related information on a page and regex to find the specific content. For this I would say do the single scraper, because that way you could easily expand to more retailers by adding a new domain to the list and creating a new table.
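Here's the rough shape of that cURL-plus-regex step - a sketch only; the pattern is invented and each retailer's real markup would need its own:

<?php
// Sketch: fetch a product page with cURL and pull the price out with a regex.
// The pattern below is an example only - tailor it to each site's actual HTML.
function fetchPage($url)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_USERAGENT, 'PriceBot/1.0');
    curl_setopt($ch, CURLOPT_TIMEOUT, 20);
    $html = curl_exec($ch);
    curl_close($ch);
    return $html;
}

$html = fetchPage('http://www.example-camera-shop.com/some-lens'); // placeholder URL

// Example: a span with class "price" containing something like £1,234.56
if (preg_match('/<span class="price">\s*£?([\d,]+\.\d{2})\s*<\/span>/i', $html, $m)) {
    $price = (float) str_replace(',', '', $m[1]);
    echo "Found price: $price\n";
}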

 

But if you are doing something like DOM parsing and have to specify the areas to grab content from, I guess each one would have to be a unique crawler.
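If you do go the DOM route, PHP's DOMDocument and DOMXPath let you target specific areas - again a sketch, with a made-up XPath query; each retailer would need its own, which is why these tend to end up as separate crawlers:

<?php
// Sketch: parse a fetched page with DOMDocument and pick out the price via XPath.
// The XPath query is an example - each retailer's markup needs its own.
$html = file_get_contents('http://www.example-camera-shop.com/some-lens'); // or the cURL fetch above

$dom = new DOMDocument();
libxml_use_internal_errors(true);   // real-world HTML is rarely valid, so silence the warnings
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$nodes = $xpath->query('//span[contains(@class, "price")]');

if ($nodes->length > 0) {
    $price = trim($nodes->item(0)->nodeValue);
    echo "Found price: $price\n";
}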

 

To me, whatever works better in accuracy and efficiency is the way to go.

If I were doing this, I would make an array containing the URLs for the product pages you want to scrape.


What have you done so far?

From previous experience I can say it makes more sense to have individual harvesting scripts. The logic being that not all of the websites being scanned will update at the same time, and you don't want to waste server resources re-reading all 15 sites if only one has updated.
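One cheap way for an individual script to skip a site that hasn't changed is to do a HEAD request first - a sketch, assuming the site sends a usable Last-Modified header, which plenty of dynamic sites don't:

<?php
// Sketch: HEAD-check a page's Last-Modified time before doing a full re-scrape.
// Only helps if the site actually sends the header; fall back to scraping if it doesn't.
function pageModifiedSince($url, $lastCrawlTimestamp)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_NOBODY, true);           // HEAD request only
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FILETIME, true);
    curl_exec($ch);
    $modified = curl_getinfo($ch, CURLINFO_FILETIME); // -1 if the header is missing
    curl_close($ch);

    if ($modified === -1) {
        return true; // no header - assume it changed and scrape anyway
    }
    return $modified > $lastCrawlTimestamp;
}

if (pageModifiedSince('http://www.example-camera-shop.com/lenses', strtotime('-1 day'))) {
    // run the full scrape for this retailer
}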

 

Massive help - thank you all for your answers and advice. Unfortunately I'll be scraping from the DOM for each site - very few have feeds, which is a bit of a pain in the bum! So I think I'll be getting close to my good old friend regex :) Thanks again guys/gals, this is a great forum.
