High frequency database queue, suggestions?


topcat

I'm using a MySQL database as a queue for pages that have been downloaded by a PHP-based crawler. Sets of pages will be added several times per second, and pages will be read by a separate PHP indexer running as a daemon, also several times per second. This would obviously lead to clashes. I guess the obvious solution is to use transactions in the web crawler, although this would cause delays for the indexer.
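
To make the clash problem concrete, here is a rough sketch of one way the two processes could share a queue table without stepping on each other: the indexer marks a batch of rows with a claim token inside a short transaction, then works on them outside it. This is only an illustration under stated assumptions. It uses Python's built-in `sqlite3` as a stand-in for MySQL so that it runs anywhere, and the table name `page_queue`, the `claimed_by` column and all function names are made up for the example.

```python
import sqlite3
import uuid
from contextlib import contextmanager


@contextmanager
def transaction(conn):
    """Run the enclosed statements as one short write transaction."""
    conn.execute("BEGIN IMMEDIATE")
    try:
        yield
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise


def open_queue(path=":memory:"):
    # isolation_level=None: no implicit transactions, we issue BEGIN ourselves.
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS page_queue ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " url TEXT NOT NULL,"
        " body TEXT NOT NULL,"
        " claimed_by TEXT)"  # hypothetical column: NULL means unclaimed
    )
    return conn


def enqueue_pages(conn, pages):
    """Crawler side: insert a whole set of (url, body) pairs in one transaction."""
    with transaction(conn):
        conn.executemany(
            "INSERT INTO page_queue (url, body) VALUES (?, ?)", pages
        )


def claim_batch(conn, limit=100):
    """Indexer side: tag up to `limit` unclaimed rows, then read them back."""
    token = uuid.uuid4().hex
    with transaction(conn):
        conn.execute(
            "UPDATE page_queue SET claimed_by = ? WHERE id IN ("
            " SELECT id FROM page_queue WHERE claimed_by IS NULL"
            " ORDER BY id LIMIT ?)",
            (token, limit),
        )
        rows = conn.execute(
            "SELECT id, url, body FROM page_queue"
            " WHERE claimed_by = ? ORDER BY id",
            (token,),
        ).fetchall()
    return rows


def delete_batch(conn, rows):
    """Indexer side: remove rows once they have been indexed."""
    with transaction(conn):
        conn.executemany(
            "DELETE FROM page_queue WHERE id = ?", [(row[0],) for row in rows]
        )
```

The idea is that the lock is only held while rows are being tagged, not while pages are being indexed, so the crawler's inserts should rarely have to wait. The `limit` argument also caps how many pages sit in memory at once. On MySQL the same pattern would need a transactional engine such as InnoDB, and the claim statement would have to be written differently, since this subquery form is SQLite-flavoured.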

The other issue is the number of database interactions the indexer would need and the performance lag this would create. I guess this could be reduced by pulling records out of the DB in groups, but then memory could be an issue. Another option would be to store the pages in flat files, but then I'm not sure how I could avoid problems with both applications trying to access the same file simultaneously.
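
For the flat-file option, one common way around the simultaneous-access problem is to never let the two programs touch the same file at the same time: the writer builds the file under a temporary name and renames it into place when finished, and the reader renames a finished file into its own directory before opening it. Below is a minimal sketch of that idea in Python. The directory layout, the `.page` and `.tmp` suffixes and the function names are assumptions for illustration, and it relies on the spool and work directories being on the same filesystem, where a rename is atomic on POSIX systems.

```python
import os
import tempfile


def write_page(spool_dir, name, data):
    """Crawler side: write to a temp file, then rename it into the spool."""
    os.makedirs(spool_dir, exist_ok=True)
    fd, tmp_path = tempfile.mkstemp(dir=spool_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
        # The finished file appears under its final name in one step.
        os.replace(tmp_path, os.path.join(spool_dir, name + ".page"))
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def claim_pages(spool_dir, work_dir, limit=100):
    """Indexer side: move up to `limit` finished files into a private dir."""
    os.makedirs(spool_dir, exist_ok=True)
    os.makedirs(work_dir, exist_ok=True)
    claimed = []
    for name in sorted(os.listdir(spool_dir)):
        if len(claimed) >= limit:
            break
        if not name.endswith(".page"):
            continue  # skip files the crawler is still writing
        source = os.path.join(spool_dir, name)
        target = os.path.join(work_dir, name)
        try:
            os.replace(source, target)
        except FileNotFoundError:
            continue  # another indexer process claimed it first
        claimed.append(target)
    return claimed
```

The indexer would then read and delete the files returned by `claim_pages` at its own pace. The `limit` argument plays the same role as a batch size on a database query, keeping the amount of work held at once bounded.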

Does anyone have any experience of handling a situation like this, or any ideas on how to go about it in an efficient way?

I've also been looking at queue software like beanstalkd, but I'm not sure how I would implement this.

Any suggestions welcome - cheers!!
