High frequency database queue, suggestions?

topcat · March 10, 2011

I'm using a MySQL database as a queue for pages that have been downloaded by a PHP-based crawler. Sets of pages will be added several times per second, and pages will be read by a separate PHP indexer, running as a daemon, also several times per second. This would obviously lead to clashes. I guess the obvious solution is to use transactions (TSQL) for the web crawler, although this would cause delays for the indexer.

The other issue is the number of database interactions the indexer would need and the performance lag this would create. I guess this could be reduced by pulling records out of the database in groups, but then memory could become an issue. Another option would be to store the pages in flat files, but then I'm not sure how I could stop both applications from trying to access the same file simultaneously.
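For the flat-file option, a common trick is to rely on atomic renames instead of file locks: the crawler writes to a temporary file and renames it into an `incoming/` directory when complete, and the indexer claims a file by renaming it into `processing/`. The directory layout and function names below are hypothetical, and the atomicity guarantee assumes both directories sit on the same POSIX filesystem.

```python
import os
import tempfile


def write_page(spool_dir, name, body):
    """Crawler side: write to a temp file, then rename it into incoming/.
    The rename is atomic on the same filesystem, so the indexer only ever
    sees complete files. `name` must be unique (e.g. a sequence number)."""
    incoming = os.path.join(spool_dir, "incoming")
    os.makedirs(incoming, exist_ok=True)
    # The temp file lives outside incoming/ so the indexer never lists it.
    fd, tmp_path = tempfile.mkstemp(dir=spool_dir, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(body)
    os.replace(tmp_path, os.path.join(incoming, name))


def claim_pages(spool_dir, limit=50):
    """Indexer side: claim up to `limit` files by renaming them into
    processing/. If two indexers race for the same file, only one rename
    succeeds; the other gets FileNotFoundError and skips it."""
    incoming = os.path.join(spool_dir, "incoming")
    processing = os.path.join(spool_dir, "processing")
    os.makedirs(incoming, exist_ok=True)
    os.makedirs(processing, exist_ok=True)
    claimed = []
    for name in sorted(os.listdir(incoming))[:limit]:
        dst = os.path.join(processing, name)
        try:
            os.rename(os.path.join(incoming, name), dst)
        except FileNotFoundError:
            continue  # another indexer claimed it first
        claimed.append(dst)
    return claimed
```

After indexing a file, the indexer would delete it from `processing/`; anything left there after a crash can simply be moved back to `incoming/`.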

Does anyone have experience handling a situation like this, or any ideas on how to do it efficiently?

I've also been looking at queue software like beanstalkd, but I'm not sure how I would implement it.
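With beanstalkd, the crawler would `put` one job per page (or, since jobs have a configurable maximum size, perhaps just a row id or file path), and the indexer would loop on `reserve`, index the page, then `delete` the job. Client libraries exist for PHP (Pheanstalk is one), so you would not normally speak the protocol by hand; the helpers below only frame and parse the plain-text commands to show what goes over the wire. The default priority and TTR values are arbitrary choices, and the socket handling (beanstalkd listens on port 11300 by default) is left out.

```python
def build_put(body: bytes, pri: int = 1024, delay: int = 0, ttr: int = 60) -> bytes:
    """Frame a 'put' command: header line with the body length, then the body."""
    return b"put %d %d %d %d\r\n" % (pri, delay, ttr, len(body)) + body + b"\r\n"


def parse_inserted(reply: bytes) -> int:
    """Parse the server's 'INSERTED <id>' reply and return the job id."""
    parts = reply.strip().split()
    if len(parts) != 2 or parts[0] != b"INSERTED":
        raise ValueError(f"unexpected reply to put: {reply!r}")
    return int(parts[1])


RESERVE = b"reserve\r\n"  # blocks until a job is ready


def parse_reserved(reply: bytes) -> tuple[int, bytes]:
    """Parse 'RESERVED <id> <bytes>\\r\\n<data>\\r\\n' into (job_id, body)."""
    header, _, rest = reply.partition(b"\r\n")
    parts = header.split()
    if len(parts) != 3 or parts[0] != b"RESERVED":
        raise ValueError(f"unexpected reply to reserve: {reply!r}")
    job_id, size = int(parts[1]), int(parts[2])
    body = rest[:size]
    if len(body) != size or rest[size:size + 2] != b"\r\n":
        raise ValueError("incomplete job body")
    return job_id, body


def build_delete(job_id: int) -> bytes:
    """Frame a 'delete' command, sent once the page has been indexed."""
    return b"delete %d\r\n" % job_id
```

If the indexer dies before sending `delete`, beanstalkd releases the job back to the ready queue once its TTR expires, which covers the crash case a MySQL-based queue has to handle by hand.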

Any suggestions welcome - cheers!!

fenway · March 11, 2011

MySQL can handle thousands of statements per second.
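Whether that holds for this workload is easy to check on your own hardware. A rough harness is sketched below; it uses sqlite3 only so it runs without a server, and the `bench` table is hypothetical. Pointing it at MySQL would mean swapping in a MySQL driver's DB-API connection, whose placeholder is typically `%s` rather than `?`. The `batch` flag exposes commit frequency, which often matters more than raw statement count, since a durable commit generally has to flush to disk.

```python
import sqlite3
import time


def insert_rate(conn, n=10_000, batch=True):
    """Insert n rows into a scratch table and return statements per second.
    batch=True commits once at the end; batch=False commits after every row."""
    conn.execute("CREATE TABLE IF NOT EXISTS bench (id INTEGER PRIMARY KEY, v TEXT)")
    conn.commit()
    start = time.perf_counter()
    for i in range(n):
        conn.execute("INSERT INTO bench (v) VALUES (?)", (f"row {i}",))
        if not batch:
            conn.commit()
    conn.commit()
    elapsed = time.perf_counter() - start
    return n / elapsed if elapsed > 0 else float("inf")
```

Comparing `insert_rate(conn, batch=True)` with `insert_rate(conn, batch=False)` against a file-backed database shows how much per-row commits cost on a given disk.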
