stats - spiders


dal_oscar

I am storing the IP address and user agent of every visitor to my website in a MySQL table, and I want to filter out the spiders and crawlers when I display the totals for web stats.

It's a really long list - 557 user agents. I get about 6000 hits a day, and it's too slow to query a separate user agent table or perform a simple text search (I have tried both) for each hit, especially since I need to display hits per day for 4 weeks.

Any suggestions?

[quote]
QUOTE(Desdinova @ Apr 6 2006, 08:24 AM)
How can you tell which one is a spider and which is not?

If that's easy, why not write the spiders to another stats table?
[/quote]


That's the problem - checking which is a spider.

I have a list of spider user agents. When I get a new hit, I currently just dump it into a separate table. During stats display, I retrieve each hit and compare its user agent to the list of spider user agents.

Now, I could do this comparison when the hit record is first stored, but I don't want to slow things down for the visitor.

What I am really looking for is an optimal way to check whether a hit is from a spider - a text search or a MySQL query is very slow.

Suggestions?
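
One way to make the check at logging time cheap is sketched below: load the spider list into an in-memory hash set once, then test each hit with a single lookup and store the result as a flag next to the hit. This is only a sketch under assumptions; the names `SPIDER_AGENTS`, `is_spider`, and `make_hit` are hypothetical, the three user agents are sample values standing in for the real list of 557, and it assumes exact (case-insensitive) matching of the full user agent string.

```python
# Hypothetical sample; the real list would hold the 557 spider user agents.
SPIDER_AGENTS = frozenset(
    ua.strip().lower()
    for ua in [
        "Googlebot/2.1 (+http://www.google.com/bot.html)",
        "msnbot/1.0 (+http://search.msn.com/msnbot.htm)",
        "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)",
    ]
)


def is_spider(user_agent):
    """Return True if the user agent exactly matches a known spider (case-insensitive)."""
    return user_agent.strip().lower() in SPIDER_AGENTS


def make_hit(ip, user_agent):
    """Build the record to store for one hit, with the spider flag precomputed."""
    return {"ip": ip, "useragent": user_agent, "is_spider": is_spider(user_agent)}
```

A set lookup takes roughly the same time whether the list has 5 entries or 557, so the visitor should not notice it, and the stats page then never has to compare user agents at all.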

Maybe you could speed up the query by putting the spiders in categories.

Say you create a table Spiders.
In this table you create columns ID, A, B, C, D through Z.

Every spider gets written down in the column which matches the first character of its user agent.

So basically, your query doesn't search all fields when checking for a spider, but only checks the column which matches the first character.

I think this should at least decrease the load and thus the waiting time.


But whether it's a good solution, I don't really know.
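
The bucketing idea above can be sketched in memory as a dictionary keyed by first character, so each lookup only touches the spiders that share that character. This is an illustration rather than the table layout described in the post; `build_buckets` and `is_spider_bucketed` are hypothetical names, and exact case-insensitive matching is assumed.

```python
from collections import defaultdict


def build_buckets(spider_agents):
    """Group spider user agents by their first character (lowercased)."""
    buckets = defaultdict(set)
    for ua in spider_agents:
        ua = ua.strip().lower()
        if ua:  # skip empty entries
            buckets[ua[0]].add(ua)
    return buckets


def is_spider_bucketed(user_agent, buckets):
    """Check only the bucket that matches the user agent's first character."""
    ua = user_agent.strip().lower()
    if not ua:
        return False
    # .get() avoids creating an empty bucket for unseen first characters.
    return ua in buckets.get(ua[0], ())
```

In a database, the usual way to get the same effect is one row per spider with an index on the user agent column, rather than 26 separate columns; the index narrows the search further than a first-character split does.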

[quote]
QUOTE(Desdinova @ Apr 6 2006, 08:55 AM)
Maybe you could speed up the query by putting the spiders in categories.

Say you create a table Spiders.
In this table you create columns ID, A, B, C, D through Z.

Every spider gets written down in the column which matches the first character of its user agent.

So basically, your query doesn't search all fields when checking for a spider, but only checks the column which matches the first character.

I think this should at least decrease the load and thus the waiting time.
But whether it's a good solution, I don't really know.
[/quote]

Hmm....that sounds better. I'll implement this immediately.

But I'll be happy to hear of other methods if someone knows any.

Thanks Desdinova

[quote]
QUOTE(dal_oscar @ Apr 6 2006, 08:59 AM)
Hmm....that sounds better. I'll implement this immediately.

But I'll be happy to hear of other methods if someone knows any.

Thanks Desdinova
[/quote]


Hi,
I have done that, but it still takes ages to load!!!
Any other ideas?
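
If the stats page is still slow, one option is to stop comparing user agents at display time altogether: store a spider flag with each hit when it is logged, and let a single grouped query produce the daily totals. The sketch below uses Python's built-in `sqlite3` as a stand-in for MySQL; the `hits` table, its columns (`day`, `ip`, `useragent`, `is_spider`), the index name, and the sample rows are all assumptions made for illustration, not the schema from this thread.

```python
import sqlite3


def daily_human_hits(conn, since_day):
    """Return [(day, hits)] for non-spider hits on or after since_day (YYYY-MM-DD)."""
    return conn.execute(
        "SELECT day, COUNT(*) FROM hits "
        "WHERE is_spider = 0 AND day >= ? "
        "GROUP BY day ORDER BY day",
        (since_day,),
    ).fetchall()


# In-memory database with a hypothetical schema and a few sample rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hits (day TEXT, ip TEXT, useragent TEXT, is_spider INTEGER)"
)
# Index so the date range and flag filter do not scan every row.
conn.execute("CREATE INDEX idx_hits_day ON hits (day, is_spider)")
conn.executemany(
    "INSERT INTO hits VALUES (?, ?, ?, ?)",
    [
        ("2006-04-05", "10.0.0.1", "Mozilla/4.0", 0),
        ("2006-04-05", "10.0.0.2", "Googlebot/2.1", 1),
        ("2006-04-06", "10.0.0.1", "Mozilla/4.0", 0),
        ("2006-04-06", "10.0.0.3", "Mozilla/5.0", 0),
    ],
)

totals = daily_human_hits(conn, "2006-04-01")
```

With this layout the four-week report is one query over roughly 6000 x 28 rows, instead of one user agent comparison per hit.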
