is crawler


Destramic

How effective is browscap's crawler detection these days, please, guys?

 

 

Array
(
[browser_name_regex] => ~^mozilla/5\.0 \(.*windows nt 10\.0.*rv:49\.0.*\) gecko.* firefox.*$~
[browser_name_pattern] => Mozilla/5.0 (*Windows NT 10.0*rv:49.0*) Gecko* Firefox*
[parent] => Firefox 49.0
[platform] => Win10
[comment] => Firefox 49.0
[browser] => Firefox
[browser_maker] => Mozilla Foundation
[version] => 49.0
[majorver] => 49
[device_type] => Desktop
[device_pointing_method] => mouse
[minorver] => 0
[ismobiledevice] =>
[istablet] =>
[crawler] =>
)

 

I need to implement something to stop crawlers from inserting rows into my DB, hopefully browscap if it's any good. I really hate the idea of image/sum/Google CAPTCHAs.

 

What would be my best method, please?

 

thank you

Link to comment
Share on other sites

When I refer to crawlers I mean bad bots, or have I got the wording incorrect? :-\

 

Adding to what I said: I did some more looking about, and what seems to be a good approach is a simple hidden link. Disallow the link in my robots.txt so the good bots don't open it, and if it is accessed anyway, catch the bad bot?

 

Maybe there are better alternatives.

Link to comment
Share on other sites

You can't stop "bad bots" from getting to your site. They won't respect robots.txt. They'll crawl anything they can find. IP bans, at best, will just slow them down.

 

You said they are "inserting rows into db". What does that mean? That sounds like the problem that should be fixed.

Link to comment
Share on other sites

Exactly: bad bots won't respect robots.txt, so if there is a hidden link that is not visible to humans, the bad bot will open it.

 

When that link is opened, the IP and user agent are added to the DB, after first checking that it's not a good bot, just for good measure. Then, as soon as someone accesses the site, I can check the DB records to see whether it's a bad bot and die;

 

I saw the idea at https://perishablepress.com/blackhole-bad-bots/
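
The trap flow described above can be sketched roughly as follows. This is a minimal, language-neutral illustration in Python rather than the linked article's implementation: the trap path, the good-bot markers, the in-memory `TrapStore` (standing in for the DB table) and the status codes are all assumptions made for the example. Note that the good-bot check here only looks at the user agent string, which a bot can spoof.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

# Hypothetical values, chosen only for this illustration.
TRAP_PATH = "/blackhole/"  # hidden link, disallowed in robots.txt
GOOD_BOT_MARKERS: Tuple[str, ...] = ("googlebot", "bingbot")


@dataclass
class TrapStore:
    """In-memory stand-in for the DB table of trapped clients."""
    blocked: Dict[str, str] = field(default_factory=dict)  # ip -> user agent

    def record_hit(self, ip: str, user_agent: str) -> bool:
        """Store the client unless its user agent looks like a good bot."""
        ua = user_agent.lower()
        if any(marker in ua for marker in GOOD_BOT_MARKERS):
            return False
        self.blocked[ip] = user_agent
        return True

    def is_blocked(self, ip: str) -> bool:
        return ip in self.blocked


def handle_request(store: TrapStore, path: str, ip: str, user_agent: str) -> int:
    """Return an HTTP-style status code for one incoming request."""
    if store.is_blocked(ip):
        return 403  # previously trapped client: refuse (the "die;" step)
    if path == TRAP_PATH:
        store.record_hit(ip, user_agent)
        return 404  # nothing useful lives behind the trap link
    return 200
```

A first visit to the trap path records the client, and later requests from the same IP are refused.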

 

What are your thoughts?

 

thank you

Link to comment
Share on other sites

Sorry, requinix. Take a user registration form, for instance: a bad bot could fill out the form and insert numerous rows. This is my concern, as I have nothing in place yet to catch bad bots doing this.

 

Is a bot trap as seen in the link above a good enough idea, or what is the best solution, please?

 

thank you

Link to comment
Share on other sites

Oh.

 

Wait until bots become an issue, then use CAPTCHA. I say to wait because doing that is a bit of an annoyance to users, so if you can get away with not using it then great... but CAPTCHA (the real solutions, not the stuff you make up on your own) is very effective at stopping bots, so it's the best solution when you do have the problem.

Link to comment
Share on other sites

Exactly: bad bots won't respect robots.txt, so if there is a hidden link that is not visible to humans, the bad bot will open it.

Most spam bots are going to completely ignore the robots.txt file so they will never see your hidden link either. They will also spoof user agents so they appear as a legit browser so you cannot detect them that way.

 

The only effective way to prevent them from submitting your forms is to include a CAPTCHA, as has been mentioned. reCAPTCHA is popular and pretty good, but even something like a simple math problem or "password" will generally stop generic automated bots.
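
As a rough illustration of the "simple math problem" idea, here is a minimal Python sketch. The function names are made up for the example, and it assumes the expected answer is kept server-side (for instance in the session) rather than sent to the browser. It only deters generic bots; a bot written for this specific form could solve it trivially.

```python
import secrets
from typing import Tuple


def new_challenge() -> Tuple[str, int]:
    """Create a question to show in the form and the answer to keep server-side."""
    a = secrets.randbelow(9) + 1  # 1..9
    b = secrets.randbelow(9) + 1  # 1..9
    return f"What is {a} + {b}?", a + b


def check_answer(expected: int, submitted: str) -> bool:
    """Compare the submitted form value with the stored answer."""
    try:
        return int(submitted.strip()) == expected
    except ValueError:
        return False  # empty or non-numeric input never passes
```

On form display, call `new_challenge()` and store the second value; on submit, pass it to `check_answer()` together with the posted field.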

Link to comment
Share on other sites

Most spam bots are going to completely ignore the robots.txt file so they will never see your hidden link either.

 

Of course they will see a hidden link. That's one of a bot's jobs, to search for hrefs. The bot will find it, and if it's a bad bot it will try to open the link?

Link to comment
Share on other sites

Probably. So, yes, this can in principle be used for bot detection. However, note that you'll also have false positives, i.e. legitimate users clicking on the link. For example, a visually impaired human may not realize that the link is hidden due to a bad user interface or a bad screenreader. That means you can't just reject the request. You still need a fallback for humans (e.g. a CAPTCHA).

 

There's a lot you can do about bots:

  • There are large blacklists of known spammers like StopForumSpam, Project Honey Pot or The Spamhaus Project.
  • Invisible form fields that mustn't be filled out also work pretty well as a spam trap.
  • Statistical spam filters are very effective against spam messages.
  • CAPTCHAs are annoying, but they work. You can at least use them as a second line of defense when the user has already triggered some other bot detection mechanism.
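
The invisible-field trap from the list above could look roughly like this on the server side. It is a sketch under assumptions: the field name `website` is hypothetical (any field hidden from humans with CSS would do), and a triggered trap leads to a CAPTCHA challenge rather than a hard rejection, since autofill or assistive software may fill the field for a real user.

```python
from typing import Mapping

# Hypothetical name of a form field that is hidden from humans with CSS.
HONEYPOT_FIELD = "website"


def honeypot_triggered(form: Mapping[str, str]) -> bool:
    """True if the hidden field came back with any non-blank value."""
    return bool(form.get(HONEYPOT_FIELD, "").strip())


def registration_decision(form: Mapping[str, str]) -> str:
    """Escalate suspicious submissions to a CAPTCHA instead of banning."""
    return "challenge" if honeypot_triggered(form) else "accept"
```

A submission with the hidden field left empty is accepted as usual; one with the field filled in is sent to the second line of defense.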

But again: None of this is perfect, so you should be very careful with hard bans.

Link to comment
Share on other sites

I suppose you need to cover all angles. I'm just put off using a CAPTCHA on my site at the moment, as I believe it could scare people away.

 

I do like the invisible field method though.

 

@requinix you mentioned waiting until bots become a problem. I just wonder how I would know that bots were registering on my site?

 

thank you

Link to comment
Share on other sites

...just wonder how I would know that bots were registering on my site?

 

You don't, and why would you care? Bot traffic is the background noise of the Internet, and there's no reason to worry about it as long as the bots don't cause any harm.

 

The only reason for making the registration less bot-friendly is to prevent malicious behavior like spamming. Otherwise it's completely irrelevant whether the request comes from a human or a bot.

Link to comment
Share on other sites

This thread is more than a year old.
