Destramic Posted September 25, 2016
Link to comment: https://forums.phpfreaks.com/topic/302240-is-crawler/

How effective is browscap's crawler detection these days, please, guys?

Array
(
    [browser_name_regex] => ~^mozilla/5\.0 \(.*windows nt 10\.0.*rv:49\.0.*\) gecko.* firefox.*$~
    [browser_name_pattern] => Mozilla/5.0 (*Windows NT 10.0*rv:49.0*) Gecko* Firefox*
    [parent] => Firefox 49.0
    [platform] => Win10
    [comment] => Firefox 49.0
    [browser] => Firefox
    [browser_maker] => Mozilla Foundation
    [version] => 49.0
    [majorver] => 49
    [device_type] => Desktop
    [device_pointing_method] => mouse
    [minorver] => 0
    [ismobiledevice] =>
    [istablet] =>
    [crawler] =>
)

I need to implement something to stop crawlers from inserting rows into my db...hopefully browscap, if it's any good. I really hate the idea of image/sum/Google CAPTCHAs. What would be my best method, please? Thank you
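For reference, here is a minimal sketch of reading the crawler flag from an array like the one in this post. The helper name isDeclaredCrawler is hypothetical, and the array is assumed to come from PHP's get_browser(null, true), which needs the browscap directive set in php.ini. The flag only reflects what the user agent string claims, so it says nothing about a bot that sends a normal browser's user agent.

```php
<?php
// Hypothetical helper: reads the "crawler" entry of a browscap result array.
function isDeclaredCrawler(array $browscapInfo): bool
{
    // To be safe, accept the flag as a bool or as a string such as "1", ""
    // or "true"; FILTER_VALIDATE_BOOLEAN covers all of these.
    return filter_var($browscapInfo['crawler'] ?? false, FILTER_VALIDATE_BOOLEAN);
}

// Trimmed copy of the array from the post (the crawler entry is empty there).
$firefox = ['browser' => 'Firefox', 'version' => '49.0', 'crawler' => ''];
var_dump(isDeclaredCrawler($firefox)); // bool(false)
```

A missing or empty crawler entry is treated as "not a crawler", which matches the Firefox result shown in the post.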

requinix Posted September 25, 2016

I must be misunderstanding something because it sounds like you're suggesting that search engine spiders are submitting data on your site and so you don't want them to index it at all.

Destramic Posted September 26, 2016

When I refer to crawlers I mean bad bots, or have I got the wording wrong? Adding to what I said, I did some more looking about, and what seems to be a good approach is a simple hidden link: disallow the link in my robots.txt so the good bots don't open it...and if it is accessed, catch the bad bot? Maybe there are better alternatives.

requinix Posted September 26, 2016

You can't stop "bad bots" from getting to your site. They won't respect robots.txt. They'll crawl anything they can find. IP bans, at best, will just slow them down. You said they are "inserting rows into db". What does that mean? That sounds like the problem that should be fixed.

Destramic Posted September 26, 2016

Exactly...bad bots won't respect the robots.txt...so if there's a hidden link not visible to humans, the bad bot will open it. When that link is opened, the IP and user agent are added to the db, after first checking it's not a good bot, just for good measure...so as soon as someone accesses the site I can check the db records to see if it's a bad bot and die; I saw the idea at https://perishablepress.com/blackhole-bad-bots/ What are your thoughts? Thank you
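The trap described in this post can be sketched roughly as follows. This is only an illustration under stated assumptions: the class name Blackhole, its methods, and the list of good-bot markers are all hypothetical, and an in-memory array stands in for the db table. The good-bot check here is a plain user agent match, which any client can fake.

```php
<?php
// Hypothetical in-memory stand-in for the db table of trapped clients.
final class Blackhole
{
    /** @var array<string, string> trapped IP => user agent */
    private $trapped = [];

    /** @var string[] user agent fragments treated as good bots (assumed list) */
    private $goodBotMarkers;

    public function __construct(array $goodBotMarkers = ['Googlebot', 'bingbot'])
    {
        $this->goodBotMarkers = $goodBotMarkers;
    }

    // Called by the script behind the hidden link. Returns true if the
    // client was recorded as trapped.
    public function recordHit(string $ip, string $userAgent): bool
    {
        foreach ($this->goodBotMarkers as $marker) {
            if (stripos($userAgent, $marker) !== false) {
                return false; // claims to be a good bot: not trapped
            }
        }
        $this->trapped[$ip] = $userAgent;
        return true;
    }

    // Called at the top of every other page.
    public function isTrapped(string $ip): bool
    {
        return isset($this->trapped[$ip]);
    }
}

$blackhole = new Blackhole();
$blackhole->recordHit('203.0.113.7', 'SomeScraper/1.0');
var_dump($blackhole->isTrapped('203.0.113.7'));  // bool(true)
var_dump($blackhole->isTrapped('198.51.100.2')); // bool(false)
```

What the site does when isTrapped() returns true (die, a slower response, or an extra check) is a separate decision that this sketch leaves open.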

ginerjm Posted September 26, 2016

You're missing the question that requinix is asking you. WHAT DATABASE is being updated by these 'bad bots'? As requinix says, THAT is your primary problem right now. Stop unauthorized access to your database if it is your database.

Destramic Posted September 26, 2016

Sorry requinix...a user registration form, for instance...a bad bot could fill out the form and insert numerous rows. This is my concern, as I have nothing in place yet to catch bad bots doing this. Is a bot trap as seen in the link above a good enough idea...or what is the best solution, please? Thank you

requinix Posted September 26, 2016

Oh. Wait until bots become an issue, then use CAPTCHA. I say to wait because doing that is a bit of an annoyance to users, so if you can get away with not using it then great... but CAPTCHA (the real solutions, not the stuff you make up on your own) is very effective at stopping bots so it's the best solution when you do have the problem.

kicken Posted September 26, 2016

Destramic said: "Exactly...bad bots won't respect the robots.txt...so if there's a hidden link not visible to humans, the bad bot will open it."

Most spam bots are going to completely ignore the robots.txt file so they will never see your hidden link either. They will also spoof user agents so they appear as a legit browser so you cannot detect them that way.

The only effective way to prevent them from submitting your forms is to include a CAPTCHA as has been mentioned. reCaptcha is popular and pretty good, but even something like a simple math problem or "password" will generally stop generic automated bots.
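A simple math problem of the kind mentioned here could look something like this. It is a minimal sketch, not a hardened CAPTCHA: the function names and the captcha_answer key are hypothetical, and a plain array stands in for $_SESSION so the code can run outside a web request.

```php
<?php
// Hypothetical helpers; $session stands in for $_SESSION.
function issueMathChallenge(array &$session): string
{
    $a = random_int(1, 9);
    $b = random_int(1, 9);
    $session['captcha_answer'] = $a + $b; // keep the answer server-side only
    return "What is $a + $b?";
}

function checkMathChallenge(array &$session, string $submitted): bool
{
    $expected = $session['captcha_answer'] ?? null;
    unset($session['captcha_answer']); // one attempt per challenge
    return $expected !== null && trim($submitted) === (string) $expected;
}

$session = [];
echo issueMathChallenge($session), "\n";           // e.g. "What is 3 + 5?"
$answer = (string) $session['captcha_answer'];     // what a human would type
var_dump(checkMathChallenge($session, $answer));   // bool(true)
```

Clearing the stored answer after each check means a wrong guess forces a fresh question, so a script cannot keep retrying against the same challenge.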

Destramic Posted October 1, 2016

kicken said: "Most spam bots are going to completely ignore the robots.txt file so they will never see your hidden link either."

Of course they will see a hidden link...it's one of a bot's jobs to search for hrefs...the bot will find it...and if it's a bad bot, it will try to open the link?

Jacques1 Posted October 1, 2016

Probably. So, yes, this can in principle be used for bot detection. However, note that you'll also have false positives, i.e. legitimate users clicking on the link. For example, a visually impaired human may not realize that the link is hidden due to a bad user interface or a bad screenreader. That means you can't just reject the request. You still need a fallback for humans (e.g. a CAPTCHA).

There's a lot you can do about bots:

- There are large blacklists of known spammers like StopForumSpam, Project Honey Pot or The Spamhaus Project.
- Invisible form fields that mustn't be filled out also work pretty well as a spam trap.
- Statistical spam filters are very effective against spam messages.
- CAPTCHAs are annoying, but they work. You can at least use them as a second line of defense when the user has already triggered some other bot detection mechanism.

But again: None of this is perfect, so you should be very careful with hard bans.
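The invisible form field idea, combined with the advice to avoid hard bans, might be sketched like this. The field name website and the function classifySubmission are hypothetical; the assumption is that the form contains an extra input hidden with CSS that a human would leave empty. A filled trap field leads to a CAPTCHA challenge rather than a rejection, so a human who trips it by accident still has a way through.

```php
<?php
// Hypothetical name of the hidden trap field in the registration form.
const HONEYPOT_FIELD = 'website';

// Returns 'accept' for a clean submission, or 'challenge' when the trap
// field was filled in and a CAPTCHA should be shown instead of a hard ban.
function classifySubmission(array $post): string
{
    $value = $post[HONEYPOT_FIELD] ?? '';
    // Anything other than an empty (or whitespace-only) string counts as filled.
    $filled = !is_string($value) || trim($value) !== '';
    return $filled ? 'challenge' : 'accept';
}

var_dump(classifySubmission(['username' => 'alice', 'website' => '']));
// string(6) "accept"
var_dump(classifySubmission(['username' => 'bot', 'website' => 'http://spam.example']));
// string(9) "challenge"
```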

Destramic Posted October 1, 2016

I suppose you need to cover all angles...I'm just put off using a CAPTCHA on my site at the moment, as I believe it could scare people away. I do like the invisible field method though.

@requinix you mentioned waiting until bots become a problem...just wonder how I would know that bots were registering on my site? Thank you

Jacques1 Posted October 2, 2016

Destramic said: "...just wonder how I would know that bots were registering on my site?"

You don't, and why would you care? Bot traffic is the background noise of the Internet, and there's no reason to worry about it as long as the bots don't cause any harm. The only reason for making the registration less bot-friendly is to prevent malicious behavior like spamming. Otherwise it's completely irrelevant whether the request comes from a human or a bot.