greenace92 Posted October 15, 2015

What I am trying to do is look up an IP address automatically, via a website like ip-lookup.net for example. To do this manually, I look at my IP-grabbing log of visitors, copy an IP address, paste it into ip-lookup.net, hit search, and they spit out some information. I'm not denying that this could be a false result. I'm also not sure if I can discern between a "company" and a private computer... usually I see a network provider or something like that.

At any rate, I want this done automatically. I began to work with web scraping a while back and I could target the input and submit button... or perhaps just do a submission, but how can I access their site remotely through my website or from PHP? I would have to:

1) Go to the site
2) Target the input field, paste the IP
3) Search
4) Target the result field, get the info
5) Insert it into the database with the IP address

Any thoughts would be appreciated.
Jacques1 Posted October 15, 2015

What's your actual goal? What kind of information do you need, for what purpose? There are plenty of IP databases and services with proper APIs, but which one is appropriate depends on your specific requirements.

In any case, don't web-scrape when there's no need to. It's fugly, fragile and possibly against the TOS.
greenace92 Posted October 15, 2015

All I'm trying to accomplish is determining if the visitor is human, for example not a crawler from a search engine. I will have to look at these APIs you mentioned. I have read about blacklisted IPs with regard to setting up web servers.
QuickOldCar Posted October 15, 2015

I'm all for APIs; concerning IP lookups, they all have limits. Seems the biggest issues are the spammers; at least the bots and indexers could bring you traffic.

```php
<?php
// Take the client IP; if it came through a proxy chain, use the first one.
$remote_ip = $_SERVER['REMOTE_ADDR'];
if (strstr($remote_ip, ', ')) {
    $ips = explode(', ', $remote_ip);
    $remote_ip = $ips[0];
}

// Check the IP against the StopForumSpam API (returns XML).
$spam_ip = "http://api.stopforumspam.org/api?ip=" . $remote_ip;
$spamdata = @simplexml_load_file($spam_ip);

if ($spamdata) {
    // Convert the SimpleXML object to a plain array.
    $spamarray = json_decode(json_encode($spamdata), TRUE);
    if ($spamarray['appears'] == "yes") {
        die('spammer');
    }
}
```

There are some huge Apache rules and lists around which block spammy servers and bots. Just know that when you start using IP blocks or CIDR ranges, you are blocking a whole lot. A robots.txt file can take care of legit bots; the bad ones will ignore it.

If you want to discover domain names from an IP and then match them in an array, you can use gethostbyaddr(). There are some other related functions there as well.
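Here's a minimal sketch of that gethostbyaddr() idea; the hostname suffixes in the array are just examples, swap in whatever bots you actually see in your logs:

```php
<?php
// Reverse-DNS the visitor and match the hostname against a list of
// known bot suffixes. gethostbyaddr() returns the IP unchanged when
// there's no PTR record, so unknown visitors simply won't match.
$remote_ip = $_SERVER['REMOTE_ADDR'];
$host = gethostbyaddr($remote_ip);

// Example suffixes only; adjust to taste.
$bot_suffixes = array('.googlebot.com', '.search.msn.com');

$is_known_bot = false;
foreach ($bot_suffixes as $suffix) {
    if (substr($host, -strlen($suffix)) === $suffix) {
        $is_known_bot = true;
        break;
    }
}
```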
greenace92 Posted October 15, 2015

Thanks Quick. I've got a lot of projects, but I will incorporate all of the responses from my various questions. Will return when I've actually implemented things and know more.
Jacques1 Posted October 15, 2015

It's still not really clear what you're trying to achieve.

Web crawlers announce themselves via the user agent and can also be easily identified with reverse DNS lookups, so there's no need for an IP blacklist. In fact, Google specifically recommends against that, because their IP addresses may change at any time.

Or are you talking about malicious bots which post spam? That's an entirely different story and may be prevented with:

- CAPTCHAs
- powerful content filters (e.g. Bayesian or Markov filters known from e-mail)
- as a last resort: blacklists

And then of course there are simple bots written by amateurs which don't cause any harm and should really be left alone.

So before you start to randomly implement all kinds of features, I strongly recommend you get clear about your goal. Trying to recognize bots is hardly a sensible objective, because there's such a big range of entirely different bots for entirely different purposes.
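For the record, the reverse DNS check that Google documents for verifying Googlebot might look roughly like this; the code itself is just an illustration. Note the forward confirmation at the end: without it, anybody could fake a Googlebot PTR record.

```php
<?php
// Sketch: verify a claimed Googlebot via reverse DNS plus forward
// confirmation, the procedure Google describes.
function is_verified_googlebot($ip)
{
    $host = gethostbyaddr($ip);
    if ($host === false || $host === $ip) {
        return false; // no usable PTR record
    }
    if (!preg_match('/\.(googlebot|google)\.com$/', $host)) {
        return false; // not a Google hostname
    }
    // Forward-confirm: the hostname must resolve back to the same IP.
    $forward = gethostbynamel($host);
    return is_array($forward) && in_array($ip, $forward, true);
}
```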
greenace92 Posted November 1, 2015

I am simply trying to figure out if the IP that visited my website is a real person, or if it is just a crawler or some other type of bot. I have been logging access times when someone or "something" visits my website, and I can see what URL they were asking for. I've seen some pretty scary stuff (to me, as I don't know what it is), like triple forward slashes or deliberate URL queries like admin=something, where it seems like they are deliberately trying to get access to my server / do something they are not supposed to.

So what I would do manually is take the recorded IP address, copy it, then paste it into ip-lookup.net, which would then tell me who it is.

Okay, here are a couple of examples that worry me, as these URL queries seem to deliberately be trying to find some sort of access:

http://mywebsite.com/?c=4e5e5d7364f443e28fbf0d3ae744a59a
http://w3.hideme.ru:80
http://159.ip-192-99-169.net/?x=()
http://www.southwest.comwww.southwest.com:443 (not related to my website)
http://192.99.169.159 Connection: Keep-Alive/
http://www.mywebsite.com/?C=D;O=D

And here's one with a triple slash; is that wget or not?

http:///cgi-sys/entropysearch.cgi

There are a bunch of those, and it scares me because I don't know what they mean. Am I safe? My website is SSL protected with an A rating from Qualys, and I use password login; a session is required for almost every page, and without a session you are redirected to the main page. I have some test pages / other websites that don't use sessions (is that bad?), but the SQL database is password protected. I was told to implement bcrypt and Blowfish as well as some other things. I don't know if I am safe.

So, I would take an IP address... I thought I didn't have any linked to those queries above, but actually it looks like I do; I have a bunch of tables, jeez... frequency counting and the actual URL looked up. For this one:

http://mywebsite.com/?c=4e5e5d7364f443e28fbf0d3ae744a59a

this is the IP that requested it: 183.60.244.46. So I go to ip-lookup.net, and this is what they tell me:

IP : 183.60.244.46
Host : ?
Country : China

What did I have to do in order to get that?

1) Go to ip-lookup.net
2) Clear my IP, which is the default searched by their website
3) Enter the IP of interest
4) Hit search
5) See the result

I want to automate that with a web scraper or something. I started working on it with PHP but lost interest; it's on my list.

So that is what I want to do: figure out how to stop or just deny any weird requests like that if they don't have anything to do with existing directories. I realize I could probably accomplish that with .htaccess.
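For what it's worth, here's roughly what I'm picturing if I go the API route instead of scraping. I'm assuming a free JSON geolocation service like ip-api.com here, and I'd still have to check its terms and rate limits:

```php
<?php
// Rough sketch: look up an IP through a JSON geolocation API instead
// of scraping ip-lookup.net. ip-api.com is an example service; check
// its terms of use and rate limits before relying on it.
$ip = '183.60.244.46';

$response = @file_get_contents('http://ip-api.com/json/' . urlencode($ip));
if ($response !== false) {
    $info = json_decode($response, true);
    if (is_array($info) && isset($info['status']) && $info['status'] === 'success') {
        echo $info['country'] . ' / ' . $info['isp'] . "\n";
    }
}
```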
Jacques1 Posted November 1, 2015

Sorry, but what you're trying to do makes no sense.

It's only natural for a public website to receive all kinds of requests from all kinds of agents. There's nothing scary about that, and the only way to prevent “unwanted” requests is to not have a public website. Assuming that all bots are evil while all human users are good doesn't make sense either. In fact, a human who actively tries to break into your site should scare you much more than some stupid bot scanning URLs. Needless to say, most bots are legitimate, useful tools which don't cause any harm whatsoever.

If you're worried about the security of your website, then do something about that. Learn the basics of security and make sure your code, your webserver and your operating system are safe. You need to do this anyway, so you might as well start now instead of trying to fight off bots.

I'm sure somebody will recommend fail2ban, but I'd be careful about that. At best, this tool is a second line of defense which you apply at the very end. And in the worst case, it will give you a false sense of security and distract you from more important security measures.
greenace92 Posted November 1, 2015

Okay. I just feel really clueless as far as knowing whether I am safe. Currently I use:

- Password login, session-based access
- SSL
- Parameterized binding

Not sure what else. It seems that more and more companies are getting hacked, so I agree with starting now to learn about security measures and good practices for secure coding.
benanamen Posted November 1, 2015

If you want to PM me your website, I will let you know if you have any obvious security issues.
0x00 Posted November 1, 2015

One note, and not a dig, but... you want to block people doing exactly what you want to do to another site.

Also, when I do any form of scraping tests, I use what look like legitimate credentials.

To identify malicious crackers, go use a few penetration-testing apps like Nikto or ZAP and study what they test for. You can generally spot them by the 404s they generate.
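A quick sketch of how you might make those 404 bursts visible per IP; the log path is just an example:

```php
<?php
// Sketch for a custom 404 page: append each miss to a log so that
// scanner bursts (Nikto / ZAP style probing) stand out per IP.
// The file path is a placeholder; point it somewhere writable.
$line = sprintf(
    "%s\t%s\t%s\n",
    date('c'),
    $_SERVER['REMOTE_ADDR'],
    $_SERVER['REQUEST_URI']
);
file_put_contents('/var/log/myapp/404s.log', $line, FILE_APPEND | LOCK_EX);
```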
Jacques1 Posted November 1, 2015

The most common security vulnerabilities are summarized in the OWASP Top 10 list. In my experience, PHP applications typically struggle with injection vulnerabilities. So if you use parameterized queries and rigorous HTML-escaping, that's definitely a good start.

Besides the concrete defense mechanisms, it's important to develop a security-oriented way of thinking:

- Keep privileges at a minimum.
- Set up multiple layers of protection instead of relying on a single feature.
- Don't trust anything unless it's absolutely necessary.
- Whitelisting is generally superior to blacklisting.
- Established, well-tested libraries are generally superior to homegrown implementations.

Pádraic Brady does an excellent job of explaining this.
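To make the whitelisting point concrete, a minimal sketch (the parameter and column names are placeholders):

```php
<?php
// Whitelisting: accept only known-good values instead of trying to
// enumerate bad ones. Useful where parameterized queries can't help,
// e.g. an ORDER BY column chosen by the user.
$allowed_columns = array('name', 'created_at', 'price');

$sort = isset($_GET['sort']) ? $_GET['sort'] : 'name';
if (!in_array($sort, $allowed_columns, true)) {
    $sort = 'name'; // fall back to a safe default
}
// $sort now only ever holds one of the whitelisted values.
```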
0x00 Posted November 1, 2015

I was only going to mention OWASP's ZAP, but I put in Nikto because it dumps everything out where it's easy to see, and it also demonstrates how a lot of these programs give false positives when faced with custom 404s.
greenace92 Posted November 1, 2015

Thanks for the new information, guys. benanamen, PM sent.

I haven't been doing any HTML escaping. On my future projects I will have to keep these tips in mind.
Jacques1 Posted November 1, 2015

You need to fix that immediately, not in the future. Cross-site scripting vulnerabilities are extremely dangerous and one of the first things any attacker will try to exploit.
benanamen Posted November 1, 2015

I will PM you. You have numerous security problems.
greenace92 Posted November 1, 2015

How can I even tell if I have been "infected", for lack of a better word? I know the site has a lot of problems; thankfully, in a way, no one really cares about it.

I don't understand the point about HTML escaping if I am using parameterized binding. Don't you apply the HTML escape after the query?

I don't check for a real email address. Then once an account has been created, you are past the point of session problems.

Oh man, so glad to have asked these questions. Thanks for the help, guys.
Jacques1 Posted November 1, 2015

> I don't understand about html escaping if I am using parameterized binding.

Those are two unrelated security mechanisms. Parameterized queries prevent (most) SQL injection attacks; HTML-escaping is required to prevent cross-site scripting attacks. You need both.
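A minimal sketch of the two mechanisms side by side; $pdo, the table and the columns are placeholders:

```php
<?php
// Parameterized query on the way into the database, HTML-escaping on
// the way out to the page. $pdo is an existing PDO connection.
$stmt = $pdo->prepare('SELECT comment FROM comments WHERE user_id = ?');
$stmt->execute(array($_GET['user_id']));

foreach ($stmt as $row) {
    // Escape at output time, never when storing the data.
    echo htmlspecialchars($row['comment'], ENT_QUOTES, 'UTF-8'), "\n";
}
```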
greenace92 Posted November 1, 2015

Well, I have backed up my four VPSes and reinstalled them; I figure I'll start from scratch. At this point I don't manage anyone or have any users yet, so this is not a problem. It's good to catch the potential problems before they happen. I'm really thankful that you guys have caught me up. I have more to research and learn, but this is really great. I want to do it right. Much of my work in other interests has been half-assed, so I really want to be competent in what I am doing.

> Those are two unrelated security mechanisms. Parameterized queries prevent (most) SQL injection attacks; HTML-escaping is required to prevent cross-site scripting attacks. You need both.

Right, alright, I'm going to start from the ground up. I think I might start a thread about the newest versions of Apache and OpenSSL if I run into problems again. I stuck with the older version of Apache as it came preinstalled on Debian 7, but there is Debian 8 now and I figure I should choose the latest version.

Random thought: the age is off on this site, unless it is just me?
benanamen Posted November 1, 2015

FYI: you're not "stuck" with old Apache, and you don't have to upgrade the OS to install it. All topics for a different thread. I would suggest you do some research on what to do before just starting a thread on it.

For development you could always set up virtual servers on your computer and run anything any way you want. There is VMware and many other such software packages to do it, some free, some paid. I have just about every OS there is on my Windows 7 machine, and even several OS versions of some of them.
greenace92 Posted November 1, 2015

I just recently got into VirtualBox on my Windows computer. Yeah, I don't know if it is a good idea to switch to the latest OS version if it is not stable. I tried to compile the latest Apache but ran into some errors with OpenSSL. Yeah, I have a lot to work on; I started to get into nginx server setup too. So much to learn.
benanamen Posted November 1, 2015

On a virtual machine it really doesn't matter if it is stable or not. You don't switch, you just create another VM with the other OS. And you really don't need to compile Apache for what you're doing; just use the package manager to install/update.
greenace92 Posted November 1, 2015

Right. I asked about it on WebHostingTalk regarding the newest Apache version. With OVH and Debian 7, it seemed that version 2.2.24? (an outdated version) came preinstalled. So to install the newest version I ended up following someone's compiling instructions verbatim, but I ran into problems regarding OpenSSL. I meant the stable version on the actual VPS. But I really have to figure out what I am doing.
benanamen Posted November 1, 2015

All you have to do is update the sources file and it will update from the version 8 repository using the package manager. We are getting way off topic for this thread though.
greenace92 Posted November 2, 2015

Definitely. Thanks to everyone for their input.