ajetrumpet, posted November 16, 2019 (edited November 16, 2019)

Hey guys, in the attached image I'm logged into another forum I'm part of, looking at its "who's online" page. I have a PHP traffic report page that, when accessed, echoes out database data stored by another PHP script. That script captures geolocation data (the ISP's IP address, the referring page, and the date/time of the visit) using PHP global variables.

My question is: how does this forum script know which visitors are Google spiders? In my traffic report, the IP address of the ISP is the only identifying information I capture. From what I understand, it's not possible to capture the visitor's actual location, only the ISP's location. If I look up the IP address on an IP lookup website, I can see that it is a Google spider, but can this be done through PHP scripting?
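A capture script like the one described usually reads these values from the $_SERVER superglobal. The sketch below is a minimal illustration under assumptions. The helper name buildVisitRecord and its array keys are hypothetical and not taken from the poster's script or table. REMOTE_ADDR is the address the connection came from. That is normally the visitor's public, ISP-assigned address, or a proxy's address.

```php
<?php
// Hypothetical helper: gathers the per-visit fields a simple traffic logger might store.
// The array keys are illustrative, not the original script's column names.
function buildVisitRecord(array $server): array
{
    return [
        // Address the connection came from: usually the visitor's public (ISP-assigned)
        // address, or the address of a proxy in front of the visitor.
        'ip'         => $server['REMOTE_ADDR'] ?? '',
        // Path and query string that was requested.
        'page'       => $server['REQUEST_URI'] ?? '',
        // Referer header; absent for typed-in URLs or when the browser withholds it.
        'referrer'   => $server['HTTP_REFERER'] ?? '',
        // Request start time, formatted for a MySQL DATETIME column.
        'visited_at' => date('Y-m-d H:i:s', (int) ($server['REQUEST_TIME'] ?? time())),
    ];
}

// In a real request you would pass $_SERVER and then INSERT the array into your table.
$record = buildVisitRecord($_SERVER);
```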
kicken, posted November 16, 2019

Friendly spiders such as Google's identify themselves in the User-Agent header of their HTTP requests. For example, Google sends:

Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

This header is most likely how the forum decides whether a visitor is Google. If you follow the link in Google's User-Agent header, the page explains that you can verify an IP belongs to Googlebot by doing a reverse DNS lookup on it. Other spiders may or may not offer a similar IP verification mechanism; you'd have to research them individually.
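Both ideas from this reply can be sketched in PHP.

- The first function only checks whether the User-Agent claims to be Googlebot, which any client can fake.
- The second follows the verification procedure Google documents. It does a reverse DNS lookup on the IP and requires a host name under googlebot.com or google.com. It then resolves that host name forward and confirms it maps back to the same IP.

Some caveats:

- Both function names are hypothetical.
- The resolver parameters default to PHP's built-in gethostbyaddr() and gethostbynamel(). They exist only so the logic can be tested offline.
- gethostbynamel() returns IPv4 addresses only.
- Google may list additional crawler domains in its current documentation, so treat this as a sketch.

```php
<?php
// Cheap first pass: does the User-Agent *claim* to be Googlebot?
// Any client can send this string, so a match on its own proves nothing.
function claimsToBeGooglebot(string $userAgent): bool
{
    return stripos($userAgent, 'Googlebot') !== false;
}

// Reverse-then-forward DNS check. $reverse/$forward default to PHP's own resolvers;
// they are injectable only for offline testing. gethostbynamel() is IPv4-only.
function isVerifiedGooglebot(string $ip, ?callable $reverse = null, ?callable $forward = null): bool
{
    $reverse = $reverse ?? 'gethostbyaddr';
    $forward = $forward ?? 'gethostbynamel';

    // Step 1: reverse lookup. gethostbyaddr() returns the IP unchanged when no
    // host name is found, and false on malformed input.
    $host = $reverse($ip);
    if (!is_string($host) || $host === $ip) {
        return false;
    }

    // Step 2: the host must sit under googlebot.com or google.com. Matching the
    // leading dot rejects look-alikes such as "evilgooglebot.com".
    $host = strtolower(rtrim($host, '.'));
    $suffixOk = false;
    foreach (['.googlebot.com', '.google.com'] as $suffix) {
        if (substr($host, -strlen($suffix)) === $suffix) {
            $suffixOk = true;
            break;
        }
    }
    if (!$suffixOk) {
        return false;
    }

    // Step 3: the forward lookup must resolve back to the original IP.
    $addresses = $forward($host);
    return is_array($addresses) && in_array($ip, $addresses, true);
}

$ua = $_SERVER['HTTP_USER_AGENT'] ?? '';
$ip = $_SERVER['REMOTE_ADDR'] ?? '';
// Only pay for the DNS lookups when the User-Agent makes the claim.
$isGooglebot = claimsToBeGooglebot($ua) && isVerifiedGooglebot($ip);
```

DNS lookups are slow. It is usually worth running the check once per IP and storing the result with the visit, rather than repeating it on every page view.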
ajetrumpet (topic author), posted November 16, 2019

kicken, I ran a test with all of these included:

<?php
// Address the request came from
echo "ip - " . $_SERVER['REMOTE_ADDR'];
echo "<br>";
// Reverse DNS lookup; returns the IP unchanged if no host name is found
echo "gethostbyaddr - " . gethostbyaddr($_SERVER['REMOTE_ADDR']);
echo "<br>";
// Operating system information for the web server itself
echo "uname - " . php_uname();
echo "<br>";
// Host name of the web server itself
echo "gethostname() - " . gethostname();
echo "<br>";
// Host header sent by the client (escaped, because the client controls it)
echo "HTTP_HOST - " . htmlspecialchars($_SERVER['HTTP_HOST'], ENT_QUOTES);
echo "<br>";
// Server name reported by the web server (may also come from the client, depending on configuration)
echo "SERVER_NAME - " . htmlspecialchars($_SERVER['SERVER_NAME'], ENT_QUOTES);
?>

This is really good info and I think I'll use it. One question though: HTTP_HOST and SERVER_NAME return the same result. Is there any scenario where they would *not* return the same?
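As a general rule (worth checking against your own server setup), $_SERVER['HTTP_HOST'] is copied from the Host header the client sent. It can include a port, for example www.example.com:8080. $_SERVER['SERVER_NAME'] normally comes from the virtual host's configuration. However, the PHP manual notes that under Apache 2 it reflects the client-supplied host name unless UseCanonicalName is On and ServerName is set, which would explain identical values.

The two can differ when a visitor reaches the site through:

- an alias (www vs. non-www),
- an explicit port, or
- a bare IP address that lands on a default virtual host.

Because HTTP_HOST is client-controlled, a common pattern is to accept it only if it appears on a list of known host names. A minimal sketch of that check follows; trustedHost and the example host names are hypothetical.

```php
<?php
// Hypothetical helper: return the host the client asked for, but only if it is one
// of the names this site actually serves. HTTP_HOST comes from the request's Host
// header and may carry a ":port", so it is treated as untrusted input.
function trustedHost(array $server, array $allowedHosts): ?string
{
    $raw = $server['HTTP_HOST'] ?? '';
    // parse_url() needs a scheme to recognise the host part; it also drops any port.
    $host = parse_url('http://' . $raw, PHP_URL_HOST);
    if (!is_string($host)) {
        return null;
    }
    $host = strtolower($host);
    return in_array($host, $allowedHosts, true) ? $host : null;
}

// Example: a request for http://www.example.com:8080/ on a vhost configured as example.com.
$server = ['HTTP_HOST' => 'www.example.com:8080', 'SERVER_NAME' => 'example.com'];
$host = trustedHost($server, ['example.com', 'www.example.com']);
```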
ajetrumpet (topic author), posted November 16, 2019

Additionally, kicken, I think I might have a corrupted file. My query for my report is:

// Sort on the table's own columns: ORDER BY date would pick up the formatted
// 'date' alias, which sorts as mm/dd/yy text rather than chronologically.
$sql = mysqli_query($conn, "SELECT ip
    , page
    , CASE WHEN referrer = '' THEN 'N/A' ELSE referrer END as referrer
    , DATE_FORMAT(date, '%m/%d/%y') as date
    , TIME_FORMAT(logged, '%T') as time
    FROM tblTraffic
    ORDER BY tblTraffic.date DESC, tblTraffic.logged DESC");

and my PHP echo code is:

<body>
<table border='1'>
<tr>
    <th>VISITOR IP ADDRESS, ISP NAME</th>
    <th>VISITOR DOMAIN ADDRESS<th>
    <th>PAGE VISITED</th>
    <th>DATE</th>
    <th>TIME</th>
</tr>
<?php
// printing table rows
while ($row = mysqli_fetch_row($sql)) {
    echo '<tr>';
    foreach ($row as $col) {
        // escape each value; referrer comes from the visitor's Referer header
        echo '<td>' . htmlspecialchars((string) $col, ENT_QUOTES) . '</td>';
    }
    echo '</tr>';
}
?>
</table>
</body>

I attached an image of the output I'm seeing. There is an extra column without a header, and data is still being output in it, although I'm not querying 6 columns. Can you see something wrong with this?
ajetrumpet (topic author), posted November 16, 2019

I have solved the issue.
Barand, posted November 16, 2019

For the record, the problem is an extra heading column, not an extra data column. You have <th> instead of </th>, which adds an extra header cell. The second <th> on this line should be </th>:

<th>VISITOR DOMAIN ADDRESS<th>
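One way to rule out this kind of header/data mismatch is to generate the header cells and the data cells from a single column list. The sketch below is an illustration, not code from the thread. The function name renderTrafficTable and the header labels are assumptions. The keys match the aliases in the query above (ip, page, referrer, date, time).

The function expects associative rows, so it would be fed by mysqli_fetch_assoc() rather than mysqli_fetch_row(). A mysqli_result object can also be passed directly, because it can be iterated with foreach.

```php
<?php
// Sketch: build the header row and the data rows from the same column map,
// so the number of <th> and <td> cells always matches.
function renderTrafficTable(array $columns, iterable $rows): string
{
    $html = "<table border='1'>\n<tr>";
    foreach ($columns as $label) {
        $html .= '<th>' . htmlspecialchars($label, ENT_QUOTES) . '</th>';
    }
    $html .= "</tr>\n";

    foreach ($rows as $row) {
        $html .= '<tr>';
        foreach (array_keys($columns) as $key) {
            // Escape everything: referrer values come from the visitor's Referer header.
            $html .= '<td>' . htmlspecialchars((string) ($row[$key] ?? ''), ENT_QUOTES) . '</td>';
        }
        $html .= "</tr>\n";
    }
    return $html . "</table>\n";
}

// Keys mirror the SELECT aliases; the labels are illustrative.
$columns = [
    'ip'       => 'VISITOR IP ADDRESS',
    'page'     => 'PAGE VISITED',
    'referrer' => 'REFERRER',
    'date'     => 'DATE',
    'time'     => 'TIME',
];
// A literal array stands in for the query result here.
$rows = [['ip' => '66.249.66.1', 'page' => '/index.php', 'referrer' => 'N/A', 'date' => '11/16/19', 'time' => '10:15:00']];
$html = renderTrafficTable($columns, $rows);
echo $html;
```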