dsaba Posted December 20, 2007
If I have files in a public directory that are never linked from any of the HTML/PHP files on my server, can a web spider like Google's discover them?
https://forums.phpfreaks.com/topic/82567-how-spiders-crawl-your-website/

The Little Guy Posted December 20, 2007
I think they can if you use a robots.txt file. Otherwise, you can create a sitemap and submit it to Google: Google Webmasters

dsaba (Author) Posted December 20, 2007
No, I don't want a web spider to discover these files. So by your reply, I'd say that unless you have a robots.txt, then no, a web spider is not capable of finding these files, right?

roopurt18 Posted December 20, 2007
If you have indexing turned on or someone else links to your files, then I'd say, "Yes, a spider can find them since they are publicly available." You have to password-protect them if you want to make sure no one knows they're there.

The Little Guy Posted December 20, 2007
Google/Yahoo! (tame web bots) won't index them if you have a proper robots.txt file, even if someone decides to link to them. A bad robot will index them, and in that case you will have to password-protect them or take them offline.

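As a rough illustration of the "tame bot" behaviour described above, here is a minimal Python sketch of how a well-behaved crawler might consult robots.txt before fetching a URL. The robots.txt content, the `/private/` path, the `ExampleBot` name, and the `may_fetch` helper are all assumptions made up for this example. Compliance is voluntary, so nothing here stops a bad robot.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the /private/ path is an assumption.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # A polite crawler would skip the first URL and fetch the second.
    print(may_fetch(ROBOTS_TXT, "ExampleBot", "https://example.com/private/report.pdf"))
    print(may_fetch(ROBOTS_TXT, "ExampleBot", "https://example.com/index.html"))
```

Note that robots.txt is itself publicly readable, so any path listed in it is disclosed to anyone who requests the file.
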
zq29 Posted December 21, 2007
How can a 'bad robot' find the files if they are not linked to in one form or another? My limited understanding of how robots work is that they sift through the source of a page looking for links, then follow each of those links to the next page to look through. How is it possible for them to find files that haven't been linked to? Unless it's a malicious crawler hitting an unsecured server, I guess.

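The link-following step described above can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the `LinkCollector` class, the `extract_links` helper, the sample page, and the example.com URLs are hypothetical, and a real crawler does much more (fetching, deduplication, politeness delays).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href targets of <a> tags, resolved against a base URL."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page's own URL.
                    self.links.append(urljoin(self.base_url, value))


def extract_links(html: str, base_url: str) -> list[str]:
    """Return every link found in the page source, in document order."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links


# Hypothetical page source; only these two files are linked from it.
PAGE = '<p><a href="/about.html">About</a> <a href="docs/guide.html">Guide</a></p>'

if __name__ == "__main__":
    for link in extract_links(PAGE, "https://example.com/"):
        print(link)
```

A crawler built this way only ever queues URLs that appear in some page it has already read, which is why an unlinked file stays invisible to it unless the URL turns up somewhere else.
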
roopurt18 Posted December 21, 2007
Quoting zq29: "How can a 'bad robot' find the files if they are not linked to in one form or another? ..."
If the files are in a directory with indexes turned on and no index file, and the crawler stumbles across it, then I imagine it would index them. How it would stumble across the directory in the first place is anyone's guess, but I wouldn't put it outside the realm of possibility. Bottom line: if you want something available to a limited audience and not everyone, password-protect it, IMO.

dsaba (Author) Posted December 21, 2007
Quoting zq29: "How can a 'bad robot' find the files if they are not linked to in one form or another? ..."
Yes, this is the essence of why I asked this question. How can a robot, a person, or anyone know that something exists if it has no prior knowledge of it: no robots.txt file that lists files, no links, nothing? Let's say you have a robots.txt file that lists the directories of your site but not the files. How can a spider ever find files within a directory that are never linked to? Does it just try a zillion permutations of filenames and file extensions in that directory until it finds files? I wouldn't think so, but that's why I asked.

The Little Guy Posted December 21, 2007
Quoting zq29: "How can a 'bad robot' find the files if they are not linked to in one form or another? ..."
Someone could always come across the directory and say to themselves, "Hey! I like this, let's make a robot that will index these files!" The files get indexed, then get linked to, and another robot will come across them and link to them as well.

roopurt18 Posted December 21, 2007
Quoting dsaba: "... how can a spider ever find files within a directory that are never linked to? ..."
If the web server is set to automatically index directories where no index file exists, then all the spider needs is a directory. The web server will then supply the file listing.

dsaba (Author) Posted December 21, 2007
Quoting roopurt18: "If the web server is set to automatically index directories where no index file exists, then all it needs is a directory."
When I said the files aren't linked anywhere, I meant from here as well. So I guess I was right: no links, no spider can find the files.

Daniel0 Posted December 21, 2007
Quoting dsaba: "So I guess I was right, no links, no spider can find the files."
You can't ensure that other people won't link to the directory.

The Little Guy Posted December 21, 2007
Someone could make a robot that generates random folder names and tests each one against your website to see whether it finds anything in that directory. If it doesn't, it generates another random name and tests it, and so on.

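To put a rough number on the guessing approach described above, here is a small Python sketch that counts how many names a robot would have to try. The alphabet size (26 lowercase letters plus 10 digits), the maximum length of 8, and the rate of 100 requests per second are assumptions chosen purely for illustration, and the helper names are hypothetical.

```python
def name_space_size(alphabet_size: int, max_length: int) -> int:
    """Count all names of length 1..max_length over the given alphabet."""
    return sum(alphabet_size ** length for length in range(1, max_length + 1))


def years_to_try_all(candidates: int, requests_per_second: float) -> float:
    """Estimate how long trying every candidate would take, in years."""
    seconds = candidates / requests_per_second
    return seconds / (365 * 24 * 60 * 60)


if __name__ == "__main__":
    # Assumed alphabet: a-z plus 0-9 (36 symbols), names up to 8 characters.
    total = name_space_size(36, 8)
    print(f"candidate names: {total:,}")
    # Assumed request rate; a real server or network would likely allow less.
    print(f"years at 100 requests/s: {years_to_try_all(total, 100):.0f}")
```

Under these assumptions, blind guessing of long names is impractical, although short or common names such as a hypothetical /backup/ or /admin/ would be among the first things such a robot tries.
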
GingerRobot Posted December 21, 2007
What would be the point in files that weren't linked to anywhere?

The Little Guy Posted December 22, 2007
Quoting GingerRobot: "What would be the point in files that weren't linked to anywhere?"
- process information
- private data/images/files

GingerRobot Posted December 22, 2007
Quoting The Little Guy: "- private data/images/files"
Again, what's the point in storing images if they're not used anywhere? The point I'm trying to make is that you'll probably show someone the files at some point, and they might link to them, etc.

Daniel0 Posted December 22, 2007
Quoting GingerRobot: "... you'll probably show someone the files at some point, and they might link to them, etc."
As roopurt said, if you want a limited audience, password-protect it.

The Little Guy Posted December 22, 2007
Quoting Daniel0: "As roopurt said, if you want a limited audience, password-protect it."
I agree, or save it into a database.
