how spiders crawl your website


dsaba

How can a 'bad robot' find the files if nothing links to them in one form or another? My limited understanding of how robots work is that they sift through the source of a page looking for links, then follow each of those links to the next page and repeat.

How is it possible for them to find files that haven't been linked to? Unless it's a malicious crawler hitting an unsecured server, I guess.
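
As a rough illustration of that link-following model, here is a minimal Python sketch. It crawls an in-memory dictionary of pages rather than a live site; the `SITE` dictionary, the page names, and the `crawl` function are all made up for the example. The point is that `secret.zip` exists on the pretend server but is never reached, because no page links to it.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag found in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Hypothetical site: URL -> page source. secret.zip is "on the server",
# but no page links to it.
SITE = {
    "http://example.com/": '<a href="about.html">About</a> <a href="/files/a.txt">A</a>',
    "http://example.com/about.html": '<a href="/">Home</a>',
    "http://example.com/files/a.txt": "plain text, no links",
    "http://example.com/files/secret.zip": "never linked",
}


def crawl(start):
    """Follow links breadth-first and return every URL that was reached."""
    seen, queue = set(), [start]
    while queue:
        url = queue.pop(0)
        if url in seen or url not in SITE:
            continue
        seen.add(url)
        parser = LinkExtractor()
        parser.feed(SITE[url])
        # Resolve relative links against the page they were found on
        queue.extend(urljoin(url, href) for href in parser.links)
    return seen


found = crawl("http://example.com/")
print(sorted(found))
```

A crawler built this way only ever sees what some page points at, which is the assumption behind the question.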

If the files are in a directory with directory indexes turned on and no index file, and the crawler stumbles across that directory, then I imagine it would index them. How it would stumble across the directory in the first place is anyone's guess, but I wouldn't put it outside the realm of possibility.

Bottom line: if you want something available to a limited audience and not everyone, password protect it, IMO.
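
To make "password protect it" a little more concrete, here is a hedged Python sketch of the check behind HTTP Basic authentication. In practice this is usually switched on in the web server's configuration rather than written by hand; the credentials and the `is_authorized` and `respond` helpers below are hypothetical and exist only for illustration.

```python
import base64
import hmac

# Hypothetical credentials for the example; a real setup would store
# a password hash, not the plain-text password.
USERNAME = "friend"
PASSWORD = "correct horse"


def is_authorized(authorization_header):
    """Return True only if the header carries the expected Basic credentials."""
    if not authorization_header or not authorization_header.startswith("Basic "):
        return False
    try:
        decoded = base64.b64decode(authorization_header[6:], validate=True).decode("utf-8")
    except ValueError:  # covers bad base64 and bad UTF-8
        return False
    user, sep, password = decoded.partition(":")
    if not sep:
        return False
    # compare_digest avoids leaking information through comparison timing
    user_ok = hmac.compare_digest(user.encode(), USERNAME.encode())
    pass_ok = hmac.compare_digest(password.encode(), PASSWORD.encode())
    return user_ok and pass_ok


def respond(authorization_header):
    """Return the HTTP status a protected directory would send."""
    return 200 if is_authorized(authorization_header) else 401


good = "Basic " + base64.b64encode(b"friend:correct horse").decode()
print(respond(good), respond(None))
```

With a check like this in front of the directory, a crawler that does find the URL gets a 401 instead of the files. Basic credentials are only base64-encoded, so this should be used over HTTPS.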

Yes, this is the essence of why I asked this question...

How can a robot, a person, or anyone know that something exists with no prior knowledge of it: no robots.txt file that lists files, no links, nothing?

I mean, let's say you have a robots.txt file that lists the directories of your site but not the files.

Still, how can it ever find files within a directory that are never linked to?

Does it just try a zillion permutations of filenames and file extensions in that directory until it finds files? I wouldn't think so, but that's why I asked.
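
Some back-of-the-envelope arithmetic on the "zillion permutations" idea, as a Python sketch. The alphabet, the extension list, and the word list are assumptions picked for illustration. Exhaustive guessing grows far too quickly to be practical, but a short list of common names is cheap to try, and some scanning tools do probe servers that way. Note also that robots.txt is itself publicly readable, so any directory named in it is known to whoever fetches it.

```python
import string
from itertools import product

ALPHABET = string.ascii_lowercase + string.digits  # 36 characters
EXTENSIONS = ["jpg", "zip", "txt"]  # assumed, for illustration only


def brute_force_count(max_len):
    """Number of candidate names of length 1..max_len, times the extensions."""
    names = sum(len(ALPHABET) ** n for n in range(1, max_len + 1))
    return names * len(EXTENSIONS)


def wordlist_candidates(directory, words):
    """Build guessed URLs from a short list of common names."""
    return [f"{directory}{word}.{ext}" for word, ext in product(words, EXTENSIONS)]


# Exhaustive guessing: every name up to 8 characters long
print(brute_force_count(8))

# Wordlist guessing: a handful of requests against a known directory
guesses = wordlist_candidates("http://example.com/private/", ["backup", "admin", "test"])
print(len(guesses), guesses[0])
```

So a file with an unusual name in a directory without a listing is unlikely to be guessed, while something like `backup.zip` in a directory that robots.txt names is a much easier target.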

Someone could always come across the directory and say to themselves, "HEY! I like this, let's make a robot that will index these files!"

The files get indexed, then get linked to, and another robot will come across them and link to them as well.

If the web server is set to automatically index directories where no index file exists, then all the crawler needs is the directory's URL. The web server will then supply the list of files.
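
A small Python sketch of that behaviour, to show why the directory URL alone is enough. This mimics what a server does with automatic indexing turned on; it is not any particular server's implementation, and `serve_directory` and the file names are made up for the example.

```python
import os
import tempfile
from html import escape


def serve_directory(path, autoindex=True):
    """Mimic a server's response to a request for a directory."""
    index_file = os.path.join(path, "index.html")
    if os.path.isfile(index_file):
        with open(index_file, encoding="utf-8") as f:
            return 200, f.read()
    if not autoindex:
        return 403, "Forbidden"
    # No index file: generate a listing with one link per entry
    links = "\n".join(
        f'<a href="{escape(name)}">{escape(name)}</a>'
        for name in sorted(os.listdir(path))
    )
    return 200, f"<h1>Index of /</h1>\n{links}"


# Demo with a throwaway directory holding two files nobody links to
with tempfile.TemporaryDirectory() as tmp:
    for name in ("holiday.jpg", "notes.txt"):
        open(os.path.join(tmp, name), "w").close()
    status_on, body_on = serve_directory(tmp, autoindex=True)
    status_off, body_off = serve_directory(tmp, autoindex=False)

print(status_on, body_on)
print(status_off, body_off)
```

With indexing on, the generated page contains a link to every file, so an ordinary link-following crawler picks them all up. Turning the listing off (for example `Options -Indexes` in Apache or `autoindex off` in nginx) or dropping an index file into the directory removes that route.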

When I said the files aren't linked anywhere, I meant from here as well. So I guess I was right: no links, no spider can find the files.

You can't ensure that other people won't link to the directory.

- private data/images/files

Again, what's the point in storing images if they're not used anywhere? The point I'm trying to make is that you'll probably show someone the files at some point, and they might link to them, etc.

As roopurt said, if you want a limited audience, password protect it.

I agree, or save it into a database.
