How do I find a sitemap of a website -_-?


miniramen

Hello,

People have been telling me that if I want to do any crawling, I need to know the sitemap, and that I need to do XML parsing.

Then I realized that a sitemap is written in XML. Sorry, my noobness is unbearable even for me sometimes.

So if I need to find the sitemap of a website, how do I go about doing it?

https://forums.phpfreaks.com/topic/204852-how-do-i-find-a-sitemap-of-a-website-_/

You don't need the sitemap. Start by looking for a robots.txt file and respect the rules it specifies. Then read the site's index.html and obey the <meta name="ROBOTS"> tag if present. Fill your queue with every URL you find. Store whatever you think is relevant and continue with the next URL in the queue.
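To make that loop concrete, here is a rough sketch in Python (not PHP, just to keep it short and standard-library only). It is only one way to do it. The `crawl` function, the `PageParser` class, the `fetch` callback and the "example-bot" user agent are all made-up names for illustration, and it assumes you stay on one host and treat "noindex" / "nofollow" in the ROBOTS meta tag as "don't store" / "don't queue links".

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.robotparser import RobotFileParser


class PageParser(HTMLParser):
    """Collects <a href> targets and the content of <meta name="robots">."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.robots_meta = ""

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            self.robots_meta = (attrs.get("content") or "").lower()


def crawl(start_url, fetch, user_agent="example-bot", max_pages=50):
    """Queue-based crawl of a single host.

    `fetch(url)` is a hypothetical callback that returns the page text,
    or None if the page could not be retrieved.
    """
    root = urlparse(start_url)

    # Step 1: read robots.txt (a missing file means no restrictions).
    rules = RobotFileParser()
    robots_txt = fetch(f"{root.scheme}://{root.netloc}/robots.txt") or ""
    rules.parse(robots_txt.splitlines())

    queue, seen, stored = deque([start_url]), {start_url}, {}
    fetched = 0
    while queue and fetched < max_pages:
        url = queue.popleft()
        if not rules.can_fetch(user_agent, url):
            continue  # robots.txt forbids this path
        html = fetch(url)
        fetched += 1
        if html is None:
            continue

        # Step 2: obey the ROBOTS meta tag on the page itself.
        page = PageParser()
        page.feed(html)
        if "noindex" not in page.robots_meta:
            stored[url] = html  # "store whatever you think is relevant"
        if "nofollow" in page.robots_meta:
            continue

        # Step 3: fill the queue with every new same-host URL found.
        for href in page.links:
            link = urldefrag(urljoin(url, href)).url
            if urlparse(link).netloc == root.netloc and link not in seen:
                seen.add(link)
                queue.append(link)
    return stored
```

Passing `fetch` in as a parameter is just a design choice so you can try the loop against a dictionary of fake pages before pointing it at a real site. For real use you would supply something built on an HTTP client and add delays between requests.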

  • 2 years later...

If the website's URL is www.example.com, you can find its robots.txt by adding /robots.txt to the end: www.example.com/robots.txt

Then if you see something like this:

 

 

User-agent: *
Allow: /
Disallow: /inbox/
Disallow: /levels/
Disallow: /levels/extras/userpass.txt
Disallow: /users/

User-agent: Mediapartners-Google
Disallow:
#Begin Attracta SEO Tools Sitemap. Do not remove
sitemap: http://cdn.attracta.com/sitemap/2165581.xml.gz
#End Attracta SEO Tools Sitemap. Do not remove

 

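Then you can see the sitemap: it is the URL on the "sitemap:" line. robots.txt is a plain file at the root of the site, not a directory, and not every website has one. Its Disallow lines sometimes hint at where things like administrator pages live, but you won't always find a sitemap listed in it.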
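If you would rather pull the sitemap line out with code than by eye, something along these lines should work. It is a small Python sketch using the standard library's robots.txt parser. `robots_url` and `find_sitemaps` are names I made up, and `site_maps()` needs Python 3.8 or newer.

```python
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser


def robots_url(site_url):
    """Return where robots.txt should live for the given site URL."""
    return urljoin(site_url, "/robots.txt")


def find_sitemaps(robots_txt):
    """Return the sitemap URLs listed in robots.txt text, or [] if none."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.site_maps() or []  # site_maps() returns None when empty


# The robots.txt quoted in the post above.
sample = """\
User-agent: *
Allow: /
Disallow: /inbox/
Disallow: /levels/
Disallow: /levels/extras/userpass.txt
Disallow: /users/

User-agent: Mediapartners-Google
Disallow:
#Begin Attracta SEO Tools Sitemap. Do not remove
sitemap: http://cdn.attracta.com/sitemap/2165581.xml.gz
#End Attracta SEO Tools Sitemap. Do not remove
"""

print(robots_url("http://www.example.com"))
print(find_sitemaps(sample))
```

When the list comes back empty, /sitemap.xml at the site root is a common place to try next, but that is only a convention and many sites don't follow it.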

Archived

This topic is now archived and is closed to further replies.
