How do I find a sitemap of a website -_-?

miniramen · June 15, 2010

Hello,

People have been telling me that if I want to do any crawling, I need to know the sitemap, and that I need to do XML parsing.

Then I realized that a sitemap is written in XML. Sorry, my noobness is unbearable even for me sometimes.

So if I need to find a sitemap of a website, how do I go about doing it?

Bottyz · June 15, 2010

Usually sitemaps are stored in the base directory of the website, as Google prefers, so try http://www.website.com/sitemap.xml or Sitemap.xml.

Not sure if you can do a search for it?
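One way to try the usual locations is to build the candidate URLs from the site's address and request each one in turn. Here is a minimal sketch in Python; the helper name `candidate_sitemap_urls` is made up for illustration, and the list only contains the two file names mentioned above, so a real site may well use something else.

```python
from urllib.parse import urljoin

# File names to try in the site root; only the two names from this thread.
COMMON_NAMES = ("sitemap.xml", "Sitemap.xml")

def candidate_sitemap_urls(base_url, names=COMMON_NAMES):
    """Return the root-level URLs where a sitemap is conventionally stored."""
    # A leading slash makes urljoin resolve against the site root,
    # whatever page of the site base_url points at.
    return [urljoin(base_url, "/" + name) for name in names]

print(candidate_sitemap_urls("http://www.website.com/some/page.html"))
```

Each candidate still has to be fetched to see whether it exists; a 404 just means the site keeps its sitemap elsewhere, or has none.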

DavidAM · June 15, 2010

You might look for the Sitemap: entries in the robots.txt file. The entry is not part of the original robots.txt standard, but Google and other major search engines support it. You ARE checking that file anyway, right? You should be.
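Pulling those entries out of a robots.txt body takes only a few lines. A rough sketch in Python, assuming the file has already been downloaded into a string; the function name `sitemap_urls_from_robots` is hypothetical, and the sample text is invented for the example.

```python
def sitemap_urls_from_robots(robots_txt):
    """Collect the URLs given in 'Sitemap:' lines of a robots.txt body."""
    urls = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()      # drop comments
        # Split on the first colon only, so the colon in "http://" survives.
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sitemap" and value.strip():
            urls.append(value.strip())
    return urls

sample = """User-agent: *
Disallow: /private/
Sitemap: http://www.example.com/sitemap.xml
"""
print(sitemap_urls_from_robots(sample))
```

The field name is matched case-insensitively because sites write both `Sitemap:` and `sitemap:`. On Python 3.8 or newer, `urllib.robotparser.RobotFileParser` also offers a `site_maps()` method that does much the same job.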

ignace · June 15, 2010

You don't need the sitemap. Start by looking for a robots.txt file and respect the rules it specifies. Then read the index.html and obey the <meta name="ROBOTS"> tag if present. Fill your queue with any URLs you find, store whatever you think is relevant, and continue with the next URL in the queue.
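The loop described above could be sketched in Python roughly as follows. This is only an illustration under a few assumptions: the caller supplies a `fetch(url)` function that returns the page HTML as a string, the robots.txt body is passed in as text, the crawl stays on URLs that begin with the start URL, and the names `PageParser`, `crawl`, and the agent string `example-bot` are invented for the example.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag
from urllib.robotparser import RobotFileParser

class PageParser(HTMLParser):
    """Collects <a href> links and the content of <meta name="robots">."""
    def __init__(self):
        super().__init__()
        self.links = []
        self.robots_meta = ""

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            self.robots_meta = (attrs.get("content") or "").lower()

def crawl(start_url, robots_txt, fetch, agent="example-bot", limit=50):
    """Breadth-first crawl that honours robots.txt and the robots meta tag."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    queue, seen, stored = deque([start_url]), {start_url}, []
    while queue and len(stored) < limit:
        url = queue.popleft()
        if not rules.can_fetch(agent, url):
            continue                      # robots.txt forbids this URL
        page = PageParser()
        page.feed(fetch(url))
        if "noindex" not in page.robots_meta:
            stored.append(url)            # keep whatever is relevant
        if "nofollow" in page.robots_meta:
            continue                      # do not queue this page's links
        for href in page.links:
            link = urldefrag(urljoin(url, href)).url
            if link.startswith(start_url) and link not in seen:
                seen.add(link)
                queue.append(link)
    return stored
```

Passing `fetch` in as a parameter keeps the sketch independent of any HTTP library; a real crawler would also want timeouts, error handling, and a delay between requests.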

YTxMasterModzx · November 3, 2012

If the website's URL is www.example.com, you can find its robots.txt by appending /robots.txt: www.example.com/robots.txt

Then if you see something like this:

    User-agent: *
    Allow: /
    Disallow: /inbox/
    Disallow: /levels/
    Disallow: /levels/extras/userpass.txt
    Disallow: /users/

    User-agent: Mediapartners-Google
    Disallow:

    #Begin Attracta SEO Tools Sitemap. Do not remove
    sitemap: http://cdn.attracta.com/sitemap/2165581.xml.gz
    #End Attracta SEO Tools Sitemap. Do not remove

then the sitemap: line gives you the sitemap's location. Most websites have a robots.txt file in their root directory, and its Disallow rules show which paths the administrator wants crawlers to stay out of, but you won't always find a sitemap entry in it.
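A sitemap URL ending in .xml.gz, like the one above, is a gzip-compressed XML file, so it needs unpacking before the XML parsing the original question asked about. A small sketch in Python, assuming the raw bytes have already been downloaded (for example with `urllib.request.urlopen(url).read()`); the function name `urls_from_sitemap` is made up here.

```python
import gzip
import xml.etree.ElementTree as ET

def urls_from_sitemap(data):
    """Return every <loc> value from a sitemap given as raw bytes."""
    if data[:2] == b"\x1f\x8b":           # gzip magic number (.xml.gz files)
        data = gzip.decompress(data)
    root = ET.fromstring(data)
    # Compare only the local tag name, so the XML namespace
    # does not have to be hard-coded.
    return [
        el.text.strip()
        for el in root.iter()
        if el.tag.rsplit("}", 1)[-1] == "loc" and el.text
    ]
```

The same function handles a sitemap index file, since those also list their child sitemaps in `<loc>` elements; in that case each returned URL is another sitemap to download and parse.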
