
isham research
Sitemaps are a useful tool in many ways, though they currently have little or no effect on the way a site is ranked by the search engines. If your problem is ranking in the SERPs, stop reading now - concentrate on your site's content and the world's view of it as expressed by organic inbound links.
There are three basic types - plain text, HTML and XML.
Plain text sitemaps are simply lists of URIs, one per line, with no additional data. They were originally designed for batch submission of pages to search engines, but explicit submission of URIs is no longer required - it's preferable, and some say necessary, for the search engine spiders to find each site by following links. Once they find a site, properly-designed internal navigation should take them to every page. JavaScript-based navigation may not do this.
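As a rough illustration of the plain text format described above, the following Python sketch writes one URI per line and nothing else. The function name text_sitemap is hypothetical, and dropping duplicates is an assumption about sensible behaviour, not part of any specification.

```python
def text_sitemap(urls):
    """Return a plain text sitemap: one URI per line, no additional data."""
    seen = set()
    lines = []
    for url in urls:
        url = url.strip()
        # Skip blank entries and duplicates, keeping the original order.
        if url and url not in seen:
            seen.add(url)
            lines.append(url)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(text_sitemap([
        "http://www.isham-research.co.uk/index.html",
        "http://www.isham-research.co.uk/19990126.html",
    ]), end="")
```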
HTML sitemaps are the only ones rendered properly by a browser, and hence the only ones that can be regarded as human-readable, though this can be changed. If the <title> element is used to generate the anchor text, this is yet another of many reasons to make it both unique and descriptive on every page. The isham research system prepends the ISO 8601 date to the title - this ensures that the anchor text for each page changes each time that page is updated.
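The date-prefixed anchor text described above might be produced along these lines. This is only a sketch, not the isham research system itself: the function name html_sitemap and the (url, title, last_modified) tuple layout are assumptions made for illustration.

```python
from datetime import date
from html import escape


def html_sitemap(pages):
    """Build a minimal HTML list of links.

    `pages` is a list of (url, title, last_modified) tuples, where
    last_modified is a datetime.date. This layout is hypothetical.
    """
    items = []
    for url, title, modified in pages:
        # Prepend the ISO 8601 date so the anchor text changes on each update.
        anchor = f"{modified.isoformat()} {title}"
        items.append(
            f'<li><a href="{escape(url, quote=True)}">{escape(anchor)}</a></li>'
        )
    return "<ul>\n" + "\n".join(items) + "\n</ul>\n"


if __name__ == "__main__":
    print(html_sitemap([
        ("http://www.isham-research.co.uk/index.html",
         "isham research home page", date(2007, 12, 31)),
    ]), end="")
```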
XML sitemaps - often simply called "Google sitemaps" but now also supported by Yahoo, MSN, etc. - have been in vogue since their introduction by Google in June 2005. But they are not mandatory and have little or no effect on a site's ranking in the search results. So what do they do? This is a sample URL entry for the isham research home page:
<url>
<loc>http://www.isham-research.co.uk/index.html</loc>
<lastmod>2007-12-31</lastmod>
<priority>0.3</priority>
</url>
Note that the priority is set quite low within the allowable range of 0.0 to 1.0 - only 2.9% of visitors actually reach the home page. Over 80% of visitors arrive from search engines on specific landing pages.
This entry is for an archived page on the same site:
<url>
<loc>http://www.isham-research.co.uk/19990126.html</loc>
<lastmod>2006-07-03</lastmod>
<changefreq>never</changefreq>
<priority>0.0</priority>
</url>
Sadly, there is no provision in an XML sitemap for anchor text, as there is in an HTML sitemap. And some search engines don't support them anyway. So the two are not true alternatives, and the consequence is that every site should still provide an HTML sitemap.
But the XML version permits new inputs, at least one of which - priority - has business implications.
Priority is by far the most interesting of the XML sitemap parameters. It is the only explicit way for a webmaster to express a subjective view of a page's relative importance within a site, and may have serious business implications. Perhaps one product has a large inventory, or a larger profit margin than others. It should not be necessary to repeat that priority is relative within a site - setting every page to 1.0 will NOT beat everyone else on the web - it just wastes an opportunity.
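For readers who generate their sitemaps with a script, entries like the samples above could be built roughly as follows. This sketch uses Python's standard xml.etree.ElementTree module; the function name build_sitemap and the dictionary layout of each entry are assumptions. It rejects any priority outside the allowable range of 0.0 to 1.0.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Return an XML sitemap as a string.

    Each entry is a dict with 'loc' and, optionally, 'lastmod',
    'changefreq' and 'priority' (hypothetical layout for this sketch).
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = entry["loc"]
        for name in ("lastmod", "changefreq"):
            if name in entry:
                ET.SubElement(url, name).text = entry[name]
        if "priority" in entry:
            priority = float(entry["priority"])
            if not 0.0 <= priority <= 1.0:
                raise ValueError(f"priority out of range: {priority}")
            ET.SubElement(url, "priority").text = f"{priority:.1f}"
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    print(build_sitemap([
        {"loc": "http://www.isham-research.co.uk/index.html",
         "lastmod": "2007-12-31", "priority": 0.3},
    ]))
```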
Lastmod is a way of telling a search engine crawler the last date that a page was changed. This is potentially useful to the crawler, because FTP has no way of transferring or setting the change date of a page on a server - the apparent date of the file as seen by the crawler is the date it was transferred to the web server. Some FTP clients (such as CuteFTP) claim to be able to set the server-side date. Many management systems refresh all pages regularly, so the search engine crawlers may be unable to determine the true date of the file and can waste bandwidth (theirs and the site's) crawling pages that haven't changed. They cannot be fooled - even if the header says the file has been changed, a checksum will show it hasn't. Lastmod is a way of conveying the fact that a page with a newer server-side date has not in fact been changed and does not need to be crawled again. Contrary to many people's beliefs, crawl frequency is not related to ranking.
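One way a generator might keep lastmod honest is to apply the same checksum idea itself: record a hash of each page's content and only move the date forward when the hash changes. The sketch below assumes an in-memory dictionary as the store and a hypothetical function name, update_lastmod; a real system would need to persist the state between runs.

```python
import hashlib
from datetime import date


def update_lastmod(state, url, content, today):
    """Return the lastmod date for `url`, changing it only if the content did.

    `state` maps url -> (sha256 hex digest, ISO 8601 date string).
    `content` is the page body as bytes; `today` is a datetime.date.
    """
    digest = hashlib.sha256(content).hexdigest()
    previous = state.get(url)
    if previous is not None and previous[0] == digest:
        # Same bytes as last time: keep the old date, whatever the
        # file's server-side timestamp now says.
        return previous[1]
    state[url] = (digest, today.isoformat())
    return state[url][1]


if __name__ == "__main__":
    state = {}
    page = "http://www.isham-research.co.uk/index.html"
    print(update_lastmod(state, page, b"<html>v1</html>", date(2007, 1, 1)))
    print(update_lastmod(state, page, b"<html>v1</html>", date(2007, 6, 1)))
```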
Changefreq is probably the least useful and most dangerous of the XML sitemap parameters and is in most cases better left unspecified - the search engines are able to determine this for themselves. Herein lies the danger - many webmasters are uploading XML sitemaps with changefreq set to "daily" (or even "always") in the hope that search engines will crawl their sites more frequently - although, as stated above, crawl frequency and search ranking are not related. Moreover, if the search engine crawlers are told a page changes daily and they discover it really changes two or three times a year (or, in one recent case, not since 1996), will this affect their trust of the site? Possibly the only useful value is "never" on archive material - saving bandwidth by asking a search engine not to bother crawling pages that don't change. And if such a page should change, an updated <lastmod> will get it crawled.
Priority needs management. First of all, it should reflect business goals - which products are most profitable, where inventory is becoming a problem, what other marketing campaigns are running, etc. Secondly, a site's priority profile should be asymptotic - or "long tail" - just a very few high priority pages with the majority really quite low. Google's default of 0.5 is, in this context, much too high to allow fine definition of the upper part of the curve.
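A "long tail" profile of the kind described above could be approximated by letting priority fall off geometrically with a page's business rank. This is purely an illustrative sketch: the function name long_tail_priorities and the decay factor of 0.6 are assumptions, and the ranking itself still has to come from business goals.

```python
def long_tail_priorities(ranked_urls, top=1.0, decay=0.6):
    """Map each URL to a priority that decays geometrically with rank.

    `ranked_urls` is ordered from most to least important. Values are
    rounded to one decimal place, the granularity used in the samples above.
    """
    return {
        url: round(top * decay ** rank, 1)
        for rank, url in enumerate(ranked_urls)
    }


if __name__ == "__main__":
    pages = [f"http://www.isham-research.co.uk/page{i}.html" for i in range(10)]
    for url, priority in long_tail_priorities(pages).items():
        print(f"{priority:.1f} {url}")
```

With these assumed settings only the first few pages receive a high priority and the rest settle near the bottom of the range, which leaves room to distinguish the pages that matter most.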
Discovery is a benefit of both HTML and XML sitemaps: they are a useful way for all search engines to find pages that normal navigation links would not take them to. As Google puts it:
"Sitemaps are particularly beneficial when users can't reach all areas of a website through a browseable interface. (Generally, this is when users are unable to reach certain pages or regions of a site by following links). For example, any site where certain pages are only accessible via a search form would benefit from creating a Sitemap and submitting it to search engines."
https://www.google.com/webmasters/tools/docs/en/protocol.html
The above means, of course, that sitemaps can have an effect on crawling. Many sites have pages reached only via JavaScript menus, and in the absence of an external link to such pages a sitemap can be the only way to make a search engine aware of them. Most major search engines now support XML sitemaps and will discover them via robots.txt - so once this system is set up, one sitemap change is enough.
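Discovery via robots.txt relies on a "Sitemap:" line giving the full URL of the sitemap. The sketch below adds that line to an existing robots.txt if it is not already present. The function name add_sitemap_directive and the file name sitemap.xml are assumptions for illustration.

```python
def add_sitemap_directive(robots_txt, sitemap_url):
    """Return robots.txt content with a Sitemap line for `sitemap_url`."""
    lines = robots_txt.splitlines()
    for line in lines:
        # The directive name is matched case-insensitively; the URL is not.
        if line.lower().startswith("sitemap:"):
            if line.split(":", 1)[1].strip() == sitemap_url:
                return robots_txt  # already listed, nothing to do
    return "\n".join(lines + [f"Sitemap: {sitemap_url}"]) + "\n"


if __name__ == "__main__":
    print(add_sitemap_directive(
        "User-agent: *\nDisallow:\n",
        "http://www.isham-research.co.uk/sitemap.xml",  # assumed location
    ), end="")
```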
As you might imagine, isham research has a sitemap generator that addresses these and other issues.