
Sunday January 20, 2008

Sitemaps

I wanted the transition to the new website to go as smoothly as possible. In the past there were certain things you could do to help search engines categorize your pages. Some are simply part of HTML, like using a title tag and using header tags to identify the important parts of a page. But you can also add META tags. A description of the site is useful, and the descriptions I used would usually show up in the search results under the title of my page. At one time it was also good to include keywords, but those seem less important now. Instead, the search engines pick out their own keywords from the text of your page.

Looking into all of that again, I signed up for Google's Webmaster Tools service (they had me verify that I owned my site by uploading a specially named file to it). Google pointed me toward a special sitemap.xml file that tells search engines where all of my pages are located and how often they are updated. Though Google seems to have originated this, a lot of other search engines use it too. Many websites have an HTML sitemap for visitors that lists all of the pages on the site, like a table of contents. This file is different: it is XML and is intended only for the search engines. A piece of it might look like this:


   <url>
      <loc>http://igirder.com/index.html</loc>
      <lastmod>2008-01-19T07:50:56+00:00</lastmod>
      <priority>0.90</priority>
      <changefreq>weekly</changefreq>
   </url>

A record is created for each page. Google points to an open source project, written in Python, that will create a sitemap.xml file for your site as long as your server supports Python. I have no idea whether my server supports it, so I looked for other options. I clicked on a Google ad for a site called sitemap.xmlecho.org that promised to produce a free sitemap. You had to register with an e-mail address and then confirm that address. Then you entered your site's address. Sometimes you get what you pay for: the result was basically the text shown above, generated for the address I had entered. I think it was just a scam to collect a valid e-mail address, and they only used my input to produce the entry for that one page.
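The open source Python project Google points to does a lot more than this, but the core of the job is small once you have a list of pages. Here is a minimal Python sketch (not Google's tool) that writes a sitemap in the format shown above. The function name `build_sitemap` and the tuple layout of the page records are made up for illustration; the `urlset` namespace is the one published by the sitemaps.org protocol.

```python
from datetime import datetime, timezone
from xml.sax.saxutils import escape

# Namespace defined by the sitemaps.org protocol, version 0.9.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return sitemap.xml text for an iterable of page records.

    Hypothetical record layout: (loc, lastmod, changefreq, priority),
    where lastmod is a timezone-aware datetime.
    """
    lines = ['<?xml version="1.0" encoding="UTF-8"?>',
             f'<urlset xmlns="{SITEMAP_NS}">']
    for loc, lastmod, changefreq, priority in pages:
        # One <url> record per page, children in the order the protocol lists them.
        lines.append("   <url>")
        # Escape &, < and > so URLs with query strings stay valid XML.
        lines.append(f"      <loc>{escape(loc)}</loc>")
        # W3C Datetime format, e.g. 2008-01-19T07:50:56+00:00
        lines.append(f"      <lastmod>{lastmod.isoformat()}</lastmod>")
        lines.append(f"      <changefreq>{changefreq}</changefreq>")
        lines.append(f"      <priority>{priority:.2f}</priority>")
        lines.append("   </url>")
    lines.append("</urlset>")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    home = ("http://igirder.com/index.html",
            datetime(2008, 1, 19, 7, 50, 56, tzinfo=timezone.utc),
            "weekly", 0.9)
    print(build_sitemap([home]))
```

On a real server you would presumably build the page list by walking the site's files, using each file's modification time for `lastmod`, and rerun the script whenever the site changes.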

I did a Google search and found a site called freesitemapgenerator.com that would do it. Again I had to enter an e-mail and validate it. I also had to validate that I owned the site by uploading a specially named file to my site. It then put my site in a queue to be processed. I was number 202. Not bad, I thought. I hit refresh to see how fast it was moving through the queue. Still 202. I waited about five minutes. Still 202. I was beginning to think this was a scam too. But about 15 minutes later I had worked my way to 199. By the next morning I was down to 98. That evening I was down to 54. The following morning I was down to 2! At about 10:30 it started processing my website. A few minutes later it was still processing, with one page identified. An hour later it was up to 15 pages. A couple of hours later it was up to 33 pages and 76% complete. But I knew it hadn't found my movie reviews yet. An hour later it had found a couple of hundred pages and was down to 26% complete. When I went to bed last night it was 80% complete. I'm thinking this person must be typing all of the code in by hand or something. I don't understand why Google, with a significant portion of the world's computing power, needs a sitemap in the first place. The irony of this is that while I was waiting to create a sitemap, Google started referring people to the site anyway.

This morning the file was done (56 hours after I first submitted it). They even pointed out some errors in an image map on my Galapagos page (which I have now fixed). I worried that the file would be completely useless, but it actually seems pretty good, with no duplicates. I checked my AWStats and it looks like a lot of the movie pages were loaded 4 times each, which I suspect was all the sitemap generator. It certainly can't be working very efficiently.
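I don't know how that generator works internally, but a crawler only needs to download each page once if it remembers which URLs it has already queued. Here is a minimal Python sketch of that idea. `crawl`, `LinkCollector`, and the `fetch` callback are hypothetical names, and the actual downloading is left to whatever `fetch` the caller supplies.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href values of <a> tags in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch):
    """Breadth-first crawl that fetches each same-site URL at most once.

    fetch(url) is a hypothetical callable returning the page's HTML,
    or None if the page can't be loaded.
    """
    site = urlparse(start_url).netloc
    seen = {start_url}          # every URL ever queued, so none is queued twice
    queue = deque([start_url])
    found = []
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue            # broken link: skip it
        found.append(url)
        parser = LinkCollector()
        parser.feed(html)
        parser.close()
        for href in parser.links:
            # Resolve relative links and drop #fragments so that
            # page.html and page.html#top count as the same page.
            absolute, _ = urldefrag(urljoin(url, href))
            if urlparse(absolute).netloc == site and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return found
```

With the `seen` set, a page linked from a hundred other pages is still requested only once, so each movie page would show up a single time in the server logs.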

After downloading and unzipping the file, I uploaded the sitemap along with a robots.txt file pointing to sitemap.xml. This was recommended by a Wikipedia article on Google Sitemaps, which also links to each search engine's page for submitting sitemaps. I wound up signing up for webmaster accounts at Live and Yahoo (and uploaded a unique verification file for each of them). For Ask, all I had to do was enter a URL containing the location of my sitemap. Since I am verifying site ownership anyway, it seems like Google and the others could let me tell them directly that I am moving the site from one location to another. Then I could avoid a lot of this transition hassle.
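Pointing robots.txt at the sitemap takes just one extra line: the `Sitemap:` directive from the sitemaps.org protocol. The robots.txt below is an example, not necessarily the exact file I uploaded, and it assumes the sitemap sits at the site root. The sketch uses Python's standard `urllib.robotparser` to read the directive back; `sitemaps_in` is a made-up helper name, and `site_maps()` needs Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt: allow everything and point crawlers at the sitemap.
# The sitemap URL is an assumed location at the site root.
ROBOTS_TXT = """\
User-agent: *
Disallow:

Sitemap: http://igirder.com/sitemap.xml
"""


def sitemaps_in(robots_text):
    """Return the Sitemap URLs declared in a robots.txt body (empty list if none)."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    # site_maps() returns None when no Sitemap line is present.
    return parser.site_maps() or []


if __name__ == "__main__":
    print(sitemaps_in(ROBOTS_TXT))
```

The `Sitemap:` line is independent of the `User-agent` groups, so it can go anywhere in the file.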

Comments (1)

I read later that some people reported lower traffic after providing sitemaps. Since I was seeing less traffic, I decided to delete my sitemaps as well. Also, I figure that if the sitemaps aren't updated regularly (ideally they are generated automatically by your own server), the search engines probably don't like that. Lastly, if the site is easy to navigate with plain HTML links (as opposed to PHP), I doubt the search engines need much help finding your pages.
