Thursday, October 07, 2004

Hiding Navigation Links

Preventing Search Robots from Indexing Navigation Links

I added search boxes to this site with both Yahoo and Google and found that the search results matched the navigation links more often than the content of the pages.  I tried a few fixes and finally settled on the following three-step approach, which I hope will keep the search indexes focused on page content (but only after messing around with Google and having my pages removed from their results...hope to be re-indexed shortly).

This site is a blog hosted on the blogger.com service.  I have set my archive pages to include only links to each of the posts, with no content.  The current posts page (the index) contains the 10 most recent posts.  However, these posts change frequently, so any index of that page will generally produce outdated search results.  Finally, I have enabled Post Pages (so each post has its own permanent page).  Therefore, the best pages for searchbots to index are the Post Pages.  With this in mind, I took the following steps:
  1. Added this code to my template:

    <mainpage>
    <meta name="robots" content="follow,noindex">
    <meta name="revisit-after" content="15 days">
    <meta http-equiv="Pragma" content="no-cache">
    <meta http-equiv="Cache-Control" content="no-cache">
    <META NAME="ROBOTS" CONTENT="NOARCHIVE">
    </mainpage>

    <itempage>
    <meta name="robots" content="follow,index">
    <meta name="revisit-after" content="45 days">
    </itempage>

    <archivepage>
    <meta name="robots" content="follow,noindex">
    <meta name="revisit-after" content="60 days">
    <meta http-equiv="Pragma" content="no-cache">
    <meta http-equiv="Cache-Control" content="no-cache">
    <META NAME="ROBOTS" CONTENT="NOARCHIVE">
    </archivepage>


    This prevents the index and archive pages from being indexed, but still instructs the robots to follow the links contained on those pages.
        
  2. Created a robots.txt file with the following information:

    User-agent: *
    Disallow: /files/
    Disallow: /vmj/victor/januario/?q=node/feed


    This prevents my utility files and Atom/RSS feeds from being indexed.  Both produced misleading search results.  The ?q=node/feed file was especially misleading because most browsers could not display it readably and it was frequently modified (to reflect the index page).

  3. Removed the recent posts list from my blog.  The list was useless on my index page because every recent post was already displayed there, and it was equally useless on my archive pages.  The only pages where it was moderately useful were the Post Pages, where it let a reader navigate to the posts that preceded the one displayed.  However, since this blog covers very diverse topics, I did not think that following the posts chronologically was the best way to read it.

    I opted to replace the recent post list with a JavaScript pull-down jump menu.  The JavaScript is located in a linked file in my /files/ directory.  This facilitates updating the links and prevents the links from appearing in the post pages (thereby preventing the links from being indexed).
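As a quick sanity check on the robots.txt rules from step 2, here is a minimal sketch using Python's standard urllib.robotparser module. It only shows how a parser that follows the basic exclusion rules would read the file; real crawlers may interpret it differently. The helper name is_allowed and the sample paths /files/menu.js and /2004/10/some-post.html are made up for illustration and are not taken from this site.

```python
from urllib.robotparser import RobotFileParser

# The rules from step 2, copied as-is.
ROBOTS_TXT = """\
User-agent: *
Disallow: /files/
Disallow: /vmj/victor/januario/?q=node/feed
"""


def is_allowed(path, robots_txt=ROBOTS_TXT, agent="*"):
    """Return True if `agent` may fetch `path` under the given rules.

    Hypothetical helper: it parses the rules from a string rather than
    fetching robots.txt over the network.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    # Sample paths are assumptions, except the feed URL from step 2.
    for path in (
        "/2004/10/some-post.html",
        "/files/menu.js",
        "/vmj/victor/januario/?q=node/feed",
    ):
        print(path, "allowed" if is_allowed(path) else "blocked")
```

If the sketch behaves as intended, the post page is reported as allowed while the linked JavaScript file and the feed are reported as blocked, which is the split the three steps are aiming for.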
       
Thus far, this approach seems to be working well to prevent search engine robots (or spiders) from indexing my navigation links.  It has also made it easier to view posts by category rather than chronologically.  There are two drawbacks.  First, I have to manually maintain the navigation links to the individual post pages.  Second, users with JavaScript disabled will not be able to use the pull-down jump menu (although the site is still navigable by following the links to the archive pages).  A possible third drawback that I have not yet checked for is that search engines may not honor the 'follow' instruction in the META tags.  If that's the case, the site will not be indexed at all, and I'll rework this solution and update this page once I find out.
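Whether a given crawler honors these META tags can only be seen in its results, but it is easy to confirm which directives each page type actually carries. The following is a minimal sketch using Python's standard html.parser module. The names RobotsMetaParser and robots_directives are made up for illustration, and the two sample heads are assumed to match what the template in step 1 renders for the index page and for a Post Page.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives from every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <META NAME=...>
        # arrives here as tag "meta" with attribute "name".
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        for part in content.split(","):
            part = part.strip().lower()
            if part:
                self.directives.add(part)


def robots_directives(html):
    """Return the set of robots directives found in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


# Assumed rendered output of the <mainpage> and <itempage> blocks in step 1.
MAIN_PAGE_HEAD = """
<meta name="robots" content="follow,noindex">
<meta name="revisit-after" content="15 days">
<META NAME="ROBOTS" CONTENT="NOARCHIVE">
"""
ITEM_PAGE_HEAD = """
<meta name="robots" content="follow,index">
<meta name="revisit-after" content="45 days">
"""

if __name__ == "__main__":
    print(sorted(robots_directives(MAIN_PAGE_HEAD)))
    print(sorted(robots_directives(ITEM_PAGE_HEAD)))
```

Running the same function over the saved source of a live index, archive, and post page would show whether the template conditionals are emitting the intended tags on each page type.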
  
  

 

 
