Hiding Navigation Links
Preventing Search Robots from Indexing Navigation Links
I added search boxes to this site with both Yahoo and Google and found that the search results were matching the navigation links more than the content of the pages. I tried a few fixes and finally settled on the following three-step approach, which I hope will keep the search indexes focused on page content (but only after messing around with Google and having my pages removed from their results... I hope to be re-indexed shortly).
This site is a blog, using the blogger.com service. I have set my archive pages to include only links to each of the posts, without any content. The current posts page (or index) contains the 10 most recent posts. However, these posts change frequently, so any indexing of that page will generally produce outdated search results. Finally, I have chosen to enable Post Pages (so each post has its own permanent page). Therefore, the best pages for searchbots to index are the Post Pages. With this in mind, I took the following steps:
- Added this code to my template:
<mainpage>
<meta name="robots" content="follow,noindex">
<meta name="revisit-after" content="15 days">
<meta http-equiv="Pragma" content="no-cache">
<meta http-equiv="Cache-Control" content="no-cache">
<META NAME="ROBOTS" CONTENT="NOARCHIVE">
</mainpage>
<itempage>
<meta name="robots" content="follow,index">
<meta name="revisit-after" content="45 days">
</itempage>
<archivepage>
<meta name="robots" content="follow,noindex">
<meta name="revisit-after" content="60 days">
<meta http-equiv="Pragma" content="no-cache">
<meta http-equiv="Cache-Control" content="no-cache">
<META NAME="ROBOTS" CONTENT="NOARCHIVE">
</archivepage>
This prevents the index and archive pages from being indexed, but still instructs the robots to follow the links contained on those pages.
- Created a robots.txt file with the following information:
User-agent: *
Disallow: /files/
Disallow: /vmj/victor/januario/?q=node/feed
This prevents my utility files and Atom/RSS feeds from being indexed. Both provided misleading search results. The ?q=node/feed file was especially misleading because it was not readable by most browsers and was also frequently modified (to reflect the index page).
- Removed the recent posts list from my blog. The recent posts list was useless on my index page because each of the recent posts was already displayed on that page. This list was also useless on my archive pages. The only pages on which the recent posts list was moderately useful were the Post Pages, where it would allow a reader to navigate to the posts that preceded the one displayed. However, since this blog contains posts on very diverse topics, I did not think that following the posts chronologically was the best way to view this blog.
I opted to replace the recent posts list with a JavaScript pull-down jump menu. The JavaScript is located in a linked file in my /files/ directory. This makes the links easy to update and keeps them out of the post pages (thereby preventing the links from being indexed).
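The pull-down jump menu mentioned in this post could be sketched roughly as follows. This is only an illustration, not the actual file used on this site: the link list, the function names buildJumpMenu and escapeHtml, and the placeholder id "jump-menu" are all assumptions made for the example. It also carries over the post's assumption that a crawler does not follow links generated by a script.

```javascript
// Hypothetical link list; a real file would list the blog's own post URLs.
const links = [
  { title: "Example post one", url: "/2005/01/example-post-one.html" },
  { title: "Example post two", url: "/2005/02/example-post-two.html" },
];

// Escape characters that are unsafe inside HTML text and attribute values.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Build the <select> markup as a string. Choosing an option navigates
// to its URL; the empty first option only serves as a label.
function buildJumpMenu(items) {
  const options = items
    .map(
      (item) =>
        `<option value="${escapeHtml(item.url)}">${escapeHtml(item.title)}</option>`
    )
    .join("");
  return (
    '<select onchange="if (this.value) window.location.href = this.value;">' +
    '<option value="">Jump to a post...</option>' +
    options +
    "</select>"
  );
}

// In a browser, fill a placeholder element (hypothetical id "jump-menu").
// The script must be included after that element in the template.
if (typeof document !== "undefined") {
  const holder = document.getElementById("jump-menu");
  if (holder) holder.innerHTML = buildJumpMenu(links);
}
```

With this arrangement the template itself would hold only an empty placeholder element and a script tag pointing at a file under /files/ (the file name is up to you), so the post URLs never appear in the HTML of the post pages, and the robots.txt rule for /files/ keeps the script file itself out of the index.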