Duplicate content on Drupal sites

Date: Wed Dec 31 2008 Drupal
Search engines downgrade sites that appear to have duplicated content, because duplication is often the sign of a spam site attempting to flood the search engines with lots of pages. Unfortunately, some features of Drupal make a legitimate site look as if it has duplicate content. For example, if you've turned on clean URLs and the pathauto module for SEO purposes, you have a nice user-friendly URL for each page, but Drupal still serves the same page at the "node/12345" URL. Drupal will also, to be helpful to the user, treat "example.com/friendly/url" and "example.com/friendly/url/" as the same page and show the same content, but because of the trailing slash the search engines see those as two separate pages with duplicate content.
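To make the trailing-slash case concrete, here is a minimal Python sketch (not Drupal code; the function name is made up for illustration) that maps both spellings of a path onto a single canonical form, which is the comparison a search engine does not do for you:

```python
def strip_trailing_slash(path: str) -> str:
    """Return the path without trailing slashes, leaving the root "/" alone."""
    if len(path) > 1 and path.endswith("/"):
        # rstrip could empty a path made only of slashes, so fall back to "/"
        return path.rstrip("/") or "/"
    return path


# The two URLs differ as strings, yet name the same Drupal page.
a = "/friendly/url"
b = "/friendly/url/"
same_page = strip_trailing_slash(a) == strip_trailing_slash(b)
```

Here `a != b` as plain strings, which is all a crawler sees, while `same_page` is true once both are normalized.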

Dealing With Duplicate Content in Drupal: My Approach discusses this problem ...

Global Redirect takes care of duplicates between the "node/12345" and "friendly/url/" URLs. If the node/12345 URL is requested for a page that has a URL alias, Global Redirect issues a 301 redirect to the alias. This tells the search engines that the aliased URL is the correct one and that the node/12345 URL should be ignored. However, browsing the Global Redirect discussions on drupal.org, I see many reports of incompatibility with other modules, along with some discussion of moving Global Redirect into Drupal core. I believe it should be in core, but only if the incompatibilities can be resolved.
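The redirect decision described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Global Redirect's actual code; the `ALIASES` table and the `redirect_for` function are hypothetical stand-ins for Drupal's URL alias storage and the module's request handling:

```python
from typing import Optional, Tuple

# Hypothetical alias table: internal Drupal path -> friendly alias.
ALIASES = {"node/12345": "friendly/url"}


def redirect_for(path: str) -> Optional[Tuple[int, str]]:
    """Return (301, target) if the requested path has an alias, else None."""
    internal = path.strip("/")
    alias = ALIASES.get(internal)
    if alias is not None:
        # Permanent redirect: tells crawlers the alias is the real URL.
        return (301, "/" + alias)
    # No alias, or the alias itself was requested: serve the page as usual.
    return None
```

With this table, `redirect_for("/node/12345")` gives `(301, "/friendly/url")`, while `redirect_for("/friendly/url")` gives `None`, so the aliased page is served normally.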

A common source of seemingly duplicate pages is the listings Drupal generates that span multiple pages. The URL of each page differs only by the "?page=##" appended to it, and often the page title is the same on every page. In my Google Webmaster Tools account, pages with duplicate title tags are reported in the Content Analysis section. Among the supposed duplicates are "/blog?page=##", "/tracker?page=##" and similar pages.
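A rough Python sketch of the kind of check that produces such a report: group URLs by title and keep any title shared by more than one URL. The sample URLs and titles are invented for illustration, and this is an assumption about how the report behaves, not a description of Google's implementation:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def duplicate_titles(pages: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map each title used by more than one URL to the URLs that share it."""
    by_title: Dict[str, List[str]] = defaultdict(list)
    for url, title in pages:
        by_title[title].append(url)
    return {title: urls for title, urls in by_title.items() if len(urls) > 1}


# Invented sample data: paged listings that all carry the same title.
SAMPLE_PAGES = [
    ("/blog", "Blogs | example.com"),
    ("/blog?page=1", "Blogs | example.com"),
    ("/blog?page=2", "Blogs | example.com"),
    ("/friendly/url", "A friendly page | example.com"),
]

report = duplicate_titles(SAMPLE_PAGES)
```

In this sample the three "/blog" URLs end up grouped under one title, while the single article page is not reported.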

This discussion shows robots.txt configuration tricks to reduce duplicate pages.
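As an illustration of the general idea, and not a rule taken from that discussion, the Python sketch below checks a hypothetical robots.txt rule with the standard library's `urllib.robotparser`. The `Disallow: /node/` line is an assumption chosen for the example; it would also hide nodes that have no alias, so test any such rule against your own site before using it:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rule for illustration only: keep crawlers off raw node paths.
ROBOTS_TXT = """\
User-agent: *
Disallow: /node/
"""


def is_crawlable(url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Report whether a generic crawler may fetch the URL under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("*", url)
```

Under this rule, `is_crawlable("http://example.com/node/12345")` is false and `is_crawlable("http://example.com/friendly/url")` is true. Note that the standard-library parser only does simple prefix matching, so it is not a reliable way to test wildcard patterns.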