Cure for FAIL when using RedirectMatch with clean URLs on Drupal

Date: Tue Sep 15 2009
I've had a problem with RedirectMatch on some of my sites: it FAILs in combination with Drupal's reliance on mod_rewrite to provide clean URLs. Over the years I've used different technologies to build sites, and on occasion I have converted a site built from static web pages into one driven by Drupal. This has meant a web of .htaccess files listing redirects from each old URL to its new URL.

The goal is to help your readers continue to reach the pages you wrote even when the location (URL) of a page changes. To do so you put a Redirect or RedirectMatch directive in an Apache configuration file (.htaccess); it causes an HTTP redirect that tells the web browser the request needs to go to some other URL. The HTTP redirect also informs search engines (such as Google) of the new URL and that the page rank juice belongs there. Finally, if any page out there links to the old URL, traffic coming through that link gets redirected to the new URL rather than landing on an error page.

The following are a couple of typical examples (see Apache's documentation on RedirectMatch):


RedirectMatch permanent ^/index.*\.html$ /
RedirectMatch permanent ^/about\.html$ /about
RedirectMatch permanent path/to/some/article\.html$ http://example.com/node/7865

Unfortunately this doesn't work very well on a Drupal site. I believe the problem lies with a core FAIL: Drupal (by default) uses UGLY URLs and relies on Apache's rewrite module as a stupid hack to get around its inability to support clean URLs out of the box. That step in the Drupal installation where it checks whether clean URLs can be supported? Why does it have to check for that? Why can't Drupal just do clean URLs from the get-go without doing weirdo unnatural things with mod_rewrite?

| Scenario | URL |
| --- | --- |
| Default Drupal URL | http://example.com/index.php?q=node/7865 |
| Desired URL | http://example.com/user-name/article-title |
| URL when using RedirectMatch and mod_rewrite based Drupal clean URLs | http://example.com/user-name/article-title?q=path/to/original/url |

The last row of that table is the subject of this tutorial. I've got several forms of the behavior in my sites, and have a cure for one of them.

| Scenario | Result |
| --- | --- |
| Redirect old URL to new URL within the same site | Fix known |
| Redirect old URL to new URL on a different Drupal site | Fix not known, FAIL |

Consider:


RedirectMatch permanent ^/about\.html$ /about
RedirectMatch permanent ^/node/1234$ /node/4321

The first is the case of converting a site built with static HTML pages into one built with Drupal. The second is a similar scenario: you've written a page on a Drupal site (http://example.com/node/1234) that you no longer want to exist, the old page still gets significant traffic, and you have a replacement page with better content.

The result will be

| User visits URL | Browser redirects to (with FAILURE) |
| --- | --- |
| http://example.com/about.html | http://example.com/about?q=about.html (plus, the user is told it's an unknown page) |
| http://example.com/node/1234 | http://example.com/node/4321?q=node/1234 (plus, the user is told it's an unknown page) |
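One plausible reading of the mechanism, offered as an assumption rather than something confirmed by Drupal's documentation: in .htaccess context mod_rewrite does its work first and turns the request into index.php?q=..., then mod_alias matches the original path and issues the redirect, carrying the already-rewritten query string along. The annotated sketch below uses Drupal's clean-URL rule roughly as shipped around Drupal 6; check it against your own .htaccess.

```apache
# Drupal's stock clean-URL rule (assumed; compare with your copy).
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ index.php?q=$1 [L,QSA]

# A mod_alias redirect living in the same .htaccess.
RedirectMatch permanent ^/about\.html$ /about

# Request:   GET /about.html
# Step 1:    mod_rewrite maps it internally to index.php?q=about.html
# Step 2:    mod_alias still sees the original path, matches it, and
#            redirects, appending the query string set in step 1
# Response:  301, Location: http://example.com/about?q=about.html
```

If that reading is right, the stray ?q= comes from mixing the two modules, which is why doing the redirect inside mod_rewrite itself avoids it.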

I found a working solution at http://drupal.org/node/142947 (note that it dates from 2007). The suggestion is to put something like this before the mod_rewrite section of Drupal's .htaccess file:


RewriteRule ^keywords/(.*)\.html$ /tags/$1.html [R=301,L]

This works for the following cases within the same site:


RewriteRule ^index.*\.html$ / [R=301,L]
RewriteRule ^about\.html$ /about [R=301,L]

But not for the following case that redirects the browser to a different (Drupal) site:


RewriteRule path/to/some/article\.html$ http://example.com/node/7865 [R=301,L]

And of course, because you've modified one of Drupal's core files, you have to remember not to overwrite .htaccess when you update Drupal core (which happens every 2-3 months or so, depending on the rate of security patches).
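To make the placement concrete, here is a minimal sketch of how the edited .htaccess might look. It assumes a layout close to the stock Drupal 6 file and puts the custom rules inside the existing mod_rewrite block, right after RewriteEngine on; your copy may differ, so treat the surrounding lines as illustrative.

```apache
<IfModule mod_rewrite.c>
  RewriteEngine on

  # Custom redirects go first, so they fire before Drupal's own rule
  # rewrites the path to index.php?q=...
  RewriteRule ^index.*\.html$ / [R=301,L]
  RewriteRule ^about\.html$ /about [R=301,L]
  RewriteRule ^keywords/(.*)\.html$ /tags/$1.html [R=301,L]

  # Drupal's stock clean-URL rule (assumed; compare with your copy).
  RewriteCond %{REQUEST_FILENAME} !-f
  RewriteCond %{REQUEST_FILENAME} !-d
  RewriteCond %{REQUEST_URI} !=/favicon.ico
  RewriteRule ^(.*)$ index.php?q=$1 [L,QSA]
</IfModule>
```

Keeping the custom rules together in one clearly commented group also makes them easier to copy back in after a core update replaces the file.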

I should also discuss the Path Redirect module. It performs a similar service to the above Apache directives, but as a Drupal module. It's useful and probably handles the above redirects without the failure modes described here. What kept me from using it is that you enter the URLs one at a time in the admin form, and there is no way to supply a pattern match on the URL. So it depends on how many URLs you have to redirect. If you have only a few, path_redirect will do the job fine. In my case I had hundreds of URLs, and the RedirectMatch directive reduced the number of mappings to a few dozen patterns.
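As an illustration of that difference in scale, here is a sketch with made-up paths (the /articles and /archive layouts are hypothetical, not from any of my sites). It uses the RewriteRule form of the fix and assumes the rule sits ahead of Drupal's own rewrite rule in .htaccess.

```apache
# Hypothetical old static layout:
#   /articles/2005/first-post.html
#   /articles/2006/another-post.html
#   ... hundreds more ...
# Hypothetical new Drupal aliases:
#   /archive/2005/first-post
#   /archive/2006/another-post

# One pattern covers all of them: the year and the title are captured
# and reused in the target.
RewriteRule ^articles/([0-9]{4})/([^/]+)\.html$ /archive/$1/$2 [R=301,L]
```

With Path Redirect, each of those pages would need its own entry in the admin form.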