Excerpted from Professional Search Engine Optimization with ASP.NET: A Developer's Guide to SEO

URL Rewriting Using ISAPI_Rewrite

by Cristian Darie and Jaimie Sirovich

“Click me!” If the ideal URL could speak, its speech would resemble the communication of an experienced salesman. It would grab your attention with relevant keywords and a call to action, and it would persuasively argue that one should choose it instead of the others. Other URLs on the page would pale in comparison.

URLs are more visible than many realize, and they are a contributing factor in click-through rate (CTR). They are often cited directly in copy, and they occupy approximately 20% of the real estate in a given search engine result page. Apart from “looking enticing” to humans, URLs must be friendly to search engines. URLs function as the “addresses” of all content in a web site. If confused by them, a search engine spider may not reach some of your content in the first place. This would clearly reduce search engine friendliness.

So let’s enumerate all of the benefits of placing keywords in URLs:

           1.        Doing so has a small beneficial effect on search engine ranking in and of itself.

           2.        The URL is roughly 20% of the real estate you get in a SERP result. It functions as a call to action and increases perceived relevance.

           3.        The URL appears in the status bar of a browser when the mouse hovers over anchor text that references it. Again—it functions as a call to action and increases perceived relevance.

           4.        Keyword-based URLs tend to be easier to remember than ?ProductID=5&CategoryID=2.

           5.        Query keywords, including those in the URL, are highlighted in search result pages.

           6.        Often, the URL is cited as the actual anchor text, that is:

 <a href="http://www.example.com/foo.html">http://www.example.com/foo.html</a>

                     Obviously, a user is more likely to click a link to a URL that contains relevant keywords, than a link that does not. Also, because keywords in anchor text are a decisive ranking factor, having keywords in the URL-anchor-text will help you rank better for “foos.”

To sum up these benefits in one phrase:

Keyword-rich URLs are more aesthetically pleasing and more visible, and are likely to enhance your CTR and search engine rankings.

Implementing URL Rewriting

The hurdle we must overcome to support keyword-rich URLs like those shown earlier is that they don’t actually exist anywhere in your web site. Your site still contains a script—named, say, Product.aspx—which expects to receive parameters through the query string and generate content depending on those parameters. This script would be ready to handle a request such as this:

http://www.example.com/Product.aspx?ProductID=123

but your web server would normally generate a 404 error if you tried any of the following:

http://www.example.com/Products/123.html

http://www.example.com/my-super-product.html

URL rewriting allows you to transform the URL of such an incoming request (which we’ll call the original URL) to a different, existing URL (which we’ll call the rewritten URL), according to a defined set of rules. You could use URL rewriting to transform the previous nonexistent URLs to Product.aspx?ProductID=123, which does exist.
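As a rough illustration of the idea, the following C# console sketch maps an original URL to a rewritten URL with a single rule. It is not how ISAPI_Rewrite works internally: the class name UrlRewriteSketch, the Rewrite method, and the /Products/<number>.html pattern are assumptions made for this example, and the pattern syntax it relies on is explained later in the chapter.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UrlRewriteSketch
{
    // Hypothetical rule: /Products/<digits>.html -> /Product.aspx?ProductID=<digits>
    private static readonly Regex ProductPattern =
        new Regex(@"^/Products/([0-9]+)\.html$");

    // Returns the rewritten URL, or the original URL unchanged when the rule does not match.
    public static string Rewrite(string originalUrl)
    {
        // $1 inserts the digits captured by the first pair of parentheses
        return ProductPattern.Replace(originalUrl, "/Product.aspx?ProductID=$1");
    }

    public static void Main()
    {
        Console.WriteLine(Rewrite("/Products/123.html")); // matches the rule
        Console.WriteLine(Rewrite("/About.html"));        // no match, left as-is
    }
}
```

Run as a console program, this prints /Product.aspx?ProductID=123 for the first URL and leaves /About.html untouched, which is the "original URL in, rewritten URL out" behavior described above.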

If you happen to have some experience with the Apache web server, you probably know that it ships by default with the mod_rewrite module, which is the standard way to implement URL rewriting in the LAMP (Linux/Apache/MySQL/PHP) world. That is covered in the PHP edition of this book.

Unfortunately, IIS doesn’t ship by default with such a module. IIS 7 contains a number of new features that make URL rewriting easier, but it will take a while until all existing IIS 5 and 6 web servers are upgraded. Third-party URL-rewriting modules for IIS 5 and 6 do exist, along with several URL-rewriting libraries, hacks, and techniques. Whether each of them can be used depends on your version and configuration of IIS and on your version of ASP.NET. In this chapter we try to cover the most relevant scenarios by providing practical solutions.

To understand why an apparently easy problem—that of implementing URL rewriting—can become so problematic, you first need to understand how the process really works. To implement URL rewriting, there are three steps:

           1.        Intercept the incoming request. When implementing URL rewriting, it’s obvious that you need to intercept the incoming request, which usually points to a resource that doesn’t exist on your server physically. This task is not trivial when your web site is hosted on IIS 6 and older. There are different ways to implement URL rewriting depending on the version of IIS you use (IIS 7 brings some additional features over IIS 5/6), and depending on whether you implement rewriting using an IIS extension, or from within your ASP.NET application (using C# or VB.NET code). In this latter case, usually IIS still needs to be configured to pass the requests we need to rewrite to the ASP.NET engine, which doesn’t usually happen by default.

           2.        Associate the incoming URL with an existing URL on your server. There are various techniques you can use to calculate what URL should be loaded, depending on the incoming URL. The “real” URL usually is a dynamic URL.

           3.        Rewrite the original URL to the rewritten URL. Depending on the technique used to capture the original URL and the form of the original URL, you have various options to specify the real URL your application should execute.

The result of this process is that the user requests a URL, but a different URL actually serves the request. The rest of the article covers how to implement these steps using ISAPI_Rewrite by Helicontech. For background information on how IIS processes incoming requests, we recommend Scott Mitchell’s article “How ASP.NET Web Pages are Processed on the Web Server,” located at http://aspnet.4guysfromrolla.com/articles/011404-1.aspx.

URL Rewriting with ISAPI_Rewrite v2

Using a URL rewriting engine such as Helicon’s ISAPI_Rewrite has the following advantages over writing your own rewriting code:

*        Simple implementation. Rewriting rules are written in configuration files; you don’t need to write any supporting code.

*        Task separation. The ASP.NET application works just as if it was working with dynamic URLs. Apart from the link building functionality, the ASP.NET application doesn’t need to be aware of the URL rewriting layer of your application.

*        You can easily rewrite requests for resources that are not processed by ASP.NET by default, such as those for image files, for example.

To process incoming requests, IIS works with ISAPI extensions, which are code libraries that handle those requests. IIS chooses the appropriate ISAPI extension for a request depending on the extension of the requested file. For example, an ASP.NET-enabled IIS machine will route ASP.NET-specific requests (those for .aspx files, .ashx files, and so on) to the ASP.NET ISAPI extension, which is a file named aspnet_isapi.dll.

Figure 3-3 describes how ISAPI_Rewrite fits into the picture. Its role is to rewrite the URL of incoming requests; it doesn’t affect the output of the ASP.NET script in any way.

At first sight, the rewriting rules can be added easily to an existing web site, but in practice there are other issues to take into consideration. For example, you’d also need to modify the existing links within the web site content. This is covered in Chapter 4 of Professional Search Engine Optimization with ASP.NET: A Developer’s Guide to SEO.

Figure 3-3 

ISAPI_Rewrite allows the programmer to easily declare a set of rules that are applied by IIS on-the-fly to map incoming URLs requested by the visitor to dynamic query strings sent to various ASP.NET pages. As far as a search engine spider is concerned, the URLs are static.

The following few pages demonstrate URL rewriting functionality by using Helicon’s ISAPI_Rewrite filter. You can find its official documentation at http://www.isapirewrite.com/docs/. Ionic’s ISAPI rewriting module has similar functionality.

In the first exercise we’ll create a simple rewrite rule that translates my-super-product.html to Product.aspx?ProductID=123. This is the exact scenario that was presented in Figure 3-3.

The Product.aspx Web Form is designed to simulate a real product page. The script receives a query string parameter named ProductID, and generates a very simple output message based on the value of this parameter. Figure 3-4 shows the sample output that you’ll get by loading http://seoasp/Product.aspx?ProductID=3.

Figure 3-4

In order to improve search engine friendliness, we want to be able to access the same page through a static URL: http://seoasp/my-super-product.html. To implement this feature, we’ll use—you guessed it!—URL rewriting, using Helicon’s ISAPI_Rewrite.

As you know, what ISAPI_Rewrite basically does is to translate an input string (the URL typed by your visitor) to another string (a URL that can be processed by your ASP.NET code). In this exercise we’ll make it rewrite my-super-product.html to Product.aspx?ProductID=123.

This article covers ISAPI_Rewrite version 2. At the moment of writing, ISAPI_Rewrite 3.0 is in beta testing. The new version comes with an updated syntax for the configuration files and rewriting rules, which is compatible with that of the Apache mod_rewrite module, the standard rewriting engine in the Apache world. Please visit Cristian’s web page dedicated to this book, http://www.cristiandarie.ro/seo-asp/, for updates and additional information regarding the following exercises.

Exercise: Using Helicon’s ISAPI_Rewrite

           1.        The first step is to install ISAPI_Rewrite. Navigate to http://www.helicontech.com/download.htm and download ISAPI_Rewrite Lite (freeware). The file name should be something like isapi_rwl_x86.msi. At the time of writing, the full (not freeware) version of the product comes in a different package if you’re using Windows Vista and IIS 7, but the freeware edition is the same for all platforms.

           2.        Execute the MSI file you just downloaded, and install the application using the default options all the way through.

If you run into trouble, you should visit the Installation section of the product’s manual, at http://www.isapirewrite.com/docs/#install. If you run Windows Vista, you need certain IIS modules to be installed in order for ISAPI_Rewrite to function. If you configured IIS as shown in Chapter 1 of the book Professional Search Engine Optimization with ASP.NET: A Developer's Guide to SEO, you already have everything you need, and the installation of ISAPI_Rewrite should run smoothly.

           3.        Make sure your IIS web server is running and open the http://seoasp/ web site using Visual Web Developer. (Code samples for this demo site are available from Wrox at http://www.wrox.com/WileyCDA/WroxTitle/productCd-0470131470,descCd-download_code.html.)

           4.        Create a new Web Form named Product.aspx in your project, with no code-behind file or Master Page. Then modify the generated code as shown in the following code snippet. (Remember that you can have Visual Web Developer generate the Page_Load signature for you by switching to Design view, and double-clicking an empty area of the page or using the Properties window.)

<%@ Page Language="C#" %>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<script runat="server">
  protected void Page_Load(object sender, EventArgs e)
  {
    // retrieve the product ID from the query string
    string productId = Request.QueryString["ProductID"];

    // use productId to customize page contents
    if (productId != null)
    {
      // set the page title
      this.Title += ": Product " + productId;

      // display product details
      message.Text =
        String.Format("You selected product #{0}. Good choice!", productId);
    }
    else
    {
      // no product ID was supplied; ask the visitor to pick a product
      message.Text = "Please select a product from our catalog.";
    }
  }
</script>

<html xmlns="http://www.w3.org/1999/xhtml" >
<head runat="server">
  <title>ASP.NET SEO Shop</title>
</head>
<body>
  <form id="form1" runat="server">
    <asp:Literal runat="server" ID="message" />
  </form>
</body>
</html>

           5.        Test your Web Form by loading http://seoasp/Product.aspx?ProductID=3. The result should resemble Figure 3-4.

           6.        Let’s now write the rewriting rule. Open the Program Files/Helicon/ISAPI_Rewrite/httpd.ini file (you can find a shortcut to this file in Programs), and add the following highlighted lines to the file. Note the file is read-only by default. If you use Notepad to edit it, you’ll need to make it writable first.

[ISAPI_Rewrite]

# Translate /my-super-product.html to /Product.aspx?ProductID=123
RewriteRule ^/my-super-product\.html$ /Product.aspx?ProductID=123

           7.        Switch back to your browser again, and this time load http://seoasp/my-super-product.html. If everything works as it should, you should get the output that’s shown in Figure 3-5.

Figure 3-5 

Congratulations! You’ve just written your first rewrite rule using Helicon’s ISAPI_Rewrite. The free edition of this product only allows server-wide rewriting rules, whereas the commercial edition would allow you to use an application-specific httpd.ini configuration file, located in the root of your web site. However, this limitation shouldn’t affect your learning process.

The exercise you’ve just finished features a very simplistic scenario, without much practical value—at least compared with what you’ll learn next! Its purpose was to install ISAPI_Rewrite, and to ensure your working environment is correctly configured.

You started by creating a very simple ASP.NET Web Form that takes a numeric parameter from the query string. You could imagine this is a more involved page that displays lots of details about the product with the ID mentioned by the ProductID query string parameter, but in our case we’re simply displaying a text message that confirms the ID has been correctly read from the query string.

Product.aspx is indeed very simple! It starts by reading the product ID value:

  protected void Page_Load(object sender, EventArgs e)
  {
    // retrieve the product ID from the query string
    string productId = Request.QueryString["ProductID"];

Next, we check whether the value we just read is null. If it is, ProductID doesn’t exist as a query string parameter, and we ask the visitor to select a product. Otherwise, we display a simple text message and update the page title to confirm that ProductID was correctly read:

    // use productId to customize page contents
    if (productId != null)
    {
      // set the page title
      this.Title += ": Product " + productId;

      // display product details
      message.Text =
        String.Format("You selected product #{0}. Good choice!", productId);
    }
    else
    {
      // no product ID was supplied; ask the visitor to pick a product
      message.Text = "Please select a product from our catalog.";
    }
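The page above echoes whatever string arrives in ProductID. Since the parameter is meant to be numeric, a real product page would probably validate it first. The following console sketch shows one possible check; the ProductIdValidation class and the TryGetProductId helper are hypothetical names introduced for this example and are not part of the exercise code.

```csharp
using System;

public static class ProductIdValidation
{
    // Hypothetical helper: succeeds only when the raw value is a positive integer.
    public static bool TryGetProductId(string raw, out int productId)
    {
        // int.TryParse returns false (and sets productId to 0) for null or non-numeric input
        return int.TryParse(raw, out productId) && productId > 0;
    }

    public static void Main()
    {
        foreach (string raw in new[] { "123", "abc", null })
        {
            int id;
            Console.WriteLine(TryGetProductId(raw, out id)
                ? "Valid product ID: " + id
                : "Please select a product from our catalog.");
        }
    }
}
```

Inside Page_Load, the same helper could be called with Request.QueryString["ProductID"], falling back to the "Please select a product" message whenever it returns false.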

URL Rewriting and ISAPI_Rewrite

As Figure 3-3 describes, the Product.aspx page is accessed after the original URL has been rewritten. This explains why Request.QueryString["ProductID"] reads the value of ProductID from the rewritten version of the URL. This is helpful, because the script works fine no matter if you accessed Product.aspx directly, or if the initial request was for another URL that was rewritten to Product.aspx.

The Request.QueryString collection, as well as the other values you can read through the Request object, work with the rewritten URL. For example, when requesting my-super-product.html in the context of our exercise, Request.RawUrl will return /Product.aspx?ProductID=123.

The rewriting engine allows you to retrieve the originally requested URL by saving its value to a server variable named HTTP_X_REWRITE_URL. You can read this value through Request.ServerVariables["HTTP_X_REWRITE_URL"]. This is helpful whenever you need to know the original request initiated by the client.
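A minimal sketch of reading that value follows. To keep it runnable outside IIS, the helper takes the server variables collection and the raw URL as parameters instead of touching the Request object directly. The OriginalUrlDemo class and GetOriginalUrl method are hypothetical names, and falling back to the raw URL when the variable is absent is an assumption about what a page would want, not documented ISAPI_Rewrite behavior.

```csharp
using System;
using System.Collections.Specialized;

public static class OriginalUrlDemo
{
    // Hypothetical helper: returns the URL the visitor originally requested.
    // Falls back to rawUrl when no rewrite happened and the variable is missing.
    public static string GetOriginalUrl(NameValueCollection serverVariables, string rawUrl)
    {
        string original = serverVariables["HTTP_X_REWRITE_URL"]; // null when the key is absent
        return string.IsNullOrEmpty(original) ? rawUrl : original;
    }

    public static void Main()
    {
        // Simulates a request for /my-super-product.html that was rewritten
        var rewritten = new NameValueCollection();
        rewritten["HTTP_X_REWRITE_URL"] = "/my-super-product.html";
        Console.WriteLine(GetOriginalUrl(rewritten, "/Product.aspx?ProductID=123"));

        // Simulates a direct request for Product.aspx (no rewriting involved)
        Console.WriteLine(GetOriginalUrl(new NameValueCollection(), "/Product.aspx?ProductID=123"));
    }
}
```

In a Web Form, the call would look like GetOriginalUrl(Request.ServerVariables, Request.RawUrl), since Request.ServerVariables is a NameValueCollection.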

The Request class offers complete details about the current request. The following table describes the most commonly used Request members. You should visit the documentation for the complete list, or use IntelliSense in Visual Web Developer to quickly access the class members.

| Request Member | Description |
| --- | --- |
| Request.RawUrl | Returns a string representing the URL of the request excluding the domain name, such as /Product.aspx?ID=123. When URL rewriting is involved, RawUrl returns the rewritten URL. |
| Request.Url | Similar to Request.RawUrl, except the return value is a Uri object, which also contains data about the request domain. |
| Request.PhysicalPath | Returns a string representing the physical path of the requested file, such as C:\seoasp\Product.aspx. |
| Request.QueryString | Returns a NameValueCollection object that contains the query string parameters of the request. You can use this object’s indexer to access its values by name or by index, such as in Request.QueryString[0] or Request.QueryString["ProductID"]. |
| Request.Cookies | Returns an HttpCookieCollection object containing the client’s cookies. |
| Request.Headers | Returns a NameValueCollection object containing the request headers. |
| Request.ServerVariables | Returns a NameValueCollection object containing IIS variables. |
| Request.ServerVariables["HTTP_X_REWRITE_URL"] | Returns a string representing the originally requested URL, when the URL is rewritten by Helicon’s ISAPI_Rewrite or IIRF (Ionic ISAPI Rewrite). |



After testing that Product.aspx works when accessed using its physical name (http://seoasp/Product.aspx?ProductID=123), we moved on to access this same script, but through a URL that doesn’t physically exist on your server. We implemented this feature using Helicon’s ISAPI_Rewrite.

As previously stated, the free version of Helicon’s ISAPI_Rewrite only supports server-wide rewriting rules, which are stored in a file named httpd.ini in the product’s installation folder (\Program Files\Helicon\ISAPI_Rewrite). This file has a section named [ISAPI_Rewrite], usually at the beginning of the file, which can contain URL rewriting rules.

We added a single rule to the file, which translates requests for /my-super-product.html to /Product.aspx?ProductID=123. The line that precedes the RewriteRule line is a comment; comments are marked using the # character at the beginning of the line, and are ignored by the parser:

# Translate /my-super-product.html to /Product.aspx?ProductID=123
RewriteRule ^/my-super-product\.html$ /Product.aspx?ProductID=123

In its basic form, RewriteRule takes two parameters. The first parameter describes the original URL that needs to be rewritten, and the second specifies what it should be rewritten to. The pattern that describes the form of the original URL is delimited by ^ and $, which mark the beginning and the end of the matched URL. The pattern is written using regular expressions, which you learn about in the next exercise.

In case you were wondering why the .html extension in the rewrite rule has been written as \.html, we will explain it now. In regular expressions, the pattern language used to describe the original URL that needs to be rewritten, the dot is a special character that matches any single character. If you want that dot to be read as a literal dot, you need to escape it using the backslash character. As you’ll learn, this is a general rule with regular expressions: when special characters need to be read literally, they need to be escaped with the backslash character (which is itself a special character, so a literal backslash is written as \\).
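The effect of escaping the dot can be checked with a short C# console sketch. It uses the .NET Regex class rather than ISAPI_Rewrite's own regular expression engine, so treat it as an illustration of the general rule only; the DotEscapeDemo class and the /my-super-productXhtml test URL are made up for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DotEscapeDemo
{
    // Pattern with an unescaped dot: the dot matches any single character
    public const string Unescaped = @"^/my-super-product.html$";

    // Pattern with an escaped dot: only a literal "." is accepted
    public const string Escaped = @"^/my-super-product\.html$";

    public static void Main()
    {
        string[] urls = { "/my-super-product.html", "/my-super-productXhtml" };
        foreach (string url in urls)
        {
            Console.WriteLine("{0}: unescaped={1}, escaped={2}",
                url, Regex.IsMatch(url, Unescaped), Regex.IsMatch(url, Escaped));
        }
    }
}
```

Both patterns match /my-super-product.html, but only the unescaped one also matches /my-super-productXhtml, which is exactly the kind of accidental match the backslash prevents.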

At the end of a rewrite rule you can also add one or more flag arguments, which affect the rewriting behavior. For example, the [L] flag, demonstrated in the following example, specifies that when a match is found the rewrite should be performed immediately, without processing any further RewriteRule entries:

RewriteRule ^/my-super-product\.html$ /Product.aspx?ProductID=123 [L]

These arguments are specific to the RewriteRule command, and not to regular expressions in general. Table 3-1 lists the possible RewriteRule arguments. The rewrite flags must always be placed in square brackets at the end of an individual rule.

Table 3-1

| RewriteRule Option | Significance | Description |
| --- | --- | --- |
| I | Ignore case | The regular expression of the RewriteRule and any corresponding RewriteCond directives is matched case-insensitively. |
| F | Forbidden | If the RewriteRule regular expression matches, the web server returns a 403 Forbidden response, regardless of the format string (second parameter of RewriteRule) specified. Read Chapter 4 for more details about the HTTP status codes. |
| L | Last rule | If a match is found, stop processing further rules. |
| N | Next iteration | Restarts processing the set of rules from the beginning, but using the current rewritten URL. The number of restarts is limited by the value specified with the RepeatLimit directive. |
| NS | Next iteration of the same rule | Restarts processing the rule, using the rewritten URL. The number of restarts is limited by the value specified with the RepeatLimit directive, and is calculated independently of the number of restarts counted for the N flag. |
| P | Proxy | Immediately passes the rewritten URL to the ISAPI extension that handles proxy requests. The new URL must be a complete URL that includes the protocol, domain name, and so on. |
| R | Redirect | Sends a 302 redirect status code to the client pointing to the new URL, instead of rewriting the URL. This is always the last rule, even if the L flag is not specified. |
| RP | Permanent redirect | The same as R, except the 301 status code is used instead. |
| U | Unmangle log | Logs the URL as it was originally requested, rather than as it was rewritten. |
| O | Normalize | Normalizes the URL before processing by removing illegal characters, and so on. Normalization also deletes the query string. |
| CL | Lowercase | Changes the rewritten URL to lowercase. |
| CU | Uppercase | Changes the rewritten URL to uppercase. |
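To picture what the L flag from Table 3-1 changes, here is a toy C# model of a rule list that is processed from top to bottom. It is a simplified sketch under stated assumptions, not ISAPI_Rewrite's actual implementation: the LastFlagSketch and Rule types, the Apply method, and the /Maintenance.aspx target are all hypothetical, and the model ignores every flag except L.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LastFlagSketch
{
    // Hypothetical rule record: pattern, replacement, and whether the L flag is set.
    public sealed class Rule
    {
        public readonly Regex Pattern;
        public readonly string Replacement;
        public readonly bool Last;

        public Rule(string pattern, string replacement, bool last)
        {
            Pattern = new Regex(pattern);
            Replacement = replacement;
            Last = last;
        }
    }

    // Applies the rules in order; a matching rule marked Last ends processing.
    public static string Apply(IEnumerable<Rule> rules, string url)
    {
        foreach (Rule rule in rules)
        {
            if (!rule.Pattern.IsMatch(url)) continue;
            url = rule.Pattern.Replace(url, rule.Replacement);
            if (rule.Last) break; // the L flag: skip all remaining rules
        }
        return url;
    }

    // Builds the same two-rule set, with or without L on the first rule.
    public static List<Rule> BuildRules(bool firstRuleIsLast)
    {
        return new List<Rule>
        {
            new Rule(@"^/my-super-product\.html$", "/Product.aspx?ProductID=123", firstRuleIsLast),
            new Rule(@"^/Product\.aspx.*$", "/Maintenance.aspx", false)
        };
    }

    public static void Main()
    {
        Console.WriteLine(Apply(BuildRules(true), "/my-super-product.html"));
        Console.WriteLine(Apply(BuildRules(false), "/my-super-product.html"));
    }
}
```

With L set on the first rule, the model stops at /Product.aspx?ProductID=123. Without it, the second rule also fires on the already rewritten URL and the result becomes /Maintenance.aspx.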

 

Also, you should know that although RewriteRule is arguably the most important directive that you can use for URL rewriting with Helicon’s ISAPI_Rewrite, it is not the only one. Table 3-2 quickly describes a few other directives. Please visit the product’s documentation for a complete reference.

Table 3-2

| Directive | Description |
| --- | --- |
| RewriteRule | This is the directive that allows for URL rewriting. |
| RewriteHeader | A generic version of RewriteRule that can rewrite any HTTP headers of the request. RewriteHeader URL is the same as RewriteRule. |
| RewriteProxy | Similar to RewriteRule, except it forces the result URL to be passed to the ISAPI extension that handles proxy requests. |
| RewriteCond | Allows defining one or more conditions (when more RewriteCond entries are used) that must be met before the following RewriteRule, RewriteHeader, or RewriteProxy directive is processed. |

 

Introducing Regular Expressions

Before you can implement any really useful rewrite