Mar 25, 2009

Redirects: TYPO3 and RealURL vs mod_rewrite

Very few people know how to optimize TYPO3 web sites for performance. There is one common mistake that almost everyone makes with regard to TYPO3 performance: letting TYPO3 handle redirects.

Why are TYPO3 redirects bad?

Let's look at two typical cases where redirects happen. The first is a domain redirect, the second is a redirect using RealURL.

Domain redirects

Domain redirects happen when there are several domains on the same page and all domains except one (primary) should redirect to the primary domain. This is usually done for SEO purposes.
How is it usually handled? The administrator creates a domain record for the primary domain. Next he creates domain records for all other domains and specifies the address they should redirect to (the primary domain).
How does this work? Let's trace the chain of actions involved in handling an HTTP request for a non-primary domain:

  1. Apache receives a request
  2. (if RealURL is enabled) Apache uses mod_rewrite to rewrite the request to index.php
  3. PHP invokes index.php from TYPO3
  4. The whole TYPO3 framework initializes
  5. (if RealURL is enabled) RealURL resolves a page id
  6. TYPO3 finds that it should redirect
  7. TYPO3 sets the corresponding HTTP headers in PHP
  8. PHP returns headers to Apache
  9. Apache sends redirect reply to the browser
This is an approximation, of course, but at least these 9 steps are involved. Steps 2 to 8 are a useless waste of server resources. We know that the domain must always be redirected, so there is no need to invoke the whole of TYPO3 for this task.

RealURL redirects

Let's see what happens when RealURL redirects happen:
  1. Apache receives a request
  2. Apache uses mod_rewrite to rewrite the request to index.php
  3. PHP invokes index.php from TYPO3
  4. The whole TYPO3 framework initializes
  5. RealURL finds that it should redirect to another location
  6. RealURL sets the corresponding HTTP headers in PHP
  7. PHP returns headers to Apache
  8. Apache sends redirect reply to the browser
Here we have 8 steps, but it is not much better. The heavy TYPO3 framework is still initialized, MySQL receives queries, RealURL runs, and all this happens just to perform a well-known static action: a redirect from one static location to another. It is like going to the bookshelf, taking the dictionary and looking up a word you already know every time you encounter it in a text. A waste of resources.

Proper redirects

Most redirects are static, so they do not have to go through TYPO3 at all. We can use mod_rewrite to handle all of them efficiently. Let's see how to do it properly!

Proper domain redirects

To redirect from one domain to another we can use this rewrite rule:
RewriteEngine On
RewriteCond %{HTTP_HOST} ^example\.com$
RewriteRule (.*) http://www.example.com/$1 [L,R=301]
This rule redirects from example.com to www.example.com and keeps the request URI the same. There are only three steps involved in this action:
  1. Apache receives a request
  2. Apache uses mod_rewrite to rewrite the request
  3. Apache sends redirect reply to the browser
No PHP, no TYPO3, no MySQL! Fast and efficient.
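When several non-primary domains all point at the same site, a single rule can cover them without listing each host name. The following is a minimal sketch, assuming that www.example.com is the primary domain (as in the rule above) and that every other host name reaching this site should be redirected to it:

```apache
RewriteEngine On
# Match any request whose Host header is not the primary domain (case-insensitive)
RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC]
# %{REQUEST_URI} always starts with a slash, so no extra slash is needed in the target
RewriteRule ^ http://www.example.com%{REQUEST_URI} [L,R=301]
```

Because the target is built from %{REQUEST_URI} and not from a captured group, the same lines should behave identically in .htaccess and inside <VirtualHost>.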

Proper path redirects

To redirect from one path to another we can use this rewrite rule:
RewriteEngine On
RewriteCond %{REQUEST_URI} ^/path/to/old/page/$
RewriteRule .* /path/to/new/page/ [L,R=301]
This rule results in the same three steps as the previous one.
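If there are many static path redirects, one pair of rules per path becomes hard to maintain. A possible alternative is a RewriteMap lookup table. This is only a sketch under assumptions: the map name "redirects" and the file /etc/apache2/redirects.map are made up for illustration, and RewriteMap can only be declared in the server or virtual host configuration, not in .htaccess.

```apache
# /etc/apache2/redirects.map (hypothetical file), one "old-path new-path" pair per line:
#   /path/to/old/page/    /path/to/new/page/
#   /another/old/page/    /another/new/page/

RewriteEngine On
# Declare a plain-text lookup table named "redirects" (name chosen for this example)
RewriteMap redirects txt:/etc/apache2/redirects.map
# Continue only when the requested path has an entry in the map
RewriteCond ${redirects:%{REQUEST_URI}} !=""
# Redirect to the mapped path
RewriteRule ^ ${redirects:%{REQUEST_URI}} [L,R=301]
```

The design choice here is that adding a redirect means adding a line to the map file, while the rules themselves stay untouched.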

Final notes

This part contains minor notes on the topic.

To slash or not to slash

When using mod_rewrite rules, it is important to check the slash at the beginning of the URL pattern. If the rule is in .htaccess or inside a <Directory> section of the Apache configuration, there must be no leading slash: both .htaccess and <Directory> are directory-related, so the path is matched relative to that directory. When using mod_rewrite directives directly inside <VirtualHost>, the leading slash is necessary because the path there is not relative.
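To make the difference concrete, here is the same path redirect written for both contexts. It is a sketch only: the paths are the placeholder paths used earlier in this article, and the .htaccess variant assumes the file sits in the document root.

```apache
# Variant 1: in .htaccess (or inside <Directory>) of the document root.
# The directory prefix is stripped before matching, so the pattern has no leading slash.
RewriteEngine On
RewriteRule ^path/to/old/page/$ /path/to/new/page/ [L,R=301]

# Variant 2: directly inside <VirtualHost>.
# The pattern is matched against the full URL path, so the leading slash is required.
<VirtualHost *:80>
    ServerName www.example.com
    RewriteEngine On
    RewriteRule ^/path/to/old/page/$ /path/to/new/page/ [L,R=301]
</VirtualHost>
```

Only one of the two variants is needed. A pattern copied from one context to the other without adjusting the slash will simply never match.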

Where to place the rules?

The best place for mod_rewrite rules is the Apache configuration files. Modern systems are fast and .htaccess files sit in the file system cache, so there is no major decrease in performance when using .htaccess. It was a problem years ago but not now. However, that is no reason to spend even an extra millisecond on parsing .htaccess. Placing mod_rewrite rules in the Apache configuration files is better because it saves parsing time on each request.
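One way to keep domain redirects entirely in the Apache configuration is a separate virtual host that serves nothing but the redirect. The sketch below assumes example.com, example.org and example.net are the non-primary domains and www.example.com is the primary one. All of these names are placeholders.

```apache
# Dedicated virtual host for the non-primary domains (placeholder names)
<VirtualHost *:80>
    ServerName example.com
    ServerAlias example.org example.net
    RewriteEngine On
    # Every request to these hosts is redirected; no RewriteCond is needed
    RewriteRule ^ http://www.example.com%{REQUEST_URI} [L,R=301]
</VirtualHost>
```

The virtual host of the primary domain then needs no redirect rules for these hosts at all, and the configuration is parsed once at server start, not on each request.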

Why does RealURL have redirects at all?

Well, it is a long story. The feature is historical. It is convenient. And it was made when people did not really care about performance. Now people do. So do not use RealURL redirects. You can do better with mod_rewrite!

13 comments:

  1. Thanks for this clear explanation. I found out that if you have parameters in the URL (like ?param1=value) you should use: %{QUERY_STRING}

    I had issues with redirecting:

    http://www.domain.com/public/?act=latestnews to
    http://www.domain.com/latest-news/

    The following lines did the trick. I'm forced to use .htaccess since I don't assume I've got access to the virtualhost configuration:

    RewriteCond %{QUERY_STRING} ^act=latestnews$
    RewriteRule .* /latest-news/? [R=301,L]

  2. Thank you, Dmitry, for making that clear.

    (I'll get myself a shirt saying "I caused Dmitry to post in his blog")

  3. For domain redirects, I usually change this directly in the Apache configuration if I have access to it. This avoids the use of mod_rewrite, hopefully saving even more resources:

    Redirect 301 / http://www.other-domain.de/

  4. I have an extension I developed a while ago to manage the creation of Apache configuration files containing redirects. It basically just creates a file that contains a bunch of lines like:

    Redirect 302 /babla http://www.mysite.com/the/correct/url/
    RedirectMatch 302 ^/blafda/(.*)$ http://www.mysite.com/blafbla/

    Then you just include that conf file in your virtual host block. If you ever change your domain or move a page, the extension can recreate the redirect configuration file very quickly.

    Non-Apache developers can create new redirects in the List module using all the advantages of creating links in Typo3.

    I wonder if rewrite rules would be quicker than the mod_alias?

    I didn't have it create .htaccess files because they slow down Apache and would defeat the purpose.

    I suppose it would be pretty easy to create different modes, where the extension could output redirects for mod_alias, mod_rewrite, nginx, etc...

    Do you think this is an extension that should be released?

    Reading your article gave me another idea, an extension that would look for RealURL or Typo3 Domain redirects and automatically create a set of rewrite rules to handle the redirects.

  5. NathanL, this is a great extension! I think a lot of people will like it! :) Also the idea of a RealURL->Apache extension looks very good. If you release such an extension, I would be happy to try it and post an article about it in this blog.

    Good work!

  6. I think it would be great to release that extension too.

    I find it very handy that you can put domain redirects in Typo3, because you have a good overview of all the domain names assigned to a site. When this is done in Apache or .htaccess:
    - the overview is gone
    - you need to understand more about server configuration; I think Typo3 is already technical enough and should work hard to become easier to use for non-programmers
    - one typo3 mismatch and your live website is down; I would like to prevent those kinds of possible errors.

  7. Xavier Perseguers, May 11, 2009 at 4:56 PM

    An enhancement would be to have something like the extension of NathanL that hooks into domain creation / realurl configuration change (for instance) and automatically creates the .htaccess file. This way, no problem with the lack of overview Edward is speaking about.

  8. My redirect for www.domain.com/--anytext to www.domain.ch/ is not working.

  9. Hi all,

    Great post, thanks for all the tips.
    I'm not a web developer but an SEO, and I'm working on a website that was developed with Typo3.
    My problem: our whole website is duplicated. For each URL www.mydomain.com/page.html another URL www.mydomain.com/page/ is created.

    The developers told me it was a problem caused by Typo3 (but I found out later that it was not, and that it was due to a rewrite rule in Apache).

    My question: can I use a mod_rewrite rule in Apache to redirect all the URLs ending with a slash to the URLs ending in .html, without impacting the performance of the website?

    Is it easy to set up for all the URLs, or does it have to be done manually for each single URL (as the developers told me)?

    Any help would be greatly appreciated!
    Thanks in advance.

    Zico

  10. Zico,

    I suggest joining the TYPO3 English list and then submitting your question there.

    http://lists.typo3.org/cgi-bin/mailman/listinfo/typo3-english

    Michael

  11. Dmitry,

    Have the realurl redirect performance issues been resolved?

    The concern is that reading a long .htaccess takes a while as well. It's possible to cache realurl redirect results, but not .htaccess.

    Then again, folks with more server control can stick the RewriteRules into their virtual host and the point is moot.

    Thoughts?

    Michael

  12. It is not a RealURL issue. It is a problem of invoking the whole heavy TYPO3 just for a redirect.

    Slow .htaccess is a myth. All file systems now have caches and that file will be in the cache. Its parsing is faster than a single database query execution in TYPO3. But if you are still concerned, put the rules in the Apache config, not .htaccess.

  13. Thanks, just what I needed also. Very fast instead of using Typo3 :-)
