Jul 15, 2008

Why substituteMarkerArrayCached is bad

In the previous article we talked about TYPO3 template functions. I mentioned that substituteMarkerArrayCached is a function that developers should not use. In this article I am going to explain why.

As you remember there are four “substitute” functions for use with TYPO3 templates:
  • substituteMarker
    This function substitutes a single marker
  • substituteMarkerArray
    This function does the same as above, but for every marker in an array
  • substituteSubpart
    Substitutes a single subpart
  • substituteMarkerArrayCached
    Today's subject.

The first two functions substitute a marker and a marker array. The third substitutes a template subpart. The obvious missing function is one that substitutes a subpart array.

substituteMarkerArrayCached takes several arguments; the first three are the most interesting to us: the template, a marker array and a subpart array. This means that the function works almost like the one we just found missing. In other words, it can substitute a subpart array:

$content = $this->cObj->substituteMarkerArrayCached($template, array(), $subPartArray);

But there is a catch.

The catch is in the “Cached” word in the function name. This function caches its results and tries to reuse them next time. Caching happens in the cache_hash table.
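
To make the idea concrete, here is a minimal sketch of a result cache keyed by an MD5 hash. It is not TYPO3's implementation: the function name cachedSubstitute, the in-memory array standing in for cache_hash, and the way the key is derived are all assumptions made for illustration.

```php
<?php
// In-memory stand-in for the cache_hash table (assumption: the real table lives in MySQL).
$cacheHash = array();

// Hypothetical cached substitution: the key is an MD5 of the serialized input.
function cachedSubstitute($template, array $markers, array &$cacheHash)
{
    $key = md5(serialize(array($template, $markers)));
    if (isset($cacheHash[$key])) {
        return $cacheHash[$key];    // cache hit: reuse the stored result
    }
    $result = str_replace(array_keys($markers), array_values($markers), $template);
    $cacheHash[$key] = $result;     // cache miss: store the result for next time
    return $result;
}

$template = '<p>###TEXT###</p>';
$first  = cachedSubstitute($template, array('###TEXT###' => 'First comment'), $cacheHash);
$again  = cachedSubstitute($template, array('###TEXT###' => 'First comment'), $cacheHash);
$second = cachedSubstitute($template, array('###TEXT###' => 'Second comment'), $cacheHash);

// Identical input is served from the cache; every distinct input adds an entry.
echo count($cacheHash), "\n";
```

In this sketch the repeated call adds nothing to the cache, while each new input adds one entry. That growth is the point to keep in mind.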

At first glance this does not look bad. It may even speed up the web site.

Does it really?

Imagine a web site with articles. Let's say 10000 articles. Each article has comments. Suppose that the average number of comments is 10 per article. So, if comments use substituteMarkerArrayCached, this would lead to 100000 records in the cache_hash table. Now imagine that each comment requires two calls to substituteMarkerArrayCached...
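
The arithmetic can be written out as a small sketch. The variable names are made up for illustration, and it assumes, as the example does, that every call with different input produces one cache_hash record.

```php
<?php
// Figures taken from the example in the text.
$articles = 10000;
$commentsPerArticle = 10;

// One cached call per comment: one cache_hash record per comment.
$recordsWithOneCall = $articles * $commentsPerArticle;

// Two cached calls per comment double the number of records.
$callsPerComment = 2;
$recordsWithTwoCalls = $recordsWithOneCall * $callsPerComment;

echo $recordsWithOneCall, "\n";   // 100000
echo $recordsWithTwoCalls, "\n";  // 200000
```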

You can say: “Big deal! Disks are large and cheap now!”. Yes, they are. But the problem is not disk space. It is MySQL that will suffer.

MySQL works very well with indexes. If it can locate a record through an index, the speed can be fantastic. Auto-increment fields make a good index, and this is the typical case for most tables in TYPO3. But cache_hash uses an MD5 value as its index. To MySQL, MD5 values look like random data, and they do not make good database indexes. So with this table you can easily be out of luck when it comes to indexes.
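
A short sketch shows the difference between the two kinds of keys. The sample strings are invented; md5() is PHP's built-in function, which returns 32 hexadecimal characters by default and 16 raw bytes when its second argument is true.

```php
<?php
// Auto-increment style keys grow in insertion order.
$ids = range(1, 5);

// MD5 keys for the same rows (the "comment-N" input is a made-up sample).
$hashes = array();
foreach ($ids as $id) {
    $hashes[$id] = md5('comment-' . $id);
}

// Sort a copy by hash value, keeping the ids as array keys.
$byHash = $hashes;
asort($byHash);

echo implode(',', array_keys($hashes)), "\n"; // insertion order: 1,2,3,4,5
echo implode(',', array_keys($byHash)), "\n"; // hash order, unrelated to insertion order

$hexLength = strlen(md5('comment-1'));        // hexadecimal string form
$rawLength = strlen(md5('comment-1', true));  // raw binary form
echo $hexLength, ' ', $rawLength, "\n";
```

New rows therefore land all over a hash-ordered index instead of at its end, and the hexadecimal form is twice as long as the raw one.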

Another problem is the large number of records in this table. MySQL uses an index only if it believes that doing so will save time. If MySQL thinks that an index will not bring much of a performance gain, it falls back to a full table scan. Imagine a full table scan on a 100000-row table. Wouldn't a non-cached version of substituteMarkerArrayCached be faster for just two entries in $subPartArray?

Unless you have a really huge subpart and marker array, I recommend avoiding substituteMarkerArrayCached altogether. It can improve performance for really large substitutions but not for smaller ones. PHP code often runs faster than a single database query (especially if the query goes over the network).

You can replace a call to substituteMarkerArrayCached with the following code:

// Substitute all markers first, then each subpart in turn
$content = $this->cObj->substituteMarkerArray($template, $markers);
foreach ($subParts as $subPart => $subContent) {
    $content = $this->cObj->substituteSubpart($content, $subPart, $subContent);
}

This is how the comments extension is going to work starting from the next version.
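
If you need the replacement in several places, it can be wrapped in a small helper. The sketch below runs outside TYPO3, so it uses a stub class in place of the real content object. The class StubContentObject, the helper name substituteMarkersAndSubparts and the simplified subpart handling are assumptions for illustration only; the real TYPO3 methods handle more cases.

```php
<?php
// Hypothetical stand-in for the TYPO3 content object, heavily simplified.
class StubContentObject
{
    // Replace every marker key (e.g. "###TITLE###") with its value.
    public function substituteMarkerArray($content, array $markers)
    {
        return str_replace(array_keys($markers), array_values($markers), $content);
    }

    // Replace everything between two occurrences of the subpart marker,
    // including the two markers themselves.
    public function substituteSubpart($content, $marker, $subContent)
    {
        $start = strpos($content, $marker);
        if ($start === false) {
            return $content;
        }
        $stop = strpos($content, $marker, $start + strlen($marker));
        if ($stop === false) {
            return $content;
        }
        return substr($content, 0, $start) . $subContent
            . (string)substr($content, $stop + strlen($marker));
    }
}

// Hypothetical helper: markers first, then each subpart, with no cache_hash access.
function substituteMarkersAndSubparts($cObj, $template, array $markers, array $subParts)
{
    $content = $cObj->substituteMarkerArray($template, $markers);
    foreach ($subParts as $subPart => $subContent) {
        $content = $cObj->substituteSubpart($content, $subPart, $subContent);
    }
    return $content;
}

$template = '<h1>###TITLE###</h1>###COMMENTS###<p>old</p>###COMMENTS###';
$result = substituteMarkersAndSubparts(
    new StubContentObject(),
    $template,
    array('###TITLE###' => 'Hello'),
    array('###COMMENTS###' => '<p>new</p>')
);
echo $result, "\n";
```

Inside an extension you would pass $this->cObj instead of the stub.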

substituteMarkerArrayCached is bad because it stores results in, and fetches results from, the cache_hash table. This causes lots of records and slower execution for simple substitutions. So do not use this function unless your substitutions contain a lot of data.

1 comment:

  1. "MD5 values look like random data for MySQL." this is true in the case of TYPO3, but MD5 is not just random data nor it's a 32 character string: it's an 128 bit integer in hexadecimal representation - but the database model of TYPO3 is sometimes very clumsy - instead of storing the MD5 hash as 128 bit integer (or in case of MySQL you need to use BINARY(16) since integer is limited to 8 bytes), cache_hash uses varchar(32) - and this is of course slow.