Jan 10, 2012

Cleaning and optimising cache tables in TYPO3

There is an important issue with cache tables in TYPO3, which I am going to address in this article. The issue can become critical for projects with a limited disk space, projects with lots of user groups, which use indexed search or Solr, or projects with lots of pages.

Cache tables are designed to provide a high-performance access for simultaneous reads and writes. This means that several reads at a time should be possible as well as several writes and none should block each other. This is possible when cache tables use InnoDB MySQL database engine. InnoDB provides concurrency and data safety against sudden failures. TYPO3 uses InnoDB for several critical tables, including cache tables.

Sometimes cache tables can grow a lot. The more records exist in the cache table, the slower it is to find a necessary record there. TYPO3 provides a special scheduler task to clean up old cache records. If you have TYPO3 4.6 or newer, you should always use that task.

However, there is a certain fact that very few users know: running the task will remove entries from the table but it will not recover the disk space taken by removed records. InnoDB does not reuse the space when records are deleted. It always appends new data to the end of the allocated data file. Thus it is possible to have a fully empty InnoDB table, which takes 100GB on the disk.


I must emphasize that it is not a TYPO3 bug or limitation. This is how MySQL works. InnoDB creators decided to make that for performance and safety reasons.

There is another issue. InnoDB data can grow large and they can become fragmented. This means that disk heads will have to travel here and there all the time to find records. This can slow down InnoDB, especially when multiple tables are used in the query.

How to fix these issues? The obvious solution would be to remove unused space from InnoDB data files and defragment the data file. Is that possible? No, it is not possible with standard InnoDB tools. However, you can make a workaround.

The workaround is this:
  • temporarily close the server for maintenance
  • convert all tables to MyISAM
  • stop MySQL
  • set the option named innodb_file_per_table
  • start MySQL
  • convert tables back to InnoDb as necessary
This places each table to a separate file. Now you need to set up a cron job for each table you want to clean and defragment. The job should do the following:
  • issue CREATE TABLE mytable_copy LIKE mytable query
  • drop the original table (a corresponding large and fragmented InnoDB file is deleted at this moment)
  • rename the new table to the original table
If you do not want to delete the content, you can:
  • use ALTER TABLE to change table engine type from InnoDB to MySAIM
  • use ALTER TABLE to change table engine type back to InnoDB
  • repeat the previous step once more to ensure that table is sorted by primary key (do not skip this!)
While multiple tables require more file handles, those are cheap in modern Linux distributions. Also cache works well on the system level, so you will not get big performance issues comparing to a single huge file. However the clean up and optimisation procedures will definitely improve the performance of your TYPO3 cache tables.

11 comments:

  1. just as followup:
    -------
    Cache tables are designed to provide a high-performance access for simultaneous reads and writes.
    -------

    this is not entirely true as those tables should e.g. also used for expensive things like complex queries.

    ReplyDelete
  2. Cache tables? For complex queries? Could you provide an example? :)

    ReplyDelete
  3. cache table & complex queries : syntax error. Or maybe du you mean storing some complex queries in cache ?

    ReplyDelete
  4. Well, as far as i know deleted records from db are just marked as deleted. That's why the total disk-space could preserve so big. Why just not add "OPTIMIZE TABLE `table_name`" statement after table clean up? It will physically remove "deleted" records and wipe free space, wouldn't it?

    ReplyDelete
  5. You can just dump the database, remove the old (or clean it), create a new one, with the same parameters and load dump ...
    Tested, it works.

    ReplyDelete
  6. With a 10GB database (binary size) that would take many hours :) And all that only to fix only 2 tables.

    ReplyDelete
  7. You can limit the max size of the data-files, using the max atrribut while setting innodb_data_file_path. You should choose a size larger than necessary for the "real" data, then innodb will use the space used by deleted entries.

    optimize table - the manual says:
    For InnoDB tables, OPTIMIZE TABLE is mapped to ALTER TABLE, which rebuilds the table to update index statistics and free unused space in the clustered index. Beginning with MySQL 5.1.27, this is displayed in the output of OPTIMIZE TABLE when you run it on an InnoDB table.
    http://dev.mysql.com/doc/refman/5.1/en/optimize-table.html

    As i understand it, optimize table does, what you want to do with your cronjob.

    Tobias

    ReplyDelete
  8. No, it is not. I do not have much time to argue right now, sorry. You can search for more information at Percona's blog, for example.

    ReplyDelete
  9. This is very helpful although I do not have the skills to do what I read is required. Is there someone who can affordable do this and repair the database(s) as you mentioned above? I'm going to be in trouble soon with some of my sites. Any suggestions for trustworthy help on this?

    ReplyDelete
  10. OT:
    About Percona and Typo3, do you know if there is some problem running Typo3 using Percona XtraDB Cluster 5.5?
    It uses replication only with InnoDB and DELETE operations are unsupported on tables without primary key. If I change all engines from MyISAM to InnoDB, add a primary key where needed, does typo3 4.5 works?
    Thanks

    Simone

    ReplyDelete