Many people have tested whether the TokuDB storage engine runs faster and compresses better than the default storage engine, InnoDB. I am more concerned about TokuDB's features. Or, to put it unfairly: can it do everything that InnoDB can do, in a current version?
Vadim Tkachenko of Percona supplied a binary download for a platform that I happen to have (Ubuntu 12.04), so I got MySQL 5.6 + TokuDB 7.1 up and running within 15 minutes. Then I found that I could not break anything within 60 minutes. I I conclude that "ease of use" and "stability" are okay. (For anyone who thinks I'm rushing: yes, but usually finding a bug in a new version takes less than an hour so I'm judging from experience.)
I tried compound indexes, long strings, the relatively-new TIMESTAMP(6) data type with fractional seconds, utf8mb4 varchar columns with German collation, transaction rollbacks with or without savepoints, crash recovery, and partitions.
I always got the same results as I get with InnoDB. It seems that the Tokutek folks looked at compatibility before declaring that their product is a "drop-in replacement storage engine for MySQL and MariaDB".
These statements didn't cause an error but the FOREIGN KEY clause was ignored:
CREATE TABLE tp (s1 INT, PRIMARY KEY (s1)) engine=tokudb; CREATE TABLE tf (s1 INT, FOREIGN KEY (s1) REFERENCES tp (s1)) engine=tokudb;
If I use InnoDB, the statements succeed. In basic English, TokuDB doesn't support referential integrity.
This is a big deal because foreign keys are part of "core" standard SQL. In mitigation, I have to allow that many users eschew foreign keys in this benighted sharded world, and that in theory one could replace some foreign-key functionality with triggers. But I doubt that anybody ever does.
These statements resulted in an error:
CREATE TABLE tq (s1 TEXT) engine=tokudb; CREATE FULLTEXT INDEX iq ON tq (s1);
If I use InnoDB, the statements succeed, because InnoDB (since version 5.6.4) supports full-text indexing, TokuDB doesn't.
I'm sure I could hear objections like "that's not what TokuDB is for" (true), "InnoDB's implementation is bad" (actually that's my own opinion but I'm sure somebody somewhere shares it), or "use Sphinx" (I don't understand the logic here but I heard it when I complained about a full-text deficiency in another product).
To refute all objections, I went to bugs.mysql.com, clicked "Return only bugs with status = ALL", clicked "Show = ALL", clicked "Restrict to bugs in selected categories = MySQL Server: FULLTEXT search", and clicked "Search". I got 107 hits So full text must be important to somebody, else they wouldn't report bugs.
I started two connections, both with autocommit = 0 and default transaction isolation = REPEATABLE READ. On the first connection I said
CREATE TABLE t2 (s1 INT, UNIQUE(s1)) engine=tokudb; INSERT INTO t2 VALUES (0);
On the second connection I said:
INSERT INTO t2 VALUES (1);
Result: ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction.
If I use engine=innodb, I do not see this error, both INSERTs work.
I also see that there's a documented behaviour difference with SELECT FOR UPDATE.
I conclude that TokuDB's different locking methods mean that some SQL statements will have different effects in multi-user environments, if one switches from InnoDB.
A storage engine maker cannot muck with MySQL's syntax, with two exceptions: (1) this rule doesn't apply to InnoDB; (2) new variables or information_schema tables can be added that are specific to tuning or monitoring.
So I said "SHOW VARIABLES LIKE 'tokudb%'" ... and I did see some things that bothered me. For example:
() tokudb_lock_timeout seems to be something like innodb_lock_wait_timeout so it's unfortunate that they didn't call it tokudb_lock_wait_timeout
() tokudb_commit_sync seems to do some things like innodb_flush_log_at_trx_commit so it's unfortunate that they didn't use the same numbers
() tokudb_fs_reserve_percent? well, I couldn't tell if it's unfortunate or not because I couldn't find it in the TokuDB user manual.
Tokutek's FAQ says that TokuDB was tested on Centos 5 and CentOS 6. Doubtless it also works on all other Linux 64-bit distros (I downloaded for Ubuntu 12.04 and the MariaDB manual mentions Fedora). I see that Antony Curtis got it going on his Macbook.
That leaves three major items from the list of platforms that MySQL supports: FreeBSD, Solaris, and Windows. The trouble is that one of them, Windows, is what beginners or students might start with. As a result, MySQL + InnoDB or MariaDB + InnoDB will remain the default choice for introductory texts and some personal uses.
I can't see how this will change, since there's no money in the entry-level market and there's no point in a "Windows server" for Tokutek's real objective, which is to capture a piece of the must-have-performance market. But if they don't care, the mind share won't be there.
I fear that, by concentrating on TokuDB's performance advantages, people have failed to worry about losing features that are available with InnoDB. Even if you conclude that you want to use TokuDB -- especially if you conclude that you want to use TokuDB -- you should be aware of them.
Nice write up. You are correct about the lack of foreign key support in TokuDB, which I think our documentation clearly states. It would be nice if MySQL implemented foreign keys above the storage engine, but the code is within InnoDB itself and as with any engineering team, our resources are limited.
One variable difference worth commenting on. InnoDB only has 3 options for commit sync (0,1,2). In TokuDB, you can have commit sync on or off, and when on you can control how frequently it occurs (in milliseconds) with tokudb_fsync_log_period.
"InnoDB's implementation is bad" - Can you please elaborate?
Hi! As an Oracle employee, you have access to a private searchable archive of bug reports and emails. It should be easy to search for my name and the words fulltext or full-text.
I always have a problem when people want to compare two products in an "apples" to "apples" comparison. Unless the products are identical, the are always differences. If you were to compare replacing an Oracle RDBMS with an MySQL RDBMS you would end up with a huge list of what MySQL does not do. These comparisons do not answer the questions as to if the product meets your needs, or, why are you seeking to replace an existing product.
Foreign Keys while implemented in MySQL/InnoDB, they have their issues. I agree with Tim, it really needs to be a the handlerton layer. Unfortunately this was a feature being worked on pre Oracle acquisition that will never be deployed, because Oracle (in a commercial natures) has no desire to have storage engine competitors, whereas MySQL (open source) did.
Naming variables identical to a competitor is also NOT what you want to do. Infact when I deal with Oracle DBAs, the "READ COMMITTED" v "READ_COMMITTED" isolation discussion causes issues because people blindly assumed the value is the same, therefore the implementation between products is the same. In this case, it's so not the case.
Back to what really matters, is Why would you choose TokuDB? While I have never actually deployed this with a client, I see it has several strengths, especially in larger datasets that have a much larger disk/memory ratio. The compression benefits (which you do not talk about), are significantly better. The indexing strategy is specifically designed to help with complex table structure using many secondary indexes.
You will always find differences between products if you compare syntax. More importantly is do they matter to you, or do the new features and benefits outway the small differences.
I didn't say InnoDB foreign keys have no issues, I didn't say variable names should be identical, I did say that others have looked at compression.