Category Archives: MariaDB


PL/SQL in MariaDB

The Oracle 12c manual says "Oracle PL/SQL provides functionality equivalent to SQL/PSM:2011, with minor syntactic differences, such as the spelling or arrangement of keywords." Yes, and English provides functionality equivalent to Russian, except all the words are spelled differently and in different order. Seriously, since Oracle has PL/SQL and MariaDB has SQL/PSM, their stored procedures and functions and triggers and anonymous blocks -- hereinafter "code blocks" -- look different.

But that's changing.

There's a boatload of Oracle compatibility sauce in MariaDB 10.3, and part of it is PL/SQL code blocks, which is what I'll look at here. MariaDB 10.3 is not released and might not be released for a long time, but the source code is public. I downloaded it with
git clone -b bb-10.2-compatibility https://github.com/MariaDB/server.git
and followed the usual instructions for building from source. I had a bit of trouble building, and I'll make a few quibbles when I describe details. But to get to the bottom line here at the top: the feature is complete and works about as well as some betas that I've seen.

Here's a short example.

CREATE PROCEDURE p AS
  var1 NUMBER(6) := 1;
BEGIN
  WHILE var1 < 5 LOOP
    DELETE FROM t WHERE s1 = var1;
    var1 := var1 + 1;
    END LOOP;
END;

Notice some obvious differences from SQL/PSM:
there are no parentheses after p, they're optional
var1 is defined outside the BEGIN END block
WHILE ... LOOP ... END LOOP not WHILE ... DO ... END WHILE
assignment is done without SET, this too is sometimes optional.

In terms of "Oracle 12c compatibility", MariaDB is now pulling ahead of MySQL, which is bemusing when one considers that Oracle owns MySQL. MariaDB will still not be as compatible as PostgreSQL with EnterpriseDB, I suppose, but this is a boost for migrants.

First complaint: Ocelot's stored-procedure debugger, which is part of our open-source GUI (ocelotgui) for MySQL and MariaDB, can't handle the new PL/SQL. It doesn't even detect where statements end, so I've got work to do.

Second complaint: obviously Monty Widenius makes the MariaDB programmers work too hard. What happened to the halcyon days of yore when making DBMSs was one long party?

For the rest of this blog post, I will describe what the differences are between MariaDB's current PSM and MariaDB 10.3's PL/SQL.

sql_mode='oracle'

In order to enable creation of PL/SQL code blocks, say
SET sql_mode='oracle';

Before saying "this is a bad idea", I will admit that:
I don't have a better idea at the moment.

Now, with that out of the way, I can say that:
this is a bad idea.

(1) There's already a "SET sql_mode='oracle';" statement, so this is a significant behaviour change.
(2) The setting changes what statements are legal. That means that, faced with a code-block statement, it's impossible to know whether it's syntactically correct unless you know what the earlier statements were.
(3) Inevitably some things will be SQL/PSM and some things will be PL/SQL, at least in some installations. Therefore there will be lots of jumping back and forth with sql_mode settings, with special extra fun if someone makes the change to @@sql_mode globally.

In other words, this is such a bad idea that it cannot possibly survive. Therefore don't worry about it. (UPDATE: 2017-02-12: since I have no better idea, and since few current users would be affected, I backtrack -- MariaDB may as well stay on course.)

CREATE PROCEDURE

SQL/PSM                                 PL/SQL
-------                                 ------
CREATE                                  CREATE
[OR REPLACE]                            [OR REPLACE]
[DEFINER = {user|CURRENT_USER }]        [DEFINER = {user|CURRENT_USER}]
PROCEDURE sp_name                       PROCEDURE sp_name
([proc_parameter[,...]])                [([proc_parameter[,...]])]
[characteristic ...]                    [characteristic ...]
                                        {IS|AS}
body                                    body

proc_parameter:
  [IN|OUT|INOUT] name type              name [IN|OUT|INOUT] type

type:
    Any MariaDB data type               any MariaDB or Oracle data type

characteristic:
    LANGUAGE SQL                        LANGUAGE SQL
  | [NOT] DETERMINISTIC                 | [NOT] DETERMINISTIC
  | contains|reads|modifies SQL clause  | contains|reads|modifies SQL clause
  | SQL SECURITY {DEFINER|INVOKER}      | SQL SECURITY {DEFINER|INVOKER}
  | COMMENT 'string'                    | COMMENT 'string'

The only differences between the left column and the right column are:
* {IS|AS} clause -- if you see IS OR AS here, then you know it's PL/SQL.
* parentheses around the parameter list are optional.
* types can be Oracle data types, which I'll discuss later.

If Mr Spock saw this he would say "It's PL/SQL captain, but not as we know it." The DEFINER clause and the INOUT option and all the characteristics are not in Oracle 12c. But they are good additions -- if they weren't allowed, some functionality would be lost. So my only quibble would be: the Oracle characteristics clause AUTHID {DEFINER|CURRENT_USER} should have been allowed as well.

After I created with
CREATE PROCEDURE p IS BEGIN DELETE FROM t; END;
I looked at information_schema with
SELECT routine_definition, routine_body, sql_data_access, sql_mode
FROM information_schema.routines
WHERE routine_name='p';
and saw
ROUTINE_DEFINITION: 'IS BEGIN DELETE FROM t; END'
ROUTINE_BODY = 'SQL'
SQL_DATA_ACCESS = 'CONTAINS SQL'
SQL_MODE: 'PIPES_AS_CONCAT,ANSI_QUOTES,IGNORE_SPACE,ORACLE, ...' (etc.)
So how do I know this is a PL/SQL routine?
I can't depend on SQL_MODE, for the reasons I stated before. I don't like to depend on ROUTINE_DEFINITION, because really the definition doesn't start with IS, it starts with BEGIN.

So my second quibble is: I wish the value in ROUTINE_BODY was 'PLSQL' rather than 'SQL'.

Approximately the same syntax enhancements exist for functions and triggers.

Body

When a body of a routine is a compound statement ...

SQL/PSM                            PL/SQL
-------                            ------
                                   [DECLARE section]
BEGIN [[NOT] ATOMIC]               BEGIN
  [variable declarations]
  [exception-handler declarations]
  [Executable statements]          Executable statements
                                   [EXCEPTION section]
END                                END

In SQL/PSM the variables are declared within BEGIN/END, in PL/SQL they're declared before BEGIN/END. In SQL/PSM the exceptions are declared within BEGIN/END, in PL/SQL they're declared after BEGIN/END.

Therefore the SQL/PSM body...

CREATE OR REPLACE PROCEDURE a()
BEGIN
  DECLARE v VARCHAR(100) DEFAULT 'Default';
  SET v = NULL;
END;

... does look different in PL/SQL ...

CREATE PROCEDURE a IS
v VARCHAR2(100) := 'Default';
BEGIN
  v := NULL;
END;

but the differences are in the syntax and order of non-executable statements. That is, the existing MariaDB-10.2 SQL executable statements (DELETE, DROP, etc.) will work in a PL/SQL body.

Control statements

SQL/PSM                            PL/SQL
-------                            ------

label:                             <<label>>
IF ... THEN ... [ELSE ...] END IF  IF ... THEN ... [ELSE ...] END IF 
LOOP ... END LOOP                  LOOP ... END LOOP
WHILE ... DO ...; END WHILE        WHILE LOOP ...; END LOOP
CASE ... WHEN ... THEN ... END CASE CASE ... WHEN ... THEN ... END CASE
REPEAT ... UNTIL ...
                                   ELSIF
                                   FOR
                                   GOTO
                                   NULL
LEAVE                              EXIT [WHEN ...]
ITERATE                            CONTINUE [WHEN ...]

For controlling the flow of execution, there's a lot of similarity.

The only major "enhancement" in PL/SQL is GOTO, which is considered beneficial when one wishes to get out of a loop that's within a loop that's within a loop. Seeing how that works out in practice will be interesting (a euphemism for: there will be bugs).

The only minor "enhancement" in PL/SQL is NULL, which does nothing. That is, where in SQL/PSM you'd say "BEGIN END", in SQL/PSM you'd say "BEGIN NULL; END".

Third quibble: For some reason, BEGIN <<label>> NULL; END is illegal.

Fourth quibble: The NULL statement is not visible in INFORMATION_SCHEMA.
(Update 2017-01-17: The third quibble has been taken care of, and the fourth quibble was my mistake, the NULL statement is visible.)

Variables

This is a PL/SQL procedure with a mishmash of declarations and assignments.

CREATE PROCEDURE p() IS
  a VARCHAR2(500);
  b CHAR(7) CHARACTER SET sjis;
  c NUMBER(6) := 1;
  d TINYINT DEFAULT 1;
BEGIN SET @c = c; d := @d; END;

Variables a and c are declared with Oracle syntax, but variables b and d are declared with MySQL syntax. The first assignment is MariaDB syntax, but the second assignment is Oracle syntax.

Fifth quibble: "SET d := @d" wouldn't work, which shows that Oracle and MariaDB variable and assignment statements aren't always interchangeable, but they usually are.

Mishmash is good -- once again, the MariaDB folks didn't see a need to limit functionality just because Oracle doesn't do something, and didn't see a need to be strict about Oracle syntax because of course MariaDB syntax works. However, routines like this can't be migrated to Oracle 12c. Compatibility only works one way.

The worklog

MariaDB is open -- the early source code is available, and so are the worklog tasks on JIRA. The information in the worklog tasks is sparse, far inferior to what we used to do for MySQL. But these items are worth following to see when they change.

[MDEV-10137] Providing compatibility to other databases
[MDEV-10142] PL/SQL parser
[MDEV-10343] Providing compatibility for basic SQL data types
[MDEV-10411] Providing compatibility for basic PL/SQL constructs
[MDEV-10580] sql_mode=ORACLE: FOR loop statement
[MDEV-10697] GOTO statement
[MDEV-10764] PL/SQL parser - Phase 2

There are many other tasks associated with "Oracle compatibility", if I had tried to describe them all this post would be too long.

MariaDB 10.2 Window Functions

Today the first MariaDB 10.2 alpha popped up and for the first time there is support for window functions.

I'll describe what's been announced, what's been expected, comparisons to other DBMSs, problems (including crashes and wrong answers), how to prepare, what you can use as a substitute while you wait.

I assume some knowledge of what window functions are. If you'd prefer an introductory tutorial, I'd suggest reading articles like this one by Joe Celko before you continue reading this post.

What's been announced

The MariaDB sources are:

The release notes

The source code trees -- the feature tree up till now has been github.com/MariaDB/server/commits/bb-10.2-mdev9543 but the version-10.2 download page has more choices and is probably more stable.

Sergei Petrunia and Vicentiu Ciorbaru, two developers who I think deserve the credit, made a set of slides for a conference in Berlin earlier this month. It seems to have some typos but is the best description I've seen so far.

On Wednesday April 20 Mr Petrunia will give a talk at the Percona conference. Alas, it coincides with Konstantin Osipov's talk about Tarantool -- which I've done some work for -- which, if you somehow haven't heard, is a NoSQL DBMS that's stable and faster than others according to independent benchmarks like the one from Coimbra. What a shame that two such important talks are scheduled for the same time.

Anyway, it's clear that I'll have to update this post as more things happen.

What's been expected

There have been several wishes / feature requests for window functions over the years.

Typical feature requests or forum queries are "Oracle-like Analytic Function RANK() / DENSE_RANK() [in 2004]", "analytical function like RANK etc to be implemented [in 2008]", "Is MySQL planning to implement CTE and Window functions? [in 2010]".

Typical blog posts are Shlomi Noach's "Three wishes for a new year [in 2012]" and Baron Schwartz's "Features I'd like in MySQL: windowing functions [in 2013]"..

Typical articles mentioning the MySQL/MariaDB lack of window functions are "What PostgreSQL has over other open source SQL databases" and "Window Functions Comparison ...".

So it's clear that there has been steady demand, or reason for demand, over the years.

My first applause moment is: Mr Petrunia and Mr Ciorbaru have addressed something that's been asked for, rather than what they wished had been asked for.

Comparisons to other DBMSs

I know twelve DBMSs that support window functions. No screen is wide enough for a chart showing them all, so I'll just list their windows-function documents here so that you can click on the names to get the details:
APACHE DRILL,
CUBRID,
DB2 LUW,
DB2 z/OS 10,
DERBY,
FIREBIRD,
INFORMIX,
ORACLE,
POSTGRESQL,
SQL SERVER,
SYBASE,
TERADATA. I'll show MariaDB against The Big Three.

These functions are mentioned in the standard document as required by optional feature T611 ELementary OLAP operations:

Function MariaDB Oracle DB2 SQL Server
DENSE_RANK yes yes yes yes
RANK yes yes yes yes
ROW_NUMBER yes yes yes yes

These functions are mentioned in the standard document as required by optional feature T612 Advanced OLAP operations:

Function MariaDB Oracle DB2 SQL Server
CUME_DIST yes yes no yes
PERCENT_RANK yes yes no yes

These functions are mentioned in the standard document as required by optional features T614 through T617:

Function MariaDB Oracle DB2 SQL Server
FIRST_VALUE no yes yes yes
LAG no yes yes yes
LAST_VALUE no yes yes yes
LEAD no yes yes yes
NTH_VALUE no yes no no
NTILE yes yes no yes

These are common functions which are in the standard and which can be window functions:

Function MariaDB Oracle DB2 SQL Server
AVG yes yes yes yes
COUNT yes yes yes yes
COVAR_POP/SAMP no yes yes yes
MAX no yes yes yes
MIN no yes yes no
SUM yes yes yes yes
VAR_POP/SAMP no yes yes yes

Yes MariaDB also supports non-standard functions like BIT_XOR, but they're worthless for comparison purposes. What's more important is that the MariaDB functions cannot be DISTINCT.

As for the options in OVER clause ... just the important ones ...

Function MariaDB Oracle DB2 SQL Server
ORDER BY yes yes yes yes
NULLS FIRST|LAST no "yes" yes no
PARTITION BY yes yes yes yes
PRECEDING|FOLLOWING sometimes yes yes yes

Those are the options that matter. The NULLS clause is important only because it shows how far an implementor will go to support the standard, rather than because most people care. MariaDB in effect supports NULLS HIGH|LOW, which is as good as Oracle -- The Oracle manual puts it this way: "NULLS LAST is the default for ascending order, and NULLS FIRST is the default for descending order." People who think that's not cheating can add a comment at the end of this post.

From the above I suppose this second applause moment is justifiable: MariaDB has all the basics, and half of the advanced features that other DBMSs have.

Problems (including crashes and wrong answers)

The MariaDB announcement says:

"Do not use alpha releases on production systems! ... Thanks, and enjoy MariaDB!"

Indeed anyone who used 10.2.0 in production would discover that enjoyable things can be bad for you.

I started with this database: ...

create table t1 (s1 int, s2 char(5));
insert into t1 values (1,'a');
insert into t1 values (null,null);
insert into t1 values (1,null);
insert into t1 values (null,'a');
insert into t1 values (2,'b');
insert into t1 values (-1,'');

The following statements all cause the server to crash:

select row_number() over ();
select 1 as a, row_number() over (order by a) from dual;
select *, abs(row_number() over (order by s1))
 - row_number() over (order by s1) as X from t1;
select rank() over (order by avg(s1)) from t1;

The following statements all give the wrong answers:

select count(*) over (order by s2) from t1 where s2 is null;
select *,dense_rank() over (order by s2 desc),
 dense_rank() over (order by s2) from t1;
select *, sum(s1) over (order by s1) from t1 order by s1;
select avg(s1), rank() over (order by s1) from t1;

The following statement causes the client to hang (it loops in mysql_store_result, I think this is the first time I've seen this type of error)

select *, avg(s1) over () from t1;

And now for the third applause line ... to which you might be saying: huh? Aren't those, er, less-than-desirable results? To which I would reply: yes, but two weeks ago there were far more and far bigger problems. We should be clapping for how quickly progress has been made, and guessing that this section of my post will be obsolete soon.

How to prepare

You have lots of time to get ready for 10.2, but may as well start now by getting rid of words that have special meaning for window functions.

The word OVER is reserved.

The newly supported function names -- DENSE_RANK RANK ROW_NUMBER CUME_DIST PERCENT_RANK NTILE -- are not reserved, and the names of functions which will probably be supported soon -- FIRST_VALUE LAG LEAD LAST_VALUE NTH_VALUE -- will probably not be reserved. But they might as well be, because you won't be able to use those names for your own functions. Besides, they're reserved in standard SQL.

What you can use as a substitute

Suppose you don't want to wait till MariaDB is perfect, or you'd like to stay with MySQL (which as far as I know has made less progress than MariaDB toward this feature). Well, in short: gee that's too bad. But I have seen three claims about getting a slight subset.

One: Shlomi Noach claimss you can use a trick with GROUP_CONCAT:

Two: Adrian Corston claims you can make delta functions with assignments.

Three: I claim that in ocelotgui you can put 'row_number() over ()' in a SELECT and get a row-number column even with older versions of MySQL or MariaDB (this is a recent change, it's in the source not the binary).

In fact all the "window_function_name() OVER ()" functions could be done easily in the client, if they're in the select list and not part of an expression,
and the result set is ordered. But I'm not sure whether that's something to excite the populace.

There might be a "Four:". I have not surveyed the various applications that can do cumulations. I suspect that mondrian is one, and open OLAP might be another, but haven't looked at them.

Our own progress

For ocelotgui (Ocelot's GUI client for MySQL and MariaDB) we had to adjust the syntax checker to highlight the new syntax in 10.2, as this screenshot shows window_function
So we now can claim to have "the only native-Linux GUI that correctly recognizes MariaDB 10.2 window functions". Catchy slogan, eh? The beta download is at https://github.com/ocelot-inc/ocelotgui. I still expect that it will be out of beta in a few weeks.

MariaDB 10.1 Release Candidate

I installed the MariaDB 10.1 Release Candidate. Nothing interesting happened, which from MariaDB's point of view is good. But here's how I tried to make it interesting. Some of this applies to late releases of MariaDB 10.0 as well.

Loop with MAKE INSTALL

My habit is to download the source to directory X and then say "cmake -DCMAKE_INSTALL_PREFIX=/X" (the same directory), then "make", then "make install". That doesn't work any more. Now I can't install in the same directory that I downloaded in. Not a big deal; perhaps I'm the only person who had this habit.

Crash with ALTER

In an earlier blog post General Purpose Storage Engines in MariaDB I mentioned a crash, which I'm happy to say is fixed now. Here's another way to crash, once again involving different storage engines.

Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 2
Server version: 10.1.8-MariaDB Source distribution

Copyright (c) 2000, 2015, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [(none)]> INSTALL SONAME 'ha_tokudb';
Query OK, 0 rows affected (0.57 sec)

MariaDB [(none)]> USE test
Database changed
MariaDB [test]> CREATE TABLE t (id INT UNIQUE, s2 VARCHAR(10)) ENGINE=tokudb;
Query OK, 0 rows affected (0.56 sec)

MariaDB [test]> INSERT INTO t VALUES (1,'ABCDEFGHIJ');
Query OK, 1 row affected (0.04 sec)

MariaDB [test]> INSERT INTO t VALUES (2,'1234567890');
Query OK, 1 row affected (0.05 sec)

MariaDB [test]> CREATE INDEX i ON t (s2);
Query OK, 0 rows affected (0.64 sec)                                   
Records: 0  Duplicates: 0  Warnings: 0

MariaDB [test]> ALTER TABLE t engine=innodb;
Query OK, 2 rows affected (0.53 sec)               
Records: 2  Duplicates: 0  Warnings: 0

MariaDB [test]> ALTER TABLE t engine=tokudb;
ERROR 2013 (HY000): Lost connection to MySQL server during query

... Do not do this on a production system, as it will disable all your databases.
[ UPDATE 2015-09-29: This is apparently due to a problem with jemalloc which should only happen if one builds from source on Ubuntu 12.04. MariaDB was aware and had supplied extra information in its Knowledge Base, which I missed. Thanks to Elena Stepanova. ]

PREPARE within PREPARE

No doubt everyone encounters this situation at least once:

MariaDB [test]> prepare stmt1 from 'prepare stmt2 from @x';
ERROR 1295 (HY000): This command is not supported in the prepared statement protocol yet

... and then everyone else gets on with their lives, because preparing a prepare statement isn't top-of-agenda. Not me. So I welcome the fact that I can now say:

MariaDB [test]> prepare stmt1 from 'begin not atomic prepare stmt2 from @x; end';
Query OK, 0 rows affected (0.00 sec)
Statement prepared

So now if I execute stmt1, stmt2 is prepared. This is part of the "compound statement" feature.

Evade the MAX QUERIES PER HOUR limit

Suppose some administrator has said

GRANT ALL ON customer.* TO 'peter'@'localhost' WITH MAX_QUERIES_PER_HOUR 20;

Well, now, thanks again to the "compound statement" feature, I can evade that and do 1000 queries. Here's the test I used:

SET @a = 0;
DELIMITER //
WHILE @a < 1000 DO INSERT INTO t VALUES (@a);
SET @a = @a + 1;
END WHILE;//
DELIMITER ;

No error. So with a little advance planning, I could put 1000 different statements in a user variable, pick off one at a time from within the loop, and execute. One way of looking at this is: the WHILE ... END WHILE is a single statement. Another way to look at this is: new features introduce new hassles for administrators. Such, however, is progress. I clapped for the compound-statement feature in an earlier blog post and ended with: "But MariaDB 10.1.1 is an early alpha, and nothing is guaranteed in an alpha, so it's too early to say that MariaDB is ahead in this respect." I'm glad to say that statement is obsolete now, because this is MariaDB 10.1.8, not early alpha but release candidate.

The true meaning of the OR REPLACE clause

MariaDB has decided to try to be consistent with CREATE and DROP statements, because frankly nobody could ever remember: which CREATE statements allow CREATE IF NOT EXISTS, which CREATE statements allow CREATE OR REPLACE, which DROP statements allow DROP IF EXISTS? I wrote a handy chart in MySQL Worklog#3129 Consistent Clauses in CREATE and DROP. Now it's obsolete. The MariaDB version of the chart will have a boring bunch of "yes"es in every row.

But OR REPLACE behaviour is just a tad un-Oracle-ish. The Oracle 12c manual's description is "Specify OR REPLACE to re-create the [object] if it already exists. You can use this clause to change the definition of an existing [object] without dropping, re-creating, and regranting object privileges previously granted on it." That's not what MariaDB is doing. MariaDB drops the object and then creates it again, in effect. You can see that because you need to have DROP privilege on the object in order to say CREATE OR REPLACE.

And here's where it gets a tad un-MySQL-ish too. If you say "CREATE OR REPLACE x ...,", causing the dropping of an existing x, and then say SHOW STATUS LIKE 'Com%', you'll see that the Com_drop_* counter is zero. That is: according to the privilege requirements, x is being dropped. But according to the SHOW statement, x is not being dropped. Decent folk wouldn't use SHOW anyway, so this won't matter.

An effect on us

One of the little features of ocelotgui (the Ocelot GUI application for MySQL and MariaDB) is that one can avoid using DELIMITER when typing in a statement. The program counts the number of BEGINs (or WHILEs or LOOPs etc.) and matches them against the number of ENDs, so it doesn't prematurely ship off a statement to the server until the user presses Enter after the final END. However, this feature is currently working only for compound statements within CREATE statements. Now that compound statements are stand-alone, this needs adjusting.

Now that I've mentioned ocelotgui again, I'll add that if you go to the https://github.com/ocelot-inc/ocelotgui download page and scroll past the install instructions, you'll find more pictures, and a URL of the debugger reference, for version 0.7 alpha.

General Purpose Storage Engines in MariaDB

MariaDB tries to encourage use of many storage engines, which Oracle doesn't bother with. The ones that could be considered -- occasionally! -- as InnoDB replacements are: Mroonga, TokuDB, and Aria.

Mroonga

I predicted that Mroonga would be in MariaDB 10.0.8, and behold, here it is in MariaDB 10.0.15. Truly I am the Nostradamus of the database world.

It's a general-purpose storage engine, in that it doesn't have serious limitations regarding what can be stored or indexed. I think its "column-store" feature has limited appeal, but as I saw in January, it's dandy for full-text searching. This time I look at the list of "Full-Text Restrictions" in the MySQL manual, and quote from it:

"Full-text searches are supported for InnoDB and MyISAM tables only." -- Not true for Mroonga, of course.

"Full-text searches are not supported for partitioned tables." -- True for Mroonga, which can't handle partitioned tables at all.

"... the utf8 character set can be used, but not the ucs2 [or utf16 or utf32] character set." -- True for Mroonga.

"Therefore, the FULLTEXT parser cannot determine where words begin and end in [Chinese/Japanese/Korean i.e. CJK]." -- Not true for Mroonga, which is just great with CJK.

"... all columns in a FULLTEXT index must use the same character set and collation." -- True for Mroonga.

"The argument to AGAINST() must be a string value that is constant during query evaluation." -- True for Mroonga.

"For InnoDB, all DML operations (INSERT, UPDATE, DELETE) involving columns with full-text indexes are processed at transaction commit time." -- Technically not true for Mroonga, but Mroonga indexes will get out of synch if rollbacks occur.

TokuDB

In a previous escapade I found that TokuDB was good stuff but lacked two features: foreign keys and full-text search.

So, inspired by the fact that I can use Mroonga full-text searching with InnoDB, I wondered: wouldn't it be great if I could use that trick on TokuDB too?

Well, I gave it a try, and have to say the result was a bit disappointing ...

MariaDB [test]> CREATE TABLE t (
    ->   id INT NOT NULL DEFAULT '0',
    ->   `text_column` VARCHAR(100) CHARACTER SET utf8 DEFAULT NULL,
    ->   PRIMARY KEY (id),
    ->   FULLTEXT KEY `text_column` (text_column)
    -> ) ENGINE=mroonga DEFAULT CHARSET=latin1 COMMENT='engine "tokudb"'
    -> ;
ERROR 2013 (HY000): Lost connection to MySQL server during query

... Do not do this on a production system, as it will disable all your databases.

I also have a MariaDB 10.0.15 server that I built from source, and on that server TokuDB crashes even for simple everyday statements. I suppose this is like saying "Doctor my head hurts when I beat it against the wall". But there's no warning during the build and no warning in the documentation. Therefore it must be somebody else's fault.

Aria

In 2013 Stewart Smith said about Aria: "It’s still not there and I don’t think it ever will be." I didn't see anything in 2014 that contradicts what he predicts, and the MariaDB people themselves say improvements are "on hold", but it does have some advantages over MyISAM.

This is another storage engine with full-text support, but I rate that feature as "poor" because (like InnoDB) it won't do CJK well, and (unlike InnoDB) it has some irritating default settings about minimum key length and maximum frequency.

It looks nice that I can create a SPATIAL index with Aria; however, I see that this is supposedly possible with InnoDB too, according to the recent announcement of MySQL 5.7.5.

Data can be represented in tables

Capability Mroonga TokuDB Aria InnoDB
Full-text Indexes Excellent No Poor Poor
Foreign Keys No No No Yes
Rollbacks No Yes No Yes
Spatial Indexes Yes No Yes Real Soon
Maximum Key Length 3072 3072 1000 767
Smallest Lock Column Row Table Row
Allows Partition Clauses No Yes Yes Yes
Recovers after 'kill -9' No Yes Yes Yes
Works on Windows Yes No Yes Yes

Or, if you want to go beyond general-purpose open-source storage engines like these, there are lots more to look at. The biggest list of MySQL-compatible storage engines that I know of is in the Spanish Wikipedia.

UPDATE NOTICE: The chart above has been corrected due to comments from a Mroonga developer and a TokuDB developer.

Dynamic compound statements in MariaDB

A long-ago-discussed and much-requested feature, "dynamic compound statements", is working at last.

It's been eleven years since the original discussion of dynamic compound statements for MySQL, as you can see by looking at the worklog task on the wayback machine. (As usual, you have to click the "high level architecture" box to see the meat of the specification.) The essential idea is that one can directly enter compound statements like BEGIN ... END and conditional statements like "IF ... END IF" and looping statements like "WHILE ... END WHILE" without needing a CREATE PROCEDURE or CREATE FUNCTION statement.

The advantages are that one can run conditional or complex sequences of statements without needing an EXECUTE privilege, or affecting the database metadata. This has been a popular feature request, as one can see from bug#15037, bug#48777, bug#54000, and bug#61895. and bug#62679.

In 2013 Antony Curtis submitted a patch named "Compound / Anonymous statement blocks for MySQL". (Anonymous blocks is the Oracle terminology.) Apparently he offered it to Oracle but they couldn't agree about the licence terms. The MariaDB people were gladder to get the offer, so it's now in the MariaDB 10.1.1 alpha.

I downloaded it from source. There's a problem with "make install" but it's easy to work around and has nothing to do with Mr Curtis's patch. I had no trouble checking that the feature works as advertised. So, next, I compared the spec with Mr Curtis's implementation.

Spec: Allow BEGIN ... END "compound statements", CASE, IF, LOOP, WHILE, and REPEAT. Implementation: all done. The only significant defect is that BEGIN has to be stated as BEGIN NOT ATOMIC, to avoid confusion with an old non-standard meaning of BEGIN. So "BEGIN DELETE FROM t; END" is illegal. And "label_x: BEGIN DELETE FROM t; END" is illegal. Only "BEGIN NOT ATOMIC DELETE FROM t; END" is legal. It's a slight disappointment that no way was found to handle these little difficulties in the parser.

Spec: internally there will be a temporary anonymous procedure created for every compound statement. Implementation: This didn't happen, at least not in a user-visible way. That's a low-level detail so it doesn't matter.

Spec: there should be no new privilege requirement. Implementation: there is no new privilege requirement.

Spec: the implied routine will have characteristics like MODIFIES SQL DATA and NOT DETERMINISTIC. Implementation: characteristics are irrelevant.

Spec: even if there is an anonymous procedure, it should not be visible to the user in information_schema.routines. Implementation: nothing is visible in information_schema.routines.

Spec: perhaps the statement should be in performance_schema.statements. Implementation: no.

Spec: some statements that are client-specific and are not allowed in stored procedures should not be in dynamic compound statements. Implementation: right. For example "USE database_name" is not allowed.

Spec: it's uncertain whether a dynamic compound statement (which after all is a "statement") should appear as a single statement in the slow log. Implementation: I didn't test this because I don't believe in the slow log; I assumed it doesn't work.

Spec: if @@sql_mode is set within a dynamic compound statement, then it gets restored to its original value when the statement ends. Implementation: yes, it's restored.

Spec: A dynamic compound statement may not contain statements that create or drop or alter routines. Implementation: right, statements like DROP PROCEDURE are illegal.

Spec: a dynamic compound statement cannot be used for a PREPARE statement, and therefore there is no fix for BUG#14115 "Prepare() with compound statements breaks". Supposedly there could be parser-related pitfalls with such syntax. Implementation: PREPARE is allowed. Statements like PREPARE stmt1 FROM 'BEGIN NOT ATOMIC DECLARE v INT; END'// are just fine. This is worrisome, but probably the supposed pitfalls were cleared up long ago.

Spec: SHOW STATUS could have new counters: Com_compound_statement, Com_if, Com_loop, Com_repeat, Com_while. Implementation: No, that's not implemented.

Spec: SHOW PROCESSLIST will show the whole compound statement. Implementation: No, that's not implemented. When I first started the performance_schema design I realized that SHOW PROCESSLIST would eventually become obsolete, so nowadays I think this part of the spec is obsolete.

The future

Given that dynamic compound statements are in DB2 and Oracle 12c and PostgreSQL and now in MariaDB alpha, Oracle/MySQL will look a bit slow if it waits another eleven years to implement them.

But MariaDB 10.1.1 is an early alpha, and nothing is guaranteed in an alpha, so it's too early to say that MariaDB is ahead in this respect.

A glance at a MariaDB release candidate

Today I looked at the MariaDB Release Candidate wondering how my earlier predictions came out.

I predicted, for "roles":

For all of the bugs, and for some of the flaws, there's no worry -- they'll probably disappear.

In fact all the bugs are gone, and I belatedly realized (after some gentle nudges from a MariaDB employee) that some of the flaws weren't flaws.
The inability to grant to PUBLIC still troubles me, but it looks like roles are ready to roll.

I predicted for "mroonga":

At the time I'm writing this, MariaDB 10.0.8 doesn't have mroonga yet.

In other words, I thought it would be in 10.0.8. It was not to be, as another MariaDB employee told me:

Unfortunately, mroonga appeared to have some portability problems. Most likely it won't be in 10.0.8. There is a good chance it will be 10.0.9 though.

While I was looking, I tried out a new clause for the DELETE statement:
DELETE FROM t RETURNING select-list;
which returns the rows that just got deleted. It's a new non-standard clause which increases MariaDB's divergence from MySQL, so decent people should demand an explanation.

Oracle 12c looks similar but does something different:
DELETE FROM t RETURN|RETURNING expr [,expr...] INTO data-item [,data-item...];
SQL Server 2012 has a clause which looks different but does something similar:
DELETE FROM t OUTPUT select-list;
PostgreSQL has a clause which looks similar and does something similar:
DELETE FROM t RETURNING select-list;

So we're seeing, yet again, the belief that PostgreSQL is a better model for MariaDB than the big boys are. By the way, the big boys are a bit more orthogonal -- I've seen extensions similar to this one in UPDATE or MERGE statements. But the bigger orthogonality woe is that here we have a statement that returns a result set, but can't be used in all the places where a result-set-returning sub-statement could work, like DECLARE CURSOR FOR DELETE, or INSERT ... DELETE. Of course, I'm glad they don't work. My point is only that now-it-works-now-it-doesn't additions make the product look haphazard.

Update: How it's done in standard SQL

This section was added on 2014-03-01, one week after the original post.
SQL:2011 has a non-core (optional) feature T495 “Combined data change and retrieval”. Ignoring a lot, the syntax is SELECT ... FROM NEW|OLD|FINAL TABLE (INSERT|UPDATE|DELETE ...) .... For example

 SELECT column1+5 FROM OLD TABLE (DELETE FROM t) AS t WHERE f = 77;

In this example table t is a "delta table".
There's an example with DECLARE CURSOR and FETCH in the DB2 10 for z/OS manual.

Mroonga and me and MariaDB

Chinese and Japanese and Korean (CJK) text usually has no spaces between words. Conventional full-text search does its tokenizing by looking for spaces. Therefore conventional full-text search will fail for CJK.

One workaround is bigrams. Suppose the text is

册免从冘

There should be three index keys, one for each two-character sequence:

册免, 免从, and 从冘.

Now, in a search like

SELECT * FROM t WHERE MATCH(text_column) AGAINST ('免从');

a bigram-supporting full-text index will have a chance. It's wasteful and there will be false hits whenever the bigram isn't really a "word", but the folks in CJK-land have found that bigrams (or the three-character counterpart, trigrams) actually work.

One way to get bigrams for MySQL or MariaDB is to get mroonga.

Why care about Yet Another Storage Engine)?

Back in 2008 a project named Senna attracted the attention of my colleagues at MySQL, but didn't go on to world conquest.

Since around 2011 the groonga project, billed as a "successor" to Senna, has been outing regular releases of a generic library that can connect to more than one DBMS. "Mroonga" is the "M[ySQL storage engine that interfaces with the] groonga [code]". So, although it hasn't been packaged with MySQL or MariaDB until now, it's not all new or untested code. What's new is that MariaDB will, real soon now, include mroonga as part of the regular MariaDB download. Poof, credibility at last.

I understand that mroonga has features that make it interesting:
* It's a column-storage engine, which I suppose makes it an alternative to Infobright or InfiniDB
* Some of its users or developers also have an involvement with the Spider storage engine, which I suppose means there wouldn't be too much trouble using the two engines in concert: mroonga for full-text and Spider for sharding.
... but I didn't look at the column storage or the Spider compatibility. I cared only that SHOW ENGINES said it's "CJK-ready fulltext search".

Of course, SHOW ENGINES can also show MyISAM and InnoDB and SphinxSE, and they can all do full-text searching too. One might choose MyISAM because for a long time that was the only engine that had full-text, or choose InnoDB because it's the default nowadays, or choose SphinxSE because it has lots of features including stemming. But those aren't targeted for the CJK niche. For example, MariaDB has no bigrams. Sphinx's documentation says it does have bigrams, but a glance shows instantly "those are not the bigrams you're looking for".

The only questions, then, are (a) does mroonga really handle full-text, and (b) does mroonga really handle CJK?

Does mroonga work?

At the time I'm writing this, MariaDB 10.0.8 doesn't have mroonga yet. There are instructions for getting packages with Windows and Ubuntu etc., but such things never work for me -- they'll always be out of synch with something else that I've got, or with what the main vendor (Oracle or MariaDB in this case) is updating. So I decided to build from source. Getting mroonga was easy since I already have Bazaar:

bzr branch lp:~mroonga/maria/mroonga

and then build. Actually the initial build failed because mroonga required CMake 2.8.8 which doesn't come with Ubuntu 12.04. But after I worked around that, I had a server and all I had to say was

                                                                             
INSTALL PLUGIN mroonga SONAME 'ha_mroonga.so';
CREATE FUNCTION last_insert_grn_id RETURNS INTEGER SONAME 'ha_mroonga.so';
SET GLOBAL mroonga_log_level=NONE;

I found more out by looking at some documentation, which is nowhere near Oracle quality but is mostly in readable English.

Then I was able to do this:

CREATE TABLE t (
  id INT NOT NULL DEFAULT '0',
  `text_column` VARCHAR(100) CHARACTER SET utf8 DEFAULT NULL,
  PRIMARY KEY (id),
  FULLTEXT KEY `text_column` (text_column)
) ENGINE=mroonga DEFAULT CHARSET=latin1 COMMENT='engine "innodb"'

It's probably less than obvious that I'm looking at a good-looking feature. The point is: with this syntax I can use mroonga with an InnoDB table. Whoopie. Although that means I lose mroonga's other features, I can use it without losing whatever I find good about InnoDB. And then I was able to do this:

SELECT * FROM t WHERE MATCH(text_column) AGAINST ('x');

Again it's probably less than obvious that I'm looking at a good-looking feature. The MATCH ... AGAINST syntax is idiosyncratic and unobvious -- don't expect this to arrive in an SQL standard near you any time soon. But it's what the long-term MyISAM user is used to, so the transition isn't painful.

Does mroonga handle CJK?

The J (Japanese) part is certainly there. The developers are Japanese. One of the optional extras is integration with MeCab which is a tool that can handle Japanese morphology -- it's like solving the "no spaces" problem by understanding a bit about the Japanese language, which after all is the solution that a human would use with no-spaces text. I don't see, though, that there's been equivalent attention paid for the C (Chinese) and K (Korean) parts of CJK. So I just looked at the bigrams, since they're all that C or K could benefit from.

I created a million-row table containing randomly-chosen characters, mostly Kanji, but with a mix of Latin and Kana letters. How I made it is tedious, but perhaps somebody out there will want to know how to make randomly populated tables with such "data", so I'll put the code in a comment on this posting.

I had a choice between a bewildering variety of ways to tokenize. The default one, TokenBigram, did the job. The good news for me was that it didn't do bigrams when the text was entirely Latin -- that would be silly -- but in a separate test I did see that it indexed Latin characters as ordinary words. So one of the worries that I had -- that mroonga would be only good for CJK but not for non-CJK mixed inside the same text column -- turned out not to be a worry.

And now, the essential point, the "l'essence du bigram" (which would look good on a restaurant menu, eh?) ...

MariaDB [test]> SELECT COUNT(*) FROM t WHERE MATCH(text_column) AGAINST ('免从');
+----------+
| COUNT(*) |
+----------+
|        1 |
+----------+
1 row in set (0.00 sec)

It found a row with a bigram! But is that the right count?

MariaDB [test]> SELECT COUNT(*) FROM t WHERE text_column LIKE '%免从%';
+----------+
| COUNT(*) |
+----------+
|        1 |
+----------+
1 row in set (2.48 sec)

Yes, it's the right count -- a LIKE search confirms it.

The difference is that the LIKE search took 2.5 seconds because it had to scan a million rows. The mroonga full-text search took approximately zero seconds because it used an index. That doesn't mean that mroonga is fast -- not every search is a benchmark. I did find a graph that shows mroonga throughput is better than MyISAM's or SphinxSE's but probably if I looked hard I could find a graph that says the opposite.

And the crowd goes wild

Although I have done no benchmarks or bug hunts, I have acquired the impression that mroonga is capable of doing what its makers say it ought to be able to do.

The Third Most Popular Open Source DBMS

We all know that MySQL says it is "the world's most popular open-source database". And PostgreSQL has a firm hold on second place while claiming to be, instead, "the world's most advanced open source database". But the horse that comes in third can return some money to gamblers who bet "to show". And the horse that shows momentum or gets close is worth watching for next time.

So I'll just ignore the dolphin and the elephant in the room, and go on to a harder question: who's number three?

According to Wikipedia

To find out how many times someone has expressed interest in a topic, I can go to stats.grok.se and ask how many times someone has looked at that topic's page in Wikipedia. Evil-thinking people could manipulate these numbers with ease, but until now they have had no reason to do so.

So here is the count of accesses on the Wikipedia pages of the nine contending SQL DBMSs, for September last year (blue) and September this year (red).

third_most_popular_1

In absolute terms, MariaDB is only slightly ahead. But if you look at momentum (what direction the hit counts have gone between September 2012 and September 2013), interest in MariaDB has risen a great deal while interest in most other DBMSs has declined.

Now here is the chart for open-source NoSQL DBMSs, made the same way. This time I can't be comprehensive -- there are 150 NoSQL products! I'll show only the ten with the most hits.
thirrd_most_popular_2
Of course MongoDB is in front. The surprise is that MongoDB's in-frontness isn't glaring.

According to The Register

To find out how many times the mass media mentioned a DBMS, I can go to google.com and enter site:theregister.co.uk [DBMS name] -mozilla. Simply googling the DBMS name would fail because Google seems to become more inexact as numbers get bigger, and because "Ingres" and "Firebird" and "Drizzle" and "Cassandra" have alternative meanings that have nothing to do with databases.

So this term is more specific, it's asking: how many times does search-term appear in The Register (in a non-Mozilla context)? For this purpose I'm positing that "the mass media" and the well-known medium "The Register" are synonymous terms.
third_most_popular_3

According to Ohloh

To find out how much activity an open-source project has, I can go to ohloh.net and look up the count of commits in the last year. (Ohloh "provides a single aggregate source of information about the world's open source projects from over 5,000 repositories and forges including GitHub, SourceForge and Google Code, as well as open source foundation repositories including the Eclipse, Mozilla, Apache, and Linux.") This method has been justly criticized, but is an indicator for developer enthusiasm. Again I am only showing the top ten, other than MySQL or PostgreSQL, for any kind of open-source DBMS.
third_most_popular_4

According to db-engines

An Austrian company named solid IT maintains a site named db-engines.com. On it there is "a list of DBMS ranked by their current popularity ... updated monthly." Their ranking is based on (a) Google and Bing hits, (b) Google Trends, (c) Stack Overflow and DBA Stack Exchange (counting the number of technical discussions), (d) indeed.com and simplyhired.com (counting the number of job offers), (e) LinkedIn (counting the number of people who mention the product in their profiles). In other words, none of their criteria match the criteria that I prefer. If one excludes closed-source and MySQL and PostgreSQL from their list, their top 10 hits are:
third_most_popular_5

Woe and alas. If their results had been similar to mine, despite their different criteria, then that might have validated both our methods. But they're very different. I think that's because my criteria are the only valid ones, but obviously I'm not the only evaluator.

According to the makers

"We believe that there are more copies of SQLite in use around the world than any other SQL database engine, and possibly all other SQL database engines combined."
-- sqlite.org

"Mountain View, Calif. – February 8, 2012 – Couchbase, Inc. [is] the NoSQL market share leader ..."
-- couchbase.com

"Firebird® is the universal open source database used by approximately 1 million of software developers worldwide."
-- facebook.com/FirebirdSQL

"MongoDB is the leading NoSQL database, with significant adoption among the Fortune 500 and Global 500."
-- mongodb.com

Announcing the finishers

MariaDB is in third place among the open-source SQL DBMSs, and would be third overall if we gave more weight to the numbers "according to Ohloh" and "according to The Register". But MongoDB and Cassandra nose past it "according to Wikipedia", and MongoDB is definitely the third-place finisher "according to db-engines". The claim of SQLite is strongest "according to the makers".

We have a photo finish, with a blurry camera.