Category: MariaDB


Offset and fetch clauses in MariaDB

A new feature in MariaDB 10.6 is “Implement SQL-standard SELECT … OFFSET … FETCH”.

I’ll say what’s not in the manual: mention other vendors, add details, talk style, show how to get the wrong answer, describe WITH TIES, mention alternatives, experiment with locks, and list 11 minor flaws.

Other vendors

Products with support for OFFSET … FETCH:
Oracle, DB2, SQL Server.

Products with support for LIMIT … OFFSET:
Tarantool, MySQL, Vertica, PostgreSQL.

Products with support for neither:
Firebird, ScyllaDB, Actian.

This isn’t a complete list — there are lots of SQL products nowadays — but it does include the “Big Three”. All of them support SELECT … OFFSET … FETCH.

Syntax details

The BNF actually is:

[ OFFSET start { ROW | ROWS } ]
[ FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } { ONLY | WITH TIES } ]

The “start” and count” must be unsigned integers if they’re literals, but if they’re declared variables then they can be NULL (in which case they’re treated as zeros), or can be negative (in which case they’re treated as -infinity for “start” and +infinity for “count”) (but depending on undefined behaviour isn’t always fun).

The lowest offset is zero which is the first row, and the lowest count is zero, that is, it is possible to select zero rows. If count is omitted, then 1 is assumed.

Style

If you haven’t read the descriptive SQL style guide and clicked like, then you won’t understand this section, skip it.

OFFSET and FETCH are both either “clause starts” or “sub-clause starts” so if the choice is to put separate clauses on separate lines, then they are.

Deciding whether to use ROW rather than ROWS, or FIRST rather than NEXT, (neither of which make a difference for server behaviour), one could base a rule on English grammar:
(1) If there was an OFFSET clause with a non-zero number-of-rows, then NEXT, else FIRST. If number-of-rows is greater than one, or unknown, then ROWS, else ROW.
or one could base a rule on what comes first in the MariaDB manual BNF:
(2) The BNF has {FIRST|NEXT} and {ROW|ROWS} so FIRST and ROW.
or one could base a rule on what the examples in the manual use most often:
(3) FIRST and ROWS
or one could base a rule on whatever is shortest:
(4) NEXT and ROW.

I like (3), for example

  OFFSET 5 ROWS
  FETCH FIRST 1 ROWS ONLY;

but there’s no established rule yet so I’ll vary the format in what follows.

How to get the wrong answer

Try this:

CREATE TABLE td (char_1_column CHAR(1) DEFAULT '1',
                 null_column INT,
                 char_2_column CHAR(2));
CREATE INDEX int_char_1_column_plus_char_2_column
  ON td (char_1_column, char_2_column);
INSERT INTO td (char_2_column)
  VALUES ('a'),('b'),('c'),('d'),('e'),('ab');
SELECT char_2_column, null_column
  FROM td
  WHERE char_1_column > '0'
  OFFSET 0 ROWS
  FETCH FIRST 4 ROWS ONLY;
SELECT char_2_column, null_column
  FROM td WHERE char_1_column > '0'
  OFFSET 4 ROWS
  FETCH FIRST 4 ROWS ONLY;

Result of first SELECT:
(‘a’,NULL),(‘ab’,NULL),(‘b’,NULL),(‘c’,NULL)

Result of second SELECT:
(‘e’,NULL),(‘ab’,NULL)

So there’s no (‘d’,NULL) and there’s a duplicate (‘ab’,NULL). Wrong answer.

I don’t think it’s a bug because, in the absence of an ORDER BY statement, there is no guarantee which rows will be skipped in the second SELECT. But luckily it’s due to an exotic situation which is easy to avoid: the combined index on a non-unique column and a multi-character string column. Just don’t use the index, and all is well.

WITH TIES

The effects of the new syntax are mostly the same as with LIMIT … OFFSET, but WITH TIES is new.

Suppose there are 6 rows, selected in order (there must be an ORDER BY clause):
(NULL),(NULL),(1),(1),(2),(2).
Then OFFSET 0 ROWS FETCH NEXT 1 ROWS WITH TIES will gives us (NULL),(NULL).
Thus WITH TIES doesn’t mean that the two values are technically “equal”, they merely must be peers, so NULLS tie. The interesting thing is that when you ask for “NEXT 1 ROW” you get the next 2 rows. But that only applies to FETCH, it doesn’t apply to OFFSET, therefore this could be a mistake:

SELECT * FROM t
  ORDER BY pk
  OFFSET 0 ROWS
  FETCH NEXT 1 ROWS WITH TIES;
SELECT * FROM t
  ORDER BY pk
  OFFSET 1 ROWS
  FETCH NEXT 1 ROWS WITH TIES;

With the example 6 rows, the second SELECT will return a row that was already returned for the first SELECT. It would be better to say, instead of “OFFSET 1 ROWS”, “OFFSET (number of rows returned by the previous SELECT) ROWS”.

Alternatives

A standard way to fetch only one row:

BEGIN NOT ATOMIC
  DECLARE v INT;
  DECLARE c CURSOR FOR SELECT * FROM t;
  OPEN c;
  FETCH NEXT FROM c INTO v;
END;

A non-standard way to fetch only one row: (remember to restore @@sql_select_limit later):

SET sql_select_limit = 1;
SELECT * FROM t;

A way to offset by 10 and fetch 10 (this assumes that pk has all the integers in this range i.e. it depends on user knowing about contents and indexes, and assumes that user only wants to go forward without skipping)

SELECT * FROM t
WHERE pk > 10 AND pk <= 10 + 10;

A way to let the user do the offsets and fetches (this depends on the client being a GUI and the user knowing how to click on a scroll bar)

SELECT * FROM t;

… although users often only look at the first few pages (think of how often you use Google and only click Next a few dozen times).

Locking

Will locking, or SET TRANSACTION ISOLATION LEVEL, guarantee consistency?

For a test, I started two clients. On the first, I said:

CREATE TABLE pks (pk INT PRIMARY KEY);
BEGIN NOT ATOMIC
  DECLARE v INT DEFAULT 0;
  WHILE v <= 10000 DO
    INSERT INTO pks VALUES (v);
    SET v = v + 1;
  END WHILE;
END;
SET @@autocommit=0;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
START TRANSACTION;
SELECT * FROM pks
  WHERE pk > 400
  OFFSET 10 ROWS
  FETCH FIRST 2 ROWS ONLY FOR UPDATE;

On the second, I said:

SET @@autocommit=0;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
START TRANSACTION;
DELETE FROM pks WHERE pk = 7227;
DELETE FROM pks WHERE pk = 405;
DELETE FROM pks WHERE pk = 411;

Result: The first DELETE succeeds, the second DELETE hangs until I say ROLLBACK; on the first client. The third DELETE would also have hung if I had said it first.

In other words: the rows between 401 and 410 (the rows skipped due to OFFSET) are locked, and the rows between 411 and 412 (the rows picked up by FETCH) are locked, but the rows after 412 (the rest of the rows matching WHERE) are not locked.

What this means, of course, is that ordinarily you can’t be certain that you’re paging through a consistent set. If this is a concern (it shouldn’t always be), make a copy on either the server or the client.

Minor Flaws

A few incompatibilities with standard SQL or another DBMS or another bit of MariaDB 10.10 syntax.

Table t has 4 rows created thus:

CREATE TABLE t (s1 INT, s2 INT);
INSERT INTO t VALUES (1,2),(2,1),(3,2),(4,1);

1.

WITH x AS (SELECT * FROM t OFFSET 5 ROWS) SELECT * FROM x;

Result: Success, all 4 rows in table t are returned. It appears that the OFFSET clause is ignored.

2.

BEGIN NOT ATOMIC
  DECLARE var1 DECIMAL(0);
  SELECT * FROM t OFFSET var1 ROWS;
END;

Result: Error 1691 (HY000) A variable of a non-integer based type in LIMIT clause
Never mind whether var1 should be accepted. The error message is obsolete, I was not using a LIMIT clause.

3.

BEGIN NOT ATOMIC
  DECLARE var1 BIT DEFAULT NULL;
  SELECT * FROM t OFFSET var1 ROWS;
END;

Result: Success, all rows are returned.
Sometimes non-integer variables are illegal, but here a BIT variable is legal.
MariaDB calls BIT a numeric data type but SELECT * FROM t OFFSET b’1′ ROWS wouldn’t work.

4.

SET sql_mode='oracle';
CREATE PROCEDURE p2 AS
  var1 NUMBER := 1;
  BEGIN
    SELECT * FROM t OFFSET var1 rows;
  END;

(Don’t forget to restore sql_mode later.)
Result: Error 1691 (HY000) A variable of a non-integer based type in LIMIT clause
That’s right for standard SQL (“The declared type … shall be an exact numeric with scale 0”) but Oracle says it might be more forgiving (“If rowcount includes a fraction, then the fractional portion is truncated.”)

5.

SELECT * FROM t
  OFFSET 1 ROWS
  FETCH FIRST 0 ROWS ONLY;

Result: Success, zero rows are returned.
That’s wrong for standard SQL, which would return “data exception — invalid row count in fetch first clause”.

6.

SELECT * FROM t
  FETCH FIRST 1 ROWS ONLY;

Result: Success.
But the manual’s BNF is

OFFSET start { ROW | ROWS }
FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } { ONLY | WITH TIES }

which might suggest to newbies that the OFFSET clause is compulsory before FETCH.

7.

SET @x = 1;
SELECT * FROM t OFFSET @x ROWS;

Result: Syntax error.
I can find, by saying CREATE TABLE .. AS SELECT @x; and then saying SHOW CREATE TABLE or SELECT … FROM INFORMATION_SCHEMA, that @x is BIGINT. BIGINT declared variables are legal, and I can use variables that start with @ in some other situations. This is how things go when variables have variable types.

8.

SHOW WARNINGS OFFSET 1 ROW;

Result: Success. I believe this is reasonable, and of course any non-standard syntax can be handled in an implementor-defined way. But it’s not documented on the documentation page for SHOW WARNINGS.

9.

BEGIN NOT ATOMIC
  DECLARE only INT DEFAULT 2e2;
  DECLARE row INT DEFAULT (SELECT 5);
  SELECT * FROM t OFFSET ROW ROW FETCH FIRST ONLY ROW ONLY;
END;

Result: Success.
In standard SQL:2016 ROW and ONLY are reserved words.
MariaDB tends to accept goofiness since it’s not your nanny, but I believe in egregious cases the standard strictness is preferable after reasonable notice.

10.

SELECT * FROM t
  FETCH FIRST 1 ROWS ONLY ROWS EXAMINED 1;

Result: Error.
Another inconsistency, since LIMIT 1 ROWS EXAMINED 1 is legal.

11.

VALUES (1),(2),(3) OFFSET 1 ROW FETCH FIRST ROW ONLY;

Result: Success.
This is fine, but once again undocumented.

ocelotgui support

The Ocelot GUI client recognizes MariaDB syntax when connected to a MariaDB server, so if you need features like autocomplete after OFFSET to be correct then download it from github.

FOR … END FOR

Let’s start by combining one new MariaDB feature — VALUES (5) which is the standard equivalent of the old non-standard “SELECT 5” — with another new feature in MariaDB 10.3.5, the FOR … END FOR statement.

mariadb>BEGIN NOT ATOMIC
> DECLARE v INT DEFAULT 0;
> FOR a IN (VALUES (7)) DO SET v = v + 1; END FOR;
> SELECT v;
>END;
Error 2013 (HY000) Lost connection to MySQL server during query

Oops. However, the MariaDB folks now know about this, it’s bug MDEV-15940″ and they know about the other bugs that I’ll mention in this post, so there’s no problem provided you wait for the next version.

FOR … END FOR works prettily so I decided to describe it. The official documentation isn’t out yet so this could change.

FOR x IN (subquery) DO … END FOR

Example:

BEGIN NOT ATOMIC
  DECLARE v INT DEFAULT 0;
  FOR a IN (SELECT 5) DO SET v = v + 1; END FOR;
  SELECT v;
END;

The SELECT will return ‘1’ because the SET statement happened once, because SELECT 5 returns 1 row. We must put (SELECT 5) inside parentheses because there must be a way to find where it ends — the word DO is not reserved and therefore is useless as a marker for the parser.

FOR x IN cursor DO … END FOR

Example:

CREATE TABLE t (s1 INT);
INSERT INTO t VALUES (1),(2),(3);
BEGIN NOT ATOMIC
  DECLARE v2, v3 INT DEFAULT 0;
  BEGIN
    DECLARE cn CURSOR FOR SELECT * FROM t;
    DECLARE CONTINUE HANDLER FOR NOT FOUND SET v3 = 1;
    FOR cn_record IN cn
    DO
      SET v2 = v2 + cn_record.s1;
    END FOR;
  END;
  SELECT v2, v3;
END;

FOR cn_record IN cn means “loop with cursor cn, which was declared earlier”. And cn_record according to MariaDB is a row variable that can be used within the loop, so cn_record.s1 is the value of s1 within a given row of table t.

Given that table t has 3 rows containing (1),(2),(3), and given that the obvious intent is that v2 will end up with the sum, you might think the SELECT will return v2 = 6, eh? Well, you’re thinking right, it does. This is a way to go through a cursor, with each iteration doing an automatic FETCH.

And, since there’s a CONTINUE HANDLER that says we’ll set v3 = 1 when there are no more rows to fetch, you might think the SELECT will return v3 = 1, eh? Well, you’re thinking reasonably (I think), but it doesn’t. The handler isn’t brought into play, the NOT FOUND condition is automatically cleared when the loop ends.

Summary: this kind of FOR does automatic cursor-open and cursor-fetch, but not automatic cursor-close. To me that looked like a bug, the way I interpret the standard document, “CLOSE cn;” is supposed to happen. And MariaDB agrees, it’s bug MDEV-15941.

FOR x IN low-number .. high-number DO … END FOR

Example:

SET @v = 0;
CREATE PROCEDURE p()
BEGIN
  FOR a IN 1 .. 3 DO SET @v = @v + a; END FOR;
  SELECT @v;
END;

MariaDB 10.3 comes with a boatload of Oracle/PostgreSQL syntax, often known as “PL/SQL” (their deviation from the standard “SQL/PSM” syntax which MySQL/MariaDB have followed in the past). But all the PL/SQL stuff is supplied if and only if you say

SET @@sql_mode='oracle';

The FOR … END FOR statement is an exception, it works in the default mode too. That’s deliberate. And it’s equivalent to

SET a = 1; WHILE a <= 3 DO ... SET a = a + 1; END WHILE;

So the SELECT in the above example displays '6'.

The ".." is a new operator, vaguely similar to the "range operators" in languages like Ruby and Perl. But SQL already has a range operator, it's called BETWEEN. And now MariaDB won't be able to accept the SQL Server syntax where ".." is for qualifiers ("a..b" means "schema a + default table + column b").

I encountered a problem during the test, with:
FOR a IN (SELECT 1) .. 3 DO SET @v = @v + a; END FOR;
Technically this should be legal because “low-number” can be any expression, including a subquery. In this case, though, subqueries mean something else, so it is not legal. This is now bug MDEV-15944″.

I enthusiastically recommend: don’t use this, use WHILE.

Differences from the standard

My first example looked like this:

  FOR a IN (SELECT 5) DO SET v = v + 1; END FOR;

In standard SQL it would have looked like this:

  FOR (VALUES (5)) DO SET v = v + 1; END FOR;

… The standard doesn’t require mentioning “a” when there is no need to refer to “a” inside the loop. The fact that the standard would use “VALUES (5)” is, as we’ve seen, something that MariaDB will soon support too.

My second example looked like this:

    FOR cn_record IN cn
    DO
      SET v2 = v2 + cn_record.s1;
    END FOR;

In standard SQL it would have looked like this:

    FOR cn_record AS cn
    DO
      SET v2 = v2 + s1;
    END FOR;

So it’s a difference between “IN” and “AS”, and between “cn_record.s1” and “s1”. However, I could have said “cn_record.s1” in standard SQL too, it’s optional when there’s no ambiguity.

There are serious matters that underlie that innocent-looking difference with “cn_record.”, about which I now digress.

Shadowing, a Digression

In the following, should “SELECT v FROM t;” display 1, or 2, or 3?

BEGIN NOT ATOMIC
  a: DECLARE v INT DEFAULT 1;
  BEGIN
    b: DECLARE v INT DEFAULT 2;
    CREATE TABLE t (v INT);
    INSERT INTO t VALUES (3);
    SELECT v FROM t;
  END;
END;

Answer: in MariaDB it’s 2. Inner scope beats outer scope, and variable declaration beats table definition. This is an old MySQL flaw that MariaDB has inherited (Bug #5967 Stored procedure declared variable used instead of column). The workaround, if you actually want “column v of table t”, is to use a qualifier, as in “SELECT t.v FROM t;”.

But what if you wanted to refer to the first declared variable? That would be easy too, in standard SQL you would qualify with the label, as in “SELECT a.v FROM t;”. Unfortunately — another old MySQL flaw that MariaDB has inherited — you can’t use [label.]variable_name for qualification.

As a result of these two flaws, we could have trouble in our FOR … END FOR loop if we used unqualified names like “s1”. Therefore in MariaDB you have to say “cn_record.s1” if you want the FOR variable, that is, qualification is compulsory. That’s a solution.

But the solution depends on a third flaw.

In standard SQL in this FOR loop “cn_record” becomes a label, and that’s why “cn_record.s1” would be legal — cn_record is a label qualifier. That’s not the case with Oracle/MariaDB, where “cn_record” is a variable with type = ROW — cn_record is a row-variable qualifier. The effect is the same for the particular example that I used, but that’s just luck.

The flaws do not mean that the implementor did something wrong for FOR … END FOR, rather it means that the blame lies in MySQL’s history. It would be grossly wrong to blame the standard, which has no flaws.

Differences from the Standard, continued

My third example looked like this:

  FOR a IN 1 .. 3 DO SET @v = @v + a; END FOR;

In standard SQL it would have looked like this:

  [empty prairie with crickets chirping in the darkness]

For more about the original plan for adding FOR … END FOR to MySQL in a standard way, go to “WL#3309: Stored Procedures: FOR statement” and click “High-level architecture”.

ocelotgui

Of course we’ve changed Ocelot’s GUI client for MySQL/MariaDB so that it recognizes most of the new syntax in MariaDB 10.3. We haven’t changed the debugger, though, so it won’t handle statements containing FOR … END FOR.

The simultaneous_assignment mode in MariaDB 10.3.5

Starting with MariaDB 10.3.5, if you say sql_mode = ‘simultaneous_assignment’, then
UPDATE t SET a = b, b = a;
will swap b and a because a gets what’s originally in b, while b gets what’s originally in a (as of the start of the SET for a given row), instead of what was going on before, which was “Assignments are evaluated in left-to-right order”.

I will list things that I like or don’t like, and behaviour-change cases to watch for.

I like it that this is standard SQL at last

I don’t know any other vendor who ever did left-to-right assignments, and they had a good reason.

Way back in SQL-86, section 8.11 said how to handle a situation with (simplified BNF)
SET <object column> = <value expression> | NULL [, <object column> = <value expression> | NULL …
“If the <value expression> contains a reference to a column of T, the reference is to the value of that column in the object row before any value of the object row is updated.”

In case anyone still didn’t get it, SQL-92 added the sentence:
“The <value expression>s are effectively evaluated before updating the object row.”

Unfortunately later versions of the standard were slightly denser, such as:
“11) Each <update source> contained in SCL is effectively evaluated for the current row before any of the current row’s object rows is updated.”
The abbreviation “SCL” is the set clause list. The word “rows” is not a typo, but it’s for a special situation, we can still interpret as “before the object row is updated”.

Thus what we’re talking about is optionally following the standard.

So hurrah, let’s put equestrian statues of the MariaDB seal in main squares of our cities.

I don’t like the name

I’ll admit that the effect can be the same, and I’ll admit that some SQL Server commenters like to refer to this as doing things “all at once”. (See here and here.) But the standard description isn’t really demanding that all the assignments be simultaneous.

And the term “simultaneous assignment” can make some programmers
think of “parallel assignment”, as in this example (from Wikipedia)
a, b := 0, 1

A name like “evaluate_sources_first_during_updates” would have been better.

I don’t like that it’s a mode

We all know the problems of setting a global variable that people don’t see or think about when they create SQL statements.

So perhaps it would have been better to make a change to the UPDATE syntax, e.g.
UPDATE /*! new-style-update */ NEW STYLE t SET …;
It doesn’t solve the problem cleanly — now every statement has non-standard syntax in order to perform in a standard way — but it means you don’t get the behaviour change in every statement, only in the ones where you ask.

Or perhaps it would have been better to produce a warning: “You are using an sql_mode value that changes the behaviour of this particular UPDATE statement because evaluation is no longer left to right.” I don’t know if it would be hard. I don’t know if it’s too late.

Setup for examples

Following examples are done with a one-row table, made thus:

CREATE TABLE t (col1 INT, col2 INT, PRIMARY KEY (col1, col2));
INSERT INTO t VALUES (1, 2);

The MariaDB version is 10.3.5 downloaded on 2018-03-20.

Assigning with swaps and variables

SET @@sql_mode='simultaneous_assignment';
UPDATE t SET col1 = 1, col2 = 2;
UPDATE t SET col1 = col2, col2 = col1;
SELECT * FROM t;
-- Result: col1 is 2, col2 is 1.

This is different from what happens with @@sql_mode=”. This is the best example that shows the new mode works.

SET @@sql_mode='simultaneous_assignment';
SET @x = 100;
UPDATE t SET col1 = @x, col2 = @x := 200;
SELECT * FROM t;
-- Result: col1 is 100, col2 is 200.

This is not different from what happens with @@sql_mode=”. Conclusion: if you use the non-standard variable-assignment trick, the value is not updated in advance.

What about the SET statement?

DROP PROCEDURE p;
SET @@sql_mode='simultaneous_assignment';
CREATE PROCEDURE p()
BEGIN
  DECLARE v1 INT DEFAULT 1;
  DECLARE v2 INT DEFAULT 2;
  SET v1 = v2, v2 = v1;
  SELECT v1, v2;
END;
CALL p();
-- Result: v1 is 2, v2 is 2.

This is not different from what happens with @@sql_mode=”. Conclusion ‘simultaneous_assignment’ does not affect assignments with the SET statement.

Perhaps people will expect otherwise because SET statements look so similar to SET clauses, but this is actually correct. The way the standard defines things, for syntax that is similar to (though not exactly the same as) “SET v1 = v2, v2 = v1;”, is: the statement should be treated as equivalent to

SET v1 = v2;
SET v2 = v1;

Good. But come to think of it, this is another reason to dislike the misleading mode name ‘simultaneous_assignment’. SET statements are assignments but @@sql_mode won’t affect them.

What about the INSERT statement?

SET @@sql_mode='simultaneous_assignment';
TRUNCATE TABLE t;
INSERT INTO t SET col1 = col2, col2 = 99;
SELECT * FROM t;
-- Result: col1 is 0, col2 is 99.

Conclusion: ‘simultaneous_assignment’ does not affect INSERT … SET statements, processing is left to right. This looks inconsistent to me, but it’s not wrong.

What about the INSERT ON DUPLICATE KEY UPDATE statement?

SET @@sql_mode='simultaneous_assignment';
UPDATE t SET col1 = 0, col2 = 1;
INSERT INTO t VALUES (0, 1) ON DUPLICATE KEY UPDATE col1 = col2, col2 = col1;
SELECT * FROM t;
-- Result: col1 is 1, col2 is 0.

Conclusion: ‘simultaneous_assignment does affect INSERT ON DUPLICATE KEY UPDATE. Nobody said that it has to (this syntax is a MySQL/MariaDB extension), but surely this is what everyone would expect. Good.

Triggers and constraints

I could not come up with a case where ‘simultaneous_assignment’ affected the order in which triggers or constraints are processed. Good.

Assigning twice to the same column

SET @@sql_mode='simultaneous_assignment';
UPDATE t SET col1 = 3, col2 = 4, col1 = col2;
-- Result: error.

Good. There’s supposed to be an error, because (quoting the standard document again) “Equivalent <object column>s shall not appear more than once in a <set clause list>.”

Actually this doesn’t tell us something new about ‘simultaneous_assignment’, I only tried it because I thought that MariaDB would not return an error. Probably I was remembering some old version before this was fixed.

Multiple-table updates

From the bugs.mysql.com site quoting the MySQL manual:
“Single-table UPDATE assignments are generally evaluated from left to right. For multiple-table updates, there is no guarantee that assignments are carried out in any particular order.”
I don’t see an equivalent statement in the MariaDB manual. So I conclude that even for multiple-table updates the assignments will be done in a standard way, but couldn’t think of a good way to test.

There are other things

The really big things in MariaDB 10.3.5 are PL/SQL and the myrocks engine. So, just because I’ve looked at a small thing, don’t get the impression that 10.3.5 is going to be a minor enhancement.

ocelotgui

There have been no significant updates to ocelotgui since my last blog post.

PL/SQL in MariaDB

The Oracle 12c manual says “Oracle PL/SQL provides functionality equivalent to SQL/PSM:2011, with minor syntactic differences, such as the spelling or arrangement of keywords.” Yes, and English provides functionality equivalent to Russian, except all the words are spelled differently and in different order. Seriously, since Oracle has PL/SQL and MariaDB has SQL/PSM, their stored procedures and functions and triggers and anonymous blocks — hereinafter “code blocks” — look different.

But that’s changing.

There’s a boatload of Oracle compatibility sauce in MariaDB 10.3, and part of it is PL/SQL code blocks, which is what I’ll look at here. MariaDB 10.3 is not released and might not be released for a long time, but the source code is public. I downloaded it with
git clone -b bb-10.2-compatibility https://github.com/MariaDB/server.git
and followed the usual instructions for building from source. I had a bit of trouble building, and I’ll make a few quibbles when I describe details. But to get to the bottom line here at the top: the feature is complete and works about as well as some betas that I’ve seen.

Here’s a short example.

CREATE PROCEDURE p AS
  var1 NUMBER(6) := 1;
BEGIN
  WHILE var1 < 5 LOOP
    DELETE FROM t WHERE s1 = var1;
    var1 := var1 + 1;
    END LOOP;
END;

Notice some obvious differences from SQL/PSM:
there are no parentheses after p, they're optional
var1 is defined outside the BEGIN END block
WHILE ... LOOP ... END LOOP not WHILE ... DO ... END WHILE
assignment is done without SET, this too is sometimes optional.

In terms of "Oracle 12c compatibility", MariaDB is now pulling ahead of MySQL, which is bemusing when one considers that Oracle owns MySQL. MariaDB will still not be as compatible as PostgreSQL with EnterpriseDB, I suppose, but this is a boost for migrants.

First complaint: Ocelot's stored-procedure debugger, which is part of our open-source GUI (ocelotgui) for MySQL and MariaDB, can't handle the new PL/SQL. It doesn't even detect where statements end, so I've got work to do.

Second complaint: obviously Monty Widenius makes the MariaDB programmers work too hard. What happened to the halcyon days of yore when making DBMSs was one long party?

For the rest of this blog post, I will describe what the differences are between MariaDB's current PSM and MariaDB 10.3's PL/SQL.

sql_mode='oracle'

In order to enable creation of PL/SQL code blocks, say
SET sql_mode='oracle';

Before saying "this is a bad idea", I will admit that:
I don't have a better idea at the moment.

Now, with that out of the way, I can say that:
this is a bad idea.

(1) There's already a "SET sql_mode='oracle';" statement, so this is a significant behaviour change.
(2) The setting changes what statements are legal. That means that, faced with a code-block statement, it's impossible to know whether it's syntactically correct unless you know what the earlier statements were.
(3) Inevitably some things will be SQL/PSM and some things will be PL/SQL, at least in some installations. Therefore there will be lots of jumping back and forth with sql_mode settings, with special extra fun if someone makes the change to @@sql_mode globally.

In other words, this is such a bad idea that it cannot possibly survive. Therefore don't worry about it. (UPDATE: 2017-02-12: since I have no better idea, and since few current users would be affected, I backtrack -- MariaDB may as well stay on course.)

CREATE PROCEDURE

SQL/PSM                                 PL/SQL
-------                                 ------
CREATE                                  CREATE
[OR REPLACE]                            [OR REPLACE]
[DEFINER = {user|CURRENT_USER }]        [DEFINER = {user|CURRENT_USER}]
PROCEDURE sp_name                       PROCEDURE sp_name
([proc_parameter[,...]])                [([proc_parameter[,...]])]
[characteristic ...]                    [characteristic ...]
                                        {IS|AS}
body                                    body

proc_parameter:
  [IN|OUT|INOUT] name type              name [IN|OUT|INOUT] type

type:
    Any MariaDB data type               any MariaDB or Oracle data type

characteristic:
    LANGUAGE SQL                        LANGUAGE SQL
  | [NOT] DETERMINISTIC                 | [NOT] DETERMINISTIC
  | contains|reads|modifies SQL clause  | contains|reads|modifies SQL clause
  | SQL SECURITY {DEFINER|INVOKER}      | SQL SECURITY {DEFINER|INVOKER}
  | COMMENT 'string'                    | COMMENT 'string'

The only differences between the left column and the right column are:
* {IS|AS} clause -- if you see IS OR AS here, then you know it's PL/SQL.
* parentheses around the parameter list are optional.
* types can be Oracle data types, which I'll discuss later.

If Mr Spock saw this he would say "It's PL/SQL captain, but not as we know it." The DEFINER clause and the INOUT option and all the characteristics are not in Oracle 12c. But they are good additions -- if they weren't allowed, some functionality would be lost. So my only quibble would be: the Oracle characteristics clause AUTHID {DEFINER|CURRENT_USER} should have been allowed as well.

After I created with
CREATE PROCEDURE p IS BEGIN DELETE FROM t; END;
I looked at information_schema with
SELECT routine_definition, routine_body, sql_data_access, sql_mode
FROM information_schema.routines
WHERE routine_name='p';
and saw
ROUTINE_DEFINITION: 'IS BEGIN DELETE FROM t; END'
ROUTINE_BODY = 'SQL'
SQL_DATA_ACCESS = 'CONTAINS SQL'
SQL_MODE: 'PIPES_AS_CONCAT,ANSI_QUOTES,IGNORE_SPACE,ORACLE, ...' (etc.)
So how do I know this is a PL/SQL routine?
I can't depend on SQL_MODE, for the reasons I stated before. I don't like to depend on ROUTINE_DEFINITION, because really the definition doesn't start with IS, it starts with BEGIN.

So my second quibble is: I wish the value in ROUTINE_BODY was 'PLSQL' rather than 'SQL'.

Approximately the same syntax enhancements exist for functions and triggers.

Body

When a body of a routine is a compound statement ...

SQL/PSM                            PL/SQL
-------                            ------
                                   [DECLARE section]
BEGIN [[NOT] ATOMIC]               BEGIN
  [variable declarations]
  [exception-handler declarations]
  [Executable statements]          Executable statements
                                   [EXCEPTION section]
END                                END

In SQL/PSM the variables are declared within BEGIN/END, in PL/SQL they're declared before BEGIN/END. In SQL/PSM the exceptions are declared within BEGIN/END, in PL/SQL they're declared after BEGIN/END.

Therefore the SQL/PSM body...

CREATE OR REPLACE PROCEDURE a()
BEGIN
  DECLARE v VARCHAR(100) DEFAULT 'Default';
  SET v = NULL;
END;

... does look different in PL/SQL ...

CREATE PROCEDURE a IS
v VARCHAR2(100) := 'Default';
BEGIN
  v := NULL;
END;

but the differences are in the syntax and order of non-executable statements. That is, the existing MariaDB-10.2 SQL executable statements (DELETE, DROP, etc.) will work in a PL/SQL body.

Control statements

SQL/PSM                            PL/SQL
-------                            ------

label:                             <<label>>
IF ... THEN ... [ELSE ...] END IF  IF ... THEN ... [ELSE ...] END IF 
LOOP ... END LOOP                  LOOP ... END LOOP
WHILE ... DO ...; END WHILE        WHILE LOOP ...; END LOOP
CASE ... WHEN ... THEN ... END CASE CASE ... WHEN ... THEN ... END CASE
REPEAT ... UNTIL ...
                                   ELSIF
                                   FOR
                                   GOTO
                                   NULL
LEAVE                              EXIT [WHEN ...]
ITERATE                            CONTINUE [WHEN ...]

For controlling the flow of execution, there's a lot of similarity.

The only major "enhancement" in PL/SQL is GOTO, which is considered beneficial when one wishes to get out of a loop that's within a loop that's within a loop. Seeing how that works out in practice will be interesting (a euphemism for: there will be bugs).

The only minor "enhancement" in PL/SQL is NULL, which does nothing. That is, where in SQL/PSM you'd say "BEGIN END", in SQL/PSM you'd say "BEGIN NULL; END".

Third quibble: For some reason, BEGIN <<label>> NULL; END is illegal.

Fourth quibble: The NULL statement is not visible in INFORMATION_SCHEMA.
(Update 2017-01-17: The third quibble has been taken care of, and the fourth quibble was my mistake, the NULL statement is visible.)

Variables

This is a PL/SQL procedure with a mishmash of declarations and assignments.

CREATE PROCEDURE p() IS
  a VARCHAR2(500);
  b CHAR(7) CHARACTER SET sjis;
  c NUMBER(6) := 1;
  d TINYINT DEFAULT 1;
BEGIN SET @c = c; d := @d; END;

Variables a and c are declared with Oracle syntax, but variables b and d are declared with MySQL syntax. The first assignment is MariaDB syntax, but the second assignment is Oracle syntax.

Fifth quibble: "SET d := @d" wouldn't work, which shows that Oracle and MariaDB variable and assignment statements aren't always interchangeable, but they usually are.

Mishmash is good -- once again, the MariaDB folks didn't see a need to limit functionality just because Oracle doesn't do something, and didn't see a need to be strict about Oracle syntax because of course MariaDB syntax works. However, routines like this can't be migrated to Oracle 12c. Compatibility only works one way.

The worklog

MariaDB is open -- the early source code is available, and so are the worklog tasks on JIRA. The information in the worklog tasks is sparse, far inferior to what we used to do for MySQL. But these items are worth following to see when they change.

[MDEV-10137] Providing compatibility to other databases
[MDEV-10142] PL/SQL parser
[MDEV-10343] Providing compatibility for basic SQL data types
[MDEV-10411] Providing compatibility for basic PL/SQL constructs
[MDEV-10580] sql_mode=ORACLE: FOR loop statement
[MDEV-10697] GOTO statement
[MDEV-10764] PL/SQL parser - Phase 2

There are many other tasks associated with "Oracle compatibility", if I had tried to describe them all this post would be too long.

MariaDB 10.2 Window Functions

Today the first MariaDB 10.2 alpha popped up and for the first time there is support for window functions.

I’ll describe what’s been announced, what’s been expected, comparisons to other DBMSs, problems (including crashes and wrong answers), how to prepare, what you can use as a substitute while you wait.

I assume some knowledge of what window functions are. If you’d prefer an introductory tutorial, I’d suggest reading articles like this one by Joe Celko before you continue reading this post.

What’s been announced

The MariaDB sources are:

The release notes

The source code trees — the feature tree up till now has been github.com/MariaDB/server/commits/bb-10.2-mdev9543 but the version-10.2 download page has more choices and is probably more stable.

Sergei Petrunia and Vicentiu Ciorbaru, two developers who I think deserve the credit, made a set of slides for a conference in Berlin earlier this month. It seems to have some typos but is the best description I’ve seen so far.

On Wednesday April 20 Mr Petrunia will give a talk at the Percona conference. Alas, it coincides with Konstantin Osipov’s talk about Tarantool — which I’ve done some work for — which, if you somehow haven’t heard, is a NoSQL DBMS that’s stable and faster than others according to independent benchmarks like the one from Coimbra. What a shame that two such important talks are scheduled for the same time.

Anyway, it’s clear that I’ll have to update this post as more things happen.

What’s been expected

There have been several wishes / feature requests for window functions over the years.

Typical feature requests or forum queries are “Oracle-like Analytic Function RANK() / DENSE_RANK() [in 2004]”, “analytical function like RANK etc to be implemented [in 2008]”, “Is MySQL planning to implement CTE and Window functions? [in 2010]”.

Typical blog posts are Shlomi Noach’s “Three wishes for a new year [in 2012]” and Baron Schwartz’s “Features I’d like in MySQL: windowing functions [in 2013]”..

Typical articles mentioning the MySQL/MariaDB lack of window functions are “What PostgreSQL has over other open source SQL databases” and “Window Functions Comparison …”.

So it’s clear that there has been steady demand, or reason for demand, over the years.

My first applause moment is: Mr Petrunia and Mr Ciorbaru have addressed something that’s been asked for, rather than what they wished had been asked for.

Comparisons to other DBMSs

I know twelve DBMSs that support window functions. No screen is wide enough for a chart showing them all, so I’ll just list their windows-function documents here so that you can click on the names to get the details:
APACHE DRILL,
CUBRID,
DB2 LUW,
DB2 z/OS 10,
DERBY,
FIREBIRD,
INFORMIX,
ORACLE,
POSTGRESQL,
SQL SERVER,
SYBASE,
TERADATA. I’ll show MariaDB against The Big Three.

These functions are mentioned in the standard document as required by optional feature T611 ELementary OLAP operations:

Function MariaDB Oracle DB2 SQL Server
DENSE_RANK yes yes yes yes
RANK yes yes yes yes
ROW_NUMBER yes yes yes yes

These functions are mentioned in the standard document as required by optional feature T612 Advanced OLAP operations:

Function MariaDB Oracle DB2 SQL Server
CUME_DIST yes yes no yes
PERCENT_RANK yes yes no yes

These functions are mentioned in the standard document as required by optional features T614 through T617:

Function MariaDB Oracle DB2 SQL Server
FIRST_VALUE no yes yes yes
LAG no yes yes yes
LAST_VALUE no yes yes yes
LEAD no yes yes yes
NTH_VALUE no yes no no
NTILE yes yes no yes

These are common functions which are in the standard and which can be window functions:

Function MariaDB Oracle DB2 SQL Server
AVG yes yes yes yes
COUNT yes yes yes yes
COVAR_POP/SAMP no yes yes yes
MAX no yes yes yes
MIN no yes yes no
SUM yes yes yes yes
VAR_POP/SAMP no yes yes yes

Yes MariaDB also supports non-standard functions like BIT_XOR, but they’re worthless for comparison purposes. What’s more important is that the MariaDB functions cannot be DISTINCT.

As for the options in OVER clause … just the important ones …

Function MariaDB Oracle DB2 SQL Server
ORDER BY yes yes yes yes
NULLS FIRST|LAST no “yes” yes no
PARTITION BY yes yes yes yes
PRECEDING|FOLLOWING sometimes yes yes yes

Those are the options that matter. The NULLS clause is important only because it shows how far an implementor will go to support the standard, rather than because most people care. MariaDB in effect supports NULLS HIGH|LOW, which is as good as Oracle — The Oracle manual puts it this way: “NULLS LAST is the default for ascending order, and NULLS FIRST is the default for descending order.” People who think that’s not cheating can add a comment at the end of this post.

From the above I suppose this second applause moment is justifiable: MariaDB has all the basics, and half of the advanced features that other DBMSs have.

Problems (including crashes and wrong answers)

The MariaDB announcement says:

“Do not use alpha releases on production systems! … Thanks, and enjoy MariaDB!”

Indeed anyone who used 10.2.0 in production would discover that enjoyable things can be bad for you.

I started with this database: …

create table t1 (s1 int, s2 char(5));
insert into t1 values (1,'a');
insert into t1 values (null,null);
insert into t1 values (1,null);
insert into t1 values (null,'a');
insert into t1 values (2,'b');
insert into t1 values (-1,'');

The following statements all cause the server to crash:

select row_number() over ();
select 1 as a, row_number() over (order by a) from dual;
select *, abs(row_number() over (order by s1))
 - row_number() over (order by s1) as X from t1;
select rank() over (order by avg(s1)) from t1;

The following statements all give the wrong answers:

select count(*) over (order by s2) from t1 where s2 is null;
select *,dense_rank() over (order by s2 desc),
 dense_rank() over (order by s2) from t1;
select *, sum(s1) over (order by s1) from t1 order by s1;
select avg(s1), rank() over (order by s1) from t1;

The following statement causes the client to hang (it loops in mysql_store_result, I think this is the first time I’ve seen this type of error)

select *, avg(s1) over () from t1;

And now for the third applause line … to which you might be saying: huh? Aren’t those, er, less-than-desirable results? To which I would reply: yes, but two weeks ago there were far more and far bigger problems. We should be clapping for how quickly progress has been made, and guessing that this section of my post will be obsolete soon.

How to prepare

You have lots of time to get ready for 10.2, but may as well start now by getting rid of words that have special meaning for window functions.

The word OVER is reserved.

The newly supported function names — DENSE_RANK RANK ROW_NUMBER CUME_DIST PERCENT_RANK NTILE — are not reserved, and the names of functions which will probably be supported soon — FIRST_VALUE LAG LEAD LAST_VALUE NTH_VALUE — will probably not be reserved. But they might as well be, because you won’t be able to use those names for your own functions. Besides, they’re reserved in standard SQL.

What you can use as a substitute

Suppose you don’t want to wait till MariaDB is perfect, or you’d like to stay with MySQL (which as far as I know has made less progress than MariaDB toward this feature). Well, in short: gee that’s too bad. But I have seen three claims about getting a slight subset.

One: Shlomi Noach claimss you can use a trick with GROUP_CONCAT:

Two: Adrian Corston claims you can make delta functions with assignments.

Three: I claim that in ocelotgui you can put ‘row_number() over ()’ in a SELECT and get a row-number column even with older versions of MySQL or MariaDB (this is a recent change, it’s in the source not the binary).

In fact all the “window_function_name() OVER ()” functions could be done easily in the client, if they’re in the select list and not part of an expression,
and the result set is ordered. But I’m not sure whether that’s something to excite the populace.

There might be a “Four:”. I have not surveyed the various applications that can do cumulations. I suspect that mondrian is one, and open OLAP might be another, but haven’t looked at them.

Our own progress

For ocelotgui (Ocelot’s GUI client for MySQL and MariaDB) we had to adjust the syntax checker to highlight the new syntax in 10.2, as this screenshot shows window_function
So we now can claim to have “the only native-Linux GUI that correctly recognizes MariaDB 10.2 window functions”. Catchy slogan, eh? The beta download is at https://github.com/ocelot-inc/ocelotgui. I still expect that it will be out of beta in a few weeks.

MariaDB 10.1 Release Candidate

I installed the MariaDB 10.1 Release Candidate. Nothing interesting happened, which from MariaDB’s point of view is good. But here’s how I tried to make it interesting. Some of this applies to late releases of MariaDB 10.0 as well.

Loop with MAKE INSTALL

My habit is to download the source to directory X and then say “cmake -DCMAKE_INSTALL_PREFIX=/X” (the same directory), then “make”, then “make install”. That doesn’t work any more. Now I can’t install in the same directory that I downloaded in. Not a big deal; perhaps I’m the only person who had this habit.

Crash with ALTER

In an earlier blog post General Purpose Storage Engines in MariaDB I mentioned a crash, which I’m happy to say is fixed now. Here’s another way to crash, once again involving different storage engines.

Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 2
Server version: 10.1.8-MariaDB Source distribution

Copyright (c) 2000, 2015, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [(none)]> INSTALL SONAME 'ha_tokudb';
Query OK, 0 rows affected (0.57 sec)

MariaDB [(none)]> USE test
Database changed
MariaDB [test]> CREATE TABLE t (id INT UNIQUE, s2 VARCHAR(10)) ENGINE=tokudb;
Query OK, 0 rows affected (0.56 sec)

MariaDB [test]> INSERT INTO t VALUES (1,'ABCDEFGHIJ');
Query OK, 1 row affected (0.04 sec)

MariaDB [test]> INSERT INTO t VALUES (2,'1234567890');
Query OK, 1 row affected (0.05 sec)

MariaDB [test]> CREATE INDEX i ON t (s2);
Query OK, 0 rows affected (0.64 sec)                                   
Records: 0  Duplicates: 0  Warnings: 0

MariaDB [test]> ALTER TABLE t engine=innodb;
Query OK, 2 rows affected (0.53 sec)               
Records: 2  Duplicates: 0  Warnings: 0

MariaDB [test]> ALTER TABLE t engine=tokudb;
ERROR 2013 (HY000): Lost connection to MySQL server during query

… Do not do this on a production system, as it will disable all your databases.
[ UPDATE 2015-09-29: This is apparently due to a problem with jemalloc which should only happen if one builds from source on Ubuntu 12.04. MariaDB was aware and had supplied extra information in its Knowledge Base, which I missed. Thanks to Elena Stepanova. ]

PREPARE within PREPARE

No doubt everyone encounters this situation at least once:

MariaDB [test]> prepare stmt1 from 'prepare stmt2 from @x';
ERROR 1295 (HY000): This command is not supported in the prepared statement protocol yet

… and then everyone else gets on with their lives, because preparing a prepare statement isn’t top-of-agenda. Not me. So I welcome the fact that I can now say:

MariaDB [test]> prepare stmt1 from 'begin not atomic prepare stmt2 from @x; end';
Query OK, 0 rows affected (0.00 sec)
Statement prepared

So now if I execute stmt1, stmt2 is prepared. This is part of the “compound statement” feature.

Evade the MAX QUERIES PER HOUR limit

Suppose some administrator has said

GRANT ALL ON customer.* TO 'peter'@'localhost' WITH MAX_QUERIES_PER_HOUR 20;

Well, now, thanks again to the “compound statement” feature, I can evade that and do 1000 queries. Here’s the test I used:

SET @a = 0;
DELIMITER //
WHILE @a < 1000 DO INSERT INTO t VALUES (@a);
SET @a = @a + 1;
END WHILE;//
DELIMITER ;

No error. So with a little advance planning, I could put 1000 different statements in a user variable, pick off one at a time from within the loop, and execute. One way of looking at this is: the WHILE ... END WHILE is a single statement. Another way to look at this is: new features introduce new hassles for administrators. Such, however, is progress. I clapped for the compound-statement feature in an earlier blog post and ended with: "But MariaDB 10.1.1 is an early alpha, and nothing is guaranteed in an alpha, so it's too early to say that MariaDB is ahead in this respect." I'm glad to say that statement is obsolete now, because this is MariaDB 10.1.8, not early alpha but release candidate.

The true meaning of the OR REPLACE clause

MariaDB has decided to try to be consistent with CREATE and DROP statements, because frankly nobody could ever remember: which CREATE statements allow CREATE IF NOT EXISTS, which CREATE statements allow CREATE OR REPLACE, which DROP statements allow DROP IF EXISTS? I wrote a handy chart in MySQL Worklog#3129 Consistent Clauses in CREATE and DROP. Now it's obsolete. The MariaDB version of the chart will have a boring bunch of "yes"es in every row.

But OR REPLACE behaviour is just a tad un-Oracle-ish. The Oracle 12c manual's description is "Specify OR REPLACE to re-create the [object] if it already exists. You can use this clause to change the definition of an existing [object] without dropping, re-creating, and regranting object privileges previously granted on it." That's not what MariaDB is doing. MariaDB drops the object and then creates it again, in effect. You can see that because you need to have DROP privilege on the object in order to say CREATE OR REPLACE.

And here's where it gets a tad un-MySQL-ish too. If you say "CREATE OR REPLACE x ...,", causing the dropping of an existing x, and then say SHOW STATUS LIKE 'Com%', you'll see that the Com_drop_* counter is zero. That is: according to the privilege requirements, x is being dropped. But according to the SHOW statement, x is not being dropped. Decent folk wouldn't use SHOW anyway, so this won't matter.

An effect on us

One of the little features of ocelotgui (the Ocelot GUI application for MySQL and MariaDB) is that one can avoid using DELIMITER when typing in a statement. The program counts the number of BEGINs (or WHILEs or LOOPs etc.) and matches them against the number of ENDs, so it doesn't prematurely ship off a statement to the server until the user presses Enter after the final END. However, this feature is currently working only for compound statements within CREATE statements. Now that compound statements are stand-alone, this needs adjusting.

Now that I've mentioned ocelotgui again, I'll add that if you go to the https://github.com/ocelot-inc/ocelotgui download page and scroll past the install instructions, you'll find more pictures, and a URL of the debugger reference, for version 0.7 alpha.

General Purpose Storage Engines in MariaDB

MariaDB tries to encourage use of many storage engines, which Oracle doesn’t bother with. The ones that could be considered — occasionally! — as InnoDB replacements are: Mroonga, TokuDB, and Aria.

Mroonga

I predicted that Mroonga would be in MariaDB 10.0.8, and behold, here it is in MariaDB 10.0.15. Truly I am the Nostradamus of the database world.

It’s a general-purpose storage engine, in that it doesn’t have serious limitations regarding what can be stored or indexed. I think its “column-store” feature has limited appeal, but as I saw in January, it’s dandy for full-text searching. This time I look at the list of “Full-Text Restrictions” in the MySQL manual, and quote from it:

Full-text searches are supported for InnoDB and MyISAM tables only.” — Not true for Mroonga, of course.

Full-text searches are not supported for partitioned tables.” — True for Mroonga, which can’t handle partitioned tables at all.

… the utf8 character set can be used, but not the ucs2 [or utf16 or utf32] character set.” — True for Mroonga.

Therefore, the FULLTEXT parser cannot determine where words begin and end in [Chinese/Japanese/Korean i.e. CJK].” — Not true for Mroonga, which is just great with CJK.

… all columns in a FULLTEXT index must use the same character set and collation.” — True for Mroonga.

The argument to AGAINST() must be a string value that is constant during query evaluation.” — True for Mroonga.

For InnoDB, all DML operations (INSERT, UPDATE, DELETE) involving columns with full-text indexes are processed at transaction commit time.” — Technically not true for Mroonga, but Mroonga indexes will get out of synch if rollbacks occur.

TokuDB

In a previous escapade I found that TokuDB was good stuff but lacked two features: foreign keys and full-text search.

So, inspired by the fact that I can use Mroonga full-text searching with InnoDB, I wondered: wouldn’t it be great if I could use that trick on TokuDB too?

Well, I gave it a try, and have to say the result was a bit disappointing …

MariaDB [test]> CREATE TABLE t (
    ->   id INT NOT NULL DEFAULT '0',
    ->   `text_column` VARCHAR(100) CHARACTER SET utf8 DEFAULT NULL,
    ->   PRIMARY KEY (id),
    ->   FULLTEXT KEY `text_column` (text_column)
    -> ) ENGINE=mroonga DEFAULT CHARSET=latin1 COMMENT='engine "tokudb"'
    -> ;
ERROR 2013 (HY000): Lost connection to MySQL server during query

… Do not do this on a production system, as it will disable all your databases.

I also have a MariaDB 10.0.15 server that I built from source, and on that server TokuDB crashes even for simple everyday statements. I suppose this is like saying “Doctor my head hurts when I beat it against the wall”. But there’s no warning during the build and no warning in the documentation. Therefore it must be somebody else’s fault.

Aria

In 2013 Stewart Smith said about Aria: “It’s still not there and I don’t think it ever will be.” I didn’t see anything in 2014 that contradicts what he predicts, and the MariaDB people themselves say improvements are “on hold”, but it does have some advantages over MyISAM.

This is another storage engine with full-text support, but I rate that feature as “poor” because (like InnoDB) it won’t do CJK well, and (unlike InnoDB) it has some irritating default settings about minimum key length and maximum frequency.

It looks nice that I can create a SPATIAL index with Aria; however, I see that this is supposedly possible with InnoDB too, according to the recent announcement of MySQL 5.7.5.

Data can be represented in tables

Capability Mroonga TokuDB Aria InnoDB
Full-text Indexes Excellent No Poor Poor
Foreign Keys No No No Yes
Rollbacks No Yes No Yes
Spatial Indexes Yes No Yes Real Soon
Maximum Key Length 3072 3072 1000 767
Smallest Lock Column Row Table Row
Allows Partition Clauses No Yes Yes Yes
Recovers after ‘kill -9’ No Yes Yes Yes
Works on Windows Yes No Yes Yes

Or, if you want to go beyond general-purpose open-source storage engines like these, there are lots more to look at. The biggest list of MySQL-compatible storage engines that I know of is in the Spanish Wikipedia.

UPDATE NOTICE: The chart above has been corrected due to comments from a Mroonga developer and a TokuDB developer.

Dynamic compound statements in MariaDB

A long-ago-discussed and much-requested feature, “dynamic compound statements”, is working at last.

It’s been eleven years since the original discussion of dynamic compound statements for MySQL, as you can see by looking at the worklog task on the wayback machine. (As usual, you have to click the “high level architecture” box to see the meat of the specification.) The essential idea is that one can directly enter compound statements like BEGIN … END and conditional statements like “IF … END IF” and looping statements like “WHILE … END WHILE” without needing a CREATE PROCEDURE or CREATE FUNCTION statement.

The advantages are that one can run conditional or complex sequences of statements without needing an EXECUTE privilege, or affecting the database metadata. This has been a popular feature request, as one can see from bug#15037, bug#48777, bug#54000, and bug#61895. and bug#62679.

In 2013 Antony Curtis submitted a patch named “Compound / Anonymous statement blocks for MySQL”. (Anonymous blocks is the Oracle terminology.) Apparently he offered it to Oracle but they couldn’t agree about the licence terms. The MariaDB people were gladder to get the offer, so it’s now in the MariaDB 10.1.1 alpha.

I downloaded it from source. There’s a problem with “make install” but it’s easy to work around and has nothing to do with Mr Curtis’s patch. I had no trouble checking that the feature works as advertised. So, next, I compared the spec with Mr Curtis’s implementation.

Spec: Allow BEGIN … END “compound statements”, CASE, IF, LOOP, WHILE, and REPEAT. Implementation: all done. The only significant defect is that BEGIN has to be stated as BEGIN NOT ATOMIC, to avoid confusion with an old non-standard meaning of BEGIN. So “BEGIN DELETE FROM t; END” is illegal. And “label_x: BEGIN DELETE FROM t; END” is illegal. Only “BEGIN NOT ATOMIC DELETE FROM t; END” is legal. It’s a slight disappointment that no way was found to handle these little difficulties in the parser.

Spec: internally there will be a temporary anonymous procedure created for every compound statement. Implementation: This didn’t happen, at least not in a user-visible way. That’s a low-level detail so it doesn’t matter.

Spec: there should be no new privilege requirement. Implementation: there is no new privilege requirement.

Spec: the implied routine will have characteristics like MODIFIES SQL DATA and NOT DETERMINISTIC. Implementation: characteristics are irrelevant.

Spec: even if there is an anonymous procedure, it should not be visible to the user in information_schema.routines. Implementation: nothing is visible in information_schema.routines.

Spec: perhaps the statement should be in performance_schema.statements. Implementation: no.

Spec: some statements that are client-specific and are not allowed in stored procedures should not be in dynamic compound statements. Implementation: right. For example “USE database_name” is not allowed.

Spec: it’s uncertain whether a dynamic compound statement (which after all is a “statement”) should appear as a single statement in the slow log. Implementation: I didn’t test this because I don’t believe in the slow log; I assumed it doesn’t work.

Spec: if @@sql_mode is set within a dynamic compound statement, then it gets restored to its original value when the statement ends. Implementation: yes, it’s restored.

Spec: A dynamic compound statement may not contain statements that create or drop or alter routines. Implementation: right, statements like DROP PROCEDURE are illegal.

Spec: a dynamic compound statement cannot be used for a PREPARE statement, and therefore there is no fix for BUG#14115 “Prepare() with compound statements breaks”. Supposedly there could be parser-related pitfalls with such syntax. Implementation: PREPARE is allowed. Statements like PREPARE stmt1 FROM ‘BEGIN NOT ATOMIC DECLARE v INT; END’// are just fine. This is worrisome, but probably the supposed pitfalls were cleared up long ago.

Spec: SHOW STATUS could have new counters: Com_compound_statement, Com_if, Com_loop, Com_repeat, Com_while. Implementation: No, that’s not implemented.

Spec: SHOW PROCESSLIST will show the whole compound statement. Implementation: No, that’s not implemented. When I first started the performance_schema design I realized that SHOW PROCESSLIST would eventually become obsolete, so nowadays I think this part of the spec is obsolete.

The future

Given that dynamic compound statements are in DB2 and Oracle 12c and PostgreSQL and now in MariaDB alpha, Oracle/MySQL will look a bit slow if it waits another eleven years to implement them.

But MariaDB 10.1.1 is an early alpha, and nothing is guaranteed in an alpha, so it’s too early to say that MariaDB is ahead in this respect.

A glance at a MariaDB release candidate

Today I looked at the MariaDB Release Candidate wondering how my earlier predictions came out.

I predicted, for “roles”:

For all of the bugs, and for some of the flaws, there’s no worry — they’ll probably disappear.

In fact all the bugs are gone, and I belatedly realized (after some gentle nudges from a MariaDB employee) that some of the flaws weren’t flaws.
The inability to grant to PUBLIC still troubles me, but it looks like roles are ready to roll.

I predicted for “mroonga”:

At the time I’m writing this, MariaDB 10.0.8 doesn’t have mroonga yet.

In other words, I thought it would be in 10.0.8. It was not to be, as another MariaDB employee told me:

Unfortunately, mroonga appeared to have some portability problems. Most likely it won’t be in 10.0.8. There is a good chance it will be 10.0.9 though.

While I was looking, I tried out a new clause for the DELETE statement:
DELETE FROM t RETURNING select-list;
which returns the rows that just got deleted. It’s a new non-standard clause which increases MariaDB’s divergence from MySQL, so decent people should demand an explanation.

Oracle 12c looks similar but does something different:
DELETE FROM t RETURN|RETURNING expr [,expr...] INTO data-item [,data-item...];
SQL Server 2012 has a clause which looks different but does something similar:
DELETE FROM t OUTPUT select-list;
PostgreSQL has a clause which looks similar and does something similar:
DELETE FROM t RETURNING select-list;

So we’re seeing, yet again, the belief that PostgreSQL is a better model for MariaDB than the big boys are. By the way, the big boys are a bit more orthogonal — I’ve seen extensions similar to this one in UPDATE or MERGE statements. But the bigger orthogonality woe is that here we have a statement that returns a result set, but can’t be used in all the places where a result-set-returning sub-statement could work, like DECLARE CURSOR FOR DELETE, or INSERT … DELETE. Of course, I’m glad they don’t work. My point is only that now-it-works-now-it-doesn’t additions make the product look haphazard.

Update: How it’s done in standard SQL

This section was added on 2014-03-01, one week after the original post.
SQL:2011 has a non-core (optional) feature T495 “Combined data change and retrieval”. Ignoring a lot, the syntax is SELECT … FROM NEW|OLD|FINAL TABLE (INSERT|UPDATE|DELETE …) …. For example

 SELECT column1+5 FROM OLD TABLE (DELETE FROM t) AS t WHERE f = 77;

In this example table t is a “delta table”.
There’s an example with DECLARE CURSOR and FETCH in the DB2 10 for z/OS manual.

Mroonga and me and MariaDB

Chinese and Japanese and Korean (CJK) text usually has no spaces between words. Conventional full-text search does its tokenizing by looking for spaces. Therefore conventional full-text search will fail for CJK.

One workaround is bigrams. Suppose the text is

册免从冘

There should be three index keys, one for each two-character sequence:

册免, 免从, and 从冘.

Now, in a search like

SELECT * FROM t WHERE MATCH(text_column) AGAINST ('免从');

a bigram-supporting full-text index will have a chance. It’s wasteful and there will be false hits whenever the bigram isn’t really a “word”, but the folks in CJK-land have found that bigrams (or the three-character counterpart, trigrams) actually work.

One way to get bigrams for MySQL or MariaDB is to get mroonga.

Why care about Yet Another Storage Engine)?

Back in 2008 a project named Senna attracted the attention of my colleagues at MySQL, but didn’t go on to world conquest.

Since around 2011 the groonga project, billed as a “successor” to Senna, has been outing regular releases of a generic library that can connect to more than one DBMS. “Mroonga” is the “M[ySQL storage engine that interfaces with the] groonga [code]”. So, although it hasn’t been packaged with MySQL or MariaDB until now, it’s not all new or untested code. What’s new is that MariaDB will, real soon now, include mroonga as part of the regular MariaDB download. Poof, credibility at last.

I understand that mroonga has features that make it interesting:
* It’s a column-storage engine, which I suppose makes it an alternative to Infobright or InfiniDB
* Some of its users or developers also have an involvement with the Spider storage engine, which I suppose means there wouldn’t be too much trouble using the two engines in concert: mroonga for full-text and Spider for sharding.
… but I didn’t look at the column storage or the Spider compatibility. I cared only that SHOW ENGINES said it’s “CJK-ready fulltext search”.

Of course, SHOW ENGINES can also show MyISAM and InnoDB and SphinxSE, and they can all do full-text searching too. One might choose MyISAM because for a long time that was the only engine that had full-text, or choose InnoDB because it’s the default nowadays, or choose SphinxSE because it has lots of features including stemming. But those aren’t targeted for the CJK niche. For example, MariaDB has no bigrams. Sphinx’s documentation says it does have bigrams, but a glance shows instantly “those are not the bigrams you’re looking for”.

The only questions, then, are (a) does mroonga really handle full-text, and (b) does mroonga really handle CJK?

Does mroonga work?

At the time I’m writing this, MariaDB 10.0.8 doesn’t have mroonga yet. There are instructions for getting packages with Windows and Ubuntu etc., but such things never work for me — they’ll always be out of synch with something else that I’ve got, or with what the main vendor (Oracle or MariaDB in this case) is updating. So I decided to build from source. Getting mroonga was easy since I already have Bazaar:

bzr branch lp:~mroonga/maria/mroonga

and then build. Actually the initial build failed because mroonga required CMake 2.8.8 which doesn’t come with Ubuntu 12.04. But after I worked around that, I had a server and all I had to say was

                                                                             
INSTALL PLUGIN mroonga SONAME 'ha_mroonga.so';
CREATE FUNCTION last_insert_grn_id RETURNS INTEGER SONAME 'ha_mroonga.so';
SET GLOBAL mroonga_log_level=NONE;

I found more out by looking at some documentation, which is nowhere near Oracle quality but is mostly in readable English.

Then I was able to do this:

CREATE TABLE t (
  id INT NOT NULL DEFAULT '0',
  `text_column` VARCHAR(100) CHARACTER SET utf8 DEFAULT NULL,
  PRIMARY KEY (id),
  FULLTEXT KEY `text_column` (text_column)
) ENGINE=mroonga DEFAULT CHARSET=latin1 COMMENT='engine "innodb"'

It’s probably less than obvious that I’m looking at a good-looking feature. The point is: with this syntax I can use mroonga with an InnoDB table. Whoopie. Although that means I lose mroonga’s other features, I can use it without losing whatever I find good about InnoDB. And then I was able to do this:

SELECT * FROM t WHERE MATCH(text_column) AGAINST ('x');

Again it’s probably less than obvious that I’m looking at a good-looking feature. The MATCH … AGAINST syntax is idiosyncratic and unobvious — don’t expect this to arrive in an SQL standard near you any time soon. But it’s what the long-term MyISAM user is used to, so the transition isn’t painful.

Does mroonga handle CJK?

The J (Japanese) part is certainly there. The developers are Japanese. One of the optional extras is integration with MeCab which is a tool that can handle Japanese morphology — it’s like solving the “no spaces” problem by understanding a bit about the Japanese language, which after all is the solution that a human would use with no-spaces text. I don’t see, though, that there’s been equivalent attention paid for the C (Chinese) and K (Korean) parts of CJK. So I just looked at the bigrams, since they’re all that C or K could benefit from.

I created a million-row table containing randomly-chosen characters, mostly Kanji, but with a mix of Latin and Kana letters. How I made it is tedious, but perhaps somebody out there will want to know how to make randomly populated tables with such “data”, so I’ll put the code in a comment on this posting.

I had a choice between a bewildering variety of ways to tokenize. The default one, TokenBigram, did the job. The good news for me was that it didn’t do bigrams when the text was entirely Latin — that would be silly — but in a separate test I did see that it indexed Latin characters as ordinary words. So one of the worries that I had — that mroonga would be only good for CJK but not for non-CJK mixed inside the same text column — turned out not to be a worry.

And now, the essential point, the “l’essence du bigram” (which would look good on a restaurant menu, eh?) …

MariaDB [test]> SELECT COUNT(*) FROM t WHERE MATCH(text_column) AGAINST ('免从');
+----------+
| COUNT(*) |
+----------+
|        1 |
+----------+
1 row in set (0.00 sec)

It found a row with a bigram! But is that the right count?

MariaDB [test]> SELECT COUNT(*) FROM t WHERE text_column LIKE '%免从%';
+----------+
| COUNT(*) |
+----------+
|        1 |
+----------+
1 row in set (2.48 sec)

Yes, it’s the right count — a LIKE search confirms it.

The difference is that the LIKE search took 2.5 seconds because it had to scan a million rows. The mroonga full-text search took approximately zero seconds because it used an index. That doesn’t mean that mroonga is fast — not every search is a benchmark. I did find a graph that shows mroonga throughput is better than MyISAM’s or SphinxSE’s but probably if I looked hard I could find a graph that says the opposite.

And the crowd goes wild

Although I have done no benchmarks or bug hunts, I have acquired the impression that mroonga is capable of doing what its makers say it ought to be able to do.

Next page →