Dynamic compound statements in MariaDB

A long-ago-discussed and much-requested feature, “dynamic compound statements”, is working at last.

It’s been eleven years since the original discussion of dynamic compound statements for MySQL, as you can see by looking at the worklog task on the wayback machine. (As usual, you have to click the “high level architecture” box to see the meat of the specification.) The essential idea is that one can directly enter compound statements like BEGIN … END and conditional statements like “IF … END IF” and looping statements like “WHILE … END WHILE” without needing a CREATE PROCEDURE or CREATE FUNCTION statement.

The advantages are that one can run conditional or complex sequences of statements without needing an EXECUTE privilege, or affecting the database metadata. This has been a popular feature request, as one can see from bug#15037, bug#48777, bug#54000, and bug#61895. and bug#62679.

In 2013 Antony Curtis submitted a patch named “Compound / Anonymous statement blocks for MySQL”. (Anonymous blocks is the Oracle terminology.) Apparently he offered it to Oracle but they couldn’t agree about the licence terms. The MariaDB people were gladder to get the offer, so it’s now in the MariaDB 10.1.1 alpha.

I downloaded it from source. There’s a problem with “make install” but it’s easy to work around and has nothing to do with Mr Curtis’s patch. I had no trouble checking that the feature works as advertised. So, next, I compared the spec with Mr Curtis’s implementation.

Spec: Allow BEGIN … END “compound statements”, CASE, IF, LOOP, WHILE, and REPEAT. Implementation: all done. The only significant defect is that BEGIN has to be stated as BEGIN NOT ATOMIC, to avoid confusion with an old non-standard meaning of BEGIN. So “BEGIN DELETE FROM t; END” is illegal. And “label_x: BEGIN DELETE FROM t; END” is illegal. Only “BEGIN NOT ATOMIC DELETE FROM t; END” is legal. It’s a slight disappointment that no way was found to handle these little difficulties in the parser.

Spec: internally there will be a temporary anonymous procedure created for every compound statement. Implementation: This didn’t happen, at least not in a user-visible way. That’s a low-level detail so it doesn’t matter.

Spec: there should be no new privilege requirement. Implementation: there is no new privilege requirement.

Spec: the implied routine will have characteristics like MODIFIES SQL DATA and NOT DETERMINISTIC. Implementation: characteristics are irrelevant.

Spec: even if there is an anonymous procedure, it should not be visible to the user in information_schema.routines. Implementation: nothing is visible in information_schema.routines.

Spec: perhaps the statement should be in performance_schema.statements. Implementation: no.

Spec: some statements that are client-specific and are not allowed in stored procedures should not be in dynamic compound statements. Implementation: right. For example “USE database_name” is not allowed.

Spec: it’s uncertain whether a dynamic compound statement (which after all is a “statement”) should appear as a single statement in the slow log. Implementation: I didn’t test this because I don’t believe in the slow log; I assumed it doesn’t work.

Spec: if @@sql_mode is set within a dynamic compound statement, then it gets restored to its original value when the statement ends. Implementation: yes, it’s restored.

Spec: A dynamic compound statement may not contain statements that create or drop or alter routines. Implementation: right, statements like DROP PROCEDURE are illegal.

Spec: a dynamic compound statement cannot be used for a PREPARE statement, and therefore there is no fix for BUG#14115 “Prepare() with compound statements breaks”. Supposedly there could be parser-related pitfalls with such syntax. Implementation: PREPARE is allowed. Statements like PREPARE stmt1 FROM ‘BEGIN NOT ATOMIC DECLARE v INT; END’// are just fine. This is worrisome, but probably the supposed pitfalls were cleared up long ago.

Spec: SHOW STATUS could have new counters: Com_compound_statement, Com_if, Com_loop, Com_repeat, Com_while. Implementation: No, that’s not implemented.

Spec: SHOW PROCESSLIST will show the whole compound statement. Implementation: No, that’s not implemented. When I first started the performance_schema design I realized that SHOW PROCESSLIST would eventually become obsolete, so nowadays I think this part of the spec is obsolete.

The future

Given that dynamic compound statements are in DB2 and Oracle 12c and PostgreSQL and now in MariaDB alpha, Oracle/MySQL will look a bit slow if it waits another eleven years to implement them.

But MariaDB 10.1.1 is an early alpha, and nothing is guaranteed in an alpha, so it’s too early to say that MariaDB is ahead in this respect.

Temporary tables, standard SQL

The PostgreSQL manual says:

“The standard’s definition of the behavior of temporary tables is widely ignored. PostgreSQL’s behavior on this point is similar to that of several other SQL databases.”

The first sentence is false. The second sentence could be rephrased as “MySQL copied us”, although nobody else did, as far as I know.

The standard’s definition is widely followed

I base this claim on the documentation of DB2, Oracle Rdb, Oracle 12c, and Firebird.

I don’t say the standard is “universally followed” — for example SQL Server is out of step. I don’t say the whole standard is followed — only the subset related to global temporary tables. I don’t say the standard is the only thing that’s followed. But all these DBMSs support at least four, and sometimes all six, of these characteristics:

1. There is a syntax CREATE GLOBAL TEMPORARY TABLE <table-name> … The word GLOBAL means “the scope of the name is global”, that is, it’s recognized by all modules. Since most DBMSs don’t have multiple modules, the word GLOBAL is compulsory and the standard’s other type of temporary table, LOCAL, is usually not there. (I will mention, though, parenthetically, that LOCAL could optionally mean a temporary table is local to a stored procedure if it’s created in a stored procedure.)

2. If there is a non-temporary table already existing with the same name, then CREATE GLOBAL TEMPORARY TABLE fails.

3. There’s a distinction between the metadata — the definition of the table — and the data — the contents of the table.

4. The metadata is persistent; the data disappears when the session ends, or earlier.

5. The metadata is visible to all sessions in INFORMATION_SCHEMA or equivalent. The data is not visible to all sessions, and one sometimes calls the temporary table’s contents “session-specific data”.

6. There is an optional clause ON COMMIT {PRESERVE|DELETE} ROWS. When ON COMMIT DELETE ROWS is specified or is default, then the data disappears when COMMIT happens.

PostgreSQL’s behaviour is copied by MySQL

For PostgreSQL, the CREATE TEMPORARY TABLE statement arrived in PostgreSQL 6.5 which was released in June 1999. For MySQL, CREATE TEMPORARY TABLE arrived in MySQL 3.23 in August 1999.

That was before my time, but I’m a post-hoc-ergo-propter-hoc kind of guy, and I suspect that somebody was seduced by the logical fallacy that PostgreSQL has some standard SQL, therefore copy it. So the thing that’s presently in MySQL and MariaDB is CREATE TEMPORARY TABLE table_name …;

But these DBMSs support zero (0) of the six characteristics that I described for the standard-compliant DBMSs.

1. The specification of scope is missing.

2. If there is a non-temporary table already existing with the same name, then CREATE TEMPORARY TABLE succeeds, and the existing table is hidden.

3. There’s no distinction between the metadata — the definition of the table — and the data — the contents of the table.

4. The metadata is not persistent; the data disappears when the session ends, or earlier.

5. The metadata is not visible to all sessions in INFORMATION_SCHEMA or equivalent … Yes, I’m refusing to acknowledge the existence of the new table INNODB_TEMP_TABLE_INFO, for quality reasons.

6. There is no optional clause ON COMMIT {PRESERVE|DELETE} ROWS.

One funny result of this is that simple statements like SELECT * FROM table_name; can’t be fully checked or optimized at the time they’re checked within a stored procedure, because there’s no way to know what table_name is — maybe at execution time it will be the table that’s defined and visible in information_schema, but maybe it will be a temporary table which is only temporarily visible.

A less serious but irritating result is that, if you want to use a temporary table repeatedly, you have to re-create everything — find the original table definition somewhere, then add whatever indexes or views or triggers applied originally.

Is there any good news?

I wrote a worklog task about this long ago. It was WL#934 “Temporary tables in the standard way”. But it is a private Oracle spec, and is not in the Internet archive, so the general public can’t see it. There’s been no action on it that I know of. Some attention has been paid to feature request #58392 “Prevent TEMPORARY tables from shadowing regular tables” and feature request #20001 “Support for temp-tables in INFORMATION_SCHEMA”, but those are mere distractions.

However, it remains possible to implement standard global temporary tables without disrupting existing applications. The key is the syntax difference between

CREATE TEMPORARY TABLE ...;                    /* MySQL or MariaDB */

and

CREATE GLOBAL TEMPORARY TABLE ...;             /* standard */

They’re two different statements. So one could just say that “if and only if GLOBAL is specified, then the table has to have the standard rules”.

This isn’t due to luck. It happened because in olden days MySQL’s architects paid attention to the official ANSI/ISO framework requirements:

“In the Syntax Rules, the word shall defines conditions that are required to be true of syntactically-conforming SQL language … The treatment of language that does not conform to the SQL Formats and Syntax Rules is implementation-dependent.”

In other words, by allowing only the non-standard syntax CREATE TEMPORARY TABLE, MySQL is not violating the standard. On the other hand, the latest PostgreSQL release allows CREATE GLOBAL TEMPORARY TABLE, so PostgreSQL is violating the standard. To people who read my earlier blog post “Sometimes MySQL is more standards-compliant than PostgreSQL”, that won’t come as a shock.

What part of NoSQL don’t you understand?

In the word “NoSQL”, the letters “No” are an acronym so the meaning is “Not Only SQL” rather than “No SQL”. True or false?

Historically, it’s false

The first NoSQL product was a classic DBMS which didn’t happen to use SQL for a query language, featured in Linux Journal in 1999. Its current web page has a traffic-sign symbol of the word SQL with a bar through it, and a title “NoSQL: a non-SQL RDBMS”.

For a meetup in June 2009 about “open source, distributed, non-relational databases” the question came up “What’s a good name?” Eric Evans of Rackspace suggested NoSQL, a suggestion which he later regretted.

In October 2009 Emil Eifrem of NeoTechnology tweeted “I for one have tried REALLY HARD to emphasize that #nosql = Not Only SQL”. This appears to be the earliest recorded occurrence of the suggestion. In November 2009 Mr Eifrem declared with satisfaction that the suggestion was “finally catching on”.

In other words saying “No = Not Only” is an example of a backronym, and specifically it’s a class of backronym known as False Acronyms. There are lots of these around — like the idea that “cabal” stands for seventeenth-century English ministers named Clifford Arlington Buckingham Ashley and Lauderdale, or the myth that “posh” means “port out starboard home”. False acronyms might go back at least 1600 years, if the earliest example is the “Jesus Fish”. Anyway, backronyms are popular and fun, but this one’s false.

Currently it’s unaccepted

Regardless of etymology, NoSQL could currently mean Not Only SQL, if everyone accepted that. So, getting the new interpretation going is a matter of evangelism — tell everyone that’s what it means, and hope that they’ll tell their friends. Is that working?

I think it has worked among NoSQL insiders, but it’s tenuous. If it were generally believed, then the common spelling would be NOSQL rather than NoSQL, and the common pronunciation would be “En O Ess Cue Ell” rather than “No Sequel”. Although my sample size is small, I can say that I’ve rarely seen that spelling and I’ve rarely heard that pronunciation.

And the general public knows that “no-starch” products don’t contain more than starch, or that “no-goodniks” are in a class beyond goodness, and that the slogan “No means no!” is a response to proponents of sexual assaults. So, when confronted with the claim that no doesn’t mean no, there’s inevitable resistance or disgust.

But is it true?

It would still be okay to promote the idea that “NoSQL” equals “not only SQL” if the term described a group of products that are (a) SQL and (b) something other than SQL. This is where, in a rephrasing of the Wendy’s “Where’s The Beef?” commercial, the first question is: “Where’s the SQL?”

Back in 2009 the answers would have been pathetic, but nowadays there are “SQL” query languages that come from NoSQL vendors. CQL would qualify as an SQL product if the bar was low, like saying “If it MOVEs, it’s COBOL”. Impala would qualify in a more serious way, in fact most of the core-SQL tick-box items are tickable. (I’m believing the manual, I didn’t do vicious testing as I did with MySQL.) So, although they’re a minority, there seem to be NoSQL vendors who supply SQL.

But the second question is: what can these products do that can’t be done in SQL? Well, they can supply language extensions, and they can allow the SQL layer to be bypassed. However, so can MySQL or MariaDB — think of the navigation with HANDLER or direct access with HandlerSocket. In that case MySQL would be a NoSQL DBMS, and if that were true then the term NoSQL wouldn’t distinguish the traditional from the non-traditional products, and so it would be a useless word.

Therefore pretending that NoSQL means Not Only SQL is wrong in several ways, but insisting it still means No SQL is merciless. Perhaps an honest solution is to stop saying that the word’s parts mean anything. It’s just a name now, in the same way that “PostgreSQL” is just a name and you’ve forgotten what’s it’s a post of.

An open-source MySQL/MariaDB GUI client on Linux

We’ve written a GUI application. Its command-line options are like those in the mysql client. Its graphic features are an SQL-statement text editor and a scrollable SQL result set. It runs on Linux now and we believe it could be ported to other platforms.

Here are four screenshots.


front_end_screen1

The startup is as non-GUI as can be — in fact it gets options from the command line, or from my.cnf, the same way that the mysql client does. Wherever it seemed reasonable, we asked: What would mysql do?


front_end_screen_2

The statement (at the bottom of the screenshot) has the usual GUI features for editing, and has syntax highlighting — comments are green, reserved words are magenta, and so on.


front_end_screen_3

Here the result is appearing in a grid. The second column is long so it’s on multiple lines with its own scroll bars.


front_end_screen_4

Finally, here’s the same result after fooling around with cosmetics — the colours and fonts for each section of the screen are resettable via menu items, and the column widths can be changed by dragging.

The program is written in C++ and uses Qt. Qt is available on many operating systems including Mac and Windows but we only tried Linux. The license is GPL version 2. The status is alpha — there are bugs. Source and executable files can be downloaded from https://github.com/ocelot-inc/ocelotgui.

The BINARY and VARBINARY data types

MySQL’s support of the BINARY and VARBINARY data type is good, and the BINARY and VARBINARY data types are good things. And now the details. What applies here for MySQL applies for MariaDB as well.

Who supports VARBINARY

There’s an SQL:2008 standard optional feature T021 “BINARY and VARBINARY data types”. Our book has a bigger description, but here is one that is more up to date:

DBMS Standard-ish? Maximum length
DB2 for LUW No. Try CHAR FOR BIT DATA. 254 for fixed-length, 32672 for variable-length
DB2 for z/OS Yes. 255 for fixed-length, 32704 for variable-length
Informix No. Try CHAR. 32767
MySQL Yes. constrained by maximum row length = 65535
Oracle No. Try RAW. 2000 sometimes; 32767 sometimes
PostgreSQL No. Try BYTEA. theoretically 2**32 – 1
SQL Server Yes. 8000 for fixed-length. 2**31 -1 for variable length

Standard Conformance

Provided that sql_mode=strict_all_tables, MySQL does the right (standard) thing most of the time, with two slight exceptions and one exception that isn’t there.

The first exception:

CREATE TABLE t1 (s1 VARBINARY(2));
INSERT INTO t1 VALUES (X'010200');

… This causes an error. MySQL is non-conformant. If one tries to store three bytes into a two-byte target, but the third byte is X’00’, there should be no error.

The second exception:

SET sql_mode='traditional,pipes_as_concat';
CREATE TABLE t2 (s1 VARBINARY(50000));
CREATE TABLE t3 AS SELECT s1 || s1 FROM t2;

… This does not cause an error. MySQL is non-conformant. If a concatenation results in a value that’s longer than the maximum length of VARBINARY, which is less than 65535, then there should be an error.

The exception that isn’t there:
The MySQL manual makes an odd claim that, for certain cases when there’s a UNIQUE index, “For example, if a table contains ‘a’, an attempt to store ‘a\0’ causes a duplicate-key error.” Ignore the manual. Attempting to insert ‘a\0’ will only cause a duplicate-key error if the table’s unique-key column contains ‘a\0’.

The poem Antigonish desccribed a similar case:

Yesterday, upon the stair,
I met a man who wasn’t there.
He wasn’t there again today,
I wish, I wish he’d go away…”

The BINARY trap

Since BINARY columns are fixed-length, there has to be a padding rule. For example, suppose somebody enters zero bytes into a BINARY(2) target:

CREATE TABLE t4 (s1 BINARY(2));
INSERT INTO t4 VALUES (X'');
SELECT HEX(s1) FROM t4;

… The result is ‘0000’ — the padding byte for BINARY is X’00’ (0x00), not X’20’ (space).

There also has to be a rule about what to do for comparisons if comparands end with padding bytes.

CREATE TABLE t5 (s1 VARBINARY(2));
INSERT INTO t5 VALUES (X'0102');
SELECT * FROM t5 WHERE s1 = X'010200';

… This returns zero rows. It’s implementation-defined whether MySQL should
ignore trailing X’00’ during comparisons, so there was no chance of getting
it wrong.

The behaviour difference between BINARY and VARBINARY can cause fun:

CREATE TABLE t7 (s1 VARBINARY(2) PRIMARY KEY);
CREATE TABLE t8 (s1 BINARY(2), FOREIGN KEY (s1) REFERENCES t7 (s1));
INSERT INTO t7 VALUES (0x01);
INSERT INTO t8 SELECT s1 FROM t7;

… which fails on a foreign-key constraint error! It looks bizarre
that a value which is coming from the primary-key row can’t be put
in the foreign-key row, doesn’t it? But the zero-padding rule, combined
with the no-ignore-zero rule, means this is inevitable.

BINARY(x) is a fine data type whenever it’s certain that all the values
will be exactly x bytes long, and otherwise it’s a troublemaker.

When to use VARBINARY

VARBINARY is better than TINYBLOB or MEDIUMBLOB because it has a definite
maximum size, and that makes life easier for client programs that want to
know: how wide can the display be? In most DBMSs it’s more important that BLOBs can be stored separately from the rest of the row.

VARBINARY is better than VARCHAR if there should be no validity checking.
For example, if the default character set is UTF8 then this is illegal:

CREATE TABLE t9 (s1 VARCHAR(5));
INSERT INTO t9 VALUES (0xF4808283);

… but this is legal because character set doesn’t matter:

CREATE TABLE t10 (s1 VARBINARY(5));
INSERT INTO t10 VALUES (0xF4808283);

(I ran into this example on a SQL Server forum where the participants display woeful ignorance of Unicode).

And finally converting everything to VARBINARY is one way to avoid the annoying message “Invalid mix of collations”. In fact the wikimedia folks appear to have changed all VARCHARs to VARBINARYs back in 2011 just to avoid that error. I opine that the less drastic solution is to use collations consistently, but I wasn’t there.

MySQL 5.7 supports the GB18030 Chinese Character Set

My former boss at MySQL sent out a notice that MySQL 5.7.4 now supports the GB18030 character set, thus responding to requests that have been appearing since 2005. This is a big deal because the Chinese government demands GB18030 support, and because the older simplified-Chinese character sets (gbk and gb2312) have a much smaller repertoire (that is, they have too few characters). And this is real GB18030 support — I can define columns and variables with CHARACTER SET GB18030. That’s rare — Oracle 12c and SQL Server 2012 and PostsgreSQL 9.3 can’t do it. (They allow input from GB18030 clients but they convert it immediately to Unicode.) Among big-time DBMSs, until now, only DB2 has treated GB18030 as a first-class character set.

Standard Adherence

We’re talking about the current version of the standard, GB18030-2005 “IT Chinese coded character set”, especially its description of 70,244 Chinese characters. I couldn’t puzzle out the Chinese wording in the official document, all I could do was use translate.google.com on some excerpts. But I’ve been told that the MySQL person who coded this feature is Chinese, so they’ll have had better luck. What I could understand was what are the difficult characters, what are the requirements for a claim of support, and what the encoding should look like. From the coder’s comments, it’s clear that part was understood. I did not check whether there was adherence for non-mandatory parts, such as Tibetan script.

Conversions

The repertoire of GB18030 ought to be the same as the Unicode repertoire. So I took a list of every Unicode character, converted to GB18030, and converted back to Unicode. The result in every case was the same Unicode character that I’d started with. That’s called “perfect round tripping”. As I explained in an earlier blog post “The UTF-8 World Is Not Enough”, storing Chinese characters with a Chinese character set has certain advantages. Well, now the biggest disadvantage has disappeared.

Hold on — how is perfect round tripping possible, given that MySQL frequently refers to Unicode 4.0, and some of the characters in GB18030-2005 are only in Unicode 4.1? Certainly that ought to be a problem according to the Unicode FAQ and this extract from Ken Lunde’s book. But it turns out to be okay because MySQL doesn’t actually disallow those characters — it accepts encodings which are not assigned to characters. Of course I believe that MySQL should have upgraded the Unicode support first, and added GB18030 support later. But the best must not be an enemy of the good.

Also the conversions to and from gb2312 work fine, so I expect that eventually gb2312 will become obsolete. It’s time for mainland Chinese users to consider switching over to gb18030 once MySQL 5.7 is GA.

Collations

The new character set comes with three collations: one trivial, one tremendous, one tsk, tsk.

The trivial collation is gb18030_bin. As always the bin stands for binary. I expect that as always this will be the most performant collation, and the only one that guarantees that no two characters will ever have the same weight.

The tremendous collation is gb18030_unicode_520_ci. The “unicode_520” part of the name really does mean that the collation table comes from “Unicode 5.2” and this is the first time that MySQL has taken to heart the maxim: what applies to the superset can apply to the subset. In fact all MySQL character sets should have Unicode collations, because all their characters are in Unicode. So to test this, I went through all the Unicode characters and their GB18030 equivalents, and compared their weights with WEIGHT_STRING:
WEIGHT_STRING(utf32_char COLLATE utf32_unicode_520_ci) to
WEIGHT_STRING(gb18030_char COLLATE gb18030_unicode_520_ci).
Every utf32 weight was exactly the same as the gb18030 weight.

The tsk, tsk collation is gb18030_chinese_ci.

The first bad thing is the suffix chinese_ci, which will make some people think that this collation is like gb2312_chinese_ci. (Such confusion has happened before for the general_ci suffix.) In fact there are thousands of differences between gb2312_chinese_ci and gb18030_chinese_ci. Here’s an example.

mysql> CREATE TABLE t5
    ->  (gb2312 CHAR CHARACTER SET gb2312 COLLATE gb2312_chinese_ci,
    ->   gb18030 CHAR CHARACTER SET gb18030 COLLATE gb18030_chinese_ci);
Query OK, 0 rows affected (0.22 sec)

mysql> INSERT INTO t5 VALUES ('[','['),(']',']');
Query OK, 2 rows affected (0.01 sec)
Records: 2  Duplicates: 0  Warnings: 0

mysql> SELECT DISTINCT gb2312 from t5 ORDER BY gb2312;
+--------+
| gb2312 |
+--------+
| ]      |
| [      |
+--------+
2 rows in set (0.00 sec)

mysql> SELECT DISTINCT gb18030 from t5 ORDER BY gb18030;
+---------+
| gb18030 |
+---------+
| [       |
| ]       |
+---------+
2 rows in set (0.00 sec)

See the difference? The gb18030 order is obviously better — ‘]’ should be greater than ‘[‘ — but when two collations are wildly different they shouldn’t both be called “chinese_ci”.

The second bad thing is the algorithm. The new chinese_ci collation is based on pinyin for Chinese characters, and binary comparisons of the UPPER() values for non-Chinese characters. This is pretty well useless for non-Chinese. I can bet that somebody will observe “well, duh, it’s a Chinese character set” — but I can’t see why one would use an algorithm for Latin/Greek/Cyrillic/etc. characters that’s so poor. There’s a Common Locale Data Repository for tailoring for Chinese, there are MySQL worklog tasks that explain the brave new world, there’s no need to invent an idiolect when there’s a received dialect.

Documentation

The documentation isn’t up to date yet — there’s no attempt to explain what the new character set and its collations are about, and no mention at all in the FAQ.

But the worklog task WL#4024: gb18030 Chinese character set gives a rough idea of what the coder had in mind before starting. It looks as if WL#4024 was partly copied from http://icu-project.org/docs/papers/unicode-gb18030-faq.html so that’s also worth a look.

For developers who just need to know what’s going on now, just re-read this blog post. What I’ve described should be enough for people who care about Chinese.

I didn’t look for bugs with full-text or LIKE searches, and I didn’t look at speed. But I did look hard for problems with the essentials, and found none. Congratulations are due.

Update on July 17 2014: A source at MariaDB tells me that they’ve worked on GB18030 too. And the MySQL folks have closed a GB18030 bug that ‘Y’ = tilde” in the Chinese collation — not realizing that it’s a duplicate of a bug that I reported seven years ago.

MySQL versus Firebird

Firebird is an open-source DBMS with a long history and good SQL support. Although I measured Firebird’s importance as smaller than MySQL’s or MariaDB’s, it exists, and might grow a bit when Firebird becomes the default LibreOffice DBMS.

I decided to compare the current versions of MySQL 5.6 and Firebird SQL 2.5. I only looked at features that end users can see, without benchmarking. Preparing for the comparison was easy. I looked at the instructions for downloading Firebird with Ubuntu and within 15 minutes I was entering SQL statements. The documentation is disorganized but when in doubt I could guess what to do, since the Firebird folks care about standards.

I’ll organize the discussion according to the ANSI/ISO feature list. I’ll skip the features which both products support equally, which means I’ll skip the great majority, because both products are very similar. Then: For every feature where MySQL is better, MySQL gets a point. For every feature where Firebird is better, Firebird gets a point. The only way to be objective is to be arbitrary.

firebirddolphin

Mandatory Features

Feature Firebird Point MySQL Point Explanation
E011-03 DECIMAL and NUMERIC types 1 Firebird maximum number of digits = 18. MySQL maximum = 65.
E021-02 CHARACTER VARYING data type 1 Firebird maximum VARCHAR length = 32767. MySQL maximum = 65535.
E021-07 Character concatenation 1 Firebird concatenation operator is ||. MySQL has CONCAT() by default.
E031-01 Delimited identifiers 1 Firebird delimiter “” means case-sensitive. MySQL has no viable equivalent.
E041-04 Basic foreign key 1 In Firebird one can use syntax variants that MySQL/InnoDB ignores.
E101-03 Searched UPDATE statement 1 Firebird has fixed a small flaw. MySQL still has the flaw.
E141-08 CHECK clause 1 MySQL doesn’t have this.
F031-03 GRANT statement 1 1 Only Firebird can grant to PUBLIC. But only MySQL can grant TRIGGER. One point each.
F051-03 TIMESTAMP data type 1 Firebird can have 4 digits for seconds precision, MySQL can have 6.
T321-01 User-defined SQL functions 1 Firebird only has procedures, and they’re PLSQL syntax. MySQL has standard syntax.
9075-11 information_schema 1 Firebird doesn’t have this.

firebirddolphin

Optional Features

Feature Firebird Point MySQL Point Explanation
F121-01 GET DIAGNOSTICS 1 Firebird doesn’t have this.
F200 TRUNCATE 1 Firebird doesn’t have this.
F201 CAST function 1 In Firebird I can cast to (among other data types) VARCHAR, SMALLINT, INT, BIGINT, NUMERIC, TIMESTAMP. In MySQL, I can’t.
F221 Explicit defaults 1 Both products allow DEFAULT in CREATE statements, but only MySQL allows DEFAULT in UPDATE or INSERT statements.
F251 Domain support 1 MySQL doesn’t have this.
F312 Merge statement 1 MySQL doesn’t have a MERGE statement, although it does have near equivalents.
F391 Long identifiers 1 Firebird maximum length = 31. MySQL maximum length = 64. Both are sub-standard.
F401-02 Full outer join 1 MySQL doesn’t have this.
F461 Named character sets 1 See Note #1.
F531 Temporary tables 1 Firebird allows CREATE GLOBAL TEMPORARY. MySQL has temporary tables in a non-standard way.
F690 Collation support 1 See Note #2.
T021 BINARY and VARBINARY data types 1 Firebird doesn’t have this, although it does allow specifying a character set as OCTETS,which is comparable to MySQL’s “binary”.
T121 WITH (excluding RECURSIVE) in query 1 Firebird recently added WITH support. MySQL doesn’t have it. Incidental note: misinformation has been spread about MySQL’s history for this feature, for example the first comment on this blog post. The fact is that in 2004 MySQL decided to support CONNECT BY, not WITH. Then it didn’t implement it.
T131 Recursive query 1 MySQL doesn’t have this.
T141 SIMILAR predicate 1 MySQL doesn’t have this.
T171 LIKE clause in table definition 1 Firebird doesn’t have this.
T172 AS clause in table definition 1 Firebird doesn’t have this.
T176 Sequence generator support 1 Firebird only supports a “CREATE SEQUENCE sequence-name” statement, but that’s enough for a point.
T211 Basic trigger capability See the table in an earlier post but MySQL has more standard CREATE TRIGGER syntax. Can’t decide.
T281 SELECT privilege with column granularity 1 Firebird doesn’t have this.
T331 Basic roles 1 Firebird has 50% support. MySQL has nothing. MariaDB roles are beta.

Note #1: Re character sets: Ignoring duplicates, alias names, and slight variations: Firebird multi-byte character sets are gbk, sjis, utf-8, big5, gb2312, eucjp, ksc 5601, gb18030. MySQL multi-byte character sets are gbk, sjis, utf-8, big5, gb2312, eucjp, euckr, utf-16, utf-32. As I put it in an earlier blog post, “The UTF-8 world is not enough”. MySQL has more, so it gets the point.

Note #2: Re collations: Although Firebird deserves special mention for supporting a slightly later version of the Unicode Collation Algorithm, the only multi-byte character set for which Firebird has many collations is UTF-8. Firebird has only four collations for UTF-8: ucs_basic (code point order), and unicode / unicode_ci / unicode_ci_ai (variations of UCA order with and without case sensitivity or accent sensitivity). MySQL has additionally: Croatian, Czech, Danish, Estonian, German, Hungarian, Iceland, Latvian, Lithuanian, Persian, Polish, Romanian, Sinhala, Slovak, Slovenian, Spanish, Swedish, Turkish, Vietnamese. Incidental side note: guess whom the Firebird manual points to for extra documentation about its character sets? None other than the guy who coded MySQL’s collations, a certain A. Barkov, formerly of MySQL, now of MariaDB.

Total points for Firebird: 16. Total points for MySQL: 16.

Update: in an earlier edition of this post, I broke the tie by giving Firebird a delay-of-game penalty on the basis that the makers started talking about version 3.0 in 2011, and version 3.0 is still alpha, and I concluded “MySQL wins by a score of 16 to 16”. That wasn’t serious. The fair way to break the tie is to decide which features should have more weight.

Deprecating MySQL Features

If you haven’t read the Wikipedia article on deprecation, you might not know it’s from the Latin word deprecare, meaning “to ward off (a disaster) by prayer.” That pretty well describes the quandary for any software product that has a big established user base. Is it better to keep junk features, in case removing them will cause disasters for the old users? Or is it better to remove junk features, to avoid ridicule and to make the manual slimmer for the new users?

Historically MySQL has deprecated with caution, and usually MySQL has followed these steps:
(1) In version X, for any SQL statement that appears to use the feature, return a warning that the feature “has been deprecated and may be removed in a future release”.
(2) Also in version X, put a notification in the manual. Notifications have a regular pattern so they’re easy to find by looking in the “What’s new” section, such as http://dev.mysql.com/doc/relnotes/mysql/5.7/en/news-5-7-4.html, and http://dev.mysql.com/doc/relnotes/mysql/5.7/en/news-5-7-4.html.
Alternatively, one can look for the “nutshell” pages, such as https://dev.mysql.com/doc/refman/5.6/en/mysql-nutshell.html and https://dev.mysql.com/doc/refman/5.5/en/mysql-nutshell.html. Or, in googlespeak, site:dev.mysql.com/doc “what’s new” “deprecated”. Notice that I am pointing only to the MySQL manual, because MariaDB does not appear to have an equivalent (the MariaDB “deprecation policy” page only talks about supported platforms).
(3) In version X+1 or X+2, actually remove the feature and start throwing syntax errors.

My all-time favourite deprecation was of the TIMESTAMP(M) data type before version 4.1. The M stood for “display width”, for example, with TIMESTAMP(6) you’d only see the first few digits. We realized that was in conflict with the ANSI/ISO standard, so it became deprecated in 4.x, became a syntax error in 5.x, and then — years after the original warning — the syntax became legal again with a different meaning, so now the 6 in TIMESTAMP(6) means “fractional seconds”. Because there was a period when the syntax was simply illegal, it was practically impossible to move an old application to the new format without noticing anything.

These are my nominations for favourite deprecations in 5.6 and 5.7.

SHOW AUTHORS.
SHOW AUTHORS was an “Easter egg”. Naturally I reported it as such at the time because Easter eggs are more than bugs, they are vandalism. Eventually the feature got documented, but the smell of rotten eggs was in my nose for years afterward, so I’m relieved it’s gone.

INFORMATION_SCHEMA.PROFILING
Faithful readers of my blog posts will of course recall me saying in 2008

A bit of a caveat: I was totally against adding this feature, and some of my bias might show. But I’m not trying to argue about it. I do believe that once we add a feature we should explain it, and all I’m trying to do is explain here.

… which had nothing to do with the quality of the user contribution. People who know me as the original crusader for PERFORMANCE_SCHEMA might guess that I saw this feature as out of step.

INSERT DELAYED
INSERT DELAYED wasn’t junk per se. The problem with little non-standard performance-enhancing niblets is that they’re always non-orthogonal. People sense that. So the bug report / feature request for — why not? — “UPDATE DELAYED” got about 15 votes. Adding the EVENTS feature didn’t quell the protesters, because any rational person could see that, if INSERT DELAYED was okay syntax, UPDATED DELAYED should also be. (This makes two posts in a row where I’m railing about lack of orthogonality — I brought it up in my blog post last week about MariaDB’s DELETE … RETURNING feature. By the way, I updated that post to show what decent standard-fearing folk would do instead of copying PostgreSQL.) So, to stop the protests that there’s a precedent, get rid of the precedent. Well done.

What these deprecations are telling me is that somebody at MySQL at last has the authority and determination to clean the garage. The product is not just getting new good features, it’s losing old bad ones.

A glance at a MariaDB release candidate

Today I looked at the MariaDB Release Candidate wondering how my earlier predictions came out.

I predicted, for “roles”:

For all of the bugs, and for some of the flaws, there’s no worry — they’ll probably disappear.

In fact all the bugs are gone, and I belatedly realized (after some gentle nudges from a MariaDB employee) that some of the flaws weren’t flaws.
The inability to grant to PUBLIC still troubles me, but it looks like roles are ready to roll.

I predicted for “mroonga”:

At the time I’m writing this, MariaDB 10.0.8 doesn’t have mroonga yet.

In other words, I thought it would be in 10.0.8. It was not to be, as another MariaDB employee told me:

Unfortunately, mroonga appeared to have some portability problems. Most likely it won’t be in 10.0.8. There is a good chance it will be 10.0.9 though.

While I was looking, I tried out a new clause for the DELETE statement:
DELETE FROM t RETURNING select-list;
which returns the rows that just got deleted. It’s a new non-standard clause which increases MariaDB’s divergence from MySQL, so decent people should demand an explanation.

Oracle 12c looks similar but does something different:
DELETE FROM t RETURN|RETURNING expr [,expr...] INTO data-item [,data-item...];
SQL Server 2012 has a clause which looks different but does something similar:
DELETE FROM t OUTPUT select-list;
PostgreSQL has a clause which looks similar and does something similar:
DELETE FROM t RETURNING select-list;

So we’re seeing, yet again, the belief that PostgreSQL is a better model for MariaDB than the big boys are. By the way, the big boys are a bit more orthogonal — I’ve seen extensions similar to this one in UPDATE or MERGE statements. But the bigger orthogonality woe is that here we have a statement that returns a result set, but can’t be used in all the places where a result-set-returning sub-statement could work, like DECLARE CURSOR FOR DELETE, or INSERT … DELETE. Of course, I’m glad they don’t work. My point is only that now-it-works-now-it-doesn’t additions make the product look haphazard.

Update: How it’s done in standard SQL

This section was added on 2014-03-01, one week after the original post.
SQL:2011 has a non-core (optional) feature T495 “Combined data change and retrieval”. Ignoring a lot, the syntax is SELECT … FROM NEW|OLD|FINAL TABLE (INSERT|UPDATE|DELETE …) …. For example

 SELECT column1+5 FROM OLD TABLE (DELETE FROM t) AS t WHERE f = 77;

In this example table t is a “delta table”.
There’s an example with DECLARE CURSOR and FETCH in the DB2 10 for z/OS manual.

MySQL’s Date Support Is Not Absurd

MySQL has unfortunate rules for dates with two-digit years, for the date range, for post-decimal digits, for timestamp defaults … But other DBMSs can be equally bizarre

Two-digit years

Start with two-digit years. Summarizing from the manual:

If the two-digit value is <= 69, add 2000. Else add 1900. For example, if it's 65, then it's 2065.

It’s an arbitrary extra detail that users have to learn. But PostgreSQL has exactly the same rule. And three other DBMSs — DB2, Oracle, Firebird — have an even more complex rule, saying in effect that the interpretation will vary depending what year it is when you do the input. SQL Server is even worse, it allows users to change the rule slightly. Only the SQL standard has a rule that’s easy to learn and never changes: you have to enter four-digit years. Time for a modernized-English Shakespeare quote:

Hamlet: Why was he sent into England?
First Clown: Why, because he was mad. He shall recover his wits there, or, if he does not, it is no great matter there.
Hamlet: Why?
First Clown: It will not be seen in him there. There the men are as mad as he.

Applying the clown’s logic, we see that MySQL’s behaviour, in context, is no great matter.

Range

Different DBMSs have different minimum dates, though they mostly have the same maximum.

                                Minimum                   Maximum
Oracle 12c(timestamp)           4713-01-01 BC Julian      9999-12-31                          
SQL Server 2012 (datetime)      1753-01-01 AD Gregorian   9999-12-31
SQL Server 2012 (datetime2)     0001-01-01 AD Gregorian   9999-12-31
DB2 9 (timestamp)               0001-01-01 AD Gregorian   9999-12-31    
Firebird                        0100-01-01 AD             32768-02-29 
PostgreSQL (timestamp)          4714-11-24 BC Gregorian   294276-12-31
MySQL (datetime)                1000-01-01 AD Gregorian   9999-12-31

At first glance it looks as if MySQL is out of step here — everybody else can store dates before 1000 AD. (SQL Server was out of step too, until Microsoft
introduced the datetime2 data type in 2008.) But actually MySQL can store dates before that. Just don’t try calculating on them, goes the recommendation. However, Oracle, and Oracle alone, switches to the Julian calendar for dates before October 1582. I think that’s becoming the conventional thing to do, even though (sob) it’s not standard SQL. Therefore, yes, MySQL is out of step, but so is everybody else except Oracle.

Post-decimal digits

MySQL version 5.6.4 saw the implementation of WL#946 TIME/TIMESTAMP/DATETIME with fractional seconds. Once again Oracle has hidden the task description but once again it’s visible on wayback (as always, click “High Level Architecture” to see what the real spec is). In this case the implementation does not match the original spec. MySQL only supports 6 post-decimal digitsr (microseconds), and the same is true for MariaDB.

How bad is that? Let’s compare:

                                     Maximum Fraction
DB2 9.7                              .999999999999                 
Oracle 12c                           .999999999
SQL Server 2012                      .99999999
PostgreSQL 9.3                       .999999
MySQL 5.6                            .999999
Firebird                             .9999

I look at that comparison and exclaim “MySQL is behind the Big Three Closed-Source DBMSs”. But I acknowledge that others could look at the same comparison and shrug “Oh well, it’s as good as the other open-source DBMSs”. Merging the opinions, I’ll say: post-decimal digit support is not good but, since 5.6.4, not bizarre.

Yes, But

Yes, but what about the fact that, in MySQL, the date 0000-00-00 is legal but becomes NULL if you retrieve it via ODBC or if you assign NULL to a TIMESTAMP NOT NULL, the result is the current timestamp and other equally odd-looking rules?

Here I can’t defend MySQL on the basis that some other DBMS does it. These are MySQL innovations. And I have never heard somebody from outside MySQL say that they have been helpful in a serious project. I can only say, well, at least they’re avoidable.

The claim is just that MySQL date support is not bizarre, in relative terms. The other DBMSs have quirks too. It’s perhaps inevitable in this imperfect world which refuses to go around the sun in a regular and divisible-by-10 number of days.