Category Archives: Standard SQL


Standard SQL/JSON and MySQL 5.7 JSON

Support for storing and querying JSON within SQL is progressing for the ANSI/ISO SQL Standard, and for MySQL 5.7. I'll look at what's new, and do some comparisons.

The big picture

The standard document says

The SQL/JSON path language is a query language used by certain SQL operators (JSON_VALUE, JSON_QUERY, JSON_TABLE, and JSON_EXISTS, collectively known as the SQL/JSON query operators) to query JSON text. The SQL/JSON path language is not, strictly speaking, SQL, though it is embedded in these operators within SQL. Lexically and syntactically, the SQL/JSON path language adopts many features of ECMAScript, though it is neither a subset nor a superset of ECMAScript. The semantics of the SQL/JSON path language are primarily SQL semantics.

Here is a chart that shows the JSON-related data types and functions in the standard, and whether a particular DBMS has something with the same name and a similar functionality.

Standard                 Oracle SQL Server MySQL
--------                 ------ ---------- -----
Conventional data type   YES    YES        NO
JSON_VALUE function      YES    YES        NO
JSON_EXISTS function     YES    NO         NO
JSON_QUERY function      YES    YES        NO
JSON_TABLE function      YES    NO         NO

My source re the standard is a draft copy of ISO/IEC 9075-2 SQL/Foundation. For Oracle 12c read Oracle's whitepaper. For SQL Server 2016 read MSDN's blog. My source re MySQL 5.7 is the MySQL manual and the latest source-code download of version 5.7.9.

Now, what is the significance of the lines in the chart?

Conventional data type

By "conventional", I mean that in standard SQL JSON strings should be stored in one of the old familiar data types: VARCHAR, CLOB, etc. It didn't have to be this way -- any DBMS that supports user-defined types could let users be more specific -- but that's what Oracle and Microsoft accept.
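For example, Oracle 12c combines a conventional data type with a check constraint. Here is a sketch (the table, column, and constraint names are mine, not Oracle's):

```sql
/* Oracle 12c style (a sketch): a conventional data type,
   with validity enforced by an IS JSON check constraint. */
CREATE TABLE t_json
  (j VARCHAR2(4000)
   CONSTRAINT j_is_json CHECK (j IS JSON));
```

So validity checking happens, but the column is still an ordinary VARCHAR2 with a maximum size and all the usual string functions.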

MySQL 5.7, on the other hand, has decided that JSON shall be a new data type. It's closely related to LONGTEXT: if you say

  CREATE TABLE j1 AS SELECT UPPER(CAST('{}' AS JSON));
  SHOW CREATE TABLE j1;

then you get LONGTEXT. But if you use the C API to ask the data type, you get MYSQL_TYPE_JSON=245 (aside: this is not documented). And it differs from LONGTEXT because, if you try to put in non-JSON data, you get an error message.

At least, that's the theory. It didn't take me long to find a way to put non-JSON data in:

  CREATE TABLE j2a (s1 INT, s2 JSON);
  INSERT INTO j2a VALUES  (1,'{"a": "VALID STUFF"}');
  CREATE TABLE j2b AS SELECT s1, UPPER(s2) AS s2 FROM j2a;
  INSERT INTO j2b VALUES (NULL, 'INVALID STUFF');
  ALTER TABLE j2b MODIFY COLUMN s2 JSON;

... That unfortunately works, and now if I say "SELECT * FROM j2b;" I get an error message "The JSON binary value contains invalid data". Probably bugs like this will disappear soon, though.

By making a new data type, MySQL has thrown away some of the advantages that come with VARCHAR or TEXT. One cannot specify a maximum size -- everything is like LONGTEXT. One cannot specify a preferred character set and collation -- everything is utf8mb4 and utf8mb4_bin. One cannot take advantage of all the string functions -- BIN() gives meaningless results, for example. And the advantage of automatic validity checking could have been delivered with efficient constraints or triggers instead. So why have a new data type?

Well, PostgreSQL has a JSON data type. As I've noticed before, PostgreSQL can be a poor model if one wants to follow the standard. And it will not surprise me if the MariaDB folks also decide to make a JSON data type, because I know that they are following similar logic for an "IP address" data type.

By the way, the validity checking is fairly strict. For example, '{x:3}' is considered invalid because quote marks are missing, and '{"x":.2}' is considered invalid because the value has no leading digit.
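To see the checking in action, one can try inserts like these (a sketch with a hypothetical table name; the exact error messages vary by version):

```sql
CREATE TABLE jv (j JSON);
INSERT INTO jv VALUES ('{"x":3}');   /* okay: valid JSON */
INSERT INTO jv VALUES ('{x:3}');     /* error: the key lacks quote marks */
INSERT INTO jv VALUES ('{"x":.2}');  /* error: the value has no leading digit */
```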

JSON_VALUE function

For an illustration, it's enough to compare the standard's JSON_VALUE with MySQL's JSON_EXTRACT.

The standard idea is: pass a JSON string and a JavaScript-like expression, get back an SQL value, which will generally be a scalar value. For example,

  SELECT JSON_VALUE(@json, @path_string) FROM t;
  SELECT JSON_VALUE(json_column_name, 'lax $.c') AS c FROM t;

There are optional clauses for deciding what to do if the JSON string is invalid, or contains missing and null components. Again, the standard's JSON_VALUE is what Oracle and Microsoft accept. There's some similarity to what has gone before with SQL/XML.
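Here is a sketch of those optional clauses, assuming a table t with a JSON-bearing column named json_column_name. This is the standard's syntax, not MySQL's:

```sql
/* Standard SQL/JSON (a sketch): RETURNING picks the SQL result type,
   and the ON EMPTY / ON ERROR clauses decide what to do when the
   path finds nothing or the JSON is invalid. */
SELECT JSON_VALUE(json_column_name, 'lax $.c'
                  RETURNING VARCHAR(10)
                  DEFAULT 'none' ON EMPTY
                  NULL ON ERROR) AS c
FROM t;
```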

MySQL, on the other hand, accomplishes some similar things with JSON_EXTRACT. For example,

  SELECT JSON_EXTRACT(@json, @path_string);
  SELECT JSON_EXTRACT(json_column_name, '$.c') AS c FROM t;

And the result is not an ordinary MySQL scalar; it has type = JSON. In the words of physicist I.I. Rabi when confronted with a new particle, "Who ordered that?"

Well, JSON_EXTRACT and some of the other MySQL functions have fairly close analogues, in both name and functionality, with Google's BigQuery and with SQLite. In other words, instead of the SQL standard, MySQL has ended up with something like the NoSQL No-standard.

I should stress here that MySQL is not "violating" the SQL standard with JSON_EXTRACT. It is always okay to use non-standard syntax. What's not okay is to use standard syntax for a non-standard purpose. And here's where I bring in the slightly dubious case of the "->" operator. In standard SQL "->", which is called the "right arrow" operator, has only one purpose: dereferencing. In MySQL "->" has a different purpose: a shorthand for JSON_EXTRACT. Since MySQL will never support dereferencing, there will never be a conflict in practice. Nevertheless, technically, it's a violation.
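For the record, the MySQL 5.7 shorthand works like this (assuming a table t with a JSON column named json_column_name):

```sql
/* These two statements are equivalent in MySQL 5.7: */
SELECT json_column_name->'$.c' FROM t;
SELECT JSON_EXTRACT(json_column_name, '$.c') FROM t;
```

Note that the left operand of "->" must be a JSON column, so the shorthand is narrower than JSON_EXTRACT itself.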

Observed Behaviour

When I tried out the JSON data type with MySQL 5.7.9, I ran into no exciting bugs, but a few features.

Consistency doesn't apply for INSERT IGNORE and UPDATE IGNORE. For example:

CREATE TABLE t1 (date DATE, json JSON);
INSERT IGNORE INTO t1 (date) VALUES ('invalid date');
INSERT IGNORE INTO t1 (json) VALUES ('{invalid json}');

The INSERT IGNORE into the date column inserts null with a warning; the INSERT IGNORE into the json column returns an error.

Some error messages might still need adjustment. For example:

CREATE TABLE ti (id INT, json JSON) PARTITION BY HASH(json);

Result: an error message = "A BLOB field is not allowed in partition function".

Comparisons of JSON_EXTRACT results don't work. For example:

SET @json = '{"a":"A","b":"B"}';
SELECT GREATEST(
       JSON_EXTRACT(@json,'$.a'),
       JSON_EXTRACT(@json,'$.b'));

The result is a warning "This version of MySQL doesn't yet support 'comparison of JSON in the LEAST and GREATEST operators'", which is a symptom of the true problem, that JSON_EXTRACT returns a JSON value instead of a string value. The workaround is:

SET @json = '{"a":"A","b":"B"}';
SELECT GREATEST(
       CAST(JSON_EXTRACT(@json,'$.a') AS CHAR),
       CAST(JSON_EXTRACT(@json,'$.b') AS CHAR));

... which returns "B" -- a three-character string, including the quote marks.
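If the quote marks are unwanted, MySQL 5.7's JSON_UNQUOTE function strips them. A sketch:

```sql
SET @json = '{"a":"A","b":"B"}';
SELECT JSON_UNQUOTE(JSON_EXTRACT(@json, '$.a'));  /* A -- a one-character string */
```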

Not The End

The standard might change, and MySQL certainly will change anything that's deemed wrong. Speaking of wrong, I might have erred too. And I certainly didn't give justice to all the other details of MySQL 5.7 JSON.

Meanwhile

The Ocelot GUI client for MySQL and MariaDB is still version 0.8 alpha, but since the last report there have been bug fixes and improvements to the Help option. Have a look at the new manual by going to https://github.com/ocelot-inc/ocelotgui and scrolling down till you see the screenshots and the words "User Manual".

What is a data type?

I'd suppose that these statements are generally accepted:

A data type is a set of values.

So any value that belongs to that set is said to "have" or "belong to" that data type.

And any column which is defined as "of" that data type is going to contain values which have that data type, only.

For example a column C may be defined as of type INTEGER and will only contain values which belong to the set of integers.

And now for the exceptions, details, caveats, and errors.

What is synonymous with data type?

There are no synonyms; no other term should ever be used.

Not datatype. Oracle's manual has "datatype" but IBM's doesn't and Microsoft's doesn't, and it's not in the standard.

Not domain. C.J.Date (Date on Database, Writings 2000-2006) says

I observe that some people don't understand even yet that domains and types are the same thing. The SQL standard muddies the water, too, because it used the term DOMAIN to mean something else ...

From your perspective the implication should be: you can't use domain as a synonym because users of SQL products will think you mean something else.

Not class. Although some people use class and type interchangeably, class can also mean a set of objects (rather than just a set of values), so it's a less precise term.

What does the standard say

Move now to what's in the SQL:2011 standard document. We find:

A data type is a set of representable values. Every representable value belongs to at least one data type and some belong to several data types.

This is restating the common definition -- "a data type is a set of values" -- without significant limitations. By saying "representable", the standard seems to be merely accepting that a DBMS shouldn't include what it can't store or display. By saying "several data types", it's acknowledging that the value denoted by the literal 9999 could fit in either INTEGER or SMALLINT.

Does a data type include length, character set, etc.?

Suppose the definition of a column is

COLUMN_NAME CHARACTER(32)

Is the data type CHARACTER, or is the data type CHARACTER(32)?

This is a difficult question. The best answer I can give, which I'll claim is what the SQL:2011 standard would say if it could, is: "CHARACTER(32)".

Yes, the name of the data type is CHARACTER. The standard only defines a small set of data type names -- CHARACTER, CHARACTER VARYING, CHARACTER LARGE OBJECT, BINARY, BINARY VARYING, BINARY LARGE OBJECT, NUMERIC, DECIMAL, SMALLINT, INTEGER, BIGINT, FLOAT, REAL, DOUBLE PRECISION, BOOLEAN, DATE, TIME, TIMESTAMP, INTERVAL -- and of course "CHARACTER" is on that list but "CHARACTER(32)" is not.

On the other hand -- and I think the standard's vocabulary choice is unfortunate -- the data type's name does not identify the data type! The data type is actually identified by a "data type descriptor" which is defined with the entire <data type> clause, and which, as well as name, includes precision, character set, etc.

That is not explicitly stated, so I'll support the interpretation with several quotes which show that it's implicit.

"A data type is predefined even though the user is required (or allowed) to provide certain parameters when specifying it (for example the precision of a number)."

"Two data types, T1 and T2, are said to be compatible if T1 is assignable to T2, T2 is assignable to T1, and their descriptors include the same data type name."

"The data types “CHARACTER(n) CHARACTER SET CS1” and “CHARACTER(m) CHARACTER SET CS2”, where CS1 ≠ CS2, have descriptors that include the same data type name (CHARACTER), but are not mutually assignable."

"If a <data type> is specified, then let [the data type of a domain] be the data type identified by <data type>."

"If <data type> is specified, then [the declared type of the column] is that data type."

"If the maximum length of the subject data type is fixed [then default literals shall not be too long]."

"All data types in [destination strings] shall be character string, and all of them shall have the same character repertoire."

"[If the result of an operation is] not exactly representable with the precision and scale of the result data type [then it's an error]."

"A site declared with a character string type may be specified as having a collation, which is treated as part of its data type."

... I quote them merely for the sake of showing that length and character set and repertoire are all part of data type, and that the <data type> clause is what specifies or identifies the data type, and that two different data types can both have the same data type name.

This does not mean that CHARACTER(32) is a subtype of CHARACTER, because the standard is clear that predefined data types cannot have subtypes or supertypes. (User-defined types can have subtypes or supertypes; however, for user-defined types questions like "is the data type character(32)?" don't arise.)

This does not mean that saying "the data type is CHARACTER" is wrong, because, whenever we're talking abstractly or generally, that's appropriate.

It does mean that "data type is CHARACTER(32)" is much more correct when the definition is known.

What about funny-looking values?

If a data type is supposed to be a set of values, but NULLs are allowed, then what is the set?

Taking the line that NULL represents UNKNOWN, and using MySQL/MariaDB limits for SMALLINT, we could say that the set is {-32768, -32767, -32766, ... 0, ..., +32767} -- NULL is somewhere in that enumeration, we just don't happen to know where. But NULL could mean other things besides UNKNOWN, and sometimes it definitely does -- think of ROLLUP. The standard just says "Every data type has a special value, called the null value, denoted by the keyword NULL." The set then is {NULL, -32768, -32767, -32766, ... 0, ..., +32767}.

If the limits of the floating-point sets are supposed to be "based on the IEEE standard", as the MySQL manual says they are, then where are NaN (not a number) and INF (pseudo-infinity)?

This has been the subject of feature requests like Bug#57519 Documentation for IEEE 754 compliance is missing from Daniël van Eeden, and Bug#41442 Support ieee754 by supporting INF and NaN from Mark Callaghan.

Bug#41442, incidentally, says "the SQL spec doesn't allow [INF and NaN]". I'd put it less severely -- the standard effectively disallows them for all the standard predefined numeric data types including FLOAT or DOUBLE PRECISION (for example it says that an approximate-numeric literal must have the form <mantissa>E<exponent>). The standard does not forbid new implementation-defined data types.

Other specific effects on our favourite DBMSs

As for "what is a data type in MySQL (or by extension in MariaDB)?", the
correct answer is: whatever the MySQL or MariaDB documentation says. There used to be arguments about this, but I think they're all in the past now.

Equally, there's no point arguing about how TINYBLOB etc. should have been specified in terms of maximum length or precision, or about how ENUM should have been a [VAR]CHAR plus a constraint. The more feasible feature request would be along the lines of: more complete support for the standard types.

Temporary tables, standard SQL

The PostgreSQL manual says:

"The standard's definition of the behavior of temporary tables is widely ignored. PostgreSQL's behavior on this point is similar to that of several other SQL databases."

The first sentence is false. The second sentence could be rephrased as "MySQL copied us", although nobody else did, as far as I know.

The standard's definition is widely followed

I base this claim on the documentation of DB2, Oracle Rdb, Oracle 12c, and Firebird.

I don't say the standard is "universally followed" -- for example SQL Server is out of step. I don't say the whole standard is followed -- only the subset related to global temporary tables. I don't say the standard is the only thing that's followed. But all these DBMSs support at least four, and sometimes all six, of these characteristics:

1. There is a syntax CREATE GLOBAL TEMPORARY TABLE <table-name> ... The word GLOBAL means "the scope of the name is global", that is, it's recognized by all modules. Since most DBMSs don't have multiple modules, the word GLOBAL is compulsory and the standard's other type of temporary table, LOCAL, is usually not there. (I will mention, though, parenthetically, that LOCAL could optionally mean a temporary table is local to a stored procedure if it's created in a stored procedure.)

2. If there is a non-temporary table already existing with the same name, then CREATE GLOBAL TEMPORARY TABLE fails.

3. There's a distinction between the metadata -- the definition of the table -- and the data -- the contents of the table.

4. The metadata is persistent; the data disappears when the session ends, or earlier.

5. The metadata is visible to all sessions in INFORMATION_SCHEMA or equivalent. The data is not visible to all sessions, and one sometimes calls the temporary table's contents "session-specific data".

6. There is an optional clause ON COMMIT {PRESERVE|DELETE} ROWS. When ON COMMIT DELETE ROWS is specified or is default, then the data disappears when COMMIT happens.
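Putting characteristics 1 and 6 together, a standard-style definition looks like this sketch, which the standard-following DBMSs above would accept (the table name is hypothetical):

```sql
/* Standard global temporary table: the definition persists,
   the data is session-specific and vanishes at COMMIT. */
CREATE GLOBAL TEMPORARY TABLE gtt
  (s1 INT)
  ON COMMIT DELETE ROWS;
```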

PostgreSQL's behaviour is copied by MySQL

For PostgreSQL, the CREATE TEMPORARY TABLE statement arrived in PostgreSQL 6.5 which was released in June 1999. For MySQL, CREATE TEMPORARY TABLE arrived in MySQL 3.23 in August 1999.

That was before my time, but I'm a post-hoc-ergo-propter-hoc kind of guy, and I suspect that somebody was seduced by the logical fallacy that PostgreSQL has some standard SQL, therefore copy it. So the thing that's presently in MySQL and MariaDB is CREATE TEMPORARY TABLE table_name ...;

But these DBMSs support zero (0) of the six characteristics that I described for the standard-compliant DBMSs.

1. The specification of scope is missing.

2. If there is a non-temporary table already existing with the same name, then CREATE TEMPORARY TABLE succeeds, and the existing table is hidden.

3. There's no distinction between the metadata -- the definition of the table -- and the data -- the contents of the table.

4. The metadata is not persistent; the data disappears when the session ends, or earlier.

5. The metadata is not visible to all sessions in INFORMATION_SCHEMA or equivalent ... Yes, I'm refusing to acknowledge the existence of the new table INNODB_TEMP_TABLE_INFO, for quality reasons.

6. There is no optional clause ON COMMIT {PRESERVE|DELETE} ROWS.
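Point 2 is easy to demonstrate in MySQL or MariaDB (a sketch with a hypothetical table name):

```sql
CREATE TABLE t (s1 INT);
INSERT INTO t VALUES (1);
CREATE TEMPORARY TABLE t (s1 INT);  /* succeeds, and hides the base table */
SELECT * FROM t;                    /* empty result: the temporary table answers */
DROP TEMPORARY TABLE t;
SELECT * FROM t;                    /* the base table reappears, with its row */
```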

One funny result of this is that simple statements like SELECT * FROM table_name; can't be fully checked or optimized when a stored procedure is parsed, because there's no way to know what table_name will be -- maybe at execution time it will be the table that's defined and visible in information_schema, but maybe it will be a temporary table which is only temporarily visible.

A less serious but irritating result is that, if you want to use a temporary table repeatedly, you have to re-create everything -- find the original table definition somewhere, then add whatever indexes or views or triggers applied originally.

Is there any good news?

I wrote a worklog task about this long ago. It was WL#934 "Temporary tables in the standard way". But it is a private Oracle spec, and is not in the Internet archive, so the general public can't see it. There's been no action on it that I know of. Some attention has been paid to feature request #58392 "Prevent TEMPORARY tables from shadowing regular tables" and feature request #20001 "Support for temp-tables in INFORMATION_SCHEMA", but those are mere distractions.

However, it remains possible to implement standard global temporary tables without disrupting existing applications. The key is the syntax difference between

CREATE TEMPORARY TABLE ...;                    /* MySQL or MariaDB */

and

CREATE GLOBAL TEMPORARY TABLE ...;             /* standard */

They're two different statements. So one could just say that "if and only if GLOBAL is specified, then the table has to have the standard rules".

This isn't due to luck. It happened because in olden days MySQL's architects paid attention to the official ANSI/ISO framework requirements:

"In the Syntax Rules, the word shall defines conditions that are required to be true of syntactically-conforming SQL language ... The treatment of language that does not conform to the SQL Formats and Syntax Rules is implementation-dependent."

In other words, by allowing only the non-standard syntax CREATE TEMPORARY TABLE, MySQL is not violating the standard. On the other hand, the latest PostgreSQL release allows CREATE GLOBAL TEMPORARY TABLE, so PostgreSQL is violating the standard. To people who read my earlier blog post "Sometimes MySQL is more standards-compliant than PostgreSQL", that won't come as a shock.

The BINARY and VARBINARY data types

MySQL's support of the BINARY and VARBINARY data types is good, and the BINARY and VARBINARY data types are good things. And now the details. What applies here for MySQL applies for MariaDB as well.

Who supports VARBINARY

There's an SQL:2008 standard optional feature T021 "BINARY and VARBINARY data types". Our book has a bigger description, but here is one that is more up to date:

DBMS          Standard-ish?               Maximum length
----          -------------               --------------
DB2 for LUW   No. Try CHAR FOR BIT DATA.  254 for fixed-length, 32672 for variable-length
DB2 for z/OS  Yes.                        255 for fixed-length, 32704 for variable-length
Informix      No. Try CHAR.               32767
MySQL         Yes.                        constrained by maximum row length = 65535
Oracle        No. Try RAW.                2000 sometimes; 32767 sometimes
PostgreSQL    No. Try BYTEA.              theoretically 2**32 - 1
SQL Server    Yes.                        8000 for fixed-length, 2**31 - 1 for variable-length

Standard Conformance

Provided that sql_mode=strict_all_tables, MySQL does the right (standard) thing most of the time, with two slight exceptions and one exception that isn't there.

The first exception:

CREATE TABLE t1 (s1 VARBINARY(2));
INSERT INTO t1 VALUES (X'010200');

... This causes an error. MySQL is non-conformant. If one tries to store three bytes into a two-byte target, but the third byte is X'00', there should be no error.

The second exception:

SET sql_mode='traditional,pipes_as_concat';
CREATE TABLE t2 (s1 VARBINARY(50000));
CREATE TABLE t3 AS SELECT s1 || s1 FROM t2;

... This does not cause an error. MySQL is non-conformant. If a concatenation results in a value that's longer than the maximum length of VARBINARY, which is less than 65535, then there should be an error.

The exception that isn't there:
The MySQL manual makes an odd claim that, for certain cases when there's a UNIQUE index, "For example, if a table contains 'a', an attempt to store 'a\0' causes a duplicate-key error." Ignore the manual. Attempting to insert 'a\0' will only cause a duplicate-key error if the table's unique-key column contains 'a\0'.

The poem Antigonish described a similar case:

Yesterday, upon the stair,
I met a man who wasn't there.
He wasn't there again today,
I wish, I wish he'd go away ...

The BINARY trap

Since BINARY columns are fixed-length, there has to be a padding rule. For example, suppose somebody enters zero bytes into a BINARY(2) target:

CREATE TABLE t4 (s1 BINARY(2));
INSERT INTO t4 VALUES (X'');
SELECT HEX(s1) FROM t4;

... The result is '0000' -- the padding byte for BINARY is X'00' (0x00), not X'20' (space).

There also has to be a rule about what to do for comparisons if comparands end with padding bytes.

CREATE TABLE t5 (s1 VARBINARY(2));
INSERT INTO t5 VALUES (X'0102');
SELECT * FROM t5 WHERE s1 = X'010200';

... This returns zero rows. It's implementation-defined whether MySQL should
ignore trailing X'00' during comparisons, so there was no chance of getting
it wrong.

The behaviour difference between BINARY and VARBINARY can cause fun:

CREATE TABLE t7 (s1 VARBINARY(2) PRIMARY KEY);
CREATE TABLE t8 (s1 BINARY(2), FOREIGN KEY (s1) REFERENCES t7 (s1));
INSERT INTO t7 VALUES (0x01);
INSERT INTO t8 SELECT s1 FROM t7;

... which fails on a foreign-key constraint error! It looks bizarre
that a value which is coming from the primary-key row can't be put
in the foreign-key row, doesn't it? But the zero-padding rule, combined
with the no-ignore-zero rule, means this is inevitable.

BINARY(x) is a fine data type whenever it's certain that all the values
will be exactly x bytes long, and otherwise it's a troublemaker.

When to use VARBINARY

VARBINARY is better than TINYBLOB or MEDIUMBLOB because it has a definite
maximum size, and that makes life easier for client programs that want to
know: how wide can the display be? In most DBMSs it's more important that BLOBs can be stored separately from the rest of the row.

VARBINARY is better than VARCHAR if there should be no validity checking.
For example, if the default character set is UTF8 then this is illegal:

CREATE TABLE t9 (s1 VARCHAR(5));
INSERT INTO t9 VALUES (0xF4808283);

... but this is legal because character set doesn't matter:

CREATE TABLE t10 (s1 VARBINARY(5));
INSERT INTO t10 VALUES (0xF4808283);

(I ran into this example on a SQL Server forum where the participants displayed woeful ignorance of Unicode.)

And finally converting everything to VARBINARY is one way to avoid the annoying message "Invalid mix of collations". In fact the wikimedia folks appear to have changed all VARCHARs to VARBINARYs back in 2011 just to avoid that error. I opine that the less drastic solution is to use collations consistently, but I wasn't there.
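A less drastic per-statement alternative is to convert just the comparands with MySQL's CONVERT ... USING syntax. A sketch, with hypothetical table and column names:

```sql
/* Instead of redefining columns as VARBINARY, convert for one comparison: */
SELECT * FROM t1 JOIN t2
  ON CONVERT(t1.s1 USING binary) = CONVERT(t2.s1 USING binary);
```

Of course this throws away collation-aware ordering for that comparison, which is the same trade-off the column-by-column conversion makes permanently.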

MySQL versus Firebird

Firebird is an open-source DBMS with a long history and good SQL support. Although I measured Firebird's importance as smaller than MySQL's or MariaDB's, it exists, and might grow a bit when Firebird becomes the default LibreOffice DBMS.

I decided to compare the current versions of MySQL 5.6 and Firebird SQL 2.5. I only looked at features that end users can see, without benchmarking. Preparing for the comparison was easy. I looked at the instructions for downloading Firebird with Ubuntu and within 15 minutes I was entering SQL statements. The documentation is disorganized but when in doubt I could guess what to do, since the Firebird folks care about standards.

I'll organize the discussion according to the ANSI/ISO feature list. I'll skip the features which both products support equally, which means I'll skip the great majority, because both products are very similar. Then: For every feature where MySQL is better, MySQL gets a point. For every feature where Firebird is better, Firebird gets a point. The only way to be objective is to be arbitrary.


Mandatory Features

Feature                              Firebird  MySQL  Explanation
-------                              --------  -----  -----------
E011-03 DECIMAL and NUMERIC types              1      Firebird maximum number of digits = 18. MySQL maximum = 65.
E021-02 CHARACTER VARYING data type            1      Firebird maximum VARCHAR length = 32767. MySQL maximum = 65535.
E021-07 Character concatenation      1                Firebird concatenation operator is ||. MySQL has CONCAT() by default.
E031-01 Delimited identifiers        1                Firebird delimiter "" means case-sensitive. MySQL has no viable equivalent.
E041-04 Basic foreign key            1                In Firebird one can use syntax variants that MySQL/InnoDB ignores.
E101-03 Searched UPDATE statement    1                Firebird has fixed a small flaw. MySQL still has the flaw.
E141-08 CHECK clause                 1                MySQL doesn't have this.
F031-03 GRANT statement              1         1      Only Firebird can grant to PUBLIC. But only MySQL can grant TRIGGER. One point each.
F051-03 TIMESTAMP data type                    1      Firebird can have 4 digits for seconds precision, MySQL can have 6.
T321-01 User-defined SQL functions             1      Firebird only has procedures, and they're PLSQL syntax. MySQL has standard syntax.
9075-11 information_schema                     1      Firebird doesn't have this.


Optional Features

Feature                                        Firebird  MySQL  Explanation
-------                                        --------  -----  -----------
F121-01 GET DIAGNOSTICS                                  1      Firebird doesn't have this.
F200 TRUNCATE                                            1      Firebird doesn't have this.
F201 CAST function                             1                In Firebird I can cast to (among other data types) VARCHAR, SMALLINT, INT, BIGINT, NUMERIC, TIMESTAMP. In MySQL, I can't.
F221 Explicit defaults                                   1      Both products allow DEFAULT in CREATE statements, but only MySQL allows DEFAULT in UPDATE or INSERT statements.
F251 Domain support                            1                MySQL doesn't have this.
F312 Merge statement                           1                MySQL doesn't have a MERGE statement, although it does have near equivalents.
F391 Long identifiers                                    1      Firebird maximum length = 31. MySQL maximum length = 64. Both are sub-standard.
F401-02 Full outer join                        1                MySQL doesn't have this.
F461 Named character sets                                1      See Note #1.
F531 Temporary tables                          1                Firebird allows CREATE GLOBAL TEMPORARY. MySQL has temporary tables in a non-standard way.
F690 Collation support                                   1      See Note #2.
T021 BINARY and VARBINARY data types                     1      Firebird doesn't have this, although it does allow specifying a character set as OCTETS, which is comparable to MySQL's "binary".
T121 WITH (excluding RECURSIVE) in query       1                Firebird recently added WITH support. MySQL doesn't have it. Incidental note: misinformation has been spread about MySQL's history for this feature, for example the first comment on this blog post. The fact is that in 2004 MySQL decided to support CONNECT BY, not WITH. Then it didn't implement it.
T131 Recursive query                           1                MySQL doesn't have this.
T141 SIMILAR predicate                         1                MySQL doesn't have this.
T171 LIKE clause in table definition                     1      Firebird doesn't have this.
T172 AS clause in table definition                       1      Firebird doesn't have this.
T176 Sequence generator support                1                Firebird only supports a "CREATE SEQUENCE sequence-name" statement, but that's enough for a point.
T211 Basic trigger capability                                   See the table in an earlier post but MySQL has more standard CREATE TRIGGER syntax. Can't decide.
T281 SELECT privilege with column granularity            1      Firebird doesn't have this.
T331 Basic roles                               1                Firebird has 50% support. MySQL has nothing. MariaDB roles are beta.

Note #1: Re character sets: Ignoring duplicates, alias names, and slight variations: Firebird multi-byte character sets are gbk, sjis, utf-8, big5, gb2312, eucjp, ksc 5601, gb18030. MySQL multi-byte character sets are gbk, sjis, utf-8, big5, gb2312, eucjp, euckr, utf-16, utf-32. As I put it in an earlier blog post, "The UTF-8 world is not enough". MySQL has more, so it gets the point.

Note #2: Re collations: Although Firebird deserves special mention for supporting a slightly later version of the Unicode Collation Algorithm, the only multi-byte character set for which Firebird has many collations is UTF-8. Firebird has only four collations for UTF-8: ucs_basic (code point order), and unicode / unicode_ci / unicode_ci_ai (variations of UCA order with and without case sensitivity or accent sensitivity). MySQL has additionally: Croatian, Czech, Danish, Estonian, German, Hungarian, Icelandic, Latvian, Lithuanian, Persian, Polish, Romanian, Sinhala, Slovak, Slovenian, Spanish, Swedish, Turkish, Vietnamese. Incidental side note: guess whom the Firebird manual points to for extra documentation about its character sets? None other than the guy who coded MySQL's collations, a certain A. Barkov, formerly of MySQL, now of MariaDB.

Total points for Firebird: 16. Total points for MySQL: 16.

Update: in an earlier edition of this post, I broke the tie by giving Firebird a delay-of-game penalty on the basis that the makers started talking about version 3.0 in 2011, and version 3.0 is still alpha, and I concluded "MySQL wins by a score of 16 to 16". That wasn't serious. The fair way to break the tie is to decide which features should have more weight.

Tuples

"It is better to keep silence and be thought a fool, than to say 'tuple' and remove all doubt."

But recently people have been using the word "tuple" more frequently. Doubtless all those people know that in relational databases a tuple (or a tuple value) is a formal term for a row of a table. It's possible to know a bit more than that.

Pronounced Tyoople, Toople, or Tuhple?

The Oxford Dictionaries site says Tyoople. Other dictionaries are neutral about the terms from which Tuple was derived (sextuple, octuple, etc.), for example Merriam-Webster says they usually end in Toople but Tuhple is an accepted alternate, and the Oxford Canadian Dictionary says it's always Tuhple. So the question comes down to: what is the proper way in a database context?

I found one book that says Tuple rhymes with Scruple, that is, it's Toople: Rod Stephens, Beginning Database Design Solutions. But Mr Stephens also tells us that tables are called relations because "the values of a row are related", so I'm wary about him.

I found four books that say Tuple rhymes with Couple, that is, it's Tuhple:

  • David Kroenke, Database Processing: Fundamentals, Design and Implementation
  • Kevin Loney, Oracle9i The Complete Reference
  • Donald Burleson, Oracle High-Performance SQL Tuning
  • Paul Nielsen, SQL Server 2005 Bible

Then I found the decisive one:
C.J.Date, An Introduction To Database Systems.
I quote:

The relational model therefore does not use the term "record" at all; instead it uses the term "tuple" (rhymes with "couple").

Since C.J.Date had many conversations with The Founder (E.F.Codd), and Mr Codd would have been certain to correct Mr Date if he had mispronounced, this is decisive. Most writers in the field, including the one who ought to know, are saying that it rhymes with couple.

Wait a minute -- couple was originally a French word, and the French would use an oo sound, what about that? A good explanation, although it's based on analogy, is that some Middle English words with oo (like blood and flood) changed in stages from the oo sound to the uh sound in the centuries following the Great English Vowel Shift. See the Wikipedia article about Phonological history of English high back vowels. So the reply to people who say "etymologically it was an oo sound" would be "yes, but oo changed to uh as part of a trend, get modern".

But what if it's a non-relational tuple?

Quoting C.J.Date again (from The Relational Database Dictionary):

NOTE: Tuples as defined in the relational model differ in certain respects from the mathematical construct of the same name. In particular, tuples in mathematics typically don't have named attributes; instead, their attributes are identified by their ordinal position, left to right.

Aha. So if a sequence of values doesn't have a corresponding header with a sequence of column names, it shouldn't be called a row (that would be relational) but it could be called a tuple -- provided it's not in a relational database. In practice that seems to be fairly common usage, and I'll highlight the products where it appears to be the preferred term.
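
Date's distinction is easy to see in Python (sticking with the mathematical sense): the only way to identify an attribute of a plain tuple is its ordinal position. The values below are invented for illustration.

```python
# A mathematical-style tuple: attributes identified by ordinal position only.
t = ('HIRAGANA LETTER I', 0x3044, 'utf8')

# No attribute names exist; position is the only handle.
assert t[1] == 0x3044

# Asking for a name fails -- plain tuples have no named attributes.
assert not hasattr(t, 's1')
```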

  • Tuple Spaces.
    The modest idea of just filling a space with tuples, and calling it a
    "tuple space", started off in 1982 with a language named Linda. Since then the general concept has gotten into various Java implementations.
  • Python.
    The tuple is a supported data type that's part of the Python core.
    Of course there are similar things in other languages but I believe
    that Python is the most prominent language that actually calls it a
    tuple.
  • Pig.
  • FoundationDB.
  • Tarantool.
    Actually one of my current projects is enhancing Tarantool's documentation, which is what led me to wonder about the word.
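
Since Python is on the list, here is what its core tuple type looks like in practice: it is immutable, and (unlike a list) hashable, so it can serve as a dictionary key. The values are made up for illustration.

```python
# A Python tuple: an immutable sequence, created with parentheses or commas.
point = (59.3, 18.1)

# Immutability: item assignment raises TypeError.
try:
    point[0] = 0.0
    mutated = True
except TypeError:
    mutated = False
assert mutated is False

# Hashable, so usable as a dict key -- a list would raise TypeError here.
cities = {point: 'Stockholm'}
assert cities[(59.3, 18.1)] == 'Stockholm'
```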

The MySQL manual usually avoids the term, although it's more frequent with NDB.

Alas

Recently I saw a poll with a single question:

What is to be Done? (a) Nothing (b) Something

I think that (a) won, hurrah. And yet it would have been a finer world if everyone had agreed that "tuple" meant a sequence of values, "record" meant a sequence of values which had fixed types, and "row" meant a sequence of values which had both fixed types and fixed names. If only Mr Codd had left the vocabulary alone ...
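
For what it's worth, that wished-for three-way vocabulary maps neatly onto Python: a plain tuple is just a sequence of values, and typing.NamedTuple adds fixed names plus declared (though not runtime-enforced) types. The Row class below is invented for illustration.

```python
from typing import NamedTuple

# "tuple": a bare sequence of values, identified only by position.
t = (1, 'abc')

# "row" in the wished-for sense: fixed names plus declared types.
# (Python declares but does not enforce the types at runtime.)
class Row(NamedTuple):
    s1: int
    s2: str

r = Row(s1=1, s2='abc')

# Access by name or, since it is still a tuple underneath, by position.
assert r.s1 == 1 and r[1] == 'abc'
assert isinstance(r, tuple)
```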

Sometimes MySQL is more standards-compliant than PostgreSQL

Here are examples comparing MySQL 5.6 against PostgreSQL 9.3 Core Distribution, where MySQL seems to comply with "standard SQL" more closely than PostgreSQL does. The examples are also true for MariaDB 10.0 so whenever I say "MySQL" I mean "MySQL and/or MariaDB". When I say "more closely" I do not mean that MySQL is completely compliant, or that PostgreSQL is completely non-compliant.

Identifiers

Example:

CREATE TABLE          ŽŽŽŽŽŽŽŽŽŽŽŽŽŽŽŽŽŽŽŽŽŽŽŽŽŽŽŽŽŽŽŽ (s1 INT); /* 32-character name */
SELECT COUNT(*) FROM information_schema.tables
  WHERE table_name = 'ŽŽŽŽŽŽŽŽŽŽŽŽŽŽŽŽŽŽŽŽŽŽŽŽŽŽŽŽŽŽŽŽ';
SELECT COUNT(*) FROM information_schema.tables
  WHERE table_name LIKE LOWER('Ž%');

Result:
PostgreSQL says count(*) is 0. MySQL says it's 1.
Reason:
(1) PostgreSQL maximum identifier length is 63 bytes; MySQL maximum identifier length is 64 characters. The standard requirement is 128 characters.
(2) With PostgreSQL, if an identifier is too long, PostgreSQL truncates it -- it "tries to make do" rather than failing. (If the name had been too long for MySQL, MySQL would have thrown an error.)
(3) PostgreSQL does not convert to lower case during CREATE, and does a case-sensitive search during SELECT.
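
The bytes-versus-characters distinction is the crux here: 'Ž' needs two bytes in UTF-8, so the 32-character name above is 64 bytes long, one byte over PostgreSQL's 63-byte limit. A quick check in Python:

```python
# The identifier from the example: 32 copies of 'Ž' (U+017D).
name = '\u017d' * 32

# 32 characters, within MySQL's 64-character limit ...
assert len(name) == 32

# ... but 64 bytes in UTF-8, which exceeds a 63-byte limit,
# so PostgreSQL truncates the name.
assert len(name.encode('utf-8')) == 64
assert len(name.encode('utf-8')) > 63
```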

Character Sets And Collations

Example:

CREATE TABLE t (s1 CHAR(1) CHARACTER SET utf16);

Result:
PostgreSQL returns an error. MySQL succeeds.
Reason:
PostgreSQL does not support named character sets; it can only handle one character set per database.

Example:

CREATE TABLE t (s1 CHAR(2), s2 VARCHAR(2));
INSERT INTO t VALUES ('y ','y ');
SELECT * FROM t WHERE s1 = 'y';
SELECT * FROM t WHERE s2 = 'y';

Result:
PostgreSQL finds one row for the first SELECT but zero rows for the second SELECT. MySQL finds one row both times.
Reason:
PostgreSQL does not always add spaces to the shorter comparand, or remove spaces from the longer comparand. MySQL is consistent. The behaviour is optional, but it is not supposed to depend on the data type.
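
What MySQL is doing is the standard's PAD SPACE comparison: pad the shorter comparand with spaces before comparing. A sketch of that rule in Python (illustrative only, not MySQL's actual code):

```python
def pad_space_equal(a: str, b: str) -> bool:
    """Compare two strings after right-padding the shorter one with
    spaces, as a PAD SPACE collation does."""
    width = max(len(a), len(b))
    return a.ljust(width) == b.ljust(width)

# 'y ' compares equal to 'y' under PAD SPACE semantics ...
assert pad_space_equal('y ', 'y')

# ... but not under a plain ("NO PAD"-style) comparison,
# which is what PostgreSQL does for VARCHAR.
assert 'y ' != 'y'
```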

Example:

CREATE TABLE t (s1 CHAR(1), uca CHAR(4), utf32 CHAR(6), utf8 CHAR(8), name VARCHAR(50));
INSERT INTO t VALUES (U&'\+003044','3197','  3044','  E38184','HIRAGANA LETTER I');
INSERT INTO t VALUES (U&'\+003046','3198','  3046','  E38186','HIRAGANA LETTER U');
INSERT INTO t VALUES (U&'\+0030A4','3197','  30A4','  E382A4','KATAKANA LETTER I');
INSERT INTO t VALUES (U&'\+0030A5','3198','  30a5','  E382A5','KATAKANA LETTER SMALL U');
INSERT INTO t VALUES (U&'\+00FF72','3197','  FF72','  EFBDB2','HALFWIDTH KATAKANA LETTER I');
INSERT INTO t VALUES (U&'\+00042B','1AF1','  042B','    D0AB','CYRILLIC CAPITAL LETTER YERU');
INSERT INTO t VALUES (U&'\+0004F9','1AF5','  04F9','    D3B9','CYRILLIC SMALL LETTER YERU WITH DIAERESIS');
INSERT INTO t VALUES (U&'\+0004CF','1B4A','  04CF','    D38F','CYRILLIC SMALL LETTER PALOCHKA');
INSERT INTO t VALUES (U&'\+002C13','1B61','  2C13','  E2B093','GLAGOLITIC CAPITAL LETTER RITSI');
INSERT INTO t VALUES (U&'\+0100CC','3872',' 100CC','F090838C','LINEAR B IDEOGRAM B240 WHEELED CHARIOT');
SELECT * FROM t ORDER BY s1 COLLATE "C",uca DESC,name;
SELECT * FROM t ORDER BY s1 COLLATE "POSIX",uca DESC,name;
SELECT * FROM t ORDER BY s1 COLLATE "C.UTF-8",uca DESC,name;
SELECT * FROM t ORDER BY s1 COLLATE "en_CA.utf8",uca DESC,name;
SELECT * FROM t ORDER BY s1 COLLATE "default",uca DESC,name;
SELECT * FROM t ORDER BY s1 COLLATE "ucs_basic",uca DESC,name;
SELECT * FROM t ORDER BY s1 COLLATE "zh_CN",uca DESC,name;

Result:
With PostgreSQL, no matter which collation one chooses, one does not get a linguistic standard ordering. Here is a typical result:
[psql_result: screenshot of psql output showing a non-UCA ordering]
With MySQL, if one enters the same data (albeit in a different way), and chooses collation utf8mb4_unicode_520_ci, one gets a standard result.
Reason:
PostgreSQL depends on the operating system for its collations. In this case my Linux operating system offered me only 5 collations which were really distinct. I did not attempt to customize or add more. I tried all the ones that were supplied, and failed to get a result which would match the Unicode Collation Algorithm order (indicated by the 'uca' column in the example). This matters because the standard does ask for a UNICODE collation "in which the ordering is determined by applying the Unicode Collation Algorithm with the Default Unicode Collation Element Table [DUCET]". MySQL is a cross-platform DBMS and does not depend on the operating system for its collations. So, out of the box and for all platform versions, it has about 25 distinct collations for 4-byte UTF8. One of them is based on the DUCET for Unicode 5.2.
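
Python's default string comparison illustrates what a pure code-point ("ucs_basic"-style) order looks like, and why it is not the UCA order shown in the 'uca' column: code-point order puts both hiragana letters before any katakana, whereas the DUCET groups the three "I" letters together.

```python
# Four letters from the example, with their code points and UCA weights.
hiragana_i  = '\u3044'  # HIRAGANA LETTER I,           UCA weight 3197
hiragana_u  = '\u3046'  # HIRAGANA LETTER U,           UCA weight 3198
katakana_i  = '\u30A4'  # KATAKANA LETTER I,           UCA weight 3197
halfwidth_i = '\uFF72'  # HALFWIDTH KATAKANA LETTER I, UCA weight 3197

# Default str ordering is by code point, like the "C"/"ucs_basic" collations.
code_point_order = sorted([katakana_i, hiragana_u, halfwidth_i, hiragana_i])
assert code_point_order == [hiragana_i, hiragana_u, katakana_i, halfwidth_i]

# The DUCET-based order puts all three "I" letters before "U":
uca_order = [hiragana_i, katakana_i, halfwidth_i, hiragana_u]
assert code_point_order != uca_order
```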

MySQL's character set and collation support is excellent in some other respects, but I'll put off the paeans for another post. Here I've just addressed a standard matter.

Views

Example:

CREATE TABLE t (s1 INT);
CREATE VIEW v AS SELECT * FROM t WHERE s1 < 5 WITH CHECK OPTION;

Result:
PostgreSQL returns an error. MySQL succeeds.
Reason:
MySQL has some support for core standard feature F311-04 "Create view: with check option". PostgreSQL does not.

Transactions

Example:

START TRANSACTION;
CREATE TABLE t (s1 SMALLINT);
INSERT INTO t VALUES (1);
INSERT INTO t VALUES (32768); /* This causes an error. */
COMMIT;
SELECT * FROM t;

Result:
MySQL finds a row containing 1. PostgreSQL finds nothing.
Reason:
PostgreSQL rolls back the entire transaction when it encounters an error (here, a mere out-of-range value); MySQL only cancels the failing statement.
Now, PostgreSQL is within its rights -- the standard says that an implementor may do an "implicit rollback" for an error. But that is a provision for what a DBMS implementor MAY do. From other passages in the standard, it's apparent that the makers didn't anticipate that a DBMS would ALWAYS do it, even for syntax errors. (For example it says: "exception conditions for transaction rollback have precedence over exception conditions for statement failure".) Even Microsoft SQL Server, in its optional "abort-transaction-on-error" mode, doesn't abort for syntax errors. So MySQL appears to be nearer the spirit of the standard as generally understood.
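
The same experiment can be run with Python's sqlite3 module. SQLite does not range-check SMALLINT, so a CHECK constraint stands in for the overflow; like MySQL, SQLite aborts only the failing statement, and the first row survives the COMMIT.

```python
import sqlite3

con = sqlite3.connect(':memory:')
con.isolation_level = None  # autocommit mode; manage the transaction manually
con.execute('BEGIN')
con.execute('CREATE TABLE t (s1 INTEGER CHECK (s1 BETWEEN -32768 AND 32767))')
con.execute('INSERT INTO t VALUES (1)')
try:
    con.execute('INSERT INTO t VALUES (32768)')  # violates the CHECK
except sqlite3.IntegrityError:
    pass  # only this statement is cancelled, not the whole transaction
con.execute('COMMIT')

rows = con.execute('SELECT * FROM t').fetchall()
assert rows == [(1,)]  # the first INSERT survived the error
```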

Example:

CREATE TABLE t (s1 INT);
INSERT INTO t VALUES (1);
COMMIT;
START TRANSACTION;
SELECT CURRENT_TIMESTAMP FROM t;
/* Pause 1 second. */
SELECT CURRENT_TIMESTAMP FROM t;

Result: PostgreSQL shows the same timestamp twice. MySQL shows two different timestamps.
Reason:
PostgreSQL keeps the same time throughout a transaction; MySQL keeps the same time throughout a statement.
The key sentences in the standard say that the result of a datetime value function should be the time when the function is evaluated, and "The time of evaluation of a datetime value function during the execution of S and its activated triggers is implementation-dependent." In other words, it's supposed to occur during the execution of S, which stands for Statement. Of course, this leads to arguable matters, for example what if the statement is in a function that's invoked from another statement, or what if the statement is within a compound statement (BEGIN/END block)? But we don't have to answer those questions here. We just have to observe that, for the example given here, the DBMS should show two different timestamps. For documentation of how DB2 follows this, see ibm.com.
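
MySQL's per-statement rule can be observed with SQLite too (via Python's sqlite3), since SQLite also evaluates CURRENT_TIMESTAMP once per statement rather than once per transaction. The table is omitted because it isn't needed for the demonstration; the pause is slightly over CURRENT_TIMESTAMP's one-second resolution.

```python
import sqlite3
import time

con = sqlite3.connect(':memory:')
con.isolation_level = None  # manage the transaction manually
con.execute('BEGIN')
t1 = con.execute('SELECT CURRENT_TIMESTAMP').fetchone()[0]
time.sleep(1.1)             # longer than the one-second resolution
t2 = con.execute('SELECT CURRENT_TIMESTAMP').fetchone()[0]
con.execute('COMMIT')

# Each statement gets its own evaluation time, even inside one
# transaction, so the two values differ (the 'YYYY-MM-DD HH:MM:SS'
# format makes string comparison chronological).
assert t2 > t1
```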

Stored Procedures, Functions, Triggers, Prepared Statements

Example:

CREATE FUNCTION f () RETURNS INT RETURN 1;

Result:
PostgreSQL returns an error. MySQL succeeds.
Reason:
PostgreSQL does not support any functions, stored procedures, or triggers with standard syntax.
Instead PostgreSQL supports Oracle syntax. This is not as bad as it sounds -- the Oracle syntax is so popular that even DB2 has decided to support it, optionally.

Example:

PREPARE stmt1 FROM 'SELECT 5 FROM t';

Result:
PostgreSQL returns an error. MySQL succeeds.
Reason:
PostgreSQL doesn't support the standard syntax for PREPARE.

Data Types

Example:

CREATE TABLE t (s1 CHAR(1));
INSERT INTO t VALUES (U&'\+000000');

Result:
PostgreSQL returns an error. MySQL succeeds, although MySQL has to use a different non-standard syntax.
Reason:
PostgreSQL has an aversion to CHR(0), the NUL character ("The character with the code zero cannot be in a string constant."). Other DBMSs allow all characters in the chosen character set. In this case, the default character set is in use, so all Unicode characters should be okay.

Example:

CREATE TABLE t (s1 BINARY(1), s2 VARBINARY(2), s3 BLOB);

Result: PostgreSQL returns an error. MySQL succeeds.
Reason:
PostgreSQL does not support BINARY or VARBINARY or BLOB. It has equivalent non-standard data types.

The Devil Can Cite Scripture For His Purpose

I could easily find examples going the other way, if I wrote a blog post titled "Sometimes PostgreSQL is more standards-compliant than MySQL". But can one generalize from such examples?
The PostgreSQL folks boldly go to conclusion mode:

As a quick summary, MySQL is the "easy-to-use, web developer" database, and PostgreSQL is the "feature-rich, standards-compliant" database.

-- postgresql.org FAQ

Follow I do not dare.

How to pronounce SQL

In the thirty-second minute of a YouTube video featuring Oracle CEO Larry Ellison you can hear him say clearly "we compete against Microsoft SQL Server ... we never compete against MySQL".

The important thing is that he says "Microsoft SEQUEL Server" the way Microsoft people say it, but he says "My-ESS-CUE-ELL" the way the MySQL Reference Manual says is the "official" pronunciation (for English). That is, for product names Mr Ellison respects the way the product makers say it. That settles that, but what about the word SQL in general?

Although SEQUEL was the earlier name, there are no English words where the letter Q alone is pronounced KW. So this can't be settled with logic. Let's try appealing to authority.

I looked for the way that other prominent database people said it, or the way they wrote it -- if they wrote "an SQL" then the pronunciation must be "Ess-Cue-Ell"; if they wrote "a SQL" then the pronunciation must be "Sequel".

The results were:


Authority              Prefers      Source
---------              -------      ------
Joe Celko              Both         YouTube
Don Chamberlin         Ess-Cue-Ell  Book Excerpt
C.J.Date               Sequel       "A guide to the SQL Standard" (1987), p. 10, p. 32
David DeWitt           Sequel       YouTube
Steven Feuerstein      Both         YouTube
Bill Gates             Both         YouTube
Jim Gray               Sequel       Podcast
Joseph M. Hellerstein  Both         Podcast
Jim Melton             Ess-Cue-Ell  Podcast
Curt Monash            Sequel       Article
Shamkant Navathe       Ess-Cue-Ell  "Fundamentals of Database Systems" with Ramez Elmasri (3rd edition) p. 207
Patricia Selinger      Sequel       YouTube
Jim Starkey            Sequel       Blog comment
Michael Stonebraker    Sequel       YouTube
Jennifer Widom         Sequel       YouTube
Winner: Nobody. "Sequel" gets the most votes, but Chamberlin says "Ess-Cue-Ell", and he gets extra weight because he's the co-developer of SQL.

... Then I checked vendor documentation. All manuals are inconsistent so this is based on which choice is most frequent.



Authority   Prefers      Source
---------   -------      ------
Firebird    Ess-Cue-Ell  firebirdsql.org
IBM         Ess-Cue-Ell  publib.boulder.ibm.com
MariaDB     Ess-Cue-Ell  askmonty.org/v
Microsoft   Sequel       technet.microsoft.com
MySQL       Ess-Cue-Ell  mysql.com/doc
Oracle      Sequel       docs.oracle.com
PostgreSQL  Ess-Cue-Ell  www.postgresql.org/about/press/faq
Winner: "Sequel", because Oracle and Microsoft are big.

As for NoSQL: Dwight Merriman (the chairman of MongoDB Inc.) says No-Sequel, and Martin Fowler (the co-author of NoSQL Distilled) says No-Sequel, but there's an IBM tutorial that says No-Ess-Cue-Ell.

... Finally I looked again at my beloved "Standard" document.


Authority     Prefers      Source
---------     -------      ------
The Standard  Ess-Cue-Ell  jtc1sc32.org
Winner: "Ess-Cue-Ell", of course.

... And what I did not do is count Google hits. I know "Sequel" would win big if it were a hoi polloi decision, but this is a technical term. Many people pronounce Uranus in a way that astronomers dislike; many people say Brontosaurus although paleontologists say Apatosaurus -- but if you were in astronomy or paleontology you'd have to go with what the experts say. Majority opinion is only decisive for ordinary language.

In the end, then, it's "when in Rome do as the Romans do". In Microsoft or Oracle contexts one should, like Mr Ellison, respect Microsoft's or Oracle's way of speaking. But here in open-source-DBMS-land the preference is to follow the standard.

So the name of this blog is Ess-cue-ell and its Sequels.
