Month: November 2015


Standard SQL/JSON and MySQL 5.7 JSON

Support for storing and querying JSON within SQL is progressing for the ANSI/ISO SQL Standard, and for MySQL 5.7. I’ll look at what’s new, and do some comparisons.

The big picture

The standard document says

The SQL/JSON path language is a query language used by certain SQL operators (JSON_VALUE, JSON_QUERY, JSON_TABLE, and JSON_EXISTS, collectively known as the SQL/JSON query operators) to query JSON text.The SQL/JSON path language is not, strictly speaking, SQL, though it is embedded in these operators within SQL. Lexically and syntactically, the SQL/JSON path language adopts many features of ECMAScript, though it is neither a subset nor a superset of ECMAScript.The semantics of the SQL/JSON path language are primarily SQL semantics.

Here is a chart that shows the JSON-related data types and functions in the standard, and whether a particular DBMS has something with the same name and a similar functionality.

Standard                 Oracle SQL Server MySQL
--------                 ------ ---------- -----
Conventional data type   YES    YES        NO
JSON_VALUE function      YES    YES        NO
JSON_EXISTS function     YES    NO         NO
JSON_QUERY function      YES    YES        NO
JSON_TABLE function      YES    NO         NO

My source re the standard is a draft copy of ISO/IEC 9075-2 SQL/Foundation. For Oracle 12c read Oracle’s whitepaper. For SQL Server 2016 read MSDN’s blog. My source re MySQL 5.7 is the MySQL manual and the latest source-code download of version 5.7.9.

Now, what is the significance of the lines in the chart?

Conventional data type

By “conventional”, I mean that in standard SQL JSON strings should be stored in one of the old familiar data types: VARCHAR, CLOB, etc. It didn’t have to be this way, and any DBMS that supports user-defined types can let users be more specific, but that’s what Oracle and Micosoft accept.

MySQL 5.7, on the other hand, has decided that JSON shall be a new data type. It’s closely related to LONGTEXT: if you say

  CREATE TABLE j1 AS SELECT UPPER(CAST('{}' AS JSON));
  SHOW CREATE TABLE j1;

then you get LONGTEXT. But if you use the C API to ask the data type, you get MYSQL_TYPE_JSON=245 (aside: this is not documented). And it differs because, if you try to put in non-JSON data, you get an error message.

At least, that’s the theory. It didn’t take me long to find a way to put non-JSON data in:

  CREATE TABLE j2a (s1 INT, s2 JSON);
  INSERT INTO j2a VALUES  (1,'{"a": "VALID STUFF"}');
  CREATE TABLE j2b AS SELECT s1, UPPER(s2) AS s2 FROM j2a;
  INSERT INTO j2b VALUES (NULL, 'INVALID STUFF');
  ALTER TABLE j2b MODIFY COLUMN s2 JSON;

… That unfortunately works, and now if I say “SELECT * FROM j2b;” I get an error message “The JSON binary value contains invalid data”. Probably bugs like this will disappear soon, though.

By making a new data type, MySQL has thrown away some of the advantages that come with VARCHAR or TEXT. One cannot specify a maximum size — everything is like LONGTEXT. One cannot specify a preferred character set and collation — everything is utf8mb4 and utf8mb4_bin. One cannot take advantage of all the string functions — BIN() gives meaningless results, for example. And the advantage of automatic validity checking could have been delivered with efficient constraints or triggers instead. So why have a new data type?

Well, PostgreSQL has a JSON data type. As I’ve noticed before, PostgreSQL can be a poor model if one wants to follow the standard. And it will not surprise me if the MariaDB folks also decide to make a JSON data type, because I know that they are following similar logic for an “IP address” data type.

By the way, the validity checking is fairly strict. For example, ‘{x:3}’ is considered invalid because quote marks are missing, and ‘{“x”:.2} is considered invalid because the value has no leading digit.

JSON_VALUE function

For an illustration and example it’s enough to describe the standard’s JSON_VALUE and MySQL’s JSON_EXTRACT.

The standard idea is: pass a JSON string and a JavaScript-like expression, get back an SQL value, which will generally be a scalar value. For example,

  SELECT JSON_VALUE(@json, @path_string) FROM t;
  SELECT JSON_VALUE(json_column_name, 'lax $.c') AS c FROM t;

There are optional clauses for deciding what to do if the JSON string is invalid, or contains missing and null components. Again, the standard’s JSON_VALUE is what Oracle and Microsoft accept. There’s some similarity to what has gone before with SQL/XML.

MySQL, on the other hand, accomplishes some similar things with JSON_EXTRACT. For example,

  SELECT JSON_EXTRACT(@json, @path_string);
  SELECT JSON_VALUE(json_column_name, '$.c') AS c FROM t;

And the result is not an ordinary MySQL scalar, it has type = JSON. In the words of physicist I.I.Rabi when confronted with a new particle, “Who ordered that?”

Well, JSON_EXTRACT and some of the other MySQL functions have fairly close analogues, in both name and functionality, with Google’s BigQuery and with SQLite. In other words, instead of the SQL standard, MySQL has ended up with something like the NoSQL No-standard.

I should stress here that MySQL is not “violating” the SQL standard with JSON_EXTRACT. It is always okay to use non-standard syntax. What’s not okay is to use standard syntax for a non-standard purpose. And here’s where I bring in the slightly dubious case of the “->” operator. In standard SQL “->”, which is called the “right arrow” operator, has only one purpose: dereferencing. In MySQL “->” has a different purpose: a shorthand for JSON_EXTRACT. Since MySQL will never support dereferencing, there will never be a conflict in practice. Nevertheless, technically, it’s a violation.

Observed Behaviour

When I tried out the JSON data type with MySQL 5.7.9, I ran into no exciting bugs, but a few features.

Consistency doesn’t apply for INSERT IGNORE and UPDATE IGNORE. For example:

CREATE TABLE t1 (date DATE, json JSON);
INSERT IGNORE INTO t1 (date) VALUES ('invalid date');
INSERT IGNORE INTO t1 (json) VALUES ('{invalid json}');

The INSERT IGNORE into the date column inserts null with a warning, the INSERT IGNORE into the json column returns an error.

Some error messages might still need adjustment. For example:

CREATE TABLE ti (id INT, json JSON) PARTITION BY HASH(json);

Result: an error message = “A BLOB field is not allowed in partition function”.

Comparisons of JSON_EXTRACT results don’t work. For example:

SET @json = '{"a":"A","b":"B"}';
SELECT GREATEST(
       JSON_EXTRACT(@json,'$.a'),
       JSON_EXTRACT(@json,'$.b'));

The result is a warning “This version of MySQL doesn’t yet support ‘comparison of JSON in the LEAST and GREATEST operators'”, which is a symptom of the true problem, that JSON_EXTRACT returns a JSON value instead of a string value. The workaround is:

SET @json = '{"a":"A","b":"B"}';
SELECT GREATEST(
       CAST(JSON_EXTRACT(@json,'$.a') AS CHAR),
       CAST(JSON_EXTRACT(@json,'$.b') AS CHAR));

… which returns “B” — a three-character string, including the quote marks.

Not The End

The standard might change, and MySQL certainly will change anything that’s deemed wrong. Speaking of wrong, I might have erred too. And I certainly didn’t give justice to all the other details of MySQL 5.7 JSON.

Meanwhile

The Ocelot GUI client for MySQL and MariaDB is still version 0.8 alpha, but since the last report there have been bug fixes and improvements to the Help option. Have a look at the new manual by going to https://github.com/ocelot-inc/ocelotgui and scrolling down till you see the screenshots and the words “User Manual”.